I have successfully integrated Tableau with Spark Thrift Server using the Simba ODBC driver. I run a CACHE TABLE statement in Tableau's Initial SQL, and performance has been great so far. I am now looking for a way to cache and uncache some of the frequently used tables whenever they are updated by our data pipelines.
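For context, the Initial SQL I run from Tableau looks something like this (the table name is a placeholder; the `CACHE TABLE` statement is standard Spark SQL):

```sql
-- Runs once per Tableau connection, inside the Thrift Server's Spark context
CACHE TABLE sales_daily;
```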
The challenge I am facing is that a table cached via Tableau stays in cache for the lifetime of the Thrift Server, but when my data pipeline submits Spark jobs via spark-submit, those jobs run in a different Spark context. Can anyone suggest how I can reach the Thrift Server's context from the backend process?
Is there a way to reuse the Thrift Server's context from spark-submit or spark-shell?
At the end of my data pipeline, would it be a good idea to invoke a simple shell script that connects to the Thrift Server and refreshes the cache?
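What I have in mind is something like the sketch below: the script connects to the Thrift Server over JDBC with beeline, so the UNCACHE/CACHE statements run inside the Thrift Server's own Spark context rather than the pipeline's. The host, port, and table name are assumptions; adjust them for your cluster. The sketch only prints the command so it stays runnable without a live server; drop the final `echo`-style call and run the command directly in the real pipeline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build the beeline command that refreshes the Thrift Server cache for one
# table. THRIFT_URL and the table name below are hypothetical -- point them
# at your own Thrift Server (default JDBC port is usually 10000).
build_refresh_cmd() {
  local table="$1"
  local url="${THRIFT_URL:-jdbc:hive2://thrift-host:10000}"
  # UNCACHE first so the subsequent CACHE picks up the freshly written data.
  printf 'beeline -u %s -e "UNCACHE TABLE IF EXISTS %s; CACHE TABLE %s;"\n' \
    "$url" "$table" "$table"
}

# At the end of the pipeline: show the command that would be executed.
build_refresh_cmd sales_daily
```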
Note: both my backend jobs and the BI tool use the same cluster; I started the Thrift Server and submit the backend jobs on the same YARN cluster.