
I have successfully integrated Tableau with Spark Thrift Server using the Simba ODBC driver. I have been using CACHE TABLE in the Initial SQL, and performance has been great so far. I am now looking for a way to cache and uncache a few frequently used tables when they are updated through our data pipelines.
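For context, the Initial SQL caching described above is along these lines (the table name is a placeholder):

```sql
-- Run by Tableau once per Thrift Server session when it connects;
-- MyTable stands in for whichever frequently used table you cache.
CACHE TABLE MyTable;
```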

The challenge I am facing is that a table cached via Tableau remains cached for the lifetime of the Thrift Server, but when my data pipeline process submits Spark jobs, they run in a different Spark context. Can anyone suggest how I can connect to the Thrift Server's context from the backend process?

  1. Is there a way to reuse the Thrift Server's context from spark-submit or spark-shell?

  2. At the end of my data pipeline, would it be a good idea to invoke a simple shell script that connects to the Thrift Server and refreshes the cache?

Note: both my backend jobs and the BI tool use the same cluster, as I started the Thrift Server and submit the backend jobs on the same YARN cluster.

1 Answer


Using beeline, you can connect to the Thrift Server on the same cluster with the same JDBC URL and credentials that Tableau uses. Once the data pipeline completes, run:

UNCACHE TABLE MyTable; CACHE TABLE MyTable;
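For example, the end-of-pipeline step could be a small wrapper script around beeline. The host, port, user, and table name below are placeholders; substitute the values your Tableau connection already uses:

```shell
#!/bin/sh
# Hypothetical Thrift Server endpoint and table -- replace with your own.
THRIFT_URL="jdbc:hive2://thrift-host:10000"
TABLE="MyTable"

# Drop the stale cached copy and re-cache the freshly written table.
REFRESH_SQL="UNCACHE TABLE ${TABLE}; CACHE TABLE ${TABLE};"

# Dry run: print the command so the script can be checked without a
# live server. Remove the leading 'echo' to execute it for real.
echo beeline -u "$THRIFT_URL" -n myuser -e "$REFRESH_SQL"
```

Because the refresh runs inside the Thrift Server's own session, the new cached copy is visible to all subsequent Tableau queries on that server.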
