
The DataStax Spark Cassandra Connector is great for interacting with Cassandra through Apache Spark. With Spark SQL 1.1, we can use the Thrift server to let Tableau talk to Spark. Since Tableau can talk to Spark, and Spark can talk to Cassandra, there must surely be some way to get Tableau talking to Cassandra through Spark (or rather Spark SQL), but I can't figure out how to get this running. Ideally, I'd like to do this with a Spark Standalone cluster plus a Cassandra cluster (i.e. without an additional Hadoop setup). Is this possible? Any pointers are appreciated.

1 Answer

  • The Thrift server has a HiveThriftServer2.startWithContext(sqlContext) entry point, so you can create your own sqlContext referencing Cassandra (C*) and the appropriate table / column family, and then pass that context to the Thrift server.

  • For example:

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver._

val sparkContext = sc
val sqlContext = new HiveContext(sparkContext)
import sqlContext._

// Register a sample table in the context
sparkContext.makeRDD((1, "hello") :: (2, "world") :: Nil)
  .toSchemaRDD.cache().registerTempTable("t")

// Start the Thrift server with this context instead of the default one
HiveThriftServer2.startWithContext(sqlContext)

  • So instead of starting the default Thrift server that ships with Spark, you can just launch your custom one.
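  • Putting the pieces together for Cassandra specifically, a sketch might look like the following, assuming the spark-cassandra-connector is on the classpath. The keyspace test_ks, the table users, and the User case class are hypothetical placeholders; substitute your own schema:

```scala
import com.datastax.spark.connector._
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver._

// Hypothetical row type matching the Cassandra table's columns
case class User(id: Int, name: String)

val sqlContext = new HiveContext(sc)
import sqlContext._

// Read the Cassandra table as an RDD of case-class rows, then register
// it as a temp table so Tableau can query it over the Thrift server
sc.cassandraTable[User]("test_ks", "users")
  .toSchemaRDD
  .registerTempTable("users")

HiveThriftServer2.startWithContext(sqlContext)
```

  Tableau would then connect to this Thrift server via the Spark SQL ODBC driver and see "users" as a queryable table, with no Hadoop cluster required beyond the Hive metastore defaults.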
