Intellipaat

in BI by (17.6k points)

I've been stuck on this problem for days, so any help would be greatly appreciated.

I'm trying to copy a Cassandra table into Hive (so that it is registered in the Hive metastore and I can then access it from Tableau). The Hive -> Tableau part works, but the Cassandra -> Hive part does not: the data isn't being copied into the Hive metastore.

Here are the steps I've taken:

I followed the instructions from the README of this project: https://github.com/tuplejump/cash/tree/master/cassandra-handler

I generated the hive-cassandra jar (hive-cassandra-1.2.6.jar) and copied it, together with cassandra-all-*.jar and cassandra-thrift-*.jar, into Hive's lib folder.

Then I started hive and tried the following:

hive> add jar /usr/lib/hive/apache-hive-1.1.0/lib/hive-cassandra-1.2.6.jar;
Added [/usr/lib/hive/apache-hive-1.1.0/lib/hive-cassandra-1.2.6.jar] to class path
Added resources: [/usr/lib/hive/apache-hive-1.1.0/lib/hive-cassandra-1.2.6.jar]
hive> list jars;
/usr/lib/hive/apache-hive-1.1.0/lib/hive-cassandra-1.2.6.jar
hive> create temporary function tmp as 'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler'
    > ;
FAILED: Class org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler not found

I don't know why Hive can't find CqlStorageHandler, even though the jar shows up in list jars.
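One way to narrow this down is to confirm that the class is actually packaged in the added jar under exactly the name the error message uses. The following is a minimal standalone sketch (JarClassCheck and containsClass are hypothetical names, not part of Hive or the cash project) that opens the jar with java.util.jar.JarFile and looks for the matching .class entry:

```java
import java.io.IOException;
import java.util.jar.JarFile;

public class JarClassCheck {

    // Returns true if the jar at jarPath contains a compiled class with the given
    // fully qualified name, e.g. "org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler".
    public static boolean containsClass(String jarPath, String className) throws IOException {
        // Class files are stored under their package path, using '/' as the separator.
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: JarClassCheck <jar-path> <fully.qualified.ClassName>");
            System.exit(2);
        }
        boolean found = containsClass(args[0], args[1]);
        System.out.println((found ? "found " : "NOT found ") + args[1] + " in " + args[0]);
    }
}
```

For example, run it with /usr/lib/hive/apache-hive-1.1.0/lib/hive-cassandra-1.2.6.jar and org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler as arguments; the JDK's jar tf command piped through grep gives the same information from the shell. If the entry is missing, Hive cannot load the class from that jar under that name.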

Thanks! 

1 Answer

by (47.2k points)

An alternative you can consider is to write a simple Java program that dumps the table as tab-separated text, save that output to a file, and then load the file into Hive.

package com.company.cassandra;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Cluster.Builder;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CassandraExport {

    public static Session session;

    // Opens a session on the given keyspace; credentials are only set when both are provided.
    public static void connect(String username, String password, String host, int port, String keyspace) {
        Builder builder = Cluster.builder().addContactPoint(host);
        builder.withPort(port);
        if (username != null && password != null) {
            builder.withCredentials(username, password);
        }
        Cluster cluster = builder.build();
        session = cluster.connect(keyspace);
    }

    public static void main(String[] args) {
        // Prod: placeholder connection settings, replace with your own
        connect("user", "password", "server", 9042, "keyspace");

        // "table" is a placeholder for the Cassandra table to export
        ResultSetFuture future = session.executeAsync("SELECT * FROM table;");
        ResultSet results = future.getUninterruptibly();

        for (Row row : results) {
            // Print one tab-separated line per row, columns in the order of the Hive table
            String out = row.getString("col1") + "\t" +
                    String.valueOf(row.getInt("col2")) + "\t" +
                    String.valueOf(row.getLong("col3")) + "\t" +
                    String.valueOf(row.getLong("col4"));
            System.out.println(out);
        }

        // Release the session and the underlying cluster connections
        session.close();
        session.getCluster().close();
    }
}
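The loop above concatenates values directly, so a null text column is written as the literal string null, and getInt/getLong return 0 for null columns unless the driver's Row.isNull(...) is checked first. Hive's default text format reads the sequence \N as NULL, so a small formatting helper along these lines may be safer. This is a sketch under that assumption: HiveTsv and formatRow are hypothetical names, and replacing embedded tabs and newlines with spaces is a simplification you may want to change for your data:

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class HiveTsv {

    // Hive's default text SerDe reads the two-character sequence \N as NULL.
    static final String HIVE_NULL = "\\N";

    // Joins column values into one tab-separated line. Nulls become \N, and tabs or
    // line breaks inside a value are replaced with spaces so they cannot split a
    // field or a row (a simplification; adjust to your data).
    public static String formatRow(Object... values) {
        return Arrays.stream(values)
                .map(v -> v == null
                        ? HIVE_NULL
                        : v.toString().replace('\t', ' ').replace('\n', ' ').replace('\r', ' '))
                .collect(Collectors.joining("\t"));
    }

    public static void main(String[] args) {
        // prints abc, 42, 7 and \N separated by tabs
        System.out.println(formatRow("abc", 42, 7L, null));
    }
}
```

In the export loop, a row could then be built as HiveTsv.formatRow(row.getString("col1"), row.isNull("col2") ? null : row.getInt("col2"), ...), with the remaining columns handled the same way.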

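Redirect the program's standard output to a file (here /tmp/cassandra-table) and then load that file into Hive. The target table must already exist as a tab-delimited text table (ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'), because LOAD DATA only moves the file and Hive's default field delimiter is Ctrl-A (\001), not a tab: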

hive -e "use schema; load data local inpath '/tmp/cassandra-table' overwrite into table mytable;"
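Instead of redirecting standard output, the exporter could also write the file itself. A minimal sketch, assuming the same /tmp/cassandra-table path used in the load command; TsvFileWriter and writeLines are hypothetical names, and the sample rows in main stand in for the lines built in the Cassandra loop. Every line is terminated with '\n' explicitly, since Hive's text format expects newline-terminated rows and BufferedWriter.newLine() would use the platform's line separator instead:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class TsvFileWriter {

    // Writes each line followed by '\n', creating the file or truncating it if it exists.
    public static void writeLines(Path target, Iterable<String> lines) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                out.write(line);
                out.write('\n');
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample rows; in the exporter these would come from the Cassandra result loop.
        List<String> rows = Arrays.asList("a\t1\t2\t3", "b\t4\t5\t6");
        writeLines(Paths.get("/tmp/cassandra-table"), rows);
    }
}
```

Writing the file directly keeps any log output the driver prints to standard output out of the data file.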
