What are the differences between Apache Spark SQLContext and HiveContext?

1 Answer


HiveContext is a superset of SQLContext. Its additional features include the ability to write queries using the more complete HiveQL parser, access to Hive UDFs, and the ability to read data from Hive tables. If you want to work with Hive, you have to use HiveContext.

With the arrival of Spark 2.0, however, window functions are supported in SQLContext as well; before that they required HiveContext. This version also improves the parser and has much better SQL:2003 compliance, so it is significantly less dependent on Hive for core functionality. Because of that, HiveContext (in 2.0, a SparkSession with Hive support) is somewhat less important relative to SQLContext than it used to be.
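To make the Spark 2.0 point concrete, here is a minimal PySpark sketch, not a definitive setup. It assumes PySpark 2.0 or later is installed and runs in local mode; the function names, the `employees` view, and the sample rows are made up for illustration. The idea is that the window-function query runs on a plain SparkSession, and Hive support is switched on only when it is actually needed.

```python
# Window-function query that uses no Hive-specific features.
RANK_QUERY = """
SELECT name, dept, salary,
       rank() OVER (PARTITION BY dept ORDER BY salary DESC) AS salary_rank
FROM employees
"""


def build_session(app_name, with_hive=False):
    """Return a local SparkSession, optionally with Hive support enabled."""
    # Imported lazily so this module still loads where PySpark is absent.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).master("local[*]")
    if with_hive:
        # Only works if the Hive classes are available on the classpath.
        builder = builder.enableHiveSupport()
    return builder.getOrCreate()


def rank_salaries(spark):
    """Run RANK_QUERY over a small in-memory table (illustrative data)."""
    rows = [("ann", "eng", 120), ("bob", "eng", 100), ("cy", "ops", 90)]
    df = spark.createDataFrame(rows, ["name", "dept", "salary"])
    df.createOrReplaceTempView("employees")
    return spark.sql(RANK_QUERY).collect()
```

Calling `rank_salaries(build_session("demo"))` should be enough for a query like this; passing `with_hive=True` is only worthwhile when you need Hive tables or Hive UDFs.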

In Spark 1.x, when programming against Spark SQL, we have two entry points, depending on whether we need Hive support. The recommended entry point is HiveContext, which provides access to HiveQL and other Hive-dependent functionality. The more basic SQLContext provides a subset of the Spark SQL support that does not depend on Hive.

The biggest problem with HiveContext is that it comes with a large set of dependencies, since it pulls in Hive's libraries.
