Asked in Big Data Hadoop & Spark by (11.4k points)

What is the difference between SparkContext, JavaSparkContext, SQLContext and SparkSession?
Is there any method to convert or create a Context using a SparkSession?
Can I completely replace all the Contexts using one single entry SparkSession?
Are all the functions in SQLContext, SparkContext, and JavaSparkContext also in SparkSession?
Some functions like parallelize have different behaviors in SparkContext and JavaSparkContext. How do they behave in SparkSession?
How can I create the following using a SparkSession?

  1. RDD
  2. JavaRDD
  3. Dataset

1 Answer

Answered by (32.3k points)

In previous versions of Spark, there were different contexts serving as entry points to the different APIs (SparkContext for the core API, SQLContext for the Spark SQL API, StreamingContext for the DStream API, etc.). This was a source of confusion for developers and a point of optimization for the Spark team, so in the most recent versions of Spark there is only one entry point: the SparkSession.

Since Spark 2.0, SparkSession has been the unified entry point of a Spark application. It provides a way to interact with Spark's various functionalities using a smaller number of constructs. Instead of having a SparkContext, HiveContext, and SQLContext, all of them are now encapsulated in the SparkSession.
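
As a minimal sketch of how such a session is created (the application name and local master below are placeholder assumptions, not values from the question):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ContextDemo")   // hypothetical application name
  .master("local[*]")       // assumption: local mode, for illustration only
  .getOrCreate()

Note that getOrCreate() returns the existing session if one is already running, so it is safe to call from several places in an application.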

SQLContext is the entry point of Spark SQL and can be obtained from a SparkContext. Before 2.x, RDD, DataFrame, and Dataset were three different data abstractions.

SQLContext is basically a class used for initializing the functionalities of Spark SQL. To initialize an SQLContext class object, a SparkContext class object (sc) is required.

To initialize an SQLContext through the spark-shell, we execute the command below:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
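
As a quick usage sketch (the file path is hypothetical), the resulting sqlContext can then load structured data:

val df = sqlContext.read.json("path/to/people.json")   // hypothetical path
df.show()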

Now, talking about SparkContext: it is the Scala implementation entry point, and JavaSparkContext is a Java wrapper around SparkContext.
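
As a minimal sketch (assuming a session named spark already exists), a JavaSparkContext can be obtained by wrapping the underlying SparkContext:

import org.apache.spark.api.java.JavaSparkContext

val jsc = new JavaSparkContext(spark.sparkContext)   // Java-friendly wrapper over the same context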

Is there any method to convert or create a Context using SparkSession?

Yes. Use sparkSession.sparkContext (sparkSession.sparkContext() in Java) and, for SQL, sparkSession.sqlContext.
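
A short Scala sketch (assuming a session named spark already exists):

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

val sc: SparkContext = spark.sparkContext        // the core entry point
val sqlContext: SQLContext = spark.sqlContext    // the legacy Spark SQL entry point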

Can I completely replace all the Contexts using one single entry point, SparkSession?

Yes. You can get the respective contexts from the SparkSession, as shown above.

Are all the functions in SQLContext, SparkContext, JavaSparkContext, etc. added to SparkSession?

Not directly. You have to get the respective context and make use of it, something like backward compatibility.

How do you use such functions with SparkSession?

Get the respective context and make use of it, as in the sketch below.
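
For example, parallelize (mentioned in the question) lives on SparkContext, not on SparkSession itself. A minimal Scala sketch, assuming a session named spark:

val numbers = spark.sparkContext.parallelize(Seq(1, 2, 3, 4))   // RDD[Int]
println(numbers.count())   // 4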

How do you create the following using SparkSession?

  • RDD: In Spark 2+, the SparkContext is available via the SparkSession:

val rdd = spark.sparkContext.textFile(yourFileOrURL)

  • JavaRDD: The same is done as above, but in the Java implementation (through JavaSparkContext).

  • Dataset: What SparkSession returns is a Dataset if it is structured data. In Java, for example (a Scala sketch follows this list):

            Dataset<String> listDS = sparkSession.createDataset(list, Encoders.STRING());
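
A minimal Scala sketch of the same Dataset creation, assuming a session named spark (the file path is hypothetical):

import spark.implicits._   // enables .toDS() on local collections

val listDS = Seq("a", "b", "c").toDS()                 // Dataset[String] from a local collection
val fileDS = spark.read.textFile("path/to/file.txt")   // Dataset[String] from a text file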
