0 votes
2 views
in Big Data Hadoop & Spark by (11.4k points)

I am new to Spark and Spark SQL.

How does createOrReplaceTempView work in Spark?

If we register an RDD of objects as a table, will Spark keep all the data in memory?

2 Answers

0 votes
by (32.3k points)

Often we want to treat a Spark DataFrame as a table and query it with SQL. To turn a DataFrame into a temporary view that is available only for the current Spark session, we call registerTempTable (deprecated) or createOrReplaceTempView (Spark >= 2.0) on the DataFrame.

createOrReplaceTempView is used when you want the table to exist only for a particular Spark session.

createOrReplaceTempView creates (or replaces, if that view name already exists) a lazily evaluated "view" that you can then use like a Hive table in Spark SQL. It does not persist to memory unless you cache the dataset that underpins the view.

scala> val s = Seq(1,2,3,4).toDF("num")

s: org.apache.spark.sql.DataFrame = [num: int]

scala> s.createOrReplaceTempView("nums")

scala> spark.table("nums")

res6: org.apache.spark.sql.DataFrame = [num: int]

scala> spark.table("nums").cache

res7: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [num: int]

scala> spark.table("nums").count

res8: Long = 4

Because caching is lazy, the data is only fully cached after the .count action forces evaluation.
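
If you want to check whether the data behind the view is actually held in memory, a minimal sketch using the catalog API (assuming the same "nums" view and the cache call above):

spark.catalog.isCached("nums")      // true once the cache has been materialized by an action

spark.catalog.uncacheTable("nums")  // drops the cached data; the view itself still exists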


 

Relevant quote (comparing to persistent table): "Unlike the createOrReplaceTempView command, saveAsTable will materialize the contents of the DataFrame and create a pointer to the data in the Hive metastore." https://spark.apache.org/docs/latest/sql-programming-guide.html#saving-to-persistent-tables
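
For contrast, a minimal sketch of the persistent-table path that quote describes (this assumes a SparkSession with Hive support; "nums_tbl" is just an illustrative table name):

// Materializes the data and registers a pointer to it in the metastore
s.write.saveAsTable("nums_tbl")

// Unlike the temp view, the table can be read back in a later session
spark.table("nums_tbl").show()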

If you wish to learn Spark, visit this Spark Tutorial.

0 votes
by (33.1k points)

createOrReplaceTempView creates a temporary view of the DataFrame for the current session. Nothing is persisted at this point, but you can run SQL queries on top of it. If you want to keep the data, you can either persist/cache the DataFrame or use saveAsTable to write it out as a persistent table.
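
A minimal sketch of that session-scoped behaviour (assuming a DataFrame called df and Spark 2.1+; the view name is illustrative):

df.createOrReplaceTempView("my_view")        // visible only inside this SparkSession
spark.sql("select count(*) from my_view")    // queryable like a table while the session lives
spark.catalog.dropTempView("my_view")        // explicitly dropped; a new session never sees it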

First, we read the data in CSV format into a DataFrame, and then create a temp view.

Reading data in CSV format

val data =  spark.read.format("csv").option("header","true").option("inferSchema","true").load("FileStore/tables/pzufk5ib1500654887654/campaign.csv")

To print the schema

data.printSchema

data.createOrReplaceTempView("Data")

We can run SQL queries on top of the temp view we just created:

%sql select Week as Date, `Campaign Type`, Engagements, Country from Data order by Date asc
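
The %sql magic above is Databricks notebook syntax; outside a notebook the same query can be issued through the SparkSession directly. A sketch, assuming the "Data" view created above and that the column names match the CSV header:

spark.sql("""select Week as Date, `Campaign Type`, Engagements, Country
             from Data order by Date asc""").show()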

Hope this answer helps you!
