Share HDInsight SPARK SQL Table saveAsTable does not work

Question

asked Jul 20, 2019 in BI by Vaibhav Ameta (17.6k points)

I want to show the data from HDInsight SPARK using tableau. I was following this video where they have described how to connect the two systems and expose the data.

currently, my script itself is very simple as shown below:

/* csvFile is an RDD of lists, each list representing a line in the CSV file */
val csvLines = sc.textFile("wasb://[email protected]/*/*/*/mydata__000000.csv")
// Define a schema
case class MyData(Timestamp: String, TimezoneOffset: String, SystemGuid: String, TagName: String, NumericValue: Double, StringValue: String)
// Map the values in the .csv file to the schema
val myData = csvLines.map(s => s.split(",")).filter(s => s(0) != "Timestamp").map(
s => MyData(s(0),
s(1),
s(2),
s(3),
s(4).toDouble,
s(5)
)
).toDF()
// Register as a temporary table called "processdata"
myData.registerTempTable("test_table")
myData.saveAsTable("test_table")

unfortunately, I run into the following error

warning: there were 1 deprecation warning(s); re-run with -deprecation for details
org.apache.spark.sql.AnalysisException: Table `test_table` already exists.;
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:209)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:198)

I have also tried to use the following code to overwrite the table if it exists

import org.apache.spark.sql.SaveMode
myData.saveAsTable("test_table", SaveMode.Overwrite)

but still, it gives me the same error.

warning: there were 1 deprecation warning(s); re-run with -deprecation for details
java.lang.RuntimeException: Tables created with SQLContext must be TEMPORARY. Use a HiveContext instead.
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.execution.SparkStrategies$DDLStrategy$.apply(SparkStrategies.scala:416)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)

Can someone please help me fix this issue?

1 Answer

Ashok · Answer 1 · 2019-07-27T13:35:22+0000

.toDF()is actually used to create the sqlContext and not the hiveContext based DataFrame. Update the code in this way:

// Map the values in the .csv file to the schema
val myData = csvLines.map(s => s.split(",")).filter(s => s(0) != "Timestamp").map(
    s => MyData(s(0),
            s(1),
            s(2),
            s(3),
            s(4).toDouble,
            s(5)
    )
)
// Register as a temporary table called "myData"
val myDataFrame = hiveContext.createDataFrame(myData)
myDataFrame.registerTempTable("mydata_stored")
myDataFrame.write.mode(SaveMode.Overwrite).saveAsTable("mydata_stored")

also, make sure that the s(4) has proper double value otherwise add try/catch to handle it. i did something like this:

def parseDouble(s: String): Double = try { s.toDouble } catch { case _ => 0.00 }
parseDouble(s(4))

Share HDInsight SPARK SQL Table saveAsTable does not work

1 Answer

Related questions

Browse Categories

Browse By Domains

Popular Courses

Popular Tutorials

Popular Resources