I'm trying to learn about Spark SQL. I've been following the example described here: http://spark.apache.org/docs/1.0.0/sql-programming-guide.html

Everything works fine in the Spark shell, but when I try to build a batch version with sbt, I get the following error message:

object sql is not a member of package org.apache.spark

Unfortunately, I'm rather new to sbt, so I don't know how to correct this problem. I suspect that I need to include additional dependencies, but I can't figure out how.

Here is the code I'm trying to compile:

/* TestApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

case class Record(k: Int, v: String)

object TestApp {
 def main(args: Array[String]) {
   val conf = new SparkConf().setAppName("Simple Application")
   val sc = new SparkContext(conf)
   val sqlContext = new org.apache.spark.sql.SQLContext(sc)
   import sqlContext._
   val data = sc.parallelize(1 to 100000)
   val records = data.map(i => new Record(i, "value = "+i))
   val table = createSchemaRDD(records, Record)
   println(">>> " + table.count)
 }
}

1 Answer

The org.apache.spark.sql package is not part of spark-core; it ships in the separate spark-sql artifact. To fix the error, add the following line to your sbt file:

libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.0.0"

So, the sbt file should look like this:

name := "Test Project"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.0.0"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
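
If it helps, here is a sketch of a more compact way to declare both Spark modules in one setting. The Seq form is only a stylistic alternative and is not part of the original answer; if you use it, it replaces the separate libraryDependencies += lines for Spark. The comment also spells out what the %% operator does, assuming scalaVersion stays at 2.10.4.

```scala
// build.sbt fragment (sketch): declare both Spark modules in a single setting.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.0.0",
  "org.apache.spark" %% "spark-sql"  % "1.0.0"
)

// %% appends the Scala binary version to the artifact name, so with
// scalaVersion := "2.10.4" the spark-sql entry above is equivalent to:
// "org.apache.spark" % "spark-sql_2.10" % "1.0.0"
```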

There is also a second problem in your program: the call to createSchemaRDD has too many arguments. It takes only the RDD, and the schema is inferred from the case class, so that line should read:

val table = createSchemaRDD(records)
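
Putting the fix in place, the whole program might look like the sketch below. It assumes Spark 1.0.0 with both spark-core and spark-sql on the classpath; the explicit SQLContext import and the sc.stop() call at the end are additions for tidiness and are not in your original code.

```scala
/* TestApp.scala (sketch, assuming Spark 1.0.0 and Scala 2.10) */
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Defined at the top level, as in the original program.
case class Record(k: Int, v: String)

object TestApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    // Brings createSchemaRDD into scope.
    import sqlContext._

    val data = sc.parallelize(1 to 100000)
    val records = data.map(i => Record(i, "value = " + i))
    // createSchemaRDD takes only the RDD; the schema comes from Record.
    val table = createSchemaRDD(records)
    println(">>> " + table.count())

    // Addition (not in the original): shut the context down cleanly.
    sc.stop()
  }
}
```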

...