0 votes
in Big Data Hadoop & Spark by (11.4k points)

I can't figure this out, but I guess it's simple. I have a Spark DataFrame df with columns "A", "B" and "C". Now let's say I have an Array containing the names of the columns of this df:

column_names = Array("A","B","C")

I'd like to do a df.select() in such a way that I can specify which columns not to select. Example: let's say I do not want to select column "B". I tried df.select(column_names.filter(_ != "B"))

but this does not work, as it gives

org.apache.spark.sql.DataFrame cannot be applied to (Array[String])

1 Answer

0 votes
by (32.3k points)

Since Spark 1.4 you can use the drop method, which returns a new DataFrame without the named column:

For Scala:

case class Point(x: Int, y: Int)

val df = sqlContext.createDataFrame(Point(0, 0) :: Point(1, 2) :: Nil)

df.drop("y")

For PySpark:

df = sc.parallelize([(0, 0), (1, 2)]).toDF(["x", "y"])

df.drop("y")

## DataFrame[x: bigint]
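
If you would rather stay with select, as the question attempted, here is a minimal PySpark sketch of that route. The helper names columns_except and select_except are made up for illustration and are not part of Spark. The sketch only assumes that df has a .columns list and a .select method accepting column names as separate arguments, which a PySpark DataFrame does.

```python
def columns_except(all_columns, excluded):
    """Return the names in all_columns that are not in excluded, keeping order."""
    excluded = set(excluded)
    return [name for name in all_columns if name not in excluded]


def select_except(df, excluded):
    """Select every column of df except the excluded ones.

    Assumes df exposes .columns and .select(*names), as a PySpark
    DataFrame does. Both helper names are hypothetical.
    """
    kept = columns_except(df.columns, excluded)
    # Unpack the list so each name is passed as its own argument.
    return df.select(*kept)
```

With the df from the question, select_except(df, ["B"]) should keep only columns A and C. In Scala the original attempt fails because select takes varargs rather than an Array[String]. Expanding the array should work there, e.g. df.select(column_names.filter(_ != "B").map(col): _*) after importing org.apache.spark.sql.functions.col. For simply leaving out one column, drop is still the shorter option.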
