in Big Data Hadoop & Spark by (11.4k points)

I would like to display the entire Apache Spark SQL DataFrame with the Scala API. I can use the show() method:

myDataFrame.show(Int.MaxValue)


Is there a better way to display an entire DataFrame than using Int.MaxValue?

1 Answer

by (32.3k points)

Generally, it is not advisable to print an entire DataFrame to stdout, because doing so pulls the whole DataFrame, with all of its values, to the driver (unless the DataFrame is already local, which you can check with df.isLocal).
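A minimal sketch of that check, assuming `df` is an existing DataFrame bound to an active SparkSession (the 100-row cap is an arbitrary choice for illustration):

```scala
if (df.isLocal) {
  // All data already lives on the driver; printing it is cheap.
  df.collect().foreach(println)
} else {
  // Pulling the whole DataFrame to the driver may exhaust driver memory;
  // cap the output instead of printing everything.
  df.show(100, truncate = false)
}
```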

However, you can use the df.collect method, which returns Array[Row], and then iterate over each row and print it:

df.collect.foreach(println)
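If the DataFrame is too large to collect in one piece, a gentler alternative (a sketch, assuming Scala 2.13 for the converters import) is toLocalIterator(), which streams one partition at a time to the driver, so peak driver memory is bounded by the largest partition rather than the whole DataFrame:

```scala
import scala.jdk.CollectionConverters._  // Scala 2.12 would use scala.collection.JavaConverters

// toLocalIterator() returns a java.util.Iterator[Row]; convert and print row by row.
df.toLocalIterator().asScala.foreach(println)
```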

or

You can also use df.rdd.foreachPartition(f) to print partition by partition without flooding the driver JVM. Note that the function runs on the executors, so the output goes to the executors' stdout/logs, not to the driver console.
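A sketch of that pattern (again assuming `df` is an existing DataFrame; remember the println output lands in the executor logs):

```scala
// Each executor prints the rows of the partitions it holds.
df.rdd.foreachPartition { rows =>
  rows.foreach(println)
}
```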
