in Big Data Hadoop & Spark by (11.4k points)

I would like to display the entire Apache Spark SQL DataFrame with the Scala API. I can use the show() method:

myDataFrame.show(Int.MaxValue)


Is there a better way to display an entire DataFrame than using Int.MaxValue?

1 Answer

by (32.3k points)

Generally, it is not advisable to display an entire DataFrame to stdout, because doing so pulls the entire DataFrame, with all of its values, to the driver (unless the DataFrame is already local, which you can check with df.isLocal).
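As a minimal sketch of that check (assuming df is an existing DataFrame), you could guard the full print behind isLocal:

```scala
// Only print everything when the data already lives on the driver;
// otherwise fall back to a bounded preview.
if (df.isLocal)
  df.show(Int.MaxValue, truncate = false)
else
  df.show(20, truncate = false) // safe default preview
```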

However, you can use the df.collect method, which returns Array[Row], and then iterate over each row and print it:

df.collect.foreach(println)

or

You can also use df.rdd.foreachPartition(f) to print the data partition by partition without flooding the driver JVM. Keep in mind that the function passed to foreachPartition runs on the executors, so the println output goes to the executors' stdout logs rather than the driver's console.
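A sketch of both options (assuming df is an existing DataFrame; toLocalIterator is available from Spark 1.6 onward):

```scala
// Option 1: print on the executors, one partition at a time.
// Output appears in each executor's stdout log, not on the driver.
df.rdd.foreachPartition { partition =>
  partition.foreach(println)
}

// Option 2: stream partitions to the driver one at a time,
// so only a single partition must fit in driver memory.
df.toLocalIterator.foreach(println)
```

Option 2 is usually the better compromise when you genuinely need every row on the driver but a full collect would exhaust its memory.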
