
I'm attempting to print the contents of a collection to the Spark console.

I have an RDD of this type:

linesWithSessionId: org.apache.spark.rdd.RDD[String] = FilteredRDD[3]


And I use the command:

scala> linesWithSessionId.map(line => println(line))


But this is what gets printed:

res1: org.apache.spark.rdd.RDD[Unit] = MappedRDD[4] at map at <console>:19

How can I write the RDD to console or save it to disk so I can view its contents?

1 Answer


You are only applying a transformation (map). Transformations are lazy: they return a new RDD and do not run until an action is called. To view the contents of an RDD, you need to call an action on it.
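To illustrate the difference between a transformation and an action, here is a minimal sketch. It assumes Spark 2.x or later, run in spark-shell or as a script with Spark on the classpath, in local mode. The sample data and the names `lines`, `mapped`, and `n` are made up for illustration and stand in for linesWithSessionId.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: local mode; inside spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder().appName("lazy-map-demo").master("local[*]").getOrCreate()
val sc = spark.sparkContext

// Hypothetical sample data standing in for linesWithSessionId.
val lines = sc.parallelize(Seq("sessionId=1 GET /", "sessionId=2 GET /about"))

// Transformation: only builds a new RDD[Unit]; nothing is printed at this point.
val mapped = lines.map(line => println(line))

// Action: triggers the job, so the println calls inside map run now.
val n = mapped.count()
```

In local mode the lines show up in the console once count() runs. On a cluster, println inside map executes on the executors, so the output goes to their stdout rather than to the driver console.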

One way is to use collect():

myRDD.collect().foreach(println)

This works, but collect() brings the entire RDD back to the driver, so it is not a good option when the RDD has billions of lines: the driver can run out of memory.

You can use take() instead. It fetches only the first n elements, so it returns much faster:

myRDD.take(n).foreach(println)
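The question also asks about saving to disk. A rough sketch of both options follows, assuming Spark 2.x or later in local mode. The generated data, the temporary output directory, and the names `firstFive`, `outDir`, and `reloaded` are hypothetical, chosen only so that the example is self-contained.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Assumption: local mode; inside spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder().appName("print-rdd-demo").master("local[*]").getOrCreate()
val sc = spark.sparkContext

// Hypothetical data standing in for myRDD.
val myRDD = sc.parallelize(1 to 1000).map(i => s"line $i")

// take(5) brings only the first five elements to the driver, then prints them there.
val firstFive = myRDD.take(5)
firstFive.foreach(println)

// saveAsTextFile writes one part file per partition and fails if the
// target directory already exists, so point it at a fresh path.
val outDir = Files.createTempDirectory("rdd-demo").resolve("lines").toString
myRDD.saveAsTextFile(outDir)

// Read the files back to check that every line was written.
val reloaded = sc.textFile(outDir).count()
```

On a real cluster the path passed to saveAsTextFile would typically be an HDFS or other shared-storage location rather than a local temporary directory.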
