in Big Data Hadoop & Spark by (11.4k points)

I need to use the (rdd.)partitionBy(npartitions, custom_partitioner) method, which is not available on the DataFrame. All of the DataFrame methods return only DataFrame results. So how do I create an RDD from the DataFrame's data?

1 Answer

by (32.3k points)

To convert a DataFrame back to an RDD, simply use the .rdd attribute:

rdd = df.rdd

The catch is that this gives you an RDD of Row objects rather than an RDD of plain tuples or lists. If you want plain Python values instead, map each Row to a tuple or a list:


 

rdd = df.rdd.map(tuple)

or

rdd = df.rdd.map(list)
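Since the question is ultimately about calling partitionBy with a custom partitioner, keep in mind that RDD.partitionBy only works on an RDD of (key, value) pairs, so the rows have to be keyed first. PySpark computes partitionFunc(key) % numPartitions, so the partitioner only needs to return an int. The sketch below is a minimal illustration, not part of the original answer: the column name passed as key_col and the first-letter routing rule in first_letter_partitioner are hypothetical choices.

```python
from typing import Any


def first_letter_partitioner(key: Any) -> int:
    """Hypothetical custom partitioner: route each key by its first character.

    PySpark's RDD.partitionBy applies partitionFunc(key) % numPartitions,
    so any function that returns an int for a key can serve as a partitioner.
    """
    text = str(key)
    # ord() gives the same value in every Python worker process, whereas the
    # built-in hash() on strings is randomized unless PYTHONHASHSEED is fixed.
    return ord(text[0]) if text else 0


def partition_dataframe_by_key(df, key_col: str, num_partitions: int):
    """Convert a DataFrame into a (key, row_tuple) pair RDD and partition it.

    Assumes `df` is a pyspark.sql.DataFrame that has a column named `key_col`
    (a hypothetical name supplied by the caller).
    """
    # df.rdd yields Row objects; Row supports lookup by column name,
    # and tuple(row) turns the Row into a plain tuple.
    pairs = df.rdd.map(lambda row: (row[key_col], tuple(row)))
    # partitionBy requires (key, value) pairs and takes the custom function.
    return pairs.partitionBy(num_partitions, first_letter_partitioner)


# Example usage (assumes an active SparkSession bound to `spark`):
# df = spark.createDataFrame([("apple", 1), ("banana", 2)], ["name", "qty"])
# rdd = partition_dataframe_by_key(df, "name", 4)
# print(rdd.getNumPartitions(), rdd.glom().collect())
```

Using ord() on the first character keeps the partition assignment deterministic across executors; any other deterministic int-valued function of the key would work the same way.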
