
I need to use the rdd.partitionBy(npartitions, custom_partitioner) method, which is not available on the DataFrame. All of the DataFrame methods return only DataFrame results. So how do I create an RDD from the DataFrame data?

1 Answer


To convert a DataFrame back to an RDD, simply use the .rdd attribute:

rdd = df.rdd

The catch is that the elements of this RDD are Row objects rather than plain Python tuples or lists. To get an RDD of plain tuples or lists, map over the rows:

rdd = df.rdd.map(tuple)

or

rdd = df.rdd.map(list)
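
Since the goal in the question is partitionBy, note that it works on an RDD of (key, value) pairs, so the rows have to be keyed first. Below is a minimal sketch of that step, not a definitive implementation. The names custom_partitioner, to_pair and partition_dataframe are hypothetical, and it assumes the first column is the key you want to partition on:

```python
def custom_partitioner(key):
    """Hypothetical partitioner: map a key to an integer.

    PySpark takes this value modulo the number of partitions.
    """
    # Sum of character codes: the same in every Python process,
    # whereas the built-in hash() of a str can differ between processes.
    return sum(ord(ch) for ch in str(key))


def to_pair(row, key_index=0):
    """Turn a Row (or any sequence) into a (key, values) pair.

    Assumption: the column at key_index is the partitioning key.
    """
    values = tuple(row)
    return values[key_index], values


def partition_dataframe(df, npartitions):
    """Sketch: DataFrame -> pair RDD -> custom partitioning."""
    pair_rdd = df.rdd.map(to_pair)  # RDD of (key, tuple) pairs
    return pair_rdd.partitionBy(npartitions, custom_partitioner)
```

You would then call something like rdd = partition_dataframe(df, npartitions) on your own DataFrame. If the key lives in another column, change key_index in to_pair accordingly.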

...