in Big Data Hadoop & Spark by (11.5k points)

I would like to convert a string column of a DataFrame to a list. The only thing I could find in the DataFrame API was the conversion to an RDD, so I converted the DataFrame to an RDD first and then applied the toArray function to it. The length is correct and the SQL query works just fine. However, every element in the result has square brackets around it, like this: [A00001]. Is there an appropriate way to convert a column to a list, or a way to remove the square brackets?

1 Answer

by (32.5k points)

This should return an array containing the values of that single column:

dataFrame.select("YOUR_COLUMN_NAME").rdd.map(r => r(0)).collect()

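Without the mapping, you get Row objects rather than the values themselves. After the select, each Row wraps only the selected column, and a Row prints with square brackets around its contents, which is why you see [A00001].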

The result of this mapping is an Array[Any]. To get a specific element type, cast inside the mapping: r => r(0).asInstanceOf[YOUR_TYPE].
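As a rough end-to-end sketch of the above, assuming Spark 2.x or later (for SparkSession) running in local mode: the column name "id", the sample values, and the helper stringColumn are made up for illustration and are not part of Spark. It uses Row.getString(0) as a typed alternative to the asInstanceOf cast for a string column.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ColumnToList {
  // Hypothetical helper (not a Spark API): collects one string column
  // to the driver as a plain List[String].
  def stringColumn(df: DataFrame, column: String): List[String] =
    df.select(column).rdd.map(r => r.getString(0)).collect().toList

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("column-to-list")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data; "id" is an assumed column name.
    val df = Seq("A00001", "A00002").toDF("id")

    // Collecting without mapping yields Array[Row]; each Row prints as [A00001].
    val rows = df.select("id").collect()
    println(rows.mkString(", "))

    // Mapping each Row to its first field yields the bare strings.
    val ids = stringColumn(df, "id")
    println(ids.mkString(", "))

    spark.stop()
  }
}
```

Note that collect() brings every value to the driver, so this is only suitable when the column fits in driver memory. With spark.implicits._ in scope, df.select("id").as[String].collect() should give the same values through the Dataset API.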
