
in Big Data Hadoop & Spark by (11.4k points)

I have an RDD, labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions), whose elements look like this:

[(0.0, 0.08482142857142858), (0.0, 0.11442786069651742),.....]


I want to create a CSV file with one column for the labels (the first element of each tuple above) and one for the predictions (the second element). But I don't know how to write a CSV file from Spark using Python.

How can I create a CSV file with the above output?

1 Answer

by (32.3k points)

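Map each element of the RDD (labelsAndPredictions) to a string that forms one line of the CSV, then write the result with rdd.saveAsTextFile(). Keep in mind that saveAsTextFile() creates a directory of part files (part-00000, part-00001, ...) rather than a single file; calling coalesce(1) on the RDD before saving produces just one part file.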

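def toCSVLine(data):
    # Join the tuple's fields with commas, e.g. (0.0, 0.5) -> "0.0,0.5"
    return ','.join(str(d) for d in data)

# One CSV-formatted string per (label, prediction) pair
lines = labelsAndPredictions.map(toCSVLine)

# The path becomes a directory holding part-* files, not a single file
lines.saveAsTextFile('hdfs://my-node:9000/tmp/labels-and-predictions.csv')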
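If the values could ever contain commas or quotes, a plain ','.join will not escape them. Below is a minimal sketch, not part of the original answer, that formats each tuple with Python's csv module instead, plus a hypothetical helper, merge_part_files, that stitches the part-* files into one CSV with a header row. The helper assumes the output directory is on the local filesystem; for HDFS, `hadoop fs -getmerge` performs the concatenation step. The function names and the default header ("label", "prediction") are illustrative assumptions.

```python
import csv
import glob
import io
import os


def to_csv_line(record):
    """Format one (label, prediction) tuple as a single CSV line.

    The csv module quotes fields that contain commas or quotes. The line
    terminator is stripped because saveAsTextFile() appends its own newline.
    """
    buf = io.StringIO()
    csv.writer(buf).writerow(record)
    return buf.getvalue().rstrip("\r\n")


def merge_part_files(output_dir, dest_path, header=("label", "prediction")):
    """Hypothetical helper: concatenate the part-* files written by
    saveAsTextFile() into one CSV file, optionally preceded by a header.

    Assumes output_dir is a local directory. Spark's _SUCCESS marker and
    hidden .crc files do not match "part-*", so they are skipped.
    Returns the number of part files merged.
    """
    part_files = sorted(glob.glob(os.path.join(output_dir, "part-*")))
    with open(dest_path, "w", newline="") as out:
        if header:
            out.write(",".join(header) + "\n")
        for part in part_files:
            with open(part) as f:
                for line in f:
                    # Ensure every record ends with exactly one newline
                    out.write(line if line.endswith("\n") else line + "\n")
    return len(part_files)
```

In a job this would be used as labelsAndPredictions.map(to_csv_line).saveAsTextFile(some_local_dir) followed by merge_part_files(some_local_dir, "labels-and-predictions.csv"). Since to_csv_line is a module-level function, Spark can pickle it and ship it to the executors.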
