How to write the resulting RDD to a csv file in Spark python

asked Jul 20, 2019 in Big Data Hadoop & Spark by Aarav (11.4k points)

I have a resulting RDD labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions). This has output in this format:

[(0.0, 0.08482142857142858), (0.0, 0.11442786069651742),.....]

I want to create a CSV file with one column for labels (the first element of each tuple in the output above) and one for predictions (the second element). But I don't know how to write to a CSV file in Spark using Python.

How can I create a CSV file with the above output?

apache-spark

1 Answer

answered Jul 23, 2019 by Amit Rawat (32.3k points)

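Map each element of the RDD (labelsAndPredictions) to a string (one line of the CSV), then call rdd.saveAsTextFile() on the result. Note that saveAsTextFile() writes a directory of part files at the given path rather than a single file.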

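def toCSVLine(data):
    # Join the fields of one (label, prediction) tuple with commas
    return ','.join(str(d) for d in data)

# One CSV-formatted string per RDD element
lines = labelsAndPredictions.map(toCSVLine)
lines.saveAsTextFile('hdfs://my-node:9000/tmp/labels-and-predictions.csv')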
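The plain ','.join in toCSVLine works for numeric tuples like the ones in the question, but it would produce malformed rows if a field ever contained a comma or a quote. A minimal sketch of a more defensive formatter, using Python's standard csv module, might look like the following. The name to_csv_line and the sample rows are illustrative assumptions, and the Spark call is shown only as a comment because it needs a running cluster.

```python
import csv
import io


def to_csv_line(row):
    """Format one tuple as a single CSV line, quoting fields only when needed."""
    buf = io.StringIO()
    # lineterminator="" because saveAsTextFile adds its own newline per element
    csv.writer(buf, lineterminator="").writerow(row)
    return buf.getvalue()


# Plain-Python check with two rows in the format shown in the question
# (sample values copied from the question's output).
sample = [(0.0, 0.08482142857142858), (0.0, 0.11442786069651742)]
lines = [to_csv_line(row) for row in sample]
for line in lines:
    print(line)

# Assumed Spark usage (not executed here), mirroring the answer above:
# labelsAndPredictions.map(to_csv_line).saveAsTextFile(output_path)
```

Because the function has no Spark dependency, it can be checked locally before being passed to map(). If a header row or a single output file is required, that has to be handled separately, since each partition is written to its own part file.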
