0 votes
in Big Data Hadoop & Spark by (11.4k points)

Say I have a Spark DataFrame that I want to save to disk as a CSV file. In Spark 2.0.0+, one can obtain a DataFrameWriter from a DataFrame (Dataset[Row]) and use its .csv method to write the file.

The function is defined as

def csv(path: String): Unit
    path: the directory to write the output to, not the file name.


Spark stores the CSV output at the specified location as files named part-*.csv.

Is there a way to save the CSV with a specified filename instead of part-*.csv? Or is it possible to specify a prefix other than part-r?

Code :

df.coalesce(1).write.csv("sample_path")

Current Output :

sample_path
|
+-- part-r-00000.csv


Desired Output :

sample_path
|
+-- my_file.csv

by (100 points)
Can you provide a solution to the above question for writing to ADLS from Databricks in PySpark?

1 Answer

0 votes
by (32.3k points)

Spark writes output using Hadoop's file output format, which partitions the data - that's why you get part- files. To change the filename, rename the part file after writing, with something like this in your code:

import org.apache.hadoop.fs._

// Write to a temporary directory, then move the part file to the desired name
df.coalesce(1).write.csv("mydata.csv-temp")

val fs = FileSystem.get(sc.hadoopConfiguration)

// Locate the single part file Spark produced
val file = fs.globStatus(new Path("mydata.csv-temp/part*"))(0).getPath.getName

fs.rename(new Path("mydata.csv-temp/" + file), new Path("mydata.csv"))

// Remove the now-empty temporary output directory
fs.delete(new Path("mydata.csv-temp"), true)

or just rename the part file in place:

import org.apache.hadoop.fs._

val fs = FileSystem.get(sc.hadoopConfiguration)

fs.rename(new Path("csvDirectory/data.csv/part-0000"), new Path("csvDirectory/newData.csv"))
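For the comment above about doing this from Databricks in PySpark against ADLS: a hedged sketch of the same write-then-rename pattern. The abfss:// paths are hypothetical placeholders for your own storage account and container, and dbutils is only available inside a Databricks notebook, so those lines are shown as comments; the runnable part below demonstrates the identical rename pattern on a local filesystem with the standard library:

```python
# On Databricks, the same steps would use dbutils.fs (paths are placeholders):
#
#   tmp = "abfss://container@account.dfs.core.windows.net/tmp_csv"
#   df.coalesce(1).write.mode("overwrite").csv(tmp)
#   part = [f.path for f in dbutils.fs.ls(tmp) if f.name.startswith("part-")][0]
#   dbutils.fs.mv(part, "abfss://container@account.dfs.core.windows.net/my_file.csv")
#   dbutils.fs.rm(tmp, recurse=True)
#
# The same rename pattern on a local filesystem, in plain Python:
import glob
import os
import shutil
import tempfile

tmp_dir = tempfile.mkdtemp()
out_dir = os.path.join(tmp_dir, "sample_path")   # what .csv(path) would create
os.makedirs(out_dir)
with open(os.path.join(out_dir, "part-00000.csv"), "w") as f:
    f.write("a,b\n1,2\n")                        # stand-in for Spark's part file

part = glob.glob(os.path.join(out_dir, "part-*.csv"))[0]
target = os.path.join(tmp_dir, "my_file.csv")
os.rename(part, target)                          # give the part file the desired name
shutil.rmtree(out_dir)                           # drop the now-empty output directory

print(os.path.basename(target))
```

Note that coalesce(1) pulls all data onto a single executor, so this pattern is only appropriate for output small enough to fit on one node.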

