Overwrite specific partitions in spark dataframe write method

Question

1 Answer

Amit Rawat · Answer 1 · 2019-07-10T10:12:53+0000

Finally! This is now a feature in Spark 2.3.0: https://issues.apache.org/jira/browse/SPARK-20236

To use it, set spark.sql.sources.partitionOverwriteMode to dynamic, make sure the dataset is partitioned, and use the overwrite write mode. Example:

spark.conf.set("spark.sql.sources.partitionOverwriteMode","dynamic")
data.write.mode("overwrite").insertInto("partitioned_table")

I recommend repartitioning on your partition column before writing. Otherwise each write task can emit its own file into every partition folder, and you can end up with something like 400 files per folder.
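As a rough sketch of how the setting, the repartition, and the write fit together in PySpark: the helper name overwrite_partitions and the column name dt below are made up for illustration, and the function only assumes it is handed a SparkSession and a DataFrame. It passes overwrite=True to insertInto explicitly because, as far as I know, older PySpark releases let that argument override whatever was set with mode().

```python
def overwrite_partitions(spark, df, table, partition_col):
    """Overwrite only the partitions of `table` that receive rows from `df`.

    `spark` is expected to be a SparkSession and `df` a DataFrame; nothing is
    imported here, so the function itself has no dependencies.
    """
    # Replace only the partitions present in df instead of the whole table.
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

    # Hash-partition rows by the partition column so that all rows for a
    # given partition value are written by the same task (fewer files per folder).
    repartitioned = df.repartition(partition_col)

    # insertInto matches columns by position, not by name, so df's column
    # order must match the table's (partition columns last).
    repartitioned.write.insertInto(table, overwrite=True)


# Usage (assumes an existing SparkSession `spark`, a DataFrame `data`,
# and a hypothetical partition column named "dt"):
# overwrite_partitions(spark, data, "partitioned_table", "dt")
```

Keeping the configuration change inside the helper makes it harder to forget, at the cost of changing the session setting for any later writes as well.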

Before Spark 2.3.0, the best solution was to run SQL statements to delete those partitions and then write the data with mode append.
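For that older approach, the delete step could look roughly like the following. This is only a sketch: drop_partition_statements is a hypothetical helper, dt is an assumed partition column, and the partition values are assumed to be simple strings that need no quoting or escaping.

```python
def drop_partition_statements(table, partition_col, values):
    """Build one ALTER TABLE ... DROP PARTITION statement per partition value.

    Assumes a single partition column and plain string values with no quotes.
    """
    return [
        "ALTER TABLE {} DROP IF EXISTS PARTITION ({}='{}')".format(
            table, partition_col, value
        )
        for value in values
    ]


# Usage (assumes an existing SparkSession `spark` and a DataFrame `data`
# that contains only rows for the partitions being replaced):
# for stmt in drop_partition_statements("partitioned_table", "dt", ["2019-07-10"]):
#     spark.sql(stmt)
# data.write.mode("append").insertInto("partitioned_table")
```

Note that the drop and the append are two separate steps, so readers of the table can briefly see the partitions missing in between.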
