Explore Courses Blog Tutorials Interview Questions
0 votes
in Big Data Hadoop & Spark by (11.4k points)
I'm having trouble finding a library that allows Parquet files to be written using Python. Bonus points if I can use Snappy or a similar compression mechanism in conjunction with it.

Thus far the only method I have found is using Spark with the pyspark.sql.DataFrame Parquet support.

I have some scripts that need to write Parquet files that are not Spark jobs. Is there any approach to writing Parquet files in Python that doesn't involve pyspark.sql?

1 Answer

0 votes
by (32.3k points)

There are currently 2 libraries capable of writing Parquet files:

Both of them are still under development and they come with a number of disclaimers (no support for nested data e.g.), so you will have to check whether they support everything you need.

fastparquet does have write support, here is a snippet to write data to a file:

from fastparquet import write

write('outfile.parq', df)

Related questions

Browse Categories