I'm having trouble finding a library that allows Parquet files to be written using Python. Bonus points if I can use Snappy or a similar compression mechanism in conjunction with it.

Thus far the only method I have found is using Spark with the pyspark.sql.DataFrame Parquet support.

I have some scripts that need to write Parquet files that are not Spark jobs. Is there any approach to writing Parquet files in Python that doesn't involve pyspark.sql?

1 Answer


There are currently two Python libraries capable of writing Parquet files:

- fastparquet
- pyarrow

Both of them are still under development and come with a number of disclaimers (e.g., no support for nested data), so you will have to check whether they support everything you need.
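With pyarrow, the usual route is to convert a pandas DataFrame to an Arrow Table and write it with pyarrow.parquet.write_table, which also covers the Snappy part of the question. A minimal sketch (the file name and sample data are illustrative):

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({'one': [1, 2, 3], 'two': ['a', 'b', 'c']})

# Convert the pandas DataFrame to an Arrow Table, then write it out
# with Snappy compression (Snappy is also pyarrow's default).
table = pa.Table.from_pandas(df)
pq.write_table(table, 'outfile.parquet', compression='snappy')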

fastparquet does have write support; here is a snippet that writes a DataFrame to a file:

from fastparquet import write

# df is a pandas DataFrame holding the data to persist
write('outfile.parq', df)
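fastparquet's write also takes a compression argument, so Snappy works here too. A minimal sketch, assuming the python-snappy package is installed (the file name is illustrative):

from fastparquet import write

# SNAPPY compression requires the python-snappy package.
write('outfile.snappy.parq', df, compression='SNAPPY')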
