0 votes
2 views
in Big Data Hadoop & Spark by (11.4k points)

For a Big Data project, I'm planning to use Spark, which has some nice features like in-memory computation for repeated workloads. It can run on local files or on top of HDFS.

However, in the official documentation, I can't find any hint as to how to process gzipped files. In practice, it can be quite efficient to process .gz files instead of unzipped files.

Is there a way to implement reading of gzipped files manually, or is unzipping done automatically when reading a .gz file?

1 Answer

0 votes
by (32.3k points)

Spark can create distributed datasets from any file stored in the Hadoop Distributed File System (HDFS) or in other storage systems supported by Hadoop (including your local file system, Amazon S3, Hypertable, HBase, etc.). Spark supports text files, SequenceFiles, and any other Hadoop InputFormat.
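
As a minimal sketch of what that looks like with the Scala API (the paths and app name below are placeholders, not anything from your setup):

import org.apache.spark.{SparkConf, SparkContext}

// Placeholder configuration; in spark-shell, sc is already provided for you.
val conf = new SparkConf().setAppName("InputFormatsExample").setMaster("local[*]")
val sc = new SparkContext(conf)

// Plain text from the local file system.
val localLines = sc.textFile("file:///data/sample.txt")

// The same call works against HDFS (or S3) just by changing the URI scheme.
val hdfsLines = sc.textFile("hdfs:///data/sample.txt")

// SequenceFiles are read with key/value types matching how they were written.
val pairs = sc.sequenceFile[String, Int]("hdfs:///data/pairs.seq")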

In Spark, support for gzip input files should work the same as it does in Hadoop. For example, sc.textFile("sample.gz") should automatically decompress and read gzip-compressed files (textFile() is actually implemented using Hadoop's TextInputFormat, which supports gzip-compressed files).
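
For example, a hypothetical gzipped log file can be read and processed like any other text file (the path and the filter string here are made up for illustration):

// Decompression happens transparently inside Hadoop's TextInputFormat,
// so the resulting RDD looks the same as one read from an uncompressed file.
val gzLines = sc.textFile("hdfs:///data/sample.gz")
val errorCount = gzLines.filter(_.contains("ERROR")).count()
println(s"lines containing ERROR: $errorCount")

One caveat: gzip is not a splittable codec, so each .gz file is read as a single partition. If you need more parallelism downstream, call repartition() on the RDD after loading it.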
