Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Python by (19k points)

I read about sequence file format in few blogs. Since, I am still new to Hadoop I am not actually able to understand what is the application or purpose of sequence files. So, it would be really helpful if anyone can explain me what actually is a sequence file and where it is used in Hadoop?

1 Answer

0 votes
by (33.1k points)

Sequence files are binary files containing serialized key/value pairs. You can compress a sequence file at the record (key-value pair) or block levels. This is one of the advantages of using a sequence file. Also, sequence files are binary files, they provide faster read/write than that of text file format.

Sequence files allow you to solve this problem of small files. They are the files containing key-value pairs. So, you can use it to hold multiple key-value pairs where the key can be unique file metadata like  filename+timestamp and value is the content of the ingested file. Now, this way you are able to club too many small files as a single file and then you can use this for processing as input for MapReduce. This is the reason why sequence files often are used in custom-written map-reduce programs.

Hope this answer helps you!

Browse Categories

...