
Although I use Hadoop frequently on my Ubuntu machine, I have never thought about the _SUCCESS and part-r-00000 files. The output always resides in the part-r-00000 file, but what is the use of the _SUCCESS file? Why does the output file have the name part-r-00000? Is there any significance or naming convention behind it, or is the name just arbitrary?

1 Answer


In Hadoop, whenever a job completes successfully, the MapReduce runtime creates an empty _SUCCESS marker file in the output directory. This is useful for applications that need to check whether a result set is complete just by inspecting HDFS.
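A check like that can be sketched in plain Java. This is only a minimal illustration: it assumes the output directory is reachable as a local path (for example, a job run in standalone mode), and the class and method names are made up for this example. For a directory on HDFS you would do the equivalent lookup through Hadoop's FileSystem API instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Hadoop.
class SuccessMarkerCheck {
    // Name of the marker file written when a job finishes successfully.
    static final String SUCCESS_MARKER = "_SUCCESS";

    // Returns true if the output directory contains the _SUCCESS marker.
    // Assumption: outputDir is a local filesystem path, not an HDFS URI.
    static boolean isJobOutputComplete(Path outputDir) {
        return Files.isRegularFile(outputDir.resolve(SUCCESS_MARKER));
    }

    public static void main(String[] args) throws IOException {
        // Simulate an output directory before and after the marker exists.
        Path outputDir = Files.createTempDirectory("job-output");
        System.out.println("complete before marker: " + isJobOutputComplete(outputDir));

        Files.createFile(outputDir.resolve(SUCCESS_MARKER));
        System.out.println("complete after marker: " + isJobOutputComplete(outputDir));
    }
}
```

A downstream step could call isJobOutputComplete before reading any part files, so it never picks up the output of a job that is still running or has failed.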

As for part-x-yyyyy, it is the default naming pattern for the output files.

In these file names:

  • x is 'm' if the file was written by a mapper (a map-only job) or 'r' if it was written by a reducer.

  • yyyyy is the mapper or reducer task number, zero-padded to five digits and starting at 00000.

So, if a job has 20 reducers, it will generate files that are named from part-r-00000 to part-r-00019, one for each reducer task.

If you want to change the default name of your output files, set the base name in your Driver class. This replaces only the "part" prefix; the -x-yyyyy suffix is still appended:

job.getConfiguration().set("mapreduce.output.basename", "intellipaat");
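To see which file names to expect, the naming pattern can be reproduced in plain Java without Hadoop on the classpath. This is a rough sketch of the pattern described above, not Hadoop's own code; the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper that mimics the <basename>-<x>-<yyyyy> pattern.
class PartFileNames {
    // baseName:   "part" by default, or the value set as the output base name
    // taskType:   'm' for a mapper-written file, 'r' for a reducer-written file
    // taskNumber: task index, zero-padded to five digits
    static String partFileName(String baseName, char taskType, int taskNumber) {
        return String.format(Locale.ROOT, "%s-%c-%05d", baseName, taskType, taskNumber);
    }

    public static void main(String[] args) {
        // Default base name with 20 reducers: part-r-00000 .. part-r-00019
        for (int task = 0; task < 20; task++) {
            System.out.println(partFileName("part", 'r', task));
        }
        // Custom base name, as set in the driver line above
        System.out.println(partFileName("intellipaat", 'r', 0));
    }
}
```

With the base name set to "intellipaat", the first reducer's file would be expected to appear as intellipaat-r-00000 instead of part-r-00000.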
