We have a large dataset to analyze with multiple reduce functions.

All reduce algorithms work on the same dataset generated by the same map function. Reading the large dataset repeatedly is too costly; it would be better to read it only once and pass the mapped data to multiple reduce functions.

Can I do this with Hadoop? I've searched the examples and the web, but I could not find any solutions.

1 Answer


If you expect every reducer to work on exactly the same mapped data, then at least the map output keys must differ so the records can be routed to different reducers.

In the mapper, emit each record once per reducer, using a composite key of the form ($i, $key), where $i selects the i-th reducer and $key is your original key. Then add a "Partitioner" to make sure these n copies are distributed among reducers based on $i, and a "GroupingComparator" to group records by the original $key.
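As a rough, self-contained sketch of those two rules (plain Java, no Hadoop dependencies; the class and method names are made up for illustration, and a real job would put this logic in a Partitioner and a grouping WritableComparator):

```java
// Sketch of the partition/group rules for a composite key of the form "i:key".
// Hypothetical helper names, for illustration only.
public class CompositeKeyRules {

    // Partition on the reducer index $i (the part before the colon),
    // so copies tagged for reducer i all land on the same reducer.
    static int partitionFor(String compositeKey, int numReducers) {
        int i = Integer.parseInt(compositeKey.split(":", 2)[0]);
        return i % numReducers;
    }

    // Group on the original $key (the part after the colon), so each
    // reducer sees all values for a given original key in one reduce call.
    static boolean sameGroup(String a, String b) {
        return a.split(":", 2)[1].equals(b.split(":", 2)[1]);
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("1:user42", 2)); // prints 1
        System.out.println(sameGroup("0:user42", "1:user42")); // prints true
    }
}
```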

Put simply, define your map output key as a composite key that combines the metric type with the actual key for that metric. Records are then grouped by metric type and key, and the reducer can invoke a different reduction method depending on the metric type.

Let's say you need two kinds of reducers, 'R1' and 'R2'. Add their ids as a prefix to your output keys in the mapper, so a key 'K' becomes 'R1:K' or 'R2:K'.

Then, in the reducer, pass values to implementations of R1 or R2 based on the prefix.
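To make the prefix-dispatch idea concrete, here is a minimal in-memory simulation (not the real Hadoop Mapper/Reducer API; the sums and maxes are just example metrics): the "map" phase reads the input once and emits each record under both prefixed keys, and the "reduce" phase dispatches on the prefix.

```java
import java.util.*;

// In-memory sketch of the prefix-dispatch pattern. The input is read once;
// each record is emitted under "R1:" and "R2:" prefixed keys, and the
// reduce step picks a reduction (sum vs. max) based on the prefix.
public class PrefixDispatchSketch {

    // Stand-in for the shuffle phase: group emitted values by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapped) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (var e : mapped)
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        return grouped;
    }

    public static void main(String[] args) {
        // "Map" phase: one pass over the input, two emissions per record.
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (var rec : List.of(Map.entry("K", 3), Map.entry("K", 5))) {
            mapped.add(Map.entry("R1:" + rec.getKey(), rec.getValue())); // for reducer R1
            mapped.add(Map.entry("R2:" + rec.getKey(), rec.getValue())); // for reducer R2
        }

        // "Reduce" phase: dispatch on the prefix.
        for (var e : shuffle(mapped).entrySet()) {
            String key = e.getKey();
            int result = key.startsWith("R1:")
                ? e.getValue().stream().mapToInt(Integer::intValue).sum()            // R1: sum
                : e.getValue().stream().mapToInt(Integer::intValue).max().orElse(0); // R2: max
            System.out.println(key + " -> " + result); // prints "R1:K -> 8" then "R2:K -> 5"
        }
    }
}
```

In a real job, the dispatch in the reduce phase would instead call two reducer implementations after stripping the prefix from the key.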

By default each reducer generates a separate output file (part-r-00000, part-r-00001, and so on), and this output is stored in HDFS. To merge all the reducer outputs into a single file, you can either write your own merge code or use the hadoop fs -getmerge command.
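For reference, the merge step looks like this (the HDFS output directory and local filename below are example placeholders):

```shell
# Concatenate all part-r-* files from the job's HDFS output directory
# into a single local file. /user/me/output and merged.txt are examples.
hadoop fs -getmerge /user/me/output merged.txt
```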

...