Understanding the workflow of MapReduce with an Example
The micro-blogging site Twitter receives nearly 500 million tweets a day, which averages out to almost 6,000 tweets per second. The following example illustrates how MapReduce can process this Twitter data.
In this example, the Twitter data is the input, and MapReduce performs the following actions on it: tokenize, filter, count, and aggregate counters.
Tokenize: Tokenizes the tweets into maps of tokens and writes them as key-value pairs.
Filter: Removes the unwanted words from the maps of tokens.
Count: Generates a token counter per word.
Aggregate counters: Combines similar counter values into small, manageable units.
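The four steps above can be sketched in a few lines of plain Python. This is only a single-machine illustration of the data flow, not Hadoop code and not how Twitter actually processes tweets. The function names, the STOP_WORDS list, and the whitespace-based tokenizer are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical list of "unwanted" words; a real job would use its own list.
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of"}

def tokenize(tweet):
    """Tokenize: split a tweet into lowercase tokens, emitted as (token, 1) pairs."""
    return [(word, 1) for word in tweet.lower().split()]

def filter_tokens(pairs):
    """Filter: drop the key-value pairs whose token is an unwanted word."""
    return [(word, n) for word, n in pairs if word not in STOP_WORDS]

def count(pairs):
    """Count: build one counter per word for a single tweet."""
    counter = Counter()
    for word, n in pairs:
        counter[word] += n
    return counter

def aggregate(counters):
    """Aggregate counters: merge the per-tweet counters into one total."""
    total = Counter()
    for counter in counters:
        total.update(counter)
    return total

def word_counts(tweets):
    """Run tokenize -> filter -> count on each tweet, then aggregate."""
    return aggregate(count(filter_tokens(tokenize(t))) for t in tweets)
```

For instance, calling word_counts(["MapReduce is fun", "the MapReduce model"]) counts "mapreduce" twice and "fun" and "model" once each, while "is" and "the" are filtered out. In a real cluster, the per-tweet steps would run in parallel on many nodes and only the aggregation would bring the partial counters together.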