Examples of MapReduce

MapReduce is a powerful tool used in big data systems by companies like Google, Facebook, and Netflix. It helps break down and process huge amounts of data quickly across many computers.

Whether you’re a big data enthusiast just starting out, a data engineer, or a Hadoop developer, this guide walks through real-world MapReduce examples such as word count, log analysis, and recommendation systems, with step-by-step explanations and sample code to make each one easy to follow.

What is MapReduce?

Let’s begin with the basics. MapReduce is a programming paradigm designed for processing large data sets across a cluster of computers. Developed by Google and popularized by Apache Hadoop, MapReduce allows for scalable, fault-tolerant data processing.

It operates in two main phases:

  • Map: Breaks down a task into smaller subtasks and processes them independently.
  • Reduce: Combines the results of the Map phase into a final outcome.

Picture counting the words in hundreds of books. You assign different books to different friends (Map), and each friend sends back the word counts for their books. Finally, one person combines all of those counts into a single tally (Reduce).
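The two phases can be sketched in plain Java, with no Hadoop required. The class and method names below are purely illustrative: map turns a line of text into (word, 1) pairs, and reduce groups the pairs by word and sums each group.

```java
import java.util.*;

// A minimal in-memory sketch of the two phases (illustrative names,
// not part of any real framework).
class WordCountSketch {

    // Map phase: emit one (word, 1) pair per word in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.split(" ")) {
            pairs.add(Map.entry(word, 1));
        }
        return pairs;
    }

    // Shuffle + Reduce phase: group the pairs by key and sum the values.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = reduce(map("Hello MapReduce Hello Hadoop"));
        System.out.println(counts); // {Hadoop=1, Hello=2, MapReduce=1}
    }
}
```

In a real cluster the pairs emitted by map are partitioned across many machines before the reduce step; this sketch only shows the data flow on a single machine.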

How MapReduce Works

On a daily basis the micro-blogging site Twitter receives nearly 500 million tweets, i.e., roughly 6,000 tweets per second. Counting words across this stream of tweets is a good illustration of how MapReduce works.

In this example the Twitter data is the input, and MapReduce performs four actions on it: tokenize, filter, count, and aggregate counters.

  • Tokenize: Splits each tweet into tokens and writes them out as key-value pairs.
  • Filter: Removes unwanted words (for example, stop words) from the token maps.
  • Count: Emits a counter of 1 for every word token.
  • Aggregate counters: Combines the individual counters into small, manageable per-word totals.
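The four stages above can be simulated in a few lines of plain Java using streams. The class name and the stop-word list here are illustrative, not part of any real Twitter pipeline:

```java
import java.util.*;
import java.util.stream.*;

class TweetPipelineSketch {
    // Illustrative stop-word list for the Filter stage.
    static final Set<String> STOP_WORDS = Set.of("a", "the", "is", "to", "of", "and");

    static Map<String, Long> countWords(List<String> tweets) {
        return tweets.stream()
                // Tokenize: split each tweet into lowercase tokens.
                .flatMap(t -> Arrays.stream(t.toLowerCase().split("\\s+")))
                // Filter: drop unwanted (stop) words.
                .filter(w -> !STOP_WORDS.contains(w))
                // Count + aggregate counters: group identical tokens and tally them.
                .collect(Collectors.groupingBy(w -> w, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> tweets = List.of("big data is big", "the data pipeline");
        System.out.println(countWords(tweets)); // {big=2, data=2, pipeline=1}
    }
}
```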

Top 5 MapReduce Examples with Code

1. Word Count Example

Word count is the most popular MapReduce example. It shows how MapReduce processes big data by counting how often each word appears in a large text file. This basic example helps beginners understand the Map and Reduce steps clearly and is often used in Hadoop training and tutorials. This problem is the “Hello World” of MapReduce and shows how key-value pairs work in Hadoop.

MapReduce Code in Java

// Mapper
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    String[] words = value.toString().split(" ");
    for (String word : words) {
        context.write(new Text(word), new IntWritable(1));
    }
}

// Reducer
public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
        sum += val.get();
    }
    context.write(key, new IntWritable(sum));
}

How It Works

  • Input: “Hello MapReduce Hello Hadoop”
  • Map Output: (Hello, 1), (MapReduce, 1), (Hello, 1), (Hadoop, 1)
  • Reduce Output: (Hello, 2), (MapReduce, 1), (Hadoop, 1)

This Hadoop MapReduce example is used in search engines and text analytics.

2. Log Analysis with MapReduce

MapReduce is great for log analysis. You can use it to process huge server logs, find error patterns, or track website visits. This big data example helps system admins and engineers understand how MapReduce works in real-world applications for analyzing data from millions of log entries. Companies like Twitter and Netflix use MapReduce for log file analysis to track:

  • User activity
  • Server errors
  • Traffic patterns

Sample Log Analysis Job

// Mapper: Extract HTTP status codes
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    String[] parts = value.toString().split(" ");
    String statusCode = parts[8]; // status code field in a standard access-log line
    context.write(new Text(statusCode), new IntWritable(1));
}

// Reducer: Count occurrences
public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
    int count = 0;
    for (IntWritable val : values) {
        count += val.get();
    }
    context.write(key, new IntWritable(count));
}

Output Example

  • 200 OK: 10,000 hits
  • 404 Not Found: 500 hits
  • 500 Server Error: 50 hits

This helps in performance monitoring and debugging web applications.
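To see the logic without a Hadoop cluster, here is a small plain-Java sketch of the same job. Like the mapper above, it assumes the ninth space-separated field of an access-log line is the HTTP status code; the class name and sample log lines are illustrative:

```java
import java.util.*;

class LogStatusSketch {

    // Count how many log lines carry each HTTP status code.
    static Map<String, Integer> countStatusCodes(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            String[] parts = line.split(" ");
            if (parts.length > 8) {                  // skip malformed lines
                counts.merge(parts[8], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> logs = List.of(
            "127.0.0.1 - - [10/Oct/2024:13:55:36 +0000] \"GET /index.html HTTP/1.1\" 200 2326",
            "127.0.0.1 - - [10/Oct/2024:13:55:40 +0000] \"GET /missing HTTP/1.1\" 404 153");
        System.out.println(countStatusCodes(logs)); // {200=1, 404=1}
    }
}
```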

3. Temperature Analysis

MapReduce helps with weather data processing, like analyzing temperature records. For example, it can find the highest or lowest temperature by year. This example shows how MapReduce works with structured data, making it useful for scientists and data analysts working with large sets of climate or sensor data. Meteorological departments use MapReduce for big data to compute:

  • Maximum temperature by city
  • Average rainfall
  • Climate trends

MapReduce Code for Max Temperature

// Mapper: Extract city and temperature
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    String[] data = value.toString().split(",");
    String city = data[0];
    int temp = Integer.parseInt(data[1]);
    context.write(new Text(city), new IntWritable(temp));
}

// Reducer: Find max temperature
public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
    int maxTemp = Integer.MIN_VALUE;
    for (IntWritable val : values) {
        maxTemp = Math.max(maxTemp, val.get());
    }
    context.write(key, new IntWritable(maxTemp));
}

Sample Output

  • New York: 38°C
  • London: 29°C
  • Tokyo: 35°C
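The same max-temperature logic can be tried locally with a plain-Java sketch. As in the mapper above, each input record is assumed to be a "city,temperature" line; the class name and sample readings are illustrative:

```java
import java.util.*;

class MaxTempSketch {

    // For each city, keep the largest temperature seen across all records.
    static Map<String, Integer> maxTempByCity(List<String> records) {
        Map<String, Integer> max = new TreeMap<>();
        for (String record : records) {
            String[] data = record.split(",");
            // merge keeps the larger of the stored and incoming temperature.
            max.merge(data[0], Integer.parseInt(data[1]), Math::max);
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> records = List.of("Tokyo,31", "Tokyo,35", "London,29");
        System.out.println(maxTempByCity(records)); // {London=29, Tokyo=35}
    }
}
```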

4. Recommendation Systems

MapReduce can power basic recommendation systems by analyzing user behavior, such as purchase history or ratings. It groups and filters large datasets to find patterns. This is a great example of using MapReduce in machine learning, especially in e-commerce and streaming platforms like Netflix or Amazon. Companies like Amazon and Netflix use MapReduce in Hadoop for:

  • Collaborative filtering
  • User behavior analysis
  • Personalized recommendations

Example: Movie Recommendations

// Mapper: User → (Movie, Rating)
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    String[] parts = value.toString().split(",");
    String user = parts[0];
    String movie = parts[1];
    int rating = Integer.parseInt(parts[2]);
    context.write(new Text(user), new Text(movie + ":" + rating));
}

// Reducer: Find each user's top-rated movie
public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    String topMovie = null;
    int topRating = Integer.MIN_VALUE;
    for (Text val : values) {
        String[] parts = val.toString().split(":");
        int rating = Integer.parseInt(parts[1]);
        if (rating > topRating) {
            topRating = rating;
            topMovie = parts[0];
        }
    }
    context.write(key, new Text("Recommended: " + topMovie));
}
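Here is a plain-Java sketch of the same idea, runnable without Hadoop. The names and sample ratings are illustrative: for each "user,movie,rating" record it keeps the highest-rated movie seen so far per user.

```java
import java.util.*;

class TopMovieSketch {

    // Returns each user's highest-rated movie.
    static Map<String, String> topMoviePerUser(List<String> ratings) {
        Map<String, String> best = new TreeMap<>();        // user -> best movie so far
        Map<String, Integer> bestRating = new HashMap<>(); // user -> best rating so far
        for (String row : ratings) {
            String[] parts = row.split(",");
            String user = parts[0], movie = parts[1];
            int rating = Integer.parseInt(parts[2]);
            if (rating > bestRating.getOrDefault(user, Integer.MIN_VALUE)) {
                bestRating.put(user, rating);
                best.put(user, movie);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> ratings = List.of("alice,Inception,5", "alice,Up,4", "bob,Up,5");
        System.out.println(topMoviePerUser(ratings)); // {alice=Inception, bob=Up}
    }
}
```

Real collaborative filtering compares users to each other rather than ranking one user's own history; this sketch only illustrates the grouping step that MapReduce performs.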

5. Social Media Analytics

MapReduce is useful for social media analytics, like counting hashtags on Twitter. It helps process large amounts of real-time social data to find trending topics. This example shows how MapReduce can handle unstructured data and provide insights for marketers, analysts, and digital media teams. Twitter processes 500M+ tweets daily using MapReduce for big data analytics:

  • Trending hashtags
  • Sentiment analysis
  • User engagement metrics

MapReduce Example for Hashtag Count

// Mapper: Extract hashtags
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    String tweet = value.toString();
    String[] words = tweet.split(" ");
    for (String word : words) {
        if (word.startsWith("#")) {
            context.write(new Text(word), new IntWritable(1));
        }
    }
}

// Reducer: Count hashtag frequency
public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
    int count = 0;
    for (IntWritable val : values) {
        count += val.get();
    }
    context.write(key, new IntWritable(count));
}

Output Example

  • #BigData: 50,000 mentions
  • #AI: 30,000 mentions
  • #MachineLearning: 25,000 mentions
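The hashtag count can likewise be simulated in plain Java with streams. The class name and sample tweets are illustrative; only tokens starting with "#" survive the filter, exactly as in the mapper above:

```java
import java.util.*;
import java.util.stream.*;

class HashtagCountSketch {

    static Map<String, Long> countHashtags(List<String> tweets) {
        return tweets.stream()
                .flatMap(t -> Arrays.stream(t.split(" "))) // tokenize each tweet
                .filter(w -> w.startsWith("#"))            // keep only hashtags
                .collect(Collectors.groupingBy(w -> w, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> tweets = List.of("#BigData rocks", "learn #BigData and #AI");
        System.out.println(countHashtags(tweets)); // {#AI=1, #BigData=2}
    }
}
```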

MapReduce vs. Spark vs. Flink

Feature            MapReduce    Apache Spark          Apache Flink
Processing Speed   Batch        In-memory (faster)    Stream + batch
Ease of Use        Moderate     High (SQL, MLlib)     High
Fault Tolerance    High         High                  Very high
Best For           Batch ETL    Real-time analytics   Event-driven apps

Conclusion

MapReduce is a simple yet powerful way to handle large-scale data processing. It works by splitting tasks into smaller steps, Map and Reduce, which can run across many machines at once. It’s built to scale up to petabytes of data and can keep working even if some machines fail. That’s why it’s used in big data tools like Hadoop, Hive, and NoSQL databases.

Whether you’re analysing logs, counting words, or building recommendation systems, MapReduce helps make big data manageable and useful. It’s a key concept for anyone interested in data engineering, big data analytics, or distributed computing.


About the Author

Technical Research Analyst - Big Data Engineering

Abhijit is a Technical Research Analyst specialising in Big Data and Azure Data Engineering. He has over four years of experience in the big data domain and provides consultancy services to several Fortune 500 companies. His expertise includes breaking down highly technical concepts into easy-to-understand content.