Why isn't Hadoop implemented using MPI?

Question

asked Jul 10, 2019 in Big Data Hadoop & Spark by Aarav (11.4k points)

Correct me if I'm wrong, but my understanding is that Hadoop does not use MPI for communication between different nodes.

What are the technical reasons for this?

I could hazard a few guesses, but I do not know enough of how MPI is implemented "under the hood" to know whether or not I'm right.

Come to think of it, I'm not entirely familiar with Hadoop's internals either. I understand the framework at a conceptual level (map/combine/shuffle/reduce and how that works at a high level), but I don't know the nitty-gritty implementation details. I've always assumed Hadoop was transmitting serialized data structures (perhaps GPBs) over a TCP connection, e.g. during the shuffle phase. Let me know if that's not true.

1 Answer

Amit Rawat · Answer 1 · 2019-07-10T04:26:09+0000

MPI stands for Message Passing Interface. As the name suggests, the model is built around passing messages and has no built-in notion of data locality: you send the data to another node for it to be computed on. As a result, MPI tends to be network-bound in terms of performance when working with large data.

One of the big features of Hadoop/MapReduce is fault tolerance. Most current MPI implementations do not support fault tolerance, which is why implementing Hadoop on top of MPI is not practiced.
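
To make the fault-tolerance point concrete, here is a toy sketch of the re-execution idea behind MapReduce: each task is independent, so when one fails only that task is run again and the finished results are kept. This is not Hadoop's scheduler code. The names `run_with_retries` and `flaky_word_count`, the retry limit, and the failure model are all made up for illustration.

```python
def run_with_retries(tasks, run_task, max_attempts=4):
    """Run each task independently and re-run only the ones that fail.

    tasks: dict of task id -> input payload
    run_task: callable(payload, attempt) that may raise RuntimeError
    max_attempts: arbitrary retry limit chosen for this sketch
    """
    results = {}
    for task_id, payload in tasks.items():
        for attempt in range(1, max_attempts + 1):
            try:
                results[task_id] = run_task(payload, attempt)
                break
            except RuntimeError:
                # Simulated worker failure: only this task is retried,
                # results already collected for other tasks are kept.
                continue
        else:
            # Every attempt failed, so give up on the whole job.
            raise RuntimeError(f"task {task_id} failed {max_attempts} times")
    return results


def flaky_word_count(text, attempt):
    # Hypothetical failure model: the first attempt on any input fails.
    if attempt == 1:
        raise RuntimeError("simulated node failure")
    return len(text.split())


splits = {"split-0": "a b c", "split-1": "d e"}
counts = run_with_retries(splits, flaky_word_count)
print(counts)
```

In a classic MPI program, by contrast, the default error handler (`MPI_ERRORS_ARE_FATAL`) aborts the job when a failure is detected, so this kind of per-task recovery has to be built by hand, typically with checkpointing.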

MapReduce, by contrast, works with the Hadoop Distributed File System (HDFS), which replicates data across nodes so that you can do your computation on local storage, streaming off the disk and straight to the processor.
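
As a rough illustration of "moving the computation to the data", the sketch below assigns one map task per block and prefers a node that already stores a replica of that block. It is a simplified model, not Hadoop's actual scheduling logic. The function `assign_tasks`, the node and block names, and the one-slot-per-node assumption are hypothetical; the three replicas per block mirror the default HDFS replication factor.

```python
def assign_tasks(block_replicas, free_nodes):
    """Assign one map task per block, preferring a node that stores it.

    block_replicas: dict of block id -> list of nodes holding a replica
    free_nodes: nodes with a free task slot (one slot each in this sketch)
    Returns a dict of block id -> (node, is_local).
    """
    free = set(free_nodes)
    assignment = {}
    remote = []
    # First pass: place each task on a node that already has the data.
    for block, holders in block_replicas.items():
        local = [node for node in holders if node in free]
        if local:
            node = local[0]
            free.remove(node)
            assignment[block] = (node, True)
        else:
            remote.append(block)
    # Second pass: leftover blocks go to any free node and must be
    # read over the network.
    for block in remote:
        if not free:
            break  # no slots left; the block waits for a later round
        node = sorted(free)[0]
        free.remove(node)
        assignment[block] = (node, False)
    return assignment


replicas = {
    "blk_1": ["node-a", "node-b", "node-c"],
    "blk_2": ["node-a", "node-b", "node-d"],
    "blk_3": ["node-a", "node-c", "node-d"],
}
plan = assign_tasks(replicas, {"node-a", "node-b", "node-c"})
for block, (node, is_local) in plan.items():
    print(block, node, "local" if is_local else "remote")
```

Because every block has several replicas, the scheduler usually has more than one node to choose from, which is what makes local reads likely even when some nodes are busy.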

A solution to this problem, which would make a Hadoop implementation on MPI more feasible, is reportedly being considered for future versions of OpenMPI.
