What is the difference between Apache Mahout and Apache Spark's MLlib?

Question

2 Answers

Shivangi · Answer 1 · 2019-07-04T11:06:38+0000

The main difference lies in the underlying framework: Mahout was built on Hadoop MapReduce, whereas MLlib runs on Spark.

Apache Mahout is mature and comes with many ML algorithms to choose from, including proven capabilities that Spark's MLlib lacks. However, it is built atop MapReduce, so it is constrained by disk access and is slow. Machine learning algorithms typically make many passes over the same data, and because MapReduce goes back to disk between jobs, Mahout handles such iterative workloads poorly. MLlib is built on top of Spark, which can keep working data in memory across iterations, and that makes it much faster than Mahout. Even so, Mahout is the more stable and mature framework, and it is recommended when the data set is very large.
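To make the point about iterative jobs concrete, here is a toy sketch in plain Python. It is not Mahout or Spark code and not a benchmark. `CountingSource`, `refine_mean`, and the two `run_*` functions are hypothetical names made up for this illustration; the only thing it shows is that rereading the input on every iteration multiplies the number of full reads, while caching the data reads it once and gives the same result.

```python
class CountingSource:
    """Stands in for a dataset on disk and counts how often it is fully read."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._values)


def refine_mean(data, estimate, step=0.5):
    """One iteration: move the estimate part of the way toward the data mean."""
    target = sum(data) / len(data)
    return estimate + step * (target - estimate)


def run_reload_each_iteration(source, iterations):
    """MapReduce-like pattern: every iteration rereads the input."""
    estimate = 0.0
    for _ in range(iterations):
        data = source.load()
        estimate = refine_mean(data, estimate)
    return estimate


def run_cached(source, iterations):
    """Spark-like pattern: read the input once, then iterate in memory."""
    data = source.load()
    estimate = 0.0
    for _ in range(iterations):
        estimate = refine_mean(data, estimate)
    return estimate


reloading = CountingSource([1, 2, 3, 4])
cached = CountingSource([1, 2, 3, 4])
result_reloading = run_reload_each_iteration(reloading, 10)
result_cached = run_cached(cached, 10)
print(reloading.reads, cached.reads)
```

In a real cluster each of those extra reads is a pass over distributed storage, which is why the gap grows with the number of iterations.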


Amit Rawat · Answer 2 · 2019-09-18T13:01:03+0000

MLlib is a loose collection of high-level algorithms that runs on Spark. This is what Mahout used to be, except that the Mahout of old ran on Hadoop MapReduce. In 2014 Mahout announced it would no longer accept Hadoop MapReduce code and switched all new development to Spark (with other engines, such as H2O, possibly in the offing).

The most important thing to come out of this is a Scala-based, generalized, distributed, optimized linear algebra engine and environment, including an interactive Scala shell. Perhaps the most relevant word is "generalized". Since it runs on Spark, anything available in MLlib can be used together with the linear algebra engine of Mahout-Spark.

If you need a general engine that will do a lot of what tools like R do, but on really big data, look at Mahout. If you need a particular algorithm, check each library to see what it offers. For instance, k-means runs in MLlib, but if you need to cluster A'A (a co-occurrence matrix used in recommenders) you'll need both, because MLlib doesn't have a matrix transpose or A'A.
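For anyone unsure what A'A means here, a minimal sketch in plain Python follows. It is not Mahout or MLlib code; the helper names `transpose`, `matmul`, and `cooccurrence` and the tiny matrix `A` are made up for illustration. Rows of `A` are assumed to be users and columns items, with 1 meaning the user interacted with the item, so entry (i, j) of A'A counts the users who interacted with both item i and item j.

```python
def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    y_cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]


def cooccurrence(a):
    """Compute A'A, the item-item co-occurrence matrix."""
    return matmul(transpose(a), a)


# Hypothetical interactions: 3 users (rows) by 3 items (columns).
A = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
]

C = cooccurrence(A)
for row in C:
    print(row)
```

The diagonal of `C` holds how many users touched each item, and the off-diagonal entries are the pairwise co-occurrence counts that a recommender would then cluster or score.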
