Why doesn't spark.ml implement any of the spark.mllib algorithms?

Question

asked Jul 26, 2019 in Machine Learning by ParasSharma1 (19k points)

According to the Spark MLlib Guide, Spark has two machine learning libraries:

spark.mllib, built on top of RDDs.

spark.ml, built on top of DataFrames.

According to this and this question on StackOverflow, DataFrames are better than RDDs and should be used whenever possible.

The problem is that I want to use common machine learning algorithms (e.g., Frequent Pattern Mining, Naive Bayes, etc.), and spark.ml (for DataFrames) doesn't provide such methods; only spark.mllib (for RDDs) provides these algorithms.

If DataFrames are better than RDDs and the guide referred to above recommends using spark.ml, why aren't common machine learning methods implemented in that library?

What am I missing here?

1 Answer

Anurag · Answer 1 · 2019-07-26T12:17:55+0000

You can simply use Spark 2.0.0.

As of Spark 2.0, Spark is moving strongly toward the DataFrame API: the RDD-based MLlib APIs (spark.mllib) have been switched to maintenance mode, and the DataFrame-based spark.ml package is the primary API. The number of native "ML" algorithms is growing, but internally many stages are still implemented directly using RDDs.

For more details, see "Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0".
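As a rough illustration of the DataFrame-based API, here is a minimal sketch of Naive Bayes from spark.ml running on a DataFrame. It assumes Spark 2.0 or later on the classpath. The object name, the local master setting, and the toy data are made up for this example, and the last step only shows that a DataFrame still exposes its underlying RDD, which may help when an algorithm is available only in spark.mllib.

```scala
import org.apache.spark.ml.classification.NaiveBayes
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this sketch.
object NaiveBayesOnDataFrame {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; the app name is arbitrary.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("spark-ml-naive-bayes-sketch")
      .getOrCreate()

    // Toy training data (invented): a label and a non-negative feature vector,
    // as required by the default multinomial model type.
    val training = spark.createDataFrame(Seq(
      (0.0, Vectors.dense(1.0, 0.0, 0.0)),
      (0.0, Vectors.dense(2.0, 0.0, 0.0)),
      (1.0, Vectors.dense(0.0, 1.0, 0.0)),
      (1.0, Vectors.dense(0.0, 2.0, 0.0))
    )).toDF("label", "features")

    // spark.ml estimator: fit on a DataFrame and get a model (a Transformer) back.
    val model = new NaiveBayes().setSmoothing(1.0).fit(training)

    // transform() appends prediction columns to the input DataFrame.
    model.transform(training).select("label", "prediction").show()

    // The DataFrame's underlying RDD[Row] is reachable via .rdd and can be
    // mapped into whatever types a spark.mllib algorithm expects.
    val rowCount = training.rdd.count()
    println(s"rows in underlying RDD: $rowCount")

    spark.stop()
  }
}
```

The design point is that spark.ml estimators take and return DataFrames, while `.rdd` remains available as a bridge to RDD-based code.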

Hope this answer helps you! For more details, study the Apache Spark Tutorial.
