What's the difference between the Spark ML and MLlib packages?

Question

1 Answer

Amit Rawat · Answer 1 · 2019-07-13T07:06:59+0000

org.apache.spark.mllib is the older of the two Spark machine learning APIs, while org.apache.spark.ml is the newer one.

spark.mllib contains the original API, built on top of RDDs.
spark.ml provides a higher-level API, built on top of DataFrames, for constructing ML pipelines.

spark.ml is the recommended package, because the DataFrame-based API is more versatile and flexible. The Spark project continues to support spark.mllib alongside the development of spark.ml, and users should still be comfortable with spark.mllib features, because not all functionality of the existing algorithms has been ported to the new spark.ml API. More features are expected to be ported over time.

As of Spark 2.0, the RDD-based APIs in the spark.mllib package are in maintenance mode, and the primary machine learning API for Spark is the DataFrame-based API in the spark.ml package. Parts of spark.mllib are gradually being deprecated (this has already happened for linear regression), and the package will most likely be removed in a future major release.
