0 votes
in Machine Learning by (19k points)
I'm trying to find out if it is possible to have "incremental training" on data using MLlib in Apache Spark.

My platform is PredictionIO, which is basically a wrapper for Spark (MLlib), HBase, Elasticsearch and some other RESTful parts.

In my app, data "events" are inserted in real time, but to get updated prediction results I need to run "pio train" and "pio deploy". This takes some time, and the server goes offline during the redeploy.

I'm trying to figure out if I can do incremental training during the "predict" phase, but cannot find an answer.

1 Answer

0 votes
by (33.1k points)

It seems like you are using Spark MLlib's ALS model, which performs matrix factorization. The result of the model is two matrices: a user-features matrix and an item-features matrix.
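To make the two matrices concrete, here is a minimal plain-Python sketch (not Spark code). The user names, item names, and feature values are made up for illustration; the only point is that a predicted rating is the dot product of one row from each matrix.

```python
# Toy illustration, not the MLlib API: all names and numbers here are hypothetical.
# Each user and each item gets a feature vector of the same length (the "rank").
user_features = {
    "alice": [1.0, 0.5],
    "bob": [0.2, 1.5],
}
item_features = {
    "item_a": [2.0, 1.0],
    "item_b": [0.5, 2.0],
}


def predict(user, item):
    """Predicted rating: dot product of the user's and the item's feature vectors."""
    u = user_features[user]
    v = item_features[item]
    return sum(a * b for a, b in zip(u, v))


print(predict("alice", "item_a"))
```

A new rating from "alice" for "item_a" only carries direct information about those two vectors, which is what makes a cheap incremental update plausible.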

Let's say we receive a stream of ratings (or transactions, in the implicit case). A truly (100%) online update of this model would update both matrices for every new rating by triggering a full retrain of the ALS model on all the existing data plus the new rating. That is impractical: running the entire ALS model is computationally expensive, and the incoming stream of data could be frequent.

A single rating cannot change the matrices much, and there are optimization approaches that are incremental, for example SGD (stochastic gradient descent). There is an interesting (still experimental) library written for the case of Explicit Ratings which does incremental updates for each batch of a DStream.

The idea of using an incremental approach such as SGD is this: for a single new rating, we update only the row of the user-features matrix for that specific user and only the row of the item-features matrix for the specific item rated. Because each update is a step against the gradient of the error on that rating, we move towards the minimum. It is only an approximation, of course, but it is still a step towards the minimum.
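The per-rating update described above can be sketched in plain Python. This is an illustrative sketch of a generic SGD step on a regularized squared error, not the API of the library mentioned in this answer or of MLlib; the function names, the learning rate `lr`, and the regularization weight `reg` are assumptions chosen for the example.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sgd_update(user_vec, item_vec, rating, lr=0.05, reg=0.01):
    """One SGD step for a single (user, item, rating) observation.

    Only this user's vector and this item's vector are touched.
    lr and reg are hypothetical values picked for the illustration.
    """
    err = rating - dot(user_vec, item_vec)
    # Step against the gradient of: err^2 / 2 + reg / 2 * (|u|^2 + |v|^2)
    new_user = [u + lr * (err * v - reg * u) for u, v in zip(user_vec, item_vec)]
    new_item = [v + lr * (err * u - reg * v) for u, v in zip(user_vec, item_vec)]
    return new_user, new_item


# Usage: one new rating arrives for one user/item pair.
user_vec, item_vec, rating = [0.1, 0.3], [0.2, 0.4], 4.0
error_before = abs(rating - dot(user_vec, item_vec))
user_vec, item_vec = sgd_update(user_vec, item_vec, rating)
error_after = abs(rating - dot(user_vec, item_vec))
print(error_before, error_after)
```

Each call costs time proportional to the rank rather than to the size of the data set, which is the trade-off against a full ALS retrain: cheaper updates, but only an approximate move towards the optimum.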


Incremental training of ALS model
by (100 points)
I am not sure I got this. "There is an interesting (still experimental) library written for the case of Explicit Ratings which does incremental updates ..." What is that library? Is it in Spark itself?
