in Machine Learning by (19.9k points)

I'm evaluating tools for production ML-based applications, and one of our options is Spark MLlib, but I have some questions about how to serve a model once it's trained.

For example, in Azure ML a trained model is exposed as a web service that can be consumed from any application, and Amazon ML works in a similar way.

How do you serve/deploy ML models in Apache Spark?

1 Answer

by (32.4k points)

Spark MLlib Model:

Apache Spark provides a machine learning library called MLlib. It includes algorithms for classification, regression, clustering, and collaborative filtering. It also offers tools for featurization, pipelines, and model persistence, plus utilities for linear algebra, statistics, and data handling.

You can use MLlib in the following three ways:

  • Train a model inside an application and apply it for prediction there. This can be done in a Spark application or a notebook.

  • Train a model and save it, which is possible if the model implements MLWritable (the interface that provides an MLWriter). Then load it in an application or a notebook and run it on your data.

  • Train a model with Spark and export it to PMML format using a JPMML library such as jpmml-sparkml. PMML lets different statistical and data mining tools speak the same language, so a predictive solution can be moved between tools and applications without custom coding, e.g. from Spark ML to R.
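The first two approaches in the list above can be sketched in PySpark roughly as follows. This is only an illustration, not a definitive setup: the column names (`f1`, `f2`, `label`), the choice of logistic regression, and the helper names `train_and_save` and `load_and_predict` are all assumptions made for the example. Both functions expect an active SparkSession and a Spark DataFrame containing the assumed columns. PMML export is not shown.

```python
from typing import Any, List

# Hypothetical column names for this sketch; adjust to your own dataset.
FEATURE_COLS: List[str] = ["f1", "f2"]
LABEL_COL = "label"


def train_and_save(train_df: Any, model_path: str) -> Any:
    """Fit a small pipeline on a Spark DataFrame and persist it to model_path."""
    # PySpark is imported lazily so this module loads even where Spark is absent.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler

    # Combine the raw feature columns into the single vector column MLlib expects.
    assembler = VectorAssembler(inputCols=FEATURE_COLS, outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol=LABEL_COL, maxIter=10)
    model = Pipeline(stages=[assembler, lr]).fit(train_df)

    # A fitted PipelineModel is writable, so it can be saved and reloaded later.
    model.write().overwrite().save(model_path)
    return model


def load_and_predict(df: Any, model_path: str) -> Any:
    """Load a previously saved pipeline and return the features with predictions."""
    from pyspark.ml import PipelineModel

    model = PipelineModel.load(model_path)
    return model.transform(df).select(*FEATURE_COLS, "prediction")
```

In a notebook you would typically call `train_and_save(train_df, path)` once, then call `load_and_predict(new_df, path)` from whichever application needs predictions. The path here is any location both sides can reach, such as a shared file system.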

Alternatively, you can build an architecture around a RESTful service, for example with spark-jobserver, that trains and deploys models, but this needs some development work.
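To give a rough idea of what the REST layer of such a service involves, here is a minimal sketch using only Python's standard library. It does not use spark-jobserver or Spark itself. The `/predict` path, the JSON shape (`{"rows": [...]}`), and the `stub_predict` function are assumptions made for illustration, and in a real service the stub would be replaced by a call into a loaded model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, List, Tuple

Predictor = Callable[[List[Dict[str, float]]], List[float]]


def stub_predict(rows: List[Dict[str, float]]) -> List[float]:
    """Hypothetical stand-in for a loaded model: sums the values of each row."""
    return [float(sum(row.values())) for row in rows]


def handle_predict(body: bytes, predict: Predictor) -> Tuple[int, bytes]:
    """Turn a JSON request body into an (HTTP status, JSON response) pair."""
    try:
        rows = json.loads(body)["rows"]
        predictions = predict(rows)
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON or a missing "rows" key is reported as a client error.
        return 400, json.dumps({"error": str(exc)}).encode("utf-8")
    return 200, json.dumps({"predictions": predictions}).encode("utf-8")


def make_handler(predict: Predictor) -> type:
    """Build a request handler class bound to the given predictor."""

    class PredictHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            if self.path != "/predict":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            status, payload = handle_predict(self.rfile.read(length), predict)
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return PredictHandler


def serve(predict: Predictor, port: int = 8080) -> None:
    """Serve predictions on localhost until the process is interrupted."""
    HTTPServer(("127.0.0.1", port), make_handler(predict)).serve_forever()
```

Calling `serve(stub_predict)` starts the endpoint locally. Keeping the request parsing in `handle_predict` separate from the HTTP handler means the prediction logic can be exercised without opening a socket.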

Hope this answer helps.
