in Machine Learning by (19k points)
I am interested in deploying a machine learning model in Python, so that predictions can be made through requests to a server.

I will create a Cloudera cluster and use Spark to develop the models, via the pyspark library. I would like to know how the model can be saved so that it can be used on the server.

I have seen that the different algorithms have .save functions (as answered in the post How to save and load MLLib model in Apache Spark), but since the server will be on a different machine, without Spark and outside the Cloudera cluster, I don't know whether it is possible to use their .load and .predict functions there.

Can predictions be made using the pyspark library functions without Spark running underneath? Or would I have to transform the model somehow in order to save it and use it elsewhere?

1 Answer

by (33.1k points)

After spending an hour I got this working code; it may not be optimized.

import os
import sys

# Path for the Spark installation
os.environ['SPARK_HOME'] = "E:\\Work\\spark\\installtion\\spark"

# Append pyspark to the Python path
sys.path.append("E:\\Work\\spark\\installtion\\spark\\python")

try:
    from numpy import array
    from pyspark import SparkConf, SparkContext
    from pyspark.mllib.clustering import KMeans, KMeansModel
    print("Successfully imported Spark modules")
except ImportError as e:
    print("Cannot import Spark modules:", e)
    sys.exit(1)

if __name__ == "__main__":
    sconf = SparkConf().setAppName("KMeansExample").set(
        'spark.sql.warehouse.dir',
        'file:///E:/Work/spark/installtion/spark/spark-warehouse/')
    sc = SparkContext(conf=sconf)

    # Four 2-D points: two near the origin, two near (8.5, 8.5)
    parsedData = array([0.0, 0.0, 1.0, 1.0, 9.0, 8.0, 8.0, 9.0]).reshape(4, 2)

    clusters = KMeans.train(sc.parallelize(parsedData), 2,
                            maxIterations=10, initializationMode="random")

    clusters.save(sc, "mymodel")  # this saves the model to the file system
    sc.stop()

The code above trains a k-means clustering model and saves it to the file system. The following Flask app loads that saved model and serves predictions:

from flask import jsonify, Flask
import os
import sys

# Path for the Spark installation
os.environ['SPARK_HOME'] = "E:\\Work\\spark\\installtion\\spark"

# Append pyspark to the Python path
sys.path.append("E:\\Work\\spark\\installtion\\spark\\python")

try:
    from numpy import array
    from pyspark import SparkConf, SparkContext
    from pyspark.mllib.clustering import KMeansModel
    print("Successfully imported Spark modules")
except ImportError as e:
    print("Cannot import Spark modules:", e)
    sys.exit(1)

app = Flask(__name__)

# Create the SparkContext and load the model once at startup;
# a new SparkContext cannot be created on every request.
sconf = SparkConf().setAppName("KMeansExample").set(
    'spark.sql.warehouse.dir',
    'file:///E:/Work/spark/installtion/spark/spark-warehouse/')
sc = SparkContext(conf=sconf)
sameModel = KMeansModel.load(sc, "mymodel")

@app.route('/', methods=['GET'])
def predict():
    result = sameModel.predict(array([0.0, 0.0]))  # pass your data here
    # cast to int so the cluster index is JSON-serializable
    return jsonify({'cluster': int(result)})

if __name__ == '__main__':
    app.run()

The above API is written in Flask. Note that pyspark's KMeansModel.load and .predict still require a SparkContext, so the serving machine does need a Spark installation, even though it does not have to be part of the Cloudera cluster.
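If installing Spark on the server is not an option at all, one workaround (a sketch, not part of the answer above; the file name centers.json and the helper predict_cluster are my own) is to export the trained cluster centers from the cluster side and reimplement the nearest-center lookup with plain NumPy on the server:

```python
import json
import numpy as np

# On the cluster side, after training, dump the centers
# (clusters.clusterCenters is a list of NumPy arrays in pyspark.mllib):
#     with open("centers.json", "w") as f:
#         json.dump([c.tolist() for c in clusters.clusterCenters], f)

def predict_cluster(point, centers):
    """Return the index of the nearest center (what KMeansModel.predict does)."""
    centers = np.asarray(centers)
    point = np.asarray(point)
    distances = np.linalg.norm(centers - point, axis=1)
    return int(np.argmin(distances))

# Example with centers matching the toy data above:
centers = [[0.5, 0.5], [8.5, 8.5]]
print(predict_cluster([0.0, 0.0], centers))  # nearest to [0.5, 0.5] -> 0
print(predict_cluster([9.0, 9.0], centers))  # nearest to [8.5, 8.5] -> 1
```

With this approach the Flask server only needs NumPy installed, at the cost of re-exporting the centers file every time the model is retrained.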
