in Big Data Hadoop & Spark by (11.4k points)

I want to find the ParamGridBuilder parameters that produce the best model in CrossValidator in Spark 1.4.x.

In the Pipeline Example in the Spark documentation, they add different parameters (numFeatures, regParam) to the Pipeline using ParamGridBuilder. The following line of code then fits the best model:

val cvModel = crossval.fit(training.toDF)


Now, I want to know which parameters (numFeatures, regParam) from ParamGridBuilder produce the best model.

I have already tried the following commands without success:

cvModel.bestModel.extractParamMap().toString()
cvModel.params.toList.mkString("(", ",", ")")
cvModel.estimatorParamMaps.toString()
cvModel.explainParams()
cvModel.getEstimatorParamMaps.mkString("(", ",", ")")
cvModel.toString()


Any help?

1 Answer

0 votes
by (32.3k points)

To get a proper ParamMap object, use CrossValidatorModel.avgMetrics: Array[Double] to find the argmax ParamMap, that is, the one with the highest average metric:

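import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.tuning.CrossValidatorModel

// Adds bestEstimatorParamMap to any CrossValidatorModel.
implicit class BestParamMapCrossValidatorModel(cvModel: CrossValidatorModel) {
  // Pair each ParamMap with its average metric and keep the highest-scoring one.
  def bestEstimatorParamMap: ParamMap = {
    cvModel.getEstimatorParamMaps
      .zip(cvModel.avgMetrics)
      .maxBy(_._2)
      ._1
  }
}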

Running this on the CrossValidatorModel trained in the Pipeline Example you mentioned gives:

scala> println(cvModel.bestEstimatorParamMap)
{
   hashingTF_2b0b8ccaeeec-numFeatures: 100,
   logreg_950a13184247-regParam: 0.1
}
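
Note that maxBy assumes a metric where larger is better. For an error metric such as RMSE, you would want the smallest value instead. The sketch below shows the same zip-and-pick idea in plain Scala, with no Spark dependency, so it can be tried in any REPL. The names BestCandidate, GridPoint, rank and largerIsBetter are hypothetical and exist only for this illustration, and the metric values are made up:

```scala
object BestCandidate {
  // GridPoint is a hypothetical stand-in for a Spark ParamMap.
  final case class GridPoint(numFeatures: Int, regParam: Double)

  // Pair every candidate with its metric and sort so the best one comes first.
  // largerIsBetter is an assumed flag: true for scores such as AUC,
  // false for error metrics such as RMSE.
  def rank[P](candidates: Seq[P],
              metrics: Seq[Double],
              largerIsBetter: Boolean = true): Seq[(P, Double)] = {
    require(candidates.length == metrics.length,
      "need exactly one metric per candidate")
    val paired = candidates.zip(metrics)
    if (largerIsBetter) paired.sortWith(_._2 > _._2)
    else paired.sortWith(_._2 < _._2)
  }

  def main(args: Array[String]): Unit = {
    // Made-up grid and metrics, for illustration only.
    val grid = Seq(GridPoint(10, 0.1), GridPoint(100, 0.1), GridPoint(100, 0.01))
    val metrics = Seq(0.71, 0.93, 0.88)
    // Print every candidate with its metric, best first.
    rank(grid, metrics).foreach { case (point, metric) =>
      println(s"$metric  $point")
    }
  }
}
```

With Spark on the classpath, the same helper should accept cvModel.getEstimatorParamMaps and cvModel.avgMetrics as its two arguments. Printing the whole ranked list rather than only the winner also shows how close the runner-up parameter combinations were.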
