in Machine Learning by (19k points)

I am trying to solve a regression task. I found that 3 models work well on different subsets of the data: LassoLARS, SVR and Gradient Tree Boosting. When I make predictions with all 3 models and put the 'true output' next to the outputs of my 3 models in a table, I see that each time at least one of the models is really close to the true output, even though the other 2 can be relatively far away.

When I compute the minimal possible error (taking, for each test example, the prediction of the 'best' predictor), I get an error that is much smaller than the error of any single model. So I thought about combining the predictions of these 3 models into some kind of ensemble. The question is: how do I do this properly? All 3 of my models are built and tuned with scikit-learn; does it provide a method for packing models into an ensemble? The catch is that I don't want to simply average the predictions of the three models. I want a weighted combination, where the weights depend on the properties of each individual example.
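To make the "minimal possible error" concrete, here is a small sketch with made-up numbers (the arrays below are hypothetical, not the asker's data). It compares each model's mean absolute error with the "oracle" error obtained by keeping, for every example, whichever model happened to be closest. Note that this oracle is only a lower bound: picking the best model per example requires knowing the true output.

```python
import numpy as np

# Hypothetical true targets and predictions from three models (made-up numbers).
y_true = np.array([1.0, 2.0, 3.0, 4.0])
preds = np.array([
    [1.1, 2.9, 2.0, 4.2],  # model A
    [0.5, 2.1, 3.5, 3.0],  # model B
    [2.0, 1.0, 3.1, 4.1],  # model C
])

abs_err = np.abs(preds - y_true)  # shape (3 models, 4 examples)

# Mean absolute error of each model on its own.
per_model_mae = abs_err.mean(axis=1)

# "Oracle" error: for each example keep the smallest error across the models.
oracle_mae = abs_err.min(axis=0).mean()

print("per-model MAE:", per_model_mae)
print("oracle MAE:", oracle_mae)
```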

Even if scikit-learn does not provide such functionality, it would be nice to know how to address this task properly, i.e. how to figure out the weight of each model for each example in the data. I think it could be done by a separate regressor built on top of these 3 models, which would try to output optimal weights for each of them, but I am not sure whether this is the best approach.
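The per-example weighting idea can be sketched as a gating model (a simple form of mixture of experts): label each held-out example with the model that was closest to the truth, train a classifier on the input features to predict that label, and use its class probabilities as per-example weights. Everything here is an illustrative assumption rather than a method from the thread: the synthetic make_friedman1 data, the default hyperparameters, the RandomForestClassifier used as the gate, and the helper name weighted_predict.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor, RandomForestClassifier
from sklearn.linear_model import LassoLars
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in data (assumption); half fits the base models, half the gate.
X, y = make_friedman1(n_samples=600, random_state=0)
X_base, X_meta, y_base, y_meta = train_test_split(X, y, test_size=0.5, random_state=0)

models = [LassoLars(alpha=0.01), SVR(), GradientBoostingRegressor(random_state=0)]
for m in models:
    m.fit(X_base, y_base)

# Base-model predictions on the held-out half, shape (n_examples, 3).
P_meta = np.column_stack([m.predict(X_meta) for m in models])

# Index of the model closest to the truth for each held-out example.
best = np.abs(P_meta - y_meta[:, None]).argmin(axis=1)

# Gate: learns from the features which model tends to win in which region.
gate = RandomForestClassifier(n_estimators=200, random_state=0)
gate.fit(X_meta, best)

def weighted_predict(X_new):
    """Hypothetical helper: per-example weighted average of the 3 models."""
    P = np.column_stack([m.predict(X_new) for m in models])
    proba = gate.predict_proba(X_new)
    # predict_proba only has columns for classes seen during fit,
    # so scatter them into a full (n_examples, n_models) weight matrix.
    weights = np.zeros((len(X_new), len(models)))
    weights[:, gate.classes_] = proba
    return (weights * P).sum(axis=1), weights
```

The weights in each row sum to 1, so every prediction is a convex combination of the three model outputs whose mix changes from example to example. Whether this beats a single model has to be checked on data that neither the base models nor the gate were trained on.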

1 Answer

by (33.1k points)

I train my set of regression models and then run all of them on the training data. For each example I get a prediction from each of my algorithms and save these 3 results into a pandas DataFrame with the columns 'predictedSVR', 'predictedLASSO' and 'predictedGBR'. I then add a final column to this DataFrame, which I call 'predicted', holding the true target value.
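A minimal sketch of building that DataFrame, assuming synthetic make_friedman1 data in place of the real training set and default hyperparameters for SVR and GradientBoostingRegressor (alpha=0.01 for LassoLars is also an arbitrary choice). Only the column names come from the answer.

```python
import pandas as pd
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LassoLars
from sklearn.svm import SVR

# Synthetic stand-in for the real training data (assumption).
X_train, y_train = make_friedman1(n_samples=300, random_state=0)

# Fit the three base regressors (fit() returns the fitted estimator).
svr = SVR().fit(X_train, y_train)
lasso = LassoLars(alpha=0.01).fit(X_train, y_train)
gbr = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# One column per base model, plus the true target in 'predicted'.
df = pd.DataFrame({
    'predictedSVR': svr.predict(X_train),
    'predictedLASSO': lasso.predict(X_train),
    'predictedGBR': gbr.predict(X_train),
    'predicted': y_train,
})
print(df.head())
```

Predictions made on the same data the models were trained on tend to be optimistic (gradient boosting in particular can fit the training set closely), which can bias the stacker toward that model. scikit-learn's sklearn.model_selection.cross_val_predict produces out-of-fold predictions that can be used for these columns instead.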

For example:

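from sklearn import linear_model

# Fit a linear "stacker" on the three base-model predictions;
# the true target values in 'predicted' are the regression target.
stacker = linear_model.LinearRegression()
stacker.fit(df[['predictedSVR', 'predictedLASSO', 'predictedGBR']], df['predicted'])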

So when I want to make a prediction for a new example, I just run each of my 3 regressors separately, then pass their 3 outputs, in the same column order used for fitting, to stacker.predict() and get the result.
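The prediction step can be sketched as follows. To stay self-contained, the sketch fakes the three base-model outputs as the true value plus noise of different sizes (purely hypothetical data), and the new example's three outputs are made-up numbers. It also checks that the stacker's output is just its intercept plus a fixed weighted sum of the three predictions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

cols = ['predictedSVR', 'predictedLASSO', 'predictedGBR']
rng = np.random.default_rng(0)
y = rng.normal(size=200)

# Hypothetical base-model outputs: the truth plus model-specific noise.
df = pd.DataFrame({
    'predictedSVR': y + rng.normal(scale=0.3, size=200),
    'predictedLASSO': y + rng.normal(scale=0.6, size=200),
    'predictedGBR': y + rng.normal(scale=0.4, size=200),
    'predicted': y,
})
stacker = LinearRegression().fit(df[cols], df['predicted'])

# Outputs of the 3 base regressors for one new example (made-up numbers),
# passed as a one-row DataFrame with the same column names and order.
new_row = pd.DataFrame([[0.8, 1.1, 0.9]], columns=cols)
final = stacker.predict(new_row)[0]

# The same result computed by hand: intercept + fixed weights * predictions.
manual = stacker.intercept_ + np.dot(stacker.coef_, new_row.iloc[0].to_numpy())

print("weights:", stacker.coef_, "intercept:", stacker.intercept_)
print("stacked prediction:", final)
```

Because coef_ is learned once, the same three weights are applied to every new example.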

The problem here is that this finds the optimal weights for the regressors only 'on average': the weights are the same for every example on which I make a prediction.
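scikit-learn (0.22 and later) ships sklearn.ensemble.StackingRegressor, which packages this stacking pattern and fits the final estimator on out-of-fold predictions internally. A sketch of one way to relax the fixed-weights limitation, assuming synthetic make_friedman1 data and default hyperparameters: use a non-linear final estimator and set passthrough=True, so it sees the original features alongside the base predictions. This does not produce an explicit per-example weight vector; the final model simply learns a combination that can differ between regions of the feature space.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import LassoLars
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in data (assumption).
X, y = make_friedman1(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingRegressor(
    estimators=[
        ('svr', SVR()),
        ('lasso', LassoLars(alpha=0.01)),
        ('gbr', GradientBoostingRegressor(random_state=0)),
    ],
    # Non-linear combiner; with passthrough=True it also receives the raw
    # features, so how the base predictions are mixed can vary per example.
    final_estimator=GradientBoostingRegressor(random_state=0),
    passthrough=True,
    cv=5,  # final estimator is trained on 5-fold out-of-fold predictions
)
stack.fit(X_train, y_train)
print("test R^2:", stack.score(X_test, y_test))
```

With final_estimator=LinearRegression() and passthrough=False, this reduces to the fixed-weights stacker described in the answer. For a plain weighted average with hand-chosen weights, sklearn.ensemble.VotingRegressor accepts a weights argument.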


Hope this answer helps you!
