in Machine Learning by (4.8k points)

Below is my pipeline. It seems that I can't pass parameters to my models through the ModelTransformer class, which I took from this link (http://zacstewart.com/2014/08/05/pipelines-of-featureunions-of-pipelines.html).

The error message makes sense to me, but I don't know how to fix it. Any ideas? Thanks.

# define a pipeline
pipeline = Pipeline([
    ('vect', DictVectorizer(sparse=False)),
    ('scale', preprocessing.MinMaxScaler()),
    ('ess', FeatureUnion(n_jobs=-1, transformer_list=[
        ('rfc', ModelTransformer(RandomForestClassifier(n_jobs=-1, random_state=1, n_estimators=100))),
        ('svc', ModelTransformer(SVC(random_state=1))),
    ])),
    ('es', EnsembleClassifier1()),
])

# define the parameters for the pipeline
parameters = {
    'ess__rfc__n_estimators': (100, 200),
}

# ModelTransformer class, taken from the link above
class ModelTransformer(TransformerMixin):
    def __init__(self, model):
        self.model = model
    def fit(self, *args, **kwargs):
        self.model.fit(*args, **kwargs)
        return self
    def transform(self, X, **transform_params):
        return DataFrame(self.model.predict(X))

grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, refit=True)

Error Message: ValueError: Invalid parameter n_estimators for estimator ModelTransformer.
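
The parameter name in the error can be traced in isolation. The following is a minimal sketch, not the original code: it assumes ModelTransformer also inherits from BaseEstimator (the class above does not show this), and it returns a NumPy column instead of a DataFrame to avoid a pandas dependency. Under those assumptions, get_params() exposes the forest's settings under the wrapper's `model` attribute, so the grid key would be `ess__rfc__model__n_estimators` rather than `ess__rfc__n_estimators`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC


# Assumption: BaseEstimator is added so the wrapper gets get_params/set_params.
class ModelTransformer(TransformerMixin, BaseEstimator):
    def __init__(self, model):
        self.model = model

    def fit(self, *args, **kwargs):
        self.model.fit(*args, **kwargs)
        return self

    def transform(self, X, **transform_params):
        # Assumption: a 2-D NumPy column stands in for the DataFrame above.
        return np.asarray(self.model.predict(X)).reshape(-1, 1)


union = FeatureUnion([
    ("rfc", ModelTransformer(RandomForestClassifier(n_estimators=100, random_state=1))),
    ("svc", ModelTransformer(SVC(random_state=1))),
])
pipeline = Pipeline([("ess", union)])

# List every tunable name that ends in n_estimators.
nested = sorted(n for n in pipeline.get_params() if n.endswith("n_estimators"))
print(nested)

# The full path goes through the wrapper's `model` attribute.
pipeline.set_params(ess__rfc__model__n_estimators=200)

# The shorter path stops at ModelTransformer, which has no such parameter.
error_message = None
try:
    pipeline.set_params(ess__rfc__n_estimators=200)
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)
```

With this naming, the `parameters` dict passed to GridSearchCV would use the key `'ess__rfc__model__n_estimators'`.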

1 Answer

+2 votes
by (7.9k points)

Essentially, GridSearchCV is itself an estimator: it implements the fit() and predict() methods that a pipeline uses, so it can be placed inside one.

So instead of:

grid = GridSearchCV(make_pipeline(StandardScaler(), LogisticRegression()), param_grid={'logisticregression__C': [0.1, 10.]}, cv=2, refit=False)

Do this:

clf = make_pipeline(StandardScaler(), GridSearchCV(LogisticRegression(), param_grid={'C': [0.1, 10.]}, cv=2, refit=True))



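This calls StandardScaler() only once per call to clf.fit(), instead of the multiple calls you described. Note that the parameter key is just 'C' here, because the search now wraps LogisticRegression directly rather than a pipeline step.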
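
Both arrangements can be compared side by side. This is a minimal sketch on synthetic data; the dataset from make_classification and the variable names are assumptions for illustration only, not part of the original question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data, used only to make the example runnable.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Variant 1: search over the whole pipeline.
# The scaler is refit for every parameter/fold combination,
# and keys are prefixed with the step name.
grid = GridSearchCV(
    make_pipeline(StandardScaler(), LogisticRegression()),
    param_grid={"logisticregression__C": [0.1, 10.0]},
    cv=2,
)
grid.fit(X, y)

# Variant 2: the search is the last pipeline step.
# The scaler is fit once per clf.fit(), and keys are unprefixed.
clf = make_pipeline(
    StandardScaler(),
    GridSearchCV(LogisticRegression(), param_grid={"C": [0.1, 10.0]}, cv=2, refit=True),
)
clf.fit(X, y)

# make_pipeline names each step after its lowercased class name.
inner_search = clf.named_steps["gridsearchcv"]
print(grid.best_params_, inner_search.best_params_)
print(clf.predict(X[:5]))
```

One trade-off of the second variant: the scaler sees all of the training data before the inner cross-validation splits it, so the validation folds are scaled with statistics that include them.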

