+1 vote
6 views
in Python by (250 points)

I can't figure out exactly how sklearn.pipeline.Pipeline works.

There is some explanation in the documentation. What is the meaning of

Pipeline of transforms with a final estimator.

To make my question clearer:

I am trying to figure out how I can fit a transformer, and how an estimator can be a transformer.

What happens when I call pipln.fit() or pipln.fit_transform()? How do they work?

Here is an example where I create a pipeline and pass it two transformers and one estimator:

pipln = Pipeline([("t1", transformer1),
                  ("t2", transformer2),
                  ("est", estimator)])

2 Answers

+3 votes
by (10.9k points)
Transformers are classes which implement both fit() and transform() methods.

Classifiers are classes which implement both fit() and predict() methods.
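To make the transformer definition concrete, here is a minimal sketch of a hand-written transformer. The class name MeanCenterer and the toy array are invented for illustration; the only scikit-learn pieces assumed are the BaseEstimator and TransformerMixin base classes, where TransformerMixin supplies fit_transform() on top of the fit() and transform() you write.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class MeanCenterer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: subtracts the column means learned in fit()."""

    def fit(self, X, y=None):
        # Learn the per-column means from the training data
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        # Apply what was learned in fit() to any data
        return np.asarray(X, dtype=float) - self.mean_


X = np.array([[1.0, 10.0], [3.0, 30.0]])
# fit_transform() comes from TransformerMixin: it calls fit() and then transform()
centered = MeanCenterer().fit_transform(X)
```

An object like this can be used as any non-final step of a Pipeline, or as the final step if you only want transformed data out.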

A pipeline is a series of algorithms chained together to process a stream of data: it takes inputs and produces outputs. The intermediate steps of a scikit-learn Pipeline must implement fit() and transform(); the final step only needs fit(). Because the pipeline is itself an estimator, you can run a grid search over the parameters of all its steps at once. It also keeps the code concise, since it wraps the transformers and the predictor in a single object.

For example-

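from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier

pipeline = Pipeline([('vector', CountVectorizer()),
                     ('trans', TfidfTransformer()),
                     ('cls', SGDClassifier())])

# ytrain (the training labels) is assumed here: a classifier cannot be fitted without labels
prdct = pipeline.fit(Xtrain, ytrain).predict(Xtrain)
prdct = pipeline.predict(Xtest)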
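The grid search mentioned above could look roughly like the sketch below. The toy corpus, the labels, and the candidate parameter values are all made up for illustration. The point is the step__parameter naming, which lets GridSearchCV reach into each step of the pipeline.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy corpus invented for this sketch: 1 = about pets, 0 = about vehicles
docs = ["the cat purrs", "my dog barks", "a kitten sleeps", "the puppy plays",
        "the car is fast", "my truck is loud", "a van drives", "the engine roars"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("vector", CountVectorizer()),
    ("trans", TfidfTransformer()),
    ("cls", SGDClassifier(random_state=0)),
])

# Keys are "<step name>__<parameter name>"
param_grid = {
    "vector__ngram_range": [(1, 1), (1, 2)],
    "cls__alpha": [1e-4, 1e-2],
}

# Each candidate refits the whole pipeline inside every cross-validation fold
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(docs, labels)
best = search.best_params_
```

Since the vectorizer is refitted inside each fold, the vocabulary is never learned from the held-out documents.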

For your second question,

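fit(self, X, y=None, **fit_params)

The fit() method fits the whole pipeline: it fits each transformer in turn and transforms the data with it, then fits the final estimator on the transformed data.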

Parameters:

X : iterable

Training data. It fulfills the input requirements of the first step of the pipeline.

y : iterable, default=None

Training targets. It fulfills label requirements for all steps of the pipeline.

**fit_params : dict of string -> object

Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.
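As a hedged illustration of the s__p naming: the sketch below passes sample_weight to the fit method of the step named "clf". The data and weights are invented, and DummyClassifier is used only because it makes the effect of the weights easy to see. This assumes scikit-learn's default behaviour, with metadata routing left disabled.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Invented toy data: three samples of class 0, one of class 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 0, 1])
weights = np.array([1.0, 1.0, 1.0, 9.0])  # the single class-1 sample weighs most

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", DummyClassifier(strategy="prior")),
])

# Without weights, class 0 is the majority and is always predicted
unweighted_pred = pipe.fit(X, y).predict(X)

# "clf__sample_weight" means: pass sample_weight to the fit() of step "clf"
weighted_pred = pipe.fit(X, y, clf__sample_weight=weights).predict(X)
```

With the weights applied, class 1 carries the larger total weight, so the prediction switches from 0 to 1.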

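fit_transform(self, X, y=None, **fit_params)

It fits each transformer in turn and transforms the data with it, then calls fit_transform on the final estimator with the transformed data. It is only available if the final estimator is itself a transformer.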

Parameters:

X : iterable

Training data. It fulfills the input requirements of first step of the pipeline.

y : iterable, default=None

Training targets. It fulfills label requirements for all steps of the pipeline.

**fit_params : dict of string -> object

Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.

 

0 votes
by (33.1k points)

In machine learning, we often repeat the same sequence of steps to process data or train a model. Repeating those steps by hand for every new set of values is inefficient, so we use pipelines.

Pipeline: assembles several steps so that they can be cross-validated together while setting different parameters.

Transformer: in scikit-learn, a class that has fit and transform methods (or a fit_transform method) and transforms the data according to what it learned during fit.

Predictor: a class that has fit and predict methods (or a fit_predict method) and makes predictions for the data passed to it.

For example:

import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

winedf = pd.read_csv('winequality-red.csv', sep=';')

steps = [('scaler', StandardScaler()), ('SVM', SVC())]
pipeline = Pipeline(steps)

A pipeline enforces the order in which the steps are applied, which helps with reproducibility and makes for a convenient workflow.
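Cross-validating such a pipeline as a single unit might look like the sketch below. It swaps the CSV file for scikit-learn's bundled load_wine dataset (a different wine dataset, used here only so the example runs without a download), and cv=5 is an arbitrary choice.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Bundled dataset used as a stand-in for winequality-red.csv
X, y = load_wine(return_X_y=True)

pipeline = Pipeline([("scaler", StandardScaler()), ("SVM", SVC())])

# In every fold the scaler is fitted on the training part only,
# then applied to the held-out part before the SVM predicts
scores = cross_val_score(pipeline, X, y, cv=5)
mean_accuracy = scores.mean()
```

Scaling inside the pipeline, rather than once up front, keeps statistics from the held-out fold out of the training step.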

When we call pipln.fit(), each transformer in the pipeline is fitted on the output of the previous transformer (the first transformer is fitted on the raw dataset), and the last estimator is fitted on the output of the last transformer. The last estimator may be a transformer or a predictor. You can call fit_transform() or transform() on the pipeline only if the last estimator is a transformer (one that implements fit_transform, or fit and transform separately). You can call fit_predict() or predict() on the pipeline only if the last estimator is a predictor. So you can't call fit_transform or transform on a pipeline whose last step is a predictor.
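The behaviour described above can be checked with a small sketch. The random data is invented, and PCA is used only as an example of a final step that is a transformer. The first part redoes by hand what pipe.fit() and pipe.predict() do; the second part shows which methods each pipeline exposes.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Invented toy data: 40 samples, 3 features, binary target
rng = np.random.RandomState(0)
X = rng.normal(size=(40, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Pipeline whose last step is a predictor
pipe = Pipeline([("scaler", StandardScaler()), ("svm", SVC())])
pipe.fit(X, y)

# The same thing done by hand: fit and apply the transformer, then fit the estimator
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
svm = SVC().fit(X_scaled, y)
manual_pred = svm.predict(scaler.transform(X))
same_predictions = np.array_equal(pipe.predict(X), manual_pred)

# SVC has no transform(), so the pipeline does not expose one either
can_transform = hasattr(pipe, "transform")

# Pipeline whose last step is a transformer: fit_transform works, predict does not exist
pipe_t = Pipeline([("scaler", StandardScaler()), ("pca", PCA(n_components=2))])
X_reduced = pipe_t.fit_transform(X)
can_predict = hasattr(pipe_t, "predict")
```

Here same_predictions comes out True, while can_transform and can_predict are both False, matching the rule about the last step.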
