Python - What is exactly sklearn.pipeline.Pipeline?

Question

2 Answers

Shrutiparna · Answer 1 · 2019-05-29T10:19:10+0000

Transformers are classes that implement both fit() and transform() methods.

Classifiers are classes which implement both fit() and predict() methods.

A Pipeline is a series of algorithms chained and composed together to process a stream of data: it takes inputs and gives outputs. ML pipelines mostly expose “fit” and “transform” methods, plus “predict” when the last step is a predictor. Because a Pipeline is itself a meta-estimator, it lets you run a grid search over the parameters of every step at once. A Pipeline also keeps code concise, since it encapsulates the transformers and the predictor in a single object.

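For example:

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
pipeline = Pipeline([('vector', CountVectorizer()),
                     ('trans', TfidfTransformer()),
                     ('cls', SGDClassifier())])
# ytrain (the training labels) is assumed here; the classifier cannot be fitted without it
prdct_train = pipeline.fit(Xtrain, ytrain).predict(Xtrain)
prdct = pipeline.predict(Xtest)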
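
The grid search mentioned above could look roughly like the following. This is only a sketch: the toy documents, the labels, and the particular parameter values are invented for illustration. The part that matters is the key format, where each grid key is the step name, two underscores, then the parameter name.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy corpus and labels, made up so the sketch runs on its own.
docs = ["good movie", "great film", "nice plot", "loved it",
        "bad movie", "awful film", "boring plot", "hated it"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("vector", CountVectorizer()),
    ("trans", TfidfTransformer()),
    ("cls", SGDClassifier(random_state=0)),
])

# Keys are "<step name>__<parameter name>", so one grid covers several steps.
param_grid = {
    "vector__ngram_range": [(1, 1), (1, 2)],
    "cls__alpha": [1e-4, 1e-3],
}

search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(docs, labels)

best_params = search.best_params_
print(best_params)
```

Every candidate is cross-validated with the whole pipeline refitted on each training fold, so the vectorizer never sees the held-out documents.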

For your second question,

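fit(self, X, y=None, **fit_params)

fit() is called to fit the model: each transformer is fitted and applied in turn, and the final estimator is then fitted on the transformed data.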

Parameters:

X : iterable

Training data. It fulfills the input requirements of the first step of the pipeline.

y : iterable, default=None

Training targets. It fulfills label requirements for all steps of the pipeline.

**fit_params : dict of string -> object

Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.
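
As a rough illustration of the s__p naming, the sketch below passes sample weights to the fit method of the last step only. The data, the weights, and the step names "scaler" and "cls" are assumptions made for the example. It also assumes scikit-learn's default configuration, where metadata routing is not enabled.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only so the example is self-contained.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# Hypothetical weights: count positive samples twice as much.
weights = np.where(y == 1, 2.0, 1.0)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("cls", SGDClassifier(random_state=0)),
])

# "cls__sample_weight" means: pass sample_weight to the fit method of step "cls".
pipe.fit(X, y, cls__sample_weight=weights)
preds = pipe.predict(X)
print(preds[:10])
```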

fit_transform(self, X, y=None, **fit_params)

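It fits all the transformers in order, then calls fit_transform on the final estimator using the data transformed by the earlier steps. It is only available when the final estimator is itself a transformer.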

Parameters:

X : iterable

Training data. It fulfills the input requirements of the first step of the pipeline.

y : iterable, default=None

Training targets. It fulfills label requirements for all steps of the pipeline.

**fit_params : dict of string -> object

Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.
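
A small sketch of fit_transform on a pipeline made only of transformers follows. StandardScaler followed by PCA and the random input data are assumptions picked for illustration. The result is compared against chaining the two steps by hand.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Random data, used only so the example is self-contained.
rng = np.random.RandomState(0)
X = rng.normal(size=(50, 4))

# The last step (PCA) is a transformer, so fit_transform is available.
pipe = Pipeline([("scaler", StandardScaler()), ("pca", PCA(n_components=2))])
Xt = pipe.fit_transform(X)

# The same computation written out step by step.
scaled = StandardScaler().fit_transform(X)
manual = PCA(n_components=2).fit_transform(scaled)

same = np.allclose(Xt, manual)
print(Xt.shape, same)
```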

Anurag · Answer 2 · 2019-06-18T11:31:38+0000

In machine learning, we often repeat the same sequence of steps to process data or train a model. Redoing those steps by hand for every new set of values is inefficient, so we use pipelines.

Pipeline: We use a pipeline to assemble several steps that can be cross-validated together while setting different parameters.

Transformer: in scikit-learn, a class that has fit and transform methods, or a fit_transform method, and transforms the data according to the parameters defined in the pipeline.

Predictor: a class that has fit and predict methods, or a fit_predict method, and makes predictions from the values passed through it.

For example:

import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

winedf = pd.read_csv('winequality-red.csv', sep=';')
steps = [('scaler', StandardScaler()), ('SVM', SVC())]
pipeline = Pipeline(steps)

A pipeline enforces the order in which the steps are applied, which helps reproducibility and gives a convenient workflow.
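
To make the fixed ordering concrete, here is a sketch that fits the same scaler and SVM pipeline and compares it with the steps written out by hand. It swaps in scikit-learn's bundled load_wine dataset for winequality-red.csv so that it runs without the file. That substitution and the train/test split are assumptions of the example.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Bundled dataset used as a stand-in for the CSV file.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pipeline version: scaling is always applied before the SVM.
pipeline = Pipeline([("scaler", StandardScaler()), ("SVM", SVC())])
pipeline.fit(X_train, y_train)
pipe_pred = pipeline.predict(X_test)

# Manual version: the same steps, in the same order, written out by hand.
scaler = StandardScaler().fit(X_train)
svm = SVC().fit(scaler.transform(X_train), y_train)
manual_pred = svm.predict(scaler.transform(X_test))

same = np.array_equal(pipe_pred, manual_pred)
print(same)
```

With the pipeline, the scaler is fitted on the training data only and reused at prediction time, which is easy to get wrong when the steps are chained manually.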

When we call pipeline.fit(), each transformer in the pipeline is fitted on the output of the previous transformer (the first transformer is fitted on the raw dataset). The last estimator may be a transformer or a predictor. You can call fit_transform() or transform() on the pipeline only if the last estimator is a transformer (one that implements fit_transform, or fit and transform separately), and you can call fit_predict() or predict() only if the last estimator is a predictor. So you cannot call fit_transform or transform on a pipeline whose last step is a predictor.
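
One way to see this rule is to check which methods each kind of pipeline exposes. This is a sketch under the assumption that Pipeline only exposes transform and predict when its final step provides them. The two example pipelines are made up for the purpose.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Last step is a transformer.
ends_with_transformer = Pipeline([("scaler", StandardScaler()), ("pca", PCA())])
# Last step is a predictor.
ends_with_predictor = Pipeline([("scaler", StandardScaler()), ("SVM", SVC())])

# Record which methods each pipeline makes available.
checks = {
    "transformer_has_transform": hasattr(ends_with_transformer, "transform"),
    "transformer_has_predict": hasattr(ends_with_transformer, "predict"),
    "predictor_has_predict": hasattr(ends_with_predictor, "predict"),
    "predictor_has_transform": hasattr(ends_with_predictor, "transform"),
}
for name, available in checks.items():
    print(name, available)
```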
