Transformers are classes that implement both fit() and transform() methods.
Classifiers are classes which implement both fit() and predict() methods.
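To make the two contracts concrete, here is a minimal sketch in plain Python. It does not use scikit-learn, and the class names `MeanCenterer` and `ThresholdClassifier` are hypothetical. The transformer learns a statistic in fit() and applies it in transform(). The classifier learns a decision rule in fit() and applies it in predict().

```python
# Hypothetical toy classes (not part of scikit-learn) showing the two
# method contracts described above, on plain lists of numbers.

class MeanCenterer:
    """Transformer: fit() learns the mean, transform() subtracts it."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)  # learned state (sklearn-style trailing underscore)
        return self                   # returning self allows fit(...).transform(...)

    def transform(self, X):
        return [x - self.mean_ for x in X]


class ThresholdClassifier:
    """Classifier: fit() learns a threshold, predict() applies it."""

    def fit(self, X, y):
        # Threshold halfway between the mean of class-0 and class-1 values.
        zeros = [x for x, label in zip(X, y) if label == 0]
        ones = [x for x, label in zip(X, y) if label == 1]
        self.threshold_ = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
        return self

    def predict(self, X):
        return [1 if x > self.threshold_ else 0 for x in X]


centered = MeanCenterer().fit([1.0, 2.0, 3.0]).transform([1.0, 2.0, 3.0])
labels = ThresholdClassifier().fit([1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1]).predict([0.5, 10.0])
# centered == [-1.0, 0.0, 1.0]; labels == [0, 1]
```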
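A pipeline is a series of processing steps chained together, where the output of each step becomes the input of the next: data goes in at the first step and the result comes out of the last. In scikit-learn's Pipeline, every intermediate step must be a transformer (it needs fit() and transform()), while the final step can be any estimator, such as a classifier. The pipeline then exposes the methods of that final step, for example predict() when it ends in a classifier. Pipeline also lets you run a grid search over the parameters of every step at once, and it keeps code concise because the transformers and the final predictor are encapsulated in a single object.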
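The chaining logic can be sketched in a few lines of plain Python. `MiniPipeline`, `MinMax` and `CentroidClassifier` are hypothetical names for illustration, not scikit-learn's implementation. The real Pipeline also handles caching, parameter naming for grid search, and more. The key idea it shows is that fit() fits and transforms through each intermediate step, while predict() only transforms, reusing what was learned on the training data, before calling the final estimator.

```python
# Hypothetical toy classes (not scikit-learn's implementation) on 1-D lists.

class MinMax:
    """Transformer: rescales values to [0, 1] using the training min and max."""

    def fit(self, X, y=None):
        self.lo_, self.hi_ = min(X), max(X)
        return self

    def transform(self, X):
        return [(x - self.lo_) / (self.hi_ - self.lo_) for x in X]


class CentroidClassifier:
    """Classifier: predicts the label whose training mean is closest."""

    def fit(self, X, y):
        self.centroids_ = {}
        for label in set(y):
            members = [x for x, t in zip(X, y) if t == label]
            self.centroids_[label] = sum(members) / len(members)
        return self

    def predict(self, X):
        return [min(self.centroids_, key=lambda c: abs(x - self.centroids_[c]))
                for x in X]


class MiniPipeline:
    """Chains (name, step) pairs; all steps but the last must be transformers."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y=None):
        for _, step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)  # each output feeds the next step
        self.steps[-1][1].fit(X, y)           # final estimator sees transformed data
        return self

    def predict(self, X):
        for _, step in self.steps[:-1]:
            X = step.transform(X)             # no refitting: reuse state from fit()
        return self.steps[-1][1].predict(X)


pipe = MiniPipeline([("scale", MinMax()), ("cls", CentroidClassifier())])
pipe.fit([0.0, 10.0, 20.0, 30.0], [0, 0, 1, 1])
preds = pipe.predict([5.0, 25.0])  # [0, 1]
```

Note that at predict time the scaler still uses the min and max seen during fit(), which is exactly why bundling the steps in one object avoids leaking test data into the preprocessing.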
For example:
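from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
# Xtrain, Xtest: lists of raw text documents; ytrain: labels for Xtrain (assumed to exist)
pipeline = Pipeline([('vector', CountVectorizer()),   # text -> token counts
                     ('trans', TfidfTransformer()),   # counts -> TF-IDF weights
                     ('cls', SGDClassifier())])       # linear classifier on TF-IDF features
pipeline.fit(Xtrain, ytrain)      # a classifier needs the labels as well as the data
prdct = pipeline.predict(Xtest)   # transform Xtest with the fitted steps, then predict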
For your second question,
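fit(self, X, y=None, **fit_params)
Fits the model: fits all the transformers one after the other, transforming the data at each step, and finally fits the final estimator on the transformed data.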
Parameters:
X : iterable
Training data. Must fulfill the input requirements of the first step of the pipeline.
y : iterable, default=None
Training targets. Must fulfill the label requirements for all steps of the pipeline.
**fit_params : dict of string -> object
Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.
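The s__p naming convention can be illustrated with a small plain-Python helper. `route_fit_params` is hypothetical and only mirrors the idea of splitting each key at the first `__` to find which step receives which keyword argument; it is not scikit-learn's internal code.

```python
def route_fit_params(step_names, fit_params):
    """Hypothetical helper: split {'<step>__<param>': value} into per-step
    keyword dicts, following the s__p naming convention described above."""
    routed = {name: {} for name in step_names}
    for key, value in fit_params.items():
        step, sep, param = key.partition("__")  # split at the first "__" only
        if not sep or step not in routed:
            raise ValueError(f"expected '<step>__<param>' with a known step, got {key!r}")
        routed[step][param] = value
    return routed


routed = route_fit_params(["vector", "trans", "cls"],
                          {"cls__sample_weight": [1.0, 2.0, 0.5]})
# routed["cls"] == {"sample_weight": [1.0, 2.0, 0.5]}; "vector" and "trans" get {}
```

With the scikit-learn pipeline from the example, the equivalent call would be `pipeline.fit(Xtrain, ytrain, cls__sample_weight=w)`, where `w` is a hypothetical array of per-sample weights passed on to SGDClassifier.fit. The same prefix convention names parameters in a grid search, e.g. `{'cls__alpha': [1e-4, 1e-3]}`.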
fit_transform(self, X, y=None, **fit_params)
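Fits the model and transforms with the final estimator: fits all the transformers one after the other, transforming the data at each step, then calls fit_transform on the transformed data using the final estimator. This only works when the final estimator can itself transform data, so it fails for a pipeline that ends in a classifier such as SGDClassifier.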
Parameters:
X : iterable
Training data. Must fulfill the input requirements of the first step of the pipeline.
y : iterable, default=None
Training targets. Must fulfill the label requirements for all steps of the pipeline.
**fit_params : dict of string -> object
Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.
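A minimal plain-Python sketch of what fit_transform does. `Scale`, `Shift` and `pipeline_fit_transform` are hypothetical names, not scikit-learn code. Every intermediate step is fitted and applied in turn. The final step then uses its own fit_transform if it has one, and otherwise falls back to fit() followed by transform().

```python
class Scale:
    """Hypothetical transformer: divides by the largest absolute training value."""

    def fit(self, X, y=None):
        self.max_ = max(abs(x) for x in X)
        return self

    def transform(self, X):
        return [x / self.max_ for x in X]


class Shift:
    """Hypothetical transformer: subtracts the training mean."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


def pipeline_fit_transform(steps, X, y=None):
    """Sketch of fit_transform: fit and transform every intermediate step,
    then let the final estimator fit and transform the result."""
    for step in steps[:-1]:
        X = step.fit(X, y).transform(X)
    last = steps[-1]
    if hasattr(last, "fit_transform"):
        return last.fit_transform(X, y)
    return last.fit(X, y).transform(X)  # fallback; needs a transform() method


out = pipeline_fit_transform([Scale(), Shift()], [2.0, 4.0, 6.0, 8.0])
# Scale -> [0.25, 0.5, 0.75, 1.0]; Shift (mean 0.625) -> [-0.375, -0.125, 0.125, 0.375]
```

If the last step were a classifier with only fit() and predict(), the fallback line would raise AttributeError, which matches the restriction described above.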