
I've just discovered the Pipeline feature of scikit-learn, and I find it very useful for testing different combinations of preprocessing steps before training my model.

A pipeline is a chain of objects that implement the fit and transform methods (the final step only needs fit). So far, whenever I wanted to add a new preprocessing step, I wrote a class that inherits from sklearn.base.BaseEstimator. I suspect there must be a simpler way, though. Do I really need to wrap every function I want to apply in an estimator class?

Example:

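import sklearn.base

class Categorizer(sklearn.base.TransformerMixin, sklearn.base.BaseEstimator):
    """
    Converts given columns into pandas dtype 'category'.
    """

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing to learn; fit only exists so the class can be a Pipeline step
        return self

    def transform(self, X):
        X = X.copy()  # don't modify the caller's DataFrame in place
        for column in self.columns:
            X[column] = X[column].astype("category")
        return X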

1 Answer


You don't need a separate estimator class for every function. If a step is a stateless function (one that learns nothing during fit), you can write a single generic wrapper class once and reuse it for any such function.

Refer to the code below for an example:

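import sklearn.base

class TransformerWrapper(sklearn.base.TransformerMixin, sklearn.base.BaseEstimator):

    def __init__(self, func):
        # Keep the attribute name equal to the argument name; BaseEstimator's
        # get_params() (used by clone() and grid search) relies on this
        self.func = func

    def fit(self, X, y=None):
        # The wrapped function is stateless, so there is nothing to fit
        return self

    def transform(self, X, *args, **kwargs):
        return self.func(X, *args, **kwargs)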
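A minimal sketch of the wrapper used inside a real Pipeline, assuming a NumPy feature matrix and a LinearRegression as the final step. The helper `double` and the toy data are hypothetical, chosen only for illustration, and the class is repeated so the snippet runs on its own:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline


class TransformerWrapper(TransformerMixin, BaseEstimator):
    """Wrap a stateless function so it can be used as a Pipeline step."""

    def __init__(self, func):
        # Same name as the __init__ argument so get_params()/clone() work
        self.func = func

    def fit(self, X, y=None):
        # Nothing to learn: the wrapped function is stateless
        return self

    def transform(self, X):
        return self.func(X)


def double(X):
    # Hypothetical preprocessing function used only for this example
    return X * 2


X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])

pipe = Pipeline([
    ("double", TransformerWrapper(double)),
    ("model", LinearRegression()),
])
pipe.fit(X, y)
print(pipe.named_steps["double"].transform(X).ravel())  # [2. 4. 6.]
```

Because the function is stored as `self.func`, `pipe.get_params()` exposes it under the key `double__func`, so the step can be cloned or swapped in a grid search like any built-in transformer.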

You can then apply TransformerWrapper as a decorator:

@TransformerWrapper
def foo(x):
    return x * 2

This is equivalent to writing:

def foo(x):
    return x * 2

foo = TransformerWrapper(foo)
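One consequence of the decorator form, shown in a small sketch (the toy input array is an assumption): after decoration the name `foo` refers to a TransformerWrapper instance rather than a function, so you call `transform` or `fit_transform` on it, while the original function stays reachable as `foo.func`. The class is repeated so the snippet is self-contained:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class TransformerWrapper(TransformerMixin, BaseEstimator):
    def __init__(self, func):
        self.func = func

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        return self.func(X)


@TransformerWrapper
def foo(x):
    return x * 2


# `foo` is now a transformer object, not a plain function
print(isinstance(foo, TransformerWrapper))  # True
# fit_transform comes from TransformerMixin: fit(X) followed by transform(X)
print(foo.fit_transform(np.array([1, 2, 3])))  # [2 4 6]
# The undecorated function is still available through the stored parameter
print(foo.func(5))  # 10
```

The trade-off is that `foo(3)` no longer works as a plain call (the wrapper object is not callable), so the explicit `TransformerWrapper(foo)` form under a different name can be preferable when you still need the function elsewhere.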

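TransformerWrapper is essentially a stripped-down version of sklearn.preprocessing.FunctionTransformer, which ships with scikit-learn and adds options such as inverse_func, kw_args and optional input validation (validate).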

You can also use the scikit-learn class directly, including as a decorator:

from sklearn.preprocessing import FunctionTransformer

@FunctionTransformer
def foo(x):
    return x * 2
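Applied to the Categorizer from the question, a sketch could look like the following. The decorator form cannot pass extra arguments, so here FunctionTransformer is constructed explicitly and the column list goes through its `kw_args` parameter. The `categorize` helper and the sample DataFrame are hypothetical:

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def categorize(X, columns):
    """Return a copy of X with the given columns cast to the pandas 'category' dtype."""
    X = X.copy()  # avoid mutating the caller's DataFrame
    for column in columns:
        X[column] = X[column].astype("category")
    return X


df = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})

pipe = Pipeline([
    # kw_args are forwarded to `categorize` on every transform call
    ("categorize", FunctionTransformer(categorize, kw_args={"columns": ["color"]})),
])
out = pipe.fit_transform(df)
print(out["color"].dtype)  # category
```

FunctionTransformer does not validate input by default (validate=False), so the DataFrame reaches `categorize` unchanged, and calling `copy()` first keeps the original `df` reusable across pipeline experiments.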

