0 votes
in Data Science by (19k points)

I am totally new to Machine Learning and I have been working with an unsupervised learning technique.

The screenshot shows my sample data (after all cleaning): Sample Data

I have built these two pipelines to clean the data:

num_attribs = list(housing_num)

cat_attribs = ["ocean_proximity"]


num_pipeline = Pipeline([

    ('selector', DataFrameSelector(num_attribs)),

    ('imputer', Imputer(strategy="median")),

    ('attribs_adder', CombinedAttributesAdder()),

    ('std_scaler', StandardScaler()),

])


cat_pipeline = Pipeline([

    ('selector', DataFrameSelector(cat_attribs)),

    ('label_binarizer', LabelBinarizer())

])


Then I combined these two pipelines into a union, using the code below:

from sklearn.pipeline import FeatureUnion

full_pipeline = FeatureUnion(transformer_list=[

        ("num_pipeline", num_pipeline),

        ("cat_pipeline", cat_pipeline),

])


Now I am trying to call fit_transform on the data, but it raises an error.

Code for the transformation:

housing_prepared = full_pipeline.fit_transform(housing)


Error message: fit_transform() takes 2 positional arguments but 3 were given

1 Answer

0 votes
by (33.1k points)

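The error occurs because the pipeline calls fit_transform(X, y) on each step, while LabelBinarizer.fit_transform(y) accepts only one argument besides self. You can solve this by wrapping LabelBinarizer in a custom transformer whose methods accept the extra argument: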

Import and make a new class:

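from sklearn.base import TransformerMixin

from sklearn.preprocessing import LabelBinarizer

class MyLabelBinarizer(TransformerMixin):

    def __init__(self, *args, **kwargs):

        self.encoder = LabelBinarizer(*args, **kwargs)

    def fit(self, x, y=0):

        # Fit the wrapped encoder; y is accepted only for pipeline compatibility

        self.encoder.fit(x)

        return self

    def transform(self, x, y=0):

        return self.encoder.transform(x)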

Keep the rest of your code the same. In cat_pipeline, replace LabelBinarizer() with the class defined above, MyLabelBinarizer().
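To see where the error message comes from, here is a minimal sketch in plain Python that does not use scikit-learn. The classes OneArgBinarizer and TwoArgWrapper and the helper run_last_step are hypothetical stand-ins, assumed only for illustration: the first mimics a fit_transform(self, y) signature, the second mimics the wrapper approach, and the helper imitates a pipeline passing both X and y to its step.

```python
class OneArgBinarizer:
    """Hypothetical stand-in: fit_transform takes only one argument besides self."""

    def fit_transform(self, y):
        classes = sorted(set(y))
        # One row per label, one 0/1 column per class
        return [[int(value == cls) for cls in classes] for value in y]


class TwoArgWrapper:
    """Hypothetical wrapper: accepts (X, y) and forwards only X to the encoder."""

    def __init__(self):
        self.encoder = OneArgBinarizer()

    def fit_transform(self, X, y=None):
        return self.encoder.fit_transform(X)


def run_last_step(step, X, y=None):
    # Imitates a pipeline: X and y are both passed positionally to the step
    return step.fit_transform(X, y)


labels = ["NEAR BAY", "INLAND", "NEAR BAY"]

try:
    run_last_step(OneArgBinarizer(), labels)
    error_message = None
except TypeError as exc:
    # self + X + y = 3 positional arguments, but only 2 are accepted
    error_message = str(exc)

encoded = run_last_step(TwoArgWrapper(), labels)
print(error_message)
print(encoded)
```

The first call fails with the same "takes 2 positional arguments but 3 were given" message, because self, X and y add up to three arguments. The wrapped version succeeds because its method signature has room for y and simply ignores it.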

Hope this answer helps.
