0 votes
in Data Science by (19k points)

I am totally new to Machine Learning and have been working with an unsupervised learning technique.

The screenshot shows my sample data after all cleaning. [Screenshot: Sample Data]

I have built these two pipelines to clean the data:

num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]

num_pipeline = Pipeline([
    ('selector', DataFrameSelector(num_attribs)),
    ('imputer', Imputer(strategy="median")),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler()),
])

cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attribs)),
    ('label_binarizer', LabelBinarizer()),
])

Then I combined the two pipelines with a FeatureUnion, as shown below:

from sklearn.pipeline import FeatureUnion

full_pipeline = FeatureUnion(transformer_list=[
    ("num_pipeline", num_pipeline),
    ("cat_pipeline", cat_pipeline),
])

Now I am trying to call fit_transform on the data, but it raises an error.

Code for Transformation:

housing_prepared = full_pipeline.fit_transform(housing)


Error message: fit_transform() takes 2 positional arguments but 3 were given

1 Answer

0 votes
by (33.1k points)

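The error occurs because Pipeline calls fit_transform(X, y) on each of its steps, while LabelBinarizer.fit_transform() accepts only one data argument. A custom transformer that wraps LabelBinarizer and accepts the extra argument solves this.

Import what is needed and define the new class: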

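from sklearn.base import TransformerMixin
from sklearn.preprocessing import LabelBinarizer

class MyLabelBinarizer(TransformerMixin):
    def __init__(self, *args, **kwargs):
        self.encoder = LabelBinarizer(*args, **kwargs)

    def fit(self, x, y=None):
        # Fit the wrapped encoder; y is accepted only so the pipeline can pass it.
        self.encoder.fit(x)
        return self

    def transform(self, x, y=None):
        return self.encoder.transform(x)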

Keep the rest of your code the same, but in cat_pipeline replace LabelBinarizer() with the class we created, MyLabelBinarizer().
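To see why the wrapper helps, the mismatch can be reproduced without scikit-learn. The sketch below is only an illustration: ToyLabelBinarizer, ToyWrapper and run_step are hypothetical stand-ins that mimic the method signatures involved, not scikit-learn's real implementation.

```python
# Toy stand-ins (hypothetical names, not scikit-learn's real classes) that
# mimic only the method signatures involved in the error.

class ToyLabelBinarizer:
    """Mimics an encoder whose methods accept a single data argument."""

    def fit(self, y):
        self.classes_ = sorted(set(y))
        return self

    def transform(self, y):
        # One-hot encode each label against the classes seen in fit().
        return [[int(label == c) for c in self.classes_] for label in y]

    def fit_transform(self, y):
        return self.fit(y).transform(y)


class ToyWrapper:
    """Mimics the wrapper idea: accept (x, y) but forward only x."""

    def __init__(self):
        self.encoder = ToyLabelBinarizer()

    def fit(self, x, y=None):
        self.encoder.fit(x)
        return self

    def transform(self, x, y=None):
        return self.encoder.transform(x)

    def fit_transform(self, x, y=None):
        return self.fit(x, y).transform(x)


def run_step(step, X, y=None):
    """Calls a step with both X and y, as a pipeline does for each step."""
    return step.fit_transform(X, y)


labels = ["INLAND", "NEAR BAY", "ISLAND", "INLAND"]

try:
    run_step(ToyLabelBinarizer(), labels)
    error_message = None
except TypeError as exc:
    # self + X + y makes 3 positional arguments, but only 2 are accepted.
    error_message = str(exc)

# The wrapper accepts the extra argument, so the same call succeeds.
encoded = run_step(ToyWrapper(), labels)
```

The first call fails with the same "takes 2 positional arguments but 3 were given" message as in the question, while the wrapped version accepts the extra y and simply ignores it.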

Hope this answer helps.

