
I want ONLY ONE of my features to be converted to separate binary features:

df["pattern_id"]

Out[202]: 

0       3

1       3

...

7440    2

7441    2

7442    3

Name: pattern_id, Length: 7443, dtype: int64 

to:

df["pattern_id"]

Out[202]: 

0       0 0 1

1       0 0 1

...

7440    0 1 0

7441    0 1 0

7442    0 0 1

Name: pattern_id, Length: 7443, dtype: int64 

I want to use OneHotEncoder. The data is already int, so there should be no need to label-encode it first:

onehotencoder = OneHotEncoder(categorical_features=["pattern_id"])

df = onehotencoder.fit_transform(df).toarray()

ValueError: could not convert string to float: 'http://www.zaragoza.es/sedeelectronica/'

Interestingly, I receive an error: sklearn tried to encode another column, not the one I wanted.

We have to encode pattern_id to be an integer value

I used this link: Issue with OneHotEncoder for categorical features

#transform the pattern_id feature to int

encoding_feature = ["pattern_id"]

enc = LabelEncoder()

enc.fit(encoding_feature)

working_feature = enc.transform(encoding_feature)

working_feature = working_feature.reshape(-1, 1)

ohe = OneHotEncoder(sparse=False)

#convert the pattern_id feature to separate binary features

onehotencoder = OneHotEncoder(categorical_features=working_feature, sparse=False)

df = onehotencoder.fit_transform(df).toarray()

And I get the same error. What am I doing wrong?

Edit

source: https://github.com/martin-varbanov96/scraper/blob/master/logo_scrape/logo_scrape/analysis.py

df

Out[259]: 

      found_img  is_http                                           link_img  \

0          True        0                                  img/aahoteles.svg   

//www.zaragoza.es/cont/paginas/img/sede/logo_e...   

      pattern_id                                       current_link  site_id  \

0              3             https://www.aa-hoteles.com/es/reservas        3   

6              3      https://www.aa-hoteles.com/es/ofertas-hoteles        3   

7              2           http://about.pressreader.com/contact-us/        4   

8              3           http://about.pressreader.com/contact-us/        4   

      status                                   link_id  

0        200               https://www.aa-hoteles.com/  

1        200               https://www.365travel.asia/  

2        200               https://www.365travel.asia/  

3        200               https://www.365travel.asia/  

4        200               https://www.aa-hoteles.com/  

5        200               https://www.aa-hoteles.com/  

6        200               https://www.aa-hoteles.com/  

7        200              http://about.pressreader.com  

8        200              http://about.pressreader.com  

9        200               https://www.365travel.asia/  

10       200               https://www.365travel.asia/  

11       200               https://www.365travel.asia/  

12       200               https://www.365travel.asia/  

13       200               https://www.365travel.asia/  

14       200               https://www.365travel.asia/  

15       200               https://www.365travel.asia/  

16       200               https://www.365travel.asia/  

17       200               https://www.365travel.asia/  

18       200              http://about.pressreade 

[7443 rows x 8 columns]

1 Answer


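The categorical_features parameter expects column indices or a boolean mask rather than column names, so pass the position of pattern_id in the dataframe. The following works on an all-numeric dataframe: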

import numpy as np

import pandas as pd

from sklearn.preprocessing import OneHotEncoder

# Create a dataframe of random ints

df = pd.DataFrame(np.random.randint(0, 4, size=(100, 4)),

                  columns=['pattern_id', 'B', 'C', 'D'])

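# categorical_features takes column indices, not names (parameter exists only in scikit-learn < 0.22)

onehotencoder = OneHotEncoder(categorical_features=[df.columns.tolist().index('pattern_id')])

# Returns a sparse matrix: the encoded columns come first, the untouched columns follow

df = onehotencoder.fit_transform(df)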
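The categorical_features parameter was deprecated in scikit-learn 0.20 and removed in 0.22, so the snippet above will not run on recent versions. The dataframe in the question also contains string columns, which is what the ValueError complains about. A minimal sketch of one possible alternative, assuming scikit-learn 0.20 or later, is to wrap the encoder in a ColumnTransformer so that only pattern_id is encoded. The dataframe `toy` and its values are made up for illustration and only mimic a few columns from the question:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the real data: one int column to encode,
# one string column and one int column that must be left alone
toy = pd.DataFrame({
    "pattern_id": [3, 3, 2, 2, 3, 1],
    "current_link": ["a", "b", "c", "d", "e", "f"],
    "status": [200, 200, 200, 200, 200, 404],
})

# Encode only pattern_id; all remaining columns are passed through unchanged.
# sparse_threshold=0 forces a dense array as the result.
ct = ColumnTransformer(
    transformers=[("pattern", OneHotEncoder(), ["pattern_id"])],
    remainder="passthrough",
    sparse_threshold=0,
)
encoded = ct.fit_transform(toy)

# The one-hot columns come first (categories sorted as 1, 2, 3),
# followed by current_link and status
print(encoded[:3])
```

Because the string column is passed through, the result is an object array; select only numeric columns first if a float matrix is needed.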

You can refer to the documentation of OneHotEncoder.
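If the result should stay a pandas DataFrame with named columns, pandas.get_dummies may be simpler than OneHotEncoder. This is a swapped-in technique, not the method used in the answer above. It is a sketch on a made-up dataframe `toy`; the `columns` argument restricts the expansion to pattern_id:

```python
import pandas as pd

# Hypothetical sample data, not taken from the question
toy = pd.DataFrame({
    "pattern_id": [3, 3, 2, 2, 3],
    "status": [200, 200, 200, 200, 200],
})

# Only pattern_id is expanded into indicator columns; status is kept as it is
dummies = pd.get_dummies(toy, columns=["pattern_id"], dtype=int)
print(dummies)
```

Note that get_dummies only creates columns for values present in the data, so a pattern_id that never occurs gets no column. For a train/test split, fitting a OneHotEncoder on the training data keeps the set of columns consistent.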
