I want ONLY ONE of my features to be converted to separate binary features:

df["pattern_id"]

Out[202]: 

0       3

1       3

...

7440    2

7441    2

7442    3

Name: pattern_id, Length: 7443, dtype: int64 

Desired result:

df["pattern_id"]

Out[202]: 

0       0 0 1

1       0 0 1

...

7440    0 1 0

7441    0 1 0

7442    0 0 1

Name: pattern_id, Length: 7443, dtype: int64 

I want to use OneHotEncoder. The data is already int, so there should be no need to label-encode it first:

onehotencoder = OneHotEncoder(categorical_features=["pattern_id"])

df = onehotencoder.fit_transform(df).toarray()

ValueError: could not convert string to float: 'http://www.zaragoza.es/sedeelectronica/'

Interestingly enough, I receive an error: sklearn tried to encode another column, not the one I wanted.

So I tried encoding pattern_id as an integer value first, following this link: Issue with OneHotEncoder for categorical features

#transform the pattern_id feature to int

encoding_feature = ["pattern_id"]

enc = LabelEncoder()

enc.fit(encoding_feature)

working_feature = enc.transform(encoding_feature)

working_feature = working_feature.reshape(-1, 1)

ohe = OneHotEncoder(sparse=False)

#convert the pattern_id feature to separate binary features

onehotencoder = OneHotEncoder(categorical_features=working_feature, sparse=False)

df = onehotencoder.fit_transform(df).toarray()

And I get the same error. What am I doing wrong?

Edit

source: https://github.com/martin-varbanov96/scraper/blob/master/logo_scrape/logo_scrape/analysis.py

df

Out[259]: 

      found_img  is_http                                           link_img  \

0          True        0                                  img/aahoteles.svg   

//www.zaragoza.es/cont/paginas/img/sede/logo_e...   

      pattern_id                                       current_link  site_id  \

0              3             https://www.aa-hoteles.com/es/reservas        3   

6              3      https://www.aa-hoteles.com/es/ofertas-hoteles        3   

7              2           http://about.pressreader.com/contact-us/        4   

8              3           http://about.pressreader.com/contact-us/        4   

      status                                   link_id  

0        200               https://www.aa-hoteles.com/  

1        200               https://www.365travel.asia/  

2        200               https://www.365travel.asia/  

3        200               https://www.365travel.asia/  

4        200               https://www.aa-hoteles.com/  

5        200               https://www.aa-hoteles.com/  

6        200               https://www.aa-hoteles.com/  

7        200              http://about.pressreader.com  

8        200              http://about.pressreader.com  

9        200               https://www.365travel.asia/  

10       200               https://www.365travel.asia/  

11       200               https://www.365travel.asia/  

12       200               https://www.365travel.asia/  

13       200               https://www.365travel.asia/  

14       200               https://www.365travel.asia/  

15       200               https://www.365travel.asia/  

16       200               https://www.365travel.asia/  

17       200               https://www.365travel.asia/  

18       200              http://about.pressreade 

[7443 rows x 8 columns]

1 Answer

The categorical_features parameter expects the positions of the columns to encode (or a boolean mask), not column names, so pass the integer index of pattern_id:

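import numpy as np

import pandas as pd

from sklearn.preprocessing import OneHotEncoder

# Create a dataframe of random ints

df = pd.DataFrame(np.random.randint(0, 4, size=(100, 4)),

                  columns=['pattern_id', 'B', 'C', 'D'])

# Look up the position of pattern_id; categorical_features takes indices, not names.

# Note: categorical_features only exists in scikit-learn releases before 0.22.

onehotencoder = OneHotEncoder(categorical_features=[df.columns.tolist().index('pattern_id')])

# fit_transform returns a sparse matrix; call .toarray() on it for a dense array

df = onehotencoder.fit_transform(df)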
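
The snippet above relies on categorical_features, which newer scikit-learn releases no longer accept, and its example frame contains only integers, whereas the frame in the question also holds URL strings. A minimal sketch of the same idea with ColumnTransformer follows, assuming scikit-learn 0.20 or later. The small frame here is a made-up stand-in for the real data, not the asker's actual df.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the frame in the question: one integer
# category column plus a string column that must not be encoded.
df = pd.DataFrame({
    "pattern_id": [3, 3, 2, 2, 3, 1],
    "current_link": ["a", "b", "c", "d", "e", "f"],
    "status": [200, 200, 200, 200, 200, 200],
})

# Apply OneHotEncoder to pattern_id only; all other columns pass through untouched.
ct = ColumnTransformer(
    transformers=[("pattern", OneHotEncoder(), ["pattern_id"])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)
encoded = ct.fit_transform(df)

# The one-hot columns come first (one per distinct pattern_id value),
# followed by the passthrough columns.
print(encoded.shape)
```

Because the string column is passed through rather than encoded, the encoder never tries to convert a URL to a number.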

You can refer to the documentation of OneHotEncoder.
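
If the result is only needed as a DataFrame, pandas can do the same job without scikit-learn. This is an alternative technique, not what the code above does, shown as a sketch on a made-up frame (the column values are placeholders).

```python
import pandas as pd

# Hypothetical frame: pattern_id is the only column to expand.
df = pd.DataFrame({
    "pattern_id": [3, 3, 2, 2, 3],
    "current_link": ["a", "b", "c", "d", "e"],
})

# columns=[...] restricts the expansion to pattern_id; other columns are kept as-is.
dummies = pd.get_dummies(df, columns=["pattern_id"], dtype=int)

print(dummies.columns.tolist())
```

Unlike a fitted OneHotEncoder, get_dummies does not remember the categories it saw, so the encoder-based approach is usually the safer choice when the same encoding must be reapplied to new data.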
