in Data Science by (11.2k points)

I have a dataframe of customers with information about their activity, and I've built a model that predicts whether they buy the product or not. My label is the column 'did_buy', which is 1 if a customer bought and 0 if not. The model currently uses only the numeric columns, but I'd also like to include the categorical columns, and I'm not sure how to convert them for use in my X train. Here is a glimpse of my dataframe columns:

Company_Sector          Company_size  DMU_Final  Joining_Date  Country
Finance and Insurance   10            End User   2010-04-13    France
Public Administration   1             End User   2004-09-22    France

Some more columns:

linkedin_shared_connections  online_activity  did_buy  Sale_Date
11                           65               1        2016-05-23
13                           100              1        2016-01-12

1 Answer

by (16.1k points)

To convert a categorical column to integer codes, you can use scikit-learn's LabelEncoder. It maps each distinct category to an integer, and the mapping can be reversed later:

    # Import libraries
    from sklearn import preprocessing
    import pandas as pd

    # Create a label encoder object and fit it to the Country column
    label_encoder = preprocessing.LabelEncoder()
    label_encoder.fit(df['Country'])

    # View the labels {France, China, ...}
    list(label_encoder.classes_)

    # Transform the Country column to a numerical variable
    label_encoder.transform(df['Country'])

    # Convert some integers back into their category names ---> {China, China, France}
    list(label_encoder.inverse_transform([2, 2, 1]))
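If the binary (one column per category) form is what you need for X train, one possible approach is pandas' get_dummies. The snippet below is only a minimal sketch and not part of the original answer: the small dataframe is rebuilt from the two rows shown in the question, and the names categorical_cols, X and y are assumptions chosen for illustration. Integer codes from LabelEncoder imply an ordering between categories, which some models will pick up on, so one-hot columns are often preferred for unordered values such as Country.

```python
import pandas as pd

# Sample rebuilt from the two rows shown in the question (illustrative only)
df = pd.DataFrame({
    "Company_Sector": ["Finance and Insurance", "Public Administration"],
    "Company_size": [10, 1],
    "DMU_Final": ["End User", "End User"],
    "Country": ["France", "France"],
    "online_activity": [65, 100],
    "did_buy": [1, 1],
})

# Assumed list of categorical columns to expand into 0/1 indicator columns
categorical_cols = ["Company_Sector", "DMU_Final", "Country"]

# Drop the label, then one-hot encode the categorical columns;
# numeric columns pass through unchanged
X = pd.get_dummies(df.drop(columns=["did_buy"]), columns=categorical_cols)
y = df["did_buy"]

print(X.columns.tolist())
```

The resulting X can be passed to a train/test split together with y. Date columns such as Joining_Date would need their own conversion and are left out of this sketch.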

...