I have a dataframe of customers with information about their activity, and I've built a model that predicts whether they will buy the product. My label is the column 'did_buy', which is 1 if a customer bought and 0 if not. My model currently uses only the numeric columns, but I'd also like to add the categorical columns, and I'm not sure how to convert them and use them in my X train. Here is a glimpse of my dataframe columns:
Company_Sector         Company_size  DMU_Final  Joining_Date  Country
Finance and Insurance  10            End User   2010-04-13    France
Public Administration  1             End User   2004-09-22    France
Some more columns:
linkedin_shared_connections  online_activity  did_buy  Sale_Date
11                           65               1        2016-05-23
13                           100              1        2016-01-12
To convert a categorical column to integer codes, you can use scikit-learn's LabelEncoder:
# Import libraries
from sklearn import preprocessing
import pandas as pd

# Create a label encoder object and fit it to the Country column
# (df is the customer dataframe from the question)
label_encoder = preprocessing.LabelEncoder()
label_encoder.fit(df['Country'])

# View the learned labels, e.g. France, China, ... (stored in sorted order)
list(label_encoder.classes_)

# Transform the Country column into integer codes
label_encoder.transform(df['Country'])

# Convert some integer codes back into their category names.
# The codes must exist in classes_, and the names returned depend on its sorted order.
list(label_encoder.inverse_transform([2, 2, 1]))
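Integer codes can be read by many models as an ordering (for example, that one country is "greater" than another). For nominal columns such as Country or Company_Sector, one-hot encoding is a common alternative that yields binary columns. Below is a minimal sketch using `pandas.get_dummies`; the toy dataframe, its values, and the `categorical_cols` / `numeric_cols` lists are assumptions made for illustration, not the asker's real data.

```python
import pandas as pd

# Toy frame mimicking a few columns from the question (values are made up).
df = pd.DataFrame({
    "Company_Sector": ["Finance and Insurance", "Public Administration", "Finance and Insurance"],
    "Country": ["France", "France", "China"],
    "Company_size": [10, 1, 25],
    "online_activity": [65, 100, 40],
    "did_buy": [1, 1, 0],
})

# Hypothetical split of feature columns by type.
categorical_cols = ["Company_Sector", "Country"]
numeric_cols = ["Company_size", "online_activity"]

# Replace each categorical column with one 0/1 indicator column per category;
# numeric columns pass through unchanged.
X = pd.get_dummies(df[numeric_cols + categorical_cols],
                   columns=categorical_cols, dtype=int)
y = df["did_buy"]

print(X.columns.tolist())
```

The resulting `X` can be passed to the model in place of the numeric-only features. If the model will later score new data, the same set of indicator columns has to be produced there as well; scikit-learn's `OneHotEncoder`, fitted on the training data only, is one way to keep the columns consistent.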