Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (17.6k points)

I have a 2-column DataFrame, column-1 corresponds to customer, column-2 corresponds to the city this customer has visited. The DataFrame looks like the following:

print(df)

    customer    visited_city

0   John        London

1   Mary        Melbourne

2   Steve       Paris

3   John        New_York

4   Peter       New_York

5   Mary        London

6   John        Melbourne

7   John        New_York

I would like to convert the above DataFrame into a row-vector format, such that each row represents a unique user with the row vector indicating the cities visited.

print(wide_format_df)

          London  Melbourne  New_York  Paris

John      1.0        1.0       1.0      0.0

Mary      1.0        1.0       0.0      0.0

Steve     0.0        0.0       0.0      1.0

Peter     0.0        0.0       1.0      0.0

Below is the code I used to generate the wide format. It iterates through each user one by one. I was wondering is there any more efficient way to do so?

import pandas as pd

import numpy as np

UNIQUE_CITIESS = np.sort(df['visited_city'].unique())

p = len(UNIQUE_CITIESS)

unique_customers = df['customer'].unique().tolist()

X = []

for customer in unique_customers:

    x = np.zeros(p)    

    city_visited = np.sort(df[df['customer'] == customer]['visited_city'].unique())

    visited_idx = np.searchsorted(UNIQUE_CITIESS, city_visited)

    x[visited_idx] = 1    

    X.append(x)

wide_format_df = pd.DataFrame(np.array(X), columns=UNIQUE_CITIESS, index=unique_customers)

wide_format_df

1 Answer

0 votes
by (41.4k points)

You can use the below code to convert the dataframe into a row vector format:

df.pivot_table(index='customer', columns='visited_city',aggfunc=len, fill_value=0)=

visited_city  London Melbourne  New_York Paris

customer                                        

John            1        1        1      0

Mary            1        1        0      0

Peter           0        0        1      0

Steve           0        0        0      1

If you want to learn more about Pandas visit this Python Pandas Tutorial.

Browse Categories

...