
I have a two-column DataFrame: the first column holds the customer and the second holds a city that customer has visited. The DataFrame looks like this:

print(df)

    customer    visited_city
0   John        London
1   Mary        Melbourne
2   Steve       Paris
3   John        New_York
4   Peter       New_York
5   Mary        London
6   John        Melbourne
7   John        New_York
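For anyone who wants to reproduce this, here is a minimal sketch that builds the same DataFrame. The values are copied from the printed table above; nothing else is assumed.

```python
import pandas as pd

# Sample data copied row by row from the printed table.
df = pd.DataFrame({
    "customer": ["John", "Mary", "Steve", "John",
                 "Peter", "Mary", "John", "John"],
    "visited_city": ["London", "Melbourne", "Paris", "New_York",
                     "New_York", "London", "Melbourne", "New_York"],
})
print(df)
```

Note that the pair John / New_York appears twice (rows 3 and 7), which matters for any approach that counts rows.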

I would like to convert the above DataFrame into a row-vector format, such that each row represents a unique customer and the row vector indicates which cities that customer has visited.

print(wide_format_df)

          London  Melbourne  New_York  Paris
John      1.0     1.0        1.0       0.0
Mary      1.0     1.0        0.0       0.0
Steve     0.0     0.0        0.0       1.0
Peter     0.0     0.0        1.0       0.0

Below is the code I used to generate the wide format. It iterates over the customers one by one. Is there a more efficient way to do this?

import numpy as np
import pandas as pd

# Sorted city names become the columns of the wide table.
unique_cities = np.sort(df['visited_city'].unique())
p = len(unique_cities)
unique_customers = df['customer'].unique().tolist()

X = []
for customer in unique_customers:
    x = np.zeros(p)
    # Cities this customer visited, sorted to match unique_cities.
    city_visited = np.sort(df[df['customer'] == customer]['visited_city'].unique())
    # Locate each visited city's column position and mark it with 1.
    visited_idx = np.searchsorted(unique_cities, city_visited)
    x[visited_idx] = 1
    X.append(x)

wide_format_df = pd.DataFrame(np.array(X), columns=unique_cities, index=unique_customers)
wide_format_df

1 Answer


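You can use pivot_table to convert the DataFrame into the row-vector format. Duplicate rows are dropped first so that a repeated visit (John went to New_York twice) still shows as 1:

df.drop_duplicates().pivot_table(index='customer', columns='visited_city', aggfunc='size', fill_value=0)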

visited_city  London  Melbourne  New_York  Paris
customer
John               1          1         1      0
Mary               1          1         0      0
Peter              0          0         1      0
Steve              0          0         0      1
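As an alternative that is not part of the answer above, pd.crosstab can build the same table. This is a rough sketch under a few assumptions: the sample data is rebuilt from the question so the snippet runs on its own, clip(upper=1) is my choice for turning visit counts into 0/1 flags, and the reindex/astype step is only there to mimic the customer order and float values shown in the question.

```python
import pandas as pd

# Sample data rebuilt from the question so the snippet is self-contained.
df = pd.DataFrame({
    "customer": ["John", "Mary", "Steve", "John",
                 "Peter", "Mary", "John", "John"],
    "visited_city": ["London", "Melbourne", "Paris", "New_York",
                     "New_York", "London", "Melbourne", "New_York"],
})

# crosstab counts each (customer, city) pair; John / New_York counts as 2.
# clip(upper=1) caps the counts so the table holds 0/1 indicators.
wide = pd.crosstab(df["customer"], df["visited_city"]).clip(upper=1)

# Optional: keep customers in order of first appearance and use floats,
# to match the layout of wide_format_df in the question.
wide_format_df = wide.reindex(df["customer"].unique()).astype(float)
print(wide_format_df)
```

Both crosstab and pivot_table avoid the Python-level loop over customers, which is usually where the time goes on larger frames. If you want visit counts rather than flags, leave out the clip call.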

