0 votes
2 views
in Data Science by (17.6k points)

I'm working with a table of reviews containing the columns review/userId, product/productId, and review/score (table screenshot omitted).

I would like to pivot it using:

user_product_rating = df.pivot_table(index='review/userId', columns='product/productId', values='review/score')

The problem is that there are 80k records in the original df, and both Google Colab and my computer run out of RAM. Is there an efficient way to achieve the same result?

Edit: The data I'm using is Cell_Phones_&_Accessories.txt.gz. I can't time the pivot; it always crashes.

1 Answer

0 votes
by (41.4k points)

Here, you can do these two things:

1. The 'review/score' column is not a numeric type, so convert it to a numeric dtype first; that alone reduces the memory the pivot needs (see the sketch after the code below).

 

2. You can use groupby instead, since your pivot table only runs a single aggregation (the mean score per user and product):

 

user_product_rating = df.groupby(['review/userId', 'product/productId'])['review/score'].mean()
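
Putting both suggestions together, here is a minimal sketch. It assumes df already holds the three columns named in the question (parsing Cell_Phones_&_Accessories.txt.gz into a DataFrame is not shown); the small sample frame and its values are only stand-ins for illustration:

import pandas as pd

# Stand-in frame; in practice df comes from parsing the review file.
df = pd.DataFrame({
    "review/userId": ["u1", "u1", "u2"],
    "product/productId": ["p1", "p2", "p1"],
    "review/score": ["5.0", "3.0", "4.0"],   # scores often load as strings
})

# 1. Convert the score column to a compact numeric dtype.
df["review/score"] = pd.to_numeric(df["review/score"], errors="coerce", downcast="float")

# 2. Average the score per (user, product) pair; this yields a MultiIndex
#    Series instead of the dense user x product matrix pivot_table builds.
user_product_rating = df.groupby(["review/userId", "product/productId"])["review/score"].mean()

# Only materialize the dense matrix if you really need it:
# rating_matrix = user_product_rating.unstack()

If you do need the wide user-by-product matrix later, call .unstack() on a filtered subset of users or products so the dense result stays small.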
