Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Azure by (5.8k points)

I am working on the Walmart Kaggle competition and I'm trying to create a dummy column of the "FinelineNumber" column. For context, df.shape returns (647054, 7). I am trying to make a dummy column for df['FinelineNumber'], which has 5,196 unique values. The results should be a dataframe of shape (647054, 5196), which I then plan to concat to the original dataframe.

Nearly every time I run fineline_dummies = pd.get_dummies(df['FinelineNumber'], prefix='fl'), I get the following error message The kernel appears to have died. It will restart automatically. I am running python 2.7 in jupyter notebook on a MacBookPro with 16GB RAM.

Can someone explain why this is happening (and why it happens most of the time but not every time)? Is it a jupyter notebook or pandas bug? Also, I thought it might have to do with not enough RAM but I get the same error on a Microsoft Azure Machine Learning notebook with >100 GB of RAM. On Azure ML, the kernel dies every time - almost immediately.

1 Answer

0 votes
by (9.6k points)

Even though the VM has huge memory, user is limited in the memory that they have available. Once you have used up the memory, kernel dies. 

You can try .to_sparse() on data frame first before other operations. 

Browse Categories

...