in Machine Learning by (19k points)

I'm starting with input data like this:

import pandas

df1 = pandas.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

When printed, it looks like this:

       City     Name
0   Seattle    Alice
1   Seattle      Bob
2  Portland  Mallory
3   Seattle  Mallory
4   Seattle      Bob
5  Portland  Mallory

Grouping is simple enough:

g1 = df1.groupby(["Name", "City"]).count()

and printing the result, which is a DataFrame indexed by Name and City, yields:

                  City  Name
Name    City
Alice   Seattle      1     1
Bob     Seattle      2     2
Mallory Portland     2     2
        Seattle      1     1
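A minimal sketch, assuming a recent pandas version, of why the grouped result looks like this: the group keys Name and City end up in a two-level MultiIndex rather than in ordinary columns, which is also why the second "Mallory" is left blank when printed. It uses .size() instead of .count() because in recent pandas, .count() on a frame grouped by all of its columns has no value columns left to count.

```python
import pandas as pd

# Same input data as in the question
df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

# size() counts the rows in each (Name, City) group and returns a Series
counts = df1.groupby(["Name", "City"]).size()

# The grouping keys live in the index, not in columns
print(type(counts).__name__)      # Series
print(list(counts.index.names))   # ['Name', 'City']
print(counts.tolist())            # [1, 2, 2, 1]
```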

But what I eventually want is another DataFrame that contains all the rows of the grouped result, with the group labels filled in on every row. In other words, I want to get the following result:

                  City  Name
Name    City
Alice   Seattle      1     1
Bob     Seattle      2     2
Mallory Portland     2     2
Mallory Seattle      1     1

I can't quite see how to accomplish this in the pandas documentation. Any hints would be welcome.

2 Answers

by (33.1k points)

You can simply call the .reset_index() method on the result of .groupby() to turn the group keys back into ordinary columns.

For example:

In [1]: pandas.DataFrame({'count': df1.groupby(["Name", "City"]).size()}).reset_index()


      Name      City  count
0    Alice   Seattle      1
1      Bob   Seattle      2
2  Mallory  Portland      2
3  Mallory   Seattle      1
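For reference, here is a self-contained sketch of the first approach. It assumes the bare DataFrame in the answer means pandas.DataFrame (imported here as pd), rebuilds the question's df1, wraps the size() Series in a one-column DataFrame named "count", and then moves the keys back into columns with reset_index().

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

# Wrap the per-group row counts in a DataFrame with a single "count" column
counts = pd.DataFrame({"count": df1.groupby(["Name", "City"]).size()})

# reset_index() turns the Name/City index levels into ordinary columns
# and gives the result a fresh 0..n-1 integer index
result = counts.reset_index()
print(result)
```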

Or you can use:

In [2]: df1.groupby(["Name", "City"]).size().to_frame(name='count').reset_index()


      Name      City  count
0    Alice   Seattle      1
1      Bob   Seattle      2
2  Mallory  Portland      2
3  Mallory   Seattle      1
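Both approaches should produce the same table. A small sketch that checks this, assuming pandas is imported as pd and using pandas' own testing helper pd.testing.assert_frame_equal, which raises an AssertionError if the two frames differ:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

grouped = df1.groupby(["Name", "City"])

# Series -> one-column DataFrame named "count" -> keys moved back to columns
via_to_frame = grouped.size().to_frame(name="count").reset_index()

# The DataFrame-constructor variant, for comparison
via_constructor = pd.DataFrame({"count": grouped.size()}).reset_index()

# Raises AssertionError if the two tables differ in values, dtypes or labels
pd.testing.assert_frame_equal(via_to_frame, via_constructor)
print(via_to_frame)
```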

Hope this answer helps.

by (11.4k points)

Simply do this:

import pandas as pd

grouped_df = df1.groupby(["Name", "City"])
pd.DataFrame(grouped_df.size().reset_index(name="Group_Count"))

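Here, grouped_df.size() returns the number of rows in each (Name, City) group as a Series, and reset_index(name="Group_Count") moves the group keys back into ordinary columns while naming the count column "Group_Count". Since Series.reset_index() already returns a DataFrame, the outer pd.DataFrame() call is not strictly needed, although it does no harm.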
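A short sketch illustrating that point, assuming pandas is imported as pd: the result of Series.reset_index(name=...) is already a DataFrame, and wrapping it in pd.DataFrame() gives an identical table.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

grouped_df = df1.groupby(["Name", "City"])

# Series.reset_index(name=...) returns a DataFrame directly;
# `name` labels the column that holds the Series values (the counts)
direct = grouped_df.size().reset_index(name="Group_Count")

# Wrapping it in pd.DataFrame(), as in the answer, gives the same table
wrapped = pd.DataFrame(grouped_df.size().reset_index(name="Group_Count"))

print(type(direct).__name__)    # DataFrame
print(direct.columns.tolist())  # ['Name', 'City', 'Group_Count']
pd.testing.assert_frame_equal(direct, wrapped)
```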
