in Machine Learning by (12.4k points)

I'm starting with input data like this:

import pandas

df1 = pandas.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

When printed, it looks like this:

       City     Name
0   Seattle    Alice
1   Seattle      Bob
2  Portland  Mallory
3   Seattle  Mallory
4   Seattle      Bob
5  Portland  Mallory

Grouping is simple enough:

g1 = df1.groupby( [ "Name", "City"] ).count()

and printing yields a GroupBy object:

                  City  Name
Name    City
Alice   Seattle      1     1
Bob     Seattle      2     2
Mallory Portland     2     2
        Seattle      1     1

But what I want eventually is another DataFrame object that contains all the rows in the GroupBy object. In other words, I want to get the following result:

                  City  Name
Name    City
Alice   Seattle      1     1
Bob     Seattle      2     2
Mallory Portland     2     2
Mallory Seattle      1     1

I can't quite see how to accomplish this in the pandas documentation. Any hints would be welcome.

2 Answers

0 votes
by (31.9k points)

You can call .reset_index() on the aggregated groupby result to turn the group keys (Name and City) back into ordinary columns.

For example:

In [1]: pandas.DataFrame({'count': df1.groupby(["Name", "City"]).size()}).reset_index()

Out[1]:

      Name      City  count
0    Alice   Seattle      1
1      Bob   Seattle      2
2  Mallory  Portland      2
3  Mallory   Seattle      1

Or you can use:

In [2]: df1.groupby(["Name", "City"]).size().to_frame(name='count').reset_index()

Out[2]:

      Name      City  count
0    Alice   Seattle      1
1      Bob   Seattle      2
2  Mallory  Portland      2
3  Mallory   Seattle      1
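Another option, as a minimal sketch, is to pass as_index=False to groupby so the keys never become an index in the first place. This assumes pandas 1.1 or newer, where size() on such a groupby returns a DataFrame with a column named "size"; the rename to "count" is only there to match the output above.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

# as_index=False keeps Name and City as ordinary columns
# (assumption: pandas >= 1.1, where size() then returns a DataFrame).
counts = df1.groupby(["Name", "City"], as_index=False).size()

# The count column is called "size" by default; rename it to match the answer above.
counts = counts.rename(columns={"size": "count"})

print(counts)
```

On older pandas versions this call may return a Series instead, in which case the reset_index() approaches above are the safer choice.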

Hope this answer helps.

0 votes
by (11.5k points)

Simply do this:

import pandas as pd

grouped_df = df1.groupby(["Name", "City"])

pd.DataFrame(grouped_df.size().reset_index(name="Group_Count"))

Here, grouped_df.size() returns a Series holding the number of rows in each (Name, City) group, indexed by those two keys. reset_index() then moves the keys back into regular columns, and its name argument sets the name of the count column. The outer pd.DataFrame() call is not strictly needed, since reset_index() on a Series already returns a DataFrame.
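To see each step in isolation, here is a small self-contained sketch. It rebuilds df1 from the question; the variable names sizes and result are just illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

# Step 1: size() gives a Series with a (Name, City) MultiIndex.
sizes = df1.groupby(["Name", "City"]).size()
print(type(sizes).__name__)

# Step 2: reset_index(name=...) turns the index levels into columns
# and names the values column, producing a flat DataFrame.
result = sizes.reset_index(name="Group_Count")
print(type(result).__name__)
print(result)
```

Because result is already a DataFrame, passing it through pd.DataFrame() again gives an equal frame.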

...