I have a dataframe that I need to group, then subgroup. From the subgroups I need to return what the subgroup is as well as the unique values for a column.

import pandas

df = pandas.DataFrame({
    'country': pandas.Series(['US', 'Canada', 'US', 'US']),
    'gender': pandas.Series(['male', 'female', 'male', 'female']),
    'industry': pandas.Series(['real estate', 'shipping', 'telecom', 'real estate']),
    'income': pandas.Series([1, 2, 3, 4])})

def subgroup(g):
    return g.groupby(['gender'])

s = df.groupby(['country']).apply(subgroup)

From s, how can I compute the uniques of "industry" as well as which "gender" it's grouped for?

--------------------------------------------
| US     | male   | [real estate, telecom] |
|        |----------------------------------
|        | female | [real estate]          |
--------------------------------------------
| Canada | female | [shipping]             |
--------------------------------------------

1 Answer


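Group by both columns in a single groupby() call, then call unique() on the "industry" column. No nested groupby() inside apply() is needed: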

df.groupby(['country','gender'])['industry'].unique()

Output:

country   gender
Canada    female        [shipping]
US        female        [real estate]
          male          [real estate, telecom]
Name: industry, dtype: object
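
If a flat table is easier to work with than the MultiIndex Series above, something like the following may help. This is only a sketch: it rebuilds the question's dataframe from plain lists, and the names uniques, table and us_male are illustrative, not part of the original answer.

```python
import pandas as pd

# Same data as in the question, built from plain lists.
df = pd.DataFrame({
    'country': ['US', 'Canada', 'US', 'US'],
    'gender': ['male', 'female', 'male', 'female'],
    'industry': ['real estate', 'shipping', 'telecom', 'real estate'],
    'income': [1, 2, 3, 4],
})

# Unique industries per (country, gender) pair. The result is a Series
# indexed by a two-level MultiIndex, with a NumPy array in each cell.
uniques = df.groupby(['country', 'gender'])['industry'].unique()

# Turn each array into a plain list and move the index levels into
# ordinary columns, giving one row per subgroup.
table = uniques.apply(list).reset_index()

# A single subgroup can be looked up by its (country, gender) key.
us_male = list(uniques.loc[('US', 'male')])
```

Here table has the columns country, gender and industry, so each row states which subgroup it belongs to alongside that subgroup's unique values.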
