
I have a DataFrame that I need to group and then subgroup. For each subgroup, I need to return the subgroup key together with the unique values of one column.

import pandas

df = pandas.DataFrame({
    'country': pandas.Series(['US', 'Canada', 'US', 'US']),
    'gender': pandas.Series(['male', 'female', 'male', 'female']),
    'industry': pandas.Series(['real estate', 'shipping', 'telecom', 'real estate']),
    'income': pandas.Series([1, 2, 3, 4]),
})

def subgroup(g):
    return g.groupby(['gender'])

s = df.groupby(['country']).apply(subgroup)

From s, how can I compute the unique values of "industry" along with the "gender" each set of values belongs to? The desired result looks like this:

--------------------------------------------
| US     | male   | [real estate, telecom] |
|        |---------------------------------|
|        | female | [real estate]          |
--------------------------------------------
| Canada | female | [shipping]             |
--------------------------------------------

1 Answer


There is no need for a nested groupby. Pass both keys to groupby(), select the "industry" column, and call unique() on it:

df.groupby(['country','gender'])['industry'].unique()

Output:

country   gender
Canada    female        [shipping]
US        female        [real estate]
          male          [real estate, telecom]
Name: industry, dtype: object
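
The result is a Series indexed by (country, gender), so each entry already carries the subgroup it belongs to. As a rough sketch of how it might be used afterwards (the names uniques, us_male and flat are illustrative only, and the pd alias is an assumption rather than what the question uses), one subgroup can be looked up by its key, and the index can be flattened into ordinary columns:

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['US', 'Canada', 'US', 'US'],
    'gender': ['male', 'female', 'male', 'female'],
    'industry': ['real estate', 'shipping', 'telecom', 'real estate'],
    'income': [1, 2, 3, 4],
})

# Group on both keys at once; unique() keeps values in first-seen order per group.
uniques = df.groupby(['country', 'gender'])['industry'].unique()

# Look up a single subgroup by its (country, gender) key.
us_male = list(uniques.loc[('US', 'male')])

# Turn each array into a plain list and move the index levels into columns.
flat = uniques.apply(list).reset_index()
```

Here flat has one row per (country, gender) pair, with the unique industries stored as a list in the "industry" column, which is close to the table layout asked for in the question.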
