# Subsetting multi-hierarchical data in pandas

0 votes
1 view

I'm successfully using the groupby() function to compute statistics on grouped data, however, I'd now like to do the same for subsets of each group.

I can't seem to understand how to generate a subset for each group (as a groupby object) that can then be applied to a groupby function such as mean(). The following line works as intended:

d.groupby(['X','Y'])['Value'].mean()

How can I subset the values of the individual groups to then supply to the mean function? I suspect transform() or filter() might be useful though I can't figure out how.

EDIT to add reproducible example:

random.seed(881)

value = np.random.randn(15)

letter = np.random.choice(['a','b','c'],15)

date = np.repeat(pd.date_range(start = '1/1/2001', periods=3), 5)

data = {'date':date,'letter':letter,'value':value}

df = pd.DataFrame(data)

df.groupby(['date','letter'])['value'].mean()

date        letter

2001-01-01   a        -0.039407

b        -0.350787

c         1.221200

2001-01-02   a        -0.688744

b         0.346961

c        -0.702222

2001-01-03   a         1.320947

b        -0.915636

c        -0.419655

Name: value, dtype: float64

Here's an example of calculating the mean of the multi-level group. Now I'd like to find the mean of a subset of each group. For example, the mean of each groups data that is < the groups 10th percentile. The key take away being that the subsets must be performed on the groups and not the entire df first.

## 1 Answer

0 votes
by (41.4k points)

Use quantile() function that you can add to groupby().apply() statement.

Now, if you want to get the 10th percentile use quantile(.1)

df.groupby(['date','letter'])['value'].apply(lambda g: g[g <= g.quantile(.1)].mean())

Gain practical exposure with data science projects in Intellipaat's Data Science course online.

0 votes
1 answer
+9 votes
1 answer
0 votes
1 answer
0 votes
1 answer
0 votes
1 answer