
I have the following df,

pct    id
0.3    631
0.2    115
0.1    312
0.2    581
0.01   574
0.09   586

I want to first sort the df by pct,

df.sort_values(by=['pct'], ascending=False, inplace=True)

then add up pct until the running total reaches 0.8 and count how many rows that takes, e.g. the top 4 rows in this case. I am wondering what's the best way to do it. Should I use pd.eval or pd.query?

1 Answer


Once df is sorted, you can filter with boolean indexing on the cumulative sum of pct:

df1 = df[df['pct'].cumsum().lt(0.8)]

print(df1)

   pct   id
0  0.3  631
1  0.2  115
3  0.2  581
2  0.1  312
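To also get the row count the question asks for, here is a minimal end-to-end sketch. It is one possible variant, not the only way to do it. Note that in floating point 0.3 + 0.2 + 0.2 + 0.1 comes out as 0.7999999999999999, which is why a strict "less than 0.8" comparison still keeps the fourth row. The sketch below rounds the running total and uses "less than or equal" so the result does not depend on that float noise. The names TARGET, DECIMALS, running_total, top_rows and n_rows, the rounding precision, and the choice of mergesort are my own assumptions for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "pct": [0.3, 0.2, 0.1, 0.2, 0.01, 0.09],
    "id": [631, 115, 312, 581, 574, 586],
})

TARGET = 0.8    # cumulative share to reach
DECIMALS = 10   # assumed rounding precision, chosen only for illustration

# Sort descending. mergesort is a stable algorithm, so tied pct values
# come out in a deterministic order.
df_sorted = df.sort_values(by="pct", ascending=False, kind="mergesort")

# Running total of pct, rounded so float noise cannot flip the comparison.
running_total = df_sorted["pct"].cumsum().round(DECIMALS)

# Keep the rows whose running total is still within the target.
top_rows = df_sorted[running_total.le(TARGET)]
n_rows = len(top_rows)

print(top_rows)
print(n_rows)  # 4 for the sample data
```

Neither pd.eval nor query is needed here: the cumulative sum already yields a boolean mask, and len() of the filtered frame (or the sum of the mask) gives the count.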

...