I have a pandas DataFrame called data with a column called ms. I want to eliminate all the rows where data.ms is above the 95th percentile. For now, I'm doing this:

limit = data.ms.describe(percentiles=[0.95])['95%']

valid_data = data[data['ms'] < limit]

which works, but I want to generalize that to any percentile. What's the best way to do that?

1 Answer


Use the Series.quantile() method, which takes the quantile as a fraction between 0 and 1:

In [46]: from numpy.random import randn

In [47]: from pandas import DataFrame

In [48]: cols = list('abc')

In [49]: df = DataFrame(randn(10, len(cols)), columns=cols)

In [50]: df.a.quantile(0.95)
Out[50]: 1.5776961953820687
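To make the scaling of the argument concrete, here is a minimal sketch that compares quantile() with NumPy's percentile function. The Series s and its values are made up for illustration; they are not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the numbers 1..100
s = pd.Series(np.arange(1, 101), dtype=float)

# quantile() expects a fraction in [0, 1]
q = s.quantile(0.95)

# np.percentile() expects the same cut-off on a 0-100 scale
p = np.percentile(s, 95)

# Passing a list returns a Series indexed by the requested fractions
qs = s.quantile([0.5, 0.95])
```

With default settings both calls interpolate linearly between neighbouring values, so q and p should agree here.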

Eliminating the rows of df where df.a is at or above the 95th percentile:

In [72]: df[df.a < df.a.quantile(.95)]

Out[72]:
       a      b      c
0 -1.044 -0.247 -1.149
2  0.395  0.591  0.764
3 -0.564 -2.059  0.232
4 -0.707 -0.736 -1.345
5  0.978 -0.099  0.521
6 -0.974  0.272 -0.649
7  1.228  0.619 -0.849
8 -0.170  0.458 -0.515
9  1.465  1.019  0.966
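Since the question asks how to generalize this to any percentile, one possible approach is to wrap the filter in a small helper. This is only a sketch: the function name drop_above_percentile, its parameters, and the sample data are assumptions made for illustration, not part of pandas.

```python
import pandas as pd

def drop_above_percentile(df, column, pct):
    """Keep rows whose value in `column` is strictly below the pct-th percentile.

    Hypothetical helper: pct is given on a 0-100 scale and converted to the
    0-1 fraction that Series.quantile() expects.
    """
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    limit = df[column].quantile(pct / 100)
    return df[df[column] < limit]

# Made-up data standing in for the asker's DataFrame
data = pd.DataFrame({"ms": range(1, 101)})
valid_data = drop_above_percentile(data, "ms", 95)
```

The strict comparison mirrors the original filter, so rows exactly equal to the cut-off are dropped as well; switching to <= would keep them.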
