Let’s take a sample DataFrame:
In : df
Out:
  sex  age     name
0   M   40      Max
1   F   35     Anna
2   M   29      Joe
3   F   18    Maria
4   F   23  Natalie
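To reproduce the examples, the sample frame can be built as follows (a minimal sketch; the name df matches the text above):

```python
import pandas as pd

# Sample data from the table above, columns in the order sex, age, name
df = pd.DataFrame({
    "sex": ["M", "F", "M", "F", "F"],
    "age": [40, 35, 29, 18, 23],
    "name": ["Max", "Anna", "Joe", "Maria", "Natalie"],
})
print(df)
```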
Using the .query() method has some advantages.
Compared to boolean indexing, it is shorter and cleaner:
df.query("20 <= age <= 30 and sex=='F'")
Out:
  sex  age     name
4   F   23  Natalie
The equivalent boolean indexing needs a separate, parenthesized mask for each condition:
df[(df['age']>=20) & (df['age']<=30) & (df['sex']=='F')]
Out:
  sex  age     name
4   F   23  Natalie
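A quick way to confirm that both forms select the same rows is to compare the results with DataFrame.equals (a small sketch that rebuilds the sample frame so it runs on its own):

```python
import pandas as pd

df = pd.DataFrame({
    "sex": ["M", "F", "M", "F", "F"],
    "age": [40, 35, 29, 18, 23],
    "name": ["Max", "Anna", "Joe", "Maria", "Natalie"],
})

# query(): chained comparison and a plain 'and' inside one string
via_query = df.query("20 <= age <= 30 and sex == 'F'")

# Boolean indexing: each comparison must be parenthesized,
# because & binds more tightly than >=, <= and ==
via_mask = df[(df["age"] >= 20) & (df["age"] <= 30) & (df["sex"] == "F")]

print(via_query.equals(via_mask))  # True
```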
Also, you can build queries programmatically:
conditions = {'name':'Joe', 'sex':'M'}
q = ' and '.join(['{}=="{}"'.format(k,v) for k,v in conditions.items()])
In : q
Out: 'name=="Joe" and sex=="M"'
df.query(q)
Out:
  sex  age name
2   M   29  Joe
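The format string above wraps every value in double quotes, so a numeric condition such as age 29 would become age=="29", a string literal rather than a number. One possible variation, sketched below with a hypothetical helper build_query, formats values with repr() via {v!r}, so strings are quoted and numbers stay unquoted. It still assumes every key is a valid Python identifier.

```python
import pandas as pd

df = pd.DataFrame({
    "sex": ["M", "F", "M", "F", "F"],
    "age": [40, 35, 29, 18, 23],
    "name": ["Max", "Anna", "Joe", "Maria", "Natalie"],
})

def build_query(conditions):
    """Hypothetical helper: AND together 'column == value' tests.

    {v!r} uses repr(), so strings come out quoted ('Joe')
    and numbers come out as plain literals (29).
    """
    return " and ".join(f"{k} == {v!r}" for k, v in conditions.items())

q = build_query({"name": "Joe", "sex": "M"})
print(q)  # name == 'Joe' and sex == 'M'
print(df.query(q))

# Numeric values are compared as numbers, not strings
print(df.query(build_query({"age": 29})))
```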
Some disadvantages of .query() are:
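1. Column names that contain spaces, or that consist only of digits, are not valid Python identifiers, so they cannot be referenced directly in a .query() string (since pandas 0.25, names with spaces can be wrapped in backticks, e.g. `first name`).
2. Not all functions are supported inside a query string; in some cases you have to pass engine='python' instead of engine='numexpr', which is faster and is the default when the numexpr package is installed.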
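Both workarounds can be sketched as follows, assuming pandas 0.25 or newer for the backtick syntax. The frame people and its "first name" column are hypothetical, chosen only to show a name with a space; the second query calls a Series string method and passes engine='python' explicitly.

```python
import pandas as pd

# Hypothetical frame with a column name that is not a valid identifier
people = pd.DataFrame({
    "first name": ["Max", "Anna", "Joe"],
    "age": [40, 35, 29],
})

# Backticks let the query string refer to "first name"
older = people.query("`first name` != 'Joe' and age > 30")
print(older)

# A .str method call inside the query, evaluated with the pure-Python engine
df = pd.DataFrame({"name": ["Max", "Anna", "Maria"]})
m_names = df.query("name.str.startswith('M')", engine="python")
print(m_names)
```

Integer column labels (as opposed to string labels made of digits) are still not reachable this way, so renaming such columns before querying is the simpler option.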