
Hey, I'm new to pandas and I just came across df.query().

Why would people use df.query() when you can filter a DataFrame directly with bracket notation? The official pandas tutorial also seems to prefer the latter approach.

With bracket notation:

df[df['age'] <= 21]

With the pandas query method:

df.query('age <= 21')

Besides the stylistic and flexibility differences, is one of them canonically preferred, particularly for performance on large DataFrames?


1 Answer


Let's start with a sample DataFrame:

In : df

Out:

  sex  age     name
0   M   40      Max
1   F   35     Anna
2   M   29      Joe
3   F   18    Maria
4   F   23  Natalie

The .query() method has some advantages.

Compared to boolean indexing, it is usually shorter and easier to read:

df.query("20 <= age <= 30 and sex == 'F'")

Out:

  sex  age     name
4   F   23  Natalie

The equivalent boolean-indexing expression needs one parenthesized condition per comparison:

df[(df['age'] >= 20) & (df['age'] <= 30) & (df['sex'] == 'F')]

Out:

  sex  age     name
4   F   23  Natalie
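To check that the two forms select the same rows, here is a minimal self-contained sketch. It rebuilds the sample DataFrame from the table above; the variable names by_query and by_mask are only illustrative.

```python
import pandas as pd

# Rebuild the sample DataFrame shown above.
df = pd.DataFrame({
    'sex':  ['M', 'F', 'M', 'F', 'F'],
    'age':  [40, 35, 29, 18, 23],
    'name': ['Max', 'Anna', 'Joe', 'Maria', 'Natalie'],
})

# Filter with .query(): the condition is written as a string expression.
by_query = df.query("20 <= age <= 30 and sex == 'F'")

# Same filter with boolean indexing: each condition needs its own
# parentheses because & binds tighter than the comparison operators.
by_mask = df[(df['age'] >= 20) & (df['age'] <= 30) & (df['sex'] == 'F')]

# Compare the two results; equals() checks values, index and dtypes.
print(by_query.equals(by_mask))
```

Both calls return a new DataFrame, so either result can be chained into further operations.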

You can also build queries programmatically:

conditions = {'name': 'Joe', 'sex': 'M'}

q = ' and '.join(['{}=="{}"'.format(k, v) for k, v in conditions.items()])

The resulting query string q is:

'name=="Joe" and sex=="M"'

df.query(q)

Out:

  sex  age name
2   M   29  Joe
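Note that the format string above wraps every value in double quotes, so it only suits string columns. A hedged variation, not part of the original answer: formatting each value with repr() (the !r conversion) quotes strings but leaves numbers bare, and a single value can also be passed in through a local variable with the @ prefix. The names conditions, q, result, max_age and young are illustrative.

```python
import pandas as pd

# Same sample data as in the answer.
df = pd.DataFrame({
    'sex':  ['M', 'F', 'M', 'F', 'F'],
    'age':  [40, 35, 29, 18, 23],
    'name': ['Max', 'Anna', 'Joe', 'Maria', 'Natalie'],
})

# Mixed string and numeric conditions.
conditions = {'sex': 'F', 'age': 23}

# repr() renders 'F' with quotes and 23 without, so both compare correctly.
q = ' and '.join(f'{col} == {val!r}' for col, val in conditions.items())
result = df.query(q)

# Alternative: reference a local Python variable inside the expression with @.
max_age = 30
young = df.query('age <= @max_age')

print(q)
print(result)
print(young)
```

Building the expression from untrusted input is still risky, since the string is evaluated; the @ form keeps the values out of the expression text.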

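Some disadvantages of .query() are:

1. Columns whose names contain spaces or consist only of digits cannot be referenced by their bare name in a .query() expression. pandas 0.25 and later allow names that are not valid Python identifiers, such as names with spaces, to be wrapped in backticks.

2. Not all functions can be used inside the expression, and in some cases you have to pass engine='python' instead of the faster engine='numexpr', which is the default when numexpr is installed.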
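As a sketch of how these two limitations play out in practice, assuming pandas 0.25 or later (where backtick quoting is available): the column name 'first name' and the variables joe and m_names below are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with a column name that contains a space.
df = pd.DataFrame({
    'first name': ['Max', 'Anna', 'Joe', 'Maria', 'Natalie'],
    'age': [40, 35, 29, 18, 23],
})

# Backticks let the expression refer to a name that is not a valid identifier.
joe = df.query('`first name` == "Joe"')

# engine='python' evaluates the expression with plain Python instead of
# numexpr; some expressions, such as string-method calls, may need it
# depending on the pandas version.
m_names = df.query('`first name`.str.startswith("M")', engine='python')

print(joe)
print(m_names)
```

When an expression becomes awkward to write this way, plain boolean indexing remains available and has no such restrictions.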
