
Despite there being at least two good tutorials on how to index a DataFrame in Python's pandas library, I still can't work out an elegant way of SELECTing rows on conditions involving more than one column.

>>> import pandas as pd
>>> d = pd.DataFrame({'x':[1, 2, 3, 4, 5], 'y':[4, 5, 6, 7, 8]})
>>> d
   x  y
0  1  4
1  2  5
2  3  6
3  4  7
4  5  8
>>> d[d['x']>2] # This works fine
   x  y
2  3  6
3  4  7
4  5  8
>>> d[d['x']>2 & d['y']>7] # I had expected this to work, but it doesn't
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I have found (what I think is) a rather inelegant way of doing it, like this:

>>> d[d['x']>2][d['y']>7]

But it's not pretty, and it scores fairly low for readability (I think).

Is there a better, more Python-tastic way?

1 Answer


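It is just an operator-precedence issue. In Python, & binds more tightly than comparison operators such as >, so d['x']>2 & d['y']>7 is parsed as the chained comparison d['x'] > (2 & d['y']) > 7. Python evaluates a chained comparison with an implicit "and", which needs the truth value of a whole boolean Series, and that is what raises the ValueError.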
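To check how Python groups the expression from the question, here is a small sketch using the standard-library ast module. It only parses the text, so no DataFrame is needed; ast.unparse assumes Python 3.9 or later.

```python
import ast

# Parse, without evaluating, the expression from the question.
expr = ast.parse("d['x']>2 & d['y']>7", mode="eval").body

# The top level is a single chained comparison with two '>' operators ...
print(type(expr).__name__)                     # Compare
print([type(op).__name__ for op in expr.ops])  # ['Gt', 'Gt']

# ... and its middle operand is the bitwise AND of 2 and d['y'],
# i.e. the & was applied before either comparison.
middle = expr.comparators[0]
print(type(middle.op).__name__)                # BitAnd
print(ast.unparse(middle))                     # 2 & d['y']
```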

You have to wrap each condition in its own parentheses to make your multi-condition test work:

d[(d['x']>2) & (d['y']>7)]
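A minimal, self-contained sketch of the fix on the question's data (it rebuilds d so it runs on its own). It also compares the result with the chained workaround from the question: on this data both select the same row, but in the chained form the second mask (index 0..4) has to be aligned to the already-filtered frame (index 2..4), and pandas emits a UserWarning about reindexing, which the sketch silences.

```python
import warnings

import pandas as pd

d = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [4, 5, 6, 7, 8]})

# Each comparison is parenthesised so it is evaluated before & combines them.
mask = (d['x'] > 2) & (d['y'] > 7)
result = d[mask]
print(result)
#    x  y
# 4  5  8

# Chained form from the question: the second mask is reindexed to match the
# intermediate frame; the resulting UserWarning is ignored here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    chained = d[d['x'] > 2][d['y'] > 7]

print(chained.equals(result))  # True
```

The single-mask form evaluates both conditions against the full frame in one step, which avoids the alignment step entirely.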

For more information, see the indexing section of the pandas user guide:

https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html 
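That guide also describes DataFrame.query, which some find more readable for this kind of filter. A sketch on the same data, assuming the default engine: inside the query string, comparisons bind more tightly than and/or (and, per the guide, than & and | as well), so no extra parentheses are needed. With ordinary boolean masks, | (or) and ~ (not) follow the same parenthesising rule as &.

```python
import pandas as pd

d = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [4, 5, 6, 7, 8]})

# The whole condition is a string evaluated against the column names.
q = d.query("x > 2 and y > 7")
print(q)

# Boolean masks with | and ~: each comparison still needs its own parentheses.
either = d[(d['x'] > 4) | (d['y'] < 5)]   # rows 0 and 4
not_big = d[~(d['x'] > 2)]                # rows 0 and 1
print(either)
print(not_big)
```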

