Despite there being at least two good tutorials on how to index a DataFrame in Python's pandas library, I still can't work out an elegant way of selecting rows based on conditions on more than one column.

>>> import pandas as pd
>>> d = pd.DataFrame({'x':[1, 2, 3, 4, 5], 'y':[4, 5, 6, 7, 8]})
>>> d
   x  y
0  1  4
1  2  5
2  3  6
3  4  7
4  5  8
>>> d[d['x']>2] # This works fine
   x  y
2  3  6
3  4  7
4  5  8
>>> d[d['x']>2 & d['y']>7] # I had expected this to work, but it doesn't
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I have found what I think is a rather inelegant way of doing it:

>>> d[d['x']>2][d['y']>7]

But it's not pretty, and it scores fairly low for readability (I think).

Is there a better, more Python-tastic way?

1 Answer


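It is just an operator precedence issue. In Python, & binds more tightly than comparison operators such as >, so d['x']>2 & d['y']>7 is parsed as the chained comparison d['x'] > (2 & d['y']) > 7. Python evaluates a chained comparison a > b > c as (a > b) and (b > c), and that implicit "and" has to reduce an element-wise result to a single True/False, which is what raises the ValueError.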
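A quick way to confirm the grouping is to let Python's standard ast module parse the failing expression without evaluating it. This is only a diagnostic sketch; it never touches pandas, and the expression string is simply copied from the question.

```python
import ast

# Parse (but do not evaluate) the expression from the question.
expr = "d['x']>2 & d['y']>7"
tree = ast.parse(expr, mode="eval").body

# The top-level node is a chained comparison with two '>' operators...
print(type(tree).__name__)                     # Compare
print([type(op).__name__ for op in tree.ops])  # ['Gt', 'Gt']

# ...and its middle operand is the bitwise '&' of 2 and d['y'].
middle = tree.comparators[0]
print(type(middle).__name__)                   # BinOp
print(type(middle.op).__name__)                # BitAnd
```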

To make the multi-condition test work, wrap each condition in its own parentheses, so that the comparisons are evaluated first and then combined element-wise with &:

d[(d['x']>2) & (d['y']>7)]
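Here is a self-contained version of the fix using the DataFrame from the question, together with a few equivalent spellings that some people find more readable. The mask names x_big and y_big are just illustrative; DataFrame.query and passing a callable to .loc are standard pandas features.

```python
import pandas as pd

d = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [4, 5, 6, 7, 8]})

# Parenthesize each comparison so it is evaluated before '&' combines them.
both = d[(d['x'] > 2) & (d['y'] > 7)]

# Equivalent: build named boolean masks first (names are illustrative).
x_big = d['x'] > 2
y_big = d['y'] > 7
both_named = d[x_big & y_big]

# Equivalent: query() parses the string itself, so plain 'and' works there.
both_query = d.query('x > 2 and y > 7')

# Equivalent to the step-by-step filtering in the question: the lambda gets the
# already-filtered frame, so its mask has the same index and needs no realignment.
step_by_step = d[d['x'] > 2].loc[lambda df: df['y'] > 7]

print(both)
```

All four variants select only the last row:

   x  y
4  5  8

The same parenthesization rule applies to | (or) and ~ (not), e.g. d[(d['x'] > 2) | ~(d['y'] > 7)].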

For more information, see the pandas user guide on indexing:

https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html 
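When the number of conditions grows or is only known at run time, one option is to keep the boolean masks in a list and fold them together with functools.reduce and operator.and_, which is the same as writing c0 & c1 & c2 by hand. The conditions below are hypothetical, chosen only to illustrate the pattern.

```python
import functools
import operator

import pandas as pd

d = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [4, 5, 6, 7, 8]})

# Hypothetical list of conditions, one boolean Series per condition.
conditions = [d['x'] > 2, d['y'] > 5, d['y'] < 8]

# Fold the list with '&': equivalent to conditions[0] & conditions[1] & conditions[2].
mask = functools.reduce(operator.and_, conditions)

print(d[mask])
```

With these conditions only the rows labelled 2 and 3 remain.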
