
I am trying to alter the data in a pandas DataFrame. In the example below, where X >= 5, I want to set the corresponding Y value to 1. Where X <= -5, I want to set the corresponding Y value to 0. In every other row, Y should keep its original value.

import numpy as np

import pandas as pd

# Generate random data

np.random.seed(2)

df = pd.DataFrame(np.random.randint(-10,10,size=(10, 1)), columns=list('X'))

df['X2'] = np.random.randint(1, 20, df.shape[0])

df['Y'] = np.random.randint(0, 2, df.shape[0])

df['Y'] = [y if y <= 5 else 1 for y in df['X']]

df['Y'] = [y if y >= -5 else 0 for y in df['X']]

Out:

X  X2  Y

0  5  11  5

1  5  13  5

2  5   5  5

3 -7   3  0

4  2   8  2

5 -7   7  0

6 -4   2 -4

7  1   8  1

8 -7  14  0

9 -2   8 -2

Intended:

X  X2  Y

0  5  11  1

1  5  13  1

2  5   5  1

3 -7   3  0

4  2   8  Original random int

5 -7   7  0

6 -4   2  Original random int

7  1   8  Original random int

8 -7  14  0

9 -2   8  Original random int


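Use np.where, which picks a value element-wise depending on a condition. Passing df['Y'] as the fallback keeps the original value wherever the condition is false: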

import numpy as np

df['Y'] = np.where(df['X'].ge(5), 1, df['Y'])

df['Y'] = np.where(df['X'].le(-5), 0, df['Y'])
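To make the two np.where calls easy to check by hand, here is a minimal sketch on a small DataFrame. The sample values are hypothetical and are not the data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen by hand so the result is easy to verify
df = pd.DataFrame({"X": [5, -7, 2, -5, 9, -4],
                   "Y": [0, 1, 1, 1, 0, 0]})

# Where X >= 5, set Y to 1; elsewhere keep the existing Y
df["Y"] = np.where(df["X"].ge(5), 1, df["Y"])

# Where X <= -5, set Y to 0; elsewhere keep the Y from the previous step
df["Y"] = np.where(df["X"].le(-5), 0, df["Y"])

print(df["Y"].tolist())  # [1, 0, 1, 0, 1, 0]
```

Rows with X = 2 and X = -4 match neither condition, so they keep their original Y values (1 and 0).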

Even better, for multiple conditions, use np.select:

conditions = [df['X'].ge(5), df['X'].le(-5)]

choices = [1, 0]

df['Y'] = np.select(conditions, choices, default=df['Y'])
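The np.select version can be sketched the same way. The data below is again hypothetical. Each condition is paired with the choice at the same position, and rows matching no condition take the value from default.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the question
df = pd.DataFrame({"X": [6, -10, 0, 5, -5, 3],
                   "Y": [0, 1, 1, 0, 1, 0]})

# conditions[i] is paired with choices[i]
conditions = [df["X"].ge(5), df["X"].le(-5)]
choices = [1, 0]

# Rows where no condition holds fall back to the original Y
df["Y"] = np.select(conditions, choices, default=df["Y"])

print(df["Y"].tolist())  # [1, 0, 1, 1, 0, 0]
```

If several conditions could be true for the same row, np.select uses the first one that matches. Here the two conditions cannot overlap, so their order does not matter.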

Or, if you want to stick with a list comprehension, use zip to iterate over X and Y together, so the original Y is available as the fallback:

df['Y'] = [1 if x >= 5 else (0 if x <= -5 else y) for x, y in zip(df['X'], df['Y'])]
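As a quick sanity check, the sketch below applies all three approaches to the same randomly generated frame and compares the results. The frame `base`, its size, and the seed are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical random frame; seed and size are arbitrary
rng = np.random.default_rng(0)
base = pd.DataFrame({"X": rng.integers(-10, 10, size=20),
                     "Y": rng.integers(0, 2, size=20)})

# Approach 1: two chained np.where calls
via_where = np.where(base["X"].ge(5), 1, base["Y"])
via_where = np.where(base["X"].le(-5), 0, via_where)

# Approach 2: np.select with the original Y as the default
via_select = np.select([base["X"].ge(5), base["X"].le(-5)],
                       [1, 0], default=base["Y"])

# Approach 3: list comprehension over (x, y) pairs
via_zip = [1 if x >= 5 else (0 if x <= -5 else y)
           for x, y in zip(base["X"], base["Y"])]

# True when all three approaches produce the same values
same = (np.array_equal(via_where, via_select)
        and np.array_equal(via_select, np.array(via_zip)))
print(same)
```

The vectorized np.where and np.select versions are usually the better choice on large frames, since the list comprehension loops in Python row by row.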

Output:

original df

X  X2  Y

0  -6  11  1

1 -10  10  0

2   6  15  1

3   9  12  0

4  -2   3  1

5  -5   2  0

6   5   6  1

7  -1  12  0

8   7  10  0

9  -6   9  0

df after np.where

X  X2  Y

0  -6  11  0

1 -10  10  0

2   6  15  1

3   9  12  1

4  -2   3  1

5  -5   2  0

6   5   6  1

7  -1  12  0

8   7  10  1

9  -6   9  0
