in Data Science by (18.4k points)

I have a dataset that contains outliers, and I want to replace them with the median value. This is my dataset:

id      Age
10236   766105
11993   288
9337    205
38189   88
35555   82
39443   75
10762   74
33847   72
21194   70
39450   70

I am trying to replace the values that are greater than 75 with the median value.

To achieve that I am using the following steps:

  1. replace the values which are greater than 75 with 0
  2. then replace 0 with a median value

I used the code below to achieve this, but it's not giving me the desired result.

df['age'].replace(df.age>75,0,inplace=True)

1 Answer

by (36.8k points)

You can use loc to select the outliers, replace them with NaN, and then fill the NaN values with the median:

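import numpy as np

# Median of the values that are not outliers (75 itself is kept)
median = df.loc[df['Age'] <= 75, 'Age'].median()

# Mark the outliers as NaN, then fill only the Age column with the median
df.loc[df['Age'] > 75, 'Age'] = np.nan

df['Age'] = df['Age'].fillna(median)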
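As a rough end-to-end sketch of the loc/NaN approach, here it is applied to the data from the question. Two things are assumptions on my part rather than something stated above: the median is taken over values less than or equal to 75 (so 75 itself counts as a normal value), and the Age column is cast to float first, because NaN cannot be stored in an integer column. The `threshold` name is only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [10236, 11993, 9337, 38189, 35555, 39443, 10762, 33847, 21194, 39450],
    "Age": [766105, 288, 205, 88, 82, 75, 74, 72, 70, 70],
})

# NaN needs a float column, so convert before assigning (assumption, see above)
df["Age"] = df["Age"].astype(float)

threshold = 75  # cutoff taken from the question

# Median of the non-outlier values only
median = df.loc[df["Age"] <= threshold, "Age"].median()

# Mark outliers as missing, then fill just the Age column
df.loc[df["Age"] > threshold, "Age"] = np.nan
df["Age"] = df["Age"].fillna(median)

print(df)
```

Filling only `df["Age"]` rather than calling `fillna` on the whole frame avoids overwriting missing values that other columns might contain.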

OR you can use np.where

df["Age"] = np.where(df["Age"] >75, median,df['Age'])

OR

df["Age"] = df["Age"].mask(df["Age"] >75, median)
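To check that the np.where and mask variants agree, a small self-contained comparison on the question's Age values might look like this. It is only a sketch: the names `ages`, `threshold`, `via_where` and `via_mask` are made up for the example, and the median is again computed over values less than or equal to 75, which is my assumption about how the cutoff should be treated.

```python
import numpy as np
import pandas as pd

# Age values copied from the question
ages = pd.Series([766105, 288, 205, 88, 82, 75, 74, 72, 70, 70], name="Age")

threshold = 75  # cutoff taken from the question

# Median of the values that are not treated as outliers
median = ages[ages <= threshold].median()

# np.where returns a plain NumPy array, so wrap it back into a Series
via_where = pd.Series(np.where(ages > threshold, median, ages), name="Age")

# Series.mask replaces values where the condition is True
via_mask = ages.mask(ages > threshold, median)

# Element-wise comparison; both approaches should give the same values
print((via_where == via_mask).all())
print(via_mask.tolist())
```

Both variants leave the original `ages` untouched and return a new object, so the result has to be assigned back to the column, as the one-liners above do.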
