How do I calculate the mean of a column using only certain rows?

Question

asked Jul 27, 2019 in Data Science by sourav (17.6k points)

I'm working with automobile.csv, which can be found on the UCI website. I want to replace some NaNs in the normalized losses attribute. I figured that a better way of doing it is to calculate the mean per symboling value, because symboling affects the value of normalized losses.

So if a row with NaN has a symboling of 3, I only want the mean of the other normalized losses that have 3 as their symboling. How do I achieve this?

Example table:

symb  norm  other  attrs
   1   100   8017      2
   1    90   5019      2
  -1    20   8017      1
  -1    20   8870      1
   1   NaN   8305      3
   0    10   8305      3
   3   200   8221      3

So for each NaN I only want the mean from other rows with the same symboling.

If I use

automobile['normalizedlosses'].fillna(automobile['normalizedlosses'].mean(axis=0), inplace=True)

this would replace all NaNs with the same value, which I don't want.

1 Answer

Shlok Pandey · Answer 1 · 2019-08-01T06:10:21+0000

Use Series.fillna with a Series of per-group means, built with groupby and transform('mean'):

s = automobile.groupby('symb')['norm'].transform('mean')
automobile['norm'] = automobile['norm'].fillna(s)
print(automobile)

   symb   norm  other  attrs
0     1  100.0   8017      2
1     1   90.0   5019      2
2    -1   20.0   8017      1
3    -1   20.0   8870      1
4     1   95.0   8305      3
5     0   10.0   8305      3
6     3  200.0   8221      3
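For anyone who wants to try this without downloading the file, here is a minimal self-contained sketch that rebuilds the example table from the question and applies the same groupby/transform approach. The column names `symb`, `norm`, `other` and `attrs` are the abbreviated ones from the example, not the real column names in automobile.csv. The fallback to the overall mean is an added assumption for the case where every row of a group is NaN; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Example rows from the question (abbreviated column names, not the real CSV headers).
automobile = pd.DataFrame({
    "symb": [1, 1, -1, -1, 1, 0, 3],
    "norm": [100, 90, 20, 20, np.nan, 10, 200],
    "other": [8017, 5019, 8017, 8870, 8305, 8305, 8221],
    "attrs": [2, 2, 1, 1, 3, 3, 3],
})

# For every row, the mean of 'norm' within that row's 'symb' group.
# NaN values are skipped when the mean is computed.
group_mean = automobile.groupby("symb")["norm"].transform("mean")

# Fill each missing value with the mean of its own group.
filled = automobile["norm"].fillna(group_mean)

# Assumption (not in the original answer): if a whole group is NaN,
# its group mean is NaN too, so fall back to the overall column mean.
filled = filled.fillna(automobile["norm"].mean())

automobile["norm"] = filled
print(automobile)
```

transform('mean') returns a Series aligned to the original index, which is why it can be passed straight to fillna. In the example only row 4 is missing, and it receives the mean of the other two symb == 1 rows, (100 + 90) / 2 = 95.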
