I'm working with automobile.csv, which can be found on the UCI website. I want to replace some NaNs in the normalized losses attribute. I figured a better way of doing this is to calculate the mean per symboling value, because symboling affects the value of normalized losses.
So if a row with a NaN has a symboling of 3, I only want the mean of the other normalized losses whose symboling is also 3. How do I achieve this?
Example table:
symb  norm  other attrs
 1    100   8017  2
 1     90   5019  2
-1     20   8017  1
-1     20   8870  1
 1    NaN   8305  3
 0     10   8305  3
 3    200   8221  3
So for the NaN, I only want the mean of the other rows with the same symboling (here, the rows with symboling 1).
If I use
automobile['normalizedlosses'] = automobile['normalizedlosses'].fillna(automobile['normalizedlosses'].mean())
this replaces every NaN with the same overall mean, which I don't want.
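To make the desired behaviour concrete, here is a minimal sketch of a per-group fill on the example table. It assumes pandas, and it assumes the columns are named `symboling` and `normalizedlosses`; the real file may use different names. The idea is to use `groupby(...).transform("mean")` to get, for every row, the mean of its own symboling group, and then pass that aligned Series to `fillna`.

```python
import numpy as np
import pandas as pd

# Toy frame mirroring the example table above.
# Column names are assumptions; adjust them to match the actual CSV.
automobile = pd.DataFrame({
    "symboling": [1, 1, -1, -1, 1, 0, 3],
    "normalizedlosses": [100, 90, 20, 20, np.nan, 10, 200],
})

# For each row, the mean of normalizedlosses within that row's symboling group.
# NaN values are skipped when the mean is computed.
group_means = automobile.groupby("symboling")["normalizedlosses"].transform("mean")

# Fill each NaN with the mean of its own group; non-missing values are untouched.
automobile["normalizedlosses"] = automobile["normalizedlosses"].fillna(group_means)

print(automobile)
```

With this toy data the NaN row has symboling 1, so it is filled with the mean of 100 and 90. A group whose values are all NaN would stay NaN, so a fallback such as the overall mean may still be needed for the real dataset.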