
I'm working with automobile.csv, which can be found on the UCI website. I want to replace some NaNs in the normalized-losses attribute. I figured a better way to do this is to compute the mean per symboling value, because symboling affects the value of normalized losses.

So if a row with a NaN has a symboling of 3, I only want the mean of the normalized losses of the other rows whose symboling is also 3. How do I achieve this?
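If the file is the raw UCI download (imports-85.data), note that it has no header row and marks missing values with "?" rather than leaving them empty, so those cells only become NaN when read_csv is told about the marker. A minimal sketch, assuming a headerless file with that convention; the short inline sample and the column names `symb`, `norm` and `other` are hypothetical stand-ins for the real file and its full set of columns:

```python
import io

import pandas as pd

# Hypothetical stand-in for automobile.csv: no header row, "?" marks a missing value.
raw = io.StringIO(
    "1,100,8017\n"
    "1,?,8305\n"
    "3,200,8221\n"
)

# header=None because there is no header row; names= supplies hypothetical
# column names; na_values="?" converts "?" cells into NaN.
automobile = pd.read_csv(raw, header=None, names=["symb", "norm", "other"], na_values="?")

print(automobile["norm"].isna().sum())  # count of missing normalized losses
```

If your CSV already stores missing values as empty cells, read_csv turns them into NaN without na_values.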

Example table:

symb  norm  other  attrs
   1   100   8017      2
   1    90   5019      2
  -1    20   8017      1
  -1    20   8870      1
   1   NaN   8305      3
   0    10   8305      3
   3   200   8221      3

So for each NaN, I only want the mean of the other rows with the same symboling.
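To see which value each NaN should receive, you can compute the mean per symboling value first. A small sketch that rebuilds the example table as a DataFrame; the column names `symb`, `norm`, `other` and `attrs` are assumptions taken from the table header:

```python
import numpy as np
import pandas as pd

# The example table from the question; np.nan marks the missing value.
automobile = pd.DataFrame({
    "symb":  [1, 1, -1, -1, 1, 0, 3],
    "norm":  [100, 90, 20, 20, np.nan, 10, 200],
    "other": [8017, 5019, 8017, 8870, 8305, 8305, 8221],
    "attrs": [2, 2, 1, 1, 3, 3, 3],
})

# Mean of 'norm' for each symboling value; NaN values are skipped by default.
group_means = automobile.groupby("symb")["norm"].mean()
print(group_means)
```

For symboling 1 the NaN row should get (100 + 90) / 2 = 95.0, whereas the overall mean of the column is 440 / 6 ≈ 73.33.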

If I use:

automobile['normalizedlosses'] = automobile['normalizedlosses'].fillna(automobile['normalizedlosses'].mean())

it replaces every NaN with the same overall mean, which I don't want.

1 Answer


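Use Series.fillna with a Series of group means that has the same index as the original column, built with groupby and transform('mean'):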

# mean of 'norm' per 'symb' group, broadcast back to every row
s = automobile.groupby('symb')['norm'].transform('mean')
# each NaN takes the mean of its own symboling group
automobile['norm'] = automobile['norm'].fillna(s)
print(automobile)

   symb   norm  other  attrs
0     1  100.0   8017      2
1     1   90.0   5019      2
2    -1   20.0   8017      1
3    -1   20.0   8870      1
4     1   95.0   8305      3
5     0   10.0   8305      3
6     3  200.0   8221      3
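One caveat: if every row of some symboling group is NaN, that group's mean is itself NaN, so fillna(s) leaves those rows unfilled. A minimal self-contained sketch that adds a fallback to the overall column mean for that case; the fallback is an extra step, not part of the answer's code, and the symboling -2 row with no known losses is a hypothetical addition to the example data:

```python
import numpy as np
import pandas as pd

automobile = pd.DataFrame({
    "symb": [1, 1, -1, -1, 1, 0, 3, -2],  # -2 is a hypothetical all-NaN group
    "norm": [100, 90, 20, 20, np.nan, 10, 200, np.nan],
})

# Per-row mean of 'norm' within the row's symboling group (same index as the frame).
s = automobile.groupby("symb")["norm"].transform("mean")

# Fill from the group mean first; rows whose whole group is NaN
# then fall back to the overall mean of the original column.
automobile["norm"] = automobile["norm"].fillna(s).fillna(automobile["norm"].mean())

print(automobile)
```

Row 4 gets 95.0 from its group, while the symboling -2 row gets the overall mean, 440 / 6 ≈ 73.33.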
