
Working with census data, I want to replace NaNs in two columns ("workclass" and "native-country") with the respective modes of those two columns. I can get the modes easily:

mode = df.filter(["workclass", "native-country"]).mode()

which returns a dataframe:

   workclass native-country
0    Private  United-States

However,

df.filter(["workclass", "native-country"]).fillna(mode)

does not replace the NaNs in each column with anything, let alone the mode corresponding to that column. Is there a smooth way to do this?

1 Answer


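Take the first row of the mode DataFrame, which gives a Series with one fill value per column, and pass that to fillna. Passing the mode DataFrame itself does not work because fillna aligns a DataFrame on row labels as well as column names, so only row 0 could ever be filled. The result also has to be assigned back to df: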

cols = ["workclass", "native-country"]

df[cols]=df[cols].fillna(df.mode().iloc[0])

Instead of fillna(df.mode().iloc[0]), you can also use fillna(mode.iloc[0]), where mode is the DataFrame already computed in the question.
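To see why the .iloc[0] matters, here is a minimal sketch on made-up data. The frame and its values are assumptions for illustration, not the actual census data. It compares filling with the mode DataFrame against filling with its first row as a Series.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the census data; the values are made up for illustration.
df = pd.DataFrame({
    "workclass": ["Private", np.nan, "Private", "State-gov", np.nan],
    "native-country": ["United-States", "United-States", np.nan, "Mexico", np.nan],
})
cols = ["workclass", "native-country"]

mode = df[cols].mode()  # DataFrame with a single row labelled 0

# DataFrame argument: aligned on row labels AND column names,
# so only NaNs in row 0 could be filled.
filled_by_frame = df[cols].fillna(mode)

# Series argument: one fill value per column, applied to every row.
filled_by_series = df[cols].fillna(mode.iloc[0])

print(filled_by_frame.isna().sum().sum())   # NaNs remain
print(filled_by_series.isna().sum().sum())  # no NaNs remain
```

Neither call changes df itself, which is why the answer assigns the result back with df[cols] = ....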

Example:

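import pandas as pd

# Lists of unequal length; the shorter ones are padded with NaN.
d = {
    'P3': [7, 9, 9, 9, 3],
    'P2': [8, 8, 9],
    'P1': [8, 9, 9],
}

# orient='index' makes each key a row, so transpose to turn the keys into columns.
df = pd.DataFrame.from_dict(d, orient='index').transpose()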

Then df is

   P3   P2   P1
0   7    8    8
1   9    8    9
2   9    9    9
3   9  NaN  NaN
4   3  NaN  NaN

After this,

# Per-column modes of P1 and P2; the first row supplies the fill values.
modes = df.filter(["P1", "P2"]).mode()
df[["P1", "P2"]] = df[["P1", "P2"]].fillna(value=modes.iloc[0])

we get that df is

   P3   P2   P1
0   7    8    8
1   9    8    9
2   9    9    9
3   9    8    9
4   3    8    9
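If you need this in more than one place, the steps above can be wrapped in a small function. The sketch below is one possible way to do it; fill_with_mode is a hypothetical helper name, not part of pandas. It returns a copy instead of modifying the input, and it guards against the case where every selected column is entirely NaN, in which mode() returns no rows and .iloc[0] would raise an IndexError.

```python
import numpy as np
import pandas as pd


def fill_with_mode(df, cols):
    """Return a copy of df with NaNs in cols replaced by each column's mode.

    Hypothetical helper for illustration; not a pandas function.
    """
    out = df.copy()
    modes = out[cols].mode()
    if modes.empty:
        # No non-NaN values in any selected column, so nothing to fill with.
        return out
    # First row of mode() holds one fill value per column.
    out[cols] = out[cols].fillna(modes.iloc[0])
    return out


# Same data as the example above, written out with explicit NaNs.
df = pd.DataFrame({
    "P3": [7, 9, 9, 9, 3],
    "P2": [8, 8, 9, np.nan, np.nan],
    "P1": [8, 9, 9, np.nan, np.nan],
})

result = fill_with_mode(df, ["P1", "P2"])
print(result)
```

When a column has several equally frequent values, mode() returns them sorted across multiple rows, so .iloc[0] picks the smallest one.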
