
I have a dataframe containing a single column of IDs and all other columns are numerical values for which I want to compute z-scores. Here's a subsection of it:

ID      Age    BMI     Risk Factor
PT 6    48     19.3    4
PT 8    43     20.9    NaN
PT 2    39     18.1    3
PT 9    41     19.5    NaN

Some of my columns contain NaN values, which I do not want to include in the z-score calculations, so I intend to use the solution offered in this question: "How to zscore normalize pandas column with nans?"

df['zscore'] = (df.a - df.a.mean())/df.a.std(ddof=0)

I'm interested in applying this solution to all of my columns except the ID column to produce a new dataframe which I can save as an Excel file using

df2.to_excel("Z-Scores.xlsx")

So basically, how can I compute z-scores for each column (ignoring NaN values) and put everything into a new dataframe?

SIDENOTE: there is a concept in pandas called "indexing" which intimidates me because I do not understand it well. If indexing is a crucial part of solving this problem, please dumb down your explanation of indexing.

1 Answer

by (41.4k points)

1. Build a list of the dataframe's columns.

2. Remove from that list the columns for which you do not want to calculate z-scores (here, only 'ID').

In [66]:

cols = list(df.columns)
cols.remove('ID')
df[cols]

Out[66]:

   Age   BMI  Risk Factor
0   48  19.3            4
1   43  20.9          NaN
2   39  18.1            3
3   41  19.5          NaN

In [68]:

# now iterate over the remaining columns and create a new zscore column
for col in cols:
    col_zscore = col + '_zscore'
    # mean() and std() skip NaN by default; ddof=0 gives the population standard deviation
    df[col_zscore] = (df[col] - df[col].mean())/df[col].std(ddof=0)

df

Out[68]:

     ID  Age   BMI  Risk Factor  Age_zscore  BMI_zscore  Risk Factor_zscore
0  PT 6   48  19.3            4    1.569614   -0.150946                   1
1  PT 8   43  20.9          NaN    0.074744    1.459148                 NaN
2  PT 2   39  18.1            3   -1.121153   -1.358517                  -1
3  PT 9   41  19.5          NaN   -0.523205    0.050315                 NaN
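The loop above adds the z-score columns to the original dataframe, while the question asks for a separate dataframe to save to Excel. One possible way to get that is sketched below; it is not part of the original answer. The helper name zscore_frame and its id_col parameter are made up for illustration, and the sample data is copied from the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["PT 6", "PT 8", "PT 2", "PT 9"],
    "Age": [48, 43, 39, 41],
    "BMI": [19.3, 20.9, 18.1, 19.5],
    "Risk Factor": [4, np.nan, 3, np.nan],
})


def zscore_frame(frame, id_col="ID"):
    """Return a new dataframe: id_col unchanged, every other column z-scored.

    Hypothetical helper, assuming all non-ID columns are numeric.
    """
    values = frame.drop(columns=[id_col])
    # mean() and std() skip NaN by default, so a NaN cell stays NaN in the
    # result without affecting the other rows; ddof=0 is the population std.
    z = (values - values.mean()) / values.std(ddof=0)
    # Put the ID column back in front. The input frame is not modified.
    return pd.concat([frame[[id_col]], z], axis=1)


df2 = zscore_frame(df)
```

From here, df2.to_excel("Z-Scores.xlsx", index=False) should write the file without the row numbers, provided an Excel writer such as openpyxl is installed. No explicit indexing is needed: the default row numbers 0 to 3 are only used to line the ID column up with the z-scores.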


asked Aug 31, 2019 in Data Science by sourav (17.6k points)
