in Data Science by (17.6k points)

I have a dataframe with a single column of IDs; all other columns hold numerical values for which I want to compute z-scores. Here's a subsection of it:

ID      Age    BMI    Risk Factor
PT 6    48     19.3   4
PT 8    43     20.9   NaN
PT 2    39     18.1   3
PT 9    41     19.5   NaN

Some of my columns contain NaN values, which I do not want to include in the z-score calculations, so I intend to use a solution offered in answer to this question: how to zscore normalize pandas column with nans?

df['zscore'] = (df.a - df.a.mean())/df.a.std(ddof=0)
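A quick way to confirm that this formula ignores NaN: pandas' Series.mean() and Series.std() skip missing values by default (skipna=True), so only the non-missing entries feed the statistics and the NaN rows simply come out as NaN. The sketch below rebuilds the four sample rows by hand (an assumption; the real data would be loaded from a file) and applies the formula to the Risk Factor column. Two details worth noting: a column name containing a space cannot be reached with df.a-style attribute access, so bracket access is used, and ddof=0 divides by n (population standard deviation) instead of pandas' default n - 1.

```python
import numpy as np
import pandas as pd

# The four sample rows from the question, typed in by hand
df = pd.DataFrame({
    "ID": ["PT 6", "PT 8", "PT 2", "PT 9"],
    "Age": [48, 43, 39, 41],
    "BMI": [19.3, 20.9, 18.1, 19.5],
    "Risk Factor": [4, np.nan, 3, np.nan],
})

# The name contains a space, so attribute access is impossible; use brackets
col = df["Risk Factor"]

# mean() and std() skip NaN by default, so only 4 and 3 are used here:
# mean = 3.5 and population std (ddof=0) = 0.5
z = (col - col.mean()) / col.std(ddof=0)
print(z.tolist())  # NaN inputs stay NaN in the result
```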

I'm interested in applying this solution to all of my columns except the ID column to produce a new dataframe which I can save as an Excel file using

df2.to_excel("Z-Scores.xlsx")

So basically: how can I compute z-scores for each column (ignoring NaN values) and put everything into a new dataframe?

SIDENOTE: there is a concept in pandas called "indexing" which intimidates me because I do not understand it well. If indexing is a crucial part of solving this problem, please dumb down your explanation of indexing.
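On the indexing side note: the index is just the set of labels pandas attaches to the rows (0, 1, 2, 3 by default). A minimal sketch, assuming the same four hand-typed sample rows, of how it can help here: set_index("ID") moves the IDs into those row labels, so every remaining column is numeric and the z-score formula can be applied to the whole frame at once. data.mean() and data.std(ddof=0) each return one value per column (NaN skipped), and pandas matches each value to its column by name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": ["PT 6", "PT 8", "PT 2", "PT 9"],
    "Age": [48, 43, 39, 41],
    "BMI": [19.3, 20.9, 18.1, 19.5],
    "Risk Factor": [4, np.nan, 3, np.nan],
})

# Turn the ID column into the row labels; it is no longer a data column,
# so the arithmetic below only sees Age, BMI and Risk Factor
data = df.set_index("ID")

# Column-wise means and stds (NaN skipped) are matched to columns by name
df2 = (data - data.mean()) / data.std(ddof=0)

# Rows can now be looked up by their ID label
print(df2.loc["PT 6"])
```

Because the IDs live in the index, df2.to_excel("Z-Scores.xlsx") writes them as the first column of the sheet; df2.reset_index() turns them back into an ordinary ID column if that is preferred.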

1 Answer

by (40.7k points)

1. Build a list of the column names.

2. Remove the columns for which you do not want to calculate z-scores (here, just 'ID').

In [66]:

cols = list(df.columns)   # every column name in the dataframe
cols.remove('ID')         # drop the one column that should not be z-scored
df[cols]

Out[66]:

   Age   BMI  Risk Factor
0   48  19.3          4.0
1   43  20.9          NaN
2   39  18.1          3.0
3   41  19.5          NaN
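pandas can also produce that list of columns without a manual remove. A small sketch, again on hand-typed sample rows: df.columns.drop("ID") returns the column labels minus ID without changing df, and select_dtypes(include="number") keeps every numeric column, which may be handier if the real sheet has more than one text column (an assumption about the data, not something the question states).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": ["PT 6", "PT 8", "PT 2", "PT 9"],
    "Age": [48, 43, 39, 41],
    "BMI": [19.3, 20.9, 18.1, 19.5],
    "Risk Factor": [4, np.nan, 3, np.nan],
})

# All column labels except ID (df itself is left unchanged)
cols = df.columns.drop("ID")

# Alternatively: every column with a numeric dtype, whatever it is called
numeric_cols = df.select_dtypes(include="number").columns

print(list(cols))
print(list(numeric_cols))
```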

In [68]:

# now iterate over the remaining columns and create a new z-score column for each
for col in cols:
    col_zscore = col + '_zscore'
    df[col_zscore] = (df[col] - df[col].mean())/df[col].std(ddof=0)
df

Out[68]:

     ID  Age   BMI  Risk Factor  Age_zscore  BMI_zscore  Risk Factor_zscore
0  PT 6   48  19.3          4.0    1.569614   -0.150946                 1.0
1  PT 8   43  20.9          NaN    0.074744    1.459148                 NaN
2  PT 2   39  18.1          3.0   -1.121153   -1.358517                -1.0
3  PT 9   41  19.5          NaN   -0.523205    0.050315                 NaN
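The loop above adds the _zscore columns to df itself, while the question asked for a new dataframe to save. A minimal variant, assuming the same hand-typed sample rows, writes the results into a separate df2 that holds only the ID column plus the z-scores and leaves df unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": ["PT 6", "PT 8", "PT 2", "PT 9"],
    "Age": [48, 43, 39, 41],
    "BMI": [19.3, 20.9, 18.1, 19.5],
    "Risk Factor": [4, np.nan, 3, np.nan],
})

cols = [c for c in df.columns if c != "ID"]

# New frame that starts with just the IDs, so every row stays identifiable
df2 = df[["ID"]].copy()
for col in cols:
    # Same formula as above; NaN is skipped in mean()/std() and stays NaN in the output
    df2[col + "_zscore"] = (df[col] - df[col].mean()) / df[col].std(ddof=0)

print(df2)
```

Saving is then df2.to_excel("Z-Scores.xlsx", index=False), where index=False leaves out the 0-3 row numbers; writing .xlsx files requires an Excel engine such as openpyxl to be installed.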


Welcome to Intellipaat Community. Get your technical queries answered by top developers!

