Here is my problem: I have a dataframe like this:

    Depr_1  Depr_2  Depr_3
S3       0       5       9
S2       4      11       8
S1       6      11      12
S5       0       4      11
S4       4       8       8

I just want to calculate the mean over the whole dataframe. The following doesn't work, because it returns one mean per column rather than a single value:

df.mean()

Then I came up with:

df.mean().mean()

But this trick won't work for computing the standard deviation. My final attempts were:

df.values.mean()    # get_values() was removed in pandas 1.0; values is the equivalent

df.values.std()

Except that in these cases the mean() and std() being called are NumPy's, not pandas'. That's not a problem for the mean, but it is for std: the pandas function defaults to ddof=1, whereas the NumPy one defaults to ddof=0.
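To make the ddof difference concrete, here is a minimal sketch on made-up numbers (the `data` list is purely illustrative and not taken from the dataframe above). With ddof=0 the sum of squared deviations is divided by n, with ddof=1 by n - 1:

```python
import numpy as np
import pandas as pd

data = [1.0, 2.0, 3.0, 4.0]  # hypothetical sample values, for illustration only

std_numpy = np.std(data)                # NumPy default, ddof=0: divides by n
std_pandas = pd.Series(data).std()      # pandas default, ddof=1: divides by n - 1
std_numpy_ddof1 = np.std(data, ddof=1)  # NumPy with ddof=1 should match pandas

print(std_numpy, std_pandas, std_numpy_ddof1)
```

The first value should come out smaller than the other two, which should agree with each other.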

1 Answer


You can convert the dataframe into a single Series with stack (this turns the 5x3 table into one column of 15 values) and then take the standard deviation:

df.stack().std()         # pandas default degrees of freedom is 1

Or you can use the values attribute to convert the pandas dataframe to a NumPy array before taking the standard deviation, passing ddof=1 explicitly:

df.values.std(ddof=1)    # numpy default degrees of freedom is 0
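As a rough end-to-end check, the sketch below rebuilds the dataframe from the question and computes the overall mean and standard deviation both ways. It uses to_numpy() in place of values, which is an assumption about the installed pandas version (0.24 or later); the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Dataframe from the question, rebuilt by hand
df = pd.DataFrame(
    {
        "Depr_1": [0, 4, 6, 0, 4],
        "Depr_2": [5, 11, 11, 4, 8],
        "Depr_3": [9, 8, 12, 11, 8],
    },
    index=["S3", "S2", "S1", "S5", "S4"],
)

flat = df.stack()                      # Series holding all 15 values
overall_mean = flat.mean()             # mean over the whole dataframe
std_pandas = flat.std()                # pandas default, ddof=1
std_numpy = df.to_numpy().std(ddof=1)  # NumPy, ddof set explicitly

print(overall_mean, std_pandas, std_numpy)
```

Both standard deviations should agree up to floating-point rounding, since each divides by n - 1 over the same 15 values.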
