in Python by (50.2k points)
I am working with a pandas DataFrame and I want to break down the variance of one variable.

For instance, say I have a column named 'Degrees' that is indexed by date, city, and night vs. day. I want to find out what portion of the variation in this series comes from cross-sectional city variation...

In Stata, I have used fixed effects and looked at the R^2. Essentially, I want to see the ANOVA breakdown of "Degrees" by the other three columns.

1 Answer

0 votes
by (108k points)

I set up a direct comparison of R and statsmodels and found that their results can differ. Below is an example of ANOVA on a pandas DataFrame, shown next to R's output for the same data:

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# R code on an R sample dataset:
# > anova(with(ChickWeight, lm(weight ~ Time + Diet)))
# Analysis of Variance Table
#
# Response: weight
#            Df  Sum Sq Mean Sq  F value    Pr(>F)
# Time        1 2042344 2042344 1576.460 < 2.2e-16 ***
# Diet        3  129876   43292   33.417 < 2.2e-16 ***
# Residuals 573  742336    1296
#
# Export the dataset from R so it can be read in Python:
# > write.csv(file='ChickWeight.csv', x=ChickWeight, row.names=F)

cw = pd.read_csv('ChickWeight.csv')

# C() marks Diet as a categorical variable
cw_lm = ols('weight ~ Time + C(Diet)', data=cw).fit()

# typ=2 requests Type II sums of squares. R's anova() reports sequential
# (Type I) sums of squares, which is why the Time row below differs from
# R's table; typ=1 is the sequential option.
print(sm.stats.anova_lm(cw_lm, typ=2))
#                  sum_sq   df            F         PR(>F)
# C(Diet)    129876.056995    3    33.416570   6.473189e-20
# Time      2016357.148493    1  1556.400956  1.803038e-165
# Residual   742336.119560  573          NaN            NaN
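To get at the "what portion of the variation comes from city" part of the question, one rough option is to compute the between-group share of the total sum of squares for each column. For a single factor this equals the R^2 of a regression on that factor's dummies alone. The sketch below uses only pandas; the function name between_group_share and the toy data are made up for illustration, and the column names 'City' and 'Period' are assumptions about how the real DataFrame is laid out.

```python
import pandas as pd

def between_group_share(df, value, group):
    """Share of the total sum of squares of `value` explained by `group` means."""
    grand_mean = df[value].mean()
    # Replace each observation by the mean of its group
    group_means = df.groupby(group)[value].transform("mean")
    ss_total = ((df[value] - grand_mean) ** 2).sum()
    ss_between = ((group_means - grand_mean) ** 2).sum()
    return ss_between / ss_total

# Hypothetical toy data, invented for this example only
toy = pd.DataFrame({
    "City": ["A", "A", "B", "B"],
    "Period": ["day", "night", "day", "night"],
    "Degrees": [10.0, 20.0, 30.0, 40.0],
})

shares = {col: between_group_share(toy, "Degrees", col) for col in ["City", "Period"]}
print(shares)
```

The shares only add up neatly when the design is balanced, as in this toy data. With unbalanced data the factors overlap, and a model-based table such as anova_lm is the safer way to attribute variation.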
