
I'm doing multiple linear regression with statsmodels.formula.api (version 0.9.0) on Windows 10. After fitting the model and generating the summary with the following lines, I get the results as a Summary object:

import statsmodels.api as sm

X_opt = X[:, [0, 1, 2, 3]]
regressor_OLS = sm.OLS(endog=y, exog=X_opt).fit()
regressor_OLS.summary()

                          OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.951
Model:                            OLS   Adj. R-squared:                  0.948
Method:                 Least Squares   F-statistic:                     296.0
Date:                Wed, 08 Aug 2018   Prob (F-statistic):           4.53e-30
Time:                        00:46:48   Log-Likelihood:                -525.39
No. Observations:                  50   AIC:                             1059.
Df Residuals:                      46   BIC:                             1066.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       5.012e+04   6572.353      7.626      0.000    3.69e+04    6.34e+04
x1             0.8057      0.045     17.846      0.000       0.715       0.897
x2            -0.0268      0.051     -0.526      0.602      -0.130       0.076
x3             0.0272      0.016      1.655      0.105      -0.006       0.060
==============================================================================
Omnibus:                       14.838   Durbin-Watson:                   1.282
Prob(Omnibus):                  0.001   Jarque-Bera (JB):               21.442
Skew:                          -0.949   Prob(JB):                     2.21e-05
Kurtosis:                       5.586   Cond. No.                     1.40e+06
==============================================================================

I need to perform backward elimination on the P values at a significance level of 0.05. For this, I have to remove the predictor with the highest P value and run the code again.

I want to know whether there is a way to extract the P values from the summary object, so that I can run a loop with a condition and find the significant variables without repeating the steps manually.

Thank you

1 Answer


The answer from @Michael B works well, but requires "recreating" the table. The table itself is readily available from the summary().tables attribute. Each table in this attribute (which is a list of tables) is a SimpleTable, which has methods for outputting different formats. We can then read any of those formats back in as a pd.DataFrame:

import statsmodels.api as sm
import pandas as pd

model = sm.OLS(y, x)
results = model.fit()
results_summary = results.summary()

# Note that tables is a list. The table at index 1 is the "core" coefficient table.
# Additionally, read_html returns a list of DataFrames, so we want index 0.
results_as_html = results_summary.tables[1].as_html()
pd.read_html(results_as_html, header=0, index_col=0)[0]
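The p-values you need are the P>|t| column of that DataFrame. If all you want is the p-values, the fitted results object also exposes them directly through its pvalues attribute, so you do not have to parse the summary at all. Below is a minimal sketch of a backward-elimination loop built on that attribute; it assumes X is a NumPy array whose first column is the constant term, y is the target vector, and the 0.05 threshold is just the significance level you mentioned.

import numpy as np
import statsmodels.api as sm

def backward_elimination(y, X, sl=0.05):
    """Repeatedly drop the predictor with the highest p-value until
    every remaining p-value is at or below the significance level sl."""
    X_opt = X.copy()
    while True:
        results = sm.OLS(endog=y, exog=X_opt).fit()
        pvalues = results.pvalues              # one p-value per column of X_opt
        worst = pvalues.argmax()               # position of the least significant predictor
        if pvalues[worst] <= sl:
            return results, X_opt              # all remaining predictors are significant
        X_opt = np.delete(X_opt, worst, axis=1)  # drop that column and refit

final_results, X_final = backward_elimination(y, X)
final_results.summary()

This is just one way to structure the loop; the key point is that results.pvalues gives you the same numbers as the P>|t| column of the summary, in the same order as the columns of the exog array.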
