in Python by (16.4k points)

I'm doing multiple linear regression with statsmodels.formula.api (version 0.9.0) on Windows 10. After fitting the model and calling summary with the following lines, I get the output below as a summary object.

import statsmodels.formula.api as sm

# keep the candidate predictors (column 0 is the constant term)
X_opt = X[:, [0, 1, 2, 3]]
regressor_OLS = sm.OLS(endog=y, exog=X_opt).fit()
regressor_OLS.summary()

                          OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.951
Model:                            OLS   Adj. R-squared:                  0.948
Method:                 Least Squares   F-statistic:                     296.0
Date:                Wed, 08 Aug 2018   Prob (F-statistic):           4.53e-30
Time:                        00:46:48   Log-Likelihood:                -525.39
No. Observations:                  50   AIC:                             1059.
Df Residuals:                      46   BIC:                             1066.
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       5.012e+04   6572.353      7.626      0.000    3.69e+04    6.34e+04
x1             0.8057      0.045     17.846      0.000       0.715       0.897
x2            -0.0268      0.051     -0.526      0.602      -0.130       0.076
x3             0.0272      0.016      1.655      0.105      -0.006       0.060
==============================================================================
Omnibus:                       14.838   Durbin-Watson:                   1.282
Prob(Omnibus):                  0.001   Jarque-Bera (JB):               21.442
Skew:                          -0.949   Prob(JB):                     2.21e-05
Kurtosis:                       5.586   Cond. No.                     1.40e+06
==============================================================================

I need to do backward elimination based on p-values with a significance level of 0.05. To do this, I have to remove the predictor with the highest p-value and run the regression again.

I wanted to know whether there is a way to extract the p-values from the summary object, so I can run a loop with that condition and find the significant variables without repeating the steps manually.

Thank you

1 Answer

by (26.4k points)

The answer from @Michael B works well, but it requires "recreating" the table. The table itself is readily available from the summary().tables attribute. Each table in this attribute (which is a list of tables) is a SimpleTable, which has methods for outputting different formats. We can then read any of those formats back as a pd.DataFrame:

import pandas as pd
import statsmodels.api as sm

model = sm.OLS(y, x)
results = model.fit()
results_summary = results.summary()

# Note that tables is a list. The table at index 1 is the "core" coefficient table.
# read_html also returns a list of DataFrames, so we take element 0.
results_as_html = results_summary.tables[1].as_html()
pd.read_html(results_as_html, header=0, index_col=0)[0]
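If all you need is the p-values for the backward-elimination loop described in the question, the fitted results object also exposes them directly as results.pvalues (the "P>|t|" column of the DataFrame above holds the same numbers). Below is a minimal sketch of such a loop, assuming X is the full design matrix with a constant column and y is the target from the question; the names X_opt, worst and significance_level are just illustrative:

import numpy as np
import statsmodels.api as sm

significance_level = 0.05
X_opt = X[:, [0, 1, 2, 3]]  # start from the full set of candidate columns, as in the question

while True:
    results = sm.OLS(endog=y, exog=X_opt).fit()
    pvalues = results.pvalues                  # one p-value per column of X_opt
    worst = int(np.argmax(pvalues))            # index of the least significant predictor
    if pvalues[worst] <= significance_level:   # everything left is significant, so stop
        break
    X_opt = np.delete(X_opt, worst, axis=1)    # drop that column and refit

print(results.summary())

Reading results.pvalues avoids round-tripping through HTML on every iteration; the DataFrame route is mainly useful when you want the whole coefficient table at once.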
