
I'm doing multiple linear regression with statsmodels.formula.api (version 0.9.0) on Windows 10. After fitting the model and generating the summary with the following lines, I get the results as a Summary object:

import statsmodels.api as sm

X_opt = X[:, [0, 1, 2, 3]]
regressor_OLS = sm.OLS(endog=y, exog=X_opt).fit()
regressor_OLS.summary()

                          OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.951
Model:                            OLS   Adj. R-squared:                  0.948
Method:                 Least Squares   F-statistic:                     296.0
Date:                Wed, 08 Aug 2018   Prob (F-statistic):           4.53e-30
Time:                        00:46:48   Log-Likelihood:                -525.39
No. Observations:                  50   AIC:                             1059.
Df Residuals:                      46   BIC:                             1066.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       5.012e+04   6572.353      7.626      0.000    3.69e+04    6.34e+04
x1             0.8057      0.045     17.846      0.000       0.715       0.897
x2            -0.0268      0.051     -0.526      0.602      -0.130       0.076
x3             0.0272      0.016      1.655      0.105      -0.006       0.060
==============================================================================
Omnibus:                       14.838   Durbin-Watson:                   1.282
Prob(Omnibus):                  0.001   Jarque-Bera (JB):               21.442
Skew:                          -0.949   Prob(JB):                     2.21e-05
Kurtosis:                       5.586   Cond. No.                     1.40e+06
==============================================================================

I need to perform backward elimination on the P values at a significance level of 0.05. For this, I have to remove the predictor with the highest P value and run the code again.

I want to know whether there is a way to extract the P values from the summary object, so that I can run a loop with a condition and find the significant variables without repeating the steps manually.

Thank you

1 Answer


The answer from @Michael B works well, but requires "recreating" the table. The table itself is readily available from the summary().tables attribute. Each table in this attribute (which is a list of tables) is a SimpleTable, which has methods for outputting different formats. We can then read any of those formats back in as a pd.DataFrame:

import statsmodels.api as sm
import pandas as pd

model = sm.OLS(y, x)
results = model.fit()
results_summary = results.summary()

# Note that tables is a list. The table at index 1 is the "core" coefficient table.
# Additionally, read_html returns a list of DataFrames, so we want index 0.
results_as_html = results_summary.tables[1].as_html()
pd.read_html(results_as_html, header=0, index_col=0)[0]
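The p-values you need are the P>|t| column of that DataFrame. If all you want is the p-values, the fitted results object also exposes them directly through its pvalues attribute, so you do not have to parse the summary at all. Below is a minimal sketch of a backward-elimination loop built on that attribute; it assumes X is a NumPy array whose first column is the constant term, y is the target vector, and the 0.05 threshold is just the significance level you mentioned.

import numpy as np
import statsmodels.api as sm

def backward_elimination(y, X, sl=0.05):
    """Repeatedly drop the predictor with the highest p-value until
    every remaining p-value is at or below the significance level sl."""
    X_opt = X.copy()
    while True:
        results = sm.OLS(endog=y, exog=X_opt).fit()
        pvalues = results.pvalues              # one p-value per column of X_opt
        worst = pvalues.argmax()               # position of the least significant predictor
        if pvalues[worst] <= sl:
            return results, X_opt              # all remaining predictors are significant
        X_opt = np.delete(X_opt, worst, axis=1)  # drop that column and refit

final_results, X_final = backward_elimination(y, X)
final_results.summary()

This is just one way to structure the loop; the key point is that results.pvalues gives you the same numbers as the P>|t| column of the summary, in the same order as the columns of the exog array.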
