in Python by (50.2k points)

I have created the multiple linear regression model below, and from it I want to calculate the adjusted R-squared. The 'score' method gives me the R-squared (the coefficient of determination), but not the adjusted R-squared. Kindly help!

import pandas as pd  # import the pandas module
import numpy as np

df = pd.read_csv('/Users/jeangelj/Documents/training/linexdata.csv', sep=',')
df

    AverageNumberofTickets  NumberofEmployees  ValueofContract       Industry
 0                       1                 51            25750         Retail
 1                       9                 68            25000       Services
 2                      20                 67            40000       Services
 3                       1                124            35000         Retail
 4                       8                124            25000  Manufacturing
 5                      30                134            50000       Services
 6                      20                157            48000         Retail
 7                       8                190            32000         Retail
 8                      20                205            70000         Retail
 9                      50                230            75000  Manufacturing
10                      35                265            50000  Manufacturing
11                      65                296            75000       Services
12                      35                336            50000  Manufacturing
13                      60                359            75000  Manufacturing
14                      85                403            81000       Services
15                      40                418            60000         Retail
16                      75                437            53000       Services
17                      85                451            90000       Services
18                      65                465            70000         Retail
19                      95                491           100000       Services

from sklearn.linear_model import LinearRegression

model = LinearRegression()
X, y = df[['NumberofEmployees', 'ValueofContract']], df.AverageNumberofTickets
model.fit(X, y)
model.score(X, y)
# 0.87764337132340009

1 Answer

by (108k points)

There are several ways to calculate the R^2 and the adjusted R^2. Four of them are shown below, and all give the same values for this data:
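All of these methods rely on the same definitions, with n the number of observations (`len(y)`, 20 rows here) and p the number of predictors excluding the intercept (`X.shape[1]`, 2 here):

```latex
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}, \quad
SS_{\text{res}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \quad
SS_{\text{tot}} = \sum_{i=1}^{n} (y_i - \bar{y})^2

R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
```

For this data the correction factor is (n - 1)/(n - p - 1) = 19/17.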

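from sklearn.linear_model import LinearRegression

model = LinearRegression()
X, y = df[['NumberofEmployees', 'ValueofContract']], df.AverageNumberofTickets
model.fit(X, y)

# Reference definitions: SST = SSR + SSE, i.e. total sum of squares =
# regression (explained) sum of squares + residual (error) sum of squares.
# This decomposition holds for least squares with an intercept.

# compute with formulas from the theory
yhat = model.predict(X)
SS_Residual = sum((y - yhat)**2)
SS_Total = sum((y - np.mean(y))**2)
r_squared = 1 - float(SS_Residual) / SS_Total
# len(y) = number of observations, X.shape[1] = number of predictors
adjusted_r_squared = 1 - (1 - r_squared) * (len(y) - 1) / (len(y) - X.shape[1] - 1)
print(r_squared, adjusted_r_squared)
# 0.877643371323 0.863248473832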
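The SST = SSR + SSE decomposition can be checked numerically. The sketch below is only an illustration: it uses made-up synthetic data (not linexdata.csv) and fits ordinary least squares with plain NumPy (`np.linalg.lstsq` plus an explicit column of ones for the intercept) instead of scikit-learn, then computes R^2 both as 1 - SSE/SST and as SSR/SST.

```python
import numpy as np

# Synthetic data, an assumption for illustration only (not the CSV above)
rng = np.random.default_rng(0)
n, p = 20, 2
X = rng.normal(size=(n, p))
y = 3.0 + X @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=n)

# Ordinary least squares with an explicit intercept column
X1 = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
yhat = X1 @ beta

sst = np.sum((y - y.mean()) ** 2)     # total sum of squares
ssr = np.sum((yhat - y.mean()) ** 2)  # regression (explained) sum of squares
sse = np.sum((y - yhat) ** 2)         # residual (error) sum of squares

r2_from_sse = 1 - sse / sst
r2_from_ssr = ssr / sst
adj_r2 = 1 - (1 - r2_from_sse) * (n - 1) / (n - p - 1)

print(np.isclose(sst, ssr + sse), np.isclose(r2_from_sse, r2_from_ssr))
print(r2_from_sse, adj_r2)
```

Both checks print `True` because, with an intercept, the least-squares residuals are orthogonal to the fitted values.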

# compute with sklearn linear_model: no function to compute adjusted R^2 directly
# could be found in the scikit-learn documentation, so apply the formula to model.score()
print(model.score(X, y), 1 - (1 - model.score(X, y)) * (len(y) - 1) / (len(y) - X.shape[1] - 1))
# 0.877643371323 0.863248473832
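Because the adjustment is plain arithmetic on R^2, it can be wrapped in a small reusable helper. `adjusted_r2` is a hypothetical name introduced here, not a scikit-learn function; `n` corresponds to `len(y)` and `p` to `X.shape[1]` in the code above.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Return adjusted R^2 given plain R^2, n observations and p predictors.

    p counts predictors only, not the intercept.
    (Hypothetical helper, not part of scikit-learn.)
    """
    if n - p - 1 <= 0:
        raise ValueError("adjusted R^2 needs n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# R^2 reported for this data set: n = 20 rows, p = 2 predictors
print(adjusted_r2(0.87764337132340009, n=20, p=2))  # approximately 0.863248473832
```

With the fitted scikit-learn model, the call would be `adjusted_r2(model.score(X, y), n=len(y), p=X.shape[1])`.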

# compute with statsmodels: sm.OLS does not add an intercept,
# so the constant column has to be added manually
import statsmodels.api as sm

X1 = sm.add_constant(X)
result = sm.OLS(y, X1).fit()
# print(dir(result))
print(result.rsquared, result.rsquared_adj)
# 0.877643371323 0.863248473832
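The constant matters for what `rsquared` means. According to the statsmodels documentation for `RegressionResults.rsquared`, when the constant is omitted R^2 is computed against the uncentered total sum of squares (the sum of y_i^2) rather than the sum of (y_i - mean(y))^2. scikit-learn's `LinearRegression` fits an intercept by default (`fit_intercept=True`), which is why the two libraries agree once `sm.add_constant` is used. This NumPy sketch, on made-up synthetic data, shows how far apart the two definitions can be for a fit without an intercept:

```python
import numpy as np

# Synthetic data, an assumption for illustration only
rng = np.random.default_rng(1)
n = 30
x = rng.normal(size=n)
y = 5.0 + 2.0 * x + rng.normal(size=n)  # the true model has an intercept of 5

# Fit WITHOUT an intercept column, analogous to sm.OLS(y, X) without add_constant
X_no_const = x.reshape(-1, 1)
beta, *_ = np.linalg.lstsq(X_no_const, y, rcond=None)
sse = np.sum((y - X_no_const @ beta) ** 2)

r2_uncentered = 1 - sse / np.sum(y ** 2)             # definition used when no constant is present
r2_centered = 1 - sse / np.sum((y - y.mean()) ** 2)  # the usual (centered) definition

print(r2_uncentered, r2_centered)
```

Since the sum of y_i^2 equals the centered sum plus n times mean(y)^2, the uncentered value is always at least as large and can look deceptively good.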

# compute with statsmodels, another way, using the formula API
# (the formula interface adds the intercept automatically)
import statsmodels.formula.api as smf

result = smf.ols(formula="AverageNumberofTickets ~ NumberofEmployees + ValueofContract", data=df).fit()
# print(result.summary())
print(result.rsquared, result.rsquared_adj)
# 0.877643371323 0.863248473832

