# Screening (multi)collinearity in a regression model

I hope this won't turn into an "ask-and-answer" question... here goes: (multi)collinearity refers to extremely high correlations between the predictors in a regression model. How to cure it? Well, sometimes you don't need to "cure" collinearity at all, since it doesn't affect the model as a whole (its overall fit and predictions), only the interpretation of the effects of individual predictors.
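
A small simulation can check the claim that collinearity leaves the model as a whole intact but muddies individual coefficients. This is only an illustrative sketch: the data, the noise level `0.01` and the names `fit_one`/`fit_both` are made up for this example. It adds a near-copy of `x1` to the model, then compares R² and the standard error of the `x1` coefficient.

```r
# Hypothetical simulated data: x2 is an almost exact copy of x1
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)   # near collinearity with x1
y  <- 1 + 2 * x1 + rnorm(n)

fit_one  <- lm(y ~ x1)           # x1 alone
fit_both <- lm(y ~ x1 + x2)      # x1 plus its near-copy

# Overall fit: R^2 barely changes when the redundant predictor is added
r2 <- c(one  = summary(fit_one)$r.squared,
        both = summary(fit_both)$r.squared)

# Individual effects: the standard error of the x1 coefficient is inflated
se_x1 <- c(one  = coef(summary(fit_one))["x1", "Std. Error"],
           both = coef(summary(fit_both))["x1", "Std. Error"])

print(r2)
print(se_x1)
print(coef(fit_both))   # x1 and x2 share the effect; individually they are unstable
```

Expect R² to stay almost the same while the standard error of `x1` grows many times over, which is exactly the "interpretation" problem rather than a "model" problem.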

One way to spot collinearity is to regress each predictor on all the other predictors, compute R², and, if it is larger than .9 (or .95), consider that predictor redundant. This is one "method"... what about other approaches? Some of them are time-consuming, like excluding predictors from the model one at a time and watching how the remaining b-coefficients change: under collinearity, they change noticeably.
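
The auxiliary-regression check and the slower drop-one comparison can both be sketched in base R. This is a minimal illustration on simulated data; `dat`, `aux_r2`, `redundant` and the .9 cutoff are assumptions for the example, not a standard API. It also reports the variance inflation factor, VIF = 1/(1 − R²), which is the same auxiliary R² on a different scale, so the .9 and .95 cutoffs correspond to VIF values of 10 and 20.

```r
# Hypothetical simulated predictors: x3 is close to x1 + x2
set.seed(2)
n   <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$x3 <- dat$x1 + dat$x2 + rnorm(n, sd = 0.1)
dat$y  <- 1 + dat$x1 - dat$x2 + 0.5 * dat$x3 + rnorm(n)

predictors <- c("x1", "x2", "x3")

# Auxiliary regressions: each predictor regressed on all the other predictors
aux_r2 <- sapply(predictors, function(p) {
  f <- reformulate(setdiff(predictors, p), response = p)
  summary(lm(f, data = dat))$r.squared
})

# Variance inflation factor: the same R^2 on a different scale
vif <- 1 / (1 - aux_r2)

# Flag predictors whose auxiliary R^2 exceeds the cutoff (.9 here, i.e. VIF > 10)
redundant <- names(aux_r2)[aux_r2 > 0.9]
print(round(cbind(R2 = aux_r2, VIF = vif), 3))
print(redundant)

# Drop-one check: refit without each predictor and compare the remaining b-coefficients
full <- lm(y ~ x1 + x2 + x3, data = dat)
for (p in predictors) {
  kept    <- setdiff(predictors, p)
  reduced <- lm(reformulate(kept, response = "y"), data = dat)
  cat("without", p, ":\n")
  print(round(rbind(full = coef(full)[kept], reduced = coef(reduced)[kept]), 2))
}
```

The `car` package provides a `vif()` function that computes the first part directly from a fitted `lm` object; the loop above just makes the auxiliary regressions explicit.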

Of course, we must always bear in mind the specific context/goal of the analysis... Sometimes, the only remedy is to repeat the research, but right now, I'm interested in various ways of screening redundant predictors when (multi)collinearity occurs in a regression model.

To screen for multicollinearity in a regression model, you can use base R's `kappa()` function, which computes (by default, estimates) the condition number of the model matrix:

```r
set.seed(123)
x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- x1 + 2*x2 + rnorm(100)*0.0001   # so x3 is approx. a linear combination of x1 and x2

mm12  <- model.matrix(~ x1 + x2)        # normal model, two independent regressors
mm123 <- model.matrix(~ x1 + x2 + x3)   # bad model with near collinearity

kappa(mm12)    # a 'low' kappa is good
## [1] 1.232216

kappa(mm123)   # a 'high' kappa indicates trouble
## [1] 122076.3
```
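
What `kappa()` measures can be checked directly: the 2-norm condition number of a matrix is the ratio of its largest to its smallest singular value, and `kappa(..., exact = TRUE)` computes exactly that, while the default call returns a cheaper estimate. The sketch below rebuilds the same simulated design (the variable names `d`, `cond_svd` and `cond_exact` are just for illustration) and compares the two. When one column is almost a linear combination of the others, the smallest singular value is nearly zero, which is what makes the ratio explode.

```r
# Rebuild the nearly collinear design used above
set.seed(123)
x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- x1 + 2*x2 + rnorm(100)*0.0001
mm12  <- model.matrix(~ x1 + x2)
mm123 <- model.matrix(~ x1 + x2 + x3)

# 2-norm condition number computed directly: largest / smallest singular value
d <- svd(mm123)$d
cond_svd <- max(d) / min(d)

# kappa(exact = TRUE) returns the same ratio; the default call returns an estimate
cond_exact <- kappa(mm123, exact = TRUE)

print(d)   # the smallest singular value is tiny compared with the others
print(c(svd = cond_svd, exact = cond_exact, estimate = kappa(mm123)))
print(kappa(mm12, exact = TRUE))
```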