How can I tell R to use a certain level as a reference if I use binary explanatory variables in a regression?

It's just using some level by default.

lm(x ~ y + as.factor(b))

where b takes the values {0, 1, 2, 3, 4}. Let's say I want to use 3 as the reference instead of 0, which R picks by default.

To specify a factor level as the reference in a regression, you can use the relevel() function.

According to R Documentation:

relevel

Reorder Levels of Factor

Description

The levels of a factor are reordered so that the level specified by ref is first and the others are moved down. This is useful for contr.treatment contrasts which take the first level as the reference.

Usage

relevel(x, ref, ...)

Arguments

x: an unordered factor.

ref: the reference level, typically a string.

...

For example:

set.seed(111)

x <- rnorm(100)

DF <- data.frame(x = x,

y = 2 + (1.5*x) + rnorm(100, sd = 2),

b = gl(5, 20))

head(DF)

x          y b

1  0.2352207  3.5520706 1

2 -0.3307359 -0.8167629 1

3 -0.3116238  2.4107511 1

4 -2.3023457 -1.0438110 1

5 -0.1708760  0.3453233 1

6  0.1402782  0.3571660 1

str(DF)

'data.frame': 100 obs. of  3 variables:

 $ x: num  0.235 -0.331 -0.312 -2.302 -0.171 ...

 $ y: num  3.552 -0.817 2.411 -1.044 0.345 ...

 $ b: Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...

m1 <- lm(y ~ x + b, data = DF)

To make level 3 the reference, reorder the factor levels:

DF$b <- relevel(DF$b, ref = "3")

m2 <- lm(y ~ x + b, data = DF)
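For the data in the question, where b is coded 0 to 4, the same idea might look like the sketch below. The data frame `dat` and its simulated columns are made up for illustration and are not from the question. One thing to watch: a numeric ref is treated as a position among the levels, not as a label, so when the labels are themselves numbers it is safer to pass the label as a string.

```r
# Hypothetical data mimicking the question: b takes the values 0..4
set.seed(1)
dat <- data.frame(x = rnorm(50), b = rep(0:4, each = 10))
dat$y <- 1 + 0.5 * dat$x + rnorm(50)

# Convert b to a factor, then make the level labelled "3" the reference
dat$b <- relevel(factor(dat$b), ref = "3")
levels(dat$b)
# "3" "0" "1" "2" "4"

fit <- lm(y ~ x + b, data = dat)
names(coef(fit))
# "(Intercept)" "x" "b0" "b1" "b2" "b4"   (no b3 term: it is the baseline)

# Pitfall: a numeric ref is an index into the levels, so ref = 3
# selects the third level, which here is the one labelled "2"
levels(relevel(factor(0:4), ref = 3))
# "2" "0" "1" "3" "4"
```

The releveling can also be done inside the formula, e.g. lm(y ~ x + relevel(factor(b), ref = "3"), data = ...), at the cost of longer coefficient names.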

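Output:

The coefficient names show that the two models use different reference levels: m1 has no b1 term and m2 has no b3 term, because the reference level is absorbed into the intercept. Releveling only reparameterises the model, so on identical data the coefficient of x would be the same in both fits; the printed values for x differ, which suggests the two outputs below came from different simulated draws.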

coef(m1)

(Intercept)           x          b2          b3          b4          b5

1.86380751  1.34015281  0.36891046  0.03624094  0.75197019 -0.65507558

coef(m2)

(Intercept)           x          b1          b2          b4          b5

1.84948031  1.41392197  0.07761524  0.24765394  0.22572331 -0.10877612
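Changing the reference level changes what each factor coefficient is compared against, but it should not change the fit itself. A quick way to check this is sketched below; it rebuilds the example data so that it runs on its own, and the names `b_ref3`, `fit_a`, and `fit_b` are illustrative, not from the original answer.

```r
# Rebuild the example data
set.seed(111)
x <- rnorm(100)
DF <- data.frame(x = x,
                 y = 2 + (1.5 * x) + rnorm(100, sd = 2),
                 b = gl(5, 20))

# Keep the original factor and add a releveled copy next to it
DF$b_ref3 <- relevel(DF$b, ref = "3")

fit_a <- lm(y ~ x + b, data = DF)        # reference level "1"
fit_b <- lm(y ~ x + b_ref3, data = DF)   # reference level "3"

# Both parameterisations give the same fitted values and the same slope for x
all.equal(fitted(fit_a), fitted(fit_b))
all.equal(coef(fit_a)[["x"]], coef(fit_b)[["x"]])

# Level "1" relative to "3" is the negative of level "3" relative to "1"
all.equal(coef(fit_b)[["b_ref31"]], -coef(fit_a)[["b3"]])
```

Each all.equal() call should return TRUE; only the intercept and the factor coefficients are re-expressed relative to the new baseline.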