To choose which factor level is used as the reference (baseline) category in a regression, you can use the `relevel()` function.
According to the R documentation (`?relevel`):
> **relevel: Reorder Levels of Factor**
>
> **Description:** The levels of a factor are reordered so that the level specified by `ref` is first and the others are moved down. This is useful for `contr.treatment` contrasts which take the first level as the reference.
>
> **Usage:** `relevel(x, ref, ...)`
>
> **Arguments:**
>
> - `x`: an unordered factor.
> - `ref`: the reference level, typically a string.
> - `...`: additional arguments for future methods.
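To see what this does to a factor in isolation, here is a minimal sketch using a small, made-up factor `f` (hypothetical data, not from the question). It shows that `relevel()` only moves the chosen level to the front, leaves the observations unchanged, and that treatment contrasts then code that level as the all-zero baseline:

```r
# Hypothetical example factor; levels default to alphabetical order
f <- factor(c("low", "mid", "high", "mid", "low"))
levels(f)
#> [1] "high" "low"  "mid"

# Move "mid" to the front; the other levels keep their relative order
f2 <- relevel(f, ref = "mid")
levels(f2)
#> [1] "mid"  "high" "low"

# The observations themselves are unchanged
all(as.character(f) == as.character(f2))
#> [1] TRUE

# Under treatment contrasts the first level ("mid") is the all-zero baseline row
contrasts(f2)
#>      high low
#> mid     0   0
#> high    1   0
#> low     0   1
```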
For example:
```r
set.seed(111)
x <- rnorm(100)
DF <- data.frame(x = x,
                 y = 2 + (1.5*x) + rnorm(100, sd = 2),
                 b = gl(5, 20))
head(DF)
```

```
           x          y b
1  0.2352207  3.5520706 1
2 -0.3307359 -0.8167629 1
3 -0.3116238  2.4107511 1
4 -2.3023457 -1.0438110 1
5 -0.1708760  0.3453233 1
6  0.1402782  0.3571660 1
```

```r
str(DF)
```

```
'data.frame':   100 obs. of  3 variables:
 $ x: num  0.235 -0.331 -0.312 -2.302 -0.171 ...
 $ y: num  3.552 -0.817 2.411 -1.044 0.345 ...
 $ b: Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
```
```r
# Default treatment contrasts: the first level, "1", is the reference
m1 <- lm(y ~ x + b, data = DF)
```
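To make level "3" the reference instead, reorder the levels of `b` and refit the model:

```r
# Pass the level label as a string. A number such as ref = 3 is treated as a
# position in levels(DF$b); it works here only because the third level is "3".
DF$b <- relevel(DF$b, ref = "3")
m2 <- lm(y ~ x + b, data = DF)
```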
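If you would rather not overwrite `DF$b`, two other common approaches are sketched below: calling `relevel()` inside the model formula, or building a new factor with an explicit `levels` order via `factor()`. The column name `b_alt` and the model names `m3`/`m4` are hypothetical. Both fits should give the same coefficients as `m2`; only the coefficient labels differ.

```r
set.seed(111)
x <- rnorm(100)
DF <- data.frame(x = x,
                 y = 2 + (1.5*x) + rnorm(100, sd = 2),
                 b = gl(5, 20))

# Option 1: relevel inside the formula, leaving DF$b untouched
m3 <- lm(y ~ x + relevel(b, ref = "3"), data = DF)

# Option 2: spell out the full level order; the first level is the reference
DF$b_alt <- factor(DF$b, levels = c("3", "1", "2", "4", "5"))
m4 <- lm(y ~ x + b_alt, data = DF)

levels(DF$b)      # original order "1".."5" is unchanged
names(coef(m4))   # "(Intercept)" "x" "b_alt1" "b_alt2" "b_alt4" "b_alt5"
```

Option 2 is handy when you want to control the order of every level (for example, for plotting), not just which one comes first.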
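The two models use different reference levels for `b`: `m1` uses level "1" (so it reports `b2` to `b5`), while `m2` uses level "3" (so it reports `b1`, `b2`, `b4` and `b5`). Each `b` coefficient is the estimated difference from the reference level, holding `x` fixed:

```r
coef(m1)
```

```
(Intercept)           x          b2          b3          b4          b5
 1.86380751  1.34015281  0.36891046  0.03624094  0.75197019 -0.65507558
```

```r
coef(m2)
```

```
(Intercept)           x          b1          b2          b4          b5
 1.90004845  1.34015281 -0.03624094  0.33266952  0.71572925 -0.69131652
```

Relevelling only reparameterizes the model: the slope on `x` and the fitted values are unchanged, the intercept of `m2` equals the intercept of `m1` plus its `b3`, and each `bk` in `m2` equals the corresponding `bk` in `m1` minus `b3` (so `b1` in `m2` is `-b3` from `m1`).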
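To check this relationship directly rather than reading it off the printouts, here is a short sketch that reuses the data-generating code from above and compares the two fits; `cf1` and `cf2` are just hypothetical helper names for the coefficient vectors.

```r
set.seed(111)
x <- rnorm(100)
DF <- data.frame(x = x,
                 y = 2 + (1.5*x) + rnorm(100, sd = 2),
                 b = gl(5, 20))
m1 <- lm(y ~ x + b, data = DF)      # reference level "1"
DF$b <- relevel(DF$b, ref = "3")
m2 <- lm(y ~ x + b, data = DF)      # reference level "3"

cf1 <- coef(m1)
cf2 <- coef(m2)

# Same fitted values and same slope on x
all.equal(fitted(m1), fitted(m2))                                # TRUE
all.equal(cf1[["x"]], cf2[["x"]])                                # TRUE

# New intercept = old intercept + old b3; new b1 = -old b3
all.equal(cf2[["(Intercept)"]], cf1[["(Intercept)"]] + cf1[["b3"]])  # TRUE
all.equal(cf2[["b1"]], -cf1[["b3"]])                             # TRUE

# The remaining level effects are shifted by old b3
all.equal(cf2[c("b2", "b4", "b5")], cf1[c("b2", "b4", "b5")] - cf1[["b3"]])  # TRUE
```

`all.equal()` is used instead of `==` because the two fits are computed from different design matrices, so the numbers can differ in the last floating-point digits.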