
There seems to be a difference between levels and labels of a factor in R. Until now, I had always thought that levels were the 'real' names of the factor levels, and labels were the names used for output (such as tables and plots). Apparently, this is not the case, as the following example shows:

> df <- data.frame(v=c(1,2,3), f=c('a','b','c'), stringsAsFactors=TRUE)  # needed since R 4.0.0
> str(df)
'data.frame':   3 obs. of  2 variables:
 $ v: num  1 2 3
 $ f: Factor w/ 3 levels "a","b","c": 1 2 3
> df$f <- factor(df$f, levels=c('a','b','c'),
+                labels=c('Treatment A: XYZ','Treatment B: YZX','Treatment C: ZYX'))
> levels(df$f)
[1] "Treatment A: XYZ" "Treatment B: YZX" "Treatment C: ZYX"

I thought that the levels ('a','b','c') could somehow still be accessed when scripting, but this doesn't work:

> df$f == 'a'
[1] FALSE FALSE FALSE

But this does:

> df$f == 'Treatment A: XYZ'
[1]  TRUE FALSE FALSE

So, my question consists of two parts:

• What's the difference between levels and labels?

• Is it possible to have different names for factor levels for scripting and output?

Background: For longer scripts, scripting with short factor levels seems much easier. For reports and plots, however, these short factor levels may not be adequate and should be replaced with more precise names.


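In factor(), the levels argument is the input and the labels argument is the output: levels lists the values expected in x, and labels gives the names those values get in the result. A factor has only a levels attribute; when labels is supplied, its values are what end up in that attribute. If labels is omitted, it defaults to levels.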
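A small sketch of this behaviour, using a throwaway vector `x` that is not part of the question: without labels the levels keep their input names, with labels the labels replace them, and in both cases the underlying integer codes are the same.

```r
x <- c("a", "b", "c", "a")

# labels omitted: it defaults to levels, so the level names are unchanged
f_default <- factor(x, levels = c("a", "b", "c"))
levels(f_default)       # "a" "b" "c"

# labels supplied: the labels are stored as the levels of the result
f_labelled <- factor(x, levels = c("a", "b", "c"),
                     labels = c("A", "B", "C"))
levels(f_labelled)      # "A" "B" "C"

# Both factors share the same integer codes; only the levels attribute differs
as.integer(f_labelled)  # 1 2 3 1

# The levels attribute maps each code to its printed label
levels(f_labelled)[as.integer(f_labelled)]  # "A" "B" "C" "A"

# There is no separate labels attribute
is.null(attr(f_labelled, "labels"))  # TRUE
```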

For example:

> df <- data.frame(v=c(1,2,3), f=c('a','b','c'), stringsAsFactors=TRUE)
> str(df)
'data.frame': 3 obs. of  2 variables:
 $ v: num  1 2 3
 $ f: Factor w/ 3 levels "a","b","c": 1 2 3
> df$f <- factor(df$f, levels=c('a','b','c'),
+                labels=c('Treatment A: XYZ','Treatment B: YZX','Treatment C: ZYX'))
> levels(df$f)
[1] "Treatment A: XYZ" "Treatment B: YZX" "Treatment C: ZYX"

Here, the vector df$f

• is converted to a factor,
• whose levels are a, b and c, and
• whose levels are labeled "Treatment A: XYZ", "Treatment B: YZX" and "Treatment C: ZYX".

factor() converts the values a, b and c to integer codes (1, 2 and 3) and stores the label values in the levels attribute of the factor. This attribute is used to map the internal integer codes to the corresponding labels. There is no separate labels attribute, i.e.:

> df <- data.frame(v=c(1,2,3), f=c('a','b','c'), stringsAsFactors=TRUE)
> attributes(df$f)
$levels
[1] "a" "b" "c"

$class
[1] "factor"

> df$f <- factor(df$f, levels=c('a','b','c'),
+                labels=c('Treatment A: XYZ','Treatment B: YZX','Treatment C: ZYX'))
> attributes(df$f)
$levels
[1] "Treatment A: XYZ" "Treatment B: YZX" "Treatment C: ZYX"

$class
[1] "factor"
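Since the original short names are not kept anywhere once labels is used, one way to get short names for scripting and long names for output (the second part of the question) is to leave the short levels in the data and relabel only a copy at output time. A minimal sketch, assuming a hypothetical named lookup vector `treatment_labels` and a hypothetical relabeled copy `report_f`:

```r
# Hypothetical lookup table: short scripting code -> name used in reports
treatment_labels <- c(a = "Treatment A: XYZ",
                      b = "Treatment B: YZX",
                      c = "Treatment C: ZYX")

df <- data.frame(v = c(1, 2, 3), f = c("a", "b", "c"), stringsAsFactors = TRUE)

# Scripting: df$f keeps the short levels, so comparisons stay short
df$f == "a"   # TRUE FALSE FALSE

# Output: build a relabeled copy just for tables and plots
report_f <- factor(df$f, levels = names(treatment_labels),
                   labels = treatment_labels)
table(report_f)
```

The data itself only ever carries the short codes; if the relabeled factor were written back into df$f, a comparison like df$f == 'a' would fail again, exactly as in the question.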