
One of the basic data types in R is the factor. In my experience, factors are basically a pain and I never use them; I always convert them to characters. Still, I feel like I'm missing something.

Are there some important examples of functions that use factors as grouping variables where the factor data type becomes necessary? Are there specific circumstances when I should be using factors?

1 Answer


Factors in R are useful because model-fitting packages like lme4 use factors and ordered factors to fit models differently and to determine the type of contrasts to use. Graphing packages also use them for grouping. ggplot and most model-fitting functions coerce character vectors to factors, so the result is the same.
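As a rough illustration of how the factor type steers the choice of contrasts, the sketch below builds an unordered and an ordered factor from the same made-up values and inspects their default contrasts. The variable names and the toy data are hypothetical, and the result assumes R's default options(contrasts = ...) setting has not been changed.

```r
# Hypothetical toy data, not from the original answer
dose <- c("low", "medium", "high", "low", "high", "medium")
lvls <- c("low", "medium", "high")

f_unordered <- factor(dose, levels = lvls)
f_ordered   <- factor(dose, levels = lvls, ordered = TRUE)

# Unordered factors get treatment (dummy) contrasts by default:
# one column per non-reference level
contrasts(f_unordered)

# Ordered factors get polynomial contrasts by default:
# columns .L (linear) and .Q (quadratic)
contrasts(f_ordered)
```

A character vector carries no such information, which is why modelling functions have to convert it to a factor first.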

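They can be a pain because, in R versions before 4.0.0, the stringsAsFactors argument of read.table and read.csv was TRUE by default (a subtlety most users missed). Since R 4.0.0 the default is FALSE, so strings are read as character vectors unless you ask for factors.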
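A minimal sketch of the difference, using a small inline CSV that is made up for illustration. Passing stringsAsFactors explicitly makes the result independent of whatever the default is in your R version.

```r
# Hypothetical inline CSV, read via the text= argument instead of a file
csv_text <- "name,group\na,x\nb,y\nc,x"

as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
as_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(as_chr$group)   # character
class(as_fct$group)   # factor
levels(as_fct$group)  # levels are the sorted unique values
```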

One tricky point is dropping unused factor levels: drop = TRUE behaves differently for vectors and for data frames.

For example:

In a vector:

> s <- iris$Species
> s[s == 'setosa', drop = TRUE]
 [1] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
[12] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
[23] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
[34] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
[45] setosa setosa setosa setosa setosa setosa
Levels: setosa

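But in a data frame, drop = TRUE is passed on to the data frame indexing, where it controls whether dimensions are dropped, so the unused levels are kept: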

> x <- subset(iris, Species == 'setosa', drop = TRUE)
> x$Species
 [1] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
[12] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
[23] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
[34] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
[45] setosa setosa setosa setosa setosa setosa
Levels: setosa versicolor virginica

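Therefore, to drop unused factor levels in a data frame, use the droplevels() function. Applied to a data frame, it drops the unused levels of every factor column; applied to a single factor, it affects only that factor. For example: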

> x <- subset(iris, Species == 'setosa')
> levels(x$Species)
[1] "setosa"     "versicolor" "virginica"
> x <- droplevels(x)
> levels(x$Species)
[1] "setosa"
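To see why leftover levels matter in practice, here is a small sketch using the same iris subset. It tabulates the column before and after calling droplevels() on that single column; the names counts_before and counts_after are just illustrative.

```r
x <- subset(iris, Species == "setosa")

# Unused levels still show up as zero-count groups
counts_before <- table(x$Species)
counts_before

# droplevels() also works on one factor column at a time
x$Species <- droplevels(x$Species)

counts_after <- table(x$Species)
counts_after
```

The zero-count groups in the first table are the kind of thing that later shows up as empty categories in summaries and plots.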

Factors also matter in ggplot2: if you fake factors with character vectors, there's a risk that you'll change just one element and accidentally create a separate new level.
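The risk can be sketched with two made-up vectors: a character vector silently accepts a mistyped value as a new group, while a factor refuses a value that is not among its levels, stores NA, and issues a warning.

```r
# Hypothetical grouping variable stored as plain characters
grp_chr <- c("a", "b", "a")
grp_chr[2] <- "B"    # typo: silently creates a new group "B"
unique(grp_chr)

# The same data stored as a factor
grp_fct <- factor(c("a", "b", "a"))
grp_fct[2] <- "B"    # warning: invalid factor level, NA generated
grp_fct
levels(grp_fct)      # the level set is unchanged
```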
