Factors in R are useful because model-fitting packages like lme4 use factors and ordered factors to decide how to fit a model and which type of contrasts to use. Graphing packages also use them for grouping. That said, ggplot2 and most model-fitting functions coerce character vectors to factors, so in many cases the result is the same.
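A small base-R sketch of the contrasts point, assuming R's default `options(contrasts = c("contr.treatment", "contr.poly"))`. It uses `model.matrix()` rather than lme4 itself, and the dose data is made up for illustration:

```r
# Made-up dose data, for illustration only
dose_chr <- c("low", "medium", "high", "low", "high", "medium")

# The same values as an unordered factor and as an ordered factor
dose_fct <- factor(dose_chr, levels = c("low", "medium", "high"))
dose_ord <- factor(dose_chr, levels = c("low", "medium", "high"), ordered = TRUE)

# Under the default options("contrasts"), unordered factors get
# treatment (dummy) contrasts and ordered factors get polynomial contrasts
contrasts(dose_fct)  # columns "medium", "high"
contrasts(dose_ord)  # columns ".L", ".Q"

# model.matrix() shows the design columns a model such as lm() would use
colnames(model.matrix(~ dose_fct))  # "(Intercept)" "dose_fctmedium" "dose_fcthigh"
colnames(model.matrix(~ dose_ord))  # "(Intercept)" "dose_ord.L" "dose_ord.Q"

# A character vector is coerced to a factor with alphabetically sorted
# levels, so "high" quietly becomes the reference level
colnames(model.matrix(~ dose_chr))  # "(Intercept)" "dose_chrlow" "dose_chrmedium"
```

The last line shows the catch with relying on coercion: it works, but the level order (and hence the reference level) is alphabetical unless you build the factor yourself.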
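They can be a pain because, before R 4.0.0, the stringsAsFactors argument of read.table() and read.csv() defaulted to TRUE, so character columns were silently converted to factors (a subtlety most users missed). Since R 4.0.0 the default is FALSE.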
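Whatever your R version's default, passing stringsAsFactors explicitly makes the column type predictable. A minimal sketch using an inline CSV string (the data is made up for illustration):

```r
# Inline CSV text, made up for illustration
csv_text <- "name,score\nalice,1\nbob,2"

# Setting stringsAsFactors explicitly gives the same result on any R version
as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
as_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(as_chr$name)   # "character"
class(as_fct$name)   # "factor"
levels(as_fct$name)  # "alice" "bob"
```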
One tricky thing is dropping unused factor levels with drop = TRUE, because it behaves differently for vectors and data frames.
For example:
In a vector:
> s <- iris$Species
> s[s == 'setosa', drop=TRUE]
[1] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
[12] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
[23] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
[34] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
[45] setosa setosa setosa setosa setosa setosa
Levels: setosa
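To make the contrast with the default explicit, here is a short sketch comparing indexing without and with drop = TRUE on the same factor:

```r
s <- iris$Species

kept    <- s[s == "setosa"]               # default drop = FALSE: all levels kept
dropped <- s[s == "setosa", drop = TRUE]  # unused levels removed

nlevels(kept)     # 3
nlevels(dropped)  # 1

# droplevels() on the subset ends up with the same levels as drop = TRUE
levels(droplevels(kept))  # "setosa"
```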
But in a data frame:
> x <- subset(iris, Species == 'setosa', drop=TRUE)
> x$Species
[1] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
[12] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
[23] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
[34] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
[45] setosa setosa setosa setosa setosa setosa
Levels: setosa versicolor virginica
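As far as I can tell from ?subset, its drop argument is passed on to `[.data.frame`, where it controls whether a one-column result is simplified to a vector; it never touches factor levels. A sketch showing both cases:

```r
# Without 'select', subset() returns a multi-column data frame, so drop = TRUE
# has nothing to simplify and Species keeps all three levels
x <- subset(iris, Species == "setosa", drop = TRUE)
is.data.frame(x)    # TRUE
nlevels(x$Species)  # 3

# With a single selected column, drop = TRUE turns the one-column result
# into a plain factor vector, but its unused levels are still kept
sp <- subset(iris, Species == "setosa", select = Species, drop = TRUE)
is.factor(sp)  # TRUE
nlevels(sp)    # 3
```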
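Therefore, to drop the unused levels after subsetting a data frame, use droplevels(). Called on the data frame, it removes unused levels from every factor column; called on a single column (e.g. droplevels(x$Species)), it affects only that factor. For example: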
> x <- subset(iris, Species == 'setosa')
> levels(x$Species)
[1] "setosa" "versicolor" "virginica"
> x <- droplevels(x)
> levels(x$Species)
[1] "setosa"
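iris has only one factor column, so the difference between dropping levels everywhere and in one column doesn't show up there. A sketch with a small made-up data frame holding two factors, also using droplevels()'s except argument (column indices to leave alone):

```r
# A small made-up data frame with two factor columns
df <- data.frame(
  species = factor(c("a", "b", "c")),
  site    = factor(c("north", "south", "east"))
)
sub <- df[df$species == "a", ]
c(nlevels(sub$species), nlevels(sub$site))  # 3 3

# droplevels() on the data frame drops unused levels in every factor column
all_dropped <- droplevels(sub)
c(nlevels(all_dropped$species), nlevels(all_dropped$site))  # 1 1

# 'except' lists column indices whose levels should be left untouched
keep_site <- droplevels(sub, except = 2)
c(nlevels(keep_site$species), nlevels(keep_site$site))  # 1 3

# Or drop the levels of just one factor column
sub$species <- droplevels(sub$species)
c(nlevels(sub$species), nlevels(sub$site))  # 1 3
```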
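The same applies in ggplot2: if you fake factors with character vectors, there's a risk that you'll change just one element (say, with a typo) and accidentally create a separate new level. Assigning a value that is not an existing level to a real factor gives NA with a warning instead.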
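A base-R sketch of that risk, kept free of a ggplot2 dependency; in a plot, the mistyped "Setosa" would show up as its own group. The vectors are made up for illustration:

```r
# With a character vector, a single typo silently creates a new group
chr <- c("setosa", "setosa", "versicolor")
chr[2] <- "Setosa"
unique(chr)  # "setosa" "Setosa" "versicolor"

# With a factor, a value outside the levels becomes NA (with the warning
# "invalid factor level, NA generated") instead of a new level
fct <- factor(c("setosa", "setosa", "versicolor"))
fct[2] <- "Setosa"
levels(fct)    # "setosa" "versicolor"
is.na(fct[2])  # TRUE
```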