Factors in R are useful because model-fitting packages like **lme4** treat factors and ordered factors differently when fitting models, and use the distinction to decide which type of contrasts to apply. Graphing packages also use them as grouping variables. **ggplot2** and most model-fitting functions coerce character vectors to factors, so in those cases the result is the same either way.
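As a rough illustration of how the factor type affects the contrasts, here is a minimal sketch. The `sizes` vector is made up for the example, and the code assumes R's default `contrasts` option has not been changed:

```r
# Hypothetical data, used only for illustration
sizes <- c("small", "large", "medium", "small")
lvls  <- c("small", "medium", "large")

f <- factor(sizes, levels = lvls)                  # unordered factor
o <- factor(sizes, levels = lvls, ordered = TRUE)  # ordered factor

contrasts(f)  # treatment contrasts: one column per non-reference level
contrasts(o)  # polynomial contrasts: linear (.L) and quadratic (.Q) terms
```

The same values give different contrast matrices depending only on whether the factor is ordered.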

They can be a pain because in **read.table** and **read.csv** the argument **stringsAsFactors** was **TRUE** by default before R 4.0.0 (the default is now FALSE), and most users miss this subtlety.
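Setting the argument explicitly avoids depending on the default. A small sketch, using an inline CSV string invented for the example:

```r
# Hypothetical CSV content, passed via the text argument instead of a file
csv_text <- "id,group\n1,a\n2,b\n3,a"

as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
as_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(as_chr$group)  # character column
class(as_fct$group)  # factor column
```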

One tricky thing occurs when dropping unused factor levels in vectors and data frames with **drop = TRUE**.

For example:

In a vector:

> s <- iris$Species

> s[s == 'setosa', drop=TRUE]

[1] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa

[12] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa

[23] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa

[34] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa

[45] setosa setosa setosa setosa setosa setosa

Levels: setosa

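But in a data frame, **drop = TRUE** is passed on to the data frame indexing, where it controls whether the result is reduced to a lower dimension, not whether unused levels are dropped: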

> x <- subset(iris, Species == 'setosa', drop=TRUE)

> x$Species

[1] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa

[12] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa

[23] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa

[34] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa

[45] setosa setosa setosa setosa setosa setosa

Levels: setosa versicolor virginica

Therefore, in a data frame you need to use the **droplevels()** function to drop the unused factor levels. Called on a data frame, it drops them from every factor column, i.e.,

> x <- subset(iris, Species == 'setosa')

> levels(x$Species)

[1] "setosa" "versicolor" "virginica"

> x <- droplevels(x)

> levels(x$Species)

[1] "setosa"
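If only one column should be affected, `droplevels()` can also be applied to that factor alone. A minimal sketch on the same `iris` subset (calling `factor()` on the column again should have the same effect here):

```r
x <- subset(iris, Species == "setosa")

# Drop unused levels from the Species column only
x$Species <- droplevels(x$Species)

levels(x$Species)
```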

A related caution, for example when grouping in ggplot2: if you fake factors with character vectors, there's a risk that you'll change just one element and accidentally create a separate new level.
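The difference can be sketched with a deliberately mistyped value. The vectors below are invented for the example:

```r
# Character vector: the typo is accepted silently and becomes a new group
chr <- c("setosa", "setosa", "virginica")
chr[2] <- "Setosa"
unique(chr)  # now three distinct values

# Factor: the same assignment triggers a warning and stores NA instead
fct <- factor(c("setosa", "setosa", "virginica"))
fct[2] <- "Setosa"
levels(fct)  # still only the two original levels
```

With a real factor the mistake surfaces as a warning and an `NA` rather than as an extra group in a plot or model.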