
I have a data.frame with character data in one of its columns. I would like to filter the rows on several values of that same column at once. Is there an easy way to do this that I'm missing?

Example: data.frame name = dat

days  name
  88  Lynn
  11  Tom
   2  Chris
   5  Lisa
  22  Kyla
   1  Tom
 222  Lynn
   2  Lynn

I'd like to filter out Tom and Lynn for example.

When I do:

target <- c("Tom", "Lynn")
filt <- filter(dat, name == target)

I get this error:

longer object length is not a multiple of shorter object length

1 Answer


To filter multiple values in a string column using dplyr, you can use the %in% operator as follows:

df <- data.frame(days = c(88, 11, 2, 5, 22, 1, 222, 2),
                 name = c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn"))

> df
  days  name
1   88  Lynn
2   11   Tom
3    2 Chris
4    5  Lisa
5   22  Kyla
6    1   Tom
7  222  Lynn
8    2  Lynn

library(dplyr)
target <- c("Tom", "Lynn")
filter(df, name %in% target)

  days name
1   88 Lynn
2   11  Tom
3    1  Tom
4  222 Lynn
5    2 Lynn

Basically, because target has only two elements, R recycles it along dat$name, so the statement dat$name == target is equivalent to saying:

return TRUE for every value at an odd position that is equal to "Tom", or every value at an even position that is equal to "Lynn".

dat$name == target
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

It so happens that the last value in your sample data frame sits at an even position (row 8) and is equal to "Lynn", hence the single TRUE above.
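
To make the recycling visible, here is a minimal base R sketch. The variable names recycled, manual and msg are made up for illustration, and the 7-element case is only an assumption about how the message in the question might arise (the 8-row sample itself divides evenly by 2, so it produces no message).

```r
# The name column from the sample data, as a plain character vector.
name <- c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn")
target <- c("Tom", "Lynn")

# == repeats the shorter vector until it is as long as the longer one.
# This is the vector that name is actually compared against:
# "Tom" "Lynn" "Tom" "Lynn" "Tom" "Lynn" "Tom" "Lynn"
recycled <- rep(target, length.out = length(name))

# Element-wise comparison against the recycled vector.
manual <- name == recycled

# The manual version and the original comparison agree.
same <- identical(name == target, manual)

# With 7 names the lengths no longer divide evenly, and R signals a
# warning; tryCatch captures its text instead of printing it.
msg <- tryCatch(name[-8] == target,
                warning = function(w) conditionMessage(w))
```

Here same is TRUE, and msg holds the text of the warning R raises when the longer length is not a multiple of the shorter one.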

By contrast, dat$name %in% target says:

for each value in dat$name, check whether it appears anywhere in target.

That is a very different test. Here is the result:

# [1]  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE
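
As a rough sketch of what %in% is doing, the base R snippet below compares it with chaining == and |, and also shows the negated form in case "filter out" meant dropping those names rather than keeping them. The names keep, keep_manual, kept and dropped are hypothetical.

```r
# Rebuild the sample data (base R only, no dplyr needed here).
dat <- data.frame(
  days = c(88, 11, 2, 5, 22, 1, 222, 2),
  name = c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn")
)
target <- c("Tom", "Lynn")

# One TRUE/FALSE per row: is this row's name one of the targets?
keep <- dat$name %in% target

# For two targets, this is the same as spelling out each comparison.
keep_manual <- dat$name == "Tom" | dat$name == "Lynn"

kept    <- dat[keep, ]   # rows for Tom and Lynn
dropped <- dat[!keep, ]  # rows for everyone else
```

With dplyr loaded, the same two selections should be expressible as filter(dat, name %in% target) and filter(dat, !name %in% target).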

Note that your problem has nothing to do with dplyr; it comes from misusing == where %in% is needed.

...