
Can mutate be used when the mutation is conditional (that is, when it depends on the values in certain columns)?

This example helps to show what I mean.

structure(list(a = c(1, 3, 4, 6, 3, 2, 5, 1), b = c(1, 3, 4, 
2, 6, 7, 2, 6), c = c(6, 3, 6, 5, 3, 6, 5, 3), d = c(6, 2, 4, 
5, 3, 7, 2, 6), e = c(1, 2, 4, 5, 6, 7, 6, 3), f = c(2, 3, 4, 
2, 2, 7, 5, 2)), .Names = c("a", "b", "c", "d", "e", "f"), row.names = c(NA, 
8L), class = "data.frame")

  a b c d e f
1 1 1 6 6 1 2
2 3 3 3 2 2 3
3 4 4 6 4 4 4
4 6 2 5 5 5 2
5 3 6 3 3 6 2
6 2 7 6 7 7 7
7 5 2 5 2 6 5
8 1 6 3 6 3 2

I was hoping to solve this with the dplyr package by creating a new column g. I know the code below does not work, but I think it makes the purpose clear:

library(dplyr)
df <- mutate(df,
         if (a == 2 | a == 5 | a == 7 | (a == 1 & b == 4)){g = 2},
         if (a == 0 | a == 1 | a == 4 | a == 3 |  c == 4) {g = 3})

For this particular example, the result I am looking for is:

  a b c d e f  g
1 1 1 6 6 1 2  3
2 3 3 3 2 2 3  3
3 4 4 6 4 4 4  3
4 6 2 5 5 5 2 NA
5 3 6 3 3 6 2 NA
6 2 7 6 7 7 7  2
7 5 2 5 2 6 5  2
8 1 6 3 6 3 2  3

Does anyone have an idea how to do this in dplyr? This data frame is just an example; the data frames I am dealing with are much larger. I tried dplyr because of its speed, but perhaps there are other, better ways to handle this problem?

1 Answer


You can use the case_when function from the dplyr package in the mutate function to get the desired output.

In your case:

library(dplyr)

df <- structure(list(a = c(1, 3, 4, 6, 3, 2, 5, 1), 
               b = c(1, 3, 4, 2, 6, 7, 2, 6), 
               c = c(6, 3, 6, 5, 3, 6, 5, 3), 
               d = c(6, 2, 4, 5, 3, 7, 2, 6), 
               e = c(1, 2, 4, 5, 6, 7, 6, 3), 
               f = c(2, 3, 4, 2, 2, 7, 5, 2)),
          .Names = c("a", "b", "c", "d", "e", "f"), 
          row.names = c(NA, 8L), class = "data.frame")

# Conditions are checked in order; the first one that is TRUE sets g for that row.
df %>% mutate(g = case_when(
  a == 2 | a == 5 | a == 7 | (a == 1 & b == 4) ~ 2,
  a == 0 | a == 1 | a == 4 | a == 3 | c == 4 ~ 3,
  TRUE ~ NA_real_))  # fallback for rows that match neither condition

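NA is written as NA_real_ because case_when requires all of the values on the right-hand side of ~ to be of the same type (here, double). From dplyr 1.1.0 onward this is relaxed and a plain NA is also accepted.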
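As a rough sketch of the same idea, the repeated a == ... comparisons can be collapsed with %in%. The name result is only illustrative, and only the columns a, b and c from the example are rebuilt here. Note that row 5 has a == 3, so the second condition as written gives 3 for that row rather than the NA shown in the question's expected output.

```r
library(dplyr)

# Only the columns used by the conditions, copied from the example data.
df <- data.frame(a = c(1, 3, 4, 6, 3, 2, 5, 1),
                 b = c(1, 3, 4, 2, 6, 7, 2, 6),
                 c = c(6, 3, 6, 5, 3, 6, 5, 3))

# 'result' is a hypothetical name for the output data frame.
result <- df %>%
  mutate(g = case_when(
    a %in% c(2, 5, 7) | (a == 1 & b == 4) ~ 2,   # first matching condition wins
    a %in% c(0, 1, 3, 4) | c == 4         ~ 3,
    TRUE                                  ~ NA_real_  # no condition matched
  ))

result$g
# 3 3 3 NA 3 2 2 3
```

If row 5 really should be NA, the second condition itself would need to change; case_when only applies the rules it is given.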

...