I'm trying to transfer my understanding of plyr into dplyr, but I can't figure out how to group by multiple columns.

# make data with weird column names that can't be hard coded
data = data.frame(
  asihckhdoydkhxiydfgfTgdsx = sample(LETTERS[1:3], 100, replace=TRUE),
  a30mvxigxkghc5cdsvxvyv0ja = sample(LETTERS[1:3], 100, replace=TRUE),
  value = rnorm(100)
)

# get the columns we want to average within
columns = names(data)[-3]

# plyr - works
ddply(data, columns, summarize, value=mean(value))

# dplyr - raises error
data %.%
  group_by(columns) %.%
  summarise(Value = mean(value))
#> Error in eval(expr, envir, enclos) : index out of bounds

What am I missing to translate the plyr example into a dplyr-esque syntax?

1 Answer


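group_by() expects unquoted column names, so passing it the character vector columns does not work. To group by columns whose names are stored in a character vector, you can use the group_by_at() function from the dplyr package. The %.% operator used in the question has also been replaced by %>% in current versions of dplyr.

Note that group_by_at() is superseded as of dplyr 1.0.0: it still works, but group_by() combined with across() is the recommended replacement.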

The basic syntax is as follows:

group_by_at(.tbl, .vars, .funs = list(), ..., .add = FALSE,
  .drop = group_by_drop_default(.tbl))

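Where:

.tbl: A tbl object.

.vars: A list of columns generated by vars(), a character vector of column names, a numeric vector of column positions, or NULL.

.funs: A function fun, a quosure style lambda ~ fun(.) or a list of either form.

...: Additional arguments for the function calls in .funs. These are evaluated only once, with tidy dots support.

.add: When FALSE (the default), the new grouping overrides any existing groups. Use .add = TRUE to add to the existing groups instead.

.drop: When .drop = TRUE, empty groups are dropped.

Inside vars(), the helper one_of() selects the variables named in a character vector. In recent versions of dplyr, one_of() is superseded by all_of() and any_of().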
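As a rough illustration of the forms that .vars accepts, the sketch below groups a small data frame three ways and checks that each call yields the same grouping columns. The data frame df and its columns g1, g2 and value are made up for this example, and the code assumes dplyr 1.0.0 or later (where the argument is spelled .add).

```r
library(dplyr)

# Hypothetical example data; g1, g2 and value are illustrative names only
df <- data.frame(
  g1 = c("A", "A", "B", "B"),
  g2 = c("x", "y", "x", "x"),
  value = c(1, 2, 3, 4)
)

by_names     <- group_by_at(df, c("g1", "g2"))  # character vector of column names
by_vars      <- group_by_at(df, vars(g1, g2))   # columns generated by vars()
by_positions <- group_by_at(df, c(1, 2))        # numeric vector of column positions

# .add = TRUE keeps the existing grouping (g1) and adds g2 to it
by_added <- df %>%
  group_by(g1) %>%
  group_by_at("g2", .add = TRUE)

# group_vars() returns the names of the grouping columns
group_vars(by_names)
group_vars(by_vars)
group_vars(by_positions)
group_vars(by_added)
```

All four calls should report g1 and g2 as the grouping columns; the character-vector form is the one that matters when column names are only known at run time.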

In your case:

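library(dplyr)

# example data with column names that can't be hard coded
data = data.frame(
  asihckhdoydkhxiydfgfTgdsx = sample(LETTERS[1:3], 100, replace=TRUE),
  a30mvxigxkghc5cdsvxvyv0ja = sample(LETTERS[1:3], 100, replace=TRUE),
  value = rnorm(100)
)

# names of the grouping columns (everything except value)
columns = names(data)[-3]

# group by the columns named in the character vector, then average value
data %>%
  group_by_at(vars(one_of(columns))) %>%
  summarize(MeanValue = mean(value))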

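Output (the exact values vary between runs because no seed is set; on R 4.0 or later the grouping columns print as <chr> rather than <fct>):

# A tibble: 9 x 3
# Groups:   asihckhdoydkhxiydfgfTgdsx [3]
  asihckhdoydkhxiydfgfTgdsx a30mvxigxkghc5cdsvxvyv0ja MeanValue
  <fct>                     <fct>                         <dbl>
1 A                         A                             0.588
2 A                         B                             0.113
3 A                         C                             0.178
4 B                         A                            -0.397
5 B                         B                            -0.118
6 B                         C                            -0.461
7 C                         A                             0.117
8 C                         B                            -0.583
9 C                         C                            -0.382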
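For readers on dplyr 1.0.0 or later, the same grouping can probably be written without the scoped verb. The following is a minimal sketch rather than the original answer's method: it assumes across() and all_of() are available, and the name result and the .groups = "drop" setting are choices made for this example.

```r
library(dplyr)

# same example data as in the question
data <- data.frame(
  asihckhdoydkhxiydfgfTgdsx = sample(LETTERS[1:3], 100, replace = TRUE),
  a30mvxigxkghc5cdsvxvyv0ja = sample(LETTERS[1:3], 100, replace = TRUE),
  value = rnorm(100)
)

# names of the grouping columns, held in a character vector
columns <- names(data)[-3]

# all_of() selects exactly the columns named in the vector;
# across() lets group_by() accept that selection
result <- data %>%
  group_by(across(all_of(columns))) %>%
  summarise(MeanValue = mean(value), .groups = "drop")

result
```

Setting .groups = "drop" returns an ungrouped tibble, so later steps are not affected by leftover grouping; omit it to keep the default behaviour shown in the output above.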
