
I'm struggling a bit with the dplyr syntax. I have a data frame with several variables and one grouping variable. Now I want to calculate the mean of each column within each group, using dplyr in R.

library(dplyr)

n <- 100  # number of rows

df <- data.frame(
    a = sample(1:5, n, replace = TRUE),
    b = sample(1:5, n, replace = TRUE),
    c = sample(1:5, n, replace = TRUE),
    d = sample(1:5, n, replace = TRUE),
    grp = sample(1:3, n, replace = TRUE)
)

df %>% group_by(grp) %>% summarise(mean(a))

This gives me the mean for column "a" for each group indicated by "grp".

My question is: is it possible to get the means for each column within each group at once? Or do I have to repeat df %>% group_by(grp) %>% summarise(mean(a)) for each column?

What I would like to have is something like

df %>% group_by(grp) %>% summarise(mean(a:d)) # "mean(a:d)" does not work


1 Answer

Best answer

To summarize all columns at once, use the summarise_all() function from the dplyr package, which applies a function to every non-grouping column:

library(dplyr)

df <- data.frame(
  a = sample(1:5, 100, replace = TRUE),
  b = sample(1:5, 100, replace = TRUE),
  c = sample(1:5, 100, replace = TRUE),
  d = sample(1:5, 100, replace = TRUE),
  grp = sample(1:3, 100, replace = TRUE)
)

# Pass the function itself; the funs() wrapper is deprecated since dplyr 0.8.0
df %>% group_by(grp) %>% summarise_all(mean)

Output (exact values vary between runs because the data are sampled randomly):

# A tibble: 3 x 5
    grp     a     b     c     d
  <int> <dbl> <dbl> <dbl> <dbl>
1     1  2.95  2.87  2.79  3.18
2     2  3.09  2.44  2.62  2.97
3     3  3     3.22  3.22  2.89
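
On dplyr 1.0.0 or later, the same per-group means can be obtained with across(), which supersedes the scoped summarise_*() verbs. The following is a minimal sketch, assuming dplyr >= 1.0.0 is installed; the set.seed() call and the name means_by_grp are added here only for illustration and are not part of the original answer.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed so repeated runs give the same data

df <- data.frame(
  a = sample(1:5, 100, replace = TRUE),
  b = sample(1:5, 100, replace = TRUE),
  c = sample(1:5, 100, replace = TRUE),
  d = sample(1:5, 100, replace = TRUE),
  grp = sample(1:3, 100, replace = TRUE)
)

# across(everything(), mean) applies mean() to every column except the
# grouping variable, which across() never selects
means_by_grp <- df %>%
  group_by(grp) %>%
  summarise(across(everything(), mean))

print(means_by_grp)
```

To restrict this to the columns from the question, a range selection such as across(a:d, mean) should work in place of everything().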

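If you want to summarize only certain columns, use the summarise_at() or summarise_if() functions: summarise_at() takes the columns by name or position, and summarise_if() takes a predicate function that selects them (for example, is.numeric).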

The basic syntax is given below:

summarise_if(.tbl, .predicate, .funs, ...)

summarise_at(.tbl, .vars, .funs, ..., .cols = NULL)
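
As a rough illustration of the two signatures above, the sketch below selects columns by name with summarise_at() and by type with summarise_if(). The label column, the set.seed() call, and the result names means_ab and means_numeric are hypothetical additions for this example only.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so repeated runs give the same data

df <- data.frame(
  a = sample(1:5, 100, replace = TRUE),
  b = sample(1:5, 100, replace = TRUE),
  c = sample(1:5, 100, replace = TRUE),
  label = sample(c("x", "y"), 100, replace = TRUE),  # hypothetical non-numeric column
  grp = sample(1:3, 100, replace = TRUE)
)

# summarise_at(): only the columns named in vars() are summarised
means_ab <- df %>%
  group_by(grp) %>%
  summarise_at(vars(a, b), mean)

# summarise_if(): only columns for which the predicate returns TRUE are
# summarised, so the non-numeric label column is dropped
means_numeric <- df %>%
  group_by(grp) %>%
  summarise_if(is.numeric, mean)

print(means_ab)
print(means_numeric)
```

In both calls the grouping variable grp is kept as the key column and is not itself averaged.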


...