# Summarizing multiple columns with dplyr?

I'm struggling a bit with the dplyr syntax. I have a data frame with several variables and one grouping variable. Now I want to calculate the mean of each column within each group, using dplyr in R.

    library(dplyr)

    n <- 100  # number of rows; n was not defined in the original snippet
    df <- data.frame(
      a = sample(1:5, n, replace = TRUE),
      b = sample(1:5, n, replace = TRUE),
      c = sample(1:5, n, replace = TRUE),
      d = sample(1:5, n, replace = TRUE),
      grp = sample(1:3, n, replace = TRUE)
    )

    df %>% group_by(grp) %>% summarise(mean(a))

This gives me the mean for column "a" for each group indicated by "grp".

My question is: is it possible to get the means for each column within each group at once? Or do I have to repeat df %>% group_by(grp) %>% summarise(mean(a)) for each column?

What I would like to have is something like

    df %>% group_by(grp) %>% summarise(mean(a:d)) # "mean(a:d)" does not work


To summarize multiple columns, you can use the summarise_all() function in the dplyr package as follows:

    library(dplyr)

    df <- data.frame(
      a = sample(1:5, 100, replace = TRUE),
      b = sample(1:5, 100, replace = TRUE),
      c = sample(1:5, 100, replace = TRUE),
      d = sample(1:5, 100, replace = TRUE),
      grp = sample(1:3, 100, replace = TRUE)
    )

    # funs() is deprecated since dplyr 0.8.0, so pass the function directly
    df %>% group_by(grp) %>% summarise_all(mean)

Output (the exact values differ between runs because the data are sampled at random):

    # A tibble: 3 x 5
        grp     a     b     c     d
      <int> <dbl> <dbl> <dbl> <dbl>
    1     1  2.95  2.87  2.79  3.18
    2     2  3.09  2.44  2.62  2.97
    3     3  3     3.22  3.22  2.89
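In dplyr 1.0.0 and later, the scoped verbs such as summarise_all() are superseded by across(), which is also the closest working form of the mean(a:d) idea from the question. The following is a minimal sketch, assuming dplyr 1.0.0 or newer is installed; the set.seed() call and the names means_all and means_ad are additions for illustration and are not part of the original post.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed so the sketch is reproducible
df <- data.frame(
  a = sample(1:5, 100, replace = TRUE),
  b = sample(1:5, 100, replace = TRUE),
  c = sample(1:5, 100, replace = TRUE),
  d = sample(1:5, 100, replace = TRUE),
  grp = sample(1:3, 100, replace = TRUE)
)

# everything() covers all columns except the grouping column grp
means_all <- df %>%
  group_by(grp) %>%
  summarise(across(everything(), mean))

# a:d selects the range of columns from a to d by name
means_ad <- df %>%
  group_by(grp) %>%
  summarise(across(a:d, mean))

print(means_all)
```

Both calls return one row per group with the columns grp, a, b, c and d. Here they give the same result, because a:d happens to cover every non-grouping column.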

To summarize only certain columns, use summarise_at(), which selects columns by name or position, or summarise_if(), which selects columns with a predicate function such as is.numeric.

The basic syntax is given below:

    summarise_if(.tbl, .predicate, .funs, ...)
    summarise_at(.tbl, .vars, .funs, ..., .cols = NULL)
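To make the two signatures concrete, here is a small sketch of how they might be called. The label column, the seed, and the names means_at and means_if are hypothetical and were added only to show a non-numeric column being skipped; they do not come from the original question.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- data.frame(
  a = sample(1:5, 100, replace = TRUE),
  b = sample(1:5, 100, replace = TRUE),
  label = sample(c("x", "y"), 100, replace = TRUE),  # hypothetical non-numeric column
  grp = sample(1:3, 100, replace = TRUE)
)

# summarise_at(): name the columns explicitly with vars()
means_at <- df %>%
  group_by(grp) %>%
  summarise_at(vars(a, b), mean)

# summarise_if(): keep only columns for which the predicate is TRUE;
# label is not numeric, so it is left out of the result
means_if <- df %>%
  group_by(grp) %>%
  summarise_if(is.numeric, mean)

print(means_at)
print(means_if)
```

In this sketch both calls produce the same table of group means for a and b. summarise_at() is the better fit when the column names are known in advance, while summarise_if() is handy when columns should be chosen by type.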
