Suppose I want to calculate the proportion of different values within each group. For example, using the mtcars data, how do I calculate the relative frequency of the number of gears by am (automatic/manual) in one go with dplyr?

library(dplyr)

data(mtcars)
mtcars <- as_tibble(mtcars)  # tbl_df() is deprecated in current dplyr

# count frequency
mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n())

#   am gear  n
#    0    3 15
#    0    4  4
#    1    4  8
#    1    5  5

What I would like to achieve:

am gear  n  rel.freq
 0    3 15 0.7894737
 0    4  4 0.2105263
 1    4  8 0.6153846
 1    5  5 0.3846154

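To get the relative frequency of each gear value within each am group, add a column with dplyr's mutate() after summarising. By default, summarise() drops the last grouping variable (gear), so the result is still grouped by am and sum(n) is the total count within each am value: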

library(dplyr)

data(mtcars)
mtcars <- as_tibble(mtcars)  # tbl_df() is deprecated in current dplyr

mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n))  # sum(n) is computed per am group

Output:

# A tibble: 4 x 4
# Groups:   am
     am  gear     n  freq
  <dbl> <dbl> <int> <dbl>
1     0     3    15 0.789
2     0     4     4 0.211
3     1     4     8 0.615
4     1     5     5 0.385
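
The calculation above depends on summarise() leaving the result grouped by am. A minimal sketch that makes this explicit and checks the result, assuming dplyr 1.0.0 or later (where summarise() has a `.groups` argument); the names `rel_freq` and `total` are only illustrative:

```r
library(dplyr)

rel_freq <- mtcars %>%
  group_by(am, gear) %>%
  # keep the result grouped by am only, so sum(n) is taken within each am value
  summarise(n = n(), .groups = "drop_last") %>%
  mutate(freq = n / sum(n)) %>%
  ungroup()

# sanity check: the frequencies within each am value should add up to 1
rel_freq %>%
  group_by(am) %>%
  summarise(total = sum(freq))
```

Spelling out `.groups = "drop_last"` states the default behaviour in the code, and the final ungroup() avoids surprises if the result is used in later steps.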

To show the relative frequencies as percentages (note that the resulting column is a character vector, not a number):

mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n()) %>%
  mutate(rel.freq = paste0(round(100 * n / sum(n), 0), "%"))

Output:

# A tibble: 4 x 4
# Groups:   am
     am  gear     n rel.freq
  <dbl> <dbl> <int> <chr>
1     0     3    15 79%
2     0     4     4 21%
3     1     4     8 62%
4     1     5     5 38%
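
Because paste0() returns text, the percentage column can no longer be sorted or summed numerically. One possible variation, offered as a sketch rather than the only way to do it, keeps a numeric column next to the formatted label. It uses count() as a shorthand for group_by() plus summarise(n = n()); the names `pct_tbl` and `pct` are illustrative:

```r
library(dplyr)

pct_tbl <- mtcars %>%
  count(am, gear) %>%   # one row per am/gear combination, counts in column n
  group_by(am) %>%      # percentages are relative to each am group
  mutate(
    pct = 100 * n / sum(n),                # numeric percentage
    rel.freq = paste0(round(pct, 0), "%")  # formatted label for display
  ) %>%
  ungroup()

pct_tbl
```

Here count() returns an ungrouped result, so the group_by(am) step is needed before computing the shares.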