# How to select the rows with maximum values in each group with dplyr

I would like to select the row with the maximum value in each group using dplyr.

First, I generate some random data to illustrate the question:

set.seed(1)

df <- expand.grid(list(A = 1:5, B = 1:5, C = 1:5))

df$value <- runif(nrow(df))

In plyr, I could use a custom function to select this row.

library(plyr)

ddply(df, .(A, B), function(x) x[which.max(x$value), ])

In dplyr, the following code returns the maximum value for each group, but not the row that holds it, so column C is lost:

library(dplyr)

df %>% group_by(A, B) %>%

summarise(max = max(value))

How could I achieve this? Thanks for any suggestion.

To select the rows with the maximum value in each group with dplyr, you can use either of the following approaches:

set.seed(1)

df <- expand.grid(list(A = 1:5, B = 1:5, C = 1:5))

df$value <- runif(nrow(df))

df %>% group_by(A, B) %>% slice(which.max(value))

# OR (this version keeps every row tied for the group maximum)

df %>%

group_by(A, B) %>%

filter(value == max(value)) %>%

arrange(A,B,C)

Output:

# A tibble: 25 x 4
# Groups:   A, B
       A     B     C value
   <int> <int> <int> <dbl>
 1     1     1     4 0.892
 2     1     2     1 0.898
 3     1     3     5 0.976
 4     1     4     2 0.821
 5     1     5     5 0.992
 6     2     1     4 0.864
 7     2     2     1 0.945
 8     2     3     2 0.794
 9     2     4     1 0.718
10     2     5     3 0.839
# ... with 15 more rows
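
The two approaches above give the same result on this data because runif() is very unlikely to produce ties, but they can differ when a group has several rows sharing the maximum. The sketch below illustrates this on a small made-up data frame (the names toy, g, and id are hypothetical and not part of the original question). It also shows slice_max(), which is available in dplyr 1.0.0 and later and makes the tie handling explicit through its with_ties argument.

```r
library(dplyr)

# Hypothetical toy data with a deliberate tie in group "a" (ids 2 and 3)
toy <- data.frame(
  g     = c("a", "a", "a", "b", "b"),
  id    = 1:5,
  value = c(3, 7, 7, 2, 5)
)

# which.max() returns the position of the first maximum only,
# so exactly one row per group is kept
first_max <- toy %>%
  group_by(g) %>%
  slice(which.max(value)) %>%
  ungroup()

# value == max(value) is TRUE for every tied row,
# so all rows sharing the group maximum are kept
all_max <- toy %>%
  group_by(g) %>%
  filter(value == max(value)) %>%
  ungroup()

# slice_max() (dplyr >= 1.0.0): with_ties = FALSE keeps one row per group
one_per_group <- toy %>%
  group_by(g) %>%
  slice_max(value, n = 1, with_ties = FALSE) %>%
  ungroup()

print(first_max)
print(all_max)
print(one_per_group)
```

If exactly one row per group is required, slice(which.max(value)) or slice_max() with with_ties = FALSE is the safer choice. If every tied row matters, use filter(value == max(value)) or leave with_ties at its default of TRUE.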