
I would like to select the row with the maximum value in each group using dplyr.

First, I generate some random data to illustrate the question:

set.seed(1)
df <- expand.grid(list(A = 1:5, B = 1:5, C = 1:5))
df$value <- runif(nrow(df))

In plyr, I could use a custom function to select this row.

library(plyr)
ddply(df, .(A, B), function(x) x[which.max(x$value),])

In dplyr, the code below gives me the maximum value per group, but not the full row that holds it (column C is lost in this case).

library(dplyr)
df %>%
  group_by(A, B) %>%
  summarise(max = max(value))

How could I achieve this? Thanks for any suggestion.

1 Answer


To select the rows with maximum values in each group with dplyr, you can do the following:

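library(dplyr)

set.seed(1)
df <- expand.grid(list(A = 1:5, B = 1:5, C = 1:5))
df$value <- runif(nrow(df))

# Option 1: exactly one row per group (the first maximum if there are ties)
df %>%
  group_by(A, B) %>%
  slice(which.max(value))

# Option 2: every row equal to the group maximum (ties are all kept)
df %>%
  group_by(A, B) %>%
  filter(value == max(value)) %>%
  arrange(A, B, C)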

Output:

# A tibble: 25 x 4
# Groups:   A, B [25]
       A     B     C value
   <int> <int> <int> <dbl>
 1     1     1     4 0.892
 2     1     2     1 0.898
 3     1     3     5 0.976
 4     1     4     2 0.821
 5     1     5     5 0.992
 6     2     1     4 0.864
 7     2     2     1 0.945
 8     2     3     2 0.794
 9     2     4     1 0.718
10     2     5     3 0.839
# ... with 15 more rows
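
The two approaches return the same rows for the random data here, but they can differ when a group has tied maxima. The sketch below uses a small made-up data frame (`toy`, with columns `g`, `id`, and `value`, none of which come from the question) to show the difference. It also shows `slice_max()`, which should be available if your dplyr is version 1.0.0 or later and which makes the tie handling explicit.

```r
library(dplyr)

# Hypothetical data: group "a" has two rows tied at the maximum value 7
toy <- data.frame(
  g     = c("a", "a", "a", "b", "b"),
  id    = 1:5,
  value = c(3, 7, 7, 2, 5)
)

# which.max() returns the index of the first maximum: one row per group
first_max <- toy %>%
  group_by(g) %>%
  slice(which.max(value)) %>%
  ungroup()

# filter() keeps every row equal to the group maximum: tied rows all survive
all_max <- toy %>%
  group_by(g) %>%
  filter(value == max(value)) %>%
  ungroup()

# slice_max() with with_ties = FALSE also returns one row per group
one_per_group <- toy %>%
  group_by(g) %>%
  slice_max(value, n = 1, with_ties = FALSE) %>%
  ungroup()

nrow(first_max)      # 2
nrow(all_max)        # 3
nrow(one_per_group)  # 2
```

If you need exactly one row per group, `slice(which.max(value))` or `slice_max(..., with_ties = FALSE)` is the safer choice. Use `filter(value == max(value))` when tied rows should all be returned.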

...