in R Programming by (4.4k points)

Question

Using dplyr, how do I select the top and bottom observations/rows of grouped data in one statement?

Data & Example

Given a data frame

df <- data.frame(id=c(1,1,1,2,2,2,3,3,3),
                 stopId=c("a","b","c","a","b","c","a","b","c"),
                 stopSequence=c(1,2,3,3,1,4,3,1,2))

I can get the top and bottom observations from each group using slice, but using two separate statements:

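library(dplyr)

# first stop of each id (smallest stopSequence)
firstStop <- df %>%
  group_by(id) %>%
  arrange(stopSequence) %>%
  slice(1) %>%
  ungroup()

# last stop of each id (largest stopSequence)
lastStop <- df %>%
  group_by(id) %>%
  arrange(stopSequence) %>%
  slice(n()) %>%
  ungroup()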

Can I combine these two statements into one that selects both top and bottom observations?

1 Answer


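To select the first and the last row of each group in one statement, filter on the row's position within its group: row_number() %in% c(1, n()) keeps only the rows at position 1 and at position n(), the group size.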

library("dplyr")

df <- data.frame(id=c(1,1,1,2,2,2,3,3,3),
                 stopId=c("a","b","c","a","b","c","a","b","c"),
                 stopSequence=c(1,2,3,3,1,4,3,1,2))

df %>%
  group_by(id) %>%
  arrange(stopSequence) %>%
  filter(row_number() %in% c(1, n()))

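Output (rows are ordered by stopSequence across all groups, because arrange() ignores grouping by default):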

# A tibble: 6 x 3
# Groups:   id [3]
     id stopId stopSequence
  <dbl> <fct>         <dbl>
1     1 a                 1
2     2 b                 1
3     3 b                 1
4     1 c                 3
5     3 a                 3
6     2 c                 4
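
If you would rather have each id's first and last stop sit next to each other, one possible variant is sketched below. It is not part of the original answer. It assumes a dplyr version in which arrange() accepts the .by_group argument, and the name firstLast is only illustrative. slice() takes the two positions directly, and unique() is there so that a group with a single row is not returned twice.

```r
library(dplyr)

df <- data.frame(id = c(1,1,1,2,2,2,3,3,3),
                 stopId = c("a","b","c","a","b","c","a","b","c"),
                 stopSequence = c(1,2,3,3,1,4,3,1,2))

# firstLast is a hypothetical name used for this sketch
firstLast <- df %>%
  group_by(id) %>%
  arrange(stopSequence, .by_group = TRUE) %>%  # sort by id first, then by stopSequence
  slice(unique(c(1, n()))) %>%                 # positions 1 and n() within each id
  ungroup()

firstLast
```

With the example data this should give two rows per id, in id order: stops a and c for id 1, b and c for id 2, and b and a for id 3.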

...