
Here's a sample data frame:

d <- data.frame(
  x   = runif(90),
  grp = gl(3, 30)
)

I want the subset of d containing the rows with the top 5 values of x for each value of grp.

Using base R, my approach would be something like:

# Sort all rows by x, largest first
ordered <- d[order(d$x, decreasing = TRUE), ]
# Split the sorted rows by group (the order within each piece is preserved)
splits <- split(ordered, ordered$grp)
# Keep the first 5 rows of each piece (head() defaults to n = 6)
heads <- lapply(splits, head, n = 5)
# Stack the pieces back into one data frame
do.call(rbind, heads)

##              x grp
## 1.19 0.8879631   1
## 1.4  0.8844818   1
## 1.12 0.8596197   1
## 1.26 0.8481809   1
## 1.18 0.8461516   1
## 2.31 0.9751049   2
## 2.34 0.9269764   2
## 2.57 0.8964114   2
## 2.58 0.8896466   2
## 2.45 0.8888834   2
## 3.74 0.9884852   3
## 3.73 0.9837653   3
## 3.83 0.9375398   3
## 3.64 0.9229036   3
## 3.69 0.8021373   3
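A sketch of another base R route, which ranks x within each group instead of splitting and re-binding: ave() applies rank() separately to each level of grp and returns the ranks in the original row order. The seed and the names within_rank and top5 are illustrative assumptions, not part of the question.

```r
# Stand-in data; set.seed(1) is an assumption, the question's data were unseeded
set.seed(1)
d <- data.frame(
  x   = runif(90),
  grp = gl(3, 30)
)

# Rank x within each grp (1 = largest). ave() runs FUN on the x values of each
# grp level and puts the results back in the original row order.
# ties.method = "first" breaks ties by position, so each group keeps exactly 5 rows.
within_rank <- ave(d$x, d$grp,
                   FUN = function(v) rank(-v, ties.method = "first"))

# Keep the five highest-ranked rows of each group
top5 <- d[within_rank <= 5, ]

# Order by group, then by decreasing x, to match the split/rbind layout
top5 <- top5[order(top5$grp, -top5$x), ]
print(top5)
```

Unlike the split/rbind version, this keeps the original row names of d.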

Using dplyr, I expected this to work:

library(dplyr)

d %>%
  arrange(desc(x)) %>%
  group_by(grp) %>%
  head(n = 5)

but it only returns the overall top 5 rows.
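The overall top 5 comes back because head() is not group-aware: it takes the first rows of the whole sorted table and ignores group_by(). A minimal sketch using slice_head(), which dplyr 1.0.0 and later apply within each group; the seed and the top5 name are assumptions for illustration.

```r
library(dplyr)

# Stand-in data; set.seed(1) is an assumption, the question's data were unseeded
set.seed(1)
d <- data.frame(
  x   = runif(90),
  grp = gl(3, 30)
)

# head() would ignore the grouping; slice_head() takes the first n rows
# of each group separately (requires dplyr >= 1.0.0)
top5 <- d %>%
  arrange(desc(x)) %>%
  group_by(grp) %>%
  slice_head(n = 5) %>%
  ungroup()

print(top5)
```

Because the rows are sorted by x before grouping, "first 5 per group" is the same as "top 5 per group" here.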

Swapping head for top_n returns the whole of d.

d %>%
  arrange(desc(x)) %>%
  group_by(grp) %>%
  top_n(n = 5)
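top_n() returns all of d because, when wt is omitted, it ranks by the last column of the table, which here is grp (dplyr prints "Selecting by grp"). Within each group grp is constant, so every row ties for first place, and top_n() keeps ties. A sketch, assuming dplyr 1.0.0 or later, that reproduces this and then names x explicitly, either through wt or with slice_max(), which supersedes top_n(). The seed and the variable names are illustrative.

```r
library(dplyr)

# Stand-in data; set.seed(1) is an assumption, the question's data were unseeded
set.seed(1)
d <- data.frame(
  x   = runif(90),
  grp = gl(3, 30)
)

# Reproduces the problem: without wt, top_n() ranks by the last column (grp).
# grp is constant inside each group, so all rows tie and all 90 are kept.
all_rows <- d %>%
  group_by(grp) %>%
  top_n(n = 5)

# Fix 1: tell top_n() which column to rank by
top5_topn <- d %>%
  group_by(grp) %>%
  top_n(n = 5, wt = x) %>%
  ungroup()

# Fix 2: slice_max() returns the 5 largest x per group, sorted by x descending
top5_slice <- d %>%
  group_by(grp) %>%
  slice_max(x, n = 5) %>%
  ungroup()

print(top5_slice)
```

top_n() keeps the rows in their existing order, while slice_max() sorts each group by x; both keep tied values by default, so a group can return more than 5 rows if x has ties.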

How do I get the correct subset?


...