# Getting the top values by group

Here's a sample data frame:

d <- data.frame(

x   = runif(90),

grp = gl(3, 30))

I want the subset of d containing the rows with the top 5 values of x for each value of grp.

Using base R, my approach would be something like:

ordered <- d[order(d$x, decreasing = TRUE), ]

splits <- split(ordered, ordered$grp)

heads <- lapply(splits, head)

do.call(rbind, heads)

##              x grp

## 1.19 0.8879631   1

## 1.4  0.8844818   1

## 1.12 0.8596197   1

## 1.26 0.8481809   1

## 1.18 0.8461516   1

## 1.29 0.8317092   1

## 2.31 0.9751049   2

## 2.34 0.9269764   2

## 2.57 0.8964114   2

## 2.58 0.8896466   2

## 2.45 0.8888834   2

## 2.35 0.8706823   2

## 3.74 0.9884852   3

## 3.73 0.9837653   3

## 3.83 0.9375398   3

## 3.64 0.9229036   3

## 3.69 0.8021373   3

## 3.86 0.7418946   3
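
The listing above has six rows per group rather than five, presumably because head() defaults to n = 6. A minimal base R sketch that passes n explicitly might look like the following. The helper name top_rows and the set.seed() call are assumptions added for illustration and are not part of the original question.

```r
set.seed(123)  # assumption: seed added only so the sketch is reproducible
d <- data.frame(
  x   = runif(90),
  grp = gl(3, 30)
)

# Hypothetical helper: keep the n rows with the largest x within one group.
top_rows <- function(g, n = 5) {
  head(g[order(g$x, decreasing = TRUE), ], n)
}

# Split by group, take the top n rows of each piece, and bind them back together.
top5 <- do.call(rbind, lapply(split(d, d$grp), top_rows, n = 5))
table(top5$grp)
```

Sorting inside each group, rather than sorting the whole data frame first, keeps the helper usable on any single piece of the split.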

Using dplyr, I expected this to work:

d %>%

arrange_(~ desc(x)) %>%

group_by_(~ grp) %>%

head(n = 5)

but it only returns the overall top 5 rows.

Swapping head for top_n returns the whole of d.

d %>%

arrange_(~ desc(x)) %>%

group_by_(~ grp) %>%

top_n(n = 5)

How do I get the correct subset?

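From ?top_n, about the wt argument: the variable to use for ordering '[...]' defaults to the last variable in the tbl. Here the last variable is grp, which is constant within each group, so every row ties and top_n returns the whole of d.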

Try this:

set.seed(123)

d <- data.frame(

x   = runif(90),

grp = gl(3, 30))

d %>%

group_by(grp) %>%

top_n(n = 5, wt = x)

#            x grp

# 1  0.9404673   1

# 2  0.9568333   1

# 3  0.8998250   1

# 4  0.9545036   1

# 5  0.9942698   1

# 6  0.9630242   2

# 7  0.9022990   2

# 8  0.8578277   2

# 9  0.7989248   2

# 10 0.8950454   2

# 11 0.8146400   3

# 12 0.8123895   3

# 13 0.9849570   3

# 14 0.8930511   3

# 15 0.8864691   3
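
In dplyr 1.0.0 and later, top_n is superseded by slice_max, which takes the ordering variable explicitly. A minimal sketch of the same selection, assuming a dplyr version of at least 1.0.0 is installed:

```r
library(dplyr)

set.seed(123)
d <- data.frame(
  x   = runif(90),
  grp = gl(3, 30)
)

# Within each group, keep the 5 rows with the largest x.
# with_ties defaults to TRUE, so tied values could yield more than 5 rows.
top5 <- d %>%
  group_by(grp) %>%
  slice_max(order_by = x, n = 5) %>%
  ungroup()
```

Unlike top_n, slice_max returns the rows sorted by x in descending order within each group.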
