Select the first row by group

Question

asked Aug 17, 2019 in R Programming by Ajinkya757 (5.3k points)

From a data frame like this

test <- data.frame('id'= rep(1:5,2), 'string'= LETTERS[1:10])
test <- test[order(test$id), ]
rownames(test) <- 1:10
> test
id string
1 1 A
2 1 F
3 2 B
4 2 G
5 3 C
6 3 H
7 4 D
8 4 I
9 5 E
10 5 J

I want to create a new one with the first row of each id / string pair. If sqldf accepted R code within it, the query could look like this:

res <- sqldf("select id, min(rownames(test)), string
from test
group by id, string")
> res
id string
1 1 A
3 2 B
5 3 C
7 4 D
9 5 E

Is there a solution short of creating a new column like

test$row <- rownames(test)

and running the same sqldf query with min(row)?

1 Answer

anonymous · Answer 1 · 2019-08-17T14:56:25+0000

To select the first row by group, you can use the duplicated function while indexing a data frame as follows:

test <- data.frame('id'= rep(1:5,2), 'string'= LETTERS[1:10])
test <- test[order(test$id), ]
rownames(test) <- 1:10

test[!duplicated(test$id),]
id string
1 1 A
3 2 B
5 3 C
7 4 D
9 5 E

Select the first row by group

1 Answer

Related questions

Browse Categories

Browse By Domains

Popular Courses

Popular Tutorials

Popular Resources