
I have a data frame with more than 100 columns, and I would like to find the unique rows by comparing only two of the columns. I'm hoping this is an easy one, but I can't get it working with unique or duplicated myself.

In the example below, I would like to find the unique rows using only id and id2:

data.frame(id=c(1,1,3),id2=c(1,1,4),somevalue=c("x","y","z"))

id id2 somevalue
 1   1         x
 1   1         y
 3   4         z

I would like to obtain either:

id id2 somevalue
 1   1         x
 3   4         z

or:

id id2 somevalue
 1   1         y
 3   4         z

(I have no preference which of the unique rows is kept)

1 Answer

To select the unique rows of a data frame based on only some of its columns, you can use the duplicated() function from base R as follows:

dat <- data.frame(id=c(1,1,3),id2=c(1,1,4),somevalue=c("x","y","z"))

dat

  id id2 somevalue
1  1   1         x
2  1   1         y
3  3   4         z

# keep only the first row of each (id, id2) combination
dat[!duplicated(dat[,c('id','id2')]),]

  id id2 somevalue
1  1   1         x
3  3   4         z
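By default, duplicated() flags every repeat after the first occurrence, so the approach above keeps the first row of each (id, id2) pair (the "x" row). If you would rather keep the last occurrence (the "y" row, which is the question's second acceptable output), duplicated() also takes a fromLast argument that scans from the bottom up. The sketch below shows both variants side by side; the names key_cols, first_kept and last_kept are just illustrative.

```r
# Example data from the question
dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4), somevalue = c("x", "y", "z"))

# Columns that decide whether two rows count as duplicates
# (illustrative name; list any subset of your 100+ columns here)
key_cols <- c("id", "id2")

# Default: keep the first row of each (id, id2) combination
first_kept <- dat[!duplicated(dat[key_cols]), ]

# fromLast = TRUE checks for duplicates starting from the last row,
# so the last row of each combination is kept instead
last_kept <- dat[!duplicated(dat[key_cols], fromLast = TRUE), ]

print(first_kept)
print(last_kept)
```

Here first_kept contains rows 1 and 3, and last_kept contains rows 2 and 3. Note that calling unique(dat) on the whole data frame would keep all three rows, because the rows still differ in somevalue; that is why the duplicate check has to be restricted to the key columns.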

...