How can I remove duplicated rows in R?

1 Answer

The distinct() function in the dplyr package removes duplicate rows, either across all columns or based only on the columns you specify.

Data:

dat <- data.frame(m = rep(c(1, 2), 4), n = rep(LETTERS[1:4], 2))

Remove rows that repeat an earlier value in the specified columns (here m), keeping the first occurrence along with all other columns:

library(dplyr)
dat %>% distinct(m, .keep_all = TRUE)
  m n
1 1 A
2 2 B

Remove rows which are complete duplicates of other rows:

dat %>% distinct()
  m n
1 1 A
2 2 B
3 1 C
4 2 D
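
For comparison, the same two results can be obtained without dplyr. The following is a minimal sketch that rebuilds the example data and uses base R's duplicated() and unique(); the names first_per_m and unique_rows are only illustrative and not part of the original answer.

```r
# Rebuild the example data so the block runs on its own
dat <- data.frame(m = rep(c(1, 2), 4), n = rep(LETTERS[1:4], 2))

# Keep the first row for each value of m
# (comparable to distinct(m, .keep_all = TRUE))
first_per_m <- dat[!duplicated(dat$m), ]

# Drop rows that are complete duplicates of an earlier row
# (comparable to distinct() with no columns given)
unique_rows <- unique(dat)

print(first_per_m)
print(unique_rows)
```

Both approaches keep the first occurrence of each duplicate, so the choice mostly comes down to whether the rest of your code already uses dplyr.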

The general base R approach uses duplicated(), which returns a logical vector that is TRUE for every row repeating an earlier row:

m <- c(rep("A", 3), rep("B", 3), rep("C", 2))
n <- c(1, 1, 2, 4, 1, 1, 2, 2)
df <- data.frame(m, n)

duplicated(df)
[1] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE

# Show only the rows that repeat an earlier row
df[duplicated(df), ]
  m n
2 A 1
6 B 1
8 C 2

# Drop those rows, keeping the first occurrence of each
df[!duplicated(df), ]
  m n
1 A 1
3 A 2
4 B 4
5 B 1
7 C 2
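
By default duplicated() marks only the second and later copies of a row. Two variations that often come up are keeping the last occurrence instead of the first, and flagging every copy of a repeated row. The following is a minimal sketch using the fromLast argument of duplicated(); the names keep_last and all_copies are illustrative and not part of the original answer.

```r
# Same example data as above, repeated so the block runs on its own
m <- c(rep("A", 3), rep("B", 3), rep("C", 2))
n <- c(1, 1, 2, 4, 1, 1, 2, 2)
df <- data.frame(m, n)

# Keep the last occurrence of each row instead of the first
keep_last <- df[!duplicated(df, fromLast = TRUE), ]

# Flag every row that has at least one identical copy elsewhere,
# by combining a forward scan and a backward scan
all_copies <- duplicated(df) | duplicated(df, fromLast = TRUE)

print(keep_last)
print(df[all_copies, ])
```

Keeping the last occurrence is useful when later rows are newer records, while the combined mask helps when you want to inspect all conflicting rows before deciding which to drop.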
