Remove duplicate rows based on multiple columns using dplyr / tidyverse?

Question

1 Answer

vinita · Answer 1 · 2020-03-26T07:52:46+0000

In R, duplicated() works on a vector, a data frame, or an array. It returns a logical vector that is TRUE for every element (or row) that repeats an earlier one, so you can pass it straight to filter(), as in the code below:

df %>%
  filter(duplicated(.))    # keep only rows that repeat an earlier row
#   a b
# 1 1 1
# 2 2 2

df %>%
  filter(!duplicated(.))   # keep only the first occurrence of each row
#   a b
# 1 1 1
# 2 1 2
# 3 2 2
# 4 2 1
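
The answer does not show how df was built. As a minimal sketch, assuming a six-row data frame chosen only because it is consistent with the outputs above (the values and the names dup_flags, dups, and uniq are illustrative, not from the original), the two filters can be reproduced like this:

```r
library(dplyr)

# Hypothetical data, chosen to be consistent with the outputs shown above
df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2),
  b = c(1, 2, 1, 2, 1, 2)
)

# duplicated() marks each row that repeats an earlier row
dup_flags <- duplicated(df)

# Rows that are repeats of an earlier row
dups <- df %>%
  filter(duplicated(.))

# First occurrence of each distinct row
uniq <- df %>%
  filter(!duplicated(.))

print(dup_flags)
print(dups)
print(uniq)
```

Note that duplicated() never flags the first occurrence, so the negated filter keeps exactly one copy of every row.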

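If you want to compare only a specific subset of columns, combine them with cbind() and pass the result to duplicated(). The following code keeps the rows that repeat an earlier (a, b) pair; put ! in front of duplicated() to drop them instead: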

df %>%
  filter(duplicated(cbind(a, b)))
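
The subset form only matters when the data frame has columns beyond the ones being compared. A small sketch, assuming a hypothetical data frame df2 with an extra id column (neither df2 nor id appears in the original answer):

```r
library(dplyr)

# Hypothetical data: id is unique per row, (a, b) pairs repeat
df2 <- data.frame(
  id = 1:6,
  a  = c(1, 1, 1, 2, 2, 2),
  b  = c(1, 2, 1, 2, 1, 2)
)

# Whole-row comparison finds no duplicates, because id differs on every row
any_full_dups <- any(duplicated(df2))

# Compare only a and b; cbind() builds a matrix and duplicated() checks it row by row
kept <- df2 %>%
  filter(!duplicated(cbind(a, b)))

# Base R version of the same idea, for comparison
kept_base <- df2[!duplicated(df2[c("a", "b")]), ]

print(kept)
```

Both versions keep every column and retain the first row seen for each (a, b) combination.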

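Note that the idiomatic dplyr way to do this is distinct(). It keeps the first row for each combination of the listed columns, and .keep_all = TRUE retains the remaining columns as well: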

df %>%
  distinct(a, b, .keep_all = TRUE)
#   a b
# 1 1 1
# 2 1 2
# 3 2 2
# 4 2 1
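
To see what .keep_all changes, here is a short sketch using a hypothetical data frame df2 with an extra id column (an assumption for illustration; the original df has only a and b, so the flag makes no visible difference there):

```r
library(dplyr)

# Hypothetical data: id is unique per row, (a, b) pairs repeat
df2 <- data.frame(
  id = 1:6,
  a  = c(1, 1, 1, 2, 2, 2),
  b  = c(1, 2, 1, 2, 1, 2)
)

# Default: only the columns named in distinct() are returned
pairs_only <- df2 %>%
  distinct(a, b)

# .keep_all = TRUE: all columns are returned, taken from the first matching row
with_all <- df2 %>%
  distinct(a, b, .keep_all = TRUE)

print(pairs_only)
print(with_all)
```

With .keep_all = TRUE the result matches the filter(!duplicated(cbind(a, b))) approach, without having to build the matrix by hand.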