in R Programming by (50.2k points)

I want to find duplicate records based on more than one column using dplyr / tidyverse.

Example

library(dplyr)

df <- data.frame(a = c(1, 1, 1, 2, 2, 2), b = c(1, 2, 1, 2, 1, 2), stringsAsFactors = FALSE)

I expected the following command to return rows 3 and 6, but it returns 0 rows:

df %>% filter(duplicated(a, b))
# [1] a b
# <0 rows> (or 0-length row.names)

Conversely, I expected this to return rows 1, 2, 4 and 5, but it returns all rows:

df %>% filter(!duplicated(a, b))
#   a b
# 1 1 1
# 2 1 2
# 3 1 1
# 4 2 2
# 5 2 1
# 6 2 2

Am I missing something?

1 Answer

by (108k points)

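In R, duplicated() works on a single vector, data frame, or array; it does not take several columns as separate arguments. Its second argument is incomparables, so duplicated(a, b) tells R that no value appearing in b may ever be flagged as a duplicate. Because b contains both 1 and 2, which covers every value in a, nothing is flagged: that is why you get 0 rows, and all rows with !. To compare whole rows, pass the data frame itself with the dot: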

df %>%
  filter(duplicated(.))
#   a b
# 1 1 1
# 2 2 2
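Because filter() renumbers the rows it returns, the printout above shows row names 1 and 2 rather than the original positions. A minimal base R sketch to confirm which original rows are flagged, reusing the question's data:

```r
# Same data as in the question
df <- data.frame(a = c(1, 1, 1, 2, 2, 2), b = c(1, 2, 1, 2, 1, 2))

# duplicated() on a data frame compares whole rows
which(duplicated(df))   # positions of the repeated rows: 3 6
which(!duplicated(df))  # positions of the first occurrences: 1 2 4 5
```

These are exactly the rows the question expected.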

df %>%
  filter(!duplicated(.))
#   a b
# 1 1 1
# 2 1 2
# 3 2 2
# 4 2 1
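To see why the question's duplicated(a, b) call flags nothing, here is a minimal sketch on plain vectors with the same values as columns a and b. It relies only on the documented fact that the second argument of duplicated() is incomparables, and that values listed there are never marked as duplicated:

```r
# Same values as the question's columns a and b
a <- c(1, 1, 1, 2, 2, 2)
b <- c(1, 2, 1, 2, 1, 2)

# b is matched to `incomparables`, so every value that appears in b
# (here 1 and 2) is treated as "never a duplicate"
duplicated(a, b)
# [1] FALSE FALSE FALSE FALSE FALSE FALSE

# The same call with the argument named explicitly
duplicated(a, incomparables = b)

# duplicated() on a alone compares only the values of a
duplicated(a)
# [1] FALSE  TRUE  TRUE FALSE  TRUE  TRUE
```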

If you only want to compare a specific subset of columns, bind them together with cbind() and pass the resulting matrix to duplicated(), which then compares its rows:

df %>%
  filter(duplicated(cbind(a, b)))
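The cbind() approach matters once the data frame has other columns. A minimal sketch, assuming a hypothetical data frame df2 that adds a unique id column (not part of the original question) to the same a and b values:

```r
library(dplyr)

# Hypothetical data: the question's a and b plus an illustrative id column
df2 <- data.frame(a = c(1, 1, 1, 2, 2, 2),
                  b = c(1, 2, 1, 2, 1, 2),
                  id = 1:6)

# Comparing all columns finds nothing, because every id is unique
sum(duplicated(df2))

# Comparing only a and b flags the repeated combinations
dups <- df2 %>% filter(duplicated(cbind(a, b)))
dups$id  # 3 6
```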

Note that dplyr also has a dedicated function for keeping only the unique rows, distinct():

df %>%
  distinct(a, b, .keep_all = TRUE)
#   a b
# 1 1 1
# 2 1 2
# 3 2 2
# 4 2 1
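With only a and b in the data, .keep_all makes no visible difference. A minimal sketch showing what it does, again assuming the hypothetical df2 with an extra id column: distinct() keeps the first row of each a/b combination, and .keep_all = TRUE retains the other columns from that row.

```r
library(dplyr)

# Hypothetical data: the question's a and b plus an illustrative id column
df2 <- data.frame(a = c(1, 1, 1, 2, 2, 2),
                  b = c(1, 2, 1, 2, 1, 2),
                  id = 1:6)

# Keep the first row of each a/b combination, including its id
kept <- df2 %>% distinct(a, b, .keep_all = TRUE)
kept$id  # 1 2 4 5

# Without .keep_all, only the columns named in distinct() are returned
names(df2 %>% distinct(a, b))  # "a" "b"
```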
