
I have a data.frame like this:

set.seed(123)
df = data.frame(x=sample(0:1,10,replace=T),y=sample(0:1,10,replace=T),z=1:10)

> df
   x y  z
1  0 1  1
2  1 0  2
3  0 1  3
4  1 1  4
5  1 0  5
6  0 1  6
7  1 0  7
8  1 0  8
9  1 0  9
10 0 1 10

I would like to remove duplicate rows based on the first two columns (x and y), keeping the first occurrence of each combination. In base R, this gives the expected output:

df[!duplicated(df[,1:2]),]

  x y z
1 0 1 1
2 1 0 2
4 1 1 4

I am specifically looking for a solution using the dplyr package.
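The data drawn after set.seed(123) can differ between R versions (R 3.6.0 changed the default sample() algorithm), so a minimal sketch that types the question's df in literally makes the example reproducible. The values below are copied from the printed df; the base R line is the one from the question.

```r
# Rebuild the question's data frame literally so the result does not depend on the RNG
df <- data.frame(
  x = c(0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 0L),
  y = c(1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L),
  z = 1:10
)

# Base R: duplicated() flags every (x, y) pair after its first appearance,
# so negating it keeps only the first row of each combination
base_result <- df[!duplicated(df[, 1:2]), ]
print(base_result)
```

The kept rows retain their original row names (1, 2 and 4).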

1 Answer


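You can use the distinct() function from the dplyr package. List the columns that define a duplicate and set .keep_all = TRUE so that the remaining columns (here z) are kept from the first row of each combination: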

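library(dplyr)

set.seed(123)
df = data.frame(x=sample(0:1,10, replace = TRUE),y=sample(0:1,10,replace=TRUE),z=1:10)

df %>% distinct(x, y, .keep_all = TRUE)

  x y z
1 0 1 1
2 1 0 2
3 1 1 4

The numbers drawn after set.seed(123) can differ between R versions (R 3.6.0 changed the default sample() algorithm), so the output above is for the df printed in the question. distinct() keeps the first row of each x/y combination, which matches the base R result.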
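To see what .keep_all changes, here is a small sketch on the question's data (typed in literally, as an assumption, so it does not depend on the RNG). Without .keep_all, distinct() returns only the columns named in the call; with it, the other columns come from the first row of each combination.

```r
library(dplyr)

# The question's df, copied from its printed output
df <- data.frame(
  x = c(0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 0L),
  y = c(1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L),
  z = 1:10
)

# Only the key columns x and y are returned
only_keys <- df %>% distinct(x, y)

# All columns are returned; z comes from the first row of each (x, y) pair
all_cols <- df %>% distinct(x, y, .keep_all = TRUE)

print(only_keys)
print(all_cols)
```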

You can also group by the key columns and use filter() to keep the first row of each group. Note that the result is still grouped by x and y:

df %>% group_by(x, y) %>% filter(row_number() == 1)

# A tibble: 3 x 3
# Groups:   x, y [3]
      x     y     z
  <int> <int> <int>
1     0     1     1
2     1     0     2
3     1     1     4
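Because the filter() result keeps its grouping, later verbs will still operate per group. A minimal sketch, again assuming the question's df typed in literally, checks this with is_grouped_df() and shows slice(1) followed by ungroup() as an alternative way to take the first row of each group and return an ungrouped table.

```r
library(dplyr)

# The question's df, copied from its printed output
df <- data.frame(
  x = c(0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 0L),
  y = c(1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L),
  z = 1:10
)

# filter() keeps rows in their original order but leaves the grouping in place
filtered <- df %>% group_by(x, y) %>% filter(row_number() == 1)

# slice(1) also takes the first row per group; ungroup() removes the grouping
sliced <- df %>% group_by(x, y) %>% slice(1) %>% ungroup()

print(is_grouped_df(filtered))
print(is_grouped_df(sliced))
```

Both approaches keep the same three rows; only the grouping (and possibly the row order) of the result differs.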

...