Back

Explore Courses Blog Tutorials Interview Questions
+1 vote
3 views
in SQL by (20.3k points)
edited by

It's easy to find duplicates with one field:

SELECT name, COUNT(email) 

FROM users

GROUP BY email

HAVING COUNT(email) > 1

So if we have a table

ID   NAME   EMAIL

1    John   [email protected]

2    Sam    [email protected]

3    Tom    [email protected]

4    Bob    [email protected]

5    Tom    [email protected]

This query will give us John, Sam, Tom, Tom because they all have the same email.

However, what I want is to get duplicates with the same email and name.

That is, I want to get "Tom", "Tom".

The reason I need this: I made a mistake, and allowed to insert duplicate name and email values. Now I need to remove/change the duplicates, so I need to find them first.

1 Answer

+2 votes
by (40.7k points)
edited by

It’s mostly recommended to use UNIQUE Constraints to prevent the duplicate rows. However, if you find a database with duplicate rows which is caused by human error or uncleaned data from an external source, then to find duplicate values in SQL table you need to follow these steps:

Step1: 

Identify whether duplicate criteria are matching or not: First you need to decide, Are you looking for a duplicate in a single column or are you searching for a combination of two columns which can be made unique.

 Step2: To verify whether the duplicates really exist or not:

Use this Query:

SELECT user_name, email, COUNT(*)

FROM users

GROUP BY user_name, email

HAVING COUNT(*) > 1

You can group both the columns.

Note: ‘HAVING’ clause is important to use because it filters the aggregate functions.

Refer to this video to learn how to use HAVING clause in detail.

Step 3: To find all rows which contain duplicates.

 Use this Code:

SELECT a1.*

FROM users a1

JOIN (SELECT user_name, email, COUNT(*)

FROM users 

GROUP BY user_name, email

HAVING count(*) > 1 ) b1

ON a1.user_name = b.user_name

AND a1.email = b1.email

ORDER BY a1.email

Refer to this video to learn GROUP BY syntax in detail.

If you observe here, you can see that the above query is not complicated. The first SELECT clause simply selects every column in the user's table. Inner Joins combine the duplicated data table from the first query. It’s necessary to use aliases (here, we’re using a1 & b1) to label the two versions as we’re joining the table to itself.

Related questions

0 votes
1 answer
0 votes
1 answer
0 votes
1 answer
0 votes
1 answer
asked Jul 4, 2019 in SQL by Tech4ever (20.3k points)

Browse Categories

...