How To Delete Duplicate Rows in SQL Server


Duplicate records in the database are old age problem that almost every organization has to deal with that.Duplicate may appear due to reckless data input, merge records from old systems into new systems, upload lists purchased and many other reasons.Identifying these duplicate records can also be difficult. You can have many people with the first and last names.You can have a person with many addresses, emails or other identifying characteristics.Here we will follow some basic steps to find and remove duplicate records in the database.



  • The below script creates Employee table and inserts sample data into the Employee table for the demonstration.
CREATE TABLE Employee (
    id INT PRIMARY KEY identity,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL, 
    email VARCHAR(255) NOT NULL
);
 


INSERT INTO Employee (first_name,last_name,email) 
VALUES ('john ','Schmitt','john.schmitt@JCPenney.com'),
       ('Joney','dey','joney.dey@walmart.com'),
       ('Park','decosta','Park.decosta@gmail.com'),
       ('Mat ','Prior','Mat.Prior@amazon.com'),
       ('amazon','Jonas','nick.Jonas@mac.com'),
       ('Tom ','Cruise','Tom.Cruise@iphone11.com'),
       ('microsoft','Kelly','susan.Kelly@comcast.net'),
       ('Albert ','Pintos','Albert.Pintos@apple.net'),
       ('Holland','liferway','Holland.liferway@yahoo.com'),
       ('Alex','Durby','Alex.Durby@gmail.com.com'),
       ('Silvester','Stolen','Silvester.Stolen@rediff.com'),
       ('scarlett','Johnson','scarlett.Johnson@microsoft.com'),
       ('ebay','Kelly','susan.Kelly@comcast.net'),
     ('Mat ','Prior','Mat.Prior@apple.com'),
       ('JCPenney','liferway','Holland.liferway@ebay.com');

After execution of above script, we are getting the records in a table like below:-

  • Going to check the duplicate in Employee table ,The following query returns the duplicate records in the Employee table:-
SELECT 
    email, COUNT(email) AS Duplicates
FROM
    Employee
GROUP BY 
    email
HAVING 
    COUNT(email) > 1;

Here we can see, we have three rows with duplicate records.

  • Delete duplicate rows using DELETE JOIN statement
DELETE t1 FROM Employee t1
      INNER JOIN
  Employee t2 
on t1.id < t2.id AND t1.email = t2.email;

It indicated that three rows have been deleted but we can execute the query again to find duplicate records:-

 	SELECT 
    email, 
    COUNT(email)
FROM
    Employee
GROUP BY 
    email
HAVING 
    COUNT(email) > 1;

The above query returns an empty set, which means that the duplicate rows have been deleted from employee table.

  • Now verify data from the Employee table where we are able to find all unique records
SELECT 
    *
FROM
    Employee

The duplicate rows with id  4, 7, and 9 have been deleted from employee table

 


 

Leave a Reply

Your email address will not be published. Required fields are marked *