Deleting Duplicate Records in SQL Server

In SQL Server, deleting duplicate records can be a challenging task, especially when dealing with large datasets. Here are a few methods to delete duplicate records, along with examples.

Method 1: Using ROW_NUMBER() Function

This method uses the ROW_NUMBER() function to assign a unique number to each row within a partition of the result set. We can then delete the duplicate records based on the row number.

WITH duplicates AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY column1, column2, ... ORDER BY column1) AS row_num FROM your_table ) DELETE FROM duplicates WHERE row_num > 1; 

Example:

Let's say we have a table called Customers with duplicate records based on the Name and Email columns.

CustomerID Name Email
1 John Smith john.smith@example.com
2 John Smith john.smith@example.com
3 Jane Doe jane.doe@example.com
4 Jane Doe jane.doe@example.com
WITH duplicates AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY Name, Email ORDER BY CustomerID) AS row_num FROM Customers ) DELETE FROM duplicates WHERE row_num > 1; 

After executing this query, the duplicate records will be deleted, leaving only one record for each unique combination of Name and Email.

Method 2: Using Self-Join

This method uses a self-join to identify duplicate records and then delete them.

DELETE a FROM your_table a INNER JOIN ( SELECT column1, column2, ... FROM your_table GROUP BY column1, column2, ... HAVING COUNT(*) > 1 ) b ON a.column1 = b.column1 AND a.column2 = b.column2 AND ... WHERE a.column1 < b.column1 OR (a.column1 = b.column1 AND a.column2 < b.column2); 

Example:

Using the same Customers table as before, we can delete duplicate records using a self-join.

DELETE a FROM Customers a INNER JOIN ( SELECT Name, Email FROM Customers GROUP BY Name, Email HAVING COUNT(*) > 1 ) b ON a.Name = b.Name AND a.Email = b.Email WHERE a.CustomerID < b.CustomerID; 

After executing this query, the duplicate records will be deleted, leaving only one record for each unique combination of Name and Email.