Deleting Duplicate Rows in SQL Server

In SQL Server, deleting duplicate rows can be a challenging task, especially when dealing with large datasets. Here are a few methods to delete duplicate rows, along with examples.

Method 1: Using ROW_NUMBER() Function

This method uses the ROW_NUMBER() function to assign a unique number to each row within a partition of the result set. We can then delete the duplicate rows based on the row number.

WITH duplicates AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY column1, column2,... ORDER BY column1) AS row_num FROM your_table ) DELETE FROM duplicates WHERE row_num > 1; 

Example:

Let's say we have a table called Orders with duplicate rows.

OrderID CustomerID OrderDate
1 1 2022-01-01
2 1 2022-01-01
3 2 2022-01-05
4 2 2022-01-05
WITH duplicates AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY CustomerID, OrderDate ORDER BY OrderID) AS row_num FROM Orders ) DELETE FROM duplicates WHERE row_num > 1; 

After executing this query, the duplicate rows will be deleted, leaving only one row for each unique combination of CustomerID and OrderDate.

Method 2: Using Common Table Expression (CTE)

This method uses a Common Table Expression (CTE) to identify duplicate rows and then delete them.

WITH duplicates AS ( SELECT column1, column2,..., COUNT(*) OVER (PARTITION BY column1, column2,...) AS count FROM your_table ) DELETE FROM duplicates WHERE count > 1; 

Example:

Using the same Orders table as before, we can delete duplicate rows using a CTE.

WITH duplicates AS ( SELECT CustomerID, OrderDate, COUNT(*) OVER (PARTITION BY CustomerID, OrderDate) AS count FROM Orders ) DELETE FROM duplicates WHERE count > 1; 

After executing this query, the duplicate rows will be deleted, leaving only one row for each unique combination of CustomerID and OrderDate.

Method 3: Using Self-Join

This method uses a self-join to identify duplicate rows and then delete them.

DELETE a FROM your_table a INNER JOIN ( SELECT column1, column2,... FROM your_table GROUP BY column1, column2,... HAVING COUNT(*) > 1 ) b ON a.column1 = b.column1 AND a.column2 = b.column2 AND... WHERE a.column1 < b.column1 OR (a.column1 = b.column1 AND a.column2 < b.column2); 

Example:

Using the same Orders table as before, we can delete duplicate rows using a self-join.

DELETE a FROM Orders a INNER JOIN ( SELECT CustomerID, OrderDate FROM Orders GROUP BY CustomerID, OrderDate HAVING COUNT(*) > 1 ) b ON a.CustomerID = b.CustomerID AND a.OrderDate = b.OrderDate WHERE a.OrderID < b.OrderID; 

After executing this query, the duplicate rows will be deleted, leaving only one row for each unique combination of CustomerID