Deleting Duplicate Rows in SQL Server
In SQL Server, deleting duplicate rows can be a challenging task, especially when dealing with large datasets. Here are a few methods to delete duplicate rows, along with examples.
Method 1: Using ROW_NUMBER() Function
This method uses the ROW_NUMBER() function to assign a unique number to each row within a partition of the result set. We can then delete the duplicate rows based on the row number.
WITH duplicates AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY column1, column2,... ORDER BY column1) AS row_num FROM your_table ) DELETE FROM duplicates WHERE row_num > 1;
Example:
Let's say we have a table called Orders
with duplicate rows.
OrderID | CustomerID | OrderDate |
---|---|---|
1 | 1 | 2022-01-01 |
2 | 1 | 2022-01-01 |
3 | 2 | 2022-01-05 |
4 | 2 | 2022-01-05 |
WITH duplicates AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY CustomerID, OrderDate ORDER BY OrderID) AS row_num FROM Orders ) DELETE FROM duplicates WHERE row_num > 1;
After executing this query, the duplicate rows will be deleted, leaving only one row for each unique combination of CustomerID
and OrderDate
.
Method 2: Using Common Table Expression (CTE)
This method uses a Common Table Expression (CTE) to identify duplicate rows and then delete them.
WITH duplicates AS ( SELECT column1, column2,..., COUNT(*) OVER (PARTITION BY column1, column2,...) AS count FROM your_table ) DELETE FROM duplicates WHERE count > 1;
Example:
Using the same Orders
table as before, we can delete duplicate rows using a CTE.
WITH duplicates AS ( SELECT CustomerID, OrderDate, COUNT(*) OVER (PARTITION BY CustomerID, OrderDate) AS count FROM Orders ) DELETE FROM duplicates WHERE count > 1;
After executing this query, the duplicate rows will be deleted, leaving only one row for each unique combination of CustomerID
and OrderDate
.
Method 3: Using Self-Join
This method uses a self-join to identify duplicate rows and then delete them.
DELETE a FROM your_table a INNER JOIN ( SELECT column1, column2,... FROM your_table GROUP BY column1, column2,... HAVING COUNT(*) > 1 ) b ON a.column1 = b.column1 AND a.column2 = b.column2 AND... WHERE a.column1 < b.column1 OR (a.column1 = b.column1 AND a.column2 < b.column2);
Example:
Using the same Orders
table as before, we can delete duplicate rows using a self-join.
DELETE a FROM Orders a INNER JOIN ( SELECT CustomerID, OrderDate FROM Orders GROUP BY CustomerID, OrderDate HAVING COUNT(*) > 1 ) b ON a.CustomerID = b.CustomerID AND a.OrderDate = b.OrderDate WHERE a.OrderID < b.OrderID;
After executing this query, the duplicate rows will be deleted, leaving only one row for each unique combination of CustomerID