A this time I'd like to show some techniques to deal with duplicates
CREATE TABLE TableWithDups
(
row_id int identity(1,1),
col1 int,
col2 int
)
INSERT INTO TableWithDups (col1 ,col2 ) VALUES (1,10)
INSERT INTO TableWithDups (col1 ,col2 ) VALUES (1,10)
INSERT INTO TableWithDups (col1 ,col2 ) VALUES (4,2)
INSERT INTO TableWithDups (col1 ,col2 ) VALUES (3,3)
INSERT INTO TableWithDups (col1 ,col2 ) VALUES (3,3)
INSERT INTO TableWithDups (col1 ,col2 ) VALUES (9,20)
PRINT 'Duplicates in Table'
SELECT * FROM TableWithDups
WHERE EXISTS
(SELECT * FROM TableWithDups TWD
WHERE TWD.row_id <> TableWithDups.row_id
AND TWD.col1 = TableWithDups.col1
AND TWD.col2 = TableWithDups.col2)
PRINT 'Duplicates to be Deleted'
SELECT * FROM TableWithDups
WHERE EXISTS
(SELECT * FROM TableWithDups TWD
WHERE TWD.row_id < TableWithDups.row_id
AND TWD.col1 = TableWithDups.col1
AND TWD.col2 = TableWithDups.col2 )
PRINT 'Executig Deletion of Duplicates '
DELETE FROM TableWithDups
WHERE EXISTS
(SELECT * FROM TableWithDups TWD
WHERE TWD.row_id < TableWithDups.row_id
AND TWD.col1 = TableWithDups.col1
AND TWD.col2 = TableWithDups.col2 )
SELECT * FROM TableWithDups
If you are using SQL Server 2005 take a look at the below script
WITH Duplicates
AS
( SELECT *, ROW_NUMBER()OVER(PARTITION BY col1 , col2 ORDER BY col1 , col2 ) AS cnt FROM TableWithDups)
DELETE FROM Duplicates WHERE cnt> 1;
Recently my good friend Doron Farber wrote great article about how efficiently remove duplicates on huge tables based on real world example.
http://www.dfarber.com/computer-consulting-blog/2011/12/26/remove-duplicate-records-in-sql.aspx
Thursday, February 22, 2007
Subscribe to:
Post Comments (Atom)
8 comments:
Hi Uri,
Surely the true test is when there is no identifier on the table (i.e. a badly designed table)?
The SQL Server 2005 method works for this but in SQL 2000 the best I've managed to come up with is using a temp table and SELECT DISTINCT - which gets a lot more complicated if the duplicate rows have foreign key references :-/
Any thoughts?
By definition this is not a table at all -- it has no key! What this usually models is a commodity, we need a column for the quantity, which can be an absolute scale (count) or a ratio scale (units of measure).
CREATE TABLE Inventory
(item_name VARCHAR(25) NOT NULL PRIMARY KEY,
item_qty INTEGER NOT NULL
CHECK (item_qty > 0));
INSERT INTO Inventory VALUES ('No. 5 Machine Bolts', 123);
INSERT INTO Inventory VALUES ('No. 5 Hex Nuts', 127);
etc.
This sort of design comes from mimicking a paper form or other physical data source directly in SQL. It should never be allowed inthe first place.
These articles are fantastic; the information you show us is interesting for everybody and really good written. It’s just great!!
Agriculture Logo
Hi, I like your blog very much! It is found me so interesting and informative... Thanks very much for sharing this amazing information over here…Ready-Made Logo
This is an excellent site, I like your blog and your writing style, thanks for sharing your knowledge, keep up the good work. Cheap Essay Writing
I really like this website. This is a great resources, provide you with your free distribution. It gives in depth information. Thank you for this valuable information. Thesis Paper Writing
Hi, I like your blog very much! It is found me so interesting and informative. Thanks very much for sharing this amazing information over here. Essay Writing Services
Post a Comment