Thursday, February 22, 2007

Dealing with duplicates

A this time I'd like to show some techniques to deal with duplicates
CREATE TABLE TableWithDups
(
row_id int identity(1,1),
col1 int,
col2 int
)
INSERT INTO TableWithDups (col1 ,col2 ) VALUES (1,10)
INSERT INTO TableWithDups (col1 ,col2 ) VALUES (1,10)
INSERT INTO TableWithDups (col1 ,col2 ) VALUES (4,2)
INSERT INTO TableWithDups (col1 ,col2 ) VALUES (3,3)
INSERT INTO TableWithDups (col1 ,col2 ) VALUES (3,3)
INSERT INTO TableWithDups (col1 ,col2 ) VALUES (9,20)

PRINT 'Duplicates in Table'
SELECT * FROM TableWithDups
WHERE EXISTS
(SELECT * FROM TableWithDups TWD
WHERE TWD.row_id <> TableWithDups.row_id
AND TWD.col1 = TableWithDups.col1
AND TWD.col2 = TableWithDups.col2)

PRINT 'Duplicates to be Deleted'
SELECT * FROM TableWithDups
WHERE EXISTS
(SELECT * FROM TableWithDups TWD
WHERE TWD.row_id < TableWithDups.row_id
AND TWD.col1 = TableWithDups.col1
AND TWD.col2 = TableWithDups.col2 )


PRINT 'Executig Deletion of Duplicates '
DELETE FROM TableWithDups
WHERE EXISTS
(SELECT * FROM TableWithDups TWD
WHERE TWD.row_id < TableWithDups.row_id
AND TWD.col1 = TableWithDups.col1
AND TWD.col2 = TableWithDups.col2 )

SELECT * FROM TableWithDups

If you are using SQL Server 2005 take a look at the below script
WITH Duplicates
AS
( SELECT *, ROW_NUMBER()OVER(PARTITION BY col1 , col2 ORDER BY col1 , col2 ) AS cnt FROM TableWithDups)
DELETE FROM Duplicates WHERE cnt> 1;

Recently my good friend Doron Farber wrote great article about how efficiently remove duplicates on huge tables based on real world example.
http://www.dfarber.com/computer-consulting-blog/2011/12/26/remove-duplicate-records-in-sql.aspx

19 comments:

Unknown said...

Hi Uri,

Surely the true test is when there is no identifier on the table (i.e. a badly designed table)?

The SQL Server 2005 method works for this but in SQL 2000 the best I've managed to come up with is using a temp table and SELECT DISTINCT - which gets a lot more complicated if the duplicate rows have foreign key references :-/

Any thoughts?

--CELKO-- said...

By definition this is not a table at all -- it has no key! What this usually models is a commodity, we need a column for the quantity, which can be an absolute scale (count) or a ratio scale (units of measure).
CREATE TABLE Inventory
(item_name VARCHAR(25) NOT NULL PRIMARY KEY,
item_qty INTEGER NOT NULL
CHECK (item_qty > 0));

INSERT INTO Inventory VALUES ('No. 5 Machine Bolts', 123);
INSERT INTO Inventory VALUES ('No. 5 Hex Nuts', 127);
etc.

This sort of design comes from mimicking a paper form or other physical data source directly in SQL. It should never be allowed inthe first place.

aiya said...
This comment has been removed by a blog administrator.
Knox Karter said...

These articles are fantastic; the information you show us is interesting for everybody and really good written. It’s just great!!

Agriculture Logo

Fabian Smith said...

Hi, I like your blog very much! It is found me so interesting and informative... Thanks very much for sharing this amazing information over here…Ready-Made Logo

Unknown said...

This is an excellent site, I like your blog and your writing style, thanks for sharing your knowledge, keep up the good work. Cheap Essay Writing

Unknown said...
This comment has been removed by the author.
Unknown said...

Hi, I like your blog very much! It is found me so interesting and informative. Thanks very much for sharing this amazing information over here. Essay Writing Services

Unknown said...

I really like this website. This is a great resources, provide you with your free distribution. It gives in depth information. Thank you for this valuable information. Paper Writing Services
Thesis Paper Writing
Research Paper Writing
Letter Writing Services

IT Services said...

Appreciation for nice Updates.I like your blog very much! It is found me so interesting and informative.Database Solutions

Doron said...

Hello Uri,

Your code is great regarding removing duplicate records. Here is an article that covers that topic in depth:

http://www.dfarber.com/computer-consulting-blog/2011/12/26/remove-duplicate-records-in-sql.aspx

Regards,
Doron

Http://www.dFarber.Com
Remote DBA:
http://www.dfarber.com/microsoft-solutions/mssql-server/ms-sql-remote-dba-services.aspx

Doron said...

Hi Uri,

Here is an article to cover this topic at:

Remove duplicate records in sql

Regards,

Doron
The Farber Consulting Group, Inc.

Unknown said...

such a nice work! Happy to find this great work keep it up. Thanks for shearing.
Search Strings

Unknown said...

This was a great and interesting article to read. I found many interesting things from this site.
SEO Company

Unknown said...

Good work by the blog writter! such a nice and informative article keep shearing more articles.
Google Adwords

Unknown said...

Have a good day! Thanks for sharing. :0
Cheap Essay

Unknown said...

This was a great and interesting article to read. I found many interesting things from this site.
Social Media

Unknown said...

Good to see this blog really informative.
Conversion Optimization

Herry Johnson said...

ARINET DBA Services is a Chicago, IL based organization established in 2013. Uniting more than 100+ years of combined involvement in giving quality Oracle database Support administrations to American organizations, we've given the diverse options about remote dba experts, remote dba organizations, remote dba reinforce, remote database, prophet remote, dba remote, database association, prophet dba reinforce, Oracle sponsorship and Oracle Consultants.