Thursday, February 22, 2007

Dealing with duplicates

A this time I'd like to show some techniques to deal with duplicates
CREATE TABLE TableWithDups
(
row_id int identity(1,1),
col1 int,
col2 int
)
INSERT INTO TableWithDups (col1 ,col2 ) VALUES (1,10)
INSERT INTO TableWithDups (col1 ,col2 ) VALUES (1,10)
INSERT INTO TableWithDups (col1 ,col2 ) VALUES (4,2)
INSERT INTO TableWithDups (col1 ,col2 ) VALUES (3,3)
INSERT INTO TableWithDups (col1 ,col2 ) VALUES (3,3)
INSERT INTO TableWithDups (col1 ,col2 ) VALUES (9,20)

PRINT 'Duplicates in Table'
SELECT * FROM TableWithDups
WHERE EXISTS
(SELECT * FROM TableWithDups TWD
WHERE TWD.row_id <> TableWithDups.row_id
AND TWD.col1 = TableWithDups.col1
AND TWD.col2 = TableWithDups.col2)

PRINT 'Duplicates to be Deleted'
SELECT * FROM TableWithDups
WHERE EXISTS
(SELECT * FROM TableWithDups TWD
WHERE TWD.row_id < TableWithDups.row_id
AND TWD.col1 = TableWithDups.col1
AND TWD.col2 = TableWithDups.col2 )


PRINT 'Executig Deletion of Duplicates '
DELETE FROM TableWithDups
WHERE EXISTS
(SELECT * FROM TableWithDups TWD
WHERE TWD.row_id < TableWithDups.row_id
AND TWD.col1 = TableWithDups.col1
AND TWD.col2 = TableWithDups.col2 )

SELECT * FROM TableWithDups

If you are using SQL Server 2005 take a look at the below script
WITH Duplicates
AS
( SELECT *, ROW_NUMBER()OVER(PARTITION BY col1 , col2 ORDER BY col1 , col2 ) AS cnt FROM TableWithDups)
DELETE FROM Duplicates WHERE cnt> 1;

Recently my good friend Doron Farber wrote great article about how efficiently remove duplicates on huge tables based on real world example.
http://www.dfarber.com/computer-consulting-blog/2011/12/26/remove-duplicate-records-in-sql.aspx

24 comments:

JustinH said...

Hi Uri,

Surely the true test is when there is no identifier on the table (i.e. a badly designed table)?

The SQL Server 2005 method works for this but in SQL 2000 the best I've managed to come up with is using a temp table and SELECT DISTINCT - which gets a lot more complicated if the duplicate rows have foreign key references :-/

Any thoughts?

--CELKO-- permission to use my posts on websites that make a profit from advertising is denied without my written consent said...

By definition this is not a table at all -- it has no key! What this usually models is a commodity, we need a column for the quantity, which can be an absolute scale (count) or a ratio scale (units of measure).
CREATE TABLE Inventory
(item_name VARCHAR(25) NOT NULL PRIMARY KEY,
item_qty INTEGER NOT NULL
CHECK (item_qty > 0));

INSERT INTO Inventory VALUES ('No. 5 Machine Bolts', 123);
INSERT INTO Inventory VALUES ('No. 5 Hex Nuts', 127);
etc.

This sort of design comes from mimicking a paper form or other physical data source directly in SQL. It should never be allowed inthe first place.

aiya said...
This comment has been removed by a blog administrator.
Adim said...

These articles are fantastic; the information you show us is interesting for everybody and really good written. It’s just great!!

Agriculture Logo

Admin said...

Hi, I like your blog very much! It is found me so interesting and informative... Thanks very much for sharing this amazing information over here…Ready-Made Logo

Felix Smith said...

This is an excellent site, I like your blog and your writing style, thanks for sharing your knowledge, keep up the good work. Cheap Essay Writing

Felix Smith said...
This comment has been removed by the author.
Felix Smith said...

Hi, I like your blog very much! It is found me so interesting and informative. Thanks very much for sharing this amazing information over here. Essay Writing Services

Felix Smith said...

I really like this website. This is a great resources, provide you with your free distribution. It gives in depth information. Thank you for this valuable information. Paper Writing Services
Thesis Paper Writing
Research Paper Writing
Letter Writing Services

IT Services said...

Appreciation for nice Updates.I like your blog very much! It is found me so interesting and informative.Database Solutions

Doron said...

Hello Uri,

Your code is great regarding removing duplicate records. Here is an article that covers that topic in depth:

http://www.dfarber.com/computer-consulting-blog/2011/12/26/remove-duplicate-records-in-sql.aspx

Regards,
Doron

Http://www.dFarber.Com
Remote DBA:
http://www.dfarber.com/microsoft-solutions/mssql-server/ms-sql-remote-dba-services.aspx

Doron said...

Hi Uri,

Here is an article to cover this topic at:

Remove duplicate records in sql

Regards,

Doron
The Farber Consulting Group, Inc.

Arabella Simson said...


I wanted to thank you for this great read!! I definitely enjoyed every little bit of it
Airport Taxi

jacqueline fernandez said...

such a nice work! Happy to find this great work keep it up. Thanks for shearing.
Search Strings

jacqueline fernandez said...

This was a great and interesting article to read. I found many interesting things from this site.
SEO Company

jacqueline fernandez said...

Good work by the blog writter! such a nice and informative article keep shearing more articles.
Google Adwords

sophia martin said...

Have a good day! Thanks for sharing. :0
Cheap Essay

jacqueline fernandez said...

This was a great and interesting article to read. I found many interesting things from this site.
Social Media

jacqueline fernandez said...

Good to see this blog really informative.
Conversion Optimization

sophia martin said...

Such a nice way to discuss the blog! fantastic work.
Cheap Essay

Frozen Juegos said...

Any way I'll be subscribing to your feed and I hope you post again soon.
Signature:
Jugar juego de frozen en lĂ­nea gratis, los nuevos de princesa de Disney juegos frozen - la princesa encantadora y linda. Divertirse frozen!

Facebook Descargar said...

I am very happy to read this. Appreciate your sharing.
Signature:
download descarga facebook gratis para Android celular and download free descargar facebook gratis and descargar facebook gratis , descarga facebook

Whatsapp Descargar said...

Every successful person must have a failure. Do not be afraid to fail because failure is a part of success.
Signature:
download free descargar whatsapp and download baixar whatsapp online and descargar whatsapp gratis , baixar whatsapp gratis

Croose Hackle said...

ARINET DBA Services is a Chicago, IL based organization established in 2013. Uniting more than 100+ years of combined involvement in giving quality Oracle database Support administrations to American organizations, we've given the diverse options about remote dba experts, remote dba organizations, remote dba reinforce, remote database, prophet remote, dba remote, database association, prophet dba reinforce, Oracle sponsorship and Oracle Consultants.