Tuesday, July 16, 2013

Remove duplicated rows

Problem: 
Assume you have a table with structure as follows.
    Comment [id, app_id, reviewer_id,review_id ....] where id is primary key.
And you have to remove duplicated records with the same (app_id, and review_id).
Here are some solutions to fulfill your requirement.

Solution 1: [For small size of table]

 1) Find out which records are duplicated.
    SELECT id FROM (SELECT COUNT(*) AS num, id, app_id, review_id FROM Comment GROUP BY app_id,review_id) AS result WHERE result.num>2;
 NOTE: the above sql statement will list one of duplicated id;

2) Discard duplicated records
   DELETE FROM Comment WHERE id in (SELECT id FROM (SELECT COUNT(*) AS num, id, app_id, review_id FROM Comment GROUP BY app_id,review_id) AS result WHERE result.num>2)


Solution 2: [Once for all]

    ALTER IGNORE TABLE Comment ADD UNIQUE INDEX idx_name (app_id,review_id);

This solution works for me so that I did not examine its limitation in some extreme cases. 

The credit of this solution goes to http://stackoverflow.com/a/3312066 . 

No comments:

Post a Comment