Mysql中查找并删除重复数据的方法
Mysql中查找并删除重复数据的方法
(一)单个字段1、查找表中多余的重复记录,根据(question_title)字段来判断
代码如下 | |
select * from questions where question_title in (select question_title from peoplegroup by question_title having count(question_title) > 1) |
2、删除表中多余的重复记录,根据(question_title)字段来判断,只留有一个记录
代码如下 | |
delete from questions where peopleId in (select peopleId from people group by peopleId having count(question_title) > 1) and min(id) not in (select question_id from questions group by question_title having count(question_title)>1) |
(二)多个字段
删除表中多余的重复记录(多个字段),只留有rowid最小的记录
代码如下 | |
DELETE FROM questions WHERE (questions_title,questions_scope) IN (SELECT questions_title,questions_scope FROM questions GROUP BY questions_title,questions_scope HAVING COUNT(*) > 1) AND question_id NOT IN (SELECT MIN(question_id) FROM questions GROUP BY questions_scope,questions_title HAVING COUNT(*)>1) |
用上述语句无法删除,创建了临时表才删的,求各位达人解释一下。
代码如下 | |
CREATE TABLE tmp AS SELECT question_id FROM questions WHERE (questions_title,questions_scope) IN (SELECT questions_title,questions_scope FROM questions GROUP BY questions_title,questions_scope HAVING COUNT(*) > 1) AND question_id NOT IN (SELECT MIN(question_id) FROM questions GROUP BY questions_scope,questions_title HAVING COUNT(*)>1); DELETE FROM questions WHERE question_id IN (SELECT question_id FROM tmp); DROP TABLE tmp; |
(三) 存储过程
代码如下 | |
declare @max integer,@id integer declare cur_rows cursor local for select 主字段,count(*) from 表名 group by 主字段 having count(*) >; 1 open cur_rows fetch cur_rows into @id,@max while @@fetch_status=0 begin select @max = @max -1 set rowcount @max delete from 表名 where 主字段 = @id fetch cur_rows into @id,@max end close cur_rows set rowcount 0 |
例,
数据库版本 Server version: 5.1.41-community-log MySQL Community Server (GPL)
例1,表中有主键(可唯一标识的字段),且该字段为数字类型
例1测试数据
代码如下 | |
/* 表结构 */ DROP TABLE IF EXISTS `t1`; CREATE TABLE IF NOT EXISTS `t1`( `id` INT(1) NOT NULL AUTO_INCREMENT, `name` VARCHAR(20) NOT NULL, `add` VARCHAR(20) NOT NULL, PRIMARY KEY(`id`) )Engine=InnoDB; /* 插入测试数据 */ INSERT INTO `t1`(`name`,`add`) VALUES ('abc',"123"), ('abc',"123"), ('abc',"321"), ('abc',"123"), ('xzy',"123"), ('xzy',"456"), ('xzy',"456"), ('xzy',"456"), ('xzy',"789"), ('xzy',"987"), ('xzy',"789"), ('ijk',"147"), ('ijk',"147"), ('ijk',"852"), ('opq',"852"), ('opq',"963"), ('opq',"741"), ('tpk',"741"), ('tpk',"963"), ('tpk',"963"), ('wer',"546"), ('wer',"546"), ('once',"546"); SELECT * FROM `t1`; +----+------+-----+ id | name | add | +----+------+-----+ 1 | abc | 123 | 2 | abc | 123 | 3 | abc | 321 | 4 | abc | 123 | 5 | xzy | 123 | 6 | xzy | 456 | 7 | xzy | 456 | 8 | xzy | 456 | 9 | xzy | 789 | 10 | xzy | 987 | 11 | xzy | 789 | 12 | ijk | 147 | 13 | ijk | 147 | 14 | ijk | 852 | 15 | opq | 852 | 16 | opq | 963 | 17 | opq | 741 | 18 | tpk | 741 | 19 | tpk | 963 | 20 | tpk | 963 | 21 | wer | 546 | 22 | wer | 546 | 23 | once | 546 | +----+------+-----+ rows in set (0.00 sec) |
查找id最小的重复数据(只查找id字段)
代码如下 | |
/* 查找id最小的重复数据(只查找id字段) */ SELECT DISTINCT MIN(`id`) AS `id` FROM `t1` GROUP BY `name`,`add` HAVING COUNT(1) > 1; +------+ id | +------+ 1 | 12 | 19 | 21 | 6 | 9 | +------+ rows in set (0.00 sec) |
查找所有重复数据
代码如下 | |
/* 查找所有重复数据 */ SELECT `t1`.* FROM `t1`,( SELECT `name`,`add` FROM `t1` GROUP BY `name`,`add` HAVING COUNT(1) > 1 ) AS `t2` WHERE `t1`.`name` = `t2`.`name` AND `t1`.`add` = `t2`.`add`; +----+------+-----+ |
id | name | add |
+----+------+-----+
1 | abc | 123 |
2 | abc | 123 |
4 | abc | www.111cn.net|
6 | xzy | 456 |
7 | xzy | 456 |
8 | xzy | 456 |
9 | xzy | 789 |
11 | xzy | 789 |
12 | ijk | 147 |
13 | ijk | 147 |
19 | tpk | 963 |
20 | tpk | 963 |
21 | wer | 546 |
22 | wer | 546 |
+----+------+-----+
rows in set (0.00 sec)