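Requirement: remove duplicate rows from a users table, where rows count as duplicates if they share the same name/email/card_num fields.
Approach: a GROUP BY query can return the "deduplicated" data; store that data in a temporary table, then copy the temporary table's contents into the target table.
Pitfall and workaround: a GROUP BY query can only return some of the columns (the ones being deduplicated), so it cannot fetch complete rows in one pass. However, the max function can pick one id from each group in the GROUP BY result, and the full records can then be queried by that set of ids.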
Test approach
- Query the deduplicated data
select max(id) as id,name,email,card_num FROM users GROUP BY name,email,card_num;
- Get the set of ids from the deduplicated data
SELECT ID from (SELECT max(id) as id,name,email,card_num FROM users GROUP BY name,email,card_num) as T;
- Use the set of ids from the deduplicated data to fetch the full records from the source table
SELECT * from users where id in (SELECT ID from (SELECT max(id) as id,name,email,card_num FROM users GROUP BY name,email,card_num) as T);
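The three test queries above can be tried on a small data set. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the sample rows are hypothetical, only the table and column names come from the queries above, and the subquery is shortened to select max(id) alone, which should return the same ids.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, card_num TEXT)"
)
conn.executemany(
    "INSERT INTO users (id, name, email, card_num) VALUES (?, ?, ?, ?)",
    [
        (1, "ann", "ann@example.com", "1001"),
        (2, "ann", "ann@example.com", "1001"),  # duplicate of id 1
        (3, "bob", "bob@example.com", "2002"),
        (4, "ann", "ann@example.com", "9999"),  # same name/email, different card_num
    ],
)

# Keep the row with the largest id in each (name, email, card_num) group.
kept_rows = conn.execute(
    """
    SELECT * FROM users
    WHERE id IN (SELECT max(id) FROM users GROUP BY name, email, card_num)
    ORDER BY id
    """
).fetchall()
kept_ids = [row[0] for row in kept_rows]
print(kept_ids)
```

Row 4 survives because it differs in card_num, so it forms its own group; only rows that match on all three fields are collapsed.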
Actual method
- Use the set of ids from the deduplicated data to fetch the full records from the source table, and store those records in a temporary table
create TEMP table tmp_data as SELECT * from users where id in (SELECT ID from (SELECT max(id) as id,name,email,card_num FROM users GROUP BY name,email,card_num) as T);
- Insert the temporary table's rows into the target table, and that's it
insert into users_copy1 select * from tmp_data;
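The two statements above can be sketched end to end as follows, again with Python's sqlite3 module and an in-memory database. This assumes that users_copy1 already exists with the same columns as users (the insert ... select * form needs that); the table definitions and sample rows here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, card_num TEXT);
    -- Assumption: the target table already exists with the same columns as users.
    CREATE TABLE users_copy1 (id INTEGER PRIMARY KEY, name TEXT, email TEXT, card_num TEXT);

    -- Hypothetical sample rows: ids 1 and 2 are duplicates of each other.
    INSERT INTO users VALUES
        (1, 'ann', 'ann@example.com', '1001'),
        (2, 'ann', 'ann@example.com', '1001'),
        (3, 'bob', 'bob@example.com', '2002');

    -- Step 1: store one full row per group in a temporary table.
    CREATE TEMP TABLE tmp_data AS
        SELECT * FROM users
        WHERE id IN (SELECT max(id) FROM users GROUP BY name, email, card_num);

    -- Step 2: copy the temporary table into the target table.
    INSERT INTO users_copy1 SELECT * FROM tmp_data;
    """
)
copied = conn.execute("SELECT id, name FROM users_copy1 ORDER BY id").fetchall()
print(copied)
```

The temporary table disappears when the connection closes, so it only serves as a staging area between the two steps.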
Verification
- Check that the result has the same row count as the deduplicated data from the first query
select count(*) from users_copy1;
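The count comparison can also be automated. The sketch below, again in Python with sqlite3, uses a hypothetical helper named check_dedup that compares the number of distinct (name, email, card_num) groups in users with the row count of users_copy1, and additionally looks for groups that still occur more than once in the copy. The sample tables are made up for illustration.

```python
import sqlite3

def check_dedup(conn):
    """Return (expected, actual, leftover_dupes) for users vs. users_copy1."""
    # Number of distinct groups in the source table.
    expected = conn.execute(
        "SELECT count(*) FROM (SELECT 1 FROM users GROUP BY name, email, card_num) AS g"
    ).fetchone()[0]
    # Number of rows that ended up in the target table.
    actual = conn.execute("SELECT count(*) FROM users_copy1").fetchone()[0]
    # Number of groups that still appear more than once in the target table.
    leftover_dupes = conn.execute(
        """
        SELECT count(*) FROM (
            SELECT 1 FROM users_copy1
            GROUP BY name, email, card_num
            HAVING count(*) > 1
        ) AS d
        """
    ).fetchone()[0]
    return expected, actual, leftover_dupes

# Hypothetical sample data: users has one duplicate pair, users_copy1 is clean.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, card_num TEXT);
    CREATE TABLE users_copy1 (id INTEGER PRIMARY KEY, name TEXT, email TEXT, card_num TEXT);
    INSERT INTO users VALUES
        (1, 'ann', 'ann@example.com', '1001'),
        (2, 'ann', 'ann@example.com', '1001'),
        (3, 'bob', 'bob@example.com', '2002');
    INSERT INTO users_copy1 VALUES
        (2, 'ann', 'ann@example.com', '1001'),
        (3, 'bob', 'bob@example.com', '2002');
    """
)
result = check_dedup(conn)
print(result)
```

The deduplication looks correct when the first two numbers match and the third is zero.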
Test result: out of about 14,000 rows, 2,300 were duplicates; the actual run took 0.7 s, which basically meets the current requirement.