mysql刪除數據時為什么不使用delete

有些表的數據量增長很快,對應sql掃描了很多無效數據,導致sql慢了下來,通過確認之后,這些大表都是一些流水、記錄、日志類型數據,只需要保留1到3個月,此時需要對表做數據清理實現瘦身。

mysql刪除數據時為什么不使用delete

這篇文章我會從InnoDB存儲空間分布,delete對性能的影響,以及優化建議方面解釋為什么不建議delete刪除數據。

InnoDB存儲架構

mysql刪除數據時為什么不使用delete

從這張圖可以看到,InnoDB存儲結構主要包括兩部分:邏輯存儲結構和物理存儲結構。

邏輯上是由表空間tablespace —> 段segment或者inode —> 區Extent ——>數據頁Page構成,Innodb邏輯管理單位是segment,空間分配的最小單位是extent,每個segment都會從表空間FREE_PAGE中分配32個page,當這32個page不夠用時,會按照以下原則進行擴展:如果當前小于1個extent,則擴展到1個extent;當表空間小于32MB時,每次擴展一個extent;表空間大于32MB,每次擴展4個extent。

物理上主要由系統用戶數據文件,日志文件組成,數據文件主要存儲mysql字典數據和用戶數據,日志文件記錄的是data page的變更記錄,用于MySQL Crash時的恢復。

Innodb表空間

InnoDB存儲包括三類表空間:系統表空間,用戶表空間,Undo表空間。

系統表空間: 主要存儲MySQL內部的數據字典數據,如information_schema下的數據。

用戶表空間: 當開啟innodb_file_per_table=1時,數據表從系統表空間獨立出來存儲在以table_name.ibd命令的數據文件中,結構信息存儲在table_name.frm文件中。

Undo表空間: 存儲Undo信息,如快照一致讀和flashback都是利用undo信息。

從MySQL 8.0開始允許用戶自定義表空間,具體語法如下:

CREATE?TABLESPACE?tablespace_name ????ADD?DATAFILE?'file_name'???????????????#數據文件名 ????USE?LOGFILE?GROUP?logfile_group????????#自定義日志文件組,一般每組2個logfile。 ????[EXTENT_SIZE?[=]?extent_size]??????????#區大小 ????[INITIAL_SIZE?[=]?initial_size]????????#初始化大小? ????[AUTOEXTEND_SIZE?[=]?autoextend_size]??#自動擴寬尺寸 ????[MAX_SIZE?[=]?max_size]????????????????#單個文件最大size,最大是32G。 ????[NODEGROUP?[=]?nodegroup_id]???????????#節點組 ????[WAIT] ????[COMMENT?[=]?comment_text] ????ENGINE?[=]?engine_name

這樣的好處是可以做到數據的冷熱分離,分別用HDD和SSD來存儲,既能實現數據的高效訪問,又能節約成本,比如可以添加兩塊500G硬盤,經過創建卷組vg,劃分邏輯卷lv,創建數據目錄并mount相應的lv,假設劃分的兩個目錄分別是/hot_data 和 /cold_data。

這樣就可以將核心的業務表如用戶表,訂單表存儲在高性能SSD盤上,一些日志,流水表存儲在普通的HDD上,主要的操作步驟如下:

#創建熱數據表空間 create?tablespace?tbs_data_hot?add?datafile?'/hot_data/tbs_data_hot01.dbf'?max_size?20G; #創建核心業務表存儲在熱數據表空間 create?table?booking(id?bigint?not?null?primary?key?auto_increment,?……?)?tablespace?tbs_data_hot; #創建冷數據表空間 create?tablespace?tbs_data_cold?add?datafile?'/hot_data/tbs_data_cold01.dbf'?max_size?20G; #創建日志,流水,備份類的表存儲在冷數據表空間 create?table?payment_log(id?bigint?not?null?primary?key?auto_increment,?……?)?tablespace?tbs_data_cold; #可以移動表到另一個表空間 alter?table?payment_log?tablespace?tbs_data_hot;

Inndob存儲分布

創建空表查看空間變化

mysql>?create?table?user(id?bigint?not?null?primary?key?auto_increment,? ????->?name?varchar(20)?not?null?default?''?comment?'姓名',? ????->?age?tinyint?not?null?default?0?comment?'age',? ????->?gender?char(1)?not?null?default?'M'??comment?'性別', ????->?phone?varchar(16)?not?null?default?''?comment?'手機號', ????->?create_time?datetime?NOT?NULL?DEFAULT?CURRENT_TIMESTAMP?COMMENT?'創建時間', ????->?update_time?datetime?NOT?NULL?DEFAULT?CURRENT_TIMESTAMP?ON?UPDATE?CURRENT_TIMESTAMP?COMMENT?'修改時間' ????->?)?engine?=?InnoDB?DEFAULT?CHARSET=utf8mb4?COMMENT?'用戶信息表'; Query?OK,?0?rows?affected?(0.26?sec)
#?ls?-lh?user1.ibd? -rw-r-----?1?mysql?mysql?96K?Nov??6?12:48?user.ibd

設置參數innodb_file_per_table=1時,創建表時會自動創建一個segment,同時分配一個extent,包含32個data page的來存儲數據,這樣創建的空表默認大小就是96KB,extent使用完之后會申請64個連接頁,這樣對于一些小表,或者undo segment,可以在開始時申請較少的空間,節省磁盤容量的開銷。

#?python2?py_innodb_page_info.py?-v?/data2/mysql/test/user.ibd page?offset?00000000,?page?type?<file> page?offset?00000001,?page?type?<insert> page?offset?00000002,?page?type?<file> page?offset?00000003,?page?type?<b-tree>,?page?level? page?offset?00000000,?page?type?<freshly> page?offset?00000000,?page?type?<freshly> Total?number?of?page:?6:??????#總共分配的頁數 Freshly?Allocated?Page:?2?????#可用的數據頁 Insert?Buffer?Bitmap:?1???????#插入緩沖頁 File?Space?Header:?1??????????#文件空間頭 B-tree?Node:?1????????????????#數據頁 File?Segment?inode:?1?????????#文件端inonde,如果是在ibdata1.ibd上會有多個inode。</freshly></freshly></b-tree></file></insert></file>

插入數據后的空間變化

mysql&gt;?DELIMITER?$$ mysql&gt;?CREATE?PROCEDURE?insert_user_data(num?INTEGER)? ????-&gt;?BEGIN ????-&gt;?????DECLARE?v_i?int?unsigned?DEFAULT?0; ????-&gt;?set?autocommit=?0; ????-&gt;?WHILE?v_i?????insert?into?user(`name`,?age,?gender,?phone)?values?(CONCAT('lyn',v_i),?mod(v_i,120),?'M',?CONCAT('152',ROUND(RAND(1)*100000000))); ????-&gt;??SET?v_i?=?v_i+1; ????-&gt;?END?WHILE; ????-&gt;?commit; ????-&gt;?END?$$ Query?OK,?0?rows?affected?(0.01?sec) mysql&gt;?DELIMITER?;  #插入10w數據 mysql&gt;?call?insert_user_data(100000); Query?OK,?0?rows?affected?(6.69?sec)
#?ls?-lh?user.ibd -rw-r-----?1?mysql?mysql?14M?Nov?6?10:58?/data2/mysql/test/user.ibd  #?python2?py_innodb_page_info.py?-v?/data2/mysql/test/user.ibd page?offset?00000000,?page?type?<file> page?offset?00000001,?page?type?<insert> page?offset?00000002,?page?type?<file> page?offset?00000003,?page?type?<b-tree>,?page?level????#增加了一個非葉子節點,樹的高度從1變為2. ........................................................ page?offset?00000000,?page?type?<freshly> Total?number?of?page:?896: Freshly?Allocated?Page:?493 Insert?Buffer?Bitmap:?1 File?Space?Header:?1 B-tree?Node:?400 File?Segment?inode:?1</freshly></b-tree></file></insert></file>

delete數據后的空間變化

mysql&gt;?select?min(id),max(id),count(*)?from?user; +---------+---------+----------+ |?min(id)?|?max(id)?|?count(*)?| +---------+---------+----------+ |???????1?|??100000?|???100000?| +---------+---------+----------+ 1?row?in?set?(0.05?sec) #刪除50000條數據,理論上空間應該從14MB變長7MB左右。 mysql&gt;?delete?from?user?limit?50000; Query?OK,?50000?rows?affected?(0.25?sec)  #數據文件大小依然是14MB,沒有縮小。 #?ls?-lh?/data2/mysql/test/user1.ibd? -rw-r-----?1?mysql?mysql?14M?Nov??6?13:22?/data2/mysql/test/user.ibd  #數據頁沒有被回收。 #?python2?py_innodb_page_info.py?-v?/data2/mysql/test/user.ibd page?offset?00000000,?page?type?<file> page?offset?00000001,?page?type?<insert> page?offset?00000002,?page?type?<file> page?offset?00000003,?page?type?<b-tree>,?page?level? ........................................................ page?offset?00000000,?page?type?<freshly> Total?number?of?page:?896: Freshly?Allocated?Page:?493 Insert?Buffer?Bitmap:?1 File?Space?Header:?1 B-tree?Node:?400 File?Segment?inode:?1 #在MySQL內部是標記刪除,</freshly></b-tree></file></insert></file>
mysql&gt;?use?information_schema;  Database?changed mysql&gt;?SELECT?A.SPACE?AS?TBL_SPACEID,?A.TABLE_ID,?A.NAME?AS?TABLE_NAME,?FILE_FORMAT,?ROW_FORMAT,?SPACE_TYPE,??B.INDEX_ID?,?B.NAME?AS?INDEX_NAME,?PAGE_NO,?B.TYPE?AS?INDEX_TYPE?FROM?INNODB_SYS_TABLES?A?LEFT?JOIN?INNODB_SYS_INDEXES?B?ON?A.TABLE_ID?=B.TABLE_ID?WHERE?A.NAME?=?'test/user1'; +-------------+----------+------------+-------------+------------+------------+----------+------------+---------+------------+ |?TBL_SPACEID?|?TABLE_ID?|?TABLE_NAME?|?FILE_FORMAT?|?ROW_FORMAT?|?SPACE_TYPE?|?INDEX_ID?|?INDEX_NAME?|?PAGE_NO?|?INDEX_TYPE?| +-------------+----------+------------+-------------+------------+------------+----------+------------+---------+------------+ |????????1283?|?????1207?|?test/user?|?Barracuda???|?Dynamic????|?Single?????|?????2236?|?PRIMARY????|???????3?|??????????3?| +-------------+----------+------------+-------------+------------+------------+----------+------------+---------+------------+ 1?row?in?set?(0.01?sec)  PAGE_NO?=?3?標識B-tree的root?page是3號頁,INDEX_TYPE?=?3是聚集索引。?INDEX_TYPE取值如下: 0?=?nonunique?secondary?index;? 1?=?automatically?generated?clustered?index?(GEN_CLUST_INDEX);? 2?=?unique?nonclustered?index;? 3?=?clustered?index;? 32?=?full-text?index; #收縮空間再后進行觀察

MySQL內部不會真正刪除空間,而且做標記刪除,即將delflag:N修改為delflag:Y,commit之后會會被purge進入刪除鏈表,如果下一次insert更大的記錄,delete之后的空間不會被重用,如果插入的記錄小于等于delete的記錄空會被重用,這塊內容可以通過知數堂的innblock工具進行分析。

Innodb中的碎片

碎片的產生

我們知道數據存儲在文件系統上的,總是不能100%利用分配給它的物理空間,刪除數據會在頁面上留下一些”空洞”,或者隨機寫入(聚集索引非線性增加)會導致頁分裂,頁分裂導致頁面的利用空間少于50%,另外對表進行增刪改會引起對應的二級索引值的隨機的增刪改,也會導致索引結構中的數據頁面上留下一些”空洞”,雖然這些空洞有可能會被重復利用,但終究會導致部分物理空間未被使用,也就是碎片。

同時,即便是設置了填充因子為100%,Innodb也會主動留下page頁面1/16的空間作為預留使用(An innodb_fill_factor setting of 100 leaves 1/16 of the space in clustered index pages free for future index growth)防止update帶來的行溢出。

mysql&gt;?select?table_schema, ????-&gt;????????table_name,ENGINE, ????-&gt;????????round(DATA_LENGTH/1024/1024+?INDEX_LENGTH/1024/1024)?total_mb,TABLE_ROWS, ????-&gt;????????round(DATA_LENGTH/1024/1024)?data_mb,?round(INDEX_LENGTH/1024/1024)?index_mb,?round(DATA_FREE/1024/1024)?free_mb,?round(DATA_FREE/DATA_LENGTH*100,2)?free_ratio ????-&gt;?from?information_schema.TABLES?where??TABLE_SCHEMA=?'test' ????-&gt;?and?TABLE_NAME=?'user'; +--------------+------------+--------+----------+------------+---------+----------+---------+------------+ |?table_schema?|?table_name?|?ENGINE?|?total_mb?|?TABLE_ROWS?|?data_mb?|?index_mb?|?free_mb?|?free_ratio?| +--------------+------------+--------+----------+------------+---------+----------+---------+------------+ |?test?????????|?user??????|?InnoDB?|????????4?|??????50000?|???????4?|????????0?|???????6?|?????149.42?| +--------------+------------+--------+----------+------------+---------+----------+---------+------------+ 1?row?in?set?(0.00?sec)

其中data_free是分配了未使用的字節數,并不能說明完全是碎片空間。

碎片的回收

對于InnoDB的表,可以通過以下命令來回收碎片,釋放空間,這個是隨機讀IO操作,會比較耗時,也會阻塞表上正常的DML運行,同時需要占用額外更多的磁盤空間,對于RDS來說,可能會導致磁盤空間瞬間爆滿,實例瞬間被鎖定,應用無法做DML操作,所以禁止在線上環境去執行。

#執行InnoDB的碎片回收 mysql&gt;?alter?table?user?engine=InnoDB; Query?OK,?0?rows?affected?(9.00?sec) Records:?0??Duplicates:?0??Warnings:?0  ##執行完之后,數據文件大小從14MB降低到10M。 #?ls?-lh?/data2/mysql/test/user1.ibd? -rw-r-----?1?mysql?mysql?10M?Nov?6?16:18?/data2/mysql/test/user.ibd
mysql&gt;?select?table_schema,???????? ????-&gt;table_name,ENGINE,???????? ????-&gt;round(DATA_LENGTH/1024/1024+?INDEX_LENGTH/1024/1024)?total_mb,TABLE_ROWS,???????? ????-&gt;round(DATA_LENGTH/1024/1024)?data_mb,? ????-&gt;round(INDEX_LENGTH/1024/1024)?index_mb,? ????-&gt;round(DATA_FREE/1024/1024)?free_mb,? ????-&gt;round(DATA_FREE/DATA_LENGTH*100,2)?free_ratio?from?information_schema.TABLES?where??TABLE_SCHEMA=?'test'?and?TABLE_NAME=?'user'; +--------------+------------+--------+----------+------------+---------+----------+---------+------------+ |?table_schema?|?table_name?|?ENGINE?|?total_mb?|?TABLE_ROWS?|?data_mb?|?index_mb?|?free_mb?|?free_ratio?| +--------------+------------+--------+----------+------------+---------+----------+---------+------------+ |?test?????????|?user??????|?InnoDB?|????????5?|??????50000?|???????5?|????????0?|???????2?|??????44.29?| +--------------+------------+--------+----------+------------+---------+----------+---------+------------+ 1?row?in?set?(0.00?sec)

delete對SQL的影響

未刪除前的SQL執行情況

#插入100W數據 mysql&gt;?call?insert_user_data(1000000); Query?OK,?0?rows?affected?(35.99?sec)  #添加相關索引 mysql&gt;?alter?table?user?add?index?idx_name(name),?add?index?idx_phone(phone); Query?OK,?0?rows?affected?(6.00?sec) Records:?0??Duplicates:?0??Warnings:?0  #表上索引統計信息 mysql&gt;?show?index?from?user; +-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ |?Table?|?Non_unique?|?Key_name??|?Seq_in_index?|?Column_name?|?Collation?|?Cardinality?|?Sub_part?|?Packed?|?Null?|?Index_type?|?Comment?|?Index_comment?| +-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ |?user??|??????????0?|?PRIMARY???|????????????1?|?id??????????|?A?????????|??????996757?|?????NULL?|?NULL???|??????|?BTREE??????|?????????|???????????????| |?user??|??????????1?|?idx_name??|????????????1?|?name????????|?A?????????|??????996757?|?????NULL?|?NULL???|??????|?BTREE??????|?????????|???????????????| |?user??|??????????1?|?idx_phone?|????????????1?|?phone???????|?A?????????|???????????2?|?????NULL?|?NULL???|??????|?BTREE??????|?????????|???????????????| +-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 3?rows?in?set?(0.00?sec)  #重置狀態變量計數 mysql&gt;?flush?status; Query?OK,?0?rows?affected?(0.00?sec)  #執行SQL語句 mysql&gt;?select?id,?age?,phone?from?user?where?name?like?'lyn12%'; +--------+-----+-------------+ |?id?????|?age?|?phone???????| +--------+-----+-------------+ |????124?|???3?|?15240540354?| |???1231?|??30?|?15240540354?| |??12301?|??60?|?15240540354?| ............................. |?129998?|??37?|?15240540354?| |?129999?|??38?|?15240540354?| |?130000?|??39?|?15240540354?| +--------+-----+-------------+ 11111?rows?in?set?(0.03?sec)  mysql&gt;?explain?select?id,?age?,phone?from?user?where?name?like?'lyn12%'; +----+-------------+-------+-------+---------------+----------+---------+------+-------+-----------------------+ |?id?|?select_type?|?table?|?type??|?possible_keys?|?key??????|?key_len?|?ref??|?rows??|?Extra?????????????????| +----+-------------+-------+-------+---------------+----------+---------+------+-------+-----------------------+ |??1?|?SIMPLE??????|?user??|?range?|?idx_name??????|?idx_name?|?82??????|?NULL?|?22226?|?Using?index?condition?| +----+-------------+-------+-------+---------------+----------+---------+------+-------+-----------------------+ 1?row?in?set?(0.00?sec)  #查看相關狀態呢變量 mysql&gt;?select?*?from?information_schema.session_status?where?variable_name?in('Last_query_cost','Handler_read_next','Innodb_pages_read','Innodb_data_reads','Innodb_pages_read'); +-------------------+----------------+ |?VARIABLE_NAME?????|?VARIABLE_VALUE?| +-------------------+----------------+ |?HANDLER_READ_NEXT?|?11111??????????|????#請求讀的行數 |?INNODB_DATA_READS?|?7868409????????|????#數據物理讀的總數 |?INNODB_PAGES_READ?|?7855239????????|????#邏輯讀的總數 |?LAST_QUERY_COST???|?10.499000??????|????#SQL語句的成本COST,主要包括IO_COST和CPU_COST。 +-------------------+----------------+ 4?rows?in?set?(0.00?sec)

刪除后的SQL執行情況

#刪除50w數據 mysql&gt;?delete?from?user?limit?500000; Query?OK,?500000?rows?affected?(3.70?sec)  #分析表統計信息 mysql&gt;?analyze?table?user; +-----------+---------+----------+----------+ |?Table?????|?Op??????|?Msg_type?|?Msg_text?| +-----------+---------+----------+----------+ |?test.user?|?analyze?|?status???|?OK???????| +-----------+---------+----------+----------+ 1?row?in?set?(0.01?sec)  #重置狀態變量計數 mysql&gt;?flush?status; Query?OK,?0?rows?affected?(0.01?sec)  mysql&gt;?select?id,?age?,phone?from?user?where?name?like?'lyn12%'; Empty?set?(0.05?sec)  mysql&gt;?explain?select?id,?age?,phone?from?user?where?name?like?'lyn12%'; +----+-------------+-------+-------+---------------+----------+---------+------+-------+-----------------------+ |?id?|?select_type?|?table?|?type??|?possible_keys?|?key??????|?key_len?|?ref??|?rows??|?Extra?????????????????| +----+-------------+-------+-------+---------------+----------+---------+------+-------+-----------------------+ |??1?|?SIMPLE??????|?user??|?range?|?idx_name??????|?idx_name?|?82??????|?NULL?|?22226?|?Using?index?condition?| +----+-------------+-------+-------+---------------+----------+---------+------+-------+-----------------------+ 1?row?in?set?(0.00?sec)  mysql&gt;?select?*?from?information_schema.session_status?where?variable_name?in('Last_query_cost','Handler_read_next','Innodb_pages_read','Innodb_data_reads','Innodb_pages_read'); +-------------------+----------------+ |?VARIABLE_NAME?????|?VARIABLE_VALUE?| +-------------------+----------------+ |?HANDLER_READ_NEXT?|?0??????????????| |?INNODB_DATA_READS?|?7868409????????| |?INNODB_PAGES_READ?|?7855239????????| |?LAST_QUERY_COST???|?10.499000??????| +-------------------+----------------+ 4?rows?in?set?(0.00?sec)

結果統計分析

操作 COST 物理讀次數 邏輯讀次數 掃描行數 返回行數 執行時間
初始化插入100W 10.499000 7868409 7855239 22226 11111 30ms
100W隨機刪除50W 10.499000 7868409 7855239 22226 0 50ms

這也說明對普通的大表,想要通過delete數據來對表進行瘦身是不現實的,所以在任何時候不要用delete去刪除數據,應該使用優雅的標記刪除。

delete優化建議

控制業務賬號權限

對于一個大的系統來說,需要根據業務特點去拆分子系統,每個子系統可以看做是一個service,例如美團APP,上面有很多服務,核心的服務有用戶服務user-service,搜索服務search-service,商品product-service,位置服務location-service,價格服務price-service等。每個服務對應一個數據庫,為該數據庫創建單獨賬號,同時只授予DML權限且沒有delete權限,同時禁止跨庫訪問。

#創建用戶數據庫并授權 create?database?mt_user?charset?utf8mb4; grant?USAGE,?SELECT,?INSERT,?UPDATE?ON?mt_user.*??to?'w_user'@'%'?identified?by?'t$W*g@gaHTGi123456'; flush?privileges;

delete改為標記刪除

在MySQL數據庫建模規范中有4個公共字段,基本上每個表必須有的,同時在create_time列要創建索引,有兩方面的好處:

  • 一些查詢業務場景都會有一個默認的時間段,比如7天或者一個月,都是通過create_time去過濾,走索引掃描更快。

  • 一些核心的業務表需要以T +1的方式抽取數據倉庫中,比如每天晚上00:30抽取前一天的數據,都是通過create_time過濾的。

`id`?bigint(20)?NOT?NULL?AUTO_INCREMENT?COMMENT?'主鍵id', `is_deleted`?tinyint(4)?NOT?NULL?DEFAULT?'0'?COMMENT?'是否邏輯刪除:0:未刪除,1:已刪除', `create_time`?timestamp?NOT?NULL?DEFAULT?CURRENT_TIMESTAMP?COMMENT?'創建時間', `update_time`?timestamp?NOT?NULL?DEFAULT?CURRENT_TIMESTAMP?ON?UPDATE?CURRENT_TIMESTAMP?COMMENT?'修改時間'  #有了刪除標記,業務接口的delete操作就可以轉換為update update?user?set?is_deleted?=?1?where?user_id?=?1213;  #查詢的時候需要帶上is_deleted過濾 select?id,?age?,phone?from?user?where?is_deleted?=?0?and?name?like?'lyn12%';

數據歸檔方式

通用數據歸檔方法

#1.?創建歸檔表,一般在原表名后面添加_bak。 CREATE?TABLE?`ota_order_bak`?( ??`id`?bigint(11)?NOT?NULL?AUTO_INCREMENT?COMMENT?'主鍵', ??`order_id`?varchar(255)?DEFAULT?NULL?COMMENT?'訂單id', ??`ota_id`?varchar(255)?DEFAULT?NULL?COMMENT?'ota', ??`check_in_date`?varchar(255)?DEFAULT?NULL?COMMENT?'入住日期', ??`check_out_date`?varchar(255)?DEFAULT?NULL?COMMENT?'離店日期', ??`hotel_id`?varchar(255)?DEFAULT?NULL?COMMENT?'酒店ID', ??`guest_name`?varchar(255)?DEFAULT?NULL?COMMENT?'顧客', ??`purcharse_time`?timestamp?NULL?DEFAULT?NULL?COMMENT?'購買時間', ??`create_time`?datetime?DEFAULT?NULL, ??`update_time`?datetime?DEFAULT?NULL, ??`create_user`?varchar(255)?DEFAULT?NULL, ??`update_user`?varchar(255)?DEFAULT?NULL, ??`status`?int(4)?DEFAULT?'1'?COMMENT?'狀態?:?1?正常?,?0?刪除', ??`hotel_name`?varchar(255)?DEFAULT?NULL, ??`price`?decimal(10,0)?DEFAULT?NULL, ??`remark`?longtext, ??PRIMARY?KEY?(`id`), ??KEY?`IDX_order_id`?(`order_id`)?USING?BTREE, ??KEY?`hotel_name`?(`hotel_name`)?USING?BTREE, ??KEY?`ota_id`?(`ota_id`)?USING?BTREE, ??KEY?`IDX_purcharse_time`?(`purcharse_time`)?USING?BTREE, ??KEY?`IDX_create_time`?(`create_time`)?USING?BTREE )?ENGINE=InnoDB?DEFAULT?CHARSET=utf8 PARTITION?BY?RANGE?(to_days(create_time))?(? PARTITION?p201808?VALUES?LESS?THAN?(to_days('2018-09-01')),? PARTITION?p201809?VALUES?LESS?THAN?(to_days('2018-10-01')),? PARTITION?p201810?VALUES?LESS?THAN?(to_days('2018-11-01')),? PARTITION?p201811?VALUES?LESS?THAN?(to_days('2018-12-01')),? PARTITION?p201812?VALUES?LESS?THAN?(to_days('2019-01-01')),? PARTITION?p201901?VALUES?LESS?THAN?(to_days('2019-02-01')),? PARTITION?p201902?VALUES?LESS?THAN?(to_days('2019-03-01')),? PARTITION?p201903?VALUES?LESS?THAN?(to_days('2019-04-01')),? PARTITION?p201904?VALUES?LESS?THAN?(to_days('2019-05-01')),? PARTITION?p201905?VALUES?LESS?THAN?(to_days('2019-06-01')),? PARTITION?p201906?VALUES?LESS?THAN?(to_days('2019-07-01')),? PARTITION?p201907?VALUES?LESS?THAN?(to_days('2019-08-01')),? PARTITION?p201908?VALUES?LESS?THAN?(to_days('2019-09-01')),? PARTITION?p201909?VALUES?LESS?THAN?(to_days('2019-10-01')),? PARTITION?p201910?VALUES?LESS?THAN?(to_days('2019-11-01')),? PARTITION?p201911?VALUES?LESS?THAN?(to_days('2019-12-01')),? PARTITION?p201912?VALUES?LESS?THAN?(to_days('2020-01-01')));  #2.?插入原表中無效的數據(需要跟開發同學確認數據保留范圍) create?table?tbl_p201808?as?select?*?from?ota_order?where?create_time?between?'2018-08-01?00:00:00'?and?'2018-08-31?23:59:59';  #3.?跟歸檔表分區做分區交換 alter?table?ota_order_bak?exchange?partition?p201808?with?table?tbl_p201808;?  #4.?刪除原表中已經規范的數據 delete?from?ota_order?where?create_time?between?'2018-08-01?00:00:00'?and?'2018-08-31?23:59:59'?limit?3000;

優化后的歸檔方式

#1.?創建中間表 CREATE?TABLE?`ota_order_2020`?(........)?ENGINE=InnoDB?DEFAULT?CHARSET=utf8 PARTITION?BY?RANGE?(to_days(create_time))?(? PARTITION?p201808?VALUES?LESS?THAN?(to_days('2018-09-01')),? PARTITION?p201809?VALUES?LESS?THAN?(to_days('2018-10-01')),? PARTITION?p201810?VALUES?LESS?THAN?(to_days('2018-11-01')),? PARTITION?p201811?VALUES?LESS?THAN?(to_days('2018-12-01')),? PARTITION?p201812?VALUES?LESS?THAN?(to_days('2019-01-01')),? PARTITION?p201901?VALUES?LESS?THAN?(to_days('2019-02-01')),? PARTITION?p201902?VALUES?LESS?THAN?(to_days('2019-03-01')),? PARTITION?p201903?VALUES?LESS?THAN?(to_days('2019-04-01')),? PARTITION?p201904?VALUES?LESS?THAN?(to_days('2019-05-01')),? PARTITION?p201905?VALUES?LESS?THAN?(to_days('2019-06-01')),? PARTITION?p201906?VALUES?LESS?THAN?(to_days('2019-07-01')),? PARTITION?p201907?VALUES?LESS?THAN?(to_days('2019-08-01')),? PARTITION?p201908?VALUES?LESS?THAN?(to_days('2019-09-01')),? PARTITION?p201909?VALUES?LESS?THAN?(to_days('2019-10-01')),? PARTITION?p201910?VALUES?LESS?THAN?(to_days('2019-11-01')),? PARTITION?p201911?VALUES?LESS?THAN?(to_days('2019-12-01')),? PARTITION?p201912?VALUES?LESS?THAN?(to_days('2020-01-01')));  #2.?插入原表中有效的數據,如果數據量在100W左右可以在業務低峰期直接插入,如果比較大,建議采用dataX來做,可以控制頻率和大小,之前我這邊用Go封裝了dataX可以實現自動生成json文件,自定義大小去執行。 insert?into?ota_order_2020?select?*?from?ota_order?where?create_time?between?'2020-08-01?00:00:00'?and?'2020-08-31?23:59:59';  #3.?表重命名 alter?table?ota_order?rename?to?ota_order_bak;?? alter?table?ota_order_2020?rename?to?ota_order; #4.?插入差異數據 insert?into?ota_order?select?*?from?ota_order_bak?a?where?not?exists?(select?1?from?ota_order?b?where?a.id?=?b.id); #5.?ota_order_bak改造成分區表,如果表比較大不建議直接改造,可以先創建好分區表,通過dataX把導入進去即可。  #6.?后續的歸檔方法 #創建中間普遍表 create?table?ota_order_mid?like?ota_order; #交換原表無效數據分區到普通表 alter?table?ota_order?exchange?partition?p201808?with?table?ota_order_mid;? ##交換普通表數據到歸檔表的相應分區 alter?table?ota_order_bak?exchange?partition?p201808?with?table?ota_order_mid;

這樣原表和歸檔表都是按月的分區表,只需要創建一個中間普通表,在業務低峰期做兩次分區交換,既可以刪除無效數據,又能回收空,而且沒有空間碎片,不會影響表上的索引及SQL的執行計劃。

總結

通過從InnoDB存儲空間分布,delete對性能的影響可以看到,delete物理刪除既不能釋放磁盤空間,而且會產生大量的碎片,導致索引頻繁分裂,影響SQL執行計劃的穩定性;

同時在碎片回收時,會耗用大量的CPU,磁盤空間,影響表上正常的DML操作。

在業務代碼層面,應該做邏輯標記刪除,避免物理刪除;為了實現數據歸檔需求,可以用采用MySQL分區表特性來實現,都是DDL操作,沒有碎片產生。

另外一個比較好的方案采用Clickhouse,對有生命周期的數據表可以使用Clickhouse存儲,利用其TTL特性實現無效數據自動清理。

相關推薦:《mysql教程

以上就是

? 版權聲明
THE END
喜歡就支持一下吧
點贊10 分享