連接失效問題
例子
其中,Redis常見的報錯就是:
配置項:timeout
報錯信息:Error while reading line from the server
立即學習“PHP免費學習筆記(深入)”;
Redis可以配置如果客戶端經過多少秒還不給Redis服務器發送數據,那么就會把連接close掉。
推薦學習:?swoole教程
MySQL常見的報錯:
配置項:wait_timeout & interactive_timeout
報錯信息:has gone away
和Redis服務器一樣,MySQL也會定時的去清理掉沒用的連接。
如何解決
1、用的時候進行重連
2、定時發送心跳維持連接
用的時候進行重連
優點是簡單,缺點是面臨短連接的問題。
定時發送心跳維持連接
推薦。
如何維持長連接
tcp協議中實現的tcp_keepalive
?
操作系統底層提供了一組tcp的keepalive配置:
tcp_keepalive_time?(integer;?default:?7200;?since?Linux?2.2) The?number?of?seconds?a?connection?needs?to?be?idle?before?TCP begins?sending?out?keep-alive?probes.?Keep-alives?are?sent?only when?the?SO_KEEPALIVE?socket?option?is?enabled.?The?default value?is?7200?seconds?(2?hours).?An?idle?connection?is terminated?after?approximately?an?additional?11?minutes?(9 probes?an?interval?of?75?seconds?apart)?when?keep-alive?is enabled. ? Note?that?underlying?connection?tracking?mechanisms?and application?timeouts?may?be?much?shorter. ? tcp_keepalive_intvl?(integer;?default:?75;?since?Linux?2.4) The?number?of?seconds?between?TCP?keep-alive?probes. ? tcp_keepalive_probes?(integer;?default:?9;?since?Linux?2.2) The?maximum?number?of?TCP?keep-alive?probes?to?send?before giving?up?and?killing?the?connection?if?no?response?is?obtained from?the?other?end. 8
swoole底層把這些配置開放出來了,例如:
?php ? $server?=?new?SwooleServer('127.0.0.1',?6666,?SWOOLE_PROCESS); ? $server->set([ 'worker_num'?=>?1, 'open_tcp_keepalive'?=>?1, 'tcp_keepidle'?=>?4,?//?對應tcp_keepalive_time 'tcp_keepinterval'?=>?1,?//?對應tcp_keepalive_intvl 'tcp_keepcount'?=>?5,?//?對應tcp_keepalive_probes ]);
其中:
'open_tcp_keepalive'?=>?1,?//?總開關,用來開啟tcp_keepalive 'tcp_keepidle'?=>?4,?//?4s沒有數據傳輸就進行檢測 //?檢測的策略如下: 'tcp_keepinterval'?=>?1,?//?1s探測一次,即每隔1s給客戶端發一個包(然后客戶端可能會回一個ack的包,如果服務端收到了這個ack包,那么說明這個連接是活著的) 'tcp_keepcount'?=>?5,?//?探測的次數,超過5次后客戶端還沒有回ack包,那么close此連接
?
我們來實戰測試體驗一下,服務端腳本如下:
<?php $server = new SwooleServer('127.0.0.1', 6666, SWOOLE_PROCESS); $server->set([ 'worker_num'?=>?1, 'open_tcp_keepalive'?=>?1,?//?開啟tcp_keepalive 'tcp_keepidle'?=>?4,?//?4s沒有數據傳輸就進行檢測 'tcp_keepinterval'?=>?1,?//?1s探測一次 'tcp_keepcount'?=>?5,?//?探測的次數,超過5次后還沒有回包close此連接 ]); ? $server->on('connect',?function?($server,?$fd)?{ var_dump("Client:?Connect?$fd"); }); ? $server->on('receive',?function?($server,?$fd,?$reactor_id,?$data)?{ var_dump($data); }); ? $server->on('close',?function?($server,?$fd)?{ var_dump("close?fd?$fd"); }); ? $server->start();
我們啟動這個服務器:
?~/codeDir/phpCode/hyperf-skeleton?#?php?server.php
然后通過tcpdump進行抓包:
~/codeDir/phpCode/hyperf-skeleton # tcpdump -i lo port 6666tcpdump: verbose output suppressed, use -v or -vv for full protocol decodelistening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
我們此時正在監聽lo上的6666端口的數據包。
然后我們用客戶端去連接它:
~/codeDir/phpCode/hyperf-skeleton?#?nc?127.0.0.1?6666
此時服務端會打印出消息:
~/codeDir/phpCode/hyperf-skeleton?#?php?server.php string(17)?"Client:?Connect?1"
tcpdump的輸出信息如下:
01:48:40.178439?IP?localhost.33933?>?localhost.6666:?Flags?[S],?seq?43162537,?win?43690,?options?[mss?65495,sackOK,TS?val?9833698?ecr?0,nop,wscale?7],?length?0 01:48:40.178484?IP?localhost.6666?>?localhost.33933:?Flags?[S.],?seq?1327460565,?ack?43162538,?win?43690,?options?[mss?65495,sackOK,TS?val?9833698?ecr?9833698,nop,wscale?7],?length?0 01:48:40.178519?IP?localhost.33933?>?localhost.6666:?Flags?[.],?ack?1,?win?342,?options?[nop,nop,TS?val?9833698?ecr?9833698],?length?0 01:48:44.229926?IP?localhost.6666?>?localhost.33933:?Flags?[.],?ack?1,?win?342,?options?[nop,nop,TS?val?9834104?ecr?9833698],?length?0 01:48:44.229951?IP?localhost.33933?>?localhost.6666:?Flags?[.],?ack?1,?win?342,?options?[nop,nop,TS?val?9834104?ecr?9833698],?length?0 01:48:44.229926?IP?localhost.6666?>?localhost.33933:?Flags?[.],?ack?1,?win?342,?options?[nop,nop,TS?val?9834104?ecr?9833698],?length?0 01:48:44.229951?IP?localhost.33933?>?localhost.6666:?Flags?[.],?ack?1,?win?342,?options?[nop,nop,TS?val?9834104?ecr?9833698],?length?0 01:48:44.229926?IP?localhost.6666?>?localhost.33933:?Flags?[.],?ack?1,?win?342,?options?[nop,nop,TS?val?9834104?ecr?9833698],?length?0 //?省略了其他的輸出
我們會發現最開始的時候,會打印三次握手的包:
01:48:40.178439?IP?localhost.33933?>?localhost.6666:?Flags?[S],?seq?43162537,?win?43690,?options?[mss?65495,sackOK,TS?val?9833698?ecr?0,nop,wscale?7],?length?0 01:48:40.178484?IP?localhost.6666?>?localhost.33933:?Flags?[S.],?seq?1327460565,?ack?43162538,?win?43690,?options?[mss?65495,sackOK,TS?val?9833698?ecr?9833698,nop,wscale?7],?length?0 01:48:40.178519?IP?localhost.33933?>?localhost.6666:?Flags?[.],?ack?1,?win?342,?options?[nop,nop,TS?val?9833698?ecr?9833698],?length?0
然后,停留了4s沒有任何包的輸出。
之后,每隔1s左右就會打印出一組:
01:52:54.359341?IP?localhost.6666?>?localhost.43101:?Flags?[.],?ack?1,?win?342,?options?[nop,nop,TS?val?9859144?ecr?9858736],?length?0 ?01:52:54.359377?IP?localhost.43101?>?localhost.6666:?Flags?[.],?ack?1,?win?342,?options?[nop,nop,TS?val?9859144?ecr?9855887],?length?0
其實這就是我們配置的策略:
?'tcp_keepinterval'?=>?1,?//?1s探測一次 ?'tcp_keepcount'?=>?5,?//?探測的次數,超過5次后還沒有回包close此連接
因為我們操作系統底層會自動的給客戶端回ack,所以這個連接不會在5次探測后被關閉。操作系統底層會持續不斷的發送這樣的一組包:
01:52:54.359341?IP?localhost.6666?>?localhost.43101:?Flags?[.],?ack?1,?win?342,?options?[nop,nop,TS?val?9859144?ecr?9858736],?length?0 ?01:52:54.359377?IP?localhost.43101?>?localhost.6666:?Flags?[.],?ack?1,?win?342,?options?[nop,nop,TS?val?9859144?ecr?9855887],?length?0
如果我們要測試5次探測后關閉這個連接,可以禁掉6666端口的包:
?~/codeDir/phpCode/hyperf-skeleton?#?iptables?-A?INPUT?-p?tcp?--dport?6666?-j?DROP
這樣會把所有從6666端口進來的包給禁掉,自然,服務器就接收不到從客戶端那一邊發來的ack包了。
然后服務器過5秒就會打印出close(服務端主動的調用了close方法,給客戶端發送了FIN包):
?~/codeDir/phpCode/hyperf-skeleton?#?php?server.php ?string(17)?"Client:?Connect?1" ?string(10)?"close?fd?1"
我們恢復一下iptables的規則:
?~/codeDir/phpCode?#?iptables?-D?INPUT?-p?tcp?-m?tcp?--dport?6666?-j?DROP
即把我們設置的規則給刪除了。
通過tcp_keepalive的方式實現心跳的功能,優點是簡單,不要寫代碼就可以完成這個功能,并且發送的心跳包小。缺點是依賴于系統的網絡環境,必須保證服務器和客戶端都實現了這樣的功能,需要客戶端配合發心跳包。
還有一個更為嚴重的缺點是如果客戶端和服務器不是直連的,而是通過代理來進行連接的,例如socks5代理,它只會轉發應用層的包,不會轉發更為底層的tcp探測包,那這個心跳功能就失效了。
所以,Swoole就提供了其他的解決方案,一組檢測死連接的配置。
?'heartbeat_check_interval'?=>?1,?//?1s探測一次 ?'heartbeat_idle_time'?=>?5,?//?5s未發送數據包就close此連接
swoole實現的heartbeat
我們來測試一下:
<?php $server = new SwooleServer('127.0.0.1', 6666, SWOOLE_PROCESS); $server->set([ 'worker_num'?=>?1, 'heartbeat_check_interval'?=>?1,?//?1s探測一次 'heartbeat_idle_time'?=>?5,?//?5s未發送數據包就close此連接 ]); ? $server->on('connect',?function?($server,?$fd)?{ var_dump("Client:?Connect?$fd"); }); ? $server->on('receive',?function?($server,?$fd,?$reactor_id,?$data)?{ var_dump($data); }); ? $server->on('close',?function?($server,?$fd)?{ var_dump("close?fd?$fd"); }); ? $server->start();
然后啟動服務器:
?~/codeDir/phpCode/hyperf-skeleton # php server.php
然后啟動tcpdump:
?
?~/codeDir/phpCode?#?tcpdump?-i?lo?port?6666 ?tcpdump:?verbose?output?suppressed,?use?-v?or?-vv?for?full?protocol?decode ?listening?on?lo,?link-type?EN10MB?(Ethernet),?capture?size?262144?bytes
?
然后再啟動客戶端:
?~/codeDir/phpCode/hyperf-skeleton # nc 127.0.0.1 6666
?
此時服務器端打印:
~/codeDir/phpCode/hyperf-skeleton?#?php?server.php ?string(17)?"Client:?Connect?1"
?
然后tcpdump打印:
?
02:48:32.516093?IP?localhost.42123?>?localhost.6666:?Flags?[S],?seq?1088388248,?win?43690,?options?[mss?65495,sackOK,TS?val?10193342?ecr?0,nop,wscale?7],?length?0 02:48:32.516133?IP?localhost.6666?>?localhost.42123:?Flags?[S.],?seq?80508236,?ack?1088388249,?win?43690,?options?[mss?65495,sackOK,TS?val?10193342?ecr?10193342,nop,wscale?7],?length?0 02:48:32.516156?IP?localhost.42123?>?localhost.6666:?Flags?[.],?ack?1,?win?342,?options?[nop,nop,TS?val?10193342?ecr?10193342],?length?0
這是三次握手信息。
然后過了5s后,tcpdump會打印出:
?02:48:36.985027 IP localhost.6666 > localhost.42123: Flags [F.], seq 1, ack 1, win 342, options [nop,nop,TS val 10193789 ecr 10193342], length 0
?02:48:36.992172 IP localhost.42123 > localhost.6666: Flags [.], ack 2, win 342, options [nop,nop,TS val 10193790 ecr 10193789], length 0
也就是服務端發送了FIN包。因為客戶端沒有發送數據,所以Swoole關閉了連接。
然后服務器端會打印:
?~/codeDir/phpCode/hyperf-skeleton?#?php?server.php ?string(17)?"Client:?Connect?1" ?string(10)?"close?fd?1"
?
所以,heartbeat和tcp keepalive還是有一定的區別的,tcp keepalive有保活連接的功能,但是heartbeat存粹是檢測沒有數據的連接,然后關閉它,并且只可以在服務端這邊配置,如果需要保活,也可以讓客戶端配合發送心跳。
如果我們不想讓服務端close掉連接,那么就得在應用層里面不斷的發送數據包來進行保活,例如我在nc客戶端里面不斷的發送包:
~/codeDir/phpCode/hyperf-skeleton?#?nc?127.0.0.1?6666 ping ping ping ping ping ping ping ping ping
?
我發送了9個ping包給服務器,tcpdump的輸出如下:
//?省略了三次握手的包 02:57:53.697363?IP?localhost.44195?>?localhost.6666:?Flags?[P.],?seq?1:6,?ack?1,?win?342,?options?[nop,nop,TS?val?10249525?ecr?10249307],?length?5 02:57:53.697390?IP?localhost.6666?>?localhost.44195:?Flags?[.],?ack?6,?win?342,?options?[nop,nop,TS?val?10249525?ecr?10249525],?length?0 02:57:55.309532?IP?localhost.44195?>?localhost.6666:?Flags?[P.],?seq?6:11,?ack?1,?win?342,?options?[nop,nop,TS?val?10249686?ecr?10249525],?length?5 02:57:55.309576?IP?localhost.6666?>?localhost.44195:?Flags?[.],?ack?11,?win?342,?options?[nop,nop,TS?val?10249686?ecr?10249686],?length?0 02:57:58.395206?IP?localhost.44195?>?localhost.6666:?Flags?[P.],?seq?11:16,?ack?1,?win?342,?options?[nop,nop,TS?val?10249994?ecr?10249686],?length?5 02:57:58.395239?IP?localhost.6666?>?localhost.44195:?Flags?[.],?ack?16,?win?342,?options?[nop,nop,TS?val?10249994?ecr?10249994],?length?0 02:58:01.858094?IP?localhost.44195?>?localhost.6666:?Flags?[P.],?seq?16:21,?ack?1,?win?342,?options?[nop,nop,TS?val?10250341?ecr?10249994],?length?5 02:58:01.858126?IP?localhost.6666?>?localhost.44195:?Flags?[.],?ack?21,?win?342,?options?[nop,nop,TS?val?10250341?ecr?10250341],?length?0 02:58:04.132584?IP?localhost.44195?>?localhost.6666:?Flags?[P.],?seq?21:26,?ack?1,?win?342,?options?[nop,nop,TS?val?10250568?ecr?10250341],?length?5 02:58:04.132609?IP?localhost.6666?>?localhost.44195:?Flags?[.],?ack?26,?win?342,?options?[nop,nop,TS?val?10250568?ecr?10250568],?length?0 02:58:05.895704?IP?localhost.44195?>?localhost.6666:?Flags?[P.],?seq?26:31,?ack?1,?win?342,?options?[nop,nop,TS?val?10250744?ecr?10250568],?length?5 02:58:05.895728?IP?localhost.6666?>?localhost.44195:?Flags?[.],?ack?31,?win?342,?options?[nop,nop,TS?val?10250744?ecr?10250744],?length?0 02:58:07.150265?IP?localhost.44195?>?localhost.6666:?Flags?[P.],?seq?31:36,?ack?1,?win?342,?options?[nop,nop,TS?val?10250870?ecr?10250744],?length?5 02:58:07.150288?IP?localhost.6666?>?localhost.44195:?Flags?[.],?ack?36,?win?342,?options?[nop,nop,TS?val?10250870?ecr?10250870],?length?0 02:58:08.349124?IP?localhost.44195?>?localhost.6666:?Flags?[P.],?seq?36:41,?ack?1,?win?342,?options?[nop,nop,TS?val?10250990?ecr?10250870],?length?5 02:58:08.349156?IP?localhost.6666?>?localhost.44195:?Flags?[.],?ack?41,?win?342,?options?[nop,nop,TS?val?10250990?ecr?10250990],?length?0 02:58:09.906223?IP?localhost.44195?>?localhost.6666:?Flags?[P.],?seq?41:46,?ack?1,?win?342,?options?[nop,nop,TS?val?10251145?ecr?10250990],?length?5 02:58:09.906247?IP?localhost.6666?>?localhost.44195:?Flags?[.],?ack?46,?win?342,?options?[nop,nop,TS?val?10251145?ecr?10251145],?length?0
?
有9組數據包的發送。(這里的Flags [P.]代表Push的含義)
此時服務器還沒有close掉連接,實現了客戶端保活連接的功能。然后我們停止發送ping,過了5秒后tcpdump就會輸出一組:
02:58:14.811761 IP localhost.6666 > localhost.44195: Flags [F.], seq 1, ack 46, win 342, options [nop,nop,TS val 10251636 ecr 10251145], length 0
02:58:14.816420 IP localhost.44195 > localhost.6666: Flags [.], ack 2, win 342, options [nop,nop,TS val 10251637 ecr 10251636], length 0
服務端那邊發送了FIN包,說明服務端close掉了連接。服務端的輸出如下:
~/codeDir/phpCode/hyperf-skeleton?#?php?server.php string(17)?"Client:?Connect?1" string(5)?"ping " string(5)?"ping " string(5)?"ping " string(5)?"ping " string(5)?"ping " string(5)?"ping " string(5)?"ping " string(5)?"ping " string(5)?"ping " string(10)?"close?fd?1"
?
然后我們在客戶端那邊ctrl + c來關閉連接:
~/codeDir/phpCode/hyperf-skeleton?#?nc?127.0.0.1?6666 ping ping ping ping ping ping ping ping ping ^Cpunt! ~/codeDir/phpCode/hyperf-skeleton?#
?
此時,tcpdump的輸出如下:
?03:03:02.257667?IP?localhost.44195?>?localhost.6666:?Flags?[F.],?seq?46,?ack?2,?win?342,?options?[nop,nop,TS?val?10280414?ecr?10251636],?length?0 ?03:03:02.257734?IP?localhost.6666?>?localhost.44195:?Flags?[R],?seq?2678621620,?win?0,?length?0
?
應用層心跳
1、制定ping/pong協議(mysql等自帶ping協議)
2、客戶端靈活的發送ping心跳包
3、服務端OnRecive檢查可用性回復pong
例如:
$server->on('receive',?function?(SwooleServer?$server,?$fd,?$reactor_id,?$data) { if?($data?==?'ping') { checkDB(); checkServiceA(); checkRedis(); $server->send('pong'); } });
?
結論
1、tcp的keepalive最簡單,但是有兼容性問題,不夠靈活
2、swoole提供的keepalive最實用,但是需要客戶端配合,復雜度適中
3、應用層的keepalive最靈活但是最麻煩