



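id 22953: runtime environment 102 sends the SELECT statement to Socket proxy 212. Packet length: 296 bytes.   Time: 23.877515
id 22954: Socket proxy 212 sends the first 128 bytes of the SELECT statement to database 139.                Time: 23.877611
id 22955: Socket proxy 212 sends an ACK back to runtime environment 103.                                     Time: 23.917294
id 22956: database 139 sends an ACK back to Socket proxy 212.                                                Time: 23.918398
id 22957: Socket proxy 212 sends the remaining 168 bytes of the SELECT statement to database 139.            Time: 23.918415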
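
The size of the stall can be read straight off these timestamps. The following is a small arithmetic sketch; the `capture` dictionary and the `gap_ms` helper are names made up for this illustration, and the values are copied from the five packets listed above.

```python
from decimal import Decimal

# Capture timestamps (seconds) keyed by packet id, copied from the list above.
capture = {
    22953: Decimal("23.877515"),  # runtime environment -> proxy: full SELECT
    22954: Decimal("23.877611"),  # proxy -> database: first 128 bytes
    22955: Decimal("23.917294"),  # proxy -> runtime environment: ACK
    22956: Decimal("23.918398"),  # database -> proxy: ACK
    22957: Decimal("23.918415"),  # proxy -> database: remaining 168 bytes
}


def gap_ms(earlier: int, later: int) -> Decimal:
    """Return the time between two captured packets in milliseconds."""
    return (capture[later] - capture[earlier]) * 1000


# Time the first partial segment spent waiting for the database's ACK.
print(f"first segment -> ACK: {gap_ms(22954, 22956):.3f} ms")
# Time between that ACK arriving and the remainder being sent.
print(f"ACK -> remainder:     {gap_ms(22956, 22957):.3f} ms")
```

The remainder leaves almost immediately after the database's ACK arrives, so nearly all of the roughly 40 ms is spent waiting for that ACK.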


18:42:52.119359 epoll_wait(7, {{EPOLLIN, {u32=38391536, u64=38391536}}}, 1024, 500) = 1 <0.000175>
18:42:52.119672 recvfrom(8, "\240\0\0\0\3SELECT cat_id, cat_name, parent_id, is_show FROM `jiewang300`.`jw_category`WHERE parent_id = '1401' AND is_show = 1 ORDER B", 128, 0, NULL, NULL) = 128 <0.000014>
18:42:52.119758 recvfrom(8, "Y sort_order ASC, cat_id ASC limit 8", 128, 0, NULL, NULL) = 36 <0.000016>
18:42:52.119823 recvfrom(8, 0x24a4494, 92, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000013>
18:42:52.119929 epoll_wait(7, {{EPOLLOUT, {u32=38394672, u64=38394672}}}, 1024, 500) = 1 <0.000022>
18:42:52.120074 sendto(9, "\240\0\0\0\3SELECT cat_id, cat_name, parent_id, is_show FROM `jiewang300`.`jw_category`WHERE parent_id = '1401' AND is_show = 1 ORDER B", 128, 0, NULL, 0) = 128 <0.000052>
18:42:52.120238 sendto(9, "Y sort_order ASC, cat_id ASC limit 8", 36, 0, NULL, 0) = 36 <0.000022>
18:42:52.120406 epoll_wait(7, {{EPOLLIN, {u32=38394672, u64=38394672}}}, 1024, 500) = 1 <0.041082>
18:42:52.161624 recvfrom(9, "\1\0\0\1\4B\0\0\2\3def\njiewang300\vjw_category\vjw_category\6cat_id\6cat_id\f?\0\5\0\0\0\2#B\0\0\0F\0\0\3\3def\njiewang300\vjw_category\vjw_category\10cat_name\10", 128, 0, NULL, NU
18:42:52.161736 recvfrom(9, "cat_name\f!\0\16\1\0\0\375\1\0\0\0\0H\0\0\4\3def\njiewang300\vjw_category\vjw_category\tparent_id\tparent_id\f?\0\5\0\0\0\2)@\0\0\0D\0\0\5\3def\njiewang300\vjw_category", 128, 0, NUL

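Here fd 8 is the connection to the web runtime environment and fd 9 is the connection to the database. The trace shows that the program receives and sends with a 128-byte buffer. The timestamps in the first column also show that as soon as the program has received all the data, it forwards it immediately with sendto. So the 40 ms delay in sending the packet should not be the proxy program's fault (although for this workload a 128-byte buffer is probably too small, which is another thing that could be optimized).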
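
As a rough illustration of that optimization, the sketch below drains everything currently readable from the source socket and forwards it with a single send, instead of one send per 128-byte chunk. It is not the proxy's actual code, which is not shown here; `drain_available`, `relay_once` and the 4096-byte default are assumptions made for this example, and it assumes a non-blocking source socket and a destination that can accept the whole payload.

```python
import socket


def drain_available(sock: socket.socket, chunk_size: int = 4096) -> bytes:
    """Read everything currently readable from a non-blocking socket."""
    parts = []
    while True:
        try:
            data = sock.recv(chunk_size)
        except BlockingIOError:
            # EAGAIN: nothing more to read right now.
            break
        if not data:
            # The peer closed the connection.
            break
        parts.append(data)
    return b"".join(parts)


def relay_once(src: socket.socket, dst: socket.socket, chunk_size: int = 4096) -> int:
    """Forward all currently available bytes from src to dst in one send."""
    payload = drain_available(src, chunk_size)
    if payload:
        dst.sendall(payload)
    return len(payload)
```

Handing the whole query to the kernel in one call means there is no small trailing write left to be held back behind an unacknowledged first segment.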

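If the program is not at fault, then the kernel or something else must be causing the delay. After some searching, it turned out that the kernel is indeed responsible. Two TCP mechanisms are involved, Nagle's algorithm and TCP delayed acknowledgment, and Nagle's algorithm is the main cause.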

Nagle’s algorithm is a means of improving the efficiency of TCP/IP networks by reducing the number of packets that need to be sent over the network. It was defined by John Nagle while working for Ford Aerospace. It was published in 1984 as a Request for Comments (RFC) with title Congestion Control in IP/TCP Internetworks (see RFC 896).

The RFC describes what he called the “small-packet problem”, where an application repeatedly emits data in small chunks, frequently only 1 byte in size. Since TCP packets have a 40-byte header (20 bytes for TCP, 20 bytes for IPv4), this results in a 41-byte packet for 1 byte of useful information, a huge overhead. This situation often occurs in Telnet sessions, where most keypresses generate a single byte of data that is transmitted immediately. Worse, over slow links, many such packets can be in transit at the same time, potentially leading to congestion collapse.

Nagle’s algorithm works by combining a number of small outgoing messages and sending them all at once. Specifically, as long as there is a sent packet for which the sender has received no acknowledgment, the sender should keep buffering its output until it has a full packet’s worth of output, thus allowing output to be sent all at once.


if there is new data to send
    if the window size >= MSS and available data is >= MSS
        send complete MSS segment now
    else
        if there is unconfirmed data still in the pipe
            enqueue data in the buffer until an acknowledge is received
        else
            send data immediately
        end if
    end if
end if
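
The decision above can be written as a small function and applied to the two writes seen in the strace output (128 bytes, then 36 bytes). This is only a model of the rule as described, not the kernel's implementation; `nagle_decision`, the MSS of 1460 and the window of 65535 are assumptions chosen for the illustration.

```python
def nagle_decision(window: int, available: int, mss: int, unacked: bool) -> str:
    """Return what the sender does with `available` bytes of pending data."""
    if available <= 0:
        return "nothing_to_send"
    if window >= mss and available >= mss:
        # A full segment is ready and the window allows it.
        return "send_full_segment"
    if unacked:
        # Small data while earlier data is still unacknowledged: hold it.
        return "buffer_until_ack"
    # Small data, but nothing in flight: it can go out right away.
    return "send_now"


MSS = 1460      # assumed typical Ethernet MSS
WINDOW = 65535  # assumed window, large enough not to matter here

# First sendto: 128 bytes, nothing in flight yet.
print(nagle_decision(WINDOW, 128, MSS, unacked=False))
# Second sendto: 36 bytes, the first 128 bytes are not yet acknowledged.
print(nagle_decision(WINDOW, 36, MSS, unacked=True))
```

Under these assumptions the first write goes out at once and the second is held until the ACK for the first arrives, which matches the packet sequence in the capture.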

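Under normal conditions this algorithm causes no trouble, because as soon as the peer replies with an ACK the buffered data is sent right away. But if the peer has TCP delayed acknowledgment enabled, the ACK for the packet is held back: the sender is waiting for the ACK before sending the rest, while the receiver is waiting before sending the ACK. The two mechanisms acting together produce the delay.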

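So how can this be avoided? Since the delay comes from two mechanisms acting together, breaking either one is enough: turn off TCP delayed acknowledgment, or turn off Nagle's algorithm. Turning off TCP delayed acknowledgment is clearly unwise, because the extra ACKs it would send are rarely necessary and would add overhead of their own.
Fortunately, TCP provides a way to turn off Nagle's algorithm: set the TCP_NODELAY option with setsockopt.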
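
A minimal sketch of setting that option, using Python's standard `socket` module. The helper names `disable_nagle` and `nagle_disabled` are made up for this example; in the proxy the option would be set on the socket connected to the database (fd 9 in the trace above).

```python
import socket


def disable_nagle(sock: socket.socket) -> None:
    """Turn off Nagle's algorithm on a TCP socket via TCP_NODELAY."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def nagle_disabled(sock: socket.socket) -> bool:
    """Report whether TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    # The option can be set before the socket is connected.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        disable_nagle(s)
        print("TCP_NODELAY set:", nagle_disabled(s))
```

With the option set, small writes are sent as soon as they are issued, so it pairs best with code that avoids issuing many tiny writes in the first place.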


