Observe the TCP handshake with tcpdump while stepping through the code line by line in GDB
- Compile
./tcpdump_demo/build.sh
- Monitor TCP port 12345 with tcpdump
sudo tcpdump -i any tcp port 12345
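- Debug the server with GDB
gdb ./debug/server
- Press `Ctrl + x` followed by `a` to enter TUI mode, then run `start`; execution stops at `main` and the line `int main()` is highlighted
- Enter `next` (or `n`) and TUI highlights the next line of code. Keep pressing Enter, which repeats the last command, until the following line, which calls `bind` and `listen`, has run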
int listen_fd = jc::ListenFd("0.0.0.0", 12345);
- The state of the TCP port changes to `LISTEN`
netstat -antp | grep 12345
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 0.0.0.0:12345 0.0.0.0:* LISTEN 65639/server
- In another shell, debug the client with GDB
gdb ./debug/client
- Press `Ctrl + x` followed by `a`, run `start`, then press `n` until the following line, which calls `connect`, has run
int fd = jc::Connect("0.0.0.0", 12345);
- tcpdump displays the following log
11:58:17.059466 IP localhost.56234 > localhost.12345: Flags [S], seq 2271595046, win 65495, options [mss 65495,sackOK,TS val 988640603 ecr 0,nop,wscale 7], length 0
11:58:17.068493 IP localhost.12345 > localhost.56234: Flags [S.], seq 2153659986, ack 2271595047, win 65483, options [mss 65495,sackOK,TS val 988640612 ecr 988640603,nop,wscale 7], length 0
11:58:17.068515 IP localhost.56234 > localhost.12345: Flags [.], ack 1, win 512, options [nop,nop,TS val 988640613 ecr 988640612], length 0
- The first column is the timestamp, with microsecond precision. The left side of `>` is the sender and the right side is the receiver. `win` is the advertised window size (a 16-bit field, maximum 65535). For easier analysis, here are the key parts
localhost.56234 > localhost.12345: Flags [S], seq 2271595046
localhost.12345 > localhost.56234: Flags [S.], seq 2153659986, ack 2271595047
localhost.56234 > localhost.12345: Flags [.], ack 1
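tcpdump prints the TCP flags of each segment as single characters inside `Flags [...]`. As an illustration, the hypothetical helper below (not part of tcpdump or of the demo) decodes that notation into the flag bits of the TCP header, using the characters listed in the tcpdump manual page: `S` SYN, `F` FIN, `P` PUSH, `R` RST, `U` URG, `W` ECN CWR, `E` ECN-Echo, `.` ACK, and `none` when no flag is set.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>

// TCP header flag bits (the flags byte of the TCP header).
enum TcpFlag : std::uint8_t {
    kFin = 0x01,
    kSyn = 0x02,
    kRst = 0x04,
    kPsh = 0x08,
    kAck = 0x10,
    kUrg = 0x20,
    kEce = 0x40,
    kCwr = 0x80,
};

// Hypothetical helper: decode the text between the brackets of tcpdump's
// "Flags [...]" field, e.g. "S", "S.", "P.", "F.", "R." or "none".
// Returns std::nullopt for a character this sketch does not recognise.
std::optional<std::uint8_t> DecodeTcpdumpFlags(std::string_view text) {
    if (text == "none") return std::uint8_t{0};
    std::uint8_t flags = 0;
    for (char c : text) {
        switch (c) {
            case 'S': flags |= kSyn; break;
            case 'F': flags |= kFin; break;
            case 'P': flags |= kPsh; break;
            case 'R': flags |= kRst; break;
            case 'U': flags |= kUrg; break;
            case 'W': flags |= kCwr; break;
            case 'E': flags |= kEce; break;
            case '.': flags |= kAck; break;
            default: return std::nullopt;
        }
    }
    return flags;
}
```

Applied to the three lines above, `S` decodes to SYN alone, `S.` to SYN plus ACK, and `.` to ACK alone.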
- A socket pair uniquely identifies a TCP connection on the network. It is a four-tuple: the local IP address and TCP port number, and the remote IP address and TCP port number.
- A port number is 16 bits, ranging from 0 to 65535, and is divided into three ranges
  - 0-1023: well-known ports, also called reserved ports on Unix, where only a privileged process can bind a socket to them. They are assigned and controlled by IANA (the Internet Assigned Numbers Authority), and a given service always gets the same port, such as 22 for SSH and 80 for HTTP
  - 1024-49151: registered ports. They are not controlled by IANA, but IANA registers them and lists their uses as a convenience to the community
  - 49152-65535: dynamic or private ports, also called ephemeral ports; 49152 = 65536 × 0.75
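The three ranges map directly onto a lookup. A minimal C++ sketch; `PortRange` and `ClassifyPort` are illustrative names, not part of the demo.

```cpp
#include <cassert>
#include <cstdint>

// The three IANA port ranges described above.
enum class PortRange { kWellKnown, kRegistered, kDynamic };

// Classify a 16-bit TCP/UDP port number into its IANA range.
PortRange ClassifyPort(std::uint16_t port) {
    if (port <= 1023) return PortRange::kWellKnown;    // 0-1023
    if (port <= 49151) return PortRange::kRegistered;  // 1024-49151
    return PortRange::kDynamic;                        // 49152-65535
}

// The dynamic range is the top quarter of the 16-bit space.
static_assert(65536 * 3 / 4 == 49152, "dynamic range starts at 0.75 * 65536");
```

In this walkthrough, 22 (SSH) is well-known, the demo server's 12345 is registered, and the client's port 56234 from the capture is dynamic. Note that Linux picks ephemeral ports from `net.ipv4.ip_local_port_range`, whose default starts at 32768, so the kernel's ephemeral ports do not line up exactly with the IANA dynamic range.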
- The captured packets show the three-way handshake that establishes a TCP connection. `S` stands for `SYN` and `.` stands for `ACK`. The process is as follows
  - 1st handshake: after the server calls `bind` and `listen`, it is in the `LISTEN` state. The client calls `connect`, which sends a `SYN` to the server, and enters the `SYN_SENT` state. The sequence number of this `SYN` is `2271595046`, the client's ISN (Initial Sequence Number). The ISN is not fixed: it is chosen to be hard to predict and changes over time
  - 2nd handshake: after receiving the `SYN`, the server sends `SYN + ACK` to the client, and its state changes from `LISTEN` to `SYN_RCVD`. The sequence number of this `SYN` (`2153659986`) is the server's own ISN, likewise unpredictable, and the acknowledgment number (`2271595047`) is the sequence number of the client's `SYN` plus 1, because a `SYN` consumes one sequence number
  - 3rd handshake: after receiving the server's `SYN + ACK`, the client sends an `ACK` to the server, and its state changes from `SYN_SENT` to `ESTABLISHED`. Likewise, the acknowledgment number is the sequence number of the server's `SYN` plus 1. It is displayed as 1 because tcpdump prints relative sequence numbers by default: segments carrying a `SYN` show absolute values, and later segments show the offset from the corresponding ISN (`tcpdump -S` prints absolute numbers everywhere)
  - Finally, the server receives the `ACK`, and its state changes from `SYN_RCVD` to `ESTABLISHED`. Both sides are now in the `ESTABLISHED` state, and the TCP connection is established
- Except for the initial `SYN` that actively opens the connection, whose `ACK` flag is 0, all TCP segments have the `ACK` flag set to 1. The 32-bit acknowledgment number field is part of the TCP header anyway, so setting `ACK` simply puts that field to use at no extra cost
- Each `SYN` can carry multiple TCP options. The commonly used ones are listed below; the last two are called RFC 1323 options, or long fat pipe options, because a network with high bandwidth or long delay is called a long fat pipe
  - MSS (maximum segment size): a 16-bit value, so at most 65535. The side sending the `SYN` tells the other side its maximum segment size. The Ethernet MTU is 1500 bytes; an IP packet larger than the MTU is fragmented, and the fragments are not reassembled until they reach the destination. MSS exists to avoid fragmentation, so it is usually set to the MTU minus the IP and TCP headers. For IPv4 over Ethernet, MSS is 1500 - 20 (IPv4 header) - 20 (TCP header) = 1460. The IPv6 header is 40 bytes, giving an MSS of 1440
  - Window scale: the window field of the TCP header is 16 bits, so at most 65535, but higher throughput today requires larger windows. This option specifies a left-shift count (0-14), which allows a maximum window of close to 1 GB (65535 × 2^14). Both end systems must support the option for it to be used. The option is affected by the `SO_RCVBUF` socket option
  - Timestamps: needed on high-speed connections to prevent possible data corruption caused by old, delayed, or duplicated segments. Application programmers do not need to handle this option
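The arithmetic in the handshake and options discussion can be checked against the capture. The sketch below hard-codes the two ISNs from the log; the helper names are illustrative. It also decodes one more detail of the log: both `SYN`s advertise `wscale 7`, so the `win 512` in the third segment means 512 × 2^7 = 65536 bytes (the window field of a `SYN` itself is never scaled).

```cpp
#include <cassert>
#include <cstdint>

// Sequence numbers are 32-bit and wrap modulo 2^32, so unsigned
// arithmetic on uint32_t matches TCP's behaviour.
using Seq = std::uint32_t;

// A SYN consumes one sequence number, so the ACK for it is peer ISN + 1.
constexpr Seq AckForSyn(Seq peer_isn) { return peer_isn + 1; }

// tcpdump's default (relative) output: a number minus the ISN of the
// side that owns that sequence space.
constexpr Seq Relative(Seq absolute, Seq isn) { return absolute - isn; }

// MSS = MTU - IP header - TCP header (headers without options).
constexpr unsigned Mss(unsigned mtu, unsigned ip_header, unsigned tcp_header) {
    return mtu - ip_header - tcp_header;
}

// With the window scale option, the 16-bit advertised window is shifted
// left by the negotiated shift count (0-14).
constexpr std::uint64_t ScaledWindow(std::uint16_t advertised, unsigned shift) {
    return static_cast<std::uint64_t>(advertised) << shift;
}

// Values copied from the capture above.
constexpr Seq kClientIsn = 2271595046u;
constexpr Seq kServerIsn = 2153659986u;

static_assert(AckForSyn(kClientIsn) == 2271595047u, "2nd handshake: ack 2271595047");
static_assert(Relative(AckForSyn(kServerIsn), kServerIsn) == 1, "3rd handshake: ack 1");
static_assert(Mss(1500, 20, 20) == 1460, "IPv4 over Ethernet");
static_assert(Mss(1500, 40, 20) == 1440, "IPv6 over Ethernet");
static_assert(ScaledWindow(65535, 14) == 1073725440u, "largest window, just under 1 GiB");
static_assert(ScaledWindow(512, 7) == 65536u, "win 512 with wscale 7");
```

Because `Seq` is a 32-bit unsigned type, `AckForSyn(0xFFFFFFFF)` wraps to 0, just as TCP sequence numbers do.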
- The server and client are now both in the `ESTABLISHED` state
netstat -antp | grep 12345
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 0.0.0.0:12345 0.0.0.0:* LISTEN 65639/server
tcp 0 0 127.0.0.1:56234 127.0.0.1:12345 ESTABLISHED 65633/client
tcp 0 0 127.0.0.1:12345 127.0.0.1:56234 ESTABLISHED 65639/server
- Continue with `n` in the server. `accept` takes the first completed connection from the queue of established connections and returns the file descriptor of the new connection. If the queue is empty, the process blocks
int accept_fd = jc::AcceptFd(listen_fd);
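The demo's `jc::ListenFd`, `jc::Connect`, and `jc::AcceptFd` helpers are not shown in this walkthrough. The following is a hypothetical minimal equivalent (the names `ListenLoopback`, `ConnectLoopback`, and `AcceptConnection` are made up for this sketch), using 127.0.0.1 and a kernel-chosen port instead of 12345 so it never collides with a running server. It illustrates the point above: `connect` returns once the kernel has completed the three-way handshake, so `accept` only dequeues a connection that is already established.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// socket + bind + listen on 127.0.0.1; port 0 lets the kernel choose.
// Returns the listening fd (or -1) and stores the chosen port in *port.
int ListenLoopback(in_port_t* port) {
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);
    socklen_t len = sizeof(addr);
    if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        ::listen(fd, SOMAXCONN) < 0 ||
        ::getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) < 0) {
        ::close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

// socket + connect: the kernel runs the three-way handshake inside connect().
int ConnectLoopback(in_port_t port) {
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (::connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        ::close(fd);
        return -1;
    }
    return fd;
}

// accept: take the first established connection off the queue; it blocks
// only while the queue is empty.
int AcceptConnection(int listen_fd) {
    return ::accept(listen_fd, nullptr, nullptr);
}
```

Calling `ConnectLoopback` and then `AcceptConnection` from a single thread works precisely because the handshake does not depend on `accept`, which matches the `ESTABLISHED` entries netstat showed before the server reached `accept`.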
- Because the client and server have already established the connection, `accept` returns immediately without blocking. Continue with `n` over the following line, which calls `send`
jc::Send(accept_fd, "welcome to join");
- tcpdump log
11:59:19.715745 IP localhost.12345 > localhost.56234: Flags [P.], seq 1:16, ack 1, win 512, options [nop,nop,TS val 988703260 ecr 988640613], length 15
11:59:19.715757 IP localhost.56234 > localhost.12345: Flags [.], ack 16, win 512, options [nop,nop,TS val 988703260 ecr 988703260], length 0
`P` stands for `PSH` (push), which asks the receiving TCP to pass the data to the application without waiting for more; here it is set on the segment carrying the data. `seq 1:16` means the segment carries relative sequence numbers 1 through 15, that is, the 15 bytes of `welcome to join`: tcpdump prints the number of the first byte and the number just after the last byte, so the server's next segment would start at 16. The client replies with `ack 16`, the value after the colon, meaning it has received everything up to byte 15 and expects byte 16 next. Note that these segments are produced solely by the server's call to `send`: the client has not called `recv` yet, but the data has already been transmitted and acknowledged by the client's kernel. So `send` does not deliver data to the peer application; it only hands the data to the kernel
- Each TCP socket has a send buffer, whose size can be changed with the `SO_SNDBUF` socket option. When an application calls `send`, the kernel copies the data from the application's buffer into the socket's send buffer. If the send buffer cannot hold all of it, a blocking socket puts the application to sleep until all the data has been copied. The return value of `send` is the number of bytes copied into the send buffer. If it is less than the length passed in (for example, on a non-blocking socket or when a signal interrupts the call), the remaining bytes were not sent, and the application must send them again or that data is lost. A successful return from `send` only means the application buffer can be reused; it does not mean the peer has received the data
- Kernel parameters on Ubuntu 18.04.5 LTS
sysctl -a | grep "net.ipv4.tcp_.*mem"
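net.ipv4.tcp_mem = 22062 29417 44124 # min, pressure, max, in pages (not bytes)
net.ipv4.tcp_rmem = 4096 131072 6291456 # min, default, max recv buf in bytes (max 6.29 MB)
net.ipv4.tcp_wmem = 4096 16384 4194304 # min, default, max send buf in bytes (max 4.19 MB)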
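Because `send` may copy fewer bytes than requested, robust code loops until everything has been handed to the kernel. A minimal sketch, assuming a blocking socket; `SendAll` and `SendBufferSize` are illustrative names, not functions from the demo.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>

// Keep calling send() until every byte has been copied into the socket's
// send buffer. A short return value is not an error: the caller still has
// to send the remainder. Returns true on success, false on error
// (including EAGAIN on a non-blocking socket, which this sketch does not
// wait out).
bool SendAll(int fd, const char* data, std::size_t len) {
    while (len > 0) {
        ssize_t n = ::send(fd, data, len, 0);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            return false;
        }
        data += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

// Read the current send-buffer size (SO_SNDBUF) of a socket, or -1 on error.
int SendBufferSize(int fd) {
    int size = 0;
    socklen_t len = sizeof(size);
    if (::getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, &len) < 0) return -1;
    return size;
}
```

The same functions work on the `accept_fd` from the walkthrough. On Linux, the value read back through `SO_SNDBUF` is double what was set with `setsockopt`, because the kernel reserves the extra space for bookkeeping overhead (see socket(7)).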
- In the client, press `n` to call `recv`, which reads the data from the receive buffer. TCP is a byte-stream protocol and keeps no message boundaries, so there is no guarantee that each `send` corresponds to exactly one `recv`: the sender may combine small writes into fewer packets, for example with the Nagle algorithm, and the receiver may get one message split across several reads. Both parties therefore need an agreed application-layer protocol to delimit and parse the data
jc::PrintReceiveMessage(fd);
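Since TCP preserves no message boundaries, the receiver has to recover them itself. One common convention, assumed here and not something the demo uses, is to prefix each message with its length. In this sketch the helpers `EncodeFrame` and `TryDecodeFrame` are hypothetical: bytes returned by `recv` are appended to a buffer, and complete messages are extracted whenever enough bytes have arrived.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical framing: a 4-byte big-endian length, then the payload.
std::string EncodeFrame(const std::string& payload) {
    std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    std::string frame;
    frame.push_back(static_cast<char>((n >> 24) & 0xFF));
    frame.push_back(static_cast<char>((n >> 16) & 0xFF));
    frame.push_back(static_cast<char>((n >> 8) & 0xFF));
    frame.push_back(static_cast<char>(n & 0xFF));
    frame += payload;
    return frame;
}

// Append whatever recv() returned to `buffer`, then call this repeatedly:
// it removes and returns one complete message, or nullopt if the buffer
// does not yet hold a whole frame.
std::optional<std::string> TryDecodeFrame(std::string& buffer) {
    if (buffer.size() < 4) return std::nullopt;  // length prefix incomplete
    std::uint32_t n = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        n = (n << 8) | static_cast<unsigned char>(buffer[i]);
    }
    if (buffer.size() - 4 < n) return std::nullopt;  // payload incomplete
    std::string payload = buffer.substr(4, n);
    buffer.erase(0, 4 + static_cast<std::size_t>(n));
    return payload;
}
```

Whether one `recv` returns half a message or two messages at once, the decoder yields the same sequence of messages.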
- Press `n` in the client to close the connection
::close(fd);
- tcpdump log
12:00:42.379997 IP localhost.56234 > localhost.12345: Flags [F.], seq 1, ack 16, win 512, options [nop,nop,TS val 988785924 ecr 988703260], length 0
12:00:42.383661 IP localhost.12345 > localhost.56234: Flags [.], ack 2, win 512, options [nop,nop,TS val 988785928 ecr 988785924], length 0
- These are the first two of the four waves that tear down the TCP connection; `F` stands for `FIN`
  - 1st wave: the client calls `close`, which sends `FIN + ACK` to the server, and enters the `FIN_WAIT_1` state
  - 2nd wave: after receiving `FIN + ACK`, the server replies with an `ACK` and enters the `CLOSE_WAIT` state. This state lets the server continue sending any data it has not finished sending. Because such data may remain, closing a connection takes one more step than establishing it: the server's `ACK` and `FIN` are sent separately instead of being combined like the `SYN + ACK` of the handshake. After the client receives the `ACK`, its state changes from `FIN_WAIT_1` to `FIN_WAIT_2`
- The client is now in the `FIN_WAIT2` state and the server in the `CLOSE_WAIT` state
netstat -antp | grep 12345
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 0.0.0.0:12345 0.0.0.0:* LISTEN 65639/server
tcp 0 0 127.0.0.1:56234 127.0.0.1:12345 FIN_WAIT2 -
tcp 1 0 127.0.0.1:12345 127.0.0.1:56234 CLOSE_WAIT 65639/server
- Press `n` in the server to close the connection
::close(accept_fd);
- tcpdump log
12:01:24.794975 IP localhost.12345 > localhost.56234: Flags [F.], seq 16, ack 2, win 512, options [nop,nop,TS val 988828339 ecr 988785924], length 0
12:01:24.794993 IP localhost.56234 > localhost.12345: Flags [.], ack 17, win 512, options [nop,nop,TS val 988828339 ecr 988828339], length 0
- These are the last two of the four waves that close the TCP connection
  - 3rd wave: the server calls `close`, which sends `FIN + ACK` to the client, and its state changes from `CLOSE_WAIT` to `LAST_ACK`
  - 4th wave: after receiving `FIN + ACK`, the client sends an `ACK`, and its state changes from `FIN_WAIT_2` to `TIME_WAIT`. When the server receives the `ACK`, its state changes from `LAST_ACK` to `CLOSED`. After 2MSL (twice the Maximum Segment Lifetime), the client's state changes from `TIME_WAIT` to `CLOSED`
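The state changes in the handshake and in the four waves can be summarised as a transition function. This is a deliberately simplified sketch of only the path traced here: no simultaneous open or close, no `RST`, no retransmissions, and the listening socket is modelled as if it turned into the connection itself (in reality the kernel keeps the listening socket in `LISTEN` and creates a new socket per connection). The enum and function names are made up for the sketch.

```cpp
#include <cassert>
#include <optional>

enum class State {
    kClosed, kListen, kSynSent, kSynRcvd, kEstablished,
    kFinWait1, kFinWait2, kCloseWait, kLastAck, kTimeWait,
};

// Application calls and received segments that drive the transitions.
enum class Event {
    kListenCall, kConnectCall, kCloseCall,
    kRecvSyn, kRecvSynAck, kRecvAck, kRecvFin, kTimeout2Msl,
};

// Simplified transition function for the path in this walkthrough.
// Comments note the segment sent on each transition.
std::optional<State> Next(State s, Event e) {
    switch (s) {
        case State::kClosed:
            if (e == Event::kListenCall) return State::kListen;
            if (e == Event::kConnectCall) return State::kSynSent;  // sends SYN
            break;
        case State::kListen:
            if (e == Event::kRecvSyn) return State::kSynRcvd;  // sends SYN+ACK
            break;
        case State::kSynSent:
            if (e == Event::kRecvSynAck) return State::kEstablished;  // sends ACK
            break;
        case State::kSynRcvd:
            if (e == Event::kRecvAck) return State::kEstablished;
            break;
        case State::kEstablished:
            if (e == Event::kCloseCall) return State::kFinWait1;  // sends FIN
            if (e == Event::kRecvFin) return State::kCloseWait;   // sends ACK
            break;
        case State::kFinWait1:
            if (e == Event::kRecvAck) return State::kFinWait2;
            break;
        case State::kFinWait2:
            if (e == Event::kRecvFin) return State::kTimeWait;  // sends ACK
            break;
        case State::kCloseWait:
            if (e == Event::kCloseCall) return State::kLastAck;  // sends FIN
            break;
        case State::kLastAck:
            if (e == Event::kRecvAck) return State::kClosed;
            break;
        case State::kTimeWait:
            if (e == Event::kTimeout2Msl) return State::kClosed;
            break;
    }
    return std::nullopt;  // transition not modelled in this sketch
}
```

Feeding the client's events (connect, SYN+ACK, close, ACK, FIN) ends in `TIME_WAIT`, while the server's events (listen, SYN, ACK, FIN, close, ACK) end in `CLOSED`, as in the captures above.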
- The role of `TIME_WAIT`
  - Reliably terminate a full-duplex TCP connection. If the server does not receive the client's final `ACK`, it retransmits its `FIN + ACK`; the client, still in `TIME_WAIT`, sends the `ACK` again and restarts the 2MSL timer
  - Allow old duplicate segments to expire in the network. If a connection is closed and a new connection is soon established between the same IP addresses and ports, delayed segments from the old connection could be mistaken for segments of the new one. Because `TIME_WAIT` lasts 2MSL, any such segments have disappeared before the same socket pair can be reused. As a consequence, calling `bind` on a port that has connections in the `TIME_WAIT` state fails with `EADDRINUSE`, so servers generally set the `SO_REUSEADDR` option when creating the socket, which lets the port be bound even while connections on it are in `TIME_WAIT`
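`SO_REUSEADDR` has to be set between `socket` and `bind`. A minimal sketch of a listening socket created that way; `ListenReuseAddr` and `ReuseAddrEnabled` are illustrative names, and whether the demo's `jc::ListenFd` sets the option is not shown in this walkthrough.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Create a TCP listening socket on 127.0.0.1:port with SO_REUSEADDR set
// *before* bind(), so a restarted server can bind the port even while
// old connections on it are still in TIME_WAIT. Returns the fd or -1.
int ListenReuseAddr(in_port_t port) {
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    int on = 1;
    if (::setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        ::close(fd);
        return -1;
    }
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        ::listen(fd, SOMAXCONN) < 0) {
        ::close(fd);
        return -1;
    }
    return fd;
}

// Read the option back to confirm it is enabled (non-zero).
bool ReuseAddrEnabled(int fd) {
    int value = 0;
    socklen_t len = sizeof(value);
    return ::getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &value, &len) == 0 &&
           value != 0;
}
```

Passing 12345 gives the demo's port. On Linux the option does not let two sockets listen on the same address and port at the same time; it only relaxes the check against connections lingering in `TIME_WAIT`.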
- Every TCP implementation must choose a value for MSL. RFC 793 sets MSL to 2 minutes ("For this specification the MSL is taken to be 2 minutes"), which makes 2MSL 4 minutes. On Ubuntu 18.04.5 LTS, whose kernel (`uname -r`) is `4.15.0-135-generic`, `TIME_WAIT` lasts 60 seconds, as the following code in `/usr/src/linux-headers-4.15.0-135-generic/include/net/tcp.h` shows
#define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT
* state, about 60 seconds */
#define TCP_FIN_TIMEOUT TCP_TIMEWAIT_LEN
/* BSD style FIN_WAIT2 deadlock breaker.
* It used to be 3min, new value is 60sec,
* to combine FIN-WAIT-2 timeout with
* TIME-WAIT timer.
*/
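- Kernel parameter (`net.ipv4.tcp_fin_timeout` sets how long an orphaned connection may stay in `FIN_WAIT_2`; the `TIME_WAIT` duration itself is the compile-time constant `TCP_TIMEWAIT_LEN`)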
sysctl -a | grep net.ipv4.tcp_fin_timeout
net.ipv4.tcp_fin_timeout = 60
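`sysctl` reads these parameters from files under `/proc/sys`, with the dots in the name replaced by slashes, so a program can read them directly. A minimal sketch; `ReadSysctlLong` is an illustrative name, and it returns nothing on systems without the file.

```cpp
#include <cassert>
#include <fstream>
#include <optional>
#include <string>

// Read a single integer from a /proc/sys file, the same value that
// `sysctl net.ipv4.tcp_fin_timeout` prints. Returns nullopt if the file
// cannot be opened (e.g. not Linux) or does not start with an integer.
std::optional<long> ReadSysctlLong(const std::string& path) {
    std::ifstream in(path);
    long value = 0;
    if (!(in >> value)) return std::nullopt;
    return value;
}
```

For example, `ReadSysctlLong("/proc/sys/net/ipv4/tcp_fin_timeout")` corresponds to the `sysctl` query above.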
- At this point the connection between the two parties is closed. In the server, press `n` over the following line to close the file descriptor of the listening socket
::close(listen_fd);
- The above is the normal process of establishing and closing a connection. If the client calls `connect` while the server is not running, the following packets are captured
12:50:57.975036 IP localhost.56262 > localhost.12345: Flags [S], seq 1183472988, win 65495, options [mss 65495,sackOK,TS val 991801517 ecr 0,nop,wscale 7], length 0
12:50:57.975051 IP localhost.12345 > localhost.56262: Flags [R.], seq 0, ack 1183472989, win 0, length 0
`R` stands for `RST`. The client calls `connect` to initiate a connection, but since no socket is listening on the port, an `RST` comes back. On receiving the `RST`, the client does not reply with an `ACK`; it releases the connection directly, and its state becomes `CLOSED`. `connect` returns -1 to indicate failure, with `errno` set to `ECONNREFUSED`. Similarly, if the client and server are connected and the server process later exits, the next data the client sends is answered with an `RST` that resets the connection
- The packets captured by tcpdump can be written to a file with the following command, and the file can be opened in Wireshark to inspect each packet more conveniently
sudo tcpdump -i any tcp port 12345 -w socket_debug.cap
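The refused connection can also be reproduced in code. This sketch (function names are hypothetical) binds a loopback port without calling `listen`, so nothing accepts connections there, and shows that `connect` to it fails with `ECONNREFUSED`, the error corresponding to the `RST` above. Keeping the bound socket open stops anything else from taking the port in the meantime.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cerrno>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Try to connect to 127.0.0.1:port. Returns 0 on success, otherwise the
// errno from socket()/connect(); a SYN answered with RST gives ECONNREFUSED.
int TryConnect(in_port_t port) {
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return errno;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    int result = ::connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    int err = (result < 0) ? errno : 0;  // save errno before close() can change it
    ::close(fd);
    return err;
}

// Bind a TCP socket to a kernel-chosen loopback port but never listen,
// so the port has no listener. Returns the fd (or -1) and stores the port.
int BindWithoutListen(in_port_t* port) {
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    socklen_t len = sizeof(addr);
    if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        ::getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) < 0) {
        ::close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}
```

Running `sudo tcpdump -i any tcp port <port>` while calling `TryConnect` should show the same `[S]` followed by `[R.]` pair as the capture above.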
UDP
- UDP sends messages directly, without establishing a connection. Change the TCP example to use UDP and monitor the UDP port with the following command
sudo tcpdump -i any udp port 12345
- The captured UDP packets look like this
09:19:53.027405 IP localhost.39094 > localhost.12345: UDP, length 12
09:19:53.029951 IP localhost.12345 > localhost.39094: UDP, length 15
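A minimal UDP counterpart of the demo, assuming loopback and a kernel-chosen port rather than 12345; the helper names are hypothetical. There is no `listen`, `accept`, or `connect`: the sender passes the destination address to `sendto`, and each `recvfrom` returns exactly one datagram, so message boundaries are preserved, unlike TCP.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Bind a UDP socket to 127.0.0.1 on a kernel-chosen port; returns the fd
// (or -1) and stores the port in *port.
int BindUdpLoopback(in_port_t* port) {
    int fd = ::socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    socklen_t len = sizeof(addr);
    if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        ::getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) < 0) {
        ::close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

// Send one datagram to 127.0.0.1:port. No handshake happens first: the
// first packet on the wire already carries the data.
bool SendDatagram(int fd, in_port_t port, const std::string& msg) {
    sockaddr_in dst{};
    dst.sin_family = AF_INET;
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    dst.sin_port = htons(port);
    ssize_t n = ::sendto(fd, msg.data(), msg.size(), 0,
                         reinterpret_cast<sockaddr*>(&dst), sizeof(dst));
    return n == static_cast<ssize_t>(msg.size());
}

// Receive one datagram; each recvfrom() returns exactly one message.
std::string RecvDatagram(int fd) {
    char buf[1500];
    ssize_t n = ::recvfrom(fd, buf, sizeof(buf), 0, nullptr, nullptr);
    return n < 0 ? std::string() : std::string(buf, static_cast<std::size_t>(n));
}
```

Two `SendDatagram` calls arrive as two separate `RecvDatagram` results, never merged or split, which is why the capture shows one `UDP, length N` line per message.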