Observe the TCP handshake with tcpdump while stepping through the code line by line in GDB
- Compile
./tcpdump_demo/build.sh
- Monitor the TCP port with tcpdump
sudo tcpdump -i any tcp port 12345
- GDB server
gdb ./debug/server
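- Press `Ctrl + x + a` to enter TUI mode, then run `start` to stop at `main`; the line of `int main()` will be highlighted
- Input `next` or `n` and TUI will highlight the next line of code; press Enter to repeat the command until the following line, which calls `bind` and `listen`, has run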
int listen_fd = jc::ListenFd("0.0.0.0", 12345);
- The status of the TCP port will change to `LISTEN`
netstat -antp | grep 12345
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 0.0.0.0:12345 0.0.0.0:* LISTEN 65639/server
- GDB client in another shell
gdb ./debug/client
- Press `Ctrl + x + a` and run `start`, then `n` over the following line of code to call `connect`
int fd = jc::Connect("0.0.0.0", 12345);
- tcpdump will display the following log
11:58:17.059466 IP localhost.56234 > localhost.12345: Flags [S], seq 2271595046, win 65495, options [mss 65495,sackOK,TS val 988640603 ecr 0,nop,wscale 7], length 0
11:58:17.068493 IP localhost.12345 > localhost.56234: Flags [S.], seq 2153659986, ack 2271595047, win 65483, options [mss 65495,sackOK,TS val 988640612 ecr 988640603,nop,wscale 7], length 0
11:58:17.068515 IP localhost.56234 > localhost.12345: Flags [.], ack 1, win 512, options [nop,nop,TS val 988640613 ecr 988640612], length 0
- The first column is the time, accurate to microseconds. The left side of `>` is the sender and the right side is the receiver. `win` is the window size (a 16-bit field, maximum 65535). For ease of analysis, extract the key parts
localhost.56234 > localhost.12345: Flags [S], seq 2271595046
localhost.12345 > localhost.56234: Flags [S.], seq 2153659986, ack 2271595047
localhost.56234 > localhost.12345: Flags [.], ack 1
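The `Flags [...]` field in these lines packs one character per TCP flag. As a minimal sketch, the mapping for the characters that appear in this walkthrough can be written as a lookup. `ExpandTcpdumpFlags` is a hypothetical helper name, not part of tcpdump or of this repository.

```cpp
#include <string>
#include <vector>

// Expand the characters inside tcpdump's "Flags [...]" field into flag names.
// Hypothetical helper: only the characters seen in this walkthrough are
// mapped, anything else is skipped.
std::vector<std::string> ExpandTcpdumpFlags(const std::string& flags) {
  std::vector<std::string> names;
  for (char c : flags) {
    switch (c) {
      case 'S': names.push_back("SYN"); break;
      case 'F': names.push_back("FIN"); break;
      case 'P': names.push_back("PSH"); break;
      case 'R': names.push_back("RST"); break;
      case '.': names.push_back("ACK"); break;
      default: break;
    }
  }
  return names;
}
```

For example, `ExpandTcpdumpFlags("S.")` yields `SYN` and `ACK`, which is the second segment of the handshake.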
- A socket pair uniquely identifies a TCP connection on the network. It is a four-tuple: the local IP address and TCP port number, and the remote IP address and TCP port number.
- The port number is 16 bits, ranging from 0 to 65535, and can be divided into three segments:
  - 0-1023: well-known ports, also the Unix reserved ports, which can only be assigned to sockets of privileged user processes. They are allocated and controlled by IANA (the Internet Assigned Numbers Authority), and the same port number is assigned to the same service, such as SSH on port 22 and HTTP on port 80
  - 1024-49151: registered ports, which are not controlled by IANA, but IANA registers them and publishes a list of their uses
  - 49152-65535: dynamic or private ports, used as temporary (ephemeral) ports, 49152 = 65536 * 0.75
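The three ranges above can be expressed as a small classifier. This is only an illustrative sketch; `ClassifyPort` and its return strings are made-up names.

```cpp
#include <cstdint>
#include <string>

// Map a port number to the range it falls in, using the boundaries
// listed above. Hypothetical helper for illustration only.
std::string ClassifyPort(std::uint16_t port) {
  if (port <= 1023) return "well-known";
  if (port <= 49151) return "registered";
  return "dynamic";
}
```

With the ports from this walkthrough, `ClassifyPort(12345)` gives `registered` for the server port and `ClassifyPort(56234)` gives `dynamic` for the client port chosen by the kernel.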
- The captured TCP packets show the three-way handshake that establishes a TCP connection. `S` represents `SYN`, `.` represents `ACK`, and the process is as follows
  - 1st handshake: After the server calls `bind` and `listen`, it is in the `LISTEN` state. The client calls `connect` to send a `SYN` to the server and enters the `SYN_SENT` state. The `SYN` sequence number is `2271595046`. This sequence number is a randomly generated ISN (Initial Sequence Number) that changes over time
  - 2nd handshake: After receiving `SYN`, the server sends a `SYN + ACK` to the client, and the state changes from `LISTEN` to `SYN_RCVD`. The sequence number of this `SYN` is also random, and the acknowledgment number of the `ACK` is the sequence number of the `SYN` sent by the client plus 1
  - 3rd handshake: After the client receives the server's `SYN + ACK`, it sends an `ACK` to the server, and the state changes from `SYN_SENT` to `ESTABLISHED`. Similarly, the acknowledgment number of the `ACK` is the sequence number of the `SYN` sent by the server plus 1. It is displayed here as 1 because tcpdump uses relative sequence numbers by default: for segments without `SYN`, it displays the offset relative to the ISN of that direction
  - Finally, the server receives `ACK`, and the state changes from `SYN_RCVD` to `ESTABLISHED`. Both parties enter the `ESTABLISHED` state, and the TCP connection is established
  - Except for the first `SYN` of the three-way handshake, sent by the side that actively opens the connection, where the `ACK` flag is 0, the `ACK` flag of all other TCP segments of the connection is set to 1. The 32-bit acknowledgment number is always part of the TCP header, so setting `ACK` to 1 just makes use of this field at no additional cost
  - Each `SYN` can contain multiple TCP options. Commonly used TCP options are as follows (the latter two are called RFC 1323 options, also called long fat pipe options, because high-bandwidth or long-delay networks are called long fat pipes)
    - MSS (maximum segment size): 16 bits, with a maximum value of 65535. The end sending `SYN` notifies the other end of its maximum segment size. The Ethernet MTU is 1500 bytes. If an IP datagram is larger than the MTU, it is fragmented, and the fragments are not reassembled until they reach the destination. The purpose of MSS is to avoid fragmentation. It is usually set to the MTU minus the fixed lengths of the IP and TCP headers. The MSS of IPv4 over Ethernet is 1500 - 20 (TCP header) - 20 (IPv4 header) = 1460. The IPv6 header is 40 bytes, corresponding to an MSS of 1440
    - Window scale: the window field in the TCP header is 16 bits, with a maximum value of 65535. Nowadays, larger windows are required to obtain greater throughput. This option specifies the number of bits (0-14) by which the window is shifted left, so the maximum window is close to 1 GB (65535 * 2 ^ 14). Both end systems must support this option for it to be used. The option is affected by the `SO_RCVBUF` socket option
    - Timestamps: prevents possible data corruption caused by old, delayed, or duplicated segments. Programmers do not need to consider this option
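The arithmetic in the handshake description can be checked with a few one-line functions. This is a sketch under simple assumptions: the TCP header is taken as 20 bytes without options, and the function names are invented for illustration.

```cpp
#include <cstdint>

// Acknowledgment number that answers a SYN: the SYN consumes one sequence
// number, and sequence numbers wrap around modulo 2^32.
constexpr std::uint32_t AckForSyn(std::uint32_t syn_seq) {
  return static_cast<std::uint32_t>(syn_seq + 1u);
}

// MSS derived from the link MTU, assuming a 20-byte TCP header without
// options. ip_header_len is 20 for IPv4 and 40 for IPv6.
constexpr int MssForMtu(int mtu, int ip_header_len) {
  return mtu - ip_header_len - 20;
}

// Effective window: the 16-bit window field shifted left by the
// window scale value (0-14).
constexpr std::uint32_t ScaledWindow(std::uint16_t raw_window, unsigned shift) {
  return static_cast<std::uint32_t>(raw_window) << shift;
}
```

`AckForSyn(2271595046u)` returns 2271595047, matching the `ack` in the second captured segment. `MssForMtu(1500, 20)` returns 1460 and `ScaledWindow(65535, 14)` returns 1073725440, just under 1 GB. If the `win 512` values in the later segments are the raw header field, the negotiated `wscale 7` would correspond to `ScaledWindow(512, 7)`, that is 65536 bytes.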
- The server and client enter the `ESTABLISHED` state
netstat -antp | grep 12345
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 0.0.0.0:12345 0.0.0.0:* LISTEN 65639/server
tcp 0 0 127.0.0.1:56234 127.0.0.1:12345 ESTABLISHED 65633/client
tcp 0 0 127.0.0.1:12345 127.0.0.1:56234 ESTABLISHED 65639/server
- Continue with `n` in the server. `accept` takes the first completed connection from the queue of `ESTABLISHED` connections and returns the file descriptor of the new connection. If the queue is empty, the process is blocked
int accept_fd = jc::AcceptFd(listen_fd);
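The point that the handshake finishes in the kernel before `accept` is called can be reproduced in a single process. The sketch below uses plain POSIX calls on the loopback interface rather than the `jc::` helpers, whose implementation is not shown here; `ConnectCompletesBeforeAccept` is a hypothetical name.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Returns true if a loopback client finishes connect() while the server has
// only called listen(), and accept() afterwards returns that connection.
bool ConnectCompletesBeforeAccept() {
  int listen_fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (listen_fd < 0) return false;

  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0;  // let the kernel pick a free port
  socklen_t len = sizeof(addr);
  bool listening =
      ::bind(listen_fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0 &&
      ::listen(listen_fd, 1) == 0 &&
      ::getsockname(listen_fd, reinterpret_cast<sockaddr*>(&addr), &len) == 0;

  // The blocking connect() returns once the kernel has completed the
  // three-way handshake; accept() has not been called at this point.
  int client_fd = listening ? ::socket(AF_INET, SOCK_STREAM, 0) : -1;
  bool connected =
      client_fd >= 0 &&
      ::connect(client_fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0;

  // The connection is already in the completed queue, so accept() returns it.
  int accept_fd = connected ? ::accept(listen_fd, nullptr, nullptr) : -1;
  bool accepted = accept_fd >= 0;

  if (accept_fd >= 0) ::close(accept_fd);
  if (client_fd >= 0) ::close(client_fd);
  ::close(listen_fd);
  return connected && accepted;
}
```

Running it under `tcpdump -i lo` should show the same three handshake segments as above, although the port is chosen by the kernel instead of being 12345.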
- Since the client and server have established a connection, `accept` will return directly without blocking. Continue with `n` over the following code to call `send`
jc::Send(accept_fd, "welcome to join");
- tcpdump log
11:59:19.715745 IP localhost.12345 > localhost.56234: Flags [P.], seq 1:16, ack 1, win 512, options [nop,nop,TS val 988703260 ecr 988640613], length 15
11:59:19.715757 IP localhost.56234 > localhost.12345: Flags [.], ack 16, win 512, options [nop,nop,TS val 988703260 ecr 988703260], length 0
- `P` stands for `PSH`. When it is set, the segment carries data that the receiver should pass to its application without waiting for more. In `seq 1:16`, 1 is the relative sequence number of the first byte and 16 is the data length plus 1 (the length of `welcome to join` is 15), so the next segment the server sends will start at 16. `ack 1` is a relative offset too. The ack replied by the client is the value after the colon of the sequence number, that is, the next byte it expects. Note that these TCP packets are generated only by the server calling `send`. At this time the client application has not called `recv`; the data was acknowledged by the client's kernel and sits in its receive buffer. It can be seen that a successful `send` does not mean the application on the other side has received the data
  - Each TCP socket has a send buffer, the size of which can be changed using the `SO_SNDBUF` socket option. When an application calls `send`, the kernel copies the data from the application's buffer to the send buffer of the socket. If the send buffer cannot hold the data, a blocking socket keeps the application blocked until all data in the application buffer is copied to the send buffer. The return value of `send` is the length of the data put into the buffer. If it is smaller than the length specified in the parameter (for example on a non-blocking socket), the rest of the data was not put in and is lost unless the application sends it again. After `send` returns, it only means that the original application buffer can be reused, and it does not mean that the other end has received the data
- Kernel parameters for `Ubuntu 18.04.5 LTS`
sysctl -a | grep "net.ipv4.tcp_.*mem"
net.ipv4.tcp_mem = 22062 29417 44124
net.ipv4.tcp_rmem = 4096 131072 6291456 # min, default, max: max recv buf is 6291456 bytes (6 MiB)
net.ipv4.tcp_wmem = 4096 16384 4194304 # min, default, max: max send buf is 4194304 bytes (4 MiB)
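Because `send` may queue fewer bytes than requested, callers often wrap it in a loop. The following is a minimal sketch of that pattern with plain POSIX calls. `SendAll` is a hypothetical helper and is not the `jc::Send` used in this walkthrough, whose implementation is not shown here.

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>

// Keep calling send() until every byte has been copied into the socket's
// send buffer. Returns false on any error other than an interrupted call.
bool SendAll(int fd, const char* data, std::size_t len) {
  std::size_t sent = 0;
  while (sent < len) {
    ssize_t n = ::send(fd, data + sent, len - sent, 0);
    if (n < 0) {
      if (errno == EINTR) continue;  // interrupted by a signal, retry
      return false;
    }
    sent += static_cast<std::size_t>(n);  // advance past the queued bytes
  }
  return true;
}
```

A `true` result still only means the data reached the local send buffer, as described above. On a non-blocking socket this sketch would return `false` on `EAGAIN`; waiting for writability is left out to keep it short.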
- In the client, `n` to call `recv` to get data from the buffer. TCP is a stream-oriented protocol, so there is no guarantee that each `recv` returns exactly the data of one `send`. To reduce the number of small packets sent, the sender can use the Nagle algorithm. In network programming, both parties need to agree on an application-layer protocol to parse the data
jc::PrintReceiveMessage(fd);
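One common application-layer convention for a byte stream is to prefix each message with its length. The sketch below assumes a 4-byte big-endian length prefix, which is a choice made for this example and not the protocol used by this repository; `EncodeFrame` and `DecodeFrames` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Prefix a message with its length as a 4-byte big-endian integer.
std::string EncodeFrame(const std::string& msg) {
  std::uint32_t n = static_cast<std::uint32_t>(msg.size());
  std::string out;
  for (int shift = 24; shift >= 0; shift -= 8) {
    out.push_back(static_cast<char>((n >> shift) & 0xFF));
  }
  out += msg;
  return out;
}

// Extract every complete frame from `buffer`. Bytes of a trailing partial
// frame stay in `buffer` until more data has been appended.
std::vector<std::string> DecodeFrames(std::string& buffer) {
  std::vector<std::string> messages;
  std::size_t pos = 0;
  while (buffer.size() - pos >= 4) {
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i) {
      n = (n << 8) | static_cast<unsigned char>(buffer[pos + i]);
    }
    if (buffer.size() - pos - 4 < n) break;  // body not fully received yet
    messages.push_back(buffer.substr(pos + 4, n));
    pos += 4 + static_cast<std::size_t>(n);
  }
  buffer.erase(0, pos);
  return messages;
}
```

The receiver appends whatever each `recv` returns to `buffer` and calls `DecodeFrames`, so message boundaries survive even when one `recv` returns half a message or several messages at once.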
- `n` in the client to close the connection
::close(fd);
- tcpdump log
12:00:42.379997 IP localhost.56234 > localhost.12345: Flags [F.], seq 1, ack 16, win 512, options [nop,nop,TS val 988785924 ecr 988703260], length 0
12:00:42.383661 IP localhost.12345 > localhost.56234: Flags [.], ack 2, win 512, options [nop,nop,TS val 988785928 ecr 988785924], length 0
- These are the first two of the four waves that close the TCP connection, where `F` stands for `FIN`
  - 1st wave: the client calls `close` to send `FIN + ACK` to the server and enters the `FIN_WAIT_1` state
  - 2nd wave: After receiving `FIN + ACK`, the server sends an `ACK` and enters the `CLOSE_WAIT` state. This state allows the server to continue sending unfinished data. Because there may be unfinished data, the server sends its `ACK` and its own `FIN` separately, so closing the connection takes one more segment than establishing it. After the client receives `ACK`, the state changes from `FIN_WAIT_1` to `FIN_WAIT_2`
- The client enters the `FIN_WAIT_2` state (shown as `FIN_WAIT2` by netstat), and the server enters the `CLOSE_WAIT` state
netstat -antp | grep 12345
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 0.0.0.0:12345 0.0.0.0:* LISTEN 65639/server
tcp 0 0 127.0.0.1:56234 127.0.0.1:12345 FIN_WAIT2 -
tcp 1 0 127.0.0.1:12345 127.0.0.1:56234 CLOSE_WAIT 65639/server
- `n` in the server to close the connection
::close(accept_fd);
- tcpdump log
12:01:24.794975 IP localhost.12345 > localhost.56234: Flags [F.], seq 16, ack 2, win 512, options [nop,nop,TS val 988828339 ecr 988785924], length 0
12:01:24.794993 IP localhost.56234 > localhost.12345: Flags [.], ack 17, win 512, options [nop,nop,TS val 988828339 ecr 988828339], length 0
- These are the last two of the four waves that close the TCP connection
  - 3rd wave: the server calls `close` to send `FIN + ACK` to the client, and the state changes from `CLOSE_WAIT` to `LAST_ACK`
  - 4th wave: After receiving `FIN + ACK`, the client sends an `ACK`, and the state changes from `FIN_WAIT_2` to `TIME_WAIT`. After receiving `ACK`, the server changes its state from `LAST_ACK` to `CLOSED`. After `2MSL` (twice the Maximum Segment Lifetime), the client state changes from `TIME_WAIT` to `CLOSED`
  - The role of `TIME_WAIT`
    - Reliably terminate TCP full-duplex connections. If the server does not receive the last `ACK` sent by the client, it will resend `FIN + ACK`, and the client will resend `ACK` and restart the `2MSL` timer
    - Allow old duplicate segments to disappear in the network. For example, if a connection is closed and another connection is established between the same IP addresses and ports soon afterwards, a delayed segment of the old connection could be mistaken for data of the new one. `TIME_WAIT` prevents this from happening. For ports in the `TIME_WAIT` state, calling `bind` will fail, so the `SO_REUSEADDR` option is generally set when creating a socket, so that the port can be bound even if it is in the `TIME_WAIT` state
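Setting `SO_REUSEADDR` as mentioned above takes one `setsockopt` call, made after `socket` and before `bind`. A minimal sketch, where `ReusableTcpSocket` is a hypothetical helper name and not a function from this repository:

```cpp
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Create a TCP socket with SO_REUSEADDR enabled so that a later bind() can
// succeed while an old connection on the port is still in TIME_WAIT.
// Returns the file descriptor, or -1 on failure.
int ReusableTcpSocket() {
  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;
  int on = 1;
  if (::setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) != 0) {
    ::close(fd);
    return -1;
  }
  return fd;
}
```

The option has to be set before `bind` is called on the socket; setting it afterwards does not help a `bind` that has already failed.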
- Any TCP implementation must choose a value for MSL. RFC 793 stipulates that MSL is 2 minutes ("For this specification the MSL is taken to be 2 minutes"). `Ubuntu 18.04.5 LTS` has a `TIME_WAIT` duration of 60 seconds. `uname -r` is `4.15.0-135-generic`, and the following code can be seen in `/usr/src/linux-headers-4.15.0-135-generic/include/net/tcp.h`
#define TCP_TIMEWAIT_LEN \
(60 * HZ) /* how long to wait to destroy TIME-WAIT \
* state, about 60 seconds */
#define TCP_FIN_TIMEOUT TCP_TIMEWAIT_LEN
/* BSD style FIN_WAIT2 deadlock breaker.
* It used to be 3min, new value is 60sec,
* to combine FIN-WAIT-2 timeout with
* TIME-WAIT timer.
*/
- Kernel parameters
sysctl -a | grep net.ipv4.tcp_fin_timeout
net.ipv4.tcp_fin_timeout = 60
- At this point, the connection between the two parties has been closed. In the server, `n` over the following code to close the file descriptor of the listening port
::close(listen_fd);
- The above is the normal connection and disconnection process. If the client calls `connect` without running the server, the captured packets are as follows
12:50:57.975036 IP localhost.56262 > localhost.12345: Flags [S], seq 1183472988, win 65495, options [mss 65495,sackOK,TS val 991801517 ecr 0,nop,wscale 7], length 0
12:50:57.975051 IP localhost.12345 > localhost.56262: Flags [R.], seq 0, ack 1183472989, win 0, length 0
- `R` stands for `RST`. The client calls `connect` to initiate a connection. Since the port is not open, a `RST` is returned. After receiving `RST`, the client does not need to return `ACK` and directly releases the connection. The state changes to `CLOSED`, and `connect` returns -1 to indicate that the call failed. Similarly, if the client and server are connected normally and the server process exits after a while, a client that then sends a request to the server will also receive a `RST` that resets the connection
- The packets captured by tcpdump can be written to a file using the following command. You can open the file with Wireshark to see the contents of the packets more intuitively.
sudo tcpdump -i any tcp port 12345 -w socket_debug.cap
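The refused connection described above can also be observed from code: when the `RST` arrives, `connect` fails and `errno` is `ECONNREFUSED`. The sketch below uses plain POSIX calls on the loopback address instead of `jc::Connect`; `ConnectErrno` is a hypothetical helper name.

```cpp
#include <arpa/inet.h>
#include <cerrno>
#include <cstdint>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Try to connect to 127.0.0.1:port. Returns 0 if the connection was
// established, otherwise the errno value reported by the failing call.
int ConnectErrno(std::uint16_t port) {
  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return errno;

  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = htons(port);

  int err = 0;
  if (::connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
    err = errno;  // save before close() can overwrite it
  }
  ::close(fd);
  return err;
}
```

With no server listening on port 12345, `ConnectErrno(12345)` is expected to return `ECONNREFUSED`, and tcpdump should show the same `[S]` and `[R.]` pair as in the capture above.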
UDP
- UDP can send messages directly without establishing a connection. Change the TCP example to UDP and use the following command to monitor the UDP port
sudo tcpdump -i any udp port 12345
- The captured UDP packets look like this
09:19:53.027405 IP localhost.39094 > localhost.12345: UDP, length 12
09:19:53.029951 IP localhost.12345 > localhost.39094: UDP, length 15
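The capture shows only the two datagrams, with no handshake segments before them. A minimal sketch of the same idea with plain POSIX calls follows; the UDP version of the example program is not shown here, so `UdpRoundTrip` and its structure are assumptions made for illustration.

```cpp
#include <arpa/inet.h>
#include <cstddef>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Send one datagram to a loopback receiver without any connection setup and
// return the payload that arrived (empty string on failure).
std::string UdpRoundTrip(const std::string& msg) {
  int rx = ::socket(AF_INET, SOCK_DGRAM, 0);
  int tx = ::socket(AF_INET, SOCK_DGRAM, 0);
  std::string received;

  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0;  // let the kernel pick a free port for the receiver
  socklen_t len = sizeof(addr);

  if (rx >= 0 && tx >= 0 &&
      ::bind(rx, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0 &&
      ::getsockname(rx, reinterpret_cast<sockaddr*>(&addr), &len) == 0 &&
      // No connect() and no listen(): the first packet already carries data.
      ::sendto(tx, msg.data(), msg.size(), 0,
               reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) ==
          static_cast<ssize_t>(msg.size())) {
    char buf[1024];
    ssize_t n = ::recvfrom(rx, buf, sizeof(buf), 0, nullptr, nullptr);
    if (n > 0) received.assign(buf, static_cast<std::size_t>(n));
  }

  if (rx >= 0) ::close(rx);
  if (tx >= 0) ::close(tx);
  return received;
}
```

Unlike TCP, each `sendto` maps to one datagram, so a 15-byte payload appears in tcpdump as a single `UDP, length 15` line.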