Wireshark-users: Re: [Wireshark-users] TCP question: retransmission or prodding the peer?
From: Bill Meier <wmeier@xxxxxxxxxxx>
Date: Thu, 20 Feb 2014 14:21:14 -0500
On 2/20/2014 2:05 PM, Bill Meier wrote:
> On 2/20/2014 6:04 AM, netztier@xxxxxxxxxx wrote:
> > Hi all
> >
> > I am trying to track down a problem with an embedded device (card
> > reader, attached to a printer/copier) which is part of a "follow me
> > printing" solution: User starts print job, walks to the next
> > available print machine, inserts card/badge, gets shown the list of
> > his/her queued jobs, selects one or more and prints it, and his card
> > gets billed by-the-page, etc etc.
> >
> > This is usually done using a sequence of TCP sessions between card
> > reader and card server. Eventually, the card server will notify the
> > print server to push the selected print job to the printer, and will
> > maintain a flow of packets to the card reader, sending a billing
> > notification for every single page printed.
> >
> > Every so often, there seems to be a stall in communication between
> > the card reader and the server, but only during the very first TCP
> > session. After the full three way handshake and two or three more
> > packets, there is a stall of ca 2.5 seconds. This delay is
> > noticeable to the user - and this is what we're trying to track
> > down.
> >
> > After that delay, the card server sends a new packet with:
> >
> > - 1 byte payload (and 1 byte less of padding in the IP header)
> > - PSH set
> > - the **same** SEQ/ACK numbers as the packet before the delay
> >   (see frames 9 and 10).
> >
> > A similar effect can be observed in frames 5 and 6, but there the
> > "delay" is only 7.5ms. This time, the card reader resends a packet
> > to the server.
> >
> > - 1 byte payload (and 1 byte less of padding in the IP header)
> > - PSH set
> > - the **same** SEQ/ACK numbers as the packet before the delay
> >   (see frames 9 and 10).
> >
> > The capture was done on a passive 10Mbit/s hub between the card
> > reader and its switch port (Cisco 2960S), using the onboard Intel
> > NIC of a Lenovo T520.
> >
> > I was considering that the card reader's ACK might have got lost
> > somewhere or had CRC errors, or that the Intel NIC might have
> > forwarded such frames to libpcap. However, I doubt that there are
> > any invalid frames at all.
> >
> > During the months we spent tracking down the issue, the Cisco's
> > switch port never saw any invalid incoming frame (CRC, undersize
> > etc). During the capture with the 10Mbit/s hub, there wasn't even a
> > single collision on that given port, although it was running
> > "10-half" at the time. Upstream bandwidth from the access switch is
> > plentiful, and we have no indication that quality suffers anywhere
> > in the network - and they're doing VoIP and all.
> >
> > QUESTIONS
> > =========
> >
> > a) can these observations be called "retransmissions"?
>
> No: see discussion below
>
> > b) if yes, is there a reason why Wireshark's [ Version 1.10.5 (SVN
> > Rev 54262 from /trunk-1.10) ] SEQ/ACK analysis would not detect
> > them as such?
>
> N/A
>
> > c) are there any knobs to turn in Wireshark to make this form of
> > "retransmissions" show up ?
>
> N/A
>
> > d) is sending "same SEQ/ACK plus PSH" a known form of "cattle
> > prodding a lagging TCP peer"?
>
> N/A
>
> > e) is 2.5sec a known "wait time" or "timeout" in common TCP
> > implementations? (from which I will conclude that there must've
> > been some packet loss all the same)
>
> Discussion:
>
> 1. It might be useful if you could provide a short capture of a good
>    sequence (without the 2.5 sec delay).
>
> 2. I have several observations:
>
>    a. The basic request/response sequence is as follows:
>
>           Time      A ... B
>       1.  0.000000   -->   1 byte:   seq:1
>       2.  0.200000   <--   ack:2 seq:1 len:0
>       3.  0.200010   -->   90 bytes: seq:2
>       4.  0.400000   <--   ack:92 seq:1 len:0
>                     (interval)
>       5.  2.900000   <--   1 byte:   ack:92 seq:1 len:1
>       6.  3.100000   -->   ack:2 seq:92 len:0
>       7.  3.100010   <--   10 bytes: ack:92 seq:2
>       8.  3.300000   -->   ack:11 seq:92
>
> So: The fact that the seq & ack in 4 and 5 are the same is just as
> expected:
>
>    packet 4 is just an "ack" with no data
>    packet 5 is data (with the same seq/ack as the previous packet)
>
> A pure ack carries no payload and so does not advance the sender's
> sequence number; the next data segment therefore starts at that same
> sequence number.
>
> However: for some reason, B took 2.5 secs to send (the start of) a
> response to packet 3 in packet 5. We know that B received packet 3
> immediately because B sent an ack in packet 4 (after the usual 200 ms
> delay).
>
> So: The "B" application failed to respond immediately even though we
> know that "B" received the packet at the network level. I've no idea
> as to why. Does "only during the first TCP connection" suggest some
> kind of initial setup going on in "B" ?
>
> That being said: there's another issue having to do with the "send 1
> byte, wait for ack, send remaining bytes" pattern. Rather than me
> trying to explain: Do a web search on "Nagle algorithm" and
> TCP_NODELAY for an explanation. Basically: the software isn't
> programmed quite right (IMHO).
>
> Another thing I find a bit interesting: The window size advertised by
> B (card server ?) just keeps decreasing as data is received from A.
> Normally that would mean that the app isn't taking the data from the
> network layer. However, that appears not to be the case since the
> request/response sequence seems to complete OK.
>
> What kind of system is the card server? Some kind of minimal system ?
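To make the "same seq/ack is expected" argument concrete, here is a minimal Python sketch of the bookkeeping involved. It is not Wireshark's actual SEQ/ACK analysis, which handles many more cases; the `Segment` type and `classify` function are hypothetical names, and the trace is just the eight packets from the table above (packet 8 is assumed to carry no payload).

```python
from dataclasses import dataclass


@dataclass
class Segment:
    frame: int    # packet number from the table
    sender: str   # "A" or "B"
    seq: int      # relative sequence number
    length: int   # TCP payload length in bytes


def classify(segments):
    """Label each segment 'pure ack', 'new data' or 'retransmission'."""
    next_seq = {}  # per sender: first sequence number not yet sent
    labels = {}
    for s in segments:
        expected = next_seq.get(s.sender, s.seq)
        if s.length == 0:
            # No payload: the sender's sequence number does not advance.
            labels[s.frame] = "pure ack"
        elif s.seq < expected:
            # Payload that starts below what was already sent.
            labels[s.frame] = "retransmission"
        else:
            labels[s.frame] = "new data"
        next_seq[s.sender] = max(expected, s.seq + s.length)
    return labels


# The request/response sequence from the table (A --> B, B <-- A).
TRACE = [
    Segment(1, "A", 1, 1),
    Segment(2, "B", 1, 0),
    Segment(3, "A", 2, 90),
    Segment(4, "B", 1, 0),
    Segment(5, "B", 1, 1),
    Segment(6, "A", 92, 0),
    Segment(7, "B", 2, 10),
    Segment(8, "A", 92, 0),  # assumption: len 0, the table gives no length
]

if __name__ == "__main__":
    for frame, label in classify(TRACE).items():
        print(frame, label)
```

Under these assumptions packet 5 comes out as new data, not a retransmission: packet 4 had no payload, so seq 1 had not yet been used by B.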
Actually: I see that the continually decrementing window size advertisement applies to both the card reader and the card server.
Given that we're talking embedded devices, have you discussed this issue with the vendor ?
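For reference when talking to the vendor, this is roughly what the two usual remedies for the "send 1 byte, wait for ack, send the rest" pattern look like with a sockets API. It is only a sketch in Python: the card reader's firmware and its API are unknown to me, and `make_request_socket` and `build_message` are hypothetical helper names. The first remedy is to disable the Nagle algorithm with TCP_NODELAY; the second is to hand the whole request to the stack in a single write so there is no small first segment waiting on a delayed ack.

```python
import socket


def make_request_socket(disable_nagle=True):
    """Create a TCP socket, optionally with the Nagle algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if disable_nagle:
        # TCP_NODELAY: send small segments immediately instead of
        # holding them back until outstanding data has been acked.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def build_message(header: bytes, body: bytes) -> bytes:
    """Join header and body so they can go out in one write."""
    return header + body


# Instead of two writes:
#     sock.sendall(header)
#     sock.sendall(body)
# a single write avoids the small leading segment altogether:
#     sock.sendall(build_message(header, body))
```

Either change alone should remove the roughly 200 ms gaps between packets 1/3 and 5/7; neither explains the 2.5 sec stall, which looks like an application-level delay.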