Wireshark-users: Re: [Wireshark-users] [semi-OT] request second opinion on possible bugs in OS TC

From: Alan Tu <8libra@xxxxxxxxx>
Date: Mon, 17 Jan 2011 05:27:01 +0000
Sake, I have additional evidence supporting the theory that no
follow-on ACKs were received by the server, based on the TCP timestamp
option data. There is no clean display filter for TCP timestamps, so
I output in PDML format, as in

tshark -r sack_fail.pcap -T pdml

and search on the string tcp.options.time_stamp.
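
Searching by hand gets tedious on a long capture, so here is a minimal
Python sketch of how the PDML output could be post-processed. It assumes
the show attribute of the timestamp field reads like
"Timestamps: TSval N, TSecr M". That text, the sample fragment, and the
helper name extract_timestamps are assumptions for illustration, so the
regular expression may need adjusting to what your tshark version emits.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical PDML fragment, not taken from the real capture. Real tshark
# output carries many more fields, and the "show" text may differ by version.
SAMPLE_PDML = """<pdml>
  <packet>
    <proto name="tcp">
      <field name="tcp.srcport" show="80"/>
      <field name="tcp.options.time_stamp" show="Timestamps: TSval 1000, TSecr 500"/>
    </proto>
  </packet>
  <packet>
    <proto name="tcp">
      <field name="tcp.srcport" show="80"/>
      <field name="tcp.options.time_stamp" show="Timestamps: TSval 1300, TSecr 500"/>
    </proto>
  </packet>
</pdml>"""

# Assumed layout of the "show" attribute (see the note above).
TS_RE = re.compile(r"TSval (\d+), TSecr (\d+)")


def extract_timestamps(pdml_text):
    """Return (srcport, tsval, tsecr) for each packet carrying a timestamp option."""
    rows = []
    root = ET.fromstring(pdml_text)
    for packet in root.iter("packet"):
        srcport = None
        ts = None
        for field in packet.iter("field"):
            name = field.get("name")
            if name == "tcp.srcport":
                srcport = int(field.get("show"))
            elif name == "tcp.options.time_stamp":
                match = TS_RE.search(field.get("show", ""))
                if match:
                    ts = (int(match.group(1)), int(match.group(2)))
        if ts is not None:
            rows.append((srcport, ts[0], ts[1]))
    return rows


for row in extract_timestamps(SAMPLE_PDML):
    print(row)
```

For a real capture, the PDML text would come from the tshark command
above redirected to a file and read in with open().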

TCP timestamps are defined in RFC1323. Conceptually, each packet
contains a timestamp (TSval), and the timestamp of the packet being
acknowledged (TSecr). The following is from RFC1323:

"(1)  The connection state is augmented with two 32-bit slots:
TS.Recent holds a timestamp to be echoed in TSecr whenever a segment
is sent, and Last.ACK.sent holds the ACK field from the last segment
sent.  Last.ACK.sent will equal RCV.NXT except when ACKs have been
delayed.

(2)  If Last.ACK.sent falls within the range of sequence numbers of an
incoming segment:
SEG.SEQ <= Last.ACK.sent < SEG.SEQ + SEG.LEN
then the TSval from the segment is copied to TS.Recent; otherwise, the
TSval is ignored.

(3)  When a TSopt is sent, its TSecr field is set to the current
TS.Recent value."
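
The three quoted rules can be sketched in Python roughly as follows. This
is a literal reading of the text above, not any particular stack's
implementation. The class and method names are made up for illustration,
and sequence-number wraparound is ignored to keep it short.

```python
class TimestampState:
    """Receiver-side state from rule (1): TS.Recent and Last.ACK.sent."""

    def __init__(self, rcv_nxt):
        self.ts_recent = 0             # TS.Recent
        self.last_ack_sent = rcv_nxt   # Last.ACK.sent

    def on_segment(self, seg_seq, seg_len, tsval):
        # Rule (2): copy TSval only if Last.ACK.sent falls inside the
        # sequence range of the incoming segment; otherwise ignore it.
        # Note: with seg_len == 0 this range is empty, so the literal
        # form of the test never matches a zero-length segment.
        if seg_seq <= self.last_ack_sent < seg_seq + seg_len:
            self.ts_recent = tsval

    def on_send(self, ack):
        # Rule (3): the outgoing TSecr is the current TS.Recent.
        # Also remember the ACK field we are sending, per rule (1).
        self.last_ack_sent = ack
        return self.ts_recent


state = TimestampState(rcv_nxt=1)
state.on_segment(seg_seq=1, seg_len=100, tsval=5000)   # in range, copied
print(state.on_send(ack=101))
state.on_segment(seg_seq=1, seg_len=100, tsval=6000)   # retransmit, ignored
print(state.on_send(ack=101))
```

In this sketch a retransmission of already-acknowledged data does not
move TS.Recent, because Last.ACK.sent has moved past that segment.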

For the difficult connection, the client correctly increments its
TSval value. If the ACKs from the client were being received by the
server, the TSecr sent by the server would advance per rules (2) and
(3) above. However, the server's TSecr consistently refers to the TSval
of the packet containing the HTTP GET request. Therefore it stands to
reason the server is not receiving any additional ACKs.

So, the remaining mystery is why are ACKs from the client not making
it to the server, especially when the initial three way handshake and
HTTP GET request are successfully received, and other hosts behind the
same public IP have no connection issue? For what it's worth, the other
host on the same LAN is not using the TCP timestamp option at all;
otherwise I cannot see any obvious difference.

I hope the information and techniques described in this thread are
enlightening to some.

Alan


On 1/17/11, Alan Tu <8libra@xxxxxxxxx> wrote:
> Sake, Thanks for your analysis. It helps a lot. I knew I needed a
> second opinion. I'll start liking my Nokia phone again. Well, maybe
> just a little.
>
> For everyone else, the original PCAP (Nokia phone) is on Cloudshark at
> http://www.cloudshark.org/captures/3cc0916bb5be
>
> I agree figuring out why this site (not a fly by night site) always
> times out and retransmits (I have other samples) is interesting. I
> just didn't focus on that because (1) dropped packets are part of the
> TCP model, and (2) I overlooked the idea that my packets may be
> selectively but systematically filtered. Clearly, the initial SYN and
> HTTP request packets are getting to the server. Also, Windows and this
> site communicate fine, using the same shared public IP (PCAP at
> http://www.cloudshark.org/captures/f06cb43fec83
> ). What criteria might an intermediate host be using to distinguish my
> Nokia-originated packets and my PC-originated packets?
>
> Back to the stack issue. Also from RFC 2018: "the bytes just below
> the block, (Left Edge of Block - 1), and just above the block, (Right
> Edge of Block), have not been received". So SACK'ing 1-1448 while
> ACK'ing 2896 in the same packet (frame 10) seems unreasonable.
>
> I do concur that it appears somehow none of the later ACKs are making it.
> Weird.
>
> Thanks for your help using your wisdom to slightly untangle this.
>
> Alan
>
>
> On 1/16/11, Sake Blok <sake@xxxxxxxxxx> wrote:
>> On 16 jan 2011, at 17:00, Alan Tu wrote:
>>
>>> If after reading this you're interested in
>>> helping, please e-mail me individually and I'll reply with the PCAP.
>>
>> Thanks for sending the PCAP.
>>
>>> [...]
>>> This is my assessment of what is going on:
>>> Frame 1-3: three way handshake, normal
>>> Frame 4: client sends HTTP GET request, normal
>>> Frame 5: server ACK frame 4, normal
>>> Frame 6: server sends payload segment 1 (PS1), normal
>>> Frame 7: server sends PS2, normal
>>> Frame 8: client ACK PS2, normal
>>>
>>> For some reason, frame 8 is not received or processed by the server
>>> (this is a mystery, but not discussed here.)
>>
>> IMHO, this mystery is what needs to be investigated, as this is the cause
>> of the problem. Here follows my analysis which backs up that statement :-)
>>
>>> Frame 9: server resends PS1, normal
>>> ***Frame 10: client receives frame 9, a duplicate of frame 6. Client
>>> ACK frame 7, but sends a SACK with the segment from frame 6.
>>>
>>> This is clearly incorrect behavior, ref the SACK RFC, RFC2018. The
>>> client is treating frame 9 as an out of order packet and jumping into
>>> SACK mode, but frame 9 is merely a duplicate or retransmit. Frame 9
>>> falls outside the client's receive window (updated after frame 7) and the
>>> client should discard it, but doesn't. My theory is that the client (Symbian
>>> OS TCP stack) is not doing a bounds check on its TCP receive window.
>>
>> According to the RFC:
>>
>> "If the data receiver generates SACK
>>    options under any circumstance, it SHOULD generate them under all
>>    permitted circumstances."
>>
>> So it is obligated to use the SACK option when ACKing the retransmission.
>>
>> Also from the RFC:
>>
>> "The first SACK block (i.e., the one immediately following the
>>       kind and length fields in the option) MUST specify the contiguous
>>       block of data containing the segment which triggered this ACK,
>>       unless that segment advanced the Acknowledgment Number field in
>>       the header.  This assures that the ACK with the SACK option
>>       reflects the most recent change in the data receiver's buffer
>>       queue."
>>
>> This means it has to SACK the block that has just been received and it
>> does.
>>
>>> Frame 11: The server TCP stack has received an invalid SACK and is now
>>> confused. It retransmits PS1. This is semantically incorrect because
>>> the client actually indicates it has received PS1.
>>
>> *If* the server TCP received the ACK with SACK. But I don't think it did.
>> If you use the filter "tcp.srcport==80", you can see clearly that it keeps
>> retransmitting the same segment with an increasing retransmission timeout.
>> This is the behavior of a system that does *not* receive any ACKs.
>>
>>> Frame 12: client retransmits frame 10
>>> Frame 13: server retransmits PS1
>>> Frame 14: client retransmits frame 10
>>> Frame 15: server retransmits PS1
>>> Frame 16: client retransmits frame 10
>>
>> Actually, the client keeps ACKing the received frame, hoping to reach the
>> server and make it send new data.
>>
>>> Frame 17: server sends PS3, normal
>>>
>>> Somehow, this "breaks the spell", for the moment.
>>
>> The somehow could be explained by a Keep-Alive timer on the server. As can
>> be seen in the HTTP data, both the client and the server want to use
>> Keep-Alive, so neither of the two should close the connection until a
>> timeout has been reached or the maximum configured objects have been served
>> over the same TCP connection.
>>
>> Since the server waited on an ACK after sending two full frames, it will not
>> send all the data at once without waiting on ACKs. So the fact that it sends
>> all the data at once and the fact that it closes the connection with a FIN
>> tells me it is flushing its send buffer after the http daemon has told it to
>> close the connection due to a 15 sec idle timeout.
>>
>>> Frame 18: server sends PS4, normal
>>> Frame 19: server sends PS5, normal
>>> Frame 20: server sends PS6, normal
>>> Frame 21: server sends PS7, normal
>>> Frame 22: server sends PS8, normal
>>> Frame 23: server sends PS9, normal
>>> Frame 24: server sends PS10, normal
>>> Frame 25: server sends PS12 with FIN, received out of order
>>> Frame 26: server sends PS11, received out of order
>>> Frame 27: client ACK PS4, normal
>>> Frame 28: client ACK PS6, normal
>>> Frame 29: client ACK PS8, normal
>>> Frame 30: client ACK PS10, normal
>>> Frame 31: client ACK PS10, but sends a SACK saying it has received PS12,
>>> normal
>>> Frame 32: client ACK PS12, normal
>>> Frame 33: client sends FIN/ACK, acknowledging server's FIN from frame 25,
>>> normal
>>>
>>> At this point, the client is expecting an ACK to its own FIN.
>>>
>>> Frame 34: for some reason, the client does not receive an ACK to its
>>> FIN, so two seconds later it retransmits a FIN/ACK, normal
>>
>> Well, since the server does not seem to receive the packets of the client,
>> it will never respond to these FINs.
>>
>>> ***Frame 35: server resends PS1, 2.7 seconds after the client sends
>>> the first FIN in frame 33
>>>
>>> Why oh why does the server (unknown OS) do this? The SACK storm from
>>> earlier seemed to have broken, the client has acknowledged all the
>>> later payload segments, the server has sent its FIN, and the client
>>> has sent its FIN (twice.) PS1 should be out of the server's TCP send
>>> window anyway.
>>
>> It should... *IF* it ever received an ACK from the client.
>>
>> So the main question is... why do the packets from the client never reach
>> the server? Or do they reach the server in a transformed state and get
>> discarded by the server?
>>
>> Hope this helps,
>> Cheers,
>>
>>
>> Sake
>>