On Wed, Apr 08, 2009 at 02:17:45PM +0200, Bart Greyson wrote:
> On Wed, Apr 8, 2009 at 1:36 PM, Sake Blok <sake@xxxxxxxxxx> wrote:
> On Wed, Apr 08, 2009 at 12:59:40PM +0200, Bart Greyson wrote:
> > Can anyone tell from these messages what went wrong and what might
> be the
> > cause?
>
> It would be helpful to have this data as a binary capture file. Could
> you post a binary capture file with packets 8705-8726 of the original
> file?
>
> I don't know exactly where I can turn this data into a "binary capture
> file" in Wireshark. I uploaded it in a more detailed form if that's what
> you wanted: http://www.greyson.be/capt.txt.
Well, with binary capture file, I just meant a saved capture file. If
you select "File -> Save as", you can save the capture file again, but
this time you could use a packet range of 8705-8726 in the lower left
corner to save only these frames in the new capture file. Having a file
that can be loaded in our favorite tool is much more easy than having to
anlyse it in a text editor ;-)
Well, I have looked through the data and come up with the following
events:
8717 C->S data (request)
8718 S->C ACK
8719 S->C data (response, seq 2613, next 2466)
8720 S->C data (response, seq 2666, next 2492)
8721 C->S ACK (2492)
~17 sec delay
8722 S->C FIN (seq 2515, previous segment lost)
8723 C->S Dup.ACK (2492)
8724 S->C Encryted Alert (seq 2492, next 2514)
8725 C->S ACK (2516)
~16 sec delay
8726 S->C RST
Frame 8717-8721 look like a normal request/response. The 17 sec delay is
usually caused by either the browser or the server in an HTTP/1.1
conversation when the preconfigured time-out expires while waiting for
another request. Then the server wants to close the connection, usually
this is done by sending a SSL alert, which in this phase of the
communication would of course be encrypted and then a TCP FIN. It looks
like the SSL Alert somehow did not make it to the client (assuming the
trace is made at the client side). When the TCP FIN arrives, the client
knows it missed some data, so it asks for the data with the ACK in frame
8723. The server resends the missing data i(the Alert) in frame 8724.
The client now ACKs the data, but since it has already seen the TCP FIN,
it adds 1, so it ACKs 2516 instead of 2515. After a 16 second delat, the
server sends a TCP RST.
To me there are two issues here, first of all, why does the SSL Alert
gets lost from the server to the client. This could just be random
packets being dropped (do you see other packets being retransmitted in
other sessions?). Another possibility is that they are dropped on
purpose by an intermediate device. But as you say about 10% of the
transactions fail, I assume it's just random packet loss.
Second issue is why the client and server are not capable of restoring
the communication properly (which is the responsibility of the TCP
protocol). I would suggest that there is a device in between the client
and the server (a firewall, IDP, Loadbalancer, etc) which was not
keeping track of sequence numbers properly and dropped the ACK in frame
8725 on its way from the client to the server. The device would have
expected an ACK of 2515 instead of 2516 if it was not for the already
transmitted FIN. This would also account for the RST from the server
after 15 seconds. If the server never saw the ACK, it would start a
timer for the connection closure and since it never saw anything anymore
from the client, it will close the connection, sending the client a RST
to inform it of the (unclean) closure.
So, assuming the packet loss is random, there might be a bug in an
intermediate device. I would make traces at the client and the server to
verify these findings and if they are correct, work your way inwards to
find the device that is causing this behavior. Then you can open up a
bug-report with the vendor of that device.
Hope this helps,
Cheers,
Sake