On Fri, Feb 09, 2001 at 09:52:26PM -0600, stefmit@xxxxxxxxxxxxx wrote:
> In the specific case I am talking about now it looks like a too
> long "train" of not-yet ack-ed packets (because of big receiving
> window size) led to the disappearance of one packet in the middle,
> never caught at the other end,
Presumably by "never caught at the other end" you mean that the packet
never makes it to the other end, *NOT* that the disappearance itself is
not detected on the other end.
> thus never ack-ing the whole "train" ... no retransmission being able
> to repair ... and FTP servers
> closing "cleanly" because of that.
As per my earlier mail, an FTP server would close cleanly in that
situation only if either
1) the TCP stack either was broken, so that it didn't inform the
code using it that the connection closed due to a timeout rather
than a FIN with the right sequence number, or was communicating
through an inadequate API, so that it couldn't inform the code
using it that the connection closed due to a timeout
or
2) the FTP server wasn't making use of the "connection closed
due to a timeout" information it was handed.
Either one of those may well be the case, but TCP itself makes the
distinction between "connection closed due to packets being lost and not
retransmitted" and "connection closed cleanly due to a FIN" pretty
clear.
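
As a rough illustration of that distinction as seen through a
sockets-style API, here is a minimal Python sketch. The function name
drain_and_classify and its return strings are made up for this example;
it assumes a stack that reports a retransmission timeout as ETIMEDOUT on
a read, which is what BSD-style sockets do, and it says nothing about
what the OS/390 or NetWare APIs actually hand back.

```python
import errno
import socket


def drain_and_classify(sock, bufsize=4096):
    """Read until the connection ends and report how it ended.

    Hypothetical helper: returns (bytes_read, reason).
    """
    total = 0
    while True:
        try:
            chunk = sock.recv(bufsize)
        except OSError as exc:
            # The stack gave up retransmitting: not a clean close.
            if exc.errno == errno.ETIMEDOUT:
                return total, "timed out"
            # The peer sent a RST: also not a clean close.
            if exc.errno == errno.ECONNRESET:
                return total, "reset"
            raise
        if not chunk:
            # A zero-length read means the peer's FIN arrived in sequence.
            return total, "clean close"
        total += len(chunk)
```

A server that treats every end of the read loop as "transfer complete",
without looking at which of these cases it hit, would log a clean close
even when the stack had reported a timeout.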
You might try capturing with:
    Ethereal, with a smaller snapshot length (in this particular
    case, you probably don't need to see the data going over the
    wire, just the link-layer/IP/TCP headers and perhaps enough
    payload to see the commands on the control connection);

    Tethereal, again with a smaller snapshot length;

    tcpdump, again with a smaller snapshot length (which you get by
    default with tcpdump) - unfortunately, you won't necessarily
    even get an "N packets dropped" indication from it, as the Linux
    libpcap doesn't necessarily supply that information.
Note also that capturing with a packet filter to see only FTP traffic
could also reduce the risk of dropped packets, although, on Linux, that
would require a 2.2 or better kernel built with the "socket filter"
option and a libpcap that uses PF_PACKET sockets and supports the socket
filter (e.g., the one with Red Hat 6.1 or later, or libpcap 0.6.1 or
later).
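
To make the snapshot-length and filter suggestions concrete, here is a
small Python sketch that assembles a tcpdump command line. The helper
name tcpdump_argv, the 128-byte snapshot length, the output file name
and the choice of ports 20 and 21 are all assumptions for illustration;
-i, -s and -w are the usual tcpdump interface, snapshot-length and
write-to-file options. A passive-mode data connection uses a negotiated
port, so a port-based filter like this one would miss it, and filtering
on the two hosts' addresses may be the better choice there.

```python
def tcpdump_argv(interface, snaplen=128, outfile="ftp.pcap", ports=(20, 21)):
    """Build an argument list for a header-only FTP capture.

    Hypothetical helper; the default values are illustrative only.
    """
    # Capture filter: TCP traffic to or from any of the given ports.
    port_terms = " or ".join(f"port {p}" for p in ports)
    capture_filter = f"tcp and ({port_terms})"
    return [
        "tcpdump",
        "-i", interface,      # interface to capture on
        "-s", str(snaplen),   # snapshot length in bytes
        "-w", outfile,        # write raw packets to a file
        capture_filter,
    ]


# Running the capture usually needs root privileges, e.g.:
#   import subprocess; subprocess.run(tcpdump_argv("eth0"))
```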
> > Perhaps the FTP servers and clients you're using both suck, so that they
> > won't report a "connection timed out" error (or perhaps the OS on one
> > side or the other sucks and doesn't provide a "connection timed out"
> > indication on reads or writes)
> <snip>
>
> I will have to admit that this might be the case ... but I am "stuck"
> with both ends - one being an OS/390 TCP/IP implementation (the
> "client" for FTP), and the other a Netware 5.x (latest) for the stack
> servicing either the Novell FTP server, or the Murkworks one (which
> makes me believe that FTP server implementation itself has nothing
> to do with the problem - the more so that the Murkworks one is
> probably the best by-the-book/RFC FTP design I have seen).
When implementing an FTP server or client you need to worry about more
than the RFC - FTP servers and clients use the underlying file system
as well as whatever API the networking stack provides. A
"by-the-book/RFC" FTP server or client may still not properly handle the
OS's "connection timed out" notification.