Wireshark-users: Re: [Wireshark-users] troubleshooting ftp timeout using wireshark

From: Hansang Bae <hbae@xxxxxxxxxx>
Date: Mon, 03 Mar 2008 22:22:10 -0500
luis pena wrote:
Hello all, my first post so forgive me if I omit any info.

I am observing an FTP timeout on our network that I am hoping to pin down using Wireshark.

The network is an 18-node Frame Relay WAN. Nodes are connected via point-to-point T1 using Cisco 2600s to a central hub (Site A), which provides our connection out to the Internet. Our IT Department is located at two sites (Site B and Site C). I first came across this issue when our payroll dept. complained that they could not upload a file to the payroll company that cuts our checks (oh no, not payroll!). I found out that the problem has been occurring over the course of a couple of weeks. There is no way for me to tell what has changed over that time!

Using FileZilla in passive mode on Ubuntu Gutsy, I am trying to upload a 100MB file to a private FTP server on the Internet. I am able to recreate the timeout at Site B and Site C. The System Administrator at Site A is not experiencing the upload timeout.

There is an ISA proxy server that sits between Site B & C and has been configured to allow FTP traffic. To be on the safe side I am bypassing the proxy altogether. The problem persists when bypassing the proxy as well. Windows firewalls are disabled via Group Policy.

Hmmmm... I fired up Wireshark and filtered on the following: FTP & FTP-DATA. The FTP-DATA packets show a lot (about 50%) of retransmission packets. FTP shows a <retransmission request> packet; the TCP checksum field states that the problem may be a TCP checksum offload. May I assume that the problem is at Layer 4 and that there is a TCP segment sequencing error originating on our network? What steps have I missed, and where do I look next in the troubleshooting process?
Thank you in advance.

OK, you cannot possibly have 50% retransmission. If you do, you have a MAJOR MAJOR issue. Also, forget the TCP checksum notice for now. It's almost always caused by the TCP checksum being offloaded to the NIC (just like what it's telling you).

Are you absolutely sure you are not seeing duplicate packets? In other words, did you SPAN the traffic as it was leaving the server port and capture on the router port as well? That would put each packet into the capture twice. Take a look at the IP ID field.

Look at the original packet and the "retransmitted" packet. Are the IP IDs the same? If so, they are just duplicates in the capture, not real retransmissions.
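
If there are too many packets to compare by eye, the IP ID check above can be sketched in a few lines of Python. This is only an illustration, not a Wireshark feature: the Segment record, the classify_repeats function, and the sample addresses and numbers are all made up here, and the fields would have to be exported from your own capture. It also assumes the sender assigns a fresh IP ID to every datagram it builds, and it ignores TCP ports to keep things short.

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    """One captured TCP segment; the field choice is illustrative."""
    src: str      # source IP address
    dst: str      # destination IP address
    ip_id: int    # IP Identification field
    seq: int      # TCP sequence number
    length: int   # TCP payload length in bytes


def classify_repeats(segments):
    """Label each repeated (src, dst, seq, length) data segment.

    Returns a list of (index, label). The label is "duplicate" when the
    IP ID matches an earlier copy (the same packet captured twice) and
    "retransmission" when the IP ID differs (the sender sent it again).
    """
    seen = defaultdict(set)  # segment key -> IP IDs already observed
    labels = []
    for index, seg in enumerate(segments):
        if seg.length == 0:
            continue  # pure ACKs legitimately repeat sequence numbers
        key = (seg.src, seg.dst, seg.seq, seg.length)
        if key in seen:
            label = "duplicate" if seg.ip_id in seen[key] else "retransmission"
            labels.append((index, label))
        seen[key].add(seg.ip_id)
    return labels


# Hypothetical sample data, not taken from any real capture.
capture = [
    Segment("10.0.0.5", "192.0.2.10", 100, 1, 1460),
    Segment("10.0.0.5", "192.0.2.10", 100, 1, 1460),     # same IP ID
    Segment("10.0.0.5", "192.0.2.10", 101, 1461, 1460),
    Segment("10.0.0.5", "192.0.2.10", 102, 1461, 1460),  # new IP ID
]
print(classify_repeats(capture))
```

If nearly every repeat comes back as "duplicate", the 50% figure is most likely an artifact of where the capture was taken rather than real loss.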

Are you absolutely sure you don't have a duplex mismatch on the router uplink ports at site B and C?

Do a "sho int fa0/0" or whatever your uplink port is. Do you see any errors on the interface? Or you can do a "sho port count" or "sho int" on your switch (the former for CatOS, the latter for IOS-based switches).

Also, check your serial interface to make sure you are not taking errors at site A (hub) or sites B and C.

--

Thanks,
Hansang