Ethereal-users: [Ethereal-users] Why was I getting "ARP storms" on my network....

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: "Gary Mansell" <Gary.Mansell@xxxxxxxxxxx>
Date: Wed, 07 Sep 2005 09:47:57 +0100
Hi,

We have just traced and solved a network performance issue on our
network and I am looking for some understanding as to why the problem
occurred.

Our network consists of about 50 switches, each connecting approx 24
machines. These switches are all connected back to a central core switch
in a star topology. The whole network uses a class B network address
space. There are about 1000 machines, which are a mixture of UNIX
machines and PCs. The backbone is Gigabit and most machine ports are
100 Mb/s, apart from the servers.

We were experiencing intermittent (once per minute) drops in network
performance where key machines on the network could not be accessed. On
further investigation with Ethereal I found that we would regularly get
ARP storms, in which the number of ARPs would shoot up from their normal
50-70 per second to 3000/sec, then drop to about 600/sec for about
1 minute, and then fall back to the usual 50-70.
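
For anyone wanting to measure a rate profile like the one described
above, here is a minimal sketch. It assumes the ARP frame timestamps
have already been exported from the capture as seconds (floats); the
function names and the 1000/sec threshold are illustrative assumptions,
not part of Ethereal.

```python
from collections import Counter

def arp_rate_per_second(timestamps):
    """Count ARP frames falling into each whole-second bucket.

    timestamps: iterable of capture times in seconds (assumed input,
    e.g. exported from a capture filtered down to ARP frames).
    """
    return Counter(int(t) for t in timestamps)

def storm_seconds(timestamps, threshold=1000):
    """Return the seconds whose ARP count exceeds the threshold.

    The default threshold is a hypothetical value chosen to sit well
    above the 50-70/sec baseline mentioned in the post.
    """
    rates = arp_rate_per_second(timestamps)
    return sorted(sec for sec, count in rates.items() if count > threshold)
```

Plotting or printing the per-second counts makes the baseline, the
spike and the decay easy to tell apart.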

In the packet captures, I saw that numerous machines on our network
would all of a sudden ARP for a single IP address hundreds of times
within a couple of seconds. (Note - each of these machines was ARPing
for a different IP address; they were not all asking for the same one.)
While a machine was doing this, it would not respond to other network
access such as pings. Over time it seemed to be the same 20 or so
machines that kept generating these huge numbers of ARPs. The machines
making these hundreds of ARPs were on different network segments spread
around the site, and the target machines that they were trying to ARP
were also spread around the site, so we could find no common point.
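
The pattern above (one sender asking for one target over and over) can
be picked out of a capture by counting sender/target pairs. This is
only a sketch: it assumes the ARP requests have already been reduced to
(sender IP, target IP) tuples, and the function name and the cut-off of
100 are hypothetical.

```python
from collections import Counter

def repeated_arp_pairs(requests, min_count=100):
    """List (sender, target) pairs seen at least min_count times.

    requests: iterable of (sender_ip, target_ip) tuples, one per ARP
    request (assumed input format). Results are ordered from the most
    frequent pair to the least frequent.
    """
    counts = Counter(requests)
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]
```

Running this over successive captures would show whether the same
senders keep reappearing, as the same 20 or so machines did here.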

We eventually isolated the fault by disconnecting each segment in turn
from our main switch and waiting to see if the ARP storms stopped, which
they finally did. The problem was traced to a network switch that had
fallen off a shelf in a server cabinet and was hanging by its UTP
cables! When we disconnected this switch the ARP storms stopped and
the network was fine.

It would seem that dodgy network connections in this switch, caused by
it hanging by the UTP cables, were causing numerous machines around the
network to send hundreds of ARP requests every minute or so.

Please can someone explain to me why this was happening?

Thanks in advance

Regards

Gary Mansell

