Ethereal-users: RE: [Ethereal-users] SQL Slammer - How to identify

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Greg Saunders <gregs@xxxxxxxxxxxxxxx>
Date: Thu, 18 Nov 2004 11:05:53 -0500
Thanks for the replies... Here is why I was asking. (This is kinda long to
give the detail)

*** Details ***

- We have an HP Procurve 4000M which is connected to subnet 172.20.2.0.

- This subnet has a mix of 10 Mbps and 100 Mbps devices, some running full
duplex and a small number running half duplex.  Most of the half-duplex
devices are print servers.

- The HP Procurve 4000M has 3 ports that connect to other switches.  1 to
another HP Procurve 4000M and 2 to small linksys switches that have most of
the slow print servers and such tied to them.

- We are running on a Windows NT domain with NT servers, 2000 servers, 2000
workstations, XP workstations, 98 workstations, SQL 7.0 server, Exchange 5.5
Server, HP print servers, and Intel print servers.

- We generally don't have high network utilization; it is normally below 5
percent.  Only once in a while, when we do a network backup or large file
transfers, does it go higher.

*** Symptoms ***

- We first noticed problems when our Network Monitoring software (basically
ping sweeps, event log monitoring, and NT server monitoring) noticed
multiple servers going down for just a few seconds.  The ping was set to give
an alert if it detected 3 or more pings greater than 500ms.  I also bumped
it to 6 or more pings greater than 500ms to make it less sensitive.  Later
we found that it wasn't because of high latency, but ping requests not
getting replies at all.

- We also noticed applications like Outlook would lose connection to the
Exchange server briefly, applications that communicated directly to SQL
would lose connection and you would have to restart the app, and other
similar problems.

- Based on a 1 minute ping sweep of about 10 servers we are seeing about 10
to 20 drops a day.

- Based on a 30 second ping sweep of about 10 servers we are seeing about 30
or more drops a day.

- Based on a 10 second ping sweep of about 10 servers we are seeing about 60
drops.

- This is not isolated to one server; it is happening on all of our servers.

- Some servers seem to experience it more than others, but no pattern is
really found here.

- We never see a "link loss"; it is always just a matter of not being able to
communicate with the server for short time periods.

- Usually the loss of communication is about 3 seconds, but I have seen it
last up to a minute.
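
As a rough sanity check on the sweep figures above, the drop counts can be turned into a per-ping loss rate. This is only a sketch under assumptions that are not in the post: each sweep pings exactly 10 servers, "10 to 20" is taken as 15, and "about 60" is taken as a per-day figure.

```python
# Rough per-ping loss rate implied by the sweep figures above.
# Assumptions (not stated in the post): every sweep pings exactly 10 servers,
# "10 to 20" is taken as 15, and "about 60" is taken as a per-day figure.

SECONDS_PER_DAY = 24 * 60 * 60
SERVERS = 10


def loss_rate(sweep_interval_s, drops_per_day, servers=SERVERS):
    """Return the fraction of pings lost per day for one sweep interval."""
    pings_per_day = (SECONDS_PER_DAY // sweep_interval_s) * servers
    return drops_per_day / pings_per_day


# sweep interval in seconds -> observed drops per day (assumed midpoints)
observations = {60: 15, 30: 30, 10: 60}

for interval, drops in observations.items():
    print(f"{interval:>2}s sweep: {loss_rate(interval, drops):.3%} of pings lost")
```

If those assumptions hold, all three sweeps land at a loss rate on the order of 0.1 percent, which would suggest the faster sweeps are simply sampling the same outages more often rather than causing them.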

*** Testing / Diagnosing ***

- We first started examining all of the logs and alerts that were generated
by the HP Procurve Switch and found nothing.  There were a couple of
CRC/alignment errors on one port and a couple of FCS rx errors on another
port.  But other than that we have no errors. 

- The switch diagnostics don't show any problems and looking at the numbers
the buffers are not used up and the memory is not used up.

- All servers have the latest MS security patches as well, especially the
ones that fix the RPC buffer overflow problems.

- All of our units and servers are protected with TrendMicro products with
the latest patterns and they don't see anything.

- We decided to start capturing packets to see if there were any of the
common worms / viruses.  For capturing of packets I had the switch's monitoring
port monitor all ports so I could get a dump of everything.  I did this for
several hours and after examining the captures I didn't see anything
resembling a worm and none of the basic filters for the most common worms
revealed anything.

- In one of my tests, while I was pinging Server A from PC B, I was not
getting replies.  I then went to Server A and pinged out to another unit.
Server A was getting replies and as soon as I had done this PC B started
getting replies from Server A.  When I stopped the pinging from Server A, PC
B quit seeing Server A.  After about a minute everything went back to
normal.

- I worked with HP and they swapped out the HP Procurve Chassis and we ended
up with the same results.

- After a bunch more tests HP was willing to send all new modules and we
swapped them and we have the same problem.  So now we have a whole new
switch and modules.

- HP wanted to determine whether or not the pings were actually traversing
the switch and making it to the servers, so we did the tests below.  Listed
first are the ports we were testing and how things were connected.

**********
Server TBGSMS with IP 172.20.2.6 on Port B1
Notebook TBGITSOLO2K with IP 172.20.2.12 on Port D6 (which is only
monitoring port B1 and the notebook is capturing packets)
PC TBGGREGS with IP 172.20.2.4 on Port D7 (This pc is also capturing
packets)
 
I turn packet capturing on for both 172.20.2.12 (D6) and 172.20.2.4 (D7)
I then start a ping utility that sends 1 ping every 2500ms with 32 bytes
continuously from 172.20.2.4 (D7).
 
The pc on 172.20.2.4 (D7) is capturing every packet it sends and the
replies.
The Notebook on 172.20.2.12 (D6) is capturing all packets inbound/outbound to
172.20.2.6 (B1) as well as its own traffic.
 
The PC on 172.20.2.4 (D7) shows that it did not receive replies to 5 ICMP
ping requests in a row, which spans 20 seconds.  I then look at the capture
on the monitoring port (D6) and it does not see the 5 pings that were
outbound from 172.20.2.4 (D7).  Basically the ping requests never traversed
the switch.  In the capture log on 172.20.2.12 (D6) you see all the ping
requests and replies to and from 172.20.2.6 (B1) up until those 5 that never
show up, and then you start seeing them pick back up again.

HP wanted to verify that the pings actually got to the switch from
172.20.2.4 (D7), so we started monitoring (D7) from (D6) and tested for this.
When 172.20.2.4 (D7) pinged without getting replies I was able to see that
the pings were getting to the actual D7 port via the monitoring port and
they were intact.
**********
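
The comparison described in the test above (requests seen at the sender versus requests seen on the monitor port) can be sketched in a few lines. This assumes the ICMP sequence numbers have already been exported from each capture; the lists below are made-up illustration data, not values from the actual captures.

```python
# Compare the echo requests seen at the sender with those seen on the
# monitor port. Sequence numbers are assumed to have been exported from
# each capture already; the lists below are made-up illustration data.


def missing_requests(sent_seqs, monitored_seqs):
    """Return the sent sequence numbers that never showed up on the monitor port."""
    seen = set(monitored_seqs)
    return [seq for seq in sent_seqs if seq not in seen]


def longest_gap(missing, sent_seqs):
    """Return the longest run of back-to-back sent requests that went missing."""
    lost = set(missing)
    best = run = 0
    for seq in sent_seqs:
        run = run + 1 if seq in lost else 0
        best = max(best, run)
    return best


# Hypothetical data: 20 requests sent from D7, 5 of which never reached B1.
sent = list(range(1, 21))
monitored = [s for s in sent if not 8 <= s <= 12]

lost = missing_requests(sent, monitored)
print("missing on monitor port:", lost)
print("longest consecutive loss:", longest_gap(lost, sent))
```

Running the same comparison with the monitor port pointed at the sender's own port (D7) instead of B1 would correspond to the second check, where the requests are expected to show up intact.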

- So... I was able to prove that periodically the switch was not forwarding
packets to the destination they were intended for.

- Never did we see the utilization of the network or the switch go up
sufficiently to cause dropped packets.  We are talking 1 to 10 percent
utilization, and the majority of the time it is well below 5%.

- In my mind the only other thing I could think of is that the MAC table in
the switch is possibly getting incorrect MAC addresses set, causing the
problems we are seeing. Oh... We are NOT using VLANs and NOT using spanning
tree.  The switch is set up basically as default.

- I am currently capturing the listing of MAC address assignments to ports
in a text file repeatedly, hoping to catch a port changing its MAC address
assignment.
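
The repeated MAC table captures could be compared automatically instead of by eye. The following is a minimal sketch only; the "MAC PORT" line format and the addresses are hypothetical, and the parser would need adapting to whatever the switch's address-table listing actually looks like.

```python
# Detect MAC addresses that move between ports across saved snapshots.
# The "MAC PORT" line format is hypothetical; adapt parse_snapshot() to
# whatever the switch's address-table listing actually looks like.


def parse_snapshot(text):
    """Parse lines of 'MAC PORT' into a dict mapping MAC -> port."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            mac, port = parts
            table[mac.lower()] = port
    return table


def moved_macs(before, after):
    """Return {mac: (old_port, new_port)} for MACs whose port changed."""
    return {
        mac: (before[mac], after[mac])
        for mac in before.keys() & after.keys()
        if before[mac] != after[mac]
    }


# Made-up snapshots for illustration only.
snap1 = parse_snapshot("0001e6-aaaaaa B1\n0001e6-bbbbbb D7\n")
snap2 = parse_snapshot("0001e6-aaaaaa D3\n0001e6-bbbbbb D7\n")

for mac, (old, new) in moved_macs(snap1, snap2).items():
    print(f"{mac} moved from {old} to {new}")
```

Comparing each new snapshot against the previous one, and logging the time of any move, would make it possible to line up a port change with a ping outage.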

*** Conclusions / Help needed ***

- I am open to any suggestions on what this problem could be.

- I am open to any assistance you can provide on how to use Ethereal to
catch something going on.  What would you recommend looking for to help
isolate this problem?

Thanks in advance and if you got this far then I am impressed.

Greg

-----Original Message-----
From: Andrew Hood [mailto:ajhood@xxxxxxxxx] 
Sent: Thursday, November 18, 2004 5:10 AM
To: Ethereal user support
Subject: Re: [Ethereal-users] SQL Slammer - How to identify

Greg Saunders wrote:
> Hey folks,
> 
> How can I identify the SQL slammer if I am capturing all the packets 
> on my switch through a monitoring port?  What specifics should I look 
> for… is there a filter or something to spot this?

I've seen Martin's reply, and would agree installing Snort would be a
simpler solution than trying to get Ethereal to pick them out.

The Snort rules for CVE CAN-2002-0649 a.k.a. Slammer a.k.a. Sapphire are:

alert udp $EXTERNAL_NET any -> $HOME_NET 1434 (msg:"MS-SQL Worm propagation attempt"; content:"|04|"; depth:1; content:"|81 F1 03 01 04 9B 81 F1 01|"; content:"sock"; content:"send"; reference:bugtraq,5310; reference:bugtraq,5311; reference:cve,2002-0649; reference:url,vil.nai.com/vil/content/v_99992.htm; classtype:misc-attack; sid:2003; rev:6;)

alert udp $HOME_NET any -> $EXTERNAL_NET 1434 (msg:"MS-SQL Worm propagation attempt OUTBOUND"; content:"|04|"; depth:1; content:"|81 F1 03 01 04 9B 81 F1|"; content:"sock"; content:"send"; reference:bugtraq,5310; reference:bugtraq,5311; reference:cve,2002-0649; reference:url,vil.nai.com/vil/content/v_99992.htm; classtype:misc-attack; sid:2004; rev:5;)

alert udp $EXTERNAL_NET any -> $HOME_NET 1434 (msg:"MS-SQL version overflow attempt"; dsize:>100; content:"|04|"; depth:1; reference:bugtraq,5310; reference:cve,2002-0649; reference:nessus,10674; classtype:misc-activity; sid:2050; rev:5;)
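
For checking payloads pulled out of a capture without running Snort, the content tests in the first rule above can be mirrored in a short script. This is only a sketch: it copies the rule's port and content checks, ignores the $EXTERNAL_NET / $HOME_NET address tests, and the sample payload is synthetic, built only to exercise the checks.

```python
# Check a UDP payload against the content patterns in the first rule above.
# This mirrors the rule's port and content checks only; it ignores the
# address variables and is not a replacement for Snort.

SLAMMER_PORT = 1434
BYTE_PATTERN = bytes.fromhex("81F10301049B81F101")  # |81 F1 03 01 04 9B 81 F1 01|


def looks_like_slammer(dst_port, payload):
    """Return True if a UDP datagram matches the rule's port and content tests."""
    return (
        dst_port == SLAMMER_PORT
        and payload[:1] == b"\x04"      # content:"|04|"; depth:1
        and BYTE_PATTERN in payload     # the hex content match
        and b"sock" in payload          # content:"sock"
        and b"send" in payload          # content:"send"
    )


# Synthetic payload built only to exercise the checks; not real worm code.
sample = b"\x04" + b"\x01" * 96 + BYTE_PATTERN + b"sock" + b"send"

print(looks_like_slammer(1434, sample))
print(looks_like_slammer(1434, b"\x02" + sample[1:]))
```

The destination port and raw UDP payload for each packet would have to come from whatever export of the capture is available; that step is left out here.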

--
There's no point in being grown up if you can't be childish sometimes.
                 -- Dr. Who
