Ethereal-users: Re: [Ethereal-users] Using Ethereal on host machine configured as a bridge

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Guy Harris <guy@xxxxxxxxxx>
Date: Tue, 3 Apr 2001 19:41:28 -0700 (PDT)
> Unfortunately for me, most of the dissectors stop working if there's a
> VLAN tag in the packet, so looking at the raw interface isn't much use
> except for debugging VLAN related problems.
> 
> ...time elapsed...
> 
> I may be wrong about the dissectors. It appears that the vlan patch for
> linux may be causing some problems.

Yup, it could be.  Linux protocol stack code has, on occasion, had the
really irritating habit of somehow causing packets in intermediate
states - i.e., packets it's still doing things to before it sends them,
or packets it's doing things to after it's received them - to be given
to the PF_PACKET or SOCK_PACKET sockets that Ethereal/Tethereal and
tcpdump use to capture packets.

(I suspect the problems may be one or more of

	1) the "copy-on-write" protocol not being followed, so that when
	   the code that processes received packets modifies the packet
	   data, it does so in place rather than making a copy first, and
	   thereby steps on the skbuff being handed to the
	   PF_PACKET/SOCK_PACKET socket;

	2) outgoing packets being switched off to the "taps"
	   (PF_PACKET/SOCK_PACKET sockets that see all packets,
	   including outgoing ones) before they're completed and ready
	   to send out on the wire;

	3) incoming packets having stuff done to the link-layer headers
	   (including VLAN headers) before being handed to the rest of
	   the networking stack - most sockets won't have a problem with
	   this, as they don't look at the link-layer headers, but the
	   sockets that just pass the packets on to libpcap end up
	   passing mangled packets along.

I haven't checked the code for any of this, but we've definitely seen
mangled NFS packets and mangled Appletalk packets in captures on Linux,
and may have seen other manglings.)

On the topic of VLAN problems with packet sniffer programs, here's some
mail to the tcpdump-workers mailing list; the problem he saw may be a
problem with the Linux VLAN code as well:

Date: Tue, 03 Apr 2001 17:36:39 +0000
From: Ed Stevens <estevens@xxxxxxxxxxxxxxxxxx>
Organization: Atreus Systems
To: Guy Harris <guy@xxxxxxxxxx>
CC: "tcpdump-workers@xxxxxxxxxxx" <tcpdump-workers@xxxxxxxxxxx>,
   Brian Bauer <bbauer@xxxxxxxxxxxxxxxxxx>
Subject: Re: [tcpdump-workers] ARP and VLANS

Guy Harris wrote:
> 
> >   This is what a VLAN packet from tcpdump should look like...
> >
> >     19:27:35.522141 ff:ff:ff:ff:0:e0 Broadcast 988e 346:
> >         df36 0800 4500 0148 c503 0000 8011 74a2
> >         0000 0000 ffff ffff 0044 0043 0134 3cfc
> >         0101 0600 555d e627 d04f 0000 0000 0000
> >         0000 0000 0000
> 
> I don't know what happened to the mail I'd sent out about this from home
> last night, but it didn't show up here at work, or in the tcpdump.org
> archive, so I'll reconstruct the important part.
> 
> The "988e" in the first line indicates that the Ethernet-frame dissector
> saw an Ethernet type field of hex 988e in the frame header, meaning the
> frame had:
> 
>         6 bytes that the dissector interpreted as the destination
>         address;
> 
>         6 bytes that the dissector interpreted as the source address;
> 
>         2 bytes with 988e in them.
> 
> 988e is *NOT* a valid Ethernet type value for a VLAN frame (the right
> value is hex 8100), so, not surprisingly, tcpdump doesn't dissect it as
> a VLAN frame.
> 
> The destination address is "Broadcast", which means ff:ff:ff:ff:ff:ff;
> the source address is claimed to be ff:ff:ff:ff:00:e0, which looks a bit
> bogus.
> 
> So the frame, as captured by tcpdump, was:
> 
>         ffff ffff ffff ffff ffff 00e0 988e df36
>         0800 4500 0148 c503 0000 8011 74a2 0000
>         0000 ffff ffff 0044 0043 0134 3cfc 0101
>         0600 555d e627 d04f 0000 0000 0000 0000
>         0000 0000
> 
> >   This is what an "arp for gateway" VLAN packet looks like from
> >   tcpdump:
> >
> >     19:18:14.382141 ff:ff:ff:ff:0:e0 Broadcast 2994 64:
> >         a159 0806 0001 0800 0604 0001 00e0 2994
> >         a159 2021 2223 0000 0000 0000 2021 2202
> >         0000 2910 0001 0000 0000 0001 2046 4446
> >         0000
> 
> Same here; hex 2994 isn't a valid Ethernet type for a VLAN frame,
> either, so tcpdump, quite correctly, doesn't dissect that frame as a
> VLAN frame.
> 
> The frame was:
> 
>         ffff ffff ffff ffff ffff 00e0 2994 a159
>         0806 0001 0800 0604 0001 00e0 2994 a159
>         2021 2223 0000 0000 0000 2021 2202 0000
>         2910 0001 0000 0000 0001 2046 4446 0000
> 
> If you remove the ffff ffff from the beginnings of those frames, you get
> 
>         ffff ffff ffff 00e0 988e df36 0800 4500
>         0148 c503 0000 8011 74a2 0000 0000 ffff
>         ffff 0044 0043 0134 3cfc 0101 0600 555d
>         e627 d04f 0000 0000 0000 0000 0000 0000
> 
> and
> 
>         ffff ffff ffff 00e0 2994 a159 0806 0001
>         0800 0604 0001 00e0 2994 a159 2021 2223
>         0000 0000 0000 2021 2202 0000 2910 0001
>         0000 0000 0001 2046 4446 0000
> 
> The first of those frames looks like an IP packet, sent from
> 00:e0:98:8e:df:36 to ff:ff:ff:ff:ff:ff (i.e., a broadcast frame); the
> frame type is 0800, meaning it's an IPv4 frame, and the next byte is
> 45, which is "IP version 4, header length 5 4-byte words", in an IP
> header - i.e., an IP header with no options.
> 
> There appears to be "ffff ffff" in that packet; I'll bet that's the
> destination IP address of that frame (i.e., an IP broadcast).
> 
> The second of those frames looks like an ARP packet, sent from
> 00:e0:29:94:a1:59 to ff:ff:ff:ff:ff:ff (i.e., a broadcast ARP packet);
> if you parse the ARP packet, you find that the frame is an ARP request,
> with the hardware addresses being Ethernet addresses and the protocol
> addresses being IP addresses, with the source Ethernet address being
> 00:e0:29:94:a1:59, and the target Ethernet address being
> 00:00:00:00:00:00 (because it's unknown - that's what the request is
> trying to find).
> 
> So it appears that the real problem is either that
> 
>         1) some device is putting bogus frames onto the Ethernet, with
>            an extra "ffff ffff" stuck at the front
> 
> or
> 
>         2) the Linux networking code is somehow mangling those frames,
>            perhaps doing so while handling VLAN stuff.
> 
> I didn't see any obvious place in the 2.4.2 kernel where there's an
> 802.1Q implementation (which I find a bit surprising), so, for now, I
> suspect 1) might be the case.
> 
> I'd suggest putting some other network analyzer on the same network -
> whether it's a PC running some OS other than Linux plus some network
> analyzer program (a BSD PC running a libpcap-based program such as
> tcpdump or Ethereal or Ksnuffle, or a Solaris PC running any of the
> preceding programs or running snoop, or a Windows PC running WinDump or
> Analyzer or Ethereal or any of the zillions of commercial analyzer
> programs), or a specialized device (which, admittedly, is often just a
> PC running DOS or Windows plus some network analyzer software).
> 
> If that analyzer has the same problem, something probably really is
> putting bogus frames on the wire.
> 
> If that analyzer *doesn't* have the same problem, then Linux's
> networking code is probably mangling the frame somewhere after
> receiving it.
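
The decoding described in the quoted message can be checked mechanically.
What follows is a minimal sketch in Python, not part of the original mail;
the helper names (mac, parse_ethernet, parse_ipv4) are made up for
illustration, and the frame is only the leading bytes of the first dump
quoted above.  It decodes the frame as captured and again with the first
four bytes dropped:

```python
import struct

def mac(b: bytes) -> str:
    """Format raw bytes as a colon-separated MAC address."""
    return ":".join(f"{x:02x}" for x in b)

def parse_ethernet(frame: bytes):
    """Return (destination, source, ethertype) from the 14-byte Ethernet header."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return mac(dst), mac(src), ethertype

def parse_ipv4(packet: bytes):
    """Return a few fixed-offset fields of an IPv4 header."""
    return {
        "version": packet[0] >> 4,        # high nibble of the first byte
        "ihl_words": packet[0] & 0x0F,    # header length in 4-byte words
        "protocol": packet[9],
        "dst": ".".join(str(x) for x in packet[16:20]),
    }

# Leading bytes of the first frame as reconstructed in the quoted message.
captured = bytes.fromhex(
    "ffff ffff ffff ffff ffff 00e0 988e df36"
    "0800 4500 0148 c503 0000 8011 74a2 0000"
    "0000 ffff ffff 0044 0043"
)
stripped = captured[4:]   # assumption: the first four bytes are the bogus prefix

for label, frame in (("as captured", captured), ("prefix removed", stripped)):
    dst, src, ethertype = parse_ethernet(frame)
    print(f"{label}: {src} -> {dst}, type 0x{ethertype:04x}")

print(parse_ipv4(stripped[14:]))
```

With the prefix removed, the type field reads 0800 and the IP header
fields line up as described: version 4, a header of five 4-byte words,
and a destination address of 255.255.255.255.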

Many thanks for the feedback.  It was tremendously helpful and
instructive at a time when I was trying to get a grip on the problem. 
Having run some tests and analyzed the results, I agree fully with your
observations.  I am attaching a file that I have also sent to the Linux
VLAN mailing list.
Additional input from anyone is always appreciated!


-- 
Ed Stevens
Senior Software Designer, Atreus Systems Corporation
(613) 233-1741 x226
http://www.atreus-systems.com

Question with regard to VLAN 0.15:
==================================

An ARP message carried in a VLAN 802.1Q tagged frame appears to
get a 4-byte prefix added to it under certain conditions.

Here is the scenario:

Below is the tcpdump output for an ARP request from a laptop
whose ethernet adapter has mac address 00:E0:29:94:A1:59.
The laptop is configured with a static IP of 3.3.3.3.  It is
sending an ARP request for one of its DNS servers at 3.3.3.5.

The laptop is plugged into a VLAN port on a Cisco Catalyst 2900 XL
switch, whose VLAN trunk goes into eth1 on a Linux (Red Hat 6.2)
server.
Ben Greear's VLAN 0.15 patch has been applied to the kernel, and
configured to match the ports on the Cisco switch.

To test, I start tcpdump 3.6.2 on the server, and fire up a browser
on the laptop.  Instead of printing the ARP request from the
browser for its DNS server, tcpdump prints the following:

19:29:17.246869 ff:ff:ff:ff:0:e0 Broadcast 2994 64:
			 a159 0806 0001 0800 0604 0001 00e0 2994
			 a159 0303 0303 0000 0000 0000 0303 0305
			 0002 2910 0001 0000 0000 0001 2045 4246
			 0000

With or without the 802.1Q tag, we expect the first 12 bytes of
an ARP packet to consist of a destination mac and a source mac,
as follows:

    FFFF FFFF FFFF 00E0 2994 A159

But somehow an extra four bytes (FFFF FFFF) have been shoved onto
the front of the packet, making it unintelligible.  The destination
is therefore still broadcast (FFFF FFFF FFFF), but now the "source"
address has become FFFF FFFF 00E0.
Worse still, the frame type has become 2994 (...yuck...) instead
of the expected 0806 (for ARP) or 8100 (for 802.1Q VLAN).
The hardware type is now A159 (...double yuck...) instead of the
expected 0001.  No wonder packet sniffers choke on this.
 			
The rest of the packet appears to be the "real" ARP request, right
through to the target IP of the DNS server (0303 0305).
Interestingly, ARP requests coming from the server are printed correctly,
because they are not encapsulated in 802.1Q frames.
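
As a rough check of the reading above, here is a small Python sketch (not
from the original report; the function and variable names are
illustrative only) that takes the bytes tcpdump printed, drops the first
four, and decodes what remains as an Ethernet header followed by an ARP
message laid out per RFC 826:

```python
import struct

def mac(b: bytes) -> str:
    """Format raw bytes as a colon-separated MAC address."""
    return ":".join(f"{x:02x}" for x in b)

def ipv4(b: bytes) -> str:
    """Format four raw bytes as a dotted-quad IPv4 address."""
    return ".".join(str(x) for x in b)

def parse_arp(payload: bytes) -> dict:
    """Decode an Ethernet/IPv4 ARP message (28 bytes)."""
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", payload[:8])
    return {
        "htype": htype, "ptype": ptype, "hlen": hlen, "plen": plen,
        "oper": oper,                      # 1 = request, 2 = reply
        "sender_mac": mac(payload[8:14]),
        "sender_ip": ipv4(payload[14:18]),
        "target_mac": mac(payload[18:24]),
        "target_ip": ipv4(payload[24:28]),
    }

# Ethernet header as tcpdump interpreted it, followed by the dumped payload
# (only the bytes up to the end of the ARP message are included here).
captured = bytes.fromhex(
    "ffff ffff ffff ffff ffff 00e0 2994 a159"
    "0806 0001 0800 0604 0001 00e0 2994 a159"
    "0303 0303 0000 0000 0000 0303 0305"
)

# What the sniffer sees: type and "hardware type" taken from the wrong offsets.
seen_type = struct.unpack("!H", captured[12:14])[0]
seen_htype = struct.unpack("!H", captured[14:16])[0]
print(f"as captured: type 0x{seen_type:04x}, hardware type 0x{seen_htype:04x}")

# Assumption: the first four bytes are the bogus prefix.
stripped = captured[4:]
real_type = struct.unpack("!H", stripped[12:14])[0]
arp = parse_arp(stripped[14:])
print(f"prefix removed: type 0x{real_type:04x}")
print(arp)
```

Under that assumption the message decodes as an ARP request from
00:e0:29:94:a1:59 at 3.3.3.3 asking for 3.3.3.5, which matches the
laptop's configuration described above.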

I sniffed the packets on the trunk line between the Cisco switch and the
Linux box, using Ethereal for Win32.  The 802.1Q-tagged packets that
contained ARP messages from the laptop were correct in every respect.
I wish that I had a Linux laptop handy; then I could have run tcpdump
as well at that spot!  (I might install WinDump, for a second opinion...)
In fact, the non-tagged ARPs from my sniffer host (another statically
configured laptop) were printed by tcpdump with no problem.
Because Ethereal uses libpcap, and tagged packets were shown exactly as
they should be, this appears to rule out a bug in libpcap (version 0.6.2).

When tethereal is run instead of tcpdump on the server, it prints the
same kind of mangled ARP packet...

  3.860000 ff:ff:ff:ff:00:e0 -> ff:ff:ff:ff:ff:ff 0x2994 Ethernet II

   0  ffff ffff ffff ffff ffff 00e0 2994 a159   ............)..Y
  10  0806 0001 0800 0604 0001 00e0 2994 a159   ............)..Y
  20  0303 0303 0000 0000 0000 0303 0305 000a   ................
  30  0110 0001 0000 0000 0000 2046 4846 0000   .......... FHF..

This packet, like the one printed by tcpdump, would make sense as an
ARP message if the extra "ffff ffff" were removed from the start.
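
The same check can be run directly against the tethereal dump.  The
sketch below is illustrative only: parse_hexdump is a hypothetical
helper written for this particular dump layout (an offset, up to eight
4-digit hex groups, then an ASCII column), not anything that ships with
Ethereal or tcpdump.

```python
import re
import struct

HEX_GROUP = re.compile(r"[0-9a-fA-F]{4}")

def parse_hexdump(text: str) -> bytes:
    """Turn 'offset  xxxx xxxx ...  ascii' lines back into raw bytes.

    Assumes at most eight 4-digit groups per line, as in the dump above;
    on a short final line an ASCII token that happens to look like four
    hex digits would be misread, so this is only a rough tool.
    """
    data = bytearray()
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        for token in tokens[1:9]:          # skip the offset column
            if not HEX_GROUP.fullmatch(token):
                break                      # reached the ASCII column
            data += bytes.fromhex(token)
    return bytes(data)

def ethertype(frame: bytes) -> int:
    """Return the 16-bit type field at offset 12 of an Ethernet frame."""
    return struct.unpack("!H", frame[12:14])[0]

dump = """
   0  ffff ffff ffff ffff ffff 00e0 2994 a159   ............)..Y
  10  0806 0001 0800 0604 0001 00e0 2994 a159   ............)..Y
  20  0303 0303 0000 0000 0000 0303 0305 000a   ................
  30  0110 0001 0000 0000 0000 2046 4846 0000   .......... FHF..
"""

frame = parse_hexdump(dump)
stripped = frame[4:]   # assumption: the first four bytes are the bogus prefix

print(f"{len(frame)} bytes, type as captured 0x{ethertype(frame):04x}")
print(f"type with prefix removed 0x{ethertype(stripped):04x}")
# Target protocol address sits 24 bytes into the ARP message.
print("target IP:", ".".join(str(x) for x in stripped[14 + 24:14 + 28]))
```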

When a statically configured laptop is on a non-VLAN port, no problems
occur, because the ARP messages (such as self-ARPs on bootup) are not
sent down the trunk wrapped in a trunking protocol.  But when the ARP
messages are tagged as 802.1Q, something in the Linux box inflicts
grievous bodily harm on the packets.  I should mention that this is
not Cisco-specific; the same results have been noted with a 3Com switch.

The next step in my investigation will probably involve some kernel
debugging, to see whether this is a bug in VLAN 0.15.  I plan to
double-check the way that VLAN has been configured on the server,
in case something has been overlooked.  I will also need to study the
documentation carefully, since I did none of the original setup,
integration or testing of VLAN 0.15 on the Linux box. (I'm new to this.)
By the way, each entry in /proc/net/vlan already has the REORDER_HDR
flag set to 1.

Now that I have admitted my ignorance, does anyone have any idea where
the extra four bogus bytes come from?
