Ethereal-users: Re: [Ethereal-users] Using eteareal on host machine configured as abridge
Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.
From: Guy Harris <guy@xxxxxxxxxx>
Date: Tue, 3 Apr 2001 19:41:28 -0700 (PDT)
> Unfortunately for me, most of the dissectors stop working if there's a
> VLAN tag in the packet, so looking at the raw interface isn't much use
> except for debugging VLAN related problems.
>
> ...time elapsed...
>
> I may be wrong about the dissectors.  It appears that the vlan patch for
> linux may be causing some problems.

Yup, it could be.

Linux protocol stack code has, on occasion, had the really irritating
habit of somehow causing packets in intermediate states - i.e., packets
it's still doing things to before it sends them, or packets it's doing
things to after it's received them - to be given to the PF_PACKET or
SOCK_PACKET sockets that Ethereal/Tethereal and tcpdump use to capture
packets.

(I suspect the problems may be one or more of

    1) the "copy-on-write" protocol not being followed, so that if
       the code that processes received packets modifies the packet
       data, it doesn't make a copy first, as it should so that it
       doesn't step on the sbuf being handed to the
       PF_PACKET/SOCK_PACKET socket;

    2) outgoing packets being switched off to the "taps"
       (PF_PACKET/SOCK_PACKET sockets that see all packets, including
       outgoing ones) before they're completed and ready to send out
       on the wire;

    3) incoming packets having stuff done to the link-layer headers
       (including VLAN headers) before being handed to the rest of
       the networking stack - most sockets won't have a problem with
       this, as they don't look at the link-layer headers, but the
       sockets that just pass the packets on to libpcap end up
       passing mangled packets along.

I haven't checked the code for any of this, but we've definitely seen
mangled NFS packets and mangled Appletalk packets in captures on Linux,
and may have seen other manglings.)
On the topic of VLAN problems with packet sniffer programs, here's some
mail to the tcpdump-workers mailing list; the problem he saw may be a
problem with the Linux VLAN code as well:

Date: Tue, 03 Apr 2001 17:36:39 +0000
From: Ed Stevens <estevens@xxxxxxxxxxxxxxxxxx>
Organization: Atreus Systems
To: Guy Harris <guy@xxxxxxxxxx>
CC: "tcpdump-workers@xxxxxxxxxxx" <tcpdump-workers@xxxxxxxxxxx>,
    Brian Bauer <bbauer@xxxxxxxxxxxxxxxxxx>
Subject: Re: [tcpdump-workers] ARP and VLANS

Guy Harris wrote:
>
> > This is what a VLAN packet from tcpdump should look like...
> >
> > 19:27:35.522141 ff:ff:ff:ff:0:e0 Broadcast 988e 346:
> >     df36 0800 4500 0148 c503 0000 8011 74a2
> >     0000 0000 ffff ffff 0044 0043 0134 3cfc
> >     0101 0600 555d e627 d04f 0000 0000 0000
> >     0000 0000 0000
>
> I don't know what happened to the mail I'd sent out about this from home
> last night, but it didn't show up here at work, or in the tcpdump.org
> archive, so I'll reconstruct the important part.
>
> The "988e" in the first line indicates that the Ethernet-frame dissector
> saw an Ethernet type field of hex 988e in the frame header, meaning the
> frame had:
>
>     6 bytes that the dissector interpreted as the destination
>     address;
>
>     6 bytes that the dissector interpreted as the source address
>
>     2 bytes with 988e in them.
>
> 988e is *NOT* a valid Ethernet type value for a VLAN frame (the right
> value is hex 8100), so, not surprisingly, tcpdump doesn't dissect it as
> a VLAN frame.
>
> The destination address is "Broadcast", which means ff:ff:ff:ff:ff:ff;
> the source address is claimed to be ff:ff:ff:ff:00:e0, which looks a bit
> bogus.
>
> So the frame, as captured by tcpdump, was:
>
>     ffff ffff ffff ffff ffff 00e0 988e df36
>     0800 4500 0148 c503 0000 8011 74a2 0000
>     0000 ffff ffff 0044 0043 0134 3cfc 0101
>     0600 555d e627 d04f 0000 0000 0000 0000
>     0000 0000
>
> > This is what an "arp for gateway" VLAN packet looks like from
> > tcpdump:
> >
> > 19:18:14.382141 ff:ff:ff:ff:0:e0 Broadcast 2994 64:
> >     a159 0806 0001 0800 0604 0001 00e0 2994
> >     a159 2021 2223 0000 0000 0000 2021 2202
> >     0000 2910 0001 0000 0000 0001 2046 4446
> >     0000
>
> Same here; hex 2994 isn't a valid Ethernet type for a VLAN frame,
> either, so tcpdump, quite correctly, doesn't dissect that frame as a
> VLAN frame.
>
> The frame was:
>
>     ffff ffff ffff ffff ffff 00e0 2994 a159
>     0806 0001 0800 0604 0001 00e0 2994 a159
>     2021 2223 0000 0000 0000 2021 2202 0000
>     2910 0001 0000 0000 0001 2046 4446 0000
>
> If you remove the ffff ffff from the beginnings of those frames, you get
>
>     ffff ffff ffff 00e0 988e df36 0800 4500
>     0148 c503 0000 8011 74a2 0000 0000 ffff
>     ffff 0044 0043 0134 3cfc 0101 0600 555d
>     e627 d04f 0000 0000 0000 0000 0000 0000
>
> and
>     ffff ffff ffff 00e0 2994 a159 0806 0001
>     0800 0604 0001 00e0 2994 a159 2021 2223
>     0000 0000 0000 2021 2202 0000 2910 0001
>     0000 0000 0001 2046 4446 0000
>
> The first of those frames looks like an IP packet, sent from
> 00:e0:98:8e:df:36 to ff:ff:ff:ff:ff:ff (i.e., a broadcast frame); the
> frame type is 0800, meaning it's an IPv4 frame, and the next byte is
> 45, which is "IP version 4, header length 5 4-byte words", in an IP
> header - i.e., an IP header with no options.
>
> There appears to be "ffff ffff" in that packet; I'll bet that's the
> destination IP address of that frame (i.e., an IP broadcast).
>
> The second of those frames looks like an ARP packet, sent from
> 00:e0:29:94:a1:59 to ff:ff:ff:ff:ff:ff (i.e., a broadcast ARP packet);
> if you parse the ARP packet, you find that the frame is an ARP request,
> with the hardware addresses being Ethernet addresses and the protocol
> addresses being IP addresses, with the source Ethernet address being
> 00:e0:29:94:a1:59, and the target Ethernet address being
> 00:00:00:00:00:00 (because it's unknown - that's what the request is
> trying to find).
>
> So it appears that the real problem is either that
>
>     1) some device is putting bogus frames onto the Ethernet, with
>        an extra "ffff ffff" stuck at the front
>
> or
>
>     2) the Linux networking code is somehow mangling those frames,
>        perhaps doing so while handling VLAN stuff.
>
> I didn't see any obvious place in the 2.4.2 kernel where there's an
> 802.1Q implementation (which I find a bit surprising), so, for now, I
> suspect 1) might be the case.
>
> I'd suggest putting some other network analyzer on that network -
> whether it's a PC running some OS other than Linux plus some network
> analyzer program (a BSD PC running a libpcap-based program such as
> tcpdump or Ethereal or Ksnuffle, or a Solaris PC running any of the
> preceding programs or running snoop, or a Windows PC running WinDump or
> Analyzer or Ethereal or any of the zillions of commercial analyzer
> programs), or a specialized device (which, admittedly, are often just
> PCs running DOS or Windows plus some network analyzer software).
>
> If that analyzer has the same problem, something probably really is
> putting bogus frames on the wire.
>
> If that analyzer *doesn't* have the same problem, then Linux's
> networking code is probably mangling the frame somewhere after
> receiving it.

Many thanks for the feedback. It was tremendously helpful and
instructive at a time when I was trying to get a grip on the problem.
Having run some tests and analyzed the results, I agree fully with your
observations.
I am attaching a file that I have also sent to the Linux VLAN mailing
list. Additional input from anyone is always appreciated!

--
Ed Stevens
Senior Software Designer, Atreus Systems Corporation
(613) 233-1741 x226
http://www.atreus-systems.com

Question with regard to VLAN 0.15:
==================================

An ARP message carried in a VLAN 802.1Q tagged frame appears to get a
4-byte prefix added to it under certain conditions. Here is the
scenario:

Below is the tcpdump output for an ARP request from a laptop whose
ethernet adapter has mac address 00:E0:29:94:A1:59. The laptop is
configured with a static IP of 3.3.3.3. It is sending an ARP request
for one of its DNS servers at 3.3.3.5.

The laptop is plugged into a VLAN port on a Cisco Catalyst 2900 XL
switch, whose VLAN trunk goes into eth1 on a Linux (Red Hat 6.2)
server. Ben Greear's VLAN 0.15 patch has been applied to the kernel,
and configured to match the ports on the Cisco switch.

To test, I start tcpdump 3.6.2 on the server, and fire up a browser on
the laptop. Instead of printing the ARP request from the browser for
its DNS server, tcpdump prints the following:

19:29:17.246869 ff:ff:ff:ff:0:e0 Broadcast 2994 64:
    a159 0806 0001 0800 0604 0001 00e0 2994
    a159 0303 0303 0000 0000 0000 0303 0305
    0002 2910 0001 0000 0000 0001 2045 4246
    0000

With or without the 802.1Q tag, we expect the first 12 bytes of an ARP
packet to consist of a destination mac and a source mac, as follows:

    FFFF FFFF FFFF 00E0 2994 A159

But somehow an extra four bytes (FFFF FFFF) have been shoved onto the
front of the packet, making it unintelligible. The destination is
therefore still broadcast (FFFF FFFF FFFF), but now the "source"
address has become FFFF FFFF 00E0. Worse still, the frame type has
become 2994 (...yuck...) instead of the expected 0806 (for ARP) or 8100
(for 802.1Q VLAN). The hardware type is now A159 (...double yuck...)
instead of the expected 0001. No wonder packet sniffers choke on this.
The rest of the packet appears to be the "real" ARP request, right
through to the target IP of the DNS server (0303 0305).

Interestingly, ARP requests coming from the server are printed
correctly, because they are not encapsulated in 802.1Q frames.

I sniffed the packets on the trunk line between the Cisco switch and
the Linux box, using Ethereal for Win32. The 802.1Q-tagged packets that
contained ARP messages from the laptop were correct in every respect. I
wish that I had a Linux laptop handy; then I could have run tcpdump as
well at that spot! (I might install WinDump, for a second opinion...)
In fact, the non-tagged ARPs from my sniffer host (another statically
configured laptop) were printed by tcpdump with no problem.

Because Ethereal uses libpcap, and tagged packets were shown exactly as
they should be, this appears to rule out a bug in libpcap (version
0.6.2).

When tethereal is run instead of tcpdump on the server, it prints the
same kind of mangled ARP packet...

3.860000 ff:ff:ff:ff:00:e0 -> ff:ff:ff:ff:ff:ff 0x2994 Ethernet II

   0  ffff ffff ffff ffff ffff 00e0 2994 a159   ............)..Y
  10  0806 0001 0800 0604 0001 00e0 2994 a159   ............)..Y
  20  0303 0303 0000 0000 0000 0303 0305 000a   ................
  30  0110 0001 0000 0000 0000 2046 4846 0000   .......... FHF..

This packet, like the one printed by tcpdump, would make sense as an
ARP message if the extra "ffff ffff" was removed from the start.

When a statically configured laptop is on a non-VLAN port, no problems
occur, because the ARP messages (such as self-ARPs on bootup) are not
sent down the trunk wrapped in a trunking protocol. But when the ARP
messages are tagged as 802.1Q, something in the Linux box inflicts
grievous bodily harm to the packets.

I should mention that this is not Cisco-specific; the same results have
been noted with a 3Com switch.

The next step in my investigation will probably involve some kernel
debugging, to see whether this is a bug in VLAN 0.15.
I plan to double-check the way that VLAN has been configured on the
server, in case something has been overlooked. I will also need to
study the documentation carefully, since I did none of the original
setup, integration or testing of VLAN 0.15 on the Linux box. (I'm new
to this.)

By the way, each entry in /proc/net/vlan already has the REORDER_HDR
flag set to 1.

Now that I have admitted my ignorance, does anyone have any idea where
the extra four bogus bytes come from?
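The four-byte-shift diagnosis discussed in this message can be checked mechanically. What follows is a minimal Python sketch, not part of the original thread: it takes the 64 bytes from the tethereal hex dump quoted above, parses the Ethernet header as captured, then drops the first four bytes and parses again. The helper names (parse_ethernet, parse_arp) are made up for illustration, and the sketch assumes plain Ethernet II framing and ARP for IPv4 over Ethernet; it says nothing about where in the Linux stack the extra bytes come from.

```python
import struct

# Frame bytes copied from the tethereal hex dump quoted in the message.
CAPTURED_HEX = """
ffff ffff ffff ffff ffff 00e0 2994 a159
0806 0001 0800 0604 0001 00e0 2994 a159
0303 0303 0000 0000 0000 0303 0305 000a
0110 0001 0000 0000 0000 2046 4846 0000
"""

def fmt_mac(raw):
    """Format 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)

def fmt_ip(raw):
    """Format 4 raw bytes as a dotted-quad IPv4 address."""
    return ".".join(str(b) for b in raw)

def parse_ethernet(frame):
    """Hypothetical helper: split an Ethernet II header (no VLAN tag)."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {"dst": fmt_mac(dst), "src": fmt_mac(src),
            "type": ethertype, "payload": frame[14:]}

def parse_arp(payload):
    """Hypothetical helper: parse a 28-byte ARP body (Ethernet/IPv4)."""
    htype, ptype, hlen, plen, oper, sha, spa, tha, tpa = struct.unpack(
        "!HHBBH6s4s6s4s", payload[:28])
    return {"htype": htype, "ptype": ptype, "oper": oper,
            "sender_mac": fmt_mac(sha), "sender_ip": fmt_ip(spa),
            "target_mac": fmt_mac(tha), "target_ip": fmt_ip(tpa)}

# Strip all whitespace before decoding so the multi-line dump is accepted.
captured = bytes.fromhex("".join(CAPTURED_HEX.split()))

# As captured: the bogus source address and type field seen by the sniffers.
as_captured = parse_ethernet(captured)

# With the extra four bytes removed: a normal broadcast ARP request.
shifted = parse_ethernet(captured[4:])
arp = parse_arp(shifted["payload"])

if __name__ == "__main__":
    print("captured:", as_captured["src"], "->", as_captured["dst"],
          hex(as_captured["type"]))
    print("shifted: ", shifted["src"], "->", shifted["dst"],
          hex(shifted["type"]))
    print("ARP oper", arp["oper"], "who-has", arp["target_ip"],
          "tell", arp["sender_ip"])
```

Parsed as captured, the frame has source ff:ff:ff:ff:00:e0 and type 0x2994, matching the tethereal summary line. With the first four bytes dropped, the type field becomes 0x0806 and the body parses as an ARP request from 00:e0:29:94:a1:59 (3.3.3.3) for 3.3.3.5, which is what the laptop was expected to send.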