Ethereal-dev: Re: [Ethereal-dev] netxray.c patch

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Guy Harris <gharris@xxxxxxxxx>
Date: Sun, 02 Jan 2005 19:46:01 -0800
Kevin Johnson wrote:
FIX FOR NETXRAY TIMESTAMP DISPLAY PROBLEMS

This note is to describe / explain a fairly major rewrite of those portions of netxray.c concerned with assigning the correct timeunits (time ticks). The summary explains the actual changes. This is followed by a more wordy narrative on the reasoning followed and the testing performed. Finally, a diff of our proposed changes is attached for testing and potential inclusion in the Ethereal release code.

SUMMARY: The current netxray.c file defines the "parsing" instructions for reading files taken with name brand Sniffer products (which have been based on NetXray code for some time). There is a problem with the way the variable "timeunit" is determined, namely that there are about 4 different pre-selected values that can be assigned based on various features looked for in the capture file, and in other places even more timeunit numbers are used in variables that are part of the checking.

It turns out all this is unnecessary - the actual correct timeunit
value is recorded in the capture files by the sniffer. This is a 32-bit
value read in from offset 0x4c, reversing the order of the hex bytes as usual for this
file (example, displayed as 100f963b but read as 3b960f10).

"Reversing the order of the bytes" is rarely the right thing to do with a capture file format, as most of them are written in a standard byte order, but Ethereal doesn't run on machines with any particular byte order - most machines running it might be little-endian x86 PC's, but I'm running it on a big-endian PowerBook. "Reversing the order of the bytes", if correct on a little-endian machine, would be wrong on a big-endian machine, and *vice versa*.

(One exception is libpcap format, which isn't written in a standard byte order - it's written in the byte order of the machine doing the capture, with the magic number being written in that order, so that the byte order in which it's written can be determined.)

The right thing to do, in cases where the file is written in a standard byte order, is to convert from that byte order (whether it's big-endian or little-endian for the file format in question) to the host byte order.

The other multi-byte integral fields in NetXRay format are little-endian - not surprisingly, as NetXRay and Windows Sniffer are Windows applications running on little-endian PCs - so the right way to extract the value of that field is probably to use "pletohl()". Using that, rather than just using the value as extracted from the file directly, causes at least one file to dissect with time stamps closer to the ones that the old code generated.

(Note that this is less "reversing the order of the bytes" than "packing the bytes into a 32-bit quantity in a particular fashion", i.e. processing the 4 bytes of a big-endian 4-byte quantity means putting them, in the order they appear in memory, into the 4-byte quantity from the most significant byte to the least significant byte, and processing the 4 bytes of a little-endian 4-byte quantity means putting them into the 4-byte quantity from the least significant byte to the most significant byte.)
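
As a minimal sketch of that "packing" for the little-endian case (this is a hypothetical stand-alone helper, "le_bytes_to_u32()", written for illustration; it is not Ethereal's actual "pletohl()" macro), the bytes are combined by shifting, so the result is the same regardless of the byte order of the machine running the code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (name is an assumption, not an Ethereal API):
 * assemble a 32-bit value from 4 bytes stored in little-endian order.
 * The first byte in memory is the least significant, the last byte is
 * the most significant. No byte swapping is involved, so this gives
 * the same answer on little-endian and big-endian hosts.
 */
uint32_t le_bytes_to_u32(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

With the example from the original note, the bytes 10 0f 96 3b as they appear in the file come out as the value 0x3b960f10.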

We ended up only reading in 12 bytes to xxb[], then grabbing a 32-bit value for a new variable
"realtick", and grabbing the remaining 48 bytes into the new array variable "xxj[]".

By "variable" you presumably mean "structure member".

"xxc", rather than "xxj", would fit the pattern used for those fields (the first unknown field was named "xxx", with the next one being named "xxy", wrapping from "xxz" to "xxa" and continuing with "xxb".)

The only Sniffer captures where timestamp display is not improved are
ATM captures made with the older ATM pods - and these are no worse than
in the existing version of Ethereal - they simply have not changed.

So the "realtick" value in those captures is the same as the value that the old code chose? What's the file format version number in those files, and what's the "capture type" (xxj[4]/xxc[4]) value for those files? Are the time stamps wrong because the units are wrong (in which case the amount by which the time stamps are off would change from packet to packet), or because the start time is wrong (in which case all the time stamps would be off by the same amount)?

Do you have any captures of that sort that you could send us?

In one ATM capture I have, the timestamp units the old code used were microseconds while the value from realtick is 1/1193180 seconds, so it does make a difference to at least one ATM capture. The "capture type" value is 15, which might refer to some form of ATM pod.

The initial problem seemed simple - netxray.c was looking at a
particular byte to figure out what kind of sniffer trace
this was. A value of "02" in that byte would have given us a timeunit of
a billion (1e9), but our sniffer was inserting a "06" in that byte (this
seemed to be interpreted as an ISDN HDLC capture). This gave us a time
unit of a million (1e6). Editing the file to replace the 06 with 02 caused
Ethereal to "fix" the time problem, but this did not solve the problem.

Did it cause Ethereal to misinterpret the capture?

Note that hdr.xxc[4] appears to indicate in some sense *how* the capture was done - a value of 0 appears to mean that the capture was done purely in software by tapping into the driver using NDIS, and most other values indicate the type of hardware pod used to do the capture and, for WAN captures, the type of traffic (which might sometimes be the type of pod or might be the way the pod was configured). A value of 6 *when the capture is a WAN capture* indicates that it was an "HDLC" capture, either X.25 or ISDN - another field in the header indicates what type of HDLC it was. Ethereal currently only interprets a value of 6 as meaning an HDLC capture if the network type is 4, meaning a WAN capture.

It might be that the interpretation of that field is *supposed* to depend on the network type, with a value of 6, for an Ethernet capture, referring to a particular model of gigabit pod, different from the value of 2 for other gigabit pods.

It was reading comments in the source code regarding previous use of
the value 1193180 that triggered the discovery of the location of the
actual time tick value. In a slightly older capture file I found that value
(in the reverse-order hex) encoded at offset 0x04c. I then checked my new
capture files (which need a time tick closer to one billion) and found that
the hex value there equates to around 999,970,000 (changes a little bit in
some captures).

Cool!  Thanks for the reverse-engineering.

We then postulated that Sniffer has probably been using the same
location for encoding the time ticks value for a long time, and that it
might prove fruitful to try using that value ALL the time. We did this,
and tested against many captures taken with various versions of the Sniffer
code over a 7 year period, and including ethernet, gigabit ethernet,
token ring, FDDI, and various WAN types. We do not seem to have broken or
worsened Ethereal's ability to display the correct time for any of these,
and in most cases have at least slightly improved the accuracy (as compared
with the times displayed by the brand-name Sniffer product). In the process
we have also fixed the issue with newer gigabit captures being off by a factor of 1000.

I have a file with a version of 001.100 that has a time stamp in units of microseconds - the new code gets the time stamps very wrong (as in "thinks the packets were captured in 1932"), as the bytes at the "realtick" location in the file header don't appear to correspond to time stamp units. I also have a file that apparently came from an old version of NetXRay, before it used "XCP" as the magic number, with a similar problem...

...and a file with a version of 002.001 with a similar problem.

In the 002.001 file, the realtick value is 0, which is clearly bogus. The "capture type" value is 0, for NDIS; however, I have one capture with a network type of 0 (Ethernet) and capture type 0 (NDIS), so there *are* NDIS captures with non-zero realtick values (and the value is close to the value derived from the TpS table, so it's probably valid).

Do you have any files with non-002.xxx versions on which you've tested? If not, the time stamp units in the header might have been introduced in the 002.xxx file format (the file format version number doesn't directly correspond to the application version number) - older files appear to have millisecond time stamps, with version 001.100 introducing microsecond time stamps, so version 002.xxx might be where they switched to time stamps with the units recorded in the file header.

I don't know whether there's any field that indicates whether to get the time stamp from the realtick value or the timeunits value, other than the realtick value itself (if 0, use timeunits).
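
A minimal sketch of that fallback, just to make the idea concrete (the function and parameter names here are hypothetical, and this is not the code that was checked in): use the header value when it's non-zero, otherwise fall back on the value the old table-based code would have picked.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 *   realtick          - ticks-per-second value read from the file header,
 *                       already converted to host byte order
 *   fallback_timeunit - ticks-per-second value chosen the old way, from
 *                       the version and capture type
 * A header value of 0 can't be a valid tick rate, so treat it as
 * "not recorded" and use the fallback instead.
 */
double choose_ticks_per_second(uint32_t realtick, double fallback_timeunit)
{
    if (realtick == 0)
        return fallback_timeunit;
    return (double)realtick;
}
```

Note that this only catches the "realtick is 0" case; it wouldn't help with files, such as the 001.100 one mentioned above, where the bytes at that offset are non-zero but don't appear to be time stamp units.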

Note that the "capture type" field also only appears to be in the 002.xxx file format.

As an interesting side note, there is a check being performed to determine whether or not
to honor the FCS bits at the end of each packet.  The check is based on the hex values of
two bytes - right in the middle of the time stamp.  It turns out these two bytes are only
valid if the "real" time unit is near the 1193180 value, so it is likely Ethereal has been
ignoring the FCS information when some folks think it has been used.

Do you mean "ignoring the FCS information when it's present" or "treating the data at the end of the packet like an FCS when it's not an FCS"?

Do you have examples of captures when there's no FCS but Ethereal treats the capture as if packets had an FCS? If so, what's the hex value of the time stamp value?

We did not attempt to fix this but did our best to duplicate the functionality
of the original check.

Unfortunately, an exact comparison of the time stamp value causes Ethereal not to see the FCS in some captures. As the "realtick" field can't be directly used as a 32-bit binary value - it has to be extracted with "pletohl()", so that it'll be handled correctly on big-endian machines - I just turned it into a 4-byte array, and reverted to checking the middle two bytes of the time stamp.

It might be that "near the 1193180 value" means "in the range 1192960 through 1193215", or perhaps there's a narrower range.
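
To show where that range comes from: 1193180 is 0x001234DC, so in a little-endian 4-byte field the middle two bytes are 0x34 and 0x12, and any value from 0x00123400 (1192960) through 0x001234FF (1193215) has the same middle two bytes. A small sketch, with hypothetical function names and assuming the field is held as a 4-byte array in file order:

```c
#include <stdint.h>

/*
 * Hypothetical helpers, for illustration only.
 *
 * middle_bytes_match() looks at bytes 1 and 2 of a 4-byte little-endian
 * field, ignoring the lowest and highest bytes.
 */
int middle_bytes_match(const uint8_t field[4])
{
    return field[1] == 0x34 && field[2] == 0x12;
}

/*
 * in_tick_range() is the equivalent test on the assembled 32-bit value,
 * provided the most significant byte is 0: 0x00123400 is 1192960 and
 * 0x001234FF is 1193215.
 */
int in_tick_range(uint32_t value)
{
    return value >= 0x00123400u && value <= 0x001234FFu;
}
```

The two tests differ only in that the byte check doesn't look at the most significant byte, so it would also accept values such as 0x011234DC.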

In any case that's a bit bizarre. Perhaps there's a field in the file header that's the *real* indication of whether packets might have an FCS; 0x12 and 0x34 are somewhat odd values, so *maybe* that's a magic number, but 0x00123400 also just happens to be in the general range of units for time stamps in at least some old DOS Sniffer files, so perhaps the 0x12 0x34 is just a coincidence.

In addition to the FCS check problem, we need to address some other
issues including the actual meaning of the Capture Type byte, which
used to be xxb[20] and is now xxj[4] - since the new gig sniffers are
writing a "6" there, we have some bad assumptions in the code that may
have other impacts for how we're interpreting the code.

It's probably an indication of how the packets were delivered to the Sniffer, whether by tapping into a driver via NDIS (no hardware) or by using some form of pod (with the type of pod, and possibly a network type setting for WAN pods, being indicated by that field).

The interpretation might depend on the network type (annoying, given that they're not exactly running out of type values, so it's not as if they *had* to reuse a value of 6 for the new pod).

One issue is the "start timestamp" issue - the old code assumed a start time stamp of 0 if:

	the network type is 1 (Ethernet);

	the capture type is 2 (gigabit pod, but not the new S6040 one);

	the version number is 002.002.

Your new code is doing it if hdr.timeunit isn't 2. I'm leaving that test as is, for now - are there any captures that don't have those values, but that have a start time stamp of 0?

Perhaps this is a function of pod vs. NDIS, or perhaps only some pods have a start timestamp of 0?

I've checked in the patch, with the modifications indicated above. Thanks for being the ones to finally figure out where the hell the Sniffer folks hid the time stamp!