Ethereal-dev: Re: [ethereal-dev] Re: Packet Sniffer Package

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Guy Harris <gharris@xxxxxxxxxxxx>
Date: Fri, 3 Mar 2000 22:34:06 -0800
> > Does anyone know if libpcap under Linux uses the new, improved capture
> > routines automagically, or simply uses the 'lame' interface ...?
> 
> The standard libpcap under Linux uses the oldest, lamest interface -
> SOCK_PACKET sockets with an address/protocol family of AF_INET/PF_INET.
> 
> A new one being done will use, on the Linux 2.2 and later kernels, the
> better mechanism that the 2.2 kernel added, namely SOCK_RAW sockets
> with an address/protocol family of AF_PACKET/PF_PACKET.  I forget
> whether the guy working on that checked it into the tcpdump.org CVS tree
> yet or not.

Torsten Landschoff is the person who's working on the new libpcap Linux
module; he's checked it into a branch of the tcpdump.org CVS tree for
libpcap, but it's a side branch - it's neither in the main branch, nor
the branch for an 0.5 release, yet.  I don't know whether it's intended
to go into an 0.5 release (or when an 0.5 release will come out).

> I don't know whether that's the 1-copy mechanism to which you're
> referring, though.

From looking at the code path from the Intel EEPro100 driver to the code
that dispatches received packets, it looks as if, for a packet that is
handed only to libpcap, or is handed to regular protocols as well as
libpcap but isn't modified by those protocols, the only copy that should
be involved is the copy to userland.

However, I think that should happen regardless of which interface is
used, even the oldest, lamest one, at least on 2.2 (I don't have
2.0[.x] kernel source handy right now).

In any case, the particular lameness that seemed to be discussed in the
mail thread you forwarded is the lack of a way of finding out how many
packets were dropped; that's not an issue of the number of copies
(except that too many copies might, in some situations, eat enough CPU
to cause packets to be dropped when they wouldn't have been dropped
without the copy).

It looks as if there's a global counter, "netdev_rx_dropped", of all
packets dropped in "netif_rx()", which appears to be the routine called
by device drivers to hand an incoming frame to the network stack.

The only way I can see to fetch that is what I presume is a "/proc"
entry; if so, a read from it returns various statistics, including
"netdev_rx_dropped".  This won't tell you how many of the packets that
a particular libpcap stream would have seen were dropped - they get
dropped before even being chosen to be handed to a particular stream.

The mechanism for handing raw packets up to userland is a socket; as one
might expect, the socket has a receive high water mark, and stuff gets
tossed if the socket's receive buffer is full and something comes in
(which would happen if the application using libpcap can't read stuff
fast enough).  It appears that no count is kept of packets discarded
because the socket buffer is too full.

Torsten's code doesn't provide any packet-drop count - and I'm not sure
it can reliably report such a count, as there doesn't appear to be a
count of packets dropped at the socket layer.

> Alexey Kuznetzov has a patch to add a mechanism
> that, as I understand it, lets the kernel and the application share a
> memory-mapped region, so that incoming packets don't have to get copied
> up to userland;

It doesn't look as if this eliminates the copy.

What it appears to do is provide a chunk of wired-down memory shared
between the kernel and userland, and copy incoming packets into that
area.  This does let it keep track of packet drops due to the shared
area being full of packets not yet processed by the userland code.

Packet drops in "netif_rx()" would have to be counted by getting the
value of "netdev_rx_dropped" at the start of the capture and at the end
of the capture, and adding the difference to the count of packet drops
due to "buffer full".  However, that would also count packets dropped
on interfaces other than the one on which you're listening, if the
machine has more than one interface.  Those drops happen when the
system is so busy that even the kernel code that handles packets
queued up at interrupt level and processed later can't keep up; I
don't know whether that's a common occurrence, but if somebody wants
to know about every single packet dropped, it's a problem.

Of course, I don't know whether, on a *really* busy system - too busy
to even drain the device's ring buffer in the interrupt handler -
packets dropped because they arrive when the device's ring buffer is
full are counted, or whether the device even tells you how many
packets were dropped due to that, so perhaps nobody does a *perfect*
job.

The BPF mechanism in the BSDs has its own buffer, and the link-layer
driver hands all incoming packets to BPF, so BPF can keep track of
every packet that gets dropped on a particular BPF device: the only
reason (other than "the device dropped it because the ring buffer is
full") why a packet is dropped on a BPF device is "the BPF device's
buffer was full".  So, whilst it may not handle the "device ring
buffer full" case (although, *if* the device reports how often that
happened, a mechanism could conceivably be provided to let the device
bump the drop count - no such mechanism exists, however), it does at
least handle all other drops.

(Alexey's patch also lets you pick up the time stamp for the packet
without making a second system call; the socket-based stuff requires you
to do an SIOCGSTAMP "ioctl" to get the time stamp.)

Obviously, said patch is a kernel patch, so libpcap cannot, by itself,
fix that problem.

> he also has patches to the old libpcap that use that
> mechanism if present, and otherwise use the 2.2-and-later mechanism if
> present,

It does appear to do that...

> otherwise, I think, fall back on the old 2.0 mechanism.

...but it doesn't do that (i.e., it's 2.2-and-later only).

> In addition, he says that some such mechanism was checked into the 2.3
> kernel at some point.

There is such a mechanism; it looks similar, and does involve a single
copy.

(None of these mechanisms implement the timeout mechanism that Ethereal
requires - and the lack of which we work around, on Linux, with a
"select()" - but, as the way you block waiting for a packet to arrive
with the shared-memory mechanism is you do a "poll()", I think they
*could* implement it as a timeout on the "poll()".)

The BPF mechanism in the BSDs currently requires two copies - the mbuf
chain for the incoming packet is copied to an internal buffer, and the
stuff from that buffer is copied up to userland on a read.  A
shared-memory mechanism similar to the Linux ones could be implemented,
I suspect.