Ethereal-dev: R: R: [tcpdump-workers] Re: R: [Ethereal-dev] Re: Fwd: kyxtech: freebsd outsniffed by wintendo !!?!?


From: "Loris Degioanni" <loris@xxxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 19 Dec 2000 14:57:18 +0100
Hi.

-----Original Message-----
From: Michael T. Stolarchuk <mts@xxxxxxxxxx>
To: Loris Degioanni <loris@xxxxxxxxxxxxxxxxxxxxxxx>
Sent: Thursday, 14 December 2000 16:39
Subject: Re: R: [tcpdump-workers] Re: R: [Ethereal-dev] Re: Fwd:
kyxtech: freebsd outsniffed by wintendo !!?!?
>
> ah, but the buffer sizes are fixed, and when the second buffer
> is full, packets are lost.  Yes, the tap runs at a higher priority
> than the buffer, but that alone doesn't guarantee you won't
> see packet loss.
>
> (btw: I can confirm that behavior because I've had to work with it...
> I'm familiar with these effects since I wrote the nfrd sniffing
> and protocol decomposition stack)
>
> Or saying it another way: if you increase the buffer sizes, say
> to 1M each, and you're capturing, say, a completely saturated 100Mbit
> link, which means 12.5 Mbytes/sec, you have to get the copy out of
> bpf to process space within 1M / 12.5 Mbytes/sec = 80 milliseconds.
>
>
> By copy rates, that's a long time.  But typical BPF sleep
> priorities are LOW, which means that other processes compete
> with the bpf process's restart to gain the processor.  (As
> I recall, that has been fixed in a few architectures.)  So if
> bpf is run on a loaded machine (ie: a typical intrusion detection
> system) you still see periodic packet loss.  That also partially
> explains why just test-sniffing the traffic isn't sufficient to test
> a platform for its ability to perform a decent job at ids...

Ok, but I was testing only the capture performance, with no other
process running at user level. In this situation, increasing the size of
the buffer from 32k to 1M should give considerably better performance.
This happens regularly on Windows but, very strangely, does not seem
to happen on FreeBSD.
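
For reference, a minimal sketch of how a larger buffer can be requested
through the BPF ioctl interface (device and NIC names here are only
examples; as I understand it, BIOCSBLEN must be issued before
BIOCSETIF, and the kernel clamps the request to its configured
maximum):

    #include <sys/types.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <net/bpf.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/bpf0", O_RDWR);       /* example device */
        if (fd < 0) { perror("open"); return 1; }

        u_int blen = 1024 * 1024;                 /* ask for 1M */
        if (ioctl(fd, BIOCSBLEN, &blen) < 0)      /* before BIOCSETIF */
            perror("BIOCSBLEN");

        struct ifreq ifr;
        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, "fxp0", sizeof(ifr.ifr_name)); /* example NIC */
        if (ioctl(fd, BIOCSETIF, &ifr) < 0)       /* attach to interface */
            perror("BIOCSETIF");

        if (ioctl(fd, BIOCGBLEN, &blen) == 0)     /* size actually granted */
            printf("kernel buffer is now %u bytes\n", blen);

        close(fd);
        return 0;
    }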

> >`wintendo' sniffing is done in a way very similar to the one of BPF.
> >With the same buffer size, the number of context switches is
> >approximately the same.
>
> I'm sorry, but I don't see that in your paper.  Near the bottom of
> the paper it says that the windows sniffing buffers are 1M large.  There
> are *very few* bpf's with buffers that large.  In fact, in several
> kernels which I've used, multiple 1M kernel allocs for space will
> cause the kernel to hang indefinitely (due to multiple 1M vm space
> allocations).  I started my first reply with your text snippet noting
> the buffer size differences.

Sorry, my phrase was not clear. I speak English like Tarzan... :-)
I was trying to say that if you set the same buffer size in winpcap and
in BPF, you will obtain approximately the same number of system calls,
because the structure and the basic optimizations are the same. I say
'approximately' because this parameter is fixed in FreeBSD, while in
Windows it is possible to change it.
However, the default buffer sizes are different, and I confirm that the
buffer in Windows is usually 1M. Note that this size can be increased
to larger values without problems, and this seems to improve capture
performance roughly linearly. For example, on my Win2000 machine with
64M of RAM I am able to set a 40M kernel buffer, which gives very good
performance when dumping a 100Mbit Ethernet to disk.
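
In winpcap this is exposed to applications through the pcap_setbuff()
extension. A minimal sketch (assuming a winpcap build that exports this
call; the adapter name is only a placeholder):

    #include <stdio.h>
    #include <pcap.h>

    int main(void)
    {
        char errbuf[PCAP_ERRBUF_SIZE];

        /* adapter name is an example; real names look like
           "\\Device\\NPF_{...}" and vary per machine */
        pcap_t *p = pcap_open_live("\\Device\\NPF_...", 65535, 1,
                                   1000, errbuf);
        if (p == NULL) { fprintf(stderr, "%s\n", errbuf); return 1; }

        /* winpcap-specific call: grow the kernel buffer to 40M */
        if (pcap_setbuff(p, 40 * 1024 * 1024) != 0)
            fprintf(stderr, "pcap_setbuff: %s\n", pcap_geterr(p));

        /* ... capture or dump to disk as usual ... */
        pcap_close(p);
        return 0;
    }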

> Also, in the same article, there's no attempt to uncover the
> cause of the performance difference; I don't see measurements
> of context switch rates, number of kernel system calls, or
> number of interrupts.  If I have missed it somewhere please
> let me know.

Measuring these values can be quite complex, and we are not sure we
could do it properly in FreeBSD.
My opinion, however, is that the discrepancy in performance is due not
only to the number of system calls, but also to architectural
differences, for example:
- BPF is optimized to use small buffers, winpcap for big buffers. The
circular buffer architecture of winpcap is more efficient with a 1M
buffer (see the toy sketch after this list).
- DMA (or polling) transfers from the NIC driver to RAM are handled
more effectively by Windows than by FreeBSD.
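
To illustrate the circular buffer point, here is a toy
single-producer/single-consumer ring. This is NOT the actual winpcap
code, only a sketch of the general idea: the capture side appends
packets at the head, the read side drains everything buffered in one
pass, so a bigger buffer directly absorbs longer bursts without any
buffer swap:

    #include <stdio.h>

    #define RING_SIZE (1024 * 1024)     /* e.g. a 1M ring */

    static unsigned char ring[RING_SIZE];
    static size_t head = 0, tail = 0;   /* indices grow monotonically */

    /* producer: store one packet if there is room, else drop it */
    static int ring_put(const unsigned char *pkt, size_t len)
    {
        if ((head - tail) + len > RING_SIZE)
            return -1;                  /* buffer full: packet dropped */
        for (size_t i = 0; i < len; i++)
            ring[(head + i) % RING_SIZE] = pkt[i];
        head += len;
        return 0;
    }

    /* consumer: drain everything currently buffered in one pass */
    static size_t ring_drain(unsigned char *out, size_t max)
    {
        size_t n = head - tail;
        if (n > max) n = max;
        for (size_t i = 0; i < n; i++)
            out[i] = ring[(tail + i) % RING_SIZE];
        tail += n;
        return n;
    }

    int main(void)
    {
        unsigned char pkt[64] = {0}, out[4096];
        ring_put(pkt, sizeof(pkt));
        printf("drained %zu bytes\n", ring_drain(out, sizeof(out)));
        return 0;
    }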

> What I wish I had is a good tool to discover what is going on during
> the bpf packet loss.  I was hoping (a few years back) to instrument
> a kernel, so that instead of being able to profile the sniffing
> process via statistical information about clock ticks, I could instead
> collect statistics about what was going on during bpf packet loss
> (ie: when the bpf second buffer is full).  Turns out, that's hard
> to do, but I haven't forgotten how worthwhile such a hack would be...

Yes, it would be very interesting.
Another interesting thing, in my opinion, would be the development of a
standard benchmark to measure the performance of a capture
architecture/program. This would allow precise and impartial testing of
a capture system, and would give acceptable comparisons/references
among different systems.
Anyone interested in working on this?
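
As a very rough starting point, the skeleton could be something like
this (just a sketch built on standard libpcap calls; the hard part, a
reproducible traffic generator and load conditions, is not addressed
here, and the device name is an example):

    #include <stdio.h>
    #include <pcap.h>

    /* count packets delivered to user level, then compare with the
       kernel's own receive/drop counters from pcap_stats() */
    static unsigned long delivered = 0;

    static void counter(u_char *user, const struct pcap_pkthdr *h,
                        const u_char *bytes)
    {
        (void)user; (void)h; (void)bytes;
        delivered++;
    }

    int main(int argc, char **argv)
    {
        char errbuf[PCAP_ERRBUF_SIZE];
        const char *dev = (argc > 1) ? argv[1] : "fxp0"; /* example */

        pcap_t *p = pcap_open_live(dev, 68, 1, 1000, errbuf);
        if (p == NULL) { fprintf(stderr, "%s\n", errbuf); return 1; }

        /* capture a fixed number of packets under known offered load */
        pcap_loop(p, 100000, counter, NULL);

        struct pcap_stat st;
        if (pcap_stats(p, &st) == 0)
            printf("delivered=%lu  kernel recv=%u  kernel drop=%u\n",
                   delivered, st.ps_recv, st.ps_drop);

        pcap_close(p);
        return 0;
    }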

Loris.