Wireshark-users: [Wireshark-users] tshark overrun?

From: Eric Ewanco <Eric.Ewanco@xxxxxxxxxxx>
Date: Wed, 16 Nov 2011 14:51:42 +0000

I’m seeing a very strange problem and I’m curious to see if anyone has run across it.

We’re trying to do a simple loopback test: generate 1000 UDP packets using pacgen at 50 pps, loop them back, and count them with tshark (1.4.2-1.1-1).

The problem is that under most circumstances tshark ignores about 379 packets; that is to say, it counts 621 packets and stops, even though the packets are received at the interface. Often it will work one time and then have trouble on subsequent runs.

The problem appears to be timing-related (see below). My version of tshark can’t seem to handle bursts of tightly spaced packets under some circumstances. Is there a known bug with a source patch that doesn’t require upgrading our Linux distribution (OpenSuSE 11.1)?

The “lost” packets are counted by ifconfig and not accounted for by any of the dropped counters in /proc/net/snmp or other places in /proc/net, but tshark doesn’t recognize them. The capture file corresponds to the printed counts; the lost packets do not show up there. I can even capture on the transmit side and find that the receive side receives more packets than tshark counts on the transmit side. However, if I monitor both transmit and receive, the dynamics change and it almost works; I lose only about a dozen packets.
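The comparison between the interface counters and tshark's count can be made mechanical. What follows is only a minimal sketch, not part of the original test setup: the function names are hypothetical, and the sample counter values are made up so that the difference matches the 379 missing packets described above. It assumes the usual /proc/net/dev layout (two header lines, then one line per interface whose receive columns begin with bytes, packets, errs, drop).

```python
def parse_proc_net_dev(text):
    """Return {iface: {"rx_packets": int, "rx_drop": int}} from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, fields = line.split(":", 1)
        values = fields.split()
        if len(values) < 4:
            continue
        # Receive columns start with: bytes packets errs drop ...
        counters[name.strip()] = {
            "rx_packets": int(values[1]),
            "rx_drop": int(values[3]),
        }
    return counters


def missed_by_capture(before, after, iface, captured):
    """Packets the interface counted during the run that the capture did not see."""
    received = after[iface]["rx_packets"] - before[iface]["rx_packets"]
    return received - captured


# Made-up snapshots for illustration; on a real system, read /proc/net/dev
# once before and once after the test run instead.
HEADER = (
    "Inter-|   Receive                    |  Transmit\n"
    " face |bytes packets errs drop fifo frame compressed multicast|"
    "bytes packets errs drop fifo colls carrier compressed\n"
)
SAMPLE_BEFORE = HEADER + "  eth5: 50000 500 0 0 0 0 0 0 40000 400 0 0 0 0 0 0\n"
SAMPLE_AFTER = HEADER + "  eth5: 150000 1500 0 0 0 0 0 0 40000 400 0 0 0 0 0 0\n"

before = parse_proc_net_dev(SAMPLE_BEFORE)
after = parse_proc_net_dev(SAMPLE_AFTER)
print(missed_by_capture(before, after, "eth5", captured=621))  # 379 with these made-up counters
```

A nonzero result with rx_drop unchanged would match the symptom reported here: the kernel counted the packets, did not report dropping them, and the capture still never saw them.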

If I use tcpdump instead of tshark, everything is copacetic. It also works fine when I transmit an identical packet to tshark using a different program (hping3). Dumpcap works better but still loses packets on some runs (drops of 28, 12, and 70 packets across five runs of 1000 packets).

Command line:

tshark -i eth5 udp -c 1000 -w /tmp/eth5.cap
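The file written by -w can also be counted independently of tshark's own packet counter. The sketch below is illustrative only and assumes the file is in classic libpcap format (24-byte global header, 16-byte per-record headers) rather than pcapng; the function names are hypothetical, and the in-memory sample capture exists purely for demonstration.

```python
import io
import struct

# Magic number bytes as they appear on disk, mapped to the struct byte-order prefix.
_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def count_pcap_records(stream):
    """Count packet records in a classic libpcap capture read from a binary stream."""
    global_header = stream.read(24)
    if len(global_header) < 24 or global_header[:4] not in _MAGICS:
        raise ValueError("not a classic libpcap file")
    order = _MAGICS[global_header[:4]]
    count = 0
    while True:
        record_header = stream.read(16)
        if len(record_header) < 16:
            break  # end of file, or a truncated trailing record header
        _ts_sec, _ts_frac, incl_len, _orig_len = struct.unpack(
            order + "IIII", record_header
        )
        data = stream.read(incl_len)
        if len(data) < incl_len:
            break  # truncated final packet; do not count it
        count += 1
    return count


def build_sample_capture(num_packets, payload=b"\x00" * 4):
    """Build a tiny in-memory little-endian capture, for demonstration only."""
    global_header = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    record = struct.pack("<IIII", 0, 0, len(payload), len(payload)) + payload
    return io.BytesIO(global_header + record * num_packets)


print(count_pcap_records(build_sample_capture(621)))  # 621
```

On a real capture the stream would come from something like open("/tmp/eth5.cap", "rb"). If this count agrees with tshark's printed count, the packets were lost before they reached the file, not while tshark was tallying them.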

It seems to me there has to be some sort of timing issue. According to the capture log, the first 50 packets are received over the course of 470 us, about 5-15 us apart. Then the pacgen transmitter waits about one second plus 100 us and transmits another 50. When I do a flood ping (which works fine), the rate is much lower, every 200 ms or so. If I do a flood ping with hping3 (which works, at least until I get to hundreds of thousands of packets, and then it loses only 0.05%), it sometimes has a similar gap to pacgen, but it doesn’t sustain it and leaves large gaps that pacgen does not leave. I did notice that if I slow pacgen down to 25 or 10 pps, it works more reliably, although even at 10 pps I’ve occasionally seen a loss.
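The burst structure described above can be checked from packet arrival times with a few lines of code. This is a rough sketch under stated assumptions, not the tool used for the measurements in this message: the function names are hypothetical, the 1000 us gap threshold is an arbitrary choice, and the timestamps are synthetic (20 bursts of 50 packets, 10 us apart, one second between burst starts), which only approximates the figures quoted from the capture log.

```python
def split_into_bursts(timestamps_us, gap_threshold_us=1000):
    """Group sorted arrival times (microseconds) into bursts.

    A new burst starts whenever the gap to the previous packet
    exceeds gap_threshold_us.
    """
    bursts = []
    for t in timestamps_us:
        if bursts and t - bursts[-1][-1] <= gap_threshold_us:
            bursts[-1].append(t)
        else:
            bursts.append([t])
    return bursts


def summarize(bursts):
    """Return a (packet count, duration in microseconds) pair for each burst."""
    return [(len(b), b[-1] - b[0]) for b in bursts]


# Synthetic arrival times: 1000 packets sent as 20 bursts of 50.
synthetic = [burst * 1000000 + i * 10 for burst in range(20) for i in range(50)]

bursts = split_into_bursts(synthetic)
print(len(bursts))            # 20
print(summarize(bursts)[0])   # (50, 490)
```

Feeding in timestamps from a real capture would show whether the packets tshark misses fall inside a burst or at a burst boundary, which might help narrow down the timing dependence.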

Platform is a customized OpenSuSE 11.1 on Intel. I verified this using stock Wireshark on a stock OpenSuSE 11.3 laptop with similar results.

We’ve tested 0.10.13, 1.2.4, 1.2.8, and 1.4.4-0.2-1 in addition to 1.4.2-1.1-1. Because of its dependencies we can’t use a later Wireshark, but if we can cherry-pick a fix we may do that. 0.10.13 and 1.2.4 seem to work fine; 1.2.8 and 1.4.4-0.2-1 fail.

I checked the bug database for bugs in any state whose summary or comments match the search terms “burst”, “drops”, “dropped”, “overrun”, “loses”, and “lost”. If you can suggest any others (I find this is a difficult bug to come up with keywords for), feel free to do so.

Thanks for any help.

bl-s1r1-24:~/hardware_test # tshark -v
TShark 1.4.2

Copyright 1998-2010 Gerald Combs <gerald@xxxxxxxxxxxxx> and contributors.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Compiled (64-bit) with GLib 2.18.2, with libpcap 0.9-PRE-CVS, with libz 1.2.3,
with POSIX capabilities (Linux), with libpcre (version unknown), without SMI,
without c-ares, with ADNS, with Lua 5.1, without Python, with GnuTLS 2.4.1, with
Gcrypt 1.4.1, with MIT Kerberos, without GeoIP.

Running on Linux 2.6.34.4-gb05, with libpcap version 0.9-PRE-CVS, with libz
1.2.3.

Built using gcc 4.3.2 [gcc-4_3-branch revision 141291].