Subject: Re: [Wireshark-dev] slow when loading big pcaps

From: Guy Harris <guy@xxxxxxxxxxxx>
Date: Mon, 25 Oct 2010 16:52:43 -0700
On Oct 25, 2010, at 12:25 PM, cco wrote:

> On Wed, Oct 20, 2010 at 04:07:22AM -0700, Guy Harris wrote:
>> 
>> On Oct 20, 2010, at 3:42 AM, cco wrote:
>> 
>>> why is wireshark so slow when loading up >500 MB pcaps?
>> 
>> Are you saying that the time taken to read a file, as a function of the size of the file, is discontinuous, with a jump at about 500 MB?
> 
> cristian: hi! I have not tested with continuous values of file sizes (I
> hope this is not becoming too mathematical...)

Well, if we want to get *mathematical*, you can't test with continuous values of file sizes, as e, for example, isn't a valid file size. :-)  (Well, maybe it is if your machine uses base e - it is, if I remember correctly, the base that requires the fewest bits, on average, to encode a number.  I'm still not sure what .718281828... of a byte would be. :-))

I guess a better-phrased question would be whether, for example, a 200MB file takes about twice as long to read as a 100MB file, a 300MB file about 3 times as long, a 400MB file about 4 times as long, and a 500MB file about 5 times as long, but a 600MB file takes significantly longer than 6 times as long.  (Or whether there's a discontinuity - in the informal sense, not the mathematical sense :-) - at some other value.)
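One quick way to get numbers for that - a rough sketch, nothing polished - is to time TShark reading captures of various sizes; TShark shares the file-reading and dissection code with Wireshark (though not the GUI's column handling), so it's only a proxy, but it should show the shape of the curve.  The file names below are placeholders for whatever captures you have lying around:

import os
import subprocess
import time

# Placeholder file names - substitute captures of various sizes.
captures = ["100mb.pcap", "200mb.pcap", "400mb.pcap", "800mb.pcap"]

for path in captures:
    size_mb = os.path.getsize(path) / (1024.0 * 1024.0)
    start = time.monotonic()
    # Read and dissect the whole file; discard the packet summaries.
    subprocess.run(["tshark", "-r", path],
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
                   check=True)
    elapsed = time.monotonic() - start
    print("%8.1f MB  %8.1f s  %.3f s/MB" % (size_mb, elapsed, elapsed / size_mb))

If the seconds-per-MB column stays roughly flat as the files get bigger, the time is linear; if it jumps at some size, that's the discontinuity I'm asking about.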

If so, that might be due to the working set size of Wireshark growing above the amount of memory available on the machine.  The main way to improve that would be to try to reduce the per-packet memory consumption of Wireshark; there are several ways that might be done, although some of them involve a significant amount of work.
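(As a very rough illustration, not a measurement: if a 2GB capture averages, say, 1KB per packet, that's on the order of 2 million packets; even a few hundred bytes of per-packet state and cached column text per packet adds several hundred MB of working set on top of the file data itself, which can be enough to push a machine with a modest amount of RAM into paging.)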

If not - but the time is, as indicated, linear in the file size - that might just be an O(n) algorithm, of which there are probably many in Wireshark, and, for a lot of them, the best we can do is reduce the constant factor, which might also be doable (at least some of the fixes for the previous case would help here as well).

If not - and the time *isn't* linear in the file size, so a file of 200MB takes significantly more than twice as long as a file of 100MB, and a file of 300MB takes even more "significantly more", etc., then there might be some O(bigger than n) algorithms in there.
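If you do collect a few (size, time) pairs - say from the sketch above - a quick way to tell those cases apart is to estimate the exponent k in time ~ size**k: k close to 1 means roughly linear, k clearly above 1 means super-linear (or paging setting in).  Rough sketch, with made-up placeholder numbers:

import math

# Placeholder (size in MB, seconds) pairs - replace with real measurements.
samples = [(100, 60.0), (200, 125.0), (400, 270.0), (800, 900.0)]

xs = [math.log(size) for size, _ in samples]
ys = [math.log(secs) for _, secs in samples]
n = len(samples)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# Least-squares slope of log(time) against log(size) = estimated exponent.
k = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
print("estimated exponent: %.2f" % k)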

> what I wanted to say was that large files take far too long to get
> loaded by wireshark. (2gb file takes 45 minutes...)

So how long does, for example, a 1GB file take?  About 22 minutes, or significantly less than 22 minutes?  And what about a 500MB file?

>> If you're paging:
>> 
>> Make sure you're running Wireshark 1.4.0 or later - *no* columns can have their text generated on the fly in earlier releases, but some can in 1.4.0.
> 
> cristian: do you mean the GUI is that slow?

To the extent that the GUI requires that, while a capture is being read in, the values of several of the columns be computed for each packet and stored as strings, yes, the GUI could be that slow.  The more such columns there are, the slower it gets.  If we can reduce the number of such columns to zero, that would both reduce memory use (and thus the working set size - even though, while the file is being read in, the column text generated a while ago will eventually leave the working set) and reduce the constant factor.
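To make the trade-off concrete - this is just an illustration of the idea, not Wireshark's actual data structures (which are C, not Python) - compare caching one formatted string per packet for a column, roughly the pre-1.4.0 situation where no column text could be generated on the fly, against regenerating the text from compact per-packet fields only when a row is drawn:

import sys

class PacketRecord(object):
    # Compact per-packet state that has to be kept for every packet anyway.
    __slots__ = ("number", "timestamp", "length")
    def __init__(self, number, timestamp, length):
        self.number = number
        self.timestamp = timestamp
        self.length = length

def time_column(rec, first_ts):
    # "Generated on the fly": computed when the row is drawn, nothing stored.
    return "%.6f" % (rec.timestamp - first_ts)

records = [PacketRecord(i, i * 0.001, 64) for i in range(1000000)]

# Cached approach: one stored string per packet for this one column.
cached = [time_column(r, records[0].timestamp) for r in records]
print("cached strings for one column: %.1f MB"
      % (sum(sys.getsizeof(s) for s in cached) / (1024.0 * 1024.0)))

At a million packets, the cached strings for that one column run to tens of MB in this toy version (real C strings would be smaller, but the principle is the same), multiplied by however many columns can't be generated on the fly - which is why generating column text on demand helps both memory use and load time.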