Wireshark-dev: Re: [Wireshark-dev] Wireshark memory handling

From: Erlend Hamberg <hamberg@xxxxxxxxxxxx>
Date: Fri, 9 Oct 2009 09:15:05 +0200
On Friday 9. October 2009 03.47.16 didier wrote:
> Linux can use swap files too. It doesn't allocate them on demand, that's
> all.

Yes, and (a bit off-topic) I guess Linux distros will switch to using swap 
files eventually as they are as fast as using a swap partition and have been 
for a long time.

Still, if other operating systems actually allocate swap files (or grow one 
swap file) on demand, that would pretty much mean that a solution with memory-
mapped files would be superfluous.
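
For reference, the memory-mapped alternative can be sketched in a few lines. This is only an illustration in Python, not anything taken from the Wireshark code base; the helper name `read_slice_via_mmap` and the temporary demo file are made up for the example. The point is that the process addresses the file like memory and the OS pages data in on demand, much as it would with swap.

```python
import mmap
import os
import tempfile

def read_slice_via_mmap(path, offset, length):
    """Return `length` bytes at `offset`; the OS pages the file in on demand."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. No data is read until pages are touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[offset:offset + length]

# Hypothetical stand-in for a large capture file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 4096 + b"packet-data" + b"\x00" * 4096)
    demo_path = tmp.name

chunk = read_slice_via_mmap(demo_path, 4096, 11)
os.remove(demo_path)
```

Only the pages covering the requested slice need to be resident, so the mapped file can be far larger than physical memory.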

So, in theory, if one has a 64-bit OS and an infinite amount of swap space -- 
could Wireshark capture data for as long as one would like? Assuming the rate 
of network traffic < the write rate of the hard disk, of course.
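
With a finite amount of swap the same reasoning gives a rough upper bound on capture time. The sketch below is back-of-the-envelope arithmetic only; the function name, the overhead parameter and all figures (500 GiB of swap, 10 MiB/s of traffic, a 60 MiB/s disk) are assumptions chosen for illustration, not measurements.

```python
def capture_seconds(storage_bytes, traffic_bytes_per_s,
                    disk_write_bytes_per_s, overhead_factor=1.0):
    """Seconds until storage is full, or None if the disk cannot keep up.

    overhead_factor is an assumed multiplier for memory used per byte
    captured (1.0 means the raw packet data only).
    """
    if traffic_bytes_per_s * overhead_factor >= disk_write_bytes_per_s:
        return None  # traffic arrives faster than it can be paged out
    return storage_bytes / (traffic_bytes_per_s * overhead_factor)

GIB = 1024 ** 3
MIB = 1024 ** 2

# Assumed figures: 500 GiB swap, 10 MiB/s traffic, 60 MiB/s disk.
secs = capture_seconds(500 * GIB, 10 * MIB, 60 * MIB)
hours = secs / 3600
```

With these assumed numbers the swap space lasts a bit over 14 hours. Any per-packet overhead above the raw data shortens that proportionally.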

(It's really great that to my 1980s brain a 64-bit OS represents "infinite 
memory space" :-)
 
> I don't see what you would get with mmaped files vs enough swap. But if
> you are using wireshark, ie working interactively, it'd be slow, slow as
> in unusable.

(See comment further down.)

> Using a DB could be a better option, but you need a 'data silo'
> something like http://www.monetdb.nl For it a 100 Millions rows 200,000
> columns sparse matrice should be a trivial data set. It would be faster
> than wireshark for filtering by an order of magnitude or two. 
> Disclaimer: We're using a proprietary data silo and I've no experience
> with MonetDB.   

Interesting, but it seems that the licence is GPL-incompatible. (The MPL is, 
and MonetDB's licence is an MPL derivative.)
 
> A modified Tshark should be able to upload a capture at around 30,000
> packets/second.

Very interesting. By "uploading", I presume you mean to the database?

> No idea what would be better for the interactive front-end: a modified
> wireshark or a new application.
> No idea if you have enough time to do it either.

An important use case -- and the reason for wanting to be able to do one long 
capture, instead of splitting up captures -- is to follow a TCP stream. Other 
analysis functions of the Wireshark program are also desirable, so I think our 
aim should be to use the Wireshark GUI.

We would actually prefer that what we do could be done in such a way that it 
could become part of the official Wireshark distribution, but this would of 
course require that you, the Wireshark developers, agree with our solution and 
that we do a good job with integrating it with the current code base.

> [...]

> But we never use wireshark if it needs to hit harddisks (for us roughly
> 3 times the file size), it's too slow.

Too slow, full stop? Our experience in using disk-cached data in interactive 
programs is very limited, but our naïve assumption was that if the data is 
sequential and the operating system's disk buffering does its job, it 
should be possible to work with this solution. It is of course hard to put 
exact numbers on how fast something has to be, but if the speed drops below 
the level where the program can be used interactively at all, this 
solution is of no use.
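
To make "sequential" concrete, here is a minimal sketch of the kind of access pattern we have in mind: one forward pass over a classic pcap file that records where each packet starts, so later lookups can seek directly. It is an illustration only, not how Wireshark reads files; `index_pcap` is a made-up name, and the sketch handles just the little-endian, microsecond-resolution pcap layout (24-byte global header, 16-byte record headers).

```python
import io
import struct

PCAP_GLOBAL_HDR = struct.Struct("<IHHiIII")  # 24 bytes
PCAP_RECORD_HDR = struct.Struct("<IIII")     # ts_sec, ts_usec, incl_len, orig_len

def index_pcap(stream):
    """Return the offset of every packet record, moving strictly forward."""
    header = stream.read(PCAP_GLOBAL_HDR.size)
    magic = PCAP_GLOBAL_HDR.unpack(header)[0]
    if magic != 0xA1B2C3D4:
        raise ValueError("sketch only handles little-endian microsecond pcap")
    offsets = []
    pos = PCAP_GLOBAL_HDR.size
    while True:
        rec = stream.read(PCAP_RECORD_HDR.size)
        if len(rec) < PCAP_RECORD_HDR.size:
            break  # end of file (or a truncated trailing record)
        _ts_sec, _ts_usec, incl_len, _orig_len = PCAP_RECORD_HDR.unpack(rec)
        offsets.append(pos)
        stream.seek(incl_len, io.SEEK_CUR)  # skip the payload, still forward only
        pos += PCAP_RECORD_HDR.size + incl_len
    return offsets

# Tiny in-memory capture with two fake packets, standing in for a real file.
demo = io.BytesIO(
    PCAP_GLOBAL_HDR.pack(0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    + PCAP_RECORD_HDR.pack(0, 0, 3, 3) + b"abc"
    + PCAP_RECORD_HDR.pack(1, 0, 5, 5) + b"hello"
)
offsets = index_pcap(demo)
```

A pass like this should suit the OS read-ahead well. Whether the later random seeks into the file stay fast enough for interactive use is exactly the open question.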

> If we have to use bigger files I would use MonetDB, I don't know if
> using wireshark on such big data set would be useful though, at some
> point more data is just noise.

Well, it is not up to us to judge when more data is "just noise" ;-)

-- 
Erlend Hamberg
"Everything will be ok in the end. If its not ok, its not the end."
GPG/PGP:  0xAD3BCF19
45C3 E2E7 86CA ADB7 8DAD 51E7 3A1A F085 AD3B CF19
