Wireshark-dev: Re: [Wireshark-dev] Wireshark memory handling

From: Erlend Hamberg <hamberg@xxxxxxxxxxxx>
Date: Thu, 8 Oct 2009 22:15:19 +0200
Sorry about the late reply. I am one of the other students in the group. 
Thanks for your answers. I have commented below and would appreciate further 
feedback.

On Monday 5. October 2009 20.23.42 Guy Harris wrote:
> The paper says
> 
> 	Since exhausting the available primary memory is the problem ...
> 
> What does "primary memory" refer to here?

That could certainly have been worded more clearly. By primary memory, we mean 
main memory, as your reasoning led you to conclude.

The "problem", as we have understood it, and as we have seen it to be, is that 
Wireshark keeps its internal representation (from reading a capture file) in 
memory. I write "problem" in quotes, because in most use cases I guess that 
this is not a problem at all, and this is also how almost any program 
operates.

We work for an external customer who uses Wireshark and would like to be able 
to analyze more data than fits in a machine's virtual memory without having 
to split up the captured data.

To be able to do this we looked at the two solutions mentioned in the PDF 
Håvar sent, namely using a database and using memory-mapped files. Our main 
focus is 64-bit machines due to 64-bit OS-es' liberal limits on a process' 
address space. Doing memory management ourselves, juggling what is mapped 
into the 2 GiB address space of a 32-bit process at any time, is considered 
out of the scope of this project. (We are going to work on this until 
mid-November.)

[...]

> In effect, using memory-mapped files allows the application to extend
> the available backing store beyond what's pre-allocated (note that OS
> X and Windows NT - "NT" as generic for all NT-based versions of
> Windows - both use files, rather than a fixed set of separate
> partitions, as backing store, and I think both will grow existing swap
> files or add new swap files as necessary; I know OS X does that),
> making more virtual memory available.

So, on OS X (and possibly other modern OS-es), as long as you have available 
hard disk space, a process will not run out of memory, ever? (A process can 
have an address space of ~18 exabytes on 64-bit OS X. [1])

This would mean that this problem would only continue to exist on operating 
systems using a fixed swap space, like most (all?) Linux distros still do.

> The right long-term fix for a lot of this problem is to figure out how
> to make Wireshark use less memory; we have some projects we're working
> on to do that, and there are some additional things that can be done
> if we support fast random access to all capture files (including
> gzipped capture files, so that involves some work).

Absolutely.

> However, your
> scheme would provide a quicker solution for large captures that
> exhaust the available main memory and swap space, as long as you can
> intercept all the main allocators of main memory (the allocators in
> epan/emem.c can be intercepted fairly easily; the allocator used by
> GLib might be harder, but it still might be possible).

Yes, the solution we planned was to have memory-mapped files of a 
configurable size, which we create as they are needed and map directly into 
the process' address space. This would mean that if this is enabled, memory 
allocations need to be intercepted and "re-routed" to the next available 
chunk of a memory-mapped file. This would of course be significantly slower 
and only be a real benefit on a 64-bit system, but it would at least make it 
*possible* to do.
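
To make the idea a bit more concrete, here is a rough sketch in C of what 
such a chunked allocator could look like. It is only an illustration under 
some assumptions: a POSIX system with mmap(), temporary files under /tmp, 
and no freeing of individual allocations. All names (mm_pool, mm_chunk, 
mm_pool_alloc, mm_pool_destroy) are made up for this mail and are not 
existing Wireshark or GLib code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* One memory-mapped backing file (hypothetical type). */
typedef struct mm_chunk {
    struct mm_chunk *next;  /* previously created chunk, or NULL */
    char *base;             /* start of the mapping */
    size_t size;            /* size of the mapping in bytes */
    size_t used;            /* bytes already handed out */
} mm_chunk;

/* Pool of chunks; chunk_size is the configurable file size. */
typedef struct {
    mm_chunk *head;
    size_t chunk_size;
    size_t n_chunks;
} mm_pool;

/* Create a temporary file of the given size and map it read/write. */
static mm_chunk *mm_chunk_new(size_t size)
{
    char path[] = "/tmp/mmpool-XXXXXX";  /* assumed location */
    int fd = mkstemp(path);
    if (fd < 0)
        return NULL;
    unlink(path);  /* the file's storage is released once it is unmapped */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }
    /* MAP_SHARED: dirty pages are written back to the file, not to swap. */
    void *base = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after close() */
    if (base == MAP_FAILED)
        return NULL;
    mm_chunk *c = malloc(sizeof *c);
    if (c == NULL) {
        munmap(base, size);
        return NULL;
    }
    c->next = NULL;
    c->base = base;
    c->size = size;
    c->used = 0;
    return c;
}

/* Hand out n bytes from the current chunk, mapping a new file when the
 * current one is full. Requests larger than one chunk are refused. */
static void *mm_pool_alloc(mm_pool *pool, size_t n)
{
    assert(pool != NULL && pool->chunk_size > 0);
    if (n > pool->chunk_size)
        return NULL;
    n = (n + 15) & ~(size_t)15;  /* keep returned pointers 16-byte aligned */
    if (n > pool->chunk_size)
        return NULL;
    if (pool->head == NULL || pool->head->size - pool->head->used < n) {
        mm_chunk *c = mm_chunk_new(pool->chunk_size);
        if (c == NULL)
            return NULL;
        c->next = pool->head;
        pool->head = c;
        pool->n_chunks++;
    }
    void *p = pool->head->base + pool->head->used;
    pool->head->used += n;
    return p;
}

/* Unmap every chunk and reset the pool. */
static void mm_pool_destroy(mm_pool *pool)
{
    while (pool->head != NULL) {
        mm_chunk *c = pool->head;
        pool->head = c->next;
        munmap(c->base, c->size);
        free(c);
    }
    pool->n_chunks = 0;
}
```

An intercepted allocator would then call something like 
mm_pool_alloc(&pool, size) instead of malloc(), with the pool initialized 
as { NULL, chunk_size, 0 }. A bump allocator like this wastes the tail of 
each chunk and cannot free single objects, which seemed an acceptable 
trade-off for a sketch; a real implementation would need more care there.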

Comments are welcome. :-)

[1] 
http://developer.apple.com/mac/library/documentation/Performance/Conceptual/ManagingMemory/Articles/AboutMemory.html

-- 
Erlend Hamberg
"Everything will be ok in the end. If its not ok, its not the end."
GPG/PGP:  0xAD3BCF19
45C3 E2E7 86CA ADB7 8DAD 51E7 3A1A F085 AD3B CF19
