Wireshark-dev: Re: [Wireshark-dev] Wireshark memory handling

From: Jeff Morriss <jeff.morriss.ws@xxxxxxxxx>
Date: Wed, 14 Oct 2009 13:15:37 -0400
Erlend Hamberg wrote:
On Saturday 10. October 2009 03.48.29 Guy Harris wrote:
The data from the frames in the capture file are not kept in
Wireshark's address space - they are read in as necessary, into a
small number of buffers (one for the main window, and one for each
packet window opened).  *HOWEVER*, if data from a frame is reassembled
into a higher-level multiple-frame packet, the result of the
reassembly is, as noted, kept in Wireshark's address space.

So, when Wireshark reads the capture file and finds a single-frame packet, it only creates a frame_data structure in memory, plus possibly some data from the dissector for that type of packet. But if the packet is made up of several frames, the packet is reassembled and the result is kept in memory? If so, do you think this could be changed? Would it be worth it?

One thought: per-dissector data usually has to be real memory since the dissectors access it as, well, memory.

The results of reassembly, however, are (I think always) put into a TVB which you're only allowed[1] to access via the tvb_ APIs. Couldn't a TVB be backed by something other than memory? For example, a (non-memory-mapped) file?

To make it not be horrendously slow, the TVB layer might have to implement some kind of in-memory caching of the stuff going to/from the file (so that each tvb_get_guint8() wouldn't result in a seek plus a 1-byte read). Or maybe the OS would do that well enough?
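To make the caching idea concrete, here is a minimal sketch of a file-backed byte buffer that reads one block at a time, so that sequential single-byte reads cost one seek plus one read per block rather than per byte. Everything in it (file_buf_t, file_buf_init(), file_buf_get_u8(), CACHE_SIZE) is a hypothetical name made up for illustration; none of it is part of the real tvbuff API, and a real implementation would have to deal with things this ignores.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

#define CACHE_SIZE 4096  /* hypothetical block size, chosen arbitrarily */

/* Hypothetical file-backed buffer; NOT part of the real tvbuff API. */
typedef struct {
    FILE         *fp;           /* backing file */
    long          cache_start;  /* file offset of cache[0] */
    size_t        cache_len;    /* number of valid bytes in cache */
    unsigned long refills;      /* how many seek+read operations were done */
    unsigned char cache[CACHE_SIZE];
} file_buf_t;

static void file_buf_init(file_buf_t *fb, FILE *fp)
{
    assert(fb != NULL && fp != NULL);
    fb->fp = fp;
    fb->cache_start = 0;
    fb->cache_len = 0;   /* empty cache: the first access always misses */
    fb->refills = 0;
}

/* Fetch the byte at `offset`. Returns 0 on success, -1 on error or if
 * the offset is past the end of the file. */
static int file_buf_get_u8(file_buf_t *fb, long offset, unsigned char *out)
{
    assert(fb != NULL && out != NULL);
    if (offset < 0)
        return -1;

    if (offset < fb->cache_start ||
        offset >= fb->cache_start + (long)fb->cache_len) {
        /* Cache miss: one seek plus one block read, aligned to CACHE_SIZE. */
        long start = offset - (offset % CACHE_SIZE);

        fb->cache_len = 0;  /* invalidate before touching the file */
        if (fseek(fb->fp, start, SEEK_SET) != 0)
            return -1;
        fb->cache_len = fread(fb->cache, 1, CACHE_SIZE, fb->fp);
        fb->cache_start = start;
        fb->refills++;
        if (offset >= start + (long)fb->cache_len)
            return -1;      /* requested byte is beyond end of file */
    }

    *out = fb->cache[offset - fb->cache_start];
    return 0;
}
```

With this scheme, reading a 10000-byte file sequentially one byte at a time touches the file three times instead of 10000. Whether a user-space cache like this beats simply relying on stdio buffering and the OS page cache is exactly the open question above.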

[1] tvb_get_ptr() notwithstanding. OK, that is a tvb_ API but it allows you direct access to the TVB data. Using this API with a file-backed TVB would require allocating memory and copying it in from disk to return to the user. BTW, given the big comment about this function in tvbuff.h, I was surprised to find almost 1300 uses in epan/dissectors/ ...
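As a rough illustration of what a pointer-returning accessor would cost on top of a file: the sketch below allocates a buffer and copies the requested range in from disk. The function name file_get_copy() and its ownership rule (the caller frees the result) are assumptions made for this example, not anything that exists in Wireshark.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, NOT a Wireshark API.
 * Returns a malloc()ed copy of `len` bytes starting at `offset` in `fp`,
 * or NULL on error or short read. The caller must free() the result. */
static unsigned char *file_get_copy(FILE *fp, long offset, size_t len)
{
    unsigned char *buf;

    assert(fp != NULL);
    if (offset < 0 || len == 0)
        return NULL;

    buf = malloc(len);
    if (buf == NULL)
        return NULL;

    /* Every call pays for a seek, a read and an allocation. */
    if (fseek(fp, offset, SEEK_SET) != 0 ||
        fread(buf, 1, len, fp) != len) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

Note the ownership problem this creates: today's tvb_get_ptr() callers do not free what they get back, so a file-backed TVB would have to track and release these copies itself.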

People complain about it enough that, while in *most* cases it might
not be a problem, we frequently get mail from people who have to split
up capture files to read them - I'd call it enough of a problem that
we should work on it (ideally, by reducing the amount of address space
required by the aforementioned data items).

Yes, absolutely.

It would still be nice if it were possible for people to analyse more data than will fit in virtual memory (in the case of Linux, Solaris, etc., where the swap space is fixed). I see that there is an "abstraction" of memory allocation in epan/emem.c (se_alloc* and friends), but it seems that g_malloc and plain malloc are used as well. If the functions in emem.c were used for all memory allocation and freeing, this could be done by intercepting requests for memory in those functions.
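To show what "intercepting requests for memory" could look like when every allocation goes through one entry point, here is a small sketch. The names my_alloc(), current_alloc and counting_alloc() are invented for illustration and have nothing to do with the actual code in epan/emem.c; the replacement allocator here only counts bytes, where a real one might hand out memory from somewhere else.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical allocation hook; none of these names exist in Wireshark. */
typedef void *(*alloc_fn)(size_t);

static size_t total_requested;          /* bytes asked for via the hook */

/* Example replacement allocator: records the request, then defers to
 * malloc(). A different backing store could be plugged in here instead. */
static void *counting_alloc(size_t size)
{
    total_requested += size;
    return malloc(size);
}

/* The allocator currently in use; defaults to plain malloc(). */
static alloc_fn current_alloc = malloc;

/* Single entry point that all callers would have to use. */
static void *my_alloc(size_t size)
{
    assert(current_alloc != NULL);
    return current_alloc(size);
}
```

Switching behaviour is then one assignment (current_alloc = counting_alloc;), but it only works if no caller bypasses my_alloc() with a direct g_malloc() or malloc().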

You mean by sending them to memory-mapped files? Unless, as Guy pointed out, there's some way to tell the OS to swap out that memory before normal memory, I think that once you start swapping the UI is (still) going to become unusable.

What is the status on the use of these functions? I got the impression from README.malloc that these are recommended, but I mostly see allocations done using g_malloc. Or is that just allocations that should outlive a capture session?

Yes, those functions "should" normally be used. But there are good reasons not to. For example, we may know that we're allocating a bunch of memory that we'll free after the current frame is dissected (so we can't use ep_ memory) but before the file is closed (so using se_ memory would mean the allocation sticks around longer than it needs to). The reassembly code uses g_malloc() (presumably) for this reason.
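The lifetime distinction can be sketched with a toy pool allocator: one pool that is emptied after each frame (like ep_ memory) and one that is emptied when the file is closed (like se_ memory). This is only an illustration of the two scopes; pool_t, pool_alloc() and pool_free_all() are made-up names and the real emem.c implementation is different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy pool allocator, NOT the real emem.c code. Each allocation is
 * prefixed by a header that links it into its pool's list. */
typedef union chunk {
    union chunk *next;   /* next allocation in the same pool */
    max_align_t  align;  /* keeps the user data suitably aligned (C11) */
} chunk_t;

typedef struct {
    chunk_t *head;       /* most recent allocation */
    size_t   live;       /* number of allocations not yet released */
} pool_t;

/* Allocate `size` bytes whose lifetime is tied to pool `p`. */
static void *pool_alloc(pool_t *p, size_t size)
{
    chunk_t *c;

    assert(p != NULL);
    c = malloc(sizeof(chunk_t) + size);
    if (c == NULL)
        return NULL;
    c->next = p->head;
    p->head = c;
    p->live++;
    return c + 1;        /* user data starts right after the header */
}

/* Release everything in the pool at once; there is no per-item free. */
static void pool_free_all(pool_t *p)
{
    assert(p != NULL);
    while (p->head != NULL) {
        chunk_t *next = p->head->next;
        free(p->head);
        p->head = next;
    }
    p->live = 0;
}
```

A packet-scope pool would get pool_free_all() after every frame and a session-scope pool only on file close. Memory that has to outlive one frame but should go away before the file is closed fits neither pool, which is where a plain g_malloc()/g_free() pair comes in.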

Another reason, of course, is that the ep_ and se_ allocators are (relatively) new.