Wireshark-dev: Re: [Wireshark-dev] Wireshark memory handling

From: Guy Harris <guy@xxxxxxxxxxxx>
Date: Wed, 14 Oct 2009 12:32:24 -0700

On Oct 13, 2009, at 11:00 AM, Erlend Hamberg wrote:

On Saturday 10. October 2009 03.48.29 Guy Harris wrote:
The data Wireshark currently keeps in its address space that could
grow in size as the capture file grows are:

the frame_data structure (epan/frame_data.h) - one structure instance
per packet;

Ok, so – if my understanding is correct – for every packet that is read, a
frame_data structure is created.

Yes.

the text for some or all of the columns in all of the rows of the
packet list (all, in current releases of Wireshark; some, in the
development branch);

Ok, not much to save here after the introduction of the new packet list, I
guess.

There might be more we can save if we have efficient random access to packets (even in compressed files), as we can just re-dissect the packet whenever we need the columns for it.

That could make sorting painful, however.

The data from the frames in the capture file are not kept in
Wireshark's address space - they are read in as necessary, into a
small number of buffers (one for the main window, and one for each
packet window opened). *HOWEVER*, if data from a frame is reassembled
into a higher-level multiple-frame packet, the result of the
reassembly is, as noted, kept in Wireshark's address space.
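
As a rough illustration of reading frame data "as necessary" into a reusable buffer, here is a minimal sketch in C. It is not Wireshark's code: the packet_index_entry type, its fields, and read_packet_on_demand() are hypothetical names made up for this example, and it assumes an uncompressed file where each packet's bytes can be reached with a plain seek.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-packet record: just enough to find the packet's
 * bytes again in the capture file. Not the real frame_data layout. */
typedef struct {
    long     file_off;  /* byte offset of the packet data in the file */
    uint32_t cap_len;   /* number of captured bytes stored for the packet */
} packet_index_entry;

/* Read one packet's bytes from the file into a caller-supplied buffer.
 * Returns the number of bytes read, or -1 if the buffer is too small
 * or the seek or read fails. */
long read_packet_on_demand(FILE *fp, const packet_index_entry *e,
                           uint8_t *buf, size_t buf_size)
{
    if (e->cap_len > buf_size)
        return -1;
    if (fseek(fp, e->file_off, SEEK_SET) != 0)
        return -1;
    if (fread(buf, 1, e->cap_len, fp) != e->cap_len)
        return -1;
    return (long)e->cap_len;
}
```

The point of the sketch is that only the small index entry stays resident for each packet; the packet bytes live in one buffer that is overwritten each time a different packet is selected.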

So, when Wireshark reads the capture file, if it finds a single-frame packet,
it will only create a frame_data structure in memory and possibly data from
the dissector for that type of packet. But if the packet is made up of several
frames, the packet is reassembled and kept in memory?

Yes.

If so, do you think this could be changed?

We probably need to keep the packet data in memory while it's being reassembled and when it's dissected.

Again, with efficient random access, we could free it when we're done with it and leave behind an array of frame numbers, starting offsets, and lengths, so that on the next reference the frames can be read and the data reassembled and kept around, again, only while it's needed.
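
To make the "array of frame numbers, starting offsets, and lengths" idea concrete, here is a minimal sketch in C. Everything in it is an assumption for illustration: fragment_ref, frame_reader, reassemble_on_demand(), and the in-memory demo_read_frame() stand in for whatever the real reassembly code and random-access layer would provide, and none of them are existing Wireshark APIs.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record left behind after reassembly: which frame a
 * fragment came from, where it starts in that frame, and its length. */
typedef struct {
    uint32_t frame_num;  /* frame number in the capture file (1-based) */
    uint32_t offset;     /* starting offset of the fragment in the frame */
    uint32_t length;     /* number of bytes this frame contributes */
} fragment_ref;

/* Hypothetical stand-in for efficient random access: copy `length`
 * bytes at `offset` of frame `frame_num` into dst; 0 on success. */
typedef int (*frame_reader)(void *ctx, uint32_t frame_num,
                            uint32_t offset, uint32_t length, uint8_t *dst);

/* Rebuild the reassembled data from the fragment list. The caller
 * frees the result when done, so it is held only while needed. */
uint8_t *reassemble_on_demand(const fragment_ref *refs, size_t nrefs,
                              frame_reader read_frame, void *ctx,
                              size_t *out_len)
{
    size_t total = 0, pos = 0;
    for (size_t i = 0; i < nrefs; i++)
        total += refs[i].length;

    uint8_t *buf = malloc(total ? total : 1);
    if (buf == NULL)
        return NULL;

    for (size_t i = 0; i < nrefs; i++) {
        if (read_frame(ctx, refs[i].frame_num, refs[i].offset,
                       refs[i].length, buf + pos) != 0) {
            free(buf);
            return NULL;
        }
        pos += refs[i].length;
    }
    *out_len = total;
    return buf;
}

/* Toy reader over three in-memory "frames", for demonstration only. */
static const char *demo_frames[] = { "HDR1abc", "unrelated", "H3defg" };

int demo_read_frame(void *ctx, uint32_t frame_num, uint32_t offset,
                    uint32_t length, uint8_t *dst)
{
    (void)ctx;
    if (frame_num < 1 || frame_num > 3)
        return -1;
    const char *frame = demo_frames[frame_num - 1];
    if ((size_t)offset + length > strlen(frame))
        return -1;
    memcpy(dst, frame + offset, length);
    return 0;
}
```

With this layout, the per-packet cost that stays in memory is a few integers per fragment rather than the reassembled payload itself; the trade-off is that every later reference pays for re-reading the frames and copying the data again.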

Would it be worth it?

Probably. It would also mean that TShark would accumulate a lot less memory, and perhaps be able to run much longer, when dissecting packets (rather than just writing them to a file).