Wireshark-dev: Re: [Wireshark-dev] Copying TVBs for Reassembly [Was: Filebacked-tvbuffs : GSoC'

From: Jeff Morriss <jeff.morriss.ws@xxxxxxxxx>
Date: Thu, 18 Apr 2013 16:53:30 -0400
On 04/18/13 16:40, Evan Huus wrote:
On Thu, Apr 18, 2013 at 3:56 PM, Jeff Morriss <jeff.morriss.ws@xxxxxxxxx> wrote:
On 04/18/13 15:14, Evan Huus wrote:

This is a tangential issue that has always confused me.

Why do we malloc+memcpy data for reassembly when we already have
'virtual' composite TVBs?

Wouldn't it be more efficient (in time and memory) to create a
composite TVB for each reassembly and then build the reassembled
packet in it? You would never have to copy or allocate any actual
packet data...


There are a couple of problems with doing that (that I recall):

1) Composite TVBs don't actually work (or didn't work until very recently?).

2) The data behind a TVB goes away as soon as we're done dissecting (and
displaying) the packet.  That is, the TVB data is overwritten (IIRC) when
the next packet is read.

I suppose there was never any real reason to try to make reassembly work
with composite TVBs: if they're just more malloc()'d memory then why mess
with it rather than allocate our own copy of the data?  (Well, OK, it would
save a data copy, but...)

OK, so then the optimal case would be a tvb implementation that stored
only frame_data pointers, offsets and lengths... similar but not
identical to the current composite implementation.

The reassembly code could then add meta-data to this when
reassembling, and the tvb could lazily refetch the underlying tvbs
using the existing wiretap interface? If we add some sort of caching
mechanism so that repeated accesses didn't keep forcing reads of the
original file then I expect this would be very fast:

- adding fragments to reassembly would be near-instantaneous (just a
few pointer updates)
- reassembled tvbs would take minimal memory except when accessed
(using tvb_get_* or proto_tree_add_*)
- accessing a reassembled tvb would just be an offset calculation and
then a wtap read to bring into memory the underlying real packet(s)
containing the data being requested (assuming they aren't already
cached)

Thoughts?

Yep, that sounds about what I was thinking of for file-backed (+composite) TVBs. :-)

Basically reassembly would only have to deal with composite TVBs (for the reassembled data). The TVB layer, using file backing for the real data, would deal with keeping (and not keeping) the data in memory when it is needed (caching).

Hmmm, the way you mentioned wiretap makes me think/realize that the TVBs should not actually store the real offset (since we'd want that stuff abstracted away by wiretap so the tvbuff code wouldn't have to deal with ugliness like compressed files and so forth). That might make it a bit harder... Unless we backed the TVBs with temporary files (seems wasteful and a pain to clean up after).