Wireshark-dev: Re: [Wireshark-dev] Copying TVBs for Reassembly [Was: Filebacked-tvbuffs : GSoC'

From: Evan Huus <eapache@xxxxxxxxx>
Date: Thu, 18 Apr 2013 16:40:13 -0400
On Thu, Apr 18, 2013 at 3:56 PM, Jeff Morriss <jeff.morriss.ws@xxxxxxxxx> wrote:
> On 04/18/13 15:14, Evan Huus wrote:
>>
>> This is a tangential issue that has always confused me.
>>
>> Why do we malloc+memcpy data for reassembly when we already have
>> 'virtual' composite TVBs?
>>
>> Wouldn't it be more efficient (in time and memory) to create a
>> composite TVB for each reassembly and then build the reassembled
>> packet in it? You would never have to copy or allocate any actual
>> packet data...
>
>
> There are a couple of problems with doing that (that I recall):
>
> 1) Composite TVBs don't actually work (or didn't work until very recently?).
>
> 2) The data behind a TVB goes away as soon as we're done dissecting (and
> displaying) the packet.  That is, the TVB data is overwritten (IIRC) when
> the next packet is read.
>
> I suppose there was never any real reason to try to make reassembly work
> with composite TVBs: if they're just more malloc()'d memory then why mess
> with it rather than allocate our own copy of the data?  (Well, OK, it would
> save a data copy, but...)

OK, so then the optimal case would be a tvb implementation that stored
only frame_data pointers, offsets and lengths... similar but not
identical to the current composite implementation.
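A minimal sketch of such a structure, assuming each fragment can be identified by a frame number plus an offset and length within that frame. The names `frag_ref_t`, `ref_tvb_t`, `ref_tvb_add_fragment` and `ref_tvb_free` are hypothetical and are not part of Wireshark's tvbuff API; the point is only that adding a fragment records metadata and never copies packet bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: one piece of a reassembled PDU, referenced by frame
 * number rather than by a private copy of its bytes. */
typedef struct {
    uint32_t frame_num;   /* captured frame that holds these bytes */
    uint32_t offset;      /* start of the fragment within that frame */
    uint32_t length;      /* number of bytes this fragment contributes */
} frag_ref_t;

/* Hypothetical "reference tvb": an ordered list of fragment references. */
typedef struct {
    frag_ref_t *frags;
    size_t      count;
    size_t      capacity;
    uint32_t    total_len;  /* length of the reassembled data */
} ref_tvb_t;

/* Append a fragment: only a few words of metadata are stored.
 * Returns 0 on success, -1 if growing the array fails. */
static int ref_tvb_add_fragment(ref_tvb_t *tvb, uint32_t frame_num,
                                uint32_t offset, uint32_t length)
{
    assert(tvb != NULL);
    if (tvb->count == tvb->capacity) {
        size_t new_cap = tvb->capacity ? tvb->capacity * 2 : 4;
        frag_ref_t *p = realloc(tvb->frags, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        tvb->frags = p;
        tvb->capacity = new_cap;
    }
    tvb->frags[tvb->count++] = (frag_ref_t){ frame_num, offset, length };
    tvb->total_len += length;
    return 0;
}

/* Release the metadata; no packet data was ever owned by this object. */
static void ref_tvb_free(ref_tvb_t *tvb)
{
    free(tvb->frags);
    tvb->frags = NULL;
    tvb->count = tvb->capacity = 0;
    tvb->total_len = 0;
}
```

The array grows by doubling, so appending stays cheap on average; a real implementation would also need to handle out-of-order and overlapping fragments, which this sketch ignores.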

The reassembly code could then add metadata to this when
reassembling, and the tvb could lazily refetch the underlying tvbs
using the existing wiretap interface? If we added some sort of caching
mechanism so that repeated accesses didn't keep forcing reads of the
original file, then I expect this would be very fast:

- adding fragments to reassembly would be near-instantaneous (just a
few pointer updates)
- reassembled tvbs would take minimal memory except when accessed
(using tvb_get_* or proto_tree_add_*)
- accessing a reassembled tvb would just be an offset calculation and
then a wtap read to bring into memory the underlying real packet(s)
containing the data being requested (assuming they aren't already
cached)
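A rough sketch of the access path described above: an offset calculation that walks the fragment list, followed by a lazy fetch of each underlying frame through a one-entry cache. `fetch_frame_fn` is a hypothetical callback standing in for a wiretap random-access read (it does not model any real wtap signature), and `mem_fetch` is a toy in-memory stand-in used only for illustration. A single cached frame is the simplest possible policy; something like a small LRU cache would probably be needed in practice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fragment reference: bytes [offset, offset+length) of a frame. */
typedef struct { uint32_t frame_num, offset, length; } frag_ref_t;

/* Hypothetical stand-in for re-reading a frame from the capture file.
 * Returns the frame's bytes (assumed valid until the next fetch) or NULL. */
typedef const uint8_t *(*fetch_frame_fn)(uint32_t frame_num, void *ctx);

/* One-entry cache: remembers the most recently fetched frame. */
typedef struct {
    uint32_t       frame_num;
    const uint8_t *data;
    int            valid;
    unsigned       fetches;   /* count of real (uncached) reads */
} frame_cache_t;

static const uint8_t *cache_get(frame_cache_t *c, uint32_t frame_num,
                                fetch_frame_fn fetch, void *ctx)
{
    if (!c->valid || c->frame_num != frame_num) {
        c->data = fetch(frame_num, ctx);
        c->valid = (c->data != NULL);
        if (!c->valid)
            return NULL;
        c->frame_num = frame_num;
        c->fetches++;
    }
    return c->data;
}

/* Copy len bytes starting at reassembled offset off into dst.
 * Returns 0 on success, -1 if the range is out of bounds or a fetch fails. */
static int ref_tvb_memcpy(const frag_ref_t *frags, size_t nfrags,
                          frame_cache_t *cache, fetch_frame_fn fetch,
                          void *ctx, uint8_t *dst, uint32_t off, uint32_t len)
{
    size_t i = 0;
    assert(frags != NULL || nfrags == 0);
    /* Offset calculation: skip whole fragments that end before off. */
    while (i < nfrags && off >= frags[i].length) {
        off -= frags[i].length;
        i++;
    }
    while (len > 0) {
        if (i >= nfrags)
            return -1;                       /* past the reassembled data */
        const uint8_t *frame = cache_get(cache, frags[i].frame_num, fetch, ctx);
        if (frame == NULL)
            return -1;
        uint32_t avail = frags[i].length - off;
        uint32_t n = len < avail ? len : avail;
        memcpy(dst, frame + frags[i].offset + off, n);
        dst += n;
        len -= n;
        off = 0;                             /* later fragments start at 0 */
        i++;
    }
    return 0;
}

/* Toy fetch for illustration: frames held in memory, indexed by number. */
typedef struct { const uint8_t **frames; uint32_t nframes; } mem_capture_t;

static const uint8_t *mem_fetch(uint32_t frame_num, void *ctx)
{
    const mem_capture_t *cap = ctx;
    return frame_num < cap->nframes ? cap->frames[frame_num] : NULL;
}
```

In this sketch a request that stays within the most recently touched frame costs no read at all, which is the behaviour the caching idea is meant to capture.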

Thoughts?