Wireshark-dev: Re: [Wireshark-dev] Filebacked-tvbuffs : GSoC'13

From: Evan Huus <eapache@xxxxxxxxx>
Date: Thu, 18 Apr 2013 12:28:20 -0400
A few misc notes on this topic in no particular order:

- Once everything is converted to wmem (after 1.10 branches) it would
be trivial to write a backend allocator that collected statistics on
memory usage.

- Has anybody ever tried to see if Massif
(http://valgrind.org/info/tools.html#massif) gives any interesting
data on memory usage? That's what it's for.

- Our reassembly code is a bit of a mess anyway, as Guy's recent
commit indicates. It could use a cleanup and simplification on
general principle.
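
To make the first point a bit more concrete, here is a minimal sketch of a
statistics-collecting allocator. It wraps plain malloc()/free() rather than
plugging into wmem, and every name in it (alloc_stats_t, stats_alloc,
stats_free) is made up for illustration, not an existing Wireshark API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical counters; not part of wmem. */
typedef struct {
    size_t bytes_in_use;  /* bytes currently allocated */
    size_t peak_bytes;    /* high-water mark of bytes_in_use */
    size_t total_allocs;  /* number of successful allocations */
} alloc_stats_t;

/* Header placed in front of each block so the free path can recover
 * the block's size. The union keeps the user pointer aligned. */
typedef union {
    size_t      size;
    max_align_t align;
} block_header_t;

static void *stats_alloc(alloc_stats_t *stats, size_t size)
{
    block_header_t *hdr;

    assert(stats != NULL);
    if (size > SIZE_MAX - sizeof *hdr)
        return NULL;  /* header + payload would overflow size_t */

    hdr = malloc(sizeof *hdr + size);
    if (hdr == NULL)
        return NULL;

    hdr->size = size;
    stats->bytes_in_use += size;
    stats->total_allocs++;
    if (stats->bytes_in_use > stats->peak_bytes)
        stats->peak_bytes = stats->bytes_in_use;

    return hdr + 1;  /* user data starts right after the header */
}

static void stats_free(alloc_stats_t *stats, void *ptr)
{
    block_header_t *hdr;

    assert(stats != NULL);
    if (ptr == NULL)
        return;

    hdr = (block_header_t *)ptr - 1;
    assert(stats->bytes_in_use >= hdr->size);
    stats->bytes_in_use -= hdr->size;
    free(hdr);
}
```

A real backend would presumably keep one such set of counters per allocator
or per scope, so that the totals could be broken down by where the memory
went.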

Cheers,
Evan

On Thu, Apr 18, 2013 at 12:13 PM, Anders Broman <a.broman@xxxxxxxxxxxx> wrote:
> Jeff Morriss wrote on 2013-04-18 17:55:
>
>> On 04/15/13 10:01, Ambarisha B wrote:
>>>
>>> Hi dev,
>>>
>>> I am a final year engineering student pursuing my bachelors in Computer
>>> Science. I was going through the GSoC'13 ideas page and found
>>> "Filebacked-tvbuffs" interesting, so I looked it up. Here's a (probably
>>> not so) short summary of what I did and understood. I'm only a novice,
>>> so if I've got something wrong, please, enlighten me.
>>>
>>> I went through the (interesting) archived conversation linked on the
>>> ideas page. I've realized most of the discussion was about "how to deal
>>> with large captures, so that users don't have to break up the captures".
>>> Swapping or, if needed, mmapped files would help. But since the goal of
>>> this project is to cut down the memory usage, I guess we're looking at
>>> non-mmaped files.
>>>
>>> The project description says that the data in the packet-bytes view and
>>> packet-details view is a duplicate of that on the disk. I tried to look
>>> this up in the code. So, originally the data is in a capture_file and
>>> wtap_*() gets the data out of that and it is finally handed to
>>> dissect_packet(), which actually makes the tvbuff out of it and passes it
>>> to the sub-dissectors (dissect_frame etc.).
>>
>>
>> Yes.  But the stuff in the packet-details view isn't what I consider to be
>> the problem (normally): that stuff is only kept in memory as long as it's on
>> your screen.  The real problem (which I thought file-backed-tvbuffs might
>> solve) would be when dissectors have to make copies of tvbuffs in order to
>> do, for example, reassembly.  Those copies are malloc()'d and it is believed
>> that, in some situations, they account for a lot of Wireshark's memory
>> usage.
>>
> Yes, file-backed tvbs might not have been such a great idea. As Jeff points
> out, the problem to be solved is probably the memory usage of the
> reassembled packets. One also has to make sure that the tradeoff in speed
> (if any) isn't a problem. Writing a new file on Wireshark's first pass, with
> the reassembled data attached, which would then be read for any subsequent
> access, might be the answer.
>
>
>> (A good side project would be to add some tracking to Wireshark's memory
>> allocations so we could be sure how much of a problem this is.  For example,
>> a while ago someone pointed out that actually a huge amount of memory goes
>> to storing frame_data's.)
>
> I thought about this too. Would it be possible to invent a hash table
> registry function which could then be used to query the hash table
> sizes and display them in the GUI?
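
A rough sketch of what such a registry might look like, assuming each table
is registered with a name and a callback that reports its size. All of the
names here (table_registry_add, table_registry_query, toy_table_t) are
hypothetical, and toy_table_t is only a stand-in for a real hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_REGISTERED_TABLES 64

/* Callback that reports how many entries a table currently holds. */
typedef size_t (*table_size_fn)(const void *table);

typedef struct {
    const char   *name;
    const void   *table;
    table_size_fn size_fn;
} table_entry_t;

static table_entry_t registry[MAX_REGISTERED_TABLES];
static size_t        registry_count;

/* Registers a table under a name; returns 0 on success, -1 if full. */
static int table_registry_add(const char *name, const void *table,
                              table_size_fn size_fn)
{
    assert(name != NULL && size_fn != NULL);
    if (registry_count == MAX_REGISTERED_TABLES)
        return -1;

    registry[registry_count].name    = name;
    registry[registry_count].table   = table;
    registry[registry_count].size_fn = size_fn;
    registry_count++;
    return 0;
}

/* Looks a table up by name and stores its current size in *size_out.
 * Returns 0 if the name was found, -1 otherwise. */
static int table_registry_query(const char *name, size_t *size_out)
{
    size_t i;

    assert(name != NULL && size_out != NULL);
    for (i = 0; i < registry_count; i++) {
        if (strcmp(registry[i].name, name) == 0) {
            *size_out = registry[i].size_fn(registry[i].table);
            return 0;
        }
    }
    return -1;
}

/* Stand-in for a real hash table, used only to exercise the registry. */
typedef struct {
    size_t n_entries;
} toy_table_t;

static size_t toy_table_size(const void *table)
{
    return ((const toy_table_t *)table)->n_entries;
}
```

Because the size is fetched through the callback at query time, a GUI could
poll the registry and always see the current numbers without the dissectors
having to push updates.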
>
>
>>
>> Anyway, if reassembly could be done using composite + file-backed tvbuffs
>> then a lot of that alloc'd memory could go away.
>>
>>> I think I now have an idea of how I would back up tvbuff by a hard disk.
>>> We add another "type" of tvbuff which is backed up by a file, the same
>>> way TVBUFF_SUBSET is backed by another tvbuff. Next we think about "how
>>> to back it by a file?". Of course, we can implement a neat cache in the
>>> tvb layer itself, tuned for our accesses. But I have a couple of
>>> thoughts on this. Do tell me, if I am missing something here.
>>>
>>> If we are accessing all the data in the tvbuff in one shot, there
>>> wouldn't be much use of a cache. In fact, it'll add housekeeping
>>> overhead. On the other hand, if we're making small repeated accesses to
>>> the data, a no-cache implementation would be pitifully slow. For this I
>>> need to look at usage of tvbuffs in those two views more closely. Also,
>>> now that there's this abstraction, the interface for accessing
>>> filebacked-tvbuff has to be a little different than normal tvbuffs
>>> (because the data access might require some housekeeping as opposed to
>>> the direct access of tvb->real_data+offset).
>>
>>
>> I suspect there would have to be *some* amount of caching: for example we
>> really wouldn't want to go off and read one byte off of the disk each time
>> someone calls tvb_get_guint8().
>>
>> I would expect that normally a tvbuff will have a lot of accesses in a
>> very short period of time, then no accesses for quite a while, then another
>> burst of accesses (corresponding to the frame or PDU in question being
>> dissected when the file is read and then not accessed again until the user
>> clicks or scrolls past the frame in question).
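
The bursty access pattern described above could be served by something as
simple as a one-block read cache. This is only a sketch using plain stdio,
not the tvbuff API: file_buf_t, file_buf_init and file_buf_get_u8 are
hypothetical names, and the 4096-byte block size is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define CACHE_BLOCK_SIZE 4096  /* arbitrary; would need tuning */

/* Hypothetical file-backed buffer with a single cached block. */
typedef struct {
    FILE         *fp;
    long          cached_start;  /* file offset of cache[0], -1 if empty */
    size_t        cached_len;    /* number of valid bytes in cache */
    unsigned long disk_reads;    /* how often we actually hit the file */
    unsigned char cache[CACHE_BLOCK_SIZE];
} file_buf_t;

static void file_buf_init(file_buf_t *fb, FILE *fp)
{
    assert(fb != NULL && fp != NULL);
    fb->fp = fp;
    fb->cached_start = -1;
    fb->cached_len = 0;
    fb->disk_reads = 0;
}

/* Fetches the byte at 'offset'. Returns 0 on success, -1 on a seek
 * error or if the offset lies past the end of the file. */
static int file_buf_get_u8(file_buf_t *fb, long offset, unsigned char *out)
{
    assert(fb != NULL && out != NULL && offset >= 0);

    if (fb->cached_start < 0 || offset < fb->cached_start ||
        (size_t)(offset - fb->cached_start) >= fb->cached_len) {
        /* Cache miss: load the whole block containing 'offset'. */
        long block_start = offset - (offset % CACHE_BLOCK_SIZE);

        if (fseek(fb->fp, block_start, SEEK_SET) != 0)
            return -1;
        fb->cached_len = fread(fb->cache, 1, CACHE_BLOCK_SIZE, fb->fp);
        fb->cached_start = block_start;
        fb->disk_reads++;

        if ((size_t)(offset - block_start) >= fb->cached_len)
            return -1;  /* offset is beyond the data we could read */
    }

    *out = fb->cache[offset - fb->cached_start];
    return 0;
}
```

With this layout, a burst of single-byte reads within one block costs one
disk read, and the block can simply be dropped once the burst is over.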
>>
>>> I thought I should talk to you guys first, because I could be going on a
>>> wild-goose-chase with this. If there's something you want me to take a
>>> look at or study, please do let me know. Also, if you can point me to a
>>> little bug, so that I can get my hands dirty, that'll be great.
>>
>>
>> I doubt there's much in the way of a bug to look at; I think to get your
>> hands dirty you'd have to start digging into how, for example, the tvbuffs
>> and reassembly work and see if it can be put together.
>>
>>
>> ___________________________________________________________________________
>> Sent via:    Wireshark-dev mailing list <wireshark-dev@xxxxxxxxxxxxx>
>> Archives:    http://www.wireshark.org/lists/wireshark-dev
>> Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
>> mailto:wireshark-dev-request@xxxxxxxxxxxxx?subject=unsubscribe
>>