Wireshark-dev: Re: [Wireshark-dev] About a faster wireshark

From: "Didier" <dgautheron@xxxxxxxx>
Date: Thu, 11 Oct 2007 03:34:48 +0200
On Wed, 10 Oct 2007 15:56:53 -0700, Guy Harris wrote
> On Oct 7, 2007, at 5:04 PM, Didier wrote:
> 
> > Is it ok if I upload the diff (~ 250 KB) in the wiki?
> 
>  From looking at the version you uploaded as an attachment to
> 
> 	http://wiki.wireshark.org/Development/Optimization
> 
> it looks as if one of the components is a new CList implementation.
> 
> Is that the part that's "glibc and gcc only"?
No. AFAIK the only such parts are a mallopt() call, which is glibc (GNU malloc)
only, and a gcc __attribute__((noinline)). I ifdefed both in the patch, so it
might compile on OS X and Windows now, but I don't know.
   
> 
> And what does the new implementation do?  I have a CList  
> implementation that doesn't store the strings for the columns, but,  
> instead, calls a supplied callback routine to get the column values  
> when needed.  It also has some rough edges (as in "not ready to 
> submit  yet" - and it's currently GTK+ 1.2[.x]-only, but it could 
> probably be  made to work with both), but speeds up initial loading 
> (no need to  allocate and copy the column values when loading),
>  reduces memory use  by a *LOT* (no copies made of the column values)
> , and can make some  display updates happen *much* faster (e.g., 
> changing the time stamp  format - it doesn't need to regenerate all 
> the columns, it just needs  to tell the CList to update itself, and 
> the column values change  because the callback now supplies 
> different strings).
It's a modified GTK2 CList which replaces the linked list with an array and
uses a callback. All rows and columns are still computed during the initial
load, but they aren't recomputed when filtering; they are saved in the fdata
structure.

It was easier than a full callback implementation. Column sorting is broken,
though, and adding/removing columns used to work but doesn't anymore.
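
To make the array-plus-callback idea concrete, here is a rough sketch. It is
not the code from the patch: row_store, cached_frame and get_cell are made-up
names, and the real widget and fdata structure look different.

```c
#include <stddef.h>
#include <stdlib.h>

#define NUM_COLS 3

/* Hypothetical per-frame record: column strings are computed once at load
 * time and kept, so filtering never has to regenerate them. */
typedef struct {
    const char *cols[NUM_COLS];
} cached_frame;

/* Rows live in a growable array instead of a linked list, so row N is
 * reached directly rather than by walking N nodes. */
typedef struct {
    cached_frame **rows;
    size_t         len;
    size_t         cap;
} row_store;

/* Append a row; returns 0 on success, -1 on allocation failure. */
static int row_store_append(row_store *rs, cached_frame *f)
{
    if (rs->len == rs->cap) {
        size_t ncap = rs->cap ? rs->cap * 2 : 64;
        cached_frame **n = realloc(rs->rows, ncap * sizeof *n);
        if (n == NULL)
            return -1;
        rs->rows = n;
        rs->cap = ncap;
    }
    rs->rows[rs->len++] = f;
    return 0;
}

/* Callback the list would call to get a cell's text when drawing;
 * it only reads the cached string and copies nothing. */
static const char *get_cell(const row_store *rs, size_t row, size_t col)
{
    if (row >= rs->len || col >= NUM_COLS)
        return "";
    return rs->rows[row]->cols[col];
}
```

Filtering then only changes which rows are shown; the strings themselves stay
where they were stored at load time.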

Most of my work is offline and I don't really care about load time.
> 
> There also appear to be some other changes, such as a  
> tvb_new_child_real_data() routine, some changes to name resolution,  
Tearing down the tvbuff tree takes a lot of time. TVBUFF_COMPOSITE has
bit-rotted, and TVBUFF_SUBSET tvbuffs are only used by the current dissectors
as a linked list, not a tree.

It's a small optimization, maybe worthless, which replaces the tree with a
linked list (I tested a lot of things but didn't prune the useless ones).

I removed tvb_new_real_data(), tvb_set_child_real_data_tvbuff()... and
replaced them with tvb_new_child_real_data(), which does exactly the same
thing, but now if a change in svn adds a new tvb_new_real_data() call,
'make' will fail and I can double-check whether my assumption still holds.

> etc; do you have a description of what the changes are?

They assume that wireshark output doesn't change between loading and filtering.

That's not 100% true: col info isn't always the same (in RPC, for example),
and a lot of dissectors don't throw an error if the tree is null, but for me
it's good enough.

So:
- Color filters are applied only once and kept around.

- Columns are computed only once and kept around.

- Keep each frame's protocols in a 64-bit bitfield. When a filter is applied,
it only loads and decodes the relevant frames. If there are more than 63
protocols, or a '!' is used, or there's a tap listener, it falls back to
decoding all frames. With reassembled protocols, or captures with a lot of
'junk', it really improves speed a lot.
Some protocols always set a tap listener (gsm); I've ifdefed them...

- Use an array rather than a linked list for frame data.

- A lot of constant stuff, like tree creation, is moved outside the main loop.

- Filter VM optimizations: constants computed at compile time, gotos
followed, ...

- During load, sort heuristic dissectors by used/unused. When filtering, if a
protocol's bitfield id is not set, that means the packet is undecoded and
the try_heuristic loop is exited.
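
The protocol bitfield check could look roughly like the sketch below. This is
only an illustration under my own assumptions, not the patch itself:
frame_summary, filter_summary and frame_needs_dissection are hypothetical
names, and the test is deliberately conservative (any overlap between the
frame's protocols and the filter's protocols means "dissect").

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-frame summary: bit N is set if protocol id N appeared
 * in the frame when it was first dissected. */
typedef struct {
    uint64_t proto_bits;
} frame_summary;

/* Hypothetical description of what a compiled filter needs. */
typedef struct {
    uint64_t needed_bits;   /* protocols the filter refers to */
    bool     has_negation;  /* the filter contains '!' */
    bool     has_tap;       /* a tap listener is registered */
    bool     too_many;      /* a protocol id did not fit in the bitfield */
} filter_summary;

/* Returns true if the frame must be re-dissected to evaluate the filter. */
static bool frame_needs_dissection(const frame_summary *f,
                                   const filter_summary *flt)
{
    /* Fallback cases: the bitfield proves nothing, so decode every frame. */
    if (flt->has_negation || flt->has_tap || flt->too_many)
        return true;

    /* Skip frames that contain none of the protocols the filter mentions. */
    return (f->proto_bits & flt->needed_bits) != 0;
}
```

With a filter that only mentions one rarely seen protocol, most frames fail
the mask test and are never read from disk or decoded again.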

Many small things like:
- cache both the resolved name and the unresolved address.
- replace some col_..._fstr with the faster col_..._str
- use a faster tvb_get_xxx most of the time.
- protocol optimizations (move stuff inside if (tree) or if (tree->visible)
and so on).
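
For the last item, the pattern is roughly the following. This is a standalone
sketch with stand-in types (demo_tree, add_detail and dissect_demo are made
up for the example), not real dissector code:

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in for a protocol tree; NULL means nobody will look at the details. */
typedef struct {
    char text[64];
} demo_tree;

/* Counts how often the expensive path runs, for illustration only. */
static int format_calls;

/* Stand-in for expensive per-field formatting. */
static void add_detail(demo_tree *tree, unsigned value)
{
    format_calls++;
    snprintf(tree->text, sizeof tree->text, "Value: %u", value);
}

/* Hypothetical dissector body: the cheap parsing always happens, but the
 * costly formatting is done only when a tree was requested. */
static unsigned dissect_demo(const unsigned char *data, size_t len,
                             demo_tree *tree)
{
    unsigned value = (len > 0) ? data[0] : 0;

    if (tree) {
        add_detail(tree, value);
    }
    return value;
}
```

When the caller passes a NULL tree, the formatting work is skipped entirely
and only the parsing cost remains.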

Ugly hacks here and there too (:

I can send a lot of them as separate patches, but it seems I can't use
Bugzilla for it (maybe because my IP changes very often, evil telco NATed
load balancer).

Didier