Wireshark-dev: Re: [Wireshark-dev] multiple parsing of the same packets

From: Matthieu Patou <mat@xxxxxxxxx>
Date: Thu, 31 Oct 2013 14:56:44 -0700
On 10/30/2013 12:07 PM, Evan Huus wrote:
On Wed, Oct 30, 2013 at 2:20 PM, Matthieu Patou <mat@xxxxxxxxx> wrote:
On 10/30/2013 07:31 AM, Evan Huus wrote:
On Wed, Oct 30, 2013 at 4:14 AM, Matthieu Patou <mat@xxxxxxxxx> wrote:
Hello,

I noticed a long time ago that Wireshark is parsing the same packet at
least three times.

To make it worse, if I go back and forth to the same packet it will be
dissected one more time each time.
With complex protocols like DRS (directory replication for Active
Directory) it's really a problem, as the UI freezes for a while.
Is the protocol really so complex that dissecting a single packet of
it takes a user-visible amount of time? That seems suspect to me.
So what I did is dissect the deferred RPC pointers only if tree != NULL.
The dissection of the pointers takes a while because there are ~1700 top-level
pointers and each of them has a lot of inner pointers; DRS is a very
complicated protocol.
Fair enough, that's quite a bit of data to process. The packets must
be enormous.
The reassembled packet payload is 300K, but it's compressed; after decompression it's 2MB of data.
Putting null-tree checks in can lead to huge improvements. Just be
careful that things like column data and expert info are added even if
tree==NULL.
I already added the tree==NULL checks, so that instead of doing the dissection of the deferred pointers 3 times we only do it 2 times.
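
For context, here is a minimal sketch of the pattern being discussed, using hypothetical proto_foo / hf_foo_count / ett_foo registrations rather than the real DRS dissector; columns are still filled on every pass, and only the expensive tree building is guarded:

#include <epan/packet.h>

/* Placeholder registrations; in a real dissector these are filled in
 * by proto_register_foo(). */
static int proto_foo = -1;
static int hf_foo_count = -1;
static gint ett_foo = -1;

static void
dissect_foo(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree)
{
    proto_item *ti;
    proto_tree *foo_tree;

    /* Column (and expert) info must be added on every pass, even when
     * no tree is requested. */
    col_set_str(pinfo->cinfo, COL_PROTOCOL, "FOO");
    col_add_fstr(pinfo->cinfo, COL_INFO, "%u deferred pointers",
                 tvb_get_ntohl(tvb, 0));

    if (tree == NULL)
        return;   /* column-only pass: skip the expensive part */

    /* Detail pass: build the protocol tree and walk the deferred pointers. */
    ti = proto_tree_add_item(tree, proto_foo, tvb, 0, -1, ENC_NA);
    foo_tree = proto_item_add_subtree(ti, ett_foo);
    proto_tree_add_item(foo_tree, hf_foo_count, tvb, 0, 4, ENC_BIG_ENDIAN);
    /* ... dissect the ~1700 top-level pointers here ... */
}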

First thing: why 3 dissections initially? Is there a way to reduce this
to 2? I more or less understand why 2 passes are needed, but 3...
It is in theory possible; the third pass is usually to fill in either
the column or the tree information. We could in theory pull that straight
from the second pass, but we would have to calculate in advance which
packets are visible, which may or may not be easy.
Pardon my Wireshark ignorance, but it really looks like the 2nd and the 3rd
passes are recreating the thing from scratch.
Every time we do a dissection it is more-or-less "from scratch". The
only data that reliably persists is minimal metadata about
conversations, request/response matching and that sort of thing.
Again, this was a decision made to trade off time for memory.

When loading a file, each packet is dissected once in order to set up
this metadata. Then any packet that is visible in the summary pane is
dissected again in order to calculate the column text to display. Then
the selected packet is dissected again to calculate the details tree
to show.
So if I get it right, the second pass is to display the packet in the list and the third one to actually construct the tree.
Is there a way to tell that we are only interested in the columns?
Because at the level of the dissection where we are, there is no point in adding info to the columns.
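
For what it's worth, a dissector can usually tell the passes apart from two things: the tree pointer (NULL when only columns are wanted, in the common case) and the per-frame visited flag (unset only on the initial file-loading pass). A rough sketch, assuming the 1.10-era field layout (newer releases moved the visited flag, so check against your tree) and the same hypothetical names as in the sketch above:

static void
dissect_foo(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree)
{
    if (!pinfo->fd->flags.visited) {
        /* First pass over the capture: record conversation /
         * request-response state that must persist across passes. */
    }

    /* Columns are needed for the packet-list pass too, so fill them
     * unconditionally. */
    col_set_str(pinfo->cinfo, COL_PROTOCOL, "FOO");

    if (tree == NULL)
        return;   /* no detail pane wanted: stop before the heavy lifting */

    /* Selected-packet pass: build the full detail tree. */
    proto_tree_add_item(tree, proto_foo, tvb, 0, -1, ENC_NA);
}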

Usually the number of packets visible and/or selected is small (well
under 50) and so this extra dissection takes virtually no time at all.

Also, is it possible to remember the dissection of a packet so that we
don't do it again and again?
It is quite possible, it just takes an enormous amount of memory. I
actually hacked together a patch for this a few weeks ago while doing
some performance tests [1].

[1] http://www.mail-archive.com/wireshark-dev@xxxxxxxxxxxxx/msg29107.html

Well, memory is not limitless either...
In the vast majority of cases dissecting a single packet (of any
protocol) is effectively instantaneous, so Wireshark saves as little
state as it possibly can. It has to redissect individual packets a lot
(pretty much any GUI action leads to at least one packet being
redissected) but this permits us to open substantially larger captures
(tens of thousands of packets) than we would be able to open
otherwise.

Given the number of tree items a DRS packet apparently produces,
storing the dissection data for every packet would require megabytes
of data per packet.
The thing is that you don't have those massive DRS packets very often; most of the time they are small, but if you are doing the initial replication then they are huge. In real life you won't look too much at the initial replication (I'm looking at it because I'm tuning the dissector), but having to redissect the big sync every time (for the columns) when you might only be interested in the subsequent ones is quite annoying.
On a machine with 4GB of RAM you probably wouldn't
be able to load more than a few thousand packets without being forced
out into swap. A saturated network can produce that many packets in
seconds (though maybe not that many DRS packets?), so Wireshark would
be pretty useless in that case.
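(Rough arithmetic for scale, using the figures from earlier in the thread: at roughly 2MB of retained dissection state per packet, 2000 such packets would already be about 4GB, which is the kind of estimate being made here.)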

This kind of massive DRS packet is spread over ~300 1500-byte TCP packets.

Matthieu.

--
Matthieu Patou
Samba Team
http://samba.org