Ethereal-dev: Re: [Ethereal-dev] thoughts on reassembly...

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Guy Harris <gharris@xxxxxxxxxxxx>
Date: Fri, 1 Dec 2000 02:51:47 -0800
On Thu, Nov 30, 2000 at 08:35:10AM -0600, Neulinger, Nathan R. wrote:
> I had a thought pop into my head this morning on a relatively
> straightforward way to implement reassembly of packets/frags/etc. For what
> follows, I'm assuming UDP-style only,

Presumably you mean "IP-style"; UDP has no fragmentation/reassembly - it
relies on IP to do that.  *Most* fragmented IP datagrams are probably
UDP datagrams, but there's no reason in principle why there can never be
fragmented IP datagrams containing TCP segments, for example - and it's
IP that knows the fragment offsets of fragments.
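
The fragment information IP relies on travels in every IPv4 header: bytes 6-7 hold the DF and MF flags plus a 13-bit fragment offset counted in 8-byte units (RFC 791). A minimal sketch of decoding that field follows; the struct and function names are hypothetical, and this is not the actual IP dissector code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the IPv4 flags/fragment-offset word
 * (header bytes 6-7, network byte order). Hypothetical names. */
struct ip_frag_info {
    bool     more_fragments; /* MF flag: more fragments follow */
    uint32_t byte_offset;    /* fragment offset, converted to bytes */
};

/* hdr points at the start of an IPv4 header (at least 8 bytes). */
static struct ip_frag_info ip_frag_decode(const uint8_t *hdr)
{
    uint16_t word = (uint16_t)((hdr[6] << 8) | hdr[7]);
    struct ip_frag_info fi;

    fi.more_fragments = (word & 0x2000) != 0;          /* MF bit */
    fi.byte_offset    = (uint32_t)(word & 0x1FFF) * 8; /* 8-byte units */
    return fi;
}

/* A datagram is a fragment if MF is set or its offset is nonzero. */
static bool ip_is_fragment(const uint8_t *hdr)
{
    struct ip_frag_info fi = ip_frag_decode(hdr);
    return fi.more_fragments || fi.byte_offset != 0;
}
```

A datagram for which ip_is_fragment() is true would go to the reassembly code rather than straight to the UDP or TCP dissector; the DF bit (0x4000) alone does not make a datagram a fragment.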

> although the basic approach would be the same.
> 
> First, it depends on one thing - is there a way for dissectors to get at
> data from previous frames?

A dissector could explicitly re-read the earlier frame, using "wtap_seek_read()".  It'd
have to allocate a buffer for that.  We'd probably have a routine in
"file.c" to do that; the routine would take either a frame number or a
"frame_data" pointer as an argument.

> The basic approach I was thinking of goes something like this: have a
> 'chunks' facility, similar to the conversations facility. When a dissector
> sees that it has a partial frame/fragment/etc that it knows is part of
> something larger, then it registers that sequence number/piece in the
> structure.

I'm not sure how much of this would be usable both by IP and by
protocols running atop TCP (TCP itself knows nothing about higher-level
packet boundaries; it would probably do some reassembly work itself, so
that, on the first pass through the capture, it can supply higher-level
dissectors with an in-order, no-duplicates stream of packet data, but it
wouldn't know about in-stream byte counts, Content-Length MIME headers,
and the like).
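
One way TCP could hand higher-level dissectors an in-order, no-duplicates byte stream on the first pass is sketched below. It is a minimal illustration, assuming a single direction of one connection, a small fixed cap on held-back out-of-order segments, and no SYN/FIN/RST handling; all names are hypothetical. Sequence numbers are compared with a signed 32-bit difference so that wraparound is tolerated.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_HELD 16  /* hypothetical cap on out-of-order segments held back */

struct held_seg { uint32_t seq; size_t len; unsigned char *data; };

/* One direction of one TCP connection (hypothetical structure). */
struct tcp_dir {
    uint32_t next_seq;               /* sequence number of next byte to deliver */
    struct held_seg held[MAX_HELD];  /* segments that arrived after a gap */
    size_t nheld;
    unsigned char *out;              /* in-order, no-duplicates stream so far */
    size_t out_len, out_cap;
};

static void tcp_dir_init(struct tcp_dir *d, uint32_t first_seq)
{
    memset(d, 0, sizeof *d);
    d->next_seq = first_seq;
}

static void tcp_dir_free(struct tcp_dir *d)
{
    for (size_t i = 0; i < d->nheld; i++)
        free(d->held[i].data);
    free(d->out);
}

/* Append the part of [seq, seq+len) not yet delivered. Requires seq <= next_seq. */
static int append_new_bytes(struct tcp_dir *d, uint32_t seq,
                            const unsigned char *data, size_t len)
{
    uint32_t skip = d->next_seq - seq;   /* bytes already delivered */
    if (skip >= len)
        return 0;                        /* pure duplicate: nothing new */
    size_t n = len - skip;
    if (d->out_len + n > d->out_cap) {
        size_t cap = d->out_cap ? d->out_cap : 64;
        while (cap < d->out_len + n)
            cap *= 2;
        unsigned char *p = realloc(d->out, cap);
        if (p == NULL)
            return -1;
        d->out = p;
        d->out_cap = cap;
    }
    memcpy(d->out + d->out_len, data + skip, n);
    d->out_len += n;
    d->next_seq += (uint32_t)n;
    return 0;
}

/* Feed one segment; returns 0 on success, -1 on overflow or allocation failure. */
static int tcp_dir_add(struct tcp_dir *d, uint32_t seq,
                       const unsigned char *data, size_t len)
{
    if ((int32_t)(seq - d->next_seq) > 0) {   /* gap before this segment: hold it */
        if (d->nheld == MAX_HELD)
            return -1;
        unsigned char *copy = malloc(len ? len : 1);
        if (copy == NULL)
            return -1;
        memcpy(copy, data, len);
        d->held[d->nheld++] = (struct held_seg){ seq, len, copy };
        return 0;
    }
    if (append_new_bytes(d, seq, data, len) != 0)
        return -1;
    /* next_seq advanced, so some held segments may now be deliverable. */
    for (size_t i = 0; i < d->nheld; ) {
        struct held_seg s = d->held[i];
        if ((int32_t)(s.seq - d->next_seq) > 0) {
            i++;
            continue;
        }
        if (append_new_bytes(d, s.seq, s.data, s.len) != 0)
            return -1;
        free(s.data);
        d->held[i] = d->held[--d->nheld];  /* remove by swapping in the last one */
        i = 0;                             /* next_seq moved: rescan from start */
    }
    return 0;
}
```

Retransmitted or overlapping bytes are trimmed against next_seq, so a dissector reading out[] never sees the same stream byte twice; finding message boundaries inside that stream (byte counts, Content-Length, and so on) is still left to the higher-level protocol.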

> As each frame comes in, if it sees that it's not whole, it looks up the
> piece in the structure; if it finds a corresponding record, it adds it to
> that record. It then passes that piece onto dissectors like always. Once
> that is done, it then checks the record for completeness - if all the pieces
> are registered, it then takes the data in that record, and creates a
> pseudo-frame that contains the data from all, and passes THAT along to the
> other dissectors.
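
The lookup / add / check-for-completeness / assemble cycle described above could look roughly like the following. This is a sketch under assumptions, not a design from this thread: pieces are IP-style (byte offset, length, and a "last piece" flag from which the total size is derived), records are keyed by a single 32-bit identifier in a small fixed table, and chunk_add(), chunk_complete() and chunk_assemble() are hypothetical names. A real IP implementation would key on source, destination, protocol and identification, and would probably use a hash table.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_RECORDS 32  /* hypothetical limits, kept small for the sketch */
#define MAX_PIECES  64

struct piece {
    uint32_t offset;      /* byte offset of this piece in the whole */
    uint32_t len;
    unsigned char *data;  /* private copy of the piece's bytes */
};

struct chunk_rec {
    bool     in_use;
    uint32_t id;          /* e.g. the IP identification field */
    bool     total_known; /* set once the final piece has been seen */
    uint32_t total_size;
    size_t   npieces;
    struct piece pieces[MAX_PIECES];
};

static struct chunk_rec table[MAX_RECORDS];

/* Find the record for `id`, creating one in a free slot if needed. */
static struct chunk_rec *chunk_lookup(uint32_t id)
{
    struct chunk_rec *free_slot = NULL;
    for (size_t i = 0; i < MAX_RECORDS; i++) {
        if (table[i].in_use && table[i].id == id)
            return &table[i];
        if (!table[i].in_use && free_slot == NULL)
            free_slot = &table[i];
    }
    if (free_slot != NULL) {
        memset(free_slot, 0, sizeof *free_slot);
        free_slot->in_use = true;
        free_slot->id = id;
    }
    return free_slot;
}

/* Register one piece; `last` marks the final piece, which fixes the total size. */
static struct chunk_rec *chunk_add(uint32_t id, uint32_t offset,
                                   const unsigned char *data, uint32_t len, bool last)
{
    struct chunk_rec *r = chunk_lookup(id);
    if (r == NULL || r->npieces == MAX_PIECES)
        return NULL;
    unsigned char *copy = malloc(len ? len : 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, data, len);
    r->pieces[r->npieces++] = (struct piece){ offset, len, copy };
    if (last) {
        r->total_known = true;
        r->total_size = offset + len;
    }
    return r;
}

static int cmp_offset(const void *a, const void *b)
{
    uint32_t x = ((const struct piece *)a)->offset;
    uint32_t y = ((const struct piece *)b)->offset;
    return (x > y) - (x < y);
}

/* Complete when the total is known and the pieces cover [0, total) with no holes. */
static bool chunk_complete(struct chunk_rec *r)
{
    if (!r->total_known)
        return false;
    qsort(r->pieces, r->npieces, sizeof r->pieces[0], cmp_offset);
    uint32_t covered = 0;
    for (size_t i = 0; i < r->npieces; i++) {
        if (r->pieces[i].offset > covered)
            return false;                      /* hole before this piece */
        uint32_t end = r->pieces[i].offset + r->pieces[i].len;
        if (end > covered)
            covered = end;
    }
    return covered >= r->total_size;
}

/* Build the pseudo-frame (caller frees it) and release the record. */
static unsigned char *chunk_assemble(struct chunk_rec *r)
{
    unsigned char *buf = malloc(r->total_size ? r->total_size : 1);
    for (size_t i = 0; i < r->npieces; i++) {
        struct piece *p = &r->pieces[i];
        if (buf != NULL && p->offset < r->total_size) {
            uint32_t n = p->len;
            if (n > r->total_size - p->offset)
                n = r->total_size - p->offset;  /* ignore bytes past the end */
            memcpy(buf + p->offset, p->data, n);
        }
        free(p->data);
    }
    r->in_use = false;
    return buf;
}
```

A dissector would call chunk_add() for each piece, keep dissecting that piece as usual, and only when chunk_complete() returns true call chunk_assemble() and hand the resulting pseudo-frame to the next dissector.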

We'd need some way to handle "reassembly timeouts" and the like; if
nothing else, the end of the capture would cause an IP reassembly
timeout (if there are any incomplete fragmented datagrams by the end of
the capture, they're never going to be reassembled), and we might also
want to time out the reassembly after a certain amount of "time" (based
on the packet timestamps) has passed, if there's a risk that some other
unrelated IP datagram later in the capture would have the same IP
identification.  Similarly, a FIN would "time out" any TCP stream
reassembly, and any reassembly of packets in the TCP stream, in that
direction of the connection.
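
Timeouts of this kind can be driven purely by packet timestamps. A minimal sketch, assuming each pending record remembers the timestamp (here a double, in seconds) of its first piece; the record layout, table size and function names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 32  /* hypothetical table size */

/* Hypothetical pending-reassembly record: only the fields needed here. */
struct pending {
    bool     in_use;
    uint32_t id;        /* e.g. IP identification */
    double   first_ts;  /* timestamp (seconds) of the first piece seen */
};

static struct pending pend[MAX_PENDING];

/* Drop every record whose first piece is more than `limit` seconds older
 * than `now` (the current packet's timestamp). Returns how many were dropped;
 * a real dissector would mark their pieces as never reassembled. */
static size_t expire_older_than(double now, double limit)
{
    size_t dropped = 0;
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (pend[i].in_use && now - pend[i].first_ts > limit) {
            pend[i].in_use = false;
            dropped++;
        }
    }
    return dropped;
}

/* End of capture: nothing still pending can ever complete, so expire it all. */
static size_t expire_all(void)
{
    size_t dropped = 0;
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (pend[i].in_use) {
            pend[i].in_use = false;
            dropped++;
        }
    }
    return dropped;
}
```

expire_older_than() would be called with each new packet's timestamp and expire_all() once the end of the capture is reached; a FIN handler would do the same for just the records belonging to that direction of the connection.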

> I'd like to see it have some way of indicating in the display that this
> frame is a reconstructed object, not an on-the-wire-one-piece object.

I'd actually like to be able to have a display that showed only frames
*but* that would, for frames that were part of a larger packet, add
"Continued in next frame" or "Continued from previous frame" indications
and show the contents of fields that spanned frame boundaries.

Another display option would be to show stuff only at the higher level,
so that frames that were part of larger packets wouldn't be shown - only
the larger packets would be shown (frames that *weren't* part of larger
packets would be shown as is), and the protocol tree probably wouldn't
show information for lower layers.

You'd have an option to switch between the two different views.

> As each piece came in, it would be stuck in the pieces array for the
> frameset, the check-for-completeness operation would look at the totalsize,
> and all of the pieces, and see if the entire frameset was found. If it was,
> it would allocate memory for the totalsize,

...or just create a composite tvbuff (that's what composite tvbuffs are
there for - handling reassembly).
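
The point of a composite buffer is that the reassembled packet is a view over the pieces rather than a fresh copy. The sketch below only illustrates that idea: it is hypothetical and does not reproduce Ethereal's actual tvbuff API. Member slices are referenced, not copied, and a read that spans a member boundary is stitched together on demand.

```c
#include <stddef.h>
#include <string.h>

#define MAX_MEMBERS 16  /* hypothetical cap on member slices */

/* Hypothetical composite buffer: one logical byte sequence made of
 * several member slices, which are referenced rather than copied. */
struct composite {
    const unsigned char *ptr[MAX_MEMBERS];
    size_t len[MAX_MEMBERS];
    size_t nmembers;
    size_t total;   /* sum of member lengths */
};

static int composite_append(struct composite *c, const unsigned char *p, size_t n)
{
    if (c->nmembers == MAX_MEMBERS)
        return -1;
    c->ptr[c->nmembers] = p;
    c->len[c->nmembers] = n;
    c->nmembers++;
    c->total += n;
    return 0;
}

/* Copy `n` bytes starting at logical offset `off` into `dst`, crossing
 * member boundaries as needed. Returns 0, or -1 if out of bounds. */
static int composite_memcpy(const struct composite *c, unsigned char *dst,
                            size_t off, size_t n)
{
    if (off > c->total || n > c->total - off)
        return -1;
    for (size_t i = 0; i < c->nmembers && n > 0; i++) {
        if (off >= c->len[i]) {        /* range starts past this member */
            off -= c->len[i];
            continue;
        }
        size_t take = c->len[i] - off;
        if (take > n)
            take = n;
        memcpy(dst, c->ptr[i] + off, take);
        dst += take;
        n -= take;
        off = 0;                       /* later members are read from their start */
    }
    return 0;
}
```

With this approach the "allocate memory for the totalsize" step is deferred: bytes are copied only when a dissector actually asks for a range, and only that range is copied.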