Ethereal-dev: Re: [ethereal-dev] TCP and higher level dissectors (sub-dissectors)

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Guy Harris <gharris@xxxxxxxxxxxx>
Date: Mon, 2 Oct 2000 11:44:39 -0700
On Mon, Oct 02, 2000 at 09:44:23AM -0500, Jeff Foster wrote:
> I suggest that we change the tcp and tcp sub-dissectors from the current
> 'data push' where the tcp dissector passes the packet data to the
> sub-dissector to a 'data pull' system where the sub-dissector requests
> data from the tcp dissector.

What's the advantage of doing it that way?  It looks as if the only
difference would be whether the sub-dissector or the TCP dissector
maintained state information and kept as-yet-unprocessed TCP segments.

The former could be done in the sub-dissector as well.

The TCP dissector would have to do the latter *anyway* if it's to handle
packet ordering problems (i.e., if two TCP segments show up
out-of-order, the second segment would have to be held onto by the TCP
dissector until the first segment arrives, the connection closes, or the
packet trace comes to an end (meaning that either the next segment never
got onto the network or got onto the network after the capture stopped)).
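
As a rough illustration of the "hold onto the second segment" step, here
is a minimal sketch in C.  The types and the "stream_offer()" routine are
hypothetical, made up for this example, and are not Ethereal code; it
ignores retransmissions and overlapping segments, and it assumes the
caller supplies an output array with room for MAX_HELD + 1 frame numbers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not Ethereal's own structures. */
#define MAX_HELD 16

typedef struct {
    uint32_t seq;    /* sequence number of the first payload byte */
    uint32_t len;    /* payload length in bytes */
    int      frame;  /* frame number in the capture */
} held_segment;

typedef struct {
    uint32_t     next_seq;        /* next sequence number expected in order */
    held_segment held[MAX_HELD];  /* segments that showed up early */
    size_t       n_held;
} tcp_stream;

/*
 * Offer one segment to the stream.  Frame numbers that can now be handed
 * to the sub-dissector are written to out_frames in stream order.
 * Returns the number of frames written, 0 if the segment was held back,
 * or -1 if the hold buffer is full and the segment was dropped.
 */
int stream_offer(tcp_stream *s, held_segment seg, int *out_frames)
{
    int n = 0;
    size_t i;

    if (seg.seq != s->next_seq) {
        /* Not the segment we are waiting for: hold onto it. */
        if (s->n_held == MAX_HELD)
            return -1;
        s->held[s->n_held++] = seg;
        return 0;
    }

    out_frames[n++] = seg.frame;
    s->next_seq += seg.len;

    /* Release any held segments that have now become in-order. */
    i = 0;
    while (i < s->n_held) {
        if (s->held[i].seq == s->next_seq) {
            out_frames[n++] = s->held[i].frame;
            s->next_seq += s->held[i].len;
            s->held[i] = s->held[--s->n_held];  /* fill the gap */
            i = 0;                              /* rescan from the start */
        } else {
            i++;
        }
    }
    return n;
}
```

Anything still sitting in "held" when the connection closes or the trace
ends corresponds to the "next segment never showed up" case above, and
would have to be flushed with some indication that data is missing.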

Note also that, regardless of whether we change to a pull model or keep
the push model, this would require the sub-dissector, and probably the
TCP dissector, to keep per-packet information using
"p_add_proto_data()".  The dissectors would have to remember what they
discovered on the first sequential pass through the packets (TCP
segment order, retransmissions, etc., and any message boundaries and
the like maintained by the protocols running atop TCP), for use if the
user starts clicking on packets in a TCP connection in a random order -
once the first sequential pass is done, there's no guarantee that
packets will be visited in sequential order.
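
To make the "remember it on the first pass, look it up later" idea
concrete, here is a small sketch in C.  It deliberately does not use
"p_add_proto_data()" itself; the "frame_note" structure, the fixed-size
table, and both routines are hypothetical stand-ins, assumed only for the
sake of the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-frame record; the field choice is an assumption. */
#define MAX_FRAMES 1024

typedef struct {
    int      valid;              /* set once the first pass has filled this in */
    uint32_t stream_offset;      /* offset of this frame's data in the stream */
    int      is_retransmission;  /* nonzero if the data was seen earlier */
    int      starts_message;     /* nonzero if a higher-level message starts here */
} frame_note;

/* Indexed by frame number; frame numbers start at 1. */
static frame_note notes[MAX_FRAMES + 1];

/* First (sequential) pass: record what was learned about a frame.
 * Returns 0 on success, -1 if the frame number is out of range. */
int note_remember(int frame, frame_note n)
{
    if (frame < 1 || frame > MAX_FRAMES)
        return -1;
    n.valid = 1;
    notes[frame] = n;
    return 0;
}

/* Later, random-order visits: fetch the record, or NULL if none exists. */
const frame_note *note_lookup(int frame)
{
    if (frame < 1 || frame > MAX_FRAMES || !notes[frame].valid)
        return NULL;
    return &notes[frame];
}
```

When the user clicks on frame 7 out of order, the dissector would call
something like note_lookup(7) rather than trying to recompute the segment
order from frames it may not have revisited.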

This may also mean, for example, that if you have two TCP segments out
of order, the Info column for the segment that appears first in the
capture can't be set until the segment that appears second in the
capture is seen, as the second segment has to be processed first by the
protocol running atop TCP.

This could work better if and when we only generate the contents of the
Info column when the row for a packet in the packet list pane is drawn
(rather than generating it on the first pass through the packets) - I'm
working on a scheme that turns "gtk/gtkclist.c" into "gtk/ethclist.c",
implementing a CList-derived widget that gets the column data by calling
back to a routine, a change that significantly speeds up the process of
reading in packet captures and significantly reduces the memory
requirements of Ethereal.

However, it does raise questions about

	1) Tethereal

and

	2) "Update list of packets in real time" captures

as, currently

	1) Tethereal prints the summary line for packets as soon as it
	   sees the packet in a capture or reads it from the capture
	   file

and

	2) "Update list of packets in real time" captures add a packet
	   to the list as soon as it's seen

but the packet might not be ready to dissect the instant it's seen.

In the case of "Update list of packets in real time" captures, this
might be helped by the "virtual CList" referred to above, as adding a
packet to the list doesn't set its Info column "for life".

In the case of Tethereal, though, it can't go back and redraw a line
it's already printed (a curses-based Ethereal could, but Tethereal is
line-mode and is in no position to do that - and could be "printing" to
a file, or a pipe, in any case), so:

	for capture files, I'd be inclined to have it somehow "buffer
	up" packets until they can be fully dissected, and only print
	the packet when it can be fully dissected;

	for live captures without "-w", I'd either

		1) always have it just print the summary line as
		   generated when the packet is first dissected

	or

		2) give it a command-line option to select whether it
		   does that or buffers packets as is done with
		   dead captures

	but I'd be initially inclined to just do 1) and only do 2) if
	people ask for it as a result of using it (rather than just
	arguing that it *could* be useful).

> How to handle holes in the data stream? 
> 	Is a 'start of data holes' pointer needed?

Some mechanism for indicating that there are holes in the data stream is
necessary; note that holes can occur due either to missing packets *OR*
due to the user specifying a snapshot length shorter than the MTU.

> Need to better clarify retransmission, should the original packet be
> indicated?

Note that a "retransmission" is a retransmission of a range of data, not
necessarily a retransmission of an entire packet - for example, the
transmitting machine might transmit sequence numbers 78 through 125,
but, on a retransmission, transmit sequence numbers 78 through 338.

TCP should, for the part of a segment that's retransmitted data, mark it
as such, showing, if possible, which original packets (possibly plural)
contained that data, and, of course, not hand it to the sub-dissector
more than once.
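
A minimal sketch in C of splitting a segment into its retransmitted and
new parts follows.  The "segment_split" structure and "split_segment()"
are hypothetical names for illustration; the sketch assumes the TCP
dissector tracks the next in-order sequence number it expects, and it
does not attempt the harder part of recording which original frames
carried the retransmitted bytes.

```c
#include <stdint.h>

/* Hypothetical result type for illustration. */
typedef struct {
    uint32_t retransmitted;  /* leading bytes already handed to the sub-dissector */
    uint32_t fresh;          /* trailing bytes not seen before */
    uint32_t fresh_seq;      /* sequence number of the first fresh byte */
} segment_split;

/*
 * next_seq is the next sequence number expected in order; seq and len
 * describe the segment.  The signed difference copes with sequence
 * number wraparound, assuming two's complement conversion and segments
 * far shorter than 2^31 bytes.
 */
segment_split split_segment(uint32_t next_seq, uint32_t seq, uint32_t len)
{
    segment_split r;
    int32_t overlap = (int32_t)(next_seq - seq);

    if (overlap < 0)
        overlap = 0;             /* segment starts past next_seq: all new */
    if ((uint32_t)overlap > len)
        overlap = (int32_t)len;  /* segment lies entirely in old data */

    r.retransmitted = (uint32_t)overlap;
    r.fresh = len - r.retransmitted;
    r.fresh_seq = seq + r.retransmitted;
    return r;
}
```

With the numbers above: after sequence numbers 78 through 125 have been
delivered, next_seq is 126; a segment covering 78 through 338 has seq 78
and len 261, so split_segment(126, 78, 261) reports 48 retransmitted
bytes and 213 new bytes starting at sequence number 126.  Only the new
bytes would go to the sub-dissector.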

> How to handle display fields that are spread over multiple packets?
> 	Is the field displayed in the first packet or last packet?

I'd display it in all the packets.

> 	How does the user know that the data extends beyond the packet?

I'd be inclined to put "continued from frame N" and "continued on
frame N" indications into the protocol tree, and, if possible, also have
that indicate that the next or previous field is, itself, continued.

> How does this all fit into the ethereal proto_tree code?

It may require changes to allow a sub-dissector to dissect multiple
frames as a single packet (consider, for example, NetBIOS-over-TCP, or
ONC RPC-over-TCP) - this is a place where tvbuffs come in handy,
although that'd require tvbuffification of a lot more dissectors - and
have only those fields some or all of whose data comes from the current
frame actually go into the protocol tree.

We should also have a way of putting all the frames into the protocol
tree, so that, for example, you could ask to view a capture as messages
at a level above the frame level (for example, there'd be one row in the
packet list for each NetBIOS-over-TCP message, one row for each ONC RPC
request or reply, perhaps one row for each HTTP request or reply, etc.).

> How will this work with tethereal?

See above.

> How does this work with the current conversation code?  Can the tcp
> dissector and a sub-dissector both create a conversation on the same stream?

Or could they *share* a conversation?

(Note, by the way, that once all of this is done, a lot of "follow.c"
either goes away or, in effect, goes into "packet-tcp.c".)

Also note that some of this is potentially useful for other
connection-oriented protocols (although most if not all of the other
ones are packet-oriented, rather than byte-stream-oriented, so the
connection-oriented-protocol dissector, rather than the dissector it
calls, can gather packets into messages).