Ethereal-dev: Re: [Ethereal-dev] Help Conversation

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Guy Harris <gharris@xxxxxxxxxxxx>
Date: Sun, 18 Mar 2001 19:29:07 -0800
On Sun, Mar 18, 2001 at 05:50:22PM +0100, Guillaume Le Malet wrote:
> I've tried to understand how "conversation" in ethereal works,
> and I've got a few questions:

Note that the questions you ask don't actually pertain to the
conversation mechanism; you don't have to set up a conversation in order
to attach per-frame data to a frame, for example.

> -When we do: "frame_data = p_get_proto_data(pinfo->fd, proto_smtp)"
>  does it point at the same time on all smtp packets that where
> captured?

I'm not certain what you mean here.

The data a dissector would get from a "p_get_proto_data()" call is the
data that the dissector attached to the frame in the first pass through
the packets - "frame_data" would point to whatever object the dissector
attached to the frame with a "p_add_proto_data()" call.

> -What does CRLF, EOM and Hash Table mean?

CRLF refers to a carriage-return character ('\r', octal 15, ASCII CR)
followed by a line-feed character ('\n', octal 12, ASCII LF).  There are
a number of text-oriented protocols that run over TCP; SMTP, as
documented in RFC 821, is one such protocol, and some others are FTP,
NNTP, HTTP, and POP.  In those protocols, the client tends to send
commands to the server, where a command is a line of text with a CRLF at
the end of the line, and the server sends replies back to the client,
where a reply also consists of one or more lines, each ending with a
CRLF.
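
As a rough illustration of what "a line ending with a CRLF" means to code
that reads such a protocol, here is a minimal sketch in C.  The function
name "find_crlf" is made up for this example; it is not a routine from
Ethereal.

```c
#include <stddef.h>

/*
 * Return the offset of the first CRLF ('\r' followed by '\n') in
 * buf[0..len), or -1 if the buffer does not yet hold a complete line.
 * "find_crlf" is a hypothetical helper, not part of Ethereal.
 */
long find_crlf(const char *buf, size_t len)
{
    size_t i;

    /* Stop one byte early so that buf[i + 1] is always in range. */
    for (i = 0; i + 1 < len; i++) {
        if (buf[i] == '\r' && buf[i + 1] == '\n')
            return (long)i;
    }
    return -1;
}
```

Given the 8 bytes of "250 OK" followed by a CRLF, this returns 6, the
offset at which the line's text ends.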

EOM, in the SMTP dissector, refers to the End Of the Message.

If an SMTP client tells an SMTP server to send a mail message, the
sequence of commands and replies might look something like this -
"client:" and "server:" indicate who's sending the command or reply, and
everything after it is the contents of one line (ending with a CRLF):

	client: MAIL FROM:<gharris@xxxxxxxxxxxx>
	server: 250 OK
	client: RCPT TO:<ethereal-dev@xxxxxxxxxxxx>
	server: 250 OK
	client: DATA
	server: 354 Start mail input; end with <CRLF>.<CRLF>
	client: From: Guy Harris <gharris@xxxxxxxxxxxx>
	client: To: ethereal-dev@xxxxxxxxxxxx
	client: Subject: Rewriting Ethereal in Objective COBOL
	client: Date: Sun, 1 Apr 2001 12:00:00 -0700
	client: Message-ID: <20010401666666.A666@xxxxxxxxxxxxxxxxxxxxxx>
	client: X-Tagline: Poisson d'Avril
	client:
	client: Hey, I just had a really odd idea - what if we rewrote
	client: Ethereal in Objective COBOL?  (I'm not sure that's what it's
	client: called, but there really *is* work being done on an
	client: object-oriented version of COBOL; they really should
	client: have called it "ADD ONE TO COBOL".)
	client: .
	server: 250 OK

The "MAIL" command from the client to the server tells the server who's
sending the mail; the server replies to that command with a reply "250
OK", where "250" is the reply code saying that the command was accepted,
and the "OK" is, from the point of view of the protocol, just a comment
for use by a person reading a transcript of the session.

The "RCPT" command tells the server to whom the mail should be sent.

The "DATA" command tells the server that the client is ready to supply
the actual contents of the mail message; the "354 Start mail input..."
reply says that the client should now send the contents of the mail
message.

The client then sends the headers and the body of the mail message.  The
way the client tells the server that it's finished sending the body of
the mail message is to send a line consisting only of a "." character,
i.e. it sends a "." character followed by a CRLF.

That line is called an end-of-message, or an EOM, in the SMTP dissector.
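
A check for that line could be sketched as follows, assuming the caller
has already split the data into lines and stripped the CRLF from each
one.  The name "is_eom_line" is invented for this example and is not the
SMTP dissector's own code.

```c
#include <stddef.h>

/*
 * Return 1 if the line is an end-of-message marker, i.e. it consists
 * of exactly one "." character, and 0 otherwise.  The line is given
 * WITHOUT its terminating CRLF; "is_eom_line" is a hypothetical name.
 */
int is_eom_line(const char *line, size_t len)
{
    return len == 1 && line[0] == '.';
}
```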

A hash table is a data structure used to speed up the process of
searching for data items that have "keys" associated with them.  For
example, you might have a table of records about people, and the "key"
would be the person's name.

One way to find the record for a particular person would be to have a
linked list of all those records, and to look at the records one at a
time, starting with the first one in the list, comparing the name in
each record with the name of the person for whom you're looking, until
you find a match.

If you have, say, 5 people for whom you have records, that wouldn't be
too bad.

If, however, you had 5,000 people, you would, on average (assuming a
random distribution of names for which you're trying to find the
record), have to look at 2,500 records each time you tried to find a
record.

That's somewhat expensive.

In a hash table, instead of having one list, you have several lists. 
You would take the name and "hash" it (from the online Merriam-Webster's
Collegiate(R) Dictionary:

		    1
	Main Entry:  hash
	Pronunciation: 'hash
	Function: transitive verb
	Etymology: French "hacher", from Old French "hachier", from "hache"
	battle-ax, of Germanic origin; akin to Old High German "hAppa"
	sickle; akin to Greek "koptein" to cut -- more at CAPON
	Date: 1590
	1 a : to chop (as meat and potatoes) into small pieces

		...

), by taking the characters in the name and, for example, adding them
together and taking the result modulo the number of lists you've set up.
You would put the record for a given user into the appropriate list,
and, when searching for that person's record, you'd use the "hashed"
version of their name to choose which of those lists to search.

If you had 100 lists, for example, instead of having to search a list of
5,000 people, you would (assuming your "hash function" was roughly
equally likely to pick any value between 0 and 99) only have to search a
list of approximately 50 people.

See

	http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?query=hash+table
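
The scheme described above (add the characters of the name together,
take the result modulo the number of lists, and search only that list)
can be sketched in C roughly as follows.  All of the names here
("struct person", "hash_name", and so on) are made up for illustration,
and adding characters together is a deliberately simple hash function;
real hash tables usually use something that spreads keys more evenly.

```c
#include <stddef.h>
#include <string.h>

#define NUM_LISTS 100   /* number of separate lists ("buckets") */

/* Hypothetical record type: one entry per person, keyed by name. */
struct person {
    const char *name;       /* the key */
    int age;                /* example payload */
    struct person *next;    /* next record in the same list */
};

struct person_table {
    struct person *lists[NUM_LISTS];    /* must start out all NULL */
};

/* Add the characters together and reduce modulo the number of lists. */
unsigned int hash_name(const char *name)
{
    unsigned int sum = 0;

    while (*name != '\0')
        sum += (unsigned char)*name++;
    return sum % NUM_LISTS;
}

/* Put the record at the head of the list its name hashes to. */
void person_insert(struct person_table *t, struct person *p)
{
    unsigned int h = hash_name(p->name);

    p->next = t->lists[h];
    t->lists[h] = p;
}

/* Search only the one list that the name hashes to. */
struct person *person_lookup(const struct person_table *t, const char *name)
{
    struct person *p;

    for (p = t->lists[hash_name(name)]; p != NULL; p = p->next) {
        if (strcmp(p->name, name) == 0)
            return p;
    }
    return NULL;
}
```

Two names made of the same letters, such as "guy" and "yug", hash to
the same list with this function, which is why the lookup still has to
compare the names of the records in the list it picks.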

The GLib library includes routines to create hash tables, put entries
into hash tables, remove entries from hash tables, and look up entries
in hash tables; Ethereal uses them in a number of places.

One place it uses them is in the SMTP dissector.  In this case, the
"key" is the conversation to which the current frame belongs, and the
data is state information indicating what the next traffic on the SMTP
connection is expected to be.

This is necessary because, although some protocols allow a packet to be
analyzed without knowing what packets came before it on the network,
SMTP doesn't.  For example, a line containing

	MAIL FROM:<billg@xxxxxxxxxxxxx>

could either be a "MAIL" command *or* it could be part of a mail message
explaining how SMTP works.
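
To make that concrete, here is a minimal sketch of the kind of state a
dissector would have to remember for each conversation.  It is not the
actual SMTP dissector code; the names are invented, it looks only at
lines sent by the client, it matches "DATA" exactly (real SMTP commands
are case-insensitive), and it ignores the server's "354" reply, all of
which are simplifying assumptions.

```c
#include <string.h>

/* What the next line from the client is expected to be. */
enum smtp_state { READING_COMMANDS, READING_DATA };

enum line_kind { LINE_COMMAND, LINE_MESSAGE, LINE_EOM };

/*
 * Classify one client line (NUL-terminated, CRLF already stripped)
 * and update the saved state.  Hypothetical helper for illustration.
 */
enum line_kind classify_client_line(enum smtp_state *state, const char *line)
{
    if (*state == READING_DATA) {
        if (strcmp(line, ".") == 0) {
            /* End of message; commands follow again. */
            *state = READING_COMMANDS;
            return LINE_EOM;
        }
        /* Anything else, even "MAIL FROM:...", is message text. */
        return LINE_MESSAGE;
    }
    if (strcmp(line, "DATA") == 0)
        *state = READING_DATA;
    return LINE_COMMAND;
}
```

The same "MAIL FROM:" line is classified as a command before "DATA" and
as message text after it, which is exactly the information that can't be
recovered from the line alone.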

> -Is there a CRLF in any "over TCP proto" messages?

TCP provides, to the protocols that run over it, a sequenced byte
stream; if the protocol running over TCP needs that byte stream to be
considered as a sequence of messages, the protocol in question has to
put into the byte stream data to specify when one message ends and the
next one begins.

Many protocols that run over TCP are "line-oriented" protocols; I listed
some above.  In those protocols, a line often corresponds to a message,
so most (possibly all) messages would end with a CRLF.  (In the case of
SMTP, the data supplied after a "DATA" command is an exception - it ends
not with a CRLF, but with a "." on a line by itself, i.e. a "."
preceded by and followed by a CRLF.)

However, not *all* of the protocols that run over TCP are line-oriented;
some of them might, for example, begin a message with a count of the
number of bytes in the message.  You might, in such a message, have a
byte with the value octal 15 followed by a byte with the value octal 12,
but it wouldn't be a "CRLF" in the sense that "CRLF" is used in, say,
SMTP - it wouldn't indicate the end of a one-line message in the
protocol.
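
As a sketch of that other style of framing, assume a made-up protocol in
which each message begins with a 2-byte big-endian count of the payload
bytes that follow.  The format and the function name are assumptions for
this example only, not any particular real protocol.

```c
#include <stddef.h>

/*
 * Hypothetical framing: a 2-byte big-endian payload length, followed
 * by that many payload bytes.  Return the total size of the first
 * message in buf (header plus payload), or 0 if the buffer does not
 * yet contain a complete message.
 */
size_t first_message_size(const unsigned char *buf, size_t len)
{
    size_t payload_len;

    if (len < 2)
        return 0;               /* length field itself is incomplete */
    payload_len = ((size_t)buf[0] << 8) | buf[1];
    if (len - 2 < payload_len)
        return 0;               /* payload is incomplete */
    return 2 + payload_len;
}
```

Note that the code never looks at the payload bytes; a payload
containing octal 15 followed by octal 12 is framed by the count just
like any other payload.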