Ethereal-dev: Re: [Ethereal-dev] [HTTP]Desegmentation/Reassembly of HTTP headers/bodies

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Guy Harris <guy@xxxxxxxxxxxx>
Date: Wed, 8 Oct 2003 17:33:01 -0700

On Oct 8, 2003, at 3:40 AM, Loïc Minier wrote:

 First, I preferred using the tvb_find_guint8/tvb_strneql functions
 instead of tvb_find_line_end because I guessed there could be trouble
 with malformed headers. The original code is commented out in C++ style.

One problem with "tvb_find_guint8()", at least as you're using it, is that it assumes that lines end with CR-LF. Perhaps they *should*, but that doesn't mean that they necessarily *will*.

"tvb_find_line_end()" doesn't care whether the line ends with CR, LF, CR-LF, or LF-CR.

How would "tvb_find_line_end()" have more problems with malformed headers than "tvb_find_guint8()"?
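To make the difference concrete, here is a minimal standalone sketch of a line-end search that accepts any of the four terminators. It is not Ethereal's implementation and does not use the tvbuff API; the name find_line_end and its signature are made up for illustration, and it works on a plain byte buffer.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not an Ethereal routine): find the end of the
 * line starting at buf[offset]. Returns the line length without its
 * terminator and stores the offset of the next line in *next_offset.
 * If no terminator is present, the rest of the buffer is the line.
 */
size_t
find_line_end(const unsigned char *buf, size_t len, size_t offset,
              size_t *next_offset)
{
    size_t i, eol;

    if (offset > len)
        offset = len;

    /* Scan for the first CR or LF, whichever comes first. */
    for (i = offset; i < len; i++) {
        if (buf[i] == '\r' || buf[i] == '\n')
            break;
    }

    if (i == len) {
        /* No terminator at all: the line runs to the end of the data. */
        *next_offset = len;
        return len - offset;
    }

    eol = i + 1;
    /*
     * A two-byte terminator is a CR followed by an LF or an LF followed
     * by a CR. Two identical bytes (LF LF) are two separate line ends.
     */
    if (eol < len && buf[eol] != buf[i] &&
        (buf[eol] == '\r' || buf[eol] == '\n'))
        eol++;

    *next_offset = eol;
    return i - offset;
}
```

A search for CR alone would return "not found" on a header block that uses bare LFs, whereas this approach still splits it into lines. One trade-off of the greedy two-byte match is that LF followed by CR-LF is read as LF-CR plus LF; the line count comes out the same either way.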

 Second, I wonder if I should do the "Content-Length:" search inside
 the normal loop of dissect_http that adds the headers. I am not sure
 it would be cleaner, and I could not easily detect the end of the headers
 ("\r\n\r\n"); it was too complex for my first try.

If the intent is to collect all the headers, and the content, in a single reassembled frame, you definitely need a loop that runs before the columns are set or the protocol tree is constructed, and you'd have to handle Content-Length: in that loop, not the loop that puts the headers into the protocol tree.

An alternative would be to have a state variable for the conversation, indicating whether we're processing the request/reply line, the headers, or the body, along with another state variable giving the content length, and just do enough reassembly to reassemble a single header line.
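A rough sketch of what that per-conversation state might look like follows. All of the names (http_conv_t, http_conv_feed_line, and so on) are hypothetical and nothing here uses Ethereal's conversation API; it also assumes, as a simplification, that a message with no Content-Length: header has no body.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum {
    HTTP_STATE_START_LINE,  /* expecting the request or reply line */
    HTTP_STATE_HEADERS,     /* expecting a header line or the blank line */
    HTTP_STATE_BODY         /* expecting body_remaining bytes of body */
} http_state_t;

typedef struct {
    http_state_t state;
    long content_length;    /* -1 until a Content-Length: header is seen */
    long body_remaining;    /* body bytes still expected */
} http_conv_t;

void
http_conv_init(http_conv_t *conv)
{
    conv->state = HTTP_STATE_START_LINE;
    conv->content_length = -1;
    conv->body_remaining = 0;
}

/* Case-insensitive check that line[0..len) starts with a lowercase prefix. */
static int
has_prefix(const char *line, size_t len, const char *prefix)
{
    size_t i;

    for (i = 0; prefix[i] != '\0'; i++) {
        if (i >= len || tolower((unsigned char)line[i]) != prefix[i])
            return 0;
    }
    return 1;
}

/* Feed one reassembled line, with its terminator already stripped. */
void
http_conv_feed_line(http_conv_t *conv, const char *line, size_t len)
{
    size_t i;

    switch (conv->state) {
    case HTTP_STATE_START_LINE:
        conv->state = HTTP_STATE_HEADERS;
        break;
    case HTTP_STATE_HEADERS:
        if (len == 0) {
            /* Blank line: the headers are over. */
            if (conv->content_length > 0) {
                conv->body_remaining = conv->content_length;
                conv->state = HTTP_STATE_BODY;
            } else {
                http_conv_init(conv);   /* assumed: no body follows */
            }
        } else if (has_prefix(line, len, "content-length:")) {
            i = 15;                     /* length of "content-length:" */
            while (i < len && (line[i] == ' ' || line[i] == '\t'))
                i++;
            conv->content_length = 0;
            /* No overflow check here; real code would need one. */
            while (i < len && isdigit((unsigned char)line[i])) {
                conv->content_length =
                    conv->content_length * 10 + (line[i] - '0');
                i++;
            }
        }
        break;
    case HTTP_STATE_BODY:
        break;      /* body data is not line-oriented */
    }
}

/* Feed `avail` body bytes; returns how many belong to this message. */
size_t
http_conv_feed_body(http_conv_t *conv, size_t avail)
{
    size_t used;

    if (conv->state != HTTP_STATE_BODY)
        return 0;
    used = (size_t)conv->body_remaining;
    if (avail < used)
        used = avail;
    conv->body_remaining -= (long)used;
    if (conv->body_remaining == 0)
        http_conv_init(conv);           /* ready for the next message */
    return used;
}
```

With this shape the dissector only ever needs to ask TCP for enough data to complete the current line; the body can be consumed segment by segment as it arrives.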

 Third, the code lacks a content-type filter, which I do not plan to write;
 is it still acceptable if the options are switched off by default?

Perhaps, although it does run the risk of reassembling a really large amount of content even if the user doesn't care. We can always add a way to specify what content types should be reassembled if that's a problem.

 Possibly fourth, I read in RFC 2616 that Content-Length isn't always
 present, but should be for backward compatibility with HTTP 1.0.

If you're referring to section 4.4 "Message Length", then, if Content-Length is missing, one of the following applies:

1) the message is one that's not allowed to have a message-body, in which case Ethereal shouldn't even try to reassemble the message body;

2) the message has a Transfer-Encoding field other than Transfer-Encoding: identity, in which case Ethereal would have to handle chunked encoding, which is probably something that would be worth doing eventually, but it's probably not something that needs to be done now;

3) the message has a media type "multipart/byteranges", in which case Ethereal would have to handle it, which, again, is probably something Ethereal should do eventually but that it doesn't need to do now;

4) the message is a reply from the server and the connection is closed at the end.

Ethereal should, as noted, not even try to process a message-body for response messages that "MUST NOT" include a message body (although to tell whether something is a response to a HEAD request we'd have to see the request, so that might be difficult to handle...). In the case of non-identity transfer encodings, or multipart/byteranges, it should probably not reassemble traffic *or* hand it to a subdissector (as it's not raw data). Otherwise, it could probably assume that the transfer finishes when the connection is closed, although there's *currently* no way for the TCP dissector to send a "connection closed" indication to the subdissector.
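The decision described above could be sketched roughly as follows. The types and the classify_body function are hypothetical, not existing Ethereal code, and the caller is assumed to have filled in the flags from the headers already parsed (including reply_to_head, which needs the matching request). The order of the checks follows RFC 2616 section 4.4.

```c
typedef enum {
    BODY_NONE,            /* no message-body allowed or signalled */
    BODY_CONTENT_LENGTH,  /* reassemble Content-Length bytes */
    BODY_UNSUPPORTED,     /* chunked etc. or multipart/byteranges:
                             neither reassemble nor hand off for now */
    BODY_UNTIL_CLOSE      /* reply whose body ends when the connection closes */
} body_framing_t;

/* Hypothetical summary of one message, filled in by the header loop. */
typedef struct {
    int is_response;
    int status_code;                    /* meaningful for responses only */
    int reply_to_head;                  /* requires having seen the request */
    int non_identity_transfer_encoding;
    int has_content_length;
    int multipart_byteranges;
} http_msg_info_t;

body_framing_t
classify_body(const http_msg_info_t *m)
{
    /* 1. Responses that "MUST NOT" include a message-body. */
    if (m->is_response &&
        (m->reply_to_head ||
         (m->status_code >= 100 && m->status_code < 200) ||
         m->status_code == 204 || m->status_code == 304))
        return BODY_NONE;

    /* 2. Any Transfer-Encoding other than identity (e.g. chunked). */
    if (m->non_identity_transfer_encoding)
        return BODY_UNSUPPORTED;

    /* 3. The simple case: an explicit Content-Length. */
    if (m->has_content_length)
        return BODY_CONTENT_LENGTH;

    /* 4. multipart/byteranges delimits itself. */
    if (m->multipart_byteranges)
        return BODY_UNSUPPORTED;

    /* 5. A reply with no length information ends at connection close. */
    if (m->is_response)
        return BODY_UNTIL_CLOSE;

    /* A request with neither header carries no message-body. */
    return BODY_NONE;
}
```

Only BODY_CONTENT_LENGTH can be reassembled with what exists today; BODY_UNTIL_CLOSE would need the "connection closed" indication mentioned above.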