Ethereal-dev: Re: [Ethereal-dev] [HTTP]Desegmentation/Reassembly of HTTP headers/bodies

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Loïc Minier <lool+ethereal@xxxxxxxxxx>
Date: Thu, 9 Oct 2003 10:11:04 +0200
Guy Harris <guy@xxxxxxxxxxxx> - Wed, Oct 08, 2003:

> One problem with "tvb_find_guint8()", at least as you're using it, is 
> that it assumes that lines end with CR-LF.  Perhaps they *should*, but 
> that doesn't mean that they necessarily *will*.
> "tvb_find_line_end()" doesn't care whether the line ends with CR, LF, 
> CR-LF, or LF-CR.
> How would "tvb_find_line_end()" have more problems with malformed 
> headers than "tvb_find_guint8()"?

 Yes, I had spotted that point too, and that's why I switched to
 "tvb_find_guint8()": to be sure to match the end of the headers byte
 for byte. I now see how "tvb_find_line_end()" can be useful; I had not
 understood the real meaning of the "next_offset" it returns. I should
 rework my code to use "tvb_find_line_end()" again, sorry.
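
 To make the "next_offset" idea concrete, here is a minimal sketch on a
 plain byte buffer. It is not the Ethereal tvbuff API: find_line_end()
 and find_headers_end() are hypothetical names, and this version accepts
 only CR, LF and CR-LF as terminators (LF-CR is not treated as a single
 terminator here).

```c
#include <stddef.h>

/* Hypothetical helper, not the Ethereal API: scan buf[offset..len) for the
 * end of a line.  Returns the line length without its terminator and stores
 * the offset of the following line in *next_offset, or returns -1 if no
 * complete terminator is in the buffer yet (more data is needed). */
static long
find_line_end(const unsigned char *buf, size_t len, size_t offset,
              size_t *next_offset)
{
    size_t i;

    for (i = offset; i < len; i++) {
        if (buf[i] == '\n') {
            *next_offset = i + 1;
            return (long)(i - offset);
        }
        if (buf[i] == '\r') {
            if (i + 1 == len)
                return -1;  /* cannot tell yet whether an LF follows */
            *next_offset = (buf[i + 1] == '\n') ? i + 2 : i + 1;
            return (long)(i - offset);
        }
    }
    return -1;
}

/* Returns the offset just past the blank line that ends the headers, or -1
 * if the headers are not complete in the buffer. */
static long
find_headers_end(const unsigned char *buf, size_t len)
{
    size_t offset = 0, next;
    long linelen;

    while ((linelen = find_line_end(buf, len, offset, &next)) >= 0) {
        if (linelen == 0)
            return (long)next;  /* empty line: end of headers */
        offset = next;
    }
    return -1;
}
```

 Because the caller only ever looks at the returned line length and the
 next offset, it never needs to know which terminator was actually used.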

> An alternative would be to have a state variable for the conversation, 
> indicating whether we're processing the request/reply line, the 
> headers, or the body, along with another state variable giving the 
> content length, and just do enough reassembly to reassemble a single 
> header line.

 I don't see what you mean. Could you (or possibly someone else :) point
 me to some code in Ethereal that does that?
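
 My tentative reading of the suggestion, as a standalone sketch rather
 than anything taken from Ethereal: keep a small state record per
 conversation and feed it one reassembled line at a time. All names
 here (http_conv_state, http_state_feed_line(), ...) are hypothetical,
 and overflow of the Content-Length value is not handled.

```c
#include <ctype.h>
#include <stddef.h>

enum http_phase { HTTP_START_LINE, HTTP_HEADERS, HTTP_BODY };

/* Hypothetical per-conversation state. */
struct http_conv_state {
    enum http_phase phase;
    long content_length;    /* -1 until a Content-Length header is seen */
};

static void
http_state_init(struct http_conv_state *st)
{
    st->phase = HTTP_START_LINE;
    st->content_length = -1;
}

/* Case-insensitive check that the line starts with the given name. */
static int
has_prefix(const char *line, size_t len, const char *name)
{
    size_t i;

    for (i = 0; name[i] != '\0'; i++) {
        if (i >= len ||
            tolower((unsigned char)line[i]) != tolower((unsigned char)name[i]))
            return 0;
    }
    return 1;
}

/* Feed one complete line, without its terminator, to the state machine. */
static void
http_state_feed_line(struct http_conv_state *st, const char *line, size_t len)
{
    size_t i;

    switch (st->phase) {
    case HTTP_START_LINE:
        st->phase = HTTP_HEADERS;   /* request/reply line consumed */
        break;
    case HTTP_HEADERS:
        if (len == 0) {
            st->phase = HTTP_BODY;  /* blank line: the body starts next */
        } else if (has_prefix(line, len, "content-length:")) {
            i = 15;                 /* length of "content-length:" */
            while (i < len && (line[i] == ' ' || line[i] == '\t'))
                i++;
            if (i < len && isdigit((unsigned char)line[i])) {
                st->content_length = 0;
                for (; i < len && isdigit((unsigned char)line[i]); i++)
                    st->content_length =
                        st->content_length * 10 + (line[i] - '0');
            }
        }
        break;
    case HTTP_BODY:
        break;                      /* body bytes are not line-oriented */
    }
}
```

 With something like this, only a single header line ever has to be
 reassembled at once, and the recorded content length tells how many
 body bytes to expect afterwards.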

> >Possibly fourth, I read in RFC2616 the Content-Length isn't always
> > present, but should be for backward compatibility with HTTP 1.0.
> If you're referring to section 4.4 "Message Length", then, if 
> Content-Length is missing, either
> 	1) the message is one that's not allowed to have a message-body, in 
> which case Ethereal shouldn't even try to reassemble the message body;
> 	2) the message has a Transfer-Encoding field other than 
> Transfer-Encoding: identity, in which case Ethereal would have to 
> handle chunked encoding, which is probably something that would be 
> worth doing eventually, but it's probably not something that needs to 
> be done now;

 I did some additional captures, and it seems "chunked" is quite
 common, whereas gzip/deflate/compress/whatever never shows up (although
 I send "Accept-Encoding: gzip,deflate").

> Ethereal should, as noted, not even try to process a message-body for 
> response messages that "MUST NOT" include a message body (although to 
> tell whether something is a response to a HEAD request we'd have to see 
> the request, so that might be difficult to handle...).  In the case of 
> non-identity transfer encodings, or multipart/byteranges, it should 
> probably not reassemble traffic *or* hand it to a subdissector (as it's 
> not raw data).  Otherwise, it could probably assume that the transfer 
> finishes when the connection is closed, although there's *currently* no 
> way for TCP to send a "connection closed" indication to the 
> subdissector.

 I propose to rework the desegmentation to use "tvb_find_line_end()",
 and maybe add support for the chunked transfer encoding.
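
 For reference, decoding the chunked transfer encoding (RFC 2616,
 section 3.6.1) could look roughly like the sketch below. It works on a
 plain buffer rather than a tvbuff, dechunk() is a hypothetical name,
 and it makes simplifying assumptions: lines end with CR-LF, chunk
 extensions are skipped, the trailer is not examined, and overflow of
 very large chunk sizes is not handled.

```c
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int
hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: copy the chunk data found in in[0..inlen) to out.
 * Returns 1 once the zero-sized last chunk has been seen, 0 if more data
 * is needed, -1 if the input is malformed or out is too small.
 * *outlen receives the number of bytes copied so far. */
static int
dechunk(const char *in, size_t inlen, char *out, size_t outcap,
        size_t *outlen)
{
    size_t pos = 0;

    *outlen = 0;
    for (;;) {
        size_t size = 0;
        int digits = 0, v;

        /* chunk-size: one or more hex digits */
        while (pos < inlen && (v = hex_value(in[pos])) >= 0) {
            size = size * 16 + (size_t)v;
            digits++;
            pos++;
        }
        if (pos == inlen)
            return 0;
        if (digits == 0)
            return -1;
        /* skip any chunk-extension up to the CR-LF ending the size line */
        while (pos + 1 < inlen && !(in[pos] == '\r' && in[pos + 1] == '\n'))
            pos++;
        if (pos + 1 >= inlen)
            return 0;
        pos += 2;
        if (size == 0)
            return 1;           /* last chunk; trailer not examined */
        if (inlen - pos < size + 2)
            return 0;           /* chunk data or its CR-LF still missing */
        if (in[pos + size] != '\r' || in[pos + size + 1] != '\n')
            return -1;
        if (outcap - *outlen < size)
            return -1;
        memcpy(out + *outlen, in + pos, size);
        *outlen += size;
        pos += size + 2;
    }
}
```

 The "more data needed" return value is the case where a dissector would
 ask TCP for further segments before trying again.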


 Some questions that you or the list could possibly answer:
 - has anyone heard about the difference between the "TE" header and the
 "Transfer-Encoding" header?
 - does anyone have captures of gzipped/deflated HTTP conversations?

 There are a lot of possible enhancements for the packet-http.c
 routines. I think Ethereal should provide common decompression
 functions (LZW for "compress", DEFLATE for "gzip" and "deflate") or use
 a third-party lib to deal with all the compressed data that we can come
 across. For example, in HTTP each "part" of a message (when parsing a
 multipart message) could be decompressed and passed to an appropriate
 dissector.

-- 
Loïc Minier <lool@xxxxxxxx>