Guy Harris <guy@xxxxxxxxxxxx> - Wed, Oct 08, 2003:
> One problem with "tvb_find_guint8()", at least as you're using it, is
> that it assumes that lines end with CR-LF. Perhaps they *should*, but
> that doesn't mean that they necessarily *will*.
> "tvb_find_line_end()" doesn't care whether the line ends with CR, LF,
> CR-LF, or LF-CR.
> How would "tvb_find_line_end()" have more problems with malformed
> headers than "tvb_find_guint8()"?
Yes, I've spotted that point too, and that's why I switched to
"tvb_find_guint8()": to be sure to match the end of the headers byte
for byte. Now I see how "tvb_find_line_end()" can be useful; I had not
understood the real meaning of the "next_offset" it returns. I should
rework my code to use "tvb_find_line_end()" again, sorry.
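
To make the "next_offset" idea concrete, here is a minimal sketch on a
plain byte buffer. It is not the real "tvb_find_line_end()": the name
find_line_end() and its signature are made up for illustration, and it
only mimics the behaviour described above (accept CR, LF, CR-LF or
LF-CR, and report where the next line starts).

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a line-end finder on a raw buffer; NOT the
 * real tvbuff API.
 *
 * Returns the length of the line starting at buf[offset], excluding the
 * terminator, and stores in *next_offset the offset of the first byte
 * after the terminator. Returns -1 if no terminator is present yet,
 * i.e. more data is needed.
 */
static long find_line_end(const unsigned char *buf, size_t len,
                          size_t offset, size_t *next_offset)
{
    size_t i;

    for (i = offset; i < len; i++) {
        if (buf[i] == '\r' || buf[i] == '\n') {
            size_t eol = i + 1;

            /* Swallow the second byte of CR-LF or LF-CR, but not a
             * repeated CR or LF, which would be an empty line. */
            if (eol < len && (buf[eol] == '\r' || buf[eol] == '\n') &&
                buf[eol] != buf[i])
                eol++;
            *next_offset = eol;
            return (long)(i - offset);
        }
    }
    return -1;
}
```

Calling it again with offset set to the returned next_offset walks the
headers line by line; a line of length 0 marks the end of the headers.
One caveat of this sketch: a CR in the very last byte of the buffer is
ambiguous, since the matching LF may arrive in the next segment.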
> An alternative would be to have a state variable for the conversation,
> indicating whether we're processing the request/reply line, the
> headers, or the body, along with another state variable giving the
> content length, and just do enough reassembly to reassemble a single
> header line.
I don't see what you mean. Could you (or possibly someone else :) point
me to some code doing that in Ethereal?
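
For reference, one possible reading of the suggestion is sketched
below. This is only a guess at the idea, not existing Ethereal code:
the type http_conv_t and the functions http_conv_init(),
http_conv_feed_line() and http_conv_feed_body() are hypothetical names,
and how such a struct would be attached to a conversation is left out.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum {
    HTTP_STATE_START_LINE,  /* expecting the request/reply line */
    HTTP_STATE_HEADERS,     /* expecting header lines */
    HTTP_STATE_BODY         /* consuming body bytes */
} http_state_t;

typedef struct {
    http_state_t state;
    long content_length;    /* -1 until a Content-Length header is seen */
    long body_remaining;    /* body bytes still expected */
} http_conv_t;

static void http_conv_init(http_conv_t *c)
{
    c->state = HTTP_STATE_START_LINE;
    c->content_length = -1;
    c->body_remaining = 0;
}

/* Case-insensitive check that the line starts with prefix. */
static int has_prefix_ci(const char *line, size_t len, const char *prefix)
{
    size_t i;

    for (i = 0; prefix[i] != '\0'; i++) {
        if (i >= len || tolower((unsigned char)line[i]) !=
                        tolower((unsigned char)prefix[i]))
            return 0;
    }
    return 1;
}

/* Feed one complete line, without its terminator. */
static void http_conv_feed_line(http_conv_t *c, const char *line, size_t len)
{
    switch (c->state) {
    case HTTP_STATE_START_LINE:
        c->content_length = -1;
        c->state = HTTP_STATE_HEADERS;
        break;
    case HTTP_STATE_HEADERS:
        if (len == 0) {
            /* Blank line: end of headers. */
            if (c->content_length > 0) {
                c->body_remaining = c->content_length;
                c->state = HTTP_STATE_BODY;
            } else {
                c->state = HTTP_STATE_START_LINE;
            }
        } else if (has_prefix_ci(line, len, "Content-Length:")) {
            size_t i = sizeof("Content-Length:") - 1;
            long n = 0;

            while (i < len && (line[i] == ' ' || line[i] == '\t'))
                i++;
            if (i < len && isdigit((unsigned char)line[i])) {
                /* No overflow check: this is only a sketch. */
                while (i < len && isdigit((unsigned char)line[i]))
                    n = n * 10 + (line[i++] - '0');
                c->content_length = n;
            }
        }
        break;
    case HTTP_STATE_BODY:
        break;  /* body bytes are not line-oriented */
    }
}

/* Consume up to avail body bytes; returns how many belonged to the body. */
static long http_conv_feed_body(http_conv_t *c, long avail)
{
    long used;

    if (c->state != HTTP_STATE_BODY)
        return 0;
    used = avail < c->body_remaining ? avail : c->body_remaining;
    c->body_remaining -= used;
    if (c->body_remaining == 0)
        c->state = HTTP_STATE_START_LINE;
    return used;
}
```

With something like this, only a single header line ever has to be
reassembled; the body is just counted down against the stored length.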
> >Possibly fourth, I read in RFC2616 the Content-Length isn't always
> > present, but should be for backward compatibility with HTTP 1.0.
> If you're referring to section 4.4 "Message Length", then, if
> Content-Length is missing, either
> 1) the message is one that's not allowed to have a message-body, in
> which case Ethereal shouldn't even try to reassemble the message body;
> 2) the message has a Transfer-Encoding field other than
> Transfer-Encoding: identity, in which case Ethereal would have to
> handle chunked encoding, which is probably something that would be
> worth doing eventually, but it's probably not something that needs to
> be done now;
I did some additional captures, and it seems "chunked" is quite
common, whereas gzip/deflate/compress/whatever never shows up (although
I send "Accept-Encoding: gzip,deflate").
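
Since "chunked" shows up that often, here is a rough sketch of how the
end of a chunked body could be located for desegmentation, following
the chunk layout of RFC 2616 section 3.6.1 (hex size line, data, CRLF,
then a zero-size chunk and optional trailer). The function name
chunked_body_length() and its return convention are assumptions made
for this example, not anything in packet-http.c.

```c
#include <stddef.h>

static int hex_value(unsigned char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

/*
 * Returns the number of bytes occupied by a complete chunked body at
 * the start of buf, 0 if more data is needed, or -1 if the data does
 * not look like chunked encoding. On success *payload_len receives the
 * total size of the chunk data.
 */
static long chunked_body_length(const unsigned char *buf, size_t len,
                                size_t *payload_len)
{
    size_t pos = 0, total = 0;

    for (;;) {
        size_t size = 0, digits = 0;
        int v;

        /* chunk-size, in hex (no overflow check in this sketch) */
        while (pos < len && (v = hex_value(buf[pos])) >= 0) {
            size = size * 16 + (size_t)v;
            digits++;
            pos++;
        }
        if (pos == len) return 0;
        if (digits == 0) return -1;

        /* skip any chunk extension up to the end of the size line */
        while (pos < len && buf[pos] != '\n') pos++;
        if (pos == len) return 0;
        pos++;

        if (size == 0) {
            /* last chunk: skip trailer lines until an empty line */
            for (;;) {
                size_t start = pos;

                while (pos < len && buf[pos] != '\n') pos++;
                if (pos == len) return 0;
                pos++;
                if (pos - start == 1 ||
                    (pos - start == 2 && buf[start] == '\r'))
                    break;
            }
            *payload_len = total;
            return (long)pos;
        }

        /* chunk data, then its line terminator */
        if (len - pos < size) return 0;
        pos += size;
        total += size;
        while (pos < len && buf[pos] != '\n') {
            if (buf[pos] != '\r') return -1;
            pos++;
        }
        if (pos == len) return 0;
        pos++;
    }
}
```

A return value of 0 would be the cue to ask TCP for more data; the
sketch does not copy the chunk data anywhere, it only measures it.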
> Ethereal should, as noted, not even try to process a message-body for
> response messages that "MUST NOT" include a message body (although to
> tell whether something is a response to a HEAD request we'd have to see
> the request, so that might be difficult to handle...). In the case of
> non-identity transfer encodings, or multipart/byteranges, it should
> probably not reassemble traffic *or* hand it to a subdissector (as it's
> not raw data). Otherwise, it could probably assume that the transfer
> finishes when the connection is closed, although there's *currently* no
> way for TCP to send a "connection closed" indication to the
> subdissector.
I suggest I rework the desegmentation to use "tvb_find_line_end()", and
maybe add chunked Transfer Encoding.
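
As a starting point for that rework, the message-length rules quoted
above could be condensed into a small decision function. This is only a
sketch under my reading of RFC 2616 section 4.4; the enum and the name
response_body_framing() are hypothetical, and multipart/byteranges is
deliberately left out.

```c
typedef enum {
    BODY_NONE,              /* message MUST NOT have a body */
    BODY_TRANSFER_ENCODED,  /* non-identity Transfer-Encoding, e.g. chunked */
    BODY_CONTENT_LENGTH,    /* body length given by Content-Length */
    BODY_UNTIL_CLOSE        /* body runs until the connection closes */
} body_framing_t;

/*
 * Decide how the body of a response is delimited.
 * content_length is -1 when no Content-Length header was seen.
 * is_head_response requires having seen the matching request.
 */
static body_framing_t response_body_framing(int status,
                                            int is_head_response,
                                            int non_identity_te,
                                            long content_length)
{
    /* 1xx, 204 and 304 responses, and responses to HEAD, have no body. */
    if (is_head_response || (status >= 100 && status < 200) ||
        status == 204 || status == 304)
        return BODY_NONE;
    /* Transfer-Encoding takes precedence over Content-Length. */
    if (non_identity_te)
        return BODY_TRANSFER_ENCODED;
    if (content_length >= 0)
        return BODY_CONTENT_LENGTH;
    return BODY_UNTIL_CLOSE;
}
```

Only the BODY_CONTENT_LENGTH case is what the current desegmentation
handles; BODY_UNTIL_CLOSE would need the "connection closed" indication
mentioned above.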
Some questions that you or the list could possibly answer:
- has anyone heard about the difference between the "TE" header and the
"Transfer-Encoding" header?
- does anyone have captures of gzipped/deflated HTTP conversations?
There are a lot of possible enhancements for the packet-http.c
routines. I think Ethereal should provide common lzw-compression
functions or use a third-party lib to deal with all the compressed data
that we can come across. For example, in HTTP each "part" of a message
(when parsing a multipart message) could be decompressed and passed to
an appropriate dissector.
--
Loïc Minier <lool@xxxxxxxx>