Ethereal-dev: [Ethereal-dev] [PATCH][HTTP]Desegmentation/Reassembly of HTTP headers/bodies
Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.
From: Loïc Minier <lool+ethereal@xxxxxxxxxx>
Date: Thu, 16 Oct 2003 12:17:11 +0200
[ Attached patch includes tested HTTP-headers/body reassembly based on
  content-length. ]

Loïc Minier <lool+ethereal@xxxxxxxxxx> - Thu, Oct 09, 2003:

> > One problem with "tvb_find_guint8()", at least as you're using it, is
> > that it assumes that lines end with CR-LF. Perhaps they *should*, but
> > that doesn't mean that they necessarily *will*.
> > "tvb_find_line_end()" doesn't care whether the line ends with CR, LF,
> > CR-LF, or LF-CR.
> > How would "tvb_find_line_end()" have more problems with malformed
> > headers than "tvb_find_guint8()"?
> Yes, I've spotted that point too, and that's why I switched to
> "tvb_find_guint8()": to be sure to match byte exactly the end of the
> headers. Now I see how it can be useful, because I did not see the real
> meaning of the "next_offset" it returns, I should rework my code to use
> "tvb_find_line_end()" again, sorry.

My assumption of "CRLF" line endings is no longer needed: I switched
back to tvb_find_line_end() in the attached patch, as discussed above.

> I did some additional captures, and it seems "chunked" is quite
> common, where gzip/deflate/compress/whatever never happens (although I
> Accept-Encoding: gzip,deflate).

I checked the Content-Length detection/reassembly with captures using
"gzip" and "chunked" encoding, and saw no apparent problem.

The only possible problem I can spot is when the end of the HTTP response
is not in the capture. If I understand correctly, pinfo->can_desegment
will be set to false if there are no more bytes to desegment. Is this
correct?

Kind regards,
-- 
Loïc Minier <loic.minier@xxxxxxxxxxx>
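To illustrate the line-ending point discussed in this message, here is a minimal sketch of a terminator-tolerant line finder working on a plain byte buffer. The function name find_line_end and its signature are hypothetical; this is not Ethereal's tvb_find_line_end(), only an approximation of the behaviour described in the thread. It accepts CR, LF and CR-LF (it does not treat LF-CR as a single terminator), and it returns -1 when no complete terminator is available yet, which is the case where a dissector would ask for more data.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a line-end finder (NOT the Ethereal API).
 * Scans buf[offset..len) for a line terminator.
 *
 * On success, returns the line length without its terminator and
 * stores the offset of the next line in *next_offset.
 * Returns -1 when no complete terminator is in the buffer yet.
 */
long find_line_end(const char *buf, size_t len, size_t offset,
                   size_t *next_offset)
{
    size_t i;

    for (i = offset; i < len; i++) {
        if (buf[i] == '\n') {
            /* bare LF ends the line */
            *next_offset = i + 1;
            return (long)(i - offset);
        }
        if (buf[i] == '\r') {
            if (i + 1 == len) {
                /* CR is the last byte: an LF may still follow */
                return -1;
            }
            /* CR-LF consumes two bytes, a lone CR consumes one */
            *next_offset = (buf[i + 1] == '\n') ? i + 2 : i + 1;
            return (long)(i - offset);
        }
    }
    return -1; /* no terminator seen */
}
```

A return value of 0 marks an empty line, i.e. the end of the headers, whatever terminator the sender used.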
Index: packet-http.c
===================================================================
RCS file: /cvsroot/ethereal/packet-http.c,v
retrieving revision 1.67
diff -u -b -r1.67 packet-http.c
--- packet-http.c 2 Sep 2003 23:09:10 -0000 1.67
+++ packet-http.c 16 Oct 2003 10:09:04 -0000
@@ -40,6 +40,7 @@

 #include "util.h"
 #include "packet-http.h"
+#include "prefs.h"

 typedef enum _http_type {
     HTTP_REQUEST,
@@ -67,6 +68,19 @@
 static dissector_handle_t data_handle;
 static dissector_handle_t http_handle;

+/*
+ * desegmentation of HTTP headers
+ * (when we are over TCP or another protocol providing the desegmentation API)
+ */
+static gboolean http_desegment_headers = FALSE;
+
+/*
+ * desegmentation of HTTP bodies
+ * (when we are over TCP or another protocol providing the desegmentation API)
+ * TODO let the user filter on content-type the bodies he wants desegmented
+ */
+static gboolean http_desegment_body = FALSE;
+
 #define TCP_PORT_HTTP 80
 #define TCP_PORT_PROXY_HTTP 3128
 #define TCP_PORT_PROXY_ADMIN_HTTP 3132
@@ -207,6 +221,7 @@
     gint offset = 0;
     const guchar *line;
     gint next_offset;
+    gint next_offset_sav;
     const guchar *linep, *lineend;
     int linelen;
     guchar c;
@@ -217,8 +232,109 @@
     RequestDissector req_dissector;
     int req_strlen;
     proto_tree *req_tree;
+    long int content_length;
+    gboolean content_length_found = FALSE;
+
+    /*
+     * RFC 2616 defines HTTP messages as being either of the Request or
+     * the Response type (HTTP-message = Request | Response).
+     * Request and Response are defined as:
+     *     Request = Request-Line
+     *         *(( general-header
+     *         | request-header
+     *         | entity-header ) CRLF)
+     *         CRLF
+     *         [ message-body ]
+     *     Response = Status-Line
+     *         *(( general-header
+     *         | response-header
+     *         | entity-header ) CRLF)
+     *         CRLF
+     *         [ message-body ]
+     * that's why we can always assume two consecutive CRLF to mark
+     * the end of the headers, worst thing happening otherwise is
+     * the packet not being desegmented or being interpreted as only
+     * headers
+     */
+    /*
+     * if headers desegmentation is activated, check that all headers are
+     * in this tvbuff (search for an empty line marking end of headers) or
+     * request one more byte
+     */
+    if (http_desegment_headers && pinfo->can_desegment) {
+        next_offset = offset;
+        for (;;) {
+            next_offset_sav = next_offset;
+            /*
+             * request one more byte if there's no byte left
+             */
+            if (tvb_offset_exists(tvb, next_offset) == FALSE) {
+                pinfo->desegment_offset = offset;
+                pinfo->desegment_len = 1;
+                return;
+            }
+            /*
+             * request one more byte if we can not find a
+             * header (ie. a line end)
+             */
+            linelen = tvb_find_line_end(tvb,
+                next_offset,
+                -1,
+                &next_offset,
+                TRUE);
+            /* not enough data, ask for one more byte */
+            if (linelen == -1) {
+                pinfo->desegment_offset = offset;
+                pinfo->desegment_len = 1;
+                return;
+            } else if (linelen == 0) {
+                break; /* we found the end of the headers */
+            }
+            /*
+             * search content-length, if it fails it either means
+             * that we are in a different header line, or that we
+             * are at the end of the headers, or that there isn't
+             * enough data, the two latter cases have already been
+             * handled above
+             */
+            if (http_desegment_body) {
+                /* check if we've found Content-Length */
+                if (tvb_strneql(tvb,
+                    next_offset_sav,
+                    "Content-Length:",
+                    15) == 0) {
+                    if (sscanf(
+                        tvb_get_string(tvb,
+                            next_offset_sav + 15,
+                            linelen - 15),
+                        "%li",
+                        &content_length) == 1) {
+                        content_length_found = TRUE;
+                    }
+                }
+            }
+        }
+    }
+    /*
+     * the above loop ends when we reached the end of the headers, so
+     * there should be content_length bytes after the 4 terminating bytes
+     * and next_offset points to after the end of the headers
+     */
+    if (http_desegment_body && content_length_found) {
+        /* next_offset has been set because content-length was found */
+        if (FALSE == tvb_bytes_exist(
+            tvb, next_offset, content_length)) {
+            gint length = tvb_length_remaining(tvb, next_offset);
+            if (length == -1) {
+                length = 0;
+            }
+            pinfo->desegment_offset = offset;
+            pinfo->desegment_len = content_length - length;
+            return;
+        }
+    }

-    stat_info =g_malloc( sizeof(http_info_value_t));
+    stat_info = g_malloc( sizeof(http_info_value_t));
     stat_info->response_code = 0;
     stat_info->request_method = NULL;

@@ -658,11 +774,25 @@
         &ett_http_ntlmssp,
         &ett_http_request,
     };
+    module_t *http_module;

     proto_http = proto_register_protocol("Hypertext Transfer Protocol",
         "HTTP", "http");
     proto_register_field_array(proto_http, hf, array_length(hf));
     proto_register_subtree_array(ett, array_length(ett));
+    http_module = prefs_register_protocol(proto_http, NULL);
+    prefs_register_bool_preference(http_module, "desegment_http_headers",
+        "Desegment all HTTP headers spanning multiple TCP segments",
+        "Whether the HTTP dissector should desegment all headers "
+        "of a request spanning multiple TCP segments",
+        &http_desegment_headers);
+    prefs_register_bool_preference(http_module, "desegment_http_body",
+        "Trust the « Content-length: » header and desegment HTTP "
+        "bodies spanning multiple TCP segments",
+        "Whether the HTTP dissector should use the "
+        "« Content-length: » value to desegment the body "
+        "of a request spanning multiple TCP segments",
+        &http_desegment_body);

     register_dissector("http", dissect_http, proto_http);
     http_handle = find_dissector("http");
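The reassembly logic of the patch can be sketched outside Ethereal as a function over a plain byte buffer. This is a sketch under stated assumptions, not the dissector code: http_bytes_missing, has_prefix_nocase and NEED_MORE_HEADERS are hypothetical names, and no tvbuff or pinfo API is used. It differs from the patch in three ways that are choices of this sketch: the header name is matched case-insensitively (RFC 2616 header names are case-insensitive), the value is copied into a local buffer before parsing, and a missing Content-Length is treated as "no body".

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NEED_MORE_HEADERS (-1L)  /* hypothetical: end of headers not seen */

/* Case-insensitive test that s[0..n) begins with prefix. */
int has_prefix_nocase(const char *s, size_t n, const char *prefix)
{
    size_t i;

    for (i = 0; prefix[i] != '\0'; i++) {
        if (i >= n ||
            tolower((unsigned char)s[i]) != tolower((unsigned char)prefix[i]))
            return 0;
    }
    return 1;
}

/*
 * Returns 0 when buf holds the full headers plus Content-Length body
 * bytes, a positive count of body bytes still missing, or
 * NEED_MORE_HEADERS when the empty line ending the headers is absent.
 */
long http_bytes_missing(const char *buf, size_t len)
{
    size_t pos = 0;
    long content_length = 0; /* assumption: no header means no body */

    for (;;) {
        size_t start = pos, eol;

        /* find the LF ending this line */
        for (eol = start; eol < len && buf[eol] != '\n'; eol++)
            ;
        if (eol == len)
            return NEED_MORE_HEADERS;
        pos = eol + 1;
        if (eol > start && buf[eol - 1] == '\r')
            eol--; /* drop the CR of a CR-LF pair */
        if (eol == start)
            break; /* empty line: end of headers */

        if (has_prefix_nocase(buf + start, eol - start, "Content-Length:")) {
            char num[24];
            size_t vlen = eol - start - 15; /* 15 = strlen("Content-Length:") */

            if (vlen < sizeof num) {
                char *end;
                long v;

                /* NUL-terminate the value so strtol cannot overrun */
                memcpy(num, buf + start + 15, vlen);
                num[vlen] = '\0';
                v = strtol(num, &end, 10);
                if (end != num && v >= 0)
                    content_length = v;
            }
        }
    }

    /* pos now points at the first body byte */
    if ((size_t)content_length > len - pos)
        return content_length - (long)(len - pos);
    return 0;
}
```

A caller in the position of the dissector would map NEED_MORE_HEADERS to a request for one more byte and a positive result to a request for exactly that many bytes, mirroring the two return paths in the patch.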