[Wireshark-bugs] [Bug 6591] New: linear white space (LWS) not ignored after HTTP header field content

Date: Thu, 17 Nov 2011 05:02:27 -0800 (PST)
https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=6591

           Summary: linear white space (LWS) not ignored after HTTP header
                    field content
           Product: Wireshark
           Version: unspecified
          Platform: x86
        OS/Version: All
            Status: NEW
          Severity: Normal
          Priority: Low
         Component: TShark
        AssignedTo: bugzilla-admin@xxxxxxxxxxxxx
        ReportedBy: cbley@xxxxxxxxxx


Claudio <cbley@xxxxxxxxxx> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #7426|                            |review_for_checkin?
               Flag|                            |

Created an attachment (id=7426)
 --> (https://bugs.wireshark.org/bugzilla/attachment.cgi?id=7426)
strip trailing WS from http_host in dissect_http_message

Build Information:
I tried 1.6.1 (win32), 1.6.3 (win32), and 1.7.1 SVN r39897 (linux-amd64). For example:

TShark 1.7.1 (SVN Rev 39897 from /trunk)

Compiled (64-bit) with GLib 2.24.1, with libpcap 1.0.0, with libz 1.2.3.3,
without POSIX capabilities, without SMI, without c-ares, without ADNS, without
Lua, with Python 2.6.5, without GnuTLS, without Gcrypt, without Kerberos,
without GeoIP.

Running on Linux 2.6.32-33-server, with locale en_US.UTF-8, with libpcap
version 1.0.0, with libz 1.2.3.3.

Built using gcc 4.4.3.
--
I'm using a command like:

tshark -r bla.pcap -o tcp.desegment_tcp_streams:TRUE -T fields -e http.request.full_uri

If the Host header looks like, e.g., "Host: www.foo.com \r\n" (note the
trailing white space), TShark generates output like:

"http://www.foo.com /a/doc.html"

I expected a valid URL instead, i.e. "http://www.foo.com/a/doc.html".
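A minimal C sketch of how such output can arise, assuming the full URI is assembled by plain concatenation of "http://", the Host value and the request URI. The function `build_full_uri` is hypothetical and is not Wireshark's code; it only shows that a Host value kept verbatim, trailing space included, ends up in the middle of the URI.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not Wireshark code): assembles
 * "http://" + Host value + request URI by plain concatenation.
 * Nothing is trimmed, so trailing white space in `host` ends up
 * inside the resulting URI. Returns the URI length, or -1 if `out`
 * is too small to hold the whole result. */
static int build_full_uri(char *out, size_t outlen,
                          const char *host, const char *uri)
{
    assert(out != NULL && host != NULL && uri != NULL);
    size_t need = strlen("http://") + strlen(host) + strlen(uri) + 1;
    if (need > outlen)
        return -1;
    return snprintf(out, outlen, "http://%s%s", host, uri);
}
```

Called with host "www.foo.com " and URI "/a/doc.html", it writes "http://www.foo.com /a/doc.html", the same string TShark printed above.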

According to RFC 2616 (HTTP/1.1), section 4.2, "Message Headers":

   The field-content does not include any leading or trailing LWS:
   linear white space occurring before the first non-whitespace
   character of the field-value or after the last non-whitespace
   character of the field-value. Such leading or trailing LWS MAY be
   removed without changing the semantics of the field value.
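The rule quoted above can be sketched in C as a trim that keeps only the field-content between the first and last non-white-space characters. The helper names `is_lws` and `trim_lws` are hypothetical; treating CR and LF as white space (to also clean up a CRLF-terminated or folded value) is an assumption of this sketch, not a description of what Wireshark does.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: treat SP, HT, CR and LF as linear white space.
 * RFC 2616 defines LWS as [CRLF] 1*( SP | HT ); CR and LF are
 * included here only so that a CRLF-terminated or folded value is
 * also cleaned up (an assumption of this sketch). */
static int is_lws(char ch)
{
    return ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n';
}

/* Hypothetical helper: strips leading and trailing LWS from a
 * NUL-terminated field value in place and returns it. Interior
 * white space is left untouched. */
static char *trim_lws(char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    size_t start = 0;

    while (start < len && is_lws(s[start]))
        start++;
    while (len > start && is_lws(s[len - 1]))
        len--;

    /* Shift the remaining characters to the front and terminate. */
    memmove(s, s + start, len - start);
    s[len - start] = '\0';
    return s;
}
```

For example, `trim_lws` turns " \twww.foo.com \r\n" into "www.foo.com" while leaving interior white space, as in "a b", unchanged, since the RFC only speaks of leading and trailing LWS.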

RFC 1945 (HTTP/1.0), section 4.2, "Message Headers", does not make such an
explicit statement, but white space is not a valid character in host names, AFAIK.

At first, I was in favor of removing trailing LWS from all HTTP headers, because
leading LWS is already stripped from them (I still have that patch if you want it).

But I reconsidered, so here is a patch that removes (leading and) trailing
white space from the http_host header value only.
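The host-only approach can be sketched without copying or modifying the header bytes: narrow a (pointer, length) span past leading and trailing SP/HT, then format the URI with `%.*s`. This is a hypothetical illustration, not the attached patch (attachment 7426); `trim_span` and `full_uri_from_host` are made-up names, and the sketch does not use Wireshark's tvbuff API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: narrows a (pointer, length) span, as it might
 * point into a packet buffer, so that it excludes leading and
 * trailing SP/HT. The underlying bytes are neither modified nor
 * copied, and the span need not be NUL-terminated. */
static void trim_span(const char **p, size_t *len)
{
    assert(p != NULL && *p != NULL && len != NULL);
    while (*len > 0 && (**p == ' ' || **p == '\t')) {
        (*p)++;
        (*len)--;
    }
    while (*len > 0 && ((*p)[*len - 1] == ' ' || (*p)[*len - 1] == '\t'))
        (*len)--;
}

/* Hypothetical: only the Host value is trimmed before the full URI
 * is built; other header values are not touched. Returns the URI
 * length, or -1 if `out` is too small. */
static int full_uri_from_host(char *out, size_t outlen,
                              const char *host, size_t host_len,
                              const char *uri)
{
    assert(out != NULL && host != NULL && uri != NULL);
    trim_span(&host, &host_len);
    size_t need = strlen("http://") + host_len + strlen(uri) + 1;
    if (need > outlen)
        return -1;
    /* "%.*s" prints exactly host_len bytes of the trimmed span. */
    return snprintf(out, outlen, "http://%.*s%s", (int)host_len, host, uri);
}
```

With host " www.foo.com \t" and URI "/a/doc.html", it produces "http://www.foo.com/a/doc.html". Limiting the trim to the Host value keeps every other header exactly as captured, which matches the narrower scope chosen for the patch.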
