Ethereal-dev: [Ethereal-dev] About tvb_find_line_end

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Heikki Vatiainen <hessu@xxxxxxxxx>
Date: Sat, 11 Nov 2000 16:47:13 +0200

Guy Harris <gharris@xxxxxxxxxxxx> wrote:

[ about tvb_find_line_end() ]
> and searches for an LF.  If it finds the LF, it checks if it's preceded
> by a CR and, if so, treats the CR/LF as the line ending, otherwise it
> treats the LF as the line ending (unless the LF is *followed* by a CR,
> in which case it treats the LF/CR as a line ending, as I think we had
> one capture at NetApp where some HTTP client or server was using LF/CR;
> that won't treat CR/LF/CR as a line ending, only <non-CR>/LF/CR).
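
To make sure I read the description right, here is a minimal sketch of the LF-based scan as quoted above. It is not the real tvb_find_line_end(): the name find_line_end_lf(), the plain char buffer in place of a tvbuff, and the "next" out-parameter are all assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, NOT the real tvb_find_line_end(): it works on a
 * plain buffer instead of a tvbuff.  Returns the length of the first line
 * in buf (excluding its terminator) and stores the offset of the next
 * line in *next.  If no LF is found, the whole buffer is the line.
 */
static size_t
find_line_end_lf(const char *buf, size_t len, size_t *next)
{
    const char *lf = memchr(buf, '\n', len);
    size_t eol;

    if (lf == NULL) {
        /* No LF at all: a bare CR is not recognized by this scan. */
        *next = len;
        return len;
    }
    eol = (size_t)(lf - buf);
    if (eol > 0 && buf[eol - 1] == '\r') {
        /* CR/LF: the line ends before the CR. */
        *next = eol + 1;
        return eol - 1;
    }
    if (eol + 1 < len && buf[eol + 1] == '\r') {
        /* <non-CR>/LF/CR: treat LF/CR as the terminator. */
        *next = eol + 2;
        return eol;
    }
    /* Bare LF. */
    *next = eol + 1;
    return eol;
}
```

With this reading, "a\rb\n" is one line of three octets, because the CR in the middle is just data as far as the scan is concerned.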

SIP (RFC 2543) seems to allow a plain CR to indicate a line break.
HTTP/1.1 (RFC 2616) also mentions a plain CR, but restricts its use as
a line ending to text media in the entity-body. Would it be useful, or
would it break things, if tvb_find_line_end() were augmented to also
treat a plain CR as a line ending?
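
As a rough sketch of what I mean, assuming a plain char buffer rather than a tvbuff, the scan could stop at the first CR or LF instead of searching for an LF only. The name find_line_end_any() and the "next" out-parameter are made up for this example; this is not a patch against the real function.

```c
#include <stddef.h>

/*
 * Hypothetical sketch, NOT a patch for tvb_find_line_end(): accepts
 * CR/LF, bare CR and bare LF as line endings.  Returns the length of the
 * first line in buf (excluding its terminator) and stores the offset of
 * the next line in *next.  The LF/CR special case is deliberately left
 * out, because it becomes ambiguous once a bare CR ends a line.
 */
static size_t
find_line_end_any(const char *buf, size_t len, size_t *next)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            /* CR/LF if an LF follows, otherwise a bare CR. */
            *next = (i + 1 < len && buf[i + 1] == '\n') ? i + 2 : i + 1;
            return i;
        }
        if (buf[i] == '\n') {
            /* Bare LF. */
            *next = i + 1;
            return i;
        }
    }
    /* No terminator found: the rest of the buffer is the line. */
    *next = len;
    return len;
}
```

Two things look like they could break with this approach. An LF/CR pair would be read as a bare LF followed by an empty line ended by a bare CR. A CR that is the last octet of a buffer could also be the first half of a CR/LF whose LF has not arrived yet, and the sketch would treat it as a bare CR.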

Here are two excerpts; the first is from RFC 2543 and the second is
from RFC 2616:

 3 SIP Message Overview

    SIP is a text-based protocol and uses the ISO 10646 character set in
    UTF-8 encoding (RFC 2279 [21]). Senders MUST terminate lines with a
    CRLF, but receivers MUST also interpret CR and LF by themselves as
    line terminators.


 3.7.1 Canonicalization and Text Defaults
    ...

    When in canonical form, media subtypes of the "text" type use CRLF as
    the text line break. HTTP relaxes this requirement and allows the
    transport of text media with plain CR or LF alone representing a line
    break when it is done consistently for an entire entity-body. HTTP
    applications MUST accept CRLF, bare CR, and bare LF as being
    representative of a line break in text media received via HTTP. In
    addition, if the text is represented in a character set that does not
    use octets 13 and 10 for CR and LF respectively, as is the case for
    some multi-byte character sets, HTTP allows the use of whatever octet
    sequences are defined by that character set to represent the
    equivalent of CR and LF for line breaks. This flexibility regarding
    line breaks applies only to text media in the entity-body; a bare CR
    or LF MUST NOT be substituted for CRLF within any of the HTTP control
    structures (such as header fields and multipart boundaries).


Heikki
-- 
Heikki Vatiainen                  * hessu@xxxxxxxxx
Tampere University of Technology  * Tampere, Finland