Wireshark-dev: Re: [Wireshark-dev] Terminating NULL character in RTCP Byereason string

From: Jaap Keuter <jaap.keuter@xxxxxxxxx>
Date: Tue, 05 Aug 2008 21:29:00 +0200
OK, did anyone file a bug report on this one?
Be sure to attach the sample capture from this thread.
Thanks,
Jaap

Guy Harris wrote:
On Aug 5, 2008, at 10:20 AM, Neil Piercy wrote:

The real problem in the spec is here - the leap from "octets of text" to
"string".

A sequence of octets of text *is* a (text) string. A string is not necessarily null-terminated (the C programming language, and its derivatives, notwithstanding).

It sounds as if whoever wrote RFC 3550 needs to learn the
difference between the words "padded" and "terminated" - they
probably meant to say that the string is null-*padded* to a
4-byte boundary.
Maybe they did know the difference, and maybe they didn't, but what they
actually said was:

If the string fills the packet to the next 32-bit boundary, the
string is not null terminated.
i.e. they have defined a case in which a "string" is not null terminated
(i.e. is a sequence of non-null characters only), so Wireshark should
not object to the string not being null terminated in this case.

They also said

The string has the same encoding as that described for SDES.

and what they say for SDES is

   Each chunk consists of an SSRC/CSRC identifier followed by a list of
   zero or more items, which carry information about the SSRC/CSRC.
   Each chunk starts on a 32-bit boundary.  Each item consists of an 8-
   bit type field, an 8-bit octet count describing the length of the
   text (thus, not including this two-octet header), and the text
   itself.  Note that the text can be no longer than 255 octets, but
   this is consistent with the need to limit RTCP bandwidth consumption.

   The text is encoded according to the UTF-8 encoding specified in RFC
   2279 [5].  US-ASCII is a subset of this encoding and requires no
   additional encoding.  The presence of multi-octet encodings is
   indicated by setting the most significant bit of a character to a
   value of one.

   Items are contiguous, i.e., items are not individually padded to a
   32-bit boundary.  Text is not null terminated because some multi-
   octet encodings include null octets.  The list of items in each chunk
   MUST be terminated by one or more null octets, the first of which is
   interpreted as an item type of zero to denote the end of the list.
   No length octet follows the null item type octet, but additional null
   octets MUST be included if needed to pad until the next 32-bit
   boundary.  Note that this padding is separate from that indicated by
   the P bit in the RTCP header.  A chunk with zero items (four null
   octets) is valid but useless.

which describes null-padded strings, not null-terminated strings (in fact, they explicitly say "Text is not null terminated because some multi-octet encodings include null octets" - although they earlier say the encoding is UTF-8, which *doesn't* include null octets in multi-octet encodings).
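
To illustrate the UTF-8 point, here is a minimal sketch in C, not taken from Wireshark or from the RFC. The names utf8_encode and multi_octet_is_null_free are hypothetical, and the encoder covers only the modern range up to U+10FFFF rather than the longer sequences RFC 2279 allowed. It shows that every octet of a multi-octet sequence has its most significant bit set, so none of them can be a null octet:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one code point as UTF-8 into out[].
 * Returns the number of octets written, or 0 for a surrogate or
 * out-of-range value.  (Hypothetical helper, for illustration only.)
 */
size_t utf8_encode(uint32_t cp, uint8_t out[4])
{
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;                 /* single octet, MSB clear */
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                         /* surrogates are not encodable */
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/*
 * Returns 1 if cp encodes to two or more octets and every one of them
 * has its most significant bit set (so none can be 0x00); 0 otherwise.
 */
int multi_octet_is_null_free(uint32_t cp)
{
    uint8_t buf[4];
    size_t n = utf8_encode(cp, buf);

    if (n < 2)
        return 0;                             /* not a multi-octet sequence */
    for (size_t i = 0; i < n; i++) {
        if ((buf[i] & 0x80) == 0)
            return 0;
    }
    return 1;
}
```

The only way a null octet appears in UTF-8 text is as the single-octet encoding of U+0000 itself.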

I.e., RFC 3550 needs a little attention from an editor.

In any case, what Wireshark should do is treat the BYE Reason string as null-padded, not as "null-terminated except when it isn't".
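
As a rough sketch of what "null-padded" handling could look like, here is some standalone C. It is not the actual Wireshark dissector code and uses none of its APIs; the bye_reason struct and parse_bye_reason function are hypothetical names. It assumes buf points at the length octet of the optional reason field, that this octet sits on a 32-bit boundary (the RTCP header and SSRC list are multiples of four octets), and that remaining counts the octets left in the packet. The text length comes only from the length octet, and whatever follows up to the next 32-bit boundary is treated as padding, which may be zero octets long:

```c
#include <stddef.h>
#include <stdint.h>

/* What we learn about the optional reason field of an RTCP BYE packet. */
typedef struct {
    const uint8_t *text;  /* first octet of the text; NOT null-terminated */
    size_t text_len;      /* taken from the length octet */
    size_t pad_len;       /* octets needed to reach the next 32-bit boundary */
    int pad_ok;           /* 1 if every padding octet is zero */
} bye_reason;

/*
 * Parse the reason field as a counted, null-padded string.
 * Returns 0 on success, -1 if the field does not fit in the packet.
 * (Hypothetical helper, for illustration only.)
 */
int parse_bye_reason(const uint8_t *buf, size_t remaining, bye_reason *out)
{
    if (remaining < 1)
        return -1;

    size_t text_len = buf[0];
    size_t used = 1 + text_len;             /* length octet + text */
    size_t pad_len = (4 - (used % 4)) % 4;  /* 0 when already aligned */

    if (used + pad_len > remaining)
        return -1;                          /* truncated text or padding */

    out->text = buf + 1;
    out->text_len = text_len;
    out->pad_len = pad_len;
    out->pad_ok = 1;
    for (size_t i = 0; i < pad_len; i++) {
        if (buf[used + i] != 0)
            out->pad_ok = 0;                /* non-null padding: worth a warning */
    }
    return 0;
}
```

With this approach a three-octet reason such as "bye" fills the word exactly (one length octet plus three text octets), gets pad_len 0, and is accepted without any trailing null, which is the case the RFC calls "not null terminated".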