Wireshark · Wireshark-bugs: [Wireshark-bugs] [Bug 8382] MS-MMS dissector crash

Wireshark-bugs: [Wireshark-bugs] [Bug 8382] MS-MMS dissector crash

Date: Sun, 03 Mar 2013 03:40:44 +0000

Comment # 31 on bug 8382 from Evan Huus

(In reply to comment #30)
> If it needs to look at the value, it should use a tvb_get_string_* routine
> that takes an encoding argument, and either:
> 
>     1) the routine should return a C string (null-terminated, meaning we
> can't handle embedded nulls)
> 
> or
> 
>     2) the routine should return something such as a counted string, so that
> we can handle embedded nulls, and we should provide routines that handle
> that type of string

I was thinking counted string, as the emem/wmem strbufs are (although they also
provide a null-terminator at the end).

> and the string's encoding should be UTF-8.
> 
> > - tvb_get_string_* functions should provide the length of the returned
> > string, since with multi-byte characters and embedded nulls, the caller has
> > no other way of determining this value
> 
> "Length" in octets or "length" in characters?

Octets, I think, so that unicode-ignorant code will continue to not crash, at
least.

> > - calling format_str is necessary when passing a packet-derived string into
> > proto_tree_add_string, col_append_fstr and friends. These functions could
> > potentially call format_str themselves on every string they get, but the
> > performance hit would be big. Faster to continue calling it only where
> > needed.
> 
> When presenting strings in human-readable form, we need to figure out what
> to do about non-printable characters.  The format_str mechanism of showing
> them as C string escapes might work, although would we then have to escape
> all backslashes in the string?  That'd look Really Weird in the SMB
> dissector, for example, with pathnames filled up with \\.  Perhaps string
> fields should have a display option indicating whether to use C string
> escapes (complete with escaping of backslashes) or to just show
> non-printable characters either as REPLACEMENT CHARACTER or, at least for C0
> control characters, as the SYMBOL FOR xxx characters (0x2400+{value} for
> 0x00 through 0x1F and 0x2421 for 0x7F).

I suspect for now the current method is satisfactory, being that whatever the
UI toolkit does is what we get unless the dissector manually calls format_str.
I agree we'll want a flag for string fields at some point, though I'm not sure
if it should be on the hf field or in the encoding arg of the tree_add call.

> It might also be useful to have some way in the UI to look at the raw bytes
> corresponding to individual glyphs, in case, for example, you're trying to
> debug a problem with a client that's sending bad UTF-8 sequences over the
> wire, or a problem with decomposed vs. composed accented characters.

That would be handy, but unless Qt does something nice for us I suspect it
would turn into a lot of work.

> My personal inclination is to store string values as encoding+raw bytes,
> doing the conversion to UTF-8 lazily, and:

My worry would be storage space, since running a filter that forced conversion
of all cases would double the memory used by these strings. Not a particularly
strong objection though.

> Having
> 
>     protocol.string {comparison} "hello"
> 
> being done either by converting protocol.string to UTF-8 and comparing the
> UTF-8 strings or converting "hello" to the encoding, caching the result, and
> failing if the conversion fails (>, >=, <, and <= might be done by
> converting to UTF-8; they're probably less common, and they raise questions
> about what order to impose on strings in any case).  The latter would
> probably involve fewer conversions.
> 
> Having
> 
>     protocol.string {==,!=} xx:xx:xx:xx:xx:xx:xx:xx...
> 
> be done by comparing the sequence of octets with the raw bytes.
> 
> Having the "apply as filter" and "prepare as filter" stuff, when you try to
> construct a filter from a string field, use the sequence-of-octets form if
> the value can't be converted to Unicode and the string form otherwise
> (although that would get tricky if the Unicode string contains non-printable
> characters).

You are receiving this mail because:

You are watching all bug changes.

Prev by Date: [Wireshark-bugs] [Bug 7885] Buildbot crash output: fuzz-2012-10-21-7332.pcap
Next by Date: [Wireshark-bugs] [Bug 8382] MS-MMS dissector crash
Previous by thread: [Wireshark-bugs] [Bug 8382] MS-MMS dissector crash
Next by thread: [Wireshark-bugs] [Bug 8382] MS-MMS dissector crash
Index(es):
- Date
- Thread