Wireshark-bugs: [Wireshark-bugs] [Bug 8382] MS-MMS dissector crash
Date: Sun, 03 Mar 2013 03:40:44 +0000
Comment # 31
on bug 8382
from Evan Huus
(In reply to comment #30) > If it needs to look at the value, it should use a tvb_get_string_* routine > that takes an encoding argument, and either: > > 1) the routine should return a C string (null-terminated, meaning we > can't handle embedded nulls) > > or > > 2) the routine should return something such as a counted string, so that > we can handle embedded nulls, and we should provide routines that handle > that type of string I was thinking counted string, as the emem/wmem strbufs are (although they also provide a null-terminator at the end). > and the string's encoding should be UTF-8. > > > - tvb_get_string_* functions should provide the length of the returned > > string, since with multi-byte characters and embedded nulls, the caller has > > no other way of determining this value > > "Length" in octets or "length" in characters? Octets, I think, so that unicode-ignorant code will continue to not crash, at least. > > - calling format_str is necessary when passing a packet-derived string into > > proto_tree_add_string, col_append_fstr and friends. These functions could > > potentially call format_str themselves on every string they get, but the > > performance hit would be big. Faster to continue calling it only where > > needed. > > When presenting strings in human-readable form, we need to figure out what > to do about non-printable characters. The format_str mechanism of showing > them as C string escapes might work, although would we then have to escape > all backslashes in the string? That'd look Really Weird in the SMB > dissector, for example, with pathnames filled up with \\. Perhaps string > fields should have a display option indicating whether to use C string > escapes (complete with escaping of backslashes) or to just show > non-printable characters either as REPLACEMENT CHARACTER or, at least for C0 > control characters, as the SYMBOL FOR xxx characters (0x2400+{value} for > 0x00 through 0x1F and 0x2421 for 0x7F). I suspect for now the current method is satisfactory, being that whatever the UI toolkit does is what we get unless the dissector manually calls format_str. I agree we'll want a flag for string fields at some point, though I'm not sure if it should be on the hf field or in the encoding arg of the tree_add call. > It might also be useful to have some way in the UI to look at the raw bytes > corresponding to individual glyphs, in case, for example, you're trying to > debug a problem with a client that's sending bad UTF-8 sequences over the > wire, or a problem with decomposed vs. composed accented characters. That would be handy, but unless Qt does something nice for us I suspect it would turn into a lot of work. > My personal inclination is to store string values as encoding+raw bytes, > doing the conversion to UTF-8 lazily, and: My worry would be storage space, since running a filter that forced conversion of all cases would double the memory used by these strings. Not a particularly strong objection though. > Having > > protocol.string {comparison} "hello" > > being done either by converting protocol.string to UTF-8 and comparing the > UTF-8 strings or converting "hello" to the encoding, caching the result, and > failing if the conversion fails (>, >=, <, and <= might be done by > converting to UTF-8; they're probably less common, and they raise questions > about what order to impose on strings in any case). The latter would > probably involve fewer conversions. > > Having > > protocol.string {==,!=} xx:xx:xx:xx:xx:xx:xx:xx... > > be done by comparing the sequence of octets with the raw bytes. > > Having the "apply as filter" and "prepare as filter" stuff, when you try to > construct a filter from a string field, use the sequence-of-octets form if > the value can't be converted to Unicode and the string form otherwise > (although that would get tricky if the Unicode string contains non-printable > characters).
You are receiving this mail because:
- You are watching all bug changes.
- Prev by Date: [Wireshark-bugs] [Bug 7885] Buildbot crash output: fuzz-2012-10-21-7332.pcap
- Next by Date: [Wireshark-bugs] [Bug 8382] MS-MMS dissector crash
- Previous by thread: [Wireshark-bugs] [Bug 8382] MS-MMS dissector crash
- Next by thread: [Wireshark-bugs] [Bug 8382] MS-MMS dissector crash
- Index(es):