Martijn Schipper wrote:
> I have created a dissector for a protocol and one of the fields is UTF-8
> encoded. What should I do to display this field in the tree?
If you mean "what should I do to display all the characters in it
correctly", the answer is "change Ethereal's handling of strings to
allow a character encoding to be specified with the string, and add
UTF-8 as one of the valid encodings".  With such a change, the set of
encodings should ultimately include:
    ASCII, meaning "display anything with the 8th bit set, as well as all
    control characters, as an escaped character";
    UTF-8;
    16-bit Unicode (big-endian and little-endian);
    various PC OEM character sets;
    various classic Mac OS character sets (OS X's native encoding is
    UTF-8, but the earlier versions might've used MacRoman, etc.);
    EBCDIC;
    ISO 8859/x;
    various EUC character sets;
    various other encodings (KOI-8, Shift-JIS,
    GBwhatever-that-Chinese-encoding-is, etc.).
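
To make "a character encoding specified with the string" a bit more
concrete, here is a minimal standalone sketch.  None of this exists in
Ethereal: the "enc_sketch_t" tags and "to_utf8_sketch()" are hypothetical
names, only three of the encodings listed above are covered, and UTF-8
input is copied without being validated.

```c
#include <stddef.h>

/* Hypothetical encoding tags; nothing like this exists in Ethereal today. */
typedef enum {
    ENC_SKETCH_ASCII,
    ENC_SKETCH_UTF_8,
    ENC_SKETCH_ISO_8859_1
} enc_sketch_t;

/*
 * Convert "len" bytes of "src", tagged with encoding "enc", to
 * NUL-terminated UTF-8 in "dst".  Returns the number of bytes written
 * (not counting the NUL), or (size_t)-1 if "dst" is too small.
 */
size_t
to_utf8_sketch(const unsigned char *src, size_t len, enc_sketch_t enc,
               char *dst, size_t dstlen)
{
    size_t i, out = 0;

    for (i = 0; i < len; i++) {
        unsigned char c = src[i];

        if (c < 0x80 || enc == ENC_SKETCH_UTF_8) {
            /* ASCII range, or already UTF-8 (copied without validation). */
            if (out + 1 >= dstlen)
                return (size_t)-1;
            dst[out++] = (char)c;
        } else if (enc == ENC_SKETCH_ISO_8859_1) {
            /* U+0080..U+00FF become a two-byte UTF-8 sequence. */
            if (out + 2 >= dstlen)
                return (size_t)-1;
            dst[out++] = (char)(0xc0 | (c >> 6));
            dst[out++] = (char)(0x80 | (c & 0x3f));
        } else {
            /*
             * 8th bit set in a string tagged as ASCII.  A real
             * implementation would escape it; this sketch just
             * substitutes '?'.
             */
            if (out + 1 >= dstlen)
                return (size_t)-1;
            dst[out++] = '?';
        }
    }
    if (out >= dstlen)
        return (size_t)-1;
    dst[out] = '\0';
    return out;
}
```

The point of the tag is that the same byte (0xE9, say) comes out as a
different character, or as no character at all, depending on which
encoding the sender used.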
Note that iconv isn't necessarily the answer, as we can't guarantee that
the iconv implementation on a given platform will support all the
character sets that Ethereal would need (it's not a question of what
character sets the machine running Ethereal uses, because it has to deal
with the character sets that the machines that transmitted the packets
Ethereal is reading used). Perhaps incorporating a copy of GNU iconv
into Ethereal, and having our own tables for character encodings, would
be the answer.
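
As a rough illustration of the portability problem, the sketch below
probes whether the local iconv can convert a given character set to
UTF-8.  "iconv_supports()" is a hypothetical helper, not anything in
Ethereal, and the character set names an implementation accepts
("KOI8-R", "SHIFT_JIS", and so on) vary from platform to platform, so the
same probe can give different answers on different machines.

```c
#include <iconv.h>

/*
 * Hypothetical helper: return 1 if this platform's iconv can convert
 * from "charset" to UTF-8, 0 if it can't.
 */
int
iconv_supports(const char *charset)
{
    iconv_t cd = iconv_open("UTF-8", charset);

    if (cd == (iconv_t)-1)
        return 0;   /* no such conversion on this platform */
    iconv_close(cd);
    return 1;
}
```

A capture file doesn't care what the local iconv happens to support, so
any encoding for which this returns 0 would need a fallback of our own.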
Note also that to display, print, etc. these characters you have to deal
with:
    GTK+ 1.2[.x], which expects text in whatever the encoding is for the
    font being used;
    GTK+ 1.3[.x] and 2.x, which expect UTF-8 text;
    formatting to a text file, which, on UN*X, should probably generate
    text in whatever the encoding is for the user's locale, and on
    Windows, should probably generate - what?  ASCII?  16-bit Unicode?
    If 16-bit Unicode, how can it tag the file as such, so that Windows
    text editors can handle it?  Begin the file with a byte-order mark?
    printing to a printer.
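
On the byte-order mark question: as far as I know, a leading U+FEFF is
the usual way to tag such a file.  A minimal sketch of what that would
look like follows, assuming little-endian 16-bit Unicode and only code
points that fit in 16 bits (no surrogate pairs).  "utf16le_with_bom()" is
a hypothetical helper, not an existing Ethereal routine.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode "n" 16-bit code points as little-endian
 * 16-bit Unicode, preceded by a byte-order mark (U+FEFF, which comes out
 * as the bytes FF FE).  Returns the number of bytes written, or 0 if
 * "out" is too small.
 */
size_t
utf16le_with_bom(const uint16_t *cp, size_t n, unsigned char *out,
                 size_t outlen)
{
    size_t i, pos = 0;

    if (outlen < 2 + 2 * n)
        return 0;
    out[pos++] = 0xff;          /* byte-order mark, low byte first */
    out[pos++] = 0xfe;
    for (i = 0; i < n; i++) {
        out[pos++] = (unsigned char)(cp[i] & 0xff);
        out[pos++] = (unsigned char)(cp[i] >> 8);
    }
    return pos;
}
```

The resulting buffer could then be written out with fwrite() on a file
opened in binary mode, so that the C library doesn't do newline
translation on the 16-bit data.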
If, however, you are willing to live with only ASCII characters being
displayed correctly, there are two cases.  If you're adding the strings
as fields, Ethereal should properly escape non-ASCII characters for
you.  If you're explicitly formatting with "proto_tree_add_text()" or
"proto_tree_add_XXX_format()", pass the string through "format_text()"
or "tvb_format_text()" and insert the result with a "%s" format item
(which is what people should be doing *anyway*, to keep non-printable
characters from screwing things up).
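
To show the kind of escaping meant here, below is a minimal standalone
sketch.  It is not Ethereal's "format_text()"; "escape_non_ascii()" is a
hypothetical stand-in, and the choice of three-digit octal escapes (and
of doubling backslashes) is an assumption made for the example.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of Ethereal: copy "len" bytes from "src"
 * into "dst", replacing every byte that isn't printable ASCII (control
 * characters, DEL, and anything with the 8th bit set) with a 3-digit
 * octal escape such as "\303".  A backslash is doubled so the output
 * stays unambiguous.  "dst" is always NUL-terminated if "dstlen" is
 * nonzero; output stops early if "dst" is too small.  Returns the number
 * of characters written, not counting the NUL.
 */
size_t
escape_non_ascii(const unsigned char *src, size_t len, char *dst,
                 size_t dstlen)
{
    size_t i, out = 0;

    if (dstlen == 0)
        return 0;
    for (i = 0; i < len; i++) {
        unsigned char c = src[i];

        if (c == '\\') {
            if (out + 2 >= dstlen)
                break;
            dst[out++] = '\\';
            dst[out++] = '\\';
        } else if (c >= 0x20 && c <= 0x7e) {
            if (out + 1 >= dstlen)
                break;
            dst[out++] = (char)c;
        } else {
            if (out + 4 >= dstlen)
                break;
            snprintf(dst + out, dstlen - out, "\\%03o", (unsigned)c);
            out += 4;
        }
    }
    dst[out] = '\0';
    return out;
}
```

With this, the UTF-8 bytes for "café" followed by a newline come out as
the printable text caf\303\251\012, which is safe to hand to a "%s"
format item.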