Wireshark-dev: Re: [Wireshark-dev] [Wireshark-commits] rev 27872: /trunk/ /trunk/epan/dissector

From: Guy Harris <guy@xxxxxxxxxxxx>
Date: Fri, 27 Mar 2009 16:44:49 -0700

On Mar 27, 2009, at 4:22 PM, Stephen Fisher wrote:

Can we work unicode (UTF-8) support into these new string functions?

What does "Unicode (UTF-8) support" mean in this context? As long as every input is UTF-8 encoded (the string passed as the initial value, any string passed to ep_strbuf_append(), the format string passed to ep_strbuf_append_vprintf() or ep_strbuf_append_printf(), and any string corresponding to a %s format item in those two functions), the resulting string will also be UTF-8 encoded.
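To illustrate why UTF-8 input yields UTF-8 output, here is a minimal sketch using plain snprintf() rather than the ep_strbuf functions. The helper name format_pair() is made up for this example and is not part of Wireshark. It assumes the format string and both arguments are valid UTF-8; a %s item then simply copies the argument's bytes through, so the concatenation is valid UTF-8 as well.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a Wireshark function): formats "<name>=<value>"
 * into out and returns the number of bytes that the full result needs,
 * not counting the terminating NUL (the usual snprintf() return value).
 *
 * %s copies the bytes of its argument unchanged, so if name and value
 * are valid UTF-8, the result is valid UTF-8 too.
 */
static int format_pair(char *out, size_t out_size,
                       const char *name, const char *value)
{
    return snprintf(out, out_size, "%s=%s", name, value);
}
```

Note that the return value counts bytes, not characters, which is the distinction that matters once a length limit comes into play.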

The only issue I'd see with UTF-8 support would be if we added an option to set a maximum length for an ep_strbuf, so that somebody could specify that they don't want a string generated from a list of items to grow so big as to fill up the entire Info column or protocol tree item representation (e.g., just stop adding items and put ", ..." at the end if it gets too big). In that case we'd want to make sure that we don't add some but not all of the bytes of a multi-byte UTF-8 character, and perhaps that the length is measured in characters rather than bytes (not all characters are the same width, but I'm not sure we want to find out how deep that rabbit hole goes).
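A rough sketch of what such a length limit would have to do, in plain C and independent of ep_strbuf: all three function names below are invented for illustration, and the code assumes its input is already valid UTF-8. The idea is to back the cut point up past any continuation bytes (those of the form 10xxxxxx) so that a multi-byte character is never split, and to count characters by counting the non-continuation bytes.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* True if byte b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int is_utf8_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/*
 * Hypothetical helper: returns the length in bytes of the longest prefix
 * of s that is at most max_bytes long and does not end in the middle of
 * a multi-byte character.  Assumes s is valid UTF-8.
 */
static size_t utf8_safe_prefix_len(const char *s, size_t max_bytes)
{
    size_t len = strlen(s);
    size_t cut;

    if (len <= max_bytes)
        return len;               /* whole string fits */

    cut = max_bytes;              /* cut < len, so s[cut] is readable */
    while (cut > 0 && is_utf8_continuation((unsigned char)s[cut]))
        cut--;                    /* step back to the start of the character */
    return cut;
}

/* Hypothetical helper: counts characters (code points), not bytes. */
static size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (!is_utf8_continuation((unsigned char)*s))
            n++;
    }
    return n;
}

/*
 * Hypothetical helper: copies src into dst, keeping at most max_bytes
 * bytes of src and appending ", ..." if anything was dropped.
 */
static int copy_with_ellipsis(char *dst, size_t dst_size,
                              const char *src, size_t max_bytes)
{
    size_t keep = utf8_safe_prefix_len(src, max_bytes);
    const char *suffix = (src[keep] != '\0') ? ", ..." : "";

    return snprintf(dst, dst_size, "%.*s%s", (int)keep, src, suffix);
}
```

This only deals with byte boundaries and code point counts. It does nothing about display width or combining characters, which is the part of the rabbit hole left unexplored here.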