https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=5738
Guy Harris <guy@xxxxxxxxxxxx> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Platform|Other                       |All
         OS/Version|Linux (other)               |All
--- Comment #3 from Guy Harris <guy@xxxxxxxxxxxx> 2011-09-22 16:33:06 PDT ---
The message body is
ХРИСТОМ БО...
but the version of Wireshark I'm using screws up the И - but gets the ХР and СТ
right. The display code is escaping some, but not all, octets with the 8th bit
set - it's escaping 0x98 but not 0xA5, for example.
I suspect this string is being run through format_text(), which is treating the
string as a sequence of single-octet code points, and isprint() is deciding
that 0xA5 is printable but 0x98 isn't.
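
To make the suspected behaviour concrete, here is a minimal C sketch. isprint() is locale-dependent, so the sketch substitutes a hypothetical helper, latin1_isprint(), that models an ISO 8859-1-style locale in which 0x80-0x9F are C1 controls and 0xA0-0xFF are printable. The name count_escaped_octets() is likewise made up for illustration; none of this is Wireshark code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for isprint() under an ISO 8859-1-like locale:
 * printable ASCII and 0xA0-0xFF count as printable, while C0 controls,
 * DEL and the C1 range 0x80-0x9F do not. */
static int latin1_isprint(unsigned char c)
{
    return (c >= 0x20 && c <= 0x7E) || c >= 0xA0;
}

/* Count the octets that a per-octet printability check would escape
 * when the buffer is treated as a run of single-octet code points. */
size_t count_escaped_octets(const unsigned char *buf, size_t len)
{
    size_t i, escaped = 0;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (!latin1_isprint(buf[i]))
            escaped++;
    }
    return escaped;
}
```

In UTF-8, Х is D0 A5 and И is D0 98, so a per-octet check of this kind passes both octets of Х but escapes the second octet of И, which is consistent with the display described above.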
It is, I think, time to have format_text() treat its argument as UTF-8, not as
some unspecified other ASCII extension with all characters being a single
octet, and to do mapping such as:
    C0 control characters -> the Unicode characters intended to display
        them, or to a \XXX escape;
    valid UTF-8 characters that are printable -> themselves;
    valid UTF-8 characters that aren't printable -> something;
    octet sequences not valid in UTF-8 -> Unicode REPLACEMENT CHARACTER
        (U+FFFD).
I.e., format_text() needs to generate valid and displayable UTF-8, not valid
and displayable ISO 8859-n or any other single-byte extended-ASCII character
set.
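
A rough C sketch of that mapping follows. It is only an illustration under several assumptions, not the actual format_text(): the names decode_utf8() and format_text_utf8_sketch() are hypothetical, C0 controls (and DEL) get an octal \XXX escape, "not printable" is approximated as just the C1 range U+0080-U+009F (a real implementation would consult Unicode character properties), the "something" for those is an arbitrary <U+XXXX> form, and each invalid octet becomes one U+FFFD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decode one UTF-8 sequence. Returns its length in octets, or 0 if the
 * input is truncated, overlong, a surrogate, or otherwise invalid. */
static int decode_utf8(const unsigned char *s, size_t len, uint32_t *cp)
{
    uint32_t c, min;
    int n, i;

    if (len == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; min = 0x10000; }
    else return 0;                      /* stray continuation or bad lead */

    if ((size_t)n > len) return 0;      /* truncated sequence */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0;
    *cp = c;
    return n;
}

/* Write a displayable, valid UTF-8 rendering of in[0..len) into out,
 * NUL-terminated. Output is truncated at a character boundary if out is
 * too small. Returns the number of octets written, excluding the NUL. */
size_t format_text_utf8_sketch(const unsigned char *in, size_t len,
                               char *out, size_t out_size)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL && out_size > 0);
    while (i < len) {
        char tmp[16];
        uint32_t cp = 0;
        size_t tlen;
        int n = decode_utf8(in + i, len - i, &cp);

        if (n == 0) {                   /* invalid octet -> U+FFFD */
            memcpy(tmp, "\xEF\xBF\xBD", 3);
            tlen = 3;
            n = 1;
        } else if (cp < 0x20 || cp == 0x7F) {   /* C0 control or DEL */
            tlen = (size_t)snprintf(tmp, sizeof tmp, "\\%03o", (unsigned)cp);
        } else if (cp >= 0x80 && cp <= 0x9F) {  /* C1: assumed unprintable */
            tlen = (size_t)snprintf(tmp, sizeof tmp, "<U+%04X>", (unsigned)cp);
        } else {                        /* assumed printable: copy as-is */
            memcpy(tmp, in + i, (size_t)n);
            tlen = (size_t)n;
        }
        if (o + tlen >= out_size) break;        /* keep room for the NUL */
        memcpy(out + o, tmp, tlen);
        o += tlen;
        i += (size_t)n;
    }
    out[o] = '\0';
    return o;
}
```

With this approach the octets D0 A5 D0 A0 D0 98 (ХРИ) pass through unchanged, since each pair decodes to a printable Cyrillic letter, whereas a lone 0x98 is not valid UTF-8 and is replaced by U+FFFD.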