Ethereal-dev: [ethereal-dev] How should dissectors define "printable"?

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Guy Harris <gharris@xxxxxxxxxxxx>
Date: Tue, 4 Jul 2000 19:13:46 -0700
Currently, some dissectors distinguish between "printable" and
"non-printable" characters when deciding whether to display an octet
string as text or as hex digits.  For example, the OSI CLTP and COTP
dissectors use that to decide whether to display TSAPs as strings or
hex.
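
As a rough illustration of that string-or-hex decision (not the actual
CLTP/COTP code; "octets_are_printable" and "format_octets" are
hypothetical names, and using "isprint()" as the test is an assumption
made for the sketch), something like the following would do:

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if every octet passes isprint() in the
   current locale, 0 otherwise.  An empty string counts as not printable. */
static int
octets_are_printable(const unsigned char *octets, size_t len)
{
    size_t i;

    if (len == 0)
        return 0;
    for (i = 0; i < len; i++) {
        /* isprint() wants a value in the range of unsigned char (or EOF) */
        if (!isprint(octets[i]))
            return 0;
    }
    return 1;
}

/* Hypothetical helper: writes the octets to "out" as text if they are all
   printable, otherwise as "0x" followed by hex digits.  The output is
   truncated if "out" is too small. */
static void
format_octets(const unsigned char *octets, size_t len, char *out, size_t outlen)
{
    size_t i, n, pos = 0;

    if (outlen == 0)
        return;
    out[0] = '\0';

    if (octets_are_printable(octets, len)) {
        n = len < outlen - 1 ? len : outlen - 1;
        memcpy(out, octets, n);
        out[n] = '\0';
        return;
    }

    if (outlen < 3)
        return;
    out[pos++] = '0';
    out[pos++] = 'x';
    out[pos] = '\0';
    for (i = 0; i < len && pos + 2 < outlen; i++) {
        /* two hex digits plus the terminating NUL always fit here */
        snprintf(out + pos, outlen - pos, "%02x", octets[i]);
        pos += 2;
    }
}
```

Note that the result depends on whatever locale is in effect when
"format_octets()" is called, since "isprint()" consults it.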

The CLTP and COTP dissectors, and perhaps at least some other
dissectors, use the "isprint()" function (macro) to distinguish between
"printable" and "non-printable" characters.

Ethereal calls "gtk_set_locale()", which sets the C-language locale to
the native locale; the behavior of "isprint()" is affected by the
locale.  I just checked in a change to Tethereal to call
'setlocale(LC_ALL, "")' to make it do the same.

In the C locale (which is what Ethereal used to use, as it didn't call
"gtk_set_locale()"), only printable ASCII characters are considered
"printable"; this means that a TSAP of 0xfffffffefffffffe would be
considered to contain no "printable" characters, and thus would be
displayed as hex digits.
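
One quick way to see what the current locale treats as "printable" is
to count the octet values that "isprint()" accepts.  The sketch below
assumes nothing beyond ISO C; "count_printable_octets" is a
hypothetical name.  A program that has not called "setlocale()" runs in
the C locale, so on an ASCII-based system the count there should be 95
(0x20 through 0x7e), and 0xfe and 0xff are not among them.

```c
#include <ctype.h>

/* Hypothetical helper: counts how many of the 256 possible octet values
   isprint() accepts in the locale currently in effect. */
static int
count_printable_octets(void)
{
    int c, count = 0;

    for (c = 0; c <= 0xff; c++) {
        if (isprint(c))
            count++;
    }
    return count;
}
```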

However, in a locale whose character set is, for example, ISO 8859/1
(ISO Latin 1), 0xff is "y with diaeresis", and 0xfe is, I think,
lower-case thorn (unless it's lower-case eth - apologies to any
Icelanders whom I've offended if I misremembered), so that TSAP would be
displayed as a string.

There are some dissectors that should perhaps continue to use
"isprint()", as the data in question might well be a string in a
non-English language (although if the string is in a double-byte
character set, or in UTF-8, say, it might still not be displayed
correctly; alas, without knowing what character set it's in, we can't
necessarily display it correctly...).

Other dissectors should perhaps use "isascii() && isprint()", if the
strings in question are, if text, unlikely to be text using accented
letters or other non-ASCII printable characters.
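
A minimal sketch of that stricter test follows.  The names
"is_ascii_printable" and "octets_are_ascii_text" are hypothetical, and
since "isascii()" is not part of ISO C (it comes from POSIX/XSI), the
sketch assumes an explicit "<= 0x7f" comparison is an acceptable
stand-in for it.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: true only for octets that are both 7-bit ASCII
   and printable, regardless of what the current locale says about
   octets with the high bit set. */
static int
is_ascii_printable(unsigned char octet)
{
    /* "octet <= 0x7f" stands in for isascii() */
    return octet <= 0x7f && isprint(octet);
}

/* Hypothetical helper: returns 1 if the octet string is non-empty and
   consists entirely of printable ASCII, 0 otherwise. */
static int
octets_are_ascii_text(const unsigned char *octets, size_t len)
{
    size_t i;

    if (len == 0)
        return 0;
    for (i = 0; i < len; i++) {
        if (!is_ascii_printable(octets[i]))
            return 0;
    }
    return 1;
}
```

With a test like this, a TSAP of 0xfffffffefffffffe would be rejected
as text because every octet has the high bit set, so it would be shown
as hex whether or not the locale's character set is ISO 8859/1.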