Ethereal-users: Re: [Ethereal-users] Message Error

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Guy Harris <gharris@xxxxxxxxx>
Date: Wed, 30 Jun 2004 22:11:36 -0700
On Wed, Jun 30, 2004 at 01:29:07PM +0200, Biot Olivier wrote:
> You're most certainly correct. The code page in use in Brazil probably is an
> ISO-LATIN variant (ISO-8859-1 maybe), whose 128 upper 8-bit characters
> (values 128 to 255) are encoded as two-byte sequences in UTF-8. The GTK2
> interface expects UTF-8-encoded text and sees some bytes that are not
> valid UTF-8, hence the Pango rendering engine is complaining :)
>  
> Sounds like we'll have to add locale and charset translation support to
> Ethereal :)

What we should add eventually is:

	a way for a dissector to indicate that a string extracted from
	the protocol is in some character set encoding (note that the
	encoding of a particular string in a particular packet will not
	necessarily be supported by "iconv()" on the platform on which
	Ethereal is running, and some platforms might not even have
	"iconv()", so we can't just use "iconv()" - we might be able to
	take the GNU libiconv code, give it another name to avoid
	namespace collisions, and supply our own conversion tables, *if*
	all the encodings we'd support can be handled by "iconv()");

	a way to translate from that to UTF-8 for GTK+ 2.x (and somehow
	to translate to the appropriate encoding for the font being used
	for GTK+ 1.2[.x]; other toolkits will probably support either
	16-bit Unicode (e.g. Windows, although on Windows 95/98/Me we
	might need Microsoft's add-on for Unicode support, the Microsoft
	Layer for Unicode) or UTF-8, so I suspect GTK+ 1.2[.x] is the
	only platform for which we'd need that, although that would also
	require either that the platform do something reasonable with
	characters for which it has no glyph or that it let us find out
	whether there is a glyph for the character);

	a way to translate from that to the appropriate character
	encoding for:

		text output in Tethereal;

		printing to a printer (which will probably eventually
		become platform-dependent);

		saving the summary or detail dissection to a text file.

For now, we should just make heavy use of "format_text()" and
"tvb_format_text()", which will escape non-printable characters (which,
for GTK+ 2.x, has to be interpreted as "non-ASCII characters" - yes, that's a
deficiency, but fixing that deficiency requires that we know what
encoding the characters are in...).