Ethereal-users: Re: [Ethereal-users] GSM MAP SMS decode

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Guy Harris <gharris@xxxxxxxxx>
Date: Mon, 01 Aug 2005 10:58:15 -0700
John Vincent wrote:

I notice that the decode from GSM MAP SMS text does not work correctly when the encoding scheme is UCS2 (it works OK for the GSM 7-bit alphabet). Would anybody be interested in fixing that? To fix it I guess ethereal needs support for displaying unicode fonts (unless it already has that...?)

A limited capability to handle it could be provided simply by ignoring non-ASCII characters (or displaying them as "\XNNNN").
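
As a rough illustration of that fallback, here is a minimal Python sketch (Ethereal itself is written in C, so this only shows the idea). It assumes the SMS user data is big-endian UCS-2; the function name `render_ucs2_ascii_fallback` is hypothetical and not part of Ethereal.

```python
def render_ucs2_ascii_fallback(data: bytes) -> str:
    """Render big-endian UCS-2 bytes, escaping anything that is not printable ASCII.

    Assumption: the input is big-endian UCS-2 (two bytes per character).
    """
    if len(data) % 2 != 0:
        raise ValueError("UCS-2 data must contain an even number of bytes")
    out = []
    for i in range(0, len(data), 2):
        # Combine the high and low byte into one 16-bit code unit.
        code = (data[i] << 8) | data[i + 1]
        if 0x20 <= code <= 0x7E:
            out.append(chr(code))           # printable ASCII is shown as-is
        else:
            out.append("\\X%04X" % code)    # everything else becomes \XNNNN
    return "".join(out)


# "Hi" followed by CYRILLIC CAPITAL LETTER PE (U+041F)
print(render_ucs2_ascii_fallback(b"\x00H\x00i\x04\x1f"))
```

Note that a literal backslash in the message would be indistinguishable from the start of an escape in this sketch; a real implementation would probably want to escape it as well.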

Displaying Unicode characters is not hard with GTK+ 2.x - GTK+ 2.x expects to be handed a UTF-8 string. It's harder with GTK+ 1.2[.x] - I'm not even sure how to find out the encoding GTK+ 1.2[.x] expects for strings.
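
For a toolkit that wants UTF-8, the conversion step might look like the following Python sketch. It assumes big-endian UCS-2 input and treats it as UTF-16, which is a superset of UCS-2; `ucs2_be_to_utf8` is a hypothetical helper, not an Ethereal or GTK+ function.

```python
def ucs2_be_to_utf8(data: bytes) -> bytes:
    """Convert big-endian UCS-2 bytes into a UTF-8 byte string.

    Assumption: decoding as UTF-16BE is acceptable for UCS-2 input.
    Malformed input (odd length, unpaired surrogates) is replaced with
    U+FFFD instead of raising an error.
    """
    text = data.decode("utf-16-be", errors="replace")
    return text.encode("utf-8")


# CYRILLIC CAPITAL LETTER PE (U+041F) followed by "o"
print(ucs2_be_to_utf8(b"\x04\x1f\x00o"))
```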

(The native Windows GUI code Gerald's been working on off and on would, I think, require something such as the Microsoft Layer for Unicode:

	http://www.microsoft.com/globaldev/handson/dev/mslu_announce.mspx

in order to support W95/W98/WMe. Were we to do a native KDE version, I think the version of Qt that KDE 3.x uses handles Unicode natively - either as UCS-2 or UTF-8; OS X also uses UTF-8 encoded Unicode in the GUI code.

This also raises issues of file access - recent versions of GLib have stdio wrapper routines that presumably either do nothing or translate from UTF-8 to the locale's character set on UN*X, and map to Unicode on Windows.)

The main thing Ethereal needs is a more sophisticated scheme for handling strings, as there are a number of different character encodings that can be used for strings (UCS-2, both little-endian as in SMB and probably big-endian in some places; UTF-8; ISO 8859/n for various values of n; assorted other EUC character encodings, e.g. DBCS encodings for various Asian languages; various non-EUC character encodings, including DOS/Windows code pages, old Mac character sets, Shift-JIS, KOI-8, etc., etc., etc. - oh, and don't forget EBCDIC, if, as I think is the case, some SNA protocols we dissect use it).

This might also call for us incorporating our own version of iconv, or something equivalent, as the Single UNIX Specification item on iconv says that the actual encoding names are implementation-dependent, but we need a *platform-independent* way for a dissector to specify that a string is, say, MacRoman or ISO 8859/1 and for the Ethereal core to translate from that to UTF-8.
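
One way to picture the platform-independent naming described here is a fixed set of encoding identifiers that dissectors refer to, with the core owning the mapping to an actual converter. The Python sketch below is only an illustration under that assumption: `StringEncoding` and `to_utf8` are hypothetical names, and Python's built-in codecs stand in for whatever iconv-like library the core would really use.

```python
from enum import Enum


class StringEncoding(Enum):
    """Hypothetical platform-independent encoding identifiers.

    The values are Python codec names, standing in for a bundled converter.
    """
    UCS2_LE = "utf_16_le"      # e.g. strings in SMB
    UCS2_BE = "utf_16_be"
    UTF8 = "utf_8"
    ISO_8859_1 = "latin_1"
    MAC_ROMAN = "mac_roman"
    SHIFT_JIS = "shift_jis"
    KOI8_R = "koi8_r"
    EBCDIC_CP037 = "cp037"     # one of several EBCDIC code pages


def to_utf8(raw: bytes, encoding: StringEncoding) -> bytes:
    """Translate raw packet bytes in the given encoding to UTF-8.

    Undecodable bytes are replaced with U+FFFD rather than raising.
    """
    return raw.decode(encoding.value, errors="replace").encode("utf-8")


# The same character, e-acute, arrives as different bytes in different encodings.
print(to_utf8(b"\x8e", StringEncoding.MAC_ROMAN))
print(to_utf8(b"\xe9", StringEncoding.ISO_8859_1))
```

With something along these lines, a dissector would only ever name an identifier such as `StringEncoding.MAC_ROMAN`; the implementation-dependent encoding names that iconv accepts would not appear in dissector code.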

See the first item under "Dissector infrastructure" in

	http://wiki.ethereal.com/Development_2fWishlist

Even if ethereal can't display the message correctly, it would be nice if it didn't say [Malformed Packet: GSM SMS], which it does at the moment.

I suspect that's a separate problem.