Wireshark-dev: Re: [Wireshark-dev] UTF8 vs. locale in error messages (bug 5715)

From: Guy Harris <guy@xxxxxxxxxxxx>
Date: Tue, 28 Jun 2011 11:36:44 -0700

On Jun 28, 2011, at 10:27 AM, Guy Harris wrote:

> We have an issue regarding strings in packets in general.  Strings might be in a number of encodings, including ASCII (meaning that any byte with the 8th bit set is something that shouldn't be there), other national variants of ISO 646, UTF-8, UTF-16, UCS-2 (meaning "only the Basic Multilingual Plane, with no surrogate pairs"), ISO 8859/x for various values of x, various ISO 2022-based encodings (e.g., the EUC encodings), various national standards, various DOS and Windows code pages, various Mac OS encodings, EBCDIC, whatever encodings are used for SMS, etc., etc., etc., etc.:
> 
> 	http://en.wikipedia.org/wiki/Template:Character_encoding
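
Two of the parenthetical definitions in the quoted paragraph (ASCII as "no byte with the 8th bit set", UCS-2 as "no surrogate pairs") can be sketched as simple checks. This is only an illustration in Python, not anything taken from the Wireshark sources; the function names are made up for the example, and the second check assumes the input is big-endian 16-bit code units.

```python
import struct


def has_high_bit(data: bytes) -> bool:
    """Return True if any byte has the 8th bit set, i.e. the data is not plain ASCII."""
    return any(b & 0x80 for b in data)


def utf16be_fits_ucs2(data: bytes) -> bool:
    """Return True if big-endian 16-bit data contains no surrogate code units.

    Hypothetical helper: such data stays within the Basic Multilingual
    Plane, so it can be read as UCS-2 as well as UTF-16.
    """
    if len(data) % 2:
        return False  # not a whole number of 16-bit code units
    units = struct.unpack(">%dH" % (len(data) // 2), data)
    # Surrogates occupy U+D800..U+DFFF; any of them means UTF-16, not UCS-2.
    return not any(0xD800 <= u <= 0xDFFF for u in units)
```

For instance, has_high_bit() is false for b"GET / HTTP/1.1" but true for a Latin-1 "é" (byte 0xE9), and utf16be_fits_ucs2() is false for any character outside the BMP, because UTF-16 has to encode it as a surrogate pair.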

As long as I'm piling up a ton of information about humanity's twisty little maze of character encodings, all different:

SMS:

	https://secure.wikimedia.org/wikipedia/en/wiki/GSM_03.38