Wireshark · Wireshark-dev: Re: [Wireshark-dev] tvb_get_string_enc() doesn't always return valid UTF-8

From: Guy Harris <guy@xxxxxxxxxxxx>

Date: Mon, 20 Jan 2014 17:27:24 -0800

On Jan 20, 2014, at 1:49 PM, Martin Kaiser <lists@xxxxxxxxx> wrote:

> I committed the change to tvb_get_string() in r54864.

I've changed that so that it does *not* map bytes with the 8th bit set to REPLACEMENT CHARACTER for UTF-8 strings, since in UTF-8 those bytes are part of valid multi-byte sequences.  For UTF-8 strings, we need to do a more complicated check and map invalid octet sequences to REPLACEMENT CHARACTER.  (We also need to do some more work for UCS-2, UTF-16, and UCS-4.)
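
To illustrate what the "more complicated check" for UTF-8 might look like, here is a rough sketch in plain C.  It is not the Wireshark code: the names utf8_seq_len() and utf8_sanitize_sketch() are made up for this example, it uses malloc() rather than Wireshark's memory allocators, and it assumes one particular policy (each octet that cannot start a well-formed sequence is replaced by one U+FFFD, then scanning resumes at the next octet).  Other replacement policies are possible.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: return the length of the well-formed UTF-8
 * sequence starting at p (1..4), or 0 if there is none.
 * 'avail' is the number of octets available at p (at least 1). */
static size_t utf8_seq_len(const unsigned char *p, size_t avail)
{
    unsigned char c = p[0];
    unsigned char lo = 0x80, hi = 0xBF; /* allowed range of 2nd octet */
    size_t need, i;

    if (c < 0x80)
        return 1;                        /* plain ASCII */
    if (c >= 0xC2 && c <= 0xDF) {
        need = 2;
    } else if (c >= 0xE0 && c <= 0xEF) {
        need = 3;
        if (c == 0xE0) lo = 0xA0;        /* reject overlong forms */
        else if (c == 0xED) hi = 0x9F;   /* reject UTF-16 surrogates */
    } else if (c >= 0xF0 && c <= 0xF4) {
        need = 4;
        if (c == 0xF0) lo = 0x90;        /* reject overlong forms */
        else if (c == 0xF4) hi = 0x8F;   /* reject values above U+10FFFF */
    } else {
        return 0;                        /* 0x80..0xC1 and 0xF5..0xFF */
    }

    if (avail < need)
        return 0;                        /* truncated sequence */
    if (p[1] < lo || p[1] > hi)
        return 0;
    for (i = 2; i < need; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return 0;                    /* not a continuation octet */
    }
    return need;
}

/* Hypothetical sketch: copy 'len' octets, replacing every octet that
 * does not start a well-formed sequence with U+FFFD (EF BF BD).
 * Returns a NUL-terminated malloc()ed buffer, or NULL on failure. */
char *utf8_sanitize_sketch(const unsigned char *in, size_t len)
{
    char *out = malloc(len * 3 + 1);     /* worst case: all octets bad */
    size_t i = 0, o = 0;

    if (out == NULL)
        return NULL;
    while (i < len) {
        size_t n = utf8_seq_len(in + i, len - i);
        if (n == 0) {
            out[o++] = (char)0xEF;
            out[o++] = (char)0xBF;
            out[o++] = (char)0xBD;
            i++;
        } else {
            while (n--)
                out[o++] = (char)in[i++];
        }
    }
    out[o] = '\0';
    return out;
}
```

With this sketch, well-formed input passes through unchanged, while a stray 0xFF octet or a truncated multi-byte sequence comes out as REPLACEMENT CHARACTER.  The caller frees the result with free().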

tvb_get_string() still treats the string as ASCII.
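
For comparison, the ASCII case is much simpler.  The following is only a sketch of the idea, not the code that was committed; ascii_to_utf8_sketch() is a hypothetical name and the buffer handling with malloc() is an assumption made to keep the example self-contained.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch: treat 'in' as ASCII and produce valid UTF-8.
 * Octets with the 8th bit set are not ASCII, so each one is mapped
 * to U+FFFD REPLACEMENT CHARACTER (EF BF BD in UTF-8).
 * Returns a NUL-terminated malloc()ed buffer, or NULL on failure. */
char *ascii_to_utf8_sketch(const unsigned char *in, size_t len)
{
    char *out = malloc(len * 3 + 1);     /* worst case: all octets >= 0x80 */
    size_t i, o = 0;

    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        if (in[i] & 0x80) {
            out[o++] = (char)0xEF;
            out[o++] = (char)0xBF;
            out[o++] = (char)0xBD;
        } else {
            out[o++] = (char)in[i];      /* 7-bit ASCII is valid UTF-8 */
        }
    }
    out[o] = '\0';
    return out;
}
```

Applying this to a UTF-8 string would wreck every multi-byte character, which is why the mapping must not be used for UTF-8 encodings.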

> I'll have a look at tvb_get_stringz() tomorrow.

I've added that (with the same change *not* to do it for UTF-8 strings).  tvb_get_stringz() treats the string as ASCII.
