Wireshark · Wireshark-dev: Re: [Wireshark-dev] guint8* and gchar* ... and Vim ?! :)

From: Guy Harris <guy@xxxxxxxxxxxx>

Date: Thu, 14 Dec 2006 11:49:35 -0800

Sebastien Tandel wrote:

   is there any reason to use guint8* instead of gchar*?


For what purpose?

If you're dealing with an array of 8-bit bytes, or a pointer to a sequence of those, guint8 is the right type; it makes it clear that they're bytes, not characters (it might be binary, it might be a sequence of 16-bit "bytes" in a UTF-16-encoded string, it might be a UTF-8 string, etc.).

I.e., tvb_get_ptr(), for example, should return a "guint8 *", as should tvb_memdup(), and the raw packet data you get from Wiretap should be pointed to by a "guint8 *".

Note also that you can safely pass a guint8 or guchar to one of the <ctype.h> routines, but you can't safely pass a gchar to them, as they might get sign-extended into negative values if the 8th bit is set (I think that none of the popular platforms for Windows and modern UN*Xes have C compilers with "char" an unsigned type, so I think "might" can be replaced by "will" in practice).
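
To illustrate the <ctype.h> point, here is a minimal sketch, not code from the Wireshark tree. It assumes uint8_t as a stand-in for guint8 so that it builds without GLib, and the function names count_printable() and char_is_printable() are made up for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Count the printable bytes in a buffer of raw data.
 * uint8_t stands in for guint8 here (an assumption for this sketch). */
size_t count_printable(const uint8_t *buf, size_t len)
{
    size_t n = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        /* buf[i] promotes to an int in the range 0..255, which is
         * always a valid argument for the <ctype.h> routines. */
        if (isprint(buf[i]))
            n++;
    }
    return n;
}

/* If the data is held in a plain char, convert through unsigned char
 * first; where char is signed, a byte such as 0xE9 would otherwise
 * reach isprint() as a negative value. */
int char_is_printable(char c)
{
    return isprint((unsigned char)c) != 0;
}
```

The cast in char_is_printable() is the part that is easy to forget; with an unsigned byte type no cast is needed at all.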

   With gcc-4.0, there is the new feature warning you that "pointer target
   differs in signedness" (which is not such a bad thing).

I suspect most of those warnings are for cases where you're treating byte sequences as character strings.
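
As a rough sketch of the kind of case that produces the warning, and one way to avoid it without a cast: the helper name byte_strnlen() is hypothetical, and uint8_t is again assumed as a stand-in for guint8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Passing a byte pointer straight to a string routine, e.g.
 *
 *     size_t n = strlen(buf);    // buf is const uint8_t *
 *
 * is what draws "pointer targets differ in signedness" from gcc 4.0
 * and later, because strlen() takes a "const char *".
 *
 * This hypothetical helper keeps the data as bytes instead: memchr()
 * takes a "const void *", so no cast is needed, and the search is
 * bounded by the buffer length rather than trusting a terminator. */
size_t byte_strnlen(const uint8_t *buf, size_t len)
{
    const uint8_t *nul;

    assert(buf != NULL || len == 0);
    if (len == 0)
        return 0;
    nul = memchr(buf, 0, len);
    return nul != NULL ? (size_t)(nul - buf) : len;
}
```

Whether a helper like this or a change of variable type is the better fix depends on whether the data really is a character string at that point.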

What I think we *really* need to do, for those cases, is have a different way of handling strings. The current way we handle strings doesn't take into account the fact that there are a number of different character encodings for strings - "ASCII" (which would imply that a byte with the 8th bit set is an error), ISO 8859/n, other EUC encodings, Shift-JIS, KOI8, UTF-8, UTF-16, etc.
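
To make the idea concrete, here is a small sketch of what an encoding-aware conversion could look like. It is not an existing Wireshark API: the enum, the function name bytes_to_utf8(), and the choice of U+FFFD for invalid ASCII bytes are all assumptions for illustration, and only two encodings are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding tags; a real API would need many more. */
typedef enum {
    SKETCH_ENC_ASCII,
    SKETCH_ENC_ISO_8859_1
} sketch_encoding;

/* Convert len raw bytes in the given encoding to a NUL-terminated
 * UTF-8 string in out. Returns the number of bytes written (not
 * counting the NUL), or (size_t)-1 if out is too small. */
size_t bytes_to_utf8(const uint8_t *in, size_t len, sketch_encoding enc,
                     char *out, size_t out_size)
{
    size_t o = 0;

    assert(out != NULL && out_size > 0);
    for (size_t i = 0; i < len; i++) {
        uint8_t b = in[i];

        if (b < 0x80) {
            /* 7-bit values are the same in both encodings and in UTF-8. */
            if (o + 1 >= out_size)
                return (size_t)-1;
            out[o++] = (char)b;
        } else if (enc == SKETCH_ENC_ISO_8859_1) {
            /* ISO 8859-1 bytes 0x80..0xFF map to U+0080..U+00FF,
             * which take two bytes in UTF-8. */
            if (o + 2 >= out_size)
                return (size_t)-1;
            out[o++] = (char)(0xC0 | (b >> 6));
            out[o++] = (char)(0x80 | (b & 0x3F));
        } else {
            /* In ASCII a byte with the 8th bit set is an error;
             * emit U+FFFD REPLACEMENT CHARACTER (EF BF BD). */
            if (o + 3 >= out_size)
                return (size_t)-1;
            out[o++] = (char)0xEF;
            out[o++] = (char)0xBF;
            out[o++] = (char)0xBD;
        }
    }
    out[o] = '\0';
    return o;
}
```

The point of the sketch is only that the input stays a byte pointer and the encoding is an explicit argument; the output is the one place where a character string type makes sense.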


See the first item under "Dissector infrastructure" on the

	http://wiki.wireshark.org/Development/Wishlist

page. (That discusses two items - the dissector APIs for handling strings, and the UI aspects of this. The former doesn't require the latter - we can continue to display non-ASCII characters as escape sequences - but the latter, which is something we should ultimately do, requires some way of getting all strings from packets translated into Unicode.)
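
For the "display as escape sequences" fallback, a minimal sketch might look like the following. The function name escape_bytes() and the "\xNN" escape style are assumptions for the example, not a description of what the existing display code does.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Copy len bytes into out as a NUL-terminated string, keeping printable
 * ASCII as-is and writing everything else (and the backslash itself) as
 * a "\xNN" escape. Stops early if out would overflow. Returns the number
 * of characters written, not counting the NUL. */
size_t escape_bytes(const uint8_t *in, size_t len, char *out, size_t out_size)
{
    size_t o = 0;

    assert(out != NULL && out_size > 0);
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 0x80 && isprint(in[i]) && in[i] != '\\') {
            if (o + 1 >= out_size)
                break;
            out[o++] = (char)in[i];
        } else {
            /* Four characters for the escape plus room for the NUL. */
            if (o + 4 >= out_size)
                break;
            snprintf(out + o, out_size - o, "\\x%02x", (unsigned int)in[i]);
            o += 4;
        }
    }
    out[o] = '\0';
    return o;
}
```

Because the input is unsigned, the 0x80 test and the isprint() call need no casts.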

   May we change these guint8* to gchar* ? I mean may we change the type of
   the concerned variables and not cast to every call of a function ?

Which ones are you thinking of? We shouldn't globally replace guint8 with gchar, as per my comments in the beginning.
