Wireshark-dev: Re: [Wireshark-dev] guint8* and gchar* ... and Vim ?! :)

From: Sebastien Tandel <sebastien@xxxxxxxxx>
Date: Thu, 14 Dec 2006 22:14:55 +0100
Hi Guy,
 
Thanks for this quick answer.

The purpose was clear: get rid of these warnings. They clearly come
from calls to g_FUNC functions, which take gchar* parameters. Every GLib
function that works on strings takes gchar* as a parameter, and Wireshark
uses a lot of them. (I should have been more precise.)
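
To illustrate where the warning comes from, here is a minimal sketch. The
typedefs stand in for the GLib ones (GLib defines gchar as char and guint8
as unsigned char) so that it builds without GLib; string_length() and
length_of_bytes() are hypothetical names, not Wireshark or GLib functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the GLib typedefs (assumption: no GLib headers here). */
typedef char gchar;
typedef unsigned char guint8;

/* Hypothetical stand-in for a GLib-style routine that takes gchar*. */
static size_t string_length(const gchar *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* A byte buffer held as guint8*, passed on to a gchar* routine. */
static size_t length_of_bytes(const guint8 *bytes)
{
    /* Writing string_length(bytes) here is what makes gcc 4.0 and later
     * warn that the pointer targets differ in signedness. The explicit
     * cast below silences the warning, at the cost of one cast per call. */
    return string_length((const gchar *)bytes);
}
```

Calling length_of_bytes() on a NUL-terminated guint8 array holding "GET"
gives 3; the point is only that each such call site needs its own cast as
long as the variable stays a guint8*.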

Taking the example of an array of 8-bit bytes ... I agree with you
on the use of guint8*. And gcc won't output a warning there, because
I'm using guint8* with tvb_get_ptr() or tvb_memdup().
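
The <ctype.h> point from the quoted message below is one reason guint8*
fits raw bytes well. A minimal sketch, assuming a hypothetical helper
named count_printable() and a stand-in typedef for GLib's guint8:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Stand-in for the GLib typedef (assumption: no GLib headers here). */
typedef unsigned char guint8;

/* Hypothetical helper: count the printable bytes in a raw buffer. */
static size_t count_printable(const guint8 *buf, size_t len)
{
    size_t i, n = 0;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        /* buf[i] is always in 0..255, which isprint() accepts. With a
         * signed char, a byte such as 0xE9 would be passed as a negative
         * value, which the <ctype.h> routines do not accept (EOF aside). */
        if (isprint(buf[i]))
            n++;
    }
    return n;
}
```

If the buffer were a gchar* instead, each byte would need a cast to
unsigned char before being handed to isprint().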

However, when it comes to strings, I think using guint8* confuses the
coder, as gchar* is (or should be) clearly something pointing to an array
of characters. (Furthermore, it creates a lot of warnings with gcc-4.0
blabla ... :)) What lies behind a gchar*, namely the character encoding
(ASCII, UTF-X, ISO 8859, ...), is out of scope from the current coder's
point of view, even though I agree with you about an API for handling
strings (in fact, it could be a standalone project). Such an API would
give coders in Wireshark the ability to handle all the character
encodings for strings correctly, but ... until now we don't have that
API, whereas we do have a type called gchar* (defined by GLib, which
Wireshark uses) ... so why use guint8*?
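
As a rough sketch of changing the variable's type instead of casting at
every call: the bytes are turned into a gchar* once, at the boundary, and
everything after that works on the gchar*. bytes_to_string() is a
hypothetical helper, not an existing Wireshark or GLib function, the
typedefs stand in for the GLib ones, and the sketch says nothing about the
character encoding of the copied bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the GLib typedefs (assumption: no GLib headers here). */
typedef char gchar;
typedef unsigned char guint8;

/* Hypothetical helper: copy len raw bytes into a newly allocated,
 * NUL-terminated gchar string. The caller releases it with free().
 * Returns NULL if the allocation fails. */
static gchar *bytes_to_string(const guint8 *bytes, size_t len)
{
    gchar *s;

    assert(bytes != NULL || len == 0);
    s = malloc(len + 1);
    if (s == NULL)
        return NULL;
    if (len > 0)
        memcpy(s, bytes, len);  /* memcpy takes void*, so no cast needed */
    s[len] = '\0';
    return s;
}
```

After this single conversion, the result can be handed to any routine
taking gchar* without a cast and without the signedness warning.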


Regards,
Sebastien Tandel

Guy Harris wrote:
> Sebastien Tandel wrote:
>
>   
>>    is there any reason to use guint8* instead of gchar*?
>>     
>
> For what purpose?
>
> If you're dealing with an array of 8-bit bytes, or a pointer to a 
> sequence of those, guint8 is the right type; it makes it clear that 
> they're bytes, not characters (it might be binary, it might be a 
> sequence of 16-bit "bytes" in a UTF-16-encoded string, it might be a 
> UTF-8 string, etc.).
>
> I.e., tvb_get_ptr(), for example, should return a "guint8 *", as should 
> tvb_memdup(), and the raw packet data you get from Wiretap should be 
> pointed to by a "guint8 *".
>
> Note also that you can safely pass a guint8 or guchar to one of the 
> <ctype.h> routines, but you can't safely pass a gchar to them, as they 
> might get sign-extended into negative values if the 8th bit is set (I 
> think that none of the popular platforms for Windows and modern UN*Xes 
> have C compilers with "char" an unsigned type, so I think "might" can be 
> replaced by "will" in practice).
>
>   
>> With gcc-4.0, there is the new feature warning you that "pointer target
>> differs in signedness" (which is not such a bad thing).
>>     
>
> I suspect most of those warnings are for cases where you're treating 
> byte sequences as character strings.
>
> What I think we *really* need to do, for those cases, is have a 
> different way of handling strings.  The current way we handle strings 
> doesn't take into account the fact that there are a number of different 
> character encodings for strings - "ASCII" (which would imply that a byte 
> with the 8th bit set is an error), ISO 8859/n, other EUC encodings, 
> Shift-JIS, KOI8, UTF-8, UTF-16, etc.
>
> See the first item under "Dissector infrastructure" on the
>
> 	http://wiki.wireshark.org/Development/Wishlist
>
> page.  (That discusses two items - the dissector APIs for handling 
> strings, and the UI aspects of this.  The former doesn't require the 
> latter - we can continue to display non-ASCII characters as escape 
> sequences - but the latter, which is something we should ultimately do, 
> requires some way of getting all strings from packets translated into 
> Unicode.)
>
>   
>> May we change these guint8* to gchar* ? I mean may we change the type of
>> the concerned variables and not cast to every call of a function ?
>>     
>
> Which ones are you thinking of?  We shouldn't globally replace guint8 
> with gchar, as per my comments in the beginning.