Wireshark-bugs: [Wireshark-bugs] [Bug 4102] there's UTF8 char in manuf file.

Date: Tue, 30 Nov 2010 14:29:45 -0800 (PST)
https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=4102

Jeff Morriss <jeff.morriss.ws@xxxxxxxxx> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #6 from Jeff Morriss <jeff.morriss.ws@xxxxxxxxx> 2010-11-30 14:29:42 PST ---
(In reply to comment #5)
> This is exhibiting our usual lack of unicode support problem, as seen on the
> console:
> 
> (wireshark:70378): Pango-WARNING **: Invalid UTF-8 string passed to
> pango_layout_set_text()

But you don't see that with this sample capture:

https://bugs.wireshark.org/bugzilla/attachment.cgi?id=5350

There the UTF-8 is rendered correctly in the GUI.  Admittedly, it does NOT
render correctly in my terminal (from tshark) today, but I think it did on my
other PC (which runs a more recent Fedora).

Investigating further, I found the problem: in Geutebrück's case the string
ENDS in a multi-byte UTF-8 character, and when epan/addr_resolv.c truncates the
string (to MAXMANUFLEN *bytes*), it splits that character and corrupts the UTF-8.
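A minimal C sketch of the failure mode, using "Geutebrü" as an example name
(7 ASCII bytes plus the two-byte sequence 0xC3 0xBC for "ü") and a 9-byte
buffer. The helper names and buffer size are hypothetical; this is not the
addr_resolv.c code. The naive copy keeps the lead byte 0xC3 but drops its
continuation byte, producing exactly the kind of invalid UTF-8 that
pango_layout_set_text() warns about. The boundary-safe variant is shown only
for comparison; it is not the approach taken in rev 35082.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int is_utf8_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Naive byte truncation into a buffer of maxlen bytes (hypothetical helper).
 * Keeps at most maxlen-1 bytes, so it can cut a multi-byte character. */
static void truncate_bytes_naive(char *dst, const char *src, size_t maxlen)
{
    assert(maxlen > 0);
    strncpy(dst, src, maxlen - 1);
    dst[maxlen - 1] = '\0';
}

/* Same byte budget, but if the cut lands inside a multi-byte sequence,
 * back up to the start of that character so the result stays valid UTF-8
 * (assuming the input was valid). */
static void truncate_bytes_safe(char *dst, const char *src, size_t maxlen)
{
    assert(maxlen > 0);
    size_t len = strlen(src);
    if (len < maxlen) {             /* fits entirely, copy including NUL */
        memcpy(dst, src, len + 1);
        return;
    }
    size_t cut = maxlen - 1;        /* src[cut] is the first byte dropped */
    /* A dropped continuation byte means its character started earlier;
     * drop the whole character. */
    while (cut > 0 && is_utf8_continuation((unsigned char)src[cut]))
        cut--;
    memcpy(dst, src, cut);
    dst[cut] = '\0';
}
```

With src = "Geutebr\xC3\xBC" and maxlen = 9, the naive version leaves
"Geutebr\xC3" (a dangling lead byte), while the safe version yields "Geutebr".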

I fixed this in rev 35082 by:
- making make-manuf enforce the 8-*character* limit (previously it limited
names to 10 characters, only for epan/addr_resolv.c to truncate them again)
- using dynamic allocation for manuf names in epan/addr_resolv.c.  That way we
don't allocate 8*4 bytes for every entry (a UTF-8 character can be up to 4
bytes long and we limit names to 8 characters), and we don't truncate (and
possibly corrupt) a name because the storage area isn't big enough for its
UTF-8 characters.
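The character limit plus dynamic allocation can be sketched in C as follows.
dup_utf8_prefix() is a hypothetical helper, not the actual make-manuf or
addr_resolv.c code, and it assumes its input is valid UTF-8. It counts
characters by skipping continuation bytes (10xxxxxx), then allocates exactly
the bytes the prefix needs, so an 8-character name such as "Geutebrü" takes
9 bytes plus the terminator rather than a fixed worst-case slot of
8*4 = 32 bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of the first
 * max_chars UTF-8 characters of src (assumed valid UTF-8). The cut always
 * falls on a character boundary. Caller frees the result; returns NULL if
 * allocation fails. */
static char *dup_utf8_prefix(const char *src, size_t max_chars)
{
    assert(src != NULL);
    size_t bytes = 0;   /* bytes belonging to the kept prefix */
    size_t chars = 0;   /* characters counted so far */

    while (src[bytes] != '\0') {
        /* A non-continuation byte starts a new character. */
        if (((unsigned char)src[bytes] & 0xC0) != 0x80) {
            if (chars == max_chars)
                break;  /* this byte would start character max_chars + 1 */
            chars++;
        }
        bytes++;
    }

    /* Allocate only what the prefix needs, plus the terminating NUL. */
    char *out = malloc(bytes + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, src, bytes);
    out[bytes] = '\0';
    return out;
}
```

For example, dup_utf8_prefix("Geutebr\xC3\xBC" "ck", 8) returns "Geutebrü"
(9 bytes). The literal is split so that the hex escape \xBC does not swallow
the following "c".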
