https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=2619
--- Comment #4 from Thomas Boehne <TBoehne@xxxxxxxx> 2008-06-26 00:10:19 PDT ---
I think the file does not have a consistent encoding scheme. I removed all
ASCII characters from the file and ended up with fewer than 400 characters. The
Linux "file" utility says that the rest is "Non-ISO extended-ASCII text".
Converting it with iconv from iso8859-15 to utf-8 led to a file where at least
95% of the characters are correct.
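
For illustration, the two steps above (isolating the non-ASCII bytes and
converting from iso8859-15 to utf-8) could look roughly like this in Python.
This is only a sketch: the function names and the sample bytes are made up for
the example, and it is not the code from the patch.

```python
def non_ascii_bytes(data: bytes) -> bytes:
    """Return only the bytes outside the 7-bit ASCII range."""
    return bytes(b for b in data if b >= 0x80)


def latin9_to_utf8(data: bytes) -> bytes:
    """Interpret the input as ISO-8859-15 and re-encode it as UTF-8."""
    return data.decode("iso8859-15").encode("utf-8")


# Hypothetical sample: German text stored as ISO-8859-15 bytes.
sample = b"Gr\xfc\xdfe aus M\xfcnchen"

leftover = non_ascii_bytes(sample)   # the bytes that "file" complains about
converted = latin9_to_utf8(sample)   # same text, now valid UTF-8
```

The byte-level conversion is what "iconv -f ISO-8859-15 -t UTF-8" does on the
command line.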
The incorrect characters in 001F6F are not a problem, because they are in a
region that is skipped when parsing the file.
I tried some encodings other than iso8859, but since most of the non-ASCII
characters should be German umlauts, none of them produced proper results.
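
A rough way to automate that comparison is to decode the same bytes with
several candidate encodings and see which one yields the most German umlauts.
The candidate list, the scoring rule and the sample bytes below are assumptions
chosen for the example; I did the actual check by eye.

```python
GERMAN_SPECIALS = set("äöüÄÖÜß")

# Assumed candidate list; any single-byte codec could be added here.
CANDIDATES = ["iso8859-15", "cp437", "cp850", "mac_roman"]


def umlaut_score(data: bytes, encoding: str) -> float:
    """Fraction of decoded non-ASCII characters that are German umlauts or ß."""
    text = data.decode(encoding, errors="replace")
    non_ascii = [ch for ch in text if ord(ch) > 0x7F]
    if not non_ascii:
        return 0.0
    return sum(ch in GERMAN_SPECIALS for ch in non_ascii) / len(non_ascii)


def best_encoding(data: bytes) -> str:
    """Pick the candidate encoding with the highest umlaut score."""
    return max(CANDIDATES, key=lambda enc: umlaut_score(data, enc))


# Hypothetical sample: "Müller Söhne Gerätebau Straße" as ISO-8859-15 bytes.
sample = b"M\xfcller S\xf6hne Ger\xe4tebau Stra\xdfe"
guess = best_encoding(sample)
```

This only ranks the candidates against each other; it cannot prove that a file
uses one encoding consistently.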
I agree that the patch is not perfect, but it is definitely better than the
current code, which produces non-ISO extended-ASCII text that is handed to code
that expects utf-8.
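
The general idea, expressed as a Python sketch rather than as the actual
patch, is to make sure that whatever is passed on is valid UTF-8 and to fall
back to an iso8859-15 conversion when it is not. The function name and the
choice of fallback encoding are assumptions made for this example.

```python
def ensure_utf8_text(raw: bytes, fallback: str = "iso8859-15") -> str:
    """Decode raw bytes as UTF-8, falling back to a single-byte encoding.

    Valid UTF-8 input is passed through unchanged. Anything else is
    assumed (not proven) to be in the fallback encoding.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


# Hypothetical inputs: the same name stored in two different encodings.
from_latin9 = ensure_utf8_text(b"M\xfcller")
from_utf8 = ensure_utf8_text("Müller".encode("utf-8"))
```

Both calls return the same string, so the consumer never sees raw
extended-ASCII bytes.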