Ethereal-dev: Re: [Ethereal-dev] Ethereal 0.99.0pre1 available for testing

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Gerald Combs <gerald@xxxxxxxxxxxx>
Date: Tue, 25 Apr 2006 16:48:17 -0500
Andrew Hood wrote:
> Gerald Combs wrote:
>> Ulf Lamping wrote:
>>
>>
>>> When opening the About dialog, the console prints several "Invalid utf8
>>> encoding: ..." messages on WinXP, caused by some entries in the AUTHORS
>>> file.
>>>
>>> The developer build doesn't do so. If I remember correctly, Guy recently
>>> committed a change to the AUTHORS file ...
>>
>> Guy's fix has been copied to the 1.0 trunk.  Is there a way to reliably
>> check the UTF-8 conformance of the AUTHORS file, and would it be worth
>> it to add a script to the test suite?
> 
> Not trivial, but "recode" will complain loudly if something isn't in the
> format it is supposed to be.
> 
> e.g.
> cp AUTHORS AUTHORS.test
> recode utf-8..utf-16 AUTHORS.test
> if [ $? -gt 0 ] ; then echo AUTHORS is broken ; fi
> rm AUTHORS.test

The following script checks that the AUTHORS file begins with a UTF-8 byte
order mark (BOM) and that its contents can be decoded as UTF-8. Is that
sufficient for our needs?

----
#!/usr/bin/env python3

import codecs
import sys

# Read raw bytes so the encoding check happens before any decoding.
with open('AUTHORS', 'rb') as authors:
    contents = authors.read()

# Make sure the file starts with a UTF-8 BOM.
if not contents.startswith(codecs.BOM_UTF8):
    print('Bad byte order mark', file=sys.stderr)
    sys.exit(1)

# Try decoding the contents; only encoding errors are treated as failures.
try:
    contents.decode('utf-8')
except UnicodeDecodeError:
    print('Bad encoding', file=sys.stderr)
    sys.exit(1)
----
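The check above only reports that the file is bad, not where. Below is a minimal sketch of a variant that lists the line number and byte offset of every invalid UTF-8 sequence, which would make the offending AUTHORS entries easier to find from a test-suite run. It assumes Python 3; `find_utf8_errors` and `main` are hypothetical names, not part of any existing Ethereal script.

```python
#!/usr/bin/env python3

import codecs
import sys


def find_utf8_errors(data):
    """Return (line_number, byte_offset, reason) for each invalid UTF-8
    sequence in ``data`` (bytes). Hypothetical helper, for illustration."""
    errors = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode('utf-8')
            break  # everything from pos onward is valid UTF-8
        except UnicodeDecodeError as exc:
            bad = pos + exc.start              # absolute offset of the bad byte(s)
            line = data.count(b'\n', 0, bad) + 1
            errors.append((line, bad, exc.reason))
            pos += exc.end                     # resume just past the bad sequence
    return errors


def main(path):
    with open(path, 'rb') as f:
        data = f.read()
    ok = True
    if not data.startswith(codecs.BOM_UTF8):
        print('%s: missing UTF-8 byte order mark' % path, file=sys.stderr)
        ok = False
    for line, offset, reason in find_utf8_errors(data):
        print('%s:%d: byte %d: %s' % (path, line, offset, reason),
              file=sys.stderr)
        ok = False
    return 0 if ok else 1


if __name__ == '__main__':
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else 'AUTHORS'))
```

A Latin-1 encoded name such as "Müller" would then be reported with its line number and byte offset instead of a bare "Bad encoding", and the nonzero exit status still lets a test harness treat it as a failure.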