craig said:
> it turns out that the problem is caused by either the glibc implementation
> of gethostbyaddr() or ethereal's use of signals and longjmp(3) when
> calling it (depending on your perspective).
>
> i fixed the problem with the attached patch to epan/resolv.c, which
> insures that AVOID_DNS_TIMEOUT is never defined. (the comments at the
> front of epan/libresolv.c make it clear that the author considers the
> code associated with AVOID_DNS_TIMEOUT somewhat dubious at best.)

Well, there's no unique referent for the phrase "the author", unless you
mean the author of that comment - and, from the stuff in the comment, yes,
I *do* think it's not ideal that we do the timeouts (it probably *could*
be implemented on Windows, but it'd be more complicated, and it appears to
cause problems on many UN*Xes - OS X, for one, and, apparently, Linuxes
with recent versions of glibc, and, as per the OpenBSD changes to tcpdump,
perhaps other platforms as well, although that might've been the result of
auditing rather than seeing problems "live").

> note that gethostbyaddr() seems to timeout after 5 sec all by itself,
> so the AVOID_DNS_TIMEOUT isn't really required.

Well, that depends on the implementation. The original tcpdump changes
were put in to deal with the NIS name resolver in some version of SunOS,
which might just have kept trying forever to resolve names (I think the
NIS client code tends to just retry RPCs if they time out, perhaps to keep
callers of "getpwnam()" and the like from not getting information if the
NIS server is temporarily down or unreachable) - the comment said

    /*
     * "getname" is written in this atrocious way to make sure we don't
     * wait forever while trying to get hostnames from yp.
     */

and, in that case, it arguably *would* have been necessary.
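For readers who haven't looked at the code in question, the alarm/longjmp
trick being discussed has roughly the following shape. This is a minimal
sketch, not the code in epan/resolv.c: call_with_timeout(), on_alarm(), and
the two stand-in functions are hypothetical names, and the stand-ins take the
place of a real resolver call such as gethostbyaddr() so that the example
doesn't depend on the network.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical names throughout; this is not the code in epan/resolv.c. */
static sigjmp_buf timeout_env;

static void on_alarm(int sig)
{
    (void)sig;
    /* Jump out of whatever was running when SIGALRM arrived. */
    siglongjmp(timeout_env, 1);
}

typedef int (*blocking_fn)(void *arg);

/* Stand-in for a resolver call that answers promptly. */
static int returns_quickly(void *arg)
{
    (void)arg;
    return 42;
}

/* Stand-in for a resolver call that never gives up on its own. */
static int blocks_forever(void *arg)
{
    (void)arg;
    for (;;)
        pause();
}

/*
 * Run fn(arg), abandoning it after "seconds" seconds.
 * Returns fn's result, or -1 if the call was abandoned.
 */
static int call_with_timeout(blocking_fn fn, void *arg, unsigned int seconds)
{
    struct sigaction sa, old_sa;
    volatile int result = -1;

    assert(fn != NULL && seconds > 0);

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGALRM, &sa, &old_sa);

    /* The second argument asks sigsetjmp() to save the signal mask,
     * so that siglongjmp() leaves SIGALRM unblocked afterwards. */
    if (sigsetjmp(timeout_env, 1) == 0) {
        alarm(seconds);
        result = fn(arg);
    }
    /* Otherwise we got here via siglongjmp() and fn never returned. */

    alarm(0);
    sigaction(SIGALRM, &old_sa, NULL);
    return result;
}
```

The weak point is the siglongjmp(): if the alarm fires while the resolver is
part-way through updating its own state (holding a lock, say), that state is
abandoned as-is, and POSIX gives no guarantee that later calls into the same
library will behave. That would be consistent with the glibc problem reported
above, though the sketch doesn't demonstrate it.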

I don't know whether more recent NIS implementations support both "hard
mount" (keep trying forever) and "soft mount" (give up after a while)
lookup calls, or, if they do, whether "gethostbyname()" and
"gethostbyaddr()" use the "give up after a while" calls.

I also don't know whether it matters, i.e. how many sites there are that
*still* use NIS for name <-> address resolution, rather than using DNS.
(I think NIS was created when DNS was much less likely to be used within
an organization, even if that organization was large enough that flat
files didn't work.)

So it might now be OK to just get rid of the signal/longjmp crap, although
there are *still* cases where the timeout is depressingly large - and,
unfortunately, the worst of them is on Windows, where address-to-name
resolution might involve a NetBIOS Name Service "lookup", which consists
of sending an NBNS query to the NBNS port of the machine with the given IP
address; the query returns the name if the machine's listening on that
port, and absolutely nothing if it's not. That takes a while to time
out....

In the best of all possible worlds, we might somehow do *all* name
resolution asynchronously, either by using ADNS and possibly a home-brew
"ANBNS" for doing NBNS lookups, or by doing the lookups in a separate
process, so that we don't stall the UI behind a lookup; unfortunately,
there'd be a bit more work to get into that world.
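The "separate process" variant could look something like the following. It's
only a sketch under stated assumptions: start_lookup(), poll_lookup(), and
the lookup_fn type are hypothetical names, and fake_lookup() stands in for a
real blocking resolver call so the example is self-contained.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical interface: fill "name" for "addr"; return 0 on success. */
typedef int (*lookup_fn)(const char *addr, char *name, size_t namelen);

/* Stand-in for a real (possibly slow) call such as gethostbyaddr(). */
static int fake_lookup(const char *addr, char *name, size_t namelen)
{
    (void)addr;
    strncpy(name, "host.example", namelen - 1);
    name[namelen - 1] = '\0';
    return 0;
}

/*
 * Start a lookup in a child process. Returns the read end of a pipe on
 * which the name will arrive, or -1 on error; *child gets the child's pid.
 */
static int start_lookup(lookup_fn fn, const char *addr, pid_t *child)
{
    int fds[2];
    char name[256];

    assert(fn != NULL && addr != NULL && child != NULL);
    if (pipe(fds) < 0)
        return -1;
    *child = fork();
    if (*child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (*child == 0) {
        /* Child: free to block for as long as the resolver likes. */
        close(fds[0]);
        if (fn(addr, name, sizeof name) == 0) {
            ssize_t n = write(fds[1], name, strlen(name));
            (void)n;
        }
        _exit(0);
    }
    /* Parent: keep only the read end and return immediately. */
    close(fds[1]);
    return fds[0];
}

/*
 * Wait up to timeout_ms for the answer. Returns 1 with "name" filled in,
 * 0 if the lookup finished without a name, and -1 if nothing is ready yet
 * (or poll() itself failed).
 */
static int poll_lookup(int fd, char *name, size_t namelen, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    ssize_t n;

    if (poll(&pfd, 1, timeout_ms) <= 0)
        return -1;
    n = read(fd, name, namelen - 1);
    if (n <= 0)
        return 0;       /* child exited without writing a name */
    name[n] = '\0';
    return 1;
}
```

In a GUI, the descriptor returned by start_lookup() would be handed to the
toolkit's event loop rather than polled directly, and a lookup that's taking
too long can be dropped by killing the child, without longjmp-ing out of the
resolver.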

So, for now, unless somebody can think of a compelling reason to leave the
alarm/sigjmp code around, I'd vote to remove it.