Ethereal-dev: Re: [Ethereal-dev] ethereal 0.10.3 hangs on Redhat Linux 9 (glibc 2.3.2)

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: craig <craig@xxxxxxxxxxx>
Date: Tue, 17 Aug 2004 18:48:19 -0700
On Tue, 17 Aug 2004 Guy Harris (gharris@xxxxxxxxx) wrote:
} Subject: Re: [Ethereal-dev] ethereal 0.10.3 hangs on Redhat Linux 9 (glibc 2.3.2)

} craig said:
} > it turns out that the problem is caused by either the glibc implementation
} > of gethostbyaddr() or ethereal's use of signals and longjmp(3) when
} > calling it (depending on your perspective).
} >
} > i fixed the problem with the attached patch to epan/resolv.c, which
} > ensures that AVOID_DNS_TIMEOUT is never defined.  (the comments at the
} > front of epan/libresolv.c make it clear that the author considers the
} > code associated with AVOID_DNS_TIMEOUT somewhat dubious at best.)
} 
} Well, there's no unique referent for the phrase "the author", unless you
} mean the author of that comment - and, from the stuff in the comment, yes,
} I *do* think it's not ideal that we do the timeouts (it probably *could*
} be implemented on Windows, but it'd be more complicated, and, on many
} UN*Xes - OS X, for one, and, apparently, Linuxes with recent versions of
} glibc, and, as per the OpenBSD changes to tcpdump, perhaps other platforms
} as well, although that might've been the result of auditing rather than
} seeing problems "live").

i guess i assumed that the author of the comment was also the author of
that module, not the author of ethereal.  no offense meant to all the
folks who've contributed ...


} > note that gethostbyaddr() seems to time out after 5 sec all by itself,
} > so the AVOID_DNS_TIMEOUT isn't really required.
} 
} Well, that depends on the implementation.  The original tcpdump changes
} were put in to deal with the NIS name resolver in some version of SunOS,
} which might just have kept trying forever to resolve names (I think the
} NIS client code tends to just retry RPCs if they time out, perhaps to keep
} callers of "getpwnam()" and the like from not getting information if the
} NIS server is temporarily down or unreachable) - the comment said
} 
}     /*
}       * "getname" is written in this atrocious way to make sure we don't
}       * wait forever while trying to get hostnames from yp.
}       */
} 
} and, in that case, it arguably *would* have been necessary.
} 
} I don't know whether more recent NIS implementations support both "hard
} mount" (keep trying forever) and "soft mount" (give up after a while)
} lookup calls, or, if they do, whether "gethostbyname()" and
} "gethostbyaddr()" use the "give up after a while" calls.
} 
} I also don't know whether it matters, i.e. how many sites there are that
} *still* use NIS for name <-> address resolution, rather than using DNS. 
} (I think NIS was created when DNS was much less likely to be used within
} an organization, even if that organization was large enough that flat
} files didn't work.)
} 
} So it might now be OK to just get rid of the signal/longjmp crap, although
} there are *still* cases where the timeout is depressingly large - and,
} unfortunately, the worst of them is on Windows, where address-to-name
} resolution might involve a NetBIOS Name Service "lookup", which consists
} of sending to the NBNS port of a machine with a given IP address an NBNS
} query for information, which would return the name if the machine's
} listening on that port, and return absolutely nothing if it's not.  That
} takes a while to time out....
} 
} In the best of all possible worlds, we might somehow do *all* name
} resolution asynchronously, either by using ADNS and possibly a home-brew
} "ANBNS" for doing NBNS lookups, or by doing the lookups in a separate
} process, so that we don't stall the UI behind a lookup; unfortunately,
} there'd be a bit more work to get into that world.
} 
} So, for now, unless somebody can think of a compelling reason to leave the
} alarm/sigjmp code around, I'd vote to remove it.

actually, i wasn't suggesting that the code be ripped out entirely;
i realize that glibc based systems are only a subset, perhaps a small
subset, of the OS platforms that ethereal runs on.  but for glibc based
systems it seems to be a "bad idea".
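
for anyone who hasn't looked at that code: the alarm/longjmp approach boils
down to something like the sketch below.  this is not the code from
epan/resolv.c; call_with_timeout(), quick_call() and stuck_call() are
made-up names, and the two callbacks are stand-ins for a resolver call such
as gethostbyaddr().  the catch is that siglongjmp() abandons the interrupted
call in mid-flight, so whatever internal state the library held at that
moment is never cleaned up, which may well be what bites on glibc.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static sigjmp_buf timeout_env;

/* SIGALRM handler: jump back to the sigsetjmp() in call_with_timeout(). */
static void on_alarm(int signo)
{
    (void)signo;
    siglongjmp(timeout_env, 1);
}

/*
 * Hypothetical helper: run fn(arg), giving up after `secs` seconds.
 * Returns 0 if fn returned on its own, -1 if the alarm fired first.
 */
static int call_with_timeout(void (*fn)(void *), void *arg, unsigned int secs)
{
    struct sigaction sa, old_sa;
    int result;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, &old_sa);

    /* savemask = 1 so that SIGALRM is unblocked again after the jump. */
    if (sigsetjmp(timeout_env, 1) == 0) {
        alarm(secs);
        fn(arg);        /* the potentially slow call */
        alarm(0);       /* finished in time: cancel the pending alarm */
        result = 0;
    } else {
        /* Arrived here via siglongjmp(): fn was abandoned mid-call. */
        result = -1;
    }

    sigaction(SIGALRM, &old_sa, NULL);
    return result;
}

/* Stand-in for a lookup that answers immediately. */
static void quick_call(void *arg)
{
    *(int *)arg = 1;
}

/* Stand-in for a lookup that never returns. */
static void stuck_call(void *arg)
{
    (void)arg;
    for (;;)
        pause();
}
```

with a 1 second limit, call_with_timeout(stuck_call, NULL, 1) returns -1
after about a second, while a callback that returns promptly yields 0.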

unfortunately, AVOID_DNS_TIMEOUT is not one of the variables controlled
by autoconf, though i suppose it could be.  in the same vein, autoconf
doesn't seem to generate a #define that could be used to recognize glibc,
though that could be changed as well.
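
one possibility that needs no autoconf change: glibc's own headers define
__GLIBC__ (via <features.h>, which every standard header includes), so the
check can be done in the preprocessor.  a sketch, assuming that
AVOID_DNS_TIMEOUT has been defined somewhere earlier in the file (the
#define below is only a stand-in for that), with
dns_timeout_workaround_enabled() as a hypothetical helper that just reports
the outcome:

```c
#include <limits.h>     /* any standard header defines __GLIBC__ on glibc */

/* Stand-in for wherever the build currently defines this macro. */
#define AVOID_DNS_TIMEOUT

/* On glibc-based systems, turn the alarm/longjmp workaround off again. */
#if defined(__GLIBC__)
# undef AVOID_DNS_TIMEOUT
#endif

/* Hypothetical helper: 1 if the workaround is compiled in, 0 otherwise. */
static int dns_timeout_workaround_enabled(void)
{
#ifdef AVOID_DNS_TIMEOUT
    return 1;
#else
    return 0;
#endif
}
```

on a glibc system the helper returns 0; elsewhere the workaround stays
enabled and it returns 1.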

i looked around to see if the ADNS stuff is installed by default with
RedHat Linux.  if so, that would be the much preferable solution.
but no such luck i'm afraid.
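
the "separate process" idea quoted above can be sketched without ADNS at
all.  assumptions: lookup_fn, lookup_in_child(), fast_lookup() and
stuck_lookup() are hypothetical names, and the two callbacks stand in for a
real resolver call.  the child does the blocking work and writes the name
to a pipe; the parent waits on the pipe with a timeout and kills the child
if it never answers, so nothing jumps out of the middle of a library call.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker: fill buf with a NUL-terminated name, 0 on success. */
typedef int (*lookup_fn)(char *buf, size_t len);

/*
 * Run fn in a child process and wait up to timeout_ms for its answer.
 * Returns 0 with the name in out, or -1 on failure or timeout.
 */
static int lookup_in_child(lookup_fn fn, char *out, size_t len, int timeout_ms)
{
    int fds[2];
    pid_t pid;
    struct pollfd pfd;
    ssize_t n = -1;

    if (len == 0 || pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: do the blocking lookup, send the result, exit. */
        char buf[256];
        close(fds[0]);
        if (fn(buf, sizeof buf) != 0)
            _exit(1);
        if (write(fds[1], buf, strlen(buf)) < 0)
            _exit(1);
        _exit(0);
    }

    /* Parent: wait for data (or end-of-file) on the pipe, with a timeout. */
    close(fds[1]);
    pfd.fd = fds[0];
    pfd.events = POLLIN;
    if (poll(&pfd, 1, timeout_ms) > 0)
        n = read(fds[0], out, len - 1);
    close(fds[0]);

    /* Harmless if the child already exited; it is not reaped until waitpid(). */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);

    if (n <= 0)
        return -1;
    out[n] = '\0';
    return 0;
}

/* Stand-in for a lookup that succeeds immediately. */
static int fast_lookup(char *buf, size_t len)
{
    snprintf(buf, len, "host.example");
    return 0;
}

/* Stand-in for a lookup that never returns. */
static int stuck_lookup(char *buf, size_t len)
{
    (void)buf;
    (void)len;
    for (;;)
        pause();
}
```

a real UI would watch the read end of the pipe from its event loop rather
than sit in poll(), but the process boundary is the point here.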

cheers,

craig.


-- 
{apple,amdahl}!veritas!craig				      craig@xxxxxxxxxxx
(415) 668-3564 (h)					      (650) 527-8520 (w)