Ethereal-dev: [Ethereal-dev] ethereal 0.10.3 hangs on Redhat Linux 9 (glibc 2.3.2)
Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.
From: craig <craig@xxxxxxxxxxx>
Date: Tue, 17 Aug 2004 16:32:04 -0700
hi folks,

i've run into a problem with several releases of ethereal where it will hang while decoding packets, either during live capture or when reading a capture file. when this occurs, strace shows that ethereal is blocked in the futex() system call.

it turns out that the problem is caused by either the glibc implementation of gethostbyaddr() or ethereal's use of signals and longjmp(3) when calling it (depending on your perspective).

i fixed the problem with the attached patch to epan/resolv.c, which ensures that AVOID_DNS_TIMEOUT is never defined. (the comments at the front of epan/libresolv.c make it clear that the author considers the code associated with AVOID_DNS_TIMEOUT somewhat dubious at best.) you'll likely want to do something more elegant for your release.

note that gethostbyaddr() seems to time out after 5 sec all by itself, so AVOID_DNS_TIMEOUT isn't really required.

the problem is that glibc has added calls to pthread_mutex_lock() to gethostbyaddr(), perhaps in an attempt to make it thread safe. this disassembly of gethostbyaddr() from /usr/lib/libc.a makes that clear:

    00000000 <gethostbyaddr>:
       0:   55                      push   %ebp
       1:   89 e5                   mov    %esp,%ebp
       3:   57                      push   %edi
       4:   56                      push   %esi
       5:   53                      push   %ebx
       6:   b8 00 00 00 00          mov    $0x0,%eax
                            7: R_386_32     __pthread_mutex_lock
       b:   83 ec 0c                sub    $0xc,%esp
       e:   85 c0                   test   %eax,%eax
      10:   c7 45 f0 00 00 00 00    movl   $0x0,0xfffffff0(%ebp)
      17:   74 10                   je     29 <gethostbyaddr+0x29>
      19:   83 ec 0c                sub    $0xc,%esp
      1c:   68 18 00 00 00          push   $0x18
                            1d: R_386_32    .bss
      21:   e8 fc ff ff ff          call   22 <gethostbyaddr+0x22>
                            22: R_386_PC32  __pthread_mutex_lock
      ...

meanwhile, host_name_lookup() in epan/libresolv.c includes code to set an alarm and longjmp out of gethostbyaddr() if the alarm goes off:

    static gchar *host_name_lookup(guint addr, gboolean *found)
    {
      int hash_idx;
      hashname_t * volatile tp;
      struct hostent *hostp;

      [... deletia ...]

      if (addr != 0 && (g_resolv_flags & RESOLV_NETWORK)) {
        /* Use async DNS if possible, else fall back to timeouts,
         * else call gethostbyaddr and hope for the best
         */

    # ifdef AVOID_DNS_TIMEOUT

        /* Quick hack to avoid DNS/YP timeout */

        if (!setjmp(hostname_env)) {
          signal(SIGALRM, abort_network_query);
          alarm(DNS_TIMEOUT);
    # endif /* AVOID_DNS_TIMEOUT */

          hostp = gethostbyaddr((char *)&addr, 4, AF_INET);

    # ifdef AVOID_DNS_TIMEOUT
          alarm(0);
    # endif /* AVOID_DNS_TIMEOUT */

so, if a call to gethostbyaddr() takes more than 2 sec, the signal fires and we longjmp out of gethostbyaddr() without releasing the mutex. then, when ethereal calls gethostbyaddr() again, it deadlocks trying to take the mutex that is still held.

this bug would seem to exist for anyone using recent versions of glibc.

cheers, craig.

P.S. occasionally my mozilla hangs as well, and strace shows it blocked in futex(). i don't know if it suffers from the same problem or not.

P.P.S. here's part of an strace of ethereal running with my fix, which shows the lookup timing out all by itself:

    connect(8, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.137.9.163")}, 28) = 0
    send(8, "X\375\1\0\0\1\0\0\0\0\0\0\003250\00227\003145\003218\7"..., 45, 0) = 45
    gettimeofday({1092785388, 119730}, NULL) = 0
    poll([{fd=8, events=POLLIN}], 1, 5000) = 0

--
{apple,amdahl}!veritas!craig    craig@xxxxxxxxxxx
(415) 668-3564 (h)              (650) 527-8520 (w)
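The last line of the strace excerpt, poll() with a 5000 ms timeout returning 0, is the resolver giving up without any help from SIGALRM. A minimal sketch of that pattern follows. The names wait_for_reply() and demo_silent_server() are hypothetical, and a pipe stands in for the DNS socket so the example needs no network; it is not glibc's actual resolver code.

```c
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable.
 * Returns 1 if data is ready, 0 on timeout, -1 on error.
 * (Hypothetical helper; mirrors the poll() call seen in the strace.) */
static int wait_for_reply(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int n;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;              /* poll() itself failed */
    if (n == 0)
        return 0;               /* nothing arrived in time */
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Stand-in for a server that never answers: poll the read end of a
 * pipe that nobody writes to. Returns what wait_for_reply() returns. */
static int demo_silent_server(int timeout_ms)
{
    int fds[2];
    int result;

    if (pipe(fds) != 0)
        return -1;
    result = wait_for_reply(fds[0], timeout_ms);
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Because the timeout is handled by returning from poll(), control goes back through the caller normally and any lock taken on the way in can be released on the way out.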
--- /usr/src/redhat/BUILD/ethereal-0.10.3/epan/resolv.c	2004-01-25 07:46:31.000000000 -0800
+++ /local/src/cmd/ethereal/ethereal-0.10.3/epan/resolv.c	2004-07-20 12:48:24.000000000 -0700
@@ -46,7 +46,7 @@
  * code in tcpdump, to avoid those sorts of problems, and that was
  * picked up by tcpdump.org tcpdump.
  */
-#if !defined(WIN32) && !defined(__APPLE__)
+#if !defined(WIN32) && !defined(__APPLE__) && 0
 #ifndef AVOID_DNS_TIMEOUT
 #define AVOID_DNS_TIMEOUT
 #endif