[WORKAROUNDS FOR] Linux client crashing



Alien88
12-07-2002, 03:45 PM
I'm still seeing a few isolated incidents of the Linux client crashing. It seems to be an issue with glibc, because the machines it crashes on all have libc 2.3.1 or newer.

If your client crashes (it normally happens immediately), could you please tell me which version of libc you are running? If it dumps a core (run ulimit -c unlimited first to enable core dumps), run gdb sb core, then type bt and paste the output here.
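For anyone unsure how to report the libc version, here is a minimal Python sketch that reads it from the running system and compares it against 2.3.1. The threshold is only the one observed in the reports here, not a confirmed cutoff, and the helper names (parse_version, looks_affected) are made up for illustration.

```python
import platform
import re

# Assumption: 2.3.1 is simply the oldest version mentioned in the crash
# reports in this thread. It is not a confirmed cutoff.
SUSPECT_FROM = (2, 3, 1)


def parse_version(text):
    """Turn '2.3.1' or a package version like '2.3.1-5' into a tuple of ints."""
    upstream = text.split("-")[0]  # drop a distro revision such as '-5'
    return tuple(int(n) for n in re.findall(r"\d+", upstream))


def looks_affected(version_text):
    """True if the version is at or above the suspect threshold."""
    return parse_version(version_text) >= SUSPECT_FROM


if __name__ == "__main__":
    # platform.libc_ver() returns (lib, version); both may be empty strings
    # if the C library cannot be identified.
    lib, version = platform.libc_ver()
    if not version:
        print("could not determine the libc version")
    else:
        state = "may be affected" if looks_affected(version) else "looks older than 2.3.1"
        print(f"{lib} {version}: {state}")
```

Running the script prints the detected library name and version, which is the information asked for above.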

Thanks,
Alien88
-Mike

cballowe
12-07-2002, 04:40 PM
I'm using libc 2.3.1 and the client bombs for me immediately upon starting. Here's the trace from gdb:
#1 0x0809e0ec in dl_open_worker (a=0xbffff340) at dl-open.c:290
#2 0x0806b67f in _dl_catch_error (objname=0xbffff338, errstring=0xbffff33c,
operate=0x809de20 <dl_open_worker>, args=0xbffff340) at dl-error.c:152
#3 0x0809e1e3 in _dl_open (file=0xbffff4b0 "libnss_files.so.2", mode=1,
caller=0x0) at dl-open.c:349
#4 0x0808954e in do_dlopen (ptr=0xbffff488) at dl-libc.c:78
#5 0x0806b67f in _dl_catch_error (objname=0xbffff480, errstring=0xbffff484,
operate=0x8089538 <do_dlopen>, args=0xbffff488) at dl-error.c:152
#6 0x08089441 in __libc_dlopen (__name=0xbffff4b0 "libnss_files.so.2")
at dl-libc.c:42
#7 0x080872aa in __nss_lookup_function (ni=0x82497c8,
fct_name=0x80bb278 "gethostbyname_r") at nsswitch.c:340
#8 0x08087c1a in __nss_lookup (ni=0xbffff5c0,
fct_name=0x80bb278 "gethostbyname_r", fctp=0xbffff5c4) at nsswitch.c:147
#9 0x08065f5f in __gethostbyname_r (name=0xbffff774 "sb.pns.net",
resbuf=0x824766c, buffer=0x82490b8 "", buflen=1024, result=0xbffff604,
h_errnop=0xbffff608) at ../nss/getXXbyYY_r.c:168
#10 0x08065d57 in gethostbyname (name=0xbffff774 "sb.pns.net")
at ../nss/getXXbyYY.c:131
#11 0x08056ef1 in open_connection ()
#12 0x080569fa in block_loop ()
#13 0x080568ce in main ()
#14 0x08059eb2 in __libc_start_main (main=0x8056750 <main>, argc=2,
ubp_av=0xbffffab4, init=0x80480b4 <_init>, fini=0x80a9700 <_fini>,
rtld_fini=0, stack_end=0xbffffaac) at ../sysdeps/generic/libc-start.c:129

--------------------------
hope this helps solve the problem

aaron
12-07-2002, 05:39 PM
Running Debian Sid on Linux 2.5.50, libc 2.3.1-5

aaron@aaron-home:~$ ulimit -c unlimited
aaron@aaron-home:~$ sb /etc/sclient.conf
[Sat Dec 7 14:09:07 2002] client process [v1.0.2] invoked
[Sat Dec 7 14:09:07 2002] priority set to idle
[Sat Dec 7 14:09:07 2002] connecting to server
Segmentation fault (core dumped)
aaron@aaron-home:~$ ls core
core
aaron@aaron-home:~$ gdb sb core
GNU gdb 5.2.90_2002-11-20-cvs-debian
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-linux"...
Core was generated by `sb /etc/sclient.conf'.
Program terminated with signal 11, Segmentation fault.
#0 0x080a7001 in _dl_relocate_object () at ../sysdeps/i386/dl-machine.h:348
348 ../sysdeps/i386/dl-machine.h: No such file or directory.
in ../sysdeps/i386/dl-machine.h
(gdb) bt
#0 0x080a7001 in _dl_relocate_object () at ../sysdeps/i386/dl-machine.h:348
#1 0x0809e0ec in dl_open_worker (a=0xbffff430) at dl-open.c:290
#2 0x0806b67f in _dl_catch_error (objname=0xbffff428, errstring=0xbffff42c,
operate=0x809de20 <dl_open_worker>, args=0xbffff430) at dl-error.c:152
#3 0x0809e1e3 in _dl_open (file=0xbffff5a0 "libnss_files.so.2", mode=1,
caller=0x0) at dl-open.c:349
#4 0x0808954e in do_dlopen (ptr=0xbffff578) at dl-libc.c:78
#5 0x0806b67f in _dl_catch_error (objname=0xbffff570, errstring=0xbffff574,
operate=0x8089538 <do_dlopen>, args=0xbffff578) at dl-error.c:152
#6 0x08089441 in __libc_dlopen (__name=0xbffff5a0 "libnss_files.so.2")
at dl-libc.c:42
#7 0x080872aa in __nss_lookup_function (ni=0x8249df0,
fct_name=0x80bb278 "gethostbyname_r") at nsswitch.c:340
#8 0x08087c1a in __nss_lookup (ni=0xbffff6b0,
fct_name=0x80bb278 "gethostbyname_r", fctp=0xbffff6b4) at nsswitch.c:147
#9 0x08065f5f in __gethostbyname_r (name=0xbffff864 "sb.pns.net",
resbuf=0x824766c, buffer=0x8249800 "", buflen=1024, result=0xbffff6f4,
h_errnop=0xbffff6f8) at ../nss/getXXbyYY_r.c:168
#10 0x08065d57 in gethostbyname (name=0xbffff864 "sb.pns.net")
at ../nss/getXXbyYY.c:131
#11 0x08056ef1 in open_connection ()
#12 0x080569fa in block_loop ()
#13 0x080568ce in main ()
#14 0x08059eb2 in __libc_start_main (main=0x8056750 <main>, argc=2,
ubp_av=0xbffffba4, init=0x80480b4 <_init>, fini=0x80a9700 <_fini>,
rtld_fini=0, stack_end=0xbffffb9c) at ../sysdeps/generic/libc-start.c:129


It is deeply disturbing when the posting page dedicates more space to smilies than to text.

sweetooth
12-07-2002, 09:20 PM
Four machines, all with glibc 2.3.1. The client crashes on all boxes immediately upon startup with the same backtrace.

#0 0x080a7001 in _dl_relocate_object () at ../sysdeps/i386/dl-machine.h:348
#1 0x0809e0ec in dl_open_worker (a=0xbffff6b0) at dl-open.c:290
#2 0x0806b67f in _dl_catch_error (objname=0xbffff6a8, errstring=0xbffff6ac,
operate=0x809de20 <dl_open_worker>, args=0xbffff6b0) at dl-error.c:152
#3 0x0809e1e3 in _dl_open (file=0xbffff820 "libnss_files.so.2", mode=1,
caller=0x0) at dl-open.c:349
#4 0x0808954e in do_dlopen (ptr=0xbffff7f8) at dl-libc.c:78
#5 0x0806b67f in _dl_catch_error (objname=0xbffff7f0, errstring=0xbffff7f4,
operate=0x8089538 <do_dlopen>, args=0xbffff7f8) at dl-error.c:152
#6 0x08089441 in __libc_dlopen (__name=0xbffff820 "libnss_files.so.2")
at dl-libc.c:42
#7 0x080872aa in __nss_lookup_function (ni=0x82495a8,
fct_name=0x80bb278 "gethostbyname_r") at nsswitch.c:340
#8 0x08087c1a in __nss_lookup (ni=0xbffff930,
fct_name=0x80bb278 "gethostbyname_r", fctp=0xbffff934) at nsswitch.c:147
#9 0x08065f5f in __gethostbyname_r (name=0xbffffae4 "sb.pns.net",
resbuf=0x824766c, buffer=0x8248fb8 "", buflen=1024, result=0xbffff974,
h_errnop=0xbffff978) at ../nss/getXXbyYY_r.c:168
#10 0x08065d57 in gethostbyname (name=0xbffffae4 "sb.pns.net")
at ../nss/getXXbyYY.c:131
#11 0x08056ef1 in open_connection ()
#12 0x080569fa in block_loop ()
#13 0x080568ce in main ()
#14 0x08059eb2 in __libc_start_main (main=0x8056750 <main>, argc=2,
ubp_av=0xbffffe24, init=0x80480b4 <_init>, fini=0x80a9700 <_fini>,
rtld_fini=0, stack_end=0xbffffe1c) at ../sysdeps/generic/libc-start.c:129
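
The backtraces posted so far differ only in stack addresses and argument values. A rough way to confirm that two reports show the same call chain is to keep just the frame number and function name from each line and compare those. The sketch below assumes gdb's usual "#N 0xADDR in function (...)" frame format; the helper names frames and agree are hypothetical.

```python
import re

# Matches the start of a gdb frame line, e.g.
#   #7 0x080872aa in __nss_lookup_function (ni=0x82497c8,
# The address part is optional because gdb omits it for some frames.
FRAME = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\w+) \(")


def frames(backtrace):
    """Map frame number -> function name, ignoring addresses and arguments."""
    found = {}
    for line in backtrace.splitlines():
        match = FRAME.match(line.strip())
        if match:
            found[int(match.group(1))] = match.group(2)
    return found


def agree(trace_a, trace_b):
    """True if every frame number present in both traces names the same function."""
    a, b = frames(trace_a), frames(trace_b)
    shared = a.keys() & b.keys()
    return bool(shared) and all(a[n] == b[n] for n in shared)


if __name__ == "__main__":
    first = """#1 0x0809e0ec in dl_open_worker (a=0xbffff340) at dl-open.c:290
#11 0x08056ef1 in open_connection ()"""
    second = """#0 0x080a7001 in _dl_relocate_object () at ../sysdeps/i386/dl-machine.h:348
#1 0x0809e0ec in dl_open_worker (a=0xbffff430) at dl-open.c:290
#11 0x08056ef1 in open_connection ()"""
    print(agree(first, second))
```

Comparing by frame number rather than by position means a paste that is missing its first frame can still be matched against a complete one.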

BlckKnght
12-08-2002, 02:44 AM
I just got bitten by this on my Debian unstable system (libc6 package version 2.3.1-5). I'm pretty sure I know what the problem is, though.

After Debian uploaded glibc 2.3.1 to the unstable archive there were many bugs reported similar to this: binaries statically linked against older libc versions didn't work any more.

The reason was that the binary interface for the NSS libraries changed, and they are dynamically loaded, even by statically linked binaries. It's a gross violation of the user's expectations of static binaries, but it's what the glibc developers have gone with.
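
To see which NSS modules a host lookup would try to load on a given machine, one can read the hosts line of /etc/nsswitch.conf: each service name X listed there corresponds to a shared object named libnss_X.so.2, which matches the libnss_files.so.2 visible in the backtraces. The following is a small sketch of that mapping, not glibc's actual parser; the function name nss_modules is made up for illustration.

```python
import re


def nss_modules(conf_text, database="hosts"):
    """List the libnss_*.so.2 files implied by one nsswitch.conf database line."""
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line.startswith(database + ":"):
            continue
        spec = line.split(":", 1)[1]
        # Remove action items such as [NOTFOUND=return]; they are not services.
        spec = re.sub(r"\[[^\]]*\]", " ", spec)
        return ["libnss_%s.so.2" % service for service in spec.split()]
    return []


if __name__ == "__main__":
    # Assumption: the file lives at the conventional path.
    with open("/etc/nsswitch.conf") as handle:
        for name in nss_modules(handle.read()):
            print(name)
```

On a system whose hosts line reads "files dns", this prints libnss_files.so.2 and libnss_dns.so.2, the libraries that even a static binary will try to open at lookup time.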

Anyway, most of the problems were caused by the same thing (a symbol not being exported), and when that was fixed (libc6 package version 2.3.1-3 in Debian), older static binaries worked again. But based on the crash traces, it looks like there are still some gotchas related to doing NSS lookups.

One solution to the problem would be to ship a dynamically linked version of the binary, or one linked against glibc 2.3.1 (perhaps in addition to the current one).

I hope you will get this fixed soon.

jjjjL
12-08-2002, 10:18 AM
I got a report from a user saying that changing the host name from sb.pns.net to 216.163.34.105 is a good workaround, since the gethostbyname() call then doesn't segfault.
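
The idea behind the workaround can be sketched as follows: when the configured host is already a numeric address, no name-service lookup is needed. This is a Python stand-in for illustration only, not the client's actual code (the client is a C binary), and the helper names is_ip_literal and resolve are hypothetical.

```python
import ipaddress
import socket


def is_ip_literal(host):
    """True if host is a numeric IPv4 or IPv6 address rather than a name."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


def resolve(host):
    """Return an address for host, consulting the resolver only for names."""
    if is_ip_literal(host):
        return host  # nothing to look up
    return socket.gethostbyname(host)  # name lookup happens only here


if __name__ == "__main__":
    # Address taken from the workaround reported in this thread.
    print(resolve("216.163.34.105"))
```

With the numeric address in the configuration, the lookup path that crashes in the backtraces is never reached.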

As BlckKnght suggests, this is a bizarre problem that doesn't seem to have a good explanation. For now, try the workaround, and please post here whether or not it works. Hopefully we can find a way to correct this in the next version.


-Louie

Alien88
12-08-2002, 11:22 AM
You're right, that works.

jjjjL
12-11-2002, 08:27 AM
read here for more info


http://www.free-dc.org/forum/showthread.php?s=&threadid=2132