Discussion:
Bug#780675: systemd: segfault in systemd when running systemctl daemon-reload
Robert Pumphrey
2015-03-17 17:40:03 UTC
Package: systemd
Version: 215-12
Severity: critical
Justification: breaks the whole system

Dear Maintainer,

Running systemctl daemon-reload causes systemd to segfault:

***@host:~# systemctl daemon-reload

Message from ***@host at Mar 17 16:41:53 ...
kernel:[ 758.716467] systemd[1]: segfault at 7f8d3e4422a0 ip 00007f8d3e4422a0 sp 00007ffd3c533458 error 15
Failed to execute operation: Connection reset by peer

I have disabled as many services as possible, but still get the error.
I have not been able to reproduce it on another machine.
I have not found any configuration change on this machine that allows daemon-reload to work.

Once the segfault happens, systemd no longer responds to systemctl status,
and I have been unable to get it back into a responsive state without a reboot.

This was triggered by upgrading some packages (e.g. sudo) whose postrm script calls systemctl --system daemon-reload.

-- Package-specific info:

-- System Information:
Debian Release: 8.0
APT prefers testing
APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/12 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages systemd depends on:
ii acl 2.2.52-2
ii adduser 3.113+nmu3
ii initscripts 2.88dsf-58
ii libacl1 2.2.52-2
ii libaudit1 1:2.4-1+b1
ii libblkid1 2.25.2-5
ii libc6 2.19-15
ii libcap2 1:2.24-6
ii libcap2-bin 1:2.24-6
ii libcryptsetup4 2:1.6.6-5
ii libgcrypt20 1.6.2-4+b1
ii libkmod2 18-3
ii liblzma5 5.1.1alpha+20120614-2+b3
ii libpam0g 1.1.8-3.1
ii libselinux1 2.3-2
ii libsystemd0 215-12
ii mount 2.25.2-5
ii sysv-rc 2.88dsf-58
ii udev 215-12
ii util-linux 2.25.2-5

Versions of packages systemd recommends:
ii dbus 1.8.16-1
ii libpam-systemd 215-12

Versions of packages systemd suggests:
pn systemd-ui <none>

-- no debconf information
--
To UNSUBSCRIBE, email to debian-bugs-dist-***@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact ***@lists.debian.org
Michael Biebl
2015-03-17 18:00:01 UTC
Please run

gdb --core=/core /lib/systemd/systemd

Then type "set logging on" and run "bt full" afterwards;
hit Return until the gdb prompt shows up again,
and attach gdb.txt to the bug report.
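A non-interactive variant of the same steps is sketched below. It assumes the core file is at /core, as in the command above. It relies on gdb's -batch mode, which runs the -ex commands with paging disabled and then exits, so no prompts need to be dismissed. Instead of using "set logging on", it redirects the output to gdb.txt. The function name capture_backtrace is a hypothetical helper and is not part of gdb or systemd.

```sh
# Hypothetical helper: capture_backtrace [BINARY] [CORE] [OUTFILE]
# Defaults follow the instructions above (/lib/systemd/systemd, /core, gdb.txt).
capture_backtrace() {
    local binary core out
    binary=${1:-/lib/systemd/systemd}
    core=${2:-/core}
    out=${3:-gdb.txt}

    # Bail out early if either input is missing, rather than letting gdb fail.
    if [ ! -r "$binary" ] || [ ! -r "$core" ]; then
        echo "capture_backtrace: cannot read $binary or $core" >&2
        return 1
    fi

    # -batch runs the -ex commands with paging disabled and then exits;
    # stdout and stderr both go to $out, replacing "set logging on".
    gdb -batch -ex 'bt full' "$binary" "$core" > "$out" 2>&1
}
```

For example, run capture_backtrace /lib/systemd/systemd /core gdb.txt. As the later replies in this thread show, the frames only resolve to function names once systemd-dbg is installed.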
--
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?
Michael Biebl
2015-03-17 18:40:01 UTC
control: tags -1 moreinfo
control: tags -1 unreproducible
Post by Robert Pumphrey
I have disabled as many services as possible, but still get the error.
I have not been able to reproduce on another machine
Can you describe the steps that led to this crash? Did this happen once
or multiple times?
This problem happens every time I run systemctl daemon-reload on this
particular machine. I have installed jessie on another machine and not
been able to reproduce it, but the hardware is not the same.
Even after you've rebooted?
#0 0x00007f8d3e06779b in raise (sig=11) at
../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
resultvar = 0
pid = <optimized out>
#1 0x00007f8d3e4bd3d8 in ?? ()
No symbol table info available.
#2 <signal handler called>
No locals.
#3 0x00007f8d3e4422a0 in ?? ()
No symbol table info available.
#4 0x00007f8d3e4f8caa in ?? ()
No symbol table info available.
#5 0x00007f8d3e56777f in ?? ()
No symbol table info available.
#6 0x00007f8d3e55f558 in ?? ()
No symbol table info available.
#7 0x00007f8d3e4bac6b in ?? ()
No symbol table info available.
#8 0x00007f8d3dcd0b45 in __libc_start_main (main=0x7f8d3e4b6dd0,
argc=1, argv=0x7ffd3c533ea8, init=<optimized out>, fini=<optimized out>,
rtld_fini=<optimized out>, stack_end=0x7ffd3c533e98)
at libc-start.c:287
result = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0,
5361389912737802513, 140244612264611, 140725615541920, 0, 0,
-5360124553165444847, -5369135033269130991}, mask_was_saved = 0}}, priv
= {pad = {0x0, 0x0,
0x7ffd3c533eb8, 0x7f8d3e4971a8}, data = {prev = 0x0,
cleanup = 0x0, canceltype = 1012088504}}}
not_first_call = <optimized out>
#9 0x00007f8d3e4bb2cc in ?? ()
No symbol table info available.
Hm, nothing interesting/relevant in there. Almost as if the symbols do
not match the binary/core dump or the crash is not in systemd itself.
Michael Biebl
2015-03-17 21:10:02 UTC
Post by Michael Biebl
Hm, nothing interesting/relevant in there. Almost as if the symbols do
not match the binary/core dump or the crash is not in systemd itself.
Do you have systemd-dbg installed?
Robert Pumphrey
2015-03-17 22:00:03 UTC
Post by Michael Biebl
Post by Michael Biebl
Hm, nothing interesting/relevant in there. Almost as if the symbols do
not match the binary/core dump or the crash is not in systemd itself.
Do you have systemd-dbg installed?
Sorry for failing to follow simple instructions.
I have now installed systemd-dbg; here is the new backtrace:

#0 0x00007f89d974f79b in raise (sig=11)
at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
resultvar = 0
pid = <optimized out>
#1 0x00007f89d9ba53d8 in crash.lto_priv.234 (sig=11) at
../src/core/main.c:158
rl = {rlim_cur = 18446744073709551615, rlim_max =
18446744073709551615}
sa = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0},
sa_mask = {__val = {0 <repeats 16 times>}}, sa_flags = 0,
sa_restorer = 0x0}
__func__ = "crash"
__PRETTY_FUNCTION__ = "crash"
#2 <signal handler called>
No locals.
#3 0x00007f89d9b262e0 in ?? ()
No symbol table info available.
#4 0x00007f89d9be0caa in bucket_hash () at ../src/shared/hashmap.c:168
p = 0x7f89da3efbe0
h = 0x7f89d9b0a860
#5 hashmap_remove (h=0x7f89d9b0a860, key=***@entry=0x7f89da3efbe0)
at ../src/shared/hashmap.c:574
e = <optimized out>
data = <optimized out>
#6 0x00007f89d9c00f15 in set_remove (s=<optimized out>,
value=***@entry=0x7f89da3efbe0) at ../src/shared/set.c:75
No locals.
#7 0x00007f89d9c4f77f in bidi_set_free (s=0x7f89d9b0a9a0, u=0x7f89da3efbe0)
at ../src/core/unit.c:372
d = <optimized out>
i = 0xffffffffffffffff
other = 0x7f89da3efbe0
#8 unit_free (u=0x7f89da3efbe0) at ../src/core/unit.c:484
d = <optimized out>
i = 0xffffffffffffffff
t = <optimized out>
__PRETTY_FUNCTION__ = "unit_free"
#9 0x00007f89d9c47558 in manager_clear_jobs_and_units.lto_priv.948 (
m=0x7f89da36e6a0) at ../src/core/manager.c:759
u = <optimized out>
__PRETTY_FUNCTION__ = "manager_clear_jobs_and_units"
#10 0x00007f89d9ba2c6b in manager_reload (m=0x7f89da36e6a0)
at ../src/core/manager.c:2413
q = <optimized out>
fds = <optimized out>
r = 0
f = 0x7f89da43a070
#11 main (argc=1, argv=<optimized out>) at ../src/core/main.c:1758
m = 0x7f89da36e6a0
r = <optimized out>
retval = 1
before_startup = <optimized out>
after_startup = <optimized out>
timespan =
"f\336\b`\330qX\034Q\343\061\256\272\343\222|s\302x\356.cx\204\221U_\025J=G\363BE\325\354/cx\204\060cx\204\221x\251Ýš\373\062\000\362?\355s\240sU\241\063S\247\273"
fds = 0x0
reexecute = false
shutdown_verb = 0x0
initrd_timestamp = {realtime = 0, monotonic = 0}
userspace_timestamp = {realtime = 1426610793563272,
monotonic = 5250159}
kernel_timestamp = {realtime = <optimized out>, monotonic = 0}
security_start_timestamp = {realtime = 1426610793568857,
monotonic = 5255744}
security_finish_timestamp = {realtime = 1426610793569662,
monotonic = 5256549}
systemd = "systemd"
skip_setup = false
j = <optimized out>
loaded_policy = <optimized out>
arm_reboot_watchdog = false
queue_default_job = <optimized out>
empty_etc = <optimized out>
switch_root_dir = 0x0
switch_root_init = 0x0
__func__ = "main"
__PRETTY_FUNCTION__ = "main"
debsums reports no problems with the installed packages.
memtester has not found any errors so far.
Robert Pumphrey
2015-03-17 22:10:01 UTC
Post by Michael Biebl
Post by Michael Biebl
Hm, nothing interesting/relevant in there. Almost as if the symbols do
not match the binary/core dump or the crash is not in systemd itself.
Do you have systemd-dbg installed?
I have identified a duff init.d script (one of our own that previously
worked in wheezy) that is at the root of this problem. I have removed
the script, rebooted, and I can now run systemctl daemon-reload
without a segfault.
This bug may just indicate that systemd poorly handles a bad init
script. Please let me know if you would like details of our broken
script, otherwise, I am happy for this to be closed.

Thank you for your patience in helping me report this problem.
Michael Biebl
2015-03-17 22:20:01 UTC
Post by Robert Pumphrey
I have identified a duff init.d script (one of our own that previously
worked in wheezy) that is at the root of this problem. I have removed
the script, rebooted, and I can now run systemctl daemon-reload
without a segfault.
This bug may just indicate that systemd poorly handles a bad init
script. Please let me know if you would like details of our broken
script, otherwise, I am happy for this to be closed.
If you can share this init script, this would be appreciated.
systemd certainly shouldn't die because of such a faulty init script and
I'm actually surprised it does, since the SysV support is basically done
in an external generator. So there must be something very fishy with the
generated unit.

As said, if you can attach the faulty init script, that would be great.

Michael
Robert Pumphrey
2015-03-18 15:40:04 UTC
Post by Michael Biebl
As said, if you can attach the faulty init script, that would be great.
Michael
I have reproduced this on a clean install of Jessie running in a 32-bit
(i686) virtual machine.

1. Install Debian from netinst
2. Put the following into /etc/init.d/firewall:
#!/bin/bash
### BEGIN INIT INFO
# Provides: iptables
# Required-Start: $network $remote_fs $syslog
# Required-Stop: $network $remote_fs $syslog
# Should-Start: iptables
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
### END INIT INFO

#####################################################################
###
### Firewall rules
###
#####################################################################

case "$1" in
start)
echo "start"
;;
stop)
echo "stop"
;;
restart)
echo "restart"
;;
*)
echo "Usage: /etc/init.d/firewall {start|stop|restart}"
exit 1
;;
esac

3. chmod u+x /etc/init.d/firewall
4. update-rc.d firewall defaults
5. reboot
6. Log in as root
7. systemctl --system daemon-reload

Then we see:

Message from ***@joule at Mar 18 14:10:40 ...
kernel:[ 27.526029] systemd[1]: segfault at b739cdac ip b739cdac sp bf9af36c error 15
Failed to execute operation: Connection reset by peer

Also of note are the following entries in dmesg:

[ 1.075782] systemd[1]: Found ordering cycle on firewall.service/start
[ 1.075788] systemd[1]: Found dependency on firewall.service/start
[ 1.075793] systemd[1]: Breaking ordering cycle by deleting job firewall.service/start
[ 1.075799] systemd[1]: Job firewall.service/start deleted to break ordering cycle starting with firewall.service/start

Also note that the problem is not reproducible if the Provides: and
Should-Start: names match the init script name, so I suspect the mismatch
between the script name and its header is the root of the problem.
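The shell sketch below checks for the two header properties described above. The first is a Provides: name that differs from the file name. The second is the script listing its own Provides: name under Required-Start: or Should-Start:, which appears consistent with the self-referencing ordering cycle in dmesg. check_lsb_header is a hypothetical helper. It only inspects the LSB header text and does not reproduce what systemd's SysV generator actually does with it.

```sh
# Hypothetical helper: check_lsb_header SCRIPT
# Heuristic only: it reads the LSB header text and does not model what
# systemd's SysV generator does with it.
check_lsb_header() {
    local script name header provides starts rc p s
    script=$1
    name=$(basename "$script")
    rc=0

    # Keep only the lines between the BEGIN/END INIT INFO markers.
    header=$(sed -n '/^### BEGIN INIT INFO/,/^### END INIT INFO/p' "$script")
    provides=$(printf '%s\n' "$header" | sed -n 's/^# Provides:[[:space:]]*//p')
    starts=$(printf '%s\n' "$header" | sed -n \
        -e 's/^# Required-Start:[[:space:]]*//p' \
        -e 's/^# Should-Start:[[:space:]]*//p')

    for p in $provides; do
        # Property 1: the Provides: name differs from the file name.
        if [ "$p" != "$name" ]; then
            echo "$name: Provides '$p' does not match file name"
            rc=1
        fi
        # Property 2: the script asks to start after its own Provides: name.
        for s in $starts; do
            if [ "$s" = "$p" ]; then
                echo "$name: depends on its own Provides '$p'"
                rc=1
            fi
        done
    done
    return "$rc"
}
```

Running check_lsb_header /etc/init.d/firewall against the script from step 2 should print both warnings and return 1.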
Michael Biebl
2015-03-18 17:10:01 UTC
Control: tags -1 = confirmed
Control: severity -1 serious
Post by Robert Pumphrey
kernel:[ 27.526029] systemd[1]: segfault at b739cdac ip b739cdac sp bf9af36c error 15
Failed to execute operation: Connection reset by peer
...
Post by Robert Pumphrey
Also note that the problem is not reproducible if the Provides: and
Should-Start: name match the init script name, so I guess mismatch in
the script name and header is at the root of the problem.
Thanks for sharing the contents of the file. I can confirm the crash and
we have enough information now to debug this issue properly.
Marking the bug accordingly.

Michael
Michael Biebl
2015-03-17 18:50:01 UTC
control: tags -1 moreinfo
control: tags -1 unreproducible
Post by Robert Pumphrey
I have disabled as many services as possible, but still get the error.
I have not been able to reproduce on another machine
Can you describe the steps that led to this crash? Did this happen once
or multiple times?
This problem happens every time I run systemctl daemon-reload on this
particular machine. I have installed jessie on another machine and not
been able to reproduce it, but the hardware is not the same.
Could you check for faulty RAM (with memtest) and run debsums over your
installed packages?