Discussion:
Bug#958934: bind9: named fails to start after upgrade to 9.16.2
(too old to reply)
R. Scott Bailey
2020-04-26 22:00:01 UTC
Permalink
Package: bind9
Version: 1:9.16.2-3
Severity: critical
Justification: breaks unrelated software

Dear Maintainer,

Life was good on my DNS server until my recent update to 9.16.2-3.
After upgrading, the exact configuration that was happy now fails to
start. Example:

# named -g -u bind
26-Apr-2020 17:25:50.861 starting BIND 9.16.2-Debian (Stable Release) <id:b310dc7>
26-Apr-2020 17:25:50.861 running on Linux x86_64 5.5.0-2-amd64 #1 SMP Debian 5.5.17-1 (2020-04-15)
26-Apr-2020 17:25:50.861 built with '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=/usr/include' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-silent-rules' '--libdir=/usr/lib/x86_64-linux-gnu' '--runstatedir=/run' '--disable-maintainer-mode' '--disable-dependency-tracking' '--libdir=/usr/lib/x86_64-linux-gnu' '--sysconfdir=/etc/bind' '--with-python=python3' '--localstatedir=/' '--enable-threads' '--enable-largefile' '--with-libtool' '--enable-shared' '--enable-static' '--with-gost=no' '--with-openssl=/usr' '--with-gssapi=/usr' '--with-libidn2' '--with-libjson-c' '--with-lmdb=/usr' '--with-gnu-ld' '--with-maxminddb' '--with-atf=no' '--enable-ipv6' '--enable-rrl' '--enable-filter-aaaa' '--disable-native-pkcs11' '--enable-dnstap' 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -fdebug-prefix-map=/build/bind9-Co3jFO/bind9-9.16.2=. -fstack-protector-strong -Wformat -Werror=format-security -fno-strict-aliasing -fno-delete-null-pointer-checks -DNO_VERSION_DATE -DDIG_SIGCHASE' 'LDFLAGS=-Wl,-z,relro -Wl,-z,now' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2'
26-Apr-2020 17:25:50.861 running as: named -g -u bind
26-Apr-2020 17:25:50.861 compiled by GCC 9.3.0
26-Apr-2020 17:25:50.861 compiled with OpenSSL version: OpenSSL 1.1.1g 21 Apr 2020
26-Apr-2020 17:25:50.861 linked to OpenSSL version: OpenSSL 1.1.1g 21 Apr 2020
26-Apr-2020 17:25:50.861 compiled with libxml2 version: 2.9.10
26-Apr-2020 17:25:50.861 linked to libxml2 version: 20910
26-Apr-2020 17:25:50.861 compiled with json-c version: 0.13.1
26-Apr-2020 17:25:50.861 linked to json-c version: 0.13.1
26-Apr-2020 17:25:50.861 compiled with zlib version: 1.2.11
26-Apr-2020 17:25:50.861 linked to zlib version: 1.2.11
26-Apr-2020 17:25:50.861 ----------------------------------------------------
26-Apr-2020 17:25:50.861 BIND 9 is maintained by Internet Systems Consortium,
26-Apr-2020 17:25:50.861 Inc. (ISC), a non-profit 501(c)(3) public-benefit
26-Apr-2020 17:25:50.861 corporation. Support and training for BIND 9 are
26-Apr-2020 17:25:50.861 available at https://www.isc.org/support
26-Apr-2020 17:25:50.861 ----------------------------------------------------
26-Apr-2020 17:25:50.861 found 8 CPUs, using 8 worker threads
26-Apr-2020 17:25:50.861 using 8 UDP listeners per interface
26-Apr-2020 17:25:50.869 using up to 21000 sockets
26-Apr-2020 17:25:50.877 loading configuration from '/etc/bind/named.conf'
26-Apr-2020 17:25:50.881 reading built-in trust anchors from file '/etc/bind/bind.keys'
26-Apr-2020 17:25:50.901 looking for GeoIP2 databases in '/usr/share/GeoIP'
26-Apr-2020 17:25:50.901 using default UDP/IPv4 port range: [1024, 65535]
26-Apr-2020 17:25:50.905 using default UDP/IPv6 port range: [1024, 65535]
26-Apr-2020 17:25:50.905 listening on IPv4 interface lo, 127.0.0.1#5353
26-Apr-2020 17:25:50.913 listening on IPv4 interface br0, 16.1.1.3#5353
26-Apr-2020 17:25:50.917 listening on IPv6 interface lo, ::1#5353
26-Apr-2020 17:25:50.921 unable to set effective uid to 0: Operation not permitted
26-Apr-2020 17:25:50.921 Could not open '//run/named/named.pid'.
26-Apr-2020 17:25:50.921 Please check file and directory permissions or reconfigure the filename.
26-Apr-2020 17:25:50.921 could not open file '//run/named/named.pid': Permission denied
26-Apr-2020 17:25:50.921 generating session key for dynamic DNS
26-Apr-2020 17:25:50.929 unable to set effective uid to 0: Operation not permitted
26-Apr-2020 17:25:50.929 Could not open '//run/named/session.key'.
26-Apr-2020 17:25:50.929 Please check file and directory permissions or reconfigure the filename.
26-Apr-2020 17:25:50.929 could not open file '//run/named/session.key': Permission denied
26-Apr-2020 17:25:50.929 could not create //run/named/session.key
26-Apr-2020 17:25:50.929 failed to generate session key for dynamic DNS: permission denied
26-Apr-2020 17:25:50.929 sizing zone task pool based on 29 zones
26-Apr-2020 17:25:50.933 could not configure root hints from '/usr/share/dns/root.hints': permission denied
26-Apr-2020 17:25:50.957 loading configuration: permission denied
26-Apr-2020 17:25:50.957 exiting (due to fatal error)

Usability of entire system (and in fact, all systems on home network) is impaired as primary
DNS server is unresponsive and failover to secondary server (after query to primary times out)
induces noticeable delay in DNS lookups (and consequently nearly all network operations).

I am reverting to previous bind9 package as soon as I file this report. :-(

Thanks,
-Scott

-- System Information:
Debian Release: bullseye/sid
APT prefers testing
APT policy: (990, 'testing')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.5.0-2-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages bind9 depends on:
ii adduser 3.118
ii bind9-libs 1:9.16.2-3
ii bind9-utils 1:9.16.2-3
ii debconf [debconf-2.0] 1.5.74
ii dns-root-data 2019052802
ii init-system-helpers 1.57
ii iproute2 5.6.0-1
ii libc6 2.30-4
ii libcap2 1:2.33-1
ii libfstrm0 0.6.0-1+b1
ii libjson-c4 0.13.1+dfsg-7
ii liblmdb0 0.9.24-1
ii libmaxminddb0 1.3.2-1
ii libprotobuf-c1 1.3.3-1+b1
ii libssl1.1 1.1.1g-1
ii libxml2 2.9.10+dfsg-5
ii lsb-base 11.1.0
ii netbase 6.1
ii zlib1g 1:1.2.11.dfsg-2

bind9 recommends no packages.

Versions of packages bind9 suggests:
pn bind-doc <none>
ii bind9-dnsutils [dnsutils] 1:9.16.2-3
ii dnsutils 1:9.16.2-3
pn resolvconf <none>
pn ufw <none>

-- Configuration Files:
/etc/bind/named.conf.local changed:
//
// Do any local configuration here
//
zone "bailey" {
type master;
file "/etc/bind/db.bailey";
};
zone "troy.cartasoft.com" {
type master;
file "/etc/bind/db.troy";
};
zone "ldap.troy.cartasoft.com" {
type master;
file "/etc/bind/db.ldap";
};
zone "1.1.16.in-addr.arpa" {
type master;
file "/etc/bind/db.16.1.1";
};
zone "1.168.192.in-addr.arpa" {
type master;
file "/etc/bind/db.192.168.1";
};
zone "200.168.192.in-addr.arpa" {
type master;
file "/etc/bind/db.192.168.200";
};
// Consider adding the 1918 zones here, if they are not used in your
// organization
include "/etc/bind/zones.rfc1918";

/etc/bind/named.conf.options changed:
options {
directory "/var/cache/bind";
// RSB 1/6/2016 - Use nonstandard port 5353 so there is no conflict with
// dnsmasq, and also basically no chance anybody will query us directly
// instead of using dnsmasq (which apparently is more unflappable)...
// RSB 9/18/2016 - Open external address too for razzleb to query
listen-on port 5353 { 127.0.0.1; 16.1.1.3; };
listen-on-v6 port 5353 { ::1; };
// If there is a firewall between you and nameservers you want
// to talk to, you might need to uncomment the query-source
// directive below. Previous versions of BIND always asked
// questions using port 53, but BIND 8.1 and later use an unprivileged
// port by default.
// query-source address * port 53;
// If your ISP provided one or more IP addresses for stable
// nameservers, you probably want to use them as forwarders.
// Uncomment the following block, and insert the addresses replacing
// the all-0's placeholder.
auth-nxdomain no; # conform to RFC1035
};
// Added manually RSB 2/27/2008
include "/etc/bind/rndc.key";
controls {
inet 127.0.0.1 allow { localhost; };
};
// Added manually RSB 1/6/2016
//logging {
// channel buzz_channel {
// file "/var/log/named.log";
// severity debug;
// print-category yes;
// print-severity yes;
// print-time yes;
// };
// category default {
// buzz_channel;
// };
// category queries {
// buzz_channel;
// };
//};


-- debconf information:
bind9/different-configuration-file:
bind9/start-as-user: bind
bind9/run-resolvconf: false
Ondřej Surý
2020-04-27 12:20:01 UTC
Permalink
Control: severity -1 normal

I concur with Bernhard.

And since there's not enough information, we can go only by the log
messages.

It seems to me that either BIND uses custom selinux/apparmor profile
or something else is in place that disallows named from accessing those
files.

The BIND 9 provides enough information for you to fix the problem before
you start
filling serious bugs. This is not a serious bug.

Ondrej
Dear Scott,
Post by R. Scott Bailey
Life was good on my DNS server until my recent update to 9.16.2-3.
After upgrading, the exact configuration that was happy now fails to
# named -g -u bind
26-Apr-2020 17:25:50.921 Could not open '//run/named/named.pid'.
26-Apr-2020 17:25:50.921 Please check file and directory permissions or
reconfigure the filename.
Permission denied
Post by R. Scott Bailey
26-Apr-2020 17:25:50.921 generating session key for dynamic DNS
26-Apr-2020 17:25:50.929 Could not open '//run/named/session.key'.
26-Apr-2020 17:25:50.929 Please check file and directory permissions or
reconfigure the filename.
Permission denied
Post by R. Scott Bailey
26-Apr-2020 17:25:50.929 could not create //run/named/session.key
permission denied
/run/named should be created by tmpfiles, see
/usr/lib/tmpfiles.d/named.conf .
What does "ls -la /run/named" show?
Post by R. Scott Bailey
26-Apr-2020 17:25:50.929 sizing zone task pool based on 29 zones
26-Apr-2020 17:25:50.933 could not configure root hints from
'/usr/share/dns/root.hints': permission denied> 26-Apr-2020 17:25:50.957
loading configuration: permission denied
That one is supposed to be world-readable...
Do you have any denials from AppArmor in your kernel messages?
I have upgraded several machines from 9.11 to 9.16 and they generally
worked fine.
Bernhard
Ondřej Surý
2020-04-27 13:50:02 UTC
Permalink
Hi Scott,

I would suggest running `named -g -u bind` under strace:

strace -o /tmp/bind9.trace -f /usr/sbin/named -g -u bind

And then look what’s happening when `named` tries to open the files.

Also there’s upstream bug related to this:

26-Apr-2020 17:25:50.921 unable to set effective uid to 0: Operation not permitted

see:
https://gitlab.isc.org/isc-projects/bind9/-/issues/1042
and
https://gitlab.isc.org/isc-projects/bind9/-/issues/1090

But we (as upstream) haven’t got to fix this yet (although I can reproduce it with
9.17.1, so it’s only a matter of having enough time to properly look at the issue).

Ondrej
--
Ondřej SurÜ
buzz:~# ls -la /run/named
total 8
drwxrwxr-x 2 root bind 80 Apr 26 17:46 .
drwxr-xr-x 58 root root 1940 Apr 26 16:23 ..
-rw-r--r-- 1 bind bind 7 Apr 26 17:46 named.pid
-rw------- 1 bind bind 102 Apr 26 17:46 session.key
buzz:~# dpkg -l bind9
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==============-====================-============-==========================>
ii bind9 1:9.11.5.P4+dfsg-5.1 amd64 Internet Domain Name Server
buzz:~#
1. System running "n-1" testing version of bind9, everything operates normally
2. System updates to bind9 1:9.16.2-3, bind9 refuses to start
3. I look at directories, which exist, and compare settings/permissions with my secondary server (rpi3 running Debian 10.3), and can't find anything that looks different. Even removing "-u bind" from startup doesn't change anything! (although I confirmed at that time that "touch /run/named/xxxx" as root worked.)
4. Do (paraphrased for brevity) "install bind9/stable" to downgrade to known-good package version
5. bind9 starts and runs normally
To clarify, step 2 was an "apply all outstanding updates" activity, so it is possible that another update performed in that bundle is responsible for the breakage. However, step 4 modified only bind9 and its supporting libraries, and was the ONLY thing I did to move from broken to stable operation. Therefore I conclude that although it may not be at fault with respect to root cause, the latest testing version clearly is less resistant to something in my configuration than the stable version.
buzz:/var/log# grep apparmor daemon.log.1
Apr 25 19:53:19 buzz apparmor.systemd[1211]: Restarting AppArmor
Apr 25 19:53:19 buzz apparmor.systemd[1211]: Reloading AppArmor profiles
Apr 25 19:53:19 buzz apparmor.systemd[1248]: Warning: found usr.sbin.chronyd in /etc/apparmor.d/force-complain, forcing complain mode
Apr 25 19:53:19 buzz apparmor.systemd[1248]: Warning from /etc/apparmor.d (/etc/apparmor.d/usr.sbin.chronyd line 60): Warning failed to create cache: usr.sbin.chronyd
buzz:/var/log#
And: (from a system boot while troubleshooting this issue - at the same time as above)
buzz:/var/log# grep -Fi apparmor messages*
messages.1:Apr 25 19:53:19 buzz kernel: [ 0.184487] AppArmor: AppArmor initialized
messages.1:Apr 25 19:53:19 buzz kernel: [ 0.510404] AppArmor: AppArmor Filesystem Enabled
messages.1:Apr 25 19:53:19 buzz kernel: [ 1.407974] AppArmor: AppArmor sha1 policy hashing enabled
buzz:/var/log#
Thanks to noisy kvm messages, I cannot positively confirm there is nothing of interest in dmesg... but I would have expected messages to have been recorded somewhere in /var/log if they were being generated, and there's nothing beyond what I pasted above.
I am totally happy to try upgrading and playtesting again, but I would argue that "the error messages provide enough information for you to fix the problem" is inaccurate when the message is "permission denied", the user of interest clearly has the required permissions on the target directory (which exists), and EVEN THE ROOT ACCOUNT gets "permission denied" -- and yet no other error messages are generated anywhere on the system.
Does anybody have any additional suggestions on what to look at the next time I start a troubleshooting session?
Thanks,
Scott
Andreas Hasenack
2020-04-27 16:30:02 UTC
Permalink
Also check if you don't have systemd overrides in place, that could be
preventing named from writing to certain directories:

systemctl cat bind9.service
buzz:~# journalctl -k -b0 | grep -F apparmor
buzz:~#
So whatever's going on, it doesn't look like AppArmor has anything to do with it.
aa-enabled
sudo aa-status | grep -E '^[0-9]+|named'
Thanks,
Simon
Marc Dequènes (duck)
2021-02-20 08:00:01 UTC
Permalink
Quack,

I had a similar problem just now when migrating from
1:9.11.5.P4+dfsg-5.1+deb10u2 to 1:9.16.11-2~bpo10+1.

Aside from the similar permission errors I also had this output:
# aa-status | grep -E '^[0-9]+|named'
34 profiles are loaded.
18 profiles are in enforce mode.
/usr/sbin/named
named

This means there's two conflicting profiles loaded.

Btw `journalctl -k -b0 | grep -F apparmor` does not catch apparmor
messages in my case, but they are available in /var/log/audit/audit.log
(I have auditd installed but no specific rules).

So I looked at the rules:
# rgrep /usr/sbin/named /etc/apparmor*
/etc/apparmor.d/apparmor-profile:/usr/sbin/named {
/etc/apparmor.d/apparmor-profile: /usr/sbin/named mr,
/etc/apparmor.d/usr.sbin.named:profile named /usr/sbin/named
flags=(attach_disconnected) {
/etc/apparmor.d/usr.sbin.named: /usr/sbin/named mr,

I have no idea where apparmor-profile comes from but it's not of my
doing:
# dlocate /etc/apparmor.d/apparmor-profile
bind9: /etc/apparmor.d/apparmor-profile

I did not find in which version it came from though, maybe some previous
bpo; it's dated 2007 and a very basic profile (I can paste it if one
wants to look at it).

A reload of apparmor was not sufficient but this worked:
apparmor_parser -R /etc/apparmor.d/apparmor-profile
rm /etc/apparmor.d/apparmor-profile
apparmor_parser -r /etc/apparmor.d/usr.sbin.named
systemctl restart named

Hope this helps.
\_o<
--
Marc Dequènes
Continue reading on narkive:
Loading...