Discussion:
Bug#942122: does not fallback to ipv4 when ipv6 fails
Antoine Beaupre
2019-10-10 16:40:01 UTC
Package: apt-cacher-ng
Version: 3.2-2
Severity: normal
Tags: ipv6

apt-cacher-ng does not deal well with dual-stack failures. At home I
regularly have trouble with my IPv6 connexions, which just hang. Most
applications are able to recover from this and fall back to IPv4, which
just works. This is therefore mostly transparent to users; at worst
there's a slight delay during the switchover.

The way to deal with this is documented in RFC 8305 and is generally
referred to as "happy eyeballs":

https://en.wikipedia.org/wiki/Happy_Eyeballs

In particular, when IPv6 fails, apt-cacher-ng fails with the mysterious:

Err :3 https://deb.debian.org/debian buster/main amd64 emacs-bin-common amd64 1:26.1+1-3.2
Reading from proxy failed - select (115: Opération maintenant en cours) [IP : 192.168.0.3 3142]

That error is in French, but I think it translates to "Operation
currently in progress". The IP there (192.168.0.3) is the IP of the proxy.
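
For readers without a French locale: on Linux, errno 115 is EINPROGRESS, whose stock English message is "Operation now in progress". It is the code a non-blocking connect() reports while the connection is still being established, which fits an attempt that never completes. A minimal Python sketch for looking the code up (the helper name `describe_errno` is made up for illustration):

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, system message) for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# EINPROGRESS is 115 on Linux; other platforms number it differently,
# so look the value up instead of hard-coding it.
name, message = describe_errno(errno.EINPROGRESS)
print(f"{errno.EINPROGRESS}: {name} ({message})")
```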

When IPv6 returns, apt-cacher-ng magically recovers. But note that it
does not merely time out: it fails completely during the IPv6 outage.

Normal `apt` connexions without the proxy have no such problems and can
fall back to IPv4 fairly quickly.

-- Package-specific info:

-- System Information:
Debian Release: 10.1
APT prefers stable
APT policy: (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.19.0-6-amd64 (SMP w/2 CPU cores)
Locale: LANG=fr_CA.UTF-8, LC_CTYPE=fr_CA.UTF-8 (charmap=UTF-8), LANGUAGE=fr_CA.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages apt-cacher-ng depends on:
ii adduser 3.118
ii debconf [debconf-2.0] 1.5.71
ii dpkg 1.19.7
ii libbz2-1.0 1.0.6-9.2~deb10u1
ii libc6 2.28-10
ii libgcc1 1:8.3.0-6
ii liblzma5 5.2.4-1
ii libssl1.1 1.1.1d-0+deb10u1
ii libstdc++6 8.3.0-6
ii libsystemd0 241-7~deb10u1
ii libwrap0 7.6.q-28
ii lsb-base 10.2019051400
ii zlib1g 1:1.2.11.dfsg-1

apt-cacher-ng recommends no packages.

Versions of packages apt-cacher-ng suggests:
ii avahi-daemon 0.7-4+b1
ii doc-base 0.10.8
ii libfuse2 2.9.9-1

-- Configuration Files:
/etc/apt-cacher-ng/acng.conf changed:
CacheDir: /var/cache/apt-cacher-ng
LogDir: /var/log/apt-cacher-ng
SupportDir: /usr/lib/apt-cacher-ng
Remap-debrep: file:deb_mirror*.gz /debian ; file:backends_debian # Debian Archives
Remap-uburep: file:ubuntu_mirrors /ubuntu ; file:backends_ubuntu # Ubuntu Archives
Remap-cygwin: file:cygwin_mirrors /cygwin # ; file:backends_cygwin # incomplete, please create this file or specify preferred mirrors here
Remap-sfnet: file:sfnet_mirrors # ; file:backends_sfnet # incomplete, please create this file or specify preferred mirrors here
Remap-alxrep: file:archlx_mirrors /archlinux # ; file:backend_archlx # Arch Linux
Remap-fedora: file:fedora_mirrors # Fedora Linux
Remap-epel: file:epel_mirrors # Fedora EPEL
Remap-slrep: file:sl_mirrors # Scientific Linux
Remap-gentoo: file:gentoo_mirrors.gz /gentoo ; file:backends_gentoo # Gentoo Archives
Remap-secdeb: security.debian.org ; security.debian.org deb.debian.org/debian-security
ReportPage: acng-report.html
ExThreshold: 4
LocalDirs: acng-doc /usr/share/doc/apt-cacher-ng
PassThroughPattern: .* # allow CONNECT to everything

/etc/apt-cacher-ng/security.conf [Errno 13] Permission non accordée: '/etc/apt-cacher-ng/security.conf'

-- debconf information:
apt-cacher-ng/port: keep
apt-cacher-ng/gentargetmode: No automated setup
apt-cacher-ng/bindaddress: keep
apt-cacher-ng/proxy: keep
apt-cacher-ng/tunnelenab
Eduard Bloch
2019-10-13 10:20:02 UTC
Hallo,
Post by Antoine Beaupre
Package: apt-cacher-ng
Version: 3.2-2
Severity: normal
Tags: ipv6
apt-cacher-ng does not deal well with dual-stack failures. At home I
regularly have trouble with my IPv6 connexions, which just hang. Most
applications are able to recover from this and fall back to IPv4, which
just works. This is therefore mostly transparent to users; at worst
there's a slight delay during the switchover.
Okay, I agree. I also experience such situations, rarely, when the
crap modem from my provider gets upset and fails to route IPv6 (but
still reports full IPv6 connectivity in its diagnostics).
Post by Antoine Beaupre
The way to deal with this is documented in RFC 8305 and is generally
referred to as "happy eyeballs":
https://en.wikipedia.org/wiki/Happy_Eyeballs
Not sure this is feasible exactly the way they describe there. This RFC
is apparently mostly driven by DNS timing behavior, but ACNG currently uses
getaddrinfo, which delivers all collected DNS data at once.

What I intend to implement instead is a similar scheme:

a) get all DNS records, as is done at the moment
b) filter to IPv4 or IPv6 or both (depending on user preference in <ConnectProto> setting)
c) auto-sort the list so that the first entry is v4 or v6 (depending on <ConnectProto> preference) and the following ones alternate between the two families
d) start connecting to the first entry
e) when nothing has happened after N seconds, start a second connection attempt in parallel (using the second address, which is then of the other family)
f) if the second background connect attempt fails after N seconds, abort it, try the next DNS entry in the sequence, and so on (while the first connect attempt is still ongoing, until <NetworkTimeout> seconds is reached)
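
Steps b) and c) could be sketched roughly as follows. This is only an illustration in Python, not ACNG's code (which is C++); the function name `order_addresses` and its `prefer` / `allowed` parameters are hypothetical stand-ins for whatever the <ConnectProto> setting ends up controlling. Entries are assumed to be tuples in the shape returned by `socket.getaddrinfo`, whose first field is the address family.

```python
import socket
from itertools import zip_longest


def order_addresses(addrinfos, prefer=socket.AF_INET6,
                    allowed=(socket.AF_INET, socket.AF_INET6)):
    """Filter getaddrinfo-style entries by family and interleave families.

    The preferred family comes first; following entries alternate between
    the two families for as long as both still have addresses left.
    """
    usable = [info for info in addrinfos if info[0] in allowed]
    first = [info for info in usable if info[0] == prefer]
    other = [info for info in usable if info[0] != prefer]
    ordered = []
    for a, b in zip_longest(first, other):
        if a is not None:
            ordered.append(a)
        if b is not None:
            ordered.append(b)
    return ordered


if __name__ == "__main__":
    # One resolver call returns every record at once, as noted above.
    infos = socket.getaddrinfo("localhost", 3142, type=socket.SOCK_STREAM)
    for family, _type, _proto, _name, sockaddr in order_addresses(infos):
        print(family.name, sockaddr)
```

With two IPv6 records and one IPv4 record, the default ordering is v6, v4, v6, so the second attempt in step e) always goes to the other family when one is available.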

Questions:

a) do you consider this reasonable enough?
b) would you like to become a tester for this, in case you can reproduce
this regularly? (I can fake the test case but nothing beats the real
thing)
c) any good name proposal for N? I am thinking of "FastTimeout" (vs. NetworkTimeout)
d) good default value for N? I'd consider 5s (while NetworkTimeout: 60s currently)

Best regards,
Eduard.
Antoine Beaupré
2019-10-13 13:40:01 UTC
Post by Eduard Bloch
a) get all DNS records, as is done at the moment
b) filter to IPv4 or IPv6 or both (depending on user preference in <ConnectProto> setting)
c) auto-sort the list so that the first entry is v4 or v6 (depending on <ConnectProto> preference) and the following ones alternate between the two families
d) start connecting to the first entry
e) when nothing has happened after N seconds, start a second connection attempt in parallel (using the second address, which is then of the other family)
f) if the second background connect attempt fails after N seconds, abort it, try the next DNS entry in the sequence, and so on (while the first connect attempt is still ongoing, until <NetworkTimeout> seconds is reached)
a) do you consider this reasonable enough?
That seems reasonable, and actually pretty close to RFC8305.
Post by Eduard Bloch
b) would you like to become a tester for this, in case you can reproduce
this regularly? (I can fake the test case but nothing beats the real
thing)
I can try! It can take up to a month for this situation to occur and I
haven't tried reproducing it manually, but I'm definitely happy to help.
Post by Eduard Bloch
c) any good name proposal for N? I think about "FastTimeout"
(vs. NetworkTimeout)
The RFC calls this a "Connection Attempt Delay".
Post by Eduard Bloch
d) good default value for N? I'd consider 5s (while NetworkTimeout: 60s currently)
"N" is 250ms in the RFC, fwiw, which seems more reasonable than (say)
something larger than 1s, let alone one *minute*. :)

It also says that connexion attempts should be spaced out by at least
10ms, preferably 100ms. This is all in section 5.
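
To make the timing concrete, here is a rough Python sketch of staggered connection attempts using the RFC's 250ms delay. It is an assumption-laden illustration, not ACNG's implementation: `staggered_connect` and `fake_connect` are hypothetical names, the connect step is simulated so the example runs without a network, and unlike step f) of the proposal it lets every started attempt keep running until one succeeds.

```python
import asyncio


async def staggered_connect(addresses, connect, attempt_delay=0.25):
    """Return the result of the first successful connect(address).

    A new attempt is started every `attempt_delay` seconds, or at once
    when an earlier attempt has failed; earlier attempts keep running.
    """
    remaining = list(addresses)
    pending = set()
    errors = []
    try:
        while remaining or pending:
            if remaining:
                pending.add(asyncio.create_task(connect(remaining.pop(0))))
            # Only bound the wait while another address is left to try.
            timeout = attempt_delay if remaining else None
            done, pending = await asyncio.wait(
                pending, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:
                    return task.result()
                errors.append(task.exception())
        raise OSError(f"all {len(errors)} connection attempts failed")
    finally:
        for task in pending:
            task.cancel()  # abandon attempts that are still in flight


async def fake_connect(address):
    """Simulated connect: IPv6 addresses hang, IPv4 addresses answer."""
    if ":" in address:
        await asyncio.sleep(3600)  # stands in for a black-holed IPv6 route
    return f"connected to {address}"


if __name__ == "__main__":
    result = asyncio.run(
        staggered_connect(["2001:db8::1", "192.0.2.1"], fake_connect)
    )
    print(result)  # connected to 192.0.2.1
```

An overall limit such as NetworkTimeout could be layered on top with `asyncio.wait_for`. For real sockets, Python 3.8 and later expose the same idea through the `happy_eyeballs_delay` argument of `loop.create_connection`.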

A.
--
You are absolutely deluded, if not stupid, if you think that a
worldwide collection of software engineers who can't write operating
systems or applications without security holes, can then turn around
and suddenly write virtualization layers without security holes.
- Theo de Raadt
Eduard Bloch
2019-11-08 21:30:01 UTC
Hallo,
Post by Antoine Beaupré
Post by Eduard Bloch
a) get all DNS records, as is done at the moment
b) filter to IPv4 or IPv6 or both (depending on user preference in <ConnectProto> setting)
c) auto-sort the list so that the first entry is v4 or v6 (depending on <ConnectProto> preference) and the following ones alternate between the two families
d) start connecting to the first entry
e) when nothing has happened after N seconds, start a second connection attempt in parallel (using the second address, which is then of the other family)
f) if the second background connect attempt fails after N seconds, abort it, try the next DNS entry in the sequence, and so on (while the first connect attempt is still ongoing, until <NetworkTimeout> seconds is reached)
a) do you consider this reasonable enough?
That seems reasonable, and actually pretty close to RFC8305.
Post by Eduard Bloch
b) would you like to become a tester for this, in case you can reproduce
this regularly? (I can fake the test case but nothing beats the real
thing)
I can try! It can take up to a month for this situation to occur and I
haven't tried reproducing it manually, but I'm definitely happy to help.
Well, I am sorry for the delay. The idea was to get it done ASAP but
it's not as easy as it seemed and I am short on spare time. I wanted to
release a fixed version a week ago but I found more and more issues with
legacy code. OTOH I think those issues have existed for years and nobody
really noticed, and my stress tests were not intense enough.

So, the only thing I can offer ATM is a snapshot of the unfinished version 3.3.

I believe that this is the best version of apt-cacher-ng since 2014 but
it still has a couple of issues which I need to fix prior to release. And
this will take some time. For now, feel free to build it from salsa
("fakeroot debian/rules binary" on the debian/experimental branch) or
take a binary build from
https://www.unix-ag.uni-kl.de/~bloch/acng/snap3.3/ . Receiving some
feedback would be good, but I will first prepare a release to
experimental anyway.

Changelogs:

* Make errors on purging of cache folder non-fatal (closes: #915082)
* Recommends: ca-certificates (closes: #926282)

[ POSSIBLY BREAKING CHANGES ]
* the setting of cachedir and logdir in the built-in defaults is now
configurable at build time (-DACNG_CACHE_DIR=... -DACNG_LOG_DIR=...)
and these settings are also propagated into generated configuration examples
* Dropping support for CMake prior to v3.1, dropped most custom variables
for target locations, now relying on public CMake variables from
GNUInstallDirs module; for details, see
https://cmake.org/cmake/help/v3.2/module/GNUInstallDirs.html
* Dropping support for OpenSSL before 1.0.2

[ FEATURES AND IMPROVEMENTS ]
* Change of default network timeout to 40 seconds
* Alternative fallback scheme for non-primary target connection attempts
(with default timeout value of 4s, see FastTimeout setting). By default,
this should help with unstable (blocking) IPv6 routing where IPv4 is still
operational (Debian bug #942122)
* Refactored DNS resolution & caching code, potential fix of rare connection
problems
* RequiresMountsFor directive in systemd service file example with
additional remarks on keeping this in sync with the config (as in Debian
bug #929035 and partly suggested in #942355)
* PFilePattern extensions for Fedora 29 and 30, by Alan Jenkins, Debian
bug #928270 (thanks!)
* VFilePattern extension for Centos8, by Andy Lowther, Debian bug #944143
(thanks!)
* added very explicit explanation on what "default value" means in the
acng.conf example (Debian bug #855995) and also how to print it
(with acngtool, Debian bug #914746)
* Mirror database update
* Configurable timeout for forced client disconnect on the last portion of
data

[ BUGFIXES ]
* increased size of the decompression line buffer for config file reading
(Debian bug #942634)
* fixes potential data race in DNS resolution
* Typo in INSTALL file (Debian bug #913593)
* In Arch Linux database mirror list, rewrite https URLs to http since the
official JSON query only returns https versions, also add a different
source (from Debian bug #942844)
* Generation of Sourceforge mirror redirectors list is fixed
* Fixed an ancient bug where the last answered request might have been
delayed in processing due to incorrect selection of MSG_MORE flag
* Potential crash on shutdown prevented (misordered destructor sequence)

[ INTERNAL REFACTORING ]
* Overhauling code deployment, using a shared library (reducing installed
file size by up to 20%)
* dropped rfc2553emu code from old APT, using the platform abstraction from
libevent instead
* partial redesign for more singlethreaded IO operation
* Disabled LTO by default, still crashing gold linker with certain option
combinations
* Moved lingering on CLOSE_WAIT sockets to the single main thread

Best regards,
Eduard.
