Discussion:
Bug#1054642: Failing ARP relay from external -> Linux bridge -> veth port --> NS veth port
Luca Boccassi
2023-10-27 09:30:01 UTC
Control: tags -1 upstream
Package: iproute2
Version: 4.20.0-2+deb10u1
Hello Debian team,
I would like to report a problem which possibly has to do with the iproute2 package. I’m experiencing it on both Debian 10 (this one) and Debian 12 (6.1.0-3). I really did my best, reviewing at least 7 Stack Exchange (and similar) threads, and I’m at my wit’s end, wondering why this is possibly still not fixed in 2023 when the debates go back to around 2014.
== 2) Once I enslave veth port to bridge, I can’t reach external network <===
3) The veth also no longer works at the IP level: with every ICMP echo from the NS it runs ARP first, even though both “ip nei” tables are populated with each other’s MAC records. The following repeats in a loop:
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vinet-brp, link-type EN10MB (Ethernet), capture size 262144 bytes
11:18:12.966955 IP 70.0.0.251 > 70.0.0.44: ICMP echo request, id 2333, seq 0, length 64
11:18:12.966984 ARP, Request who-has 70.0.0.251 tell 70.0.0.44, length 28
11:18:12.966989 ARP, Reply 70.0.0.251 is-at 0e:61:72:97:aa:ff, length 28
11:18:13.967994 IP 70.0.0.251 > 70.0.0.44: ICMP echo request, id 2333, seq 1, length
4) Once I configure the bridge iface with an IP address in the same /24 subnet that the veth and NS veth (and the externals) use → the NS nei can show a changing MAC address for the bridge veth iface
11:30:27.860907 ARP, Reply 70.0.0.251 is-at 0e:61:72:97:aa:ff, length 28
11:30:28.848251 IP 70.0.0.251 > 70.0.0.44: ICMP echo request, id 2352, seq 14, length 64
11:30:28.884892 ARP, Request who-has 70.0.0.251 tell 70.0.0.44, length 28
11:30:28.884908 ARP, Reply 70.0.0.251 is-at 0e:61:72:97:aa:ff, length 28
11:30:28.980890 ARP, Request who-has 70.0.0.44 tell 70.0.0.251, length 28
11:30:28.980909 ARP, Reply 70.0.0.44 is-at 00:50:56:01:01:02, length 28 <---
inet_bash >> ip nei
70.0.0.1 dev vinet FAILED
70.0.0.44 dev vinet lladdr ce:18:16:4b:0c:c2 DELAY <---
5) The bridge vs NS veth pinging works
https://unix.stackexchange.com/questions/655602/why-two-bridged-veth-cannot-ping-each-other/674621#674621
10) Even tried to stop default MAC learning on bridge veth iface by “ip link set dev vinet-brp type bridge_slave learning off” ⇒ did not work, neigh flushed and pinging .251 vs .254 just worked.
So I believe that bridge veth iface is broken in its essential functionality using default “learning/flooding on” settings.
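To pin down the changing MAC described in 4), a rough sketch like the following could watch neighbour updates and print a line whenever the lladdr recorded for an IP changes. The function name mac_flaps and the $NS variable are made up for illustration, and it assumes that `ip monitor neigh` prints entries in the same "IP dev IFACE lladdr MAC STATE" layout as the `ip nei` listing above.

```shell
#!/bin/sh
# Sketch only: report when the lladdr seen for an IP differs from the
# previous one. Reads `ip neigh` / `ip monitor neigh` style lines on stdin.
# Example (as root, $NS being the namespace name):
#   ip -n "$NS" monitor neigh | mac_flaps
mac_flaps() {
    awk '
        {
            ip = ""; mac = ""
            for (i = 1; i < NF; i++) {
                if ($i == "dev" && i > 1) ip = $(i - 1)   # address precedes "dev"
                if ($i == "lladdr")       mac = $(i + 1)  # MAC follows "lladdr"
            }
            # Entries without a MAC (e.g. FAILED) carry no information here.
            if (ip == "" || mac == "") next
            if ((ip in seen) && seen[ip] != mac)
                printf "%s changed %s -> %s\n", ip, seen[ip], mac
            seen[ip] = mac
        }
    '
}
```

Each reported change can then be matched against the MACs of the bridge, the veth ends and the external hosts to see which device is answering.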
Thanks for taking the time to look at this and either give me hope or rule it out, in which case I need to create extra ports on my host to get what I want.
BR
Peter
You need to report this upstream, nobody here is going to look at
something like this
Daniel Gröber
2023-10-29 10:50:01 UTC
Hi Peter,
No attempt at all? Then that's against your own rules, which I read before
submitting this.
I think Luca was a bit harsh here, I'd be happy to help debug this. From a
first look it seems unlikely this is related to iproute2, smells more like
a kernel issue to me, but either way we need a reproducer.

So the first step to move this forward is to put together a self-contained set
of instructions to reproduce the problem. Your original report is a bit
sparse on context and details.

If you don't feel up to compiling the reproducer script yourself you could
start by showing us your system state using

$ ip -d addr show # on the host and inside the NS if you could
$ bridge -d link; bridge fdb

snippets from /etc/network/interfaces or similar relevant config would help
too.

--Daniel
Daniel Gröber
2023-10-30 12:10:01 UTC
Hi Peter,
Would it be possible to join a Webex session set up by me to check this
out quickly? It's all a lab environment.
I don't think that would help with reproducing your environment in this
case, besides I only offer synchronous debugging sessions for paid
consulting engagements.
If not I will proceed per your instructions
Please do.

--Daniel
Daniel Gröber
2023-11-11 00:40:01 UTC
Hi Peter,

looking at the ip/bridge dumps I don't see anything obviously broken so I
started by building a reproducer using two netns'en and a bridge on the
host to simulate your setup, leaving out the vlan stuff for now.

I set up two namespaces ns0/ns1 with a veth pair each connected to br0 on
the host. I assign MAC addresses statically to make looking at `bridge fdb`
easier (grep ^aa:). The script looks like this (trimmed, full version
attached):

ip netns add ns0
ip link add veth0 type veth \
    peer name veth0 address aa:00:00:00:00:00 netns ns0

ip netns add ns1
ip link add veth1 type veth \
    peer name veth1 address aa:00:00:00:00:01 netns ns1

ip link add br0 address aa:bb:bb:bb:bb:bb type bridge forward_delay 0
#^ forward_delay 0 so ports start forwarding immediately instead of waiting out the STP forward delay
ip link set dev veth0 master br0
ip link set dev veth1 master br0

ip -n ns0 addr add 10.0.0.100/24 dev veth0
ip -n ns1 addr add 10.0.0.101/24 dev veth1

ip link set br0 up
ip link set dev veth0 up
ip -n ns0 link set dev veth0 up
ip link set dev veth1 up
ip -n ns1 link set dev veth1 up

ip -n ns0 link set dev lo up
ip -n ns1 link set dev lo up

ip netns exec ns0 ping -c4 10.0.0.101

Seems to work fine. So we can establish the basic functionality does work
and we need to go deeper.

Peter, can you confirm this script works on your system? If so the next
step is introducing vlans.
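Since the reproducer is meant to be run repeatedly, a teardown helper is handy. The following is only a sketch: the run and teardown_repro names and the DRY_RUN switch are made up here, and it assumes that deleting a namespace also removes the veth end inside it together with its host-side peer. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: tear down the ns0/ns1/br0 reproducer so it can be re-run.
# Commands are printed by default; set DRY_RUN=0 (as root) to execute them.

run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

teardown_repro() {
    # Deleting a namespace should also destroy the veth end inside it,
    # which takes the host-side peer (veth0/veth1) with it.
    run ip netns del ns0
    run ip netns del ns1
    run ip link del dev br0
}

teardown_repro
```

Before tearing things down, `bridge fdb | grep ^aa:` on the host shows which of the statically assigned MACs the bridge has learned and on which port.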
network namespace, just external border point "vlan199"
How did you check this?
2) now collecting data for you; honestly, I don’t see the external MAC address
on the "inet-br" object, so my previous statement was incorrect. Possibly I
might have mixed this up with another "labinet-br" (working in its limited
scope) which is IP-defined, while the "inet-br" in question is not.
3) so the question is whether the MACs learnt via vlan199 are supposed to be
paired (displayed) with the "inet-br" object and all the way up into the NS....
I want to make sure we're on the same page: how do you check if the MAC is
reaching into the NS? I assume using `ip neigh`? I'd have a look at tcpdump;
this will tell you whether ARP is even reaching the NS or not.

Something simple like

$ tcpdump -enli $IFACE 'arp or icmp'

optionally you can filter by MAC (`ether host` in pcap-filter speak):

$ tcpdump -enli $IFACE '(arp or icmp) and ether host aa:00:00:00:00:01'
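
To compare captures taken on the host side and inside the NS, the tcpdump output can be condensed into a few counters. This is a rough sketch; the arp_summary name is made up, and it only assumes the usual "Request who-has IP tell IP" and "ICMP echo request/reply" wording that tcpdump prints.

```shell
#!/bin/sh
# Sketch only: summarise tcpdump text output read from stdin.
# Example (as root; stops after 20 packets so the summary gets printed):
#   ip netns exec ns1 tcpdump -c 20 -enli veth1 'arp or icmp' | arp_summary
arp_summary() {
    awk '
        /Request who-has/ {
            # The address being asked for follows the word "who-has".
            for (i = 1; i < NF; i++)
                if ($i == "who-has") req[$(i + 1)]++
        }
        /ICMP echo request/ { echo_req++ }
        /ICMP echo reply/   { echo_rep++ }
        END {
            for (ip in req)
                printf "arp-request who-has %s: %d\n", ip, req[ip]
            printf "icmp echo request: %d, reply: %d\n", echo_req, echo_rep
        }
    '
}
```

If the host-side capture counts ARP requests that never show up in the capture inside the NS, the frames are being dropped somewhere between the bridge port and the veth peer.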

Oh and one last thing: please double and triple check that a firewall
isn't interfering.
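
One firewall-related thing that is easy to miss on a bridge is br_netfilter, which passes bridged frames through iptables/arptables when its sysctls are set to 1. A minimal sketch for checking that follows; the bridge_nf_report name is made up, and the optional directory argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch only: report the bridge-netfilter sysctls, if present.
# $1: sysctl directory (defaults to the real one; overridable for testing)
bridge_nf_report() {
    dir="${1:-/proc/sys/net/bridge}"
    if [ ! -d "$dir" ]; then
        # The directory only exists once the br_netfilter module is loaded.
        echo "br_netfilter does not appear to be loaded"
        return 0
    fi
    for knob in bridge-nf-call-arptables bridge-nf-call-iptables \
                bridge-nf-call-ip6tables; do
        if [ -r "$dir/$knob" ]; then
            echo "$knob=$(cat "$dir/$knob")"
        fi
    done
}

bridge_nf_report
```

A value of 1 means the corresponding ruleset also sees bridged traffic, so it is worth listing as well (for example with `nft list ruleset` or `iptables -S`).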

--Daniel
Daniel Gröber
2023-11-18 02:50:01 UTC
Hi Peter,
In the meantime, I was stubborn to find a solution to what I need in
order to progress and MACVLAN tech actually delivered it (private mode
enough),
I used to love macvlan too but now I do L3 instead ;P
something newer than VETH tech from what I could read, and it's
just perfect, avoiding the bridge itself. So no problem to cooperate on this
fix, I will be glad to; it can just get lower priority on your side, if you
even assigned it any 😊
I'd be happy to still track this bug down but I need you to investigate the
behaviour in your environment. If you've torn down the lab already we can
also just call it quits.

If you do want to continue some questions are still pending, see quoted below.
network namespace, just external border point "vlan199"
How did you check this?
2) now collecting data for you; honestly, I don’t see the external MAC
address on the "inet-br" object, so my previous statement was incorrect.
Possibly I might have mixed this up with another "labinet-br" (working in
its limited scope) which is IP-defined, while the "inet-br" in question is not.
3) so the question is whether the MACs learnt via vlan199 are supposed to be
paired (displayed) with the "inet-br" object and all the way up into the NS....
I want to make sure we're on the same page: how do you check if the MAC is reaching into the NS? I assume using `ip neigh`? I'd have a look at tcpdump; this will tell you whether ARP is even reaching the NS or not.
Something simple like
$ tcpdump -enli $IFACE 'arp or icmp'
$ tcpdump -enli $IFACE '(arp or icmp) and ether host aa:00:00:00:00:01'
Oh and one last thing: please double and triple check that a firewall isn't interfering.
--Daniel
Daniel Gröber
2023-12-03 10:20:01 UTC
Hi Peter,
So now we move to VLAN level?
Yeah, but I'm still waiting for the answers to my questions from two emails ago:
I'd be happy to still track this bug down but I need you to investigate
the behaviour in your environment. If you've torn down the lab already we
can also just call it quits.
If you do want to continue some questions are still pending, see quoted below.
Post by Daniel Gröber
network namespace, just external border point "vlan199"
How did you check this?
2) now collecting data for you; honestly, I don’t see the external MAC
address on the "inet-br" object, so my previous statement was incorrect.
Possibly I might have mixed this up with another "labinet-br" (working in
its limited scope) which is IP-defined, while the "inet-br" in question is not.
3) so the question is whether the MACs learnt via vlan199 are supposed to be
paired (displayed) with the "inet-br" object and all the way up into the NS....
I want to make sure we're on the same page: how do you check if the MAC
is reaching into the NS? I assume using `ip neigh`? I'd have a look at
tcpdump; this will tell you whether ARP is even reaching the NS or not.
Something simple like
$ tcpdump -enli $IFACE 'arp or icmp'
$ tcpdump -enli $IFACE '(arp or icmp) and ether host aa:00:00:00:00:01'
Oh and one last thing: please double and triple check that a firewall isn't interfering.
--Daniel
Uwe Kleine-König
2024-11-06 21:00:01 UTC
Hello Peter,
Hi Daniel,
hope you are well, had a peaceful Christmas time and are entering an even better New Year 2024, I hope... sorry for overlooking this; I even wanted to respond in early December, then got delayed again. Now I do so as it's still interesting to me!
1) yes, my sole quick method was the "ip nei" command to confirm the ARP passthrough
2) no firewall at all, plain Debian installation
3) you will not believe it --> but before Xmas and now, it all works and the MAC is passed e2e. That's such a pity. The only change I made was moving my underlay vSwitch uplink to another port, because I reconsidered my overall lab setup, yet it could hardly have improved this, as the external MAC already made it to the external (VLAN) iface of the bridge before. Anyhow, possibly I now understand that "bridge fdb" only shows learned MACs on the given interface (my VLAN199) and is not supposed to attribute them to all the others all the way up to the NS, as I attempted to guess.
Finally, with either this or the MACVLAN setup (where I found this), I have a new finding which I don’t like, as it creates a hell of a lot of duplicate traffic in the network. The problem is that the VM of either a VETH- or MACVLAN-configured IP host duplicates incoming packets on its receiving port, connected to a vSphere vSwitch, which in turn just floods them to the uplinks, where my Wireshark sniffer sees them. This is how I discovered that.
I prepared this diagram for you to see and tell. https://docs.google.com/document/d/1mNkZswDSG_OjLnsgXJvIX2tUGSEebcZf720eS29eFCA/edit?usp=sharing
I have problems understanding your mail. Under 3) you write "it all
works" but then there are still some issues about duplicate traffic
(which isn't the original problem?).

Can you please clarify if there is still something to do/fix?

Best regards
Uwe

Official
2024-11-05 07:50:01 UTC
Hello, did you receive my previous message?
Greetings.