Anuradha Karuppiah <anuradhak@cumulusnetworks.com>: Author Summary

Builds triggered by Anuradha Karuppiah <anuradhak@cumulusnetworks.com>

Builds triggered by an author are those builds which contains changes committed by the author.
50
27 (54%)
23 (46%)

Breakages and fixes

Broken means the build has failed but the previous build was successful.
Fixed means that the build was successful but the previous build has failed.
12 (24% of all builds triggered)
4 (8% of all builds triggered)
-8
Build Completed Code commits Tests
FRR › FRRREL70 › #3 1 week ago
zebra: display metric for connected routes
In a VRR/VRRP setup we can have connected routes with different costs.
So this change eliminates suppressing metric display for connected routes.

Sample output -
root@TORC11:~# vtysh -c "show ipv6 route vrf vrf1"
Codes: K - kernel route, C - connected, S - static, R - RIPng,
       O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
       v - VNC, V - VNC-Direct, A - Babel, D - SHARP, F - PBR,
       > - selected route, * - FIB route

VRF vrf1:
K * ::/0 [255/8192] unreachable (ICMP unreachable), 00:00:36
C * 2001:aa:1::/64 [0/100] is directly connected, vlan1002-v0, 00:00:36
C>* 2001:aa:1::/64 [0/90] is directly connected, vlan1002, 00:00:36

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: set connected route metric based on the devaddr metric
MACVLAN devices are typically used for applications such as VRR/VRRP that
require a second MAC address (virtual). These devices have a corresponding
SVI/VLAN device -
root@TORC11:~# ip addr show vlan1002
39: vlan1002@bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9152 qdisc noqueue master vrf1 state UP group default
    link/ether 00:02:00:00:00:2e brd ff:ff:ff:ff:ff:ff
    inet6 2001:aa:1::2/64 scope global
       valid_lft forever preferred_lft forever
root@TORC11:~# ip addr show vlan1002-v0
40: vlan1002-v0@vlan1002: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9152 qdisc noqueue master vrf1 state UP group default
    link/ether 00:00:5e:00:01:01 brd ff:ff:ff:ff:ff:ff
    inet6 2001:aa:1::a/64 metric 1024 scope global
       valid_lft forever preferred_lft forever
root@TORC11:~#

The macvlan device is used primarily for RX (VR-IP/VR-MAC). And TX is via
the SVI. To acheive that functionality the macvlan network's metric
is set to a higher value.

Zebra currently ignores the devaddr metric sent by the kernel and hardcodes
it to 0. This commit eliminates that hardcoding. If the devaddr metric
is available (METRIC_MAX) it is used for setting up the connected route
otherwise we fallback to the dev/interface metric.

Setting the macvlan metric to a higher value ensures that zebra will always
select the connected route on the SVI (and subsequently use it for next hop
resolution etc.) -
root@TORC11:~# vtysh -c "show ip route vrf vrf1 2001:aa:1::/64"
Routing entry for 2001:aa:1::/64
  Known via "connected", distance 0, metric 1024, vrf vrf1
  Last update 11:30:56 ago
  * directly connected, vlan1002-v0

Routing entry for 2001:aa:1::/64
  Known via "connected", distance 0, metric 0, vrf vrf1, best
  Last update 11:30:56 ago
  * directly connected, vlan1002

root@TORC11:~#

Ticket: CM-23511
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Testless build
FRR › FRRV601 › #3 1 week ago
zebra: set connected route metric based on the devaddr metric
MACVLAN devices are typically used for applications such as VRR/VRRP that
require a second MAC address (virtual). These devices have a corresponding
SVI/VLAN device -
root@TORC11:~# ip addr show vlan1002
39: vlan1002@bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9152 qdisc noqueue master vrf1 state UP group default
    link/ether 00:02:00:00:00:2e brd ff:ff:ff:ff:ff:ff
    inet6 2001:aa:1::2/64 scope global
       valid_lft forever preferred_lft forever
root@TORC11:~# ip addr show vlan1002-v0
40: vlan1002-v0@vlan1002: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9152 qdisc noqueue master vrf1 state UP group default
    link/ether 00:00:5e:00:01:01 brd ff:ff:ff:ff:ff:ff
    inet6 2001:aa:1::a/64 metric 1024 scope global
       valid_lft forever preferred_lft forever
root@TORC11:~#

The macvlan device is used primarily for RX (VR-IP/VR-MAC). And TX is via
the SVI. To acheive that functionality the macvlan network's metric
is set to a higher value.

Zebra currently ignores the devaddr metric sent by the kernel and hardcodes
it to 0. This commit eliminates that hardcoding. If the devaddr metric
is available (METRIC_MAX) it is used for setting up the connected route
otherwise we fallback to the dev/interface metric.

Setting the macvlan metric to a higher value ensures that zebra will always
select the connected route on the SVI (and subsequently use it for next hop
resolution etc.) -
root@TORC11:~# vtysh -c "show ip route vrf vrf1 2001:aa:1::/64"
Routing entry for 2001:aa:1::/64
  Known via "connected", distance 0, metric 1024, vrf vrf1
  Last update 11:30:56 ago
  * directly connected, vlan1002-v0

Routing entry for 2001:aa:1::/64
  Known via "connected", distance 0, metric 0, vrf vrf1, best
  Last update 11:30:56 ago
  * directly connected, vlan1002

root@TORC11:~#

Ticket: CM-23511
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: display metric for connected routes
In a VRR/VRRP setup we can have connected routes with different costs.
So this change eliminates suppressing metric display for connected routes.

Sample output -
root@TORC11:~# vtysh -c "show ipv6 route vrf vrf1"
Codes: K - kernel route, C - connected, S - static, R - RIPng,
       O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
       v - VNC, V - VNC-Direct, A - Babel, D - SHARP, F - PBR,
       > - selected route, * - FIB route

VRF vrf1:
K * ::/0 [255/8192] unreachable (ICMP unreachable), 00:00:36
C * 2001:aa:1::/64 [0/100] is directly connected, vlan1002-v0, 00:00:36
C>* 2001:aa:1::/64 [0/90] is directly connected, vlan1002, 00:00:36

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
438 passed
FRR › FRR › #2116 4 weeks ago
pimd: definition of pim-evpn origination and termination devices
Two devices have special significance to multicast VxLAN tunnels -
1. tunnel origination device -
This device is used as the source device to vxlan-encapsulate BUM traffic.
In the case of the default-vrf this is lo. And in the case of non-default
VRF this is vrf-net-device. This patchset is limited to default-VRF
underlay so all subsequent references of origination-dev are to lo. But it
is possible in the future to extend support to non-default VRFs.
Sample origination mroute on single-VTEP:
(27.0.0.7, 239.1.1.100)          Iif: lo         Oifs: uplink-1

In the case of MLAG we need to mroute traffic form the MLAG-peer so
we force the IIF to the ISL.
Sample origination mroute on MLAG-VTEP:
(36.0.0.9, 239.1.1.100)          Iif: peerlink-3.4094 Oifs: peerlink-3.4094 uplink-1

2. tunnel termination device -
This device is used in the OIL to indicate that packets matching the flow
must be vxlan terminated and overlay packets subsequently forward to the
tenants. A special device has been created for this purpose called ipmr-lo.
This is a simple dummy interface from the kernel perspective which has
special siginficance only to pimd which implicitly enabled pim on the
device and adds it to the termination mroutes.
Sample termination mroute:
(0.0.0.0, 239.1.1.100)           Iif: uplink-1   Oifs: uplink-1 ipmr-lo

PS: currently we default the termination device name to "ipmr-lo" but in
the future it is possible to provide a config command to set the
termination device.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: trigger SG update on l2-vni<=>mcast-grp changes
An SG entry is added (if one doesn't already exist) when a l2-VNI is
associated with a mcast-grp and local-vtep-ip.

And viceversa; when the last l2-vni using a MDT is removed the SG
entry is deleted.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: update vxlan mroute entries when the lo or peerlink vif is updated
For vxlan origination mroutes the IIF is pinned to
a. lo for single VTEPs
b. peerlink-rif for anycast VTEPs

This commit includes the changes to react to  pim-vifi add/del for these
devices.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
lib: return listnode on add for subsequent efficent del
Having to lookup the DLL node to delete it defeats one purpose of using
DLLs.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: vxlan (S, G) cache management
Based code for adding (S, G) entries. These entries are created when
a mcast-group and local-VTEP-IP is associated with and L2 VNI.

The parent (*, G) entries are created implicitly on the (S, G) addition
and play the role of termination entries.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: header changes for l2 vni bum-mcast-grp handling
The multicast group ip address for BUM traffic is configurable per-l2-vni.
One way to configure that is to setup a vxlan device that per-l2-vni and
specify the address against that vxlan device -
root@TORS1:~# vtysh -c "show interface vx-1000" |grep -i vxlan
  Interface Type Vxlan
  VxLAN Id 1000 VTEP IP: 27.0.0.15 Access VLAN Id 1000 Mcast 239.1.1.100
root@TORS1:~# vtysh -c "show evpn vni 1000" |grep Mcast
 Mcast group: 239.1.1.100
root@TORS1:~#

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: display commands for the pim-vxlan-sg database and worklist
Sample output:
root@TORS1:~# vtysh -c "show ip pim vxlan-groups"
Codes: I -> installed
Source          Group           Input           Output          Flags
27.0.0.7        239.1.1.101     lo                              I
*               239.1.1.100     -               ipmr-lo         I
*               239.1.1.101     -               ipmr-lo         I
27.0.0.7        239.1.1.100     lo                              I
root@TORS1:~#

root@TORS1:~# vtysh -c "show ip pim vxlan-work"
Codes: I -> installed
Source          Group           Input           Flags
27.0.0.7        239.1.1.100     lo                              I
PS: note the worklist dump is a hidden command

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: header changes to cache MLAG information needed for pim-vxlan
This information will come in from a MLAG component. Hidden commands
will also be provided for testing the interface independent of the
separate MLAG component.

PS: It is possible that this cache will be merged with the base
pim-mlag datastructures once they are available.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
lib, zebra: changes to propagate vxlan mcast SG entries to pimd
These updates act as triggers to pimd to -
1. join the MDT for rxing VxLAN encapsulated BUM traffic
2. register the local-vtep-ip as a source for the MDT

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: support for vxlan origination-upstream entries
For every (local-vtep-ip, bum-mcast-grp) registered by evpn an origination
mroute is setup by pimd. The purpose of this mroute is to forward vxlan
encapsulated BUM -
Sample mroute (single VTEP):
(27.0.0.7, 239.1.1.100)     Iif: lo      Oifs: uplink-1
Sample mroute (anycast VTEP):
(36.0.0.9, 239.1.1.100)          Iif: peerlink-3.4094\
                                       Oifs: peerlink-3.4094 uplink-1

This commit is part-1 of orignation mroute setup and includes setup
of upstream entries with vxlan properties.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: use "mcast group" instead of just mcast in show and logs
Fixup done in response to Jafar's review comments.

root@act-7726-03:~# vtysh -c  "show interface vxlan1000111"
Interface vxlan1000111 is up, line protocol is up
  Link ups:       0    last: (never)
  Link downs:     0    last: (never)
  PTM status: disabled
  vrf: default
  index 95 metric 0 mtu 1500 speed 0
  flags: <UP,BROADCAST,RUNNING,MULTICAST>
  Type: Ethernet
  HWaddr: 7e:1d:c1:d5:d1:cc
  Interface Type Vxlan
  VxLAN Id 1000111 VTEP IP: 6.0.0.28 Access VLAN Id 111
  Mcast Group 239.1.1.111 >>>>>>>>>>
  Master (bridge) ifindex 99
root@act-7726-03:~#

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: use VTEP-PIP as pim-register's ip header SIP
The unique physical IP is used as the SIP in the ip header to ensure
that pim-register-stop makes it back to the right MLAG switch.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: extern pim_null_register_send
pim_vxlan will use this for registering the local-VTEP-IP wth the RP
independent of the presence of BUM traffic.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: API for enabling pim on the vxlan term device ipmr-lo
ipmr-lo is a dummy netdev with no additional addressing requirements -
root@TORS1:~# ip -d link show ipmr-lo
28: ipmr-lo: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/ether 12:5a:ae:74:51:a2 brd ff:ff:ff:ff:ff:ff promiscuity 0
    dummy addrgenmode eui64
root@TORS1:~#

This device is used by pim-vxlan to signify multicast-vxlan-tunnel
termination.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: cleanup unncessary null pointer check
This was resulting in static analyzer warnings for subsequent usage
of the same pointer -

pimd/pim_vxlan.c:962:36: warning: Access to field 'info' results in a
dereference of a null pointer (loaded from variable 'ifp')
        pim_ifp = (struct pim_interface *)ifp->info;
                                          ^~~~~~~~~
1 warning generated.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: create pimreg implicity if ipmr-lo is the first pim device
On the first pim interface creation pimreg needs to be implicitly
created.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: hidden command to set MLAG parameters
The MLAG component on the switch is expected to provide some
properties (such as peerlink-rif) to bootstrap the anycast-VTEP
functionality. The final interface for this is being defined as
a part of the pim-mlag functionality.

This commit provides a hidden command to test the anycast-VTEP
functionality independent of the MLAG component.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: fix macro backslash alignment
Fixed in response to Jafar's comments.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: add termination mroutes for each vxlan multicast tunnels
To terminate a multicast VxLAN tunnel entry we setup a mroute with
ipmr-lo in the OIL -
(0.0.0.0, 239.1.1.100)           Iif: uplink-1   Oifs: uplink-1 ipmr-lo

This is done by the vxlan component that add ipmr-lo as a local
member to termination SG entries. In addition termination entries
are also subject to MLAG DF election on the anycast VxLAN-AA setup.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: initial infrastructure to maintain VxLAN SG database
These entries will be used over the subsequent commits for
1. vxlan-tunnel-termination handling - setup MDT to rx VxLAN encapsulated
BUM traffic.
2. vxlan-tunnel-origination handling - register local-vtep-ip as a
multicast source.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: handle VxLAN SG notifications from zebra
zebra sends (S, G) and (*, G) entries for BUM mcast groups to pimd. This
commit includes the changes to handle the notifications and trigger the
creation of (S, G) base cache in pimd.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: provide an upstream control to prevent KAT expiry
For vxlan BUM MDTs we prime the pump and register the local-VTEP-ip
as source even before the first BUM packet is rxed. This commit provides
the infrastructure changes needed for that.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: maintain mcast tunnel origination and termination SG entries
Each multicast tunnel is associated with a -
1. Tunnel origination mroute that is used for forwarding the
VxLAN encapsulated flow -
S - local VTEP-IP
G - BUM mcast-group
2. And a tunnel termination entry -
S - * (any remote VTEP)
G - BUM mcast-group

Multiple L2 VNIs can share the same BUM mcast group (and local-VTEP-IP).
Zebra maintains an mcast (SG) hash table to pass this info to pimd for
subsequent MDT setup.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: vxlan definitions for creation origination and terminatiom mroutes
pim vxlan component will create upstream entries and OIFs for
multicast VxLAN tunnel origination and termination in single-VTEP
and anycast-VTEP (MLAG) setups.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: provide a mechanism to pin the IIF for an SG entry
In the case of vxlan origination entries IIF is set to -
1. lo for single VTEPs
2. MLAG-ISL for VTEPs multihomed via MLAG.

This commit creates the necessary infrastructure by -
1. allowing the IIF to be set statically (without RPF lookup)
2. and by preventing next-hop-tracking registration

PS: Note that I have skipped additional checks in pim_upstream_del
intentionally i.e. an attempt will be made to remove nexthop-tracking
for the upstream entry (with STATIC_IIF) which will fail because of the
up-entry not being in the nh's hash table. Ideally we should maintain
a nh pointer in the up-entry to prevent this unnecessary processing.
In the abscence of that I wanted to avoid spraying STATIC_IIF checks
all over.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: per-SG control to allow any router to register itself as source
In a VxLAN-AA setup both the anycast VTEPS can send VxLAN encapsulated
traffic. This is despite the fact that the it is not-DR on the IIF
associated with the originating mroute.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: allow mroutes with IIF in the OIL
This is specifically needed to allow pim-evpn mroutes in the MLAG setup -
(36.0.0.11, 239.1.1.100)   Iif: peerlink.4094   Oifs: uplink-1, peerlink.4094

I could have gone the other way and disabled PIM_ENFORCE_LOOPFREE_MFC but
that opens the door too wide. Relaxing the checks for mlag-specific mroutes
seemed like the safer choice.

This commit provides the infrastructure to relax checks on a per-mroute
basis.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: suppress IMET route generation if flood mode is PIM-SM
IMET route is optional if the flood mode is PIM-SM and serves
no functional purpose. So this change limits type-3 route generation
to flood-mode=head-end-replication.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pim: fix order of vxlan mroutes cleanup when pimd is shutdown
1. vxlan instance cleanup needs to be done before the upstream entries are
force-flushed.
2. also vxlan callbacks need to be ignored post instance-cleanup.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: process mcast-grp rxed in the vxlan-device
BUP mcast IP address is maintained per-vxlan-device.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: add peerlink-rif to the origination-mroute's OIL
In a PIM MLAG setup (say L11<->L12 is the anycast VTEP pair) the RP
can choose to join either MLAG switch as the anycast VTEP-IP is
present on both. Let's say the RP joins L11.

Hosts are dual connected to L11<->L12 and can send traffic to either
switch. Let's say a host sends broadcast traffic to L12; now L12
must encapsulate and send the traffic toward L11. To do that the
origination-mroute on L12 must include the ISL in its OIL -
(36.0.0.9, 239.1.1.100)   Iif: peerlink-3.4094 Oifs: peerlink-3.4094

L11 has a similar mroute -
(36.0.0.9, 239.1.1.100)  Iif: peerlink-3.4094 Oifs: peerlink-3.4094 uplink-1
This mroute is used to rx traffic on peerlink-3.4094 and send it out of
uplink-1. Note that this mroute also includes the peerlink-rif in its
OIL. Explicit removal of IIF from OIL is done by the kernel (and other
dataplanes) to prevent traffic looping.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: VxLAN-AA base APIs
1. peerlink-rif as OIF in origination mroutes -
Hosts are multi-homed to the anycast-VTEP pair and can send BUM traffic to
either switch. But the RP would have only joined one MLAG switch for
pulling down the MDT. To make that work we add the peerlink/ISL as
an OIF to origination mroutes (TORC11<=>TORC12 is an anycast VTEP pair) -
root@TORC11:~# ip mr |grep "(36.0.0.9, 239.1.1.100)"
(36.0.0.9, 239.1.1.100)  Iif: peerlink-3.4094 Oifs: peerlink-3.4094 uplink-1
root@TORC11:~#
root@TORC12:~# ip mr |grep "(36.0.0.9, 239.1.1.100)"
(36.0.0.9, 239.1.1.100)  Iif: peerlink-3.4094 Oifs: peerlink-3.4094
root@TORC12:~#

2. VTEP-PIP as register source -
TORC11 and TORC12 share the same anycast VTEP IP (36.0.0.9 in the above
example). And that is the source registered by both VTEPs for all the BUM
mcast-groups. However to allow the pim register start machine to close
the SIP in the register-pkt's IP header must be set to an unique IP address.
This is the VTEP PIP.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: cli changes for pim-debug-vxlan
Sample:
root@TORC12:~# vtysh -c "show run" |grep "debug pim vxlan"
debug pim vxlan
root@TORC12:~# vtysh -c "show debug" |grep "pim vxlan"
debug pim vxlan
root@TORC12:~#

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
lib: two extra bytes were being allocated for the SG string
Fixup in response to Jafar's review comments.

This is actually old code moved in from pimd to lib. But the fixup does
make sense.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: register local VTEP-IP for each BUM MDT via NULL registers
For multicast vxlan tunnels we register the local VTEP-IP independent
of the prescence of BUM traffic i.e. we prime the pump. This
is acheived via NULL registers.

VxLAN orig entries with upstream in a PIM_REG_JOIN state are linked to
a work list for periodic NULL register transmission. Once the SPT setup
is complete the upstream-entry moves to a PIM_REG_PRUNE state and is
remved from the VxLAN work list.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: provide an api to force stop kat on an upstream entry
In the case of pim vxlan we create and keep upstream entries alive
in the abscence of traffic. So we need a mechanism to purge entries
abruptly on vxlan SG delete without having to wait for the entry
to age out.

These are again just the infrastructure changes needed for it.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
lib: move SG prefix2str APIs from pimd to lib
This is to allow zebra to use these APIs instead of re-defining.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: propagate flood mode to zebra based on the tunnel-type in the IMET route
IMET/type-3 routes are used by VTEPs to advertise the flood mode for BUM
traffic via the PMSI tunnel attribute. If a type-3 route is not rxed from
a remote-VTEP we default to "no-head-end-rep" for that remote-VTEP. In such
cases static-config such as PIM is likely used for BUM flooding.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: MLAG flag defintions in the PIM upstream entries
Two flags have been introduced per-upstream entry -
1. XXX_MLAG_VXLAN - This indicates that MLAG DF (designated-forwarded)
election is needed on the entry. In the case of pim-evpn this flag is set
for termination (*, G) entries and will be inherited by the (S, G) entries
that are created as a result of SPT switchover on the G.

2. XXX_MLAG_NON_DF - This is set on entries that have lost the
DF election. Such entries are primarily used for blackholing traffic on
one of the MLAG switches. On a hardware accelerated switch this blackholing
happens in the ASIC preventing (non-needed) traffic hitting the CPU.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: maintain flood mcast group per-l2-vni
If PIM-SM if used for BUM flooding the multicast group address can be
configured per-vxlan-device. BGP receives this config from zebra via
the L2 VNI add/update.

Sample output -
root@TORS1:~# vtysh -c "show bgp l2vpn evpn vni 1000" |grep Mcast
  Mcast group: 239.1.1.100
root@TORS1:~#

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
doc: add config sample for pim-evpn
Sample l2-vni config via ifupdown2 -
auto vx-10100
iface vx-10100
        vxlan-id 10100
        bridge-access 100
        vxlan-local-tunnelip 27.0.0.11
        vxlan-mcastgrp 239.1.1.100 >>>>>>>>>.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
zebra: install flood FDB entry only if the remote VTEP asked for HER
Remote VTEPs advertise the flood mode via IMET and the ingress VTEP
needs to perform head-end-replication of BUM packets to it only if the
PMSI tunnel type is set to ingress-replication. If a type-3 route is not
rxed or rxed with a mode other than ingress-replication we can skip
installation of the flood fdb entry for that L2-VNI. In that case the
remote VTEP is either not interested in BUM traffic or is using a
"static-config" based replication mode like PIM.

Sample output with HER -
=======================
root@TORS1:~# vtysh -c "show evpn vni 1000" |grep "Remote\|flood"
 Remote VTEPs for this VNI:
  27.0.0.8 flood: HER
root@TORS1:~#

Sample output with PIM-SM -
=========================
root@TORS2:~# vtysh -c "show evpn vni 1000" |grep "Remote\|flood"
 Remote VTEPs for this VNI:
  27.0.0.7 flood: -
root@TORS2:~#

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: handling termination device in the MFC
1. special handling of term device in orig mroutes -
The multicast-vxlan termination device ipmr-lo is added to the (*, G)
mroute -
(0.0.0.0, 239.1.1.100)          Iif: uplink-1   Oifs: uplink-1 ipmr-lo
This means that it will be inherited into all the SG entries including the
origination mroute. However we cannot terminate the traffic we originate
so some special handling is needed to exclude the termination device
in the origination entries -
27.0.0.7, 239.1.1.100)          Iif: lo         Oifs: uplink-1

2. special handling of term device on the MLAG pair -
Both MLAG switches pull down BUM-MDT traffic but only one (the DF) can
terminate the traffic. The non-DF must not exclude the termination device
from the MFC to prevent dups to the overlay.
DF -
root@TORC11:~# ip mr |grep "(0.0.0.0, 239.1.1.100)"
(0.0.0.0, 239.1.1.100)           Iif: uplink-1   Oifs: uplink-1 ipmr-lo  State: resolved
root@TORC11:~#
non-DF -
root@TORC12:~# ip mr |grep "(0.0.0.0, 239.1.1.100)"
(0.0.0.0, 239.1.1.100)           Iif: uplink-1   Oifs: uplink-1  State: resolved
root@TORC12:~#

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: maintain the mcast-grp per-l2vni
This info is propagated to bgpd for appropriate IMET route generation.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: header changes for pim-vxlan staggered processing
pim-vxlan work is maintained in a lists and processing staggered. One
such work is the generation of periodic null registers for the local
VTEP-IP.

This info is instance agnostic and maintained in a global cache.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: add new OIF type in prep for vxlan support
In an anycast VTEP setup the peerlink-rif (ISL) is added as a OIF to the
tunnel origination mroute. A new OIF protocol, VxLAN, has been added to
allow that functionalty.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: provide a per-SG control to disabled register encapsulation of data
In a MLAG setup both of the VTEPs can rx and reg-encapsulate BUM traffic
toward the RP. To prevent these duplicates we need a mechanism to disable
register encaps on specific mroutes.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: add new source types for vxlan orgination and termination mroutes
PIM VxLAN handling will create two types of upstream entries and
maintain app-specific properties against the entry.

This commit provides the header definitions for that.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
pimd: setup multicast vxlan tunnel termination device
An interface needs to be designated as "termination device" and added to
the termination mroute's OIL. This is used by kernel and ASIC backends
to vxlan-decaps matching flows.

The default termination device is expected to have the prefix (start
sub-string) "ipmr-lo". This can be made configurable if needed -
root@TORS1:~# ip -d link show ipmr-lo
28: ipmr-lo: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/ether 12:5a:ae:74:51:a2 brd ff:ff:ff:ff:ff:ff promiscuity 0
    dummy addrgenmode eui64
root@TORS1:~# ip mr

This commit includes the changes to enable pim implicitly on the device
and set it up as the vxlan-term device per-pim-instance.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
6609 passed
FRR › FRR › #2114 4 weeks ago
bgpd: lock the tenant-vrf associated with the l2-vni
The l2vni (bgpevpn instance) was maintaining a back pointer to the
tenant vrf without locking it. This would result in bgp_terminate crashing
as the tenant-vrf is released before the underlay-vrf (vpn->bgp_vrf->l2vnis
is NULL). Call stack -
BGP: [bt 3] /lib/libfrr.so.0(listnode_delete+0x11) [0x7f041c967f51]
BGP: [bt 4] /usr/lib/frr/bgpd(bgp_evpn_free+0x26) [0x55e3428eea46]
BGP: [bt 5] /lib/libfrr.so.0(hash_iterate+0x4a) [0x7f041c95f00a]
BGP: [bt 6] /usr/lib/frr/bgpd(bgp_evpn_cleanup+0x22) [0x55e3428f0a72]
BGP: [bt 7] /usr/lib/frr/bgpd(bgp_free+0x180) [0x55e342955f50]
PIM: vxlan SG (*,239.1.1.111) term mroute-up del
BGP: [bt 8] /usr/lib/frr/bgpd(bgp_delete+0x43a) [0x55e342959d7a]
BGP: [bt 9] /usr/lib/frr/bgpd(sigint+0xee) [0x55e3428d6a5e]

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Vivek Venkataraman <vivek@cumulusnetworks.com>
Reviewed-by: Chirag Shah <chirag@cumulusnetworks.com>
6609 passed
You have insufficient permissions to see all of the builds.
Build Completed Code commits Tests
FRR › TOPOPR › #384 6 months ago
zebra: set remoteseq to 0 when remote mac is deleted by bgpd
When the remote mac is deleted by bgpd we can end up with an auto mac
entry in zebra if there are neighs referring to the mac. The remote sequence
number in the auto mac entry needs to be reset to 0 as the mac entry may
have been removed on all VTEPs (including the originating one).

Now if the MAC comes back on a remote VTEP it may be added with MM=0 which
 will NOT be accepted if the remote seq was not reset in the previous step.

Ticket: CM-22707

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
bgpd: move non-best local path checks outside the function
This change is a fixup to -
7b5e18 -  bgpd: use IP address as tie breaker if the MM seq number is the
same

And is being done in response to review comments. This commit brings no
functional change; simply moves around code for easier maintanence.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
6 of 1240 failed
FRR › TOPOPR › #383 6 months ago
zebra: send a local-mac del to bgpd on mac mod to remote
When events cross paths between bgp and zebra bgpd could end up with a
dangling local MAC entry. Consider the following sequence of events on
rack-1 -
1. MAC1 has MM sequence number 1 and points to rack-3
2. Now a packet is rxed locally on rack-1 and rack-2 (simultaneously) with
source-mac=MAC1.
3. This would cause rack-1 and rack-2 to set the MM seq to 2 and
simultaneously report the MAC as local.
4. Now let's say on rack-1 zebra's MACIP_ADD is in bgpd's queue. bgpd
accepts rack-3's update and sends a remote MACIP add to zebra with MM=2.
5. zebra updates the MAC entry from local=>remote.
6. bgpd now processes zebra's "stale local" making it the best path.
However zebra no longer has a local MAC entry.

At this point bgpd and zebra are effectively out of sync i.e. bgpd has a
local-MAC which is not present in the kernel or in zebra.

To handle this window zebra should send a local MAC delete to bgpd on
modifying its cache to remote.

Ticket: CM-22687
Reviewed By: CCR-7935

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: use IP address as tie breaker if the MM seq number is the same
Same sequence number handling is specified by RFC 7432 -
[
If two (or more) PEs advertise the same MAC address with the same
sequence number but different Ethernet segment identifiers, a PE that
receives these routes selects the route advertised by the PE with the
lowest IP address as the best route.

If the PE is the originator of the MAC route and it receives the same
MAC address with the same sequence number that it generated, it will
compare its own IP address with the IP address of the remote PE and
will select the lowest IP.  If its own route is not the best one, it
will withdraw the route.
]

To implement that specification this commit uses nexthop IP as a tie
breaker between two paths of equal seq number with lower IP winning.

Now if a local path already exists with the same sequence number but higher
(local-VTEP) IP it is evicted (deleted and withdrawn from the peers) and
the winning new remote path is installed in zebra. This is existing code
and handled implicitly via evpn_route_select_install.

If a local path is rxed from zebra with the same sequence as the
current remote winner it is rejected (not installed in the bgp
routing tables) and zebra is asked to re-install the older/remote winner.
This is a race condition that can only happen if bgp's add and zebra's add
cross paths. Additional handling has been added in this commit via
evpn_cleanup_local_non_best_route to take care of the race condition.

Ticket: CM-22674
Reviewed By: CCR-7937

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: hidden commands to add/del a local mac
local mac add/del comes from zebra. the hidden commands help verify
various race conditions between bgp and zebra.

Ticket: CM-22687
Reviewed By: CCR-7939

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: perform route selection again when the local path is deleted
This is needed to install the remote dst when a more preferred local
path is removed.

Ticket: CM-22685
Reviewed By: CCR-7936

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: set remoteseq to 0 when remote mac is deleted by bgpd
When the remote mac is deleted by bgpd we can end up with an auto mac
entry in zebra if there are neighs referring to the mac. The remote sequence
number in the auto mac entry needs to be reset to 0 as the mac entry may
have been removed on all VTEPs (including the originating one).

Now if the MAC comes back on a remote VTEP it may be added with MM=0 which
 will NOT be accepted if the remote seq was not reset in the previous step.

Ticket: CM-22707

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
bgpd: move non-best local path checks outside the function
This change is a fixup to -
7b5e18 -  bgpd: use IP address as tie breaker if the MM seq number is the
same

And is being done in response to review comments. This commit brings no
functional change; simply moves around code for easier maintanence.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: make neigh active when it is modified from local to remote
This is a fixup to commit -
f32ea5c07 - zebra: act on kernel notifications for remote neighbors

The original commit handled a race condition between kernel and zebra
that would result in an inconsistent state i.e.
kernel has an offload/remote neigh
zebra has a local neigh

The original commit missed setting the neigh to active when zebra
tried to resolve the inconsistency by modifying the local neigh to
remote neigh on hearing back its own kernel update. Fixed here.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>

Ticket: CM-22700
5 of 1240 failed
FRR › TOPOPR › #382 6 months ago
zebra: make neigh active when it is modified from local to remote
This is a fixup to commit -
f32ea5c07 - zebra: act on kernel notifications for remote neighbors

The original commit handled a race condition between kernel and zebra
that would result in an inconsistent state i.e.
kernel has an offload/remote neigh
zebra has a local neigh

The original commit missed setting the neigh to active when zebra
tried to resolve the inconsistency by modifying the local neigh to
remote neigh on hearing back its own kernel update. Fixed here.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>

Ticket: CM-22700
zebra: set remoteseq to 0 when remote mac is deleted by bgpd
When the remote mac is deleted by bgpd we can end up with an auto mac
entry in zebra if there are neighs referring to the mac. The remote sequence
number in the auto mac entry needs to be reset to 0 as the mac entry may
have been removed on all VTEPs (including the originating one).

Now if the MAC comes back on a remote VTEP it may be added with MM=0 which
 will NOT be accepted if the remote seq was not reset in the previous step.

Ticket: CM-22707

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
bgpd: use IP address as tie breaker if the MM seq number is the same
Same sequence number handling is specified by RFC 7432 -
[
If two (or more) PEs advertise the same MAC address with the same
sequence number but different Ethernet segment identifiers, a PE that
receives these routes selects the route advertised by the PE with the
lowest IP address as the best route.

If the PE is the originator of the MAC route and it receives the same
MAC address with the same sequence number that it generated, it will
compare its own IP address with the IP address of the remote PE and
will select the lowest IP.  If its own route is not the best one, it
will withdraw the route.
]

To implement that specification this commit uses nexthop IP as a tie
breaker between two paths of equal seq number with lower IP winning.

Now if a local path already exists with the same sequence number but higher
(local-VTEP) IP it is evicted (deleted and withdrawn from the peers) and
the winning new remote path is installed in zebra. This is existing code
and handled implicitly via evpn_route_select_install.

If a local path is rxed from zebra with the same sequence as the
current remote winner it is rejected (not installed in the bgp
routing tables) and zebra is asked to re-install the older/remote winner.
This is a race condition that can only happen if bgp's add and zebra's add
cross paths. Additional handling has been added in this commit via
evpn_cleanup_local_non_best_route to take care of the race condition.

Ticket: CM-22674
Reviewed By: CCR-7937

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: send a local-mac del to bgpd on mac mod to remote
When events cross paths between bgp and zebra bgpd could end up with a
dangling local MAC entry. Consider the following sequence of events on
rack-1 -
1. MAC1 has MM sequence number 1 and points to rack-3
2. Now a packet is rxed locally on rack-1 and rack-2 (simultaneously) with
source-mac=MAC1.
3. This would cause rack-1 and rack-2 to set the MM seq to 2 and
simultaneously report the MAC as local.
4. Now let's say on rack-1 zebra's MACIP_ADD is in bgpd's queue. bgpd
accepts rack-3's update and sends a remote MACIP add to zebra with MM=2.
5. zebra updates the MAC entry from local=>remote.
6. bgpd now processes zebra's "stale local" making it the best path.
However zebra no longer has a local MAC entry.

At this point bgpd and zebra are effectively out of sync i.e. bgpd has a
local-MAC which is not present in the kernel or in zebra.

To handle this window zebra should send a local MAC delete to bgpd on
modifying its cache to remote.

Ticket: CM-22687
Reviewed By: CCR-7935

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: perform route selection again when the local path is deleted
This is needed to install the remote dst when a more preferred local
path is removed.

Ticket: CM-22685
Reviewed By: CCR-7936

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: hidden commands to add/del a local mac
local mac add/del comes from zebra. the hidden commands help verify
various race conditions between bgp and zebra.

Ticket: CM-22687
Reviewed By: CCR-7939

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: move non-best local path checks outside the function
This change is a fixup to -
7b5e18 -  bgpd: use IP address as tie breaker if the MM seq number is the
same

And is being done in response to review comments. This commit brings no
functional change; simply moves around code for easier maintanence.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
6 of 1240 failed
FRR › TOPOPR › #343 8 months ago
bgpd: unregister VNI learning from zebra on default instance delete
The code for this was always there but was not kicking in because of an
incorrect dependency on is_evpn_enabled. This API attempts to locate the
default instance from bgp_master's instance list. Only the instance
currently being deleted has already been removed from the instance list
by the time bgp_delete->bgp_zebra_instance_deregister is executed.

Symptom of this bug used to show up when a default instance is deleted
and created again. In that case bgp_zebra_instance_register would not be
effective as zebra ignores the register as dup (dereg didn't happen in the
first place) so bgpd wouldn't reload already configured L2-VNIs.
root@cel-sea-03:~# net show bgp l2vpn evpn vni |grep 1000
* 1000       L2   169.253.0.11:9   6646:1000  6646:1000 vrf1
root@cel-sea-03:~# grep "router bgp"  /etc/frr/frr.conf
router bgp 6646
root@cel-sea-03:~# sed -i 's/6646/6656/' /etc/frr/frr.conf
root@cel-sea-03:~# grep "router bgp"  /etc/frr/frr.conf
router bgp 6656
root@cel-sea-03:~# systemctl reload frr
root@cel-sea-03:~# net show bgp l2vpn evpn vni |grep 1000
root@cel-sea-03:~#

Fix simply changes the order of dereg to make
bgp_zebra_instance_deregister actually happen (by doing it before the
default instance is removed from the master list).

Ticket: CM-21566

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
5 of 972 failed
FRR › TOPOPR › #342 8 months ago
bgpd: unregister VNI learning from zebra on default instance delete
The code for this was always there but was not kicking in because of an
incorrect dependency on is_evpn_enabled. This API attempts to locate the
default instance from bgp_master's instance list. Only the instance
currently being deleted has already been removed from the instance list
by the time bgp_delete->bgp_zebra_instance_deregister is executed.

Symptom of this bug used to show up when a default instance is deleted
and created again. In that case bgp_zebra_instance_register would not be
effective as zebra ignores the register as dup (dereg didn't happen in the
first place) so bgpd wouldn't reload already configured L2-VNIs.
root@cel-sea-03:~# net show bgp l2vpn evpn vni |grep 1000
* 1000       L2   169.253.0.11:9   6646:1000  6646:1000 vrf1
root@cel-sea-03:~# grep "router bgp"  /etc/frr/frr.conf
router bgp 6646
root@cel-sea-03:~# sed -i 's/6646/6656/' /etc/frr/frr.conf
root@cel-sea-03:~# grep "router bgp"  /etc/frr/frr.conf
router bgp 6656
root@cel-sea-03:~# systemctl reload frr
root@cel-sea-03:~# net show bgp l2vpn evpn vni |grep 1000
root@cel-sea-03:~#

Fix simply changes the order of dereg to make
bgp_zebra_instance_deregister actually happen (by doing it before the
default instance is removed from the master list).

Ticket: CM-21566

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
6 of 972 failed
FRR › TOPOPR › #339 8 months ago
bgpd: unregister VNI learning from zebra on default instance delete
The code for this was always there but was not kicking in because of an
incorrect dependency on is_evpn_enabled. This API attempts to locate the
default instance from bgp_master's instance list. Only the instance
currently being deleted has already been removed from the instance list
by the time bgp_delete->bgp_zebra_instance_deregister is executed.

Symptom of this bug used to show up when a default instance is deleted
and created again. In that case bgp_zebra_instance_register would not be
effective as zebra ignores the register as dup (dereg didn't happen in the
first place) so bgpd wouldn't reload already configured L2-VNIs.
root@cel-sea-03:~# net show bgp l2vpn evpn vni |grep 1000
* 1000       L2   169.253.0.11:9   6646:1000  6646:1000 vrf1
root@cel-sea-03:~# grep "router bgp"  /etc/frr/frr.conf
router bgp 6646
root@cel-sea-03:~# sed -i 's/6646/6656/' /etc/frr/frr.conf
root@cel-sea-03:~# grep "router bgp"  /etc/frr/frr.conf
router bgp 6656
root@cel-sea-03:~# systemctl reload frr
root@cel-sea-03:~# net show bgp l2vpn evpn vni |grep 1000
root@cel-sea-03:~#

Fix simply changes the order of dereg to make
bgp_zebra_instance_deregister actually happen (by doing it before the
default instance is removed from the master list).

Ticket: CM-21566

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
689 passed
FRR › TOPOPR › #338 8 months ago
bgpd: unregister VNI learning from zebra on default instance delete
The code for this was always there but was not kicking in because of an
incorrect dependency on is_evpn_enabled. This API attempts to locate the
default instance from bgp_master's instance list. Only the instance
currently being deleted has already been removed from the instance list
by the time bgp_delete->bgp_zebra_instance_deregister is executed.

Symptom of this bug used to show up when a default instance is deleted
and created again. In that case bgp_zebra_instance_register would not be
effective as zebra ignores the register as dup (dereg didn't happen in the
first place) so bgpd wouldn't reload already configured L2-VNIs.
root@cel-sea-03:~# net show bgp l2vpn evpn vni |grep 1000
* 1000       L2   169.253.0.11:9   6646:1000  6646:1000 vrf1
root@cel-sea-03:~# grep "router bgp"  /etc/frr/frr.conf
router bgp 6646
root@cel-sea-03:~# sed -i 's/6646/6656/' /etc/frr/frr.conf
root@cel-sea-03:~# grep "router bgp"  /etc/frr/frr.conf
router bgp 6656
root@cel-sea-03:~# systemctl reload frr
root@cel-sea-03:~# net show bgp l2vpn evpn vni |grep 1000
root@cel-sea-03:~#

Fix simply changes the order of dereg to make
bgp_zebra_instance_deregister actually happen (by doing it before the
default instance is removed from the master list).

Ticket: CM-21566

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
10 of 824 failed
FRR › TOPOPR › #284 9 months ago
zebra: install EVPN gateway MAC as static/sticky
SVI interface ip/hw address is advertised by the GW VTEP (say TORC11) with
the default-GW community. And the rxing VTEP (say TORC21) installs the GW
MAC as a dynamic FDB entry. The problem with this is a rogue packet from a
server with the GW MAC as source can cause a station move resulting in
TORC21 hijacking the GW MAC address and blackholing all inter rack traffic.

Fix is to make the GW MAC "sticky" pinning it to the GW VTEP (TORC11). This
commit does it by installing the FDB entry as static if the MACIP route is
received with the default-GW community (mimics handling of
mac-mobility-with-sticky community)

Sample output with from TORC12 with TORC11 setup as gateway -
root@TORC21:~# net show evpn mac vni 1004 mac 00:00:5e:00:01:01
MAC: 00:00:5e:00:01:01
 Remote VTEP: 36.0.0.11 Remote-gateway Mac
 Neighbors:
    45.0.4.1
    fe80::200:5eff:fe00:101
    2001:fee1:0:4::1

root@TORC21:~# bridge fdb show |grep 00:00:5e:00:01:01|grep 1004
00:00:5e:00:01:01 dev vx-1004 vlan 1004 master bridge static
00:00:5e:00:01:01 dev vx-1004 dst 36.0.0.11 self static
root@TORC21:~#

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Ticket: CM-21508
Testless build
FRR › TOPOPR › #283 9 months ago
zebra: install EVPN gateway MAC as static/sticky
SVI interface ip/hw address is advertised by the GW VTEP (say TORC11) with
the default-GW community. And the rxing VTEP (say TORC21) installs the GW
MAC as a dynamic FDB entry. The problem with this is a rogue packet from a
server with the GW MAC as source can cause a station move resulting in
TORC21 hijacking the GW MAC address and blackholing all inter rack traffic.

Fix is to make the GW MAC "sticky" pinning it to the GW VTEP (TORC11). This
commit does it by installing the FDB entry as static if the MACIP route is
received with the default-GW community (mimics handling of
mac-mobility-with-sticky community)

Sample output with from TORC12 with TORC11 setup as gateway -
root@TORC21:~# net show evpn mac vni 1004 mac 00:00:5e:00:01:01
MAC: 00:00:5e:00:01:01
 Remote VTEP: 36.0.0.11 Remote-gateway Mac
 Neighbors:
    45.0.4.1
    fe80::200:5eff:fe00:101
    2001:fee1:0:4::1

root@TORC21:~# bridge fdb show |grep 00:00:5e:00:01:01|grep 1004
00:00:5e:00:01:01 dev vx-1004 vlan 1004 master bridge static
00:00:5e:00:01:01 dev vx-1004 dst 36.0.0.11 self static
root@TORC21:~#

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Ticket: CM-21508
Testless build
FRR › TOPOPR › #282 9 months ago
zebra: install EVPN gateway MAC as static/sticky
SVI interface ip/hw address is advertised by the GW VTEP (say TORC11) with
the default-GW community. And the rxing VTEP (say TORC21) installs the GW
MAC as a dynamic FDB entry. The problem with this is a rogue packet from a
server with the GW MAC as source can cause a station move resulting in
TORC21 hijacking the GW MAC address and blackholing all inter rack traffic.

Fix is to make the GW MAC "sticky" pinning it to the GW VTEP (TORC11). This
commit does it by installing the FDB entry as static if the MACIP route is
received with the default-GW community (mimics handling of
mac-mobility-with-sticky community)

Sample output with from TORC12 with TORC11 setup as gateway -
root@TORC21:~# net show evpn mac vni 1004 mac 00:00:5e:00:01:01
MAC: 00:00:5e:00:01:01
 Remote VTEP: 36.0.0.11 Remote-gateway Mac
 Neighbors:
    45.0.4.1
    fe80::200:5eff:fe00:101
    2001:fee1:0:4::1

root@TORC21:~# bridge fdb show |grep 00:00:5e:00:01:01|grep 1004
00:00:5e:00:01:01 dev vx-1004 vlan 1004 master bridge static
00:00:5e:00:01:01 dev vx-1004 dst 36.0.0.11 self static
root@TORC21:~#

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Ticket: CM-21508
Testless build
Build Completed Code commits Tests
FRR › FRR › #2114 4 weeks ago
bgpd: lock the tenant-vrf associated with the l2-vni
The l2vni (bgpevpn instance) was maintaining a back pointer to the
tenant vrf without locking it. This would result in bgp_terminate crashing
as the tenant-vrf is released before the underlay-vrf (vpn->bgp_vrf->l2vnis
is NULL). Call stack -
BGP: [bt 3] /lib/libfrr.so.0(listnode_delete+0x11) [0x7f041c967f51]
BGP: [bt 4] /usr/lib/frr/bgpd(bgp_evpn_free+0x26) [0x55e3428eea46]
BGP: [bt 5] /lib/libfrr.so.0(hash_iterate+0x4a) [0x7f041c95f00a]
BGP: [bt 6] /usr/lib/frr/bgpd(bgp_evpn_cleanup+0x22) [0x55e3428f0a72]
BGP: [bt 7] /usr/lib/frr/bgpd(bgp_free+0x180) [0x55e342955f50]
PIM: vxlan SG (*,239.1.1.111) term mroute-up del
BGP: [bt 8] /usr/lib/frr/bgpd(bgp_delete+0x43a) [0x55e342959d7a]
BGP: [bt 9] /usr/lib/frr/bgpd(sigint+0xee) [0x55e3428d6a5e]

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Vivek Venkataraman <vivek@cumulusnetworks.com>
Reviewed-by: Chirag Shah <chirag@cumulusnetworks.com>
6609 passed
FRR › FRR › #2037 1 month ago
bgp: fix misc evpn prefix match problems caused by using incorrect prefixlen
The evpn route prefix len was being hardcoded to 224 bits while the
length of a mac-ip addr is actually 288. Because of this many problems were
seen in the evpn-tests. The sample below is from a test that does a vm-move
to verify extended-evpn-mac-mobility - IP1-M1 => IP2->M1. You can see two
local neighs but only one was inserted into the per-vni route table.

root@TORC11:~# net show evpn arp vni 1001 |grep "2001:fee1:0:1::10\|2001:fee1:0:1::11"
2001:fee1:0:1::10       local  active   00:54:6f:7c:74:64
2001:fee1:0:1::11       local  active   00:54:6f:7c:74:64
root@TORC11:~# net show bgp l2vpn evpn route vni 1001 |grep "2001:fee1:0:1::10\|2001:fee1:0:1::11"
*> [2]:[0]:[48]:[00:54:6f:7c:74:64]:[128]:[2001:fee1:0:1::11]
root@TORC11:~#

Similarly other traffic loss problems were seen because of one prefix updating
another prefix's route.

I think the 224-bits came from the packet format definition of type-2 routes.
However the way FRR maintains the key is very different than the format in
the packet so it seems best to just sizeof the addr.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
6998 passed
FRR › FRR › #1879 3 months ago
bgpd: reinstate current bgp best route on an inactive neigh del
When an inactive-neigh delete is rxed bgp will not have a local path to
remove (and re-run path selection). Instead it simply re-installs the
current best remote path if any.

Ticket: CM-23018
Testing Done: evpn-min

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: propagate inactive neigh deletes to bgpd
When a local neigh is added with a MAC that is remote or absent the
neigh is kept in zebra as local/in-active. But not propagated to bgpd.
Similarly when an inactive neigh is deleted the del-msg is not propagated
to bgpd.

Without this change bgp and zebra would fall out of sync as that
bgp would not know to rerun bestpath and for it to reinstall a
known remote path for the mac-ip in question.  To fix this we
now propagate inactive neigh deletes to bgpd.

Ticket: CM-23018
Testing Done:
1. evpn-min
2. manually triggered the out-of-sync state and verified the fix

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: display metric for connected routes
In a VRR/VRRP setup we can have connected routes with different costs.
So this change eliminates suppressing metric display for connected routes.

Sample output -
root@TORC11:~# vtysh -c "show ipv6 route vrf vrf1"
Codes: K - kernel route, C - connected, S - static, R - RIPng,
       O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
       v - VNC, V - VNC-Direct, A - Babel, D - SHARP, F - PBR,
       > - selected route, * - FIB route

VRF vrf1:
K * ::/0 [255/8192] unreachable (ICMP unreachable), 00:00:36
C * 2001:aa:1::/64 [0/100] is directly connected, vlan1002-v0, 00:00:36
C>* 2001:aa:1::/64 [0/90] is directly connected, vlan1002, 00:00:36

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: fill the zebra mac-ip route via a common api
Move the info filling for zebra mac-ip install (sent by bgpd) to a
common place.

The commit also fixes missing ROUTER flag for one of the cases
added in a code branch that doesn't have the ROUTER changes -
[
6d8c603a
bgpd: use IP address as tie breaker if the MM seq number is the same
]

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: set connected route metric based on the devaddr metric
MACVLAN devices are typically used for applications such as VRR/VRRP that
require a second MAC address (virtual). These devices have a corresponding
SVI/VLAN device -
root@TORC11:~# ip addr show vlan1002
39: vlan1002@bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9152 qdisc noqueue master vrf1 state UP group default
    link/ether 00:02:00:00:00:2e brd ff:ff:ff:ff:ff:ff
    inet6 2001:aa:1::2/64 scope global
       valid_lft forever preferred_lft forever
root@TORC11:~# ip addr show vlan1002-v0
40: vlan1002-v0@vlan1002: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9152 qdisc noqueue master vrf1 state UP group default
    link/ether 00:00:5e:00:01:01 brd ff:ff:ff:ff:ff:ff
    inet6 2001:aa:1::a/64 metric 1024 scope global
       valid_lft forever preferred_lft forever
root@TORC11:~#

The macvlan device is used primarily for RX (VR-IP/VR-MAC). And TX is via
the SVI. To acheive that functionality the macvlan network's metric
is set to a higher value.

Zebra currently ignores the devaddr metric sent by the kernel and hardcodes
it to 0. This commit eliminates that hardcoding. If the devaddr metric
is available (METRIC_MAX) it is used for setting up the connected route
otherwise we fallback to the dev/interface metric.

Setting the macvlan metric to a higher value ensures that zebra will always
select the connected route on the SVI (and subsequently use it for next hop
resolution etc.) -
root@TORC11:~# vtysh -c "show ip route vrf vrf1 2001:aa:1::/64"
Routing entry for 2001:aa:1::/64
  Known via "connected", distance 0, metric 1024, vrf vrf1
  Last update 11:30:56 ago
  * directly connected, vlan1002-v0

Routing entry for 2001:aa:1::/64
  Known via "connected", distance 0, metric 0, vrf vrf1, best
  Last update 11:30:56 ago
  * directly connected, vlan1002

root@TORC11:~#

Ticket: CM-23511
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
6996 passed
FRR › FRR › #1675 6 months ago
bgpd: move non-best local path checks outside the function
This change is a fixup to -
7b5e18 -  bgpd: use IP address as tie breaker if the MM seq number is the
same

And is being done in response to review comments. This commit brings no
functional change; simply moves around code for easier maintanence.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: send a local-mac del to bgpd on mac mod to remote
When events cross paths between bgp and zebra bgpd could end up with a
dangling local MAC entry. Consider the following sequence of events on
rack-1 -
1. MAC1 has MM sequence number 1 and points to rack-3
2. Now a packet is rxed locally on rack-1 and rack-2 (simultaneously) with
source-mac=MAC1.
3. This would cause rack-1 and rack-2 to set the MM seq to 2 and
simultaneously report the MAC as local.
4. Now let's say on rack-1 zebra's MACIP_ADD is in bgpd's queue. bgpd
accepts rack-3's update and sends a remote MACIP add to zebra with MM=2.
5. zebra updates the MAC entry from local=>remote.
6. bgpd now processes zebra's "stale local" making it the best path.
However zebra no longer has a local MAC entry.

At this point bgpd and zebra are effectively out of sync i.e. bgpd has a
local-MAC which is not present in the kernel or in zebra.

To handle this window zebra should send a local MAC delete to bgpd on
modifying its cache to remote.

Ticket: CM-22687
Reviewed By: CCR-7935

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: set remoteseq to 0 when remote mac is deleted by bgpd
When the remote mac is deleted by bgpd we can end up with an auto mac
entry in zebra if there are neighs referring to the mac. The remote sequence
number in the auto mac entry needs to be reset to 0 as the mac entry may
have been removed on all VTEPs (including the originating one).

Now if the MAC comes back on a remote VTEP it may be added with MM=0 which
 will NOT be accepted if the remote seq was not reset in the previous step.

Ticket: CM-22707

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
bgpd: hidden commands to add/del a local mac
local mac add/del comes from zebra. the hidden commands help verify
various race conditions between bgp and zebra.

Ticket: CM-22687
Reviewed By: CCR-7939

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: make neigh active when it is modified from local to remote
This is a fixup to commit -
f32ea5c07 - zebra: act on kernel notifications for remote neighbors

The original commit handled a race condition between kernel and zebra
that would result in an inconsistent state i.e.
kernel has an offload/remote neigh
zebra has a local neigh

The original commit missed setting the neigh to active when zebra
tried to resolve the inconsistency by modifying the local neigh to
remote neigh on hearing back its own kernel update. Fixed here.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>

Ticket: CM-22700
bgpd: use IP address as tie breaker if the MM seq number is the same
Same sequence number handling is specified by RFC 7432 -
[
If two (or more) PEs advertise the same MAC address with the same
sequence number but different Ethernet segment identifiers, a PE that
receives these routes selects the route advertised by the PE with the
lowest IP address as the best route.

If the PE is the originator of the MAC route and it receives the same
MAC address with the same sequence number that it generated, it will
compare its own IP address with the IP address of the remote PE and
will select the lowest IP.  If its own route is not the best one, it
will withdraw the route.
]

To implement that specification this commit uses nexthop IP as a tie
breaker between two paths of equal seq number with lower IP winning.

Now if a local path already exists with the same sequence number but higher
(local-VTEP) IP it is evicted (deleted and withdrawn from the peers) and
the winning new remote path is installed in zebra. This is existing code
and handled implicitly via evpn_route_select_install.

If a local path is rxed from zebra with the same sequence as the
current remote winner it is rejected (not installed in the bgp
routing tables) and zebra is asked to re-install the older/remote winner.
This is a race condition that can only happen if bgp's add and zebra's add
cross paths. Additional handling has been added in this commit via
evpn_cleanup_local_non_best_route to take care of the race condition.

Ticket: CM-22674
Reviewed By: CCR-7937

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: perform route selection again when the local path is deleted
This is needed to install the remote dst when a more preferred local
path is removed.

Ticket: CM-22685
Reviewed By: CCR-7936

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
6890 passed