Anuradha Karuppiah <anuradhak@cumulusnetworks.com>: Author Summary

Builds triggered by Anuradha Karuppiah <anuradhak@cumulusnetworks.com>

Builds triggered by an author are those builds which contains changes committed by the author.
45
24 (53%)
21 (47%)

Breakages and fixes

Broken means the build has failed but the previous build was successful.
Fixed means that the build was successful but the previous build has failed.
13 (29% of all builds triggered)
4 (9% of all builds triggered)
-9
Build Completed Code commits Tests
FRR › FRR › #2006 4 days ago
zebra: EVPN DAD trigger was causing zebra to crash
Duplicate address detection and recovery was relying on the l2-vni backptr
in the neighbor entry which was simply not initialized resulting in
a NULL pointer access in a setup with dup-addressed VMs -
VM1:{IP1,M1} and VM2:{IP1,M2}

Call stack:
(gdb) bt 6
    at lib/sigevent.c:249
    nbr=nbr@entry=0x559347f901d0, vtep_ip=..., vtep_ip@entry=..., do_dad=do_dad@entry=true,
    is_dup_detect=is_dup_detect@entry=0x7ffc7f6be59f, is_local=is_local@entry=true)
    at ./lib/ipaddr.h:86
    ip=0x7ffc7f6be6f0, ifp=0x559347f901d0, zvni=0x559347f86800) at zebra/zebra_vxlan.c:3152
(More stack frames follow...)
(gdb) p nbr->zvni
$8 = (zebra_vni_t *) 0x0 <<<<<<<<<<<<<<<<<<<<
(gdb)

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
4 of 7040 failed
FRR › RPKI › #938 4 days ago
zebra: EVPN DAD trigger was causing zebra to crash
Duplicate address detection and recovery was relying on the l2-vni backptr
in the neighbor entry which was simply not initialized resulting in
a NULL pointer access in a setup with dup-addressed VMs -
VM1:{IP1,M1} and VM2:{IP1,M2}

Call stack:
(gdb) bt 6
    at lib/sigevent.c:249
    nbr=nbr@entry=0x559347f901d0, vtep_ip=..., vtep_ip@entry=..., do_dad=do_dad@entry=true,
    is_dup_detect=is_dup_detect@entry=0x7ffc7f6be59f, is_local=is_local@entry=true)
    at ./lib/ipaddr.h:86
    ip=0x7ffc7f6be6f0, ifp=0x559347f901d0, zvni=0x559347f86800) at zebra/zebra_vxlan.c:3152
(More stack frames follow...)
(gdb) p nbr->zvni
$8 = (zebra_vni_t *) 0x0 <<<<<<<<<<<<<<<<<<<<
(gdb)

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Testless build
FRR › RPKI › #929 1 week ago
bgpd: prevent type-5 route creation if bgp_vrf->l3_vni is 0
After a router reboot the L3 network via it converges before the L2
network. This is because MLAG intentionally holds down bridge-access
and vxlan-network ports for some time (MLAG init-delay) to prevent traffic
from switching to a router that is not fully ready. This also means that
routes (from vrf-peering sessions) that qualify for evpn type-5
advertisments are available long before the L3-VNI is available for that
tenant VRF. In these windows bgpd was adding these evpn-type-5 routes with
a L3-VNI of 0 (which was not fixed up after the L3-VNI became available) -

BGP routing table entry for 100.0.0.1:2:[5]:[0]:[0]:[32]:[200.1.1.1]
Paths: (1 available, best #1)
  Advertised to non peer-group peers:
  MSP1(uplink-1) MSP2(uplink-2)
  Route [5]:[0]:[0]:[32]:[200.1.1.1] VNI 0 >>>>>>>>
  65001 65535
    36.0.0.9 from 0.0.0.0 (27.0.0.9)
      Origin incomplete, metric 0, valid, sourced, local, bestpath-from-AS 65001, best
      Extended Community: ET:8 RT:5544:4001 Rmac:44:38:39:ff:ff:01
      AddPath ID: RX 0, TX 327
      Last update: Wed Feb 27 18:37:10 2019

Fix is to defer creating type-5 routes till the L3-VNI is available for
that tenant VRF (this was already being done for most cases; fixup takes
care of some that missed the check).

Ticket: CM-24022

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Testless build
FRR › FRR › #1994 1 week ago
bgpd: prevent type-5 route creation if bgp_vrf->l3_vni is 0
After a router reboot the L3 network via it converges before the L2
network. This is because MLAG intentionally holds down bridge-access
and vxlan-network ports for some time (MLAG init-delay) to prevent traffic
from switching to a router that is not fully ready. This also means that
routes (from vrf-peering sessions) that qualify for evpn type-5
advertisments are available long before the L3-VNI is available for that
tenant VRF. In these windows bgpd was adding these evpn-type-5 routes with
a L3-VNI of 0 (which was not fixed up after the L3-VNI became available) -

BGP routing table entry for 100.0.0.1:2:[5]:[0]:[0]:[32]:[200.1.1.1]
Paths: (1 available, best #1)
  Advertised to non peer-group peers:
  MSP1(uplink-1) MSP2(uplink-2)
  Route [5]:[0]:[0]:[32]:[200.1.1.1] VNI 0 >>>>>>>>
  65001 65535
    36.0.0.9 from 0.0.0.0 (27.0.0.9)
      Origin incomplete, metric 0, valid, sourced, local, bestpath-from-AS 65001, best
      Extended Community: ET:8 RT:5544:4001 Rmac:44:38:39:ff:ff:01
      AddPath ID: RX 0, TX 327
      Last update: Wed Feb 27 18:37:10 2019

Fix is to defer creating type-5 routes till the L3-VNI is available for
that tenant VRF (this was already being done for most cases; fixup takes
care of some that missed the check).

Ticket: CM-24022

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
6581 passed
FRR › FRRPULLREQ › #6802 1 month ago
bgpd: display label as part of the PMSI tunnel attribute
root@TORS1:~# net show bgp l2vpn evpn route rd 27.0.0.15:4 type multicast
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[ESI]:[EthTag]:[IPlen]:[IP]

BGP routing table entry for 27.0.0.15:4:[3]:[0]:[32]:[27.0.0.15]
Paths: (1 available, best #1)
  Advertised to non peer-group peers:
  MSP1(uplink-1) MSP2(uplink-2)
  Route [3]:[0]:[32]:[27.0.0.15] VNI 1003
  Local
    27.0.0.15 from 0.0.0.0 (27.0.0.15)
      Origin IGP, weight 32768, valid, sourced, local, bestpath-from-AS Local, best
      Extended Community: ET:8 RT:5550:1003
      AddPath ID: RX 0, TX 10
      Last update: Thu Feb  7 00:17:24 2019
      PMSI Tunnel Type: Ingress Replication, label: 1003 >>>>>>>>>>>>>

Displayed 1 prefixes (1 paths) with this RD (of requested type)
root@TORS1:~#

Ticket: CM-23790

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: parse label in pmsi tunnel attribute
Consider the following topo VTEP1->SPINE1->VTEP2. ebgp is being used
for evpn route exchange with SPINE just acting as a pass through.

1. VTEP1 was building the type-3 IMET route with the correct PMSI
tunnel type (ingress-replication) and label (VNI)
2. Spine1 was however only parsing the tunnel-type in the attr (was
skipping parsing of the label field altogether) -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
root@MSP1:~# net show bgp l2vpn evpn route rd 27.0.0.15:4 type multicast
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[ESI]:[EthTag]:[IPlen]:[IP]

BGP routing table entry for 27.0.0.15:4:[3]:[0]:[32]:[27.0.0.15]
Paths: (1 available, best #1)
  Advertised to non peer-group peers:
  TORC11(downlink-1) TORC12(downlink-2) TORC21(downlink-3) TORC22(downlink-4) TORS1(downlink-5) TORS2(downlink-6)
  Route [3]:[0]:[32]:[27.0.0.15]
  5550
    27.0.0.15 from TORS1(downlink-5) (27.0.0.15)
      Origin IGP, valid, external, bestpath-from-AS 5550, best
      Extended Community: RT:5550:1003 ET:8
      AddPath ID: RX 0, TX 227
      Last update: Thu Feb  7 15:44:22 2019
      PMSI Tunnel Type: Ingress Replication, label: 16777213 >>>>>>>

Displayed 1 prefixes (1 paths) with this RD (of requested type)
root@MSP1:~#
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
3. So VTEP2 didn't rx the correct label.

In an all FRR setup this doesn't have any functional consequence but some
vendors are validating the content of the label field as well and ignoring
the IMET route from FRR (say VTEP1 is FRR and VTEP2 is 3rd-party). The
functional consequence of this VTEP2 ignores VTEP1's IMET route and doesn't
add VTEP1 to the corresponding l2-vni flood list.

This commit fixes up the PMSI attr parsing on spine-1 -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
root@MSP1:~# net show bgp l2vpn evpn route rd 27.0.0.15:4 type multicast
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[ESI]:[EthTag]:[IPlen]:[IP]

BGP routing table entry for 27.0.0.15:4:[3]:[0]:[32]:[27.0.0.15]
Paths: (1 available, best #1)
  Advertised to non peer-group peers:
  TORC11(downlink-1) TORC12(downlink-2) TORC21(downlink-3) TORC22(downlink-4) TORS1(downlink-5) TORS2(downlink-6)
  Route [3]:[0]:[32]:[27.0.0.15]
  5550
    27.0.0.15 from TORS1(downlink-5) (27.0.0.15)
      Origin IGP, valid, external, bestpath-from-AS 5550, best
      Extended Community: RT:5550:1003 ET:8
      AddPath ID: RX 0, TX 278
      Last update: Thu Feb  7 00:17:40 2019
      PMSI Tunnel Type: Ingress Replication, label: 1003 >>>>>>>>>>>

Displayed 1 prefixes (1 paths) with this RD (of requested type)
root@MSP1:~#
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Ticket: CM-23790

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: fill the pmsi_tnl_type into the type-3 PMSI attr
Currently we are hardcoding it at the time of attr building to
ingress-replication. This is just a code clean-up and has no
functional impact.

Ticket: CM-23790

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
5653 passed
FRR › RPKI › #870 1 month ago
bgpd: display label as part of the PMSI tunnel attribute
root@TORS1:~# net show bgp l2vpn evpn route rd 27.0.0.15:4 type multicast
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[ESI]:[EthTag]:[IPlen]:[IP]

BGP routing table entry for 27.0.0.15:4:[3]:[0]:[32]:[27.0.0.15]
Paths: (1 available, best #1)
  Advertised to non peer-group peers:
  MSP1(uplink-1) MSP2(uplink-2)
  Route [3]:[0]:[32]:[27.0.0.15] VNI 1003
  Local
    27.0.0.15 from 0.0.0.0 (27.0.0.15)
      Origin IGP, weight 32768, valid, sourced, local, bestpath-from-AS Local, best
      Extended Community: ET:8 RT:5550:1003
      AddPath ID: RX 0, TX 10
      Last update: Thu Feb  7 00:17:24 2019
      PMSI Tunnel Type: Ingress Replication, label: 1003 >>>>>>>>>>>>>

Displayed 1 prefixes (1 paths) with this RD (of requested type)
root@TORS1:~#

Ticket: CM-23790

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: parse label in pmsi tunnel attribute
Consider the following topo VTEP1->SPINE1->VTEP2. ebgp is being used
for evpn route exchange with SPINE just acting as a pass through.

1. VTEP1 was building the type-3 IMET route with the correct PMSI
tunnel type (ingress-replication) and label (VNI)
2. Spine1 was however only parsing the tunnel-type in the attr (was
skipping parsing of the label field altogether) -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
root@MSP1:~# net show bgp l2vpn evpn route rd 27.0.0.15:4 type multicast
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[ESI]:[EthTag]:[IPlen]:[IP]

BGP routing table entry for 27.0.0.15:4:[3]:[0]:[32]:[27.0.0.15]
Paths: (1 available, best #1)
  Advertised to non peer-group peers:
  TORC11(downlink-1) TORC12(downlink-2) TORC21(downlink-3) TORC22(downlink-4) TORS1(downlink-5) TORS2(downlink-6)
  Route [3]:[0]:[32]:[27.0.0.15]
  5550
    27.0.0.15 from TORS1(downlink-5) (27.0.0.15)
      Origin IGP, valid, external, bestpath-from-AS 5550, best
      Extended Community: RT:5550:1003 ET:8
      AddPath ID: RX 0, TX 227
      Last update: Thu Feb  7 15:44:22 2019
      PMSI Tunnel Type: Ingress Replication, label: 16777213 >>>>>>>

Displayed 1 prefixes (1 paths) with this RD (of requested type)
root@MSP1:~#
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
3. So VTEP2 didn't rx the correct label.

In an all FRR setup this doesn't have any functional consequence but some
vendors are validating the content of the label field as well and ignoring
the IMET route from FRR (say VTEP1 is FRR and VTEP2 is 3rd-party). The
functional consequence of this VTEP2 ignores VTEP1's IMET route and doesn't
add VTEP1 to the corresponding l2-vni flood list.

This commit fixes up the PMSI attr parsing on spine-1 -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
root@MSP1:~# net show bgp l2vpn evpn route rd 27.0.0.15:4 type multicast
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[ESI]:[EthTag]:[IPlen]:[IP]

BGP routing table entry for 27.0.0.15:4:[3]:[0]:[32]:[27.0.0.15]
Paths: (1 available, best #1)
  Advertised to non peer-group peers:
  TORC11(downlink-1) TORC12(downlink-2) TORC21(downlink-3) TORC22(downlink-4) TORS1(downlink-5) TORS2(downlink-6)
  Route [3]:[0]:[32]:[27.0.0.15]
  5550
    27.0.0.15 from TORS1(downlink-5) (27.0.0.15)
      Origin IGP, valid, external, bestpath-from-AS 5550, best
      Extended Community: RT:5550:1003 ET:8
      AddPath ID: RX 0, TX 278
      Last update: Thu Feb  7 00:17:40 2019
      PMSI Tunnel Type: Ingress Replication, label: 1003 >>>>>>>>>>>

Displayed 1 prefixes (1 paths) with this RD (of requested type)
root@MSP1:~#
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Ticket: CM-23790

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: fill the pmsi_tnl_type into the type-3 PMSI attr
Currently we are hardcoding it at the time of attr building to
ingress-replication. This is just a code clean-up and has no
functional impact.

Ticket: CM-23790

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Testless build
You have insufficient permissions to see all of the builds.
Build Completed Code commits Tests
FRR › TOPOPR › #384 4 months ago
zebra: set remoteseq to 0 when remote mac is deleted by bgpd
When the remote mac is deleted by bgpd we can end up with an auto mac
entry in zebra if there are neighs referring to the mac. The remote sequence
number in the auto mac entry needs to be reset to 0 as the mac entry may
have been removed on all VTEPs (including the originating one).

Now if the MAC comes back on a remote VTEP it may be added with MM=0 which
 will NOT be accepted if the remote seq was not reset in the previous step.

Ticket: CM-22707

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
bgpd: move non-best local path checks outside the function
This change is a fixup to -
7b5e18 -  bgpd: use IP address as tie breaker if the MM seq number is the
same

And is being done in response to review comments. This commit brings no
functional change; simply moves around code for easier maintanence.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
6 of 1240 failed
FRR › TOPOPR › #383 4 months ago
zebra: send a local-mac del to bgpd on mac mod to remote
When events cross paths between bgp and zebra bgpd could end up with a
dangling local MAC entry. Consider the following sequence of events on
rack-1 -
1. MAC1 has MM sequence number 1 and points to rack-3
2. Now a packet is rxed locally on rack-1 and rack-2 (simultaneously) with
source-mac=MAC1.
3. This would cause rack-1 and rack-2 to set the MM seq to 2 and
simultaneously report the MAC as local.
4. Now let's say on rack-1 zebra's MACIP_ADD is in bgpd's queue. bgpd
accepts rack-3's update and sends a remote MACIP add to zebra with MM=2.
5. zebra updates the MAC entry from local=>remote.
6. bgpd now processes zebra's "stale local" making it the best path.
However zebra no longer has a local MAC entry.

At this point bgpd and zebra are effectively out of sync i.e. bgpd has a
local-MAC which is not present in the kernel or in zebra.

To handle this window zebra should send a local MAC delete to bgpd on
modifying its cache to remote.

Ticket: CM-22687
Reviewed By: CCR-7935

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: use IP address as tie breaker if the MM seq number is the same
Same sequence number handling is specified by RFC 7432 -
[
If two (or more) PEs advertise the same MAC address with the same
sequence number but different Ethernet segment identifiers, a PE that
receives these routes selects the route advertised by the PE with the
lowest IP address as the best route.

If the PE is the originator of the MAC route and it receives the same
MAC address with the same sequence number that it generated, it will
compare its own IP address with the IP address of the remote PE and
will select the lowest IP.  If its own route is not the best one, it
will withdraw the route.
]

To implement that specification this commit uses nexthop IP as a tie
breaker between two paths of equal seq number with lower IP winning.

Now if a local path already exists with the same sequence number but higher
(local-VTEP) IP it is evicted (deleted and withdrawn from the peers) and
the winning new remote path is installed in zebra. This is existing code
and handled implicitly via evpn_route_select_install.

If a local path is rxed from zebra with the same sequence as the
current remote winner it is rejected (not installed in the bgp
routing tables) and zebra is asked to re-install the older/remote winner.
This is a race condition that can only happen if bgp's add and zebra's add
cross paths. Additional handling has been added in this commit via
evpn_cleanup_local_non_best_route to take care of the race condition.

Ticket: CM-22674
Reviewed By: CCR-7937

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: hidden commands to add/del a local mac
local mac add/del comes from zebra. the hidden commands help verify
various race conditions between bgp and zebra.

Ticket: CM-22687
Reviewed By: CCR-7939

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: perform route selection again when the local path is deleted
This is needed to install the remote dst when a more preferred local
path is removed.

Ticket: CM-22685
Reviewed By: CCR-7936

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: set remoteseq to 0 when remote mac is deleted by bgpd
When the remote mac is deleted by bgpd we can end up with an auto mac
entry in zebra if there are neighs referring to the mac. The remote sequence
number in the auto mac entry needs to be reset to 0 as the mac entry may
have been removed on all VTEPs (including the originating one).

Now if the MAC comes back on a remote VTEP it may be added with MM=0 which
 will NOT be accepted if the remote seq was not reset in the previous step.

Ticket: CM-22707

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
bgpd: move non-best local path checks outside the function
This change is a fixup to -
7b5e18 -  bgpd: use IP address as tie breaker if the MM seq number is the
same

And is being done in response to review comments. This commit brings no
functional change; simply moves around code for easier maintanence.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: make neigh active when it is modified from local to remote
This is a fixup to commit -
f32ea5c07 - zebra: act on kernel notifications for remote neighbors

The original commit handled a race condition between kernel and zebra
that would result in an inconsistent state i.e.
kernel has an offload/remote neigh
zebra has a local neigh

The original commit missed setting the neigh to active when zebra
tried to resolve the inconsistency by modifying the local neigh to
remote neigh on hearing back its own kernel update. Fixed here.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>

Ticket: CM-22700
5 of 1240 failed
FRR › TOPOPR › #382 4 months ago
zebra: make neigh active when it is modified from local to remote
This is a fixup to commit -
f32ea5c07 - zebra: act on kernel notifications for remote neighbors

The original commit handled a race condition between kernel and zebra
that would result in an inconsistent state i.e.
kernel has an offload/remote neigh
zebra has a local neigh

The original commit missed setting the neigh to active when zebra
tried to resolve the inconsistency by modifying the local neigh to
remote neigh on hearing back its own kernel update. Fixed here.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>

Ticket: CM-22700
zebra: set remoteseq to 0 when remote mac is deleted by bgpd
When the remote mac is deleted by bgpd we can end up with an auto mac
entry in zebra if there are neighs referring to the mac. The remote sequence
number in the auto mac entry needs to be reset to 0 as the mac entry may
have been removed on all VTEPs (including the originating one).

Now if the MAC comes back on a remote VTEP it may be added with MM=0 which
 will NOT be accepted if the remote seq was not reset in the previous step.

Ticket: CM-22707

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
bgpd: use IP address as tie breaker if the MM seq number is the same
Same sequence number handling is specified by RFC 7432 -
[
If two (or more) PEs advertise the same MAC address with the same
sequence number but different Ethernet segment identifiers, a PE that
receives these routes selects the route advertised by the PE with the
lowest IP address as the best route.

If the PE is the originator of the MAC route and it receives the same
MAC address with the same sequence number that it generated, it will
compare its own IP address with the IP address of the remote PE and
will select the lowest IP.  If its own route is not the best one, it
will withdraw the route.
]

To implement that specification this commit uses nexthop IP as a tie
breaker between two paths of equal seq number with lower IP winning.

Now if a local path already exists with the same sequence number but higher
(local-VTEP) IP it is evicted (deleted and withdrawn from the peers) and
the winning new remote path is installed in zebra. This is existing code
and handled implicitly via evpn_route_select_install.

If a local path is rxed from zebra with the same sequence as the
current remote winner it is rejected (not installed in the bgp
routing tables) and zebra is asked to re-install the older/remote winner.
This is a race condition that can only happen if bgp's add and zebra's add
cross paths. Additional handling has been added in this commit via
evpn_cleanup_local_non_best_route to take care of the race condition.

Ticket: CM-22674
Reviewed By: CCR-7937

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: send a local-mac del to bgpd on mac mod to remote
When events cross paths between bgp and zebra bgpd could end up with a
dangling local MAC entry. Consider the following sequence of events on
rack-1 -
1. MAC1 has MM sequence number 1 and points to rack-3
2. Now a packet is rxed locally on rack-1 and rack-2 (simultaneously) with
source-mac=MAC1.
3. This would cause rack-1 and rack-2 to set the MM seq to 2 and
simultaneously report the MAC as local.
4. Now let's say on rack-1 zebra's MACIP_ADD is in bgpd's queue. bgpd
accepts rack-3's update and sends a remote MACIP add to zebra with MM=2.
5. zebra updates the MAC entry from local=>remote.
6. bgpd now processes zebra's "stale local" making it the best path.
However zebra no longer has a local MAC entry.

At this point bgpd and zebra are effectively out of sync i.e. bgpd has a
local-MAC which is not present in the kernel or in zebra.

To handle this window zebra should send a local MAC delete to bgpd on
modifying its cache to remote.

Ticket: CM-22687
Reviewed By: CCR-7935

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: perform route selection again when the local path is deleted
This is needed to install the remote dst when a more preferred local
path is removed.

Ticket: CM-22685
Reviewed By: CCR-7936

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: hidden commands to add/del a local mac
local mac add/del comes from zebra. the hidden commands help verify
various race conditions between bgp and zebra.

Ticket: CM-22687
Reviewed By: CCR-7939

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: move non-best local path checks outside the function
This change is a fixup to -
7b5e18 -  bgpd: use IP address as tie breaker if the MM seq number is the
same

And is being done in response to review comments. This commit brings no
functional change; simply moves around code for easier maintanence.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
6 of 1240 failed
FRR › TOPOPR › #343 6 months ago
bgpd: unregister VNI learning from zebra on default instance delete
The code for this was always there but was not kicking in because of an
incorrect dependency on is_evpn_enabled. This API attempts to locate the
default instance from bgp_master's instance list. Only the instance
currently being deleted has already been removed from the instance list
by the time bgp_delete->bgp_zebra_instance_deregister is executed.

Symptom of this bug used to show up when a default instance is deleted
and created again. In that case bgp_zebra_instance_register would not be
effective as zebra ignores the register as dup (dereg didn't happen in the
first place) so bgpd wouldn't reload already configured L2-VNIs.
root@cel-sea-03:~# net show bgp l2vpn evpn vni |grep 1000
* 1000       L2   169.253.0.11:9   6646:1000  6646:1000 vrf1
root@cel-sea-03:~# grep "router bgp"  /etc/frr/frr.conf
router bgp 6646
root@cel-sea-03:~# sed -i 's/6646/6656/' /etc/frr/frr.conf
root@cel-sea-03:~# grep "router bgp"  /etc/frr/frr.conf
router bgp 6656
root@cel-sea-03:~# systemctl reload frr
root@cel-sea-03:~# net show bgp l2vpn evpn vni |grep 1000
root@cel-sea-03:~#

Fix simply changes the order of dereg to make
bgp_zebra_instance_deregister actually happen (by doing it before the
default instance is removed from the master list).

Ticket: CM-21566

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
5 of 972 failed
FRR › TOPOPR › #342 6 months ago
bgpd: unregister VNI learning from zebra on default instance delete
The code for this was always there but was not kicking in because of an
incorrect dependency on is_evpn_enabled. This API attempts to locate the
default instance from bgp_master's instance list. Only the instance
currently being deleted has already been removed from the instance list
by the time bgp_delete->bgp_zebra_instance_deregister is executed.

Symptom of this bug used to show up when a default instance is deleted
and created again. In that case bgp_zebra_instance_register would not be
effective as zebra ignores the register as dup (dereg didn't happen in the
first place) so bgpd wouldn't reload already configured L2-VNIs.
root@cel-sea-03:~# net show bgp l2vpn evpn vni |grep 1000
* 1000       L2   169.253.0.11:9   6646:1000  6646:1000 vrf1
root@cel-sea-03:~# grep "router bgp"  /etc/frr/frr.conf
router bgp 6646
root@cel-sea-03:~# sed -i 's/6646/6656/' /etc/frr/frr.conf
root@cel-sea-03:~# grep "router bgp"  /etc/frr/frr.conf
router bgp 6656
root@cel-sea-03:~# systemctl reload frr
root@cel-sea-03:~# net show bgp l2vpn evpn vni |grep 1000
root@cel-sea-03:~#

Fix simply changes the order of dereg to make
bgp_zebra_instance_deregister actually happen (by doing it before the
default instance is removed from the master list).

Ticket: CM-21566

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
6 of 972 failed
FRR › TOPOPR › #339 6 months ago
bgpd: unregister VNI learning from zebra on default instance delete
The code for this was always there but was not kicking in because of an
incorrect dependency on is_evpn_enabled. This API attempts to locate the
default instance from bgp_master's instance list. Only the instance
currently being deleted has already been removed from the instance list
by the time bgp_delete->bgp_zebra_instance_deregister is executed.

Symptom of this bug used to show up when a default instance is deleted
and created again. In that case bgp_zebra_instance_register would not be
effective as zebra ignores the register as dup (dereg didn't happen in the
first place) so bgpd wouldn't reload already configured L2-VNIs.
root@cel-sea-03:~# net show bgp l2vpn evpn vni |grep 1000
* 1000       L2   169.253.0.11:9   6646:1000  6646:1000 vrf1
root@cel-sea-03:~# grep "router bgp"  /etc/frr/frr.conf
router bgp 6646
root@cel-sea-03:~# sed -i 's/6646/6656/' /etc/frr/frr.conf
root@cel-sea-03:~# grep "router bgp"  /etc/frr/frr.conf
router bgp 6656
root@cel-sea-03:~# systemctl reload frr
root@cel-sea-03:~# net show bgp l2vpn evpn vni |grep 1000
root@cel-sea-03:~#

Fix simply changes the order of dereg to make
bgp_zebra_instance_deregister actually happen (by doing it before the
default instance is removed from the master list).

Ticket: CM-21566

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
689 passed
FRR › TOPOPR › #338 6 months ago
bgpd: unregister VNI learning from zebra on default instance delete
The code for this was always there but was not kicking in because of an
incorrect dependency on is_evpn_enabled. This API attempts to locate the
default instance from bgp_master's instance list. Only the instance
currently being deleted has already been removed from the instance list
by the time bgp_delete->bgp_zebra_instance_deregister is executed.

Symptom of this bug used to show up when a default instance is deleted
and created again. In that case bgp_zebra_instance_register would not be
effective as zebra ignores the register as dup (dereg didn't happen in the
first place) so bgpd wouldn't reload already configured L2-VNIs.
root@cel-sea-03:~# net show bgp l2vpn evpn vni |grep 1000
* 1000       L2   169.253.0.11:9   6646:1000  6646:1000 vrf1
root@cel-sea-03:~# grep "router bgp"  /etc/frr/frr.conf
router bgp 6646
root@cel-sea-03:~# sed -i 's/6646/6656/' /etc/frr/frr.conf
root@cel-sea-03:~# grep "router bgp"  /etc/frr/frr.conf
router bgp 6656
root@cel-sea-03:~# systemctl reload frr
root@cel-sea-03:~# net show bgp l2vpn evpn vni |grep 1000
root@cel-sea-03:~#

Fix simply changes the order of dereg to make
bgp_zebra_instance_deregister actually happen (by doing it before the
default instance is removed from the master list).

Ticket: CM-21566

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
10 of 824 failed
FRR › TOPOPR › #284 7 months ago
zebra: install EVPN gateway MAC as static/sticky
SVI interface ip/hw address is advertised by the GW VTEP (say TORC11) with
the default-GW community. And the rxing VTEP (say TORC21) installs the GW
MAC as a dynamic FDB entry. The problem with this is a rogue packet from a
server with the GW MAC as source can cause a station move resulting in
TORC21 hijacking the GW MAC address and blackholing all inter rack traffic.

Fix is to make the GW MAC "sticky" pinning it to the GW VTEP (TORC11). This
commit does it by installing the FDB entry as static if the MACIP route is
received with the default-GW community (mimics handling of
mac-mobility-with-sticky community)

Sample output with from TORC12 with TORC11 setup as gateway -
root@TORC21:~# net show evpn mac vni 1004 mac 00:00:5e:00:01:01
MAC: 00:00:5e:00:01:01
 Remote VTEP: 36.0.0.11 Remote-gateway Mac
 Neighbors:
    45.0.4.1
    fe80::200:5eff:fe00:101
    2001:fee1:0:4::1

root@TORC21:~# bridge fdb show |grep 00:00:5e:00:01:01|grep 1004
00:00:5e:00:01:01 dev vx-1004 vlan 1004 master bridge static
00:00:5e:00:01:01 dev vx-1004 dst 36.0.0.11 self static
root@TORC21:~#

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Ticket: CM-21508
Testless build
FRR › TOPOPR › #283 7 months ago
zebra: install EVPN gateway MAC as static/sticky
SVI interface ip/hw address is advertised by the GW VTEP (say TORC11) with
the default-GW community. And the rxing VTEP (say TORC21) installs the GW
MAC as a dynamic FDB entry. The problem with this is a rogue packet from a
server with the GW MAC as source can cause a station move resulting in
TORC21 hijacking the GW MAC address and blackholing all inter rack traffic.

Fix is to make the GW MAC "sticky" pinning it to the GW VTEP (TORC11). This
commit does it by installing the FDB entry as static if the MACIP route is
received with the default-GW community (mimics handling of
mac-mobility-with-sticky community)

Sample output with from TORC12 with TORC11 setup as gateway -
root@TORC21:~# net show evpn mac vni 1004 mac 00:00:5e:00:01:01
MAC: 00:00:5e:00:01:01
 Remote VTEP: 36.0.0.11 Remote-gateway Mac
 Neighbors:
    45.0.4.1
    fe80::200:5eff:fe00:101
    2001:fee1:0:4::1

root@TORC21:~# bridge fdb show |grep 00:00:5e:00:01:01|grep 1004
00:00:5e:00:01:01 dev vx-1004 vlan 1004 master bridge static
00:00:5e:00:01:01 dev vx-1004 dst 36.0.0.11 self static
root@TORC21:~#

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Ticket: CM-21508
Testless build
FRR › TOPOPR › #282 7 months ago
zebra: install EVPN gateway MAC as static/sticky
SVI interface ip/hw address is advertised by the GW VTEP (say TORC11) with
the default-GW community. And the rxing VTEP (say TORC21) installs the GW
MAC as a dynamic FDB entry. The problem with this is a rogue packet from a
server with the GW MAC as source can cause a station move resulting in
TORC21 hijacking the GW MAC address and blackholing all inter rack traffic.

Fix is to make the GW MAC "sticky" pinning it to the GW VTEP (TORC11). This
commit does it by installing the FDB entry as static if the MACIP route is
received with the default-GW community (mimics handling of
mac-mobility-with-sticky community)

Sample output with from TORC12 with TORC11 setup as gateway -
root@TORC21:~# net show evpn mac vni 1004 mac 00:00:5e:00:01:01
MAC: 00:00:5e:00:01:01
 Remote VTEP: 36.0.0.11 Remote-gateway Mac
 Neighbors:
    45.0.4.1
    fe80::200:5eff:fe00:101
    2001:fee1:0:4::1

root@TORC21:~# bridge fdb show |grep 00:00:5e:00:01:01|grep 1004
00:00:5e:00:01:01 dev vx-1004 vlan 1004 master bridge static
00:00:5e:00:01:01 dev vx-1004 dst 36.0.0.11 self static
root@TORC21:~#

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Ticket: CM-21508
Testless build
Build Completed Code commits Tests
FRR › FRR › #1879 1 month ago
bgpd: reinstate current bgp best route on an inactive neigh del
When an inactive-neigh delete is rxed bgp will not have a local path to
remove (and re-run path selection). Instead it simply re-installs the
current best remote path if any.

Ticket: CM-23018
Testing Done: evpn-min

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: propagate inactive neigh deletes to bgpd
When a local neigh is added with a MAC that is remote or absent the
neigh is kept in zebra as local/in-active. But not propagated to bgpd.
Similarly when an inactive neigh is deleted the del-msg is not propagated
to bgpd.

Without this change bgp and zebra would fall out of sync as that
bgp would not know to rerun bestpath and for it to reinstall a
known remote path for the mac-ip in question.  To fix this we
now propagate inactive neigh deletes to bgpd.

Ticket: CM-23018
Testing Done:
1. evpn-min
2. manually triggered the out-of-sync state and verified the fix

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: display metric for connected routes
In a VRR/VRRP setup we can have connected routes with different costs.
So this change eliminates suppressing metric display for connected routes.

Sample output -
root@TORC11:~# vtysh -c "show ipv6 route vrf vrf1"
Codes: K - kernel route, C - connected, S - static, R - RIPng,
       O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
       v - VNC, V - VNC-Direct, A - Babel, D - SHARP, F - PBR,
       > - selected route, * - FIB route

VRF vrf1:
K * ::/0 [255/8192] unreachable (ICMP unreachable), 00:00:36
C * 2001:aa:1::/64 [0/100] is directly connected, vlan1002-v0, 00:00:36
C>* 2001:aa:1::/64 [0/90] is directly connected, vlan1002, 00:00:36

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: fill the zebra mac-ip route via a common api
Move the info filling for zebra mac-ip install (sent by bgpd) to a
common place.

The commit also fixes missing ROUTER flag for one of the cases
added in a code branch that doesn't have the ROUTER changes -
[
6d8c603a
bgpd: use IP address as tie breaker if the MM seq number is the same
]

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: set connected route metric based on the devaddr metric
MACVLAN devices are typically used for applications such as VRR/VRRP that
require a second MAC address (virtual). These devices have a corresponding
SVI/VLAN device -
root@TORC11:~# ip addr show vlan1002
39: vlan1002@bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9152 qdisc noqueue master vrf1 state UP group default
    link/ether 00:02:00:00:00:2e brd ff:ff:ff:ff:ff:ff
    inet6 2001:aa:1::2/64 scope global
       valid_lft forever preferred_lft forever
root@TORC11:~# ip addr show vlan1002-v0
40: vlan1002-v0@vlan1002: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9152 qdisc noqueue master vrf1 state UP group default
    link/ether 00:00:5e:00:01:01 brd ff:ff:ff:ff:ff:ff
    inet6 2001:aa:1::a/64 metric 1024 scope global
       valid_lft forever preferred_lft forever
root@TORC11:~#

The macvlan device is used primarily for RX (VR-IP/VR-MAC). And TX is via
the SVI. To acheive that functionality the macvlan network's metric
is set to a higher value.

Zebra currently ignores the devaddr metric sent by the kernel and hardcodes
it to 0. This commit eliminates that hardcoding. If the devaddr metric
is available (METRIC_MAX) it is used for setting up the connected route
otherwise we fallback to the dev/interface metric.

Setting the macvlan metric to a higher value ensures that zebra will always
select the connected route on the SVI (and subsequently use it for next hop
resolution etc.) -
root@TORC11:~# vtysh -c "show ip route vrf vrf1 2001:aa:1::/64"
Routing entry for 2001:aa:1::/64
  Known via "connected", distance 0, metric 1024, vrf vrf1
  Last update 11:30:56 ago
  * directly connected, vlan1002-v0

Routing entry for 2001:aa:1::/64
  Known via "connected", distance 0, metric 0, vrf vrf1, best
  Last update 11:30:56 ago
  * directly connected, vlan1002

root@TORC11:~#

Ticket: CM-23511
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
6996 passed
FRR › RPKI › #820 1 month ago
zebra: display metric for connected routes
In a VRR/VRRP setup we can have connected routes with different costs.
So this change eliminates suppressing metric display for connected routes.

Sample output -
root@TORC11:~# vtysh -c "show ipv6 route vrf vrf1"
Codes: K - kernel route, C - connected, S - static, R - RIPng,
       O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
       v - VNC, V - VNC-Direct, A - Babel, D - SHARP, F - PBR,
       > - selected route, * - FIB route

VRF vrf1:
K * ::/0 [255/8192] unreachable (ICMP unreachable), 00:00:36
C * 2001:aa:1::/64 [0/100] is directly connected, vlan1002-v0, 00:00:36
C>* 2001:aa:1::/64 [0/90] is directly connected, vlan1002, 00:00:36

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: propagate inactive neigh deletes to bgpd
When a local neigh is added with a MAC that is remote or absent the
neigh is kept in zebra as local/in-active. But not propagated to bgpd.
Similarly when an inactive neigh is deleted the del-msg is not propagated
to bgpd.

Without this change bgp and zebra would fall out of sync as that
bgp would not know to rerun bestpath and for it to reinstall a
known remote path for the mac-ip in question.  To fix this we
now propagate inactive neigh deletes to bgpd.

Ticket: CM-23018
Testing Done:
1. evpn-min
2. manually triggered the out-of-sync state and verified the fix

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: set connected route metric based on the devaddr metric
MACVLAN devices are typically used for applications such as VRR/VRRP that
require a second MAC address (virtual). These devices have a corresponding
SVI/VLAN device -
root@TORC11:~# ip addr show vlan1002
39: vlan1002@bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9152 qdisc noqueue master vrf1 state UP group default
    link/ether 00:02:00:00:00:2e brd ff:ff:ff:ff:ff:ff
    inet6 2001:aa:1::2/64 scope global
       valid_lft forever preferred_lft forever
root@TORC11:~# ip addr show vlan1002-v0
40: vlan1002-v0@vlan1002: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9152 qdisc noqueue master vrf1 state UP group default
    link/ether 00:00:5e:00:01:01 brd ff:ff:ff:ff:ff:ff
    inet6 2001:aa:1::a/64 metric 1024 scope global
       valid_lft forever preferred_lft forever
root@TORC11:~#

The macvlan device is used primarily for RX (VR-IP/VR-MAC). And TX is via
the SVI. To acheive that functionality the macvlan network's metric
is set to a higher value.

Zebra currently ignores the devaddr metric sent by the kernel and hardcodes
it to 0. This commit eliminates that hardcoding. If the devaddr metric
is available (METRIC_MAX) it is used for setting up the connected route
otherwise we fallback to the dev/interface metric.

Setting the macvlan metric to a higher value ensures that zebra will always
select the connected route on the SVI (and subsequently use it for next hop
resolution etc.) -
root@TORC11:~# vtysh -c "show ip route vrf vrf1 2001:aa:1::/64"
Routing entry for 2001:aa:1::/64
  Known via "connected", distance 0, metric 1024, vrf vrf1
  Last update 11:30:56 ago
  * directly connected, vlan1002-v0

Routing entry for 2001:aa:1::/64
  Known via "connected", distance 0, metric 0, vrf vrf1, best
  Last update 11:30:56 ago
  * directly connected, vlan1002

root@TORC11:~#

Ticket: CM-23511
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: fill the zebra mac-ip route via a common api
Move the info filling for zebra mac-ip install (sent by bgpd) to a
common place.

The commit also fixes missing ROUTER flag for one of the cases
added in a code branch that doesn't have the ROUTER changes -
[
6d8c603a
bgpd: use IP address as tie breaker if the MM seq number is the same
]

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: reinstate current bgp best route on an inactive neigh del
When an inactive-neigh delete is rxed bgp will not have a local path to
remove (and re-run path selection). Instead it simply re-installs the
current best remote path if any.

Ticket: CM-23018
Testing Done: evpn-min

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Testless build
FRR › RPKI › #630 4 months ago
zebra: set remoteseq to 0 when remote mac is deleted by bgpd
When the remote mac is deleted by bgpd we can end up with an auto mac
entry in zebra if there are neighs referring to the mac. The remote sequence
number in the auto mac entry needs to be reset to 0 as the mac entry may
have been removed on all VTEPs (including the originating one).

Now if the MAC comes back on a remote VTEP it may be added with MM=0 which
 will NOT be accepted if the remote seq was not reset in the previous step.

Ticket: CM-22707

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
bgpd: perform route selection again when the local path is deleted
This is needed to install the remote dst when a more preferred local
path is removed.

Ticket: CM-22685
Reviewed By: CCR-7936

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: make neigh active when it is modified from local to remote
This is a fixup to commit -
f32ea5c07 - zebra: act on kernel notifications for remote neighbors

The original commit handled a race condition between kernel and zebra
that would result in an inconsistent state i.e.
kernel has an offload/remote neigh
zebra has a local neigh

The original commit missed setting the neigh to active when zebra
tried to resolve the inconsistency by modifying the local neigh to
remote neigh on hearing back its own kernel update. Fixed here.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>

Ticket: CM-22700
bgpd: hidden commands to add/del a local mac
local mac add/del comes from zebra. the hidden commands help verify
various race conditions between bgp and zebra.

Ticket: CM-22687
Reviewed By: CCR-7939

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: use IP address as tie breaker if the MM seq number is the same
Same sequence number handling is specified by RFC 7432 -
[
If two (or more) PEs advertise the same MAC address with the same
sequence number but different Ethernet segment identifiers, a PE that
receives these routes selects the route advertised by the PE with the
lowest IP address as the best route.

If the PE is the originator of the MAC route and it receives the same
MAC address with the same sequence number that it generated, it will
compare its own IP address with the IP address of the remote PE and
will select the lowest IP.  If its own route is not the best one, it
will withdraw the route.
]

To implement that specification this commit uses nexthop IP as a tie
breaker between two paths of equal seq number with lower IP winning.

Now if a local path already exists with the same sequence number but higher
(local-VTEP) IP it is evicted (deleted and withdrawn from the peers) and
the winning new remote path is installed in zebra. This is existing code
and handled implicitly via evpn_route_select_install.

If a local path is rxed from zebra with the same sequence as the
current remote winner it is rejected (not installed in the bgp
routing tables) and zebra is asked to re-install the older/remote winner.
This is a race condition that can only happen if bgp's add and zebra's add
cross paths. Additional handling has been added in this commit via
evpn_cleanup_local_non_best_route to take care of the race condition.

Ticket: CM-22674
Reviewed By: CCR-7937

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: send a local-mac del to bgpd on mac mod to remote
When events cross paths between bgp and zebra bgpd could end up with a
dangling local MAC entry. Consider the following sequence of events on
rack-1 -
1. MAC1 has MM sequence number 1 and points to rack-3
2. Now a packet is rxed locally on rack-1 and rack-2 (simultaneously) with
source-mac=MAC1.
3. This would cause rack-1 and rack-2 to set the MM seq to 2 and
simultaneously report the MAC as local.
4. Now let's say on rack-1 zebra's MACIP_ADD is in bgpd's queue. bgpd
accepts rack-3's update and sends a remote MACIP add to zebra with MM=2.
5. zebra updates the MAC entry from local=>remote.
6. bgpd now processes zebra's "stale local" making it the best path.
However zebra no longer has a local MAC entry.

At this point bgpd and zebra are effectively out of sync i.e. bgpd has a
local-MAC which is not present in the kernel or in zebra.

To handle this window zebra should send a local MAC delete to bgpd on
modifying its cache to remote.

Ticket: CM-22687
Reviewed By: CCR-7935

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: move non-best local path checks outside the function
This change is a fixup to -
7b5e18 -  bgpd: use IP address as tie breaker if the MM seq number is the
same

And is being done in response to review comments. This commit brings no
functional change; simply moves around code for easier maintanence.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Testless build
FRR › FRR › #1675 4 months ago
bgpd: move non-best local path checks outside the function
This change is a fixup to -
7b5e18 -  bgpd: use IP address as tie breaker if the MM seq number is the
same

And is being done in response to review comments. This commit brings no
functional change; simply moves around code for easier maintanence.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: send a local-mac del to bgpd on mac mod to remote
When events cross paths between bgp and zebra bgpd could end up with a
dangling local MAC entry. Consider the following sequence of events on
rack-1 -
1. MAC1 has MM sequence number 1 and points to rack-3
2. Now a packet is rxed locally on rack-1 and rack-2 (simultaneously) with
source-mac=MAC1.
3. This would cause rack-1 and rack-2 to set the MM seq to 2 and
simultaneously report the MAC as local.
4. Now let's say on rack-1 zebra's MACIP_ADD is in bgpd's queue. bgpd
accepts rack-3's update and sends a remote MACIP add to zebra with MM=2.
5. zebra updates the MAC entry from local=>remote.
6. bgpd now processes zebra's "stale local" making it the best path.
However zebra no longer has a local MAC entry.

At this point bgpd and zebra are effectively out of sync i.e. bgpd has a
local-MAC which is not present in the kernel or in zebra.

To handle this window zebra should send a local MAC delete to bgpd on
modifying its cache to remote.

Ticket: CM-22687
Reviewed By: CCR-7935

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: set remoteseq to 0 when remote mac is deleted by bgpd
When the remote mac is deleted by bgpd we can end up with an auto mac
entry in zebra if there are neighs referring to the mac. The remote sequence
number in the auto mac entry needs to be reset to 0 as the mac entry may
have been removed on all VTEPs (including the originating one).

Now if the MAC comes back on a remote VTEP it may be added with MM=0 which
 will NOT be accepted if the remote seq was not reset in the previous step.

Ticket: CM-22707

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
bgpd: hidden commands to add/del a local mac
local mac add/del comes from zebra. the hidden commands help verify
various race conditions between bgp and zebra.

Ticket: CM-22687
Reviewed By: CCR-7939

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
zebra: make neigh active when it is modified from local to remote
This is a fixup to commit -
f32ea5c07 - zebra: act on kernel notifications for remote neighbors

The original commit handled a race condition between kernel and zebra
that would result in an inconsistent state i.e.
kernel has an offload/remote neigh
zebra has a local neigh

The original commit missed setting the neigh to active when zebra
tried to resolve the inconsistency by modifying the local neigh to
remote neigh on hearing back its own kernel update. Fixed here.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>

Ticket: CM-22700
bgpd: use IP address as tie breaker if the MM seq number is the same
Same sequence number handling is specified by RFC 7432 -
[
If two (or more) PEs advertise the same MAC address with the same
sequence number but different Ethernet segment identifiers, a PE that
receives these routes selects the route advertised by the PE with the
lowest IP address as the best route.

If the PE is the originator of the MAC route and it receives the same
MAC address with the same sequence number that it generated, it will
compare its own IP address with the IP address of the remote PE and
will select the lowest IP.  If its own route is not the best one, it
will withdraw the route.
]

To implement that specification this commit uses nexthop IP as a tie
breaker between two paths of equal seq number with lower IP winning.

Now if a local path already exists with the same sequence number but higher
(local-VTEP) IP it is evicted (deleted and withdrawn from the peers) and
the winning new remote path is installed in zebra. This is existing code
and handled implicitly via evpn_route_select_install.

If a local path is rxed from zebra with the same sequence as the
current remote winner it is rejected (not installed in the bgp
routing tables) and zebra is asked to re-install the older/remote winner.
This is a race condition that can only happen if bgp's add and zebra's add
cross paths. Additional handling has been added in this commit via
evpn_cleanup_local_non_best_route to take care of the race condition.

Ticket: CM-22674
Reviewed By: CCR-7937

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
bgpd: perform route selection again when the local path is deleted
This is needed to install the remote dst when a more preferred local
path is removed.

Ticket: CM-22685
Reviewed By: CCR-7936

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
6890 passed