lag module: Model peerlink as regular lag link #2084

Draft
wants to merge 48 commits into
base: dev

jbemmel
Collaborator

@jbemmel jbemmel commented Mar 25, 2025

Currently the link between MLAG peers is crafted internally within the lag module, using specific internal mechanisms to assign IP addresses and making it impossible to configure features such as OSPF or BGP between the peers.

This PR models the peerlink as a regular lag link with a VLAN trunk, enabling the full suite of Netlab features to be configured between the peers:

  • Assign IP addressing to the link; users can use prefix or pool, or leave default to get VLAN lan pool addressing
  • Apply module processing (ospf, bgp, gateway to name a few)
  • Configure the use of LACP

Fixes #1909

Note: Arubacx templates are modified but will need further adjustments and testing (like defining the vlan that gets created for peering, similar to eos)

Testing: added a test for OSPF over peerlink

  • NETLAB_DEVICE=cumulus_nvue NETLAB_PROVIDER=libvirt ./device-module-test -v lag
  • NETLAB_DEVICE=eos NETLAB_PROVIDER=clab ./device-module-test -v lag

Breaking changes:

  • Dell OS10 - the backup destination IP is now taken from intf.lag.mlag.peer_backup_ip instead of intf.lag.mlag.peer

Implementation notes:

  • IP assignment to a routed vlan is achieved by creating a vlan in irb mode with a prefix, overriding the mode to route at the interface level
  • All VLANs are allowed on the peerlink by default, unless the user specifies an explicit VLAN trunk. A new internal flag, _allow_all, models this

@jbemmel jbemmel marked this pull request as draft March 25, 2025 23:27
@jbemmel jbemmel force-pushed the lag_peer_vlan branch 2 times, most recently from 96c1995 to 10c28ec Compare March 26, 2025 02:10
@jbemmel jbemmel requested review from ssasso and ipspace March 26, 2025 02:28
@jbemmel jbemmel marked this pull request as ready for review March 26, 2025 02:28
Owner

@ipspace ipspace left a comment

Let's start with the obvious one: is there a reason you couldn't add "allocation: p2p" to the global P2P pool?

It is supposed to be used for P2P links anyway, and it uses a /30 prefix, so there cannot be more than two nodes attached to it. Alternatively, one could define a custom pool and use it for peer link VLANs.

Both options are better (IMO) than adding random global attributes.
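As a quick check of the "/30 means at most two nodes" argument above, here is a minimal sketch using Python's standard `ipaddress` module. The prefix `10.1.0.0/30` is an arbitrary example value, not something taken from netlab's default pools.

```python
import ipaddress

# Arbitrary example prefix (an assumption); any IPv4 /30 behaves the same way.
p2p_prefix = ipaddress.ip_network("10.1.0.0/30")

# hosts() leaves out the network and broadcast addresses of an IPv4 /30,
# so only two addresses remain for attached nodes.
usable = list(p2p_prefix.hosts())
print(len(usable), [str(addr) for addr in usable])
```

With only two usable addresses, a third node on the same link could not get an address from that prefix.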

@jbemmel
Collaborator Author

jbemmel commented Mar 26, 2025

Let's start with the obvious one: is there a reason you couldn't add "allocation: p2p" to the global P2P pool?

I suggested that in the past (#1611) but you disagreed at the time. Happy to revisit

@ipspace
Owner

ipspace commented Mar 26, 2025

Let's start with the obvious one: is there a reason you couldn't add "allocation: p2p" to the global P2P pool?

I suggested that in the past (#1611) but you disagreed at the time. Happy to revisit

Well, thanks for pointing that out ;) The problem was (and still is) that using P2P allocation strategy on a VLAN link masks the fact that an IRB VLAN has a single prefix. Not specifying the p2p allocation strategy crashes the allocation process, alerting the user to the fact that something weird is going on.

And now we uncovered another gotcha: you're doing too much tweaking. I never got that far because I stumbled on the "new global pool" showstopper.

@ipspace
Owner

ipspace commented Mar 26, 2025

This PR models the peerlink as a regular lag link with a VLAN trunk, enabling the full suite of Netlab features to be configured between the peers:

Is there supposed to be a default VLAN to be used on that link for the global IP routing, or is it just a VLAN trunk like any other trunk?

Note: Arubacx templates are modified but will need further adjustments and testing (like defining the vlan that gets created for peering, similar to eos)

There's a showstopper right there ;)

IP assignment to a routed vlan is achieved by creating a vlan in irb mode with a prefix, overriding the mode to route at the interface level

There's absolutely no need for that. Peer links are established between devices of the same type anyway, and if a device does not support mixed trunks, then you use IRB for the global IP routing VLAN. Any further device-specific limitations can be addressed in device quirks.

Please (always) try to modify the minimum amount of data structures necessary. We've learned the hard way that doing anything else creates unpredictable effects in the future.

  • All vlans need to be allowed on the peerlink, I prefer to do this implicitly rather than explicitly list all vlans used in the topology because the latter may create "unsupported" scenarios (such as mixed trunks on Dell OS10).

I have no problem with this (just skip the VLAN assignment on the peer link), but it has to be documented.

It would be nicer to have "allow all VLANs" syntax, such that we could avoid lag module specific modifications to the vlan template

I don't see a need for the "allow all" trunks from the topology perspective, but I have no problem with an internal flag like vlan._all_vlans that would be used in VLAN configuration templates instead of checking the peerlink interface name or type.
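The internal-flag idea can be sketched as follows. This is only an illustration under stated assumptions: the function `allowed_vlan_ids`, the dictionary layout of `intf` and `vlans`, and the `"all"` return value are hypothetical and do not describe netlab's actual data model or templates. Only the flag name `_all_vlans` comes from the comment above.

```python
def allowed_vlan_ids(intf, vlans):
    """Return "all" for flagged interfaces, else a sorted list of VLAN IDs.

    Hypothetical layout: intf["vlan"]["trunk"] is keyed by VLAN name and
    vlans[name]["id"] holds the VLAN ID.
    """
    vlan_data = intf.get("vlan", {})
    # The decision depends on the flag only, not on interface name or type.
    if vlan_data.get("_all_vlans"):
        return "all"
    return sorted(vlans[name]["id"] for name in vlan_data.get("trunk", {}))


# Example data (made up for illustration)
vlans = {"red": {"id": 1000}, "blue": {"id": 1001}}
peerlink = {"ifname": "peer0", "vlan": {"_all_vlans": True}}
access_trunk = {"ifname": "eth1", "vlan": {"trunk": {"red": {}, "blue": {}}}}

print(allowed_vlan_ids(peerlink, vlans))
print(allowed_vlan_ids(access_trunk, vlans))
```

A configuration template could then branch on the returned value without knowing anything about the lag module.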

  • Global p2p pool could not be used due to the lack of a 'p2p' allocation strategy

As you can assign any pool to any VLAN, I fail to see why this is relevant ;)

@jbemmel
Collaborator Author

jbemmel commented Mar 26, 2025

  • Global p2p pool could not be used due to the lack of a 'p2p' allocation strategy

As you can assign any pool to any VLAN, I fail to see why this is relevant ;)

I did try it - it results in an address allocation failure for the topology/input/lag-mlag-m_to_m.yml test case

@jbemmel jbemmel marked this pull request as draft March 26, 2025 13:15
@jbemmel
Collaborator Author

jbemmel commented Mar 26, 2025

Please (always) try to modify the minimum amount of data structures necessary. We've learned the hard way that doing anything else creates unpredictable effects in the future.

Ironically, this is exactly the reason why I opted for a separate mlag_p2p addressing pool, rather than adding allocation: p2p to the existing global p2p one. It results in the minimum amount of change to the installed base.

@jbemmel
Collaborator Author

jbemmel commented Mar 26, 2025

Is there supposed to be a default VLAN to be used on that link for the global IP routing, or is it just a VLAN trunk like any other trunk?

All VLANs - including the default VLAN - need to be allowed on the peerlink. On Cumulus the default VLAN is not used for IP routing - no IP address assigned to it - but removing it causes test cases to fail (without clear indications of why)

@ipspace
Owner

ipspace commented Mar 26, 2025

BTW, there's no need to set allocation: p2p on the P2P pool. You can set prefix.allocation on any link (or VLAN) and still get the IPv4/IPv6 address assigned from the pool. For example:

provider: clab

nodes:
  a:
    device: frr
    id: 33
  b:
    device: linux
    id: 44

links:
- interfaces: [ a,b ]
  prefix.allocation: p2p
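To see why the allocation strategy matters for the node IDs in the topology above (33 and 44), here is a simplified model, not netlab's actual allocation code. It assumes that a p2p strategy hands out host addresses by position on the link, while an ID-based strategy uses the node ID as the host offset. The function `allocate`, the strategy label `id_based`, and the prefix `10.1.0.0/30` are illustrative assumptions.

```python
import ipaddress
from typing import Dict, List


def allocate(prefix: str, node_ids: List[int], strategy: str) -> Dict[int, str]:
    """Assign one address per node from prefix (simplified, hypothetical model)."""
    net = ipaddress.ip_network(prefix)
    result = {}
    for position, node_id in enumerate(node_ids, start=1):
        # p2p: first node gets host 1, second gets host 2, and so on.
        # anything else: treat the node ID as the host offset.
        offset = position if strategy == "p2p" else node_id
        # Offsets at or beyond the broadcast address do not fit the prefix.
        if offset >= net.num_addresses - 1:
            raise ValueError(f"node {node_id}: offset {offset} does not fit in {net}")
        result[node_id] = f"{net[offset]}/{net.prefixlen}"
    return result


print(allocate("10.1.0.0/30", [33, 44], "p2p"))

try:
    allocate("10.1.0.0/30", [33, 44], "id_based")
except ValueError as err:
    print("allocation failed:", err)
```

In this model the p2p strategy succeeds on a /30 regardless of node IDs, while the ID-based one fails as soon as a node ID exceeds the prefix size.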

@@ -113,6 +113,12 @@ links:
mlag.peergroup: True # (also) used to derive a unique MAC address for this group of MLAG peers
```

### Peerlink configuration
Owner

This left me a bit confused. I think it would be nice to have two short examples: one documenting how the peer link can be used as a pure routed link (although we know there are other things going on in the background), and another describing how to use VLANs on the peer link (example use case: VRF Lite).

It's also worth mentioning that you chose to have netlab configure "allow all VLANs" on the peer link. That is not always the case in real-life deployments.

@jbemmel
Collaborator Author

jbemmel commented Mar 26, 2025

You can set prefix.allocation on any link

I did try that first too, but it didn't seem to work. I'll try again

Update: Here's why that didn't work in this case: #2090

Successfully merging this pull request may close these issues.

lag module should model peerlink as regular VLAN