fix(encapsulation): introduce Cilium support#409
Rewrite Cilium encapsulator to create IPIP tunnels instead of using cilium_host interface directly. Each node autodiscovers its cilium_host IP and advertises it via kilo.squat.ai/cilium-internal-ip annotation, allowing other nodes to route IPIP outer packets through Cilium's VxLAN overlay and preventing routing loops. Co-Authored-By: Claude <[email protected]> Signed-off-by: Andrei Kvapil <[email protected]>
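The autodiscovery step described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: firstIPv4, exampleAddrs and the hard-coded addresses are hypothetical, and on a real node the address list would come from the cilium_host interface (for example via net.InterfaceByName and Addrs) rather than from a fixture.

```go
package main

import (
	"fmt"
	"net"
)

// Annotation key as named in the commit message.
const annotationKey = "kilo.squat.ai/cilium-internal-ip"

// firstIPv4 returns the first IPv4 address in addrs, or nil if there is none.
// Hypothetical helper for illustration only.
func firstIPv4(addrs []net.Addr) net.IP {
	for _, a := range addrs {
		ipn, ok := a.(*net.IPNet)
		if !ok {
			continue
		}
		if v4 := ipn.IP.To4(); v4 != nil {
			return v4
		}
	}
	return nil
}

// exampleAddrs stands in for the addresses assigned to cilium_host.
// The values are made up for the example.
func exampleAddrs() []net.Addr {
	return []net.Addr{
		&net.IPNet{IP: net.ParseIP("fe80::1"), Mask: net.CIDRMask(64, 128)},
		&net.IPNet{IP: net.ParseIP("10.244.1.5"), Mask: net.CIDRMask(32, 32)},
	}
}

func main() {
	annotations := map[string]string{}
	// Only advertise the annotation when an IPv4 address was found.
	if ip := firstIPv4(exampleAddrs()); ip != nil {
		annotations[annotationKey] = ip.String()
	}
	fmt.Println(annotations)
}
```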
Align constant block formatting for gofmt, add ciliumInternalIPs to expected topology test segments, use bytes.Equal for nil-safe CiliumInternalIP comparison, and return error from CleanUp. Co-Authored-By: Claude <[email protected]> Signed-off-by: Andrei Kvapil <[email protected]>
staticcheck SA1021 requires net.IP.Equal for IP comparison. Co-Authored-By: Claude <[email protected]> Signed-off-by: Andrei Kvapil <[email protected]>
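For context on why the check matters: net.IP can hold the same IPv4 address in a 4-byte or a 16-byte form, and a byte-wise comparison treats those as different. A small sketch (sameIP and mixedForms are hypothetical names used only here):

```go
package main

import (
	"bytes"
	"fmt"
	"net"
)

// sameIP compares two addresses with net.IP.Equal, which treats the 4-byte
// and 16-byte forms of one IPv4 address as equal.
func sameIP(a, b net.IP) bool {
	return a.Equal(b)
}

// mixedForms returns the same address in its 16-byte and 4-byte forms.
func mixedForms() (net.IP, net.IP) {
	v16 := net.ParseIP("10.244.1.5")
	return v16, v16.To4()
}

func main() {
	v16, v4 := mixedForms()
	// Byte-wise comparison sees different lengths and reports a mismatch.
	fmt.Println("bytes.Equal:", bytes.Equal(v16, v4))
	// net.IP.Equal normalizes the two forms before comparing.
	fmt.Println("IP.Equal:   ", sameIP(v16, v4))
	// Equal is also safe on nil values: two unset IPs compare as equal,
	// and an unset IP never equals a set one.
	var unset net.IP
	fmt.Println("nil vs nil: ", sameIP(unset, nil))
	fmt.Println("nil vs v4:  ", sameIP(unset, v4))
}
```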
When running with --local=false and --compatibility=cilium, the leader node did not create IPIP return routes for non-leader nodes in the same location. This caused asymmetric routing: non-leaders encapsulated traffic to the leader via IPIP (through Cilium's VxLAN overlay), but the leader sent replies directly via the physical interface, which could be dropped by cloud networks blocking IP protocol 4 or by reverse path filtering on the non-leaders.
Add a new routing block for the !local case that creates:
- Routes in table 1107 using the overlay gateway (Cilium internal IP) so IPIP outer packets traverse the VxLAN tunnel
- Policy rules matching traffic arriving on the WireGuard interface (iif kilo0) destined for non-leader private IPs
This only activates when the encapsulator returns a gateway different from the node's private IP, i.e. when an overlay like Cilium is in use.
Co-Authored-By: Claude <[email protected]> Signed-off-by: Andrei Kvapil <[email protected]>
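The activation condition and the two kinds of entries described above could be modelled along these lines. This is only a sketch under assumptions: the node and returnPath types, planReturnPaths, exampleNodes and the sample addresses are hypothetical, IPv4 is assumed for the /32 destinations, and the real change programs kernel routes and rules rather than building a plan.

```go
package main

import (
	"fmt"
	"net"
)

// Routing table number taken from the commit message.
const kiloTable = 1107

// node is a hypothetical, simplified view of a non-leader in the leader's location.
type node struct {
	Name      string
	PrivateIP net.IP
	OverlayGW net.IP // gateway returned by the encapsulator, e.g. the Cilium internal IP
}

// returnPath pairs a route in the dedicated table with the policy rule that selects it.
type returnPath struct {
	Dst   string // non-leader private IP as a /32 (IPv4 assumed)
	Via   string // overlay gateway used for the IPIP outer packet
	Table int
	Iif   string // rule matches traffic arriving on this interface
}

// planReturnPaths emits an entry only when the overlay gateway differs from
// the node's private IP, i.e. when an overlay such as Cilium is in use.
func planReturnPaths(nonLeaders []node, iif string) []returnPath {
	var out []returnPath
	for _, n := range nonLeaders {
		if n.OverlayGW == nil || n.OverlayGW.Equal(n.PrivateIP) {
			continue
		}
		out = append(out, returnPath{
			Dst:   n.PrivateIP.String() + "/32",
			Via:   n.OverlayGW.String(),
			Table: kiloTable,
			Iif:   iif,
		})
	}
	return out
}

// exampleNodes returns made-up nodes: one behind an overlay, one without.
func exampleNodes() []node {
	return []node{
		{Name: "worker-a", PrivateIP: net.ParseIP("10.0.0.2"), OverlayGW: net.ParseIP("10.244.1.5")},
		{Name: "worker-b", PrivateIP: net.ParseIP("10.0.0.3"), OverlayGW: net.ParseIP("10.0.0.3")},
	}
}

func main() {
	for _, p := range planReturnPaths(exampleNodes(), "kilo0") {
		fmt.Printf("rule: iif %s to %s lookup %d; route: %s via %s table %d\n",
			p.Iif, p.Dst, p.Table, p.Dst, p.Via, p.Table)
	}
}
```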
When Cilium's enable-ipip-termination is active, it renames the kernel's tunl0 to cilium_tunl and creates a receive-only cilium_ipip4 device for DSR. The cilium_ipip4 interface cannot transmit packets (TX errors), so use cilium_tunl which supports both TX and RX. If cilium_tunl already exists (Cilium manages it), reuse it. Otherwise, create it so the interface name is consistent regardless of whether enable-ipip-termination is active. Co-Authored-By: Claude <[email protected]> Signed-off-by: Andrei Kvapil <[email protected]>
Instead of always creating cilium_tunl, reuse it only when Cilium has already created it (enable-ipip-termination). Otherwise create the standard tunl0 — Cilium will rename it later if needed. Also ensure the interface is brought UP when reusing cilium_tunl, as Cilium may leave it in DOWN state. Co-Authored-By: Claude <[email protected]> Signed-off-by: Andrei Kvapil <[email protected]>
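The selection rule from this commit, reduced to a pure function so it can be read without netlink details. A rough sketch only: tunnelPlan and planTunnel are hypothetical names, and the existence check is passed in as a function instead of querying the kernel.

```go
package main

import "fmt"

// Interface names as given in the commit messages.
const (
	ciliumTunl  = "cilium_tunl"
	defaultTunl = "tunl0"
)

// tunnelPlan describes which IPIP device to use. In both cases the caller is
// expected to bring the link UP; when reusing cilium_tunl this matters
// because Cilium may leave it DOWN.
type tunnelPlan struct {
	Name   string
	Reused bool // true when an existing Cilium-managed device is reused
}

// planTunnel reuses cilium_tunl when it already exists and otherwise falls
// back to the standard tunl0, which Cilium may rename later.
func planTunnel(exists func(name string) bool) tunnelPlan {
	if exists(ciliumTunl) {
		return tunnelPlan{Name: ciliumTunl, Reused: true}
	}
	return tunnelPlan{Name: defaultTunl, Reused: false}
}

func main() {
	withCilium := func(name string) bool { return name == ciliumTunl }
	without := func(string) bool { return false }
	fmt.Printf("%+v\n", planTunnel(withCilium))
	fmt.Printf("%+v\n", planTunnel(without))
}
```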
force-pushed from 4ea569e to 7abf287
Co-Authored-By: Claude <[email protected]> Signed-off-by: Andrei Kvapil <[email protected]>
pkg/mesh/mesh.go
Outdated
    Key:              m.pub,
    NoInternalIP:     n.NoInternalIP,
    InternalIP:       n.InternalIP,
    CiliumInternalIP: m.enc.LocalIP(),
This part is weird. The encapsulation interface calls this a LocalIP but the mesh seems to think this is special and reserved for Cilium (same with the code in pkg/k8s/backend). This is kind of confusing IMO. Naming is hard.
Let's change the encapsulation interface so that the LocalIP func becomes CNICompatibilityIP and let's have it return a *net.IPNet rather than a net.IP since that is generally more useful and will box us in less in the future.
Then let's change the annotation keys and struct fields to match CNICompatibilityIP.
squat left a comment
This looks really great @kvaps. I only have one renaming request to avoid confusion: the encapsulation interface uses the seemingly reusable term LocalIP, whereas all structs and annotation keys/fields are specific to Cilium. Can we rename the func/fields/keys to CNICompatibilityIP everywhere?
Also, but less urgently, have the encapsulation interface func return a *net.IPNet for consistency with other methods and for future-proofing.
@kvaps this looks great! Can you run |
Rename the encapsulation interface method and all related struct fields and annotation keys from Cilium-specific names to generic CNI compatibility names, as requested in review.
Changes:
- LocalIP() -> CNICompatibilityIP() returning *net.IPNet
- CiliumInternalIP -> CNICompatibilityIP in Node struct
- ciliumInternalIPs -> cniCompatibilityIPs in segment struct
- kilo.squat.ai/cilium-internal-ip -> kilo.squat.ai/cni-compatibility-ip
- Add comments for ignored errors and IPv4 filter in cilium.go
Co-Authored-By: Claude <[email protected]> Signed-off-by: Andrei Kvapil <[email protected]>
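To illustrate what returning a *net.IPNet instead of a net.IP involves, here is a small sketch that wraps a single address in a host-sized network and round-trips it through a string. hostNet and parseHostNet are hypothetical helpers, and storing the annotation value in CIDR form is an assumption made for this example, not something the PR text states.

```go
package main

import (
	"fmt"
	"net"
)

// hostNet wraps a single address in a *net.IPNet with a full-length mask
// (/32 for IPv4, /128 for IPv6). It returns nil for an unset or invalid IP.
func hostNet(ip net.IP) *net.IPNet {
	if v4 := ip.To4(); v4 != nil {
		return &net.IPNet{IP: v4, Mask: net.CIDRMask(32, 32)}
	}
	if v6 := ip.To16(); v6 != nil {
		return &net.IPNet{IP: v6, Mask: net.CIDRMask(128, 128)}
	}
	return nil
}

// parseHostNet parses a CIDR string and keeps the host address in the result
// rather than the masked network address. It returns nil on parse errors.
func parseHostNet(s string) *net.IPNet {
	ip, ipn, err := net.ParseCIDR(s)
	if err != nil {
		return nil
	}
	ipn.IP = ip
	return ipn
}

func main() {
	n := hostNet(net.ParseIP("10.244.1.5"))
	// Assumed annotation value format for this sketch: CIDR notation.
	value := n.String()
	fmt.Println("kilo.squat.ai/cni-compatibility-ip =", value)

	back := parseHostNet(value)
	fmt.Println("round trip preserved IP:", back != nil && back.IP.Equal(n.IP))
}
```

A nil return gives callers one value to check for "no compatibility IP", which fits encapsulators that have nothing to advertise.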
force-pushed from ebcc6f7 to b56ab1b
Done, ran |
Summary
- Each node autodiscovers its cilium_host IP and advertises it via the kilo.squat.ai/cilium-internal-ip annotation
- Add LocalIP() method to Encapsulator interface for overlay IP autodiscovery
- Create tunl0 when cilium_tunl is absent; reuse it when Cilium creates it via enable-ipip-termination
- Add routes under local=false for non-leader overlay routing
Test plan