fix(encapsulation): introduce Cilium support #409

Merged
squat merged 8 commits into squat:main from cozystack:fix/cilium-ipip-overlay-upstream
Feb 28, 2026

@kvaps (Contributor) commented Feb 15, 2026

Summary

  • Rewrite Cilium encapsulator to use IPIP tunnels routed through Cilium's VxLAN overlay, preventing routing loops
  • Each node autodiscovers its cilium_host IP and advertises it via kilo.squat.ai/cilium-internal-ip annotation
  • Add LocalIP() method to Encapsulator interface for overlay IP autodiscovery
  • Fall back to standard tunl0 when cilium_tunl is absent; reuse it when Cilium creates it via enable-ipip-termination
  • Add IPIP return path on leader when local=false for non-leader overlay routing

Test plan

  • Deploy with --compatibility=cilium --encapsulation=crosssubnet and verify IPIP routes use Cilium internal IPs as gateways
  • Verify cross-node pod traffic flows through VxLAN without routing loops
  • Verify non-Cilium encapsulators (IPIP, Flannel, Noop) are unaffected

@kvaps kvaps marked this pull request as ready for review February 15, 2026 21:30
@kvaps kvaps changed the title from "fix(encapsulation): route Cilium IPIP traffic through VxLAN overlay" to "fix(encapsulation): introduce Cilium support" Feb 17, 2026
kvaps and others added 6 commits February 23, 2026 17:49
Rewrite Cilium encapsulator to create IPIP tunnels instead of using
cilium_host interface directly. Each node autodiscovers its cilium_host
IP and advertises it via kilo.squat.ai/cilium-internal-ip annotation,
allowing other nodes to route IPIP outer packets through Cilium's VxLAN
overlay and preventing routing loops.

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Andrei Kvapil <[email protected]>
Align constant block formatting for gofmt, add ciliumInternalIPs
to expected topology test segments, use bytes.Equal for nil-safe
CiliumInternalIP comparison, and return error from CleanUp.

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Andrei Kvapil <[email protected]>
staticcheck SA1021 requires net.IP.Equal for IP comparison.

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Andrei Kvapil <[email protected]>
When running with --local=false and --compatibility=cilium, the leader
node did not create IPIP return routes for non-leader nodes in the same
location. This caused asymmetric routing: non-leaders encapsulated
traffic to the leader via IPIP (through Cilium's VxLAN overlay), but
the leader sent replies directly via the physical interface, which could
be dropped by cloud networks blocking IP protocol 4 or by reverse path
filtering on the non-leaders.

Add a new routing block for the !local case that creates:
- Routes in table 1107 using the overlay gateway (Cilium internal IP)
  so IPIP outer packets traverse the VxLAN tunnel
- Policy rules matching traffic arriving on the WireGuard interface
  (iif kilo0) destined for non-leader private IPs

This only activates when the encapsulator returns a gateway different
from the node's private IP, i.e. when an overlay like Cilium is in use.

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Andrei Kvapil <[email protected]>
When Cilium's enable-ipip-termination is active, it renames the
kernel's tunl0 to cilium_tunl and creates a receive-only cilium_ipip4
device for DSR. The cilium_ipip4 interface cannot transmit packets
(TX errors), so use cilium_tunl which supports both TX and RX.

If cilium_tunl already exists (Cilium manages it), reuse it. Otherwise,
create it so the interface name is consistent regardless of whether
enable-ipip-termination is active.

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Andrei Kvapil <[email protected]>
Instead of always creating cilium_tunl, reuse it only when Cilium has
already created it (enable-ipip-termination). Otherwise create the
standard tunl0 — Cilium will rename it later if needed.

Also ensure the interface is brought UP when reusing cilium_tunl,
as Cilium may leave it in DOWN state.

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Andrei Kvapil <[email protected]>
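
The interface selection described in the last two commits (reuse cilium_tunl if Cilium created it, otherwise fall back to tunl0) might look roughly like the sketch below. `pickTunnelName` is a hypothetical helper; the existence check is injected so the logic can be exercised without touching real interfaces. Creating the device and bringing it UP are not shown.

```go
package main

import (
	"fmt"
	"net"
)

// pickTunnelName returns "cilium_tunl" when that interface already exists
// and "tunl0" otherwise. Hypothetical helper, not the PR's code.
func pickTunnelName(exists func(name string) bool) string {
	if exists("cilium_tunl") {
		return "cilium_tunl"
	}
	return "tunl0"
}

func main() {
	// Check against the interfaces present on this host.
	exists := func(name string) bool {
		_, err := net.InterfaceByName(name)
		return err == nil
	}
	fmt.Println("tunnel interface:", pickTunnelName(exists))
}
```
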
@kvaps kvaps force-pushed the fix/cilium-ipip-overlay-upstream branch from 4ea569e to 7abf287 Compare February 23, 2026 16:49
pkg/mesh/mesh.go Outdated
Key: m.pub,
NoInternalIP: n.NoInternalIP,
InternalIP: n.InternalIP,
CiliumInternalIP: m.enc.LocalIP(),

This part is weird. The encapsulation interface calls this a LocalIP but the mesh seems to think this is special and reserved for Cilium (same with the code in pkg/k8s/backend). This is kind of confusing IMO. Naming is hard.

Let's change the encapsulation interface so that the LocalIP func becomes CNICompatibilityIP, and let's have it return a *net.IPNet rather than a net.IP, since that is generally more useful and will box us in less in the future.

Then let's change the annotation keys and struct fields to match CNICompatibilityIP.

@squat (Owner) left a comment

This looks really great @kvaps. I only have one renaming request, to avoid confusion: the encapsulation interface uses the seemingly reusable term LocalIP, whereas all structs and annotation keys/fields are specific to Cilium. Can we rename the func/fields/keys to CNICompatibilityIP everywhere?

Also, but less urgently, have the encapsulation interface func return a *net.IPNet for consistency with other methods and for future-proofing.

@squat (Owner) commented Feb 27, 2026

@kvaps this looks great! Can you run gofmt on pkg/k8s/backend.go and pkg/mesh/backend.go to satisfy CI? Then we can merge this 🚀

Rename the encapsulation interface method and all related struct
fields and annotation keys from Cilium-specific names to generic
CNI compatibility names, as requested in review.

Changes:
- LocalIP() -> CNICompatibilityIP() returning *net.IPNet
- CiliumInternalIP -> CNICompatibilityIP in Node struct
- ciliumInternalIPs -> cniCompatibilityIPs in segment struct
- kilo.squat.ai/cilium-internal-ip -> kilo.squat.ai/cni-compatibility-ip
- Add comments for ignored errors and IPv4 filter in cilium.go

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Andrei Kvapil <[email protected]>
@kvaps kvaps force-pushed the fix/cilium-ipip-overlay-upstream branch from ebcc6f7 to b56ab1b Compare February 27, 2026 18:29
@kvaps (Contributor, Author) commented Feb 27, 2026

Done, ran gofmt on both files. Should be good now!

@kvaps kvaps requested a review from squat February 27, 2026 18:29
@squat squat merged commit 2805127 into squat:main Feb 28, 2026
14 of 15 checks passed