Description
Is there an existing issue for this?
- I have searched the existing issues
What happened?
Today I tried upgrading from 1.16.0-pre.2 (which I run for an unrelated fix that has probably been backported by now, but why mess with success?) to 1.16.0-pre.3, and the cilium pods went into CrashLoopBackOff. According to the logs, cilium-agent dies with one of the following two errors:
time=2024-06-05T10:26:51Z level=error msg="Invoke failed" error="reconciler.Config[github.com/cilium/cilium/pkg/maps/bwmap.Edt].Table cannot be nil" function="bwmap.registerReconciler (.../maps/bwmap/cell.go:41)"
or
time="2024-06-05T10:27:16Z" level=fatal msg="failed to start: reconciler.Config[github.com/cilium/cilium/pkg/maps/bwmap.Edt].Table cannot be nil" subsys=daemon
No other relevant errors or warnings.
Looking at the codebase, I see it's related to the bandwidthManager configuration. I have bandwidthManager.enabled: true and .bbr: true in my helm values, but otherwise no bandwidth-related policies configured. This was working fine under 1.16.0-pre.2, and I don't see any new helm chart values related to bandwidthManager, so I'm at a loss as to what the source of this error could be.
I'm wondering if this could be a k8s 1.30 / Talos Linux related issue. It could also be that I'm using bandwidthManager improperly without fully understanding the implications (e.g. combined with WireGuard encryption), but it has been working well for me in the past and has provided latency improvements.
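For reference, the bandwidthManager settings mentioned above correspond roughly to the following Helm values excerpt. This is a minimal sketch reconstructed from the two settings named in this report; everything else in the values file is omitted, and the exact layout is an assumption.

```yaml
# Helm values excerpt (sketch; reconstructed from the description, other values omitted)
bandwidthManager:
  enabled: true   # bandwidthManager.enabled
  bbr: true       # bandwidthManager.bbr
```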
Cilium Version
cilium-cli: v0.16.8 compiled with go1.22.3 on linux/amd64
cilium image (default): v1.15.5
cilium image (stable): v1.15.5
cilium image (running): unknown. Unable to obtain cilium version. Reason: release: not found
Ran into the issue on upgrade of 1.16.0-pre.2 to 1.16.0-pre.3 using argo-cd.
Kernel Version
Linux version 6.6.32-talos (@buildkitsandbox) (gcc (GCC) 13.2.0, GNU ld (GNU Binutils) 2.42) #1 SMP Tue May 28 12:51:33 UTC 2024
Kubernetes Version
Client Version: v1.30.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.0
Regression
No response
Sysdump
No response
Relevant log output
cilium-dbg status output on previously working 1.16.0-pre.2 install:
KVStore: Ok Disabled
Kubernetes: Ok 1.30 (v1.30.0) [linux/arm64]
Kubernetes APIs: ["EndpointSliceOrEndpoint", "cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "cilium/v2alpha1::CiliumCIDRGroup", "core/v1::Namespace", "core/v1::Pods", "core/v1::Service", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement: True [enx0************ ***************************** (Direct Routing)]
Host firewall: Disabled
SRv6: Disabled
CNI Chaining: none
CNI Config file: successfully wrote CNI configuration file to /host/etc/cni/net.d/05-cilium.conflist
Cilium: Ok 1.16.0-pre.2 (v1.16.0-pre.2-1bc9e514)
NodeMonitor: Listening for events on 128 CPUs with 64x4096 of shared memory
Cilium health daemon: Ok
IPAM: IPv4: 4/254 allocated from 10.244.3.0/24,
ClusterMesh: 0/0 clusters ready, 0 global-services
IPv4 BIG TCP: Disabled
IPv6 BIG TCP: Disabled
BandwidthManager: EDT with BPF [BBR] [enx0*************]
Routing: Network: Tunnel [vxlan] Host: BPF
Attach Mode: TCX
Masquerading: BPF [enx00cbea9e64a2] 10.244.3.0/24 [IPv4: Enabled, IPv6: Disabled]
Controller Status: 25/25 healthy
Proxy Status: OK, ip 10.244.3.174, 0 redirects active on ports 10000-20000, Envoy: external
Global Identity Range: min 65536, max 131071
Hubble: Ok Current/Max Flows: 4095/4095 (100.00%), Flows/s: 8.14 Metrics: Ok
Encryption: Wireguard [NodeEncryption: Disabled, cilium_wg0 (Pubkey: ******************************, Port: 51871, Peers: 4)]
Cluster health: 5/5 reachable (2024-06-05T12:09:59Z)
Modules Health: Stopped(0) Degraded(0) OK(42)
Anything else?
No response
Cilium Users Document
- Are you a user of Cilium? Please add yourself to the Users doc
Code of Conduct
- I agree to follow this project's Code of Conduct