feat/socket family restrict #128

Open
dorser wants to merge 7 commits into micromize-dev:main from dorser:feat/socket-family-restrict
dorser (Collaborator) commented May 19, 2026

Summary

Evolves socket-restrict from an AF_ALG-only block into a runtime-configurable socket-family / AF_NETLINK-protocol deny-list, populated from BPF maps. Defaults are intentionally conservative to avoid breaking common cloud-native workloads — anything that could regress Kubernetes networking (NETLINK_NETFILTER, AF_PACKET, IPsec sidecars, etc.) is opt-in.

Behavior change

Heads-up for users on the previous iteration of this branch: earlier revisions of this PR had AF_PACKET, AF_VSOCK, NETLINK_NETFILTER, NETLINK_XFRM, NETLINK_AUDIT, and NETLINK_KOBJECT_UEVENT in the default deny-list. They have been moved to opt-in to avoid breaking MetalLB, keepalived, kube-proxy IPVS, firecracker/kata, iptables-nft, nft-based CNIs (Istio CNI included), and IPsec sidecars. Final defaults are listed below.

The new defaults preserve the original AF_ALG / CVE-2026-31431 mitigation (existing event-type IDs EVENT_TYPE_SOCKET_AF_ALG_{CREATE,BIND} are kept) and additionally deny only niche/legacy families with no realistic cloud-native use.

Default deny-list (out of the box)

| Family | Number | Rationale |
| --- | --- | --- |
| AF_ALG | 38 | Kernel crypto userspace API; preserves the original CVE-2026-31431 mitigation. |
| AF_TIPC | 30 | Cluster-IPC protocol with multiple historical kernel LPEs; no Kubernetes workload uses it. |
| AF_RDS | 21 | Reliable datagram sockets, multiple historical LPEs. |
| AF_SMC | 43 | Shared-memory comms, niche. |
| AF_CAN | 29 | Controller-area-network bus, automotive. |
| AF_NFC | 39 | Near-field-comms stack. |
| AF_BLUETOOTH | 31 | Bluetooth stack, never legitimate in containers. |
| AF_AX25 | 3 | Amateur radio. |
| AF_ATMPVC | 8 | ATM permanent VC. |
| AF_ATMSVC | 20 | ATM switched VC. |
| AF_X25 | 9 | X.25 networking. |
| AF_KCM | 41 | Kernel connection multiplexer. |
| AF_CAIF | 37 | Communication CPU interface. |

Opt-in (set via flags)

| Item | Flag | What it covers | Compatibility caveat |
| --- | --- | --- | --- |
| AF_PACKET | --socket-deny-families=AF_PACKET,… | Raw link-layer sockets | MetalLB, keepalived, tcpdump-in-pod, kube-proxy IPVS, Cilium endpoint operations rely on it. |
| AF_VSOCK | --socket-deny-families=AF_VSOCK,… | virtio-vsock LPE surface (CVE-2024-50264) | firecracker / kata-containers host↔guest agents will break. |
| NETLINK_NETFILTER | --socket-deny-netlink-protocols=NETLINK_NETFILTER,… | The entire nf_tables LPE family: CVE-2022-32250, CVE-2022-34918, CVE-2023-32233, CVE-2024-1086, CVE-2024-26925, CVE-2024-26581, CVE-2024-26809 | iptables-nft, kube-proxy nft mode, Istio CNI, every nft-based CNI plugin uses this. Validate carefully. |
| NETLINK_XFRM | --socket-deny-netlink-protocols=NETLINK_XFRM,… | XFRM / IPsec control plane | IPsec sidecars (strongSwan, Cilium IPsec) will break. |
| NETLINK_AUDIT | --socket-deny-netlink-protocols=NETLINK_AUDIT,… | Linux audit control plane | auditd-style agents will break. |
| NETLINK_KOBJECT_UEVENT | --socket-deny-netlink-protocols=NETLINK_KOBJECT_UEVENT,… | uevent channel | udev-like consumers will break. |
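
Because subtractive flags are listed as out of scope, opting in to an extra family appears to require restating the whole list. The sketch below is only an illustration of that: it renders a full --socket-deny-families value from the default table plus any opt-ins. The helper name familiesFlag and the variable defaultFamilies are hypothetical and not part of this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultFamilies mirrors the out-of-the-box deny-list table in this description.
var defaultFamilies = []string{
	"AF_ALG", "AF_TIPC", "AF_RDS", "AF_SMC", "AF_CAN", "AF_NFC", "AF_BLUETOOTH",
	"AF_AX25", "AF_ATMPVC", "AF_ATMSVC", "AF_X25", "AF_KCM", "AF_CAIF",
}

// familiesFlag (hypothetical helper) renders a complete flag argument:
// the defaults first, followed by any opt-in families.
func familiesFlag(extras ...string) string {
	// Copy the defaults so the package-level slice is never modified.
	all := append(append([]string{}, defaultFamilies...), extras...)
	return "--socket-deny-families=" + strings.Join(all, ",")
}

func main() {
	// Opt in to AF_VSOCK on top of the defaults.
	fmt.Println(familiesFlag("AF_VSOCK"))
}
```

Generating the value this way keeps the defaults in one place, so an opt-in does not silently drop AF_ALG or another default entry.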

Recommended rollout (audit → enforce)

  1. Audit, defaults only. Deploy with --enforce=false and the default --socket-deny-families. Watch for socket_family_denied_create / _bind events. The defaults should produce ~zero events on a normal Kubernetes data plane.
  2. Audit, broaden. Once defaults are clean, opt-in additional families/protocols incrementally — e.g. --socket-deny-families=AF_ALG,…,AF_VSOCK for clusters with no vsock workloads, or --socket-deny-netlink-protocols=NETLINK_NETFILTER for clusters using iptables-legacy / pure IPVS. Keep --enforce=false. Validate against your specific data plane (CNI, kube-proxy mode, service-mesh CNI, MetalLB, IPsec sidecars).
  3. Enforce. When the audit log is clean for the target families/protocols across all production workload shapes for at least a release cycle, flip --enforce=true.
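
The difference between the audit and enforce steps can be modeled as a small truth table. This is a userspace sketch in Go, not the gadget's code: the verdict type and decide function are hypothetical, and it assumes that an event is also emitted when a call is blocked in enforce mode, which this description does not state explicitly.

```go
package main

import "fmt"

// verdict is a hypothetical summary of what a hook does for one socket call.
type verdict struct {
	EmitEvent bool // an event is reported to userspace
	Block     bool // the syscall fails instead of succeeding
}

// decide models the global --enforce switch: deny-listed calls always
// produce an event (assumption for enforce mode), but are only blocked
// when enforce is true.
func decide(onDenyList, enforce bool) verdict {
	if !onDenyList {
		return verdict{}
	}
	return verdict{EmitEvent: true, Block: enforce}
}

func main() {
	fmt.Printf("audit:   %+v\n", decide(true, false))
	fmt.Printf("enforce: %+v\n", decide(true, true))
	fmt.Printf("allowed: %+v\n", decide(false, true))
}
```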

What's in this PR

BPF

  • gadgets/socket-restrict/program.bpf.c — replaces the hard-coded switch with two BPF maps (denied_families keyed by __u16, denied_netlink_protocols keyed by __u32). Lookups happen in lsm/socket_create and lsm/socket_bind.
  • Bind-path micro-optimization: sk_protocol is only read via BPF_CORE_READ_BITFIELD_PROBED when family == AF_NETLINK and the family is not already denied. Non-netlink binds skip the field-read entirely.
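
The lookup order on the bind path can be sketched as follows. This is a Go model meant to run in userspace, not the BPF C from program.bpf.c: Go maps stand in for the two BPF maps, and the readProtocol callback stands in for the sk_protocol read. The type and function names are hypothetical.

```go
package main

import "fmt"

const afNetlink = 16 // AF_NETLINK on Linux

// denyLists stands in for the two BPF maps.
type denyLists struct {
	families         map[uint16]bool // models denied_families (keyed by __u16)
	netlinkProtocols map[uint32]bool // models denied_netlink_protocols (keyed by __u32)
}

// bindDenied follows the described order: family lookup first, and the
// protocol is read only for AF_NETLINK sockets whose family is not
// already denied.
func (d denyLists) bindDenied(family uint16, readProtocol func() uint32) bool {
	if d.families[family] {
		return true // denied by family, no protocol read needed
	}
	if family != afNetlink {
		return false // non-netlink binds skip the field read entirely
	}
	return d.netlinkProtocols[readProtocol()]
}

func main() {
	d := denyLists{
		families:         map[uint16]bool{38: true}, // AF_ALG
		netlinkProtocols: map[uint32]bool{12: true}, // NETLINK_NETFILTER
	}
	reads := 0
	proto := func(p uint32) func() uint32 {
		return func() uint32 { reads++; return p }
	}
	fmt.Println(d.bindDenied(38, proto(0)))  // AF_ALG
	fmt.Println(d.bindDenied(2, proto(0)))   // AF_INET
	fmt.Println(d.bindDenied(16, proto(12))) // AF_NETLINK + NETLINK_NETFILTER
	fmt.Println(d.bindDenied(16, proto(0)))  // AF_NETLINK + NETLINK_ROUTE
	fmt.Println("protocol reads:", reads)
}
```

In this model the protocol callback runs only for the two AF_NETLINK calls, which is the saving the bullet above describes.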

Userspace / wiring

  • New internal/operators/socket_restrict.go — operator that populates map/denied_families and map/denied_netlink_protocols on each gadget's init (no-op when those maps are absent, i.e. for the other 4 gadgets).
  • Name ↔ number tables for socket families (AF_*) and netlink protocols (NETLINK_*); flags accept symbolic names or decimal numbers, case-insensitive, with whitespace-trimming and dedup.
  • New CLI flags --socket-deny-families (conservative default above) and --socket-deny-netlink-protocols (empty default).
  • Existing event-type IDs (EVENT_TYPE_SOCKET_AF_ALG_{CREATE,BIND} = 11/12 and EVENT_TYPE_SOCKET_FAMILY_DENIED_{CREATE,BIND} = 14/15) and the family / protocol event fields are preserved.
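
The flag parsing rules described above (symbolic names or decimal numbers, case-insensitive, whitespace-trimmed, deduplicated) could look roughly like this. It is a minimal sketch, not the code in internal/operators/socket_restrict.go: parseFamilies and familyNumbers are hypothetical names, the table is only a subset, and skipping empty items is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// familyNumbers is an illustrative subset of the AF_* name-to-number table,
// using Linux values (most appear in the default deny-list above).
var familyNumbers = map[string]uint16{
	"AF_AX25": 3, "AF_ATMPVC": 8, "AF_X25": 9, "AF_PACKET": 17,
	"AF_ATMSVC": 20, "AF_RDS": 21, "AF_CAN": 29, "AF_TIPC": 30,
	"AF_BLUETOOTH": 31, "AF_CAIF": 37, "AF_ALG": 38, "AF_NFC": 39,
	"AF_VSOCK": 40, "AF_KCM": 41, "AF_SMC": 43,
}

// parseFamilies (hypothetical name) turns a comma-separated flag value into
// a deduplicated list of family numbers, preserving first-seen order.
func parseFamilies(value string) ([]uint16, error) {
	seen := make(map[uint16]bool)
	var out []uint16
	for _, tok := range strings.Split(value, ",") {
		tok = strings.TrimSpace(tok)
		if tok == "" {
			continue // assumption: empty items such as a trailing comma are ignored
		}
		num, ok := familyNumbers[strings.ToUpper(tok)]
		if !ok {
			// Not a known name, so try a decimal number that fits in 16 bits.
			n, err := strconv.ParseUint(tok, 10, 16)
			if err != nil {
				return nil, fmt.Errorf("unknown socket family %q", tok)
			}
			num = uint16(n)
		}
		if !seen[num] {
			seen[num] = true
			out = append(out, num)
		}
	}
	return out, nil
}

func main() {
	fams, err := parseFamilies(" af_alg, 38 ,AF_VSOCK,30")
	fmt.Println(fams, err) // prints "[38 40 30] <nil>"
}
```

Deduplicating on the numeric value rather than the token means that af_alg and 38 collapse into one map key, which matches the mixed names/numbers case the tests are said to cover.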

Tests

  • internal/operators/socket_restrict_test.go — covers parsing (mixed names/numbers, case insensitivity, dedup, errors).
  • cmd/micromize/root_test.go — extended TestBuildDisabledSet with socket-restrict.
  • internal/gadget/registry_test.go — registry coverage for all 5 gadgets.
  • tests/integration/probes/af_vsock/main.go + tests/integration/cases/11_af_vsock_audit_mode.sh — opt-in AF_VSOCK + --enforce=false exercise (probe must observe socket() succeeding while the gadget emits an event).

Docs

  • gadgets/socket-restrict/README.md rewritten with the new defaults, per-flag compatibility caveats, and the audit → enforce rollout.
  • Top-level README.md updated: bullet reflects conservative defaults and points at the gadget README; new CLI flags added to the flags table.

Out of scope (future PRs)

  • Allow-list ("subtractive") flags to remove individual entries from the default deny-list without specifying the full list.
  • Per-namespace / per-pod deny-list overrides.
  • True per-family audit vs enforce modes (today --enforce is global).

Copilot AI review requested due to automatic review settings May 19, 2026 05:14

Copilot AI left a comment


Pull request overview

Expands the socket-restrict gadget from blocking only AF_ALG sockets to a baked-in deny-list of high-risk socket families (e.g. AF_VSOCK, AF_PACKET, AF_TIPC, AF_RDS, AF_SMC, AF_CAN, AF_NFC, AF_BLUETOOTH, etc.) and selected AF_NETLINK protocols (NETLINK_NETFILTER, NETLINK_XFRM, NETLINK_AUDIT, NETLINK_KOBJECT_UEVENT) used in container-escape and LPE chains. Adds new event types, a protocol field, and output formatting for them, while preserving the existing AF_ALG visibility path.

Changes:

  • BPF program now applies a switch-based family/protocol deny-list at lsm/socket_create and lsm/socket_bind, reading sk->sk_protocol via CO-RE on bind to determine netlink protocol.
  • New EVENT_TYPE_SOCKET_FAMILY_DENIED_{CREATE,BIND} events (14, 15) plumbed through the C header, Go operator constants, and output formatter (with family/protocol field decoding).
  • Documentation (root README.md, gadget README.md, gadget.yaml) updated to describe the new scope and the default deny-list.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| README.md | Updates the socket-restrict bullet to cover the broader family/netlink deny-list and additional CVEs. |
| include/micromize/event_types.h | Adds EVENT_TYPE_SOCKET_FAMILY_DENIED_{CREATE,BIND} = 14/15. |
| gadgets/socket-restrict/program.bpf.h | Adds AF_* and NETLINK_* fallback macros and a protocol field on struct event. |
| gadgets/socket-restrict/program.bpf.c | Introduces is_denied_family, generalizes both LSM hooks, reads sk_protocol via CO-RE in bind, preserves AF_ALG details. |
| gadgets/socket-restrict/gadget.yaml | Documents the new protocol data field. |
| gadgets/socket-restrict/README.md | Rewrites scope, adds default deny-list table and updated hook descriptions. |
| internal/operators/operators.go | Adds new event-type constants and name mappings (14/15). |
| internal/operators/output.go | Adds family/netlink-protocol decode tables and output helpers for the new events. |
| internal/gadget/registry_test.go | Adds a registration test covering all default gadgets including socket-restrict. |
| cmd/micromize/root_test.go | Adds a case asserting socket-restrict can be disabled via --disable-gadgets. |
