Description
When containers are started in rootless mode and request an ephemeral host port, container starts can fail with:
error while calling RootlessKit PortManager.AddPort(): cannot expose port X: listen tcp 0.0.0.0:X: bind: address already in use
Retrying the start usually picks a different port and succeeds.
Diagnosis
daemon/libnetwork/portmappers/nat/mapper_linux.go::MapPorts performs two distinct host-port binds:
1. OSAllocator.RequestPortsInRange (osallocator_linux.go) reserves a port in portallocator's in-memory map and calls bind(2) (osallocator_linux.go) as a probe, retrying on EADDRINUSE (maxAllocateAttempts, osallocator_linux.go). Under rootless Docker, this bind happens in dockerd's network namespace, i.e. rootlesskit's child netns.
2. configPortDriver (mapper_linux.go) then calls rlkclient.PortDriverClient.AddPort, which makes rootlesskit perform a second bind in the host (parent) netns at rootlesskit/pkg/port/builtin/parent/tcp/tcp.go. On EADDRINUSE, the error propagates straight back to the API client with no retry (mapper_linux.go).
The two netns have separate kernel port tables. The retry in step 1 catches collisions in the child netns but provides no protection against host-netns collisions in step 2. The OSAllocator's bind probe verifies a port is free in the wrong network namespace.
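To make the mismatch concrete, here is a toy model of the two steps above. It is only a sketch: the `NetNS` class is a hypothetical stand-in for a per-namespace kernel port table, not anything from moby or rootlesskit, and no real sockets or namespaces are involved.

```python
import errno


class NetNS:
    """Toy stand-in for one network namespace's table of bound TCP ports."""

    def __init__(self, name):
        self.name = name
        self.bound = set()

    def bind(self, port):
        # Mimic bind(2): fail with EADDRINUSE if the port is taken in THIS table.
        if port in self.bound:
            raise OSError(errno.EADDRINUSE, f"address already in use ({self.name}:{port})")
        self.bound.add(port)


child = NetNS("child")    # where dockerd's probe bind runs (step 1)
parent = NetNS("parent")  # where rootlesskit's bind runs (step 2)

parent.bind(33000)        # an unrelated host process already holds the port

child.bind(33000)         # step 1: the probe succeeds, the child table is empty
probe_ok = True

try:
    parent.bind(33000)    # step 2: fails, the parent table already has the port
    step2_errno = None
except OSError as e:
    step2_errno = e.errno

print(probe_ok, step2_errno == errno.EADDRINUSE)
```

The probe in step 1 can only ever consult `child.bound`, so its retry loop never sees the entry in `parent.bound`.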
Proposed fix
Retry the allocate-and-bind cycle when pdc.AddPort reports EADDRINUSE. Sadly, the IPC layer flattens errno into a string, so unless you're willing to do a coordinated API change, this will require matching a sentinel inside the error message string.
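A rough sketch of what that retry could look like, written in Python for brevity (the real change would be Go in mapper_linux.go). Everything here is hypothetical: `map_port_with_retry`, `AddPortError`, `SENTINEL`, and the retry budget are illustration names, and the sentinel is assumed to be the "address already in use" substring seen in the error above.

```python
import itertools

SENTINEL = "address already in use"  # assumption: substring of the flattened error text
MAX_ATTEMPTS = 10                    # hypothetical retry budget


class AddPortError(Exception):
    """Stand-in for the string-only error that comes back over the IPC layer."""


def map_port_with_retry(allocate, add_port, release, max_attempts=MAX_ATTEMPTS):
    """Repeat allocate -> add_port until it succeeds or the budget runs out.

    Assumes max_attempts >= 1. Only errors carrying the sentinel are retried;
    anything else is re-raised immediately.
    """
    last_err = None
    for _ in range(max_attempts):
        port = allocate()
        try:
            add_port(port)
            return port
        except AddPortError as err:
            release(port)                  # give the child-netns reservation back
            if SENTINEL not in str(err):
                raise
            last_err = err
    raise last_err


# Usage: the host already holds 33000 and 33001, so the third candidate wins.
busy_on_host = {33000, 33001}
candidates = itertools.count(33000)
released = []


def allocate():
    return next(candidates)


def add_port(port):
    if port in busy_on_host:
        raise AddPortError(
            f"cannot expose port {port}: listen tcp 0.0.0.0:{port}: bind: {SENTINEL}"
        )


chosen = map_port_with_retry(allocate, add_port, released.append)
print(chosen, released)
```

One design point the sketch glosses over: after releasing a port, the allocator has to avoid handing the same one straight back, otherwise the retries are wasted.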
Reproduce
#!/usr/bin/env bash
set -u

# Refuse to run inside dockerd's netns: the hammer must bind in the host netns.
DOCKERD_PID="$(pgrep -x dockerd | head -1 || true)"
if [[ -z "$DOCKERD_PID" ]]; then
    echo "ERROR: dockerd not running" >&2; exit 1
fi
if [[ "$(readlink /proc/self/ns/net)" == "$(readlink /proc/"$DOCKERD_PID"/ns/net)" ]]; then
    echo "ERROR: this shell shares dockerd's netns; re-run from a plain user shell." >&2
    exit 1
fi

LO="${LO:-33000}"
HI="${HI:-33099}"
ITERATIONS="${ITERATIONS:-100}"
PARALLEL="${PARALLEL:-10}"
HOLD_S="${HOLD_S:-0.05}"
GAP_S="${GAP_S:-0.05}"
BACKOFF_S="${BACKOFF_S:-0.5}"
RETRY_BUDGET="${RETRY_BUDGET:-10}"

echo "attacking host ports $LO-$HI"
docker pull alpine:3 >/dev/null

# Background hammer: one thread per port, each holding the port for HOLD_S
# and then leaving it free for GAP_S, in the host netns.
python3 - "$LO" "$HI" "$HOLD_S" "$GAP_S" "$BACKOFF_S" <<'PY' &
import socket, sys, threading, time, random

lo, hi = int(sys.argv[1]), int(sys.argv[2])
HOLD_S, GAP_S, BACKOFF_S = float(sys.argv[3]), float(sys.argv[4]), float(sys.argv[5])
print(f"hammering host-netns ports {lo}-{hi} with one worker per port", flush=True)

def worker(port):
    # Random initial offset so the workers are not in phase with each other.
    time.sleep(random.uniform(0, HOLD_S + GAP_S))
    while True:
        try:
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.bind(("0.0.0.0", port))
            s.listen(1)
        except OSError:
            # Port is taken (possibly by docker); back off and try again.
            time.sleep(BACKOFF_S)
            continue
        time.sleep(HOLD_S)
        s.close()
        time.sleep(GAP_S)

for port in range(lo, hi + 1):
    threading.Thread(target=worker, args=(port,), daemon=True).start()
while True:
    time.sleep(60)
PY
HAMMER_PID=$!
trap 'kill $HAMMER_PID 2>/dev/null' EXIT
sleep 0.5

echo "running $ITERATIONS docker iterations, up to $PARALLEL in parallel..."
export LO HI
results=$(seq 1 "$ITERATIONS" | xargs -P "$PARALLEL" -I{} bash -c '
    iter="$1"
    if out=$(docker run --rm -p "${LO}-${HI}:80" alpine:3 true 2>&1); then
        printf "iter %3d OK\n" "$iter"
    else
        printf "iter %3d FAIL: %s\n" "$iter" "$(echo "$out" | tr "\n" " " | sed "s/  */ /g")"
    fi
' _ {})
echo "$results"

fail=$(grep -c ' FAIL' <<<"$results")
echo
awk -v hold="$HOLD_S" -v gap="$GAP_S" -v retries="$RETRY_BUDGET" \
    -v fail="$fail" -v total="$ITERATIONS" \
    'BEGIN {
        p = hold / (hold + gap);
        printf "expected per-attempt failure rate: %7.3f%%\n", p * 100;
        printf "expected after %dx retry (if fixed): %7.3f%%\n", retries, (p^retries) * 100;
        printf "observed failure rate: %7.3f%% (%d/%d)\n", (fail/total) * 100, fail, total;
    }'
EDIT: Ugh, sorry the "reproducer" script didn't make sense, fixed it.
EDIT2: Added desync to hammer workers so the p^retries assumption holds.
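For reference, the arithmetic behind the script's "expected" lines, redone in Python with the default settings. This is the same simple model the awk block uses, under the assumptions that each port is occupied for a HOLD_S / (HOLD_S + GAP_S) fraction of the time and that successive attempts are independent; it ignores BACKOFF_S and any other traffic on the host. The function name is made up for this example.

```python
def expected_failure_rates(hold_s, gap_s, retries):
    """Return (per-attempt failure probability, probability that all retries fail)."""
    p = hold_s / (hold_s + gap_s)   # fraction of time a given port is held
    return p, p ** retries          # independence assumption: failures multiply


# Script defaults: HOLD_S=0.05, GAP_S=0.05, RETRY_BUDGET=10
p, p_retry = expected_failure_rates(0.05, 0.05, 10)
print(f"per-attempt: {p:.3%}, after 10 tries: {p_retry:.3%}")
```

With the defaults this gives a 50% chance of collision per attempt and roughly 0.1% after ten independent attempts, which is the gap a retry in the port-driver path would be expected to close.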
Expected behavior
Port allocation succeeds.
docker version
Client:
 Version: 26.1.5+dfsg1
 API version: 1.45
 Go version: go1.24.4
 Git commit: a72d7cd
 Built: Sat May 9 11:34:09 2026
 OS/Arch: linux/amd64
 Context: default

Server:
 Engine:
  Version: 26.1.5+dfsg1
  API version: 1.45 (minimum version 1.24)
  Go version: go1.24.4
  Git commit: 411e817
  Built: Sat May 9 11:34:09 2026
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.7.24~ds1
  GitCommit: 1.7.24~ds1-6+deb13u1
 runc:
  Version: 1.1.15+ds1
  GitCommit: 1.1.15+ds1-2+b4
 docker-init:
  Version: 0.19.0
  GitCommit:
docker info
Client:
 Version: 26.1.5+dfsg1
 Context: default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version: 0.13.1+ds1
    Path: /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version: 2.26.1-4
    Path: /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 20
  Running: 20
  Paused: 0
  Stopped: 0
 Images: 43
 Server Version: 26.1.5+dfsg1
 Storage Driver: btrfs
  Btrfs:
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 1.7.24~ds1-6+deb13u1
 runc version: 1.1.15+ds1-2+b4
 init version:
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.12.90+deb13.1-amd64
 Operating System: Debian GNU/Linux 13 (trixie)
 OSType: linux
 Architecture: x86_64
 CPUs: 28
 Total Memory: 62.49GiB
 Name: work
 ID: 0b64ca5d-b38b-4de3-abe4-9dfdc0b6965b
 Docker Root Dir: /var/containers
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
Additional Info
No response