
Conversation

@am97 (Contributor) commented Oct 4, 2024

fixes #48560

Currently, the DOCKER-USER chains are set up when the firewall is reloaded or when a network is created:

// Sets up the DOCKER-USER chain for each iptables version (IPv4, IPv6)
// that's enabled in the controller's configuration.
for _, ipVersion := range c.enabledIptablesVersions() {
	if err := setupUserChain(ipVersion); err != nil {
		log.G(context.TODO()).WithError(err).Warnf("Controller.NewNetwork %s:", name)
	}
}

During a normal startup, the daemon creates the bridge network, so the DOCKER-USER chains are set up.

But when live-restore is enabled, there may be running containers when the daemon starts. If that's the case, the configureNetworking function will not be called:

moby/daemon/daemon_unix.go

Lines 848 to 852 in 4001d07

if len(activeSandboxes) > 0 {
	log.G(context.TODO()).Info("there are running containers, updated network configuration will not take affect")
} else if err := configureNetworking(daemon.netController, cfg); err != nil {
	return err
}

configureNetworking calls initBridgeDriver, which calls NewNetwork, which calls setupUserChain. So if configureNetworking isn't called, the user chains won't be set up.

This is a problem if the iptables rules change while the daemon is stopped.

- What I did

I made sure the user chains are set up on startup, even if the configureNetworking function is not called.

- How I did it

I put the logic for setting up user chains for IPv4 and IPv6 in a separate function, which is called in the original place in NewNetwork, but also in initNetworkController even if there are running containers.

- How to verify it

I haven't written an integration test yet; I'll add one when I have some time.

To manually test:

  • dockerd --live-restore
  • Create a dummy container: docker run -d busybox sleep 300
  • Stop dockerd
  • Flush the FORWARD chain: iptables -F FORWARD
  • Run dockerd --live-restore again
  • List the rules: iptables -S FORWARD

-A FORWARD -j DOCKER-USER should be there

- Description for the changelog

After a daemon restart with live-restore, ensure an iptables jump to the DOCKER-USER chain is placed before other rules.

- A picture of a cute animal (not mandatory but encouraged)

undo

@am97 am97 force-pushed the 48560-setup-user-chains branch from b5c757e to 3590816 Compare October 4, 2024 01:59
@akerouanton (Member) commented:

Thanks for working on this @am97!

Since most of our iptables rules live in the bridge driver, and this is the only driver that uses iptables, I think it'd be best to move this rule into that package.

Unfortunately, the bridge driver has the same odd logic: create iptables chains during driver initialization, then insert the appropriate JUMP rules when a network is created. We'll refactor that package soon to initialize all rules during driver init (we have work in progress in this area).

As such, could you move the JUMP rule insertion to the function setupIPChains in libnetwork/drivers/bridge/setup_ip_tables_linux.go?

@am97 (Contributor, author) commented Oct 4, 2024

I considered setupIPChains, but the problem is that the FORWARD chain is still modified after that function returns. In libnetwork/drivers/bridge/bridge_linux.go, at the end of configure, the following calls are made: initStore -> populateNetworks -> createNetwork

The createNetwork(config *networkConfiguration) function in bridge_linux.go creates the following rules in my case:

DEBU[2024-10-04T10:15:35.128135154Z] /usr/sbin/iptables, [--wait -t nat -C POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE] 
DEBU[2024-10-04T10:15:35.131805268Z] /usr/sbin/iptables, [--wait -t nat -C DOCKER -i docker0 -j RETURN] 
DEBU[2024-10-04T10:15:35.134676166Z] /usr/sbin/iptables, [--wait -t nat -I DOCKER -i docker0 -j RETURN] 
DEBU[2024-10-04T10:15:35.136706201Z] /usr/sbin/iptables, [--wait -t nat -C POSTROUTING -m addrtype --src-type LOCAL -o docker0 -j MASQUERADE] 
DEBU[2024-10-04T10:15:35.138977284Z] /usr/sbin/iptables, [--wait -t filter -C FORWARD -i docker0 -o docker0 -j DROP] 
DEBU[2024-10-04T10:15:35.141315713Z] /usr/sbin/iptables, [--wait -t filter -C FORWARD -i docker0 -o docker0 -j ACCEPT] 
DEBU[2024-10-04T10:15:35.143606560Z] /usr/sbin/iptables, [--wait -t filter -C FORWARD -i docker0 ! -o docker0 -j ACCEPT] 
DEBU[2024-10-04T10:15:35.148216205Z] /usr/sbin/iptables, [--wait -t nat -C PREROUTING -m addrtype --dst-type LOCAL -j DOCKER] 
DEBU[2024-10-04T10:15:35.152441748Z] /usr/sbin/iptables, [--wait -t nat -A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER] 
DEBU[2024-10-04T10:15:35.156509526Z] /usr/sbin/iptables, [--wait -t nat -C OUTPUT -m addrtype --dst-type LOCAL -j DOCKER ! --dst 127.0.0.0/8] 
DEBU[2024-10-04T10:15:35.160239968Z] /usr/sbin/iptables, [--wait -t nat -A OUTPUT -m addrtype --dst-type LOCAL -j DOCKER ! --dst 127.0.0.0/8] 
DEBU[2024-10-04T10:15:35.163854522Z] /usr/sbin/iptables, [--wait -t filter -C FORWARD -o docker0 -j DOCKER] 
DEBU[2024-10-04T10:15:35.166495186Z] /usr/sbin/iptables, [--wait -t filter -C FORWARD -o docker0 -j DOCKER] 
DEBU[2024-10-04T10:15:35.168815993Z] /usr/sbin/iptables, [--wait -t filter -C FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT] 
DEBU[2024-10-04T10:15:35.171216379Z] /usr/sbin/iptables, [--wait -t filter -C FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT] 
DEBU[2024-10-04T10:15:35.173384343Z] /usr/sbin/iptables, [--wait -t filter -C FORWARD -j DOCKER-ISOLATION-STAGE-1] 
DEBU[2024-10-04T10:15:35.175269996Z] /usr/sbin/iptables, [--wait -D FORWARD -j DOCKER-ISOLATION-STAGE-1] 
DEBU[2024-10-04T10:15:35.177192008Z] /usr/sbin/iptables, [--wait -I FORWARD -j DOCKER-ISOLATION-STAGE-1] 
DEBU[2024-10-04T10:15:35.179048537Z] /usr/sbin/ip6tables, [--wait -t filter -C DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2] 
DEBU[2024-10-04T10:15:35.181234763Z] /usr/sbin/ip6tables, [--wait -t filter -I DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2] 
DEBU[2024-10-04T10:15:35.183304637Z] /usr/sbin/ip6tables, [--wait -t filter -C DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP] 
DEBU[2024-10-04T10:15:35.185185529Z] /usr/sbin/ip6tables, [--wait -t filter -I DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP] 
DEBU[2024-10-04T10:15:35.187071976Z] /usr/sbin/iptables, [--wait -t filter -C DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2] 
DEBU[2024-10-04T10:15:35.189045124Z] /usr/sbin/iptables, [--wait -t filter -I DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2] 
DEBU[2024-10-04T10:15:35.190899872Z] /usr/sbin/iptables, [--wait -t filter -C DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP] 
DEBU[2024-10-04T10:15:35.192831172Z] /usr/sbin/iptables, [--wait -t filter -I DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP] 

The problem is the -I FORWARD -j DOCKER-ISOLATION-STAGE-1. If I move my SetupUserChains into setupIPChains, the jump to DOCKER-ISOLATION-STAGE-1 would be inserted later, so the beginning of the FORWARD chain would end up as:

-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -j DOCKER-USER

instead of

-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1

I could duplicate part of the logic in setupIPChains (the equivalent of the NewChain and AddReturnRule calls in setupUserChain in libnetwork/firewall_linux.go), but the EnsureJumpRule would still be required in initNetworkController (daemon/daemon_unix.go), at the place of my current call to SetupUserChains(). What do you think about that?

@akerouanton (Member) commented:

Yikes, you're right! Let's keep this change as is, and we'll move this rule to the bridge driver once we shuffle things around.

@am97 (Contributor, author) commented Oct 7, 2024

I see the test fails on rootless; that seems normal, as the daemon sets up an unprivileged network stack and has no access to iptables. Is it OK if I just skip the test for rootless?

There is also this failure: https://github.com/moby/moby/actions/runs/11212045759/job/31163919736?pr=48577
But it seems unrelated (maybe a race condition?). I ran that test on a cgroups v1 host with TESTFLAGS='-test.run TestDockerCLIPortSuite' make test-integration-cli, and TestPortList passes.

@akerouanton (Member) commented:

I'm also seeing this flaky test on one of my PRs, so I think it's safe to ignore:

=== Failed
=== FAIL: amd64.integration-cli TestDockerCLIPortSuite/TestPortList (0.08s)
    docker_cli_port_test.go:35: assertion failed: 
        Command:  /usr/local/cli-integration/docker run -d -p 9876:80 busybox top
        ExitCode: 125
        Error:    exit status 125
        Stdout:   13575bc6f4dcdb163f8980d77af5bb042c8d3102bf81164020af7e7e404b41f9
        
        Stderr:   /usr/local/cli-integration/docker: Error response from daemon: driver failed programming external connectivity on endpoint reverent_saha (9887230c7ae65d09b9ab2c34f9421c95287ce7fa99415f568c50952dd3da4c43): failed to bind host port for 0.0.0.0:9876:172.18.0.2:80/tcp: address already in use.
        
        
        Failures:
        ExitCode was 125 expected 0
        Expected no error
    --- FAIL: TestDockerCLIPortSuite/TestPortList (0.08s)

I don't recall exactly how rootless mode interacts with iptables. I'll have a look and keep you posted.

@robmry (Contributor) commented Oct 10, 2024

This looks good to me - it just needs the skip for rootless mode (the iptables rules live in a network namespace belonging to rootlesskit, and the test is looking in the host namespace, so it just doesn't see them - I don't think the DOCKER-USER chain in the rootless netns would be used anyway).

@robmry robmry added this to the 28.0.0 milestone Oct 10, 2024
@am97 am97 force-pushed the 48560-setup-user-chains branch from ac6b5cd to 48b7adc Compare October 14, 2024 21:54
@am97 (Contributor, author) commented Oct 14, 2024

I added the skip for rootless mode. Let me know if I should squash my commits.

@robmry (Contributor) commented Oct 15, 2024

> I added the skip for rootless mode. Let me know if I should squash my commits.

Great - thank you! And yes, please squash them.

Just to check, in the second commit, that swaps an error return for a log line to avoid an error in tests ... what was the error? (As far as I can see, SetupUserChains only returns an error from setupUserChain, which checks whether rules already exist before it tries to change things.)

@robmry (Contributor) commented Oct 15, 2024

In the changelog comment, it'd be good to say why the change is needed. Maybe something like ...

"After a daemon restart with live-restore, ensure an iptables jump to the DOCKER-USER chain is placed before other rules."

@am97 (Contributor, author) commented Oct 15, 2024

> Great - thank you! And yes, please squash them.

OK! I'll squash them after the discussion.

> Just to check, in the second commit, that swaps an error return for a log line to avoid an error in tests ... what was the error?

In bundles/test-integration/docker.log I got:

failed to start daemon: Error initializing network controller: failed to create DOCKER-USER IPV6 chain: iptables failed: ip6tables --wait -t filter -N DOCKER-USER: ip6tables v1.8.9 (legacy): can't initialize ip6tables table `filter': Table does not exist (do you need to insmod?)
Perhaps ip6tables or your kernel needs to be upgraded.
 (exit status 3)

Here is the full docker.log after running TEST_SKIP_INTEGRATION_CLI="true" make test-integration on my first commit: docker.log

The host was a Debian 11 with a default installation of Docker 27.3.1, which doesn't enable IPv6, so the ip6tables commands fail inside the test container. Other failures only log a warning, so I did the same in my second commit. Thinking twice about it, though, I'm not sure it's the right approach; maybe I should keep the return err (and add a note to the test docs to enable IPv6 for local tests)?

> In the changelog comment, it'd be good to say why the change is needed. Maybe something like ...
>
> "After a daemon restart with live-restore, ensure an iptables jump to the DOCKER-USER chain is placed before other rules."

That's better, thanks! I updated the description.

@robmry (Contributor) commented Oct 16, 2024

Thanks @am97, that makes sense. The error was only logged before, so we should keep it that way ... the change looks fine as it is.

Currently, the DOCKER-USER chains are set up on firewall reload or network
creation. If there are running containers at startup, configureNetworking won't
be called (daemon/daemon_unix.go), so the user chains won't be set up.

This commit moves the setup logic into a separate function, and calls it from
the original place and from initNetworkController.

Signed-off-by: Andrés Maldonado <[email protected]>
@am97 am97 force-pushed the 48560-setup-user-chains branch from 48b7adc to a8bfa83 Compare October 16, 2024 20:46
@am97 am97 changed the title WIP: Fix: setup user chains during libnetwork controller initialization Fix: setup user chains during libnetwork controller initialization Oct 16, 2024
@am97 (Contributor, author) commented Oct 16, 2024

OK! I squashed my commits; the PR is ready to be merged.

@robmry (Contributor) left a review:

LGTM - thank you.

}

if err := daemon.netController.SetupUserChains(); err != nil {
	log.G(context.TODO()).WithError(err).Warnf("initNetworkController")
}
@akerouanton (Member) commented Oct 21, 2024

I think it's not a good idea to just log this error, even if that was the old behavior.

Based on previous messages, it's not entirely clear why integration tests were failing.

> The host was a Debian 11 with a default installation of Docker 27.3.1, which doesn't enable IPv6

Since IPv6 is now enabled by default, making IPv6-ness a requirement for running integration tests seems reasonable. Even with net.ipv6.conf.default.disable_ipv6=1 set on the host, make shell should offer an IPv6-capable dev container. Also, IPv6-ness doesn't imply IPv6 connectivity, so it should be fine in most cases.

We also support running integration tests outside of make shell dev containers, but then it's the dev's responsibility to correctly configure their host.

IMHO initNetworkController should error out if user chains can't be configured.

@akerouanton (Member) commented Oct 21, 2024

After discussing with @robmry internally, I think that's something we should fix separately.

#47918 turned some IPv6 errors into logs to allow users with non-IPv6-capable hosts to still start the daemon, even without --ip6tables=false specified. If we error out when SetupUserChains fails, we'll break that.

We should look at auto-detecting IPv6-ness and disabling ip6tables when it's not supported in a follow-up PR. Then, if the host is found to be IPv6-capable, but SetupUserChains fails, it'd be a hard error.

Merging this pull request may close issue: DOCKER-USER chain not being used or created