Conversation

@haircommander (Member) commented Jul 20, 2020

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

There are cases where crio doesn't get the chance to sync before shutdown.
In these cases, container storage can be corrupted.
We need to protect against this by wiping all of storage if we detect we didn't shut down cleanly.

Add an option to specify a clean_shutdown_file that crio will create upon syncing at shutdown
Add an option to crio-wipe to clear all of storage if that file is not present
Add integration tests to verify
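
A minimal sketch of the wipe-side idea described above, not the actual crio-wipe code: shutdownFilePresent and wipeStorage are hypothetical helpers, and the graph-root path is only an assumed example.

	package main

	import (
		"fmt"
		"os"
	)

	// shutdownFilePresent reports whether the clean shutdown file exists,
	// i.e. whether crio managed to sync and write it before the last shutdown.
	func shutdownFilePresent(path string) bool {
		_, err := os.Stat(path)
		return err == nil
	}

	// wipeStorage stands in for clearing the container storage directory.
	func wipeStorage(graphRoot string) error {
		fmt.Println("wiping storage under", graphRoot)
		return os.RemoveAll(graphRoot)
	}

	func main() {
		// default marker location from this PR; the graph root below is an example path
		const cleanShutdownFile = "/var/lib/crio/clean.shutdown"
		if !shutdownFilePresent(cleanShutdownFile) {
			if err := wipeStorage("/var/lib/containers/storage"); err != nil {
				fmt.Fprintln(os.Stderr, err)
				os.Exit(1)
			}
		}
	}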

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

add clean_shutdown_file option to allow crio/crio wipe to verify crio had time to shut down cleanly

@openshift-ci-robot added the release-note, dco-signoff: yes, and kind/cleanup labels Jul 20, 2020
@openshift-ci-robot added the approved label Jul 20, 2020
cmd/crio/main.go Outdated
logrus.Fatal(err)
}
// Finally, we clear out the shutdown file
if err := os.Remove(config.CleanShutdownFile); err != nil {
Member:

As a follow-on we also want to report unclean shutdown as a prometheus metric.

server/server.go Outdated

syscall.Sync()

f, err := os.Create(s.config.CleanShutdownFile)
Member:

We will have to sync this file :)

Member Author:

Getting this atomic is challenging. If we call Create and then immediately call Sync(), are we guaranteed that storage is synced before CleanShutdownFile? If not, we could believe we shut down cleanly but actually still have corrupted storage. In this case, I'd rather make sure the Create() call happens after storage is on disk.

If you'd rather, I can add another Sync() after the create, though.

@mrunalp (Member) commented Jul 20, 2020:

We can do a separate fsync for this file, and worst case, if it fails, we end up wiping storage. The fsync for this file is to reduce the possibility of that happening. We could still get the power cable yanked after the sync and before the fsync of this file, and that's okay.
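
A minimal sketch of the approach described above (sync storage, then create and fsync the marker file), assuming a hypothetical writeCleanShutdownFile helper rather than the actual server shutdown code:

	package main

	import (
		"os"
		"syscall"

		"github.com/sirupsen/logrus"
	)

	// writeCleanShutdownFile flushes storage changes and then records that the
	// shutdown completed cleanly by creating and fsyncing the marker file.
	func writeCleanShutdownFile(path string) {
		// first, make sure all storage changes hit the disk
		syscall.Sync()

		f, err := os.Create(path)
		if err != nil {
			logrus.Errorf("Failed to create clean shutdown file: %v", err)
			return
		}
		defer f.Close()

		// fsync the file itself; worst case, if this fails we wipe storage on the next boot
		if err := f.Sync(); err != nil {
			logrus.Errorf("Failed to sync clean shutdown file: %v", err)
		}
	}

	func main() {
		writeCleanShutdownFile("/var/lib/crio/clean.shutdown")
	}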

Member Author:

fixed

Member:

you could just fsync the parent directory, as in the suggestion I've made above.

Member:

I don't think it matters much to get it atomic. In the unlikely case of a crash between the file creation and the fsync of the parent directory, the file won't be found on the next reboot and the storage is wiped.

@haircommander force-pushed the clean-shutdown-1.18 branch 2 times, most recently from d83f893 to 23f6d8e on July 20, 2020 23:24
Comment on lines 110 to 111
complete -c crio -n '__fish_crio_no_subcommand' -f -l selinux -d 'Enable selinux support (default: false)'
complete -c crio -n '__fish_crio_no_subcommand' -f -l selinux -d 'Enable selinux support (default: true)'
Member:

I guess this causes CI issues.

Member Author:

damn it keeps sneaking in there 😃

cmd/crio/wipe.go Outdated
return err
}
if len(crioContainers) != 0 {
logrus.Infof("wiping containers")
Member:

Suggested change
logrus.Infof("wiping containers")
logrus.Info("Wiping containers")

cmd/crio/main.go Outdated
// Finally, we clear out the shutdown file
if err := os.Remove(config.CleanShutdownFile); err != nil {
// not a fatal error, as it could have been cleaned up
logrus.Errorf(err.Error())
Member:

Suggested change
logrus.Errorf(err.Error())
logrus.Error(err)

# Location for CRI-O to lay down the clean shutdown file.
# It is used to check whether crio had time to sync before shutting down.
# If not, crio wipe will clear the storage directory.
clean_shutdown_file = "{{ .CleanShutdownFile }}"
Member:

Do we want to be able to disable this feature if clean_shutdown_file = "" or commented out?

Member Author:

I have added this
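
For illustration, a sketch of how the wipe side could treat an empty clean_shutdown_file as "feature disabled"; shouldWipeForUncleanShutdown is a hypothetical helper, not necessarily what the PR adds:

	package main

	import (
		"fmt"
		"os"
	)

	// shouldWipeForUncleanShutdown returns true only when the feature is enabled
	// (non-empty path) and the clean shutdown file is missing, meaning the
	// previous shutdown did not complete cleanly.
	func shouldWipeForUncleanShutdown(cleanShutdownFile string) bool {
		if cleanShutdownFile == "" {
			// feature disabled via clean_shutdown_file = ""
			return false
		}
		_, err := os.Stat(cleanShutdownFile)
		return os.IsNotExist(err)
	}

	func main() {
		fmt.Println(shouldWipeForUncleanShutdown(""))                             // false: feature disabled
		fmt.Println(shouldWipeForUncleanShutdown("/var/lib/crio/clean.shutdown")) // true if the file is missing
	}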

cmd/crio/main.go Outdated
logrus.Fatal(err)
}
// Finally, we clear out the shutdown file
if err := os.Remove(config.CleanShutdownFile); err != nil {
Member:

Let's play it safe here and add:

	f, err := os.OpenFile(filepath.Dir(config.CleanShutdownFile), os.O_RDONLY, 0755)
	if err != nil {
		...
	}
	defer f.Close()
	
	if err = syscall.Fsync(int(f.Fd())); err != nil {
		...
	}

after the file is removed, so we are sure the parent directory is synced to disk
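
A fuller sketch of that suggestion at the startup/removal site, with removeCleanShutdownFile as a hypothetical helper name; the error handling here is illustrative only, not the PR's actual code:

	package main

	import (
		"os"
		"path/filepath"
		"syscall"

		"github.com/sirupsen/logrus"
	)

	// removeCleanShutdownFile deletes the marker at startup and fsyncs its parent
	// directory so the removal itself is durable on disk.
	func removeCleanShutdownFile(path string) {
		// not a fatal error, as the file could have been cleaned up already
		if err := os.Remove(path); err != nil && !os.IsNotExist(err) {
			logrus.Error(err)
		}

		dir, err := os.OpenFile(filepath.Dir(path), os.O_RDONLY, 0755)
		if err != nil {
			logrus.Error(err)
			return
		}
		defer dir.Close()

		if err := syscall.Fsync(int(dir.Fd())); err != nil {
			logrus.Error(err)
		}
	}

	func main() {
		removeCleanShutdownFile("/var/lib/crio/clean.shutdown")
	}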

server/server.go Outdated

// first, make sure we sync all storage changes
syscall.Sync()

Member:

It will be safer to use Syncfs in addition to Sync, so we have some error reporting:

	f, err := os.OpenFile(store.GraphRoot(), os.O_RDONLY, 0755)
	if err != nil {
		...
	}
	defer f.Close()
	
	if err = unix.Syncfs(int(f.Fd())); err != nil {
		...
	}

so we are sure the file system holding the graphroot is correctly synced.

The cost should be minimal after the full sync, and we get an error if something goes wrong.

Member Author:

I went with Fsync, as it seems to be more of what we want.

@giuseppe (Member) left a comment:

Good work, I left some comments on the sync machinery.

complete -c crio -n '__fish_crio_no_subcommand' -f -l apparmor-profile -r -d 'Name of the apparmor profile to be used as the runtime\'s default. This only takes effect if the user does not specify a profile via the Kubernetes Pod\'s metadata annotation.'
complete -c crio -n '__fish_crio_no_subcommand' -f -l bind-mount-prefix -r -d 'A prefix to use for the source of the bind mounts. This option would be useful if you were running CRI-O in a container. And had `/` mounted on `/host` in your container. Then if you ran CRI-O with the `--bind-mount-prefix=/host` option, CRI-O would add /host to any bind mounts it is handed over CRI. If Kubernetes asked to have `/var/lib/foobar` bind mounted into the container, then CRI-O would bind mount `/host/var/lib/foobar`. Since CRI-O itself is running in a container with `/` or the host mounted on `/host`, the container would end up with `/var/lib/foobar` from the host mounted in the container rather then `/var/lib/foobar` from the CRI-O container. (default: "")'
complete -c crio -n '__fish_crio_no_subcommand' -f -l cgroup-manager -r -d 'cgroup manager (cgroupfs or systemd)'
complete -c crio -n '__fish_crio_no_subcommand' -l clean-shutdown-file -r -d 'Location for CRI-O to lay down the clean shutdown file. It indicates whether we\'ve had time to sync changes to disk before shutting down. If not, crio wipe will clear the storage directory'
Contributor:

maybe?

Suggested change
complete -c crio -n '__fish_crio_no_subcommand' -l clean-shutdown-file -r -d 'Location for CRI-O to lay down the clean shutdown file. It indicates whether we\'ve had time to sync changes to disk before shutting down. If not, crio wipe will clear the storage directory'
complete -c crio -n '__fish_crio_no_subcommand' -l clean-shutdown-file -r -d 'Location for CRI-O to lay down the clean shutdown file. It indicates whether we\'ve had time to sync changes to disk before shutting down. If not found, crio wipe will clear the storage directory'

docs/crio.8.md Outdated

**--cgroup-manager**="": cgroup manager (cgroupfs or systemd) (default: systemd)

**--clean-shutdown-file**="": Location for CRI-O to lay down the clean shutdown file. It indicates whether we've had time to sync changes to disk before shutting down. If not, crio wipe will clear the storage directory (default: /var/lib/crio/clean.shutdown)
Contributor:

If you take "found" above, add it here too.

**clean_shutdown_file**="/var/lib/crio/clean.shutdown"
Location for CRI-O to lay down the clean shutdown file.
It is used to check whether crio had time to sync before shutting down.
If not, crio wipe will clear the storage directory.
Contributor:

Maybe "found" here too.

},
&cli.StringFlag{
Name: "clean-shutdown-file",
Usage: "Location for CRI-O to lay down the clean shutdown file. It indicates whether we've had time to sync changes to disk before shutting down. If not, crio wipe will clear the storage directory",
Contributor:

Ditto "found".

@haircommander force-pushed the clean-shutdown-1.18 branch 2 times, most recently from e84579d to 28241c6 on July 21, 2020 15:42
There are cases where crio doesn't get the chance to sync before shutdown.
In these cases, container storage can be corrupted.
We need to protect against this by wiping all of storage if we detect we didn't shut down cleanly.

Add an option to specify a clean_shutdown_file that crio will create upon syncing at shutdown
Add an option to crio-wipe to clear all of storage if that file is not present
Add integration tests to verify

Signed-off-by: Peter Hunt <[email protected]>
@codecov bot commented Jul 21, 2020

Codecov Report

Merging #3984 into release-1.18 will decrease coverage by 0.09%.
The diff coverage is 15.62%.

@@               Coverage Diff                @@
##           release-1.18    #3984      +/-   ##
================================================
- Coverage         40.82%   40.72%   -0.10%     
================================================
  Files               106      106              
  Lines              8703     8734      +31     
================================================
+ Hits               3553     3557       +4     
- Misses             4837     4860      +23     
- Partials            313      317       +4     

@saschagrunert (Member) left a comment:

LGTM

@openshift-ci-robot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [haircommander,saschagrunert]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@saschagrunert (Member):

/retest

1 similar comment
@saschagrunert (Member):

/retest

@saschagrunert (Member):

/retest

@rhatdan (Contributor) commented Jul 22, 2020:

/test kata-containers

@saschagrunert (Member):

/retest

@saschagrunert (Member) commented Jul 22, 2020:

Hm, kata seems broken unfortunately:

11:02:26 #   `OVERRIDE_OPTIONS="--additional-devices /dev/null:/dev/qifoo:rwm" start_crio' failed
11:02:26 # time="2020-07-22T11:02:22Z" level=error msg="error opening storage: /dev/sdb is already part of a volume group \"storage\": must remove this device from any volume group or provide a different device"
11:02:26 # time="2020-07-22T11:02:24Z" level=fatal msg="failed to connect: failed to connect, make sure you are running as root and the runtime has been started: context deadline exceeded"
11:02:26 # time="2020-07-22T11:02:26Z" level=fatal msg="failed to connect: failed to connect, make sure you are running as root and the runtime has been started: context deadline exceeded"

@haircommander (Member Author):

/test kata-containers

@openshift-ci-robot:

@haircommander: The following test failed, say /retest to rerun all failed tests:

Test name: ci/kata-jenkins
Commit: 564915e
Rerun command: /test kata-containers


@umohnani8 (Member):

LGTM

@haircommander (Member Author):

/hold

Let's get the master version in first, and let it sit a bit.

@openshift-ci-robot added the do-not-merge/hold label Jul 22, 2020
@sdodson commented Jul 30, 2020:

> Let's get the master version in first, and let it sit a bit.

Where "a bit" means days or weeks of soak time and careful scrutiny, please. The master PR is #3999, for anyone else who tracks the 4.6 BZ to this PR and wonders where the master branch PR is.

@haircommander (Member Author):

I do not think we want this anymore


Labels

approved — Indicates a PR has been approved by an approver from all required OWNERS files.
dco-signoff: yes — Indicates the PR's author has DCO signed all their commits.
do-not-merge/hold — Indicates that a PR should not merge because someone has issued a /hold command.
kind/cleanup — Categorizes issue or PR as related to cleaning up code, process, or technical debt.
release-note — Denotes a PR that will be considered when it comes time to generate release notes.


9 participants