Confidential Containers - Skip pullimage for runtimes that are handling it #8008
A friendly reminder that this PR had no activity for 30 days. |
Codecov Report
Coverage diff, main vs. #8008:
Coverage: 46.96% -> 46.71% (-0.25%)
Files: 150 -> 150
Lines: 21873 -> 22056 (+183)
Hits: 10272 -> 10303 (+31)
Misses: 10538 -> 10683 (+145)
Partials: 1063 -> 1070 (+7)
/retest |
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by: littlejawa
Needs approval from an approver in each of the affected files. |
someNameOfTheImage, imageID, someRepoDigest, imageAnnotations, err := s.getInfoFromImage(userRequestedImage)
if err != nil {
    return nil, err
if isRuntimePullImage {
why are we using the artifact functions here?
ah @saschagrunert suggested this. it feels weird to me to use the artifact functions here, why did we go this route?
If the image is not there, we need a way to access the remote repo's metadata to get all needed info, without pulling the full image.
One way or another, I need to access the data without pulling the layers. The containers/image API did not allow me to do that (or I missed it?), so Sascha suggested this alternative, which allows me to get just the manifest and config that I need.
yeah I think the approach is good but maybe naming the functions differently so it's clearer that's what we're doing here?
Yeah we can split the existing bits if required. 👍
|
@mtrmac would you be able to PTAL |
I’m afraid I don’t understand the full problem space.
Still, this seems closer to a naive prototype, with a lot of divergence from the current CRI-O behavior.
- Doing the pull outside of PullImage means Pod’s image pull secrets will not be available. I think that’s simply not viable, especially for the security-paranoid customers this feature might attract. So if the VM API requires the pull to happen only later in CreateContainer, it seems to me that the VM API needs to be redesigned now, before this could gain (any more) users.
- Users, sadly (and the more “enterprise” they are, the worse), rely on short names, and tags. These things must be resolved once into a stable image identifier (what PullImage returns as an ImageRef). This PR does not implement the (regrettable) unqualified search at all, and it is adding code paths with tag-change races all over the place.
- I have no idea what it means to have StorageImageID values that no longer refer to local storage, with nothing added to support that. I suspect that things break all over the codebase (for starters, the IsRunningImageAllowed call will simply fail, and that code path is used by default on all RHEL and OpenShift systems).
  - E.g. what happens with ImageStatus (when Kubelet asks, checking whether it needs to pull at all)? With ListImages and Kubelet-driven GC?
- Overall I find this PR very unlikely to be correct. If this PR were a complete reimplementation of ImageServer for the VM backend, I don’t know, my eyes would probably glaze over but I would think that it’s possible that the rest of CRI-O can work with no changes. Just injecting StorageImageID values that don’t refer to anything… I can’t believe that works.
name = reference.TagNameOnly(name) // make sure to add ":latest" if needed

ref, err := o.impl.NewReference(name)
src, err := o.getSourceFromImageName(ctx, img, opts)
(pre-existing?) src must be closed, otherwise this leaves around keep-alive HTTP connections.
    opts = &PullOptions{}
}

src, err := o.getSourceFromImageName(ctx, img, opts)
Creating an ImageSource is fairly costly (HTTP round-trips); it should only be done once, not from scratch for every operation. Same for GetConfig.
Overall, the way this file has been refactored seems mostly misguided to me; it introduces abstraction boundaries in wrong places.
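The two points above (close the source, and create it only once) can be sketched together. This is a minimal illustration using only the standard library: `imageSource`, `fakeSource`, `opener`, and `fetchInfo` are hypothetical names invented here, not the containers/image API or the code in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// imageSource is a hypothetical stand-in for an expensive remote handle
// (for example something backed by keep-alive HTTP connections).
type imageSource interface {
	Manifest() ([]byte, error)
	Config() ([]byte, error)
	Close() error
}

// fakeSource records whether it was closed so the behavior is observable.
type fakeSource struct{ closed bool }

func (f *fakeSource) Manifest() ([]byte, error) { return []byte(`{"schemaVersion":2}`), nil }
func (f *fakeSource) Config() ([]byte, error)   { return []byte(`{"os":"linux"}`), nil }
func (f *fakeSource) Close() error              { f.closed = true; return nil }

// opener creates sources and counts how many times it was asked to.
type opener struct {
	opens int
	last  *fakeSource
}

func (o *opener) open(name string) (imageSource, error) {
	if name == "" {
		return nil, errors.New("empty image name")
	}
	o.opens++
	o.last = &fakeSource{}
	return o.last, nil
}

// imageInfo holds everything the caller needs from one remote lookup.
type imageInfo struct {
	Manifest []byte
	Config   []byte
}

// fetchInfo opens the source exactly once, reads both pieces of metadata
// through it, and always closes it before returning.
func fetchInfo(o *opener, name string) (info imageInfo, err error) {
	src, err := o.open(name)
	if err != nil {
		return imageInfo{}, err
	}
	defer func() {
		// Report a Close failure only if nothing else went wrong first.
		if cerr := src.Close(); cerr != nil && err == nil {
			err = cerr
		}
	}()
	if info.Manifest, err = src.Manifest(); err != nil {
		return imageInfo{}, err
	}
	if info.Config, err = src.Config(); err != nil {
		return imageInfo{}, err
	}
	return info, nil
}

func main() {
	o := &opener{}
	info, err := fetchInfo(o, "registry.example/app:latest")
	fmt.Println(len(info.Manifest), len(info.Config), o.opens, o.last.closed, err)
}
```

The design choice being illustrated is that the function owning the source also owns its lifetime, so callers asking for the manifest and the config share one open/close pair instead of paying for two.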
}

unparsedInstance := unparsedToplevel
if manifest.MIMETypeIsMultiImage(topMIMEType) {
It is pretty surprising for GetConfig to do its own manifest resolution. Why isn’t that centralized?
This is both costly (fetching manifests twice or more), and risks that the code will diverge (… as it seems to already have done).
}

func (o *OCIArtifact) getParsedManifest(ctx context.Context, src types.ImageSource, opts *PullOptions) (manifest.Manifest, error) {
    manifestBytes, mimeType, err := o.impl.GetManifest(ctx, src, nil)
GetManifest means the caller is responsible for validating digest references; this one isn’t doing so.
Use UnparsedInstance instead; that can also handle caching to avoid repeated requests about the same image.

// SpecAddAnnotations adds annotations to the spec.
SpecAddAnnotations(ctx context.Context, sb SandboxIFace, containerVolume []oci.ContainerVolume, mountPoint, configStopSignal string, imageResult *storage.ImageResult, isSystemd bool, seccompRef, platformRuntimePath string) error
SpecAddAnnotations(ctx context.Context, sb SandboxIFace, containerVolume []oci.ContainerVolume, mountPoint, configStopSignal string, imageName *references.RegistryImageReference, imageID *storage.StorageImageID, isSystemd bool, seccompRef, platformRuntimePath string) error
I have spent a lot of effort (and attracted a lot of disagreement and controversy) to explicitly document the weird semantics of these values, by using names like SomeNameOfThisImage, as well as accompanying comments.
So, naturally, I want all those caveats to be preserved and the comments to be duplicated if this is going to stop referencing the existing data structures.
if cfg == nil || cfg.Metadata == nil {
    return "", nil
}
name := strings.Join([]string{
Is there any way to avoid a FOURTH copy of the same hard-coded convention?
rhName, rh := s.GetRuntimeHandlerForPod(ctx, req.SandboxConfig)
if rh != nil {
    if rh.RuntimePullImage {
        // don't pull the image in crio, as it will be done by the runtime
What happens if the image does not exist at all?
What happens if the network is flaky? This effectively blinds Kubelet’s visibility into pulls, and breaks its retry logic.
The credentials provided by Kubelet are discarded.
Also, Kubelet can gather credentials for several sources and will loop over them in PullImage until it finds credentials accepted by the registry.
I don’t particularly like that behavior but that’s one more reason for PullImage to be supported properly, whatever that means (a RPC to the VM?).
// don't pull the image in crio, as it will be done by the runtime
log.Debugf(ctx, "Skip image pull for runtime %s - image %s", rhName, image)
return &types.PullImageResponse{
    ImageRef: req.Image.Image,
Nowadays, the returned ImageRef is always a digested reference. If this code path starts putting less structured user-controlled data into the field, I don’t immediately know what that means. E.g. does ImageStatus accept these values?
// don't pull the image in crio, as it will be done by the runtime
log.Debugf(ctx, "Skip image pull for runtime %s - image %s", rhName, image)
return &types.PullImageResponse{
    ImageRef: req.Image.Image,
At least per lines just above, req.Image can be nil. And the code has already dealt with that, and the line above uses the result; i.e. the debug log and the actually-returned value are not consistent.
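A small sketch of the consistency fix hinted at here: derive the image name once, nil-safely, and use that same value for both the log line and the response. The types below are hypothetical minimal mirrors of the CRI shapes, declared locally so the example stands alone. It addresses only the nil dereference and the log/response mismatch, not the separate question of whether returning a non-digested name is acceptable.

```go
package main

import "fmt"

// Hypothetical, trimmed-down mirrors of the CRI request/response types.
type imageSpec struct{ Image string }
type pullImageRequest struct{ Image *imageSpec }
type pullImageResponse struct{ ImageRef string }

// requestedImage returns the image name, or "" when the request or its
// Image field is nil, so callers never dereference a nil pointer.
func requestedImage(req *pullImageRequest) string {
	if req == nil || req.Image == nil {
		return ""
	}
	return req.Image.Image
}

// skipPull builds the response and the debug message from the same value,
// so what is logged is exactly what is returned.
func skipPull(req *pullImageRequest, runtime string) (*pullImageResponse, string) {
	image := requestedImage(req)
	logLine := fmt.Sprintf("Skip image pull for runtime %s - image %s", runtime, image)
	return &pullImageResponse{ImageRef: image}, logLine
}

func main() {
	resp, line := skipPull(&pullImageRequest{Image: &imageSpec{Image: "registry.example/app:1"}}, "kata")
	fmt.Println(resp.ImageRef)
	fmt.Println(line)

	// A request without an Image field no longer panics.
	resp, line = skipPull(&pullImageRequest{}, "kata")
	fmt.Printf("%q\n%s\n", resp.ImageRef, line)
}
```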
    return parsedManifest, nil
}

func (o *OCIArtifact) GetConfig(ctx context.Context, img string, opts *PullOptions) (*v1.Image, error) {
In general, artifacts don’t have image configs. If they have image configs, they are images. (I can see a semantic argument that “image is an artifact”, but… what are we doing here?)
I realize that, at least with #7471, many parts of that are already true:
But, well, it’s been over a year since that PR. If it is not going to be handled now, is it ever going to be handled? |
|
Hey @mtrmac , Let me try to answer some of your questions:
Name resolution needs to happen on the VM side, where the pull occurs.
That is actually done on purpose. The problem we're trying to address with Confidential Containers is when customers want to run a sensitive workload on a cluster they don't trust. The current approach in CoCo is to provision the image pull secret inside the VM before image pull. This secret along with any other configs like the container policy, signature verification keys, decryption keys are provisioned after verification of the VM environment. Post that the image pull is executed.
My feeling is that, from CRI-O's perspective, there is just a container being created.
Me neither :) Some questions from me about that (also to @haircommander): @mtrmac and @saschagrunert mentioned the ImageServer. I can make one used exclusively for the Confidential use case, but wouldn't it still need to output some references to (fake) storage IDs? I didn't go that route because it didn't seem to fix my problem, but I'm willing to go that way if it helps. Again, any suggestion is welcome. |
See a parallel conversation; I think CRI-O and the VM must structurally be set up so that they agree on the identity of the image which is used.
I have only a very vague idea of how confidential containers work, so this might be completely wrong: I thought confidential containers are confidential by encrypting the contents of the image; in this threat model, the encrypted contents of the image are ~public enough (e.g. it must be possible to copy the container image to the cloud where it is to be run). Is it really the case that the registry credentials must be invisible to CRI-O = that the registry is a trusted component in the confidential container design?
And all of that happens invisible to CRI-O? That… complicates maintenance a lot. How is feature parity going to be maintained over time if there is a parallel implementation invisible to the CRI-O codebase (and drive-by contributors ~like me not being aware of it at all)?
Kubelet is still going to report on node’s disk usage, and it is still going to run its image garbage collection loop, isn’t it? And possibly try to evict / make scheduling decisions based on node’s disk usage. AFAICS right now that is all based on the filesystem hosting the node’s c/storage store, with no idea that the VMs might be elsewhere, possibly unaware that the node’s filesystem is critically overloaded.
As a first guess, I think all of that code ~assumes that images exist locally in c/storage. If that is not going to be the case, that code needs to either be, function by function, modified to handle in-VM images, or analyzed to make an argument that CRI-O does not actually need to do anything because $reason. Hypothetically it might be the case that everything is in the latter category and we can just fake values, but I think that’s very unlikely. I’d expect at least IOW, if the design is that we have two very different storage backends, I’d expect that to be directly reflected / modeled in the CRI-O codebase. It can’t be one or two And if the same node could contain both c/storage images and in-VM images simultaneously… that would also need to be explicitly modeled. Some good news is that |
Also, some of the APIs / code paths are ambiguous WRT whether they accept registry references or storage IDs. In some cases Kubelet does not actually care, to the extent that it just obtains the value from one CRI API and provides it to another ({ |
yeah I agree, I think any place we use storageID we'll probably want to evaluate how to delegate the understanding of that image to the VM. is it possible to have the VM tell CRI-O what the storage ID is? then, CRI-O can populate that everywhere. It also could use the Pinned CRI field to make sure the kubelet doesn't consider it for garbage collection, while still reporting it so security scanners and the like can identify them. or is that too much information passed up to CRI-O @littlejawa |
@haircommander - We don't have a way in our current API to report that information. Now reading mtrmac's comment I've tried to copy the same data type, assuming this was a hard requirement. But maybe that's just wrong? If we make an abstraction at the ImageServer level, can we then manage another kind of ID that would be VM-specific? If the goal is to have a 1:1 relationship, making sure an ID refers to a uniquely identifiable image, maybe we can find a way to build this ID based on what CRI-O knows about the pod, the runtime process that manages the VM, etc... |
|
I think this needs to start with “what are the immutable parts”. Then we figure out what the behavior should be. And last what the internal structure of the code to implement that behavior should be. (In principle, this is Open Source, and everything can be changed.) But, let’s say that the CRI, and most of Kubelet’s behavior are immutable. IIRC, Kubelet, primarily:
That needs to be carefully re-analyzed, but let’s start with that as a hypothesis. Now, Kubelet has fairly strict expectations on the format of userInput; for imageID, IIRC it doesn’t care ~at all — but the values show up in K8s resources and are user-visible, so it’s possible that various users’ custom scripts expect things. So, what should the above map to with Confidential Containers?
It is CRI-O’s ~choice to use the c/storage.Image.ID value as imageID in the CRI API; it plausibly could do something else (but, see above about user-visible values and custom scripts). If Confidential Containers don’t have c/storage, they obviously don’t have c/storage.Image.ID. Still, it might turn out to be necessary to produce exactly the same values (typically = config digest without My (fairly strong) preference is that the CRI-O codebase should retain the strong type separation between various string values. So there should be some “image ID” type, distinct from “user input” or “repo@digest reference” and others. That might be a single type with a single syntax, or maybe a holder wrapping an interface, or something else; I don’t currently have an opinion, since that follows from choosing and designing behavior. |
|
For the record - I'm still looking into that, but have been pre-empted by other tasks. Now we have an implementation of Confidential Containers on the containerd side that works with encrypted containers. |
@littlejawa: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What type of PR is this?
/kind feature
What this PR does / why we need it:
With #7471 we introduced a way to support Confidential Container's feature of pulling the image in the guest VM.
When this happens, cri-o is still pulling the image on the host, because kubernetes keeps sending the "PullImageRequest", and cri-o has no reason to not process it.
This has two drawbacks:
This failure will block the container creation, as kubernetes will not proceed with CreateContainer if PullImage failed.
This PR is meant to make crio skip the pull image phase when the runtime is configured with the "runtime_pull_image" flag introduced in #7471
Which issue(s) this PR fixes:
Fixes: #8261
Special notes for your reviewer:
I have modified the code in crio to skip the pull image processing when the runtime is configured with the "runtime_pull_image" flag.
Crio then just reports a success status to kubernetes, which is happy with that, even when subsequent ImageList or ImageStatus report the image is missing.
Now without an image in the local store, cri-o will fail to create the container. This required further changes in create_sandbox_container and runtimeService, to get information from the remote repository, without pulling the image.
I kept those changes in separate commits for easier review, but I suspect it would make sense to squash them, and get a single, functional commit.
Does this PR introduce a user-facing change?