destijl
Member

@destijl destijl commented May 8, 2017

This replaces #454 with an updated version so we can continue moving the design forward, since Andrew didn't have bandwidth to continue.

I've updated the doc with most of the feedback and split out a few design options of increasing complexity so we can decide where to go next.

Based on the work that's been done in:
https://docs.google.com/document/d/1lFhPLlvkCo3XFC2xFDPSn0jAGpqKcCCZaNsBAv8zFdE/.

Related:
kubernetes/enhancements#92
kubernetes/kubernetes#32579
kubernetes/kubernetes#12742

@kubernetes/sig-auth-misc

@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label May 8, 2017
AEAD, using the standard Go library for AES-GCM.

Each encryption provider will have a unique string identifier to ensure
versioning of contents on disk and to allow future schemes to be replaced.

versioning of the [cipher text] on disk and to allow future schemes to be [added].

Member Author

Done, I think "disk" was also misleading, changed to "in etcd".

io.ReadFull(crypto_rand.Reader, nonce)
authenticatedData := ETCD_KEY
cipherText := aead.Seal(nil, nonce, value, authenticatedData)
storedData := providerId + keyId + base64.Encode(nonce + authenticatedData + cipherText)

Maybe this is just the pseudo code, but couldn't this use a struct to store these separate pieces instead of string munging/byte manipulation?

Contributor

Storing the data as a struct has been discussed: it adds cost in both complexity and performance. Prefixing is the cheapest option and requires no new serialization struct.

Our proto-at-rest format follows the same pattern (a magic prefix makes the bytes recognizable).

Member Author

Added comment.

1. A new encryption key is created.
1. The key is added to a file on the API master with metadata including an ID
and an expiry time. Subsequent calls to rotate will prepend new keys to the
file such that the first key is always current.

such that the first key is always the [key to use for encryption].
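The key-file rule in this step (new keys are prepended, the first key is the one used for encryption, and every listed key stays usable for decryption) can be sketched as a small keyring. `namedKey` and `keyring` are hypothetical names, and the ID/expiry metadata mentioned in the step is reduced to just an ID here.

```go
package main

import (
	"errors"
	"fmt"
)

// namedKey is a hypothetical entry in the on-disk key list.
type namedKey struct {
	ID     string
	Secret []byte
}

// keyring mirrors the proposed file layout: keys[0] encrypts new writes and
// every key remains available for decryption.
type keyring struct {
	keys []namedKey
}

// rotate prepends a new key so it becomes the write key; older keys stay readable.
func (k *keyring) rotate(nk namedKey) {
	k.keys = append([]namedKey{nk}, k.keys...)
}

// writeKey returns the key that new writes should use.
func (k *keyring) writeKey() (namedKey, error) {
	if len(k.keys) == 0 {
		return namedKey{}, errors.New("no keys configured")
	}
	return k.keys[0], nil
}

// readKey finds the key named in a stored value's prefix.
func (k *keyring) readKey(id string) (namedKey, error) {
	for _, nk := range k.keys {
		if nk.ID == id {
			return nk, nil
		}
	}
	return namedKey{}, fmt.Errorf("key %q not found", id)
}

func main() {
	var kr keyring
	kr.rotate(namedKey{ID: "key1", Secret: make([]byte, 32)})
	kr.rotate(namedKey{ID: "key2", Secret: make([]byte, 32)})
	w, _ := kr.writeKey()
	fmt.Println(w.ID) // key2
	_, err := kr.readKey("key1")
	fmt.Println(err) // <nil>
}
```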

user. To enable encryption a user calls PUT on a /rotate API endpoint:

1. A new encryption key is created.
1. The key is provided back to the caller for persistent storage. It only lives

for persistent storage. [Within the cluster, it] only lives

The user will also need to specify the encryption provider and the resources to
encrypt as follows:
```yaml
--key-encryption-key-db-path=/path/to/key-encryption-key/db
```

s/--key-encryption-key-db-path/--encryption-provider-config/

Member Author

Done.

1. The list of DEKs being used by the master is updated in etcd so that the new
key is the current write key and is available to all masters. TODO: Is there
a race here where the etcd write is finished but not all masters have the
data?

Yes, there is a lag during which some masters may not yet be using the new write key, but as long as they all have it as a read key, it isn't important. Some will write with the new key and some with the old key, but all will be able to read with either key.

Member Author

Done.

new key is in the list of read keys.
1. The list of DEKs being used by the master is updated in etcd so that the
new key is in the list of read keys available to all masters. TODO: seems
racey?

Is the race condition because not all masters will have the read key at the same time? If so, as long as no one is writing with that key, it doesn't matter. This step prepares all the masters for the safe transition to writing with the new key.

Contributor

Yes. Make new key available everywhere before anyone writes.

Member Author

Done.
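The ordering agreed on in these threads (every master learns the new key for reading before any master writes with it) can be checked with a small simulation. This is a toy model, not API server code: a "ciphertext" is just `"<keyID>:<payload>"`, and `master`, `store` and `allReadable` are hypothetical names. Because step 1 completes first, every intermediate state of step 2 leaves all stored values readable by all masters, even while some masters still write with the old key.

```go
package main

import (
	"fmt"
	"strings"
)

// master is a toy API server: the key IDs it can read and the ID it writes with.
// No real cryptography: a stored value is "<keyID>:<payload>".
type master struct {
	read  map[string]bool
	write string
}

func (m *master) store(payload string) string { return m.write + ":" + payload }

func (m *master) canRead(stored string) bool {
	id, _, ok := strings.Cut(stored, ":")
	return ok && m.read[id]
}

// allReadable reports whether every master can read every stored value.
func allReadable(ms []*master, stored []string) bool {
	for _, m := range ms {
		for _, s := range stored {
			if !m.canRead(s) {
				return false
			}
		}
	}
	return true
}

func main() {
	ms := []*master{
		{read: map[string]bool{"old": true}, write: "old"},
		{read: map[string]bool{"old": true}, write: "old"},
	}
	var stored []string

	// Step 1: every master learns the new key for reading; nobody writes with it yet.
	for _, m := range ms {
		m.read["new"] = true
	}
	// Step 2: masters switch their write key one at a time while writes keep flowing.
	for i, m := range ms {
		m.write = "new"
		for _, w := range ms {
			stored = append(stored, w.store(fmt.Sprint("write-", i)))
		}
		fmt.Println("readable after switching master", i, "->", allReadable(ms, stored))
	}
}
```

Skipping step 1 breaks this: a master that switched to the new write key would produce values that masters without the new read key cannot decrypt.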


Pros:

- ?

Most closely matches the pattern that will be used for integrating with external encryption systems: AWS KMS, Google Cloud KMS, and HSMs will serve as KEK storage instead of local disk.

Member Author

Added.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels May 10, 2017

TODO: Decide if
[secretbox](https://godoc.org/golang.org/x/crypto/nacl/secretbox) should be used
instead.
Member Author

@smarterclayton did you want to switch to this? I got an internal crypto reviewer to look and he agreed with Diogo that XSalsa20 and Poly1305 should be OK.

Contributor

Other than

  1. pulling in a newer library (which has some small risk) which is not part of go standard crypto
  2. requiring us to manage AEAD ourselves (bigger concern)
  3. the cache attack is not much of a concern on x86 with AES-NI, but is more so on ARM

I'm not terribly concerned. I'm actually less concerned with having two impls than just having secretbox in the short term.

Member Author

ack. Put this in an alternatives considered section.

@destijl
Member Author

destijl commented May 11, 2017

@andrewsykim @stevesloka I think you may have had use cases for this. Can you comment on the three options above? In particular if option 1 satisfies any hard requirements you may have?

@destijl
Member Author

destijl commented May 12, 2017

I just added a configuration section to call out the api server options we're proposing adding. @jcbsmpsn @smarterclayton are we ready to make the call on going with option 1? FWIW it gets my vote.

aead, err := cipher.NewGCM(c.block)
keyId := primaryKeyId

// string prefix chosen over a struct to mimimize complexity and for write
Contributor

typo

Member Author

thanks, fixed.

@stevesloka

@destijl did my comments show up? I don't see them now, but I did from my phone, and I wanted to make sure I replied to your comment.

user will specify:

```yaml
--encryption-provider-config=/path/to/config
```
Contributor

Provide an example configuration?

Member Author

needs more thinking, but yes.
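As a starting point for the requested example, here is a hypothetical shape for the file passed via `--encryption-provider-config`, parsed with Go's `encoding/json`. The field names, the JSON encoding and the one-provider-per-file layout are all assumptions for illustration, not the format the proposal settles on; key order follows the "first key is the write key" rule discussed earlier.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// providerConfig is a hypothetical shape for the --encryption-provider-config file.
type providerConfig struct {
	Resources []string    `json:"resources"`
	Provider  string      `json:"provider"`
	Keys      []keyConfig `json:"keys"` // Keys[0] is the write key
}

type keyConfig struct {
	Name   string `json:"name"`
	Secret string `json:"secret"` // base64-encoded key material (not decoded here)
}

func parseConfig(data []byte) (*providerConfig, error) {
	var c providerConfig
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, err
	}
	if c.Provider == "" || len(c.Keys) == 0 {
		return nil, errors.New("config needs a provider and at least one key")
	}
	return &c, nil
}

const example = `{
  "resources": ["secrets"],
  "provider": "aesgcm",
  "keys": [
    {"name": "key2", "secret": "<base64 AES key>"},
    {"name": "key1", "secret": "<base64 AES key>"}
  ]
}`

func main() {
	c, err := parseConfig([]byte(example))
	if err != nil {
		panic(err)
	}
	fmt.Println(c.Provider, c.Keys[0].Name, c.Resources) // aesgcm key2 [secrets]
}
```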

In order to take an API driven approach for key rotation, new API objects will
be defined:

* Key Encryption Key (KEK) - key used to unlock the Data Encryption Key. Stored
Contributor

@0xmichalis 0xmichalis May 14, 2017

Is this an API object? If so, please add it below with the rest of the objects, otherwise, you'll need to mention it before this section.

Contributor

It is an API object but won't be exposed over REST. It would be like a component config or kubeconfig.

1. Confirm that all masters have the new DEK for reading. Key point here is that
all readers have the new key before anyone writes with it.
1. The list of DEKs being used by the master is updated in memory so that the
new key is is the current write key.
Contributor

typo

Member Author

thanks.

new key is in the list of read keys.
1. The list of keys being used by the master is updated in memory so that the
new key is the current write key.
1. All secrets are re-encrypted with the new key.

What's the expected volume of secrets here? Is there a performance impact of choosing only a single layer of keys vs. envelope encryption (DEK/KEK)?
A lot of customers I talk to are in the 100-1000 secrets range, so this should be fine, but worth considering.

Member Author

that's a good question, 100-1000 seems about the right range to me. I wonder if the original secrets design had anything to say about this.

Contributor

We routinely deal with clusters that have 100k or more secrets. In practice, it doesn't matter how many there are; they still have to be rotated. However, they don't have to be rotated synchronously: the whole point of having multiple keys is to allow that process to run in the background. You have to execute a no-op PUT on every secret and verify you get a resource change (no resource change means the secret is already stored with the new key). Once you've completed that, you can remove the old key (though leaving it costs nothing except a longer key search order).

Contributor

(this is the same feature we use to move stored API resources to a new storage version)
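The background rotation described above (a no-op PUT on every object, counting only the ones that actually changed) can be modeled in a few lines. This toy sketch assumes, for illustration, that a PUT rewrites an object only when it is stored under a key other than the current write key; values are `"<keyID>:<payload>"` strings with no real encryption, and `store`/`noopPut` are hypothetical names. Once a full pass reports zero changes, the old key is no longer referenced and can be dropped from the read list.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// store is a toy etcd: values are "<keyID>:<payload>" (no real crypto).
type store struct {
	data     map[string]string
	writeKey string
}

// noopPut rewrites name with the current write key only if it is stored under
// some other key, and reports whether the stored bytes changed.
func (s *store) noopPut(name string) bool {
	id, payload, _ := strings.Cut(s.data[name], ":")
	if id == s.writeKey {
		return false // already on the new key: no resource change
	}
	s.data[name] = s.writeKey + ":" + payload
	return true
}

// reencryptAll walks every object once and returns how many were rewritten.
func (s *store) reencryptAll() int {
	names := make([]string, 0, len(s.data))
	for n := range s.data {
		names = append(names, n)
	}
	sort.Strings(names) // deterministic order for the example
	changed := 0
	for _, n := range names {
		if s.noopPut(n) {
			changed++
		}
	}
	return changed
}

func main() {
	s := &store{
		data:     map[string]string{"db": "old:pw1", "api": "new:tok", "tls": "old:crt"},
		writeKey: "new",
	}
	fmt.Println(s.reencryptAll()) // 2
	fmt.Println(s.reencryptAll()) // 0: safe to drop "old" from the read keys
}
```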

ack. This seems to me like a push toward the envelope encryption model.

1. Encrypt DEK with KEK[N+1]

Each rotation generates a new KEK and DEK. Two DEKs will be in-use temporarily
during rotation, but only one at steady-state.

Why does the rotation need to generate a new DEK? Or rather, in a non-emergency rotation (say, with automated rotation), this adds a lot of complexity.

Member Author

Trying to reason through this (feel free to correct me):

  1. Rotation of just the KEK time-limits exposure of the previous KEK where the attacker didn't also retrieve the DEKs.

  2. DEKs need to be rotated every 2^32 writes somehow.

  3. We need to rotate everything if DEKs and KEKs got exposed.

Given we need to write code for 2 and 3, is there much advantage in aiming for just 1?

re: 2, ahh I see my confusion. Recommendation would be to generate a new DEK every write (not change on reads)
re: 3, yes, but this is (hopefully) a rare case. We still need to write code but it shouldn't be built into the 'default' rotation model.
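The envelope model discussed in this thread, combined with the suggestion to generate a new DEK on every write, can be sketched with standard-library AES-GCM. This is a minimal sketch under stated assumptions: the local `kek` byte slice stands in for a KEK that would eventually live in an external KMS or HSM, and all helper names are hypothetical. It shows why KEK rotation is cheap: moving to KEK[N+1] only re-seals the small wrapped DEK, not the stored data.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
	"io"
)

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal returns nonce || AES-GCM ciphertext.
func seal(key, plaintext []byte) ([]byte, error) {
	aead, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	return aead.Seal(nonce, nonce, plaintext, nil), nil
}

func open(key, data []byte) ([]byte, error) {
	aead, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(data) < aead.NonceSize() {
		return nil, errors.New("data too short")
	}
	return aead.Open(nil, data[:aead.NonceSize()], data[aead.NonceSize():], nil)
}

func randomKey() ([]byte, error) {
	k := make([]byte, 32)
	_, err := io.ReadFull(rand.Reader, k)
	return k, err
}

// envelope is what would be stored: data sealed under a DEK, plus the DEK
// sealed ("wrapped") under the KEK.
type envelope struct{ WrappedDEK, Ciphertext []byte }

// encryptEnvelope draws a fresh DEK for every write.
func encryptEnvelope(kek, plaintext []byte) (envelope, error) {
	dek, err := randomKey()
	if err != nil {
		return envelope{}, err
	}
	ct, err := seal(dek, plaintext)
	if err != nil {
		return envelope{}, err
	}
	wrapped, err := seal(kek, dek)
	return envelope{WrappedDEK: wrapped, Ciphertext: ct}, err
}

func decryptEnvelope(kek []byte, e envelope) ([]byte, error) {
	dek, err := open(kek, e.WrappedDEK)
	if err != nil {
		return nil, err
	}
	return open(dek, e.Ciphertext)
}

// rewrap moves an envelope from KEK[N] to KEK[N+1]; the data ciphertext is untouched.
func rewrap(oldKEK, newKEK []byte, e envelope) (envelope, error) {
	dek, err := open(oldKEK, e.WrappedDEK)
	if err != nil {
		return envelope{}, err
	}
	wrapped, err := seal(newKEK, dek)
	return envelope{WrappedDEK: wrapped, Ciphertext: e.Ciphertext}, err
}

func main() {
	kek1, _ := randomKey()
	kek2, _ := randomKey()
	e, err := encryptEnvelope(kek1, []byte("hunter2"))
	if err != nil {
		panic(err)
	}
	if e, err = rewrap(kek1, kek2, e); err != nil {
		panic(err)
	}
	plain, err := decryptEnvelope(kek2, e)
	fmt.Println(string(plain), err) // hunter2 <nil>
}
```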

Cons:

- End state is still KEKs on disk on the master. This is equivalent to the much
simpler list of keys on disk in terms of key management and security.

My preferred end state is a combination of 2 & 3: local DEKs are generated and sent elsewhere to be encrypted with the KEK. In this implementation, that may be a local KEK on disk, but longer term there could be other backends. This also circumvents the KEK agreement problem in 4, because the masters just need to agree on a single source that maintains the KEKs, rather than agree on the KEKs themselves.

Member Author

yes, I think doing it like this only really makes sense if you use the external store. I'll reword this one when I get a chance.

This sounds like the ideal design to me as well. In the near term—if the initial implementation only supports storing KEKs as files on disk, is it correct we could:

a) make sure that the paths they're located at on disk (as defined in --encryption-provider-config=/path/to/config) are somewhere on our existing encrypted ramdisk? This is where we write secrets after we've fetched them from Vault, and;

b) automatically back them up as normal Vault secrets, and place them on new apiserver k8s hosts the way we place other secrets already?

These questions are probably a bit out of scope for a spec document, but I wanted to think through how we'd use this in a real world scenario.

one of the following:

* A local HSM implementation that retrieves the keys from the secure enclave
prior to reusing the AES-GCM implementation (initialization of keys only)

Is this 1 with some sort of root key that unlocks all the disk keys?

Member Author

I think? this is saying there's only one layer of keys, and they are retrieved from the HSM. So it's like option 1 with the keys in the HSM instead of on disk. @philips wrote this originally I believe.


In future versions, storing a KEK off-host and off-loading encryption/decryption
of the DEK to AWS KMS, Google Cloud KMS, Hashicorp Vault etc. should be
possible. The decrypted DEK would be cached locally after boot.

Is the DEK forcibly cached in memory? Or do we want to put an external store in the serving path of a secret? (This likely also depends on the sensitivity of the workload, where it's hosted, latency to the external store, etc.) I would propose that if one of the DEK/KEK solutions is pursued, this should be a design option.

Member Author

yes, the DEK is held in memory and we don't want to have to wait on decrypt by an external store for every secret. I think you're saying we should consider relying completely on the external store and not doing any caching?

I think that we have caching be the default, but might want to make this an option (not high priority).
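Caching decrypted DEKs in memory by default, as proposed here, might look like the following. `kekService` is a hypothetical interface standing in for AWS KMS, Google Cloud KMS, Vault or an HSM; it is not any of those services' real client API. The sketch holds the lock across the remote call and never evicts entries, both simplifications; making caching optional would mean letting callers use the uncached service directly.

```go
package main

import (
	"fmt"
	"sync"
)

// kekService is a hypothetical stand-in for an external service that unwraps DEKs.
type kekService interface {
	Unwrap(wrapped []byte) ([]byte, error)
}

// cachingUnwrapper keeps unwrapped DEKs in memory so the external service is
// called once per distinct wrapped DEK rather than once per read.
type cachingUnwrapper struct {
	svc   kekService
	mu    sync.Mutex
	cache map[string][]byte
}

func newCachingUnwrapper(svc kekService) *cachingUnwrapper {
	return &cachingUnwrapper{svc: svc, cache: map[string][]byte{}}
}

func (c *cachingUnwrapper) Unwrap(wrapped []byte) ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if dek, ok := c.cache[string(wrapped)]; ok {
		return dek, nil
	}
	dek, err := c.svc.Unwrap(wrapped)
	if err != nil {
		return nil, err
	}
	c.cache[string(wrapped)] = dek
	return dek, nil
}

// fakeKMS counts calls so the effect of caching is visible.
type fakeKMS struct{ calls int }

func (f *fakeKMS) Unwrap(wrapped []byte) ([]byte, error) {
	f.calls++
	return append([]byte("dek-for-"), wrapped...), nil
}

func main() {
	kms := &fakeKMS{}
	u := newCachingUnwrapper(kms)
	for i := 0; i < 3; i++ {
		u.Unwrap([]byte("wrapped-1"))
	}
	fmt.Println(kms.calls) // 1
}
```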

Contributor

@philips philips left a comment

What implementation are we driving towards? It is unclear from this proposal. We have all of these alternatives and no recommendation.

AES-GCM of the value (the JSON or protobuf data) along with a set of
authenticated data to create the ciphertext, and then on decryption use the
nonce and the authenticated data to decode. The keys come from configuration on
the local disk (potentially decrypted at startup time using a stronger password

Keys coming from local disk seems like a limitation to the design. Wouldn't this be difficult to manage by administrators of the cluster?

Member Author

I think I should take the talk of key management out of this section (a holdover from the previous doc). The crypto implementation doesn't care where the keys come from. Trade-offs in terms of management complexity are covered under the options presented in the keys section.

must follow a known structure and apply a specific algorithm.

The provider would take a set of keys and unique key identifiers from the
command line, with the key values stored on disk. One key is identified as the

I had this question on #454 also. Do you mind describing a little more what is meant by "from the command line"?

Member Author

This depends on which of the key management options we choose. For option 1 the key itself would be inside the config file. I reworded this to use a key retrieval interface that will be one of those options.
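A key retrieval interface like the one mentioned here lets the AES-GCM provider stay the same whichever key management option is chosen. The sketch is hypothetical throughout (`keyRetriever`, `staticKeys` and `provider` are illustrative names); only an option 1 style static list is implemented, and an HSM-backed retriever would be another type satisfying the same interface.

```go
package main

import (
	"errors"
	"fmt"
)

type namedKey struct {
	ID     string
	Secret []byte
}

// keyRetriever is a hypothetical seam between the encryption provider and the
// key management options (config file, HSM, KEK-unwrapped DEKs).
type keyRetriever interface {
	// Keys returns key IDs and material; the first entry is the write key.
	Keys() ([]namedKey, error)
}

// staticKeys models option 1: keys read directly from the config file.
type staticKeys []namedKey

func (s staticKeys) Keys() ([]namedKey, error) { return s, nil }

// provider depends only on the interface, so swapping in an HSM-backed
// retriever would not change the encryption code.
type provider struct {
	writeKey namedKey
	readKeys map[string][]byte
}

func newProvider(r keyRetriever) (*provider, error) {
	keys, err := r.Keys()
	if err != nil {
		return nil, err
	}
	if len(keys) == 0 {
		return nil, errors.New("key retriever returned no keys")
	}
	p := &provider{writeKey: keys[0], readKeys: map[string][]byte{}}
	for _, k := range keys {
		p.readKeys[k.ID] = k.Secret
	}
	return p, nil
}

func main() {
	p, err := newProvider(staticKeys{
		{ID: "key2", Secret: make([]byte, 32)},
		{ID: "key1", Secret: make([]byte, 32)},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(p.writeKey.ID, len(p.readKeys)) // key2 2
}
```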

@destijl
Member Author

destijl commented May 19, 2017

@stevesloka I don't see any. I saw your comment on the issue about HIPAA...?

@destijl
Member Author

destijl commented May 19, 2017

@philips let me coordinate with @jcbsmpsn who is working on code and choose the option.

### Backwards Compatibility

Once a user encrypts any resource in etcd, they are locked to that Kubernetes
version and higher unless they choose to manually decrypt that resource in etcd.
Contributor

I just realized we probably need to at least think about how to disable encryption. That means we need a way in config to keep decryption keys in rotation while ensuring the write key is "none".

You should instead have a special "none" provider that has to be configured very explicitly; allowing "none" keys is generally dangerous.

Contributor

That's in the implementation now.
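The "none" provider suggestion can be sketched as an ordered provider list in which the first entry is used for writes and every entry is consulted for reads. Disabling encryption then means putting "none" first while keeping the old encrypting provider for reads, and unprefixed plaintext is only accepted when "none" is listed by name. Provider names, prefixes and the `chain` type are hypothetical; values are tagged rather than encrypted to keep the selection logic in focus.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// provider is a hypothetical entry in the provider list. An empty prefix marks
// the explicit "none" provider, which reads and writes plaintext.
type provider struct {
	name   string
	prefix string
}

// chain writes with its first provider and reads with whichever provider's
// prefix matches. Real providers would also encrypt; this sketch only tags
// values. Assumes at least one provider.
type chain []provider

func (c chain) write(v string) string { return c[0].prefix + v }

func (c chain) read(stored string) (string, error) {
	for _, p := range c {
		if p.prefix != "" && strings.HasPrefix(stored, p.prefix) {
			return strings.TrimPrefix(stored, p.prefix), nil
		}
	}
	// Unprefixed data is accepted only when "none" was configured explicitly.
	for _, p := range c {
		if p.name == "none" {
			return stored, nil
		}
	}
	return "", errors.New("no configured provider matches stored data")
}

func main() {
	aesgcm := provider{name: "aesgcm", prefix: "aesgcm:key1:"}
	none := provider{name: "none"}

	enabled := chain{aesgcm}
	old := enabled.write("secret-a")

	// Disabling encryption: "none" is the write provider, aesgcm stays for reads.
	disabling := chain{none, aesgcm}
	fresh := disabling.write("secret-b")
	a, _ := disabling.read(old)
	b, _ := disabling.read(fresh)
	fmt.Println(a, b) // secret-a secret-b

	_, err := enabled.read(fresh)
	fmt.Println(err != nil) // true
}
```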

@tiran

tiran commented Jul 19, 2017

@edknapp Secretbox is XSalsa20 + Poly1305. XSalsa20 is not immune to nonce-misuse problems; it merely uses a very large nonce. In combination with a strong, high-quality random source (a CSPRNG such as the getrandom() syscall), a nonce collision is unlikely to happen [1]. In theory a 192-bit nonce is safe for the remaining lifetime of our universe. In practice you have to be careful that your virtualization software injects good randomness or a hardware random generator into your VMs.

[1] https://download.libsodium.org/doc/advanced/xsalsa20.html
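The nonce-size argument above, and the earlier note that DEKs need rotating every 2^32 writes, can be checked with the birthday bound. For n encryptions under one key with independently random b-bit nonces, the probability that any two nonces collide is bounded as follows (a standard approximation, not a figure taken from the proposal):

```latex
P_{\text{collision}}(n, b) \le \binom{n}{2}\, 2^{-b} = \frac{n(n-1)}{2^{b+1}} < \frac{n^{2}}{2^{b+1}}

\text{AES-GCM } (b = 96,\ n = 2^{32}):\quad P < \frac{2^{64}}{2^{97}} = 2^{-33}

\text{XSalsa20 } (b = 192,\ n = 2^{64}):\quad P < \frac{2^{128}}{2^{193}} = 2^{-65}
```

So AES-GCM with random 96-bit nonces already reaches a 2^-33 collision bound at 2^32 messages, which matches the 2^32-invocation limit NIST SP 800-38D sets for GCM with random IVs, while XSalsa20 only reaches the same bound at n = 2^80 messages.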

@k8s-github-robot k8s-github-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 15, 2017
@smarterclayton
Contributor

I forgot this hasn't merged. We need to freshen it and get it in.

@philips
Contributor

philips commented Aug 31, 2017

Bump, we do need to get this merged!

@philips
Contributor

philips commented Sep 1, 2017

Also, I still don't see the final recommendation of what is implemented in Kubernetes 1.8. Did that get added? Link to the line number? #607 (review)

@sakshamsharma

@fejta fejta added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. keep-open labels Dec 15, 2017
@fejta fejta reopened this Dec 15, 2017
@kubernetes kubernetes deleted a comment from k8s-github-robot Dec 15, 2017
@liggitt
Member

liggitt commented Dec 15, 2017

@destijl can we get this merged. would like to fold https://docs.google.com/document/d/1S_Wgn-psI0Z7SYGvp-83ePte5oUNMr4244uanGLYUmw/edit# into this doc

@k8s-github-robot k8s-github-robot added the kind/design Categorizes issue or PR as related to design. label Feb 6, 2018
@destijl destijl force-pushed the db-secrets-encryption branch from 03d3b48 to 4353588 Compare February 8, 2019 23:47
@k8s-ci-robot k8s-ci-robot added the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label Feb 8, 2019
Member Author

@destijl destijl left a comment

OK, I merged in a few minor changes and updated the branch, but I haven't done a full pass over the document and all the comments. It's been long enough that I think we should just merge this so @immutableT can freshen it up with what was actually implemented. I've lost all the state I originally had. I moved it under the auth directory, so it now needs approval from a sig-auth lead, @liggitt.

@destijl destijl force-pushed the db-secrets-encryption branch from e7c6fb7 to 73c3c62 Compare February 9, 2019 00:08
@immutableT

/retest

@liggitt
Member

liggitt commented Feb 14, 2019

agree on merging to capture current state, and to provide a base for future KEPs to reference/extract from

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 14, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: destijl, liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 14, 2019
@k8s-ci-robot k8s-ci-robot merged commit f1bbcf2 into kubernetes:master Feb 14, 2019
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/design Categorizes issue or PR as related to design. lgtm "Looks good to me", indicates that a PR is ready to be merged. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/auth Categorizes an issue or PR as relevant to SIG Auth. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.