feat: automatic key rotation #52

Closed
coder/coder
#15066
@sreya

Description

Problem

We have a few symmetric keys that we use for signing (and sometimes encrypting) various payloads, and these keys are never rotated after creation. We've already encountered some friction with our more security-conscious customers concerning our External Provisioners and pre-shared keys. The only reason we haven't had more pushback on our symmetric key usage is that they are simply unaware of what is happening under the hood.

We already have three features (workspace apps, peer reconnection tokens, and a key used to convert built-in users to OAuth) that require key signing, and we may introduce more in the future. We should take the initiative while the debt is still fairly low and implement a system for rotating these internal keys.

Proposal

We will implement a user-configurable rotation schedule in which keys are rotated based on an expiration. We should start with a single value that dictates the schedule for all keys. Monthly will be the default. We will spawn a process on startup that checks on some cadence (every 10 minutes?) whether any keys need to be rotated. If an active key is within 1 hour of its expiration, we will create a new key and set its starts_at equal to the expiration of the old key.

Implementation Notes

  • Expiration is a computed value defined as starts_at + key_duration, where key_duration is a value provided at runtime by the user.
  • deletes_at will be populated when a new key is inserted for the feature. It is defined as starts_at from the newest key + token_duration + 1h.
  • We create new keys once existing keys are within an hour of their expiration so that we have plenty of time to propagate the new key to other services (aka workspace proxies).
  • Keys are valid for verifying if now() < deletes_at or deletes_at == NULL.
  • Keys should only be used for signing if starts_at <= now() < deletes_at.
  • When a key breaches its deletes_at we will set the secret field to NULL.

The following are the various token durations for our current signing keys:

  • WorkspaceApps: 1m
  • OAuth account conversion: 5m
  • Peer Reconnection: 24h

Schema Updates

Right now keys are part of the site_config. I propose that we migrate them into their own proper table. The table will be called keys with the following columns.

  • feature (text)
  • sequence (integer)
  • secret (text)
  • starts_at (timestamptz)
  • deletes_at (timestamptz)

Where the Primary Key is (feature, sequence).

The starts_at column is a bit strange, but since we will be creating keys an hour ahead of time we should avoid using the newer keys until they've been properly propagated.

Considerations

High Availability

The query to insert new keys needs to take HA deployments into consideration. As a result we will use the RepeatableRead isolation level along with some row locking.

Workspace Proxies

We will refetch keys by leveraging our existing RegisterWorkspaceProxyLoop. The loop runs every 15s by default so 1 hour is more than sufficient to ensure proper propagation.

Other Requirements

  • Part of the startup process should check whether the new rotation schedule immediately invalidates the expiration of any existing keys, and handle that case accordingly.
  • All keys should be encrypted using dbcrypt
  • All key rotations should be audited
  • We should fix our use of multiple JWT libraries. I have no opinion on which library to select but we should come to some sort of conclusion.

Implementation

  • Add schema-related changes (db*, migrations, etc)
  • Implement coderd/keyrotate package
  • Update workspace proxies to be compatible with key updates
  • Centralize key-signing logic into coderd/keysigning package
  • Migrate keys to new crypto_keys table and implement remaining glue

Metadata

Labels

kiwi: Tasks being handled by the NETGRU team
