WIP: Add openid connect token verification (feedback welcomed) #2078
Conversation
Adding a new command, io.openid.verify, which accepts two parameters: a string (the token to be verified) and an array of strings (the list of trusted IdPs). It returns an array of two elements: a boolean indicating whether the token is verified against one of the trusted IdPs from the list, and the payload of the token as parsed JSON.
(go mod vendor) (Note the protobuf was here, but not used)
patrick-east
left a comment
Thanks for working on this, it's looking pretty good! I think the signature and built-in implementation are generally pretty close. I just have some comments on the API signatures and caching. For the actual verification I added some comments WRT using go-oidc vs. what is already available in OPA.
```go
var token *string
if token, err = getString(a); err != nil {
	return
}

// Parse the trusted issuers.
var trustedIssuers []*string
```
Do token and trustedIssuers need to be pointers to strings (*string)? It looks like they can just be normal string types.
You are 100% correct: they could be normal string types. It's safer to use string directly, to eliminate the possibility of a nil dereference. I was thinking it might be faster to use pointers (to reduce the number of string copies).
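For what it's worth, a Go string value is only a small header (pointer + length), so passing one by value copies that header, not the underlying bytes. The sketch below (illustrative names, not the PR's code) shows why `*string` rarely buys performance and adds a nil-dereference hazard:

```go
package main

import "fmt"

// issuerHost takes a string by value: this copies a 16-byte header on
// 64-bit platforms, never the underlying bytes, so it is already cheap.
func issuerHost(iss string) string {
	return iss
}

// riskyIssuer takes *string "to avoid copies" but panics with a nil
// pointer dereference if iss == nil.
func riskyIssuer(iss *string) string {
	return *iss
}

func main() {
	s := "https://idp.example.com"
	fmt.Println(issuerHost(s))
	fmt.Println(riskyIssuer(&s))
}
```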
```go
// May be used to manage a token verification against an entire collection
// of trusted IdPs.
type TrustedIdProviderManager interface {
```
We may want to consider moving this code into its own internal package over in /internal/, like /internal/oidc or something. I think enough of this is generic (i.e., not builtin- or topdown-specific) that we can split it out into a reusable internal package.
Very good suggestion, many thx
```go
// Private struct to implement interface, TrustedIdProviderManager.
// We use a sync map to safely manage a collection of trusted
// issuers and their verifiers.
type TrustedIdProviderManagerImpl struct {
```
If we want to keep it as a private implementation we should rename to trustedIdProviderManagerImpl to avoid exporting it from the package.
I also think you could name it something like oidcIDPManager (the gist being: drop the Impl suffix and instead say which implementation it is; in that example, oidcIDP versus something like mockTrustedIdProviderManager or whatever else we end up with).
Yea, originally trustedIdProviderManagerImpl was actually private. Then I tried to write some tests for it and it became difficult, so I opted for a public struct. That comment needs to be corrected.
I like the suggestion of a simpler, more descriptive name for it: oidcIDPs
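One common way to keep the struct unexported yet testable from other packages is to program against the exported interface and expose only a constructor. A minimal sketch of that pattern, using the names suggested above (the method set here is hypothetical):

```go
package main

import "fmt"

// TrustedIdProviderManager stays exported so callers (and tests in
// other packages) program against the interface.
type TrustedIdProviderManager interface {
	IsTrusted(issuer string) bool
}

// oidcIDPs is the unexported concrete type, per the naming suggestion:
// say what the implementation is rather than suffixing "Impl".
type oidcIDPs struct {
	issuers map[string]struct{}
}

func (m *oidcIDPs) IsTrusted(issuer string) bool {
	_, ok := m.issuers[issuer]
	return ok
}

// NewTrustedIdProviderManager is the exported constructor; external
// tests exercise the behavior through it without touching the struct.
func NewTrustedIdProviderManager(issuers []string) TrustedIdProviderManager {
	set := make(map[string]struct{}, len(issuers))
	for _, iss := range issuers {
		set[iss] = struct{}{}
	}
	return &oidcIDPs{issuers: set}
}

func main() {
	m := NewTrustedIdProviderManager([]string{"https://idp.example.com"})
	fmt.Println(m.IsTrusted("https://idp.example.com")) // true
	fmt.Println(m.IsTrusted("https://evil.example.com")) // false
}
```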
```go
func CreateOrGetVerifier(idp *string) (*oidc.IDTokenVerifier, error) {

	// If we already have a verifier for this issuer, use it.
	if loadedVerifier, ok := globalTrustedIdProviderManager.trustedVerifiers.Load(*idp); ok {
```
Do we ever need to be worried about the global cache of these becoming stale or needing to be invalidated?
Great question, though they'll never be stale or invalid.
The state saved in those objects is the public keys of the IdPs (we only want to request an IdP's keys once, for repeated verifications). I double-checked the RFC for token exchange just to be sure, and it appears that the public keys never expire. Manually looking at a few example keys corroborates this: no TTL or exp fields are present:
(our test idp)
https://dev-530681.oktapreview.com/oauth2/v1/keys
(google's prod keys)
https://www.googleapis.com/oauth2/v3/certs
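The create-or-get pattern over a sync.Map can be sketched as below. This is an illustrative reduction (a plain struct stands in for *oidc.IDTokenVerifier); note that sync.Map's LoadOrStore makes the publish step race-safe, so concurrent callers for the same issuer converge on a single cached entry:

```go
package main

import (
	"fmt"
	"sync"
)

// verifier stands in for *oidc.IDTokenVerifier; the caching pattern is
// what matters here, not the verification itself.
type verifier struct{ issuer string }

var trustedVerifiers sync.Map // issuer URL -> *verifier

func createOrGetVerifier(issuer string) *verifier {
	// Fast path: reuse a cached verifier for this issuer.
	if v, ok := trustedVerifiers.Load(issuer); ok {
		return v.(*verifier)
	}
	// Slow path: build one (in the real code this would trigger OIDC
	// discovery and key fetching) and publish it. If another goroutine
	// won the race, LoadOrStore returns its copy and we use that.
	v, _ := trustedVerifiers.LoadOrStore(issuer, &verifier{issuer: issuer})
	return v.(*verifier)
}

func main() {
	a := createOrGetVerifier("https://idp.example.com")
	b := createOrGetVerifier("https://idp.example.com")
	fmt.Println(a == b) // true: second call hit the cache
}
```

One caveat with this pattern: a losing goroutine may still construct a throwaway verifier before discarding it, which matters if construction performs network I/O.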
```go
}

// Implements full JWT decoding, validation and verification.
func builtinOpenIdConnectTokenVerifyAndParse(a ast.Value, b ast.Value) (v ast.Value, err error) {
```
We may not need it right away, but it might be good to switch to the newer builtin signature https://github.com/open-policy-agent/opa/blob/master/topdown/builtins.go#L50 so that we have access to the builtin context and the cache provided there.
The advantage is that if for whatever reason someone had a policy that required re-evaluating the builtin function we could (for each query evaluation) avoid re-computing anything and return the same value immediately.
Oh, very nice: a decision-caching layer. Thanks for the heads-up on this.
One follow-up question, though: how long does the context cache last? I ask because token validity is directly dependent on time, so a re-evaluation of the same inputs may result in a different decision due to a token's validity period. But if it's only for the same request, I would venture that using the caching layer is far more a benefit than a risk.
It would be on a per-query evaluation basis, so like using the OPA server that roughly means per-request to OPA.
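The per-query memoization being discussed can be sketched as follows. This is a hedged stand-in, not OPA's actual BuiltinContext cache API; it only shows the idea that a cache scoped to one query evaluation lets repeated evaluations of the builtin with the same operands skip re-verification:

```go
package main

import "fmt"

// queryCache models a cache that lives for one query evaluation
// (roughly one request to the OPA server). The key and value types
// here are illustrative.
type queryCache map[string]interface{}

var verifyCalls int

// expensiveVerify is a placeholder for real token verification.
func expensiveVerify(token string) bool {
	verifyCalls++
	return token != ""
}

// verifyCached consults the per-query cache before doing the work.
func verifyCached(c queryCache, token string) bool {
	if v, ok := c["verify:"+token]; ok {
		return v.(bool)
	}
	res := expensiveVerify(token)
	c["verify:"+token] = res
	return res
}

func main() {
	c := queryCache{} // one cache per query evaluation
	verifyCached(c, "tok")
	verifyCached(c, "tok")
	fmt.Println(verifyCalls) // 1: the second evaluation was served from the cache
}
```

Because the cache dies with the query, the time-dependence concern above is bounded: a token's validity is only "frozen" for the duration of a single evaluation.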
```go
}

// Create a new issuer verifier, save it, and use it.
ctx, _ := context.WithTimeout(context.Background(), globalTrustedIssuerContextTimeout)
```
We might want to expose the timeout to the caller of the builtin function since, as I understand it, this will go off and make a network request. Signature-wise, we could maybe just add an options object as the third parameter, or change the second parameter to be an object with one of the keys being the list of IdPs?
Ah, thanks for the comment. You are correct: there are (possibly) outbound network calls happening to fetch the IdPs' public keys. About the user-facing option, it would be nice not to be forced to pick some reasonable timeout, but maybe there's something we can do here.
Two thoughts come to mind:
(1) This can't be the only outbound network call OPA makes. It makes me wonder what the other timeouts might already be; maybe we could just borrow that timeout value (to keep the interface simple)?
(2) Maybe there's a config that should hold the details?
For evaluation-time builtins it's the second one. Only http.send makes outbound calls right now, and it doesn't have a specified timeout... and there is a bug open to fix that 😆
I think the best option is to just provide an options object as the third parameter. We could add something to the OPA config file, but that would be specific to the OPA server/REPL. Plus, eventually someone is going to want the timeout to be dynamic with the policy, so we might as well just start there.
```go
typedVerifier := verifier.(*oidc.IDTokenVerifier)
ctx, cancel := context.WithTimeout(context.TODO(), globalTrustedIssuerContextTimeout)
defer cancel() // releases resources if slowOperation completes before timeout elapses
verifiedToken, err := typedVerifier.Verify(ctx, *token)
```
WRT the verification steps: I think we actually have the majority of the code required to do this over in
Line 739 in f786006:

```go
func builtinJWTDecodeVerify(a ast.Value, b ast.Value) (v ast.Value, err error) {
```

If we refactor some of that verification code to be re-usable, we are pretty much only left with the bits of the go-oidc library that fetch the config, as in https://github.com/coreos/go-oidc/blob/v2/oidc.go#L113-L158, and the remote key fetching over in https://github.com/coreos/go-oidc/blob/8d771559cf6e5111c9b9159810d0e4538e7cdc82/jwks.go#L26-L38
I think the tradeoff then is whether we want to re-implement those bits, pull just that part of the code into our internal jwx package, or use go-oidc. Looking at the list of dependencies that go-oidc comes with, I'm concerned about library users of OPA having conflicts with some of them... That being said, having testify might be nice 😄 I guess if we do use go-oidc we probably want to standardize on its implementations rather than have two similar-but-not-quite-the-same JWT/JWS/JWKS/etc. implementations available.
@tsandall thoughts?
So, with all the love in the world, I disagree that OPA users may find conflicts with a third-party library that implements OpenID Connect.
Quite the opposite. OPA is relatively new, and so is Styra. The only way to really gain a sense of confidence and credibility in the market is to not re-implement things that already have well-defined and widely adopted libraries. Also, time-to-market matters a great deal here, too.
But let's step back and look at the big picture. We've got these other tickets, #1925 and #1205; all of these requests are partial and incomplete fragments of good developers trying to give themselves the tools to do the right thing: verify tokens. OpenID Connect is the standard for how to do that, and it's basically compatible with all JWT tokens. Several RFCs are written on both token exchange and well-known endpoints to provide a bulletproof protocol that covers the edge cases with well-defined behavior. Because of this, it is incorrect to only verify signatures of incoming JWT tokens against a hard-coded URL containing JWKs, because that's not how OpenID Connect works.
Personally, I humbly admit that it is hard and time-consuming to get all of that right, and that it would be far more in the spirit of our open-source community to yield to those more knowledgeable and versed than myself, who have already published a client library for OpenID Connect for everyone.
You also point out that conformity is critical. I agree 100%. But from my humble outsider perspective, the little JWT-processing snippets that are already in OPA may be the ones that diverge from a standard implementation?
Just sharing open and honest feedback and a customer perspective.
Since feedback is welcomed, I'll try to provide my 2 cents here :)
OpenID Connect is not a standard for verifying JWT tokens; it is an identity layer on top of OAuth2, which adds, among many things, the concept of ID tokens. That naturally includes means to verify tokens of that type, but not more. #1205, which you reference above, is a good example where OIDC and ID tokens are by specification not involved, precisely because there is no (human) identity involved in the token retrieval, and hence none will be part of the token claims either. Some parts of OIDC that are not strictly about identity (such as presenting server capabilities at a well-known metadata endpoint, public key components at a JWKS endpoint, and so on) have proven good enough to be "backported" to OAuth2 in the form of additional extension specs, as I commented on here: #2057 (comment)
It would be a shame if all the great work you've been doing here could not be used to verify any type of JWT given a known issuer and a metadata/JWKS endpoint, be it ID tokens or not.
Many thx for taking the time to share.
Sounds like we may have different values driving us: I am not interested in verifying arbitrary JWTs (JWTs that do not represent a soon-to-be-authenticated identity), and I was wrong to assume everyone shared that value. My bad.
Just to share the context behind this PR, we were initially looking at OPA to help solve the policy-based access control. But even before we can make any policy-based authorization decisions ("Are you allowed to do X?"), we need the identity from the authentication ("Who are you?").
At a very high level, we have some choices on where to AuthN incoming requests: ingress controllers, the services themselves, or maybe inside OPA?
Right now, authentication occurs in our services (so for N services, we have N different authentication implementations, which is very difficult to audit or manage for SOC2 compliance). Unfortunately, our ingress controllers are not uniformly deployed either (we have services in Kubernetes, AWS, Azure, Google, and dedicated data centers, all with different ingress controllers managing inbound requests).
We're trying to build foundational infrastructure that is easy for engineers to contribute to securely (meaning, if we could perform both AuthN and AuthZ inside OPA, that would be an amazing value proposition, not only for the security team and audit + compliance, but also for the entire eng org, which would no longer have to write its own AuthN layers in all the services as is done today).
So, I started looking into how to actually perform some AuthN checks alongside our policy-based AuthZ checks inside OPA.
With my understanding of how OPA works today, we would either need to hard-code actual IdP keys (or paths to JWKs if we go with something more like #1205), which does not really work for us because our IdPs' keys are extremely short-lived. Hence the idea for a new primitive in OPA to just perform an OpenID Connect-compliant token verification against a trusted list of identity issuers.
This way, all services' AuthN verifications could be covered by a single primitive in OPA. Perfect. Then later, we wanted to use the Styra platform to manage the change (SOC2 is coming) and visualize not just the AuthZ decisions but also the AuthN decisions too, increasing the value prop of using Styra too (cough cough @tim-styra @marco-styra).
There is a really great security benefit here too: OPA itself no longer has to trust that something somewhere in our complex ecosystems of services has correctly performed the required AuthN check.
If anyone has suggestions on a more pragmatic approach, it would be most appreciated. [email protected]
Thanks for providing some background, Jon! Having worked with OAuth2 and OpenID Connect for the last 5 years or so, I know there are certainly nuances to each implementation, and everyone values different things in these standards.
Our use cases are actually fairly similar, as we are using centrally issued JWTs as the primary way of identifying users and clients alike inside our microservice environment. We just prefer to use JWTs as access tokens rather than ID tokens, to be able to treat requests in a uniform manner regardless of whether a human or a client/service/cronjob/etc. was behind them. Our access token JWTs actually carry some identity in them too, but not more than what a normal service is expected to need to know; they can always request more info on an as-needed basis from the userinfo endpoint (which may return an ID token) should they need it. Most don't.
The point I was trying to make was that if what you use to verify an ID token JWT is pretty much identical to what is needed to verify another type of JWT under the same requirements (like an iss claim matching a pre-configured issuer), then maybe there's a win in catering to both. It would probably be easy to add later on, though, as long as things are kept fairly generic when possible. Not gonna deny I'm trying to hitch a free ride here ;)
Thanks for the feedback @jonmclachlanatpurestorage and @anderseknert! I think it's good that we hash some of this out.
Let me try and add in my thoughts on a few of these points:
The only way to really gain a sense of confidence and credibility in the market is to not re-implementing things that have well-defined and adopted libraries. Also, time-to-market matters a great deal here, too.
I think time-to-market is one thing, but maintainability of OPA is also crucial. Adding these dependencies puts OPA at the mercy of, say, Square's JOSE implementation and CoreOS/RedHat/IBM's(?) OIDC client. If (when) there are bugs or security issues in them, OPA end-users will have to wait for a bug fix in those libraries, or we have to push changes to them (hoping they are still maintained). Having the code in OPA, especially if it isn't very much (which, relatively speaking, is the case here), means faster and more responsive fixes for OPA users.
Giving some context from the OPA maintainer team, there is also some hesitation about this change because it pushes the boundaries of what built-in functions in OPA have historically been responsible for. This introduces a significantly more complex workflow, with remote calls, than any of the other standard builtins. The error conditions and handling alone are significantly more involved than something like the string helpers. So, level-setting: this isn't a change that is going to be quickly merged regardless of whether we use the go-oidc library or implement it ourselves.
To clarify WRT:
But today, getting oidc quickly is a little more important to me than getting more general features.
There are options to help achieve that goal which do not require making changes to upstream OPA, so you can be unblocked ASAP. Merging feature enhancements into OPA which (like this one) are somewhat controversial will take some time. We can't rush this stuff to unblock a smaller subset of users and change it later; these builtins are going to be around for a very long time, so we need to be careful about what gets added. It may be a different story for something simpler like a math or string helper, but as mentioned earlier, this is a significantly more complex addition to the Rego language.
On the assumption everyone is OK with moving forward by making an incremental change on what we already have in OPA (/me looks at @tsandall), I tend to think there isn't a great reason why we can't support both of the use cases being discussed. Maybe I'm missing something, but in essence there are a few high-level actions that need to be done:
1. Hit the well-known OIDC config URI and parse the response
2. Use the parsed response to get a JWKS URI
3. Fetch the JWKS as needed
4. Use the JWKS to decode the JWT
5. Validate the JWT contents using data gathered in the previous steps
For (1) and (2) the http.send builtin should work just fine (maybe this is where a helper Rego library to abstract some of this comes in). Then we could extend the existing io.jwt.* helpers to support remote JWKS URIs instead of the actual cert, so they'll automatically take care of (3). Then it's just a matter of using io.jwt.decode_verify for (4) and (5), which maybe needs some tweaks to the API (or a new version of it) so a user can specify the right constraints to validate the token for OIDC (and again, for UX simplification, a Rego library could help with this). Please correct me if I'm missing some piece of this workflow, but looking through the go-oidc implementation, it seems like there isn't anything special beyond the criteria for which fields need to be validated. AFAIK that was the intent behind the constraints passed into io.jwt.decode_verify, so if it isn't sufficient then we should, IMO, fix that.
@anderseknert Something like that should work for you too, right?
All that being said, I see a handful of options going forward with this stuff:
- Implement the incremental changes to the jwx helper stuff already in OPA.
- Optionally make some helper Rego functions for developer/user UX improvement.
- Implement the required functionality all in Rego using the http.send and io.jwt.* builtins (maybe not as crazy as it sounds).
- For the sake of unblocking @jonmclachlanatpurestorage quickly, there is always the option to keep this code using go-oidc separate as a plugin and just build a custom OPA entrypoint as shown in https://www.openpolicyagent.org/docs/latest/extensions/#custom-plugins-for-opa-daemon. It's pretty maintainable if you use that model, where OPA is just a Go dependency rather than a fork, making it easier to update OPA versions. The main downside is that it isn't available for everyone to use (unless it gets shared publicly). One advantage is that if it's successful, stable, and people are using it, that would be good evidence that upstream OPA should maybe just do the same thing and use the library.

That isn't a comprehensive list, just the ways I could see this moving forward with the least resistance.
Don't have much time for a proper reply right now, but I think that's both a great summary of the current state as well as a great strategy going forward @patrick-east 👍
I like the incremental-improvement approach to the current tools, as it could eventually cover many sub-types of JWTs. What's most important to us in those 5 steps you outlined is that any remote calls are cacheable for a potentially very long time. The whole point of us using JWTs in the first place is to avoid having an "online" dependency on the identity platform, as that would take away the main benefit of a distributed auth model entirely. I know I have seen a few tickets around caching, so just having those in place would be a great start for incremental improvements on possible JWT verification flows, IMO.
```go
	return nil, err
}
req.Header.Add("content-type", "application/json")
res, err := http.DefaultClient.Do(req)
```
We should try and avoid making any actual HTTP requests in the unit tests. We've set the bar that they should work in an offline environment.
Fair point. Do we have an integration test suite somewhere?
We don't really have one for OPA right now that would be a good spot to use external servers. What would normally happen is that the tests would use the HTTP test server (https://golang.org/pkg/net/http/httptest/) and essentially feed mocked responses back to the test client with a handler on the test server.
Hey @jonmclachlanatpurestorage, just wanted to check if there is any progress still on this approach and/or if https://github.com/jonmclachlanatpurestorage/opa-oidc-plugin is working alright? (Dropping the link in here for folks that might be wanting this feature in the short term 😃.)

Going to go ahead and close this for now. Feel free to re-open when/if we're ready to resume progress on it.
WORK IN PROGRESS
Early feedback welcomed.
This PR is a potential approach for #2057
Summary:
io.openid.verify to verify an ID token and, on successful verification, parse it.