OAuth - OpenID Connect

Created: 2019-10-22 11:37:23 -0700 Modified: 2022-08-12 20:20:51 -0700

Basics

OpenID is for authentication and OAuth is for authorization. OpenID Connect (OIDC) is an authentication layer on top of OAuth 2.0.
Terms
- Identity provider: the service that will send metadata and a token identifying the user, e.g. Google, Facebook, Twitter, etc.
- Relying party (or RP): refers to the service that needs the identity (likely your site in this situation since you’re reading this note), not the one providing the identity.
- Single sign-on (or SSO): https://en.wikipedia.org/wiki/Single_sign-on
- User/client/server: in OAuth, people typically say “user” (or “end user”) to refer to the user, “client” to refer to the relying party, and “server” to talk about the identity provider. This is not the same as when most programmers say “client” to talk about the user and “server” to talk about the relying party.
- Authorization code flow vs. implicit flow (AKA “authorization code grant” and “implicit grant”) (reference): which flow you use depends on what you need from OAuth, namely whether you can securely store a CLIENT_SECRET and whether you want the final access token to reside on the server or the client (reference). See more at this auth0 page. I also wrote notes on the flows themselves further down this page.
As a relying party, you need a client_id and a client_secret so that the identity provider can tell which service a user is granting a token to. This also allows the user to later revoke access to that service on their own via the identity provider.
- Each identity provider has different links to let you as an RP generate these pieces of information
  - Google
  - Facebook
  - Twitch (somewhere in there)
  - Discord
  - Apple (doc here with a 91-page PDF here and Okta summary here)
- The wording is a bit unusual, but I believe that if you support any SSO provider, you need to support Apple (reference).

Flows

(note: for this section, “server” typically refers to your server as a relying party, e.g. example.com)

Authorization code flow (reference)

This flow uses the CLIENT_SECRET given to you, the relying party, by the identity provider. As such, you can only use it when your app is server-side (so that you don’t leak the CLIENT_SECRET to end users and allow them to act as you).

1. The end user navigates to something like example.com/login (where example.com is your site as a relying party).
2. This link will get the user to the identity provider’s login page, either via redirection or a pop-up. The mechanics of redirection vs. pop-ups don’t really matter too much; there just needs to be a way to fetch the result of the login page.
  1. The user logs in to the identity provider. This likely involves saying “I’m okay with example.com having access to these privileges”.
  2. The identity provider responds with a code to the user.
3. The user passes the code (along with some CSRF protection, “state”, if the server doesn’t support PKCE) back to your server, e.g. at example.com/getToken.
  1. If you have access to my Firebase course notes, then check out my note called “Connecting Twitch OpenID to Firebase auth” for extensive information about this “state” parameter.
4. Your server passes the code, CLIENT_ID, and CLIENT_SECRET to the identity provider.
5. The identity provider returns an access token (and a refresh token) to your server. Your server can use this to act on behalf of the user based on the scopes that were provided.
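The steps above can be sketched roughly like this. This is only a sketch under assumptions: the endpoint URLs, client ID, client secret, and function names are all hypothetical (a real identity provider documents its own endpoints), and the final POST to the token endpoint is left out so that nothing here touches the network.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical values: the identity provider gives you the real ones
# when you register your app with them.
AUTHORIZE_URL = "https://idp.example/oauth2/authorize"
TOKEN_URL = "https://idp.example/oauth2/token"
CLIENT_ID = "my-client-id"
CLIENT_SECRET = "my-client-secret"  # must never leave your server
REDIRECT_URI = "https://example.com/getToken"


def build_login_redirect(scopes):
    """Return (url, state) for sending the user to the identity provider."""
    state = secrets.token_urlsafe(32)  # remember this in the user's session
    query = urlencode({
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": " ".join(scopes),
        "state": state,
    })
    return f"{AUTHORIZE_URL}?{query}", state


def read_callback(callback_url, expected_state):
    """Check "state" on the way back in, then return the code."""
    params = parse_qs(urlsplit(callback_url).query)
    returned_state = params.get("state", [""])[0]
    if not secrets.compare_digest(returned_state.encode(), expected_state.encode()):
        raise ValueError("state mismatch: possible CSRF")
    return params["code"][0]


def build_token_request(code):
    """Form fields that your server would POST to TOKEN_URL."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
    }
```

Note that CLIENT_SECRET only shows up in the server-to-provider request, never in anything the user’s browser sees.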
At this point, since your server has their access and refresh tokens, it can act on behalf of the user, albeit limited by the scopes that the user agreed to (e.g. on Twitter, maybe those tokens can tweet for you, but maybe they can’t delete your account). However, the user themselves wouldn’t have gotten any authentication information yet in this flow. If you want that, e.g. so that they can authenticate to you as a relying party using this third-party identity provider, then you would typically send your own application’s form of authentication that can be linked to the identity provider’s information (as opposed to just sending the user their own access token). For example, this can be a JWT whose payload is a JSON blob that looks like this: { "userId": "abcd-efgh-ijkl" }. Whoever receives the token validates its signature before trusting it (e.g. against public keys published as a JSON Web Key Set, or against your server’s own signing key if your server is both the issuer and the verifier), that way you’re not subject to a MitM altering the token. Then, on your server, you would look up the user’s information based on that ID. Because the JWT is signed, the user can’t tamper with it even though they can read it.

The important difference between this and the implicit flow is that the access token resides on your server, so a compromise of the user’s browser or device doesn’t necessarily also compromise their access token (reference).

Implicit flow (reference)

This is for when you want clients to directly be able to get an access token from an identity provider without the use of an intermediate server at all. This is good for when you can’t store the CLIENT_SECRET in a way that’s not accessible by the client. However, the access token will be on the client after all of that, so it could result in the client’s third-party identity being compromised (reference).
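As a rough sketch of what that looks like: the client asks for response_type=token, and the identity provider puts the access token in the URL fragment of the redirect, with no CLIENT_SECRET involved anywhere. In practice this logic would run as JavaScript in the browser; it’s written in Python here only to illustrate the mechanics. The endpoint URL and function names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

AUTHORIZE_URL = "https://idp.example/oauth2/authorize"  # hypothetical endpoint


def build_implicit_login_url(client_id, redirect_uri, scopes, state):
    """URL to send the user to; note that no client secret is included."""
    query = urlencode({
        "response_type": "token",  # ask for the token directly instead of a code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),
        "state": state,
    })
    return f"{AUTHORIZE_URL}?{query}"


def read_token_from_fragment(redirected_url, expected_state):
    """The token comes back after the '#', so it is never sent to a server."""
    params = parse_qs(urlsplit(redirected_url).fragment)
    if params.get("state", [""])[0] != expected_state:
        raise ValueError("state mismatch")
    return params["access_token"][0]
```

Because the token ends up in client-side code, anything that can read the page (e.g. a malicious script) can read the token too, which is the compromise risk mentioned above.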

Connecting identities across multiple services (AKA “correlating users”)

(note: “correlating users” is just my own terminology)

Picture this scenario:

1. You’re a relying party (RP) and you want to allow, say, 5 identity providers: Google, Facebook, Twitter, GitHub, and Microsoft.
2. User X has an account with Google under the email userX@example.com.
3. They sign up for your site using Google as an identity provider. You create an account for them and they’re good to go.
4. Later, they want to sign in to their account, but maybe they don’t remember that they used Google originally. They also have an account with GitHub, and that account also has the email address userX@example.com. They click “sign in with GitHub” to try to sign in.

At this point, you could see that the email addresses on both accounts match and correlate that the two identities must be the same user, but there are two problems:

1. You don’t know for sure that they’re the same identity. I.e. it could be that GitHub doesn’t actually verify email addresses, meaning an Evil User could have signed up on GitHub using User X’s email address.
2. Even if you know the Google and GitHub accounts represent the same identity, you don’t know that User X is the one to have triggered this sign-in. It could be that Evil User compromised User X’s GitHub account. In that case, Evil User could just be signing in on their own computer with the compromised account, and they shouldn’t get access to User X’s entire account on your site as a result.

The solution for both is to verify that the owner of the GitHub account also has access to one of the identity providers already on the account. For example, in this case, User X would have to re-verify their Google account, that way it proves that whoever this is, they have access to both the GitHub and Google identities.
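Here’s a minimal sketch of that rule, assuming a hypothetical Account type that tracks which providers are already linked, plus a list of providers that the person has freshly signed in with during the current session. None of these names come from a real library.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    user_id: str
    # Providers already linked to this account, e.g. {"google"}.
    linked_providers: set = field(default_factory=set)


def try_link_provider(account, new_provider, freshly_verified_providers):
    """Link new_provider only if the person also just proved that they
    control a provider that is already on the account."""
    if new_provider in account.linked_providers:
        return True  # nothing to do; it was linked earlier
    if account.linked_providers & set(freshly_verified_providers):
        account.linked_providers.add(new_provider)
        return True
    return False  # a matching email alone is not enough


account = Account(user_id="abcd-efgh-ijkl", linked_providers={"google"})
first_attempt = try_link_provider(account, "github", [])           # only signed in with GitHub
second_attempt = try_link_provider(account, "github", ["google"])  # also re-verified Google
```

The first attempt is refused because nothing ties the GitHub sign-in to the existing account besides the email address; the second succeeds because the same session also proved access to Google.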

There’s still a small issue here if an identity provider doesn’t verify email addresses: data leakage. Suppose GitHub doesn’t verify email addresses, so Evil User signs up as userX@example.com. Now, Evil User signs into your site, and it detects that User X already has three identities on this site: Twitter, Facebook, and Microsoft. Evil User potentially learned information about User X.

Honestly, I think the solution to this is just to ensure that your identity providers require email verification. Some sites like GitHub outright tell you that they do, but for other ones, just try to authenticate without having verified your email to see if it’s possible.
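For providers that speak OpenID Connect, the ID token can carry the standard “email” and “email_verified” claims, which gives you something to check in code instead of relying only on manual testing. A small sketch, assuming you’ve already validated the token and have its claims as a dict, and that the provider sends email_verified as a boolean (not every provider fills it in); the function name is hypothetical.

```python
from typing import Optional


def trusted_email(claims: dict) -> Optional[str]:
    """Return the email only when the provider asserts that it verified it."""
    email = claims.get("email")
    # Treat a missing or non-True email_verified claim as unverified.
    if email and claims.get("email_verified") is True:
        return email
    return None
```

If this returns None, I’d treat the identity as having no email at all rather than using the address for any kind of matching.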

Finally, note that a user may have a different email address for each identity that they provide (or even something like “user+twitter@example.com” and “user+google@example.com”), so you can’t just associate a single email address with each user. Users can also change their email addresses. Some services (like Twitter) don’t even necessarily supply an email address in all cases, so you may want to have a fall-back or just outright deny those users.
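One way to cope with all of that is to key each identity on the provider plus the provider’s own stable user ID, and to treat the email as optional metadata. This is just a sketch with hypothetical names (Identity, IdentityStore), using an in-memory dict where a real app would use a database table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identity:
    provider: str                # e.g. "google" or "twitter"
    subject: str                 # the provider's stable ID for this user
    email: Optional[str] = None  # may be missing, and may change over time


class IdentityStore:
    def __init__(self):
        # (provider, subject) -> your own user ID
        self._user_by_identity = {}

    def link(self, user_id: str, identity: Identity) -> None:
        self._user_by_identity[(identity.provider, identity.subject)] = user_id

    def find_user(self, identity: Identity) -> Optional[str]:
        # The email is deliberately not part of the lookup key.
        return self._user_by_identity.get((identity.provider, identity.subject))
```

With this shape, a user who changes their email address at the provider, or whose provider never supplied one, still resolves to the same account on your side.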