KEMBAR78
crates.io token scopes by pietroalbini · Pull Request #2947 · rust-lang/rfcs · GitHub
Skip to content

Conversation

@pietroalbini
Copy link
Member

@pietroalbini pietroalbini commented Jun 24, 2020

This RFC proposes implementing scopes for crates.io tokens, allowing users to choose which endpoints the token is allowed to call and which crates it's allowed to affect.

Rendered

@pietroalbini pietroalbini added A-security Security related proposals & ideas T-crates-io Relevant to the crates.io team, which will review and decide on the RFC. labels Jun 24, 2020
Co-authored-by: XAMPPRocky <4464295+XAMPPRocky@users.noreply.github.com>
@kornelski
Copy link
Contributor

kornelski commented Jun 24, 2020

For @rust-bus I would love to have ability to add a co-owner of a crate who does not have permission to kick out other owners. This is how crates-io currently makes github team accounts behave, but teams cannot be invited (and conversely it would also make sense for a github team to be able to fully own a crate with all permissions).

Tying change-owners permission tied only to a token is insufficient for the @rust-bus-owner case, because people co-own crates using accounts, not tokens.

@pietroalbini
Copy link
Member Author

For @rust-bus I would love to have ability to add a co-owner of a crate who does not have permission to kick out other owners. This is how crates-io currently makes github team accounts behave, but teams cannot be invited (and conversely it would also make sense for a github team to be able to fully own a crate with all permissions).

Tying change-owners permission tied only to a token is insufficient for the @rust-bus-owner case, because people co-own crates using accounts, not tokens.

@kornelski I feel like user permissions are out of scope for this RFC: my goal here is mostly to improve the security for automated processes or CI environments.

I don't think there will be conflics between token scopes and user roles/permissions, as token scopes are an additional restriction on top of the normal authorization process. With this RFC I will be able to create a token with the crates scope of serde, but that doesn't mean I will be able to actually publish that crate unless I'm invited as an owner.

@kornelski
Copy link
Contributor

kornelski commented Jun 24, 2020

For CI publishing, adding limits to tokens reduces damage they can do, but I still don't feel safe about using them in CI, because scopes doesn't do anything to prevent tokens from being stolen and misused within their scope.

There's no downside to having scopes as an option, but I feel they're not enough for making CI secure. If you were to add only one thing, I'd prefer a second factor auth for publishing, rather than scopes.

For example, I'd like ability to only upload a new package from CI, which would not be published until I confirm that in some more secure way. It could be as simple as sending an e-mail to crate owners with a confirmation link that has to be clicked to make the new version public. That would give a chance to prevent damage from stolen token (not just limit scope of the damage), and make misuse visible.

@joshtriplett
Copy link
Member

Should there be separate scopes for publishing a new crate and a version of an existing crate, instead of the single publish scope?

Yes, please. I'd like to have that separation, as a self-protection measure.

@pietroalbini pietroalbini force-pushed the crates-io-token-scopes branch from c88187e to 905a789 Compare June 25, 2020 08:25
@pietroalbini
Copy link
Member Author

@lu-zero @joshtriplett sounds good, changed the text to split publish in publish-new and publish-existing.

@smarnach
Copy link

I second the sentiment that we will need a permissions model for crate owners as well in the long run. While I appreciate that this model is out of scope for this RFC, I think it would be worthwhile to think about how the token scopes can be generalized to user permissions in the long run, to avoid having two different models for closely related purposes, and to reduce the overall implementation effort and codebase complexity in crates.io. I don't see any problem with the approach in this regard, so I'm not asking for any specific changes.

It could be as simple as sending an e-mail to crate owners with a confirmation link that has to be clicked to make the new version public.

This would definitely be great to have. I think it could be implemented as an extension to the scopes model in this RFC, by simply adding a boolean flag that any action taken via this token requires email confirmation. Maybe we leave this as a follow-up as well, rather than including it in this RFC?

@pietroalbini
Copy link
Member Author

@kornelski

For CI publishing, adding limits to tokens reduces damage they can do, but I still don't feel safe about using them in CI, because scopes doesn't do anything to prevent tokens from being stolen and misused within their scope.

There's no downside to having scopes as an option, but I feel they're not enough for making CI secure. If you were to add only one thing, I'd prefer a second factor auth for publishing, rather than scopes.

Of course scopes aren't the only thing that will improve the security of the crates.io ecosystem, but they will considerably improve the security for large projects and organizations, and I think are worth pursuing in the near term. For example, in the rust-lang org we have a bunch of repositories and crates we might want to setup publish automation for, but to get a token we have to either:

  • Create a GitHub account for each token we need to create, and add that GitHub account as owner. This has the downsides of allowing a compromise to transfer the crate away (instead of just publishing bad versions we can yank), and you might run into GitHub's abuse detection systems if you create a bunch of accounts.
  • Use the token of a project member's personal account, which will become a problem when the person leaves the project and extends a possible compromise to crates unrelated to the org.

With token scopes we could have a central bot account owner of all the rust-lang crates, who also owns all the tokens used around the organization. This would reduce maintenance and make security response easier in case of a breach (as we don't have to chase who owns which token).

For example, I'd like ability to only upload a new package from CI, which would not be published until I confirm that in some more secure way. It could be as simple as sending an e-mail to crate owners with a confirmation link that has to be clicked to make the new version public. That would give a chance to prevent damage from stolen token (not just limit scope of the damage), and make misuse visible.

There is already work underway to send email notifications every time a new version is published: the backend and frontend is done, and the only thing left is actually sending the emails. While this is not exactly what you propose, it would still allow crate authors to notice misuses and revoke the tokens / yank the crates.

Second factor for publishes might be implemented on top of that feature, and I don't think it's worth blocking this RFC on it.

* **`^`** is added at the start of the pattern, and **`$`** is added at the end of it.
* **`,`** is desugared into `|`, separating multiple patterns.
* **`*`** is desugared into `.+`, matching one or more characters greedily.
* All other characters are quoted to prevent them from having a special meaning.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative would be to allow only characters that are allowed in crate names in addition to , and *. Since none of the characters in crate names have a special meaning in regular expressions, and , and * are not allowed in crate names, this would remove any ambiguities.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. I'm not sure changing this matters that much though.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is more of an implementation commment – what I suggested is easier to translate into code, but it doesn't really matter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the example below, I think "All other characters" should be changed to "All other non-alphanumeric characters". In practice, I believe - and _ are the only characters that matter right now (and _ matches a literal '_' character anyway). I think it makes sense to reject a crate scope if it contains unexpected characters, but that's an implementation detail.

[unresolved-questions]: #unresolved-questions

* Are there more scopes that would be useful to implement from the start?
* Should crate scopes be allowed on tokens with the legacy endpoint scope?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, legacy scopes should disappear if possible. I know it's not possible, but let's not add new features to them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't think it's good to artificially restrict features, but I personally don't care much about it either way.

Copy link
Member

@jtgeibel jtgeibel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've provided some comments inline. The overall theme is some changes to how we handle legacy endpoints. I've gone through all the authenticated endpoints and have some notes about the existing behavior and suggestions on what we should change. However, I think that discussion is orthogonal to this RFC, in that making existing endpoints cookie-only isn't a breaking change (per this RFC alone) unless the endpoint has been declared to be covered by a scope. Decisions we make about access to legacy endpoints (now or in the future) aren't and shouldn't be covered by this RFC.

To avoid distracting this RFC with details about legacy endpoints, I plan to open a PR with my proposed changes and the specifics can be discussed there. My proposed approach is that:

  • We independently discuss removing all legacy token access. It is unlikely that there are any existing users and token access to many of these endpoints appears to be accidental. We should structure our internal API to favor cookie authentication, with an explicit scope check required when allowing token access.
  • Meanwhile, the references to legacy in this RFC can be simplified. We add an opt-in legacy scope, with the intention that during the implementation stage we shrink this scope by either defining new scopes or removing the access altogether. The goal is that the concept of legacy goes away before we merge the implementation. If it doesn't go away entirely, then this scope is exempt from the backwards comparability guarantee. We can remove endpoints from the legacy scope at any time.


Tokens created before the implementation of this RFC won't have a crates scope,
and it will be possible to use a crates scope in a token with the legacy
endpoint scope.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I follow this correctly, I think it may be clearer to phrase this as:

Tokens created before the implementation of this RFC will default to an empty crate scope filter (equivalent to no restrictions). It will be possible to change the crate scope for existing tokens, allowing a crate scope to be added even to legacy tokens.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first sentence is spot on! I'm not so sure if I want to commit to having a "edit token" functionality in the RFC though.

* **`^`** is added at the start of the pattern, and **`$`** is added at the end of it.
* **`,`** is desugared into `|`, separating multiple patterns.
* **`*`** is desugared into `.+`, matching one or more characters greedily.
* All other characters are quoted to prevent them from having a special meaning.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the example below, I think "All other characters" should be changed to "All other non-alphanumeric characters". In practice, I believe - and _ are the only characters that matter right now (and _ matches a literal '_' character anyway). I think it makes sense to reject a crate scope if it contains unexpected characters, but that's an implementation detail.

There will also be an option to opt out of endpoint scopes and retain the
permission model implemented before this RFC (called "legacy"), which allows
access to all (documented and undocumented) crates.io API endpoints except for
adding new tokens.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose that this be replaced by a legacy scope.

* **publish-update**: allows publishing a new version for existing crates the
user owns
* **yank**: allows yanking and unyanking existing versions of the user's crates
* **change-owners**: allows inviting new owners or removing existing owners
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose an additional legacy scope. Removing an endpoint from this scope is not considered a breaking change. During the implementation stage, endpoints can be promoted to their own scope or make cookie-only. Ideally, this scope goes away entirely before landing in production.

legacy permission model.

Tokens created before the implementation of this RFC will use the legacy
permission model.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During the deployment, if the legacy scope remains, then I think we should migrate existing tokens to only include the default scopes required by cargo. If there are any token-based users of legacy endpoints, then I think they should have to explicitly opt-in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this would break everyone using the token to authenticate to endpoints not used by Cargo. Even though we're not providing stability guarantees for them I'd be wary of blindly breaking them. At least I'd like some stats on which endpoints are accessed with the cookie and which are used with tokens.

@jsha
Copy link

jsha commented Jul 12, 2020

Great proposal. I think defining token scopes more tightly will be really valuable. A couple of thoughts:

It seems like the main use case is defining low-privilege tokens for use in CI. It seems like those tokens will only want "publish-update." Can the proposal be simplified into two scopes: "publish-update" and "manage" (which would include what's currently called "publish-new", "yank", and "change-owners")? It's much easier to add more granular scopes later than it is to remove scopes if they turn out to be non-useful.

Instead of treating the regex-ish minilanguage as part of the security properties, I would recommend modelling the relationship between token and crates explicitly - for instance with a 1:many table of token IDs to crate IDs. Using regex to enforce security properties is often a source of bugs, and so are ad-hoc syntaxes. I think it's possible to achieve similar ergonomics wins by implementing a pattern matcher in the JS frontend, and using that to quickly select groups of crates. The disadvantage would be that a user who creates a new crate after creating their scope-limited token would need to explicitly add that crate to the token; but I think that is actually a bit of an advantage, in that it makes the permission explicit. And if it turns out to be highly burdensome for users, it's possible to add automation after the fact that achieves the same goal.

@pietroalbini
Copy link
Member Author

It seems like the main use case is defining low-privilege tokens for use in CI. It seems like those tokens will only want "publish-update." Can the proposal be simplified into two scopes: "publish-update" and "manage" (which would include what's currently called "publish-new", "yank", and "change-owners")? It's much easier to add more granular scopes later than it is to remove scopes if they turn out to be non-useful.

I think publish-new is also something that would be useful in CI, especially for big projects or monorepos, otherwise when a new crate is added a person would have to publish it manually. As others pointed out it's useful to split publish-new and publish-update as not everyone wants this feature, but I can see it being used (examples of such projects are rustc-auto-publish which will need to publish new crates when they're added to rustc, or rusoto which contains more than 100 auto-generated crates).

I agree with you that yank and change-owners are way less likely to be used on CI, and I could see them being merged into a manage scope. I'm wondering though if from a security point of view it makes sense to merge them: yanking is a reversible action and it can be easily rolled back if a malicious actor took control over the token, but changing ownership will led to you losing control over the crate if someone abused the token.

I could see myself using a token without the change-owners scope on my workstation, and generating-then-revoking a temporary one if I happened to transfer a crate.


Instead of treating the regex-ish minilanguage as part of the security properties, I would recommend modelling the relationship between token and crates explicitly - for instance with a 1:many table of token IDs to crate IDs. Using regex to enforce security properties is often a source of bugs, and so are ad-hoc syntaxes. I think it's possible to achieve similar ergonomics wins by implementing a pattern matcher in the JS frontend, and using that to quickly select groups of crates. The disadvantage would be that a user who creates a new crate after creating their scope-limited token would need to explicitly add that crate to the token; but I think that is actually a bit of an advantage, in that it makes the permission explicit. And if it turns out to be highly burdensome for users, it's possible to add automation after the fact that achieves the same goal.

Yep, that was the other possible implementation I thought of for this. Your proposal makes complete sense, but it's indeed burdersome for large projects. The thing that prompted me to write this RFC was creating a token to allow the rust-lang/chalk repo to publish their multiple crates from CI: being able to just allow chalk-* would be way better than having the Chalk developers ping someone on infra every time they add a new crate.

I'm wondering if a middle ground could be to still have users input the pattern, but also have a off-by-default "Allow future crates matching this pattern" checkbox. If the box isn't checked the backend will resolve the pattern and replace it with the list of matched crates without any wildcard. For example, chalk-* would result in the following pattern if the box was not checked:

chalk-derive,chalk-engine,chalk-ir,chalk-recursive,chalk-solve

@jsha
Copy link

jsha commented Jul 19, 2020

Thinking about this some more, it feels like this can be simplified further. The proposal aims to express two different, but related, concepts:

  1. Authorization for a specific subset of crates, and
  2. Authorization for a specific subset of actions.

If we can split up those concepts, I believe we'll wind up with a solution that is simpler and more orthogonal.

How about using teams to express (1)?

Proposal: Members of a team should be able to create a "team API token" that is authorized for the set of crates that team is authorized for. Additionally, for (2), all tokens (user and team) should have a scope that allows "manage" or "publish" access, but tokens are not scoped by crate name - that role would be played by teams.

This has a few advantages. First, crates already has the notion of grouping packages by authorization (teams), so we wouldn't need to develop a parallel system for managing tokens. Also, the system for teams allows adding and removing crates from a team's ownership, which solves the "allow future crates" issue we've been discussing. As crates are added to a team, that team's token would automatically have authorization for them.

Second, this allows management of the tokens as a team. If there is a compromise of a token in CI, anyone on the team could revoke the team token, rather than having to wake up whoever created the CI token on their individual user.

Third, this makes it easier to avoid a potentially subtle CI security problem. In both GitHub and Travis CI, anyone with write access to the repo can read out the secrets. For a single repo, that's no big deal. Someone with write access could just use that access to publish a malicious version anyhow. However, if crates A and B are part of the same project but are developed in separate repos, these can be out of sync. Imagine that crates A and B are using the same API token to publish from CI. Someone who has write access on crates A's repo can extract the token from CI and use it to publish new versions of crate B. Under my proposal, we can give straightforward advice to avoid this problem: "When using a team token to publish from CI, make sure only that team has write access to the relevant repos."

Some currently-existing teams might not express the exact set of crate ownerships that a given group of crates wants, but team creation is free.

How do this work for an individual user who wants to publish from CI, and also silo off groups of their personally-maintained crates? They can create an organization in GitHub (also free) and then make teams under that organization.

I agree with you that yank and change-owners are way less likely to be used on CI, and I could see them being merged into a manage scope. I'm wondering though if from a security point of view it makes sense to merge them: yanking is a reversible action and it can be easily rolled back if a malicious actor took control over the token

This is a good point! However, splitting these into separate scopes only makes sense if you expect people to generate low privilege yank-only tokens. I can't think of a scenario where that's useful.

being able to just allow chalk-* would be way better than having the Chalk developers ping someone on infra every time they add a new crate.

I didn't fully understand this sentence. Why do Chalk developers need to ping the infra team when they add a new crate?

I could see myself using a token without the change-owners scope on my workstation, and generating-then-revoking a temporary one if I happened to transfer a crate.

Alternately you could use the "Manage Owners" UI on the website version for this purpose. If the website gained UI for yank / unyank, people could then use publish-only tokens on their workstations and wouldn't need such fine-grained scopes.

@passcod
Copy link

passcod commented Jul 19, 2020

This is a good point! However, splitting these into separate scopes only makes sense if you expect people to generate low privilege yank-only tokens. I can't think of a scenario where that's useful.

I would be pretty annoyed if yanking wouldn't be separate, because then I couldn't give out full "manage releases" permissions on a crate without also potentially giving up ownership on that crate. Being able to give out (temporary) release + release management rights is something I'd love to be able to do natively here, instead of proxying actions through a "bot" that manages those fine-grained permissions.

We might want to discuss the default endpoint scope separately. It could be beneficial to not select anything by default, or just `publish-update` as a default for CI systems.
This is somewhat unrelated to the previous sentence and deserves its own paragraph
@Turbo87
Copy link
Member

Turbo87 commented Oct 21, 2022

we've just discussed this RFC in the weekly crates.io team and decided to move this to final-comment-period. last chance everyone, if you have any major objections then please let us know!

@rfcbot fcp merge

@rfcbot
Copy link

rfcbot commented Oct 21, 2022

Team member @Turbo87 has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Oct 21, 2022
@rfcbot
Copy link

rfcbot commented Oct 28, 2022

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this RFC. to-announce and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Nov 7, 2022
@rfcbot
Copy link

rfcbot commented Nov 7, 2022

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@Turbo87
Copy link
Member

Turbo87 commented Nov 8, 2022

The @rust-lang/crates-io team has decided to accept this RFC.

To track further discussion, subscribe to the tracking issue here: rust-lang/crates.io#5443

meegta

This comment was marked as spam.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-security Security related proposals & ideas disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this RFC. T-crates-io Relevant to the crates.io team, which will review and decide on the RFC. to-announce

Projects

None yet

Development

Successfully merging this pull request may close these issues.