KEMBAR78
[BE][SparseAdam] cleaner way to verify no sparse params by janeyx99 · Pull Request #114425 · pytorch/pytorch · GitHub
Skip to content

Conversation

@janeyx99
Copy link
Contributor

@janeyx99 janeyx99 commented Nov 22, 2023

Deprecation

As of this PR, SparseAdam will become consistent with the rest of our optimizers in that it will only accept containers of Tensors/Parameters/param groups. Hitherto, the SparseAdam constructor had accidentally allow raw tensors as the params argument to the constructor. Now, if you write the following code, there will be a warning: "Passing in a raw Tensor as params to SparseAdam is deprecated. In the future, this will raise an error. Please wrap your Tensor in an iterable instead."

import torch
param = torch.rand(16, 32)
optimizer = torch.optim.SparseAdam(param)

Instead you should replace the last line with

optimizer = torch.optim.SparseAdam([param])

to avoid the warning.

Context:

#47724 fixed the problem that SparseAdam could not handle generators by using the list(...) construct. However, this meant that SparseAdam deviated from other optimizers in that it could accept a raw Tensors/Parameter vs requiring a container of them. This is not really a big deal.

So why this PR?

I do think this PR is cleaner. It uses the fact that the Optimizer parent class already containerizes parameters into parameter groups, so we could reuse that here by calling super().__init__ first and then filter the param_groups after. This change would also make SparseAdam consistent with the rest of our optimizers in that only containerized params are accepted, which technically is BC breaking SO I've added a deprecation warning that we should remove in May 2024.

(But is it really BC breaking when we've said in the docs that params should be an iterable this whole time? Maybe this is just a bug fix....😛)

Stack from ghstack (oldest at bottom):

@pytorch-bot
Copy link

pytorch-bot bot commented Nov 22, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/114425

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3e061a6 with merge base 4a4c9fb (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@drisspg
Copy link
Contributor

drisspg commented Nov 27, 2023

Channeling my inner @albanD here but how sure that this BC break isn't a big deal?

@janeyx99
Copy link
Contributor Author

janeyx99 commented Nov 27, 2023

Channeling my inner @albanD here but how sure that this BC break isn't a big deal?

haha, maybe 95% sure. two main observations (but my position here is movable):

  1. A survey of the first 5 pages of https://github.com/search?q=SparseAdam%28&type=code&p=2 yields that people who use SparseAdam actually tended the other way of list-ifying their params (due to the bug mentioned in Fix generator exhaustion in SparseAdam #47724). Empirically I did not spot a single place where a singular tensor is passed in.
  2. Generally, in ML, the BC breaking case would be when someone optimizes 1 tensor, which I expect to be rare.

I suppose a decent test could be importing this PR internally and seeing if anything breaks?

Copy link
Contributor

@drisspg drisspg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, BC concerns on you lol

Context:

#47724 fixed the problem that SparseAdam could not handle generators by using the `list(...)` construct. However, this meant that SparseAdam deviated from other optimizers in that it could _accept_ a raw Tensors/Parameter vs requiring a container of them. This is not really a big deal.

So why this PR?

I do think this PR is cleaner. It uses the fact that the Optimizer parent class already containerizes parameters into parameter groups, so we could reuse that here by calling `super().__init__` first and then filter the param_groups after. This change would also make SparseAdam consistent with the rest of our optimizers in that only containerized params are accepted, which technically is BC breaking but would be minor, I believe.




[ghstack-poisoned]
Context:

#47724 fixed the problem that SparseAdam could not handle generators by using the `list(...)` construct. However, this meant that SparseAdam deviated from other optimizers in that it could _accept_ a raw Tensors/Parameter vs requiring a container of them. This is not really a big deal.

So why this PR?

I do think this PR is cleaner. It uses the fact that the Optimizer parent class already containerizes parameters into parameter groups, so we could reuse that here by calling `super().__init__` first and then filter the param_groups after. This change would also make SparseAdam consistent with the rest of our optimizers in that only containerized params are accepted, which technically is BC breaking but would be minor, I believe.




[ghstack-poisoned]
raise TypeError("params argument given to the optimizer should be "
"an iterable of Tensors or dicts, but got " +
torch.typename(params))
if self.__class__.__name__ == 'SparseAdam':
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

happy to use a different method than looking into the dunders here, just didn't know of another way

@janeyx99
Copy link
Contributor Author

Imported diff was all green, but due to the very very slight chance that someone may have used SparseAdam incorrectly between the last landed PR and now, I decided to be safe and add a deprecation warning.

if self.__class__.__name__ == 'SparseAdam':
warnings.warn(("Passing in a raw Tensor is deprecated. In the future, "
"this will raise an error. Please wrap your Tensor in "
"an iterable instead."), UserWarning)
Copy link
Contributor

@mikaylagawarecki mikaylagawarecki Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
"an iterable instead."), UserWarning)
"an iterable instead."), FutureWarning)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference here?

Copy link
Contributor

@mikaylagawarecki mikaylagawarecki Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FutureWarning is a warning message that warns you about deprecated features that will be removed in a future version of a package.

I don't think we are that consistent about using FutureWarning for deprecations, but I suppose it makes sense to do this in case users want to explicitly suppress/allow certain warnings

"an iterable of Tensors or dicts, but got " +
torch.typename(params))
if self.__class__.__name__ == 'SparseAdam':
warnings.warn(("Passing in a raw Tensor is deprecated. In the future, "
Copy link
Contributor

@mikaylagawarecki mikaylagawarecki Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
warnings.warn(("Passing in a raw Tensor is deprecated. In the future, "
warnings.warn(("Passing in a raw Tensor as ``params`` to SparseAdam is deprecated. In the future, "

@mikaylagawarecki mikaylagawarecki added the topic: bc breaking topic category label Nov 28, 2023
Context:

#47724 fixed the problem that SparseAdam could not handle generators by using the `list(...)` construct. However, this meant that SparseAdam deviated from other optimizers in that it could _accept_ a raw Tensors/Parameter vs requiring a container of them. This is not really a big deal.

So why this PR?

I do think this PR is cleaner. It uses the fact that the Optimizer parent class already containerizes parameters into parameter groups, so we could reuse that here by calling `super().__init__` first and then filter the param_groups after. This change would also make SparseAdam consistent with the rest of our optimizers in that only containerized params are accepted, which technically is BC breaking SO I've added a deprecation warning that we should remove in May 2024.

(But is it really BC breaking when we've said in the docs that params should be an iterable this whole time? Maybe this is just a bug fix....😛)




[ghstack-poisoned]
@janeyx99
Copy link
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 29, 2023
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@janeyx99 janeyx99 added the topic: deprecation topic category label Jan 19, 2024
pytorchmergebot pushed a commit that referenced this pull request May 25, 2024
This continues the full deprecation after #114425. It's been 6 months! And I'm fairly certain no one is going to yell at me as this patch is not really used.

------

# BC Breaking note

As of this PR, SparseAdam will become consistent with the rest of our optimizers in that it will only accept containers of Tensors/Parameters/param groups and fully complete deprecation of this path. Hitherto, the SparseAdam constructor had allowed raw tensors as the params argument to the constructor. Now, if you write the following code, there will be an error similar to every other optim: "params argument given to the optimizer should be an iterable of Tensors or dicts"

```
import torch
param = torch.rand(16, 32)
optimizer = torch.optim.SparseAdam(param)
```

Instead you should replace the last line with
```
optimizer = torch.optim.SparseAdam([param])
```
to no longer error.

Pull Request resolved: #127081
Approved by: https://github.com/soulitzer
titaiwangms pushed a commit to titaiwangms/pytorch that referenced this pull request May 28, 2024
This continues the full deprecation after pytorch#114425. It's been 6 months! And I'm fairly certain no one is going to yell at me as this patch is not really used.

------

# BC Breaking note

As of this PR, SparseAdam will become consistent with the rest of our optimizers in that it will only accept containers of Tensors/Parameters/param groups and fully complete deprecation of this path. Hitherto, the SparseAdam constructor had allowed raw tensors as the params argument to the constructor. Now, if you write the following code, there will be an error similar to every other optim: "params argument given to the optimizer should be an iterable of Tensors or dicts"

```
import torch
param = torch.rand(16, 32)
optimizer = torch.optim.SparseAdam(param)
```

Instead you should replace the last line with
```
optimizer = torch.optim.SparseAdam([param])
```
to no longer error.

Pull Request resolved: pytorch#127081
Approved by: https://github.com/soulitzer
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: optim topic: deprecation topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants