KEMBAR78
Add APIs to separate norm calculation and gradient scaling in `nn.utils.clip_grad_norm_` by mikaylagawarecki · Pull Request #139662 · pytorch/pytorch · GitHub
Skip to content

Conversation

@mikaylagawarecki
Copy link
Contributor

@mikaylagawarecki mikaylagawarecki commented Nov 4, 2024

Fixes #139467

Refactor nn.utils.clip_grad_norm_ into nn.utils.get_total_norm and then nn.utils.clip_grads_with_norm_ . clip_grad_norm_ now calls into these two new ops,

get_total_norm is generalized (rather than get_grad_norm due to the discussion on the issue from @awgu)

Stack from ghstack (oldest at bottom):

cc @wconstab @zijian-hu

@pytorch-bot
Copy link

pytorch-bot bot commented Nov 4, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139662

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 890bdcd with merge base 6bdbc86 (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

mikaylagawarecki added a commit that referenced this pull request Nov 4, 2024
@mikaylagawarecki mikaylagawarecki added release notes: nn release notes category topic: improvements topic category labels Nov 4, 2024
…rad_norm_`"

Fixes #139467

Add `nn.utils.get_grad_norm` and `nn.utils.clip_grads_`




[ghstack-poisoned]
…rad_norm_`"

Fixes #139467

Add `nn.utils.get_grad_norm` and `nn.utils.scale_grads_`

Chose `scale_grads_` instead of `clip_grads` as there is also `clip_grad_value` which does clamping, so `clip_grads_` did not feel representative




[ghstack-poisoned]
@mikaylagawarecki mikaylagawarecki changed the title Add APIs to separate norm and clipping in nn.utils.clip_grad_norm_ Add APIs to separate norm and scaling in nn.utils.clip_grad_norm_ Nov 4, 2024
@mikaylagawarecki mikaylagawarecki changed the title Add APIs to separate norm and scaling in nn.utils.clip_grad_norm_ Add APIs to separate norm calculation and gradient scaling in nn.utils.clip_grad_norm_ Nov 4, 2024
… in `nn.utils.clip_grad_norm_`"

Fixes #139467

Add `nn.utils.get_grad_norm` and `nn.utils.scale_grads_`

Chose `scale_grads_` instead of `clip_grads` as there is also `clip_grad_value` which does clamping, so `clip_grads_` did not feel representative




[ghstack-poisoned]
… in `nn.utils.clip_grad_norm_`"

Fixes #139467

Refactor `nn.utils.clip_grad_norm_` into `nn.utils.get_grad_norm` and then `nn.utils.scale_grads_` .

Chose `scale_grads_` instead of `clip_grads` as there is also `clip_grad_value` which does clamping, so `clip_grads_` did not feel representative




[ghstack-poisoned]
… in `nn.utils.clip_grad_norm_`"

Fixes #139467

Refactor `nn.utils.clip_grad_norm_` into `nn.utils.get_grad_norm` and then `nn.utils.scale_grads_` . Since `clip_grad_norm_` calls into these two new ops, the prior testing applies.

Chose `scale_grads_` instead of `clip_grads` as there is also `clip_grad_value` which does clamping, so `clip_grads_` did not feel representative


cc wconstab zijian-hu


[ghstack-poisoned]
mikaylagawarecki added a commit that referenced this pull request Nov 4, 2024
@mikaylagawarecki mikaylagawarecki marked this pull request as draft November 4, 2024 23:07
… in `nn.utils.clip_grad_norm_`"

[WIP] Going to change `get_grad_norm` to `get_total_norm` so it can be used with any list of Tensors

Fixes #139467

Refactor `nn.utils.clip_grad_norm_` into `nn.utils.get_grad_norm` and then `nn.utils.scale_grads_` . Since `clip_grad_norm_` calls into these two new ops, the prior testing applies.

Chose `scale_grads_` instead of `clip_grads` as there is also `clip_grad_value` which does clamping, so `clip_grads_` did not feel representative


cc wconstab zijian-hu


[ghstack-poisoned]
mikaylagawarecki added a commit that referenced this pull request Nov 5, 2024
… in `nn.utils.clip_grad_norm_`"

[WIP] Going to change `get_grad_norm` to `get_total_norm` so it can be used with any list of Tensors

Fixes #139467

Refactor `nn.utils.clip_grad_norm_` into `nn.utils.get_grad_norm` and then `nn.utils.scale_grads_` . Since `clip_grad_norm_` calls into these two new ops, the prior testing applies.

Chose `scale_grads_` instead of `clip_grads` as there is also `clip_grad_value` which does clamping, so `clip_grads_` did not feel representative


cc wconstab zijian-hu


[ghstack-poisoned]
@mikaylagawarecki mikaylagawarecki marked this pull request as ready for review November 6, 2024 21:46
… in `nn.utils.clip_grad_norm_`"

Fixes #139467

Refactor `nn.utils.clip_grad_norm_` into `nn.utils.get_total_norm` and then `nn.utils.scale_grads_` . `clip_grad_norm_` now calls into these two new ops,

`get_total_norm` is generalized (rather than `get_grad_norm` due to the discussion on the issue from awgu)


cc wconstab zijian-hu


[ghstack-poisoned]
… in `nn.utils.clip_grad_norm_`"

Fixes #139467

Refactor `nn.utils.clip_grad_norm_` into `nn.utils.get_total_norm` and then `nn.utils.clip_grads_with_norm_` . `clip_grad_norm_` now calls into these two new ops,

`get_total_norm` is generalized (rather than `get_grad_norm` due to the discussion on the issue from awgu)


cc wconstab zijian-hu


[ghstack-poisoned]
mikaylagawarecki added a commit that referenced this pull request Nov 6, 2024
Copy link
Member

@H-Huang H-Huang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

from . import parametrizations, rnn, stateless
from .clip_grad import clip_grad_norm, clip_grad_norm_, clip_grad_value_
from .clip_grad import (
_clip_grads_with_norm_ as clip_grads_with_norm_,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious, is there a particular reason the APIs are prepended with _ if they are going to be public anyways?

Copy link
Contributor Author

@mikaylagawarecki mikaylagawarecki Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to avoid the public API docs test complaining that torch.nn.utils.clip_grad.{foo} is not documented (when it is already documented and publicly exposed as torch.nn.utils.{foo})

@mikaylagawarecki
Copy link
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 7, 2024
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: linux-binary-manywheel / manywheel-py3_9-cuda11_8-test / test

Details for Dev Infra team Raised by workflow job

@mikaylagawarecki
Copy link
Contributor Author

mikaylagawarecki commented Nov 7, 2024

Could not find a version that satisfies the requirement jinja2 (from torch) (from versions: none) in linux-binary-manywheel / manywheel-py3_9-cuda11_8-test / test (gh) is unrelated

@mikaylagawarecki
Copy link
Contributor Author

@pytorchbot merge -i

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: linux-binary-manywheel / manywheel-py3_9-cuda11_8-test / test

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
…ls.clip_grad_norm_` (pytorch#139662)

Fixes pytorch#139467

Refactor `nn.utils.clip_grad_norm_` into `nn.utils.get_total_norm` and then `nn.utils.clip_grads_with_norm_` . `clip_grad_norm_` now calls into these two new ops,

`get_total_norm` is generalized (rather than `get_grad_norm` due to the discussion on the issue from @awgu)

Pull Request resolved: pytorch#139662
Approved by: https://github.com/H-Huang
@github-actions github-actions bot deleted the gh/mikaylagawarecki/285/head branch December 8, 2024 02:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: nn release notes category topic: improvements topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants