Add APIs to separate norm calculation and gradient scaling in `nn.utils.clip_grad_norm_` #139662

mikaylagawarecki · 2024-11-04T22:02:12Z

Refactor `nn.utils.clip_grad_norm_` into `nn.utils.get_total_norm` followed by `nn.utils.clip_grads_with_norm_`. `clip_grad_norm_` now calls into these two new ops.

`get_total_norm` is generalized to any list of tensors (rather than a gradient-specific `get_grad_norm`), following the discussion with @awgu on the issue.
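A minimal pure-Python sketch of the split, for illustration only. Each "tensor" is a flat list of floats, and the helpers borrow the new API names but are not the torch implementations. The sketch shows `clip_grad_norm_` as a norm computation followed by a scaling step that receives a precomputed total norm. The `1e-6` epsilon and clamping the coefficient at 1.0 are assumed to mirror `clip_grad_norm_`.

```python
import math
from typing import List, Sequence

# Pure-Python stand-ins: each "tensor" is a flat list of floats.
# They mirror the shape of the split APIs, not the torch implementations.


def get_total_norm(tensors: Sequence[Sequence[float]], norm_type: float = 2.0) -> float:
    """p-norm over all elements of all tensors, as if they were concatenated."""
    if math.isinf(norm_type):
        return max((abs(x) for t in tensors for x in t), default=0.0)
    # The norm of the per-tensor norms equals the norm of the concatenation.
    per_tensor = [sum(abs(x) ** norm_type for x in t) ** (1.0 / norm_type) for t in tensors]
    return sum(n ** norm_type for n in per_tensor) ** (1.0 / norm_type)


def clip_grads_with_norm_(grads: List[List[float]], max_norm: float, total_norm: float) -> None:
    """Scale grads in place using a precomputed total_norm."""
    # Assumed to mirror clip_grad_norm_: small epsilon, coefficient clamped at 1.0.
    clip_coef = min(1.0, max_norm / (total_norm + 1e-6))
    for g in grads:
        for i in range(len(g)):
            g[i] *= clip_coef


def clip_grad_norm_(grads: List[List[float]], max_norm: float, norm_type: float = 2.0) -> float:
    """The one-shot API, expressed as the two steps above."""
    total_norm = get_total_norm(grads, norm_type)
    clip_grads_with_norm_(grads, max_norm, total_norm)
    return total_norm  # norm *before* clipping


grads = [[3.0, 0.0], [0.0, 4.0]]
print(clip_grad_norm_(grads, max_norm=1.0))  # 5.0
```

Splitting the steps lets a caller compute `total_norm` once (or obtain it some other way) and hand it to the scaling step, which is the separation issue #139467 asks for.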

Stack from ghstack (oldest at bottom):

-> Add APIs to separate norm calculation and gradient scaling in nn.utils.clip_grad_norm_ #139662

cc @wconstab @zijian-hu

[ghstack-poisoned]

pytorch-bot · 2024-11-04T22:02:15Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139662

❌ 1 New Failure

As of commit 890bdcd with merge base 6bdbc86:

NEW FAILURE - The following job has failed:

linux-binary-manywheel / manywheel-py3_9-cuda11_8-test / test (gh)
ERROR: Could not find a version that satisfies the requirement jinja2 (from torch) (from versions: none)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: 0a50487 Pull Request resolved: #139662

…rad_norm_`" Fixes #139467 Add `nn.utils.get_grad_norm` and `nn.utils.clip_grads_` [ghstack-poisoned]

…rad_norm_`" Fixes #139467 Add `nn.utils.get_grad_norm` and `nn.utils.scale_grads_` Chose `scale_grads_` instead of `clip_grads` as there is also `clip_grad_value` which does clamping, so `clip_grads_` did not feel representative [ghstack-poisoned]

… in `nn.utils.clip_grad_norm_`" Fixes #139467 Add `nn.utils.get_grad_norm` and `nn.utils.scale_grads_` Chose `scale_grads_` instead of `clip_grads` as there is also `clip_grad_value` which does clamping, so `clip_grads_` did not feel representative [ghstack-poisoned]

… in `nn.utils.clip_grad_norm_`" Fixes #139467 Refactor `nn.utils.clip_grad_norm_` into `nn.utils.get_grad_norm` and then `nn.utils.scale_grads_` . Chose `scale_grads_` instead of `clip_grads` as there is also `clip_grad_value` which does clamping, so `clip_grads_` did not feel representative [ghstack-poisoned]

… in `nn.utils.clip_grad_norm_`" Fixes #139467 Refactor `nn.utils.clip_grad_norm_` into `nn.utils.get_grad_norm` and then `nn.utils.scale_grads_` . Since `clip_grad_norm_` calls into these two new ops, the prior testing applies. Chose `scale_grads_` instead of `clip_grads` as there is also `clip_grad_value` which does clamping, so `clip_grads_` did not feel representative cc wconstab zijian-hu [ghstack-poisoned]

ghstack-source-id: 14df072 Pull Request resolved: #139662
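To illustrate the clamping vs. scaling distinction behind the naming choice above, here is a small pure-Python sketch with hypothetical helpers `clamp_values_` and `scale_to_max_norm_` (not torch functions). Clamping each element, as `clip_grad_value_` does, can change the gradient's direction. Norm-based scaling multiplies every element by the same coefficient and preserves it.

```python
from typing import List


def clamp_values_(grad: List[float], clip_value: float) -> None:
    """Element-wise clamp into [-clip_value, clip_value] (clip_grad_value_-style)."""
    for i, g in enumerate(grad):
        grad[i] = max(-clip_value, min(clip_value, g))


def scale_to_max_norm_(grad: List[float], max_norm: float) -> None:
    """Multiply every element by one coefficient so the 2-norm is <= max_norm."""
    total_norm = sum(g * g for g in grad) ** 0.5
    coef = min(1.0, max_norm / (total_norm + 1e-6))
    for i in range(len(grad)):
        grad[i] *= coef


a = [3.0, 0.5]
clamp_values_(a, 1.0)
print(a)  # [1.0, 0.5]: the ratio went from 6:1 to 2:1, so the direction changed

b = [3.0, 0.5]
scale_to_max_norm_(b, 1.0)
print(b[0] / b[1])  # the ratio stays at (about) 6, so the direction is preserved
```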

… in `nn.utils.clip_grad_norm_`" [WIP] Going to change `get_grad_norm` to `get_total_norm` so it can be used with any list of Tensors Fixes #139467 Refactor `nn.utils.clip_grad_norm_` into `nn.utils.get_grad_norm` and then `nn.utils.scale_grads_` . Since `clip_grad_norm_` calls into these two new ops, the prior testing applies. Chose `scale_grads_` instead of `clip_grads` as there is also `clip_grad_value` which does clamping, so `clip_grads_` did not feel representative cc wconstab zijian-hu [ghstack-poisoned]

ghstack-source-id: a685889 Pull Request resolved: #139662

… in `nn.utils.clip_grad_norm_`" Fixes #139467 Refactor `nn.utils.clip_grad_norm_` into `nn.utils.get_total_norm` and then `nn.utils.scale_grads_` . `clip_grad_norm_` now calls into these two new ops, `get_total_norm` is generalized (rather than `get_grad_norm` due to the discussion on the issue from awgu) cc wconstab zijian-hu [ghstack-poisoned]

… in `nn.utils.clip_grad_norm_`" Fixes #139467 Refactor `nn.utils.clip_grad_norm_` into `nn.utils.get_total_norm` and then `nn.utils.clip_grads_with_norm_` . `clip_grad_norm_` now calls into these two new ops, `get_total_norm` is generalized (rather than `get_grad_norm` due to the discussion on the issue from awgu) cc wconstab zijian-hu [ghstack-poisoned]

ghstack-source-id: 20b8fa9 Pull Request resolved: #139662
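Because `get_total_norm` accepts any list of tensors, it can also be applied to a list of norms that were already computed elsewhere. Below is a pure-Python sketch, assuming one model's gradients are split across two groups (for example, two pipeline stages, the pipeline-parallel case referenced from pytorch/torchtitan#649). For finite p, the p-norm of the per-group p-norms equals the p-norm over all elements; for p = inf, the max of the maxes gives the same result. `p_norm` and `clip_with_norm_` are hypothetical helpers. A real distributed setup would combine the local norms with a collective, which is not shown here.

```python
import math
from typing import List, Sequence


def p_norm(values: Sequence[float], p: float = 2.0) -> float:
    """p-norm of a flat list of floats (p may be math.inf)."""
    if math.isinf(p):
        return max((abs(v) for v in values), default=0.0)
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def clip_with_norm_(grad: List[float], max_norm: float, total_norm: float) -> None:
    """Scale one group's grads using the global norm, not its local one."""
    coef = min(1.0, max_norm / (total_norm + 1e-6))
    for i in range(len(grad)):
        grad[i] *= coef


# Hypothetical split of one model's gradients across two pipeline stages.
stage0 = [1.0, 2.0, 2.0]
stage1 = [4.0, 0.0]

local_norms = [p_norm(stage0), p_norm(stage1)]  # computed independently: 3.0, 4.0
global_norm = p_norm(local_norms)               # "norm of norms"
print(global_norm, p_norm(stage0 + stage1))     # 5.0 5.0

# Every stage scales with the same global norm, so the combined result matches
# clipping all gradients together.
for grad in (stage0, stage1):
    clip_with_norm_(grad, max_norm=1.0, total_norm=global_norm)
```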

H-Huang

Looks good to me!

H-Huang · 2024-11-07T20:04:51Z

torch/nn/utils/__init__.py

 from . import parametrizations, rnn, stateless
-from .clip_grad import clip_grad_norm, clip_grad_norm_, clip_grad_value_
+from .clip_grad import (
+    _clip_grads_with_norm_ as clip_grads_with_norm_,


just curious, is there a particular reason the APIs are prefixed with `_` if they are going to be public anyway?

This is to avoid the public API docs test complaining that `torch.nn.utils.clip_grad.{foo}` is not documented (when it is already documented and publicly exposed as `torch.nn.utils.{foo}`).

mikaylagawarecki · 2024-11-07T20:36:00Z

@pytorchbot merge

pytorchmergebot · 2024-11-07T20:37:49Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

pytorchmergebot · 2024-11-07T22:44:22Z

Merge failed

Reason: 1 jobs have failed, first few of them are: linux-binary-manywheel / manywheel-py3_9-cuda11_8-test / test

mikaylagawarecki · 2024-11-07T23:04:28Z

The failure in linux-binary-manywheel / manywheel-py3_9-cuda11_8-test / test (gh), "Could not find a version that satisfies the requirement jinja2 (from torch) (from versions: none)", is unrelated.

mikaylagawarecki · 2024-11-07T23:04:48Z

@pytorchbot merge -i

pytorchmergebot · 2024-11-07T23:07:31Z

Merge started

Your change will be merged while ignoring the following 1 checks: linux-binary-manywheel / manywheel-py3_9-cuda11_8-test / test

…ls.clip_grad_norm_` (pytorch#139662) Fixes pytorch#139467 Refactor `nn.utils.clip_grad_norm_` into `nn.utils.get_total_norm` and then `nn.utils.clip_grads_with_norm_` . `clip_grad_norm_` now calls into these two new ops, `get_total_norm` is generalized (rather than `get_grad_norm` due to the discussion on the issue from @awgu) Pull Request resolved: pytorch#139662 Approved by: https://github.com/H-Huang

Add APIs to separate norm and clipping in nn.utils.clip_grad_norm_

04dd1ac

[ghstack-poisoned]

mikaylagawarecki added a commit that referenced this pull request Nov 4, 2024

Add APIs to separate norm and clipping in nn.utils.clip_grad_norm_

0965f37

ghstack-source-id: 0a50487 Pull Request resolved: #139662

mikaylagawarecki added release notes: nn release notes category topic: improvements topic category labels Nov 4, 2024

Update on "Add APIs to separate norm and clipping in `nn.utils.clip_g…

5cda2e4

…rad_norm_`" Fixes #139467 Add `nn.utils.get_grad_norm` and `nn.utils.clip_grads_` [ghstack-poisoned]

mikaylagawarecki changed the title ~~Add APIs to separate norm and clipping in nn.utils.clip_grad_norm_~~ Add APIs to separate norm and scaling in nn.utils.clip_grad_norm_ Nov 4, 2024

mikaylagawarecki changed the title ~~Add APIs to separate norm and scaling in nn.utils.clip_grad_norm_~~ Add APIs to separate norm calculation and gradient scaling in nn.utils.clip_grad_norm_ Nov 4, 2024

mikaylagawarecki mentioned this pull request Nov 4, 2024

Seperate grad norm computation from torch.nn.utils.clip_grad_norm_ #139467

Closed

mikaylagawarecki marked this pull request as ready for review November 4, 2024 22:56

mikaylagawarecki requested review from albanD and jbschlosser as code owners November 4, 2024 22:56

mikaylagawarecki added a commit that referenced this pull request Nov 4, 2024

Add APIs to separate norm and clipping in nn.utils.clip_grad_norm_

0465727

ghstack-source-id: 14df072 Pull Request resolved: #139662

mikaylagawarecki marked this pull request as draft November 4, 2024 23:07

mikaylagawarecki added a commit that referenced this pull request Nov 5, 2024

Add APIs to separate norm and clipping in nn.utils.clip_grad_norm_

3a3e021

ghstack-source-id: a685889 Pull Request resolved: #139662

mikaylagawarecki marked this pull request as ready for review November 6, 2024 21:46

mikaylagawarecki added a commit that referenced this pull request Nov 6, 2024

Add APIs to separate norm and clipping in nn.utils.clip_grad_norm_

f34d680

ghstack-source-id: 20b8fa9 Pull Request resolved: #139662

H-Huang approved these changes Nov 7, 2024

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 7, 2024

pytorchmergebot added the merging label Nov 7, 2024

H-Huang mentioned this pull request Nov 7, 2024

Fix PP clip_grad_norm pytorch/torchtitan#649

Merged

pytorchmergebot removed the merging label Nov 7, 2024

pytorchmergebot added the merging label Nov 7, 2024

pytorchmergebot added the Merged label Nov 7, 2024

pytorchmergebot closed this in 2ee91db Nov 7, 2024

pytorchmergebot removed the merging label Nov 7, 2024

github-actions bot deleted the gh/mikaylagawarecki/285/head branch December 8, 2024 02:18
