Fix docstring for clip_grads_with_norm_ to reflect clamping behavior by RajeshvShiyal · Pull Request #158200 · pytorch/pytorch · GitHub

Conversation

@RajeshvShiyal
Copy link
Contributor

@RajeshvShiyal RajeshvShiyal commented Jul 13, 2025

Fix docstring for clip_grads_with_norm_ to reflect clamping behavior
This PR updates the docstring for torch.nn.utils.clip_grads_with_norm_ to accurately reflect the implementation behavior. The current documentation suggests that gradients are always scaled by:

grad = grad * (max_norm / (total_norm + eps))

However, the actual implementation clamps the scale coefficient to a maximum of 1.0, ensuring gradients are only scaled down, not up. This PR corrects the formula and adds a clarifying note to avoid confusion for users.

Updated the formula in the docstring to:

grad = grad * min(max_norm / (total_norm + eps), 1.0)

Added a note explaining the rationale for clamping (to prevent gradient amplification).
Ensured consistency with the behavior of clip_grad_norm_.
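
To make the clamping concrete, here is a minimal sketch, assuming PyTorch 2.6+ where clip_grads_with_norm_ and get_total_norm are exposed under torch.nn.utils. It shows that a gradient whose norm is already below max_norm is left untouched rather than scaled up:

import torch

# Small gradient: total_norm = sqrt(4 * 0.1**2) = 0.2, well below max_norm.
p = torch.nn.Parameter(torch.ones(4))
p.grad = torch.full((4,), 0.1)

total_norm = torch.nn.utils.get_total_norm([p.grad])
torch.nn.utils.clip_grads_with_norm_([p], max_norm=1.0, total_norm=total_norm)

# Without the clamp, the scale would be 1.0 / (0.2 + eps) ≈ 5.0 and the
# gradient would be amplified; with min(..., 1.0) it stays unchanged.
print(p.grad)  # tensor([0.1000, 0.1000, 0.1000, 0.1000])

For a gradient with total_norm = 4.0 and the same max_norm = 1.0, the call scales the gradient by roughly 0.25, matching the behavior of clip_grad_norm_.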

Fixes #151554

@pytorch-bot
Copy link

pytorch-bot bot commented Jul 13, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158200

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 7e384dc with merge base fac0be7:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@RajeshvShiyal
Copy link
Contributor Author

@pytorchbot label "release notes: python_frontend"
@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the release notes: python_frontend python frontend release notes category label Jul 13, 2025
@RajeshvShiyal RajeshvShiyal marked this pull request as draft July 14, 2025 02:53
@RajeshvShiyal RajeshvShiyal marked this pull request as ready for review July 14, 2025 02:55
@RajeshvShiyal
Copy link
Contributor Author

Hello @spzala,
I have raised a pull request. Please review it.

@albanD albanD removed their request for review July 14, 2025 15:45
Copy link
Contributor

@mikaylagawarecki mikaylagawarecki left a comment


Please fix lint

@albanD albanD added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Jul 14, 2025
@RajeshvShiyal RajeshvShiyal marked this pull request as draft July 15, 2025 04:33
@RajeshvShiyal RajeshvShiyal marked this pull request as ready for review July 15, 2025 04:38
@mikaylagawarecki
Copy link
Contributor

lint is still failing

Copy link
Contributor

@spzala spzala left a comment


@RajeshvShiyal thanks for the PR and for addressing review comments quickly. It seems your new changes are failing lint. Make sure to run it successfully locally.

@RajeshvShiyal
Copy link
Contributor Author

Hello @mikaylagawarecki, @spzala,

The following error seems infra-specific.

Collecting uv==0.1.45
Downloading uv-0.1.45-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (32 kB)
Requirement already satisfied: setuptools in /var/lib/jenkins/ci_env/lib/python3.9/site-packages (79.0.1)
Downloading uv-0.1.45-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 180.6 MB/s eta 0:00:00
Installing collected packages: uv
Attempting uninstall: uv
Found existing installation: uv 0.7.20
Uninstalling uv-0.7.20:
ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/var/lib/jenkins/ci_env/bin/uv'
Check the permissions.

main()

File "/home/ec2-user/actions-runner/_work/pytorch/pytorch/test-infra/.github/scripts/run_with_env_secrets.py", line 98, in main
run_cmd_or_die(f"docker exec -t {container_name} /exec")
File "/home/ec2-user/actions-runner/_work/pytorch/pytorch/test-infra/.github/scripts/run_with_env_secrets.py", line 39, in run_cmd_or_die
raise RuntimeError(f"Command {cmd} failed with exit code {exit_code}")
RuntimeError: Command docker exec -t 131240263c68dfcc261644d3bf605add8ddee94f1a4d3080a27b482d269f0ce3 /exec failed with exit code 1

Error: Process completed with exit code 1.

Note: I have also run lintrunner locally, but no lint errors were found on my machine.

@spzala
Copy link
Contributor

spzala commented Jul 16, 2025

@RajeshvShiyal thanks much for verifying locally. I agree the error seems related to the build process. Hopefully rerunning the tests will clear it.

@mikaylagawarecki
Copy link
Contributor

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased main onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout main && git pull --rebase)

@mikaylagawarecki
Copy link
Contributor

mikaylagawarecki commented Jul 16, 2025

The lint failure is real https://github.com/pytorch/pytorch/actions/runs/16324470514/job/46120316635?pr=158200

@spzala
Copy link
Contributor

spzala commented Jul 16, 2025

The lint failure is real https://github.com/pytorch/pytorch/actions/runs/16324470514/job/46120316635?pr=158200

@mikaylagawarecki yes, it's real this time :) Thanks! The previous failure only had ERROR: Could not install packages due to an OSError.....
@RajeshvShiyal please fix. Thanks!

I have run lintrunner locally and reproduced the failure. It should be resolved now.
@RajeshvShiyal
Copy link
Contributor Author

RajeshvShiyal commented Jul 17, 2025

Hello @mikaylagawarecki, @spzala,

I have run lintrunner locally and resolved the errors.

Note: Lint also suggested the changes below, so those changes are also done.

I used the format below:

] = _group_tensors_by_device_and_dtype(
    [grads]
)  # type: ignore[assignment]

instead of this:

] = _group_tensors_by_device_and_dtype([grads])  # type: ignore[assignment]

@RajeshvShiyal
Copy link
Contributor Author

RajeshvShiyal commented Jul 17, 2025

Hello @mikaylagawarecki, @spzala,

There is still a framework/infra-specific lint error.


@spzala
Copy link
Contributor

spzala commented Jul 17, 2025

@RajeshvShiyal the error seems real if you see here, https://github.com/pytorch/pytorch/actions/runs/16336731501/job/46188865026?pr=158200#step:15:163
It seems related to changes you made that are unrelated to your PR. I would suggest keeping only your original changes. Thanks!

@RajeshvShiyal
Copy link
Contributor Author

Hello @mikaylagawarecki, @spzala,

The changes that are not related to this PR have been reverted. Thank you.

@RajeshvShiyal
Copy link
Contributor Author

Hello @mikaylagawarecki, @spzala

Now there is one remaining failure. It seems specific to "sudo: setrlimit(RLIMIT_STACK): Operation not permitted".


@RajeshvShiyal
Copy link
Contributor Author

Hello, sorry for the commits below. While trying to raise a PR for another issue, I mistakenly committed those changes to this PR.


@RajeshvShiyal
Copy link
Contributor Author

RajeshvShiyal commented Jul 25, 2025

Hello @mikaylagawarecki, @spzala

There is still one failure, the same as the previous one.


@mikaylagawarecki
Copy link
Contributor

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 25, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

yangw-dev pushed a commit that referenced this pull request Aug 1, 2025
…158200)


Pull Request resolved: #158200
Approved by: https://github.com/mikaylagawarecki

Labels

ciflow/trunk Trigger trunk jobs on your pull request
Merged
open source
release notes: python_frontend python frontend release notes category
triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Documentation Clarification Needed for Clamping of Scale Coefficient in clip_grads_with_norm_

6 participants