
Documentation Clarification Needed for Clamping of Scale Coefficient in clip_grads_with_norm_ #151554

@qiaoqiaoLF

Description
📚 The doc issue

In the current documentation for torch.nn.utils.clip_grads_with_norm_, the formula for the scale coefficient is as follows:

$$ \text{grad} = \text{grad} \times \frac{\text{max\_norm}}{\text{total\_norm} + 10^{-6}} $$

However, in practice the scale coefficient is clamped to a maximum of 1, so gradients are never scaled up when the total norm is already below `max_norm`. This behavior is important for correct gradient clipping, but it is not currently mentioned in the documentation. A minimal repro sketch is shown below.
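
The following sketch illustrates the observed behavior. It assumes the signature `clip_grads_with_norm_(parameters, max_norm, total_norm)` and the companion helper `torch.nn.utils.get_total_norm` from recent PyTorch releases:

```python
import torch
from torch import nn

# Hedged repro sketch: assumes torch.nn.utils.clip_grads_with_norm_ takes
# (parameters, max_norm, total_norm) and that torch.nn.utils.get_total_norm
# is available.
layer = nn.Linear(4, 4)
layer(torch.randn(2, 4)).sum().backward()

grads = [p.grad for p in layer.parameters() if p.grad is not None]
total_norm = torch.nn.utils.get_total_norm(grads)
before = [g.clone() for g in grads]

# Pass a max_norm far larger than total_norm. The documented formula would
# scale the gradients up by roughly 100x, but in practice they are left
# unchanged because the scale coefficient is clamped to 1.
torch.nn.utils.clip_grads_with_norm_(
    layer.parameters(), max_norm=float(total_norm) * 100.0, total_norm=total_norm
)

after = [p.grad for p in layer.parameters() if p.grad is not None]
print(all(torch.allclose(b, a) for b, a in zip(before, after)))  # expected: True
```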

Suggested Change:
Explicitly state in the documentation that the scale coefficient is clamped to 1, so users understand that the function only ever scales gradients down and there is no ambiguity about its behavior in practice.

Suggest a potential alternative/fix

The formula should be updated as follows:

$$ \text{grad} = \text{grad} \times \min\left( \frac{\text{max\_norm}}{\text{total\_norm} + 10^{-6}},\ 1 \right) $$
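
In code, the clamped formula corresponds to something like the sketch below. This is only an illustration of the proposed formula, not PyTorch's actual implementation, and it assumes `max_norm` and `total_norm` are plain Python floats:

```python
def clip_grads_with_norm_sketch(parameters, max_norm, total_norm, eps=1e-6):
    """Illustrative sketch of the proposed (clamped) formula."""
    clip_coef = max_norm / (total_norm + eps)
    # Clamp the coefficient to 1 so gradients are only ever scaled down,
    # never up, even when total_norm is smaller than max_norm.
    clip_coef = min(clip_coef, 1.0)
    for p in parameters:
        if p.grad is not None:
            p.grad.mul_(clip_coef)
```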

cc @svekars @sekyondaMeta @AlannaBurke @albanD @mruberry @jbschlosser @walterddr @mikaylagawarecki
