Description
📚 The doc issue
In the current documentation for `torch.nn.utils.clip_grads_with_norm_`, the formula for the scale coefficient is given as:

```
grad = grad * max_norm / (total_norm + 1e-6)
```
However, in practical usage, the scale coefficient is clamped to a maximum value of 1, which prevents the function from ever scaling gradients up. This behavior, while important for the correct functioning of gradient clipping, is not currently mentioned in the documentation.
Suggested Change:
It would be helpful to state explicitly in the documentation that the scale coefficient is clamped to 1. This would give users a clearer picture of how the function behaves in practice and prevent misunderstandings.
Suggest a potential alternative/fix
The formula should be updated to reflect the clamping, e.g.:

```
grad = grad * min(max_norm / (total_norm + 1e-6), 1.0)
```
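The clamping behavior can be illustrated with a minimal sketch (plain Python, not the actual PyTorch implementation; the function name and `eps` default here are illustrative):

```python
# Sketch of the scale coefficient as described in this issue: the raw
# ratio max_norm / (total_norm + eps) is clamped so it never exceeds 1,
# i.e. gradients are only ever scaled down, never up.
def clip_coefficient(max_norm: float, total_norm: float, eps: float = 1e-6) -> float:
    scale = max_norm / (total_norm + eps)
    return min(scale, 1.0)  # the clamp this issue asks to document

# Gradient norm above max_norm: gradients are scaled down.
print(clip_coefficient(1.0, 4.0))   # ~0.25
# Gradient norm already below max_norm: coefficient is clamped to 1
# (without the clamp, the raw ratio would be 2.0).
print(clip_coefficient(1.0, 0.5))   # 1.0
```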
cc @svekars @sekyondaMeta @AlannaBurke @albanD @mruberry @jbschlosser @walterddr @mikaylagawarecki