feat(inductor): Add RAdam to Inductor by converting data-dependent control-flow to torch.where
#110351
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/110351

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 1 Pending, as of commit cf655e4 with merge base 4069d1d.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
I hope at some point we can have polymorphic torch.sqrt(1.0) -> 1.0 for python scalars, just like in NumPy, and a torch.pi, so we don't have to import math just for these simple things and can have dispatch helpers :) Or speed up CPU scalar tensors (https://github.com/jon-chuang/pytorch/blob/89eb7a75a251c41c4bee86e9ede1001b0d3998af/torch/optim/optimizer.py#L84).
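The kind of dispatch helper being asked for could look like the sketch below. The name `dispatch_sqrt` and its shape are assumptions for illustration, modeled on the `_dispatch_sqrt` helper referenced elsewhere in this thread, not PyTorch's actual implementation.

```python
import math

import torch


def dispatch_sqrt(x):
    # Hypothetical helper: use tensor sqrt for tensors and math.sqrt for
    # python scalars, since torch.sqrt does not accept a plain python float.
    if isinstance(x, torch.Tensor):
        return torch.sqrt(x)
    return math.sqrt(x)
```

With such a helper, call sites no longer need to branch on the input type themselves, e.g. `dispatch_sqrt(4.0)` and `dispatch_sqrt(torch.tensor(4.0))` both work.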
Force-pushed from 88c348b to a0f5b5b (…uang/radam-inductor)
```diff
-    if rho_t > 5
-    else 0
-    for rho_t in rho_t_list
+    torch.where(
```
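The change above can be sketched as follows. The rectification expression `f` and the tensor shapes are placeholders for illustration, not RAdam's actual formula.

```python
import torch


def rectify(rho_t: torch.Tensor) -> torch.Tensor:
    # Data-dependent python control flow such as
    #     rect = f(rho_t) if rho_t > 5 else 0
    # cannot be traced by Inductor when rho_t is a tensor. torch.where keeps
    # the branch as a tensor op. f here is a placeholder expression.
    f = lambda r: (r - 4.0) / r
    return torch.where(rho_t > 5, f(rho_t), torch.zeros_like(rho_t))


rho_t = torch.tensor([4.0, 6.0])
rect = rectify(rho_t)  # first element takes the else-branch (0)
```

Note that torch.where evaluates both branches eagerly, which is the trade-off discussed in the comments below.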
Hmm, not sure this is a good idea in the eager case? Anyway, it executes on CPU, but still might be a bit slow
Putting this on the capturable path would help resolve this as well
```diff
 unrect_step_size = [(lr * _get_value(rect) / bc) * -1 for rect, bc in zip(unrectified, bias_correction1)]
 bias_correction2_sqrt_times_rect_step_size = [
-    _dispatch_sqrt(1 - beta2 ** _get_value(step)) * (lr * rect / bc) * -1
+    _dispatch_sqrt(1 - beta2 ** _get_value(step)) * (lr * _get_value(rect) / bc) * -1
```
We should probably have a "capturable" code path similar to Adam/NAdam and now Adagrad (#110339) / Adamax
It can use foreach_sqrt for the GPU path? 🤔
The only reason we wouldn't use it here is because we want this bias correction to be on the Scalar codepath, but that doesn't really make that much sense since it is consumed in foreach below.
Hmm, I think the issue with converting to foreach here is the _get_value part, but honestly I could use a list comprehension to create the corresponding step values.
Adding a capturable code path may be the right move here. Then, we could get rid of the need for get_value as the capturable path should use the foreach ops, and the eager path can still keep scalar math on python.
```python
if not torch.jit.is_scripting() and isinstance(x, torch.Tensor):
    return x.abs()
else:
    return abs(x)
```
I improved dynamo's abs support in #110398 so this function shouldn't be needed any more.
Thanks @peterbell10! @jon-chuang let's rebase this PR onto that one and eradicate this?
Agreed that a capturable path makes more sense here. However, adding capturable support properly requires more work, as you should also support it in the single-tensor case and add the right test cases in test_cuda and test_optim.
For this specific case, can the torch.sqrt call be replaced by …
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as …
For small epochs (adaptive learning rate = 1), this will be more costly than not computing `rect` and `adaptive_lr`, but unlikely by much: they are all fused into a single kernel anyway.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler