-
Notifications
You must be signed in to change notification settings - Fork 25.7k
[aotd] Fix rrelu compilation #136008
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[aotd] Fix rrelu compilation #136008
Conversation
[ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136008
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 7201db2 with merge base efed357 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
Issues: #135083 #120292 rrelu decomposition contains mutation, copy_. Decompositions are executed below Functionalization, as a result AOT produces non-functional graph. Also that decomposition is registered as python_dispatch kernel for AutogradCUDA. Autograd dispatch happens above Functionalization, so registering it for Autograd to handle all backends makes functionalization running after this. Testing: ``` python test/functorch/test_aotdispatch.py -k test_rrelu ``` [ghstack-poisoned]
Issues: #135083 #120292 rrelu decomposition contains mutation, copy_. Decompositions are executed below Functionalization, as a result AOT produces non-functional graph. Also that decomposition is registered as python_dispatch kernel for AutogradCUDA. Autograd dispatch happens above Functionalization, so registering it for Autograd to handle all backends makes functionalization running after this. Testing: ``` python test/functorch/test_aotdispatch.py -k test_rrelu ``` [ghstack-poisoned]
|
||
is_factory_function: bool = False | ||
|
||
is_randomized_result: bool = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you say more about why we need this flag? The main reason I'm surprised is:
(1) most ops that are involve randomness can be made deterministic by setting various pytorch state, e.g. torch.mamul_seed
(doc). I think we can... probably figure out how to reset the state between running the ref/test so the output? (also, we have other rng ops that don't seem to need a special flag in the OpInfo tests today)
(2) inductor does generate randomness eagerly from eager mode, but the tests you're updating aren't actually using inductor (they use AOTDispatcher, aka they are capturing a graph of the same underlying aten ops and replaying them)
Issues: #135083 #120292 rrelu decomposition contains mutation, copy_. Decompositions are executed below Functionalization, as a result AOT produces non-functional graph. Also that decomposition is registered as python_dispatch kernel for AutogradCUDA. Autograd dispatch happens above Functionalization, so registering it for Autograd to handle all backends makes functionalization running after this. Testing: ``` python test/functorch/test_aotdispatch.py -k test_rrelu ``` [ghstack-poisoned]
Issues: #135083 #120292 rrelu decomposition contains mutation, copy_. Decompositions are executed below Functionalization, as a result AOT produces non-functional graph. Also that decomposition is registered as python_dispatch kernel for AutogradCUDA. Autograd dispatch happens above Functionalization, so registering it for Autograd to handle all backends makes functionalization running after this. Testing: ``` python test/functorch/test_aotdispatch.py -k test_rrelu ``` cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
Issues: #135083 #120292 rrelu decomposition contains mutation, copy_. Decompositions are executed below Functionalization, as a result AOT produces non-functional graph. Also that decomposition is registered as python_dispatch kernel for AutogradCUDA. Autograd dispatch happens above Functionalization, so registering it for Autograd to handle all backends makes functionalization running after this. Testing: ``` python test/functorch/test_aotdispatch.py -k test_rrelu ``` cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
@pytorchbot merge |
1 similar comment
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: Command
Details for Dev Infra teamRaised by workflow job |
Issues: #135083 #120292 rrelu decomposition contains mutation, copy_. Decompositions are executed below Functionalization, as a result AOT produces non-functional graph. Also that decomposition is registered as python_dispatch kernel for AutogradCUDA. Autograd dispatch happens above Functionalization, so registering it for Autograd to handle all backends makes functionalization running after this. Testing: ``` python test/functorch/test_aotdispatch.py -k test_rrelu ``` cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: Command
Details for Dev Infra teamRaised by workflow job |
Issues: #135083 #120292 rrelu decomposition contains mutation, copy_. Decompositions are executed below Functionalization, as a result AOT produces non-functional graph. Also that decomposition is registered as python_dispatch kernel for AutogradCUDA. Autograd dispatch happens above Functionalization, so registering it for Autograd to handle all backends makes functionalization running after this. Testing: ``` python test/functorch/test_aotdispatch.py -k test_rrelu ``` cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
When I used this updated code I found another problem, the same code in eager mode did not give the same result as inductor compile.
output: |
@junlin-habana Thanks for the testing. Yes, there is additional problem, that noise argument is not marked as mutated argument in the schema of rrelu. As a result this mutation is not auto-functionalized. We will fix it in follow up diff changing the schema for all rrelu ops. |
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
@junlin-habana I rechecked your test:
And noise mutation is captured:
And aot graph also captures the mutation:
Could you please recheck on the latest master? |
Issues: pytorch#135083 pytorch#120292 rrelu decomposition contains mutation, copy_. Decompositions are executed below Functionalization, as a result AOT produces non-functional graph. Also that decomposition is registered as python_dispatch kernel for AutogradCUDA. Autograd dispatch happens above Functionalization, so registering it for Autograd to handle all backends makes functionalization running after this. Testing: ``` python test/functorch/test_aotdispatch.py -k test_rrelu ``` Pull Request resolved: pytorch#136008 Approved by: https://github.com/bdhirsh
Ok,thanks for you help, looks good on the latest master. |
@IvanKobzarev
|
@intellinjun Thanks, reproduced. With bf16 it is not mutating noise, smth with dtype convertion in compilation. Will debug it. |
@intellinjun #136784 for tracking as a separate issue |
Stack from ghstack (oldest at bottom):
Issues:
#135083
#120292
rrelu decomposition contains mutation, copy_. Decompositions are executed below Functionalization, as a result AOT produces non-functional graph.
Also that decomposition is registered as python_dispatch kernel for AutogradCUDA.
Autograd dispatch happens above Functionalization, so registering it for Autograd to handle all backends makes functionalization running after this.
Testing:
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang