[inductor] Parallelize Max Autotune step 1: refactor autotune_process #109126
Conversation
Summary: Step 1 in revamping subprocess autotune to support multiple GPUs. This diff just does some refactoring to `autotune_process.py` in order to prepare for the next diff:

* Move all logic for managing the sub-process (like detecting sub-process crashes) into the `TuningProcess` class.
* Use `log.debug` statements instead of `print` statements.

Test Plan: `python test/inductor/test_max_autotune.py`
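The first bullet above can be sketched as follows. This is a hedged illustration of the described refactor, not the actual `torch/_inductor/autotune_process.py` code: the class name and the queue-based put/get shape follow the PR description, but every implementation detail below (method names, the shutdown sentinel, the polling interval) is an assumption.

```python
# Sketch: all sub-process lifecycle management (startup, crash detection,
# shutdown) is consolidated into one TuningProcess class, per the PR summary.
import logging
import multiprocessing
import queue
from typing import Any, Optional

log = logging.getLogger(__name__)


def _worker(requests: "multiprocessing.Queue", responses: "multiprocessing.Queue") -> None:
    # Child-process loop. The real worker would benchmark each kernel choice;
    # this illustrative stand-in simply echoes requests back.
    while True:
        req = requests.get()
        if req is None:  # shutdown sentinel (an assumed convention)
            break
        responses.put(req)


class TuningProcess:
    """Owns the autotuning sub-process and its communication queues."""

    def __init__(self) -> None:
        self.process: Optional[multiprocessing.process.BaseProcess] = None

    def initialize(self) -> None:
        # "fork" is used here so the example runs without a __main__ guard;
        # the real code may use a different start method.
        ctx = multiprocessing.get_context("fork")
        self.requests = ctx.Queue()
        self.responses = ctx.Queue()
        self.process = ctx.Process(target=_worker, args=(self.requests, self.responses))
        self.process.start()
        log.debug("Started autotune sub-process pid=%s", self.process.pid)

    def alive(self) -> bool:
        return self.process is not None and self.process.is_alive()

    def put(self, req: Any) -> None:
        if not self.alive():
            self.initialize()  # (re)start, e.g. after a detected crash
        self.requests.put(req)

    def get(self) -> Any:
        # Poll with a timeout so a crashed sub-process raises an error in the
        # parent instead of blocking forever on a queue that never fills.
        while True:
            try:
                return self.responses.get(timeout=1.0)
            except queue.Empty:
                if not self.alive():
                    raise RuntimeError(
                        f"autotune sub-process died, exitcode={self.process.exitcode}"
                    )

    def terminate(self) -> None:
        if self.alive():
            self.requests.put(None)
            self.process.join()
```

The point of the refactor is that callers only see `put`/`get`, while crash handling stays in one place.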
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/109126

Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit 143ebfa with merge base 264f1e7. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
@shunting314 FYI this is a redo of #107982. We had to revert that change because it didn't play well in fbcode. In fbcode, everything is a .xar file and we got feedback that we can't necessarily guarantee the proper environment for a subprocess started via Popen. So this change goes back to using multiprocessing and multiprocessing queues. This change just does some reorg to make the next diff in the stack a little easier to review.
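The environment point above can be illustrated with a minimal sketch (not code from this PR): a `multiprocessing` child is forked from the running interpreter, so it inherits the parent's interpreter and import state, whereas `subprocess.Popen` must locate and launch a fresh Python executable, which is what cannot be guaranteed inside a packaged .xar binary.

```python
# Illustration only: the multiprocessing child shares the parent's
# interpreter image, while a Popen child would need to re-launch Python
# from an executable path that may not exist in a packaged environment.
import multiprocessing
import os
import sys


def child(q: "multiprocessing.Queue") -> None:
    # Runs in the forked worker: report which interpreter and pid we are.
    q.put((sys.executable, os.getpid()))


# "fork" keeps the example runnable as a plain script on POSIX systems.
ctx = multiprocessing.get_context("fork")
q = ctx.Queue()
p = ctx.Process(target=child, args=(q,))
p.start()
exe, pid = q.get()
p.join()
# Same interpreter as the parent, but a different process.
print(exe == sys.executable, pid != os.getpid())  # → True True
```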
Test Plan:

* `python test/inductor/test_max_autotune.py`
* `TORCHINDUCTOR_AUTOTUNE_IN_SUBPROC=1 TORCHINDUCTOR_MAX_AUTOTUNE=1 python benchmarks/dynamo/torchbench.py --device cuda --performance --backend inductor --inference --only hf_Bart`
* `TORCHINDUCTOR_AUTOTUNE_MULTI_DEVICE=1 TORCHINDUCTOR_AUTOTUNE_IN_SUBPROC=1 TORCHINDUCTOR_MAX_AUTOTUNE=1 python benchmarks/dynamo/torchbench.py --device cuda --performance --backend inductor --inference --only hf_Bart`

Pull Request resolved: #109127. Approved by: https://github.com/shunting314, https://github.com/eellison. ghstack dependencies: #109126.
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov