- 
                Notifications
    You must be signed in to change notification settings 
- Fork 25.7k
[graph partition] Add way to register custom rule #163310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This PR adds an experimental way to register a custom rule for if inductor should partition the graph around an operator. Test Plan: - new test [ghstack-poisoned]
| 🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163310
 Note: Links to docs will display an error until the docs builds have been completed. ⏳ No Failures, 1 PendingAs of commit d13ce51 with merge base f4c33cd ( This comment was automatically generated by Dr. CI and updates every 15 minutes. | 
This PR adds an experimental way to register a custom rule for if inductor should partition the graph around an operator. Test Plan: - new test cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben [ghstack-poisoned]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good for dynamically deciding whether to partition nodes for custom ops! The only other thing here would be data-dependent dynamic sizes
        
          
                torch/_inductor/scheduler.py
              
                Outdated
          
        
      | `register_should_partition_rule` is currently private and experimental. | ||
| Use at your own risk. | ||
| """ | ||
| assert isinstance(op, (torch._ops.OpOverload, torch._ops.HigherOrderOperator)) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would this be torch._ops.HigherOrderOperator ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no clue lol. It is very likely that the HOP part of this doesn't work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am going to assert against HOP to start. There's not many HOPs that would be here anyways (they're not FallbackKernel)
        
          
                torch/_inductor/scheduler.py
              
                Outdated
          
        
      | should_partition_fn = _custom_should_partition_fns[operator] | ||
| fx_node = ir_node.get_origin_node() | ||
| assert fx_node is not None | ||
| _, fake_args, fake_kwargs = ( | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, this wont always succeed.. assert on first returned arg ? tbh not sure why i didnt just make this optional return..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I didn't realize the first returned arg was a bool for if this succeeded or not
        
          
                torch/_inductor/scheduler.py
              
                Outdated
          
        
      | should_partition_fn = _custom_should_partition_fns[operator] | ||
| fx_node = ir_node.get_origin_node() | ||
| assert fx_node is not None | ||
| _, fake_args, fake_kwargs = ( | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: if the operator doesnt have fixed require strides, it's possible the strides of the inputs that will be passed in at runtime are different than the strides in fake_args fake_kwargs. Not sure it really matters though, non-blocking for land.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yike
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@IvanKobzarev has an example showing how to get the args with correct strides here: https://github.com/pytorch/pytorch/pull/162377/files#diff-8dfc175f817a04eb10c183509564129a91f32db31190e7b4ad3fd9653337c6f6R3672. but again not landing blocking.
| _P = ParamSpec("_P") | ||
|  | ||
|  | ||
| _custom_should_partition_fns: weakref.WeakKeyDictionary[ | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should there be anyway of resetting this ? it's fine if not since this is very particular use case and vllm can do it manually. also note this wont be part of fx graph cache but i guess that's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
easiest way is torch._inductor.scheduler._custom_should_partition_fns.reset() lol
I'm not sure
This PR adds an experimental way to register a custom rule for if inductor should partition the graph around an operator. Test Plan: - new test cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben [ghstack-poisoned]
| @pytorchbot merge | 
| Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team | 
This PR adds an experimental way to register a custom rule for if inductor should partition the graph around an operator. Test Plan: - new test Pull Request resolved: #163310 Approved by: https://github.com/ProExpertProg, https://github.com/BoyuanFeng, https://github.com/eellison ghstack dependencies: #162117, #162307, #162651
This PR adds an experimental way to register a custom rule for if inductor should partition the graph around an operator. Test Plan: - new test Pull Request resolved: pytorch#163310 Approved by: https://github.com/ProExpertProg, https://github.com/BoyuanFeng, https://github.com/eellison ghstack dependencies: pytorch#162117, pytorch#162307, pytorch#162651
This PR adds an experimental way to register a custom rule for if inductor should partition the graph around an operator. Test Plan: - new test Pull Request resolved: pytorch#163310 Approved by: https://github.com/ProExpertProg, https://github.com/BoyuanFeng, https://github.com/eellison ghstack dependencies: pytorch#162117, pytorch#162307, pytorch#162651
This PR adds an experimental way to register a custom rule for if inductor should partition the graph around an operator. Test Plan: - new test Pull Request resolved: #163310 Approved by: https://github.com/ProExpertProg, https://github.com/BoyuanFeng, https://github.com/eellison ghstack dependencies: #162117, #162307, #162651
This PR adds an experimental way to register a custom rule for if inductor should partition the graph around an operator. Test Plan: - new test Pull Request resolved: pytorch#163310 Approved by: https://github.com/ProExpertProg, https://github.com/BoyuanFeng, https://github.com/eellison ghstack dependencies: pytorch#162117, pytorch#162307, pytorch#162651
Stack from ghstack (oldest at bottom):
This PR adds an experimental way to register a custom rule for if
inductor should partition the graph around an operator.
Test Plan:
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben