[cpp wrapper] add AOTI shim for collective ops by Valentine233 · Pull Request #154492 · pytorch/pytorch · GitHub

Conversation

@Valentine233 (Collaborator) commented May 28, 2025

Implementations:

  1. Move collective ops to the c10d namespace, so that we can call them externally.
  2. Add AOTI shims for collective ops (a sketch of the pattern is shown below).

Testing:

  1. Add a c10d functional UT for CPU.
  2. Include the above test in the cpp wrapper UT.
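
For context, each shim is a plain-C entry point that forwards to the corresponding c10d:: function and converts C++ exceptions into error codes so they never cross the C ABI boundary. A minimal sketch of the pattern, modeled on the all_reduce_ shim discussed later in this thread (the new_tensor_handle handoff is assumed from the usual AOTI shim utilities):

    AOTI_TORCH_EXPORT AOTITorchError aoti_torch_cpu__c10d_functional_all_reduce_(
        AtenTensorHandle inp,
        const char* reduce_op,
        const char* group_name,
        AtenTensorHandle* ret0) {
      AOTI_TORCH_CONVERT_EXCEPTION_TO_ERROR_CODE({
        // Forward to the c10d implementation, now callable externally.
        auto tmp_result = c10d::all_reduce_(
            *tensor_handle_to_tensor_pointer(inp), reduce_op, group_name);
        // Hand the result back across the C ABI as a fresh tensor handle.
        *ret0 = new_tensor_handle(std::move(tmp_result));
      });
    }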

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

@pytorch-bot bot commented May 28, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/154492

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit c44381a with merge base 04178d3:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@Valentine233 added the topic: not user facing label May 28, 2025
@Valentine233 marked this pull request as draft May 28, 2025 08:09
@github-actions (Contributor):

Attention! One of PyTorch's C-stable API files was changed

You MUST NOT change existing function declarations in this file, as this header defines a stable C ABI. If you need to change a function's signature, introduce a new v2 version of the function and modify code generation to target the new version.
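
For illustration, the versioning pattern the bot describes might look like this (aoti_torch_foo and its parameters are hypothetical, purely for the example):

    // Existing declaration: frozen, since this header defines a stable C ABI.
    AOTI_TORCH_EXPORT AOTITorchError aoti_torch_foo(
        AtenTensorHandle inp,
        AtenTensorHandle* ret0);

    // A signature change goes into a new v2 entry point instead, and code
    // generation is updated to target the v2 symbol.
    AOTI_TORCH_EXPORT AOTITorchError aoti_torch_foo_v2(
        AtenTensorHandle inp,
        int64_t new_flag,  // hypothetical new parameter
        AtenTensorHandle* ret0);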


Caused by:

@Valentine233 marked this pull request as ready for review June 4, 2025 01:13
from torch.testing._internal.inductor_utils import HAS_CPU


def load_test_module(name):
Collaborator:

Has this function been defined elsewhere? Can we reuse it?

Collaborator Author:

Thanks. The change has been removed per Baobin's suggestion.

return (
    isinstance(node, ir._CollectiveKernel)
    and not isinstance(node, ir._WaitKernel)
    and (op is None or node.op_overload is op)
)
Collaborator:

Given that ir._WaitKernel is a subclass of ir._CollectiveKernel, why is it excluded here?

Collaborator Author (@Valentine233, Jun 4, 2025):

Because ir._WaitKernel does not have any constant_args, unlike the other collective kernels.

The function is_collective is used here: https://github.com/pytorch/pytorch/blob/main/test/inductor/test_snode_runtime.py#L244. If the check is True, it eventually calls _get_group_size_by_name(node.constant_args[-1]) (https://github.com/pytorch/pytorch/blob/main/torch/_inductor/comm_analysis.py#L73), which raises an error for ir._WaitKernel. The same applies to the change to get_collective_group_size in comm_analysis.py.

Contributor:

Then shouldn't get_collective_group_size handle _WaitKernel differently? (Disclaimer: I am not sure whether _WaitKernel has a concept of group size.)

Collaborator Author (@Valentine233, Jun 16, 2025):

_WaitKernel is a special _CollectiveKernel that does not have a group_size (generated from its group_name) like the others.

For example, compare the schemas of wait_tensor and all_reduce:
"wait_tensor(Tensor tensor) -> Tensor"
"all_reduce(Tensor input, str reduce_op, str group_name) -> Tensor"

@jcaip added the triaged label Jun 5, 2025
@Valentine233 (Collaborator Author):

@desertfire @yifuwang Could you have a look at this PR? Thanks!

    AtenTensorHandle qScaleAndZeros,
    AtenTensorHandle* ret0);

AOTI_TORCH_EXPORT AOTITorchError aoti_torch_cpu__c10d_functional_all_reduce_(
    AtenTensorHandle inp,
    const char* reduce_op,
    const char* group_name,
    AtenTensorHandle* ret0);
Contributor:

I wish there were a way to autogen these...

Collaborator Author (@Valentine233, Jun 16, 2025):

Thanks! As collective ops are not defined under torch.ops.aten, their AOTI shims cannot be generated automatically. Do you have any suggestions for how to autogen these kernels?

    testdir = Path(__file__).absolute().parent.parent
    with mock.patch("sys.path", [*sys.path, str(testdir)]):
        return SourceFileLoader(
            name, str(testdir / f"{name.replace('.', '/')}.py")
        ).load_module()
Contributor:

This looks fragile. Can you simply test cpp_wrapper in test_c10d_functional_native.py?

Collaborator Author:

Thanks, modified!

kernel = self.op_overload
self.cpp_kernel_name = kernel._schema.name
if cpp_kernel_name is not None:
    self.cpp_kernel_name = cpp_kernel_name
Contributor:

Why do you need to explicitly pass in cpp_kernel_name here? cpp_wrapper_cpu.py should have taken care of the schema name to C shim name conversion.

Collaborator Author (@Valentine233, Jun 16, 2025):

Thanks for your comment.

In the function set_cpp_kernel_name, the optional param cpp_kernel_name was not used. I suppose we need to take it into account when it is explicitly passed.

As collective ops are not defined under torch.ops.aten, their AOTI shims cannot be automatically generated, so I need to pass cpp_kernel_name explicitly, like what is done here: https://github.com/pytorch/pytorch/blob/main/torch/_inductor/mkldnn_ir.py#L1265.


    AtenTensorHandle* ret0) {
  AOTI_TORCH_CONVERT_EXCEPTION_TO_ERROR_CODE({
    auto tmp_result = c10d::all_reduce_(
        *tensor_handle_to_tensor_pointer(inp), reduce_op, group_name);
Contributor:

Please use resolve_tensor_dispatch_flags instead of tensor_handle_to_tensor_pointer.

Collaborator Author:

The modification would raise an error, "cannot bind non-const lvalue reference of type 'at::Tensor&' to an rvalue of type 'at::Tensor'", because all_reduce_ expects an at::Tensor&.

Contributor:

Then you just need to create another tmp variable first.

Not an issue particular to this PR; I will do some cleanup later.
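
A sketch of that workaround, assuming resolve_tensor_dispatch_flags returns an at::Tensor by value (which is what the rvalue error indicates; the new_tensor_handle handoff is carried over from the sketch near the top of this thread):

    AOTI_TORCH_CONVERT_EXCEPTION_TO_ERROR_CODE({
      // Bind the by-value result to a named lvalue first, so it can be
      // passed to c10d::all_reduce_, which takes an at::Tensor&.
      at::Tensor inp_tensor = resolve_tensor_dispatch_flags(inp);
      auto tmp_result = c10d::all_reduce_(inp_tensor, reduce_op, group_name);
      *ret0 = new_tensor_handle(std::move(tmp_result));
    });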

@Valentine233 force-pushed the collective_c_shim branch 2 times, most recently from d96ff80 to e467d25, June 19, 2025 01:30
@Valentine233 (Collaborator Author):

@desertfire Hi, could you help review again? Thanks~

@Valentine233 (Collaborator Author):

@pytorchbot merge

@pytorch-bot bot added the ciflow/trunk label Jun 24, 2025
@pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@pytorchmergebot (Collaborator):

Merge failed

Reason: 1 job has failed: trunk / macos-py3-arm64 / build

Details for Dev Infra team. Raised by workflow job.

@Valentine233 (Collaborator Author):

@pytorchbot merge

@pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

skarjala pushed a commit to skarjala/pytorch that referenced this pull request Jun 25, 2025
Implementations:
1. Move collective ops to c10d namespace, so that we can call them externally.
2. Add AOTI shims for collective ops.

Testing
1. Add c10d functional UT for cpu.
2. Include the above one in cpp wrapper UT.

Pull Request resolved: pytorch#154492
Approved by: https://github.com/desertfire
@github-actions bot deleted the collective_c_shim branch July 25, 2025 02:20

Labels

ciflow/inductor, ciflow/trunk, Merged, module: inductor, oncall: distributed, open source, release notes: distributed (c10d), release notes: inductor (aoti), topic: not user facing, triaged
