KEMBAR78
[inductor] Add ABI shim function for torch.scatter by aakhundov · Pull Request #114027 · pytorch/pytorch · GitHub
Skip to content

Conversation

@aakhundov
Copy link
Contributor

@aakhundov aakhundov commented Nov 18, 2023

Stack from ghstack (oldest at bottom):

Summary: Scatter fallback calls at::scatter in the C++ wrapper codegen. This doesn't work in the ABI compatibility mode, as the latter requires a shim function. One is added in this PR.

Test Plan:

$ python test/inductor/test_aot_inductor.py -k test_scatter_fallback
s...
----------------------------------------------------------------------
Ran 4 tests in 52.713s

OK (skipped=1)

Reviewers:

Subscribers:

Tasks:

Tags:

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler

Summary: Scatter fallback calls `at::scatter` in the C++ wrapper codegen. This doesn't work in the ABI compatibility mode, as the latter requires a shim function. One is added in this PR.

Test Plan:

```
$ python test/inductor/test_aot_inductor.py -k test_scatter_fallback
...
----------------------------------------------------------------------
Ran 4 tests in 54.839s
```

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Nov 18, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/114027

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit caa0c20 with merge base 5e30741 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

aakhundov added a commit that referenced this pull request Nov 18, 2023
Summary: Scatter fallback calls `at::scatter` in the C++ wrapper codegen. This doesn't work in the ABI compatibility mode, as the latter requires a shim function. One is added in this PR.

Test Plan:

```
$ python test/inductor/test_aot_inductor.py -k test_scatter_fallback
...
----------------------------------------------------------------------
Ran 4 tests in 54.839s
```

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 7d9000e
Pull Request resolved: #114027
@aakhundov aakhundov changed the title Add ABI shim function for torch.scatter [inductor] Add ABI shim function for torch.scatter Nov 18, 2023
@aakhundov aakhundov requested review from chenyang78 and desertfire and removed request for desertfire November 18, 2023 21:49
@aakhundov aakhundov added ciflow/trunk Trigger trunk jobs on your pull request topic: not user facing topic category labels Nov 18, 2023
Copy link
Contributor

@chenyang78 chenyang78 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR LGTM.

I am having a question to @SherlockNoMad: We are handling scatter different from other fallback kernels. Particularly, we have the following class hierarchies. I am wondering if we could also use ProxyExecutor to cover scatter fallback? Thanks.

class FallbackKernel(ExternKernelAlloc)
class ExternKernelAlloc(ExternKernel):
class ScatterFallback(ExternKernel)

@aakhundov
Copy link
Contributor Author

FWIW, I'd prefer having an ABI shim function for the cases we can cover and rely on ProxyExecutor for the uncovered cases. As I understand, ProxyExecutor comes with slight performance overhead (e.g., for ser/des and additional dispatch logic)?

@chenyang78
Copy link
Contributor

FWIW, I'd prefer having an ABI shim function for the cases we can cover and rely on ProxyExecutor for the uncovered cases. As I understand, ProxyExecutor comes with slight performance overhead (e.g., for ser/des and additional dispatch logic)?

There were some discussions about this last Friday. I think it would mostly impact CPU backend. Not sure about how it would affect CUDA backend performance. It might be small. I think @SherlockNoMad volunteered to do some perf experiments.

@aakhundov
Copy link
Contributor Author

I think it would mostly impact CPU backend. Not sure about how it would affect CUDA backend performance.

In some models, we can have very small kernels and / or CUDA device sync, so non-trivial CPU overhead can become visible.

Summary: Scatter fallback calls `at::scatter` in the C++ wrapper codegen. This doesn't work in the ABI compatibility mode, as the latter requires a shim function. One is added in this PR.

Test Plan:

```
$ python test/inductor/test_aot_inductor.py -k test_scatter_fallback
s...
----------------------------------------------------------------------
Ran 4 tests in 52.713s

OK (skipped=1)
```

Reviewers:

Subscribers:

Tasks:

Tags:

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler

[ghstack-poisoned]
aakhundov added a commit that referenced this pull request Nov 20, 2023
Summary: Scatter fallback calls `at::scatter` in the C++ wrapper codegen. This doesn't work in the ABI compatibility mode, as the latter requires a shim function. One is added in this PR.

Test Plan:

```
$ python test/inductor/test_aot_inductor.py -k test_scatter_fallback
...
----------------------------------------------------------------------
Ran 4 tests in 54.839s
```

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 988da0b
Pull Request resolved: #114027
@aakhundov
Copy link
Contributor Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants