KEMBAR78
[funcol] a few optimizations to funcol by wanchaol · Pull Request #113324 · pytorch/pytorch · GitHub
Skip to content

Conversation

@wanchaol
Copy link
Collaborator

@wanchaol wanchaol commented Nov 9, 2023

Stack from ghstack (oldest at bottom):

Apply a few optimizations to funcol:

  • allgather on non-0 dim, the resulting tensor already needs to access
    data in order to do torch.cat, so we sync wait here so that we don;t
    need to go through ACT dispatch for chunk + cat alltogether
  • have a fast return logic to aten.view as it's a commonly hit op for
    view related ops

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @fduwjj @wz337 @tianyu-l @wconstab @yf225 @kiukchung @LucasLLC @d4l3k

Apply a few optimizations to funcol:

- allgather on non-0 dim, the resulting tensor already needs to access
data in order to do torch.cat, so we sync wait here so that we don;t
need to go through ACT dispatch for chunk + cat alltogether
- have a fast return logic to aten.view as it's a commonly hit op for
view related ops

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Nov 9, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/113324

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit 2de50e9 with merge base 7843df6 (image):

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Apply a few optimizations to funcol:

- allgather on non-0 dim, the resulting tensor already needs to access
data in order to do torch.cat, so we sync wait here so that we don;t
need to go through ACT dispatch for chunk + cat alltogether
- have a fast return logic to aten.view as it's a commonly hit op for
view related ops

[ghstack-poisoned]
wanchaol added a commit that referenced this pull request Nov 9, 2023
Apply a few optimizations to funcol:

- allgather on non-0 dim, the resulting tensor already needs to access
data in order to do torch.cat, so we sync wait here so that we don;t
need to go through ACT dispatch for chunk + cat alltogether
- have a fast return logic to aten.view as it's a commonly hit op for
view related ops

ghstack-source-id: 1d5a0cb
Pull Request resolved: #113324
Apply a few optimizations to funcol:

- allgather on non-0 dim, the resulting tensor already needs to access
data in order to do torch.cat, so we sync wait here so that we don;t
need to go through ACT dispatch for chunk + cat alltogether
- have a fast return logic to aten.view as it's a commonly hit op for
view related ops

[ghstack-poisoned]
@fegin fegin added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 13, 2023
Apply a few optimizations to funcol:

- allgather on non-0 dim, the resulting tensor already needs to access
data in order to do torch.cat, so we sync wait here so that we don;t
need to go through ACT dispatch for chunk + cat alltogether
- have a fast return logic to aten.view as it's a commonly hit op for
view related ops

cc H-Huang awgu kwen2501 fegin fduwjj wz337 wconstab mrshenli zhaojuanmao rohan-varma kiukchung d4l3k lucasllc XilunWu

[ghstack-poisoned]
Copy link
Contributor

@XilunWu XilunWu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR is ready to merge?

@wanchaol wanchaol added the topic: not user facing topic category label Nov 14, 2023
Apply a few optimizations to funcol:

- allgather on non-0 dim, the resulting tensor already needs to access
data in order to do torch.cat, so we sync wait here so that we don;t
need to go through ACT dispatch for chunk + cat alltogether
- have a fast return logic to aten.view as it's a commonly hit op for
view related ops

cc H-Huang awgu kwen2501 fegin fduwjj wz337 wconstab mrshenli zhaojuanmao rohan-varma kiukchung d4l3k lucasllc XilunWu

[ghstack-poisoned]

@classmethod
def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
if func == torch.ops.aten.view.default:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok with perf improvement change, but would this make the code look hacky?

@XilunWu
Copy link
Contributor

XilunWu commented Nov 17, 2023

Adding wait() to FakeTensor should fix the issue. Is there anything else we miss except this?

Apply a few optimizations to funcol:

- allgather on non-0 dim, the resulting tensor already needs to access
data in order to do torch.cat, so we sync wait here so that we don;t
need to go through ACT dispatch for chunk + cat alltogether
- have a fast return logic to aten.view as it's a commonly hit op for
view related ops

cc H-Huang awgu kwen2501 fegin fduwjj wz337 wconstab mrshenli zhaojuanmao rohan-varma kiukchung d4l3k lucasllc XilunWu

[ghstack-poisoned]
wanchaol added a commit that referenced this pull request Dec 5, 2023
Apply a few optimizations to funcol:

- allgather on non-0 dim, the resulting tensor already needs to access
data in order to do torch.cat, so we sync wait here so that we don;t
need to go through ACT dispatch for chunk + cat alltogether
- have a fast return logic to aten.view as it's a commonly hit op for
view related ops

ghstack-source-id: 921e657
Pull Request resolved: #113324
@wanchaol wanchaol added the ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR label Dec 5, 2023
@wanchaol
Copy link
Collaborator Author

wanchaol commented Dec 5, 2023

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

Apply a few optimizations to funcol:

- allgather on non-0 dim, the resulting tensor already needs to access
data in order to do torch.cat, so we sync wait here so that we don;t
need to go through ACT dispatch for chunk + cat alltogether
- have a fast return logic to aten.view as it's a commonly hit op for
view related ops

cc H-Huang awgu kwen2501 fegin fduwjj wz337 wconstab mrshenli zhaojuanmao rohan-varma kiukchung d4l3k lucasllc XilunWu tianyu-l

[ghstack-poisoned]
@pytorchmergebot
Copy link
Collaborator

Successfully rebased gh/wanchaol/390/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/113324)

pytorchmergebot pushed a commit that referenced this pull request Dec 5, 2023
Apply a few optimizations to funcol:

- allgather on non-0 dim, the resulting tensor already needs to access
data in order to do torch.cat, so we sync wait here so that we don;t
need to go through ACT dispatch for chunk + cat alltogether
- have a fast return logic to aten.view as it's a commonly hit op for
view related ops

ghstack-source-id: addcea0
Pull Request resolved: #113324
@wanchaol
Copy link
Collaborator Author

wanchaol commented Dec 6, 2023

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

@wanchaol
Copy link
Collaborator Author

wanchaol commented Dec 6, 2023

@pytorchbot merge -i "windows failures not related"

@pytorch-bot
Copy link

pytorch-bot bot commented Dec 6, 2023

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: unrecognized arguments: windows failures not related

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,close} ...

Try @pytorchbot --help for more info.

@wanchaol
Copy link
Collaborator Author

wanchaol commented Dec 6, 2023

@pytorchbot merge -i

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged while ignoring the following 3 checks: periodic / win-vs2019-cuda11.8-py3 / build, .github/workflows/trunk.yml / win-vs2019-cpu-py3 / test (default, 1, 3, windows.4xlarge.nonephemeral), .github/workflows/trunk.yml / win-vs2019-cpu-py3 / test (default, 2, 3, windows.4xlarge.nonephemeral)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@albanD albanD added oncall: distributed Add this issue/PR to distributed oncall triage queue and removed module: distributed labels Dec 8, 2023
@facebook-github-bot facebook-github-bot deleted the gh/wanchaol/390/head branch December 10, 2023 15:27
dmenig pushed a commit to dmenig/pytorch that referenced this pull request Dec 21, 2023
Apply a few optimizations to funcol:

- allgather on non-0 dim, the resulting tensor already needs to access
data in order to do torch.cat, so we sync wait here so that we don;t
need to go through ACT dispatch for chunk + cat alltogether
- have a fast return logic to aten.view as it's a commonly hit op for
view related ops

Pull Request resolved: pytorch#113324
Approved by: https://github.com/XilunWu
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR ciflow/trunk Trigger trunk jobs on your pull request Merged oncall: distributed Add this issue/PR to distributed oncall triage queue Reverted topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants