[TP][Inference] Enable DTensor TP inference by fduwjj · Pull Request #110751 · pytorch/pytorch · GitHub
@fduwjj fduwjj commented Oct 6, 2023

Stack from ghstack (oldest at bottom):

In #109977, we observed that in inference mode aten.linear does not get decomposed. Rather than enabling sharding propagation for the linear op itself, we call func.decompose so that it is decomposed into matmul and mm.
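
The fallback described above can be sketched as a toy dispatcher. This is only an illustration under stated assumptions, not the code in this PR: the names HAS_SHARDING_RULE, DECOMPOSITIONS, and dispatch are hypothetical, ops are plain strings, and the decomposition table is simplified to linear -> matmul -> mm (bias is ignored).

```python
# Toy model of "decompose instead of adding a sharding rule".
# Every name below is hypothetical; this is not DTensor's dispatcher.

# Ops that (in this toy) already have a sharding propagation rule.
HAS_SHARDING_RULE = {"aten.mm"}

# Composite ops and the ops they decompose into. Simplified: bias is ignored.
DECOMPOSITIONS = {
    "aten.linear": ["aten.matmul"],
    "aten.matmul": ["aten.mm"],
}


def dispatch(op, trace=None):
    """Return the list of ops that end up being handled by a sharding rule."""
    trace = [] if trace is None else trace
    if op in HAS_SHARDING_RULE:
        # A rule exists, so the op is handled directly.
        trace.append(op)
    elif op in DECOMPOSITIONS:
        # No rule: decompose and dispatch each resulting op instead.
        for sub_op in DECOMPOSITIONS[op]:
            dispatch(sub_op, trace)
    else:
        raise NotImplementedError(f"no sharding rule or decomposition for {op}")
    return trace


print(dispatch("aten.linear"))  # ['aten.mm']
```

In this sketch the linear op never needs its own rule: it is only ever seen as the ops it decomposes into.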

pytorch-bot bot commented Oct 6, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/110751

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (4 Unrelated Failures)

As of commit 7e716a8 with merge base a3e5ec4:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@fduwjj fduwjj changed the title from "Enable DTensor TP inference" to "[TP][Inference] Enable DTensor TP inference" Oct 6, 2023

fduwjj added a commit that referenced this pull request Oct 6, 2023
ghstack-source-id: 502aaf6
Pull Request resolved: #110751
@fduwjj fduwjj requested a review from wanchaol October 6, 2023 21:25
@bdhirsh bdhirsh left a comment

left light comments but lgtm!


fduwjj commented Oct 7, 2023

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label Oct 7, 2023
@pytorchmergebot

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.



@fduwjj fduwjj added the ciflow/periodic label Oct 7, 2023

fduwjj added a commit that referenced this pull request Oct 7, 2023
ghstack-source-id: 462029d
Pull Request resolved: #110751
fduwjj commented Oct 7, 2023

@pytorchbot merge -f "The failing test are not related to this PR."

@pytorchmergebot

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team



Labels

ciflow/periodic
ciflow/trunk
Merged
module: dtensor
release notes: distributed (dtensor)


4 participants