[dtensor] support local_map as a decorator by xmfan · Pull Request #161353 · pytorch/pytorch · GitHub

Conversation

@xmfan
Member

@xmfan xmfan commented Aug 23, 2025

[ghstack-poisoned]
@pytorch-bot

pytorch-bot bot commented Aug 23, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161353

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 Cancelled Job, 2 Unrelated Failures

As of commit 3a92a02 with merge base dbef606:

CANCELLED JOB - The following job was cancelled. Please retry:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

xmfan added a commit that referenced this pull request Aug 23, 2025
ghstack-source-id: 4c58d07
Pull Request resolved: #161353
@pytorch-bot pytorch-bot bot added ciflow/inductor oncall: distributed Add this issue/PR to distributed oncall triage queue labels Aug 23, 2025
[ghstack-poisoned]
xmfan added a commit that referenced this pull request Aug 23, 2025
ghstack-source-id: df49179
Pull Request resolved: #161353
[ghstack-poisoned]
xmfan added a commit that referenced this pull request Aug 24, 2025
ghstack-source-id: 95cbd87
Pull Request resolved: #161353
[ghstack-poisoned]
@xmfan xmfan added the release notes: distributed (dtensor) release notes category label Aug 26, 2025
@xmfan xmfan marked this pull request as ready for review August 26, 2025 17:11
@xmfan xmfan requested review from fduwjj and zpcore August 26, 2025 21:18
Member

@zpcore zpcore left a comment

LGTM! cc @XilunWu

@xmfan xmfan requested a review from XilunWu August 26, 2025 21:41
@xmfan
Member Author

xmfan commented Aug 27, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 27, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 jobs have failed, first few of them are: trunk / macos-py3-arm64 / test (default, 2, 3, macos-m1-stable), trunk / macos-py3-arm64 / test (mps, 1, 1, macos-m1-14)


@pytorchmergebot
Collaborator

Starting merge as part of PR stack under #161479

pytorchmergebot pushed a commit that referenced this pull request Aug 28, 2025
Adds the pre-dispatch handling for the AC HOP. This lets the HOP pass through pre-dispatch export without actually pre-dispatch tracing into it. However, this is not sufficient to support AC in export:
- the HOP body will still be in torch IR, so it will fail the export verifiers
- the exported module also can't be run in eager, because the AC HOP relies on the partitioner to embed RNG state saving/restoring

So it must first be lowered by AOT Autograd to post-dispatch before being executed. It suffices for my purposes, though.

If users had checkpoint API usage in their exported model, the behavior goes from silently incorrect to a validation error.

Pull Request resolved: #161479
Approved by: https://github.com/ydwu4
ghstack dependencies: #161353
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
And extract it out as a convenience function for dynamo to wrap

Pull Request resolved: pytorch#161353
Approved by: https://github.com/zpcore
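The commit message says `local_map` becomes usable as a decorator and is extracted as a convenience function for Dynamo to wrap. The snippet below is a minimal, torch-free sketch of the common Python pattern for a function that works both as a direct call and as a decorator: when no function is passed, it returns a `functools.partial` that waits for one. `local_map_sketch` and the string placements are hypothetical stand-ins; this is not the actual `torch.distributed.tensor.experimental.local_map` implementation or signature.

```python
import functools
from typing import Any, Callable, Optional


def local_map_sketch(
    func: Optional[Callable[..., Any]] = None,
    out_placements: Any = None,
    in_placements: Any = None,
) -> Callable[..., Any]:
    """Hypothetical local_map-style wrapper usable as a call or a decorator."""
    if func is None:
        # Decorator style: @local_map_sketch(out_placements=..., ...)
        # Capture the keyword arguments and wait for the function.
        return functools.partial(
            local_map_sketch,
            out_placements=out_placements,
            in_placements=in_placements,
        )

    @functools.wraps(func)
    def wrapped(*args: Any, **kwargs: Any) -> Any:
        # A real implementation would unwrap inputs to local shards according
        # to in_placements, call func, and rewrap outputs per out_placements.
        # This sketch just calls func unchanged.
        return func(*args, **kwargs)

    # Record the captured placements so both styles can be compared.
    wrapped.out_placements = out_placements
    wrapped.in_placements = in_placements
    return wrapped


def add(x, y):
    return x + y


# Function-call style: pass the function explicitly.
add_wrapped = local_map_sketch(add, out_placements="Shard(0)")


# Decorator style: omit the function; the partial receives it.
@local_map_sketch(out_placements="Shard(0)", in_placements=("Shard(0)", "Shard(0)"))
def mul(x, y):
    return x * y
```

Returning a partial of the same function keeps a single code path for both call styles, which is one plausible reason a wrapper like this is convenient for a tracer such as Dynamo to treat as one function.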
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
Pull Request resolved: pytorch#161479
Approved by: https://github.com/ydwu4
ghstack dependencies: pytorch#161353
@github-actions github-actions bot deleted the gh/xmfan/280/head branch September 27, 2025 02:06

Labels

ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request Merged oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: distributed (dtensor) release notes category
