[dtensor] Add propagate_tensor_meta function that skips cache if _are_we_tracing #161334

azahed98 · 2025-08-23T01:18:29Z

Fixes an issue where the log softmax handler read from the tensor metadata cache without first checking for tracing or SymInts.

Probably best to merge this after #160798, but not strictly blocking.

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta
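The idea in the description can be sketched roughly as follows. This is a minimal, torch-free illustration and not the actual DTensor code: the class name `ShardingPropagatorSketch`, the `tracing` argument (standing in for a check such as `_are_we_tracing`), the `_cached` attribute name, and the call counter are all assumptions made for the example. Only `propagate_tensor_meta` and `_propagate_tensor_meta_non_cached` are names that appear in this PR.

```python
from functools import lru_cache


class ShardingPropagatorSketch:
    """Hypothetical stand-in for a sharding propagator; not the real class."""

    def __init__(self):
        # Counts how often the uncached path actually runs (for illustration).
        self.non_cached_calls = 0
        # Per-instance LRU cache wrapping the uncached computation.
        # The attribute name is an assumption for this sketch.
        self._cached = lru_cache(maxsize=None)(
            self._propagate_tensor_meta_non_cached
        )

    def _propagate_tensor_meta_non_cached(self, op_schema):
        # Placeholder for the real metadata computation.
        self.non_cached_calls += 1
        return ("tensor_meta_for", op_schema)

    def propagate_tensor_meta(self, op_schema, tracing):
        # `tracing` is a hypothetical flag standing in for the real check.
        # While tracing, bypass the cache entirely and recompute.
        if tracing:
            return self._propagate_tensor_meta_non_cached(op_schema)
        return self._cached(op_schema)


prop = ShardingPropagatorSketch()
prop.propagate_tensor_meta("log_softmax_schema", tracing=False)
prop.propagate_tensor_meta("log_softmax_schema", tracing=False)
eager_calls = prop.non_cached_calls  # second eager call is served from cache

prop.propagate_tensor_meta("log_softmax_schema", tracing=True)
prop.propagate_tensor_meta("log_softmax_schema", tracing=True)
total_calls = prop.non_cached_calls  # each traced call recomputes
```

Callers that go through the wrapper get the tracing check for free, whereas a handler that reaches for the cached function directly would skip it, which is the kind of bug described above.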

pytorch-bot · 2025-08-23T01:18:33Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161334

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (10 Unrelated Failures)

As of commit c94282a with merge base 2f0de0f:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

trunk / win-vs2022-cpu-py3 / test (default, 1, 3, lf.windows.4xlarge.nonephemeral) (gh) (matched win rule in flaky-rules.json)
Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under 'C:\actions-runner\_work\pytorch\pytorch\.github\actions\teardown-win'. Did you forget to run actions/checkout before running your local action?
trunk / win-vs2022-cpu-py3 / test (default, 2, 3, lf.windows.4xlarge.nonephemeral) (gh) (matched win rule in flaky-rules.json)
Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under 'C:\actions-runner\_work\pytorch\pytorch\.github\actions\teardown-win'. Did you forget to run actions/checkout before running your local action?
trunk / win-vs2022-cpu-py3 / test (default, 3, 3, lf.windows.4xlarge.nonephemeral) (gh) (matched win rule in flaky-rules.json)
Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under 'C:\actions-runner\_work\pytorch\pytorch\.github\actions\teardown-win'. Did you forget to run actions/checkout before running your local action?

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

inductor / cuda12.8-py3.10-gcc9-sm86 / test (inductor_torchbench, 1, 2, linux.g5.4xlarge.nvidia.gpu) (gh) (trunk failure)
functorch_maml_omniglot
inductor / cuda12.8-py3.10-gcc9-sm86 / test (inductor_torchbench, 2, 2, linux.g5.4xlarge.nvidia.gpu) (gh) (trunk failure)
maml_omniglot
inductor / linux-jammy-cpu-py3.9-gcc11-inductor / test (cpu_inductor_torchbench, 1, 2, linux.8xlarge.amx) (gh) (trunk failure)
functorch_maml_omniglot
inductor / linux-jammy-cpu-py3.9-gcc11-inductor / test (cpu_inductor_torchbench, 2, 2, linux.8xlarge.amx) (gh) (trunk failure)
maml_omniglot
inductor / linux-jammy-cpu-py3.9-gcc11-inductor / test (dynamic_cpu_inductor_torchbench, 1, 2, linux.8xlarge.amx) (gh) (trunk failure)
functorch_maml_omniglot
inductor / linux-jammy-cpu-py3.9-gcc11-inductor / test (dynamic_cpu_inductor_torchbench, 2, 2, linux.8xlarge.amx) (gh) (trunk failure)
maml_omniglot
inductor / linux-jammy-cpu-py3.9-gcc11-inductor / test (inductor_torchbench_cpu_smoketest_perf, 1, 1, linux.24xl.spr-metal) (gh) (trunk failure)
Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

xmfan

Please add a comment saying not to call `_propagate_tensor_meta` directly.

azahed98 · 2025-08-26T16:07:07Z

@pytorchbot merge

pytorchmergebot · 2025-08-26T16:13:38Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

bdhirsh · 2025-08-27T13:31:46Z

torch/distributed/tensor/_sharding_prop.py

+        """
        return self._propagate_tensor_meta_non_cached(op_schema)

+    def propagate_tensor_meta(


Hmm, @XilunWu, is it an intentional decision in DTensor to keep the existing `_propagate_tensor_meta` methods private? @azahed98, if so, we should probably keep things that way (maybe just rename `_propagate_tensor_meta` to `_propagate_tensor_meta_cached`).

azahed98 · 2025-08-29

PR to rename: #161744

Rename the wrapper `propagate_tensor_meta` added in #161334 to make it clearly private, and rename the existing LRU function to accommodate. Pull Request resolved: #161744 Approved by: https://github.com/bdhirsh

…_we_tracing (pytorch#161334) Fixes an issue where the log softmax handler checked the tensor metadata cache without checking for tracing or symints. Probably best to merge this after pytorch#160798, but not strictly blocking. Pull Request resolved: pytorch#161334 Approved by: https://github.com/xmfan


azahed98 requested review from bdhirsh and xmfan August 23, 2025 01:18

azahed98 added the release notes: distributed (dtensor) release notes category label Aug 23, 2025

pytorch-bot bot added ciflow/inductor oncall: distributed Add this issue/PR to distributed oncall triage queue labels Aug 23, 2025

xmfan approved these changes Aug 25, 2025


azahed98 added 2 commits August 25, 2025 15:38

Add propagate_tensor_meta function that skips cache if _are_we_tracing

d7f2b41

Add comments

92c690e

azahed98 force-pushed the fix/log_softmax_cache branch from 86cdb4b to 92c690e Compare August 25, 2025 22:49

Fix indent

c94282a

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 26, 2025

pytorchmergebot added the merging label Aug 26, 2025

pytorchmergebot added the Merged label Aug 26, 2025

pytorchmergebot closed this in d4703fb Aug 26, 2025

pytorchmergebot removed the merging label Aug 26, 2025

bdhirsh reviewed Aug 27, 2025


azahed98 mentioned this pull request Aug 28, 2025

Rename propagate_tensor_meta to make private again #161744

Open
