Allow dynamic shapes for DTensor slice by azahed98 · Pull Request #157953 · pytorch/pytorch · GitHub

Conversation

azahed98 (Contributor) commented Jul 9, 2025:

This PR allows symints in gen_slice_strategy, the strategy for aten.slice.Tensor. Previously, using dynamic shapes with slicing resulted in:

   File ".../pytorch/torch/distributed/tensor/_ops/_tensor_ops.py", line 348, in gen_slice_strategy
     assert isinstance(end, int)
 torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function <built-in function getitem>(*(DTensor(local_tensor=FakeTensor(..., device='cuda:0', size=(s3, 2)), device_mesh=DeviceMesh('cuda', [0, 1]), placements=(Shard(dim=0),)), slice(None, (s77//2), None)), **{}): got AssertionError()

Questions before merge:

  1. dim is still asserted to be an int. Is this fine, or is it potentially dynamic as well?
  2. I'm using a type: ignore[arg-type] for normalize_dim. Should I instead change the types of normalize_dim and its downstream dependencies to IntLike as well?
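For context on question 2, dim normalization of this kind conventionally maps negative indices into range. A hedged sketch in plain Python — the real normalize_dim lives under torch.distributed.tensor._ops and its exact signature may differ:

```python
def normalize_dim(dim: int, ndim: int) -> int:
    """Map a possibly-negative dim index into [0, ndim), as tensor ops expect."""
    if dim < 0:
        dim += ndim
    if not 0 <= dim < ndim:
        raise IndexError(f"dim {dim} out of range for a {ndim}-d tensor")
    return dim

print(normalize_dim(-1, 4))  # -> 3: the last dim of a 4-d tensor
```

If dim itself could be symbolic, the `dim < 0` comparison here would add a guard on the symbolic value, which is part of what the IntLike typing question is about.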

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k

@azahed98 azahed98 requested a review from bdhirsh July 9, 2025 18:50
pytorch-bot bot commented Jul 9, 2025:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157953

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit f2ce7fb with merge base 1f57e0e:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the ciflow/inductor and oncall: distributed labels Jul 9, 2025
      for t in torch.tensor_split(x, 2)
  ]

  x = DTensor.from_local(torch.rand(4, 4), mesh, [Shard(0)], run_check=False)
Reviewer (Contributor) commented:
how did we ensure these particular tensor dims were treated as dynamic? should we explicitly mark them?

azahed98 (Contributor Author) replied:

I originally manually marked the dimensions, but I removed that because torch.compile(..., dynamic=True) was enough to reproduce the error, so it shouldn't be necessary.

Reviewer (Contributor) replied:

Yeah, I was mostly asking for my own edification: does dynamic=True mean all dims are assumed dynamic?

azahed98 (Contributor Author) replied:

Yup! All dimensions are marked dynamic this way.

@azahed98 azahed98 requested a review from wconstab July 11, 2025 18:01
azahed98 (Contributor Author) commented:

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 11, 2025
pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

pytorchmergebot (Collaborator) commented:

Merge failed

Reason: 1 job failed: trunk / before-test / target-determination


@azahed98 azahed98 force-pushed the fix/dtensor_split_symint branch from 4a196e6 to f2ce7fb Compare July 14, 2025 20:01
azahed98 (Contributor Author) commented:

@pytorchbot merge

pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).



Labels

ciflow/inductor, ciflow/trunk, dynamo-tensor-subclasses, Merged, oncall: distributed, release notes: distributed (dtensor)

3 participants