Fix OpSchema equality check #161231
Conversation
`__eq__` didn't compare lists of DTensorSpec, but `__hash__` did (and it looks like attention was paid to hash, so I made comparison follow suit). [ghstack-poisoned]
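The bug pattern described above can be sketched with a minimal hypothetical class (this is not the actual `OpSchema`, just an illustration): `__hash__` folds in a sequence-valued field that `__eq__` ignores, so two objects can compare equal while hashing differently, violating Python's hash/eq contract and breaking dict lookups.

```python
from dataclasses import dataclass

@dataclass(eq=False)
class Schema:  # hypothetical stand-in for OpSchema
    op: str
    specs: tuple  # stand-in for the tuple of DTensorSpec

    def __eq__(self, other):
        # Buggy: ignores self.specs, unlike __hash__ below.
        return isinstance(other, Schema) and self.op == other.op

    def __hash__(self):
        return hash((self.op, self.specs))

a = Schema("add", ("spec1",))
b = Schema("add", ("spec2",))
print(a == b)                # True: specs never compared
print(hash(a) == hash(b))    # almost certainly False: specs are hashed
print(b in {a: 1})           # dict lookup misses despite a == b
```

The fix is the reverse direction of what the dict-lookup symptom might suggest: since hashing `specs` was intentional, `__eq__` is extended to compare them too.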
🔗 Helpful Links: see artifacts and rendered test results at hud.pytorch.org/pr/161231.
Note: links to docs will display an error until the docs builds have completed. ✅ No failures as of commit 7fd9430 with merge base a85711d. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
I was at first hesitant to stamp this because, while we definitely should fix it, it is possible DTensor did something that profited from the divergence, and I don't want to regress that. However, looking at the details of how `__eq__` was broken, I think it's a safe fix.
Still, I'm yolo'ing a bit here because I have not thoroughly reviewed where DTensor uses `__hash__` and where it uses `__eq__`, so it's possible I'm missing something.
My understanding of why we didn't suffer from the issue before is what we do in pytorch/torch/distributed/tensor/_dispatch.py, lines 146 to 150 at f521e82;
in this case,
Starting merge as part of PR stack under #161285
…h__` and `__eq__` (#161234) The performance cost of `dict` lookups keyed by `OpSchema` is a significant minority of DTensor overhead. With this change we shave a net ~1% off the total running time of the benchmark from #160580, as measured by using cProfile and comparing cumulative time spent in propagate + OpSchema's `__post_init__`. (`__post_init__` grew from 2.5% to 6.4% (+3.9%) and propagate shrank from 12.5% to 7.8% (-4.7%)). Pull Request resolved: #161234 Approved by: https://github.com/wconstab ghstack dependencies: #161231
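The caching idea behind that commit can be sketched as follows (a hedged sketch with hypothetical names, not the actual `OpSchema` code): compute the hash once in `__post_init__` and have `__hash__` return the cached value, trading a little construction cost (hence `__post_init__` growing in the profile) for cheaper repeated dict lookups (hence `propagate` shrinking).

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class CachedSchema:  # hypothetical stand-in for OpSchema
    op: str
    specs: tuple
    _hash: int = field(init=False, repr=False)

    def __post_init__(self):
        # Pay the hashing cost once, at construction time.
        self._hash = hash((self.op, self.specs))

    def __eq__(self, other):
        return (
            isinstance(other, CachedSchema)
            and self.op == other.op
            and self.specs == other.specs
        )

    def __hash__(self):
        # Every dict lookup now returns the precomputed value.
        return self._hash

cache = {CachedSchema("add", ("spec1",)): "strategy"}
print(cache[CachedSchema("add", ("spec1",))])  # hits without rehashing fields
```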
`self is other` means the same thing as `id(self) == id(other)`, but it's one operator instead of 3. Pull Request resolved: #161235 Approved by: https://github.com/wconstab, https://github.com/zpcore, https://github.com/fduwjj ghstack dependencies: #161231, #161234
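As a quick illustration of the identity fast path (hypothetical class, not the actual PR code): `self is other` is a single `IS_OP` bytecode, whereas `id(self) == id(other)` makes two function calls plus a comparison.

```python
class Point:  # hypothetical example class
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if self is other:  # identity fast path: one operator, no calls
            return True
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

p = Point(1, 2)
print(p == p)            # True via the fast path
print(p == Point(1, 2))  # True via the field comparison
print(p == "not a point")
```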
…161240) get_write_alias() call count reduction explained briefly in code comment. We don't need to check write_aliases against None in the final outs_to_return calculation because we just did that check. Pull Request resolved: #161240 Approved by: https://github.com/wconstab ghstack dependencies: #161231, #161234, #161235
…ly_alias_match (#161284) Containers are truthy iff they're non-empty. Pull Request resolved: #161284 Approved by: https://github.com/Skylion007, https://github.com/wconstab ghstack dependencies: #161231, #161234, #161235, #161240
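The truthiness simplification noted above, in a generic sketch: because containers are truthy iff non-empty, an explicit emptiness check is redundant.

```python
aliases = []

# Before (redundant):
if len(aliases) > 0:
    result = "has aliases"
else:
    result = "empty"

# After (idiomatic): an empty container is falsy, a non-empty one is truthy.
result = "has aliases" if aliases else "empty"
print(result)
```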
…#161285) Drives down the overhead of return_and_correct_storage_aliasing slightly. Hopefully you'll agree it doesn't compromise readability. Pull Request resolved: pytorch#161285 Approved by: https://github.com/wconstab ghstack dependencies: pytorch#161231, pytorch#161234, pytorch#161235, pytorch#161240, pytorch#161284
Stack from ghstack (oldest at bottom):
- `is`, not `==`, to check exact type matches in _python_dispatch #161304
- …`__hash__` and `__eq__` #161234
- `__eq__` didn't compare lists of DTensorSpec, but `__hash__` did (and it looks like attention was paid to hash, so I made comparison follow suit).
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta