Fix OpSchema equality check #161231
Conversation
`__eq__` didn't compare lists of DTensorSpec, but `__hash__` did (and it looks like attention was paid to hash, so I made comparison follow suit). [ghstack-poisoned]
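The bug pattern described above can be sketched with a minimal hypothetical class (this is not the actual `OpSchema`, just an illustration): `__hash__` folds in a sequence-valued field that `__eq__` ignores, so two objects can compare equal while hashing differently, violating Python's hash/eq contract and breaking dict lookups.

```python
from dataclasses import dataclass

@dataclass(eq=False)
class Schema:  # hypothetical stand-in for OpSchema
    op: str
    specs: tuple  # stand-in for the tuple of DTensorSpec

    def __eq__(self, other):
        # Buggy: ignores self.specs, unlike __hash__ below.
        return isinstance(other, Schema) and self.op == other.op

    def __hash__(self):
        return hash((self.op, self.specs))

a = Schema("add", ("spec1",))
b = Schema("add", ("spec2",))
print(a == b)                # True: specs never compared
print(hash(a) == hash(b))    # almost certainly False: specs are hashed
print(b in {a: 1})           # dict lookup misses despite a == b
```

The fix is the reverse direction of what the dict-lookup symptom might suggest: since hashing `specs` was intentional, `__eq__` is extended to compare them too.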
🔗 Helpful Links: see artifacts and rendered test results at hud.pytorch.org/pr/161231.
Note: links to docs will display an error until the docs builds have completed. ✅ No failures as of commit 7fd9430 with merge base a85711d. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
I was at first hesitant to stamp this because, while we definitely should fix it, it is possible DTensor did something that profited from the divergence, and I don't want to regress that. However, looking at the details of how `__eq__` was broken, I think it's a safe fix.
Still, I'm yolo'ing a bit here because I have not thoroughly reviewed where DTensor uses `__hash__` and where it uses `__eq__`, so it's possible I'm missing something.
My understanding of why we didn't suffer from the issue before is what we do in pytorch/torch/distributed/tensor/_dispatch.py, lines 146 to 150 at f521e82;
in this case,
Starting merge as part of PR stack under #161285
…h__` and `__eq__` (#161234) The performance cost of `dict` lookups keyed by `OpSchema` is a significant minority of DTensor overhead. With this change we shave a net ~1% off the total running time of the benchmark from #160580, as measured by using cProfile and comparing cumulative time spent in propagate + OpSchema's `__post_init__`. (`__post_init__` grew from 2.5% to 6.4% (+3.9%) and propagate shrank from 12.5% to 7.8% (-4.7%)). Pull Request resolved: #161234 Approved by: https://github.com/wconstab ghstack dependencies: #161231
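The caching idea behind that commit can be sketched as follows (a hedged sketch with hypothetical names, not the actual `OpSchema` code): compute the hash once in `__post_init__` and have `__hash__` return the cached value, trading a little construction cost (hence `__post_init__` growing in the profile) for cheaper repeated dict lookups (hence `propagate` shrinking).

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class CachedSchema:  # hypothetical stand-in for OpSchema
    op: str
    specs: tuple
    _hash: int = field(init=False, repr=False)

    def __post_init__(self):
        # Pay the hashing cost once, at construction time.
        self._hash = hash((self.op, self.specs))

    def __eq__(self, other):
        return (
            isinstance(other, CachedSchema)
            and self.op == other.op
            and self.specs == other.specs
        )

    def __hash__(self):
        # Every dict lookup now returns the precomputed value.
        return self._hash

cache = {CachedSchema("add", ("spec1",)): "strategy"}
print(cache[CachedSchema("add", ("spec1",))])  # hits without rehashing fields
```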
`self is other` means the same thing as `id(self) == id(other)`, but it's one operator instead of 3. Pull Request resolved: #161235 Approved by: https://github.com/wconstab, https://github.com/zpcore, https://github.com/fduwjj ghstack dependencies: #161231, #161234
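As a quick illustration of the identity fast path (hypothetical class, not the actual PR code): `self is other` is a single `IS_OP` bytecode, whereas `id(self) == id(other)` makes two function calls plus a comparison.

```python
class Point:  # hypothetical example class
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if self is other:  # identity fast path: one operator, no calls
            return True
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

p = Point(1, 2)
print(p == p)            # True via the fast path
print(p == Point(1, 2))  # True via the field comparison
print(p == "not a point")
```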
…161240) get_write_alias() call count reduction explained briefly in code comment. We don't need to check write_aliases against None in the final outs_to_return calculation because we just did that check. Pull Request resolved: #161240 Approved by: https://github.com/wconstab ghstack dependencies: #161231, #161234, #161235
…ly_alias_match (#161284) Containers are truthy iff they're non-empty. Pull Request resolved: #161284 Approved by: https://github.com/Skylion007, https://github.com/wconstab ghstack dependencies: #161231, #161234, #161235, #161240
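The truthiness simplification noted above, in a generic sketch: because containers are truthy iff non-empty, an explicit emptiness check is redundant.

```python
aliases = []

# Before (redundant):
if len(aliases) > 0:
    result = "has aliases"
else:
    result = "empty"

# After (idiomatic): an empty container is falsy, a non-empty one is truthy.
result = "has aliases" if aliases else "empty"
print(result)
```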
…#161285) Drives down the overhead of return_and_correct_storage_aliasing slightly. Hopefully you'll agree it doesn't compromise readability. Pull Request resolved: pytorch#161285 Approved by: https://github.com/wconstab ghstack dependencies: pytorch#161231, pytorch#161234, pytorch#161235, pytorch#161240, pytorch#161284
Stack from ghstack (oldest at bottom):
- `is`, not `==`, to check exact type matches in _python_dispatch #161304
- …`__hash__` and `__eq__` #161234
- `__eq__` didn't compare lists of DTensorSpec, but `__hash__` did (and it looks like attention was paid to hash, so I made comparison follow suit).
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta