[MPS] Speedup torch.full for 1-byte types #158874
Conversation
CI: artifacts and rendered test results at hud.pytorch.org/pr/158874. As of commit ce64e3f with merge base ddd74d1: mergeable, 1 unrelated failure (one job marked unstable, possibly due to flakiness on trunk).
By using [`fillBuffer:range:value:`](https://developer.apple.com/documentation/metal/mtlblitcommandencoder/fillbuffer:range:value:?language=objc) rather than an MPSGraph op, which should be faster and also does not have the INT_MAX limit.
ghstack-source-id: dd69275
Pull Request resolved: #158874
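To illustrate the operation this PR accelerates: `torch.full` with a 1-byte dtype (`torch.uint8`, `torch.int8`, `torch.bool`) is the case that can be lowered to a single blit fill, since `fillBuffer:range:value:` writes one repeated byte. A minimal sketch, shown on CPU so it runs anywhere; on Apple silicon you could pass `device="mps"` instead:

```python
import torch

# torch.full with a 1-byte dtype -- the case sped up on MPS by filling
# the buffer with a Metal blit instead of an MPSGraph op.
t = torch.full((4, 4), 7, dtype=torch.uint8)
print(t.dtype, t.element_size())  # torch.uint8 1
print(bool((t == 7).all()))       # True
```

Because every element is a single byte, the fill value can be replicated byte-by-byte across the whole buffer, which is exactly the contract of the blit encoder's fill.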
test/test_indexing.py (outdated)

```python
def test_index_put_accumulate_large_tensor(self, device):
    if device.startswith("mps"):
        raise unittest.SkipTest("Crash with max number of dimentions")
    # if device.startswith("mps"):
```
can we just remove it instead of commenting?
That's the plan, but I strongly suspect I'll have to leave the skip in place for macOS 13, where 4 GB tensors are a big taboo
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Though testing is a lie and dependent on #153835. Fixes #153789.
Pull Request resolved: #158888
Approved by: https://github.com/albanD
ghstack dependencies: #158874
Stack from ghstack (oldest at bottom):

By using `fillBuffer:range:value:` rather than an MPSGraph op, which should be faster and also does not have the INT_MAX limit. Which in turn fixes the `test_index_put_accumulate_large_tensor_mps` test.
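For context on the test being un-skipped: `index_put_` with `accumulate=True` sums contributions at duplicate indices. The real test uses a tensor large enough to exceed the old INT_MAX fill limit; a tiny tensor demonstrates the same semantics:

```python
import torch

# Minimal illustration of the accumulate semantics exercised by
# test_index_put_accumulate_large_tensor (the actual test uses a tensor
# whose element count exceeds INT_MAX; the behavior is identical).
t = torch.zeros(5)
idx = torch.tensor([1, 1, 3])  # index 1 appears twice
t.index_put_((idx,), torch.ones(3), accumulate=True)
print(t.tolist())  # [0.0, 2.0, 0.0, 1.0, 0.0] -- duplicates accumulate
```

With `accumulate=False`, the duplicate writes would instead race and only one value would land at index 1, which is why the accumulate path needs its own test coverage.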