[MPS] Add upsample_bicubic2d as Metal op #136123
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136123
Note: Links to docs will display an error until the doc builds have completed.
✅ You can merge normally! (2 unrelated failures)
As of commit 13b2f66 with merge base 08dba25:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Attention! native_functions.yaml was changed
If you are adding a new function or defaulted argument to native_functions.yaml, you cannot use it from pre-existing Python frontend code until our FC window passes (two weeks). Split your PR into two PRs: one that adds the new C++ functionality, and one that makes use of it from Python, and land them two weeks apart. See https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#forwards-compatibility-fc for more info.
Caused by:
It now has unexpected successes, which is a good sign; next step is to fix it for the uint8 dtype.
Integration sounds good, only a small nit on an edge case.
I didn't double-check the convolution algorithm, but I guess CI is enough to validate it?
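For reference, the convolution algorithm being ported is Keys' cubic convolution kernel; a minimal C++ sketch of the per-tap weight computation (assuming the coefficient A = -0.75 used by the CUDA reference, and hypothetical helper names), which can be sanity-checked via the kernel's partition-of-unity property:

```cpp
#include <cassert>
#include <cmath>

// Keys' cubic convolution kernel, split into its two polynomial pieces:
// cubic_convolution1 covers |x| <= 1, cubic_convolution2 covers 1 < |x| < 2.
static double cubic_convolution1(double x, double A) {
  return ((A + 2.0) * x - (A + 3.0)) * x * x + 1.0;
}

static double cubic_convolution2(double x, double A) {
  return ((A * x - 5.0 * A) * x + 8.0 * A) * x - 4.0 * A;
}

// Weights for the four source taps around fractional offset t in [0, 1).
static void get_cubic_weights(double t, double w[4], double A = -0.75) {
  w[0] = cubic_convolution2(t + 1.0, A);  // tap at distance 1 + t
  w[1] = cubic_convolution1(t, A);        // tap at distance t
  w[2] = cubic_convolution1(1.0 - t, A);  // tap at distance 1 - t
  w[3] = cubic_convolution2(2.0 - t, A);  // tap at distance 2 - t
}
```

For any t, the four weights should sum to 1, which makes a convenient CI-style check that the interpolation preserves constant inputs.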
@pytorchbot merge -f "Lint + MPS tests are green"
Merge started
Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
More or less literal copy-n-paste of https://github.com/pytorch/pytorch/blob/c33b0580e6a702be0cd5be691b3b465da012aa34/aten/src/ATen/native/cuda/UpSampleBicubic2d.cu#L24 and https://github.com/pytorch/pytorch/blob/c33b0580e6a702be0cd5be691b3b465da012aa34/aten/src/ATen/native/cuda/UpSampleBicubic2d.cu#L99

The missing `uint8` implementation mimics CUDA behavior. Initial version coded live in https://www.youtube.com/watch?v=shi6Kb5xxvk

Later refinements:
- Switched from a 2D dispatch to a 1D one (to match CUDA behavior)
- Added batch + channel loops
- Fixed scale computation to match align-corners behavior
- Added backward implementation

The backward implementation again mimics CUDA, so it has precision issues for `torch.half`, as well as a somewhat slow emulation of atomic adds using an atomic compare-and-exchange over the pair of adjacent values, i.e.:

```metal
template <typename T>
static inline void atomic_add_helper(
    device atomic<int>* data,
    long offset,
    float value) {
  auto ptr = data + (offset >> 1);
  auto old = atomic_load_explicit(ptr, memory_order_relaxed);
  union {
    int i;
    T t[2];
  } val;
  do {
    val.i = old;
    val.t[offset & 1] += static_cast<T>(value);
  } while (!atomic_compare_exchange_weak_explicit(
      ptr, &old, val.i, memory_order_relaxed, memory_order_relaxed));
}
```

Bumped the base Metal language version to 3.0, as it's supported on macOS 13 and that's the first version that has `atomic_float`.

Pull Request resolved: pytorch#136123
Approved by: https://github.com/albanD
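The compare-and-exchange trick above can be exercised off-device with `std::atomic`; below is a CPU sketch (not the PR's code) that emulates a 16-bit atomic add by CAS-ing the 32-bit word containing the target element, with `int16_t` standing in for Metal's `half` so the arithmetic is exact and testable:

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// CPU analogue of the Metal atomic_add_helper: hardware lacks a 16-bit
// atomic add, so update one 16-bit lane of a 32-bit word via a CAS loop.
// `offset` indexes 16-bit elements; `offset >> 1` selects the word,
// `offset & 1` selects the lane within it.
static void atomic_add_16(std::atomic<int32_t>* data,
                          long offset,
                          int16_t value) {
  std::atomic<int32_t>* ptr = data + (offset >> 1);
  int32_t old = ptr->load(std::memory_order_relaxed);
  int32_t desired;
  do {
    int16_t pair[2];
    std::memcpy(pair, &old, sizeof(pair));   // unpack both 16-bit lanes
    pair[offset & 1] += value;               // modify only our lane
    std::memcpy(&desired, pair, sizeof(desired));
    // compare_exchange_weak reloads `old` on failure, so the loop retries
    // against the freshest value if another thread raced us.
  } while (!ptr->compare_exchange_weak(old, desired,
                                       std::memory_order_relaxed));
}
```

Because the whole word is reread on every failed exchange, concurrent updates to either lane are never lost, at the cost of retries under contention, which is why the PR description calls it "somewhat slow".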