[MPS] Add upsample_bicubic2d as Metal op by malfet · Pull Request #136123 · pytorch/pytorch · GitHub

Conversation

@malfet
Contributor

@malfet malfet commented Sep 16, 2024

More or less literal copy-n-paste of

`__global__ void upsample_bicubic2d_out_frame` (https://github.com/pytorch/pytorch/blob/c33b0580e6a702be0cd5be691b3b465da012aa34/aten/src/ATen/native/cuda/UpSampleBicubic2d.cu#L24)

and

`__global__ void upsample_bicubic2d_backward_out_frame` (https://github.com/pytorch/pytorch/blob/c33b0580e6a702be0cd5be691b3b465da012aa34/aten/src/ATen/native/cuda/UpSampleBicubic2d.cu#L99)

The missing `uint8` implementation mimics CUDA behavior.
Initial version coded live in https://www.youtube.com/watch?v=shi6Kb5xxvk
Later refinements:

  • Switched from 2D dispatch to 1D (to match CUDA behavior)
  • Added batch + channel loops
  • Fixed scale computation to match align corners behavior
  • Added backward implementation

The backward implementation again mimics CUDA, so it has a precision issue for `torch.half`, as well as a somewhat slow simulation of atomic adds using an atomic compare-and-exchange over the pair of adjacent values, i.e.

```metal
template <typename T>
static inline void atomic_add_helper(
    device atomic<int>* data,
    long offset,
    float value) {
  auto ptr = data + (offset >> 1);
  auto old = atomic_load_explicit(ptr, memory_order_relaxed);
  union {
    int i;
    T t[2];
  } val;
  do {
    val.i = old;
    val.t[offset & 1] += static_cast<T>(value);
  } while (!atomic_compare_exchange_weak_explicit(
      ptr, &old, val.i, memory_order_relaxed, memory_order_relaxed));
}
```

Bump the base Metal language version to 3.0, as it is supported on macOS 13 and is the first version that has `atomic_float`.

@malfet malfet requested a review from kulinseth as a code owner September 16, 2024 00:27
@pytorch-bot

pytorch-bot bot commented Sep 16, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136123

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 13b2f66 with merge base 08dba25:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/mps Run MPS tests (subset of trunk) release notes: mps Release notes category labels Sep 16, 2024
@github-actions
Contributor

Attention! native_functions.yaml was changed

If you are adding a new function or defaulted argument to native_functions.yaml, you cannot use it from pre-existing Python frontend code until our FC window passes (two weeks). Split your PR into two PRs, one which adds the new C++ functionality, and one that makes use of it from Python, and land them two weeks apart. See https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#forwards-compatibility-fc for more info.


Caused by:

@malfet
Contributor Author

malfet commented Sep 16, 2024

It now has unexpected successes, which is a good sign; the next step is to fix it for the `uint8` dtype.

Collaborator

@albanD albanD left a comment


Integration sounds good, only small nit on edge case.
I didn't double check the convolution algorithm but I guess that CI is enough to validate it?

@malfet malfet force-pushed the malfet/mps-add-bicubic-sample-2d branch from 530eefe to 9f0cd01 Compare September 24, 2024 15:59
@malfet
Contributor Author

malfet commented Sep 24, 2024

@pytorchbot merge -f "Lint + MPS tests are green"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (`-f`) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use `-f` as a last resort and instead consider `-i`/`--ignore-current` to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

BoyuanFeng pushed a commit to BoyuanFeng/pytorch that referenced this pull request Sep 25, 2024
Pull Request resolved: pytorch#136123
Approved by: https://github.com/albanD
@github-actions github-actions bot deleted the malfet/mps-add-bicubic-sample-2d branch October 25, 2024 02:10

Labels

ciflow/mps (Run MPS tests, subset of trunk) · Merged · release notes: mps (Release notes category) · topic: improvements (topic category)


3 participants