Implement unfold_backward on MPS by malfet · Pull Request #135411 · pytorch/pytorch · GitHub

Conversation

@malfet
Contributor

@malfet malfet commented Sep 7, 2024

This PR adds a native implementation of `unfold_backward` as a Metal shader, mostly a copy-and-paste of the algorithm used in the CUDA and CPU implementations. That is, considering `out = in.unfold(dim, size, step)`, the following holds true (a short illustrative example follows these lists):

  • `out.shape[dim] == (in.shape[dim] - size) / step + 1`
  • `out.shape[-1] == size`
  • `out.ndim == in.ndim + 1`

The `unfold_backward` Metal kernel receives `grad_in` and returns `grad_out` such that:

  • `grad_in.shape == out.shape`
  • `grad_out.shape == in.shape`
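As a quick concrete check of these shape relationships (a minimal sketch; the tensor sizes and the `dim`, `size`, `step` values below are arbitrary examples, not taken from the PR or its tests):

```python
import torch

# Fall back to CPU when MPS is unavailable so the snippet stays runnable anywhere.
device = "mps" if torch.backends.mps.is_available() else "cpu"
dim, size, step = 1, 3, 2
inp = torch.arange(24.0, device=device).reshape(2, 12)

out = inp.unfold(dim, size, step)  # shape: (2, 5, 3)
assert out.shape[dim] == (inp.shape[dim] - size) // step + 1
assert out.shape[-1] == size
assert out.ndim == inp.ndim + 1
```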

For each index in `grad_out`, find the elements of `grad_in` that contribute to it and sum them up. Such an algorithm requires no synchronization between threads.
That is, `grad_out[..., out_dim_idx, ...]` accumulates all values `grad_in[..., in_dim_idx, ..., in_last_idx]`, where `in_dim_idx` is in the range [`(out_dim_idx - size) / step`, `out_dim_idx / step`] clamped to (0, `in_dim_size`), and `in_last_idx` equals `out_dim_idx - in_dim_idx * step`. The accumulation step is skipped if `in_last_idx` is outside of the [0, size] range.
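A rough pure-Python sketch of that accumulation may help make the indexing concrete. This is only a readability-oriented reference for the semantics described above, not the Metal kernel itself; the function name `unfold_backward_ref`, the `movedim` reshuffling, and the assumption of a non-negative `dim` are illustrative choices:

```python
import torch

def unfold_backward_ref(grad_in, input_shape, dim, size, step):
    # grad_in.shape == input.unfold(dim, size, step).shape; result has shape input_shape.
    # Assumes a non-negative `dim`.
    grad_out = torch.zeros(input_shape, dtype=grad_in.dtype, device=grad_in.device)
    in_dim_size = grad_in.shape[dim]  # number of unfolded windows
    # Move `dim` to the front purely to keep the indexing below readable.
    g_in = grad_in.movedim(dim, 0)    # (in_dim_size, ..., size)
    g_out = grad_out.movedim(dim, 0)  # (input_shape[dim], ...), a view into grad_out
    for out_dim_idx in range(g_out.shape[0]):
        # Candidate windows that might cover this position, clamped to valid window indices.
        lo = max((out_dim_idx - size) // step, 0)
        hi = min(out_dim_idx // step, in_dim_size - 1)
        for in_dim_idx in range(lo, hi + 1):
            in_last_idx = out_dim_idx - in_dim_idx * step
            if 0 <= in_last_idx < size:  # skip candidates that do not actually cover it
                g_out[out_dim_idx] += g_in[in_dim_idx, ..., in_last_idx]
    return grad_out
```

Comparing the output of such a sketch against the gradient autograd produces through `Tensor.unfold` on CPU is a simple way to sanity-check the index arithmetic.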

This operator has been requested 16 times on #77764

@malfet malfet added the ciflow/mps Run MPS tests (subset of trunk) label Sep 7, 2024
@pytorch-bot

pytorch-bot bot commented Sep 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/135411

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure

As of commit 15a2972 with merge base 7578a0b:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions
Contributor

github-actions bot commented Sep 7, 2024

Attention! native_functions.yaml was changed

If you are adding a new function or defaulted argument to native_functions.yaml, you cannot use it from pre-existing Python frontend code until our FC window passes (two weeks). Split your PR into two PRs, one which adds the new C++ functionality, and one that makes use of it from Python, and land them two weeks apart. See https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#forwards-compatibility-fc for more info.



@malfet malfet added topic: improvements topic category release notes: mps Release notes category labels Sep 7, 2024
@malfet malfet marked this pull request as draft September 8, 2024 15:24
@github-actions
Contributor

github-actions bot commented Nov 7, 2024

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Nov 7, 2024
@malfet malfet changed the title Enable unfold_backward on MPS Implement unfold_backward on MPS Nov 12, 2024
@malfet malfet removed the Stale label Nov 12, 2024
@malfet malfet marked this pull request as ready for review November 12, 2024 22:35
Co-authored-by: Manuel Candales <42380156+manuelcandales@users.noreply.github.com>
@malfet
Contributor Author

malfet commented Nov 13, 2024

@pytorchbot merge -f "This was mostly green in the past"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024

Pull Request resolved: pytorch#135411
Approved by: https://github.com/manuelcandales

Co-authored-by: Manuel Candales <42380156+manuelcandales@users.noreply.github.com>
@malfet malfet deleted the malfet-patch-14 branch December 12, 2024 22:31
