fix mean_out: op does not update parameter out for BF16/FP16 dtype on CPU #135174
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/135174. Note: links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit 1a4ee95 with merge base cf31724. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
@pytorchbot label "topic: not user facing

❌ 🤖 pytorchbot command failed:

@pytorchbot label "topic: not user facing"
Thanks, could you add a test (see test/test_reductions.py)?
3a6ee44 to 464c72c
Thanks for the review. I have added a test to check whether the `out` argument of the mean_out op is an alias of the return value.
@mruberry Please help review the code, thanks.
aten/src/ATen/native/ReduceOps.cpp
Outdated
Is it possible to no longer set `result_mut` and no longer do `auto& result_mut = const_cast<Tensor&>(result);` above?
I think so; let me test it locally.
test/test_reductions.py
Outdated
Thanks for adding the test, but maybe a small correctness check comparing against the out-of-place mean op would also be good?
Does it look good? I use allclose to compare the result with the target.
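For illustration, here is a pure-Python sketch of the kind of correctness check being discussed; `row_means` and `allclose` are stand-ins written for this example, not PyTorch APIs (torch.allclose is the real counterpart):

```python
import math

def row_means(rows):
    # Out-of-place reference computation: mean of each row.
    return [sum(r) / len(r) for r in rows]

def allclose(a, b, rel_tol=1e-5, abs_tol=1e-8):
    # Elementwise tolerance comparison, mimicking torch.allclose.
    return all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
               for x, y in zip(a, b))

data = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0], [3.0, 3.0, 3.0, 3.0]]
out = [0.0] * len(data)
for i, m in enumerate(row_means(data)):  # stand-in for mean(..., out=out)
    out[i] = m

print(allclose(out, row_means(data)))  # True
```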
464c72c to 2cd4f06
Thanks!
@pytorchbot merge
Merge failed. Reason: This PR needs a … label. If not, please add the … label. To add a label, you can comment to pytorchbot, for example … For more information, see … Details for Dev Infra team: raised by workflow job.
aten/src/ATen/native/ReduceOps.cpp
Outdated
```cpp
at::sum_out(result_temp, self, opt_dim, keepdim, sum_out_dtype).div_(dim_prod);
// After sum & div, cast result_temp back to BF16 or FP16, if required.
if (is_half_type) {
  result.copy_(result_temp.to(dtype));
}
```
Do we need to call `result_temp.to(dtype)`? `result.copy_(result_temp)` should work, and saves the overhead of the `result_temp.to(dtype)` conversion.
Awesome, I didn't know `copy_` does this conversion. I have changed it.
@CaoE - for accuracy reasons, wouldn't it be better if we had an intermediate FP32 sum output as the input for the division? Thanks
Just noticed that `sum_out_dtype` already ensures it.
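As a side note on why the FP32 intermediate matters, here is a pure-Python illustration (not PyTorch code) of low-precision accumulation, using struct's 'e' half-precision format to emulate FP16 rounding:

```python
import struct

def to_fp16(x):
    # Round a Python float to the nearest IEEE half-precision value
    # via struct's 'e' format (available since Python 3.6).
    return struct.unpack('e', struct.pack('e', x))[0]

# Accumulating directly in FP16: once the running sum reaches 2048,
# the FP16 spacing is 2.0, so adding 1.0 no longer changes the sum
# (2049 is halfway and rounds back to the even value 2048).
naive = 0.0
for _ in range(4096):
    naive = to_fp16(naive + 1.0)

# Accumulating in full precision (as sum_out_dtype = FP32 does) and
# rounding once at the end gives the exact answer here.
accurate = to_fp16(4096 * 1.0)

print(naive, accurate)  # 2048.0 4096.0
```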
2cd4f06 to 66c4558
aten/src/ATen/native/ReduceOps.cpp
Outdated
We'd better add other data types to the test to ensure that they also return the correct result.
```python
@onlyCPU
@dtypes(torch.half, torch.bfloat16, torch.float, torch.double)
def test_mean_out_float16_is_alias_of_return(self, dtype, device):
    a = torch.tensor([[[1.0, 1.0, 1.0, 1.0]], [[2.0, 2.0, 2.0, 2.0]], [[3.0, 3.0, 3.0, 3.0]]],
                     dtype=dtype, device=device)
    ...
```
Can we also avoid creating a new tensor when `is_half_type` is false, to avoid the allocation overhead? For example, we can just use:
```cpp
if (is_half_type) {
  auto _result_mut = result.to(sum_out_dtype);
  at::sum_out(_result_mut, self, opt_dim, keepdim, sum_out_dtype).div_(dim_prod);
  result.copy_(_result_mut);
} else {
  at::sum_out(result, self, opt_dim, keepdim, sum_out_dtype).div_(dim_prod);
}
```
Thanks for your review. I have added it.
Thanks for the fix!
Is there any way we can avoid the copy? If not, can we add a comment on why it's necessary, so that someone reading the code in the future may be able to understand it without going through the history of the file? Thanks!
It cannot be avoided, because the promotion needs more storage. Thus, the intermediate cannot reuse the storage of the input `out` parameter, and the result needs to be copied back into `out`.
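As a rough illustration of the storage argument (plain Python with struct, not ATen code):

```python
import struct

# Each BF16/FP16 element occupies 2 bytes, while the promoted FP32
# intermediate needs 4 bytes per element, so the intermediate cannot
# live in the half-precision "out" buffer in place; a copy back is needed.
n = 1024
half_bytes = n * struct.calcsize('e')  # FP16 storage for n elements
fp32_bytes = n * struct.calcsize('f')  # FP32 storage for n elements
print(half_bytes, fp32_bytes)  # 2048 4096
```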
I have added some comments about it. This is my first contribution on GitHub. Thanks for all of your excellent suggestions!
66c4558 to 5b356f4
d40cd70 to 336eccc
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed; the first few are: trunk / linux-focal-cuda12.4-py3.10-gcc9-sm86 / test (default, 3, 5, linux.g5.4xlarge.nvidia.gpu). Details for Dev Infra team: raised by workflow job.
❌ 🤖 pytorchbot command failed: Try …
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
…#134848 Signed-off-by: DavidGu-Datong <datong_gu@163.com>
Successfully rebased
336eccc to 1a4ee95
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes #134848

For BF16/FP16, when a tensor is specified in the `out` parameter of mean, the mean kernel should use its storage for the output, but that doesn't happen: an `at::to` in the current code causes storage to be allocated again, and the `out` parameter tensor's storage doesn't get updated, resulting in it not holding the mean output.

cc @albanD
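To make the contract concrete, here is a pure-Python sketch (plain lists, not tensors) of the out-parameter aliasing semantics the fix restores; `mean_out` here is an illustrative stand-in, not the ATen function:

```python
# An out= variant must write into the caller-provided buffer and return
# that same object, never a freshly allocated one. The bug was that the
# result landed in newly allocated storage while the caller's buffer
# stayed untouched.
def mean_out(values, out):
    # Accumulate in full precision, then store the result in place.
    out[0] = sum(values) / len(values)
    return out  # alias of the caller's buffer

buf = [0.0]
ret = mean_out([1.0, 2.0, 3.0], buf)
print(ret is buf)  # True
print(buf[0])      # 2.0
```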