Gaussian nll loss scalar variance support by michael-diggin · Pull Request #138931 · pytorch/pytorch · GitHub

Conversation

@michael-diggin
Contributor

Fixes #138747

Adds support for `variance` being a Tensor or a float in `gaussian_nll_loss`, to avoid a CPU-GPU sync point in the loss function when the variance is a static tensor like `<scalar> * torch.ones_like(input)`.
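
For context, a minimal sketch of the two call patterns, assuming the functional entry point `torch.nn.functional.gaussian_nll_loss`; the shapes and the 0.5 variance value are made up purely for illustration:

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
input = torch.randn(8, 3, device=device)
target = torch.randn_like(input)

# Previously: a constant variance had to be materialised as a tensor, and the
# internal negative-variance check (torch.any(var < 0)) forces a device-to-host
# sync when var lives on the GPU.
loss_tensor_var = F.gaussian_nll_loss(input, target, var=0.5 * torch.ones_like(input))

# With this change: the same constant variance can be passed as a plain float,
# so the validity check runs entirely on the host and no sync is needed.
loss_float_var = F.gaussian_nll_loss(input, target, var=0.5)
```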

@linux-foundation-easycla

linux-foundation-easycla bot commented Oct 25, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@pytorch-bot

pytorch-bot bot commented Oct 25, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138931

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ You can merge normally! (1 Unrelated Failure)

As of commit 6da0ea9 with merge base de34f58:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@soulitzer soulitzer added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Oct 28, 2024
Contributor

@mikaylagawarecki mikaylagawarecki left a comment


Thanks

@mikaylagawarecki
Contributor

@pytorchbot merge -r

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 6, 2024
@mikaylagawarecki mikaylagawarecki added topic: not user facing topic category and removed ciflow/trunk Trigger trunk jobs on your pull request labels Nov 6, 2024
@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased gaussian-nll-loss-scalar-var onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout gaussian-nll-loss-scalar-var && git pull --rebase)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 3 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: raised by workflow job

Failing merge rule: Core Maintainers

@mikaylagawarecki
Contributor

Seems like CI is failing

@michael-diggin
Contributor Author

Seems like CI is failing

I think it didn't like that I had used `Tensor | float`, as that syntax isn't supported on Python 3.9. I've changed it to `Union[Tensor, float]`, which should be exactly equivalent.
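
For reference, a hedged sketch of the annotation difference; the signature below is illustrative, not copied verbatim from the PR:

```python
from typing import Union

from torch import Tensor

# PEP 604 syntax -- only evaluated at runtime on Python >= 3.10:
#     def gaussian_nll_loss(..., var: Tensor | float, ...) -> Tensor: ...

# typing.Union spelling -- equivalent, and also works on Python 3.9:
def gaussian_nll_loss(
    input: Tensor,
    target: Tensor,
    var: Union[Tensor, float],
    full: bool = False,
    eps: float = 1e-6,
    reduction: str = "mean",
) -> Tensor:
    ...
```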

@michael-diggin
Contributor Author

@mikaylagawarecki I've updated the PR, which should fix the failing CI. Could you re-trigger that workflow if you're happy with the update?

@michael-diggin
Contributor Author

Thanks! I think the only remaining failure is the bc-lint here: https://github.com/pytorch/pytorch/actions/runs/11784651031/job/32831020347?pr=138931#step:2:833
I believe it is a false positive, as `Tensor` -> `Union[Tensor, float]` is a backwards-compatible change.
If you agree, we can add the suppress-bc-linter label; I think that should report the failures as notifications rather than errors.

@michael-diggin
Contributor Author


Hi @mikaylagawarecki, would you be able to take a look at the comment above if you get a chance? I believe adding that label (following the docs here) and re-running the bc-lint check should fix the last remaining failing check. Thanks! 🙏

@mikaylagawarecki mikaylagawarecki added the suppress-bc-linter Suppresses the failures of API backward-compatibility linter (Lint/bc_linter) label Nov 15, 2024
@mikaylagawarecki
Contributor

@pytorchbot merge -r

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 15, 2024
@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased gaussian-nll-loss-scalar-var onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout gaussian-nll-loss-scalar-var && git pull --rebase)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 3 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: raised by workflow job

Failing merge rule: Core Maintainers

@michael-diggin
Contributor Author

michael-diggin commented Nov 20, 2024

Hi @mikaylagawarecki - sorry to bother you again (and hopefully last time!).
The one failing check looks like an unrelated flake based on the pytorch-bot comment here. Is it safe to ignore it, or can we rerun the failed test?

@mikaylagawarecki
Contributor

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@michael-diggin michael-diggin deleted the gaussian-nll-loss-scalar-var branch December 4, 2024 07:52
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
Fixes pytorch#138747

Adds support for `variance` being a Tensor or a float in `gaussian_nll_loss` to avoid a cpu-gpu sync point in the loss function, when the variance is a static tensor like `<scalar>*torch.ones_like(input)`

Pull Request resolved: pytorch#138931
Approved by: https://github.com/mikaylagawarecki

Labels

ciflow/trunk Trigger trunk jobs on your pull request
Merged
open source
release notes: nn release notes category
suppress-bc-linter Suppresses the failures of API backward-compatibility linter (Lint/bc_linter)
topic: not user facing topic category
triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module


Development

Successfully merging this pull request may close these issues.

Speed up gaussian_nll_loss when variance is always the same scalar value

5 participants