enabled AT_USE_JITERATOR() for `tan` and `tanh` kernels. by parth-desai · Pull Request #102427 · pytorch/pytorch · GitHub

Conversation

@parth-desai
Contributor

@parth-desai parth-desai commented May 27, 2023

This PR fixes the test failures in the jiterator implementation of the `tan` and `tanh` unary kernels, as described in #100842.

Most of the failures were fixed by adjusting tolerances, but some in test_unary_ufuncs.py required adjusting input values as well. Since the jiterator kernels use libstdc++, the supported input range is smaller than that of the thrust implementation.
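As an aside on the input-range point, the difference is easy to see with a standalone cmath sketch (illustrative only, not the actual kernel code): an implementation that saturates tanh for large arguments stays finite where a naive exponential formulation overflows double precision.

```python
import cmath
import math

# tan(ib) = i * tanh(b); a saturating implementation returns +/-1j
# once b is large, without ever forming the huge intermediate exp(b).
z = complex(0.0, 750.0)
print(cmath.tan(z))   # 1j

# The naive building blocks overflow long before that point:
# sinh(750) would exceed the double-precision range (~1.8e308).
try:
    math.sinh(750.0)
except OverflowError:
    print("naive formulation overflows before tanh saturates")
```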

@pytorch-bot

pytorch-bot bot commented May 27, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/102427

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit 8b29a12 with merge base c323944:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: foreach_frontend release notes category label May 27, 2023
@linux-foundation-easycla

linux-foundation-easycla bot commented May 27, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@lezcano lezcano requested a review from peterbell10 May 28, 2023 17:56
@peterbell10 peterbell10 added the module: testing Issues related to the torch.testing module (not tests) label May 29, 2023
@parth-desai parth-desai marked this pull request as ready for review May 29, 2023 23:15
@parth-desai
Contributor Author

@mruberry @peterbell10 @ngimel Can you let me know if I need to format my code or update documentation anywhere for this change?

Collaborator

@peterbell10 peterbell10 left a comment

The test failures look real, e.g.

test_unary_ufuncs.py::TestUnaryUfuncsCUDA::test_reference_numerics_extremal_asinh_cuda_complex64 - AssertionError: Tensor-likes are not close!

@albanD albanD added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label May 31, 2023
@parth-desai
Contributor Author

@peterbell10 Could you take a look at this again?

@peterbell10
Collaborator

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/102427/head returned non-zero exit code 1

Rebasing (1/10)
Rebasing (2/10)
Rebasing (3/10)
Rebasing (4/10)
Auto-merging test/test_foreach.py
Auto-merging test/test_unary_ufuncs.py
Auto-merging torch/testing/_internal/common_methods_invocations.py
CONFLICT (content): Merge conflict in torch/testing/_internal/common_methods_invocations.py
error: could not apply a49f5db04f0... Added tolerance decorator to opinfo
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply a49f5db04f0... Added tolerance decorator to opinfo

Raised by https://github.com/pytorch/pytorch/actions/runs/5280395913
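The hints in the log describe the usual manual fix. As a self-contained demonstration of that conflict/resolve/continue flow (run in a throwaway repo; the file and branch names here are made up for illustration):

```shell
# Throwaway demo of the conflict/resolve/continue flow from the
# rebase hints above (repo, file, and branch names are invented).
repo=$(mktemp -d)
git init -q -b main "$repo"
cd "$repo"
git config user.email dev@example.com
git config user.name dev
echo base > opinfo.py && git add opinfo.py && git commit -qm base

git checkout -qb feature
echo feature-tols > opinfo.py && git commit -qam "feature tolerances"

git checkout -q main
echo main-tols > opinfo.py && git commit -qam "main tolerances"

git checkout -q feature
git rebase main || true          # stops with CONFLICT, like the log above
echo merged-tols > opinfo.py     # resolve the conflict by hand
git add opinfo.py
GIT_EDITOR=true git rebase --continue
```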

@peterbell10
Collaborator

@parth-desai would you mind resolving the merge conflicts? I'm hoping a rebase would resolve the CI issues.

@parth-desai parth-desai force-pushed the fix-jitterator-unary-op branch from 0e65d90 to dd03d66 Compare June 18, 2023 23:57
@parth-desai
Contributor Author

> @parth-desai would you mind resolving the merge conflicts? I'm hoping a rebase would resolve the CI issues.

@peterbell10 I merged the latest main to my branch. Please review again.

@peterbell10 peterbell10 added the keep-going Don't stop on first failure, keep running tests until the end label Jun 19, 2023
@peterbell10
Collaborator

This test failure looks related:

2023-06-19T15:19:25.1218162Z =================================== FAILURES ===================================
2023-06-19T15:19:25.1218410Z _ TestUnaryUfuncsCUDA.test_reference_numerics_extremal_nn_functional_tanhshrink_cuda_bfloat16 _
2023-06-19T15:19:25.1218592Z Traceback (most recent call last):
2023-06-19T15:19:25.1218848Z   File "/var/lib/jenkins/workspace/test/test_unary_ufuncs.py", line 315, in test_reference_numerics_extremal
2023-06-19T15:19:25.1219016Z     self._test_reference_numerics(dtype, op, tensors)
2023-06-19T15:19:25.1219246Z   File "/var/lib/jenkins/workspace/test/test_unary_ufuncs.py", line 260, in _test_reference_numerics
2023-06-19T15:19:25.1219375Z     _helper_reference_numerics(
2023-06-19T15:19:25.1219613Z   File "/var/lib/jenkins/workspace/test/test_unary_ufuncs.py", line 206, in _helper_reference_numerics
2023-06-19T15:19:25.1219749Z     self.assertEqualHelper(
2023-06-19T15:19:25.1219974Z   File "/var/lib/jenkins/workspace/test/test_unary_ufuncs.py", line 171, in assertEqualHelper
2023-06-19T15:19:25.1220077Z     self.assertEqual(
2023-06-19T15:19:25.1220472Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3099, in assertEqual
2023-06-19T15:19:25.1220610Z     raise error_metas[0].to_error(
2023-06-19T15:19:25.1220830Z AssertionError: Tensor-likes are not close!
2023-06-19T15:19:25.1220854Z 
2023-06-19T15:19:25.1220993Z Mismatched elements: 565 / 16384 (3.4%)

It's strange though, because this PR should only affect complex dtypes, but the failing test is for bfloat16.
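The summary line in that traceback boils down to counting the elements that violate the atol/rtol closeness check. A plain-Python sketch of that rule, with made-up values and tolerances (not PyTorch's defaults):

```python
actual   = [1.0, 2.0, 3.001, 4.5]    # made-up kernel outputs
expected = [1.0, 2.0, 3.0,   4.0]    # made-up reference values
atol, rtol = 1e-5, 1e-5              # illustrative, not PyTorch defaults

# An element mismatches when |a - e| > atol + rtol * |e|.
mismatched = sum(
    abs(a - e) > atol + rtol * abs(e) for a, e in zip(actual, expected)
)
total = len(expected)
print(f"Mismatched elements: {mismatched} / {total} "
      f"({100 * mismatched / total:.1f}%)")
# -> Mismatched elements: 2 / 4 (50.0%)
```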

@parth-desai
Contributor Author

@peterbell10 I couldn't reproduce the error on my local machine, even with the CI Docker image and build scripts. Any suggestions on fixing this test?

@parth-desai parth-desai force-pushed the fix-jitterator-unary-op branch from dd03d66 to bc8f744 Compare June 21, 2023 16:58
@peterbell10
Collaborator

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Tried to rebase and push PR #102427, but it was already up to date. Try rebasing against main by issuing:
@pytorchbot rebase -b main

Collaborator

@peterbell10 peterbell10 left a comment

Just FYI, main often has some test failures from PRs that need to be reverted. You're best off rebasing against viable/strict, which should be the most recent commit where CI passed. That should improve the signal-to-noise ratio on CI.

I've set the keep-going flag so you should see all the test failures, and it looks like there are still quite a lot of relevant CI failures for complex dtypes.

Collaborator

You should remove these comments as well.

Collaborator

@peterbell10 peterbell10 Jun 21, 2023

The bfloat16 failure is due to this toleranceOverride taking precedence over the one above for bfloat16. I can get the test to pass locally by copying bfloat16's tol term from above here.
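A hypothetical sketch of that shadowing behavior: when two per-dtype tolerance override dicts stack, the inner one wins on key collisions, so any dtype it names loses the outer, looser tolerance. The names and values below are illustrative, not PyTorch's actual ones.

```python
# Two stacked per-dtype tolerance dicts; the inner (later) one wins
# on key collisions, just like a plain dict merge.
outer = {"bfloat16": (1e-2, 1e-2), "float16": (1e-3, 1e-3)}   # (atol, rtol)
inner = {"bfloat16": (1e-5, 1e-5), "complex64": (1e-4, 1e-4)}

effective = {**outer, **inner}
print(effective["bfloat16"])   # (1e-05, 1e-05): far stricter than intended

# The fix described above: carry the outer bfloat16 term into the
# inner override so it is not silently tightened.
inner["bfloat16"] = outer["bfloat16"]
effective = {**outer, **inner}
print(effective["bfloat16"])   # (0.01, 0.01)
```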

@parth-desai parth-desai force-pushed the fix-jitterator-unary-op branch from bc8f744 to 5b1e327 Compare June 25, 2023 05:29
@parth-desai
Contributor Author

@peterbell10 Thanks for the pointer. I didn't realize my environment had issues and my build was not running CUDA tests. Anyway, I have fixed the bfloat16 issue and rebased onto viable/strict. Could you run the CI again to see if there are any further failures?

@parth-desai parth-desai force-pushed the fix-jitterator-unary-op branch from f77ead2 to 6884e84 Compare August 30, 2023 03:19
@parth-desai
Contributor Author

@peterbell10 The failures now seem to be CI infrastructure failures. Can you take a look?

@huydhn
Contributor

huydhn commented Sep 1, 2023

@pytorchbot drci

@huydhn
Contributor

huydhn commented Sep 1, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@parth-desai
Contributor Author

@huydhn Any suggestions on how to fix these?

@huydhn
Contributor

huydhn commented Sep 1, 2023

@pytorchbot merge -f 'Bypass unrelated issues as Dr.CI has classified them as flaky'

Sorry, I was about to force merge your change but got sidetracked by other tasks.

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@huydhn
Contributor

huydhn commented Sep 1, 2023

And thank you for your first contribution to PyTorch (IIRC) :)

@parth-desai
Contributor Author

Thank you @huydhn and @peterbell10 for your help.


Labels

ciflow/trunk - Trigger trunk jobs on your pull request
keep-going - Don't stop on first failure, keep running tests until the end
Merged
module: testing - Issues related to the torch.testing module (not tests)
open source
release notes: foreach_frontend - release notes category
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

7 participants