KEMBAR78
[Reland] Added reference tests to ReductionOpInfo by heitorschueroff · Pull Request #64273 · pytorch/pytorch · GitHub
Skip to content

Conversation

@heitorschueroff
Copy link
Contributor

@heitorschueroff heitorschueroff commented Aug 31, 2021

Stack from ghstack:

Reintroduced sample_inputs_prod and constrained the range of values for large reference tests.

This reverts commit e4fd2ab.

Differential Revision: D30672097

This reverts commit e4fd2ab.

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Aug 31, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit c06c0e6 (more details on the Dr. CI page):


  • 2/2 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_clang7_asan_test2 (1/2)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Sep 09 18:35:37 SUMMARY: UndefinedBehaviorSanit.../jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in
Sep 09 18:35:37     #9 0x55a5e0b678f2 in PyEval_EvalCode /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/ceval.c:731
Sep 09 18:35:37     #10 0x55a5e0bcfcd5 in run_mod /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/pythonrun.c:1025
Sep 09 18:35:37     #11 0x55a5e0bd1d5d in PyRun_StringFlags /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/pythonrun.c:949
Sep 09 18:35:37     #12 0x55a5e0bd1dbb in PyRun_SimpleStringFlags /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/pythonrun.c:445
Sep 09 18:35:37     #13 0x55a5e0bd2926 in run_command /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Modules/main.c:301
Sep 09 18:35:37     #14 0x55a5e0bd2926 in Py_Main /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Modules/main.c:749
Sep 09 18:35:37     #15 0x55a5e0b0c196 in main /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Programs/python.c:69
Sep 09 18:35:37     #16 0x7f472b5d283f in __libc_start_main /build/glibc-S7Ft5T/glibc-2.23/csu/../csu/libc-start.c:291
Sep 09 18:35:37     #17 0x55a5e0b9c33d in _start (/opt/conda/bin/python3.6+0x1a733d)
Sep 09 18:35:37 
Sep 09 18:35:37 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in 
Sep 09 18:35:37 + retcode=1
Sep 09 18:35:37 + set -e
Sep 09 18:35:37 + return 1
Sep 09 18:35:37 + [[ pytorch-linux-xenial-py3-clang7-asan-test2 == *-NO_AVX-* ]]
Sep 09 18:35:37 + [[ '' == \n\o\g\p\u\_\N\O\_\A\V\X ]]
Sep 09 18:35:37 + [[ pytorch-linux-xenial-py3-clang7-asan-test2 == *-NO_AVX2-* ]]
Sep 09 18:35:37 + [[ '' == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]]
Sep 09 18:35:37 + [[ pytorch-linux-xenial-py3-clang7-asan-test2 == *-NO_AVX512-* ]]
Sep 09 18:35:37 + [[ '' == \n\o\g\p\u\_\N\O\_\A\V\X\5\1\2 ]]
Sep 09 18:35:37 + [[ pytorch-linux-xenial-py3-clang7-asan-test2 != *coverage* ]]

See CircleCI build pytorch_windows_vs2019_py38_cuda10.1_on_cpu_test1 (2/2)

Step: "Install Cuda" (full log | diagnosis details | 🔁 rerun)

ls: cannot access '/c/Program Files/NVIDIA GPU ...UDA/v10.1/bin/nvcc.exe': No such file or directory

Folders: 11
Files: 130
Size:       907512
Compressed: 111420
+ mkdir -p 'C:/Program Files/NVIDIA Corporation/NvToolsExt'
+ cp -r NvToolsExt/bin NvToolsExt/docs NvToolsExt/include NvToolsExt/lib NvToolsExt/samples 'C:/Program Files/NVIDIA Corporation/NvToolsExt/'
+ export 'NVTOOLSEXT_PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt\'
+ NVTOOLSEXT_PATH='C:\Program Files\NVIDIA Corporation\NvToolsExt\'
+ ls '/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/bin/nvcc.exe'
ls: cannot access '/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/bin/nvcc.exe': No such file or directory
+ echo 'CUDA installation failed'
CUDA installation failed
+ mkdir -p /c/w/build-results
+ 7z a 'c:\w\build-results\cuda_install_logs.7z' cuda_install_logs

7-Zip 19.00 (x64) : Copyright (c) 1999-2018 Igor Pavlov : 2019-02-21

Scanning the drive:
1 folder, 2 files, 3721950 bytes (3635 KiB)


1 job timed out:

  • pytorch_linux_xenial_py3_clang7_asan_test2

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

heitorschueroff added a commit that referenced this pull request Aug 31, 2021
This reverts commit e4fd2ab.

ghstack-source-id: d05ab79
Pull Request resolved: #64273
@heitorschueroff
Copy link
Contributor Author

@heitorschueroff has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Reintroduced sample_inputs_prod and constrained the range of values for large reference tests.

This reverts commit e4fd2ab.

Differential Revision: [D30672097](https://our.internmc.facebook.com/intern/diff/D30672097)

[ghstack-poisoned]
heitorschueroff added a commit that referenced this pull request Sep 2, 2021
This reverts commit e4fd2ab.

ghstack-source-id: cf384f3
Pull Request resolved: #64273
@heitorschueroff
Copy link
Contributor Author

@heitorschueroff has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mruberry
Copy link
Collaborator

mruberry commented Sep 6, 2021

fyi paralleltbb failure is unrelated and now resolved

@mruberry
Copy link
Collaborator

mruberry commented Sep 6, 2021

@albanD you're shepherding this, right?

@albanD
Copy link
Collaborator

albanD commented Sep 6, 2021

@mruberry yes I can take a look.

Reintroduced sample_inputs_prod and constrained the range of values for large reference tests.

This reverts commit e4fd2ab.

Differential Revision: [D30672097](https://our.internmc.facebook.com/intern/diff/D30672097)

[ghstack-poisoned]
@heitorschueroff
Copy link
Contributor Author

@albanD This should be ready for another review

@heitorschueroff
Copy link
Contributor Author

@heitorschueroff has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Reintroduced sample_inputs_prod and constrained the range of values for large reference tests.

This reverts commit e4fd2ab.

Differential Revision: [D30672097](https://our.internmc.facebook.com/intern/diff/D30672097)

[ghstack-poisoned]
heitorschueroff added a commit that referenced this pull request Sep 7, 2021
This reverts commit e4fd2ab.

ghstack-source-id: 21b5cfe
Pull Request resolved: #64273
@heitorschueroff
Copy link
Contributor Author

@heitorschueroff has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Collaborator

@albanD albanD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super happy with all the tolerance overrides but I think the PR is good :)

elif isinstance(x, dict):
return {k: to_numpy(v) for k, v in x.items()}
elif isinstance(x, torch.dtype):
return torch.empty(0, dtype=x).numpy().dtype
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting move :D
Maybe we can re-use:

numpy_to_torch_dtype_dict = {
?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super happy with all the tolerance overrides but I think the PR is good :)

Could you elaborate more on your concerns about the tolerance overrides? Do you think we should skip the tests instead? Or if it a concern with the numerical instability they expose?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so many functions need to override the default value, maybe the default is not good?
Or we want to run these in double to make sure they are correct (like we do for gradcheck)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is mostly with sum and prod for float16. I changed the decorator to override the tolerance for float16 for all tests. Some tests are specifically testing numerical stability so it's expected we need to override the tolerance, this is also a way to document numerical instability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the large reference tests from float32 to float64

Reintroduced sample_inputs_prod and constrained the range of values for large reference tests.

This reverts commit e4fd2ab.

Differential Revision: [D30672097](https://our.internmc.facebook.com/intern/diff/D30672097)

[ghstack-poisoned]
Reintroduced sample_inputs_prod and constrained the range of values for large reference tests.

This reverts commit e4fd2ab.

Differential Revision: [D30672097](https://our.internmc.facebook.com/intern/diff/D30672097)

[ghstack-poisoned]
heitorschueroff added a commit that referenced this pull request Sep 9, 2021
This reverts commit e4fd2ab.

ghstack-source-id: 326b073
Pull Request resolved: #64273
@heitorschueroff
Copy link
Contributor Author

@albanD What else is this PR missing to get accepted?

@heitorschueroff
Copy link
Contributor Author

@heitorschueroff has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Reintroduced sample_inputs_prod and constrained the range of values for large reference tests.

This reverts commit e4fd2ab.

Differential Revision: [D30672097](https://our.internmc.facebook.com/intern/diff/D30672097)

[ghstack-poisoned]
heitorschueroff added a commit that referenced this pull request Sep 9, 2021
This reverts commit e4fd2ab.

ghstack-source-id: ace17e9
Pull Request resolved: #64273
@heitorschueroff
Copy link
Contributor Author

@heitorschueroff has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Collaborator

@albanD albanD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update!

@facebook-github-bot
Copy link
Contributor

@heitorschueroff merged this pull request in 8535418.

@facebook-github-bot facebook-github-bot deleted the gh/heitorschueroff/20/head branch September 16, 2021 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants