Open
Labels
module: flaky-tests (Problem is a flaky test in CI); module: rocm (AMD GPU support for Pytorch); triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)
Description
🐛 Bug
test_events_wait under pytorch-linux-bionic-rocm4.2-py3.6 seems to be a flaky test, based on my discussion with @mruberry.
To Reproduce
Steps to reproduce the behavior:
- This issue reproduces occasionally when test_events_wait is run on Jenkins.
Expected behavior
The test should produce the same result on every run, either always passing or always failing.
Environment
The test is run by Jenkins in the automated test environment.
Additional context
Here is an excerpt of the console log. An example of the full log is here.
15:45:13 FAIL [0.093s]: test_events_wait (__main__.TestCuda)
15:45:13 ----------------------------------------------------------------------
15:45:13 Traceback (most recent call last):
15:45:13 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1071, in wrapper
15:45:13 method(*args, **kwargs)
15:45:13 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1071, in wrapper
15:45:13 method(*args, **kwargs)
15:45:13 File "test_cuda.py", line 1168, in test_events_wait
15:45:13 self.assertTrue(s0.query())
15:45:13 AssertionError: False is not true
15:45:13
15:45:14 ----------------------------------------------------------------------
15:45:14 Ran 164 tests in 43.224s
15:45:14
15:45:14 FAILED (failures=1, skipped=22)
15:45:14
15:45:14 Generating XML reports...
15:45:14 Generated XML report: test-reports/python-unittest/test_cuda/TEST-TestCuda-20210802154430.xml
15:45:14 Generated XML report: test-reports/python-unittest/test_cuda/TEST-TestCudaComm-20210802154430.xml
15:45:16 Traceback (most recent call last):
15:45:16 File "test/run_test.py", line 1092, in <module>
15:45:16 main()
15:45:16 File "test/run_test.py", line 1071, in main
15:45:16 raise RuntimeError(err_message)
15:45:16 RuntimeError: test_cuda failed!
15:45:17
15:45:17 real 10m30.614s
15:45:17 user 12m59.565s
15:45:17 sys 6m10.787s
15:45:17 + cleanup
15:45:17 + retcode=1
15:45:17 + set +x
15:45:17 =================== sccache compilation log ===================
15:45:17 =========== If your build fails, please take a look at the log above for possible reasons ===========
15:45:17 Compile requests 0
15:45:17 Compile requests executed 0
15:45:17 Cache hits 0
15:45:17 Cache misses 0
15:45:17 Cache timeouts 0
15:45:17 Cache read errors 0
15:45:17 Forced recaches 0
15:45:17 Cache write errors 0
15:45:17 Compilation failures 0
15:45:17 Cache errors 0
15:45:17 Non-cacheable compilations 0
15:45:17 Non-cacheable calls 0
15:45:17 Non-compilation calls 0
15:45:17 Unsupported compiler calls 0
15:45:17 Average cache write 0.000 s
15:45:17 Average cache read miss 0.000 s
15:45:17 Average cache read hit 0.000 s
15:45:17 Cache location Local disk: "/var/lib/jenkins/.cache/sccache"
15:45:17 Cache size 0 bytes
15:45:17 Max cache size 10 GiB
15:45:17 Stopping sccache server...
15:45:17 Compile requests 0
15:45:17 Compile requests executed 0
15:45:17 Cache hits 0
15:45:17 Cache misses 0
15:45:17 Cache timeouts 0
15:45:17 Cache read errors 0
15:45:17 Forced recaches 0
15:45:17 Cache write errors 0
15:45:17 Compilation failures 0
15:45:17 Cache errors 0
15:45:17 Non-cacheable compilations 0
15:45:17 Non-cacheable calls 0
15:45:17 Non-compilation calls 0
15:45:17 Unsupported compiler calls 0
15:45:17 Average cache write 0.000 s
15:45:17 Average cache read miss 0.000 s
15:45:17 Average cache read hit 0.000 s
15:45:17 Cache location Local disk: "/var/lib/jenkins/.cache/sccache"
15:45:17 Cache size 0 bytes
15:45:17 Max cache size 10 GiB
15:45:17 + echo 'Stopping container...'
15:45:17 Stopping container...
15:45:17 + '[' -n '' ']'
15:45:17 + docker rm -f 0f5b4d4e01941a4f7246cef9efdc5352fc45a104cac4a94b64b7a0e998da4c0e
15:45:18 Build step 'Execute shell' marked build as failure
15:45:18 [xUnit] [INFO] - Starting to record.
15:45:18 [xUnit] [INFO] - Processing JUnit
15:45:18 [xUnit] [INFO] - [JUnit] - No test report file(s) were found with the pattern 'test-*.xml' relative to '/var/lib/jenkins/workspace/pytorch-builds/pytorch-linux-bionic-rocm4.2-py3.6-test1' for the testing framework 'JUnit'. Did you enter a pattern relative to the correct directory? Did you generate the result report(s) for 'JUnit'?
15:45:18 [xUnit] [WARNING] - No test reports found for the metric 'JUnit' with the resolved pattern 'test-*.xml'.
15:45:18 [xUnit] [INFO] - Skipping the metric tool processing.
15:45:18 [xUnit] [INFO] - There are errors when processing test results.
15:45:18 [xUnit] [INFO] - Skipping tests recording.
15:45:18 [BFA] Scanning build for known causes...
15:45:18 [BFA] No failure causes found
15:45:18 [BFA] Done. 0s
15:45:18 Finished: FAILURE