Closed
Labels
ci: sev (critical failure affecting PyTorch CI)
ci: sev-infra.thirdparty (labels a CI SEV that is caused by infra; the trigger of the issue is a third party)
Description
Current Status
Recovered
Error looks like
2023-09-04T13:50:49.2687734Z Download action repository 'pytorch/test-infra@main' (SHA:b7c64078de18cc53be1628e0756896165482c35b)
2023-09-04T13:50:49.5711762Z Download action repository 'pytorch/pytorch@main' (SHA:7e878c9d10e134deb61d2104243e14ef1f3ac291)
2023-09-04T13:52:29.5790374Z ##[warning]Failed to download action 'https://api.github.com/repos/pytorch/pytorch/tarball/7e878c9d10e134deb61d2104243e14ef1f3ac291'. Error: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
2023-09-04T13:52:29.5799344Z ##[warning]Back off 15.866 seconds before retry.
2023-09-04T13:54:25.4572998Z ##[warning]Failed to download action 'https://api.github.com/repos/pytorch/pytorch/tarball/7e878c9d10e134deb61d2104243e14ef1f3ac291'. Error: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
2023-09-04T13:54:25.4574279Z ##[warning]Back off 17.89 seconds before retry.
2023-09-04T13:56:23.3514907Z ##[error]The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
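The timestamps in this excerpt can be checked against the reported 100-second `HttpClient.Timeout`. The following is a minimal sketch, not part of the runner itself: the helper names `parse` and `attempt_durations` are hypothetical, and it assumes each retry starts exactly when the logged back-off ends.

```python
from datetime import datetime, timedelta

# Timestamps and back-off values copied from the log excerpt above.
download_started = "2023-09-04T13:50:49.5711762Z"
failures = [
    ("2023-09-04T13:52:29.5790374Z", 15.866),  # first failure, then back-off
    ("2023-09-04T13:54:25.4572998Z", 17.89),   # second failure, then back-off
    ("2023-09-04T13:56:23.3514907Z", None),    # final error, no further retry
]


def parse(ts: str) -> datetime:
    """Parse a runner timestamp (hypothetical helper).

    The log carries 7 fractional digits; datetime supports 6, so the
    string is truncated to microsecond precision before parsing.
    """
    return datetime.strptime(ts[:26], "%Y-%m-%dT%H:%M:%S.%f")


def attempt_durations(start: str, failed_attempts) -> list[float]:
    """Return how long each download attempt ran, in seconds.

    Assumption: the next attempt begins at failure time plus back-off.
    """
    durations = []
    attempt_start = parse(start)
    for failed_at, backoff in failed_attempts:
        failed = parse(failed_at)
        durations.append((failed - attempt_start).total_seconds())
        if backoff is not None:
            attempt_start = failed + timedelta(seconds=backoff)
    return durations


if __name__ == "__main__":
    for i, seconds in enumerate(attempt_durations(download_started, failures), 1):
        print(f"attempt {i}: {seconds:.3f} s")
```

Under that assumption, each of the three attempts comes out within a fraction of a second of 100 s, which suggests every attempt ran until the client-side timeout rather than failing fast.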
Incident timeline (time zone noted per entry)
04/09 18:50 CEST - Incident detected
05/09 04:28 UTC - GHA incident closed at github side
05/09 05:00 UTC - CI jobs recovered; began gatekeeping PRs
05/09 05:30 UTC - Merge blocking removed
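Because the entries mix CEST and UTC, the overall duration is easier to read once everything is in one zone. A small sketch, assuming the dates are DD/MM in 2023 (the year is taken from the log timestamps) and that CEST is UTC+2; the variable and function names are illustrative only:

```python
from datetime import datetime, timedelta, timezone

CEST = timezone(timedelta(hours=2))  # assumption: CEST is UTC+2
YEAR = 2023                          # assumption: year taken from the log timestamps

# First and last timeline entries, read as DD/MM.
detected = datetime(YEAR, 9, 4, 18, 50, tzinfo=CEST)
unblocked = datetime(YEAR, 9, 5, 5, 30, tzinfo=timezone.utc)


def incident_duration(start: datetime, end: datetime) -> timedelta:
    """Elapsed time between two timezone-aware datetimes."""
    return end - start


if __name__ == "__main__":
    print("detected (UTC):", detected.astimezone(timezone.utc).isoformat())
    print("duration:", incident_duration(detected, unblocked))
```

With these assumptions, detection corresponds to 16:50 UTC on 04/09, and merge blocking was lifted 12 hours 40 minutes later.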
User impact
Most CI jobs failed with timeouts when reaching the GitHub API
Root cause
Investigating if related to GH API incident: https://www.githubstatus.com/incidents/76xp2jd3px64
Mitigation
How did we mitigate the issue?
Prevention/followups
How do we prevent issues like this in the future?