GH API issues are preventing multiple ci jobs to start. · Issue #108524 · pytorch/pytorch · GitHub

@jeanschmidt

Description

Current Status

Recovered

The error looks like this:

2023-09-04T13:50:49.2687734Z Download action repository 'pytorch/test-infra@main' (SHA:b7c64078de18cc53be1628e0756896165482c35b)
2023-09-04T13:50:49.5711762Z Download action repository 'pytorch/pytorch@main' (SHA:7e878c9d10e134deb61d2104243e14ef1f3ac291)
2023-09-04T13:52:29.5790374Z ##[warning]Failed to download action 'https://api.github.com/repos/pytorch/pytorch/tarball/7e878c9d10e134deb61d2104243e14ef1f3ac291'. Error: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
2023-09-04T13:52:29.5799344Z ##[warning]Back off 15.866 seconds before retry.
2023-09-04T13:54:25.4572998Z ##[warning]Failed to download action 'https://api.github.com/repos/pytorch/pytorch/tarball/7e878c9d10e134deb61d2104243e14ef1f3ac291'. Error: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
2023-09-04T13:54:25.4574279Z ##[warning]Back off 17.89 seconds before retry.
2023-09-04T13:56:23.3514907Z ##[error]The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
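In the log, the runner tries three times to download the `pytorch/pytorch` action tarball. Each attempt is cut off by the 100-second `HttpClient.Timeout`, and the runner waits 15.866 s and then 17.89 s before retrying. The Python sketch below reproduces that timeout-then-back-off pattern so the failure mode is easier to reason about. It is a hypothetical illustration, not the GitHub Actions runner's code: `download_with_retries`, the injected `fetch` callable, and the 10 to 20 s random back-off range are all assumptions chosen to match the delays seen in the log.

```python
import random
import time
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def download_with_retries(
    fetch: Callable[[float], T],
    attempts: int = 3,
    timeout: float = 100.0,
    backoff_range: Tuple[float, float] = (10.0, 20.0),
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call fetch(timeout) up to `attempts` times, backing off between tries.

    Hypothetical helper: it mirrors the pattern in the runner log (100 s
    timeout, randomized back-off, 3 attempts) but is NOT the runner's code.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    if rng is None:
        rng = random.Random()
    for attempt in range(1, attempts + 1):
        try:
            return fetch(timeout)
        except TimeoutError as exc:
            if attempt == attempts:
                # Out of attempts: surface the error, like the final ##[error] line.
                raise
            # Assumed back-off policy: uniform random delay within backoff_range.
            delay = rng.uniform(*backoff_range)
            print(f"warning: attempt {attempt} failed ({exc}); back off {delay:.3f}s")
            sleep(delay)
    raise AssertionError("unreachable")
```

Injecting `sleep` and `rng` keeps the helper testable without waiting out real delays. Note that when every attempt hits the full timeout, as in the log, a single action download blocks the job for roughly 5.5 minutes before it fails.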

Incident timeline (dates are DD/MM; each entry states its own time zone)

04/09 18:50 CEST (16:50 UTC) - Incident detected
05/09 04:28 UTC - GHA incident closed on GitHub's side
05/09 05:00 UTC - We observed recovery in our CI jobs and started gatekeeping PRs
05/09 05:30 UTC - Merge blocking removed
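Because the timeline mixes CEST and UTC, the durations are easiest to read after converting every entry to UTC. The Python sketch below does that with the standard `datetime` module. It assumes the dates are day/month in 2023 (matching the `2023-09-04` log timestamps) and that CEST is UTC+2; the `timeline` list and its event labels are paraphrased from the entries above for illustration.

```python
from datetime import datetime, timedelta, timezone

# CEST (Central European Summer Time) is UTC+2.
CEST = timezone(timedelta(hours=2), "CEST")

# Timeline entries from the report; assumed year 2023, dates read as DD/MM.
timeline = [
    (datetime(2023, 9, 4, 18, 50, tzinfo=CEST), "Incident detected"),
    (datetime(2023, 9, 5, 4, 28, tzinfo=timezone.utc), "GHA incident closed on GitHub's side"),
    (datetime(2023, 9, 5, 5, 0, tzinfo=timezone.utc), "CI jobs recovered"),
    (datetime(2023, 9, 5, 5, 30, tzinfo=timezone.utc), "Merge blocking removed"),
]

detected = timeline[0][0]
for when, event in timeline:
    utc = when.astimezone(timezone.utc)
    # Elapsed time since detection; aware datetimes subtract correctly across zones.
    print(f"{utc:%Y-%m-%d %H:%M} UTC  (+{when - detected})  {event}")
```

Normalized this way, detection falls at 16:50 UTC on 04/09, CI recovery comes 12 h 10 min later, and merge blocking is lifted 12 h 40 min after detection.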

User impact

Most CI jobs fail because their requests to the GitHub API time out.

Root cause

Investigating whether this is related to the GitHub API incident: https://www.githubstatus.com/incidents/76xp2jd3px64

Mitigation

How did we mitigate the issue?

Prevention/followups

How do we prevent issues like this in the future?

Metadata

Assignees

No one assigned

    Labels

    ci: sev (critical failure affecting PyTorch CI)
    ci: sev-infra.thirdparty (labels a CI SEV that is caused by infra; the trigger of the issue is a third party)