KEMBAR78
[EZ] Do not checkout builder for Linux builds by malfet · Pull Request #142282 · pytorch/pytorch · GitHub
Skip to content

Conversation

@malfet
Copy link
Contributor

@malfet malfet commented Dec 7, 2024

Stack from ghstack (oldest at bottom):

All logic should have been migrated to .ci/manywheel folder from builder repo a while back

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Dec 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/142282

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 38 New Failures

As of commit 4898673 with merge base 6e203ae (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

malfet added a commit that referenced this pull request Dec 7, 2024
All logic should have been migrated to .ci/manywheel folder from builder repo a while back

ghstack-source-id: 4c205c8
Pull Request resolved: #142282
@malfet malfet added the ciflow/binaries_wheel Trigger binary build and upload jobs for wheel on the PR label Dec 7, 2024
[ghstack-poisoned]
malfet added a commit that referenced this pull request Dec 9, 2024
All logic should have been migrated to .ci/manywheel folder from builder repo a while back

ghstack-source-id: 0c45d06
Pull Request resolved: #142282
Copy link
Contributor

@atalman atalman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@atalman
Copy link
Contributor

atalman commented Dec 9, 2024

[ghstack-poisoned]
malfet added a commit that referenced this pull request Dec 9, 2024
All logic should have been migrated to .ci/manywheel folder from builder repo a while back

ghstack-source-id: 7294d98
Pull Request resolved: #142282
@malfet
Copy link
Contributor Author

malfet commented Dec 9, 2024

@malfet Please rebase and also bring, check_binary.sh https://github.com/pytorch/pytorch/blob/main/.circleci/scripts/binary_linux_test.sh#L93

It's not used from build step, and I plan to keep test as is for now

@atalman
Copy link
Contributor

atalman commented Dec 9, 2024

@malfet builder is still used in test step. Why not eliminate dependency on builder completely ?

I see we are still getting builder here: https://github.com/pytorch/pytorch/actions/runs/12240753927/job/34146607849?pr=142282#step:11:40

@malfet
Copy link
Contributor Author

malfet commented Dec 9, 2024

@malfet builder is still used in test step. Why not eliminate dependency on builder completely ?

Yes, this is expected, I can do it in the followup PRs, this PR is specifically about build

@malfet
Copy link
Contributor Author

malfet commented Dec 9, 2024

@pytorchbot merge -f "Lint wheel builds are green"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@jeffdaily
Copy link
Collaborator

This broke rocm nightly builds.

@huydhn
Copy link
Contributor

huydhn commented Dec 10, 2024

This broke rocm nightly builds.

I was looking into this during my investigation of the broken check workflow lint job in trunk. The fix is here #142476, this is a landrace between #142294 abd #142282

@atalman
Copy link
Contributor

atalman commented Dec 10, 2024

I believe this should take care of rocm breakage due to missing check_binary.sh script: #142482

pytorchmergebot pushed a commit that referenced this pull request Dec 10, 2024
Fixes #142485

The workflow check lint job timed out in trunk, i.e. https://github.com/pytorch/pytorch/actions/runs/12261226178/job/34207762939, and here was what happened:

1. #142294 landed yesterday to build ROCm on 3.13, but the PR had a landrace with #142282 in the generated workflow file
2. The trunk lint check caught that in https://github.com/pytorch/pytorch/blob/main/.github/scripts/report_git_status.sh#L2
3. However, the script also attempted to print the difference with `git diff .github/workflows`.  This command was the one that stuck because `git diff` uses page by default and requires a prompt to display the next page ¯\_(ツ)_/¯

It took so long to debug this because a timeout Nova GHA doesn't print any progress.  I'll create an issue for this.

Bonus:

I also fix the broken print from test tool lint job that confuses GitHub https://github.com/pytorch/pytorch/actions/runs/12261226178 with an annotation failure `Credentials could not be loaded, please check your action inputs`

Pull Request resolved: #142476
Approved by: https://github.com/wdvr
mori360 pushed a commit to mori360/pytorch that referenced this pull request Dec 11, 2024
All logic should have been migrated to .ci/manywheel folder from builder repo a while back
Pull Request resolved: pytorch#142282
Approved by: https://github.com/atalman
ghstack dependencies: pytorch#142276, pytorch#142277, pytorch#142382
mori360 pushed a commit to mori360/pytorch that referenced this pull request Dec 11, 2024
Fixes pytorch#142485

The workflow check lint job timed out in trunk, i.e. https://github.com/pytorch/pytorch/actions/runs/12261226178/job/34207762939, and here was what happened:

1. pytorch#142294 landed yesterday to build ROCm on 3.13, but the PR had a landrace with pytorch#142282 in the generated workflow file
2. The trunk lint check caught that in https://github.com/pytorch/pytorch/blob/main/.github/scripts/report_git_status.sh#L2
3. However, the script also attempted to print the difference with `git diff .github/workflows`.  This command was the one that stuck because `git diff` uses page by default and requires a prompt to display the next page ¯\_(ツ)_/¯

It took so long to debug this because a timeout Nova GHA doesn't print any progress.  I'll create an issue for this.

Bonus:

I also fix the broken print from test tool lint job that confuses GitHub https://github.com/pytorch/pytorch/actions/runs/12261226178 with an annotation failure `Credentials could not be loaded, please check your action inputs`

Pull Request resolved: pytorch#142476
Approved by: https://github.com/wdvr
pytorchmergebot pushed a commit that referenced this pull request Dec 12, 2024
Follow Up after: #142282

Remove Checkout pytorch/builder for Linux Binary Builds
I believe we where not using builder already. Hence remove this checkout.
We should be using scripts from this folder:
```
/pytorch/.ci/${{ inputs.PACKAGE_TYPE }}/build.sh
```

TODO: Will followup with removing BUILDER_ROOT everywhere from PyTorch repo
Pull Request resolved: #143125
Approved by: https://github.com/kit1980
kit1980 added a commit that referenced this pull request Dec 12, 2024
…43131)

Follow Up after: #142282

Remove Checkout pytorch/builder for Linux Binary Builds
I believe we where not using builder already. Hence remove this checkout.
We should be using scripts from this folder:
```
/pytorch/.ci/${{ inputs.PACKAGE_TYPE }}/build.sh
```

TODO: Will followup with removing BUILDER_ROOT everywhere from PyTorch repo
Pull Request resolved: #143125
Approved by: https://github.com/kit1980

Co-authored-by: atalman <atalman@fb.com>
bluenote10 pushed a commit to bluenote10/pytorch that referenced this pull request Dec 14, 2024
All logic should have been migrated to .ci/manywheel folder from builder repo a while back
Pull Request resolved: pytorch#142282
Approved by: https://github.com/atalman
ghstack dependencies: pytorch#142276, pytorch#142277, pytorch#142382
bluenote10 pushed a commit to bluenote10/pytorch that referenced this pull request Dec 14, 2024
Fixes pytorch#142485

The workflow check lint job timed out in trunk, i.e. https://github.com/pytorch/pytorch/actions/runs/12261226178/job/34207762939, and here was what happened:

1. pytorch#142294 landed yesterday to build ROCm on 3.13, but the PR had a landrace with pytorch#142282 in the generated workflow file
2. The trunk lint check caught that in https://github.com/pytorch/pytorch/blob/main/.github/scripts/report_git_status.sh#L2
3. However, the script also attempted to print the difference with `git diff .github/workflows`.  This command was the one that stuck because `git diff` uses page by default and requires a prompt to display the next page ¯\_(ツ)_/¯

It took so long to debug this because a timeout Nova GHA doesn't print any progress.  I'll create an issue for this.

Bonus:

I also fix the broken print from test tool lint job that confuses GitHub https://github.com/pytorch/pytorch/actions/runs/12261226178 with an annotation failure `Credentials could not be loaded, please check your action inputs`

Pull Request resolved: pytorch#142476
Approved by: https://github.com/wdvr
bluenote10 pushed a commit to bluenote10/pytorch that referenced this pull request Dec 14, 2024
Follow Up after: pytorch#142282

Remove Checkout pytorch/builder for Linux Binary Builds
I believe we where not using builder already. Hence remove this checkout.
We should be using scripts from this folder:
```
/pytorch/.ci/${{ inputs.PACKAGE_TYPE }}/build.sh
```

TODO: Will followup with removing BUILDER_ROOT everywhere from PyTorch repo
Pull Request resolved: pytorch#143125
Approved by: https://github.com/kit1980
@github-actions github-actions bot deleted the gh/malfet/78/head branch January 11, 2025 02:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/binaries_wheel Trigger binary build and upload jobs for wheel on the PR Merged topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants