Add CUDA 12.6 Linux Builds to Binaries Matrix by tinglvv · Pull Request #138899 · pytorch/pytorch · GitHub

Conversation

@tinglvv
Collaborator

@tinglvv tinglvv commented Oct 25, 2024

@pytorch-bot

pytorch-bot bot commented Oct 25, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138899

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit d55064c with merge base ea0f60e:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Contributor

@malfet malfet left a comment

Why are we adding a new flavor? Let's delete something (for example 12.1)

@tinglvv tinglvv added the ciflow/binaries Trigger all binary build and upload jobs on the PR label Oct 25, 2024
@tinglvv
Collaborator Author

tinglvv commented Oct 25, 2024

Removing 12.1 for the nightly binary build per suggestion.
CI/docker images will be deprecated at a later stage.

@tinglvv tinglvv marked this pull request as ready for review October 29, 2024 22:06
@tinglvv tinglvv requested a review from a team as a code owner October 29, 2024 22:06
@tinglvv
Collaborator Author

tinglvv commented Oct 29, 2024

Not sure whether we should remove 12.1 from LINUX_BINARY_SMOKE_WORKFLOWS; removing it temporarily because of the error below:

tingl@tingl-mlt pytorch % sh .github/regenerate.sh 
Traceback (most recent call last):
  File "/Users/tingl/Documents/github/pytorch/.github/scripts/generate_ci_workflows.py", line 177, in <module>
    build_configs=generate_binary_build_matrix.generate_wheels_matrix(
  File "/Users/tingl/Documents/github/pytorch/.github/scripts/generate_binary_build_matrix.py", line 471, in generate_wheels_matrix
    "container_image": WHEEL_CONTAINER_IMAGES[arch_version],
KeyError: '12.1'
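
A minimal sketch of why the traceback above ends in `KeyError: '12.1'`, assuming the generator looks up each requested CUDA version in a table that no longer contains 12.1. Only `WHEEL_CONTAINER_IMAGES` and `generate_wheels_matrix` appear in the traceback; `CUDA_ARCHES`, `missing_arches`, the image names, and the simplified function body are hypothetical stand-ins, not the real contents of `generate_binary_build_matrix.py`.

```python
# Hypothetical, simplified stand-ins for the tables in
# .github/scripts/generate_binary_build_matrix.py.
CUDA_ARCHES = ["11.8", "12.4", "12.6"]  # 12.1 has been dropped in this sketch

# Placeholder image names, one per supported CUDA version.
WHEEL_CONTAINER_IMAGES = {
    arch: f"example.invalid/manylinux:cuda{arch}" for arch in CUDA_ARCHES
}


def generate_wheels_matrix(arches):
    """Build one config per arch; an unknown arch raises KeyError."""
    return [
        {"arch_version": arch, "container_image": WHEEL_CONTAINER_IMAGES[arch]}
        for arch in arches
    ]


def missing_arches(arches):
    """Return the requested arches that have no container image entry."""
    return [arch for arch in arches if arch not in WHEEL_CONTAINER_IMAGES]


# A smoke workflow still pinned to the removed version reproduces the failure.
try:
    generate_wheels_matrix(["12.1"])
except KeyError as exc:
    print("KeyError:", exc)

# Checking up front names every stale pin before the lookup fails.
print(missing_arches(["12.1", "12.6"]))
```

Under these assumptions, any caller that still passes `"12.1"` has to be updated or removed in the same change that drops the version from the table.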

@tinglvv tinglvv marked this pull request as draft October 29, 2024 22:19
"nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cuda-runtime-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cuda-cupti-cu12==12.6.80; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cudnn-cu12==9.1.0.70; platform_system == 'Linux' and platform_machine == 'x86_64' | "
Collaborator

This might be a good time to update CUDNN as well anyway?

Collaborator

No, let's not mix different updates (CUDA and cuDNN) into the same PR, but follow up separately.
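
For readers checking which pins a CUDA-only update touches, here is a rough sketch that splits the `|`-separated requirement strings from the diff excerpt above into name, version, and environment marker. The helper name `split_requirements` and the constant `REQS` are hypothetical; the real build scripts may parse these strings differently.

```python
# The four strings quoted in the review excerpt, joined as they appear there.
REQS = (
    "nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | "
    "nvidia-cuda-runtime-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | "
    "nvidia-cuda-cupti-cu12==12.6.80; platform_system == 'Linux' and platform_machine == 'x86_64' | "
    "nvidia-cudnn-cu12==9.1.0.70; platform_system == 'Linux' and platform_machine == 'x86_64' | "
)


def split_requirements(joined):
    """Split a '|'-joined requirement string into (name, version, marker) tuples."""
    entries = []
    for chunk in joined.split("|"):
        chunk = chunk.strip()
        if not chunk:  # skip the empty piece after the trailing separator
            continue
        # Separate the pin from the marker first, since the marker also contains '=='.
        spec, _, marker = chunk.partition(";")
        name, _, version = spec.strip().partition("==")
        entries.append((name, version, marker.strip()))
    return entries


for name, version, _marker in split_requirements(REQS):
    print(name, version)
```

In this excerpt the cuDNN pin (`9.1.0.70`) is the only entry that is not a 12.6.x CUDA component, which matches the decision to leave it for a separate follow-up.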

Contributor

@atalman atalman left a comment

Add an exception in generate_conda_matrix to not include any 12.6 builds. We don't want to add new conda builds for 12.6
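
One way the requested exception could look, as a sketch only: the body below is not the actual `generate_conda_matrix`, and `CUDA_ARCHES`, `CONDA_EXCLUDED_CUDA`, and the dictionary keys are assumed names used for illustration.

```python
# Hypothetical sketch; the real generate_conda_matrix has more fields and logic.
CUDA_ARCHES = ["11.8", "12.4", "12.6"]

# CUDA versions that should never get conda builds (assumed constant name).
CONDA_EXCLUDED_CUDA = {"12.6"}


def generate_conda_matrix(python_versions, arches=CUDA_ARCHES):
    """Return one entry per (python, arch) pair, skipping excluded CUDA versions."""
    matrix = []
    for python_version in python_versions:
        for arch in ["cpu", *arches]:
            if arch in CONDA_EXCLUDED_CUDA:
                continue  # no new conda builds for this CUDA version
            matrix.append(
                {"python_version": python_version, "gpu_arch_version": arch}
            )
    return matrix


matrix = generate_conda_matrix(["3.11", "3.12"])
print(sorted({entry["gpu_arch_version"] for entry in matrix}))
```

Keeping the exclusion in a named set rather than an inline comparison would make it easy to see, and later remove, the special case.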

@tinglvv
Collaborator Author

tinglvv commented Nov 8, 2024

The windows-binary-wheel error might be due to #138458, which set 12.4 as the default.

@tinglvv
Collaborator Author

tinglvv commented Nov 8, 2024

Linux aarch64 failures should be resolved after correcting the build script for aarch64.
windows-conda-build fails with:
Run actions/upload-artifact@v4.4.0
Error: No files were found with the provided path: C:\actions-runner\_work\_temp/artifacts. No artifacts will be uploaded

@tinglvv
Collaborator Author

tinglvv commented Nov 8, 2024

@pytorchbot rebase

@tinglvv tinglvv marked this pull request as ready for review November 8, 2024 21:55
@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/138899/head returned non-zero exit code 1

Rebasing (1/16)
Rebasing (2/16)
Rebasing (3/16)
Rebasing (4/16)
Rebasing (5/16)
Rebasing (6/16)
Auto-merging .github/workflows/generated-linux-binary-conda-nightly.yml
Auto-merging .github/workflows/generated-linux-binary-libtorch-cxx11-abi-nightly.yml
Auto-merging .github/workflows/generated-linux-binary-libtorch-pre-cxx11-nightly.yml
Auto-merging .github/workflows/generated-linux-binary-manywheel-main.yml
CONFLICT (content): Merge conflict in .github/workflows/generated-linux-binary-manywheel-main.yml
Auto-merging .github/workflows/generated-linux-binary-manywheel-nightly.yml
CONFLICT (content): Merge conflict in .github/workflows/generated-linux-binary-manywheel-nightly.yml
Auto-merging .github/workflows/generated-windows-binary-conda-nightly.yml
Auto-merging .github/workflows/generated-windows-binary-libtorch-debug-main.yml
Auto-merging .github/workflows/generated-windows-binary-libtorch-debug-nightly.yml
Auto-merging .github/workflows/generated-windows-binary-libtorch-release-main.yml
Auto-merging .github/workflows/generated-windows-binary-libtorch-release-nightly.yml
Auto-merging .github/workflows/generated-windows-binary-wheel-nightly.yml
error: could not apply 991a7019318... remove 12.1 from LINUX_BINARY_SMOKE_WORKFLOWS
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Could not apply 991a7019318... remove 12.1 from LINUX_BINARY_SMOKE_WORKFLOWS

Raised by https://github.com/pytorch/pytorch/actions/runs/11750112212

tinglvv and others added 4 commits November 8, 2024 15:51
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
atalman added a commit to atalman/pytorch that referenced this pull request Nov 12, 2024
pytorchmergebot pushed a commit that referenced this pull request Nov 12, 2024
Fixes Lint after: #138899
Due to a land race.
Run ``./regenerate.sh``
Pull Request resolved: #140446
Approved by: https://github.com/wdvr, https://github.com/huydhn, https://github.com/seemethere, https://github.com/malfet
q10 added a commit to q10/FBGEMM that referenced this pull request Nov 21, 2024
- Upgrade gcc version to support newer libstdc++, which is required now that
pytorch/pytorch#141035 has landed

- Deprecate support for CUDA 12.1 and add support for 12.6, per changes in
pytorch/pytorch#138899
q10 added a commit to q10/FBGEMM that referenced this pull request Nov 21, 2024
- Upgrade gcc version to support newer libstdc++, which is required now that
pytorch/pytorch#141035 has landed

- Deprecate support for CUDA 12.1 and add initial support for 12.6, per changes
in pytorch/pytorch#138899
facebook-github-bot pushed a commit to pytorch/FBGEMM that referenced this pull request Nov 21, 2024
Summary:
X-link: facebookresearch/FBGEMM#486

- Upgrade gcc version to support newer libstdc++, which is required now that
pytorch/pytorch#141035 has landed

- Deprecate support for CUDA 12.1 and add support for 12.6, per changes in
pytorch/pytorch#138899

Pull Request resolved: #3398

Reviewed By: sryap

Differential Revision: D66277492

Pulled By: q10

fbshipit-source-id: 24817efb5c07c1985ab3beeb1610879edbd81acc
@johnnynunez
Contributor

Which version in the end: 12.6, 12.6.2, or 12.6.3?
At CES 2025, RTX 50, RTX mobile, and maybe NVIDIA Arm hardware will be released, so I expect CUDA 12.7 to be released this month (December) and 12.8 to be released along with the new hardware.

@tinglvv
Collaborator Author

tinglvv commented Dec 3, 2024

Hi @johnnynunez

> Which version in the end: 12.6, 12.6.2, or 12.6.3? At CES 2025, RTX 50, RTX mobile, and maybe NVIDIA Arm hardware will be released, so I expect CUDA 12.7 to be released this month (December) and 12.8 to be released along with the new hardware.

For x86 nightly builds, it is 12.6.3 now (#141433). For Windows builds, it is 12.6.2, as the Windows AMI takes time to build and may not make it before the 2.6.0 code freeze. cc @atalman

pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
Related to pytorch#138440

Issue tracker: pytorch#138609

Version based on https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

Pull Request resolved: pytorch#138899
Approved by: https://github.com/atalman

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
q10 added a commit to q10/FBGEMM that referenced this pull request Apr 10, 2025
Summary:
Pull Request resolved: facebookresearch/FBGEMM#486

- Upgrade gcc version to support newer libstdc++, which is required now that
pytorch/pytorch#141035 has landed

- Deprecate support for CUDA 12.1 and add support for 12.6, per changes in
pytorch/pytorch#138899

X-link: pytorch#3398

Reviewed By: sryap

Differential Revision: D66277492

Pulled By: q10

fbshipit-source-id: 24817efb5c07c1985ab3beeb1610879edbd81acc
Labels

ciflow/binaries (Trigger all binary build and upload jobs on the PR), Merged, open source, skip-pr-sanity-checks, topic: not user facing (topic category)

10 participants