[CD] fix xpu support packages version by chuanqi129 · Pull Request #138189 · pytorch/pytorch · GitHub

Conversation

@chuanqi129
Collaborator

Works for #114850

@chuanqi129 chuanqi129 requested a review from jeffdaily as a code owner October 17, 2024 08:48
@pytorch-bot

pytorch-bot bot commented Oct 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138189

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures

As of commit e54fc4e with merge base 9c084cc:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Oct 17, 2024
@chuanqi129 chuanqi129 requested a review from atalman October 17, 2024 08:48
@drisspg drisspg added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Oct 18, 2024
@chuanqi129 chuanqi129 requested a review from EikanWang October 18, 2024 07:51
@EikanWang EikanWang added this to the 2.5.1 milestone Oct 21, 2024
Comment on lines 50 to +53
 if [ -n "$XPU_VERSION" ]; then
-    apt-get install -y intel-for-pytorch-gpu-dev-${XPU_VERSION} intel-pti-dev
+    apt-get install -y intel-for-pytorch-gpu-dev-${XPU_VERSION} intel-pti-dev-0.9
 else
-    apt-get install -y intel-for-pytorch-gpu-dev intel-pti-dev
+    apt-get install -y intel-for-pytorch-gpu-dev-0.5 intel-pti-dev-0.9
Collaborator

RHEL and SLES always install intel-for-pytorch-gpu-dev-0.5 intel-pti-dev-0.9. May I know why Ubuntu needs to install a different intel-for-pytorch-gpu-dev based on XPU_VERSION?

Collaborator Author

The Ubuntu script is used for CI, so it extends the XPU_VERSION parameter to make future upgrades easier.
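
For illustration, a minimal sketch of the resulting Ubuntu install logic (package names taken from the diff above; the closing fi and the comments are assumptions about the surrounding script, not part of the PR excerpt):

# Sketch of the Ubuntu branch: CI passes XPU_VERSION explicitly,
# while release builds fall back to the pinned default versions.
if [ -n "$XPU_VERSION" ]; then
    apt-get install -y intel-for-pytorch-gpu-dev-${XPU_VERSION} intel-pti-dev-0.9
else
    apt-get install -y intel-for-pytorch-gpu-dev-0.5 intel-pti-dev-0.9
fi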

Comment on lines +44 to +46
+if [[ "${XPU_DRIVER_TYPE,,}" == "rolling" ]]; then
+    apt-get install -y intel-ocloc
+fi
Collaborator

Why don't the other two OSes require the rolling check?

Collaborator Author

Because the installation steps for the LTS and the latest rolling driver on Ubuntu now differ. Previously, the ocloc package was included in the igc package on Ubuntu for both LTS and rolling. In the latest Ubuntu rolling driver, ocloc has been extracted into a standalone package. For the other two OSes, ocloc has always been a standalone package.
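
For reference, a minimal sketch of how that conditional looks on Ubuntu (the check is taken from the diff above; the ${VAR,,} expansion lower-cases the value so "Rolling"/"ROLLING" also match; the comment is an assumption):

# Rolling driver on Ubuntu: ocloc is no longer bundled with the igc
# package, so install it explicitly; LTS still ships it inside igc.
if [[ "${XPU_DRIVER_TYPE,,}" == "rolling" ]]; then
    apt-get install -y intel-ocloc
fi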

@malfet
Contributor

malfet commented Oct 21, 2024

@EikanWang can you please elaborate why this had 2.5.1 milestone?

@atalman
Contributor

atalman commented Oct 21, 2024

Hi @chuanqi129, this does not change the packaging of the wheel, only the Docker file. How does it fix #135867?

@chuanqi129
Collaborator Author

chuanqi129 commented Oct 21, 2024

Hi @chuanqi129, this does not change the packaging of the wheel, only the Docker file. How does it fix #135867?

Hi @atalman, thanks for the review and comment. Because the new PTI version is a patch release, the major.minor version and the soname didn't change, so we don't need to update the wheel packaging part. If the bundle/PTI releases a new version in the future that changes the major.minor version, we will update the packaging part as well. For now, we can pin the versions and upgrade them via PR to avoid unexpected breakage. Does that make sense to you?
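
For context, this relies on standard shared-library soname versioning: a patch release keeps the same soname, so a wheel that dynamically links the library keeps working without repackaging. A quick way to check (the library path and name below are hypothetical, not taken from this PR):

# Hypothetical check: the soname encodes only major.minor, so a patch
# release (e.g. 0.9.0 -> 0.9.1) leaves the wheel's runtime dependency unchanged.
readelf -d /opt/intel/pti/lib/libpti_view.so | grep SONAME
# illustrative output: Library soname: [libpti_view.so.0.9]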

We need it for the PyTorch 2.5.1 release: when we build the docker image for the 2.5.1 release wheel, we need to apply these changes to make sure the right package versions are installed in the docker image.
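
As a sanity check, one could list the pinned packages inside the built image, for example (the image tag below is a placeholder, not the actual release image):

# Hypothetical spot check on an Ubuntu-based XPU image:
IMAGE=xpu-ci-image:latest   # placeholder tag
docker run --rm "$IMAGE" bash -c "dpkg -l | grep -E 'intel-for-pytorch-gpu-dev|intel-pti-dev'"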

CC: @malfet @EikanWang

@chuanqi129
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 23, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 jobs have failed, first few of them are: linux-binary-manywheel / manywheel-py3_9-cuda12_4-build / build, linux-binary-libtorch-cxx11-abi / libtorch-cpu-shared-with-deps-cxx11-abi-build / build

Details for Dev Infra team (raised by workflow job)

@atalman
Contributor

atalman commented Oct 23, 2024

@pytorchmergebot merge -f "failures are irrelevant"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@EikanWang
Collaborator

@pytorchbot cherry-pick --onto release/2.5 -c critical

pytorchbot pushed a commit that referenced this pull request Oct 23, 2024
@pytorchbot
Collaborator

Cherry picking #138189

The cherry pick PR is at #138694 and it is recommended to link a critical cherry pick PR with an issue. The following tracker issues are updated:

Details for Dev Infra team (raised by workflow job)

@malfet
Contributor

malfet commented Oct 23, 2024

@pytorchbot cherry-pick --onto release/2.5 -c critical

@EikanWang why is it critical?

@malfet
Contributor

malfet commented Oct 23, 2024

We need it for the PyTorch 2.5.1 release: when we build the docker image for the 2.5.1 release wheel, we need to apply these changes to make sure the right package versions are installed in the docker image.

It would only be needed if there were plans to rebuild the docker images for the 2.5 branch, but there aren't at the moment.

@kit1980
Contributor

kit1980 commented Oct 23, 2024

2.5.1 is an emergency patch release to address known large regressions; moving this to 2.6.0.


Labels

ciflow/trunk (Trigger trunk jobs on your pull request), Merged, open source, topic: not user facing (topic category), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
