[CI] Fix xpu linux ci build environment duplicated issue by chuanqi129 · Pull Request #141546 · pytorch/pytorch · GitHub
@chuanqi129
Collaborator

@chuanqi129 chuanqi129 commented Nov 26, 2024

We found duplicated build environments in the XPU Linux CI tests, which could cause test jobs to download the wrong PyTorch build artifact. See https://github.com/pytorch/pytorch/actions/runs/12023238798/job/33518351906#step:14:633

Works for #139722 and #114850
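
For illustration only (this sketch is not part of the PR's change): assuming build artifacts are looked up by build-environment name, two build jobs that share a name can make a test job pick up the wrong artifact. A minimal check for such collisions could look like the following. The job names, the `build_environment` key, and the helper `find_duplicate_build_environments` are all hypothetical and are not taken from the PyTorch workflow files.

```python
from collections import defaultdict


def find_duplicate_build_environments(jobs):
    """Return {build_environment: [job names]} for every environment
    that is declared by more than one job."""
    by_env = defaultdict(list)
    for job in jobs:
        by_env[job["build_environment"]].append(job["name"])
    return {env: names for env, names in by_env.items() if len(names) > 1}


# Hypothetical job list; the names are placeholders, not real CI jobs.
jobs = [
    {"name": "build-a", "build_environment": "linux-xpu-example-py3.9"},
    {"name": "build-b", "build_environment": "linux-xpu-example-py3.9"},
    {"name": "build-c", "build_environment": "linux-cuda-example-py3.10"},
]

duplicates = find_duplicate_build_environments(jobs)
for env, names in duplicates.items():
    print(f"{env} is declared by multiple jobs: {', '.join(names)}")
```

Any environment reported here would need a unique name before test jobs could reliably resolve the matching artifact.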

@chuanqi129 chuanqi129 requested a review from a team as a code owner November 26, 2024 06:04
@pytorch-bot

pytorch-bot bot commented Nov 26, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141546

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f86884e with merge base 9e299b8:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Nov 26, 2024
@chuanqi129 chuanqi129 requested a review from atalman November 26, 2024 06:04
@chuanqi129 chuanqi129 added the ciflow/xpu Run XPU CI tasks label Nov 26, 2024
Contributor

@atalman atalman left a comment

Looks like a possible CI issue: echo 'Error: Available diskspace is less than 70 percent. Not enough diskspace.'

@atalman
Contributor

atalman commented Nov 26, 2024

The failures seem to be unrelated; however, please look into these:
inductor/test_inductor_freezing.py::FreezingGpuTests::test_mm_concat_xpu
inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_maximize_xpu

Contributor

@atalman atalman left a comment

lgtm. Please investigate the XPU failures on this PR

@chuanqi129
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased fix_xpu_ci onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout fix_xpu_ci && git pull --rebase)

@colesbury colesbury added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Nov 26, 2024
@chuanqi129
Collaborator Author

@pytorchbot rebase -b main

@chuanqi129 chuanqi129 requested a review from atalman November 27, 2024 02:01
@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased fix_xpu_ci onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout fix_xpu_ci && git pull --rebase)

@chuanqi129
Collaborator Author

@pytorchbot rebase -b main

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased fix_xpu_ci onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout fix_xpu_ci && git pull --rebase)

@chuanqi129
Collaborator Author

chuanqi129 commented Nov 27, 2024

Hi @atalman, I have resolved the diskspace issue and rebased the PR onto the latest main, and the previous 2 XPU inductor UT failures have been fixed. There is now only one doctests failure in the latest CI run. It is very strange and may need more time to root-cause. I have created issue #141705 to track it and will try to fix it ASAP. Can we land this PR first?

@chuanqi129 chuanqi129 added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 27, 2024
Contributor

@atalman atalman left a comment

lgtm. Thanks for taking the time to resolve and look into these issues.

@chuanqi129
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

pytorchmergebot pushed a commit that referenced this pull request Dec 3, 2024
# Motivation
Fix the UT failure introduced by #140865. An unrelated failure had been masking this UT failure.
It began to surface once #141546 landed.

Pull Request resolved: #141800
Approved by: https://github.com/EikanWang
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
Pull Request resolved: pytorch#141546
Approved by: https://github.com/EikanWang, https://github.com/atalman
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
Pull Request resolved: pytorch#141800
Approved by: https://github.com/EikanWang
Labels

ciflow/trunk: Trigger trunk jobs on your pull request
ciflow/xpu: Run XPU CI tasks
Merged
open source
topic: not user facing (topic category)
triaged: This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

Status: Done

6 participants