Refactor device index bound check for xpu code #120768
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/120768
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 83266ff with merge base 965555d.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from b8286b7 to f2467e5
@guangyey could you describe more in your PR?
Force-pushed from f2467e5 to 5d749a9
Force-pushed from 5d749a9 to c082909
c10/xpu/XPUFunctions.cpp (Outdated)

-void enumDevices(std::vector<std::unique_ptr<sycl::device>>& devices) {
+// Ensures we only call enumDevices once.
+void enumDevices() {
This could maybe be moved into a lambda function guarded by call_once (see my other comment below).
Change the parameter `devices` back to `enumDevices` to prevent unintentional corruption of the device list, since `initGlobalDevicePoolState` can only be called once.
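For context, here is a minimal sketch of the pattern discussed in this thread: a file-local device pool filled exactly once by a lambda guarded by `std::call_once`, with a parameterless enumeration function. The names (`gDevicePool`, `init_flag`, `initDevicePool`) are illustrative assumptions, not the actual c10/xpu/XPUFunctions.cpp internals.

```cpp
#include <memory>
#include <mutex>
#include <vector>
#include <sycl/sycl.hpp>

namespace {

std::once_flag init_flag;                               // hypothetical guard
std::vector<std::unique_ptr<sycl::device>> gDevicePool; // hypothetical pool

// Runs the enumeration lambda exactly once; repeat calls are no-ops, so a
// second enumeration can never corrupt the device list.
void initDevicePool() {
  std::call_once(init_flag, []() {
    for (const auto& platform : sycl::platform::get_platforms()) {
      for (const auto& device : platform.get_devices()) {
        if (device.is_gpu()) {
          gDevicePool.push_back(std::make_unique<sycl::device>(device));
        }
      }
    }
  });
}

} // namespace
```

Because the lambda writes only to the file-local pool and `std::call_once` guarantees a single execution, there is no parameter through which a caller could pass in (and clobber) a different device list.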
FYI: #119639 got reverted (again).
Don't merge this PR. It depends on #119639.
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as `Stale`.
Force-pushed from 9ff00da to 83266ff
Looks like #119639 is stale. I will rebase it onto the main branch and land it.
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
# Motivation
Referring to [Increased compile time max GPUs to 512. Switched to int16_t DeviceIndex.](pytorch#119639), we use `c10::Device::MAX_NUM_DEVICES` to make sure the number of XPU devices is valid in PyTorch.

# Solution
Use `TORCH_CHECK` to check that the number of XPU devices does not exceed `c10::Device::MAX_NUM_DEVICES` when enumerating XPU devices.

Pull Request resolved: pytorch#120768
Approved by: https://github.com/jgong5, https://github.com/albanD, https://github.com/tringwald

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
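As a hedged sketch of the bound check described above (the exact call site and error message in c10/xpu/XPUFunctions.cpp may differ, and `checkXpuDeviceCount` is a hypothetical helper name), the idea is a `TORCH_CHECK` against `c10::Device::MAX_NUM_DEVICES` during device enumeration:

```cpp
#include <c10/core/Device.h>
#include <c10/util/Exception.h>

// Hypothetical helper: validates the enumerated device count against the
// compile-time limit. DeviceIndex is int16_t after #119639, so enumerating
// more devices than MAX_NUM_DEVICES would overflow the index type.
void checkXpuDeviceCount(size_t device_count) {
  TORCH_CHECK(
      device_count <= c10::Device::MAX_NUM_DEVICES,
      "Too many XPU devices detected: ",
      device_count,
      "; PyTorch supports at most ",
      c10::Device::MAX_NUM_DEVICES,
      " devices.");
}
```

Failing fast at enumeration time keeps the invalid state from propagating: every later `DeviceIndex` cast can then assume the count already fits in the index type.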