[cuda][cupy] Improve cupy device placement when device is provided with explicit index by youkaichao · Pull Request #158529 · pytorch/pytorch

Conversation

@youkaichao (Collaborator) commented Jul 17, 2025

Resubmit #158320, fixing a potential bug when the device index is not specified explicitly.
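
A minimal sketch of the interop path this touches (illustrative only; assumes CuPy is installed and the host has at least two GPUs):

import cupy
import torch

# Allocate a CuPy array on a non-default GPU.
with cupy.cuda.Device(1):
    arr = cupy.arange(8)

# Explicit index: the requested device is honored directly.
t_explicit = torch.as_tensor(arr, device="cuda:1")

# No index: per this change, the device is inferred from the pointer
# (via cudaPointerGetAttributes in from_blob).
t_inferred = torch.as_tensor(arr, device="cuda")

print(t_explicit.device, t_inferred.device)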

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta

@pytorch-bot bot commented Jul 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158529

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 792fcb5 with merge base 0e3e377:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the oncall: distributed (Add this issue/PR to distributed oncall triage queue) label on Jul 17, 2025
Comment on lines +496 to +502
if (device_opt.has_value() && device_opt->has_index()) {
// if device_opt is provided with explicit device index, use it
return device_opt;
} else {
// otherwise infer from cudaPointerGetAttributes later in from_blob
return std::nullopt;
}
Collaborator Author (youkaichao):

@wdvr I think this can solve the broken test in https://github.com/pytorch/pytorch/actions/runs/16317513144/job/46092420996 .

I tried that test locally, but it passes even before this fix. No idea what's going on.

Collaborator Author (youkaichao):

How can we trigger that test in the PR before merge?

Contributor (wdvr):

@youkaichao I added the ciflow/trunk and ci-no-td labels; that should run all tests.

Collaborator Author (youkaichao):

Then how can I trigger a new CI run?

youkaichao added the release notes: cuda (release notes category) label on Jul 17, 2025
youkaichao changed the title from "[cuda][cupy] Improve cupy device placement when device is provided" to "[cuda][cupy] Improve cupy device placement when device is provided with explicit index" on Jul 17, 2025
@youkaichao (Collaborator, Author):

also cc @ezyang

@janeyx99 janeyx99 requested a review from ezyang July 17, 2025 19:28
janeyx99 added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label on Jul 17, 2025
wdvr added the ciflow/trunk (Trigger trunk jobs on your pull request) and ci-no-td (Do not run TD on this PR) labels on Jul 18, 2025


os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
Contributor:

I don't like this; it will interfere with running this with other test files in a single process.

Collaborator Author (youkaichao):

I moved the env var change to _init_device, so it should not interfere with other tests. Hope it works?

@ezyang (Contributor) left a comment:

The functional change is fine, but we shouldn't be twiddling env vars in the test file.

@ezyang (Contributor) left a comment:

The test is still done improperly, because the environ is sticky. If there is no way to do this, I recommend just running the test in a subprocess; check out test_cublas_config_nondeterministic_alert for an example.
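
A minimal sketch of the subprocess pattern being suggested (not the actual test_cublas_config_nondeterministic_alert code; the script body and env var are illustrative assumptions):

import os
import subprocess
import sys
import textwrap

def check_in_fresh_interpreter():
    # Run the allocator-sensitive code in a child process so the sticky
    # environment variable never leaks into other tests in this process.
    script = textwrap.dedent("""
        import torch
        x = torch.zeros(8, device="cuda")
        print(x.device)
    """)
    env = dict(os.environ, PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True")
    out = subprocess.check_output([sys.executable, "-c", script], env=env)
    assert b"cuda" in out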

@youkaichao (Collaborator, Author):

The test is still done improperly, because the environ is sticky.

I thought the tests were run in a subprocess?

@youkaichao (Collaborator, Author):

@ezyang how about I use

torch.cuda.memory._set_allocator_settings('expandable_segments:True')
torch.cuda.memory._set_allocator_settings('expandable_segments:False')

before and after the test?

@ezyang (Contributor) commented Aug 13, 2025:

This appears to match current usage in test/test_cuda.py, so sure; if you try...finally it, I will accept.

@youkaichao (Collaborator, Author):

@ezyang added in 361edbb. I think the tearDownClass method is better than try-finally in terms of indentation level and code formatting.
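
A minimal sketch of what the setUpClass/tearDownClass approach could look like (the class name is hypothetical; the actual change is in 361edbb):

import unittest

import torch

class TestCupyAsTensor(unittest.TestCase):  # hypothetical name for illustration
    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        # Enable expandable segments only for the tests in this class.
        torch.cuda.memory._set_allocator_settings("expandable_segments:True")

    @classmethod
    def tearDownClass(cls):
        # Restore the default so later tests in the same process are unaffected.
        torch.cuda.memory._set_allocator_settings("expandable_segments:False")
        super().tearDownClass()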

@youkaichao (Collaborator, Author):

@pytorchmergebot merge

@pytorchmergebot (Collaborator):
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

can-gaa-hou pushed a commit to can-gaa-hou/pytorch that referenced this pull request on Aug 22, 2025

[cuda][cupy] Improve cupy device placement when device is provided with explicit index (pytorch#158529)

Resubmit pytorch#158320, fixing a potential bug when the device index is not specified explicitly.

Pull Request resolved: pytorch#158529
Approved by: https://github.com/ezyang
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request on Sep 17, 2025

[cuda][cupy] Improve cupy device placement when device is provided with explicit index (pytorch#158529)

Resubmit pytorch#158320, fixing a potential bug when the device index is not specified explicitly.

Pull Request resolved: pytorch#158529
Approved by: https://github.com/ezyang

Labels

ci-no-td (Do not run TD on this PR)
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
oncall: distributed (Add this issue/PR to distributed oncall triage queue)
open source
release notes: cuda (release notes category)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

6 participants