[cuda][cupy] Improve cupy device placement when device is provided with explicit index by youkaichao · Pull Request #158529 · pytorch/pytorch

Conversation

@youkaichao (Collaborator) commented Jul 17, 2025

Resubmit #158320, fixing a potential bug when the device index is not specified explicitly.
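
A minimal sketch of the interop path this touches (illustrative only; assumes CuPy is installed and the host has at least two GPUs):

import cupy
import torch

# Allocate a CuPy array on a non-default GPU.
with cupy.cuda.Device(1):
    arr = cupy.arange(8)

# Explicit index: the requested device is honored directly.
t_explicit = torch.as_tensor(arr, device="cuda:1")

# No index: per this change, the device is inferred from the pointer
# (via cudaPointerGetAttributes in from_blob).
t_inferred = torch.as_tensor(arr, device="cuda")

print(t_explicit.device, t_inferred.device)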

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta

@pytorch-bot bot commented Jul 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158529

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 792fcb5 with merge base 0e3e377:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the oncall: distributed (Add this issue/PR to distributed oncall triage queue) label on Jul 17, 2025
Comment on lines +496 to +502
if (device_opt.has_value() && device_opt->has_index()) {
// if device_opt is provided with explicit device index, use it
return device_opt;
} else {
// otherwise infer from cudaPointerGetAttributes later in from_blob
return std::nullopt;
}
Collaborator Author (youkaichao):

@wdvr I think this can solve the broken test in https://github.com/pytorch/pytorch/actions/runs/16317513144/job/46092420996 .

I tried that test locally, but it passes even before this fix. No idea what's going on.

Collaborator Author (youkaichao):

How can we trigger that test in the PR before merge?

Contributor (wdvr):

@youkaichao I added the ciflow/trunk and ci-no-td labels; that should run all tests.

Collaborator Author (youkaichao):

Then how can I trigger a new CI run?

youkaichao added the release notes: cuda (release notes category) label on Jul 17, 2025
youkaichao changed the title from "[cuda][cupy] Improve cupy device placement when device is provided" to "[cuda][cupy] Improve cupy device placement when device is provided with explicit index" on Jul 17, 2025
@youkaichao (Collaborator, Author):

also cc @ezyang

@janeyx99 janeyx99 requested a review from ezyang July 17, 2025 19:28
janeyx99 added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label on Jul 17, 2025
wdvr added the ciflow/trunk (Trigger trunk jobs on your pull request) and ci-no-td (Do not run TD on this PR) labels on Jul 18, 2025


os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
Contributor:

I don't like this; it will interfere with running this with other test files in a single process.

Collaborator Author (youkaichao):

I moved the env var change to _init_device, so it should not interfere with other tests. Hope it works?

@ezyang (Contributor) left a comment:

The functional change is fine, but we shouldn't be twiddling env vars in the test file.

@ezyang (Contributor) left a comment:

The test is still done improperly, because the environ is sticky. If there is no way to do this, I recommend just running the test in a subprocess; check out test_cublas_config_nondeterministic_alert for an example.
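
A minimal sketch of the subprocess pattern being suggested (not the actual test_cublas_config_nondeterministic_alert code; the script body and env var are illustrative assumptions):

import os
import subprocess
import sys
import textwrap

def check_in_fresh_interpreter():
    # Run the allocator-sensitive code in a child process so the sticky
    # environment variable never leaks into other tests in this process.
    script = textwrap.dedent("""
        import torch
        x = torch.zeros(8, device="cuda")
        print(x.device)
    """)
    env = dict(os.environ, PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True")
    out = subprocess.check_output([sys.executable, "-c", script], env=env)
    assert b"cuda" in out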

@youkaichao (Collaborator, Author):

The test is still done improperly, because the environ is sticky.

I thought the tests were run in a subprocess?

@youkaichao (Collaborator, Author):

@ezyang how about I use

torch.cuda.memory._set_allocator_settings('expandable_segments:True')
torch.cuda.memory._set_allocator_settings('expandable_segments:False')

before and after the test?

@ezyang (Contributor) commented Aug 13, 2025:

This appears to match current usage in test/test_cuda.py, so sure; if you try...finally it, I will accept.

@youkaichao (Collaborator, Author):

@ezyang added in 361edbb. I think the tearDownClass method is better than try-finally in terms of indentation level and code formatting.
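
A minimal sketch of what the setUpClass/tearDownClass approach could look like (the class name is hypothetical; the actual change is in 361edbb):

import unittest

import torch

class TestCupyAsTensor(unittest.TestCase):  # hypothetical name for illustration
    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        # Enable expandable segments only for the tests in this class.
        torch.cuda.memory._set_allocator_settings("expandable_segments:True")

    @classmethod
    def tearDownClass(cls):
        # Restore the default so later tests in the same process are unaffected.
        torch.cuda.memory._set_allocator_settings("expandable_segments:False")
        super().tearDownClass()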

@youkaichao (Collaborator, Author):

@pytorchmergebot merge

@pytorchmergebot (Collaborator):
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

can-gaa-hou pushed a commit to can-gaa-hou/pytorch that referenced this pull request on Aug 22, 2025

[cuda][cupy] Improve cupy device placement when device is provided with explicit index (pytorch#158529)

Resubmit pytorch#158320, fixing a potential bug when the device index is not specified explicitly.

Pull Request resolved: pytorch#158529
Approved by: https://github.com/ezyang
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request on Sep 17, 2025

[cuda][cupy] Improve cupy device placement when device is provided with explicit index (pytorch#158529)

Resubmit pytorch#158320, fixing a potential bug when the device index is not specified explicitly.

Pull Request resolved: pytorch#158529
Approved by: https://github.com/ezyang

Labels

ci-no-td (Do not run TD on this PR)
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
oncall: distributed (Add this issue/PR to distributed oncall triage queue)
open source
release notes: cuda (release notes category)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

6 participants