- OS: Ubuntu 16.04
- PyTorch version: e58a53a
- How you installed PyTorch (conda, pip, source): source
- Python version: 3.6.4
- CUDA/cuDNN version: 9.0 / 7.0.3
- GPU models and configuration: V100
- GCC version (if compiling from source): 5.4
There seems to have been a regression in master relative to PyTorch 0.3.0. You need a multi-GPU machine to reproduce this problem. Please have two windows open.
Window 1

Run `nvidia-smi -l 5`

Window 2

Run the following Python script, stepping through it with pdb:
```python
import torch
import pdb

gpu = 1
pdb.set_trace()
with torch.cuda.device(gpu):
    pdb.set_trace()
```
Here is what happens:

- Hit the first `set_trace` - no context created on any GPU
- Hit the second `set_trace` - context is created on GPU 0 (instead of GPU 1)
This is due to how initialization happens inside `torch/cuda/__init__.py`:
- First hit is `__enter__` for `class device`, which calls `_lazy_init()`
- Since this is the first call, `_lazy_init()` will actually do something. The problem starts at `torch._C._cuda_init()`
- This call goes to `THCPModule_initExtension` in `torch/csrc/cuda/Module.cpp`
- This will call `THCPModule_initCuda`, which in turn calls `state = at::globalContext().lazyInitCUDA();`
- Inside ATen, contexts will eventually get created on GPU 0, since it assumes that `cudaSetDevice` has already been called
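For reference, here is a condensed paraphrase of the Python side of that code path (not verbatim; bodies are trimmed to the parts that matter here):

```python
# Condensed paraphrase of torch/cuda/__init__.py (not verbatim)
import torch

_initialized = False

class device(object):
    def __init__(self, idx):
        self.idx = idx
        self.prev_idx = -1

    def __enter__(self):
        if self.idx == -1:
            return
        _lazy_init()  # runs BEFORE the device switch below
        self.prev_idx = torch._C._cuda_getDevice()
        if self.prev_idx != self.idx:
            torch._C._cuda_setDevice(self.idx)

def _lazy_init():
    global _initialized
    if _initialized:
        return
    # Lands in THCPModule_initExtension, which initializes on whatever
    # device is current at this point -- i.e. GPU 0.
    torch._C._cuda_init()
    _initialized = True
```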
In fact, any function in torch.cuda that calls `_lazy_init()` will create a context on GPU 0 the first time it is called, irrespective of what the user asked for. This is not harmful in a single-GPU setting, but in a multi-GPU setting, as is the case with our cluster, different users get different GPUs, and everybody ends up with these contexts on GPU 0, no matter what. And contexts take up a fair bit of memory.
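To illustrate (a sketch based on the behavior described above; `current_device()` is just one example of a `_lazy_init()` caller):

```python
import torch

# Either of these, as the first CUDA call in the process, triggers
# _lazy_init() and therefore leaves a context on GPU 0:
torch.cuda.current_device()   # only asks a question, still initializes on GPU 0

x = torch.randn(10).cuda(1)   # tensor lives on GPU 1, yet nvidia-smi also
                              # shows this process holding memory on GPU 0
```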
I think the right thing to do is to have a `_lazy_init(gpuId)` which in turn passes the device down into `THCPModule_initExtension` to do the right thing.
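Something along these lines (just a sketch; the `device_id` parameter and the call ordering are my assumptions, not an existing API):

```python
import torch

_initialized = False

def _lazy_init(device_id=None):
    global _initialized
    if _initialized:
        return
    if device_id is not None:
        # Hypothetical: make the requested device current BEFORE the global
        # init runs, so the context is created there instead of on GPU 0.
        torch._C._cuda_setDevice(device_id)
    torch._C._cuda_init()
    _initialized = True
```

`device.__enter__` would then call `_lazy_init(self.idx)` instead of the bare `_lazy_init()`.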
Quite interestingly, `set_device()` does not call `_lazy_init()`, which also looks like a bug to me, since `_lazy_init()` seems to do a lot of initialization.
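Paraphrasing the current `set_device()` next to what I would expect (again a sketch, assuming the `_lazy_init(device_id)` change above):

```python
# Current behavior (paraphrased): switches the device, never initializes
def set_device(device):
    if device >= 0:
        torch._C._cuda_setDevice(device)

# What I would expect instead (hypothetical)
def set_device(device):
    if device >= 0:
        _lazy_init(device)   # initialize on the GPU the user asked for
        torch._C._cuda_setDevice(device)
```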
This works just fine in PyTorch 0.3.0, by the way, where `with torch.cuda.device(gpu)` will only create contexts on the GPU you ask for.
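As a stopgap on shared machines, restricting device visibility avoids the stray GPU 0 context (standard CUDA environment variable, not specific to PyTorch):

```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # must be set before the first CUDA call
import torch

with torch.cuda.device(0):  # device 0 now maps to physical GPU 1
    x = torch.cuda.FloatTensor(1)
```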
I am happy to try to fix this, but I wanted to get a broader context on why things are this way first.