
with torch.cuda.device(gpu) creates context on the wrong GPU #4903

@shubho

Description

  • OS: Ubuntu 16.04
  • PyTorch version: e58a53a
  • How you installed PyTorch (conda, pip, source): source
  • Python version: 3.6.4
  • CUDA/cuDNN version: 9.0 / 7.0.3
  • GPU models and configuration: V100
  • GCC version (if compiling from source): 5.4

There seems to have been a regression from PyTorch 0.3.0 on master. You need a multi-GPU machine to reproduce this. Please have two windows open:

Window 1
Run nvidia-smi -l 5

Window 2
Run the following Python script, stepping through it with pdb

import torch
import pdb

gpu = 1
pdb.set_trace()  # breakpoint 1: before entering the device context

with torch.cuda.device(gpu):
    pdb.set_trace()  # breakpoint 2: inside the device context

Here is what happens

  1. Hit the first set_trace - no context has been created on any GPU
  2. Hit the second set_trace - a context has been created on GPU 0 (instead of GPU 1)

This is due to how initialization happens inside torch/cuda/__init__.py

  1. The first call is __enter__ of class device, which calls _lazy_init()
  2. Since this is the first call, _lazy_init() actually does something. The problem starts at torch._C._cuda_init()
  3. This call goes to THCPModule_initExtension in torch/csrc/cuda/Module.cpp
  4. That in turn calls THCPModule_initCuda, which calls state = at::globalContext().lazyInitCUDA();
  5. Inside ATen, the context eventually gets created on GPU 0, since the code assumes that cudaSetDevice has already been called

In fact, because of this, any function in torch.cuda that calls _lazy_init() will create a context on GPU 0 the first time it is called, irrespective of what the user asked for. This is not harmful in a single-GPU setting, but in a multi-GPU setting, as is the case with our cluster, different users get different GPUs, and everybody ends up with a context on GPU 0 no matter what. And contexts take up a fair bit of memory.
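On a shared cluster, one stopgap (a workaround, not a fix for the underlying bug) is to restrict the process to the assigned GPU via CUDA_VISIBLE_DEVICES before PyTorch ever touches CUDA, so that the stray context lands on the user's own GPU anyway. A minimal sketch:

```python
import os

# Must be set before the first CUDA call (ideally before importing torch),
# otherwise the CUDA runtime has already enumerated all devices.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # e.g. the user's assigned GPU

# import torch  # from here on, this process sees physical GPU 1 as device 0
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

The drawback is that the process can no longer see the other GPUs at all, which is acceptable per-user on a cluster but does not help code that legitimately needs multiple devices.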

I think the right thing to do is to have a _lazy_init(gpuId) which in turn passes the device down into THCPModule_initExtension so it can do the right thing.

Quite interestingly, set_device() does not call _lazy_init(), which also looks like a bug to me, since _lazy_init() seems to do a lot of initialization.

This works just fine in PyTorch 0.3.0, by the way, where with torch.cuda.device(gpu) only creates a context on the GPU you ask for.

I am happy to try to fix this but I wanted to get a broader context on why things are this way.
