test_dataloader.py fails to pass test with error: Can't get attribute 'RandomDataset'... on MacOS · Issue #60319 · pytorch/pytorch · GitHub

@DamonDeng

Description

🐛 Bug

The unit test "./test/test_dataloader.py" fails on macOS with the following error message:
AttributeError: Can't get attribute 'RandomDataset' on <module '__main__' (built-in)>

To Reproduce

Steps to reproduce the behavior:

On macOS:

  1. Set up a Conda environment with Python 3.8.
  2. Clone the pytorch repository and follow the "build from source" instructions to build PyTorch.
  3. Run the unit test with: python ./test/test_dataloader.py

This produces the following error message:

AttributeError: Can't get attribute 'RandomDataset' on <module '__main__' (built-in)>
Traceback (most recent call last):
  File "/Users/duser/Desktop/workspace/pytorch/pytorch/torch/utils/data/dataloader.py", line 990, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/Users/duser/Desktop/dinstalled/anaconda/anaconda3/envs/pytorch_dev/lib/python3.8/multiprocessing/queues.py", line 107, in get
    if not self._poll(timeout):
  File "/Users/duser/Desktop/dinstalled/anaconda/anaconda3/envs/pytorch_dev/lib/python3.8/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/Users/duser/Desktop/dinstalled/anaconda/anaconda3/envs/pytorch_dev/lib/python3.8/multiprocessing/connection.py", line 424, in _poll
    r = wait([self], timeout)
  File "/Users/duser/Desktop/dinstalled/anaconda/anaconda3/envs/pytorch_dev/lib/python3.8/multiprocessing/connection.py", line 931, in wait
    ready = selector.select(timeout)
  File "/Users/duser/Desktop/dinstalled/anaconda/anaconda3/envs/pytorch_dev/lib/python3.8/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
  File "/Users/duser/Desktop/workspace/pytorch/pytorch/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 6407) exited unexpectedly with exit code 1. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.

Expected behavior

The test should pass without any error message.

The same test was run on Linux (Ubuntu 20), where the output of this unit test is "OK".

Environment

PyTorch version: 1.10.0a0+git469f0e4
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 11.3.1 (x86_64)
GCC version: Could not collect
Clang version: 12.0.5 (clang-1205.0.22.9)
CMake version: version 3.19.6
Libc version: N/A

Python version: 3.8 (64-bit runtime)
Python platform: macOS-10.16-x86_64-i386-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.2
[pip3] torch==1.10.0a0+git469f0e4
[conda] blas 1.0 mkl
[conda] mkl 2021.2.0 hecd8cb5_269
[conda] mkl-include 2021.2.0 hecd8cb5_269
[conda] mkl-service 2.3.0 py38h9ed2024_1
[conda] mkl_fft 1.3.0 py38h4a7008c_2
[conda] mkl_random 1.2.1 py38hb2f4e1b_2
[conda] numpy 1.20.2 py38h4b4dc7a_0
[conda] numpy-base 1.20.2 py38he0bd621_0
[conda] torch 1.10.0a0+git469f0e4 dev_0

Additional context

I dug into the test code and found that the failure is caused by a multiprocessing limitation in Python (https://bugs.python.org/issue25053). Pool and Process in multiprocessing do not work with objects defined inline, for example in an interactive session or in code passed via python -c, when the worker process has to look them up again by name. The following code from Stack Overflow generates the same kind of error when run that way:

from multiprocessing import Pool
def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
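
As a side check that is not part of the original report, the sketch below prints the default multiprocessing start method for the current platform. It rests on the assumption that the macOS/Linux difference comes from the start method: since Python 3.8, macOS defaults to "spawn", where workers start from a fresh interpreter and must find pickled objects by name, while Linux has traditionally defaulted to "fork", where workers inherit the parent's memory.

```python
import multiprocessing
import sys

# Name of the start method that Pool and Process use by default here.
default_method = multiprocessing.get_start_method()

# All start methods this platform supports, for comparison.
available_methods = multiprocessing.get_all_start_methods()

print(f"platform={sys.platform} default={default_method} available={available_methods}")
```

If the default reported on the failing machine is "spawn", that would be consistent with the worker being unable to find a class that exists only in the parent's inline code.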

At line 878 of ./test/test_dataloader.py, the unit test calls "subprocess.check_output()" to run a Python code segment that triggers PyTorch issue 11201 ("Too many open files error #11201"), then catches the exception and uses assertions to check whether it was handled by DataLoader.

On macOS, however, the code segment hits Python issue 25053 first, before it can trigger PyTorch issue 11201, because it defines a class (RandomDataset) inline and then uses multiprocessing to generate data in parallel.

The following is part of the code:

subprocess.check_output([sys.executable, '-c', """\
import torch
...
class RandomDataset(IterableDataset):
    ...
... """])

So the issue is caused by the unit test code itself.
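
The failure mode can be illustrated without DataLoader or multiprocessing at all. The sketch below is a simplified model, not the test's actual code: "fake_main" is a hypothetical stand-in for the parent's __main__ module, and deleting the class from it imitates a worker whose __main__ never ran the inline code. It relies on the fact that pickle stores an instance's class by module and name, not by value.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the parent's __main__ module.
fake_main = types.ModuleType("fake_main")
sys.modules["fake_main"] = fake_main


class RandomDataset:
    """Placeholder for the class the test defines inline."""


# Make the class look as if it had been defined at the top of fake_main.
RandomDataset.__module__ = "fake_main"
RandomDataset.__qualname__ = "RandomDataset"
fake_main.RandomDataset = RandomDataset

# The payload records only the reference "fake_main.RandomDataset".
payload = pickle.dumps(RandomDataset())

# Imitate the worker: the module exists, but the class was never defined there.
del fake_main.RandomDataset

try:
    pickle.loads(payload)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

The message printed here has the same "Can't get attribute 'RandomDataset'" shape as the error in the report, which suggests the worker fails while unpickling the dataset before any file-descriptor limit is reached.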

Suggested solution:

Since PyTorch issue 11201 doesn't occur on macOS, one simple fix is to skip the 11201 test on macOS.

We can define an 'IS_MACOS' variable and then use @unittest.skipIf() to skip the "test_fd_limit_exceeded" test in ./test/test_dataloader.py.
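
A minimal sketch of that suggestion follows. The class name ExampleDataLoaderTest, the skip message, and the placeholder test body are hypothetical; only IS_MACOS, @unittest.skipIf() and the test name come from the proposal above, and IS_MACOS is defined locally here rather than taken from any PyTorch helper.

```python
import sys
import unittest

# sys.platform is "darwin" on macOS.
IS_MACOS = sys.platform == "darwin"


class ExampleDataLoaderTest(unittest.TestCase):  # hypothetical test class
    @unittest.skipIf(IS_MACOS, "inline RandomDataset cannot be loaded by workers on macOS")
    def test_fd_limit_exceeded(self):
        # Placeholder body; the real test runs a subprocess, see test_dataloader.py.
        self.assertTrue(True)


def run_example():
    """Run the example test case and return the unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleDataLoaderTest)
    result = unittest.TestResult()
    suite.run(result)
    return result


result = run_example()
print(f"skipped={len(result.skipped)} failures={len(result.failures)}")
```

On macOS the test is reported as skipped with the given reason; on other platforms it runs as usual, so Linux coverage of issue 11201 is unchanged.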

cc @ssnl @VitalyFedyunin @ejguan
