Replies: 3 comments 9 replies
-
@mloubout The issue may not be threading but the backend MPI implementation. You did not mention which backend MPI implementation and version you are using. Is it Open MPI? Have you tried a different MPI implementation? Does it fail if you run your code above with Would you mind doing a quick experiment?
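One way to gather the details asked for here is to let mpi4py report them itself. The following is only a minimal sketch and not part of the original exchange: the helper name `describe_mpi` is hypothetical, and it assumes mpi4py is importable in the same environment that shows the hang.

```python
def describe_mpi():
    """Report the MPI vendor, library version and thread level seen by mpi4py.

    Returns None when mpi4py (or the MPI library behind it) cannot be imported.
    """
    try:
        from mpi4py import MPI  # importing MPI initializes the MPI library
    except ImportError:
        return None

    # Map the integer thread-support level to a readable name.
    thread_levels = {
        MPI.THREAD_SINGLE: "THREAD_SINGLE",
        MPI.THREAD_FUNNELED: "THREAD_FUNNELED",
        MPI.THREAD_SERIALIZED: "THREAD_SERIALIZED",
        MPI.THREAD_MULTIPLE: "THREAD_MULTIPLE",
    }
    vendor_name, vendor_version = MPI.get_vendor()
    provided = MPI.Query_thread()
    return {
        "vendor": vendor_name,
        "vendor_version": ".".join(str(part) for part in vendor_version),
        "library_version": MPI.Get_library_version().strip(),
        "thread_level": thread_levels.get(provided, str(provided)),
    }


if __name__ == "__main__":
    print(describe_mpi())
```

Running this on both a machine that hangs and one that does not would show whether the MPI build or the provided thread level differs between them.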
-
I have tried two, Open MPI and Intel MPI, and they both hang.
The
Yes. It doesn't change the hang with Open MPI or Intel MPI.
For reference, this is the error trace when I exit it (Ctrl+C):
^C^CTraceback (most recent call last):
  File "/home/mloubout/DevitoCodes/devitopro/mfes/decoupler.py", line 13, in main
    print(list(results))
          ^^^^^^^^^^^^^
  File "/home/mloubout/.local/lib/python3.11/site-packages/mpi4py/futures/pool.py", line 234, in result_iterator
    yield futures.pop().result()
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 451, in result
    self._condition.wait(timeout)
  File "/usr/lib/python3.11/threading.py", line 327, in wait
    waiter.acquire()
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mloubout/DevitoCodes/devitopro/mfes/decoupler.py", line 16, in <module>
    main()
  File "/home/mloubout/DevitoCodes/devitopro/mfes/decoupler.py", line 11, in main
    with MPIPoolExecutor(max_workers=4) as executor:
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 647, in __exit__
    self.shutdown(wait=True)
  File "/home/mloubout/.local/lib/python3.11/site-packages/mpi4py/futures/pool.py", line 207, in shutdown
    pool.join()
  File "/home/mloubout/.local/lib/python3.11/site-packages/mpi4py/futures/_lib.py", line 160, in join
    self.thread.join()
  File "/usr/lib/python3.11/threading.py", line 1119, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.11/threading.py", line 1139, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt
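The trace above shows the main thread blocked first in `result()` and then in `shutdown(wait=True)`. A possible way to get more information without pressing Ctrl+C is to bound the wait and dump every thread's stack when the deadline passes. The following is a hedged sketch, not the script from this thread: `map_with_deadline` and `format_thread_stacks` are hypothetical helper names, and a standard-library `ThreadPoolExecutor` stands in for `MPIPoolExecutor` so that the example runs without MPI. It relies only on the `timeout` argument of `Executor.map`, which `MPIPoolExecutor.map` also accepts.

```python
import sys
import threading
import time
import traceback
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeout


def format_thread_stacks():
    """Return the current Python stack of every live thread as one string."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append(f"Thread {names.get(ident, ident)}:\n")
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)


def map_with_deadline(executor, fn, items, timeout):
    """Collect executor.map results, giving up after `timeout` seconds.

    Returns (results, stacks). `stacks` is None when everything finished in
    time, otherwise a dump of all thread stacks taken at the timeout.
    """
    results = []
    try:
        for value in executor.map(fn, items, timeout=timeout):
            results.append(value)
    except FuturesTimeout:
        return results, format_thread_stacks()
    return results, None


if __name__ == "__main__":
    # Stand-in executor; swapping in MPIPoolExecutor is an assumption to verify.
    with ThreadPoolExecutor(max_workers=1) as executor:
        results, stacks = map_with_deadline(executor, time.sleep, [0.5], timeout=0.05)
    print(results)
    if stacks is not None:
        print(stacks)
```

Note that this only bounds the result-fetching step. If the pool's manager thread is itself stuck, leaving the `with` block may still block in `shutdown(wait=True)`, as in the second traceback.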
-
I have also encountered this problem. I built a Docker image that runs successfully on an A800 GPU machine, but when I run the same image on an H20 GPU machine, the hang described above occurs. So I believe this is related to the host machine rather than the image environment. I have no idea how to solve it.
-
This is a follow-up/continuation of the work discussed in #476
ISSUE
We (mainly I) are running into an issue on some systems where the executor hangs either when fetching results
or at pool shutdown around here
Reproducer
The script hanging is:
This is where it gets a bit tricky, as the hang does not seem to occur every time. On my Ubuntu desktop:
The question
Are you aware of any interaction between mpi4py.futures and threading that might lead to a deadlock?
Thanks again for all the support and discussions.