cuPyNumeric is a library that aims to provide a distributed and accelerated drop-in replacement for NumPy built on top of the Legate framework.
With cuPyNumeric you can write code productively in Python, using the familiar NumPy API, and have your program scale with no code changes from single-CPU computers to multi-node, multi-GPU clusters.
For example, you can run the final example of the Python CFD course completely unmodified on 2048 A100 GPUs in a DGX SuperPOD and achieve good weak scaling.
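As a much smaller illustration of the drop-in idea, the sketch below swaps only the import line and leaves the array code in ordinary NumPy style. It is not taken from the CFD course. The `jacobi_step` and `solve_laplace` helpers, the grid size, and the boundary setup are hypothetical names and values chosen for this example, and the `ImportError` fallback to NumPy is added only so the snippet also runs where cuPyNumeric is not installed.

```python
# Use cuPyNumeric when it is installed. The NumPy fallback is an assumption
# made for this sketch so that it runs anywhere; it is not required by
# cuPyNumeric itself.
try:
    import cupynumeric as np
except ImportError:
    import numpy as np


def jacobi_step(grid):
    """Return one Jacobi relaxation sweep over the interior of a 2D grid."""
    new = grid.copy()
    # Each interior point becomes the average of its four neighbours.
    new[1:-1, 1:-1] = 0.25 * (
        grid[:-2, 1:-1] + grid[2:, 1:-1] + grid[1:-1, :-2] + grid[1:-1, 2:]
    )
    return new


def solve_laplace(n=64, iterations=200):
    """Relax an n-by-n grid whose top edge is held at 1.0 (hypothetical setup)."""
    grid = np.zeros((n, n))
    grid[0, :] = 1.0
    for _ in range(iterations):
        grid = jacobi_step(grid)
    return grid


if __name__ == "__main__":
    result = solve_laplace()
    print(float(result.mean()))
```

Everything below the import uses only slicing, `zeros`, `copy`, and `mean`, so the same file is valid under either module. How the program is launched across multiple GPUs or nodes is covered by the installation and usage documentation rather than by the code itself.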
cuPyNumeric works best for programs that have very large arrays of data that cannot fit in the memory of a single GPU or a single node and need to span multiple nodes and GPUs. While our implementation of the current NumPy API is still incomplete, programs that use unimplemented features will still work (assuming enough memory) by falling back to the canonical NumPy implementation.
Pre-built cuPyNumeric packages are available from conda on the legate channel and from PyPI. See https://docs.nvidia.com/cupynumeric/latest/installation.html for details about the different install configurations and for instructions on building cuPyNumeric from source.
Note: Packages are offered for Linux (x86_64 and aarch64) and macOS (aarch64, pip wheels only), supporting Python versions 3.11 to 3.13. Windows is only supported through WSL.
The cuPyNumeric documentation is available at https://docs.nvidia.com/cupynumeric/latest/.
See the discussion on contributing in CONTRIBUTING.md.
For technical questions about cuPyNumeric and Legate-based tools, please visit the community discussion forum.
If you have other questions, please contact us at legate(at)nvidia.com.
The cuPyNumeric project is independent of the CuPy project. CuPy is a trademark of Preferred Networks, Inc., and the name 'cuPyNumeric' is used with their permission.