Use PyTorch's p2p access enable function #1991
This pull request was exported from Phabricator. Differential Revision: D48939723
…ytorch#1991) Summary: Reland the diff after fixing some initialization issues. cudaDeviceEnablePeerAccess only enables cross-device access for memory allocated with cudaMalloc. When memory comes from other CUDA APIs such as cuMemMap, peer access is managed differently. expandable_segments:True in PyTorch uses cuMemMap, so code that only calls cudaDeviceEnablePeerAccess is not sufficient to enable cross-device copies. This patch switches the p2p access enabling functions to use PyTorch's `get_p2p_access`, which lets the allocator figure out how to correctly enable p2p access for that memory. In the normal case (expandable_segments:False), this code performs exactly the same CUDA calls as before. Differential Revision: D48939723
de33565 to
5a35c11
Compare
5a35c11 to
d57362e
Compare
d57362e to
077967a
Compare
077967a to
643088e
Compare
643088e to
52b7ea7
Compare
…ytorch#1991) Summary: Reland the diff after fixing some initialization issues. cudaDeviceEnablePeerAccess only enables cross-device access for memory allocated with cudaMalloc. When memory comes from other CUDA APIs such as cuMemMap, peer access is managed differently. expandable_segments:True in PyTorch uses cuMemMap, so code that only calls cudaDeviceEnablePeerAccess is not sufficient to enable cross-device copies. This patch switches the p2p access enabling functions to use PyTorch's `get_p2p_access`, which lets the allocator figure out how to correctly enable p2p access for that memory. In the normal case (expandable_segments:False), this code performs exactly the same CUDA calls as before. Reviewed By: zdevito Differential Revision: D48939723
52b7ea7 to
ac93de4
Compare
ac93de4 to
9081083
Compare
9081083 to
719a963
Compare
719a963 to
9ed959f
Compare
Summary: Pull Request resolved: pytorch/FBGEMM#1991 Test Plan: sandcastle Reviewed By: zdevito Differential Revision: D48939723
…8589) Summary: Pull Request resolved: pytorch/FBGEMM#1991 Test Plan: sandcastle Reviewed By: zdevito Differential Revision: D48939723 Pull Request resolved: #108589 Approved by: https://github.com/zdevito
Summary:
Reland the diff after fixing some initialization issues.
cudaDeviceEnablePeerAccess only enables cross-device access for memory allocated with cudaMalloc. When memory comes from other CUDA APIs such as cuMemMap, peer access is managed differently.
expandable_segments:True in PyTorch uses cuMemMap, so code that only calls cudaDeviceEnablePeerAccess is not sufficient to enable cross-device copies. This patch switches the p2p access enabling functions to use PyTorch's `get_p2p_access`, which lets the allocator figure out how to correctly enable p2p access for that memory.
In the normal case (expandable_segments:False), this code performs exactly the same CUDA calls as before.
Differential Revision: D48939723
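
For readers who want to check which of the two cases above applies to their own process, here is a minimal Python sketch. It is not part of this patch and does not touch FBGEMM's C++ code. The helper names `parse_alloc_conf`, `uses_expandable_segments`, and `peer_access_possible` are hypothetical. The sketch assumes the allocator option is set through the `PYTORCH_CUDA_ALLOC_CONF` environment variable as comma-separated `option:value` pairs and that the value is spelled exactly `True`; PyTorch's own parser may be stricter or looser than this.

```python
import os


def parse_alloc_conf(conf):
    """Split a PYTORCH_CUDA_ALLOC_CONF-style string into {option: value}.

    Simplified parser for illustration only; it is not PyTorch's parser.
    """
    options = {}
    for item in conf.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition(":")
        if sep:  # skip malformed entries that have no "option:value" shape
            options[key.strip()] = value.strip()
    return options


def uses_expandable_segments(conf=None):
    """Return True if the config string asks for expandable_segments:True.

    When conf is None, the string is read from the environment.
    """
    if conf is None:
        conf = os.environ.get("PYTORCH_CUDA_ALLOC_CONF", "")
    return parse_alloc_conf(conf).get("expandable_segments", "False") == "True"


def peer_access_possible(src=0, dst=1):
    """Ask PyTorch whether device `src` can access device `dst` as a peer.

    Returns None when torch is missing or fewer than two GPUs are visible.
    This reports capability only; it does not enable peer access.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available() or torch.cuda.device_count() < 2:
        return None
    return torch.cuda.can_device_access_peer(src, dst)


if __name__ == "__main__":
    print("expandable_segments enabled:", uses_expandable_segments())
    print("peer access 0 -> 1 possible:", peer_access_possible())
```

If `uses_expandable_segments()` reports `True`, the process is in the cuMemMap case described in the summary, where enabling peer access directly through the CUDA runtime is not enough and the allocator has to be involved.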