Use PyTorch's p2p access enable function by banitag1 · Pull Request #1991 · pytorch/FBGEMM · GitHub

@banitag1 banitag1 commented Sep 3, 2023

Summary:
Reland the diff after fixing some initialization issues.

cudaDeviceEnablePeerAccess only enables cross-device access for memory allocated with cudaMalloc. When memory is mapped through other CUDA APIs such as cuMemMap, peer access is managed differently.
expandable_segments:True in PyTorch uses cuMemMap, so calling cudaDeviceEnablePeerAccess alone is not sufficient to enable cross-device copies. This patch switches the p2p access enabling functions
to use PyTorch's `get_p2p_access`, which lets the PyTorch allocator work out how to correctly enable p2p access for that memory.

In the normal case (expandable_segments:False), this code performs exactly the same CUDA calls as before.
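
For illustration only, the Python sketch below shows one way to check whether a run falls into the expandable_segments:True case and to smoke-test a cross-device copy. It is not code from this patch. `PYTORCH_CUDA_ALLOC_CONF` is the environment variable PyTorch reads for allocator options; `expandable_segments_enabled` and `try_cross_device_copy` are hypothetical helper names, and the string parsing is a simplification rather than PyTorch's own parser.

```python
def expandable_segments_enabled(conf: str) -> bool:
    """Return True if a PYTORCH_CUDA_ALLOC_CONF-style string such as
    "max_split_size_mb:128,expandable_segments:True" turns the option on.

    Simplified, hypothetical parser: comma-separated key:value pairs.
    """
    for item in conf.split(","):
        key, sep, value = item.strip().partition(":")
        if sep and key.strip() == "expandable_segments":
            return value.strip().lower() == "true"
    return False


def try_cross_device_copy(src: int = 0, dst: int = 1):
    """Copy a small tensor from GPU `src` to GPU `dst` through PyTorch.

    Returns None when torch or a second GPU is unavailable, otherwise
    True/False depending on whether the copied values match.
    """
    try:
        import torch  # imported lazily so the parser above works without torch
    except ImportError:
        return None
    if not torch.cuda.is_available() or torch.cuda.device_count() <= max(src, dst):
        return None
    x = torch.arange(8, device=f"cuda:{src}")
    # The copy goes through PyTorch, so any peer-access setup is left to it.
    y = x.to(f"cuda:{dst}")
    return torch.equal(x.cpu(), y.cpu())
```

A possible use is to call `expandable_segments_enabled(os.environ.get("PYTORCH_CUDA_ALLOC_CONF", ""))` at startup and then run `try_cross_device_copy()` once under each allocator setting. The variable has to be set before PyTorch initializes its CUDA allocator for the option to take effect.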

Differential Revision: D48939723

netlify bot commented Sep 3, 2023

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit 9ed959f
🔍 Latest deploy log https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/64f77c9eebb8990008be1198
😎 Deploy Preview https://deploy-preview-1991--pytorch-fbgemm-docs.netlify.app

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D48939723

banitag1 pushed a commit to banitag1/FBGEMM that referenced this pull request Sep 4, 2023
…ytorch#1991)

banitag1 pushed a commit to banitag1/FBGEMM that referenced this pull request Sep 5, 2023
…ytorch#1991)

Reviewed By: zdevito

Differential Revision: D48939723

banitag1 pushed a commit to banitag1/pytorch that referenced this pull request Sep 5, 2023
Summary: Pull Request resolved: pytorch/FBGEMM#1991

Test Plan: sandcastle

Reviewed By: zdevito

Differential Revision: D48939723
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Sep 6, 2023
…8589)

Summary: Pull Request resolved: pytorch/FBGEMM#1991

Test Plan: sandcastle

Reviewed By: zdevito

Differential Revision: D48939723

Pull Request resolved: #108589
Approved by: https://github.com/zdevito