Implement reference counting for shared IPC CUDA tensors #16854
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
| What was the original performance problem, in the end? | 
| Test failures are real:  | 
| I still don't see a Note explaining the general strategy in the PR ;) For example, the most critical information to add is under what circumstances  EDIT: Sorry, I didn't see your note about documentation being in progress :) | 
torch/csrc/CudaIPCTypes.h (outdated)
      | CudaIPCSentData(std::string handle, int64_t offset, int64_t* counter_ptr) | ||
| : handle(handle), offset(offset), counter_ptr(counter_ptr){}; | ||
| ~CudaIPCSentData(); | ||
| int64_t get(); | 
I'd definitely appreciate a doc here
| Needs tests | 
3ba803c to e146141
torch/csrc/CudaIPCTypes.cpp (outdated)
Wondering if this shouldn't go in the torch/csrc/cuda folder. I'm not too familiar with how the build works here, but it seems worth looking into, or maybe asking @zdevito about
torch/csrc/CudaIPCTypes.cpp (outdated)
This message can be even better if it offers some information about what this means, and advice about how to remediate the situation. A link to more detailed docs is often good enough.
| with leak_checker(self) as lc: | ||
|     for _ in range(repeat): | ||
|         do_test() | 
This is almost assuredly failing lint
| OK, finished reviewing the new stuff. Note that you want to make the dev docs discoverable. The best way to do it is to cite them from the relevant code, so that when people are reading the code they know where to go to get the info. We use the convention  | 
| @pytorchbot retest this please | 
| @VitalyFedyunin merged this pull request in 5653a91. | 
This is to fix #16141 and similar issues.
The idea is to track a reference to every shared CUDA Storage and to deallocate its memory only after the consumer process has deallocated the received Storage.
@ezyang Done with cleanup. Performance is the same as (insignificantly better than) the file-per-share solution, but this approach easily handles millions of shared tensors. Note [ ] documentation in progress.
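The reference-counting idea described above can be illustrated with a small single-process sketch. This is only a toy model, not the code from this PR: `SharedBlock`, `ConsumerHandle`, and all of their methods are hypothetical names, the counter is a plain Python integer rather than memory shared between processes, and "freeing" is just a flag. It shows only the ordering rule: the producer's memory is released once the producer has dropped its own reference and every consumer has released what it received.

```python
class ConsumerHandle:
    """Hypothetical handle a consumer holds for a received block."""

    def __init__(self, block):
        self._block = block
        self._released = False

    def release(self):
        # Releasing twice must not decrement the counter twice.
        if not self._released:
            self._released = True
            self._block._consumer_release()


class SharedBlock:
    """Toy stand-in for a CUDA Storage shared over IPC (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.refcount = 0           # consumers that still hold the block
        self.producer_done = False  # producer dropped its own reference
        self.freed = False          # stands in for the real deallocation

    def send(self):
        # Each send hands out one more reference to a consumer.
        self.refcount += 1
        return ConsumerHandle(self)

    def producer_release(self):
        self.producer_done = True
        self._maybe_free()

    def _consumer_release(self):
        self.refcount -= 1
        self._maybe_free()

    def _maybe_free(self):
        # Free only when nobody, producer or consumer, needs the memory.
        if self.producer_done and self.refcount == 0:
            self.freed = True


block = SharedBlock("example")
first = block.send()
second = block.send()
block.producer_release()   # producer is done, but consumers still hold it
first.release()
still_alive = not block.freed
second.release()           # last consumer releases, block can be freed
```

In this sketch the block stays alive after `producer_release()` and after the first consumer releases, and is marked freed only when the last consumer releases.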