Implement reference counting for shared IPC CUDA tensors #16854
VitalyFedyunin wants to merge 50 commits into
facebook-github-bot
left a comment
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
What was the original performance problem, in the end?

Test failures are real:

I still don't see a Note explaining the general strategy in the PR ;) For example, the most critical information to add is under what circumstances EDIT: Sorry, I didn't see your note about documentation being in progress :)
    CudaIPCSentData(std::string handle, int64_t offset, int64_t* counter_ptr)
        : handle(handle), offset(offset), counter_ptr(counter_ptr){};
    ~CudaIPCSentData();
    int64_t get();
I'd definitely appreciate a doc here
Needs tests
3ba803c to e146141
Wondering if this shouldn't go in the torch/csrc/cuda folder. I'm not too familiar with how the build works here, but it seems worth looking into, or maybe asking @zdevito about
This message can be even better if it offers some information about what this means, and advice about how to remediate the situation. A link to more detailed docs is often good enough.
This is almost assuredly failing lint
OK, finished reviewing the new stuff. Note that you want to make the dev docs discoverable. The best way to do it is to cite them from the relevant code, so that when people are reading the code they know where to go to get the info. We use the convention
@pytorchbot retest this please
@pytorchbot retest this please
@VitalyFedyunin merged this pull request in 5653a91.
Summary:
This is to fix pytorch#16141 and similar issues.
The idea is to track a reference to every shared CUDA Storage and deallocate memory only after a consumer process deallocates received Storage.
ezyang Done with cleanup. Same (insignificantly better) performance as in file-per-share solution, but handles millions of shared tensors easily. Note [ ] documentation in progress.
Pull Request resolved: pytorch#16854
Differential Revision: D13994490
Pulled By: VitalyFedyunin
fbshipit-source-id: 565148ec3ac4fafb32d37fde0486b325bed6fbd1
…ed (pytorch#19904)
Summary:
The mp notes are not updated after pytorch#16854. (The torch.multiprocessing page is.)
Pull Request resolved: pytorch#19904
Differential Revision: D15509661
Pulled By: soumith
fbshipit-source-id: 7c11e14a6c804498dda3adbf19710e63e6a564a0
This fixes #16141 and similar issues.
The idea is to track a reference to every shared CUDA Storage and to deallocate its memory only after the consumer process has deallocated the Storage it received.
@ezyang Done with cleanup. Performance is the same as (insignificantly better than) the file-per-share solution, but this approach handles millions of shared tensors easily. Note [ ] documentation in progress.
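
To make the idea in the description concrete, here is a minimal single-process sketch of the bookkeeping: a block is freed only once every consumer that received it has released it. This is not the code from this PR. The names `SharedBlockRegistry`, `share`, `release`, and `collect` are hypothetical, the handles are plain strings rather than CUDA IPC handles, and the counters live in an ordinary dict rather than in memory visible to several processes.

```python
from dataclasses import dataclass, field


@dataclass
class SharedBlockRegistry:
    """Toy producer-side bookkeeping for shared blocks (hypothetical names)."""

    # handle -> number of consumer references still alive
    counters: dict = field(default_factory=dict)
    # handles whose memory has been released by the producer
    freed: list = field(default_factory=list)

    def share(self, handle: str) -> str:
        """Producer sends a block: bump its counter, then hand out the handle."""
        self.counters[handle] = self.counters.get(handle, 0) + 1
        return handle

    def release(self, handle: str) -> None:
        """Consumer dropped its received copy: decrement the counter."""
        if self.counters.get(handle, 0) <= 0:
            raise ValueError(f"no outstanding reference for {handle!r}")
        self.counters[handle] -= 1

    def collect(self) -> None:
        """Producer frees only blocks with no outstanding consumer references."""
        for handle in [h for h, n in self.counters.items() if n == 0]:
            del self.counters[handle]
            self.freed.append(handle)


registry = SharedBlockRegistry()
registry.share("block-a")
registry.share("block-a")    # same block sent to a second consumer
registry.share("block-b")

registry.release("block-a")  # first consumer is done with block-a
registry.collect()
after_first = list(registry.freed)   # nothing can be freed yet

registry.release("block-a")
registry.release("block-b")
registry.collect()
after_all = sorted(registry.freed)   # both blocks are now reclaimable
```

The point of the sketch is the ordering constraint: `collect` never frees a block while any consumer still holds it, which is the property the description asks for. How the counters are shared between processes is outside what this toy models.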