Improve the Sparse matrix multiplication computational speed #16187 #16905
Conversation
@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
…#16905)

Summary: Instead of converting the sparse matrix from COO to CSR format as in the original implementation, my revision uses the COO format directly for sparse-dense matrix multiplication. On my Linux machine it is 5 times faster than the original code:

```
(original code)
SIZE: 15000 DENSITY: 0.01 DEVICE: cpu
torch: 0.39403 seconds
np: 0.00496674 seconds
torch/np: 79.3338
----------------------------------------
(my update)
SIZE: 15000 DENSITY: 0.01 DEVICE: cpu
torch: 0.0812583 seconds
np: 0.00501871 seconds
torch/np: 16.1911
```

Further code feedback and running-time tests are highly welcome. I will keep revising my code if needed.

Pull Request resolved: pytorch/pytorch#16905
Differential Revision: D14020095
Pulled By: ezyang
fbshipit-source-id: 4ab94075344a55b375f22421e97a690e682baed5
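The core idea of the change, operating on the COO triplets directly instead of first building a CSR index, can be sketched in NumPy as follows. This is an illustrative sketch, not the actual PyTorch C++ kernel; the function name `coo_spmm` and the `n_rows` parameter are my own choices for the example:

```python
import numpy as np

def coo_spmm(rows, cols, vals, dense, n_rows):
    """Multiply a sparse matrix given in COO form (rows, cols, vals)
    by a dense matrix, without converting to CSR first.

    Each nonzero A[r, c] = v contributes v * dense[c, :] to output
    row r; np.add.at accumulates contributions for repeated rows.
    """
    out = np.zeros((n_rows, dense.shape[1]), dtype=dense.dtype)
    np.add.at(out, rows, vals[:, None] * dense[cols])
    return out
```

Because the loop is a single scatter-add over the nonzeros, no sorting or row-pointer construction (the CSR conversion step) is needed.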
Hi @ezyang, @yf225 @fmassa, my PR was automatically closed.
Yes, the current commit has landed on master. For more work / changes, open a new PR. Thanks!
Hi @soumith, thank you for your reply. Is it possible to get more feedback on my code? Both of the error reports from the internal tests involve CUDA, but I only changed the CPU part of the implementation, so I didn't get very useful feedback from those tests.
Hi @musikisomorphie; our CI is a bit flaky, so one of the things I do before merging is check the failures and see if they are valid or not. In your case, the CUDA errors were flaky and not related to your PR, so I merged.
Thanks @ezyang, I am glad my code was merged. I misunderstood what @soumith said; I will keep contributing to PyTorch, it is quite fun.
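For context, a timing harness in the spirit of the SIZE/DENSITY benchmark quoted above can be sketched with SciPy and NumPy. This is not the script used in the PR; the `bench` helper, its default sizes, and the dense baseline are assumptions chosen so the example runs quickly:

```python
import time

import numpy as np
from scipy import sparse

def bench(size=2000, density=0.01, seed=0):
    """Time a sparse-dense matmul against a fully dense matmul.

    Returns (t_sparse, t_dense) in seconds; the two products are
    checked to agree numerically before returning.
    """
    rng = np.random.default_rng(seed)
    # Random sparse matrix in COO format, mirroring the benchmark setup.
    s = sparse.random(size, size, density=density, format="coo",
                      random_state=rng)
    d = rng.standard_normal((size, size))

    t0 = time.perf_counter()
    out_sparse = s @ d           # SciPy's sparse-dense product
    t_sparse = time.perf_counter() - t0

    t0 = time.perf_counter()
    out_dense = s.toarray() @ d  # dense baseline
    t_dense = time.perf_counter() - t0

    assert np.allclose(out_sparse, out_dense)
    return t_sparse, t_dense
```

Varying `size` and `density` reproduces the kind of comparison reported in the summary, though the absolute numbers depend on the machine and BLAS build.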