Migrate crossKernel from THC to ATen (CUDA) #60039
Conversation
💊 CI failures summary and remediations

As of commit 697c739 (more details on the Dr. CI page and at hud.pytorch.org/pr/60039):

🕵️ 3 new failures recognized by patterns. The following CI failures do not appear to be due to upstream breakages:
I measure a performance improvement here:
    int64_t x1stride, int64_t x2stride) {
  const auto N = iter.numel();
  auto offset_calculator = make_element_offset_calculator<3>(iter);
  TORCH_INTERNAL_ASSERT(N > 0 && N <= std::numeric_limits<int32_t>::max());
Is this check enough? E.g. if you have a tensor with fewer than INT_MAX elements, but it's not memory-dense, the maximum offset can still exceed INT_MAX (or UINT_MAX).
I could assert iter.can_use_32bit_indexing(), but that's a more costly check, and it's guaranteed anyway by cross_impl going through with_32bit_indexing.
Then with_32bit_indexing should guarantee that N is less than INT_MAX? Or is this check here to protect people calling launch_cross_kernel directly rather than going through cross_impl? In that case it should probably be TORCH_INTERNAL_ASSERT_DEBUG_ONLY (it's unlikely anyone will call launch_cross_kernel directly).
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Ref #24507 (There doesn't seem to be an actual issue for cross)
This also moves the remaining operator functors in THCTensorMathPointwise.cuh to SparseCUDATensorMath.cu, which is the only file using them.