Improve error message for torch.binomial enforcing float inputs #157658
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157658
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 6d48465 with merge base 524e827.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "release notes: cpp" |
|
@pytorchbot label "module: error checking" |
    TORCH_CHECK(
        count.scalar_type() == prob.scalar_type(),
        "Binomial function arguments count and prob must have same datatype of type Float, got: count = ",
        count.scalar_type(), ", prob = ", prob.scalar_type());
nit: how about something like the following, to be consistent with multinomial's error checking:

    TORCH_CHECK(
        at::isFloatingType(count.scalar_type()),
        "binomial only supports floating-point dtypes for count, got: ",
        count.scalar_type());
    TORCH_CHECK(
        at::isFloatingType(prob.scalar_type()),
        "binomial only supports floating-point dtypes for prob, got: ",
        prob.scalar_type());
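For context, a quick illustration of the multinomial check this suggestion mirrors, as seen from Python (a sketch; the exact message text may vary by version):

    import torch

    probs = torch.tensor([1, 2, 3])  # integer dtype is rejected up front
    # Expected failure, e.g.:
    # RuntimeError: multinomial only supports floating-point dtypes for input, got: Long
    torch.multinomial(probs, 2)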
Thanks for the fix! Option 2 sounds good, added a small nit
Could we also add a test?
Hi @soulitzer, sure, no problem. I'll also work on making the error checking more consistent with multinomial's error checking and make another commit.
Hi @soulitzer, I have added a test and made the error more consistent with multinomial's error checking.
This is probably fine, but it would be nice to add the same error handling for CUDA as well (not sure if test_distributions supports dtypes).
        at::isFloatingType(count.scalar_type()),
        "binomial only supports floating-point dtypes for count, got: ",
        count.scalar_type());
    TORCH_CHECK(
If some values are unsupported, then ValueError is more appropriate than RuntimeError, don't you think?
    - TORCH_CHECK(
    + TORCH_CHECK_VALUE(
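For illustration, TORCH_CHECK_VALUE surfaces in Python as a ValueError rather than a RuntimeError; a sketch of what a caller would then catch (assumes the check lands as TORCH_CHECK_VALUE, with the message text from the diff above):

    import torch

    count = torch.tensor([10, 10])   # int64, unsupported
    prob = torch.tensor([0.5, 0.5])
    try:
        torch.binomial(count, prob)
    except ValueError as e:  # assumes TORCH_CHECK_VALUE; plain TORCH_CHECK raises RuntimeError
        print(e)  # e.g. "binomial only supports floating-point dtypes for count, got: Long"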
Hmm, I hope CI will fail on this new check, because an integer count seems to be fine.
Hi @malfet, I realized I used the wrong binomial function in my previous test. I have fixed this and created a new test. I wasn't sure where to put the test for torch.binomial, as I didn't find other tests covering this function, so I put it in the test directory.
@michellemadubuike Maybe test/distributions/test_distributions.py would be a good place for this test
@soulitzer Thanks, I moved the test to test/distributions/test_distributions.py
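For reference, a minimal sketch of what such a test could look like in test/distributions/test_distributions.py (the class name, assertion style, and message text here are assumptions, not the committed test):

    import unittest
    import torch

    class TestBinomialDtypeCheck(unittest.TestCase):
        def test_binomial_rejects_non_float_inputs(self):
            count = torch.tensor([10, 10], dtype=torch.int64)
            prob = torch.tensor([1, 1], dtype=torch.int64)
            # The new check should fire before TensorIterator's dtype promotion
            with self.assertRaisesRegex(
                (RuntimeError, ValueError),
                "binomial only supports floating-point dtypes",
            ):
                torch.binomial(count, prob)

    if __name__ == "__main__":
        unittest.main()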
Thanks!
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
@soulitzer, I realized torch.binomial is not supported for CUDA tensors. I added a test skip if CUDA is used in my last commit. This should hopefully resolve the CUDA failures in the CI builds.
    assert (vals == 0.0).sum() > 4000
    assert (vals == 1.0).sum() > 4000

    @unittest.skipIf(TEST_CUDA, "CUDA found")
Should be fine without this skip since none of the tensors in the test are created on CUDA.
Can you revert this change, @michellemadubuike? The failing test should be unrelated.
@soulitzer Sure, no problem
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes #157195
Summary:
Fixed issue #157195 by adding a new error message for torch.binomial in aten/src/ATen/native/Distributions.cpp.
Explanation
According to the issue,
the reported error is:

    RuntimeError: Found dtype Float but expected Long

It looks like we are getting a TensorIterator error rather than a binomial function error. Since the error is coming from pytorch/aten/src/ATen/TensorIterator.cpp, it seems like it is trying to align the tensor data to the same datatype for smooth tensor computations instead of giving a binomial function error.
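A minimal sketch reproducing the misleading pre-fix error (which argument was Float vs. Long here is an assumption; the issue involved mismatched dtypes):

    import torch

    count = torch.tensor([10.0, 10.0])              # float32
    prob = torch.tensor([1, 1], dtype=torch.int64)  # long
    # Before the fix, TensorIterator's dtype alignment failed first, e.g.:
    # RuntimeError: Found dtype Float but expected Long
    torch.binomial(count, prob)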
I tried using both arguments as longs and both as ints and got the right binomial function error. But when the two arguments have different datatypes, the TensorIterator.cpp error comes back as it tries to align the datatypes:

    RuntimeError: Found dtype Float but expected Long

I then tried to find where the NotImplementedError was raised and found it in pytorch/aten/src/ATen/Dispatch.h, lines 193-211. The AT_DISPATCH_SWITCH macro takes a tensor and its datatype and checks whether that datatype matches the supported ones; if not, we get the "not implemented" error. Unfortunately, AT_DISPATCH_SWITCH uses the common_dtype from TensorIterator in order to run, so TensorIterator.cpp runs before AT_DISPATCH_SWITCH.

Summary: We are getting the wrong error message because TensorIterator.cpp gets called and errors out on the tensor datatype mismatch before we can reach the right error message in Dispatch.h for torch.binomial not supporting that datatype.
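For contrast, a sketch of the matching-dtype case, where dispatch is reached and produces the operator-specific error (the kernel name in the message is an assumption):

    import torch

    count = torch.tensor([10, 10])  # both int64
    prob = torch.tensor([1, 1])
    # With matching but unsupported dtypes, dispatch gives the "right" error, e.g.:
    # NotImplementedError: "binomial_cpu" not implemented for 'Long'
    torch.binomial(count, prob)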
Options for the Fix
Option 1: Make the error message in TensorIterator.cpp more general so it also makes sense for torch.binomial, along the lines of:

    RuntimeError: "Tensor datatypes ", op.target_dtype, " and ", common_dtype_, " are different"

Option 2: Add an error message for the binomial function's datatype mismatch before the TensorIterator.cpp error gets raised.

Although Option 1 seemed easier, I think Option 2 is better: it is specific to the binomial function, while Option 1 would affect all tensors with a datatype mismatch.
This PR applies the fix for Option 2.

After the fix:
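A sketch of the post-fix behavior (the exception type depends on whether TORCH_CHECK or TORCH_CHECK_VALUE landed; the message follows the diff above):

    import torch

    count = torch.tensor([5, 5])        # long
    prob = torch.tensor([0.5, 0.5])     # float32
    # Now fails fast with the binomial-specific message, e.g.:
    # ValueError: binomial only supports floating-point dtypes for count, got: Long
    torch.binomial(count, prob)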
cc @malfet