-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Bool tensor. Part 0: Boolean storage implementation #16810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Question: do we want to consider the bool tensor as holding 1bit data? E.g., by having |
|
From a quick look, what is the reason why |
This was done to minimize the amount of needed changes. If THCGenerateBoolType.h/THGenerateBoolType.h were included in THCGenerateAllTypes.h/THGenerateAllType.h, this change would me significantly bigger and harder to review. |
@fmassa , it's 1 byte, which is the same as numpy. |
|
I understand that it might be simpler to use |
|
@apaszke but that would mean that we could not do |
|
A couple more thoughts:
But I do agree that supporting 1bit (or 2bit? 4 bit?, 3 bit?) elements would be a much bigger endeavor. In particular, it would also mean changing all the support we currently have in TensorIterator and alike, which would not be easy. |
|
I buy @albanD's argument; there are use cases for 1 byte bools (even if we could magically also have 1-bit bools), and given our numpy affinity, it seems like the right choice to call 1-byte bools There's nothing precluding us from coming in later and supporting 1-bit bools via a packed representation or something (e.g. 1-bit quantization is a thing) that wouldn't break backwards compatibility. |
|
@albanD yeah in that case |
|
There is an advantage though @apaszke: we can have clear semantics on indexing with bool tensor. Now indexing with a byte tensor containing |
|
If the goal is just to be compatible with numpy bool and allow for better advanced indexing, do actually need to generate code a completely new scalar type (which looks like this PR is doing)? Can it be a small wrapper around ByteTensor with custom functions for numpy interface and so that it can be used by indexing? |
|
@albanD I don't really see how you'd make that work nicely at the C++ level. |
|
I'm not familiar enough with the way aten works so that was just a suggestion in case it was possible. |
|
@albanD well C++ is a little more complicated :). i.e. you'd have to make sure everything you need is virtual, which would slow down the main line path. It seems very likely to be a leaky abstraction. |
|
Ok, good to know ! |
|
@albanD I think the right thing to do is to only define minimal mathematical operators so the code size difference shouldn't be too big; people can |
|
looks like you may have a clang_tidy issue. |
aten/src/THC/THCTensorCopy.cu
Outdated
| }; | ||
|
|
||
| template <> | ||
| struct CopyOp <bool, bool> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't fully specialize this unless you rewrite the above to only have one template parameter. You can just partially specialize this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason being the above won't actually work if someone were to instantiate <non_bool, bool>.
torch/csrc/utils.h
Outdated
| PyInt_Check(object) ? PyInt_AsLong(object) : \ | ||
| (throw std::runtime_error("Could not parse real"), 0)) | ||
|
|
||
| #define THPUtils_unpackReal_BOOL(object) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need to define a version for py2 vs py3, bools are the same across versions. Just put this below, outside of the if/endif.
| #if defined(TH_REAL_IS_BYTE) || defined(TH_REAL_IS_CHAR) | ||
| memcpy(THWStorage_(data)(storage), src + offset, count); | ||
| #elif defined(TH_REAL_IS_BOOL) | ||
| // Because of ASAN checks, we have to manually move bytes instead of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you aren't moving bytes though, you are changing the the values. How about include the ASAN check that tripped this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes look good, nice work!
I think the test failures will go away if you merge in master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@izdeby has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@izdeby has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@izdeby is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: This is the first commit from a series of planned changes in order to add boolean tensors to PyTorch. The whole plan looks like this: 0. Storage Implementation (this change) 1. Tensor Creation. 2. Tensor Conversions. 3. Tensor Indexing. 4. Tensor Operations. 5. Back compatibility related changes. This feature was requested by the community: pytorch#4764 pytorch#4219 pytorch#4288 **Change**: Added boolean type to the Storage class for CPU and CUDA backends. **Tested via**: 1. unit tests 2. running this: -> import torch -> torch.BoolStorage <class 'torch.BoolStorage'> -> torch.cuda.BoolStorage <class 'torch.cuda.BoolStorage'> Pull Request resolved: pytorch#16810 Differential Revision: D14087246 fbshipit-source-id: 12f22b897c33defddad1c967e4cf5ae764b85d13
Summary: This is the first commit from a series of planned changes in order to add boolean tensors to PyTorch. The whole plan looks like this: 0. Storage Implementation (this change) 1. Tensor Creation. 2. Tensor Conversions. 3. Tensor Indexing. 4. Tensor Operations. 5. Back compatibility related changes. This feature was requested by the community: pytorch/pytorch#4764 pytorch/pytorch#4219 pytorch/pytorch#4288 **Change**: Added boolean type to the Storage class for CPU and CUDA backends. **Tested via**: 1. unit tests 2. running this: -> import torch -> torch.BoolStorage <class 'torch.BoolStorage'> -> torch.cuda.BoolStorage <class 'torch.cuda.BoolStorage'> Pull Request resolved: pytorch/pytorch#16810 Reviewed By: gchanan Differential Revision: D14087246 Pulled By: izdeby fbshipit-source-id: 042642ced1cb0fd1bb6bff05f9ca871a5c54ee5e
This is the first commit from a series of planned changes in order to add boolean tensors to PyTorch. The whole plan looks like this:
This feature was requested by the community:
#4764
#4219
#4288
Change:
Added boolean type to the Storage class for CPU and CUDA backends.
Tested via:
-> import torch
-> torch.BoolStorage
<class 'torch.BoolStorage'>
-> torch.cuda.BoolStorage
<class 'torch.cuda.BoolStorage'>