Bool tensor. Part 0: Boolean storage implementation #16810

izdeby · 2019-02-06T18:44:08Z

This is the first commit from a series of planned changes in order to add boolean tensors to PyTorch. The whole plan looks like this:

0. Storage Implementation (this change)
1. Tensor Creation.
2. Tensor Conversions.
3. Tensor Indexing.
4. Tensor Operations.
5. Back compatibility related changes.

This feature was requested by the community:
#4764
#4219
#4288

Change:
Added boolean type to the Storage class for CPU and CUDA backends.

Tested via:

1. unit tests
2. running this:
-> import torch
-> torch.BoolStorage
<class 'torch.BoolStorage'>
-> torch.cuda.BoolStorage
<class 'torch.cuda.BoolStorage'>

fmassa · 2019-02-06T18:52:28Z

Question: do we want to consider the bool tensor as holding 1bit data? E.g., by having uint8_t and packing it as 8 element tensor?
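To make the packing idea concrete, here is a minimal sketch in plain Python of storing 8 boolean elements per byte. The helper names `pack_bools` and `unpack_bools` and the least-significant-bit-first layout are assumptions chosen for illustration; nothing here reflects code in this PR.

```python
def pack_bools(flags):
    """Pack a sequence of booleans into bytes, 8 flags per byte (LSB first)."""
    packed = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def unpack_bools(packed, count):
    """Recover `count` booleans from a buffer produced by pack_bools."""
    return [bool((packed[i // 8] >> (i % 8)) & 1) for i in range(count)]


# Nine flags fit in two bytes instead of nine.
flags = [True, False, True, True, False, False, False, False, True]
packed = pack_bools(flags)
restored = unpack_bools(packed, len(flags))
```

The trade-off is that a single element is no longer addressable on its own: every read or write needs a shift and a mask.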

albanD · 2019-02-06T18:53:52Z

From a quick look, what is the reason why THCGenerateBoolType.h did not make it into THGenerateAllTypes.h ? Are there cases where you want to generate all other types but not bool?

izdeby · 2019-02-06T19:24:40Z

> From a quick look, what is the reason why THCGenerateBoolType.h did not make it into THGenerateAllTypes.h ? Are there cases where you want to generate all other types but not bool?

This was done to minimize the amount of needed changes. If THCGenerateBoolType.h/THGenerateBoolType.h were included in THCGenerateAllTypes.h/THGenerateAllTypes.h, this change would be significantly bigger and harder to review.
As an example, take a look at aten/src/TH/THTensor.h. If I had included the bool type in *GenerateAllTypes, I would have had to implement all the Math, Random and Convolution functionality. This will be done, but later.

izdeby · 2019-02-07T00:05:30Z

> Question: do we want to consider the bool tensor as holding 1bit data? E.g., by having uint8_t and packing it as 8 element tensor?

@fmassa, it's 1 byte, which is the same as numpy.

apaszke · 2019-02-07T20:33:27Z

I understand that it might be simpler to use uint8_t to represent a single element for now, but if we wanted to change the memory representation of those tensors in the future it would be a breaking change (people might have kernels), so it would be good to get it right sooner rather than later.

albanD · 2019-02-07T21:00:06Z

@apaszke but that would mean that we could not do from_numpy on numpy.bool into this type without a copy? Which is what at least one of the issues is about, no?
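The zero-copy point can be illustrated without PyTorch or NumPy. The sketch below uses the standard library's `memoryview` as a stand-in: when each flag occupies one byte, the same memory can be reinterpreted as booleans in place. It is an analogy under that assumption, not a description of how from_numpy works.

```python
# A buffer with one byte per flag, standing in for memory owned by another library.
raw = bytearray([1, 0, 1, 1])

# Reinterpret the same memory as booleans; no bytes are copied.
view = memoryview(raw).cast("?")
as_list = view.tolist()

# Writes through the view land in the original buffer.
view[1] = True
```

With a bit-packed layout the element boundaries no longer line up with bytes, so sharing memory this way would need a conversion pass first.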

fmassa · 2019-02-07T21:35:37Z

A couple more thoughts:

- if we actually want binary masks to be part of autograd (e.g., in order to solve problems like "Log and indexing do not commute correctly with respect to gradient" #12986) or for the mask in batched tensors, it might be useful to consider a 1-bit representation for memory savings.
- having some kind of support for numeric representations with < 1 byte could be interesting, as it could potentially be a way of implementing low-precision numbers by packing 2/4/8 elements into a single byte

But I do agree that supporting 1bit (or 2bit? 4 bit?, 3 bit?) elements would be a much bigger endeavor. In particular, it would also mean changing all the support we currently have in TensorIterator and alike, which would not be easy.

gchanan · 2019-02-07T21:44:35Z

I buy @albanD's argument; there are use cases for 1 byte bools (even if we could magically also have 1-bit bools), and given our numpy affinity, it seems like the right choice to call 1-byte bools torch.bool.

There's nothing precluding us from coming in later and supporting 1-bit bools via a packed representation or something (e.g. 1-bit quantization is a thing) that wouldn't break backwards compatibility.

apaszke · 2019-02-07T21:45:29Z

@albanD yeah in that case from_numpy won't work on those tensors (it doesn't work today, and I don't think it's a terrible loss). I really think that the memory savings are quite nice, and otherwise we're just duplicating torch.ByteTensor with no clear distinction

fmassa · 2019-02-07T22:10:05Z

There is an advantage though @apaszke: we can have clear semantics on indexing with bool tensor. Now indexing with a byte tensor containing [0, 1, 2] works the same as if it was [0, 1, 1], which can be confusing
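The ambiguity is easy to see in a small pure-Python model. The two helpers below are hypothetical and only mimic the two possible readings of `[0, 1, 2]`; they are not PyTorch's indexing code.

```python
def select_by_byte_mask(values, mask):
    """Keep values[i] wherever mask[i] is nonzero (byte-mask reading)."""
    if len(values) != len(mask):
        raise ValueError("mask must have the same length as values")
    return [v for v, m in zip(values, mask) if m != 0]


def select_by_index(values, indices):
    """Pick values at the given integer positions (index reading)."""
    return [values[i] for i in indices]


values = ["a", "b", "c"]

# Read as a byte mask, [0, 1, 2] behaves exactly like [0, 1, 1].
by_mask = select_by_byte_mask(values, [0, 1, 2])
by_mask_01 = select_by_byte_mask(values, [0, 1, 1])

# Read as integer indices, the same list selects something else entirely.
by_index = select_by_index(values, [0, 1, 2])
```

A dedicated boolean type removes the guesswork: a bool tensor is always a mask, and an integer tensor is always a list of positions.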

albanD · 2019-02-08T10:44:12Z

If the goal is just to be compatible with numpy bool and allow for better advanced indexing, do we actually need to generate code for a completely new scalar type (which is what this PR looks like it is doing)? Could it be a small wrapper around ByteTensor, with custom functions for the numpy interface, so that it can be used by indexing?

gchanan · 2019-02-08T14:17:28Z

@albanD I don't really see how you'd make that work nicely at the C++ level.

albanD · 2019-02-08T15:49:48Z

I'm not familiar enough with the way aten works so that was just a suggestion in case it was possible.
Couldn't an implementation inherit from another one?

gchanan · 2019-02-08T16:09:26Z

@albanD well C++ is a little more complicated :). i.e. you'd have to make sure everything you need is virtual, which would slow down the main line path. It seems very likely to be a leaky abstraction.

albanD · 2019-02-08T16:26:30Z

Ok, good to know!
Larger binaries it is then!

gchanan · 2019-02-08T20:57:19Z

@albanD I think the right thing to do is to only define minimal mathematical operators so the code size difference shouldn't be too big; people can .to(torch.uint8) if they really want to do math.

gchanan · 2019-02-08T22:16:30Z

looks like you may have a clang_tidy issue.

gchanan · 2019-02-12T22:34:16Z

aten/src/THC/THCTensorCopy.cu

+};
+
+template <>
+struct CopyOp <bool, bool> {


I wouldn't fully specialize this unless you rewrite the above to only have one template parameter. You can just partially specialize this.

The reason being the above won't actually work if someone were to instantiate <non_bool, bool>.

gchanan · 2019-02-12T22:57:50Z

torch/csrc/utils.h

    PyInt_Check(object) ? PyInt_AsLong(object) :                               \
    (throw std::runtime_error("Could not parse real"), 0))
+
+#define THPUtils_unpackReal_BOOL(object)                                       \


you don't need to define a version for py2 vs py3, bools are the same across versions. Just put this below, outside of the if/endif.

gchanan · 2019-02-13T16:55:31Z

torch/csrc/generic/StorageMethods.cpp

 #if defined(TH_REAL_IS_BYTE) || defined(TH_REAL_IS_CHAR)
  memcpy(THWStorage_(data)(storage), src + offset, count);
 #elif defined(TH_REAL_IS_BOOL)
+  // Because of ASAN checks, we have to manually move bytes instead of


you aren't moving bytes though, you are changing the values. How about including the ASAN check that tripped this?
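The distinction between moving bytes and changing values can be sketched as follows. This is a plain-Python illustration that assumes the bool branch maps every nonzero source byte to true; `decode_bool_buffer` is a hypothetical name, and the actual ASAN check is not shown in this thread.

```python
def decode_bool_buffer(src):
    """Map each raw byte to a canonical boolean byte: 0 stays 0, anything else becomes 1."""
    return bytes(1 if b != 0 else 0 for b in src)


# A plain byte copy would preserve 2 and 255; decoding normalizes them to 1.
raw = bytes([0, 1, 2, 255])
copied = bytes(raw)
decoded = decode_bool_buffer(raw)
```

Under this assumption the output differs from the input wherever a source byte is neither 0 nor 1, which is why "move bytes" is a misleading description.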

gchanan

changes look good, nice work!

I think the test failures will go away if you merge in master.

facebook-github-bot

@izdeby has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot

@izdeby is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Summary:
This is the first commit from a series of planned changes in order to add boolean tensors to PyTorch. The whole plan looks like this:

0. Storage Implementation (this change)
1. Tensor Creation.
2. Tensor Conversions.
3. Tensor Indexing.
4. Tensor Operations.
5. Back compatibility related changes.

This feature was requested by the community:
pytorch/pytorch#4764
pytorch/pytorch#4219
pytorch/pytorch#4288

**Change**:
Added boolean type to the Storage class for CPU and CUDA backends.

**Tested via**:
1. unit tests
2. running this:
-> import torch
-> torch.BoolStorage
<class 'torch.BoolStorage'>
-> torch.cuda.BoolStorage
<class 'torch.cuda.BoolStorage'>

Pull Request resolved: pytorch/pytorch#16810

Reviewed By: gchanan

Differential Revision: D14087246

Pulled By: izdeby

fbshipit-source-id: 042642ced1cb0fd1bb6bff05f9ca871a5c54ee5e

ssnl reviewed Feb 6, 2019

izdeby changed the title ~~Bool tensor~~ [WIP] Bool tensor Feb 6, 2019

izdeby changed the title ~~[WIP] Bool tensor~~ Bool tensor. Part 0: Boolean storage implementation Feb 7, 2019

izdeby changed the title ~~Bool tensor. Part 0: Boolean storage implementation~~ [Not ready]Bool tensor. Part 0: Boolean storage implementation Feb 7, 2019

gchanan reviewed Feb 7, 2019

gchanan reviewed Feb 7, 2019

izdeby changed the title ~~[Not ready]Bool tensor. Part 0: Boolean storage implementation~~ Bool tensor. Part 0: Boolean storage implementation Feb 8, 2019

gchanan reviewed Feb 8, 2019

izdeby changed the title ~~Bool tensor. Part 0: Boolean storage implementation~~ [Not ready] Bool tensor. Part 0: Boolean storage implementation Feb 9, 2019

izdeby changed the title ~~[Not ready] Bool tensor. Part 0: Boolean storage implementation~~ Bool tensor. Part 0: Boolean storage implementation Feb 11, 2019

gchanan reviewed Feb 11, 2019

gchanan reviewed Feb 12, 2019

gchanan reviewed Feb 13, 2019

gchanan approved these changes Feb 13, 2019

gchanan approved these changes Feb 14, 2019

facebook-github-bot reviewed Feb 14, 2019

facebook-github-bot reviewed Feb 15, 2019

izdeby force-pushed the BoolTensor branch from 2f6105d to ace40a8 Compare February 16, 2019 19:16

izdeby force-pushed the BoolTensor branch from ace40a8 to 1a5fccb Compare February 18, 2019 00:28

facebook-github-bot reviewed Feb 18, 2019

izdeby force-pushed the BoolTensor branch from 1a5fccb to d1128bf Compare February 18, 2019 23:24

izdeby force-pushed the BoolTensor branch from d1128bf to 6318e21 Compare February 18, 2019 23:27

izdeby force-pushed the BoolTensor branch from 6318e21 to db45f53 Compare February 19, 2019 01:11

facebook-github-bot closed this in 444039c Feb 19, 2019

ezyang added the merged label Jun 25, 2019

myedibleenso mentioned this pull request Apr 20, 2022

Cast mask tensor to bool in torch.where kmkurn/pytorch-crf#95

Closed
