Bool tensor. Part 0: Boolean storage implementation#16810
Conversation
|
Question: do we want to consider the bool tensor as holding 1bit data? E.g., by having |
|
From a quick look, what is the reason why |
This was done to minimize the amount of needed changes. If THCGenerateBoolType.h/THGenerateBoolType.h were included in THCGenerateAllTypes.h/THGenerateAllType.h, this change would me significantly bigger and harder to review. |
@fmassa , it's 1 byte, which is the same as numpy. |
|
I understand that it might be simpler to use |
|
@apaszke but that would mean that we could not do |
|
A couple more thoughts:
But I do agree that supporting 1bit (or 2bit? 4 bit?, 3 bit?) elements would be a much bigger endeavor. In particular, it would also mean changing all the support we currently have in TensorIterator and alike, which would not be easy. |
|
I buy @albanD's argument; there are use cases for 1 byte bools (even if we could magically also have 1-bit bools), and given our numpy affinity, it seems like the right choice to call 1-byte bools There's nothing precluding us from coming in later and supporting 1-bit bools via a packed representation or something (e.g. 1-bit quantization is a thing) that wouldn't break backwards compatibility. |
|
@albanD yeah in that case |
|
There is an advantage though @apaszke: we can have clear semantics on indexing with bool tensor. Now indexing with a byte tensor containing |
|
If the goal is just to be compatible with numpy bool and allow for better advanced indexing, do actually need to generate code a completely new scalar type (which looks like this PR is doing)? Can it be a small wrapper around ByteTensor with custom functions for numpy interface and so that it can be used by indexing? |
|
@albanD I don't really see how you'd make that work nicely at the C++ level. |
|
I'm not familiar enough with the way aten works so that was just a suggestion in case it was possible. |
|
@albanD well C++ is a little more complicated :). i.e. you'd have to make sure everything you need is virtual, which would slow down the main line path. It seems very likely to be a leaky abstraction. |
|
Ok, good to know ! |
|
@albanD I think the right thing to do is to only define minimal mathematical operators so the code size difference shouldn't be too big; people can |
|
looks like you may have a clang_tidy issue. |
| }; | ||
|
|
||
| template <> | ||
| struct CopyOp <bool, bool> { |
There was a problem hiding this comment.
I wouldn't fully specialize this unless you rewrite the above to only have one template parameter. You can just partially specialize this.
There was a problem hiding this comment.
The reason being the above won't actually work if someone were to instantiate <non_bool, bool>.
| PyInt_Check(object) ? PyInt_AsLong(object) : \ | ||
| (throw std::runtime_error("Could not parse real"), 0)) | ||
|
|
||
| #define THPUtils_unpackReal_BOOL(object) \ |
There was a problem hiding this comment.
you don't need to define a version for py2 vs py3, bools are the same across versions. Just put this below, outside of the if/endif.
| #if defined(TH_REAL_IS_BYTE) || defined(TH_REAL_IS_CHAR) | ||
| memcpy(THWStorage_(data)(storage), src + offset, count); | ||
| #elif defined(TH_REAL_IS_BOOL) | ||
| // Because of ASAN checks, we have to manually move bytes instead of |
There was a problem hiding this comment.
you aren't moving bytes though, you are changing the the values. How about include the ASAN check that tripped this?
gchanan
left a comment
There was a problem hiding this comment.
changes look good, nice work!
I think the test failures will go away if you merge in master.
facebook-github-bot
left a comment
There was a problem hiding this comment.
@izdeby has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
facebook-github-bot
left a comment
There was a problem hiding this comment.
@izdeby has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
facebook-github-bot
left a comment
There was a problem hiding this comment.
@izdeby is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: This is the first commit from a series of planned changes in order to add boolean tensors to PyTorch. The whole plan looks like this: 0. Storage Implementation (this change) 1. Tensor Creation. 2. Tensor Conversions. 3. Tensor Indexing. 4. Tensor Operations. 5. Back compatibility related changes. This feature was requested by the community: pytorch#4764 pytorch#4219 pytorch#4288 **Change**: Added boolean type to the Storage class for CPU and CUDA backends. **Tested via**: 1. unit tests 2. running this: -> import torch -> torch.BoolStorage <class 'torch.BoolStorage'> -> torch.cuda.BoolStorage <class 'torch.cuda.BoolStorage'> Pull Request resolved: pytorch#16810 Differential Revision: D14087246 fbshipit-source-id: 12f22b897c33defddad1c967e4cf5ae764b85d13
Summary: This is the first commit from a series of planned changes in order to add boolean tensors to PyTorch. The whole plan looks like this: 0. Storage Implementation (this change) 1. Tensor Creation. 2. Tensor Conversions. 3. Tensor Indexing. 4. Tensor Operations. 5. Back compatibility related changes. This feature was requested by the community: pytorch/pytorch#4764 pytorch/pytorch#4219 pytorch/pytorch#4288 **Change**: Added boolean type to the Storage class for CPU and CUDA backends. **Tested via**: 1. unit tests 2. running this: -> import torch -> torch.BoolStorage <class 'torch.BoolStorage'> -> torch.cuda.BoolStorage <class 'torch.cuda.BoolStorage'> Pull Request resolved: pytorch/pytorch#16810 Reviewed By: gchanan Differential Revision: D14087246 Pulled By: izdeby fbshipit-source-id: 042642ced1cb0fd1bb6bff05f9ca871a5c54ee5e
Summary: This is the first commit from a series of planned changes in order to add boolean tensors to PyTorch. The whole plan looks like this: 0. Storage Implementation (this change) 1. Tensor Creation. 2. Tensor Conversions. 3. Tensor Indexing. 4. Tensor Operations. 5. Back compatibility related changes. This feature was requested by the community: pytorch#4764 pytorch#4219 pytorch#4288 **Change**: Added boolean type to the Storage class for CPU and CUDA backends. **Tested via**: 1. unit tests 2. running this: -> import torch -> torch.BoolStorage <class 'torch.BoolStorage'> -> torch.cuda.BoolStorage <class 'torch.cuda.BoolStorage'> Pull Request resolved: pytorch#16810 Reviewed By: gchanan Differential Revision: D14087246 Pulled By: izdeby fbshipit-source-id: 042642ced1cb0fd1bb6bff05f9ca871a5c54ee5e
This is the first commit from a series of planned changes in order to add boolean tensors to PyTorch. The whole plan looks like this:
This feature was requested by the community:
#4764
#4219
#4288
Change:
Added boolean type to the Storage class for CPU and CUDA backends.
Tested via:
-> import torch
-> torch.BoolStorage
<class 'torch.BoolStorage'>
-> torch.cuda.BoolStorage
<class 'torch.cuda.BoolStorage'>