Preemptively test for out-of-order length.#13933
Conversation
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
|
"It would be good to understand where the segfault comes" I'm not an expert in C++ memory, but it seems to be an issue in how the stack gets unwound when the exception is thrown. The error message suggests a double-free. Stepping through in the debugger I see that it reaches the expected AT_CHECK, satisfies the condition, and throws the error in the original code as expected, but then segfaults when returning. I think failing faster is safer because there's nothing yet allocated that would need to be freed when the exception is thrown. |
Summary: torch.nn.utils.rnn.pack_padded_sequence segment fault if not in decreasing order pytorch#13324 We were seeing this segfault on throw, pre-emptively checking avoids this: *** Error in `/home/bvaughan/anaconda3/bin/python': double free or corruption (!prev): 0x00005555566e7510 *** Test Plan: Added unit test based on example provided in issue. Reviewers: Subscribers: Tasks: Tags:
4effde1 to
a14623f
Compare
|
Did ASAN report anything on this test? It's possible it is a stack unwinding problem but I don't see anything on the stack in this function that should cause the problem. ASAN might be able to say something better. |
|
Here's the full output running with ASAN, I haven't read through it in detail yet: In [6]: pack_padded_sequence(b_a, [22, 25])==213533==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60f000027c90 at pc 0x7f510d37d464 bp 0x7fff530ffab0 sp 0x7fff530ffaa8 0x60f000027c90 is located 0 bytes to the right of 176-byte region [0x60f000027be0,0x60f000027c90) SUMMARY: AddressSanitizer: heap-buffer-overflow caffe2/aten/src/ATen/native/PackedSequence.cpp:67 in at::native::_pack_padded_sequence(at::Tensor const&, at::Tensor const&, bool) |
|
I'm a bit confused that the original error was "double free or corruption," and when I stepped through the code it got to line 71 to throw before segfaulting ( |
|
Hi @nairbv and @apaszke, I think that I know what might be causing the exception. In Line 33, we have: at::Tensor batch_sizes_t = at::empty(lengths[0], _lengths.options());If the parameter So this code should be something like (I have no skill using the ATEN library :( ) at::Tensor batch_sizes_t = at::empty(at::max(lengths), _lengths.options());(I do no if am I right? Does it make sense for you guys? I did not test anything, just pass my eyes through the code, so the probability of my analysis is wrong is really high. Hope I could help. |
|
@igormq ah, yes, that does make sense, and much better explains why this PR avoided the segfault. Thanks |
There was a problem hiding this comment.
The issue, as @igormq discovered and @nairbv explained to me offline was that we were performing the check for sortedness too late and indexing pass the end of the batch_sizes_t tensor as a result, causing a buffer overflow.
this looks good to me, I had two minor comments in the code (please read them!). I think it would be nice in the future to remove the sortedness requirement but that needs discussion.
| (*batch_sizes++) = current_batch_size; | ||
| } | ||
| prev_l = l; | ||
| } else if (prev_l > l) { |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
| AT_CHECK(lengths[batch_size - 1] > 0, | ||
| "Length of all samples has to be greater than 0, but found an element " | ||
| "in 'lengths' that is <= 0"); | ||
| for(auto i = 0; i < batch_size - 1; i++ ) { |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
found reason for segfault (see other comments)
facebook-github-bot
left a comment
There was a problem hiding this comment.
@nairbv is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: torch.nn.utils.rnn.pack_padded_sequence segment fault if not in decreasing order #13324 We were seeing this segfault on throw, pre-emptively checking avoids this: *** Error in `/home/bvaughan/anaconda3/bin/python': double free or corruption (!prev): 0x00005555566e7510 *** Pull Request resolved: pytorch/pytorch#13933 Differential Revision: D13090389 Pulled By: nairbv fbshipit-source-id: 6f6b319e74cb55830be799e9c46bc33aa59256d8
Summary: torch.nn.utils.rnn.pack_padded_sequence segment fault if not in decreasing order pytorch#13324 We were seeing this segfault on throw, pre-emptively checking avoids this: *** Error in `/home/bvaughan/anaconda3/bin/python': double free or corruption (!prev): 0x00005555566e7510 *** Pull Request resolved: pytorch#13933 Differential Revision: D13090389 Pulled By: nairbv fbshipit-source-id: 6f6b319e74cb55830be799e9c46bc33aa59256d8
Summary:
torch.nn.utils.rnn.pack_padded_sequence segment fault if not in
decreasing order #13324
We were seeing this segfault on throw, pre-emptively checking avoids
this:
*** Error in `/home/bvaughan/anaconda3/bin/python': double free or corruption (!prev): 0x00005555566e7510 ***
Test Plan:
Added unit test based on example provided in issue.
Reviewers:
Subscribers:
Tasks:
Tags: