Fix output_nr not incremented correctly#4812
Conversation
|
Sorry about miscommunicating; can we get a hard coded test for this particular regression? (@apaszke, you think this would be good to have, right?) |
|
Ouch. It would be good, but testing all configurations can make the tests a lot slower (2^{number of inputs} is quite a lot) 😕 |
|
While updating an existing test to cover this case, I found another bug: This is caused by this check here only covering I attempted to fix by including all params in the check, but |
|
Ick. THNN bindings: the gift that keeps giving X( |
…meters if any param (not just weight) requires grad in parse_nn.py
ee63332 to
5e81cdd
Compare
…arameters with only bias requiring grad
4adf61b to
6f30587
Compare
|
This is ready for review now! There are changes to all THNN functions that update more than 1 parameters in
|
| "bias tensor has to be contiguous"); | ||
| input = THCTensor_(newContiguous)(state, input); | ||
| weight = THCTensor_(newContiguous)(state, weight); | ||
| bias = bias ? THCTensor_(newContiguous)(state, bias) : bias; |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
| } else if (gradBias != NULL) { | ||
| nOutputPlane = THCTensor_(size)(state, gradBias, 0); | ||
| } else { | ||
| return; |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
| "expected for weight, but got: %s"); | ||
| THArgCheck(THCTensor_(isContiguous)(state, weight), 4, | ||
| "weight tensor has to be contiguous"); | ||
| THArgCheck(!bias || THCTensor_(isContiguous)(state, bias), 5, |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
apaszke
left a comment
There was a problem hiding this comment.
Not a full review. Just noticed a few things
| } else if (dim == 5) { | ||
| return at::thnn_conv_transpose3d( | ||
| input, weight, bias, | ||
| input, weight, kernel_size, bias, |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
| "2D or 4D weight tensor expected, but got: %s"); | ||
| if (bias != NULL) { | ||
| THCUNN_check_dim_size(state, bias, 1, 0, weight->size[0]); | ||
| } |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
| THCUNN_assertSameGPU(state, 5, input, gradOutput, gradWeight, gradBias, columns, ones); | ||
| if (gradWeight) { | ||
| THArgCheck(THCTensor_(isContiguous)(state, gradWeight), 4, "gradWeight needs to be contiguous"); | ||
| } |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
|
|
||
| int freeWeight = 0; | ||
| if (weight->nDimension == 4) { | ||
| if (weight && weight->nDimension == 4) { |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
|
@pytorchbot retest this please. |
|
@onnxbot add to whitelist |
|
@pytorchbot retest this please |
2 similar comments
|
@pytorchbot retest this please |
|
@pytorchbot retest this please |
|
Would really appreciate if someone can review this. There are a lot lines changed, but many of them are just similar changes on different THNN files. I also split into multiple commits to make it clearer. If needed, let me know how I can make this easier to review. :) |
|
looking |
Summary: Pull Request resolved: pytorch/glow#4812 if no compilation options are passed, default to c-step fixed the FC and batchmatmul implementations to match C-step fixed the fakelowp map calling to make sure we use the fp32 substitution of operators updated the accumulator test to make it pass with fp32 Test Plan: fakelowp tests glow/test/numerics net_runner Reviewed By: jfix71 Differential Revision: D23086534 fbshipit-source-id: 3fbb8c4055bb190becb39ce8cdff6671f8558734
* fix output_nr not incremented correctly * update test_conv_double_backward to cover this case; call accGradParameters if any param (not just weight) requires grad in parse_nn.py * update Spatial/VolumetricFull(Dilated)Convolution to support accGradParameters with only bias requiring grad * Spatial/VolumetricConvolutionMM * Spatial/VolumetricDilatedConvolution * address @fmassa 's comments
Summary: Pull Request resolved: pytorch/glow#4812 if no compilation options are passed, default to c-step fixed the FC and batchmatmul implementations to match C-step fixed the fakelowp map calling to make sure we use the fp32 substitution of operators updated the accumulator test to make it pass with fp32 Test Plan: fakelowp tests glow/test/numerics net_runner Reviewed By: jfix71 Differential Revision: D23086534 fbshipit-source-id: 3fbb8c4055bb190becb39ce8cdff6671f8558734
output_nris not incremented properly inrebase_historyandset_historywhen some tensors are undefined. This causes the autograd engine incorrectly putting input tensors at wrong indices inInputBuffer. See the following extremely simple double backward example that reproduces the bug:Because
inputdoesn't require gradient, theggWandggbterms are set to incorrect indices, and it throws this weird error message:Why is this not detected in our tests: our double backward tests usually sets all parameters to
requires_grad=True. Therefore the case where a backward function call returns an undefined tensor is not tested.An issue has been submitted on testing with more diverse configurations: #4813.
Thanks @ezyang for helping me finding the cause.
Relevant forum post: https://discuss.pytorch.org/t/autograd-grad-dimension-error/12083/8