Adapt TransposedConvolution Layer #3967
Conversation
Hi, I’m wrapping up the last pieces of this PR for the transposed convolution layer and had a few open questions where I could use your input:
I initially planned to revisit these over the weekend, but they're starting to feel out of scope, especially since the old code didn't really address them either. Would love your thoughts on the best way to move forward! Thanks in advance!
Sorry for the slow response @ranjodhsingh1729. I kickstarted the rest of the CI and restarted the cross-compilation job (hopefully that will succeed; if not, it's probably not your fault, as I've been fighting one of those systems for quite a while now).
Which is the test that seems to expect output padding? As far as I can tell, outside of the tests, output padding is not required anywhere (I am looking in the models and examples repository). So my inclination would be to leave it out and match the padding strategy and parameter names of the regular convolution layer (input only).
I have not looked at the implementation yet, but I think that we should handle that case, yes. Perhaps there is more trickiness than I am thinking at this moment, but at a first glance it seems like this should not be so hard to handle.
Yes, the user should not need to specify the output sizes---that should now be computed based on the input size and the kernel size (and the padding size). I can't promise a fast review here (there is just too much else going on), but I will come back to this when I'm able to. I definitely am interested in an updated implementation.
I am currently getting memory-related errors while trying to use this in an autoencoder (BTW, good to know there is one I can adapt from the examples repository). I'll definitely try to handle this once everything else is working smoothly.
No problem.
else
{
  MakeAlias(weights, weightsIn, weight.n_elem, 1);
}
If you're getting memory-related errors, here is the first place I would check (and also anywhere else you are making aliases). Does WeightSize() return the exact same number of elements that your aliases use? And are the other aliases the exact same size as the matrices they are aliases of? Just a guess to hopefully point you in a useful direction.
I really like valgrind --track-origins=yes for figuring out precisely where things are going wrong. There's still some thinking to be done about what has actually happened to cause the memory issue, but it at least gives you a place to dig into.
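To make that concrete, here is a minimal sketch of the kind of consistency check being suggested. The Weight() and Bias() accessor names are assumptions for illustration only, not the PR's actual identifiers.

#include <mlpack/core.hpp>

// Hedged sketch: compare a layer's reported WeightSize() against the number
// of elements its aliases actually cover.  Weight() and Bias() are assumed
// accessor names used only for illustration.
template<typename LayerType>
void CheckAliasSizes(const LayerType& layer)
{
  const size_t aliased = layer.Weight().n_elem + layer.Bias().n_elem;
  if (aliased != layer.WeightSize())
  {
    mlpack::Log::Warn << "Aliases cover " << aliased << " elements, but "
        << "WeightSize() reports " << layer.WeightSize() << "; a mismatch "
        << "like this can cause memory errors." << std::endl;
  }
}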
Thanks. Hopefully I will be able to get the example to work and wrap this up next month.
Just a quick update: I've found the issue! I'm still seeing just noise with the autoencoder test, so it's not quite ready yet. I'll keep working on it and remove the draft label once everything's solid. Totally understand if you're swamped; I can look into other issues while this is in progress.
Quick update: the TConv layer wasn't the issue; ReLU was. Switching to LeakyReLU or Sigmoid fixed it. With ReLU, all values going into TConv were zero, even with He/Xavier init. The same setup works fine in PyTorch, so it might be the random init not being random. I tried different seeds with no luck.
I'll try to figure this out and test with a more standard setup sometime around next week.
/**
* Model:
*
* Encoder:
* 28x28x1 --[Conv2x2, s=2]--> 14x14x16 --[Sigmoid]-->
* 14x14x16 --[Conv2x2, s=2]--> 7x7x32 --[Sigmoid]-->
*
* Decoder:
* 7x7x32 --[TConv2x2, s=2]--> 14x14x16 --[Sigmoid]-->
* 14x14x16 --[TConv2x2, s=2]--> 28x28x1 --[Sigmoid]-->
*
* Training:
* - Loss: MSE
* - Optimizer: Adam (step size: 0.1, batch: 128)
* - Epochs: 10
* - Init: Xavier
*/
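For reference, a rough sketch of how that model might be assembled with mlpack's new FFN API follows. It is only an illustration: the TransposedConvolution constructor arguments are assumed to mirror Convolution's (maps, kernel, stride, padding), which is exactly what this PR is still settling, and the data here is a random stand-in for real images.

#include <mlpack.hpp>

using namespace mlpack;

int main()
{
  // Each column is a flattened 28x28x1 image; random stand-in data.
  arma::mat images(28 * 28, 1000, arma::fill::randu);

  FFN<MeanSquaredError, GlorotInitialization> model;
  model.InputDimensions() = std::vector<size_t>({ 28, 28, 1 });

  // Encoder: 28x28x1 -> 14x14x16 -> 7x7x32.
  model.Add<Convolution>(16, 2, 2, 2, 2, 0, 0);
  model.Add<Sigmoid>();
  model.Add<Convolution>(32, 2, 2, 2, 2, 0, 0);
  model.Add<Sigmoid>();

  // Decoder: 7x7x32 -> 14x14x16 -> 28x28x1 (constructor arguments assumed
  // to mirror Convolution's).
  model.Add<TransposedConvolution>(16, 2, 2, 2, 2, 0, 0);
  model.Add<Sigmoid>();
  model.Add<TransposedConvolution>(1, 2, 2, 2, 2, 0, 0);
  model.Add<Sigmoid>();

  // MSE reconstruction loss; Adam with step size 0.1 and batch size 128.
  ens::Adam opt(0.1, 128);
  model.Train(images, images, opt);
}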
Edit: If we really need to crop the output in this case, I'll try to figure that out, but I don't see a good approach yet. Any suggestions?
For now, the Transposed Convolution layer throws an error in this case until we decide on a proper solution.
nzachary left a comment
I meant to do this after #3988 was merged but I was busy with midterms and forgot about it.
#3988 changed a bunch of things with the convolution rules so I added some suggestions to fix things here. I haven't actually tested this so you will probably need to change it.
As for the padding issue, I think Tensorflow's approach is correct. The regular convolution's size is outSize = floor((inSize + pSideOne + pSideTwo - filterSize) / stride) + 1. Since the transposed convolution is the opposite of the regular convolution, I think the size should be the inverse of the regular convolution's size. If I'm doing the math correctly, the transposed convolution should have outSize = stride * (inSize - 1) - pSideOne - pSideTwo + filterSize.
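To make the relationship easier to see, here is a small standalone sketch of the two rules quoted above; parameter names follow the text, and integer division plays the role of floor for non-negative values.

#include <cstddef>

// Regular convolution output size, as quoted above.
std::size_t ConvOutSize(std::size_t inSize, std::size_t filterSize,
                        std::size_t stride, std::size_t pSideOne,
                        std::size_t pSideTwo)
{
  return (inSize + pSideOne + pSideTwo - filterSize) / stride + 1;
}

// Transposed convolution output size, i.e. the inverse of the rule above.
std::size_t TransposedConvOutSize(std::size_t inSize, std::size_t filterSize,
                                  std::size_t stride, std::size_t pSideOne,
                                  std::size_t pSideTwo)
{
  return stride * (inSize - 1) - pSideOne - pSideTwo + filterSize;
}

// Example: ConvOutSize(28, 2, 2, 0, 0) == 14 and
// TransposedConvOutSize(14, 2, 2, 0, 0) == 28, matching the 2x2, stride-2
// layers in the autoencoder discussed earlier.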
Hi @nzachary! Sorry for the late response and thanks for the suggestions. I've made the necessary changes (the corresponding changes from #3988 in convolution_impl.hpp made it really straightforward). Also, #3988 was cool work; I saw the difference in speed between the naive approach and im2col. There are still some things pending (adding a few more tests, fixing that padding issue, ...). I'll try to get them done just after the exams. Thanks again!
Hi, can this be reopened if the changes look appropriate?
The autoencoder test is just something I threw together. It is the one failing by a margin in the Windows build. I think it's best just to remove it. I think the test I've added in
Thanks! Thanks a lot for taking the time to review this!
nzachary left a comment
Looks good to me; I just have a couple of one-character changes.
Do you want to add something to the HISTORY.md?
rcurtin left a comment
Hey there, I haven't had a chance to look at this and I think @nzachary has given the implementation a good review. I took a look through the tests and have a few comments---to me it seems like everything works, but for the sake of debugging in the medium to far future, if something breaks, I think it would be great to add some details about where the test values come from.
No need for an extra line here.
This shouldn't be needed---it should already be included by core.hpp.
It would really be easier to use arma::approx_equal() here. But, I'm not sure how good of a test it is: you just set all the weights to zero and then check that they are still zero. (Without checking the size, too.) If the code has a bug where the input weights are ignored, it's likely that they will be set to zero anyway, and so this test will pass when it shouldn't.
Also, if you use approx_equal(), that does a size check too, so you can clean a bunch of things up.
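A hedged sketch of what that check could look like with nonzero weights, where layer stands for the constructed TransposedConvolution layer under test; the SetWeights()/Parameters() accessors here are assumptions about the layer's interface, not necessarily the exact names in this PR:

// Fill the weights with random nonzero values so that an "input weights are
// ignored" bug cannot pass by accident, then compare with approx_equal(),
// which also fails when the sizes differ.
arma::mat expectedWeights(layer.WeightSize(), 1, arma::fill::randu);
layer.SetWeights(expectedWeights);
REQUIRE(arma::approx_equal(layer.Parameters(), expectedWeights,
    "absdiff", 1e-10));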
- const auto& c = configs[i];
+ const Config& c = configs[i];
This is pedantic, but auto makes code harder to read because it's not explicitly clear what the type is.
Can you provide a little insight on specifically why these values are chosen and what they are testing?
I tried to use odd/even and equal/unequal values for the width and height of the kernel, stride, and padding, sometimes with width > height and other times with height > width. After the initial rewrite these helped identify some silly bugs, so I kept them as-is.
Do you think we need to cover more cases here?
No, I think it's just fine, but I think it would be a good idea to add a comment that gives an idea of where the values came from and what edge cases you were trying to hit. Even what you just wrote above adapted into comment form would be fine. Basically I envision someone coming back to this test a couple years from now while implementing new support or refactoring something and finding that they've broken this test... but memory of this discussion here is long forgotten, so the first thought is "what are we even trying to test here?" which is not trivial to figure out just by looking at the code. :)
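Something along these lines (the wording is only a suggestion) would probably be enough:

// These configurations intentionally mix odd/even and equal/unequal values
// for the kernel, stride, and padding sizes, sometimes with width > height
// and sometimes with height > width, since these combinations caught several
// bugs during the initial rewrite.  Expected outputs were verified against
// PyTorch.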
#include <mlpack/core/data/text_options.hpp>
No need to include this.
How long does this test take to run? If it is a long test we should add the [long] tag so that it doesn't hammer the low-resource devices in CI.
2s isn't too bad, but it would probably be way more on some of the really low-resource systems, so I think adding [long] is a good idea. Thanks!
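For reference, tagging the test would look roughly like this (the test and tag names here are illustrative, not necessarily the PR's actual ones):

TEST_CASE("TransposedConvolutionAutoencoderTest", "[ANNLayerTest][long]")
{
  // ... existing test body ...
}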
Honestly I would suggest just putting this into transposed_convolution.cpp. Each new file we add (with the exception of the layer tests because of how they are included) adds non-trivial compilation overhead---it's actually faster to put more tests into one file.
Okay, will do.
- const auto& c = configs[i];
+ const Config& c = configs[i];

{
  1, 3, 3, 1, 1, 0, 0, 0, 0, 4, 4,
  {{0, 1.0}, {8, 2.0}},
  360.0, 720.0, 15915.0, 10
Can you provide some insight on how you came to these values?
Yes, I calculated these using torch.
I also checked the actual values for some cases (as opposed to just the sum), which revealed #4050. I thought about matching something other than the sum, since a lot of bugs could escape that, but ultimately decided against it (do you think it's worth doing?).
I shared a gist above which contains the script I used to verify these numbers. I'll leave a comment in the test saying that these numbers were verified using PyTorch.
Here is the script:- https://gist.github.com/ranjodhsingh1729/48f28648187fd4eed7d30c95069808f7
Perfect, thanks---adding a note to the code will be helpful for figuring out what is going on later. You can even link to the gist if you want; either way is fine.
@nzachary up to you when you want to merge this one since you led the review; if everything looks good from your end and you're confident in your review, feel free to merge away. My minor comments have all been addressed (thanks @ranjodhsingh1729!) 😄
Thanks @ranjodhsingh1729!
Resolves: #3959
Description:
This PR updates the TransposedConvolution layer to mlpack's new layer interface.
Changes
As suggested by @rcurtin, I followed a mostly straightforward translation to the new API.
I've added some tests: some ported from the legacy ann_layer_test.cpp, others adapted from the convolution layer tests. All tests are currently passing, except ones which depend on the remaining work.
Remaining Work
- Add support for outputPadding. Edit: Added after realizing it would be necessary for some architectures.
- Implement alignment logic (aW and aH). Edit: they were there for output padding, I think.
- Handle output cropping when padding becomes negative (e.g., in 'same' padding cases). Edit:
When paddingType == "same" and stride > 1, the computed padding can go negative, meaning the output sometimes needs to be cropped to match the input. TensorFlow's implementation of 'same' padding makes sure output = stride * input rather than output = input (in size), which avoids the negative-padding problem. We can do the same as TF, or we can just throw an error on encountering this case, forcing users to think about what they want, i.e., calculate the padding manually and use paddingType "none" (a small sketch of that check follows below).
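A minimal sketch of the interim check described above; this is an illustration of the idea, not the PR's actual code. It solves the transposed-convolution size rule for the total padding needed to reach a requested output size and rejects the configuration when that padding would have to be negative.

#include <cstddef>
#include <stdexcept>

// From outSize = stride * (inSize - 1) - pTotal + filterSize, the total
// padding needed to hit a target output size is computed below; a negative
// result means the output would need cropping instead of padding.
long RequiredTotalPadding(std::size_t inSize, std::size_t filterSize,
                          std::size_t stride, std::size_t targetOutSize)
{
  const long pTotal = (long) (stride * (inSize - 1) + filterSize)
      - (long) targetOutSize;
  if (pTotal < 0)
  {
    throw std::invalid_argument("TransposedConvolution: 'same' padding would "
        "be negative; compute the padding manually and use paddingType "
        "\"none\" instead");
  }
  return pTotal;
}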
Feedback on the current direction would be much appreciated.