Adapt TransposedConvolution Layer #3967

Merged
nzachary merged 31 commits into mlpack:master from ranjodhsingh1729:transposed-convolution
Feb 16, 2026

Conversation

@ranjodhsingh1729
Contributor

@ranjodhsingh1729 ranjodhsingh1729 commented Jul 12, 2025

Resolves: #3959

Description:

This PR updates the TransposedConvolution layer to mlpack’s new layer interface.

Changes

As suggested by @rcurtin, I followed a mostly straightforward translation to the new API.

I’ve added some tests — some ported from the legacy ann_layer_test.cpp, others adapted from the convolution layer tests. All tests are currently passing, except those that depend on the remaining work.

Remaining Work

  • Add support for outputPadding
    EDIT: Added after realizing it would be necessary for some architectures.
  • Implement alignment logic (aW and aH)
    EDIT: (they were there for output padding, I think)
  • Handle output cropping when padding becomes negative (e.g., in "same" padding cases)
    EDIT:
    When paddingType == "same" and stride > 1, the computed padding can go negative, meaning the output sometimes needs to be cropped to match the input. TensorFlow's implementation of "same" padding ensures output = stride * input rather than output = input, which avoids the negative-padding problem. We can do the same as TensorFlow, or we can simply throw an error on encountering this case, forcing users to think about what they want, i.e., calculate the padding manually and use paddingType "none".

Feedback on the current direction would be much appreciated.

@ranjodhsingh1729 ranjodhsingh1729 marked this pull request as draft July 12, 2025 17:21
@ranjodhsingh1729
Contributor Author

Hi,

I’m wrapping up the last pieces of this PR for the transposed convolution layer and had a few open questions where I could use your input:

  1. Output Padding
    Older versions didn’t support output padding, but one of the existing tests seems to expect it.
    Do we want to support this officially now? And if so, should that be part of this PR?

  2. "Same" Padding with Stride > 1
    When paddingType == "same" and stride > 1, the computed padding can go negative, meaning the output sometimes needs to be cropped to match the input.
    Should we handle that explicitly?

  3. Alignment Parameters (aW, aH)
    Earlier implementations used aW and aH, based on user-specified outputSize. Since we now compute the output size internally, these values always end up as zero.
    Is it safe to remove them?

I initially planned to revisit these over the weekend, but they’re starting to feel out of scope—especially since the old code didn’t really address them either. Would love your thoughts on the best way to move forward!

Thanks in advance!

@rcurtin
Member

rcurtin commented Jul 30, 2025

Sorry for the slow response @ranjodhsingh1729. I kickstarted the rest of the CI and restarted the cross-compilation job (hopefully that will succeed, if not, it's probably not your fault. I've been fighting one of those systems for quite a while now).

Output Padding
Older versions didn’t support output padding, but one of the existing tests seems to expect it.
Do we want to support this officially now? And if so, should that be part of this PR?

Which is the test that seems to expect output padding? As far as I can tell, outside of the tests, output padding is not required anywhere (I am looking in the models and examples repository). So my inclination would be to leave it out and match the padding strategy and parameter names of the regular convolution layer (input only).

"Same" Padding with Stride > 1
When paddingType == "same" and stride > 1, the computed padding can go negative, meaning the output sometimes needs to be cropped to match the input.
Should we handle that explicitly?

I have not looked at the implementation yet, but I think that we should handle that case, yes. Perhaps there is more trickiness than I am thinking at this moment, but at a first glance it seems like this should not be so hard to handle.

Alignment Parameters (aW, aH)
Earlier implementations used aW and aH, based on user-specified outputSize. Since we now compute the output size internally, these values always end up as zero.
Is it safe to remove them?

Yes, the user should not need to specify the output sizes---that should now be computed based on the input size and the kernel size (and the padding size). So if the result is that aW and aH are always zero, then I don't see any need to keep them.

I can't promise a fast review here (there is just too much else going on) but I will come back to this when I'm able to. I definitely am interested in an updated TransposedConvolution layer since that will re-enable the mnist_cnn_vae example in the examples repository (https://github.com/mlpack/examples).

@ranjodhsingh1729
Contributor Author

I have not looked at the implementation yet, but I think that we should handle that case, yes. Perhaps there is more trickiness than I am thinking at this moment, but at a first glance it seems like this should not be so hard to handle.

I am currently getting memory-related errors while trying to use this in an autoencoder (BTW, good to know there is one I can adapt from the examples repository). I'll definitely try to handle this once everything else is working smoothly.

I can't promise a fast review here (there is just too much else going on) but I will come back to this when I'm able to. I definitely am interested in an updated TransposedConvolution layer since that will re-enable the mnist_cnn_vae example in the examples repository (https://github.com/mlpack/examples).

No problem.

else
{
MakeAlias(weights, weightsIn, weight.n_elem, 1);
}
Member

If you're getting memory-related errors, here is the first place I would check (and also anywhere else you are making aliases). Does WeightSize() return the exact same number of elements that your aliases use? And are the other aliases the exact same size as the matrices they are aliases of? Just a guess to hopefully point you in a useful direction.

I really like valgrind --track-origins=yes for figuring out precisely where things are going wrong. There's still some thinking to be done about what has actually happened to cause the memory issue, but it at least gives you a place to dig in to.

Contributor Author

Thanks. Hopefully I will be able to get the example to work and wrap this up next month.

Contributor Author
@ranjodhsingh1729 ranjodhsingh1729 Aug 1, 2025

Just a quick update — I’ve found the issue! I’m still seeing just noise with the autoencoder test, so it’s not quite ready yet. I’ll keep working on it and remove the draft label once everything’s solid. Totally understand if you're swamped — I can look into other issues while this is in progress.

Contributor Author
@ranjodhsingh1729 ranjodhsingh1729 Aug 2, 2025

Quick Update — The TConv layer wasn’t the issue; ReLU was. Switching to LeakyReLU or Sigmoid fixed it. With ReLU, all values going into TConv were zero — even with He/Xavier init. The same setup works fine in PyTorch, so it might be the random init not being random. Tried different seeds — no luck.

I’ll try and figure this out and test with a more standard setup sometime around next week.

(Screenshots: output with ReLU vs. output with LeakyReLU/Sigmoid.)

/**
 * Model:
 *
 * Encoder:
 * 28x28x1 --[Conv2x2, s=2]--> 14x14x16 --[Sigmoid]-->
 * 14x14x16 --[Conv2x2, s=2]--> 7x7x32 --[Sigmoid]-->
 *
 * Decoder:
 * 7x7x32 --[TConv2x2, s=2]--> 14x14x16 --[Sigmoid]-->
 * 14x14x16 --[TConv2x2, s=2]--> 28x28x1 --[Sigmoid]-->
 *
 * Training:
 * - Loss: MSE
 * - Optimizer: Adam (step size: 0.1, batch: 128)
 * - Epochs: 10
 * - Init: Xavier
 */

@ranjodhsingh1729 ranjodhsingh1729 force-pushed the transposed-convolution branch 2 times, most recently from db0faab to ac121af on August 2, 2025 10:42
@ranjodhsingh1729
Contributor Author

ranjodhsingh1729 commented Aug 2, 2025

When paddingType == "same" and stride > 1, the computed padding can go negative, meaning the output sometimes needs to be cropped to match the input.
Should we handle that explicitly?

I have not looked at the implementation yet, but I think that we should handle that case, yes. Perhaps there is more trickiness than I am thinking at this moment, but at a first glance it seems like this should not be so hard to handle.

While it may not be that tricky :), it might be simpler to handle this in the padding layer. I'm personally not sure whether it should be handled there. What do you think?

Edit:
Initially I thought one could just crop the output to match the input. Revisiting this work, I realized that if I crop the output of the forward pass, the backward pass receives the cropped output; that changes its dimensions, which then affects the gradient, and it becomes a mess. So I looked at how other libraries handle this. TensorFlow's implementation of "same" padding ensures output = stride * input rather than output = input, and plugging that into the equations, everything works out with no possibility of negative padding (unless the filter size is greater than the input). I now see only two reasonable solutions: don't support "same" padding for stride > 1, or do what TensorFlow does.

If we really need to crop the output in this case, I'll try to figure that out, but I don't see a good approach yet. Any suggestions?

@ranjodhsingh1729
Contributor Author

When paddingType == "same" and stride > 1, the computed padding can go negative.

For now, the Transposed Convolution layer throws an error in this case until we decide on a proper solution.

@ranjodhsingh1729 ranjodhsingh1729 marked this pull request as ready for review September 20, 2025 17:11
@github-actions github-actions bot removed the s: stale label Sep 21, 2025
@conradsnicta conradsnicta marked this pull request as draft October 23, 2025 03:25
Member
@nzachary nzachary left a comment

I meant to do this after #3988 was merged but I was busy with midterms and forgot about it.

#3988 changed a bunch of things with the convolution rules so I added some suggestions to fix things here. I haven't actually tested this so you will probably need to change it.

As for the padding issue, I think TensorFlow's approach is correct. The regular convolution's size is outSize = floor((inSize + pSideOne + pSideTwo - filterSize) / stride) + 1. Since the transposed convolution is the opposite of the regular convolution, I think its size should be the inverse of the regular convolution's. If I'm doing the math correctly, the transposed convolution should have outSize = stride * (inSize - 1) - pSideOne - pSideTwo + filterSize.

@ranjodhsingh1729
Contributor Author

ranjodhsingh1729 commented Dec 8, 2025

Hi @nzachary! Sorry for the late response, and thanks for the suggestions. I've made the necessary changes (the corresponding changes from #3988 in convolution_impl.hpp made it really straightforward). Also, #3988 was cool work; I saw the difference in speed between the naive approach and im2col.

There are still some things pending (a few more tests, fixing that padding issue, ...). I'll try to get them done just after the exams.

Thanks again!

@github-actions github-actions bot closed this Jan 27, 2026
@ranjodhsingh1729
Contributor Author

Hi, can this be reopened if the changes look appropriate?

@nzachary nzachary reopened this Feb 3, 2026
@github-actions github-actions bot removed the s: stale label Feb 4, 2026
@ranjodhsingh1729 ranjodhsingh1729 marked this pull request as ready for review February 4, 2026 14:15
@ranjodhsingh1729
Contributor Author

The autoencoder test is just something I threw together. It is the one failing by a margin in the Windows build. I think it's best to just remove it; the test I've added in transposed_convolution.hpp should be enough.

@ranjodhsingh1729
Contributor Author

@nzachary

Thanks a lot for taking the time to review this!
I've addressed the comments. If I missed anything or something's a little off, please let me know.

Member
@nzachary nzachary left a comment

Looks good to me; just a couple of one-character changes.

Do you want to add something to the HISTORY.md?

Member
@rcurtin rcurtin left a comment

Hey there, I haven't had a chance to look at this and I think @nzachary has given the implementation a good review. I took a look through the tests and have a few comments---to me it seems like everything works, but for the sake of debugging in the medium to far future, if something breaks, I think it would be great to add some details about where the test values come from.

Member

Suggested change

No need for an extra line here.

Contributor Author

😅

Member

This shouldn't be needed---it should already be included by core.hpp.

Member

It would really be easier to use arma::approx_equal() here. But, I'm not sure how good of a test it is: you just set all the weights to zero and then check that they are still zero. (Without checking the size, too.) If the code has a bug where the input weights are ignored, it's likely that they will be set to zero anyway, and so this test will pass when it shouldn't.

Also, if you use approx_equal(), that does a size check too, so you can clean a bunch of things up.

Member

Suggested change
const auto& c = configs[i];
const Config& c = configs[i];

This is pedantic, but auto makes code harder to read because it's not explicitly clear what the type is.

Member

Can you provide a little insight on specifically why these values are chosen and what they are testing?

Contributor Author

I tried to use odd/even and equal/unequal values for the width and height of the kernel, stride, and padding, sometimes with width > height and other times with height > width. After the initial rewrite these helped identify some silly bugs, so I kept them as is.

Do you think we need to cover more cases here?

Member

No, I think it's just fine, but I think it would be a good idea to add a comment that gives an idea of where the values came from and what edge cases you were trying to hit. Even what you just wrote above adapted into comment form would be fine. Basically I envision someone coming back to this test a couple years from now while implementing new support or refactoring something and finding that they've broken this test... but memory of this discussion here is long forgotten, so the first thought is "what are we even trying to test here?" which is not trivial to figure out just by looking at the code. :)

Member

Suggested change
#include <mlpack/core/data/text_options.hpp>

No need to include this.

Member

How long does this test take to run? If it is a long test we should add the [long] tag so that it doesn't hammer the low-resource devices in CI.

Contributor Author

(Screenshot: the test takes about 2 seconds to run.)

I was aware of this but was unaware of the [long] tag. I'll add it after moving the test to the other file.

Member

2s isn't too bad, but it would probably be way more on some of the really low-resource systems, so I think adding [long] is a good idea. Thanks!

Member

Honestly I would suggest just putting this into transposed_convolution.cpp. Each new file we add (with the exception of the layer tests because of how they are included) adds non-trivial compilation overhead---it's actually faster to put more tests into one file.

Contributor Author

Okay, will do.

Member

Suggested change
const auto& c = configs[i];
const Config& c = configs[i];

{
1, 3, 3, 1, 1, 0, 0, 0, 0, 4, 4,
{{0, 1.0}, {8, 2.0}},
360.0, 720.0, 15915.0, 10
Member

Can you provide some insight on how you came to these values?

Contributor Author
@ranjodhsingh1729 ranjodhsingh1729 Feb 10, 2026

Yes, I calculated these using torch.
I also checked individual values for some cases (as opposed to just the sum), which revealed #4050. I thought about matching something other than the sum, since a lot of bugs could escape that check, but ultimately decided against it (do you think it's worth doing?).

I shared a gist above containing the script I used to verify these numbers. I'll leave a comment in the test saying that these numbers were verified using PyTorch.

Here is the script: https://gist.github.com/ranjodhsingh1729/48f28648187fd4eed7d30c95069808f7

Member

Perfect, thanks---adding a note to the code will be helpful for figuring out what is going on later. You can even link to the gist if you want, either way is fine.

Contributor
@github-actions github-actions bot left a comment

Second approval provided automatically after 24 hours. 👍

@rcurtin
Member

rcurtin commented Feb 13, 2026

@nzachary up to you when you want to merge this one since you led the review; if everything looks good from your end and you're confident in your review feel free to merge away. My minor comments have all been addressed (thanks @ranjodhsingh1729!) 😄

@nzachary nzachary merged commit b19881c into mlpack:master Feb 16, 2026
15 checks passed
@nzachary
Member

Thanks @ranjodhsingh1729 !
