Hi,
First of all, thank you for your excellent and pioneering work on this project and the accompanying paper!
I have been trying to reproduce your results and have found your codebase very helpful.
However, when running the MADE training command as follows:
python training.py -save True -n_type 'made' -c_type 'sur' -n 13 -d 3 -k 1 -seed 0 -er 0.189 -device 'cuda:0' -batch 10000 -epoch 50000 -depth 3 -width 20
I noticed that the depth and width hyperparameters have a significant impact on the final network performance.
In your paper, the results for all code distances from 3 to 13 impressively surpass the benchmark. In my experiments, however, the default -depth 3 -width 20 settings become noticeably less effective starting from d=7. For d=9 and above, even after an extensive hyperparameter search, it is quite difficult to find suitable settings, and the GPU resources and training time required are extremely high.
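For reference, a sweep over depth and width at a larger distance can be scripted roughly as follows. This is a minimal sketch under stated assumptions, not the repository's tooling. It reuses only the flags from the command above. It assumes that -n is the data-qubit count of the unrotated planar surface code, d^2 + (d-1)^2, which gives 13 for d=3. The helper names `planar_code_qubits` and `sweep_commands` are hypothetical, and the grid values are placeholders rather than recommendations.

```python
import shlex
from itertools import product


def planar_code_qubits(d: int) -> int:
    # ASSUMPTION: -n is the data-qubit count of the unrotated planar surface code,
    # d^2 + (d-1)^2, which reproduces -n 13 for -d 3 in the command above.
    return d * d + (d - 1) * (d - 1)


def sweep_commands(d, depths, widths, er=0.189, seed=0, device="cuda:0",
                   batch=10000, epoch=50000):
    """Build one training.py argument list per (depth, width) pair (hypothetical helper)."""
    cmds = []
    for depth, width in product(depths, widths):
        cmds.append([
            "python", "training.py",
            "-save", "True",
            "-n_type", "made",
            "-c_type", "sur",
            "-n", str(planar_code_qubits(d)),
            "-d", str(d),
            "-k", "1",
            "-seed", str(seed),
            "-er", str(er),
            "-device", device,
            "-batch", str(batch),
            "-epoch", str(epoch),
            "-depth", str(depth),
            "-width", str(width),
        ])
    return cmds


if __name__ == "__main__":
    # Placeholder grid for d=7; print each command so it can be inspected or queued.
    for cmd in sweep_commands(7, depths=[3, 4], widths=[20, 40]):
        print(shlex.join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` to launch the runs one after another. Printing them first makes it easy to check the derived -n value before spending GPU time.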
So my questions are:
- Do you have any recommended hyperparameter settings (especially for depth and width) for larger code distances such as d=7, 9, 11, and 13?
- Are there any tricks or best practices for making training more efficient at these larger distances?
Thank you so much for your help!