Personal learning code for Deep Learning, following the O'Reilly Japan book "Deep Learning from Scratch": https://www.oreilly.co.jp/books/9784873117584/
Nice reads:
Hacker's guide to Neural Networks https://karpathy.github.io/neuralnets/
Understanding the backward pass through Batch Normalization Layer https://kratzert.github.io/2016/02/12/understanding-the-gradient-flow-through-the-batch-normalization-layer.html (backward-pass sketch after this list)
Adam: A Method for Stochastic Optimization (2015) https://arxiv.org/pdf/1412.6980 (update-rule sketch after this list)
Attention Is All You Need (2017) https://arxiv.org/abs/1706.03762 (attention sketch after this list)
An overview of gradient descent optimization algorithms (2016) https://arxiv.org/pdf/1609.04747
On the importance of initialization and momentum in deep learning (2013) https://www.cs.toronto.edu/~fritz/absps/momentum.pdf (momentum sketch after this list)
A Comparison of Optimization Algorithms for Deep Learning (2020) https://arxiv.org/abs/2007.14166
Learning to Optimize: A Primer and A Benchmark (2022) https://www.jmlr.org/papers/volume23/21-0308/21-0308.pdf
Xavier: Understanding the difficulty of training deep feedforward neural networks (2010) https://proceedings.mlr.press/v9/glorot10a.html (Xavier/He initialization sketched after this list)
He: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (2015) https://arxiv.org/abs/1502.01852
Gradient-based learning applied to document recognition (1998) https://ieeexplore.ieee.org/document/726791
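
Scratch sketches of a few of the ideas above. These are my own minimal NumPy versions, not code from the book or the papers.

Batch normalization backward pass (the kratzert post above), using the compact closed form that the post's node-by-node computational graph reduces to; the (N, D) input layout and function names are assumptions.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # x: (N, D) mini-batch; gamma, beta: (D,) scale and shift.
    mu = x.mean(axis=0)
    xmu = x - mu
    std = np.sqrt((xmu ** 2).mean(axis=0) + eps)
    xhat = xmu / std
    return gamma * xhat + beta, (xhat, std, gamma)

def batchnorm_backward(dout, cache):
    xhat, std, gamma = cache
    dbeta = dout.sum(axis=0)
    dgamma = (dout * xhat).sum(axis=0)
    dxhat = dout * gamma
    # Compact form of the gradient the post derives step by step:
    # dx = (dxhat - mean(dxhat) - xhat * mean(dxhat * xhat)) / std
    dx = (dxhat - dxhat.mean(axis=0) - xhat * (dxhat * xhat).mean(axis=0)) / std
    return dx, dgamma, dbeta
```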
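
Adam update rule (Kingma & Ba above). The hyperparameter defaults follow the paper; the dict-of-arrays update(params, grads) interface is an assumption, in the style of the book's optimizer classes.

```python
import numpy as np

class Adam:
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m, self.v, self.t = {}, {}, 0

    def update(self, params, grads):
        self.t += 1
        for key in params:
            m = self.m.setdefault(key, np.zeros_like(params[key]))
            v = self.v.setdefault(key, np.zeros_like(params[key]))
            # Exponential moving averages of the gradient and its square.
            m[:] = self.beta1 * m + (1 - self.beta1) * grads[key]
            v[:] = self.beta2 * v + (1 - self.beta2) * grads[key] ** 2
            # Bias correction for the zero-initialized moment estimates.
            m_hat = m / (1 - self.beta1 ** self.t)
            v_hat = v / (1 - self.beta2 ** self.t)
            params[key] -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```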
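
Scaled dot-product attention, the core formula of "Attention Is All You Need": Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. No masking or multi-head projections here.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # q: (..., n_q, d_k), k: (..., n_k, d_k), v: (..., n_k, d_v)
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)
    # Row-wise softmax, shifted by the max for numerical stability.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```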
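
Classical momentum (Sutskever et al. above): v <- mu * v - lr * grad, then w <- w + v. The Nesterov variant the paper analyzes evaluates the gradient at the look-ahead point w + mu * v instead.

```python
import numpy as np

class Momentum:
    def __init__(self, lr=0.01, momentum=0.9):
        self.lr, self.momentum, self.v = lr, momentum, {}

    def update(self, params, grads):
        for key in params:
            v = self.v.setdefault(key, np.zeros_like(params[key]))
            # Velocity accumulates a decaying sum of past gradients.
            v[:] = self.momentum * v - self.lr * grads[key]
            params[key] += v
```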
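
Xavier and He weight initialization (the two papers above). The Glorot normal variant scales by sqrt(2 / (fan_in + fan_out)) for tanh/sigmoid layers; He scales by sqrt(2 / fan_in), the extra factor of 2 compensating for ReLU zeroing half the activations. Function names are mine.

```python
import numpy as np

def xavier_init(fan_in, fan_out, rng=None):
    # Glorot & Bengio (2010): keeps activation variance roughly constant.
    rng = np.random.default_rng() if rng is None else rng
    scale = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, scale, size=(fan_in, fan_out))

def he_init(fan_in, fan_out, rng=None):
    # He et al. (2015): suited to ReLU networks.
    rng = np.random.default_rng() if rng is None else rng
    scale = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, scale, size=(fan_in, fan_out))
```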