Ever wish you had an inefficient but somewhat legible collection of machine learning algorithms implemented exclusively in numpy? No?
This repo includes code for the following models:
- Gaussian mixture model
  - EM training (see the sketch after this list)
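For flavor, here is a minimal numpy sketch of a single EM update for a spherical Gaussian mixture. The function name and signature are illustrative only, not the API used in this repo:

```python
import numpy as np

def gmm_em_step(X, pi, mu, sigma2):
    """One EM update for a spherical Gaussian mixture (hypothetical helper,
    not this repo's API). X: (N, D); pi: (K,); mu: (K, D); sigma2: (K,)."""
    N, D = X.shape
    # E-step: responsibilities r[n, k] proportional to pi[k] * N(x_n | mu_k, sigma2_k * I)
    dist2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)  # (N, K)
    log_p = np.log(pi) - 0.5 * (D * np.log(2 * np.pi * sigma2) + dist2 / sigma2)
    log_p -= log_p.max(axis=1, keepdims=True)                # subtract row max for stability
    r = np.exp(log_p)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate mixture weights, means, and variances from the responsibilities
    Nk = r.sum(axis=0)
    pi = Nk / N
    mu = (r.T @ X) / Nk[:, None]
    dist2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
    sigma2 = (r * dist2).sum(axis=0) / (D * Nk)
    return pi, mu, sigma2
```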
 
- Hidden Markov model
  - Viterbi decoding (see the sketch after this list)
  - Likelihood computation
  - MLE parameter estimation via Baum-Welch/forward-backward algorithm
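As an illustration of Viterbi decoding, here is a minimal numpy sketch in log space; the helper name and signature are hypothetical rather than the repo's API:

```python
import numpy as np

def viterbi(obs, log_A, log_B, log_pi):
    """Most likely hidden state path (hypothetical helper, not this repo's API).
    obs: (T,) observation indices; log_A: (K, K) transition log-probs;
    log_B: (K, V) emission log-probs; log_pi: (K,) initial log-probs."""
    T, K = len(obs), log_A.shape[0]
    delta = np.empty((T, K))           # best log-prob of any path ending in state k at time t
    back = np.zeros((T, K), dtype=int)
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # (K, K): from-state x to-state
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    # Trace back the highest-scoring path
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path
```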
 
- Latent Dirichlet allocation (topic model)
  - Standard model with MLE parameter estimation via variational EM
  - Smoothed model with MAP parameter estimation via MCMC
 
- Neural networks
  - Layers / Layer-wise ops
    - Add
    - Flatten
    - Multiply
    - Softmax
    - Fully-connected/Dense
    - Sparse evolutionary connections
    - LSTM
    - Elman-style RNN
    - Max + average pooling
    - Dot-product attention (see the sketch after this list)
    - Restricted Boltzmann machine (w. CD-n training)
    - 2D deconvolution (w. padding and stride)
    - 2D convolution (w. padding, dilation, and stride)
    - 1D convolution (w. padding, dilation, stride, and causality)
  - Modules
    - Bidirectional LSTM
    - ResNet-style residual blocks (identity and convolution)
    - WaveNet-style residual blocks with dilated causal convolutions
    - Transformer-style multi-headed scaled dot-product attention
  - Regularizers
    - Dropout
  - Normalization
    - Batch normalization (spatial and temporal)
    - Layer normalization (spatial and temporal)
  - Optimizers
    - SGD w/ momentum
    - AdaGrad
    - RMSProp
    - Adam
  - Learning Rate Schedulers
    - Constant
    - Exponential
    - Noam/Transformer
    - Dlib scheduler
  - Weight Initializers
    - Glorot/Xavier uniform and normal
    - He/Kaiming uniform and normal
    - Standard and truncated normal
  - Losses
    - Cross entropy
    - Squared error
    - Bernoulli VAE loss
    - Wasserstein loss with gradient penalty
  - Activations
    - ReLU
    - Tanh
    - Affine
    - Sigmoid
    - Leaky ReLU
  - Models
    - Bernoulli variational autoencoder
    - Wasserstein GAN with gradient penalty
  - Utilities
    - col2im (MATLAB port)
    - im2col (MATLAB port)
    - conv1D
    - conv2D
    - deconv2D
    - minibatch
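As a taste of what the attention layers compute, here is a minimal numpy sketch of single-head scaled dot-product attention; the function is illustrative, not the repo's API:

```python
import numpy as np

def dot_product_attention(Q, K, V):
    """Scaled dot-product attention in plain numpy (a sketch, not this repo's API).
    Q: (n_q, d); K: (n_k, d); V: (n_k, d_v)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # (n_q, n_k) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (n_q, d_v)
```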
 
 
- Tree-based models
  - Decision trees (CART)
  - [Bagging] Random forests
  - [Boosting] Gradient-boosted decision trees
 
- Linear models
  - Ridge regression (see the sketch after this list)
  - Logistic regression
  - Ordinary least squares
  - Bayesian linear regression w/ conjugate priors
    - Unknown mean, known variance (Gaussian prior)
    - Unknown mean, unknown variance (Normal-Gamma / Normal-Inverse-Wishart prior)
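For reference, ridge regression has a closed-form solution that is one `np.linalg.solve` call away; the helper below is a hypothetical sketch, not the repo's API:

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression weights (a sketch, not this repo's API).
    Solves the regularized normal equations (X^T X + alpha * I) w = X^T y."""
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(D), X.T @ y)
```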
 
 
- n-Gram sequence models
  - Maximum likelihood scores
  - Additive/Lidstone smoothing (see the sketch after this list)
  - Simple Good-Turing smoothing
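As a quick illustration of additive/Lidstone smoothing in the unigram case, here is a hypothetical sketch using only the standard library:

```python
from collections import Counter

def lidstone_prob(counts, word, gamma=0.5, V=None):
    """Additive/Lidstone-smoothed unigram probability (a sketch, not this repo's API).
    counts: Counter of word frequencies; gamma: additive pseudo-count;
    V: vocabulary size (defaults to the number of observed word types)."""
    V = V if V is not None else len(counts)
    total = sum(counts.values())
    return (counts[word] + gamma) / (total + gamma * V)
```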
 
- Reinforcement learning models
  - Cross-entropy method agent
  - First-visit on-policy Monte Carlo agent
  - Weighted incremental importance sampling Monte Carlo agent
  - Expected SARSA agent
  - TD(0) Q-learning agent (see the update-rule sketch after this list)
  - Dyna-Q / Dyna-Q+ with prioritized sweeping
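The TD(0) Q-learning update is compact enough to show inline; this tabular sketch is illustrative, not the repo's agent API:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Tabular TD(0) Q-learning update (a sketch, not this repo's API).
    Q: (n_states, n_actions) action-value table."""
    td_target = r + gamma * Q[s_next].max()      # bootstrap from the greedy next action
    Q[s, a] += alpha * (td_target - Q[s, a])     # move Q(s, a) toward the TD target
    return Q
```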
- Dyna-Q / Dyna-Q+ with prioritized sweeping
 
- Nonparametric models
  - Nadaraya-Watson kernel regression (see the sketch after this list)
  - k-Nearest neighbors classification and regression
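Nadaraya-Watson regression is just a kernel-weighted average of the training targets; a hypothetical numpy sketch:

```python
import numpy as np

def nadaraya_watson(X_train, y_train, X_query, bandwidth=1.0):
    """Nadaraya-Watson kernel regression with a Gaussian kernel
    (a sketch, not this repo's API). X_*: (n, d) arrays; y_train: (n,)."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)  # (n_q, n)
    K = np.exp(-0.5 * d2 / bandwidth ** 2)                           # kernel weights
    return (K @ y_train) / K.sum(axis=1)                             # weighted average of targets
```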
 
- Preprocessing
  - Discrete Fourier transform (1D signals)
  - Bilinear interpolation (2D signals)
  - Nearest neighbor interpolation (1D and 2D signals)
  - Autocorrelation (1D signals)
  - Signal windowing
  - Text tokenization
  - Feature hashing
  - Feature standardization
  - One-hot encoding / decoding (see the sketch after this list)
  - Huffman coding / decoding
  - Term frequency-inverse document frequency encoding
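One-hot encoding and decoding, for example, come down to a few lines of numpy; the helpers below are illustrative, not the repo's API:

```python
import numpy as np

def one_hot(labels, n_classes=None):
    """One-hot encode integer class labels (a sketch, not this repo's API)."""
    labels = np.asarray(labels)
    n_classes = n_classes or labels.max() + 1
    return np.eye(n_classes, dtype=int)[labels]   # row i is the one-hot vector for labels[i]

def un_one_hot(onehot):
    """Invert one-hot encoding back to integer labels."""
    return onehot.argmax(axis=1)
```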
 
- Utilities
  - Similarity kernels (see the RBF sketch after this list)
  - Distance metrics
  - Priority queues
  - Ball tree data structure
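As an example of a similarity kernel, here is a hypothetical numpy sketch of the RBF kernel:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """RBF similarity kernel K(x, y) = exp(-gamma * ||x - y||^2)
    (a sketch, not this repo's API). X: (n, d); Y: (m, d)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    return np.exp(-gamma * d2)
```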
 
Am I missing your favorite model? Is there something that could be cleaner / less confusing? Did I mess something up? Submit a PR! The only requirement is that your models are written with just the Python standard library and numpy. The SciPy library is also permitted under special circumstances ;)