A collection of Deep Learning models and experiments.
This repository contains:
- Custom implementations of various architectures.
- Training pipelines for different modalities.
- Docker scripts for reproducible experiments.
Dynamic Tanh (DyT): a custom layer from "Transformers without Normalization" that applies an element-wise, parameterized tanh function. It can be used as a drop-in replacement for normalization layers such as LayerNorm.
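The core idea can be sketched in a few lines of PyTorch. This is a minimal illustration of the DyT formulation (`gamma * tanh(alpha * x) + beta`), not the repository's actual module; the class name and the `alpha_init` default are assumptions:

```python
import torch
import torch.nn as nn

class DyT(nn.Module):
    """Dynamic Tanh: gamma * tanh(alpha * x) + beta, applied element-wise.
    Illustrative sketch; alpha_init value is an assumption."""
    def __init__(self, dim: int, alpha_init: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), alpha_init))  # learnable scalar
        self.gamma = nn.Parameter(torch.ones(dim))               # per-channel scale
        self.beta = nn.Parameter(torch.zeros(dim))               # per-channel shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gamma * torch.tanh(self.alpha * x) + self.beta

layer = DyT(8)
x = torch.randn(2, 4, 8)
y = layer(x)  # same shape as x; no mean/variance statistics are computed
```

Unlike LayerNorm, no per-token statistics are computed, which is what makes the layer attractive from a runtime perspective.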
A minimal implementation of the Vision Transformer (ViT) architecture as described in the "An Image is Worth 16x16 Words" paper.
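The paper's first step, turning an image into a sequence of patch embeddings, can be sketched with a strided convolution. This is an illustrative snippet, not the repository's implementation; the class name and dimensions are assumptions:

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split an image into non-overlapping patches and linearly embed each one.
    Sketch only; names and default sizes are assumptions."""
    def __init__(self, patch_size=16, in_chans=3, dim=64):
        super().__init__()
        # A convolution with kernel == stride == patch_size extracts and
        # projects each patch in a single operation.
        self.proj = nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)                     # (B, dim, H/ps, W/ps)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, dim)

tokens = PatchEmbed()(torch.randn(1, 3, 64, 64))  # 64/16 = 4 -> 4*4 = 16 patches
```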
A U-Net-based model using 3D convolutions for regression on spatiotemporal data. It follows the implementation proposed in "3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation" and adds residual connections in the encoder and decoder blocks.
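A residual 3D convolution block of the kind described above could look roughly like this. It is a sketch under stated assumptions (class name, channel counts, and the 1x1x1 skip projection are illustrative), not the repository's code:

```python
import torch
import torch.nn as nn

class ResBlock3D(nn.Module):
    """Two 3D convolutions with a residual (skip) connection.
    Illustrative sketch of an encoder/decoder block with a residual path."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1)
        # Project the input when channel counts differ so the addition is valid.
        self.skip = nn.Conv3d(in_ch, out_ch, kernel_size=1) if in_ch != out_ch else nn.Identity()
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        h = self.act(self.conv1(x))
        h = self.conv2(h)
        return self.act(h + self.skip(x))

# Input shaped (batch, channels, depth/time, height, width)
out = ResBlock3D(1, 16)(torch.randn(1, 1, 8, 16, 16))
```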
LayerNorm vs Dynamic Tanh (DyT) Normalization in Small Vision Transformers: a study of the loss and accuracy dynamics when training small ViT models from scratch on tiny-imagenet-200 with different normalization approaches. It also includes a brief runtime analysis comparing RMSNorm, LayerNorm, and DyT.
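A runtime comparison of this kind can be sketched with a simple CPU microbenchmark. The `RMSNorm` and `DyT` classes below are minimal stand-ins written for illustration (they are not the repository's implementations), and the tensor shape mimics a small ViT's token sequence:

```python
import time
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Minimal RMSNorm sketch: scale by the root-mean-square over the last dim."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x / rms

class DyT(nn.Module):
    """Minimal DyT sketch: gamma * tanh(alpha * x) + beta."""
    def __init__(self, dim):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), 0.5))
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return self.gamma * torch.tanh(self.alpha * x) + self.beta

def bench(layer, x, iters=50):
    """Average forward-pass wall time over `iters` runs, after one warm-up."""
    with torch.no_grad():
        layer(x)  # warm-up
        t0 = time.perf_counter()
        for _ in range(iters):
            layer(x)
    return (time.perf_counter() - t0) / iters

x = torch.randn(32, 197, 384)  # (batch, tokens, dim) for a small ViT
results = {name: bench(layer, x)
           for name, layer in [("RMSNorm", RMSNorm(384)),
                               ("LayerNorm", nn.LayerNorm(384)),
                               ("DyT", DyT(384))]}
for name, t in results.items():
    print(f"{name}: {t * 1e3:.3f} ms/forward")
```

Absolute numbers depend heavily on hardware and tensor shape; the study in this repository should be consulted for the actual measurements.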
```
zoo/
├── configs/          # Training configuration files
├── data_utils/       # Data loading and augmentation utilities
├── models/           # Model and layer implementations
├── docker-build.sh   # Script to build the Docker image
├── docker-run.sh     # Script to run the Docker container
├── Dockerfile        # Docker configuration
├── train_*.py        # Training script(s)
├── utils.py          # Utility functions
└── vis.py            # Visualization functions
```
This project is licensed under the MIT License; see the LICENSE file for details.