This is my personal repo for working through Stanford's CS336. I am watching the lectures and implementing all assignments on my own. I am doing everything with just a 4060 (8GB VRAM), 16GB of RAM, and an AMD Ryzen 9 (16 cores). The OS I am using is Arch.
Most assignments come in a compute-heavy and a compute-light variant. Everything is implemented for both setups; however, I will only be training language models on small datasets.
I am skipping the major experiments for now, as they take too long on a laptop (i.e. 1.7).
- Assignment 1: Building a Transformer LM // Marking as done for now, as the experiments are not really feasible
  - Part 1: Training BPE
  - Part 2: Tokenizer
  - Part 3: Language Model
  - Part 4: Training Requirements
  - Part 5: Training Loop
  - Part 6: Generating Text
  - Part 7: Experiments
- Assignment 2: Systems and Parallelism
  - Part 1: Benchmarking Script
  - Part 2: Flash Attention 2 Triton Implementation
  - Part 3: Implement DDP
  - Part 4: Optimizer Sharding