nanoGPT -> modern LLMs: gippity walks you through the little and big innovations that separate the two. Each branch implements a specific technique that made LLMs go more brr, and each branch builds on the previous ones (in the order listed).
- master - trains a vanilla GPT (faithful to nanoGPT, and therefore to GPT-2 and GPT-3) to establish a baseline.
- rope - adds RoPE (rotary position embeddings) and RMSNorm for faster training iterations, a lower memory footprint, and faster convergence.
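
To make the `rope` branch a bit more concrete, here is a minimal, dependency-free sketch of the two techniques on plain Python lists. It is an illustration, not the code from the branch: the function names `rms_norm` and `rope`, the `eps` and `base` defaults, and the interleaved pairing of dimensions are all assumptions made for this example (some implementations pair the first half of the vector with the second half instead). RMSNorm rescales a vector by its root mean square, skipping the mean subtraction and bias that LayerNorm uses. RoPE rotates each pair of query/key dimensions by an angle proportional to the token position, so the attention dot product ends up depending only on the relative offset between two tokens.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm sketch: divide x by its root mean square, then apply a per-dimension gain.

    Unlike LayerNorm, there is no mean subtraction and no bias term.
    `eps` is an assumed default for numerical stability.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def rope(x, pos, base=10000.0):
    """RoPE sketch: rotate each pair (x[i], x[i+1]) by the angle pos * base**(-i/d).

    Uses the interleaved pairing (0,1), (2,3), ... which is an assumption here.
    Apply it to query and key vectors before the attention dot product.
    """
    d = len(x)
    assert d % 2 == 0, "head dimension must be even"
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # lower dimensions rotate faster
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c])
    return out

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

if __name__ == "__main__":
    q = [1.0, 2.0, 3.0, 4.0]
    k = [0.5, -1.0, 2.0, 0.0]
    # Same relative offset (2) at different absolute positions gives the same score.
    print(dot(rope(q, 3), rope(k, 1)), dot(rope(q, 7), rope(k, 5)))
    # With unit gains, the normalized vector has a root mean square of about 1.
    print(rms_norm([3.0, 4.0], [1.0, 1.0]))
```

The two printed attention scores should agree up to floating-point error, which is the relative-position property that lets RoPE replace the learned absolute position embeddings of the vanilla model.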