Implement Llama 3 inference step by step: grasp the core concepts, work through the derivations, and write the code.
- Updated Feb 24, 2025
- Jupyter Notebook
Time series prediction using a decoder-only Transformer, including SwiGLU and RoPE (Rotary Positional Embedding).
Simple, easy-to-understand PyTorch implementation of the GPT and LLaMA large language models (LLMs) from scratch, with detailed steps. Implemented: Byte-Pair Tokenizer, Rotary Positional Embedding (RoPE), SwiGLU, RMSNorm, Mixture of Experts (MoE). Tested on a Taylor Swift song lyrics dataset.
Transformer models for humorous text generation. Fine-tuned on a Russian jokes dataset with ALiBi, RoPE, GQA, and SwiGLU, plus a custom byte-level BPE tokenizer.
My Llama 3 implementation.