Intro LLM v1
Kun Yuan (袁 坤)
• Course plans
Note: The main content of this lecture is summarized from two wonderful talks [1,2] by Andrej Karpathy
<2>
Teaching assistants
• Neural network parameters + the code to run them; that’s all you need
<5>
Large language model
<6>
What are the model parameters?
• An LLM can be regarded as a magic function that maps the context to the next word:
next word = f_θ(context)
• Given the model parameters θ, the LLM can predict the next word
<7>
LLM can generate texts of various styles
<8>
How to get the weights? Training the deep neural network
• Use a tremendous amount of data and computing resources to get the valuable model parameters
• Very, very expensive; the model weights are updated perhaps once a year or once every few years
<9>
How to make an LLM your personal copilot? Prompt engineering and finetuning
• But we should use LLMs more frequently and smartly; an LLM can be your personal copilot
• It is not easy to have your own LLM copilot. You need to know prompt engineering and finetuning
< 10 >
PART 02
Data collection
Data is crawled from websites; it contains both high-quality and low-quality content
High-quality data
< 14 >
Pretraining
Tokenization (word segmentation)
< 15 >
Pretraining
Token: ["The", "cat", "sat", "on", "the", "mat", ".", "The", "cat", "is", "orange", "."]
Vocabulary : {"The", "cat", "sat", "on", "the", "mat", ".", "is", "orange"}
< 16 >
Pretraining
< 17 >
Pretraining
While GPT-3 is larger, LLaMA is trained on more tokens. In practice, LLaMA performs significantly better.
We cannot judge the power of an LLM only by its number of parameters; data also matters
It is still under debate whether one should increase the model size or the data size given a limited resource budget
< 18 >
Pretraining
< 19 >
Pretraining
< 20 >
Pretraining
• Parallelizable architecture
Transformer architecture
(will be discussed in later lectures)
< 21 >
Pretraining
< 22 >
Pretraining
< 23 >
Pretraining
< 24 >
Pretraining
§ Memory-efficient optimizers
§ Large-batch training
§ Mixed-precision training
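A minimal PyTorch sketch of mixed-precision training (the model, data, and hyperparameters below are placeholders; only the autocast/GradScaler pattern is the point):

import torch

model = torch.nn.Linear(1024, 1024).cuda()            # placeholder model
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()                   # rescales fp16 gradients for stability

x = torch.randn(32, 1024, device="cuda")               # placeholder batch
y = torch.randn(32, 1024, device="cuda")

with torch.cuda.amp.autocast():                         # run the forward pass in half precision
    loss = torch.nn.functional.mse_loss(model(x), y)

scaler.scale(loss).backward()                           # backward on the scaled loss
scaler.step(opt)
scaler.update()
opt.zero_grad()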
< 25 >
Pretrained model provides strong transfer learning capabilities
< 26 >
Pretrained model provides strong transfer learning capabilities
• A pretrained base model only needs a small amount of data to be adapted to downstream applications
§ Collect a small amount of downstream data and use it to finetune the base model (see the sketch below)
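A minimal PyTorch sketch of this recipe (the backbone, head, and data below are hypothetical placeholders, not the lecture's model):

import torch

# Stand-in for a pretrained base model that maps token ids to a feature vector.
backbone = torch.nn.Sequential(
    torch.nn.Embedding(10000, 256),
    torch.nn.Flatten(1),
    torch.nn.Linear(256 * 16, 256),
)
head = torch.nn.Linear(256, 2)                 # small task-specific head (e.g. 2 classes)

for p in backbone.parameters():
    p.requires_grad = False                    # keep the expensive pretrained weights frozen

opt = torch.optim.AdamW(head.parameters(), lr=1e-3)
x = torch.randint(0, 10000, (8, 16))           # tiny downstream batch: 8 sequences of 16 tokens
labels = torch.randint(0, 2, (8,))

loss = torch.nn.functional.cross_entropy(head(backbone(x)), labels)
loss.backward()
opt.step()
opt.zero_grad()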
< 27 >
Pretraining
< 28 >
Pretraining
• LLaMA https://github.com/facebookresearch/llama
• Bloom https://huggingface.co/bigscience
< 29 >
Supervised Finetuning
< 30 >
Supervised Finetuning
Base models cannot be deployed directly. They are still far from being a smart assistant
< 31 >
Supervised Finetuning
< 32 >
Supervised Finetuning
< 33 >
Supervised Finetuning
• After the supervised finetuning stage, base models can chat like humans
• 1-100 GPUs; days of training; but it can still be very expensive due to the human-generated data
• To save money, some (or most) models use ChatGPT-generated data for finetuning (an example record is sketched below)
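A hypothetical example of what a single SFT training record might look like (the format and text are illustrative only):

# One supervised finetuning example: a prompt and the desired assistant response,
# written by a contractor or taken from ChatGPT. Training minimizes the usual
# next-token prediction loss on the response, conditioned on the prompt.
sft_example = {
    "prompt": "Explain what a large language model is in one sentence.",
    "response": "A large language model is a neural network trained on huge amounts "
                "of text to predict the next token in a sequence.",
}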
< 34 >
Reward modeling
< 35 >
Reward modeling
• The SFT model behaves like an “assistant”, but it is still not good enough
• To further improve it, one can ask human contractors to generate more data; effective but expensive
• Another way is to let the model learn what a good response is, and how to generate good responses
• The reward model enables GPT to judge whether a certain response is good or not
• The reward model will be used in the reinforcement learning stage to reinforce good responses
< 36 >
Reward modeling
• The SFT model generates different responses to the same prompt
• Ask contractors to rank the responses; much cheaper than asking them to write responses
• Dataset: {(prompt, response, reward)}
< 39 >
Reward modeling
• Given a prompt, the SFT model generates several responses, and a reward prediction (shown in green) is then made for each response
• After training, we obtain a reward (RW) model that can predict the reward of a generated response
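A minimal sketch of a pairwise ranking loss for training such a reward model (the feature vectors and the linear reward head are placeholders; the loss form follows InstructGPT-style RLHF, which this pipeline appears to mirror):

import torch

reward_model = torch.nn.Linear(128, 1)   # placeholder: maps (prompt + response) features to a scalar reward

chosen = torch.randn(4, 128)    # features of responses the contractors ranked higher
rejected = torch.randn(4, 128)  # features of responses ranked lower

r_chosen = reward_model(chosen).squeeze(-1)
r_rejected = reward_model(rejected).squeeze(-1)

# Push the reward of the preferred response above that of the rejected one.
loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()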
< 40 >
Reinforcement learning
< 41 >
Reinforcement learning
< 42 >
Reinforcement learning
< 43 >
Reinforcement learning
< 44 >
ChatGPT training pipeline
< 46 >
A short summary
• SFT, RM, and RL are also critical for transforming GPT into your own personalized assistant
< 47 >
PART 03
< 49 >
Understand how humans and LLMs work differently
< 50 >
Use prompts to help the LLM work like a human
< 51 >
Tree of thought
• How to find simple and effective prompts is still a hot research topic (an illustrative prompt is sketched below)
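As an illustration (a hypothetical prompt, not one from the lecture), a tree-of-thought style prompt asks the model to branch over intermediate steps instead of answering in one shot:

prompt = (
    "Problem: use the numbers 4, 9, 10, 13 with +, -, *, / to reach 24.\n"
    "Propose three different first steps, briefly rate how promising each one is,\n"
    "then expand only the most promising step and continue until you reach 24."
)
print(prompt)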
< 52 >
Prompt ensemble
< 53 >
Ask for reflection
< 54 >
Automatic prompt engineering (APE)
< 55 >
RAG empowered LLM
< 56 >
RAG empowered LLM
ChatGPT 3.5
< 57 >
Tool use
< 58 >
Finetuning
< 59 >
LoRA: Low-rank adaptation
< 60 >
LoRA: Low-rank adaptation
< 61 >
LoRA: Low-rank adaptation
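A minimal PyTorch sketch of the LoRA idea (dimensions and rank below are arbitrary placeholders): the pretrained weight W is frozen, and only a low-rank update B·A is trained on top of it:

import torch

class LoRALinear(torch.nn.Module):
    # Frozen pretrained linear layer plus a trainable low-rank update B @ A.
    def __init__(self, in_dim=1024, out_dim=1024, rank=8):
        super().__init__()
        self.W = torch.nn.Linear(in_dim, out_dim)
        for p in self.W.parameters():
            p.requires_grad = False                       # freeze the pretrained weights
        self.A = torch.nn.Parameter(torch.randn(rank, in_dim) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(out_dim, rank))  # zero init: starts as a no-op

    def forward(self, x):
        return self.W(x) + x @ self.A.T @ self.B.T        # W x + B A x

x = torch.randn(2, 1024)
print(LoRALinear()(x).shape)   # torch.Size([2, 1024])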
< 62 >
LoRA: Low-rank adaptation
Reference
Q. Zhang et al., AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning, https://arxiv.org/abs/2303.10512
< 63 >
How to use LLM effectively?
< 64 >
Use cases
< 65 >
Course plan
• 1. Preliminary
§ Attention; Transformer;
§ GPT
< 66 >
Course plan
• 2. LLM pretraining
§ SGD
< 67 >
Course plan
• 3. Finetuning
§ Supervised finetuning
§ RLHF
< 68 >
Course plan
• 4. Prompt engineering
< 69 >
Course plan
• 5. Applications
§ LLM agent
< 70 >
Grading policy
• Homework (~30%)
• Mid-term (~30%)
< 71 >
Thank you!