
Introduction to Large Language Model

Kun Yuan (袁 坤)

Feb 20, 2024


Contents

• Large language model (LLM)

• How to effectively train LLMs

• How to effectively use LLMs

• Course plans

Note: The main content of this lecture is summarized from two wonderful talks [1, 2] by Andrej Karpathy

[1] State of GPT


[2] The busy person’s intro to LLMs

Teaching assistants

白禹东 耿云腾 何雨桐 李佩津 刘梓豪

鲁可儿 宋奕龙 孙乾祐 王宇驰


PART 01

Large language model (LLM)


Large language model

• Meta's Llama 2 is probably the most powerful open-source LLM

• The weights, architecture, and paper were all released by Meta

• Neural network parameters + the code to run them; that's all you need

• No internet access needed; just one laptop
What are the model parameters?

• An LLM can be regarded as a magic function that maps a context to the next word

• The model parameters θ parameterize this magic function as a series of matrix-matrix (and matrix-vector) products

f("cat sat on a"; θ) = "mat"

• Given the model parameters θ, the LLM can predict the next word
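To make the "series of matrix products" concrete, here is a toy sketch (hypothetical random parameters, mean-pooled context instead of attention; a real LLM is vastly larger):

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "a", "mat"]
rng = np.random.default_rng(0)

d = 8                                        # embedding dimension
theta = (rng.normal(size=(len(vocab), d)),   # token embeddings
         rng.normal(size=(d, len(vocab))))   # output projection

def f(context: str, theta) -> str:
    E, W = theta
    ids = [vocab.index(w) for w in context.split()]
    h = E[ids].mean(axis=0)             # crude context summary (real LLMs use attention)
    logits = h @ W                      # matrix-vector product
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax over the vocabulary
    return vocab[int(np.argmax(probs))]

print(f("cat sat on a", theta))         # with *trained* theta this would be "mat"
```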
LLMs can generate text in various styles

(Figure: sample generations styled as code, books, informational text, and Wikipedia articles)
How to get the weights? Train a deep neural network

• Use tremendous amounts of data and compute to obtain the valuable model parameters

• Very, very expensive; model weights are updated perhaps once a year or once every few years
How to make an LLM your personal copilot? Prompt engineering and finetuning

• Over 90% of my interactions with ChatGPT are …

• But we should use LLMs more frequently and smartly. They can be your personal copilot

• It is not easy to have your own LLM copilot. You need to know prompt engineering and finetuning
PART 02

ChatGPT Training Pipeline


ChatGPT training pipeline has 4 stages

Source: Andrej Karpathy, State of GPT


Pretraining

Pretraining accounts for 99% of the training time and resources

Source: Andrej Karpathy, State of GPT


Pretraining

Data collection

Data crawled from websites, of both high and low quality

High-quality data: the training data mixture used in the LLaMA model
Pretraining

Tokenization

Transform long texts into lists of integers
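For a concrete feel, here is the same idea with a real BPE tokenizer (assuming the tiktoken package is installed; the exact integers depend on the tokenizer used):

```python
import tiktoken

enc = tiktoken.get_encoding("gpt2")          # GPT-2's BPE tokenizer
ids = enc.encode("The cat sat on the mat.")
print(ids)              # a short list of integers, one per token
print(enc.decode(ids))  # decoding round-trips to the original text
```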
Pretraining

Token and vocabulary

Sentence: "The cat sat on the mat. The cat is orange."

Tokens: ["The", "cat", "sat", "on", "the", "mat", ".", "The", "cat", "is", "orange", "."]

Vocabulary: {"The", "cat", "sat", "on", "the", "mat", ".", "is", "orange"}

A vocabulary is a set, so each element appears exactly once
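A one-line check of this in Python:

```python
tokens = ["The", "cat", "sat", "on", "the", "mat", ".",
          "The", "cat", "is", "orange", "."]
vocab = set(tokens)              # a set keeps each element exactly once
print(len(tokens), len(vocab))   # 12 tokens, 9 unique vocabulary entries
```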

Pretraining

While GPT-3 is larger, LLaMA is trained on more tokens. In practice, LLaMA performs significantly better.

We cannot judge the power of an LLM only by its number of parameters; data also matters

It is still under debate whether one should increase model size or data size given a limited compute budget
Pretraining

• Effective representation learning

• Long-range dependency via attention (see the formula below)

• Parallelizable architecture

• Flexibility and adaptability

(In the recently popular Sora, Diffusion + Transformer is used)

Transformer architecture (will discuss it in later lectures)
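For reference, the operation behind the long-range dependency modeling above is scaled dot-product attention (a standard formula, shown here ahead of the later lectures):

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$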
Pretraining

[Training Compute-Optimal Large Language Models]

Larger dataset + bigger model + longer training = better prediction accuracy

A very straightforward way to achieve a good LLM: all you need is MONEY!

Amazing representation power
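As a rough worked example: the cited paper (Chinchilla) suggests about 20 training tokens per parameter is compute-optimal, and a common estimate (an assumption here, not from the slides) is that training costs about 6·N·D FLOPs for N parameters and D tokens:

```python
# Rough compute-optimal sizing following the Chinchilla rule of thumb
# (~20 tokens per parameter) and the common ~6*N*D training-FLOPs estimate.
N = 70e9                 # model parameters (e.g., a 70B model)
D = 20 * N               # compute-optimal number of training tokens
flops = 6 * N * D        # approximate total training FLOPs
print(f"tokens: {D:.2e}, training FLOPs: {flops:.2e}")
```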
Pretraining

• Pretraining a base model is extremely expensive

• Several effective pretraining techniques (a minimal sketch of one follows this list):

§ 3D parallelism: data/model/tensor parallelism

§ Memory-efficient optimizers

§ Large-batch training

§ Mixed-precision training

• Will discuss them in later lectures
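As a taste, here is a minimal mixed-precision training loop in PyTorch (a generic sketch: `model`, `loader`, and the hyperparameters are assumed placeholders, not from the lecture):

```python
import torch

scaler = torch.cuda.amp.GradScaler()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for inputs, targets in loader:
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    scaler.scale(loss).backward()   # scale loss to avoid fp16 underflow
    scaler.step(optimizer)          # unscales gradients, then steps
    scaler.update()                 # adjusts the scale factor over time
```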
Pretrained models provide strong transfer learning capabilities

A pretrained base model performs well after finetuning
Pretrained models provide strong transfer learning capabilities

• Pretraining + finetuning/prompting reshapes the AI industry

• A pretrained base model needs only a small amount of data to be adapted to downstream applications

• The cost of deploying AI in downstream applications decreases significantly

§ Obtain powerful base models from OpenAI/Google/Meta/GitHub

§ Collect a small amount of downstream data and use it to finetune the base model

§ No need for expensive investments of money and talent
Pretraining

Base models in the wild

Pretraining

LLaMA and BLOOM are popular open-source base models

• LLaMA: https://github.com/facebookresearch/llama

• BLOOM: https://huggingface.co/bigscience
Supervised Finetuning

A base model cannot be deployed directly; it is still far from being a smart assistant
Supervised Finetuning

Base models can be tricked into being AI assistants with prompting

We need to finetune the base model to make it chat like humans
Supervised Finetuning

Ask human contractors to respond to prompts and write high-quality, helpful, truthful, and harmless responses

Collect 10,000+ high-quality human-written responses

Finetune base models on this high-quality data
Supervised Finetuning

• Dataset: 10K–100K human-generated data pairs {(prompt, response)} (a hypothetical example follows this list)

• Training: repeat what we did in the "Pretraining" stage

• After the supervised finetuning stage, base models can chat like humans

• 1–100 GPUs; days of training; but it can still be very expensive due to the human-generated data

• To save money, some (or most) models use ChatGPT-generated data for finetuning
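A hypothetical data pair and how it is used (the format is illustrative, not a specific dataset's):

```python
# One hypothetical {(prompt, response)} pair; SFT simply continues
# next-token training on text assembled from such pairs.
pair = {
    "prompt": "Explain what a tokenizer does.",
    "response": "A tokenizer splits text into a list of integer token ids ...",
}
training_text = pair["prompt"] + "\n\n" + pair["response"]
# the same next-token-prediction loss from pretraining is applied to this text
```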
Reward modeling

• The SFT model performs like an "assistant", but is still not good enough

• To further improve it, one can ask human contractors to generate more data; effective but expensive

• Another way is to let the model learn what makes a response good, and how to generate good responses

• A reward model enables GPT to judge whether a given response is good or not

• The reward model is used in the reinforcement learning stage to reinforce good responses
Reward modeling

Dataset

• The SFT model generates different responses to the same prompt

• Ask contractors to rank the responses; much cheaper than writing responses from scratch

• Resulting dataset: {(prompt, response, reward)}
Reward modeling

• Given a prompt, the SFT model generates several responses, then makes a reward prediction (green) for each

• The predicted reward is supervised by the ground-truth reward

• After training, we obtain a reward model (RM) that can predict the reward of a generated response (a sketch of the typical ranking loss follows)
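A minimal sketch of the pairwise ranking loss commonly used to train reward models from ranked responses (Bradley-Terry style, as in InstructGPT; not stated on the slides):

```python
import torch
import torch.nn.functional as F

def reward_model_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # push the reward of the human-preferred response above the other one
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# r_chosen / r_rejected would come from the RM scoring two ranked responses
print(reward_model_loss(torch.tensor([1.2]), torch.tensor([0.3])))
```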
Reinforcement learning

RL trains the model to generate responses that earn high reward scores
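In spirit, the update is a policy-gradient step; a heavily simplified sketch follows (production RLHF uses PPO with a KL penalty to the SFT model; all numbers here are hypothetical):

```python
import torch

logprob = torch.tensor(-5.0, requires_grad=True)  # log p(response | prompt)
reward, baseline = 0.9, 0.5    # RM score and a variance-reducing baseline

loss = -(reward - baseline) * logprob  # minimizing this raises p(high-reward responses)
loss.backward()
print(logprob.grad)   # negative gradient: a descent step increases logprob
```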
ChatGPT training pipeline

Source: Andrej Karpathy, State of GPT


Assistant models in the wild

A short summary

• We discussed the pipeline used to train ChatGPT

• SFT, RM, and RL are critical to transform GPT into ChatGPT

• SFT, RM, and RL are also critical to transform GPT into your own personalized assistant
PART 03

Use LLM Effectively As Your Personal Copilot


Understand how humans and LLMs work differently

• Humans can plan and reflect

• Humans can use tools

• Humans typically think more before answering
Understand how humans and LLMs work differently

• LLMs strip away all of these human behaviors; they simply predict the next token
Use prompts to help LLMs work like humans

• Chain of thought: break tasks up into multiple steps/stages (an example prompt follows)

(will discuss it in later lectures)
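A hypothetical chain-of-thought prompt; asking the model to reason step by step before answering often helps on multi-step problems:

```python
prompt = (
    "Q: A store had 23 apples, sold 9, and then received 12 more. "
    "How many apples does it have now?\n"
    "A: Let's think step by step."   # the cue that elicits stepwise reasoning
)
```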
Tree of thought

• Tree of thoughts: expand thoughts, evaluate them, and then go deeper

(will discuss it in later lectures)

• Finding simple and effective prompts is still a hot research topic
Prompt ensemble

Ask for reflection

Automatic prompt engineering (APE)

• Learn a good prompt automatically

[Large language models are human-level prompt engineers, 2023]

RAG empowered LLM

Retrieval-augmented generation (RAG) helps LLMs generate more precise, up-to-date, and personalized content
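A minimal sketch of the retrieve-then-generate loop (`embed`, `index.search`, and `llm` are hypothetical helpers standing in for an embedding model, a vector store, and an LLM API):

```python
def rag_answer(question: str, index, llm, k: int = 3) -> str:
    docs = index.search(embed(question), k=k)        # 1. retrieve relevant documents
    context = "\n\n".join(doc.text for doc in docs)  # 2. assemble them into context
    prompt = (
        f"Answer the question using only the context below.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm(prompt)                               # 3. generate, grounded in context
```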
RAG empowered LLM

(Figure: a RAG-powered Bing Copilot response compared with ChatGPT 3.5)
Tool use

Offload tasks that LLMs are not good at
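A toy illustration of the idea with a hypothetical calculator protocol (the "CALC:" marker is made up for this sketch):

```python
def run_with_tools(model_output: str) -> str:
    # if the model asks for the calculator tool, run it on the host side
    if model_output.startswith("CALC:"):
        expr = model_output[len("CALC:"):].strip()
        return str(eval(expr, {"__builtins__": {}}))  # toy calculator tool
    return model_output

print(run_with_tools("CALC: 123456789 * 987654321"))  # exact arithmetic via the tool
```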
Finetuning

SFT and RLHF both finetune the pretrained base model
LoRA: Low-rank adaptation

Finetuning injects additional weights into the base model:

W' = W + ΔW    (fine-tuned weight = base model weight + additional weight)

LoRA: low-rank adaptation

W' = W + BA,    B ∈ R^{d×r}, A ∈ R^{r×k}, r ≪ min(d, k)    (fine-tuned weight = base model weight + low-rank weight)
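A minimal LoRA sketch in PyTorch (shapes illustrative): freeze W and train only the low-rank factors B and A, i.e. r·(d+k) parameters instead of d·k:

```python
import torch

d, k, r = 1024, 1024, 8
W = torch.randn(d, k)                            # frozen base weight
B = torch.nn.Parameter(torch.zeros(d, r))        # B = 0 so W' starts equal to W
A = torch.nn.Parameter(0.01 * torch.randn(r, k)) # small random init

def lora_forward(x: torch.Tensor) -> torch.Tensor:
    return x @ (W + B @ A).T                     # W' = W + BA

y = lora_forward(torch.randn(2, k))              # only B and A receive gradients
```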
LoRA: Low-rank adaptation

Light but powerful

LoRA: Low-rank adaptation

References

E. J. Hu et al., LoRA: Low-Rank Adaptation of Large Language Models, https://arxiv.org/abs/2106.09685

Q. Zhang et al., AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning, https://arxiv.org/abs/2303.10512
How to use LLMs effectively?

Recommendations from OpenAI
Use cases

Course plan

• 1. Preliminaries

§ Linear algebra; optimization

§ Machine learning; deep neural networks

§ Word embeddings; recurrent neural networks; Seq2Seq

§ Attention; Transformer

§ GPT
Course plan

• 2. LLM pretraining

§ SGD

§ Momentum SGD; Adaptive SGD; Adam

§ Large-batch training; mixed-precision training

§ Data parallelism; model parallelism; tensor parallelism

Course plan

• 3. Finetuning

§ Supervised finetuning

§ RLHF

§ Parameter efficient finetuning (PEFT), e.g., LoRA

Course plan

• 4. Prompt engineering

§ Chain of thought; tree of thought

§ Principles for generating high-quality prompts

§ Automatic prompt engineering
Course plan

• 5. Applications

§ LLM agent

§ LLM in decision intelligence

Grading policy

• Homework (~30%)

• Mid-term (~30%)

• Final project and presentation (~40%)

Thank you!

Kun Yuan homepage: https://kunyuan827.github.io/
