COMP 3361 Natural Language Processing
Lecture 12: LLM prompting, in-context learning,
scaling laws, emergent capacities
Spring 2024
Many materials from COS484@Princeton and CSE447@UW (Taylor Sorensen) with special thanks!
Announcements
• Final exam is scheduled for 9:30 - 11:30am on Wed, May 8 @ Rm 3, Library Ext.
• #assignment-2 due next week!
• Join #assignment-2 Slack channel for discussion
Lecture plan
• LLM pretraining objectives: recap
• LLM prompting and in-context learning
• Scaling laws of LLMs
• Emergent capacities of LLMs
Pretraining: training objectives?
• During pretraining, we have a large text corpus (no task labels)
• Key question: what labels or objectives are used to train vanilla Transformers?
Pretraining objectives
• BERT (encoder-only; Devlin et al., 2018): masked token prediction
• T5 (encoder-decoder; Raffel et al., 2019): denoising span-mask prediction
• Decoder-only (e.g., GPT): next token prediction
https://github.com/manueldeprada/Pretraining-T5-PyTorch-Lightning
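To make the decoder-only objective concrete, here is a minimal sketch of next-token prediction, assuming PyTorch; the tiny embedding + linear "model" and the random token IDs are placeholders, not an actual Transformer:

    # Minimal sketch of the next-token prediction (causal LM) objective.
    import torch
    import torch.nn.functional as F

    vocab_size, d_model = 100, 32
    token_ids = torch.randint(0, vocab_size, (1, 8))  # toy batch: 1 sequence of 8 tokens

    embed = torch.nn.Embedding(vocab_size, d_model)
    lm_head = torch.nn.Linear(d_model, vocab_size)

    hidden = embed(token_ids)      # stand-in for causally masked Transformer layers
    logits = lm_head(hidden)       # (batch, seq_len, vocab_size)

    # Shift by one: position t predicts token t+1; the last position has no target.
    loss = F.cross_entropy(
        logits[:, :-1, :].reshape(-1, vocab_size),
        token_ids[:, 1:].reshape(-1),
    )
    print(loss.item())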
Evolution tree of pretrained LMs
[Figure: evolution tree of open-source vs. closed-source pretrained LMs; model sizes range from ~300 million to ~200 billion parameters, roughly 1000 times larger]
https://github.com/Mooler0410/LLMsPracticalGuide
https://mistral.ai/news/mistral-large/
From GPT-1 to GPT-2 to GPT-3
• All decoder-only Transformer-based language models
• Model size ↑, training corpora ↑
• GPT-2 (context size = 1024): trained on 40GB of Internet text
(Radford et al., 2019): Language Models are Unsupervised Multitask Learners
GPT-3: language models are few-shot learners
• GPT-2 → GPT-3: 1.5B → 175B (# of parameters), ~14B → 300B (# of tokens)
Context size = 2048
Training computation is measured in floating-point operations, or "FLOP". One FLOP represents a single arithmetic operation involving floating-point numbers, such as an addition, subtraction, multiplication, or division.
(Brown et al., 2020): Language Models are Few-Shot Learners
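As a back-of-the-envelope check (not from the slide), a widely used rule of thumb estimates training compute as roughly 6 x (number of parameters) x (number of training tokens) FLOP; plugging in the numbers above gives on the order of 10^20 FLOP for GPT-2 and 10^23 FLOP for GPT-3:

    # Rough training-compute estimate via the common "~6 * N_params * N_tokens"
    # rule of thumb (an approximation, not an exact operation count).
    def train_flop_estimate(n_params: float, n_tokens: float) -> float:
        return 6 * n_params * n_tokens

    gpt2 = train_flop_estimate(1.5e9, 14e9)    # GPT-2: 1.5B params, ~14B tokens
    gpt3 = train_flop_estimate(175e9, 300e9)   # GPT-3: 175B params, 300B tokens
    print(f"GPT-2 ~{gpt2:.1e} FLOP, GPT-3 ~{gpt3:.1e} FLOP")  # ~1.3e20 vs ~3.2e23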
Before GPT-3: Modern learning paradigm
• Pre-training + supervised training/fine-tuning
• First train a Transformer on a lot of general text using unsupervised learning. This is called pretraining.
• Then train the pretrained Transformer for a specific task using supervised learning. This is called fine-tuning.
Paradigm shift since GPT-3
• Before GPT-3, pre-training + supervised training/fine-tuning was the default way of training models like BERT/T5/GPT-2
• SST-2 has 67k examples; SQuAD has 88k (passage, answer, question) triples
• Fine-tuning requires computing the gradient and applying a parameter update on every example (or every K examples in a mini-batch); see the sketch below
• However, this is very expensive for the 175B GPT-3 model
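To see why this is costly at the 175B scale, here is a minimal sketch of the supervised fine-tuning loop, with a placeholder model and random toy data standing in for a pretrained Transformer and a labeled dataset (assuming PyTorch):

    # Each mini-batch needs a forward pass, a backward pass (a gradient for every
    # parameter), and an optimizer update over all parameters, which becomes
    # prohibitively expensive when the model has 175B parameters.
    import torch

    model = torch.nn.Linear(768, 2)  # placeholder for a pretrained Transformer + task head
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    loss_fn = torch.nn.CrossEntropyLoss()

    for step in range(3):                  # a few toy mini-batches
        x = torch.randn(16, 768)           # stand-in for encoded (sentence, label) examples
        y = torch.randint(0, 2, (16,))
        loss = loss_fn(model(x), y)
        loss.backward()                    # compute gradients
        optimizer.step()                   # update parameters
        optimizer.zero_grad()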
Latest learning paradigm shift since GPT-3
• Pre-training + prompting/in-context learning (no training in this step)
• First train a large (roughly 7B to 175B parameter) Transformer on a lot of general text using unsupervised learning. This is called large language model pretraining.
• Then directly use the pretrained large Transformer (no further fine-tuning/training) for any different task, given only a natural language description of the task or a few task (x, y) examples. This is called prompting/in-context learning.
GPT-3: few-shot in-context learning
• GPT-3 proposes an alternative: in-context learning
• This is just a forward pass, no gradient update at all!
• You only need to feed a small number of examples (e.g., 32)
• (On the other hand, you also can't feed too many examples at once, since the prompt is bounded by the context size)
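A minimal sketch of how a few-shot prompt is put together; the sentiment task, examples, and template below are illustrative placeholders, not GPT-3's exact format:

    # Few-shot in-context learning: (x, y) demonstrations are simply concatenated
    # into the prompt; the frozen LM then continues the text in one forward pass,
    # with no gradient updates.
    demos = [
        ("The movie was fantastic.", "positive"),
        ("I wasted two hours of my life.", "negative"),
        ("A beautiful, moving story.", "positive"),
    ]
    query = "The plot made no sense at all."

    prompt = "Classify the sentiment of each review.\n\n"
    for x, y in demos:
        prompt += f"Review: {x}\nSentiment: {y}\n\n"
    prompt += f"Review: {query}\nSentiment:"

    print(prompt)  # this string is the entire "training" signal the model sees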
GPT-3: task specifications
• DROP (a reading comprehension task)
• Unscrambling words
• Word in Context (WiC)
GPT-3’s in-context learning
http://ai.stanford.edu/blog/in-context-learning/
(Brown et al., 2020): Language Models are Few-Shot Learners
GPT-3’s scaling laws in performance
(Brown et al., 2020): Language Models are Few-Shot Learners
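For reference (not from this slide), trends like these are commonly summarized as power laws; Kaplan et al. (2020) fit pretraining loss as a power law in model size, roughly of the form

    L(N) \approx (N_c / N)^{\alpha_N}

where N is the number of model parameters and N_c, \alpha_N are empirically fitted constants. Scaling laws are covered in more detail later in this lecture.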
Chain-of-thought (CoT) prompting
(Wei et al., 2022): Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
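A minimal sketch contrasting a standard few-shot prompt with a chain-of-thought prompt, in the arithmetic word-problem style of Wei et al. (2022); the exact wording below is illustrative:

    # Standard few-shot prompt: the demonstration maps the question directly to the answer.
    standard_prompt = (
        "Q: Roger has 5 tennis balls. He buys 2 cans with 3 balls each. "
        "How many tennis balls does he have now?\n"
        "A: The answer is 11.\n\n"
        "Q: The cafeteria had 23 apples. They used 20 and bought 6 more. "
        "How many apples do they have?\n"
        "A:"
    )

    # Chain-of-thought prompt: the demonstration spells out intermediate reasoning
    # steps, which elicits step-by-step reasoning for the new question as well.
    cot_prompt = (
        "Q: Roger has 5 tennis balls. He buys 2 cans with 3 balls each. "
        "How many tennis balls does he have now?\n"
        "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
        "5 + 6 = 11. The answer is 11.\n\n"
        "Q: The cafeteria had 23 apples. They used 20 and bought 6 more. "
        "How many apples do they have?\n"
        "A:"
    )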
Why in-context learning with LLMs?
• Amazing zero/few-shot performance
  ◦ Save a lot of annotation! 🎉
• Easy to use without training
  ◦ Just talk to them! 👍
• One model for many NLP applications 😄
  ◦ No need to annotate and fine-tune for different tasks
But, again, they are sensitive to prompts! Need to design a good prompt or train a good example retriever! 😂
Okay, so bigger is better? Can you be more specific?
In-Context Learning, Scaling Laws, Emergent Capabilities