
COMP 3361 Natural Language Processing

Lecture 12: LLM prompting, in-context learning, scaling laws, emergent capacities

Spring 2024

Many materials from COS484@Princeton and CSE447@UW (Taylor Sorensen) with special thanks!
Announcements

• Final exam is scheduled at 9:30 - 11:30am on May 8, Wed @ Rm 3, Library Ext.
• #assignment-2 due next week!
• Join #assignment-2 Slack channel for discussion

Lecture plan

• LLM pretraining objectives: recap
• LLM prompting and in-context learning
• Scaling laws of LLMs
• Emergent capacities of LLMs

Pretraining: training objectives?

• During pretraining, we have a large text corpus (no task labels)


• Key question: what labels or objectives are used to train the vanilla Transformer?

[Diagram: raw text corpus → training labels/objectives? → pretraining Transformers]



Pretraining objectives

• BERT (encoder-only; Devlin et al., 2018): masked token prediction
• T5 (encoder-decoder; Raffel et al., 2019): denoising span-mask prediction
• Decoder-only (e.g., GPT): next token prediction (sketched below)

https://github.com/manueldeprada/Pretraining-T5-PyTorch-Lightning
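To make the decoder-only objective concrete, here is a minimal sketch of next-token prediction in PyTorch. The toy embedding-plus-linear "model", vocabulary size, and random token ids are made-up stand-ins for a real Transformer and corpus; only the loss construction is the point.

```python
# Minimal sketch of the next-token prediction objective (decoder-only style).
# The embedding + linear layer below stand in for a real Transformer stack.
import torch
import torch.nn.functional as F

vocab_size, hidden = 100, 32
token_ids = torch.randint(0, vocab_size, (1, 16))   # a toy "sentence" of 16 tokens

embed = torch.nn.Embedding(vocab_size, hidden)
lm_head = torch.nn.Linear(hidden, vocab_size)

hidden_states = embed(token_ids)                    # (1, 16, hidden); a real LM applies Transformer blocks here
logits = lm_head(hidden_states)                     # (1, 16, vocab_size)

# Shift by one: position t predicts token t+1; no labels beyond the raw text are needed.
loss = F.cross_entropy(
    logits[:, :-1, :].reshape(-1, vocab_size),      # predictions for positions 0..14
    token_ids[:, 1:].reshape(-1),                   # targets are the next tokens 1..15
)
print(loss.item())
```

Masked token prediction (BERT) and span-mask denoising (T5) differ only in which positions are corrupted and predicted; the cross-entropy machinery is the same.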
Evolution tree of pretrained LMs
[Figure: evolution tree of pretrained LMs, covering open-sourced and closed-sourced models; model size (# of parameters) has grown ~1000 times, from ~300 million to ~200 billion.]

https://github.com/Mooler0410/LLMsPracticalGuide
https://mistral.ai/news/mistral-large/
From GPT-1 to GPT-2 to GPT-3
• All decoder-only Transformer-based language models

• Model size ↑, training corpora ↑

• GPT-2: context size = 1024, trained on 40GB of Internet text

(Radford et al., 2019): Language Models are Unsupervised Multitask Learners


GPT-3: language models are few-shot learners

• GPT-2 → GPT-3: 1.5B → 175B (# of parameters), ~14B → 300B (# of tokens)

Context size = 2048


Training computation is measured using floating-point operations, or "FLOP". One FLOP represents a single arithmetic operation involving floating-point numbers, such as addition, subtraction, multiplication, or division.

(Brown et al., 2020): Language Models are Few-Shot Learners
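To make the FLOP numbers concrete, a common rule of thumb from the scaling-laws literature (not stated on this slide, so treat it as an assumption) is that training compute ≈ 6 × parameters × tokens. Applying it to the GPT-3 figures above:

```python
# Back-of-the-envelope training compute using the common "6 * N * D" approximation:
# forward + backward pass costs roughly 6 FLOP per parameter per training token.
params = 175e9   # GPT-3 parameter count (from the slide above)
tokens = 300e9   # GPT-3 training tokens (from the slide above)

flop = 6 * params * tokens
print(f"~{flop:.2e} FLOP")   # ~3.15e+23 FLOP
```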


Before GPT-3: the modern learning paradigm

• Pre-training + supervised training/fine-tuning
• First train a Transformer on a lot of general text using unsupervised learning. This is called pretraining.
• Then train the pretrained Transformer for a specific task using supervised learning. This is called fine-tuning.

Paradigm shift since GPT-3
• Before GPT-3, pre-training + supervised training/fine-tuning was the default way of learning in models like BERT/T5/GPT-2
• SST-2 has 67k examples, SQuAD has 88k (passage, answer, question) triples
• Fine-tuning requires computing the gradient and applying a parameter update on every example (or every K examples in a mini-batch); a minimal sketch of such a step follows below
• However, this is very expensive for the 175B GPT-3 model
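For contrast with in-context learning below, here is a minimal sketch of one fine-tuning step in PyTorch. The toy linear classifier and random features stand in for a pretrained Transformer plus task head; the point is that every batch triggers a gradient computation and a parameter update.

```python
# Minimal sketch of a supervised fine-tuning step: forward, loss, backward, update.
import torch
import torch.nn.functional as F

model = torch.nn.Linear(768, 2)                  # toy stand-in: sentence encoding -> 2 sentiment classes
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

features = torch.randn(8, 768)                   # a mini-batch of 8 (toy) sentence encodings
labels = torch.randint(0, 2, (8,))               # gold labels, e.g. from SST-2

logits = model(features)
loss = F.cross_entropy(logits, labels)
loss.backward()                                  # gradients for *all* parameters
optimizer.step()                                 # parameter update -- very expensive at 175B scale
optimizer.zero_grad()
```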
Latest learning paradigm shift since GPT-3

• Pre-training + prompting/in-context learning (no training in this step)
• First train a large (roughly 7B to 175B parameter) Transformer on a lot of general text using unsupervised learning. This is called large language model pretraining.
• Then directly use the pretrained large Transformer (no further fine-tuning/training) for any different task, given only a natural language description of the task or a few task (x, y) examples. This is called prompting/in-context learning.

GPT-3: few-shot in-context learning
• GPT-3 proposes an alternative: in-context learning

• This is just a forward pass, no gradient update at all!
• You only need to feed a small number of examples (e.g., 32) in the prompt; see the sketch below
• (On the other hand, you can't feed too many examples at once either, as the prompt is bounded by the context size)
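A minimal sketch of few-shot in-context learning, assuming the Hugging Face transformers library and GPT-2 as a small stand-in for GPT-3: the task is specified entirely inside the prompt, and inference is a single forward pass with no parameter updates.

```python
# Few-shot in-context learning: the "training examples" live inside the prompt.
# GPT-2 is used only as a small stand-in; GPT-3-scale models do this far better.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Review: The movie was fantastic. Sentiment: positive\n"
    "Review: I wasted two hours of my life. Sentiment: negative\n"
    "Review: An absolute delight from start to finish. Sentiment:"
)

out = generator(prompt, max_new_tokens=3, do_sample=False)
print(out[0]["generated_text"])   # the continuation after the last "Sentiment:" is the prediction
```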
GPT-3: task specifications

• DROP (a reading comprehension task)
• Unscrambling words
• Word in context (WiC)

GPT-3’s in-context learning

http://ai.stanford.edu/blog/in-context-learning/

(Brown et al., 2020): Language Models are Few-Shot Learners


GPT-3’s scaling laws in performance

(Brown et al., 2020): Language Models are Few-Shot Learners
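The scaling behavior behind these curves is often summarized as a power law in model size; the form and constants below are taken from Kaplan et al. (2020), a companion result not shown on this slide, and should be treated as illustrative assumptions.

```python
# Illustrative power-law scaling of LM loss with parameter count,
# L(N) ~ (N_c / N) ** alpha_N, with constants as reported by Kaplan et al. (2020).
N_c, alpha_N = 8.8e13, 0.076     # assumed values from Kaplan et al.; not from this slide

for n_params in [1.5e9, 13e9, 175e9]:    # roughly GPT-2, GPT-3 13B, GPT-3 175B
    loss = (N_c / n_params) ** alpha_N
    print(f"{n_params:.1e} params -> predicted loss ~{loss:.2f}")
```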


Chain-of-thought (CoT) prompting

(Wei et al., 2022): Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
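A minimal sketch of what a chain-of-thought prompt looks like, following the arithmetic example used by Wei et al. (2022): the in-context demonstration spells out intermediate reasoning steps, which nudges the model to produce its own reasoning before the final answer.

```python
# Chain-of-thought prompting: the few-shot demonstration shows the reasoning,
# not just the final answer. Works best with large models; the example follows
# the one in Wei et al. (2022).
cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
    "Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, "
    "how many apples do they have?\n"
    "A:"
)
print(cot_prompt)   # feed this to a large LM; a good completion reasons "23 - 20 + 6 = 9"
```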
Why in-context learning with LLMs?
• Amazing zero/few-shot performance
  ◦ Saves a lot of annotation! 🎉
• Easy to use without training
  ◦ Just talk to them! 👍
• One model for many NLP applications 😄
  ◦ No need to annotate and fine-tune for different tasks

But, again, they are sensitive to prompts! Need to design a good prompt or train a good
example retriever! 😂
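One common way to build a good example retriever is to pick demonstrations that are semantically close to the test input. A minimal sketch, assuming the sentence-transformers library and a commonly used small encoder (neither is prescribed by the slide):

```python
# Retrieve the most similar labeled example to use as an in-context demonstration.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # a common small encoder; an assumption, not from the slide

pool = [
    ("The plot was thin but the acting saved it.", "positive"),
    ("A dull, lifeless two hours.", "negative"),
    ("Visually stunning and emotionally rich.", "positive"),
]
query = "Gorgeous cinematography, though the story drags."

pool_emb = encoder.encode([text for text, _ in pool], convert_to_tensor=True)
query_emb = encoder.encode(query, convert_to_tensor=True)

scores = util.cos_sim(query_emb, pool_emb)[0]       # cosine similarity to each candidate
best = scores.argmax().item()
print("Best demonstration:", pool[best])            # prepend this (x, y) pair to the prompt
```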



Okay, so bigger is better? Can you be more specific?

In-Context Learning, Scaling Laws, Emergent Capabilities

