Evaluating LLM performance

Dataset: "The Verdict" short story (about 20,000 characters, ~5,000 tokens)

1) Convert text into tokens: Byte Pair Encoding

Large Language Models (LLMs) don’t work directly with raw text; instead, they process text as tokens. A token is typically a word, part of a word, or even a punctuation mark, depending on the tokenizer’s rules.

Byte Pair Encoding (BPE) is a common tokenization method where:


1. The text is first split into characters.
2. The most frequent pairs of characters are merged into bigger units.
3. This merging repeats until a fixed vocabulary size is reached.
This approach keeps common words as single tokens and breaks rare or unknown words into smaller
pieces, helping the model handle any text.

After tokenization, the text is represented as a list of integer token IDs.
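As a minimal sketch of this step, assuming the tiktoken library with the GPT-2 encoding (the file name is illustrative):

    import tiktoken

    # GPT-2's BPE tokenizer (vocabulary of 50,257 tokens)
    tokenizer = tiktoken.get_encoding("gpt2")

    with open("the-verdict.txt", "r", encoding="utf-8") as f:  # file name is illustrative
        raw_text = f.read()

    # Encode into integer token IDs; keep the special <|endoftext|> token intact
    token_ids = tokenizer.encode(raw_text, allowed_special={"<|endoftext|>"})
    print(len(raw_text), len(token_ids))  # roughly 20,000 characters -> ~5,000 tokens

    # A rare proper noun is usually split into several subword tokens
    print(tokenizer.encode("Gisburn"))                       # likely more than one ID
    print(tokenizer.decode(tokenizer.encode("Gisburn")))     # decodes back to "Gisburn"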


For training, we usually split this list into chunks of fixed length, so the model can learn patterns in
manageable pieces.

Sliding Window Chunking is often used, where each chunk overlaps slightly with the previous one.
* max_length is the size of each chunk.
* stride controls how far the window moves each step.
When stride is smaller than max_length, consecutive chunks overlap, which helps the model keep context across chunk boundaries.

In the code below:


* tokenizer.encode() converts the text into token IDs (integers) while keeping special tokens like
<|endoftext|>.
* A sliding window loops through the token IDs, producing input sequences and their shifted target
sequences for next-token prediction.
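A minimal sketch of that code, assuming PyTorch and the tiktoken tokenizer above (the class name GPTDataset and the variable names are illustrative):

    import torch
    from torch.utils.data import Dataset

    class GPTDataset(Dataset):
        def __init__(self, text, tokenizer, max_length, stride):
            # Encode the full text into token IDs, keeping the special token
            token_ids = tokenizer.encode(text, allowed_special={"<|endoftext|>"})
            self.input_ids, self.target_ids = [], []
            # Slide a window of max_length tokens over the IDs, moving by `stride`
            for i in range(0, len(token_ids) - max_length, stride):
                input_chunk = token_ids[i:i + max_length]
                target_chunk = token_ids[i + 1:i + max_length + 1]  # shifted by one token
                self.input_ids.append(torch.tensor(input_chunk))
                self.target_ids.append(torch.tensor(target_chunk))

        def __len__(self):
            return len(self.input_ids)

        def __getitem__(self, idx):
            return self.input_ids[idx], self.target_ids[idx]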
2) Train/Validation Split

The dataset is split into 90% training and 10% validation using train_ratio. The training set is shuffled for
better learning, while the validation set remains in order for consistent evaluation. Separate data loaders
are created for each split with the same chunking parameters (max_length, stride) but different shuffle
and drop_last settings.
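A sketch of how the split and the two loaders might be set up, building on the GPTDataset sketch above (the batch size of 2 and the max_length and stride of 256 are assumptions, chosen to match the batch counts listed later in these notes):

    from torch.utils.data import DataLoader

    train_ratio = 0.90
    split_idx = int(train_ratio * len(raw_text))
    train_text, val_text = raw_text[:split_idx], raw_text[split_idx:]

    train_dataset = GPTDataset(train_text, tokenizer, max_length=256, stride=256)
    val_dataset = GPTDataset(val_text, tokenizer, max_length=256, stride=256)

    # Training loader: shuffled, incomplete last batch dropped
    train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True, drop_last=True)
    # Validation loader: kept in order for consistent evaluation
    val_loader = DataLoader(val_dataset, batch_size=2, shuffle=False, drop_last=False)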

[Diagram: creating input/target pairs and the train/validation split]

The dataset is split roughly 90% / 10% into training data and validation data, and a loss is tracked separately for each (training loss and validation loss).

LLMs are autoregressive models: there are no separate labels, so the targets are created from the text itself. With a context size of 4, the model uses 4 tokens at a time, and each target is the input window shifted forward by one token; a stride of 4 then moves the window on to the next chunk. For example, from "I had always thought Jack Gisburn rather a cheap ...":

input: "I had always thought"   ->  target: "had always thought Jack"
input: "Jack Gisburn rather a"  ->  target: "Gisburn rather a cheap"
Summary of Calculating Loss

[Diagram: loss calculation pipeline]

1. The input tokens (e.g. "I had always thought") are passed through the GPT model, which outputs logits: one score for every entry in the vocabulary at every input position.
2. Softmax turns the logits into probabilities. The token with the highest probability would be the model's prediction, but for the loss we instead look up the probability the model assigned to the actual target token ID at each position (here the targets "had always thought Jack").
3. Taking the negative log of these probabilities and averaging them gives the cross-entropy loss:

   loss = -(log p_1 + log p_2 + ... + log p_N) / N

An untrained model with random weights assigns essentially random probabilities; training pushes the probabilities corresponding to the target token IDs higher.
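A sketch of the same pipeline in PyTorch for a single batch, assuming model is a GPT-style module that returns logits of shape (batch, sequence length, vocabulary size):

    import torch

    inputs, targets = next(iter(train_loader))
    with torch.no_grad():
        logits = model(inputs)                   # (batch, seq_len, vocab_size)

    probas = torch.softmax(logits, dim=-1)       # probabilities over the vocabulary
    # Probability the model assigned to each correct (target) token
    target_probas = torch.gather(probas, dim=-1, index=targets.unsqueeze(-1)).squeeze(-1)
    log_probas = torch.log(target_probas)        # log p_1, log p_2, ...
    loss = -log_probas.mean()                    # negative average log-likelihood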
Batch Processing

[Diagram: batch processing]

With a batch of 2 sequences and context size 4:

input 1: "I had always thought"   ->  target 1: "had always thought Jack"
input 2: "Jack Gisburn rather a"  ->  target 2: "Gisburn rather a cheap"

The GPT model produces logits of shape (2, 4, 50257): a 50,257-way score vector for every token position in the batch. Softmax converts these to probabilities, and the probabilities corresponding to the target token IDs are picked out. Flattening the batch and sequence dimensions gives an (8, 50257) matrix and 8 target token IDs, and the cross-entropy loss is the negative mean of the target log-probabilities:

loss = -(log p_1 + log p_2 + ... + log p_8) / 8
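In practice the softmax, target lookup, and negative log steps are folded into a single call to torch.nn.functional.cross_entropy on the flattened logits and targets; a sketch under the same assumptions as above:

    import torch.nn.functional as F

    logits = model(inputs)                # (2, 4, 50257) in the example above
    loss = F.cross_entropy(
        logits.flatten(0, 1),             # (2*4, 50257) = (8, 50257)
        targets.flatten()                 # (8,) target token IDs
    )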


3) X, y split.

With max_length = 256, the split produces:

* 9 training batches of 2 samples, 256 tokens each
* 1 validation batch of 2 samples, 256 tokens each
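A quick check of the resulting batch counts and tensor shapes from the loaders sketched earlier (exact numbers depend on the text length and the assumed batch size):

    print(len(train_loader), len(val_loader))   # e.g. 9 training batches, 1 validation batch
    x, y = next(iter(train_loader))
    print(x.shape, y.shape)                     # torch.Size([2, 256]) torch.Size([2, 256])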
4) Pass through GPT-Model.

5) Calculate the Cross Entropy Loss.

Calculate the loss for all the batches, then take the mean loss per batch.
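A sketch of averaging the loss over every batch in a loader (the helper name calc_loss_loader and the device handling are illustrative):

    import torch
    import torch.nn.functional as F

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    def calc_loss_loader(data_loader, model, device):
        total_loss = 0.0
        for inputs, targets in data_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            logits = model(inputs)
            loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
            total_loss += loss.item()
        # Mean loss per batch
        return total_loss / len(data_loader)

    train_loss = calc_loss_loader(train_loader, model, device)
    val_loss = calc_loss_loader(val_loader, model, device)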
