
MASTERING LLM PRESENTS: COFFEE BREAK CONCEPTS

How Are LLMs Trained? A Simple Guide to Understanding LLM Training

@MASTERING-LLM-LARGE-LANGUAGE-MODEL
Step 1 : Pre-training
Step 1 trains a model on a massive dataset scraped from the internet to predict the next word. The result is usually called a language model (or base model).
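To make this concrete, here is a minimal sketch of the next-word (next-token) prediction objective, assuming PyTorch; `model` stands in for any causal language model that maps token IDs to next-token logits, and the shapes are illustrative:

```python
import torch.nn.functional as F

def pretraining_loss(model, tokens):
    # tokens: (batch, seq_len) integer token IDs from web text
    inputs = tokens[:, :-1]    # model sees tokens 0..n-2
    targets = tokens[:, 1:]    # and must predict tokens 1..n-1
    logits = model(inputs)     # (batch, seq_len-1, vocab_size)
    # Standard cross-entropy over the vocabulary at every position:
    # "given everything so far, which word comes next?"
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
```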

Cool, so I can use this model?
Not yet.
After step 1, the model knows how to predict the next word, but it doesn't understand instructions. It simply completes text with the most likely next words.

Step 2 : Supervised fine-tuning (SFT) or instruction tuning
We now need to teach the model to understand specific instructions. Step 2 fine-tunes it on instruction-response pairs so it learns to follow them.
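To illustrate, here is a minimal sketch of how an SFT training example is often built, assuming PyTorch/Hugging Face conventions where label positions set to -100 are ignored by the cross-entropy loss; the helper name and token IDs are illustrative:

```python
import torch

IGNORE_INDEX = -100  # positions the loss function skips

def build_sft_example(prompt_ids, response_ids):
    # The model is trained on prompt + response concatenated...
    input_ids = torch.tensor(prompt_ids + response_ids)
    # ...but the loss is computed only on the response tokens,
    # so the model learns to *answer* instructions, not repeat them.
    labels = torch.tensor(
        [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    )
    return input_ids, labels
```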

I've got a model now? Wait, not yet. Let's look at the scenario below.
Instruction-tuned (SFT) models are not yet reliably helpful, honest, and harmless (HHH); we need to teach them this so that they learn to respond in an HHH way.


Step 3 : RLHF
We need to teach the model human preferences so that it focuses on being helpful, honest, and harmless (HHH).
In this step, the model is asked to generate multiple outputs, and humans rank those outputs from best to worst.
The simple goal of RLHF is to replace direct human feedback with a reward model that has learned human preferences.
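Concretely, the reward model is typically trained on those human rankings with a pairwise loss; here is a sketch assuming PyTorch, using a Bradley-Terry-style objective similar to the one in the InstructGPT paper (the function names are illustrative):

```python
import torch.nn.functional as F

def reward_model_loss(reward_model, chosen_ids, rejected_ids):
    # reward_model maps a token sequence to a single scalar score
    r_chosen = reward_model(chosen_ids)      # score of the better answer
    r_rejected = reward_model(rejected_ids)  # score of the worse answer
    # Push the chosen answer to score higher than the rejected one.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```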

Final Model
In the final step:
- The instruction model generates an answer.
- Once the answer is generated, the reward model (the replacement for humans) produces a score.
- That score is used to improve the model's outputs until the desired quality or the maximum number of iterations is reached.
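In heavily simplified pseudocode, that loop looks something like the sketch below. Real systems use PPO with a KL penalty to keep the policy close to the SFT model; every name here is an illustrative placeholder, not a real API:

```python
def rlhf_step(policy_model, reward_model, optimizer, prompts):
    for prompt in prompts:
        # 1. The instruction (policy) model generates an answer.
        answer = policy_model.generate(prompt)
        # 2. The reward model, standing in for humans, scores it.
        score = reward_model(prompt, answer)
        # 3. A policy-gradient update nudges the model toward
        #    higher-scoring answers (PPO adds clipping and a KL
        #    penalty on top of this basic idea).
        loss = -score * policy_model.log_prob(prompt, answer)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```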

Summary
- A language model only understands how to predict the next word.
- SFT (instruction tuning) teaches the model how to follow instructions across many different tasks.
- RLHF further improves answers according to human preferences: being helpful, honest, and harmless (HHH).
Check this paper to learn more about LLM alignment.
- Newer alignment methods include DPO, which we will cover soon; a preview sketch follows below.
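As that preview, here is a sketch of the DPO loss from the DPO paper, assuming PyTorch; the inputs are log-probabilities of the preferred (chosen) and dispreferred (rejected) answers under the policy being trained and a frozen reference (SFT) model:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # DPO skips the explicit reward model and RL loop: it optimizes
    # preferences directly from (chosen, rejected) answer pairs.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```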

Comment below on which topic you want to see next in this "Coffee Break Concepts" series, and we will include those topics in the upcoming weeks.
www.masteringllm.com

LLM Interview Course
Want to prepare yourself for an LLM interview?
- 100+ questions spanning 14 categories
- 100+ curated assessments for each category
- Well-researched, real-world interview questions based on FAANG & Fortune 500 companies
- Focus on visual learning
- Real case studies & certification

Coupon code: LLM50 (valid till 30th May 2024)
