Transformers & LLM Basics
LARGE LANGUAGE MODELS (LLMs)
▪ A large language model is a type of machine learning model that is trained on a large corpus of text
data to generate outputs for various natural language processing (NLP) tasks, such as text generation,
question answering, and machine translation.
▪ Large language models are typically based on deep learning neural networks such as the Transformer
architecture and are trained on massive amounts of text data, often involving billions of words. Larger
models, such as Google's BERT, are trained on large datasets drawn from many different sources, which
allows them to generate output for a wide range of tasks.
[Diagram: Text Input → Language Model → Text Output, or a numeric representation of the text useful for other systems]
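A minimal usage sketch (not from the original deck) of the text-in / text-out flow pictured above, assuming the Hugging Face transformers library and the public gpt2 checkpoint as illustrative choices:

```python
# Hedged sketch: feed text in, get generated text out.
# "gpt2" is an illustrative public checkpoint, not one prescribed by these slides.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Large language models are", max_new_tokens=30)
print(result[0]["generated_text"])
```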
USE CASES OF LLMs
Large language models can be applied to a variety of use cases and industries,
including healthcare, retail, tech, and more. The following use cases apply across
industries (two of them are sketched in code after this list):
Text generation
Sentiment analysis
Chatbots
Textual Entailment Recognition
Question Answering
Code generation
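As a hedged illustration (not part of the original slides), sentiment analysis and question answering can be exercised with off-the-shelf pipelines; the default checkpoints the library downloads are an assumption here, not something the deck specifies:

```python
# Illustrative only: two of the use cases above via ready-made pipelines.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")
print(sentiment("The new release is fantastic!"))  # e.g. [{'label': 'POSITIVE', 'score': ...}]

qa = pipeline("question-answering")
print(qa(question="Who published 'Attention is All You Need'?",
         context="The Transformer architecture was introduced by Google researchers in 2017."))
```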
Transformer Components
TRANSFORMER MODELS
Largely replaced RNN models following the publication of "Attention Is All You Need" by Google in 2017
CATEGORIES OF TRANSFORMER MODELS
Encoders: for understanding language
Suited for tasks requiring an understanding of the full sentence, such as sentence
classification, named entity recognition, and extractive question answering.
Models: BERT, ALBERT, DistilBERT

Decoders: for generative models
Suited for tasks involving text generation.
Models: GPT-2, GPT-3

Encoder-Decoders: sequence to sequence
Suited for tasks around generating new sentences depending on a given input, such as
summarization, translation, or generative question answering.
Models: T5, Multilingual mT5

[Diagram: original Transformer architecture, with Inputs feeding the Encoder, Outputs (shifted right) feeding the Decoder, and the Decoder producing Output probabilities]
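A brief sketch, assuming the Hugging Face transformers auto-classes and common public checkpoints, of loading one representative model from each category:

```python
# Illustrative checkpoints only; the slides do not prescribe these exact names.
from transformers import AutoModel, AutoModelForCausalLM, AutoModelForSeq2SeqLM

encoder = AutoModel.from_pretrained("bert-base-uncased")      # encoder-only (BERT)
decoder = AutoModelForCausalLM.from_pretrained("gpt2")        # decoder-only (GPT-2)
seq2seq = AutoModelForSeq2SeqLM.from_pretrained("t5-small")   # encoder-decoder (T5)
```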
BERT
▪ BERT: Pre-training of Deep Bidirectional Transformers
for Language Understanding (from Google in 2018)
▪ Encoder-only architecture that performs two main pre-training tasks:
▪ Predicts several blanks in input given entire
context around the blank
▪ When given sentences A and B, it determines if
B actually follows A
▪ Used for question answering, classification, etc.
▪ Takes a long time to train since each iteration only gets
signal from a handful of tokens in each sequence
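A minimal sketch of the first objective (predicting masked blanks), assuming the transformers fill-mask pipeline and the public bert-base-uncased checkpoint:

```python
# Illustrative only: BERT fills in a masked token using the surrounding context.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```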
GPT
Generative Pre-Training
▪ Originally published by OpenAI in 2018, followed by GPT-2 in 2019, and GPT-3 in 2020.
▪ Architecture is also a single stack like BERT, but is a traditional left-to-right language model
▪ Can be used for generating larger blocks of text (e.g. chatbots), but can also be used for question answering
▪ Has been the model family we have focused on most with Megatron
▪ Faster to train than BERT since each iteration gets signal from every token in the sequence
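A small sketch of the left-to-right objective, assuming PyTorch, the transformers library, and the public gpt2 checkpoint: the model scores the next token given everything to its left.

```python
# Illustrative only: pick the most likely next token after a prefix.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The Eiffel Tower is located in", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits            # (batch, sequence_length, vocab_size)
next_token_id = logits[0, -1].argmax().item()  # highest-probability next token
print(tokenizer.decode(next_token_id))         # e.g. " Paris"
```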
WHEN DO LARGE LANGUAGE MODELS MAKE SENSE?
                           | Traditional NLP approach               | Large language models
Requires labelled data     | Yes                                    | No
Parameters                 | 100s of millions                       | Billions to trillions
Desired model capability   | Specific (one model per task)          | General (model can do many tasks)
Training frequency         | Retrain frequently with task-specific  | Never retrain, or retrain minimally
                           | training data                          |

Why large language models make sense:
▪ Zero-shot (or few-shot) learning
▪ When it is painful and impractical to get a large corpus of labelled data
▪ Models can learn new tasks
▪ If you want models with "common sense" that can generalize well to new tasks
▪ A single model can serve all use cases
▪ At scale you avoid the costs and complexity of many models, saving cost in data curation, training, and managing deployment
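A hedged sketch of zero-shot behaviour, assuming the transformers zero-shot-classification pipeline (its default checkpoint is chosen by the library, not by these slides): the model classifies text against labels it was never explicitly trained on.

```python
# Illustrative only: no task-specific labelled data or fine-tuning required.
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
result = classifier(
    "The patient reported chest pain and shortness of breath.",
    candidate_labels=["healthcare", "retail", "technology"],
)
print(result["labels"][0], result["scores"][0])
```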
DISTRIBUTED TRAINING
Data, Pipeline and Tensor Parallelism
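A conceptual sketch in plain PyTorch (not Megatron code) of the idea behind tensor parallelism: a linear layer's weight matrix is split column-wise across workers, each computes its shard of the output, and the shards are concatenated. Data parallelism instead replicates the whole model and splits the batch, while pipeline parallelism splits the model by layer across devices.

```python
# Conceptual sketch only: two "workers" simulated on one device.
import torch

batch, d_in, d_out = 4, 8, 6
x = torch.randn(batch, d_in)
W = torch.randn(d_in, d_out)

W_shard0, W_shard1 = W.chunk(2, dim=1)      # each worker holds half the columns
y_shard0 = x @ W_shard0                     # computed on "GPU 0"
y_shard1 = x @ W_shard1                     # computed on "GPU 1"
y_parallel = torch.cat([y_shard0, y_shard1], dim=1)

assert torch.allclose(y_parallel, x @ W)    # matches the single-device result
```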
CHALLENGES
▪ Compute-, cost-, and time-intensive workload: significant capital investment and large-scale compute infrastructure are necessary to develop and maintain LLMs.
▪ Scale of data required: as mentioned, training a large model requires a significant amount of data, and many companies struggle to get access to large enough datasets.
▪ Technical expertise: due to their scale, training and deploying large language models is very difficult.
THANK YOU!