
Language Models & The Transformer
NLP
  NLU (understanding): syntax, semantics
  NLG (generation): statistical
What exactly is a language model?
LANGUAGE REPRESENTATION

Feature extraction (classical ML) → embeddings (deep learning) → LLMs
WORD EMBEDDINGS

Representing words by a vector of numbers.

How long should these vectors be?
How do we know that the vector representations are correct?
Pre-trained embeddings are available for download.
Every word has a fixed embedding, independent of the context in which it occurs in a sentence.
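A minimal sketch of what such a fixed, non-contextual embedding table looks like in Python (the toy vocabulary, vector length, and random values are assumptions for illustration, not real pre-trained weights):

```python
import numpy as np

# Hypothetical toy vocabulary and embedding dimension (real pre-trained
# sets such as GloVe typically use 50-300 dimensions).
vocab = {"the": 0, "bank": 1, "river": 2, "money": 3}
emb_dim = 4
rng = np.random.default_rng(0)

# One fixed vector per word, regardless of the sentence it appears in.
embedding_table = rng.normal(size=(len(vocab), emb_dim))

def embed(word: str) -> np.ndarray:
    """Look up the single, context-independent vector for a word."""
    return embedding_table[vocab[word]]

# "bank" gets the same vector in "river bank" and "bank account".
print(embed("bank"))
```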
THE TRANSFORMER MODEL

What's the difference between these two devices in terms of how they treat the incoming information and data? Why is the one on the left considered to be intelligent, and the one on the right considered to be dumb?

Intelligence is about being able to figure out the essence of a topic, not just memorizing facts.
AUTO-ENCODERS [lossy compression]

Encoder and Decoder, each working on non-contextual token embeddings.
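A minimal PyTorch sketch of the lossy-compression idea behind an auto-encoder (the layer sizes and the MSE reconstruction objective are illustrative assumptions):

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Compress the input to a smaller bottleneck, then reconstruct it (lossy)."""
    def __init__(self, input_dim: int = 128, bottleneck_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 64), nn.ReLU(),
                                     nn.Linear(64, bottleneck_dim))
        self.decoder = nn.Sequential(nn.Linear(bottleneck_dim, 64), nn.ReLU(),
                                     nn.Linear(64, input_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.encoder(x)        # compressed (lossy) representation
        return self.decoder(z)     # reconstruction of the original input

x = torch.randn(8, 128)            # a batch of 8 example vectors
model = AutoEncoder()
loss = nn.functional.mse_loss(model(x), x)   # train by minimizing reconstruction error
```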
The Attention Mechanism

Taking care of the word sequence is important, but there are also long-range dependencies between words. This is important for tasks like translation.
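A minimal NumPy sketch of the scaled dot-product self-attention used in the transformer, which is what lets every token attend to every other token regardless of distance (the shapes and toy inputs are assumptions for illustration):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query attends to all keys, so long-range dependencies are
    captured in one step no matter how far apart the words are."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the key positions
    return weights @ V                                 # weighted mix of value vectors

seq_len, d_model = 5, 8                                # toy sequence of 5 tokens
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(seq_len, d_model))        # self-attention: Q, K, V from the same tokens
context = scaled_dot_product_attention(Q, K, V)        # one contextualized vector per token
print(context.shape)                                   # (5, 8)
```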
What exactly is a language model?
How are transformer embeddings different from word2vec?
How are transformer embeddings different from ELMo embeddings?
What exactly is an auto-encoder?
In the transformer model, what does an encoder do?
In the transformer model, what does a decoder do?
What does the encoder block contain?
What does the decoder block contain?
How is the self-attention of the decoder block different from that of the encoder block?
Why does the GPT model not have any encoder block?
Why does the BERT model not have any decoder block?
How does tokenization work?
SUB-WORD TOKENIZERS for TRANSFORMERS

Tokenizer                 | By          | Used In                | Merge Criteria     | Advantage
WordPiece                 | Google      | BERT                   | Normalized score   | More context
Byte Pair Encoding (BPE)  | Philip Gage | GPT                    | Sub-word frequency | Faster training
SentencePiece             | Google      | Llama, XLNet, T5, PaLM | Same as BPE        | Language independent
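A small illustration of how these sub-word tokenizers split a word, assuming the Hugging Face transformers library and the pre-trained bert-base-uncased (WordPiece) and gpt2 (BPE) tokenizers are available; the splits in the comments are indicative:

```python
from transformers import AutoTokenizer

# BERT ships with a WordPiece tokenizer; GPT-2 ships with a byte-level BPE tokenizer.
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
gpt2_tok = AutoTokenizer.from_pretrained("gpt2")

word = "tokenization"
print(bert_tok.tokenize(word))   # e.g. ['token', '##ization']  ('##' marks a continuation piece)
print(gpt2_tok.tokenize(word))   # e.g. ['token', 'ization']    (BPE merges learned from frequency)
```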
