Language Models & The Transformer
NLP
NLU NLG
understanding generation
Syntax semantics Statistical
What exactly is a language model?
LANGUAGE
representation
Feature extraction    Classical ML    Deep L, embeddings    LLMs
WORD EMBEDDINGS
Representing words by a
vector of numbers
How long should
these vectors be?
How do we know that the vector representations are correct?
Pre-trained embeddings available for download
Every word has a fixed embedding independent of the
context in which it occurs in a sentence.
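To make the last point concrete, here is a minimal sketch of a static embedding lookup. The vocabulary, the 3-dimensional vectors, and the helper names `embed` and `cosine` are made up for illustration; real pre-trained embeddings are much longer and are learned from text. The word "bank" gets the same vector in both sentences, whatever its neighbours are.

```python
import math

# Toy lookup table; the 3-dimensional vectors are made-up illustrative values.
EMBEDDINGS = {
    "the":   [0.0, 0.0, 1.0],
    "river": [0.9, 0.1, 0.0],
    "money": [0.1, 0.9, 0.0],
    "bank":  [0.5, 0.5, 0.1],
}

def embed(sentence):
    """Return one fixed vector per word; neighbouring words are ignored."""
    return [EMBEDDINGS[word] for word in sentence.split()]

def cosine(a, b):
    """Cosine similarity, a common way to compare two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

bank_in_river_sentence = embed("the river bank")[2]
bank_in_money_sentence = embed("the money bank")[2]

# Both lookups return the same vector, whatever the context.
print(bank_in_river_sentence == bank_in_money_sentence)
print(round(cosine(EMBEDDINGS["river"], EMBEDDINGS["bank"]), 3))
```

Cosine similarity between related words is one informal sanity check on a set of vectors, though it does not prove they are "correct".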
THE TRANSFORMER MODEL
What’s the difference between these two devices
in terms of how they treat the incoming information and data?
Why is the one on the left considered to be intelligent,
and the one on the right considered to be dumb?
Intelligence is about being able to figure out the essence of a topic,
and not just memorizing facts.
AUTO-ENCODERS
[lossy compression]
Decoder
Encoder
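The encoder/decoder pair can be sketched as below. This is a deliberately tiny linear stand-in, not a trained neural network: the direction `U` is hand-picked (an assumption for illustration), whereas a real auto-encoder learns its weights. The encoder squeezes a 2-D point into one number, the decoder expands it back, and whatever the bottleneck could not carry shows up as reconstruction error, which is the "lossy" part.

```python
import math

# Hand-picked unit direction; a real auto-encoder would learn its weights.
U = (1 / math.sqrt(2), 1 / math.sqrt(2))

def encode(point):
    """Compress a 2-D point into a single number (the bottleneck code)."""
    return point[0] * U[0] + point[1] * U[1]

def decode(code):
    """Expand the 1-D code back into a 2-D point."""
    return (code * U[0], code * U[1])

def reconstruction_error(point):
    """Distance between the original point and its reconstruction."""
    x_hat = decode(encode(point))
    return math.hypot(point[0] - x_hat[0], point[1] - x_hat[1])

# Made-up points that lie close to the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2)]
for p in points:
    code = encode(p)
    print(p, "->", round(code, 3), "->", decode(code),
          "error:", round(reconstruction_error(p), 3))
```

Points that lie along the chosen direction are reconstructed almost exactly; points far from it lose information.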
Non-contextual
Token Embeddings
The Attention Mechanism
Taking care of the word sequence is important, but there are also long-range dependencies between words.
Capturing these dependencies is important for tasks like translation.
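A minimal sketch of scaled dot-product self-attention shows how every token can draw on every other token, however far apart they sit in the sequence. One simplifying assumption: queries, keys, and values are all the raw token vectors (Q = K = V = X), whereas a real transformer first applies learned projection matrices. The function names and the toy vectors are illustrative.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Scaled dot-product self-attention with Q = K = V = X.

    Real transformers first multiply X by learned query, key and value
    matrices; those are left out here to keep the sketch short.
    """
    d = len(X[0])
    weights, outputs = [], []
    for q in X:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        w = softmax(scores)
        weights.append(w)
        # Each output is a weighted average of all token vectors.
        outputs.append([sum(w[j] * X[j][i] for j in range(len(X)))
                        for i in range(d)])
    return weights, outputs

# Three made-up 2-D token vectors; the first and last are similar.
X = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
weights, outputs = self_attention(X)
for row in weights:
    print([round(w, 3) for w in row])
```

The first token puts more weight on the third token than on the second, even though the second is its immediate neighbour: attention follows similarity, not distance.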
What exactly is a
language model?
How are transformer embeddings
different from word2vec?
How are transformer embeddings
different from ELMo embeddings?
What exactly is an
auto-encoder?
In the transformer model,
what does an encoder do?
In the transformer model,
what does a decoder do?
What does the
encoder block contain?
What does the
decoder block contain?
How is the self-attention of
the decoder block different from
that of the encoder block?
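One part of the answer, in the original transformer design, is masking: encoder self-attention lets each token look at the whole sequence, while decoder self-attention hides later positions so that a token can only use what came before it. The sketch below shows only that mask. It assumes Q = K = X with no learned projections, and it leaves out the decoder's separate cross-attention over the encoder output. The function name `attention_weights` is illustrative.

```python
import math

def attention_weights(X, causal=False):
    """Self-attention weights with Q = K = X (learned projections omitted)."""
    d = len(X[0])
    rows = []
    for i, q in enumerate(X):
        scores = []
        for j, k in enumerate(X):
            if causal and j > i:
                scores.append(float("-inf"))  # future position: masked out
            else:
                scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d))
        # Softmax over the scores; masked positions get weight 0.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows

# Made-up 2-D token vectors.
X = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
encoder_style = attention_weights(X, causal=False)  # every token sees all tokens
decoder_style = attention_weights(X, causal=True)   # token i sees tokens 0..i only
for row in decoder_style:
    print([round(w, 3) for w in row])
```

In the masked version the weight matrix is lower triangular, which is what lets a decoder generate text one token at a time.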
Why doesn't the GPT model
have an encoder block?
Why doesn't the BERT model
have a decoder block?
How does
tokenization work?
SUB-WORD TOKENIZERS for TRANSFORMERS
Tokenizer                 By           Used In                 Merge Criteria      Advantage
WordPiece                 Google       BERT                    Normalized Score    More context
Byte Pair Encoding (BPE)  Philip Gage  GPT                     Sub-word frequency  Faster training
SentencePiece             Google       Llama, XLNet, T5, PaLM  Same as BPE         Language independent
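The "sub-word frequency" merge criterion of BPE can be sketched as follows: count every adjacent pair of symbols in the corpus, merge the most frequent pair into one new symbol, and repeat. This is a simplified character-level illustration with a made-up corpus and hypothetical helper names; production tokenizers (for example the byte-level variants used by GPT models) add pre-tokenization, byte handling, and a fixed merge budget.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

# Toy corpus: each word is a tuple of characters with a made-up count.
corpus = {tuple("low"): 5, tuple("lower"): 2,
          tuple("newest"): 6, tuple("widest"): 3}
merges = []
for _ in range(3):  # three merge steps, chosen arbitrarily for the demo
    pair = most_frequent_pair(corpus)
    merges.append(pair)
    corpus = merge_pair(corpus, pair)

print(merges)
print(corpus)
```

After a few merges, frequent fragments such as common word endings become single tokens, while rare words stay split into smaller pieces.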