
Tutorial 4

Word Embedding

Xi Chen
E-mail: [email protected]
Outline
• Recap
• TF-IDF Example
• Word2vec Example
• BERT Example
Recap
Vector semantics instantiates the distributional hypothesis by learning representations of the meaning of words, called embeddings, directly from their distributions in texts.
• static embeddings
• Word2Vec, GloVe, etc.
• contextualized embeddings
• BERT, etc.
TF-IDF Example
TF-IDF is used by search engines to better understand how relevant a page's content is to a query. For example, when you search for “Coke” on Google, Google may use TF-IDF to figure out whether a page titled “COKE” is about:

a) Coca-Cola.
b) Cocaine.
c) A solid, carbon-rich residue derived from the distillation of crude oil.
d) A county in Texas.
TF-IDF Example
For a term t in document d, the weight Wt,d is given by:

• Wt,d = TFt,d * log(N / DFt)

Where:

• TFt,d is the number of occurrences of t in document d.


• DFt is the number of documents containing the term t.
• N is the total number of documents in the corpus.
TF-IDF Example
How to compute TF-IDF?
Suppose we are looking for documents using the query Q and our
database is composed of the documents D1, D2, and D3.

• Q: The cat.
• D1: The cat is on the mat.
• D2: My dog and cat are the best.
• D3: The locals are playing.
TF-IDF Example
• Let’s compute the TF scores of the words “the” and
“cat” (i.e. the query words) with respect to the
documents D1, D2, and D3.

TF(“the”, D1) = 2
TF(“the”, D2) = 1
TF(“the”, D3) = 1
TF(“cat”, D1) = 1
TF(“cat”, D2) = 1
TF(“cat”, D3) = 0
TF-IDF Example
• Let’s compute the IDF scores of the words “the” and
“cat”
IDF(“the”) = log(3/3) = log(1) = 0
IDF(“cat”) = log(3/2) = 0.18

• Multiplying TF and IDF gives the TF-IDF score of a word in a document.


The higher the score, the more relevant that word is in that particular
document.
TF-IDF(“the”, D1) = 2 * 0 = 0
TF-IDF(“the”, D2) = 1 * 0 = 0
TF-IDF(“the”, D3) = 1 * 0 = 0
TF-IDF(“cat”, D1) = 1 * 0.18 = 0.18
TF-IDF(“cat”, D2) = 1 * 0.18 = 0.18
TF-IDF(“cat”, D3) = 0 * 0.18 = 0
TF-IDF Example
• Order the documents according to the TF-IDF scores
of their words.

Average TF-IDF of D1 = (0 + 0.18) / 2 = 0.09
Average TF-IDF of D2 = (0 + 0.18) / 2 = 0.09
Average TF-IDF of D3 = (0 + 0) / 2 = 0

In conclusion, when performing the query “The cat” over the collection of documents D1, D2, and D3, the ranked results would be:

D1 = D2 > D3
TF-IDF Example
• Implement
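To make the walkthrough above concrete, here is a minimal Python sketch that reproduces the hand calculation (raw term counts and a base-10 logarithm, no smoothing). Note that off-the-shelf implementations such as scikit-learn's TfidfVectorizer use a slightly different IDF formula, so their numbers would differ.

import math

docs = {
    "D1": "The cat is on the mat.",
    "D2": "My dog and cat are the best.",
    "D3": "The locals are playing.",
}
query = ["the", "cat"]

def tokenize(text):
    # Lowercase and strip the period, matching the hand calculation.
    return text.lower().replace(".", "").split()

tokens = {name: tokenize(text) for name, text in docs.items()}
N = len(docs)

def tf(term, doc_tokens):
    # Raw count of the term in the document.
    return doc_tokens.count(term)

def idf(term):
    # log10(N / DF), as in the slides.
    df = sum(1 for toks in tokens.values() if term in toks)
    return math.log10(N / df) if df else 0.0

for name, toks in tokens.items():
    scores = [tf(t, toks) * idf(t) for t in query]
    avg = sum(scores) / len(scores)
    print(name, [round(s, 2) for s in scores], "average =", round(avg, 2))
# D1 [0.0, 0.18] average = 0.09
# D2 [0.0, 0.18] average = 0.09
# D3 [0.0, 0.0] average = 0.0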
Word2vec Example
The first really influential dense word embeddings

• Main idea:

Use a classifier to predict which words appear in the context of (i.e. near) a target word (or vice versa). This classifier induces a dense vector representation of words (an embedding).

Words that appear in similar contexts (that have high distributional similarity) will
have very similar vector representations.
Word2vec Example
• Two ways to think about Word2Vec:

• a simplification of neural language models


• a binary logistic regression classifier

• Variants of Word2Vec

• CBOW (Continuous Bag of Words)


• Skip-Gram
Word2vec Example
• CBOW/Skip-Gram Architectures
Word2vec Example
• CBOW: predict target from context

Training sentence:

Given the surrounding context words (tablespoon, of, jam, a), predict the target word (apricot).

Input: each context word is a one-hot vector
Projection layer: map each one-hot vector down to a dense D-dimensional vector
Output: predict the target word with softmax
Word2vec Example
• Skip-gram: predict context from target

Training sentence:

Given the target word (apricot), predict the surrounding context words (tablespoon, of, jam, a).

Input: the target word is a one-hot vector
Projection layer: map the one-hot vector down to a dense D-dimensional vector
Output: predict the context words with softmax
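As a hedged sketch of how these two variants are trained in practice (the tutorial does not prescribe a library, so gensim is an assumption here), both are exposed through a single sg flag; the toy corpus below is purely illustrative.

from gensim.models import Word2Vec

# Toy corpus; in practice you would use a much larger tokenized corpus.
sentences = [
    ["a", "tablespoon", "of", "apricot", "jam"],
    ["a", "spoonful", "of", "strawberry", "jam"],
    ["the", "cat", "sat", "on", "the", "mat"],
]

cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)       # CBOW
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)   # Skip-gram

print(skipgram.wv["apricot"].shape)             # (50,) dense vector
print(skipgram.wv.most_similar("jam", topn=3))  # nearest neighbours in the toy space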
Word2vec Example
• Visualize the word2vec
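One common way to produce this kind of plot is to project the learned vectors to 2-D; the sketch below uses PCA on the toy skip-gram model from the previous sketch (the original figure may equally have used t-SNE).

import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Hypothetical word list; skipgram is the toy model trained in the sketch above.
words = ["apricot", "jam", "tablespoon", "cat", "mat"]
vectors = [skipgram.wv[w] for w in words]

coords = PCA(n_components=2).fit_transform(vectors)
plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), w in zip(coords, words):
    plt.annotate(w, (x, y))
plt.show()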
BERT Example
What is BERT?
BERT (Bidirectional Encoder Representations from Transformers) was released in late 2018. BERT is a method of pretraining language representations that was used to create models that NLP practitioners can then download and use for free. You can either use these models to extract high-quality language features from your text data, or you can fine-tune these models on a specific task (classification, entity recognition, question answering, etc.) with your own data to produce state-of-the-art predictions.

Why BERT embeddings?


• useful for keyword/search expansion, semantic search and information retrieval
• these vectors are used as high-quality feature inputs to downstream models.

https://www.youtube.com/watch?v=xI0HHN5XKDo&ab_channel=CodeEmporium
BERT Example
For example, given two sentences:

"The man was accused of robbing a bank."

"The man went fishing by the bank of the river."

Word2Vec would produce the same word embedding for the word "bank" in both
sentences, while under BERT the word embedding for "bank" would be different for each
sentence.

Aside from capturing obvious differences like polysemy, the context-informed word
embeddings capture other forms of information that result in more accurate feature
representations, which in turn results in better model performance.
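A minimal sketch of this contrast, using the Hugging Face transformers library (an assumption; the tutorial's own code is in the Colab linked below): take the final-layer vector for "bank" in each sentence and compare them.

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def bank_vector(sentence):
    # Final-layer hidden state of the "bank" token.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, 768)
    idx = inputs["input_ids"][0].tolist().index(tokenizer.convert_tokens_to_ids("bank"))
    return hidden[idx]

v1 = bank_vector("The man was accused of robbing a bank.")
v2 = bank_vector("The man went fishing by the bank of the river.")
print(torch.cosine_similarity(v1, v2, dim=0).item())   # noticeably below 1.0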
BERT Example
Contextually dependent vectors:
"After stealing money from the bank vault, the bank robber was seen fishing on the
Mississippi river bank."

BERT embeddings:
BERT Example
Let's calculate the cosine similarity between the vectors to make a more precise
comparison.

Implement:

https://colab.research.google.com/drive/1TRQyU5MtWn9DTuFCnKjbI1cjxh2mh_KW?usp=sharing
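The linked Colab contains the full walkthrough; as a shorter sketch, the code below reuses the tokenizer and model from the previous sketch to compare the three occurrences of "bank" in the sentence above.

import torch

sentence = ("After stealing money from the bank vault, the bank robber "
            "was seen fishing on the Mississippi river bank.")
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state[0]

# Positions of every "bank" token in the input.
bank_id = tokenizer.convert_tokens_to_ids("bank")
positions = [i for i, t in enumerate(inputs["input_ids"][0].tolist()) if t == bank_id]
vault, robber, river = (hidden[i] for i in positions)

cos = torch.nn.functional.cosine_similarity
print("bank vault vs bank robber:", cos(vault, robber, dim=0).item())   # high
print("bank vault vs river bank :", cos(vault, river, dim=0).item())    # lower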
BERT Example
What’s the best contextualized embedding for “help” in that context?

The BERT authors tested word-embedding strategies by feeding different vector combinations as input features to a BiLSTM used on a named entity recognition task and observing the resulting F1 scores.
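One of the strategies compared in the BERT paper is summing a token's vectors from the last four hidden layers. The sketch below shows that combination for "help" in a hypothetical sentence, reloading the model with hidden states enabled and reusing the tokenizer from the earlier sketch; it is an illustration, not the authors' exact evaluation setup.

import torch
from transformers import BertModel

# Reload BERT so that every layer's hidden states are returned.
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

inputs = tokenizer("I need help with my homework.", return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**inputs).hidden_states   # tuple of 13 tensors: embeddings + 12 layers

# Locate "help" and sum its vectors from the last four layers.
token_index = inputs["input_ids"][0].tolist().index(tokenizer.convert_tokens_to_ids("help"))
sum_last_four = torch.stack(hidden_states[-4:]).sum(dim=0)[0, token_index]
print(sum_last_four.shape)   # torch.Size([768])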
BERT Example: Visualize the BERT Embedding
Vocabulary embeddings
BERT Example: Visualize the BERT Embedding
Context-dependent embeddings: values
Now, we use BERT to embed 15,000 instances of the word “values” in sentences drawn from Wikipedia and Project Gutenberg, then run t-SNE on the embeddings taken from the final layer.
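A small-scale sketch of that experiment (a handful of hypothetical sentences instead of 15,000, reusing the tokenizer and model from the earlier sketches): take the final-layer vector for "values" in each sentence and project the vectors with t-SNE.

import numpy as np
import torch
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Hypothetical sentences, each using "values" in a different sense.
sentences = [
    "The company values honesty above all.",
    "These cultural values are passed down through generations.",
    "The function returns the values in sorted order.",
    "Plug the values into the equation and solve.",
]
labels = ["verb", "noun: principles", "noun: data", "noun: math"]

values_id = tokenizer.convert_tokens_to_ids("values")
vectors = []
for s in sentences:
    inputs = tokenizer(s, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # final layer, (seq_len, 768)
    vectors.append(hidden[inputs["input_ids"][0].tolist().index(values_id)].numpy())

# Perplexity must be smaller than the number of points.
coords = TSNE(n_components=2, perplexity=2).fit_transform(np.stack(vectors))
plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), label in zip(coords, labels):
    plt.annotate(label, (x, y))
plt.show()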
BERT Example: Visualize the BERT Embedding
Context-dependent embeddings: values
Zooming in, we find different senses of the word in different areas of the visualization.
The cluster in the lower left corresponds to verbal uses:
BERT Example: Visualize the BERT Embedding
Context-dependent embeddings: values
The remaining clusters are mostly nominal uses. On the left are uses of the sense related to principles or standards:
BERT Example: Visualize the BERT Embedding
Context-dependent embeddings: values
To the right we find scientific and mathematical uses; the following shows the lower
right corner:
Thanks
