Word Embeddings Topics

NLP Session 2-Vector semantics and Embedding by Dr. Chetana Gavankar [Autosaved].pdf

List all topics covered in this pdf

The topics covered in the document are as follows:

1. Vector Semantics and Word Embedding:
   - Lexical semantics
   - Vector semantics
   - Word vectors
   - TF-IDF
   - Word2Vec (Skip-gram, CBOW)
   - GloVe
   - Visualizing embeddings
2. Lexical Semantics:
   - Morphology
   - Language modeling
   - Distributional hypothesis
   - Synonymy, similarity, and relatedness
   - Antonymy and connotation
3. Word Similarity and Relatedness:
   - Semantic fields
   - Applications in question answering, plagiarism detection, etc.
4. Computational Models of Word Meaning:
   - Defining words by usage
   - Word embeddings as vectors
   - Importance of context and distribution
5. Types of Word Embeddings:
   - Frequency-based (count vectors, TF-IDF, co-occurrence)
   - Prediction-based (Word2Vec, GloVe)
6. Techniques for Embedding:
   - Word2Vec (Skip-gram, CBOW)
   - Training embeddings (positive and negative examples)
   - Stochastic Gradient Descent (SGD)
7. Embedding Properties:
   - Sparse vs. dense vectors
   - Contextual embeddings (e.g., ELMo, BERT)
   - Historical and cultural bias in embeddings
8. Applications of Word Embeddings:
   - Information retrieval
   - Analogical reasoning
   - Sentiment analysis
9. Advanced Topics:
   - Diachronic word embeddings
   - Bias quantification in embeddings
   - Optimization of computational efficiency in embeddings
10. Comparative Methods:
    - Skip-gram vs. CBOW
    - Use cases and advantages of each method
11. Visualization and Interpretation:
    - Visualizing word embeddings
    - Analogical relations using embeddings
12. GloVe Specifics:
    - Word-word co-occurrence matrix
    - Ratios of conditional probabilities
    - Intuition and advantages of GloVe over Word2Vec
13. References and Resources:
    - Links to related literature, tutorials, and videos for further understanding.

If you need detailed insights or explanations for any specific topic, feel free to ask!

Explain each topic in detail with a numerical example wherever required. Also show a flow
diagram or chart wherever necessary

This is a comprehensive task, and I will break down each topic from the document in detail, with
numerical examples, flow diagrams, and charts where necessary. Here's the plan:

1. Topic Explanation: Each topic will be explained conceptually.


2. Numerical Examples: Relevant topics will include numerical examples for better understanding.
3. Flow Diagrams or Charts: Topics involving processes or workflows will be complemented with
flow diagrams or charts.

Let’s start with Vector Semantics and Word Embedding:

1. Vector Semantics and Word Embedding

Concept:

Vector Semantics: Words are represented as points in a multi-dimensional space, where the
distances between points indicate semantic similarity. For example, words like "king" and "queen"
are closer in this space than "king" and "cat."

Word Embedding: A technique to map words into dense vectors of real numbers which capture
semantic meanings. Word embeddings are used in NLP to analyze and predict text-based tasks.

Numerical Example:

Suppose we have the following sentences:

1. "I love apples."


2. "Apples are delicious."

The co-occurrence matrix for the words could look like this:

| Word      | I | love | apples | are | delicious |
|-----------|---|------|--------|-----|-----------|
| I         | 0 | 1    | 1      | 0   | 0         |
| love      | 1 | 0    | 1      | 0   | 0         |
| apples    | 1 | 1    | 0      | 1   | 1         |
| are       | 0 | 0    | 1      | 0   | 1         |
| delicious | 0 | 0    | 1      | 1   | 0         |

Each row or column represents a vector for that word. For example:

Vector for "apples": [1, 1, 0, 1, 1]

2. Lexical Semantics

Concept:

The study of word meanings and their relationships, such as synonymy (similar meanings),
antonymy (opposite meanings), and polysemy (multiple meanings).
Distributional Hypothesis: Words that appear in similar contexts have similar meanings.

Numerical Example:

Consider the sentences:

1. "The cat sat on the mat."


2. "The dog lay on the carpet."

The co-occurrence matrix for "cat" and "dog" might show similar contexts like "sat" and "lay":

| Word | cat | dog | sat | lay | mat | carpet |
|------|-----|-----|-----|-----|-----|--------|
| cat  | 0   | 0   | 1   | 0   | 1   | 0      |
| dog  | 0   | 0   | 0   | 1   | 0   | 1      |

Here, "cat" and "dog" are related because they co-occur with similar words ("mat" and "carpet").

3. TF-IDF

Concept:

Term Frequency (TF): Frequency of a word in a document.

Inverse Document Frequency (IDF): A measure of how unique a word is across all documents.

Formula:

TF-IDF(t, d) = TF(t, d) × IDF(t)

Where:

TF(t, d) = count(t, d) / (total words in d),  IDF(t) = log(N / DF(t))

N: Total number of documents
DF(t): Number of documents containing t

Numerical Example:

Suppose we have two documents:

1. Doc1: "Apples are sweet."


2. Doc2: "Apples are tasty."

TF of "apples" in Doc1: TF= 13 = 0.33 ​

IDF of "apples": IDF = log 22 = 0 (since it appears in both documents)


TF-IDF of "apples": TF − IDF = 0.33 × 0 = 0

4. Word2Vec

Concept:

Two key architectures:


Skip-gram: Predict surrounding words (context) given a target word.
CBOW: Predict the target word given surrounding words.

Flow Diagram:

Input: One-hot encoding of "apple"
    ↓
Hidden Layer: Word embedding vector
    ↓
Output Layer: Softmax probabilities for context words

Numerical Example:

For the sentence: "I love apples," the Skip-gram model might create training pairs:

("I", "love"), ("love", "apples")

The model learns embeddings to maximize the similarity between pairs.
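
As a sketch, training pairs like these can be generated with a simple sliding window (the window size of 1 and keeping both directions are assumptions, chosen to roughly match the pairs listed above):

```python
def skipgram_pairs(tokens, window=1):
    # for each target word, emit (target, context) pairs within the window
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                pairs.append((target, tokens[j]))
    return pairs

print(skipgram_pairs(["I", "love", "apples"]))
# [('I', 'love'), ('love', 'I'), ('love', 'apples'), ('apples', 'love')]
```

The exact set of pairs depends on the chosen window size and on whether both directions of each pair are kept.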

5. GloVe

Concept:

Combines global co-occurrence statistics and local context for word embedding.

Example:

Using the co-occurrence matrix:

| Word   | apple | orange | fruit |
|--------|-------|--------|-------|
| apple  | 10    | 3      | 20    |
| orange | 3     | 8      | 15    |
| fruit  | 20    | 15     | 50    |

The ratio of co-occurrence probabilities is used to compute embeddings:

Ratio = P(i | j) / P(k | j)

6. Word Similarity

Concept:

Dot Product: Measures similarity between word vectors.


Cosine Similarity: Normalized dot product to account for vector length.

Formula:

Cosine Similarity = (A · B) / (||A|| · ||B||)

Numerical Example:

Given two vectors:

A = [1, 2, 3]
B = [4, 5, 6]

Dot Product = 1·4 + 2·5 + 3·6 = 32

Cosine Similarity = 32 / (√(1² + 2² + 3²) · √(4² + 5² + 6²)) = 32 / (√14 · √77) ≈ 0.9746
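
The same computation as a small Python sketch:

```python
import math

def cosine_similarity(a, b):
    # dot product divided by the product of the vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 2, 3], [4, 5, 6]))  # ≈ 0.9746
```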


7. Applications

Question Answering: Using embeddings to retrieve relevant answers.


Plagiarism Detection: Semantic similarity measures to compare texts.

This detailed explanation covers the first few topics; I will continue elaborating on the
remaining topics in subsequent parts. Let me know if you'd like me to proceed with the next topics.

please proceed

Continuing with the explanation of topics:

8. Skip-gram and CBOW in Word2Vec

Concept:

Skip-gram: Predicts context words for a given center word.


CBOW (Continuous Bag of Words): Predicts the center word using surrounding context words.

Differences:

| Feature           | Skip-gram                  | CBOW                |
|-------------------|----------------------------|---------------------|
| Prediction Target | Context words              | Center word         |
| Training Data     | Works well with small data | Requires large data |
| Focus             | Rare words                 | Frequent words      |
| Training Speed    | Slower                     | Faster              |

Numerical Example:

Sentence: "I enjoy learning NLP."

Skip-gram:
Context window = 2
Training pairs: ("I", "enjoy"), ("I", "learning"), ("enjoy", "I"), ("enjoy", "learning"), etc.
CBOW:
Input: ["I", "learning"]
Output: "enjoy"

Flow Diagram:

Skip-gram:

Input: "NLP"
    ↓
Hidden Layer: Embedding vector for "NLP"
    ↓
Output: Predicts words like "learning" and "enjoy"

CBOW:

Input: Words ["I", "learning"]
    ↓
Hidden Layer: Aggregated embeddings
    ↓
Output: Predicts "enjoy"

9. GloVe

Concept:

Focuses on the global word co-occurrence matrix and uses ratios of conditional probabilities for
embedding computation.

Numerical Example:

Suppose the co-occurrence matrix X is:

| Word  | apple | fruit | sweet |
|-------|-------|-------|-------|
| apple | 50    | 30    | 20    |
| fruit | 30    | 40    | 10    |
| sweet | 20    | 10    | 25    |

Compute P (word∣context):

P(apple | fruit) = 30/40 = 0.75,  P(apple | sweet) = 20/25 = 0.8

Ratio:

Ratio = P(apple | fruit) / P(apple | sweet) = 0.75 / 0.8 = 0.9375

This ratio is used to learn embeddings.
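
A tiny sketch reproducing these numbers from the table above. Note that, to match the example, each probability is computed by dividing by the context word's diagonal entry; the GloVe paper itself normalizes by the context word's total co-occurrence count.

```python
# Co-occurrence counts copied from the table above
X = {
    "apple": {"apple": 50, "fruit": 30, "sweet": 20},
    "fruit": {"apple": 30, "fruit": 40, "sweet": 10},
    "sweet": {"apple": 20, "fruit": 10, "sweet": 25},
}

def p(word, context):
    # following the example: divide the co-occurrence count by the
    # context word's diagonal entry (used here as its total count)
    return X[word][context] / X[context][context]

ratio = p("apple", "fruit") / p("apple", "sweet")
print(p("apple", "fruit"), p("apple", "sweet"), ratio)  # 0.75 0.8 0.9375
```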

10. Word Embedding Visualization

Concept:

Techniques like t-SNE or PCA reduce high-dimensional embeddings to 2D or 3D for visualization.

Example:

Words such as "king," "queen," "man," "woman" may cluster based on gender and royalty.

Visualization Chart:

Using t-SNE, a 2D plot might show:

woman--------queen
  |              |
  |              |
man----------king
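
A minimal sketch of such a 2D projection with scikit-learn and matplotlib. The four 3-dimensional "embeddings" are made-up toy vectors (real ones would come from a trained model), and perplexity must be smaller than the number of points:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

words = ["king", "queen", "man", "woman"]
vectors = np.array([[0.8, 0.6, 0.1],   # toy vectors for illustration only
                    [0.7, 0.9, 0.2],
                    [0.9, 0.2, 0.1],
                    [0.8, 0.5, 0.3]])

# t-SNE requires perplexity < n_samples, so 2 is used for these 4 points
coords = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(vectors)

plt.scatter(coords[:, 0], coords[:, 1])
for word, (x, y) in zip(words, coords):
    plt.annotate(word, (x, y))
plt.show()
```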

11. Semantic Relations in Word Embeddings

Concept:

Analogies: king − man + woman = queen


Relations like synonyms, antonyms, or analogical reasoning.

Numerical Example:

Given vectors:

king = [0.8, 0.6], man = [0.7, 0.4], woman = [0.6, 0.8]

Compute:

queen = king − man + woman

queen = [0.8, 0.6] − [0.7, 0.4] + [0.6, 0.8] = [0.7, 1.0]
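
The same arithmetic as a Python sketch, using the toy 2-D vectors above (including an assumed vector for "queen" placed where the analogy predicts); real systems return the vocabulary word whose embedding is closest to the computed point:

```python
import math

vectors = {
    "king":  [0.8, 0.6],
    "man":   [0.7, 0.4],
    "woman": [0.6, 0.8],
    "queen": [0.7, 1.0],   # assumed, for illustration
}

def sub(a, b): return [x - y for x, y in zip(a, b)]
def add(a, b): return [x + y for x, y in zip(a, b)]
def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

target = add(sub(vectors["king"], vectors["man"]), vectors["woman"])
print(target)  # ≈ [0.7, 1.0]

# pick the closest word that is not part of the query
best = max((w for w in vectors if w not in {"king", "man", "woman"}),
           key=lambda w: cos(vectors[w], target))
print(best)    # "queen"
```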

12. Bias in Word Embeddings

Concept:

Word embeddings can reflect societal biases present in training data.


Example: "man is to computer programmer as woman is to homemaker."

Visualization:

Biased embeddings:

man---programmer
 |
 |
woman---homemaker
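
One common way to quantify such bias is to project word vectors onto a "gender direction", e.g. the difference between the vectors for "man" and "woman". The sketch below uses made-up 2-D vectors purely for illustration:

```python
import math

# toy vectors for illustration only; real analyses use trained embeddings
vecs = {
    "man":        [0.9, 0.1],
    "woman":      [0.1, 0.9],
    "programmer": [0.8, 0.3],
    "homemaker":  [0.2, 0.8],
}

def sub(a, b): return [x - y for x, y in zip(a, b)]
def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

gender_direction = sub(vecs["man"], vecs["woman"])

# a positive score leans toward "man", a negative score toward "woman"
for word in ("programmer", "homemaker"):
    print(word, round(cos(vecs[word], gender_direction), 3))
```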

13. TF-IDF vs Word2Vec

| Aspect            | TF-IDF             | Word2Vec               |
|-------------------|--------------------|------------------------|
| Vector Type       | Sparse             | Dense                  |
| Basis             | Term frequency     | Learned embeddings     |
| Context Awareness | No                 | Yes                    |
| Use Case          | Document retrieval | Semantic understanding |

14. Applications of Word Embeddings

1. Question Answering:
Use embeddings to retrieve the most relevant answers.
2. Plagiarism Detection:
Compare text similarity using cosine similarity.
3. Sentiment Analysis:
Use embeddings to identify sentiments in a document.

15. Skip-gram Training Workflow

Flow Diagram:

1. Input Sentence: "I like apples."


2. Training Window: Context size = 2
3. Training Pairs:
Positive: ("I", "like"), ("like", "I"), etc.
Negative: ("I", "bananas"), etc.

Steps:

Compute dot product for each word pair.


Apply sigmoid to get probabilities.
Adjust embeddings via SGD.
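
A minimal sketch of one such update step for skip-gram with negative sampling (the embedding size, learning rate, and single negative sample are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, lr = 8, 0.05   # assumed embedding size and learning rate

# separate target and context embedding tables, as in skip-gram with negative sampling
vocab = ["i", "like", "apples", "bananas"]
W_target  = {w: rng.normal(scale=0.1, size=dim) for w in vocab}
W_context = {w: rng.normal(scale=0.1, size=dim) for w in vocab}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_step(target, context, label):
    """label = 1 for a positive (observed) pair, 0 for a negative (sampled) pair."""
    t, c = W_target[target], W_context[context]
    pred = sigmoid(np.dot(t, c))   # probability that the pair is "real"
    grad = pred - label            # derivative of the logistic loss
    W_target[target]   = t - lr * grad * c   # adjust both embeddings
    W_context[context] = c - lr * grad * t

sgd_step("i", "like", label=1)      # positive pair from the sentence
sgd_step("i", "bananas", label=0)   # negative (randomly sampled) pair
```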

16. Final Summary

Key Concepts:

1. Vector Semantics: Words are vectors in space.


2. Word Embeddings: Represent words in dense vectors.
3. Word2Vec and GloVe: Different techniques for embedding learning.
4. Applications: From question answering to bias analysis.

If you need further expansion on specific sections or additional diagrams, let me know!