NLP Study Session - January 2025
Natural Language Processing (NLP)
Teaching Machines to Understand Human Language
Core Challenge: Human language is ambiguous, contextual, and constantly evolving.
NLP bridges the gap between human communication and computer understanding.
1. What is NLP?
Natural Language Processing sits at the intersection of linguistics, computer science, and AI.
It's about making computers understand, interpret, and generate human language in a valuable way.
Remember: Language is not just words - it's context, culture, emotion, and meaning!
NLP VS COMPUTATIONAL LINGUISTICS:
NLP: Engineering-focused, build systems that work
Comp Linguistics: Science-focused, understand language computationally
2. The NLP Pipeline
Raw Text → Tokenization → Normalization → POS Tagging → Named Entity Recognition → Dependency Parsing → Feature Extraction → Model Application → Output
STEP-BY-STEP BREAKDOWN:
1. Text Preprocessing
Tokenization: Split text into words/subwords/characters
Lowercasing: "Hello" → "hello"
Removing punctuation: "Hello!" → "Hello"
Removing stop words: "the", "is", "at", etc.
Stemming: "running" → "run"
Lemmatization: "better" → "good" (context-aware)
Example Pipeline:
Original: "The cats are running quickly!"
Tokenized: ["The", "cats", "are", "running", "quickly", "!"]
Lowercased: ["the", "cats", "are", "running", "quickly", "!"]
Stop words removed: ["cats", "running", "quickly"]
Lemmatized: ["cat", "run", "quickly"]
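A minimal sketch of this preprocessing pipeline using spaCy (one possible choice - NLTK or plain Python would work too); it assumes the small English model en_core_web_sm has been installed:

import spacy

# Assumes: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("The cats are running quickly!")

# Keep alphabetic, non-stop-word tokens and take their lowercased lemmas
lemmas = [tok.lemma_.lower() for tok in doc if tok.is_alpha and not tok.is_stop]
print(lemmas)  # roughly: ['cat', 'run', 'quickly']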
3. Traditional NLP Techniques
A. BAG OF WORDS (BOW)
Document = {word1: count1, word2: count2, ...}
"I love NLP" → {I: 1, love: 1, NLP: 1}
Simple but loses word order and context!
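A quick bag-of-words sketch with scikit-learn's CountVectorizer (assuming scikit-learn is available):

from sklearn.feature_extraction.text import CountVectorizer

docs = ["I love NLP", "I love machine learning", "NLP is fun"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)          # sparse document-term count matrix

print(vectorizer.get_feature_names_out())   # learned vocabulary
print(X.toarray())                          # one row of word counts per document
# Note: the default token pattern drops single-character tokens like "I".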
B. TF-IDF (TERM FREQUENCY - INVERSE DOCUMENT FREQUENCY)
TF-IDF(t,d) = TF(t,d) × IDF(t)
TF(t,d) = (# of times term t appears in doc d) / (total # of terms in d)
IDF(t) = log(N / df(t)), where N = total documents and df(t) = documents containing term t
Highlights important words while downweighting common ones!
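The formula can be computed by hand on a toy corpus (a sketch of the definition above, not a library implementation):

import math

docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]
N = len(docs)

def tf(term, doc):
    return doc.count(term) / len(doc)

def idf(term):
    df = sum(1 for doc in docs if term in doc)
    return math.log(N / df)

def tf_idf(term, doc):
    return tf(term, doc) * idf(term)

print(tf_idf("the", docs[0]))  # 0.0   -> "the" appears in every doc, so IDF = 0
print(tf_idf("dog", docs[1]))  # ~0.37 -> rarer term, higher weight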
C. N-GRAMS
Unigram: ["I", "love", "NLP"]
Bigram: ["I love", "love NLP"]
Trigram: ["I love NLP"]
Captures local context but suffers from sparsity!
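Generating n-grams is just a sliding window over the token list:

def ngrams(tokens, n):
    # Slide a window of length n across the token list
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["I", "love", "NLP"]
print(ngrams(tokens, 1))  # ['I', 'love', 'NLP']
print(ngrams(tokens, 2))  # ['I love', 'love NLP']
print(ngrams(tokens, 3))  # ['I love NLP']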
4. Part-of-Speech (POS) Tagging
Word    | POS Tag | Description
The     | DT      | Determiner
cat     | NN      | Noun
sits    | VBZ     | Verb (3rd person singular)
quietly | RB      | Adverb
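The same tags can be reproduced with spaCy (again assuming en_core_web_sm is installed):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cat sits quietly")

for tok in doc:
    # tok.tag_ is the fine-grained Penn Treebank tag (DT, NN, VBZ, RB, ...)
    print(tok.text, tok.tag_, spacy.explain(tok.tag_))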
5. Named Entity Recognition (NER)
Input: "Apple Inc. was founded by Steve Jobs in Cupertino."
Output:
- Apple Inc. → ORGANIZATION
- Steve Jobs → PERSON
- Cupertino → LOCATION
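A sketch of the same example with spaCy's pretrained NER (its label names differ slightly, e.g. ORG and GPE instead of ORGANIZATION and LOCATION):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple Inc. was founded by Steve Jobs in Cupertino.")

for ent in doc.ents:
    # Expected output along the lines of:
    #   Apple Inc. → ORG, Steve Jobs → PERSON, Cupertino → GPE
    print(ent.text, "→", ent.label_)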
6. Word Embeddings - The Game Changer
Key Insight: Represent each word as a dense vector (low-dimensional compared to sparse one-hot encodings) so that similar words end up close together in the embedding space!
WORD2VEC
CBOW (Continuous Bag of Words): Predict word from context
Skip-gram: Predict context from word
king - man + woman ≈ queen (word arithmetic works in embedding space!)
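Training a toy Word2Vec model with Gensim - a sketch only, since the tiny corpus below is far too small for the word arithmetic to actually work; it just shows the API:

from gensim.models import Word2Vec

sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "man", "walks", "in", "the", "city"],
    ["the", "woman", "walks", "in", "the", "city"],
]

# sg=1 selects skip-gram; sg=0 (the default) is CBOW
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

# king - man + woman ≈ ?  (only meaningful with a large training corpus)
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))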
GLOVE (GLOBAL VECTORS)
Combines global matrix factorization with local context windows
FASTTEXT
Extends Word2Vec by considering character n-grams - handles OOV words!
7. Modern NLP - The Transformer Era
2017: "Attention Is All You Need" paper changed everything!
THE ATTENTION MECHANISM
Attention(Q, K, V) = softmax(QK^T / √d_k) × V
where Q = queries, K = keys, V = values, and d_k = dimension of the keys
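A NumPy sketch of scaled dot-product attention, following the formula above directly:

import numpy as np

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # query-key similarity, scaled
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                                     # weighted sum of the values

Q = np.random.randn(2, 4)   # 2 queries, d_k = 4
K = np.random.randn(3, 4)   # 3 keys
V = np.random.randn(3, 4)   # 3 values
print(attention(Q, K, V).shape)  # (2, 4)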
BERT (BIDIRECTIONAL ENCODER REPRESENTATIONS FROM TRANSFORMERS)
Pre-trained on massive text corpus
Bidirectional context understanding
Fine-tune for specific tasks
Masked Language Modeling (MLM) objective
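The MLM objective can be tried out directly with a HuggingFace fill-mask pipeline (using the bert-base-uncased checkpoint as an example):

from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT predicts the masked token from both left and right context
for pred in fill_mask("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))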
GPT SERIES (GENERATIVE PRE-TRAINED TRANSFORMER)
Autoregressive language modeling
Unidirectional (left-to-right)
Excellent for generation tasks
GPT-3: 175B parameters!
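A small-scale taste of autoregressive generation using the openly available GPT-2 checkpoint (a sketch; GPT-3 itself is only accessible via API):

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

out = generator("Natural language processing is", max_new_tokens=20)
print(out[0]["generated_text"])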
8. Common NLP Tasks
Classification Tasks:
Sentiment Analysis - Positive/Negative/Neutral
Spam Detection - Spam/Not Spam
Topic Classification - News categories
Intent Detection - User's intention
Sequence Labeling Tasks:
NER - Entity recognition
POS Tagging - Grammatical roles
Chunking - Phrase identification
Generation Tasks:
Machine Translation - Language A → Language B
Text Summarization - Long → Short
Question Answering - Context + Question → Answer
Dialogue Systems - Chatbots
9. Evaluation Metrics
FOR CLASSIFICATION:
Accuracy: Overall correctness
Precision/Recall/F1: For imbalanced datasets
ROC-AUC: For binary classification
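Computing the classification metrics with scikit-learn on hypothetical labels:

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0, 1]   # hypothetical gold labels
y_pred = [1, 0, 0, 1, 0, 1]   # hypothetical model predictions

precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision, "recall:", recall, "F1:", f1)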
FOR GENERATION:
BLEU: Machine translation quality
ROUGE: Summarization quality
Perplexity: Language model quality
Human Evaluation: Still the gold standard!
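And a quick sentence-level BLEU check with NLTK (smoothing is used so short toy examples don't score zero):

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "is", "on", "the", "mat"]]   # list of reference token lists
candidate = ["the", "cat", "sat", "on", "the", "mat"]

score = sentence_bleu(reference, candidate, smoothing_function=SmoothingFunction().method1)
print(round(score, 3))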
10. Python Libraries & Tools
Essential NLP Toolkit:
NLTK - Classic, educational, lots of resources
spaCy - Fast, production-ready, great for NER/POS
Gensim - Topic modeling, word embeddings
Transformers (HuggingFace) - State-of-the-art models
TextBlob - Simple API for common tasks
Stanford CoreNLP - Java-based, very comprehensive
11. Challenges in NLP
The Hard Problems:
Ambiguity: "I saw her duck" (verb or noun?)
Sarcasm/Irony: "Great, another meeting!" (not actually great)
Context: "Bank" (financial or river?)
Multilingual: Different languages, different structures
Domain Adaptation: Medical vs Legal vs Casual text
Bias: Models learn societal biases from data
12. Advanced Topics
ZERO-SHOT & FEW-SHOT LEARNING
Perform tasks without task-specific training data!
CROSS-LINGUAL MODELS
Models that work across multiple languages (mBERT, XLM-R)
PROMPT ENGINEERING
The art of crafting inputs to get desired outputs from LLMs
RETRIEVAL-AUGMENTED GENERATION (RAG)
Combine retrieval with generation for factual, up-to-date responses
13. Real-World Applications
Industry Applications
Healthcare: Clinical notes analysis, drug discovery
Finance: Sentiment analysis for trading, document processing
Legal: Contract analysis, legal research
Customer Service: Chatbots, ticket routing
Education: Automated grading, personalized tutoring
14. Code Example - Sentiment Analysis
from transformers import pipeline
# Load pre-trained model
classifier = pipeline("sentiment-analysis")
# Analyze sentiment
texts = [
"I love this product!",
"This is terrible.",
"It's okay, nothing special."
]
results = classifier(texts)
for text, result in zip(texts, results):
    print(f"{text} → {result['label']}: {result['score']:.3f}")
15. Future Directions
What's Next in NLP?
Multimodal models (text + image + audio)
More efficient models (smaller, faster)
Better reasoning capabilities
Improved factuality and reduced hallucinations
Personal AI assistants
Real-time translation breaking language barriers
16. Study Tips & Resources
My Learning Path:
1. Master regex and basic text processing
2. Understand traditional methods (BoW, TF-IDF)
3. Learn word embeddings thoroughly
4. Dive into transformers and attention
5. Practice with real datasets (Kaggle, papers)
6. Build end-to-end projects
RECOMMENDED PAPERS:
"Attention Is All You Need" (2017)
"BERT: Pre-training of Deep Bidirectional Transformers" (2018)
"Language Models are Few-Shot Learners" (GPT-3, 2020)
"Language is the foundation of human intelligence - teaching machines to understand it is teaching them to
think"