
Natural Language Processing

NLP with Machine Learning and Deep Learning


Learning Objectives

By the end of this lesson, you will be able to:

Explain Neural Networks and Recurrent Neural Networks

Explain Neural Machine Translation

Define Text Classification and Text Summarization

Explain document clustering, the attention mechanism, and question answering engines

Demonstrate spam-ham classification using ML

Create a script that summarizes a news article by finding only the important information
Neural Networks and Recurrent Neural Networks (RNN)
Neural Network
Neural networks, as used in deep learning, consist of different layers connected to each other and are modeled on the structure and functions of the human brain. They learn from huge volumes of data and use complex algorithms to train the net.

[Figure: a neural network with input features, hidden layers of neurons, and an output prediction]
Feed Forward Neural Network
A feed-forward neural network is a type of neural network where every unit in a layer is connected with all the units in the previous layer.

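As a quick illustration, here is a minimal feed-forward network sketch in Keras; the layer sizes and the random data are illustrative assumptions, not from the lesson.

```python
# A minimal feed-forward network: every unit in a layer connects to all
# units in the previous layer. Sizes and data are illustrative assumptions.
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

X = np.random.rand(100, 20)             # 100 samples, 20 input features
y = np.random.randint(0, 2, size=100)   # binary labels

model = Sequential([
    Dense(16, activation="relu", input_shape=(20,)),  # hidden layer 1
    Dense(8, activation="relu"),                      # hidden layer 2
    Dense(1, activation="sigmoid"),                   # output (prediction)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, verbose=0)
```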
Disadvantages of Feed Forward Neural Network

Cannot handle sequential data

Considers only the current input

Cannot memorize previous inputs

Note: To overcome these limitations, RNNs came into the picture.


Recurrent Neural Networks (RNN)

Can handle sequential data

Considers the current input and also previously received inputs

Can memorize previous inputs due to internal memory


Recurrent Neural Networks (RNN)

A Recurrent Neural Network (RNN) is a type of neural network where the output from the previous step is fed as input to the current step.
The RNN Model

The RNN remembers the analysis done up to a given point by maintaining a state.

[Figure: data flows into and out of the RNN, which recurs its new state to itself]

Note: You can think of the state as the memory of the RNN, which recurs into the net with each new input.
RNN: Working

The first data point flows into the network as input data, denoted as x.

[Figure: the input x passes through a weight matrix between the input and hidden units; the hidden units also receive the previous state, multiplied by the recurrent weights, and the RNN recurs its new state to itself while producing the output y]
Introduction to Attention Mechanism
What Is Attention Mechanism?

1. The attention mechanism is used in machine translation.

2. It has a bidirectional RNN layer which takes word input in any language (e.g., French) and generates a feature output for each word.

3. Another RNN is used after the bidirectional RNN to generate the translations. It takes all the features from the bidirectional RNN in the previous step and predicts words in the target language.

4. It uses a matrix where the columns are the input sentences and the rows are the translated sentences.

5. Word prediction is done using the features extracted from the bidirectional RNN and the relation to the previous word in the translation RNN.
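To make the scoring step concrete, here is a minimal NumPy sketch of additive (Bahdanau-style) attention over the encoder annotations; the dimensions and the weight names (W_a, U_a, v_a) are illustrative assumptions.

```python
# Additive attention sketch: score each annotation against the previous
# decoder state, softmax the scores, and form a weighted context vector.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

Tx, enc_dim, dec_dim, att_dim = 5, 8, 8, 6
h = np.random.randn(Tx, enc_dim)      # encoder annotations h_1 .. h_Tx
s_prev = np.random.randn(dec_dim)     # previous decoder state s_{i-1}

W_a = np.random.randn(att_dim, dec_dim)
U_a = np.random.randn(att_dim, enc_dim)
v_a = np.random.randn(att_dim)

e = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ h_j) for h_j in h])  # scores
alpha = softmax(e)                    # attention weights over the inputs
c = (alpha[:, None] * h).sum(axis=0)  # context vector c_i for the decoder
print(alpha.round(3), c.shape)
```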
What Is Attention Mechanism?

[Figure: encoder-decoder with attention. The encoder produces annotations h_1 … h_Tx for the inputs x_1 … x_Tx (e.g., "How are you"); a context vector c_i computed from the annotations feeds the decoder states s_1 … s_Ty, which produce the outputs y_1 … y_Ty]

Attention Mechanism: Example

[Figure: the encoder reads the input "le chat est noir EOS"; the attended encoder output feeds the decoder, whose output is "the cat is black EOS"]
Long Short-Term Memory (LSTM)
The Problem of Vanishing Gradient with RNNs

The problem arises while updating weights in an RNN. These weights connect the hidden layers to themselves in the unrolled temporal loop.

[Figure: an RNN unrolled over time steps t-3 … t+1, with input weights W_in, recurrent weights W_rec between successive hidden states, and output weights W_out at each step]

Note: When any value is repeatedly multiplied by a small number, it decreases very quickly.
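A tiny numeric illustration of the note above; the recurrent weight value is an illustrative assumption.

```python
# Backpropagating through T steps multiplies the gradient by the recurrent
# weight each time, so a weight below 1 shrinks it exponentially.
w_rec = 0.5
grad = 1.0
for t in range(1, 21):
    grad *= w_rec
    if t % 5 == 0:
        print(f"gradient after {t} steps: {grad:.2e}")
# gradient after 5 steps: 3.12e-02 ... after 20 steps: 9.54e-07
```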
LSTM Architecture

[Figure: LSTM cell. One gate decides what to forget, another decides what to insert, and the cell state holds bits of memory]
Gated Recurrent Unit (GRU)
GRU Architecture

[Figure: GRU cell with a reset gate and an update gate, performing label predictions against random data]
LSTM vs. GRU

LSTM

▪ Tracks long-term dependencies while mitigating the vanishing and exploding gradient problems; it does so via input, forget, and output gates

▪ Controls the exposure of memory content

GRU

▪ Tracks long-term dependencies using a reset gate and an update gate

▪ Exposes the entire cell state to other units in the network
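As a minimal sketch of how the two units are swapped in practice (a Keras sketch; the vocabulary size, embedding size, and unit counts are illustrative assumptions):

```python
# The two recurrent units are drop-in alternatives in a Keras model.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, LSTM, GRU, Dense

def make_model(recurrent_layer):
    return Sequential([
        Embedding(input_dim=10_000, output_dim=64),  # word embeddings
        recurrent_layer(32),                         # LSTM or GRU cell
        Dense(1, activation="sigmoid"),              # binary prediction
    ])

lstm_model = make_model(LSTM)  # input, forget, and output gates
gru_model = make_model(GRU)    # reset and update gates only
```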


Introduction to Machine Translation
What Is Machine Translation?

1. Machine translation is the process of translating text or speech from one language to another (e.g., English to Hindi) using a translation algorithm.

2. It is a subfield of computational linguistics.

3. The translation process preserves the meaning of the input text.
Types of Machine Translation

Rule-Based Machine Translation: It uses a combination of language and grammar rules with common words.

Statistical Machine Translation: It uses statistical translation models based on bilingual text corpora.
Types of Machine Translation

Rule-Based Machine Translation comes in three forms:

Direct MT: Word-by-word translation is performed.

Transfer-Based MT: A three-step process: Analysis, Transfer, and Generation.

Interlingua: The source text is converted into an abstract, language-independent representation and then generated as the target language.
Types of Machine Translation

Statistical Machine Translation:

• Uses a statistical approach where data is the main component used for translation

• Requires a huge amount of data to train

• Neural machine translation is an approach to machine translation that uses a large artificial neural network

[Figure: pipeline of training, model building, and testing]
Introduction to Neural Machine Translation
Neural Machine Translation

NMT uses the encoder-decoder structure for the seq2seq model: the encoder converts the source text into an intermediate state, and the decoder converts this state into the target text.

[Figure: the input "Very Sunny Day" flows through the encoder and decoder to produce the translated output]
Neural Machine Translation

Encoder

The encoder embeds the English text, passes the embedded text through multiple LSTM layers to encode it, and gives the result to the decoder LSTM layer.
Neural Machine Translation

Decoder

• The decoder takes text in the language that is being translated into, embeds it, and matches it with the encoded English text in its LSTM layer.

• The softmax function in the last layer of the decoder decides the output of the decoder in translation.
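Below is a minimal Keras sketch of this encoder-decoder arrangement; the vocabulary sizes and dimensions are illustrative assumptions, and attention is omitted for brevity.

```python
# LSTM encoder-decoder: the encoder's final state initializes the decoder,
# and a softmax over the target vocabulary decides each translated word.
from tensorflow.keras import Model, Input
from tensorflow.keras.layers import Embedding, LSTM, Dense

src_vocab, tgt_vocab, dim = 8_000, 8_000, 128

enc_in = Input(shape=(None,))                       # source token ids
enc_emb = Embedding(src_vocab, dim)(enc_in)         # embed the source text
_, h, c = LSTM(dim, return_state=True)(enc_emb)     # keep final state only

dec_in = Input(shape=(None,))                       # target token ids
dec_emb = Embedding(tgt_vocab, dim)(dec_in)
dec_out, _, _ = LSTM(dim, return_sequences=True, return_state=True)(
    dec_emb, initial_state=[h, c])                  # start from encoder state

out = Dense(tgt_vocab, activation="softmax")(dec_out)  # word probabilities
model = Model([enc_in, dec_in], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```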
Neural Machine Translation

Encoder and Decoder together as a sequence model:

[Figure: the encoder embeds the English input "He loved to eat" into a state S; the decoder, fed "NULL Er liebte zu essen", produces "Er liebte zu essen" through a softmax layer]
seq2seq example

[Figure: one sequence model driven by different random seeds acts as a music generator (random seed = 5), a text-captioning model producing text such as "A neuron is a cell that can transmit electrical signals; they can also refer to units in an ANN model…" (random seed = 10), and an image generator (random seed = 15)]

Components of Encoder-Decoder Architecture
Components of Encoder-Decoder Architecture

Following are the components of the Encoder-Decoder architecture: RNN, LSTM, and GRU.
Components of Encoder-Decoder Architecture

Recurrent Neural Network (RNN):

• The output from the previous step is fed as input to the current step

• The hidden state of the RNN remembers some information about a sequence

• Weights and losses are calculated in each layer
Components of Encoder-Decoder Architecture

Here one hidden layer shares information with the next hidden layer to understand the relation and information needed to produce the output.

Calculating the current state: h_t = f(h_{t-1}, x_t), where h_t is the current state, h_{t-1} is the previous state, and x_t is the input state.

Calculating the output: y_t = W_hy h_t, where y_t is the output and W_hy is the weight at the output layer.
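A minimal NumPy sketch of these two recurrences, using tanh as f; the dimensions are illustrative assumptions.

```python
# h_t = tanh(W_xh x_t + W_hh h_{t-1}) and y_t = W_hy h_t over a sequence.
import numpy as np

in_dim, hid_dim, out_dim = 4, 8, 3
W_xh = np.random.randn(hid_dim, in_dim)   # input-to-hidden weights
W_hh = np.random.randn(hid_dim, hid_dim)  # hidden-to-hidden (recurrent)
W_hy = np.random.randn(out_dim, hid_dim)  # hidden-to-output weights

h = np.zeros(hid_dim)                     # initial state
for x_t in np.random.randn(5, in_dim):    # a sequence of 5 inputs
    h = np.tanh(W_xh @ x_t + W_hh @ h)    # current state h_t
    y = W_hy @ h                          # output y_t
print(y)
```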
Components of Encoder-Decoder Architecture

Long Short-Term Memory (LSTM):

1. A type of RNN

2. Solves the problem of long-term dependencies of RNN

3. A standard RNN cannot predict a word stored in long-term memory

4. LSTM, however, has the ability to give more accurate predictions from recent information
Components of Encoder-Decoder Architecture

• An LSTM contains four neural networks arranged in a chain structure, with different memory blocks called cells.

• Information is retained by the cells, and the memory manipulations are done by the gates.
Components of Encoder-Decoder Architecture

The following are the applications of LSTM:

Language Modeling

Question Answering

Chatbots

Machine Translation

Handwriting Generation

Image Captioning
Components of Encoder-Decoder Architecture

Forget Gate: Removes information that is no longer useful in the cell state.
Components of Encoder-Decoder Architecture

Input Gate: Adds new information to the cell state.
Components of Encoder-Decoder Architecture

Output Gate: Selects useful information from the current cell state and shows it as the output.
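A minimal NumPy sketch of the three gates acting on the cell state, using the standard textbook equations; the dimensions are illustrative assumptions.

```python
# One LSTM step: the forget gate prunes the cell state, the input gate adds
# new information, and the output gate exposes part of it as the output.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d = 8                                       # hidden/cell size (assumed)
x, h, c = (np.random.randn(d) for _ in range(3))
Wf, Wi, Wo, Wc = (np.random.randn(d, 2 * d) for _ in range(4))
z = np.concatenate([h, x])                  # previous state plus input

f = sigmoid(Wf @ z)                         # forget gate
i = sigmoid(Wi @ z)                         # input gate
o = sigmoid(Wo @ z)                         # output gate
c = f * c + i * np.tanh(Wc @ z)             # updated cell state
h = o * np.tanh(c)                          # new hidden state (output)
```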
Components of Encoder-Decoder Architecture

Gated Recurrent Unit (GRU):

• GRU can be considered a variation of the LSTM because it is designed similarly

• It solves the vanishing gradient problem of a standard RNN

• It uses two gates
Components of Encoder-Decoder Architecture

Reset Gate: Used by the model to determine how much of the past information to forget.
Components of Encoder-Decoder Architecture

Update Gate: Helps the model determine how much of the past information needs to be passed along to the future.
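A matching NumPy sketch of the GRU's two gates, using the standard textbook equations; the dimensions are illustrative assumptions.

```python
# One GRU step: the reset gate decides how much past to forget when forming
# the candidate state; the update gate blends old and new state.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d = 8                                              # hidden size (assumed)
x, h = np.random.randn(d), np.random.randn(d)
Wr, Wu, Wh = (np.random.randn(d, 2 * d) for _ in range(3))

r = sigmoid(Wr @ np.concatenate([h, x]))           # reset gate
u = sigmoid(Wu @ np.concatenate([h, x]))           # update gate
h_cand = np.tanh(Wh @ np.concatenate([r * h, x]))  # candidate state
h = u * h + (1 - u) * h_cand                       # new hidden state
```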
Text Classification
Text Classification: Introduction

The process of classifying a text into one or more defined categories

Also known as text categorization or text tagging

Can be done in two ways: manual and automatic


Text Classification: Approaches

There are three approaches: Rule-Based System, Machine Learning-Based System, and Hybrid System.
Text Classification: Approaches

Rule-Based System

• Classifies texts into groups by using predefined handcrafted rules

• Can be improved over time, but building rules is time-consuming

• Requires domain knowledge to build the rules

• Maintenance is difficult
Text Classification: Approaches

Machine Learning-Based System:

• Classifies the text based on past observations.

• Text classification algorithms:

• Naive Bayes
• Deep Learning
• Support Vector Machine

[Figure: pipeline in which training data and tags go through feature extraction to build the model; a code sketch follows]
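As a minimal scikit-learn sketch of this feature-extraction-plus-model pipeline; the tiny training set is an illustrative assumption.

```python
# Feature extraction (TF-IDF) feeding a Naive Bayes classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["win a free prize now", "lowest price pills",
         "meeting at noon tomorrow", "lunch with the team"]
tags = ["spam", "spam", "ham", "ham"]            # training tags

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, tags)                           # train on past observations
print(model.predict(["free pills now"]))         # likely -> ['spam']
```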
Text Classification: Approaches

[Figure: the Hybrid System combines the Rule-Based System and the Machine Learning-Based System]
Text Classification: Example

Spam-Ham Detection

1. Email is the input data.

2. Once the user receives the email, the classifier labels it as spam or ham.

3. Spam is moved into the junk folder; ham sits in the inbox.

[Figure: incoming email passes through the classifier; spam goes to the spam folder, ham stays in the inbox]
Text Classification: Example

Language Detector
Example: Google Language Detector
Text Classification: Example

Query Classification for Information Retrieval:

"I want to purchase a laptop." → Electronics

"Track my parcel." → Logistics

Text Summarization
Text Summarization: Introduction

Text summarization is a technique for condensing a long piece of information into a small, concise form. The identified subset of the data represents the crux of the whole content.

Following are the two approaches for automatic text summarization:

Extraction Abstraction
Text Summarization: Introduction

Extraction: It selects some portion of the original text, such as words and phrases.

Abstraction: It builds a semantic representation of the content and uses Natural Language Generation techniques to produce the final text.
Text Summarization: Working

Text summarization is considered a supervised machine learning problem.

[Figure: the document goes through key phrase extraction or semantic meaning extraction, followed by final text formation logic, producing the summarized text; a code sketch follows]
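A minimal sketch of the extractive flavor of this pipeline, scoring sentences by word frequency; a toy stand-in for the key-phrase-extraction step, not the lesson's code.

```python
# Score each sentence by the frequency of its words, keep the top ones.
import re
from collections import Counter

def summarize(text, n_sentences=2):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))
    scored = sorted(sentences, reverse=True,
                    key=lambda s: sum(freq[w] for w in re.findall(r"\w+", s.lower())))
    top = set(scored[:n_sentences])
    # Emit the chosen sentences in their original order
    return " ".join(s for s in sentences if s in top)

print(summarize("RNNs handle sequences. RNNs keep state. "
                "CNNs do not keep state. State helps RNNs handle long sequences."))
```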
Text Summarization: Example

Media Monitoring

Social Media Marketing

Automated Content Creation

Question Answering and Chatbots
Document Clustering
Document Clustering: Introduction

1. Document clustering is the grouping of documents of the same kind into one or more clusters.

2. The number of clusters can be defined, or it is picked automatically.

3. It is an unsupervised process used to find structure in a collection of unlabeled data.
Document Clustering: Working

The following are the steps for clustering (a code sketch follows the list):

1. Remove the stop words, and perform stemming and tokenization.

2. Apply TF-IDF to the text preprocessed in the first step. TF-IDF gives importance to words that occur frequently in a document but not that frequently in the corpus.

3. Use a clustering algorithm (k-means) on the output of TF-IDF. The algorithm places a centroid in each cluster, far apart from the others.

4. Assign the centroid as a topic for the documents.
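As referenced above, a minimal scikit-learn sketch of these steps (stemming is omitted for brevity; the toy documents and cluster count are illustrative assumptions):

```python
# TF-IDF vectorization followed by k-means; each centroid's strongest
# terms serve as the topic for its cluster.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = ["stocks fell as markets slid", "shares and markets rallied",
        "the team won the final match", "players scored in the match"]

vec = TfidfVectorizer(stop_words="english")   # steps 1-2
X = vec.fit_transform(docs)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # step 3

terms = vec.get_feature_names_out()           # step 4: centroid -> topic
for i, center in enumerate(km.cluster_centers_):
    top = [terms[j] for j in center.argsort()[::-1][:3]]
    print(f"cluster {i} topic words: {top}")
```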


Document Clustering: Example

Documents are clustered into two groups; a document from each cluster is shown with its important keywords.
Question Answering Engine
Question Answering Engine: Introduction

A QnA engine is a system used to serve responses to a user's query.

Initially this was done by writing rules, but when there is huge data, writing rules is not practical.

The attention mechanism is helpful in a QnA engine: the question answering model uses attention to generate a specific answer to a question.

It is a neural network-based system with a focus on the required information.
Question Answering Engine: Layers

[Figure: QnA engine layers. An embedding layer feeds an encoder RNN layer; attention scores yield an attention distribution and attention output in the attention layer, which feeds the output layer]
Question Answering Engine: Layers

Embedding Layer: The context and the corresponding questions lie within the training dataset for the model and can be broken into individual words. These words can then be converted into word embeddings using pretrained vectors like GloVe.

Attention Layer: We have a hidden vector each for the question and the context. We need to look at them together in order to figure out the answer. This is where attention comes in: it decides, given the question, which words in the context it should "attend" to.
Question Answering Engine: Layers

RNN Layer: A bidirectional GRU or LSTM helps the model be aware of the words before and after each position.

Output Layer: A softmax output layer is the final layer of the model and helps to choose the start and end index for the answer span. It combines the context hidden states and the attention vector from the previous layer to create a blended response.
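A minimal NumPy sketch of the output layer's span selection; the way the attention vector is blended with the hidden states, and the dimensions, are illustrative assumptions.

```python
# Blend context hidden states with the attention vector, then softmax over
# positions to choose the start and end index of the answer span.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

T, d = 12, 16                        # context length, hidden size (assumed)
H = np.random.randn(T, d)            # context hidden states (RNN layer)
a = np.random.randn(d)               # attention vector (attention layer)

blended = np.concatenate([H, np.tile(a, (T, 1))], axis=1)  # blended response
w_start, w_end = np.random.randn(2 * d), np.random.randn(2 * d)

p_start = softmax(blended @ w_start) # distribution over start positions
p_end = softmax(blended @ w_end)     # distribution over end positions
print(p_start.argmax(), p_end.argmax())
```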
Question Answering Engine: Example

[Figure: a chatbot conversation. Bot: "Hello! I'm a chatbot. Can I help you?" User: "Hello chatbot! Yes please :)"]
Spam-Ham Classification using Machine Learning

Problem Statement: Nowadays, digital content has increased and people are moving to the internet. They communicate and share information using email. Companies target audiences with campaigns to sell their products. Some people use this medium to commit fraud by sending false information through email. The task is to create an ML-based model to classify email text as spam or ham.

Access: Click on the Practice Labs tab on the left side panel of the LMS. Copy or note the
username and password that is generated. Click on the Launch Lab button. On the page that
appears, enter the username and password in the respective fields, and click Login.
Summarization of News

Problem Statement: Sometimes while reading the news we come across very long articles, but due to lack of time a person focuses only on certain important points. Create a script to summarize a news article by finding only the important information.

Access: Click on the Practice Labs tab on the left side panel of the LMS. Copy or note the
username and password that is generated. Click on the Launch Lab button. On the page that
appears, enter the username and password in the respective fields, and click Login.
Document Clustering for BBC News

Objective: To perform document clustering in order to assign similar articles to a single cluster label.
Problem Scenario: You are given a zip file which contains summaries of
news from BBC. The data is taken from Kaggle.
(https://www.kaggle.com/pariza/bbc-news-summary).
The zip file contains a folder: BBC News Articles. This folder contains 5
sub-folders, named:
• Business
• Entertainment
• Politics
• Sports
• Tech
Each of these sub-folders contains text files which have summaries of
different news articles.
Key Takeaways

You are now able to:

Explain Neural Machine Translation

Define Text Classification and Text Summarization

Explain document clustering, the attention mechanism, and question answering engines

Demonstrate spam-ham classification using ML

Create a script to summarize a news article by finding only the important information
Knowledge Check
Knowledge Check

Which activation function is suitable for multiclass text classification in the output layer?

a. Sigmoid

b. Relu

c. Softmax

d. Linear
Knowledge Check

Which activation function is suitable for multiclass text classification in the output layer?

a. Sigmoid

b. Relu

c. Softmax

d. Linear

The correct answer is c.

Softmax is suitable for multiclass text classification in the output layer.
Knowledge Check

What are the applications of a sequence-to-sequence generator?

a. Machine Translation

b. Question Answering

c. Automatic Text Generation

d. All of the above


Knowledge Check

What are the applications of a sequence-to-sequence generator?

a. Machine Translation

b. Question Answering

c. Automatic Text Generation

d. All of the above

The correct answer is d.

Machine Translation, Question Answering, and Automatic Text Generation are all applications of a sequence-to-sequence generator.
Knowledge Check

Information to forget is decided by ___________ in GRU.

a. Forget Gate

b. Update Gate

c. Reset Gate

d. Output Gate
Knowledge Check

Information to forget is decided by ___________ in GRU.

a. Forget Gate

b. Update Gate

c. Reset Gate

d. Output Gate

The correct answer is c.

In GRU, information to forget is decided by the reset gate.
Knowledge Check

Vanishing Gradient is problematic in ____________.

a. RNN

b. LSTM

c. GRU

d. None of the above


Knowledge Check

Vanishing Gradient is problematic in ____________.

a. RNN

b. LSTM

c. GRU

d. None of the above

The correct answer is a.


Vanishing gradient is problematic in RNN.
Knowledge Check

Which layer understands the answer to the questions asked in Question Answering?

a. RNN Layer

b. Embedding Layer

c. Attention Layer

d. None of the above


Knowledge Check

Which layer understands the answer to the questions asked in Question Answering?

a. RNN Layer

b. Embedding Layer

c. Attention Layer

d. None of the above

The correct answer is c.

The attention layer understands the answer to the questions asked in Question Answering.
