Complete Natural Language Processing Masterclass
General NLP Pipeline
This section helps you conceptualise
the overall process, from raw text to
the point where it can be fed to a
machine learning model.
For instance, we take Amazon text
reviews to the point where we can
create a machine learning model that
tells us whether a review is positive or
negative.
Course Flow OVERVIEW //NIDIA
Text Wrangling & Preprocessing --> Text Normalization --> Word Embedding
--> Build Model --> Initialise Model --> Train Model
--> Test Model --> Evaluate Model
Text Pre-processing
Most of the time, real-world data is
messy. This is true of data science in
general, not just NLP. In our case, text
data can be highly unstructured. The
datasets used while learning NLP
usually require some preprocessing,
but not to the extent you would face if
you were creating your own dataset
from scratch. Nevertheless, I will
ensure that you get good practice with
cleaning some very dirty data.
Clean the mess REALITY
Pre-processing
Tweets
Extract hashtags
Clean URLs
Mentions
Emojis
Smileys
Remove digits
Punctuation
Stop words - they don't add much
meaning: a, an, in, this, it, at, the
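The cleaning steps listed above can be sketched roughly as follows. This is only a minimal illustration using Python's standard library: the regular expressions, the `clean_tweet` helper and the sample tweet are assumptions made for this example, and the emoji and smiley patterns are far from exhaustive.

```python
import re
import string

# Stop-word list copied from the slide; real lists are much longer.
STOP_WORDS = {"a", "an", "in", "this", "it", "at", "the"}

# Rough pattern covering common emoji code-point blocks (assumption, not exhaustive).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# Simple western-style smileys such as :) ;-) :D :(
SMILEY_RE = re.compile(r"(?<!\w)[:;=][\-']?[)(\]\[DPp/\\|](?!\w)")


def clean_tweet(text):
    """Return (cleaned_tokens, hashtags) for one tweet."""
    hashtags = re.findall(r"#(\w+)", text)               # extract hashtags first
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # clean URLs
    text = re.sub(r"@\w+", " ", text)                    # mentions
    text = re.sub(r"#\w+", " ", text)                    # hashtags (already saved)
    text = EMOJI_RE.sub(" ", text)                       # emojis
    text = SMILEY_RE.sub(" ", text)                      # smileys
    text = re.sub(r"\d+", " ", text)                     # digits
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return tokens, hashtags


# Hypothetical tweet, made up for this example.
tweet = ("Can't wait for #Coachella 2024 with @friend :) \U0001F389 "
         "tickets at https://example.com/tix")
tokens, hashtags = clean_tweet(tweet)
print(tokens)
print(hashtags)
```

Note that the order matters: smileys and URLs have to be handled before punctuation is stripped, otherwise they are broken into pieces that no longer match.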
#Coachella TWEETS
Text Normalization
In computer science, canonical means
a standard state or behaviour of an
attribute. Here, we are putting the text
into a structure that conforms to
well-established patterns.
Tokenization
Stemming
Lemmatization
Sentence Segmentation
Spell correction
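To make the five steps above a little more concrete, here is a toy sketch in plain Python. Everything in it is an assumption for illustration: `toy_stem` is a crude stand-in for a real stemmer, `LEMMAS` is a tiny hand-written lookup table rather than a real lemmatizer, and `VOCAB` is a made-up word list for the spell-correction step. In practice you would normally use a library for these.

```python
import re
from difflib import get_close_matches

# Hypothetical word list used only for the spell-correction demo.
VOCAB = ["shipping", "really", "fast", "better", "expected"]
# Hypothetical lemma lookup; a real lemmatizer uses a dictionary and grammar.
LEMMAS = {"was": "be", "loved": "love", "books": "book"}


def split_sentences(text):
    """Naive sentence segmentation: split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lowercase and keep runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def correct(token):
    """Swap a token for its closest vocabulary entry, if one is close enough."""
    match = get_close_matches(token, VOCAB, n=1, cutoff=0.8)
    return match[0] if match else token


def toy_stem(token):
    """Crude suffix stripping; the result is not always a real word."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def lemmatize(token):
    """Map a token to its dictionary form when the lookup table knows it."""
    return LEMMAS.get(token, token)


text = "The shiping was realy fast. I loved the books!"
for sentence in split_sentences(text):
    tokens = [correct(t) for t in tokenize(sentence)]
    print(tokens)
    print([toy_stem(t) for t in tokens])
    print([lemmatize(t) for t in tokens])
```

The difference to notice is that stemming just chops off endings, while lemmatization maps a word to its dictionary form, which is why the lookup can turn "was" into "be" and the suffix stripper cannot.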
standard state CONVERTING TEXT INTO SINGLE CANONICAL FORM
Word Embeddings
The aim is to encode (normalized)
words into vectors, each of which sits
at some position in “word space”.
Essentially, we are representing a
vocabulary in a vector space.
Mathematics principles: cosine and
Euclidean distance.
This is where the magic starts:
similarity. How do we know that king
is associated with prince, but not as
much with jellyfish?
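One way to picture the answer is to compare vectors with cosine similarity and Euclidean distance. The sketch below uses hand-made 3-dimensional vectors that are purely hypothetical (real embeddings are learned and have many more dimensions); it only shows how the two measures are computed.

```python
import math

# Hypothetical vectors; the axes are roughly (royalty, maleness, sea creature).
vectors = {
    "king": [1.0, 1.0, 0.0],
    "prince": [0.8, 1.0, 0.0],
    "jellyfish": [0.0, 0.0, 1.0],
}


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1 = same direction, 0 = unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between u and v: smaller means closer."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


for word in ("prince", "jellyfish"):
    sim = cosine_similarity(vectors["king"], vectors[word])
    dist = euclidean_distance(vectors["king"], vectors[word])
    print(f"king vs {word}: cosine={sim:.3f}, distance={dist:.3f}")
```

With these toy vectors, king and prince point in almost the same direction and sit close together, while king and jellyfish share nothing, so their cosine similarity is zero.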
Embedding layers allow the algorithm
to figure out:
man is to king
woman is to ________
Space - dimensions, positions.
Closeness, distance.
EXAMPLE: WORD2VEC, GLOVE
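The analogy can be sketched as simple vector arithmetic: take king, subtract man, add woman, and look for the nearest remaining word. The 2-dimensional vectors below are hypothetical and hand-picked so the arithmetic is easy to follow; the `analogy` helper is an assumption for this example, not something taken from Word2Vec or GloVe themselves.

```python
import math

# Hypothetical vectors; the axes are roughly (royalty, maleness).
vectors = {
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "prince": [0.8, 1.0],
}


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' using the vector b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Skip the three input words, then pick the closest remaining word.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))


print(analogy("man", "king", "woman"))  # prints "queen" for these toy vectors
```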
Build Model
Machine learning involves models,
each having its own ML pipeline. A
model runs a series of mathematical
operations to learn from data, in order
to estimate the output on unseen data.
Example: the dataset can have 1000
reviews labelled positive or negative.
The model can predict if an unseen
review is negative or positive.
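As a very small sketch of that example, the code below trains a Naive Bayes classifier on six made-up reviews and predicts the label of an unseen one. Naive Bayes is my own choice here because it fits in a few lines of plain Python; it is not the model this course builds, and the reviews and the `fit`/`predict` helpers are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labelled reviews standing in for the 1000 described above.
train = [
    ("i love this book", "positive"),
    ("great story and fast shipping", "positive"),
    ("really good read", "positive"),
    ("terrible book very boring", "negative"),
    ("slow shipping and bad quality", "negative"),
    ("i hate this story", "negative"),
]


def fit(examples):
    """Count words per label for a multinomial Naive Bayes model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


model = fit(train)
print(predict(model, "i love the fast shipping"))  # an unseen review
```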
Accuracy will vary with both the type
of data and the type of model.
Deep learning - Recurrent Neural
Networks, LSTMs.
ML pipeline EXAMPLE: LSTM
Transfer Learning
Suppose my dataset consists of Netflix
movie titles. The movie titles make up
my vocabulary. But what if this is not
enough to train my model on? Transfer
learning allows the model to use
knowledge already learned from a
gigantic dataset. This knowledge is
transferred from a related task to the
new dataset.
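One common form of this idea is to start a model's embedding table from vectors that were learned elsewhere, instead of starting from scratch. The sketch below is only an illustration under stated assumptions: the `pretrained` dictionary is a made-up stand-in for vectors you would load from a large pre-trained embedding, the three titles are just sample data, and `build_vocab` and `init_embeddings` are hypothetical helpers.

```python
import random

# Hypothetical "pre-trained" vectors; in practice these are loaded from a
# large pre-trained embedding rather than typed in by hand.
pretrained = {
    "the": [0.1, 0.3, -0.2],
    "crown": [0.7, -0.1, 0.4],
    "stranger": [-0.3, 0.5, 0.2],
    "things": [0.0, 0.2, 0.6],
}

# A small sample dataset of titles.
titles = ["The Crown", "Stranger Things", "The Witcher"]


def build_vocab(texts):
    """Map each distinct lowercase word to an integer id (0 is kept for padding)."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab


def init_embeddings(vocab, pretrained, dim, seed=0):
    """Copy pre-trained vectors where available; small random values otherwise."""
    rng = random.Random(seed)
    matrix = [[0.0] * dim for _ in range(len(vocab) + 1)]  # row 0 = padding
    missing = []
    for word, idx in vocab.items():
        if word in pretrained:
            matrix[idx] = list(pretrained[word])  # transferred knowledge
        else:
            matrix[idx] = [rng.uniform(-0.05, 0.05) for _ in range(dim)]
            missing.append(word)
    return matrix, missing


vocab = build_vocab(titles)
matrix, missing = init_embeddings(vocab, pretrained, dim=3)
print(vocab)
print("words without a pre-trained vector:", missing)
```

Only the words that the pre-trained vectors do not cover have to be learned from the small dataset; everything else starts from what was already learned on the bigger one.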
Initialising Model PRE-TRAINED - USING PREVIOUS KNOWLEDGE
Train Model
Think of it this way: the dataset, now
cleaned and converted into a
machine-readable, structured integer
form, is knowledge. We iterate over
this knowledge the way we would
study words repeatedly in order to
learn them. Training the model is
simply teaching the machine
(teaching the computer).
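Here is a tiny sketch of that "study it repeatedly" idea. It converts made-up reviews into integer ids and then loops over them for several epochs, nudging the weights whenever the model gets one wrong. The model is a simple perceptron, chosen only because it is short; the data and names are hypothetical and this is not the model trained in the course.

```python
# Hypothetical cleaned, labelled reviews (1 = positive, 0 = negative).
data = [
    ("love this book", 1),
    ("great fast shipping", 1),
    ("bad boring book", 0),
    ("hate slow shipping", 0),
]

# Convert words to integer ids so the text is machine readable.
vocab = {}
for text, _ in data:
    for word in text.split():
        vocab.setdefault(word, len(vocab))

encoded = [([vocab[w] for w in text.split()], label) for text, label in data]

weights = [0.0] * len(vocab)  # one weight per word id
bias = 0.0


def predict(ids):
    """Perceptron rule: positive if bias plus the word weights exceeds zero."""
    return 1 if bias + sum(weights[i] for i in ids) > 0 else 0


# Each epoch is one full pass over the data, like re-reading study notes.
for epoch in range(10):
    mistakes = 0
    for ids, label in encoded:
        error = label - predict(ids)  # 0 when the prediction is right
        if error != 0:
            mistakes += 1
            bias += error
            for i in ids:
                weights[i] += error
    print(f"epoch {epoch}: {mistakes} mistakes")
    if mistakes == 0:  # nothing left to correct
        break
```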
teaching the computer CHOOSE MODEL
Test & Evaluate
Predict the output, then compare it
with the labelled data. Then evaluate
the model on unseen data.
Amazon review: "I love this book, it's
better than I expected & the shipping
was really fast!"
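As a rough sketch of this step, the code below compares predictions against labels to get an accuracy score, then predicts the Amazon review quoted above. The `predict` function is a hypothetical stand-in for a trained model (it just counts words from two tiny hand-written lexicons), and the labelled test reviews are made up.

```python
# Hypothetical stand-in for a trained model: counts words from tiny lexicons.
POSITIVE = {"love", "better", "fast", "great"}
NEGATIVE = {"hate", "slow", "bad", "boring"}


def predict(text):
    """Label a review by comparing its positive and negative word counts."""
    words = [w.strip(".,!?&\"'") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score >= 0 else "negative"


def evaluate(examples):
    """Compare predictions with the known labels and return the accuracy."""
    correct = sum(predict(text) == label for text, label in examples)
    return correct / len(examples)


# Hypothetical labelled reviews that were held back from training.
test_set = [
    ("Great story, I love it", "positive"),
    ("Slow delivery and a boring plot", "negative"),
    ("Bad print quality", "negative"),
    ("Not great", "negative"),  # negation fools this simple stand-in
]

print("accuracy:", evaluate(test_set))

review = ("I love this book, it's better than I expected & the shipping "
          "was really fast!")
print(predict(review))
```

The last test review is there on purpose: the stand-in gets it wrong, which is the kind of weakness you only find by evaluating on data the model has not seen.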
Performance PRE-TRAINED - USING PREVIOUS KNOWLEDGE