Natural language processing (NLP)
Natural language processing (NLP) is the branch of computer science, and more
specifically of artificial intelligence (AI), concerned with giving computers the ability to
understand text and spoken words in much the same way human beings can.
NLP is used in a wide variety of everyday products and services. Some of the most common
ways NLP is used are through voice-activated digital assistants on smartphones, email-
scanning programs used to identify spam, and translation apps that decipher foreign languages.
NLP is used in a wide range of applications, including machine translation, sentiment analysis,
speech recognition, online chatbots, and text classification. Some common techniques used in
NLP include:
1. Tokenization: the process of breaking text into individual words or phrases.
2. Part-of-speech tagging: the process of labelling each word in a sentence with its
grammatical part of speech.
3. Named entity recognition: the process of identifying and categorizing named entities,
such as people, places, and organizations, in text.
4. Sentiment analysis: the process of determining the sentiment of a piece of text, such as
whether it is positive, negative, or neutral.
5. Machine translation: the process of automatically translating text from one language to
another.
6. Text classification: the process of categorizing text into predefined categories or topics.
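As a concrete illustration, the first of these techniques, tokenization, can be sketched in a few lines of pure Python. This is a minimal regex-based tokenizer for illustration only, not a production tool (real tokenizers handle contractions, URLs, and language-specific rules):

```python
import re

def tokenize(text):
    # Match runs of word characters, or any single non-space,
    # non-word character, so punctuation becomes its own token.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("NLP breaks text into tokens!"))
```

Libraries such as NLTK and spaCy provide far more robust tokenizers built on the same idea.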
Recent advances in deep learning, particularly in the area of neural networks, have led to
significant improvements in the performance of NLP systems. Deep learning techniques such
as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have been
applied to tasks such as sentiment analysis and machine translation, achieving state-of-the-art
results.
Common Natural Language Processing (NLP) Tasks:
• Text and speech processing: This includes speech recognition, text-to-speech
processing, and encoding (i.e., converting speech or text into a machine-readable representation).
• Text classification: This includes sentiment analysis, in which the machine
analyzes the qualities, emotions, and even sarcasm in a text and classifies it accordingly.
• Language generation: This includes tasks such as machine translation, summary
writing, essay writing, etc. which aim to produce coherent and fluent text.
• Language interaction: This includes tasks such as dialogue systems, voice assistants,
and chatbots, which aim to enable natural communication between humans and
computers.
NLP techniques are widely used in a variety of applications such as search engines, machine
translation, sentiment analysis, text summarization, question answering, and many more. NLP
research is an active field and recent advancements in deep learning have led to significant
improvements in NLP performance. However, NLP is still a challenging field as it requires an
understanding of both computational and linguistic principles.
Working of Natural Language Processing (NLP)
Working in natural language processing (NLP) typically involves using computational
techniques to analyze and understand human language. This can include tasks such as language
understanding, language generation, and language interaction.
The field is divided into three different parts:
1. Speech Recognition: The translation of spoken language into text.
2. Natural Language Understanding (NLU): The computer’s ability to understand what
we speak.
3. Natural Language Generation (NLG): The generation of natural language by a
computer.
NLU and NLG are the key aspects of how NLP systems work. These two aspects are
very different from each other and are achieved using different methods.
Individuals working in NLP may have a background in computer science, linguistics, or a
related field. They may also have experience with programming languages such as Python
and C++, and be familiar with various NLP libraries and frameworks such as NLTK, spaCy, and
OpenNLP.
Speech Recognition:
• First, the computer must take natural language and convert it into machine-readable
language. This is what speech recognition or speech-to-text does. This is the first step
of NLU.
• Hidden Markov Models (HMMs) have traditionally been used in the majority of speech
recognition systems (more recent systems increasingly rely on deep neural networks).
These are statistical models that use mathematical calculations to determine
what you said in order to convert your speech to text.
• HMMs do this by listening to you talk, breaking it down into small units (typically 10-
20 milliseconds), and comparing it to pre-recorded speech to figure out which phoneme
you uttered in each unit (a phoneme is the smallest unit of speech). The program then
examines the sequence of phonemes and uses statistical analysis to determine the most
likely words and sentences you were speaking.
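The decoding step described above can be sketched with the Viterbi algorithm, the standard dynamic-programming procedure HMM recognizers use to find the most likely state sequence. The "phonemes", observation labels, and probabilities below are invented toy values, not real acoustic data:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    # best[t][s] = probability of the most likely path ending in state s at time t
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (best[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            best[t][s] = prob
            back[t][s] = prev
    # Trace the best path backwards from the most likely final state.
    last = max(best[-1], key=best[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Toy model: two "phonemes" emitting two acoustic feature labels.
states = ["k", "ae"]
start_p = {"k": 0.6, "ae": 0.4}
trans_p = {"k": {"k": 0.1, "ae": 0.9}, "ae": {"k": 0.4, "ae": 0.6}}
emit_p = {"k": {"burst": 0.8, "voiced": 0.2},
          "ae": {"burst": 0.1, "voiced": 0.9}}
print(viterbi(["burst", "voiced"], states, start_p, trans_p, emit_p))
```

A real recognizer works over thousands of states and continuous acoustic features, but the decoding logic is the same.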
Natural Language Understanding (NLU):
The next and hardest step of NLP is the understanding part.
• First, the computer must comprehend the meaning of each word. It tries to figure out
whether the word is a noun or a verb, whether it’s in the past or present tense, and so
on. This is called Part-of-Speech tagging (POS).
• A lexicon (a vocabulary) and a set of grammatical rules are also built into NLP systems.
• The machine should be able to grasp what you said by the conclusion of the process.
There are several challenges in accomplishing this when considering problems such as
words having several meanings (polysemy) or different words having similar meanings
(synonymy), but developers encode rules into their NLU systems and train them to learn
to apply the rules correctly.
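A heavily simplified version of the POS-tagging step might look like the following sketch. The lexicon and suffix rules here are toy assumptions for illustration; real taggers learn these patterns statistically from large annotated corpora:

```python
# A toy part-of-speech tagger: a small hand-built lexicon plus suffix heuristics.
LEXICON = {"the": "DET", "a": "DET", "bird": "NOUN", "dog": "NOUN",
           "sings": "VERB", "runs": "VERB"}

def pos_tag(tokens):
    tags = []
    for tok in tokens:
        if tok.lower() in LEXICON:
            tags.append((tok, LEXICON[tok.lower()]))
        elif tok.endswith("ing") or tok.endswith("ed"):
            tags.append((tok, "VERB"))   # crude suffix heuristic
        else:
            tags.append((tok, "NOUN"))   # default fallback
    return tags

print(pos_tag(["The", "bird", "sings"]))
```

This illustrates the lexicon-plus-rules idea; it cannot resolve ambiguous words like "board", which is exactly where statistical models earn their keep.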
Natural Language Generation (NLG):
NLG is much simpler to accomplish than NLU. NLG converts a computer’s machine-readable language
into text and can also convert that text into audible speech using text-to-speech technology.
• First, the NLP system identifies what data should be converted to text. If you asked the
computer a question about the weather, it most likely did an online search to find your
answer, and from there it decides that the temperature, wind, and humidity are the
factors that should be read aloud to you.
• Then, it organizes the structure of how it’s going to say it. This is similar to NLU,
except in reverse. The NLG system can construct full sentences using a lexicon and a set of
grammar rules.
• Finally, text-to-speech takes over. The text-to-speech engine uses a prosody model to
evaluate the text and identify breaks, duration, and pitch. The engine then combines all
the recorded phonemes into one cohesive string of speech using a speech database.
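The content-selection and realization steps described above can be sketched as a minimal template-based NLG function. The field names and wording here are illustrative assumptions, standing in for whatever a weather service would actually return:

```python
# A minimal template-based NLG step: structured weather data in, a sentence out.
def realize_weather(data):
    # "Sentence planning" here is just a fixed template filled with the
    # selected content (temperature, humidity, wind).
    return (f"It is {data['temp']} degrees with {data['humidity']}% humidity "
            f"and winds of {data['wind']} km/h.")

print(realize_weather({"temp": 21, "humidity": 40, "wind": 12}))
```

Production NLG systems replace the fixed template with grammar-driven or neural sentence planners, but the pipeline (select content, plan the sentence, realize text) is the same.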
Some common roles in Natural Language Processing (NLP) include:
• NLP engineer: designing and implementing NLP systems and models.
• NLP researcher: conducting research on NLP techniques and algorithms.
• ML engineer: designing and deploying various machine learning models, including
NLP models.
• NLP data scientist: analyzing and interpreting NLP data.
• NLP consultant: providing expertise in NLP to organizations and businesses.
Working in NLP can be both challenging and rewarding as it requires a good understanding
of both computational and linguistic principles. NLP is a fast-paced and rapidly changing
field, so it is important for individuals working in NLP to stay up to date with the latest
developments and advancements.
Technologies related to Natural Language Processing
There are a variety of technologies related to natural language processing (NLP) that are used
to analyze and understand human language. Some of the most common include:
1. Machine learning: NLP relies heavily on machine learning techniques such as
supervised and unsupervised learning, deep learning, and reinforcement learning to
train models to understand and generate human language.
2. Natural Language Toolkit (NLTK) and other libraries: NLTK is a popular open-
source library in Python that provides tools for NLP tasks such as tokenization,
stemming, and part-of-speech tagging. Other popular libraries include spaCy,
OpenNLP, and CoreNLP.
3. Parsers: Parsers analyze the syntactic structure of sentences; common approaches
include dependency parsing and constituency parsing.
4. Text-to-Speech (TTS) and Speech-to-Text (STT) systems: TTS systems convert
written text into spoken words, while STT systems convert spoken words into written
text.
5. Named Entity Recognition (NER) systems: NER systems identify and extract named
entities such as people, places, and organizations from the text.
6. Sentiment Analysis: A technique to understand the emotions or opinions expressed in
a piece of text, using lexicon-based, machine learning-based, and deep learning-based
methods.
7. Machine Translation: NLP is used for language translation from one language to
another through a computer.
8. Chatbots: NLP is used in chatbots that communicate with humans or other chatbots
through auditory or textual methods.
9. AI Software: NLP is used in question-answering software for knowledge
representation, analytical reasoning as well as information retrieval.
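As a sketch of the lexicon-based approach mentioned under Sentiment Analysis above, the following toy scorer sums hand-assigned word polarities. The lexicon is invented for illustration; real sentiment lexicons contain thousands of scored entries:

```python
# Lexicon-based sentiment scoring: sum word polarities from a tiny toy lexicon.
POLARITY = {"good": 1, "great": 2, "love": 2, "bad": -1, "awful": -2, "hate": -2}

def sentiment(text):
    # Strip trailing punctuation and lowercase before lexicon lookup;
    # unknown words contribute a score of zero.
    score = sum(POLARITY.get(w.strip(".,!?").lower(), 0) for w in text.split())
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this great phone!"))
```

Note the obvious weakness: "not good" scores positive, since a bag of word polarities ignores negation. Machine learning-based and deep learning-based methods exist precisely to capture such context.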
Applications of Natural Language Processing (NLP):
• Spam Filters: One of the most irritating things about email is spam. Gmail uses natural
language processing (NLP) to discern which emails are legitimate and which are spam.
These spam filters look at the text in all the emails you receive and try to figure out
what it means to see if it’s spam or not.
• Algorithmic Trading: Algorithmic trading is used for predicting stock market
conditions. Using NLP, this technology examines news headlines about companies and
stocks and attempts to comprehend their meaning in order to determine if you should
buy, sell, or hold certain stocks.
• Question Answering: NLP can be seen in action in Google Search or Siri.
A major use of NLP is to make search engines understand the meaning of
what we are asking and to generate natural language in return to give us the answers.
• Summarizing Information: On the internet there is a lot of information, and much of
it comes in the form of long documents or articles. NLP is used to decipher the meaning
of the data and then provide shorter summaries of it so that humans can
comprehend it more quickly.
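The spam-filtering idea above is classically implemented with a naive Bayes text classifier. The following is a minimal sketch trained on a tiny invented corpus, not a real filter; production systems train on millions of messages and many more features:

```python
import math
from collections import Counter

# A toy naive Bayes spam filter trained on a tiny invented corpus.
spam_docs = ["win money now", "free money offer"]
ham_docs = ["meeting at noon", "lunch at noon tomorrow"]

def train(docs):
    counts = Counter(w for d in docs for w in d.split())
    return counts, sum(counts.values())

spam_counts, spam_total = train(spam_docs)
ham_counts, ham_total = train(ham_docs)
vocab_size = len(set(spam_counts) | set(ham_counts))

def log_prob(text, counts, total):
    # Laplace (add-one) smoothing so unseen words do not zero the probability.
    return sum(math.log((counts[w] + 1) / (total + vocab_size))
               for w in text.split())

def classify(text):
    spam_score = log_prob(text, spam_counts, spam_total)
    ham_score = log_prob(text, ham_counts, ham_total)
    return "spam" if spam_score > ham_score else "ham"

print(classify("free money"))
```

Log probabilities are summed rather than multiplied to avoid numerical underflow on long messages.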
Future Scope:
• Bots: Chatbots assist clients in getting to the point quickly by answering inquiries and
referring them to relevant resources and products at any time of day or night. To be
effective, chatbots must be fast, smart, and easy to use. To accomplish this, chatbots
employ NLP to understand language, usually over text or voice-recognition interactions.
• Supporting Invisible UI: Almost every connection we have with machines involves
human communication, both spoken and written. Amazon’s Echo is only one
illustration of the trend toward putting humans in closer contact with technology in the
future. The concept of an invisible or zero user interface will rely on direct
communication between the user and the machine, whether by voice, text, or a
combination of the two. NLP helps to make this concept a real-world thing.
• Smarter Search: NLP’s future also includes improved search, something we’ve been
discussing at Expert System for a long time. Smarter search, which lets a system
understand a customer’s request the way a chatbot does, can enable “search like you talk”
functionality (much like you could query Siri) rather than focusing on keywords or topics.
Google recently announced that NLP capabilities have been added to Google Drive,
allowing users to search for documents and content using natural language.
Components of NLP
There are two components of NLP, as given below −
Natural Language Understanding (NLU)
Understanding involves the following tasks −
• Mapping the given input in natural language into useful representations.
• Analyzing different aspects of the language.
Natural Language Generation (NLG)
It is the process of producing meaningful phrases and sentences in the form of natural
language from some internal representation.
It involves −
• Text planning − It includes retrieving the relevant content from the knowledge base.
• Sentence planning − It includes choosing the required words, forming meaningful
phrases, and setting the tone of the sentence.
• Text Realization − It is mapping the sentence plan into sentence structure.
NLU is harder than NLG.
Difficulties in NLU
Natural language has an extremely rich form and structure, and it is very ambiguous.
There can be different levels of ambiguity −
• Lexical ambiguity − Ambiguity at a very primitive level, such as the word level.
For example, should the word “board” be treated as a noun or a verb?
• Syntax-level ambiguity − A sentence can be parsed in different ways.
For example, “He lifted the beetle with red cap.” − Did he use a cap to lift the beetle,
or did he lift a beetle that had a red cap?
• Referential ambiguity − Referring to something using pronouns. For example, Rima
went to Gauri. She said, “I am tired.” − Exactly who is tired?
• One input can have several meanings.
• Many inputs can mean the same thing.
NLP Terminology
• Phonology − It is the study of organizing sounds systematically.
• Morphology − It is the study of the construction of words from primitive meaningful units.
• Morpheme − It is a primitive unit of meaning in a language.
• Syntax − It refers to arranging words to make a sentence. It also involves determining
the structural role of words in the sentence and in phrases.
• Semantics − It is concerned with the meaning of words and how to combine words
into meaningful phrases and sentences.
• Pragmatics − It deals with using and understanding sentences in different situations
and how the interpretation of the sentence is affected.
• Discourse − It deals with how the immediately preceding sentence can affect the
interpretation of the next sentence.
• World Knowledge − It includes the general knowledge about the world.
Steps in NLP
There are, in general, five steps −
• Lexical Analysis − It involves identifying and analyzing the structure of words.
The lexicon of a language means the collection of words and phrases in that language.
Lexical analysis divides the whole chunk of text into paragraphs, sentences, and
words.
• Syntactic Analysis (Parsing) − It involves analyzing the words in the sentence for
grammar and arranging them in a manner that shows the relationships among the
words. A sentence such as “The school goes to boy” is rejected by an English syntactic
analyzer.
• Semantic Analysis − It draws the exact or dictionary meaning from the
text. The text is checked for meaningfulness. This is done by mapping syntactic
structures onto objects in the task domain. The semantic analyzer disregards sentences
such as “hot ice-cream”.
• Discourse Integration − The meaning of any sentence depends upon the meaning of
the sentence just before it. In addition, it also shapes the meaning of the
immediately succeeding sentence.
• Pragmatic Analysis − During this step, what was said is re-interpreted based on what
was actually meant. It involves deriving those aspects of language which require real-world
knowledge.
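The semantic-analysis step can be illustrated with a toy check that rejects modifier-noun pairs whose features clash, in the spirit of a semantic analyzer disregarding “hot ice-cream”. The feature lexicon below is an invented assumption; real systems derive such constraints from large knowledge bases or learned representations:

```python
# A toy semantic check: reject modifier-noun pairs whose features clash.
FEATURES = {"ice-cream": {"cold"}, "soup": {"hot"}}
CLASH = {"hot": "cold", "cold": "hot"}

def semantically_valid(adjective, noun):
    # Valid unless the adjective's clashing feature is a property of the noun.
    return CLASH.get(adjective) not in FEATURES.get(noun, set())

print(semantically_valid("hot", "ice-cream"))  # "hot" clashes with "cold"
print(semantically_valid("hot", "soup"))
```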
Implementation Aspects of Syntactic Analysis
There are a number of algorithms researchers have developed for syntactic analysis, but we
consider only the following simple methods −
• Context-Free Grammar
• Top-Down Parser
Let us see them in detail −
Context-Free Grammar
It is a grammar that consists of rules with a single symbol on the left-hand side of the rewrite
rules. Let us create a grammar to parse a sentence −
“The bird pecks the grains”
Articles (DET) − a | an | the
Nouns − bird | birds | grain | grains
Noun Phrase (NP) − Article + Noun | Article + Adjective + Noun
= DET N | DET ADJ N
Verbs − pecks | pecking | pecked
Verb Phrase (VP) − Verb + Noun Phrase = V NP
Adjectives (ADJ) − beautiful | small | chirping
The parse tree breaks down the sentence into structured parts so that the computer can easily
understand and process it. In order for the parsing algorithm to construct this parse tree, a set
of rewrite rules, which describe what tree structures are legal, need to be constructed.
These rules say that a certain symbol may be expanded in the tree into a sequence of other
symbols. According to the first rule, if there is a Noun Phrase (NP) followed by a
Verb Phrase (VP), then the combined string forms a sentence. The rewrite
rules for the sentence are as follows −
S → NP VP
NP → DET N | DET ADJ N
VP → V NP
Lexicon −
DET → a | the
ADJ → beautiful | perching
N → bird | birds | grain | grains
V → peck | pecks | pecking
The parse tree for “The bird pecks the grains” can be drawn as −
S
├── NP
│   ├── DET − The
│   └── N − bird
└── VP
    ├── V − pecks
    └── NP
        ├── DET − the
        └── N − grains
Now consider the above rewrite rules. Since V can be replaced by either "peck" or "pecks",
sentences such as "The bird peck the grains" are wrongly permitted, i.e., the subject-verb
agreement error is accepted as correct.
Merit − It is the simplest style of grammar and is therefore widely used.
Demerits −
• They are not highly precise. For example, “The grains peck the bird” is
syntactically correct according to the parser, but even though it makes no sense, the
parser accepts it as a correct sentence.
• To achieve high precision, multiple sets of grammar rules need to be prepared. It may
require a completely different set of rules for parsing singular and plural variations,
passive sentences, etc., which can lead to the creation of a huge, unmanageable set of
rules.
Top-Down Parser
Here, the parser starts with the S symbol and attempts to rewrite it into a sequence of terminal
symbols that matches the classes of the words in the input sentence until it consists entirely of
terminal symbols.
These are then checked against the input sentence to see if they match. If not, the process is
started over again with a different set of rules. This is repeated until a specific rule is found
that describes the structure of the sentence.
Merit − It is simple to implement.
Demerits −
• It is inefficient, as the search process must be repeated if an error occurs.
• It is slow.
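The top-down strategy described here can be sketched as a small recursive-descent parser for the grammar given in the Context-Free Grammar section (S → NP VP, NP → DET N | DET ADJ N, VP → V NP):

```python
# A minimal top-down (recursive-descent) recognizer for the bird/grains grammar.
LEXICON = {
    "DET": {"a", "the"},
    "ADJ": {"beautiful", "perching"},
    "N": {"bird", "birds", "grain", "grains"},
    "V": {"peck", "pecks", "pecking"},
}
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"], ["DET", "ADJ", "N"]],
    "VP": [["V", "NP"]],
}

def parse(symbol, tokens, pos):
    # Try to match `symbol` starting at tokens[pos];
    # return the new position on success, or None on failure.
    if symbol in LEXICON:
        if pos < len(tokens) and tokens[pos] in LEXICON[symbol]:
            return pos + 1
        return None
    for production in GRAMMAR[symbol]:   # try each rewrite rule in turn
        p = pos
        for part in production:
            p = parse(part, tokens, p)
            if p is None:
                break
        else:
            return p
    return None

def accepts(sentence):
    tokens = sentence.lower().split()
    return parse("S", tokens, 0) == len(tokens)

print(accepts("the bird pecks the grains"))
print(accepts("the grains the bird pecks"))
```

This sketch commits to the first alternative that succeeds within a nonterminal and does not backtrack across nonterminals, which is enough for this grammar; it also inherits the demerit noted above, accepting semantically absurd but grammatical sentences such as “the grains peck the bird”.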