1.1chap NLP - Introduction

Uploaded by

vishalmishra0427

Natural Language Processing

• Natural Language Processing (NLP) refers to the AI method of
communicating with an intelligent system using a natural language
such as English.

• Processing of natural language is required when you want to
communicate with an intelligent system, a dialogue-based clinical expert system, etc.

• The field of NLP involves making computers perform useful tasks
with the natural languages humans use. The input and output of an
NLP system can be −
• 1. Speech 2. Written Text


Forms of Natural Language
• The input/output of an NLP system can be:
– written text

– speech

• We will mostly be concerned with written text (not speech).


• To process written text, we need:
– lexical, syntactic, semantic knowledge about the language

– discourse information, real world knowledge

• To process spoken language, we need everything required to process written
text, plus the challenges of speech recognition and speech synthesis.

NLP - an inter-disciplinary Field
• NLP borrows techniques and insights from several disciplines.
• Linguistics: How do words form phrases and sentences? What constrains the
possible meanings of a sentence?
• Computational Linguistics: How is the structure of sentences identified?
How can knowledge and reasoning be modeled?
• Computer Science: Algorithms for automata and parsers.
• Engineering: Stochastic techniques for ambiguity resolution.
• Psychology: What linguistic constructions are easy or difficult for people to
learn to use?
• Philosophy: What is meaning, and how do words and sentences acquire it?
Why is NL Understanding hard?
• Natural language is extremely rich in form and structure, and very
ambiguous.
– How to represent meaning,
– Which structures map to which meaning structures.

• One input can mean many different things. Ambiguity can be at different
levels.
– Lexical (word level) ambiguity -- different meanings of words
– Syntactic ambiguity -- different ways to parse the sentence
– Interpreting partial information -- how to interpret pronouns
– Contextual information -- context of the sentence may affect the meaning of that sentence.

• Many inputs can mean the same thing.


• Interaction among components of the input is not clear.
Knowledge of Language
• Phonology – concerns how words are related to the sounds that realize
them.
• Morphology – concerns how words are constructed from more basic
meaning units called morphemes. A morpheme is the primitive unit of
meaning in a language.

• Syntax – concerns how words can be put together to form correct sentences and
determines what structural role each word plays in the sentence and what
phrases are subparts of other phrases.

• Semantics – concerns what words mean and how these meanings combine in
sentences to form sentence meaning. The study of context-independent
meaning.
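The idea of morphemes as building blocks of words can be illustrated with a rough affix-stripping sketch. This is only an assumption-laden toy: the `PREFIXES` and `SUFFIXES` lists and the `segment` function are invented for this example, and a real morphological analyzer needs a lexicon (this one would wrongly split "under" into "un-" and "der").

```python
# Toy morpheme segmentation by stripping known affixes.
# PREFIXES, SUFFIXES and segment() are hypothetical, for illustration only.

PREFIXES = ("un", "re", "dis")
SUFFIXES = ("ness", "ing", "ed", "ly", "s")


def segment(word):
    """Split a word into [prefix-, stem, -suffix, ...] using the lists above."""
    stem = word.lower()
    prefixes = []
    for prefix in PREFIXES:
        # Only strip when at least three characters of stem remain.
        if stem.startswith(prefix) and len(stem) > len(prefix) + 2:
            prefixes.append(prefix + "-")
            stem = stem[len(prefix):]
            break

    suffixes = []
    stripped = True
    while stripped:  # keep peeling suffixes from the right
        stripped = False
        for suffix in SUFFIXES:
            if stem.endswith(suffix) and len(stem) > len(suffix) + 2:
                suffixes.insert(0, "-" + suffix)
                stem = stem[:-len(suffix)]
                stripped = True
                break

    return prefixes + [stem] + suffixes


print(segment("unkindness"))  # ['un-', 'kind', '-ness']
print(segment("cats"))        # ['cat', '-s']
```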


Knowledge of Language (cont.)
• Pragmatics – concerns how sentences are used in different
situations and how use affects the interpretation of the sentence.
• Discourse – concerns how the immediately preceding sentences
affect the interpretation of the next sentence. For example,
interpreting pronouns and interpreting the temporal aspects of the
information.
• World Knowledge – includes general knowledge about the world.
What each language user must know about the other’s beliefs and
goals.

Components of NLP

• Natural Language Understanding (NLU)
Understanding involves the following tasks −
– Mapping the given input in natural language into useful
representations.
– Analyzing different aspects of the language.
• Natural Language Generation (NLG)
It is the process of producing meaningful phrases and sentences in the
form of natural language from some internal representation.
It involves −
• Text planning − It includes retrieving the relevant content from the
knowledge base.
• Sentence planning − It includes choosing the required words, forming
meaningful phrases, and setting the tone of the sentence.
• Text realization − It is mapping the sentence plan into sentence structure.
• NLU is harder than NLG.
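The three NLG stages listed above can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions, not a real generator: the knowledge base `KB` (a made-up weather forecast), the function names, and the sentence template are all invented for this example.

```python
# Toy NLG pipeline: text planning -> sentence planning -> text realization.
# KB holds made-up forecast data; every name here is hypothetical.

KB = {
    "monday": {"sky": "sunny", "high_c": 21},
    "tuesday": {"sky": "cloudy", "high_c": 17},
}


def plan_text(day, wanted):
    """Text planning: retrieve the relevant content from the knowledge base."""
    facts = KB[day]
    return {"day": day, "facts": {k: facts[k] for k in wanted if k in facts}}


def plan_sentence(content):
    """Sentence planning: choose words and form a phrase for each fact."""
    phrases = []
    for key, value in content["facts"].items():
        if key == "sky":
            phrases.append(f"the sky will be {value}")
        elif key == "high_c":
            phrases.append(f"the high will be {value} degrees Celsius")
    return {"day": content["day"].capitalize(), "phrases": phrases}


def realize(plan):
    """Text realization: map the sentence plan onto a sentence structure."""
    if not plan["phrases"]:
        return ""
    return f"On {plan['day']}, " + " and ".join(plan["phrases"]) + "."


def generate(day, wanted):
    return realize(plan_sentence(plan_text(day, wanted)))


print(generate("monday", ["sky", "high_c"]))
# On Monday, the sky will be sunny and the high will be 21 degrees Celsius.
```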


• Natural Language Understanding
– Mapping the given input in the natural language into a useful
representation.
– Different levels of analysis are required:
morphological analysis,
syntactic analysis,
semantic analysis,
discourse analysis, …
• Natural Language Generation
– Producing output in the natural language from some internal
representation.
– Different levels of synthesis are required:
deep planning, syntactic generation


Natural language understanding
Raw speech signal
• Speech recognition

Sequence of words spoken


• Syntactic analysis using knowledge of the grammar

Structure of the sentence


• Semantic analysis using info. about meaning of words

Partial representation of meaning of sentence


• Pragmatic analysis using info. about context

Final representation of meaning of sentence


Natural Language Understanding
Input/Output data                                   Processing stage      Other data used
Frequency spectrogram                               speech recognition    freq. of diff. sounds
Word sequence: “He loves Mary”                      syntactic analysis    grammar of language
Sentence structure (parse of “He loves Mary”)       semantic analysis     meanings of words
Partial meaning: ∃x loves(x, mary)                  pragmatics            context of utterance
Sentence meaning: loves(john, mary)
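The stages from word sequence to sentence meaning can be sketched in a few lines. This is a toy under heavy assumptions: speech recognition is skipped, the grammar accepts only three-word subject-verb-object sentences, every pronoun becomes the single variable `x`, and the context dictionary that resolves `x` to `john` is supplied by hand. All function names are invented for this example.

```python
# Toy NLU pipeline: syntactic -> semantic -> pragmatic analysis.
# All names and the one-rule grammar are hypothetical.

PRONOUNS = {"he", "she"}


def syntactic_analysis(words):
    """Accept only subject-verb-object sentences of exactly three words."""
    if len(words) != 3:
        raise ValueError("toy grammar only handles subject-verb-object")
    subject, verb, obj = words
    return {"subject": subject, "verb": verb, "object": obj}


def semantic_analysis(structure):
    """Build a partial meaning; a pronoun becomes the unresolved variable 'x'."""
    def term(word):
        return "x" if word.lower() in PRONOUNS else word.lower()

    return (structure["verb"].lower(),
            term(structure["subject"]),
            term(structure["object"]))


def pragmatic_analysis(partial, context):
    """Use the context of the utterance to replace variables with referents."""
    predicate, *args = partial
    return (predicate, *[context.get(arg, arg) for arg in args])


def show(meaning):
    return f"{meaning[0]}({', '.join(meaning[1:])})"


structure = syntactic_analysis("He loves Mary".split())
partial = semantic_analysis(structure)
final = pragmatic_analysis(partial, {"x": "john"})
print(show(partial))  # loves(x, mary)
print(show(final))    # loves(john, mary)
```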
Difficulties in NLU
• NL has an extremely rich form and structure. It is very ambiguous. There
can be different levels of ambiguity −
• Lexical ambiguity − It is at a very primitive level, such as the word level.
• For example, should the word “board” be treated as a noun or a verb?
• Syntax-level ambiguity − A sentence can be parsed in different ways.
• For example, “He lifted the beetle with red cap.” − Did he use a cap to lift the
beetle, or did he lift a beetle that had a red cap?
• Referential ambiguity − Referring to something using pronouns. For
example, Rima went to Gauri. She said, “I am tired.” − Exactly who is tired?
• One input can have different meanings.
• Many inputs can mean the same thing.


Ambiguity is pervasive
Find at least 5 meanings of this sentence:
I made her duck
– I cooked duck for her
– I cooked the duck belonging to her
– I created the (artificial) duck she owns
– I caused her to quickly lower her head or body
– I waved my magic wand and turned her into a duck

‘Duck’ can be a noun or a verb
‘her’ can be a possessive (‘of her’) or dative (‘for
her’) pronoun
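One way to see how quickly readings multiply is to enumerate every combination of word categories. The sketch below assumes a tiny hand-written `LEXICON` (hypothetical, covering only this sentence) and lists category assignments, not full meanings; the different senses of "made" (cooked, created, caused) would multiply the count further.

```python
from itertools import product

# Hypothetical mini-lexicon: each word maps to its possible categories.
LEXICON = {
    "i": ["pronoun"],
    "made": ["verb"],
    "her": ["possessive", "dative"],
    "duck": ["noun", "verb"],
}


def tag_sequences(sentence):
    """Return every way of assigning one category to each word."""
    words = sentence.lower().split()
    options = [LEXICON[word] for word in words]
    return [list(zip(words, tags)) for tags in product(*options)]


for reading in tag_sequences("I made her duck"):
    print(reading)
```

With two categories each for "her" and "duck", the sentence already has 1 × 1 × 2 × 2 = 4 candidate category assignments before any sense distinctions are considered.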
Steps in NLP
• There are, in general, five steps −
• Lexical Analysis − The lexicon of a language is its vocabulary, that is, its words
and expressions. Lexical analysis involves identifying, analyzing, and describing the
structure of words. It includes dividing a text into paragraphs, sentences, and
words. Individual words are analyzed into their components, and
non-word tokens such as punctuation are separated from the words.
• Syntactic Analysis (Parsing) − Syntax refers to the principles and
rules that govern the sentence structure of any individual language. It
involves analysis of the words in the sentence for grammar and arranging
the words in a manner that shows the relationship among them. A
sentence such as “The school goes to boy” is rejected by an English syntactic
analyzer.
• Semantic Analysis − Semantic analysis assigns meanings to the structures
created by the syntactic analyzer. This component maps
linear sequences of words into structures that show how the words are
associated with each other. Semantics focuses only on the literal
meaning of words, phrases, and sentences. It abstracts only the
dictionary meaning, or the real meaning, from the given context. The
structures assigned by the syntactic analyzer always have an assigned
meaning.
• E.g., "colorless green idea" would be rejected by the semantic
analysis, because "colorless green" doesn't make any sense. The
semantic analyzer also disregards phrases such as “hot ice-cream”.


• Discourse Integration − The meaning of any sentence
depends upon the meaning of the sentence just before it. In
addition, it also influences the meaning of the immediately
succeeding sentence.
• Pragmatic Analysis − During this process, what was said
is re-interpreted in terms of what was actually meant. It involves
deriving those aspects of language which require real
world knowledge.
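The first two steps can be sketched with the standard `re` module. This is a rough illustration, not a production tokenizer or parser: the sentence-splitting heuristic, the `POS` table, and the single sentence pattern (determiner noun verb preposition determiner noun) are assumptions chosen so that the slide's example sentence is rejected.

```python
import re


def split_sentences(text):
    """Lexical step: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lexical step: separate words from non-word tokens such as punctuation."""
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", sentence)


# Hypothetical part-of-speech table and a one-rule grammar:
# S -> NP V PP, NP -> Det N, PP -> P NP
POS = {"the": "Det", "boy": "N", "school": "N", "goes": "V", "to": "P"}
PATTERN = ["Det", "N", "V", "P", "Det", "N"]


def is_grammatical(tokens):
    """Syntactic step: accept only word sequences matching PATTERN."""
    tags = [POS.get(token.lower()) for token in tokens if token.isalpha()]
    return tags == PATTERN


print(tokenize("The school goes to boy."))
# ['The', 'school', 'goes', 'to', 'boy', '.']
print(is_grammatical(tokenize("The school goes to boy.")))     # False
print(is_grammatical(tokenize("The boy goes to the school.")))  # True
```

In this toy grammar "The school goes to boy" fails because the final noun has no determiner. A real syntactic analyzer would use a full grammar or a trained parser instead of one fixed pattern.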
Natural Language vs. Computer Language

Parameter      Natural Language                                  Computer Language
Ambiguity      They are ambiguous in nature.                     They are designed to be unambiguous.
Redundancy     Natural languages employ lots of redundancy.      Formal languages are less redundant.
Literalness    Natural languages are made of idiom & metaphor.   Formal languages mean exactly what they say.
Advantages of NLP
• Users can ask questions about any subject and get a direct response within
seconds.
• An NLP system provides answers to questions in natural language.
• An NLP system offers exact answers to questions, with no unnecessary or
unwanted information.
• The accuracy of the answers increases with the amount of relevant information
provided in the question.
• NLP helps computers communicate with humans in their language and
scales other language-related tasks.
• NLP gives structure to a highly unstructured data source.


Disadvantages of NLP
• Complex query language: the system may not be able to
provide the correct answer if the question is poorly
worded or ambiguous.
• The system is built for a single, specific task only; it is
unable to adapt to new domains and problems because of
its limited functions.
• An NLP system may lack a user interface with features
that allow users to interact further with the
system.
NLP Applications
Two main areas:
1. Massive management of textual information sources:
– For human use
– For automatic collection of linguistic resources
2. Person/Machine interaction
NLP Applications
Massive management of textual information
sources
– Machine Translation (MT)
– Information Retrieval (IR)
– Question Answering (Q&A)
– Information Extraction (IE)
– Summarization
Machine Translation
Process of translating a text from a source
language to a target language preserving some
properties
– The main property to preserve (but not the only
one) is the meaning
– MT textual vs oral
– Different degrees of human intervention
Information Retrieval
Input
– A collection of documents (the Web, a corporate document collection, ...)
– A user need represented as a query

Output
– The documents of the collection that satisfy the
user needs.
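The input/output contract above can be sketched with a small inverted index and boolean AND matching. This is one simple retrieval model chosen for illustration (real systems usually rank results); the sample documents and all function names are made up for this example.

```python
import re
from collections import defaultdict


def terms(text):
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in terms(text):
            index[term].add(doc_id)
    return index


def retrieve(query, index):
    """Return ids of documents that contain every term of the query."""
    query_terms = terms(query)
    if not query_terms:
        return []
    postings = [index.get(term, set()) for term in query_terms]
    return sorted(set.intersection(*postings))


# Hypothetical document collection.
DOCS = {
    "d1": "Machine translation preserves meaning.",
    "d2": "Information retrieval finds documents.",
    "d3": "Question answering extends information retrieval.",
}

index = build_index(DOCS)
print(retrieve("information retrieval", index))  # ['d2', 'd3']
```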
Question Answering

• Natural extension of IR
• A QA system receives a query expressed in NL and
tries to provide not a document containing the
answer but the proper answer (usually a fact).
• QA systems need to use NLP techniques for both
processing the question and looking for the answer.
Automatic Summarization
• A summary is a reductive transformation of a source text into a
summary text by extraction or generation.
• Look for the relevant parts of a document and produce a summary of
them.
Summarization vs Information Extraction
- Information Extraction
What has to be extracted is defined a priori: “I am interested in this,
look for it”
- Summarization
An a priori definition of what is relevant is not always available
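Summarization by extraction can be sketched by scoring each sentence with the frequencies of its words and keeping the top scorers. This is a deliberately naive baseline under stated assumptions: the `STOPWORDS` set is a small hand-picked sample, the scoring favors long sentences, and the function names are invented for this example.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, far from complete).
STOPWORDS = {"the", "is", "and", "from", "also", "a", "of", "to"}


def content_words(sentence):
    """Lowercased words of the sentence, without stopwords."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Keep the highest-scoring sentences, in their original order."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(i):
        return sum(freq[w] for w in content_words(sentences[i]))

    ranked = sorted(range(len(sentences)), key=lambda i: -score(i))
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


TEXT = ("NLP systems process text. "
        "NLP systems also process speech and text from users. "
        "The weather is nice.")
print(summarize(TEXT))
# NLP systems also process speech and text from users.
```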


Information Extraction
• Extracting useful information from free text
• Named Entity Recognition (NER)
• Named Entity Classification (NEC)
• Both tasks together (NERC)
• Slot Filling
• Relation Extraction
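Named entity recognition and relation extraction can be illustrated with a pattern-based sketch. It assumes that a named entity is any run of capitalized words and that the only relation of interest is the literal phrase "works at"; both are simplifications, and the names in the sample text are placeholders. Practical systems use trained models instead of hand-written patterns.

```python
import re

# Assumption: a named entity is a run of capitalized words.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
WORKS_AT = re.compile(rf"({NAME}) works at ({NAME})")


def named_entities(text):
    """Naive NER: return every run of capitalized words."""
    return re.findall(NAME, text)


def extract_relations(text):
    """Relation extraction: find (works_at, person, organization) triples."""
    return [("works_at", person, org) for person, org in WORKS_AT.findall(text)]


# Placeholder text; the people and organizations are invented.
TEXT = "Rima Shah works at Acme Labs. Gauri works at Globex."

print(named_entities(TEXT))
# ['Rima Shah', 'Acme Labs', 'Gauri', 'Globex']
print(extract_relations(TEXT))
# [('works_at', 'Rima Shah', 'Acme Labs'), ('works_at', 'Gauri', 'Globex')]
```

The capitalization heuristic would also pick up ordinary sentence-initial words such as "The", which is one reason real NER needs more than surface patterns.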
Natural Language Processing Challenges

• Contextual words and phrases and homonyms


• Synonyms
• Irony and sarcasm
• Ambiguity
• Errors in text or speech
• Colloquialisms and slang
• Domain-specific language
• Low-resource languages
• Lack of research and development
Sentiment Analysis
• Sentiment analysis is used to identify the sentiments expressed across
several posts.
• Companies use sentiment analysis to identify the opinions and
sentiments of their customers online.
• It helps companies understand what their customers think
about their products and services.
• Beyond determining simple polarity, sentiment analysis interprets
sentiments in context to better understand what is behind the
expressed opinion.
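Simple polarity detection, plus a small step toward context, can be sketched with a word lexicon and a negation rule. The word lists and the rule that a negator flips only the next word are assumptions made for this example; they miss contractions such as "isn't", as well as sarcasm and most real-world phrasing.

```python
import re

# Hypothetical polarity lexicon, for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}
NEGATORS = {"not", "never", "no"}


def polarity(post):
    """Classify a post as 'positive', 'negative' or 'neutral'."""
    score = 0
    negate = False
    for word in re.findall(r"[a-z']+", post.lower()):
        if word in NEGATORS:
            negate = True  # flip the polarity of the next word
            continue
        value = (word in POSITIVE) - (word in NEGATIVE)
        score += -value if negate else value
        negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("I love this product"))       # positive
print(polarity("The service was not good"))  # negative
```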
