Natural Language Processing
Introduction
Dr. Pooja Jain
IIIT Nagpur

The Dream
It’d be great if machines could
• Process our email (usefully)
• Translate languages accurately
• Help us manage, summarize, and aggregate information
• Use speech as a UI (when needed)
• Talk to us / listen to us
But they can’t:
• Language is complex, ambiguous, flexible, and subtle
• Good solutions need linguistics and machine learning knowledge

Natural Language Processing
• Branch of computer science focused on developing systems that allow computers to communicate with people using everyday language.
• Also called Computational Linguistics
– Also concerns how computational methods can aid the understanding of human language

What is Natural Language Processing
• Natural Language Processing (NLP) is the study of the computational treatment of natural (human) language.
• Teaching computers how to understand (and generate) human language.
• Decipher meaning out of text available on the web.

What is NLP?
Fundamental goal: deep understanding of broad language
– Not just string processing or keyword matching!
End systems that we want to build:
– Simple: spelling correction, text categorization…
– Complex: speech recognition, machine translation, information extraction, dialog interfaces, question answering…
– Unknown: human-level comprehension

What is Natural Language Processing?
• The study of human languages and how they can be represented computationally and analyzed and generated algorithmically
– The cat is on the mat. --> on (mat, cat)
– on (mat, cat) --> The cat is on the mat.
• Studying NLP involves studying natural language, formal representations, and algorithms for their manipulation
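
The two-way mapping in the example "The cat is on the mat. --> on (mat, cat)" can be illustrated with a toy sketch. This is only an illustration, not a real parser: the `Relation` type, the `understand` and `generate` functions, and the single regular-expression pattern are all assumptions introduced here, and the pattern handles only sentences of the form "The X is P the Y."

```python
import re
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    """A formal representation such as on(mat, cat) (hypothetical type)."""
    predicate: str  # e.g. "on"
    landmark: str   # e.g. "mat"
    figure: str     # e.g. "cat"


# Assumed pattern: covers only "The X is P the Y." sentences.
_PATTERN = re.compile(r"^the (\w+) is (\w+) the (\w+)\.?$", re.IGNORECASE)


def understand(sentence: str) -> Optional[Relation]:
    """Analysis direction: sentence --> formal representation."""
    match = _PATTERN.match(sentence.strip())
    if match is None:
        return None  # outside the toy pattern's coverage
    figure, predicate, landmark = (g.lower() for g in match.groups())
    return Relation(predicate, landmark, figure)


def generate(relation: Relation) -> str:
    """Generation direction: formal representation --> sentence."""
    return f"The {relation.figure} is {relation.predicate} the {relation.landmark}."


if __name__ == "__main__":
    rel = understand("The cat is on the mat.")
    print(rel)            # Relation(predicate='on', landmark='mat', figure='cat')
    print(generate(rel))  # The cat is on the mat.
```

A pattern like this fails on almost any other sentence, which is the point of the slide's remark that NLP is not just string processing or keyword matching.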

Modern Applications
• Search engines (Google, Yahoo!, Bing, Baidu)
• Question answering (IBM’s Watson)
• Natural language assistants (Apple’s Siri)
• Translation systems (Google Translate)
• News digest (Yahoo!)
• Automatic earthquake reports (LA Times)

Perspectivising NLP: Areas of AI and their inter-dependencies
[Diagram of interdependent AI areas: Search, Logic, Knowledge Representation, Machine Learning, Planning, Expert Systems, NLP, Vision, Robotics]
AI is the forcing function for Computer Science

What is NLP
• Branch of AI
• 2 Goals
– Science Goal: Understand the way language operates
– Engineering Goal: Build systems that analyse and generate language; reduce the man-machine gap

Engineering Perspective
Use NLP as part of a larger application:
– Spoken dialogue systems for telephone-based information systems
– Components of web search engines or document retrieval services
• Machine translation
• Question/answering systems
• Text Summarization
– Interface for intelligent tutoring/training systems
Emphasis on
– Robustness (doesn’t collapse on unexpected input)
– Coverage (does something useful with most inputs)
– Efficiency (speech; large document collections)

Cognitive Science Perspective
Goal: gain an understanding of how people comprehend and produce language.
Goal: a model that explains actual human behaviour
Solution must:
– explain psycholinguistic data
– be verified by experimentation

Language as Goal-Oriented Behaviour
• We speak for a reason, e.g.,
– get hearer to believe something
– get hearer to perform some action
– impress hearer
• Language generators must determine how to use linguistic strategies to achieve desired effects
• Language understanders must use linguistic knowledge to recognise speaker’s underlying purpose

Examples
(1) It’s hot in here, isn’t it?
(2) Can you book me a flight to Delhi tomorrow morning?
(3) P: What time does the train for Nagpur leave?
C: 6:00 from platform number 2

Input and Output in NLP
• The field of NLP involves making computers perform useful tasks with the natural languages humans use. The input and output of an NLP system can be:
• Speech
• Written Text

Components of NLP
• Natural Language Understanding (NLU)
• Natural Language Generation (NLG)

NLU
• Mapping the given input in natural language into useful representations.
• Analyzing different aspects of the language.

Natural Language Generation (NLG)
• It is the process of producing meaningful phrases and sentences in the form of natural language from some internal representation.
• It involves:
– Text planning: retrieving the relevant content from the knowledge base.
– Sentence planning: choosing the required words, forming meaningful phrases, and setting the tone of the sentence.
– Text realization: mapping the sentence plan into sentence structure.

Text book
• Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition by Daniel Jurafsky and James H. Martin
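
The three NLG stages described above (text planning, sentence planning, text realization) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not a standard implementation: the `KNOWLEDGE_BASE` contents (taken from the train example), the `SentencePlan` type, the phrase templates, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical knowledge base: (subject, attribute) -> value.
KNOWLEDGE_BASE: Dict[Tuple[str, str], str] = {
    ("train to Nagpur", "departure_time"): "6:00",
    ("train to Nagpur", "platform"): "2",
}

# Hypothetical phrase templates used during sentence planning.
PHRASES: Dict[str, str] = {
    "departure_time": "at {}",
    "platform": "from platform number {}",
}

Fact = Tuple[str, str, str]  # (subject, attribute, value)


@dataclass
class SentencePlan:
    subject: str
    verb: str
    modifiers: List[str]


def plan_text(subject: str, attributes: List[str]) -> List[Fact]:
    """Text planning: retrieve the relevant content from the knowledge base."""
    return [
        (subject, attr, KNOWLEDGE_BASE[(subject, attr)])
        for attr in attributes
        if (subject, attr) in KNOWLEDGE_BASE
    ]


def plan_sentence(facts: List[Fact]) -> SentencePlan:
    """Sentence planning: choose words and form phrases for the facts."""
    if not facts:
        raise ValueError("no content to express")
    subject = facts[0][0]
    modifiers = [PHRASES[attr].format(value) for _, attr, value in facts]
    return SentencePlan(subject=f"the {subject}", verb="leaves", modifiers=modifiers)


def realize(plan: SentencePlan) -> str:
    """Text realization: map the sentence plan onto a surface sentence."""
    sentence = " ".join([plan.subject, plan.verb, *plan.modifiers])
    return sentence[0].upper() + sentence[1:] + "."


if __name__ == "__main__":
    facts = plan_text("train to Nagpur", ["departure_time", "platform"])
    print(realize(plan_sentence(facts)))
    # The train to Nagpur leaves at 6:00 from platform number 2.
```

Keeping the stages as separate functions mirrors the slide's decomposition: content selection can change without touching wording, and wording can change without touching the final string assembly.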