Natural Language Processing (CSAI 8003)
Solomon Teferra Abate (PhD) and
Martha Yifiru Tachbelie (PhD)
For the PhD in Artificial Intelligence at ASTU and EAII
Description
This course will cover
1) fundamentals of NLP
2) statistical methods used for NLP tasks
3) fundamentals of neural networks and deep learning
4) application of deep learning techniques to NLP tasks
Introduction
Let us analyze the conversation between Dave and HAL, characters in 2001: A Space Odyssey
Dave Bowman: Open the pod bay doors, HAL.
HAL: I’m sorry, Dave. I’m afraid I can’t do that.
HAL is capable of interacting with humans via natural language.
Scope
Multimedia and Multimodal Technology
Speech Technology
Text Technology
Language Technology
Knowledge Technology
Levels of Language Processing
Phonetics and Phonology: The study of linguistic sounds
Morphology: The study of the meaningful components of words
Syntax: The study of the structural relationships between words
Semantics: The study of meaning
Pragmatics: The study of how language is used to accomplish goals
Discourse: The study of linguistic units larger than a single utterance
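To make the morphology level more concrete, the toy sketch below splits an English word into a prefix, a stem and suffixes. The affix lists, the `segment` function and the `min_stem` parameter are hypothetical and far too small for real use; practical morphological analyzers rely on much larger lexicons and rule sets, or on models learned from data.

```python
# Hypothetical, tiny affix inventories (illustration only, not a real lexicon).
PREFIXES = ["un", "re", "dis"]
SUFFIXES = ["ness", "ing", "ed", "ly", "s"]


def segment(word: str, min_stem: int = 3) -> list[str]:
    """Greedily strip at most one known prefix, then any known suffixes.

    The stem must keep at least `min_stem` characters, so short words
    such as "red" are not wrongly split into "r" + "ed".
    """
    front = []
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
            front.append(prefix)
            word = word[len(prefix):]
            break

    back = []
    stripped = True
    while stripped:  # keep peeling suffixes from the right while possible
        stripped = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                back.insert(0, suffix)
                word = word[: -len(suffix)]
                stripped = True
                break

    return front + [word] + back


for w in ["unkindness", "replayed", "kindly", "red"]:
    print(w, "->", segment(w))
```

Each morpheme returned is a "meaningful component" in the sense of the definition above, e.g. "un" (negation) + "kind" (stem) + "ness" (noun-forming suffix).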
Major Tasks in NSLP
• Speech recognition
• Text-to-speech
• Speech segmentation
• Optical character recognition
• Truecasing
• Morphological segmentation
• Stemming
• Part-of-speech tagging
• Word segmentation
• Sentence breaking
• Parsing
• Language modeling
• Word sense disambiguation
• Named entity recognition
• Question answering
• Information extraction (IE)
• Information retrieval (IR)
• Query expansion
• Natural language search
• Automatic summarization
• Natural language generation
• Text simplification
• Text-proofing
• Topic segmentation and recognition
• Coreference resolution
• Relationship extraction
• Sentiment analysis
• Automated essay scoring
• Natural language understanding
• Discourse analysis
• Machine translation
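Two of the tasks listed above, sentence breaking and word segmentation (tokenization), can be sketched with regular expressions. This is a minimal sketch under a stated assumption: a sentence ends at ".", "!" or "?" followed by whitespace. It will mis-split abbreviations such as "Dr.", and it is not a substitute for a trained sentence breaker or tokenizer.

```python
import re

# Assumption: a sentence boundary is whitespace preceded by ".", "!" or "?".
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# A token is a word (letters/digits, optionally with one internal apostrophe
# part such as "I'm" or "can't") or a single punctuation character.
TOKEN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def split_sentences(text: str) -> list[str]:
    """Break text into sentences using the naive boundary rule above."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def tokenize(sentence: str) -> list[str]:
    """Segment a sentence into word and punctuation tokens."""
    return TOKEN.findall(sentence)


text = "Open the pod bay doors, HAL. I'm sorry, Dave. I'm afraid I can't do that."
for sentence in split_sentences(text):
    print(tokenize(sentence))
```

For the dialogue from the introduction this yields three sentences, the first tokenized as `['Open', 'the', 'pod', 'bay', 'doors', ',', 'HAL', '.']`. Languages written without spaces between words, or with different punctuation marks, need different rules.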
NLP from different perspectives
Engineering:
How to build a system?
How to select a suitable approach/tool/data source?
How to combine different approaches/tools/data sources?
How to optimize performance with respect to quality and resource requirements?
Science:
Why does an approach/tool/data source work or fail?
Why does approach/tool/data source A work better than B?
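Before asking why A works better than B, one first has to measure both on the same held-out data. The sketch below computes precision, recall and F1 for one class; the gold labels and the outputs of "system A" and "system B" are invented toy data, and `precision_recall_f1` is a hypothetical helper, not a library function.

```python
def precision_recall_f1(gold, predicted, positive):
    """Precision, recall and F1 for one class, given parallel label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented gold labels and outputs of two hypothetical systems A and B.
gold = ["pos", "neg", "pos", "pos", "neg", "neg"]
system_a = ["pos", "neg", "neg", "pos", "neg", "pos"]
system_b = ["pos", "pos", "pos", "pos", "neg", "pos"]

for name, output in [("A", system_a), ("B", system_b)]:
    p, r, f = precision_recall_f1(gold, output, positive="pos")
    print(f"System {name}: P={p:.2f} R={r:.2f} F1={f:.2f}")
# System A: P=0.67 R=0.67 F1=0.67
# System B: P=0.60 R=1.00 F1=0.75
```

Here B has the higher F1 and recall but the lower precision, so "better" depends on the metric that matters for the application; explaining why the difference arises then requires error analysis of the individual mistakes.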
Approaches to NLP
Rule-Based (Hand-Crafted Rules)
Develop rules to process different types of natural language data based on known facts, rules and exception cases.
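As a small illustration of the rule-based approach, the sketch below turns English plural nouns into singulars by checking a table of known exceptions first and then applying ordered suffix rules. The rule set and the `singularize` function are hypothetical and deliberately incomplete.

```python
# Hypothetical exception table: irregular forms that the rules would get wrong.
EXCEPTIONS = {"children": "child", "mice": "mouse", "feet": "foot", "series": "series"}

# Ordered (suffix, replacement) rules; more specific suffixes come first.
RULES = [
    ("ies", "y"),    # studies -> study
    ("sses", "ss"),  # classes -> class
    ("xes", "x"),    # boxes -> box
    ("ss", "ss"),    # class -> class (already singular, leave unchanged)
    ("s", ""),       # words -> word
]


def singularize(noun: str) -> str:
    """Apply exceptions first, then the first matching suffix rule."""
    if noun in EXCEPTIONS:
        return EXCEPTIONS[noun]
    for suffix, replacement in RULES:
        if noun.endswith(suffix) and len(noun) > len(suffix):
            return noun[: -len(suffix)] + replacement
    return noun  # no rule applies, e.g. "data"


for n in ["studies", "classes", "boxes", "words", "children", "buses"]:
    print(n, "->", singularize(n))
```

The last example, "buses", comes out as "buse", which shows the typical cost of this approach: every new error has to be fixed by hand with another rule or exception.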
Machine Learning
Capture patterns from examples (an annotated or unannotated corpus) and apply them to new instances
Supervised: learn by comparing predictions with the expected (labeled) output
Unsupervised: blind learning that creates knowledge by association rather than from predefined outputs
Semi-Supervised: start with a seed of labeled data and iteratively learn using both supervised and unsupervised learning
Deep Generative Learning: advanced unsupervised learning that uses self-generated data closely resembling the original data (e.g., variational autoencoders and/or GANs)
Reinforcement: a machine learning training method based on rewarding desired behaviors and penalizing undesired ones
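As a minimal illustration of supervised learning, the sketch below trains a multinomial Naive Bayes classifier with add-one smoothing on a tiny invented sentiment corpus. Naive Bayes is chosen here only as a simple classical example; the corpus and the function names `train_nb` and `predict_nb` are hypothetical. Unlike hand-crafted rules, the model's word statistics are learned entirely from the labeled examples.

```python
import math
from collections import Counter, defaultdict

# Toy labeled corpus (invented for illustration).
TRAIN = [
    ("i love this movie", "pos"),
    ("a great and fun film", "pos"),
    ("what a great story", "pos"),
    ("i hate this movie", "neg"),
    ("a boring and bad film", "neg"),
    ("bad acting and a dull story", "neg"),
]


def train_nb(examples):
    """Count documents per class and words per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the class maximizing log P(c) + sum of log P(w | c)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            if word in vocab:  # words never seen in training are ignored
                # add-one (Laplace) smoothed log likelihood
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb(TRAIN)
print(predict_nb(model, "a great fun movie"))        # pos
print(predict_nb(model, "a dull and boring movie"))  # neg
```

With only six training sentences this is a demonstration of the idea, not a usable classifier; real systems need far more annotated data and a proper held-out evaluation.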
Assignments
Fundamentals of NLP
• Prepare a presentation on the following topics:
– Introduction (What is NLP? Approaches to NLP, tasks of NLP, foundations of NLP, history of NLP, NLP-related technologies and disciplines, etc.)
– Linguistic related issues in NLP
– Classical/Statistical Machine Learning as applied to NLP (what it is; supervised, semi-supervised, unsupervised and reinforcement learning, etc.)
– Deep Learning/Generative Learning as applied to NLP (What are neural networks and DL? Types/architectures of DL, activation functions, normalization, optimization, hyper-parameters, etc.)
State-of-the-art research on one of the NLP topics (see slide 6)
• Presentation and Report
• Can be on the potential research topic of your project and on one of the human languages (e.g., an Ethiopian language)
• You need to review:
– The current research issues
– State-of-the-art approaches, methods, techniques, results and research gaps
– Related works
Experiment on the development of an
NLP system for a local language
• You need to do the following:
– Formulate your research problem
– Prepare/acquire data
– Select learning algorithms/techniques (preferably DNN/GL), methods and approaches
– Follow the NLP development steps
– Use NLP development ecosystems
– Analyze experimental results
– Present what you have done
– Prepare a report or a publishable manuscript