Phases of NLP:
Natural Language Processing works on multiple levels, and most often these
levels complement one another. This article offers a brief overview of each level
and provides some examples of how they are used in information retrieval.
Morphological
The morphological level of linguistic processing deals with the study of word
structures and word formation, focusing on the analysis of the individual
components of words. The most important unit of morphology, defined as the
“minimal unit of meaning”, is referred to as the morpheme.
Take, for example, the word “unhappiness”. It can be broken down into three
morphemes (prefix, stem, and suffix), each conveying some form of meaning: the
prefix un- expresses negation (“not”), while the suffix -ness denotes “a state of
being”.
The stem happy is considered a free morpheme since it is a “word” in its own
right.
Bound morphemes (prefixes and suffixes) require a free morpheme to which they
can attach, and therefore cannot appear as “words” on their own.
In Information Retrieval, document and query terms can be stemmed to match
morphological variants between the documents and the query, so that the singular
form of a noun in a query will also match its plural form in a document, and vice
versa, thereby increasing recall.
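As a small illustration, the sketch below builds a tiny inverted index over stemmed
document terms, assuming the NLTK package and its PorterStemmer are available; the
document texts, query, and index structure are made up for the example.

# A minimal sketch of stemming-based matching in IR (assumes nltk is installed).
# Morphological variants such as "grain" and "grains" share one index term.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

documents = {
    1: "The farmer stored the grains in the barn",
    2: "A single grain of rice",
}
query = "grain"

# Build a tiny inverted index keyed by stemmed document terms.
index = {}
for doc_id, text in documents.items():
    for term in text.lower().split():
        index.setdefault(stemmer.stem(term), set()).add(doc_id)

# Stem the query the same way; both documents now match.
print(index.get(stemmer.stem(query), set()))  # {1, 2}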
Lexical
Lexical analysis is the process of trying to understand what words mean, intuit
their context, and note the relationship of one word to others. It is often the entry
point to many NLP data pipelines.
Lexical analysis can come in many forms and varieties. It is used as the first
step of a compiler, for example, where it takes a source code file and breaks the
lines of code into a series of "tokens", removing any whitespace or comments.
In other types of analysis, lexical analysis might preserve multiple words
together as an "n-gram" (or a sequence of items).
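As a small, dependency-free illustration, the sketch below tokenizes a sentence and
builds bigrams; a production tokenizer or lexer would of course handle many more cases.

# A toy tokenizer plus n-gram builder (pure Python, illustrative only).
import re

def tokenize(text):
    # Keep alphanumeric runs; discard whitespace and punctuation.
    return re.findall(r"[A-Za-z0-9_]+", text.lower())

def ngrams(tokens, n):
    # Slide a window of size n across the token sequence.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = tokenize("The pitcher walked to the baseball field.")
print(tokens)             # ['the', 'pitcher', 'walked', 'to', 'the', 'baseball', 'field']
print(ngrams(tokens, 2))  # bigrams, including ('baseball', 'field')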
After tokenization, the computer will proceed to look up words in a dictionary
and attempt to extract their meanings.
For a compiler, this would involve finding keywords and associating operations
or variables with the tokens.
In other contexts, such as a chat bot, the lookup may involve using a database to
match intent. As noted above, there are often multiple meanings for a specific
word, which means that the computer has to decide what meaning the word has in
relation to the sentence in which it is used.
This second task is often accomplished by associating each word in the dictionary
with the context of the target word. For example, the phrase "baseball field" may be
tagged in the machine as LOCATION for syntactic analysis.
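The sketch below illustrates such a lookup with a toy, hand-written lexicon: a
multi-word entry is tagged LOCATION, and an ambiguous word is resolved by checking
which of its senses shares the most context words with the rest of the sentence. The
entries, senses, and tag names are all assumptions made for the example.

# A toy lexicon lookup; entries and tags are illustrative, not from a real resource.
lexicon = {
    "baseball field": {"tag": "LOCATION"},
    "pitcher": {
        "senses": {
            "baseball player": {"context": {"baseball", "field", "throw"}},
            "jug for liquids": {"context": {"water", "pour", "glass"}},
        }
    },
}

tokens = ["the", "pitcher", "walked", "to", "the", "baseball field"]
sentence_words = set(" ".join(tokens).split())

for token in tokens:
    entry = lexicon.get(token)
    if not entry:
        continue
    if "tag" in entry:
        print(token, "->", entry["tag"])   # baseball field -> LOCATION
    else:
        # Choose the sense whose context words overlap most with the sentence.
        sense, _ = max(entry["senses"].items(),
                       key=lambda kv: len(kv[1]["context"] & sentence_words))
        print(token, "->", sense)          # pitcher -> baseball player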
Syntactic
The syntax of the input string refers to the arrangement of words in a sentence
so they grammatically make sense. NLP uses syntactic analysis to assess whether
or not the natural language aligns with grammatical or other logical rules.
To apply these grammar rules, a collection of algorithms is utilized to describe
words and derive meaning from them. Syntax techniques that are frequently used
in NLP include the following:
Lemmatization / Stemming - reduces words to simpler forms that have less
variation. Lemmatization uses a dictionary to map each word to its root form
(lemma), while stemming uses simple matching patterns to strip away suffixes
such as 's' and 'ing'. A small sketch contrasting the two follows this list.
Parsing - This is the process of performing grammatical analysis of a given
sentence. A common method is Dependency Parsing, which assesses the
relationships between the words in a sentence.
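The sketch below contrasts lemmatization and stemming using a toy lemma dictionary
and a toy suffix list; real systems rely on resources such as WordNet and far more
careful rules.

# Toy lemmatization (dictionary lookup) versus stemming (suffix stripping).
LEMMAS = {"birds": "bird", "pecked": "peck", "geese": "goose", "better": "good"}
SUFFIXES = ["ing", "ed", "s"]

def lemmatize(word):
    # Dictionary lookup: map an inflected or irregular form to its root word.
    return LEMMAS.get(word, word)

def stem(word):
    # Pattern matching: blindly strip a known suffix if one is present.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

for word in ["birds", "pecked", "geese", "better"]:
    print(word, "-> lemma:", lemmatize(word), "| stem:", stem(word))
# The lemmatizer handles irregular forms ("geese", "better"); the stemmer cannot.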
Nevertheless, syntax can still be under-constrained or ambiguous at times. Consider,
for example, the following simple grammar for parsing the sentence −
“The bird pecks the grains”
Articles (DET) − a | an | the
Adjectives (ADJ) − beautiful | small | chirping
Nouns (N) − bird | birds | grain | grains
Verbs (V) − pecks | pecking | pecked
Noun Phrase (NP) − Article + Noun | Article + Adjective + Noun = DET N | DET ADJ N
Verb Phrase (VP) − Verb + Noun Phrase = V NP
The parse tree breaks down the sentence into structured parts so that the computer
can easily understand and process it. For the parsing algorithm to construct this
parse tree, a set of rewrite rules, which describe what tree structures are legal,
needs to be defined.
These rules say that a certain symbol may be expanded in the tree into a sequence
of other symbols. For example, if there is a Noun Phrase (NP) followed by a Verb
Phrase (VP), then the string formed by NP followed by VP is a sentence (S). The
rewrite rules for the sentence are as follows −
S → NP VP
NP → DET N | DET ADJ N
VP → V NP
Lexicon −
DET → a | the
ADJ → beautiful | perching
N → bird | birds | grain | grains
V → peck | pecks | pecking
Using these rules, the parse tree for "The bird pecks the grains" can be constructed:
(S (NP (DET the) (N bird)) (VP (V pecks) (NP (DET the) (N grains))))
Now consider the rewrite rules above. Since V can be rewritten as either "peck" or
"pecks", a sentence such as "The bird peck the grains" is also wrongly permitted,
i.e. the subject-verb agreement error is accepted as correct.
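The rewrite rules above can be expressed directly with NLTK's context-free grammar
tools, assuming the nltk package is installed; running the sketch shows that the
grammar accepts the ungrammatical sentence as well.

# The rewrite rules encoded as an NLTK CFG; both sentences receive a parse.
import nltk

grammar = nltk.CFG.fromstring("""
S   -> NP VP
NP  -> DET N | DET ADJ N
VP  -> V NP
DET -> 'a' | 'the'
ADJ -> 'beautiful' | 'perching'
N   -> 'bird' | 'birds' | 'grain' | 'grains'
V   -> 'peck' | 'pecks' | 'pecking'
""")
parser = nltk.ChartParser(grammar)

for sentence in ["the bird pecks the grains", "the bird peck the grains"]:
    trees = list(parser.parse(sentence.split()))
    print(sentence, "->", len(trees), "parse(s)")
    for tree in trees:
        print(tree)  # e.g. (S (NP (DET the) (N bird)) (VP (V pecks) (NP (DET the) (N grains))))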
Semantic Analysis
Semantics refers to the meaning that is conveyed by the input text. This analysis is
one of the difficult tasks involved in NLP, as it requires algorithms to understand
the meaning and interpretation of words in addition to the overall structure of a
sentence. Semantic analysis techniques include:
Entity Extraction - This means identifying and extracting categorical entities
such as people, places, companies, or things. It is essential for simplifying the
contextual analysis of natural language (a short sketch follows this list).
Machine Translation - This is used to automatically translate text from one
human language to another.
Natural Language Generation - This is the process of converting the computer's
internal semantic representation into readable human language. It is used by
chatbots to respond to users effectively and realistically.
Natural Language Understanding - This involves converting pieces of text into
logically structured representations that computer programs can easily manipulate.
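As a brief illustration of entity extraction, the sketch below uses spaCy's
pretrained pipeline, assuming both the spacy package and its small English model
(en_core_web_sm) are installed; the exact labels returned depend on the model.

# Named-entity extraction with spaCy (assumes en_core_web_sm is downloaded).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple opened a new office in Paris, and Tim Cook attended the launch.")

for ent in doc.ents:
    # ent.label_ is the predicted category, e.g. ORG, GPE (place), PERSON.
    print(ent.text, "->", ent.label_)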
Discourse Processing
The discourse level of linguistic processing deals with the analysis of structure
and meaning of text beyond a single sentence, making connections between
words and sentences. At this level, Anaphora Resolution is also achieved by
identifying the entity referenced by an anaphor (most commonly, though not
exclusively, a pronoun). An example is sketched below.
Fig: Anaphora Resolution Illustration
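A minimal heuristic sketch of anaphora resolution is given below: each pronoun is
linked to the nearest preceding entity whose number and gender agree. The entity
list, feature table, and example sentence are toy data invented for illustration;
real resolvers use far richer features.

# Toy anaphora resolution: link a pronoun to the nearest compatible antecedent.
# Example discourse: "Alice finished the report and she submitted it."
ENTITIES = [
    {"mention": "Alice", "gender": "f", "number": "sg", "position": 0},
    {"mention": "the report", "gender": "n", "number": "sg", "position": 2},
]
PRONOUNS = {"she": ("f", "sg"), "he": ("m", "sg"), "it": ("n", "sg"), "they": (None, "pl")}

def resolve(pronoun, position):
    gender, number = PRONOUNS[pronoun]
    # Scan candidates from nearest to farthest; the first compatible match wins.
    for entity in sorted(ENTITIES, key=lambda e: e["position"], reverse=True):
        if (entity["position"] < position and entity["number"] == number
                and (gender is None or entity["gender"] == gender)):
            return entity["mention"]
    return None

print("she ->", resolve("she", position=5))  # Alice
print("it  ->", resolve("it", position=7))   # the report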
With the capability to recognize and resolve anaphoric relationships, document and
query representations are improved: at the lexical level, the implicit presence of
concepts is accounted for throughout the document as well as in the query, while at
the semantic and discourse levels, an integrated content representation of the
documents and queries is generated.
Structured documents also benefit from the analysis at the discourse level since
sections can be broken down into (1) title, (2) abstract, (3) introduction, (4) body,
(5) results, (6) analysis, (7) conclusion, and (8) references. Information Retrieval
systems are significantly improved when the specific role of each piece of
information is determined, for example whether it is a conclusion, an opinion, a
prediction, or a fact.
Pragmatic Processing
The pragmatic level of linguistic processing deals with the use of real-world
knowledge and understanding of how this impacts the meaning of what is being
communicated. By analyzing the contextual dimension of the documents and
queries, a more detailed representation is derived.
In Information Retrieval, this level of Natural Language Processing primarily
supports query processing and understanding by integrating the user's history and
goals as well as the context in which the query is made. Such context may include
time and location.
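The sketch below shows one way such context might be attached to a query before
retrieval; the field names and context structure are assumptions made for
illustration rather than a real IR interface.

# Toy pragmatic query enrichment: combine the raw query with user context.
from datetime import datetime

def enrich_query(raw_query, user_context):
    return {
        "terms": raw_query.lower().split(),
        "location": user_context.get("location"),           # e.g. boost nearby results
        "time": user_context.get("time", datetime.now()).isoformat(),
        "recent_topics": user_context.get("history", [])[-3:],  # last few searches
    }

context = {"location": "Manila", "history": ["weather forecast", "flight status"]}
print(enrich_query("good restaurants nearby", context))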
This level of analysis enables major advances in Information Retrieval, as it
facilitates a dialogue between the IR system and its users: the system can elicit
the purpose for which the information being sought will be used, helping to ensure
that the retrieval system is fit for purpose.