Information Retrieval (CSD510)

Introduction

Ayan Das
Instructor Contacts

Instructor: Ayan Das

Email id: [email protected]
TAs
Raj Kumar Saw
Neeraj Singh Dhurvey
Mode of contact
Email or post in Google Classroom.
Classroom interaction.
Phone calls or WhatsApp messages shall be ignored (exception: class representatives).

Introduction (Ayan Das) Information Retrieval (CSD510) 2 / 38

Class timings

Tuesday: 11:00 AM - 11:50 AM

Thursday: 12:00 PM - 12:50 PM
Friday: 10:00 AM - 10:50 AM
Class rules:
Institute rule: 75% attendance is mandatory.
Class rule: Enter the class within 5 minutes of the commencement of
the class.
ATTENDANCE MEANS PHYSICAL PRESENCE IN THE CLASS!!


Lecture basics

Classes will involve both Slides + Board

For the latest/updated slides, download them before each use.
Use of laptops and smartphones is not allowed in the classroom.


Evaluation plan

Quiz 1: 10 marks
Mid semester: 32 marks
Quiz 2: 10 marks
End semester: 48 marks

NOTE
There is no provision for quiz retakes or compensatory vivas!!


Course Webpage

Google classroom link: Information Retrieval (CSD510)

https://classroom.google.com/c/NjUwNjMxODY4Nzg2?cjc=sgj7bhr
Why do I need to check the webpage?
Lecture Notes
Misc. static information about the course.
Announcements, Quiz schedules, and marks.


Life without search engines is difficult to imagine!


Search in Banking and Finance


Search in Sports, travel and entertainment


Education

Education, coding, and study materials

Why care to learn IR and web search?


About 80% of business is conducted on unstructured information.

About 85% of all data stored is held in an unstructured format.
On average, roughly 7 million web pages are added every day.
Unstructured data doubles roughly every three months.


IR as research discipline

ACM’s SIGIR
Special Interest Group on Information Retrieval.
Annual conferences, beginning in 1978.
Awards the Gerard Salton award.

TREC
Annual text retrieval conference, beginning in 1992.
Sponsored by the US National Institute of Standards and Technology as well as the US Department of Defense.
Conducts different tracks, e.g. blogs, genomics, spam.
Provides data sets and test problems.
CLEF, NTCIR and FIRE are some other major IR conferences.

Information retrieval

IR is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).

Core problems of IR

How to store and update large document collections?

Small!!
Scalable!
How to do efficient retrieval?
Speed!
How to do effective retrieval?
Ensure high result quality!


Document vs. Database Records

Document
A document is a collection of free text records written in some
natural language.
Web pages, emails, books, news stories, scholarly papers, text
messages, Powerpoint, PDF, forum postings, patents, tweets,
question-answer postings, blogs, etc.

Database records
Database records (or tuples in relational databases) are typically made
up of well-defined fields (or attributes),
e.g., bank records with account numbers, balances, names, addresses,
social security numbers, dates of birth, etc.
Easy to compare fields with well-defined semantics to queries in order
to find matches.


Document vs. Database Records

Example bank database query

Find records with balance > $50,000 in branches located in Amherst, MA.
Matches are easily found by comparison with the field values of records.

Example search engine query

financial scams since 2019 in India
This text must be compared to the text of entire news stories
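
The contrast between the two kinds of query can be sketched in a few lines of Python. This is only an illustration: the record fields (`balance`, `city`, `state`), the two news snippets and the helper names are made up, and the free-text side uses naive word overlap purely to show that the query has to be compared against the whole text.

```python
# Hypothetical bank records: every field has well-defined semantics.
records = [
    {"account": 1, "balance": 72000, "city": "Amherst", "state": "MA"},
    {"account": 2, "balance": 18000, "city": "Amherst", "state": "MA"},
    {"account": 3, "balance": 95000, "city": "Boston", "state": "MA"},
]

def db_query(rows):
    """Database-style query: exact comparison against field values."""
    return [r["account"] for r in rows
            if r["balance"] > 50000 and r["city"] == "Amherst" and r["state"] == "MA"]

# Hypothetical news stories: free text with no fields to compare against.
stories = {
    "N1": "Police in India uncovered several financial scams in 2021",
    "N2": "India won the cricket series in 2019",
}

def text_overlap(query, text):
    """Count how many query words occur anywhere in the text (naive matching)."""
    text_words = set(text.lower().split())
    return sum(1 for w in query.lower().split() if w in text_words)

query = "financial scams since 2019 in India"
matching_accounts = db_query(records)
overlaps = {doc_id: text_overlap(query, text) for doc_id, text in stories.items()}
```

Note that word overlap cannot interpret "since 2019": the cricket story still scores on the literal token "2019", which is exactly the kind of difficulty that separates text retrieval from field comparison.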


Typical IR tasks

Given
A corpus of textual natural-language documents.
A user query in the form of a textual string.

Find
A ranked set of documents that are relevant to the query.


So, what is relevance?

A relevant document contains the information that a person was looking for when they submitted the query. This may include:
Being on the proper subject.
Being timely (recent information).
Being authoritative (from a trusted source).
Satisfying the goals of the user and their intended use of the information (information need).


What do we do in IR??


Information need

An information need is the topic about which the user wants to know more.
Refers to an individual, hidden cognitive state.
Depends on what the user knows and doesn’t know.
Ill-defined
What is the capital of USA?
Is it really true that addictive substances are mixed in soft drinks?
What is “cloud computing”?


Query
A query is what the user conveys to the IR system to communicate
the information need.
Stated using either:
a list of search terms (the usual case), or
some formal query structure.


Logical view of document
Bag-of-words model: Document usually treated as a multi-set of
index terms or keywords derived from a predefined vocabulary.
Index term is a term that captures the essence of the topic of a
document.
Keywords extracted from a document.
Keywords are derived automatically or generated by a specialist.
Text operations: Operations involved in converting a document to a
bag of words.
reduces the complexity of the document representation.
allows moving the logical view from that of full text to that of a set of
index terms.

Example:
Text: That's one small step for a man, a giant leap for mankind
Bag of words: that's, one, small, step, for (2), a (2), man, giant, leap, mankind

Bag-of-words model

Vocabulary (index terms):
T1 that's, T2 one, T3 small, T4 step, T5 for, T6 a, T7 man, T8 giant, T9 leap, T10 mankind, T11 Abdul, T12 Kalam's, T13 is, T14 India

D1: That's one small step for a man, a giant leap for mankind
D2: Abdul Kalam's small step is a giant leap for India

    T1  T2  T3  T4  T5  T6  T7  T8  T9  T10 T11 T12 T13 T14
D1  1   1   1   1   2   2   1   1   1   1   0   0   0   0
D2  0   0   1   1   1   1   0   1   1   0   1   1   1   1
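
A minimal sketch of how such term-count vectors could be computed. The tokenizer (lowercasing, splitting on anything that is not a letter or an apostrophe) and the choice to order the vocabulary by first appearance are assumptions made for this example, not something the slides prescribe.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())

d1 = "That's one small step for a man, a giant leap for mankind"
d2 = "Abdul Kalam's small step is a giant leap for India"

# Vocabulary in order of first appearance across the two documents.
vocabulary = list(dict.fromkeys(tokenize(d1) + tokenize(d2)))

def to_vector(text):
    """Map a document to its term-count vector over the shared vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]

v1 = to_vector(d1)
v2 = to_vector(d2)
```

Each position of `v1` and `v2` corresponds to one vocabulary term, so the two documents become directly comparable even though word order has been discarded.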


The bag-of-words model

Pros
Simple set-theoretic representation of documents.
Efficient storage and retrieval of individual terms.
IR models using the bag-of-words representation have been found to
perform reasonably well.
Cons
Word order not maintained
Very different documents could have similar representation.
“advantages of C over Java”
AND
“advantages of Java over C”
Document structure information or metadata is ignored.
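
The word-order problem is easy to reproduce. The sketch below assumes plain whitespace tokenization and uses `collections.Counter` as the bag; the helper name is chosen for illustration.

```python
from collections import Counter

def bag_of_words(text):
    """Represent a text as a multiset of lowercase whitespace-separated tokens."""
    return Counter(text.lower().split())

q1 = "advantages of C over Java"
q2 = "advantages of Java over C"

# The two phrases mean opposite things but map to the same bag.
same_bag = bag_of_words(q1) == bag_of_words(q2)
```

Here `same_bag` evaluates to `True`, so any model that sees only the bag cannot tell the two phrases apart.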


Logical view of a document

Figure: Logical view of the document: from full text to a set of index terms


Logical view of a document (contd.)

Stop-word removal
Word categories
Content words: Nouns, verbs, adjectives, adverbs.
Function words: Other parts of speech.
Stop-words
Function words do not bear useful information for IR.
e.g., of, in, about, with, I, although
Removing them reduces the set of representative keywords for a large collection.
The removal of stop-words usually improves IR effectiveness.
Stop-lists
PoS tagging is usually not an integral component of an IR system.
Stop-lists: Lists of stop-words consisting of function words and very
frequent words not to be indexed.
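
Stop-list filtering can be sketched as a simple set lookup. The stop-list below is a tiny hand-made example (the slide's function words plus a few very frequent words); real stop-lists are much longer.

```python
# Hypothetical stop-list for illustration only.
STOP_LIST = {"of", "in", "about", "with", "i", "although", "a", "the", "is", "for"}

def remove_stop_words(text):
    """Lowercase, split on whitespace, and drop every token found in the stop-list."""
    return [w for w in text.lower().split() if w not in STOP_LIST]

kept = remove_stop_words("Abdul Kalam's small step is a giant leap for India")
```

No part-of-speech tagging is involved: a token is dropped purely because it appears in the list.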


Logical view of a document (contd.)
Noun groups
Word retention module.
Required when only nouns are needed by the retrieval system.
Noun groups are identified using a gazetteer list (a list of nouns that is updated constantly).
This eliminates the adjectives, adverbs and verbs.

Stemming

A root word may take different word forms based on its usage in a context.
Stemming is used to normalize the different word forms to a standard form.
Example: computer, compute, computes, computing, computed, computation -> comput
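
A toy suffix stripper is enough to reproduce the "comput" example. It is not the Porter stemmer or any other published algorithm; the suffix list and the minimum stem length of three characters are assumptions chosen for this sketch.

```python
# Suffixes are tried in order, longest first, so "ation" wins over "s" or "e".
SUFFIXES = ["ation", "ing", "ed", "es", "er", "e", "s"]

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

forms = ["computer", "compute", "computes", "computing", "computed", "computation"]
stems = {toy_stem(w) for w in forms}
```

All six forms collapse to the single stem `comput`. A rule set this crude will over-stem or under-stem many other words, which is why practical systems rely on carefully designed stemmers.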
Retrieval process

The retrieval process

Before the retrieval process (RP) can be initiated, it is necessary to define the text database (DB).

This is done by the DB manager, who specifies the following:
The documents to be used.
The operations to be performed on the text.
The text model, i.e. the text structure and what elements can be retrieved.
Text operations transform the original documents and generate a logical view of them.
The DB manager builds an index of the text, such as an "inverted file".
Query operations are used to generate the actual "query" based on the user's needs.
The query is processed to retrieve the relevant documents.
The retrieved documents are ranked before being sent to the user.


The retrieval process (contd.)

Text Operations form index words (tokens).

Stop-word removal
Stemming
Indexing constructs an inverted index of word to document pointers.
Searching retrieves documents that contain a given query token from
the inverted index.
Ranking scores all retrieved documents according to a relevance
metric.
User Interface manages interaction with the user:
Query input and document output.
Relevance feedback.
Visualization of results.
Query Operations transform the query to improve retrieval:
Query expansion
Query transformation using relevance feedback.
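
The components listed above can be wired together in a small end-to-end sketch. Everything here is an assumption made for illustration: the stop-list, the two-document collection, and the scoring rule (sum of term frequencies of the query terms), which is just one simple choice of relevance metric. Stemming, the user interface and query operations are omitted.

```python
from collections import Counter, defaultdict

STOP_LIST = {"a", "is", "for", "the", "of"}  # hypothetical stop-list

def text_operations(text):
    """Text operations: lowercase, strip punctuation, drop stop-words."""
    tokens = [w.strip(".,") for w in text.lower().split()]
    return [t for t in tokens if t and t not in STOP_LIST]

docs = {
    "D1": "That's one small step for a man, a giant leap for mankind",
    "D2": "Abdul Kalam's small step is a giant leap for India",
}

# Indexing: inverted index mapping term -> {doc_id: term frequency}.
index = defaultdict(dict)
for doc_id, text in docs.items():
    for term, tf in Counter(text_operations(text)).items():
        index[term][doc_id] = tf

def search(query):
    """Searching and ranking: score each document by summed query-term frequency."""
    scores = Counter()
    for term in text_operations(query):
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

results = search("giant leap for India")
```

Because the query passes through the same text operations as the documents, "for" is dropped and "India" is lowercased before the index lookup.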


Modelling

Figure: Documents (D1 to D6) and the user query each pass through text processing to produce index terms (bag-of-words representation); the query is matched against the documents, which are then ranked.

IR systems usually adopt index terms to process queries.

Index term:
a keyword or group of selected words
any word (more general)
Stemming might be used
connect: connecting, connection, connections
An inverted file is built for the chosen index terms.
A ranking is an ordering of the documents retrieved in response to the user query.
A ranking is based on fundamental premises regarding the notion of
relevance, such as:
common sets of index terms
sharing of weighted terms
likelihood of relevance
Each set of premises leads to a distinct IR model.
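
As one illustration of the first premise (common sets of index terms), documents can be ranked by how much their index-term set overlaps with the query's. The Jaccard coefficient used below is an assumed choice of overlap measure, and the three small documents are invented for the example.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union of two term sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical documents, already reduced to index terms.
docs = {
    "D1": ["small", "step", "man", "giant", "leap", "mankind"],
    "D2": ["small", "step", "giant", "leap", "india"],
    "D3": ["cricket", "india", "series"],
}
query = ["giant", "leap", "india"]

# Rank by decreasing overlap; ties broken by document id.
ranking = sorted(docs, key=lambda d: (-jaccard(query, docs[d]), d))
```

Swapping the scoring function for one based on weighted terms or on an estimated likelihood of relevance would give a different model over the same index terms.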


Simplest notion of Relevance from Retrieval Models’ Perspective

Keyword Search
Simplest notion of relevance is that the query string appears verbatim
in the document.
Slightly less strict notion is that (most of) the words in the query
appear frequently in the document, in any order (bag of words).
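
The two notions can be compared side by side. In this sketch `min_fraction` is an assumed parameter standing in for "most of" the query words, and the looser test checks only presence of each word, not how frequently it occurs.

```python
def verbatim_match(query, document):
    """Strict notion: the query string appears verbatim in the document."""
    return query.lower() in document.lower()

def bag_match(query, document, min_fraction=0.5):
    """Looser notion: enough of the query words appear, in any order."""
    doc_words = set(document.lower().split())
    q_words = query.lower().split()
    if not q_words:
        return False
    hits = sum(1 for w in q_words if w in doc_words)
    return hits / len(q_words) >= min_fraction

doc = "a giant leap for india is a small step"
strict = verbatim_match("small step for india", doc)
loose = bag_match("small step for india", doc)
```

The document contains every query word but not in the query's order, so the strict test fails while the bag-of-words test succeeds.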


Problems with Keyword Search

Term mismatch
May not retrieve relevant documents that include synonymous terms
PRC vs. China
car vs. automobile

Ambiguity
May retrieve irrelevant documents that include ambiguous terms (due to polysemy)
‘Apple’ (company vs. fruit)
‘Java’ (programming language vs. island)
‘Python’ (programming language vs. snake)
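
Term mismatch can be reproduced with a plain keyword matcher, and partly worked around by expanding query terms with synonyms. The synonym table below is hand-made and hypothetical, and this sketch does nothing about the ambiguity problem.

```python
# Hypothetical, hand-made synonym table for illustration.
SYNONYMS = {"car": {"automobile"}, "prc": {"china"}}

def keyword_match(query, document):
    """Every query word must appear literally in the document."""
    doc_words = set(document.lower().split())
    return all(w in doc_words for w in query.lower().split())

def expanded_match(query, document):
    """Each query word may be satisfied by itself or by any listed synonym."""
    doc_words = set(document.lower().split())
    for w in query.lower().split():
        candidates = {w} | SYNONYMS.get(w, set())
        if not candidates & doc_words:
            return False
    return True

doc = "used automobile prices rose last year"
plain = keyword_match("car prices", doc)
expanded = expanded_match("car prices", doc)
```

The plain matcher misses the document because it says "automobile" rather than "car"; the expanded matcher finds it.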


Topics to be covered in the course
1 Boolean retrieval
2 The term vocabulary & postings lists
3 Dictionaries and tolerant retrieval
4 Index construction and compression
5 Scoring, term weighting & the vector space model
6 Computing scores in a complete search system
7 Evaluation in information retrieval
8 Relevance feedback & query expansion
9 Probabilistic information retrieval
10 Language models for information retrieval
11 Text classification
12 Link analysis – HITS, PageRank
13 Learning to Rank
14 Neural IR - Word embeddings, Semantic Matching - DSSM

Books

Textbooks
1 Introduction to Information Retrieval - Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze: Cambridge University Press.
Reference books
1 Mining of Massive Datasets - Jure Leskovec, Anand Rajaraman, Jeff Ullman: Cambridge University Press.
2 Mining the Web: Discovering Knowledge from Hypertext Data - Soumen Chakrabarti: Morgan Kaufmann Series in Data Management Systems.
3 An Introduction to Neural Information Retrieval - Bhaskar Mitra, Nick Craswell: NOW publishers.
4 Other materials (if required) shall be made available in the Google Classroom.

Technologies & Frameworks
