
Lucene and Solr Search Engine Guide

Apache Lucene is a free and open-source information retrieval software library written in Java. It allows developers to add full-text search and indexing capabilities to applications. Solr is an open-source enterprise search platform built on Lucene that provides powerful indexing, searching, and retrieval capabilities across various repositories. It allows developers to easily develop search and analytics applications through REST-like APIs and a web interface for administration. Both Lucene and Solr use tokenization, filtering, and analysis to process content for indexing and searching.

Uploaded by

vikashvardhan


Search Engine Functionality for LLP

Apache Lucene Library and Solr Enterprise Search Server

Apache Lucene

• A high-performance, full-featured text search engine library written entirely in Java.

• It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

Features: Lucene is designed to make it easy to add indexing and search capability to a broad range of applications, including:

• Searchable email: An email application could let users search archived messages and add new messages to the index as they arrive.

• Online documentation search: A documentation reader (CD-based, Web-based, or embedded within the application) could let users search online documentation or archived publications.

• Searchable Webpages: A Web browser or proxy server could build a personal search engine to index every Webpage a user has visited, allowing users to easily revisit pages.

• Website search: A CGI program could let users search your Website.

• Content search: An application could let the user search saved documents for specific content; this could be integrated into the Open Document dialog.

• Version control and content management: A document management system could index documents, or document versions, so they can be easily retrieved.

• News and wire service feeds: A news server or relay could index articles as they arrive.

Usage: Lucene can be used as follows:

• Indexing side: Write code to add Documents to the index.

• Search side: Write code to transform the user's query into Lucene Query instances.

• Submit the Query to Lucene to search.

• Display the results.
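The four steps above can be sketched with a toy in-memory index. This is only an illustration in Python, not Lucene's API: the ToyIndex class, its method names, and the sample documents are all made up for this example, and real Lucene code would use its own Document, IndexWriter, and Query classes.

```python
from collections import defaultdict


class ToyIndex:
    """A tiny in-memory inverted index (hypothetical; not Lucene's API)."""

    def __init__(self):
        self.docs = {}                    # doc id -> original fields
        self.postings = defaultdict(set)  # term -> ids of docs containing it

    def add_document(self, doc_id, fields):
        # Indexing side: store the document and index every word of every field.
        self.docs[doc_id] = fields
        for text in fields.values():
            for term in text.lower().split():
                self.postings[term].add(doc_id)

    def search(self, query):
        # Search side: turn the user query into terms and require all of them.
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)


index = ToyIndex()
index.add_document("1", {"subject": "Quarterly report", "body": "Sales rose in March"})
index.add_document("2", {"subject": "Lunch", "body": "Report to the cafeteria"})

# Submit the query, then display the results.
for doc_id in index.search("report sales"):
    print(doc_id, index.docs[doc_id]["subject"])
```

Only document 1 contains both query terms, so it is the only hit. A real engine would also rank the hits, which this sketch leaves out.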

A Document is one or more Fields. A Field consists of a name, content, and metadata on how to handle the content. Content is made searchable by analyzing it. Analysis is performed by chaining together a Tokenizer, which splits an input stream into words (tokens), and zero or more TokenFilters, which can alter (for example, stem) or remove tokens.
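The idea of chaining a tokenizer with token filters can be sketched in a few lines of Python. This is a simplified illustration, not Lucene's Tokenizer or TokenFilter classes: the function names, the stop-word list, and the crude plural "stemmer" are assumptions made up for this example.

```python
import re

# Hypothetical stop-word list, chosen only for this example.
STOP_WORDS = {"a", "an", "and", "the", "of", "to"}


def tokenizer(text):
    # Split the input into word tokens (runs of letters and digits).
    return re.findall(r"[A-Za-z0-9]+", text)


def lowercase_filter(tokens):
    # Alter tokens: normalize case.
    return [t.lower() for t in tokens]


def stop_filter(tokens):
    # Remove tokens: drop common words.
    return [t for t in tokens if t not in STOP_WORDS]


def plural_stem_filter(tokens):
    # Alter tokens: a crude stand-in for stemming that strips a trailing "s".
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]


def analyze(text, filters=(lowercase_filter, stop_filter, plural_stem_filter)):
    # Run the tokenizer first, then each filter in order.
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("The Hurricanes and the Canes"))  # ['hurricane', 'cane']
```

The order of the filters matters: the stop filter runs after lowercasing so that "The" and "the" are both removed.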

Indexing: The process of preparing and adding text to Lucene. The key point is that Lucene only indexes strings, i.e.:

• Lucene doesn’t care about XML, Word, PDF, etc.

• There are many good open-source extractors available.

• We need to convert whatever file format we have into plain text that Lucene can index.
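As a small illustration of the extraction step, the sketch below flattens an XML document into a plain string that could then be handed to an indexer. It uses only Python's standard library; the xml_to_text helper and the sample memo are made up for this example, and formats such as Word or PDF would need a dedicated extractor instead.

```python
import xml.etree.ElementTree as ET


def xml_to_text(xml_source):
    # Collect the text between the tags and discard the markup itself.
    root = ET.fromstring(xml_source)
    pieces = (piece.strip() for piece in root.itertext())
    return " ".join(piece for piece in pieces if piece)


# Hypothetical input document.
sample = "<memo><to>Sales team</to><body>Targets are <b>up</b> this quarter.</body></memo>"
print(xml_to_text(sample))  # Sales team Targets are up this quarter.
```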

Solr
• Solr is an open source enterprise search server based on the
Lucene Java search library, with XML/HTTP and JSON APIs, hit
highlighting, faceted search, caching, replication, a web
administration interface and many more features. It runs in a
Java servlet container such as Tomcat.

Features: Solr is packaged as a Java 5 web application (WAR) with a web-services-like API. Documents are put into it (called "indexing") via XML over HTTP, and it is queried via HTTP GET, returning XML results.

• Advanced Full-Text Search Capabilities

• Optimized for High Volume Web Traffic

• Standards Based Open Interfaces - XML and HTTP

• Server statistics exposed over JMX for monitoring

• Scalability - Efficient Replication to other Solr Search Servers

• Flexible and Adaptable with XML configuration

• Extensible Plugin Architecture

[Figure: the Solr admin console]

Usage: Conceptually, Solr can be broken down into four main areas:

• Schema (schema.xml) - describes the data

• Configuration (solrconfig.xml) - describes how people can interact with the data

• Indexing

• Searching

As with Lucene, content is made searchable by analyzing it, which means chaining together a Tokenizer and zero or more TokenFilters. The Solr schema makes it easy to configure this analysis process without code.

Configuration: The solrconfig.xml file specifies how Solr should handle indexing, highlighting, faceting, search, and other requests, as well as attributes specifying how caching should be handled and how Lucene should manage the index.

Indexing and searching: Both happen via HTTP requests sent to the Solr server. The index is modified by POSTing XML documents containing instructions to add (or update) documents, delete documents, and commit pending adds and deletes.
• Loading data: Send XML add commands over HTTP. For example:

<add><doc>
<field name="id">canes</field>
<field name="name">Carolina Hurricanes</field>
</doc></add>
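A non-Java client could build these update messages and prepare the POST as in the Python sketch below. It uses only the standard library. The update URL (host, port, and path) is an assumption for a local install, and the helper names are made up for this example; the request is built but deliberately not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed address of the update handler; adjust for your own installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"

# Commit message that makes pending adds and deletes visible.
COMMIT_XML = "<commit/>"


def build_add(fields):
    # Build <add><doc><field name="...">...</field></doc></add>.
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = value
    return ET.tostring(add, encoding="unicode")


def build_delete(doc_id):
    # Build <delete><id>...</id></delete>.
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = doc_id
    return ET.tostring(delete, encoding="unicode")


def update_request(xml_body, url=SOLR_UPDATE_URL):
    # Prepare (but do not send) an HTTP POST carrying the XML message.
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


add_xml = build_add({"id": "canes", "name": "Carolina Hurricanes"})
print(add_xml)
request = update_request(add_xml)
# urllib.request.urlopen(request) would send it to a running server.
```

Building the XML with ElementTree rather than string concatenation means characters such as & and < in field values are escaped automatically.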

• Querying data: Send an HTTP GET or POST request whose parameters specify the query options:

o http://solr/select?q=electronics

o http://solr/select?q=electronics&sort=price+desc
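The query URLs above can be assembled from parameters rather than typed by hand, which takes care of the encoding (note how the space in "price desc" becomes "+"). A minimal Python sketch follows; the select_url helper is made up for this example, and the base URL is simply the one used in the text.

```python
from urllib.parse import urlencode

# Base URL as written in the examples above.
SOLR_SELECT_URL = "http://solr/select"


def select_url(query, **options):
    # q carries the query; any other keyword becomes an extra parameter.
    params = {"q": query}
    params.update(options)
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(select_url("electronics"))
print(select_url("electronics", sort="price desc"))
```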

• The canonical response format is XML:

</lst>
<result name="response" numFound="14" start="0">
<doc>
<str>electronics</str>
<str>connector</str>
</arr>
<str>car power adapter, white</str>
</arr>
<str name="id">F8V7067APLKIT</str> ...
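A client can read such a response with any XML parser. The Python sketch below parses a small, well-formed sample shaped like the excerpt above. The sample is an assumption: the responseHeader contents and the field name "cat" are filled in only to make it parse and are not taken from the original output, and parse_response is a made-up helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only numFound, the <str> values and the id come from the text.
SAMPLE_RESPONSE = """\
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
  </lst>
  <result name="response" numFound="14" start="0">
    <doc>
      <arr name="cat">
        <str>electronics</str>
        <str>connector</str>
      </arr>
      <str name="id">F8V7067APLKIT</str>
    </doc>
  </result>
</response>
"""


def parse_response(xml_text):
    # Return the total hit count and one dict per <doc> in the response.
    root = ET.fromstring(xml_text)
    result = root.find("result")
    docs = []
    for doc in result.findall("doc"):
        fields = {}
        for child in doc:
            if child.tag == "arr":
                # Multi-valued field: collect the text of each item.
                fields[child.get("name")] = [item.text for item in child]
            else:
                fields[child.get("name")] = child.text
        docs.append(fields)
    return int(result.get("numFound")), docs


total, docs = parse_response(SAMPLE_RESPONSE)
print(total, docs[0]["id"], docs[0]["cat"])
```

Here numFound is the total number of matches, which can be larger than the number of documents returned on one page of results.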

Lucene v. Solr

Lucene                                    | Solr
Embedded / lightweight                    | Server-side
No container                              | HTTP as communication language
Provides low-level control over all       | Ease of setup and configuration
aspects of the process                    |
Thick clients                             | Can be used by non-Java clients
Needed when using features not            | Distributed replication/caching
available in Solr                         | out of the box
JDK 1.4                                   | JDK 1.5

Links for installation and documentation:

Lucene:

http://lucene.apache.org/java/2_4_0/gettingstarted.html (official website)

http://www.ibm.com/developerworks/web/library/wa-lucene2/?S_TACT=105AGY82&S_CMP=GENSITE

Solr:

http://lucene.apache.org/solr/tutorial.html (official website)

http://www.ibm.com/developerworks/opensource/library/j-solr-update/index.html?ca=drs-

