
An Elasticsearch Crash Course

Elasticsearch is Everywhere

Why?

Elasticsearch

Jared and Corin @ flickr http://bit.ly/1qHHMPu

Some Use Cases

Searching pieces of pure text (books, legal documents, blog posts)

Searching text + structured data (products, user profiles, application logs)

Pure aggregated data (statistics, metrics, etc.)

Geo Search

Distributed JSON Document DB (Anything)

At a High Level

It is a database, like any other
Document oriented
Clustered
Built on Lucene
Built on an information retrieval (IR) foundation
Can perform fancy tricks with inverted indexes and automata

The Basics of the ES API

Getting Data Into ES

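Storing a Document

(Annotated slide: the verb is PUT, the index is "literature", the type is "quote", the document ID is "one", and the JSON body is the document.)

curl -XPUT 'http://localhost:9200/literature/quote/one' -H 'Content-Type: application/json' -d'
{
  "person": "Jack Handy",
  "said": "The face of a child can say it all, especially the mouth part of the face"
}'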

Where does the document go?

Indexes live in the cluster.
Documents live in indexes.

(Diagram: one cluster containing three indexes, each holding several documents.)

Key Nouns

Documents

A single, arbitrary JSON object

Stored as a text blob plus indexes on its fields

All fields get one or more inverted indexes

{
  "person": "Sam",
  "foods": ["Green eggs", "ham"],
  "likeswith": {
    "place": "house",
    "companion": "mouse",
    "age": 10
  }
}
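To make "all fields get indexed" concrete, here is a minimal Python sketch that walks the Sam document and lists the fields an indexer would see. It assumes, as a simplification, that inner objects become dotted field names (likeswith.place) and arrays become multi-valued fields, which is roughly how Elasticsearch treats object fields; `flatten_fields` is a hypothetical helper, not part of any Elasticsearch client.

```python
from typing import Any


def flatten_fields(doc: dict[str, Any], prefix: str = "") -> dict[str, list[Any]]:
    """Map each leaf field of a JSON document to the list of its values.

    Inner objects become dotted paths ("likeswith.place") and arrays become
    multi-valued fields. This is a simplified model, not Elasticsearch code.
    """
    fields: dict[str, list[Any]] = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into the inner object, extending the dotted path.
            for sub_path, sub_values in flatten_fields(value, path + ".").items():
                fields.setdefault(sub_path, []).extend(sub_values)
        elif isinstance(value, list):
            # An array contributes several values to a single field.
            fields.setdefault(path, []).extend(value)
        else:
            fields.setdefault(path, []).append(value)
    return fields


doc = {
    "person": "Sam",
    "foods": ["Green eggs", "ham"],
    "likeswith": {"place": "house", "companion": "mouse", "age": 10},
}
for field, values in flatten_fields(doc).items():
    print(field, values)
```

Each resulting field (person, foods, likeswith.place, likeswith.companion, likeswith.age) would then get its own inverted index.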

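Types

Define the schema (mapping) for documents

Define indexing rules as well

{
  "human": {
    "properties": {
      "person": {"type": "string"},
      "age": {"type": "integer"}
    }
  }
}

(Types and the "string" field type belong to early Elasticsearch releases: "string" was replaced by "text" and "keyword" in 5.0, and mapping types were removed in 7.0.)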

Indexes

Largest building block in ES

Container for documents and types

Composable: one query can search several indexes at once

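Document Storage

(Diagram: documents stored in an index, with the quotes truncated on the slide.)
{ "_id": 1, "person": "Jack Handy", "said": "The face of …" }
{ "_id": 2, "person": "George Eliot", "said": "Wear a …" }
{ "_id": 3, "person": "Ben Franklin", "said": "Any fool can …" }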

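Routing

Hash-based routing: each document goes to shard hash(_routing) % number_of_primary_shards, where _routing defaults to the document ID.

(Diagram: an index split into four shards, SHARD 1 through SHARD 4.)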
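The routing rule can be sketched in a few lines of Python. This is an illustration under stated assumptions: `route_to_shard` is a hypothetical name, and `zlib.crc32` stands in for the hash function Elasticsearch really uses, so the shard numbers will not match a real cluster. Shards are numbered from 0 here.

```python
import zlib


def route_to_shard(routing: str, number_of_primary_shards: int) -> int:
    """Return hash(routing) % number_of_primary_shards.

    zlib.crc32 is only a stand-in; Elasticsearch uses a different hash.
    """
    # By default the routing value is the document ID.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


for doc_id in ["one", "two", "three"]:
    print(doc_id, "-> shard", route_to_shard(doc_id, 4))
```

Because the result depends on number_of_primary_shards, changing that number would send existing documents to different shards, which is why the primary shard count is fixed when an index is created.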

Inside an Elasticsearch Index

(Diagram: an Elasticsearch index made of shards 1 through N; each shard has a primary and replicas 1 through N, and every one of them is a Lucene index.)

Each primary or replica shard is a Lucene index.

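Querying

A Simple Query

(Annotated slide: the verb is POST, the index is "literature", the type is "quote", the action is _search, and the JSON is the search body.)

curl -XPOST 'http://localhost:9200/literature/quote/_search' -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "person": "jack"
    }
  }
}'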

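The Search API in Action

(Diagram: the query goes to the API on any node, which forwards it to SHARD 1 through SHARD 4 of the index and returns a single response.)

Query:
{"query": {"match": {"person": "jack"}}}

Response (truncated on the slide):
{
  "took": 13,
  "timed_out": false,
  "_shards": {
    "total": 5,
    …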
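The receiving node acts as a coordinator: it sends the query to every shard, each shard returns its own best hits, and the coordinator merges them. Below is a minimal sketch of only that merge step, assuming each shard has already returned (score, doc_id) pairs; `merge_shard_hits` and the sample results are hypothetical, and real Elasticsearch adds a separate fetch phase and more bookkeeping.

```python
import heapq
from typing import Iterable


def merge_shard_hits(
    shard_hits: Iterable[list[tuple[float, str]]], size: int
) -> list[tuple[float, str]]:
    """Merge each shard's (score, doc_id) hits into one global top-`size` list."""
    all_hits = (hit for hits in shard_hits for hit in hits)
    # nlargest returns the hits already sorted by descending score.
    return heapq.nlargest(size, all_hits, key=lambda hit: hit[0])


# Hypothetical per-shard results for one query, one list per shard.
shard_results = [
    [(1.2, "doc-a"), (0.3, "doc-b")],  # shard 1
    [(2.5, "doc-c")],                  # shard 2
    [],                                # shard 3 has no matches
    [(0.9, "doc-d"), (0.8, "doc-e")],  # shard 4
]
print(merge_shard_hits(shard_results, 3))
```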

Natural Language Search

Everything should run in sublinear time, usually O(log n)
Martin Fisch @ flickr http://bit.ly/1l4sII3

Think of Your Indexes as Trees

Martin Fisch @ flickr http://bit.ly/1l4sII3

Working with Data in SQL

phrases table (one column, phrase):
The quick brown fox jumped over the lazy dog
The fat brown dog
Raining cats and dogs

Index on phrase

SQL Index as a B-Tree

(Diagram: the index keeps the phrases in sorted order as a tree, with "The fat brown…" at the root, "Raining cats and…" on its left and "The quick brown…" on its right.)

Fast Prefix Search

SELECT * FROM phrases WHERE phrase LIKE 'The%';

Standard B-tree-based indexes are fast at:

Exact matches
Prefix matches

How well does the previous example work given a search for "dog"?

Slow Scan Search

SELECT * FROM phrases WHERE phrase LIKE '%dog%';
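A sorted Python list is a reasonable stand-in for a B-tree here: binary search (`bisect`) finds where a prefix would start, but a substring like "dog" can sit anywhere, so every entry must be checked. The helper names are hypothetical; this sketch only mirrors the two LIKE patterns above.

```python
import bisect

# The phrases from the slides, kept sorted the way a B-tree index keeps its keys.
PHRASES = sorted([
    "The quick brown fox jumped over the lazy dog",
    "The fat brown dog",
    "Raining cats and dogs",
])


def prefix_search(sorted_phrases: list[str], prefix: str) -> list[str]:
    """Like LIKE 'prefix%': binary-search to the first candidate, then walk forward."""
    start = bisect.bisect_left(sorted_phrases, prefix)
    matches = []
    for phrase in sorted_phrases[start:]:
        if not phrase.startswith(prefix):
            break  # sorted order means no later phrase can match
        matches.append(phrase)
    return matches


def substring_search(sorted_phrases: list[str], needle: str) -> list[str]:
    """Like LIKE '%needle%': sorted order does not help, so scan every phrase."""
    return [phrase for phrase in sorted_phrases if needle in phrase]


print(prefix_search(PHRASES, "The"))
print(substring_search(PHRASES, "dog"))
```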

An Inverted Index

Terms: Documents
brown: 1, 2
dog: 1, 2
fat: 2
fox: 1
jump: 1
lazi: 1
over: 1
quick: 1

{ "_id": 1, "phrase": "The quick brown fox jumps over the lazy dog" }
{ "_id": 2, "phrase": "The fat brown dog" }
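Building an inverted index takes very little code. This sketch lowercases and splits on non-alphanumeric characters but does no stemming or stopword removal, so unlike the slide it keeps "the" and "lazy" as they are; `build_inverted_index` is a hypothetical helper.

```python
import re
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercase word to the set of document IDs that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return dict(index)


docs = {
    1: "The quick brown fox jumps over the lazy dog",
    2: "The fat brown dog",
}
index = build_inverted_index(docs)
print(sorted(index["dog"]))  # [1, 2]
print(sorted(index["fox"]))  # [1]
```

A search for "dog" is now one dictionary lookup instead of a scan over every phrase.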

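An Inverted Index as a Tree

(Diagram: the sorted terms arranged as a search tree, with "jump" at the root, "dog" and "over" beneath it, and brown, fat, fox, lazi and quick as the lower nodes.)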

Sequential Scan City

SELECT * FROM phrases WHERE phrase ILIKE 'dog';

Uses an index!

SELECT * FROM phrases WHERE LOWER(phrase) = LOWER('dog');

Making the index

CREATE INDEX lcase_phrase_idx ON phrases (LOWER(phrase));
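The expression-index trick can be tried without a Postgres server. The sketch below swaps in SQLite (through Python's built-in sqlite3 module), which also supports indexes on expressions; the extra one-word row "Dog" is hypothetical test data. It shows both halves of the story: the lookup uses lcase_phrase_idx, but it only finds rows whose whole phrase equals "dog".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phrases (phrase TEXT)")
conn.executemany(
    "INSERT INTO phrases (phrase) VALUES (?)",
    [
        ("The quick brown fox jumped over the lazy dog",),
        ("The fat brown dog",),
        ("Raining cats and dogs",),
        ("Dog",),  # hypothetical row whose whole phrase is "dog" in another case
    ],
)
# Index on an expression, mirroring the Postgres statement above.
conn.execute("CREATE INDEX lcase_phrase_idx ON phrases (LOWER(phrase))")

query = "SELECT phrase FROM phrases WHERE LOWER(phrase) = LOWER('dog')"
rows = conn.execute(query).fetchall()
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(rows)  # only the whole-phrase match
for step in plan:
    print(step[-1])  # the plan's detail text names the index it uses
```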

Text In, Terms Out

Some kind of Text

ANALYZER

[some, kind, of, text]

Analysis
The quick brown fox jumps over the lazy dog

Snowball Analyzer

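[quick (2), brown (3), fox (4), jump (5), over (6), lazi (8), dog (9)]

(Each term is followed by its position in the original text; the stopword "the" is dropped but still uses up positions 1 and 7.)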

Stemming and Stopwords

I jump while she jumps and laughs

Snowball Analyzer

[i (1), jump (2), while (3), she (4), jump (5), laugh (7)]
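A toy analyzer reproduces the shape of these outputs. It is only an approximation: the stopword set is a subset of Lucene's default English list, and `toy_stem` is a crude suffix stripper standing in for the real Snowball stemmer. Positions count the removed stopwords, so in the first sentence the second "the" still occupies position 7 and "lazi" lands at 8.

```python
import re

# A subset of Lucene's default English stopwords (enough for these examples).
STOPWORDS = {"a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
             "if", "in", "into", "is", "it", "no", "not", "of", "on", "or",
             "the", "to", "was", "with"}


def toy_stem(term: str) -> str:
    """Crude suffix stripping. This is NOT the Snowball algorithm."""
    if len(term) > 3 and term.endswith("s") and not term.endswith("ss"):
        term = term[:-1]  # jumps -> jump, laughs -> laugh
    if len(term) > 3 and term.endswith("y"):
        term = term[:-1] + "i"  # lazy -> lazi, mimicking the slide's output
    return term


def analyze(text: str) -> list[tuple[str, int]]:
    """Tokenize, lowercase, drop stopwords and stem, keeping 1-based positions."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [(toy_stem(tok), pos)
            for pos, tok in enumerate(tokens, start=1)
            if tok not in STOPWORDS]


print(analyze("The quick brown fox jumps over the lazy dog"))
print(analyze("I jump while she jumps and laughs"))
```

Keeping positions, rather than renumbering after stopword removal, is what lets phrase queries still check that terms were adjacent in the original text.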

NGrams
news

NGram Analyzer

["n", "e", "w", "s", "ne", "ew", "ws"]

An NGram Search

Query ("new"): ["n", "e", "w", "ne", "ew"]

Good match ("news"): ["n", "e", "w", "s", "ne", "ew", "ws"]

Poor match ("stews"): ["s", "t", "e", "w", "s", "st", "te", "ew", "ws"]
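The n-gram lists above, and the idea that more shared grams means a better match, fit in a few lines. This sketch generates 1- and 2-grams, shortest first, and scores a candidate by the number of distinct grams it shares with the query. The scoring is a simplification: in Elasticsearch the grams are ordinary terms and go through normal relevance scoring. `ngrams` and `overlap` are hypothetical helpers.

```python
def ngrams(word: str, min_gram: int = 1, max_gram: int = 2) -> list[str]:
    """All substrings of length min_gram..max_gram, shortest first."""
    return [word[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(word) - n + 1)]


def overlap(query: str, candidate: str) -> int:
    """Count the distinct grams a query shares with a candidate word."""
    return len(set(ngrams(query)) & set(ngrams(candidate)))


print(ngrams("news"))  # ['n', 'e', 'w', 's', 'ne', 'ew', 'ws']
print(overlap("new", "news"), overlap("new", "stews"))  # 5 3
```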

Path Hierarchy
"/var/lib/racoons"

Path Hierarchy Analyzer

["/var", "/var/lib", "/var/lib/racoons"]
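The path hierarchy output can be mimicked with a short function. This sketch assumes an absolute, "/"-separated path and ignores the real tokenizer's options; `path_hierarchy` is a hypothetical name.

```python
def path_hierarchy(path: str, delimiter: str = "/") -> list[str]:
    """Emit every ancestor of an absolute path, followed by the path itself."""
    parts = [part for part in path.split(delimiter) if part]
    return [delimiter + delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


print(path_hierarchy("/var/lib/racoons"))  # ['/var', '/var/lib', '/var/lib/racoons']
```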

Inverted Index Highlights

M terms map to N documents

Still uses trees, but breaking text up into terms is where the performance comes from

Strings are broken up into linguistic terms (usually words)

Postgres users can do this (in a simple form)

List of ES Analysis Tools

Analyzers: standard, simple, whitespace, stop, keyword, pattern, language analyzers, snowball, custom

Tokenizers: standard, edge ngram, keyword, letter, lowercase, ngram, whitespace, pattern, UAX URL email, path hierarchy, classic, thai

+ Plugins

Token Filters: standard, ASCII folding, length, lowercase, uppercase, ngram, edge ngram, porter stem, shingle, stop, word delimiter, stemmer, stemmer override, keyword marker, keyword repeat, kstem, snowball, phonetic, synonym, compound word, reverse, elision, truncate, unique, pattern capture, pattern replace, trim, limit token count, hunspell, common grams, normalization, CJK width, CJK bigram, delimited payload, keep words, classic, apostrophe

Scoring = Relevance

Search Methodology

Find all the matching docs using a boolean query

Score those docs using a similarity algorithm (TF/IDF; Elasticsearch 5.0 and later default to BM25)

TF/IDF boosts a document's score when:

The matched term is rare in the corpus

The term appears frequently in the document
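The two boosting rules can be seen in a simplified scorer. It borrows the general shape of Lucene's classic similarity (square-root term frequency, logarithmic inverse document frequency) but leaves out field-length norms, query normalization and boosts, so treat the numbers as illustrative only. The function names and the three-phrase corpus are assumptions for the example.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenizer (no stemming or stopwords)."""
    return text.lower().split()


def idf(term: str, corpus: list[list[str]]) -> float:
    """Rarer terms get a larger weight: 1 + ln(N / (df + 1))."""
    df = sum(1 for doc_tokens in corpus if term in doc_tokens)
    return 1.0 + math.log(len(corpus) / (df + 1))


def score(query_terms: list[str], doc_tokens: list[str], corpus: list[list[str]]) -> float:
    """Sum sqrt(term frequency) * idf over the query terms found in the document."""
    counts = Counter(doc_tokens)
    return sum(math.sqrt(counts[t]) * idf(t, corpus) for t in query_terms if counts[t])


corpus = [tokenize(text) for text in [
    "The quick brown fox jumps over the lazy dog",
    "The fat brown dog",
    "Raining cats and dogs",
]]
# A rare term ("fat") outweighs a common one ("brown") in the same document.
print(score(["fat"], corpus[1], corpus), score(["brown"], corpus[1], corpus))
# A term that occurs twice ("the" in doc 0) outweighs a single occurrence.
print(score(["the"], corpus[0], corpus), score(["the"], corpus[1], corpus))
```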

Document Scoring

Results are ordered by score (relevance)
Score based on either TF/IDF or another algorithm
Custom scoring functions can be sent with the query or registered on the server


Query Types

Phrase Queries

Geo Queries

Numeric Range Queries

More Like This Queries

Autocomplete Queries

Query Types
1. match query
2. multi match query
3. bool query
4. boosting query
5. common terms query
6. custom filters score query
7. custom score query
8. custom boost factor query
9. constant score query
10. dis max query
11. field query
12. filtered query
13. fuzzy like this query
14. fuzzy like this field query
15. function score query
16. fuzzy query
17. geoshape query
18. has child query
19. has parent query
20. ids query
21. indices query
22. match all query
23. more like this query
24. more like this field query
25. nested query
26. prefix query
27. query string query
28. simple query string query
29. range query
30. regexp query
31. span first query
32. span multi term query
33. span near query
34. span not query
35. span or query
36. span term query
37. term query
38. terms query
39. top children query
40. wildcard query
41. text query
42. minimum should match
43. multi term query rewrite

Compose Queries with Boolean / DisMax Queries
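The two compound queries combine clause scores differently, and the difference is easy to show. This sketch assumes the per-clause scores are already computed; it follows the documented rules (bool adds up the scores of matching should clauses, dis_max takes the best clause plus tie_breaker times each other matching clause) and ignores details such as the coord factor applied by older versions. Both function names are hypothetical.

```python
def bool_should_score(clause_scores: list[float]) -> float:
    """bool query with only should clauses: matching clause scores are added."""
    return sum(clause_scores)


def dis_max_score(clause_scores: list[float], tie_breaker: float = 0.0) -> float:
    """dis_max: best clause score, plus tie_breaker times the other matching clauses."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    return best + tie_breaker * (sum(clause_scores) - best)


# Hypothetical document matching a "title" clause (1.5) and a "body" clause (0.5).
scores = [1.5, 0.5]
print(bool_should_score(scores))               # 2.0
print(dis_max_score(scores))                   # 1.5
print(dis_max_score(scores, tie_breaker=0.1))  # about 1.55
```

dis_max suits the case where one field matching well should win outright, while bool rewards documents that match many clauses at once.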

Efficient Aggregate Queries: An RDBMS vs Elasticsearch

Elasticsearch is an Information Retrieval (IR) System

An RDBMS is oriented around organizing data

An IR system is oriented around efficient searches

In an RDBMS you create data, then index it

In an IR system you create indexes linked to data

Inverted indexes are fantastically efficient for denormalization

Inverted Indexes for HTTP Logs

Proto terms:
http: 1, 2
https: 3

Path terms:
/foo: 1, 2
/foo/bar: 3

Documents:
{ "_id": 1, "proto": "http", "path": "/foo" }
{ "_id": 2, "proto": "http", "path": "/foo" }
{ "_id": 3, "proto": "https", "path": "/foo/bar" }

Question:
How many requests did we get for each path?

How We Answer It

SQL

SELECT 'path' AS field, path AS term, COUNT(*) FROM logs GROUP BY path
UNION ALL
SELECT 'proto', proto, COUNT(*) FROM logs GROUP BY proto;

ES

{
  "aggs": {
    "path": {
      "terms": { "field": "path" }
    },
    "proto": {
      "terms": { "field": "proto" }
    }
  }
}
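The point of the slide is that the per-term counts are already implicit in the inverted index. The sketch below counts the documents in each term's posting list for the three log documents; it illustrates the idea only and is not how Elasticsearch computes aggregations internally. `field_index` and `terms_counts` are hypothetical helpers.

```python
from collections import defaultdict

logs = [
    {"_id": 1, "proto": "http", "path": "/foo"},
    {"_id": 2, "proto": "http", "path": "/foo"},
    {"_id": 3, "proto": "https", "path": "/foo/bar"},
]


def field_index(docs: list[dict], field: str) -> dict[str, set[int]]:
    """Inverted index for one field: term -> set of document IDs."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc in docs:
        index[doc[field]].add(doc["_id"])
    return index


def terms_counts(docs: list[dict], field: str) -> dict[str, int]:
    """Count requests per term by reading each posting list's size."""
    return {term: len(ids) for term, ids in field_index(docs, field).items()}


print(terms_counts(logs, "path"))   # {'/foo': 2, '/foo/bar': 1}
print(terms_counts(logs, "proto"))  # {'http': 2, 'https': 1}
```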

Question:
How many requests did we get under each different path AND its parents?


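How We Answer It

SQL (crossed out on the slide): the GROUP BY query above only counts exact path values, so a request to /foo/bar is never counted under /foo.

ES: with the path field analyzed by the path hierarchy tokenizer, the same terms aggregation counts every path together with its parents.

{
  "aggs": {
    "path": {
      "terms": { "field": "path" }
    },
    "proto": {
      "terms": { "field": "proto" }
    }
  }
}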
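Counting a path together with its parents amounts to expanding each path into its ancestors before counting, which is what a path hierarchy analyzer does at index time. A minimal sketch over the three logged paths, with a hypothetical `path_hierarchy` helper:

```python
from collections import Counter


def path_hierarchy(path: str, delimiter: str = "/") -> list[str]:
    """Expand an absolute path into its ancestors followed by the path itself."""
    parts = [part for part in path.split(delimiter) if part]
    return [delimiter + delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


# The paths from the three example log documents.
paths = ["/foo", "/foo", "/foo/bar"]
counts = Counter(token for path in paths for token in path_hierarchy(path))
print(dict(counts))  # {'/foo': 3, '/foo/bar': 1}
```

The request to /foo/bar now also counts toward /foo, which is the answer a plain GROUP BY on the stored path cannot give.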

Let's Save Some Space

Space Now Saved!

(Diagram: the proto terms (http, https) and path terms (/foo, /foo/bar) still point at the documents, but the stored documents shrink to just their IDs.)

{ "_id": 1 }
{ "_id": 2 }
{ "_id": 3 }

Reasons to Consider ES

1. Speed
Traditional databases are often slower at full-text search.

2. Relevance
Search is all about relevance. ES and Lucene provide a huge array of tools to ensure results are relevant.

3. Aggregate Statistics
Elasticsearch can be faster than your RDBMS when it comes to aggregate stats.

4. Search Goodies
Users nowadays expect features like ultrafast type-ahead search, "Did you mean?", and "More like this".

Logstash, an ES Success Story

Multi Index Query

Indexes: logs-2013-01, logs-2013-02, logs-2013-03, logs-2013-04, logs-2013-05, logs-2013-06

curl 'http://es.srv/logs-2013-05,logs-2013-06/_search' -H 'Content-Type: application/json' -d '
{
  "query": { … }
}'
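Time-based indexes like these are usually named from the date, so a client can build the index list for a range of months. `monthly_indexes` is a hypothetical helper, the logs-YYYY-MM naming simply follows the slide, and es.srv is the example host from the curl command.

```python
from datetime import date


def monthly_indexes(prefix: str, start: date, end: date) -> list[str]:
    """Index names like logs-2013-05 for every month from start to end, inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}-{year:04d}-{month:02d}")
        month += 1
        if month == 13:  # roll over into January of the next year
            year, month = year + 1, 1
    return names


indexes = monthly_indexes("logs", date(2013, 5, 1), date(2013, 6, 30))
url = "http://es.srv/" + ",".join(indexes) + "/_search"
print(url)  # http://es.srv/logs-2013-05,logs-2013-06/_search
```

Querying only the months of interest means the other monthly indexes are never touched, and old months can be dropped by deleting whole indexes.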

Kibana + Logstash

Generic Document Store

Document Store Properties

Distributed
Excellent read performance / scalability
Mediocre delete/update performance
Rich queries on top of document properties

Things ES is bad at

Extremely high-write environments: Lucene is not write-optimized. You probably won't hit these limits, however.
Large amounts of document churn: deleting documents and re-merging segments can get expensive.
Transactional operations: Lucene is no RDBMS. It is meant for fast, denormalized operations.
Primary store: still too new

Thank You!
Check out our hosted ES solution @
http://found.no
