Elasticsearch, a quick intro
Ahmed El Taweel
@iAhmedeltaweel
Problem, Search!!
Why is search hard?
● Volume
● Complexity
● Diversity
● Badly written search queries :D
Database Search
● Full scan
○ Slow
○ Complex
○ Slow, Slow, slow
● Full Text search ????
○ Works, but!
■ Auto complete / correct
Inverted index
explained!
Theory: ES uses an inverted index to do lookups
● Term dictionary
● Postings list
● Term vector
Diagram reference: here
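A minimal sketch of the idea, not how Lucene actually stores its data: each term in the term dictionary points to a postings list of document ids, so a lookup becomes a dictionary access instead of a full scan. The sample documents and the helper names `build_inverted_index` and `search` are made up for illustration, and the analysis step is just lowercase plus whitespace split.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted postings list of document ids."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive analysis for illustration: lowercase and split on whitespace.
        for term in text.lower().split():
            postings[term].add(doc_id)
    # The sorted keys play the role of the term dictionary.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def search(index, query):
    """Return ids of documents that contain every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents.
docs = {
    1: "the quick brown fox",
    2: "the lazy brown dog",
    3: "quick thinking",
}
index = build_inverted_index(docs)
print(index["brown"])                # postings list for the term "brown"
print(search(index, "quick brown"))  # documents containing both terms
```

A real engine also keeps term frequencies and positions in the postings so it can rank results and answer phrase queries; this sketch only answers "which documents contain these terms".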
Tokenization 101
Text Analysis
Tokenization
breaking a text down into smaller chunks, mostly words.
“Hello world from Ahmed” => [hello, world, from, ahmed]
Normalization
the quick brown fox jumps
● ‘Quick’ can be lowercased: ‘quick’.
● ‘foxes’ can be stemmed, or reduced to its root word: ‘fox’.
● ‘jump’ and ‘leap’ are synonyms and can be indexed as a single word: ‘jump’.
Diagram reference: here
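The two steps can be sketched as a tiny analysis pipeline. This is a toy under stated assumptions, not Elasticsearch's analyzer: the regex tokenizer, the suffix-stripping `naive_stem`, and the one-entry `SYNONYMS` table are all hypothetical stand-ins for the real tokenizers and token filters.

```python
import re

# Hypothetical synonym table, for illustration only.
SYNONYMS = {"leap": "jump"}


def tokenize(text):
    """Tokenization: break the text into word tokens."""
    return re.findall(r"\w+", text)


def naive_stem(token):
    """Toy stemmer: strip a trailing 'es' or 's'. Real stemmers are far more careful."""
    for suffix in ("es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Tokenize, then normalize: lowercase, stem, map synonyms to one term."""
    tokens = [t.lower() for t in tokenize(text)]
    tokens = [naive_stem(t) for t in tokens]
    return [SYNONYMS.get(t, t) for t in tokens]


print(analyze("Hello world from Ahmed"))
print(analyze("Quick foxes leap"))
```

The same analysis has to run on both the indexed text and the query text, otherwise a query for ‘Foxes’ would never match the stored term ‘fox’.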
Elasticsearch,
Really!
What
● First released in 2010. Built on Apache Lucene. Written in Java.
● Provides a distributed, multitenant-capable full-text search engine.
● HTTP web interface. JSON documents.
● Commonly used for:
○ Log analytics.
○ Full-text search.
○ Operational intelligence use cases with Kibana.
Relational DB    Elasticsearch
DB server        ES node
Table            Index
Table Schema     Mapping
Row              Document
Column           Field
Diagram reference here
Take care
“There ain't no such thing as a free lunch”
● Complexity
● Resource-intensive
● Data loss risk
● Query optimization
● Security
● Version compatibility
Near real-time: new documents become searchable within ~1 sec
Document Journey
Indexing
Diagram reference: here and here
Searching
Diagram reference: here
API Convention
The Elasticsearch APIs use JSON over HTTP.
API Types
Document APIs      Single & multi-document API
Search APIs        Search across all indices in ES
Aggregation API    Aggregation for searched data
Index APIs         Operation at the index level
Cluster APIs       Operation at the cluster level
API Convention
Check the cluster health >>> GET -> /_cat/health?v
List all nodes in the cluster >>> GET -> /_cat/nodes?v
List all indices >>> GET -> /_cat/indices?v
Create an index >>> PUT -> /customer?pretty
Index a document with an id >>> PUT -> /customer/_doc/1?pretty
{"name": "John Doe"}
Index a document without an id >>> POST -> /customer/_doc?pretty { ... }
Retrieve a document by id >>> GET -> /customer/_doc/1?pretty
Search documents >>> GET -> /my_index/_search { … }
Delete an index >>> DELETE -> /customer?pretty
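Since everything is JSON over HTTP, any HTTP client works. Below is a rough sketch using only the Python standard library. It builds the requests without sending them, so it runs without a cluster. The `BASE_URL` (a local node on port 9200) and the helper name `build_request` are assumptions, and the document paths use the `_doc` endpoint form of Elasticsearch 7 and later.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local single-node cluster


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request with an optional JSON body."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


create_index = build_request("PUT", "/customer?pretty")
index_doc = build_request("PUT", "/customer/_doc/1?pretty", {"name": "John Doe"})
get_doc = build_request("GET", "/customer/_doc/1?pretty")

for req in (create_index, index_doc, get_doc):
    print(req.get_method(), req.full_url)

# To actually send a request against a running cluster:
# with urllib.request.urlopen(index_doc) as resp:
#     print(resp.read().decode("utf-8"))
```

In a real project the official client library is usually the better choice; the point here is only that the convention is plain HTTP verbs plus JSON bodies.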
Demo
Materials: https://github.com/ahmedeltaweel/elasticsearch-session
Testing
● Query
○ Accuracy
■ Edge cases
○ Performance
■ Metrics
● Data
○ Consistency
○ Mapping
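One possible way to put a number on query accuracy, offered as an assumption rather than the method used in the demo: compare the ids a query returns with a hand-labelled set of relevant ids and compute precision and recall. The helper name `precision_recall` and the sample ids are hypothetical.

```python
def precision_recall(returned_ids, relevant_ids):
    """Compare returned document ids with the ids judged relevant for a query."""
    returned, relevant = set(returned_ids), set(relevant_ids)
    hits = len(returned & relevant)
    # Edge cases: an empty result set or an empty relevant set gives 0.0
    # instead of a division by zero.
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the query returned 4 documents, 3 were judged relevant.
precision, recall = precision_recall([1, 2, 3, 4], [2, 4, 5])
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Running a fixed set of such labelled queries after every mapping or analyzer change gives an early warning when a tweak that fixes one query quietly breaks another.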
Q&A