Elasticsearch, a quick intro
Ahmed El Taweel
@iAhmedeltaweel
ahmed.m.eltaweel@gmail.com
Problem, Search!!
Why is search hard?
 ●   Volume
 ●   Complexity
 ●   Diversity
 ●   Search queries made wrong :D
Database Search
●   Full scan
     ○    Slow
     ○    Complex
     ○    Slow, Slow, slow
●   Full Text search ????
     ○    Works, but!
           ■    Auto complete / correct
Inverted index
explained!
Theory: ES uses an inverted index to do lookups.
 ●   Term dictionary
 ●   Postings list
 ●   Term vector
Diagram reference: here
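To make the term dictionary and postings list concrete, here is a minimal sketch of an inverted index in plain Python. It is an illustration only, not how Lucene stores its index on disk; the function names `build_inverted_index` and `search` and the sample documents are assumptions made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted postings list of document ids."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive analysis: lowercase and split on whitespace.
        for term in text.lower().split():
            postings[term].add(doc_id)
    # The sorted keys play the role of the term dictionary.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def search(index, query):
    """Return ids of documents that contain every query term (AND)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs",
}
index = build_inverted_index(docs)
```

A lookup such as `search(index, "quick brown")` intersects two postings lists instead of scanning every document, which is the point of the structure.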
Tokenization 101
               Text Analysis
Tokenization
breaking a text down into smaller chunks, mostly words.
“Hello world from Ahmed” => [hello, world, from, ahmed]
Normalization
the quick brown fox jumps
 ●    ‘Quick’ can be lowercased: ‘quick’.
 ●    ‘foxes’ can be stemmed, or reduced to its root word: ‘fox’.
 ●    ‘jump’ and ‘leap’ are synonyms and can be indexed as a single word: ‘jump’.
Diagram reference: here
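The two analysis steps can be sketched as a tiny pipeline. This is a rough illustration under loud assumptions: the suffix-stripping `stem` function is a toy, not a real stemmer, and the `SYNONYMS` table is hand-picked for this example. Elasticsearch's built-in analyzers are far more thorough.

```python
import re

# Hypothetical synonym table: map each synonym to one indexed word.
SYNONYMS = {"leap": "jump"}


def tokenize(text):
    """Break text into word tokens."""
    return re.findall(r"\w+", text)


def stem(token):
    """Toy stemmer: strip a trailing 'es' or 's' (illustration only)."""
    for suffix in ("es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def normalize(token):
    """Lowercase, stem, then collapse synonyms."""
    token = stem(token.lower())
    return SYNONYMS.get(token, token)


def analyze(text):
    """Tokenization followed by normalization."""
    return [normalize(token) for token in tokenize(text)]
```

With these definitions, `analyze("Hello world from Ahmed")` yields the token list from the slide, and `analyze("The Quick foxes leap")` shows lowercasing, stemming and synonym mapping together.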
Elasticsearch,
Really!
What
●   13 years old. Built on Apache Lucene. Java based.
●   Distributed, multitenant-capable full-text search engine.
●   HTTP web interface. JSON documents.
●   Commonly used for:
      ○    Log analytics.
      ○    Full-text search.
      ○    Operational intelligence use cases with Kibana.
Relational DB     Elasticsearch
DB server         ES node
Table             Index
Table schema      Mapping
Row               Document
Column            Field
Diagram reference: here
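To connect the analogy to actual request bodies, the sketch below writes a mapping (the "table schema") and a document (the "row") as Python dictionaries. It assumes the typeless mapping format used by Elasticsearch 7.x and later; the `age` field is hypothetical and added only to show a second field type.

```python
import json

# Mapping: the "schema" of the index. Each property is a field ("column").
customer_mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},     # analyzed for full-text search
            "age": {"type": "integer"},   # hypothetical numeric field
        }
    }
}

# Document: one "row", stored as JSON.
customer_document = {"name": "John Doe", "age": 42}

# Both are sent to Elasticsearch as JSON request bodies.
mapping_json = json.dumps(customer_mapping)
document_json = json.dumps(customer_document)
```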
 Take care
“There ain't no such thing as a free lunch”
 ●    Complexity
 ●    Resource-intensive
 ●    Data loss risk
 ●    Query optimization
 ●    Security
 ●    Version compatibility
Near real-time ~1sec
Document Journey
               Indexing
Diagram reference: here and here
               Searching
Diagram reference: here
API Convention
The Elasticsearch APIs use JSON over HTTP.
    API Types
Document APIs     Single- and multi-document operations.
Search APIs       Search across all indices in ES.
Aggregation API   Aggregations over searched data.
Index APIs        Operations at the index level.
Cluster APIs      Operations at the cluster level.
API Convention
Check the cluster health >>> GET -> /_cat/health?v
List all nodes in the cluster >>> GET -> /_cat/nodes?v
List all indices >>> GET -> /_cat/indices?v
Create an index >>> PUT -> /customer?pretty
Index a document with id >>> PUT -> /customer/_doc/1?pretty
                                {"name": "John Doe"}
Index a document without id >>> POST -> /customer/_doc?pretty { ... }
Retrieve a document by id >>> GET -> /customer/_doc/1?pretty
Search documents >>> GET -> /my_index/_search { … }
Delete an index >>> DELETE -> /customer?pretty
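The same calls can be made from code using only the Python standard library. The sketch below is a minimal illustration, not an official client: `BASE_URL` assumes an unsecured single node on localhost:9200, the paths assume the typeless `_doc` endpoints of Elasticsearch 7.x and later, and the helper names `build_request` and `send` are made up for this example. The requests are only built here; nothing is sent until `send` is called.

```python
import json
import urllib.request

# Assumption: a local, unsecured single-node cluster.
BASE_URL = "http://localhost:9200"


def build_request(method, path, body=None):
    """Build an HTTP request with an optional JSON body."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def send(request):
    """Send the request and decode the JSON response (needs a running node)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Index a document with id 1.
index_doc = build_request("PUT", "/customer/_doc/1", {"name": "John Doe"})

# Full-text search with a match query on the name field.
search_docs = build_request(
    "GET", "/customer/_search", {"query": {"match": {"name": "john"}}}
)
```

With a node running, `send(index_doc)` followed by `send(search_docs)` would index the document and then search for it. Because indexing is near real-time, a freshly indexed document may take about a second to become searchable.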
                   Demo
Materials: https://github.com/ahmedeltaweel/elasticsearch-session
Testing
●   Query
     ○ Accuracy
          ■ Edge cases
     ○ Performance
          ■ Metrics
●   Data
     ○ Consistency
     ○ Mapping
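One common way to put a number on query accuracy is precision and recall against a hand-labelled set of relevant documents. The sketch below is one possible approach, not something the session prescribes; the function name `precision_recall` and the sample ids are assumptions.

```python
def precision_recall(returned_ids, relevant_ids):
    """Compare the ids a query returned with the ids judged relevant."""
    returned, relevant = set(returned_ids), set(relevant_ids)
    hits = len(returned & relevant)
    # Edge cases: an empty result set or an empty judgment list.
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the query returned 4 documents, 3 were judged relevant.
precision, recall = precision_recall([1, 2, 3, 4], [2, 4, 5])
```

Here precision is the share of returned documents that are relevant, and recall is the share of relevant documents that were returned. Tracking both over a fixed set of test queries shows whether a mapping or query change made results better or worse.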
Q&A