When SQL is not Enough
…here comes Elasticsearch
 About me
  Project Manager @
     13 years professional experience
     .NET Web Development MCPD
     SQL Server 2012 (MCSA)
  External Expert Horizon 2020
  Business Interests
     Web Development, SOA, Integration
     Security & Performance Optimization
  Contact
       ivelin.andreev@icb.bg
       www.linkedin.com/in/ivelin
       www.slideshare.net/ivoandreev
Agenda
   What
   Why
   Jump start
   Analysis in depth
   Side by side with SQL
   Demo
What is ES
 Powerful real-time search and analytics engine
“…It has a very advanced distributed model, speaks JSON
natively, and exposes many advanced search features,
all seamlessly expressed through JSON DSL…”
                  Shay Banon – Creator, Founder, CTO
 What else…
      Document-oriented
      Sophisticated RESTful API
      Entirely open source
      Based on Apache Lucene
      Requires Java
Popularity (All DB Engines)
                   All DB Engines Ranking
Popularity (Search Engines)
Who Uses ES
“You don’t learn to walk by following
rules. You learn by doing”
                    (Richard Branson)
                           First Steps in Elasticsearch
Terms
   Elasticsearch          RDBMS
   Index                  Database
   Type                   Table
   Field                  Column
   Document               Row
 Scaling
   Cluster; Node; Shard (Primary/ Replica)
RESTful APIs
 Document APIs
   Index, Get, Update, Delete
   Bulk API available
     POST /[index]/[type] { "…","…" }
     GET /[index]/[type]/[ID] { }
     PUT /[index]/[type]/[ID] { "…","…" }
     DELETE /[index]/[type]/[ID]
 Search APIs
   Send/Receive JSON
   Basic queries via query string
  http://localhost:9200/{indexName}/{type}/_search?q=searchstr&size=100
  http://localhost:9200/{index1,index2}/{type}/_search?q=createdby:ivo
  http://localhost:9200/_search?q=tag:spam
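The URL patterns above can be assembled programmatically. What follows is a minimal sketch in Python using only the standard library. `document_path` and `search_url` are hypothetical helper names, not part of any Elasticsearch client, and the base address assumes a local node on the default port.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:9200"  # assumed: local node, default port


def document_path(index, doc_type, doc_id=None):
    """Return the document API path: /index/type or /index/type/id."""
    parts = [index, doc_type]
    if doc_id is not None:
        parts.append(str(doc_id))
    return "/" + "/".join(parts)


def search_url(query, indices=(), doc_type=None, **params):
    """Build a query-string search URL; several indices are comma-joined."""
    parts = []
    if indices:
        parts.append(",".join(indices))
        if doc_type:
            parts.append(doc_type)
    parts.append("_search")
    query_string = urlencode({"q": query, **params})
    return BASE_URL + "/" + "/".join(parts) + "?" + query_string


# Hypothetical index and type names, for illustration only.
url_all = search_url("tag:spam")
url_one = search_url("searchstr", ["mail"], "message", size=100)
```

Note that `urlencode` percent-encodes the colon, so `tag:spam` is sent as `tag%3Aspam`; this is ordinary URL encoding and is decoded again on the server side.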
Query DSL
 Entire JSON object is the Query DSL
 Query
    Full text queries
    Results ordered by relevance
    Every field is searchable
 Filter
    Binary – either a field matches or it does not
 Filters and queries can be nested
    Nesting passes relevance to parents
Query: for full-text search or for any condition that should affect the relevance score
Filter: for everything else
How To (Filters)
 ES provides 27 filters (Sep 2015)
 Term/Terms filter
   { "term": { "date": "2015-10-10" }}
 Range filter
   {"range": {"age": {"gte":20, "lt":30}}}
 Exists/Missing filter
   {"exists": {"field": "title"}}
 Bool filter
   {"bool": {
       "must": { "term": { "folder": "inbox" }},
       "must_not": { "term": { "tag": "spam" }},
       "should": [{ "term": { "starred": true }}, { "term": { "unread": true }}]
   }}
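To make the "binary" nature of filters concrete, here is a toy evaluator in Python that checks a plain dict against the four filter shapes listed above. It is a sketch of the semantics only, not how Elasticsearch executes filters internally. The function names are hypothetical, and `should` is simplified to "at least one clause must match when present", which is an assumption.

```python
import operator

RANGE_OPS = {"gt": operator.gt, "gte": operator.ge,
             "lt": operator.lt, "lte": operator.le}


def as_list(clauses):
    """Accept either a single clause or a list of clauses."""
    return clauses if isinstance(clauses, list) else [clauses]


def matches(flt, doc):
    """Return True if `doc` (a plain dict) satisfies the filter, else False."""
    (kind, body), = flt.items()  # a filter is a one-key object
    if kind == "term":
        (field, value), = body.items()
        return doc.get(field) == value
    if kind == "range":
        (field, bounds), = body.items()
        if doc.get(field) is None:
            return False
        return all(RANGE_OPS[op](doc[field], bound)
                   for op, bound in bounds.items())
    if kind == "exists":
        return doc.get(body["field"]) is not None
    if kind == "bool":
        must = as_list(body.get("must", []))
        must_not = as_list(body.get("must_not", []))
        should = as_list(body.get("should", []))
        return (all(matches(c, doc) for c in must)
                and not any(matches(c, doc) for c in must_not)
                # Simplifying assumption: one "should" clause must match.
                and (not should or any(matches(c, doc) for c in should)))
    raise ValueError("unsupported filter: " + kind)


INBOX_FILTER = {"bool": {
    "must": {"term": {"folder": "inbox"}},
    "must_not": {"term": {"tag": "spam"}},
    "should": [{"term": {"starred": True}}, {"term": {"unread": True}}],
}}
```

Every call returns a plain boolean; there is no score involved, which is the key difference from a query.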
How To (Queries)
 ES provides 38 queries (Sep 2015)
 match query
  { "match": { "tweet": "About Search" }}
 multi_match query
  { "multi_match": {
       "query": "full text search",
       "fields": [ "title", "body" ] }}
 bool query
  { "bool": {
    "must":         { "match": { "title": "how to make millions" }},
    "must_not": { "match": { "tag": "spam" }},
    "should": [
           { "match": { "tag": "starred" }},
           { "range": { "date": { "gte": "2014-01-01" }}}
       ]}}
 fuzzy query
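Because the Query DSL is plain JSON, a query can be assembled from ordinary dictionaries and serialized. The Python sketch below rebuilds the bool query from this slide; `match` and `bool_query` are hypothetical helpers written for illustration, not functions of an Elasticsearch client.

```python
import json


def match(field, text):
    """Build a match query clause for one field."""
    return {"match": {field: text}}


def bool_query(must=None, must_not=None, should=None):
    """Assemble a bool query, leaving out clauses that were not given."""
    clauses = {"must": must, "must_not": must_not, "should": should}
    return {"bool": {name: value
                     for name, value in clauses.items()
                     if value is not None}}


query = bool_query(
    must=match("title", "how to make millions"),
    must_not=match("tag", "spam"),
    should=[match("tag", "starred"),
            {"range": {"date": {"gte": "2014-01-01"}}}],
)

# A search request body wraps the query under the "query" key.
request_body = json.dumps({"query": query}, indent=2)
```

The resulting `request_body` string is what would be sent as the body of a `_search` request.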
Any index-based search solution is far better than “LIKE”
How does SQL Full-text Index Work
 Column-level language
   Used by stemmers and tokenizers
   Different columns for different languages
   Language tags are respected (XML, binary)
 Stop words
  ALTER FULLTEXT STOPLIST ProductSL
  ADD 'blah' LANGUAGE 1033;
 Thesaurus files
   (e.g. “song” -> “tune”)
Inverted Index
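An inverted index maps each term to the documents that contain it, so a lookup does not have to scan every row the way `LIKE` does. The following is a minimal Python sketch of the idea with made-up sample documents. It only lowercases and splits on whitespace, and is far simpler than what Lucene actually stores.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term of the query."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


# Made-up sample documents, for illustration only.
DOCS = {
    1: "The quick brown fox",
    2: "Quick brown dogs",
    3: "The lazy dog",
}
INDEX = build_inverted_index(DOCS)
```

Searching for `"quick brown"` intersects the posting sets of both terms, so only documents containing both are returned.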
ES Analysis Process
 Character filters
   Simplify data (“&” -> “and”, “ü” -> “u”)
 Tokenizers
   Split data into words (terms, tokens)
 Token filters
   Lowercase
   Remove words w/o relevance impact (“a”, “the”)
   Synonyms added
 Stemming
   Reduce to root form (“dogs” -> “dog”)
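The stages above can be sketched as a small pipeline. The Python below is a toy illustration, not the Elasticsearch implementation: the character map, stop-word list and synonym table are tiny hypothetical samples, and the "stemmer" only strips a trailing plural "s".

```python
import re

CHAR_MAP = {"&": " and ", "ü": "u"}   # character filter rules (sample)
STOP_WORDS = {"a", "the"}             # words without relevance impact (sample)
SYNONYMS = {"tune": ["song"]}         # hypothetical synonym table


def char_filter(text):
    """Simplify raw characters before tokenizing."""
    for src, dst in CHAR_MAP.items():
        text = text.replace(src, dst)
    return text


def tokenize(text):
    """Split the text into word tokens."""
    return re.findall(r"\w+", text)


def token_filters(tokens):
    """Lowercase, drop stop words, and add synonyms after each token."""
    out = []
    for token in tokens:
        token = token.lower()
        if token in STOP_WORDS:
            continue
        out.append(token)
        out.extend(SYNONYMS.get(token, []))
    return out


def stem(token):
    """Toy stemmer: strip a plural 's'; real stemmers are far more careful."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


def analyze(text):
    """Run the full pipeline: char filters, tokenizer, token filters, stemming."""
    return [stem(t) for t in token_filters(tokenize(char_filter(text)))]
```

The order matters: character filters run on the raw string, the tokenizer produces tokens, and the remaining steps operate on one token at a time.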
Analyzers
 Full-text (FT) fields are analyzed into terms to build the inverted index
 Configured when index is created
"Set the shape to semi-transparent by calling set_trans(5)"
Analyzer Type        Example
Whitespace           Set, the, shape, to, semi-transparent, by, calling, set_trans(5)
Standard (default)   set, the, shape, to, semi, transparent, by, calling, set_trans, 5
Simple               set, the, shape, to, semi, transparent, by, calling, set, trans
Stop                 set, shape, semi, transparent, calling, set, trans
Language (EN)        set, shape, semi, transpar, call, set_tran, 5
Pattern              "nonword": { "type": "pattern", "pattern": "[^\\w]+" }
Custom               Combines exactly one Tokenizer [1:1] with any number of TokenFilters [0:N]
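To get a feel for how analyzers differ, the Python sketch below approximates three of them (Whitespace, Simple and Stop) for ASCII text. These are rough stand-ins written for illustration, not the Lucene implementations, and the stop-word set is a small assumed subset of the English list.

```python
import re

TEXT = "Set the shape to semi-transparent by calling set_trans(5)"

# Assumed subset of English stop words, enough for the sample sentence.
ENGLISH_STOP_WORDS = {"a", "an", "and", "by", "the", "to"}


def whitespace_analyzer(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()


def simple_analyzer(text):
    """Lowercase and split on anything that is not a letter (ASCII only here)."""
    return re.findall(r"[a-z]+", text.lower())


def stop_analyzer(text, stop_words=ENGLISH_STOP_WORDS):
    """Like the simple analyzer, but stop words are removed."""
    return [t for t in simple_analyzer(text) if t not in stop_words]


if __name__ == "__main__":
    for analyzer in (whitespace_analyzer, simple_analyzer, stop_analyzer):
        print(analyzer.__name__, analyzer(TEXT))
```

The stop variant drops "the", "to" and "by" from the sample sentence, while the simple variant keeps them and discards the digit.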
Security Remarks
 RAM is Important
   Data structures reside in-memory
   Performance and reliability depend on it
 Be aware
   No authentication!
   Protect private data yourself
   Prevent expensive requests (DoS)
   Protect http://localhost:9200
Side by Side
                         Elasticsearch    SQL Full-text Search
   Performance           RAM mainly       Disk I/O mainly
   Licensing             Open Source      Commercial
   Platform              Any (Java)       Windows Only
   Wildcards             Yes              Partly
   FTS Syntax            Rich             Basic
   Extensibility         Plugins          CLR or custom code
   Scale Out             Yes              No
   Relational Integrity  No               Yes
   Security              No               Yes
   FT Search Setup       Manual           Wizard
   Index Update          Manual           Auto
From SQL to Elasticsearch
 Rivers (deprecated)
 Logstash
   Open source log management tool
 Client libraries
   .NET
      Elasticsearch.Net
      Nest
   Also Java, JS, Perl, Python, Ruby, PHP
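Whichever route is chosen, moving relational data into Elasticsearch comes down to turning rows into JSON documents. The Python sketch below shows one way this might look using the newline-delimited format of the Bulk API. An in-memory SQLite table stands in for SQL Server, and the table, index and type names as well as the `rows_to_bulk` helper are hypothetical. The `_type` metadata field matches the index/type model used in these slides; newer versions may not accept it.

```python
import json
import sqlite3


def rows_to_bulk(rows, index, doc_type, id_field="id"):
    """Turn dict-like rows into a newline-delimited bulk request body."""
    lines = []
    for row in rows:
        doc = dict(row)
        action = {"index": {"_index": index,
                            "_type": doc_type,
                            "_id": doc.pop(id_field)}}
        lines.append(json.dumps(action))  # action/metadata line
        lines.append(json.dumps(doc))     # document source line
    return "\n".join(lines) + "\n"        # the body must end with a newline


# SQLite is only a stand-in for the relational source.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [(1, "Laptop", 999.0), (2, "Mouse", 19.5)])
rows = conn.execute("SELECT id, name, price FROM products ORDER BY id").fetchall()

bulk_body = rows_to_bulk(rows, index="shop", doc_type="product")
```

The resulting `bulk_body` would be sent as the body of a `POST /_bulk` request; the sketch deliberately stops before any network call.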
Summary
   Not a replacement of RDBMS
   Real-time search applications
   Built for scalability
   Easy to install
   RESTful API and JSON
Deployment (Windows)
 Install Java 
 Download ES zip
 Install
   [ESHome]/bin> service install
 Set ES service to start automatically
   [ESHome]/bin> service manager
 Open in browser http://localhost:9200/
 Plugin Install
   [ESHome]/bin> plugin -i elasticsearch/marvel/latest
   Restart ES
Takeaways
 Tools
     Kopf: https://github.com/lmenezes/elasticsearch-kopf
     Marvel: https://www.elastic.co/products/marvel
     Curl: http://curl.haxx.se/download.html
     JDBC Driver: http://www.java2s.com/Code/Jar/s/Downloadsqljdbc430jar.htm
 Community
   https://discuss.elastic.co
 Getting Started
   http://joelabrahamsson.com/elasticsearch-101/