Elastic Stack Overview
The world’s most popular enterprise open source products
for real-time search, logging, analytics, and more
Agenda
• Elastic Stack Overview
• Architecture
• Demos: Logging, Search
• Logstash & Beats
• Elasticsearch
✦ The Distributed Model
✦ Text Analysis
✦ Search
✦ Aggregation
• Kibana
Once upon a time …
• Like any good story, this one begins “Once upon a time...”
✦ More precisely: in 1999, Doug Cutting created an
open-source project called Lucene
• Lucene is:
✦ a search engine library entirely written in Java
✦ a top-level Apache project, as of 2005
✦ great for full-text search
• But, Lucene is also:
✦ a library (you have to incorporate it into your
application)
✦ challenging to use
✦ not originally designed for scaling
The Birth of Elasticsearch
• In 2004, Shay Banon developed a product called Compass
✦ Built on top of Lucene, Shay’s goal was to have search
integrated into Java applications as simply as possible
• The need for scalability became a top priority
• In 2010, Shay completely rewrote Compass with two main
objectives:
1. distributed from the ground up in its design
2. easily used by any programming language
• He called it Elasticsearch ... and we all lived happily ever after!
• Today, Elasticsearch is the most popular enterprise search
engine
• 85,000+ Community Members
• 100M+ Product Downloads
• 3,000+ Subscription Customers
Statistics since 2012, founding of Elastic
Who is using Elasticsearch?
Enterprise Customers in Every Industry: Tech, Finance, Telco, Consumer
Solving Problems Beyond ‘Search’
• “Improving patient care with real-time clinical decision making.”
• “Combating our global human trafficking problem.”
• “Mining 3-4 billion events per day to ensure security intelligence.”
• “Many use cases from trade optimization to compliance to HR recruiting.”
X-Pack
Extensions for the Elastic Stack
• Single install
• Subscription pricing
• Features: Security, Alerting, Monitoring, Reporting, Graph, Machine Learning
Elastic Cloud
Hosted Elasticsearch & Kibana
Includes X-Pack features
Available in AWS today
Available in Google Cloud Platform (Beta)
Available as a private cloud/on-premise solution
(Elastic Cloud Enterprise)
Enterprise Deployment Architecture
[Architecture diagram; components shown:]
• Beats: Log Files, Metrics, Wire Data, your{beat}
• Other data sources: Datastore, Web APIs, Social, Sensors
• Logstash Nodes (X)
• Messaging Queue: Kafka, Redis
• Elasticsearch: Master Nodes (3), Ingest Nodes (X), Data Nodes – Hot (X), Data Nodes – Warm (X)
• Kibana Instances (X), Custom UI
• X-Pack: Authentication (LDAP, AD, SSO), Notification
• ES-Hadoop: Hadoop Ecosystem
Solving many diverse & complex use cases
• Products: Elastic Stack, X-Pack, Elastic Cloud
• Use cases: Application Search, Log Analytics, Security Analytics, Metrics Analytics, Business Analytics, many more …
Demo:
Apache Logging
Logstash
Data processing pipeline
• Ingest data of all shapes, sizes, and sources
• Parse and dynamically transform data
• Transport data to any output
• Secure and encrypt data inputs
• Build your own pipeline
• More than 200 plugins
Parsing Logs Using Logstash
Logstash Configuration Example – Apache Access Logs
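input {
  # Read the access log from the start of the file
  file {
    path => "/Users/aquan/Desktop/JUG/demo/access_log"
    start_position => "beginning"
  }
}
filter {
  if [path] =~ "access" {
    mutate { replace => { "type" => "apache_access" } }
    # Parse the combined log format into named fields
    grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
    # Look up location data for the client IP
    geoip { source => "clientip" }
  }
  # Use the request time from the log line as the event timestamp
  date { match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ] }
}
output { elasticsearch { hosts => ["localhost:9200"] } }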
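For readers unfamiliar with grok, the Python sketch below approximates what the filter stage above does to a single line: it extracts named fields from a combined-format access log entry and parses the timestamp. The regular expression is a simplified stand-in for %{COMBINEDAPACHELOG}, not the pattern Logstash ships, and the sample log line is invented for illustration.

```python
import re
from datetime import datetime

# Simplified stand-in for grok's COMBINEDAPACHELOG pattern (an assumption,
# not the pattern definition that ships with Logstash).
COMBINED_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_access_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = COMBINED_LOG.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Same layout as the date filter's "dd/MMM/yyyy:HH:mm:ss Z" pattern.
    event["@timestamp"] = datetime.strptime(
        event["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return event

# Hypothetical sample line, invented for this example.
sample = (
    '203.0.113.7 - - [10/Oct/2017:13:55:36 -0700] '
    '"GET /index.html HTTP/1.1" 200 2326 '
    '"http://example.com/start" "Mozilla/5.0"'
)
event = parse_access_line(sample)
print(event["clientip"], event["verb"], event["response"], event["@timestamp"])
```

Lines that do not match return None, which is roughly analogous to grok tagging an event as a parse failure instead of producing fields.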
Logstash Configuration Example - Spring Boot Logs
filter {
  # If a log line contains a tab character followed by 'at', tag the entry as a stacktrace
  if [message] =~ "\tat" {
    grok {
      match => ["message", "^(\tat)"]
      add_tag => ["stacktrace"]
    }
  }
  # Grok Spring Boot's default log format
  grok {
    match => [ "message",
      "(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}) %{LOGLEVEL:level} %{NUMBER:pid} --- \[(?<thread>[A-Za-z0-9-]+)\] [A-Za-z0-9.]*\.(?<class>[A-Za-z0-9#_]+)\s*:\s+(?<logmessage>.*)"
    ]
  }
  # Parse the timestamp captured in the timestamp field
  date { match => [ "timestamp" , "yyyy-MM-dd HH:mm:ss.SSS" ] }
}
Beats
Lightweight data shippers
• Ship data from the source
• Ship and centralize in Elasticsearch
• Ship to Logstash for transformation and parsing
• Ship to Elastic Cloud
• Libbeat: API framework to build custom Beats
• 30+ community Beats
• FILEBEAT – Log Files
• METRICBEAT – Metrics
• PACKETBEAT – Network Data
• WINLOGBEAT – Windows Events
More than 30 community Beats and growing … Apachebeat, dockbeat, httpbeat, mysqlbeat, nginxbeat, redisbeat, twitterbeat, and more
Elasticsearch
Heart of the Elastic Stack
• Distributed, Scalable
• High-availability
• Multi-tenancy
• Developer Friendly
• Real-time, Full-text Search
• Aggregations
Clusters, Nodes and Indices
[Diagram: cluster my_cluster with Server 1 running Node A; the node holds the documents (d1, d2, …) of two indices, twitter and logs]
Split Indices into Shards
[Diagram: cluster my_cluster, Server 1, Node A, with the documents of the twitter and logs indices]
Distribute Shards over Multiple Nodes
[Diagram: cluster my_cluster with Node B on Server 2 and Node A on Server 1; twitter shards 0 to 4 and logs shards 0 and 1 are distributed across the two nodes]
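How a document ends up on a particular shard can be sketched as follows. Elasticsearch routes a document by hashing its routing value (the document ID by default) modulo the number of primary shards. The sketch uses zlib.crc32 purely as a stand-in hash (Elasticsearch uses a Murmur3 hash internally), and the document IDs and the five-shard count are illustrative assumptions based on the twitter index in the diagram.

```python
import zlib
from collections import defaultdict

def shard_for(doc_id, num_primary_shards):
    """Pick a shard number from a stable hash of the routing value."""
    # crc32 is a stand-in hash for illustration only.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

NUM_SHARDS = 5  # the twitter index in the diagram has shards 0 to 4
doc_ids = ["d%d" % n for n in range(1, 13)]  # hypothetical document IDs

placement = defaultdict(list)
for doc_id in doc_ids:
    placement[shard_for(doc_id, NUM_SHARDS)].append(doc_id)

for shard in sorted(placement):
    print("twitter shard", shard, "->", placement[shard])
```

Because the result depends on the number of primary shards, that number is normally chosen when the index is created.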
CRUD
Text Analysis
Inverted Index
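As a rough illustration of the idea, an inverted index maps each term to the documents that contain it, so a term lookup does not have to scan every document. The sketch below is a toy version with whitespace tokenization and lowercasing only; the three sample documents are made up.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of doc IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical documents, invented for this example.
docs = {
    1: "The quick brown fox",
    2: "The lazy dog",
    3: "Quick brown dogs",
}
index = build_inverted_index(docs)
print(index["quick"])
print(index["dog"])
```

Note that "dog" and "dogs" are separate terms here; closing that gap is the job of text analysis (stemming, for example).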
Most think of search as…
SEARCH
Multilingual
Full Text Search
Stemming
Type ahead
Mobile
Time Range
Geo search
Influenced by Rating
Personalized Ranking
Search
Pagination
Time range Filter
Numeric Filter
Geo range Filter
Stemming / Highlighting
Demo:
e-Commerce Search
Search – Finding the Needles in the Haystack
• Relevancy – scoring of a document based on how closely it matches the query
✦ TF (term frequency): The more a term appears in a field, the more important it is
✦ IDF (inverse document frequency): The more documents that contain the term, the
less important the term is
✦ Field length: shorter fields are more likely to be relevant than longer fields
• Structured Search
• Full-Text Search
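The three relevance factors above (TF, IDF, field length) can be combined into a toy score as sketched below. The formulas loosely follow the classic TF/IDF style (square root of term frequency, logarithmic IDF, inverse square root of field length) and are an assumption for illustration; the scoring function actually used depends on the Elasticsearch version and configuration. The sample documents are made up.

```python
import math

def score(term, doc_tokens, all_docs):
    """Toy relevance score: tf * idf * field-length norm."""
    freq = doc_tokens.count(term)
    if freq == 0:
        return 0.0
    tf = math.sqrt(freq)                      # more occurrences, higher score
    doc_freq = sum(1 for tokens in all_docs if term in tokens)
    idf = 1.0 + math.log(len(all_docs) / (doc_freq + 1))  # rarer term, higher score
    norm = 1.0 / math.sqrt(len(doc_tokens))   # shorter field, higher score
    return tf * idf * norm

# Hypothetical documents, already tokenized.
docs = [
    "elasticsearch is a search engine".split(),
    "a search engine built on lucene for full text search and analytics".split(),
    "kibana is a window into the stack".split(),
]

for tokens in docs:
    print(round(score("search", tokens, docs), 3), " ".join(tokens))
```

In this sample the first document outranks the second for "search" even though the term appears twice in the second, because the first field is much shorter.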
Structured Search
• Answer is always “Yes” or “No”
• Does not worry about document relevance or scoring
• Filters – very fast, easily cached, no relevance scoring; use them as often as you can
✦ Term Filter, Terms Filter – numbers, Booleans, dates, and text
✦ Bool Filter (compound filter) – must, must_not, should
✦ Range Filter – number, date (date math), string
✦ Exists Filter
✦ Missing Filter
• Filter Order – Important for performance
✦ More specific filters should be placed before less-specific filters
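The yes/no nature of filters can be sketched in plain Python over in-memory documents. This is a conceptual model only, not the Elasticsearch query DSL: the helper names (term, range_filter, bool_filter), the rule that at least one should clause must match when any are given, and the sample products are all assumptions made for this example.

```python
def term(field, value):
    """Exact-match filter: True if the document's field equals value."""
    return lambda doc: doc.get(field) == value

def range_filter(field, gte=None, lt=None):
    """Range filter on a numeric field; documents missing the field never match."""
    def check(doc):
        value = doc.get(field)
        if value is None:
            return False
        return (gte is None or value >= gte) and (lt is None or value < lt)
    return check

def bool_filter(must=(), must_not=(), should=()):
    """Compound filter: every must, no must_not, and one should if any are given."""
    def check(doc):
        return (
            all(f(doc) for f in must)
            and not any(f(doc) for f in must_not)
            and (not should or any(f(doc) for f in should))
        )
    return check

# Hypothetical product documents.
docs = [
    {"id": 1, "brand": "acme", "price": 20, "in_stock": True},
    {"id": 2, "brand": "acme", "price": 45, "in_stock": False},
    {"id": 3, "brand": "globex", "price": 30, "in_stock": True},
]

in_stock_affordable = bool_filter(
    must=[term("in_stock", True), range_filter("price", gte=10, lt=40)],
    must_not=[term("brand", "globex")],
)
matches = [doc["id"] for doc in docs if in_stock_affordable(doc)]
print(matches)
```

Because all() stops at the first failing filter, putting the most selective filter first avoids evaluating the rest, which mirrors the ordering advice above.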
Full-Text Search
• Relevance
✦ The match Query
✦ Multiword Queries – Precision control
๏ Operator: and, or
๏ minimum_should_match
✦ Bool Query - Combining Queries
✦ Boosting Query – boost parameter
• Multi-field Search
✦ The multi_match Query
✦ Types: Best, Most, Cross
✦ Boosting Individual Fields - ^
Proximity Matching – Phrase Matching
• Search for “sue alligator”
✦ Sue ate the alligator
✦ The alligator ate Sue
✦ Sue never goes anywhere without her alligator-skin purse
• The match_phrase Query
✦ Find words that are near each other – “quick fox”
✦ Closer is better
✦ Flexibility - slop
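To make slop concrete, the sketch below estimates how far the query terms must be moved for the phrase to match each of the three sentences above. It is a simplified model and not Lucene's actual scorer: it assumes a naive tokenizer, ignores scoring, and measures the spread of (document position minus query position) across the query terms.

```python
from itertools import product

def tokenize(text):
    """Naive tokenizer: lowercase, split on whitespace and hyphens."""
    return text.lower().replace("-", " ").split()

def min_slop(query_terms, doc_tokens):
    """Smallest slop at which the phrase matches, or None if a term is missing."""
    positions = []
    for term in query_terms:
        found = [i for i, tok in enumerate(doc_tokens) if tok == term]
        if not found:
            return None
        positions.append(found)
    best = None
    # Try every way of picking one document position per query term.
    for combo in product(*positions):
        shifts = [doc_pos - query_pos for query_pos, doc_pos in enumerate(combo)]
        cost = max(shifts) - min(shifts)
        if best is None or cost < best:
            best = cost
    return best

query = tokenize("sue alligator")
sentences = [
    "Sue ate the alligator",
    "The alligator ate Sue",
    "Sue never goes anywhere without her alligator-skin purse",
]
for sentence in sentences:
    print(min_slop(query, tokenize(sentence)), sentence)
```

Under this model the first sentence needs the least slop and the third the most, which is the "closer is better" ordering described above.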
Partial Matching
• The prefix Query
• Wildcard and regexp Queries
• Completion Suggester
✦ Query-Time Search-as-You-Type
๏ match_phrase_prefix – “johnnie walker bl”
- slop
- max_expansions
✦ Index-Time Search-as-You-Type – edge n-grams
๏ “quick” → q, qu, qui, quic, quick
๏ Storage vs. performance
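The index-time approach can be sketched as follows: every prefix of every token is stored up front, so a type-ahead lookup becomes a single dictionary access. This is a toy model; the function names, the min_gram and max_gram defaults, and the sample titles are assumptions for illustration.

```python
from collections import defaultdict

def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the prefixes of token from min_gram to max_gram characters."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]

def build_prefix_index(titles):
    """Index every edge n-gram of every token, pointing back at its title."""
    index = defaultdict(set)
    for title in titles:
        for token in title.lower().split():
            for gram in edge_ngrams(token):
                index[gram].add(title)
    return index

# Hypothetical titles, invented for this example.
titles = [
    "Johnnie Walker Black Label",
    "Johnnie Walker Blue Label",
    "Quick Brown Fox",
]
index = build_prefix_index(titles)

print(edge_ngrams("quick"))
print(sorted(index["bl"]))
print(len(index), "index entries for", len(titles), "titles")
```

The last line hints at the trade-off: lookups are cheap, but the index stores many more entries than there are distinct words.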
Dealing with Human Language
• Language Analyzers - Many
✦ Tokenize text into individual words (consider Chinese, which puts no spaces between words)
✦ Lowercase tokens
✦ Remove stopwords – a, an, and, are, as, at, be, but, for, if, into …
✦ Stem tokens to their root form – foxes → fox
• Synonyms – jump, leap, and hop
• Dictionary
• Typos and misspellings – Fuzzy Query
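The analyzer steps listed above (tokenize, lowercase, remove stopwords, stem) can be chained in a few lines. The sketch below is a toy English-only pipeline: the stopword set is the sample from the slide plus "the", and naive_stem is a crude suffix stripper invented for this example, not the stemming algorithm a real language analyzer uses.

```python
import re

# Stopwords from the list above, plus "the" (added for this example).
STOPWORDS = {"a", "an", "and", "are", "as", "at", "be", "but", "for", "if", "into", "the"}

def naive_stem(token):
    """Toy suffix stripper; real analyzers use proper stemming algorithms."""
    for suffix in ("es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def analyze(text):
    """Tokenize, lowercase, drop stopwords, then stem."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    tokens = [t.lower() for t in tokens]
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [naive_stem(t) for t in tokens]

print(analyze("The quick foxes are jumping into a box"))
```

The regular-expression tokenizer only works for languages that separate words with spaces or punctuation, which is exactly why language-specific analyzers exist.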
Real-time Reporting & Analytics - Aggregation
• Aggregations are a way to perform analytics on your indexed data
✦ Combination of buckets and metrics
✦ Buckets – Collections of documents that meet a criterion
✦ Metrics – Statistics calculated on the documents in the bucket
• Example: Average salary per <country, gender, age> combination, in one
request with one pass over the data!
✦ Partition documents by country (bucket)
✦ Partition each country by gender (bucket)
✦ Partition each gender bucket by age ranges (bucket)
✦ Calculate the average salary for each age range (metric)
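The nested bucket-and-metric idea in this example can be mimicked in plain Python with a single pass over the data. This is a conceptual sketch, not how Elasticsearch executes aggregations; the age range boundaries and the sample people are assumptions made for illustration.

```python
from collections import defaultdict

def age_range(age):
    """Bucket an age into a coarse range label (boundaries are illustrative)."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50+"

def average_salary(people):
    """One pass: accumulate sum and count per (country, gender, age range) bucket."""
    totals = defaultdict(lambda: [0.0, 0])
    for person in people:
        key = (person["country"], person["gender"], age_range(person["age"]))
        totals[key][0] += person["salary"]
        totals[key][1] += 1
    # The metric: average salary for each bucket.
    return {key: total / count for key, (total, count) in totals.items()}

# Hypothetical data, invented for this example.
people = [
    {"country": "England", "gender": "F", "age": 28, "salary": 50000},
    {"country": "England", "gender": "F", "age": 25, "salary": 60000},
    {"country": "England", "gender": "M", "age": 41, "salary": 55000},
    {"country": "Spain", "gender": "F", "age": 52, "salary": 48000},
]
averages = average_salary(people)
for bucket in sorted(averages):
    print(bucket, averages[bucket])
```

Each tuple key plays the role of a nested bucket path, and the average is the metric computed inside the innermost bucket.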
Aggregations: Count by Country
Request:
GET /person/person/_search?size=0
{
  "aggs": {
    "by_country": {
      "terms": {
        "field": "address.country"
      }
    }
  }
}
Response:
{ ..., "aggregations" : {
    "by_country" : {
      "buckets" : [ {
        "key" : "England",
        "doc_count" : 30051
      }, {
        "key" : "Germany",
        "doc_count" : 30004
      }, {
        "key" : "France",
        "doc_count" : 15034
      }, {
        "key" : "Spain",
        "doc_count" : 14912
      } ]}}}
[Pie chart: England 33%, Germany 33%, France 17%, Spain 17%]
A lot more …
Elasticsearch Clients
• Java API
• Java REST Client
• JavaScript API
• Groovy API
• .Net API
• PHP API
• Perl API
• Python API
• Ruby API
• Community Contributed Clients: B4J, Clojure, Erlang, Go, Groovy, Haskell, Java,
JavaScript, Kotlin, Lua, .Net, OCaml, Perl, PHP, Python, R, Ruby, Rust, Scala, Smalltalk,
Vert.x
Kibana
Window into the Elastic Stack
• Visualize and analyze
• Geospatial
• Customize and Share Reports
• Graph Exploration
• UX to secure and manage the Elastic Stack
• Build Custom Apps
Become an Elastic Pioneer
Elastic Pioneer Program: we want your feedback!
1 Download 6.0 preview release (alpha, beta, etc.)
2 Provide feedback via GitHub or Discuss forum
3 Get limited edition Pioneer swag
THANK YOU
@elastic
www.elastic.co