Chapter 12 Elasticsearch - Distributed Search Engine
Foreword

• When we search for a movie or book we like, a product on an e-commerce website, or a resume or job posting on a recruitment website, we use a search engine. In practice, Elasticsearch is often the first tool mentioned when search functionality comes up during project development.
• In recent years, Elasticsearch has developed rapidly and grown beyond its original role as a search engine, adding data aggregation, analysis, and visualization features. If you need to locate content by keyword among millions of documents, Elasticsearch is an excellent choice.
1 Huawei Confidential
Objectives

• Upon completion of this course, you will be able to:
  - Know the basic functions and concepts of Elasticsearch.
  - Master the application scenarios of Elasticsearch.
  - Understand the system architecture of Elasticsearch.
  - Know the key features of Elasticsearch.

Contents

1. Elasticsearch Overview

2. Elasticsearch System Architecture

3. Elasticsearch Key Features

Elasticsearch Overview
• Elasticsearch is a high-performance, Lucene-based full-text search service. It is a distributed RESTful search and data analytics engine and can also be used as a NoSQL database.
  - Lucene extension
  - Seamless switchover between the prototype environment and the production environment
  - Horizontal scaling
  - Support for structured and unstructured data

Elasticsearch Features

• High performance: Search results are returned immediately, and an inverted index is used for full-text search.
• Scalability: Horizontal scaling is supported, and Elasticsearch can run on hundreds or thousands of servers. The prototype environment and the production environment can be switched seamlessly.
• Relevance: Search results are sorted based on various factors, from term frequency and proximity to popularity.
• Reliability: Faults such as hardware failures and network partitions are detected automatically, ensuring the security and availability of your cluster (and data).

Elasticsearch Application Scenarios
• Elasticsearch is used for log search and analysis, spatiotemporal search, time series search, and intelligent search.
• Complex data types: Structured, semi-structured, and unstructured data all need to be queried. Elasticsearch can perform operations such as cleansing, word segmentation, and inverted index creation on these data types and then provide full-text search.
• Diversified search criteria: Full-text search criteria can contain words or phrases.
• Write and read: Written data can be searched in near real time.
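The difference between word criteria and phrase criteria can be sketched in Python: a word query matches documents containing any of the query terms (the default OR behavior of the Elasticsearch match query), while a phrase query requires the terms to appear consecutively and in order (the match_phrase query). The request bodies below show the shape of both queries, with a hypothetical field named content. The local matching functions are a simplified stand-in; Elasticsearch evaluates phrases using term positions stored in the inverted index.

```python
from typing import List

# Query DSL bodies for the two kinds of criteria ("content" is a hypothetical field).
word_query = {"query": {"match": {"content": "watching frozen"}}}
phrase_query = {"query": {"match_phrase": {"content": "watching frozen"}}}


def tokenize(text: str) -> List[str]:
    # Toy analyzer: lowercase + whitespace split.
    return text.lower().split()


def matches_words(text: str, query: str) -> bool:
    # Any query term appears somewhere in the text (OR semantics).
    tokens = set(tokenize(text))
    return any(term in tokens for term in tokenize(query))


def matches_phrase(text: str, query: str) -> bool:
    # The query terms appear consecutively and in the same order.
    tokens, terms = tokenize(text), tokenize(query)
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))


text = "Jerry likes watching Frozen"
print(matches_words(text, "frozen watching"))   # True
print(matches_phrase(text, "frozen watching"))  # False
print(matches_phrase(text, "watching frozen"))  # True
```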

Elasticsearch Ecosystem

[Figure: Elasticsearch ecosystem layers: user access layer, plugin extension layer, data persistence and analysis layer, and data access layer.]
• ELK/ELKB (Elasticsearch, Logstash, Kibana, and Beats) provides a complete set of solutions.
• These components are open-source software and work together to meet diverse requirements.

Contents

1. Elasticsearch Overview

2. Elasticsearch System Architecture

3. Elasticsearch Key Features

Elasticsearch System Architecture
[Figure: Elasticsearch system architecture. The client cluster obtains cluster information from ZooKeeper and sends file indexing and search operations to the cluster, which updates cluster information in ZooKeeper. The cluster consists of EsMaster and EsNode1 ... EsNode9; each node holds shards and replicas (such as Shard 0, Shard 1, Replica 0, and Replica 1) and reads index files from its own disk.]

Elasticsearch Internal Architecture
• Elasticsearch provides RESTful APIs and APIs for other languages (such as Java).
• A cluster discovery mechanism is used.
• Scripting languages are supported.
• The underlying layer is based on Lucene, while keeping Lucene fully independent.
• Indexes can be stored in local files, shared file systems, and HDFS.
[Figure: Elasticsearch internal architecture. Components shown include the RESTful style API and Java API; Transport (HTTP, Thrift, JMX); Scripting (js, groovy, python); Discovery; Plugins; Index Module, Search Module, Mapping, and River; Distributed Lucene Directory; and Gateway (Local FileSystem, Shared FileSystem, Hadoop HDFS).]

Basic Concepts of Elasticsearch (1)

Index: A logical namespace in Elasticsearch.

Type: Used to store different types of documents. Types are deprecated in Elasticsearch 7 (each index holds a single type, _doc) and removed in Elasticsearch 8.

Document: A basic unit that can be indexed.

Mapping: Used to restrict the types of fields.

Basic Concepts of Elasticsearch (2)
Cluster: Each cluster contains multiple nodes. One of them is the master node, and the others are slave nodes. The master node is elected from among the nodes.

EsNode: Elasticsearch node. A node is an Elasticsearch instance.

EsMaster: The currently elected master node, which manages cluster-level changes such as creating or deleting indexes and adding or removing nodes. The master node is not involved in document-level changes or searches, so it does not become a performance bottleneck when traffic increases.

Shard: Index shard. Elasticsearch splits a complete index into multiple shards and distributes them on different nodes.

Replica: Index replica. Elasticsearch allows you to set multiple replicas for an index. Replicas improve the fault tolerance of the system: when a shard on a node is damaged or lost, the data can be recovered from a replica. Replicas also improve search efficiency, because Elasticsearch automatically balances search requests across them.
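To make the shard and replica concepts concrete, the Python sketch below places every copy of each shard on a different node and then checks that losing any single node still leaves at least one copy of every shard. The round-robin placement rule and the node names are assumptions chosen for illustration; the real Elasticsearch allocator weighs many more factors, but it likewise never places two copies of the same shard on one node.

```python
from typing import Dict, List


def allocate(num_primaries: int, num_replicas: int, nodes: List[str]) -> Dict[int, List[str]]:
    """Return {shard number: [node with the primary, nodes with the replicas...]}."""
    copies = num_replicas + 1
    if copies > len(nodes):
        raise ValueError("each copy of a shard needs its own node")
    layout = {}
    for shard in range(num_primaries):
        # Round-robin: copy k of shard s goes to node (s + k) mod n, so the
        # primary and its replicas always land on distinct nodes.
        layout[shard] = [nodes[(shard + k) % len(nodes)] for k in range(copies)]
    return layout


def survives_loss_of(layout: Dict[int, List[str]], failed_node: str) -> bool:
    # The index stays complete if every shard still has a copy on another node.
    return all(any(node != failed_node for node in holders) for holders in layout.values())


nodes = ["EsNode1", "EsNode2", "EsNode3"]
layout = allocate(num_primaries=3, num_replicas=1, nodes=nodes)
print(layout)
print(all(survives_loss_of(layout, n) for n in nodes))  # True
```

With num_replicas=0 the same check fails, which is the fault-tolerance benefit of replicas described above.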

Basic Concepts of Elasticsearch (3)

Recovery: Data recovery or redistribution. When a node is added to or removed from the cluster, Elasticsearch redistributes shards based on node load. When a failed node is restarted, its data is recovered.

Gateway: The mode for storing Elasticsearch index snapshots. By default, Elasticsearch keeps indexes in memory and persists them to the local hard disk only when memory is full. The gateway stores index snapshots; when an Elasticsearch cluster is shut down and restarted, the cluster reads the index backup data from the gateway. Elasticsearch supports multiple gateway types, including the local file system (the default), distributed file systems, Hadoop HDFS, and Amazon S3.

Transport: The mode of interaction between Elasticsearch internal nodes or the cluster and clients. By default, internal nodes interact over TCP. Transport protocols such as HTTP (JSON format), Thrift, Servlet, Memcached, and ZeroMQ are also supported through plugins.

Contents

1. Elasticsearch Overview

2. Elasticsearch System Architecture

3. Elasticsearch Key Features

Elasticsearch Inverted Index
• Forward index: Values are looked up by key. That is, the information that meets the search criteria is located based on keys.
• Inverted index: Keys are looked up by value. In full-text search, the value is the keyword being searched for, and the documents containing it are located based on that value.

Term       Documents
Jerry      Doc1, ...
Likes      Doc1, Doc2, ...
Watching   Doc1, Doc2, ...
Frozen     Doc1, ...
...        ...
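The term-to-document rows above can be reproduced with a small Python sketch. The two sample documents are hypothetical texts chosen to match the Jerry, Likes, Watching, and Frozen rows, and the whitespace tokenizer stands in for a real Elasticsearch analyzer, which also handles punctuation, stop words, stemming, and so on.

```python
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Toy analyzer: lowercase + whitespace split.
    return text.lower().split()


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each term to the sorted list of documents that contain it."""
    postings: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index: Dict[str, List[str]], query: str) -> List[str]:
    """Return documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        # Intersect postings lists; no document text is scanned.
        result &= set(index.get(term, []))
    return sorted(result)


docs = {  # hypothetical sample texts
    "Doc1": "Jerry likes watching Frozen",
    "Doc2": "Tom likes watching cartoons",
}
index = build_inverted_index(docs)
print(index["likes"])                    # ['Doc1', 'Doc2']
print(search(index, "watching frozen"))  # ['Doc1']
```

A query is answered by looking up each term's postings list rather than scanning every document, which is why the inverted index suits full-text search.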

Elasticsearch Access APIs
• Elasticsearch data can be operated through RESTful requests. The request methods include GET, POST, PUT, DELETE, and HEAD, which allow you to add, delete, modify, and query documents and indexes.

1. View the cluster health status.
   GET /_cat/health?v&pretty
2. Create a small index with only one primary shard and no replicas.
   PUT /my_temp_index
   {
     "settings": {
       "number_of_shards": 1,
       "number_of_replicas": 0
     }
   }
3. Delete multiple indexes.
   DELETE /index_one,index_two
4. Add documents with automatically generated IDs.
   POST /person/man
   {
     "name": "111",
     "age": 11
   }
   In Elasticsearch 7.x, where types are deprecated, use POST /person/_doc instead.

• For example, access Elasticsearch using the cURL client to obtain the cluster health status:
   curl -XGET "http://ip:httpport/_cluster/health?pretty"
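The same requests can be prepared from code. The Python sketch below builds them with the standard library (urllib.request) only. BASE_URL is an assumed placeholder for http://ip:httpport, and the document request uses the 7.x _doc endpoint instead of the man type. The requests are constructed but not sent, so the sketch runs without a cluster; the commented-out lines show how one would be sent.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:9200"  # assumption: stands in for http://ip:httpport


def es_request(method: str, path: str, body: Optional[Any] = None) -> urllib.request.Request:
    """Build (but do not send) a REST request against the cluster."""
    data = None
    headers = {}
    if body is not None:
        # Elasticsearch expects JSON bodies with an explicit content type.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


health = es_request("GET", "/_cluster/health?pretty")
create_index = es_request("PUT", "/my_temp_index",
                          {"settings": {"number_of_shards": 1, "number_of_replicas": 0}})
add_doc = es_request("POST", "/person/_doc", {"name": "111", "age": 11})
delete_indexes = es_request("DELETE", "/index_one,index_two")

# Against a running cluster, a request is sent like this:
# with urllib.request.urlopen(health) as resp:
#     print(json.loads(resp.read()))
```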

Elasticsearch Routing Algorithm
• Elasticsearch provides two routing algorithms:
  - Default route: shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document _id. Because the target shard depends on the number of primary shards, that number cannot be changed freely once the index is created. In Elasticsearch 6.x, expanding an index requires the number of shards to be multiplied, and the future expansion capacity must be specified when the index is created. Elasticsearch 5.x does not support this kind of capacity expansion. Elasticsearch 7.x no longer requires the expansion capacity to be specified in advance (by default, an index can be split by factors of 2).
  - Custom route: The user specifies the routing value to determine which shard a document is written to, or which shard is searched.
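The default route and its expansion constraint can be illustrated in Python. This is a sketch only: zlib.crc32 stands in for the Murmur3 hash Elasticsearch actually applies to the routing value, so the shard numbers differ from a real cluster, and real index splitting goes through the index.number_of_routing_shards setting rather than a plain modulo. The sketch shows that changing the number of primary shards from 5 to 6 moves most documents, that multiplying it (5 to 10) only splits each old shard in two, and that a custom routing value (passed as the routing parameter in real requests) sends related documents to one shard.

```python
import zlib
from typing import Optional


def route(routing: str, number_of_primary_shards: int) -> int:
    # Stand-in hash: CRC32. Elasticsearch itself applies Murmur3 to the
    # routing value, so real shard numbers differ from this sketch.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


def shard_for(doc_id: str, number_of_primary_shards: int,
              custom_routing: Optional[str] = None) -> int:
    # Default route: the routing value is the document _id.
    # Custom route: a caller-chosen value (e.g. a user ID) replaces it.
    routing = doc_id if custom_routing is None else custom_routing
    return route(routing, number_of_primary_shards)


ids = [f"doc-{i}" for i in range(1000)]

# Changing 5 -> 6 primaries relocates most documents.
moved = sum(shard_for(d, 5) != shard_for(d, 6) for d in ids)
print(f"{moved} of {len(ids)} documents change shard (5 -> 6)")

# Multiplying 5 -> 10: each document stays within the "children" of its old
# shard (s or s + 5), which is why expansion works in multiples.
print(all(shard_for(d, 10) % 5 == shard_for(d, 5) for d in ids))  # True

# Custom routing: ten documents of one user all land on a single shard.
print({shard_for(d, 5, custom_routing="user-42") for d in ids[:10]})
```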

Elasticsearch Balancing Algorithm
• Elasticsearch provides an automatic balancing function.
• Application scenarios: capacity expansion, capacity reduction, and data import
• The algorithms are as follows:
  - weight_index(node, index) = indexBalance * (node.numShards(index) - avgShardsPerNode(index))
  - weight_node(node, index) = shardBalance * (node.numShards() - avgShardsPerNode)
  - weight(node, index) = weight_index(node, index) + weight_node(node, index)
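The weight formulas translate directly into Python. In this sketch, indexBalance = 0.55 and shardBalance = 0.45 are assumed values (they match the defaults of the cluster.routing.allocation.balance.index and cluster.routing.allocation.balance.shard settings in recent Elasticsearch versions, but check your version), and the allocation is a made-up example right after capacity expansion, with an empty node3. A lower weight means the node holds fewer shards than average, so the balancer prefers it as the target for the next shard of that index.

```python
from typing import Dict, Optional

# node -> {index name: number of shards of that index on the node}
Allocation = Dict[str, Dict[str, int]]

INDEX_BALANCE = 0.55  # assumed indexBalance
SHARD_BALANCE = 0.45  # assumed shardBalance


def num_shards(alloc: Allocation, node: str, index: Optional[str] = None) -> int:
    # node.numShards(index) if an index is given, else node.numShards()
    counts = alloc[node]
    return counts.get(index, 0) if index is not None else sum(counts.values())


def avg_shards_per_node(alloc: Allocation, index: Optional[str] = None) -> float:
    # avgShardsPerNode(index) if an index is given, else avgShardsPerNode
    return sum(num_shards(alloc, n, index) for n in alloc) / len(alloc)


def weight(alloc: Allocation, node: str, index: str) -> float:
    weight_index = INDEX_BALANCE * (num_shards(alloc, node, index) - avg_shards_per_node(alloc, index))
    weight_node = SHARD_BALANCE * (num_shards(alloc, node) - avg_shards_per_node(alloc))
    return weight_index + weight_node


alloc = {
    "node1": {"logs": 3, "users": 1},
    "node2": {"logs": 3, "users": 1},
    "node3": {},  # newly added node
}
for node in alloc:
    print(node, round(weight(alloc, node, "logs"), 3))
target = min(alloc, key=lambda n: weight(alloc, n, "logs"))
print("next 'logs' shard goes to", target)  # node3
```

Because each term measures a deviation from an average, the weights of all nodes for one index sum to zero; balancing tries to narrow the gap between the highest and lowest weights.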

Elasticsearch Capacity Expansion
• Scenarios:
  - High physical resource consumption on Elasticsearch service nodes, such as high CPU and memory usage or insufficient disk space
  - Excessive index data volume for one Elasticsearch instance, such as 1 billion records or 1 TB of data
• Capacity expansion modes:
  - Add EsNode instances.
  - Add nodes with EsNode instances.
• After capacity expansion, use the automatic balancing policy.

Elasticsearch Capacity Reduction
• Scenarios:
  - The OS needs to be reinstalled on nodes.
  - The amount of cluster data has decreased.
  - Nodes are taken out of service.
• Capacity reduction mode:
  - Delete an Elasticsearch instance on the Cloud Search Service (CSS) console.
• Precautions:
  - Ensure that replicas of the shards on the instance to be deleted exist on another instance.
  - Ensure that data in the shards on the instance to be deleted has been migrated to another node.
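The two precautions amount to one check: every shard on the instance to be deleted must also have a copy somewhere else. A minimal Python sketch of that check follows, using a hypothetical in-memory map from instance names to the shard copies they hold; in a real cluster this information would come from the cluster state, for example the _cat/shards API.

```python
from typing import Dict, Set, Tuple

ShardId = Tuple[str, int]  # (index name, shard number)


def shards_without_other_copy(allocation: Dict[str, Set[ShardId]], instance: str) -> Set[ShardId]:
    """Shards on `instance` that exist nowhere else and would be lost."""
    elsewhere: Set[ShardId] = set()
    for name, shards in allocation.items():
        if name != instance:
            elsewhere |= shards
    return allocation[instance] - elsewhere


def safe_to_remove(allocation: Dict[str, Set[ShardId]], instance: str) -> bool:
    # Safe only when every shard on the instance has a copy on another instance.
    return not shards_without_other_copy(allocation, instance)


allocation = {  # hypothetical layout
    "EsNode1": {("logs", 0), ("logs", 1)},
    "EsNode2": {("logs", 0), ("users", 0)},
    "EsNode3": {("logs", 1)},
}
print(safe_to_remove(allocation, "EsNode1"))             # True
print(shards_without_other_copy(allocation, "EsNode2"))  # {('users', 0)}
```

Any shard reported by shards_without_other_copy must be replicated or migrated before the instance is deleted.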

Elasticsearch Indexing HBase Data
• When Elasticsearch indexes HBase data, the HBase data is written to HDFS, and Elasticsearch creates the corresponding index data. The index ID is mapped to the rowkey of the HBase data, which ensures a unique mapping between each index record and its HBase data and enables full-text search of HBase data.
• Batch indexing: For data that already exists in HBase, a MapReduce (MR) task is submitted to read all data in HBase, and indexes are then created in Elasticsearch.
[Figure: The HBase2ES client submits a job to YARN (ResourceManager, NodeManager). The job reads and scans data from HBase (HMaster, RegionServer), whose data is stored in HDFS (NameNode, DataNode), and writes index data to Elasticsearch (EsNode 1 ... EsNode N). Steps are labeled 1. read, 2. scan, and 3. write.]
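The rowkey-to-ID mapping can be sketched as building a request body for the Elasticsearch _bulk API, where each action line sets the document _id to the HBase rowkey. The rows below are a hypothetical HBase scan result represented as Python dictionaries; how the HBase2ES tool actually reads HBase and writes to Elasticsearch is not described in this chapter, so this illustrates only the mapping. Elasticsearch 6.x would additionally need a _type in each action line.

```python
import json
from typing import Dict


def hbase_rows_to_bulk(index: str, rows: Dict[str, Dict[str, str]]) -> str:
    """Turn {rowkey: {column: value}} into an Elasticsearch _bulk request body."""
    lines = []
    for rowkey, columns in rows.items():
        # Action line: the document _id is the HBase rowkey, so each index
        # record maps to exactly one HBase row.
        lines.append(json.dumps({"index": {"_index": index, "_id": rowkey}}))
        # Source line: the column values to be indexed for full-text search.
        lines.append(json.dumps(columns))
    # The _bulk API takes newline-delimited JSON that must end with a newline.
    return "\n".join(lines) + "\n"


rows = {  # hypothetical result of an HBase scan
    "row001": {"name": "Jerry", "comment": "likes watching Frozen"},
    "row002": {"name": "Tom", "comment": "likes watching cartoons"},
}
body = hbase_rows_to_bulk("hbase_person", rows)
print(body)
```

The body would be sent as POST /_bulk with Content-Type application/x-ndjson. Because the index action overwrites a document that has the same _id, re-running the batch job updates existing records instead of duplicating them.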
Elasticsearch Multi-instance Deployment on a Node
• Multiple Elasticsearch instances can be deployed on one node and distinguished from each other by IP address and port number. This improves the utilization of each node's CPU, memory, and disk and increases the indexing and search capacity of Elasticsearch.
[Figure: Node1, Node2, ..., NodeN each run instances EsNode1 ... EsNode5; EsMaster instances run on some of the nodes.]
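Planning the per-instance addresses can be sketched in Python. The example gives each instance on a host its own HTTP and transport port. The base ports 9200 and 9300 are the stock Elasticsearch defaults and are only an assumption here (a managed deployment may use different ports), the setting names follow Elasticsearch 7.x (http.port, transport.port), the node naming scheme is hypothetical, and the IP address is a documentation placeholder.

```python
from typing import Dict, List, Union

Setting = Union[str, int]


def plan_instances(host_ip: str, count: int,
                   http_base: int = 9200, transport_base: int = 9300) -> List[Dict[str, Setting]]:
    """Give each instance on one host a unique (IP, port) pair."""
    plans = []
    for i in range(count):
        plans.append({
            "node.name": f"{host_ip}-EsNode{i + 1}",  # hypothetical naming scheme
            "network.host": host_ip,
            "http.port": http_base + i,            # client (REST) traffic
            "transport.port": transport_base + i,  # node-to-node traffic
        })
    return plans


def to_yaml(plan: Dict[str, Setting]) -> str:
    # Flat "key: value" lines, the format used in elasticsearch.yml.
    return "\n".join(f"{key}: {value}" for key, value in plan.items())


plans = plan_instances("192.0.2.11", 5)
print(to_yaml(plans[1]))
```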

Elasticsearch Cross-node Replica Allocation Policy
• When multiple instances are deployed on a single node and an index has multiple replicas, a single point of failure may occur if replicas are only required to be allocated across instances: all copies of a shard may end up on instances of the same node, and that node's failure loses all of them. To solve this problem, set cluster.routing.allocation.same_shard.host to true so that copies of the same shard are allocated to different nodes.
[Figure: Node1 hosting instances EsNode1, EsNode2, ..., EsNodeN, with coll_shard_replica1 on EsNode1 and coll_shard_replica2 on EsNode2.]
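The effect of cluster.routing.allocation.same_shard.host can be modeled as an allocation check. In the Python sketch below, the instance names and the instance-to-host map are hypothetical. The check refuses a target instance that already holds a copy of the shard and, when the flag is on, also refuses any instance on a host that already holds a copy. Elasticsearch identifies a host by its host name and address; this sketch simply uses the node name.

```python
from typing import Dict, List


def can_allocate(holders: List[str], target: str,
                 host_of: Dict[str, str], same_shard_host: bool) -> bool:
    """Decide whether `target` may receive another copy of a shard.

    holders: instances that already hold a copy of the shard.
    host_of: maps each instance to the node (host) it runs on.
    """
    if target in holders:
        return False  # two copies of one shard never share an instance
    if same_shard_host:
        # With the flag on, also reject instances on a host that already
        # holds a copy, so one host failure cannot take out every copy.
        return all(host_of[h] != host_of[target] for h in holders)
    return True


host_of = {
    "Node1/EsNode1": "Node1",
    "Node1/EsNode2": "Node1",
    "Node2/EsNode1": "Node2",
}
primary_on = ["Node1/EsNode1"]
print(can_allocate(primary_on, "Node1/EsNode2", host_of, same_shard_host=False))  # True
print(can_allocate(primary_on, "Node1/EsNode2", host_of, same_shard_host=True))   # False
print(can_allocate(primary_on, "Node2/EsNode1", host_of, same_shard_host=True))   # True
```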

New Features of Elasticsearch
• HBase full-text indexing
  - After HBase tables are mapped to Elasticsearch indexes, the indexes are stored in Elasticsearch and the raw data in HBase. The HBase2ES tool is used for offline indexing.
• Encryption and authentication
  - Encryption and authentication are supported when users access Elasticsearch in a security cluster.

Quiz

1. What is the basic unit that can be indexed in Elasticsearch? ( )

2. Which type of data can be indexed using Elasticsearch? ( )


A. Structured data

B. Unstructured data

C. Semi-structured data

D. All of the above

Quiz

3. Which of the following open-source software is Elasticsearch built on? ( )


A. MySQL

B. MongoDB

C. Memcached

D. Lucene

Summary

• This chapter describes the basic concepts, functions, application scenarios, architecture, and key features of Elasticsearch. Understanding these key concepts and features allows you to better develop and use components.

Recommendations

• Huawei Cloud Official Web Link:
  - https://www.huaweicloud.com/intl/en-us/
• Huawei MRS Documentation:
  - https://www.huaweicloud.com/intl/en-us/product/mrs.html
• Huawei TALENT ONLINE:
  - https://e.huawei.com/en/talent/#/

Thank you.
Bring digital to every person, home, and organization for a fully connected, intelligent world.

Copyright© 2020 Huawei Technologies Co., Ltd. All Rights Reserved.

The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purposes only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.
