KEMBAR78
Exasol: Tuning-Free Database Guide | PDF | Computer Data Storage | Databases
0% found this document useful (0 votes)
93 views10 pages

Exasol: Tuning-Free Database Guide

Exasol is a tuning-free database that automatically manages resources and data distribution across nodes through intelligent algorithms. It uses a column-based storage model and massively parallel processing (MPP) architecture to optimize query performance. The query optimizer analyzes data and queries to determine optimal execution plans and transparently manages indexing without user intervention.

Uploaded by

Peter
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
93 views10 pages

Exasol: Tuning-Free Database Guide

Exasol is a tuning-free database that automatically manages resources and data distribution across nodes through intelligent algorithms. It uses a column-based storage model and massively parallel processing (MPP) architecture to optimize query performance. The query optimizer analyzes data and queries to determine optimal execution plans and transparently manages indexing without user intervention.

Uploaded by

Peter
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 10

ACADEMY

Exasol - Overview and Concepts

1
Base idea: Creating a tuning free database

– Strong cost-based query optimizer


– Automatic and fast table analyzer
(e.g. column selectivity)
– Automatic & transparent index creation and maintenance

– Automatic resource management


– Optimized for mixed workload
– Throughput orientated
– Usage of priorities possible to influence the resource management

– Automatic data distribution, compression, …

2 Exasol – Overview and Concepts ACADEMY

Base idea: Creating a tuning free database

Due to intelligent algorithms, Exasol gets even faster with its usage and not slower, as
conventional DBMS do. At the same time, the amount of administrative work will be
strongly reduced, because the on-going tuning is performed by Exasol itself. Exasol's
performance is independent from chosen schema type. Data distribution is automatic on
the basis of the usage profile. The optimizer analyzes both data and queries and matches
them without the intervention of the DBA. The system automatically creates and
administers the indexes on the fly on the basis of the query analysis.
Strong cost-based query optimizer
From the beginning, the query optimizer was designed to meet the needs of massively
parallel data processing. Its goal is to ensure that every single node can process as much
data locally as possible. This significantly reduces the communication overhead and
contributes greatly to the excellent scalability of Exasol.
The optimizer figures out the optimal join order on the basis of table statistics and
therefore allows optimal processing of multiple table joins. Due to these sophisticated
mechanisms, Exasol achieves optimal performance with any database model.
Indexing
Indexes are also automatically generated, reutilized and discarded by the system as
necessary. The user can not directly influence this, the executed queries serve as
foundation for the choice of index.
Data distribution across nodes
The system distributes each table automatically according its distribution attributes across
all active nodes in the cluster (shared nothing architecture). Tables below a certain size
(typically < 100K rows) will be replicated. This ensures, that table joins over their
distribution attributes can mostly be processed locally.
The user can affect data distribution by setting distribution attributes.
Shared nothing architecture (MPP processing)
SELECT s.SALES_DATE, s.MARKET_ID, sp.ARTICLE_ID
FROM RETAIL.SALES s JOIN RETAIL.SALES_POSITIONS sp
ON s.SALES_ID = sp.SALES_ID WHERE s.MARKET_ID IN (661, 534, 678, 1990);

2014-09-17
2014-09-17 661
661 94346
94346 2014-08-02 534 96673 2014-08-09
2014-08-09 678
678 94447
94447 2014-11-01 1990 96378

2014-09-17 661 93086 2014-04-08 678 94826 2014-12-22 1990 93803 2014-11-01 1990 93447

2014-11-28 534 93000 … … … … … … 2014-06-21 661 94447

… … … … … …

id sales_date market id sales_date market


id sales_date market
7 2014-09-17 661 4 2014-08-02 534 id sales_date market
8 2014-11-01 1990
10 2014-11-28 534 1 2014-04-08 678 5 2014-08-09 678
2 2014-06-21 661
… … id article 1 2014-12-22 1990
id article
1 …
7 93086 id article id article
7 94346

4 96673 8 93803 8 96378
10 93000
11 93803 2 94447 1 94826 5 94447
… … … … … … … …

3 Exasol – Overview and Concepts ACADEMY

Massively Parallel Data Processing

Exasol was developed as a parallel system and is constructed according to the shared-
nothing principle. Data is distributed across all nodes in a cluster. When responding to
queries, all nodes co-operate and special parallel algorithms ensure that most data is
processed locally in each individual node's main memory.
When a query is sent to the system, it is first accepted by the node the client is connected
to. The query is then distributed to all nodes. Intelligent algorithms optimize the query,
determine the best plan of action and generate needed indexes on the fly. The system
then processes the partial results based the local datasets. This processing paradigm is also
known as SPMD (single program multiple data). All cluster nodes operate on an equal basis,
there is no Master Node. The global query result is delivered back to the user through the
original connection.
Column-based storage

– Table values are stored column wise

SALES
SALES_ID SALES_DATE PRICE MARKET_ID

Row 1 1 Becker
2014-04-08 Hans49.91 678
23000

3 2 Weber
2014-06-21 54.65
Peter 661
730000

4 4 Huber
2014-08-02 49.08
Klaus 534
39600

5 5 Schmidt
2014-08-09 80.01
Maria 678
124000

6 7 Schneider
2014-09-17 43.14
Thomas 661
93600

22 10 Fischer
2014-11-28 63.04
Stefan 534
368200

4 Exasol – Overview and Concepts ACADEMY

Column-based storage

Due to Exasol’s specialization on the data warehousing it benefits from the column-based
data storage, reducing the number of IO operations and the overall amount of processed
data. Typically, queries in a data warehouse access only few columns (eg. when joining
tables).
In order to optimize access to the hard disk, columns are partitioned into blocks. This
facilitates maximum throughput and prevents unnecessary data from being imported. Gaps
can occur in the blocks as a result of various operations (e.g. deletions or updates);
however, the system automatically conducts a defragmentation if certain limits are
exceeded.
Compression

– Compression
▪ Faster hard disc access
▪ Less RAM required
SALES
SALES_ID SALES_DATE PRICE MARKET_ID

1 1 …
2014-04-08 49.91
49.91 678678

2 2 …
2014-06-21 54.65
54.65 661661

4 4 …
2014-08-02 49.08
49.08 534534

5 5 …
2014-08-09 80.01
80.01 678678

7 7 …
2014-09-17 43.14
43.14 661661

10 10 …
2014-11-28 63.04
63.04 534534

5 Exasol – Overview and Concepts ACADEMY

Data compression
To optimize RAM utilization, table data is compressed element by element already in main
memory; on basis of the data types and content of the columns, the system automatically
selects a sufficiently effective compression algorithm. Compression is fully transparent to
the user.
Data blocks

– Several values of one column are collected within a block

SALES
SALES_ID SALES_DATE PRICE MARKET_ID

1 2014-04-08 49.91 678

2 2014-06-21 54.65 661


Block 1
4 2014-08-02 49.08 534
Block x
5 2014-08-09 80.01 678

7 2014-09-17 43.14 661


Block 2 2014-11-28 63.04 534
10

6 Exasol – Overview and Concepts ACADEMY

Data blocks

All columns are devided into data blocks to minimize the amount of data loaded or written
to disc. Data that is not needed for a query is not loaded into the RAM.
Blocks may include different numbers of elements, depending on the column type and the
compression algorithm. Due to delete operations holes may occur within blocks.
These holes are automatically refilled by the system.
Data block types

– Three different types of data blocks:


1. Persistent (Data for persistent tables)
2. Temporary (created during query execution)
3. Indexes

– All block types are transparently loaded into RAM on demand


– All block types are treated the same way

7 Exasol – Overview and Concepts ACADEMY

Data block types

There are three different types of data stored blocks:


Persistent
Temporary
Indexes

Persistent: Data for persistent tables


Temporary: Data created during the query execution (aggregates, sorting, …)
Indexes: Data for internal indexes

All these block types are transparently loaded into the main memory on demand.
All the block types are handled in the same way.
In-Memory processing

Query 1 Query 2 Query 3

DB RAM

1 Smith

Virtual Storage

8 Exasol – Overview and Concepts ACADEMY

In-Memory processing

Exasol achieves its high performance as a result of innovative main memory algorithms.
Unlike the hard-disk-based algorithms of traditional solutions, Exasol can specifically
access any single value within nanoseconds. The algorithms that process the queries take
advantage of these characteristics of the main memory and thus enable optimum
performance. Exasol further enhances performance by automatically adjusting the
contents of main memory according to the respective usage profile.
Upon completion of a write operation, data is commited to the local hard disks. Built-in
redundant data distribution also guarantees high database availability.
This method of data processing is fully transparent for the user.
Hardware

– Utilization of commodity hardware - Industry-Standard 19'' Server


– Clustering of a (large) number of low-cost
components
– Free vendor choice:
Dell, HP, IBM, FSC, Oracle (Sun) …

– 2 Hexa/Ten/Twelve Core CPUs


– 16 – 786 GB RAM
– 2 – 24 SAS/SATA HDD
– GBit Ethernet (1GiB, 10GiB)

9 Exasol – Overview and Concepts ACADEMY

Hardware

Exasol is implemented to work with low-cost commodity hardware. Exasol typically


operates on a cluster of powerful 19'' Intel servers.
Typically each server will be configured as follows:
•2 Intel Xeon CPUs each with 8 up to 12 cores,
•16 to 786 GB RAM and
•2 to 24 SATA or SAS hard disks.
Network connectivity is typically based on standard GBit Ethernet.
Such servers can be delivered by nearly every hardware vendor.
Exasol Logical Limits

– Maximum number of schema objects within a database (tables, views, functions, scripts):
– 250,000
– Maximum number of columns per table:
– 10,000
– Identifier length:
– 128 Characters
– Supported Character Sets:
– UTF8
– ASCII

10 Exasol – Overview and Concepts ACADEMY

Exasol Logical Limits

The listed limits may change in future versions of Exasol.

You might also like