ACADEMY
Exasol - Overview and Concepts
                                 1
Base idea: Creating a tuning free database
– Strong cost-based query optimizer
    – Automatic and fast table analyzer
      (e.g. column selectivity)
    – Automatic & transparent index creation and maintenance
– Automatic resource management
    – Optimized for mixed workload
    – Throughput orientated
    – Usage of priorities possible to influence the resource management
– Automatic data distribution, compression, …
2                                     Exasol – Overview and Concepts                                ACADEMY
         Base idea: Creating a tuning free database
         Due to intelligent algorithms, Exasol gets even faster with its usage and not slower, as
         conventional DBMS do. At the same time, the amount of administrative work will be
         strongly reduced, because the on-going tuning is performed by Exasol itself. Exasol's
         performance is independent from chosen schema type. Data distribution is automatic on
         the basis of the usage profile. The optimizer analyzes both data and queries and matches
         them without the intervention of the DBA. The system automatically creates and
         administers the indexes on the fly on the basis of the query analysis.
         Strong cost-based query optimizer
         From the beginning, the query optimizer was designed to meet the needs of massively
         parallel data processing. Its goal is to ensure that every single node can process as much
         data locally as possible. This significantly reduces the communication overhead and
         contributes greatly to the excellent scalability of Exasol.
         The optimizer figures out the optimal join order on the basis of table statistics and
         therefore allows optimal processing of multiple table joins. Due to these sophisticated
         mechanisms, Exasol achieves optimal performance with any database model.
         Indexing
         Indexes are also automatically generated, reutilized and discarded by the system as
         necessary. The user can not directly influence this, the executed queries serve as
         foundation for the choice of index.
         Data distribution across nodes
         The system distributes each table automatically according its distribution attributes across
         all active nodes in the cluster (shared nothing architecture). Tables below a certain size
         (typically < 100K rows) will be replicated. This ensures, that table joins over their
         distribution attributes can mostly be processed locally.
         The user can affect data distribution by setting distribution attributes.
Shared nothing architecture (MPP processing)
                  SELECT s.SALES_DATE, s.MARKET_ID, sp.ARTICLE_ID
                  FROM RETAIL.SALES s JOIN RETAIL.SALES_POSITIONS sp
                  ON s.SALES_ID = sp.SALES_ID WHERE s.MARKET_ID IN (661, 534, 678, 1990);
    2014-09-17
    2014-09-17     661
                    661          94346
                                 94346         2014-08-02         534        96673      2014-08-09
                                                                                        2014-08-09         678
                                                                                                            678         94447
                                                                                                                          94447    2014-11-01         1990        96378
    2014-09-17      661           93086        2014-04-08         678        94826       2014-12-22         1990         93803     2014-11-01         1990        93447
    2014-11-28      534           93000        …                  …          …           …                  …            …         2014-06-21         661         94447
    …               …             …                                                                                                …                  …           …
        id    sales_date          market           id       sales_date       market
                                                                                                                                       id       sales_date       market
         7   2014-09-17                  661       4    2014-08-02               534         id       sales_date        market
                                                                                                                                        8   2014-11-01            1990
        10   2014-11-28                  534       1    2014-04-08               678         5    2014-08-09                 678
                                                                                                                                        2   2014-06-21               661
             …                                          …         id      article            1    2014-12-22              1990
                  id           article
                                                                                             1                                              …
                                                                   7     93086                               id      article                          id     article
                   7       94346
                                                                                                  …
                                                                   4     96673                                  8   93803                             8      96378
                  10       93000
                  11       93803                                   2     94447                                  1   94826                             5      94447
                   …       …                                       …     …                                      …   …                                 …      …
3                                                           Exasol – Overview and Concepts                                                                 ACADEMY
             Massively Parallel Data Processing
             Exasol was developed as a parallel system and is constructed according to the shared-
             nothing principle. Data is distributed across all nodes in a cluster. When responding to
             queries, all nodes co-operate and special parallel algorithms ensure that most data is
             processed locally in each individual node's main memory.
             When a query is sent to the system, it is first accepted by the node the client is connected
             to. The query is then distributed to all nodes. Intelligent algorithms optimize the query,
             determine the best plan of action and generate needed indexes on the fly. The system
             then processes the partial results based the local datasets. This processing paradigm is also
             known as SPMD (single program multiple data). All cluster nodes operate on an equal basis,
             there is no Master Node. The global query result is delivered back to the user through the
             original connection.
Column-based storage
– Table values are stored column wise
                                                                         SALES
                                        SALES_ID          SALES_DATE        PRICE       MARKET_ID
                     Row                    1   1          Becker
                                                            2014-04-08      Hans49.91       678
                                                                                        23000
                                        3       2          Weber
                                                           2014-06-21          54.65
                                                                            Peter           661
                                                                                        730000
                                        4       4          Huber
                                                           2014-08-02          49.08
                                                                            Klaus           534
                                                                                        39600
                                            5   5          Schmidt
                                                            2014-08-09        80.01
                                                                            Maria           678
                                                                                        124000
                                        6       7          Schneider
                                                            2014-09-17         43.14
                                                                            Thomas         661
                                                                                        93600
                                        22      10         Fischer
                                                            2014-11-28         63.04
                                                                            Stefan          534
                                                                                        368200
4                                   Exasol – Overview and Concepts                                  ACADEMY
       Column-based storage
       Due to Exasol’s specialization on the data warehousing it benefits from the column-based
       data storage, reducing the number of IO operations and the overall amount of processed
       data. Typically, queries in a data warehouse access only few columns (eg. when joining
       tables).
       In order to optimize access to the hard disk, columns are partitioned into blocks. This
       facilitates maximum throughput and prevents unnecessary data from being imported. Gaps
       can occur in the blocks as a result of various operations (e.g. deletions or updates);
       however, the system automatically conducts a defragmentation if certain limits are
       exceeded.
Compression
– Compression
  ▪ Faster hard disc access
  ▪ Less RAM required
                                                                       SALES
                                    SALES_ID            SALES_DATE        PRICE       MARKET_ID
                                       1   1               …
                                                          2014-04-08      49.91
                                                                              49.91    678678
                                   2       2               …
                                                          2014-06-21      54.65
                                                                              54.65    661661
                                   4       4               …
                                                          2014-08-02      49.08
                                                                              49.08    534534
                                       5   5                …
                                                          2014-08-09      80.01
                                                                              80.01    678678
                                   7       7                 …
                                                          2014-09-17       43.14
                                                                             43.14     661661
                                    10 10                  …
                                                          2014-11-28       63.04
                                                                              63.04    534534
5                                 Exasol – Overview and Concepts                                  ACADEMY
       Data compression
       To optimize RAM utilization, table data is compressed element by element already in main
       memory; on basis of the data types and content of the columns, the system automatically
       selects a sufficiently effective compression algorithm. Compression is fully transparent to
       the user.
Data blocks
– Several values of one column are collected within a block
                                                                      SALES
                                    SALES_ID           SALES_DATE        PRICE        MARKET_ID
                                         1               2014-04-08           49.91      678
                                         2               2014-06-21           54.65      661
                      Block 1
                                         4               2014-08-02           49.08      534
                                                                                                  Block x
                                         5               2014-08-09           80.01      678
                                         7               2014-09-17           43.14      661
                      Block 2                            2014-11-28           63.04      534
                                        10
6                                    Exasol – Overview and Concepts                                ACADEMY
       Data blocks
       All columns are devided into data blocks to minimize the amount of data loaded or written
       to disc. Data that is not needed for a query is not loaded into the RAM.
       Blocks may include different numbers of elements, depending on the column type and the
       compression algorithm. Due to delete operations holes may occur within blocks.
       These holes are automatically refilled by the system.
Data block types
– Three different types of data blocks:
     1. Persistent (Data for persistent tables)
     2. Temporary (created during query execution)
     3. Indexes
– All block types are transparently loaded into RAM on demand
– All block types are treated the same way
7                                 Exasol – Overview and Concepts                        ACADEMY
       Data block types
       There are three different types of data stored blocks:
       Persistent
       Temporary
       Indexes
       Persistent: Data for persistent tables
       Temporary: Data created during the query execution (aggregates, sorting, …)
       Indexes: Data for internal indexes
       All these block types are transparently loaded into the main memory on demand.
       All the block types are handled in the same way.
In-Memory processing
                Query 1                    Query 2               Query 3
                                                                                          DB RAM
           1     Smith
                                         Virtual Storage
8                               Exasol – Overview and Concepts                                  ACADEMY
     In-Memory processing
     Exasol achieves its high performance as a result of innovative main memory algorithms.
     Unlike the hard-disk-based algorithms of traditional solutions, Exasol can specifically
     access any single value within nanoseconds. The algorithms that process the queries take
     advantage of these characteristics of the main memory and thus enable optimum
     performance. Exasol further enhances performance by automatically adjusting the
     contents of main memory according to the respective usage profile.
     Upon completion of a write operation, data is commited to the local hard disks. Built-in
     redundant data distribution also guarantees high database availability.
     This method of data processing is fully transparent for the user.
Hardware
– Utilization of commodity hardware                               - Industry-Standard 19'' Server
    – Clustering of a (large) number of low-cost
      components
    – Free vendor choice:
      Dell, HP, IBM, FSC, Oracle (Sun) …
                                                                     –   2 Hexa/Ten/Twelve Core CPUs
                                                                     –   16 – 786 GB RAM
                                                                     –   2 – 24 SAS/SATA HDD
                                                                     –   GBit Ethernet (1GiB, 10GiB)
9                                       Exasol – Overview and Concepts                                 ACADEMY
         Hardware
         Exasol is implemented to work with low-cost commodity hardware. Exasol typically
         operates on a cluster of powerful 19'' Intel servers.
         Typically each server will be configured as follows:
         •2 Intel Xeon CPUs each with 8 up to 12 cores,
         •16 to 786 GB RAM and
         •2 to 24 SATA or SAS hard disks.
         Network connectivity is typically based on standard GBit Ethernet.
         Such servers can be delivered by nearly every hardware vendor.
Exasol Logical Limits
– Maximum number of schema objects within a database (tables, views, functions, scripts):
  – 250,000
– Maximum number of columns per table:
  – 10,000
– Identifier length:
  – 128 Characters
– Supported Character Sets:
  – UTF8
  – ASCII
10                                   Exasol – Overview and Concepts                         ACADEMY
       Exasol Logical Limits
       The listed limits may change in future versions of Exasol.