ClickHouse table engine · Issue #91 · h2oai/db-benchmark · GitHub

@jangorecki

ClickHouse has already been added to the benchmark script and report, although some items are still pending before #73 can be closed fully. A new question has come up: which table engine should be used? There are generally two options to consider:

  • To achieve maximum performance, one would expect to use the memory table engine. The linked docs say:

Normally, using this table engine is not justified. However, it can be used for tests, and for tasks where maximum speed is required on a relatively small number of rows (up to approximately 100,000,000).

  • On the other hand, there is the on-disk merge tree table engine. All other tools we benchmark currently use in-memory data, so using on-disk storage for ClickHouse might be unfair.

That said, the merge tree table engine sometimes yields faster query execution times than the memory engine (see timings below). Moreover, with the current ClickHouse version, all 1e9-row (50 GB) datasets fail when writing data to in-memory tables:

Code: 210. DB::NetException: Connection reset by peer, while writing to socket (127.0.0.1:9000)

One could call this "as documented", because 1e9 rows is 10 times more than the docs suggest for in-memory tables.


As of now, we run benchmarks on both table engines.
For the memory table engine we use the G1 data file prefix. For the merge tree table engine we use the G2 prefix (an extra 1:N column has been added, as required by merge tree). Timings for both table engines land in time.csv.
What lands on the benchplot on the report page is the in-memory timing, unless there is none, in which case the merge tree timing is used. The logic for that can be found in report.R#L141-L164.
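
The fallback rule described above (prefer the in-memory timing, otherwise use the merge tree timing) can be sketched roughly as follows. This is only an illustration in Python, not the actual code in report.R: the column names `data`, `question` and `time_sec`, the data names, and the timing values are all assumptions invented for the example.

```python
import csv
import io

# Hypothetical excerpt of time.csv. Column names, data names and timing
# values are invented for illustration only; the real file differs.
SAMPLE = """data,question,time_sec
G1_1e7_1e2_0_0,sum v1 by id1,0.30
G2_1e7_1e2_0_0,sum v1 by id1,0.25
G2_1e9_1e2_0_0,sum v1 by id1,12.50
"""


def pick_timings(rows):
    """Per (dataset, question), keep the memory timing if one exists,
    otherwise fall back to the merge tree timing."""
    chosen = {}
    for row in rows:
        if not row["time_sec"]:
            continue  # a failed run without a timing counts as missing
        # G1 prefix = memory engine, G2 prefix = merge tree engine.
        prefix, dataset = row["data"].split("_", 1)
        engine = "memory" if prefix == "G1" else "mergetree"
        key = (dataset, row["question"])
        # A memory timing always wins; merge tree only fills a gap.
        if key not in chosen or engine == "memory":
            chosen[key] = (engine, float(row["time_sec"]))
    return chosen


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
result = pick_timings(rows)
for (dataset, question), (engine, seconds) in sorted(result.items()):
    print(dataset, question, engine, seconds)
```

In this made-up sample, the 1e7 dataset is reported with its memory timing even though the merge tree run was faster, while the 1e9 dataset falls back to merge tree because no memory timing exists. That mix of engines on one plot is what switching to merge tree timings only would avoid.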


The question is, should we switch to merge tree timings only?

cc @alexey-milovidov
