Some improvements and practices of
HBase at Xiaomi
Duo Zhang, Liangliang He
{zhangduo, heliangliang}@xiaomi.com
About Xiaomi
Xiaomi Inc. (literally "millet technology") is a privately owned Chinese
electronics company headquartered in Beijing.
▶ Sold 70m+ smartphones in 2015
▶ 100m+ DAU for MIUI
▶ Lots of other smart devices (Mi Band, Air Purifier, etc.)
2 / 38
Our HDFS/HBase Team
▶ 9 Developers
▶ Honghua Feng
▶ Jianwei Cui
▶ Liangliang He
▶ YingChao Zhou
▶ Guanghao Zhang
▶ Shaohui Liu
▶ Chen Zhang
▶ Zhe Yang
▶ Duo Zhang
3 / 38
Agenda
1. Current Status
2. Problems and Solutions
3. HBase as a service
4 / 38
Clusters and Scenarios
▶ Traditional IDC
20+ online clusters / 2 offline clusters, 3 data centers
▶ AWS
5 online clusters / 1 offline cluster, 3 AWS regions
▶ Online Service
MiCloud, MiPush, SDS, Metrics...
▶ Offline Processing
User Profile, Distributed Trace, Recommendation, ...
5 / 38
Online Scenario: MiCloud
Personal cloud storage for smart phones
Numbers
▶ 100+ million users
▶ 1+ trillion rows
▶ 1600+ regions in the largest table
See: https://i.mi.com
6 / 38
Offline Scenario: User Profile
▶ Input data is replicated from the online cluster to the offline cluster
▶ Output data is written to the offline cluster and replicated back to the online cluster
Numbers
▶ 200+ million users
▶ Both batch and streaming processing
7 / 38
Agenda
1. Current Status
2. Problems and Solutions
3. HBase as a service
8 / 38
Per-CF Flush
HBase book, section 34, On the number of column families:
HBase currently does not do well with anything above two or
three column families ... if one column family is carrying the
bulk of the data bringing on flushes, the adjacent families will
also be flushed even though the amount of data they carry is
small ...
So let’s not flush the small families
9 / 38
Per-CF Flush
▶ Why must we flush all families?
  ▶ Our sequence id accounting is per region.
  ▶ We cannot know the lowest unflushed sequence id for each family.
▶ Track sequence ids per store, i.e., per family (see the sketch below)
  ▶ Map<RegionName, SequenceId> becomes
    Map<RegionName, Map<FamilyName, SequenceId>>
  ▶ SequenceId map in the WAL implementation
  ▶ FlushedSequenceId in ServerManager at the master
▶ Report a map of flushed sequence ids to the master (thanks to protobuf for compatibility)
▶ Skip WAL cells per store when replaying
▶ FlushPolicy
  ▶ FlushAllStoresPolicy
  ▶ FlushLargeStoresPolicy
10 / 38
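A minimal sketch of the decision FlushLargeStoresPolicy makes, assuming per-family memstore sizes and a single lower-bound threshold; the class, field and method names here are illustrative, not the actual HBase internals. It picks only the families whose memstores are large enough, and falls back to flushing everything when none qualifies.

  import java.util.ArrayList;
  import java.util.List;
  import java.util.Map;

  // Illustrative sketch of the "flush only the large stores" decision.
  // Family names and sizes are plain values here; the real policy works
  // on Store objects inside the region server.
  public class FlushLargeStoresSketch {

      // Lower bound below which a family's memstore is considered too small
      // to be worth flushing on its own (assumed value for illustration).
      private final long flushSizeLowerBound;

      public FlushLargeStoresSketch(long flushSizeLowerBound) {
          this.flushSizeLowerBound = flushSizeLowerBound;
      }

      /** Select the families to flush; flush all of them if none is large enough. */
      public List<String> selectFamiliesToFlush(Map<String, Long> memstoreSizeByFamily) {
          List<String> toFlush = new ArrayList<>();
          for (Map.Entry<String, Long> e : memstoreSizeByFamily.entrySet()) {
              if (e.getValue() >= flushSizeLowerBound) {
                  toFlush.add(e.getKey());
              }
          }
          // If every family is small we still have to release memory, so flush all.
          return toFlush.isEmpty()
              ? new ArrayList<>(memstoreSizeByFamily.keySet()) : toFlush;
      }
  }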
Per-CF Flush
▶ Flushing is not only used for releasing memory
  ▶ WAL truncation
  ▶ Region merge, split, move...
  ▶ Bulk load
▶ Introduce a 'force' flag (sketched below)
  ▶ When set, always flush all families regardless of which FlushPolicy is in use
11 / 38
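To make the 'force' flag concrete, here is a simplified stand-in (the request and policy types are made up for illustration) showing where the flag bypasses the configured FlushPolicy, so that WAL truncation, merge/split/move and bulk load always see a fully flushed region.

  import java.util.Collection;
  import java.util.Map;

  // Simplified stand-ins showing where the 'force' flag short-circuits the policy.
  public class ForcedFlushSketch {
      interface FlushPolicy {
          Collection<String> selectStoresToFlush(Map<String, Long> memstoreSizeByFamily);
      }

      static Collection<String> storesToFlush(
              boolean force,                        // true for WAL truncation, split, move, bulk load
              Map<String, Long> memstoreSizeByFamily,
              FlushPolicy policy) {
          if (force) {
              return memstoreSizeByFamily.keySet(); // flush every family regardless of policy
          }
          return policy.selectStoresToFlush(memstoreSizeByFamily);
      }
  }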
Per-CF Flush
▶ First introduced in HBase-1.1.x, default is FlushAllStoresPolicy
▶ In HBase-1.2.x, default is FlushLargeStoresPolicy
▶ HBASE-10201, HBASE-12405
12 / 38
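For anyone trying this out, a hedged sketch of how the policy can be selected per table on 1.1+; it assumes the HTableDescriptor#setFlushPolicyClassName method added with HBASE-10201, so check your client version before relying on it.

  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.TableName;

  // Hedged sketch: selecting the flush policy per table on HBase 1.1+.
  // Table and family names are made up for illustration.
  public class PerCfFlushTableSetup {
      public static HTableDescriptor describe() {
          HTableDescriptor htd = new HTableDescriptor(TableName.valueOf("user_profile"));
          htd.addFamily(new HColumnDescriptor("big"));    // carries the bulk of the data
          htd.addFamily(new HColumnDescriptor("small"));  // rarely written
          // Flush only the families whose memstores are actually large.
          htd.setFlushPolicyClassName(
              "org.apache.hadoop.hbase.regionserver.FlushLargeStoresPolicy");
          return htd;
      }
  }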
Async WAL
13 / 38
Async WAL
Problem: FSHLog
▶ DFSOutputStream is too complicated and hard to optimize
▶ Pipeline recovery
▶ Needs multiple SyncRunner threads to simulate an event-driven model
▶ Chained pipeline: an ack pays roughly 3x the single-hop latency
14 / 38
Async WAL
15 / 38
Async WAL
Solution: AsyncFSWAL and FanOutOneBlockAsyncDFSOutput
▶ Simple: it only needs to write a single block
▶ Fail-fast
▶ Everything is done on netty's EventLoop, fully event-driven
▶ Fan out: write to the 3 datanodes concurrently (sketched below)
16 / 38
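The latency difference can be illustrated with a toy model: a chained pipeline acknowledges after three sequential hops, while the fan-out writer sends to the three datanodes concurrently and waits for all acks. The sketch below only models that shape with CompletableFuture; it is not the FanOutOneBlockAsyncDFSOutput implementation, and sendTo() is a placeholder.

  import java.util.concurrent.CompletableFuture;

  // Toy model of "chained pipeline" vs "fan out" acknowledgement latency.
  // sendTo() stands in for one network hop to a datanode; the real output
  // stream does this with netty channels inside a single EventLoop.
  public class FanOutVsPipelineSketch {

      static CompletableFuture<Void> sendTo(String datanode, byte[] data) {
          // Placeholder for an async write + ack from one datanode.
          return CompletableFuture.completedFuture(null);
      }

      // Chained pipeline: DN1 -> DN2 -> DN3; the hops are sequential.
      static CompletableFuture<Void> pipelineWrite(byte[] data) {
          return sendTo("dn1", data)
              .thenCompose(v -> sendTo("dn2", data))
              .thenCompose(v -> sendTo("dn3", data));
      }

      // Fan out: write to all three replicas concurrently, wait for all acks.
      static CompletableFuture<Void> fanOutWrite(byte[] data) {
          return CompletableFuture.allOf(
              sendTo("dn1", data), sendTo("dn2", data), sendTo("dn3", data));
      }
  }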
Async WAL
Implementation:
▶ Why not Disruptor?
  ▶ We must not block the EventLoop thread
▶ Submit a consumer task only when there are entries in the queue (see the sketch below)
  ▶ Avoids submitting a task for every entry
▶ SASL and encryption support
▶ Compatible with Hadoop 2.4.x through 2.7.x
  ▶ Classes and methods have been changed, moved, removed, etc.
  ▶ Abstract a common interface
  ▶ Reflection
17 / 38
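The "submit a consumer task only when the queue is non-empty" point is essentially a flag-guarded hand-off, so the EventLoop is neither blocked nor flooded with one task per appended entry. Below is a generic sketch of that pattern, not the AsyncFSWAL code; the class and method names are invented for illustration.

  import java.util.Queue;
  import java.util.concurrent.ConcurrentLinkedQueue;
  import java.util.concurrent.Executor;
  import java.util.concurrent.atomic.AtomicBoolean;

  // Generic sketch of the "schedule one consumer task per batch" pattern.
  // Producers enqueue entries; a consumer task is submitted to the executor
  // (the netty EventLoop in the WAL case) only if one is not already scheduled.
  public class BatchedConsumer<T> {
      private final Queue<T> queue = new ConcurrentLinkedQueue<>();
      private final AtomicBoolean consumerScheduled = new AtomicBoolean(false);
      private final Executor eventLoop;

      public BatchedConsumer(Executor eventLoop) {
          this.eventLoop = eventLoop;
      }

      public void append(T entry) {
          queue.add(entry);
          // Avoid submitting one task per entry: only schedule if none is pending.
          if (consumerScheduled.compareAndSet(false, true)) {
              eventLoop.execute(this::consume);
          }
      }

      private void consume() {
          T entry;
          while ((entry = queue.poll()) != null) {
              process(entry);          // must be non-blocking on the EventLoop
          }
          consumerScheduled.set(false);
          // Re-check: a producer may have enqueued after the drain but before the reset.
          if (!queue.isEmpty() && consumerScheduled.compareAndSet(false, true)) {
              eventLoop.execute(this::consume);
          }
      }

      protected void process(T entry) { /* e.g. write the entry to the WAL stream */ }
  }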
Async WAL
Performance numbers (WALPE):

Threads  Default (s)  Async (s)  Diff
      1          837        228  3.7x
      3          647        274  2.4x
      5          609        310  2.0x
     10          916        376  2.5x
     25         1177        556  2.1x
     50         1463        828  1.8x
    100         1902       1382  1.4x

▶ Why does the improvement shrink as threads increase?
  ▶ High latency ≠ low throughput
  ▶ Increasing concurrency can increase throughput
  ▶ The bottleneck is the HDD under high workload
▶ YCSB write qps: roughly 14.3k vs 16.3k, about 10-15% more throughput
18 / 38
Async WAL
▶ Available in HBase-2.0
▶ Also the default WAL implementation in HBase-2.0
▶ Plan to push the AsyncFSOutput-related code upstream to HDFS
▶ HBASE-14790
19 / 38
Revisit the semantics of Delete
Problem: the 'Delete Version' problem
▶ Let MaxVersions = 2 and timestamps T1 < T2 < T3
Ordering A:
1. Put T1, T2, T3
2. Major compaction
3. Delete version T2
Ordering B:
1. Put T1, T2, T3
2. Delete version T2
3. Major compaction
Result: {T3} (A) vs. {T3, T1} (B)
20 / 38
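To make the two orderings reproducible, a rough client-side sketch using the standard Put/Delete/Admin APIs with explicit timestamps; it assumes a table whose family was created with VERSIONS => 2, and the row, family and qualifier names are made up. Whether the final read returns {T3} or {T3, T1} depends only on when the major compaction runs.

  import org.apache.hadoop.hbase.client.Admin;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.Delete;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;

  // Sketch of the two orderings from the slide (family created with VERSIONS => 2,
  // timestamps T1 < T2 < T3). Only the relative order of the Delete and the major
  // compaction differs between the two helpers.
  public class DeleteVersionOrdering {
      static final byte[] ROW = Bytes.toBytes("row");
      static final byte[] CF  = Bytes.toBytes("f");
      static final byte[] Q   = Bytes.toBytes("q");

      static void putThreeVersions(Table t, long t1, long t2, long t3) throws Exception {
          t.put(new Put(ROW).addColumn(CF, Q, t1, Bytes.toBytes("v1")));
          t.put(new Put(ROW).addColumn(CF, Q, t2, Bytes.toBytes("v2")));
          t.put(new Put(ROW).addColumn(CF, Q, t3, Bytes.toBytes("v3")));
      }

      // Ordering A: compaction first drops T1 (beyond max versions), then the
      // Delete removes the T2 version, so only T3 remains visible.
      static void compactThenDelete(Connection conn, Table t, long t2) throws Exception {
          try (Admin admin = conn.getAdmin()) {
              admin.majorCompact(t.getName()); // asynchronous; a real repro must wait for it
          }
          t.delete(new Delete(ROW).addColumn(CF, Q, t2));
      }

      // Ordering B: the Delete masks T2 first; the later compaction then keeps
      // two versions of what is left, so T1 becomes visible again next to T3.
      static void deleteThenCompact(Connection conn, Table t, long t2) throws Exception {
          t.delete(new Delete(ROW).addColumn(CF, Q, t2));
          try (Admin admin = conn.getAdmin()) {
              admin.majorCompact(t.getName());
          }
      }
  }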
Revisit the semantics of Delete
Problem: a Delete affects a newer Put (one with a higher sequence id)
▶ Let timestamps T1 < T2
Ordering A:
▶ Delete all versions less than T2
▶ Major compaction
▶ Put T1
Ordering B:
▶ Delete all versions less than T2
▶ Put T1
▶ Major compaction
Result: T1 survives (A) vs. nothing (B)
21 / 38
Revisit the semantics of Delete
▶ Not a big problem? It depends.
  ▶ Major compaction is a low-frequency operation
  ▶ A single cluster follows only one of the two paths, so the result is deterministic
▶ What if we use replication?
  The two clusters may take different paths: eventual inconsistency
22 / 38
Revisit the semantics of Delete
Solution: also consider the sequence id
▶ Once a value is invisible, it should never appear again
▶ A modified scanner that also considers the sequence id when deciding visibility (sketched below)
  ▶ Cannot use the max timestamp to exclude store files when scanning
▶ A Delete should not affect a Put with a higher sequence id
▶ Possibly a table-level config to turn this on
23 / 38
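Conceptually, the change is that a delete marker only masks a put when the marker is newer in sequence id as well, not just in timestamp. A stripped-down sketch of that visibility rule follows; real cells and delete markers carry much more state than this.

  // Stripped-down sketch of the "also consider sequence id" visibility rule.
  // A delete marker masks a put only if it is newer in both timestamp and
  // sequence id, so a later Put can never be hidden by an earlier Delete.
  public class SequenceIdAwareVisibility {

      static final class CellInfo {
          final long timestamp;
          final long sequenceId;
          CellInfo(long timestamp, long sequenceId) {
              this.timestamp = timestamp;
              this.sequenceId = sequenceId;
          }
      }

      /** Classic rule: a delete covering this timestamp hides the put. */
      static boolean maskedByTimestampOnly(CellInfo put, CellInfo delete) {
          return delete.timestamp >= put.timestamp;
      }

      /** Modified rule: the delete must also have been written after the put. */
      static boolean maskedWithSequenceId(CellInfo put, CellInfo delete) {
          return delete.timestamp >= put.timestamp
              && delete.sequenceId > put.sequenceId;
      }
  }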
Revisit the semantics of Delete
▶ Enough?
▶ Not really, at least for replication
  ▶ WAL entries for the same Cell must be shipped in ascending order of sequence id
▶ HBASE-2256, HBASE-8721, HBASE-8770...
24 / 38
Multi-Tenancy Practice
Differences from the trunk HBase quota implementation:
▶ Requests are size-weighted when counted against the quota
▶ Quotas are per user instead of per regionserver
  ▶ Assumes the workload is evenly distributed across regions
▶ Soft qps limit, like DynamoDB
  ▶ Configurable qps quota limit for each regionserver
  ▶ A user can exceed its quota if the regionserver has free quota
▶ Transparent client-side auto backoff when the quota is exceeded (sketched below)
25 / 38
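A rough sketch of the two client-visible pieces described above: size-weighted accounting and transparent backoff instead of an immediate error when the per-user quota is exceeded. The class name, the per-KB weighting and the backoff parameters are assumptions for illustration, not the actual Xiaomi implementation.

  import java.util.concurrent.TimeUnit;
  import java.util.concurrent.atomic.AtomicLong;

  // Sketch of size-weighted quota accounting with transparent client-side backoff.
  // A large scan "costs" more units than a tiny get; when the user is over quota
  // the client sleeps and retries instead of surfacing an error to the caller.
  public class SoftQuotaSketch {
      private final long weightedQpsQuota;                // per-user quota on this regionserver
      private final AtomicLong consumedThisSecond = new AtomicLong();
      // consumedThisSecond would be reset to 0 by a timer once per second (omitted here).

      public SoftQuotaSketch(long weightedQpsQuota) {
          this.weightedQpsQuota = weightedQpsQuota;
      }

      /** Requests are weighted by size, e.g. one unit per started KB. */
      static long weight(long requestSizeBytes) {
          return Math.max(1, (requestSizeBytes + 1023) / 1024);
      }

      /** Returns true if the request fits into the remaining quota. */
      public boolean tryAcquire(long requestSizeBytes) {
          long w = weight(requestSizeBytes);
          if (consumedThisSecond.addAndGet(w) <= weightedQpsQuota) {
              return true;
          }
          consumedThisSecond.addAndGet(-w);                // roll back, caller must back off
          return false;
      }

      /** Transparent client-side backoff: retry with a capped exponential sleep. */
      public void acquireWithBackoff(long requestSizeBytes) throws InterruptedException {
          long sleepMs = 10;
          while (!tryAcquire(requestSizeBytes)) {
              TimeUnit.MILLISECONDS.sleep(sleepMs);
              sleepMs = Math.min(sleepMs * 2, 1000);
          }
      }
  }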
Cross Data-Center Failover Practice
Modifications to HBase:
▶ HBase nameservice
▶ Read-write switch in the client configuration
▶ Dynamic configuration via ZooKeeper
▶ Record the last synced WAL write time when updating the replication log position
Failover steps (sketched below):
▶ Check and make sure replication is in sync
▶ Stop writes by updating the config in ZooKeeper
▶ Wait for replication to finish by checking the sync time of the last replicated log
▶ Switch the master cluster and re-enable writes
26 / 38
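The failover steps map to a small orchestration loop: flip the write switch in ZooKeeper, wait until the last-synced WAL write time recorded by replication passes the moment writes stopped, then switch the active cluster. The znode paths, data formats and polling below are illustrative assumptions, not the actual nameservice layout.

  import java.nio.charset.StandardCharsets;
  import org.apache.zookeeper.ZooKeeper;

  // Sketch of the failover orchestration described on the slide.
  // Paths and data formats are assumptions for illustration only.
  public class FailoverSketch {
      private final ZooKeeper zk;

      public FailoverSketch(ZooKeeper zk) {
          this.zk = zk;
      }

      /** Steps 1+2: stop writes by flipping the dynamic config switch. */
      long stopWrites(String cluster) throws Exception {
          String path = "/hbase-nameservice/" + cluster + "/write-enabled";
          zk.setData(path, "false".getBytes(StandardCharsets.UTF_8), -1);
          return System.currentTimeMillis();              // the moment writes stopped
      }

      /** Step 3: wait until replication has shipped everything written before the stop. */
      void waitReplicationDrained(String peer, long stopTimeMs) throws Exception {
          String path = "/hbase-nameservice/" + peer + "/last-synced-wal-write-time";
          while (true) {
              long lastSynced = Long.parseLong(
                  new String(zk.getData(path, false, null), StandardCharsets.UTF_8));
              if (lastSynced >= stopTimeMs) {
                  return;                                  // all pre-stop WAL entries replicated
              }
              Thread.sleep(1000);
          }
      }

      /** Step 4: point clients at the standby cluster and re-enable writes. */
      void switchMaster(String newMaster) throws Exception {
          zk.setData("/hbase-nameservice/active-cluster",
              newMaster.getBytes(StandardCharsets.UTF_8), -1);
          zk.setData("/hbase-nameservice/" + newMaster + "/write-enabled",
              "true".getBytes(StandardCharsets.UTF_8), -1);
      }
  }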
Agenda
1. Current Status
2. Problems and Solutions
3. HBase as a service
27 / 38
SDS (Structured Datastore Service)
We build SDS on top of HBase:
▶ Simplified interface, configuration and dependency
▶ Multi-platform support
▶ Flexible access/quota control
▶ Minimized administration cost
28 / 38
Screenshots
29 / 38
Screenshots
30 / 38
Architecture and Applications
Currently serving:
▶ 1000+ tables
▶ Dozens of types of smart devices
▶ Several million independent devices
31 / 38
libsds
Formalized Data Model
▶ Entity Group: a group of records belonging to a single entity
▶ Primary Index: the primary index within an entity group
▶ Local Secondary Index: an index within a single entity group
  ▶ Eager index
  ▶ Lazy index
  ▶ Immutable index
32 / 38
Example: Schema definition
Example: cloud notebook
-- Equivalent SQL definition
CREATE TABLE note (
  userId   VARCHAR(64) NOT NULL,   -- Entity group key
  noteId   INT8 NOT NULL,          -- Primary key
  title    VARCHAR(256),
  body     VARCHAR(2048),
  mtime    BIGINT,
  tag      VARCHAR(16),
  version  INT,
  PRIMARY KEY(userId, noteId),
  INDEX(userId, mtime),
  INDEX(userId, tag)
);
33 / 38
Example: Data Type Definition
@Record(table = "note", family = "B")
public class Note {
  @Column(keyOnly = true)
  String uid;                       // user ID
  @Column(keyOnly = true)
  Long id;                          // note ID
  @Column
  String title;
  @Column(serialization = Column.SerializationType.UNIX_TIME)
  private Date mtime;
  @Column(collection = true, elementClass = String.class, type = ...)  // rest truncated on the slide
  private Set<String> tags;
  @Column(serialization = Column.SerializationType.JSON)
  private NoteBody body;
}
34 / 38
Example: Data Layout
CF  Rowkey                                       Values
B   hash(userId) userId noteId                   title, mtime, tags, body, version
I   hash(userId) userId idx-mtime mtime noteId   title¹
I   hash(userId) userId idx-tags tag1 noteId     title
I   hash(userId) userId idx-tags tag2 noteId     title

¹ projected attribute
35 / 38
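One way to read the layout table: an index row is an ordinary KeyValue whose rowkey concatenates the entity-group prefix, the index name, the indexed attribute and the primary key. The sketch below builds the idx-mtime rowkey that way; the salt hash, the separator byte and the encodings are assumptions, not the actual libsds format.

  import java.nio.charset.StandardCharsets;

  // Sketch of how an idx-mtime index rowkey from the layout table could be built:
  // hash(userId) | userId | "idx-mtime" | mtime | noteId.
  public class IndexRowKeySketch {
      private static final byte SEP = 0x00;

      static byte[] mtimeIndexRow(String userId, long mtime, long noteId) {
          byte[] salt = Integer.toHexString(userId.hashCode()).getBytes(StandardCharsets.UTF_8);
          byte[] uid  = userId.getBytes(StandardCharsets.UTF_8);
          byte[] idx  = "idx-mtime".getBytes(StandardCharsets.UTF_8);
          // Fixed-width big-endian longs keep the byte-wise sort order of the index.
          byte[] ts   = longToBytes(mtime);
          byte[] id   = longToBytes(noteId);
          return concat(salt, uid, idx, ts, id);
      }

      static byte[] longToBytes(long v) {
          byte[] b = new byte[8];
          for (int i = 7; i >= 0; i--) { b[i] = (byte) (v & 0xff); v >>>= 8; }
          return b;
      }

      static byte[] concat(byte[]... parts) {
          int len = 0;
          for (byte[] p : parts) len += p.length + 1;
          byte[] out = new byte[len - 1];
          int pos = 0;
          for (int i = 0; i < parts.length; i++) {
              System.arraycopy(parts[i], 0, out, pos, parts[i].length);
              pos += parts[i].length;
              if (i < parts.length - 1) out[pos++] = SEP;
          }
          return out;
      }
  }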
Example: Query
// random read
@Override
public Note findNoteById(String userId, long nid) {
  Note key = new Note(userId, nid, null, null, null, ...);
  return typedAccessClient.get(key);
}

// range query, same as SELECT * FROM note
//   WHERE uid = userId AND title LIKE 'Test%'
//   ORDER BY mtime DESC LIMIT N
@Override
public List<ListViewItem> searchNLatestItems(
    String userId, int N, String title) {
  return typedAccessClient.scan(Note.class,
      ListViewItem.class,
      Constants.IDX_MTIME,             // implicitly specifies the index name
      Note.entityGroupNote(userId),
      Note.entityGroupNote(userId),
      "title REGEX '" + title + "'",   // title REGEX 'Test.*'
      N).getRecords();
}
36 / 38
Example: Update
// same as UPDATE note SET version = oldVersion + 1,
//                        mtime = NOW, contents = '...'
//        WHERE version = oldVersion
//          AND uid = userId AND id = noteId
@Override
public boolean updateNote(Note note) {
  int currentVersion = note.getVersion();
  try {
    SimpleCondition versionPredicate =
        SimpleCondition.predicate(note.getVersion(),
            CompareFilter.CompareOp.EQUAL,
            Constants.VERSION_FIELD);
    note.setMtime(new Date());
    note.setVersion(currentVersion + 1);
    return typedAccessClient.put(note, versionPredicate);
  } finally {
    note.setVersion(currentVersion);
  }
}
37 / 38
Thanks! Questions?
Contacts: {zhangduo, heliangliang}@xiaomi.com