MySQL Cluster
tips & tricks
Bogdan Kecman
MySQL Principal Technical Engineer
Bogdan.Kecman@oracle.com
1
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may
not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should
not be relied upon in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
2
Industry Leaders Rely on MySQL
Web & Enterprise OEM & ISVs
Cloud
3
MySQL Powers The Web
Over 50 million Tweets/day. 143,200 Tweets/sec in Aug 2013
”Many petabytes” of data. 11.2 Million Row changes & 2.5 billion
rows read /sec handled in MySQL
6 billion hours of video watched each month
Globally-distributed database with 100 terabytes of user-related
data based on MySQL Cluster
4
The #1 Database in the Cloud
SaaS
Hosting IaaS, PaaS
5
Architectural overview
MySQL Cluster Data Nodes InnoDB MyISAM SE s1 SE s2
6
Who’s Using MySQL Cluster?
MySQL Cluster Architecture
8
MySQL Cluster NODES (1/2)
• Data nodes (ndbd or ndbmtd)
• Stores data and indexes
• In memory
• Non-indexed data can be stored on disk
• Stores schema definition
• Check pointed to disk
• Transaction coordination
• Online backup
• All connect to each other
• Management nodes (ndb_mgmd)
• Distributing configuration
• Logging
• Monitoring
• Act as Arbitrator to prevent split-brain scenarios
• Not crucial for Cluster operation (if not working cluster still works properly,
only limit is that no new nodes can start if no mgm nodes are around)
• One is enough, two are perfect, three are too much
9
MySQL Cluster NODES (2/2)
• SQL nodes (mysqld)
• The NDBCLUSTER Storage Engine is actually API node
• Transparent for most applications
• Used to create tables
• Used for geo-replication
• Can act as arbitrator
• Connects to all Data Nodes
• API nodes (your own executable)
• Application written using NDB API
• C
• C++
• Java
• FAST
• No SQL parsing
• No Optimizer
• Examples
• Ndbcluster storage engine, ndb_restore, memcached-ndbcluster plugin, ldap-ndbcluster…
10
High Availability
• Fragmentation
• Table data is split among data nodes
• Synchronous replication
• Each fragment is stored NoOfReplicas times
• Heartbeating
• Automatic failover
• Online backups
• Online Updates
11
High Performance
• Performance boost
• In-memory
• Shared IO load
• As many SQL/API nodes as you like (up to 200)
• Direct access trough NDB API
• Performance killers
• Network (latency creates big problem)
• With 1gbit nic up to 10 data nodes works without problem
• With modern 10gbit nic up to 40 data nodes can run ok
• Joins
• Huge improvements in 7.4 but still joins on distributed data are always
going to suffer performance, especially on slower network
• Blobs
12
Scaling
• Up to 254 nodes
• Add data nodes on-line
• Note that adding data nodes does not only increase storage capacity
but also increase your IO capacity too.
• Geographical Replication
• Multi channel replication (note it is always idempotent)
13
Accessing NDBCLUSTER data
• SQL (via MySQL connector/php,java,ruby,python..., odbc, MySQL C-API..)
• MEMCACHED (add ndbcluster driver to memcached server)
• ndbAPI (C/C++ API)
• ClusterJ, JPA, ClusterJPA, LDAP
Apps Apps Apps Apps Apps Apps Apps Apps Apps Apps Apps Apps
JPA
ClusterJPA
PHP PERL Python Ruby JDBC ClusterJ JSON Apache Memcached
MySQL JNI Node.js mod-ndb ndb-eng
NDB API (C++)
MySQL Cluster Data Nodes
14
ClusterJ/JPA
• Domain Object Model Persistence API (ClusterJ):
• Java API
• High performance, low latency
• Feature rich
• JPA interface built upon this new Java layer:
• Java Persistence API compliant
• Implemented as an OpenJPA plugin
• Uses ClusterJ where possible, reverts to JDBC for some
operations
• Higher performance than JDBC
• More natural for most Java designers
• Easier Cluster adoption for web applications
Memcached
• Memcached is a distributed memory based hash-
key/value store with no persistence to disk
• NoSQL, simple API, popular with developers
• MySQL Cluster already provides scalable, in-memory
performance with NoSQL (hashed) access as well as
persistence
• Provide the Memcached API but map to NDB API calls
• Writes-in-place, so no need to invalidate cache
• Simplifies architecture as caching & database integrated
into 1 tier
• Access data from existing relational tables
Traditional Memcached Architecture
httpd memcached
hash key
PHP/Perl memcached
to find data
Memcache
friends:12389 memcached
memcache key
NDB & Memcache Architecture: Memcache protocol + NDB storage
MySQL
Cluster
Application
memcached Data Node
Memcache NDB Engine MySQL
Client Cluster
Data Node
Memcached/MySQL Cluster latency
memcachetest -t 2 -M 7000 -c 25000
Cluster & Memcached – Configured Schema
key value
<town:maidenhead,SL6>
Application view
SQL view prefix key value
<town:maidenhead,SL6>
Prefix Table Key-col Val-col policy town ... code ...
town: map.zip town code cluster maidenhead ... SL6 ...
Config tables map.zip
Node.js NoSQL API
• Native JavaScript access to MySQL Cluster
–End-to-End JavaScript: browser to the app and database
Clients –Storing and retrieving JavaScript objects directly in
MySQL Cluster
–Eliminate SQL transformation
• Implemented as a module for node.js
V8 JavaScript Engine –Integrates full Cluster API library within the web app
• Couple high performance, distributed apps, with high
MySQL Cluster Node.js Module performance distributed database
MySQL Cluster Data Nodes
MySQL Cluster NoSQL API for Node.js
Application Code
// Constructor // Create a tweet
function Tweet(user, message) { function newTweet(
this.id = UUID.generate(); err, dbSession, httpReq){
this.timestamp = Date.now(); var tweet = new Tweet(
this.user = user; httpReq.user,
this.message = message; httpReq.message);
} dbSession.persist(tweet);
}
// Server Startup
var nosql = require('mysql-js'); function onNewTweetRequest(
err, httpReq){
var sessionFactory =
nosql.connectSync('ndb'); essionFactory.openSession(
null, newTweet, httpReq);
nosql.mapClass(Tweet, 'tweets');
}
Need for Speed?
NDB API
•C++ programming interface
•Provides direct access to data nodes
•No MySQL server needed (still it is recommended to use
MySQL to manipulate schema)
•No SQL layer (no parser, no optimizer, no …)
•Query batching
•Async transactions
•NDB Events
Can’t all be that good?
•Portability (all requests are hardcoded into your application)
•Less flexibility
–schema changes need to be hardcoded into your app
–to change a simple query you have to change and recompile your c++ code
•No privileges (everyone have access to everything)
•No security (if you can telnet to a data node port you have
access to everything)
•No triggers, views, stored procedures
•No auditting
NDB API
Life of a transaction
1. Start transaction
2. Define operations
3. Execute operations
4. Commit / Abort transaction
NDB API
Starting a Transaction
•A transaction is started by getting an NdbTransaction object
•An Ndb object can have maximum of 1024 parallel transactions
NdbTransaction * t = ndb->startTransaction();
if (t == NULL){
printerr(“could not start transaction\n”);
return (-1);
}
NDB API
Getting an NdbOperation Object
•An NdbOperation object is created with the getNdbOperation
•A table name or a NdbDictionary::Table* needs to be provided
NdbOperation * op = t->getNdbOperation(“tab1”);
if (op == NULL){
//handle error
}
NDB API
Defining the operation type
•insertTuple()
•readTuple()
•writeTuple()
•updateTuple()
•deleteTuple()
op->readTuple();
NDB API EXAMPLE (pk access)
NdbTransaction * trans = ndb->startTransaction();
if (trans == NULL){ printerr(“could not start transaction\n”); return (-1); }
NdbOperation * op = trans->getNdbOperation(“City”);
if (op == NULL) return -1;
op->readTuple();
int idvalue = 3236;
op->equal(“ID”, idvalue);
int population = 0;
op->getValue(“Population”, (char*)&population);
char name[35];
op->getValue(“Name”, name);
if (trans->execute(NdbTransaction::Commit, NdbOperation::AbortOnError, 1)
== -1){
printerr(“transaction was not successful\n”);
return (-1);
}
trans->close();
printf(“The City %s has the population of %d\n”, name, population);
NDB API
Joining tables
•Joining tables with NDB API is way more complex then with SQL
•The basic principle is easy – nested FOR loops
•The method for retrieving the rows depends on the tables
involved and possible indexes
•It is very difficult to do dynamic optimization
•In principle the join method has to be decided when creating
the program (coding time)
SQL vs NDB API speed comparison (reads/second – note logarithmic Y axes)
Thank You!
Questions?
Bogdan Kecman
MySQL Principal Technical Engineer
Bogdan.Kecman@oracle.com 32