Ceph vs Swift
Performance Evaluation on a Small Cluster
GÉANT eduPERT monthly call
July 24, 2014
About me
Vincenzo Pii
Researcher @ ZHAW ICCLab
Leading the research initiative on Cloud Storage (under the IaaS theme)
More on ICCLab: www.cloudcomp.ch
About this work
Performance evaluation study on cloud storage
Small installations
Hardware resources hosted at the ZHAW ICCLab data center in Winterthur
Two OpenStack clouds (stable and experimental)
One cluster dedicated to storage research
INTRODUCTION
Cloud storage
Based on distributed, parallel, fault-tolerant file systems
Distributed resources exposed through a single homogeneous interface
Typical requirements
Highly scalable
Replication management
Redundancy (no single point of failure)
Data distribution
Object storage
A way to manage/access data in a storage system
Typical alternatives
Block storage
File storage
Ceph and Swift
Ceph (ceph.com)
Supported by Inktank
Inktank recently acquired by Red Hat (which also owns GlusterFS)
Mostly developed in C++
Started as a PhD thesis project in 2006
Block, file and object storage
Swift (launchpad.net/swift)
OpenStack object storage
Completely written in Python
RESTful HTTP APIs
Objectives of the study
1. Performance evaluation of Ceph and Swift on a small cluster
Private storage
Storage backend for own applications with limited size requirements
Experimental environments
2. Evaluate Ceph maturity and stability
Swift is already widely deployed and industry-proven
3. Hands-on experience
Configuration
Tooling
CONFIGURATION AND PERFORMANCE OF SINGLE COMPONENTS
Network configuration
Three servers on a dedicated VLAN
1 Gbps NICs
1000BASE-T cabling
Node 1 (10.0.5.2), Node 2 (10.0.5.3), Node 3 (10.0.5.4) on the 10.0.5.0/24 subnet
Server configuration
Hardware
Lynx CALLEO Application Server 1240
2x Intel Xeon E5620 (4 cores)
8x 8 GB DDR3 SDRAM, 1333 MHz, registered, ECC
4x 1 TB Enterprise SATA-3 hard disk, 7200 RPM, 6 Gb/s (Seagate ST1000NM0011)
2x Gigabit Ethernet network interfaces
Operating system
Ubuntu 14.04 Server Edition with kernel 3.13.0-24-generic
Disk performance
READ:
$ sudo hdparm -t --direct /dev/sdb1
/dev/sdb1:
 Timing O_DIRECT disk reads: 430 MB in 3.00 seconds = 143.17 MB/sec
WRITE:
$ dd if=/dev/zero of=anof bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 8.75321 s, 123 MB/s
Network performance
$ iperf -c ceph-osd0
------------------------------------------------------------
Client connecting to ceph-osd0, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.5.2 port 41012 connected with 10.0.5.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.10 GBytes  942 Mbits/sec

942 Mbit/s corresponds to about 117.5 MB/s
CLOUD STORAGE CONFIGURATION
Ceph OSDs
cluster-admin@ceph-mon0:~$ ceph status
    cluster ff0baf2c-922c-4afc-8867-dee72b9325bb
     health HEALTH_OK
     monmap e1: 1 mons at {ceph-mon0=10.0.5.2:6789/0}, election epoch 1, quorum 0 ceph-mon0
     osdmap e139: 4 osds: 4 up, 4 in
      pgmap v17348: 632 pgs, 13 pools, 1834 bytes data, 52 objects
            199 MB used, 3724 GB / 3724 GB avail
                 632 active+clean

cluster-admin@ceph-mon0:~$ ceph osd tree
# id    weight  type name              up/down  reweight
-1      3.64    root default
-2      1.82        host ceph-osd0
0       0.91            osd.0          up       1
1       0.91            osd.1          up       1
-3      1.82        host ceph-osd1
2       0.91            osd.2          up       1
3       0.91            osd.3          up       1

Disk layout:
Monitor (mon0):         HDD1 (OS) | not used | not used | not used
St. node 0 (ceph-osd0): HDD1 (OS) | osd0 (XFS) | osd1 (XFS) | Journal
St. node 1 (ceph-osd1): HDD1 (OS) | osd2 (XFS) | osd3 (XFS) | Journal
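As a hedged sketch of how one of these OSDs, with its journal on the dedicated journal disk, could have been created with the ceph-deploy tooling of that period (the device names below are assumptions, not the ones actually used):

$ ceph-deploy osd create ceph-osd0:/dev/sdb:/dev/sdd    # data disk and journal disk on host ceph-osd0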
Swift devices
Building rings on storage devices
(No separation of Accounts, Containers and Objects)
export ZONE=
# set the zone number for that storage device
export STORAGE_LOCAL_NET_IP=
# and the IP address
export WEIGHT=100
# relative weight (higher for bigger/faster disks)
export DEVICE=
swift-ring-builder account.builder add z$ZONE-$STORAGE_LOCAL_NET_IP:6002/$DEVICE $WEIGHT
swift-ring-builder container.builder add z$ZONE-$STORAGE_LOCAL_NET_IP:6001/$DEVICE $WEIGHT
swift-ring-builder object.builder add z$ZONE-$STORAGE_LOCAL_NET_IP:6000/$DEVICE $WEIGHT
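After adding the devices, the rings would typically be rebalanced and distributed to the storage nodes; a minimal sketch of that step (not shown on the slide):

swift-ring-builder account.builder rebalance
swift-ring-builder container.builder rebalance
swift-ring-builder object.builder rebalance
# then copy account.ring.gz, container.ring.gz and object.ring.gz to /etc/swift on every node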
Disk layout:
Swift proxy: HDD1 (OS) | not used | not used | not used
St. node 0:  HDD1 (OS) | dev1 (XFS) | dev2 (XFS) | not used
St. node 1:  HDD1 (OS) | dev3 (XFS) | dev4 (XFS) | not used
Highlighting a difference
LibRados is used to access Ceph
Plain installation of a Ceph storage cluster
Non-RESTful interface
This is the fundamental access layer in Ceph
RadosGW (Swift/S3 APIs) is an additional component on top of LibRados (as are the block and file storage clients)
RESTful APIs over HTTP are used to access Swift
Extra overhead in the communication
Out-of-the-box access method for Swift
This is part of the differences being benchmarked, even if...
HTTP APIs for object storage are interesting for many use cases
This use case:
Unconstrained, self-managed storage infrastructure for, e.g., own apps
Control over infrastructure and applications
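To make the two access paths concrete, a minimal sketch follows; the pool, container, host, account and file names are illustrative assumptions, not the ones used in the tests:

# Ceph: native RADOS access through the rados CLI (built on librados)
$ rados -p testpool put obj-1 ./sample.bin
$ rados -p testpool get obj-1 ./sample-copy.bin

# Swift: RESTful access over HTTP through the proxy ($TOKEN obtained beforehand from the auth endpoint)
$ curl -X PUT -T ./sample.bin -H "X-Auth-Token: $TOKEN" http://swift-proxy:8080/v1/AUTH_test/cont-1/obj-1
$ curl -o ./sample-copy.bin -H "X-Auth-Token: $TOKEN" http://swift-proxy:8080/v1/AUTH_test/cont-1/obj-1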
WORKLOADS
Tools
COSBench (v. 0.4.0.b2) - https://github.com/intel-cloud/cosbench
Developed by Intel
Benchmarking tool for Cloud Object Storage
Supports both Swift and Ceph
Cool web interface to submit workloads and monitor the current status
Workloads defined as XML files
Very good level of abstraction applied to object storage
Supported metrics:
Op-Count (number of operations)
Byte-Count (number of bytes)
Response-Time (average response time for each successful request)
Processing-Time (average processing time for each successful request)
Throughput (operations per second)
Bandwidth (bytes per second)
Success-Ratio (ratio of successful operations)
Outputs CSV data
Graphs generated with cosbench-plot - https://github.com/icclab/cosbench-plot
Inter-workload charts described in Python
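Workloads can be submitted from the web interface or from the command line; a minimal sketch using the cli.sh helper shipped with COSBench (installation path and workload file name are assumptions):

$ cd /opt/cosbench
$ sh cli.sh submit conf/ceph-vs-swift-workload.xml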
Workloads gist
COSBench web interface
Workload matrix
Containers: 1, 20
Object sizes: 4 kB, 128 kB, 512 kB, 1024 kB, 5 MB, 10 MB
R/W/D distribution (%): 80/15/5, 100/0/0, 0/100/0
Workers: 16, 20, 64, 128, 256, 512
Workloads
216 workstages (all the combinations of the values in the workload matrix)
12 minutes per workstage
2 minutes warmup
10 minutes running time
1000 objects per container (pools in Ceph)
Uniformly distributed operations over the available objects (1000 or 20000)
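Since the COSBench containers correspond to pools on the Ceph side, the pools and their replication level can be managed with standard Ceph commands; a minimal sketch, where the pool name and values are assumptions used only to illustrate the knob varied in the replica experiments shown later:

$ ceph osd pool create bench-pool-1 128     # pool with 128 placement groups (name and pg count assumed)
$ ceph osd pool set bench-pool-1 size 2     # keep 2 replicas of every object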
Performance Results
READING
Read throughput: workstage averages
Read throughput: 1 container, 4 kB objects
Read throughput: 1 container, 128 kB objects
Read response time: 1 container, 128 kB objects
Read bandwidth: workstage averages
Read throughput: 20 containers, 1024 kB objects
Read response time: 20 containers, 1024 kB objects
Performance Results
WRITING
Write throughput: workstage averages
Write throughput: 1 container, 128 kB objects
Ceph write throughput: 1 container, 128 kB objects, varying replica count
Ceph write response time: 1 container, 128 kB objects, varying replica count
Write bandwidth: workstage averages
Write response time: 20 containers, 512 kB objects
Performance Results
READ/WRITE/DELETE
R/W/D throughput: workstage averages
R/W/D read response time
General considerations and future work
CONCLUSIONS
Performance Analysis Recap
Ceph performs better when reading, Swift when writing
Ceph: librados
Swift: REST APIs over HTTP
More remarkable difference with small objects
Less overhead for Ceph
Librados
CRUSH algorithm
Comparable performance with bigger objects
Network bottleneck at 120 MB/s for read operations
Response time
Swift: greedy behavior
Ceph: fairness
General considerations: challenges
Equivalency
Comparing two similar systems that are not exactly overlapping
Creating fair setups (e.g., Journals on additional disks for Ceph)
Transposing corresponding concepts
Configuration
Choosing the right/best settings for the context (e.g., number of Swift workers)
Identifying bottlenecks
To be done in advance to create meaningful workloads
Workloads
Run many tests to identify saturating conditions
Huge decision space
Keeping up the pace
A lot of development going on (new versions, new features)
General considerations: lessons learnt
Publication medium (blog post)
Excellent feedback (e.g., Rackspace developer)
Immediate right of reply and real comments
Most important principles
Openness
Share every bit of information
Clear intents, clear justifications
Neutrality
When analyzing the results
When drawing conclusions
Very good suggestions coming from "you could", "you should", "you didn't" comments
Future work
Performance evaluation is necessary for cloud storage
More object storage evaluations
Interesting because it is very close to the application level
Block storage evaluations
Very appropriate for IaaS
Provide storage resources to VMs
Seagate Kinetic
Possible opportunity to work on a Kinetic setup
Vincenzo Pii: piiv@zhaw.ch
THANKS!
QUESTIONS?