Cloud Computing
Amazon Web Services - introduction
Keke Chen
Agenda
Approaching cloud computing via a real
example: Amazon Web Services (AWS)
You can get quick hands-on experience
Have an understanding of CC from the
user’s point of view
Then explore the internals of a cloud
Internal Architecture
Enabling technologies
Virtualization
Web services
Infrastructure as a service
Users can ask for
Computing nodes (servers)
Storage
Networking (between nodes)
AWS: Infrastructure as a service
Computing
Elastic Compute Cloud (EC2)
Storage
Simple Storage Services (S3)
Elastic Block Stores (EBS)
Glacier
DynamoDB
Other services
Simple Queue Service
EC2
A typical example of utility computing
functionality:
launch instances with a variety of operating
systems (windows/linux)
load them with your custom application
environment (customized AMI)
Full root access to a blank Linux machine
manage your network’s access permissions
run your image using as many or few
systems as you desire (scaling up/down)
Backyard…
Powered by Xen – Virtual Machine
Different from VMware & VPC
- high performance
Hardware contributions by Intel
(VT-x/Vanderpool) and AMD (AMD-V)
Supports “Live Migration” of a virtual
machine between hosts
We will dedicate one class to Xen...
Amazon Machine Images
Public AMIs: Use pre-configured, template
AMIs to get up and running immediately.
Choose from Fedora, Movable Type, Ubuntu
configurations, and more
Private AMIs: Create an Amazon Machine
Image (AMI) containing your applications,
libraries, data and associated configuration
settings
Paid AMIs: Set a price for your AMI and let
others purchase and use it (Single payment
and/or per hour)
AMIs with commercial DBMS
Normal way to use EC2
For web applications
Run your base system in minimum # of VMs
Monitoring the system load (user traffic)
Load is distributed to VMs
If over some threshold increase # of VMs
If lower than some thresholds decrease # of
VMs
For data intensive analysis
Estimate the optimal number of nodes
(tricky!)
Load data
Start processing
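The threshold rule for web applications can be sketched as a small function. This is only an illustration: the name next_vm_count, the 0.3/0.7 thresholds, the one-VM step, and the min/max bounds are all assumptions, not values from AWS or from Auto Scaling.

```python
def next_vm_count(current_vms, avg_load, low=0.3, high=0.7,
                  min_vms=1, max_vms=10):
    """Return the number of VMs to run in the next monitoring interval.

    avg_load is the average utilisation per VM, between 0 and 1.
    Thresholds and bounds are illustrative assumptions.
    """
    if avg_load > high:
        target = current_vms + 1      # over the upper threshold: scale up
    elif avg_load < low:
        target = current_vms - 1      # under the lower threshold: scale down
    else:
        target = current_vms          # inside the band: leave as is
    # Never drop below the base system or exceed the budgeted maximum.
    return max(min_vms, min(max_vms, target))
```

Calling it once per monitoring interval with the measured load gives the scale-up/scale-down loop described above; the gap between the two thresholds keeps the VM count from oscillating.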
Tools (most are for web apps)
Elastic Block Store: network-attached
persistent storage, can be attached to each VM
instance
Elastic IP address: programmatically remap
public IP to any instance
Virtual private cloud: bridge private cloud and
AWS resources
CloudWatch: monitoring EC2 resources
Auto Scaling: conditional scaling
Elastic load balancing: automatically distribute
incoming traffic across instances
Type of instances
Standard instances (micro, small, large,
extra large)
E.g., small: 2GB memory, 1 EC2 Compute
Unit (Xeon processor?), some GBs of
instance/EBS storage (i.e., root volume)
High-CPU instances
More CPU with same amount of memory
Pricing
Check the website
Amazon Machine Images (AMIs)
Virtual machine images
When users ask for a specific system
AWS loads the corresponding AMI and
creates the virtual machine (VM)
The user accesses the VM instance, no
different from accessing a real remote
server
AMIs cover most common systems
Linux distributions
Windows server
AMIs with special software
IBM DB2, Informix Dynamic Server,
Lotus Web Content Management,
WebSphere Portal Server
MS SQL Server, IIS/Asp.Net
Hadoop
Open MPI
Apache web server
MySQL
Oracle 11g
…
Access methods
Web interface
Command line
Programming Interface
E.g., boto python library
Simple Storage Service (S3)
Write, read, delete objects of 0 bytes to 5TB;
single PUT <5GB
Namespace: two levels: buckets, keys
Accessible using URLs
S3 namespace
Amazon S3
bucket: object, object
bucket: object, object
bucket: object, object
Amazon S3 (example)
mculver-images: Beach.jpg, 2005/party/hat.jpg
media.mydomain.com: img1.jpg, img2.jpg
public.blueorigin.com: index.html, img/pic1.jpg
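The two-level namespace can be modelled as a dictionary of dictionaries. This toy model is an assumption made for illustration (it is not how S3 is implemented); the bucket and key names are the ones from the diagram. It shows that a key such as img/pic1.jpg is a flat string, and that "folders" are only key prefixes.

```python
# Toy in-memory model: bucket name -> {key -> object bytes}.
namespace = {
    "mculver-images": {
        "Beach.jpg": b"...",
        "2005/party/hat.jpg": b"...",
    },
    "media.mydomain.com": {
        "img1.jpg": b"...",
        "img2.jpg": b"...",
    },
    "public.blueorigin.com": {
        "index.html": b"...",
        "img/pic1.jpg": b"...",
    },
}


def list_keys(bucket, prefix=""):
    """Return the keys in one bucket that start with prefix, sorted."""
    return sorted(k for k in namespace[bucket] if k.startswith(prefix))
```

Listing with the prefix "img/" on public.blueorigin.com returns only img/pic1.jpg, which is how a folder-like view is built on top of flat keys.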
Accessing objects
Bucket: keke-images, key: jpg1, object:
a jpg image
accessible with
https://keke-images.s3.amazonaws.com/jpg1
mapping your subdomain to S3
with DNS CNAME configuration
e.g. media.yourdomain.com ->
media.yourdomain.com.s3.amazonaws.com/
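The URL scheme above is simple enough to build by hand. A minimal sketch, assuming the bucket-as-subdomain form shown on the slide; the helper names s3_object_url and cname_target are hypothetical.

```python
from urllib.parse import quote


def s3_object_url(bucket, key):
    """Build the HTTPS URL of an object from its bucket and key."""
    # Keep "/" unescaped so keys such as "img/pic1.jpg" stay readable;
    # other unsafe characters (spaces, etc.) are percent-encoded.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}"


def cname_target(bucket):
    """Host name a DNS CNAME record for the bucket would point to."""
    return f"{bucket}.s3.amazonaws.com"
```

For the CNAME mapping to work, the bucket is named after the subdomain itself, so cname_target("media.yourdomain.com") gives the target shown above.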
Access control
Access log
Authorization
ACL: AWS users, users identified by email,
any user …
Digital signature to ensure integrity
Encrypted access: https
Pricing
Check the website
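The idea behind the digital-signature bullet can be shown with a keyed hash from the Python standard library. This is a generic HMAC sketch, not the actual AWS request-signing algorithm; the message layout and the function names sign and verify are assumptions.

```python
import base64
import hashlib
import hmac


def sign(secret_key: bytes, message: bytes) -> str:
    """Return a base64-encoded HMAC-SHA256 signature of the message."""
    digest = hmac.new(secret_key, message, hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def verify(secret_key: bytes, message: bytes, signature: str) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(secret_key, message), signature)
```

The client sends the request together with its signature; the server, which knows the same secret key, recomputes it. Any change to the request in transit makes verification fail.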
Elastic Block Store
An EBS volume is a virtual disk of a fixed size with a
block read/write interface. It can be mounted as a
filesystem on a running EC2 instance where it can be
updated incrementally. Unlike an instance store, an
EBS volume is persistent.
(Compare to an S3 object, which is essentially a file that
must be accessed in its entirety.)
Fundamental operations:
CREATE a new volume (1GB-1TB)
COPY a volume from an existing EBS volume or S3 object.
MOUNT on one instance at a time.
SNAPSHOT current state to an S3 object.
EBS is approx. 3x more expensive by
volume and 10x more expensive by
IOPS than S3.
Use Glacier for Cold Data
Glacier is structured like S3: a vault is a container for an
arbitrary number of archives. Policies, accounting, and
access control are associated with vaults, while an archive is a
single object.
However:
All operations are asynchronous and notified via SNS (Simple Notification Service).
Vault listings are updated once per day.
Archive downloads may take up to four hours.
Only 5% of total data can be accessed in a given month.
Pricing:
Storage: $0.01 per GB-month
Operations: $0.05 per 1000 requests
Data Transfer: Like S3, free within AWS.
S3 Policies can be set up to automatically move data into
Glacier.
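The pricing arithmetic can be worked through in a few lines. The rates and the 5% limit are the figures quoted on this slide (check the website for current ones); data transfer is left out, and the function names are hypothetical.

```python
STORAGE_USD_PER_GB_MONTH = 0.01   # storage rate quoted on the slide
REQUEST_USD_PER_1000 = 0.05       # operations rate quoted on the slide
MONTHLY_ACCESS_FRACTION = 0.05    # "only 5% of total data" per month


def glacier_monthly_cost(stored_gb, requests):
    """Storage plus request cost in USD for one month (no data transfer)."""
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    operations = requests / 1000 * REQUEST_USD_PER_1000
    return round(storage + operations, 2)


def monthly_access_allowance_gb(stored_gb):
    """Amount of data (GB) that can be accessed in a month under the 5% rule."""
    return stored_gb * MONTHLY_ACCESS_FRACTION
```

For example, 1000 GB with 2000 requests costs about $10.10 a month, and only about 50 GB of it can be accessed in that month.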
Durability
Amazon claims about S3:
Amazon S3 is designed to sustain the concurrent loss of data in two facilities, e.g.
3+ copies across multiple availability domains.
99.999999999% durability of objects over a given year.
Amazon claims about EBS:
Amazon EBS volume data is replicated across multiple servers in an Availability
Zone to prevent the loss of data from the failure of any single component.
Volumes with <20GB of modified data since the last snapshot have an annual failure rate of
0.1% - 0.5%, where a failure means complete loss of the volume.
Commodity hard disks have an AFR of about 4%.
Amazon claims about Glacier are the same as for S3:
Amazon S3 is designed to sustain the concurrent loss of data in two facilities, e.g.
3+ copies across multiple availability domains PLUS periodic internal integrity
checks.
99.999999999% durability of objects over a given year.
Beware of oversimplified arguments about low-probability events!
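A short calculation shows how these rates scale with the number of objects or volumes. It assumes failures are independent, which is exactly the kind of simplification the warning above is about, so treat the results as rough estimates. The function names are hypothetical.

```python
def expected_annual_losses(n_objects, annual_loss_probability=1e-11):
    """Expected objects lost per year; 99.999999999% durability -> 1e-11."""
    return n_objects * annual_loss_probability


def prob_any_failure(n_volumes, annual_failure_rate):
    """Probability that at least one of n volumes fails within a year,
    ASSUMING failures are independent of each other."""
    return 1.0 - (1.0 - annual_failure_rate) ** n_volumes
```

With 10 million S3 objects the expected loss is about 0.0001 objects per year. With 100 EBS volumes at a 0.5% AFR, the chance of losing at least one in a year is already close to 40%; for 100 commodity disks at 4% it is above 98%.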
DynamoDB
Simple table like storage
Weak schema
Back-end: a key-value store
Features
Scalable: Dynamo architecture
Reliable
Replicas over multiple data centers
Speed
Fast, single-digit milliseconds
Secure
Data Model
Table
Container, similar to a worksheet in Excel
Cannot query across tables
Item
Item name
item name -> (attribute, value) pairs
An item is stored in a table (a row in a
worksheet; attributes are column names)
Example
table: “cars”
Item 1: “car1”: {“make”: “BMW”, “year”: “2009”}
Primary key of table
Single key (hash)
Hash-range key
A pair of attributes: first one is hash key,
2nd one is range key.
Example: Reply(Id, datetime, …)
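A hash-range key can be pictured as a two-level lookup: the hash key selects a group of items, and the range key orders the items inside that group. The class below is an in-memory stand-in written for illustration only; ToyTable and its methods are hypothetical and are not the DynamoDB or boto API.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with a hash-range primary key."""

    def __init__(self, hash_key, range_key):
        self.hash_key = hash_key
        self.range_key = range_key
        # hash-key value -> {range-key value -> item}
        self._partitions = defaultdict(dict)

    def put_item(self, item):
        """Store an item; an existing item with the same key pair is replaced."""
        h, r = item[self.hash_key], item[self.range_key]
        self._partitions[h][r] = dict(item)

    def get_item(self, h, r):
        """Return the item with this exact (hash, range) key, or None."""
        return self._partitions.get(h, {}).get(r)

    def query(self, h):
        """Return all items sharing a hash key, ordered by range key."""
        part = self._partitions.get(h, {})
        return [part[r] for r in sorted(part)]
```

For the Reply(Id, datetime, …) example, ToyTable("Id", "datetime") keeps all replies of one thread together, and query(thread_id) returns them in time order.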
Data type
Simple: string and number
Multi-valued: string set and number set
example
Access methods
Amazon DynamoDB is a web service that
uses HTTP and HTTPS as the transport
method
JavaScript Object Notation (JSON) as a
message serialization format
APIs
Java, PHP, .Net
Boto
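Because the service speaks JSON over HTTP(S), a request is just headers plus a JSON document. The sketch below builds, but does not send, a GetItem-style request. The API version in the X-Amz-Target header is an assumption, the authentication headers are omitted, and get_item_request is a hypothetical helper rather than part of any SDK.

```python
import json


def get_item_request(table, key):
    """Build headers and a JSON body for a GetItem-style call (not sent)."""
    typed_key = {}
    for name, value in key.items():
        # Attribute values carry a type tag: "N" for numbers, "S" for strings.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            typed_key[name] = {"N": str(value)}
        else:
            typed_key[name] = {"S": str(value)}
    headers = {
        "Content-Type": "application/x-amz-json-1.0",
        # Assumed API version; request-signing headers are left out.
        "X-Amz-Target": "DynamoDB_20120810.GetItem",
    }
    body = json.dumps({"TableName": table, "Key": typed_key})
    return headers, body
```

In practice the client libraries listed above (Java, PHP, .Net, boto) build and sign such requests for you.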
Simple Queue Service
Store messages traveling between
computers
Make it easy to build automated
workflows
Implemented as a web service
read/add messages easily
Scalable to millions of messages a day
Some features
Message body: <8KB in any format
Messages are retained in queues for up to
4 days
Messages can be sent and read
simultaneously
Messages can be “locked” to prevent simultaneous
processing
Accessible with SOAP/REST
Simple: Only a few methods
Secure sharing
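The "locked" behaviour can be illustrated with a toy in-memory queue: a message that has been received stays hidden from other readers for a while, and reappears if the reader never deletes it. ToyQueue, the 30-second lock, and the explicit now argument are assumptions for illustration; this is not the SQS API, and the 4-day retention is not modelled.

```python
import itertools


class ToyQueue:
    """In-memory queue with a per-message lock (visibility timeout)."""

    def __init__(self, visibility_timeout=30, max_body_bytes=8 * 1024):
        self.visibility_timeout = visibility_timeout
        self.max_body_bytes = max_body_bytes
        self._messages = {}              # message id -> [body, visible_at]
        self._ids = itertools.count(1)

    def send(self, body: bytes) -> int:
        """Add a message and return its id; bodies must be under the limit."""
        if len(body) >= self.max_body_bytes:
            raise ValueError("message body too large")
        msg_id = next(self._ids)
        self._messages[msg_id] = [body, 0.0]
        return msg_id

    def receive(self, now: float):
        """Return (id, body) of the first visible message and lock it,
        or None if every message is currently locked or the queue is empty."""
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout
                return msg_id, entry[0]
        return None

    def delete(self, msg_id):
        """Remove a message once it has been processed."""
        self._messages.pop(msg_id, None)
```

A worker calls receive, processes the message, then calls delete. If the worker crashes before deleting, the lock expires and another worker picks the message up, which is what makes automated workflows robust.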
A typical workflow
Workflow with AWS
Summary
AWS is the most popular IaaS
You can use AWS to construct a computing
workflow
Computing nodes
Storage
Message passing