Content Delivery Networks (CDN)
• What: A geographically distributed network of Web
servers around the globe, operated by a single provider
(e.g. Akamai).
• Why: Improve the performance and scalability of
content retrieval.
• How: Allow several content providers to replicate
their content in a network of servers.
Web Browser Caching
Web browsers have their own caches. When a page is
downloaded from a site the web page is put into the
browser cache.
This is especially useful when the back button is pressed:
the page is redisplayed from the cache instead of being
downloaded again.
If a new copy is needed, the user can force one with a “refresh”.
No page stays permanently in the cache. There is limited
room.
A replacement algorithm is needed to determine which cached
page should be purged.
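The slide does not say which replacement algorithm browsers use. As a minimal sketch, assuming a least-recently-used (LRU) policy, which is one common choice, a tiny page cache could look like this. The class name `LRUPageCache` and the example URLs are hypothetical.

```python
from collections import OrderedDict


class LRUPageCache:
    """Toy browser cache holding at most `capacity` pages (LRU eviction)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # url -> page body, least recently used first

    def get(self, url):
        """Return the cached page, or None on a miss."""
        if url not in self.pages:
            return None
        self.pages.move_to_end(url)  # mark as most recently used
        return self.pages[url]

    def put(self, url, body):
        """Store a page, purging the least recently used one if full."""
        self.pages[url] = body
        self.pages.move_to_end(url)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used entry


cache = LRUPageCache(capacity=2)
cache.put("http://example.com/a", "page A")
cache.put("http://example.com/b", "page B")
cache.get("http://example.com/a")            # touch A, so B is now the oldest
cache.put("http://example.com/c", "page C")  # cache is full: B is purged
```

Other policies (for example, evicting the largest or the least frequently used page) would only change the `put` method.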
Content Distribution Networks (CDN)
Business Model: A content provider such as
www.cnn.com or Yahoo pays a CDN
company (such as Akamai) to get its
content to the requesting users with short
delays.
A CDN provides a mechanism for:
– Replicating content on multiple servers in the
Internet
– Providing clients with a means to determine the
servers that can deliver the content fastest
Terminology
Content: Any publicly accessible combination of text,
images, applets, frames, MP3, video, Flash, virtual
reality objects, etc.
Content Provider: Any individual, organization, or
company that has content that it wishes to make
available to users.
Origin Server: Content provider’s server, where the
content is first uploaded.
Surrogate Server (sometimes called edge server):
Content distributor’s server, where the replicated
content is kept.
A Big Picture
CDNs – Content Delivery Networks (1)
CDNs scale Web servers by having clients get
content from a nearby CDN node (cache)
Content Delivery Networks (2)
Directing clients to nearby CDN nodes with DNS:
– The client’s DNS query returns a local CDN node as the response
– The local CDN node caches content for nearby clients
and reduces load on the origin server
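The two steps above can be sketched as a toy model. This is only an illustration under stated assumptions: the region names, the node addresses (taken from documentation-only IP ranges), the hostname `cache.example.com`, and the helper names `resolve` and `CDNNode` are all hypothetical, and a real CDN would infer the client's location rather than receive it as an argument.

```python
# Hypothetical CDN nodes: region -> node IP (documentation-only addresses).
CDN_NODES = {
    "north-america": "192.0.2.10",
    "europe": "198.51.100.10",
    "asia": "203.0.113.10",
}
DEFAULT_REGION = "north-america"


def resolve(hostname, client_region):
    """Toy CDN DNS server: answer with the node local to the client."""
    return CDN_NODES.get(client_region, CDN_NODES[DEFAULT_REGION])


class CDNNode:
    """Edge node that caches objects and contacts the origin only on a miss."""

    def __init__(self, origin):
        self.origin = origin      # dict standing in for the origin server
        self.cache = {}
        self.origin_fetches = 0   # how often the origin had to be contacted

    def get(self, path):
        if path not in self.cache:
            self.cache[path] = self.origin[path]
            self.origin_fetches += 1
        return self.cache[path]


origin = {"/foo.jpg": b"image bytes"}
nodes = {ip: CDNNode(origin) for ip in CDN_NODES.values()}

# Two requests from European clients resolve to the same local node.
ip = resolve("cache.example.com", "europe")
first = nodes[ip].get("/foo.jpg")   # miss: fetched from the origin
second = nodes[ip].get("/foo.jpg")  # hit: served from the node's cache
```

After both requests the origin has been contacted once, which is the load reduction the slide refers to.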
Content Delivery Networks (3)
Origin server rewrites pages to serve content
via CDN
[Figure: traditional Web page on server vs. page that distributes content via CDN]
CDN – why?
• One of the main goals of CDNs is to put the content
provider in control of how its content is cached
• Content provider signs a contract with CDN
– Contract specifies how content can be cached
• Contract also means CDN will follow what content
provider wants
• CDNs typically charge per byte of traffic served
• CDNs can be used for any kind of content
– Typically main use is for web content
– Streaming media has also been delivered over CDNs
How Akamai Works
• Clients fetch HTML document from primary server
– E.g. fetch index.html from cnn.com
• URLs for replicated content are replaced in HTML
– E.g. <img src="http://cnn.com/af/x.gif"> replaced with
<img src="http://a73.g.akamai.net/7/23/cnn.com/af/x.gif">
– Or, the URL uses cache.cnn.com, and CNN adds a CNAME (alias)
mapping cache.cnn.com → a73.g.akamai.net
• Client resolves aXYZ.g.akamaitech.net hostname
– Maps to a server in one of Akamai’s clusters
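The URL replacement step can be sketched with a regular expression. This is a minimal illustration, not Akamai's actual tooling: the function name `rewrite_img_urls` is hypothetical, the prefix is copied from the slide's example, and the `/7/23/` path segments are treated as opaque because the slides do not explain them.

```python
import re

# Prefix copied from the slide's example; the meaning of "/7/23/" is not
# given in the slides, so it is treated as an opaque string here.
CDN_PREFIX = "http://a73.g.akamai.net/7/23/"


def rewrite_img_urls(html, site="cnn.com"):
    """Point every <img> served from `site` at the CDN instead."""
    pattern = re.compile(
        r'(<img\s+src=")http://' + re.escape(site) + r'/([^"]*")'
    )
    # Keep the tag opening, then insert the CDN prefix before the site name.
    return pattern.sub(
        lambda m: m.group(1) + CDN_PREFIX + site + "/" + m.group(2), html
    )


page = (
    '<html><img src="http://cnn.com/af/x.gif">'
    '<a href="http://cnn.com/news">news</a></html>'
)
rewritten = rewrite_img_urls(page)
```

Only the embedded image is redirected; the link to the HTML page still points at the primary server, matching the first bullet above.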
Current Akamai Customers
Content Distribution Networks (CDNs)
• Content providers are CDN customers
Content replication
• CDN company installs thousands of
servers throughout Internet
– In large datacenters
– Or, close to users
• CDN replicates customers’ content
• When provider updates content, CDN
updates servers
[Figure: origin server in North America; CDN distribution node; CDN servers in S. America, Europe, and Asia]
Problems
• Significant fraction (>50%?) of HTTP objects uncacheable
• Sources of dynamism?
– Dynamic data: stock prices, scores, web cams
– CGI scripts: results based on passed parameters
– Cookies: results may be based on passed data
– SSL: encrypted data is not cacheable
– Advertising / analytics: owner wants to measure # hits
(random strings in content to ensure unique counting)
• But… much dynamic content is small, while static content
is large (images, video, .js, .css, etc.)
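As a rough sketch of how a shared cache might spot some of these sources of dynamism, the function below checks a request for three of the signals listed above. It is a simplified heuristic under stated assumptions: the function name `uncacheable_reasons` is hypothetical, and real caches also rely on response headers, which are not covered in these slides.

```python
from urllib.parse import urlsplit


def uncacheable_reasons(url, request_headers):
    """Return which of the listed sources of dynamism apply to a request."""
    parts = urlsplit(url)
    # Header names are case-insensitive, so compare in lower case.
    header_names = {name.lower() for name in request_headers}
    reasons = []
    if parts.scheme == "https":
        reasons.append("SSL")             # encrypted traffic
    if parts.query:
        reasons.append("CGI parameters")  # result depends on passed parameters
    if "cookie" in header_names:
        reasons.append("Cookies")         # result may depend on passed data
    return reasons


static_image = uncacheable_reasons("http://cnn.com/af/x.gif", {})
stock_quote = uncacheable_reasons(
    "https://example.com/quote?sym=ABC", {"Cookie": "id=1"}
)
```

The static image yields no reasons and can be cached, while the hypothetical stock-quote request triggers all three checks.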
Content Distribution Networks & Server Selection
• Replicate content on many servers
• Challenges
– How to replicate content
– Where to replicate content
– How to find replicated content
– How to choose among known replicas
– How to direct clients towards replica
Server Selection
• Which server?
– Lowest load: to balance load on servers
– Best performance: to improve client performance
• Based on Geography? RTT? Throughput? Load?
– Any alive node: to provide fault tolerance
• How to direct clients to a particular server?
– As part of routing: anycast, cluster load balancing
– As part of application: HTTP redirect
– As part of naming: DNS
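The three "which server?" criteria can be sketched as selection policies over a list of replicas. This is an illustration only: the `Replica` fields, the policy names, and the sample numbers are assumptions, and RTT is used here as the stand-in for "best performance" although the slide also mentions geography and throughput.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    alive: bool
    load: float    # fraction of capacity in use, 0.0 to 1.0
    rtt_ms: float  # measured round-trip time to the client


def select_server(replicas, policy="performance"):
    """Pick a replica; dead nodes are never chosen, whatever the policy."""
    candidates = [r for r in replicas if r.alive]
    if not candidates:
        raise RuntimeError("no alive replica available")
    if policy == "load":         # lowest load: balance load on servers
        return min(candidates, key=lambda r: r.load)
    if policy == "performance":  # best performance: lowest RTT to the client
        return min(candidates, key=lambda r: r.rtt_ms)
    if policy == "any":          # any alive node: fault tolerance only
        return candidates[0]
    raise ValueError(f"unknown policy: {policy}")


replicas = [
    Replica("asia", alive=True, load=0.20, rtt_ms=180.0),
    Replica("europe", alive=True, load=0.70, rtt_ms=35.0),
    Replica("s-america", alive=False, load=0.05, rtt_ms=20.0),
]
by_performance = select_server(replicas, "performance")
by_load = select_server(replicas, "load")
```

The dead replica has the best numbers on both metrics but is skipped, so the two policies pick different alive servers.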
DNS Redirection vs. URL Rewrite
Discussion
Comparison?
How Akamai Works
[Figure: numbered request/response steps among the end user, cnn.com (content provider), the DNS root server, the Akamai global DNS server, the Akamai regional DNS server, an Akamai cluster, and a nearby Akamai cluster. Labels: HTTP; GET foo.jpg; GET /foo.jpg with Host: cache.cnn.com]
Players
• Content Provider: Yahoo, MSNBC, CNN, CBC
• Content Distributor: Akamai
• H/W and S/W Vendor: Cisco, Oracle-Sun
• Hosting Provider: Bell