Content Delivery Networks (CDN)
• What: Geographically distributed network of Web
  servers around the globe, operated by a single provider
  (e.g. Akamai).
• Why: Improve the performance and scalability of
  content retrieval.
• How: Allow several content providers to replicate
  their content in a network of servers.
              Web Browser Caching
   Web browsers have their own caches. When a page is
    downloaded from a site the web page is put into the
    browser cache.
   This is especially useful in those cases when the back
    button is pressed.
   If a new copy is needed then a “refresh” can be done.
   No page stays permanently in the cache. There is limited
    room.
       A replacement algorithm is needed to determine which cached
        page should be purged.
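
One common replacement policy is least-recently-used (LRU). The slides do not say which policy browsers actually use, so the following is only an illustrative sketch; the class name `LRUPageCache` and the capacity of 2 are assumptions.

```python
from collections import OrderedDict


class LRUPageCache:
    """Toy page cache that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # url -> page body, least recently used first

    def get(self, url):
        """Return the cached page, or None on a miss."""
        if url not in self.pages:
            return None
        self.pages.move_to_end(url)  # mark as most recently used
        return self.pages[url]

    def put(self, url, body):
        """Store a page, purging the least recently used one if needed."""
        self.pages[url] = body
        self.pages.move_to_end(url)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # drop the least recently used entry


cache = LRUPageCache(capacity=2)
cache.put("/a.html", "page A")
cache.put("/b.html", "page B")
cache.get("/a.html")            # /a.html is now the most recently used
cache.put("/c.html", "page C")  # cache is full, so /b.html is purged
```

A "refresh" in this model would simply be another `put` for the same URL with the newly downloaded body.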
       Content Distribution Networks (CDN)
   Business Model: A content provider such as
    www.cnn.com or Yahoo pays a CDN
    company (such as Akamai) to get its
    content to the requesting users with short
    delays.
   A CDN provides a mechanism for
        Replicating content on multiple servers in the
         Internet
        Providing clients with a means to determine the
         servers that can deliver the content fastest.
                    Terminology
   Content: Any publicly accessible combination of text,
    images, applets, frames, MP3, video, flash, virtual
    reality objects, etc.
   Content Provider: Any individual, organization, or
    company that has content that it wishes to make
    available to users.
   Origin Server: Content provider’s server, where the
    content is first uploaded.
   Surrogate Server (sometimes called edge server):
    Content distributor’s server, where the replicated
    content is kept.
A Big Picture
CDNs – Content Delivery Networks (1)
CDNs scale Web servers by having clients get
 content from a nearby CDN node (cache)
Content Delivery Networks (2)
Directing clients to nearby CDN nodes with DNS:
  – The client's DNS query returns the address of a local CDN node
  – Local CDN node caches content for nearby clients
    and reduces load on the origin server
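
A minimal sketch of the DNS-based step, assuming the CDN's name server picks a node from the address of the querying resolver. All prefixes, regions, and node addresses below are made up (documentation address ranges), and `answer_query` is a hypothetical name.

```python
import ipaddress

# Assumed mapping from resolver address blocks to regions.
REGION_BY_PREFIX = {
    ipaddress.ip_network("192.0.2.0/24"): "europe",
    ipaddress.ip_network("198.51.100.0/24"): "asia",
}

# Assumed CDN node address for each region.
NODE_BY_REGION = {
    "europe": "203.0.113.10",
    "asia": "203.0.113.20",
}
FALLBACK_NODE = "203.0.113.1"  # used when the resolver's region is unknown


def answer_query(resolver_ip):
    """Return the CDN node address the CDN's DNS server would hand back."""
    addr = ipaddress.ip_address(resolver_ip)
    for prefix, region in REGION_BY_PREFIX.items():
        if addr in prefix:
            return NODE_BY_REGION[region]
    return FALLBACK_NODE
```

For example, `answer_query("192.0.2.55")` falls in the block assumed to be European and so returns the European node's address.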
Content Delivery Networks (3)
Origin server rewrites pages to serve content
 via CDN
[Figure: a traditional Web page on the server, and a page that distributes content via the CDN]
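
The rewriting step might look roughly like the sketch below. The host names `www.example.com` and `cdn.example.net` and the function `rewrite_page` are assumptions for illustration; a real origin server would likely use an HTML parser rather than a regular expression.

```python
import re

ORIGIN_HOST = "www.example.com"  # assumed origin server name
CDN_HOST = "cdn.example.net"     # assumed CDN host name


def rewrite_page(html):
    """Point src attributes for origin-hosted objects at the CDN host instead."""
    pattern = re.compile(r'src="http://' + re.escape(ORIGIN_HOST) + r'/([^"]*)"')
    return pattern.sub(r'src="http://' + CDN_HOST + r'/\1"', html)


page = '<img src="http://www.example.com/img/logo.gif">'
rewritten = rewrite_page(page)
```

Only the embedded objects are redirected; the page itself is still fetched from the origin server.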
                   CDN – why?
• One of the main goals of CDNs is to put the content
  provider in control of how its content is cached
• Content provider signs a contract with CDN
  – Contract specifies how content can be cached
• Contract also obliges the CDN to follow what the content
  provider wants
• CDNs typically charge per byte of traffic served
• CDNs can be used for any kind of content
  – Typically main use is for web content
  – Streaming media has also been delivered over CDNs
               How Akamai Works
• Clients fetch HTML document from primary server
   – E.g. fetch index.html from cnn.com
• URLs for replicated content are replaced in HTML
   – E.g. <img src="http://cnn.com/af/x.gif"> replaced with
     <img src="http://a73.g.akamai.net/7/23/cnn.com/af/x.gif">
   – Or, URLs use cache.cnn.com, and CNN adds a CNAME (alias)
     cache.cnn.com → a73.g.akamai.net
• Client resolves aXYZ.g.akamaitech.net hostname
   – Maps to a server in one of Akamai’s clusters
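
The alias step can be sketched with a toy record table. The CNAME entry is the one from the example above; the A record address is a made-up documentation address, and `resolve` is a hypothetical helper, not a real resolver.

```python
# Toy DNS records: name -> (record type, value).
RECORDS = {
    "cache.cnn.com": ("CNAME", "a73.g.akamai.net"),
    "a73.g.akamai.net": ("A", "192.0.2.73"),  # assumed cluster server address
}


def resolve(name, max_hops=5):
    """Follow CNAME aliases until an A record is found."""
    for _ in range(max_hops):
        rtype, value = RECORDS[name]
        if rtype == "A":
            return value
        name = value  # CNAME: continue with the canonical name
    raise RuntimeError("too many CNAME hops")
```

Because the final A record is answered by the CDN's own DNS, the CDN decides which cluster the client ends up at.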
   Current Akamai Customers
     Content Distribution Networks (CDNs)
• Content providers are CDN customers
Content replication
• CDN company installs thousands of
  servers throughout Internet
   – In large datacenters
   – Or, close to users
• CDN replicates customers’ content
• When provider updates content, CDN
  updates servers
[Figure: origin server in North America, CDN distribution node, and CDN servers in S. America, Europe, and Asia]
                          Problems
• Significant fraction (>50%?) of HTTP objects uncacheable
• Sources of dynamism?
   –   Dynamic data: Stock prices, scores, web cams
   –   CGI scripts: results based on passed parameters
   –   Cookies: results may be based on passed data
   –   SSL: encrypted data is not cacheable
   –   Advertising / analytics: owner wants to measure # hits
        • Random strings in content to ensure unique counting
• But much dynamic content is small, while static content
  is large (images, video, .js, .css, etc.)
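
As a rough illustration of the list above, a shared cache might apply a check like the following. This is a simplified heuristic under stated assumptions, not how real caches decide (they follow the HTTP caching headers); `looks_cacheable` is a hypothetical name, and header names are assumed to be capitalized as shown.

```python
from urllib.parse import urlsplit


def looks_cacheable(url, request_headers):
    """Rough heuristic mirroring the sources of dynamism listed above."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        return False  # SSL: an intermediary cannot read the encrypted data
    if parts.query:
        return False  # CGI-style parameters: result depends on the input
    if "Cookie" in request_headers:
        return False  # Cookies: result may depend on the passed data
    return True
```

Under this heuristic a plain image URL such as `http://cnn.com/af/x.gif` with no cookie is treated as cacheable, while the same URL with a query string is not.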
        Content Distribution Networks &
                Server Selection
• Replicate content on many servers
• Challenges
  – How to replicate content
  – Where to replicate content
  – How to find replicated content
  – How to choose among known replicas
  – How to direct clients towards replica
             Server Selection
• Which server?
  – Lowest load: to balance load on servers
  – Best performance: to improve client performance
     • Based on Geography? RTT? Throughput? Load?
  – Any alive node: to provide fault tolerance
• How to direct clients to a particular server?
  – As part of routing: anycast, cluster load balancing
  – As part of application: HTTP redirect
  – As part of naming: DNS
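
The three selection goals above can be sketched as interchangeable policies. The `Replica` fields, the policy names, and the sample values are assumptions; "best performance" is approximated here by round-trip time only.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    alive: bool
    load: float    # fraction of capacity in use, 0.0 to 1.0
    rtt_ms: float  # measured round-trip time to the client


def select(replicas, policy):
    """Pick a live replica according to one of the goals listed above."""
    candidates = [r for r in replicas if r.alive]
    if not candidates:
        raise LookupError("no live replica")
    if policy == "lowest_load":
        return min(candidates, key=lambda r: r.load)
    if policy == "best_performance":
        return min(candidates, key=lambda r: r.rtt_ms)
    if policy == "any_alive":
        return candidates[0]
    raise ValueError(f"unknown policy: {policy}")


replicas = [
    Replica("europe", alive=True, load=0.9, rtt_ms=20.0),
    Replica("asia", alive=True, load=0.2, rtt_ms=180.0),
    Replica("s_america", alive=False, load=0.0, rtt_ms=5.0),
]
```

Note that the policies can disagree: with these sample values, the least loaded replica is not the one with the lowest RTT, and the replica with the lowest RTT overall is down.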
    DNS Redirection vs. URL Rewrite
   Discussion
   Comparison?
                       How Akamai Works
[Figure: numbered message flow (steps 1 to 12) between the end-user, cnn.com (content provider), the DNS root server, the Akamai global DNS server, the Akamai regional DNS server, an Akamai cluster, and a nearby Akamai cluster. The end-user's HTTP request for the image is "GET /foo.jpg" with "Host: cache.cnn.com".]
                              Players
• Content Provider: Yahoo, MSNBC, CNN, CBC
• Content Distributor: Akamai
• H/W and S/W Vendor: Cisco, Oracle-Sun
• Hosting Provider: Bell