BASICS OF DISTRIBUTED SYSTEMS
INTRODUCTION TO THE COURSE
DEFINITION OF DISTRIBUTED SYSTEM
A collection of independent computers connected through a communication network that work together to accomplish some goal
No shared operating system
No shared memory
No shared clock
SINGLE SYSTEM IMAGE
Collection of independent computers that appears as a single system to the user(s)
Independent = autonomous, self-contained
Single system = user not aware of distribution
FLYNN’S TAXONOMY (1966)
Classifies computer architectures by the number of instruction streams and the number of data streams
SISD — Single Instruction, Single Data stream
Traditional uniprocessor systems
SIMD — Single Instruction, Multiple Data streams
Array (vector) processors (see the sketch after this list)
Examples:
GPUs – Graphics Processing Units for computer graphics, GPGPU (General Purpose GPU): AMD/ATI, NVIDIA
AVX: Intel’s Advanced Vector Extensions
MISD — Multiple Instructions, Single Data stream
Multi-layer data processing
MIMD — Multiple Instruction, Multiple Data streams
Multiple computers, each with a program counter, program (instructions), and data
Parallel and distributed systems
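As referenced in the SIMD item above, here is a minimal sketch contrasting a scalar, SISD-style loop with a SIMD-style data-parallel operation. NumPy is used purely as an illustrative stand-in (an assumption; the slides name GPUs and AVX, not any particular library), since it dispatches whole-array operations to vectorized kernels.

    import numpy as np

    # SISD-style: one instruction applied to one data element per iteration
    a = [1.0, 2.0, 3.0, 4.0]
    b = [10.0, 20.0, 30.0, 40.0]
    scalar_sum = [x + y for x, y in zip(a, b)]

    # SIMD-style: a single "add" expressed over many data elements at once;
    # NumPy hands this to vectorized code that can use AVX/GPU-style units
    vector_sum = np.array(a) + np.array(b)

    print(scalar_sum, vector_sum)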
SUBCLASSIFYING MIMD
Memory
Shared memory systems: multiprocessors
No shared memory: networks of computers, multicomputers
Interconnect
Bus
Switch
Delay/bandwidth
Tightly coupled systems
Loosely coupled systems
MULTIPROCESSORS AND MULTICOMPUTERS
Multiprocessors
Shared memory
Shared clock
Shared operating system
All-or-nothing failure
Multicomputers (networks of computers) ⇒ distributed systems
No shared memory
No shared clock
Partial failures
Inter-computer communication mechanism needed: the network
WHY DO WE WANT DISTRIBUTED SYSTEMS?
Scale
Collaboration
Reduced latency
Mobility
High availability & Fault tolerance
Incremental cost
Delegated infrastructure & operations
SCALE – INCREASING PERFORMANCE
Computers are getting faster
Moore's Law prediction: performance doubles approximately every 18 months because of faster transistors and more transistors per chip
SCALING A SINGLE SYSTEM HAS LIMITS
Getting harder for technology to keep up with Moore's law
More cores per chip → requires multithreaded programming
There are limits to the die size and # of transistors
Intel Xeon W-3175X CPU: 28 cores per chip ($2,999/chip!)
8 billion transistors, 255 watts @ 3.1-4.3 GHz
AMD EPYC 7601 CPU: 32 cores per chip ($4,200/chip)
19.2 billion transistors, 180 watts
NVIDIA GeForce RTX 2080 Ti: 4,352 CUDA cores per chip
Special purpose apps: Graphics rendering, neural networks
OUR COMPUTING NEEDS EXCEED CPU ADVANCES
Google
Over 63,000 search queries per second on average
Over 130 trillion pages indexed
Uses hundreds of thousands of servers to do this
In 1999, it took Google one month to crawl and build an index of about 50 million pages
In 2012, the same task was accomplished in less than one minute.
16% to 20% of queries that get asked every day have never been asked before
Every query has to travel on average 1,500 miles to a data center and back to return the answer to the user
A single Google query uses 1,000 computers in 0.2 seconds to retrieve an answer
Facebook
Approximately 100M requests per second with 4B users
COLLABORATION AND CONTENT
Systems exchange their data and functionality
Metcalfe’s Law: the value of a telecommunications network is proportional to the square of the number of connected users of the system
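As a rough worked illustration of Metcalfe's Law (treating the proportionality as an equality with a constant k, which is an assumption made only for this example):

    V(n) = k\,n^2 \qquad\Rightarrow\qquad \frac{V(2n)}{V(n)} = \frac{k\,(2n)^2}{k\,n^2} = 4

That is, doubling the number of connected users roughly quadruples the value of the network.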
REDUCED LATENCY
Cache data close to where it is needed
Caching vs. replication
Replication: multiple copies of data for increased fault tolerance
Caching: temporary copies of frequently accessed data closer to where it’s needed
Some caching services:
Akamai, Cloudflare, Amazon Cloudfront, Apache Ignite
MOBILITY
3.5 billion smartphone users
Remote sensors
Cars
Traffic cameras
Toll collection
Shipping containers
Vending machines
IoT = Internet of Things
2017: more IoT devices than humans
HIGH AVAILABILITY
Redundancy = replicated components
Service can run even if some systems die
If P(any one system down) = 5%
P(two systems down at the same time) = 5% × 5% = 0.25%
Uptime = 1 – downtime = 1 – 0.0025 = 0.9975 = 99.75%
We get 99.75% uptime instead of 95% because both replicated components must fail, not just one.
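A minimal sketch of the arithmetic above, assuming the two replicas fail independently (the same assumption the multiplication relies on); the 5% figure is the slide's example value:

    # Availability with two independent, redundant components
    p_down = 0.05                       # P(any one system down) = 5%
    p_both_down = p_down * p_down       # service fails only if both fail: 0.25%
    uptime = 1 - p_both_down            # 99.75%
    print(f"P(both down) = {p_both_down:.2%}, uptime = {uptime:.2%}")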
HIGH AVAILABILITY?
No redundancy = dependence on all components
If we need all systems running to provide a service
P(any system down) = 1 - P( A is up AND B is up )
= 1 - (1-5%) × (1-5%) = 1 - 0.95 × 0.95 = 9.75%
⇒ 39× the 0.25% failure probability we had with redundancy!
Uptime = 1 – downtime = 1 – 0.0975 = 0.9025 = 90.25%
With a large # of systems, P(any system down) approaches 100% !
Requiring a lot of components to be up & running is a losing proposition.
With large enough systems, something is always breaking!
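A small sketch of the same arithmetic for the no-redundancy case, again assuming independent failures: when all n components must be up, P(any system down) = 1 − 0.95^n, which approaches 100% as n grows.

    # Availability when ALL n components must be up (no redundancy)
    p_up = 0.95
    for n in (2, 10, 50, 100):
        p_any_down = 1 - p_up ** n
        print(f"n = {n:3d}: P(any system down) = {p_any_down:.2%}")
    # n = 2 gives 9.75%; by n = 100 the service is almost never fully up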
AVAILABILITY REQUIRES FAULT TOLERANCE
Fault tolerance
Identify & recover from component failures
Recoverability
Software can restart and function
May involve restoring state
INCREMENTAL COST
Version 1 does not have to be the full system
Add more servers & storage over time
Scale also implies cost – you don’t need millions of $ for v1.0
DELEGATED OPERATIONS
Offload responsibility
Let someone else manage systems
Use third-party services
Speed deployment
Don’t buy & configure your own systems
Don’t build your own data center
Modularize services on different systems
Dedicated systems for storage, email, etc.
Use cloud, network attached storage
Let someone else figure out how to expand storage and do backups
TRANSPARENCY
High level: hide distribution from users
Low level: hide distribution from software
Location transparency
Users don’t care where resources are
Migration transparency
Resources move at will
Replication transparency
Users cannot tell whether there are copies of resources
Concurrency transparency
Users share resources transparently
Parallelism transparency
Operations take place in parallel without user’s knowledge
CORE CHALLENGES IN DISTRIBUTED SYSTEMS DESIGN
Concurrency
Latency
Partial Failure
Security
CONCURRENCY
Lots of requests may occur at the same time
Need to deal with concurrent requests
Need to ensure consistency of all data
Understand critical sections & mutual exclusion
Beware: mutual exclusion (locking) can affect performance
Replication adds complexity
All operations must appear to occur in the same order on all replicas
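A minimal sketch of the critical-section point above, using Python's threading.Lock; the counter, thread count, and iteration count are illustrative assumptions:

    import threading

    counter = 0
    lock = threading.Lock()

    def handle_requests(n):
        global counter
        for _ in range(n):
            with lock:          # critical section: one thread at a time
                counter += 1    # read-modify-write must not interleave

    threads = [threading.Thread(target=handle_requests, args=(100_000,)) for _ in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(counter)              # 400000 with the lock; updates can be lost without it

Note the slide's warning: the lock guarantees consistency but serializes the updates, which is exactly the performance cost mutual exclusion can impose.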
LATENCY
Synchronous network model
There is some upper bound, T, between when a node sends a message and another node receives it
Knowing T enables a node to distinguish between a node that has failed and a node that is taking a long time to respond
Partially synchronous network model
There’s an upper bound for message communication, but the programmer doesn’t know it – it has to be discovered
Protocols will operate correctly only if all messages are received within some time, T
We cannot make assumptions on the delay time distribution
Asynchronous network model
Messages can take arbitrarily long to reach a peer node
This is what we get from the Internet
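A minimal sketch of what the asynchronous model means in practice: we can pick a timeout T and wait, but when it expires we cannot tell whether the peer failed, the network dropped the message, or the reply is simply slow. The host, port, and value of T below are illustrative assumptions.

    import socket

    T = 2.0   # our guessed bound; the asynchronous model gives us no real one

    def probe(host, port):
        try:
            with socket.create_connection((host, port), timeout=T):
                return "responded within T"
        except socket.timeout:
            return "no reply within T: failed, partitioned, or just slow – we cannot tell"
        except OSError:
            return "connection error"

    # print(probe("example.com", 80))   # illustrative usage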
LATENCY AND ASYNCHRONOUS NETWORKS
Asynchronous networks can be a pain
Messages may take an unpredictable amount of time
We may think a message is lost but it’s really delayed
May lead to retransmissions and duplicate messages
May lead us to assume a service is dead when it isn’t
May mess with our perception of time
May cause messages to arrive in a different order … or a different order on different systems
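One common way to cope with retransmission-induced duplicates is to tag each request with an ID and have the receiver remember which IDs it has already handled. The sketch below is an assumption-laden illustration (client-generated IDs, an unbounded set of remembered IDs); a real service would bound or expire that set.

    # De-duplicating retransmitted requests by request ID (illustrative sketch)
    seen_ids = set()

    def handle_request(request_id, payload):
        if request_id in seen_ids:
            return "duplicate: already processed, ignoring retransmission"
        seen_ids.add(request_id)
        # ... process payload exactly once ...
        return f"processed {payload!r}"

    print(handle_request(1, "charge $10"))
    print(handle_request(1, "charge $10"))   # retransmission of the same request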
LATENCY
Speed up data access via caching – temporary copies of data
Keep data close to where it’s processed to maximize efficiency
Memory vs. disk
Local disk vs. remote server
Remote memory vs. remote disk
Cache coherence: cached data can become stale
Underlying data can change → cache needs to be invalidated
System using the cache may change the data → propagate results
Write-through cache
But updates take time ⇒ can lead to inconsistencies (incoherent views)
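A minimal sketch of a write-through cache, assuming an in-memory dict stands in for the slow backing store (a remote server or disk in practice): every write updates both the cache and the store, so reads served from the cache reflect the writes made through it.

    class WriteThroughCache:
        def __init__(self, backing_store):
            self.cache = {}
            self.store = backing_store           # stand-in for a remote server/disk

        def read(self, key):
            if key not in self.cache:            # miss: fetch from the slow store
                self.cache[key] = self.store[key]
            return self.cache[key]

        def write(self, key, value):
            self.store[key] = value              # write-through: update the store...
            self.cache[key] = value              # ...and the cache together

    store = {"x": 1}
    c = WriteThroughCache(store)
    c.write("x", 2)
    print(c.read("x"), store["x"])               # 2 2

This only keeps the cache coherent for writes made through it; if another system changes the backing store directly, the cached copy still goes stale and must be invalidated, as the slide warns.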
PARTIAL FAILURE
Failure is a fact of life in distributed systems!
In local systems, failure is usually total (all-or-nothing)
In distributed systems, we get partial failure
A component can fail while others continue to work
Failure of a network link is indistinguishable from a remote server failure
Send a request but don't get a response ⇒ what happened?
No global state
There is no global state that can be examined to determine errors
There is no agent that can determine which components failed and inform everyone else
Need to ensure the state of the entire system is consistent after a failure
HANDLING FAILURES
Handle detection, recovery, and restart
Availability = fraction of time system is usable
Achieve with redundancy
But then consistency is an issue!
Reliability: data must not get lost
Includes security
SYSTEM FAILURE TYPES
Fail-stop
Failed component stops functioning
Halting = stop without notice
Detect failed components via timeouts
But you can’t count on timeouts in asynchronous networks
And what if the network isn’t reliable?
Sometimes we guess
Fail-restart
Component stops but then restarts
Danger: stale state
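A minimal sketch of timeout-based failure detection under the fail-stop model: each node records when it last heard a heartbeat from a peer and suspects any peer silent for longer than TIMEOUT. As the slide notes, on an asynchronous network this is only a guess – a suspected node may merely be slow. Node names and the timeout value are illustrative assumptions.

    import time

    TIMEOUT = 5.0                  # seconds of silence before we suspect a peer
    last_heartbeat = {}            # peer name -> time of last heartbeat received

    def record_heartbeat(peer):
        last_heartbeat[peer] = time.monotonic()

    def suspected_failed(peer):
        # True if we have heard nothing recently – down, partitioned, or just slow
        return time.monotonic() - last_heartbeat.get(peer, 0.0) > TIMEOUT

    record_heartbeat("server-1")
    print(suspected_failed("server-1"))   # False right after a heartbeat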
NETWORK FAILURE TYPES
Omission
Failure to send or receive messages
Due to queue overflow in router, corrupted data, receive buffer overflow
Timing
Messages take longer than expected
We may assume a system is dead when it isn't
Unsynchronized clocks can alter process coordination
Partition
Network fragments into two or more sub-networks that cannot communicate with each other
NETWORK AND SYSTEM FAILURE TYPES
Fail-silent
A failed component (process or hardware) does not produce any output
Byzantine failures
Instead of stopping, a component produces faulty data
Due to bad hardware, software, network problems, or malicious interference
Goal: avoid single points of failure
REDUNDANCY
We deal with failures by adding redundancy
Replicated components
But this means we need to keep the state of those components replicated
STATE, REPLICAS, AND CACHES
State
Information about some component that cannot be reconstructed
Network connection info, process memory, list of clients with open files, lists of which clients finished their tasks
Replicas
Redundant copies of data → used to address fault tolerance
Cache
Local storage of frequently-accessed data to reduce latency → used to address latency
NO GLOBAL KNOWLEDGE
Nobody has the true global state of a system
There is no global state that can be examined to determine errors
There is no agent that can determine which components failed and inform everyone else
No shared memory
A process knows its current state
It may know the last reported state of other processes
It may periodically report its state to others
No foolproof way to detect failure in all cases
SECURITY
The environment
Public networks, remotely-managed services, 3rd party services
Some issues
Malicious interference, bad user input, impersonation of users & services
Protocol attacks, input validation attacks, time-based attacks, replay attacks
Rely on authentication, cryptography (hashes, encryption) … and good defensive programming!
Users also want convenience
Single sign-on, no repeated entering of login credentials
Controlled access to services
KEY APPROACHES IN DISTRIBUTED SYSTEMS
Divide & conquer
Break up data sets (sharding) and have each system work on a small part (see the sketch after this list)
Merging results is usually the easy & efficient part
Replication
For high availability, caching, and sharing data
Challenge: keep replicas consistent even if systems go down and come up
Quorum/consensus
Enable a group to reach agreement
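As referenced in the divide & conquer item above, a minimal sketch of hash-based sharding, assuming a fixed number of shards and keys that hash reasonably well; every node computes the same stable hash, so any key is always routed to the same shard.

    import hashlib

    NUM_SHARDS = 4

    def shard_for(key: str) -> int:
        # Stable hash so every node routes a given key to the same shard
        digest = hashlib.sha256(key.encode()).hexdigest()
        return int(digest, 16) % NUM_SHARDS

    shards = {i: [] for i in range(NUM_SHARDS)}
    for key in ["alice", "bob", "carol", "dave", "erin"]:
        shards[shard_for(key)].append(key)
    print(shards)   # each shard holds only a slice of the full data set

A fixed shard count is the simplifying assumption here; adding or removing shards with plain modulo hashing reshuffles most keys, which is why real systems often use consistent hashing instead.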
QUESTIONS?
NOW, BY E-MAIL, …