System Design Concepts
The document outlines key concepts in system design, including sharding, bottlenecks, the CAP theorem, scaling methods, RESTful API principles, and the differences between REST and SOAP. It also covers metrics for measuring system performance, caching strategies, and the role of message queues in distributed systems. This information is essential for preparing for system design interviews in the tech industry.
BOSSCODER ACADEMY
IMPORTANT SYSTEM DESIGN CONCEPTS
Step 1: Planning
Step 2: Feasibility Study
Step 3: System Design
Step 4: Implementation
Step 5: Testing
Step 6: Deployment
Step 7: Maintenance & Support
*Disclaimer*
System Design is the most asked topic in tech
interviews.
So, make sure you prepare it thoroughly.
Take the help of this doc and ace your System
Design Interviews.
BOSSCODER ACADEMY www.bosscoderacademy.com
QUESTION-1
What is sharding in system
design?
Sharding is a technique to horizontally scale a database by
splitting its data across multiple independent servers. Each partition is called a shard.
Benefits:
• Improved performance: Each shard handles less data, leading to faster queries and processing.
• Increased scalability: Adding more shards expands the database capacity.
Example:
Imagine a social media app with millions of users. Storing all
data on one server would be overwhelming. We can shard based
on user ID ranges:
• Shard 1: Users with IDs 1 to 1,000,000
• Shard 2: Users with IDs 1,000,001 to 2,000,000
Now:
• User queries are directed to the specific shard holding their data.
• Individual shards handle smaller workloads, improving performance.
• Adding more shards scales the database as the user base grows.
However, sharding also brings complexity:
• Data distribution: Choosing the right sharding key and managing data movement across shards is crucial.
• Joins and complex queries: Combining data from multiple shards may require additional logic and can be slower.
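The user-ID range example can be sketched as a small routing function. This is only an illustration: the shard names, the `SHARD_UPPER_BOUNDS` table, and `shard_for_user` are hypothetical, and the ranges are assumed to be inclusive so that every ID belongs to exactly one shard.

```python
from bisect import bisect_left

# Hypothetical shard layout: (highest user ID in the range, shard name).
SHARD_UPPER_BOUNDS = [
    (1_000_000, "shard-1"),  # user IDs 1 to 1,000,000
    (2_000_000, "shard-2"),  # user IDs 1,000,001 to 2,000,000
]


def shard_for_user(user_id: int) -> str:
    """Return the shard that holds user_id under range-based sharding."""
    if user_id < 1:
        raise ValueError("user IDs start at 1")
    bounds = [upper for upper, _ in SHARD_UPPER_BOUNDS]
    # Find the first range whose upper bound is >= user_id.
    index = bisect_left(bounds, user_id)
    if index == len(bounds):
        raise LookupError(f"no shard configured for user {user_id}")
    return SHARD_UPPER_BOUNDS[index][1]
```

Growing the user base then means appending another entry to the table, although a real system would also have to move or rebalance existing data.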
QUESTION-2
What are bottlenecks in system
design?
Bottlenecks in system design are points or components that
restrict overall performance. Identifying and addressing them is
crucial for optimizing efficiency.
Common bottlenecks include:
1. CPU Bottleneck: When the CPU cannot process data fast enough to meet system demands, it becomes a limiting factor.
2. Memory Bottleneck: Insufficient RAM or slow memory access can hinder data storage and retrieval, slowing down the system.
3. Storage Bottleneck: Slow read/write speeds or limited storage capacity can impede system performance, especially in data-heavy applications.
4. Database Bottleneck: Inefficient database queries, poor indexing, or database contention can significantly affect application performance, particularly in database-dependent systems.
QUESTION-3
What is CAP Theorem?
The CAP theorem (Brewer's theorem) states that a distributed
system or database can guarantee only two of the following
three properties at the same time:
Consistency: Every read returns the most recent write or an error, so all nodes present the same view of the data. This is not the same as the "C" in ACID, which is about transactions preserving database invariants.
Availability: Every request receives a non-error response, although the response is not guaranteed to reflect the most recent write.
Partition tolerance: The system keeps working even when network communication between the nodes in a cluster fails.
By the CAP theorem, all three of these properties cannot be
achieved at the same time. Because network partitions cannot be ruled out in a distributed system, the practical choice during a partition is between consistency and availability.
[Figure: Venn diagram of Consistency, Availability, and Partition Tolerance]
CA category: A network problem might stop the system. Examples: RDBMS (Oracle, SQL Server, MySQL).
CP category: There is a risk of some data becoming unavailable. Examples: MongoDB, HBase, Memcache, Bigtable, Redis.
AP category: Clients may read inconsistent data. Examples: Cassandra, Riak, CouchDB.
QUESTION-4
What is the difference between
horizontal and vertical scaling?
Horizontal scaling involves adding more machines or nodes to
a system to distribute the load and increase performance.
Vertical scaling involves increasing the resources (CPU, RAM,
storage, etc.) on a single machine to improve its performance.
Fundamentals: Horizontal Scaling vs. Vertical Scaling

Data Management
Horizontal scaling: Typically relies on data partitioning, as each node only contains part of the data.
Vertical scaling: The data resides on a single node, and scaling is accomplished through multi-core processing, primarily by distributing the load among the machine's CPU and RAM resources.

Examples
Horizontal scaling: Cassandra, MongoDB, Google Cloud Spanner.
Vertical scaling: MySQL and Amazon RDS.

Downtime possibility
Horizontal scaling: You can scale with less downtime by adding additional computers to the pool, because you are no longer constrained by the capacity of a single device.
Vertical scaling: There is an actual physical limit to vertical scaling, which is the scale of the current hardware. Vertical scaling is restricted to the capacity of one machine, and expanding beyond that limit can result in downtime.

Concurrency Models
Horizontal scaling: Because it entails distributing jobs among devices over a network, it is known as distributed programming. Several patterns are connected to this model: MapReduce, Master/Worker, Blackboard, and tuple spaces.
Vertical scaling: It involves the Actor model. Multi-threading and in-process message forwarding are frequently used to implement concurrent programming on multi-core platforms.

Message Passing
Horizontal scaling: Data sharing is more difficult in distributed computing because there isn't a shared address space. Since you will send copies of the data, it also increases the cost of sharing, transferring, or updating data.
Vertical scaling: Data sharing and message passing can be accomplished by passing a reference in a multi-threaded scenario, since it is reasonable to presume that there is a shared address space.

Choosing between horizontal and vertical scaling depends on
your specific needs. Here are some general guidelines:
• Use horizontal scaling for:
a. Handling high workloads and surges in traffic.
b. Building highly resilient and available systems.
c. Processing large datasets efficiently.
• Use vertical scaling for:
a. Simple workloads and applications.
b. Rapid deployment and testing.
c. Cost-efficiency for low workloads.
QUESTION-5
Describe the RESTful API design
principles.
1. Uniform Interface: Consistent resource naming and actions
using HTTP methods (GET, POST, PUT, DELETE).
2. Client-Server: Separation of concerns between clients
making requests and servers handling them.
3. Statelessness: Each request contains all the information
needed to process it; servers don't "remember" past requests.
4. Cacheable: Resources can be cached by clients or
intermediaries for better performance.
5. Layered System: Intermediaries can be placed between
clients and servers without affecting communication.
6. Code on Demand (Optional): Servers can send executable
code to clients to extend functionality.
These principles lead to well-designed, predictable, and
scalable APIs.
QUESTION-6
How can you select which
webservice to use between REST
and SOAP?
When deciding between SOAP and REST for web services,
consider the following factors:
1. Nature of Data/Logic Exposure:
SOAP: Used for exposing business logic.
REST: Used for exposing data.
2. Formal Contract Requirement:
SOAP: Provides strict contracts through WSDL.
REST: No strict contract requirement.
3. Data Format Support:
SOAP: XML only.
REST: Supports multiple data formats (JSON, XML, plain text, and others).
4. AJAX Call Support:
SOAP: No direct support.
REST: Supports XMLHttpRequest for AJAX calls.
5. Synchronous/Asynchronous Requests:
SOAP: Supports both sync and async.
REST: Typically synchronous request/response; asynchronous behavior has to be built on top (for example, with polling or callbacks).
6. Statelessness Requirement:
SOAP: No; services may be stateful or stateless.
REST: Yes.
7. Security Level:
SOAP: Preferred for high-security needs.
REST: Security depends on underlying implementation.
8. Transaction Support:
SOAP: Provides advanced support for transactions.
REST: Limited transaction support.
9. Bandwidth/Resource Usage:
SOAP: High bandwidth due to XML data overhead.
REST: Uses less bandwidth.
10. Development and Maintenance Ease:
SOAP: More complex.
REST: Known for simplicity, easy development, testing, and
maintenance.
QUESTION-7
What are Content Delivery
Networks (CDN)?
• A CDN is a geographically distributed network of servers strategically placed worldwide that cache and deliver static content (images, videos, JavaScript, CSS, etc.) to users, optimising performance and availability.
• Key concepts:
a. Edge servers: Physically located at Points of Presence (PoPs) closer to users for faster content delivery.
b. Caching: Stores frequently accessed content on edge servers, reducing origin server load and latency.
c. Routing: Directs user requests to the nearest edge server with cached content.
d. Security: Can provide DDoS protection, SSL/TLS offloading, and other security features.
Types of CDNs:
1. Pull CDNs:
• Edge servers fetch content from the origin server upon user request.
• Suitable for content with low update frequency or high demand spikes.
2. Push CDNs:
• Content is proactively pushed to edge servers, ensuring immediate availability.
• Ideal for dynamic content that changes frequently or has high peak periods.
QUESTION-8
Explain the difference between
data sharding and database
partitioning?
Both partitioning and sharding are techniques for managing
large datasets in databases. However, they differ in their scope
and implementation:
Partitioning:
• Definition: Dividing a table into smaller logical segments based on specific criteria.
• Location: Within a single database server.
• Types: Vertical (split columns) and horizontal (split rows).
• Benefits: Improved query performance for specific types of queries.
• Drawbacks: Limited scalability, potential performance bottlenecks on a single server.
Sharding:
• Definition: A specific type of horizontal partitioning where data is distributed across multiple separate servers.
• Location: Each partition (shard) resides on a dedicated server.
• Benefits: Excellent scalability, high availability, faster query execution due to true parallelism.
• Drawbacks: Increased complexity, potential data consistency issues, higher infrastructure costs.
Choosing the right technique:
• For smaller datasets or non-critical applications, partitioning within a single server might be sufficient.
• For large, heavily accessed datasets requiring high scalability and performance, sharding across multiple servers is the ideal choice.
QUESTION-9
Explain availability, reliability,
latency, performance, throughput in
system design.
1. Availability: This refers to the percentage of time your
system is up and running, accessible to users, and fulfilling its
intended purpose. High availability systems aim for near-
constant uptime, often measured in "nines" (e.g., 99.99%
uptime).
2. Reliability: This builds on availability and focuses on the
system's consistency in delivering correct and expected results.
A reliable system performs as intended without errors or
unexpected behavior, even under varying conditions.
3. Latency: This signifies the time it takes for a single request or
operation to complete within the system. Think of it as the
response time, measured in milliseconds (ms) or seconds (s).
Low latency is crucial for real-time interactions and user
experience.
4. Performance: This encompasses the overall responsiveness
and efficiency of your system. It considers factors like latency,
throughput, resource utilization, and scalability. A performant
system handles requests quickly, utilizes resources effectively,
and scales smoothly with increased demands.
5. Throughput: This measures the number of requests or
operations your system can successfully process per unit of
time. It's often expressed in requests per second (RPS) or
transactions per second (TPS). High throughput ensures the
system can handle large volumes of users or data.
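To make the "nines" used for availability concrete, the short sketch below converts an availability percentage into the downtime it permits. The function name and the 365-day default period are assumptions chosen for illustration.

```python
def allowed_downtime_minutes(availability_percent: float,
                             period_days: float = 365.0) -> float:
    """Minutes of downtime permitted in the period at the given availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * period_days * 24 * 60


if __name__ == "__main__":
    for nines in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(nines)
        print(f"{nines}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, 99.99% uptime leaves roughly 53 minutes of downtime per year, while 99.9% leaves close to nine hours.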
QUESTION-10
What is consistency? What are its
different types?
Consistency refers to the agreement between multiple copies
of the same data stored across different servers. It ensures
users have the same view of the data, regardless of their
location.
There are three main types of consistency:
1. Eventual Consistency:
• Reads may not immediately reflect the latest write, but they eventually will (typically within milliseconds).
• Think of sending an email: the recipient won't see it instantly, but it will arrive eventually.
• Used in scalable systems like DNS and email due to its high availability and low cost.
2. Strong Consistency:
• Reads always reflect the latest write, meaning data is replicated synchronously across all servers.
• Provides strict consistency but sacrifices scalability and performance.
• Used in systems like databases and filesystems where data integrity is crucial.
3. Weak Consistency:
• Focuses on fast access, allowing reads to potentially see outdated data.
• No well-defined rules for data updates, so different servers might return different values.
• Used in real-time applications like VoIP and video chat where speed is prioritized over absolute data accuracy.
Choosing the right consistency model depends on your
application's specific needs:
• Prioritize accuracy? Choose strong consistency.
• Need high availability and scalability? Consider eventual consistency.
• Speed is essential? Weak consistency might be acceptable.
QUESTION-11
What are some metrics for
measuring system performance?
User Experience:
• Apdex Score: Summarizes user satisfaction as the share of requests that meet a target response time, with slower but tolerable requests counted as half.
• Average Response Time: How long it takes the system to answer user requests.
System Health:
• Throughput: Number of requests handled per unit of time (helps determine scaling needs).
• Availability/Uptime: Percentage of time the system is operational.
• Error Rates: Frequency of errors encountered during operation.
Resource Utilization:
• Latency & CPU: Monitors data transfer speed and CPU usage to identify bottlenecks.
• Garbage Collection: Tracks memory management efficiency.
Others:
• Database Performance: Monitor key database metrics for optimal operation.
• Network Performance: Track bandwidth, packet loss, and latency for network reliability.
• Security Metrics: Monitor events and access controls for system security.
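The Apdex score listed under User Experience can be computed from raw response times. The sketch below uses the standard Apdex definition (satisfied requests finish within the target time T, tolerating requests within 4T); the function name and the sample numbers are illustrative only.

```python
def apdex(response_times_s: list[float], target_s: float) -> float:
    """Apdex = (satisfied + tolerating / 2) / total samples."""
    if not response_times_s:
        raise ValueError("at least one sample is required")
    # Satisfied: at or under the target T.
    satisfied = sum(1 for t in response_times_s if t <= target_s)
    # Tolerating: slower than T but no slower than 4T.
    tolerating = sum(1 for t in response_times_s
                     if target_s < t <= 4 * target_s)
    # Everything slower than 4T is "frustrated" and contributes nothing.
    return (satisfied + tolerating / 2) / len(response_times_s)


if __name__ == "__main__":
    samples = [0.1, 0.2, 0.6, 3.0]  # seconds
    print(apdex(samples, target_s=0.5))
```

A score of 1.0 means every request met the target, and 0.0 means every request was slower than four times the target.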
QUESTION-12
What is caching in system design?
Explain types of caching based on
data consistency.
Caching is a powerful optimization technique that stores
frequently accessed data in a temporary location closer to the
requesting process. This location, called a cache, is generally
faster to access than the original data source, significantly
improving system performance and scalability.
Based on Data Consistency:
Write-through:
• Writes happen to both cache and database simultaneously.
• Pros: Fast reads, high data consistency.
• Cons: High write latency due to double writes.
Write-around:
• Writes bypass the cache and go directly to the database.
• Pros: Potentially lower write latency.
• Cons: Increased cache misses, leading to higher read latency for frequently written and re-read data.
Write-back:
• Writes occur only in the cache and are confirmed immediately. Data is asynchronously synced to the database later.
• Pros: Lower write latency, higher throughput for write-intensive workloads.
• Cons: Risk of data loss if the cache crashes before syncing. Reliability can be improved by replicating the write and waiting for multiple acknowledgements.
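The three write policies differ only in where a write lands first, which a toy in-memory model makes visible. Everything here is a simplified assumption: a plain dict stands in for the database, `PolicyCache` is a hypothetical name, and `flush()` is called manually rather than asynchronously.

```python
class PolicyCache:
    """Toy cache in front of a dict 'database', with a selectable write policy."""

    POLICIES = {"write-through", "write-around", "write-back"}

    def __init__(self, database: dict, policy: str):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.database = database
        self.policy = policy
        self.cache = {}
        self.dirty = set()  # keys written to the cache but not yet persisted

    def write(self, key, value):
        if self.policy == "write-through":
            self.database[key] = value  # both stores updated before returning
            self.cache[key] = value
        elif self.policy == "write-around":
            self.database[key] = value  # cache is bypassed
            self.cache.pop(key, None)   # drop any stale cached copy
        else:  # write-back
            self.cache[key] = value     # database is updated later by flush()
            self.dirty.add(key)

    def read(self, key):
        if key not in self.cache:       # cache miss: load from the database
            self.cache[key] = self.database[key]
        return self.cache[key]

    def flush(self):
        """Persist pending write-back entries (a real system does this asynchronously)."""
        for key in self.dirty:
            self.database[key] = self.cache[key]
        self.dirty.clear()
```

With the write-back policy, anything written between two `flush()` calls exists only in the cache, which is exactly the data-loss window described above.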
QUESTION-13
What are message queues in
system design?
Message queues are a form of service-to-service
communication that facilitates asynchronous communication.
They are used to effectively manage requests in large-scale
distributed systems.
Here are some features of message queues:
• Push or Pull Delivery: Provides options for continuous querying (pull) or notification-based retrieval (push), with support for long-polling.
• FIFO (First-In-First-Out) Queues: Processes the oldest entry in the queue first.
• Schedule/Delay Delivery: Allows setting specific delivery times, including common delays using delay queues.
• At-Least-Once Delivery: Ensures message delivery by storing multiple copies and resending in case of failures.
• Exactly-Once Delivery: Filters out duplicates to guarantee single delivery.
• Dead-letter Queues: Stores unprocessable messages for further inspection without blocking.
• Ordering: Delivers messages in sender's order, ensuring each message is received at least once.
• Task Queues: Executes tasks in the background, supporting scheduling and intensive computations.
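A few of these features (FIFO processing, at-least-once redelivery, and a dead-letter queue) can be shown together in a single-process sketch. It is not how any particular broker works: the `drain` function, the `(message, attempts)` tuple layout, and the `max_attempts` limit are all assumptions for illustration.

```python
from collections import deque


def drain(queue: deque, handler, max_attempts: int = 3) -> list:
    """Process (message, attempts) pairs in FIFO order.

    A message whose handler raises is put back on the queue for redelivery
    (at-least-once). After max_attempts failures it is moved to the returned
    dead-letter list so it stops blocking the rest of the queue.
    """
    dead_letters = []
    while queue:
        message, attempts = queue.popleft()  # oldest entry first
        try:
            handler(message)
        except Exception:
            attempts += 1
            if attempts >= max_attempts:
                dead_letters.append(message)
            else:
                queue.append((message, attempts))  # redeliver later
    return dead_letters
```

Because a failed message is re-queued at the back, this sketch gives up strict ordering for retried messages, and a handler may see the same message more than once, so it should be idempotent.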
QUESTION-14
What is N-tier architecture in
system design?
N-tier architecture divides an application into logical layers and
physical tiers to manage dependencies and separate
responsibilities.
Tiers in N-tier architecture are physically separated, possibly
running on different machines. Communication between tiers
can be direct or asynchronous, improving scalability and
resilience while introducing latency due to network
communication.
Two types of N-tier architectures exist:
• Closed-layer architecture restricts layers to call only the immediately lower layer.
• Open-layer architecture allows a layer to call any layer below it.
N-Tier architecture examples:
• 3-Tier architecture: Comprising Presentation, Business Logic, and Data Access layers.
• 2-Tier architecture: Involving a client-side Presentation layer communicating directly with a data store.
• Single Tier (1-Tier) architecture: Simplest form where all components run on a single server or application.
QUESTION-15
What are virtual machines? What
are the benefits of using virtual
machines?
• A VM is a virtual environment mimicking a computer system with its own CPU, memory, network interface, and storage, hosted on physical hardware.
• A hypervisor separates hardware resources from the VM and manages them.
Benefits of Using VMs:
• Server Consolidation: Virtualizing servers enables efficient utilization of physical resources by hosting multiple VMs on one server.
• Isolation: VMs provide a segregated environment, preventing interference between VMs and host hardware.
• Testing and Production Environments: VMs are ideal for testing new applications or setting up production environments due to their isolated nature.
• Specific Use Cases: Single-purpose VMs can be employed to support particular functions.
QUESTION-16
What are containers? What are
their advantages?
Containers package code and dependencies to ensure
applications run consistently across different computing
environments.
Need for Containers:
• Separation of Responsibility: Allows developers to focus on application logic while operations teams handle deployment.
• Workload Portability: Containers can run anywhere, simplifying development and deployment processes.
• Application Isolation: Virtualizes resources at the OS level, providing logical isolation for applications.
• Agile Development: Enables rapid development by eliminating concerns about dependencies and environments.
• Efficient Operations: Lightweight containers optimize resource usage, ensuring efficient computing.
QUESTION-17
What is rate limiting? Explain a few
algorithms used in API rate
limiting.
Rate limiting is a crucial strategy in large-scale systems to
prevent the frequency of operations from surpassing a defined
threshold.
It safeguards our APIs from both unintentional and malicious
overuse by constraining the quantity of requests permitted to
access our API within a specified timeframe.
The following are a few API rate limiting algorithms:
1. Leaky Bucket:
Uses a queue from which requests are processed at a constant rate. Additional requests are discarded if the queue is full.
2. Token Bucket:
Each request takes a token from a bucket before it is processed. Tokens are added back to the bucket over time, and requests are refused when no tokens are available.
3. Fixed Window:
Uses a fixed time window to track request rates. Requests exceeding a threshold within the window are discarded.
4. Sliding Log:
Keeps a timestamped log of each request and counts only the entries that fall within the most recent window. Requests exceeding the threshold within that window are discarded.
5. Sliding Window:
Combines elements of the fixed window and sliding log approaches. Tracks request rates for each fixed window, considering weighted values from previous windows for smoother traffic handling.
[Figure: Sliding Window Algorithm]
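Of the algorithms listed, the token bucket is compact enough to sketch in full. The version below is a minimal single-threaded illustration; the class name, the `capacity` and `refill_per_second` parameters, and the injectable `clock` are assumptions, and a production limiter would also need locking and per-client buckets.

```python
import time


class TokenBucket:
    """Minimal single-threaded token bucket rate limiter."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock=time.monotonic):
        self.capacity = capacity                  # maximum burst size
        self.refill_per_second = refill_per_second
        self.clock = clock                        # injectable for testing
        self.tokens = capacity                    # start with a full bucket
        self.last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and consume tokens if the request may proceed."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A caller would typically answer a refused request with HTTP 429 (Too Many Requests). The capacity sets how large a burst is tolerated, while the refill rate sets the sustained request rate.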
QUESTION-18
What is consistent hashing?
Consistent hashing is a strategy used to distribute keys/data
among multiple servers in a distributed system.
It employs a hash function to map keys to points on a circular
hash ring, where each server is associated with a point on the
ring.
When a key request arrives, the hash function determines the
corresponding point on the ring, and the request is handled by the nearest server on the ring, conventionally the first one found moving clockwise from that point.
Benefits of consistent hashing include:
• Minimizing key remapping when servers change.
• Allowing dynamic cluster resizing without significant overhead.
• Evenly distributing load across servers.
It's widely used in distributed systems like caching systems and
content delivery networks to evenly distribute data and is
integral to distributed hash table implementation.
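A hash ring of this kind can be sketched in a few lines. The sketch makes several assumptions that are not in the text above: each key goes to the first server point found clockwise, each server is placed on the ring at several "virtual node" points (`replicas`) to even out the load, and SHA-256 is used as the hash. The `HashRing` class and its method names are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, servers=(), replicas: int = 100):
        self.replicas = replicas  # virtual points per server
        self._points = []         # sorted positions on the ring
        self._owner = {}          # position -> server name
        for server in servers:
            self.add_server(server)

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_server(self, server: str) -> None:
        for i in range(self.replicas):
            point = self._hash(f"{server}#{i}")
            if point not in self._owner:
                bisect.insort(self._points, point)
            self._owner[point] = server

    def remove_server(self, server: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def server_for(self, key: str) -> str:
        if not self._points:
            raise LookupError("the ring has no servers")
        # First point clockwise from the key's position, wrapping around.
        index = bisect.bisect_right(self._points, self._hash(key))
        return self._owner[self._points[index % len(self._points)]]
```

Removing a server only reassigns the keys that server owned, and adding one only pulls keys onto the new server; every other key keeps its previous placement, which is the remapping benefit listed above.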
QUESTION-19
What are the advantages and
disadvantages of microservices
architecture?
Microservices are an architectural style that structures an
application as a collection of small, loosely coupled, and
independently deployable services.
Key Concepts:
• Independence: Each service has a specific business function and is developed and scaled separately.
• Modularity: Breaking down a large, monolithic application into smaller, manageable pieces.
Advantages of Microservices:
• Agility and Speed: Faster development and deployment cycles due to independent services.
• Scalability: Individual services can be scaled up or down independently based on demand.
• Resilience: Failure of one service doesn't cripple the entire app.
• Technology Choice: Each service can use the best tool for the job without affecting others.
Disadvantages of Microservices:
• Complexity: Increased overhead in managing infrastructure, communication, and monitoring.
• Testing: Testing complex distributed systems can be challenging and time-consuming.
• Debugging: Identifying and fixing issues across services can be difficult.
• Cost: Initial setup and ongoing maintenance can be more expensive than for a monolithic application.
QUESTION-20
When would you prefer to use an
SQL database and when a NoSQL
database?
Use an SQL database if:
• Structured data with well-defined relationships: Your data fits neatly into tables with rows and columns, and there are clear connections between different tables. SQL excels at querying and manipulating such data.
• Strong data consistency and ACID properties are crucial: Transactions must be atomic, consistent, isolated, and durable. Relational databases generally provide ACID transactions, ensuring data integrity for financial systems, regulatory applications, etc.
• Complex queries and reporting are essential: SQL's powerful query language enables intricate data analysis and report generation.
Use a NoSQL database if:
• Unstructured or semi-structured data: Your data doesn't adhere to a rigid structure, including documents, JSON, or graphs. NoSQL offers greater flexibility in storing and managing such data.
• High scalability and performance are primary concerns: You anticipate significant data growth and require rapid read/write operations. NoSQL often scales horizontally, adding more servers to handle increased load.
• Data consistency trade-offs are acceptable: Your application's priorities might emphasize speed and availability over absolute data consistency.