
System Design Concepts

The document outlines key concepts in system design, including sharding, bottlenecks, the CAP theorem, scaling methods, RESTful API principles, and the differences between REST and SOAP. It also covers metrics for measuring system performance, caching strategies, and the role of message queues in distributed systems. This information is essential for preparing for system design interviews in the tech industry.

Uploaded by abhay.jirati12
Copyright © All Rights Reserved
BOSSCODER ACADEMY
IMPORTANT SYSTEM DESIGN CONCEPTS

[Cover figure: Step 1 Planning, Step 2 Feasibility Study, Step 3 System Design, Step 4 Implementation, Step 5 Testing, Step 6 Deployment, Step 7 Maintenance & Support]

Disclaimer: System design is the most frequently asked topic in tech interviews, so make sure you prepare it thoroughly. Use this document to help you ace your system design interviews.

www.bosscoderacademy.com

QUESTION-1
What is sharding in system design?

Sharding is a technique for scaling a database horizontally by splitting its data across multiple independent servers called shards.

Benefits:
• Improved performance: Each shard handles less data, leading to faster queries and processing.
• Increased scalability: Adding more shards expands the database's capacity.

Example: Imagine a social media app with millions of users. Storing all the data on one server would be overwhelming, so we shard by user ID ranges:
• Shard 1: users with IDs 1 to 1,000,000
• Shard 2: users with IDs 1,000,001 to 2,000,000

Now:
• Each user query is directed to the specific shard holding that user's data.
• Individual shards handle smaller workloads, improving performance.
• Adding more shards scales the database as the user base grows.

However, sharding also brings complexity:
• Data distribution: Choosing the right sharding key and managing data movement across shards is crucial.
• Joins and complex queries: Combining data from multiple shards may require additional application logic and can be slower.

QUESTION-2
What are bottlenecks in system design?

Bottlenecks are points or components in a system that restrict its overall performance. Identifying and addressing them is crucial for optimizing efficiency. Common bottlenecks include:
1. CPU bottleneck: The CPU cannot process data fast enough to meet system demands, so it becomes the limiting factor.
2. Memory bottleneck: Insufficient RAM or slow memory access hinders data storage and retrieval, slowing down the system.
3. Storage bottleneck: Slow read/write speeds or limited storage capacity impede performance, especially in data-heavy applications.
4. Database bottleneck: Inefficient queries, poor indexing, or database contention can significantly affect performance, particularly in database-dependent systems.

QUESTION-3
What is the CAP theorem?

The CAP theorem (Brewer's theorem) states that a distributed system or database can provide at most two of the following three properties at the same time:
• Consistency: Every read receives the most recent write (or an error), so all nodes see the same data at the same time. This differs from the "C" in ACID, which means a transaction keeps the database in a valid state.
• Availability: Every request receives a non-error response, even if it does not reflect the most recent write.
• Partition tolerance: The system keeps working even when network communication between nodes in the cluster fails.

Because network partitions cannot be ruled out in a distributed system, the practical choice during a partition is between consistency and availability.

[Figure: CAP triangle with Consistency, Availability, and Partition Tolerance at its corners]
• CA category: a network problem might stop the system. Examples: RDBMSs (Oracle, SQL Server, MySQL).
• CP category: some data may become unavailable. Examples: MongoDB, HBase, Memcache, Bigtable, Redis.
• AP category: clients may read inconsistent data. Examples: Cassandra, Riak, CouchDB.

QUESTION-4
What is the difference between horizontal and vertical scaling?

Horizontal scaling adds more machines (nodes) to a system to distribute the load and increase performance. Vertical scaling increases the resources (CPU, RAM, storage, etc.) of a single machine to improve its performance.
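Horizontally scaling a database usually means partitioning its data across nodes, as in the user-ID range sharding example earlier in this document. The Python sketch below shows what the routing layer for that example might look like. It is a minimal illustration under stated assumptions, not a production router: the class name RangeShardRouter, the shard names, and the assumption that user IDs are positive integers grouped into contiguous ranges are all hypothetical.

```python
import bisect


class RangeShardRouter:
    """Hypothetical router that maps a user ID to the shard owning its ID range."""

    def __init__(self, upper_bounds: list[int], shard_names: list[str]) -> None:
        # upper_bounds[i] is the inclusive maximum user ID stored on shard_names[i]
        if len(upper_bounds) != len(shard_names):
            raise ValueError("need exactly one upper bound per shard")
        if upper_bounds != sorted(upper_bounds):
            raise ValueError("upper bounds must be in ascending order")
        self.upper_bounds = list(upper_bounds)
        self.shard_names = list(shard_names)

    def shard_for(self, user_id: int) -> str:
        """Return the name of the shard that stores `user_id`."""
        if user_id < 1:
            raise ValueError("user IDs are assumed to start at 1")
        # bisect_left finds the first range whose upper bound is >= user_id
        i = bisect.bisect_left(self.upper_bounds, user_id)
        if i == len(self.upper_bounds):
            raise KeyError(f"no shard covers user {user_id}; add a shard")
        return self.shard_names[i]

    def add_shard(self, upper_bound: int, name: str) -> None:
        """Scale out by adding a shard for the next, higher ID range."""
        if self.upper_bounds and upper_bound <= self.upper_bounds[-1]:
            raise ValueError("a new shard must cover IDs above the existing ranges")
        self.upper_bounds.append(upper_bound)
        self.shard_names.append(name)


router = RangeShardRouter([1_000_000, 2_000_000], ["shard-1", "shard-2"])
print(router.shard_for(42))         # shard-1
print(router.shard_for(1_500_000))  # shard-2
router.add_shard(3_000_000, "shard-3")
print(router.shard_for(2_500_000))  # shard-3
```

Range keys keep neighboring IDs together, which helps range scans, but every new user lands on the newest shard. Hashing the key instead spreads writes more evenly at the cost of harder range queries.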
Horizontal vs. vertical scaling compared:

Data management
  Horizontal: Scaling horizontally typically relies on data partitioning, since each node contains only part of the data.
  Vertical: The data resides on a single node, and scaling is achieved through multi-core processing, distributing the load among the machine's CPU and RAM resources.

Examples
  Horizontal: Cassandra, MongoDB, Google Cloud Spanner.
  Vertical: MySQL, Amazon RDS.

Downtime
  Horizontal: You can scale with less downtime by adding more machines to the pool, and you are no longer constrained by the capacity of a single device.
  Vertical: There is a physical limit to vertical scaling: the capacity of a single machine. Expanding beyond the current hardware can result in downtime.

Concurrency models
  Horizontal: Distributing jobs among machines over a network is known as distributed programming. Patterns associated with this model include MapReduce, Master/Worker, Blackboard, and tuple spaces.
  Vertical: Relies on the actor model; multi-threading and in-process message passing are frequently used to implement concurrent programming on multi-core platforms.

Message passing
  Horizontal: Data sharing is more difficult in distributed computing because there is no shared address space. Since you send copies of the data, sharing, transferring, or updating data also costs more.
  Vertical: Data sharing and message passing can be done by passing a reference in a multi-threaded program, since it is reasonable to assume a shared address space.

Choosing between horizontal and vertical scaling depends on your specific needs. Here are some general guidelines:
• Use horizontal scaling for:
  a. Handling high workloads and surges in traffic.
  b. Building highly resilient and available systems.
  c. Processing large datasets efficiently.
• Use vertical scaling for:
  a. Simple workloads and applications.
  b. Rapid deployment and testing.
  c. Cost efficiency at low workloads.

QUESTION-5
Describe the RESTful API design principles.

1. Uniform interface: Resources are named consistently and manipulated through standard HTTP methods (GET, POST, PUT, DELETE).
2. Client-server: Concerns are separated between clients, which make requests, and servers, which handle them.
3. Statelessness: Each request contains all the information needed to process it; servers don't "remember" past requests.
4. Cacheable: Responses can be cached by clients or intermediaries for better performance.
5. Layered system: Intermediaries (such as proxies or load balancers) can be placed between clients and servers without affecting communication.
6. Code on demand (optional): Servers can send executable code to clients to extend their functionality.

These principles lead to well-designed, predictable, and scalable APIs.

QUESTION-6
How do you choose between REST and SOAP for a web service?

When deciding between SOAP and REST, consider the following factors:
1. Nature of data/logic exposure:
   SOAP: Used for exposing business logic.
   REST: Used for exposing data.
2. Formal contract requirement:
   SOAP: Provides strict contracts through WSDL.
   REST: No strict contract requirement.
3. Data format support:
   SOAP: XML only.
   REST: Supports multiple data formats (JSON, XML, plain text, etc.).
4. AJAX call support:
   SOAP: No direct support.
   REST: Supports XMLHttpRequest for AJAX calls.
5. Synchronous/asynchronous requests:
   SOAP: Supports both synchronous and asynchronous calls.
   REST: Primarily synchronous request/response calls.
6. Statelessness requirement:
   SOAP: No.
   REST: Yes.
7. Security level:
   SOAP: Preferred for high-security needs.
   REST: Security depends on the underlying implementation.
8. Transaction support:
   SOAP: Provides advanced support for transactions.
   REST: Limited transaction support.
9. Bandwidth/resource usage:
   SOAP: High bandwidth due to XML data overhead.
   REST: Uses less bandwidth.
10. Development and maintenance ease:
   SOAP: More complex.
   REST: Known for simplicity and easy development, testing, and maintenance.

QUESTION-7
What are Content Delivery Networks (CDNs)?

• A CDN is a geographically distributed network of servers that caches static content (images, videos, JavaScript, CSS, etc.) and delivers it to users from locations near them, improving performance and availability.
• Key concepts:
  a. Edge servers: Located at Points of Presence (PoPs) close to users for faster content delivery.
  b. Caching: Frequently accessed content is stored on edge servers, reducing origin server load and latency.
  c. Routing: User requests are directed to the nearest edge server with cached content.
  d. Security: CDNs can provide DDoS protection, SSL/TLS offloading, and other security features.

Types of CDNs:
1. Pull CDNs:
   • Edge servers fetch content from the origin server when a user requests it.
   • Suitable for content with low update frequency or high demand spikes.
2. Push CDNs:
   • Content is proactively pushed to edge servers, ensuring immediate availability.
   • Ideal for dynamic content that changes frequently or has high peak periods.

QUESTION-8
What is the difference between data sharding and database partitioning?

Both partitioning and sharding are techniques for managing large datasets in databases. However, they differ in scope and implementation:

Partitioning:
• Definition: Dividing a table into smaller logical segments based on specific criteria.
• Location: Within a single database server.
• Types: Vertical (split columns) and horizontal (split rows).
• Benefits: Improved query performance for specific types of queries.
• Drawbacks: Limited scalability and potential performance bottlenecks on a single server.

Sharding:
• Definition: A specific type of horizontal partitioning in which data is distributed across multiple separate servers.
• Location: Each partition (shard) resides on a dedicated server.
• Benefits: Excellent scalability, high availability, and faster query execution because shards can work in parallel.
• Drawbacks: Increased complexity, potential data consistency issues, and higher infrastructure costs.

Choosing the right technique:
• For smaller datasets or non-critical applications, partitioning within a single server might be sufficient.
• For large, heavily accessed datasets requiring high scalability and performance, sharding across multiple servers is usually the better choice.

QUESTION-9
Explain availability, reliability, latency, performance, and throughput in system design.

1. Availability: The percentage of time your system is up and running, accessible to users, and fulfilling its intended purpose. High-availability systems aim for near-constant uptime, often measured in "nines" (e.g., 99.99% uptime allows about 52.6 minutes of downtime per year).
2. Reliability: Builds on availability and focuses on the system consistently delivering correct, expected results. A reliable system performs as intended without errors or unexpected behavior, even under varying conditions.
3. Latency: The time it takes for a single request or operation to complete. Think of it as response time, measured in milliseconds (ms) or seconds (s). Low latency is crucial for real-time interactions and user experience.
4. Performance: The overall responsiveness and efficiency of your system. It considers factors such as latency, throughput, resource utilization, and scalability.
   A performant system handles requests quickly, uses resources effectively, and scales smoothly as demand increases.
5. Throughput: The number of requests or operations your system can successfully process per unit of time, often expressed in requests per second (RPS) or transactions per second (TPS). High throughput lets the system handle large volumes of users or data.

QUESTION-10
What is consistency? What are its different types?

Consistency refers to the agreement between multiple copies of the same data stored on different servers. It ensures users see the same view of the data regardless of their location. There are three main types of consistency:
1. Eventual consistency: Reads may not immediately reflect the latest write, but they eventually will (often within milliseconds).
   • Think of sending an email: the recipient doesn't see it instantly, but it arrives eventually.
   • Used in highly scalable systems such as DNS and email because of its high availability and low cost.
2. Strong consistency:
   • Reads always reflect the latest write, because data is replicated synchronously across all servers.
   • Provides strict consistency but sacrifices scalability and performance.
   • Used in systems such as databases and file systems, where data integrity is crucial.
3. Weak consistency:
   • Focuses on fast access, allowing reads to see outdated data.
   • There are no well-defined rules for when updates become visible, so different servers might return different values.
   • Used in real-time applications such as VoIP and video chat, where speed is prioritized over absolute data accuracy.

Choosing the right consistency model depends on your application's specific needs:
• Prioritize accuracy? Choose strong consistency.
• Need high availability and scalability? Consider eventual consistency.
• Speed is essential?
  Weak consistency might be acceptable.

QUESTION-11
What are some metrics for measuring system performance?

User experience:
• Apdex score: Blends response time and success rate into a single user-satisfaction score between 0 and 1, computed as (satisfied + tolerating / 2) / total requests, where satisfied requests finish within a target time T, tolerating ones within 4T, and the rest (typically including errors) count as frustrated.
• Average response time: How long the system takes to answer user requests.

System health:
• Throughput: Number of requests handled per unit of time (helps determine scaling needs).
• Availability/uptime: Percentage of time the system is operational.
• Error rate: Frequency of errors encountered during operation.

Resource utilization:
• Latency and CPU usage: Monitor response delays and CPU load to identify bottlenecks.
• Garbage collection: Tracks memory management efficiency.

Others:
• Database performance: Monitor key database metrics for optimal operation.
• Network performance: Track bandwidth, packet loss, and latency for network reliability.
• Security metrics: Monitor security events and access controls.

QUESTION-12
What is caching in system design? Explain the types of caching based on data consistency.

Caching is an optimization technique that stores frequently accessed data in a temporary location closer to the requesting process. This location, called a cache, is generally faster to access than the original data source, which significantly improves system performance and scalability.

Types of caching based on data consistency:

Write-through:
• Writes go to both the cache and the database at the same time.
• Pros: Fast reads and high data consistency.
• Cons: Higher write latency, because every write is performed twice.

Write-around:
• Writes bypass the cache and go directly to the database.
• Pros: Potentially lower write latency.
• Cons: More cache misses, leading to higher read latency for data that is written and then re-read soon after.
Write-back:
• Writes go only to the cache and are acknowledged immediately; the data is synced to the database asynchronously later.
• Pros: Lower write latency and higher throughput for write-intensive workloads.
• Cons: Risk of data loss if the cache crashes before syncing. Requiring multiple write acknowledgements improves reliability.

QUESTION-13
What are message queues in system design?

Message queues are a form of asynchronous service-to-service communication. They are used to manage requests effectively in large-scale distributed systems. Common features of message queues include:
• Push or pull delivery: Consumers can either poll the queue continuously (pull) or be notified when messages arrive (push); long polling is also supported.
• FIFO (first-in, first-out) queues: The oldest entry in the queue is processed first.
• Scheduled/delayed delivery: Messages can be delivered at a specific time, with common delays handled by delay queues.
• At-least-once delivery: Messages are stored as multiple copies and resent on failure, so each message is delivered at least once.
• Exactly-once delivery: Duplicates are filtered out so that each message is delivered only once.
• Dead-letter queues: Messages that cannot be processed are set aside for later inspection without blocking the main queue.
• Ordering: Messages are delivered in the sender's order, with each message received at least once.
• Task queues: Tasks are executed in the background, supporting scheduling and computationally intensive work.

QUESTION-14
What is N-tier architecture in system design?

N-tier architecture divides an application into logical layers and physical tiers to manage dependencies and separate responsibilities. Tiers in an N-tier architecture are physically separated and may run on different machines.
Communication between tiers can be direct or asynchronous, which improves scalability and resilience but introduces latency due to network communication.

There are two types of N-tier architecture:
• Closed-layer architecture: A layer may call only the layer immediately below it.
• Open-layer architecture: A layer may call any layer below it.

N-tier architecture examples:
• 3-tier architecture: Presentation, Business Logic, and Data Access layers.
• 2-tier architecture: A client-side presentation layer that communicates directly with a data store.
• Single-tier (1-tier) architecture: The simplest form, where all components run on a single server or application.

QUESTION-15
What are virtual machines? What are the benefits of using virtual machines?

• A VM is a virtual environment that emulates a computer system with its own CPU, memory, network interface, and storage, hosted on physical hardware.
• A hypervisor separates the hardware resources from the VMs and manages their allocation.

Benefits of using VMs:
• Server consolidation: Hosting multiple VMs on one physical server uses its resources more efficiently.
• Isolation: VMs provide a segregated environment, preventing interference between VMs and with the host hardware.
• Testing and production environments: Their isolation makes VMs well suited to testing new applications or setting up production environments.
• Specific use cases: Single-purpose VMs can be used to support particular functions.

QUESTION-16
What are containers? What are their advantages?

Containers package code together with its dependencies so that applications run consistently across different computing environments.

Need for containers:
• Separation of responsibility: Developers can focus on application logic while operations teams handle deployment.
• Workload portability: Containers can run almost anywhere, simplifying development and deployment.
• Application isolation: Containers virtualize resources at the OS level, providing logical isolation between applications.
• Agile development: Teams can develop rapidly without worrying about dependencies and environment differences.
• Efficient operations: Containers are lightweight, which optimizes resource usage.

QUESTION-17
What is rate limiting? Explain a few algorithms used in API rate limiting.

Rate limiting is a strategy used in large-scale systems to prevent the frequency of operations from exceeding a defined threshold. It protects APIs from both unintentional and malicious overuse by limiting the number of requests permitted within a specified time frame.

The following are a few API rate-limiting algorithms:
1. Leaky bucket: Requests enter a queue and are processed at a constant rate. Additional requests are discarded when the queue is full.
2. Token bucket: Each request needs a token from a bucket to be processed. Requests are refused when no tokens are available, and the bucket is refilled over time.
3. Fixed window: Counts requests within a fixed time window. Requests exceeding the threshold within the window are discarded.
4. Sliding log: Keeps a timestamped log of recent requests. On each new request, entries older than the window are dropped, and the request is discarded if the remaining count has already reached the threshold.
5. Sliding window: Combines elements of the fixed window and sliding log approaches. It tracks the request count for each fixed window and adds a weighted share of the previous window's count, smoothing traffic at window boundaries.

[Figures: fixed window counter and Sliding Window Algorithm illustrations]
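The token bucket algorithm described above can be sketched as a small Python class. This is a hypothetical, single-process illustration: the class name TokenBucket and the parameters capacity and refill_rate are assumptions, not a specific library's API. A real API gateway would keep one bucket per client and share that state across servers.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical single-process token-bucket rate limiter (not a library API)."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity        # maximum tokens held, i.e. the largest allowed burst
        self.refill_rate = refill_rate  # tokens added back per second
        self.clock = clock              # injectable time source, useful for testing
        self.tokens = capacity          # the bucket starts full
        self.last_refill = clock()

    def _refill(self) -> None:
        now = self.clock()
        elapsed = now - self.last_refill
        # Credit tokens for the time that has passed, never exceeding capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now

    def allow(self, cost: float = 1.0) -> bool:
        """Take `cost` tokens and return True, or return False if the request must be rejected."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Usage with a fake clock so the behavior is deterministic
fake_now = [0.0]
bucket = TokenBucket(capacity=3, refill_rate=1.0, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(4)])  # [True, True, True, False]
fake_now[0] = 1.0                           # one second later, one token is back
print(bucket.allow(), bucket.allow())       # True False
```

Refill is computed lazily from the elapsed time on each call, so no background timer is needed. The capacity sets the largest burst a client can send, and refill_rate sets the sustained request rate.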
QUESTION-18
What is consistent hashing?

Consistent hashing is a strategy for distributing keys (data) among multiple servers in a distributed system. A hash function maps keys to points on a circular hash ring, and each server is also assigned a point on the ring. When a request for a key arrives, the hash function determines the key's position on the ring, and the first server found moving clockwise from that point handles the request.

Benefits of consistent hashing include:
• Minimizing key remapping when servers change: only the keys between a changed server and its predecessor on the ring move.
• Allowing the cluster to be resized dynamically without significant overhead.
• Distributing load evenly across servers (in practice, usually by placing several virtual nodes per server on the ring).

It is widely used in distributed systems such as caching systems and content delivery networks to distribute data evenly, and it is integral to distributed hash table implementations.

QUESTION-19
What are the advantages and disadvantages of microservices architecture?

Microservices are an architectural style that structures an application as a collection of small, loosely coupled, independently deployable services.

Key concepts:
• Independence: Each service has a specific business function and is developed and scaled separately.
• Modularity: A large, monolithic application is broken down into smaller, manageable pieces.

Advantages of microservices:
• Agility and speed: Independent services allow faster development and deployment cycles.
• Scalability: Individual services can be scaled up or down independently based on demand.
• Resilience: The failure of one service need not bring down the entire application.
• Technology choice: Each service can use the best tool for the job without affecting others.

Disadvantages of microservices:
• Complexity: Increased overhead in managing infrastructure, communication, and monitoring.
• Testing: Testing complex distributed systems can be challenging and time-consuming.
• Debugging: Identifying and fixing issues across services can be difficult.
• Cost: Initial setup and ongoing maintenance can be more expensive than for a monolith.

QUESTION-20
When would you prefer an SQL database, and when a NoSQL database?

Use an SQL database if:
• You have structured data with well-defined relationships: your data fits neatly into tables of rows and columns, with clear connections between tables. SQL excels at querying and manipulating such data.
• Strong data consistency and ACID properties are crucial: transactions must be atomic, consistent, isolated, and durable. SQL databases adhere to ACID principles, ensuring data integrity for financial systems, regulatory applications, and similar workloads.
• Complex queries and reporting are essential: SQL's powerful query language enables intricate data analysis and report generation.

Use a NoSQL database if:
• You have unstructured or semi-structured data: your data doesn't follow a rigid structure (e.g., documents, JSON, or graphs). NoSQL offers greater flexibility in storing and managing such data.
• High scalability and performance are primary concerns: you anticipate significant data growth and need fast reads and writes. NoSQL databases often scale horizontally, adding more servers to handle increased load.
• Data consistency trade-offs are acceptable: your application may prioritize speed and availability over absolute data consistency.
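To make the atomicity part of ACID in the SQL section concrete, here is a small sketch using Python's built-in sqlite3 module. The accounts table, the transfer function, and the amounts are hypothetical. The transfer credits one account and then debits the other; when the debit violates a CHECK constraint, the whole transaction is rolled back, so the credit does not persist either.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a hypothetical in-memory accounts table; balances may never go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst atomically; return False if the transfer was rolled back."""
    try:
        # The connection's context manager commits on success and rolls back on an exception
        with conn:
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit, so the earlier credit was undone too
        return False


def balances(conn: sqlite3.Connection) -> dict[str, int]:
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = make_db()
print(transfer(conn, "alice", "bob", 30), balances(conn))   # True {'alice': 70, 'bob': 80}
print(transfer(conn, "alice", "bob", 500), balances(conn))  # False {'alice': 70, 'bob': 80}
```

The failed transfer of 500 leaves both balances exactly as they were: the database never exposes the half-finished state in which Bob was credited but Alice was not debited.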
