ITEC 2210-A
Summer 2025
Instructor Rico Hao
Topic 07
Practice
Understand how a file is stored on a hard disk
The hard disk organizes data in multiple layers:
a) Partitioning and Formatting
• When a hard disk is initialized, it is partitioned into one or more logical sections.
• Each partition is then formatted with a file system (e.g., NTFS, FAT32, EXT4).
• Formatting creates a structure that defines how files are stored, indexed, and retrieved.
b) File System Structure
Once formatted, the file system organizes data into:
1. Boot Sector – Stores information about the file system and disk structure.
2. File Allocation Table (FAT) / Master File Table (MFT) (in NTFS) – Keeps track of file locations and
metadata.
3. Data Blocks (Clusters) – The actual locations where file contents are stored.
4. Directory Structure – Organizes files into folders for easy navigation.
c) Sectors and Clusters
• Sector: The smallest storage unit, typically 512 bytes or 4 KB.
• Cluster: A group of contiguous sectors (e.g., 8 sectors forming a 4 KB cluster).
• The file system assigns clusters to files, and each file is mapped to specific sectors on the disk.
How Data is Written to a Hard Disk
• When a user saves a file, the operating system (OS) interacts with the file system to determine where the data
should be stored.
• The file system updates the MFT (NTFS) or FAT (FAT32) to record the file’s name and assigned clusters.
How Data is Read from a Hard Disk
• When a file is requested, the OS retrieves its location from the MFT or FAT.
• The file’s assigned LBA addresses are used to locate the data.
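The lookup described above can be sketched with a toy FAT-style table. This is only an illustration: the names `directory`, `fat`, `END_OF_CHAIN`, and `first_lba`, the cluster numbers, and the simplified cluster-to-LBA formula are all assumptions, not the real on-disk format.

```python
# Toy model of a FAT-style allocation table (not the real on-disk layout).
END_OF_CHAIN = -1  # hypothetical marker for "last cluster of the file"

# Directory entry: file name -> first cluster number.
directory = {"notes.txt": 100}

# Allocation table: cluster number -> next cluster of the same file.
fat = {100: 250, 250: 400, 400: END_OF_CHAIN}


def clusters_for(name):
    """Follow the chain from the file's first cluster to the end marker."""
    chain = []
    cluster = directory[name]
    while cluster != END_OF_CHAIN:
        chain.append(cluster)
        cluster = fat[cluster]
    return chain


def first_lba(cluster, sectors_per_cluster=8, data_start_lba=0):
    """Simplified mapping from a cluster number to its first sector (LBA)."""
    return data_start_lba + cluster * sectors_per_cluster


for cluster in clusters_for("notes.txt"):
    print(f"cluster {cluster} starts at LBA {first_lba(cluster)}")
```

Reading the file means walking the chain in order and reading each cluster's sectors, even when the clusters are not next to each other on the disk.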
Which format should you use?
• Windows internal drive → NTFS (FAT32 only where legacy compatibility is required)
• Linux internal drive → EXT4
• macOS internal drive → APFS
• USB/external drive (cross-platform) → exFAT
1. NTFS (New Technology File System) – Most common for Windows
o Default for Windows installations and internal hard drives.
o Supports large files (over 4GB) and partitions.
o Has security features like file permissions and encryption.
2. FAT32 (File Allocation Table 32) – For compatibility
o Works on Windows, macOS, Linux, and gaming consoles.
o Maximum file size: 4GB; maximum partition size: 2TB.
o Used for USB drives and external storage.
File Storage & Structure
• NTFS organizes data using a Master File Table (MFT), which keeps track of all files and directories.
• Each file or directory has a record in the MFT that stores metadata (name, size, location, permissions).
• Files are broken into clusters (small storage units) on the disk.
• The OS does not require clusters to be contiguous—they can be scattered across the disk (fragmentation can
occur).
Example: Storing a 12KB File on a 4KB Cluster NTFS Drive
• Cluster size: 4KB
• File size: 12KB
• Storage Process:
o The OS breaks the file into 3 clusters.
o If contiguous: File is stored in Cluster #100, #101, and #102.
o If fragmented: It might be in Cluster #100, #250, and #400.
A single cluster cannot be fragmented internally.
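The arithmetic in this example can be checked with a short sketch. The function names are illustrative, and the 13 KB case is an added assumption to show what happens when the file size is not a multiple of the cluster size.

```python
def clusters_needed(file_size_bytes, cluster_size_bytes=4096):
    """Number of whole clusters needed to hold the file (rounded up)."""
    return (file_size_bytes + cluster_size_bytes - 1) // cluster_size_bytes


def unused_bytes(file_size_bytes, cluster_size_bytes=4096):
    """Bytes left unused in the last cluster allocated to the file."""
    allocated = clusters_needed(file_size_bytes, cluster_size_bytes) * cluster_size_bytes
    return allocated - file_size_bytes


KB = 1024
print(clusters_needed(12 * KB))  # 3 clusters, as in the example above
print(clusters_needed(13 * KB))  # 4 clusters: the extra 1 KB needs a whole cluster
print(unused_bytes(13 * KB))     # 3072 bytes of the last cluster stay unused
```

Because a cluster is the smallest unit the file system hands out, a file always occupies a whole number of clusters.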
Understand Striping, Parity, different RAID levels, and how XOR works
Striping is a technique used in RAID (Redundant Array of Independent Disks) to improve performance by spreading data
across multiple disks. This allows faster read and write speeds because multiple disks work together in parallel.
Striping splits data into blocks (often called stripe units or chunks) and writes them across multiple drives in sequence; one row of blocks across all drives forms a stripe.
• Instead of writing a whole file to one disk, striping spreads the file across multiple disks.
• Since multiple disks read/write at the same time, performance significantly improves.
• Striping alone does not provide redundancy (unless combined with mirroring or parity).
RAID 0 – Pure Striping (No Redundancy)
• Data is striped across 2+ disks.
• Fastest RAID, but no fault tolerance (if one disk fails, all data is lost).
• Striping does not require contiguous storage.
• The RAID controller manages striping at a logical level, not physical.
• Data blocks can be stored anywhere on the disks as long as they follow the RAID structure.
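A minimal sketch of the logical mapping a RAID 0 layout uses, assuming blocks are handed out to disks in simple round-robin order. The function name `locate_block` and the zero-based numbering are illustrative choices.

```python
def locate_block(logical_block, num_disks):
    """Return (disk_index, stripe_row) for a logical block in a RAID 0 layout."""
    disk_index = logical_block % num_disks   # which disk gets the block
    stripe_row = logical_block // num_disks  # which row (stripe) on that disk
    return disk_index, stripe_row


for block in range(6):
    disk, row = locate_block(block, num_disks=2)
    print(f"block {block} -> disk {disk}, row {row}")
```

Consecutive blocks land on different disks, which is why they can be read or written in parallel.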
Parity is a method used in RAID to provide fault tolerance by storing extra data that can be used to rebuild lost
information if a disk fails. It helps balance redundancy and storage efficiency.
Parity is a calculated value based on the data stored on multiple disks.
• If a disk fails, the RAID controller recalculates the missing data using parity and the remaining disks.
Parity Calculation Basics
Parity works using the XOR (Exclusive OR) operation:
• XOR outputs 1 if the number of 1s in the input is odd.
• XOR outputs 0 if the number of 1s in the input is even.
A | B | A ⊕ B (XOR)
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0
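The truth table and the odd/even rule can be checked with a few lines of code. This is a small sketch; `parity_bit` is an illustrative name.

```python
from functools import reduce
from operator import xor


def parity_bit(bits):
    """XOR all bits together: 1 if the count of 1s is odd, 0 if it is even."""
    return reduce(xor, bits, 0)


# The four rows of the truth table.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, parity_bit([a, b]))

# Why parity can rebuild data: XOR-ing the parity with the surviving
# bit gives back the missing bit.
a, b = 1, 0
p = a ^ b
assert p ^ b == a  # recover a from parity and b
assert p ^ a == b  # recover b from parity and a
```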
RAID
RAID 0 is a striping-only RAID level that distributes data across multiple disks to improve performance. However, it has
no redundancy, meaning if one disk fails, all data is lost.
Example
• Two 1 TB Disks in RAID 0:
o Total Storage: 2 TB
o Improved Performance: Data reads and writes are faster because they are split across two disks.
RAID 1 uses mirroring, meaning:
• Every piece of data written to one disk is simultaneously written to another disk.
• The system can read data from either disk, improving read speed.
• If one disk fails, the system continues running from the remaining disk.
• No striping, no parity—just an exact duplicate of data.
Example
• Two 1 TB Disks in RAID 1:
o Total Usable Storage: 1 TB
o Data Redundancy: If one disk fails, the other continues to operate with all data intact.
• Every write operation happens twice, once for each disk.
• The MFT is also mirrored, so all file metadata is available on both disks.
RAID 5 combines striping and parity to balance speed, redundancy, and storage efficiency.
• Data is striped across multiple disks for faster reads.
• Parity blocks are distributed across all disks, allowing for data recovery if one disk fails.
• Minimum 3 disks required.
• If one disk fails, RAID 5 uses parity data to rebuild the lost data.
• Example: Writing a 24KB File (4KB Clusters, 3 Disks, Parity Rotating)
Disk 1 | Disk 2 | Disk 3
4KB (File Part 1) | 4KB (File Part 2) | 4KB (Parity)
4KB (File Part 3) | 4KB (Parity) | 4KB (File Part 4)
4KB (Parity) | 4KB (File Part 5) | 4KB (File Part 6)
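The rotating layout in this table can be generated with a short sketch. The rotation rule used here (parity starts on the last disk and moves one disk to the left on each row) is an assumption chosen to match the table; real controllers may rotate parity differently. The name `raid5_layout` is illustrative.

```python
def raid5_layout(num_parts, num_disks=3):
    """Place file parts and one parity block per row, rotating the parity disk."""
    rows = []
    part = 1
    row_index = 0
    while part <= num_parts:
        # Parity starts on the last disk and shifts left by one each row.
        parity_disk = (num_disks - 1 - row_index) % num_disks
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append("Parity")
            elif part <= num_parts:
                row.append(f"Part {part}")
                part += 1
            else:
                row.append("(empty)")
        rows.append(row)
        row_index += 1
    return rows


# A 24KB file in 4KB clusters has 6 parts.
for row in raid5_layout(num_parts=6, num_disks=3):
    print(" | ".join(row))
```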
Example
• Three 1 TB Disks in RAID 5:
o Total Usable Storage: 2 TB (approximately, calculated as (n-1) * size of smallest disk = (3-1) * 1 TB = 2
TB)
o Redundancy: Can tolerate the failure of one disk without data loss.
RAID 4 is similar to RAID 5, but uses a dedicated disk for parity data.
Stripe | Disk 1 (Data) | Disk 2 (Data) | Disk 3 (Parity)
Block 1 | A1 | A2 | P(A) = A1 ⊕ A2
Block 2 | B1 | B2 | P(B) = B1 ⊕ B2
Block 3 | C1 | C2 | P(C) = C1 ⊕ C2
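A minimal sketch of how P(A) = A1 ⊕ A2 lets a lost block be rebuilt. The block contents and the helper name `xor_blocks` are made up for illustration, and the helper assumes all blocks have the same length.

```python
def xor_blocks(*blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


a1 = b"DATA"  # block on Disk 1
a2 = b"RAID"  # block on Disk 2
parity = xor_blocks(a1, a2)  # P(A), stored on the parity disk

# Suppose Disk 2 fails: XOR the surviving block with the parity.
rebuilt_a2 = xor_blocks(a1, parity)
print(rebuilt_a2)  # b'RAID'
```

The same calculation works whichever single disk is lost, because XOR-ing all surviving blocks in a stripe always yields the missing one.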
RAID 6 is a striping + dual parity RAID level that provides high fault tolerance by using two parity blocks per stripe. This
allows the system to recover from up to two disk failures. However, it requires at least four disks to function.
RAID 6 builds upon RAID 5 by adding a second parity block, improving redundancy at the cost of slower writes.
• Data is striped across multiple disks for faster reads.
• Two parity blocks are stored per stripe, allowing for data recovery if two disks fail.
• Minimum 4 disks required.
• If one or two disks fail, RAID 6 can reconstruct missing data using parity.
RAID 10 (also called RAID 1+0) is a combination of RAID 1 (mirroring) and RAID 0 (striping). It provides fault tolerance
and high performance by mirroring each disk and then striping the mirrored pairs.
RAID 10 is a hybrid RAID level that combines:
• Mirroring (RAID 1) → Each disk has an identical mirrored copy.
• Striping (RAID 0) → Data is spread across multiple mirrored pairs for faster access.
Key Features:
• High redundancy: Data is mirrored (if one disk fails, its mirror takes over).
• High speed: Data is striped for fast read/write performance.
• Requires at least 4 disks.
• Can survive multiple disk failures, as long as one disk from each mirrored pair survives.
• Example: RAID 10 Setup (4KB Clusters, 4 Disks, Mirrored + Striped)
Stripe # | Disk 1 | Disk 2 (Mirror) | Disk 3 | Disk 4 (Mirror)
1 | 4KB (File Part 1) | 4KB (Copy) | 4KB (File Part 2) | 4KB (Copy)
2 | 4KB (File Part 3) | 4KB (Copy) | 4KB (File Part 4) | 4KB (Copy)
What Happens If a Disk Fails?
• RAID 10 can survive one disk failure per mirrored pair.
• If Disk 1 fails, Disk 2 still has the data.
• If Disk 3 fails, Disk 4 still has the data.
• If two disks from the same mirrored pair fail, data is lost.
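These failure rules can be expressed as a small check. The sketch assumes the four-disk layout used above (Disk 1 mirrored by Disk 2, Disk 3 mirrored by Disk 4); `MIRROR_PAIRS` and `raid10_survives` are illustrative names.

```python
# Mirrored pairs from the example: (Disk 1, Disk 2) and (Disk 3, Disk 4).
MIRROR_PAIRS = [(1, 2), (3, 4)]


def raid10_survives(failed_disks, pairs=MIRROR_PAIRS):
    """Data survives as long as no mirrored pair has lost both of its disks."""
    failed = set(failed_disks)
    return all(not (a in failed and b in failed) for a, b in pairs)


print(raid10_survives({1}))     # True: Disk 2 still has the data
print(raid10_survives({1, 3}))  # True: one disk lost from each pair
print(raid10_survives({1, 2}))  # False: both disks of one pair are gone
```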
RAID 10 (1+0, Mirroring first then Striping)
• Requires a minimum of four drives.
• Drives are paired, and each pair is mirrored.
• The mirrored pairs are then striped.
Capacity: Only 50% of the total drive capacity is usable (e.g., four 1TB drives provide 2TB of usable storage).
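The usable-capacity figures for the RAID levels covered in this topic can be compared with one small function. This is a sketch that assumes all disks are the same size; the name `usable_capacity_tb` is illustrative, and the RAID 6 rule (two disks' worth of space goes to parity) follows from its two parity blocks per stripe.

```python
MIN_DISKS = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}


def usable_capacity_tb(level, num_disks, disk_tb):
    """Usable capacity in TB for equal-sized disks at a given RAID level."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    if num_disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == 0:
        return num_disks * disk_tb        # striping only, nothing reserved
    if level == 1:
        return disk_tb                    # every disk holds the same copy
    if level == 5:
        return (num_disks - 1) * disk_tb  # one disk's worth of parity
    if level == 6:
        return (num_disks - 2) * disk_tb  # two disks' worth of parity
    if num_disks % 2:
        raise ValueError("RAID 10 needs an even number of disks")
    return (num_disks // 2) * disk_tb     # half the disks are mirrors


for level, disks in [(0, 2), (1, 2), (5, 3), (6, 4), (10, 4)]:
    print(f"RAID {level} with {disks} x 1 TB -> {usable_capacity_tb(level, disks, 1)} TB")
```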
Understand VMware HA, FT, DRS, and vMotion
VMware High Availability (HA) is a feature in vSphere that automatically restarts virtual machines (VMs) on other ESXi
hosts if a host fails. This minimizes downtime without requiring manual intervention.
VMware HA is a cluster-level feature that allows VMs to automatically recover from ESXi host failures.
Key Functions:
• Detects ESXi host failures and restarts VMs on healthy hosts.
• Works within a VMware vSphere cluster (multiple ESXi hosts).
• Uses heartbeat communication to monitor host health.
• Relies on vCenter Server to configure and manage HA.
What VMware HA Protects Against:
• ESXi Host Failures → If an ESXi host crashes, VMs restart on another host.
• Network Isolation → If a host loses network connectivity, its VMs can be restarted on another host (depending on the configured isolation response).
• VM OS Failures (Optional) → VMware HA can restart VMs if their guest OS crashes.
What VMware HA Does NOT Protect Against:
• Storage Failure → Does not protect against datastore failures (use vSAN or SRM for that).
• Application-Level Failures → If an app crashes, VMware HA won’t detect it (use application-level monitoring such as App HA for that).
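The basic HA idea (detect a host that stopped sending heartbeats, then restart its VMs on healthy hosts) can be modelled with a toy simulation. This is not VMware code or a VMware API: the function names, the host and VM names, the 15-second timeout, and the "least loaded host first" placement rule are all assumptions made for illustration.

```python
def failed_hosts(last_heartbeat, now, timeout_s=15):
    """Hosts whose last heartbeat is older than the (arbitrary) timeout."""
    return sorted(h for h, t in last_heartbeat.items() if now - t > timeout_s)


def restart_vms_after_failure(placement, failed_host, healthy_hosts):
    """Return a new VM -> host placement with the failed host's VMs restarted."""
    new_placement = dict(placement)
    # Count how many VMs each healthy host already runs.
    load = {host: 0 for host in healthy_hosts}
    for host in placement.values():
        if host in load:
            load[host] += 1
    for vm, host in sorted(placement.items()):
        if host == failed_host:
            # Pick the healthy host with the fewest VMs (ties broken by name).
            target = min(load, key=lambda h: (load[h], h))
            new_placement[vm] = target
            load[target] += 1
    return new_placement


heartbeats = {"esxi-a": 0, "esxi-b": 95, "esxi-c": 98}
placement = {"vm1": "esxi-a", "vm2": "esxi-a", "vm3": "esxi-b"}

for host in failed_hosts(heartbeats, now=100):
    healthy = [h for h in heartbeats if h != host]
    placement = restart_vms_after_failure(placement, host, healthy)

print(placement)
```

Note that the VMs are restarted, not moved while running, so there is a short outage while they boot on the new host.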
VMware Fault Tolerance (FT) is an advanced feature in vSphere that eliminates downtime by creating a live shadow
copy of a virtual machine (VM) on another ESXi host. If the primary VM's host fails, the secondary VM takes over
instantly, with zero downtime and no data loss.
VMware FT is a high-availability solution that ensures continuous availability for VMs by mirroring their execution in
real-time.
Key Functions:
• Creates a live, running duplicate (Secondary VM) of a VM on another ESXi host.
• Instantly switches to the secondary VM if the primary VM's host fails.
• No downtime or data loss.
• Keeps the primary and secondary VMs identical at the CPU and memory level (vLockstep in early FT releases; Fast Checkpointing in vSphere 6.0 and later).
What VMware FT Protects Against:
• ESXi Host Failures → If a host crashes, the secondary VM takes over instantly.
• VM-Level Failures → If the primary VM is lost, FT ensures the secondary is already running and takes over.
What VMware FT Does NOT Protect Against:
• Application Failures → If an app inside the VM crashes, FT does not help (use Application HA for that).
• Datastore Failures → If shared storage fails, FT cannot recover the VM.
VMware Distributed Resource Scheduler (DRS) is a vSphere feature that automatically balances VM workloads across
multiple ESXi hosts in a cluster. It ensures optimal performance and resource allocation by migrating VMs between
hosts based on CPU, memory, and other resource usage.
VMware DRS is a load-balancing solution for vSphere clusters. It dynamically distributes virtual machines (VMs) across
ESXi hosts to:
• Optimize performance
• Prevent resource contention
• Reduce manual intervention
Key Functions:
• Monitors CPU & memory usage on all hosts.
• Automatically migrates VMs (vMotion) to balance workloads.
• Ensures VM placement follows configured policies.
What VMware DRS Helps With:
• Host Overload Prevention → Moves VMs if a host is overloaded.
• Underutilized Host Optimization → Moves VMs off lightly loaded hosts.
• High Availability Integration → Works with HA to rebalance VMs after a failure.
• Power Management (DPM) → Powers off unused hosts to save energy.
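The load-balancing idea behind DRS can be sketched as a single rebalancing step. This is a toy model, not the actual DRS algorithm: the host and VM names, the CPU demand figures (in MHz), and the rule "move the one VM that best narrows the gap between the busiest and least busy host" are all assumptions.

```python
def rebalance_once(hosts):
    """Move one VM from the busiest host to the least busy host, if it helps.

    hosts maps host name -> {vm name: CPU demand}. Returns (vm, source, target)
    or None when no single move would narrow the gap.
    """
    load = {host: sum(vms.values()) for host, vms in hosts.items()}
    busiest = max(load, key=load.get)
    idlest = min(load, key=load.get)
    gap = load[busiest] - load[idlest]
    # Moving a VM with demand d changes the gap to |gap - 2d|,
    # so only VMs with 0 < d < gap actually improve the balance.
    candidates = {vm: d for vm, d in hosts[busiest].items() if 0 < d < gap}
    if not candidates:
        return None
    vm = min(candidates, key=lambda v: abs(gap - 2 * candidates[v]))
    hosts[idlest][vm] = hosts[busiest].pop(vm)
    return vm, busiest, idlest


cluster = {
    "esxi-a": {"vm1": 4000, "vm2": 3000, "vm3": 1000},
    "esxi-b": {"vm4": 2000},
}
print(rebalance_once(cluster))  # ('vm2', 'esxi-a', 'esxi-b')
print(cluster)
```

In a real cluster the move itself would be a vMotion migration, so the VM keeps running while it changes hosts.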
VMware vMotion is a live migration technology that moves a running virtual machine (VM) from one ESXi host to
another without downtime. It transfers the VM’s memory, CPU state, and network connections seamlessly while the
VM continues running.
vMotion is a key feature in vSphere that allows live VM migration between ESXi hosts without service interruption.
Key Functions:
• Moves running VMs between ESXi hosts.
• Keeps network connections active so users don’t notice any downtime.
• Transfers memory & CPU state to the new host.
• Works with shared storage (VMFS, NFS, vSAN, iSCSI).
• Requires a dedicated vMotion network for migration traffic.
What VMware vMotion Helps With:
• Host Maintenance → Move VMs before taking a host down.
• Load Balancing (DRS) → Move VMs to distribute CPU/memory load.
• Zero Downtime Migrations → Migrate workloads without disruption.
What VMware vMotion Does NOT Do:
• Does not migrate VM storage (Use Storage vMotion for that).
• Does not protect against host failures (Use VMware HA for that).
Network redundancy
Switch stacking
Layer 2 redundancy
EtherChannel / Port Channel
EtherChannel, also known as Link Aggregation or Port Channel, is a technology used in networking to combine multiple
physical Ethernet links into a single logical link.
How EtherChannel Works Step by Step
Step 1: Link Aggregation Setup
• Multiple Ethernet ports (e.g., GigabitEthernet1/0/1, 1/0/2) are grouped into an EtherChannel (Port Channel
interface).
• The switch treats these multiple ports as a single interface.
Step 2: EtherChannel Negotiation
• Dynamic Protocols (LACP/PAgP) can be used to form an EtherChannel.
• Static Mode can be used without protocols.
Step 3: Load Balancing Mechanism
• Traffic is distributed across links using hashing algorithms (e.g., based on MAC, IP, or Layer 4 port numbers).
• Ensures no single link gets overloaded.
Step 4: Link Failure Detection & Recovery
• If one link fails, traffic is redistributed to the remaining links.
• If all links fail, the EtherChannel interface goes down.
EtherChannel can be formed between two switches, or between a switch and a VMware host.
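Steps 3 and 4 can be sketched as a hash that maps each traffic flow onto one of the active member links. This is only an illustration: real switches use their own vendor-specific hash, and CRC32 over the source and destination MAC addresses is a stand-in chosen here. The function name, the MAC addresses, and the interface names are assumptions.

```python
import zlib


def pick_link(src_mac, dst_mac, active_links):
    """Hash the MAC pair and map it onto one of the active member links."""
    if not active_links:
        raise RuntimeError("all member links are down: port channel is down")
    key = f"{src_mac}-{dst_mac}".encode()
    return active_links[zlib.crc32(key) % len(active_links)]


links = ["Gi1/0/1", "Gi1/0/2"]
flow = ("00:11:22:33:44:55", "66:77:88:99:aa:bb")

# The same flow always hashes to the same link, so its frames stay in order.
print(pick_link(*flow, links))

# If a link fails, the flow is rehashed over the links that remain.
print(pick_link(*flow, ["Gi1/0/2"]))
```

Because the choice is made per flow, a single flow never gets more bandwidth than one member link; the aggregate benefit comes from many flows spreading across the links.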
Layer 3 redundancy
Routing protocols (such as OSPF or BGP) provide redundant paths by dynamically recalculating routes when a failure occurs. They
support high availability and optimal path selection by maintaining an up-to-date view of the network.
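Route recalculation after a failure can be illustrated with a shortest-path search, which is the idea behind OSPF's SPF calculation (BGP selects paths differently, so this sketch models the link-state case only). The four-router topology, the link costs, and the function names are made up for illustration.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm. graph maps node -> {neighbor: link cost}."""
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None  # no path left


def without_link(graph, a, b):
    """Copy of the graph with the link between a and b removed (both directions)."""
    return {
        node: {n: c for n, c in neighbors.items() if {node, n} != {a, b}}
        for node, neighbors in graph.items()
    }


network = {
    "R1": {"R2": 1, "R3": 2},
    "R2": {"R1": 1, "R4": 1},
    "R3": {"R1": 2, "R4": 2},
    "R4": {"R2": 1, "R3": 2},
}

print(shortest_path(network, "R1", "R4"))                            # (2, ['R1', 'R2', 'R4'])
print(shortest_path(without_link(network, "R2", "R4"), "R1", "R4"))  # (4, ['R1', 'R3', 'R4'])
```

When the R2 to R4 link fails, traffic shifts to the longer path through R3 without any manual change.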
Routing protocol (OSPF, BGP)
Load Balancer (Layer 4, or Layer 7) redundancy
A load balancer is a networking device or software that distributes incoming traffic across multiple servers to optimize
performance, improve reliability, and ensure high availability.
Types of Load Balancing:
Type | OSI Layer | Protocols
Layer 4 Load Balancer | Transport Layer (TCP/UDP) | TCP, UDP
Layer 7 Load Balancer | Application Layer (HTTP/HTTPS) | HTTP, HTTPS, SSL
1. Layer 4 Load Balancer (L4)
• Routes traffic based on IP and TCP/UDP ports.
• No deep packet inspection (doesn’t look inside HTTP requests).
• Faster because it only makes simple routing decisions.
Example:
• A client sends a TCP request to the load balancer.
• The load balancer forwards it to one of the backend servers.
2. Layer 7 Load Balancer (L7)
• Routes traffic based on application data (e.g., HTTP headers, cookies).
• Supports SSL termination, session persistence, and content-aware routing.
• Ideal for web applications and APIs.
Example:
• If a user requests /images, the load balancer routes it to an image server.
• If a user requests /api, it forwards the request to an API server.
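The content-aware routing in this example can be sketched as a lookup on the request path. The pool names, the default pool, and the function name `route` are hypothetical.

```python
# Path prefix -> backend pool (names are illustrative).
ROUTES = [("/images", "image-server"), ("/api", "api-server")]
DEFAULT_POOL = "web-server"


def route(path):
    """Pick a backend pool by matching the start of the HTTP request path."""
    for prefix, pool in ROUTES:
        # Match "/api" and "/api/...", but not "/apiary".
        if path == prefix or path.startswith(prefix + "/"):
            return pool
    return DEFAULT_POOL


print(route("/images/logo.png"))  # image-server
print(route("/api/users"))        # api-server
print(route("/index.html"))       # web-server
```

A Layer 4 load balancer could not make this decision, because the path is only visible after the HTTP request has been read.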
3. Load Balancing Algorithms
Load balancers use different algorithms to distribute traffic efficiently.
Algorithm | How It Works | Best For
Round Robin | Distributes requests sequentially | Equal servers
Least Connections | Sends traffic to the server with the fewest active connections | Variable workloads
IP Hash | Assigns users to a fixed server based on IP | Session persistence
Weighted Round Robin | Like round robin but gives priority to powerful servers | Mixed hardware setups
Least Response Time | Sends requests to the fastest server | Real-time applications
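Four of these algorithms can be sketched in a few lines each. These are simplified illustrations, not how any particular load balancer implements them: the server names, the weights, and the use of CRC32 for IP Hash are assumptions. Least Response Time is left out because it needs live timing measurements.

```python
import itertools
import zlib


def round_robin(servers):
    """Endless iterator that hands out servers in order."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Like round robin, but a server with weight 2 appears twice per cycle."""
    expanded = [server for server, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)


def least_connections(active):
    """Server with the fewest active connections (ties broken by name)."""
    return min(active, key=lambda s: (active[s], s))


def ip_hash(client_ip, servers):
    """The same client IP always maps to the same server."""
    return servers[zlib.crc32(client_ip.encode()) % len(servers)]


rr = round_robin(["s1", "s2", "s3"])
print([next(rr) for _ in range(4)])   # ['s1', 's2', 's3', 's1']

wrr = weighted_round_robin({"big": 2, "small": 1})
print([next(wrr) for _ in range(4)])  # ['big', 'big', 'small', 'big']

print(least_connections({"s1": 5, "s2": 2, "s3": 7}))  # s2
print(ip_hash("203.0.113.7", ["s1", "s2", "s3"]))
```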
Layer 4 load balancing
How Layer 4 Load Balancing Works:
1. Client Request:
o A client sends a request to access a service or application, typically using an IP address and a specific port
(e.g., HTTP on port 80, HTTPS on port 443).
2. Load Balancer Receives Request:
o The request first arrives at the Layer 4 load balancer.
o The load balancer examines the IP address and port number in the request.
3. Decision Making:
o The load balancer uses a predefined algorithm to decide which backend server should handle the request.
o Common algorithms include Round Robin, Least Connections, and IP Hash.
4. Forwarding Request:
o The load balancer forwards the request to the selected backend server.
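The four steps above can be sketched as a small class. This is a simplified model under stated assumptions: the class name `L4Balancer`, the backend addresses, the use of round robin for the decision step, and the connection table (which keeps every packet of one connection on the same backend) are illustrative and not taken from a specific product.

```python
class L4Balancer:
    """Toy Layer 4 load balancer: decides using only IP addresses and ports."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.next_index = 0
        # (client IP, client port, destination port) -> chosen backend
        self.connections = {}

    def forward(self, client_ip, client_port, dest_port):
        key = (client_ip, client_port, dest_port)
        if key not in self.connections:
            # Decision step: plain round robin over the backends.
            backend = self.backends[self.next_index % len(self.backends)]
            self.next_index += 1
            self.connections[key] = backend
        # Forwarding step: packets of a known connection reuse its backend.
        return self.connections[key]


lb = L4Balancer(["192.0.2.10", "192.0.2.11"])
print(lb.forward("203.0.113.7", 50000, 443))  # 192.0.2.10 (new connection)
print(lb.forward("203.0.113.7", 50000, 443))  # 192.0.2.10 (same connection)
print(lb.forward("203.0.113.8", 50001, 443))  # 192.0.2.11 (next backend)
```

Nothing in `forward` looks at the request contents, which is what separates Layer 4 from Layer 7 load balancing.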
Layer 7 load balancing
SSL Termination:
• Layer 7 load balancers often handle SSL/TLS encryption and decryption (SSL offloading).
• They terminate SSL connections from clients and establish new connections to backend servers using
unencrypted HTTP or re-encrypted HTTPS.