Storage Virtualization

The document provides an overview of data storage systems, focusing on storage virtualization and the I/O layers within both single hosts and virtual machines. It discusses various storage types, including SAN, NAS, and different disk image formats, along with techniques for efficient VM creation and data deduplication methods. Key concepts such as Copy-on-Write and the differences between pre-allocated and extensible disk images are also highlighted.

Uploaded by

rupalivaje99
CS-452/552 Introduction to

Cloud Computing

Storage Virtualization

1
Data Storage Systems

Data can be stored in various places and in different ways:

--- Hardware: CPU registers, caches, main memory, and persistent storage
--- Software: file systems, object storage, and databases (SQL and NoSQL)

2
Storage I/O system within a single host
Persistent storage media: flash, and HDD (disk drive)

3
I/O layers within a single host

User space:
    Applications
System Call Interfaces
Kernel space:
    VFS
    File System (ext3, ext4, btrfs)
    Page Cache
    Generic Block Layer
    I/O schedulers
    Block device driver
Physical Block Devices

4


I/O layers within a VM
VM1:
    Applications
    System Call Interfaces
    VFS
    File System (ext3, ext4, btrfs)
    Page Cache
    Generic Block Layer
    I/O schedulers
    Block device driver
Virtual Block Device


5
I/O Rings
I/O layers in Virtualization
VM1 (guest):
    Applications
    System Call Interfaces
    VFS
    File System (ext3, ext4, btrfs)
    Page Cache
    Generic Block Layer
    I/O schedulers
    Block device driver
QEMU Device
Hypervisor/Host (KVM Module):
    System Call Interfaces
    VFS
    File System (ext3, ext4, btrfs)
    Page Cache
    Generic Block Layer
    I/O schedulers
    Block device driver
7
I/O layers in Virtualization
Dual I/O stack – Not good.
Each guest I/O request passes through the full storage stack inside VM1 (VFS, file system, page cache, generic block layer, I/O scheduler, block device driver), then through the QEMU device, and then through the same layers again in the hypervisor/host.
8
I/O layers in Virtualization
Dual Page Cache – Not good
The guest (VM1) and the hypervisor/host each keep their own page cache, so the same data can end up cached twice, once at each level.
9
I/O Data Plane Redundancy

[Figure: VM1 APP and VM2 APP, each with an I/O frontend in the guest and a VM I/O backend in its VMM; reads and writes between the two VMMs go through the VM2 virtual disk.]

Multiple data copying steps for data communication between two VMs.
Not good!

10
Virtualize Storage Device

▪ Virtual disk is stored as
  ▪ a file in the host file system
  ▪ or a partition on a physical disk
▪ Operations on the block device are emulated by QEMU
▪ Guest issues block reads & writes
▪ QEMU converts them to file operations on the virtual disk file

[Figure: Guest OS with guest disk device driver → virtual disk → device emulation layer (maps disk operations to file operations on a virtual image file) → host file system → physical disk device driver → physical (real) disk]
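
The mapping from block operations to file operations can be sketched in a few lines of Python. This is only an illustration of the idea, not how QEMU is implemented: the class name `FileBackedDisk`, the 512-byte sector size, and the `demo` helper are all assumptions made for this example.

```python
import os
import tempfile

SECTOR_SIZE = 512  # assumed sector size for this sketch


class FileBackedDisk:
    """Toy virtual disk: guest sector I/O becomes seek/read/write on a host file."""

    def __init__(self, path, num_sectors):
        self.num_sectors = num_sectors
        # Create the image file at its full size (zero-filled).
        with open(path, "wb") as f:
            f.truncate(num_sectors * SECTOR_SIZE)
        self._file = open(path, "r+b")

    def _check(self, lba):
        if not 0 <= lba < self.num_sectors:
            raise IndexError("sector out of range")

    def write_sector(self, lba, data):
        self._check(lba)
        if len(data) != SECTOR_SIZE:
            raise ValueError("data must be exactly one sector")
        # Block write -> file seek + write at byte offset lba * SECTOR_SIZE.
        self._file.seek(lba * SECTOR_SIZE)
        self._file.write(data)

    def read_sector(self, lba):
        self._check(lba)
        # Block read -> file seek + read at the same byte offset.
        self._file.seek(lba * SECTOR_SIZE)
        return self._file.read(SECTOR_SIZE)

    def close(self):
        self._file.close()


def demo():
    """Write one sector, read it back, and report the image file size."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "disk.img")
        disk = FileBackedDisk(path, num_sectors=16)
        disk.write_sector(3, b"\xab" * SECTOR_SIZE)
        data = disk.read_sector(3)
        untouched = disk.read_sector(0)
        disk.close()
        size = os.path.getsize(path)
    return data, untouched, size
```

The guest only ever sees sector numbers; the byte offset in the host file is an implementation detail of the emulation layer.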
Virtual Disk Image Type Matters!

▪ A “pre-allocated” disk image (1 virtual to 1 physical block)
▪ A 10 GB disk image reserves 10 GB of disk space (allocated at creation time), regardless of whether the guest uses 1 GB or 10 GB
▪ An “extensible” disk image, useful for growing on demand
▪ The VM sees a full-size disk, but the hypervisor is actually lying to the VM: it allocates the disk blocks on the HOST side only on demand
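
A minimal sketch of the "extensible" idea, assuming a simple in-memory mapping table: the guest addresses a large virtual disk, while host-side space is consumed only for blocks that have actually been written. The class name `ExtensibleImage` and the 4 KiB block size are hypothetical choices for this example and do not describe any particular image format.

```python
BLOCK_SIZE = 4096  # assumed block size for this sketch


class ExtensibleImage:
    """Toy grow-on-demand image: host space is allocated on first write."""

    def __init__(self, virtual_blocks):
        self.virtual_blocks = virtual_blocks
        self.table = {}             # virtual block number -> slot in self.storage
        self.storage = bytearray()  # stands in for the host-side image file

    def _check(self, vbn):
        if not 0 <= vbn < self.virtual_blocks:
            raise IndexError("virtual block out of range")

    def write_block(self, vbn, data):
        self._check(vbn)
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        if vbn not in self.table:
            # First write to this block: grow the host-side storage by one block.
            self.table[vbn] = len(self.storage) // BLOCK_SIZE
            self.storage.extend(bytes(BLOCK_SIZE))
        start = self.table[vbn] * BLOCK_SIZE
        self.storage[start:start + BLOCK_SIZE] = data

    def read_block(self, vbn):
        self._check(vbn)
        if vbn not in self.table:
            # Never written: the guest still sees a full disk of zeros.
            return bytes(BLOCK_SIZE)
        start = self.table[vbn] * BLOCK_SIZE
        return bytes(self.storage[start:start + BLOCK_SIZE])

    def virtual_size(self):
        return self.virtual_blocks * BLOCK_SIZE

    def allocated_size(self):
        return len(self.storage)
```

Note that virtual block 500 may land in host slot 0 if it happens to be written first. The extra table lookup and this out-of-order placement are small-scale versions of the overhead and fragmentation costs of extensible images.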

12
Disk images - pros / cons
• A “pre-allocated” disk image
▪ Pros: Fast
▪ Cons: Uses all space
• An extensible disk image
▪ Pros: Less space
▪ Cons: Some overhead, fragmentation
▪ It depends on what we are trying to achieve: system design tradeoff

13
VM Creation and Virtual Disk Images
• Assume that each virtual machine (VM) needs a disk image. If we are only going
to create a single VM, it’s easy:
• Create VM
• (1) create disk image
• (2) attach ISO image (installation) to start VM
• (3) install operating system
• (4) Done!

• What if we want to install 2 VMs? We could probably just run the installation a second time.
• What about when we have to build 5? 40? And do this very often (e.g., cloud service vendors)?
• How do you increase the efficiency of such VM creation?
14
Two Concrete Techniques
• Raw disks (“pre-allocated”)
▪ Byte-for-byte disk image, byte 0 = byte 0 of the disk

• QEMU-KVM’s “QCOW2” (QEMU Copy On Write, v2) format (extensible)
▪ Grow-on-demand
▪ Compression support
▪ Encryption support
▪ Copy-on-write!

15
What is Copy-on-Write?
• Traditionally (e.g., raw disks):
▪ When programs inside the guest VM write to the virtual disk, the changes
are written to the disk image in place.

• Copy-on-write:
▪ Write only the delta (the changed blocks) and store it somewhere else (don’t modify the original copy)
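
A minimal sketch of copy-on-write, assuming block-granularity deltas kept in a per-VM overlay on top of a shared, read-only base image. The names `BaseImage` and `CowOverlay` are hypothetical, and the sketch does not reproduce the QCOW2 on-disk format. It also hints at one way to make creating many VMs cheaper: install the operating system once into the base image, then give each new VM its own initially empty overlay.

```python
BLOCK_SIZE = 4096  # assumed block size for this sketch


class BaseImage:
    """Read-only base image (e.g., a freshly installed OS) shared by many VMs."""

    def __init__(self, blocks):
        self._blocks = dict(blocks)  # block number -> bytes

    def read_block(self, n):
        # Blocks that were never written read back as zeros.
        return self._blocks.get(n, bytes(BLOCK_SIZE))


class CowOverlay:
    """Per-VM overlay: writes go to the delta, the base is never modified."""

    def __init__(self, base):
        self.base = base
        self.delta = {}  # block number -> bytes written by this VM

    def write_block(self, n, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        # Store the change in the overlay instead of updating the base in place.
        self.delta[n] = bytes(data)

    def read_block(self, n):
        # Prefer this VM's own copy; otherwise fall through to the shared base.
        if n in self.delta:
            return self.delta[n]
        return self.base.read_block(n)


# Usage: two VMs share one base image; only VM1 diverges from it.
base = BaseImage({0: b"A" * BLOCK_SIZE})
vm1 = CowOverlay(base)
vm2 = CowOverlay(base)
vm1.write_block(0, b"B" * BLOCK_SIZE)
```

After the write, VM1 reads its own version of block 0, while VM2 and the base image still see the original contents, and VM2's overlay still occupies no space.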

16
Cloud Block Storage System
[Figure: racks of compute servers (client hosts, Rack1, Rack2, ...) running VMs/containers (VM1 to VM4) connect through a transmission layer (networking) to racks of storage servers (storage server hosts, ..., Rack11, Rack12); volumes (VOL) appear to the VMs as block devices.]
Storage Area Network (SAN)
• A network which provides access to consolidated, block-level data storage.
• Looks and feels like a local block device
• But unlike a local hard drive or SSD, the “server” has to access storage over the network
• Access control (LUN masking)
  • needed to restrict which server can access which storage device
• Accessing storage over the network uses a lot of network bandwidth
• Usually runs on a dedicated/isolated network for best performance and least interference.
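
The access-control idea behind LUN masking can be sketched as a lookup table from server (initiator) to the set of LUNs it is allowed to see. This is a conceptual illustration only: the class `LunMaskingTable` and the server names are hypothetical, and real SAN arrays configure masking through their own management interfaces.

```python
class LunMaskingTable:
    """Toy LUN masking: each initiator only sees the LUNs granted to it."""

    def __init__(self):
        self._allowed = {}  # initiator id -> set of LUN numbers

    def grant(self, initiator, lun):
        self._allowed.setdefault(initiator, set()).add(lun)

    def revoke(self, initiator, lun):
        self._allowed.get(initiator, set()).discard(lun)

    def visible_luns(self, initiator):
        # An unknown initiator sees no storage at all.
        return sorted(self._allowed.get(initiator, set()))

    def check_access(self, initiator, lun):
        return lun in self._allowed.get(initiator, set())


# Usage with hypothetical server names.
masking = LunMaskingTable()
masking.grant("server-a", 0)
masking.grant("server-a", 1)
masking.grant("server-b", 2)
```

With this table, `server-a` can reach LUNs 0 and 1 but not LUN 2, and a server that was never granted anything sees an empty list.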

18
Fibre Channel (FC)
• Specialized high-speed SAN interconnect
• 2/4/8/16 Gbps data rates (32 and 64 Gbps generations are also available)

• Can use both optical fiber and copper

• Storage devices and servers are connected to a FC switch


• Server (initiator) needs a FC interface
• Storage (target) is connected via traditional SCSI, SAS, or SATA
interfaces.

• For the end user, these look like locally connected drives.
19
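Fibre Channel over Ethernet (FCoE)
• Use FC over an Ethernet network.
• No separate FC network required; FC traffic shares the Ethernet infrastructure
• FC frames are carried directly in Ethernet frames (layer 2), so traffic stays within a single Ethernet broadcast domain and carries no IP routing information.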

20
iSCSI (Internet Small Computer Systems Interface)
▪ iSCSI is a Storage Area Network (SAN) protocol that allows SCSI commands to be transmitted over a TCP/IP network
▪ Similar to FC, iSCSI allows I/O devices to be shared over the network using SCSI commands.
• Reuses the existing Ethernet/IP network by encapsulating SCSI commands in TCP/IP packets, so no FC connection is required.
▪ iSCSI maintains the SCSI notion of an Initiator and a Target device
▪ Originally developed by IBM and Cisco; now an RFC standard (RFC 3720, later consolidated in RFC 7143)

[Figure: Client Host (iSCSI Initiator) <-- TCP/IP connections over an IP network --> Storage Server Host (iSCSI Target)]
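
The encapsulation idea can be sketched as follows. The 10-byte READ(10) command descriptor block (CDB) follows the standard SCSI field order, but the framing in `encapsulate` is a deliberately simplified stand-in invented for this example: it is NOT the real iSCSI PDU layout, and all function names here are hypothetical. In a real deployment the initiator would send the PDU over a TCP connection to the target.

```python
import struct

READ_10 = 0x28  # SCSI READ(10) operation code


def build_read10_cdb(lba, num_blocks):
    """Build a 10-byte SCSI READ(10) command descriptor block (big-endian)."""
    # Fields: opcode, flags, 32-bit LBA, group number, 16-bit length, control.
    return struct.pack(">BBIBHB", READ_10, 0, lba, 0, num_blocks, 0)


def parse_read10_cdb(cdb):
    """Target side: recover the LBA and block count from a READ(10) CDB."""
    opcode, _flags, lba, _group, num_blocks, _control = struct.unpack(">BBIBHB", cdb)
    if opcode != READ_10:
        raise ValueError("not a READ(10) command")
    return lba, num_blocks


def encapsulate(cdb, lun):
    """Toy framing (assumption, not real iSCSI): 1-byte LUN, 1-byte length, CDB."""
    return struct.pack(">BB", lun, len(cdb)) + cdb


def decapsulate(payload):
    """Toy framing inverse: return (lun, cdb) from the bytes received."""
    lun, length = struct.unpack(">BB", payload[:2])
    return lun, payload[2:2 + length]


# Initiator side: wrap a "read 8 blocks starting at LBA 2048" command for LUN 1.
payload = encapsulate(build_read10_cdb(2048, 8), lun=1)

# Target side: unwrap the bytes and interpret the SCSI command.
lun, cdb = decapsulate(payload)
lba, count = parse_read10_cdb(cdb)
```

The point of the sketch is only that the SCSI command travels as ordinary bytes inside a network payload, so the initiator and target roles are preserved while the transport is plain TCP/IP.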

21
Network-attached storage (NAS)
• File-level (versus block-level) storage server accessed over a computer network.
• Networked appliances that contain one or more storage drives
• NAS provides both storage and a file system.
• SAN provides only block-device access.
• NAS = file server, SAN = disk over network
• Provides access to files using network file sharing protocols such as NFS, SMB, or AFP.

22
Data Deduplication
▪ Duplicate data is eliminated, leaving only one copy of the data to be stored.

▪ Compare new data block to existing data blocks.


▪ If contents of new block are unique then store it in the disk.
▪ But if it is a duplicate of existing blocks then don’t store again but create a
reference.

▪ Only one unique instance of the data is retained on storage media (e.g., disk).
Redundant data is replaced with a pointer to the unique data copy.
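
The compare-then-reference steps above can be sketched with a hash-indexed block store. This is a minimal illustration under stated assumptions: the class name `DedupStore` is hypothetical, SHA-256 is chosen here as the fingerprint, and two blocks with the same fingerprint are treated as identical without a byte-for-byte comparison.

```python
import hashlib


class DedupStore:
    """Toy block store that keeps one copy of each unique block."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block contents (stored once)
        self.refs = []    # logical block index -> fingerprint (the "pointer")

    def write(self, data):
        """Append a logical block; return True if it had to be stored."""
        fingerprint = hashlib.sha256(data).hexdigest()
        is_new = fingerprint not in self.blocks
        if is_new:
            # Unique content: store it.
            self.blocks[fingerprint] = bytes(data)
        # Duplicate or not, the logical block is just a reference.
        self.refs.append(fingerprint)
        return is_new

    def read(self, index):
        return self.blocks[self.refs[index]]

    def logical_bytes(self):
        return sum(len(self.blocks[fp]) for fp in self.refs)

    def physical_bytes(self):
        return sum(len(b) for b in self.blocks.values())


# Usage: three logical blocks, two of them identical.
store = DedupStore()
first = store.write(b"hello" * 100)
second = store.write(b"world" * 100)
third = store.write(b"hello" * 100)
```

Here the third write finds an existing fingerprint, so only a reference is added: three logical blocks are readable, but only two are physically stored.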

23
Deduplication Methods
▪ In-line deduplication:
▪ Hashes are calculated in real time, as the data is written.
▪ If the target device identifies a block that has already been stored, it simply references the existing block.

▪ Pros: Inline deduplication significantly reduces the raw disk capacity needed in the system, since the full, not-yet-deduplicated data set is never written to disk.

▪ Cons: However, “because hash calculations and lookups takes so long, data writes can be slower thereby reducing the backup throughput of the device.”

▪ What is off-line deduplication?

24
References
• SAN: https://en.wikipedia.org/wiki/Storage_area_network
• NAS: https://en.wikipedia.org/wiki/Network-attached_storage

25
