Data Storage in DS
Prof. Tahar Kechadi
School of Computer Science
Learning Objectives
Explain cloud-based storage solutions
Explain cloud-based databases
Benefits and limitations
Case studies
Evolution of Network Storage
File server
- Server with large disk capacity
- Sharing, replication and storage of large files
Storage-area networks (SAN)
- Storage devices connected directly to the network
Network-attached storage (NAS)
Cloud-based Data Storage
NAS
[Figure: NAS architecture. Win2k, Linux and Unix clients and generic application servers reach NAS appliances over a LAN.]
SAN Architecture
Interconnection
- Fibre Channel
- iSCSI protocol
  - Internet Small Computer Systems Interface
  - Network standard for linking data storage facilities
  - Carries SCSI commands over a TCP/IP (Ethernet) network
Hard Drives
- Logical Block Addressing (LBA)
- File systems
Hard Drive File Systems
FAT (File Allocation Table)
Cluster   Free / Next / Final   LBA
0         1                     0
1         5                     4
…
10        14                    40
11        free                  44
12        free                  48
13        free                  52
14        final                 56
…
62        free                  248
63        free                  252
Label in the figure: items.txt
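The table above can be read as a linked list: each entry either names the next cluster of a file, marks the final cluster, or marks the cluster as free. The sketch below follows such a chain using only the rows visible on the slide. The starting cluster (10) and the constant `SECTORS_PER_CLUSTER` are assumptions made for illustration; the value 4 is inferred from the LBA column, and the slide does not say which cluster `items.txt` starts at.

```python
# Rows visible on the slide; clusters hidden behind "…" are simply absent.
FAT = {
    0: 1, 1: 5,
    10: 14, 11: "free", 12: "free", 13: "free", 14: "final",
    62: "free", 63: "free",
}

# Assumption inferred from the LBA column of the table: LBA = 4 * cluster.
SECTORS_PER_CLUSTER = 4


def cluster_to_lba(cluster):
    """Return the first logical block address of a cluster."""
    return cluster * SECTORS_PER_CLUSTER


def follow_chain(fat, start):
    """Return the list of clusters that make up a file starting at `start`."""
    chain, cluster = [], start
    while True:
        if cluster in chain:
            raise ValueError(f"loop in FAT at cluster {cluster}")
        entry = fat.get(cluster)
        if entry is None:
            raise KeyError(f"cluster {cluster} is not in the visible part of the table")
        if entry == "free":
            raise ValueError(f"cluster {cluster} is free, so it cannot belong to a file")
        chain.append(cluster)
        if entry == "final":
            return chain
        cluster = entry  # the entry is the number of the next cluster


def free_clusters(fat):
    """Return the clusters that are available for allocation."""
    return sorted(c for c, entry in fat.items() if entry == "free")


chain = follow_chain(FAT, 10)              # hypothetical starting cluster
print(chain)                               # [10, 14]
print([cluster_to_lba(c) for c in chain])  # [40, 56]
```

Starting at cluster 0 instead would stop with a `KeyError` at cluster 5, because that row is hidden behind the ellipsis in the slide.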
RAIDs
RAID
- Redundant Array of Inexpensive Disks
RAID Access
- Reads and writes data on a set of disks at the same time
Reliability
- Adds parity and/or mirroring information on multiple disks of the array
Performance
- Improves the performance and/or reliability of the storage device
Configuration
- RAID 0, RAID 1, RAID 2, RAID 3, etc.
Examples of RAIDs
RAID 0
RAID 5
RAID 1
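RAID 0 stripes data across disks with no redundancy, RAID 1 mirrors each block, and RAID 5 stripes data together with parity. The parity idea can be sketched with byte-wise XOR: the parity block is the XOR of the data blocks, so any single lost block equals the XOR of the survivors. This is a minimal illustration for one stripe, not a RAID implementation. The function names are hypothetical, and the parity block is kept in the last position for simplicity, whereas RAID 5 rotates it across the disks.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    """Recompute the block at `lost_index` from all the other blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])  # three data disks, one parity block
print(rebuild(stripe, 1))                          # b'BBBB'
```

Mirroring (RAID 1) needs no computation to recover a block but doubles the raw capacity used. Parity costs one extra block per stripe, at the price of a rebuild computation.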
Advantages of SANs
Reliability
- Data striping across multiple volumes
- Reconstruction of the file content
Performance
- Less system overhead
Compatibility
- Support for common file systems
Backup
- Ease of performing backups
Cloud-Based Data Storage
Data storage resides in the cloud
Data Access
- Web browser interface
- Mounted disk drive: cloud storage appears as a local drive
- Set of API calls
Examples
- Dropbox, Google Drive, OneDrive, HomePipe, etc.
JustCloud
Unlimited Cloud Storage
Access Files Anywhere
Sync Multiple Computers
Share files
Sync Folder, Backup file
Data Security: 256-bit encryption
Mobile Apps, Tracking …
Free Account: 15 MB storage, 50 files, 14 days
Personal, Business accounts
Carbonite
Unlimited Cloud Storage
Access Files Anywhere, Sync Multiple Computers
Share files, Sync Folder, Backup file
Data Security: 128-bit encryption
Free Account: 15 days
Personal, Pro, Server
Cloud-Based Data Storage Advantages
Scalability
- Scale storage capacity (up or down)
Pay as you use
Reliability
- Transparent data replication
Ease of access
- Support for web-based access
Ease of use
- Remote file storage area appears as a logical drive
Cloud-Based Data Storage Disadvantages
Performance
- Data is accessed over the Internet
Security
- How safe is data stored in the cloud?
- Encrypt the files (e.g., BoxCryptor)
Data orphans
- Data abandoned in cloud storage facilities puts confidential information at risk
Cloud-based Backup Systems
Data backup
- Stored in an encrypted format
Scheduling
- When backup operations are to occur
Retrieving
- Retrieving backup files easily
Multi-platform support
Industry-Specific: Example
Different data storage and access requirements
Healthcare Industry
- Secure electronic medical records
Example: MS HealthVault
- Store medical records, prescriptions, measurements
- Share with GPs, healthcare personnel, family members
- Set an expiration date
Understanding File Systems
OS File systems (FS)
- Handle storage and retrieval of files on a local disk
- File operations: copy, delete, create, move, …
Network File Systems (NFS)
- Handle files residing on devices across the network
Cloud File Systems (CFS)
- Handle files residing in the cloud
NFS
Network File System
Google File System (GFS)
A scalable distributed file system for large, distributed, data-intensive applications
Large, distributed, highly fault-tolerant file system
The largest GFS clusters (figures from the 2003 GFS paper) have:
- 1000+ storage nodes
- 300+ TB of disk storage
- Heavy access by hundreds of clients on distinct machines
GFS Architecture
A cluster consists of a single master and multiple chunk-servers, and is accessed by multiple clients
GFS Master
Maintains all file system metadata
- Namespace, access control information, file-to-chunk mappings, chunk locations (including replicas), etc.
Periodically communicates with chunk-servers through HeartBeat messages to give instructions and check state
Read/write: the client contacts the master to get chunk locations, then deals directly with the chunk-servers
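The read path described above can be sketched as two steps: a metadata lookup at the master, then a data read at a chunk-server. This is a toy in-memory model, not the GFS API. The class and function names, the choice of the first replica, and the restriction to reads inside a single chunk are assumptions made to keep the example short. The 64 MB default is the chunk size given later in these slides.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size given later in these slides


class Master:
    """Holds metadata only: (file name, chunk index) -> (chunk handle, replica ids)."""

    def __init__(self):
        self.chunk_table = {}

    def lookup(self, filename, chunk_index):
        return self.chunk_table[(filename, chunk_index)]


class ChunkServer:
    """Holds chunk data, keyed by chunk handle."""

    def __init__(self):
        self.chunks = {}

    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]


def client_read(master, servers, filename, offset, length, chunk_size=CHUNK_SIZE):
    """Read `length` bytes at `offset`; the range must lie inside one chunk."""
    chunk_index = offset // chunk_size                       # which chunk holds the offset
    handle, replicas = master.lookup(filename, chunk_index)  # metadata from the master
    server = servers[replicas[0]]                            # simplification: first replica
    return server.read(handle, offset % chunk_size, length)  # data from the chunk-server


# Tiny demonstration with a 4-byte chunk size so the data stays readable.
master, server = Master(), ChunkServer()
master.chunk_table[("log", 0)] = (1, ["cs-a"])
master.chunk_table[("log", 1)] = (2, ["cs-a"])
server.chunks[1], server.chunks[2] = b"abcd", b"efgh"
print(client_read(master, {"cs-a": server}, "log", 5, 2, chunk_size=4))  # b'fg'
```

The master never touches file data in this flow, which is why a single master can serve many clients.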
GFS Chunk-server
Files are broken into chunks. Each chunk has an immutable, globally unique 64-bit chunk handle
- The handle is assigned by the master at chunk creation
Chunk size is 64 MB
Each chunk is replicated on 3 servers (by default)
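A short worked calculation shows what these two numbers mean for a file. The sketch assumes "64 MB" means 64 × 2^20 bytes, and the helper names are hypothetical.

```python
CHUNK_SIZE = 64 * 2**20   # assumption: 64 MB read as 64 * 2^20 bytes
REPLICAS = 3              # default replication factor from the slide


def chunk_count(file_size):
    """Number of chunks needed for a file of `file_size` bytes (ceiling division)."""
    return -(-file_size // CHUNK_SIZE)


def chunk_index(offset):
    """Index of the chunk that contains byte `offset` of a file."""
    return offset // CHUNK_SIZE


def replica_count(file_size):
    """Total number of chunk replicas stored across the cluster for one file."""
    return chunk_count(file_size) * REPLICAS


one_gib = 2**30
print(chunk_count(one_gib))        # 16
print(replica_count(one_gib))      # 48
print(chunk_index(200 * 2**20))    # 3
```

A 1 GiB file therefore occupies 16 chunks, stored as 48 chunk replicas, and byte offset 200 × 2^20 falls in the fourth chunk (index 3).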
GFS Client
Linked into applications through the file system API
Communicates with the master and chunk-servers for reading and writing
- Master interactions only for metadata
- Chunk-server interactions for data
Caches only metadata
- Data is too large to cache
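Caching metadata means that repeated accesses to the same chunk do not go back to the master. A minimal sketch of such a client-side cache follows. The class name, the `lookup` callable standing in for a request to the master, and the explicit `invalidate` method are all hypothetical, and entry expiry is left out.

```python
class MetadataCache:
    """Caches chunk locations on the client; file data is never stored here."""

    def __init__(self, lookup):
        self._lookup = lookup   # callable: (filename, chunk_index) -> (handle, replicas)
        self._entries = {}
        self.master_calls = 0   # counts how often the master was contacted

    def locate(self, filename, chunk_index):
        key = (filename, chunk_index)
        if key not in self._entries:
            self.master_calls += 1
            self._entries[key] = self._lookup(filename, chunk_index)
        return self._entries[key]

    def invalidate(self, filename, chunk_index):
        """Forget an entry, for example after a read from a stale replica fails."""
        self._entries.pop((filename, chunk_index), None)


def fake_master(filename, chunk_index):
    # Stand-in for a request to the master, with invented values.
    return (chunk_index + 100, ["cs-a", "cs-b", "cs-c"])


cache = MetadataCache(fake_master)
cache.locate("log", 0)
cache.locate("log", 0)      # served from the cache
print(cache.master_calls)   # 1
```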
GFS Chunk Location
Master
- Does not keep a persistent record of the locations of chunks and replicas
Chunk-servers
- The master polls chunk-servers at startup and whenever a chunk-server joins or leaves
HeartBeat messages
- The master stays up to date by controlling the placement of new chunks and by monitoring chunk-servers through HeartBeat messages
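Because chunk locations are not persisted, the master's location map is rebuilt from what the chunk-servers report. The sketch below models that map as an in-memory dictionary. The class and method names are hypothetical, and a single `report` call stands in for both the startup poll and later HeartBeat updates.

```python
from collections import defaultdict


class ChunkLocationTable:
    """In-memory map: chunk handle -> set of chunk-server ids. Nothing is written to disk."""

    def __init__(self):
        self.locations = defaultdict(set)

    def report(self, server_id, handles):
        """Record the full list of chunks a server says it holds."""
        self.server_left(server_id)          # drop whatever was known before
        for handle in handles:
            self.locations[handle].add(server_id)

    def server_left(self, server_id):
        """Remove a server that left the cluster or stopped responding."""
        for servers in self.locations.values():
            servers.discard(server_id)

    def replicas(self, handle):
        return sorted(self.locations.get(handle, ()))


table = ChunkLocationTable()
table.report("cs-a", [1, 2])
table.report("cs-b", [1])
print(table.replicas(1))   # ['cs-a', 'cs-b']
table.server_left("cs-a")
print(table.replicas(1))   # ['cs-b']
```

Treating the chunk-servers as the source of truth avoids keeping the master and the servers in sync across restarts and failures.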
Cloud-based Databases
Databases
- Used by applications residing in the cloud
- Used by applications residing within the customer's data centre
Cloud-Based Databases Advantages
Cost-effective database scalability
- Scale dynamically
- Pay-as-you-go
High availability
- Databases reside on redundant hardware
High data redundancy
- The database is replicated
Reduced administration
- The provider maintains database updates and patches
Cloud-Based Databases Disadvantages
Data security concerns
- …
Performance
- Data queries travel through the Internet
Cloud-Based Block Storage
Block of data storage
- Fixed-size sequence of bits
- Block size corresponds to an underlying unit of storage
- Suited to applications with very large blocks of data
Cloud-based block storage device
- Amazon EBS (Elastic Block Store)
Volume size up to a terabyte
Reliable, scalable
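Block storage exposes numbered fixed-size blocks rather than files, so a byte offset has to be translated into a block number and a position inside that block. The sketch below shows that translation on a toy in-memory device. `BlockDevice` and `read_range` are hypothetical names, not part of any cloud provider's API.

```python
class BlockDevice:
    """A device made of `block_count` blocks, each exactly `block_size` bytes."""

    def __init__(self, block_size, block_count):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(block_count)]

    def write_block(self, index, data):
        if len(data) != self.block_size:
            raise ValueError("a block must be written whole")
        self.blocks[index] = bytes(data)

    def read_block(self, index):
        return self.blocks[index]


def read_range(device, offset, length):
    """Read `length` bytes starting at byte `offset`, crossing block boundaries as needed."""
    out = bytearray()
    end = offset + length
    while offset < end:
        index, within = divmod(offset, device.block_size)   # block number, position inside it
        take = min(device.block_size - within, end - offset)
        out += device.read_block(index)[within:within + take]
        offset += take
    return bytes(out)


dev = BlockDevice(block_size=4, block_count=3)
for i, data in enumerate([b"abcd", b"efgh", b"ijkl"]):
    dev.write_block(i, data)
print(read_range(dev, 2, 7))   # b'cdefghi'
```

A file system or database layered on top of such a device decides what the blocks mean. The device itself only reads and writes whole blocks.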
Thank you