PDB 2 Session Persistence

The document discusses data persistence in hard disk drives, highlighting their characteristics, advantages, and limitations. It covers topics such as hard disk organization, I/O timings, and the evolution of hard disk capacity over time. Additionally, it addresses the importance of disk cache and the requirements for different types of data storage applications.


Practical DBMS
Prof. Joseph G. Vella ©
Dept. of Computer Information Systems
JOSEPH.G.VELLA@UM.EDU.MT

Data Persistence
Devices, Aggregate Devices, and Techniques


The Scope of Things – A Simple Start


Hard disks


The hard facts

• Hard disk drives offer data persistence.

• Hard disks are:
  – Inexpensive (per unit capacity);
  – But fragile.

• Hard disks offer direct and sequential access, but:
  – Have very (very) slow access speeds;
  – Have a limited bandwidth (a maximum transfer rate exists);
  – Organisation is block based (e.g., 4 KB).

• Hard disks and [OS] file systems have developed concurrently.


Hard Disks and File Systems


[Diagram: rendered files ↔ logical reading & writing ↔ organisation ↔ padding & unpadding ↔ physical files]


Hard Disk Capacity Over Time


Unit Data Space Cost on Hard Disks

YEAR  MANUFACTURER                          COST/GB
1956  IBM                                   $1,000,000
1980  North Star                            $193,000
1981  Morrow Designs                        $138,000
1983  Davong                                $119,000
1984  Pegasus (Great Lakes)                 $80,000
1985  First Class Peripherals               $71,000
1987  Iomega                                $45,000
1988  IBM                                   $16,000
1990  First Class Peripherals               $12,000
1991  Western Digital                       $9,000
1992  Iomega                                $7,000
1994  Iomega                                $2,000
1995  Seagate                               $850
1996  Maxtor                                $259
1997  Maxtor                                $93
1998  Quantum                               $43
1999  Fujitsu IDE                           $16
2000  Maxtor 7200rpm UDMA/66                $9.58
2001  Maxtor 5400 rpm IDE                   $4.57
2002  Western Digital 7200 rpm              $2.68
2003  Maxtor 7200 rpm IDE                   $1.39
2004  Western Digital Caviar SE             $1.15
2011  WD Caviar Green (3 TB for $140)       $0.05


Forbes – Sales of HDD
(by quarter for Seagate, WD, Toshiba; Average Sales Price)


Forbes – 2020 Projection of Drive Delivery
(by Application)


The main parts


Same device – but more parts shown!


Head Stack Assembly for Rotary Actuator


Rotary Actuator with top magnet removed to view the voice coil.


High Level Components of a disk drive


Disk Space Organisation (i):

Not all sectors, tracks, and platters are available for storage!?

• Each sector contains some overhead – like management and ECC data.
• The rotations per minute (RPM) are impressive: from 4500 RPM to 18000 RPM.


Disk Space Organisation (i): continued

Not all space is available for storage!? I.e. whole tracks and platters are sometimes reserved.


Disk Space Organisation (ii):

The outer tracks have more sectors – the circumference is longer! This is Multi Zoning.


Disk Space Organisation (iii):


• A sector is the smallest unit of data that
can be read or written from a disk.
• A cluster is the smallest unit of data that
a file system can allocate for a file. Each
cluster has a fixed size that is always a
multiple of the sector size.
– A file is stored optimally on disk as a
series of contiguous clusters (clusters
that are in order on disk).
– When a file is split into multiple clusters on different areas of the disk, this is called external fragmentation.
• A track is a concentric ring of sectors on a
platter. A read/write head can read all the
data from a certain track by moving to a
position and then rotating the platter.
• A cylinder is a group of tracks in all the
platters that are on top of each other.


Hard Disk Characteristic Measures:


seek, rotational delay (latency), and transfer time

1. Move head (seek time);
2. Wait for sector (rotational delay);
3. Move the sector’s data to RAM (transfer time).

Important: Rotational latency is on average half the time of a complete turn.

Hard Disk I/O Timings

• Time of I/O (abbreviated as TIO) is expressed in terms of three characteristic measures:

  T_IO = T_seek + T_rotation + T_transfer

• The rate of I/O (abbreviated as RIO) is expressed as the size of the data blocks transferred over the time it takes:

  R_IO = Size_DataBlocks / T_IO
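A minimal sketch of these two measures in Python (the drive parameters below are illustrative assumptions, not taken from any particular product):

```python
# Sketch of T_IO and R_IO; all parameters are illustrative assumptions.
def t_io(seek_s, rpm, transfer_bps, size_bytes):
    t_rotation = 0.5 * 60.0 / rpm          # on average half a revolution
    t_transfer = size_bytes / transfer_bps
    return seek_s + t_rotation + t_transfer   # seconds

def r_io(size_bytes, t_io_s):
    return size_bytes / t_io_s              # bytes per second

# Example: a 4 KB read on a hypothetical 7,200 RPM drive with a 9 ms
# average seek and a 105 MB/s maximum transfer rate.
t = t_io(seek_s=0.009, rpm=7200, transfer_bps=105e6, size_bytes=4096)
print(f"T_IO = {t * 1000:.1f} ms, R_IO = {r_io(4096, t) / 1e6:.2f} MB/s")
```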


Disk Space Organisation (iv):


• A sector is the smallest unit of read/write storage. A sector is typically
512 bytes in size.
– A cluster is usually 4096 bytes.
– Most drives offer only sector level atomicity.
• Torn writes possible in case of cluster writes.

• A track is an important unit of storage as it holds all the sectors, on a disk platter’s surface, that can be read without moving the actuator.
– There can be thousands of tracks on a surface.

• A cylinder is an important unit of storage too:
– It is the total storage accessible for reading and writing without moving the actuators!
• Therefore only one seek time is required (i.e. to get to the required
cylinder)
• There are as many cylinders as tracks!


Hard Disks – expected self-evident facts

• It is expected that:
  the time for accessing two blocks in succession is shorter for two neighbouring blocks than for distant blocks.

• One can also assume (within certain limits!?) that:
  accessing blocks in a contiguous stream in a sequential read is much faster than reading the blocks one by one in direct mode.


Helping Sequential Reads (i)


[Diagram: sectors numbered 0–11 around a track]

• Track skew – the angular offset between tracks should be just greater than the track-to-track seek time required.
  – Sequential scans that cross over cylinders then avoid rotational delay.


Helping Sequential Reads (ii)


[Diagram: sectors 0–11 laid out consecutively on one track vs. interleaved around another]

• Interleaving – the jump should be just greater than the per-sector transfer time required.
  – This has become rare in the presence of track buffering!


Helping Hard Disks – Disk Cache


• All modern drives come with an ‘on-board’ memory cache (RAM).
  – It is sometimes called a buffer, or even a track buffer.
  – It is diminutive (from 16 MB) compared to a drive’s persistent capacity.

• It holds disk blocks read from the drive and blocks to be written to it.

• Other than holding a queue of blocks, the disk drive might read a cluster of sectors into it, rather than a single sector, to anticipate forthcoming requests.
  – What about writing back a cluster (i.e. a sector at a time) – can the cache help?
    • It depends on what is needed:
      If the writer requires confirmation of the write (write through) then ‘Not much’.
      If the writer does not require write notice (write back / immediate reporting) then ‘Yes’.
    – (If un-aided, this cluster write is in peril on drive failure.)


Average disk seek time approaches 1/3 of full seek time.

        to:  0  1  2  3  4  5  6  7  8
from 0       0  1  2  3  4  5  6  7  8
     1       1  0  1  2  3  4  5  6  7
     2       2  1  0  1  2  3  4  5  6
     3       3  2  1  0  1  2  3  4  5
     4       4  3  2  1  0  1  2  3  4
     5       5  4  3  2  1  0  1  2  3
     6       6  5  4  3  2  1  0  1  2
     7       7  6  5  4  3  2  1  0  1
     8       8  7  6  5  4  3  2  1  0

Average: 2.963

This is a model! In reality seek time is not linearly proportional to the inter-track distance!?
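The 2.963 figure, and the 1/3 limit, can be checked in a few lines of Python; a sketch assuming the linear cost model above (seek cost equals the track distance):

```python
# Average seek distance over all (from, to) track pairs, linear cost model.
def avg_seek_distance(n_tracks):
    total = sum(abs(i - j) for i in range(n_tracks) for j in range(n_tracks))
    return total / (n_tracks ** 2)

print(avg_seek_distance(9))                 # 2.963 for the 9-track table above
print(avg_seek_distance(10_000) / 9_999)    # tends towards 1/3 of a full seek
```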

Average disk seek time is 1/3 of full seek time.


Hard Disk Characteristics:


Sequential vs Direct / Random access

What drive operations are required to support and execute these?


Generic Disk Requirements

• Data Server:
– High RPM
– Lowest seek times
– High transfer bandwidth
• Personal Computers
– Capacity & lowest cost per unit storage
• Laptops
– Sturdy
– Lowest power consumption (low RPM, few platters, etc)
• Home entertainment
– Low mechanical noise!


How to read Hard Disk numbers

• Disk parameters, for example one could read:


– Transfer size is 8K bytes
– Advertised average seek time is 12 ms
– Disk spins at 7200 RPM
– Transfer rate is 4 MB/sec
– Disk cache available
• Controller overhead is 2 ms
• Assume that disk is idle so no queuing delay
• What is average disk access time for a sector?
– Avg seek + avg rotational delay + transfer time + controller overhead
– 12 ms + 0.5/(7200 RPM/60) + 8 KB/4 MB/s + 2 ms
– 12 + 4.15 + 2 + 2 = 20 ms
• Advertised seek time assumes no locality: with locality the actual seek is typically 1/4 to 1/3 of the advertised seek time: 12 ms → 3–4 ms
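As a cross-check, the same calculation in Python (the numbers are the ones quoted in the bullets above):

```python
# Average access time for one 8 KB transfer, using the figures above.
seek_ms       = 12.0                                   # advertised average seek
rotational_ms = 0.5 * 60.0 / 7200 * 1000               # half a turn at 7200 RPM
transfer_ms   = (8 * 1024) / (4 * 1024 * 1024) * 1000  # 8 KB at 4 MB/s
controller_ms = 2.0

print(seek_ms + rotational_ms + transfer_ms + controller_ms)   # ~20 ms
```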


A sample of disk drives!


Seagate Drives               Cheetah 15k.5 (data server)   Barracuda (PC)
Capacity (GB)                300                           1,000
RPM                          15,000                        7,200
Average Seek (ms)            4                             9
Max. Transfer Rate (MB/s)    125                           105
Platters                     4                             4
Cache (MB)                   16                            16/32
Interface                    SCSI                          SATA


Direct Access Workload:


Request small reads from anywhere on disk (e.g. 4K)

T_IO = T_seek + T_rotation + T_transfer          R_IO = Size_DataBlocks / T_IO

• Cheetah Drive
  ✓ Tseek = 4 ms {use average seek}
  ✓ Trot = (1/2)*(1/15000)*60 s = (1/2)*(1/250)*1000 ms = 2 ms
  ✓ Ttran = size / max transfer = 4096 / (125 * 10^6) s = 32 micro s
  ❑ TIO = 6 ms
  ✓ RIO = 4096 / .006
  ❑ RIO = 0.68 MB/s

• Barracuda Drive
  ✓ Tseek = 9 ms {use average seek}
  ✓ Trot = (1/2)*(1/7200)*60 s = (1/2)*(1/120)*1000 ms = 4.1 ms
  ✓ Ttran = size / max transfer = 4096 / (105 * 10^6) s = 39 micro s
  ❑ TIO = 13.1 ms
  ✓ RIO = 4096 / .0131
  ❑ RIO = 0.31 MB/s

Sequential Access Workload:


Request a long sequence of contiguous blocks (32 MB)

T_IO = T_seek + T_rotation + T_transfer          R_IO = Size_DataBlocks / T_IO

We do a seek, wait for the right sector and then have a long transfer.

• Cheetah Drive
  ✓ Tseek = 4 ms {use average seek}
  ✓ Trot = (1/2)*(1/15000)*60 s = (1/2)*(1/250)*1000 ms = 2 ms
  ✓ Ttran = size / max transfer = 33554432 / (125 * 10^6) s = .27 s
  ❑ TIO = .27 s
  ✓ RIO = 33554432 / .27
  ❑ RIO = 124 MB/s

• Barracuda Drive
  ✓ Tseek = 9 ms {use average seek}
  ✓ Trot = (1/2)*(1/7200)*60 s = (1/2)*(1/120)*1000 ms = 4.1 ms
  ✓ Ttran = size / max transfer = 33554432 / (105 * 10^6) s = .32 s
  ❑ TIO = .32 s
  ✓ RIO = 33554432 / .32
  ❑ RIO = 104 MB/s

Comparison of Direct & Sequential Examples

Transfer Rate per job         Cheetah    Barracuda
Direct (1 block of 4k)        0.68       0.31       MB/s
Sequential (32MB)             124.00     104.00     MB/s

Total Time per job            Cheetah    Barracuda
Direct (1 block of 4k)        6.0        13.1       ms
Sequential (32MB)             270.0      320.0      ms

Cheetah was designed as a high performance SCSI disk drive.
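Both tables can be regenerated from the drive figures quoted earlier (average seek, RPM, maximum transfer rate); a sketch in Python, assuming those figures:

```python
# Recompute T_IO and R_IO for the two workloads on both drives.
# Drive tuples: (average seek in s, RPM, max transfer rate in bytes/s).
DRIVES = {"Cheetah": (0.004, 15_000, 125e6), "Barracuda": (0.009, 7_200, 105e6)}
JOBS = {"direct (4 KB)": 4096, "sequential (32 MB)": 32 * 2**20}

for name, (seek, rpm, rate) in DRIVES.items():
    for job, size in JOBS.items():
        t = seek + 0.5 * 60.0 / rpm + size / rate          # T_IO in seconds
        print(f"{name:9s} {job:18s} T_IO = {t * 1000:7.1f} ms, "
              f"R_IO = {size / t / 1e6:6.2f} MB/s")
```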



Progress with Hard Disks:


but an apparent paradox appears
• Compare:
– The rate of growth in capacity (over time);
With the
– The rate of progress in seek (average) (over time).
• Continued advance in capacity (60%/yr) and bandwidth
(40%/yr.)
• Slow improvement in seek, rotation (8%/yr)

• Time to read whole disk


Year Sequentially Directly
1990 4 minutes 6 hours
2000 12 minutes 1 week

• It is apparent that the rate of growth in capacity is much faster than the improvement in (average) seek time:
  – Effectively we have more capacity with relatively slower access!?


Have we reached a performance plateau?


• Physics is universal and has limits, and if these are reached then the
limits are real.
• We have hinted that cache, at different levels (e.g., h/d, controller,
operating system, DB engine) can offer some interesting advantages
• But there are some other exciting possibilities with clever engineering
of hard disk drives. For example:
– Double and independent actuators
  (two read/write heads on each platter);
– Two actuators is a very difficult engineering problem because the heads need to be aligned together!

“Seagate lists the sustained, sequential transfer rate of the Mach.2 as up to 524MBps—easily double that of a fast "normal" rust disk and edging into SATA SSD territory. The performance gains extend into random I/O territory as well, with 304 IOPS read / 384 IOPS write and only 4.16 ms average latency. (Normal hard drives tend to be 100/150 IOPS and about the same average latency.)”


Empirical analysis on multiple servers (heads)


effect on rotational delay (rd)

Given two (or three) servers, i.e. heads:
• Try multiple accesses
– For each head generate a random
number (0< rd <=1)
– Calculate the minimum of:
• Of the first two
• Of the three
– Repeat it x times (e.g., 128)
– Aggregate the stats
• First, second, third server
– Average the rd
» should be close to .5
– Minimum & maximum should
cover the range 0 to 1
• First two, three
– Average the min rd
» Should be close to .35 for 2
» Should be close to .27 for 3
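A minimal Monte Carlo sketch of this experiment in Python (the 128-sample count and the uniform model for rd come from the slide; the exact expectations are 1/2, 1/3 and 1/4):

```python
import random

# Rotational delay (rd) seen when 1, 2 or 3 heads can serve the request;
# each head sees an independent uniform rd and the earliest one wins.
def experiment(samples=128, seed=0):
    rng = random.Random(seed)
    draws = [[rng.random() for _ in range(3)] for _ in range(samples)]
    avg = lambda xs: sum(xs) / len(xs)
    print("one head        :", avg([d[0] for d in draws]))        # ~0.5
    print("min of two heads:", avg([min(d[:2]) for d in draws]))  # ~1/3
    print("min of three    :", avg([min(d) for d in draws]))      # ~1/4

experiment()
```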


Hard Disk Interfaces


Model of a disk drive attached to a host system


ATA & Serial ATA Configuration Controllers


SCSI & Serial Attached SCSI (SAS) Configuration Controllers


Journey of a Byte
write(textfile, ch, 1); -- ch is assigned ‘P’
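The write() above only hands the byte to the runtime; a sketch of the journey in Python on a POSIX-like system (the file name is hypothetical), showing where the byte sits at each stage:

```python
import os

# One byte's journey: application buffer -> OS page cache -> disk drive.
with open("textfile", "wb", buffering=8192) as f:
    f.write(b"P")          # 1. 'P' lands in the application/runtime buffer
    f.flush()              # 2. buffer handed to the OS (kernel page cache)
    os.fsync(f.fileno())   # 3. OS asked to push the block down to the drive
    # 4. the drive's own cache may still hold it before the platter does
```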


HD, OS and FS Interaction: And Raw Device Access

[Diagram: layered stack – User Application Interface → DBMS → OS & FS → H/W (e.g. HD); the DBMS has direct access to the HD, by-passing the OS & FS.]

Its origin is from Unix, i.e. the Raw (Block) Device. Raw device access is available in Linux too!


(External) Fragmentation
• External Fragmentation happens!?
• The file’s spanning might not be
contiguous.
• How does it affect:
– Sequential reads
• It interrupts flow by
introducing a seek access …
– Direct Access
• Superficially none.
• BUT!
– It can break the locality
advantage …
– Allocation
• Although space is available
it’s not contiguous –
consequently increasing the
problem.


I/O Systems transfer mode


(Internal) Fragmentation

• Any data file needs to be spanned into a list of disk blocks (i.e.
sectors).
– Data files can't share any sector!
• Consider a data file of 6K (i.e. 6,000) bytes.
– The following table shows sector space utilisation.
Sectors in Cluster       1      2      4      8      16     32     64
Bytes in Cluster         512    1024   2048   4096   8192   16384  32768
Clusters needed          12     6      3      2      1      1      1
File size on disk        6144   6144   6144   8192   8192   16384  32768
Disk space utilisation   98%    98%    98%    73%    73%    37%    18%
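The table can be regenerated directly; a sketch in Python assuming the same 6,000-byte file and 512-byte sectors:

```python
import math

# Internal fragmentation of a 6,000-byte file for various cluster sizes.
FILE_BYTES, SECTOR = 6000, 512
for sectors in (1, 2, 4, 8, 16, 32, 64):
    cluster = sectors * SECTOR
    clusters = math.ceil(FILE_BYTES / cluster)
    on_disk = clusters * cluster
    print(f"{sectors:3d} sectors/cluster: {clusters:2d} clusters, "
          f"{on_disk:6d} bytes on disk, {FILE_BYTES / on_disk:4.0%} utilised")
```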


Hard Disk Internal & External Fragmentation

• There is an interesting study from Leffler et al 1989


(BSD Unix fame) about File System block size,
internal fragmentation, and access speed.
– These statistics are dated but most of the insights are
still valid.

Also, of interest is a preceding paper by:


Marshall K. McKusick, William N. Joy, Samuel J. Leffler, and Robert S. Fabry
A Fast File System for UNIX
ACM Trans. on Computer Systems 2(3), August 1984, pp. 181-197.


Issue of Which (i.e., to read and write)


• The where issue has to do with data placement into blocks … and issues of external fragmentation.
• The how issue has to do with data packing into blocks … and issues of internal fragmentation.
• We are still to discuss the ordering of disk bound operations:
  – E.g., should we prefer a read from an outer sector rather than a read from an inner sector?
  – E.g., should we write before we read?
• The fact that an OS takes care of this is not the point:
  1. The DBMS still needs a model for read & write sequencing;
  2. If a DBMS has raw (and direct) access to a hard disk then surely the OS is out of the picture (it does not interfere with the DBMS planning).
• A common (and naïve) approach is Shortest Seek Time First (see the sketch below).
  – It considers the actuator’s current location, orders the requests, and the shortest is executed.
  • Risk of Starvation …
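A minimal sketch of Shortest Seek Time First in Python (the cylinder numbers in the example are made up):

```python
# Shortest Seek Time First: always serve the pending request closest to
# the actuator's current cylinder.
def sstf(current, pending):
    order, pending = [], list(pending)
    while pending:
        nxt = min(pending, key=lambda cyl: abs(cyl - current))
        pending.remove(nxt)
        order.append(nxt)
        current = nxt
    return order

# Requests near the head keep jumping the queue, so distant requests
# (e.g. cylinder 180) can starve if nearby ones keep arriving.
print(sstf(50, [10, 95, 52, 47, 180, 55]))   # [52, 55, 47, 10, 95, 180]
```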

Aggregate Systems



Requirements

• HOW TO MAKE A LARGER, FASTER, & MORE RELIABLE DISK?


– New techniques, and
– Some trade-offs required.

• One technique is to use multiple disks in an aggregate to build a


faster, bigger, and more reliable disk system.
– These are called RAID units.

• To the F/S the RAID unit is a big, fast, and reliable (single) disk.
– When a F/S issues a logical I/O request to the RAID, it internally
must calculate which disks to access in order to complete the
request, and then issue one or more physical I/Os to do so.


Reliability:
Remember Disks Fail Badly!?

• A typical number for SCSI drives is 1.2 million hours!

• Seagate quote a Barracuda 7200.7 model with a 600,000-hour MTBF.
  – i.e., half the population / farm should fail in the first 600K hours.
  – Assuming:
    • The unit is in its first year of use;
    • Test results are in terms of arbitrary usage.

• What affects a unit’s reliability? Possibly:


– Number of Platters;
– Seek usage pattern (called duty cycle by engineers);
– Temperature; and
– Power consumption.


Some Definitions
• Disk Shadowing
– Making two or more copies of data written to a disk drive.

• Disk Duplexing
– A method of storing data whereby the data from one hard disk
is duplicated onto another, with each using its own hard disk
controller.
• In contrast to Disk Mirroring

• Disk Mirroring
– A method of storing data whereby the data from one hard disk
is duplicated on another, with both hard disks sharing a single
hard disk controller
• In contrast to Disk Duplexing


Conventional Computer System:
Note: Every component represents a point of failure.


Multi Controllers and Disk Mirroring reduce the likelihood of total disk failure.
In this case the system can survive the failure of any single disk or single disk controller.


Dual-ported disks with multiple controllers further enhance fault tolerance.
In this configuration the failure of a disk and a controller is tolerated.


A fault-tolerant system in which all components are replicated


A Reliability Indicator

• Mean Time Between Failures (MTBF) is measured by averaging the timespans a unit is continuously functional (the time between successive unplanned, non self-manageable down times).

  MTBF = Σ (Time of failure − Time restarted) / number of failures

• Manufacturers claim that drive population failures over time follow this graph.


Google’s Study of HD fleets (2005-2006):


100k ATA consumer drives with 5400-7200 RPM and 80 to 400 GB capacity

• Annualised failure rates by age groups over nine months:
  That failure rate correlates with drive model and age is confirmed in this study too.

• Annualised failure rates by age group / seek usage pattern over nine months:
  The data extracted is a surprise as it shows that intensive usage does not play a role for 1- to 4-year-old drives.

Google’s Study of HD fleets (continued)

• Although temperature is often stated as a primary cause of faults, Google’s study suggests there is more to it:
  – Failures do not correlate simply with an increase in temperature.
  – Rather, failures correlate with low temperatures.
    • Failure rates only pick up again at the higher temperatures.


Individual HDs fail; right!?


But what’s in a farm then?

• Say we have a device that is stated to have a MTBF of 12 years.


– Therefore, a single such device has a 50% chance to be alive by
12 years.
• Say we have two such devices.
– Therefore, two devices being alive by 12 years is 25% chance.
• Say we have three such devices.
– Therefore, the three devices being alive by 12 years is 12.5%
chance.

• Therefore, the more devices present in a single system, the less likely it is that all devices are still working by their MTBF.
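Under the slide's simplified model (each drive independently has a 50% chance of still being alive at its MTBF), the chance that a whole farm is still intact shrinks as 0.5^N; a sketch:

```python
# Probability that every drive in a farm of n drives is still alive at the
# MTBF, under the simplified model above (each survives with p = 0.5).
def all_alive(n_drives, p_single=0.5):
    return p_single ** n_drives

for n in (1, 2, 3, 10):
    print(f"{n:2d} drives: {all_alive(n):.4f}")   # 0.5000, 0.2500, 0.1250, 0.0010
```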


RAID ("Redundant Array of Inexpensive Disks")

• An aggregated unit is built from a number of simpler (e.g. cheaper) disks, CPUs, & RAM.
  – It offers larger, faster, and more reliable set-ups.
• There are a number of structural layouts:
  – RAID 1 to 5 being the original set-ups proposed by Patterson, Gibson & Katz in 1987.
  – These structures differ in terms of data placement and the type of redundancy built in.
  – Other structures are built by mixing and matching from the original designs – e.g., RAID 10 combines RAID 1 mirroring with RAID 0 striping, and RAID 6 extends RAID 5 with a second parity block.
  • Also, industry has changed the I to “independent”!
• Both ‘software’ and ‘hardware’ solutions are available.
• Many operating systems directly (and transparently) support these units too.

A RAID Evaluation is based on the following:

• Capacity
  – Given an aggregate of N drives, what portion of these is usable for storage?
    • For full redundancy we have N/2.
• Reliability
– What type of faults can a system withstand?
– How many faults can a system tolerate?
• Performance
– Different workloads are expected to have different measures.


RAID0 (Not a RAID really!?)


• Data is divided into blocks (or chunks) and
then these are written across an array.
– This is called "striping".
• Requires at minimum two drives.
• No recovery feature is provided in case of a
disk failure – i.e. no redundancy present.
– As more disks are added, the higher the
possibility that a disk failure occurs.
• This process enables high level performance
as parallel access to the data on different
disks improves speed of retrieval.
• Typical application: e.g. in graphic image
processing.


Chunk Sizes

• Chunk size and performance are somewhat related.

• The smaller the chunk, the more pieces a file requires and the more it is spread over the disks;
  – Positive: parallelism for reads and writes.
  – Negative: a seek time per drive is involved.
• The larger the chunk, the fewer pieces are required and the less the spread over the disks;
  – Positive: fewer seeks.
  – Negative: less parallelism.

• Observation: it’s hard to work out an ideal chunk size; one way to address the problem is by building a workload profile.
  – A workload is a mix of direct and sequential writes.
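The spread of a file over the disks as a function of chunk size can be made concrete with a RAID 0 style mapping; a sketch (chunk size, disk count and offsets are illustrative):

```python
# RAID 0 striping: map a logical byte offset to (disk index, offset on disk).
def stripe_map(logical_offset, chunk_size, n_disks):
    chunk_no = logical_offset // chunk_size
    disk = chunk_no % n_disks
    disk_offset = (chunk_no // n_disks) * chunk_size + logical_offset % chunk_size
    return disk, disk_offset

# With 64 KB chunks over 4 disks, the first four chunks land on four disks.
for chunk in range(4):
    offset = chunk * 64 * 1024
    print(offset, "->", stripe_map(offset, 64 * 1024, 4))
```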


RAID1
• This level uses "mirroring" to copy data onto two disk drives simultaneously.
  – Possibly twice the Read transaction rate of single disks, same Write transaction rate as single disks.
• RAID 1 provides failure tolerance.
  – If one disk fails, the other maintains the data.
• Re-establishing RAID 1 requires a straight copy from the live drive to the new drive.
• Storage cost doubles, as duplicating all data means only half the total disk capacity is available for storage.

• Application: high availability requirement.


RAID0+1
• RAID 0+1 is implemented as a
mirrored array whose segments
are RAID 0 arrays
• Use two set-ups in one array.
– Both data duplication and
improved access speed are
possible.
• High I/O rates are achieved
thanks to multiple stripe
segments.
• Four drives are the minimum.

• A single drive failure will cause the


whole set-up to become a RAID 0:
– Expensive;
– High storage overhead.

• Application: File server


RAID5
• Uses a technique that avoids the
concentration of I/O on a dedicated parity
disk by writing it separately across multiple
disks.
– Three drives are required as a minimum.
• “Write penalty” still occurs as existing data
must be pre-read before update and parity
data has to be updated after the data is
written.
• RAID 5 enables multiple write orders to be
implemented concurrently because updated
parity data is dispersed across the multiple
disks.
• Highest Read data transaction rate.
• Medium Write data transaction rate.
• Difficult to rebuild once a unit faults, when compared to RAID 1.
• Widely applicable: database and web servers.
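The parity idea behind RAID 5 is plain XOR: the parity chunk is the XOR of the data chunks in a stripe, so any single lost chunk can be rebuilt from the survivors. A minimal sketch (the chunk contents are made up):

```python
# RAID 5 parity: parity = XOR of all data chunks in a stripe; any one missing
# chunk is reconstructed by XOR-ing the remaining chunks with the parity.
def xor_chunks(chunks):
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)

stripe = [b"data-A__", b"data-B__", b"data-C__"]
parity = xor_chunks(stripe)
rebuilt = xor_chunks([stripe[0], stripe[2], parity])   # chunk B is lost ...
assert rebuilt == stripe[1]                            # ... and recovered
```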


RAID10
• Not exactly like RAID 0+1, as in RAID 10 first we mirror and then stripe.
• RAID 10 is implemented as a striped array
whose segments are RAID 1 arrays.
– Requires a minimum of four drives.
• RAID 10 has the same fault tolerance as
RAID level 1
• RAID 10 has the same overhead for fault-
tolerance as mirroring alone

• High I/O rates are achieved by striping RAID


1 segments
• Under certain circumstances, RAID 10 array
can sustain multiple simultaneous drive
failures
• Excellent solution for sites that would have
otherwise gone with RAID 1 but need some
additional performance boost

RAID
Capacity, Reliability & Performance Comparison


Persistent Memory
The wedge between the CPU stores
and persistent storage


Non-volatile random-access memory (NVRAM)

https://www.snia.org/education/what-is-persistent-memory


What is NVRAM?

• NVRAM is a type of Random Access Memory (RAM) that retains its data
even when the main power is not available.
– Read access latency is quoted at: 100ns to 1000ns.
• Types of NVRAM:
– Uses SRAM that is made non-volatile by connecting it to an additional
power source, e.g., battery.
– Uses EEPROM (Electrically Erasable Programmable Read-Only Memory)
to save its data when power is not available. NVRAM has a combination
of SRAM and EEPROM semiconductors incorporated into one chip.
• Advantages
  – NVRAMs support high-speed data read/write operations for parallel processing and DBMS caching.
  – NVRAM can act as an in-unit cache for HDDs and SSDs.
  – NVRAM semiconductors are light on power consumption, so backup power exhaustion is unlikely to happen for a long time.
• Disadvantages
  – The write-to-read speed ratio is an issue (for performance).
  – Still very iffy, production wise.
  – Chips fail!


Read at your leisure!
An Example Package
• IP-NVRAM-1M Greenspring
Non-Volatile Memory Industry Pack Module
• Features:
• Single-Wide IndustryPack
• IndustryPack Wait State
• Lithium Battery
• The Greenspring IP-NVRAM Non-Volatile Memory IndustryPack
Module provides a convenient and reliable way to implement
non-volatile memory up to one megabyte in a single-high
IndustryPack.
Eight TSOP 128K x 8-bit low power SRAM chips and a lithium
battery provide 10 years (at room temperature) of maintenance-
free operation.
The IP-NVRAM powers-up ready to go. No software initialization
is required.
Four memory configurations were manufactured: 256 KB, 512 KB,
768 KB, and 1 MB. The one megabyte configuration and the 256
KB configuration are standard.
Access to the IP-NVRAM occurs in the IndustryPack memory
space.
A unique ID PROM identifies the IP-NVRAM. Users or systems
integrators may add information to the ID PROMs to indicate
user-specific information.
• https://www.artisantg.com/TestMeasurement/59561-1/Abaco-Systems-SBS-Greenspring-IP-NVRAM-1M-Non-Volatile-Memory-IndustryPack-Module


Basic Idea of Performance Measures with NVRAM and Main Memory Databases
• The paper by (Hoya, 2019) used
an industry Main Memory
Database and applied a well-
known benchmarking suite on
various set-ups. E.g.;
– HDD (the baseline);
– NVMe-SSD;
– NVRAM Log Buffer;
– NVRAM Data Access and Log
buffer.
• Note the transaction throughput
results:
– NVMe-SSD is 11 times better
than the HDD
– NVRAM Log buffer is >100
times better than the HDD
• Observation:
– Look at the NVRAM Log
Buffer setup throughput, and
note it decreases with the
higher number of threads.
What could be the reason?
K. Hoya, K. Hatsuda, K. Tsuchida, Y. Watanabe, Y. Shirota and T. Kanai, "A
perspective on NVRAM technology for future computing system," 2019
International Symposium on VLSI Design, Automation and Test (VLSI-DAT),
Hsinchu, Taiwan, 2019, pp. 1-2

SAN & NAS



Storage area network - SAN

• Computers and remote storage cabinets.


– Connection: SCSI over fiber-optic.
– Mounting of device allows storage to appear local.

• A SAN is a dedicated network that provides access to consolidated, block-level data storage. SANs are primarily used to make storage devices, such as disk arrays, tape libraries, and optical jukeboxes, accessible to servers so that the devices appear like locally attached devices to the operating system.




Network-attached storage - NAS

• Computer and remote storage.
  – Connection through NFS/CIFS [Common Internet FS] with TCP/IP.
  – Software: logon and direct access facilities.

• NAS is file-level computer data storage connected to a computer network, providing data access to a heterogeneous group of clients.
• NAS systems are networked appliances which contain one or more
hard drives, often arranged into logical, redundant storage
containers or RAID.


