Lecture 1 2
Database System Concepts - 6th Edition 10.2 ©Silberschatz, Korth and Sudarshan
Classification of Physical Storage Media
Physical Storage Media
● Cache – fastest and most costly form of storage; volatile; managed by the
computer system hardware.
● Main memory:
● fast access (10s to 100s of nanoseconds; 1 nanosecond = 10^-9
seconds)
● generally too small (or too expensive) to store the entire database
▸ capacities of up to a few gigabytes widely used currently
▸ capacities have gone up and per-byte costs have decreased
steadily and rapidly (roughly a factor of 2 every 2 to 3 years)
● Volatile — contents of main memory are usually lost if a power
failure or system crash occurs.
Physical Storage Media (Cont.)
● Flash memory
● Data survives power failure
● Data can be written at a location only once, but location can be erased
and written to again
▸ Can support only a limited number (10K – 1M) of write/erase
cycles.
▸ Erasing of memory has to be done to an entire bank of memory
● Reads are roughly as fast as main memory
● But writes are slow (few microseconds), erase is slower
● Widely used in embedded devices such as digital cameras, phones, and
USB keys
Physical Storage Media (Cont.)
● Magnetic-disk
● Data is stored on spinning disk, and read/written magnetically
● Primary medium for the long-term storage of data; typically stores entire
database.
● Data must be moved from disk to main memory for access, and written back for
storage
▸ Much slower access than main memory (more on this later)
● direct-access – possible to read data on disk in any order, unlike magnetic tape
● Capacities range up to roughly 1.5 TB as of 2009
▸ Much larger capacity and much lower cost per byte than main memory or flash memory
▸ Growing constantly and rapidly with technology improvements (factor of 2 to
3 every 2 years)
● Survives power failures and system crashes
▸ disk failure can destroy data, but is rare
Physical Storage Media (Cont.)
● Optical storage
● non-volatile, data is read optically from a spinning disk using a
laser
● CD-ROM (640 MB) and DVD (4.7 to 17 GB) most popular forms
● Blu-ray disks: 27 GB to 54 GB
● Write-once, read-many (WORM) optical disks used for archival
storage (CD-R, DVD-R, DVD+R)
● Multiple write versions also available (CD-RW, DVD-RW,
DVD+RW, and DVD-RAM)
● Reads and writes are slower than with magnetic disk
● Juke-box systems, with large numbers of removable disks, a few
drives, and a mechanism for automatic loading/unloading of disks
available for storing large volumes of data
Physical Storage Media (Cont.)
● Tape storage
● non-volatile, used primarily for backup (to recover from disk
failure), and for archival data
● sequential-access – much slower than disk
● very high capacity (40 to 300 GB tapes available)
● tape can be removed from drive ⇒ storage costs much cheaper than
disk, but drives are expensive
● Tape jukeboxes available for storing massive amounts of data
▸ hundreds of terabytes (1 terabyte = 10^12 bytes) to even multiple
petabytes (1 petabyte = 10^15 bytes)
Storage Hierarchy
Storage Hierarchy (Cont.)
Performance Measures of Disks
● Access time – the time it takes from when a read or write request is issued to when
data transfer begins. Consists of:
● Seek time – time it takes to reposition the arm over the correct track.
▸ Average seek time is 1/2 the worst case seek time.
– Would be 1/3 if all tracks had the same number of sectors, and we ignore
the time to start and stop arm movement
▸ 4 to 10 milliseconds on typical disks
● Rotational latency – time it takes for the sector to be accessed to appear under
the head.
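▸ Average latency is 1/2 of the worst case latency.
▸ A full rotation (the worst case) takes 4 to 11 milliseconds on typical disks
(15000 down to 5400 r.p.m.), so the average latency is roughly 2 to 5.5 milliseconds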
● Data-transfer rate – the rate at which data can be retrieved from or stored to the disk.
● 25 to 100 MB per second max rate, lower for inner tracks
● Multiple disks may share a controller, so rate that controller can handle is also
important
▸ E.g. SATA: 150 MB/sec, SATA-II 3Gb (300 MB/sec)
▸ Ultra 320 SCSI: 320 MB/s, SAS (3 to 6 Gb/sec)
▸ Fiber Channel (FC2Gb or 4Gb): 256 to 512 MB/s
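The relationship between rotation speed, rotational latency, and access time can be checked with a short calculation. This is a minimal sketch, not part of the original slides: the function names are hypothetical, and it assumes the worst-case rotational latency equals one full rotation and that access time is simply average seek time plus average rotational latency.

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full rotation in milliseconds (worst-case rotational latency)."""
    return 60_000.0 / rpm


def average_rotational_latency_ms(rpm: float) -> float:
    """Average latency is half of the worst case, i.e. half a rotation."""
    return rotation_time_ms(rpm) / 2.0


def average_access_time_ms(avg_seek_ms: float, rpm: float) -> float:
    """Access time = seek time + rotational latency (both averages)."""
    return avg_seek_ms + average_rotational_latency_ms(rpm)


if __name__ == "__main__":
    for rpm in (5400, 15000):
        print(
            f"{rpm} r.p.m.: full rotation {rotation_time_ms(rpm):.2f} ms, "
            f"average latency {average_rotational_latency_ms(rpm):.2f} ms"
        )
```

For example, `average_access_time_ms(4.0, 15000)` combines a 4 ms average seek with half of a 4 ms rotation.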
Performance Measures (Cont.)
● Mean time to failure (MTTF) – the average time the disk is expected to
run continuously without any failure.
● Typically 3 to 5 years
● Probability of failure of new disks is quite low, corresponding to a
“theoretical MTTF” of 500,000 to 1,200,000 hours for a new disk
▸ E.g., an MTTF of 1,200,000 hours for a new disk means that given
1000 relatively new disks, on average one will fail every 1200
hours
● MTTF decreases as disk ages
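The arithmetic behind the 1000-disk example can be written out explicitly. This is a rough sketch under the assumption that disks fail independently and at a constant rate (which the last bullet says stops holding as disks age); the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def hours_between_failures(mttf_hours: float, num_disks: int) -> float:
    """Expected time between failures somewhere in a group of identical disks.

    Assumes independent failures at a constant rate, so the group fails
    num_disks times as often as a single disk.
    """
    if num_disks <= 0:
        raise ValueError("num_disks must be positive")
    return mttf_hours / num_disks


def hours_to_years(hours: float) -> float:
    """Convert hours to years, ignoring leap years."""
    return hours / HOURS_PER_YEAR


if __name__ == "__main__":
    mttf = 1_200_000  # theoretical MTTF of a new disk, in hours
    print(f"single disk: about {hours_to_years(mttf):.0f} years")
    print(f"1000 disks: one failure every {hours_between_failures(mttf, 1000):.0f} hours")
```

The per-disk figure of over a century is why it is called a "theoretical" MTTF: it describes the failure rate of new disks, not how long any one disk will last.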
Optimization of Disk-Block Access
● Block – a contiguous sequence of sectors from a single track
● data is transferred between disk and main memory in blocks
● sizes range from 512 bytes to several kilobytes
▸ Smaller blocks: more transfers from disk
▸ Larger blocks: more space wasted due to partially filled blocks
▸ Typical block sizes today range from 4 to 16 kilobytes
● Disk-arm-scheduling algorithms order pending accesses to tracks so that
disk arm movement is minimized
● elevator algorithm:
[Figure: elevator algorithm example with pending requests R6, R3, R1, R5, R2, R4]
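A minimal sketch of one common formulation of the elevator algorithm, assuming the arm keeps moving in its current direction while requests remain ahead of it and then reverses. The track numbers, the starting head position, and the function names are hypothetical and chosen only for illustration.

```python
def elevator_order(pending_tracks, head_track, moving_up=True):
    """Order pending track requests as a single elevator sweep would serve them.

    The arm first serves every request in its current direction of travel,
    then reverses and serves the remaining requests on the way back.
    """
    if moving_up:
        first = sorted(t for t in pending_tracks if t >= head_track)
        second = sorted((t for t in pending_tracks if t < head_track), reverse=True)
    else:
        first = sorted((t for t in pending_tracks if t <= head_track), reverse=True)
        second = sorted(t for t in pending_tracks if t > head_track)
    return first + second


def total_arm_movement(order, head_track):
    """Total number of tracks the arm crosses when serving requests in this order."""
    moved = 0
    position = head_track
    for track in order:
        moved += abs(track - position)
        position = track
    return moved


if __name__ == "__main__":
    pending = [98, 183, 37, 122, 14, 124, 65, 67]  # hypothetical request queue
    head = 53                                      # hypothetical arm position
    swept = elevator_order(pending, head, moving_up=True)
    print("arrival order :", pending, total_arm_movement(pending, head))
    print("elevator order:", swept, total_arm_movement(swept, head))
```

Comparing the two totals shows how much arm movement is saved relative to serving requests in arrival order.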
Optimization of Disk Block Access (Cont.)
Optimization of Disk Block Access (Cont.)
● Nonvolatile write buffers speed up disk writes by writing blocks to a non-volatile RAM
buffer immediately
● Non-volatile RAM: battery backed up RAM or flash memory
▸ Even if power fails, the data is safe and will be written to disk when power returns
● Controller then writes to disk whenever the disk has no other requests or request has
been pending for some time
● Database operations that require data to be safely stored before continuing can continue
without waiting for data to be written to disk
● Writes can be reordered to minimize disk arm movement
● Log disk – a disk devoted to writing a sequential log of block updates
● Used exactly like nonvolatile RAM
▸ Write to log disk is very fast since no seeks are required
▸ No need for special hardware (NV-RAM)
● File systems typically reorder writes to disk to improve performance
● Journaling file systems write data in safe order to NV-RAM or log disk
● Reordering without journaling: risk of corruption of file system data
RAID
● RAID: Redundant Arrays of Independent Disks
● disk organization techniques that manage a large number of disks, providing a
view of a single disk of
▸ high capacity and high speed by using multiple disks in parallel,
▸ high reliability by storing data redundantly, so that data can be recovered
even if a disk fails
● The chance that some disk out of a set of N disks will fail is much higher than the
chance that a specific single disk will fail.
● E.g., a system with 100 disks, each with MTTF of 100,000 hours (approx. 11
years), will have a system MTTF of 1000 hours (approx. 41 days)
● Techniques for using redundancy to avoid data loss are critical with large
numbers of disks
● Originally a cost-effective alternative to large, expensive disks
● The "I" in RAID originally stood for "inexpensive"
● Today RAIDs are used for their higher reliability and bandwidth.
▸ The "I" is interpreted as independent
Improvement of Reliability via Redundancy
Improvement in Performance via Parallelism
RAID Levels
● Schemes to provide redundancy at lower cost by using disk striping
combined with parity bits
● Different RAID organizations, or RAID levels, have differing cost,
performance and reliability characteristics
● RAID Level 0: Block striping; non-redundant.
● Used in high-performance applications where data loss is not critical.
● RAID Level 1: Mirrored disks with block striping
● Offers best write performance.
● Popular for applications such as storing log files in a database system.
RAID Levels (Cont.)
● RAID Level 2: Memory-Style Error-Correcting-Codes (ECC) with bit
striping.
● RAID Level 3: Bit-Interleaved Parity
● a single parity bit is enough for error correction, not just detection,
since we know which disk has failed
▸ When writing data, corresponding parity bits must also be
computed and written to a parity bit disk
▸ To recover data in a damaged disk, compute XOR of bits from
other disks (including parity bit disk)
RAID Levels (Cont.)
● RAID Level 3 (Cont.)
● Faster data transfer than with a single disk, but fewer I/Os per second
since every disk has to participate in every I/O.
● Subsumes Level 2 (provides all its benefits, at lower cost).
● RAID Level 4: Block-Interleaved Parity; uses block-level striping, and
keeps a parity block on a separate disk for corresponding blocks from N
other disks.
● When writing data block, corresponding block of parity bits must also
be computed and written to parity disk
● To find value of a damaged block, compute XOR of bits from
corresponding blocks (including parity block) from other disks.
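The XOR-based parity and recovery described for Levels 3 and 4 can be sketched in a few lines. This is an illustrative model only: blocks are represented as equal-length `bytes` objects held in memory, and the function names are hypothetical rather than taken from any RAID implementation.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """Bytewise XOR of one or more equal-length blocks."""
    blocks = list(blocks)
    if not blocks:
        raise ValueError("need at least one block")
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields, for each byte position, the bytes at that position
    return bytes(reduce(xor, column) for column in zip(*blocks))


def compute_parity(data_blocks):
    """Parity block for the corresponding data blocks on the N data disks."""
    return xor_blocks(data_blocks)


def recover_block(surviving_blocks, parity_block):
    """Rebuild the block of the failed disk from the other disks plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity_block])


if __name__ == "__main__":
    data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]  # hypothetical 2-byte blocks
    parity = compute_parity(data)
    # Pretend the disk holding data[1] failed and rebuild its block.
    rebuilt = recover_block([data[0], data[2]], parity)
    print(rebuilt == data[1])
```

Recovery works because XOR-ing a value into the parity twice cancels it out, which is also why a single parity block suffices once the failed disk is known.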
RAID Levels (Cont.)
● RAID Level 4 (Cont.)
● Provides higher I/O rates for independent block reads than Level 3
▸ block read goes to a single disk, so blocks stored on different disks can
be read in parallel
● Provides higher transfer rates for reads of multiple blocks than with no striping
● Before writing a block, parity data must be computed
▸ Can be done by using old parity block, old value of current block and
new value of current block (2 block reads + 2 block writes)
▸ Or by recomputing the parity value using the new values of blocks
corresponding to the parity block
– More efficient for writing large amounts of data sequentially
● Parity block becomes a bottleneck for independent block writes since
every block write also writes to parity disk
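The two ways of computing parity for a block write can be compared directly. The sketch below is an assumption-laden illustration: blocks are in-memory `bytes` objects, the helper names are hypothetical, and the I/O counts in the comments restate the figures given above.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def parity_of(blocks):
    """Recompute parity from all data blocks in the stripe (full recomputation)."""
    parity = bytes(len(blocks[0]))  # all-zero block
    for block in blocks:
        parity = xor_bytes(parity, block)
    return parity


def updated_parity(old_parity: bytes, old_block: bytes, new_block: bytes) -> bytes:
    """Incremental update: read old parity and old block (2 reads), then write
    the new block and the new parity (2 writes)."""
    return xor_bytes(xor_bytes(old_parity, old_block), new_block)


if __name__ == "__main__":
    stripe = [b"AAAA", b"BBBB", b"CCCC"]  # hypothetical data blocks
    old_parity = parity_of(stripe)
    new_block = b"ZZZZ"                   # new value for the second block
    incremental = updated_parity(old_parity, stripe[1], new_block)
    recomputed = parity_of([stripe[0], new_block, stripe[2]])
    print(incremental == recomputed)
```

Both methods give the same parity; the incremental one touches only two disks, while full recomputation pays off when all blocks of a stripe are being written anyway.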
RAID Levels (Cont.)
● RAID Level 5: Block-Interleaved Distributed Parity; partitions data and parity
among all N + 1 disks, rather than storing data in N disks and parity in 1 disk.
● E.g., with 5 disks, parity block for nth set of blocks is stored on disk (n
mod 5) + 1, with the data blocks stored on the other 4 disks.
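The parity placement rule from the 5-disk example can be tabulated with a small script. This is a sketch of the formula as stated on the slide, with disks numbered 1 to 5; the function names are hypothetical, and real controllers may use a different rotation pattern.

```python
def parity_disk(n: int, num_disks: int = 5) -> int:
    """Disk (numbered from 1) holding the parity block for the nth set of blocks."""
    return (n % num_disks) + 1


def data_disks(n: int, num_disks: int = 5) -> list:
    """Disks holding the data blocks of the nth set: all disks except the parity disk."""
    p = parity_disk(n, num_disks)
    return [d for d in range(1, num_disks + 1) if d != p]


if __name__ == "__main__":
    for n in range(6):
        print(f"set {n}: parity on disk {parity_disk(n)}, data on disks {data_disks(n)}")
```

Because the parity disk rotates from one set to the next, no single disk has to absorb every parity write.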
RAID Levels (Cont.)
● RAID Level 5 (Cont.)
● Higher I/O rates than Level 4.
▸ Block writes occur in parallel if the blocks and their parity blocks are on different disks.
● Subsumes Level 4: provides same benefits, but avoids bottleneck of parity disk.
● RAID Level 6: P+Q Redundancy scheme; similar to Level 5, but stores extra redundant information to
guard against multiple disk failures.
● Better reliability than Level 5 at a higher cost; not used as widely.
Choice of RAID Level
● Factors in choosing RAID level
● Monetary cost
● Performance: Number of I/O operations per second, and bandwidth
during normal operation
● Performance during failure
● Performance during rebuild of failed disk
▸ Including time taken to rebuild failed disk
● RAID 0 is used only when data safety is not important
● E.g. data can be recovered quickly from other sources
● Levels 2 and 4 are never used since they are subsumed by levels 3 and 5
● Level 3 is not used anymore since bit-striping forces single block reads
to access all disks, wasting disk arm movement, which block striping
(level 5) avoids
● Level 6 is rarely used since levels 1 and 5 offer adequate safety for most
applications
Choice of RAID Level (Cont.)
● Level 1 provides much better write performance than level 5
● Level 5 requires at least 2 block reads and 2 block writes to write a
single block, whereas Level 1 only requires 2 block writes
● Level 1 preferred for high update environments such as log disks
● Level 1 has higher storage cost than level 5
● disk drive capacities increasing rapidly (50%/year) whereas disk access
times have decreased much less (a factor of 3 in 10 years)
● I/O requirements have increased greatly, e.g. for Web servers
● When enough disks have been bought to satisfy required rate of I/O,
they often have spare storage capacity
▸ so there is often no extra monetary cost for Level 1!
● Level 5 is preferred for applications with low update rate,
and large amounts of data
● Level 1 is preferred for all other applications
Optical Disks
● Compact disk-read only memory (CD-ROM)
● Removable disks, 640 MB per disk
● Seek time about 100 msec (optical read head is heavier and slower)
● Higher latency (3000 RPM) and lower data-transfer rates (3-6 MB/s)
compared to magnetic disks
● Digital Video Disk (DVD)
● DVD-5 holds 4.7 GB , and DVD-9 holds 8.5 GB
● DVD-10 and DVD-18 are double sided formats with capacities of 9.4 GB and
17 GB
● Blu-ray disk: 27 GB (54 GB for double sided disk)
● Slow seek time, for same reasons as CD-ROM
● Record once versions (CD-R and DVD-R) are popular
● data can only be written once, and cannot be erased.
● high capacity and long lifetime; used for archival storage
● Multi-write versions (CD-RW, DVD-RW, DVD+RW and DVD-RAM) also
available
Magnetic Tapes
● Hold large volumes of data and provide high transfer rates
● Few GB for DAT (Digital Audio Tape) format, 10-40 GB with DLT
(Digital Linear Tape) format, 100 GB+ with Ultrium format, and 330
GB with Ampex helical scan format
● Transfer rates from few to 10s of MB/s
● Tapes are cheap, but cost of drives is very high
● Very slow access time in comparison to magnetic and optical disks
● limited to sequential access.
● Some formats (Accelis) provide faster seek (10s of seconds) at cost of
lower capacity
● Used mainly for backup, for storage of infrequently used information, and as
an off-line medium for transferring information from one system to another.
● Tape jukeboxes used for very large capacity storage
● Multiple petabytes (1 petabyte = 10^15 bytes)