Data Storage for IT Professionals
Module-1
Introduction to Information Storage
Chapter Objectives
• Describe who is creating data and the amount of
data being created
• Describe the value of data to business
• List the solutions available for data storage
• List and explain the core elements of a data center
• Describe the ILM strategy
• Describe storage evolution
Lesson: Information Storage
• Describe the importance of information to individuals
and to businesses
• Define data and information
• Discuss the categories of data
• Describe the storage architectures and their evolution
Why Information Storage
• “Digital universe – The Information Explosion”
• The 21st century is the information era
• Information is being created at an ever-increasing rate
• Information has become critical for success
• We live in an on-command, on-demand world
• Examples: social networking sites, e-mail, video- and photo-sharing
websites, online shopping, search engines, etc.
Nearly a quarter of the world's population –
roughly 1.4 billion people – will use the Internet
on a regular basis in 2009.
50 billion photos taken every year
Online Video
big challenge
Store, Protect, Optimize, Leverage
• 1 Megabyte = 1 million bytes: a tablespoon of sand
• 1 Gigabyte = 1 billion bytes: a patch of sand, 9” square, 1’ deep
• 1 Terabyte = 1 trillion bytes: a sandbox, 24’ square, 1’ deep
• 1 Petabyte = 1,000 terabytes: a mile-long beach
• 1 Exabyte = 1,000 petabytes: the same beach, from Maine to North Carolina
• 1 Zettabyte = 1,000 exabytes: the same beach, along the entire US coast
• 1 Yottabyte = 1,000 zettabytes: enough info to bury the entire US under 296 feet of sand
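The unit ladder above can be sketched as a small conversion helper. This is a minimal illustration that assumes the decimal definitions used on the slide (each unit is 1,000 times the previous one); the function name `humanize` and the unit abbreviations are illustrative choices, not part of the course material.

```python
# Decimal storage units, following the slide's definitions
# (1 megabyte = 1 million bytes, ..., 1 yottabyte = 1,000 zettabytes).
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def humanize(num_bytes: int) -> str:
    """Express a byte count in the largest decimal unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1,000,
        # or at the largest unit we know about.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


print(humanize(10**6))   # 1 MB
print(humanize(10**12))  # 1 TB
print(humanize(1500))    # 1.5 KB
```

Note that operating systems often report capacity in binary units (multiples of 1,024), so the same disk can appear slightly smaller there than its decimal label suggests.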
a) <100 GB
b) 100 GB - 500 GB
c) 500 GB – 1 TB
d) > 1 TB
What is Data
“Collection of raw facts from which conclusions may be drawn”
• Data is converted into a more convenient form, i.e., digital data
(for example, video)
• Individuals
• Businesses
Categories of Data
• Over 80% of enterprise data is unstructured
• Unstructured (80%): X-rays, checks, images, documents, forms, web pages,
contracts, rich media, invoices, audio, video
• Structured (20%)
Define Information
• What do individuals/businesses do
with the data they collect?
• They turn it into “information”
• “Information is the intelligence and knowledge derived from data”
• For example:
[Figure labels: Creators of information; Uploading information; Centralized information storage; Accessing information]
[Figure: evolution of storage architectures along a time axis; labels: Internal DAS, JBOD, FC SAN, IP SAN, LAN, Multi Protocol Router]
Data Center Infrastructure
• Data center components:
  • Applications
  • Database
  • Operating System/Server
  • Network
  • Storage Device
Example of an Order Processing System
[Figure labels: Client (Application User Interface), LAN, Server/OS (DBMS), Storage network, Storage Array]
Key Requirements for Data Center Elements
• Availability
• Manageability
• Performance
• Capacity
• Scalability
Managing Storage Information
• Monitoring: security, performance, accessibility, capacity
• Reporting: resource performance, capacity, utilization
• Provisioning: providing the hardware, software, and other resources
needed to run the data center; capacity and resource planning
Challenges in Managing Information
• Exploding digital universe
• Multifold increase of information growth
• Increasing dependency on information
• The strategic use of information plays
• Changing value of information
• Information that is valuable today may become less
important tomorrow.
Information Lifecycle Management
“CHANGE IN THE VALUE OF INFORMATION OVER TIME”
[Figure labels: Protect, Automated, Flexible]
IMPLEMENTATION OF ILM
Benefits of Implementing ILM
• Improved utilization
• Tiered storage platforms
• Simplified management
• Processes, tools and automation
• Simplified backup and recovery
• A wider range of options to balance the need for business continuity
• Maintaining compliance
• Knowledge of what data needs to be protected for what length of time
• Lower Total Cost of Ownership
• By aligning the infrastructure and management costs with information value
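The idea of aligning storage cost with the changing value of information can be sketched as a simple age-based tiering rule. This is only an illustration: the tier names, the 30-day and 365-day thresholds, and the use of "days since last access" as a stand-in for information value are all assumptions, not part of the course material. Real ILM policies are set by business requirements.

```python
from datetime import date

# Hypothetical policy: (maximum age in days since last access, storage tier).
# The thresholds and tier names are illustrative assumptions only.
TIER_POLICY = [
    (30, "tier 1 (high-performance storage)"),
    (365, "tier 2 (lower-cost storage)"),
]
ARCHIVE_TIER = "tier 3 (archive)"


def choose_tier(last_accessed: date, today: date) -> str:
    """Pick a storage tier from how recently the information was accessed."""
    age_days = (today - last_accessed).days
    for max_age_days, tier in TIER_POLICY:
        if age_days <= max_age_days:
            return tier
    # Anything older than every threshold goes to the archive tier.
    return ARCHIVE_TIER


today = date(2024, 6, 1)
print(choose_tier(date(2024, 5, 20), today))  # recently used: tier 1
print(choose_tier(date(2022, 1, 1), today))   # long idle: tier 3 (archive)
```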
Lesson Summary
Key points covered in this lesson:
• The five core elements of a Data Center infrastructure
• Key requirements of storage systems to support
business activities, as well as some of the constraints
• ILM strategy
• Importance
• Characteristics
• Activities in developing ILM strategy
• ILM implementation
• Benefits of ILM
Chapter Summary
Key points covered in this chapter:
• Importance of data, information, and storage
infrastructure
• Types of data, its value, and key management
requirements of a storage system
• Evolution of storage architectures
• Core elements of a data center
• Importance of the ILM strategy
With the advancement of computer and
communication technologies, the rate of data
generation and sharing has increased exponentially.
The following is a list of some of the factors that have
contributed to the growth of digital data:
1. Increase in data-processing capabilities
2. Lower cost of digital storage
3. Affordable and faster communication
technology
Types of Data
Data can be classified as structured or unstructured based on
how it is stored and managed.
Structured data is organized in rows and columns in a rigidly
defined format so that applications can retrieve and process it
efficiently.
Big data is a new and evolving concept, which refers to data sets
whose sizes are beyond the capability of commonly used
software tools to capture, store, manage, and process within
acceptable time limits. It includes both structured and
unstructured data generated by a variety of sources, including
business application transactions, web pages, videos, images,
e-mails, social media, and so on. These data sets typically require
real-time capture or updates for analysis, predictive modeling,
and decision making.
Information
Data, whether structured or unstructured, does not fulfill any
purpose for individuals or businesses unless it is presented in a
meaningful form. Information is the intelligence and knowledge
derived from data.
Storage
Data created by individuals or businesses must be stored so that it
is easily accessible for further processing. In a computing
environment, devices designed for storing data are termed storage
devices or simply storage.
Evolution of Storage Architecture
There are two types of storage architecture:
1. Server-Centric Storage Architecture
2. Information-Centric Storage Architecture
Data Center Infrastructure
Software components of a host system include:
1. Operating System
In a traditional computing environment, an operating system
controls all aspects of computing. It works between the application
and the physical components of a compute system. One of the
services it provides to the application is data access. The operating
system also monitors and responds to user actions and
the environment.
2. Device Driver
A device driver is special software that permits the
operating system to interact with a specific device,
such as a printer, a mouse, or a disk drive. A device
driver enables the operating system to recognize,
access, and control the device. Device drivers are
hardware-dependent and operating-system-specific.
3. Volume Manager
• Logical volume managers (LVMs) enable dynamic extension of file system
capacity and efficient storage management.
• The LVM is software that runs on the compute system and manages logical and
physical storage. It is an intermediate layer between the file system and the
physical disk.
• It can partition a larger-capacity disk into virtual, smaller-capacity volumes (a
process called partitioning) or aggregate several smaller disks to form a larger
virtual volume (a process called concatenation). These volumes are then
presented to applications.
Disk partitioning was introduced to improve the flexibility and
utilization of disk drives. In partitioning, a disk drive is divided
into logical containers called logical volumes (LVs).
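Partitioning and concatenation, as described above, both come down to mapping block addresses. The sketch below is a simplified model, not how any particular LVM is implemented: disks are represented only by their size in blocks, and the function names `partition` and `concatenate` are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def partition(disk_blocks, volume_sizes):
    """Split one disk into logical volumes; return (start, end) block ranges."""
    if sum(volume_sizes) > disk_blocks:
        raise ValueError("volumes exceed disk capacity")
    # Each volume starts where the previous one ended.
    starts = [0] + list(accumulate(volume_sizes))[:-1]
    return [(start, start + size) for start, size in zip(starts, volume_sizes)]


def concatenate(disk_sizes):
    """Join several disks into one virtual volume.

    Returns a function that maps a logical block number to
    (disk index, block offset on that disk).
    """
    ends = list(accumulate(disk_sizes))  # cumulative end block of each disk

    def locate(logical_block):
        if not 0 <= logical_block < ends[-1]:
            raise IndexError("block outside the logical volume")
        disk = bisect_right(ends, logical_block)
        start = ends[disk] - disk_sizes[disk]
        return disk, logical_block - start

    return locate


print(partition(1000, [400, 600]))   # [(0, 400), (400, 1000)]
locate = concatenate([100, 50, 200])
print(locate(120))                   # (1, 20): second disk, block 20
```

In both cases the application sees only logical block numbers; the volume manager translates them to physical locations.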
4. File System
• On tape, search and retrieval of data are done sequentially, and it invariably
takes several seconds to access the data. As a result, random data access is slow
and time-consuming. This limits tapes as a viable option for applications
that require real-time, rapid access to data.
• In a shared computing environment, data stored on tape cannot be
accessed by multiple applications simultaneously, restricting its use to one
application at a time.
• On a tape drive, the read/write head touches the tape surface, so the tape
degrades or wears out after repeated use.
• The storage and retrieval requirements of data from the tape and the
overhead associated with managing the tape media are significant.
Disk Drive Components
Platter and Spindle
Actuator Arm Assembly
Physical disk structures: sectors, tracks, and cylinders
Zoned Bit Recording
Logical Block Addressing
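The relationship between the physical structures (cylinders, heads, sectors) and logical block addressing can be illustrated with the classic CHS-to-LBA conversion. This sketch assumes a fixed geometry with the same number of sectors on every track; drives that use zoned bit recording do not have such a geometry internally, so treat it as a teaching model only. The geometry values in the example (16 heads, 63 sectors per track) are arbitrary.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Convert a cylinder/head/sector address to a logical block address.

    Cylinders and heads are numbered from 0; sectors are numbered from 1.
    """
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Inverse mapping: logical block address back to (cylinder, head, sector)."""
    cylinder, remainder = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1


# Example geometry (assumed): 16 heads per cylinder, 63 sectors per track.
print(chs_to_lba(0, 0, 1, 16, 63))   # 0: the first sector on the disk
print(chs_to_lba(1, 0, 1, 16, 63))   # 1008: first sector of the second cylinder
print(lba_to_chs(1008, 16, 63))      # (1, 0, 1)
```

With logical block addressing the host only ever uses the single block number; the disk controller is responsible for translating it to a physical location.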
Disk Drive Performance
• Data Transfer Rate
• Disk Service Time
⮚ Seek Time
⮚ Rotational Latency
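The components listed above combine into a rough estimate of disk service time. The sketch below assumes the usual model: service time is seek time plus average rotational latency (half a rotation) plus the time to transfer the data at the drive's transfer rate. The example figures (5 ms seek, 15,000 rpm, 4 KB I/O, 40 MB/s) are illustrative assumptions, not measurements.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: the time for half a rotation, in milliseconds."""
    return 0.5 * 60_000 / rpm


def transfer_time_ms(io_size_kb, transfer_rate_mb_s):
    """Time to move one I/O at the given transfer rate (1 MB = 1,000 KB)."""
    seconds = io_size_kb / (transfer_rate_mb_s * 1000)
    return seconds * 1000


def disk_service_time_ms(seek_ms, rpm, io_size_kb, transfer_rate_mb_s):
    """Seek time + average rotational latency + data transfer time."""
    return (
        seek_ms
        + rotational_latency_ms(rpm)
        + transfer_time_ms(io_size_kb, transfer_rate_mb_s)
    )


service_ms = disk_service_time_ms(seek_ms=5, rpm=15_000, io_size_kb=4, transfer_rate_mb_s=40)
print(f"service time: {service_ms:.1f} ms")           # 5 + 2 + 0.1 = 7.1 ms
print(f"max I/Os per second: {1000 / service_ms:.0f}")
```

For small I/Os the mechanical delays (seek and rotation) dominate, which is why transfer rate alone says little about random-access performance.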
I/O Queuing
Utilization vs. Response Time
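The relationship between utilization and response time can be approximated with a simple single-queue model, in which average response time equals service time divided by (1 - utilization). This is a common simplification rather than an exact description of any real controller; the 8 ms service time in the example is an assumed value.

```python
def response_time_ms(service_time_ms, utilization):
    """Approximate average response time under a simple single-queue model.

    utilization is the fraction of time the disk is busy (0 <= utilization < 1).
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in the range [0, 1)")
    return service_time_ms / (1 - utilization)


# With an assumed 8 ms service time, response time grows slowly at first
# and then very steeply as utilization approaches 100%.
for utilization in (0.1, 0.5, 0.7, 0.9, 0.96):
    print(f"{utilization:.0%} busy -> {response_time_ms(8, utilization):.1f} ms")
```

The steep rise at high utilization is the reason storage is usually sized so that disks are not kept close to fully busy.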
Host Access to Data
Direct-Attached Storage (DAS)
DAS Benefits
• Requires low initial investment
• Simple and can be easily deployed
• The setup is managed using host-based tools, such
as the host OS, which makes storage management
easy for small environments.
• Requires fewer management tasks
• Fewer hardware and software elements to set up and
operate
DAS Limitations
• DAS does not scale well.