Unit-II Data Storage and Cloud Computing
Contents: Data Storage: Introduction to Enterprise Data Storage, Direct Attached Storage, Storage Area Network, Network
Attached Storage, Data Storage Management, File System, Cloud Data Stores, Using Grids for Data Storage. Cloud Storage:
Data Management, Provisioning Cloud storage, Data Intensive Technologies for Cloud Computing. Cloud Storage from LANs
to WANs: Cloud Characteristics, Distributed Data Storage.
Course Objectives:
To learn various data storage methods on cloud
Unit Outcomes: On completion, students will be able to
CO2: Use appropriate data storage technique on Cloud, based on Cloud application
Outcome Mapping:
PEOs: POs: COs: 1 PSOs:
Books used
1. James Bond, “The Enterprise Cloud”, O'Reilly Media, Inc., ISBN: 9781491907627
2. Dr. Kris Jamsa, “Cloud Computing: SaaS, PaaS, IaaS, Virtualization and More”, Wiley Publications, ISBN: 978-0-470-97389-9
3. A. Srinivasan, J. Suresh, “Cloud Computing: A Practical Approach for Learning and Implementation”, Pearson, ISBN: 978-81-317-7651-3
Mr. K. S. Mulani, Department of Computer Engineering, SIT Lonavala
Data Storage in Cloud Computing
Cloud storage is a service that allows data to be saved on an offsite storage system managed by a third party and made accessible through a web services API.
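As a small illustration, the following sketch shows data being saved to and read back from such an offsite system through its web services API. It assumes AWS S3 accessed via the boto3 SDK; the bucket name is hypothetical:

import boto3

s3 = boto3.client("s3")  # credentials are taken from the environment/configuration

# Save a local file to the third-party managed, offsite storage system.
s3.upload_file("report.pdf", "example-bucket", "backups/report.pdf")

# Read the object back over the same web services API.
obj = s3.get_object(Bucket="example-bucket", Key="backups/report.pdf")
data = obj["Body"].read()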
Storage Devices : Storage devices can be broadly classified into two categories:
1.Block Storage Devices
2.File Storage Devices
Block Storage Devices : Block storage devices offer raw storage to clients. This raw storage is partitioned to create volumes.
File Storage Devices : File storage devices offer storage to clients in the form of files, and the device maintains its own file system. This storage is typically delivered as Network Attached Storage (NAS).
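The toy sketch below contrasts the two access styles; the device path and mount point are hypothetical examples:

import os

# Block storage: the client addresses raw bytes on a volume carved out of a
# block device; there are no file-system semantics at this level.
fd = os.open("/dev/sdb1", os.O_RDONLY)   # raw volume (requires privileges)
os.lseek(fd, 4096, os.SEEK_SET)          # seek to a byte offset on the volume
block = os.read(fd, 512)                 # read one 512-byte block
os.close(fd)

# File storage: the device maintains the file system, so the client simply
# works with named files over a network share.
with open("/mnt/nas/sales/report.txt") as f:
    contents = f.read()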
Cloud Storage Classes
Cloud storage can be broadly classified into two categories:
1.Unmanaged Cloud Storage
2.Managed Cloud Storage
Unmanaged Cloud Storage : Unmanaged cloud storage means the storage is preconfigured for the customer. The customer can neither format the storage, nor install his own file system, nor change drive properties.
Managed Cloud Storage : Managed cloud storage offers online storage space on demand. The managed cloud storage system appears to the user as a raw disk that the user can partition and format.
Creating Cloud Storage System
1. The cloud storage system stores many copies of data on many servers at various locations.
2. Because the data is stored at various locations, if one system fails the pointer can be redirected to another location where a copy of the object is stored (see the sketch after this list).
3. The cloud provider uses virtualization software to aggregate its storage assets into a cloud storage system, often called a storage grid.
4. The storage grid creates a virtualization layer that aggregates storage from various storage devices into a single management system.
5. It manages data served over CIFS (Common Internet File System) and NFS file systems across the Internet.
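The following toy sketch simulates the replication-and-pointer idea from points 1 and 2: objects are copied to several "servers", and a read is redirected to a surviving replica when one location fails. The servers and the failure model are simulated purely for illustration:

import random

class ToyStorageGrid:
    def __init__(self, servers):
        # Each "server" is just a dict mapping object names to bytes.
        self.servers = {name: {} for name in servers}
        self.failed = set()

    def put(self, key, value, copies=2):
        # Store copies of the object on several servers at various locations.
        for name in random.sample(list(self.servers), copies):
            self.servers[name][key] = value

    def get(self, key):
        # Follow the pointer to any live server that holds a copy.
        for name, store in self.servers.items():
            if name not in self.failed and key in store:
                return store[key]
        raise KeyError(key)

grid = ToyStorageGrid(["us-east", "eu-west", "ap-south"])
grid.put("photo.jpg", b"...bytes...")
grid.failed.add("us-east")       # one system fails...
print(grid.get("photo.jpg"))     # ...the read is served from a replica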
Virtual Storage Container
Virtual storage containers offer high-performance cloud storage systems.
Logical Unit Numbers (LUNs) of devices, files and other objects are created inside virtual storage containers; together, these define a cloud storage domain.
Challenges :
Storing data in the cloud is not a simple task. Apart from its flexibility and convenience, cloud storage presents several challenges for customers.
The customers must be able to:
1. Get provision for additional storage on demand.
2. Know and restrict the physical location of the stored data.
3. Verify how data was erased.
4. Have access to a documented process for disposing of data storage hardware.
5. Have administrator access control over data.
Introduction of Enterprise Data Storage
“Enterprise storage is a centralized repository for business information that provides common data management and protection.”
An Enterprise Storage System is a centralized repository for business information.
It provides a common resource for data sharing, management and protection via
connections to other computer systems. Enterprise storage systems are designed to process
heavy workloads of business-critical information.
Types of Enterprise Storage :
1.Direct-Attached Storage.
2.Network-Attached Storage.
3.Storage Area Networks.
Direct Attached Storage
➢ DAS stands for Direct Attached Storage. It is a digital storage device connected directly to a server, workstation, or personal computer via a cable.
➢ In Direct Attached Storage, applications use a block-level access protocol for accessing data.
➢ No network is needed to attach the device to the server or workstation, so DAS is not part of a storage network. Examples of such storage devices are solid-state drives, hard drives, tape libraries, and optical disk drives.
➢ A DAS system is attached directly to the computer through an HBA (Host Bus Adapter). In contrast to NAS devices, a DAS device attaches directly to the server without a network in between.
➢ Modern DAS systems include integrated disk array controllers with advanced features.
Types of DAS : The following are the two types of Direct Attached Storage (DAS):
1.Internal DAS
2.External DAS
Internal DAS
Internal DAS is a DAS in which the storage device is attached internally to the server or PC by the
HBA. In this DAS, HBA is used for high-speed bus connectivity over a short distance.
External DAS
External DAS is a DAS in which an external storage device is connected directly to the server without any intervening network device.
In this type of DAS, FCP and SCSI are the protocols that act as the interface between the server and the storage device.
Network Attached Storage (commonly known as NAS)
NAS is a file storage device connected to a network that enables multiple users to access data from centralized disk capacity.
➢ Users on a LAN access the shared storage over an Ethernet connection. This storage is fast, low-cost and offers many of the conveniences of a public cloud on site.
➢ It uses file access protocols such as NFS, SMB and AFP.
➢ NFS is a file-based protocol that is popular on Unix systems.
➢ SMB stands for Server Message Block and is used with Microsoft Windows systems.
➢ AFP is also a file access protocol, used with Apple computers.
➢ It is designed for network systems that may need to serve very large numbers of file operations.
➢ It suits organizations that need a reliable networked storage system.
➢ It is more economical than file servers and more versatile than external disks.
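As a small illustration, once a NAS export has been mounted on a client (the sketch below assumes an NFS share; the server name and paths are hypothetical), it is used through ordinary file operations:

from pathlib import Path

# Assumes the NFS export was mounted beforehand, e.g. with:
#   mount -t nfs nas-server:/export/shared /mnt/nas
share = Path("/mnt/nas")

# Every client on the LAN sees the same centralized files.
(share / "team-notes.txt").write_text("Meeting moved to 3pm\n")

for entry in share.iterdir():            # browse the shared directory
    print(entry.name)

text = (share / "team-notes.txt").read_text()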
Advantages of NAS :
1. The architecture of NAS is easy to install and configure.
2. Every user or client on the network can easily access the Network Attached Storage.
3. A main advantage of NAS is that it is more reliable than simple hard disks.
4. Its performance in serving files is good.
5. NAS devices are scalable and can easily be accessed remotely.
Disadvantages of NAS
1. The speed of transferring data is not as fast as with DAS.
2. Users also require basic knowledge of computer networks to use NAS efficiently.
3. Users or clients who want to back up their data cannot do so directly; backups can only be made through the operating system installed on the NAS device.
Storage Area Network (SAN):
➢ Storage Area Network is a dedicated, specialized, and high-speed network which provides
block-level data storage.
➢ It delivers the shared pool of storage devices to more than one server.
➢ The main aim of SAN is to transfer the data between the server and storage device.
➢ It also allows for transferring the data between the storage systems.
➢ Storage Area networks are mainly used for accessing storage devices such as tape libraries
and disk-based devices from the servers.
➢ It is a dedicated network which is not accessible through the LAN.
➢ It consists of hosts, switches, and storage devices that are interconnected using a variety of topologies, protocols, and technologies.
Protocols of SAN
1.FCP (Fibre Channel Protocol)
2.iSCSI
3.FCoE
4.NVMe
FCP (Fibre Channel Protocol) : It is the most commonly used protocol in Storage Area Networks. It is a mapping of SCSI commands over the Fibre Channel (FC) network.
iSCSI : It stands for Internet SCSI or Internet Small Computer System Interface. It is the second most widely used block (SAN) protocol. It puts SCSI commands inside an Ethernet frame and transports them over an Internet Protocol (IP) Ethernet network.
FCoE stands for "Fibre Channel over Ethernet". It is a protocol similar to iSCSI, but it puts Fibre Channel frames inside Ethernet frames and transports them directly over an Ethernet network, without using IP.
NVMe stands for Non-Volatile Memory Express. It is also a SAN protocol, which accesses flash storage over the PCI Express bus.
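As a hedged sketch of how a Linux host might attach iSCSI block storage, the snippet below drives the open-iscsi command-line tool iscsiadm from Python; the portal address and target IQN are hypothetical. After a successful login, the target appears to the host as a local block device:

import subprocess

portal = "192.168.1.50"  # hypothetical address of the SAN portal

# Discover the targets offered by the portal (SCSI commands over IP Ethernet).
subprocess.run(
    ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", portal],
    check=True,
)

# Log in to one discovered target; the IQN below is illustrative only.
subprocess.run(
    ["iscsiadm", "-m", "node",
     "-T", "iqn.2024-01.com.example:storage.disk1",
     "-p", portal, "--login"],
    check=True,
)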
Advantages :
1. It is more scalable.
2. Security is also a main advantage of SAN. If users want to secure their data, then SAN is a good option, since various security measures can easily be implemented on a SAN.
3. Storage devices can easily be added to or removed from the network. If users need more storage, they simply add devices.
4. Consolidating storage in a SAN can lower overall storage costs compared with maintaining many separate storage silos.
5. Another big advantage of using a SAN (Storage Area Network) is better disk utilization.
Data Storage Management
➢ Cloud data management is a way to manage data across cloud platforms, either with or
instead of on-premises storage.
➢ The cloud is useful as a data storage tier for disaster recovery, backup and long-term
archiving.
➢ With cloud data management, resources can be purchased as needed.
➢ Storage management ensures data is available to users when they need it.
➢ Data storage management is typically part of the storage administrator's job. Organizations
without a dedicated storage administrator might use an IT generalist for storage
management.
File System in Cloud
➢ A file system in the cloud is a hierarchical storage system that provides shared
access to file data.
➢ Users can create, delete, modify, read, and write files and can organize them
logically in directory trees for intuitive access.
➢ The vendor creates a file system that offers traditional file protocols like NFS or
SMB to cloud hosted applications.
➢ Essentially, the vendor provides an instance of their file system and the organization deploys it with the cloud provider of its choice.
➢ The organization then allocates the appropriate compute performance and storage I/O.
➢ The goal of these file systems is to speed up the migration of applications to the cloud.
➢ By using a file system in the cloud, the organization does not need to rewrite the storage I/O components of the application.
How Does Cloud Storage Work?
➢ Cloud storage is purchased from a third-party cloud vendor who owns and operates data storage capacity and delivers it
over the Internet in a pay-as-you-go model.
➢ These cloud storage vendors manage capacity, security and durability to make data accessible to your applications all
around the world.
➢ Applications access cloud storage through traditional storage protocols or directly via an API. Many vendors offer
complementary services designed to help collect, manage, secure and analyze data at massive scale.
➢ Cloud storage works as a virtual data center. It offers end users and applications virtual storage infrastructure that can be
scaled to the application’s requirements.
➢ It generally operates via a web-based API implemented remotely through its interaction with in-house cloud storage
infrastructure.
➢ Cloud storage includes at least one data server to which a user can connect via the internet.
➢ The user sends files to the data server, which copies the data to multiple servers, manually or in an automated manner, over the internet.
➢ The stored data can then be accessed via a web-based interface.
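The sketch below illustrates the web-based access path: the storage API mints a time-limited URL, after which the stored object is reachable with a plain HTTP GET. It assumes AWS S3 via boto3; the bucket and key names are hypothetical:

import boto3
import requests

s3 = boto3.client("s3")

# Ask the storage service for a time-limited, shareable URL.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "example-bucket", "Key": "backups/report.pdf"},
    ExpiresIn=3600,  # the link stays valid for one hour
)

# Any HTTP client (including a web browser) can now fetch the object.
response = requests.get(url)
data = response.content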
Data Management in Cloud Storage
Cloud data management is the practice of storing a company’s data at an offsite data center that is typically
owned and overseen by a vendor who specializes in public cloud infrastructure, such as AWS or Microsoft
Azure. Managing data in the cloud provides an automated backup strategy, professional support, and ease of
access from any location.
Benefit of Cloud Data Management
1. Security: Modern cloud data management often delivers better data protection than on-premises solutions. In fact, 94% of cloud adopters report security improvements. Why? First, cloud data management reduces the risk of data loss due to device damage or hardware failure. Second, companies specializing in cloud hosting and data management employ more advanced security measures and practices to protect sensitive data than most companies apply to their own on-premises data.
2. Scalability and savings: Cloud data management lets users scale services up or down as needed. More
storage or compute power can be added when needed to accommodate changing workloads. Companies can
then scale back after the completion of a big project to avoid paying for services they don’t need.
3. Governed access: With improved security comes greater peace of mind regarding governed data access. Cloud storage
means team members can access the data they need from wherever they are. This access also supports a collaborative work
culture, as employees can work together on a dataset, easily share insights, and more.
4. Automated backups and disaster recovery: The cloud storage vendor can manage and automate data backups so that
the company can focus its attention on other things, and can rest assured that its data is safe. Having an up-to-date backup
at all times also speeds up the process of disaster recovery after emergencies, and can help mitigate the effects of
ransomware attacks.
5. Improved data quality: An integrated, well-governed cloud data management solution helps companies tear down data
silos and create a single source of truth for every data point. Data remains clean, consistent, up-to-date, and accessible for
every use case, from real-time data analytics to advanced machine learning applications to external sharing via APIs.
6. Automated updates: Cloud data management providers are committed to providing the best services and capabilities.
When applications need updating, cloud providers run these updates automatically. That means your team doesn’t need to
pause work while they wait for IT to update everyone’s system.
Cloud Provisioning
Cloud provisioning means allocating a cloud service provider’s resources to a customer. It is a key feature of cloud
computing. It refers to how a client gets cloud services and resources from a provider. The cloud services that
customers can subscribe to include infrastructure-as-a-service (IaaS), software-as-a-service (SaaS), and platform-as-
a-service (PaaS) in public or private environments.
Benefits of Cloud Provisioning
Scalability: Under the conventional IT provisioning model, a company makes a huge up-front investment in its on-site infrastructure, which requires extensive planning and forecasting of infrastructure needs. In the cloud provisioning model, by contrast, resources can scale up and down based on short-term usage, and this elasticity benefits organizations.
Speed: Speed is another benefit of cloud provisioning. Developers can provision resources and schedule jobs themselves, which removes the need for an administrator to provision and manage resources manually.
Cost Savings: Cost saving is another potential benefit of cloud provisioning. Traditional infrastructure can incur huge costs for organizations, while cloud providers allow customers to pay only for what they consume. This is another major reason why cloud provisioning is preferred.
Types of Cloud Provisioning
Network Provisioning: In the telecom industry, network provisioning refers to providing telecommunications services to a client.
Server Provisioning: Setting up a data center's physical infrastructure, installing and configuring software, and linking it to middleware, networks, and storage.
User Provisioning: A method of identity management that keeps a check on the access and privileges of authorized users. This type of provisioning is characterized by artifacts such as equipment, suppliers, etc.
Service Provisioning: It requires setting up a service and handling its related data.
Tools and Software Used in Cloud Provisioning
Enterprises can provision services and resources manually as needed, whereas public cloud providers offer tools for provisioning various resources and services, such as:
1.IBM Cloud Orchestrator
2.CloudBolt
3.Morpheus Data
4.Flexera
5.CloudSphere
6.Scalr
7.Google Cloud Deployment Manager
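The hedged sketch below shows what self-service provisioning might look like against a hypothetical REST endpoint. The tools listed above each have their own real APIs, so every URL and field here is purely illustrative:

import requests

request_body = {
    "resource": "vm",            # server provisioning
    "cpu_cores": 4,
    "memory_gb": 16,
    "storage_gb": 200,           # storage allocated on demand
    "network": "default-vlan",   # network provisioning
}

resp = requests.post(
    "https://cloud.example.com/api/v1/provision",  # hypothetical endpoint
    json=request_body,
    headers={"Authorization": "Bearer <api-token>"},  # placeholder token
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # e.g. an ID for tracking the provisioning job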
Data Intensive Technology in Cloud Computing
Data-intensive computing is a class of parallel computing that uses data parallelism to process large volumes of data, typically terabytes or petabytes in size. This large amount of data, generated each day, is referred to as Big Data.
Data intensive computing has some characteristics which are different from other forms of
computing. They are:
1. To achieve high performance in data-intensive computing, it is necessary to minimize the movement of data. This reduces system overhead and increases performance by allowing algorithms to execute on the node where the data resides (see the sketch after this list).
2. Data-intensive computing systems use a machine-independent approach, in which the runtime system controls scheduling, execution, load balancing, communications and the movement of programs.
3. Data-intensive computing places a strong focus on the reliability and availability of data. Traditional large-scale systems may be susceptible to hardware failures, communication errors and software bugs; data-intensive computing is designed to overcome these challenges.
4. Data-intensive computing is designed for scalability, so it can accommodate any amount of data and meet time-critical requirements. Scalability of both the hardware and the software architecture is one of the biggest advantages of data-intensive computing.
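The toy sketch below illustrates characteristic 1 on a single machine: each worker applies the algorithm to its own partition, and only tiny per-partition results move between processes. In a real data-intensive system the partitions would already reside on the worker nodes; the multiprocessing pool here merely simulates that layout:

from multiprocessing import Pool

# Four partitions of a large dataset (simulated).
partitions = [list(range(i, i + 100_000)) for i in range(0, 400_000, 100_000)]

def local_sum(partition):
    # Runs "where the data resides" and returns only a small result.
    return sum(partition)

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # The runtime system handles scheduling and load balancing.
        partial_sums = pool.map(local_sum, partitions)
    print(sum(partial_sums))  # only four integers crossed process boundaries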
Cloud Storage from LANs to WANs:
Characteristics :
1. Compute power is elastic, but only when workloads can run in parallel. In general, applications designed to run on top of a shared-nothing architecture are well matched to such an environment. Some cloud computing products, for example Google’s App Engine, supply not only a cloud computing infrastructure but also an entire software stack with a constrained API, so that software developers are compelled to write programs that can run in a shared-nothing environment and therefore support elastic scaling.
2. Data is retained at an unknown host server. In general, letting go of data raises many security issues, and suitable precautions should be taken. The very name ‘cloud computing’ implies that the computing and storage resources are being operated from some nebulous, unspecified location. The fact remains, however, that the data is physically stored in a specific host country and is subject to local laws and regulations. Since most cloud computing vendors give their clientele little control over where data is stored, customers have little choice but to assume the worst: unless the data is encrypted with a key that is unavailable to the host, the data may be accessed by a third party without the customer’s knowledge.
3. Data is often replicated across distant locations. Data availability and durability are paramount for cloud storage providers, since outages or data loss can damage both the business and the organization’s reputation. Availability and durability are normally accomplished through behind-the-scenes replication. Large cloud computing providers with data centers dispersed throughout the world have the capability to provide high levels of fault tolerance by replicating data at distant locations across continents. Amazon’s S3 cloud storage service replicates data across ‘regions’ and ‘availability zones’ so that data and applications can survive even when a whole location collapses.
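As a hedged illustration of configuring such replication explicitly, the sketch below sets up S3 cross-region replication with boto3. The bucket names and IAM role ARN are hypothetical, and versioning must already be enabled on both buckets for replication to work:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="example-source-bucket",
    ReplicationConfiguration={
        # Role S3 assumes to copy objects (hypothetical ARN).
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [{
            "ID": "replicate-everything",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {},  # an empty filter applies the rule to all objects
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {
                # A bucket living in another region.
                "Bucket": "arn:aws:s3:::example-destination-bucket",
            },
        }],
    },
)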
Distributed Data Storage :
➢ Distributed storage means are evolving from the existing practices of data storage for the new
generation of WWW applications through organizations like Google, Amazon and Yahoo.
➢ There are some reasons for distributed storage means to be favored over traditional relational
database systems encompassing scalability, accessibility and performance.
➢ The new generation of applications require processing of data to a tune of terabytes and even peta
bytes.
➢ This is accomplished by distributed services. Distributed services means distributed data.
➢ This is a distinct giant compared to traditional relational database systems. Several studies have
proposed that this is an end of an architectural era and relational database systems have to take over.
➢ Emerging answers are Amazon Dynamo, CouchDB and ThruDB.
Amazon Dynamo
Amazon Dynamo is a widely used key-value store. It is one of the main components of Amazon.com, one of the biggest e-commerce stores in the world.
It has a primary-key-only interface. This means that data is retained as key-value pairs, and the only way to access the data is by specifying the key.
Values are expected to be small (less than 1 MB). Dynamo is designed to be highly available for writes rather than reads, since a failed write directly inconveniences the end user of the application. Therefore, any data conflicts are resolved at read time rather than at write time.
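The toy sketch below mimics this design on one machine: the put path never blocks on conflicts (each write may leave a sibling version), and reconciliation happens only when the key is read. The last-writer-wins resolver is just one illustrative policy; the real Dynamo uses vector clocks and application-level reconciliation:

import time

class ToyDynamo:
    def __init__(self):
        self.versions = {}  # key -> list of (timestamp, value) siblings

    def put(self, key, value):
        # Writes always succeed and never wait on conflict resolution.
        self.versions.setdefault(key, []).append((time.time(), value))

    def get(self, key):
        # Conflicts are resolved here, at read time.
        siblings = self.versions.get(key, [])
        if not siblings:
            raise KeyError(key)
        winner = max(siblings)         # last-writer-wins reconciliation
        self.versions[key] = [winner]  # collapse the sibling versions
        return winner[1]

store = ToyDynamo()
store.put("cart:42", ["book"])
store.put("cart:42", ["book", "pen"])  # a concurrent write leaves a sibling
print(store.get("cart:42"))            # resolved only when read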
CouchDB :
CouchDB is a document-oriented database server, accessible through REST APIs. ‘Couch’ is an acronym for ‘Cluster Of Unreliable Commodity Hardware’, emphasizing the distributed nature of the database. CouchDB is designed for document-oriented applications, for example forums, bug tracking, wikis and similar collaborative tools. CouchDB is ad hoc and schema-free with a flat address space. CouchDB aims to satisfy the Four Pillars of Data Management as follows:
Save: ACID compliant, stores data efficiently.
See: easy retrieval, straightforward reporting procedures, full-text search.
Secure: strong compartmentalization, ACLs, connections over SSL.
Share: distributed operation.
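Since CouchDB is accessed through REST APIs, a document can be created and fetched with plain HTTP calls, as the sketch below shows using the requests library. It assumes a local CouchDB on its default port 5984; the database and document names are illustrative:

import requests

base = "http://localhost:5984"

requests.put(f"{base}/forum")  # create a database

# Create a document; CouchDB assigns it a revision ("rev") for concurrency.
doc = {"title": "First post", "author": "alice", "tags": ["intro"]}
resp = requests.put(f"{base}/forum/post-001", json=doc)
rev = resp.json()["rev"]

# Read the document back over the same REST API.
fetched = requests.get(f"{base}/forum/post-001").json()
print(fetched["title"])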
ThruDB
✓ ThruDB aims to simplify the management of the modern web data layer (indexing, caching, replication, backup) by supplying a reliable set of services:
✓ Thrucene for indexing, Throxy for partitioning and load balancing, and Thrudoc for document storage.
✓ ThruDB builds on top of several open-source projects: Thrift, Lucene (indexing), Spread (message bus), Memcached (caching) and Brackup (backup to disk/S3), and it also uses Amazon S3.
✓ Thrift is a framework for efficient cross-language data serialization, RPC and server programming.
✓ Thrift is a software library and a set of code-generation tools designed to expedite the development and implementation of efficient and scalable backend services.
✓ Its primary aim is to enable efficient and reliable communication across programming languages.
✓ This is done by abstracting the portions of each language that tend to need the most customization into a common library that is implemented in each language.