CS8791 CLOUD COMPUTING
OBJECTIVES:
-To understand the concept of cloud computing.
-To appreciate the evolution of cloud from the existing technologies.
-To have knowledge on the various issues in cloud computing.
-To be familiar with the lead players in cloud.
-To appreciate the emergence of cloud as the next generation
computing paradigm.
UNIT I
INTRODUCTION
Introduction to Cloud Computing – Definition of Cloud – Evolution of
Cloud Computing –Underlying Principles of Parallel and Distributed
Computing – Cloud Characteristics – Elasticity in Cloud – On-demand
Provisioning.
CLOUD COMPUTING
Motivation:
• Promise of reduced capital
• Operating expenses
• Ease of dynamically scaling and
• Deploying new services without maintaining a dedicated
compute infrastructure
• Cloud computing is a model for enabling convenient, on-demand
network access to a shared pool of configurable computing resources
that can be rapidly provisioned and released with minimal management
effort or service provider interaction.
• The end-users need not know the details of a specific technology
while using an application, as the service is completely managed by the
Cloud Service Provider (CSP).
• On-demand service can be provided at any time.
• CSP would take care of all the necessary complex backend operations
on behalf of the user.
Cloud Computing
• Cloud Computing is the use of hardware and software to deliver a
service over a network (typically the Internet).
• With cloud computing, users can access files and use applications
from any device that can access the Internet. An example of a Cloud
Computing provider is Google's Gmail.
• Cloud Computing lets you store and access your applications or data
over remote computers instead of your own computer.
• Example: AWS, Azure, Google Cloud
Why Cloud Computing?
• Large and small businesses today depend heavily on their data and spend
a huge amount of money to maintain it.
• This requires strong IT support and a storage hub. Not all businesses can
afford the high cost of in-house IT infrastructure and backup support services.
• For them, Cloud Computing is a cheaper solution.
• Cloud computing decreases the hardware and software demand from the
user’s side.
• We have all experienced cloud computing at some point; some of
the popular cloud services we have used, or are still using, are mail
services such as Gmail, Hotmail and Yahoo.
Benefits of Cloud Computing
• Pay-per-use Model: You only have to pay for the services you use, and
nothing more!
• 24/7 Availability: It is always online! There is no such time that you cannot
use your cloud service; you can use it whenever you want.
• Easily Scalable: It is very easy to scale up and down or turn it off as per
customers’ needs.
• Security: Cloud computing offers strong data security. Especially if the
data is mission-critical, it can be wiped from local drives and kept in
the cloud only, for your access alone, to stop it ending up in the wrong
hands.
• Easily Manageable: You only have to pay subscription fees; all
maintenance, upgrades and delivery of services are completely
handled by the Cloud Provider. This is backed by a Service-Level
Agreement (SLA).
Deployment Models
• Deployment models define the type of access to the
cloud, i.e., how the cloud is located. A cloud can have
any of four types of access: Public, Private,
Hybrid, and Community.
Service Models
• Cloud computing is based on service models. These
are categorized into three basic service models which
are -
• Infrastructure-as–a-Service (IaaS)
• Platform-as-a-Service (PaaS)
• Software-as-a-Service (SaaS)
Public Cloud
• The public cloud allows systems and services to be easily accessible to the
general public. Public cloud may be less secure because of its openness.
• E.g.: Google, Facebook.
Private Cloud
• The private cloud allows systems and services to be accessible within an
organization. It is more secure because of its private nature.
• E.g.: VMware, etc.
Community Cloud
• The community cloud allows systems and services to be accessible by a
group of organizations.
Hybrid Cloud
• The hybrid cloud is a mixture of public and private cloud, in which the
critical activities are performed using private cloud while the non-critical
activities are performed using public cloud.
• E.g.: NASA (which uses a public cloud for data sharing and a private cloud
for research).
Infrastructure-as-a-Service (IaaS)
• IaaS provides access to fundamental resources such
as physical machines, virtual machines, virtual
storage, etc.
• Major IaaS players include companies like IBM,
Google and Amazon.com.
Platform-as-a-Service (PaaS)
• PaaS provides the runtime environment for
applications, development and deployment tools, etc.
• An example of PaaS is Salesforce.com
Software-as-a-Service (SaaS)
• The SaaS model allows end-users to use software applications as a
service.
• Examples of SaaS include: Google Applications and internet based
email applications like Yahoo! Mail, Hotmail and Gmail
SaaS (Software as a Service)
• Provides clients with the ability to use software applications
on a remote basis via an internet web browser.
• The software service can be purchased on a monthly,
pay-as-you-go subscription.
• Software as a service is also referred to as “software on
demand”.
• Clients can access SaaS applications from anywhere via the
web because service providers host applications and their
associated data at their location.
Primary benefits of SaaS
• Lower cost of use, since subscriber fees require a
much smaller investment
• No Licensing fees,
• No installation costs,
• No maintenance fees and
• Multi-tenant software architecture
• High manageability
• No deployment or support burden for the client
• Cost-effective: pay as you go
• Scalability: SaaS application can be easily scaled
up or down to meet consumer demand.
Consumers do not need to worry about additional
computing infrastructure to scale up.
Drawbacks of SaaS
Robustness:
• SaaS applications may not be able to provide the same level
of functionality as traditional applications. This is partly due to
current limitations of the web browser.
Privacy:
• Having all of a user’s data sit in the cloud raises security and privacy
concerns. SaaS providers are often the target of hacking exploits.
Security:
• SaaS applications are prone to attack because everything is
sent over the internet.
• Data encryption and decryption are therefore required.
Reliability:
• In the rare event of a SaaS provider going down, a wide
range of dependent clients could be affected.
• The application, data, backups, everything is in the cloud,
making it hard to recover from the server downtime.
Examples of SaaS include:
• Google Applications and
• Email applications like Yahoo! Mail, Hotmail and Gmail.
PaaS (Platform as a Service)
• Provides clients with the ability to develop and
publish customized applications in a hosted
environment via the web.
• It represents a new model for software
development that is rapidly increasing in its
popularity.
• An example of PaaS is Salesforce.com.
What is a Platform?
• The environment in which a piece of software is
executed.
• A platform is anything you can leverage to
accomplish something in a simpler, faster, or
otherwise better way.
• As a programmer, you leverage pre-existing code
rather than starting from scratch and writing
everything.
Types of PaaS
Communications platform as a service (CPaaS)
• A CPaaS is a cloud-based platform that
enables developers to add real-time
communications features (voice, video, and
messaging) in their own applications.
Mobile platform as a service
• Initiated in 2012, mobile PaaS (mPaaS)
provides development capabilities for mobile
app designers and developers.
Open PaaS
• Open PaaS provides open source software allowing a PaaS
provider to run applications in an open source environment,
such as Google App Engine. Some open platforms let the
developer use any programming language, database, operating
system or server to deploy their applications.
Advantages of PaaS
Reduced Costs
• When using a PaaS system, real savings are possible because
you do not perform the low-level work yourself and
you do not have to hire additional personnel or
pay for extra working hours.
Continuous Updates
• With a self-built platform you would have to keep in mind all the
components that need to be updated and re-integrated from time to
time to keep pace with your competitors; with PaaS, the provider
takes care of these updates.
Scalability
• Imagine a common situation with a self-built platform: a small
company begins to build an application counting on a certain
number of users; over time, things go well and the
company expands and attracts more users; as the
business grows, it requires more resources to serve the
growing number of users. With PaaS, that additional capacity can
simply be provisioned from the provider instead of rebuilding the platform.
Freedom of Action:
• This model of cloud computing is, perhaps, the most
advantageous for creative developers and companies that
need custom solutions. The low-level work is done by
professionals and numerous tools are available and ready to
operate, which saves time.
• Lower cost of use, since subscriber fees require a much smaller
investment than is typically needed when implementing
traditional tools for software development.
• Simplified testing and deployment.
• PaaS providers handle platform maintenance and system upgrades,
resulting in a more efficient and cost-effective solution for enterprise
software development.
Disadvantages of PaaS
• Dependency on Vendor
• Compatibility of Existing Infrastructure:
some difficulties and conflicts may arise when two systems
come into contact.
• Security Risks
As a rule, PaaS software is available in a public environment where
multiple end users have access to the same basic resources. For some
apps that contain sensitive data or have strict compliance
requirements, this is not a good option.
PaaS Examples
• AWS Elastic Beanstalk,
• Windows Azure,
• Heroku,
• Force.com,
• Google App Engine,
• Apache Stratos.
IAAS(Infrastructure as a Service)
- Allows clients to remotely use IT hardware and resources on a “pay-as-you-
go” basis.
- It is also referred to as HaaS (hardware as a service).
- IaaS employs virtualization, a method of creating and managing
infrastructure resources in the “cloud”
- IaaS provides small start up firms with a major advantage, since it allows
them to gradually expand their IT infrastructure without the need for large
capital investments in hardware and peripheral systems.
- Major IaaS players include companies like IBM, Google and Amazon.com.
Overview
• Problems in the conventional case:
• Companies invest for peak capacity
• Lack of agility in IT infrastructure
• IT maintenance cost for every company
• Usually suffer from hardware failure risk
• …etc.
• These IT complexities hold companies back!
• How to solve these problems?
• Let’s consider some kind of out-sourcing solution
• Somebody will handle on demand capacity for me
• Somebody will handle high available resource for me
• Somebody will handle hardware management for me
• Somebody will handle system performance for me
• Somebody will …
• Frankly, that would be a great solution IF there were “somebody”.
• But who can be this “somebody”, and provide all these services ?
• Infrastructure as a Service will be the salvation.
• IaaS cloud provider takes care of all the IT
infrastructure complexities.
• IaaS cloud provider provides all the infrastructure
functionalities.
• IaaS cloud provider guarantees qualified
infrastructure services.
• IaaS cloud provider charges clients according to their
resource usage.
• But what makes all of this happen so magically?
Virtualization
• Assume that you are going to be an IaaS cloud
provider.
• Then, what are the problems you are facing ?
• Clients will request different operating systems.
• Clients will request different storage sizes.
• Clients will request different network bandwidths.
• Clients will change their requests anytime.
• Clients will …
• Is there any good strategy ?
• Allocate a new physical machine for each incomer.
• Prepare a pool of pre-installed machines for different
requests.
• or …
Virtualization
• What if we allocate a new physical machine for each incomer?
[Slide illustration: each customer asks for a different system (“I want Windows 7”, “I want Linux”, …), so one dedicated physical machine would be needed per customer.]
Virtualization
• How about preparing a pool of pre-installed physical machines for all kinds of requests?
[Slide illustration: somebody may want Mac OS, Windows + Office, Windows Server, Linux + OpenOffice, or Linux Server, so the pool would have to anticipate every possible combination.]
Virtualization
• Obviously, neither of the previous strategies will work.
• We need more powerful techniques to deal with that.
• Virtualization techniques will help.
• For computation resources
• Virtual Machine technique
• For storage resources
• Virtual Storage technique
• For communication resources
• Virtual Network technique
For computation resources:
• A virtual machine (VM) is an operating system (OS) or application
environment that is installed on software, which imitates dedicated
hardware. The end user has the same experience on a virtual
machine as they would have on dedicated hardware.
For storage resources:
• Virtual storage is the pooling of physical storage from multiple
network storage devices into what appears to be a
single storage device that is managed from a central console.
For communication resources
• Network virtualization is the process of
combining hardware and software network resources and network
functionality into a single, software-based administrative entity,
a virtual network.
Evolution of Cloud Computing
• A recent sensation in the realm of outsourcing is Cloud
Computing.
• The cloud is a huge collection of easily accessible, virtualized
utilities that can be used and accessed from anywhere (for example
software, hardware, development and operating environments, and applications).
An Overview
• The concept of Cloud Computing came into existence
in the 1950s with the implementation of mainframe
computers, accessible via thin/static clients.
• Since then, cloud computing has evolved from
static clients to dynamic ones and from software to
services. The following timeline outlines the evolution
of cloud computing:
• In 1951, the first routine office jobs were run on the LEO computer,
which was built to handle calculations such as payroll.
• In the late 1950s, the first practical integrated circuits appeared, and
once computers could perform more complex computations, people
began building systems for business applications.
• In 1959, IBM released the 1401 model, which gave smaller businesses
access to data-processing machines.
• In 1975, Bill Gates and Paul Allen founded Microsoft.
• In 1976, Steve Jobs and Steve Wozniak founded Apple Computer.
• Also in 1976, Xerox's Robert Metcalfe published the idea of Ethernet.
• In the 1980s, Sun's machines used open networking standards,
for example TCP/IP.
• The 1980s saw more than 5 million PCs in use worldwide, though
these were mostly intended for business or government use.
• In 1981, IBM put its first PC on the market.
• In 1982, Microsoft began licensing MS-DOS, the operating system
that, thanks to large-scale marketing efforts by Microsoft, became
dominant on PCs.
• From the mid-1980s to the early-to-mid 1990s, the "web application"
emerged through the Common Gateway Interface (CGI).
• In the early 1990s, CERN released the World Wide Web for general
(that is, non-commercial) use.
• In 1994, Marc Andreessen and Jim Clark founded Netscape.
• By the mid-1990s, the Internet had become the largest and most
heavily used network of computer systems.
• The 2000s were seen as an attractive time to start or invest in an
Internet-based company.
• Amazon.com introduced Amazon Web Services in 2004. This gave
clients the ability to store data and to put a large pool of people to
work on very small tasks (for example, via Mechanical Turk), among
other services.
• In 2006, Amazon launched Elastic Compute Cloud (EC2), which
allowed individuals to access machines and run their own software
on them, entirely in the cloud.
• In 2009, Google Apps let people create and store documents
directly in the cloud.
• The other catalysts were grid computing, which allowed major problems
to be addressed via parallel computing; utility computing, which allowed
computing resources to be offered as a metered service; and SaaS, which
allowed network-based subscriptions to applications.
• Cloud computing, therefore, owes its emergence to all these factors.
UNDERLYING PRINCIPLES OF PARALLEL AND
DISTRIBUTED COMPUTING
• What is Parallel Computing?
• Parallel computing is a type of computation in which many
calculations, or the execution of processes, are carried out simultaneously.
• Whereas, a distributed system is a system whose components are
located on different networked computers which communicate and
coordinate their actions by passing messages to one another.
• Parallel computing is also called parallel processing. There are
multiple processors in parallel computing.
• Each of them performs the computations assigned to it.
• In other words, in parallel computing, multiple calculations are
performed simultaneously.
• The systems that support parallel computing can have a shared
memory or distributed memory. In shared memory systems, all the
processors share the memory. In distributed memory systems,
memory is divided among the processors
Parallel computing is the simultaneous use of multiple compute
resources to solve a computational problem:
• A problem is broken into discrete parts that can be solved
concurrently
• Each part is further broken down to a series of instructions
• Instructions from each part execute simultaneously on different
processors
• An overall control/coordination mechanism is employed
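A minimal single-machine sketch (Python) of this decomposition, assuming the problem is summing squares over a large range; the chunk count and pool size are arbitrary choices for illustration:

# Minimal sketch: a problem (summing squares over a large range) is broken
# into discrete parts, each part is computed by a different processor via a
# process pool, and a final coordination step combines the partial results.
from multiprocessing import Pool

def sum_squares(chunk):
    lo, hi = chunk
    return sum(i * i for i in range(lo, hi))

if __name__ == "__main__":
    n, parts = 1_000_000, 4
    step = n // parts
    chunks = [(k * step, (k + 1) * step) for k in range(parts)]   # discrete parts
    with Pool(processes=parts) as pool:        # parts execute on different processors
        partial_sums = pool.map(sum_squares, chunks)
    print(sum(partial_sums))                   # overall control/coordination step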
Applications of Parallel Computing:
• Databases and data mining.
• Real time simulation of systems.
• Science and Engineering.
• Advanced graphics, augmented reality and virtual reality
Advantages to parallel computing:
• As there are multiple processors working simultaneously, it increases
the CPU utilization and improves the performance.
• Moreover, failure in one processor does not affect the functionality of
other processors. Therefore, parallel computing provides reliability.
• It saves time and money, as many resources work together on a task.
• Larger problems that are impractical to solve with serial computing can be tackled.
Limitations of Parallel Computing:
• It introduces issues such as communication and synchronization between
multiple sub-tasks and processes, which are difficult to manage.
• The algorithms must be managed in such a way that they can be
handled in the parallel mechanism.
• The algorithms or program must have low coupling and high
cohesion. But it’s difficult to create such programs.
• Writing a parallelism-based program well requires more technically
skilled and expert programmers.
Types of Parallelism:
• Bit-level parallelism: It is the form of parallel computing
which is based on the increasing processor’s word size. It
reduces the number of instructions that the system must
execute in order to perform a task on large-sized data.
• Instruction-level parallelism: It is a measure of how many
instructions can be executed concurrently without affecting
the result of the program. This is called instruction-level
parallelism.
• Task Parallelism: Task parallelism employs the
decomposition of a task into subtasks and then allocating
each of the subtasks for execution. The processors perform
execution of sub tasks concurrently.
What is Distributed Computing?
• A Distributed System is composed of a collection of independent
physically (and geographically) separated computers that do not share
physical memory or a clock.
• Each processor has its own local memory
• Processors communicate using local and wide area networks.
• The nodes of a distributed system may be of heterogeneous
architectures.
• Distributed computing divides a single task between multiple
computers.
• All computers work together to achieve a common goal. Thus, they all
work as a single entity.
• A computer in the distributed system is a node while a collection of
nodes is a cluster.
• In distributed computing systems, multiple system processors can
communicate with each other using messages that are sent over the
network.
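A minimal sketch (Python) of the message-passing idea; the two "nodes" below are simulated as processes on one machine exchanging messages through a queue, standing in for real computers communicating over a network:

# Minimal sketch: two "nodes" with no shared memory cooperate purely by
# passing messages; node_b sums the numbers that node_a sends it.
from multiprocessing import Process, Queue

def node_a(out_q):
    for value in [1, 2, 3, 4, 5]:
        out_q.put(value)              # send a message to the other node
    out_q.put(None)                   # sentinel message: no more work

def node_b(in_q, result_q):
    total = 0
    while True:
        msg = in_q.get()              # receive a message
        if msg is None:
            break
        total += msg
    result_q.put(total)

if __name__ == "__main__":
    work_q, result_q = Queue(), Queue()
    a = Process(target=node_a, args=(work_q,))
    b = Process(target=node_b, args=(work_q, result_q))
    a.start(); b.start()
    a.join()
    print(result_q.get())             # 15
    b.join()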
MapReduce
MapReduce is a robust framework for managing large amounts of data.
The MapReduce framework, however, involves a lot of overhead when
dealing with iterative MapReduce computations.
Twister is a framework designed to perform iterative MapReduce efficiently.
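A toy in-memory sketch (Python) of the MapReduce pattern, assuming a word-count job; a real framework would distribute the map and reduce tasks across worker nodes:

# Minimal sketch of MapReduce on one machine: map emits (key, value) pairs,
# the pairs are shuffled (grouped by key), and reduce aggregates each group.
from collections import defaultdict

def map_phase(document):
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    return key, sum(values)

if __name__ == "__main__":
    docs = ["the cloud is elastic", "the cloud scales on demand"]
    pairs = [p for d in docs for p in map_phase(d)]                # map
    grouped = shuffle(pairs)                                       # shuffle/sort
    counts = dict(reduce_phase(k, v) for k, v in grouped.items())  # reduce
    print(counts)   # {'the': 2, 'cloud': 2, ...}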
TWISTER ARCHITECTURE
Twister is designed to effectively support iterative MapReduce
computations.
It reads data from the local disks of the worker nodes and handles the
intermediate data in the distributed memory of the worker nodes.
Twister has three main entities:
1. Client-side driver, responsible for driving the entire MapReduce computation.
2. Twister daemon, running on every worker node.
3. The broker network.
The messaging infrastructure in Twister is called the broker network, and it is
responsible for performing data transfers using publish/subscribe messaging.
• Twister starts a daemon process in each worker node, which then
establishes a connection with the broker network to receive commands
and data.
• The daemon is responsible for maintaining a worker pool to execute map
and reduce tasks, notifying status, and finally responding to control events.
Twister provides the following features to support MapReduce
computations.
• Pub/sub messaging based communication/data transfers
• Efficient support for iterative MapReduce computations
(much faster than Hadoop for such workloads)
• Combine phase to collect all reduce outputs
• Data access via local disks
• Support for typical MapReduce computations
• Tools to manage data
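As a rough single-machine illustration (Python) of why iterative MapReduce benefits from caching static data: a toy one-dimensional k-means loop re-runs map/shuffle/reduce over the same cached points each round, while only the small set of centroids changes (all values are invented for the example):

# Minimal sketch of an iterative MapReduce loop (Twister-style): the static
# input (points) stays in memory across iterations; only the small variable
# data (centroids) is updated between iterations.
def nearest(centroids, x):
    return min(centroids, key=lambda c: abs(c - x))

def kmeans_iteration(points, centroids):
    # map: assign each point to its nearest centroid -> (centroid, point)
    pairs = [(nearest(centroids, p), p) for p in points]
    # shuffle + reduce: new centroid = mean of the points assigned to it
    groups = {}
    for c, p in pairs:
        groups.setdefault(c, []).append(p)
    return sorted(sum(v) / len(v) for v in groups.values())

if __name__ == "__main__":
    points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]    # static data, loaded once
    centroids = [0.0, 10.0]                    # variable data, updated each round
    for _ in range(5):                         # iterative MapReduce driver loop
        centroids = kmeans_iteration(points, centroids)
    print(centroids)                           # roughly [1.0, 8.07]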
Access Data
To access input data for map tasks, Twister either
1. reads data from the local disks of the worker nodes, or
2. receives data directly via the broker network.
Additionally, a tool is provided to perform typical file operations such as:
(i) create directories,
(ii) delete directories,
(iii) distribute input files across worker nodes,
(iv) copy a set of resources/input files to all worker nodes,
(v) collect output files from the worker nodes to a given location
Intermediate Data
The intermediate data are stored in the distributed memory of the
worker nodes. Keeping the map output in distributed memory
enhances the speed of the computation, since the output of the
map can be sent directly from memory to the reduce tasks.
Messaging
The use of publish/subscribe messaging infrastructure improves the
efficiency of Twister runtime.
• Pub/sub is shorthand for publish/subscribe messaging, an asynchronous
communication method in which messages are exchanged between
applications without knowing the identity of the sender or recipient.
Four core concepts make up the pub/sub model:
• Topic – An intermediary channel that maintains a list of subscribers to relay
messages to that are received from publishers
• Message – Serialized messages sent to a topic by a publisher which has no
knowledge of the subscribers
• Publisher – The application that publishes a message to a topic
• Subscriber – An application that registers itself with the desired topic in
order to receive the appropriate messages
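A minimal in-process sketch (Python) of these four concepts; a real broker network decouples publishers and subscribers across machines, but the roles are the same (the topic name and messages below are invented):

# Minimal sketch of publish/subscribe: a Topic relays every published Message
# to all registered Subscribers; the Publisher never learns who receives it.
class Topic:
    def __init__(self, name):
        self.name = name
        self.subscribers = []          # callbacks registered with this topic

    def subscribe(self, callback):     # Subscriber registers itself with the topic
        self.subscribers.append(callback)

    def publish(self, message):        # Publisher sends a message to the topic
        for deliver in self.subscribers:
            deliver(message)           # topic relays the message to each subscriber

if __name__ == "__main__":
    status = Topic("task-status")
    status.subscribe(lambda m: print("logger got:", m))
    status.subscribe(lambda m: print("dashboard got:", m))
    status.publish("map task 7 finished")   # both subscribers receive it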
Fault Tolerance
There are three assumptions made when providing fault tolerance for iterative
MapReduce:
(i) failure of master node is rare and no support is provided for that.
(ii) the communication network can be made fault tolerant.
(iii) the data is replicated among the nodes of the computation
infrastructure.
Based on these assumptions, we try to handle failures of map/reduce
tasks, daemons, and worker nodes.
Potential benefits over centralized systems
• The availability of multiple processing nodes means that load can be
shared or balanced across all processing elements with the objective
of increasing throughput and resource efficiency.
• Data and processes can be replicated at a number of different sites to
compensate for failure of some nodes or to eliminate bottlenecks.
• A well designed system will be able to accommodate growth in scale
by providing mechanisms for distributing control and data.
Communication
• Communication is the central issue for distributed systems as all
process interaction depends on it.
• Exchanging messages between different components of the system
incurs delays due to data propagation, execution of communication
protocols and scheduling.
• Communication delays can lead to inconsistencies arising between
different parts of the system at a given instant in time making it
difficult to gather global information for decision making and making
it difficult to distinguish between what may be a delay and what may
be a failure.
Fault tolerance
• Fault tolerance is an important issue for distributed systems.
• Faults are more likely to occur in distributed systems than centralized
ones because of the presence of communication links and a greater
number of processing elements, any of which can fail.
• The system must be capable of reinitializing itself to a state where the
integrity of data and state of ongoing computation is preserved with
only some possible performance degradation.
Why should a system be built distributed, and not
just parallel?
• Scalability: As distributed systems do not have the problems associated
with shared memory, with the increased number of processors, they are
obviously regarded as more scalable than parallel systems.
• Reliability: The impact of the failure of any single subsystem or computer on
the network of computers defines the reliability of such a connected system.
Distributed systems definitely demonstrate a better aspect in this area
compared to parallel systems.
• Data sharing: Data sharing provided by distributed systems is similar to the
data sharing provided by distributed databases. Thus, multiple
organizations can have distributed systems with the integrated applications
for data exchange.
• Resources sharing: If there exists an expensive and a special purpose
resource or a processor, which cannot be dedicated to each processor
in the system, such a resource can be easily shared across distributed
systems.
• Heterogeneity and modularity: A system should be flexible enough to
accept a new heterogeneous processor to be added into it and one of
the processors to be replaced or removed from the system without
affecting the overall system processing capability. Distributed systems
are observed to be more flexible in this respect.
• Geographic construction: The geographic placement of different
subsystems of an application may be inherently placed as distributed.
Parallel computing vs. distributed computing:
1. Parallel computing: many operations are performed simultaneously. Distributed computing: system components are located at different locations.
2. Parallel computing: a single computer is required. Distributed computing: uses multiple computers.
3. Parallel computing: multiple processors perform multiple operations. Distributed computing: multiple computers perform multiple operations.
4. Parallel computing: may have shared or distributed memory. Distributed computing: has only distributed memory.
5. Parallel computing: processors communicate with each other through a bus. Distributed computing: computers communicate with each other through message passing.
6. Parallel computing: improves system performance. Distributed computing: improves system scalability, fault tolerance and resource-sharing capabilities.
Parallel vs Distributed Computing
Definition: Parallel computing is a computation type in which multiple processors execute multiple tasks simultaneously. Distributed computing is a computation type in which networked computers communicate and coordinate the work through message passing to achieve a common goal.
Number of computers required: Parallel computing occurs on one computer. Distributed computing occurs between multiple computers.
Processing mechanism: In parallel computing, multiple processors perform the processing. In distributed computing, computers rely on message passing.
Synchronization: In parallel computing, all processors share a single master clock for synchronization. In distributed computing there is no global clock; synchronization algorithms are used instead.
Memory: In parallel computing, computers can have shared memory or distributed memory. In distributed computing, each computer has its own memory.
Usage: Parallel computing is used to increase performance and for scientific computing. Distributed computing is used to share resources and to increase scalability.
CLOUD CHARACTERISTICS
• On Demand Self-services
• Broad Network Access
• Resource Pooling
• Rapid Elasticity
• Measured Service
• Dynamic Computing Infrastructure
• IT Service-centric Approach
• Minimally or Self-managed Platform
• Consumption-based Billing
• Multi Tenancy
• Managed Metering
On-Demand Self-Service
Cloud computing provides resources on demand, i.e. when the
consumer wants them. Self-service means that consumers perform all
the actions needed to acquire the service themselves, instead of going
through an IT department, for example.
Broad network access
Capabilities are available over the network and accessed through
standard mechanisms that promote use by heterogeneous thin or thick
client platforms (e.g., mobile phones, tablets, laptops, and
workstations).
Resource pooling
The provider’s computing resources are pooled to serve multiple
consumers using a multi-tenant model, with different physical and
virtual resources dynamically assigned and reassigned according to
consumer demand.
There is a sense of location independence in that the customer
generally has no control or knowledge over the exact location of the
provided resources but may be able to specify location at a higher level
of abstraction (e.g., country, state, or datacenter).
Examples of resources include storage, processing, memory, and
network bandwidth.
Rapid elasticity
Elasticity is basically a ‘rename’ of scalability, which has been a known
non-functional requirement in IT architectures for many years already.
Scalability is the ability to add or remove capacity, mostly processing,
memory, or both.
Measured service.
Cloud services generally charge users per hour of resource usage, or
based on the number of certain kinds of transactions that have
occurred, amount of storage in use, and the amount of data transferred
over a network. All usage is measured.
Resource usage can be monitored, controlled, and reported, providing
transparency for both the provider and consumer of the utilized
service.
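A small illustrative calculation (Python) of measured, pay-per-use billing; the usage figures and unit rates are made up for the example and are not any provider's real prices:

# Minimal metered-billing sketch: usage is measured per resource and the bill
# is the sum of usage * unit rate. All numbers here are hypothetical.
usage = {"vm_hours": 720, "storage_gb_month": 50, "data_transfer_gb": 120}
rates = {"vm_hours": 0.05, "storage_gb_month": 0.02, "data_transfer_gb": 0.09}  # assumed $/unit

bill = sum(usage[item] * rates[item] for item in usage)
for item in usage:
    print(f"{item}: {usage[item]} x {rates[item]} = {usage[item] * rates[item]:.2f}")
print(f"total: {bill:.2f}")   # 720*0.05 + 50*0.02 + 120*0.09 = 47.80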
Large Network Access
The user can access the data of the cloud or upload the data to the
cloud from anywhere just with the help of a device and an internet
connection. These capabilities are available all over the network and
accessed with the help of internet.
Availability
The capabilities of the Cloud can be modified according to use and can be
extended considerably. The cloud analyzes storage usage and allows the user to
buy extra Cloud storage, if needed, for a very small amount.
Automatic System
Cloud computing automatically analyzes the data needed and supports
a metering capability at some level of services.
We can monitor, control, and report the usage.
It will provide transparency for the host as well as the customer.
Economical
It is a one-time investment: the company (host) buys the storage once,
and parts of it can be provided to many companies, saving those companies
from monthly or yearly costs.
The only ongoing spend is on basic maintenance and a few other minor
expenses.
Security
Cloud security is one of the best features of cloud computing.
It creates snapshots of the stored data so that the data is not
lost even if one of the servers gets damaged.
The data is stored within the provider’s storage devices, where it is hard
for other persons to hack into or misuse it. The storage service is quick and
reliable.
ELASTICITY IN CLOUD
• Elasticity is the ability to grow or shrink infrastructure resources
dynamically as needed to adapt to workload changes in an autonomic
manner, maximizing the use of resources.
• Elasticity is the degree to which a system is able to adapt to workload
changes by provisioning and de-provisioning resources in an
autonomic manner, such that at each point in time the available
resources match the current demand as closely as possible.
• Elasticity in cloud infrastructure involves enabling the hypervisor to
create virtual machines or containers with the resources to meet the
real-time demand.
• Scalability often is discussed at the application layer, highlighting
capability of a system, network or process to handle a growing
amount of work, or its potential to be enlarged in order to
accommodate that growth.
• This can result in savings in infrastructure costs overall. Not everyone
can benefit from elastic services, though.
• Environments that do not experience sudden or cyclical changes in
demand may not benefit from the cost savings elastic services offer.
• Use of “Elastic Services” generally implies that all resources in the
infrastructure are elastic. This includes, but is not limited to, hardware,
software, QoS and other policies, connectivity, and other resources
that are used in elastic applications.
• This may become a negative trait where certain applications must have
guaranteed performance. It depends on the
environment.
• This is the case for businesses with dynamic resource demands like
streaming services or e-commerce marketplaces.
• Various seasonal events (like Christmas, Black Friday) and other
engagement triggers (like when HBO’s Chernobyl spiked an interest in
nuclear-related products) cause spikes of customer activity.
• These volatile ebbs and flows of workload require flexible resource
management to handle the operation consistently.
Dimensions and Core Aspects
• Any given adaptation process is defined in the context of at least
one or possibly multiple types of resources that can be scaled up
or down as part of the adaptation.
• Each resource type can be seen as a separate dimension of the
adaptation process with its own elasticity properties.
• If a resource type is a container of other resources types, like in
the case of a virtual machine having assigned CPU cores and
RAM, elasticity can be considered at multiple levels.
• Normally, resources of a given resource type can only be
provisioned in discrete units like CPU cores, virtual machines
(VMs), or physical nodes.
Speed
The speed of scaling up is defined as the time it takes to switch from an
under-provisioned state to an optimal or over-provisioned state.
The speed of scaling down is defined as the time it takes to switch from
an over-provisioned state to an optimal or under-provisioned state.
The speed of scaling up/down does not correspond directly to the
technical resource provisioning/de-provisioning time.
Precision
The precision of scaling is defined as the absolute deviation of the
current amount of allocated resources from the actual resource
demand.
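One way to write the precision of scaling as a formula, assuming R(t) is the amount of allocated resources and D(t) the actual demand over a measurement window [0, T] (the symbols and the time-averaging are our own framing, not from the source):

  precision = (1/T) * ∫₀ᵀ | R(t) − D(t) | dt

i.e., the (time-averaged) absolute deviation of allocation from demand; scaling speed is then the average length of the under- or over-provisioned periods before an optimal state is reached again.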
Motivation
• A promising way of managing and improving the utilization of data center
resources and providing a wide range of computing services
• Virtualization is a key enabling technology of cloud computing.
• System virtualization provides a way to access software and hardware resources
• Provide several concurrently usable and independent instances of virtual
execution entities, often called virtual machines (VMs).
• Dynamic resource provision and management feature is called elasticity.
• Elasticity is provisioning and de-provisioning resources in an autonomic
manner, such that at each point in time the available resources match the
current demand as closely as possible
• when VMs do not use all the provided resources, they can be logically
resized and be migrated from a group of active servers to other servers,
while the idle servers can be switched to the low-power modes (sleep or
hibernate).
• Elasticity has been used to avoid inadequate provisioning of resources
and degradation of system performance,
• to achieve cost reduction, and
• for other purposes, such as increasing the
capacity of local resources.
Over Provisioning and Under Provisioning
• Elasticity is to avoid either over provisioning or under provisioning of
resources.
• Giving a cloud user either too much or too little data and resources will put
that user at a disadvantage.
• If an enterprise has too many resources, they’ll be paying for assets they
aren’t using. If they have too few resources, they can’t run their processes
correctly.
• Elastic systems can detect changes in workflows and processes in the
cloud, automatically correcting resource provisioning to adjust for updated
user projects.
• Modern business operations live on consistent performance and instant
service availability; cloud scalability and cloud elasticity handle these two
business aspects in equal measure.
Notations and Preliminaries
• The notation describes the correlated variables used in the
following sections. To elaborate the essence of cloud elasticity, we define the
various states used in our discussion.
• Let i denote the number of VMs in service and let j be the number of
requests in the system.
(1) Just-in-Need State. A cloud platform is in a just-in-need state if i < j < 3i. Tj
is defined as the accumulated time in all just-in-need states.
(2) Over-provisioning State. A cloud platform is in an over-provisioning state
if 0<j<i. To is defined as the accumulated time in all over-provisioning states.
(3) Under-provisioning State. A cloud platform is in an under-provisioning
state if j>3i. Tu is defined as the accumulated time in all under-provisioning
states.
• In a just-in-need state, the workload can be properly handled and quality
of service (QoS) can be satisfactorily guaranteed.
• Computing resource over-provisioning, though QoS can be achieved,
leads to extra but unnecessary cost to rent the cloud resources.
• Computing resource under-provisioning, on the other hand, delays
the processing of workload and may be at the risk of breaking QoS
commitment.
• The definition of elasticity is given from a computational point of view
and we develop a calculation formula for measuring elasticity value in
virtualized clouds.
• Let Tm be the measuring time, which includes all the periods in the
just-in-need, over-provisioning, and under-provisioning states; that is,
Tm=Tj+To+Tu
• Definition: The elasticity E of a cloud platform is the percentage of
time when the platform is in just-in-need states;
That is, E = Tj / Tm
• where Tm denotes the total measuring time
• To is the over-provisioning time which accumulates each single period
of time that the cloud platform needs to switch from an over-
provisioning state to a balanced state
• Tu is the under-provisioning time which accumulates each single
period of time that the cloud platform needs to switch from an
under-provisioning state to a corresponding balanced state.
• Let Pj ,Po ,and Pu be the accumulated probabilities of just-in-need
states, over-provisioning states, and under-provisioning states,
respectively.
• If Tm is sufficiently long, we have Pj = Tj/Tm, Po = To/Tm, and Pu = Tu/Tm,
so that E = Pj = 1 − Po − Pu.
• The primary factors of elasticity, that is, the amount, frequency, and time
of resource re-provisioning, are all summarized in To and
Tu (i.e., Po and Pu).
• Elasticity can be increased by changing these factors.
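A small sketch (Python) of how E could be computed from a monitored trace of (i, j) samples using the state definitions above; the sampling interval, the handling of boundary cases, and the trace values are our own assumptions for illustration:

# Minimal sketch: classify each sampled instant as just-in-need (JIN),
# over-provisioning (OP) or under-provisioning (UP) from i (VMs in service)
# and j (requests in the system), accumulate the time in each state, and
# compute elasticity E = Tj / Tm.
def classify(i, j):
    if i < j < 3 * i:
        return "JIN"       # just-in-need: i < j < 3i
    if j <= i:
        return "OP"        # over-provisioning (boundary j == i assigned by our assumption)
    return "UP"            # under-provisioning (boundary j == 3i assigned by our assumption)

def elasticity(trace, dt=1.0):
    # trace: list of (i, j) samples taken every dt seconds (hypothetical data)
    times = {"JIN": 0.0, "OP": 0.0, "UP": 0.0}
    for i, j in trace:
        times[classify(i, j)] += dt
    tm = sum(times.values())
    return times["JIN"] / tm, times

if __name__ == "__main__":
    trace = [(2, 3), (2, 5), (2, 7), (3, 2), (3, 5), (3, 10)]
    E, times = elasticity(trace)
    print(E, times)        # E = fraction of time spent in just-in-need states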
Relevant Properties of Clouds
Resiliency
• The persistence of service delivery that can be trusted justifiably,
when facing changes.
• The Cloud has the distinct ability to be resilient,
meaning that it can rebuild or “bounce back” from the
events which affect it. ... Cloud resilience is understood as a way to
readapt to a “crisis situation”. This applies to the infrastructure and
the data.
Scalability
• Scalability reflects the performance speedup when cloud resources
are re-provisioned.
• In other words, scalability characterizes how well in terms of
performance a new compute cluster, either larger or smaller, handles
a given workload.
• Cloud scalability is impacted by quite a few factors such as the
compute node type and count and workload type and count.
Efficiency
• Efficiency characterizes how cloud resource can be efficiently utilized
as it scales up or down.
• Elasticity is closely related to efficiency of the clouds.
• Efficiency is defined as the percentage of maximum performance
(speedup or utilization) achievable.
• High cloud elasticity results in higher efficiency.
• Scalability is affected by cloud efficiency.
Elastic Cloud Platform Modeling
To model elastic cloud platforms, we make the following assumptions.
(i) All VMs are homogeneous with the same service capability and
are added/removed one at a time.
(ii) The user request arrivals are modeled as a Poisson process with
rate λ.
(iii) The service time, the start-up time, and the shut-off time of
each VM are governed by exponential distributions, each with its own
rate parameter (e.g., α and µ).
(iv) Let i denote the number of virtual machines that are currently
in service, and let j denote the number of requests that are receiving
service or in waiting.
(v) Let State v(i,j) denote the various states of a cloud platform
• Let the hypothetical just-in-need state, over-provisioning state, and
under-provisioning state be denoted JIN, OP, and UP, respectively. The
relation between the number of virtual machines and the number of
requests follows the state definitions given earlier: JIN when i < j < 3i,
OP when 0 < j < i, and UP when j > 3i.
• Cloud elasticity is a popular feature associated with scale-out solutions ,
which allows for resources to be dynamically added or removed when
needed.
When evaluating elasticity, the following points need to be checked
beforehand:
• Autonomic Scaling: What adaptation process is used for autonomic
scaling?
• Elasticity Dimensions: What is the set of resource types scaled as part of
the adaptation process?
• Resource Scaling Units: For each resource type, in what unit is the amount
of allocated resources varied?
• Scalability Bounds: For each resource type, what is the upper bound on
the amount of resources that can be allocated?
There are several types of cloud scalability:
• Vertical, aka Scale-Up - the ability to handle an increasing workload
by adding resources to the existing infrastructure. It is a short term
solution to cover immediate needs.
• Horizontal, aka Scale-Out - the expansion of the existing
infrastructure with new elements to tackle more significant workload
requirements. It is a long term solution aimed to cover present and
future resource demands with room for expansion.
• A Service Level Agreement (SLA) is the bond for performance negotiated
between the cloud services provider and the client. Earlier in cloud
computing, all Service Level Agreements were negotiated individually between
the client and the service provider.
Service Level Agreements usually specify some parameters which are
mentioned below:
• Availability of the Service (uptime)
• Latency or the response time
• Service components reliability
• Each party’s accountability
• Warranties
ON-DEMAND PROVISIONING
• On-demand computing is a delivery model in which computing resources
are made available to the user as needed.
• The resources may be maintained within the user's enterprise, or made
available by a cloud service provider.
• The on-demand model was developed to overcome the common challenge
to an enterprise of being able to meet fluctuating demands efficiently.
• Because an enterprise's demand on computing resources can vary
drastically from one time to another, maintaining sufficient resources to
meet peak requirements can be costly.
• Conversely, if an enterprise tried to cut costs by only maintaining minimal
computing resources, it is likely there will not be sufficient resources to
meet peak requirements.
• The on-demand model provides an enterprise with the ability to scale
computing resources up or down.
• The model is characterized by three attributes: scalability, pay-per-
use and self-service.
• Many on-demand computing services in the cloud are so user-friendly
that non-technical end users can easily acquire computing resources
without any help from the organization's information technology (IT)
department.
• This has advantages because it can improve business agility, but it
also has disadvantages because shadow IT can pose security risks.
Local On-demand Resource Provisioning
• Benefits of virtualizing cluster and HPC systems:
• Creates a distributed virtualization layer
• Extends the benefits of VMs from one resource to multiple resources
• Transforms a distributed physical infrastructure into a flexible and
elastic virtual infrastructure
Benefit of Remote Provisioning
The virtualization of the local infrastructure supports a virtualized
alternative to contribute resources to a Grid infrastructure.
• Simpler deployment and operation of new middleware distributions
Middleware is software which lies between an operating system and
the applications running on it. Essentially functioning as hidden
translation layer, middleware enables communication and data
management for distributed applications.
• Lower operational costs
• Easy provision of resources to more than one infrastructure or VO.
A virtual organization (VO) refers to a dynamic set of individuals or
institutions defined around a set of resource-sharing rules and
conditions.
• Easy support for VO-specific worker nodes
• Performance partitioning between local and grid clusters
In grid computing, the computers on the network can work on a task
together, thus functioning as a supercomputer.
Computer grids allow access to computing resources from many
different locations, just as the World Wide Web allows access to
information. These computing resources include data storage capacity,
computing power, sensors, visualization tools and much, much more.
VMs to Provide pre-Created Software Environments for
Jobs
• Extensions of job execution managers create VMs on a per-job
basis so as to provide a pre-defined environment for
job execution
• Those approaches still manage jobs
• The VMs are bound to a given PM (physical machine) and only exist
during job execution
• Job Execution Managers for the Management of VMs
• Those approaches manage VMs as jobs
Differences between VMs and Jobs as basic Management
Entities
• VM structure: Images with fixed and variable parts for
migration…
• VM life-cycle: Fixed and transient states for
contextualization, live migration.
• VM duration: Long time periods (“forever”)
• VM groups (services): Deploy ordering, rollback
management…
• VM elasticity: Changing of capacity requirements and
number of VMs
• Different Metrics in the Allocation of Physical Resources
Capacity provisioning:
Probability of SLA violation for a given cost of provisioning including
support for server consolidation, partitioning.
• HPC scheduling: Turnaround time, wait time, throughput
• High-performance computing (HPC) is the ability to process data and
perform complex calculations at high speeds.
• One of the best-known types of HPC solutions is the supercomputer.
A supercomputer contains thousands of compute nodes that work
together to complete one or more tasks. This is called parallel
processing. It’s similar to having thousands of PCs networked
together, combining compute power to complete tasks faster.
How Does HPC Work?
HPC solutions have three main components:
• Compute
• Network
• Storage
• To build a high-performance computing architecture, compute servers
are networked together into a cluster. Software programs and
algorithms are run simultaneously on the servers in the cluster. The
cluster is networked to the data storage to capture the output.
Together, these components operate seamlessly to complete a
diverse set of tasks.
• To operate at maximum performance, each component must keep
pace with the others
What Is HPC Cluster?
• An HPC cluster consists of hundreds or thousands of compute servers
that are networked together. Each server is called a node. The nodes
in each cluster work in parallel with each other, boosting processing
speed to deliver high-performance computing.
Advantages:
• Open and flexible architecture to integrate new virtualization
technologies
• Support for any scheduling policy
• The main benefits of HPC are speed, cost, a flexible
deployment model, fault tolerance, and total cost of
ownership, although the actual benefits realized can vary
from system to system.
• Speed. High performance is synonymous with fast
calculations. The speed of an HPC system depends on its
configuration: more clusters and cores enable faster
parallel processing. [Performance is also affected by
the software that runs on the machine, including the
operating system design and application design, and the
complexity of the problem being solved.]
• Cost. HPC is cheaper than supercomputing because it utilizes
fewer clusters. HPCs are also available as a cloud
computing option, which may result in additional savings.
HPC is usually rented via public cloud, and many cloud
computing companies offer it.
• Fault tolerance. If part of the system fails, the entire HPC
system doesn’t fail. Given that HPC workloads are compute-heavy,
fault tolerance ensures that the computations
continue uninterrupted.
• Static Provisioning:
• For applications that have predictable and generally unchanging
demands/workloads, it is possible to use “static provisioning"
effectively.
• With advance provisioning, the customer contracts with the provider
for services and the provider prepares the appropriate resources in
advance of start of service.
• The customer is charged a flat fee or is billed on a monthly basis.
• Dynamic Provisioning:
• In cases where demand by applications may change or vary, “dynamic
provisioning" techniques have been suggested whereby VMs may be
migrated on-the-fly to new compute nodes within the cloud.
• With dynamic provisioning, the provider allocates more resources as
they are needed and removes them when they are not.
• The customer is billed on a pay-per-use basis.
• When dynamic provisioning is used to create a hybrid cloud, it is
sometimes referred to as cloud bursting.
• User Self-provisioning:
With user self- provisioning (also known as cloud self- service),
the customer purchases resources from the cloud provider
through a web form, creating a customer account and paying
for resources with a credit card.
Parameters for Resource Provisioning
i) Response time: The resource provisioning algorithm designed must
take minimal time to respond when executing the task.
ii) Minimize Cost: From the Cloud user point of view cost should be
minimized.
iii) Revenue Maximization: This is to be achieved from the Cloud
Service Provider’s view.
iv) Fault tolerant: The algorithm should continue to provide service in
spite of failure of nodes.
v) Reduced SLA Violation: The algorithm designed must be able to
reduce SLA violation.
vi) Reduced Power Consumption: VM placement & migration
techniques must lower power consumption.
There are many resource provisioning techniques both static and
dynamic provisioning each having its own pros and cons.
The provisioning techniques are used to:
• improve QoS parameters,
• minimize cost for the cloud user,
• maximize revenue for the Cloud Service Provider,
• improve response time,
• deliver services to cloud users even in the presence of failures,
• improve performance and reduce SLA violations,
• use cloud resources efficiently and reduce power consumption.
• Static Resource Provisioning Techniques
• Aneka’s deadline-driven provisioning technique is used for
scientific applications, as scientific applications require large
computing power.
• Aneka is a cloud application platform which is capable of
provisioning resources which are obtained from various sources
such as public and private clouds, clusters, grids and desktop
grids.
• Because resource failures are unavoidable, an architectural
framework is needed for realizing the full
potential of hybrid clouds.
• Aneka proposes a failure-aware resource provisioning
algorithm that is capable of meeting cloud users’ QoS
requirements.
• This provides resource provisioning policies and proposes a
scalable hybrid infrastructure to assure QoS of the users.
Issues:
• Since resources held by a single cloud are usually limited, it is
better to get resources from other participating clouds.
• But it is difficult to provision the right resources from different
cloud providers, because management policies differ
and resources are described differently in each
organization.
• Also interoperability is hard to achieve.
• To overcome this, Inter Cloud Resource Provisioning (ICRP)
system is proposed where resources and tasks are described
and using a semantic scheduler and a set of inference rules
resources are assigned.
• With the increasing functionality and complexity in Cloud
computing, resource failure cannot be avoided.
• So the proposed strategy addresses the question of
provisioning resources to applications.
• It takes into account the workload model and the failure
correlations to redirect requests to appropriate cloud
providers.
• Dynamic Resource provisioning Techniques
• The algorithm proposed is suitable for web applications, where
response time is one of the important factors.
• For web applications guaranteeing average response time is difficult
because traffic patterns are highly dynamic and difficult to predict
accurately and also due to the complex nature of the multi-tier web
applications it is difficult to identify bottlenecks and resolving them
automatically.
• This provisioning technique proposes a working prototype system for
automatic detection and resolution of bottlenecks in a multi-tier
cloud hosted web applications.
• This improves response time and also identifies over provisioned
resources.
• VM-based resource management is a heavyweight task, so it is less
flexible and less resource-efficient.
• To overcome this, a lightweight approach called Elastic Application
Container [EAC] is used for provisioning the resources where EAC is a
virtual resource unit for providing better resource efficiency and more
scalable applications.
• This EAC-oriented platform and algorithm support multi-tenant
cloud use.
• But dynamic creation is done by building the required components from
scratch.
• Even though multi-tenant systems save cost, they incur huge
reconfiguration costs.
• A novel user interface-tenant selector (UTC) model is proposed,
which considers functional, non-functional and resource-allocation
requirements.
• So cost and time are saved in this approach.
• Adaptive power-aware virtual machine provisioning (APA-VMP),
where the resources are provisioned dynamically from the resource
pool.
• Server Consolidation is a technique to save on energy costs
in virtualized data centers.
• The instantiation of a given set of Virtual Machines (VMs) to
Physical Machines (PMs) can be thought of as a provisioning
step where amount of resources to be allocated to a VM is
determined and a placement step which decides which VMs
can be placed together on physical machines thereby
allocating VMs to PMs.
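A rough sketch (Python) of the placement step as a first-fit heuristic that packs VMs onto as few physical machines as possible; the capacities and demands are abstract units invented for the example, not a real consolidation algorithm:

# Minimal sketch of a placement step: assign VMs (with resource demands) to
# physical machines (with capacities) using first-fit, so fewer PMs stay
# powered on. Sizes are abstract units, not real hardware specifications.
def first_fit_placement(vm_demands, pm_capacity):
    pms = []                               # each entry = remaining capacity of one PM
    placement = []                         # (vm_index, pm_index) pairs
    for vm_idx, demand in enumerate(vm_demands):
        for pm_idx, free in enumerate(pms):
            if demand <= free:
                pms[pm_idx] -= demand      # place VM on an already-used PM
                placement.append((vm_idx, pm_idx))
                break
        else:
            pms.append(pm_capacity - demand)   # power on a new PM
            placement.append((vm_idx, len(pms) - 1))
    return placement, len(pms)

if __name__ == "__main__":
    placement, used_pms = first_fit_placement([4, 3, 2, 5, 1, 2], pm_capacity=8)
    print(placement, "PMs used:", used_pms)   # 3 PMs suffice for these demands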
On-demand provisioning techniques, their merits and challenges:
1. Deadline-driven provisioning of resources for scientific applications in hybrid clouds with Aneka.
Merits: able to efficiently allocate resources from different sources in order to reduce application execution times.
Challenges: not suitable for HPC data-intensive applications.
2. Dynamic provisioning in multi-tenant service clouds.
Merits: matches tenant functionalities with client requirements.
Challenges: does not work for testing on a real-life cloud-based system and across several domains.
3. Elastic Application Container: a lightweight approach for cloud resource provisioning.
Merits: outperforms in terms of flexibility and resource efficiency.
Challenges: not suitable for web applications and supports only one programming language, Java.
4. Hybrid cloud resource provisioning policy in the presence of resource failures.
Merits: able to adapt to the user workload model to provide flexibility in the choice of strategy based on the desired level of QoS, the needed performance, and the available budget.
Challenges: not suitable to run real experiments.
5. Provisioning of requests for virtual machine sets with placement constraints in IaaS clouds.
Merits: runtime efficient, can provide an effective means of online VM-to-PM mapping, and also maximizes revenue.
Challenges: not practical for medium to large problems.
6. Failure-aware resource provisioning for hybrid cloud infrastructure.
Merits: able to improve the users' QoS by about 32% in terms of deadline violation rate and 57% in terms of slowdown, with a limited cost on a public cloud.
Challenges: not able to run real experiments, and not able to move VMs between public and private clouds to deal with resource failures in the local infrastructures.
7. VM provisioning method to improve the profit and SLA violation of cloud service providers.
Merits: reduces SLA violations and improves profit.
Challenges: increases the problem of resource allocation and load balancing among the datacenters.
8. Risk-aware provisioning and resource-aggregation-based consolidation of virtual machines.
Merits: significant reduction in the number of servers required to host 1000 VMs and enables unnecessary servers to be turned off.
Challenges: takes into account only the CPU requirements of VMs.
9. Semantic-based resource provisioning and scheduling in an inter-cloud environment.
Merits: enables the fulfillment of customer requirements to the maximum by providing additional resources to the cloud system participating in a federated cloud environment, thereby solving the interoperability problem.
Challenges: QoS parameters like response time and throughput have to be achieved for interactive applications.
10. Design and implementation of adaptive power-aware virtual machine provisioning (APA-VMP) using swarm intelligence.
Merits: efficient VM placement and significant reduction in power.
Challenges: not suitable for conserving power in modern data centers.
11. Adaptive resource provisioning for read-intensive multi-tier applications in the cloud.
Merits: automatic identification and resolution of bottlenecks in multi-tier web applications hosted on a cloud.
Challenges: not suitable for n-tier clustered applications hosted on a cloud.
12. Optimal resource provisioning for cloud computing environments.
Merits: efficiently provisions cloud resources for SaaS users with a limited budget and deadline, thereby optimizing QoS.
Challenges: applicable only for SaaS users and SaaS providers.
CHAPTER-2
CLOUD ENABLING
TECHNOLOGIES
• SERVICE-ORIENTED ARCHITECTURE (SOA)
• A service-oriented architecture (SOA) is essentially a collection of
services.
• These services communicate with each other.
• The communication can involve either simple data passing or it could
involve two or more services coordinating some activity.
• Some means of connecting services to each other is needed.
• Service-Oriented Architecture (SOA) is a style of software design
where services are provided to the other components by application
components, through a communication protocol over a network.
• Its principles are independent of vendors and other technologies.
In service oriented architecture, a number of services communicate
with each other, in one of two ways: through passing data or through
two or more services coordinating an activity.
• Services
If a service-oriented architecture is to be effective, we need a clear understanding of the term service.
A service is a function that is well-defined, self-contained, and does not depend on the context or state of other services.
• Connections
The technology of Web Services is the most likely connection
technology of service-oriented architectures. The following figure
illustrates a basic service-oriented architecture.
• It shows a service consumer at the right sending a service request
message to a service provider at the left.
• The service provider returns a response message to the service
consumer.
• The request and subsequent response connections are defined in
some way that is understandable to both the service consumer and
service provider.
• A service provider can also be a service consumer.
• Web services which are built as per the SOA architecture are
more independent.
• The web services themselves can exchange data with each
other and because of the underlying principles on which
they are created, they don't need any sort of human
interaction and also don't need any code modifications.
• It ensures that the web services on a network can interact
with each other.
Benefits of SOA
• Language-Neutral Integration: Regardless of the development language used, the system offers and invokes services through a common mechanism. Programming-language neutrality is one of the key benefits of SOA's integration approach.
• Component Reuse: Once an organization has built an application component and offered it as a service, the rest of the organization can utilize that service.
• Organizational Agility: SOA defines building blocks of capabilities provided by software that meet organizational requirements; these can be recombined and integrated rapidly.
• Leveraging Existing Systems: One of the major uses of SOA is to classify elements or functions of existing applications and make them available to the organization or enterprise.
SOA Architecture
• SOA architecture is viewed as five horizontal layers, described below:
• Consumer Interface Layer: GUI-based applications for end users accessing the application.
• Business Process Layer: Business use cases in terms of the application.
• Services Layer: The services defined across the whole enterprise in the service inventory.
• Service Component Layer: The components used to build the services, such as functional and technical libraries.
• Operational Systems Layer: Contains the data model.
SOA Governance
• SOA governance focuses on managing business services.
• In a service-oriented organization, everything should be characterized as a service.
• The value of governance becomes clear when we consider the amount of risk it eliminates: a good understanding of services, organizational data and processes is needed in order to choose approaches, define policies and monitor their performance impact.
SOA Architecture Protocol
• The SOA protocol stack shows each protocol along with the relationships among the protocols.
• These components are often programmed to comply with SCA (Service Component Architecture).
• The components are written in BPEL (Business Process Execution Language), Java, C#, XML, etc., and the approach can equally apply to C++ or FORTRAN, or to other modern multi-purpose languages such as Python or Ruby.
SOA Security
• With the vast use of cloud technology and its on-demand
applications, there is a need for well - defined security policies and
access control.
• As these issues are addressed, the success of the SOA architecture will increase.
• Actions can be taken to ensure security and lessen the risks when
dealing with SOE (Service Oriented Environment).
Elements of SOA
• SOA is based on some key principles which are mentioned below
Standardized Service Contract - A service must have some sort of
description which describes what the service is about.
• This makes it easier for client applications to understand what the
service does.
Loose Coupling – Services should have minimal dependency on each other.
• This is one of the main characteristics of web services; it states that there should be as little dependency as possible between a web service and the client invoking it.
• So if the service functionality changes at any point in time, it should not break the client application or stop it from working.
Service Abstraction - Services hide the logic they encapsulate
from the outside world.
• The service should not expose how it executes its functionality;
• it should tell the client application what it does, not how it does it.
Service Reusability
• Logic is divided into services with the intent of maximizing
reuse.
• In any development organization, reusability is a big topic, because one would not want to spend time and effort building the same code again and again across the multiple applications that require it.
• Hence, once the code for a web service is written, it should have the ability to work with various application types.
• Service Autonomy - Services should have control over the logic they
encapsulate. The service knows everything on what functionality it
offers and hence should also have complete control over the code it
contains.
• Service Statelessness - Ideally, services should be stateless. This means that a service should not retain information from one request to the next; if state is required, it should be maintained by the client application.
• Service Discoverability - Services can be discovered (usually in a service registry). We have already seen this in the concept of UDDI (Universal Description, Discovery and Integration), which provides a registry that can hold information about web services.
• Service Composability - Services break big problems into little
problems. One should never embed all functionality of an application
into one single service but instead, break the service down into
modules each with a separate business functionality.
• Service Interoperability - Services should use standards that allow diverse subscribers to use the service. In web services, standards such as XML and communication over HTTP are used to ensure conformance to this principle.
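As a rough illustration of a standardized contract combined with loose coupling and service abstraction, consider the hypothetical Java sketch below. The interface and its implementation are illustrative assumptions, not a real deployed service.

```java
// Hypothetical service contract: clients depend only on this interface,
// not on how the provider implements it (standardized contract, abstraction).
public interface CurrencyConversionService {
    // Well-defined, self-contained operation: no client context or state is required.
    double convert(String fromCurrency, String toCurrency, double amount);
}

// One possible provider implementation; it can change freely without breaking
// client applications, as long as the contract stays the same (loose coupling).
class FixedRateConversionService implements CurrencyConversionService {
    @Override
    public double convert(String fromCurrency, String toCurrency, double amount) {
        double rate = "USD".equals(fromCurrency) && "EUR".equals(toCurrency) ? 0.9 : 1.0;
        return amount * rate;
    }
}
```

In an SOA this contract would typically be published as a WSDL or REST description rather than a Java interface, but the separation of "what the service does" from "how it does it" is the same idea.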
Service-Oriented Architecture Patterns
• There are three roles in each of the Service-Oriented Architecture
building blocks: service provider; service broker and service
requester/consumer.
• The service provider works in conjunction with the service registry, deciding why and how the services are offered, including security, availability, what to charge, and more.
• The service broker makes information regarding the service available
to those requesting it. The scope of the broker is determined by
whoever implements it.
• The service requester locates entries in the broker registry and then
binds them to the service provider. They may or may not be able to
access multiple services; that depends on the capability of the service
requester.
Implementing Service-Oriented Architecture
• There is a wide range of technologies that can be used,
depending on what your end goal is and what you’re trying
to accomplish.
• Typically, Service-Oriented Architecture is implemented with
web services, which makes the “functional building blocks
accessible over standard internet protocols.”
• An example of a web service standard is SOAP, which stands
for Simple Object Access Protocol. SOAP is a messaging
protocol specification for exchanging structured information
in the implementation of web services in computer
networks.
Here are some examples of SOA at work
• To deliver services outside the firewall to new markets:
• First Citizens Bank not only provides services to its own
customers, but also to about 20 other institutions, including
check processing, outsourced customer service, and "bank in
a box" for getting community-sized bank everything they
need to be up and running. Underneath these services is an
SOA-enabled mainframe operation.
To provide real-time analysis of business events:
• Through real-time analysis, OfficeMax is able to order out-of-
stock items from the point of sale, employ predictive
monitoring of core business processes such as order
fulfillment, and conduct real-time analysis of business
transactions, to quickly measure and track product affinity,
hot sellers, proactive inventory response and price error
checks
• To streamline the business: Whitney National Bank built a winning
SOA formula that helped the bank attain measurable results on a
number of fronts, including cost savings, integration, and more
impactful IT operations. Metrics and progress are tracked month to
month.
• To speed time to market: This may be the competitive advantage
available to large enterprises
• To improve federal government operations: The US Government
Accountability Office (GAO) issued guidelines intended to help
government agencies achieve enterprise transformation through
enterprise architecture. The guidelines and conclusions offer a strong
business case for commercial businesses also seeking to achieve
greater agility and market strength through shared IT services.
• Within the federal government, the Office of Management and
Budget (OMB) has encouraged development of centers of excellence
that provide IT services to multiple customers. Among them are four
financial centers of excellence operated by the Bureau of Public Debt,
General Services Administration, Department of the Interior and the
Department of Transportation.
• To improve state and local government operations: The money isn’t
there to advance new initiatives, but state governments may have
other tools at their disposal to drive new innovations — through
shared IT services.
Ex: Governments at all levels are increasingly turning to these channels
to improve their communications with and responsiveness to the
public. Government records and data that were previously isolated and
difficult for the public to access are now becoming widely available
through APIs (application programming interfaces).
• To improve healthcare delivery: SOA can improve the delivery of important information and make the sharing of data across a community of care practical in terms of cost, security, and risk of deployment.
• Healthcare organizations today are challenged to manage a growing
portfolio of systems.
• The cost of acquiring, integrating, and maintaining these systems is
rising, while end-user demands are increasing.
• In addition to all these factors, there are increased demands for
enabling interoperability between other healthcare organizations to
regionally support care delivery.
• SOA offers system design and management principles that support the reuse and sharing of system resources across the organization, which are potentially very valuable to a typical healthcare organization.
• To support online business offerings: Thomson Reuters, a provider of
business intelligence information for businesses and professionals,
maintains a stable of 4,000 services that it makes available to outside
customers.
• For example, one such service, Thomson ONE Analytics, delivers a
broad and deep range of financial content to Thomson Reuters
clientele. Truly an example of SOA supporting the cloud.
• To virtualize history: Colonial Williamsburg, Virginia, is implementing a virtualized, tiered storage pool to manage its information and content.
• To defend the universe: US Air Force space administrator announced
that new space-based situational awareness systems will be deployed
on service oriented architecture-based infrastructure.
• The importance of Service-Oriented Architecture
• There are a variety of ways that implementing an SOA structure can
benefit a business, particularly, those that are based around web
services. Here are some of the foremost:
• Creates reusable code
• The primary motivator for companies to switch to an SOA is the
ability to reuse code for different applications.
• By reusing code that already exists within a service, enterprises can
significantly reduce the time that is spent during the development
process.
• It also lowers costs that are often incurred during the development of
applications.
• Since SOA allows varying languages to communicate through a central
interface, this means that application engineers do not need to be
concerned with the type of environment in which these services will
be run.
• Promotes interaction
Interoperability refers to the sharing of data.
• The more interoperable software programs are, the easier it is for
them to exchange information.
• Software programs that are not interoperable need to be integrated.
Therefore, SOA integration can be seen as a process that enables
interoperability. A goal of service-orientation is to establish native
interoperability within services in order to reduce the need for
integration
• Allows for scalability
• When developing applications for web services, one issue that is of
concern is the ability to increase the scale of the service to meet the
needs of the client.
• By using an SOA where there is a standard communication protocol in
place, enterprises can drastically reduce the level of interaction that is
required between clients and services
• This reduction means that applications can be scaled without putting added pressure on the application, as would be the case in a tightly coupled environment.
Reduced costs
• This cost reduction is facilitated by the fact that loosely coupled
systems are easier to maintain and do not necessitate the need for
costly development and analysis.
• Furthermore, the increasing popularity of SOA means that reusable business functions are becoming commonplace for web services, which drives costs lower.
REST AND SYSTEMS OF SYSTEMS
• Representational state transfer (REST) is a software
architectural style that defines a set of constraints to be used
for creating Web services.
• Web services that conform to the REST architectural style,
called RESTful Web services, provide interoperability
between computer systems on the internet.
• Representational State Transfer (REST) is an architecture
principle in which the web services are viewed as resources
and can be uniquely identified by their URLs.
• "Web resources" were first defined on the World Wide Web as
documents or files identified by their URLs.
• However, today they have a much more generic and abstract
definition that encompasses every thing, entity, or action that can be
identified, named, addressed, handled, or performed, in any way
whatsoever, on the Web.
• In a RESTful Web service, requests made to a resource's URI will elicit
a response with a payload formatted in HTML, XML, JSON, or some
other format.
• The response can confirm that some alteration has been made to the
resource state, and the response can provide hypertext links to other
related resources.
• When HTTP is used, as is most common, the operations (HTTP
methods) available are GET, HEAD, POST, PUT, DELETE, CONNECT,
OPTIONS and TRACE.
• By using a stateless protocol and standard operations, RESTful
systems aim for fast performance, reliability, and the ability to grow
by reusing components that can be managed and updated without
affecting the system as a whole, even while it is running.
• The key characteristic of a REST Web service is the explicit use of
HTTP methods to denote the invocation of different operations.
• The REST architecture involves client and server interactions built
around the transfer of resources.
• The Web is the largest REST implementation.
• REST may be used to capture website data through
interpreting extensible markup language (XML) Web page
files with the desired data.
• In addition, online publishers use REST when providing
syndicated content to users.
• Content syndication is when web-based content is re-published by a
third-party website.
Architectural properties
• Performance in component interactions, which can be the dominant factor in user-perceived performance and network efficiency;
• scalability, allowing the support of large numbers of components and interactions among components;
• simplicity of a uniform interface;
• modifiability of components to meet changing needs (even while the application is running);
• visibility of communication between components by service agents;
• portability of components by moving program code with the data;
• reliability in the resistance to failure at the system level in the presence of failures within components, connectors, or data.
• Basic REST constraints include:
• Client and Server: The client and server are separated through a
uniform interface, which improves client code portability.
• Stateless: Each client request must contain all required data for
request processing without storing client context on the server.
• Cacheable: Responses (such as Web pages) can be cached on a client computer to speed up Web browsing. Responses are defined as cacheable or non-cacheable to prevent clients from reusing stale or inappropriate data when responding to further requests.
• Layered System: Enables clients to connect to the end server through
an intermediate layer for improved scalability. If a proxy or load
balancer is placed between the client and server, it won't affect their
communications and there won't be a need to update the client or
server code.
Uniform Interface: A key constraint that differentiates a REST API from a non-REST API. It suggests that there should be a uniform way of interacting with a given server irrespective of the device or type of application (website, mobile app).
The four guiding principles of the Uniform Interface are:
• Resource-Based: Individual resources are identified in requests, for example /api/users.
• Manipulation of Resources Through Representations: The client holds a representation of a resource, and it contains enough information to modify or delete the resource on the server, provided it has permission to do so.
• Self-descriptive Messages: Each message includes enough information to describe how to process it, so that the server can easily analyse the request.
• Hypermedia as the Engine of Application State (HATEOAS): Responses contain links that let the client discover the related resources and actions that are currently available.
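The sketch below shows how this uniform interface typically looks in code. It uses the JAX-RS annotations mentioned later in this chapter; the resource name /users and the behaviour of each method are illustrative assumptions only.

```java
// Minimal JAX-RS sketch of a resource-based uniform interface: one resource URI
// is manipulated through the standard HTTP methods rather than custom operations.
import javax.ws.rs.DELETE;
import javax.ws.rs.GET;
import javax.ws.rs.PUT;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

@Path("/users")                 // Resource-based: the /users URI identifies the collection
public class UserResource {

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public String getUser(@PathParam("id") String id) {
        // Self-descriptive message: the method and media type tell the server
        // (and the client) how the message should be processed.
        return "{\"id\": \"" + id + "\"}";
    }

    @PUT
    @Path("/{id}")
    public void updateUser(@PathParam("id") String id, String representation) {
        // Manipulation through representations: the client sends a representation
        // containing enough information to modify the resource.
    }

    @DELETE
    @Path("/{id}")
    public void deleteUser(@PathParam("id") String id) {
        // Stateless: every request carries all the information needed to process it.
    }
}
```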
The basic REST design principle uses typical
operations:
GET
The GET method requests a representation of the specified resource.
Requests using GET should only retrieve data.
HEAD
The HEAD method asks for a response identical to that of a GET request,
but without the response body.
POST
The POST method is used to submit an entity to the specified resource
PUT
The PUT method replaces all current representations of the target
resource with the request payload.
DELETE
The DELETE method deletes the specified resource.
CONNECT
The CONNECT method establishes a tunnel to the server identified by the
target resource.
OPTIONS
The OPTIONS method is used to describe the communication options for
the target resource.
TRACE
The TRACE method performs a message loop-back test along the path to
the target resource.
PATCH
The PATCH method is used to apply partial modifications to a resource.
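The following minimal sketch shows a client invoking two of these operations with the standard java.net.http client (Java 11+). The URL is a placeholder assumption, not a real service.

```java
// A minimal sketch of calling a RESTful service with standard HTTP methods.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestClientExample {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // GET: retrieve a representation of the resource identified by the URI.
        HttpRequest get = HttpRequest.newBuilder()
                .uri(URI.create("https://example.com/api/users/42"))
                .GET()
                .build();
        HttpResponse<String> getResponse =
                client.send(get, HttpResponse.BodyHandlers.ofString());
        System.out.println("GET status: " + getResponse.statusCode());

        // POST: submit a new entity (here a small JSON payload) to the collection resource.
        HttpRequest post = HttpRequest.newBuilder()
                .uri(URI.create("https://example.com/api/users"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"name\":\"alice\"}"))
                .build();
        HttpResponse<String> postResponse =
                client.send(post, HttpResponse.BodyHandlers.ofString());
        System.out.println("POST status: " + postResponse.statusCode());
    }
}
```

Because the request is stateless and uses only standard methods, the same client code works against any server that exposes the resource, which is exactly the interoperability benefit described below.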
• The major advantages of REST-services are:
• They are highly reusable across platforms (Java, .NET, PHP, etc) since
they rely on basic HTTP protocol
• They use basic XML instead of the complex SOAP and are easily used
• In comparison to SOAP based web services, the programming model
is simpler
• REST allows a greater variety of data formats, whereas SOAP only allows XML.
• REST is generally considered easier to work with, particularly when working with data.
• REST offers better support for browser clients.
• REST provides superior performance, particularly through caching of information.
• REST is the style used most often by major services such as Yahoo, eBay, Amazon, and even Google.
• REST is generally faster and uses less bandwidth. It’s also easier to
integrate with existing websites with no need to refactor site
infrastructure. This enables developers to work faster rather than spend
time rewriting a site from scratch. Instead, they can simply add additional
functionality
Five key principles are:
• Give every “thing” an ID
• Link things together
• Use standard methods
• Resources with multiple representations
• Communicate statelessly
• Overview of Architecture
• In J2EE applications, the Java API or services are exposed as either
Stateless Session Bean API (Session Façade pattern) or as SOAP web
services.
• When these services must be integrated with client applications built on non-Java technologies such as .NET or PHP, working with SOAP web services becomes very cumbersome and involves considerable development effort.
• The Enterprise Information Systems Tier
• The enterprise information systems (EIS) tier consists of database servers,
enterprise resource planning systems, and other legacy data sources, like
mainframes. These resources typically are located on a separate machine than
the Java EE server, and are accessed by components on the business tier.
• Java EE Technologies Used in the EIS Tier
• The following Java EE technologies are used to access the EIS tier in Java EE
applications:
• The Java Database Connectivity API (JDBCTM)
• The Java Persistence API
• The Java EE Connector Architecture
• The Java Transaction API (JTA)
• The Business Tier
• The business tier consists of components that provide the business logic for an
application. Business logic is code that provides functionality to a particular business
domain, like the financial industry, or an e-commerce site. In a properly designed
enterprise application, the core functionality exists in the business tier components.
• Java EE Technologies Used in the Business Tier
The following Java EE technologies are used in the business tier in Java EE applications:
• Enterprise JavaBeans (enterprise bean) components
• JAX-RS RESTful web services (JAX-RS is a Java API and specification that provides support for creating RESTful web services)
• JAX-WS web service endpoints
• Java Persistence API entities
The Web Tier
The web tier consists of components that handle the interaction between
clients and the business tier. Its primary tasks are the following:
• Dynamically generate content in various formats for the client.
• Collect input from users of the client interface and return appropriate
results from the components in the business tier.
• Control the flow of screens or pages on the client.
• Maintain the state of data for a user's session.
• Perform some basic logic and hold some data temporarily in JavaBeans
components.
• The architecture consists of a Front Controller
which acts as the central point for receiving
requests and providing response to the
clients.
• The Front Controller gives the request
processing to the ActionController, which
contains the processing logic of this
framework.
• The ActionController performs validation, maps the request to the appropriate Action and invokes the action to generate the response.
• Validating a website is the process of
ensuring that the pages on
the website conform to the norms or
standards defined by various organizations.
• Various Helper Services are provided for
request processing, logging and exception
handling which can be used by the
ActionController as well as the individual
Actions.
• Service Client
• This is a client application which needs to invoke the service. This
component can be either Java-based or any other client as long as it is
able to support the HTTP methods
• Common Components
• These are the utility services required by the framework like logging,
exception handling and any common functions required for
implementation.
WEB SERVICES
• What is Web Service?
• Type of Web Service
• Web Services Advantages
• Web Service Architecture
• Web Service Characteristics
• What is Web Service?
• Web service is a standardized medium to propagate communication
between the client and server applications on the World Wide Web.
• A web service is a software module which is designed to perform a
certain set of tasks.
• Web service is a service offered by a server running on a computer
device, listening for requests at a particular port over a network,
serving web documents (HTML, JSON, XML, images), and
creating web applications services, which serve in solving specific
domain problems over the Web (WWW, Internet, HTTP)
• In a Web service a Web technology such as HTTP is used for
transferring machine-readable file formats such as XML and JSON.
• The above diagram shows a very simplistic view of how a web service
would actually work.
• The client would invoke a series of web service calls via requests to a
server which would host the actual web service.
• These requests are made through what is known as remote
procedure calls. Remote Procedure Calls(RPC) are calls made to
methods which are hosted by the relevant web service.
• The main component of a web service is the data which is transferred
between the client and the server, and that is XML.
• XML stands for eXtensible Markup Language.
• XML was designed to store and transport data.
• XML was designed to be both human- and machine-readable.
• XML (Extensible Markup Language) is an easy-to-understand intermediate language that is understood by many programming languages.
• Web services use something known as SOAP (Simple Object Access
Protocol) for sending the XML data between applications. The data is
sent over normal HTTP.
• The data which is sent from the web service to the application is
called a SOAP message. The SOAP message is nothing but an XML
document. Since the document is written in XML, the client
application calling the web service can be written in any programming
language.
• Type of Web Service
• There are mainly two types of web services.
• SOAP web services.
• RESTful web services.
• In order for a web service to be fully functional, there are certain
components that need to be in place. These components need to be
present irrespective of whatever development language is used for
programming the web service.
• SOAP (Simple Object Access Protocol)
• SOAP is known as a transport-independent messaging protocol. SOAP
is based on transferring XML data as SOAP Messages. Each message
has something which is known as an XML document.
• Only the structure of the XML document follows a specific pattern, not the content. The best part of web services and SOAP is that it is all sent via HTTP, which is the standard web protocol.
• Each SOAP document needs to have a root element known as the
<Envelope> element. The root element is the first element in an XML
document.
• The "envelope" is in turn divided into 2 parts.
• The first is the header, and the next is the body.
• The header contains the routing data which is basically the
information which tells the XML document to which client it needs to
be sent to.
• The body will contain the actual message.
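The sketch below builds such an envelope programmatically with the SAAJ API (javax.xml.soap), just to make the Envelope/Header/Body structure visible; SAAJ ships with Java EE (and Java SE 8) and needs a separate dependency on newer JDKs. The element names and namespace used here are placeholder assumptions.

```java
// A minimal sketch of constructing a SOAP message: the Envelope contains a Header
// (routing information) and a Body (the actual message payload).
import javax.xml.soap.MessageFactory;
import javax.xml.soap.SOAPBody;
import javax.xml.soap.SOAPEnvelope;
import javax.xml.soap.SOAPHeader;
import javax.xml.soap.SOAPMessage;

public class SoapEnvelopeExample {
    public static void main(String[] args) throws Exception {
        SOAPMessage message = MessageFactory.newInstance().createMessage();
        SOAPEnvelope envelope = message.getSOAPPart().getEnvelope();

        // Header: routing data telling where the document needs to be sent.
        SOAPHeader header = envelope.getHeader();
        header.addChildElement("To", "ex", "http://example.com/soap")
              .addTextNode("http://example.com/orderService");

        // Body: the actual message content.
        SOAPBody body = envelope.getBody();
        body.addChildElement("GetOrderStatus", "ex", "http://example.com/soap")
            .addTextNode("ORDER-1001");

        message.saveChanges();
        message.writeTo(System.out);   // prints the XML document that travels over HTTP
    }
}
```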
• WSDL (Web services description language)
• The client invoking the web service should know where the web service
actually resides.
• Secondly, the client application needs to know what the web service
actually does, so that it can invoke the right web service.
• This is done with the help of the WSDL, known as the Web services
description language.
• The WSDL file is again an XML-based file which basically tells the client
application what the web service does. By using the WSDL document, the
client application would be able to understand where the web service is
located and how it can be utilized.
• <message> - The message parameter in the WSDL definition is used
to define the different data elements for each operation performed
by the web service.
• <portType> - This actually describes the operation which can be
performed by the web service
• <binding> - This element contains the protocol which is used. So in
our case, we are defining it to use http
(http://schemas.xmlsoap.org/soap/http).
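To relate these WSDL elements to code, the following JAX-WS sketch publishes a simple endpoint; the generated WSDL then describes it: each exposed method becomes an operation in the portType, its parameters appear in message elements, and the SOAP-over-HTTP binding appears in the binding element. The service name and address are illustrative assumptions, and the JAX-WS runtime is bundled with Java SE 8 but is a separate dependency on newer Java versions.

```java
// A minimal JAX-WS sketch whose WSDL is generated automatically when published.
import javax.jws.WebMethod;
import javax.jws.WebService;
import javax.xml.ws.Endpoint;

@WebService
public class OrderStatusService {

    @WebMethod
    public String getOrderStatus(String orderId) {
        // This operation will appear in the WSDL <portType>.
        return "Order " + orderId + " is SHIPPED";
    }

    public static void main(String[] args) {
        // Publishing the endpoint makes the WSDL available at the address + "?wsdl".
        Endpoint.publish("http://localhost:8080/orders", new OrderStatusService());
    }
}
```

A client application first downloads this WSDL, learns where the service lives and which operations it offers, and only then sends SOAP requests to it.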
• Universal Description, Discovery, and Integration (UDDI)
• The Universal Description, Discovery and Integration (UDDI)
specifications define a registry service for Web services and for other
electronic and non-electronic services.
• A UDDI registry service is a Web service that manages information
about service providers, service implementations, and service
metadata.
• Service providers can use UDDI to advertise the services they offer.
• Service consumers can use UDDI to discover services that suit their
requirements and to obtain the service metadata needed to consume
those services.
• The UDDI specifications define:
• SOAP APIs that applications use to query and to publish information
to a UDDI registry
• XML Schema schemata of the registry data model and the SOAP
message formats
• WSDL definitions of the SOAP APIs
• UDDI registry definitions (technical models - tModels) of various
identifier and category systems that may be used to identify and
categorize UDDI registrations
• Web Services Advantages
• We already understand why web services came about in the first place,
which was to provide a platform which could allow different applications to
talk to each other.
• Exposing Business Functionality on the network - A web service is a unit
of managed code that provides some sort of functionality to client
applications or end users. This functionality can be invoked over the HTTP
protocol which means that it can also be invoked over the internet.
Nowadays all applications are on the internet which makes the purpose of
Web services more useful. That means the web service can be anywhere
on the internet and provide the necessary functionality as required.
• Interoperability amongst applications - Web services allow various
applications to talk to each other and share data and services among
themselves. All types of applications can talk to each other. So instead
of writing specific code which can only be understood by specific
applications, you can now write generic code that can be understood
by all applications
• A Standardized Protocol which everybody understands - Web services use standardized industry protocols for communication. All four layers (Service Transport, XML Messaging, Service Description, and Service Discovery) use well-defined protocols in the web services protocol stack.
• Reduction in cost of communication - Web services use SOAP over
HTTP protocol, so you can use your existing low-cost internet for
implementing web services.
• Web service Architecture
• Every framework needs some sort of architecture to make sure the entire
framework works as desired. Similarly, in web services, there is an
architecture which consists of three distinct roles as given below
• Provider - The provider creates the web service and makes it available to
client application who want to use it.
• Requestor - A requestor is nothing but the client application that needs to
contact a web service. The client application can be a .Net, Java, or any
other language based application which looks for some sort of functionality
via a web service.
• Broker - The broker is nothing but the application which provides access to
the UDDI. The UDDI, as discussed in the earlier topic enables the client
application to locate the web service.
• Publish - A provider informs the broker (service registry) about the
existence of the web service by using the broker's publish interface to
make the service accessible to clients
• Find - The requestor consults the broker to locate a published web
service
• Bind - With the information it gained from the broker(service registry)
about the web service, the requestor is able to bind, or invoke, the
web service.
• Web service Characteristics
• Web services have the following special behavioral characteristics:
They are
XML-Based - Web services use XML to represent data at the representation and data-transportation layers. Using XML eliminates any networking, operating system, or platform dependency, since XML is the common language understood by all.
• Loosely Coupled – Loosely coupled means that the client and the web
service are not bound to each other, which means that even if the
web service changes over time, it should not change the way the
client calls the web service.
• Adopting a loosely coupled architecture tends to make software
systems more manageable and allows simpler integration between
different systems.
• Synchronous or Asynchronous functionality- Synchronicity refers to the
binding of the client to the execution of the service.
• In synchronous operations, the client will actually wait for the web service
to complete an operation. An example of this is probably a scenario
wherein a database read and write operation are being performed. If data
is read from one database and subsequently written to another, then the
operations have to be done in a sequential manner.
• Asynchronous operations allow a client to invoke a service and then
execute other functions in parallel. This is one of the common and
probably the most preferred techniques for ensuring that other services
are not stopped when a particular operation is being carried out.
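The small sketch below contrasts the two styles from the client's point of view. The fetchReport method is an illustrative stand-in for a remote web service call, not part of any real API.

```java
// Synchronous vs. asynchronous invocation of a (simulated) service call.
import java.util.concurrent.CompletableFuture;

public class SyncVsAsync {

    // Stand-in for a remote web service call that takes some time.
    static String fetchReport() {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "report-data";
    }

    public static void main(String[] args) {
        // Synchronous: the client blocks until the service completes.
        String result = fetchReport();
        System.out.println("Synchronous result: " + result);

        // Asynchronous: the client invokes the service and keeps working in parallel.
        CompletableFuture<String> future = CompletableFuture.supplyAsync(SyncVsAsync::fetchReport);
        System.out.println("Doing other work while the service runs...");
        future.thenAccept(r -> System.out.println("Asynchronous result: " + r)).join();
    }
}
```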
• Ability to support Remote Procedure Calls (RPCs) - Web services
enable clients to invoke procedures, functions, and methods on
remote objects using an XML-based protocol.
• Supports Document Exchange - One of the key benefits of XML is its
generic way of representing not only data but also complex
documents.
• PUBLISH-SUBSCRIBE MODEL
• Publish-subscribe (pub/sub) is a messaging pattern where publishers
push messages to subscribers.
• In software architecture, pub/sub messaging provides instant event
notifications for distributed applications, especially those that are
decoupled into smaller, independent building blocks.
• In layman's terms, pub/sub describes how two different parts of a messaging pattern connect and communicate with each other.
• Pub/Sub delivers low-latency, durable messaging that helps
developers quickly integrate systems hosted on the Google Cloud
Platform and externally.
• Pub/Sub brings the flexibility and reliability of enterprise message-oriented middleware to the cloud.
• At the same time, Pub/Sub is scalable.
• By providing many-to-many, asynchronous messaging that decouples
senders and receivers, it allows for secure and highly available
communication among independently written applications.
• These are three central components to understanding pub/sub
messaging pattern:
• Publisher: Publishes messages to the communication infrastructure
• Subscriber: Subscribes to a category of messages
• Communication infrastructure (channel, classes): Receives messages
from publishers and maintains subscribers’ subscriptions.
• The publisher will categorize published messages into classes where
subscribers will then receive the message.
• Basically, a publisher has one input channel that splits into multiple
output channels, one for each subscriber. Subscribers can express
interest in one or more classes and only receive messages that are of
interest.
• The thing that makes pub/sub interesting is that the publisher and
subscriber are unaware of each other. The publisher sends messages
to subscribers, without knowing if there are any actually there. And
the subscriber receives messages, without explicit knowledge of the
publishers out there. If there are no subscribers around to receive the
topic-based information, the message is dropped.
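The following in-memory sketch (an illustrative assumption, not the Google Cloud Pub/Sub API) shows these three components in miniature: publishers and subscribers know only the broker, never each other, and messages on a topic with no subscribers are simply dropped.

```java
// Minimal topic-based publish-subscribe broker.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class SimpleBroker {

    // Communication infrastructure: topic name -> registered subscriber callbacks.
    private final Map<String, List<Consumer<String>>> topics = new HashMap<>();

    public void subscribe(String topic, Consumer<String> subscriber) {
        topics.computeIfAbsent(topic, t -> new ArrayList<>()).add(subscriber);
    }

    public void publish(String topic, String message) {
        // If there are no subscribers for the topic, the message is simply dropped.
        for (Consumer<String> subscriber : topics.getOrDefault(topic, List.of())) {
            subscriber.accept(message);
        }
    }

    public static void main(String[] args) {
        SimpleBroker broker = new SimpleBroker();
        broker.subscribe("orders", m -> System.out.println("Billing saw: " + m));
        broker.subscribe("orders", m -> System.out.println("Shipping saw: " + m));
        broker.publish("orders", "order-1001 created");   // fan-out to both subscribers
        broker.publish("audit", "no one is listening");   // dropped: no subscribers
    }
}
```

A real pub/sub service adds durability, acknowledgements and push/pull delivery on top of this basic decoupling, as described in the message flow below.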
• Core concepts
• Topic: A named resource to which messages are sent by publishers.
• Subscription: A named resource representing the stream of messages from
a single, specific topic, to be delivered to the subscribing application. For
more details about subscriptions and message delivery semantics, see the
Subscriber Guide.
• Message: The combination of data and (optional) attributes that a
publisher sends to a topic and is eventually delivered to subscribers.
• Message attribute: A key-value pair that a publisher can define for a
message. For example, key iana.org/language_tag and value en could be
added to messages to mark them as readable by an English-speaking
subscriber.
• Publisher-subscriber relationships
• A publisher application creates and sends messages to a topic.
Subscriber applications create a subscription to a topic to receive
messages from it. Communication can be one-to-many (fan-out),
many-to-one (fan-in), and many-to-many.
• Pub/Sub message flow
• The following is an overview of the components in the Pub/Sub
system and how messages flow between them:
• A subscriber receives messages either by Pub/Sub pushing them to
the subscriber's chosen endpoint, or by the subscriber pulling them
from the service.
• The subscriber sends an acknowledgement to the Pub/Sub service for
each received message.
• The service removes acknowledged messages from the subscription's
message queue.
• Common use cases
• Balancing workloads in network clusters. For example, a large queue
of tasks can be efficiently distributed among multiple workers, such as
Google Compute Engine instances.
• Implementing asynchronous workflows. For example, an order
processing application can place an order on a topic, from which it
can be processed by one or more workers.
• Distributing event notifications. For example, a service that accepts
user signups can send notifications whenever a new user registers,
and downstream services can subscribe to receive notifications of the
event.
• Refreshing distributed caches. For example, an application can
publish invalidation events to update the IDs of objects that have
changed.
• Logging to multiple systems. For example, a Google Compute Engine
instance can write logs to the monitoring system, to a database for
later querying, and so on.
• Data streaming from various processes or devices. For example, a
residential sensor can stream data to backend servers hosted in the
cloud.
• Reliability improvement. For example, a single-zone Compute Engine
service can operate in additional zones by subscribing to a common
topic, to recover from failures in a zone or region.
• Content-Based Pub-Sub Models
• In the publish–subscribe model, filtering is used to process the
selection of messages for reception and processing, with the two
most common being topic-based and content-based.
• In a topic-based system, messages are published to named channels
(topics). The publisher is the one who creates these channels.
Subscribers subscribe to those topics and will receive messages from
them whenever they appear.
• In a content-based system, messages are only delivered if they match
the constraints and criteria that are defined by the subscriber.
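To contrast this with the topic-based sketch shown earlier, the fragment below (an illustrative assumption) attaches a filter predicate to a subscription, so a message is delivered only if its content matches the subscriber's criteria.

```java
// Content-based subscription: delivery depends on the message content, not the topic.
import java.util.function.Consumer;
import java.util.function.Predicate;

class ContentSubscription {
    final Predicate<String> filter;
    final Consumer<String> subscriber;

    ContentSubscription(Predicate<String> filter, Consumer<String> subscriber) {
        this.filter = filter;
        this.subscriber = subscriber;
    }

    void deliver(String message) {
        if (filter.test(message)) {        // delivered only if the constraint is met
            subscriber.accept(message);
        }
    }
}

class ContentBasedDemo {
    public static void main(String[] args) {
        ContentSubscription highValue =
                new ContentSubscription(m -> m.contains("priority=HIGH"),
                                        m -> System.out.println("Handler got: " + m));
        highValue.deliver("order-1 priority=LOW");    // filtered out
        highValue.deliver("order-2 priority=HIGH");   // delivered
    }
}
```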
• BASICS OF VIRTUALIZATION
• The term 'virtualization' can be used in many respects in computing.
• It is the process of creating a virtual environment of something, which may include hardware platforms, storage devices, OS, network resources, etc. Virtualization in the cloud mainly deals with server virtualization.
• Virtualization is the ability to share the physical instance of a single application or resource among multiple organizations or users.
• This is done by assigning a logical name to a physical resource and providing a pointer to that physical resource on demand.
• Over an existing operating system & hardware, we generally create a
virtual machine which and above it we run other operating systems or
applications.
• This is called Hardware Virtualization.
• The virtual machine provides a separate environment that is logically
distinct from its underlying hardware. Here, the system or the
machine is the host & virtual machine is the guest machine.
• There are several approaches or ways to virtualize cloud servers.
• These are:
• OS - Level Virtualization: Here, multiple instances of an application
can run in an isolated form on a single OS
• Hypervisor-based Virtualization: currently the most widely used technique. With hypervisor-based virtualization, there are various sub-approaches to fulfil the goal of running multiple applications and other loads on a single physical host.
• A technique is used to allow virtual machines to move from one host to another without any requirement to shut them down. This technique is termed "Live Migration".
• Another technique is used to actively load balance among
multiple hosts to efficiently utilize those resources available in a
virtual machine, and the concept is termed as Distributed
Resource Scheduling or Dynamic Resource Scheduling.
• VIRTUALIZATION
• Virtualization is the process of creating a virtual environment on an
existing server to run your desired program, without interfering with
any of the other services provided by the server or host platform to
other users.
• The Virtual environment can be a single instance or a combination of
many such as operating systems, Network or Application servers,
computing environments, storage devices and other such
environments.
• Virtualization in cloud computing means creating a virtual platform of the server operating system and storage devices.
• This helps the user by providing multiple machines at the same time; it also allows a single physical instance of a resource or an application to be shared among multiple users.
• Cloud virtualization also manages the workload by transforming traditional computing to make it more scalable, economical and efficient.
TYPES OF VIRTUALIZATION
• Operating System Virtualization
• Hardware Virtualization
• Server Virtualization
• Storage Virtualization
Benefits for Companies
• Removal of special hardware and utility requirements
• Effective management of resources
• Increased employee productivity as a result of better accessibility
• Reduced risk of data loss, as data is backed up across multiple storage
locations
Benefits for Data Centers
• Maximization of server capabilities, thereby reducing maintenance and operation costs
• Smaller footprint as a result of lower hardware, energy and manpower requirements
• Access to the virtual machine and the host machine or server is facilitated by software known as a hypervisor.
• The hypervisor acts as a link between the hardware and the virtual environment and distributes hardware resources such as CPU usage and memory allotment between the different virtual environments.
• Hardware Virtualization
• Hardware virtualization, also known as hardware-assisted virtualization or server virtualization, runs on the concept that an individual independent segment of hardware, or a physical server, may be made up of multiple smaller hardware segments or servers, essentially consolidating multiple physical servers into virtual servers that run on a single primary physical server.
• Each small server can host a virtual machine, but the entire cluster of
servers is treated as a single device by any process requesting the
hardware.
• The hardware resource allotment is done by the hypervisor.
• The main advantages include increased processing power as a result
of maximized hardware utilization and application uptime.
• Subtypes:
• Full Virtualization
• Emulation Virtualization
• Para virtualization
• Software Virtualization
• Software virtualization involves the creation and operation of multiple virtual environments on the host machine.
• It creates a complete computer system, including virtual hardware, that lets a guest operating system run.
• For example, it lets you run Android OS on a host machine natively
using a Microsoft Windows OS, utilizing the same hardware as the
host machine does.
• Application Virtualization – hosting individual applications in a virtual
environment separate from the native OS.
• Service Virtualization – hosting specific processes and services
related to a particular application.
• Server Virtualization
• In server virtualization in cloud computing, the software is installed directly on the server system, and a single physical server can be divided into many servers on demand to balance the load.
• It can also be stated that server virtualization masks the server resources, which consist of number and identity.
• With the help of software, the server administrator divides one physical server into multiple servers.
• Memory Virtualization
• Physical memory across different servers is aggregated into a single
virtualized memory pool.
• It provides the benefit of an enlarged contiguous working memory.
• You may already be familiar with this, as some OS such as Microsoft
Windows OS allows a portion of your storage disk to serve as an
extension of your RAM.
• Subtypes:
• Application-level control – Applications access the memory pool
directly
• Operating system level control – Access to the memory pool is
provided through an operating system
• Storage Virtualization
• Multiple physical storage devices are grouped together,
which then appear as a single storage device.
• This provides various advantages, such as the homogenization of storage across storage devices of varying capacities and speeds, reduced downtime, load balancing and better optimization of performance and speed.
• Partitioning your hard drive into multiple partitions is an
example of this virtualization.
• Subtypes:
• Block Virtualization – Multiple storage devices are consolidated into
one
• File Virtualization – Storage system grants access to files that are
stored over multiple hosts
• Data Virtualization
• It lets you easily manipulate data, as the data is presented as an
abstract layer completely independent of data structure and database
systems.
• Decreases data input and formatting errors.
• Network Virtualization
• In network virtualization, multiple sub-networks can be created on
the same physical network, which may or may not be authorized to
communicate with each other.
• This enables restriction of file movement across networks and enhances security, and it allows better monitoring and identification of data usage, which lets network administrators scale up the network appropriately.
• It also increases reliability as a disruption in one network doesn’t
affect other networks, and the diagnosis is easier.
• Internal network: Enables a single system to function like a network
• External network: Consolidation of multiple networks into a single
one, or segregation of a single network into multiple ones.
• Desktop Virtualization
• This is perhaps the most common form of virtualization for any
regular IT employee. The user’s desktop is stored on a remote server,
allowing the user to access his desktop from any device or location.
Employees can work conveniently from the comfort of their home.
Since the data transfer takes place over secure protocols, any risk of
data theft is minimized.
• Benefits of Virtualization
• i. Security
• During the process of virtualization security is one of the important
concerns.
• The security can be provided with the help of firewalls, which will
help to prevent unauthorized access and will keep the data
confidential.
• Moreover, with the help of firewalls and security measures, the data can be protected from harmful viruses, malware and other cyber threats. Encryption with secure protocols also takes place, which protects the data from other threats.
• So, the customer can virtualize all the stored data and create a backup on a server on which the data can be stored.
• ii. Flexible operations
• With the help of a virtual network, the work of an IT professional becomes more efficient and agile.
• The network switches implemented today are very easy to use, flexible and time-saving.
• With the help of virtualization in cloud computing, technical problems in physical systems can be solved.
• It eliminates the problem of recovering data from crashed or corrupted devices and hence saves time.
• iii. Economical
• Virtualization in cloud computing saves the cost of a physical system, such as hardware and servers.
• It stores all the data on virtual servers, which are quite economical. It reduces wastage and decreases electricity bills along with the maintenance cost.
• Because of this, a business can run multiple operating systems and applications on a single server.
• iv. Eliminates the risk of system failure
• While performing a task there is a chance that the system might crash at the wrong time.
• This failure can cause damage to the company, but virtualization helps you perform the same task on multiple devices at the same time.
• The data can be stored in the cloud and retrieved at any time with the help of any device.
• Moreover, two servers work side by side, which makes the data accessible at all times. Even if one server crashes, the customer can access the data with the help of the second server.
• v. Flexible transfer of data
• The data can be transferred to the virtual server and retrieved at any time. The customers or the cloud provider do not have to waste time searching through hard drives to find data.
• With the help of virtualization, it is very easy to locate the required data and transfer it to the allotted authorities.
• This transfer of data has no limit and can cover long distances at the minimum possible charge. Additional storage can also be provided, and the cost will be as low as possible.
Levels of Virtualization
• Virtualization at ISA (instruction set architecture) level
• Virtualization is implemented at ISA (Instruction Set Architecture)
level by transforming physical architecture of system’s instruction set
into software completely.
• The host machine is a physical platform containing various
components, such as process, memory, Input/output (I/O) devices,
and buses.
• The VMM installs the guest systems on this machine. The emulator gets the instructions from the guest systems to process and execute.
• The emulator transforms those instructions into the native instruction set, which is run on the host machine's hardware. The instructions include both the I/O-specific ones and the processor-oriented instructions. For an emulator to be efficacious, it has to imitate all the tasks that a real computer could perform.
• Advantages:
• It is a simple and robust method of conversion into a virtual architecture. This architecture makes it simple to implement multiple systems on a single physical structure. The instructions given by the guest system are translated into instructions of the host system.
• This architecture allows the host system to adjust to changes in the architecture of the guest system.
• The binding between the guest system and the host is not rigid, which makes it very flexible. An infrastructure of this kind could be used for creating virtual machines of one platform, for example x86, on any other platform such as SPARC, x86 or Alpha.
• Disadvantage: The instructions must be interpreted before being executed, and therefore a system with ISA-level virtualization shows poor performance.
• Virtualization at HAL (hardware abstraction layer) level
• The virtualization at the HAL (Hardware Abstraction Layer) is the most
common technique, which is used in computers on x86 platforms that
increases the efficiency of virtual machine to handle various tasks.
• This architecture is economical and practical to use. If emulator communication with critical processes is required, the simulator undertakes those tasks and performs the appropriate multiplexing.
• This virtualization technique works by catching the execution of privileged instructions by the virtual machine and passing those instructions to the VMM to be handled properly.
• VIRTUALIZATION STRUCTURES
• A virtualization architecture is a conceptual model specifying the
arrangement and interrelationships of the particular components
involved in delivering a virtual -- rather than physical -- version of
something, such as an operating system (OS), a server, a storage
device or network resources.
• Virtualization is commonly hypervisor-based. The hypervisor isolates
operating systems and applications from the underlying computer
hardware so the host machine can run multiple virtual machines (VM)
as guests that share the system's physical compute resources, such
as processor cycles, memory space, network bandwidth and so on.
• Type 1 hypervisors, sometimes called bare-metal hypervisors, run
directly on top of the host system hardware. Bare-metal hypervisors
offer high availability and resource management. Their direct access
to system hardware enables better performance, scalability and
stability. Examples of type 1 hypervisors include Microsoft Hyper-V,
Citrix XenServer and VMware ESXi.
• A type 2 hypervisor, also known as a hosted hypervisor, is installed on
top of the host operating system, rather than sitting directly on top of
the hardware as the type 1 hypervisor does.
• Each guest OS or VM runs above the hypervisor.
• The convenience of a known host OS can ease system configuration
and management tasks.
• However, the addition of a host OS layer can potentially limit
performance and expose possible OS security flaws.
Examples of type 2 hypervisors include VMware Workstation, Virtual
PC and Oracle VM VirtualBox.
• The main alternative to hypervisor-based virtualization is
containerization.
• Operating system virtualization, for example, is a container-
based kernel virtualization method.
• OS virtualization is similar to partitioning.
• In this architecture, an operating system is adapted so it functions as
multiple, discrete systems, making it possible to deploy and run
distributed applications without launching an entire VM for each
one.
• Instead, multiple isolated systems, called containers, are run on a
single control host and all access a single kernel.
• Lightweight Service Virtualization Tools
• Free or open-source tools are a good starting point because they let
you get started in a very ad hoc way, so you can quickly learn the
benefits of service virtualization.
• Some examples of lightweight tools include Traffic Parrot, Mockito, or
the free version of Parasoft Virtualize.
• These solutions are usually sought out by individual development
teams to “try out” service virtualization, brought in for a very specific
project or reason.
• Key Capabilities of Service Virtualization
• Ease-Of-Use and Core Capabilities:
• Ability to use the tool without writing scripts
• Ability to rapidly create virtual services before the real service is available
• Intelligent response
• Data-driven responses
• Ability to re-use services
• Support for authentication and security
• Support for clustering/scaling
• Capabilities for optimized workflows:
• Record and playback
• Test data management / generation
• Data re-use
• Message routing
• Stateful behavior emulation
• Automation Capabilities:
• Build system plugins
• Command-line execution
• Open APIs for DevOps integration
• Cloud support (EC2, Azure)
• Management and Maintenance Support:
• Governance
• Environment management
• Monitoring
• A process for managing change
• On-premise and browser-based access
• Supported Technologies:
• REST API virtualization
• SOAP API virtualization
• IoT and microservice virtualization
• Database virtualization
• Webpage virtualization
• File transfer virtualization
• Mainframe and fixed-length message virtualization
CPU VIRTUALIZATION
• CPU virtualization involves a single CPU acting as if it were multiple
separate CPUs.
• The most common reason for doing this is to run multiple different
operating systems on one machine.
• CPU virtualization emphasizes performance and runs directly on the
available CPUs whenever possible.
• The underlying physical resources are used whenever possible.
• The virtualization layer runs instructions only as needed to make
virtual machines operate as if they were running directly on a physical
machine.
• When many virtual machines are running, those virtual machines
might compete for CPU resources.
• When CPU contention occurs, the host time-slices the physical
processors across all virtual machines so each virtual machine runs as
if it has its specified number of virtual processors
• To support virtualization, processors employ a special running mode
and instructions, known as hardware-assisted virtualization.
• In this way, the VMM and guest OS run in different modes and all
sensitive instructions of the guest OS and its applications are trapped
in the VMM.
• To save processor states, mode switching is completed by hardware.
For the x86 architecture, Intel and AMD have proprietary
technologies for hardware-assisted virtualization.
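• The time-slicing behaviour described above can be sketched as a simple round-robin scheduler; the VM names, the two physical cores and the fixed quantum are illustrative assumptions, not the policy of any particular hypervisor.

```python
from collections import deque

def schedule(vcpus, physical_cpus, quanta):
    """Round-robin time-slicing of virtual CPUs onto physical CPUs.

    vcpus: list of vCPU names competing for CPU time.
    physical_cpus: number of physical cores available.
    quanta: how many scheduling rounds to simulate.
    """
    ready = deque(vcpus)
    for t in range(quanta):
        running = [ready.popleft() for _ in range(min(physical_cpus, len(ready)))]
        print(f"quantum {t}: running {running}")
        ready.extend(running)          # preempted vCPUs go to the back of the queue

# Four vCPUs (two 2-vCPU VMs) contending for two physical cores.
schedule(["vm1-vcpu0", "vm1-vcpu1", "vm2-vcpu0", "vm2-vcpu1"], physical_cpus=2, quanta=4)
```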
• HARDWARE SUPPORT FOR VIRTUALIZATION
• Modern operating systems and processors permit multiple processes
to run simultaneously.
• If there is no protection mechanism in a processor, all instructions
from different processes will access the hardware directly and cause a
system crash.
• Therefore, all processors have at least two modes, user mode and
supervisor mode, to ensure controlled access of critical hardware.
• Instructions running in supervisor mode are called privileged
instructions.
• Other instructions are unprivileged instructions.
• In a virtualized environment, it is more difficult to make OSes and
applications run correctly because there are more layers in the
machine stack.
• At the time of this writing, many hardware virtualization products
were available.
• The VMware Workstation is a VM software suite for x86 and x86-64
computers.
• This software suite allows users to set up multiple x86 and x86-64
virtual computers and to use one or more of these VMs
simultaneously with the host operating system.
• VMware Workstation uses host-based virtualization. Xen is a
hypervisor for use on IA-32, x86-64, Itanium, and PowerPC 970 hosts.
Xen actually modifies Linux to act as the lowest and most privileged
layer, that is, as a hypervisor.
• One or more guest OS can run on top of the hypervisor. KVM (Kernel-
based Virtual Machine) is a Linux kernel virtualization infrastructure.
• KVM can support hardware-assisted virtualization and
paravirtualization by using the Intel VT-x or AMD-v and VirtIO
framework, respectively.
• HARDWARE-ASSISTED CPU VIRTUALIZATION
• This technique attempts to simplify virtualization because
full or paravirtualization is complicated.
• Intel and AMD add an additional privilege mode (often called Ring -1)
to x86 processors.
• Therefore, operating systems can still run at Ring 0 while the
hypervisor runs at Ring -1.
• All the privileged and sensitive instructions are trapped in the
hypervisor automatically.
• This technique removes the difficulty of implementing binary
translation of full virtualization.
• It also lets the operating system run in VMs without modification.
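• A minimal sketch of the trap-and-emulate idea behind hardware-assisted virtualization is shown below: unprivileged instructions run "directly", while sensitive or privileged ones raise a trap that the VMM handles. The instruction names and handler are hypothetical.

```python
# Toy trap-and-emulate model: unprivileged instructions "run directly",
# while sensitive/privileged instructions raise a trap that the VMM handles.
# The instruction names and VMM behaviour are illustrative only.

PRIVILEGED = {"HLT", "OUT", "LOAD_CR3"}

class Trap(Exception):
    pass

def guest_execute(instr):
    if instr in PRIVILEGED:
        raise Trap(instr)              # hardware would trap to Ring -1 here
    return f"executed {instr} directly on the CPU"

def vmm_handle(trapped_instr):
    return f"VMM emulated privileged instruction {trapped_instr} for the guest"

for instr in ["ADD", "LOAD_CR3", "MUL", "HLT"]:
    try:
        print(guest_execute(instr))
    except Trap as t:
        print(vmm_handle(t.args[0]))
```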
• MEMORY VIRTUALIZATION
• Virtual memory virtualization is similar to the virtual memory support
provided by modern operating systems.
• In a traditional execution environment, the operating system
maintains mappings of virtual memory to machine memory using
page tables, which is a one-stage mapping from virtual memory to
machine memory.
• All modern x86 CPUs include a memory management unit (MMU)
and a translation lookaside buffer (TLB) to optimize virtual memory
performance.
• However, in a virtual execution environment, virtual memory
virtualization involves sharing the physical system memory in RAM
and dynamically allocating it to the physical memory of the VMs.
• A two-stage mapping process must be maintained by the guest OS
and the VMM, respectively: virtual memory to physical memory, and
physical memory to machine memory.
• Furthermore, MMU virtualization should be supported, which is
transparent to the guest OS.
• The guest OS continues to control the mapping of virtual addresses to
the physical memory addresses of VMs.
• But the guest OS cannot directly access the actual machine memory.
The VMM is responsible for mapping the guest physical memory to
the actual machine memory.
• The figure shows the two-level memory mapping procedure. Since each
page table of the guest OSes has a separate page table in the VMM
corresponding to it, the VMM page table is called the shadow page
table.
• Nested page tables add another layer of indirection to virtual
memory.
• The MMU already handles virtual-to-physical translations as defined
by the OS.
• Then the physical memory addresses are translated to machine
addresses using another set of page tables defined by the hypervisor.
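• The two-stage mapping can be sketched as two lookup tables: the guest page table maps guest virtual pages to guest "physical" pages, and the VMM table maps guest physical pages to machine pages. The page numbers below are made up; a real MMU with shadow or nested page tables performs this in hardware.

```python
# Minimal two-stage address translation: guest virtual -> guest "physical"
# (guest OS page table) -> machine address (VMM page table).

guest_page_table = {0: 5, 1: 7}        # guest virtual page -> guest physical page
vmm_page_table   = {5: 42, 7: 13}      # guest physical page -> machine page
PAGE_SIZE = 4096

def translate(guest_virtual_addr):
    page, offset = divmod(guest_virtual_addr, PAGE_SIZE)
    guest_physical_page = guest_page_table[page]          # stage 1: guest OS mapping
    machine_page = vmm_page_table[guest_physical_page]    # stage 2: VMM mapping
    return machine_page * PAGE_SIZE + offset

print(hex(translate(0x1004)))          # virtual page 1, offset 4 -> machine address
```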
• I/O VIRTUALIZATION
• I/O virtualization involves managing the routing of I/O requests
between virtual devices and the shared physical hardware. At the
time of this writing, there are three ways to implement I/O
virtualization: full device emulation, para-virtualization, and direct
I/O.
• In full device emulation, all the functions of a device or bus infrastructure,
such as device enumeration, identification, interrupts, and DMA, are
replicated in software.
• This software is located in the VMM and acts as a virtual device. The
I/O access requests of the guest OS are trapped in the VMM which
interacts with the I/O devices.
• A single hardware device can be shared by multiple VMs that run
concurrently.
• The para-virtualization method of I/O virtualization is typically used in
Xen. It is also known as the split driver model consisting of a frontend
driver and a backend driver.
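• A minimal sketch of the split driver model: the frontend driver in the guest only enqueues requests on a shared ring, and the backend driver in the privileged domain services them against the real device. The queue and handlers below are illustrative, not Xen's actual interfaces.

```python
# Toy split-driver model used in para-virtualized I/O: a frontend driver in
# the guest places requests on a shared ring, and a backend driver in the
# privileged domain applies them to the real device.

from queue import Queue

shared_ring = Queue()                  # stands in for the shared-memory ring

def frontend_write(block, data):
    """Guest-side driver: enqueue an I/O request instead of touching hardware."""
    shared_ring.put(("write", block, data))

def backend_service(storage):
    """Host-side driver: drain the ring and apply requests to the real device."""
    while not shared_ring.empty():
        op, block, data = shared_ring.get()
        if op == "write":
            storage[block] = data

disk = {}
frontend_write(0, b"guest data")
backend_service(disk)
print(disk)
```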
• VIRTUALIZATION IN MULTI-CORE PROCESSORS
• Virtualizing a multi-core processor is relatively more complicated than
virtualizing a uni-core processor.
• Though multi-core processors are claimed to deliver higher performance
by integrating multiple processor cores in a single chip, multi-core
virtualization has raised some new challenges for computer
architects, compiler constructors, system designers, and application
programmers.
• There are mainly two difficulties: Application programs must be
parallelized to use all cores fully, and software must explicitly assign
tasks to the cores, which is a very complex problem.
• Concerning the first challenge, new programming models, languages,
and libraries are needed to make parallel programming easier.
• The second challenge has spawned research involving scheduling
algorithms and resource management policies.
• A new challenge, called dynamic heterogeneity, is emerging as fat CPU
cores and thin GPU cores are mixed on the same chip, which further
complicates multi-core or many-core resource management.
• VIRTUALIZATION SUPPORT AND DISASTER RECOVERY.
• Virtualization provides flexibility in disaster recovery. When
servers are virtualized, they are encapsulated in VMs,
independent of the underlying hardware.
• Therefore, an organization does not need the same physical
servers at the primary site as at its secondary disaster
recovery site.
• Other benefits of virtual disaster recovery include ease,
efficiency and speed. Virtualized platforms typically provide
high availability in the event of a failure.
• Virtualization helps meet recovery time objectives (RTOs) and
recovery point objectives (RPOs), as replication is done as frequently
as needed, especially for critical systems.
• In addition, consolidating physical servers with virtualization saves
money because the virtualized workloads require less power, floor
space and maintenance.
• However, replication can get expensive, depending on how frequently
it's done.
• Adding VMs is an easy task, so organizations need to watch out for
VM sprawl.
• VMs operating without the knowledge of DR staff may fall through
the cracks when it comes time for recovery.
• Sprawl is particularly dangerous at larger companies where
communication may not be as strong as at a smaller organization with
fewer employees.
• All organizations should have strict protocols for deploying virtual
machines.
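• A back-of-the-envelope way to reason about the RPOs mentioned above: the worst-case data loss is roughly the replication interval, so the interval must not exceed the RPO. The figures below are illustrative assumptions only.

```python
# Back-of-the-envelope check of whether a replication schedule meets an
# RPO target. The numbers are illustrative assumptions, not guidance.

def meets_rpo(replication_interval_min, rpo_min):
    """Worst-case data loss is roughly one replication interval."""
    return replication_interval_min <= rpo_min

for interval in (5, 15, 60):
    print(f"replicate every {interval} min, RPO 15 min:",
          "OK" if meets_rpo(interval, rpo_min=15) else "misses RPO")
```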
• Virtual disaster recovery planning and testing
• Virtual infrastructures can be complex. In a recovery situation, that
complexity can be an issue, so it's important to have a comprehensive DR
plan.
• A virtual disaster recovery plan has many similarities to a traditional DR
plan. An organization should:
• Decide which systems and data are the most critical for recovery, and
document them.
• Get management support for the DR plan
• Complete a risk assessment and business impact analysis to outline
possible risks and their potential impacts.
• Document steps needed for recovery.
• Define RTOs and RPOs.
• Test the plan.
• As with a traditional DR strategy, the virtual DR plan should be reviewed and tested regularly.
• Virtual disaster recovery vs. physical disaster recovery
• Virtual disaster recovery, though simpler than traditional DR, should
retain the same standard goals of meeting RTOs and RPOs, and
ensuring a business can continue to function in the event of an
unplanned incident.
• The traditional disaster recovery process of duplicating a data center
in another location is often expensive, complicated and time-
consuming.
• While a physical disaster recovery process typically involves multiple
steps, virtual disaster recovery can be as simple as a click of a button
for failover.
• Rebuilding systems in the virtual world is not necessary because they
already exist in another location, thanks to replication. However, it's
important to monitor backup systems. It's easy to "set it and forget it"
in the virtual world, which is not advised and is not as much of a
problem with physical systems.
• Like with physical disaster recovery, the virtual disaster recovery plan
should be tested. Virtual disaster recovery, however, provides testing
capabilities not available in a physical setup. It is easier to do a DR test
in the virtual world without affecting production systems, as
virtualization enables an organization to bring up servers in an
isolated network for testing. In addition, deleting and recreating DR
servers is much simpler than in the physical world.
• Virtual disaster recovery is possible with physical servers through
physical-to-virtual backup. This process creates virtual backups of
physical systems for recovery purposes.
• For the most comprehensive data protection, experts advise having
an offline copy of data. While virtual disaster recovery vendors
provide capabilities to protect against cyberattacks such as
ransomware, physical tape storage is the one true offline option that
guarantees data is safe during an attack.
• Trends and future directions
• With ransomware now a constant threat to business, virtual disaster
recovery vendors are including capabilities specific to recovering from
an attack. Through point-in-time copies, an organization can roll back
its data recovery to just before the attack hit.
• The convergence of backup and DR is a major trend in data
protection. One example is instant recovery, also called recovery in
place, which allows a backup snapshot of a VM to run temporarily
on secondary storage following a disaster. This process significantly
reduces RTOs.
• Hyper-convergence, which combines storage, compute and
virtualization, is another major trend. As a result, hyper-converged
backup and recovery has taken off, with newer vendors such as
Cohesity and Rubrik leading the charge. Their cloud-based hyper-
converged backup and recovery systems are accessible to smaller
organizations, thanks to lower cost and complexity.
• These newer vendors are pushing the more established players to do
more with their storage and recovery capabilities.
• How Virtualization Benefits Disaster Recovery
• Most of us are aware of the importance of backing up data, but
there’s a lot more to disaster recovery than backup alone. It’s
important to recognize the fact that disaster recovery and backup are
not interchangeable. Rather, backup is a critical element of disaster
recovery. However, when a system failure occurs, it’s not just your
files that you need to recover – you’ll also need to restore a complete
working environment.
• Virtualization technology has come a long way in recent years to
completely change the way organizations implement their disaster-
recovery strategies.
• Consider, for a moment, how you would deal with a system failure in
the old days: You’d have to get a new server or repair the existing one
before manually reinstalling all your software, including the operating
system and any applications you use for work. Unfortunately, disaster
recovery didn’t stop there. Without virtualization, you’d then need to
manually restore all settings and access credentials to what they were
before.
• In the old days, a more efficient disaster-recovery strategy would
involve redundant servers that would contain a full system backup
that would be ready to go as soon as you needed it. However, that
also meant increased hardware and maintenance costs from having
to double up on everything.
• How Does Virtualization Simplify Disaster Recovery?
• When it comes to backup and disaster recovery, virtualization changes
everything by consolidating the entire server environment, along with all
the workstations and other systems into a single virtual machine. A virtual
machine is effectively a single file that contains everything, including your
operating systems, programs, settings, and files. At the same time, you’ll be
able to use your virtual machine the same way you use a local desktop.
• Virtualization greatly simplifies disaster recovery, since it does not require
rebuilding a physical server environment. Instead, you can move your
virtual machines over to another system and access them as normal. Factor
in cloud computing, and you have the complete flexibility of not having to
depend on in-house hardware at all. Instead, all you’ll need is a device with
internet access and a remote desktop application to get straight back to
work as though nothing happened.
• What Is the Best Way to Approach Server Virtualization?
• Almost any kind of computer system can be virtualized, including
workstations, data storage, networks, and even applications. A virtual
machine image defines the hardware and software parameters of the
system, which means you can move it between physical machines
that are powerful enough to run it, including those accessed through
the internet.
• Matters can get more complicated when you have many servers and
other systems to virtualize. For example, you might have different
virtual machines for running your apps and databases, yet they all
depend on one another to function properly. By using a tightly
integrated set of systems, you’ll be able to simplify matters, though
it’s usually better to keep your total number of virtual machines to a
minimum to simplify recovery processes.
• Backup and restore full images
• When your system is completely virtualized, each server is
encapsulated in a single image file. An image is basically a single file
that contains all of the server's files, including system files,
programs, and data, in one location. These images make managing
your systems easy: backups become as simple as duplicating the
image file, and restores are reduced to mounting the image on a new
server.
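• A minimal sketch of image-based backup, assuming the server is encapsulated in a single image file as described above: backup is just a timestamped copy of that file. The paths and file names below are placeholders.

```python
# Minimal sketch of image-based backup: because the whole server is
# encapsulated in one image file, backup is a copy and restore is a copy
# back (or a mount on new hardware). Paths and names are assumptions.
import shutil
from datetime import datetime
from pathlib import Path

def backup_image(image_path, backup_dir):
    """Copy a VM image file to a timestamped backup location."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    source = Path(image_path)
    target = backup_dir / f"{source.stem}-{stamp}{source.suffix}"
    shutil.copy2(source, target)
    return target

# e.g. backup_image("/vm/webserver.qcow2", "/backups/vm-images")
```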
• Run other workloads on standby hardware
• A key benefit of virtualization is that it reduces the hardware needed by
using your existing hardware more efficiently. This frees up systems
that can be used to run other tasks or serve as hardware
redundancy. Combine this with features such as VMware's High Availability,
which restarts a virtual machine on a different server when the
original hardware fails; for a more robust disaster recovery plan
you can use Fault Tolerance, which keeps both servers in sync with
each other, leading to zero downtime if a server fails.
• Easily copy system data to recovery site
• An offsite backup is a huge advantage if something happens to your
primary location, whether a natural disaster, a power outage, or a
burst water pipe; it is nice to have all your information at an offsite
location. Virtualization makes this easy: each virtual machine's image
can be copied to the offsite location, and because the process is easily
automated it adds no extra strain or man-hours for the IT department.
Four big benefits to cloud-based disaster recovery:
• Faster recovery
• Financial savings
• Scalability
• Security
UNIT III
CLOUD ARCHITECTURE, SERVICES AND STORAGE
Layered Cloud Architecture Design – NIST Cloud Computing Reference
Architecture – Public, Private and Hybrid Clouds – laaS – PaaS – SaaS –
Architectural Design Challenges – Cloud Storage – Storage-as-a-Service
– Advantages of Cloud Storage – Cloud Storage Providers – S3.
• Layered Cloud Architecture Design
The architecture of cloud computing can be categorized into four layers,
each with its own functionality:
• The physical (hardware) layer
• The infrastructure layer
• The platform layer
• The application layer
Hardware Layer
Physical resources of the cloud are managed in this layer,
including physical servers, routers, switches, power and cooling
systems.
In practice, the hardware layer is implemented in data centers.
A data center usually contains thousands of servers that are organized
in racks and interconnected through switches, routers or other fabrics.
Typical issues at hardware layer include hardware configuration, fault
tolerance, traffic management, power and cooling resource
management.
Infrastructure Layer
• The basic purpose of the infrastructure layer is to deliver basic storage and
compute capabilities as standardized services over the Internet. It is
also known as the virtualization layer.
• The infrastructure layer creates pools of storage and computing
resources by partitioning the physical resources using virtualization
technologies such as Xen, KVM and VMware.
• This layer is an essential component of cloud computing, since many
key features, such as dynamic resource assignment, are only made
available through virtualization technologies .
• The services provided by this layer to the consumer are storage,
networks, and other fundamental computing resources, on which the
consumer can deploy and run arbitrary software, including operating
systems and applications.
• The underlying cloud infrastructure is not managed by the
consumer.
• IaaS refers to on-demand provisioning of infrastructural resources,
usually in terms of VMs.
• The cloud owner who offers IaaS is called an IaaS provider.
• Examples of IaaS providers include Amazon EC2, GoGrid and
Flexiscale.
Platform layer
• It is built on top of the infrastructure layer.
• It consists of operating systems and application frameworks.
• The main purpose of the platform layer is to minimize the burden of
deploying applications directly into VM containers.
• For example, Google App Engine operates at the platform layer to
provide API support for implementing storage, database and business
logic of typical web applications.
Application layer
• At the highest level of the hierarchy, the application layer consists of
the actual cloud applications.
• Different from traditional applications, cloud applications can
leverage the automatic-scaling feature to achieve better performance,
availability and lower operating cost
The advantages are:
• We only need to understand the layers below the one we are working
on;
• Each layer is replaceable by an equivalent implementation, with no
impact on the other layers;
• Layers are used for standardization;
• A layer can be used by several different higher-level layers.
The disadvantages are:
• Layers cannot encapsulate everything (a field that is added
to the UI most likely also needs to be added to the DB);
• Extra layers can harm performance
The 60s and 70s
• Although software development started during the 50s, it was during
the 60s and 70s that it was effectively born as we know it today, as
the activity of building applications that can be delivered, deployed
and used by others
• At this point, however, applications were very different from today's.
• Applications were quite simple, so they weren't built with layering in mind,
and they were deployed and used on one computer, making them
effectively one-tier applications.
• While these applications were very simple, they were not scalable.
• For example, if we needed to update the software to a new version,
we would have to do it on every computer that would have the
application installed.
• Layering during the 80s and 90s
• During the 1980s, enterprise applications came to life, and companies
started to have several users on desktop computers who access the
application through the network.
• At this time, the layers were mostly the following:
• User Interface (Presentation): The user interface, be it a web page, a
CLI or a native desktop application;
• A native Windows application as the client (rich client), which the
common user would use on his desktop computer, that would
communicate with the server in order to actually make things
happen. The client would be in charge of the application flow and
user input validation;
• Business logic (Domain): The logic that is the reason why the
application exists;
• An application server, which would contain the business logic and
would receive requests from the native client, act on them and persist
the data to the data storage;
• Data source: The data persistence mechanism (DB), or
communication with other applications.
• A database server, which would be used by the application server for
the persistence of data.
• With this shift in usability context, layering started to be a practice,
although it only started to be a common widespread practice during
the 1990s (Fowler 2002) with the rise of client/server systems.
• This was effectively a two-tier application, where the client would be
a rich client application used as the application interface, and the
server would have the business logic and the data source.
• This architecture pattern solves the scalability problem, as several
users can use the application independently: we would just need
another desktop computer, install the client application on it, and that
was it.
• However, if we had a few hundred, or even just a few tens of
clients, and we wanted to update the application, it would be
a highly complex operation, as we would have to update the clients
one by one.
• Layering after the mid 90s
• Roughly between 1995 and 2005, with the generalised shift to a cloud
context, the increase in application users, application complexity and
infrastructure complexity we end up seeing an evolution of the
layering scheme, where a typical implementation of this layering
could be:
• A native browser application, rendering and running the user
interface, sending requests to the server application;
• An application server, containing the presentation layer, the
application layer, the domain layer, and the persistence layer;
• A database server, which would be used by the application server for
the persistence of data.
• This is a three-tier architecture pattern, also known as n-tier. It is a
scalable solution and solves the problem of updating the clients, as
the user interface lives on and is compiled on the server, although it is
rendered and run in the client browser.
• Layering after the early 2000s
• In 2003, Eric Evans published his emblematic book Domain-Driven
Design: Tackling Complexity in the Heart of Software. Among the
many key concepts published in that book, there was also a vision for
the layering of a software system:
• User Interface
• Responsible for drawing the screens the users use to interact with the
application and translating the user’s inputs into application
commands.
• Application Layer
• Orchestrates Domain objects to perform tasks required by the users.
It does not contain business logic.
• Domain Layer
• This is the layer that contains all business logic, the Entities, Events
and any other object type that contains Business Logic.
• This is the heart of the system;
• Infrastructure
• The technical capabilities that support the layers above, i.e.,
persistence or messaging.
• NIST Cloud Computing Reference Architecture
• The National Institute of Standards and Technology (NIST) has been
designated by Federal Chief Information Officer (CIO) Vivek Kundra
with technical leadership for US government (USG) agency efforts
related to the adoption and development of cloud computing
standards.
• The goal is to accelerate the federal government's adoption of secure
and effective cloud computing to reduce costs and improve services.
• The Conceptual Reference
• An overview of the NIST cloud computing reference architecture,
which identifies the major actors, their activities and functions in
cloud computing.
• The diagram depicts a generic high-level architecture and is intended
to facilitate the understanding of the requirements, uses,
characteristics and standards of cloud computing
• Actors in cloud computing
• The NIST cloud computing reference architecture defines five major
actors: cloud consumer, cloud provider, cloud carrier, cloud auditor
and cloud broker.
• Each actor is an entity (a person or an organization) that participates
in a transaction or process and/or performs tasks in cloud computing.
• Table 1 briefly lists the actors defined in the NIST cloud computing
reference architecture.
• The general activities of the actors are discussed in the remainder of
this section, while the details of the architectural elements are
discussed.
• Figure 2 illustrates the interactions among the actors.
• A cloud consumer may request cloud services from a cloud provider
directly or via a cloud broker.
• A cloud auditor conducts independent audits and may contact the
others to collect necessary information.
Example Usage Scenario 1:
• A cloud consumer may request service from a cloud broker instead
of contacting a cloud provider directly.
The cloud broker may create a new service by combining multiple
services or by enhancing an existing service. In this example, the actual
cloud providers are invisible to the cloud consumer and the cloud
consumer interacts directly with the cloud broker
• Example Usage Scenario 2: Cloud carriers provide the connectivity
and transport of cloud services from cloud providers to cloud
consumers.
• As illustrated in Figure 4, a cloud provider participates in and arranges
for two unique service level agreements (SLAs), one with a cloud
carrier (e.g. SLA2) and one with a cloud consumer (e.g. SLA1).
• A cloud provider arranges service level agreements (SLAs) with a
cloud carrier and may request dedicated and encrypted connections
to ensure the cloud services are consumed at a consistent level
according to the contractual obligations with the cloud consumers.
• In this case, the provider may specify its requirements on capability,
flexibility and functionality in SLA2 in order to provide essential
requirements in SLA1.
• Example Usage Scenario 3: For a cloud service, a cloud auditor
conducts independent assessments of the operation and security of
the cloud service implementation.
• The audit may involve interactions with both the Cloud Consumer and
the Cloud Provider.
Cloud Consumer
• The cloud consumer is the principal stakeholder for the cloud
computing service.
• A cloud consumer represents a person or organization that maintains
a business relationship with, and uses the service from a cloud
provider.
• A cloud consumer browses the service catalog from a cloud provider,
requests the appropriate service, sets up service contracts with the
cloud provider, and uses the service.
• The cloud consumer may be billed for the service provisioned, and
needs to arrange payments accordingly.
• Cloud consumers need SLAs to specify the technical performance
requirements fulfilled by a cloud provider.
• SLAs can cover terms regarding the quality of service, security, and
remedies for performance failures.
• Cloud Storage
• Cloud storage is a cloud computing model that stores data on the
Internet through a cloud computing provider who manages and
operates data storage as a service.
• It's delivered on demand with just-in-time capacity and costs, and
eliminates buying and managing your own
data storage infrastructure.
• Cloud storage is a service model in which data is transmitted and
stored on remote storage systems, where it is maintained,
managed, backed up and made available to users over a network
(typically the internet).
• Users generally pay for their cloud data storage at a per-consumption
monthly rate.
• Although the per-gigabyte cost has been radically driven down, cloud
storage providers have added operating expenses that can make the
technology considerably more expensive, depending on how it is used.
• The security of cloud storage services continues to be a concern
among users.
• Types of cloud storage
• There are three main cloud-based storage access models: public,
private and hybrid.
• Public cloud storage services provide a multi-tenant storage
environment that is most suited for unstructured data on a
subscription basis.
• Data is stored in the service providers' data centers with storage data
spread across multiple regions or continents.
• Private cloud storage service is provided by in-house storage
resources deployed as a dedicated environment protected behind an
organization's firewall.
• Private clouds are appropriate for users who need customization and
more control over their data, or who have rigid data security or
regulatory requirements.
• Hybrid cloud storage is a mix of private cloud storage and third-party
public cloud storage services with a layer of orchestration
management to integrate operationally the two platforms.
• The model offers businesses flexibility and more data deployment
options.
• An organization might, for example, store actively used and
structured data in an on-premises cloud, and unstructured and
archival data in a public cloud.
• In recent years, there has been increased adoption of the hybrid
cloud model.
• Cloud storage characteristics
• Cloud storage is based on a virtualized infrastructure with accessible
interfaces, near-instant elasticity and scalability, multi-tenancy
and metered resources.
• Cloud-based data is stored in logical pools across disparate,
commodity servers located on premises or in a data center managed
by a third-party cloud provider.
• Benefits of cloud storage
• Cloud storage provides many benefits that result in cost-savings and
greater convenience for its users. These benefits include:
• Pay for what is used. With a cloud storage service, customers only
pay for the storage they actually use so there's no need for big capital
expenses.
• Utility billing. Since customers only pay for the capacity they're using,
cloud storage costs can decrease as usage drops.
• Global availability. Cloud storage is typically available from any
system anywhere at any time; one does not have to worry about
operating system capability or complex allocation processes.
• Ease of use. Cloud storage is easier to access and use, so developers,
software testers and business users can get up and running quickly
without having to wait for IT to allocate and configure storage
resources.
• Offsite security. By its very nature, public cloud storage offers a way
to move copies of data to a remote site for backup and security
purposes. Again, this represents a significant cost-savings when
compared to a company maintaining its own remote facility.
• Drawbacks of cloud storage
• There are some shortcomings to cloud storage -- particularly the public services -- that
may discourage companies from using these services or limit how they use them.
• Security is the single most cited factor that may make a company reluctant -- or at least
cautious -- about using public cloud storage.
• The concern is that once data leaves a company's premises, the company no longer has
control over how it's handled and stored.
• There are also concerns about storing data that is regulated by specific compliance laws.
• Cloud providers address these concerns by making public the steps they take to protect
their customers' data, such as encryption for data in flight and at rest, physical security
and storing data at multiple locations.
• Access to data stored in the cloud may also be an issue and could significantly increase
the cost of using cloud storage.
• A company may need to upgrade its connection to the cloud storage service to handle
the volume of data it expects to transmit; the monthly cost of an optical link can run into
the thousands of dollars.
• Cloud storage pros/cons
• Advantages of private cloud storage include high reliability and
security.
• But this approach to cloud storage provides limited scalability and
requires on-site resources and maintenance.
• Public cloud storage offers high scalability with no need for an on-
premises storage infrastructure. However, performance and security
measures can vary by service provider.
• Storage-as-a-Service
• Storage as a service (StaaS) is a cloud business model in which a
company leases or rents its storage infrastructure to another
company or individuals to store data.
• Small companies and individuals often find this to be a convenient
methodology for managing backups, and providing cost savings in
personnel, hardware and physical space.
• IT administrators meet their storage and backup needs through
service level agreements (SLAs) with a STaaS provider, usually on a
cost-per-gigabyte-stored and cost-per-data-transferred basis.
• The storage provider provides the client with the software required to
access their stored data.
• Clients use the software to perform standard tasks associated with
storage, including data transfers and data backups.
• Corrupted or lost company data can easily be restored.
• Who uses storage as a service and why?
• Storage as a Service is usually used by small or mid-sized companies
that lack the budget to implement and maintain their own storage
infrastructure.
• Organizations use storage as a service to mitigate risks in disaster
recovery, provide long-term retention and enhance business
continuity and availability.
• How storage as a service works?
• The company would sign a service level agreement (SLA) whereby the
STaaS provider agreed to rent storage space on a cost-per-gigabyte-
stored and cost-per-data-transfer basis
• If the company ever loses its data, the network administrator could
contact the STaaS provider and request a copy of the data.
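• An illustrative calculation of a monthly STaaS bill under a cost-per-gigabyte-stored plus cost-per-gigabyte-transferred SLA; the rates used are made-up example numbers, not any provider's pricing.

```python
# Illustrative monthly bill under a cost-per-GB-stored plus
# cost-per-GB-transferred SLA. The rates are made-up example numbers.

def monthly_bill(gb_stored, gb_transferred,
                 rate_per_gb_stored=0.02, rate_per_gb_transferred=0.05):
    return gb_stored * rate_per_gb_stored + gb_transferred * rate_per_gb_transferred

print(f"${monthly_bill(gb_stored=500, gb_transferred=120):.2f}")   # $16.00
```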
• Advantage of Storage as Services
• Cost – Factually speaking, backing up data isn’t always cheap, especially
when you take the cost of equipment into account. Additionally, there is the
cost of the time it takes to manually complete routine backups. Storage as
a service reduces much of the cost associated with traditional backup
methods, providing ample storage space in the cloud for a low monthly fee.
• Invisibility – Storage as a service is invisible, as no physical presence of it is
seen in its deployment and so it doesn’t take up valuable office space.
• Security – In this service type, data is encrypted both during transmission
and while at rest, ensuring no unauthorized user access to files.
• Automation –Users can simply select what and when they want to
backup, and the service does all the rest.
• Accessibility – By going for storage as a service, users can access data
from smart phones, netbooks to desktops and so on.
• Syncing – Syncing ensures your files are automatically updated across
all of your devices. This way, the latest version of a file saved on the
desktop is also available on the user's smartphone.
• Data Protection – By storing data on cloud storage services, data is
well protected against all kinds of catastrophes, such as floods, earthquakes
and human error.
• Disaster Recovery – As said earlier, data stored in the cloud is not only
protected from catastrophes by keeping copies in several
places, but can also support disaster recovery to ensure business
continuity.
• Disadvantages of Cloud Storage
• Usability: Be careful when using drag/drop to move a document into the
cloud storage folder. This will permanently move your document from its
original folder to the cloud storage location. Do a copy and paste instead of
drag/drop if you want to retain the document’s original location in addition
to moving a copy onto the cloud storage folder.
• Bandwidth: Several cloud storage services have a specific bandwidth
allowance. If an organization surpasses the given allowance, the additional
charges could be significant. However, some providers allow unlimited
bandwidth. This is a factor that companies should consider when looking at
a cloud storage provider.
• Accessibility: If you have no internet connection, you have no access
to your data.
• Data Security: There are concerns about the safety and privacy of
important data stored remotely. The possibility of private data
commingling with data from other organizations makes some businesses
uneasy.
• Software: If you want to be able to manipulate your files locally
through multiple devices, you’ll need to download the service on all
devices.
Cloud Storage Providers
• Cloud storage lets users store and sync their data to an online server.
• Because they are stored in the cloud rather than on a local drive, files are
available on various devices. This allows a person to access files from multiple
computers, as well as mobile devices to view, edit, and comment on files. It
replaces workarounds like emailing yourself documents. Cloud storage can also
act as a backup system for your hard drive.
• Cloud storage solutions support a variety of file types. Supported files typically
include:
• Text documents
• Pictures
• Videos
• Audio files
• The most user-friendly cloud storage solutions integrate with other
applications for easy edits, playback, and sharing.
• Cloud storage is used by individuals to manage personal files, as well
as by businesses for file sharing and backup.
Cloud Storage Features & Capabilities
• File Management
• These features are core to every cloud storage platform. Typically, file
management capabilities include:
• A search function to easily find files and search within files
• Device syncing to update files connected to the cloud across devices
• A web interface
• Support for multiple file types
• Collaboration
• Most cloud storage providers also feature collaboration functionality.
Not all tools will have the same level of tracking and control.
Collaboration features may include:
• Notifications when files are changed by others
• File sharing, with the ability to set editing and view-only
permissions
• Simultaneous editing
• Security & Administration
• Security and administration features are important considerations for
enterprises. Security is particularly important for storing sensitive,
private data in the cloud.
• Cloud storage providers offer different levels of security to address
concerns. For example, Google Drive lets users set up two-step
verification.
• Possible security and administration features include:
• Two-step verification for added security
• End-user encryption (for integrations)
• User and role management
• Control over file access, sharing and editing permissions
• Storage limits for individual users or groups
• Device management
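• The "end-user encryption" feature listed above can be sketched with client-side encryption before upload, here using the third-party Python cryptography package; key handling is deliberately simplified for illustration.

```python
# Hedged sketch of client-side ("end-user") encryption before upload,
# using the third-party `cryptography` package (pip install cryptography).
# Key handling here is deliberately simplified for illustration.
from cryptography.fernet import Fernet

key = Fernet.generate_key()            # in practice, store this key securely
cipher = Fernet(key)

plaintext = b"sensitive report contents"
ciphertext = cipher.encrypt(plaintext)     # what actually gets uploaded
restored = cipher.decrypt(ciphertext)      # done after download
assert restored == plaintext
```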
• File Sharing
• File sharing is one of the most common uses for cloud storage. Most
cloud storage providers offer a mechanism to let users share files. The
level of access, versioning, and change tracking varies by product.
• Some providers put a cap on file upload size. This is important for
anyone looking to upload and share files larger than 2GB.
• File sharing is executed in a few different ways:
• Sending files to users
• Emailing users a link to the file in the cloud
Cloud Storage Platform Considerations
• The platform’s performance, reliability, and integrations are all
important considerations for any business use case.
• Free Cloud Storage
• Many cloud storage providers offer some amount of storage space for
free. For example, DropBox offers 2GB of free storage, and Google
Drive offers 15GB. Sometimes providers have a hard limit on free
storage.
• Cloud storage vendors with advanced security features do not usually
have free accounts.
• Paid Cloud Storage
• For users who need to move beyond free options, pricing for cloud
storage is typically per user, per month.
• Plans usually have a fixed storage capacity, with prices increasing for
more storage and/or added features.
• Users can find paid cloud storage options with monthly costs as low
as $10 for 1TB of storage.
Comparison of popular cloud storage providers (best for, business size, storage plans, platforms, file upload limit and price):
• pCloud – Best for storing large files. Suitable for personal, family and small-business use. Storage plans: 10GB free, up to 2TB. Platforms: Windows, Mac, Linux, iOS, Android. File upload limit: 2TB. Price: free storage of 10GB; annual plans $3.99 per month for 500GB and $7.99 per month for 2TB; lifetime plans are a one-time fee of $175 for 500GB and $359 for 2TB.
• IDrive – Mainly for backup. Suitable for freelancers, solo workers, teams and businesses of any size. Storage plans: 5GB, 250GB, 500GB, 1.25TB, 2TB and 5TB. Platforms: Windows, Mac, iOS, Android. File upload limit: 2GB. Price: free 5GB; IDrive Personal 2TB $104.25; IDrive Business $149.25.
• Dropbox – Best for light data users. Suitable for freelancers, solo workers, teams and businesses of any size. Storage plans: 2GB, 1TB, 2TB, 3TB, up to unlimited. Platforms: Windows, Mac OS, Linux, Android, iOS, Windows Phone. File upload limit: unlimited. Price: plans for individuals start at $8.25/month; plans for teams start at $12.50/user/month.
• Google Drive – Best for teams and collaboration. Suitable for individuals and teams. Storage plans: 15GB, 100GB, 200GB, up to unlimited. Platforms: Windows, Mac OS, Android, iOS. File upload limit: 5TB. Price: free for 15GB; 200GB $2.99 per month; 2TB $9.99/month; 30TB $299.99/month.
• OneDrive – Best for Windows users. Storage plans: 5GB, 50GB, 1TB, 6TB and unlimited. Platforms: Windows, Android, iOS. File upload limit: 15GB. Price: free 5GB; the paid plan starts at $1.99 per month.
• Box – Best for enterprise solutions. Suitable for small teams and enterprises. Storage plan: 10GB free. Accessible from any device. File upload limit: 5GB. Price: free for 10GB; the paid plan starts at $10/month.
• S3-Simple Storage Service
• Amazon S3 or Amazon Simple Storage Service is a service offered by
Amazon Web Services that provides object storage through a web
service interface.
• Amazon S3 uses the same scalable storage infrastructure that
Amazon.com uses to run its global e-commerce network.
• Amazon Simple Storage Service (Amazon S3) is an object storage
service that offers industry-leading scalability, data availability,
security, and performance.
• This means customers of all sizes and industries can use it to store
and protect any amount of data for a range of use cases, such as
websites, mobile applications, backup and restore, archive, enterprise
applications, IoT devices, and big data analytics.
• Amazon S3 provides easy-to-use management features so you can
organize your data and configure finely-tuned access controls to meet
your specific business, organizational, and compliance requirements.
• Creating buckets – Create and name a bucket that stores data. Buckets are
the fundamental containers in Amazon S3 for data storage.
• Storing data – Store an infinite amount of data in a bucket. Upload as many
objects as you like into an Amazon S3 bucket. Each object can contain up to
5 TB of data. Each object is stored and retrieved using a unique developer-
assigned key.
• Downloading data – Download your data or enable others to do so.
Download your data anytime you like, or allow others to do the same.
• Permissions – Grant or deny access to others who want to upload or
download data into your Amazon S3 bucket. Grant upload and download
permissions to three types of users. Authentication mechanisms can help
keep data secure from unauthorized access.
• Standard interfaces – Use standards-based REST and SOAP interfaces
designed to work with any internet-development toolkit
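• A minimal boto3 sketch of the S3 workflow described above (create a bucket, store an object under a key, download it, and share it with a pre-signed URL); bucket and file names are placeholders, and valid AWS credentials plus the boto3 package are assumed.

```python
# Minimal boto3 sketch of the S3 workflow: create a bucket, upload an
# object under a key, download it, and share it via a pre-signed URL.
# Bucket and file names are placeholders; AWS credentials are assumed.
import boto3

s3 = boto3.client("s3")

s3.create_bucket(Bucket="my-example-bucket")                           # create the container
s3.upload_file("report.pdf", "my-example-bucket", "docs/report.pdf")   # store data
s3.download_file("my-example-bucket", "docs/report.pdf", "copy.pdf")   # retrieve data

# Grant temporary read access without changing bucket permissions.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-example-bucket", "Key": "docs/report.pdf"},
    ExpiresIn=3600,
)
print(url)
```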
Deployment Models
• Private
• Public
• Hybrid
• Community
• A private cloud, also termed an 'internal cloud', allows access to
systems and services only within a specific boundary or
organization.
• The cloud platform is implemented in a cloud-based secure
environment that is guarded by advanced firewalls under the
surveillance of the IT department that belongs to a particular
organization.
• Private clouds permit only authorized users, giving
organizations greater control over data and its security. Business
organizations with dynamic, critical, secure, demand-based management
requirements should adopt a private cloud.
Private Cloud
• Implementation of cloud services on resources that are dedicated to
organization, whether they exist on-premises or off-premises
• Typically, your organization owns and controls the resources/assets,
definition of services, costs and risks
• Microsoft solutions
• Windows Server 2008 R2 Hyper-V, System Center (IaaS)
• Windows Azure Appliance (PaaS)
• A private cloud is owned by a single organization.
• Private clouds enable an organization to use cloud
computing technology as a means of centralizing
access to IT resources by different parts, locations, or
departments of the organization.
Advantages
• Cloud infrastructure built in house
• Retains control of resources
• More security & privacy
• Can conform to regulatory requirements
Disadvantages
• Needs capital investment
• Needs expertise to build and maintain a private cloud
Public Cloud
• A public cloud is a type of cloud hosting in which systems and their
services are easily accessible to clients/users.
• Some of the examples of those companies which provide public cloud
facilities are IBM, Google, Amazon, Microsoft, etc.
• This cloud service is open for use.
• This type of cloud computing is a true specimen of cloud hosting
where the service providers render services to various clients.
• From a technical point of view, there is little structural difference
between private clouds and public clouds.
• The security level differs depending on the service provider and
the type of cloud the client uses.
• A public cloud is well suited to businesses for managing load, and
this type of cloud is economical due to the reduction in capital
overheads.
Public Cloud
• A public cloud is a publicly accessible cloud environment owned by a
third-party cloud provider
• The IT resources on public clouds are usually provisioned via cloud
delivery models .
• The cloud provider is responsible for the creation and on-going
maintenance of the public cloud and its IT resources.
Advantages
• Cost-effectiveness: you don’t pay for the
hardware/software but only for the resources you
use. Also, you save time as you don’t need to worry
about maintenance;
• Reliability: public cloud allows you to host data and
services on more than one cloud provider. This way,
services can be replicated to avoid failures and
outages;
• Flexibility: people can access the public cloud services
remotely from anywhere, no matter where the offices
of a company are located, and from any internet-
enabled device.
• Available to everyone.
• Anyone can go and signup for the service.
• Economies of Scale due to Size.
Disadvantages:
• Less secure
• Less customizable
Private Cloud vs. Public Cloud
Community cloud
• Community Cloud is another type of cloud computing in which the cloud
setup is shared mutually among different organizations that belong to the
same community or area.
• An example of such a community is a group of organizations/firms
together with financial institutions/banks.
• shared infrastructure for specific community
• several orgs that have shared concerns,
• managed by org or a 3rd party
• Community clouds are a recent variant of hybrid clouds that are built
to serve the specific needs of different business communities
Hybrid cloud
• In a hybrid cloud, data and applications can move between private
and public clouds for greater flexibility and more deployment options.
• For instance, you can use the public cloud for high-volume, lower-
security needs such as web-based email, and the private cloud (or
other on-premises infrastructure) for sensitive, business-critical
operations like financial reporting.
• Mixed/blended model of private and public clouds
• Variations and multiple interpretations exist
• On-premises and off-premises bridging
• Most common scenario today
• Especially for large enterprises
• More than a deployment / delivery model
• Application design, architectural model
Characteristics
• Move assets closer to intended users
• Public-facing apps and websites (microsites, mobile app services,
etc.) in public cloud
• Internal enterprise systems and apps in private cloud
• Leverage optimized infrastructure models
• Higher scalability, reliability, and agility for applications servicing
external customers, with higher opportunistic benefits
• Higher control and customization for core business processes
accessed by internal users, with higher systematic benefits
• Lower conflict with compliance and data sovereignty
requirements
• Still a deployment model
• Need proper application and data integration
Advantages of hybrid clouds:
• Control – your organisation can maintain a private
infrastructure for sensitive assets.
• Flexibility – you can take advantage of additional
resources in the public cloud when you need them.
• Cost-effectiveness – with the ability to scale to the
public cloud, you pay for extra computing power only
when needed.
• Ease – transitioning to the cloud doesn’t have to be
overwhelming because you can migrate gradually –
phasing in workloads over time.
SaaS-PaaS-IaaS
• SaaS (software-as-a-service). WAN-enabled application services (e.g.,
Google Apps, Salesforce.com, WebEx)
• PaaS (platform-as-a-service). Foundational elements to develop new
applications (e.g., Coghead, Google Application Engine)
• IaaS (infrastructure-as-a-service). Providing computational and
storage infrastructure in a centralized, location-transparent service
(e.g., Amazon)
What is it?
• In the accompanying stack diagram (figure omitted), green indicates the levels owned and operated by the organization, and red indicates the levels run and operated by the service provider.
• Source: http://www.katescomment.com/iaas-paas-saas-definition/
Core Stacks
Source: Goel, Pragati, and Mayank Kumar. "An overview of Cloud Computing."
• Resource Layer
• infrastructure layer which is composed of physical and virtualized
computing, storage and networking resources.
• Platform Layer
• computing framework manages the transaction dispatching and
task scheduling.
• storage sub-layer provides storage and caching capability
• Application Layer
• general application logic
• either on-demand capability or flexible management.
• no component will be the bottleneck of the whole system.
• large and distributed transactions and management of huge volume
of data.
• All the layers provide external service through web service or other
open interfaces.
Introduction to PaaS
• Platform as a service (PaaS) or application platform as a
service (aPaaS) or platform-based service is a category of cloud
computing services that provides a platform allowing customers to
develop, run, and manage applications without the complexity of
building and maintaining the infrastructure typically associated with
developing and launching an app
What is a Platform?
• A platform is anything you can leverage to accomplish something in a
simpler, faster, or otherwise better way than you could without.
• As a programmer, you leverage pre-existing code rather than starting
from scratch and writing everything.
• The most well-known software platforms for desktop software are
Windows and Mac OS
Platform as a Service (PaaS)
Goals of PaaS
• The ultimate goal of a PaaS is to make it easier for you to run your
website or web application no matter how much traffic it gets.
• You just deploy your application and the service figures out what to
do with it.
• A platform as a service should handle scaling seamlessly for you so
you can just focus on your website and the code running it.
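• As a sketch of what you hand to a PaaS, the minimal web application below is the kind of code you would deploy and let the platform provision and scale; Flask is a third-party package used here purely for illustration, and the route is hypothetical.

```python
# A minimal web application of the kind you would hand to a PaaS
# (Elastic Beanstalk, App Engine, Heroku, ...): you supply the code,
# the platform provisions servers and handles scaling.
# Flask is a third-party package (pip install flask).
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Hello from a PaaS-hosted app"

if __name__ == "__main__":
    # Locally you run it yourself; on a PaaS the platform starts and scales it.
    app.run(host="0.0.0.0", port=8080)
```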
• Types of PaaS
• Various types of PaaS are currently available to developers. They are:
• Public PaaS
• Private PaaS
• Hybrid PaaS
• Communications PaaS
• Mobile PaaS
• OpenPaaS
• Public PaaS is best suited for use in the public cloud.
• A public PaaS allows the user to control software deployment while
the cloud provider manages the delivery of all other major IT
components necessary to the hosting of applications, including
operating systems, databases, servers and storage system networks.
• Private PaaS aims to deliver the agility of public PaaS while
maintaining the security, compliance, benefits and potentially lower
costs of the private data center.
• A private PaaS is usually delivered as an appliance or software within
the user's firewall, which is frequently maintained in the company's
on-premises data center.
• A private PaaS can be developed on any type of infrastructure and can
work within the company's specific private cloud.
• Hybrid PaaS combines public PaaS and private PaaS to provide
companies with the flexibility of infinite capacity provided by a public
PaaS and the cost efficiencies of owning an internal infrastructure in
private PaaS.
• Hybrid PaaS utilizes a hybrid cloud.
Communications platform as a service (CPaaS)
• A CPaaS is a cloud-based platform that enables
developers to add real-time communications
features (voice, video, and messaging) in their own
applications.
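• An illustrative CPaaS call using the Twilio Python helper library; the credentials and phone numbers are placeholders, and other CPaaS vendors expose similar REST/SDK interfaces.

```python
# Illustrative CPaaS call using the Twilio Python helper library
# (pip install twilio). Credentials and phone numbers are placeholders.
from twilio.rest import Client

client = Client("ACCOUNT_SID", "AUTH_TOKEN")

message = client.messages.create(
    body="Your appointment is confirmed.",
    from_="+15550000001",              # CPaaS-provisioned number
    to="+15550000002",
)
print(message.sid)
```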
Mobile platform as a service
• Initiated in 2012, mobile PaaS (mPaaS) provides
development capabilities for mobile app designers
and developers.
Open PaaS
• Open PaaS provides open source software allowing a PaaS provider
to run applications in an open source environment, such as Google
App Engine. Some open platforms let the developer use any
programming language, database, operating system or server to
deploy their applications.
PaaS Examples
• AWS Elastic Beanstalk,
• Windows Azure,
• Heroku,
• Force.com,
• Google App Engine,
• Apache Stratos.
Advantages of PaaS
Reduced Costs
• When using a PaaS system, real savings are possible due to the fact
that
you don’t perform low-level work yourself and
you don’t have to hire additional personnel or
pay for extra working hours.
Continuous Updates
• Without a PaaS, you would have to keep track of all the components that
need to be updated and re-integrated from time to time to keep pace with
your competitors; a PaaS provider keeps the platform components up to date for you.
Scalability
• Imagine a common situation with a self-built platform: a small
company begins to build an application counting on a certain number
of users; over time, things go well, the company expands and
attracts more users; as the business grows, it requires more resources
to serve the growing number of users. With a PaaS, those additional
resources can be provisioned on demand instead of re-architecting a
self-built platform.
Freedom of Action:
• This model of cloud computing is, perhaps, the most advantageous
for creative developers and companies that need custom solutions.
The low-level work is done by professionals and numerous tools are
available and ready to operate, which saves time.
Disadvantages of PaaS
• Dependency on Vendor
• Compatibility of Existing Infrastructure
some difficulties and incompatibilities may arise when the PaaS and
your existing systems come into contact.
• Security Risks
As a rule, PaaS software is available in a public environment where
multiple end users have access to the same basic resources. For some
apps that contain sensitive data or have strict compliance
requirements, this is not a good option.
SaaS Definition
• Software as a Service is a software delivery
model in which the software and its data are
hosted centrally and accessed via the web; the
software is effectively rented rather than purchased.
Intro to SaaS
• Web-browser acting as a thin-client for accessing the software
remotely across the internet.
• Network-based access to, and management of, commercially
available (i.e., not custom) software
• Application delivery that typically is closer to a one-to-many model
(single instance, multi-tenant architecture) than to a one-to-one
model
Capabilities of SaaS
• Office applications
• Email and instant messaging
• Social media
• Exposing 3rd Party APIs
• Security and authentication
• Machine Learning
• Artificial intelligence
• Location Services
• Data streaming and lookup services
SaaS Architecture
• SaaS adoption is supported by:
• Improvements in network bandwidth technologies
• The cost of a PC has dropped significantly while its computing
power has grown, but the cost of application software has not followed
• Time-consuming and expensive setup and maintenance of locally
installed software
• Licensing costs for businesses, which contribute significantly
to the use of illegal software and piracy.
High-Level Architecture
• There are three key differentiators that separate a
well-designed SaaS application from a poorly designed
one
• scalable
• multi-tenant-efficient
• configurable
• Scaling the application - maximizing concurrency, and
using application resources more efficiently
• i.e. optimizing locking duration, statelessness, sharing
pooled resources such as threads and network
connections, caching reference data, and partitioning large
databases.
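• As a rough illustration of two of these techniques, the sketch below caches reference data and shares one pool of worker threads across stateless request handlers; the function and data names are hypothetical.
Example (Python):
# Sketch: caching reference data and sharing a pooled resource
# (a thread pool) across requests instead of creating one per request.
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

@lru_cache(maxsize=1)
def load_reference_data():
    # Hypothetical expensive lookup (e.g., a country/currency table).
    # lru_cache keeps the result in memory, so later calls are free.
    return {"US": "USD", "IN": "INR", "DE": "EUR"}

# One shared pool of worker threads for the whole application.
pool = ThreadPoolExecutor(max_workers=8)

def handle_request(country_code: str):
    # Stateless handler: everything it needs is passed in or cached.
    currency = load_reference_data().get(country_code, "unknown")
    return f"currency for {country_code}: {currency}"

if __name__ == "__main__":
    futures = [pool.submit(handle_request, c) for c in ("US", "IN", "US")]
    for f in futures:
        print(f.result())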
High-Level Architecture (con’t)
• Multi-tenancy – important architectural shift from designing
isolated, single-tenant applications
• One application instance must be able to accommodate users
from multiple companies at the same time
• All of this must be completely transparent to the users.
• This requires an architecture that maximizes the sharing of
resources across tenants
• still able to differentiate data belonging to different customers.
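• A common way to realize this is to tag every row with a tenant identifier and scope every query by it; the sketch below, using Python's built-in sqlite3 module and hypothetical table and column names, illustrates the idea.
Example (Python):
# Sketch: one shared table serving many tenants, with every query
# scoped by tenant_id so customers never see each other's rows.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (tenant_id TEXT, number TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [("acme", "INV-1", 120.0), ("acme", "INV-2", 75.5), ("globex", "INV-9", 310.0)],
)

def invoices_for(tenant_id: str):
    # The tenant_id filter is what keeps the shared instance
    # transparent to each customer.
    return conn.execute(
        "SELECT number, amount FROM invoices WHERE tenant_id = ?",
        (tenant_id,),
    ).fetchall()

print(invoices_for("acme"))    # [('INV-1', 120.0), ('INV-2', 75.5)]
print(invoices_for("globex"))  # [('INV-9', 310.0)]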
High-Level Architecture (con’t)
• Configurable - a single application instance on a single server has
to accommodate users from several different companies at once
• Customizing the code for one customer would therefore change the
application for the other customers as well.
• Traditionally, customizing an application meant code changes
• Instead, customers must be able to configure the application simply
and easily, without incurring extra development or operation costs
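• In practice, such configuration is usually stored as per-tenant metadata that the single code base reads at run time; a minimal, hypothetical sketch follows.
Example (Python):
# Sketch: per-tenant configuration stored as metadata, so one code base
# can behave differently for each customer without being modified.
DEFAULTS = {"theme": "light", "invoice_prefix": "INV", "max_users": 10}

TENANT_SETTINGS = {            # hypothetical settings store
    "acme":   {"theme": "dark", "max_users": 50},
    "globex": {"invoice_prefix": "GX"},
}

def setting(tenant_id: str, key: str):
    # Tenant-specific value if present, otherwise the shared default.
    return TENANT_SETTINGS.get(tenant_id, {}).get(key, DEFAULTS[key])

print(setting("acme", "theme"))             # dark
print(setting("globex", "invoice_prefix"))  # GX
print(setting("globex", "theme"))           # light (the default)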
SaaS Financials
• Four ways software companies price their products:
• Open source – free basic products, but a fee is charged for the upgrade to the
premium product (e.g., Apache, Linux)
• Licensed software – the main model in use today; customers like it because
they own the software as an asset
• Leased software – deployed at the customer site but leased for a time period;
used in the days of the mainframe
• SaaS – subscription pricing; like leasing it is treated as an expense, but
upgrades and maintenance are free and seamless
• Legal should be involved in the acquisition of mission-critical SaaS software
• Companies are losing control of their data in the SaaS model
• Depending on the service provider for security and data access.
• Need to set up a contractual relationship with the SaaS provider
• with conditions allowing the application to be run in-house
• and the ability to move data from the current provider to a new location
• Also Service Level Agreements (SLAs) for
• Availability, response times, notifications of outages
• Data integrity, data privacy, frequency of backup, support and disaster recovery
SaaS Financials (cont'd)
• The CIO (chief information officer) decides whether SaaS software will benefit IT,
while the CFO (chief financial officer) decides whether it is economical for the
whole firm
• Leasing vs. buying
• Similar to the decision of leasing or buying a car
• Need to compare the costs that affect cash flows, such as depreciation, interest on financing,
tax and opportunity cost
• Use an experienced accountant
Examples
• Netflix
• Microsoft Office 365
• Amazon Prime
• Twitter
• Facebook
• Google Docs
• …and many more!
SaaS - Pros
• Stay focused on business processes
• Change software to an Operating Expense instead of a Capital Purchase, making
better accounting and budgeting sense.
• Create a consistent application environment for all users
• No concerns for cross platform support
• Easy Access
• Reduced piracy of your software
• Lower Cost
• For an affordable monthly subscription
• Implementation fees are significantly lower
• Continuous Technology Enhancements
SaaS - Cons
• Initial time needed for licensing and agreements
• Trust, or the lack thereof, is the number one factor blocking the adoption of
software as a service (SaaS).
• Centralized control
• Possible erosion of customer privacy
• Absence of disconnected use
• Robustness
Cloud applications are typically less feature-rich than their desktop
counterparts (compare Google Docs with Microsoft Office).
• Privacy
Storing all data in the cloud leaves it prone to hacks.
• Reliability
Recovery during server downtime is difficult.
Applicability of SaaS
• Enterprise software applications
• Perform business functions
• Organize internal and external information
• Share data among internal and external users
• The most typical type of software applicable to the SaaS model
• Example: Salesforce.com CRM application, Siebel On-demand application
Applicability of SaaS (continued)
• Single-User software application
• Organize personal information
• Run on users’ own local computer
• Serve only one user at a time
• Inapplicable to the SaaS model
• Data security issue
• Network performance issue
• Example: Microsoft office suite
Applicability of SaaS (continued)
• Infrastructure software
• Serve as the foundation for most other enterprise software application
• Inapplicable to the SaaS model
• Installation locally is required
• Form the basis to run other application
• Example: Windows XP, Oracle database
Applicability of SaaS (continued)
• Embedded Software
• Software component for embedded system
• Support the functionality of the hardware device
• Inapplicable to the SaaS model
• The embedded software and the hardware are combined and inseparable
• Example: software embedded in ATMs, cell phones, routers, medical
equipment, etc.
Types of SaaS
• Business Utility SaaS - Applications like Salesforce automation are
used by businesses and individuals for managing and collecting data,
streamlining collaborative processes and providing actionable
analysis. Popular use cases are Customer Relationship Management
(CRM), Human Resources and Accounting.
• Social Networking SaaS - Applications like Facebook are used by
individuals for networking and sharing information, photos, videos,
etc.
Cloud Computing
Agenda
• Overview
• Why do we need IaaS ?
• How IaaS meets cloud properties ?
• Enabling Techniques
• Virtualization Overview
• Terminology & Taxonomy
Overview
• Problems in the conventional case:
• Companies size their IT investment for peak capacity
• Lack of agility in the IT infrastructure
• Every company bears its own IT maintenance cost
• Constant exposure to hardware failure risk
• …etc.
• These IT complexities hold companies back!
Overview
• How can we solve these problems?
• Let's consider some kind of outsourcing solution:
• Somebody will handle on-demand capacity for me
• Somebody will handle highly available resources for me
• Somebody will handle hardware management for me
• Somebody will handle system performance for me
• Somebody will …
• Frankly, that would be a great solution IF there were
“somebody”.
• But who can be this “somebody”, and provide all these
services ?
Overview
• Infrastructure as a Service will be the salvation.
• IaaS cloud provider takes care of all the IT
infrastructure complexities.
• IaaS cloud provider provides all the infrastructure
functionalities.
• IaaS cloud provider guarantees qualified
infrastructure services.
• IaaS cloud provider charges clients according to their
resource usage.
• But what makes all of this happen so magically?
Virtualization
• Assume that you are going to be an IaaS cloud
provider.
• Then, what are the problems you are facing ?
• Clients will request different operating systems.
• Clients will request different storage sizes.
• Clients will request different network bandwidths.
• Clients will change their requests anytime.
• Clients will …
• Is there any good strategy ?
• Allocate a new physical machine for each incomer.
• Prepare a pool of pre-installed machines for different
requests.
• or …
Virtualization
• What if we allocate a new physical machine for each incomer ?
[Figure: each customer (one wanting Windows 7, another wanting Linux, and so on) is given a dedicated physical machine for every request]
Virtualization
• How about preparing a pool of pre-installed physical machines for all
kinds of request ?
[Figure: a pool of pre-installed machines (Windows + Office, Windows Server, Linux + OpenOffice, Linux Server) still cannot anticipate every request, e.g., a customer wanting Mac OS]
Virtualization
• Obviously, neither of the previous strategies will work.
• We need more powerful techniques to deal with that.
• Virtualization techniques will help.
• For computation resources
• Virtual Machine technique
• For storage resources
• Virtual Storage technique
• For communication resources
• Virtual Network technique
• A virtual machine (VM) is an operating system (OS) or application
environment that is installed on software, which imitates dedicated
hardware. The end user has the same experience on a virtual
machine as they would have on dedicated hardware.
• Virtual storage is the pooling of physical storage from multiple
network storage devices into what appears to be a
single storage device that is managed from a central console.
• Network virtualization is the process of combining hardware and
software network resources and network functionality into a single,
software-based administrative entity, a virtual network.
Overview
Why do we need IaaS ?
How IaaS meets cloud properties ?
Properties and Characteristics
Scalability & Elasticity in IaaS
• Clients should be able to dynamically increase or decrease the
amount of infrastructure resources in need.
• Large amount of resources provisioning and deployment should be
done in a short period of time, such as several hours or days.
• System behavior should remain identical at small scale and at large
scale.
Scalability & Elasticity
• How to approach scalability and elasticity in IaaS ?
• For computation resources :
• Dynamically create or terminate virtual
machines for clients on demand.
• Integrate hypervisors among all physical
machines to collaboratively control and
manage all virtual machines.
• For storage resources :
• Dynamically allocate or de-allocate virtual storage space for clients.
• Integrate all physical storage resources in the entire IaaS system
• Offer initial storage resources by thin provisioning technique.
• Thin provisioning is a method for optimizing utilization of available
storage in a shared storage environment: space is allocated to systems
flexibly, on a just-enough, just-in-time basis (see the sketch after this list).
• For communication resources :
• Dynamically connect or disconnect the linking state of virtual
networks for clients on demand.
• Dynamically divide the network request flow to different
physical routers to maintain access bandwidth.
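• The sketch referred to above: a toy thin-provisioned pool in which each client is promised a large logical size, but physical capacity is consumed only as data is actually written; all names and sizes are hypothetical.
Example (Python):
# Sketch: thin provisioning -- promise each client a large logical volume,
# but consume physical capacity only when data is actually written.
class ThinPool:
    def __init__(self, physical_gb: int):
        self.physical_gb = physical_gb
        self.used_gb = 0
        self.volumes = {}            # name -> [logical_gb, written_gb]

    def create_volume(self, name: str, logical_gb: int):
        # Creation is "free": nothing is consumed from the pool yet.
        self.volumes[name] = [logical_gb, 0]

    def write(self, name: str, gb: int):
        logical, written = self.volumes[name]
        if written + gb > logical:
            raise ValueError("volume is full")
        if self.used_gb + gb > self.physical_gb:
            raise RuntimeError("physical pool exhausted -- add capacity")
        self.volumes[name][1] += gb
        self.used_gb += gb

pool = ThinPool(physical_gb=100)
pool.create_volume("vm-a", logical_gb=80)   # promised 80 GB
pool.create_volume("vm-b", logical_gb=80)   # promised 80 GB (oversubscribed)
pool.write("vm-a", 10)
pool.write("vm-b", 5)
print(pool.used_gb)   # 15 -- only what was actually written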
Availability & Reliability
• What do availability and reliability mean in IaaS ?
• Clients should be able to access computation resources without
considering the possibility of hardware failure.
• Data stored in IaaS cloud should be able to be retrieved when
needed without considering any natural disaster damage.
• Communication capability and capacity should be maintained
without considering any physical equipment shortage.
Availability & Reliability
How to approach availability and reliability in IaaS ?
• For computation resources :
• Monitor each physical and virtual machine for
any possible failure.
• Regularly backup virtual machine system state
for disaster recovery.
• Migrate virtual machine among physical
machines for potential failure prevention.
• For storage resources :
• Maintain data pieces replication among different physical
storage devices.
• Regularly backup virtual storage data to geographical remote
locations for disaster prevention.
• For communication resources :
• Build a redundant connection system to improve robustness.
Manageability & Interoperability
• What do manageability and interoperability mean in IaaS ?
• Clients should be able to fully control the virtualized
infrastructure resources allocated to them.
• The state of every virtualized resource should be fully
monitored.
• Usage of infrastructure resources is recorded, and the billing
system converts this information into user charges.
Manageability & Interoperability
• How to approach manageability and interoperability in
IaaS ?
• For computation resources :
• Provide basic virtual machine operations, such as
creation, termination, suspension, resumption
and system snapshot.
• Monitor and record CPU and memory loading for
each virtual machine.
• For storage resources :
• Monitor and record storage space usage and read/write data access
from user for each virtual storage resource.
• Automatically allocate/de-allocate physical storage according to space
utilization.
• For communication resources :
• Monitor and record the network bandwidth consumption for each
virtual link.
Performance & Optimization
• What do performance and optimization mean in IaaS ?
• Physical resources should be highly utilized among different clients.
• Physical resources should form a large resource pool which provides high
computing power through parallel processing.
Performance & Optimization
• How to approach performance and optimization in IaaS ?
• For computation resources :
• Deploy virtual machine with load balancing consideration.
• Live migrate virtual machines among physical ones to balance
the system loading.
• For storage resources :
• Live migrate virtual storage among physical ones with different
performance level.
• For communication resources :
• Consider network bandwidth loading when deploying virtual machines
and storage.
• Dynamically migrate virtual machines or storage to balance network
flow.
Accessibility & Portability
• What do accessibility and portability mean in IaaS ?
• Clients should be able to control, manage and access infrastructure resources
in an easy way, such as the web-browser, without additional local software or
hardware installation.
• Provided infrastructure resources should be able to be reallocated or
duplicated easily.
Accessibility & Portability
• How to approach accessibility and portability in IaaS ?
• For computation resources :
• Cloud provider integrates virtual machine management and access through web-based
portal.
• Comply with virtual machine image standards for portability.
• For storage resources :
• Cloud provider integrates virtual storage management and access through web-based
portal.
• For communication resources :
• Cloud provider integrates virtual network management and access through web-based
portal.
Enabling Techniques
Virtualization
IaaS Architecture
• Infrastructure as a Service (IaaS) delivers computer infrastructure
to cloud users, typically a platform virtualization environment, as a
service.
• Virtualization is an enabling technique to provide an abstraction of
logical resources away from underlying physical resources.
Virtualization Overview
• What is virtualization ?
• Virtualization is the creation of a virtual (rather than physical) version of
something, such as an operating system, a server, a storage device or network
resources.
• It hides the physical characteristics of a resource from users, instead showing
another abstract resource.
• But where does virtualization come from?
• Virtualization is NOT a new idea in computer science.
• The virtualization concept comes from the component abstraction of system
design, and it has been adopted at many system levels.
• Now, let's take a look at our original system architecture!
Virtualization Overview
• System abstraction :
• Computer systems are built
on levels of abstraction.
• Higher levels of abstraction hide details at lower levels.
• Designers at each abstraction level make use of the functions
supported by the level below and provide another abstraction to
the level above.
• Example
• files are an abstraction of a
disk
Virtualization Overview
• Machine level abstraction:
• For OS developers, a
machine is defined by ISA
(Instruction Set
Architecture).
• This is the major division
between hardware and
software.
• Examples :
• X86
• ARM
• MIPS
Virtualization Overview
• OS level abstraction :
• For compiler or library
developers, a machine is
defined by ABI (Application
Binary Interface).
• This defines the basic OS
interface which may be
used by libraries or user programs.
• Examples :
• User ISA
• OS system call
Virtualization Overview
• Library level abstraction :
• For application developers,
a machine is defined by API
(Application Programming
Interface).
• This abstraction provides
the well-rounded
functionalities.
• Examples :
• User ISA
• Standard C library
• Graphical library
Virtualization Overview
• The concept of virtualization is everywhere !!
• In IaaS, we focus the virtualization granularity at each
physical hardware device.
• General virtualization implementation level :
• Virtualized instance
• Software virtualized hardware instance
• Virtualization layer
• Software virtualization implementation
• Abstraction layer
• Various types of hardware access interface
• Physical hardware
• Various types of infrastructure resources
• Different physical resources :
• Server, Storage and Network
Virtualization
Terminology & Taxonomy
Virtual Machine
• What is Virtual Machine (VM)?
• VM is a software implementation of a machine (i.e. a computer) that executes
programs like a real machine.
• Terminology :
• Host (Target)
• The primary environment, which will be the target of virtualization.
• Guest (Source)
• The virtualized environment, which will be the source of virtualization.
Emulation vs. Virtualization
• Emulation technique
• Simulate an independent environment where guest ISA and
host ISA are different.
• Example
• Emulate x86 architecture on ARM platform.
• Virtualization technique
• Simulate an independent environment where guest ISA and
host ISA are the same.
• Example
• Virtualize x86 architecture to multiple instances.
Process Virtual Machine
• Process virtual machine
• Usually execute guest applications with an ISA different from
host
• Couple at ABI(Application Binary Interface) level via runtime system
• Not persistent
System Virtual Machine
• System virtual machine
• Provide the entire operating system on same or different host
ISA
• Constructed at ISA level
• Persistent
Taxonomy
                 System Virtual Machine              Process Virtual Machine
Emulation        Transmeta Crusoe                    Multi-processing system
                 (emulate x86 on a VLIW CPU)
Virtualization   Xen, KVM, VMware                    JVM, Microsoft CLI
                 (x86 virtualization software)       (high-level language virtualization)
Virtual Machine Monitor
• What’s Virtual Machine Monitor (VMM) ?
• VMM or Hypervisor is the software layer providing the
virtualization.
• System architecture: [Figure: VM1, VM2 and VM3 running on top of the VMM (hypervisor), which runs on the physical hardware]
Virtualization Types
• Virtualization Types :
• Type 1 – Bare metal
• VMMs run directly on the host's hardware as a hardware control and guest operating
system monitor.
• Type 2 – Hosted
• VMMs are software applications running within a conventional operating system.
Virtualization Approaches
• Virtualization Approaches :
• Full-Virtualization
• VMM simulates enough hardware to allow an unmodified guest OS.
• Para-Virtualization
• VMM does not necessarily simulate hardware, but instead offers a special API that can
only be used by the modified guest OS.
Virtualization Approaches
• Full-Virtualization
Pros: No need to modify the guest OS
Cons: Significant performance hit
Virtualization Approaches
• Para-Virtualization
Pros: Lightweight and high performance
Cons: Requires modification of the guest OS
Examples
• Xen: Type 1 (bare-metal) virtualization, para-virtualization
• KVM: Type 2 (hosted) virtualization, full-virtualization
To be continued…
• Virtualization techniques
• Server virtualization, storage virtualization, and network virtualization
References
• James E. Smith and Ravi Nair, Virtual Machines: Versatile Platforms for
Systems and Processes, Morgan Kaufmann, 2005.
• Xen. http://www.xen.org/
• Kernel-based Virtual Machine (KVM). http://www.linux-kvm.org/page/Main_Page
• Wikipedia, the free encyclopedia.
• Some of the material and pictures were retrieved from the Internet.
UNIT-4
UNIT IV RESOURCE MANAGEMENT AND SECURITY IN CLOUD
Inter Cloud Resource Management – Resource Provisioning and
Resource Provisioning Methods – Global Exchange of Cloud Resources
– Security Overview – Cloud Security Challenges –Software-as-a-Service
Security – Security Governance – Virtual Machine Security – IAM –
Security Standards
• Inter-cloud Resource Management
• This section characterizes the various cloud service models and their
extensions. The cloud service trends are outlined. Cloud resource
management and intercloud resource exchange schemes are
reviewed. We will discuss the defense of cloud resources against
network threats
• Extended Cloud Computing Services
• There are six layers of cloud services, ranging from hardware and
network up through infrastructure and platform to software applications.
• We already introduced the top three service layers as SaaS, PaaS, and
IaaS, respectively.
• The cloud platform provides PaaS, which sits on top of the IaaS
infrastructure. The top layer offers SaaS.
• These must be implemented on the cloud platforms provided.
• Although the three basic models are dissimilar in usage, they are
built one on top of another.
• The implication is that one cannot launch SaaS applications without a
cloud platform.
• The cloud platform cannot be built if compute and storage
infrastructures are not there.
• The bottom three layers are more related to physical requirements.
• The bottommost layer provides Hardware as a Service (HaaS).
• The next layer is for interconnecting all the hardware components,
and is simply called Network as a Service (NaaS).
• Virtual LANs fall within the scope of NaaS.
• The next layer up offers Location as a Service (LaaS), which provides a
location service to house, power, and secure all the physical hardware
and network resources.
• Location as a Service (LaaS) is a new concept that combines the three
main categories of cloud computing services: infrastructure, software,
and platform as a service.
• Some authors say this layer provides Security as a Service.
• The cloud infrastructure layer can be further subdivided into Data as a
Service (DaaS) and Communication as a Service (CaaS), in addition to the
compute and storage in IaaS.
• cloud players are divided into three classes: (1) cloud service
providers and IT administrators, (2) software developers or vendors,
and (3) end users or business users.
• These cloud players vary in their roles under the IaaS, PaaS, and SaaS
models.
• The table entries distinguish the three cloud models as viewed by
different players.
• From the providers’ perspective, cloud infrastructure performance is
the primary concern.
• From the end users’ perspective, the quality of services, including
security, is the most important.
• Cloud Service Tasks and Trends
• Cloud services are introduced in five layers.
• The top layer is for SaaS applications, as further subdivided into the
five application areas, mostly for business applications.
• For example, CRM is heavily practiced in business promotion, direct
sales, and marketing services.
• Cloud CRM (or CRM cloud) means any customer relationship
management (CRM) technology where the CRM software, CRM tools
and the organization's customer data resides in the cloud and is
delivered to end-users via the Internet (see "cloud computing").
• Cloud CRM typically offers access to the application via Web-based
tools (or Web browser) logins where the CRM system administrator
has previously defined access levels across the organization.
• Employees can log in to the CRM system, simultaneously, from any
Internet-enabled computer or device.
• Often, cloud CRMs provide users with mobile apps to make it easier to
use the CRM on smartphones and tablets.
• CRM was the first SaaS offered successfully on the cloud.
• The approach is to widen market coverage by investigating customer
behaviors and revealing opportunities through statistical analysis.
• SaaS tools also apply to distributed collaboration, and financial and
human resources management.
• These cloud services have been growing rapidly in recent years.
• PaaS is provided by Google, Salesforce.com, and Facebook, among
others.
• IaaS is provided by Amazon, Windows Azure, and Rackspace, among
others.
• Collocation services require multiple cloud providers to work together
to support supply chains in manufacturing.
• Network cloud services provide communications such as those by
AT&T, Qwest, and AboveNet.
Software Stack for Cloud Computing
• You can categorise cloud services into one of three
offerings: software-as-a-service (SaaS), platform-as-a-service (PaaS)
and infrastructure-as-a-service (IaaS).
• They are called 'as a service' because businesses purchase them as
service products from third parties.
• Together, they are the cloud computing stack
• Despite the various types of nodes in the cloud computing cluster, the
overall software stacks are built from scratch to meet rigorous goals.
• Developers have to consider how to design the system to meet
critical requirements such as high throughput, and fault tolerance.
• Even the operating system might be modified to meet the special
requirement of cloud data processing.
• The platform for running cloud computing services can be either
physical servers or virtual servers.
• By using VMs, the platform can be flexible, that is, the running
services are not bound to specific hardware platforms.
• This brings flexibility to cloud computing platforms.
• The software layer on top of the platform is the layer for storing
massive amounts of data.
• This layer acts like the file system in a traditional single machine.
• Other layers running on top of the file system are the layers for
executing cloud computing applications.
• They include the database storage system, programming for large-
scale clusters, and data query language support.
• The next layers are the components in the software stack.
• Runtime Support Services
• As in a cluster environment, there are also some runtime supporting
services in the cloud computing environment.
• Cluster monitoring is used to collect the runtime status of the entire
cluster.
• Cloud monitoring is the process of evaluating, monitoring, and
managing cloud-based services, applications, and infrastructure.
Companies utilize various application monitoring tools to monitor
cloud-based applications.
• One of the most important facilities is the cluster job management
system introduced.
Cluster Job Management Systems
Job management is also known as workload management, load
sharing, or load management.
A Job Management System (JMS) should have three parts:
• A user server lets the user submit jobs to one or more queues, specify
resource requirements for each job, delete a job from a queue, and
inquire about the status of a job or a queue.
• A job scheduler performs job scheduling and queuing according to
job types, resource requirements, resource availability, and scheduling
policies.
• A resource manager allocates and monitors resources, enforces
scheduling policies, and collects accounting information.
• The scheduler queues the tasks submitted to the whole cluster and
assigns the tasks to the processing nodes according to node
availability.
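• As a rough, hypothetical sketch of this flow, the code below queues submitted jobs and dispatches each one to whichever node is currently free, loosely mirroring the user server, job scheduler, and resource manager roles.
Example (Python):
# Sketch of a cluster job scheduler: jobs are queued as they are
# submitted and dispatched to whichever node is currently free.
from collections import deque

class Scheduler:
    def __init__(self, nodes):
        self.free_nodes = deque(nodes)   # resource manager's view
        self.queue = deque()             # pending jobs

    def submit(self, job):
        # "User server" part: accept a job into the queue.
        self.queue.append(job)
        self.dispatch()

    def dispatch(self):
        # "Job scheduler" part: match queued jobs with free nodes.
        while self.queue and self.free_nodes:
            job, node = self.queue.popleft(), self.free_nodes.popleft()
            print(f"running {job} on {node}")

    def node_finished(self, node):
        # "Resource manager" part: reclaim the node and keep dispatching.
        self.free_nodes.append(node)
        self.dispatch()

s = Scheduler(["node-1", "node-2"])
for j in ("job-a", "job-b", "job-c"):
    s.submit(j)              # job-c waits until a node frees up
s.node_finished("node-1")    # job-c now runs on node-1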
• The distributed scheduler for the cloud application has special
characteristics that can support cloud applications, such as scheduling
the programs written in MapReduce style.
• The runtime support system keeps the cloud cluster working properly
with high efficiency.
• The runtime support manages an application's execution through
various stages of its lifecycle: deployment, initialization, execution and
termination.
• Runtime support is software, needed in browser-initiated applications
applied by thousands of cloud customers.
• The SaaS model provides the software applications as a service,
rather than letting users purchase the software.
• As a result, on the customer side, there is no upfront investment in
servers or software licensing.
• On the provider side, costs are rather low, compared with
conventional hosting of user applications.
• The customer data is stored in the cloud that is either vendor
proprietary or a publicly hosted cloud supporting PaaS and IaaS.
• Resource Provisioning and Platform Deployment
• The emergence of computing clouds suggests fundamental changes in
software and hardware architecture.
• Cloud architecture puts more emphasis on the number of processor
cores or VM instances.
• Parallelism is exploited at the cluster node level.
• In this section, we will discuss techniques to provision computer
resources or VMs.
• Provisioning of Compute Resources (VMs)
• Providers supply cloud services by signing SLAs with end users.
• The SLAs must commit sufficient resources such as CPU, memory, and
bandwidth that the user can use for a preset period.
• Underprovisioning of resources will lead to broken SLAs and
penalties.
• Overprovisioning of resources will lead to resource underutilization,
and consequently, a decrease in revenue for the provider.
• Deploying an autonomous system to efficiently provision resources to
users is a challenging problem.
• Autonomous systems are highly automated systems that leverage
machine learning and AI.
• "Enterprise processes are moving from automated to autonomous.
The difference is in automating some components of a process to
automating the entire process end to end,"
• The difficulty comes from the unpredictability of consumer demand,
software and hardware failures, heterogeneity of services, power
management, and conflicts in signed SLAs between consumers and
service providers.
• Efficient VM provisioning depends on the cloud architecture and
management of cloud infrastructures.
• Resource provisioning schemes also demand fast discovery of services
and data in cloud computing infrastructures.
• In a virtualized cluster of servers, this demands efficient installation of
VMs, live VM migration, and fast recovery from failures.
• To deploy VMs, users treat them as physical hosts with customized
operating systems for specific applications.
• A VMware template (also called a golden image) is a perfect, model
copy of a virtual machine (VM) from which an administrator can
clone, convert or deploy more virtual machines.
• A VMware template includes the virtual machine's virtual disks and
settings from its .vmx configuration file, managed with permissions.
Templates save time and avoid errors when configuring settings
• Users can choose different kinds of VMs from the templates.
• IBM’s Blue Cloud does not provide any VM templates.
• In general, any type of VM can run on top of Xen.
• To deploy VMs, users treat them as physical hosts with customized
operating systems for specific applications.
• For example, Amazon’s EC2 uses Xen as the virtual machine monitor
(VMM).
• Amazon Elastic Compute Cloud (Amazon EC2) is a web service that
provides secure, scalable compute capacity in the cloud.
• The same VMM is used in IBM’s Blue Cloud.
• In the EC2 platform, some predefined VM templates are also
provided.
• The provider should offer resource-economic services.
• Power-efficient schemes for caching, query processing, and thermal
management are mandatory due to increasing energy waste by heat
dissipation from data centers.
• Public or private clouds promise to streamline the on-demand
provisioning of software, hardware, and data as a service, achieving
economies of scale in IT deployment and operation.
• Resource Provisioning Methods
• Three cases of static cloud resource provisioning policies.
• In case (a), overprovisioning with the peak load causes heavy
resource waste (shaded area).
• In case (b), underprovisioning (along the capacity line) of resources
results in losses by both user and provider in that paid demand by the
users (the shaded area above the capacity) is not served and wasted
resources still exist for those demanded areas below the provisioned
capacity.
• In case (c), the constant provisioning of resources with fixed capacity
to a declining user demand could result in even worse resource
waste.
• The user may give up the service by canceling the demand, resulting
in reduced revenue for the provider.
• Both the user and provider may be losers in resource provisioning
without elasticity
• The demand-driven method provides static resources and has been
used in grid computing for many years.
• The event-driven method is based on predicted workload by time.
• The popularity-driven method is based on Internet traffic monitored.
We characterize these resource provisioning methods as follows.
• Demand-Driven Resource Provisioning
• This method adds or removes computing instances based on the
current utilization level of the allocated resources.
• The demand-driven method automatically allocates two Xeon
processors for the user application, when the user was using one
Xeon processor more than 60 percent of the time for an extended
period.
• In general, when a resource has surpassed a threshold for a certain
amount of time, the scheme increases that resource based on
demand.
• When a resource is below a threshold for a certain amount of time,
that resource could be decreased accordingly.
• Amazon implements such an auto-scale feature in its EC2 platform.
• This method is easy to implement.
• The scheme does not work out right if the workload changes abruptly.
• The x-axis is the time scale in milliseconds.
• In the beginning, heavy fluctuations of CPU load are encountered.
• All three methods have demanded a few VM instances initially.
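• Before turning to event-driven provisioning, here is a minimal sketch of the demand-driven threshold rule described above (scale out when sustained utilization exceeds an upper threshold, scale in when it falls below a lower one); the thresholds and function names are illustrative, not Amazon's actual auto-scale API.
Example (Python):
# Sketch of demand-driven provisioning: add an instance when sustained
# utilization crosses an upper threshold, remove one below a lower one.
UPPER, LOWER = 0.60, 0.20          # illustrative thresholds

def scale_decision(samples, instances, min_instances=1, max_instances=20):
    """samples: recent CPU utilization readings in the range 0.0 - 1.0."""
    avg = sum(samples) / len(samples)
    if avg > UPPER and instances < max_instances:
        return instances + 1        # demand sustained above threshold
    if avg < LOWER and instances > min_instances:
        return instances - 1        # resource idle for a while
    return instances                # within the band: do nothing

print(scale_decision([0.70, 0.80, 0.65], instances=1))   # -> 2 (scale out)
print(scale_decision([0.10, 0.15, 0.05], instances=3))   # -> 2 (scale in)
print(scale_decision([0.40, 0.50, 0.45], instances=2))   # -> 2 (no change)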
• Event-Driven Resource Provisioning
• This scheme adds or removes machine instances based on a specific
time event.
• Event-driven computing allows for functions to be executed on
arbitrary servers when triggered, and companies are billed only for
the duration of time it takes for the function to complete.
• The scheme works better for seasonal or predicted events such as
Christmastime in the West and the Lunar New Year in the East.
• During these events, the number of users grows before the event
period and then decreases during the event period.
• This scheme anticipates peak traffic before it happens.
• The method results in a minimal loss of QoS, if the event is predicted
correctly.
• Otherwise, wasted resources are even greater due to events that do
not follow a fixed pattern.
• Popularity-Driven Resource Provisioning
• In this method, Internet traffic is searched for the popularity of certain
applications, and instances are created according to popularity demand.
• The scheme anticipates increased traffic with popularity.
• Again, the scheme has a minimal loss of QoS, if the predicted popularity is
correct.
• Resources may be wasted if traffic does not occur as expected.
• In Figure 4.25(c), EC2 performance by CPU utilization rate (the dark curve
with the percentage scale shown on the left) is plotted against the number
of VMs provisioned (the light curves with scale shown on the right, with a
maximum of 20 VMs provisioned).
• Dynamic Resource Deployment
• The cloud uses VMs as building blocks to create an execution
environment across multiple resource sites.
• The InterGrid-managed infrastructure was developed by a Melbourne
University group.
• InterGrid: A Case for Internetworking of Islands of Grids.
• Grid Computing can be defined as a network of computers working
together to perform a task that would rather be difficult for a single
machine.
• All machines on that network work under the same protocol to act
like a virtual supercomputer.
• Computers on the network contribute resources like processing
power and storage capacity to the network.
• Dynamic resource deployment can be implemented to achieve
scalability in performance.
• The InterGrid is a Java-implemented software system that lets users
create execution cloud environments on top of all participating grid
resources.
• The architecture of InterGrid relies on InterGrid Gateways (IGGs) that
mediate access to resources of participating Grids.
• The InterGrid also aims at addressing the heterogeneity of hardware
and software within Grids.
• An intergrid gateway (IGG) allocates resources from a local cluster to
deploy applications in three steps:
• (1) requesting the VMs,
• (2) passing the leases, and
• (3) deploying the VMs as requested.
• Under peak demand, this IGG interacts with another IGG that can
allocate resources from a cloud computing provider.
• Peering is a voluntary interconnection of administratively
separate Internet networks for the purpose of exchanging traffic
between the users of each network.
• A grid has predefined peering arrangements with other grids, which
the IGG manages.
• Through multiple IGGs, the system coordinates the use of InterGrid
resources.
• An IGG is aware of the peering terms with other grids, selects suitable
grids that can provide the required resources, and replies to requests
from other IGGs.
• Request redirection policies determine which peering grid InterGrid
selects to process a request.
• An IGG can also allocate resources from a cloud provider.
• The cloud system creates a virtual environment to help users deploy
their applications.
• These applications use the distributed grid resources.
• The InterGrid allocates and provides a distributed virtual environment
(DVE).
• Distributed virtual environments (DVEs) are software systems that connect
geographically dispersed users into a shared virtual space and
support the interaction between the users and the shared world.
• DVEs have many applications in medicine, robotics, interactive
distance learning, and online communities.
• This is a virtual cluster of VMs that runs isolated from other virtual
clusters.
• A component called the DVE manager performs resource allocation
and management on behalf of specific user applications.
• The core component of the IGG is a scheduler for implementing
provisioning policies and peering with other gateways.
• The communication component provides an asynchronous message-
passing mechanism.
• Received messages are handled in parallel by a thread pool.
• Provisioning of Storage Resources
• The data storage layer is built on top of the physical or virtual servers.
• As the cloud computing applications often provide service to users,
it is unavoidable that the data is stored in the clusters of the cloud
provider.
• The service can be accessed anywhere in the world. One example is
e-mail systems.
• A typical large e-mail system might have millions of users and each
user can have thousands of e-mails and consume multiple gigabytes
of disk space.
• Another example is a web searching application.
• In storage technologies, hard disk drives may be augmented with
solid-state drives in the future.
• A hard disk drive (HDD) is an old-school storage device that uses
mechanical platters and a moving read/write head to access data. A
solid-state drive (SSD) is a newer, faster type of device that stores
data on instantly-accessible memory chips.
• What is an HDD?
• An HDD is a data storage device that lives inside the computer. It has
spinning disks inside where data is stored magnetically. The HDD has
an arm with several "heads" (transducers) that read and write data on
the disk. It is similar to how a turntable record player works, with an
LP record (hard disk) and a needle on an arm (transducers). The arm
moves the heads across the surface of the disk to access different
data.
• HDDs are considered a legacy technology, meaning they’ve been
around longer than SSDs. In general, they are lower in cost and are
practical for storing years of photos and videos or business files. They
are available in two common form factors: 2.5 inch (commonly used
in laptops) and 3.5 inch (desktop computers).
• What is an SSD?
• SSDs got their name—solid state—because they have no moving parts. In
an SSD, all data is stored in integrated circuits.
• This difference from HDDs has a lot of implications, especially in size and
performance.
• Without the need for a spinning disk, SSDs can go down to the shape and
size of a stick of gum (what’s known as the M.2 form factor) or even as
small as a postage stamp. Their capacity—or how much data they can
hold—varies, making them flexible for smaller devices, such as slim
laptops, convertibles, or 2 in 1s. And SSDs dramatically reduce access time
since users don’t have to wait for platter rotation to start up.
• SSDs are more expensive than HDDs per amount of storage (in gigabytes,
or GB, and terabytes, or TB), but the gap is closing as SSD prices begin to
drop.
• The biggest barriers to adopting flash memory in data centers have
been price, capacity, and, to some extent, a lack of sophisticated
query-processing techniques.
• Flash is a form of non-volatile, high-speed read and write media that
holds digital data.
• Types of Flash can be found in many diverse areas from USB drives to
smart phones. While SSD is a type of hard disk that instead of using
magnetic media to write, store and read data uses a form of Flash
memory.
• A distributed file system is very important for storing large-scale data.
• However, other forms of data storage also exist. Some data does not
need the namespace of a tree structure file system, and instead,
databases are built with stored data files.
• In cloud computing, another form of data storage is (Key, Value) pairs.
• Many cloud computing companies have developed large-scale data
storage systems to keep huge amount of data collected every day.
• For example, Google’s GFS stores web data and some other data, such
as geographic data for Google Earth.
• A similar system from the open source community is the Hadoop
Distributed File System (HDFS) for Apache.
• Hadoop is the open source implementation of Google’s cloud
computing infrastructure.
• Similar systems include Microsoft’s Cosmos file system for the cloud.
• Cosmos is Microsoft's internal data storage/query system for
analyzing enormous amounts (as in petabytes) of data.
• Although the storage service or distributed file system can be accessed
directly, cloud computing also provides some forms of structured or
semistructured database processing capability, similar to traditional
databases.
• For example, applications might want to process the information
contained in a web page.
• Web pages are an example of semistructured data in HTML format.
• If some forms of database capability can be used, application
developers will construct their application logic more easily.
• Another reason to build a database-like service in cloud computing is
that it will be quite convenient for traditional application developers
to code for the cloud platform.
• Databases are quite common as the underlying storage device
for many applications.
• Thus, such developers can think in the same way they do for
traditional software development.
• Hence, in cloud computing, it is necessary to build database-like,
large-scale systems on top of data storage or distributed file
systems.
• The scale of such a database might be quite large for processing
huge amounts of data.
• The main purpose is to store the data in structured or semi-
structured ways so that application developers can use it easily
and build their applications rapidly.
• Traditional databases hit a performance bottleneck when the system
is expanded to a larger scale.
• However, some real applications do not need such strong
consistency.
• The scale of such databases can be quite large.
• Typical cloud databases include BigTable from Google, SimpleDB
from Amazon, and the SQL service from Microsoft Azure.
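• To illustrate the (key, value) style of storage mentioned above, the sketch below shows a toy put/get store that partitions keys by hashing; real systems such as BigTable or SimpleDB distribute and replicate these partitions across many servers.
Example (Python):
# Toy sketch of a (key, value) store that partitions keys across servers
# by hashing -- the basic idea behind large-scale cloud data stores.
import hashlib

class KeyValueStore:
    def __init__(self, n_partitions=4):
        # Each dict stands in for one storage server/partition.
        self.partitions = [dict() for _ in range(n_partitions)]

    def _partition(self, key: str) -> dict:
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.partitions[h % len(self.partitions)]

    def put(self, key: str, value):
        self._partition(key)[key] = value

    def get(self, key: str, default=None):
        return self._partition(key).get(key, default)

store = KeyValueStore()
store.put("user/42/profile", {"name": "Asha"})
print(store.get("user/42/profile"))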
• Virtual Machine Creation and Management
• In this section, we will consider several issues for cloud infrastructure
management.
• First, we will consider the resource management of independent
service jobs.
• Then we will consider how to execute third-party cloud applications.
• Cloud-loading experiments were run by a Melbourne research group
on the French Grid'5000 system.
• This experimental setting illustrates VM creation and management.
• This case study example reveals major VM management issues and
suggests some plausible solutions for workload-balanced execution.
• Virtual Machine Manager
• The VM manager is the link between the gateway and resources.
• The gateway doesn’t share physical resources directly, but relies
on virtualization technology for abstracting them.
• Hence, the actual resources it uses are VMs.
• The manager manages VMs deployed on a set of physical
resources.
• The VM manager implementation is generic.
• Virtual Machine Templates
A VM template is analogous to a computer’s configuration and
contains a description for a VM with the following static
information:
• The number of cores or processors to be assigned to the VM
• The amount of memory the VM requires
• The kernel used to boot the VM’s operating system
• The disk image containing the VM’s file system
• The price per hour of using a VM
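This static description maps naturally onto a small record type; the following is a hedged sketch with hypothetical field values.
Example (Python):
# Sketch: a VM template as a static description from which many
# identical virtual machines can be instantiated.
from dataclasses import dataclass

@dataclass(frozen=True)
class VMTemplate:
    cores: int             # number of virtual CPUs assigned to the VM
    memory_mb: int         # amount of memory the VM requires
    kernel: str            # kernel used to boot the VM's operating system
    disk_image: str        # disk image containing the VM's file system
    price_per_hour: float  # price per hour of using such a VM

small = VMTemplate(cores=1, memory_mb=2048,
                   kernel="vmlinuz-5.15", disk_image="ubuntu-22.04.img",
                   price_per_hour=0.05)
print(small)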
Resource Provisioning and Resource Provisioning Methods
• The purpose of resource provisioning is to discover suitable
resources so that applications can run properly.
• The best results are obtained by using the most effective
resources.
• Discovering a reasonable and appropriate match between workloads
and resources is one of the main goals when scheduling the different workloads.
• To keep the quality of service high, parameters such as utility,
availability, reliability, time, security, price and CPU usage need
to be satisfied.
• Resource provisioning therefore directly affects the observed
performance of the various workloads.
• Static Resource Provisioning
• Resources are provisioned in advance for an application's expected
peak demand.
• This usually leads to misuse and wastage of resources, because the
workload rarely stays at the peak level.
• Despite this, the resource provider offers the maximum desired
resources in order to avoid service-level agreement (SLA)
violations.
• Dynamic Resource Provisioning
• Customer demands, requirements and workloads change rapidly,
so cloud computing relies on elasticity and a high degree of
automation in the way resources are provisioned.
• This is achieved by automatically scaling the resources assigned
to a particular customer up and down.
• The method matches the provisioned resources to the consumer's
current needs and demands in a more reasonable way.
• In this way, elasticity helps to overcome the problems of both
under-provisioning and over-provisioning.
• User Self-provisioning:
• With user self-provisioning (also known as cloud self-
service), the customer purchases resources from the cloud
provider through a web form, creating a customer account
and paying for resources with a credit card.
• The provider's resources are available for customer use
within hours, if not minutes
• Parameters of Resource Provisioning
• 1) Response time: the resource provisioning algorithm should respond
in minimum time after completing any task.
• 2) Minimize cost: the cost of cloud services should be low for the cloud
consumer.
• 3) Revenue maximization: the cloud service provider should earn
maximum revenue.
• 4) Fault tolerance: the algorithm should continue providing services
despite the failure of nodes.
• 5) Reduced SLA violation: the algorithm should be designed to
minimize SLA violations.
• 6) Reduced power consumption: virtual machine placement and
migration methods should consume little power.
• The basic model of resource provisioning in cloud
• Cloud users send their workload (e.g., a cloud application) to the
resource provisioning agent and interact with it.
• The resource provisioning agent (RPA) performs resource provisioning and
provides the most suitable resources according to the customer's
requirements.
• When the RPA receives a workload from a user, it contacts the
resource information centre (RIC), which holds information about
all the resources in the resource pool.
• The output is then produced according to the workload requirements
specified by the consumer.
• Through resource discovery, the available resources are identified
and a list of candidate resources is generated.
• Resource selection is then the procedure of choosing the most
appropriate workload-to-resource match based on the required
quality of service.
• Resource Provisioning Mechanisms
• 1. QoS-based RPM: provision and manage the different resources
appropriately before execution, so that the application delivers
optimal results to the end user.
• 2. Cost-based RPM: minimize the total resource provisioning cost,
including both over-provisioning and under-provisioning costs;
reducing this cost can effectively double the capacity available to
the application.
• 3. SLA-based RPM: relies on admission control to maximize revenue and
resource utilization while respecting the multiple types of SLA
requirements described by consumers.
• 4. Time-based RPM: minimizing execution time can double the
application capacity as well as minimize the overhead cost of
switching servers.
• 5. Energy-based RPM: enhance resource utilization while reducing
power consumption.
• 6. Dynamic RPM: decisions adapt to a changing environment, such as
electricity prices and user requirements, by fully or partially
sharing cloud computing service facilities with other consumers.
• 7. Adaptive RPM: virtualization-based methods provision resources
dynamically according to application needs, minimizing power and
energy consumption by maximizing server usage.
• 8. Optimization-based RPM: reduce the running cost of the consumer's
application (for example by optimizing energy use) while meeting the
required deadlines and ensuring that SLA objectives are not violated.
Global Exchange of Cloud Resources
• In order to support a large number of application service consumers from
around the world, cloud infrastructure providers (i.e., IaaS providers) have
established data centers in multiple geographical locations to provide
redundancy and ensure reliability in case of site failures.
• For example, Amazon has data centers in the United States (e.g., one on
the East Coast and another on the West Coast) and Europe.
• However, currently Amazon expects its cloud customers (i.e., SaaS
providers) to express a preference regarding where they want their
application services to be hosted.
• Amazon does not provide seamless/automatic mechanisms for scaling its
hosted services across multiple geographically distributed data centers.
• This approach has many shortcomings.
• First, it is difficult for cloud customers to determine in
advance the best location for hosting their services as they
may not know the origin of consumers of their services.
• Second, SaaS providers may not be able to meet the QoS
expectations of their service consumers originating from
multiple geographical locations.
• In addition, no single cloud infrastructure provider will be able
to establish its data centers at all possible locations
throughout the world.
• As a result, cloud application service (SaaS) providers will
have difficulty in meeting QoS expectations for all their
consumers.
• Hence, they would like to make use of services of multiple
cloud infrastructure service providers who can provide better
support for their specific consumer needs.
• This kind of requirement often arises in enterprises with
global operations and applications such as Internet services,
media hosting, and Web 2.0 applications.
• This necessitates federation of cloud infrastructure service
providers for seamless provisioning of services across
different cloud providers.
• By realizing the InterCloud architectural principles in
mechanisms in their offerings, cloud providers will be able to
dynamically expand or resize their provisioning capability
based on sudden spikes in workload demands, by leasing
available computational and storage capabilities from other
cloud service providers.
• They consist of client brokering and coordinator services that support
utility-driven federation of clouds: application scheduling, resource
allocation, and migration of workloads.
• The architecture cohesively couples the administratively and
topologically distributed storage and compute capabilities of clouds
as part of a single resource leasing abstraction.
• The system will ease the cross-domain capability integration for on-
demand, flexible, energy-efficient, and reliable access to the
infrastructure based on virtualization technology.
• The Cloud Exchange (CEx) acts as a market maker for
bringing together service producers and consumers.
• It aggregates the infrastructure demands from application
brokers and evaluates them against the available supply
currently published by the cloud coordinators.
• It supports trading of cloud services based on competitive
economic models such as commodity markets and auctions.
• CEx allows participants to locate providers and consumers
with fitting offers. Such markets enable services to be
commoditized, and thus will pave the way for creation of
dynamic market infrastructure for trading based on SLAs.
Security Overview
• Cloud computing security or, more simply, cloud security
refers to a broad set of policies, technologies, applications,
and controls utilized to protect virtualized IP, data,
applications, services, and the associated infrastructure of
cloud computing.
• Ensure Local Backup
• It is the essential precaution that one can take towards cloud
data security.
• Misuse of data is one thing, but losing valuable data from
your end may result in terrible consequences.
• Especially in the IT world, where information is everything
organizations depend upon; losing data files could not only
lead to a significant financial loss but may also attract legal
action.
• Avoid Storing Sensitive Information
• Many companies refrain from storing personal data on their servers,
and there is sense behind the decision: storing sensitive data
becomes a responsibility of the organization.
• A compromise of such data can lead to serious trouble for the
firm.
• Giants such as Facebook have been dragged to court over such
issues in the past. Additionally, uploading sensitive data is risky from
the customer's perspective too.
• Simply avoid storing such sensitive data in the cloud.
• Use Encryption
• Encrypting data before uploading it to the cloud is an excellent
precaution against threats from unwanted hackers.
• Use local encryption as an additional layer of security.
• Known as zero-knowledge encryption, this method will
even protect your data against the service providers and administrators
themselves.
• Therefore, choose a service provider who offers data encryption
as a prerequisite.
• Also if you’re already opting for an encrypted cloud service, having a
preliminary round of encryption for your files will give you a little
extra security.
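• A hedged sketch of such local (client-side) encryption before upload, assuming the third-party cryptography package; the upload call itself is only a hypothetical placeholder.
Example (Python):
# Sketch: encrypt a file locally before uploading it, so the cloud
# provider only ever stores ciphertext. Assumes the third-party
# "cryptography" package (pip install cryptography).
from cryptography.fernet import Fernet

def encrypt_file(path: str, key: bytes) -> bytes:
    with open(path, "rb") as f:
        return Fernet(key).encrypt(f.read())

def decrypt(ciphertext: bytes, key: bytes) -> bytes:
    return Fernet(key).decrypt(ciphertext)

if __name__ == "__main__":
    key = Fernet.generate_key()          # keep this key out of the cloud
    with open("report.txt", "w") as f:
        f.write("quarterly numbers")
    ciphertext = encrypt_file("report.txt", key)
    # upload_to_cloud(ciphertext)        # hypothetical upload call
    print(decrypt(ciphertext, key).decode())   # "quarterly numbers"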
• Additional Security Measures
• Although passwords are good for keeping data encrypted, applying
additional measures is also important.
• Encryption stops unauthorized access to data, but it does not
guarantee the data's continued existence.
• There is a chance that your data will get corrupted over time, or
that many people will have access to it, making password security
alone unreliable.
• Your cloud must be secured with antivirus programs, admin controls,
and other features that help protect data.
• A secure cloud system and its dedicated servers must use the right
security tools and must function according to privilege controls to
move data.
• SECURITY BEST PRACTICES
• Strategy & Policy
• A holistic cloud security program should account for ownership and
accountability (internal/external) of cloud security risks and for gaps in
protection/compliance, and should identify the controls needed to
mature security and reach the desired end state.
• Network Segmentation
• In multi-tenant environments, assess what segmentation is in place
between your resources and those of other customers, as well as
between your own instances.
• Leverage a zone approach to isolate instances, containers,
applications, and full systems from each other when possible.
• Identity and Access Management and Privileged Access
Management
• Enforce robust identity management and authentication
processes to ensure that only authorized users have access to
the cloud environment, applications, and data.
• Enforce least privilege to restrict privileged access and to
harden cloud resources (for instance, only expose resources
to the Internet as is necessary, and de-activate unneeded
capabilities/features/access).
• Ensure privileges are role-based, and that privileged access is
audited and recorded via session monitoring.
• Discover and Onboard Cloud Instances and Assets
• Once cloud instances, services, and assets are discovered
and grouped, bring them under management (i.e. managing
and cycling passwords, etc.).
• Discovery and onboarding should be automated as much as
possible to eliminate shadow IT.
• Password Control (Privileged and Non-Privileged
Passwords)
• Never allow the use of shared passwords. Combine
passwords with other authentication systems for sensitive
areas. Follow password management best practices.
• Vulnerability Management
• Regularly perform vulnerability scans and security audits,
and patch known vulnerabilities.
• Encryption
• Ensure your cloud data is encrypted, at rest, and in transit.
• Disaster Recovery
• Be aware of the data backup, retention, and recovery
policies and processes for your cloud vendor(s). Do they
meet your internal standards? Do you have break-glass
strategies and solutions in place?
• Monitoring, Alerting, and Reporting
• Implement continual security and user activity monitoring
across all environments and instances.
• Try to integrate and centralize data from your cloud provider
(if available) with data from in-house and other vendor
solutions, so you have a holistic picture of what is happening
in your environment.
• Cloud Security Challenges
• Here are the major security challenges that companies using
cloud infrastructure have to prepare for.
• Data breaches
• A data breach might be the primary objective of a targeted
attack or simply the result of human error, application
vulnerabilities, or poor security practices. It might involve
any kind of information that was not intended for public
release, including personal health information, financial
information, personally identifiable information, trade
secrets, and intellectual property. An organization’s cloud-
based data may have value to different parties for different
reasons.
• Access management
• Since the cloud enables access to a company's data from
anywhere, companies need to make sure that not everyone
has access to that data.
• This is done through various policies and guardrails that
ensure only legitimate users have access to vital information,
and bad actors are left out.
• Data encryption
• Implementing a cloud computing strategy means placing
critical data in the hands of a third party, so ensuring the
data remains secure both at rest (data residing on storage
media) as well as when in transit is of paramount
importance.
• Data needs to be encrypted at all times, with clearly defined
roles when it comes to who will be managing the encryption
keys.
• In most cases, the only way to truly ensure confidentiality of
encrypted data that resides on a cloud provider's storage
servers is for the client to own and manage the data
encryption keys.
• Denial of service (DoS/DDoS attacks)
• Distributed denial-of-service attack (DDoS), like any denial-
of-service attack (DoS), has as its final goal to stop the
functioning of the targeted site so that no one can access it.
• The services of the targeted host connected to the internet
are then stopped temporarily, or even indefinitely.
• Advanced persistent threats (APTs)
• APTs are a parasitical form of cyber attack that penetrates
systems to establish a foothold in the IT infrastructure of
target companies, from which they steal data.
• APTs pursue their goals stealthily over extended periods of
time, often adapting to the security measures intended to
defend against them.
• Vulnerabilities in the system:
• Shortfalls in virtual machines can be misused as vulnerabilities.
• Virtual machine vulnerabilities involve hypervisors, VM
hopping, virtual machine-based rootkits, denial-of-service
attacks, data leakage, and more.
• Well-known existing vulnerabilities in virtual machines
include buffer overflows, denial of service, execution of malicious
code, and privilege escalation.
• Another known vulnerability in VMware products is path
traversal. If it is exploited, the attacker is able to control the guest
VM image, break access controls, and disrupt operation if the VM
host is not disabled.
• Loss of data:
• Apart from the malicious attacks, the data could be lost
permanently owing to accidental deletions, a physical
catastrophe like the fire or the earthquake.
• It is recommended to follow best practices for business
continuity and disaster recovery so that operations are not hampered.
A few of the recommended methods are:
• Protect the data either at disk level or through scale-out storage
• Back up the data periodically to a cost-effective, lower-tier medium
• Use a journaled file system or checkpoint replication to enable
data recovery
• A journaling file system is a file system that keeps track of changes
not yet committed to the file system's main part by recording the
intentions of such changes in a data structure known as a "journal".
• Loss of Revenue:
• Whenever news hits the headlines about a company's data
breach, it invariably affects revenue, with a drop of about 50%
expected in the first quarter.
• Such a loss is very hard for a company to recover from.
• It is recommended that the company reduce unmanaged cloud
usage and thereby its associated risks.
• The IT teams must understand what data is uploaded and shared,
and enforce adequate security and governance policies to protect it.
• Companies must be aware of the risks associated with implementing
cloud services, mitigate them, and take a proactive approach to
securing data, thereby realizing the clear benefits of the cloud.
• Other Potential Threats
• Alongside the potential security vulnerabilities relating directly to
the cloud service, there are also a number of external threats
which could cause an issue. Some of these are:
• Man in the Middle attacks – where a third party manages to
become a relay of data between a source and a destination. If
this is achieved, the data being transmitted can be altered.
• Distributed Denial of Service – a DDoS attack attempts to knock
a resource offline by flooding it with too much traffic.
• Account or Service Traffic Hijacking – a successful attack of this
kind could provide an intruder with passwords or other access
keys which allow them access to secure data.
• Software-as-a-Service Security
• Cloud access security brokerage
• Cloud access security brokerages (CASBs) are the “integrated
suites” of the SECaaS world.
• CASB vendors typically provide a range of services designed
to help your company protect cloud infrastructure and data
in whatever form it takes.
• According to McAfee, CASBs “are on-premises or cloud-
hosted software that sit between cloud service consumers
and cloud service providers to enforce security, compliance,
and governance policies for cloud applications.”
• These tools monitor and act as security for all of a company’s
cloud applications.
• Single sign-on
• Single sign-on (SSO) services give users the ability to access
all of their enterprise cloud apps with a single set of login
credentials.
• SSO also gives IT and network administrators a better ability
to monitor access and accounts.
• Email security
• It may not be the first application that comes to mind when
you think about outsourcing security, but a massive amount
of data travels in and out of your business through cloud-
based email servers.
• SECaaS providers that focus on email security can protect
you from the menagerie of threats and risks that are an
intrinsic part of email like malvertising, targeted attacks,
phishing, and data breaches.
• Malvertising (a portmanteau of "malicious advertising") is
the use of online advertising to spread malware.
• Phishing is a cybercrime in which a target or targets are
contacted by email, telephone or text message by someone
posing as a legitimate institution to lure individuals into
providing sensitive data such as personally identifiable
information, banking and credit card details, and passwords.
• Some email security tools are part of a larger platform, while
other vendors offer it as a standalone solution.
• Website and app Security
• Beyond protecting your data and infrastructure when using
cloud-based applications, you also need to protect the apps
and digital properties that you own and manage—like your
website.
• This is another area where traditional endpoint and firewall
protection will still leave you vulnerable to attacks, hacks,
and breaches.
• Tools and services in this category are usually designed to
expose and seal vulnerabilities in your external-facing
websites, web applications, or internal portals and intranets.
• Network security
• Cloud-based network security applications help your
business monitor traffic moving in and out of your servers
and stop threats before they materialize.
• You may already use a hardware-based firewall, but with a
limitless variety of threats spread across the internet today,
it’s a good idea to have multiple layers of security.
• Network security as a service, of course, means the vendor
would deliver threat detection and intrusion prevention
through the cloud.
• Security Governance
• Cloud security governance refers to the management model
that facilitates effective and efficient security management
and operations in the cloud environment so that an
enterprise's business targets are achieved.
• An organization's board is responsible (and accountable to
shareholders, regulators and customers) for the framework
of standards, processes and activities that, together, make
sure the organization benefits securely from Cloud
computing.
• Various providers offer information, books, products and
services that help boards develop, implement and maintain a
Cloud governance framework.
• Trust boundaries in the Cloud
• Organisations are responsible for their own information. The
nature of Cloud computing means that at some point the
organisation will rely on a third party for some element of the
security of its data.
• The point at which the responsibility passes from your
organisation to your supplier is called the ‘trust boundary’ and it
occurs at a different point for Infrastructure as a Service (IaaS),
Platform as a Service (PaaS) and Software as a Service (SaaS).
• Cloud Controls Matrix
• The Cloud Security Alliance Cloud Controls Matrix (CCM) is
specifically designed to provide fundamental security principles
to guide cloud vendors and to assist prospective cloud customers
in assessing the overall security risk of a cloud provider.
• The Cloud Security Alliance (CSA) developed and maintains the
Cloud Controls Matrix, a set of additional information security
controls designed specifically for Cloud services providers (CSPs),
and against which customers can carry out a security audit.
• Cloud security certification
• The CSA offers an open Cloud security certification process:
STAR (Security, Trust and Assurance Registry).
• This scheme starts with self-assessment and progresses
through process maturity to an externally certified maturity
scheme, supported by an open registry of information about
certified organizations.
• Continuity and resilience in the Cloud
• Cloud service providers are as likely to suffer operational outages
as any other organization.
• Physical infrastructure can also be negatively affected.
• Buyers of Cloud services should satisfy themselves that their
CSPs are adequately resilient against operational risks.
• ISO22301 is an appropriate business continuity standard.
• Data protection in the Cloud
• Data Processing Agreement (the "DPA") is an enclosure to the Terms
of Service (hereinafter referred to as Terms), agreed between the
Data Controller and the Data Processor in connection with
registration for the Service and regulates in detail the measures for
processing personal related data under commission.
• UK organisations that store personal data in the Cloud or that use
a CSP must also comply with UK data protection law (the Data
Protection Act).
• However, since the GDPR came into effect on 25 May 2018, data
processors and data controllers are now accountable for the
security of the personal data they process.
• CSPs and organisations that use them will need to implement
appropriate technical and organisational measures to make sure
that processing meets the GDPR’s requirements and protects the
rights of data subjects.
• Enforcing cloud security governance policies
• As policies are developed, they need to be enforced.
• The enforcement of cloud security policies needs a
combination of people, processes, and technology working
together
• The people being stakeholders and the executive level
• The processes being the procedures for amending policies
when necessary, and
• The technology being the mechanisms that monitor
compliance with the policies.
• Each one of these factors is equally important
• yet some businesses still experience difficulties in enforcing
their frameworks due to a lack of support from stakeholders
and the executive level, failure to plan ahead for amending
policies when necessary or implementing inadequate
technologies for monitoring compliance with cloud security
governance policies.
• G-Cloud framework
• The UK government’s G-Cloud framework makes it faster and
cheaper for the public sector to buy Cloud services.
• Suppliers are approved by the Crown Commercial Service
(CCS) via the G-Cloud application process, which eliminates
the need for them to go through a full tender process
• Suppliers can sell Cloud services via an online catalogue
called the Digital Marketplace under three categories
• Cloud hosting – Cloud platform or infrastructure services.
• Cloud software – applications that are accessed over the
Internet and hosted in the Cloud.
• Cloud support – services to help buyers set up and maintain
their Cloud services.
• Virtual Machine Security
• There are challenges introduced by the dynamism of
virtualization in cloud:
• Dynamic relocation of virtual machines (VMs): Hypervisors
today move workloads based on the service level agreement
(SLA), energy policy, resiliency policy, and a host of other
reasons.
• Increased infrastructure layers to manage and protect:
• Depending on the type of cloud model in use, there are a large
number of additional infrastructure layers, such as gateways,
firewalls, and access routers, that need to be managed and
protected, while still allowing authorized users the access they
need to perform their tasks.
• Multiple operating systems and applications per server:
• On virtualized commodity hardware, multiple workloads on
a physical server run concurrently, with multiple operating
systems.
• Traditional security products encounter new challenges in the
virtualized world:
• Reconfiguration of virtual network: Some existing solutions might
require reconfiguration of the virtual network to allow for packet
sniffing and protocol examination.
• Packet sniffing is the act of capturing packets of data flowing across a
computer network.
• Visibility and control gaps
• Virtual servers not connected to the physical network are
invisible and unprotected.
• Lack of transparency.
• Static security controls are too rigid. As VMs are moved around
by the hypervisor, static controls need to be reapplied.
• No ability to deal with workload mobility exists.
• Resource overhead.
• Virtual Server Protection for VMware provides the following
benefits:
• Dynamic protection of every layer of infrastructure, mitigating
the risks introduced by virtualization.
• Meets regulatory and compliance requirements.
• Increases ROI of virtual infrastructure because it is easy to
deploy and maintain security.
• Integrated security benefits of Virtual Server Protection for VMware
are as follows:
• Transparency
• No reconfiguration of virtual network required
• No presence in guest OS
• Security consolidation
• Only one Security Virtual Machine (SVM) required per physical server
• 1:many protection ratio
• Automation
• Privileged presence gives SVM holistic view of the virtual network
• Protection applied automatically as each new VM comes online
• Efficiency
• Eliminates redundant processing tasks
• Protection for any guest OS
• IAM
• Identity and access management (IAM) in enterprise IT is about
defining and managing the roles and access privileges of individual
network users and the circumstances in which users are granted (or
denied) those privileges. Those users might be customers (customer
identity management) or employees (employee identity
management). The core objective of IAM systems is one digital identity
per individual. Once that digital identity has been established, it must
be maintained, modified and monitored throughout each user's
"access lifecycle".
• Need IAM?
• Identity and access management is a critical part of any enterprise
security plan, as it is inextricably linked to the security and
productivity of organizations in today’s digitally enabled economy.
• Enterprises use identity management to safeguard their information
assets against the rising threats of ransomware, criminal hacking,
phishing and other malware attacks.
• Access management: Access management refers to the processes
and technologies used to control and monitor network access.
• Access management features, such as authentication, authorization,
trust and security auditing, are part and parcel of the top ID
management systems for both on-premises and cloud-based systems.
• Active Directory (AD): Microsoft developed AD as a user-identity
directory service for Windows domain networks.
• It runs on Windows Server and allows administrators to manage
permissions and access to network resources. Though proprietary,
AD is included in the Windows Server operating system and is thus
widely deployed.
• Biometric authentication: A security process for authenticating users
that relies upon the user’s unique characteristics. Biometric
authentication technologies include fingerprint sensors, iris and retina
scanning, and facial recognition.
• Context-aware network access control: Context-aware network
access control is a policy-based method of granting access to network
resources according to the current context of the user seeking access.
For example, a user attempting to authenticate from an IP address
that hasn’t been whitelisted would be blocked.
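• As a toy illustration of such a policy check (not part of the original notes), the sketch below uses Python's ipaddress module to reject authentication attempts coming from outside an assumed allow-list; the networks shown are hypothetical.

    import ipaddress

    # Assumed allow-list of trusted networks (hypothetical values).
    ALLOWED_NETWORKS = [
        ipaddress.ip_network("10.0.0.0/8"),
        ipaddress.ip_network("203.0.113.0/24"),
    ]

    def is_access_allowed(client_ip):
        # Grant access only if the client address falls inside an allowed network.
        ip = ipaddress.ip_address(client_ip)
        return any(ip in net for net in ALLOWED_NETWORKS)

    print(is_access_allowed("203.0.113.42"))   # True  - inside the allow-list
    print(is_access_allowed("198.51.100.7"))   # False - blocked by the policy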
• Credential: An identifier employed by the user to gain access to a
network such as the user’s password, public key infrastructure
(PKI) certificate, or biometric information (fingerprint, iris scan).
• De-provisioning: The process of removing an identity from an ID
repository and terminating access privileges.
• Digital identity: The ID itself, including the description of the user
and his/her/its access privileges. (“Its” because an endpoint,
such as a laptop or smartphone, can have its own digital
identity.)
• Entitlement: The set of attributes that specify the access rights
and privileges of an authenticated security principal.
• Identity as a Service (IDaaS): Cloud-based IDaaS offers identity
and access management functionality to an organization’s
systems that reside on-premises and/or in the cloud.
• Identity lifecycle management: Identity lifecycle management
includes identity synchronization, provisioning, de-provisioning, and
the ongoing management of user attributes, credentials and
entitlements.
• Identity synchronization: The process of ensuring that multiple
identity stores—say, the result of an acquisition—contain consistent
data for a given digital ID.
• Lightweight Directory Access Protocol (LDAP): LDAP is an open,
standards-based protocol for managing and accessing a distributed
directory service, such as Microsoft's AD. It is an internet protocol that
works over TCP/IP and is used to access information from directories;
in practice it is most often used to query an Active Directory (a short
query sketch follows below).
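• A minimal sketch of an LDAP query from Python, assuming the third-party ldap3 library and a hypothetical server, service account and directory tree:

    from ldap3 import Server, Connection, ALL

    # Hypothetical directory server and service account.
    server = Server("ldap://ldap.example.org", get_info=ALL)
    conn = Connection(server, "cn=admin,dc=example,dc=org", "secret", auto_bind=True)

    # Look up a user entry and read a few attributes.
    conn.search("dc=example,dc=org", "(uid=alice)", attributes=["cn", "mail"])
    for entry in conn.entries:
        print(entry.cn, entry.mail)

    conn.unbind()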
• Multi-factor authentication (MFA): MFA is when more than just a
single factor, such as a user name and password, is required for
authentication to a network or system.
• At least one additional step is also required, such as receiving a code
sent via SMS to a smartphone, inserting a smart card or USB stick, or
satisfying a biometric authentication requirement, such as a
fingerprint scan.
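• As a small sketch of this "additional step", the code below uses the third-party pyotp library to generate and verify a time-based one-time password (TOTP), the mechanism behind many authenticator apps; the secret handling shown is illustrative only.

    import pyotp

    # The shared secret is provisioned once (e.g. via a QR code) and kept by both sides.
    secret = pyotp.random_base32()
    totp = pyotp.TOTP(secret)

    code = totp.now()          # six-digit code shown by the authenticator app
    print(totp.verify(code))   # True within the validity window, False otherwise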
• Password reset: In this context, it’s a feature of an ID
management system that allows users to re-establish their own
passwords, relieving the administrators of the job and cutting
support calls.
• The reset application is often accessed by the user through a
browser. The application asks for a secret word or a set of
questions to verify the user’s identity.
• Privileged account management:
• This term refers to managing and auditing accounts and data access
based on the privileges of the user.
• In general terms, because of his or her job or function, a privileged
user has been granted administrative access to systems.
• A privileged user, for example, would be able set up and delete user
accounts and roles.
• Provisioning: The process of creating identities, defining their access
privileges and adding them to an ID repository.
• Risk-based authentication (RBA):
• Risk-based authentication dynamically adjusts authentication
requirements based on the user’s situation at the moment
authentication is attempted.
• For example, when users attempt to authenticate from a geographic
location or IP address not previously associated with them, those
users may face additional authentication requirements.
• Single sign-on (SSO): A type of access control for multiple related
but separate systems.
• With a single username and password, a user can access a
system or systems without using different credentials.
• User behavior analytics (UBA):
• UBA technologies examine patterns of user behavior and
automatically apply algorithms and analysis to detect important
anomalies that may indicate potential security threats. UBA differs
from other security technologies, which focus on tracking devices or
security events.
• UBA is also sometimes grouped with entity behavior analytics and
known as UEBA.
• IAM vendors
• The identity and access management vendor landscape is a crowded one,
consisting of both pure-play providers and large vendors such as IBM,
Microsoft and Oracle. Below is a list of leading players based on Gartner's
Magic Quadrant for Access Management, Worldwide, which was published
in June 2017.
• Atos (Evidian)
• CA Technologies
• Centrify
• Covisint
• ForgeRock
• IBM Security Identity and Access Assurance
• i-Sprint Innovations
• Micro Focus
• Security Standards.
• In this document we focus on the following standards. This list is
based on input from the ETSI working group on standards and the list
of cloud standards published by NIST. We grouped closely related
standards together for the sake of brevity.
• HTML/XML
• WSDL/SOAP
• SAML/XACML
• OAuth/OpenID
• OData
• OpenStack
• CAMP
• CIMI
• ODCA – SuoM
• SCAP
• ISO 27001
• ITIL
• SOC
• Tier Certification
• CSA CCM
• Characteristics of standards
• For each standard we will look at some key characteristics.
• These characteristics are not intended as means of qualification.
• Below, for example, we may say that a standard is used only by a
limited number of organizations, or that a standard is not publicly
available, but this does not mean that the standard is inferior to, or
better than, other standards.
• Application domain
• We indicate the type of assets addressed by the standard, based on
the types of assets
• Infrastructure as a Service
• Platform as a Service
• Software as a Service
• Facilities
• Organization
• For example, we denote that the application domain of a standard is
IaaS, if the standard contains requirements for IaaS assets, such as
virtual machines or hypervisors.
• Similarly, we denote that a standard applies to Facilities if the
standard contains requirements for setting up or maintaining
facilities.
• Note that in the latter case the standard may be very relevant for
cloud computing services, without being specific to one type of cloud
service or the other.
• Usage/Adoption
• We indicate the estimated size of the user base, in terms of end-users
or services. We use three levels:
• Globally (xxx) – thousands of organizations worldwide
• Widely (xx) - hundreds of organizations, regional or worldwide
• Limited (x) – tens of organizations or less, for example in pilots
• Certification/auditing
• We indicate whether or not there is a certification framework to
certify compliance with the standard, or, alternatively, whether or not
it is common to have third-party audits to certify compliance.
• We use three levels:
• Common (xxx): Audits are common and certification frameworks
exist.
• Sometimes (xx): Audits of compliance to the standard are sometimes
carried out.
• Hardly (x): De-facto standard. There is no audit or certification
asserting compliance.
• Availability/Openness
• We indicate whether or not the standard is public and open, in terms
of access and in terms of the review process.
• We distinguish three levels:
• Fully open (xxx) - Open consultation for drafts (like W3C, IETF, OASIS,
etc.), and open access to final versions (or for a small fee, less than
100 euro).
• Partially open (xx) - Consultation is closed/membership, but there is
open access to the standard.
• Closed (x) – Consultations are not open to the public, and the
standard is not public either (or there is a substantial fee, more than
100 euro).
• Existing standards
• Cloud computing services are much more standardized than traditional IT
deployments, and most cloud computing services have highly
interoperable (and standard) interfaces.
• We mention some key standards, which allow customers to move
data and processes more easily to other providers or to fall back on
back-up services:
• Like for other products and services, contracts and/or SLAs for cloud
services are hardly standardized.
• The so-called ‘fine-print’ in contracts often hides important
conditions and exceptions, and the terminology used in contracts or
SLAs is often different from one provider to another.
• This means that customers have to read each contract and SLA in
detail and sometimes consult a legal expert to understand clauses.
• Even if the customer has access to legal advice, it is often
unpredictable how certain wordings in agreements will be interpreted
in court.
• The standardization of IT services in cloud computing might enable
further standardization of contracts and SLAs also.
• We mention standards that define specific interfaces or standard
service levels:
• HTML / XML allow users to integrate different cloud services
• WSDL/SOAP is an interface standard (which uses XML) which
enables interoperability between products and services
• OAuth/OpenID and SAML/XACML allows customers to
integrate a cloud service with other (existing) IDM solutions,
allowing easier integration of an identity provider with other
(existing) websites (SaaS for example).
• SAML/XACML provides users with an interface to manage
the provision of identification and user authentication
between user and provider.
• OData is a standard that allows customers to integrate an (IaaS or
SaaS) cloud service with other (existing) services, making the
integration of this kind of service easier.
• OVF is a standard format for virtual machines. OVF allows
customers to use existing virtual machines, and move virtual
machine images more easily from one provider to another.
• SUoM is a standard developed by the Open Data Center
Alliance, which describes a set of standard service
parameters for IaaS, in 4 different levels (Bronze, Silver, Gold,
Platinum), and covers among other things, security,
availability, and elasticity.
• OpenStack is a standard software stack for IaaS. The OpenStack
dashboard can also be used to monitor the usage of cloud
resources, and OpenStack provides a standard API for managing
cloud resources.
• CAMP provides users with artifacts and APIs to manage the
provision of resources of PaaS provider. During the life of the
service, CAMP supports the modification of PaaS resources,
according to user needs.
• CIMI provides users with an interface to manage the
provision of resources of IaaS provider. During the life of the
service, CIMI supports the modification of IaaS resources,
according to user needs.
UNIT V
CLOUD TECHNOLOGIES AND ADVANCEMENTS
• Hadoop
• Apache Hadoop is an open source software framework used to develop
data processing applications which are executed in a distributed computing
environment.
• Applications built using HADOOP are run on large data sets distributed
across clusters of commodity computers.
• Commodity computers are mainly useful for achieving greater
computational power at low cost.
• Similar to data residing in a local file system of a personal computer, in
Hadoop data resides in a distributed file system, called the Hadoop
Distributed File System (HDFS).
• Apache Hadoop consists of two sub-projects –
• Hadoop MapReduce:
• MapReduce is a computational model and software
framework for writing applications which are run on Hadoop.
• These MapReduce programs are capable of processing
enormous data in parallel on large clusters of computation
nodes.
• HDFS (Hadoop Distributed File System):
• HDFS takes care of the storage part of Hadoop applications.
• MapReduce applications consume data from HDFS.
• HDFS creates multiple replicas of data blocks and distributes
them on compute nodes in a cluster.
• This distribution enables reliable and extremely rapid
computations.
• NameNode and DataNodes
• HDFS has a master/slave architecture.
• A HDFS cluster consists of a single NameNode, a master server that
manages the file system namespace and regulates access to files by
clients.
• In addition, there are a number of DataNodes, usually one per node
in the cluster, which manage storage attached to the nodes that they
run on.
• HDFS exposes a file system namespace and allows user data to be
stored in files.
• Internally, a file is split into one or more blocks and these blocks are
stored in a set of DataNodes.
• The NameNode executes file system namespace operations like
opening, closing, and renaming files and directories.
• It also determines the mapping of blocks to DataNodes.
• The DataNodes are responsible for serving read and write requests
from the file system’s clients.
• The DataNodes also perform block creation, deletion, and replication
upon instruction from the NameNode.
Functions of NameNode:
• It is the master daemon that maintains and manages the DataNodes (slave
nodes)
• It records the metadata of all the files stored in the cluster, e.g.
The location of blocks stored, the size of the files, permissions, hierarchy,
etc. There are two files associated with the metadata:
• FsImage: It contains the complete state of the file system namespace since the start
of the NameNode.
• EditLogs: It contains all the recent modifications made to the file system with respect
to the most recent FsImage.
• It records each change that takes place to the file system metadata. For
example, if a file is deleted in HDFS, the NameNode will immediately record
this in the EditLog.
• It regularly receives a Heartbeat and a block report from all the
DataNodes in the cluster to ensure that the DataNodes are live.
• It keeps a record of all the blocks in HDFS and in which nodes these
blocks are located.
• The NameNode is also responsible to take care of
the replication factor of all the blocks.
• In case of DataNode failure, the NameNode chooses new
DataNodes for new replicas, balances disk usage and manages the
communication traffic to the DataNodes.
• The NameNode and DataNode are pieces of software designed to run on
commodity machines.
• These machines typically run a GNU/Linux operating system (OS). HDFS is
built using the Java language;
• Any machine that supports Java can run the NameNode or the DataNode
software.
• A typical deployment has a dedicated machine that runs only the
NameNode software.
• Each of the other machines in the cluster runs one instance of the
DataNode software.
• The architecture does not preclude running multiple DataNodes on the
same machine but in a real deployment that is rarely the case.
DataNodes
• DataNodes are the slave nodes in HDFS. Unlike the NameNode, a
DataNode is commodity hardware, that is, an inexpensive system that is
not of high quality or high availability. The DataNode is a block server
that stores the data in the local file system (ext3 or ext4).
• Functions of DataNode:
• These are slave daemons or process which runs on each slave machine.
• The actual data is stored on DataNodes.
• The DataNodes perform the low-level read and write requests from the file
system’s clients.
• They send heartbeats to the NameNode periodically to report the overall
health of HDFS, by default, this frequency is set to 3 seconds
• The existence of a single NameNode in a cluster greatly simplifies the
architecture of the system.
• The NameNode is the arbitrator and repository for all HDFS metadata.
• The system is designed in such a way that user data never flows
through the NameNode.
• Secondary NameNode:
• Apart from these two daemons, there is a third daemon or a process
called Secondary NameNode. The Secondary NameNode works
concurrently with the primary NameNode as a helper daemon. And
don’t be confused about the Secondary NameNode being a backup
NameNode because it is not.
• Functions of Secondary NameNode:
• The Secondary NameNode constantly reads the file system state
and metadata from the RAM of the NameNode and writes it to the
hard disk or the file system.
• It is responsible for combining the EditLogs with FsImage from the
NameNode.
• It downloads the EditLogs from the NameNode at regular intervals
and applies to FsImage. The new FsImage is copied back to the
NameNode, which is used whenever the NameNode is started the
next time
• Blocks:
• As we know, the data in HDFS is scattered across the DataNodes as
blocks. Let's have a look at what a block is and how it is formed.
• Blocks are nothing but the smallest contiguous locations on your
hard drive where data is stored. In general, in any file system, you
store data as a collection of blocks. Similarly, HDFS stores each file as
blocks, which are scattered throughout the Apache Hadoop cluster.
The default size of each block is 128 MB in Apache Hadoop 2.x (64 MB
in Apache Hadoop 1.x), which you can configure as per your
requirement.
• It is not necessary that each file in HDFS is stored in an exact multiple
of the configured block size (128 MB, 256 MB, etc.). For example, take
a file "example.txt" of size 514 MB and suppose we are using the
default block size of 128 MB. How many blocks will be created? Five:
the first four blocks will be 128 MB each, but the last block will be
only 2 MB (the small worked calculation below confirms this).
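• The block count from the example can be checked with a few lines of Python; this is just a worked calculation, not HDFS code.

    import math

    file_size_mb = 514
    block_size_mb = 128        # default block size in Hadoop 2.x

    num_blocks = math.ceil(file_size_mb / block_size_mb)
    last_block_mb = file_size_mb - (num_blocks - 1) * block_size_mb

    print(num_blocks)      # 5
    print(last_block_mb)   # 2  -> the last block holds only 2 MB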
• The File System Namespace
• HDFS supports a traditional hierarchical file organization.
• A user or an application can create directories and store files inside
these directories.
• The file system namespace hierarchy is similar to most other existing
file systems;
• one can create and remove files, move a file from one directory to
another, or rename a file.
• HDFS supports user access permissions.
• While HDFS follows naming convention of the FileSystem, some paths
and names (e.g. /.reserved and .snapshot ) are reserved.
• The NameNode maintains the file system namespace.
• Any change to the file system namespace or its properties is recorded
by the NameNode.
• An application can specify the number of replicas of a file that should
be maintained by HDFS.
• The number of copies of a file is called the replication factor of that
file. This information is stored by the NameNode.
• Data Replication
• HDFS is designed to reliably store very large files across machines in a large
cluster.
• It stores each file as a sequence of blocks. The blocks of a file are
replicated for fault tolerance.
• The block size and replication factor are configurable per file.
• All blocks in a file except the last block are the same size
• HDFS provides a reliable way to store huge data in a distributed
environment as data blocks. The blocks are also replicated to provide fault
tolerance. The default replication factor is 3 which is again configurable.
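• Both parameters are cluster-wide configuration settings; an illustrative hdfs-site.xml fragment using the standard dfs.replication and dfs.blocksize properties (the values shown are the usual defaults, adjust as needed) might look like this:

    <!-- hdfs-site.xml (illustrative values) -->
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
      <property>
        <name>dfs.blocksize</name>
        <value>134217728</value>  <!-- 128 MB, expressed in bytes -->
      </property>
    </configuration>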
• Files in HDFS are write-once (except for appends and truncates) and
have strictly one writer at any time.
• The NameNode makes all decisions regarding replication of blocks.
• It periodically receives a Heartbeat and a Blockreport from each of
the DataNodes in the cluster.
• Receipt of a Heartbeat implies that the DataNode is functioning
properly.
• A Blockreport contains a list of all blocks on a DataNode
• Replication
• The placement of replicas is critical to HDFS reliability and
performance.
• Optimizing replica placement distinguishes HDFS from most other
distributed file systems.
• This is a feature that needs lots of tuning and experience.
• The purpose of a rack-aware replica placement policy is to improve
data reliability, availability, and network bandwidth utilization.
• Now, the following protocol will be followed whenever the data is
written into HDFS:
• At first, the HDFS client will reach out to the NameNode for a Write
Request against the two blocks, say, Block A & Block B.
• The NameNode will then grant the client the write permission and
will provide the IP addresses of the DataNodes where the file blocks
will be copied eventually.
• The selection of the DataNodes' IP addresses is based on availability,
the replication factor and the rack awareness that we have discussed
earlier.
• Let’s say the replication factor is set to default i.e. 3. Therefore, for
each block the NameNode will be providing the client a list of (3) IP
addresses of DataNodes. The list will be unique for each block.
• Suppose the NameNode provided the following lists of IP addresses to
the client:
• For Block A, list A = {IP of DataNode 1, IP of DataNode 4, IP of DataNode 6}
• For Block B, list B = {IP of DataNode 3, IP of DataNode 7, IP of DataNode 9}
• Each block will be copied to three different DataNodes to keep the
replication factor consistent throughout the cluster (a toy selection
sketch follows below).
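• The sketch below imitates, in plain Python, how a NameNode-like component might hand back a distinct list of target DataNodes per block for a replication factor of 3; it is purely illustrative, and real HDFS additionally applies the rack-aware placement policy described earlier.

    import random

    DATANODES = ["10.0.0.%d" % i for i in range(1, 10)]   # hypothetical DataNode IPs
    REPLICATION_FACTOR = 3

    def pick_targets(block_id):
        # Choose three distinct DataNodes to hold replicas of this block.
        return random.sample(DATANODES, REPLICATION_FACTOR)

    for block in ["Block A", "Block B"]:
        print(block, "->", pick_targets(block))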
• Replica Selection
• To minimize global bandwidth consumption and read latency, HDFS
tries to satisfy a read request from a replica that is closest to the
reader.
• If HDFS cluster spans multiple data centers, then a replica that is
resident in the local data center is preferred over any remote replica.
MapReduce
• Hadoop MapReduce (Hadoop Map/Reduce) is a software framework
for distributed processing of large data sets on computing clusters.
• It is a sub-project of the Apache Hadoop project.
• Apache Hadoop is an open-source framework that allows to store and
process big data in a distributed environment across clusters of
computers using simple programming models.
• MapReduce is the core component for data processing in Hadoop
framework.
• MapReduce helps split the input data set into a number of parts
and run a program on all the parts in parallel at once.
• The term MapReduce refers to two separate and distinct tasks.
• The first is the map operation, which takes a set of data and converts
it into another set of data, where individual elements are broken down
into tuples (key/value pairs).
• The reduce operation combines those data tuples based on the key
and accordingly modifies the value of the key.
• Map Task
The Map task runs in the following phases:
a. RecordReader
• The recordreader transforms the input split into records.
• It provides the data to the mapper function in key-value pairs.
• Usually, the key is the positional information and value is the data
that comprises the record.
Types of Hadoop RecordReader in
MapReduce
• The RecordReader instance is defined by the InputFormat.
• By default, it uses TextInputFormat for converting data into a key-
value pair. TextInputFormat provides 2 types of RecordReaders:
i. LineRecordReader
ii. SequenceFileRecordReader
• b. Map
• In this phase, the mapper which is the user-defined function
processes the key-value pair from the recordreader.
• It produces zero or multiple intermediate key-value pairs.
• The key is usually the data on which the reducer function does the
grouping operation.
• And value is the data which gets aggregated to get the final result in
the reducer function.
• c. Combiner
• The combiner is actually a localized reducer which groups the data in
the map phase. It is optional.
• Combiner takes the intermediate data from the mapper and
aggregates them.
• It does so within the small scope of one mapper.
• In many situations, this decreases the amount of data needed to
move over the network. For example, moving (Hello World, 1) three
times consumes more network bandwidth than moving
(Hello World, 3).
Reduce Task
• The various phases in reduce task are as follows:
i. Shuffle and Sort
• The reducer starts with shuffle and sort step.
• This step sorts the individual data pieces into a large data list.
• The purpose of this sort is to collect the equivalent keys together.
• ii. Reduce
• The reducer performs the reduce function once per key
grouping.
• The framework passes the function key and an iterator object
containing all the values pertaining to the key.
• We can write reducer to filter, aggregate and combine data in a
number of different ways.
• Once the reduce function gets finished it gives zero or more key-
value pairs to the output format.
• iii. Output Format
• This is the final step.
• It takes the key-value pair from the reducer and writes it to the file by
the record writer.
• By default, it separates the key and value by a tab and each record by
a newline character.
• Final data gets written to HDFS (a word-count sketch using Hadoop
Streaming follows below).
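• To make the map and reduce phases concrete, below is a minimal word-count sketch written for Hadoop Streaming, which lets any executable act as mapper or reducer over standard input/output; the file names mapper.py and reducer.py are hypothetical, and the exact streaming-jar invocation depends on your Hadoop installation.

    # mapper.py - emits one "word<TAB>1" pair per word of the input
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print(word + "\t1")

    # reducer.py - sums the counts per word; the framework delivers keys sorted,
    # so all pairs for a given word arrive consecutively
    import sys

    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print(current_word + "\t" + str(current_count))
            current_word, current_count = word, int(count)
    if current_word is not None:
        print(current_word + "\t" + str(current_count))

• A job like this is typically launched with the hadoop-streaming JAR, passing mapper.py and reducer.py via the -mapper and -reducer options together with HDFS -input and -output paths.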
Virtual Box
• VirtualBox is open-source software for virtualizing the x86 computing
architecture.
• It acts as a hypervisor, creating a VM (Virtual Machine) in which the
user can run another OS (operating system).
• The operating system in which VirtualBox runs is called the "host" OS.
• The operating system running in the VM is called the "guest" OS.
VirtualBox supports Windows, Linux, or macOS as its host OS.
• Why Is VirtualBox Useful?
• One:
• VirtualBox allows you to run more than one operating system at a
time.
• This way, you can run software written for one operating system on
another (for example, Windows software on Linux or a Mac) without
having to reboot to use it (as would be needed if you used
partitioning and dual-booting).
• Two:
• By using a VirtualBox feature called “snapshots”, you can save a
particular state of a virtual machine and revert back to that state, if
necessary.
• This way, you can freely experiment with a computing environment.
• If something goes wrong (e.g. after installing misbehaving software or
infecting the guest with a virus), you can easily switch back to a
previous snapshot and avoid the need of frequent backups and
restores.
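• Snapshots can also be driven from the command line through the VBoxManage tool that ships with VirtualBox; the small Python wrapper below is only a sketch, and the VM and snapshot names are hypothetical.

    import subprocess

    VM_NAME = "ubuntu-test-vm"   # hypothetical virtual machine name

    def take_snapshot(name):
        # Equivalent to: VBoxManage snapshot "ubuntu-test-vm" take "<name>"
        subprocess.run(["VBoxManage", "snapshot", VM_NAME, "take", name], check=True)

    def restore_snapshot(name):
        # Roll the (powered-off) VM back to a previously saved state.
        subprocess.run(["VBoxManage", "snapshot", VM_NAME, "restore", name], check=True)

    take_snapshot("clean-install")
    # ... experiment freely inside the guest, then revert if something breaks:
    restore_snapshot("clean-install")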
• Three:
• Software vendors can use virtual machines to ship entire software
configurations. For example, installing a complete mail server solution
on a real machine can be a tedious task (think of rocket science!).
• With VirtualBox, such a complex setup (then often called an
“appliance”) can be packed into a virtual machine. Installing and
running a mail server becomes as easy as importing such an appliance
into VirtualBox.
• Along these same lines, I find the “clone” feature of virtual box just
awesome!
• Four:
• On an enterprise level, virtualization can significantly reduce
hardware and electricity costs.
• Most of the time, computers today only use a fraction of their
potential power and run with low average system loads.
• A lot of hardware resources as well as electricity is thereby wasted.
• So, instead of running many such physical computers that are only
partially used, one can pack many virtual machines onto a few
powerful hosts and balance the loads between them.
VirtualBox Terminology
• When dealing with virtualization, it helps to acquaint oneself with a bit of
crucial terminology, especially the following terms:
• Host Operating System (Host OS):
• The operating system of the physical computer on which VirtualBox
was installed. There are versions of VirtualBox for Windows, Mac OS ,
Linux and Solaris hosts.
• Guest Operating System (Guest OS):
• The operating system that is running inside the virtual machine.
• Virtual Machine (VM):
• We’ve used this term often already. It is the special environment that
VirtualBox creates for your guest operating system while it is running.
In other words, you run your guest operating system “in” a VM.
Normally, a VM will be shown as a window on your computer's
desktop, but depending on which of the various frontends of
VirtualBox you use, it can be displayed in full-screen mode or
remotely on another computer.
Google App Engine
• Google App Engine is a Platform as a Service and cloud computing
platform for developing and hosting web applications in Google-
managed data centers.
• App Engine is a fully managed, serverless platform for developing and
hosting web applications at scale.
• You can choose from several popular languages, libraries, and
frameworks to develop your apps, then let App Engine take care of
provisioning servers and scaling your app instances based on demand
• Originally, App Engine required that apps be written in Java or Python,
store data in Google BigTable and use the Google query language;
today it supports several language runtimes.
• Google App Engine provides more infrastructure than other
scalable hosting services such as Amazon Elastic Compute
Cloud (EC2).
• The App Engine also eliminates some system administration
and developmental tasks to make it easier to write scalable
applications.
• Google App Engine is free up to a certain amount of resource
usage.
• Users exceeding the per-day or per-minute usage rates for
CPU resources, storage, number of API calls or requests and
concurrent requests can pay for more of these resources.
• Modern web applications
• Quickly reach customers and end users by deploying web
apps on App Engine.
• With zero-config deployments and zero server management,
App Engine allows you to focus on writing code.
• Plus, App Engine automatically scales to support sudden
traffic spikes without provisioning, patching, or monitoring.
• Features
• Popular languages
• Build your application in Node.js, Java, Ruby, C#, Go, Python, or PHP—
or bring your own language runtime.
• Open and flexible
• Custom runtimes allow you to bring any library and framework to App
Engine.
• Fully managed
• A fully managed environment lets you focus on code while App
Engine manages infrastructure concerns.
• Powerful application diagnostics
• Use Cloud Monitoring and Cloud Logging to monitor the health and
performance of your app and Cloud Debugger and Error Reporting to
diagnose and fix bugs quickly.
• Application versioning
• Easily host different versions of your app, easily create development,
test, staging, and production environments.
• Application security
• Help safeguard your application by defining access rules with App
Engine firewall and leverage managed SSL/TLS certificates by default
on your custom domain at no additional cost.
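• To give a flavour of what a deployment looks like, a minimal App Engine app in the Python 3 standard environment consists of a small configuration file plus the application code; the snippet below is a hedged sketch with illustrative names, assuming Flask is listed in requirements.txt.

    # app.yaml
    runtime: python39

    # main.py
    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def hello():
        return "Hello from App Engine!"

• The app is then typically deployed with the gcloud app deploy command, after which App Engine provisions and scales instances automatically.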
• Advantages of Google App Engine
• There are many advantages to Google App Engine that help take
your app ideas to the next level. These include:
• Infrastructure for Security
• The Internet infrastructure that Google operates around the world is
probably among the most secure. There has rarely been any
unauthorized access to date, as application data and code are stored
in highly secure servers.
• Quick to Start
• With no product or hardware to purchase and maintain, you can
prototype and deploy the app to your users without taking much
time.
• Easy to Use
• Google App Engine (GAE) incorporates the tools that you need to
develop, test, launch, and update the applications.
• Scalability
• Regardless of the amount of data or number of users that your app
stores, the app engine can meet your needs by scaling up or down as
required.
• Performance and Reliability
• Google is among the leaders worldwide among global brands. So, when
you discuss performance and reliability you have to keep that in mind. In
the past 15 years, the company has created new benchmarks based on its
services’ and products’ performance. The app engine provides the same
reliability and performance as any other Google product.
• Cost Savings
• You don’t have to hire engineers to manage your servers or to do that
yourself. You can invest the money saved into other parts of your business.
• Platform Independence
• You can move all your data to another environment without any difficulty
as there are not many dependencies on the app engine platform.
• Open Stack
• OpenStack is a free open standard cloud computing platform, mostly
deployed as infrastructure-as-a-service in both public and private
clouds where virtual servers and other resources are made available
to users.
• OpenStack is a set of software tools for building and managing cloud
computing platforms for public and private clouds.
• OpenStack is managed by the OpenStack Foundation, a non-profit
that oversees both development and community-building around the
project.
• Introduction to OpenStack
• OpenStack lets users deploy virtual machines and other instances that
handle different tasks for managing a cloud environment on the fly.
• It makes horizontal scaling easy, which means that tasks that benefit
from running concurrently can easily serve more or fewer users on
the fly by just spinning up more instances.
• For example, a mobile application that needs to communicate with a
remote server might be able to divide the work of communicating
with each user across many different instances, all communicating
with one another but scaling quickly and easily as the application
gains more users.
• And most importantly, OpenStack is open source software, which
means that anyone who chooses to can access the source code, make
any changes or modifications they need, and freely share these
changes.
• It also means that OpenStack has the benefit of thousands of
developers all over the world working in tandem to develop the
strongest, most robust, and most secure product that they can.
• How is OpenStack used in a cloud environment?
• The cloud is all about providing computing for end users in a remote
environment, where the actual software runs as a service on reliable and
scalable servers rather than on each end-user's computer.
• Cloud computing can refer to a lot of different things, but typically the
industry talks about running different items "as a service"—software,
platforms, and infrastructure.
• OpenStack is considered Infrastructure as a Service (IaaS).
• Providing infrastructure means that OpenStack makes it easy for users to
quickly add new instances, upon which other cloud components can run.
• Typically, the infrastructure then runs a "platform" upon which a developer
can create software applications that are delivered to the end users.
• What are the components of OpenStack?
• Because of its open nature, anyone can add additional components to
OpenStack to help it to meet their needs.
• But the OpenStack community has collaboratively identified nine key
components that are a part of the "core" of OpenStack, which are
distributed as a part of any OpenStack system and officially
maintained by the OpenStack community (a short usage sketch
follows the component list).
• Nova is the primary computing engine behind OpenStack. It is used
for deploying and managing large numbers of virtual machines and
other instances to handle computing tasks.
• Swift is a storage system for objects and files.
• The OpenStack Object Store project, known as Swift, offers cloud
storage software so that you can store and retrieve lots of data with a
simple API.
• It's built for scale and optimized for durability, availability, and
concurrency across the entire data set.
• Swift is ideal for storing unstructured data that can grow without
bound.
• Cinder is a block storage component, which is more analogous to the
traditional notion of a computer being able to access specific
locations on a disk drive. This more traditional way of accessing files
might be important in scenarios in which data access speed is the
most important consideration.
• Neutron provides the networking capability for OpenStack. It helps to
ensure that each of the components of an OpenStack deployment can
communicate with one another quickly and efficiently.
• Horizon is the dashboard behind OpenStack.
• It is the only graphical interface to OpenStack, so for users wanting to
give OpenStack a try, this may be the first component they actually
“see.”
• Developers can access all of the components of OpenStack
individually through an application programming interface (API), but
the dashboard provides system administrators a look at what is going
on in the cloud, and to manage it as needed.
• Keystone provides identity services for OpenStack. It is essentially a
central list of all of the users of the OpenStack cloud, mapped against
all of the services provided by the cloud, which they have permission
to use. It provides multiple means of access, meaning developers can
easily map their existing user access methods against Keystone.
• Glance provides image services to OpenStack. In this case, "images"
refers to images (or virtual copies) of hard disks.
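• These components are normally driven through their APIs. As a hedged illustration (the cloud, image, flavor and network names are hypothetical), the Python openstacksdk library can ask Nova to boot a server roughly as follows:

    import openstack

    # Credentials and endpoints are read from a clouds.yaml entry named "mycloud".
    conn = openstack.connect(cloud="mycloud")

    image = conn.compute.find_image("cirros")        # image catalogue served by Glance
    flavor = conn.compute.find_flavor("m1.small")    # instance sizing handled by Nova
    network = conn.network.find_network("private")   # virtual network from Neutron

    server = conn.compute.create_server(
        name="demo-instance",
        image_id=image.id,
        flavor_id=flavor.id,
        networks=[{"uuid": network.id}],
    )
    conn.compute.wait_for_server(server)             # block until the instance is ACTIVE
    print(server.name, "is running")

• Keystone sits behind the connect() call (authentication), and Horizon would show the resulting instance in the dashboard.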
• Prerequisite for minimum production deployment
• There are some basic requirements you’ll have to meet to deploy
OpenStack. Here are the prerequisites, drawn from the OpenStack
manual.
• Hardware: For the OpenStack controller node, 12 GB of RAM is needed,
as well as 30 GB of disk space to run OpenStack services. Two
SATA (Serial Advanced Technology Attachment) disks of 2 TB will be
necessary to store volumes used by instances. Communication with
compute nodes requires a network interface card (NIC) of 1 Gbps.
• Operating system (OS):
• OpenStack supports the following operating systems: Debian, Fedora,
Red Hat Enterprise Linux (RHEL), openSUSE, SLES Linux Enterprise
Server and Ubuntu.
• Federation in the Cloud
• Cloud federation is the practice of interconnecting
the cloud computing environments of two or more service providers
for the purpose of load balancing traffic and accommodating spikes in
demand. Cloud federation requires one provider to wholesale or rent
computing resources to another cloud provider.
• “Cloud federation manages consistency and access controls when two
or more independent geographically distinct Clouds share either
authentication, files, computing resources, command and control or
access to storage resources.”
• Cloud federation introduces additional issues that have to be
addressed in order to provide a secure environment in which to move
applications and services among a collection of federated providers.
• Baseline security needs to be guaranteed across all cloud vendors
that are part of the federation.
• An interesting aspect is represented by the management of the digital
identity across diverse organizations, security domains, and
application platforms.
• In particular, the term federated identity management refers to
standards-based approaches for handling authentication, single sign-
on (SSO), role-based access control, and session management in a
federated environment .
• No matter the specific protocol and framework, two main approaches can
be considered:
• Centralized federation model
• This is the approach taken by several identity federation standards. It
distinguishes two operational roles in an SSO transaction: the identity
provider and the service provider.
• Claim-based model
• This approach addresses the problem of user authentication from a
different perspective and requires users to provide claims answering who
they are and what they can do in order to access content or complete a
transaction.
• The first model is the one in use today; the second constitutes a
future vision for identity management in the cloud (sketched below).
• Digital identity management constitutes a fundamental aspect of
security management in a cloud federation.
• To transparently perform operations across different administrative
domains, a robust framework for authentication and authorization is
essential; federated identity management addresses this issue.
• Federated identity management allows us to tie together the
computing stacks of different vendors and present them as a single
environment to users from a security point of view.
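As an illustration of the claim-based model described above, the sketch below uses the PyJWT library: an identity provider issues a signed token carrying claims about who the user is and what the user may do, and a service provider in another administrative domain verifies the token instead of managing the user itself. The issuer, audience, roles and shared secret are invented example values.

```python
# Hedged sketch of the claim-based model: verify a signed token and read its
# claims. Issuer, audience, roles, and the secret are invented placeholders.
import jwt  # provided by the 'PyJWT' package

SHARED_SECRET = "federation-demo-secret"  # in practice, an asymmetric key pair

# The identity provider issues a token carrying claims about the user.
token = jwt.encode(
    {
        "iss": "https://idp.example.org",      # who issued the token
        "aud": "https://storage.example.net",  # which service may accept it
        "sub": "alice",                        # who the user is
        "roles": ["researcher", "uploader"],   # what the user may do
        "exp": 1893456000,                     # expiry (Unix timestamp)
    },
    SHARED_SECRET,
    algorithm="HS256",
)

# The service provider verifies the token and grants access based on the
# claims, without holding its own account database for federated users.
claims = jwt.decode(
    token,
    SHARED_SECRET,
    algorithms=["HS256"],
    audience="https://storage.example.net",
)
if "uploader" in claims["roles"]:
    print(f"{claims['sub']} may upload data")
```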
OpenNebula:
• OpenNebula is a cloud computing platform for managing
heterogeneous distributed data center infrastructures.
• The OpenNebula platform manages a data center's virtual
infrastructure, to build private, public and hybrid implementations of
Infrastructure as a Service.
• Much research work has been developed around OpenNebula.
• For example, the University of Chicago has come up with an advance
reservation system called Haizea Lease Manager.
• IBM Haifa has developed a policy-driven probabilistic admission
control and dynamic placement optimization engine for site-level
management policies, called the RESERVOIR Policy Engine.
• Nephele, an SLA-driven automatic service management tool, was
developed by Telefonica, while a Virtual Cluster Tool for atomic cluster
management with versioning over multiple transport protocols comes
from the CRS4 Distributed Computing Group.
Development
• OpenNebula follows a rapid release cycle to improve user satisfaction
by rapidly delivering features and innovations based on user
requirements and feedback.
• In other words, it delivers to customers what they want more quickly
and in smaller increments, while also increasing technical quality.
• Major upgrades generally occur every 3-5 years and each upgrade
generally has 3-5 updates.
• Cloud Federations and Server Coalitions
• In large-scale systems, coalition formation supports more
effective use of resources, as well as convenient means to access
these resources.
• It is therefore not surprising that coalition formation for
computational grids has been investigated in the past.
• The interest in grid computing is fading away, while cloud
computing is widely accepted today and its adoption by more
and more institutions and individuals seems to be guaranteed at
least for the foreseeable future.
• Two classes of applications of cloud coalitions are reported in the
literature:
• 1.Coalitions among CSPs for the formation of cloud federations. A
cloud federation is an infrastructure allowing a group of CSPs to share
resources; the goal is to balance the load and improve system
reliability.
• 2.Coalitions among the servers of a data center. The goal is to
assemble a pool of resources larger than the ones available from a
single server.
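A minimal sketch of the second class (server coalitions) follows; the server capacities, the request size, and the greedy strategy are all invented for illustration and stand in for whatever coalition-formation policy a real data center would use.

```python
# Hedged sketch: form a coalition of servers whose combined free capacity
# covers a request too large for any single server.
# Capacities (in vCPUs) and the request size are invented example values.

servers = {"s1": 16, "s2": 32, "s3": 8, "s4": 24}   # free vCPUs per server
request = 60                                        # vCPUs needed by one job

def form_coalition(free, needed):
    """Greedily add the largest servers until the request is covered."""
    coalition, total = [], 0
    for name, capacity in sorted(free.items(), key=lambda kv: -kv[1]):
        if total >= needed:
            break
        coalition.append(name)
        total += capacity
    return (coalition, total) if total >= needed else (None, total)

members, pooled = form_coalition(servers, request)
print("coalition:", members, "pooled capacity:", pooled)
# e.g. coalition: ['s2', 's4', 's1'] pooled capacity: 72
```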
• In recent years the number of CSPs has increased significantly. The
question of whether they should cooperate to share their resources led
to the idea of cloud federations: groups of CSPs that have agreed on a
set of common standards and are able to share their resources.
• Cloud coalition formation raises a number of technical, as well as
nontechnical problems.
• Cloud federations require a set of standards.
• The cloud computing landscape is still evolving, and an early
standardization may slow down and negatively affect the adoption of
new ideas and technologies.
• At the same time, CSPs want to maintain their competitive
advantages by closely guarding the details of their internal algorithms
and protocols.
• Four Levels of Federation
• Creating a cloud federation involves research and development at
different levels: conceptual, logical and operational, and
infrastructural.
• The figure provides a comprehensive view of the challenges faced in
designing and implementing an organizational structure that
coordinates cloud services belonging to different administrative
domains and makes them operate within the context of a single
unified service middleware.
• Each cloud federation level presents different challenges and
operates at a different layer of the IT stack.
• Each level therefore requires the use of different approaches and
technologies.
• CONCEPTUAL LEVEL
• The conceptual level addresses the challenges in presenting a cloud
federation as a favourable solution.
• At this level it is important to clearly identify the advantages for both
service providers and service consumers in joining a federation, and to
describe the new opportunities that a federated environment creates.
• Elements of concern at this level are:
• Motivations for cloud providers to join a federation.
• Motivations for service consumers to leverage a federation.
• Advantages for providers in leasing their services to other providers.
• Responsibilities of providers once they have joined the federation.
• Trust agreements between providers.
• Transparency versus consumers.
• Among these aspects, the most relevant are the motivations of both
service providers and consumers in joining a federation.
• LOGICAL & OPERATIONAL LEVEL
• The logical and operational level of a federated cloud identifies and
addresses the challenges in creating a framework that enables the
aggregation of providers belonging to different administrative domains.
• At this level, policies and rules for interoperation are defined.
• Moreover, this is the layer at which decisions are made as to how and
when to lease a service to—or to leverage a service from— another
provider.
• The logical component defines a context in which agreements among
providers are settled and services are conveyed, whereas the operational
component characterizes and shapes the dynamic behaviour of the
federation as a result of the single providers’ choices.
It is important at this level to address the following challenges:
• How should a federation be represented?
• How should we model and represent a cloud service, a cloud
provider, or an agreement?
• How should we define the rules and policies that allow providers
to join a federation?
• What are the mechanisms in place for settling agreements
among providers?
• What are providers’ responsibilities with respect to one another?
• When should providers and consumers take advantage of the
federation?
• Which kinds of services are more likely to be leased or
bought?
• How should we price resources that are leased, and which
fraction of resources should we lease?
• INFRASTRUCTURE LEVEL
• The infrastructural level addresses the technical challenges involved
in enabling heterogeneous cloud computing systems to interoperate
seamlessly.
• It deals with the technology barriers that keep separate cloud
computing systems belonging to different administrative domains.
• By having standardized protocols and interfaces, these barriers can be
overcome.
At this level it is important to address the following issues:
• What kind of standards should be used?
• How should interfaces and protocols be designed for interoperation?
• Which technologies should be used for interoperation?
• How can we realize a software system and design platform components
and services that enable interoperability?
Interoperation and composition among different cloud computing vendors is
possible only by means of open standards and interfaces. Moreover,
interfaces and protocols change considerably at each layer of the Cloud
Computing Reference Model.
Future of Federation
• The federated cloud model is a force for real democratization
in the cloud market.
• It’s how businesses will be able to use local cloud providers to
connect with customers, partners and employees anywhere in
the world.
• It’s how end users will finally get to realize the promise of the
cloud.
• And, it’s how data center operators and other service providers
will finally be able to compete with, and beat, today’s so-called
global cloud providers.
• Some see the future of cloud computing as one big public cloud.
• Others believe that enterprises will ultimately build a single large
cloud to host all their corporate services.
• This is, of course, because the benefit of cloud computing depends
on large – very large – scale infrastructure, which gives administrators
and service consumers ease of deployment, self-service, elasticity,
resource pooling and economies of scale.
• However, as cloud continues to evolve – so do the services being
offered.
• Cloud Services & Hybrid Clouds
• Services are now able to reach a wider range of consumers, partners,
competitors and public audiences.
• It is also clear that storage, compute power, streaming, analytics and
other advanced services are best served when they are in an
environment tailored for the proficiency of that service.
• One method of addressing the need of these service environments is
through the advent of hybrid clouds.
• Hybrid clouds, by definition, are composed of multiple distinct cloud
infrastructures connected in a manner that enables services and data
access across the combined infrastructure.
• The intent is to leverage the additional benefits that hybrid cloud
offers without disrupting the traditional cloud benefits.
• While hybrid cloud benefits come from the ability to distribute the
workload, the goal is to retain the ability to manage peaks in demand,
to make services available quickly and to capitalize on new business
opportunities.
• The Solution: Federation
• Federation creates a hybrid cloud environment with an increased focus on
maintaining the integrity of corporate policies and data.
• Think of federation as a pool of clouds connected through a channel of
gateways; these gateways can be used to optimize a cloud for a service or
a set of specific services.
• Such gateways can be used to segment service audiences or to limit access
to specific data sets.
• In essence, federation enables enterprises to serve their audiences with
economies of scale without exposing critical applications or vital data
through weak policies or vulnerabilities.
• Many would raise the question: if federation creates multiple clouds,
doesn’t that mean cloud benefits are diminished?
• I believe the answer is no, because a fundamental change has
transformed enterprises through the original adoption of cloud
computing, namely the creation of a flexible environment able to
adapt rapidly to changing needs based on policy and automation.
• Cloud end-users are often tied to a single cloud provider, because the
different APIs, image formats, and access methods exposed by
different providers make it very difficult for an average user to move
applications from one cloud to another, leading to a vendor lock-in
problem.
• Many SMEs have their own on-premise private cloud infrastructures
to support internal computing needs and workloads. These
infrastructures are often over-sized to satisfy peak demand periods
and to avoid performance slow-downs. The hybrid cloud (or
cloud-bursting) model is a solution that reduces the on-premise
infrastructure size, so that it can be dimensioned for an average load
and complemented with external resources from a public cloud
provider to satisfy peak demands, as sketched below.
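The sketch below illustrates one possible bursting policy under these assumptions; the utilization threshold and the two provisioning functions are hypothetical stand-ins for whatever on-premise monitor and public-cloud API a deployment actually uses.

```python
# Hedged sketch of a cloud-bursting policy: keep work on-premise while
# utilization is below a threshold, overflow to a public provider during peaks.
# The threshold and both provisioning functions are hypothetical stand-ins.

BURST_THRESHOLD = 0.80  # burst once the private cloud is 80% utilized

def provision_on_premise(workload_id: str) -> str:
    # Stand-in for a call to the local virtualization manager.
    return f"{workload_id} -> private cloud"

def provision_on_public_cloud(workload_id: str) -> str:
    # Stand-in for a call to an external provider's API.
    return f"{workload_id} -> public cloud (burst)"

def place_workload(private_utilization: float, workload_id: str) -> str:
    if private_utilization < BURST_THRESHOLD:
        return provision_on_premise(workload_id)
    return provision_on_public_cloud(workload_id)

print(place_workload(0.55, "batch-job-1"))   # stays on-premise
print(place_workload(0.92, "batch-job-2"))   # bursts to the public cloud
```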
• Many big companies (e.g. banks, hosting companies) and many large
institutions maintain several distributed data centers or server farms,
for example to serve multiple geographically distributed offices, to
implement high availability (HA), or to guarantee server proximity to
the end user.
• Resources and networks in these distributed data-centers are usually
configured as non-cooperative separate elements.
• Many educational and research centers deploy their own computing
infrastructures, which usually do not cooperate with other institutions
except in some specific situations (e.g. in joint projects or initiatives).
• Many times, even different departments within the same institution
maintain their own non-cooperative infrastructures. This study group
will evaluate the main challenges in enabling the provision of federated
cloud infrastructures, with special emphasis on inter-cloud
networking and security issues:
• Security and Privacy
• Interoperability and Portability
• Performance and Networking Cost
• The first key action aims at “Cutting through the Jungle of Standards”
to help the adoption of cloud computing by encouraging cloud
services to comply with standards and thus provide evidence of
compliance with legal and audit obligations.
• These standards aim to avoid customer lock in by promoting
interoperability, data portability and reversibility.
• The second key action “Safe and Fair Contract Terms and Conditions”
aims to protect the cloud consumer from insufficiently specific and
balanced contracts with cloud providers that do not “provide for
liability for data integrity, confidentiality or service continuity”.
• The cloud consumer is often presented with “take-it-or-leave-it”
standard contracts that might be cost-saving for the provider but are
often undesirable for the user.
• Interface: Different cloud service providers have different APIs, pricing
models and cloud infrastructures.
• An open cloud computing interface needs to be established to
provide a common application programming interface for multiple
cloud environments.
• The simplest solution is to use a software component that allows the
federated system to connect with a given cloud environment, as
sketched below.
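A hedged sketch of such a software component is shown below: an adapter layer that puts a common interface in front of provider-specific APIs so the federated system can talk to any member cloud in the same way. The provider classes, their methods, and the prices are invented for illustration.

```python
# Hedged sketch of an adapter layer: one common interface, one adapter per
# provider. Classes, methods, and prices are invented for illustration.
from abc import ABC, abstractmethod

class CloudAdapter(ABC):
    """Common interface the federation layer programs against."""

    @abstractmethod
    def start_instance(self, image: str, size: str) -> str: ...

    @abstractmethod
    def price_per_hour(self, size: str) -> float: ...

class ProviderAAdapter(CloudAdapter):
    def start_instance(self, image: str, size: str) -> str:
        return f"provider-A instance ({image}, {size})"

    def price_per_hour(self, size: str) -> float:
        return {"small": 0.05, "large": 0.20}[size]   # example prices

class ProviderBAdapter(CloudAdapter):
    def start_instance(self, image: str, size: str) -> str:
        return f"provider-B instance ({image}, {size})"

    def price_per_hour(self, size: str) -> float:
        return {"small": 0.04, "large": 0.25}[size]   # example prices

# The federation layer can now, for example, pick the cheapest member cloud.
adapters = [ProviderAAdapter(), ProviderBAdapter()]
cheapest = min(adapters, key=lambda a: a.price_per_hour("large"))
print(cheapest.start_instance("ubuntu-22.04", "large"))
```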
• Trusted Servers
• In order to make it easier to find people on other servers we introduced
the concept of “trusted servers” as one of our last steps.
• This allows administrators to define other servers they trust.
• If two servers trust each other they will sync their user lists.
• This way the share dialogue can auto-complete not only local users but
also users on other trusted servers.
• The administrator can decide to define the list of trusted servers manually
or allow the server to automatically add every other server to which at
least one federated share was successfully created.
• This way it is possible to let your cloud server learn about more and more
other servers over time, connect with them and increase the network of
trusted servers.
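The sketch below illustrates the idea of trusted servers exchanging user lists so that the share dialogue can auto-complete remote users. It is a simplification with invented data structures, not the wire protocol of any real federated file-sharing product.

```python
# Hedged sketch: trusted servers sync user lists so that share dialogues can
# auto-complete remote users. Data structures and names are invented.

class Server:
    def __init__(self, name, users):
        self.name = name
        self.users = set(users)   # local accounts
        self.trusted = set()      # names of servers this one trusts
        self.remote_users = {}    # server name -> known remote users

    def trust(self, other: "Server") -> None:
        """Mutual trust: both servers exchange their user lists."""
        self.trusted.add(other.name)
        other.trusted.add(self.name)
        self.remote_users[other.name] = set(other.users)
        other.remote_users[self.name] = set(self.users)

    def autocomplete(self, prefix: str) -> list:
        """Suggest local and remote users whose names start with prefix."""
        matches = [u for u in self.users if u.startswith(prefix)]
        for server, users in self.remote_users.items():
            matches += [f"{u}@{server}" for u in users if u.startswith(prefix)]
        return sorted(matches)

cloud_a = Server("cloud-a.example.org", ["alice", "albert", "bob"])
cloud_b = Server("cloud-b.example.net", ["alina", "carol"])
cloud_a.trust(cloud_b)
print(cloud_a.autocomplete("al"))
# ['albert', 'alice', 'alina@cloud-b.example.net']
```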
• Open Challenges: where we’re taking Federated Cloud Sharing
• Of course there are still many areas to improve.
• For example, the way you can discover users on different servers to
share with them, for which we are working on a global, shared address
book solution.
• Another point is that at the moment this is limited to sharing files.
• A logical next step would be to extend this to many other areas like
address books, calendars and to real-time text, voice and video
communication and we are, of course, planning for that.