CLOUD COMPUTING
A Project Report
Submitted by
AKANSHA TYAGI
90790308965
Guided by
Mr. Narendra Kirola
eRoads Technology
CERTIFICATE
Certified that the seminar work entitled CLOUD COMPUTING is a bonafide work carried out in the
eighth semester by Akansha Tyagi (SPCET-90790308965), in partial fulfilment of the requirements for the award of
the degree of Bachelor of Technology in Computer Science from Swami Paramand College of
Engineering and Technology during the academic year 2012-2013. The seminar work was carried out
under my guidance, and no part of this work has been submitted earlier for the award of any degree.
ACKNOWLEDGEMENT
I take this opportunity to express my deepest gratitude to my team leader and project guide,
Mr. Narendra Kirola (Sr. Network Engineer), for his able guidance and support in this
phase of transition from an academic to a professional life. His support and valuable inputs
helped me immensely in completing this project.
I would also like to show my deep sense of gratitude to my team members Mr. Nitish
Singh, Ms. Payal Sharma, Ms. Richa Mishra and Mr. Deepak Kumar at Eroads
Technology, Noida who helped me in ways of encouragement, suggestions and technical
inputs, thus contributing either directly or indirectly to the various stages of the project.
I am also grateful to Navin Kumar (NOC, Eroads Technology) for providing me this great
opportunity of industrial training at Eroads Technology.
I extend my heartiest thanks to Er. Dev Kant Sharma (HOD, Computer Science and
Engineering, SPCET) for providing me the necessary help to undergo this industrial/project
training at Eroads Technology, Noida.
And last, but not the least, I would like to thank the staff at Eroads Technology for being so
cordial and cooperative throughout the period of my training.
AKANSHA TYAGI
COMPUTER SCIENCE
CONTENTS
1. Introduction
2. Cloud Computing
   2.1 Characteristics of Cloud Computing
3.
4. History
5. Enabling Technologies
   5.1 Cloud Computing Application Architecture
   5.2 Server Architecture
   5.3 MapReduce
   5.4 Google File System
   5.5 Hadoop
6. Cloud Computing Services
   6.1 Amazon Web Services
   6.2 Google App Engine
7. Cloud Computing in the Real World
   7.1 Time Machine
   7.2 IBM Google University Academic Initiative
   7.3 SmugMug
   7.4 Nasdaq
8. Conclusion
9. References
ABSTRACT
Cloud computing is the delivery of computing as a service rather than a product,
whereby shared resources, software, and information are provided to computers and
other devices as a metered service over a network (typically the Internet).
Cloud computing provides computation, software, data access, and storage resources
without requiring cloud users to know the location and other details of the computing
infrastructure.
End users access cloud-based applications through a web browser or a lightweight
desktop or mobile app, while the business software and data are stored on servers at a
remote location. Cloud application providers strive to give the same or better service
and performance as if the software programs were installed locally on end-user
computers.
At the foundation of cloud computing is the broader concept of infrastructure
convergence (or Converged Infrastructure) and shared services. This type of data center
environment allows enterprises to get their applications up and running faster, with
easier manageability and less maintenance, and enables IT to more rapidly adjust IT
resources (such as servers, storage, and networking) to meet fluctuating and
unpredictable business demand.
This overview gives the basic concept, defines the terms used in the industry, and outlines the general
architecture and applications of Cloud computing. It gives a summary of Cloud Computing and
provides a good foundation for understanding.
Keywords: Grid, Cloud, Computing
1. INTRODUCTION
Cloud Computing, to put it simply, means Internet Computing. The Internet is commonly
visualized as clouds; hence the term cloud computing for computation done through the Internet.
With Cloud Computing users can access database resources via the Internet from anywhere, for as
long as they need, without worrying about any maintenance or management of the actual resources.
Besides, databases in the cloud are very dynamic and scalable. Cloud computing is unlike grid computing,
utility computing, or autonomic computing. In fact, it is a very independent platform in terms of
computing. The best example of cloud computing is Google Apps, where any application can be
accessed using a browser and can be deployed on thousands of computers through the Internet.
1.1 WHAT IS CLOUD COMPUTING?
Cloud computing provides the facility to access shared resources and common infrastructure, offering
services on demand over the network to perform operations that meet changing business needs. The
location of the physical resources and devices being accessed is typically not known to the end user. It
also provides facilities for users to develop, deploy and manage their applications on the cloud,
which entails virtualization of resources that maintain and manage themselves.
Some generic examples include:
Amazon's Elastic Compute Cloud (EC2), offering computational services that enable people to use
CPU cycles without buying more computers.
Storage services such as those provided by Amazon's Simple Storage Service (S3).
Companies like Nirvana allowing users to store their data in the cloud.
Over time many big Internet-based companies (Amazon, Google) have come to realise that only a
small amount of their data storage capacity is being used. This has led to the renting out of space and
the storage of information on remote servers, or "clouds".
Data Cloud: Along with services, the cloud will host data. There has been some discussion of this
being a potentially useful notion, possibly aligned with the Semantic Web, though it could result in data
becoming undifferentiated.
1.4 CLOUD COMPUTING ARCHITECTURE
Cloud computing architecture, just like any other system, is categorized into two main sections: Front
End and Back End.
The Front End is the end user, client or any application (e.g. a web browser) that uses cloud
services. The Back End is the network of servers with the computer programs and data storage systems. It is
usually assumed that the cloud contains practically infinite storage
capacity for any software available in the market.
The cloud has different applications that are hosted on their own dedicated server farms.
The cloud has a centralized server administration system. The centralized server administers the
system, balances client demand against supply, monitors traffic and avoids congestion. This server
follows protocols commonly known as middleware. Middleware controls the communication among the computers in the cloud
network.
Cloud architecture rests on a very important assumption, which is mostly true: the
demand for resources from any single client is not constant.
Because of this, dedicated servers would be unable to run at their full capacity. To avoid this
scenario, the server virtualization technique is applied. In server virtualization, all physical
servers are virtualized and each runs multiple virtual servers with either the same or different applications.
2. Cloud Computing
Cloud computing can be described as a set of services offering compute, network and storage capacity where: hardware management is highly
abstracted from the buyer; buyers incur infrastructure costs as variable OPEX (operating
expenditure); and infrastructure capacity is highly elastic (up or down).
The cloud model differs from traditional outsourcing in that customers do not hand over their own IT
resources to be managed. Instead they plug into the cloud, treating it as they would an internal data
center or computer providing the same functions.
Large companies can afford to build and expand their own data centers, but small- to medium-size
enterprises often choose to house their IT infrastructure in someone else's facility. A collocation center
is a type of data center where multiple customers locate network, server and storage assets, and
interconnect to a variety of telecommunications and other network service providers with a minimum
of cost and complexity. A selection of companies in the collocation and cloud arena is presented in
Table 1.
Amazon has a head start but well known companies such as Microsoft, Google, and Apple have joined
the fray.
Although not all the companies selected for Table 1 would agree on the definitions given in this article,
it is generally supposed that there are three basic types of cloud computing: Infrastructure as a Service
(IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS). In IaaS, CPUs, grids or clusters,
virtualized servers, memory, networks, storage and systems software are delivered as a service.
Perhaps the best known examples are Amazon's Elastic Compute Cloud (EC2) and Simple Storage
Service (S3), but traditional IT vendors such as IBM, and telecoms providers such as AT&T and
Verizon are also offering solutions. Services are typically charged by usage and can be scaled
dynamically, i.e. capacity can be
increased or decreased more or less on demand.
PaaS provides virtualized servers on which users can run applications, or develop new ones, without
having to worry about maintaining the operating systems, server hardware, load balancing or
computing capacity. Well-known examples include Microsoft's Azure and Salesforce's Force.com.
Microsoft Azure provides database and platform services starting at $0.12 per hour for compute
infrastructure; $0.15 per gigabyte for storage; and $0.10 per 10,000 transactions. For SQL Azure, a
cloud database, Microsoft is charging $9.99 for a Web Edition, which comprises up to a 1 gigabyte
relational database, and $99.99 for a Business Edition, which holds up to a 10 gigabyte relational
database. For .NET Services, a set of Web-based developer tools for building cloud-based applications,
Microsoft is charging $0.15 per 100,000 message operations.
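To make these list prices concrete, the short sketch below estimates a monthly Azure bill from the per-unit rates quoted above; the workload figures (instance-hours, gigabytes stored, transaction count) are purely hypothetical and only illustrate the arithmetic.

    # Rough monthly cost estimate from the Azure list prices quoted above.
    # All workload figures below are hypothetical.
    COMPUTE_PER_HOUR = 0.12          # $ per hour of compute
    STORAGE_PER_GB = 0.15            # $ per gigabyte stored
    PER_10K_TRANSACTIONS = 0.10      # $ per 10,000 storage transactions

    def monthly_cost(compute_hours, storage_gb, transactions):
        """Return the estimated monthly bill in dollars."""
        compute = compute_hours * COMPUTE_PER_HOUR
        storage = storage_gb * STORAGE_PER_GB
        txn = (transactions / 10000.0) * PER_10K_TRANSACTIONS
        return compute + storage + txn

    # Example: two small instances running all month (about 730 hours each),
    # 50 GB of stored data and 3 million transactions.
    print(monthly_cost(2 * 730, 50, 3000000))   # -> 212.70 dollars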
SaaS is software that is developed and hosted by the SaaS vendor and which the end user accesses over
the Internet. Unlike traditional applications that users install on their computers or servers, SaaS
software is owned by the vendor and runs on computers in the vendor's data center (or a collocation
facility). Broadly speaking, all customers of a SaaS vendor use the same software: these are one-size-fits-all solutions. Well-known examples are Salesforce.com, Google's Gmail and Apps, instant
messaging from AOL, Yahoo and Google, and Voice-over Internet Protocol (VoIP) from Vonage and
Skype.
Pros and Cons of Cloud Computing
The great advantage of cloud computing is elasticity: the ability
to add capacity or applications almost at a moment's notice. Companies buy exactly the amount of
storage, computing power, security and other IT functions that they need from specialists in data-center
computing. They get sophisticated data center services on demand, in only the amount they need and
can pay for, at service levels set with the
vendor, with capabilities that can be added or subtracted at will.
The metered cost, pay-as-you-go approach appeals to small- and medium-sized enterprises; little or no
capital investment and maintenance cost is needed. IT is remotely managed and maintained, typically
for a monthly fee, and the company can let go of plumbing concerns. Since the vendor has many
customers, it can lower the per-unit cost to each customer. Larger companies may find it easier to
manage collaborations in the cloud, rather than having to make holes in their firewalls for contract
research organizations. SaaS deployments usually take less time than in-house ones, upgrades are
easier, and users are always using the most recent version of the application. There may be fewer bugs
because having only one version of the software reduces complexity.
This may all sound very appealing but there are downsides. In the cloud you may not have the kind of
control over your data or the performance of your applications that you need, or the ability to audit or
change the processes and policies under which users must work. Different parts of an application
might
be in many places in the cloud. Complying with federal regulations such as Sarbanes-Oxley, or with an FDA
audit, is extremely difficult. Monitoring and maintenance tools are immature. It is hard to get metrics
out of the cloud and general management of the work is not simple.
There are systems management tools for the cloud environment but they may not integrate with
existing system management tools, so you are likely to need two systems. Nevertheless, cloud
computing may provide enough benefits to compensate for the inconvenience of two tools.
Cloud customers may risk losing data by having them locked into proprietary formats and may lose
control of data because tools to see who is using them or who can view them are inadequate. Data loss
is a real risk. In October 2009 1 million US users of the T-Mobile Sidekick mobile phone and emailing
device lost data as a result of server failure at Danger, a company recently acquired by Microsoft. Bear
in mind, though, that it is easy to underestimate risks associated with the current environment while
overestimating the risk of a new one. Cloud computing is not risky for every system. Potential users
need to evaluate security measures such as firewalls, and encryption techniques and make sure that
they will
have access to data and the software or source code if the service provider goes out of business.
It may not be easy to tailor service-level agreements (SLAs) to the specific needs of a business.
Compensation for downtime may be inadequate and SLAs are unlikely to cover concomitant damages,
but not all applications have stringent uptime requirements. It is sensible to balance the cost of
guaranteeing internal uptime against the advantages of opting for the cloud. It could be that your own
IT organization is not as sophisticated as it might seem.
Calculating cost savings is also not straightforward. Having little or no capital investment may actually
have tax disadvantages. SaaS deployments are cheaper initially than in-house installations and future
costs are predictable; after 3-5 years of monthly fees, however, SaaS may prove more expensive
overall.
Large instances of EC2 are fairly expensive, but it is important to do the mathematics correctly and
make a fair estimate of the cost of an on-premises (i.e., in-house) operation.
Standards are immature and things change very rapidly in the cloud. All IaaS and SaaS providers use
different technologies and different standards. The storage infrastructure behind Amazon is different
from that of the typical data center (e.g., big Unix file systems). The Azure storage engine does not use
a standard relational database; Google's App Engine does not support an SQL database. So you cannot
just move applications to the cloud and expect them to run. At least as much work is involved in
moving an application to the cloud as is involved in moving it from an existing server to a new one.
There is also the issue of employee skills: staff may need retraining and they may resent a change to
the cloud and
fear job losses. Last but not least, there are latency and performance issues. The Internet connection
may add to latency or limit bandwidth. (Latency, in general, is the period of time that one component
in a system spends waiting for another component. In networking, it is the amount of time it
takes a packet to travel from source to destination.) In future, programming models exploiting
multithreading may hide latency.
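As a simple illustration of the latency just defined, the sketch below times one request/response round trip to a web endpoint; the URL is a placeholder, and the measured figure includes server processing time as well as network latency.

    import time
    import urllib.request

    def round_trip_ms(url):
        """Wall-clock time, in milliseconds, for one request/response round trip."""
        start = time.perf_counter()
        with urllib.request.urlopen(url) as response:
            response.read()                      # wait for the full response body
        return (time.perf_counter() - start) * 1000.0

    # Placeholder endpoint; substitute the URL of the cloud service being tested.
    print("%.1f ms round trip" % round_trip_ms("https://example.com/"))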
Nevertheless, the service provider, not the scientist, controls the hardware, so unanticipated
sharing and reallocation of machines may affect run times. Interoperability is limited. In general, SaaS
solutions work best for non-strategic, non-mission-critical processes that are simple and standard and
not highly integrated with other business systems. Customized applications may demand an in-house
solution, but SaaS makes sense for applications that have become commoditized, such as reservation
systems in the travel industry.
Virtualization of computers or operating systems hides the physical characteristics of a computing
platform from users; instead it shows another abstract computing platform. A hypervisor is a piece of
virtualization software that allows multiple operating systems to run on a host computer concurrently.
Virtualization providers include VMware, Microsoft, and Citrix Systems (see Table 1). Virtualization is
an enabler of cloud computing.
Recently some vendors have described solutions that emulate cloud computing on private networks,
referring to these as private or internal clouds (where public or external cloud describes cloud
computing in the traditional mainstream sense). Private cloud products claim to deliver some of the
cloud's benefits in-house, and to allow connecting customer data centers to those of external cloud providers. It has been reported
that Eli Lilly wants to benefit from both internal and external clouds and that Amylin is looking at
private cloud VMware as a complement to EC2. Other experts, however, are skeptical: one has even
gone as far as to describe private clouds as "absolute rubbish".
Platform Computing has recently launched a cloud management system, Platform ISF, enabling
customers to manage workload across both virtual and physical environments and support multiple
hypervisors and operating systems from a single interface. VMware, the market leader in virtualization
technology, is moving into cloud technologies in a big way, with vSphere 4. The company is building
a huge partner network of service providers and is also releasing a vCloud API. VMware wants
customers to build a series of virtual data centers, each tailored to meet different requirements, and
then have the ability to move workloads in the virtual data centers to the infrastructure provided by
cloud vendors.
Cisco, EMC and VMware have formed a new venture called Acadia. Its strategy for private cloud
computing is based on Cisco's servers and networking, VMware's server virtualization and EMC's
storage. (Note, by the way, that EMC owns nearly 85% of VMware.) Other vendors, such as Google,
disagree with VMware's emphasis on private clouds; in return VMware says Google's online
applications are not ready for the enterprise.
Applicability
Not everyone agrees, but McKinsey has concluded as follows. Clouds already make sense for many
small and medium-size businesses, but technical, operational and financial hurdles will need to be
overcome before clouds will be used extensively by large public and private enterprises. Rather than
create unrealizable expectations for internal clouds, CIOs should focus now on the immediate
benefits of virtualizing server storage, network operations, and other critical building blocks.
They recommend that users should develop an overall strategy based on solid
business cases, not cloud for the sake of cloud; use modular design in all new software to minimize
costs when it comes time to migrate to the cloud; and set up a Cloud CIO Council to advise industry.
Applications in the Pharmaceutical Industry
In the pharmaceutical sector, where large amounts of
sensitive data are currently kept behind protective firewalls, security is a real concern, as is policing
individual researchers' access to the cloud.
Nevertheless, cheminformatics vendors are starting to look at cloud options, especially in terms of
Software as a Service (SaaS) and hosted informatics. In bioinformatics and number-crunching, the
cloud has distinct advantages. EC2 billing is typically hours times number of CPUs, so, as an over-generalization, the cost of 1 CPU for 1000 hours is the same as the cost of 1000 CPUs for 1 hour. This
makes cloud computing appealing for speedy answers to complex calculations. Over the past two
years, new DNA sequencing technology has emerged allowing a much more comprehensive view of
biological systems at the genetic level. This so-called next-generation sequencing has increased by
orders of magnitude the already daunting deluge of laboratory data, resulting in an immense IT
challenge. Could the cloud
provide a solution?
An unnamed pharmaceutical company found that processing BLAST databases and query jobs was
time consuming on its internal grid and approached Cycle Computing
about running BLAST and other applications in the cloud. After the customer had approved Cycle's
security model, Cycle built a processing pipeline for BLAST that provides more than 7000 public
databases from the National Center for Biotechnology Information (NCBI), Ensembl, and the
Information Sciences Institute of the University of Southern California (ISI), all updated weekly.
The CycleCloud BLAST service is now publicly available to all users.
2.1. Characteristics of Cloud Computing
1. Self-Healing
Any application or service running in a cloud computing environment has the property of
self-healing. In case of failure of the application, there is always a hot backup of the application ready
to take over without disruption. There are multiple copies of the same
application, each copy updating itself regularly, so that at times of failure there is at least one copy of
the application which can take over without even the slightest change in its running state.
2. Multi-tenancy
With cloud computing, any application supports multi-tenancy, that is, multiple tenants at the same instant
of time. The system allows several customers to share the infrastructure allotted to them without any of
them being aware of the sharing. This is done by virtualizing the
servers on the available machine pool and then allotting the servers to multiple users. This is done in
such a way that neither the privacy of the users nor the security of their data is compromised.
3. Linearly Scalable
Cloud computing services are linearly scalable. The system is able to break down the workloads into
pieces and service them across the infrastructure. An exact idea of linear scalability can be obtained from
the fact that if one server is able to process, say, 1000 transactions per second, then two servers can process roughly 2000, as sketched below.
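The sketch below illustrates that linear-scaling arithmetic under the idealised assumption that every server sustains the same 1000 transactions per second; real systems rarely scale perfectly linearly.

    import math

    def servers_needed(target_tps, per_server_tps=1000.0):
        """Under ideal linear scaling, servers required for a target transaction rate."""
        return max(1, math.ceil(target_tps / per_server_tps))

    print(servers_needed(1000))   # 1 server for 1000 transactions/second
    print(servers_needed(4500))   # 5 servers for 4500 transactions/second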
4. Service-oriented
Cloud computing systems are all service oriented, i.e. the systems are created
out of other discrete services. Many such discrete services are combined
together to form this service. This allows re-use of the different services that are
available and that are being created. Using the
services that were just created, other such services can be created.
5. SLA Driven
Usually businesses have agreements on the amount of service they receive. Scalability and availability issues
can cause these agreements to be broken. But cloud computing services are SLA driven, such that when
the system experiences peaks of load, it will automatically adjust itself so as to comply with the
service-level agreements.
The services will create additional instances of the applications on more servers so that the load can be
easily managed.
6. Virtualized
The applications in cloud computing are fully decoupled from the underlying hardware. The cloud
computing environment is a fully virtualized environment.
7. Flexible
Another feature of the cloud computing services is that they are flexible. They can be used to serve a
large variety of workload types varying from small loads of a small consumer application to very heavy loads of a commercial
application.
4. History
The Cloud is a metaphor for the Internet, derived from its common depiction in network diagrams
(or, more generally, of components which are managed by others) as a cloud outline.
The Cloud is a term with a long history in telephony, which has, in the past decade, been adopted as a
metaphor for Internet-based services, with a common depiction in network diagrams as a cloud outline.
The underlying concept dates back to 1960 when John McCarthy opined that "computation may
someday be organized as a public utility"; indeed it shares characteristics with service bureaus which
date back to the 1960s. The term cloud had already come into commercial use in the early 1990s to
refer to large ATM networks.
By the turn of the 21st century, the term "cloud computing" had started to appear, although most of the
focus at this time was on Software as a service (SaaS).
In 1999, Salesforce.com was established by Marc Benioff, Parker Harris and their colleagues. They applied
many technologies of consumer web sites like Google and Yahoo! to business applications. They also
introduced the concept of "on demand" and "SaaS" with their real business and successful customers.
The key to SaaS is that it can be customized by the customer alone, or with a small amount of help. The flexibility
and speed of application development were warmly welcomed and accepted by
business users.
IBM extended these concepts in 2001, as detailed in the Autonomic Computing Manifesto -- which
described advanced automation techniques such as self-monitoring, self-healing, self-configuring, and
self-optimizing in the management of complex IT systems with heterogeneous storage, servers,
applications, networks, security mechanisms, and other system elements that can be virtualized across
an enterprise.
Amazon.com played a key role in the development of cloud computing by modernizing their data
centers after the dot-com bubble and, having found that the new cloud architecture resulted in
significant internal efficiency improvements, providing access to their systems by way of Amazon Web
Services in 2005 on a utility computing basis.
2007 saw increased activity, including Google, IBM and a number of universities embarking on
a large-scale cloud computing research project, around the time the term started gaining popularity
in the mainstream press. It was a hot topic by mid-2008, with numerous cloud computing events being held.
5. Enabling Technologies
5.1 Cloud Computing Application Architecture
Cloud architecture, the systems architecture of the software systems involved in the delivery of cloud
computing, comprises hardware and software designed by a cloud architect who typically works for a
cloud integrator. It typically involves multiple cloud components communicating with each other over
application programming interfaces, usually web services.
This closely resembles the UNIX philosophy of having multiple programs doing one thing well and
working together over universal interfaces. Complexity is controlled and the resulting systems are
more manageable than their monolithic counterparts.
Cloud architecture extends to the client, where web browsers and/or software applications access
cloud applications.
Cloud storage architecture is loosely coupled, where metadata operations are centralized, enabling
the data nodes to scale into the hundreds, each independently delivering data to applications or users.
5.2 Server Architecture
The cloud's server management system has a very simple and easy-to-use web-based user interface. It has a modular architecture
which allows for the creation of additional system add-ons and plugins. It supports
one-click deployment of distributed or replicated applications on a global basis. It supports the
management of various virtual environments including KVM/QEMU,
Amazon EC2, Xen, OpenVZ, Linux Containers and VirtualBox. It has fine-grained
user permissions and access privileges.
5.3 MapReduce
Map Reduce is a software framework developed at Google in 2003 to support
parallel computations over large (multiple petabyte) data sets on clusters of
commodity computers. This framework is largely taken from map and reduce
functions commonly used in functional programming, although the actual semantics
of the framework are not the same. It is a programming model and an associated
implementation for processing and generating large data sets. Many of the real world
tasks are expressible in this model. MapReduce implementations have been written in
C++, Java and other languages.
Programs written in this functional style are automatically parallelized and
executed on the cloud. The run-time system takes care of the details of partitioning
the input data, scheduling the program's execution across a set of machines, handling
machine failures, and managing the required inter-machine communication. This
allows programmers without any experience with parallel and distributed systems to
easily utilize the resources of a large distributed system.
The computation takes a set of input key/value pairs, and produces a set of
output key/value pairs. The user of the MapReduce library expresses the computation
as two functions: Map and Reduce.
Map, written by the user, takes an input pair and produces a set of
intermediate key/value pairs. The MapReduce library groups together all intermediate
values associated with the same intermediate key I and passes them to the Reduce
function.
The Reduce function, also written by the user, accepts an intermediate key I and a set of values for that
key. It merges together these values to form a possibly smaller set of values. Typically just zero or one
output value is produced per Reduce invocation. The intermediate values are supplied to the user's
reduce function via an iterator. This allows the framework to handle lists of values that are too large to fit in
memory.
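As a concrete illustration of the Map and Reduce functions described above, the following is a minimal single-machine sketch of the classic word-count example written in Python; the toy driver stands in for the real MapReduce runtime, which would partition the input and run these functions across many machines.

    from collections import defaultdict

    # Map: for each (document name, document text) input pair,
    # emit an intermediate (word, 1) pair for every word.
    def map_fn(doc_name, text):
        for word in text.split():
            yield word, 1

    # Reduce: for an intermediate key (a word) and an iterator over its values,
    # emit the total count for that word.
    def reduce_fn(word, counts):
        yield word, sum(counts)

    def run_mapreduce(inputs):
        """Toy sequential driver standing in for the MapReduce runtime."""
        groups = defaultdict(list)
        for doc_name, text in inputs:                 # map phase
            for key, value in map_fn(doc_name, text):
                groups[key].append(value)             # shuffle: group by key
        results = {}
        for key, values in groups.items():            # reduce phase
            for out_key, out_value in reduce_fn(key, iter(values)):
                results[out_key] = out_value
        return results

    docs = [("d1", "the cloud is the future"), ("d2", "the cloud scales")]
    print(run_mapreduce(docs))
    # {'the': 3, 'cloud': 2, 'is': 1, 'future': 1, 'scales': 1}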
MapReduce achieves reliability by parcelling out a number of operations on the set of data to each node
in the network; each node is expected to report back periodically with completed work and status
updates. If a node falls silent for longer than that interval, the master node records the node as dead, and
sends out the node's assigned work to other nodes. Individual operations use atomic operations for
naming file outputs as a double check to ensure that there are no parallel conflicting threads running;
when files are renamed, it is possible to also copy them to another name in addition to the name of the
task (allowing for side-effects).
5.5 Hadoop
Hadoop is a framework for running applications on large clusters of commodity machines, transparently providing applications with
both reliability and data motion. Hadoop implements the computation paradigm
named MapReduce which was explained above. The application is divided into many
small fragments of work, each of which may be executed or re-executed on any node
in the cluster. In addition, it provides a distributed file system that stores data on the
compute nodes, providing very high aggregate bandwidth across the cluster. Both
MapReduce and the distributed file system are designed so that node failures are
automatically handled by the framework. Hadoop has been implemented in
Java. In Hadoop, the combination of all the JAR files and classes needed to run
a MapReduce program is called a job. All of these components are themselves
collected into a JAR which is usually referred to as the job file. To execute a job, it is
submitted to a jobTracker and then executed.
Tasks in each phase are executed in a fault-tolerant manner. If node(s) fail in
the middle of a computation the tasks assigned to them are re-distributed among the
remaining nodes. Since we are using MapReduce, having many map and reduce tasks
enables good load balancing and allows failed tasks to be re-run with smaller runtime
overhead.
The Hadoop MapReduce framework has a master/slave architecture. It has a
single master server, or jobTracker, and several slave servers, or taskTrackers, one per
node in the cluster. The jobTracker is the point of interaction between the users and
the framework. Users submit jobs to the jobTracker, which puts them in a queue of
pending jobs and executes them on a first-come, first-served basis. The jobTracker
manages the assignment of MapReduce jobs to the taskTrackers. The taskTrackers
execute tasks upon instruction from the jobTracker and also handle data motion
between the map and reduce phases of the MapReduce job.
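One common way to run such a job without writing Java is Hadoop Streaming, where the map and reduce steps are plain scripts that read standard input and write standard output. The sketch below shows a word-count mapper and reducer in that style; the file names and the streaming JAR path in the sample command are illustrative and depend on the installation.

    # mapper.py - reads raw text lines on stdin, emits "word<TAB>1" pairs
    import sys
    for line in sys.stdin:
        for word in line.split():
            print("%s\t%d" % (word, 1))

    # reducer.py - input arrives sorted by key, so counts can be summed in one pass
    import sys
    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word != current_word:
            if current_word is not None:
                print("%s\t%d" % (current_word, current_count))
            current_word, current_count = word, 0
        current_count += int(count)
    if current_word is not None:
        print("%s\t%d" % (current_word, current_count))

A typical (illustrative) submission to the jobTracker would then look like:

    hadoop jar hadoop-streaming.jar -input /data/in -output /data/out \
        -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py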
Hadoop is a framework which has received a wide industry adoption. Hadoop
is used along with other cloud computing technologies like the Amazon services so as
to make better use of the resources. There are many instances where Hadoop has been
used. Amazon makes use of Hadoop for processing millions of sessions which it uses
for analytics. This is made use of in a cluster which has about 1 to 100 nodes.
Facebook uses Hadoop to store copies of internal logs and dimension data sources and
use it as a source for reporting/analytics and machine learning. The New York Times
made use of Hadoop for large scale image conversions. Yahoo uses Hadoop to
support research for advertisement systems and web searching tools.
6. Cloud Computing Services
Several large vendors compete in the cloud computing
industry. Amazon has expertise in this industry and has a small advantage over the
others because of this. Microsoft has good knowledge of the fundamentals of cloud
science and is building massive data centers. IBM, the king of business computing
and traditional supercomputers, has teamed up with Google to get a foothold in the clouds.
Google is far and away the leader in cloud computing, with the company itself built
from the ground up on cloud hardware.
6.1. Amazon Web Services
Amazon Web Services is the set of cloud computing services offered by
Amazon. It involves four different services: Elastic Compute Cloud (EC2),
Simple Storage Service (S3), Simple Queue Service (SQS) and SimpleDB (SDB).
1. Elastic Compute Cloud (EC2)
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute
capacity in the cloud. It is designed to make web-scale computing
easier for developers.
It provides on-demand processing power.
Amazon EC2's simple web service interface allows you to obtain and configure capacity with minimal
friction. It provides you with complete control of your computing resources and lets you run on
Amazon's proven computing environment. Amazon EC2 reduces the time required to obtain and boot
new server instances to minutes, allowing you to quickly scale capacity,
both up and down, as your computing requirements change. Amazon EC2 changes the economics of
computing by allowing you to pay only for capacity that you actually use. Amazon EC2 provides
developers the tools to build failure resilient applications and isolate themselves from common failure
scenarios.
Amazon EC2 presents a true virtual computing environment, allowing
you to use web service interfaces to requisition machines for use, load them
with your custom application environment, manage your network's access
permissions, and run your image using as many or few systems as you desire.
To set up an Amazon EC2 node we have to create an EC2 node
configuration which consists of all our applications, libraries, data and
associated configuration settings. This configuration is then saved as an AMI
(Amazon Machine Image). There are also several stock instances of Amazon
AMIs available which can be customized and used. We can then start,
terminate and monitor as many instances of the AMI as needed.
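A minimal sketch of that start/monitor/terminate workflow using boto3, a Python SDK for the EC2 web service; the region, AMI ID and instance type below are placeholder values, and credentials are assumed to come from the environment.

    import boto3

    # Connect to the EC2 web service (region is a placeholder).
    ec2 = boto3.resource("ec2", region_name="us-east-1")

    # Launch one instance from a saved machine image (placeholder AMI ID).
    instances = ec2.create_instances(
        ImageId="ami-0123456789abcdef0",
        InstanceType="t2.micro",
        MinCount=1,
        MaxCount=1,
    )
    instance = instances[0]

    instance.wait_until_running()      # block until the instance is up
    instance.reload()                  # refresh state and public IP address
    print(instance.id, instance.state["Name"], instance.public_ip_address)

    # Instances are billed while running, so terminate when finished.
    instance.terminate()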
Amazon EC2 enables you to increase or decrease capacity within
minutes. You can commission one, hundreds or even thousands of server
instances simultaneously. Thus applications can automatically scale themselves
up and down depending on their needs. You have root access to each one, and
you can interact with them as you would any machine. You have the choice of
several instance types, allowing you to select a configuration of memory,
CPU, and instance storage that is optimal for your application. Amazon EC2
offers a highly reliable environment where replacement instances can be
rapidly and predictably commissioned.
6.2 Google App Engine
Google App Engine provides an Images service with which applications can
manipulate images. With this API, you can resize, crop, rotate and flip images in
JPEG and PNG formats.
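A brief sketch of that Images API inside a first-generation (Python) App Engine request handler; the handler class, URL route and the form field supplying the image bytes are assumptions made for illustration.

    from google.appengine.api import images
    import webapp2

    class ThumbnailHandler(webapp2.RequestHandler):
        def post(self):
            image_data = self.request.get("picture")      # raw uploaded bytes
            img = images.Image(image_data=image_data)
            img.resize(width=80, height=100)              # queue transformations
            img.rotate(90)
            thumbnail = img.execute_transforms(output_encoding=images.JPEG)
            self.response.headers["Content-Type"] = "image/jpeg"
            self.response.out.write(thumbnail)

    app = webapp2.WSGIApplication([("/thumbnail", ThumbnailHandler)])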
In theory, Google claims App Engine can scale nicely. But Google currently
places a limit of 5 million hits per month on each application. This limit nullifies App
Engine's scalability, because any small, dedicated server can have this performance.
Google will eventually allow webmasters to go beyond this limit (if they pay).
Types of services:
These services are broadly divided into three categories:
Infrastructure-as-a-Service (IaaS)
Platform-as-a-Service (PaaS)
Software-as-a-Service (SaaS).
Infrastructure-as-a-Service (IaaS):
Infrastructure-as-a-Service (IaaS) offerings like Amazon Web Services provide virtual servers with unique
IP addresses and blocks of storage on demand. Customers benefit from an API with which they
can control their servers. Because customers pay for exactly the amount of service they use,
as for electricity or water, this service is also called utility computing.
Platform-as-a-Service (PaaS):
Platform-as-a-Service (PaaS) is a set of software and development tools hosted on the provider's
servers. Developers can create applications using the provider's APIs. Google App Engine is one of the
most famous Platform-as-a-Service offerings. Developers should take notice that there aren't any
interoperability standards (yet), so some providers may not allow you to take your application and
move it to another provider's platform.
Public cloud:
Public cloud or external cloud describes cloud computing in the traditional mainstream sense,
whereby resources are dynamically provisioned on a fine-grained, self-service basis over the
Internet, via web applications/web services, from an off-site third-party provider who
shares resources and bills on a fine-grained utility computing basis.
Hybrid cloud:
A hybrid cloud environment, consisting of multiple internal and/or external providers, "will be
typical for most enterprises". A hybrid cloud can describe a configuration combining a local device,
such as a Plug computer, with cloud services. It can also describe configurations combining virtual
and physical, colocated assets; for example, a mostly virtualized environment that requires
physical servers, routers, or other hardware such as a network appliance acting as a firewall or
spam filter.
Private cloud:
Private cloud and internal cloud are neologisms that some vendors have recently used to describe
offerings that emulate cloud computing on private networks. These (typically virtualisation
automation) products claim to "deliver some benefits of cloud computing without the pitfalls",
capitalising on data security, corporate governance, and reliability concerns. They have been
criticized on the basis that users "still have to buy, build, and manage them" and as such do not
benefit from lower up-front capital costs and less hands-on management, essentially "[lacking]
the economic model that makes cloud computing such an intriguing concept".
While an analyst predicted in 2008 that private cloud networks would be the future of corporate
IT, there is some uncertainty whether they are a reality even within the same firm. Analysts also
claim that within five years a "huge percentage" of small and medium enterprises will get most
of their computing resources from external cloud computing providers as they "will not have
economies of scale to make it worth staying in the IT business" or be able to afford private clouds.
Analysts have reported on Platform's view that private clouds are a stepping stone to external
clouds, particularly for the financial services, and that future datacenters will look like internal
clouds.
The term has also been used in the logical rather than physical sense, for example in reference to
platform as a service offerings, though such offerings including Microsoft's
Azure Services Platform are not available for on-premises deployment.
How does cloud computing work?
Supercomputers today are used mainly by the military, government intelligence agencies,
universities and research labs, and large companies to tackle enormously complex
calculations for such tasks as simulating nuclear explosions, predicting climate change, designing
airplanes, and analyzing which proteins in the body are likely to bind with potential new drugs.
Cloud computing aims to apply that kind of power, measured in the tens of trillions of
computations per second, to problems like analyzing risk in financial portfolios, delivering
personalized medical information, even powering immersive computer games, in a way that users
can tap through the Web. It does that by networking large groups of servers that often use
low-cost consumer PC technology, with specialized connections to spread data-processing
chores across them. By contrast, the newest and most powerful desktop PCs process only about
3 billion computations a second. Let's say you're an executive at a large corporation. Your
particular responsibilities include making sure that all of your employees have the right
hardware and software they need to do their jobs. Buying computers for everyone isn't
enough -- you also have to purchase software or software licenses to give employees the tools
they require. Whenever you have a new hire, you have to buy more software or make sure
your current software license allows another user. It's so stressful that you find it difficult to sleep at night.
Soon there may be an alternative. Instead of installing a suite of
software for each computer, you'd only have to load one application. That application would
allow workers to log into a Web-based service which hosts all the programs the user would
need for his or her job. Remote machines owned by another company would run everything
from e-mail to word processing to complex data analysis programs. It's called cloud computing,
and it could change the entire computer industry.
In a cloud computing system, there's a significant workload shift. Local computers no longer have
to do all the heavy lifting when it comes to running applications. The network of computers that
make up the cloud handles them instead. Hardware and software demands on the user's side
decrease. The only thing the user's computer needs to be able to run is the cloud computing
system's interface software, which can be as simple as a Web browser, and the cloud's network
takes care of the rest.
There's a good chance you've already used some form of cloud computing. If you have an
e-mail account with a Web-based e-mail service like Hotmail, Yahoo! Mail or Gmail, then you've
had some experience with cloud computing. Instead of running an e-mail program on your
computer, you log in to a Web e-mail account remotely. The software and storage for your
account doesn't exist on your computer -- it's on the service's computer cloud.
SEVEN TECHNICAL SECURITY BENEFITS OF THE CLOUD:
Cloud offerings differ between individual providers as they pursue their own approaches to realizing their respective requirements,
which strongly differ between providers. Non-functional aspects are one of the key reasons why
clouds differ so strongly in their interpretation (see also II.B).
Economic considerations are one of the key reasons to introduce cloud systems in a business
environment in the first instance. The particular interest typically lies in the reduction of cost and
effort through outsourcing and / or automation of essential resource management. As has been
noted in the first section, relevant aspects thereby to consider relate to the cut-off between loss of
control and reduction of effort. With respect to hosting private clouds, the gain through cost
reduction has to be carefully balanced with the increased effort to build and run such a system.
Obviously, technological challenges arise implicitly from the non-functional and economic aspects
when trying to realize them. As opposed to these aspects, technological challenges typically imply a
specific realization, even though there may be no standard approach as yet and deviations may
hence arise. In addition to these implicit challenges, one can identify additional technological
aspects to be addressed by cloud systems, partially as a pre-condition to realizing some of the high-level
features, but partially also because they directly relate to specific characteristics of cloud systems.
1. NON-FUNCTIONAL ASPECTS
The most important non-functional aspects are:
Elasticity is an essential core feature of cloud systems and circumscribes the capability of the
underlying infrastructure to adapt to changing, potentially non-functional requirements, for
example the amount and size of data supported by an application, the number of concurrent users, etc. One
can distinguish between horizontal and vertical scalability, whereby horizontal scalability refers to
the number of instances needed to satisfy, e.g., a changing number of requests, and vertical scalability refers to
the size of the instances themselves and thus implicitly to the amount of resources required to
maintain that size. Cloud scalability involves both (rapid) up- and down-scaling, as sketched below.
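A toy sketch of that distinction: horizontal scaling varies the number of instances with the request rate, while vertical scaling varies the size of each instance with the data it must hold. The per-instance capacity and the instance size tiers are invented for illustration.

    import math

    INSTANCE_SIZES_GB = [2, 4, 8, 16]        # hypothetical memory tiers

    def horizontal_scale(requests_per_sec, capacity_per_instance=500.0):
        """Horizontal scalability: adjust the number of identical instances."""
        return max(1, math.ceil(requests_per_sec / capacity_per_instance))

    def vertical_scale(working_set_gb):
        """Vertical scalability: pick the smallest instance size that fits the data."""
        for size in INSTANCE_SIZES_GB:
            if size >= working_set_gb:
                return size
        return INSTANCE_SIZES_GB[-1]

    print(horizontal_scale(2200))    # -> 5 instances of the same size
    print(vertical_scale(6.5))       # -> an 8 GB instance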
Elasticity goes one step further, though, and also allows the dynamic integration and extraction
of physical resources to and from the infrastructure. Whilst from the application perspective this is identical
to scaling, from the middleware management perspective it poses additional requirements, in
particular regarding reliability. In general, it is assumed that changes in the resource infrastructure
are announced first to the middleware manager, but with large-scale systems it is vital that such
changes can be handled automatically.
Reliability is essential for all cloud systems: in order to support today's data centre-type
applications in a cloud, reliability is considered one of the main features to exploit cloud capabilities.
Reliability denotes the capability to ensure constant operation of the system without disruption, i.e.
no loss of data, no code reset during execution etc. Reliability is typically achieved through
redundant resource utilisation. Interestingly, many of the reliability aspects move from a hardware
to a software-based solution. (Redundancy in the file systems vs. RAID controllers, stateless front
end servers vs. UPS, etc.).
Notably, there is a strong relationship between availability (see below) and reliability; however,
reliability focuses in particular on the prevention of loss (of data or execution progress).
Quality of Service support is a relevant capability that is essential in many use cases where
specific requirements have to be met by the outsourced services and / or resources. In business
cases, basic QoS metrics like response time, throughput etc. must be guaranteed at least, so as to
ensure that the quality guarantees of the cloud user are met. Reliability is a particular QoS aspect
which forms a specific quality requirement.
Agility and adaptability are essential features of cloud systems that strongly relate to the elastic
capabilities. It includes on-time reaction to changes in the amount of requests and size of resources,
but also adaptation to changes in the environmental conditions that e.g. require different types of
resources, different quality or different routes, etc. Implicitly, agility and adaptability require
resources (or at least their management) to be autonomic and have to enable them to provide self-*
capabilities.
Availability of services and data is an essential capability of cloud systems and was actually one
of the core aspects to give rise to clouds in the first instance. It lies in the ability to introduce
redundancy for services and data so failures can be masked transparently. Fault tolerance also
requires the ability to introduce new redundancy (e.g. previously failed or fresh nodes) in an online
manner non-intrusively (without a significant performance penalty).
With increasing concurrent access, availability is particularly achieved through replication of data /
services and distributing them across different resources to achieve load-balancing. This can be
regarded as the original essence of scalability in cloud systems.
2. ECONOMIC ASPECTS
In order to allow for economic considerations, cloud systems should help in realising the following
aspects:
Cost reduction is one of the first concerns in building up a cloud system that can adapt to changing
consumer behaviour and reduce the cost of infrastructure maintenance and acquisition. Scalability and
pay per use are essential aspects of this issue. Notably, setting up a cloud system typically entails
additional costs, be it by adapting the business logic to the cloud host's specific interfaces or by
enhancing the local infrastructure to be cloud-ready. See also return on investment below.
Pay per use. The capability to build up cost according to the actual consumption of resources is a
relevant feature of cloud systems. Pay per use strongly relates to quality of service support, where
specific requirements to be met by the system and hence to be paid for can be specified. One of the
key economic drivers for the current level of interest in cloud computing is the structural change in
this domain. By moving from the usual capital upfront investment model to an operational expense,
cloud computing promises to enable especially SMEs and entrepreneurs to accelerate the
development and adoption of innovative solutions.
Improved time to market is essential in particular for small to medium enterprises that want to
sell their services quickly and easily with little delays caused by acquiring and setting up the
infrastructure,
in particular in a scope compatible and competitive with larger industries. Larger
enterprises need to be able to publish new capabilities with little overhead to remain competitive.
Clouds can support this by providing infrastructures, potentially dedicated to specific use cases that
take over essential capabilities to support easy provisioning and thus reduce time to market.
Return on investment (ROI) is essential for all investors and cannot always be guaranteed; in fact,
some cloud systems currently fail on this aspect. Employing a cloud system must ensure that the cost
and effort vested into it are outweighed by its benefits to be commercially viable; this may entail
direct (e.g. more customers) and indirect (e.g. benefits from advertisements) ROI. Outsourcing
resources, versus increasing the local infrastructure and employing (private) cloud technologies, needs
therefore to be weighed up and the critical cut-off points identified.
Turning CAPEX into OPEX is an implicit, and much argued, characteristic of cloud systems, as
the actual cost benefit (cf. ROI) is not always clear (see e.g. [9]). Capital expenditure (CAPEX) is
required to build up a local infrastructure; by outsourcing computational resources to cloud
systems, this up-front cost is replaced by operational expenditure (OPEX) incurred
according to operational need, as the sketch below illustrates.
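The sketch below compares cumulative spend under the two models; every figure is invented purely to illustrate the break-even reasoning and says nothing about real prices.

    # Hypothetical figures: buying servers up front (CAPEX) versus renting
    # equivalent capacity per month from a cloud provider (OPEX).
    CAPEX_UPFRONT = 60000.0          # purchase of local servers
    CAPEX_MONTHLY_RUNNING = 800.0    # power, space, administration
    OPEX_MONTHLY_CLOUD = 2300.0      # metered cloud bill

    def cumulative_cost(months, upfront, monthly):
        return upfront + months * monthly

    for months in (12, 24, 36, 48):
        local = cumulative_cost(months, CAPEX_UPFRONT, CAPEX_MONTHLY_RUNNING)
        cloud = cumulative_cost(months, 0.0, OPEX_MONTHLY_CLOUD)
        cheaper = "cloud" if cloud < local else "local"
        print("%2d months: local $%.0f, cloud $%.0f -> %s" % (months, local, cloud, cheaper))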
Going Green is relevant not only to reduce the additional costs of energy consumption, but also to
reduce the carbon footprint. Whilst the carbon emission of individual machines can be quite well
estimated, this information is given little consideration when scaling systems up.
Clouds principally allow reducing the consumption of unused resources (down-scaling). In addition,
up-scaling should be carefully balanced not only with cost, but also carbon emission issues. Note
that beyond software stack aspects, plenty of Green IT issues are subject to development on the
hardware level.
3. TECHNOLOGICAL ASPECTS
The main technological challenges that can be identified and that are commonly associated with
cloud systems are:
Virtualisation is an essential technological characteristic of clouds which hides the technological
complexity from the user and enables enhanced flexibility (through aggregation, routing and
translation). More concretely, virtualisation supports the following features:
Ease of use: by hiding the complexity of the infrastructure (including management,
configuration etc.), virtualisation can make it easier for the user to develop new applications, and it
reduces the overhead of controlling the system.
Infrastructure independency: in principle, virtualisation allows for higher interoperability by making
the code platform independent.
Flexibility and adaptability: by exposing a virtual execution environment, the underlying
infrastructure can change more flexibly according to different conditions and requirements
(assigning more resources, etc.).
Location independence: services can be accessed independent of the physical location of the user
and the resource.
Multi-tenancy is a highly essential issue in cloud systems, where the location of code and / or
data is principally unknown and the same resource may be assigned to multiple users (potentially at
the same time). This affects infrastructure resources as well as data / applications / services that are
hosted on shared resources but need to be made available in multiple isolated instances. Classically,
all information is maintained in separate databases or tables, yet in more complicated cases
information may be concurrently altered, even though it is maintained for isolated tenants. Multi-tenancy
implies a lot of potential issues, ranging from data protection to legislative issues (see
section III).
Security, Privacy and Compliance is obviously essential in all systems dealing with potentially
sensitive data and code.
Data Management is an essential aspect in particular for storage clouds, where data is flexibly
distributed across multiple resources. Implicitly, data consistency needs to be maintained over a
wide distribution of replicated data sources. At the same time, the system always needs to be aware
of the data location (when replicating across data centres), taking latencies and particularly workload
into consideration. As the size of data may change at any time, data management addresses both
horizontal and vertical aspects of scalability. Another crucial aspect of data management is the
consistency guarantees provided (eventual vs. strong consistency, transactional isolation vs. no
isolation, atomic operations over individual data items vs. multiple data items, etc.).
APIs and / or Programming Enhancements are essential to exploit the cloud features: common
programming models require that the developer takes care of the scalability and autonomic capabilities
him- / herself, whilst a cloud environment provides the features in a fashion that allows the user to
leave such management to the system.
Metering of any kind of resource and service consumption is essential in order to offer elastic
pricing, charging and billing. It is therefore a pre-condition for the elasticity of clouds.
Tools are generally necessary to support development, adaptation and usage of cloud services.
C. RELATED AREAS
It has been noted that the cloud concept is strongly related to many other initiatives in the area of
the Future Internet, such as Software as a Service and Service Oriented Architecture. New
concepts and terminologies often bear the risk that they seemingly supersede preceding work and
thus require a fresh start, where plenty of the existing results are lost and essential work is
repeated unnecessarily. In order to reduce this risk, this section provides a quick summary of the
main related areas and their potential impact on further cloud developments.
1. INTERNET OF SERVICES
Service based application provisioning is part of the Future Internet as such and therefore a similar
statement applies to cloud and Internet of Services as to cloud and Future Internet. Whilst the cloud
concept foresees essential support for service provisioning (making them scalable, providing a simple
API for development etc.), its main focus does not primarily rest on service provisioning. As detailed
in section II.A.1 cloud systems are particularly concerned with providing an infrastructure on which
any type of service can be executed with enhanced features.
Clouds can therefore be regarded as an enabler for enhanced features of large-scale service
provisioning. Much research has been vested into providing base capabilities for service provisioning;
accordingly, capabilities that overlap with cloud system features can easily be exploited for cloud
infrastructures.
2. INTERNET OF THINGS
It is open to debate whether the Internet of Things is related to cloud systems at all: whilst the Internet of
Things will certainly have to deal with issues related to elasticity, reliability, data management etc.,
there is an implicit assumption that resources in cloud computing are of a type that can host and/or
process data, in particular storage and processors that can form a computational unit (a virtual
processing platform).
However, specialised clouds may e.g. integrate dedicated sensors to provide enhanced capabilities
and the issues related to reliability of data streams etc. are principally independent of the type of
data source. Though sensors as yet do not pose essential scalability issues, metering of resources
will already require some degree of sensor information integration into the cloud.
Clouds may furthermore offer vital support to the Internet of Things, in order to deal with the
fluctuating amounts of data originating from the diversity of sensors and things. Similarly, cloud
concepts for scalability and elasticity may be of interest for the Internet of Things in order to better
cope with dynamically scaling data streams.
Overall, the Internet of Things may profit from cloud systems, but there is no direct relationship
between the two areas. There are, however, contact points that should not be disregarded: data
management and the interfaces between sensors and cloud systems show clear commonalities.
3. THE GRID
There is an on-going confusion about the relationship between Grids and Clouds [17], sometimes
seeing Grids as on top of Clouds, vice versa, or even as identical. More surprisingly, even elaborate
comparisons (such as [18][19][20]) still have different views on what the Grid is in the first
instance, thus making the comparison cumbersome. Indeed, most ambiguities can be quickly
resolved if the underlying concept of Grids is examined first: just like Clouds, Grid is primarily a
concept rather than a technology thus leading to many potential misunderstandings between
individual communities.
With respect to the research carried out on the Grid over recent years, it is therefore
advisable to distinguish (at least) between (1) Resource Grids, including in particular Grid
Computing, and (2) eBusiness Grids, which centre mainly on distributed Virtual Organizations and
are more closely related to Service Oriented Architectures (see below). Note that there may be
combinations of the two, e.g. when capabilities of eBusiness Grids are applied to commercial resource
provisioning, but this has little impact on the assessment below.
Resource Grids try to make resources - such as computational devices and storage - locally available
in a fashion that is transparent to the user. The main focus thereby lies on availability rather than
scalability, in particular rather than dynamic scalability. In this context we may have to distinguish
between HPC Grids, such as EGEE, which select and provide access to (single) HPC resources, as
opposed to distributed computing Grids (cf. Service Oriented Architecture below), which also
include P2P-like scalability - in other words, the more resources are available, the more code
instances are deployed and executed. Replication capabilities may be applied to ensure reliability,
though this is not an intrinsic capability of computational Grids in particular. Even though such Grid
middleware offers manageability interfaces, it typically acts on a layer on top of the actual
resources and thus rarely virtualises the hardware, but rather the computing resource as a whole (i.e.
not at the IaaS level).
Overall, Resource Grids do address similar issues to Cloud Systems, yet typically on a different layer
with a different focus - as such, Grids generally do not cater for horizontal and vertical elasticity.
More important, though, is the strong conceptual overlap between the issues addressed by
Grids and Clouds, which allows re-use of concepts and architectures, but also of parts of the technology
(see also SOA below).
Specific shared concepts:
Virtualisation of computation resources, respectively of hardware
Scalability of amount of resources versus of hardware, code and data
Reliability through replication and check-pointing
Interoperability
Security and Authentication
eBusiness Grids share their essential goals with Service Oriented Architecture, though the specific
focus rests on the integration of existing services so as to build up new functionalities, and on enhancing
these services with business-specific capabilities. The eBusiness (or here Virtual Organization)
approach derives in particular from the distributed computing aspect of Grids, where parts of the
overall logic are located at different sites. The typical Grid middleware thereby focuses mostly on
achieving reliability of the overall execution through on-the-fly replacement and (re)integration.
But eBusiness Grids also explore the specific requirements for commercial employment of service
consumption and provisioning - even though this is generally considered an aspect more related to
Service Oriented Architectures than to Grids.
Again, eBusiness Grids and Cloud Systems share common concepts and thus basic technological
approaches. In particular with the underlying SOA based structure, capabilities may be exposed and
integrated as stand-alone services, thus supporting the re-use aspect.
Specific shared concepts:
Pay-per-use / Payment models
Quality of Service
Metering
Availability through self-management
It is worth noting that the comparison here is with deployed Grids. The original Grid concept had a
vision of elasticity, virtualisation and accessibility [48][49] not unlike that claimed for the Cloud
vision.
4. SERVICE ORIENTED ARCHITECTURES
There is a strong relationship between the Grid and Service Oriented Architectures, often leading
to confusion where the two terms are either used interchangeably, or one is seen as building on top
of the other. This arises mostly from the fact that both concepts tend to cover a comparatively wide
scope of issues, i.e. both terms are used somewhat ambiguously.
Service Oriented Architecture, however, typically focuses predominantly on ways of developing,
publishing and integrating application logic and / or resources as services. Aspects related to
enhancing the provisioning model, e.g. through secure communication channels, QoS-guaranteed
maintenance of services etc., are secondary in this definition. Again it must be stressed, though,
that the terms eBusiness Grid and SOA are used almost interchangeably - in particular since the
advent of Web Service technologies such as the .NET Framework and Globus Toolkit 4, where GT4 is
typically regarded as Grid-related and .NET as a Web Service / SOA framework (even though they
share the same main capabilities).
Though providing cloud hosted applications as a service is an implicit aspect of Cloud SaaS
provisioning, the cloud concept is principally technology agnostic, but it is generally recommended to
build on service-oriented principles. However, in particular with the resource virtualization aspect of
cloud systems, most technological aspects will have to be addressed at a lower level than the service
layer.
Service Oriented Architectures are therefore of primary interest for the type of applications and
services hosted on top of cloud infrastructures, rather than for the underlying virtualised resource
layer itself.
D. SECURITY BENEFITS OF CLOUD COMPUTING
1. CENTRALIZED DATA:
Reduced Data Leakage: this is the benefit I hear most from Cloud providers - and in my
view they are right. How many laptops do we need to lose before we get this? How many
backup tapes? The data landmines of today could be greatly reduced by the Cloud
as thin-client technology becomes prevalent. Small, temporary caches on handheld devices
or netbook computers pose less risk than transporting data buckets in the form of laptops.
Ask the CISO of any large company if all laptops have company-mandated controls
consistently applied, e.g. full disk encryption. You'll see the answer by looking at the
whites of their eyes. Despite best efforts around asset management and endpoint security,
we continue to see embarrassing and disturbing misses. And what about SMBs? How many
use encryption for sensitive data, or even have a data classification policy in place?
Monitoring benefits: central storage is easier to control and monitor. The flipside is
the nightmare scenario of comprehensive data theft. However, I would rather spend my time
as a security professional figuring out smart ways to protect and monitor access to data
stored in one place (with the benefit of situational advantage) than trying to figure out
all the places where the company data resides across a myriad of thick clients! You can get
the benefits of Thin Clients today but Cloud Storage provides a way to centralize the data
faster and potentially cheaper. The logistical challenge today is getting Terabytes of data to
the Cloud in the first place.
2. INCIDENT RESPONSE / FORENSICS:
Forensic readiness: with Infrastructure as a Service (IaaS) providers, I can build a
dedicated forensic server in the same Cloud as my company and keep it offline, ready for
use when needed. I would only need to pay for storage until an incident happens and I need to
bring it online. I don't need to call someone to bring it online or install some kind of
remote boot software - I just click a button in the Cloud Provider's web interface. If I have
multiple incident responders, I can give them each a copy of the VM so we can distribute the
forensic workload based on the job at hand or as new sources of evidence arise and need
analysis. To fully realise this benefit, commercial forensic software vendors would
need to move away from archaic, physical-dongle-based licensing schemes to a network
licensing model.
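As a hedged illustration of that "click a button" step, the Python sketch below uses boto3 against Amazon EC2 to wake a pre-built, normally stopped forensic instance; the region and instance ID are placeholders, and the approach assumes such an instance has already been prepared.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")   # assumed region

    FORENSIC_INSTANCE_ID = "i-0123456789abcdef0"         # placeholder instance ID

    # Start the dormant forensic server only when an incident requires it,
    # so compute is billed only while the investigation actually runs.
    ec2.start_instances(InstanceIds=[FORENSIC_INSTANCE_ID])
    ec2.get_waiter("instance_running").wait(InstanceIds=[FORENSIC_INSTANCE_ID])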
Decrease evidence acquisition time: if a server in the Cloud gets compromised (i.e.
broken into), I can now clone that server at the click of a mouse and make the cloned
disks instantly available to my Cloud Forensics server. I don't need to find storage
or have it ready, waiting and unused - it's just there.
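One plausible way to realise this on an IaaS platform is sketched below with boto3 and EBS: snapshot the suspect volume while it keeps running, materialise the snapshot as a fresh volume, and attach it to the forensics server. All identifiers and the region are placeholders.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")        # assumed region

    # 1. Snapshot the compromised volume (placeholder ID) without taking it offline.
    snap = ec2.create_snapshot(VolumeId="vol-0aaaabbbbccccdddd",
                               Description="incident evidence copy")
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

    # 2. Materialise the snapshot as a new volume in the forensics server's zone.
    vol = ec2.create_volume(SnapshotId=snap["SnapshotId"],
                            AvailabilityZone="us-east-1a")
    ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])

    # 3. Attach the evidence volume to the forensics instance for analysis.
    ec2.attach_volume(VolumeId=vol["VolumeId"],
                      InstanceId="i-0123456789abcdef0",       # forensics server (placeholder)
                      Device="/dev/sdf")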
Eliminate or reduce service downtime: note that in the above scenario I didn't have to go and tell the COO
that the system needs to be taken offline for hours whilst I dig around in the RAID array, hoping that
my physical acquisition toolkit is compatible (and that the version of the RAID firmware is supported by
my forensic software). Abstracting the hardware removes a barrier to even doing forensics in some
situations.
Decrease evidence transfer time: in the same Cloud, bit-for-bit copies are super fast - made faster by
that replicated, distributed file system my Cloud provider engineered for me. From a network traffic
perspective, it may even be free to make the copy in the same Cloud. Without the Cloud, I would have
to do a lot of time-consuming and expensive provisioning of physical devices. I only pay for the storage
for as long as I need the evidence.
Eliminate forensic image verification time: some Cloud Storage implementations expose a
cryptographic checksum or hash. For example, Amazon S3 generates an MD5 hash automatically
when you store an object. In theory you no longer need to generate time-consuming MD5 checksums
using external tools - it's already there.
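A small verification sketch with boto3 follows; it only holds for single-part, non-KMS-encrypted uploads, where S3's ETag is the plain MD5 of the object. Bucket, key and file names are placeholders.

    import hashlib
    import boto3

    s3 = boto3.client("s3")

    def etag_matches_local_md5(bucket, key, local_path):
        """Compare S3's stored ETag with a locally computed MD5 digest.

        Valid only for single-part, non-KMS-encrypted uploads, where the ETag
        is the hex MD5 of the object."""
        etag = s3.head_object(Bucket=bucket, Key=key)["ETag"].strip('"')
        with open(local_path, "rb") as f:
            local_md5 = hashlib.md5(f.read()).hexdigest()
        return etag == local_md5

    print(etag_matches_local_md5("evidence-bucket", "disk-image.dd", "disk-image.dd"))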
Decrease time to access protected documents: immense CPU power opens some doors. Did the suspect
password-protect a document that is relevant to the investigation? You can now test a wider range of
candidate passwords in less time to speed up investigations.
3. PASSWORD ASSURANCE TESTING (AKA CRACKING):
Decrease password cracking time: if your organization regularly tests password strength by running
password crackers, you can use Cloud Compute to decrease crack time, and you only pay for what you
use (a toy sketch follows at the end of this section). Ironically, your cracking costs go up as people
choose better passwords ;-).
Keep cracking activities on dedicated machines: if today you use a distributed password cracker to
spread the load across non-production machines, you can now put those agents in dedicated Compute
instances - and thus stop mixing sensitive credentials with other workloads.
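The speed-up behind both points is plain parallelism over candidate passwords. The toy Python sketch below uses a SHA-256 hash and local worker processes purely for illustration; in practice each worker would run on a dedicated cloud compute instance, and the hash would be whatever scheme the audited system actually uses.

    import hashlib
    from multiprocessing import Pool

    # Hash recovered from the system under test (here: SHA-256 of "letmein").
    TARGET = hashlib.sha256(b"letmein").hexdigest()

    def check(candidate):
        """Return the candidate if its hash matches the target, else None."""
        return candidate if hashlib.sha256(candidate.encode()).hexdigest() == TARGET else None

    if __name__ == "__main__":
        wordlist = ["password", "123456", "letmein", "qwerty"]   # stand-in for a real wordlist
        with Pool() as pool:                                     # one worker per core / instance
            hits = [hit for hit in pool.map(check, wordlist) if hit]
        print(hits)   # ['letmein']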
4. LOGGING:
Unlimited, pay-per-drink storage: logging is often an afterthought; consequently, insufficient disk
space is allocated and logging is either non-existent or minimal. Cloud Storage changes all this - no
more guessing how much storage you need for standard logs.
Improve log indexing and search: with your logs in the Cloud you can leverage Cloud Compute to
index those logs in real time and get the benefit of instant search results. What is different here? The
Compute instances can be plumbed in and scaled as needed based on the logging load - meaning a true
real-time view.
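As a toy illustration of the indexing side (not any particular logging product), the sketch below builds an in-memory inverted index over log lines as they arrive, so that keyword search becomes a simple lookup; in a cloud deployment each compute instance would index its own shard of the stream.

    from collections import defaultdict

    index = defaultdict(set)            # term -> set of line numbers containing it

    def ingest(line_no, line):
        """Index a single log line as it arrives from the stream."""
        for term in line.lower().split():
            index[term].add(line_no)

    def search(term):
        """Return the line numbers whose text contains the given term."""
        return sorted(index.get(term.lower(), set()))

    # Example: two lines from a hypothetical auth log.
    ingest(1, "Failed password for root from 10.0.0.5")
    ingest(2, "Accepted password for alice from 10.0.0.9")
    print(search("failed"))    # [1]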
Getting compliant with Extended logging: most modern operating systems offer extended logging in
the form of a C2 audit trail. This is rarely enabled for fear of performance degradation and log size.
Now you can opt-in easily - if you are willing to pay for the enhanced logging, you can do so.
Granular logging makes compliance and investigations easier.
5. IMPROVE THE STATE OF SECURITY SOFTWARE (PERFORMANCE):
Drive vendors to create more efficient security software: Billable CPU cycles get noticed. More
attention will be paid to inefficient processes; e.g. poorly tuned security agents. Process accounting
will make a comeback as customers target expensive processes. Security vendors that understand
how to squeeze the most performance from their software will win.
6. SECURE BUILDS:
Pre-hardened, change-controlled builds: this is primarily a benefit of virtualization-based Cloud
Computing. Now you get a chance to start secure (by your own definition) - you create your Gold
Image VM and clone away. There are ways to do this today with bare-metal OS installs, but frequently
these require additional third-party tools, are time-consuming to clone, or add yet another agent to each
endpoint.
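A minimal sketch of the "clone away" step on EC2 with boto3, assuming a hardened golden AMI has already been registered; the region, AMI ID and instance type are placeholders.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")            # assumed region

    # Launch three identical instances from the pre-hardened golden image,
    # so every new endpoint starts from the same known-secure baseline.
    resp = ec2.run_instances(ImageId="ami-0123456789abcdef0",     # golden AMI (placeholder)
                             InstanceType="t2.micro",
                             MinCount=3, MaxCount=3)
    print([i["InstanceId"] for i in resp["Instances"]])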
Reduce exposure through patching offline: Gold Images can be securely kept up to date.
Offline VMs can be conveniently patched off the network.
Easier to test impact of security changes: this is a big one. Spin up a copy of your production
environment, implement a security change and test the impact at low cost, with minimal startup time.
This is a big deal and removes a major barrier to doing security in production environments.
7. SECURITY TESTING:
Reduce cost of testing security: a SaaS provider only passes on a portion of their security testing costs.
By sharing the same application as a service, you don't foot the entire bill for the expensive security
code review and/or penetration test. Even with Platform as a Service (PaaS), where your developers get
to write code, there are potential cost economies of scale (particularly around the use of code-scanning
tools that sweep source code for security weaknesses).
ADOPTION FEARS AND STRATEGIC INNOVATION OPPORTUNITIES
Adoption fears:
Security: many IT executives make decisions based on the perceived security risk instead of the real
security risk. IT has traditionally feared the loss of control for SaaS deployments, based on an
assumption that if you cannot control something it must be insecure. I recall the anxiety around early
web services deployments, where people got really worked up about the security of web services
because users could invoke an internal business process from outside the firewall.
IT will have to get used to the idea of software delivered from outside the firewall being mashed up
with on-premise software before it reaches the end user. The intranet, extranet, DMZ, and internet
boundaries have started to blur, and this indeed imposes some serious security challenges - such as
relying on a cloud vendor for the physical and logical security of the data, or authenticating users
across firewalls by relying on the vendor's authentication schemes - but treating challenges as fears is
not a smart strategy.
Latency: just because something runs on a cloud does not mean it suffers from latency. My opinion is
quite the opposite: cloud computing, done properly, has opportunities to reduce latency thanks to
architectural advantages such as massively parallel processing and distributed computing.
Web-based applications went through the same perception issues in their early days, and now people
don't worry about latency while shopping at Amazon.com or editing a document on Google Docs
served to them over a cloud. The cloud is going to get better and better, and IT has no strategic
advantage in owning and maintaining data centers. In fact, data centers are easy to shut down but the
applications are not, and CIOs should take any and all opportunities they get to move the data centers
away if they can.
SLA: the recent Amazon EC2 meltdown and RIM's network outage created a debate about the
availability of highly centralized infrastructure and its SLAs. The real problem is not a bad SLA
but the lack of one. IT needs a phone number it can call in an unexpected event and an up-front
estimate of the downtime, so that expectations can be managed. Maybe I am simplifying it too much,
but this is the crux of the situation. The fear is not so much about 24x7 availability, since an on-premise
system hardly promises that; what bothers IT the most is the inability to quantify the impact on the
business in the event of non-availability of a system, and to set and manage expectations upstream and
downstream. The non-existent SLA is a real issue, and I believe there is a great service innovation
opportunity for ISVs and partners to help CIOs with the adoption of cloud computing by providing
a rock-solid SLA and transparency into the defect resolution process.
8. CONCLUSION:
In my view, there are some strong technical security arguments in favour of Cloud Computing,
assuming we can find ways to manage the risks. With this new paradigm come challenges and
opportunities. The challenges are getting plenty of attention - I'm regularly afforded the opportunity to
comment on them, and obviously I cover them on this blog. However, let's not lose sight of the
potential upside.
Some benefits depend on the Cloud service used and therefore do not apply across the board. For
example, I see no solid forensic benefits with SaaS. Also, for space reasons, I'm purposely not
including the flip side to these benefits; if you read this blog regularly you should recognise some.
We believe the Cloud offers Small and Medium Businesses major potential security benefits.
Frequently SMBs struggle with limited or non-existent in-house INFOSEC resources and budgets. The
caveat is that the Cloud market is still very new - security offerings are somewhat foggy - making
selection tricky. Clearly, not all Cloud providers will offer the same security.
In summary, cloud computing:
Increases business responsiveness
Accelerates creation of new services via rapid prototyping capabilities
Reduces acquisition complexity via a service-oriented approach
Uses IT resources efficiently via sharing and higher system utilization
Reduces energy consumption
Handles new and emerging workloads
Scales to extreme workloads quickly and easily
Simplifies IT management
Provides a platform for collaboration and innovation
Cultivates skills for the next-generation workforce
9. REFERENCES:
WebGuild - http://www.webguild.org/
HowStuffWorks - http://communication.howstuffworks.com/
CloudSecurity.org - http://cloudsecurity.org
IBM developerWorks - http://www.ibm.com/developerworks/websphere/zones/hipods/
Google Suggest - http://www.google.com/webhp?complete=1&hl=en