Cloud Computing Complete Notes

UNIT-I

Introduction to cloud computing

History:

Before the emergence of cloud computing, there was client/server computing, which is basically centralized storage in which all the software applications, all the data and all the controls reside on the server side.

If a single user wants to access specific data or run a program, he/she needs to connect to the server, gain appropriate access, and then do his/her work.

After that, distributed computing came into the picture, where all the computers are networked together and share their resources when needed.

On the basis of these computing models, the concept of cloud computing emerged and was later implemented.

Around 1961, John McCarthy suggested in a speech at MIT that computing could be sold like a utility, just like water or electricity. It was a brilliant idea, but like all brilliant ideas it was ahead of its time: for the next few decades, despite interest in the model, the technology simply was not ready for it.

Of course, time passed, the technology caught up with the idea, and years later the milestones below followed:

In 1999, Salesforce.com started delivering applications to users through a simple website. The applications were delivered to enterprises over the Internet, and in this way the dream of computing sold as a utility came true.

In 2002, Amazon started Amazon Web Services, providing services like storage, computation and even human intelligence. However, only with the launch of the Elastic Compute Cloud (EC2) in 2006 did a truly commercial service open to everybody exist.

In 2009, Google Apps also started to provide cloud computing
enterprise applications.

Of course, all the big players are present in the cloud computing evolution; some came earlier and some later. In 2009, Microsoft launched Windows Azure, and companies like Oracle and HP have since joined the game. This shows that today cloud computing has become mainstream.

Definition of cloud:
Cloud computing can be defined as delivering computing power (CPU, RAM, network speed, storage, OS and software) as a service over a network (usually the Internet), rather than physically having the computing resources at the customer's location.

Example: AWS, Azure, Google Cloud

Characteristics of cloud computing:

The characteristics of cloud computing are given below:

1) Agility

The cloud works in a distributed computing environment. It shares resources among users and works very fast.

2) High availability and reliability

Availability of servers is high and more reliable because the chances of infrastructure failure are minimal.

3) High Scalability

Means "on-demand" provisioning of resources on a large scale, without


having engineers for peak loads.

4) Multi-Sharing

With the help of cloud computing, multiple users and applications can work
more efficiently with cost reductions by sharing common infrastructure.

5) Device and Location Independence

Cloud computing enables the users to access systems using a web browser
regardless of their location or what device they use e.g. PC, mobile phone
etc. As infrastructure is off-site (typically provided by a third-party) and
accessed via the Internet, users can connect from anywhere.

6) Maintenance

Maintenance of cloud computing applications is easier, since they do not need to be installed on each user's computer and can be accessed from different places. So it also reduces cost.

7) Low Cost

By using cloud computing, cost is reduced because an IT company does not need to set up its own infrastructure and pays only as per its usage of resources.

8) Services in pay-per-use mode

Application Programming Interfaces (APIs) are provided to the users so that they can access services on the cloud by using these APIs and pay charges as per their usage of services.

Cloud Models:
The service models are categorized into three basic models:

1) Software-as-a-Service (SaaS)
2) Platform-as-a-Service (PaaS)
3) Infrastructure-as-a-Service (IaaS)

1) Software-as-a-Service (SaaS)
SaaS is a software distribution model in which applications are hosted by a
cloud service provider and made available to customers over internet. SaaS
is also known as "On-Demand Software".

In SaaS, software and associated data are centrally hosted on the cloud
server. SaaS is accessed by users using a thin client via a web browser.

Advantages
1) SaaS is easy to buy
SaaS pricing is based on a monthly or annual fee, so it allows organizations to access business functionality at a cost lower than that of licensed applications.
2) Less hardware required for SaaS
The software is hosted remotely, so organizations don't need to
invest in additional hardware.

3) Low maintenance required for SaaS
Software as a service removes the need for installation, set-up, and often daily upkeep and maintenance for organizations. The initial set-up cost for SaaS is typically lower than for enterprise software. SaaS vendors usually price their applications based on usage parameters, such as the number of users using the application, so SaaS usage is easy to monitor and updates are automatic.

4) No special software or hardware versions required
All users have the same version of the software and typically access it through a web browser. SaaS reduces IT support costs by outsourcing hardware and software maintenance and support to the SaaS provider.

Disadvantages
1) Security

Since data is stored in the cloud, security may be an issue for some users. However, cloud computing is not more secure than an in-house deployment.

2) Latency issue

Because the data and application are stored in the cloud at a variable distance from the end user, there may be more latency when interacting with the application than with a local deployment. The SaaS model is therefore not suitable for applications that demand response times in milliseconds.

3) Total Dependency on Internet

Without an internet connection, most SaaS applications are not usable.

4) Switching between SaaS vendors is difficult

Switching SaaS vendors involves the difficult and slow task of transferring very large data files over the Internet and then converting and importing them into the new SaaS application.
2) Infrastructure-as-a-Service (IaaS):
IaaS is one of the layers of the cloud computing platform wherein the customer organization outsources its IT infrastructure such as servers, networking, processing, storage, virtual machines and other resources. Customers access these resources over the internet, i.e. the cloud computing platform, on a pay-per-use model.

IaaS, earlier called Hardware-as-a-Service (HaaS), is a cloud computing platform-based model.

In traditional hosting services, IT infrastructure was rented out for specific periods of time, with a pre-determined hardware configuration. The client paid for the configuration and time, regardless of actual use. With the IaaS cloud computing platform layer, clients can dynamically scale the configuration to meet changing requirements, and are billed only for the services actually used.

IaaS cloud computing platform layer eliminates the need for every
organization to maintain the IT infrastructure.

IaaS is offered in three models: public, private, and hybrid cloud. Private cloud implies that the infrastructure resides at the customer premises. In the case of public cloud, it is located at the cloud computing platform vendor's data center; and hybrid cloud is a combination of the two, with the customer choosing the best of both worlds.

Advantages

1) You can dynamically choose a CPU, memory and storage configuration as per your needs.

2) You can easily access the vast computing power available on the IaaS cloud platform.

3) You can eliminate the need for investment in rarely used IT hardware.

4) The IT infrastructure will be handled by the IaaS cloud computing platform vendor.

Disadvantages
1) There is a risk of the IaaS cloud computing platform vendor gaining access to the organization's data. However, this can be avoided by opting for a private cloud.

2) The IaaS cloud computing platform model is dependent on internet availability.

3) It is also dependent on the availability of virtualization services.

4) The IaaS cloud computing platform can limit user privacy and customization options.

Top vendors providing IaaS

1. Amazon Web Services
2. Netmagic Services
3. Rackspace
4. Reliance Communications
5. Sify Technologies
6. Tata Communications

3) Platform-as-a-Service (PaaS):
The PaaS cloud computing platform is a developer programming platform created for programmers to develop, test, run and manage applications.

A developer is able to write an application as well as deploy it directly onto this layer easily. PaaS extends and abstracts the IaaS layer by removing the hassle of managing individual virtual machines.

In the PaaS cloud computing platform, back-end scalability is handled by the cloud service provider and the end user does not have to worry about managing the infrastructure. All the infrastructure needed to run the applications is provided over the internet.

Advantages
1) Simplified Development

Developers can focus on development and innovation without worrying about the infrastructure.

2) Lower risk

No up-front investment in hardware and software is required. Developers only need a PC and an internet connection to start building applications.

3) Prebuilt business functionality

Some PaaS vendors also provide predefined business functionality so that users can avoid building everything from scratch and can directly start their projects.

4) Instant community

PaaS vendors frequently provide online communities where developers can get ideas, share experiences and seek advice from others.

5) Scalability

Applications deployed can scale from one to thousands of users without any
changes to the applications.

Disadvantages
1) Vendor lock-in

One has to write applications according to the platform provided by the PaaS vendor, so migration of an application to another PaaS vendor would be a problem.

2) Data Privacy

Corporate data, whether critical or not, will be private, so if it is not located within the walls of the company there can be a risk in terms of privacy of data.

3) Integration with the rest of the system's applications

It may happen that some applications are local and some are in the cloud, so there will be increased complexity when we want to use data which is in the cloud together with local data.

Top vendors who are providing PaaS

1. Google App Engine (GAE)
2. Salesforce.com
3. Windows Azure
4. AppFog
5. OpenShift
6. Cloud Foundry from VMware

Cloud Services Examples:

IaaS:
• Amazon EC2
• Google Compute Engine
• Windows Azure VMs

PaaS:
• Google App Engine

SaaS:
• Salesforce

Cloud Computing Applications:

• Banking & Financial Apps
• E-Commerce Apps
• Social Networking
• Healthcare Systems
• Energy Systems
• Intelligent Transportation Systems
• E-Governance
• Education
• Mobile Communications

Cloud Concepts & Technologies:


Virtualization:

 Virtualization refers to the partitioning of the resources of a physical system (such as computing, storage, network and memory) into multiple virtual resources.

 Virtualization is a key enabling technology of cloud computing that allows pooling of resources. In cloud computing, resources are pooled to serve multiple users using multi-tenancy.

Hypervisor:

• The virtualization layer consists of a hypervisor or a virtual machine monitor (VMM).
• The hypervisor presents a virtual operating platform to a guest operating system (OS).

• Type-1 Hypervisor
• Type-1 or native hypervisors run directly on the host hardware, and control the hardware and monitor the guest operating systems.

• Type-2 Hypervisor
• Type-2 or hosted hypervisors run on top of a conventional (main/host) operating system and monitor the guest operating systems.

Fig: Type-1 Hypervisor; Fig: Type-2 Hypervisor

Load Balancing:
 Cloud computing resources can be scaled up on demand to meet the performance requirements of applications.
• Load balancing distributes workloads across multiple servers to meet the application workloads.
• The goals of load balancing techniques include:
• Achieving maximum utilization of resources
• Minimizing response times
• Maximizing throughput

Load Balancing Algorithms (the first two are sketched after the list):
• Round Robin load balancing
• Weighted Round Robin load balancing
• Low Latency load balancing
• Least Connections load balancing
• Priority load balancing
• Overflow load balancing
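
To make the first two algorithms concrete, a minimal Python sketch of Round Robin and Weighted Round Robin server selection is given below; the server names and weights are hypothetical examples, not part of any real deployment.

# Minimal sketch of Round Robin and Weighted Round Robin selection.
from itertools import cycle

servers = ["server-1", "server-2", "server-3"]

# Round Robin: each request goes to the next server in turn.
round_robin = cycle(servers)

# Weighted Round Robin: servers with higher weights receive
# proportionally more requests.
weights = {"server-1": 3, "server-2": 2, "server-3": 1}
weighted = cycle([s for s, w in weights.items() for _ in range(w)])

for request_id in range(6):
    print(f"request {request_id} -> {next(round_robin)} (RR), {next(weighted)} (WRR)")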

Load Balancing – Persistence Approaches:

• Since load balancing can route successive requests from a user session to different servers, maintaining the state or the information of the session is important.

• Persistence approaches include:
• Sticky sessions
• Session database
• Browser cookies
• URL re-writing

Scalability & Elasticity:

 Multi-tier applications such as e-Commerce, social networking, business-to-business, etc. can experience rapid changes in their traffic.
 Capacity planning involves determining the right sizing of each tier of the deployment of an application in terms of the number of resources and the capacity of each resource.
• Capacity planning may be for computing, storage, memory or network resources.

Fig: Cost vs Capacity curves

Scaling Approaches:
• Vertical Scaling/Scaling Up
• Involves upgrading the hardware resources (adding additional computing, memory, storage or network resources).
• Horizontal Scaling/Scaling Out
• Involves addition of more resources of the same type; a sketch of a simple scale-out decision follows.
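
To illustrate horizontal scaling, the sketch below shows a simple scale-out/scale-in decision driven by average CPU utilization; the thresholds and instance limits are hypothetical and not taken from any particular cloud provider.

# Naive auto-scaling decision for horizontal scaling (scale out / scale in).
def desired_instance_count(current, avg_cpu_percent,
                           scale_out_at=70, scale_in_at=30,
                           min_instances=2, max_instances=10):
    """Return the new instance count based on average CPU utilization."""
    if avg_cpu_percent > scale_out_at:
        return min(current + 1, max_instances)   # scale out: add a server
    if avg_cpu_percent < scale_in_at:
        return max(current - 1, min_instances)   # scale in: remove a server
    return current                               # load is in the normal band

print(desired_instance_count(current=4, avg_cpu_percent=82))  # prints 5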

Deployment:
• Cloud application deployment design is an iterative process that involves:
• Deployment Design
• The variables in this step include the number of servers in each tier; the computing, memory and storage capacities of the servers; server interconnection; load balancing; and replication strategies.
• Performance Evaluation
• To verify whether the application meets the performance requirements with the deployment.
• Involves monitoring the workload on the application and measuring various workload parameters such as response time and throughput.
• Utilization of servers (CPU, memory, disk, I/O, etc.) in each tier is also monitored.
• Deployment Refinement
• Various alternatives can exist in this step, such as vertical scaling (scaling up), horizontal scaling (scaling out), alternative server interconnections, and alternative load balancing and replication strategies.
Replication:
• Replication is used to create and maintain multiple copies of data in the cloud.
• Cloud enables rapid implementation of replication solutions for disaster recovery for organizations.
• With cloud-based data replication, organizations can plan for disaster recovery without making any capital expenditure on purchasing, configuring and managing secondary site locations.

• Types:
• Array-based Replication
• Network-based Replication
• Host-based Replication

Monitoring:

• Monitoring services allow cloud users to collect and analyze data on various monitoring metrics.

• A monitoring service collects data on various system and application metrics from the cloud computing instances.

• Monitoring of cloud resources is important because it allows users to keep track of the health of applications and services deployed in the cloud. A small collection sketch follows the metrics table below.

Examples of Monitoring Metrics

Type | Metrics
CPU | CPU-Usage, CPU-Idle
Disk | Disk-Usage, Bytes/sec (read/write), Operations/sec
Memory | Memory-Used, Memory-Free, Page-Cache
Interface | Packets/sec (incoming/outgoing), Octets/sec (incoming/outgoing)
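
A minimal sketch of collecting similar metrics on a single instance is shown below; it assumes the third-party psutil library (pip install psutil), which is not mentioned in the notes, is available.

# Collect a few of the metrics listed above using psutil (assumed dependency).
import psutil

metrics = {
    "cpu_usage_percent": psutil.cpu_percent(interval=1),
    "memory_used_bytes": psutil.virtual_memory().used,
    "memory_free_bytes": psutil.virtual_memory().available,
    "disk_usage_percent": psutil.disk_usage("/").percent,
    "net_bytes_sent": psutil.net_io_counters().bytes_sent,
    "net_bytes_recv": psutil.net_io_counters().bytes_recv,
}

for name, value in metrics.items():
    print(f"{name}: {value}")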

Software Defined Networking:

• Software-Defined Networking (SDN) is a networking architecture that separates the control plane from the data plane and centralizes the network controller.
• Conventional network architecture
• The control plane and data plane are coupled. The control plane is the part of the network that carries the signaling and routing message traffic, while the data plane is the part of the network that carries the payload data traffic.

• SDN Architecture

• The control and data planes are decoupled and the network controller is
centralized.

Fig: SDN Layers

SDN - Key Elements:

• Centralized Network Controller
• With decoupled control and data planes and a centralized network controller, network administrators can rapidly configure the network.
• Programmable Open APIs
• SDN architecture supports programmable open APIs for the interface between the SDN application and control layers (Northbound interface). These open APIs allow implementing various network services such as routing, quality of service (QoS), access control, etc.
• Standard Communication Interface (OpenFlow)
• SDN architecture uses a standard communication interface between the control and infrastructure layers (Southbound interface). OpenFlow, which is defined by the Open Networking Foundation (ONF), is the broadly accepted SDN protocol for the Southbound interface.

Network Function Virtualization:
• Network Function Virtualization (NFV) is a technology that leverages virtualization to consolidate heterogeneous network devices onto industry-standard high-volume servers, switches and storage.
• Relationship to SDN
• NFV is complementary to SDN, as NFV can provide the infrastructure on which SDN can run.
• NFV and SDN are mutually beneficial to each other but not dependent on each other.
• Network functions can be virtualized without SDN; similarly, SDN can run without NFV.
• NFV comprises network functions implemented in software that run on virtualized resources in the cloud.
• NFV enables a separation of the network functions, which are implemented in software, from the underlying hardware.

NFV Architecture
• Key elements of the NFV architecture are:
• Virtualized Network Function (VNF): VNF is a software implementation of a network function which is capable of running over the NFV Infrastructure (NFVI).
• NFV Infrastructure (NFVI): NFVI includes compute, network and storage resources that are virtualized.
• NFV Management and Orchestration: NFV Management and Orchestration focuses on all virtualization-specific management tasks and covers the orchestration and lifecycle management of physical and/or software resources that support the infrastructure virtualization, and the lifecycle management of VNFs.

Fig: NFV Architecture

MapReduce:
• MapReduce is a parallel data processing model for processing and
analysis of massive scale data.
• MapReduce phases:
• Map Phase: In the Map phase, data is read from a distributed file system,
partitioned among a set of computing nodes in the cluster, and sent to the
nodes as a set of key-value pairs.
• The Map tasks process the input records independently of each other and
produce intermediate results as key-value pairs.
• The intermediate results are stored on the local disk of the node running
the Map task.
• Reduce Phase: When all the Map tasks are completed, the Reduce phase
begins in which the intermediate data with the same key is aggregated.

Fig: MapReduce Workflow


Identity and Access Management:

• Identity and Access Management (IDAM) for the cloud describes the authentication and authorization of users to provide secure access to cloud resources.
• Organizations with multiple users can use IDAM services provided by the cloud service provider for management of user identifiers and user permissions.
• IDAM services allow organizations to centrally manage users, access permissions, security credentials and access keys.
• Organizations can enable role-based access control to cloud resources and applications using the IDAM services.
• IDAM services allow creation of user groups where all the users in a group have the same access permissions.
• Identity and Access Management is enabled by a number of technologies such as OpenAuth, Role-based Access Control (RBAC), Digital Identities, Security Tokens, Identity Providers, etc.

Service Level Agreements in Cloud Computing

A Service Level Agreement (SLA) is the bond for performance negotiated between the cloud services provider and the client. Earlier in cloud computing, all Service Level Agreements were negotiated between a client and the service provider. Nowadays, with the rise of large utility-like cloud computing providers, most Service Level Agreements are standardized until a client becomes a large consumer of cloud services. Service level agreements are also defined at different levels, which are mentioned below:

 Customer-based SLA
 Service-based SLA
 Multilevel SLA

A few Service Level Agreements are enforceable as contracts, but most are agreements or contracts which are more along the lines of an Operating Level Agreement (OLA) and may not have the force of law. It is advisable to have an attorney review the documents before making a major agreement with a cloud service provider. Service Level Agreements usually specify the following parameters:
1. Availability of the service (uptime); a short illustration of this parameter follows the list
2. Latency or the response time
3. Reliability of service components
4. Accountability of each party
5. Warranties
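
As a quick illustration of the availability (uptime) parameter, the snippet below converts a few common uptime targets into the downtime they allow per 30-day month; the targets are only examples.

# Allowed downtime per 30-day month for a few example uptime targets.
HOURS_PER_MONTH = 30 * 24

for uptime_percent in (99.0, 99.9, 99.99):
    downtime_minutes = HOURS_PER_MONTH * (1 - uptime_percent / 100) * 60
    print(f"{uptime_percent}% uptime -> {downtime_minutes:.1f} minutes of downtime per month")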
Billing:
Cloud service providers offer a number of billing models, described below; a small cost comparison of the first two models follows the list.
• Elastic Pricing
• In the elastic pricing or pay-as-you-use model, customers are charged based on their usage of cloud resources.
• Fixed Pricing
• In fixed pricing models, customers are charged a fixed amount per month for the cloud resources.
• Spot Pricing
• Spot pricing models offer variable pricing for cloud resources, driven by market demand.
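
The sketch below compares the elastic and fixed pricing models for a single instance; the hourly rate, flat fee and usage figures are hypothetical and only show how a pay-as-you-use bill is computed.

# Hypothetical rates: $0.10 per instance-hour (elastic) vs a $50 flat monthly fee (fixed).
hours_used = 300                      # instance-hours consumed this month
elastic_bill = hours_used * 0.10      # pay-as-you-use charge
fixed_bill = 50.0                     # flat monthly charge

print(f"elastic pricing: ${elastic_bill:.2f}")
print(f"fixed pricing:   ${fixed_bill:.2f}")
# Elastic pricing is cheaper for low or bursty usage; fixed pricing wins
# once usage exceeds the break-even point (flat fee / hourly rate).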

Cloud Services & Platforms:

Cloud Reference Model
• Infrastructure & Facilities Layer
• Includes the physical infrastructure such as datacenter facilities, electrical and mechanical equipment, etc.
• Hardware Layer
• Includes physical compute, network and storage hardware.
• Virtualization Layer
• Partitions the physical hardware resources into multiple virtual resources, enabling pooling of resources.
• Platform & Middleware Layer
• Builds upon the IaaS layers below and provides standardized stacks of services such as database services, queuing services, application frameworks and run-time environments, messaging services, monitoring services, analytics services, etc.
• Service Management Layer
• Provides APIs for requesting, managing and monitoring cloud resources.
• Applications Layer
• Includes SaaS applications such as email, cloud storage applications, productivity applications, management portals and customer self-service portals.
Fig : Cloud Reference Model

Compute Services

• Compute services provide dynamically scalable compute capacity in the cloud.
• Compute resources can be provisioned on demand in the form of virtual machines. Virtual machines can be created from standard images provided by the cloud service provider or from custom images created by the users.
• Compute services can be accessed from the web consoles of these services, which provide graphical user interfaces for provisioning, managing and monitoring these services.
• Cloud service providers also provide APIs for various programming languages that allow developers to access and manage these services programmatically.

Compute Services – Amazon EC2
• Amazon Elastic Compute Cloud (EC2) is a compute service provided by Amazon.
• Launching EC2 Instances
• To launch a new instance, click on the launch instance button. This opens a wizard where you can select the Amazon Machine Image (AMI) with which you want to launch the instance. You can also create your own AMIs with custom applications, libraries and data. Instances can be launched with a variety of operating systems.
• Instance Sizes
• When you launch an instance you specify the instance type (micro, small, medium, large, extra-large, etc.), the number of instances to launch based on the selected AMI, and the availability zones for the instances.
• Key-pairs
• When launching a new instance, the user selects a key-pair from existing key-pairs or creates a new key-pair for the instance. Key-pairs are used to securely connect to an instance after it launches.
• Security Groups
• The security groups to be associated with the instance can be selected from the instance launch wizard. Security groups are used to open or block specific network ports for the launched instances. A programmatic sketch of these options follows.
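
The same launch options can also be set programmatically. The sketch below assumes the boto3 AWS SDK for Python and already configured credentials; the AMI ID, key-pair name and security group ID are placeholders.

# Launch a single EC2 instance with boto3 (assumed SDK and credentials).
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-12345678",           # AMI selected in the launch wizard (placeholder)
    InstanceType="t2.micro",          # instance size
    MinCount=1,
    MaxCount=1,
    KeyName="my-keypair",             # key-pair used to connect securely (placeholder)
    SecurityGroupIds=["sg-12345678"], # security group controlling open ports (placeholder)
)

print(response["Instances"][0]["InstanceId"])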

Fig : Screenshot of Amazon EC2 Console

Storage Services:
• Cloud storage services allow storage and retrieval of any amount of data, at any time, from anywhere on the web.
• Most cloud storage services organize data into buckets or containers.
• Scalability
• Cloud storage services provide high capacity and scalability. Objects up to several terabytes in size can be uploaded, and multiple buckets/containers can be created on cloud storage.
• Replication
• When an object is uploaded it is replicated at multiple facilities and/or on multiple devices within each facility.
• Access Policies
• Cloud storage services provide several security features such as Access Control Lists (ACLs), bucket/container-level policies, etc. ACLs can be used to selectively grant access permissions on individual objects. Bucket/container-level policies can also be defined to allow or deny permissions across some or all of the objects within a single bucket/container.
• Encryption
• Cloud storage services provide Server Side Encryption (SSE) options to encrypt all data stored in the cloud storage.
• Consistency
• Strong data consistency is provided for all upload and delete operations. Therefore, any object that is uploaded can be immediately downloaded after the upload is complete.
Storage Services – Amazon S3
• Amazon Simple Storage Service (S3) is an online cloud-based data storage infrastructure for storing and retrieving any amount of data.
• S3 provides highly reliable, scalable, fast, fully redundant and affordable storage infrastructure.
• Buckets
• Data stored on S3 is organized in the form of buckets. You must create a bucket before you can store data on S3.
• Uploading Files to Buckets
• The S3 console provides simple wizards for creating a new bucket and uploading files to buckets.
• You can upload any kind of file to S3.
• While uploading a file, you can specify the redundancy and encryption options and access permissions. A programmatic sketch of these steps follows.
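
The sketch below shows the same steps programmatically, again assuming boto3 and configured credentials; the bucket name, object key and local file are placeholders.

# Create a bucket and upload an encrypted object to S3 (assumed SDK and credentials).
import boto3

s3 = boto3.client("s3")

s3.create_bucket(Bucket="my-example-bucket")          # bucket must exist before storing data

with open("image1.jpg", "rb") as data:
    s3.put_object(
        Bucket="my-example-bucket",
        Key="photos/image1.jpg",
        Body=data,
        ServerSideEncryption="AES256",                # SSE option mentioned above
        ACL="private",                                # access permission for the object
    )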

Fig : Screenshot of Amazon S3 Console

Database Services:
• Cloud database services allow you to set up and operate relational or non-relational databases in the cloud.
• Relational Databases
• Popular relational databases provided by various cloud service providers include MySQL, Oracle, SQL Server, etc.
• Non-relational Databases
• The non-relational (NoSQL) databases provided by cloud service providers are mostly proprietary solutions.
• Scalability
• Cloud database services allow provisioning as much compute and storage resources as required to meet the application workload levels. Provisioned capacity can be scaled up or down. For read-heavy workloads, read-replicas can be created.
• Reliability
• Cloud database services are reliable and provide automated backup and snapshot options.
• Performance
• Cloud database services provide guaranteed performance with options such as guaranteed input/output operations per second (IOPS) which can be provisioned upfront.
• Security
• Cloud database services provide several security features to restrict access to the database instances and stored data, such as network firewalls and authentication mechanisms.

Database Services – Amazon RDS
• Amazon Relational Database Service (RDS) is a web service that makes it easy to set up, operate and scale a relational database in the cloud.
• Launching DB Instances
• The console provides an instance launch wizard that allows you to select the type of database to create (MySQL, Oracle or SQL Server), the database instance size, allocated storage, DB instance identifier, and DB username and password. The status of the launched DB instances can be viewed from the console.
• Connecting to a DB Instance
• Once the instance is available, you can note the instance endpoint from the instance properties tab. This endpoint can then be used for securely connecting to the instance, as sketched below.
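
A minimal sketch of connecting to a MySQL DB instance through its endpoint is shown below; it assumes the mysql-connector-python package, and the endpoint, credentials and database name are placeholders.

# Connect to an RDS MySQL instance using its endpoint (assumed driver, placeholder values).
import mysql.connector

conn = mysql.connector.connect(
    host="mydb.abcdefgh.us-east-1.rds.amazonaws.com",  # instance endpoint from the console
    port=3306,
    user="dbadmin",
    password="secret-password",
    database="mydatabase",
)

cursor = conn.cursor()
cursor.execute("SELECT VERSION()")
print(cursor.fetchone())
conn.close()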

Fig : Screenshot of Amazon RDS Console

Application Services:
Cloud computing has its applications in almost all fields, such as business, entertainment, data storage, social networking, management, education, art and global positioning systems, etc. Some of the widely known cloud computing applications are discussed below.
Business Applications
Cloud computing has made business more collaborative and easy by incorporating various apps such as MailChimp, Chatter, Google Apps for Business, and QuickBooks.

SN | Application | Description
1 | MailChimp | It offers an e-mail publishing platform. It is widely employed by businesses to design and send their e-mail campaigns.
2 | Chatter | The Chatter app helps employees share important information about the organization in real time. One can get an instant feed regarding any issue.
3 | Google Apps for Business | Google offers creation of text documents, spreadsheets, presentations, etc. on Google Docs, which allows business users to share them in a collaborative manner.
4 | QuickBooks | It offers online accounting solutions for a business. It helps in monitoring cash flow, creating VAT returns and creating business reports.

Data Storage and Backup

Box.com, Mozy and Joukuu are applications offering data storage and backup services in the cloud.

SN | Application | Description
1 | Box.com | Box.com offers a drag-and-drop service for files. Users simply drop their files into Box and can access them from anywhere.
2 | Mozy | Mozy offers an online backup service for files to prevent data loss.
3 | Joukuu | Joukuu is a web-based interface. It allows displaying a single list of contents for files stored in Google Docs, Box.net and Dropbox.

Management Applications
There are apps available for management tasks such as time tracking and organizing notes. Applications performing such tasks are discussed below:

SN | Application | Description
1 | Toggl | It helps in tracking the time assigned to a particular project.
2 | Evernote | It organizes sticky notes and can even read text from images, which helps the user locate notes easily.
3 | Outright | It is an accounting app. It helps to track income, expenses, profits and losses in real time.

Social Applications
There are several social networking services providing websites such as Facebook, Twitter, etc.

SN | Application | Description
1 | Facebook | It offers social networking services. One can share photos, videos, files, status updates and much more.
2 | Twitter | It helps to interact with the public directly. One can follow any celebrity, organization or person who is on Twitter and get the latest updates regarding the same.

Entertainment Applications
SN | Application | Description
1 | Audiobox.fm | It offers a streaming service. The music files are stored online and can be played from the cloud using the service's own media player.

Art Applications
SN | Application | Description
1 | Moo | It offers art services such as designing and printing business cards, postcards and mini cards.

Content Delivery Services:
• Cloud-based content delivery services include Content Delivery Networks (CDNs).
• A CDN is a distributed system of servers located across multiple geographic locations that serves content to end users with high availability and high performance.
• CDNs are useful for serving static content such as text, images, scripts, etc., and streaming media.
• CDNs have a number of edge locations deployed in multiple locations, often over multiple backbones.
• Requests for static or streaming media content served by a CDN are directed to the nearest edge location.
• Amazon CloudFront
• Amazon CloudFront is a content delivery service from Amazon. CloudFront can be used to deliver dynamic, static and streaming content using a global network of edge locations.
• Windows Azure Content Delivery Network
• Windows Azure Content Delivery Network (CDN) is the content delivery service from Microsoft.

Analytics Services:
• Cloud-based analytics services allow analyzing massive data sets stored in the cloud, either in cloud storage or in cloud databases, using programming models such as MapReduce.
• Amazon Elastic MapReduce
• Amazon Elastic MapReduce (EMR) is the MapReduce service from Amazon, based on the Hadoop framework running on Amazon EC2 and S3.
• EMR supports various job types such as custom JAR, Hive programs, streaming jobs, Pig programs and HBase.
• Google MapReduce Service
• Google MapReduce Service is a part of the App Engine platform and can be accessed using the Google MapReduce API.
• Google BigQuery
• Google BigQuery is a service for querying massive datasets. BigQuery allows querying datasets using SQL-like queries.
• Windows Azure HDInsight
• Windows Azure HDInsight is an analytics service from Microsoft. HDInsight deploys and provisions Hadoop clusters in the Azure cloud and makes Hadoop available as a service.

Deployment & Management Services
• Cloud-based deployment & management services allow you to easily deploy and manage applications in the cloud. These services automatically handle deployment tasks such as capacity provisioning, load balancing, auto-scaling, and application health monitoring.
• Amazon Elastic Beanstalk
• Amazon provides a deployment service called Elastic Beanstalk that allows you to quickly deploy and manage applications in the AWS cloud.
• Elastic Beanstalk supports Java, PHP, .NET, Node.js, Python, and Ruby applications.
• With Elastic Beanstalk you just need to upload the application and specify configuration settings in a simple wizard, and the service automatically handles instance provisioning, server configuration, load balancing and monitoring.
• Amazon CloudFormation
• Amazon CloudFormation is a deployment management service from Amazon.
• With CloudFormation you can create deployments from a collection of AWS resources such as Amazon Elastic Compute Cloud, Amazon Elastic Block Store, Amazon Simple Notification Service, Elastic Load Balancing and Auto Scaling.
• A collection of AWS resources that you want to manage together is organized into a stack.

Identity & Access Management Services

• Identity & Access Management (IDAM) services allow managing the authentication and authorization of users to provide secure access to cloud resources.
• Using IDAM services you can manage user identifiers, user permissions, security credentials and access keys.
• Amazon Identity & Access Management
• AWS Identity and Access Management (IAM) allows you to manage users and user permissions for an AWS account.
• With IAM you can manage users, security credentials such as access keys, and permissions that control which AWS resources users can access.
• Using IAM you can control what data users can access and what resources users can create.
• IAM also allows you to control the creation, rotation, and revocation of users' security credentials.
• Windows Azure Active Directory
• Windows Azure Active Directory is an Identity & Access Management service from Microsoft.
• Azure Active Directory provides a cloud-based identity provider that easily integrates with your on-premises Active Directory deployments and also provides support for third-party identity providers.
• With Azure Active Directory you can control access to your applications in Windows Azure.
Open Source Private Cloud Software - CloudStack
• Apache CloudStack is open source cloud software that can be used for creating private cloud offerings.
• CloudStack manages the network, storage, and compute nodes that make up a cloud infrastructure.
• A CloudStack installation consists of a Management Server and the cloud infrastructure that it manages.
• Zones
• The Management Server manages one or more zones, where each zone is typically a single datacenter.
• Pods
• Each zone has one or more pods. A pod is a rack of hardware comprising a switch and one or more clusters.
• Cluster
• A cluster consists of one or more hosts and a primary storage. A host is a compute node that runs guest virtual machines.
• Primary Storage
• The primary storage of a cluster stores the disk volumes for all the virtual machines running on the hosts in that cluster.
• Secondary Storage
• Each zone has a secondary storage that stores templates, ISO images, and disk volume snapshots.

Fig: OpenStack Architecture

PART-A (2-Marks Questions)

1. Define Cloud computing.

2. Define virtualization.

3. What are cloud models?

4. What are the applications of cloud?

5. Define scalability?

6. Define load balancing.

7. What is replication?

8. What are sticky sessions?

9. What are the benefits of load balancing?

11. What are the various criteria for service level agreements?

12. What are the various layers in the cloud reference model?

13. What are the benefits of using a sandbox environment for a PaaS?

14. What is a push messaging service? What are its uses?

15. What is Google app engine?

16. What is meant by scheduler?

18. Define Billing.

PART B (10-Marks Questions)


1. What is the difference between horizontal scaling and vertical scaling? Describe scenarios in which you would use each type of scaling.
2. Explain the cloud computing service models?
3. What are the various stages in deployment lifecycle?
4. What are the various criteria for service level agreements?
5. In mapreduce, what are the functions of map, reduce and combine
tasks?
6. Describe three applications that can benefit from the mapreduce
programming model?
7. What are the various security mechanisms of cloud-storage
services?
8. What is a content delivery network?
9. Describe a real-world application that can benefit from Google BigQuery.
UNIT-II

Hadoop & MapReduce


Introduction:
Hadoop is an Apache open source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models. A Hadoop framework application works in an environment that provides distributed storage and computation across clusters of computers. Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.
Hadoop Ecosystem:
• Apache Hadoop is an open source framework for distributed batch processing of big data.
• The Hadoop ecosystem includes:
• Hadoop MapReduce
• HDFS
• YARN
• HBase
• ZooKeeper
• Pig
• Hive
• Mahout
• Chukwa
• Cassandra
• Avro
• Oozie
• Flume
• Sqoop

Fig: Hadoop Ecosystem
Apache Hadoop:
A Hadoop cluster comprises a master node, a backup node and a number of slave nodes.
• The master node runs the NameNode and JobTracker processes and the slave nodes run the DataNode and TaskTracker components of Hadoop.
• The backup node runs the Secondary NameNode process.
• NameNode
• The NameNode keeps the directory tree of all files in the file system, and tracks where across the cluster the file data is kept. It does not store the data of these files itself. Client applications talk to the NameNode whenever they wish to locate a file, or when they want to add/copy/move/delete a file.
• Secondary NameNode
• The NameNode is a single point of failure for the HDFS cluster. An optional Secondary NameNode, which is hosted on a separate machine, creates checkpoints of the namespace.
• JobTracker
• The JobTracker is the service within Hadoop that distributes MapReduce tasks to specific nodes in the cluster, ideally the nodes that have the data, or at least are in the same rack.

Fig : Components Of Hadoop Cluster


TaskTracker
• A TaskTracker is a node in a Hadoop cluster that accepts Map, Reduce and Shuffle tasks from the JobTracker.
• Each TaskTracker has a defined number of slots which indicate the number of tasks that it can accept.
• DataNode
• A DataNode stores data in an HDFS file system.
• A functional HDFS filesystem has more than one DataNode, with data replicated across them.
• DataNodes respond to requests from the NameNode for filesystem operations.
• Client applications can talk directly to a DataNode, once the NameNode has provided the location of the data.
• Similarly, MapReduce operations assigned to TaskTracker instances near a DataNode talk directly to the DataNode to access the files.
• TaskTracker instances can be deployed on the same servers that host DataNode instances, so that MapReduce operations are performed close to the data.
MapReduce:
MapReduce job consists of two phases:
• Map: In the Map phase, data is read from a distributed file system and
partitioned among a set of computing nodes in the cluster. The data is sent
to the nodes as a set of key-value pairs. The Map tasks process the input
records independently of each other and produce intermediate results as
key-value pairs. The intermediate results are stored on the local disk of the
node running the Map task.
• Reduce: When all the Map tasks are completed, the Reduce phase begins
in which the intermediate data with the same key is aggregated.
• Optional Combine Task
• An optional Combine task can be used to perform data aggregation on the intermediate data of the same key for the output of the mapper before transferring the output to the Reduce task. A word-count sketch of the Map and Reduce phases follows.
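
The sketch below shows the word-count example in Python; it imitates the Hadoop Streaming style (key-value pairs on standard input and output) rather than the native Java API, and is only meant to show the data flow between the two phases.

# Word count: Map emits (word, 1) pairs, Reduce sums the counts per word.
import sys
from itertools import groupby

def mapper(lines):
    """Map phase: emit (word, 1) for every word in the input records."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1

def reducer(pairs):
    """Reduce phase: aggregate the counts of each key (word)."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    intermediate = mapper(sys.stdin)            # Map tasks' key-value output
    for word, total in reducer(intermediate):   # shuffle/sort + Reduce
        print(f"{word}\t{total}")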

Fig : Map Reduce


MapReduce Job Execution Workflow:

MapReduce job execution starts when client applications submit jobs to the JobTracker.
• The JobTracker returns a JobID to the client application. The JobTracker talks to the NameNode to determine the location of the data.
• The JobTracker locates TaskTracker nodes with available slots at or near the data.
• The TaskTrackers send out heartbeat messages to the JobTracker, usually every few minutes, to reassure the JobTracker that they are still alive. These messages also inform the JobTracker of the number of available slots, so the JobTracker can stay up to date with where in the cluster new work can be delegated.

Fig: Hadoop MapReduce Job Execution

MapReduce 2.0 – YARN:

In Hadoop 2.0 the original processing engine of Hadoop (MapReduce) has been separated from the resource management (which is now part of YARN).
• This makes YARN effectively an operating system for Hadoop that supports different processing engines on a Hadoop cluster, such as MapReduce for batch processing, Apache Tez for interactive queries, Apache Storm for stream processing, etc.
• The YARN architecture divides the two major functions of the JobTracker, resource management and job life-cycle management, into separate components:
• ResourceManager
• ApplicationMaster

Fig: Hadoop MapReduce Next Generation Job Execution

YARN Components
• Resource Manager (RM): RM manages the global assignment of
compute resources to applications. RM consists of two main services:
• Scheduler: Scheduler is a pluggable service that manages and enforces
the resource scheduling policy in the cluster.
• Applications Manager (AsM): AsM manages the running Application
Masters in the cluster. AsM is responsible for starting application masters
and for monitoring and restarting them on different nodes in case of
failures.
• Application Master (AM): A per-application AM manages the
application’s life cycle. AM is responsible for negotiating resources from
the RM and working with the NMs to execute and monitor the tasks.
• Node Manager (NM): A per-machine NM manages the user processes on that machine.
• Containers: A container is a bundle of resources allocated by the RM (memory, CPU, network, etc.). A container is a conceptual entity that grants an application the privilege to use a certain amount of resources on a given machine to run a component task.

Hadoop Schedulers
• Hadoop scheduler is a pluggable component that makes it open to support
different scheduling algorithms.
• The default scheduler in Hadoop is FIFO.
• Two advanced schedulers are also available - the Fair Scheduler,
developed at Facebook, and the Capacity Scheduler, developed at Yahoo.
• The pluggable scheduler framework provides the flexibility to support a
variety of workloads with varying priority and performance constraints.
• Efficient job scheduling makes Hadoop a multi-tasking system that can
process multiple data sets for multiple jobs for multiple users
simultaneously.

FIFO Scheduler
• FIFO is the default scheduler in Hadoop that maintains a work queue in
which the jobs are queued.
• The scheduler pulls jobs in first in first out manner (oldest job first) for
scheduling.
• There is no concept of priority or size of job in FIFO scheduler.
Fair Scheduler
• The Fair Scheduler allocates resources evenly between multiple jobs and
also provides capacity guarantees.
• Fair Scheduler assigns resources to jobs such that each job gets an equal
share of the available resources on average over time.
• Task slots that are free are assigned to new jobs, so that each job gets roughly the same amount of CPU time.
• Job Pools
• The Fair Scheduler maintains a set of pools into which jobs are placed.
Each pool has a guaranteed capacity.
• When there is a single job running, all the resources are assigned to that
job. When there are multiple jobs in the pools, each pool gets at least as
many task slots as guaranteed.
• Each pool receives at least the minimum share.
• When a pool does not require the guaranteed share the excess capacity is
split between other jobs.

• Fairness
• The scheduler computes periodically the difference between the
computing time received by each job and the time it should have received
in ideal scheduling.
• The job which has the highest deficit of the compute time received is
scheduled next.

Capacity Scheduler
• The Capacity Scheduler has similar functionality to the Fair Scheduler but adopts a different scheduling philosophy.
• Queues
• In Capacity Scheduler, you define a number of named queues each with a
configurable number of map and reduce slots.
• Each queue is also assigned a guaranteed capacity.
• The Capacity Scheduler gives each queue its capacity when it contains
jobs, and shares any unused capacity between the queues. Within each
queue FIFO scheduling with priority is used.
• Fairness
• For fairness, it is possible to place a limit on the percentage of running
tasks per user, so that users share a cluster equally.
• A wait time for each queue can be configured. When a queue is not
scheduled for more than the wait time, it can preempt tasks of other queues
to get its fair share.

Hadoop Cluster Setup

Follow the steps given below to set up a Hadoop multi-node cluster.
Installing Java
Java is the main prerequisite for Hadoop. First of all, you should verify the existence of Java on your system using "java -version". The syntax of the java version command is given below.
$ java -version
If everything works fine it will give you the following output.
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b13)
Java HotSpot(TM) Client VM (build 25.0-b02, mixed mode)
If java is not installed in your system, then follow the given steps for installing
java.

Step 1
Download java (JDK - X64.tar.gz) by visiting the following link
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-
1880260.html Then jdk-7u71-linux-x64.tar.gz will be downloaded into your
system.

Step 2
Generally you will find the downloaded Java file in the Downloads folder. Verify it and extract the jdk-7u71-linux-x64.gz file using the following commands.
$ cd Downloads/
$ ls
jdk-7u71-Linux-x64.gz
$ tar zxf jdk-7u71-Linux-x64.gz
$ ls
jdk1.7.0_71 jdk-7u71-Linux-x64.gz
Step 3
To make java available to all the users, you have to move it to the location
“/usr/local/”. Open the root, and type the following commands.
$ su
password:
# mv jdk1.7.0_71 /usr/local/
# exit
Step 4
For setting up PATH and JAVA_HOME variables, add the following commands
to ~/.bashrc file.
export JAVA_HOME=/usr/local/jdk1.7.0_71
export PATH=$PATH:$JAVA_HOME/bin
Now verify the java -version command from the terminal as explained above.
Follow the above process and install java in all your cluster nodes.
Creating User Account
Create a system user account on both master and slave systems to use the Hadoop
installation.
# useradd hadoop
# passwd hadoop
Mapping the nodes
You have to edit hosts file in /etc/ folder on all nodes, specify the IP address of
each system followed by their host names.
# vi /etc/hosts
enter the following lines in the /etc/hosts file.
192.168.1.109 hadoop-master
192.168.1.145 hadoop-slave-1
192.168.56.1 hadoop-slave-2
Configuring Key Based Login

Setup ssh in every node such that they can communicate with one another without
any prompt for password.

# su hadoop
$ ssh-keygen -t rsa
$ ssh-copy-id -i ~/.ssh/id_rsa.pub tutorialspoint@hadoop-master
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop_tp1@hadoop-slave-1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop_tp2@hadoop-slave-2
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
Installing Hadoop
Download Hadoop and install it under /opt/hadoop using the following commands.
# mkdir /opt/hadoop
# cd /opt/hadoop/
# wget http://apache.mesi.com.ar/hadoop/common/hadoop-1.2.1/hadoop-
1.2.0.tar.gz
# tar -xzf hadoop-1.2.0.tar.gz
# mv hadoop-1.2.0 hadoop
# chown -R hadoop /opt/hadoop
# cd /opt/hadoop/hadoop/
Configuring Hadoop
You have to configure Hadoop server by making the following changes as
given below.
core-site.xml
Open the core-site.xml file and edit it as shown below.
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop-master:9000/</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>

Starting Hadoop Services
The following command is to start all the Hadoop services on the Hadoop-
Master.
$ cd $HADOOP_HOME/sbin
$ start-all.sh

Cloud Application Design

Design Considerations for Cloud Applications:
• Scalability
• Scalability is an important factor that drives application designers to move to cloud computing environments. Building applications that can serve millions of users without taking a hit on their performance has always been challenging. With the growth of cloud computing, application designers can provision adequate resources to meet their workload levels.
• Reliability & Availability
• Reliability of a system is defined as the probability that a system will perform the intended functions under stated conditions for a specified amount of time. Availability is the probability that a system will perform a specified function under given conditions at a prescribed time.
• Security
• Security is an important design consideration for cloud applications given the outsourced nature of cloud computing environments.
• Maintenance & Upgradation
• To achieve a rapid time-to-market, businesses typically launch their applications with a core set of features ready and then incrementally add new features as and when they are complete. In such scenarios, it is important to design applications with low maintenance and upgradation costs.
• Performance
• Applications should be designed while keeping the performance requirements in mind.

Reference Architectures – e-Commerce, Business-to-Business, Banking and Financial apps
• Load Balancing Tier
• The load balancing tier consists of one or more load balancers.
• Application Tier
• For this tier, it is recommended to configure auto scaling.
• Auto scaling can be triggered when the recorded values for any of the specified metrics, such as CPU usage, memory usage, etc., go above defined thresholds.
• Database Tier
• The database tier includes a master database instance and multiple slave instances.
• The master node serves all the write requests and the read requests are served from the slaves.
• This improves the throughput of the database tier, since most applications have a higher number of read requests than write requests.

Fig: Reference Architectures – e-Commerce, Business-to-Business, Banking and Financial apps

Reference Architectures –Content delivery apps
• Figure shows a typical deployment architecture for content delivery
applications such as online photo albums, video webcasting, etc.
• Both relational and non-relational data stores are shown in this
deployment.
• A content delivery network (CDN) which consists of a global network of
edge locations is used for media delivery.
• CDN is used to speed up the delivery of static content such as images and
videos.

Fig : Reference Architectures –Content delivery apps

Reference Architectures – Analytics apps

• The figure shows a typical deployment architecture for compute-intensive applications such as data analytics, media transcoding, etc.
• It comprises web, application, storage, computing/analytics and database tiers.
• The analytics tier consists of cloud-based distributed batch processing frameworks such as Hadoop, which are suitable for analyzing big data.
• Data analysis jobs (such as MapReduce jobs) are submitted to the analytics tier from the application servers.
• The jobs are queued for execution and upon completion the analyzed data is presented from the application servers.

Fig: Deployment architecture for compute-intensive applications

Service Oriented Architecture

• Service Oriented Architecture (SOA) is a well-established architectural approach for designing and developing applications in the form of services that can be shared and reused.
• SOA is a collection of discrete software modules or services that form a part of an application and collectively provide the functionality of an application.
• SOA services are developed as loosely coupled modules with no hardwired calls embedded in the services.
• The services communicate with each other by passing messages.
• Services are described using the Web Services Description Language (WSDL).
• WSDL is an XML-based web services description language that is used to create service descriptions containing information on the functions performed by a service and the inputs and outputs of the service.

Fig: WSDL concepts for representation of web services

SOA Layers
• Business Systems
• This layer consists of custom-built applications and legacy systems such as Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), Supply Chain Management (SCM), etc.
• Service Components
• The service components allow the layers above to interact with the business systems. The service components are responsible for realizing the functionality of the services exposed.
• Composite Services
• These are coarse-grained services which are composed of two or more service components. Composite services can be used to create enterprise-scale components or business-unit-specific components.
• Orchestrated Business Processes
• Composite services can be orchestrated to create higher-level business processes. In this layer the compositions and orchestrations of the composite services are defined to create business processes.
• Presentation Services
• This is the topmost layer that includes user interfaces that expose the services and the orchestrated business processes to the users.
• Enterprise Service Bus
• This layer integrates the services through adapters, routing, transformation and messaging mechanisms.

Fig: Layers of service-oriented architecture

Cloud Component Model


• Cloud Component Model is an application design methodology that
provides a flexible way of creating cloud applications in a rapid, convenient
and platform independent manner.

• CCM is an architectural approach for cloud applications that is not tied to
any specific programming language or cloud platform.
• Cloud applications designed with CCM approach can have innovative
hybrid deployments in which different components of an application can be
deployed on cloud infrastructure and platforms of different cloud vendors.
• Applications designed using CCM have better portability and
interoperability.
• CCM based applications have better scalability by decoupling application
components and providing asynchronous communication mechanisms.
CCM Application Design Methodology
• CCM approach for application design involves:
• Component Design
• Architecture Design
• Deployment Design

Fig : CCM Model Methodology

CCM Component Design:
• Cloud Component Model is created for the application based on
comprehensive analysis of the application’s functions and building blocks.
• Cloud component model allows identifying the building blocks of a cloud
application which are classified based on the functions performed and type
of cloud resources required.
• Each building block performs a set of actions to produce the desired
outputs for other components.
• Each component takes specific inputs, performs a predefined set of
actions and produces the desired outputs.
• Components offer their functions as services through a functional
interface which can be used by other components.
• Components report their performance to a performance database through a
performance interface.

Fig: CCM map for an e-Commerce application

CCM Architecture Design

• In Architecture Design step, interactions between the application


components are defined.
• CCM components have the following characteristics:
• Loose Coupling
• Components in the Cloud Component Model are loosely coupled.
• Asynchronous Communication
• By allowing asynchronous communication between components, it
is possible to add capacity by adding additional servers when the
application load increases. Asynchronous communication is made possible
by using messaging queues.
• Stateless Design
• Components in the Cloud Component Model are stateless. By storing
session state outside of the component (e.g. in a database), stateless
component design enables distribution and horizontal scaling.

Fig: Architecture design of an e-Commerce application.

CCM Deployment Design


• In Deployment Design step, application components are mapped to
specific cloud resources such as web servers, application servers, database
servers, etc.
• Since the application components are designed to be loosely coupled and
stateless with asynchronous communication, components can be deployed
independently of each other.
• This approach makes it easy to migrate application components from one
cloud to the other.

• With this flexibility in application design and deployment, the application
developers can ensure that the applications meet the performance and cost
requirements with changing contexts.

Fig: Deployment design of an e-Commerce application

SOA vs CCM
Similarities

• Standardization & Re-use: SOA advocates principles of reuse and well defined relationships between service provider and service consumer. CCM is based on reusable components which can be used by multiple cloud applications.
• Loose coupling: SOA is based on loosely coupled services that minimize dependencies. CCM is based on loosely coupled components that communicate asynchronously.
• Statelessness: SOA services minimize resource consumption by deferring the management of state information. CCM components are stateless; state is stored outside of the components.

Model View Controller

• Model View Controller (MVC) is a popular software design pattern


for web applications.
• Model
• Model manages the data and the behavior of the applications. Model
processes events sent by the controller. Model has no information about the
views and controllers. Model responds to the requests for information about
its state (from the view) and responds to the instructions to change state
(from controller).
• View
• View prepares the interface which is shown to the user. Users interact
with the application through views. Views present the information that the
model or controller tell the view to present to the user, and also handle user
requests and send them to the controller.
• Controller
• Controller glues the model to the view. Controller processes user requests
and updates the model when the user manipulates the view. Controller also
updates the view when the model changes.

Fig : Model view controller
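
The following is a minimal sketch of the MVC pattern in plain Python (the class names are hypothetical and not tied to any web framework); it only illustrates how the three roles are separated.

Eg:
class StudentModel(object):
    #Model: manages the data; knows nothing about views or controllers
    def __init__(self):
        self.students = {}
    def add(self, student_id, name):
        self.students[student_id] = name
    def get_all(self):
        return self.students

class StudentView(object):
    #View: prepares the output that is shown to the user
    def render(self, students):
        for student_id, name in students.items():
            print "%s: %s" % (student_id, name)

class StudentController(object):
    #Controller: glues the model to the view and processes user requests
    def __init__(self, model, view):
        self.model = model
        self.view = view
    def add_student(self, student_id, name):
        self.model.add(student_id, name)
    def show_students(self):
        self.view.render(self.model.get_all())

controller = StudentController(StudentModel(), StudentView())
controller.add_student('1', 'Mary')
controller.add_student('2', 'David')
controller.show_students()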

RESTful Web Services


• Representational State Transfer (REST) is a set of architectural principles
by which you can design web services and web APIs that focus on a
system’s resources and how resource states are addressed and transferred.
• The REST architectural constraints apply to the components, connectors,
and data elements, within a distributed hypermedia system.
• A RESTful web service is a web API implemented using HTTP and
REST principles.
• The REST architectural constraints are as follows:
• Client-Server
• Stateless
• Cacheable
• Layered System
• Uniform Interface
• Code on demand
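
As an illustration, a minimal sketch of a RESTful web service using only the Python 2 standard library (BaseHTTPServer) is shown below; the resource name and data are hypothetical, and a real deployment would typically use a web framework instead.

Eg:
import json
from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer

DEVICES = {'1': {'name': 'router-1', 'status': 'active'}}

class RESTHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        #GET /devices returns the collection resource as JSON
        if self.path == '/devices':
            self.send_response(200)
            self.send_header('Content-Type', 'application/json')
            self.end_headers()
            self.wfile.write(json.dumps(DEVICES))
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == '__main__':
    HTTPServer(('', 8080), RESTHandler).serve_forever()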

Data Storage Approaches


Relational Databases
• A relational database is a database that conforms to the relational model that
was popularized by Edgar Codd in 1970.
• The 12 rules that Codd introduced for relational databases include:
• Information rule
• Guaranteed access rule
• Systematic treatment of null values
• Dynamic online catalog based on relational model
• Comprehensive sub-language rule
• View updating rule
• High level insert, update, delete
• Physical data independence
• Logical data independence
• Integrity independence
• Distribution independence
• Non-subversion rule

• Relations
• A relational database has a collection of relations (or tables). A relation is
a set of tuples (or rows).
• Schema
• Each relation has a fixed schema that defines the set of attributes (or
columns in a table) and the constraints on the attributes.
• Tuples
• Each tuple in a relation has the same attributes (columns). The tuples in a
relation can have any order and the relation is not sensitive to the ordering
of the tuples.
• Attributes
• Each attribute has a domain, which is the set of possible values for the
attribute.
• Insert/Update/Delete
• Relations can be modified using insert, update and delete operations.
Every relation has a primary key that uniquely identifies each tuple in the
relation.
• Primary Key
• An attribute can be made a primary key if it does not have repeated values
in different tuples.
ACID Guarantees
• Relational databases provide ACID guarantees.
• Atomicity
• Atomicity property ensures that each transaction is either “all or nothing”.
An atomic transaction ensures that all parts of the transaction complete or
the database state is left unchanged.
• Consistency
• Consistency property ensures that each transaction brings the database
from one valid state to another. In other words, the data in a database
always conforms to the defined schema and constraints.

• Isolation
• Isolation property ensures that the database state obtained after a set of
concurrent transactions is the same as would have been if the transactions
were executed serially. This provides concurrency control, i.e. the results of
incomplete transactions are not visible to other transactions. The
transactions are isolated from each other until they finish.
• Durability
• Durability property ensures that once a transaction is committed, the data
remains as it is, i.e. it is not affected by system outages such as power loss.
Durability guarantees that the database can keep track of changes and can
recover from abnormal terminations.
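
A small sketch of the atomicity property using the sqlite3 module from the Python standard library is shown below (the table and values are made up): either both updates of the transfer commit, or the database state is left unchanged.

Eg:
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)')
conn.execute("INSERT INTO accounts VALUES ('alice', 100.0)")
conn.execute("INSERT INTO accounts VALUES ('bob', 50.0)")
conn.commit()

try:
    #Transfer 30 from alice to bob as a single transaction
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'bob'")
    conn.commit()
except sqlite3.Error:
    conn.rollback()  #on failure the database state is left unchanged

print conn.execute('SELECT name, balance FROM accounts').fetchall()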
Non-Relational Databases
• Non-relational databases (or popularly called No-SQL databases) are
becoming popular with the growth of cloud computing.
• Non-relational databases have better horizontal scaling capability and
improved performance for big data at the cost of less rigorous consistency
models.
• Unlike relational databases, non-relational databases do not provide ACID
guarantees.
• Most non-relational databases offer “eventual” consistency, which means
that given a sufficiently long period of time over which no updates are
made, all updates can be expected to propagate eventually through the
system and the replicas will be consistent.
• The driving force behind the non-relational databases is the need for
databases that can achieve high scalability, fault tolerance and availability.
• These databases can be distributed on a large cluster of machines. Fault
tolerance is provided by storing multiple replicas of data on different
machines.
Non-Relational Databases – Types
• Key-value store
• Key-value store databases are suited for applications that require storing
unstructured data without a fixed schema. Most key-value stores have
support for native programming language data types.
• Document store
• Document store databases store semi-structured data in the form of
documents which are encoded in different
standards such as JSON, XML, BSON, YAML, etc.

• Graph store
• Graph stores are designed for storing data that has graph structure (nodes
and edges). These solutions are suitable for applications that involve graph
data such as social networks, transportation systems, etc.
• Object store
• Object store solutions are designed for storing data in the form of objects
defined in an object-oriented programming language.
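
As a plain-Python illustration (not tied to any particular database product), the same record can be seen as an opaque value in a key-value store or as a semi-structured JSON document in a document store:

Eg:
import json

#Key-value store view: an opaque value addressed only by its key
kv_store = {}
kv_store['user:1001'] = 'Mary,CS,3.8'

#Document store view: a semi-structured JSON document, queryable by its fields
document = json.dumps({'id': 1001, 'name': 'Mary', 'major': 'CS', 'grade': 3.8})
print json.loads(document)['name']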
Python Basics
Python:
Python is a general-purpose high-level programming language that is suitable
for providing a solid foundation to the reader in the area of cloud
computing.
• The main characteristics of Python are:
• Multi-paradigm programming language
• Python supports more than one programming paradigms including object-
oriented programming and structured programming
• Interpreted Language
• Python is an interpreted language and does not require an explicit
compilation step. The Python interpreter executes the program source code
directly, statement by statement, as a processor or scripting engine does.
• Interactive Language
• Python provides an interactive mode in which the user can submit
commands at the Python prompt and interact with the interpreter directly.
Python – Benefits
• Easy-to-learn, read and maintain
• Python is a minimalistic language with relatively few keywords, uses
English keywords and has fewer syntactical constructions as compared to
other languages. Reading Python programs feels like English with pseudo-
code like constructs. Python is easy to learn yet an extremely powerful
language for a wide range of applications.
• Object and Procedure Oriented
• Python supports both procedure-oriented programming and object-
oriented programming. Procedure oriented paradigm allows programs to be
written around procedures or functions that allow reuse of code. Object
oriented paradigm allows programs to be written around objects that
include both data and functionality.

• Extendable
• Python is an extendable language and allows integration of low-level
modules written in languages such as C/C++. This is useful when you want
to speed up a critical portion of a program.
• Scalable
• Due to the minimalistic nature of Python, it provides a manageable
structure for large programs.
• Portable
• Since Python is an interpreted language, programmers do not have to
worry about compilation, linking and loading of programs. Python
programs can be directly executed from source.
• Broad Library Support
• Python has a broad library support and works on various platforms such as
Windows,Linux, Mac, etc.

Python – Setup
• Windows
• Python binaries for Windows can be downloaded from
http://www.python.org/getit.
• For the examples and exercise in this book, you would require Python 2.7
which can be directly downloaded from:
http://www.python.org/ftp/python/2.7.5/python-2.7.5.msi
• Once the python binary is installed you can run the python shell at the
command prompt using > python
• Linux
#Install Dependencies
sudo apt-get install build-essential
sudo apt-get install libreadline-gplv2-dev libncursesw5-dev libssl-dev
libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev
#Download Python
wget http://python.org/ftp/python/2.7.5/Python-2.7.5.tgz
tar -xvf Python-2.7.5.tgz
cd Python-2.7.5
#Install Python
./configure
make
sudo make install

Python Data Types
The Python data types are as follows:
 Numbers
 Strings
 Lists
 Tuples
 Dictionaries
 Type Conversions
Numbers
 Number data type is used to store numeric values. Numbers are
immutable data types, therefore changing the value of a number data
type results in a newly allocated object.

Eg1.

#Integer
>>>a=5
>>>type(a)
<type ’int’>
#Floating Point
>>>b=2.5
>>>type(b)
<type ’float’>
#Long
>>>x=9898878787676L
>>>type(x)
<type ’long’>
#Complex
>>>y=2+5j
>>>y
(2+5j)
>>>type(y)
<type ’complex’>
>>>y.real
2.0
>>>y.imag
5.0

Eg2.

#Addition
>>>c=a+b
>>>c
7.5
>>>type(c)
<type ’float’>
#Subtraction
>>>d=a-b
>>>d
2.5
>>>type(d)
<type ’float’>
#Multiplication
>>>e=a*b
>>>e
12.5
>>>type(e)
<type ’float’>

Eg3:

#Division
>>>f=b/a
>>>f
0.5
>>>type(f)
<type float’>
#Power
>>>g=a**2
>>>g
25
Strings
• A string is simply a list of characters in order. There are no limits to the
number of characters you can have in a string.

Eg1:

#Create string
>>>s="Hello World!"
>>>type(s)
<type ’str’>
#String concatenation
>>>t="This is sample program."
>>>r = s+t
>>>r
’Hello World!This is sample program.’
#Get length of string
>>>len(s)
12
#Convert string to integer
>>>x="100"
>>>type(s)
<type ’str’>
>>>y=int(x)
>>>y
100

Eg2:
#Print string
>>>print s
Hello World!
#Formatting output
>>>print "The string (%s) has %d characters" % (s, len(s))
The string (Hello World!) has 12 characters
#Convert to upper/lower case
>>>s.upper()
’HELLO WORLD!’
>>>s.lower()
’hello world!’
#Accessing sub-strings
>>>s[0]
’H’
>>>s[6:]
’World!’
>>>s[6:-1]
’World’

Lists
• A list is a compound data type used to group together other values. List items
need not all have the same type. A list contains items separated by commas
and enclosed within square brackets.

Eg1:
Create List
>>>fruits=[’apple’,’orange’,’banana’,’mango’]
>>>type(fruits)
<type ’list’>
#Get Length of List
>>>len(fruits)
4
#Access List Elements
>>>fruits[1]
’orange’
>>>fruits[1:3]
[’orange’, ’banana’]
>>>fruits[1:]
[’orange’, ’banana’, ’mango’]
#Appending an item to a list
>>>fruits.append(’pear’)
>>>fruits
[’apple’, ’orange’, ’banana’, ’mango’, ’pear’]
Eg2:
>>>fruits.remove(’mango’)
>>>fruits
[’apple’, ’orange’, ’banana’, ’pear’]
#Inserting an item to a list
>>>fruits.insert(1,’mango’)
>>>fruits
[’apple’, ’mango’, ’orange’, ’banana’, ’pear’]
#Combining lists
>>>vegetables=[’potato’,’carrot’,’onion’,’beans’,’radish’]
>>>vegetables
[’potato’, ’carrot’, ’onion’, ’beans’, ’radish’]
>>>eatables=fruits+vegetables
>>>eatables
[’apple’, ’mango’, ’orange’, ’banana’, ’pear’, ’potato’, ’carrot’, ’onion’, ’beans’, ’radish’]
Tuples
• A tuple is a sequence data type that is similar to the list. A tuple consists
of a number of values separated by commas and enclosed within
parentheses. Unlike lists, the elements of tuples cannot be changed, so
tuples can be thought of as read-only lists.

Eg1:

#Create a Tuple
>>>fruits=("apple","mango","banana","pineapple")
>>>fruits
(’apple’, ’mango’, ’banana’, ’pineapple’)
>>>type(fruits)
<type ’tuple’>
#Get length of tuple
>>>len(fruits)
4

Eg2:

#Get an element from a tuple


>>>fruits[0]
’apple’
>>>fruits[:2]
(’apple’, ’mango’)
#Combining tuples
>>>vegetables=(’potato’,’carrot’,’onion’,’radish’)
>>>eatables=fruits+vegetables
>>>eatables
(’apple’, ’mango’, ’banana’, ’pineapple’, ’potato’, ’carrot’, ’onion’,
’radish’)

Dictionaries
• Dictionary is a mapping data type or a kind of hash table that maps keys
to values. Keys in a dictionary can be of any data type, though numbers and
strings are commonly used for keys. Values in a dictionary can be any data
type or object.

Eg1:

#Create a dictionary
>>>student={’name’:’Mary’,’id’:’8776’,’major’:’CS’}
>>>student
{’major’: ’CS’, ’name’: ’Mary’, ’id’: ’8776’}
>>>type(student)
<type ’dict’>
#Get length of a dictionary
>>>len(student)
3
#Get the value of a key in dictionary
>>>student[’name’]
’Mary’
#Get all items in a dictionary
>>>student.items()
[(’major’, ’CS’), (’name’, ’Mary’), (’id’, ’8776’)]

Eg2:

#Get all keys in a dictionary


>>>student.keys()
[’major’, ’name’, ’id’]
#Get all values in a dictionary
>>>student.values()
[’CS’, ’Mary’, ’8776’]
#Add new key-value pair
>>>student[’gender’]=’female’
>>>student
{’gender’: ’female’, ’major’: ’CS’, ’name’: ’Mary’, ’id’: ’8776’}
#A value in a dictionary can be another dictionary
>>>student1={’name’:’David’,’id’:’9876’,’major’:’ECE’}
>>>students={’1’: student,’2’:student1}
>>>students
{’1’: {’gender’: ’female’, ’major’: ’CS’, ’name’: ’Mary’, ’id’: ’8776’}, ’2’: {’major’: ’ECE’, ’name’: ’David’, ’id’: ’9876’}}

Type conversion
Using Python, we can easily convert data into different types. There are
different functions for Type Conversion. We can convert string type objects
to numeric values, perform conversion between different container types
etc.

Eg1:

#Convert to string
>>>a=10000
>>>str(a)
’10000’
#Convert to int
>>>b="2013"
>>>int(b)
2013
#Convert to float
>>>float(b)
2013.0

Eg2:

>>>long(b)
2013L
#Convert to list
>>>s="aeiou"
>>>list(s)
[’a’, ’e’, ’i’, ’o’, ’u’]
#Convert to set
>>>x=[’mango’,’apple’,’banana’,’mango’,’banana’]
>>>set(x)
set([’mango’, ’apple’, ’banana’])

Control Flow – if statement


The if statement in Python is similar to the if statement in other languages.

Eg1:
>>>a = 25**5
>>>if a>10000:
print "More"
else:
print "Less"
More

Eg2:
>>>if a>10000:
if a<1000000:
print "Between 10k and 100k"
else:
print "More than 100k"
elif a==10000:
print "Equal to 10k"
else:
print "Less than 10k"
More than 100k

Control Flow – for statement


• The for statement in Python iterates over items of any sequence (list,
string, etc.) in the order in which they appear in the sequence.
• This behavior is different from the for statement in other languages such
as C in which an initialization, incrementing and stopping criteria are
provided.

Eg1:
#Looping over characters in a string
helloString = "Hello World"
for c in helloString:
print c

Eg2:
#Looping over items in a list
fruits=[’apple’,’orange’,’banana’,’mango’]
i=0
for item in fruits:
print "Fruit-%d: %s" % (i,item)
i=i+1

Eg3:
#Looping over keys in a dictionary
student = {’name’: ’Mary’, ’id’: ’8776’, ’gender’: ’female’, ’major’: ’CS’}
for key in student:
    print "%s: %s" % (key, student[key])

Control Flow – while statement


• The while statement in Python executes the statements within the while
loop as long as the while condition is true.
Eg1:
Prints even numbers upto 100
>>> i = 0
>>> while i<=100:
if i%2 == 0:
print i
i = i+1
Control Flow – range statement
The range statement in Python generates a list of numbers in arithmetic
progression.

Eg1:
#Generate a list of numbers from 0 – 9
>>>range (10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Eg2:
#Generate a list of numbers from 10 - 100 with increments of 10
>>>range(10,110,10)
[10, 20, 30, 40, 50, 60, 70, 80, 90,100]

Control Flow – break/continue statements


The break and continue statements in Python are similar to the statements in
C.
Break
• Break statement breaks out of the for/while loop
Continue
• Continue statement continues with the next iteration.
Eg1:
Break statement example
>>>y=1
>>>for x in range(4,256,4):
    y=y*x
    if y > 512:
        break
    print y
4
32
384
Eg2:

#Continue statement example


>>>fruits=[’apple’,’orange’,’banana’,’mango’]
>>>for item in fruits:
if item == "banana":
continue
else:
print item
apple
orange
mango
Control Flow – pass statement
• The pass statement in Python is a null operation.
• The pass statement is used when a statement is required syntactically but
you do not want any command or code to execute.
Eg1:
>fruits=[’apple’,’orange’,’banana’,’mango’]
>for item in fruits:
if item == "banana":
pass
else:
print item
apple
orange
mango
Functions
• A function is a block of code that takes information in (in the form of
parameters), does some computation, and returns a new piece of
information based on the parameter information.
• A function in Python is a block of code that begins with the keyword def
followed by the function name and parentheses. The function parameters
are enclosed within the parenthesis.
• The code block within a function begins after a colon that comes after the
parenthesis enclosing the parameters.
• The first statement of the function body can optionally be a
documentation string or docstring.

Eg1:
students = { '1': {'name': 'Bob', 'grade': 2.5},
'2': {'name': 'Mary', 'grade': 3.5},
'3': {'name': 'David', 'grade': 4.2},
'4': {'name': 'John', 'grade': 4.1},
'5': {'name': 'Alex', 'grade': 3.8}}
def averageGrade(students):
    "This function computes the average grade"
    sum = 0.0
    for key in students:
        sum = sum + students[key]['grade']
    average = sum/len(students)
    return average

avg = averageGrade(students)
print "The average grade is: %0.2f" % (avg)

Functions - Default Arguments


• Functions can have default values of the parameters.
• If a function with default values is called with fewer parameters or
without any parameter, the default values of the parameters are used.

Eg1:

>>>def displayFruits(fruits=[’apple’,’orange’]):
    print "There are %d fruits in the list" % (len(fruits))
    for item in fruits:
        print item

#Using default arguments
>>>displayFruits()
There are 2 fruits in the list
apple
orange
>>>fruits = [’banana’, ’pear’, ’mango’]
>>>displayFruits(fruits)
There are 3 fruits in the list
banana
pear
mango

Functions - Passing by Reference


• All parameters in the Python functions are passed by reference.
• If a parameter is changed within a function the change also reflected back
in the calling function.

Eg1:

>>>def displayFruits(fruits):
    print "There are %d fruits in the list" % (len(fruits))
    for item in fruits:
        print item
    print "Adding one more fruit"
    fruits.append('mango')

>>>fruits = ['banana', 'pear', 'apple']
>>>displayFruits(fruits)
There are 3 fruits in the list
banana
pear
apple
Adding one more fruit
#The change made inside the function is visible in the calling context
>>>print "There are %d fruits in the list" % (len(fruits))
There are 4 fruits in the list
Functions - Keyword Arguments
• Functions can also be called using keyword arguments that identifies the
arguments by the parameter name when the function is called.

Eg1:
>>>def printStudentRecords(name,age=20,major=’CS’):
    print "Name: " + name
    print "Age: " + str(age)
    print "Major: " + major
#This will give error as name is required argument
>>>printStudentRecords()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: printStudentRecords() takes at least 1
argument (0 given)

Eg2:
#Correct use
>>>printStudentRecords(name=’Alex’)
Name: Alex
Age: 20
Major: CS
>>>printStudentRecords(name=’Bob’,age=22,major=’EC
E’)
Name: Bob
Age: 22
Major: ECE
>>>printStudentRecords(name=’Alan’,major=’ECE’)
Name: Alan
Age: 20
Major: ECE

Eg3:
#name is a formal argument.
#**kwargs is a keyword argument that receives all arguments except the
#formal argument as a dictionary.
>>>def student(name, **kwargs):
    print "Student Name: " + name
    for key in kwargs:
        print key + ’: ’ + kwargs[key]
>>>student(name=’Bob’, age=’20’, major = ’CS’)
Student Name: Bob
age: 20
major: CS

Functions - Variable Length Arguments


• Python functions can have variable length arguments. The variable length
arguments are passed to as a tuple to the function with an argument
prefixed with asterix (*).

Eg1:

>>>def student(name, *varargs):


print "Student Name: " + name
for item in varargs:
print item
>>>student(’Nav’)
Student Name: Nav
>>>student(’Amy’, ’Age: 24’)
Student Name: Amy
Age: 24
>>>student(’Bob’, ’Age: 20’, ’Major: CS’)
Student Name: Bob
Age: 20
Major: CS

Modules
• Python allows organizing the program code into different modules which
improves the code readability and management.
• A module is a Python file that defines some functionality in the form of
functions or classes.
• Modules can be imported using the import keyword.
• Modules to be imported must be present in the search path.
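
The example below imports a module named student whose source is not listed in these notes; the following is a plausible sketch (an assumption, for illustration only) of what such a student.py module could contain.

#student.py (hypothetical module used by the example below)
def printRecords(students):
    print "There are %d students" % (len(students))
    i = 1
    for key in students:
        print "Student-%d:" % (i)
        print "Name: " + students[key]['name']
        print "Grade: " + str(students[key]['grade'])
        i = i + 1

def averageGrade(students):
    sum = 0.0
    for key in students:
        sum = sum + students[key]['grade']
    return sum/len(students)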

Eg1:

#Using student module


>>>import student
>>>students = {'1': {'name': 'Bob', 'grade': 2.5},
               '2': {'name': 'Mary', 'grade': 3.5},
               '3': {'name': 'David', 'grade': 4.2},
               '4': {'name': 'John', 'grade': 4.1},
               '5': {'name': 'Alex', 'grade': 3.8}}
>>>student.printRecords(students)
There are 5 students
Student-1:
Name: Bob
Grade: 2.5
Student-2:
Name: David
Grade: 4.2
Student-3:
Name: Mary
Grade: 3.5
Student-4:
Name: Alex
Grade: 3.8
Student-5:
Name: John
Grade: 4.1
>>>avg = student.averageGrade(students)
>>>print "The average grade is: %0.2f" % (avg)
3.62
Packages
• Python package is hierarchical file structure that consists of modules and
subpackages.
• Packages allow better organization of modules related to a single
application environment.

Eg1:
# skimage package listing
skimage/ Top level package
__init__.py Treat directory as a package
color/                  color subpackage
__init__.py
colorconv.py
colorlabel.py
rgb_colors.py
draw/                   draw subpackage
__init__.py
draw.py
setup.py
exposure/ exposure subpackage
__init__.py
_adapthist.py
exposure.py
feature/ feature subpackage
__init__.py
_brief.py
_daisy.py

File Handling
• Python allows reading and writing to files using the file object.
• The open(filename, mode) function is used to get a file object.
• The mode can be read (r), write (w), append (a), read and write (r+ or
w+), read-binary (rb), write-binary (wb), etc.
• After the file contents have been read the close function is called which
closes the file object.

Eg1:
# Example of reading an entire file
>>>fp = open('file.txt','r')
>>>content = fp.read()
>>>print content
This is a test file.
>>>fp.close()

Eg2:
# Example of reading line by line
>>>fp = open('file1.txt','r')
>>>print "Line-1: " + fp.readline()
Line-1: Python supports more than one programming paradigms.
>>>print "Line-2: " + fp.readline()
Line-2: Python is an interpreted language.
>>>fp.close()
Eg3:
# Example of writing to a file
>>>fo = open('file1.txt','w')
>>>content='This is an example of writing to a file in
Python.'
>>>fo.write(content)
>>>fo.close()

Date/Time Operations
• Python provides several functions for date and time access and
conversions.
• The datetime module allows manipulating date and time in several ways.
• The time module in Python provides various time-related functions.

Eg1:

# Examples of manipulating with date


>>>from datetime import date
>>>now = date.today()
>>>print "Date: " + now.strftime("%m-%d-%y")
Date: 07-24-13
>>>print "Day of Week: " + now.strftime("%A")
Day of Week: Wednesday
>>>print "Month: " + now.strftime("%B")
Month: July
>>>then = date(2013, 6, 7)
>>>timediff = now - then
>>>timediff.days
47

Eg2:

# Examples of manipulating with time


>>>import time
>>>nowtime = time.time()
>>>time.localtime(nowtime)
time.struct_time(tm_year=2013, tm_mon=7, tm_mday=24, tm_hour=16, tm_min=14,
tm_sec=51, tm_wday=2, tm_yday=205,
tm_isdst=0)
>>>time.asctime(time.localtime(nowtime))
'Wed Jul 24 16:14:51 2013'
>>>time.strftime("The date is %d-%m-%y. Today is a %A. It is %H hours,
%M minutes and %S seconds now.")
'The date is 24-07-13. Today is a Wednesday. It is 16 hours, 15 minutes and
14 seconds now.'

Classes
• Python is an Object-Oriented Programming (OOP) language. Python
provides all the standard features of Object Oriented Programming such as
classes, class variables, class methods, inheritance, function overloading,
and operator overloading.
Class
• A class is simply a representation of a type of object and user-defined
prototype for an object that is composed of three things: a name, attributes,
and operations/methods.
Instance/Object
• Object is an instance of the data structure defined by a class.
Inheritance
• Inheritance is the process of forming a new class from an existing class or
base class.
Function overloading
• Function overloading is a form of polymorphism that allows a function to
have different meanings, depending on its context.
Operator overloading
• Operator overloading is a form of polymorphism that allows assignment
of more than one function to a particular operator.

Function overriding
• Function overriding allows a child class to provide a specific
implementation of a function that is already provided by the base class.
Child class implementation of the overridden function has the same name,
parameters and return type as the function in the base class.
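
A short, hypothetical sketch of operator overloading and function overriding is shown below.

Eg:
class Vector(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __add__(self, other):
        #Operator overloading: '+' now works for Vector objects
        return Vector(self.x + other.x, self.y + other.y)
    def describe(self):
        return "Vector(%d, %d)" % (self.x, self.y)

class NamedVector(Vector):
    def __init__(self, x, y, name):
        Vector.__init__(self, x, y)
        self.name = name
    def describe(self):
        #Function overriding: same name and parameters as in the base class
        return self.name + ": " + Vector.describe(self)

>>>v = Vector(1, 2) + Vector(3, 4)
>>>print v.describe()
Vector(4, 6)
>>>print NamedVector(1, 1, 'unit').describe()
unit: Vector(1, 1)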

Class Example
The variable studentCount is a class variable that is shared by all instances
of the class
Student and is accessed by Student.studentCount.
• The variables name, id and grades are instance variables which are
specific to each instance of the class.
• There is a special method by the name __init__() which is the class
constructor.
• The class constructor initializes a new instance when it is created. The
function __del__() is the class destructor.

Eg1:
# Examples of a class
class Student:
    studentCount = 0

    def __init__(self, name, id):
        print "Constructor called"
        self.name = name
        self.id = id
        Student.studentCount = Student.studentCount + 1
        self.grades={}

    def __del__(self):
        print "Destructor called"

    def getStudentCount(self):
        return Student.studentCount

    def addGrade(self,key,value):
        self.grades[key]=value

    def getGrade(self,key):
        return self.grades[key]

    def printGrades(self):
        for key in self.grades:
            print key + ": " + self.grades[key]
>>>s = Student(’Steve’,’98928’)
Constructor called
>>>s.addGrade(’Math’,’90’)
>>>s.addGrade(’Physics’,’85’)
>>>s.printGrades()
Physics: 85
Math: 90
>>>mathgrade = s.getGrade(’Math’)
>>>print mathgrade
90
>>>count = s.getStudentCount()
>>>print count
1
>>>del s
Destructor called

Class Inheritance
• In this example Shape is the base class and Circle is the derived class. The
class Circle inherits the attributes of the Shape class.
• The child class Circle overrides the methods and attributes of the base
class (eg. draw() function defined in the base class Shape is overridden in
child class Circle).

Eg1:
# Examples of class inheritance
class Shape:
    def __init__(self):
        print "Base class constructor"
        self.color = ’Green’
        self.lineWeight = 10.0
    def draw(self):
        print "Draw - to be implemented"
    def setColor(self, c):
        self.color = c
    def getColor(self):
        return self.color
    def setLineWeight(self,lwt):
        self.lineWeight = lwt
    def getLineWeight(self):
        return self.lineWeight

class Circle(Shape):
    def __init__(self, c,r):
        print "Child class constructor"
        self.center = c
        self.radius = r
        self.color = ’Green’
        self.lineWeight = 10.0
        self.__label = ’Hidden circle label’
    def setCenter(self,c):
        self.center = c
    def getCenter(self):
        return self.center
    def setRadius(self,r):
        self.radius = r
    def getRadius(self):
        return self.radius
    def draw(self):
        print "Draw Circle (overridden function)"

class Point:
    def __init__(self, x, y):
        self.xCoordinate = x
        self.yCoordinate = y
    def setXCoordinate(self,x):
        self.xCoordinate = x
    def getXCoordinate(self):
        return self.xCoordinate
    def setYCoordinate(self,y):
        self.yCoordinate = y
    def getYCoordinate(self):
        return self.yCoordinate
>>>p = Point(2,4)
>>>circ = Circle(p,7)
Child class constructor
>>>circ.getColor()
’Green’
>>>circ.setColor(’Red’)
>>>circ.getColor()
’Red’
>>>circ.getLineWeight()
10.0
>>>circ.getCenter().getXCoordinate()
2
>>>circ.getCenter().getYCoordinate()
4
>>>circ.draw()
Draw Circle (overridden function)
>>>circ.radius
7

PART-A (2-Marks Questions)


1. Define- Hadoop Scheduler.

2. What are the key components of YARN?

3. What are the Design Considerations for Cloud Applications?

4. What are the ACID properties of relational databases?

5. What are the main characteristics of Python?

6. What are the advantages of Python?

7. List the data types of Python?

8. Define list in Python programming.

9. Define Module.

10. Define class.

PART-B (10-Mark Questions)

1. Explain the architecture of MapReduce in Hadoop?

2. Explain the dataflow and control flow of MapReduce

3. Define Control flow statement with an example.

4. Write an algorithm to accept two numbers, compute the sum and print
the result. Evaluate.

5.Explain Reference Architecture for Cloud Applications.

6. Explain Data Storage Approaches.

7. Explain Python, Python data Types with an example.

8.Explain file handling in Python.

9.Explain Date/Time Operations.

10.Explain Installing Python

UNIT-III

Python for Cloud

Python For Amazon Web Services

Amazon EC2 – Python Example


• Boto is a Python package that provides interfaces to Amazon Web
Services (AWS)
• In this example, a connection to EC2 service is first established by calling
boto.ec2.connect_to_region.
• The EC2 region, AWS access key and AWS secret key are passed to this
function. After connecting to EC2, a new instance is launched using
the conn.run_instances function.
• The AMI-ID, instance type, EC2 key handle and security group are passed
to this function.

Eg:-

#Python program for launching an EC2 instance


import boto.ec2
from time import sleep
ACCESS_KEY="<enter access key>"
SECRET_KEY="<enter secret key>"
REGION="us-east-1"
AMI_ID = "ami-d0f89fb9"
EC2_KEY_HANDLE = "<enter key handle>"
INSTANCE_TYPE="t1.micro"
SECGROUP_HANDLE="default"
conn = boto.ec2.connect_to_region(REGION,
aws_access_key_id=ACCESS_KEY,
aws_secret_access_key=SECRET_KEY)
reservation = conn.run_instances(image_id=AMI_ID,
key_name=EC2_KEY_HANDLE,
instance_type=INSTANCE_TYPE,
security_groups = [ SECGROUP_HANDLE, ] )
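
As a follow-up sketch using the same connection (this assumes the boto EC2 instance attributes id, state, update() and public_dns_name), the program can wait until the launched instance reaches the 'running' state:

instance = reservation.instances[0]
print "Launched instance with id: " + instance.id
while instance.state != 'running':
    sleep(10)
    instance.update()  #refresh the instance state from EC2
print "Instance is running, public DNS: " + str(instance.public_dns_name)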

Amazon AutoScaling – Python Example
• AutoScaling Service
• A connection to AutoScaling service is first established by calling
boto.ec2.autoscale.connect_to_region function.
• Launch Configuration
• After connecting to the AutoScaling service, a new launch configuration is
created by calling conn.create_launch_configuration. A launch configuration
contains instructions on how to launch new instances including the AMI-ID,
instance type, security groups, etc.
• AutoScaling Group
• After creating a launch configuration, it is then associated with a new
AutoScaling group. An AutoScaling group is created by calling
conn.create_auto_scaling_group. The settings for the AutoScaling group, such
as the maximum and minimum number of instances in the group, the launch
configuration, availability zones, and an optional load balancer to use with
the group, are specified when the group is created.

Eg:-

#Python program for creating an AutoScaling group (code excerpt)


import boto.ec2.autoscale
:
print "Connecting to Autoscaling Service"
conn = boto.ec2.autoscale.connect_to_region(REGION,
aws_access_key_id=ACCESS_KEY,
aws_secret_access_key=SECRET_KEY)
print "Creating launch configuration"
lc = LaunchConfiguration(name='My-Launch-Config-2',
image_id=AMI_ID,
key_name=EC2_KEY_HANDLE,
instance_type=INSTANCE_TYPE,
security_groups = [ SECGROUP_HANDLE, ])
conn.create_launch_configuration(lc)
print "Creating auto-scaling group"
ag = AutoScalingGroup(group_name='My-Group',
availability_zones=['us-east-1b'],
launch_config=lc, min_size=1, max_size=2,
connection=conn)
conn.create_auto_scaling_group(ag)

• AutoScaling Policies
• After creating an AutoScaling group, the policies for scaling up
and scaling down are defined.
• In this example, a scale up policy with adjustment type
ChangeInCapacity and scaling_adjustment = 1 is defined.
• Similarly, a scale down policy with adjustment type
ChangeInCapacity and scaling_adjustment = -1 is defined.

Eg :-

#Creating auto-scaling policies


scale_up_policy = ScalingPolicy(name='scale_up',
adjustment_type='ChangeInCapacity',
as_name='My-Group',
scaling_adjustment=1,
cooldown=180)
scale_down_policy = ScalingPolicy(name='scale_down',
adjustment_type='ChangeInCapacity',
as_name='My-Group',
scaling_adjustment=-1,
cooldown=180)
conn.create_scaling_policy(scale_up_policy)
conn.create_scaling_policy(scale_down_policy)

Amazon AutoScaling – Python Example


• CloudWatch Alarms
• With the scaling policies defined, the next step is to create Amazon
CloudWatch alarms that trigger these policies.
• The scale up alarm is defined using the CPUUtilization metric with the
Average statistic and a threshold greater than 70% for a period of 60 sec. The
scale up policy created previously is associated with this alarm. This alarm
is triggered when the average CPU utilization of the
instances in the group becomes greater than 70% for more than 60 seconds.
• The scale down alarm is defined in a similar manner, with a lower CPU
utilization threshold (40% in the code below).

Eg:-

#Connecting to CloudWatch
cloudwatch = boto.ec2.cloudwatch.connect_to_region(REGION,
aws_access_key_id=ACCESS_KEY,
aws_secret_access_key=SECRET_KEY)
alarm_dimensions = {"AutoScalingGroupName": 'My-Group'}
#Creating scale-up alarm
scale_up_alarm = MetricAlarm(
name='scale_up_on_cpu', namespace='AWS/EC2',
metric='CPUUtilization', statistic='Average',
comparison='>', threshold='70',
period='60', evaluation_periods=2,
alarm_actions=[scale_up_policy.policy_arn],
dimensions=alarm_dimensions)
cloudwatch.create_alarm(scale_up_alarm)
#Creating scale-down alarm
scale_down_alarm = MetricAlarm(
name='scale_down_on_cpu', namespace='AWS/EC2',
metric='CPUUtilization', statistic='Average',
comparison='<', threshold='40',
period='60', evaluation_periods=2,
alarm_actions=[scale_down_policy.policy_arn],
dimensions=alarm_dimensions)
cloudwatch.create_alarm(scale_down_alarm)

Amazon S3 – Python Example


• In this example, a connection to S3 service is first established by calling
boto.connect_s3 function.
• The upload_to_s3_bucket_path function uploads the file to the S3 bucket
specified at the specified path.

# Python program for uploading a file to an S3 bucket


import boto.s3
conn = boto.connect_s3(aws_access_key_id='<enter>',
aws_secret_access_key='<enter>')
def percent_cb(complete, total):
print ('.')
def upload_to_s3_bucket_path(bucketname, path, filename):
mybucket = conn.get_bucket(bucketname)
fullkeyname=os.path.join(path,filename)
key = mybucket.new_key(fullkeyname)
key.set_contents_from_filename(filename, cb=percent_cb, num_cb=10)
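
A companion sketch for downloading a file from an S3 bucket with the same connection is shown below; it assumes the boto methods get_key and get_contents_to_filename, the bucket/path/file names are placeholders, and the os module must be imported as in the upload example.

def download_from_s3_bucket_path(bucketname, path, filename):
    mybucket = conn.get_bucket(bucketname)
    fullkeyname = os.path.join(path, filename)
    key = mybucket.get_key(fullkeyname)
    key.get_contents_to_filename(filename)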

Amazon RDS – Python Example

• In this example, a connection to RDS service is first established by calling


boto.rds.connect_to_region function.
• The RDS region, AWS access key and AWS secret key are passed to this
function.
• After connecting to RDS service, the conn.create_dbinstance function is
called to launch a
new RDS instance.
• The input parameters to this function include the instance ID, database
size, instance type, database username, database password, database port,
database engine (e.g. MySQL5.1), database name, security groups, etc.

Eg:-

#Python program for launching an RDS instance (excerpt)


import boto.rds
ACCESS_KEY="<enter>"
SECRET_KEY="<enter>"
REGION="us-east-1"
INSTANCE_TYPE="db.t1.micro"
ID = "MySQL-db-instance-3"
USERNAME = 'root'
PASSWORD = 'password'
DB_PORT = 3306
DB_SIZE = 5
DB_ENGINE = 'MySQL5.1'
DB_NAME = 'mytestdb'
SECGROUP_HANDLE="default"
#Connecting to RDS
conn = boto.rds.connect_to_region(REGION,
aws_access_key_id=ACCESS_KEY,
aws_secret_access_key=SECRET_KEY)
#Creating an RDS instance
db = conn.create_dbinstance(ID, DB_SIZE, INSTANCE_TYPE,
USERNAME, PASSWORD, port=DB_PORT, engine=DB_ENGINE,
db_name=DB_NAME, security_groups = [ SECGROUP_HANDLE, ] )

Amazon DynamoDB – Python Example
• In this example, a connection to DynamoDB service is first established by
calling boto.dynamodb.connect_to_region.
• After connecting to DynamoDB service, a schema for the new table is
created by calling
conn.create_schema.
• The schema includes the hash key and range key names and types.
• A DynamoDB table is then created by calling conn.create_table function
with the table schema, read units and write units as input parameters.

# Python program for creating a DynamoDB table (excerpt)


import boto.dynamodb
ACCESS_KEY="<enter>"
SECRET_KEY="<enter>"
REGION="us-east-1"
#Connecting to DynamoDB
conn = boto.dynamodb.connect_to_region(REGION,
aws_access_key_id=ACCESS_KEY,
aws_secret_access_key=SECRET_KEY)
table_schema = conn.create_schema(
hash_key_name='msgid',
hash_key_proto_value=str,
range_key_name='date',
range_key_proto_value=str
)
#Creating table with schema
table = conn.create_table(
name='my-test-table',
schema=table_schema,
read_units=1,
write_units=1
)

Google Compute Engine – Python Example

• This example uses the OAuth 2.0 scope


(https://www.googleapis.com/auth/compute) and credentials in the
credentials file to request a refresh and access token, which is then stored
in the oauth2.dat file.

• After completing the OAuth authorization, an instance of the Google
Compute Engine service is obtained.
• To launch a new instance the instances().insert method of the Google
Compute Engine API is used.
• The request body to this method contains the properties such as instance
name, machine type, zone, network interfaces, etc., specified in JSON
format.

# Python program for launching a GCE instance (excerpt)


API_VERSION = 'v1beta15'
GCE_SCOPE = 'https://www.googleapis.com/auth/compute'
GCE_URL = 'https://www.googleapis.com/compute/%s/projects/' %
(API_VERSION)
DEFAULT_ZONE = 'us-central1-b'
CLIENT_SECRETS = 'client_secrets.json'
OAUTH2_STORAGE = 'oauth2.dat'
def main():
#OAuth 2.0 authorization.
flow = flow_from_clientsecrets(CLIENT_SECRETS,
scope=GCE_SCOPE)
storage = Storage(OAUTH2_STORAGE)
credentials = storage.get()
if credentials is None or credentials.invalid:
credentials = run(flow, storage)
http = httplib2.Http()
auth_http = credentials.authorize(http)
gce_service = build('compute', API_VERSION)
# Create the instance
request = gce_service.instances().insert(project=PROJECT_ID,
body=instance,
zone=DEFAULT_ZONE)
response = request.execute(auth_http)

Google Cloud Storage – Python Example


• This example uses the OAuth 2.0 scope
(https://www.googleapis.com/auth/devstorage.full_control) and credentials
in the credentials file to request a refresh and access token, which is then
stored in the oauth2.dat file.
• After completing the OAuth authorization, an instance of the Google
Cloud Storage service is obtained.
• To upload a file the objects().insert method of the Google Cloud Storage
API is used.
• The request to this method contains the bucket name, file name and media
body containing the MediaIoBaseUpload object created from the file
contents.

Eg:-

# Python program for uploading a file to GCS (excerpt)


def main():
#OAuth 2.0 authorization.
flow = flow_from_clientsecrets(CLIENT_SECRETS, scope=GS_SCOPE)
storage = Storage(OAUTH2_STORAGE)
credentials = storage.get()
if credentials is None or credentials.invalid:
credentials = run(flow, storage)
http = httplib2.Http()
auth_http = credentials.authorize(http)
gs_service = build('storage', API_VERSION, http=auth_http)
# Upload file
fp= open(FILENAME,'r')
fh = io.BytesIO(fp.read())
media = MediaIoBaseUpload(fh, FILE_TYPE)
request = gs_service.objects().insert(bucket=BUCKET, name=FILENAME,
media_body=media)
response = request.execute()

Python For Google Cloud Platform

• This example uses the OAuth 2.0 scope


(https://www.googleapis.com/auth/compute) and credentials in the
credentials file to request a refresh and access token, which is then stored
in the oauth2.dat file.
• After completing the OAuth authorization, an instance of the Google
Cloud SQL service is obtained.
• To launch a new instance the instances().insert method of the Google
Cloud SQL API is used.

• The request body of this method contains properties such as instance,
project, tier, pricingPlan and replicationType.

Eg:-

# Python program for launching a Google Cloud SQL instance


(excerpt)
def main():
#OAuth 2.0 authorization.
flow = flow_from_clientsecrets(CLIENT_SECRETS, scope=GS_SCOPE)
storage = Storage(OAUTH2_STORAGE)
credentials = storage.get()
if credentials is None or credentials.invalid:
credentials = run(flow, storage)
http = httplib2.Http()
auth_http = credentials.authorize(http)
gcs_service = build('sqladmin', API_VERSION, http=auth_http)
# Define request body
instance={"instance": "mydb",
"project": "bahgacloud",
"settings":{
"tier": "D0",
"pricingPlan": "PER_USE",
"replicationType": "SYNCHRONOUS"}}
# Create the instance
request = gcs_service.instances().insert(project=PROJECT_ID,
body=instance)
response = request.execute()

Python For Windows Azure


• To create a virtual machine, a cloud service is first created.
• Virtual machine is created using the create_virtual_machine_deployment
method of the Azure service management API.

Eg:-

# Python program for launching a Azure VM instance (excerpt)


from azure import *
sms = ServiceManagementService(subscription_id, certificate_path)
name = ‘<enter>'
location = 'West US'
# Name of an os image as returned by list_os_images
image_name = '<enter>'
# Destination storage account container/blob where the VM disk will be
created
media_link = <enter>'
# Linux VM configuration
linux_config = LinuxConfigurationSet('bahga', 'arshdeepbahga',
'Arsh~2483', True)
os_hd = OSVirtualHardDisk(image_name, media_link)
#Create instance
sms.create_virtual_machine_deployment(service_name=name,
deployment_name=name, deployment_slot='production',
label=name, role_name=name, system_config=linux_config,
os_virtual_hard_disk=os_hd, role_size='Small')

Azure Storage – Python Example


• Azure Blobs service allows you to store large amounts of unstructured
text or binary data such as video, audio and images.
• This shows an example of using the Blob service for storing a file.
• Blobs are organized in containers. The create_container method is used to
create a new container.
• After creating a container the blob is uploaded using the put_blob method.
• Blobs can be listed using the list_blobs method.
• To download a blob, the get_blob method is used.

Eg:-

# Python example of using Azure Blob Service (excerpt)


from azure.storage import *
blob_service = BlobService(account_name=‘enter',
account_key=‘<enter>’)
#Create Container
blob_service.create_container('mycontainer')
#Upload Blob
filename='images.txt'
myblob = open(filename, 'r').read()
blob_service.put_blob('mycontainer', filename, myblob,
x_ms_blob_type='BlockBlob')
#List Blobs
blobs = blob_service.list_blobs('mycontainer')
for blob in blobs:
print(blob.name)
print(blob.url)
#Download Blob
output_filename='output.txt'
blob = blob_service.get_blob('mycontainer', 'myblob')
with open(output_filename, 'w') as f:
f.write(blob)

Python for Map Reduce

• The example shows inverted index mapper program.


• The map function reads the data from the standard input (stdin) and splits
the tab-delimited
data into document-ID and contents of the document.
• The map function emits key-value pairs where key is each word in the
document and value is the document-ID.

Eg:-

#Inverted Index Mapper in Python


#!/usr/bin/env python
import sys

for line in sys.stdin:
    doc_id, content = line.split('\t', 1)
    words = content.split()
    for word in words:
        print '%s\t%s' % (word, doc_id)

• The example shows inverted index reducer program.


• The key-value pairs emitted by the map phase are shuffled to the reducers
and grouped by the key.
• The reducer reads the key-value pairs grouped by the same key from the
standard input (stdin) and creates a list of document-IDs in which the
word occurs.
• The output of reducer contains key value pairs where key is a unique word
and value is the list of document-IDs in which the word occurs.

Eg:-

#Inverted Index Reducer in Python


#!/usr/bin/env python
import sys

current_word = None
current_docids = []
word = None

for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # parse the input we got from mapper.py
    word, doc_id = line.split('\t', 1)
    if current_word == word:
        current_docids.append(doc_id)
    else:
        if current_word:
            print '%s\t%s' % (current_word, current_docids)
        current_docids = []
        current_docids.append(doc_id)
        current_word = word

# emit the last word after the input is exhausted
if current_word == word:
    print '%s\t%s' % (current_word, current_docids)

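The map -> shuffle -> reduce flow above can be checked locally without Hadoop. The following self-contained sketch (the sample documents are made up) builds the same inverted index in memory:

Eg:-

docs = {'doc1': 'cloud computing basics', 'doc2': 'python for cloud'}

#Map: emit (word, doc_id) pairs
pairs = []
for doc_id, content in docs.items():
    for word in content.split():
        pairs.append((word, doc_id))

#Shuffle: sort the pairs and group the document-IDs by word
index = {}
for word, doc_id in sorted(pairs):
    index.setdefault(word, []).append(doc_id)

#Reduce: each word maps to the list of documents that contain it
for word in index:
    print '%s\t%s' % (word, index[word])
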
Python Packages of Interest


• JSON
• JavaScript Object Notation (JSON) is an easy to read and write data-
interchange format. JSON is used as an alternative to XML and is easy
for machines to parse and generate. JSON is built on two structures - a
collection of name-value pairs (e.g. a Python dictionary) and ordered lists
of values (e.g. a Python list). A short example appears after this list.

• XML
• XML (Extensible Markup Language) is a data format for structured
document interchange. The Python minidom library provides a minimal
implementation of the Document Object Model interface and has an API
similar to that in other languages.
• HTTPLib & URLLib
• HTTPLib2 and URLLib2 are Python libraries used in network/internet
programming

• SMTPLib
• Simple Mail Transfer Protocol (SMTP) is a protocol which handles
sending email and routing e-mail between mail servers. The
Python smtplib module provides an SMTP client session object that can be
used to send email.
• NumPy
• NumPy is a package for scientific computing in Python. NumPy provides
support for large multi-dimensional arrays and matrices
• Scikit-learn
• Scikit-learn is an open source machine learning library for Python that
provides implementations of various machine learning algorithms for
classification, clustering, regression and dimension reduction problems.
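
Eg (a quick illustration of the json package mentioned above; the record is hypothetical):

import json

record = {'name': 'Mary', 'id': '8776', 'grades': [90, 85]}
encoded = json.dumps(record)   #Python object -> JSON string
print encoded
decoded = json.loads(encoded)  #JSON string -> Python object
print decoded['grades'][0]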

Python Web Application Framework –Django

• Django is an open source web application framework for developing web


applications in Python.
• A web application framework in general is a collection of solutions,
packages and best practices that allows development of web applications
and dynamic websites.
• Django is based on the Model-Template-View architecture and provides a
separation of the data model from the business rules and the user interface.
• Django provides a unified API to a database backend.
• Thus web applications built with Django can work with different
databases without requiring any code changes.
• With this flexibility in web application design combined with the
powerful capabilities of the Python language and the Python ecosystem,
Django is best suited for cloud applications.
• Django consists of an object-relational mapper, a web templating system
and a regular-expression based URL dispatcher.

Django Architecture

• Django is Model-Template-View (MTV) framework.


• Model
• The model acts as a definition of some stored data and handles the
interactions with the database. In a web application, the data can be stored
in a relational database, non-relational database, an XML file, etc. A

Django model is a Python class that outlines the variables and methods for
a particular type of data.
• Template
• In a typical Django web application, the template is simply an HTML
page with a few extra placeholders. Django’s template language can be
used to create various forms of text files (XML, email, CSS, Javascript,
CSV, etc.)

• View
• The view ties the model to the template. The view is where you write the
code that actually generates the web pages. View determines what data is to
be displayed, retrieves the data from the database and passes the data to the
template.
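
The following is a minimal, hypothetical sketch of a Django model and view (the URL configuration, settings and templates are omitted, so this is not a complete project):

Eg:

#models.py
from django.db import models

class Device(models.Model):
    name = models.CharField(max_length=100)
    status = models.CharField(max_length=20, default='active')

#views.py
from django.http import HttpResponse
from myapp.models import Device   #'myapp' is a placeholder app name

def device_list(request):
    #The view retrieves data via the model and returns a response;
    #in a real application a template would render the page
    names = [d.name for d in Device.objects.all()]
    return HttpResponse(', '.join(names))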

Designing a Restful web API

To design a REST API


To apply REST principles in design process.
Steps in designing REST Services
Identify Object Model
Create Model URIs
Determine Representations
Assign HTTP Methods
More Actions

Identify Object Model

The very first step in designing a REST API based application is –


identifying the objects which will be presented as resources.

For a network-based application, object modeling is pretty simple.


There can be many objects such as devices, managed entities, routers,
modems, etc. For simplicity's sake, we will consider only two resources i.e.

 Devices

 Configurations
 Here configuration is sub-resource of a device. A device can have
many configuration options.
Note that both objects/resources in our above model will have a unique
identifier, which is the integer id property.

Create Model URIs

Now when object model is ready, it’s time to decide the resource URIs. At
this step, while designing the resource URIs – focus on the relationship
between resources and its sub-resources. These resource URIs are
endpoints for RESTful services.

In our application, a device is a top-level resource. And configuration is


sub-resource under device. Let’s write down the URIs.
/devices
/devices/{id}

/configurations
/configurations/{id}

/devices/{id}/configurations
/devices/{id}/configurations/{id}
Notice that these URIs do not use any verb or operation. It’s very
important to not include any verb in URIs. URIs should all be nouns only.

Determine Representations

Now that the resource URIs have been decided, let's work on their
representations. Mostly, representations are defined in either XML or
JSON format. We will see XML examples as it is more expressive of how
the data is composed.

Collection of Device Resource

When returning a collection resource, include only the most important


information about each resource. This will keep the size of the payload small, and so
will improve the performance of REST APIs.

<devices size="2">
<link rel="self" href="/devices"/>

<device id="12345">
<link rel="self" href="/devices/12345"/>
<deviceFamily>apple-es</deviceFamily>
<OSVersion>10.3R2.11</OSVersion>
<platform>SRX100B</platform>
<serialNumber>32423457</serialNumber>
<connectionStatus>up</connectionStatus>
<ipAddr>192.168.21.9</ipAddr>
<name>apple-srx_200</name>
<status>active</status>
</device>
<device id="556677">
<link rel="self" href="/devices/556677"/>
<deviceFamily>apple-es</deviceFamily>
<OSVersion>10.3R2.11</OSVersion>
<platform>SRX100B</platform>
<serialNumber>6453534</serialNumber>
<connectionStatus>up</connectionStatus>
<ipAddr>192.168.20.23</ipAddr>
<name>apple-srx_200</name>
<status>active</status>
</device>
</devices>

Single Device Resource

In contrast to the collection URI, include the complete information of a device


in this URI. Here, also include a list of links for sub-resources and other
supported operations. This will make your REST API HATEOAS-driven.

<device id="12345">
<link rel="self" href="/devices/12345"/>

<id>12345</id>
<deviceFamily>apple-es</deviceFamily>
<OSVersion>10.0R2.10</OSVersion>
<platform>SRX100-LM</platform>
<serialNumber>32423457</serialNumber>
<name>apple-srx_100_lehar</name>
<hostName>apple-srx_100_lehar</hostName>
<ipAddr>192.168.21.9</ipAddr>
<status>active</status>

<configurations size="2">
<link rel="self" href="/configurations" />

<configuration id="42342">
<link rel="self" href="/configurations/42342" />
</configuration>

<configuration id="675675">
<link rel="self" href="/configurations/675675" />
</configuration>
</configurations>

<method href="/devices/12345/exec-rpc" rel="rpc"/>


<method href="/devices/12345/synch-config" rel="synch device
configuration"/>
</device>

Configuration Resource Collection

Similar to device collection representation, create configuration collection


representation with only minimal information.
<configurations size="20">
<link rel="self" href="/configurations" />

<configuration id="42342">
<link rel="self" href="/configurations/42342" />
</configuration>

<configuration id="675675">
<link rel="self" href="/configurations/675675" />
</configuration>
...
...
</configurations>
Please note that the configurations collection representation inside a device is
similar to the top-level configurations URI. The only difference is
that there are only two configurations for this device, so only two configuration
items are listed as sub-resources under the device.

Single Configuration Resource

Now, single configuration resource representation must have all possible


information about this resource – including relevant links.

<configuration id="42342">
<link rel="self" href="/configurations/42342" />
<content><![CDATA[...]]></content>
<status>active</status>
<link rel="raw configuration content" href="/configurations/42342/raw"
/>
</configuration>

Configuration Resource Collection Under Single Device

This resource collection of configurations will be a subset of the primary


collection of configurations, and will be specific to a device only. As it is a
subset of the primary collection, DO NOT create different representation
data fields than the primary collection. Use the same representation fields as the
primary collection.

<configurations size="2">
<link rel="self" href="/devices/12345/configurations" />

<configuration id="53324">
<link rel="self" href="/devices/12345/configurations/53324" />
<link rel="detail" href="/configurations/53324" />
</configuration>

<configuration id="333443">
<link rel="self" href="/devices/12345/configurations/333443" />
<link rel="detail" href="/configurations/333443" />
</configuration>
</configurations>
Notice that this sub-resource collection has two links: one for its direct
representation inside the sub-collection,
i.e. /devices/12345/configurations/333443, and the other pointing to its location
in the primary collection, i.e. /configurations/333443.
Having two links is important, as you can provide access to a device-specific
configuration in a more direct manner, and you will have the ability to mask
some fields (if the design requires it) which shall not be visible in a secondary
collection.

Single Configuration Resource Under Single Device

This representation should have either exactly the same representation as the configuration representation from the primary collection, or you may mask a few fields.

This subresource representation will also have an additional link to its primary representation.

<configuration id="11223344">
<link rel="self" href="/devices/12345/configurations/11223344" />
<link rel="detail" href="/configurations/11223344" />
<content><![CDATA[...]]></content>
<status>active</status>
<link rel="raw configuration content" href="/configurations/11223344/raw" />

</configuration>

Now, before moving forward to the next section, let's note down a few observations so you don't miss them.

 Resource URIs are all nouns.
 URIs are usually in two forms – a collection of resources and a singular resource.
 A collection may be in two forms – a primary collection and a secondary collection. A secondary collection is a sub-collection of a primary collection only.
 Each resource/collection contains at least one link, i.e. to itself.
 Collections contain only the most important information about resources.
 To get complete information about a resource, you need to access it through its specific resource URI only.
 Representations can have extra links (e.g. methods in a single device). Here a method represents a POST method. You can also have more attributes or form links in an altogether new way.
 We have not talked about operations on these resources yet.

Assign HTTP Methods

So our resource URIs and their representations are fixed now. Let's decide the possible operations in the application and map these operations to the resource URIs. A user of the network application can perform browse, create, update or delete operations. So let's map them.

Browse all devices or configurations [Primary Collection]

HTTP GET /devices


HTTP GET /configurations

If the collection size is large, you can apply paging and filtering as well, e.g. the requests below will fetch the first 20 records from the collection.

HTTP GET /devices?startIndex=0&size=20


HTTP GET /configurations?startIndex=0&size=20

Browse all devices or configurations [Secondary Collection]

HTTP GET /devices/{id}/configurations

It will mostly be a small collection – so there is no need to enable filtering or sorting here.

Browse single device or configuration [Primary Collection]

To get the complete detail of a device or configuration, use GET operation


on singular resource URIs.
HTTP GET /devices/{id}
HTTP GET /configurations/{id}

Browse single device or configuration [Secondary Collection]

HTTP GET /devices/{id}/configurations/{id}

The subresource representation will be either the same as, or a subset of, the primary representation.

Create a device or configuration

Create is not an idempotent operation, and in the HTTP protocol POST is also not idempotent. So use POST.
HTTP POST /devices
HTTP POST /configurations
Please note that the request payload will not contain any id attribute, as the server is responsible for assigning it. The response of a create request will look like this:

HTTP/1.1 201 Created
Content-Type: application/xml
Location: http://example.com/network-app/configurations/678678
<configuration id="678678">
<link rel="self" href="/configurations/678678" />
<content><![CDATA[...]]></content>
<status>active</status>
<link rel="raw configuration content" href="/configurations/678678/raw" />
</configuration>

Update a device or configuration

Update is an idempotent operation and HTTP PUT is also an idempotent method. So we can use the PUT method for update operations.
HTTP PUT /devices/{id}
HTTP PUT /configurations/{id}

PUT response may look like this.

HTTP/1.1 200 OK
Content-Type: application/xml
<configuration id="678678">
<link rel="self" href="/configurations/678678" />
<content><![CDATA[. updated content here .]]></content>
<status>active</status>
<link rel="raw configuration content" href="/configurations/678678/raw" />
</configuration>

Remove a device or configuration

Removing is always a DELETE operation.


HTTP DELETE /devices/{id}
HTTP DELETE /configurations/{id}
A successful response SHOULD be 202 (Accepted) if the resource has been queued for deletion (async operation), or 200 (OK) / 204 (No Content) if the resource has been deleted permanently (sync operation).
In case of an async operation, the application shall return a task id which can be tracked for success/failure status.

Please note that you should put enough analysis into deciding the behavior when a subresource is deleted from the system. Normally, you may want to SOFT DELETE a resource in these requests – in other words, set their status to INACTIVE. By following this approach, you will not need to find and remove its references from other places as well.

Applying or Removing a configuration from a device


In a real application, you will need to apply a configuration to a device, or you may want to remove a configuration from a device (not from the primary collection). You should use the PUT and DELETE methods in this case, because of their idempotent nature.

//Apply Configuration on a device


HTTP PUT /devices/{id}/configurations

//Remove Configuration on a device


HTTP DELETE /devices/{id}/configurations/{id}
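
To make the mapping above concrete, the following is a minimal client-side sketch in Python using the requests library; the base URL and the XML payloads are illustrative placeholders, not part of the API design itself.

import requests

BASE = "http://example.com/network-app"
XML = {"Content-Type": "application/xml"}

# Browse the primary device collection with paging
devices = requests.get(f"{BASE}/devices", params={"startIndex": 0, "size": 20})

# Create a configuration; the server assigns the id and returns it in the Location header
created = requests.post(f"{BASE}/configurations",
                        data="<configuration>...</configuration>", headers=XML)
print(created.status_code, created.headers.get("Location"))

# Update a device (idempotent PUT) and remove a configuration (DELETE)
requests.put(f"{BASE}/devices/12345", data="<device>...</device>", headers=XML)
requests.delete(f"{BASE}/configurations/42342")

# Apply a configuration to a device, then remove it from that device
requests.put(f"{BASE}/devices/12345/configurations",
             data='<configuration id="42342"/>', headers=XML)
requests.delete(f"{BASE}/devices/12345/configurations/42342")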

More Actions

So far we have designed only the object model and URIs, and then decided the HTTP methods or operations on them. You need to work on other aspects of the application as well:
1) Logging
2) Security
3) Discovery etc.

Cloud Application Development in Python


Design methodology for IaaS service model

Component Design

• Identify the building blocks of the application and the functions to be performed by each block
• Group the building blocks based on the functions performed and
type of cloud resources required and identify the application
components based on the groupings
• Identify the inputs and outputs of each component
• List the interfaces that each component will expose
• Evaluate the implementation alternatives for each component
(design patterns such as MVC, etc.)

Architecture Design

• Define the interactions between the application components


• Guidelines for loosely coupled and stateless designs - use messaging
queues (for asynchronous communication), functional interfaces
(such as REST for loose coupling) and external status database (for
stateless design)

Deployment Design
• Map the application components to specific cloud resources (such as
web servers, application servers, database servers, etc.)
Design methodology for PaaS service model

• For applications that use the Platform-as-a-service (PaaS) cloud service


model, the architecture and deployment design steps are not required since
the platform takes care of the architecture and deployment.
• Component Design

• In the component design step, the developers have to take into
consideration the platform specific features.
• Platform Specific Software
• Different PaaS offerings such as Google App Engine, Windows Azure
Web Sites, etc., provide platform specific software development kits
(SDKs) for developing cloud applications.
• Sandbox Environments
• Applications designed for specific PaaS offerings run in sandbox
environments and are allowed to perform only those actions that do not
interfere with the performance of other applications.
• Deployment & Scaling
• The deployment and scaling is handled by the platform while the
developers focus on the application development using the platform-
specific SDKs.
• Portability
• Portability is a major constraint for PaaS based applications as it is difficult to move the application from one PaaS platform to another.

Image Processing App – Component Design

• Functionality:
• A cloud-based Image Processing application.
• This application provides online image filtering capability.
• Users can upload image files and choose the filters to apply.
• The selected filters are applied to the image and the processed image can then be downloaded.
• Component Design
• Web Tier: The web tier for the image processing app has front ends for
image submission and displaying processed images.
• Application Tier: The application tier has components for processing the
image submission requests, processing the submitted image and processing
requests for displaying the results.
• Storage Tier: The storage tier comprises of the storage for processed
images.

Fig: Component design for Image Processing App

Image Processing App – Architecture Design


• Architecture design step which defines the interactions between the
application components.
• This application uses the Django framework, therefore, the web tier
components map to the Django templates and the application tier
components map to the Django views.
• A cloud storage is used for the storage tier. For each component, the
corresponding code box numbers are mentioned.

Fig : Architecture design for Image Processing App

Image Processing App – Deployment Design
• Deployment for the app is a multi-tier architecture comprising of load
balancer, application servers and cloud storage for processed images.
• For each resource in the deployment the corresponding Amazon Web
Services (AWS) cloud service is mentioned.

Fig : Deployment design for Image Processing App

MapReduce App – Component Design


• Functionality:
• This application allows users to submit MapReduce jobs for data
analysis.
• This application is based on the Amazon Elastic MapReduce (EMR)
service.
• Users can upload data files to analyze and choose/upload the Map
and Reduce programs.
• The selected Map and Reduce programs along with the input data are
submitted to a queue for processing.
• Component Design
• Web Tier: The web tier for the MapReduce app has a front end for
MapReduce job submission.
• Application Tier: The application tier has components for processing
requests for uploading files, creating MapReduce jobs and enqueuing
jobs, MapReduce consumer and the component that sends email
notifications.
• Analytics Tier: The Hadoop framework is used for the analytics tier
and a cloud storage is used for the storage tier.
• Storage Tier: The storage tier comprises of the storage for files.

Fig : Component design for MapReduce App

MapReduce App – Architecture Design


• Architecture design step which defines the interactions between the
application components.
• This application uses the Django framework, therefore, the web tier components map to the Django templates and the application tier components map to the Django views.
• For each component, the corresponding code box numbers are mentioned.
• To make the application scalable the job submission and job processing
components are separated.
• The MapReduce job requests are submitted to a queue.
• A consumer component that runs on a separate instance retrieves the
MapReduce job requests from the queue and creates the MapReduce jobs
and submits them to the Amazon EMR service.
• The user receives an email notification with the download link for the
results when the job is complete.

Fig : Architecture design for MapReduce App

MapReduce App – Deployment Design


• Deployment for the app is a multi-tier architecture comprising of load
balancer, application servers and a cloud storage for storing MapReduce
programs, input data and MapReduce output.
• For each resource in the deployment the corresponding Amazon Web
Services (AWS) cloud service is mentioned.

Fig : Deployment design for MapReduce App


Social Media Analytics App – Component Design
• Functionality:
• A cloud-based Social Media Analytics application.
• This application collects the social media feeds (Twitter tweets) on a
specified keyword in real time and analyzes the sentiments of the tweets
and provides aggregate results.
• Component Design
• Web Tier: The web tier has a front end for displaying results.
• Application Tier: The application tier has a listener component that
collects social media feeds, a consumer component that analyzes tweets and
a component for rendering the results in the dashboard.
• Database Tier: A MongoDB database is used for the database tier and a
cloud storage is used for the storage tier.
• Storage Tier: The storage tier comprises of the storage for files.

Fig : Component design for Social Media Analytics App

Social Media Analytics App – Architecture Design


• Architecture design step which defines the interactions between the
application components.
• To make the application scalable the feeds collection component
(Listener) and feeds processing component (Consumer) are separated.
• The Listener component uses the Twitter API to get feeds on a specific
keyword (or a list of keywords) and enqueues the feeds to a queue.
• The Consumer component (that runs on a separate instance) retrieves the
feeds from the queue and analyzes the feeds and stores the aggregated
results in a separate database.
• The aggregate results are displayed to the users from a Django
application.

Fig : Architecture design for Social Media Analytics App

Social Media Analytics App – Deployment Design


• Deployment for the app is a multi-tier architecture comprising of load balancer, application servers, listener and consumer instances, a cloud storage for storing raw data and a database server for storing aggregated results.
• For each resource in the deployment the corresponding Amazon Web Services (AWS) cloud service is mentioned.

Fig : Deployment design for Social Media Analytics App

Social Media Analytics App – Dashboard

PART-A (2-Marks Questions)


1. Describe the steps involved in Google Compute Engine authorization.
2. Mention the services that are provided by the Windows Azure Operating System.
3. What is Google App Engine?
4. Define Auto Scaling Group.
6. What are the components of Django?
7. What is the use of a RESTful API?
8. What are the key features of Python?
9. What is the Social Media Analytics App?
10. What is the Document Storage App?

PART-B (10-Marks Questions)


1. Explain Google Cloud Platform.

2. Explain Amazon web services.

3. Explain Python packages of Interest.

4. Explain Python web Application Frame work.

5. Explain Designing a RESTful web API.

6. Explain Design Approaches in cloud.

7.Explain Document Storage in cloud.

8.Explain Social Media Analytics App.

UNIT-IV
Big Data Analytics

Introduction:

What is Big Data?

Big data is defined as collections of data sets whose volume, velocity in


terms of time variation, or variety is so large that it is difficult to store,
manage, process and analyze the data using traditional databases and data
processing tools.
Sources of Big Data

These data come from many sources like

o Social networking sites: Facebook, Google, LinkedIn – all these sites generate huge amounts of data on a day-to-day basis as they have billions of users worldwide.
o E-commerce sites: Sites like Amazon, Flipkart and Alibaba generate huge amounts of logs from which users' buying trends can be traced.
o Weather stations: All the weather stations and satellites give very large volumes of data which are stored and manipulated to forecast weather.
o Telecom companies: Telecom giants like Airtel and Vodafone study user trends and accordingly publish their plans, and for this they store the data of their millions of users.
o Share market: Stock exchanges across the world generate huge amounts of data through their daily transactions.

Characteristics of big data:

Volume
• Though there is no fixed threshold for the volume of data to be considered as big data, typically the term big data is used for massive scale data that is difficult to store, manage and process using traditional databases and data processing architectures. The volumes of data generated by modern IT, industrial and healthcare systems are growing exponentially, driven by the lowering costs of data storage and processing architectures and the need to extract valuable insights from the data to improve business processes, efficiency and service to consumers.
Velocity
• Velocity is another important characteristic of big data and the primary
reason for exponential growth of data. Velocity of data refers to how fast
the data is generated. Modern IT, industrial and other systems are
generating data at increasingly higher speeds generating big data.
Variety
• Variety refers to the forms of the data. Big data comes in different forms
such as structured or unstructured data, including text data, image, audio,
video and sensor data.

Clustering Big Data:


• Clustering is the process of grouping similar data items together such that
data items that are more similar to each other (with respect to some
similarity criteria) than other data items are put in one cluster.
• Clustering big data is of much interest, and happens in applications such
as:
• Clustering social network data to find a group of similar users
• Clustering electronic health record (EHR) data to find similar patients.
• Clustering sensor data to group similar or related faults in a machine
• Clustering market research data to group similar customers
• Clustering click stream data to group similar users
• Clustering is achieved by clustering algorithms that belong to a broad category of algorithms called unsupervised machine learning.
• Unsupervised machine learning algorithms find the patterns and hidden structure in data for which no training data is available.

k-means Clustering
• k-means is a clustering algorithm that groups data items into k clusters,
where k is user defined.
• Each cluster is defined by a centroid point.
• k-means clustering begins with a set of k centroid points which are either
randomly chosen from the dataset or chosen using some initialization
algorithm such as canopy clustering.
• The algorithm proceeds by finding the distance between each data point in
the data set and the centroid points.
• Based on the distance measure, each data point is assigned to a cluster
belonging to the closest centroid.
• In the next step the centroids are recomputed by taking the mean value of
all the data points in a cluster.
• This process is repeated till the centroids no longer move more than a
specified threshold.

k-means Clustering Algorithm

Start with k centroid points

repeat until the centroids no longer move beyond a threshold or the maximum number of iterations is reached:
    for each point in the dataset:
        for each centroid:
            find the distance between the point and the centroid
        assign the point to the cluster belonging to the nearest centroid
    for each cluster:
        recompute the centroid point by taking the mean value of all points in the cluster
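
The following is a minimal NumPy sketch of the algorithm above; the random initialization and the Euclidean distance measure are assumptions, and a library implementation would normally be used in practice.

import numpy as np

def kmeans(points, k, max_iter=100, tol=1e-4):
    # Start with k centroids chosen randomly from the dataset
    rng = np.random.default_rng(0)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Distance of every point to every centroid; assign each point to the nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean value of all points in its cluster
        new_centroids = np.array([points[labels == i].mean(axis=0)
                                  if np.any(labels == i) else centroids[i]
                                  for i in range(k)])
        # Stop when the centroids no longer move more than the threshold
        if np.linalg.norm(new_centroids - centroids) < tol:
            centroids = new_centroids
            break
        centroids = new_centroids
    return centroids, labels

For example, kmeans(np.random.rand(300, 2), k=3) groups 300 random 2-D points into three clusters.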

Fig: Example of clustering 300 points with k-means: (a) iteration 1, (b)
iteration 2, (c) iteration 3, (d) iteration 5, (e) iteration 10, (f) iteration 100.

Clustering Documents with k-means


• Document clustering is the most commonly used application of k-means
clustering algorithm.

• Document clustering problem occurs in many big data applications such
as finding similar news articles, finding similar patients using electronic
health records, etc.
• Before applying k-means algorithm for document clustering, the
documents need to be vectorized. Since documents contain textual
information, the process of vectorization is required for clustering
documents.
• The process of generating document vectors involves several steps:
• A dictionary of all words used in the tokenized records is generated. Each
word in the dictionary has a dimension number assigned to it which is used
to represent the dimension the word occupies in the document vector.
• The number of occurrences or term frequency (TF) of each word is
computed.
• Inverse Document Frequency (IDF) for each word is computed.
Document Frequency (DF) for a word is the number of documents (or
records) in which the word occurs.
• Weight for each word is computed. The term weight Wi is used in the
document vector as the value for the dimension-i.
• Similarity between documents is computed using a distance measure such
as Euclidean distance measure.
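
A minimal sketch of this vectorize-then-cluster pipeline, assuming scikit-learn's TF-IDF vectorizer and k-means (the sample documents are placeholders):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = ["patient shows high blood pressure",          # placeholder documents
        "blood pressure medication prescribed",
        "stock markets rallied on strong earnings",
        "earnings growth lifted the stock index"]

# Vectorization: dictionary building, term frequency and inverse document frequency weighting
vectors = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Cluster the document vectors into k = 2 groups
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)
print(km.labels_)   # cluster assignment for each document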

k-means with MapReduce


• The data to be clustered is distributed on a distributed file system such as
HDFS and split into blocks which are replicated across different nodes in
the cluster.
• Clustering begins with an initial set of centroids. The client program
controls the clustering process.
• In the Map phase, the distances between the data samples and centroids
are calculated and each sample is assigned to the nearest centroid.
• In the Reduce phase, the centroids are recomputed using the mean of all
the points in each cluster.
• The new centroids are then fed back to the client which checks whether
convergence is reached or maximum number of iterations are completed.
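
The sketch below expresses one k-means iteration as map and reduce functions in plain Python; the HDFS splitting, shuffling and the client-side convergence check handled by the framework are omitted, and the centroids are assumed to be broadcast to every mapper.

import numpy as np

def map_phase(points, centroids):
    # Map: emit (index of nearest centroid, point) for every data point in the split
    for p in points:
        idx = int(np.argmin([np.linalg.norm(np.asarray(p) - np.asarray(c)) for c in centroids]))
        yield idx, p

def reduce_phase(pairs):
    # Reduce: recompute each centroid as the mean of all points assigned to it
    sums, counts = {}, {}
    for idx, p in pairs:
        sums[idx] = sums.get(idx, 0) + np.asarray(p, dtype=float)
        counts[idx] = counts.get(idx, 0) + 1
    return {idx: sums[idx] / counts[idx] for idx in sums}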

Fig: Parallel implementation of K-means clustering with Map Reduce

DBSCAN clustering
• DBSCAN is a density clustering algorithm that works on the notions of
density reachability and density connectivity.
Density Reachability
• Is defined on the basis of Eps-neighborhood, where Eps-neighborhood
means that for every point p in a cluster C there is a point q in C so that p is
inside of the Eps-neighborhood of q and there are at least a minimum
number (MinPts) of points in an Eps-neighborhood of that point.
• A point p is called directly density-reachable from a point q if it is not
farther away than a given distance (Eps) and if it is surrounded by at least a
minimum number (MinPts) of points that may be considered to be part of a
cluster.
Density Connectivity
• A point p is density connected to a point q if there is a point o such that
both, p and q are density-reachable from o wrt. Eps and MinPts.
• A cluster is then defined based on the following two properties:
• Maximality: For all point p, q if p belongs to cluster C and q is density-
reachable from p (wrt. Eps and MinPts), then q also belongs to the cluster
C.
• Connectivity: For all point p, q in cluster C, p is density-connected to q
(wrt. Eps and MinPts).
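
A minimal sketch using scikit-learn's DBSCAN implementation (an assumption; Eps maps to the eps parameter and MinPts to min_samples, and the data is a random placeholder):

import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.rand(300, 2)                      # placeholder data
db = DBSCAN(eps=0.1, min_samples=5).fit(X)      # Eps and MinPts from the definitions above
labels = db.labels_                             # label -1 marks noise points
print(set(labels))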

DBSCAN vs K-means
• DBSCAN can find irregularly shaped clusters, as seen from this example, and can even find a cluster completely surrounded by a different cluster.
• DBSCAN considers some points as noise and does not assign them to any cluster.

Classification of Big Data

• Classification is the process of categorizing objects into predefined


categories.
• Classification is achieved by classification algorithms that belong to a
broad category of algorithms called supervised machine learning.
• Supervised learning involves inferring a model from a set of input data
and known responses to the data (training data) and then using the inferred
model to predict responses to new data.
Binary classification
• Binary classification involves categorizing the data into two categories.
For example, classifying the sentiment of a news article into positive or
negative, classifying the state of a machine into good or faulty, classifying
the heath test into positive or negative, etc.
Multi-class classification
• Multi-class classification involves more than two classes into which the
data is categorized. For example, gene expression classification problem
involves multiple classes.
Document classification
• Document classification is a type of multi-class classification approach in
which the data to the classified is in the form of text document. For
classifying news articles into different categories such as politics, sports,
etc.

Performance of Classification Algorithms

Precision: Precision is the fraction of objects that are classified correctly, i.e.

Precision = TP / (TP + FP)

Recall: Recall is the fraction of objects belonging to a category that are classified correctly, i.e.

Recall = TP / (TP + FN)

Accuracy: Accuracy is the fraction of all objects that are classified correctly, i.e.

Accuracy = (TP + TN) / (TP + TN + FP + FN)

F1-score: F1-score is a measure of accuracy that considers both precision and recall. F1-score is the harmonic mean of precision and recall, given as

F1 = 2 × (Precision × Recall) / (Precision + Recall)

(Here TP, FP, TN and FN denote true positives, false positives, true negatives and false negatives respectively.)
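
The following small sketch shows how these metrics can be computed from the counts of true/false positives and negatives for one category (the counts are placeholder values):

tp, fp, fn, tn = 40, 10, 5, 45      # placeholder confusion-matrix counts

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / (tp + tn + fp + fn)
f1_score = 2 * precision * recall / (precision + recall)

print(precision, recall, accuracy, f1_score)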

Naive Bayes
 Naive Bayes is a probabilistic classification algorithm based on the Bayes theorem with a naive assumption about the independence of feature attributes. Given a class variable C and feature variables F1,...,Fn, the conditional probability (posterior) according to Bayes theorem is given as,

P(C|F1,...,Fn) = P(C) P(F1,...,Fn|C) / P(F1,...,Fn)

 where, P(C|F1,...,Fn) is the posterior probability, P(F1,...,Fn|C) is the likelihood, P(C) is the prior probability and P(F1,...,Fn) is the evidence. Naive Bayes makes a naive assumption about the independence of every pair of features, given as,

P(Fi|C, F1,...,Fi-1, Fi+1,...,Fn) = P(Fi|C), so that P(C|F1,...,Fn) ∝ P(C) × P(F1|C) × ... × P(Fn|C)

 Since the evidence P(F1,...,Fn) is constant for a given input and does not depend on the class variable C, only the numerator of the posterior probability is important for classification. With this simplification, classification can then be done as follows,

C* = argmax over C of P(C) × P(F1|C) × ... × P(Fn|C)
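
A minimal sketch of Naive Bayes classification, assuming scikit-learn's GaussianNB (which models the likelihoods as Gaussians); the training data is a placeholder:

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Placeholder training data: two features, two classes
X_train = np.array([[1.0, 2.1], [1.2, 1.9], [7.8, 8.2], [8.1, 7.9]])
y_train = np.array([0, 0, 1, 1])

clf = GaussianNB().fit(X_train, y_train)         # estimates priors and likelihoods
print(clf.predict([[1.1, 2.0], [8.0, 8.0]]))     # argmax of the posterior for each sample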

Fig: Binary Classification with Naïve Bayes

Decision Trees

• Decision Trees are a supervised learning method that use a tree created
from simple decision rules learned from the training data as a predictive
model.
• The predictive model is in the form of a tree that can be used to predict the
value of a target variable based on a several attribute variables.
• Each node in the tree corresponds to one attribute in the dataset on which
the “split” is performed.
• Each leaf in a decision tree represents a value of the target variable.
• The learning process involves recursively splitting on the attributes until
all the samples in the child node have the same value of the target variable
or splitting further results in no further information gain.
• To select the best attribute for splitting at each stage, different metrics can
be used.

Fig: Binary classification with Decision trees

Splitting Attributes in Decision Trees
To select the best attribute for splitting at each stage, different metrics can
be used such as:
Information Gain
• Information content of a discrete random variable X with probability mass function (PMF), P(X), is defined as,

I(X) = -log2 P(X)

 Information gain is defined based on the entropy of the random variable, which is defined as,

H(X) = -Σx P(x) log2 P(x)

• Entropy is a measure of uncertainty in a random variable and choosing the attribute with the highest information gain results in a split that reduces the uncertainty the most at that stage.

Gini Coefficient
• Gini coefficient measures the inequality, i.e. how often a randomly chosen sample that is labeled based on the distribution of labels would be labeled incorrectly. Gini coefficient is defined as,

G = 1 - Σi pi²
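
The two splitting metrics can be computed for a node as shown in the small sketch below (plain NumPy; the class probability vectors are placeholders):

import numpy as np

def entropy(probs):
    # H(X) = -sum p * log2 p, ignoring zero-probability classes
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def gini(probs):
    # G = 1 - sum p^2
    p = np.asarray(probs, dtype=float)
    return 1.0 - np.sum(p ** 2)

print(entropy([0.5, 0.5]), gini([0.5, 0.5]))   # maximum impurity for two classes
print(entropy([1.0, 0.0]), gini([1.0, 0.0]))   # a pure node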

Fig: Example of generated decision tree

Decision Tree Algorithms
• There are different algorithms for building decisions trees, popular ones
being ID3 and C4.5.
ID3:
• Attributes are discrete. If not, discretize the continuous attributes.
• Calculate the entropy of every attribute using the dataset.
• Choose the attribute with the highest information gain.
• Create branches for each value of the selected attribute.
• Repeat with the remaining attributes.
• The ID3 algorithm can result in over-fitting to the training data
and can be expensive to train, especially for continuous attributes.
C4.5
• The C4.5 algorithm is an extension of the ID3 algorithm. C4.5 supports
both discrete and continuous attributes.
• To support continuous attributes, C4.5 finds thresholds for the continuous
attributes and then splits based on the threshold values. C4.5 prevents over-
fitting by pruning trees after they have been created.
• Pruning involves removing or aggregating those branches which provide
little discriminatory power.
Random Forest
• Random Forest is an ensemble learning method that is based on
randomized decision trees.
• Random Forest trains a number decision trees and then takes the majority
vote by using the mode of the class predicted by the individual trees.

Breiman’s Algorithm

1. Draw a bootstrap sample (n times with replacement from the N samples


in the training set) from the dataset
2. Train a decision tree
- Until the tree is fully grown (maximum size)
- Choose next leaf node
- Select m attributes (m is much less than the total number of attributes M)
at random.
- Choose the best attribute and split as usual
3. Measure out-of-bag error
- Use the rest of the samples (not selected in the bootstrap) to estimate the
error of the tree, by
predicting their classes.
4. Repeat steps 1-3 k times to generate k trees.
5. Make a prediction by majority vote among the k trees
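
A minimal sketch of this approach, assuming scikit-learn's RandomForestClassifier, whose bootstrap sampling, random attribute selection per split and out-of-bag error estimation roughly mirror the steps above (the dataset is a placeholder):

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X_train, y_train = make_classification(n_samples=500, n_features=20, random_state=0)

clf = RandomForestClassifier(
    n_estimators=100,        # k trees, each grown on a bootstrap sample
    max_features="sqrt",     # m attributes chosen at random at each split
    bootstrap=True,
    oob_score=True,          # out-of-bag error estimate (step 3)
    random_state=0,
).fit(X_train, y_train)

print(clf.oob_score_)             # accuracy estimated from the out-of-bag samples
print(clf.predict(X_train[:5]))   # prediction by majority vote among the trees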
Support Vector Machine
• Support Vector Machine (SVM) is a supervised machine learning approach used for classification and regression.
• The basic form of SVM is a binary classifier that classifies the data points into one of the two classes.
• SVM training involves determining the maximum margin hyperplane that separates the two classes.
• The maximum margin hyperplane is one which has the largest separation from the nearest training data point.
• Given a training data set (xi, yi) where xi is an n-dimensional vector and yi = 1 if xi is in class 1 and yi = -1 if xi is in class 2.
• A standard SVM finds a hyperplane w.x - b = 0, which correctly separates the training data points and has a maximum margin, which is the distance between the two hyperplanes w.x - b = 1 and w.x - b = -1.
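
A minimal sketch of binary classification with linear and RBF-kernel SVMs, assuming scikit-learn's SVC (the dataset is a placeholder):

from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=2, n_redundant=0, random_state=0)

linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)    # maximum margin hyperplane w.x - b = 0
rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X, y)  # non-linear decision boundary

print(linear_svm.score(X, y), rbf_svm.score(X, y))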

Fig: Maximum margin hyperplane

Fig: Binary classification with linear SVM

Fig: Binary classification with RBF SVM

Recommendation Systems
• Recommendation systems are an important part of modern cloud
applications such as e-Commerce, social networks, content delivery
networks, etc.
• Item-based or Content-based Recommendation
• Provides recommendations to users (for items such as books, movies,
songs, or restaurants) for unrated items based on the characteristics of the
item.
• Collaborative Filtering
• Provides recommendations based on the ratings given by the user and
other users to similar items.
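
A small sketch of the collaborative filtering idea: item-to-item cosine similarity is computed from a user-item rating matrix and used to score an unrated item. The rating matrix is a placeholder and real systems use far more sophisticated models.

import numpy as np

# Placeholder user-item rating matrix (rows: users, columns: items, 0 = unrated)
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

# Item-to-item cosine similarity
norms = np.linalg.norm(R, axis=0)
sim = (R.T @ R) / (np.outer(norms, norms) + 1e-9)

# Predict user 0's rating for item 2 as a similarity-weighted average of their rated items
user, item = 0, 2
rated = R[user] > 0
pred = np.dot(sim[item, rated], R[user, rated]) / (sim[item, rated].sum() + 1e-9)
print(round(pred, 2))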

Multimedia Cloud
Multimedia Cloud Reference Architecture
Infrastructure Services
• In the Multimedia Cloud reference architecture, the first layer is the
infrastructure services layer that includes computing and storage resources.
Platform Services
• On top of the infrastructure services layer is the platform services layer
that includes frameworks and services for streaming and associated tasks
such as transcoding and analytics that can be leveraged for rapid
development of multimedia applications.
Applications
• The topmost layer is the applications such as live video streaming, video
transcoding, video-on-demand, multimedia processing etc.
• Cloud-based multimedia applications alleviate the burden of installing and maintaining multimedia applications locally on the multimedia consumption devices (desktops, tablets, smartphones, etc.) and provide access to rich multimedia content.

Service Models
• A multimedia cloud can have various service models such as IaaS, PaaS
and SaaS that offer infrastructure, platform or application services.

Fig: Multimedia Cloud Reference Architecture

Case Study: Multimedia Cloud - Live Video Streaming


Workflow of a live video streaming application that uses multimedia
cloud:
• The video and audio feeds generated by a number cameras and
microphones are mixed/multiplexed with video/audio mixers and then
encoded by a client application which then sends the encoded feeds to the
multimedia cloud.
• On the cloud, streaming instances are created on-demand and the streams
are then broadcast over the internet.
• The streaming instances also record the event streams which are later
moved to the cloud storage for video archiving.

Fig: Workflow for live video streaming using multimedia cloud

Streaming Protocols
• RTMP Dynamic Streaming (Unicast)
• High-quality, low-latency media streaming with support for live and on-
demand and full adaptive bitrate.
• RTMPE (encrypted RTMP)
• Real-time encryption of RTMP.
• RTMFP (multicast)
• IP multicast encrypted with support for both ASM or SSM multicast for
multicast-enabled network.
• RTMFP (P2P)
• P2P live video delivery between Flash Player clients.
• RTMFP (multicast fusion)
• IP and P2P working together to support higher QoS within enterprise
networks.
• HTTP Dynamic Streaming (HDS)
• Enabling on-demand and live adaptive bitrate video streaming of
standards-based MP4 media over regular HTTP connections.
• Protected HTTP Dynamic Streaming (PHDS)
• Real-time encryption of HDS.
• HTTP Live Streaming (HLS)
• HTTP streaming to iOS devices or devices that support the HLS format;
optional encryption with AES128 encryption standard.
RTMP Streaming
• Real Time Messaging Protocol (RTMP) is a protocol for streaming audio,
video and data over the Internet.
• The plain version of RTMP protocol works on top of TCP. RTMPS is a
secure variation of RTMP that works over TLS/SSL.
• RTMP provides a bidirectional message multiplex service over a reliable
stream transport, such as TCP.
• RTMP maintains persistent TCP connections that allow low-latency
communication.
• RTMP is intended to carry parallel streams of video, audio, and data
messages, with associated timing information, between a pair of
communicating peers.
• Streams are split into fragments so that delivery of the streams smoothly.
• The size of the stream fragments is either fixed or negotiated dynamically
between the client and server.
• Default fragment sizes used are 64-bytes for audio data, and 128 bytes for
video data.
• RTMP implementations typically assign different priorities to different
classes of messages, which can affect the order in which messages are
enqueued to the underlying stream transport when transport capacity is
constrained.
HTTP Live Streaming
• HTTP Live Streaming (HLS) can dynamically adjust playback quality to
match the available speed of wired or wireless networks.
• HLS supports multiple alternate streams at different bit rates, and the
client software can switch streams intelligently as network bandwidth
changes.
• HLS also provides for media encryption and user authentication over
HTTPS, allowing publishers to protect their work.
• The protocol works by splitting the stream into small chunks which are
specified in a playlist file.
• Playlist file is an ordered list of media URIs and informational tags.
• The URIs and their associated tags specify a series of media segments.
• To play the stream, the client first obtains the playlist file and then obtains
and plays each media segment in the playlist .
HTTP Dynamic Streaming
• HTTP Dynamic Streaming (HDS) enables on-demand and live adaptive bitrate video delivery of standards-based MP4 media (H.264 or VP6) over regular HTTP connections.
• HDS combines HTTP (progressive download) and RTMP (streaming download) to provide the ability to deliver video content in a streaming manner over HTTP.
• HDS supports adaptive bitrate which allows HDS to detect the client’s
bandwidth and computer resources and serve content fragments encoded at
the most appropriate bitrate for the best viewing experience.

• HDS supports high-definition video up to 1080p, with bitrates from 700
kbps up to and beyond 6Mbps, using either H.264 or VP6 video codecs, or
AAC and MP3 audio codecs.
• HDS allows leveraging existing caching infrastructures, content delivery
networks (CDNs) and standard HTTP server hardware to deliver on-
demand and live content.

Live Video Steaming App – Case Study


Functionality
• Live video streaming application allows on-demand creation of video
streaming instances in the cloud.
Development
• The live streaming application is created using the Django framework and
uses Amazon EC2 cloud instances.
• For video stream encoding and publishing, the Adobe Flash Media Live
Encoder and Flash Media Server are used.

Fig: Screenshots of live video streaming application showing video streaming page
Video Transcoding App – Case Study
Functionality
• Video transcoding application is based on multimedia cloud.
• The transcoding application allows users to upload video files and choose
the conversion presets.
Development
• The application is built upon the Amazon Elastic Transcoder.

• Elastic Transcoder is highly scalable, relatively easy to use service from
Amazon that allows converting video files from their source format into
versions that will playback on mobile devices like smartphones, tablets and
PCs.

Video Transcoding App – Demo

Fig: Screenshot of video transcoding app showing video uploading form

Cloud Application Benchmarking & Tuning

Benchmarking
• Benchmarking of cloud applications is important for the following reasons:
• Provisioning and capacity planning
• The process of provisioning and capacity planning for cloud applications
involves determining the amount of computing, memory and network
resources to provision for the application.
• Benchmarking can help in comparing alternative deployment architectures
and choosing the best and most cost effective deployment architecture that
can meet the application performance requirements.
• Ensure proper utilization of resources
• Benchmarking can help in determining the utilization of computing,
memory and network resources for applications and identify resources
which are either under-utilized or overprovisioned and hence save
deployments costs.
• Market readiness of applications
• Performance of an application depends on the characteristics of the
workloads it experiences. Different types of workloads can dead to different
performance for the same application.
• To ensure the market readiness of an application it is important to model
all types of workloads the application can experience and benchmark the
application with such workloads.
Cloud Application Benchmarking - Steps
• Trace Collection/Generation
• The first step in benchmarking cloud applications is to collect/generate
traces of real application workloads. For generating a trace of workload, the
application is instrumented to log information such as the requests
submitted by the users, the time-stamps of the requests, etc.
• Workload Modeling
• Workload modeling involves creation of mathematical models that can be
used for generation of synthetic workloads.

• Workload Specification
• Since the workload models of each class of cloud computing applications
can have different workload attributes, a Workload Specification Language
(WSL) is often used for specification of application workloads. WSL can
provide a structured way for specifying the workload attributes that are
critical to the performance of the applications. WSL can be used by

synthetic workload generators for generating workloads with slightly
varying the characteristics.
• Synthetic Workload Generation
• Synthetic workloads are used for benchmarking cloud applications. An
important requirement for a synthetic workload generator is that the
generated workloads should be representative of the real workloads.

Synthetic Workload Generation Approaches


Empirical approach
• In this approach traces of applications are sampled and replayed to
generate the synthetic workloads.
• The empirical approach lacks flexibility as the real traces obtained from a
particular system are used for workload generation which may not well
represent the Workloads on other systems with different configurations and
load conditions.
Analytical approach
• Uses mathematical models to define the workload characteristics that are
used by a synthetic workload generator.
• Analytical approach is flexible and allows generation of workloads with
different characteristics by varying the workload model attributes.
• With the analytical approach it is possible to modify the workload model
parameters one at a time and investigate the effect on application
performance to measure the application sensitivity to different parameters.
User Emulation vs Aggregate Workloads
The commonly used techniques for workload generation are:
User Emulation
• Each user is emulated by a separate thread that mimics the actions of a user by alternating between making requests and lying idle (a minimal sketch of this approach follows this section).
• The attributes for workload generation in the user emulation method include think time, request types, inter-request dependencies, for instance.
• User emulation allows fine-grained control over modeling the behavioral aspects of the users interacting with the system under test; however, it does not allow controlling the exact time instants at which the requests arrive at the system.
Aggregate Workload Generation:
• Allows specifying the exact time instants at which the requests should
arrive the system under test.
• However, there is no notion of an individual user in aggregate workload
generation, therefore, it is not possible to use this approach when
dependencies between requests need to be satisfied.
• Dependencies can be of two types inter-request and data dependencies.
• An inter-request dependency exists when the current request depends on
the previous request, whereas a data dependency exists when the current
requests requires input data which is obtained from the response of the
previous request.
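
A minimal Python sketch of the user emulation approach referenced above: each emulated user runs in its own thread, alternating between a request and an exponentially distributed think time. The target URL, number of users, session length and think-time distribution are illustrative assumptions.

import random
import threading
import time
import requests

BASE_URL = "http://example.com/app"   # hypothetical system under test

def emulate_user(session_length=10, mean_think_time=2.0):
    # One thread per emulated user: submit a request, then lie idle for a think time
    for _ in range(session_length):
        requests.get(BASE_URL)
        time.sleep(random.expovariate(1.0 / mean_think_time))

# 20 concurrent emulated users
threads = [threading.Thread(target=emulate_user) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()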
Workload Characteristics
Session
• A set of successive requests submitted by a user constitutes a session.
Inter-Session Interval
• Inter-session interval is the time interval between successive sessions.
Think Time
• In a session, a user submits a series of requests in succession. The time
interval between two successive requests is called think time.
Session Length
• The number of requests submitted by a user in a session is called the
session length.
Workload Mix
• Workload mix defines the transitions between different pages of an
application and the proportion in which the pages are visited.

Application Performance Metrics


The most commonly used performance metrics for cloud applications are:
Response Time
• Response time is the time interval between the moment when the user
submits a request to the application and the moment when the user receives
a response.
Throughput
• Throughput is the number of requests that can be serviced per second.

Considerations for Benchmarking Methodology


Accuracy
• Accuracy of a benchmarking methodology is determined by how closely
the generated synthetic workloads mimic the realistic workloads.
Ease of Use
• A good benchmarking methodology should be user friendly and should
involve minimal hand coding effort for writing scripts for workload
generation that take into account the dependencies between requests,
workload attributes, for instance.

Flexibility
• A good benchmarking methodology should allow fine grained control
over the workload attributes such as think time, inter-session interval,
session length, workload mix, for instance, to perform sensitivity analysis.
• Sensitivity analysis is performed by varying one workload characteristic
at a time while keeping the others constant.
Wide Application Coverage
• A good benchmarking methodology is one that works for a wide range of
applications and not tied to the application architecture or workload types.

Types of Tests

Baseline Tests
• Baseline tests are done to collect the performance metrics data of the
entire application or a component of the application.
• The performance metrics data collected from baseline tests is used to
compare various performance tuning changes which are
subsequently made to the application or a component.
Load Tests
• Load tests evaluate the performance of the system with multiple users and
workload levels that are encountered in the production phase.
• The number of users and workload mix are usually specified in the load
test configuration.
Stress Tests
• Stress tests load the application to a point where it breaks down.
• These tests are done to determine how the application fails, the conditions
in which the application fails and the metrics to monitor
which can warn about impending failures under elevated workload levels.
Soak Tests
• Soak tests involve subjecting the application to a fixed workload level for
long periods of time.
• Soak tests help in determining the stability of the application under
prolonged use and how the performance changes with time.

Deployment Prototyping
• Deployment prototyping can help in making deployment architecture
design choices.
• By comparing performance of alternative deployment architectures,
deployment prototyping can help in choosing the best and most cost

effective deployment architecture that can meet the application performance
requirements.
• Deployment design is an iterative process that involves the following
steps:
Deployment Design
• Create the deployment with various tiers as specified in the deployment
configuration and deploy the application.
Performance Evaluation
• Verify whether the application meets the performance requirements with
the deployment.
Deployment Refinement
• Deployments are refined based on the performance evaluations. Various
alternatives can exist in this step such as vertical scaling, horizontal scaling,
for instance.

Fig: Steps involved in deployment prototyping for a cloud application

Performance Evaluation Workflow

Semi-Automated Workflow (Traditional Approach)


• In traditional approach to capture workload characteristics, a real user’s
interactions with a cloud application are first recorded as virtual user
scripts.
• The recorded virtual user scripts then are parameterized to account for
randomness in application and workload parameters.
• Multiple scripts have to be recorded to create different workload
scenarios. This approach involves a lot of manual effort.
• To add new specifications for workload mix and new requests, new scripts
need to be recorded and parameterized.
• Traditional approaches which are based on manually generating virtual
user scripts by interacting with a cloud application, are not able to generate
synthetic workloads which have the same characteristics as real workloads.
• Traditional approaches do not allow rapidly comparing various
deployment architectures.

Fig: Traditional performance evaluation workflow

Performance Evaluation Workflow

Fully-Automated Workflow (Modern Approach)


• In the automated approach real traces of a multi-tier application which are
logged on web servers, application servers and database servers are
analyzed to generate benchmark and workload models that capture the
cloud application and workload characteristics.
• A statistical analysis of the user requests in the real traces is performed to
identify the right distributions that can be used to model the workload
model attributes.
• Real traces are analyzed to generate benchmark and workload models.
• Various workload scenarios can be created by changing the specifications
of the workload model.
• Since real traces from a cloud application are used to capture workload and
application characteristics into workload and benchmark models, the
generated synthetic workloads have the same characteristics as real
workloads.
• An architecture model captures the deployment configurations of multi-
tier applications.

Fig: Fully-Automated Workflow (Modern Approach)


Benchmarking Case Study
• Fig (a) shows the average throughput and response time. The observed
throughput increases as demanded request rate increases. As more number
of requests are served per second by the application, the response time also
increases. The observed throughput saturates beyond a demanded request
rate of 50 req/sec.
• Fig (b) shows the CPU usage density of one of the application servers.
This plot shows that the application server CPU is non-saturated resource.
• Fig (c) shows the database server CPU usage density. From this density
plot we observe that the database CPU spends a large percentage of time at
high utilization levels for demanded request rate more than 40 req/sec.
• Fig (d) shows the density plot of the database disk I/O bandwidth.
• Fig (e) shows the network out rate for one of the application servers
• Fig (f) shows the density plot of the network out rate for the database
server. From this plot we observe a continuous saturation of the network
out rate around 200KB/s.

Analysis
• Throughput continuously increases as the demanded request rate increases
from 10 to 40 req/sec. Beyond 40req/sec demanded request rate, we
observe that throughput saturates, which is due to the high CPU utilization
density of the database server CPU. From the analysis of density plots of
various system resources we observe that the database CPU is a system
bottleneck.

Fig: (a) Plot of average throughput and response time results obtained from load testing with httperf, (b) application server-1 CPU utilization density, (c) database server CPU utilization density, (d) database server disk I/O bandwidth, (e) application server-1 network outgoing rate density, (f) database server network outgoing rate density

PART-A (2-Mark Questions)
1.What is big data approach?

2.List out the applications of big data analytics.

3.List the Characteristics of big data.

4.List the classification Algorithms.

5. What are the components of multimedia Architecture.

6.List the streaming protocols

7.What are the workload Characteristics?

8. Define Throughput.

9. List the various steps in Deployment Prototyping.

10.What are the design Considerations for Benchmarking Methodology.

11.What are the various tests that can be done using the benchmarking
tools.

12. What is Stress Testing?

PART-B (10-Mark Questions)


1.Explain Classification of Big data, Recommendation of Systems.
2.Explain case Study: Video Transcoding App.
3.Explain the Workload Characteristics.
4.List the Application Performance Metrics.
5.What are the Benchmarking Tools?
6.Explain Load Testing & Bottleneck Detection case Study.
7.Explain Hadoop benchmarking case Study.

UNIT-V
Cloud Security

Introduction:
“Security in the Cloud is much like security in your on-premises data
centers – only without the costs of maintaining facilities and hardware. In
the Cloud, you don’t have to manage physical servers or storage devices.
Instead, you use software-based security tools to monitor and protect the
flow of information into and out of your Cloud resources.”

Cloud Security Challenges


Authentication
• Authentication refers to digitally confirming the identity of the entity
requesting access to some protected information.
• In a traditional in-house IT environment authentication polices are under
the control of the organization. However, in cloud computing environments,
where applications and data are accessed over the internet, the complexity
of digital authentication mechanisms increases rapidly.
Authorization
• Authorization refers to digitally specifying the access rights to the
protected resources using access policies.
• In a traditional in-house IT environment, the access policies are controlled
by the organization and can be altered at their convenience.
• Authorization in a cloud computing environment requires the use of the
cloud service providers services for specifying the access policies.
Security of data at rest
• Due to the multi-tenant environments used in the cloud, the application
and database servers of different applications belonging to different
organizations can be provisioned side-by-side increasing the complexity of
securing the data.
• Appropriate separation mechanisms are required to ensure the isolation
between applications and data from different organizations.
Security of data in motion
• In traditional in-house IT environments all the data exchanged between
the applications and users remains within the organization’s control and
geographical boundaries.
• With the adoption of the cloud model, the applications and the data are
moved out of the in-house IT infrastructure to the cloud provider.
• Therefore, appropriate security mechanisms are required to ensure the security of data while in motion.
Data Integrity
• Data integrity ensures that the data is not altered in an unauthorized
manner after it is created, transmitted or stored. Due to the outsourcing of
data storage in cloud computing environments, ensuring integrity of data is
important.

Auditing
• Auditing is very important for applications deployed in cloud computing
environments.
• In traditional in-house IT environments, organizations have complete
visibility of their applications and accesses to the protected information.
• For cloud applications appropriate auditing mechanisms are required to
get visibility into the application, data accesses and actions performed by
the application users, including mobile users and devices such as wireless
laptops and smartphones.

CSA Cloud Security Architecture


• Cloud Security Alliance (CSA) provides a Trusted Cloud Initiative (TCI) Reference Architecture.
• TCI is a methodology and a set of tools that enable cloud application
developers and security architects to assess where their internal IT and their
cloud providers are in terms of
security capabilities, and to plan a roadmap to meet the security needs of
their business.
• Security and Risk Management (SRM) domain within the TCI Reference
includes:
• Governance, Risk Management, and Compliance
• Information Security Management
• Privilege Management Infrastructure
• Threat and Vulnerability Management
• Infrastructure Protection Services
• Data Protection
• Policies and Standards

Fig: Security and risk management (SRM) domain.

Authentication
• Authentication refers to confirming the digital identity of the entity
requesting access to some protected information.
• The process of authentication involves, but is not limited to, validating at least one factor of identification of the entity to be authenticated.
• A factor can be something the entity or the user knows (password or pin),
something the user has (such as a smart card), or something that can
uniquely identify the user (such as fingerprints).
• In multifactor authentication more than one of these factors are used for
authentication.
• There are various mechanisms for authentication including:
• SSO
• SAML-Token
• OTP
Single Sign-on (SSO)
• Single Sign-on (SSO) enables users to access multiple systems or
applications after signing in only once, for the first time.
• When a user signs in, the user identity is recognized and there is no need
to sign in again and again to access related systems or applications.
• Since different systems or applications may internally use different authentication mechanisms, SSO, upon receiving the initial credential, translates it to different credentials for the different systems or applications.
• The benefit of using SSO is that it reduces human error and saves time
spent in authenticating with different systems or applications for the same
identity.
There are different implementation mechanisms:
• SAML-Token
• Kerberos
SAML-Token
• Security Assertion Markup Language (SAML) is an XML-based open standard data format for exchanging security information (authentication and authorization data) between an identity provider and a service provider.
SAML-token based SSO authentication
• When a user tries to access the cloud application, a SAML request is
generated and the user is redirected to the identity provider.
• The identity provider parses the SAML request and authenticates the
user. A SAML token is returned to the user, who then accesses the cloud
application with the token.
• SAML prevents man-in-the-middle and replay attacks by requiring the use
of SSL encryption when transmitting assertions and messages.
• SAML also provides a digital signature mechanism that enables the
assertion to have a validity time range to prevent replay attacks.

Fig: SAML token based SSO authentication

Kerberos
• Kerberos is an open authentication protocol that was developed at MIT.
• Kerberos uses tickets for authenticating clients to services that
communicate over an insecure network.
• Kerberos provides mutual authentication, i.e. both the client and the server
authenticate with each other.

Fig: Kerberos authentication flow

One Time Password (OTP)


• One time password is another authentication mechanism that uses
passwords which are valid for single use only for a single transaction or
session.
• Authentication mechanism based on OTP tokens are more secure because
they are not vulnerable to replay attacks.
• Text messaging (SMS) is the most common delivery mode for OTP
tokens.
• The most common approach for generating OTP tokens is time
synchronization.

• Time-based OTP algorithm (TOTP) is a popular time synchronization
based algorithm for generating OTPs.
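As an illustration of the time-synchronization approach, the sketch below derives a 6-digit code from a shared secret and the current 30-second time window, in the spirit of TOTP; it is a simplified Python example (HMAC-SHA1 with dynamic truncation), not a production implementation.

    import hmac, hashlib, struct, time

    def totp(secret: bytes, time_step: int = 30, digits: int = 6) -> str:
        """Generate a time-based one-time password from a shared secret."""
        counter = int(time.time()) // time_step           # current time window
        msg = struct.pack(">Q", counter)                  # 8-byte big-endian counter
        digest = hmac.new(secret, msg, hashlib.sha1).digest()
        offset = digest[-1] & 0x0F                        # dynamic truncation
        code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
        return str(code % (10 ** digits)).zfill(digits)

    # Server and token device share the same secret, so both compute the same
    # code within a given time window; an intercepted code expires quickly.
    print(totp(b"shared-secret-key"))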
Authorization
• Authorization refers to specifying the access rights to the protected
resources using access policies.

OAuth
• OAuth is an open standard for authorization that allows resource owners
to share their private resources stored on one site with another site without
handing out the credentials.
• In the OAuth model, an application (which is not the resource owner)
requests access to resources controlled by the resource owner (but hosted
by the server).
• The resource owner grants permission to access the resources in the form
of a token and matching shared-secret.
• Tokens make it unnecessary for the resource owner to share its credentials
with the application.
• Tokens can be issued with a restricted scope and limited lifetime, and
revoked independently.

Fig: OAuth authorization flow
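To make the token-based model concrete, the sketch below shows a client presenting a previously issued access token to a resource server. The endpoint URL and token value are placeholders, and the third-party requests library is assumed to be installed.

    import requests

    # Hypothetical values: the token was granted earlier by the resource owner
    # through the OAuth authorization flow.
    ACCESS_TOKEN = "example-access-token"
    RESOURCE_URL = "https://api.example.com/v1/photos"

    # The client never sees the resource owner's credentials; it only presents
    # the scoped, expirable, revocable token.
    response = requests.get(
        RESOURCE_URL,
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        timeout=10,
    )
    if response.status_code == 401:
        print("Token expired or revoked - a new grant is needed")
    else:
        print(response.json())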

Identity & Access Management
• Identity management provides consistent methods for digitally identifying
persons and maintaining associated identity attributes for the users across
multiple organizations.
• Access management deals with user privileges.
• Identity and access management deal with user identities, their
authentication, authorization and access policies.
Federated Identity Management
• Federated identity management allows users of one domain to securely
access data or systems of another domain seamlessly without the need for
maintaining identity information separately for multiple domains.
• Federation is enabled through the use of single sign-on mechanisms such as
SAML tokens and Kerberos.
Role-based access control
• Used for restricting access to confidential information to authorized users.
• These access control policies allow defining different roles for different
users.

Fig: Role based access control in the cloud
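A minimal sketch of a role-based access check (the roles and permissions are illustrative and not tied to any particular cloud provider's IAM service):

    # Map each role to the set of operations it is allowed to perform.
    ROLE_PERMISSIONS = {
        "admin":   {"read", "write", "delete", "manage_users"},
        "editor":  {"read", "write"},
        "analyst": {"read"},
    }

    def is_allowed(role: str, action: str) -> bool:
        """Return True if the given role grants the requested action."""
        return action in ROLE_PERMISSIONS.get(role, set())

    assert is_allowed("editor", "write")
    assert not is_allowed("analyst", "delete")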

Securing Data at Rest
• Data at rest is the data that is stored in database in the form of
tables/records, files on a file server or raw data on a distributed storage or
storage area network (SAN).
• Data at rest is secured by encryption.
• Encryption is the process of converting data from its original form (i.e.,
plaintext) to a scrambled form (ciphertext) that is unintelligible. Decryption
converts data from ciphertext to plaintext.
Encryption can be of two types:
• Symmetric Encryption (symmetric-key algorithms)
• Asymmetric Encryption (public-key algorithms)

Symmetric Encryption
• Symmetric encryption uses the same secret key for both encryption and
decryption.
• The secret key is shared between the sender and the receiver.
• Symmetric encryption is best suited for securing data at rest since the data
is accessed by known entities from known locations.
• Popular symmetric encryption algorithms include:
• Advanced Encryption Standard (AES)
• Twofish
• Blowfish
• Triple Data Encryption Standard (3DES)
• Serpent
• RC6
• MARS
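As a hedged illustration of symmetric encryption for data at rest, the sketch below uses the Fernet recipe from the third-party cryptography package (AES with an HMAC under the hood); the package is assumed to be installed and the plaintext is illustrative.

    from cryptography.fernet import Fernet

    # The same secret key encrypts and decrypts, so it must be shared securely
    # between writer and reader and kept apart from the encrypted data store.
    key = Fernet.generate_key()
    cipher = Fernet(key)

    plaintext = b"account number: 1234-5678"
    ciphertext = cipher.encrypt(plaintext)      # this is what is stored at rest
    recovered = cipher.decrypt(ciphertext)

    assert recovered == plaintext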

Asymmetric Encryption
• Asymmetric encryption uses two keys, one for encryption (public key)
and other for decryption (private key).
• The two keys are linked to each other such that one key encrypts
plaintext to ciphertext and the other decrypts ciphertext back to plaintext.
• Public key can be shared or published while the private key is known only
to the user.
• Asymmetric encryption is best suited for securing data that is exchanged
between two parties where symmetric encryption can be unsafe because the
secret key has to be exchanged between the parties and anyone who
manages to obtain the secret key can decrypt the data.
• In asymmetric encryption a separate key is used for decryption which is
kept private.
Fig: Asymmetric encryption using public/private keys
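A brief sketch of asymmetric encryption with RSA-OAEP, again assuming the third-party cryptography package; the receiver publishes the public key and keeps the private key secret.

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import padding, rsa

    # The receiver generates the key pair and publishes only the public key.
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_key = private_key.public_key()

    oaep = padding.OAEP(
        mgf=padding.MGF1(algorithm=hashes.SHA256()),
        algorithm=hashes.SHA256(),
        label=None,
    )

    # Anyone can encrypt with the public key ...
    ciphertext = public_key.encrypt(b"session key or other small secret", oaep)
    # ... but only the private-key holder can decrypt.
    assert private_key.decrypt(ciphertext, oaep) == b"session key or other small secret"

In practice asymmetric encryption is typically used to exchange a symmetric session key, and the bulk data is then encrypted symmetrically.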
Encryption Levels
Encryption can be performed at various levels:
Application
• Application level encryption involves encrypting application data right at
the point where it originates i.e. within the application.
• Application-level encryption provides security against threats both at the
level of the operating system and from other applications.
• An application encrypts all data generated in the application before it
flows to the lower levels and presents decrypted data to the user.
Host
• In host-level encryption, encryption is performed at the file-level for all
applications running on the host.
• Host level encryption can be done in software in which case additional
computational resource is required for encryption or it can be performed
with specialized hardware such as a cryptographic accelerator card.
Network
• Network-level encryption is best suited for cases where the threats to data
are at the network or storage level and not at the application or host level.
• Network-level encryption is performed when moving the data from a
creation point to its destination, using specialized hardware that encrypts
all incoming data in real-time.

Device
• Device-level encryption is performed on a disk controller or a storage
server.
• Device level encryption is easy to implement and is best suited for cases
where the primary concern about data security is to protect data residing on
storage media.

Fig: Encryption levels


Securing Data in Motion
• Securing data in motion, i.e., when the data flows between a client and a
server over a potentially insecure network, is important to ensure data
confidentiality and integrity.
• Data confidentiality means limiting the access to data so that only
authorized recipients can access it.
• Data integrity means that the data remains unchanged when moving from
sender to receiver.
• Data integrity ensures that the data is not altered in an unauthorized
manner after it is created, transmitted or stored.
• Transport Layer Security (TLS) and Secure Socket Layer (SSL) are the
mechanisms used for securing data in motion.
• TLS and SSL are used to encrypt web traffic using Hypertext Transfer
Protocol (HTTP).
• TLS and SSL use asymmetric cryptography for authentication of key
exchange, symmetric encryption for confidentiality and message
authentication codes for message integrity.

Fig: TLS Handshake
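As a small illustration of protecting data in motion, the sketch below wraps a client socket with TLS using Python's standard ssl module; the hostname is a placeholder for any HTTPS server.

    import socket
    import ssl

    HOSTNAME = "example.com"   # placeholder server

    # create_default_context() turns on certificate verification and hostname
    # checking with sensible protocol and cipher defaults.
    context = ssl.create_default_context()

    with socket.create_connection((HOSTNAME, 443)) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=HOSTNAME) as tls_sock:
            print("Negotiated protocol:", tls_sock.version())   # e.g. TLSv1.3
            tls_sock.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
            print(tls_sock.recv(200))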

Key Management
• Management of encryption keys is critical to ensure security of encrypted
data.
The key management lifecycle involves different phases including:
• Creation
• Backup
• Deployment
• Monitoring
• Rotation
• Expiration
• Archival
• Destruction
Key Management Approach (example)
• All keys for encryption must be stored in a data store which is separate
and distinct from the
actual data store.
• Additional security features such as key rotation and key encrypting keys
can be used.
• Keys can be automatically or manually rotated.
• In the automated key change approach, the key is changed after a certain
number of
transactions.
• All keys can themselves be encrypted using a master key.

Fig: Example of a key management approach
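The sketch below illustrates the key-encrypting-key idea from the example above: a data key is itself encrypted under a master key before being persisted, so the key store never holds a usable plaintext key. It reuses the Fernet recipe from the cryptography package and is only a sketch of the approach.

    from cryptography.fernet import Fernet

    # Master key (key-encrypting key), held in a separate, tightly controlled store.
    master = Fernet(Fernet.generate_key())

    # Data key used to encrypt the actual records.
    data_key = Fernet.generate_key()
    wrapped_data_key = master.encrypt(data_key)   # only this wrapped form is persisted

    # To use the data key, unwrap it with the master key first.
    record_cipher = Fernet(master.decrypt(wrapped_data_key))
    token = record_cipher.encrypt(b"sensitive record")
    assert record_cipher.decrypt(token) == b"sensitive record"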

Auditing
• Auditing is mandated by most data security regulations.
• Auditing requires that all read and write accesses to data be logged.
• Logs can include the user involved, type of access, timestamp, actions
performed and records accessed.
• The main purpose of auditing is to find security breaches, so that
necessary changes can be made in the application and deployment to
prevent further security breaches.
The objectives of auditing include:
• Verify efficiency and compliance of identity and access management
controls as per established access policies.
• Verifying that authorized users are granted access to data and services
based on their roles.
• Verify whether access policies are updated in a timely manner upon
change in the roles of the users.
• Verify whether the data protection policies are sufficient.
• Assessment of support activities such as problem management.
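A minimal sketch of an audit trail for data accesses, capturing the fields mentioned above (user, type of access, timestamp, record); the field names and log destination are illustrative.

    import json
    import logging
    from datetime import datetime, timezone

    logging.basicConfig(filename="data_access_audit.log", level=logging.INFO)

    def audit(user: str, action: str, record_id: str) -> None:
        """Append one structured audit entry per read or write access."""
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,            # e.g. "read" or "write"
            "record": record_id,
        }
        logging.info(json.dumps(entry))

    audit("alice@example.com", "read", "customer/101")
    audit("bob@example.com", "write", "customer/205")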

MIGRATING INTO A CLOUD


 The promise of cloud computing has raised the IT expectations of
small and medium enterprises beyond measure. Large companies are
deeply debating it. Cloud computing is a disruptive model of IT
whose innovation is part technology and part business model; in short,
a "disruptive techno-commercial model" of IT.
 We propose the following definition of cloud computing: "It is a
techno-business disruptive model of using distributed large-scale data
centers, either private or public or hybrid, offering customers a
scalable virtualized infrastructure or an abstracted set of services,
qualified by service-level agreements (SLAs) and charged only by
the abstracted IT resources consumed."
 As the figure below shows, the promise of the cloud, both on the
business front (the attractive cloudonomics) and on the technology
front, widely aided the CxOs to spawn out several non-mission-critical
IT needs from the ambit of their captive traditional data centers to
the appropriate cloud service.
Cloudonomics (the business promise):
• 'Pay per use' – lower cost barriers
• On-demand resources – autoscaling
• CAPEX vs. OPEX – no capital expenses (CAPEX), only operational expenses (OPEX)
• SLA-driven operations – much lower TCO
• Attractive NFR support: availability, reliability

Technology (the technology promise):
• 'Infinite' elastic availability – compute/storage/bandwidth
• Automatic usage monitoring and metering
• Jobs/tasks virtualized and transparently 'movable'
• Integration and interoperability 'support' for hybrid operations
• Transparently encapsulated and abstracted IT features

Fig: The promise of the cloud computing services.

 Several small and medium business enterprises, however, leveraged the
cloud much beyond the cautious user. Many startups opened their IT
departments exclusively using cloud services very successfully and with
high ROI. Having observed these successes, several large enterprises
have started successfully running pilots for leveraging the cloud.
 Many large enterprises run SAP to manage their
operations. SAP itself is experimenting with running its
suite of products: SAP Business One as well as SAP
Netweaver on Amazon cloud offerings.

THE CLOUD SERVICE OFFERINGS AND DEPLOYMENT
MODELS
 Cloud computing has been an attractive proposition both for the CFO
and the CTO of an enterprise, primarily due to its ease of usage. This
has been achieved by large data center service vendors, now better
known as cloud service vendors, again primarily due to their scale of
operations.
IaaS – Abstract compute/storage/bandwidth resources (e.g., Amazon Web Services: EC2, S3, SDB, CDN); consumed mainly by IT folks.
PaaS – Abstracted programming platform with encapsulated infrastructure (e.g., Google App Engine (Java/Python), Microsoft Azure); consumed mainly by programmers.
SaaS – Application with encapsulated infrastructure and platform; consumed by end users.

Cloud application deployment and consumption models: Public, Hybrid, Private.

Fig: The cloud computing service offerings and deployment models.

BROAD APPROACHES TO MIGRATING INTO THE CLOUD

 Cloud Economics deals with the economic rationale for leveraging the
cloud and is central to the success of cloud-based enterprise usage.
Decision-makers, IT managers, and software architects are faced with
several dilemmas when planning for new Enterprise IT initiatives.

THE SEVEN-STEP MODEL OF MIGRATION INTO A CLOUD

 Typically migration initiatives into the cloud are implemented in
phases or in stages. A structured and process-oriented approach to
migration into a cloud has the advantage of capturing within itself
the best practices of many migration projects.

1. Conduct Cloud Migration Assessments


2. Isolate the Dependencies
3. Map the Messaging & Environment
4. Re-architect & Implement the lost Functionalities
5. Leverage Cloud Functionalities & Features
6. Test the Migration
7. Iterate and Optimize

The Seven-Step Model of Migration into the Cloud.

Fig: The iterative Seven-Step Model of Migration into the Cloud (Assess → Isolate → Map → Re-architect → Augment → Test → Optimize, then iterate).


Migration Risks and Mitigation
 The biggest challenge to any cloud migration project is
how effectively the migration risks are identified and
mitigated. In the Seven-Step Model of Migration into
the Cloud, the process step of testing and validating
includes efforts to identify the key migration risks. In
the optimization step, we address various approaches
to mitigate the identified migration risks.

 Migration risks for migrating into the cloud fall under two broad
categories: the general migration risks and the
security-related migration risks. In the former we
address several issues including performance monitoring
and tuning—essentially identifying all possible
production level deviants; the business continuity and
disaster recovery in the world of cloud computing
service.

 the compliance with standards and governance issues;
the IP and licensing issues; the quality of service
(QoS) parameters as well as the corresponding SLAs
committed to; the ownership, transfer, and storage of
data in the application; the portability and
interoperability issues which could help mitigate
potential vendor lock-ins; the issues that result in
trivializing and not comprehending the complexities of
migration, which result in migration failure and loss of
senior management's business confidence in these
efforts.
ORGANIZATIONAL READINESS AND CHANGE
MANAGEMENT IN THE CLOUD AGE

 Studies for Organization for Economic Co-operation and
Development (OECD) economies in
2002 demonstrated that there is a strong
correlation between changes in organization and
workplace practices and investment in information
technologies . This finding is also further
confirmed in Canadian government studies, which
indicate that the frequency and intensity of
organizational changes is positively correlated
with the amount and extent of information
technologies investment. It means that the
incidence of organizational change is much higher
in the firms that invest in information technologies
(IT) than is the case in the firms that do not invest
in IT, or those that invest less than the competitors
in the respective industry.

 IT must adopt emerging technologies to facilitate
business to leverage the new technologies to create
new opportunities, or to gain productivity and
reduce cost. Sometimes emerging technology (e.g.,
cloud computing: IaaS, PaaS, SaaS) is quite
disruptive to the existing business process,
including core IT services— for example, IT
service strategy, service design, service transition,
service operation, and continual service
improvement—and requires fundamental re-
thinking of how to minimize the negative impact
to the business, particularly the potential impact on
morale and productivity of the organization.
The Context
 The adaptation of cloud computing has forced
many companies to recognize that clarity of
ownership of the data is of paramount importance.
The protection of intellectual property (IP) and
other copyright issues is of big concern and needs
to be addressed carefully.
The Take Away
 Transition the organization to a desirable level of
change management maturity level by enhancing
the following key domain of knowledge and
competencies:

Domain 1. Managing the Environment: Understand the organization
(people, process, and culture).
Domain 2. Recognizing and Analyzing the Trends (Business and
Technology): Observe the key drivers for change.
Domain 3. Leading for Results: Assess organizational readiness and
architect solutions that deliver definite business values.

BASIC CONCEPT OF ORGANIZATIONAL READINESS


 Change can be challenging; it brings out the fear
of having to deal with uncertainties. This is the
FUD syndrome: Fear, Uncertainty, and Doubt.
Employees understand and get used to their roles
and responsibility and are able to leverage their
strength. It is a common, observable human
behavior that people tend to become comfortable
in an unchanging and stable environment, and will
become uncomfortable and excited when any
change occurs, regardless of the level and intensity of
the change.
 A survey done by Forrester in June 2009 suggested
that large enterprises are going to gravitate toward
private clouds. The three reasons most often advanced for this are:

1. Protect Existing Investment: By building a private cloud to
leverage existing infrastructure.
2. Manage Security Risk: Placing private cloud
computing inside the company reduces some of the
fear (e.g., data integrity and privacy issues) usually
associated with public cloud.
A Case Study: Waiting in Line for a Special Concert Ticket
 It is a Saturday morning in the winter, the
temperature is -12°C outside, and you have been
waiting in line outside the arena since 5:00 AM
this morning for concert tickets to see a
performance by Supertramp. What is your
reaction? What should you do now without the
tickets? Do you need to change the plan? Your
reaction would most likely be something like this:

 Denial. You are in total disbelief, and the first
thing you do is to reject the fact that the concert
has been sold out.
 Anger. You probably want to blame the weather;
you could have come here 10 minutes earlier.
 Bargaining. You try to convince the clerk to check
again for any available seats.
 Depression. You are very disappointed and do not know
what to do next.
 Acceptance. Finally accepting the inevitable fate,
you go to plan B if you have one.
 The five-stage process illustrated above was
originally proposed by Dr. Elisabeth Kübler-Ross to
deal with catastrophic news. There are times in which
people receive news that can seem catastrophic; for
example, a company merger, right-sizing, and so on.

What Do People Fear?


 Let’s look at this from a different perspective and try to
listen to and understand what people are saying when they
first encounter change.

“That is not the way we do things here,” or “It is different in here.”


 People are afraid of change because they feel far
more comfortable and safe by not going outside
their comfort zone, by not rocking the boat and
staying in the unchanged state.

“It is too risky ”


 People are also afraid of losing their position,
power, benefits, or even their jobs in some
instances. It is natural for people to try to defend
and protect their work and practice.
 The more common concerns are related to cloud
computing, and some of them are truly
legitimate and require further study, including:
 Security and privacy protection
 Loss of control (i.e., paradigm shift)
 New model of vendor relationship management
 More stringent contract negotiation and service-level
agreement (SLA)
 Availability of an executable exit strategy

DRIVERS FOR CHANGES:

A Framework To Comprehend The Competitive Environment


The Framework.The five driving factors for change
encapsulated by the framework are:
● Economic (global and local, external and internal)
● Legal, political, and regulatory compliance
● Environmental (industry structure and trends)
● Technology developments and innovation
● Sociocultural (markets and customers)
● The five driving factors for change is an approach to
investigate, analyze, and forecast the emerging trends
of a plausible future, by studying and understanding
the five categories of drivers for change. The results
will help the business to make better decisions, and it
will also help shape the short- and long-term strategies
of that business.
● Every organization’s decisions are influenced by particular key
factors, some of them are within the organization’s control, such
as (a) internal financial weakness and strength and (b) technology
development and innovation, and therefore the organization has
more control. The others, such as legal compliance issues,
competitor capabilities, and strategies, are all external factors over
which the organization has little or no control. In a business
setting, it helps us to visualize and familiarize ourselves with
future possibilities (opportunities and threats).

Economic (Global and Local, External and Internal)
Following are sample questions that could help to provoke further
discussion:
● What is the current economic situation?
● What will the economy look like in 1 year, 2 years, 3 years, 5 years,
and so on?
● What are some of the factors that will influence the economy?

Legal, Political, and Regulatory Compliance
● This section deals with issues of transparency,
compliance, and conformity. The objective is to be
a good corporate citizen and industry leader and to
avoid the potential cost of legal threats from
external factors.
The following are sample questions that could help
to provoke further discussion:
● What are the regulatory compliance requirements?
● What is the implication of noncompliance?
● What are the global geopolitical issues?
● What is the future economic outlook?
● Is capital easy to access?
● How does this technology transcend the existing business model?
● Buy vs. build? Which is the right way?
● What is the total cost of ownership (TCO)?

Environmental (Industry Structure and Trends)

Environmental factors usually deal with the quality of the natural
environment, human health, and safety. The following are sample
questions that could help to provoke further discussion:
● What is the implication of global warming concern?
● Is a green data center over-hyped?
● How can IT initiatives help and support
organizational initiatives to reduce
carbon footprint?
● Can organizations and corporations leverage
information technology, including cloud
computing to pursue sustainable development?
Technology Developments and Innovation
Scientific discoveries are seen to be key drivers of
economic growth; leading economists have identified
technological innovations as the single most important
contributing factor in sustained economic growth.
The following are sample questions that could
help to provoke further discussion:

● When will the IT industry standards be finalized? By whom? The
Institute of Electrical and Electronics Engineers (IEEE)?
● Who is involved in the standardization process?
● Who is the leader in cloud computing technology?
● What about virtualization of application—operating
system (platform) pair (i.e., write once, run
anywhere)?
● How does this emerging technology (cloud computing)
open up new areas for innovation?
● How can an application be built once so it can
configure dynamically in real time to operate most
effectively, based on the situational constraint(e.g., out
in the cloud somewhere, you might have bandwidth
constraint to transfer needed data)?
● What is the guarantee from X Service Providers (XSP)
that the existing applications will still be compatible
with the future infrastructure (IaaS)? Will the data still
be executed correctly?
Sociocultural (Markets and Customers)
Societal factors usually deal with the intimate
understanding of the human side of changes and with the
quality of life in general. A case in point: The companies
that make up the U.S. defense industry have seen more
than 50% of their market disappear.
The following are sample questions that could
help to provoke further discussion:

● What are the shifting societal expectations and trends?


● What are the shifting demographic trends?
● How does this technology change the user experience?
● Is the customer the king?
● Buy vs. build? Which is the right way?
● How does cloud computing change the world?
● Is cloud computing over-hyped?

Creating a Winning Environment


At the cultural level of an organization, change too
often requires a lot of planning and resources. In order to
overcome this, executives must articulate a new vision
and must communicate aggressively and extensively to
make sure that every employee understands:

1. The new direction of the firm (where we want to go today)


2. The urgency of the change needed
3. What the risks are of
a. Maintaining the status quo
b. Making the change
4. What the new role of the employee will be
5. What the potential rewards are

● Build a business savvy IT organization


● Are software and hardware infrastructure an unnecessary
burden?
● What kind of things does IT do that matter most to
business?
● Would the IT professional be better off
focusing on highly valued product issues?
● Cultivate an IT savvy business organization.
● Do users require new skills and expertise?
One of the important value propositions of cloud computing should be to
explain to the decision makers and the users the benefits of:
● Buy and not build
● No need for a large amount of up-front capital investment
● Opportunity to relieve your smartest people from costly data center
operational activities, and switch their focus to value-added activities
● Keep integration (technologies) simple

COMMON CHANGE MANAGEMENT MODELS


● There are many different change management
approaches and models, and we will discuss two of
the more common models and one proposed
working model (CROPS) here; the Lewin’s Change
Management Model, the Deming Cycle (Plan, Do,
Study, Act) and the proposed CROPS Change
Management Framework.
Lewin’s Change Management Model
● Kurt Lewin, a psychologist by training, created
this change model in the 1950s. Lewin observed
that there are three stages of change, which are:
Unfreeze, Transition, and Refreeze. It is
recognized that people tend to become complacent or comfortable in this “freeze” or
“unchanging/stable” environment, and they wish
to remain in this “safe/comfort” zone. Any
disturbance/disruption to this unchanging state
will cause pain and become uncomfortable.
● The transition phase is when the change (plan) is
executed and actual change is being implemented.
Since these “activities” take time to be completed,
the process and organizational structure may also
need to change, specific jobs may also change. The
most resistance to change may be experienced
during this transition period. This is when
leadership is critical for the change process to
succeed, and motivational factors are paramount to
project success.
● The last phase is Refreeze; this is the stage when
the organization once again becomes
unchanging/frozen until the next time a change is
initiated.

Fig: Lewin's change model – Unfreeze → Transition → Refreeze.

Deming Cycle (Plan, Do, Study, Act)

● The Deming cycle is also known as the PDCA cycle; it is a
continuous improvement (CI) model comprising four sequential
subprocesses: Plan, Do, Check, and Act.
● Edward Deming proposed in the 1950s that
business processes and systems should be
monitored, measured, and analyzed continuously to
identify variations and substandard products and
services, so that corrective actions can be taken to
improve on the quality of the products or services
delivered to the customers.
● PLAN: Recognize an opportunity and plan a change.
● DO: Execute the plan in a small scale to prove the
concept.
● CHECK: Evaluate the performance of the change
and report the results to sponsor.
● ACT: Decide on accepting the change and
standardizing it as part of the process.

Incorporate what has been learned from the previous steps to plan
new improvements, and begin a new cycle. Deming's PDCA cycle is
illustrated in the figure below.

Fig: Deming's PDCA cycle (Plan → Do → Check → Act), illustrated for creating a better environment for cities.

A Proposed Working Model: CROPS Change Management Framework

● For many organizations, change management focuses on the
project management aspects of change. There are a good
number of vendors offering products that are intended to
help organizations manage projects and project changes,
including the Project Portfolio Management Systems
(PPMS). PPMS groups projects so they can be managed as a
portfolio, much as an investor would manage his/her stock
investment portfolio to reduce risks.
● Culture. Corporate culture is a reflection of organizational
(management and employees) values and belief. Edgar
Schein, one of the most prominent theorists of
organizational culture, gave the following very general
definition.

● The culture of a group can now be defined as: A pattern of
shared basic assumptions that the group learned as it
solved its problems of external adaptation and internal
integration, that has worked well enough to be considered
valid and, therefore, to be taught to new members as the
correct way to perceive, think, and feel in relation to those
problems.

Elements of organizational culture may include:
● Stated values and belief
● Expectations for member behavior

● Customs and rituals
● Stories and myths about the history of the organization

● Norms—the feelings evoked by the way members interact


with each other, with outsiders, and with their
environment
● Metaphors and symbols—found embodied in other cultural
elements

Rewards and Management System. This management system


focuses on how employees are trained to ensure that they have the right
skills and tools to do the job right.
Organization and Structures. How the organization is structured is
largely influenced by what the jobs are and how the jobs are
performed. The design of the business processes govern what the jobs
are, and when and where they get done.

Process. Thomas Davenport defined a business process or business


method as a collection of related, structured activities or tasks that
produce a specific service or product (serve a particular goal) for a
particular customer or customers.
● Hammer and Champy’s definition can be considered as a

subset of Davenport’s. They define a process as “a
collection of activities that takes one or more kinds of
input and creates an output that is of value to the
customer.”

Skills and Competencies. Specialized skills that become part


of the organizational core competency enable innovation and
create a competitive edge. Organizations that invest in
research and development which emphasize investing in
people’s training and well-being will shape a winning
strategy.
The CROPS model.

CHANGE MANAGEMENT MATURITY MODEL (CMMM)


● A Change Management Maturity Model
(CMMM) helps organizations to (a) analyze,
understand, and visualize the strength and
weakness of the firm’s change management
process and (b) identify opportunities for
improvement and building competitiveness. The
model should be simple enough to use and
flexible to adapt to different situations.
● The working model is based on CMM
(Capability Maturity Model), originally
developed by the American Software Engineering
Institute (SEI) in cooperation with the Mitre
Corporation. CMM is a model of process
maturity for software development, but it has
since been adapted to different domains. The
CMM model describes a five-level process
maturity continuum.
● How does CMMM help organizations to adopt
new technology, including cloud computing,
successfully? The business value of CMMM can
be expressed in terms of improvements in
business efficiency and effectiveness. All
organizational investments are business
investments, including IT investments. The
resulting benefits should be measured in terms of
business returns. Therefore, CMMM value can
be articulated as the ratio of business performance
to CMMM investment; for example:

CMMM Value (ROIT) = Business Performance Improvement / CMMM Investment

where:

● ROIT: Observed business value or total return on investment from
IT initiative (CMMM)
● Business performance improvement
● Reduce error rate

● Increase customer/user satisfaction


● Customer retention
● Employee retention
● Increase market share and revenue
● Increase sales from existing customer
● Improve productivity
● And others
● CMMM investment
● Initial capital investment
● Total cost of ownership (TCO) over the life of the investment
(solution)
A Case Study: AML Services Inc.
● AML (A Medical Laboratory Services Inc.) is one of the
medical laboratory service providers for a city with a
population of one million, and AML is a technology-
driven company with 150 employees serving the city and
surrounding municipalities.
● Although the barrier to entry is high—the field requires a
lot of startup investment for equipment and technologies
(e.g., laboratory testing, X ray, MRI, and information
technologies), as well as highly skilled staff— there is
some competition in this segment of the health care
industry.
Tom Cusack, the CIO of AML, decides to hire a consulting
firm to help him architect the right solution for AML. Potential
discussion questions could be as follows:

● Should AML consider cloud computing part of the solution?
● Is AML ready for cloud computing?
● What does “done” look like?
● How can the organization overcome these challenges of change?

ORGANIZATIONAL READINESS SELF-ASSESSMENT (WHO, WHEN, WHERE, AND HOW)

● An organizational assessment is a process intending to seek a
better understanding of the as-is (current) state of the organization.
It also defines the roadmap (strategies and tactics) required to fill
the gap and to get the organization moving toward where it wants to
go (future state) from its current state.
● The process implies that the organization needs to
complete the strategy analysis process first and to
formulate the future goals and objectives that support
the future direction of the business organization.

During an effective organization readiness assessment, it is desirable to
achieve the following:
● Articulate and reinforce the reason for change.
● Determine the as-is state.
● Identify the gap (between future and currentstate).
● Anticipate and assess barriers to change.
● Establish action plan to remove barriers.
Involve the right people to enhance buy-in:

● It is critical to involve all the right people (stakeholders)


across the organization, and not just management and
decision-makers, as participants in any organization
assessment.
Asking the “right questions” is also essential. The assessment
should provide insight into your challenges and help determine some
of these key questions:

● How big is the gap?


● Does your organization have the capacity to execute
and implement changes?
● How will your employees respond to the changes?
● Are all your employees in your organization ready to adopt
changes that help realize the vision?
● What are the critical barriers to success?
● Are your business partners ready to support the changes?

TABLE: Working Assessment Template (rate each item as Agree / Don't Know / Disagree)

Nontechnical
● Does your organization have a good common understanding of why business objectives have been met or missed in the past?
● Does your organization have a good common understanding of why projects have succeeded or failed in the past?
● Does your organization have a change champion?
● Does your organization perceive change as unnecessary disruption to business?
● Does your organization view changes as the management fad of the day?
● Does your organization adopt an industry standard change management best practice and methodology approach?
● Does your organization adopt and adapt learning organization philosophy and practice?
● How familiar is your organization with service provisioning with an external service provider?

Technical
● Does your organization implement any industry management standards (ITIL, COBIT, ITSM, others)?
● Does your organization have a well-established policy to classify and manage the full lifecycle of all corporate data?
● Can you tell which percentage of your applications is CPU-intensive, and which percentage of your applications is data-intensive?

LEGAL ISSUES IN CLOUD COMPUTING

LEGAL ISSUES
 The legal issues that arise in cloud computing are wide ranging.
Significant issues regarding privacy of data and data security
exist, specifically as they relate to protecting personally identifiable
information of individuals, but also as they relate to protection of
sensitive and potentially confidential business information either
directly accessible through or gleaned from the cloud systems (e.g.,
identification of a company‘s customer by evaluating traffic across
the network).
 Additionally, there are multiple contracting models under which
cloud services may be offered to customers (e.g., licensing, service
agreements, on-line agreements, etc.).
 The appropriate model depends on the nature of the services as well
as the potential sensitivity of the systems being implemented or data
being released into the cloud. In this regard, the risk profile (i.e.,
which party bears the risk of harm in certain foreseeable and other
not-so-foreseeable situations) of the agreement and the cloud

provider‘s limits on its liability also require a careful look when
reviewing contracting models.
 Additionally, complex jurisdictional issues may arise due to the
potential for data to reside in disparate or multiple geographies. This
geographical diversity is inherent in cloud service offerings. This
means that both virtualization of and physical locations of servers
storing and processing data may potentially impact what country‘s
law might govern in the event of a data breach or intrusion into
cloud systems.

CLOUD CONTRACTING MODELS

1) Licensing Agreements Versus Services Agreements

2) On-Line Agreements Versus Standard Contracts

3) The Importance of Privacy Policies Terms and Conditions

4) Risk Allocation and Limitations of Liability

Licensing Agreements
A traditional software license agreement is used when a licensor is
providing a copy of software to a licensee for its use (which is usually non-
exclusive). This copy is not being sold or transferred to the licensee, but a
physical copy is being conveyed to the licensee.
The software license is important because it sets forth the terms under
which the software may be used by the licensee. The license protects the
licensor against the inadvertent transfer of ownership of the software to the
person or company that holds the copy. It also provides a mechanism for
the licensor of the software to (among other things) retrieve the copy it
provided to the licensee in the event that the licensee (a) stops complying
with the terms of the license agreement or (b) stops paying the fee the
licensee charges for the license.

Service Agreement.
A service agreement, on the other hand, is not designed to protect against the
perils of providing a copy of software to a user. It is primarily designed to
provide the terms under which a service can be accessed or used by a
customer. The service agreement may also set forth quality parameters
around which the service will be provided to the users. Since there is no
transfer of possession of a copy of software and the service is controlled by
the company providing it, a service agreement does not necessarily need to
cover infringement risk, nor does it need to set forth the scenarios and
manner in which a copy of software is to be returned to the vendor when a
relationship is terminated.

Value of Using a Service Agreement in Cloud Arrangements.


In each of the three permutations of cloud computing (SaaS, PaaS, and
IaaS), the access to the cloud-based technology is provided as a service to
the cloud user. The control and access points are provided by the cloud
provider. There is no conveyance of software to the cloud user. A service
agreement covers all the basic terms and conditions that provide adequate
protection to the cloud user without committing the cloud provider to risk
and liability attendant with the licensing of the software.

On-Line Agreements Versus Standard Contracts


There are two contracting models under which a cloud provider will grant
access to its services. The first, the on-line agreement, is a click wrap
agreement with which a cloud user will be presented before initially
accessing the service. A click wrap is the agreement the user enters into
when he/she checks an “I Agree” box, or something similar at the initiation
of the service relationship. The agreement is not subject to negotiation and
is generally thought to be a contract of adhesion (i.e., a contract that heavily
restricts one party while leaving the other relatively free). There is complete
inequality in bargaining power in click wrap agreements because there is no
ability to negotiate them. The click wrap is currently the most commonly
used contracting model. The second model, the standard, negotiated,
signature-based contract will have its place as well— over time. As larger
companies move to the cloud (especially the public cloud), or more
mission-critical applications or data move to the cloud, the cloud user
will most likely require the option of a more robust and user-friendly
agreement.

The Importance of Privacy Policies Terms and Conditions


The privacy policy of a cloud provider is an important contractual
document for the cloud user to read and understand. Why? In its privacy
policy the cloud provider will discuss, in some detail, what it is doing (or
not doing, as the case may be) to protect and secure the personal
information of a cloud user and its customers. The cloud user may get a
sense of how the cloud provider is complying with various privacy laws by
reviewing the privacy policy.

Risk Allocation and Limitations of Liability

the limitation of liability in an agreement sets forth the maximum amount


the parties will agree to pay one another should there be a reason to bring
some sort of legal claim under the agreement. As a practical matter,
contractual risk (e.g., provision of warranties, assuming liability for third
parties under the provider’s control, covenants to implement certain
industry standards, service level agreements, etc.) is not distributed evenly
between the parties. This is due in part because the performance obligations
primarily fall on the provider. This sets up the traditional thinking that the
contractual risk should follow the party with the most significant
performance obligations. In reality, the cloud provider may have the bulk of
the performance obligations, but may seek to take a “we bear no
responsibility if something goes wrong” posture in its contracts, especially
if those contracts are click wrap agreements.

JURISDICTIONAL ISSUES RAISED BY VIRTUALIZATION


AND DATA LOCATION

Jurisdiction is defined as a court’s authority to judge acts committed in a


certain territory. The geographical location of the data in a cloud computing
environment will have a significant impact on the legal requirements for
protection and handling of the data. This section highlights those issues.

Virtualization and Multi-tenancy


Virtualization. Computer virtualization in its simplest form is where one
physical server simulates being several separate servers. For example, in an
enterprise setting, instead of having a single server dedicated to payroll
systems, another one dedicated to sales support systems, and still a third
dedicated to asset management systems, virtualization allows one server to
handle all of these functions. A single server can simulate being all three.
Each one of these simulated servers is called a virtual machine.

Multi-tenancy. Multi-tenancy refers to the ability of a cloud provider to


deliver software as-a-service solutions to multiple client organizations (or
tenants) from a single, shared instance of the software. The cloud user’s

information is virtually, not physically, separated from other users. The
major benefit of this model is cost-effectiveness for the cloud provider.

The Issues Associated with the Flexibility of Data-Location

One of the benefits of cloud computing from the cloud provider’s


perspective is the ability of the cloud provider to move data among its
available data center resources as necessary to maximize the efficiencies of
its overall system. From a technology perspective, this ability to move data
is a reasonably good solution to the problem of underutilized machines.

Data Protection. In fact, in the cloud environment it is possible that the


same data may be stored in multiple locations at the same time. For
example, real time-transaction data may be in one geographic location
while the backup or disaster recovery systems may be elsewhere. It is also
likely that the agreement governing the services says nothing about data
location.

Other Jurisdiction Issues

Confidentiality and Government Access to Data. Each jurisdiction (and


perhaps states or provinces within a jurisdiction) has its own regime to
protect the confidentiality of information. In the cloud environment, given
the potential movement of data among multiple jurisdictions, the data
housed in a jurisdiction is subject to the laws of that jurisdiction, even if its
owner resides elsewhere. Given the inconsistency of confidentiality
protection in various jurisdictions, a cloud user may find that its sensitive
data are not entitled to the protection with which the cloud user may be
familiar, or that to which it contractually agreed.
Subcontracting. A cloud provider’s use of a third-party subcontractor to
carry out its business may also create jurisdictional issues. The existence or
nature of a subcontracting relationship is most likely invisible to the cloud
user. If, in the performance of the services, there was a lapse that was due to
the subcontractor’s performance, the location of the subcontractor or the
data acted on by the subcontractor will be difficult for a cloud user to
ascertain. As a result, the risk associated with the acts of or the locations of
the subcontractor are difficult to measure by the cloud user.

International Conflicts of Laws

The body of law known as “conflict of laws” acknowledges that the laws of
different countries may operate in opposition to each other, even as those
laws relate to the same subject matter. In such an event, it is necessary to
decide which country’s law will be applied.
Every nation is sovereign within its own territory. That means that the laws
of that nation affect all property and people within it, including all contracts
made and actions carried out within its borders. When there is either
(1) no statement of the law that governs a contract,
(2) no discussion of the rules regarding conflicts of laws in the agreement,
or
(3) a public policy in the jurisdiction which mandates that the governing
law in the agreement will be ignored, the question of which nation’s law
will apply to the transaction will be decided based on a number of factors
and circumstances surrounding the transaction. This cannot be reduced to a
simple or easy-to-apply rule.

COMMERCIAL AND BUSINESS CONSIDERATIONS—A CLOUD


USER’S VIEWPOINT

As potential cloud users assess whether to utilize cloud computing, there


are several commercial and business considerations that may influence the
decision-making. Many of the considerations presented below may
manifest in the contractual arrangements between the cloud provider and
cloud user.

Minimizing Risk

Maintaining Data Integrity. Data integrity ensures that data at rest are not
subject to corruption. Multi-tenancy is a core technological approach to
creating efficiencies in the cloud, but the technology, if implemented or
maintained improperly, can put a cloud user’s data at risk of corruption,
contamination, or unauthorized access. A cloud user should expect
contractual provisions obligating a cloud provider to protect its data, and
the user ultimately may be entitled to some sort of contract remedy if data
integrity is not maintained.

Accessibility and Availability of Data/SLAs.

The service-level agreement (SLA) is the cloud provider’s contractually


agreed-to level of performance for certain aspects of the services. The SLA,
specifically as it relates to availability of services and data, should be high
(i.e., better than 99.7%), with minimal scheduled downtime (scheduled
downtime is outside the SLA). Regardless of the contract terms, the cloud
user should get a clear understanding of the cloud provider’s performance
record regarding accessibility and availability of services and data. A cloud
provider’s long-term viability will be connected to its ability to provide its
customers with almost continual access to their services and data. The
SLAs, along with remedies for failure to meet them (e.g., credits against
fees), are typically in the agreement between the cloud provider and cloud
user.
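To make the availability figure concrete, a quick calculation of the yearly downtime that a given availability SLA permits (the SLA values are illustrative):

    HOURS_PER_YEAR = 365 * 24   # 8760

    def allowed_downtime_hours(availability_pct: float) -> float:
        """Hours of downtime per year permitted by an availability SLA."""
        return HOURS_PER_YEAR * (1 - availability_pct / 100)

    for sla in (99.7, 99.9, 99.99):
        print(f"{sla}% availability -> {allowed_downtime_hours(sla):.1f} hours/year")
    # 99.7%  -> about 26.3 hours of downtime per year
    # 99.9%  -> about  8.8 hours
    # 99.99% -> about  0.9 hours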

Disaster Recovery. For the cloud user that has outsourced the processing
of its data to a cloud provider, a relevant question is, What is the cloud
provider’s disaster recovery plan? What happens when the unanticipated,
catastrophic event affects the data center(s) where the cloud services are
being provided? It is important for both parties to have an understanding of
the cloud provider’s disaster recovery plan.

Viability of the Cloud Provider

In light of the wide diversity of companies offering cloud services, from


early stage and startup companies to global, publicly traded companies, the
cloud provider’s ability to survive as business is an important consideration
for the cloud user. A potential cloud user should seek to get some
understanding about the viability of the cloud provider, particularly early-
stage cloud providers.

Protecting a Cloud User’s Access to Its Data

Though the ability for the cloud user to have continual access to the cloud
service is a top consideration, a close second, at least from a business
continuity standpoint, is keeping access to its data. This section introduces
three scenarios that a cloud user should contemplate when placing its data
into the cloud. There are no clear answers in any scenario. The most
conservative or risk-averse cloud user may consider having a plan to keep a

copy of its cloud-stored dataset in a location not affiliated with the cloud
provider.

SPECIAL TOPICS

The Cloud Open-Source Movement


In Spring 2009 a group of companies, both technology companies and users
of technology, released the Open Cloud Manifesto [25]. The manifesto’s
basic premise is that cloud computing should be as open as other IT
technologies. The manifesto sets forth five challenges that it suggests must
be overcome before the value of cloud computing can be maximized in the
marketplace. These challenges are (1) security, (2) data and applications
interoperability, (3) data and applications portability, (4) governance and
management, and (5) metering and monitoring. The manifesto suggests that
open standards and transparency are methods to overcome these challenges.
It then suggests that openness will benefit business by providing (a) an
easier experience transitioning to a new provider, (b) the ability for
organizations to work together, (c) speed and ease of integration, and (d) a
more available, cloud-savvy talent pool from which to hire.

Litigation Issues/e-Discovery

From a U.S. law perspective, a significant effort must be made during the
course of litigation to produce electronically stored information (ESI). This
production of ESI is called “e-discovery.” The overall e-discovery process
has three basic components: (1) information management, where a
company decides where and how its information is processed and retained,
(2) identifying, preserving, collecting, and processing ESI once litigation
has been threatened or started, and (3) review, processing, analysis, and
production of the ESI for opposing counsel [26]. The Federal Rules of Civil
Procedure require a party to produce information within its “possession,
custody, or control.”

Cloud for Industry,Healthcare & Education

Cloud Computing for Healthcare

Healthcare Ecosystem
• The healthcare ecosystem consists of numerous entities including
healthcare providers (primary care physicians, private health insurance
companies, employers), pharmaceutical, device and medical service
companies, IT solutions and services firms, and patients.
Healthcare Data
• The process of provisioning healthcare involves massive healthcare data
that exists in different forms (structured or unstructured), is stored in
disparate data sources (such as relational databases, file servers, for
instance) and in many different formats.
• The cloud can provide several benefits to all the stakeholders in the
healthcare ecosystem through systems such as
Health Information Management System (HIMS),
Laboratory Information System (LIS), Radiology Information
System (RIS), Pharmacy Information System (PIS), for
instance.

Benefits of Cloud for Healthcare

Providers & Hospitals


• With public cloud based EHR systems hospitals don’t need to spend a
significant portion of their budgets on IT infrastructure.
• Public cloud service providers provide on-demand provisioning of
hardware resources with pay-per-use pricing models.
• Thus hospitals using public cloud based EHR systems can save on upfront
capital investments in hardware and data center infrastructure and pay only
for the operating expenses of the cloud resources used.
• Hospitals can access patient data stored in the cloud and share the data
with other hospitals.
Patients
• Patients can provide access to their health history and information stored
in the cloud (using SaaS applications) to hospitals so that the admissions,
care and discharge processes can be streamlined.
• Physicians can upload diagnosis reports (such as pathology reports) to the
cloud so that they can be accessed by doctors remotely for diagnosing the
illness.
• Patients can manage their prescriptions and associated information such as
dosage, amount and frequency, and provide this information to their
healthcare provider.
Payers
• Health payers can increase the effectiveness of their care management
programs by providing value added services and giving access to health
information to members.

Electronic Health Records (EHRs)


• EHRs capture and store information on patient health and provider actions
including individual-level laboratory results, diagnostic, treatment, and
demographic data. EHRs maintain information such as patient visits,
allergies, immunizations, lab reports, prescribed medicines, vital signs, for
instance.
• Though the primary use of EHRs is to maintain all medical data for an
individual patient and to provide efficient access to the stored data at the
point of care, EHRs can be the source for valuable aggregated information
about overall patient populations.

• The EHR data can be used for advanced healthcare applications such as
population-level health surveillance, disease detection, outbreak prediction,
public health mapping, similarity-based clinical decision intelligence,
medical prognosis, syndromic diagnosis, visual-analytics investigation, for
instance.
• To exploit the potential to aggregate data for advanced healthcare
applications there is a need for efficiently integrating information from
distributed and heterogeneous healthcare IT systems and analyzing the
integrated information.

Cloud EHRs
Save Infrastructure Costs
• Traditional client-server EHR systems with dedicated hosting require a
team of IT experts to install, configure, test, run, secure and update
hardware and software.
• With cloud-based EHR systems, organizations can save on the upfront
capital investments for setting up the computing infrastructure as well as
the costs of managing the infrastructure.
Data Integration & Interoperability
• Traditional EHR systems use different and often conflicting technical and
semantic standards which leads to data integration and interoperability
problems.
• To address interoperability problems, several electronic health record
(EHR) standards that enable structured clinical content for the purpose of
exchange are currently under development.
• Interoperability of EHR systems will contribute to more effective and
efficient patient care by facilitating the retrieval and processing of clinical
information about a patient from different sites.
Scalability and Performance
• Traditional EHR systems are built on a client-server model with dedicated
hosting that involves a server which is installed within the organization’s
network and multiple clients that access the server. Scaling up such systems
requires additional hardware.
• Cloud computing is a hosting abstraction in which the underlying
computing infrastructure is provisioned on demand and can be scaled up or
down based on the workload.
• Scaling up cloud applications is easier as compared to client-server
applications.

Security for Cloud EHRs
Security Concerns for Cloud EHRs
• Security of patient information is one of the biggest obstacles in the
widespread adoption of cloud computing technology for EHR systems due
to the outsourced nature of cloud computing.
Government Regulations
• Government regulations require privacy protection and security of patient
health information.
• In the U.S., organizations called covered entities (CEs), which create,
maintain, transmit, use, and disclose an individual's protected health
information (PHI), are required to meet Health Insurance Portability and
Accountability Act (HIPAA) requirements.
• HIPAA requires covered entities (CE) to assure their customers that the
integrity, confidentiality, and availability of PHI information they collect,
maintain, use, or transmit is protected.
• HIPAA was expanded by the Health Information Technology for
Economic and Clinical Health Act (HITECH), which addresses the privacy
and security concerns associated with the electronic transmission of health
information.

Securing Cloud EHRs


• Cloud-based health IT systems require enhanced security features such as
authorization services, identity management services and authentication
services for providing secure access to healthcare data.

Cloud EHR – Reference Architecture


• The reference architecture described here is for a cloud-based EHR system
that can support both primary and secondary use of healthcare data, with
healthcare data storage and analytics in the cloud.
• In this architecture, tier-1 consists of web servers and load balancers,
tier-2 consists of application servers, and tier-3 consists of a cloud-based
distributed batch processing infrastructure such as Hadoop.
• HBase is used for the database layer. HBase is a distributed non-relational
column oriented database that runs on top of HDFS.
• Hive is used to provide a data warehousing infrastructure on top of
Hadoop. Hive allows querying and analyzing data in HDFS/HBase using
the SQL-like Hive Query Language (HQL); a small query sketch is shown after this list.

• Zookeeper is used to provide a distributed coordination service for
maintaining configuration information, naming, providing distributed
synchronization, and providing group services.
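• As an illustration of the Hive layer in this architecture, the following minimal sketch runs an HQL aggregation over a hypothetical EHR table using the PyHive client; the host, database, table name and column names are illustrative assumptions, not part of the reference architecture.

from pyhive import hive  # assumes the PyHive client library is installed

# Connect to the Hive server fronting the HDFS/HBase-backed warehouse
# (host, port and database are placeholder values).
conn = hive.Connection(host="hive.example.org", port=10000, database="ehr")
cursor = conn.cursor()

# HQL aggregation for a secondary-use query: number of cases per
# diagnosis code across the patient population for a given year.
cursor.execute("""
    SELECT diagnosis_code, COUNT(*) AS num_cases
    FROM patient_visits
    WHERE year(visit_date) = 2023
    GROUP BY diagnosis_code
    ORDER BY num_cases DESC
    LIMIT 20
""")

for diagnosis_code, num_cases in cursor.fetchall():
    print(diagnosis_code, num_cases)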

Cloud Computing for Energy Systems
• Complex clean energy systems (such as smart grids, power plants, and wind turbine farms) have
a large number of critical components that must function correctly for the systems to operate
reliably.
• Energy systems have thousands of sensors that gather real-time maintenance data
continuously for condition monitoring and failure prediction purposes.
• Analyzing massive amounts of maintenance data collected from sensors in energy systems
and equipment can provide predictions for the impending failures (potentially in real-time) so
that their reliability and availability can be improved.

Energy Systems Prognostics with Cloud


• Prognostic real-time health management involves predicting system performance by
analyzing the extent of deviation of a system from its normal operating profiles.
• Cloud-based big sensor data storage and analytics systems are based on distributed storage
systems (such as HDFS) and distributed batch processing frameworks (such as Hadoop).
• Cloud-based distributed batch processing infrastructures process large volumes of data using
inexpensive commodity computers which are connected to work in parallel.
• Such systems are designed to work on commodity hardware, which has a high probability of
failure, using techniques such as replication of file blocks on multiple machines in a cluster.
Collecting Sensor Data in Cloud
• Workflow for aggregating sensor data in a cloud:
• The first step in this workflow is data aggregation. Each incoming data stream is mapped to
a data aggregator.
• Since the raw sensor data comes from a large number of machines in the form of data
streams, the data has to be preprocessed to make the data analysis using cloud-based parallel
processing frameworks (such as Hadoop) more efficient. For example, the Hadoop
MapReduce data processing model works more efficiently with a small number of large files
rather than a large number of small files.
• The data aggregators buffer the streaming data into larger chunks.
• The next step is to filter data and remove bad records in which some sensor readings are
missing.
• The filtered data is then compressed and archived to cloud storage; a minimal sketch of these steps follows.
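• The sketch below walks through the aggregation, filtering and archival steps of this workflow in Python; the field names, chunk size and gzip archive format are illustrative assumptions rather than requirements of the workflow.

import gzip
import json

REQUIRED_FIELDS = ("machine_id", "timestamp", "temperature", "vibration")
CHUNK_SIZE = 10000  # buffer streaming readings into larger chunks

def is_valid(record):
    # Filtering step: drop records in which some sensor readings are missing.
    return all(record.get(field) is not None for field in REQUIRED_FIELDS)

def archive_chunk(records, chunk_index):
    # Compression/archival step: one large compressed file per chunk suits
    # batch frameworks such as Hadoop better than many small files. A real
    # deployment would upload this object to cloud storage (e.g. HDFS or S3).
    with gzip.open(f"sensor-chunk-{chunk_index:06d}.json.gz", "wt") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

def aggregate(stream):
    # Aggregation step: buffer the incoming data stream into larger chunks.
    buffer, chunk_index = [], 0
    for record in stream:
        if not is_valid(record):
            continue
        buffer.append(record)
        if len(buffer) >= CHUNK_SIZE:
            archive_chunk(buffer, chunk_index)
            buffer, chunk_index = [], chunk_index + 1
    if buffer:  # flush the final partial chunk
        archive_chunk(buffer, chunk_index)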

Predicting Faults in Energy Systems


• Faults in energy systems have unique signatures, such as an increase in temperature or an
increase in vibration levels.
• Various machine learning and analysis algorithms can be implemented over the distributed
processing frameworks in the cloud for analyzing the machine sensor data.
• Clustering algorithms can help in fault isolation (see the sketch after this list).
• Case-based reasoning (CBR) is another popular method that has been used for fault
prediction.
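• For example, a simple clustering pass over per-machine sensor features can separate normal operating profiles from anomalous fault signatures. The sketch below uses scikit-learn's KMeans on temperature and vibration features; the feature values, number of clusters and use of scikit-learn are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans  # assumes scikit-learn is installed

# Each row holds one machine's recent averages: [temperature (C), vibration (mm/s)].
features = np.array([
    [62.1, 1.8], [63.4, 2.0], [61.9, 1.7],   # normal operating profile
    [85.6, 6.2], [88.1, 5.9],                # elevated temperature and vibration
    [64.0, 9.5],                             # vibration-only fault signature
])

# Group machines into clusters; small clusters far from the normal profile
# are candidates for fault isolation and closer inspection.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)
for machine_id, label in enumerate(kmeans.labels_):
    print(f"machine {machine_id}: cluster {label}")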

Case Based Reasoning for Fault Prediction
• Case-based reasoning (CBR) is a popular method that has been used for fault prediction.
• CBR finds solutions to new problems based on past experience.
• CBR is an effective technique for problem solving in the fields in which it is hard to
establish a quantitative mathematical model, such as prognostic health management.
• In CBR, the past experience is organized and represented as cases in a case-base.
• The steps involved in CBR are:
• Retrieve: retrieving similar cases from case-base
• Reuse: reusing the information in the retrieved cases
• Revise: revising the solution
• Retain: retaining a new experience into the case-base.
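• A minimal sketch of the retrieve and reuse steps for fault prediction, assuming each past case pairs a vector of sensor deviations with the fault that followed; the case data, distance measure and feature layout are illustrative assumptions.

import math

# Case-base: past experience represented as (feature vector, observed fault).
# Features: [temperature deviation, vibration deviation] from the normal profile.
case_base = [
    ([0.5, 0.2], "no fault"),
    ([8.0, 4.5], "bearing wear"),
    ([12.0, 1.0], "cooling failure"),
]

def retrieve(new_case, k=1):
    # Retrieve: find the k most similar past cases (Euclidean distance).
    return sorted(case_base, key=lambda case: math.dist(case[0], new_case))[:k]

def predict_fault(new_case):
    # Reuse: adopt the fault label of the most similar retrieved case.
    # Revise/Retain would adjust this solution and add the new case to the
    # case-base once the actual outcome is known.
    return retrieve(new_case)[0][1]

print(predict_fault([7.5, 4.0]))  # closest past case suggests "bearing wear"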

Cloud Computing for Smart Grids


• Smart Grid is a data communications network integrated with the electrical grid that collects
and analyzes data captured in near-real-time about power transmission, distribution, and
consumption.
• Smart Grid technology provides predictive information and recommendations to utilities,
their suppliers, and their customers on how best to manage power.
• Smart Grids collect data regarding electricity generation (centralized or distributed),
consumption (instantaneous or predictive), storage (or conversion of energy into other
forms), distribution and equipment health data.
• Smart grids use high-speed, fully integrated, two-way communication technologies for real-
time information and power exchange.

Cloud Computing for Smart Grids


• Sensing and measurement technologies are used for evaluating the health of equipment and
the integrity of the grid. Power thefts can be prevented using smart metering.
• By analyzing the data on power generation, transmission and consumption, smart grids can
improve efficiency throughout the electric system.

• Storage, collection and analysis of smart grid data in the cloud can help in dynamic
optimization of system operations, maintenance, and planning.
• Cloud-based monitoring of smart grid data can improve energy usage by giving users energy
feedback coupled with real-time pricing information, and by giving utilities the energy
consumption status of users so that overall usage can be reduced.
• Real-time demand response and management strategies can be used for lowering peak
demand and overall load via appliance control and energy storage mechanisms.
• Condition monitoring data collected from power generation and transmission systems can
help in detecting faults and predicting outages.

Cloud Computing for Transportation Systems


• Modern transportation systems are driven by data collected from multiple sources which is
processed to provide new services to the stakeholders.
• By collecting large amounts of data from various sources and processing the data into useful
information, data-driven transportation systems can provide new services such as:
• Advanced route guidance
• Dynamic vehicle routing
• Anticipating customer demands for pickup and delivery problems

Cloud Apps for Transportation Systems


• Fleet Tracking
• Vehicle fleet tracking systems use GPS technology to track the locations of the vehicles in
real-time.
• The vehicle location and route data can be aggregated and analyzed in the cloud for
detecting bottlenecks in the supply chain (such as traffic congestion on routes), assignment
and generation of alternative routes, and supply chain optimization.
• Route Generation & Scheduling
• Route generation and scheduling systems can generate end-to-end routes using combination
of route patterns and transportation modes and feasible schedules based on the availability of
vehicles.
• As the transportation network grows in size and complexity, the number of possible route
combinations increases exponentially.
• Cloud-based route generation and scheduling systems can provide fast response to the route
generation queries and can be scaled up to serve a large transportation network.

• Condition Monitoring
• Condition monitoring solutions for transportation systems allow monitoring the conditions
inside containers.
• Planning, Operations & Services
• Different transportation solutions (such as fleet tracking, condition monitoring, route
generation, scheduling, cargo operations, fleet maintenance, customer service, order booking,
and billing & collection) can be moved to the cloud to provide seamless integration between
order management, tactical planning & execution, and customer-facing processes & systems.

Cloud Computing for Manufacturing Industry


• There are two forms of cloud manufacturing:
• One uses cloud computing technologies for manufacturing.
• The other involves service-oriented manufacturing that replicates the cloud computing
environment using physical manufacturing resources (analogous to computing resources in
cloud computing).
• Industrial Control Systems
• Industrial control systems such as supervisory control and data acquisition (SCADA)
systems, distributed control systems (DCS), and other control system configurations such as
Programmable Logic Controllers (PLC) found in the manufacturing industry continuously
generate monitoring and control data.
• Real-time collection, management and analysis of data on production operations generated
by ICS, in the cloud, can help in estimating the state of the systems and improving plant and
personnel safety, so that appropriate action can be taken in real time to prevent catastrophic failures.

Cloud-based Design & Manufacturing


• Software-as-a-Service (SaaS)
• The SaaS service model provides software applications such as customer relationship
management (CRM), enterprise resource planning (ERP), and computer aided design
and manufacturing (CAD/CAM), hosted in a computing cloud and accessed through thin clients
such as web browsers.
• Platform-as-a-Service (PaaS)
• PaaS service model allows deployment of applications without the need for buying or
managing the underlying infrastructure. PaaS provides services for developing, integrating
and testing applications in an integrated development environment.
• Infrastructure-as-a-Service (IaaS)
• IaaS provides physical resources such as servers, storage that can be provisioned on
demand.
• Hardware-as-a-Service (HaaS)
• HaaS provides access to resources such as machine tools, 3D printers, manufacturing cells
and industrial robots. HaaS providers can rent hardware to consumers through the cloud-based
design and manufacturing (CBDM) environment.


Cloud Computing for Education


• Online Learning Platforms & Collaboration Tools
• Cloud computing is having a transformative impact on education by improving
the reach of quality education to students through the use of online learning platforms and
collaboration tools.
• MOOCs
• Massive open online courses (MOOCs) are aimed at large audiences and use cloud technologies
for providing audio/video content, readings, assignments and exams.

• Cloud-based auto-grading applications are used for grading exams and assignments. Cloud-
based applications for peer grading of exams and assignments are also used in some MOOCs.
• Online Programs
• Many universities across the world are using cloud platforms for providing online degree
programs.
• Lectures are delivered through live/recorded video using cloud based content delivery
networks to students across the world.

Cloud Computing for Education


• Online Proctoring
• Online proctoring for distance learning programs is also becoming popular through the use
of cloud-based live video streaming technologies where online proctors observe test takers
remotely through video.
• Virtual Labs
• Access to virtual labs is provided to distance learning students through the cloud. Virtual
labs provide remote access to the same software and applications that are used by students on
campus.
• Course Management Platforms
• Cloud-based course management platforms are used for sharing reading materials,
providing assignments and releasing grades.
• Cloud-based collaboration applications, such as online forums, can help students discuss
common problems and seek guidance from experts.
• Information Management
• Universities, colleges and schools can use cloud-based information management systems to
improve administrative efficiency, offer online and distance education programs and online
exams, track the progress of students, and collect feedback from students.
• Reduce Cost of Education
• Cloud computing thus has the potential to help bring down the cost of education, by
increasing the student-teacher ratio through the use of online learning platforms and new
evaluation approaches, without sacrificing quality.

PART-A (2-Marks Questions)
1. What are the cloud security challenges?

2. List the various mechanisms for authentication.

3. Define identity.

4. Define encryption.

5. What are the symmetric encryption algorithms?

6. List the various stages in key management.

7. List the advantages of cloud computing in education.

8. What are the risks of storing data in the cloud?

9. List the cloud computing legal challenges.

10. What are the factors used to identify threats in the cloud?

PART-B (10-Mark Questions)


1. Explain Cloud for Industry, Healthcare & Education.
2. Draw and explain the CSA Cloud Security Architecture.
3. Explain Cloud Computing for Transportation Systems.
4. Explain Cloud Computing for Manufacturing Industry.
5. List the steps of the seven-step model of migration into a cloud and explain them.
6. List and explain change management maturity models.
7. Explain contracting models.
8. Explain commercial and business considerations (Special Topics).
