Cloud Computing 1
UNIT – II
UNIT – III
UNIT – IV
PYTHON FOR CLOUD: Introduction - Installing Python - Data Types & Data Structures -
Control Flow - Functions - Modules - Packages - File Handling - Date/Time Operations -
Classes - Python for Cloud: Amazon Web Services - Google Cloud Platform - Windows
Azure - MapReduce - Packages of Interest - Designing a RESTful Web API.
UNIT – V
BIG DATA ANALYTICS, MULTIMEDIA CLOUD & CLOUD SECURITY: Big Data
Analytics: Clustering Big Data - Classification of Big Data - Recommendation Systems.
Multimedia Cloud: Case Study: Live Video Stream App - Streaming Protocols - Case Study:
Video Transcoding App - Cloud Security: CSA Cloud Security Architecture - Authentication -
Authorization - Identity and Access Management - Data Security - Key Management -
Auditing - Cloud for Industry, Healthcare & Education.
Cloud computing is the delivery of different services through the Internet. These resources include tools
and applications like data storage, servers, databases, networking, and software.
Rather than keeping files on a proprietary hard drive or local storage device, cloud-based storage makes it
possible to save them to a remote database. As long as an electronic device has access to the web, it has
access to the data and the software programs to run it.
Cloud computing is a popular option for people and businesses for a number of reasons including cost
savings, increased productivity, speed and efficiency, performance, and security.
There are four key characteristics of cloud computing. They are shown in the following
diagram:
Rapid Elasticity
It is very easy to scale the resources vertically or horizontally at any time. Scaling of
resources means the ability of resources to deal with increasing or decreasing demand.
The resources being used by customers at any given point of time are automatically monitored.
Measured Service
With measured service, the cloud provider controls and monitors all aspects of the cloud service. Resource
optimization, billing, and capacity planning all depend on it.
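The two characteristics above can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: the demand figures, the per-instance capacity of 500 requests per second, and the names instances_needed and UsageMeter are all hypothetical and do not belong to any provider's API.

```python
import math


def instances_needed(requests_per_second: float,
                     capacity_per_instance: float,
                     min_instances: int = 1) -> int:
    """Horizontal scaling: number of identical instances that cover the demand."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    return max(min_instances,
               math.ceil(requests_per_second / capacity_per_instance))


class UsageMeter:
    """Measured service: accumulate instance-hours for billing and planning."""

    def __init__(self) -> None:
        self.instance_hours = 0.0

    def record(self, instances: int, hours: float) -> None:
        self.instance_hours += instances * hours


# Hypothetical demand (requests per second) observed in four consecutive hours.
demand_by_hour = [120, 900, 2400, 300]

meter = UsageMeter()
fleet_sizes = []
for demand in demand_by_hour:
    n = instances_needed(demand, capacity_per_instance=500)
    fleet_sizes.append(n)        # the fleet grows and shrinks with demand
    meter.record(n, hours=1)     # usage is monitored as it happens

print(fleet_sizes, meter.instance_hours)
```

The fleet size follows the demand up and down (elasticity), while the meter keeps the record that billing and capacity planning would rely on (measured service).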
BENEFIT - CHALLENGES
Cost Savings: You gain access to powerful tools without the cost of purchasing hardware,
software, or licenses. Everything is on the cloud.
Accessibility: You can access anything you stored in the cloud through all your web-enabled
devices.
Reliability: Given an array of available servers, services can be easily moved to other servers in
case of a failure.
Scalability: If your company needs to expand, cloud service providers can scale up to suit your needs.
This lessens the need for new equipment.
Efficiency: Multiple users can work together and communicate over the cloud. This makes
operations run more smoothly.
Agility: Given all the advantages of cloud computing, app development, testing, and deployment
are accelerated.
Security: Because they cater to a huge number of clients, cloud service providers comply with
high-level security protocols. These security measures cost a fortune to implement, so this is another area
where your business could cut costs.
Downtime: Handling a lot of clients may overwhelm cloud service providers. There are also power
outages and connectivity interruptions. All these entail the suspension of your processes.
Vendor Lock-in: Currently, migration from one provider to another is difficult. There are
interoperability, compatibility, and support issues surrounding migration. Your business might end up
stuck with your provider when you need to make a switch.
Opaqueness: Cloud providers do not immediately inform customers about a security breach. This
is something not every enterprise can tolerate.
Security: While security standards are high, cloud computing services are
still vulnerable to security risks. Given that you’re using these services to handle important business
data, you cannot afford such risks. This concern will never fully go away, so providers must
ensure that the risks are kept under control.
Cloud Computing – Distributed Systems
The most rapidly growing type of computing is cloud computing. Cloud computing has been described as a
metaphor for the Internet, since the Internet is often drawn as a cloud in network diagrams. Using cloud
computing, organizations and individual users can use Web services, database services, and application
services over the Internet, without having to invest in corporate or personal hardware, software, or software tools.
The figure below depicts the exchanges between client computers and services in the cloud.
Businesses use Web browsers such as Microsoft Internet Explorer or Mozilla Firefox to access
applications, while servers in the cloud store the software and data for those businesses.
Many large, well-established hardware, software, and consulting companies such as Cisco, Dell, IBM, HP,
Microsoft, SAP, and others are creating massive cloud computing endeavors, often with what are termed
“virtualized resources.” What is distinct about these approaches is their ability to grow and adapt to
changing business needs. That is, they are scalable to suit growing (or changing) demand by users.
The model of “software as a service,” also called SaaS, is included in the concept of cloud computing.
VIRTUALIZATION
Virtualization in cloud computing is the creation of a virtual form of a resource, such as a desktop, an
operating system, or physical storage. Virtualization can also help manage workloads by
transforming traditional computing to make it more scalable.
We can apply it to a wide range of system layers, including hardware-level virtualization, server
virtualization, storage virtualization and operating-system-level virtualization. One of the major uses of
virtualization is to provide cloud clients with a standard version of an application.
Hardware virtualization
Hardware virtualization is the abstraction of computing resources from the software that uses them. Hardware
virtualization installs a hypervisor, which creates an abstraction layer between the software
and the hardware.
Once the hypervisor is in place, the software relies on virtual representations of the computing components.
Operating system virtualization is the use of software that allows the hardware of a system to run
multiple operating systems concurrently. This makes it possible to run multiple applications
that require different operating systems on a single computer.
Server virtualization
Server virtualization is the masking of server resources, including the number and identity of the individual
physical servers, from the users of those servers. In server virtualization, a single physical server is divided
into multiple isolated virtual environments.
Storage Virtualization
Storage virtualization is the pooling of physical storage from multiple storage devices so that it
appears as a single storage device. We can use storage virtualization to integrate other resources and data
centres into one logical view.
Storage virtualization is a long-established technique that is used to resolve many challenges in scaling and
managing large amounts of storage. Moreover, virtualization improves scalability, performance,
and economics.
Service-Oriented Architecture (SOA) is a group of services that communicate with one another. The
communication can be a simple data transfer or two or more services coordinating an activity. Each service
in an SOA is self-contained and does not depend on other services.
Service-Oriented Architecture is also an application structure that divides a business
application into individual business procedures and functions. It shifts the
responsibility for, and the expense of, deployment onto the service provider.
It also encourages centralized distribution and the reuse of components, which reduces the cost of
software development and delivery.
As the figure shows, the service consumer sends a request message to the service provider. The service
provider then returns a response to the service consumer. This communication is understandable by both the
consumer and the provider. Here, the service provider can also be a service consumer.
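The request and response exchange described above can be sketched in a few lines of Python. This is an illustrative toy, not a real SOA framework: the ServiceRegistry class, the "tax" and "order" services, and the 10% tax rate are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Message:
    service: str    # name of the service being addressed
    payload: dict   # data in a format both sides understand


class ServiceRegistry:
    """Hypothetical in-process stand-in for the network between services."""

    def __init__(self) -> None:
        self._providers: Dict[str, Callable[[dict], dict]] = {}

    def register(self, name: str, handler: Callable[[dict], dict]) -> None:
        self._providers[name] = handler

    def send(self, message: Message) -> dict:
        # Deliver the request to the provider and hand back its response.
        return self._providers[message.service](message.payload)


registry = ServiceRegistry()


def tax_service(payload: dict) -> dict:
    # Pure provider: answers requests, calls nobody else.
    return {"tax": round(payload["amount"] * 0.10, 2)}


def order_service(payload: dict) -> dict:
    # Provider that is also a consumer: it calls the tax service.
    tax = registry.send(Message("tax", {"amount": payload["amount"]}))["tax"]
    return {"total": round(payload["amount"] + tax, 2)}


registry.register("tax", tax_service)
registry.register("order", order_service)

response = registry.send(Message("order", {"amount": 200.0}))
print(response)
```

The order service shows the last point of the paragraph: it is a provider to its caller and, at the same time, a consumer of the tax service.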
Do not confuse grid computing with cloud computing. Grid computing uses distributed computing
resources to accomplish a common goal.
A grid consists of parallel nodes that form a computer cluster, which runs on an operating system. It is a
distributed system with non-interactive workloads that involve a large number of files.
The cluster can be as small as a single workstation or as large as a network. Its common uses are ATM banking,
back-end infrastructure, and scientific or marketing research.
Grid computing is carried out with applications designed for computational problems,
running on machines that are connected in a parallel networking environment.
Cloud Computing Technology – Grid Computing
Grid computing combines computers that gather information into a single,
computation-intensive application.
Utility computing is a service provisioning model. This model provides computing
resources and infrastructure management to customers on demand.
Customers are charged on a pay-as-you-go basis without any upfront cost. The utility model maximizes
the efficient use of resources while minimizing the associated cost.
Utility computing has the advantage of a low initial cost to acquire computing resources.
The customer can access a virtually unlimited supply of computing resources over the internet or a virtual
private network, while the provider manages the backend infrastructure and the computing resources.
Cloud Computing Technology – Utility Computing
Utility computing is a service provisioning model that offers computing resources such as hardware,
software, and network bandwidth to clients as and when they require them on an on-demand basis. The
service provider charges only as per the consumption of the services, rather than a fixed charge or a flat
rate.
Utility computing is a subset of cloud computing, allowing users to scale up and down based on their
needs. Clients, users, or businesses acquire amenities such as data storage space, computing capabilities,
applications services, virtual servers, or even hardware rentals such as CPUs, monitors, and input devices.
The utility computing model is based on conventional utilities and originates from the process of making
IT resources as easily available as traditional public utilities such as electricity, gas, water, and telephone
services. For example, a consumer pays his electricity bill as per the number of units consumed, nothing
more and nothing less. Similarly, utility computing works on the same concept, which is a pay-per-use
model.
The service provider owns and manages the computing solutions and infrastructure, and the client
subscribes to the same and is charged in a metered manner without any upfront cost. The concept of utility
computing is simple—it provides processing power when you need it, where you need it, and at the cost of
how much you use it.
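The pay-per-use idea can be made concrete with a short Python sketch that compares a metered bill with a flat rate. All figures are hypothetical assumptions chosen for illustration (5 cents per CPU-hour, a flat fee of 2000 cents per month); amounts are kept in whole cents to avoid rounding surprises.

```python
RATE_CENTS_PER_CPU_HOUR = 5    # hypothetical metered price
FLAT_FEE_CENTS = 2000          # hypothetical fixed monthly charge


def metered_bill(cpu_hours: int, rate_cents: int = RATE_CENTS_PER_CPU_HOUR) -> int:
    """Pay-per-use: the charge is proportional to the units consumed."""
    if cpu_hours < 0:
        raise ValueError("usage cannot be negative")
    return cpu_hours * rate_cents


# Hypothetical CPU-hours consumed in three consecutive months.
monthly_cpu_hours = [40, 310, 95]

bills = [metered_bill(hours) for hours in monthly_cpu_hours]
metered_total = sum(bills)
flat_total = FLAT_FEE_CENTS * len(monthly_cpu_hours)

for hours, bill in zip(monthly_cpu_hours, bills):
    print(f"{hours:>4} CPU-hours -> {bill / 100:.2f}")
print(f"metered total: {metered_total / 100:.2f}, flat total: {flat_total / 100:.2f}")
```

As with an electricity bill, a month with little usage costs little. Whether the metered model is cheaper than a flat rate depends entirely on the usage pattern and the prices, which are invented here.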
INTRODUCTION
Adoption of cloud computing and cloud technologies is increasing day by day. This article on Cloud
Computing Environment 101 is intended for beginners to understand the basic principles, concepts and
terms of Cloud Computing.
Cloud Computing
Cloud refers to a set of hosted servers, along with the applications that run on them, which are accessible
over the internet.
Cloud computing is a method of providing computing services to customers using the cloud. These services
include infrastructure, platforms and/or software. For example, customers can subscribe to computing
services like servers, applications, tools, storage, databases, etc. in the cloud and can use them from
anywhere, at any time, over the internet.
By using cloud computing, users and companies don’t have to manage servers and applications by
themselves on their own physical machines.
On-Premise Computing
In on-premise computing, servers and applications are hosted and managed in-house (in your own data
center). Application implementation, network configuration, integration, deployment, security setup,
backup, maintenance, etc. are all managed by the internal team. There is no third-party involvement and the
entire ownership is with you. The advantage is that you have complete data visibility and control.
On-premise data centers normally have single tenancy. Scaling up may require you to
make additional upfront investments in hardware and software. To achieve high availability, the necessary
redundant infrastructure must be set up as well, which can further increase costs.
Virtualization (VM)
Cloud computing is made possible by virtualization technology. Virtualization allows the creation of
simulated virtual computers that behave like physical computers. Such computers are called Virtual
Machines (VMs). Although multiple VMs can be created on one physical server, they work as isolated,
independent machines, and their files and other resources are not visible to one another. Virtualization allows
more efficient use of hardware by running multiple VMs on the same hardware and serving multiple
customers or customer applications at the same time. This in turn helps to reduce the cost of computing.
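The idea of several isolated VMs sharing one physical server can be sketched as a toy Python model. It does not use any real hypervisor API: the PhysicalServer and VirtualMachine classes, and the CPU and memory figures, are hypothetical and only mirror the description above.

```python
class VirtualMachine:
    def __init__(self, name: str, cpus: int, memory_gb: int) -> None:
        self.name = name
        self.cpus = cpus
        self.memory_gb = memory_gb
        self.files: dict = {}   # private to this VM, invisible to the others


class PhysicalServer:
    """One physical machine whose hardware is shared out among VMs."""

    def __init__(self, cpus: int, memory_gb: int) -> None:
        self.cpus = cpus
        self.memory_gb = memory_gb
        self.vms: list = []

    def free_cpus(self) -> int:
        return self.cpus - sum(vm.cpus for vm in self.vms)

    def free_memory_gb(self) -> int:
        return self.memory_gb - sum(vm.memory_gb for vm in self.vms)

    def create_vm(self, name: str, cpus: int, memory_gb: int) -> VirtualMachine:
        # A VM can only be created from capacity that is still unallocated.
        if cpus > self.free_cpus() or memory_gb > self.free_memory_gb():
            raise RuntimeError("not enough free capacity on this server")
        vm = VirtualMachine(name, cpus, memory_gb)
        self.vms.append(vm)
        return vm


server = PhysicalServer(cpus=16, memory_gb=64)
vm_a = server.create_vm("customer-a", cpus=4, memory_gb=16)
vm_b = server.create_vm("customer-b", cpus=8, memory_gb=32)

vm_a.files["secret.txt"] = "only visible inside customer-a"
print("secret.txt" in vm_b.files, server.free_cpus(), server.free_memory_gb())
```

Two customers share the same hardware, yet each VM sees only its own files, and the server refuses to hand out more capacity than it has.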
Container
Containers are packages of software that can be developed, shipped and deployed easily. A container packages up
code and all its dependencies so that the application can be deployed and run quickly and reliably in any
computing environment.
A container image is a lightweight, standalone, executable package of software that includes everything
(code, runtime, system tools, libraries and settings) needed to run an application. Containerized software
will run the same, regardless of the infrastructure. Containers isolate the software from its environment
and ensure that it runs uniformly in different environments like development, QA, staging and production.
Containers vs Virtual Machines: Containers are an abstraction at the application level that packages code
and dependencies together. VMs, by contrast, are an abstraction of the physical hardware that turns one server into
many servers. Containers take up less space than VMs, can handle more applications and require fewer
VMs and operating systems.
Docker
Docker is an open source technology, launched in 2013, for building applications based on containers.
Originally built for Linux, Docker now runs on Windows and macOS too. Docker Engine is the runtime
that creates and runs Docker containers. containerd is an industry-standard container runtime that
leverages runc and was created with an emphasis on simplicity, robustness and portability.
Kubernetes (K8s)
Containers are a good option for bundling applications and running them in different environments. However, one
would need to manage these containers manually to ensure that there is no downtime (if one container goes down,
another one should be started), that scalability is not affected (containers should be started or stopped to scale
up or down), and so on. Kubernetes provides a framework to manage these concerns and run distributed systems
resiliently.
Kubernetes, also referred to as K8s, is a portable, extensible, open source system for automating the deployment,
scaling, and management of containerised applications. Kubernetes supports features like automated
rollouts and rollbacks, service discovery and load balancing, storage orchestration, secret and configuration
management, batch execution, horizontal scaling, automatic bin packing, self-healing, etc.
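Self-healing and horizontal scaling both come down to repeatedly comparing the desired state with the actual state and correcting the difference. The Python sketch below is a toy model of that idea only. It is not the Kubernetes API, and the Deployment class, the reconcile method and the "web-N" container names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterator


@dataclass
class Deployment:
    desired_replicas: int
    running: Dict[str, bool] = field(default_factory=dict)  # name -> healthy?
    _ids: Iterator[int] = field(default_factory=count, repr=False)

    def reconcile(self) -> None:
        """Bring the actual state in line with the desired state."""
        # Self-healing: discard containers that have failed.
        for name in [n for n, healthy in self.running.items() if not healthy]:
            del self.running[name]
        # Scale up: start containers until the desired count is reached.
        while len(self.running) < self.desired_replicas:
            self.running[f"web-{next(self._ids)}"] = True
        # Scale down: stop the most recently started containers.
        while len(self.running) > self.desired_replicas:
            self.running.popitem()


app = Deployment(desired_replicas=3)
app.reconcile()                   # starts three containers

app.running["web-1"] = False      # simulate a crash
app.reconcile()                   # the failed container is replaced

app.desired_replicas = 5          # demand grows
app.reconcile()                   # two more containers are started

print(sorted(app.running))
```

The operator only changes the desired replica count or, in the failure case, does nothing at all. The loop takes care of the rest, which is the manual work the first paragraph describes.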
Cloud computing providers (service providers) are companies that provide cloud computing as a service
over the internet. Their services include servers, virtual machines, applications, application development
platforms, storage, databases, networking, etc. Customers (individuals or companies) can subscribe to such
cloud computing services and choose from varying pricing options. Some of the prominent cloud
service providers are:
Google Cloud
Amazon Web Services (AWS)
Microsoft Azure
IBM Cloud
Alibaba Cloud
Rackspace
GoDaddy
VMware
Oracle Cloud
DigitalOcean
Relative merits of cloud computing are many, including cost savings, reliability, availability, scalability,
mobility, faster time to market and increased productivity.
Cost savings: Savings come mostly from lower capital and maintenance costs, i.e. there is no need to
buy costly hardware or software, and fewer people are needed to manage it. Billing for cloud computing
services is usually based on usage (i.e. pay per use).
Reliability: Internally, cloud providers store data and applications in multiple locations for
redundancy and to prevent any data loss.
Availability: Cloud services normally assure 99.999% availability for the applications/services
hosted with them, which means customer services can be made available to their users 24/7.
Mobility: Cloud computing allows users from different parts of the world to connect and work, even
if they are on the move.
Increased productivity: Without spending much time or money on building infrastructure, new
products can be quickly developed and deployed using cloud and cloud tools.
Faster time to market: Newer products or newer versions of the same product can be developed
and delivered to your customers very quickly and easily, beating competitors.
Speed and efficiency: The cloud offers higher computing power, without fixed processor or memory
limits, and allocates resources on demand.
Scalability: Based on application or user requirements, server or computing capacity can be scaled
up or down on demand. Cloud offers unlimited storage too. Cloud provides tremendous flexibility
and agility in rapidly scaling your resources up or down on an “as needed” basis.
Easier backup and restore: You can easily set up backup sites and restore operations in a cloud
environment.
Integration: It is easier to integrate one application with other cloud software or services and build
newer applications quickly and easily.
Security: Cloud providers offer a wide range of advanced online security features and best practices to
ensure data and application security.
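To put the availability figure from the list above in perspective, an availability percentage can be converted into the downtime it still permits. The short Python calculation below assumes a 365-day year; the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24   # assumes a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes per year a service may be down at the given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * HOURS_PER_YEAR * 60


for pct in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}% availability allows about {minutes:.1f} minutes of downtime per year")
```

An availability of 99.999% therefore leaves room for only a little over five minutes of downtime in a whole year, while 99.9% allows almost nine hours.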
Data security is a core concern for many enterprises, such as fin-tech and med-tech companies, when
they have to store their data in the cloud.
Sometimes the performance of the services provided varies from vendor to vendor, affecting customer
service.
Monthly bills may vary widely and bring surprises if you don’t configure and use the services
properly.
Lack of cloud expertise and of available skilled people is still a concern for many companies.
Internet connectivity and bandwidth issues are another concern when using cloud services.
CLOUD COMPUTING PLATFORMS AND TECHNOLOGIES
Cloud computing applications are developed by leveraging platforms and frameworks. Various types of services
are provided, from bare metal infrastructure to customizable applications serving specific purposes.
Amazon Web Services (AWS) –
AWS provides a wide range of cloud IaaS services, from virtual compute, storage,
and networking to complete computing stacks. AWS is well known for its on-demand compute and storage
services, named Elastic Compute Cloud (EC2) and Simple Storage Service (S3). EC2 offers
customizable virtual hardware to the end user, which can be used as the base infrastructure for deploying
computing systems on the cloud. It is possible to choose from a large variety of virtual hardware
configurations, including GPU and cluster instances. EC2 instances are deployed either through the AWS
console, which is a comprehensive Web portal for accessing AWS services, or through the web services API,
which is available for several programming languages. EC2 also offers the capability of saving a specific running instance as
an image, thus allowing users to create their own templates for deploying systems. S3 stores these templates
and delivers persistent storage on demand. S3 is organized into buckets, which contain objects that are
stored in binary form and can be enriched with attributes. End users can store objects of any size, from basic
files to full disk images, and retrieve them from anywhere. In addition to EC2 and S3, a wide range of
services can be leveraged to build virtual computing systems, including networking support, caching
systems, DNS, database support, and others.
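The bucket and object organization described above can be pictured with a small in-memory Python model. This is a toy data structure for illustration only. It is not the AWS API or SDK, and the Bucket and StoredObject classes, the bucket name and the object keys are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StoredObject:
    data: bytes                                                # stored in binary form
    attributes: Dict[str, str] = field(default_factory=dict)   # descriptive attributes


class Bucket:
    """A named container that maps keys to stored objects."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: Dict[str, StoredObject] = {}

    def put(self, key: str, data: bytes, **attributes: str) -> None:
        self._objects[key] = StoredObject(data, dict(attributes))

    def get(self, key: str) -> StoredObject:
        return self._objects[key]

    def keys(self) -> List[str]:
        return sorted(self._objects)


templates = Bucket("machine-templates")
# An object can be anything from a small file to a full disk image.
templates.put("web-server.img", b"\x00\x01\x02", owner="ops", os="linux")
templates.put("notes.txt", b"hello")

obj = templates.get("web-server.img")
print(templates.keys(), obj.attributes)
```

The model only captures the shape of the storage: buckets hold objects, objects are opaque bytes, and attributes travel with each object.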
Google AppEngine –
Google AppEngine is a scalable runtime environment mostly dedicated to executing web applications.
These applications take advantage of Google's large computing infrastructure to scale dynamically with
demand. AppEngine offers both a secure execution environment and a collection of services that simplify the
development of scalable and high-performance Web applications. These services include in-memory
caching, a scalable data store, job queues, messaging, and cron tasks. Developers and engineers can build
and test applications on their own systems by using the AppEngine SDK, which replicates the production
runtime environment and helps test and profile applications. On completion of development, developers
can easily move their applications to AppEngine, set quotas to contain the costs generated, and make them
available to the world. Currently, the supported programming languages are Python, Java, and Go.
Microsoft Azure –
Microsoft Azure is a cloud operating system and a platform on which users can develop applications in
the cloud. In general, it provides a scalable runtime environment for web applications and distributed
applications. Applications in Azure are organized around the concept of roles, which identify a distribution unit for
applications and express the application’s logic. Azure provides a set of additional services that
complement application execution, such as support for storage, networking, caching, content delivery, and
others.
Hadoop –
Apache Hadoop is an open source framework that is suited to processing large data sets on
commodity hardware. Hadoop is an implementation of MapReduce, an application programming model
developed by Google. This model provides two fundamental operations for data processing: map
and reduce. Yahoo! is the sponsor of the Apache Hadoop project, and has put considerable effort into
transforming the project into an enterprise-ready cloud computing platform for data processing. Hadoop is an
integral part of the Yahoo! cloud infrastructure and supports many of the company's business processes.
Currently, Yahoo! manages the world’s largest Hadoop cluster, which is also available to academic
institutions.
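The two operations of the MapReduce model can be illustrated with a word count written in plain Python. This is a single-process sketch of the programming model only. It does not use Hadoop, and the function names and sample documents are assumptions made for the example. In a real cluster the map and reduce calls would run in parallel on many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


documents = ["the cloud stores data", "the cloud scales"]

pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

Because each map call looks at one record and each reduce call at one key, the work can be split across commodity machines without the functions knowing about each other.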
Force.com and Salesforce.com –
Force.com is a cloud computing platform on which users can develop social enterprise applications. The
platform is the basis of SalesForce.com, a Software-as-a-Service solution for customer relationship
management. Force.com allows applications to be created by composing ready-to-use blocks: a complete set of
components supporting all the activities of an enterprise is available. Force.com supports everything from
the design of the data layout to the definition of business rules and the user interface. The platform is
completely hosted in the cloud, and provides complete access to its functionalities, and to those implemented
in the hosted applications, through Web services technologies.
CLOUD MODELS
Certain services and models work behind the scenes to make cloud
computing feasible and accessible to end users. The working models for cloud
computing are:
Service Models
Deployment Models
SERVICE MODELS
Deployment models
Because cloud technology provides users with so many benefits, cloud environments need
to be categorized based on users' requirements. A cloud deployment model represents the exact
category of cloud environment based on proprietorship, size, and access, and also describes the
nature and purpose of the cloud. Most organizations implement cloud infrastructure to
minimize capital expenditure and regulate operating costs.
Public Cloud
Community Cloud
Hybrid Cloud
Private Cloud
Public cloud
Public cloud or external cloud describes cloud computing in the traditional mainstream
sense, whereby resources are dynamically provisioned on a fine-grained, self-service basis
over the Internet, via web applications/web services from an off-site third-party provider
who bills on a fine-grained utility computing basis.
The cloud infrastructure is made available to the general public or a large industry group,
and is owned by an organization selling cloud services. Examples: Amazon Elastic
Compute Cloud, IBM's BlueCloud, Sun Cloud, Google AppEngine.
Community cloud
A community cloud may be established where several organizations have similar
requirements and seek to share infrastructure so as to realize some of the benefits of cloud
computing. With the costs spread over fewer users than a public cloud (but more than a single
tenant) this option is more expensive but may offer a higher level of privacy, security and/or
policy compliance. Examples of community cloud include Google’s "Gov Cloud".
Hybrid cloud
The term "Hybrid Cloud" has been used to mean either two separate clouds joined
together (public, private, internal or external), or a combination of virtualized cloud
server instances used together with real physical hardware.
The most correct definition of the term "Hybrid Cloud" is probably the use of physical
hardware and virtualized cloud server instances together to provide a single common
service. Two clouds that have been joined together are more correctly called a "combined
cloud".
A hybrid storage cloud uses a combination of public and private storage clouds. Hybrid
storage clouds are often useful for archiving and backup functions, allowing local data to
be replicated to a public cloud.
Private cloud
A private cloud is a particular model of cloud computing that involves a distinct and secure
cloud-based environment in which only the specified client can operate. As with other cloud
models, private clouds provide computing power as a service within a virtualized
environment using an underlying pool of physical computing resources.
However, under the private cloud model, the cloud (the pool of resources) is only accessible
by a single organization, providing that organization with greater control and privacy.
The possible dependencies between CaaS, SaaS, PaaS & IaaS are as follows:
CLOUD SERVICES EXAMPLES
1. Scalable Usage
Cloud computing offers scalable resources through various subscription models. This
means that you will only need to pay for the computing resources you use. This helps in
managing spikes in demands without the need to permanently invest in computer hardware.
Netflix, for instance, leverages this potential of cloud computing to its advantage. Due to
its on-demand streaming service, it faces large surges in server load at peak times. The move to
migrate from in-house data centres to cloud allowed the company to significantly expand its
customer base without having to invest in setup and maintenance of costly infrastructure.
2. Chatbots
The expanded computing power and capacity of the cloud enables us to store information
about user preferences. This can be used to provide customized solutions, messages and
products based on the behaviour and preferences of users.
Siri, Alexa and Google Assistant - all are cloud-based natural-language intelligent bots.
These chatbots leverage the computing capabilities of the cloud to provide personalized
context-relevant customer experiences. The next time you say, “Hey Siri!” remember that there is a
cloud-based AI solution behind it.
3. Communication
The cloud allows users to enjoy network-based access to communication tools like
emails and calendars. Most of the messaging and calling apps like Skype and WhatsApp are
also based on cloud infrastructure. All your messages and information are stored on the service
provider’s hardware rather than on your personal device. This allows you to access your
information from anywhere via the internet.
4. Productivity
Office tools like Microsoft Office 365 and Google Docs use cloud computing, allowing
you to use your most-productive tools over the internet. You can work on your documents,
presentations and spreadsheets - from anywhere, at any time. With your data stored in the cloud,
you don’t need to worry about data loss if your device is stolen, lost or damaged. The cloud
also helps in sharing documents and enables different individuals to work on the same
document at the same time.
5. Business Process
Many business management applications like customer relationship management (CRM)
and enterprise resource planning (ERP) are also hosted by a cloud service provider. Software as a
Service (SaaS) has become a popular method for deploying enterprise-level software.
Salesforce, Hubspot, Marketo etc. are popular examples of this model. This method is
cost-effective and efficient for both the service provider and customers. It ensures hassle-free
management, maintenance and security of your organization’s critical business resources and
allows you to access these applications conveniently via a web browser.
6. Backup and recovery
When you choose the cloud for data storage, responsibility for your information also lies
with your service provider. This saves you the capital outlay for building and maintaining
infrastructure. Your cloud service provider is responsible for securing data and meeting legal and
compliance requirements.
The cloud also provides more flexibility in the sense that you can enjoy large storage and
on-demand backups. Recovery is also performed faster in the cloud because the data is stored
over a network of physical servers rather than at one on-site data centre. Dropbox, Google Drive
and Amazon S3 are popular examples of cloud backup solutions.
7. Application development
Whether you are developing an application for web or mobile or even games, cloud
platforms prove to be a reliable solution. Using cloud, you can easily create scalable cross-
platform experiences for your users. These platforms include many pre-coded tools and libraries
— like directory services, search and security. This can speed up and simplify the development
process. Amazon Lumberyard is a popular mobile game development tool used in the cloud.
Cloud computing enables data scientists to tap into any organizational data to analyze it
for patterns and insights, find correlations, make predictions, forecast future crises and help in
data-backed decision making. Cloud services make mining massive amounts of data possible by
providing higher processing power and sophisticated tools.
There are many open source big data tools that are based on the cloud, for instance
Hadoop, Cassandra, HPCC etc. Without the cloud, it would be very difficult to collect and
analyze data in real time, especially for small companies.
A cloud application, or cloud app, is a software program where cloud-based and local
components work together. This model relies on remote servers for processing logic that is
accessed through a web browser with a continual internet connection.
Cloud application servers typically are located in a remote data center operated by a
third-party cloud services infrastructure provider. Cloud-based application tasks may encompass
email, file storage and sharing, order entry, inventory management, word processing, customer
relationship management (CRM), data collection, or financial accounting features.
Instant scalability
As demand rises or falls, available capacity can be adjusted.
API use
Third-party data sources and storage services can be accessed with an application
programming interface (API). Cloud applications can be kept smaller by using APIs to hand
data to applications or API-based back-end services for processing or analytics computations,
with the results handed back to the cloud application. Vetted APIs impose passive
consistency that can speed development and yield predictable results.
Gradual adoption.
Refactoring legacy, on-premises applications to a cloud architecture in steps allows
components to be implemented on a gradual basis.
Reduced costs.
The size and scale of data centers run by major cloud infrastructure and service providers,
along with competition among providers, have led to lower prices. Cloud-based applications
can be less expensive to operate and maintain than equivalent on-premises installations.
Improved data sharing and security.
Data stored on cloud services is instantly available to authorized users. Due to their
massive scale, cloud providers can hire world-class security experts and implement
infrastructure security measures that typically only large enterprises can obtain. Centralized
data managed by IT operations personnel is more easily backed up on a regular schedule and
restored should disaster recovery become necessary.
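The "instant scalability" benefit above can be illustrated with a minimal sketch. It assumes a simple rule in which every server handles a fixed number of requests per second; the function name, the capacity figure and the demand values are hypothetical and do not come from any provider's API.

```python
import math


def servers_needed(requests_per_second, capacity_per_server, min_servers=1):
    """Return how many servers are required for the current demand.

    capacity_per_server is an assumed, fixed throughput per server.
    """
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    needed = math.ceil(requests_per_second / capacity_per_server)
    # Never drop below the minimum pool size, even when demand is zero.
    return max(min_servers, needed)


# Demand rises and then falls; the planned capacity follows it.
demand = [120, 450, 980, 300]
plan = [servers_needed(d, capacity_per_server=200) for d in demand]
print(plan)
```

A real autoscaler would also smooth short spikes and wait between adjustments; those details are left out here.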
Testing cloud applications prior to deployment is essential to ensure security and optimal
performance.
A cloud application must consider internet communications with numerous clouds and a
likelihood of accessing data from multiple sources simultaneously. Using API calls, a cloud
application may rely on other cloud services for specialized processing. Automated testing can
help in this multicloud, multisource and multiprovider ecosystem.
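As a rough sketch of the automated testing mentioned above, the example below replaces a remote cloud service with a stand-in object so the application logic can be tested without network access. The function `summarize_order`, the `get_price` method and the 10% tax rate are hypothetical names and values chosen for illustration.

```python
import unittest
from unittest.mock import Mock


def summarize_order(order_id, pricing_service):
    """Application logic that delegates the price lookup to a remote service."""
    price = pricing_service.get_price(order_id)
    # Assumed 10% tax, for illustration only.
    return {"order_id": order_id, "price": price, "taxed": round(price * 1.1, 2)}


class SummarizeOrderTest(unittest.TestCase):
    def test_uses_remote_price(self):
        # The Mock stands in for the third-party API client.
        fake_service = Mock()
        fake_service.get_price.return_value = 50.0

        summary = summarize_order("A-1", fake_service)

        fake_service.get_price.assert_called_once_with("A-1")
        self.assertEqual(summary["taxed"], 55.0)


# Run the test case explicitly so the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SummarizeOrderTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

The same pattern can be repeated for each external service the application calls, which keeps tests fast and independent of any single provider.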
The maturation of container and microservices technologies has introduced additional
layers of testing and potential points of failure and communication. While containers can
simplify application development and provide portability, a proliferation of containers introduces
additional complexity.
Containers must be managed, cataloged and secured, with each tested for its own
performance, security and accuracy. Similarly, as legacy monolithic applications that perform
numerous, disparate tasks are refactored into many single-task microservices that must
interoperate seamlessly and efficiently, test scripts and processes grow correspondingly complex
and time-consuming.
Cloud Based Services
Cloud computing is the use of various services, such as software development
platforms, servers, storage and software, over the internet, often referred to as the "cloud".
Companies offering these computing services are called cloud providers and typically charge for
cloud computing services based on usage.
1. Software As A Service
Software-as-a-Service (SaaS) is a way of delivering services and applications over the
Internet. Instead of installing and maintaining software, we simply access it via the Internet,
freeing ourselves from complex software and hardware management. It removes the need to
install and run applications on our own computers or in our own data centers, eliminating the
expense of hardware as well as software maintenance. SaaS provides a complete software solution
which you purchase on a pay-as-you-go basis from a cloud service provider. Most SaaS applications
can be run directly from a web browser without any downloads or installations required. SaaS
applications are sometimes called Web-based software, on-demand software, or hosted
software.
Advantages of SaaS
1. Cost Effective :
Pay only for what you use.
2. Reduced time :
Users can run most SaaS apps directly from their web browser without needing to
download and install any software. This reduces the time spent on installation and configuration,
and can reduce the issues that get in the way of software deployment.
3. Accessibility :
We can access app data from anywhere.
4. Automatic updates :
Rather than purchasing new software, customers rely on a SaaS provider to automatically
perform the updates.
5. Scalability :
It allows the users to access the services and features on demand.
The various companies providing software as a service are Cloud9 Analytics, Salesforce.com,
CloudSwitch, Microsoft Office 365, Eloqua, Dropbox and CloudTran.
2. Platform As A Service
PaaS is a category of cloud computing that provides a platform and environment to allow
developers to build applications and services over the internet.
PaaS services are hosted in the cloud and accessed by users simply via their web browser.
A PaaS provider hosts the hardware and software on its own infrastructure.
As a result, PaaS frees users from having to install in-house hardware and software to
develop or run a new application. Thus, the development and deployment of the
application take place independently of the hardware.
The consumer does not manage or control the underlying cloud infrastructure including
network, servers, operating systems, or storage, but has control over the deployed
applications and possibly configuration settings for the application-hosting environment.
Advantages of PaaS:
1. Simple and convenient for users :
It provides much of the infrastructure and other IT services, which users can access
anywhere via a web browser.
2. Cost Effective :
It charges for the services provided on a per-use basis thus eliminating the expenses one
may have for on-premises hardware and software.
3. Efficiently managing the lifecycle :
It is designed to support the complete web application lifecycle: building, testing,
deploying, managing and updating.
4. Efficiency :
It allows for higher-level programming with reduced complexity; thus, the overall
development of the application can be more effective.
The various companies providing Platform as a service are Amazon Web Services, Salesforce,
Windows Azure, Google App Engine, CloudBees and IBM SmartCloud.
3. Infrastructure As A Service
Infrastructure as a service (IaaS) is a service model that delivers computer infrastructure
on an outsourced basis to support various operations.
Typically, IaaS is a service in which infrastructure such as networking equipment,
devices, databases and web servers is provided to enterprises on an outsourced basis.
Infrastructure as a service (IaaS) is also known as Hardware as a service (HaaS).
IaaS customers pay on a per-use basis, typically by the hour, week or month. Some
providers also charge customers based on the amount of virtual machine space they use.
It simply provides the underlying operating systems, security, networking, and servers for
developing such applications, services, and for deploying development tools, databases, etc.
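The per-use billing described above can be made concrete with a small calculation. This is a sketch under assumed prices: the function name, the hourly rate and the storage rate are hypothetical and do not reflect any provider's actual price list.

```python
def iaas_cost(hours_used, hourly_rate, storage_gb=0.0, storage_rate_per_gb=0.0):
    """Estimate a bill from compute hours plus virtual machine storage space.

    All rates are assumed values supplied by the caller.
    """
    if hours_used < 0 or storage_gb < 0:
        raise ValueError("usage values must not be negative")
    compute = hours_used * hourly_rate
    storage = storage_gb * storage_rate_per_gb
    return round(compute + storage, 2)


# One instance running for a 30-day month (720 hours) with 100 GB of disk.
monthly = iaas_cost(hours_used=720, hourly_rate=0.05,
                    storage_gb=100, storage_rate_per_gb=0.10)
print(monthly)
```

Because the bill follows usage, stopping the instance for half of the month would roughly halve the compute part of the charge.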
Advantages of IaaS :
1. Cost Effective :
Eliminates capital expense and reduces ongoing cost. IaaS customers pay on a per-use
basis, typically by the hour, week or month.
2. Website hosting :
Running websites using IaaS can be less expensive than traditional web hosting.
3. Security :
The IaaS Cloud Provider may provide better security than your existing software.
4. Maintenance :
There is no need to manage the underlying data center or the introduction of new releases
of the development or underlying software. This is all handled by the IaaS Cloud Provider.
The various companies providing Infrastructure as a service are Amazon Web Services,
Bluestack, IBM, OpenStack, Rackspace and VMware.
This provides lots of latitude for implementing cloud-based IT solutions and ensures there
will be lots of competition among suppliers for both the underlying resources and the provision
of services.
5. Cloud-based systems
Fundamental to cloud computing is the idea that what is delivered to the customer is
services, not systems consisting of dedicated hardware and software. Under the cloud computing
"covers" there may be many components that are shared by many customers.
This includes security services, administrative services, ecosystem services and performance
services. The vision is to make cloud IT pervasive and to achieve both the digital economy (for
business) and the digital society (for the public) leading to the "Digitally Interconnected
Society".
8. Overarching concerns
There are a number of overarching considerations that are generally applicable to any cloud
computing deployment and which have a major impact on the success of any cloud-based
system. For example:
Governance and management: auditability, governance, regulatory, privacy, security and
service levels (SLAs);
Qualities: availability, performance and resiliency;
Capabilities: interoperability, maintainability, portability and reversibility
9. Cloud-specific risks
As with any new technology, there are business risks associated with cloud computing, both
for providers and customers. While the most visible of these has so far been security, there are
other important things to keep in mind, including:
Supplier quality, longevity and lock-in
Available expertise – technical, organizational and operational
Adaptation of business processes to available services
Financial management including changes in purchasing and variable bills
Exploitation and innovation
In sum, there are many things to consider as you prepare to include cloud computing in
your IT solutions. These vary according to the role you will play, the services that are being
used and the maturity of your IT organization. As part of developing your policies and roadmaps
for cloud computing, I recommend creating a centre of cloud computing excellence to kick start
your journey.
Cloud technology
Cloud Computing Technologies (CCT) is a cloud systems integrator and cloud service
provider. CCT specializes in cloud systems aggregation, cross-cloud platform integration,
application API integration, software development, and management of your cloud
ecosystem. Our professional cloud services include cloud systems design and
implementation (private cloud, public cloud, or hybrid cloud), and migration to shared services
and on-premises private cloud infrastructures.
As your single point of contact for cloud integration, we explain third-party cloud service
level agreements, pricing models, and contracts as your trusted adviser.
For organizations that seek do-it-yourself cloud shared services solutions, CCT offers secure,
scalable, and on-demand cloud service through our enterprise-level cloud partners, such as Amazon
Web Services Platform-as-a-Service (PaaS).
At all Cloud Computing Technologies services levels, we are proud of our track record of
delivering high-impact public cloud service with excellent customer satisfaction.
Our mission is "To provide high-quality Cloud Computing Shared Services Solutions to
accomplish our clients' business goals and develop long-term relationships.
Our commitment is to continuous improvement of shared services, deliverables, and
competitive pricing with current and emerging cloud computing technology."
Para-virtualization
Uses a virtual machine monitor, which is software that allows a single
physical machine to support multiple virtual machines. It allows multiple virtual machines to
run on one host, and each instance of a guest program is executed independently on its own
virtual machine.
Isolation
Is similar to para-virtualization, although it only allows virtualization of the same
operating system as the host and only supports Linux systems. It is, however, considered to
perform the best and operate the most efficiently.
As more businesses start to move to the cloud, they should be aware of the many
challenges that the technology is currently experiencing. It is important that they are prepared to
encounter some of these challenges during their migration towards cloud technologies. Cloud
computing has been around for many years and has always been clouded in ambiguity as to what
the technology is, with many individuals providing their own interpretations and opinions
when defining the various cloud delivery models.
This is very much to do with lack of standards and a clear definition of each aspect of
what cloud technology is and how it actually functions. Many cloud computing providers
admitted they consider standards as the first step to commoditization, something they would
rather not see this early in the emerging market. So the lack of standards is partially to do with
many cloud providers not wanting them defined yet, which is certainly going to cause more
ambiguity and possibly slow down adoption of cloud technologies.
Although cloud consumers do not have control over the underlying computing resources, they do
need to ensure the quality, availability, reliability, and performance of these resources once
they have migrated their core business functions onto their entrusted cloud [13]. Cloud
service providers need to be transparent and responsible for the services they provide in order
to create consumer confidence in their services.
Consumer confidence can be achieved through a mutual agreement commonly referred to
as a Service Level Agreement (SLA). When consumers migrate to the cloud service provider's
infrastructure, the provider takes on a large responsibility for maintaining the consumers' data
and services and making them available to the specification outlined in the SLA.
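An SLA often states availability as a percentage, and it can help to translate that figure into permitted downtime. The calculation below is a sketch that assumes a 30-day billing period; the function name and the availability targets are illustrative and are not taken from any particular SLA.

```python
def allowed_downtime_minutes(availability_percent, period_hours=30 * 24):
    """Return the downtime, in minutes, that an availability target permits."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = 1 - availability_percent / 100
    return period_hours * 60 * unavailable_fraction


# Compare a few commonly quoted availability targets over a 30-day period.
for target in (99.0, 99.9, 99.99):
    print(target, round(allowed_downtime_minutes(target), 1))
```

Under these assumptions a 99.9% target allows roughly 43 minutes of downtime in 30 days, while 99% allows about 7 hours, which shows why each extra "nine" matters to a consumer running core business functions.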
A broker could lose $4 million in revenues per millisecond if their electronic trading
platform is 5 milliseconds behind the competition [14], which is another reason why the
consumer must be confident that their cloud provider can and will deliver a high-quality
infrastructure. Data governance is a large issue for consumers because when they migrate their
systems to the cloud service provider's (CSP's) infrastructure they lose control of their own data
and rely solely on the CSP's ability to make the data available and keep it secure in the process.
The physical location of the CSP's data centers can also make a large difference in
terms of security and confidentiality. The US Patriot Act grants government and other agencies
virtually limitless powers to access information, including that belonging to companies,
whereas in the EU this type of data would be much more secure. It is important that consumers
take this into account when selecting a cloud service provider.
Future of the technology
As cloud computing is a relatively new delivery model, its future is not fully known, but
since its popularity and the excitement around the technology are constantly growing, it is safe
to say that cloud computing is here to stay.
Short-term forecasts predict that in 2012, 80% of new commercial enterprise apps will be
deployed on cloud platforms, which illustrates that cloud adoption is set to rise exponentially
this year.
In the long term, technology experts and stakeholders say they expect they will 'live
mostly in the cloud' in 2020 and not on the desktop, working mostly through cyberspace-based
applications accessed through networked devices.
The many stakeholders and enthusiasts of the technology see it as the next step in
computing, with many businesses and individual users in the future using cloud technology in
some shape or form. Using cloud technologies
will become even more popular as our network infrastructure is improved, allowing less latency
and quicker connections to the content on the cloud.
Cloud computing technologies are for everyone: they benefit the common user as much
as they benefit stakeholders, business leaders and academics, since cloud computing has the
potential to reduce cost and risk, increase revenue, and enhance the total customer experience.
A number of trends have been projected, for example that integrated public and private
cloud infrastructure will become possible in 2012, and many will take advantage of it.
This will be possible with emerging technologies such as vCloud Connector, which lets
users running workloads on internal VMware infrastructure slide all or part of those workloads
into a leased public cloud running the same infrastructure, allowing communication between
private and public clouds. Businesses will want to share their information, services and
infrastructure with other clouds, which means that clouds are going to move towards a cloud
network.
This will facilitate collaboration on projects or engagements across enterprises and
enable conference calls that include temporary, controlled access to internal information systems,
knowledge bases or information distribution systems which are usually accessible only to
employees. All the common problems outlined in the previous section will have to be addressed,
with security being the largest challenge, as it can influence the cloud market and also drive
trends. There is a concern about cyber gangs hacking into commercial and military systems.
UNIT – 1
1 MARKS
1. What is cloud computing replacing?
A) Too expensive
B) Security concerns (answer)
D) Accessibility
A) Google
B) Amazon
C) Blackboard (answer)
D) Microsoft
A) True (answer)
B) False
C) Subscription
D) Ladder (answer)
E) Perpetual license
A) Hardware as a service (answer)
B) Platform as a service
C) Software as a service
D) Infrastructure as a service
A) True (answer)
B) False
A) Google 101
C) Microsoft Azure
D) Amazon EC2
A) Wireless
B) Hard drives
C) People
D) Internet (answer)
10. Which of these should a company consider before implementing cloud computing technology?
A) Employee satisfaction
C) Information sensitivity
5 MARKS
1. Write short notes on cloud computing?
2. Write about cloud models?
3. Explain cloud based services.
4. What is meant by cloud technology? Explain it.
10 MARKS
VIRTUALIZATION
1) Hardware Virtualization:
When the virtual machine software or virtual machine manager (VMM) is directly
installed on the hardware system, it is known as hardware virtualization. The main job of the
hypervisor is to control and monitor the processor, memory and other hardware resources.
Hardware virtualization is mainly done for server platforms, because controlling
virtual machines is much easier than controlling a physical server.
2) Operating System Virtualization:
When the virtual machine software or virtual machine manager (VMM) is installed on
the host operating system instead of directly on the hardware system, it is known as operating
system virtualization. Operating system virtualization is mainly used for testing
applications on different OS platforms.
3) Server Virtualization:
When the virtual machine software or virtual machine manager (VMM) is directly
installed on the server system, it is known as server virtualization. Server virtualization is done
because a single physical server can be divided into multiple servers on demand and
for balancing the load.
4) Storage Virtualization:
Storage virtualization is the process of grouping the physical storage from multiple
network storage devices so that it looks like a single storage device. Storage virtualization is
also implemented by using software applications. Storage virtualization is mainly done for
back-up and recovery purposes.
CHARACTERISTICS
On-demand self-service:
The cloud computing services do not require any human administrators; users
themselves are able to provision, monitor and manage computing resources as
needed.
Rapid elasticity:
The computing services should have IT resources that are able to scale out and
in quickly and on an as-needed basis. Whenever the user requires services, they are
provided, and they are scaled back in as soon as the requirement is over.
Resource pooling:
The IT resources (e.g., networks, servers, storage, applications, and services)
present are shared across multiple applications and tenants in an uncommitted
manner. Multiple clients are served from the same physical resource.
Measured service:
Resource utilization is tracked for each application and tenant, which
provides both the user and the resource provider with an account of what has been
used. This is done for various reasons, such as monitoring, billing and effective use of
resources.
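The measured service characteristic can be sketched as a small usage meter that keeps a running total per tenant and resource. The class name `UsageMeter`, the tenant identifiers and the resource names are hypothetical; a real provider's metering system would be far more detailed.

```python
from collections import defaultdict


class UsageMeter:
    """Tracks resource usage per tenant so it can be monitored and billed."""

    def __init__(self):
        # (tenant, resource) -> accumulated units
        self._usage = defaultdict(float)

    def record(self, tenant, resource, units):
        if units < 0:
            raise ValueError("units must not be negative")
        self._usage[(tenant, resource)] += units

    def report(self, tenant):
        """Return the account of what one tenant has used so far."""
        return {res: units
                for (name, res), units in self._usage.items()
                if name == tenant}


meter = UsageMeter()
meter.record("tenant-a", "cpu_hours", 3)
meter.record("tenant-a", "cpu_hours", 2)
meter.record("tenant-a", "storage_gb", 50)
meter.record("tenant-b", "cpu_hours", 1)
print(meter.report("tenant-a"))
```

Both the user and the provider can read the same report, which is what makes usage-based billing verifiable.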
Virtualization is a technique for separating a service from the underlying physical delivery
of that service. It is the process of creating a virtual version of something, such as computer hardware.
It was initially developed during the mainframe era. It involves using specialized software to create a
virtual or software-created version of a computing resource rather than the actual version of the same
resource. With the help of virtualization, multiple operating systems and applications can run on the
same machine and the same hardware at the same time, increasing the utilization and flexibility of hardware.
In other words, virtualization is one of the main cost-effective, hardware-reducing, and energy-saving
techniques used by cloud providers. Virtualization allows a single physical instance of a
resource or an application to be shared among multiple customers and organizations at one time. It does
this by assigning a logical name to a physical storage resource and providing a pointer to that physical
resource on demand. The term virtualization is often synonymous with hardware virtualization, which plays a
fundamental role in efficiently delivering Infrastructure-as-a-Service (IaaS) solutions for cloud
computing. Moreover, virtualization technologies provide a virtual environment not only for executing
applications but also for storage, memory, and networking.
The machine on which the virtual machine is going to be built is known as the Host Machine, and
that virtual machine is referred to as a Guest Machine.
Types of Virtualization:
1.Application Virtualization.
2.Network Virtualization.
3.Desktop Virtualization.
4.Storage Virtualization.
5.Server Virtualization.
6.Data virtualization.
1. Application Virtualization:
Application virtualization gives a user remote access to an application hosted on a server. The server
stores all personal information and other characteristics of the application, but the application can still
run on a local workstation through the internet. An example would be a user who needs to run two different
versions of the same software. Technologies that use application virtualization are hosted applications
and packaged applications.
2. Network Virtualization:
Network virtualization is the ability to run multiple virtual networks, each with a separate control and
data plane, that co-exist on top of one physical network. The virtual networks can be managed by individual
parties that are potentially kept confidential from each other.
Network virtualization provides a facility to create and provision virtual networks (logical switches,
routers, firewalls, load balancers, Virtual Private Networks (VPNs), and workload security) within days or
even weeks.
3. Desktop Virtualization:
Desktop virtualization allows the users’ OS to be remotely stored on a server in the data centre. It allows
the user to access their desktop virtually, from any location by a different machine. Users who want
specific operating systems other than Windows Server will need to have a virtual desktop. Main benefits
of desktop virtualization are user mobility, portability, easy management of software installation,
updates, and patches.
4. Storage Virtualization:
Storage virtualization is an array of servers that are managed by a virtual storage system. The servers
aren't aware of exactly where their data is stored, and instead function more like worker bees in a hive. It
allows storage from multiple sources to be managed and utilized as a single repository. Storage
virtualization software maintains smooth operations, consistent performance and a continuous suite of
advanced functions despite changes, breakdowns and differences in the underlying equipment.
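The idea of presenting several devices as a single repository can be sketched as follows. This is a simplified illustration, not how any storage product works: the class `StoragePool`, the device names and the first-fit placement rule are assumptions, and a volume is never split across devices.

```python
class StoragePool:
    """Presents several storage devices as one logical store (illustrative)."""

    def __init__(self, devices):
        # devices: mapping of device name -> free capacity in GB
        self._free = dict(devices)
        # logical volume name -> (device, size in GB)
        self._location = {}

    @property
    def total_free_gb(self):
        return sum(self._free.values())

    def allocate(self, volume, size_gb):
        """Place a volume on the first device with enough free space."""
        for device, free in self._free.items():
            if free >= size_gb:
                self._free[device] -= size_gb
                self._location[volume] = (device, size_gb)
                return volume
        raise RuntimeError("no single device has enough free space")

    def locate(self, volume):
        # Only the pool knows the physical location; callers use the name.
        return self._location[volume][0]


pool = StoragePool({"nas-1": 100, "nas-2": 500})
pool.allocate("backups", 300)
print(pool.locate("backups"), pool.total_free_gb)
```

The caller only ever refers to the logical volume name, so the underlying device could be replaced or rebalanced without changing client code.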
5. Server Virtualization:
This is a kind of virtualization in which server resources are masked. Here, the central
server (physical server) is divided into multiple virtual servers by changing the identity number and
processors, so each system can run its own operating system in isolation, while each
sub-server knows the identity of the central server. It increases performance and reduces
operating cost by deploying main server resources as sub-server resources. It is beneficial for
virtual migration, reducing energy consumption, reducing infrastructure cost, etc.
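Dividing one physical server into several virtual servers can be sketched as simple bookkeeping of CPU and memory. The class `PhysicalServer`, the guest names and the no-overcommit rule are assumptions made for this illustration; real hypervisors commonly allow overcommitting resources.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalServer:
    cpus: int
    memory_gb: int
    # guest name -> (cpus, memory_gb) reserved for that virtual server
    guests: dict = field(default_factory=dict)

    def free(self):
        """Return the (cpus, memory_gb) not yet assigned to any guest."""
        used_cpus = sum(c for c, _ in self.guests.values())
        used_mem = sum(m for _, m in self.guests.values())
        return self.cpus - used_cpus, self.memory_gb - used_mem

    def create_guest(self, name, cpus, memory_gb):
        """Reserve resources for a new virtual server if they are available."""
        if name in self.guests:
            raise ValueError(f"guest {name!r} already exists")
        free_cpus, free_mem = self.free()
        if cpus > free_cpus or memory_gb > free_mem:
            return False  # assumed policy: no overcommitment
        self.guests[name] = (cpus, memory_gb)
        return True


host = PhysicalServer(cpus=16, memory_gb=64)
results = [
    host.create_guest("web", 4, 16),
    host.create_guest("db", 8, 32),
    host.create_guest("batch", 8, 8),  # needs more CPUs than remain
]
print(results, host.free())
```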
6. Data virtualization:
This is the kind of virtualization in which data is collected from various sources and managed in
a single place, without the user needing to know technical details such as how the data is collected,
stored and formatted. The data is then arranged logically so that a virtual view of it can be accessed
remotely by interested people, stakeholders and users through various cloud services. Many large
companies, such as Oracle, IBM, AtScale and CData, provide such services.
Data-integration
Business-integration
Service-oriented architecture data-services
Searching organizational data
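A minimal sketch of data virtualization is a single view that reads from differently formatted sources without the consumer knowing how each one is stored. The two in-memory sources, the field names and the function names below are all made up for illustration.

```python
import csv
import io
import json

# Two hypothetical sources holding the same kind of record in different formats.
CSV_SOURCE = "id,name\n1,Asha\n2,Ravi\n"
JSON_SOURCE = '[{"id": 3, "name": "Mei"}]'


def read_csv_source(text):
    for row in csv.DictReader(io.StringIO(text)):
        yield {"id": int(row["id"]), "name": row["name"]}


def read_json_source(text):
    for item in json.loads(text):
        yield {"id": int(item["id"]), "name": item["name"]}


def customer_view():
    """Unified, read-only view over both sources in one common shape."""
    yield from read_csv_source(CSV_SOURCE)
    yield from read_json_source(JSON_SOURCE)


names = [customer["name"] for customer in customer_view()]
print(names)
```

The consumer of `customer_view` sees one logical data set; adding a third source would only require another reader function.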
PROS AND CONS OF VIRTUALIZATION IN CLOUD COMPUTING
Virtualization is the creation of a virtual version of something, such as a server, desktop, storage
device, operating system, etc.
Host Machine –
The machine on which the virtual machine is created is known as the Host Machine.
Guest Machine –
The virtual machines which are created on the Host Machine are called Guest Machines.
It is very costly to provide dedicated physical services to each customer on rent, and users
will not fully use those services. This problem can be solved by virtualization. It is an effective
approach not only for efficient use of physical services but also for reducing vendors' costs.
Cloud vendors can thus virtualize a single big server and provide smaller-spec servers
to multiple customers.
CONS OF VIRTUALIZATION :
puts our data in a vulnerable condition. Any hacker can attack our data or try to gain unauthorized access.
Without a security solution our data is under threat.
Learning New Infrastructure –
As organizations shift from servers to the cloud, they require skilled staff who can work with the cloud
easily. Either they hire new IT staff with the relevant skills or they provide training in those skills, which
increases the company's costs.
High Initial Investment –
It is true that virtualization reduces companies' costs, but it is also true that the cloud has a high initial
investment. It provides numerous services which may not be required, and when an unskilled organization
tries to set up in the cloud it may purchase services that it does not need.
ARCHITECTURE
REFERENCE MODEL
The cloud computing reference model is an abstract model that characterizes and
standardizes the functions of a cloud computing environment by partitioning it into abstraction
layers and cross-layer functions. This reference model groups the cloud computing functions
and activities into five logical layers and three cross-layer functions.
The five layers are physical layer, virtual layer, control layer, service orchestration layer,
and service layer. Each of these layers specifies various types of entities that may exist in a
cloud computing environment, such as compute systems, network devices, storage devices,
virtualization software, security mechanisms, control software, orchestration software,
management software, and so on. It also describes the relationships among these entities.
The three cross-layer functions are business continuity, security, and service
management. Business continuity and security functions specify various activities, tasks, and
processes that are required to offer reliable and secure cloud services to the consumers. Service
management function specifies various activities, tasks, and processes that enable the
administration of the cloud infrastructure and services to meet the provider's business
requirements and consumer's expectations.
Virtual Layer
Control Layer
Deployed either on virtual layer or on physical layer
Specifies entities that operate at this layer : control software
Functions of control layer : Enables resource configuration, resource pool configuration and
resource provisioning. Executes requests generated by service layer. Exposes resources to and
supports the service layer. Collaborates with the virtualization software and enables resource
pooling and creating virtual resources, dynamic allocation and optimizing utilization of
resources.
Service Layer
Cross-layer function
Business continuity
Specifies adoption of proactive and reactive measures to mitigate the impact of downtime.
Enables ensuring the availability of services in line with SLA.
Supports all the layers to provide uninterrupted services.
Security
Service Management
Specifies adoption of activities related to service portfolio management and service
operation management.
TYPES OF CLOUDS
1. Public Cloud
2. Private Cloud
3. Hybrid Cloud
4. Community Cloud
Public Cloud
Public clouds are managed by third parties which provide cloud services over
the internet to the public.
These services are available under pay-as-you-go billing models.
They offer solutions for minimizing IT infrastructure costs and are a good
option for handling peak loads on the local infrastructure.
Public clouds are the go-to option for small enterprises, which are able to start
their businesses without large upfront investments by completely relying on
public infrastructure for their needs.
The fundamental characteristic of public clouds is multitenancy. A public
cloud is meant to serve multiple users, not a single customer.
Private Cloud
Private clouds are distributed systems that work on private infrastructure and
provide the users with dynamic provisioning of computing resources.
Instead of a pay-as-you-go model, private clouds may use other
schemes that track the usage of the cloud and bill the different
departments or sections of an enterprise proportionally.
Hybrid Cloud
A hybrid cloud is a heterogeneous distributed system formed by combining
facilities of public cloud and private cloud.
For this reason, they are also called heterogeneous clouds.
A major drawback of private deployments is the inability to scale on
demand and efficiently address peak loads. This is where public clouds are needed.
Hence, a hybrid cloud takes advantage of both public and private clouds.
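The way a hybrid cloud handles peak loads can be sketched as a placement rule: workloads stay on the private cloud while capacity lasts and overflow to the public cloud. The function name, the workload names and the capacity units are hypothetical, and real placement decisions would also weigh data sensitivity and cost.

```python
def place_workloads(workloads, private_capacity):
    """Assign each workload to the private cloud first, overflowing to public.

    workloads maps a workload name to the capacity units it needs.
    """
    placement = {}
    remaining = private_capacity
    for name, units in workloads.items():
        if units <= remaining:
            placement[name] = "private"
            remaining -= units
        else:
            # Private capacity is exhausted for this workload: burst out.
            placement[name] = "public"
    return placement


placement = place_workloads(
    {"payroll": 4, "website": 5, "campaign": 6},
    private_capacity=10,
)
print(placement)
```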
Community Cloud
Community clouds are distributed systems created by integrating the services of
different clouds to address the specific needs of an industry, a community, or a
business sector.
In the community cloud, the infrastructure is shared between organizations that
have shared concerns or tasks. The cloud may be managed by an organization or a
third party.
Compute Services – Amazon EC2
To launch a new instance click on the launch instance button. This will open a wizard
where you can select the Amazon machine image (AMI) with which you want to launch the
instance. You can also create your own AMIs with custom applications, libraries and data.
Instances can be launched with a variety of operating systems.
• Instance Sizes
When you launch an instance you specify the instance type (micro, small, medium, large,
extra-large, etc.), the number of instances to launch based on the selected AMI and availability
zones for the instances.
• Key-pairs
When launching a new instance, the user selects a key-pair from existing keypairs or
creates a new keypair for the instance. Keypairs are used to securely connect to an instance after
it launches.
• Security Groups
The security groups to be associated with the instance can be selected from the instance
launch wizard. Security groups are used to open or block a specific network port for the launched
instances.
• Launching Instances
To create a new instance, the user selects an instance machine type, a zone in which the
instance will be launched, a machine image for the instance and provides an instance name,
instance tags and meta-data.
• Disk Resources
Every instance is launched with a disk resource. Depending on the instance type, the disk
resource can be a scratch disk space or persistent disk space. The scratch disk space is deleted
when the instance terminates. Whereas, persistent disks live beyond the life of an instance.
• Network Options
Network option allows you to control the traffic to and from the instances. By default,
traffic between instances in the same network, over any port and any protocol and incoming SSH
connections from anywhere are enabled.
• Launching Instances:
To create a new instance, you select the instance type and the machine image.
• You can either provide a user name and password or upload a certificate file for securely
connecting to the instance.
• Any changes made to the VM are persistently stored and new VMs can be created from the
previously stored machine images.
STORAGE SERVICES
A cloud storage service is a business that maintains and manages its customers' data and
makes that data accessible over a network, usually the internet.
Most of these types of services are based on a utility storage model. They tend to offer
flexible, pay-as-you-go pricing and scalability. Cloud storage providers also provide for
unlimited growth and the ability to increase and decrease storage capacity on demand.
Leading use cases for a cloud storage service include backup, disaster recovery (DR),
collaboration and file sharing, archiving, primary data storage and near-line storage.
Public cloud storage is a service owned and operated by a provider. It is usually suitable
for unstructured data that is not subject to constant change. The infrastructure usually consists of
inexpensive storage nodes attached to commodity drives. Data is stored on multiple nodes for
redundancy and accessed through internet protocols, typically representational state transfer
(REST).
Designed for use by many clients, a public cloud storage service supports massive multi-
tenancy with data isolation, access and security for each customer. It is generally used for
purposes ranging from static noncore application data to archived content that must still be
available for DR and backup.
Vendors generally charge on a dollar- or cents-per-gigabyte-per-month basis. There may
be added fees for the amount of data transferred and access charges. Amazon, Microsoft and
Google are the three largest public cloud storage providers. Other examples of public cloud
service providers are Apple, AT&T, Box, Barracuda, Certain Safe, Dropbox, eFolder, IBM, Iron
Mountain, Mega, Mozy, NTT Communications, Rackspace, SpiderOak, SugarSync and
Virtustream.
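The per-gigabyte pricing described above can be illustrated with a short calculation. This is only a sketch: the function name, its parameters and every price used with it are hypothetical and do not reflect any real provider's rates.

```python
def monthly_storage_bill(stored_gb, price_per_gb_month,
                         transferred_gb=0.0, price_per_gb_transferred=0.0,
                         requests=0, price_per_1000_requests=0.0):
    """Estimate one month's bill under a pay-as-you-go storage model.

    All prices are illustrative inputs supplied by the caller.
    """
    if min(stored_gb, transferred_gb, requests) < 0:
        raise ValueError("usage figures must be non-negative")
    storage_fee = stored_gb * price_per_gb_month                 # per-GB-per-month charge
    transfer_fee = transferred_gb * price_per_gb_transferred     # added fee for data transferred
    access_fee = (requests / 1000) * price_per_1000_requests     # added access charges
    return {
        "storage": storage_fee,
        "transfer": transfer_fee,
        "access": access_fee,
        "total": storage_fee + transfer_fee + access_fee,
    }


if __name__ == "__main__":
    # Hypothetical usage: 500 GB stored, 100 GB downloaded, 200,000 requests.
    print(monthly_storage_bill(500, 0.02, 100, 0.09, 200_000, 0.005))
```

Keeping the three fees separate in the result makes it easy to see when transfer or access charges, rather than stored capacity, dominate the bill.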
Private cloud storage services address the data safety and performance concerns of public
cloud storage by bringing cloud storage inside an organization. A private cloud storage service is
more suitable for actively used data and data that an organization needs more control over. Here,
storage is on a dedicated infrastructure within the data center, which helps ensure security and
performance. One example of a private cloud storage offering is the Hitachi Data Systems Cloud
Service for Private File Tiering.
Some enterprise users opt for a hybrid cloud storage model that stores unstructured data
-- for backup and archiving purposes, for example -- and less sensitive data with a public
cloud provider, while a private cloud is used for active, structured and more sensitive data.
When considering any cloud storage service, you need to consider the following:
• Does the service use REST, the most commonly used cloud storage API?
• Does your data have to be preserved in some specific format to meet compliance
requirements? That capability is not commonly available.
• Does the provider offer both public and private clouds? This may become important if you
want to migrate data from one type of service to the other.
Cloud storage pros/cons
Advantages of private cloud storage include high reliability and security. But this
approach to cloud storage provides limited scalability and requires on-site resources and
maintenance.
Public cloud storage offers high scalability and a pay-as-you-go model with no need for
an on-premises storage infrastructure. However, performance and security measures can vary by
service provider. In addition, reliability depends on service provider availability and internet
connectivity.
Advantages of a hybrid cloud
Hybrid cloud storage offers the best of the private and public cloud with high scalability
and on-premises integration that adds more layers of security. The result is better performance
and reliability because active content is cached locally. While a hybrid cloud tends to be more
costly than public storage, it is cheaper than private cloud storage. Reliability can be an issue, as
users must depend on service provider availability and internet connectivity.
Choose a storage service that delivers the amount of performance and resilience most
suitable for your workload at the least possible cost.
Metrics enable users to monitor and measure when a public cloud storage service
performs as it should or has issues. Having access to these metrics eases troubleshooting and
facilitates improvements to architectures and workload designs.
The way a cloud storage provider stores and provides access to data cannot be changed
by customers to address unexpected variations in performance as they share the infrastructure
with many other organizations.
But clients do have the ability to redesign the architecture of their workloads by
duplicating storage resources in more than one public cloud region, for example. This way, cloud
storage customers can redirect storage resources to the replicated region should problems arise.
Caching can also be used to address -- and head off -- potential cloud storage service
performance issues.
Deploy dedicated tools to accelerate connectivity between your on-premises data center
and cloud storage when local workloads can't surmount the performance limitations of a public
cloud storage service.
Improve connectivity.
Migrating data from one cloud storage service to another is an often-overlooked area.
Cloud migrations have become more common due to market consolidation and price
competition.
Businesses tend to switch cloud storage providers either because of price -- which must
be substantially cheaper to justify the cost and work of switching -- or when a cloud provider
goes out of business or stops providing storage services. With public cloud providers, it is
usually just as easy to copy data out of the cloud as it was to upload data to it. Available
bandwidth can become a major issue, however. In addition, many providers charge extra to
download data.
To mitigate concerns about a provider going out of business, you could copy data to more
than one cloud storage service. While this increases cloud storage costs, it is often still cheaper
than maintaining data locally.
Should that not be the case, or if bandwidth becomes a major sticking point, find out if
the original and the new cloud storage service have a direct-connect relationship. This approach
also removes the need of cloud storage customers to use their data centers as a bridge or go-
between -- such as using an on-premises cache -- to facilitate the transfer of data between the two
cloud storage providers.
Many cloud database providers offer database as a service (DBaaS), which falls into three
major categories: relational databases, non-relational databases, and virtual machines
running locally installed database software such as SQL.
There are different companies offering database as a service, DBaaS like Amazon RDS,
Microsoft SQL Azure, Google AppEngine Datastore and Amazon SimpleDB (Pizzete and
Cabot 2012). Each service provider is different from the other depending upon the quality
and sort of services being provided.
Certain parameters can be used to select the service that best suits your company. They are
not specific to any one company; they help in choosing the best service provider for the
requirements of any company.
Choosing best DBaaS
The choice of DBaaS depends not only on the services offered by the provider but also on
the requirements of the company itself. The following parameters can be used as a guide to
choose the best DBaaS.
Data Sizing
Every DBaaS provider has a different capacity for storing data in a database. Data sizing
is very important because the company needs to be sure about the size of the data that will
be stored in its database. For example, Amazon RDS allows the user to store up to 1TB of data
in one database, whereas SQL Azure offers only 50GB per database.
Portability
The database should be portable so that it never falls out of the user's reach. The service
provider may go out of business, in which case the database and the data stored in it could
be lost. There should be an emergency plan for such an event. One option is to take cloud
services from other companies as well, so that the database stays accessible even in an
emergency.
Transaction Capabilities
Transaction capabilities are a major feature of a cloud database, because the completion
of a transaction is very important for the user. The user must know whether a transaction
succeeded or not. For companies that mostly handle monetary transactions, the complete set of
read and write operations must be carried out. The user needs a guarantee for the transaction
made, and a transaction with this guarantee is called an ACID transaction (Pizzete and
Cabot 2012). If no such guarantee is needed, non-ACID transactions can be used instead, which
are also faster.
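The all-or-nothing behaviour of an ACID transaction can be sketched with Python's built-in sqlite3 module. SQLite stands in here for a cloud database purely for illustration; the accounts table, the account names and the helper functions are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 20)])
    conn.commit()
    return conn


def balances(conn):
    """Return the current balances as a dict of name -> balance."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True if committed."""
    try:
        # `with conn` commits if the block succeeds and rolls back
        # every change made inside it if an exception escapes.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    db = make_demo_db()
    print(transfer(db, "alice", "bob", 30), balances(db))
    print(transfer(db, "alice", "bob", 500), balances(db))
```

The second transfer fails after the debit has already been applied inside the transaction, and the rollback restores the earlier balances, so the user is never left with a half-completed money movement.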
Configurability
Many databases can be configured easily by the user because most of the configuration is
done by the service provider. This leaves few options for the administrator of the database,
who can manage it without much effort.
Database Accessibility
Because there are different kinds of databases, the mechanisms for accessing them differ as
well. The first method is an RDBMS offered through industry-standard drivers such as Java
Database Connectivity (JDBC). Such a driver allows an external connection to access the
service through a standard connection. The second method is access through interfaces or
protocols such as Service-Oriented Architecture (SOA) and SOAP or REST (Pizzete and Cabot
2012). These interfaces use HTTP and some new API definition.
Certification and Accreditation
It is better to use a cloud database provider that holds certification and accreditation.
This helps mitigate service risks for the company and avoid inconvenience. Providers with
certifications such as FISMA can be considered more reliable than other DBaaS providers.
International Journal of Database Management Systems
Data Integrity, Security and Storage Location
Security has been the major concern for data stored in cloud storage. Security also depends
on the encryption methods used and on the storage locations of the data. The data is stored
in different locations across data centers.
APPLICATIONS SERVICES
Applications as a service can also provide software to enterprise users more efficiently,
because it can be distributed and maintained for all users at a single point – in the public cloud
(Internet). The efficiency gains are facilitated through the use of various automation tools that
can be implemented on the cloud services platform. The automation of functions such as user
provisioning, user account management, user subscription management and application life cycle
management makes on-demand software a highly efficient and cost-effective way to deliver
software to enterprise users.
Companies that provide applications as a service (on-demand software) are known as ASPs
or application service providers. ASPs are divided into 4 major categories, as follows:
ASPs own the software that they deliver to consumers, as well as the hardware which
supports the software. ASPs bill on a per use basis, on a monthly basis or on an annual basis –
making software on demand a very affordable option for many organizations. On-demand
software provides small and medium size businesses with a method of accessing software that
may have previously been financially out of reach, due to software licensing costs and additional
hardware costs.
The SaaS, on-demand or applications as a service model of software delivery offers a
long list of benefits to subscribers, including: an elimination of many software integration issues,
improved reliability of applications, increased availability of applications, increased security of
applications, access to dedicated experts for each on-demand software service subscribed to, an
overall reduction in IT operational costs and guaranteed service levels from ASPs.
CONTENT DELIVERY SERVICES
CDNs have made a significant impact on how content is delivered via the Internet to the
end-users. Traditionally, content providers have relied on third-party CDNs to deliver their
content to end-users. With the ever-changing landscape of content types, e.g. moving from
standard definition video to high definition to full high definition, it is a challenge for
content providers who either supplement their existing delivery networks with third-party
providers or completely rely on them to understand and monitor the performance of their
service. Moreover, the performance of the CDN is impacted by the geographical availability of
the third-party infrastructure.
CCDN allows the users to consume the delivery content using a pay-as-you-go
model.
Increased point-of-presence
Content is moved closer to users with relative ease in a CCDN system compared with a
traditional CDN, due to the omnipresence of the cloud. A cloud-based content delivery network
can reduce transmission latency because it can rent operating resources from the cloud
provider to increase the reach and visibility of the CDN on demand.
CCDN Interoperability
CDN interoperability has emerged as a strategically important concept for service providers
and content providers. Interoperability of CDNs via the cloud will allow content providers to
reach new markets and regions and support nomadic users. For example, instead of setting up
an infrastructure to serve a small group of customers in Africa, a provider can take advantage
of current cloud providers in the region to dynamically host surrogate servers.
The cloud can support dynamic changes in load. This will help CDNs support different kinds
of applications that have unpredictable bursting traffic, predictable bursting traffic,
scale up and scale down of resources, and the ability to expand and grow fast. However, while
cloud-based CDNs have made remarkable progress in the past five years, they are still limited
in a number of aspects. For instance, moving into the cloud might carry some marked security
and performance challenges that can impact the efficiency and productivity of the CDN, thus
affecting the client's business.
CDNs are designed for streaming staged content but do not perform well in situations where
content is produced dynamically. This is typically the case when content is produced, managed
and consumed in collaborative activities. For example, an art teacher may find and discuss
movies from different film archives; the students may then edit the selected movies. Parts of
them may be used in producing new movies that will be sent to the students' friends for
comments and suggestions. Current CDNs do not support such collaborative activities that
involve dynamic content creation.
Content Creation
Traditional CDNs are not designed to manage content (e.g., find and play high definition
movies). This is typically done by CDN applications [42,43]. For example, CDNs do not provide
services that allow an individual to create a streaming music video service combining music
videos from an existing content source on the Internet (e.g., YouTube), his/her personal
collection, and live performances he/she attends, using his/her smart phone to capture such
content.
This can only be done by an application managing where and when the CDN will deliver
the video component of his/her music program. With CCDN, the end-user will act as both content
creator and consumer. CCDN needs to support this feature inherently. User-generated content
distribution is emerging as one of the dominant forms in the global media market.
Content Heterogeneity
Existing Web 2.0 technologies currently support the authoring of structured multimedia
content (e.g., web pages linking images, sounds, videos, and animations). The CCDNs will need
to extend and broaden existing Web 2.0 strengths with a new environment aimed at supporting
the creation and consumption of interactive multimedia content (e.g., interactive audio and
video), as well as other novel forms of multimedia content (e.g., virtual and augmented reality)
that are currently not supported by existing Web 2.0 technologies and tools.
CCDN Ownership
Cloud CDN service providers either own all the services they use to run their CDN
services or they outsource this to a single cloud provider. A specialized legal and technical
relationship is required to make the CDN work in the latter case.
CCDN Personalization
CDNs do not support content personalization. For example, if the subscriber's behavior
and usage pattern can be observed, a better estimation of the traffic demand can be achieved.
The performance of content delivery is moving from speed and latency to on-demand delivery of
relevant content matching the end-user's interest and context.
The cloud cost model works well as long as the network consumption is predictable for
both service provider and end-user. However, such predictions become very challenging with
distributed cloud CDNs.
Security
CDNs also impose security challenges due to the introduction of public clouds to store,
share and route content. The use of multi-vendor public clouds further complicates this problem.
Security is the protection of content against unauthorised usage, modification and tampering, and
protection against illegal use, hack attacks, viruses and other unwanted intrusions. Further,
security also plays an important role while accessing and delivering content to relevant users.
Hybrid Clouds
The integration of cloud and CDN will also allow the development of hybrid CCDNs that
can leverage a combination of private and public cloud providers. For example, the content
provider can use a combination of cloud service platforms offered by Microsoft Azure and
Amazon AWS to host their content. Depending on the pay-as-you-go model, the content provider
can also move from one cloud provider to another. However, achieving a hybrid model is very
challenging due to various CCDN ownership issues and QoS issues.
CCDN Monitoring
CCDNs can deliver end-to-end QoS monitoring by tracking overall service availability and
pinpointing issues. Clouds can also provide additional tools for monitoring specific
content, e.g. video quality monitoring. However, developing a CCDN monitoring framework is
always a challenge.
CCDN QoS
With the notion of virtually unlimited resources offered by the cloud, quality of service
plays a key role in CCDNs to maintain a balance between service delivery quality and cost.
Defining appropriate SLAs to enforce QoS and guarantee service quality is very important and
is also challenging. Further, the notion of hybrid clouds further complicates CCDN QoS
challenges due to the involvement of multiple cloud providers with varying SLAs.
It is critical that CCDNs are able to predict the demands and behaviours of hosted
applications, so that they can manage the cloud resources optimally. Concrete prediction or
forecasting models must be built before the demands and behaviours of CDN applications can be
predicted accurately. The hard challenge is to accurately identify and continuously learn the most
important behaviours and accurately compute statistical prediction functions based on the
observed demands and behaviours such as request arrival pattern, service time distributions, I/O
system behaviours, user profile, and network usage.
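As a minimal sketch of what a statistical prediction function over observed demand might look like, the example below applies simple exponential smoothing to a series of request counts. The choice of exponential smoothing, the function name and the sample figures are assumptions made for illustration; the text does not prescribe a particular forecasting model.

```python
def exponential_smoothing_forecast(observations, alpha=0.5):
    """Forecast the next value of a demand series by simple exponential smoothing.

    observations: past demand values, oldest first (e.g. requests per minute).
    alpha: smoothing factor in (0, 1]; higher values weight recent data more.
    """
    if not observations:
        raise ValueError("need at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    forecast = float(observations[0])
    for value in observations[1:]:
        # Blend the newest observation with the running forecast.
        forecast = alpha * value + (1 - alpha) * forecast
    return forecast


if __name__ == "__main__":
    # Hypothetical request arrivals per minute observed at a surrogate server.
    arrivals = [100, 200, 300]
    print(exponential_smoothing_forecast(arrivals, alpha=0.5))
```

A real CCDN would have to keep re-estimating such a model as new observations arrive and would likely combine several signals, such as service times and network usage.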
The diversity of offerings by cloud providers makes cloud selection to host CDN
components a complex task. A practical question to be addressed is: how well does a cloud
provider perform compared to the other providers? For example, how does a CDN application
engineer compare the cost/performance features of CPU, storage, and network resources offered
by Amazon EC2, Microsoft Azure, GoGrid, FlexiScale, Terremark, and Rackspace?
For instance, a low-end CPU resource of Microsoft Azure is 30% more expensive than
the comparable Amazon EC2 CPU resource, but it can process CDN application workload twice
as quickly. Similarly, a CDN application engineer may choose one provider for storage-intensive
applications and another for computation-intensive CDN applications. Hence, there is a need to
develop a novel decision-making framework that can analyse existing cloud providers to help
CDN service engineers make optimal selection decisions.
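The price-versus-speed trade-off in the example above can be worked through numerically. The sketch below normalizes the EC2 resource to a price of 1.00 and a throughput of 1.0, so the Azure resource becomes 1.30 and 2.0; these normalized values, the function names and the dictionary layout are assumptions for illustration only.

```python
def cost_per_unit_work(relative_price, relative_throughput):
    """Price paid per unit of workload processed (lower is better)."""
    if relative_price <= 0 or relative_throughput <= 0:
        raise ValueError("price and throughput must be positive")
    return relative_price / relative_throughput


def cheapest_for_workload(offers):
    """Return the offer name with the lowest cost per unit of work.

    offers: dict mapping a name to (relative_price, relative_throughput).
    """
    return min(offers, key=lambda name: cost_per_unit_work(*offers[name]))


if __name__ == "__main__":
    # Normalized figures derived from the text: 30% dearer, twice as fast.
    offers = {
        "Amazon EC2 (baseline)": (1.00, 1.0),
        "Microsoft Azure": (1.30, 2.0),
    }
    for name, (price, speed) in offers.items():
        print(name, cost_per_unit_work(price, speed))
    print("cheapest per unit of work:", cheapest_for_workload(offers))
```

Under these assumptions the more expensive resource costs about 0.65 per unit of work against 1.00 for the baseline, which is why price alone is a poor selection criterion.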
ANALYTIC SERVICES
Analytics as a service (AaaS) refers to the provision of analytics software and operations
through web-delivered technologies. These types of solutions offer businesses an alternative to
developing internal hardware setups just to perform business analytics.
Software as a service (SaaS)
What these all have in common is that the service model replaces internal systems with web-
delivered services. In the example of analytics as a service, a provider might offer access to a
remote analytics platform for a monthly fee. This would allow a client to use that particular
analytics software for as long as it is needed, and to stop using it and stop paying for it at a future
time.
Analytics as a service is becoming a valuable option for businesses because setting up
analytics processes can be a work-intensive process. Businesses that need to do more analytics
may need more servers and other kinds of hardware, and they may need more IT staff to
implement and maintain these programs.
If the business can use analytics as a service instead, it may be able to bypass these new costs
and new business process requirements.
Along with the appeal of complete outsourcing that analytics as a service provides, there is
the option of going with a hybrid system where businesses use what they have on hand for
analytics and outsource other components through the web.
All of this equips the modern business with more choices and more precise solutions for
changing business needs in markets that work largely on the availability of big data.
Advantages
Instead of handling speed and delivery time related hassles from your on-premise servers,
cloud computing resources are high-powered and can deliver your queries and reports in no-time.
Ad hoc Deployment of Resources for Better Performance
If you have an in-house analytics team, you should be concerned about an efficient
warehouse, the latency of your data over the poor public internet, staying up to date with
advanced tools, and experience in handling the high demands of real-time BI or emergency
queries. Employing cloud services in data science and analytics can help your business scale
up by establishing a direct connection between them, reducing the latency and response issues
to less than a millisecond.
Match, Consolidate and Clean Data Effortlessly
Real-time cloud analytics with real-time access to your online data keeps your data up to
date and organized, helping your Operations and Analytics teams function under the same roof.
This avoids mismatches and delays, and also helps you predict and implement finer
decisions.
Accessibility
Cloud services are capable of sharing data and visualizations and performing
cross-organizational analysis, making the raw data more accessible and perceivable by a
broader user base.
Cloud-based applications are built with self-learning models and have a consumer-friendly
user experience, unlike on-premise applications. Cloud technologies learn to adapt as
your business grows and can expand or adjust as your data storage and application needs
increase or decrease.
Affordability
There are no upgrade costs or issues, and enabling new tools or applications requires
minimal IT maintenance. This keeps the business in a continuous flow without interruptions
such as the need to upgrade the on-premise infrastructure, redo your integrations,
and other time-consuming efforts.
Security
Robustly built, cloud analytics are reportedly more reliable than on-premise systems in
times of a data breach. With cloud security, a breach or a security issue can be detected
within hours or minutes, whereas an in-house team can take weeks or even months to detect a
breach. Your data is more trusted and secure with cloud computing.
Implementing cloud services in data science can be the best and most effective
infrastructure you can give to your business. They are agile, secure and flexible and help you
streamline each of your business processes, as cloud services enable all your teams to function
on the same data foundation.
Mindtree's rich infrastructure and application experience enables high availability and
continuous optimization in a hybrid cloud across the business application ecosystem.
Mindtree cloud management services include:
• Optimization in spending
• Regular operational metrics
• Automation and DevOps
Mindtree has developed a distinctive approach to deliver management through its proven
cloud management platform and skilled workforce. Our platform delivers integrated
Mindtree addresses the challenges of both application and infrastructure, jointly referred
to as AppliStructure. Our approach is to deliver security-as-hygiene. This means building
security into every step of the delivery process rather than adding security in isolation.
Needless to say, we deploy the right tools to help in threat prediction, identification and
remediation.
Benefits
By outsourcing your cloud managed services, you control and reduce costly network
maintenance costs. Staffing a full-time IT department is expensive and often unnecessary for
small to medium-sized businesses with simple networks. Outsourcing to a cloud-first managed
services provider like Agile IT can save you thousands each year on the cost of an in-house IT
department.
Predictable, recurring monthly costs
With the flexibility of cloud managed services, you decide how much you're willing to
pay for IT services and have a consistent monthly bill.
For example, a tax service has a spike in customers during tax season and will need more
support during the first quarter of the year and less during the second through fourth quarters. A
privatized learning institute for working adults will need the most support in the evenings when
students are online after work.
With a fixed monthly service plan that's customized to fit your needs or budget, you
optimize the amount you pay for IT support.
Future-proofed technology
Migrating to a cloud environment is the first step in future-proofing your data center.
Next, you'll need to make the latest technology and services available to your business.
If you hire in-house IT staff, your IT personnel will have to spend company time
training when a new technology or required upgrade is released. Cloud technicians are already
prepared to manage the latest technology.
Other cloud managed service providers offer a converged solution, which produces even
more cost savings. These converged solutions may include security protection, network
monitoring or the setup of a new service area.
Robust infrastructure
Cloud MSPs like Agile IT offer a robust network infrastructure with 24/7
management.
Depending on the service agreement, a cloud managed service provider can monitor and
scan the network for patch requirements, security, and more.
Managed service providers can also integrate existing business practices and policies to
manage your network to coincide with your organizational goals.
With a managed cloud network, the provider manages all applications and servers in a
central data center.
This increased network availability also raises employee productivity. Your remote
network users can access centralized data within the same network including virtual services, and
you can build storage and backup into a centralized network.
Cloud service providers offer better control over service levels, performance and
maintenance. With a comprehensive service-level agreement, your business gains service
continuity. The longer you work with a cloud managed services provider like Agile IT, the more
familiar they become with your network, leading to faster issue response times.
Disaster recovery
Services are the lifeline of a cloud managed service provider. Agile IT has designed
countless networks and data centers with proven redundancy and resiliency to maintain business
continuity.
Fast response times
Your business can expect quick response times through enterprise-level monitoring and
remote cloud services. Agile IT can access, monitor and repair virtually any network issue
remotely. If you must resolve an issue locally, a technician should be dispatched within the same
business day.
Vendor interfacing
When vendor-specific service issues arise, cloud managed service providers take care
of contacting third-party vendors to resolve them.
As a certified Microsoft consulting partner and 4-time Microsoft Cloud Partner of the
Year, Agile IT understands the technical questions to ask when communicating issues with cloud
vendors including Microsoft and Amazon.
At Agile IT, we are committed to helping businesses leverage custom cloud solutions to
control costs and automate critical processes. As a cloud managed services provider, we set up,
manage and protect your cloud environment so you can focus on growing your business.
Identity Management
1. The pure identity paradigm: creation, management and deletion of identities without
regard to access or entitlements.
2. The user access (log on) paradigm: A traditional method in which, for example, a user
uses a smart card to log on to a service.
3. The service paradigm: A system that delivers personalized role based, online, on-
demand, presence based services to users and their devices.
A set of parties use IdM and collaborate to identify an entity. These parties are
1. Identity Provider (IdP): It issues digital identities. For example debit card providers
issue identities enabling payment, government issues PAN card or SSN to citizens.
2. Service Provider (SP): It provides access to services to entities that present the
required identities. For example, a user needs to provide identity information to be able to do
transactions via net banking.
3. Entity: Entities are the ones about whom claims are made.
4. Identity Verifier: Service Providers send them the request for verifying claims about an
identity.
An identity management system uses one of these three kinds of identifiers:
1. Identifiers that are known by both the entity and the service provider
2. Identifiers that an entity knows and that can be verified by the service provider via the
identity providers
3. Identifiers like biometric information
The Identity Life Cycle is closely related to the concept of digital identities. It
comprises three main steps:
Provisioning
The term provisioning is often explained with an example of a new employee. When
people join a new company, they often need physical objects like an office, a desk, a phone, a
key card, etc. Likewise, many collections of digital information need to be created for the new
The allocation of these digital objects and the creation of the digital identity information
that enables the necessary services for a user is called provisioning [7].
This idea can however be expanded to more than just people joining the company. Many
individuals from outside the company might also need provisioning, for example customers,
vendors and business partners.
Basically, everything that can have an identity might use provisioning. While
provisioning is often limited to people-related identity information, it might hence also include
the information of other company assets. Provisioning often happens when a new identity is
created at the beginning of the Identity Life Cycle.
Typically, information that describes the object corresponding with the identity is
provisioned into Human Resources (HR) systems, operating system directories, application
directories, etc. From these newly created Personas, additional information describing the
identity's roles and entitlements within the organization is then created.
For example, upon joining the company every new employee might have a user account
created on a certain server. This Persona might contain identity information like the person's job
title. This information can in turn be used to define the roles this person might have. A new
employee with "administrator" as their job title might, for example, automatically be added to the
administrators group on said server.
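The provisioning example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Persona` class, the `ROLE_RULES` table and the default "employees" group are hypothetical names introduced here, not part of any real IAM product.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from job title to server groups (illustrative only).
ROLE_RULES = {"administrator": {"administrators"}}


@dataclass
class Persona:
    username: str
    job_title: str
    groups: set = field(default_factory=set)


def provision(username, job_title):
    """Create a Persona and derive its group memberships from the job title."""
    persona = Persona(username, job_title)
    persona.groups |= ROLE_RULES.get(job_title.lower(), set())
    persona.groups.add("employees")  # assumed default group for every new hire
    return persona


admin = provision("asmith", "Administrator")
clerk = provision("bjones", "Clerk")
print(sorted(admin.groups))
print(sorted(clerk.groups))
```

The job title is the only identity attribute used here; a real system would typically combine several attributes before granting entitlements.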
Maintenance
Identity information is prone to changes over time. During the Identity Life
Cycle, modifications of it will therefore most likely be necessary. Synchronization plays a
substantial role here.
As the information is updated in one data store, it is often desired that this will be distributed
automatically to other data stores using certain synchronization processes that are in place.
An example of this would be the change in home address of a certain employee.
This piece of identity information might be modified in the HR system of the company and
then synchronized to a server belonging to a department that sends out a monthly magazine
to all employees.
Maintenance however should not be confused with (re)provisioning. As maintenance is
solely about updating identity information it does not cover the creation of new information
that describes persons, groups, devices or services.
For example, if an employee of a certain company is promoted, this might cause him/her to
acquire new roles and responsibilities.
To reflect these new entitlements, the employee's identity information is said to be
reprovisioned, not synchronized.
As the promotion might cause changes in various relationships, it is possible that new
accounts and other data objects must be created in various data stores.
Deprovisioning
Previously, we learned that when an employee joins a company, their digital identity is
created through provisioning.
In time, while the employee keeps on working at the same company, the identity information
will be modified and synchronized.
On major changes, like a promotion, the information might even be reprovisioned. When an
employee leaves the company, however, their identity reaches the end of its life cycle and it is
time for the last phase, deprovisioning. Deprovisioning corresponds with the removal or
disabling of Personas when an identity leaves a domain.
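To make the difference between maintenance and deprovisioning concrete, the following minimal sketch models two data stores. The `DataStore` class and the store names are assumptions made for illustration; synchronization simply copies the attributes from the store where the change was made, and deprovisioning disables the Persona rather than deleting it.

```python
class DataStore:
    """A toy data store holding identity attributes per user (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.records = {}      # username -> dict of attributes
        self.disabled = set()  # usernames whose Persona is disabled


def synchronize(source, targets, username):
    """Maintenance: copy the source's current attributes to the other stores."""
    for store in targets:
        store.records[username] = dict(source.records[username])


def deprovision(stores, username):
    """Deprovisioning: disable the Persona in every store that knows it."""
    for store in stores:
        if username in store.records:
            store.disabled.add(username)


hr = DataStore("hr")
magazine = DataStore("magazine")

hr.records["asmith"] = {"address": "1 Old Road"}
synchronize(hr, [magazine], "asmith")

# The employee moves house: update HR, then synchronize again.
hr.records["asmith"]["address"] = "2 New Street"
synchronize(hr, [magazine], "asmith")

# The employee leaves the company.
deprovision([hr, magazine], "asmith")
```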
IAM Architecture
As we now turn to technologies, however, this functional approach is no longer very
practical. We will hence look at IAM from a more architectural point of view now,
starting with the following diagram inspired.
Before we focus on the different components of IAM systems as depicted in this overview,
some initial considerations are necessary. In the center of the diagram, we see the Directory
Services component, which can be considered the core of IAM. While all IAM components can in
fact be deployed independently of any IAM architecture, the use of Directory Services as a
central building block that other components can leverage and integrate tightly with [10] could
be considered the characteristic feature which distinguishes an IAM solution from any other IT
solution offering similar functionality. On the right side of the overview, some example servers
are depicted that could be part of a company infrastructure. In most cases, these would be
legacy systems that were not specifically placed there as part of the IAM solution.
The (Identity/Access) management of these systems will eventually become the task of
the IAM system. Most of the identity data the IAM system works with will come from these
systems. Some of them will wholly or mainly provide data, others will receive it.
The first group will be called Authoritative Sources, a term which is also used to
determine which data source should be leading in case of conflict. If on two connected servers in
a company the unique e-mail address of a certain customer differs for example, the email address
on the server that is considered to be the Authoritative Source will be taken as the leading one.
Possibly, the IAM solution might consequently overwrite the other e-mail address.
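The conflict rule described above can be sketched as follows. The store names, the `reconcile` function and the e-mail addresses are hypothetical; the sketch only shows the idea that the value held by the Authoritative Source is treated as leading and may overwrite the others.

```python
def reconcile(attribute, stores, authoritative):
    """Overwrite `attribute` in every store with the authoritative value.

    `stores` maps a store name to a record dict. Returns the names of the
    stores whose value was changed.
    """
    leading = stores[authoritative][attribute]
    changed = []
    for name, record in stores.items():
        if record.get(attribute) != leading:
            record[attribute] = leading
            changed.append(name)
    return changed


stores = {
    "crm": {"email": "c.lee@example.com"},
    "billing": {"email": "clee@old.example.com"},
}
changed = reconcile("email", stores, authoritative="crm")
print(changed)
```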
Identity & Access Management (IDAM) services allow managing the authentication and
authorization of users to provide secure access to cloud resources.
• Using IDAM services you can manage user identifiers, user permissions, security
credentials and access keys.
• Amazon Identity & Access Management
• AWS Identity and Access Management (IAM) allows you to manage users and user
permissions for an AWS account.
• Windows Azure Active Directory
• Windows Azure Active Directory is an Identity & Access Management Service from
Microsoft.
• Azure Active Directory provides a cloud-based identity provider that easily integrates
with your on-premises active directory deployments and also provides support for third party
identity providers.
• With Azure Active Directory you can control access to your applications in Windows
Azure.
• Apache CloudStack is an open source cloud software that can be used for creating private cloud
offerings.
• CloudStack manages the network, storage, and compute nodes that make up a cloud
infrastructure.
• A CloudStack installation consists of a Management Server and the cloud infrastructure that it
manages.
• Zones
• The Management Server manages one or more zones where each zone is typically a
single datacenter.
• Pods
• Each zone has one or more pods. A pod is a rack of hardware comprising a switch
and one or more clusters.
• Cluster
• A cluster consists of one or more hosts and a primary storage. A host is a compute node
that runs guest virtual machines.
• Primary Storage
• The primary storage of a cluster stores the disk volumes for all the virtual machines
running on the hosts in that cluster.
• Secondary Storage
• Each zone has a secondary storage that stores templates, ISO images, and disk volume
snapshots.
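The containment hierarchy described above (zone, pod, cluster, host) can be modelled with a few data classes. This is a plain data-structure sketch, not the CloudStack API; all class and instance names are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Host:
    name: str  # a compute node that runs guest virtual machines


@dataclass
class Cluster:
    name: str
    primary_storage: str  # holds the disk volumes of VMs in this cluster
    hosts: List[Host] = field(default_factory=list)


@dataclass
class Pod:
    name: str  # a rack with a switch and one or more clusters
    clusters: List[Cluster] = field(default_factory=list)


@dataclass
class Zone:
    name: str  # typically a single datacenter
    secondary_storage: str  # templates, ISO images, volume snapshots
    pods: List[Pod] = field(default_factory=list)

    def host_names(self):
        """Walk zone -> pod -> cluster -> host and list every host."""
        return [h.name for p in self.pods for c in p.clusters for h in c.hosts]


zone = Zone(
    "dc-1",
    "nfs-secondary",
    [Pod("rack-1", [Cluster("c1", "san-1", [Host("h1"), Host("h2")])])],
)
print(zone.host_names())
```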
Open Source Private Cloud Software - OpenStack
• Eucalyptus is an open source private cloud software for building private and hybrid clouds that
are compatible with Amazon Web Services (AWS) APIs.
• Node Controller
• NC hosts the virtual machine instances and manages the virtual network endpoints.
• Cluster Controller - which manages the virtual machines and is the front-end for a
cluster.
• Storage Controller – which manages the Eucalyptus block volumes and snapshots to
the instances within its specific cluster. SC is equivalent to AWS Elastic Block Store
(EBS).
2. Seafile
Seafile is another open source file hosting system that aims to give its users all the
advantages they expect from good cloud storage software. It is written in C and Python, with
the latest stable release cited here being 4.4.3, released on 15th October 2015.
Seafile provides desktop clients for Windows, Linux and OS X, and mobile clients for
Android, iOS and Windows Phone. Along with a community edition released under the General
Public License, it also has a professional edition released under a commercial license, which
provides extra features not supported in the community edition, e.g. user logging and text search.
Since it was open sourced in July 2012, it has been gaining international attention. Its main
features are syncing and sharing, with the main focus on data safety. Other features of Seafile,
which have made it common in many universities (such as University Mainz, University HU Berlin
and University Strasbourg) and among thousands of other users worldwide, are online file
editing, differential sync to minimize the bandwidth required, and client-side encryption to
secure client data.
3. Pydio
Earlier known by the name AjaXplorer, Pydio is free software that aims to provide file
hosting, sharing and syncing. As a project it was initiated in 2009 by Charles du Jeu, and since
2010 it has been on all NAS equipment supplied by LaCie.
Pydio is written in PHP and JavaScript and is available for Windows, Mac OS and Linux,
and additionally for iOS and Android. With nearly 500,000 downloads on SourceForge, and
acceptance by companies like Red Hat and Oracle, Pydio is one of the most popular cloud
storage software packages on the market.
In itself, Pydio is just a core which runs on a web server and can be accessed through any
browser. Its integrated WebDAV interface makes it well suited to online file management, and
SSL/TLS encryption protects the transmission channels, securing the data and ensuring its
privacy. Other features which come with this software are: a text editor with syntax highlighting,
audio and video playback, integration of Amazon S3, FTP or MySQL databases, an image editor,
and file or folder sharing, even through public URLs.
UNIT - II
QUESTION BANK
One Marks
1) What type of computing technology refers to services and applications that typically run on a distributed
network through virtualized resources?
A. Distributed Computing
B. Cloud Computing
C. Soft Computing
D. Parallel Computing
Answer: B
Answer: A
3) Cloud computing is a kind of abstraction which is based on the notion of combining physical resources
and represents them as _______ resources to users.
A. Real
B. Cloud
C. Virtual
D. None of the mentioned
Answer: C
4) Which of the following has many features of what is now known as cloud computing?
A. Web Service
B. Softwares
5) Which one of the following cloud concepts is related to sharing and pooling the resources?
A. Polymorphism
B. Virtualization
C. Abstraction
D. None of the mentioned
Answer: B
A. The popularization of the Internet actually enabled most cloud computing systems.
B. Cloud computing makes the long-held dream of utility as a payment possible for you, with an infinitely
scalable, universally available system, pay what you use.
C. Soft computing addresses a real paradigm in the way in which the system is deployed.
Answer: C
7) Which one of the following, considered as a utility, is a dream that dates from the beginning of the
computing industry itself?
A. Computing
B. Model
C. Software
D. All of the mentioned
Answer: A
Answer: B
9) Which one of the following is Cloud Platform by Amazon?
A. Azure
B. AWS
C. Cloudera
Answer: B
A. Through cloud computing, one can begin with very small and become big in a rapid manner.
Answer: B
11) In the Planning Phase, which of the following is the correct step for performing the analysis?
A. Cloud Computing Value Proposition
B. Cloud Computing Strategy Planning
C. Both A and B
Answer: C
12) In which one of the following, a strategy record or Document is created respectively to the events,
conditions a user may face while applying cloud computing mode.
A. Cloud Computing Value Proposition
B. Cloud Computing Strategy Planning
C. Planning Phase
Answer: B
A. We recognize the risks that might be caused by cloud computing application from a business
perspective.
B. We identify the applications that support the business processes and the technologies required to support
enterprise applications and data systems.
C. We formulate all kinds of plans that are required to transform the current business to cloud computing
modes.
D. None of the above
Answer: A
14) Which one of the following refers to the non-functional requirements like disaster recovery, security,
reliability, etc.
A. Service Development
B. Quality of service
15) Which one of the following is a phase of the Deployment process?
A. Selecting Cloud Computing Provider
B. IT Architecture Development
16) This phase involves selecting a cloud provider based on the Service Level Agreement (SLA), which
defines the level of service the provider receives.
A. Maintenance and Technical Service
B. Selecting Cloud Computing Provider
C. Both A and B
D. None of the above
Answer: B
17) In which one of the following phases does IT Architecture Development come?
A. Strategy Phase
B. Planning Phase
C. Deployment Phase
D. Development Phase
Answer: B
18) Which model involves the special types of services that users can access on a Cloud Computing
platform?
A. Service
B. Planning
C. Deployment
D. Application
Answer: A
19) Which one of the following is related to the services provided by Cloud?
A. Sourcing
B. Ownership
C. Reliability
D. PaaS
Answer: A
20) How many phases are present in Cloud Computing Planning?
A. 2
B. 3
C. 4
D. 5
Answer: B
21) Cloud computing architecture is a combination of?
A. Service-oriented architecture and grid computing
B. Utility computing and event-driven architecture
C. Service-oriented architecture and event-driven architecture
D. Virtualization and event-driven architecture
Answer: C
22) Which one of the following refers to the user's part of the Cloud Computing system?
A. Back End
B. Management
C. Infrastructure
D. Front End
Answer: D
23) Which one of the following can be considered as the example of the Front-end?
A. Web Browser
B. Google Compute Engine
C. Cisco Metapod
D. Amazon Web Services
Answer: A
24) By whom is the backend commonly used?
A. Client
B. User
C. Stockholders
D. Service provider
Answer: D
25) Through which are the backend and front-end connected with each other?
A. Browser
B. Database
C. Network
D. Both A and B
Answer: C
26) How many types of services are offered by Cloud Computing to the users?
A. 2
B. 4
C. 3
D. 5
Answer: C
27) Force.com and Windows Azure are examples of which of the following?
A. IaaS
B. PaaS
C. SaaS
D. Both A and B
Answer: B
28) Which of the following is one of the backend's built-in components of cloud computing?
A. Security
B. Application
C. Storage
D. Service
Answer: A
29) Which of the following provides the Graphic User Interface (GUI) for interaction with the cloud?
A. Client
B. Client Infrastructure
C. Application
D. Server
Answer: B
30) Which one of the following is related to the services offered by the Cloud?
A. Sourcing
B. Ownership
C. Reliability
D. AaaS
Answer: A
5 MARKS
1. Write short notes on compute services?
10 MARKS
UNIT III
APPLICATION DESIGN
When designing applications for the cloud, irrespective of the chosen platform, I have
often found it useful to consider four specific topics during my initial discussions: scalability,
availability, manageability and feasibility.
It is important to remember that the items presented under each topic within this article
are not an exhaustive list. They are aimed only at presenting a starting point for a series of long
and detailed conversations with the stakeholders of your project, always the most important part
of the design of any application. The aim of these conversations should be to produce an initial
high-level design and architecture.
This is achieved by considering these four key elements holistically within the domain of
the customer's project requirements, always remembering to consider the side-effects and
trade-offs of any design decision (i.e. what we gain vs. what we lose, or what we make more
difficult).
Scalability
Will we need to scale individual application layers and, if so, how can we achieve this
without affecting availability?
Will the application need to run at scale 24x7, or can we scale down outside business hours or
at weekends, for example?
Platform / Data
Can we work within the constraints of our chosen persistence services while working at
scale (database size, transaction throughput, etc.)?
How can we partition our data to aid scalability within persistence platform constraints (e.g.
maximum database sizes, concurrent request limits, etc.)?
How can we ensure we are making efficient and effective use of platform resources? As a
rule of thumb, I generally tend towards a design based on many small instances, rather than
fewer large ones.
Can we collapse tiers to minimise internal network traffic and use of resources, whilst
maintaining efficient scalability and future code maintainability?
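One common answer to the data partitioning question above is to spread records across several smaller databases by hashing a key. The sketch below is an illustration under assumptions: the `shard_for` function, the key format and the shard count of four are all hypothetical, and a production design would also need a plan for changing the number of shards.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash() so that the mapping
    stays the same across processes and restarts.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


keys = ["customer-%d" % i for i in range(1000)]
counts = [0] * 4
for k in keys:
    counts[shard_for(k, 4)] += 1

print(counts)  # number of keys that landed on each of the four shards
```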
Load
How can we improve the design to avoid contention issues and bottlenecks? For example,
can we use queues or a service bus between services in a co-operating producer,
competing consumer pattern?
Which operations could be handled asynchronously to help balance load at peak times?
How could we use the platform features for rate-leveling (e.g. Azure Queues, Service
Bus, etc.)?
How could we use the platform features for load-balancing (e.g. Azure Traffic Manager,
Load Balancer, etc.)?
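The producer and competing consumer pattern mentioned above can be tried out locally with the standard library before committing to a platform queue. In this sketch an in-process `queue.Queue` stands in for a cloud queue service, and squaring a number stands in for real work; both are assumptions made purely for illustration.

```python
import queue
import threading


def run_workers(jobs, worker_count=3):
    """Push jobs onto a queue and let several consumers compete for them."""
    work = queue.Queue()
    results = []
    lock = threading.Lock()

    def consumer():
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work for this consumer
                work.task_done()
                break
            with lock:
                results.append(item * item)  # stand-in for real processing
            work.task_done()

    threads = [threading.Thread(target=consumer) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for job in jobs:  # the producer side
        work.put(job)
    for _ in threads:  # one sentinel per consumer
        work.put(None)
    work.join()
    for t in threads:
        t.join()
    return sorted(results)


squares = run_workers(range(10))
print(squares)
```

Because the queue absorbs bursts, the producer never waits for a consumer to finish, which is the rate-leveling effect the questions above refer to.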
Availability
Availability describes the ability of the solution to operate in a manner useful to the
consumer in spite of transient and enduring faults in the application and underlying
operating system, network and hardware dependencies.
In reality, there is often some crossover between items useful for availability and
scalability.
Conversations should cover at least the following items:
Uptime Guarantees
What Service Level Agreements (SLAs) are the products required to meet?
Can these SLAs be met? Do the different cloud services we are planning to use all
conform to the levels required? Remember that SLAs are composite.
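The remark that SLAs are composite can be illustrated with simple arithmetic. Assuming the application needs every listed service to be up and that failures are independent (both are simplifying assumptions), the combined availability is the product of the individual figures. The two percentages below are made-up examples, not quotes from any provider.

```python
from math import prod


def composite_availability(availabilities):
    """Availability of a chain of services that must all be up at once."""
    return prod(availabilities)


# Hypothetical figures: a web tier at 99.95% and a database at 99.99%.
combined = composite_availability([0.9995, 0.9999])
print("%.4f%%" % (combined * 100))
```

The result, roughly 99.94%, is lower than either individual figure, which is why each dependency added to the design makes the overall guarantee harder to meet.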
Replication and failover
Which parts of the application could benefit from redundancy and failover options?
Are we restricted to specific geopolitical areas? If so, are all the services we are planning
to use available in those areas?
How do we prevent corrupt data from being replicated?
Will recovery from a failure put excess pressure on the system? Do we need to
implement retry policies and/or a circuit-breaker?
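A circuit-breaker, as mentioned above, stops calling a failing dependency for a while so that recovery is not hampered by a flood of retries. The class below is a minimal sketch under assumptions: the thresholds, the `CircuitOpenError` name and the injectable `clock` parameter are choices made here for illustration, not a reference implementation.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call skipped")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0  # a success closes the circuit again
        return result
```

A caller would wrap each remote call as `breaker.call(fetch_orders, customer_id)`, where `fetch_orders` is whatever function talks to the dependency, and treat `CircuitOpenError` as a signal to fall back or fail fast.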
Disaster recovery
How are we handling backups? Do we have a need for backups in addition to
data replication?
How do we handle "in-flight" messages and queues in the event of a failure?
Performance
What are the acceptable levels of performance? How can we measure that? What happens
if we drop below this level?
Can we make any parts of the system asynchronous as an aid to performance?
Which parts of the system are the most highly contended, and therefore more likely to
cause performance issues?
Are we likely to hit traffic spikes which may cause performance issues? Can we auto-scale
or use queue-centric design to cover for this?
Security
This is clearly a huge topic in itself, but a few interesting items to explore which relate
directly to cloud-computing include:
What is the local law and jurisdiction where data is held? Remember to include the
countries where failover and metrics data are held too.
Is there a requirement for federated security (e.g. ADFS with Azure Active Directory)?
Is this to be a hybrid-cloud application? How are we securing the link between our
corporate and cloud networks?
How do we control access to the administration portal of the cloud provider?
How do we restrict access to databases, etc. from other services (e.g. IP address
whitelists, etc.)?
How do we handle regular password changes?
How will we deal with operating system and vendor security patches and updates?
Manageability
This topic of conversation covers our ability to understand the health and performance of
the live system and manage site operations. Some useful cloud-specific considerations include:
Monitoring
How are we planning to monitor the application?
Are we going to use off-the-shelf monitoring services or write our own?
Where will the monitoring/metrics data be physically stored? Is this in line with data
protection policies?
How much data will our plans for monitoring produce?
How will we access metrics data and logs? Do we have a plan to make this data usable
as volumes increase?
Is there a requirement for auditing as well as logging?
Can we afford to lose some metrics/logging/audit data (i.e. can we use an asynchronous
design to "fire and forget" to help aid performance)?
Will we need to alter the level of monitoring at runtime?
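The "fire and forget" idea from the monitoring questions above can be sketched with Python's standard `logging` module, whose `QueueHandler` and `QueueListener` hand records to a background thread so that the request path does not wait on a slow sink. The `ListHandler` class and the logger name are hypothetical stand-ins used only to keep the example self-contained.

```python
import logging
import logging.handlers
import queue


class ListHandler(logging.Handler):
    """Stand-in for a slow sink (hypothetical); collects messages in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


log_queue = queue.Queue(-1)  # unbounded queue between app and sink
sink = ListHandler()
listener = logging.handlers.QueueListener(log_queue, sink)
listener.start()  # background thread drains the queue into the sink

logger = logging.getLogger("metrics-demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.handlers.QueueHandler(log_queue))

# The caller only enqueues the record and returns immediately.
logger.info("request served in %d ms", 42)

listener.stop()  # processes anything still queued, then stops the thread
print(sink.messages)
```

If the process dies before the queue is drained, the queued records are lost, which is exactly the trade-off the question about affordable data loss is asking about.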
Deployment
How do we patch and/or redeploy without disrupting the live system? Can we still meet
the SLAs?
How do we check that a deployment was successful?
Feasibility
When discussing feasibility we consider the ability to deliver and maintain the system,
within budgetary and time constraints. Items worth investigating include:
Can the SLAs ever be met (i.e. is there a cloud service provider that can give the uptime
guarantees that we need to provide to our customer)?
Do we have the necessary skills and experience in-house to design and build cloud
applications?
Can we build the application to the design we have within budgetary constraints and a
timeframe that makes sense to the business?
How much will we need to spend on operational costs (cloud providers often have very
complex pricing structures)?
What can we sensibly reduce (scope, SLAs, resilience)?
Scalability
Conversations about scalability should focus on any requirement to add
additional capacity to the application and related services to handle increases in load and
demand. It is particularly important to consider each application tier when designing for
scalability, how they should scale individually and how we can avoid contention issues and
bottlenecks. Key areas to consider include:
REFERENCE ARCHITECTURE FOR CLOUD COMPUTING
The deployment models differ in how exclusive the computing resources are made to a Cloud
Consumer.
Public cloud
A public cloud is one in which the cloud infrastructure and computing resources are made
available to the general public over a public network. A public cloud is owned by an
organization selling cloud services, and serves a diverse pool of clients. Figure 9 presents a
simple view of a public cloud and its customers.
Private cloud
A private cloud gives a single Cloud Consumer organization the exclusive
access to and usage of the infrastructure and computational resources. It may be managed either
by the Cloud Consumer organization or by a third party, and may be hosted on the organization's
premises (i.e. on-site private clouds) or outsourced to a hosting company (i.e. outsourced
private clouds).
Community cloud
A community cloud serves a group of Cloud Consumers which have shared
concerns such as mission objectives, security, privacy and compliance policy, rather than serving
a single organization as does a private cloud. Similar to private clouds, a community cloud may
be managed by the organizations or by a third party, and may be implemented on customer
premises (i.e. on-site community cloud) or outsourced to a hosting company (i.e. outsourced
community cloud).
Hybrid cloud
Cloud methodologies
(i) Service Oriented Architecture (SOA):
Since the paradigm of Cloud computing perceives all tasks accomplished as a "Service"
rendered to users, it is said to follow the Service Oriented Architecture.
This architecture comprises a flexible set of design principles used during the phases of
system development and integration. The deployment of a SOA-based architecture will
provide a loosely-integrated suite of services that can be used within multiple business
domains.
The enabling technologies in SOA allow services to be discovered, composed, and executed.
For instance, when an end-user wishes to accomplish a certain task, a service can be
employed to discover the required resources for the task. This will be followed by a
composition service which will plan the road-map to provide the desired functionality and
quality of service to the end-user.
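The discover, compose and execute steps described above can be sketched with a small in-memory registry. Everything here is hypothetical: the `ServiceRegistry` class, the service names and the two toy services exist only to show the flow, whereas real SOA deployments rely on network-accessible registries and service descriptions.

```python
class ServiceRegistry:
    """A toy registry in which services are plain Python callables."""

    def __init__(self):
        self._services = {}

    def register(self, name, func):
        self._services[name] = func

    def discover(self, name):
        return self._services[name]


def compose(registry, names):
    """Chain the named services; the output of one is the input of the next."""
    steps = [registry.discover(n) for n in names]

    def pipeline(value):
        for step in steps:
            value = step(value)
        return value

    return pipeline


registry = ServiceRegistry()
registry.register("resize", lambda size: size // 2)
registry.register("label", lambda size: "thumbnail:%d" % size)

# Discover two services, compose them, then execute the composition.
make_thumbnail = compose(registry, ["resize", "label"])
result = make_thumbnail(800)
print(result)
```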
(ii) Virtualization
The concept of virtualization is to relieve the user from the burden of resource purchases and
installations.
The Cloud brings the resources to the users. Virtualization may refer to Hardware (execution
of software in an environment separated from the underlying hardware resources), Memory
(giving an application program the impression that it has contiguous working memory,
isolating it from the underlying physical memory implementation), Storage (the process of
completely abstracting logical storage from physical storage), Software (hosting of multiple
virtualized environments within a single Operating System (OS) instance), Data (the
presentation of data as an abstract layer, independent of underlying database systems,
structures and storage) and Network (creation of a virtualized network addressing space within
or across network subnets). Virtualization has become an indispensable ingredient for almost
every Cloud, the most obvious reasons being the ease of abstraction and encapsulation. Amongst
the other important reasons for which Clouds tend to adopt virtualization are:
(i) Server and application consolidation – as multiple applications can be run on the same
server, resources can be utilized more efficiently.
(ii) Configurability – as the resource requirements for various applications could differ
significantly (some require large storage, some require higher computation capability),
virtualization is the only solution for customized configuration and aggregation of resources
which are not achievable at the hardware level.
(iii) Increased application availability – virtualization allows quick recovery from
unplanned outages, as virtual environments can be backed up and migrated with no
interruption in services.
(iv) Improved responsiveness – resource provisioning, monitoring and maintenance can be
automated, and common resources can be cached and reused.
CLOUD STORAGE
Cloud storage is a service that maintains, manages and backs up data remotely and makes it
available to users over a network (typically the internet). There are many cloud storage
providers, and most of them provide a certain number of gigabytes free of charge.
For example, DropBox provides up to 2GB of free space; Google Drive, Box, Amazon and Apple
Cloud provide up to 5GB; and Microsoft SkyDrive provides up to 7GB [4]. Customers who exceed
the free space limit pay according to their plan.
Features like maximum file size, auto backup, bandwidth and upgrades for limited space differ
from one provider to another. For example, the maximum file size in DropBox is 300MB, whereas
the maximum file size in Google Drive is 1TB.
By using a cloud storage service, customers need not invest in storage devices, and they do not
need technical support for maintenance, storage, backup or disaster recovery. Cloud storage is
not worthwhile, however, when the client is able to store and manage the data locally at a lower
cost than through the cloud. The cloud should therefore be designed in such a way that it is
cost-effective, autonomic, multi-tenant, scalable, available, controllable and efficient.
The Storage Networking Industry Association (SNIA) published the Cloud Data Management
Interface (CDMI) in 2009. It supports both legacy and new applications. Cloud storage
standards define roles and responsibilities for archiving, retrieval and data ownership.
They also provide a standard way of auditing, so that calculations are done in a consistent
manner. These standards are helpful to cloud storage providers, cloud storage subscribers, cloud
storage developers and cloud storage service brokers.
CDMI also provides a common interface through which providers advertise their specific
capabilities, so that subscribers can easily identify the providers that match their requirements.
Cloud storage architecture consists of a front end, middleware and a back end. The front end
can be a web service front end, a file-based front end, or an even more traditional front end.
The middleware consists of storage logic which implements features like replication, data
reduction and data placement algorithms. The back end implements the physical storage for data.
The access methods for cloud storage are different from those of traditional storage, as the
cloud holds different types of data belonging to different customers. Most providers implement
multiple access methods.
Virtual storage architecture
An important part of the cloud model is the concept of a pool of resources that is drawn
upon on demand in small increments. The recent innovation that has made this
possible is virtualization. Cloud storage is simply the delivery of virtualized storage on
demand.
This architecture is based on the Storage Virtualization Model. It consists of three layers:
1. Interface Layer, 2. Rule and Metadata Management, 3. Virtual Storage Management. In the
Interface Layer, administrators and users are provided with interface modes that may
include icommands and client web browsers.
The Rule and Metadata Management layer consists of two parts, an upper layer and an under
layer. The upper layer provides separate interfaces for the client and the administrator, and the
two interfaces have different rights. Rules are created from the operating transactions. In the
client interface, user requests are sent to the Resource-Based Services and Meta-Based Services.
These services are present in the under layer. The Resource-Based Service controls resource
scheduling, whereas the Meta-Based Service manages the metadata. Physical device
virtualization and load balancing of data/file requests are taken care of by the Virtual Storage
Management layer. Parameters like bandwidth, rotating speed, etc. are maintained by URM.
The system maintains a table holding these parameters, and also a routing table.
After analyzing all resource nodes, the system assembles the collection in logic space and
finally structures a global space. If there is a data/file write request, the system invokes the
write operation. Similarly, the replica routing module is invoked when there is a need to balance
the load. The replica module is implemented using the Fair-Share Replication algorithm. Based on
the access load factor, this algorithm identifies the best candidate nodes for replica
replacement.
There are many benefits to storing data in the cloud over local storage.
Companies only pay for the storage they use. It creates operating expenses rather than capital
expenses.
The data is quickly accessible and reliable. The data is located on the web across multiple
storage systems instead of a local site.
Better protection in case of a disaster. Sometimes an organization keeps only a local backup,
and in case of fire or natural disaster that backup will not be available.
Cloud vendors provide hardware redundancy and automatic storage failover. This helps to avoid
service outages caused by hardware failure. The vendors know how to distribute copies to
mitigate any hardware failure.
Virtually limitless storage capacity. If the customer no longer needs the extra
storage, the costs decrease.
On the other hand, there are also disadvantages to storing data in the cloud over local
storage.
Immaturity. Vendors had to rewrite solutions to solve some incompatibilities with storing data
online, and this has created difficulty for organizations (Galloway, 2013).
Price and Reliability. The customer has to calculate the cost-effectiveness of a cloud solution
against hosting and maintaining their own data.
DEVELOPMENT IN PYTHON
DESIGN APPROACH
Design Patterns in Python
Python is a ground-up object oriented language which enables one to do object oriented
programming in a very easy manner. While designing solutions in Python, especially the
ones that are more than use-and-throw scripts which are popular in the scripting world, they
can be very handy.
Python is a rapid application development and prototyping language. So, design patterns can
be a very powerful tool in the hands of a Python programmer.
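As an illustration of how lightly a design pattern can be expressed in Python, the following is a minimal sketch of the Strategy pattern. The class name, function names and prices are all hypothetical and chosen only for this example; they do not come from any real provider.

```python
from typing import Callable

# A "strategy" here is simply a callable mapping storage used (GB) to a cost.
PricingStrategy = Callable[[float], float]


def pay_per_gb(gb: float) -> float:
    """Charge a rate per gigabyte (the rate is made up for illustration)."""
    return round(gb * 0.02, 2)


def flat_plan(gb: float) -> float:
    """Charge a fixed fee regardless of usage (the fee is made up)."""
    return 9.99


class StorageAccount:
    """Holds usage data and delegates the cost calculation to a strategy."""

    def __init__(self, used_gb: float, pricing: PricingStrategy) -> None:
        self.used_gb = used_gb
        self.pricing = pricing

    def monthly_cost(self) -> float:
        return self.pricing(self.used_gb)


account = StorageAccount(used_gb=100, pricing=pay_per_gb)
print(account.monthly_cost())

# Swapping the strategy changes the behaviour without touching the class.
account.pricing = flat_plan
print(account.monthly_cost())
```

Because functions are first-class objects in Python, the strategies need no class hierarchy of their own, which is one reason patterns often take less code in Python than in other languages.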
• Humans are not satisfied with the quality of images and therefore they make use of
image processing.
• Humans rely upon their visual system (eyes and brain) to collect visual information about
their surroundings. Visual information refers to images and videos. In the past, we needed
visual information mainly for survival. Nowadays, visual information is required for survival as
well as for communication and entertainment purpose.
• To enhance an image
• To extract some useful information from an image that can be utilised for health sciences, public
safety, etc. So, in short, the following steps are involved in image processing:
1) We take input as an image.
Python has multiple libraries for multiple purposes like web development, scientific and
numeric computing, image processing.
To work on images, Python has the Python Imaging Library (PIL), which provides many
functions for image processing operations. We performed some basic operations using
PIL modules.
Its initial release was in the year 1995. And many versions of PIL are available according to
our operating system.
Some of the file formats that it supports are ppm, png, jpeg, tiff, bmp, gif. PIL has been
written in the C and Python programming languages. We can also install PIL through pip:
$ sudo easy_install pip
Now, to install Pillow, simply type the following in your terminal:
$ sudo pip install pillow
Also, PIL can be used for image enhancement and for developing Python-based image
processing applications, which makes it easier for beginners to learn and understand
the complex tasks of image processing.
Cross platform
Occasionally crashes
OneDrive, previously known as SkyDrive, was rolled out in 2007 as Microsoft's own
cloud storage platform. It works as part of the Microsoft Office Suite and gives users
5GB of free storage space. Registered students and those working in academia are given
1TB of free storage.
OneDrive is available for all platforms. You need to have a Hotmail or Microsoft account
but this is very easy to set up. Users can collaborate on, share and store documents.
OneDrive also gives you offline access to documents so you can always have your most
important documents at your fingertips. It comes pre-installed on all Windows 10
machines and can be easily accessed or downloaded onto other platforms.
One of the main complaints about OneDrive is that it appears to have trouble syncing at
times and there have been reports by users that it can crash on occasion.
You can upgrade your storage to 50GB for $3 (£2.30) a month.
Egnyte
Flexible pricing plus a robust interface makes Egnyte an ideal document storage platform
Excellent integration
Egnyte was founded in 2007. The company provides software for enterprise file
synchronization and sharing.
Egnyte allows businesses to store their data locally and in the cloud. All types of data can
be stored in the cloud, whilst data of a more sensitive nature can be stored on servers on-
premise. This provides better security.
Business teams can work how and where they want with an easy to use collaboration
system through their content services platform.
Egnyte integrates with the more popular industry applications such as Office 365. This
allows remote and internal employees to access all the files they need.
The 'Office' plan starts at $8 (£6.14) per employee per month. This covers 5-25
employees, 5TB of storage and 10GB max file size.
The 'Business' package starts at $15 (£11.51) per employee per month. This includes
25-100 employees, 10TB online storage and 10GB max file size.
In order to take advantage of their 'Enterprise' tier, which includes over 100 employees,
25GB max file size and unlimited storage, you will need to contact Egnyte directly.
Egnyte offers a 15-day free trial for their packages.
Users have observed that some files, such as photos, can take a long time to sync.
Box is a cloud content management and file sharing service for businesses. It was
founded in 2005.
Box offers strong management capabilities and security features. The interface is made
for ease of use and is simple to navigate.
The dashboard allows access to settings and files and folders. Admins can manage all
users, monitor activity and control sharing.
As Box has been around for a while, it is supported by a number of mainstream apps such
as Google Docs and Office 365. The Box Sync client is available from the Downloads
page for Mac and Windows, plus there's also an official Android client.
Box offers a 14-day free trial for all packages. Their 'Starter' plan is priced at $5 (£3.84)
per user per month. This includes 100GB secure storage, 2GB file upload with a
maximum of 10 users.
Their 'Business' plan starts at $16 (£12.27) per user per month which includes unlimited
storage, 5GB file upload and no maximum number of users.
The 'Business Plus' package is $27 (£20.71) per user per month and this comes with
unlimited storage, 5GB file upload and unlimited external collaborators.
In order to subscribe to Box's 'Enterprise' plan, users will have to contact them directly
for a quote.
Unlike other cloud storage providers, if you choose to share a file with someone who
doesn't have a Box account they'll only have read-only access.
Dropbox
Simplified cloud storage from a veteran in the field
2GB free
Relatively expensive
Dropbox is one of the oldest cloud storage providers. It offers a rather minuscule 2GB
of storage space for free users but this can be increased by up to 16GB through referrals
as well as by linking your Dropbox account to social media accounts.
To date it is one of the simplest storage providers to use. Dropbox can be installed on
most computers or devices and syncs easily between apps. The app can store almost any
kind of file with no compatibility issues. You can drag and drop files into the desktop app
with ease.
You can also share files with other users easily through links, even if they don't have a
Dropbox account.
As Dropbox has been around for a long time it integrates with most other apps such as
MS Office and Slack.
The downside to Dropbox is that it can be expensive if you need more than 2GB of space
and you have run out of friends to refer. The 'Plus' package allows you to upgrade to
1TB of storage at a cost of $9.99 (£7.67) per month. The 'Business' package offers users
2TB worth of storage for $19.99 (£15.34) per month.
SpiderOak
straightforward to use. They also include a handy drag and drop feature for organising
files.
Users can access settings for all applications such as backup selection and sharing from
the centralized device management dashboard. The dashboard also allows users to
manage their accounts, set group permissions and gain insight into usage.
Prospective clients will need to contact SpiderOak's Sales Team directly to obtain a
quote.
Online commentators have observed that SpiderOak lacks many of the collaboration tools
that are available from other cloud storage providers.
MAPREDUCE APP
The MapReduce library is available in two versions: one for Java and one for
Python.
Both libraries are built on top of App Engine services, including Datastore and Task Queues.
You must download the MapReduce library and include it with your application. The library
provides:
A programming model for large-scale distributed data processing
I/O scheduling
There are no usage charges associated with the MapReduce library. As with any App Engine
application, you are charged for any App Engine resources that the library or your MapReduce
code consumes (beyond the free quotas) while running your job. These can include instance
hours, Datastore and Google Cloud Storage usage, network, and other storage.
The Python MapReduce library can be used for complete map-shuffle-reduce pipelines
only. It does not have the ability to run a map-only job.
The App Engine adaptation of Google's MapReduce model is optimized for the needs of
the App Engine environment, where resource quota management is a key consideration. This
release of the MapReduce API provides the following features and capabilities:
Automatic sharding for faster execution, allowing you to use as many workers as you
need to get your results faster
Standard data input readers for iterating over blob and datastore data.
Standard output writers
Status pages to let you see how your jobs are running
Processing rate limiting to slow down your mapper functions and space out the work,
helping you avoid exceeding your resource quotas
MapReduce Job
A MapReduce job has three stages: map, shuffle, and reduce. Each stage in the sequence
must complete before the next one can run. Intermediate data is stored temporarily between
the stages. The map stage transforms single input items to key-value pairs, the shuffle stage
groups values with the same key together, and the reduce stage processes all the items with
the same key at once.
The map-shuffle-reduce algorithm is very powerful because it allows you to process all
the items (values) that share some common trait (key), even when there is no way to access
those items directly because, for instance, the trait is computed.
The data flow for a MapReduce job looks like this:
Map
The MapReduce library includes a Mapper class that performs the map stage. The map
stage uses an input reader that delivers data one record at a time. The library also contains a
collection of Input classes that implement readers for common types of data. You can also
create your own reader, if needed.
The map stage uses a map() function that you must implement. When the map stage runs,
it repeatedly calls the reader to get one input record at a time and applies the map() function
to the record.
The implementation of the map() function depends on the kind of job you are running.
When used in a Map job, the map() function emits output values. When used in a map reduce
job, the map() function emits key-value pairs for the shuffle stage.
When emitting pairs for a MapReduce job, the keys do not have to be unique. The
same key can appear in many pairs. For example, assume the input is a dog database that
contains records listing license id, breed, and name:
14877 poodle muffy
88390 beagle dotty
A MapReduce job that computes the most popular name for each breed has a map()
function that pairs each dog's name with its breed and emits these pairs:
(poodle, muffy) (beagle, dotty) (collie, pancakes) (beagle, esther) (collie,
lassie) (poodle, albert) (poodle, muffy)
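The three stages can be sketched in plain Python to make the data flow concrete. This is only an in-memory illustration of the map-shuffle-reduce idea; it does not use the App Engine MapReduce library or its API. The function names are hypothetical, and every license id other than the two listed above is an invented placeholder.

```python
from collections import Counter, defaultdict

# Input records: "license_id breed name". The first two lines come from the
# text; the remaining license ids are invented placeholders.
RECORDS = [
    "14877 poodle muffy",
    "88390 beagle dotty",
    "10001 collie pancakes",
    "10002 beagle esther",
    "10003 collie lassie",
    "10004 poodle albert",
    "10005 poodle muffy",
]


def map_record(record):
    """Map stage: turn one input record into a (key, value) pair."""
    _license_id, breed, name = record.split()
    return breed, name


def shuffle(pairs):
    """Shuffle stage: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_names(breed, names):
    """Reduce stage: pick the most frequent name for one breed."""
    most_common_name, _count = Counter(names).most_common(1)[0]
    return breed, most_common_name


pairs = [map_record(record) for record in RECORDS]
grouped = shuffle(pairs)
result = dict(reduce_names(breed, names) for breed, names in grouped.items())
print(result)
```

Note that the key (breed) is not unique in the mapped pairs; the shuffle stage is what brings the names of all poodles together so that the reduce stage can see them at once.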
The marketing managers often face the problem of taking informed decisions in
designing marketing strategies. With increasing gap between the perception of the marketing
managers and the educated customers about the product/service, designing an effective
marketing strategy has become more complex. In order to reduce this gap, with rapid growth of
social media presence, social media can be used as it is a good approximation of the entire web.
Almost all companies have their content and dedicated pages, channels on various social
media platforms.
The final goal of any marketing strategy is to help the business grow, increase its brand
awareness and customer base; bolstering trust with current stakeholders is an added advantage.
The competitive edge obtained by the use of SMA in marketing strategy is well explained
through a case study of a financial institution.
Applications of SMA
Some of the key applications of SMA in marketing are brand management, effective
marketing communications, real-time identification of the competitors and customer
engagement.
While the entire industry is moving towards agility, time has really become a key
parameter in deciding the success and failure of the marketing strategy. With social media, the
response time is reduced from days and weeks to just minutes and hours.
This gives an excellent opportunity to analyze the effectiveness of the strategy,
customer sentiment and also capture the valuable feedback from the customers in little time.
Based on the SMA, suitable changes can be made to the strategy or the plan which helps to
market the product/service in a better way.
The marketing strategy basically includes detailing the specific activities that have to
be undertaken, identifying the target audience for each of its activities, specifying metrics for
measuring success, being flexible to allow adjustments if necessary, and automating the process.
Detailing the specific activities that have to be undertaken
This means deciding upon the type of content that the company uses to promote its product
on social media. The content could be of many types – plain text, links, images, videos,
quotes and re-shares.
Apart from having a main content type, posting a different type of content occasionally
will help to alleviate any possible boredom to prospective customers. Other than the content
type, the company should also focus on maintaining a good social media profile which is
consistent across various channels and evidently inform the followers what they can expect from
the company.
Identifying target audience for each of its activities
All content types will not succeed on all social media platforms, as different types of users
are present on different platforms. In order to have a successful marketing strategy, it is
imperative to know which content type on which social media platform will yield significant
results. Apart from this, Pew research data and Google Analytics demographics data can also
help to identify the type of population living in a particular region, which will collectively
aid in choosing the target audience for each of the activities (or posts).
Specifying metrics for measuring success
The metrics to determine the effectiveness of a marketing strategy are number of clicks,
traffic to website, number of followers, likes, shares, comments, etc.
Compare the current analytics of the company with the analytics a month after
implementation of the new marketing strategy. Apart from these metrics, social media
reports (Page Insights for Facebook; Twtrland for Twitter; Klout for all social media)
also show how successful the company's social media activity has been, so that the
company can know whether it is engaging with the actual prospects or not.
With social media, feedback from the customers can be obtained in a very short span of
time. Any mistakes in the marketing strategy can be identified very quickly based on this
feedback, and suitable actions can be taken to rectify them. Being flexible to
allow adjustments also helps the entire process to become agile, i.e. to address ever-changing
customer demands very quickly.
When the marketing strategy is determined to be an effective one, the entire process is
automated to post weekly or periodically about the product and company updates.
What is SMAC?
Today, this has been addressed by the arrival of data visualisation tools and customised
ISVs that are built with industry specific templates, improving the user's experience and
allowing executives to quickly gain access to the latest business information. Added to this has
been the integration of analytics tools and a host of social and collaborative procedures, all of
which can be accessed via mobile devices.
This combination of new technologies is known as Social, Mobile, Analytics and Cloud:
or SMAC for short. Social helps people to find their colleagues, who they can then collaborate
with; mobile provides access to other data sources and the cloud;
the cloud contains the information and the applications that people use; and analytics
allows people to make sense of this data. The broad idea of SMAC is that social networks like
Facebook and Twitter can be used for brand building and customer engagement; big data
analytics can be used to analyse large volumes of data; cloud computing provides a shared pool
of resources; and mobile applications provide access to services on the go.
Why SMAC is so important for businesses
In the coming years, it is widely expected that there will be three major trends that emerge
and affect not only IT technologies but also the way we do business: all of which will be heavily
impacted by SMAC. These include:
New working styles: Today in business, both employees and customers expect a style of
content, collaboration and commerce that offers the same “anytime, anywhere” convenience
that they enjoy in their personal lives with companies such as Facebook and Amazon.
It is expected that there will be an increase in the mobile elite workforce, especially as
wearable devices such as watches and glasses add to users' options. In terms of SMAC,
business applications will be required to embrace this approach in order to maximise
productivity and convenience.
It is expected that SMAC architectures will become the new way forward for interaction and
the preferred application paradigm. IT departments will need to offer a contextually relevant
experience that will support new working styles – seamlessly integrating mobile application
management with device management and social platforms.
Digitisation of processes and business models: With society and nations migrating towards
the internet economy, entire business processes are becoming increasingly digitised. Many
media and entertainment industries have made the digital switch – such as music and movies
and now it is likely that other industries will see their physical chain become increasingly digitised.
In the SMAC era, the role of the CIO will change from an engineer to a pioneer with
processes set to be broken down into individual components and redesigned from a digital
perspective. Business process analysts will become more like business process scientists
needing to combine analytic skills with an understanding of how to make the most of
emerging technologies. Hybrid cloud environments will also be needed to support these
dynamic services and help create flexible models for these digital services.
Information overload: According to Cisco, the number of devices that are connected to IP
networks will actually be three times higher than the population by 2017. This exponential
growth, which will include the emergence of the internet of things, will place new burdens on
data centres and existing information infrastructures.
As such, within the SMAC area, IT departments will need to supply an infrastructure layer
that is capable of dealing with vast amounts of data streaming as well as making informative
and intelligent decisions. It has even been predicted by Gartner that 10 per cent of computers
will be learning and not processing devices by 2017.
In addition, in-memory databases will be used to speed up analytical processes, which will
offer a host of benefits. It will allow analytical processing and transactions to take place
within the same in-memory database. Finally, there should be improvements across the
voice over internet protocol (VoIP) that will give companies insight into their mobile
workforce.
It will be necessary for IT departments to master the deployment and development of SMAC
to support the work styles of their customers and employees, to understand how to access
new technologies and to address the forthcoming information overload. It will be necessary
to focus on intelligent analytics to make informed decisions and ensure smooth processing.
QUESTIONS BANK 1 MARKS
1. Which of the following is not a type of cloud?
A : Private
B : Public
C : Protected
D : Hybrid
2. Which of the following architectural standards is working with cloud computing industry?
A : Service-oriented architecture B : Standardized Web services C : Web-application
frameworks D : Web-based architecture
3. Which of the following provides evidence that the message received is the same as created
by its rightful sender?
A : Trusted Signature B : Analog Signature C : Digital Signature
D : Encryption
C : Messaging
D : Run Server
8. When you add operating system and applications to the service, the model called as
___.
A : PaaS B : CaaS C : SaaS D : IaaS
9. Which organization supports the development of standards for cloud computing?
A : IEEE
B : OMG
C : OCC
D : Stateless
C. Cloud Computing provides us means of accessing the applications as utilities over
computer only.
D. All of the above
D. 5
D. Public-as-a-Service
10. __________ provides the runtime environment for applications, development and deployment
tools, etc.
A. IaaS
B. PaaS
C. SaaS
D. XaaS
11. __________is yet another service model, which includes Network-as-a-Service,
Business-as-a- Service, Identity-as-a-Service, Database-as-a-Service or Strategy-as-a-
Service.
A. IaaS
B. PaaS
C. SaaS
D. XaaS
5 MARKS
10 MARKS
b) Social media analytic app
UNIT IV
INTRODUCTION
You’ve likely heard the terms “Big Data” and “Cloud Computing” before. If you’re involved with cloud
application development, you may even have experience with them. The two go hand-in-hand, with many
public cloud services performing big data analytics.
With Software as a Service (SaaS) becoming increasingly popular, keeping up-to-date with cloud
infrastructure best practices and the types of data that can be stored in large quantities is crucial. We’ll take a
look at the differences between cloud computing and big data, the relationship between them, and why the
two are a perfect match, bringing us lots of new, innovative technologies, such as artificial intelligence.
Before discussing how the two go together, it’s important to form a clear distinction between “Big Data” and
“Cloud Computing”. Although they are technically different terms, they’re often seen together in literature
because they interact synergistically with one another.
Big Data: This simply refers to the very large sets of data that are output by a variety of programs. It
can refer to any of a large variety of types of data, and the data sets are usually far too large to peruse
or query on a regular computer.
Cloud Computing: This refers to the processing of anything, including Big Data Analytics, on the
“cloud”. The “cloud” is just a set of high-powered servers from one of many providers. They can
often view and query large data sets much more quickly than a standard computer could.
Essentially, “Big Data” refers to the large sets of data collected, while “Cloud Computing” refers to the
mechanism that remotely takes this data in and performs any operations specified on that data.
The Cloud Client Library is the idiomatic way for Python 3 developers to integrate their apps with Google
Cloud services on the Python 3 runtime.
For example, you can install the corresponding Python client library for Cloud Datastore or Cloud Storage to
integrate those services with your app.
For a complete list of all of the Python libraries for the supported Google Cloud services, see APIs & Python
Libraries
Example
Assume that you want to use Cloud Datastore on your local machine.
If you want to use Python client libraries in your App Engine app, see Specifying Dependencies.
1. Structured
2. Unstructured
3. Semi-structured
Structured
Any data that can be stored, accessed and processed in a fixed format is termed ‘structured’
data. Over time, computer science has achieved great success in developing
techniques for working with this kind of data (where the format is well known in advance) and for deriving
value out of it. However, nowadays we are foreseeing issues when the size of such data grows to a huge extent,
with typical sizes in the range of multiple zettabytes.
Do you know? 10^21 bytes equal 1 zettabyte; that is, one billion terabytes form a zettabyte.
Looking at these figures one can easily understand why the name Big Data is given and imagine the
challenges involved in its storage and processing.
Do you know? Data stored in a relational database management system is one example of
a ‘structured’ data.
Unstructured
Any data with unknown form or structure is classified as unstructured data. In addition to the size being
huge, unstructured data poses multiple challenges in terms of its processing for deriving value out of it. A
typical example of unstructured data is a heterogeneous data source containing a combination of simple text
files, images, videos etc. Nowadays organizations have a wealth of data available to them but, unfortunately,
they don’t know how to derive value out of it since this data is in its raw form or unstructured format.
Semi-structured data can contain both forms of data. Semi-structured data appears structured in
form, but it is not actually defined with, e.g., a table definition in a relational DBMS. An example of
semi-structured data is data represented in an XML file.
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
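Although such records have no table definition, their tags still let a program recover the fields. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The surrounding <people> root element is an assumption added here, because a well-formed XML document needs a single root; the variable names are likewise only illustrative.

```python
import xml.etree.ElementTree as ET

# The records from the text, wrapped in an assumed <people> root element.
XML_DATA = """<people>
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
</people>"""

root = ET.fromstring(XML_DATA)

# Turn each <rec> element into a plain dictionary, converting age to int.
people = [
    {
        "name": rec.findtext("name"),
        "sex": rec.findtext("sex"),
        "age": int(rec.findtext("age")),
    }
    for rec in root.findall("rec")
]

average_age = sum(person["age"] for person in people) / len(people)
print(len(people), "records, average age", average_age)
```

The structure is carried by the data itself (the tags), not by a schema stored elsewhere, which is what distinguishes this from a relational table.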
Please note that web application data, which is unstructured, consists of log files, transaction history files etc.
OLTP systems are built to work with structured data wherein data is stored in relations (tables).
Volume
Variety
Velocity
Variability
(i) Volume – The name Big Data itself is related to a size which is enormous. Size of data plays a very crucial
role in determining value out of data. Also, whether a particular data can actually be considered as a Big Data
or not, is dependent upon the volume of data. Hence, ‘Volume’ is one characteristic which needs to be
considered while dealing with Big Data solutions.
(ii) Variety – Variety refers to heterogeneous sources and the nature of data, both structured and unstructured. During
earlier days, spreadsheets and databases were the only sources of data considered by most of the applications.
Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. are also being
considered in the analysis applications. This variety of unstructured data poses certain issues for storage,
mining and analyzing data.
(iii) Velocity – The term ‘velocity’ refers to the speed of generation of data. How fast the data is generated
and processed to meet the demands, determines real potential in the data.
Big Data Velocity deals with the speed at which data flows in from sources like business processes,
application logs, networks, and social media sites, sensors, Mobile devices, etc. The flow of data is massive
and continuous.
(iv) Variability – This refers to the inconsistency which can be shown by the data at times, thus hampering
the process of being able to handle and manage the data effectively.
The ability to process Big Data in a DBMS brings multiple benefits, such as:
Access to social data from search engines and sites like Facebook and Twitter is enabling organizations to
fine-tune their business strategies.
Traditional customer feedback systems are getting replaced by new systems designed with Big Data
technologies. In these new systems, Big Data and natural language processing technologies are being used to
read and evaluate consumer responses.
Big Data technologies can be used for creating a staging area or landing zone for new data before identifying
what data should be moved to the data warehouse. In addition, such integration of Big Data technologies and
data warehouse helps an organization to offload infrequently accessed data.
Organizing, managing and storing data is important as it enables easier access and efficient modifications.
Data Structures allow you to organize your data in a way that lets you store collections of data,
relate them and perform operations on them accordingly.
Types of Data Structures in Python
Python has built-in support for Data Structures which enable you to store and access data. These structures
are called List, Dictionary, Tuple and Set.
Python also allows its users to create their own Data Structures, giving them full control over their
functionality. The most prominent of these are Stack, Queue, Tree, Linked List and so on, which are
also available in other programming languages. Now that you know which types are available,
let us move on to the Data Structures and implement them using Python.
As the name suggests, these Data Structures are built-in with Python which makes programming easier and
helps programmers use them to obtain solutions faster. Let’s discuss each of them in detail.
Lists
Lists are used to store data of different data types in a sequential manner. Every element of the list is
assigned an address, called its index. Positive index values start from 0 and go up to the last
element. There is also negative indexing, which starts from -1 and lets you
access elements from the last to the first. Let us now understand lists better with the help of an example program.
Creating a list
To create a list, you use the square brackets and add elements into it accordingly. If you do not pass any
elements inside the square brackets, you get an empty list as the output.
my_list = [] #create empty list
print(my_list)
my_list = [1, 2, 3, 'example', 3.132] #creating list with data
print(my_list)
Output:
[]
[1, 2, 3, 'example', 3.132]
Adding Elements
Adding the elements in the list can be achieved using the append(), extend() and insert() functions.
The append() function adds all the elements passed to it as a single element.
The extend() function adds the elements one-by-one into the list.
The insert() function adds the element passed at the given index value and increases the size of the list too.
my_list = [1, 2, 3]
print(my_list)
my_list.append([555, 12]) #add as a single element
print(my_list)
my_list.extend([234, 'more_example']) #add as different elements
print(my_list)
my_list.insert(1, 'insert_example') #add element i
print(my_list)
Output:
[1, 2, 3]
[1, 2, 3, [555, 12]]
[1, 2, 3, [555, 12], 234, 'more_example']
[1, 'insert_example', 2, 3, [555, 12], 234, 'more_example']
Deleting Elements
To delete elements, use the del keyword, which is built into Python; it does not return anything
back to us.
If you want the element back, use the pop() function, which takes the index value.
To remove an element by its value, use the remove() function.
Example:
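The original example program is not present at this point in the notes, so the following is a minimal sketch of the three deletion approaches just described. The starting list is an assumption, chosen to match the list used elsewhere in this section.

```python
my_list = [1, 2, 3, 'example', 3.132, 10, 30]

del my_list[5]             # delete the element at index 5 (the value 10); returns nothing
print(my_list)

my_list.remove('example')  # remove the first element whose value is 'example'
print(my_list)

popped = my_list.pop(1)    # remove the element at index 1 and return it
print('Popped element:', popped, 'List remaining:', my_list)

my_list.clear()            # empty the list completely
print(my_list)
```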
Accessing Elements
Accessing elements is the same as accessing Strings in Python. You pass the index values and hence can
obtain the values as needed.
Output:
1
2
3
example
3.132
10
30
[1, 2, 3, 'example', 3.132, 10, 30]
example
[1, 2]
[30, 10, 3.132, 'example', 3, 2, 1]
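The program that produced this output is missing from the notes. A sketch that would print exactly the lines listed above, assuming the list holds the seven values shown, is:

```python
my_list = [1, 2, 3, 'example', 3.132, 10, 30]

for element in my_list:   # access the elements one by one
    print(element)

print(my_list)            # access all elements at once
print(my_list[3])         # access the element at index 3
print(my_list[0:2])       # slice: index 0 up to, but not including, index 2
print(my_list[::-1])      # slice with step -1: elements in reverse order
```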
Other Functions
You have several other functions that can be used when working with lists.
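The notes do not say which functions are meant, so the selection below is an assumption: it is a short sketch of a few commonly used list operations (len(), index(), count(), sorted() and sort()).

```python
my_list = [1, 2, 3, 10, 30, 10]

print(len(my_list))          # number of elements in the list
print(my_list.index(10))     # index of the first occurrence of 10
print(my_list.count(10))     # how many times 10 occurs
print(sorted(my_list))       # returns a new sorted list; my_list is unchanged

my_list.sort(reverse=True)   # sorts the list in place, largest first
print(my_list)
```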
Dictionary
Dictionaries are used to store key-value pairs. To understand this better, think of a phone directory to which
hundreds or thousands of names and their corresponding numbers have been added. The constant
values here are Name and Phone Numbers, which are called the keys. The various names and phone
numbers are the values that have been fed to the keys. If you access the values of the keys, you obtain all
the names and phone numbers. That is what a key-value pair is, and in Python this structure is stored
using Dictionaries. Let us understand this better with an example program.
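The example program itself is missing from the notes. The following is a minimal sketch of the phone directory described above; the names, phone numbers and the extra 'City' key are invented purely for illustration.

```python
# Keys are the constant labels; values are the data fed to each key.
directory = {
    'Name': ['Asha', 'Ravi'],
    'Phone': ['555-0101', '555-0102'],
}

print(directory['Name'])     # access the values stored under the key 'Name'
print(directory['Phone'])    # access the values stored under the key 'Phone'

directory['City'] = ['Chennai', 'Salem']   # add a new key-value pair
print(directory.get('Email', 'no such key'))  # get() avoids an error for a missing key

for key, values in directory.items():      # iterate over all key-value pairs
    print(key, values)
```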
Cloud Functions
Google Cloud Functions is a serverless execution environment for building and connecting cloud services.
With Cloud Functions you write simple, single-purpose functions that are attached to events emitted from
your cloud infrastructure and services. Your function is triggered when an event being watched is fired. Your
code executes in a fully managed environment. There is no need to provision any infrastructure or worry
about managing any servers.
Cloud Functions can be written using JavaScript, Python 3, Go, or Java runtimes on Google Cloud Platform.
You can take your function and run it in any standard Node.js (Node.js 10 or 12), Python 3 (Python 3.7 or
3.8), Go (Go 1.11 or 1.13) or Java (Java 11) environment, which makes both portability and local testing a
breeze.
Cloud Functions provides a connective layer of logic that lets you write code to connect and extend cloud
services. Listen and respond to a file upload to Cloud Storage, a log change, or an incoming message on a
Pub/Sub topic. Cloud Functions augments existing cloud services and allows you to address an increasing
number of use cases with arbitrary programming logic. Cloud Functions have access to the Google Service
Account credential and are thus seamlessly authenticated with the majority of Google Cloud services,
including Cloud Vision, as well as many others. In addition, Cloud Functions are supported by numerous
Google Cloud client libraries, which further simplify these integrations.
Cloud events are things that happen in your cloud environment. These might be things like changes to data in
a database, files added to a storage system, or a new virtual machine instance being created.
Events occur whether or not you choose to respond to them. You create a response to an event with a trigger.
A trigger is a declaration that you are interested in a certain event or set of events. Binding a function to a
trigger allows you to capture and act on events. For more information on creating triggers and associating
them with your functions, see Events and Triggers.
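A minimal sketch of a function bound to a storage trigger, assuming the two-argument (event, context) form used by Python background functions. The function name and the event keys 'bucket' and 'name' are assumptions made for illustration.

```python
def on_file_uploaded(event, context):
    """Hypothetical function run when a file is uploaded to a storage bucket.

    event   -- dict describing the uploaded object (assumed keys: 'bucket', 'name')
    context -- event metadata supplied by the platform (unused here)
    """
    message = f"File {event['name']} was uploaded to bucket {event['bucket']}"
    print(message)
    return message

# Local call with a hand-made event; no cloud project is needed for this.
on_file_uploaded({'bucket': 'demo-bucket', 'name': 'report.csv'}, None)
```

Because the function is plain Python, it can be exercised locally with a hand-made event before it is deployed.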
Serverless
Cloud Functions removes the work of managing servers, configuring software, updating frameworks, and
patching operating systems. The software and infrastructure are fully managed by Google so that you just add
code. Furthermore, provisioning of resources happens automatically in response to events. This means that a
function can scale from a few invocations a day to many millions of invocations without any work from you.
Use cases
Asynchronous workloads like lightweight ETL, or cloud automations such as triggering application builds
now no longer need their own server and a developer to wire it up. You simply deploy a function bound to the
event you want and you're done.
The fine-grained, on-demand nature of Cloud Functions also makes it a perfect candidate for lightweight
APIs and webhooks. In addition, the automatic provisioning of HTTP endpoints when you deploy an HTTP
function means there is no complicated configuration required as there is with some other services. See the
following table for additional common Cloud Functions use cases:
Data processing / ETL: Listen and respond to Cloud Storage events such as when a file is created, changed, or
removed. Process images, perform video transcoding, validate and transform data, and invoke any service on
the internet from your Cloud Functions.
Webhooks: Via a simple HTTP trigger, respond to events originating from 3rd party systems like GitHub,
Slack, Stripe, or from anywhere that can send HTTP requests.
Lightweight APIs: Compose applications from lightweight, loosely coupled bits of logic that are quick to
build and that scale instantly. Your functions can be event-driven or invoked directly over HTTP/S.
Mobile backend: Use Google's mobile platform for app developers, Firebase, and write your mobile backend in
Cloud Functions. Listen and respond to events from Firebase Analytics, Realtime Database, Authentication,
and Storage.
MODULES
Statistics
This core module focuses on the statistical techniques that are expected of a data analyst, and provides
students with the opportunity to apply these techniques using the industry standard software SAS. This
module lays fundamental core knowledge that will be built upon in other modules in the programme, to
enable students to effectively recognise and use statistics in problem solving.
Business Intelligence Systems Concepts and Method
The module introduces Business Intelligence (BI) systems, which other modules can draw upon when
studying a more detailed BI system component or the development of such systems. Specifically, it
introduces students to the Business Intelligence (BI) system concept and its application within organisations.
The historical and current relationships between BI systems and other types of computer-based Information
Systems (IS), such as decision and management support systems, data warehouses and artificial intelligence
systems, are discussed, in addition to assessing the reasons why and how organisations utilise BI systems, and
their overall architecture and expected vs. actual impact/effect.
Research Methods
This module provides grounding in the research methods required at MSc level, looking at both quantitative
and qualitative approaches including laboratory evaluation, surveys, case studies and action research.
Example research studies from appropriate areas are analysed to obtain an understanding of types of research
problems and applicable research methods.
This module covers the design of data warehouses and how an On-Line Analytical Processing (OLAP) tool
can provide access to data within a data warehouse. It builds on the student’s prior knowledge of Relational
Databases and Relational Database Management Systems (DBMS) to consider the data requirements,
underpinned by an appropriate technical infrastructure, for a data warehouse in response to a particular
business situation.
The module builds on the BI systems knowledge already gained by students from previous programme
modules, concentrating on the predictive nature of such applications and on their development. Real life case
studies will be used to illustrate a range of BI applications (such as demand forecasting, fraud detection, risk
analysis, simulation and optimisation). The models used to carry out the processing within the system will be
introduced so that students gain an understanding of the underlying (often) mathematically-based model.
Big Data analytics is the process of collecting, storing and accessing large volumes of unstructured
heterogeneous data in order to uncover useful patterns, trends and correlations. Big Data differs from
the traditional view of a dataset by the so-called big V's (Volume, Variety, Velocity and Veracity), where
modern computing systems allow businesses, governments and scientists to gather a vast array of
unstructured data rapidly. Processing such data has provided its own considerable challenges, leading to a
wide spread of new technologies that are constantly changing and improving. This module will introduce
students to state-of-the-art approaches to Big Data problems. It will utilise the Hadoop Distributed File
System (HDFS) and Apache Spark to demonstrate data mining and machine learning algorithms for
knowledge discovery and for presenting the newly acquired information in meaningful ways. Parallel
computing in the cloud will be a key aspect incorporated throughout.
Data Mining Techniques and Applications
Data mining is a collection of tools, methods and statistical techniques for exploring and modelling
relationships in large amounts of data, to enable meaningful information to be extracted for decision making
purposes. The aim of this module is to review the data mining methods and techniques available for
uncovering important information from large data sets and to know when and how to use a particular
technique effectively. The module will enable the student to develop an in-depth knowledge of applying data
mining methods and techniques and interpreting the statistical results in relevant problem domains. This is a
practical module, where the emphasis is on students gaining practical experience of using the data mining
software, SAS Enterprise Miner, to build sensible models and then for the students to apply their knowledge
to interpret the statistical results, to make informed decisions.
Analytics Programming
Decision making requires appropriate and representative information and data to be collected and analysed.
Typically, more effective decisions can be made using large rather than small amounts of data. It is virtually
impossible to perform even the most basic statistical techniques by hand. Instead, data can be entered and
analysed using a computer software package. One such statistical software package is SAS. This is a very
comprehensive package which combines data entry and manipulation capabilities with report production,
graphical display and statistical analysis facilities. This module provides students with the opportunities to
explore the SAS software package and its capabilities. As well as covering how SAS procedures are used to
summarise and display data for inclusion in reports, the module introduces the application of SAS
programming to basic statistical analyses, much of which will be made use of in the IMAT 5238 Data Mining
module. Case studies will be used to illustrate how datasets from external sources are imported into SAS and
how these datasets can be combined together and how new variables can be created.
PYTHON PACKAGES
A Python package is a collection of modules. Modules that are related to each other are usually put in the
same package. When a module from an external package is required in a program, that package can be
imported and its modules can be put to use.
Any Python file is a module; the module's name is the file name without the .py extension.
A package is a directory of Python modules that contains an additional __init__.py file, which distinguishes a
package from a directory that merely contains multiple Python scripts. Packages can be nested to
multiple depths if each corresponding directory contains its own __init__.py file.
When you import a module or a package, the object created by Python is always of type module.
When you import a package, only the names defined in the __init__.py file of that package (functions,
classes, variables and anything it imports) are directly visible.
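The points above can be tried out with a throwaway package built in a temporary directory. This is only a sketch; the package name shapes and the function square_area are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a small package on disk; all names here are hypothetical.
root = Path(tempfile.mkdtemp())
pkg = root / "shapes"
pkg.mkdir()
(pkg / "__init__.py").write_text("from .area import square_area\n")
(pkg / "area.py").write_text("def square_area(side):\n    return side * side\n")

sys.path.insert(0, str(root))        # make the parent directory importable
shapes = importlib.import_module("shapes")

print(type(shapes))                  # <class 'module'>
print(shapes.square_area(4))         # visible because __init__.py imports it
```

The second print works only because __init__.py re-exports square_area; a name that lives in area.py but is not imported there would have to be reached as shapes.area.name.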
Why is so much emphasis placed on something as simple as file handling?
Let's take an example. Suppose you want your Python script to fetch data from the internet and
then process that data. If the data is small, this processing can be done every time you run the script, but
for very large data sets repeating the processing on every run is not practical, so the processed data needs to be
stored. This is where data storage, or writing to a file, comes in. One thing to note while writing data to a file is
that its consistency and integrity should be maintained.
Once you have stored your data in a file, the most important thing is its retrieval, because in a computer
it is stored as bits of 1s and 0s; if it is not retrieved properly it becomes completely useless and
the data is said to be corrupted.
Hence reading, as well as writing, is an important aspect of file handling in Python.
Let's take an example to understand the standard steps used during file handling in Python.
Consider a book you want to write in. First, you need to open that book so that you can write in it.
The same goes here: first, you need to open a file so that you can write to it. To open a file in Python we use the
open() function, called as open(file_name, mode).
The open function returns an object representing the file that you opened to work on. It takes two primary arguments,
file_name and mode. There are four different modes you can open a file in:
For example:
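A minimal sketch of open() with different modes. The four modes are assumed here to be read ("r"), write ("w"), append ("a") and read/write ("r+"); the file name is hypothetical and the file is created in a temporary directory.

```python
import os
import tempfile

# General form: file_object = open(file_name, mode)
path = os.path.join(tempfile.mkdtemp(), "example.txt")   # hypothetical file

fp = open(path, "w")     # "w": write; creates the file or empties an existing one
fp.write("first line\n")
fp.close()

fp = open(path, "a")     # "a": append to the end of the file
fp.write("second line\n")
fp.close()

fp = open(path, "r")     # "r": read only (the default mode)
print(fp.read())
fp.close()

fp = open(path, "r+")    # "r+": read and write without emptying the file
fp.write("FIRST")        # overwrites the first five characters
fp.seek(0)               # go back to the start before reading
content = fp.read()
fp.close()
print(content)
```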
To write to a file, you must first open it in write mode, and then you can write to it. However, it is important to
note that all previously written data will be overwritten.
For this example let's create a file named edureka.txt and write to it using Python.
fp = open("edureka.txt", "wt")
for _ in range(10):
    fp.write("Edureka is a platform for developing market-based skills\n")
fp.close()
As you can see, to write to a file I first opened a file named edureka.txt and saved the file object in the variable
fp. I then ran a loop 10 times to write "Edureka is a platform for developing market-based skills" to that file
10 times. As good programming practice, you must close every file that you open.
One thing to note here is that to write text to a file, you must open it in text mode ("t"), which is the default. If you
are working with binary files, use "b" when opening the file.
Now let us write to a binary file. The first thing to remember while writing to a binary file is that the data must be
converted into binary format before writing. Moreover, binary data is not human-readable, so you cannot
read it by simply opening the file in a text editor.
fp = open("binaryFile", "wb")
Data = [1,2,3]
fp.write(bytearray(Data))
fp.close()
Here you can see that I first opened binaryFile to write my data into it. I have a list of
values to write to the file (in this case Data), so I first converted it into binary data using the function
bytearray(). Then, at last, I closed the file.
Appending to a File
Most of the time you will want to write to a file without destroying its earlier contents. Writing to a file
while preserving the previous content is called appending to a file.
For this example let's append to the file that we already created, edureka.txt.
fp = open("edureka.txt", "a")
for _ in range(5):
    fp.write("I am appending something to it!\n")
fp.close()
In the above example, you can see that I have opened the file named edureka.txt in append mode. This
tells Python not to overwrite the existing data but to start writing from the end of the file. So after
the existing lines it adds "I am appending something to it!" 5 times. Then we close the file.
Closing a File
I have already shown how to close a file. Just call file_reference.close() to close an opened
file.
For example:
fp = open("edureka.txt", "r")
# Do some work!
fp.close()
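As an alternative to calling close() by hand, the with statement closes the file automatically when the block ends. A minimal sketch, using a hypothetical file in a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")   # hypothetical file

# The with statement closes the file automatically, even if an error occurs.
with open(path, "w") as fp:
    fp.write("closed automatically\n")

print(fp.closed)          # True: the file was closed on leaving the block

with open(path) as fp:    # the mode defaults to "r"
    text = fp.read()
print(text)
```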
We need to look at five main classes in the datetime module, which we will need depending on the
work we want to do. After that, we will discuss some examples explaining the tasks of the classes. The
classes are as follows -
1. datetime.date : It allows us to manipulate dates independently of time (year, month, day).
2. datetime.time : It allows us to manipulate times independently of date (hour, minute, second,
microsecond).
3. datetime.datetime : It allows us to manipulate the combination of date and time (year, month, day,
hour, minute, second, microsecond).
4. datetime.tzinfo : An abstract class for dealing with time zones, for instance to account for different
time zones and/or daylight saving time. Objects of these types are immutable.
5. datetime.timedelta : It is the difference between two date, time or datetime instances; the resolution
ca
DATETIME OPERATIONS
Since datetime enables us to deal with both the date and the time, let's first see how this object
behaves. datetime is both a module and a class within that module. We will import the datetime class from
the datetime module and print the current date and time to take a closer look. We can do this using
datetime's .now() method. We'll print our datetime object.
from datetime import datetime
today = datetime.now()
print('Type :- ',type(today))
From the above example we can understand how date and time objects work: date works only
with dates, excluding the time, and time works only with times, excluding the date.
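The classes listed earlier can be sketched together as follows. The dates and times are fixed, made-up values so that the results are repeatable.

```python
from datetime import date, time, datetime, timedelta, timezone

d = date(2024, 3, 15)                     # year, month, day only
t = time(14, 30, 5)                       # hour, minute, second only
dt = datetime.combine(d, t)               # date and time together
print(d, t, dt)

later = dt + timedelta(days=1, hours=2)   # timedelta represents a duration
print(later)                              # 2024-03-16 16:30:05
print(later - dt)                         # 1 day, 2:00:00

aware = dt.replace(tzinfo=timezone.utc)   # timezone is a concrete tzinfo subclass
print(aware.isoformat())                  # 2024-03-15T14:30:05+00:00
print(dt.strftime("%d/%m/%Y %H:%M"))      # 15/03/2024 14:30
```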
dt_nw = datetime.now()
print(dt_nw.date(), dt_nw.time())  # the date part and the time part of the same moment
From datetime we can also get the day of the week as a number using its .weekday() method, where Monday is 0
and Sunday is 6. We can convert that number to a text format (i.e. Monday, Tuesday, Wednesday...) using the
calendar module's day_name attribute, which is indexed by the weekday number.
First, we will import the calendar module, then find out the month and the year, and then do the
above-mentioned operations.
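import calendar
from datetime import datetime
my_date = datetime.now()
print('Month: ', my_date.month)   # to get the month from the date
print('Year: ', my_date.year)     # to get the year from the date
print('Day of week: ', calendar.day_name[my_date.weekday()])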
A Python class is a logical entity that serves as a prototype or template for creating
objects. Python classes provide methods that can be used to modify an object's state. They also specify the
attributes that an object may possess.
A class variable is shared by all instances of the class. Class variables are defined when the class is
created, in the class body, and not inside any of the class's methods.
Variables and functions are defined inside the class and are accessed using objects. These variables and
functions are collectively known as attributes.
Let’s take an example to understand the concept of Python classes and objects. We can think of an object as a
regular day-to-day object, say, a car. Now as discussed above, we know that a class has the data and functions
defined inside of it, and all this data and functions can be considered as features and actions of the object,
respectively. That is, the features (data) of the car (object) are color, price, number of doors, etc. The actions
(functions) of the car (object) are speed, application of brakes, etc. Multiple objects with different data and
functions associated with them can be created using a class as depicted by the following diagram.
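The car analogy can be sketched as a small class. The attribute and method names below are hypothetical, chosen to match the features and actions mentioned above.

```python
class Car:
    wheels = 4                      # class variable: shared by every Car object

    def __init__(self, color, price, doors):
        self.color = color          # instance variables: one set per object
        self.price = price
        self.doors = doors
        self.speed = 0

    def accelerate(self, amount):   # an action that changes the object's state
        self.speed += amount

    def brake(self):
        self.speed = 0

car1 = Car('red', 20000, 4)
car2 = Car('blue', 15000, 2)
car1.accelerate(60)
print(car1.color, car1.speed, car1.wheels)
print(car2.color, car2.speed, car2.wheels)
```

Both objects come from the same class, but each keeps its own color, price, doors and speed, while wheels is shared.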
Classes provide an easy way of keeping the data members and methods together in one place which
helps in keeping the program more organized.
Using classes also provides another functionality of this object-oriented programming paradigm, that
is, inheritance.
Classes also help in overriding any standard operator.
Using classes provides the ability to reuse the code which makes the program more efficient.
Grouping related functions and keeping them in one place (inside a class) provides a clean structure to
the code which increases the readability of the program.
Creating a Python Class
Just as a function in Python is defined using the def keyword, a class in Python is defined using the class
keyword, followed by the class name.
As with functions, we use docstrings in classes as well. Although the use of docstrings is not
mandatory, it is recommended, as it is good practice to include a brief description of
the class to increase the readability and understandability of the code.
class IntellipaatClass:
    """Class statements and methods here"""
The class statement creates a local namespace for all the attributes, including the special attributes
that start with double underscores (__), for example, __init__ and __doc__. As soon as the class is created,
a class object is also created, which is used to access the attributes in the class. Let us understand this with the
help of a Python class example.
class IntellipaatClass:
    a = 5
    def function1(self):
        print('Welcome to Intellipaat')
# accessing attributes using the class object of the same name
IntellipaatClass.function1(1)   # called on the class, so a value for self is passed explicitly
print(IntellipaatClass.a)
Output:
Welcome to Intellipaat
5
We saw in the previous topic that the class object of the same name as the class is used to access attributes.
That is not all the class object is used for; it can also be used to create new objects, and then those objects can
be used to access the attributes as shown in the following example:
class IntellipaatClass:
    a = 5
    def function1(self):
        print('Welcome to Intellipaat')
# creating a new object named object1 using the class object
object1 = IntellipaatClass()
object1.function1()
Output:
Welcome to Intellipaat
Notice that we use a parameter named self while defining the function in the class, but we
do not pass any value for it when calling the function. That is because, when a function is called using an
object, the object itself is passed automatically to the function as the first argument, so object1.function1() is
equivalent to IntellipaatClass.function1(object1). That is why the very first parameter of the function must be the
object itself, which is conventionally called 'self'. It can be named something else too, but naming it 'self' is
a convention, and it is considered good practice to follow it.
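The automatic passing of self can be checked with a small sketch; the class Greeter and its method are hypothetical.

```python
class Greeter:                       # hypothetical class for illustration
    def __init__(self, name):
        self.name = name

    def greet(self):
        return 'Hello, ' + self.name

g = Greeter('Asha')
print(g.greet())                     # the object is passed as self automatically
print(Greeter.greet(g))              # the same call with self passed explicitly
```

Both calls return the same string, which shows that the method call on the object is shorthand for the call on the class.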
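An attribute can be modified by assigning a new value to it, for example:
Person.age = 20
An object can be deleted using the del keyword. For example,
del objectName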
There are various types of classes in Python, some of which are as follows:
An abstract class is a class that contains one or more abstract methods. The term "abstract method" refers to a
method that has a declaration but no implementation. When working with a large codebase, it might be
difficult to remember all classes; an abstract class helps by defining a common interface that its subclasses
must implement. Python, unlike most high-level languages, does not have abstract classes built into the
language syntax; they are provided by the abc module.
A class can be defined as an abstract class by inheriting from abc.ABC, and a method can be defined as an abstract
method using the abc.abstractmethod decorator. ABC is the abbreviation for abstract base class. The abc module,
which provides the foundation for building abstract base classes, must be imported. It works by decorating
base class methods as abstract; concrete subclasses then act as the implementations of the abstract base.
Example:
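A minimal sketch of an abstract base class and one concrete subclass; the class names Shape and Square are hypothetical.

```python
from abc import ABC, abstractmethod

class Shape(ABC):                    # hypothetical abstract base class
    @abstractmethod
    def area(self):
        """Subclasses must provide an implementation."""

    def describe(self):              # concrete method, inherited as-is
        return f'{type(self).__name__} with area {self.area()}'

class Square(Shape):                 # concrete class: implements area()
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

print(Square(3).describe())          # Square with area 9

try:
    Shape()                          # an abstract class cannot be instantiated
except TypeError as err:
    print('TypeError:', err)
```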
Concrete classes have only concrete methods, but abstract classes can have both concrete methods and
abstract methods. The concrete class implements the abstract methods, but the abstract base class can also
provide an implementation, which subclasses invoke through super().
The partial class is another useful Python class. We can use it to derive a new function with some of the
positional arguments and keyword arguments of an existing function already filled in. In other words, partial
freezes a portion of a function's arguments and/or keywords, resulting in a new callable object. The partial
class is provided by the functools module.
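A minimal sketch of functools.partial; the function power is a hypothetical example.

```python
from functools import partial

def power(base, exponent):
    return base ** exponent

square = partial(power, exponent=2)   # freeze the keyword argument exponent
two_to_the = partial(power, 2)        # freeze the first positional argument

print(square(7))        # 49
print(two_to_the(10))   # 1024
print(square.func is power, square.keywords)
```

The partial object remembers the original function in its func attribute and the frozen keyword arguments in keywords.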
AWS is a cloud platform service from Amazon, used to create and deploy any type of
application in the cloud.
AWS offers compute power, data storage, and a wide array
of other IT solutions and utilities for modern organizations. AWS was launched in 2006, and has
since become one of the most popular cloud platforms available.
We need an AWS account to use AWS services. AWS offers many services
for compute, storage, networking, analytics, application services, deployment, identity and
access management, directory services, security and more.
We can use Boto3, a Python package that provides interfaces to Amazon Web Services and
makes it easy to integrate a Python application, library, or script with AWS services.
Boto3 is the Amazon Web Services (AWS) Software Development Kit (SDK) for
Python, which allows Python developers to write software that makes use of services like
Amazon S3 (Simple Storage Service) and Amazon EC2 (Elastic Compute Cloud).
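A minimal sketch of using Boto3 to list S3 buckets. It assumes that the boto3 package is installed and that AWS credentials are already configured; the helper name list_bucket_names is hypothetical. The client is passed in as a parameter so that the function can also be tried with a stand-in object when no AWS account is available.

```python
def list_bucket_names(s3_client=None):
    """Return the names of the S3 buckets visible to the caller (hypothetical helper)."""
    if s3_client is None:
        import boto3                        # third-party package, assumed installed
        s3_client = boto3.client("s3")      # uses the configured AWS credentials
    response = s3_client.list_buckets()     # response contains a "Buckets" list
    return [bucket["Name"] for bucket in response["Buckets"]]


class FakeS3Client:
    """Stand-in for a real client so the sketch runs without AWS access."""
    def list_buckets(self):
        return {"Buckets": [{"Name": "demo-logs"}, {"Name": "demo-backups"}]}


print(list_bucket_names(FakeS3Client()))
```

With real credentials, calling list_bucket_names() with no argument would query the account's actual buckets.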
AWS Lambda
AWS Lambda is a compute service that manages the underlying servers automatically. AWS
Lambda executes our code only when needed and scales automatically, from a few requests per
day to thousands per second.
We pay only for the compute time we consume, and there is no charge when the code
is not running.
The initial purpose of Lambda is to simplify building on-demand applications that are
responsive to events. AWS starts a Lambda instance within milliseconds of an event.
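A minimal sketch of a Python Lambda handler. Lambda calls the handler with an event and a context object; the function name and the 'name' field of the event are assumptions made for illustration.

```python
import json

def lambda_handler(event, context):
    """Hypothetical handler: builds a small JSON response from the incoming event."""
    name = event.get('name', 'world')      # 'name' is an assumed field of the event
    return {
        'statusCode': 200,
        'body': json.dumps({'message': f'Hello, {name}!'}),
    }

# Local call with a hand-made event; no AWS account is needed for this.
print(lambda_handler({'name': 'cloud'}, None))
```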
There are various cloud service providers in the market, so
what makes Google Cloud Platform different? The following are some of the
major reasons given for choosing it:
Pricing: GCP is positioned as a pricing leader, with highly flexible pricing.
Scalability: Scaling down can be an issue with cloud services. GCP lets you
scale up and down with ease.
Custom Machines: With Custom Machine Types you can create a machine type
customized to your needs, with discounts of up to 50%.
Integrations: Easily use various APIs, Internet of Things services and Cloud Artificial
Intelligence.
Big Data Analytics: Use BigQuery and Big Data Analytics to carry out a wide range of
analytical tasks.
Serverless: Serverless is a new paradigm of computing that abstracts away the
complexity associated with managing servers for mobile and API back-ends, ETL, data
processing jobs, databases, and more.
What Is Google Cloud Platform?
You can think of it as a collection of cloud services offered by Google. The platform hosts a
wide range of services, including
Compute
Storage
Application development
Who can access these services? They can be accessed by developers, cloud
administrators and other enterprise IT professionals, either through the public internet
or through a dedicated network connection.
Next, here are some of the core functionalities and services of GCP:
Google Compute Engine: Google Compute Engine lets you run virtual machines (VMs) in
Google's data centers and worldwide fiber network. It lets you scale from
single instances to global, load-balanced cloud computing.
App Engine: This PaaS offering lets developers access Google's scalable hosting.
Developers can also use SDKs to develop software products that run
on App Engine.
Cloud Storage: The Google Cloud Storage platform enables you to store large, unstructured
data sets. Google also offers database storage options such as Cloud Datastore
for NoSQL non-relational storage, Cloud SQL for MySQL fully relational storage, and
Google's native Cloud Bigtable database.
Google Container Engine: It is a management and orchestration system
for Docker containers that runs within Google's public cloud. Google Container Engine
is based on the Google Kubernetes container orchestration engine.
Azure:
This article introduces you to two Python development tools. One is the IPython
Notebook, which can be deployed to Linux and Windows virtual machines (VMs) on Microsoft Azure.
The other is Python Tools for Visual Studio (PTVS).
Tools for Developing Python Applications on Microsoft Azure
This section gives you an overview of the IPython Notebook and PTVS development tools.
Using IPython Notebooks
The IPython project provides a collection of tools for scientific
computing that include interactive shells, high-performance and easy to use parallel
libraries, and a web-based development environment called the IPython Notebook.
The notebook provides a working environment for interactive computing that combines
code execution with the creation of a live computational document. These notebook files can
contain text, mathematical formulas, input code, results, graphics, videos and any other kind of
media that a modern web browser is capable of displaying.
Using PTVS
If you're a Windows user you can take advantage of PTVS. This is a free download and
you don't have to buy Microsoft Visual Studio. Instead, you can use PTVS combined with the
integrated/isolated Visual Studio shell, which is also free.
PTVS has many features. It supports CPython, IronPython, editing, browsing,
IntelliSense (autocompletion), mixed Python/C++ debugging, remote Linux/MacOS debugging,
profiling, HPC clusters, multiple REPLs, IPython, and Django.
Feature
Storage: Blob, File, Table, Queue
– see the Azure storage Git repository or readthedocs for a complete list of supported features.
Service Bus
– Queues: create, list and delete queues; create, list, and delete subscriptions; send, receive,
unlock and delete messages
– Topics: create, list, and delete topics; create, list, and delete rules
Azure Active Directory Graph RBAC API
– Users
– Applications
Resource Management
– Authorization: permissions, subscriptions, roles and more
– Web Apps: App Service Plan, web sites, certificate, domains and more
– Network: create virtual networks, network interfaces, public IPs and more
Programming Model
As a programming framework, Mrs controls the execution flow and is invoked by a call
to mrs.main. The execution of Mrs depends on the command-line options and the specified
program class. In its simplest form, a program class has an __init__ method which takes the
arguments opts and args from command-line parsing and a run method that takes a job argument.
In practice, most program classes inherit from mrs.MapReduce, which provides a variety of
reasonable but overridable defaults including __init__ and run methods that are sufficient for
many simple programs. The simplest MapReduce program need only implement a map and a
reduce method.
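The idea of supplying only a map and a reduce method can be sketched in plain Python. This sketch does not use Mrs itself: run_mapreduce is a hypothetical, sequential stand-in for the framework, and all names are chosen for illustration.

```python
from collections import defaultdict

def map_words(key, value):
    """Map step: emit (word, 1) for every word in one line of text."""
    for word in value.split():
        yield word.lower(), 1

def reduce_counts(key, values):
    """Reduce step: combine all the counts emitted for one word."""
    yield sum(values)

def run_mapreduce(records, mapper, reducer):
    """Sequential stand-in for the framework: map, group by key, then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    return {k: list(reducer(k, v)) for k, v in sorted(groups.items())}

lines = list(enumerate(['the cloud', 'the map the reduce']))   # (line number, text)
print(run_mapreduce(lines, map_words, reduce_counts))
```

In a real framework the grouping step runs across many machines, but the user-written map and reduce functions keep this same shape.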
Architecture
Mrs owes much of its efficiency to its simple design. Many choices are driven by concerns such
as simplicity and ease of maintainability.
For example, Mrs uses XML-RPC because it is included in the Python standard library, even
though other protocols are more efficient. Profiling has helped to identify
real bottlenecks and to avoid worrying about hypothetical ones.
We include a few details about the architecture of Mrs. Communication between the master
and a slave occurs over a simple HTTP-based remote procedure call API using XML-RPC.
Intermediate data between slaves uses either direct communication for high performance or
storage on a filesystem for increased fault-tolerance. Mrs can read and write to any
filesystem supported by the Linux kernel or FUSE, including NFS, Lustre, and the Hadoop
Distributed File System (HDFS), and native support for WebHDFS is in progress.
logic of the application in your own code. This "plugging in" aspect of Web
development is often seen as being in opposition to the classical distinction between programs
and libraries, and the notion of a "mainloop" dispatching events to application code is very
similar to that found in GUI programming.
Popular Full-Stack Frameworks
A web application may use a combination of a base HTTP application server, a storage
mechanism such as a database, a template engine, a request dispatcher, an authentication module
and an AJAX toolkit. These can be individual components or be provided together in a high-
level framework.
These are the most popular high-level frameworks. Many of them include components
listed on the WebComponents page.
[Table of frameworks; columns: Name, Latest version, Latest update date, Description]
See below for some other arguably less popular full-stack frameworks
But what is an API exactly? An API is just a fancy term for describing a way for programs (or
websites) to exchange data in a format that is easily interpreted by a machine. This is in contrast to
regular websites, which are exchanging data in a format that is easily interpreted by a human. For a
website, we might use HTML and CSS, but for an API we would use JSON or XML.
In this article I will be focusing on designing an API using the RESTful paradigm. REST
is basically a list of design rules that makes sure that an API is predictable and easy to
understand and use.
Some of these rules include:
Stateless design: Data will never be stored in a session (each request includes all
information needed by the server and client).
Self-descriptive messages: Ideally you should be able to understand requests and
responses after spending minimal time reading the documentation.
Semantics, semantics, semantics: The API should use existing features of the HTTP
protocol to make its input and output more meaningful (e.g. HTTP Verbs, HTTP Status
Codes and HTTP Authentication).
One of the most difficult decisions to make when designing a RESTful API is finding the fine
line between making your API simple enough that it can be tested directly in the browser for
most use cases and adhering to the design guidelines as closely as possible.
Output formats
First, let's talk about output formats. The most important thing to look at when
determining what format your API should output data in is what users of your API would be
using the data for and with.
Maybe you need to support legacy systems where JSON parsing is not feasible and XML
is more desirable, or maybe it makes more sense for you to output data in the CSV format for
easy import into spreadsheet applications. Whichever you choose, it's important to think about
your users and their use cases.
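As a rough illustration of supporting more than one output format, the sketch below serializes the same records as JSON or CSV using only the Python standard library. The STORES records and the render function are hypothetical names chosen for this example, not part of any particular framework.

```python
import csv
import io
import json

# Hypothetical records that an API endpoint might return.
STORES = [
    {"id": 1, "name": "Downtown", "city": "Austin"},
    {"id": 2, "name": "Airport", "city": "Denver"},
]

def render(records, fmt="json"):
    """Serialize the same (non-empty) list of records as JSON or CSV."""
    if fmt == "json":
        return json.dumps(records)
    if fmt == "csv":
        buffer = io.StringIO()
        # Column names are taken from the keys of the first record.
        writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")

print(render(STORES, "json"))
print(render(STORES, "csv"))
```

Keeping the serialization step separate from the data makes it easier to add another format later if your users need one.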
URL Structure
The URL structure is one of the most important pieces of the puzzle. Spending some time to
define the right endpoint names can make your API much easier to understand and also help
make the API more predictable. URLs should be short and descriptive and utilize the
natural hierarchy of the path structure.
It's also important to be consistent with pluralization.
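A minimal sketch of what a short, hierarchical and consistently pluralized URL scheme could look like is given here. The "stores" and "products" resources and the handler names are assumptions made for illustration; a real application would normally use the routing of its web framework instead of this hand-written table.

```python
import re

# Hypothetical route table: plural nouns throughout, nested by hierarchy.
ROUTES = [
    ("GET", r"^/stores$", "list_stores"),
    ("GET", r"^/stores/(\d+)$", "get_store"),
    ("GET", r"^/stores/(\d+)/products$", "list_store_products"),
]

def resolve(method, path):
    """Return (handler name, captured ids) for a request, or None if no route matches."""
    for route_method, pattern, name in ROUTES:
        match = re.match(pattern, path)
        if route_method == method and match:
            return name, match.groups()
    return None

print(resolve("GET", "/stores/42/products"))
```

Because every collection is plural, a client that knows /stores can guess /stores/42 and /stores/42/products without reading the documentation.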
Authentication
There are many ways to handle authentication; here are three ideas.
If your API has user-based authentication, OAuth is really the way to go. It might look terribly
confusing at first glance, but give it a chance. It's the only widely-used, proven, and secure
solution out there; it's being used by all major API providers and is very well tested.
If you just need to password protect your API, HTTP Basic Authentication is a nice and
semantic way to go.
The only issue is if you want to provide an API key or other sorts of single-string
authentication. Some people suggest using HTTP Basic Authentication with the API Key as
both username and password or just leaving the password blank, but really that's a messy way
of doing it if you ask me.
In that case I would recommend simply appending the API key to the query string, e.g.
api.geocod.io/v1/geocode?q=42370 Bob Hope Drive, Rancho Mirage
CA&api_key=YOUR_API_KEY
Don't bother adding the API Key as a header or another fancy solution like that. By adding
it as a query string parameter, the user will be able to test your API directly in a regular
browser without any bells and whistles.
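The query-string approach can be sketched with the standard urllib.parse module. The https:// scheme and the helper names build_url and extract_api_key are assumptions for this example; the host, path and parameter names follow the URL shown above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def build_url(base, address, api_key):
    """Client side: append the search text and the API key as query parameters."""
    query = urlencode({"q": address, "api_key": api_key})
    return f"{base}?{query}"

def extract_api_key(url):
    """Server side: read the api_key parameter, or return None if it is missing."""
    params = parse_qs(urlsplit(url).query)
    values = params.get("api_key")
    return values[0] if values else None

url = build_url(
    "https://api.geocod.io/v1/geocode",
    "42370 Bob Hope Drive, Rancho Mirage CA",
    "YOUR_API_KEY",
)
print(url)
print(extract_api_key(url))
```

Note that urlencode percent-encodes spaces and commas, so the address survives the round trip unchanged.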
Timestamps
One of the common pitfalls of API design is failing to get timestamps right from
the beginning.
Don't use Unix Timestamps as they don't have timezone support and are not human
readable (among a lot of other reasons) -- Facebook learned this the hard way.
A widely-accepted standard for timestamps is ISO-8601. It is easy to read, easy to parse
and it has great timezone support.
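As a small sketch of ISO-8601 output, the standard datetime module can produce and parse such timestamps. The to_iso8601 helper and the sample date are assumptions for illustration; rejecting naive datetimes is a design choice made here so that every timestamp carries its offset.

```python
from datetime import datetime, timedelta, timezone

def to_iso8601(moment):
    """Format a timezone-aware datetime as an ISO-8601 string."""
    if moment.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return moment.isoformat()

# The same instant expressed in UTC and in a UTC-8 offset.
utc_time = datetime(2014, 3, 1, 12, 30, 0, tzinfo=timezone.utc)
local_time = utc_time.astimezone(timezone(timedelta(hours=-8)))

print(to_iso8601(utc_time))    # 2014-03-01T12:30:00+00:00
print(to_iso8601(local_time))  # 2014-03-01T04:30:00-08:00

# Parsing the string back yields the same instant.
assert datetime.fromisoformat(to_iso8601(local_time)) == utc_time
```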
Error handling
The last thing on our list is error handling. There are many different kinds of errors you need
to handle in your API, including permission errors (You are not allowed to delete this store),
validation errors (Please specify a name for the store), not found errors, or even internal
server errors.
You should always return a semantic HTTP status code with your responses. So for example,
no errors would be a 200 OK, permission errors could be 403 Forbidden, and validation
errors could be 422 Unprocessable Entity (http://httpstatus.es has a great human-readable list
of available status codes).
In addition to the status code, you should always return an error message if necessary with a
more detailed description of what happened. My favorite way of doing this is simply to
respond with an "errors" key that has the error message as the value.
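The status-code-plus-"errors"-key convention described above might be sketched as follows. The create_store and delete_store handlers and the shape of the store record are hypothetical; only the status codes and the "errors" key come from the text.

```python
import json

def error_response(status, message):
    """Return an (HTTP status code, JSON body) pair with the message under "errors"."""
    return status, json.dumps({"errors": message})

def create_store(name):
    """Hypothetical handler: reject a missing name with 422 Unprocessable Entity."""
    if not name:
        return error_response(422, "Please specify a name for the store")
    return 200, json.dumps({"name": name})

def delete_store(user, store):
    """Hypothetical handler: 404 if the store is missing, 403 if the user is not the owner."""
    if store is None:
        return error_response(404, "Store not found")
    if store["owner"] != user:
        return error_response(403, "You are not allowed to delete this store")
    return 200, json.dumps({"deleted": store["id"]})

print(create_store(""))
print(delete_store("bob", {"id": 7, "owner": "alice"}))
```

A client can branch on the status code alone and show the "errors" text to a human only when it needs to.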
QUESTIONS BANK
1MARKS
01. Cloud computing is a kind of abstraction that is based on the notion of combining physical resources
and representing them as _______ resources to users.
A. Virtual
B. Real
C. Cloud
D. none of the mentioned
Answer : A
09. How many types of security threshold values are actually present in the cloud cube model?
A. 1
B. 2
C. 3
D. None of the mentioned
Answer : B
10. Which of the following type of virtualization is also characteristic of cloud computing?
A. Storage
B. Application
C. CPU
D. All of the mentioned
Answer : D
11. What type of computing technology refers to services and applications that typically run on a
distributed network through virtualized resources?
A. Distributed Computing
B. Soft Computing
C. Cloud Computing
D. Parallel Computing
Answer : C
12. Which model consists of the particular types of services that you can access on a cloud computing
platform.
A. Service
B. Deployment
C. Application
D. None of the mentioned
Answer : A
13. Which of the following is a working model for cloud computing?
A. Deployment Models
B. Configuring Model
C. Collaborative Model
D. All of the above
Answer : A
16. Which of these should a company consider before implementing cloud computing technology?
A. Employee satisfaction
B. Potential cost reduction
C. Information sensitivity
D. All of the above
Answer : D
17. A larger cloud network can be built as either a layer 3 or layer 4 network.
A. True
B. False
Answer : B
18. Which one of the following refers to the non-functional requirements like disaster recovery, security,
reliability, etc.
A. Service Development
B. Quality of service
C. Plan Development
D. Technical Service
Answer : B
19. Which of the following is the best-known service model?
A. SaaS
B. IaaS
C. PaaS
D. All of the mentioned
Answer : D
20. Which cloud allows systems and services to be accessible by a group of organizations?
A. Private cloud
B. Public cloud
C. Community cloud
D. Hybrid cloud
Answer : C
5 MARKS
10 MARKS
Clustering is an essential data mining tool for analyzing big data. Applying clustering
techniques to big data is difficult because of the new challenges that big data raises.
Since Big Data refers to terabytes and petabytes of data, and clustering algorithms
come with high computational costs, the question is how to cope with this problem and
how to deploy clustering techniques to big data and get the results in a reasonable time.
This study aims to review the trend and progress of clustering algorithms in coping with
big data challenges, from the very first proposed algorithms to today's novel solutions. The
algorithms and the targeted challenges for producing improved clustering algorithms are
introduced and analyzed, and afterward the possible future path for more advanced
algorithms is outlined based on today's available technologies and frameworks.
Big data clustering techniques can be classified into two major categories:
o Parallel clustering
In this section, advancements of clustering algorithms for big data analysis in the categories
mentioned above will be reviewed.
The challenges of big data are rooted in its five important characteristics:
• Volume: An example is the unstructured data streaming in from social media, which raises
questions such as how to determine relevance within large data volumes and how to analyze
the relevant data to produce valuable information.
• Velocity: Data floods in at very high speed and has to be dealt with in reasonable time.
Responding quickly to data velocity is one of the challenges in big data.
• Variety: Another challenging issue is to manage, merge and govern data that comes from
different sources with different specifications, such as email, audio, unstructured data, social
data, video, etc.
• Variability: Inconsistency in data flow is another challenge. For example, social media can
have daily or seasonal peak data loads, which makes the data harder to handle and manage,
especially when it is unstructured.
• Complexity: Data comes from different sources and has different structures;
consequently it is necessary to connect and correlate relationships and data linkages, or the
data can quickly get out of control.
Traditional clustering techniques cannot cope with this huge amount of data because of
their high complexity and computational cost. For instance, the traditional K-means clustering
problem is NP-hard, even when the number of clusters is k=2. Consequently, scalability is the main
challenge for clustering big data. The main target is to scale up and speed up clustering
algorithms with minimum sacrifice to the clustering quality. Although the scalability and
speed of clustering algorithms have always been a target for researchers in this domain, big data
challenges underline these shortcomings and demand more attention and research on this topic.
A review of the literature on clustering techniques shows that the advancement of these techniques
can be classified in stages.
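To make the cost argument concrete, here is a minimal single-machine sketch of the usual K-means heuristic (Lloyd's algorithm), which only approximates the NP-hard optimum. It is an illustration under simplifying assumptions, not one of the big data algorithms surveyed: the first k points are used as initial centroids and the number of iterations is fixed. Every iteration compares each of the n points with each of the k centroids, which is the part that becomes expensive at terabyte scale.

```python
def kmeans(points, k, iterations=10):
    """Cluster points (tuples of numbers) into k groups with Lloyd's algorithm."""
    # Naive initialization (an assumption of this sketch): the first k points.
    centroids = [list(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, clusters

points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```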
Big data can be applied to real-time fraud detection, complex competitive analysis, call
center optimization, consumer sentiment analysis, intelligent traffic management, and to
manage smart power grids, to name only a few applications.
Big data is characterized by three primary factors: volume (too much data to handle easily);
velocity (the speed of data flowing in and out makes it difficult to analyze); and variety (the
range and type of data sources are too great to assimilate). With the right analytics, big data
can deliver richer insight since it draws from multiple sources and transactions to uncover
hidden patterns and relationships.
There are four types of big data BI that really aid business:
1. Prescriptive – This type of analysis reveals what actions should be taken. This is the most
valuable kind of analysis and usually results in rules and recommendations for next steps.
2. Predictive – An analysis of likely scenarios of what might happen. The deliverables are
usually a predictive forecast.
3. Diagnostic – A look at past performance to determine what happened and why. The result of
the analysis is often an analytic dashboard.
4. Descriptive – What is happening now based on incoming data. To mine the analytics, you
typically use a real-time dashboard and/or email reports.
Big Data Analytics in Action
Prescriptive analytics is really valuable, but largely not used. Where big data analytics in
general sheds light on a subject, prescriptive analytics gives you a laser-like focus to answer
specific questions. For example, in the health care industry, you can better manage the
patient population by using prescriptive analytics to measure the number of patients who are
clinically obese, then add filters for factors like diabetes and LDL cholesterol levels to
determine where to focus treatment. The same prescriptive model can be applied to almost
any industry target group or problem.
Predictive analytics uses big data to identify past patterns and predict the future. For example,
some companies are using predictive analytics for sales lead scoring. Some companies have
gone one step further and use predictive analytics for the entire sales process, analyzing lead
source, number of communications, types of communications, social media, documents,
CRM data, etc. Properly tuned predictive analytics can be used to support sales, marketing,
or other types of complex forecasts.
Diagnostic analytics are used for discovery or to determine why something happened. For
example, for a social media marketing campaign, you can use descriptive analytics to assess
the number of posts, mentions, followers, fans, page views, reviews, pins, etc. There can be
thousands of online mentions that can be distilled into a single view to see what worked in
your past campaigns and what didn‘t.
Descriptive analytics or data mining are at the bottom of the big data value chain, but they
can be valuable for uncovering patterns that offer insight. A simple example of descriptive
analytics would be assessing credit risk; using past financial performance to predict a
customer‘s likely financial performance. Descriptive analytics can be useful in the sales
cycle, for example, to categorize customers by their likely product preferences and sales
cycle.
As you can see, harnessing big data analytics can deliver big value to business, adding
context to data that tells a more complete story. By reducing complex data sets to actionable
intelligence you can make more accurate business decisions. If you understand how to
demystify big data for your customers, then your value has just gone up tenfold.
RECOMMENDATION SYSTEM
Recommendation systems have impacted or even redefined our lives in many ways. One
example of this impact is how our online shopping experience is being redefined. As we
browse through products, the Recommendation system offers recommendations of products
we might be interested in. Regardless of the perspective, business or consumer,
Recommendation systems have been immensely beneficial. And big data is the driving force
behind Recommendation systems.
A typical Recommendation system cannot do its job without sufficient data and big data
supplies plenty of user data such as past purchases, browsing history, and feedback for the
Recommendation systems to provide relevant and effective recommendations. In a
nutshell, even the most advanced Recommenders cannot be effective without big data.
Let us assume that a user of the Amazon website is browsing books and reading the details.
Each time the reader clicks on a link, an event such as an Ajax event could be fired. The
event type could vary depending on the technology used. The event could then make an
entry into a database, which is usually a NoSQL database. The entry is technical in content
but in layman's language could read something like "User A clicked Product Z details
once". That is how user details get captured and stored for future recommendations.
How does the Recommendation system capture the details? If the user has logged in, then
the details are extracted either from an HTTP session or from the system cookies. If the
Recommendation system depends on system cookies, then the data is available only as long as
the user keeps using the same terminal. Events are fired in almost every case, for example when a user
likes a Product or adds it to a cart and purchases it. So that is how user details are
stored. But that is just one part of what Recommenders do.
The following paragraphs show how Amazon offers its product recommendations to a
user who is browsing for books:
As shown by the image below, when a user searched for the book Harry Potter and the
Philosopher‘s Stone, several recommendations were given.
In another example, a customer who searched Amazon for Canon EOS 1200D 18MP
Digital SLR Camera (Black) was interestingly given several recommendations on camera
accessories.
Ratings
Ratings are important in the sense that they tell you what a user feels about a product.
A user's feelings about a product can be reflected to an extent in the actions he or she takes such as
liking, adding to the shopping cart, purchasing or just clicking. Recommendation systems can assign
implicit ratings based on user actions. The maximum rating is 5. For example, purchasing can be
assigned a rating of 4, likes can get 3, clicking can get 2 and so on. Recommendation systems
can also take into account ratings and feedback users provide.
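The implicit rating scheme can be sketched as a simple lookup. The three values come from the example above; the action names, and the choice to keep only the strongest action a user has taken on a product, are assumptions of this sketch.

```python
# Implicit ratings taken from the example in the text (maximum rating is 5).
IMPLICIT_RATINGS = {"purchase": 4, "like": 3, "click": 2}

def implicit_rating(actions):
    """Return the highest implicit rating among a user's actions on one product.

    Unknown actions count as 0; an empty action list also gives 0.
    """
    return max((IMPLICIT_RATINGS.get(action, 0) for action in actions), default=0)

print(implicit_rating(["click", "like"]))      # 3
print(implicit_rating(["click", "purchase"]))  # 4
```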
Filtering
Filtering means filtering products based on ratings and other user data. Recommendation
systems use three types of filtering: collaborative, user-based and a hybrid approach. In
collaborative filtering, users' choices are compared and recommendations are given accordingly. For
example, if user X likes products A, B, C, and D and user Y likes products A, B, C, D and E, then
it is likely that user X will be recommended product E because there are a lot of similarities
between users X and Y as far as choice of products is concerned.
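The X and Y example can be worked through in a few lines. This is a toy sketch of collaborative filtering, assuming Jaccard similarity between sets of liked products and recommending from the single most similar user; production recommenders use far richer models. User Z is an extra hypothetical user added to show that dissimilar users are ignored.

```python
def jaccard(a, b):
    """Similarity of two sets: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target, likes):
    """Recommend the products liked by the most similar user but not by the target."""
    others = [
        (jaccard(likes[target], items), user)
        for user, items in likes.items()
        if user != target
    ]
    _, nearest = max(others)
    return sorted(likes[nearest] - likes[target])

likes = {
    "X": {"A", "B", "C", "D"},
    "Y": {"A", "B", "C", "D", "E"},
    "Z": {"F", "G"},
}
print(recommend("X", likes))  # ['E']
```

Here X and Y share four of five distinct products (similarity 0.8), so Y's extra product E is what gets recommended to X.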
Several reputed brands such as Facebook, Twitter, LinkedIn, Amazon, Google News,
Spotify and Last.fm use this model to provide effective and relevant recommendations. In user-
based filtering, the user's browsing history, likes, purchases and ratings are taken into account
before providing recommendations. This model is used by many reputed brands such as IMDB,
Rotten Tomatoes and Pandora. Many companies also use a hybrid approach. Netflix is known to
use a hybrid approach.
Role of big data
How Amazon uses the powerful duo of big data and Recommendation System is worth a
study. Amazon has been in certain ways a pioneer of ecommerce but more important than that
accolade is how it is driving its revenue up by providing more and more effective
recommendations.
Buying can be both impulsive and planned and Amazon is smartly tapping into the
impulsive shopper's mind by providing relevant and useful product recommendations. For that, it
is relentlessly working on making its Recommendation engine more powerful. Shopping has a
connection with psychology. Shoppers buy for instant gratification, instant mood uplift, social
esteem and reasons not even known to them clearly.
MULTIMEDIA CLOUD
Media Cloud is an open-source content analysis tool that aims to map news media coverage
of current events. It "performs five basic functions -- media definition, crawling, text
extraction, word vectoring, and analysis."
Media cloud "tracks hundreds of newspapers and thousands of Web sites and blogs, and
archives the information in a searchable form.
The database ... enable[s] researchers to search for key people, places and events — from
Michael Jackson to the Iranian elections — and find out precisely when, where and how
frequently they are covered."
Media Cloud was developed by the Berkman Center for Internet & Society at Harvard
University and launched in March 2009.
LIVE VIDEO STREAM APP
Video recording technology has been available for decades. People record videos, create
movies, and publish them online so that the videos and movies can be shared with their online
groups or even with the public. With the innovation of mobile technology, users use mobile devices to
download videos from online sources, such as YouTube, Vimeo, LiveTV, and PPStream.
Many mobile apps have been constructed to enable mobile users to stream videos online.
When users use an application, they are usually allowed to assess and give feedback on the
application so that it can be enhanced and improved to better
meet and satisfy the users' needs.
Thus, usability studies have become a vital element in evaluating an application.
Recently, with the emergence of various mobile apps, the scope of usability studies has extended
to the evaluation of mobile apps as well [1], including the user interface
and performance.
This scenario also goes for mobile video streaming apps as well. Researchers develop
various video streaming apps and perform usability test for different groups of users under
different conditions.
However, there are no studies to consolidate the results of usability test for different mobile
video streaming apps to produce a review on the metrics used in usability tests by the
researchers. Studies in this domain are scarce and limited.
This paper therefore seeks to systematically review the test metrics employed in usability
evaluation with respect to mobile video streaming applications.
The study will assist practitioners and professionals as well as academics in knowing and
understanding the commonly used test metrics in usability evaluation in the area of mobile
video streaming applications.
It will also enhance their practice and knowledge of usability evaluation in the mobile
domain. In addition, researchers will be able to identify the existing gaps in the literature so as to fill
them.
Even designers will find the results of the review interesting, as it will foster their
understanding of the functionalities that more frequently improve the satisfaction of users
and customers.
Systematic Review
In this paper, the activities performed in the systematic review are: the definition of a
search strategy, the selection of primary studies, the extraction of data, and the
implementation of a synthesis strategy.
Search Strategy
In order to search for and select the usability test metrics for mobile video
streaming apps, articles and journals from different online databases were searched. Relevant
data from the search results were then extracted, and finally the collection of studies for
review was listed. The search strategy was as follows:
1) Search Terms: In this review, the search terms were chosen based on a scope narrowed to
mobile video streaming apps. The search was done using the following search strings: C1
("User Experience" OR "User Review" OR "User Adoption" OR "Usability"), C2
("Mobile"), and C3 ("Streaming" OR "Video Streaming"). So, the complete string used in
the review was: C1 AND C2 AND C3.
2) Search Process: There were two phases in the search process, namely the primary search
and the secondary search.
Study Selection
The scope of the review was defined to be the metrics used in usability test in mobile
video streaming apps. Since the scope had been defined clearly before the search process was
carried out, most of the articles and journals found were relevant to the review objective.
However, there were many articles and journals excluded from the search process, based on the
following criteria:
1) The study is only on mobile video apps development,
2) the study presents the usability test on mobile apps without touching on video
streaming apps,
3) the study is not written in English, and
The results obtained from the reviewed articles were classified based on the categories of
metrics used in the usability test of mobile video streaming apps, the detailed metrics, number of
studies and the percentage of studies.
streaming apps involves downloading the video content from remote servers and playing it
on a local platform.
In desktop applications, users usually have a broadband connection, which provides high
bandwidth and a good download rate and allows desktop applications to play high quality
video without problems.
However, things do not go so smoothly on mobile platforms. On a mobile platform,
the bandwidth is low and the network connection is not stable. If this is not well managed, the
video will jitter and pause frequently to rebuffer. In this review, there were 14
studies that took streaming performance as an aspect to evaluate and to enhance in order to
improve the usability of mobile video streaming applications.
Video Quality
Video is the main aspect to look at when it comes to video streaming. This does not differ in
the case of mobile streaming.
Users use video streaming mobile app to consume video of their interest. It cannot be denied
that video itself is the key for users to assess or rate the usability of a mobile video streaming
app.
In this review, there were 23 studies that emphasized video quality as a key aspect in the
assessment of the usability of a mobile video streaming app.
When users consume the video, it will be meaningless if the video is corrupted, blurred, or
not visible. Hence, the mobile video streaming apps need to provide video with clear and
satisfying video quality to achieve higher usability.
User interface design is a vital factor in application development. It touches not only the
appearance design, but also setting up the navigation flow, and incorporating functionality into
various forms of interactive elements to be used by users. Hence, user interface is an important
factor in the assessment of the usability of an application. Of course, mobile video streaming
apps are not exempted from this. Among the studies that were reviewed, there were 9 works that
talk about user interface factors in assessing the usability of a mobile video streaming app.
Functionality
In video streaming apps, the performance of streaming video and the video quality
offered are vital factors for usability. However, apps have to offer other functionality for the
users to access the video provided, before the users can consume and interact with them. Without
this functionality, the users cannot find the video they are interested in, hence, degrading the
usability of the apps.
Social Context
An application should have a context it aims to serve. Although an app may generally be described
as a "mobile video streaming app", the context may be different for each app. For example,
there might be a news video streaming app, a TV watching app, a documentary viewing app, etc.
Besides that, it also includes the factor of social context that surrounds the app and the users.
STREAMING PROTOCOLS
Streaming of audio and video is a confusing subject. This page is aimed at providing some
of the basic concepts.
Streaming means sending data, usually audio or video, in a way that allows it to start being
processed before it's completely received. Video clips on Web pages are a familiar example.
Progressive streaming, aka progressive downloading, means receiving an ordinary file and
starting to process it before it's completely downloaded. It requires no special protocols, but
it requires a format that can be processed based on partial content. This has been around for
a long time; interlaced images, where the odd-numbered pixel rows are received and
displayed before any of the even ones, are a familiar example. They're displayed at half
resolution before the remaining rows fill in the full resolution.
Progressive streaming doesn't have the flexibility of true streaming, since the data rate can't
be adjusted on the fly and the transmission can't be separated into multiple streams. If it
delivers a whole file quickly and the user listens to or watches just the beginning, it wastes
bandwidth. The user is given the whole file and can copy it without any effort.
"True" streaming uses a streaming protocol to control the transfer. The packets received
don't add up to a file. Don't mistake streaming for copy protection, though; unless there's
server-to-application encryption, it's not hard to reconstruct a file from the data.
True streaming may be adaptive. This means that the rate of transfer will automatically
change in response to the transfer conditions. If the receiver isn't able to keep up with a
higher data rate, the sender will drop to a lower data rate and quality. This may be done by
changes within the stream, or by switching the client to a different stream, possibly from
another server. Streamingmedia.com has a discussion of adaptive streaming.
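The stream-switching form of adaptive streaming can be sketched as a client-side decision: measure recent throughput and pick the highest rendition that fits. The bitrate ladder, the 0.8 safety margin and the function name are assumptions of this sketch, not values from any particular protocol.

```python
# Hypothetical renditions of the same content, in kilobits per second.
BITRATES_KBPS = [400, 1200, 2500, 5000]

def choose_bitrate(measured_kbps, safety=0.8):
    """Pick the highest rendition that fits within a safety margin of the throughput.

    Falls back to the lowest rendition when even that one does not fit.
    """
    budget = measured_kbps * safety
    candidates = [rate for rate in BITRATES_KBPS if rate <= budget]
    return max(candidates) if candidates else min(BITRATES_KBPS)

for throughput in (10000, 4000, 1000, 100):
    print(throughput, "->", choose_bitrate(throughput))
```

When the receiver can no longer keep up, the measured throughput drops and the next segment is requested at a lower rate and quality.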
Streaming involves protocols at several different layers of the OSI Reference Model. The
lower levels (physical, data link, and network) are generally taken as given. Streaming protocols
involve:
The transport layer, which is responsible for getting data from one end to the other.
The session layer, which organizes streaming activity into ongoing units such as movies
and broadcasts.
The presentation layer, which manages the bridge between information as seen by the
application and information as sent over the network.
The application layer, which is the level at which an application talks to the network.
Most Internet activity takes place using the TCP transport protocol. TCP is designed to
provide reliable transmission.
This means that if a packet isn't received, it will make further efforts to get it through.
Reliability is a good thing, but it can come at the expense of timeliness. Real-time streaming
puts a premium on timely delivery, so it often uses UDP (User Datagram Protocol). UDP is
lightweight compared with TCP and will keep delivering information rather than put extra
effort into re-sending lost packets. Some firewalls may block UDP because they're tailored
only for TCP communications.
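The fire-and-forget nature of UDP can be seen in a short sketch using Python's standard socket module on the loopback interface. The function name and the 2-second timeout are choices made for this example. Note that there is no connection setup and no acknowledgement: if the datagram were lost, the sender would never find out and the receiver would simply time out.

```python
import socket

def udp_round_trip(payload: bytes) -> bytes:
    """Send one datagram to a local receiver and return what the receiver got."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 asks the operating system for any free port.
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(2.0)
        # No handshake: the datagram is simply sent to the address.
        sender.sendto(payload, receiver.getsockname())
        data, _ = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()

print(udp_round_trip(b"frame-1"))
```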
Support for the right streaming protocol doesn't necessarily mean that software will play a
particular stream. You need software that supports both the appropriate streaming protocol
and the appropriate encoding.
HTTP Live Streaming
The new trend in streaming is the use of HTTP with protocols that support adaptive
bitrates. This is theoretically a bad fit, as HTTP with TCP/IP is designed for reliable delivery
rather than keeping up a steady flow, but with the prevalence of high-speed connections these
days it doesn't matter so much. Apple's entry is HTTP Live Streaming, aka HLS or Cupertino
streaming. It was developed by Apple for iOS; support outside of Apple's products was initially
limited but is now widespread. Long Tail Video provides a testing page to determine whether a browser supports HLS.
Its specification was published as an Internet Draft and later as RFC 8216. The draft contains proprietary material, and
publishing derivative works is prohibited.
Shoutcast
The Shoutcast server is a popular way to deliver broadcast streaming. It uses its own
protocols, and finding any decent documentation is difficult. Shoutcast's protocol was
originally known as ICY; the name Ultravox is currently used for Shoutcast 2. A superset of
HTTP is used, with additional headers that don't follow the "X-" convention. Shoutcast's
protocols can be used over either TCP or UDP.
Metadata and streaming content are mixed in the same stream. The ICY scheme ("icy://")
was used in some early versions of the protocol and is still sometimes found. I've also
encountered the scheme "icyxp://", which seems to be proprietary to one software creator; a
search for information about it turns up nothing.
HTML5
HTML5 needs to be mentioned here, mostly for what it isn't. HTML5 provides the <audio>
and <video> tags, along with DOM properties that allow JavaScript to control the playing of
the content that these elements specify.
This is an application-layer protocol only, with no definition of the lower layers. HTML5
implementations can specify formats which they process. The server is expected to
download the content progressively, and it will keep downloading it completely even if
paused, unless the browser completely eliminates the element. The Web Audio API allows
detailed programmatic control of playback.
Remote video surveillance relies on capturing video streams from camera(s) mounted on
a surveillance site and transmitting those streams to a remote command and control center for
analysis and inspection. The process of streaming or uploading video to a remote monitoring
location requires large bandwidth since video is streamed continuously.
Moreover, many sites have large number of cameras which makes the streaming process
very costly and in some cases impractical given the needed bandwidth.
In order to reduce the streaming bandwidth requirements, each camera output (frame rate,
bit rate, resolution) can be configured manually which is an error prone process and would
not be suitable for large scale deployment.
In addition, this option can degrade the quality of important details of each video frame and
make it difficult to recognize. In other solutions, video transcoding techniques (bitrate,
frame rate, resolution, or combination of them)[1] are used to modify the stream to be
transmitted and adapt it to the available bandwidth.
AvidBeam Smart Video Transcoding Solution
AvidBeam has developed a comprehensive and robust solution for optimizing bandwidth for
surveillance systems with limited effect on the video stream quality [2]. Our solution is based
on the use of a multistage filter pipeline,
where several filters are used to eliminate unnecessary frames and identify regions of interest
before invoking the video transcoder. Consequently, the transmitted bandwidth can be
reduced dramatically without affecting the quality of the important information in the video
frames.
Clients can enable/disable each filter separately as well as configure each filter according to
their needs.
Those filters are described as follows:
1. Frame Filter
The frame filter is used to detect motion in a given frame. The amount of motion to be
detected is configurable. The filter passes only frames with motion greater than or equal to the
configured motion size. This way, small variations in each video frame due to external factors
such as wind, camera vibration, or small animals or birds moving in front of the
surveillance camera can be eliminated easily.
the experimental results from using motion detection filter before streaming out frames.
As shown in the Figure, when motion detection is enabled, frames with no significant motion are
not transmitted.
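The idea behind the frame filter can be sketched with simple frame differencing. This is not AvidBeam's implementation: it assumes grayscale frames given as nested lists, measures motion as the number of pixels that changed by more than a threshold since the previous frame, and uses hypothetical parameter names throughout.

```python
def changed_pixels(prev, curr, pixel_threshold=25):
    """Count pixels whose brightness changed by more than pixel_threshold."""
    return sum(
        1
        for prev_row, curr_row in zip(prev, curr)
        for a, b in zip(prev_row, curr_row)
        if abs(a - b) > pixel_threshold
    )

def frame_filter(frames, motion_size):
    """Pass only frames whose motion versus the previous frame is >= motion_size."""
    passed = []
    previous = None
    for frame in frames:
        if previous is not None and changed_pixels(previous, frame) >= motion_size:
            passed.append(frame)
        previous = frame
    return passed

still = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
small = [[200, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]          # 1 pixel changed
big = [[200, 200, 200, 200], [200, 200, 0, 0], [0, 0, 0, 0]]  # 5 more pixels changed

print(len(frame_filter([still, still, small, big], motion_size=4)))  # 1
```

With a motion size of 4, the one-pixel change (comparable to a bird or camera vibration) is dropped and only the frame with the larger change is transmitted.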
The object specific filter is used to identify the presence of objects of interest in the video
frame and will pass only the frames that contain the object(s) of interest. Figure 4 shows the results
of applying the object specific filter to several use cases such as vehicle license plate recognition
(LPR) or people count.
In each case, a dedicated object detector is applied to the video frame (LPR or people).
The bitrate savings in the two cases are approximately 73.5% and 43.5%. It
should be noted that these results depend directly on how often the license
plate or people appear in the frames.
3. ROI Filter
The purpose of the ROI filter is to identify the region of interest in each frame and pass this
information to the transcoder. The ROI information can be used to clip the transmitted frame or
to encode the frame with different quality values for the ROI and non-ROI frame
blocks.
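The two uses of ROI information mentioned above (clipping the frame, or assigning different quality values to ROI and non-ROI blocks) can be sketched as follows. The (top, left, height, width) tuple layout and the function names are assumptions made for illustration only.

```python
from typing import List, Tuple

Frame = List[List[int]]
ROI = Tuple[int, int, int, int]  # (top, left, height, width): hypothetical layout


def clip_to_roi(frame: Frame, roi: ROI) -> Frame:
    """Return only the part of the frame that lies inside the ROI."""
    top, left, height, width = roi
    return [row[left:left + width] for row in frame[top:top + height]]


def quality_map(rows: int, cols: int, roi: ROI, roi_q: int, bg_q: int) -> List[List[int]]:
    """Build a per-block quality grid: roi_q inside the ROI, bg_q elsewhere."""
    top, left, height, width = roi
    return [
        [
            roi_q if top <= r < top + height and left <= c < left + width else bg_q
            for c in range(cols)
        ]
        for r in range(rows)
    ]


frame = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
clipped = clip_to_roi(frame, (1, 1, 2, 2))
qmap = quality_map(2, 3, (0, 1, 1, 2), roi_q=20, bg_q=40)
```

In the quality grid, a lower value is used here to stand for finer quantization inside the ROI; the actual meaning of the quality values depends on the encoder.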
4. Video Transcoder
The final stage in the pipeline performs the actual video transcoding. The transcoder
receives the selected ROIs together with their quality (quantization) settings. Other
transcoding parameters (resolution, bitrate, frame rate) are also selected based on the client
system configuration. As shown in Figure 7, there are several transcoding options that can be
applied to the video stream.
A. Resolution Transcoding: in this case, each input video frame is decoded, scaled down,
and re-encoded with a new resolution.
B. Bitrate Transcoding: in this case, each video stream is re-encoded to provide the
required bitrate.
C. Frame rate Transcoding: in this case, the frame rate of the streamed video is modified.
Frames can be dropped in order to save additional bandwidth.
D. Video format Transcoding: in this case, video formats that produce a better bitrate for
the same quality can be used (e.g., H.264 to H.265, MJPG to H.264).
These options can be applied separately or combined to achieve the optimal quality/bitrate pair.
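As an illustration of how the four options combine, the sketch below builds a command line for the ffmpeg tool. Using ffmpeg is an assumption (the source does not name a transcoder), and the file names are hypothetical. The command is only constructed, not executed; the libx265 encoder is available only in ffmpeg builds that include it.

```python
from typing import List, Optional


def build_transcode_command(
    src: str,
    dst: str,
    width: Optional[int] = None,
    height: Optional[int] = None,
    bitrate_kbps: Optional[int] = None,
    fps: Optional[int] = None,
    codec: Optional[str] = None,
) -> List[str]:
    """Build an ffmpeg argument list; every option is applied only if given."""
    cmd = ["ffmpeg", "-i", src]
    if width and height:
        cmd += ["-vf", f"scale={width}:{height}"]   # A. resolution transcoding
    if bitrate_kbps:
        cmd += ["-b:v", f"{bitrate_kbps}k"]         # B. bitrate transcoding
    if fps:
        cmd += ["-r", str(fps)]                     # C. frame rate transcoding
    if codec:
        cmd += ["-c:v", codec]                      # D. video format transcoding
    cmd.append(dst)
    return cmd


command = build_transcode_command(
    "in.mp4", "out.mp4", width=640, height=360,
    bitrate_kbps=500, fps=15, codec="libx265",
)
```

The resulting list could be passed to subprocess.run on a machine where ffmpeg is installed.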
CLOUD SECURITY
Data security becomes more important when using cloud computing at all levels:
infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS), and software-as-a-service (SaaS).
This chapter describes several aspects of data security, including:
• Data-in-transit
• Data-at-rest
• Data lineage
• Data provenance
• Data remanence
The objective of this chapter is to help users evaluate their data security scenarios and
make informed judgments regarding risk for their organizations. As with other aspects of cloud
computing and security, not all of these data security facets are of equal importance in all
topologies (e.g., the use of a public cloud versus a private cloud, or non-sensitive data versus
sensitive data).
With regard to data-in-transit, the primary risk is in not using a vetted encryption algorithm.
Although this is obvious to information security professionals, it is not common for others
to understand this requirement when using a public cloud, regardless of whether it is IaaS,
PaaS, or SaaS.
It is also important to ensure that a protocol provides confidentiality as well as integrity (e.g.,
FTP over SSL [FTPS], Hypertext Transfer Protocol Secure [HTTPS], and Secure Copy
Program [SCP]), particularly if the protocol is used for transferring data across the Internet.
Merely encrypting data and using a non-secured protocol (e.g., "vanilla" or "straight" FTP or
HTTP) can provide confidentiality, but does not ensure the integrity of the data (e.g., with the
use of symmetric streaming ciphers).
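The difference between confidentiality and integrity can be made concrete with a small sketch. It uses a toy XOR stream cipher with a fixed keystream (an assumption for demonstration only, never to be used for real data) to show that an attacker can alter an encrypted message without knowing the key, and then uses a standard HMAC tag to detect the change. The keys and the message are hypothetical.

```python
import hashlib
import hmac


def xor_stream(data: bytes, keystream: bytes) -> bytes:
    """Toy stream cipher: XOR each byte with a keystream byte (encrypts and decrypts)."""
    return bytes(d ^ k for d, k in zip(data, keystream))


def make_tag(message: bytes, mac_key: bytes) -> bytes:
    """Compute an HMAC-SHA256 integrity tag over the message."""
    return hmac.new(mac_key, message, hashlib.sha256).digest()


def verify_tag(message: bytes, tag: bytes, mac_key: bytes) -> bool:
    """Check the tag in constant time."""
    return hmac.compare_digest(make_tag(message, mac_key), tag)


keystream = bytes([7, 1, 9, 3, 5, 2, 8])  # hypothetical fixed keystream, demo only
mac_key = b"demo-mac-key"                 # hypothetical key
ciphertext = xor_stream(b"PAY 100", keystream)
tag = make_tag(ciphertext, mac_key)

# The attacker flips bits in the ciphertext without knowing the keystream.
tampered = bytearray(ciphertext)
tampered[4] ^= ord("1") ^ ord("9")
tampered = bytes(tampered)

decrypted = xor_stream(tampered, keystream)        # b"PAY 900": content stayed secret but changed
tamper_detected = not verify_tag(tampered, tag, mac_key)
```

Protocols such as HTTPS combine both protections, which is why encrypting a file and sending it over plain FTP or HTTP is not equivalent.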
Although using encryption to protect data-at-rest might seem obvious, the reality is not that
simple. If you are using an IaaS cloud service (public or private) for simple storage (e.g.,
Amazon's Simple Storage Service or S3), encrypting data-at-rest is possible and is strongly
suggested. However, encrypting data-at-rest that a PaaS or SaaS cloud-based application is
using (e.g., Google Apps, Salesforce.com) as a compensating control is not always feasible.
Data-at-rest used by a cloud-based application is generally not encrypted, because encryption
would prevent indexing or searching of that data.
Data Security Mitigation
If prospective customers of cloud computing services expect that data security will serve as a
compensating control for possibly weakened infrastructure security, they will be disappointed.
Part of a customer's infrastructure security moves beyond its control, and a provider's
infrastructure security may (for many enterprises) or may not (for small to medium-size
businesses, or SMBs) be less robust than expected. Although data-in-transit
can and should be encrypted, any use of that data in the cloud, beyond simple storage,
requires that it be decrypted.
Therefore, it is almost certain that in the cloud, data will be unencrypted. And if you are
using a PaaS-based application or SaaS, customer-unencrypted data will also almost
certainly be hosted in a multitenancy environment (in public clouds). Add to that exposure
the difficulties in determining the data's lineage and data provenance (where necessary), and
even many providers' failure to adequately address such a basic security concern as data
remanence, and the risks of data security for customers are significantly increased.
So, what should you do to mitigate these risks to data security? The only viable option for
mitigation is to ensure that any sensitive or regulated data is not placed into a public cloud
(or that you encrypt data placed into the cloud for simple storage only).
Given the economic considerations of cloud computing today, as well as the present limits
of cryptography, CSPs are not offering robust enough controls around data security. It may
be that those economics change and that providers offer their current services, as well as a
"regulatory cloud environment" (i.e., an environment where customers are willing
to pay more for enhanced security controls to properly handle sensitive and regulated data).
Currently, the only viable option for mitigation is to ensure that any sensitive or regulated
data is not put into a public cloud.
In addition to the security of your own customer data, customers should also be concerned
about what data the provider collects and how the CSP protects that data. Specifically with
regard to your customer data, what metadata does the provider have about your data, how is
it secured, and what access do you, the customer, have to that metadata? As your volume of
data with a particular provider increases, so does the value of that metadata.
Additionally, your provider collects and must protect a huge amount of security-related data.
For example, at the network level, your provider should be collecting, monitoring, and
protecting firewall, intrusion prevention system (IPS), security information and event
management (SIEM), and router flow data. At the host level, your provider should be
collecting system logfiles, and at the application level, SaaS providers should be collecting
application log data, including authentication and authorization information.
What data your CSP collects and how it monitors and protects that data is important to the
provider for its own audit purposes.
In our security system architecture, all the security-related services for a cloud-based platform
are shifted from the platform level to the application level and are provided as web services.
One of the advantages of shifting all the security-related services to the application level is
the resulting design modularity and generality.
This means that our architecture is applicable to any cloud-based platform, regardless of its
delivery and deployment models. The components of our security system are based on a
"Service Oriented Architecture" and are responsible for managing and
distributing certificates, identity management (CRUD), identity federation, creating and
managing XACML-based policies, and providing strong authentication mechanisms.
All the components within the system are interoperable and act as security service
providers in order to assure a secure cloud-based system. Figure 2 shows the logical components
of our central security system.
The PKI server, also known as the Local Certification Authority (LCA) in our system, is
responsible for issuing and distributing X.509 certificates to all components in a domain. This
server can either be configured as a single certification authority that generates self-signed
certificates, or be linked to a PKI in order to exchange certificates and establish trust
relationships between various domains. In the latter case, a higher-level trusted certification
authority server issues certificates to the issuing CA.
The XACML server, also known as the Policy Decision Point (PDP), is responsible for creating and
validating SAML tickets for the Single Sign-On protocol. This server is also responsible for the
management of groups, roles, XACML policies, and policy sets. The IDMS server is responsible for
creating, reading, updating, and deleting identities in a collaborative environment. The Strong
Authentication (SA) server performs mutual authentication with clients using various
extended authentication protocols, such as FIPS 196. This server also interacts with the XACML
policy server to generate SAML tickets for authenticated clients.
The SSO service provider interacts with service consumers through request-response message
protocols. All system entities securely store their private keys locally. The SAML server issues
tickets according to the decision made by the central authentication server, which is why the two
communicate only over a trusted internal network.
At the same time, the central authentication server communicates with the IDMS and CA servers
over a trusted network. Therefore, the central security system is an isolated secure
environment, where all the system entities trust each other.
AUTHENTICATION
Authentication System
A single enterprise may provide many application services to end users. E-mail servers and web
servers are examples of application service providers. As a company's boundaries broaden, the
number of application services grows. Almost all service providers should authenticate clients
before service transactions are executed, because they are dealing with personal information.
This means that the client should have security context for each application server and log in
before it can consume any service. The same situation happens when the client accesses
resources in different security domains.
As mentioned in the second chapter, having many security credentials for authentication
purposes is not an effective solution from security, system coordination, and management
perspectives. While organizations migrate to cloud environments, the same problem still
exists.
As a solution to this problem, a Single Sign-On (SSO) protocol is proposed as part of
the shared security system of a cloud environment. This solution relies on the SAML web
browser SSO profile, whose complete description can be found in the referenced
document.
The system consists of a SAML server which provides SSO services for application service
providers. The SAML server issues a SAML ticket containing an assertion about the verification
of the client's identity, thus confirming whether or not it has been properly authenticated. Once
the user is authenticated, he or she can request access to different authorized resources at
different application provider sites without the need to re-authenticate for each domain.
The SAML server resides in the shared security system. Besides the server that issues SAML
assertions, there are three other security entities in the central security system, which
coordinate with each other in order to accomplish the desired solution.
When the user wants to access a resource at an application service provider site for
the first time, he or she is redirected to the central authentication server by the Policy
Enforcement Point (PEP) running in front of the application service.
The central authentication server verifies the user's identity according to the Strong
Authentication Protocol specified by Federal Information Processing Standard (FIPS)
196.
This can be a one-way or a mutual authentication process. The authentication server verifies
whether the user is registered in the IDMS database. If the user is unregistered, the
authentication process is terminated and the server reports that the user is not registered in
the IDMS. If the user has a valid registration entry confirmed by the IDMS server, his or her
X.509 certificate is verified in cooperation with the Local Certification Authority service.
The result of the authentication process is passed to the SAML server which, in turn, issues
a SAML ticket confirming whether the user is authenticated or not. The SAML ticket has a
validity period which is calculated according to the system policy. The SAML ticket is passed to
the user (client application) through the authentication server.
The ticket is then embedded in the request directed to the application service provider. The
request message is intercepted by the PEP, which verifies the embedded SAML ticket. Once
the ticket confirms that the user has been successfully authenticated, a valid local session is
created for the user. Until the validity period expires, the user can request services from other
application service providers with the same ticket without re-authenticating.
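The ticket flow described above can be sketched in a few lines. This is a deliberately simplified stand-in and not real SAML: the ticket is a JSON payload signed with an HMAC key shared between the SSO server and the PEP, whereas the architecture in the text relies on X.509 certificates and SAML assertions. The key, the user registry standing in for the IDMS, and the function names are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SSO_KEY = b"demo-key-shared-by-sso-server-and-pep"  # hypothetical shared secret
REGISTERED_USERS = {"alice"}                        # stand-in for the IDMS database


def issue_ticket(user: str, validity_s: int, now: Optional[float] = None) -> Optional[str]:
    """Issue a signed ticket with an expiry time, or None for unregistered users."""
    if user not in REGISTERED_USERS:
        return None
    now = time.time() if now is None else now
    body = json.dumps({"sub": user, "exp": now + validity_s}).encode()
    sig = hmac.new(SSO_KEY, body, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(body).decode() + "." + base64.urlsafe_b64encode(sig).decode()


def pep_verify(ticket: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user name if the ticket is authentic and not expired, else None."""
    now = time.time() if now is None else now
    try:
        body_b64, sig_b64 = ticket.split(".")
        body = base64.urlsafe_b64decode(body_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # malformed ticket (wrong shape or invalid base64)
        return None
    expected = hmac.new(SSO_KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None
    claims = json.loads(body)
    return claims["sub"] if now < claims["exp"] else None


ticket = issue_ticket("alice", validity_s=300, now=1000.0)
```

The same ticket can be presented to several application service providers until it expires, which mirrors the single sign-on behaviour described in the text.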
This mechanism works because there is a trust relationship between the SSO service
provider and application service providers existing in different security domains. All
application services should be registered in the IDMS in order for the SAML server to
deliver SSO services to them.
At the same time, the SSO service provider also needs to register itself in the IDMS. The IDMS
server provides registration services to identity service providers, thus making them
available to be looked up and consumed by the application service providers. The SSO service
provider publishes its metadata, which contains the WSDL or the WSDL URL, in the IDMS.
The PKI system establishes a trust relationship between application service providers and
identity service providers.
As the SSO system is designed using WS technology, other cloud providers which lack such
identity services can benefit from it. Foreign clouds can register themselves in the IDMS as
external cloud platforms and consume SSO service in favor of their cloud environment.
In this case, the IDMS service should provide identity federations services. Identity-related
information is outside of the SAML message exchanges. When the SAML request message
is delivered to the SSO service provider, the latter first checks whether the service requester
is a trusted entity with the help of the IDMS service provider.
It is up to the IDMS service provider to check its registration validity, either locally or in a
federated environment. The same approach can be applied when the registration validity of the
subject to which the SAML assertion is addressed needs to be verified.
A top-level root CA establishes a trust relationship between two clouds. Integration details with
other cloud providers are out of the scope of this research.
AUTHORIZATION
DATA SECURITY
Data protection is a crucial security issue for most organizations. Before moving into the
cloud, cloud users need to clearly identify data objects to be protected and classify data
based on their implication on security, and then define the security policy for data protection
as well as the policy enforcement mechanisms.
For most applications, data objects would include not only bulk data at rest in cloud servers
(e.g., user databases and/or filesystems), but also data in transit between the cloud and the
user(s), which could be transmitted over the Internet or via mobile media. (In many
circumstances, it would be more cost-effective and convenient to move large volumes of
data to the cloud by mobile media such as archive tapes than to transmit them over the Internet.)
Data objects may also include user identity information created by the user management
model, service audit data produced by the auditing model, service profile information used
to describe the service instance(s), temporary runtime data generated by the instance(s), and
many other application data.
Different types of data are of different value and hence have different security
implications for cloud users. For example, a user database at rest in cloud servers may be of
core value for cloud users and thus require strong protection to guarantee data
confidentiality, integrity and availability. User identity information can contain Personally
Identifiable Information (PII) and has an impact on user privacy.
Therefore, only authorized users should be allowed to access user identity information.
Service audit data provide evidence related to compliance and the fulfillment of the
Service Level Agreement (SLA), and should not be maliciously manipulated.
Service profile information could help attackers locate and identify the service instances and
should be well protected. Temporary runtime data may contain critical data related to user
business and should be segregated during runtime and securely destroyed after runtime.
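The classification described above can be captured as a small policy table. The sketch below is one possible encoding, not a standard: the category names and the mapping of "well protected" to confidentiality are interpretations of the text and are labeled as assumptions.

```python
from enum import Flag, auto


class Protection(Flag):
    CONFIDENTIALITY = auto()
    INTEGRITY = auto()
    AVAILABILITY = auto()
    ACCESS_CONTROL = auto()
    SEGREGATION = auto()
    SECURE_DELETION = auto()


# Required protections per data object type (interpretation of the text above).
POLICY = {
    "user_database": Protection.CONFIDENTIALITY | Protection.INTEGRITY | Protection.AVAILABILITY,
    "user_identity_information": Protection.ACCESS_CONTROL,
    "service_audit_data": Protection.INTEGRITY,
    "service_profile_information": Protection.CONFIDENTIALITY,  # assumption: "well protected"
    "temporary_runtime_data": Protection.SEGREGATION | Protection.SECURE_DELETION,
}


def missing_protections(data_type: str, applied: Protection) -> Protection:
    """Return the required protections that have not been applied yet."""
    return POLICY[data_type] & ~applied


gap = missing_protections(
    "user_database", Protection.CONFIDENTIALITY | Protection.INTEGRITY
)
```

A table like this can be reviewed before migration to check that each data object has the protections its classification calls for.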
Security Services:
The basic security services for information security include assurance of data
Confidentiality, Integrity, and Availability (CIA). In Cloud Computing, the issue of data
security becomes more complicated because of the intrinsic cloud characteristics. Before
potential cloud users are able to safely move their applications/data to the cloud, a suite of
security services should be in place, which we can identify as follows (not necessarily all
needed in a specific application):
1) Data confidentiality assurance: This service protects data from being disclosed to
illegitimate parties. In Cloud Computing, data confidentiality is a basic security service to be in
place. Although different applications may have different requirements in terms of what kind of
data need confidentiality protection, this security service could be applicable to all the data
objects discussed above.
2) Data integrity protection: This service protects data from malicious modification.
Having outsourced their data to remote cloud servers, cloud users must have a way to check
whether or not their data at rest or in transit are intact. Such a security service would be of
core value to cloud users. When auditing cloud services, it is also critical to guarantee that all the
audit data are authentic, since these data may be of legal concern. This security service is also
applicable to the other data objects discussed above.
3) Guarantee of data availability: This service assures that data stored in the cloud are
available on each user retrieval request. This service is particularly important for data at rest in
cloud servers and is related to the fulfillment of the Service Level Agreement. For long-term data
storage services, data availability assurance is of more importance because of the increasing
possibility of data damage or loss over time.
4) Secure data access: This security service limits the disclosure of data content to
authorized users. In practical applications, disclosing application data to unauthorized users may
threaten the cloud user's business goals. In mission-critical applications, inappropriate disclosure of
sensitive data can have legal consequences. For better protection of sensitive data, cloud users may need
fine-grained data access control, in the sense that different users may have access to different sets of data.
This security service is applicable to most of the data objects addressed above.
5) Regulations and compliance: In practical application scenarios, storage of and access
to sensitive data may have to comply with specific regulations. For example, disclosure of health
records may be limited by the Health Insurance Portability and Accountability Act (HIPAA)
[12]. In addition, the geographic location of data is frequently of concern due to
export-law violation issues. Cloud users should thoroughly review these regulation and
compliance issues before moving their data into the cloud.
6) Service auditing: This service provides a way for cloud users to monitor how their
data are accessed and is critical for compliance enforcement. In the case of local storage, it is not
hard to audit the system. In Cloud Computing, however, it requires the service provider to
support trustworthy transparency of data access.
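Items 2 and 6 above require that audit data cannot be silently manipulated. One common way to make a log tamper-evident, shown here as an illustrative sketch and not as a method prescribed by the text, is to chain entries with a hash so that changing any earlier entry invalidates every later one. The event fields are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # hash value used before the first entry


def _digest(prev_hash: str, event: Dict[str, str]) -> str:
    """Hash the previous entry's hash together with the current event."""
    payload = prev_hash + json.dumps(event, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def append_entry(log: List[dict], event: Dict[str, str]) -> None:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "hash": _digest(prev_hash, event)})


def verify_log(log: List[dict]) -> bool:
    """Recompute the chain and report whether every entry is intact."""
    prev_hash = GENESIS
    for entry in log:
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: List[dict] = []
append_entry(audit_log, {"user": "alice", "action": "read", "object": "record-17"})
append_entry(audit_log, {"user": "bob", "action": "write", "object": "record-17"})
intact_before = verify_log(audit_log)

audit_log[0]["event"]["user"] = "mallory"  # simulated manipulation of an old entry
intact_after = verify_log(audit_log)
```

A chain like this only detects changes; it still needs the latest hash to be stored or signed somewhere the party being audited cannot rewrite.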
KEY MANAGEMENT
Cloud key management infrastructure consists of a cloud key management client (CKMC)
and a cloud key management server (CKMS) [5]. The CKMC exists in cloud applications,
serving the three fundamental cloud service models: Software, Platform, and
Infrastructure (as a Service).
The CKMS interacts with the CKMC using the cloud key management interoperability protocol,
and interacts with the symmetric key management system (SKMS) and the public key
infrastructure (PKI) using a symmetric key management protocol and an asymmetric key
management protocol, respectively.
The cloud Key Management Interoperability Protocol (CK-MIP) establishes a single
comprehensive protocol for communication between cloud key management servers and
cryptographic clients.
By defining a protocol that can be used by any cloud cryptographic client, ranging
from multi-tenant implementations to cloud storage, it addresses the critical need for a
comprehensive key management protocol. Key management is the set of techniques
for generating, distributing, storing, revoking, and verifying keys. Key management
can be applied to cloud infrastructure.
In this section, we present a taxonomy of key management for cloud storage based on
where the key is placed, and describe various key management methods. The figure describes the
key management taxonomy.
Management of Key at Client Side
In this approach, data is stored at the cloud service provider side in encrypted
form. The client may be thin, e.g., a mobile phone. Keys are maintained at the customer side.
This approach is usually taken with homomorphic encryption techniques, where operations are
performed on encrypted data at the server side [8, 9]. The figure describes this key management
approach, in which the mobile phone user and the desktop user each maintain their keys on their
own side [20].
Key Management at Cloud Service Provider Side
In this approach, keys are maintained at the cloud service provider side. If the key is lost,
the customer is unable to read the data stored in the cloud. Data is stored in encrypted form
and is decrypted with the key to recover its original form.
Management of Key at Both Sides
In this technique, the key is divided into two parts. One part is stored at the user
side and the other part is stored at the cloud side. Only when both parts are combined is it
possible to retrieve the data properly. Thus, the data remains secure and can be controlled by the
user. The solution is also scalable, and the complete key does not need to be maintained
at the cloud side. If either part of the key is lost, the data cannot be recovered.
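A simple way to realize a two-part key is XOR splitting, sketched below. The text does not say which splitting method is meant, so this is an assumption chosen because neither part alone reveals anything about the key.

```python
import secrets
from typing import Tuple


def split_key(key: bytes) -> Tuple[bytes, bytes]:
    """Split a key into a random user part and a cloud part (XOR of key and user part)."""
    user_part = secrets.token_bytes(len(key))
    cloud_part = bytes(k ^ u for k, u in zip(key, user_part))
    return user_part, cloud_part


def combine_parts(user_part: bytes, cloud_part: bytes) -> bytes:
    """Recover the original key; both parts are required."""
    return bytes(u ^ c for u, c in zip(user_part, cloud_part))


key = secrets.token_bytes(32)          # e.g. a 256-bit data encryption key
user_part, cloud_part = split_key(key)
recovered = combine_parts(user_part, cloud_part)
```

Because the user part is uniformly random, the cloud part on its own is also indistinguishable from random bytes, which matches the statement that losing either part makes the data unrecoverable.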
Key Splitting Technique
A content provider shares data in the cloud so that it is accessible to other users. The key is
split and distributed among the users. If a particular user wants to access the data in the cloud,
he/she first needs to obtain partial keys from the other users. If k out of n partial keys are
combined, the user is able to encrypt and decrypt the data.
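The "k out of n" property is the defining feature of threshold secret sharing. The sketch below uses Shamir's secret sharing over a prime field as one well-known technique with this property; the text does not name the scheme, so this choice is an assumption. The key is treated as an integer smaller than the chosen prime.

```python
import secrets
from typing import List, Tuple

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this
Share = Tuple[int, int]


def make_shares(secret: int, k: int, n: int) -> List[Share]:
    """Split secret into n shares such that any k of them recover it."""
    if not (0 <= secret < PRIME and 1 <= k <= n):
        raise ValueError("need 0 <= secret < PRIME and 1 <= k <= n")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def poly(x: int) -> int:
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        return y

    return [(x, poly(x)) for x in range(1, n + 1)]


def recover_secret(shares: List[Share]) -> int:
    """Lagrange interpolation at x = 0 using the given shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = make_shares(123456789, k=3, n=5)
from_first_three = recover_secret(shares[:3])
from_other_three = recover_secret([shares[0], shares[2], shares[4]])
```

Any three of the five shares reconstruct the key, while fewer than three give no information about it. The modular inverse via pow(den, -1, PRIME) requires Python 3.8 or later.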
Key Management at Centralized Server
This approach uses asymmetric keys. Data is encrypted with the public key stored in the key
server and is stored at the cloud side in encrypted form. When a user accesses the data, it is
decrypted with the private key maintained by that user. The disadvantage of this method is that
the key server is a single point of failure if it crashes.
Each user generates a public and private key pair. Public keys are stored at the key server.
Suppose a mobile phone user wants to share data with a desktop user. He/she encrypts the data
with the public key of the desktop user, and the desktop user then accesses the data with
his/her private key.
Group Key Management for Cloud Data Storage
Data is shared in the cloud by trusted members of a group. A group key is established for
securing the data at the cloud side. The group key is formed from the partial keys
maintained by each user. If a group member wants to access the data, the group key is used.
If a member leaves the group, the group key is formed again. If a member joins the
group, a new group key is established among the members.
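The rekeying behaviour on join and leave can be illustrated with a toy model. It derives the group key by hashing the members' partial keys in one place, which is an assumption made purely to show when the key changes; real group key agreement protocols let members contribute without revealing their partial keys to each other.

```python
import hashlib
import secrets
from typing import Dict


class Group:
    """Toy group whose key is derived from every member's partial key."""

    def __init__(self) -> None:
        self.partial_keys: Dict[str, bytes] = {}

    def join(self, member: str) -> None:
        """Add a member with a fresh random partial key."""
        self.partial_keys[member] = secrets.token_bytes(32)

    def leave(self, member: str) -> None:
        """Remove a member and their partial key."""
        del self.partial_keys[member]

    def group_key(self) -> bytes:
        """Hash all partial keys in a fixed member order to form the group key."""
        h = hashlib.sha256()
        for member in sorted(self.partial_keys):
            h.update(member.encode())
            h.update(self.partial_keys[member])
        return h.digest()


group = Group()
group.join("alice")
group.join("bob")
key_two_members = group.group_key()

group.join("carol")                 # a join changes the group key
key_after_join = group.group_key()

group.leave("alice")                # a leave changes it again
key_after_leave = group.group_key()
```

Data encrypted under an old group key would have to be re-encrypted, or the old key retained, if former members must lose access to existing data; the text does not specify which.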
AUDITING
A cloud audit is a periodic examination an organization does to assess and document
its cloud vendor's performance. The goal of such an audit is to see how well a cloud vendor is
doing in meeting a set of established controls and best practices.
The Cloud Security Alliance (CSA) provides audit documents, guidance and controls
that an IT organization can use to examine its cloud vendors. Third-party auditors can also use
CSA audit materials. CSA resources are considered the primary audit tools to perform and
optimize a comprehensive cloud audit.
Security, Trust, Assurance and Risk (STAR) security questionnaire | Checklist tool to ask cloud vendors about security controls | STAR Level 1 Security Questionnaire (downloadable document)
Controls Applicability Matrix | Help for auditors to decide the most appropriate controls to use for a specific vendor | Included in CCM and CAIQ v4
CCM v4 Implementation Guidelines | Guidelines for using the CCM v4 audit standards | Included in CCM and CAIQ
Cloud audit professional credentials
The CSA and ISACA jointly offer the following cloud audit credentials:
Certificate of Cloud Security Knowledge is a body of knowledge in cloud technology areas, including
cloud processing and security. It is a first step in preparation for the companion certification in cloud
auditing knowledge.
Certificate of Cloud Auditing Knowledge trains candidates in how to audit cloud platforms and
security.
Both certificates complement ISACA credentials. They provide evidence of an auditor's knowledge
of cloud infrastructure and systems, security and vulnerabilities, and they show that the auditor knows
how to conduct a cloud audit.
An Industry Cloud focuses on specialised processes with tools and business services dedicated to a
specific industry. Designed with industry challenges in mind, these clouds can enable organizations to
jump ahead and deliver value at record pace.
Standardized configurations
Improved operational efficiency
Deeper customer engagement
Focused spends
Customized holistic solutions
How does Industry Cloud compare to general cloud computing?
It is worth stating that an Industry Cloud is not a new type of technology or a
paradigm-shifting concept or solution. It is simply a specific way of using cloud computing to
handle industrial processes and challenges.
The major difference can be found in how they are integrated into business operations.
The Industry Cloud is designed to create more value when managing systems within the
boundaries of the industry it is applied to. The general cloud employs a more horizontal
approach and is built to function outside the boundaries of a specific industry. Another
difference is that the Industry Cloud is equipped with specific industrial features to manage data
and schedule tasks, which means it can determine what data is to be sent to the cloud and
what data is not. General cloud solutions, on the other hand, will struggle to handle these
specific data requirements.
Overall, the industry-specific cloud has an advantage over the general cloud because it
understands the data, standards and regulatory policies associated with a specific industry. This
helps organizations scale within the constraints set by the policies of their
industry niche. Thus, the emergence of the industry-specific cloud is a natural progression of
cloud computing to meet varying and highly personalised industry requirements.
Wrapping up
The war between traditional on-premises data center infrastructure providers and public
cloud providers is far from over. In fact, the cloud game has gone a step ahead wherein
industries today require more than general-purpose cloud capabilities. They require industry-
specific cloud capabilities. As cloud solutions continue to mature, major public cloud providers
are investing in solutions to meet specialized industry needs. It’s still early days for industry
clouds. Established companies in industries feeling the sting of competition from cloud-native
disrupters are especially good prospects for these types of solutions.
4. The ______ algorithm uses prior knowledge to find frequent item sets.
A : Clustering B : Regression C : Naïve Bayes D : Apriori
13. Which of the following is a measure used in decision trees when selecting the
splitting criterion that partitions the data in the best possible manner?
A : Probability B : Gini Index C : Regression D : Association
17. ______ data does not fit into a data model due to variations in content.
A : Structured data B : Un-Structured data C : Semi-Structured data D : Scattered
19. ______ is a general-purpose array-processing package that provides a high-performance
multi-dimensional array object and tools for working with these arrays.
A : NumPy B : SciPy C : sklearn D : None of these
20. The ______ library is built on top of NumPy, SciPy and Matplotlib.
A : Sympy B : Scikit C : Pandas D : Numpy
22. ______ is the step performed by a data scientist after acquiring the data.
A : Data Cleansing B : Data Integration C : Data Replication D : Data loading