Cloud Computing Notes

notes

Uploaded by

ChikkalaNaidu
CLOUD COMPUTING – Unit - 1

CHARACTERISTICS OF CLOUD COMPUTING


1. Cloud computing is user-centric.
Once users are connected to the cloud, whatever is stored there (documents,
messages, images, applications) becomes the user's own. Users can also share it with others.
2. Cloud computing is task-centric.
The focus is on what users need to get done and how the application can do it for
them, e.g., Collage, Animation, and Movie in Google Photos.
3. Cloud computing is powerful.
Connecting hundreds or thousands of computers together in a cloud creates a wealth of
computing power impossible with a single desktop PC.
4. Cloud computing is accessible.
Data is stored in the cloud; users can instantly retrieve more information from multiple
repositories. You’re not limited to a single source of data.
5. Cloud computing is intelligent.
With all the various data stored on the computers in a cloud, data mining and analysis are
necessary to access that information in an intelligent manner.
6. Cloud computing is programmable.
Many of the tasks necessary with cloud computing must be automated.
Advantages
• Lower-Cost Computers for Users
• Improved Performance
• Lower IT Infrastructure Costs
• Fewer Maintenance Issues
• Lower Software Costs
• Instant Software Updates
• Increased Computing Power
• Unlimited Storage Capacity
• Increased Data Safety
• Improved Compatibility between Operating Systems
• Improved Document Format Compatibility
• Easier Group Collaboration
• Universal Access to Documents
• Latest Version Availability
• Removes the Tether to Specific Devices
Disadvantages
● Requires a Constant Internet Connection
● Doesn’t Work Well with Low-Speed Connections
● Can Be Slow
● Features Might Be Limited
● Stored Data Might Not Be Secure
● Problems Arise If Data Loss Occurs

DISTRIBUTED SYSTEM
A distributed system contains multiple nodes that are physically separate but linked
together using the network. All the nodes in this system communicate with each other and
handle processes in tandem. Each of these nodes contains a small part of the distributed
operating system software.

Fig: Distributed System


Types of Distributed Systems
The nodes in the distributed systems can be arranged in the form of client/server systems or
peer to peer systems. Details about these are as follows:
Client/Server Systems
In client server systems, the client requests a resource and the server provides that resource. A
server may serve multiple clients at the same time while a client is in contact with only one
server. Both the client and server usually communicate via a computer network and so they are
a part of distributed systems.
Peer to Peer Systems
Peer-to-peer systems contain nodes that are equal participants in data sharing. All the tasks
are equally divided between all the nodes. The nodes interact with each other as required and
share resources. This is done with the help of a network.
Advantages of Distributed Systems
• Can easily share data with other nodes.
• More nodes can easily be added to the distributed system, i.e., it can be scaled as
required.
• Failure of one node does not lead to the failure of the entire distributed system. Other
nodes can still communicate with each other.
• Resources like printers can be shared with multiple nodes rather than being restricted
to just one.
Disadvantages of Distributed Systems
• It is difficult to provide adequate security in distributed systems because the nodes as
well as the connections need to be secured.
• Some messages and data can be lost in the network while moving from one node to
another.
• The database connected to the distributed systems is quite complicated and difficult to
handle as compared to a single user system.
• Overloading may occur in the network if all the nodes of the distributed system try to
send data at once.

Difference between Parallel Computing and Distributed Computing
Parallel Computing:
In parallel computing, multiple processors perform the tasks assigned to them
simultaneously. Memory in parallel systems can either be shared or distributed. Parallel
computing provides concurrency and saves time and money.
Distributed Computing:
In distributed computing, multiple autonomous computers appear to the user as a
single system. In distributed systems there is no shared memory, and the computers communicate
with each other through message passing. In distributed computing a single task is divided
among different computers.
Table 3: Difference between Parallel Computing and Distributed Computing

ELEMENTS OF PARALLEL COMPUTING


The primary goal of parallel computing is to increase the computational power available to
your essential applications. Typically, the infrastructure is either a set of processors on a
single server or separate servers connected to each other to solve a computational
problem. The earliest computer software was written for serial computation on a machine with
a single Central Processing Unit (CPU): a problem is broken down into a series of
instructions, and those instructions are executed one after another. Only one instruction
completes at a time.
The main reasons to use parallel computing are:
1. Save time and money.
2. Solve larger problems.
3. Provide concurrency.
4. Make use of multiple execution units.
Types of parallel computing
Bit-level parallelism
Bit-level parallelism works at the processor level and depends on the processor word size
(32-bit, 64-bit, etc.). When an operand is wider than the word size, the operation has to be
divided into a series of instructions. For example, to perform an operation on 16-bit numbers
on an 8-bit processor, we have to divide the process into two 8-bit operations.
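The 16-bit-on-8-bit example can be sketched in Python. This is only an illustration: the
function name add16_on_8bit is hypothetical, and Python integers are masked to 8 bits to
imitate a narrow processor word.

```python
def add16_on_8bit(a: int, b: int) -> int:
    """Add two 16-bit values using only 8-bit additions plus a carry."""
    # Split each operand into a low byte and a high byte.
    a_lo, a_hi = a & 0xFF, (a >> 8) & 0xFF
    b_lo, b_hi = b & 0xFF, (b >> 8) & 0xFF

    # First 8-bit operation: add the low bytes and remember the carry.
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 8          # 1 if the low-byte addition overflowed
    lo = lo_sum & 0xFF

    # Second 8-bit operation: add the high bytes plus the carry.
    hi = (a_hi + b_hi + carry) & 0xFF   # overflow past 16 bits is discarded

    return (hi << 8) | lo


print(hex(add16_on_8bit(0x12FF, 0x0001)))  # 0x1300
```

A 16-bit processor would do the same addition in a single instruction, which is the gain
that a wider word size provides.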
Instruction-level parallelism (ILP)
Instruction-level parallelism (ILP) works at the hardware level (dynamic
parallelism) and refers to how many instructions can be executed simultaneously in a single
CPU clock cycle.
Data Parallelism
In data parallelism, a multiprocessor system executes a single set of instructions (SIMD):
several processors simultaneously perform the same task on separate sections of the
distributed data.

Task Parallelism
Task parallelism is the form of parallelism in which different tasks are split up among the
processors and performed at once.
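The difference between data parallelism and task parallelism can be sketched with Python's
standard concurrent.futures module. The function names below are hypothetical. Note that in
standard CPython, threads take turns on CPU-bound work, so the sketch shows the structure of
the two styles rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x: int) -> int:
    return x * x


def data_parallel(values):
    """Data parallelism: the SAME operation on different pieces of data."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # map() hands each element to a worker and keeps the input order.
        return list(pool.map(square, values))


def task_parallel(values):
    """Task parallelism: DIFFERENT operations running at the same time."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {
            "sum": pool.submit(sum, values),
            "min": pool.submit(min, values),
            "max": pool.submit(max, values),
        }
        # result() waits for each task to finish.
        return {name: future.result() for name, future in futures.items()}


print(data_parallel([1, 2, 3, 4]))
print(task_parallel([3, 1, 2]))
```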
Hardware architecture of parallel computing
The hardware architecture of parallel computing is divided into the following
categories:
1. Single-instruction, single-data (SISD) systems
2. Single-instruction, multiple-data (SIMD) systems
3. Multiple-instruction, single-data (MISD) systems
4. Multiple-instruction, multiple-data (MIMD) systems
Architecture:

1. Node controller (NC) controls the execution, inspection, and termination of VM
instances on the host where it runs.
2. Cluster controller (CC) gathers information about and schedules VM execution on
specific node controllers, and manages the virtual instance network.
3. Storage controller (SC) is a put/get storage service that implements Amazon’s S3
(Simple Storage Service) interface and provides a way of storing and accessing VM
images and user data.
4. Cloud controller (CLC) is the entry point into the cloud for users and administrators. It
queries node managers for information about resources, makes high-level scheduling
decisions, and implements them by making requests to cluster controllers.
5. Walrus (W) is the controller component that manages access to the storage services
within Eucalyptus. Requests are communicated to Walrus using a SOAP
(Simple Object Access Protocol) or REST (Representational State Transfer) based
interface.
Explain the architecture of a distributed system

Distributed computing is a system of software components spread over
different computers but running as a single entity. A distributed system can
be an arrangement of different configurations, such as mainframes,
computers, workstations, and minicomputers.

Distributed System

Sharing resources such as hardware, software, and data is one of the
principles of cloud computing. With different levels of openness to the
software and concurrency, it’s easier to process data simultaneously through
multiple processors. The more fault-tolerant an application is, the more
quickly it can recover from a system failure.

Organizations have turned to distributed computing systems to handle data
generation explosion and increased application performance needs. These
distributed systems help businesses scale as data volume grows. This is
especially true because the process of adding hardware to a distributed
system is simpler than upgrading and replacing an entire centralized system
made up of powerful servers.

Distributed systems consist of many nodes that work together toward a single
goal. These systems function in two general ways, and both of them have the
potential to make a huge difference in an organization.

• The first type is a cohesive system where the customer has each
machine, and the results are routed from one source.
• The second type allows each node to have an end-user with their
own needs, and the distributed system facilitates sharing
resources or communication.
Benefits of a multi-computer model
• Improved scalability: Distributed computing clusters are a great
way to scale your business. They use a ‘scale-out architecture,’
which makes adding new hardware easier as load increases.
• Enhanced performance: This model uses ‘parallelism’ for the
divide-and-conquer approach. In other words, all computers in the
cluster simultaneously handle a subset of the overall task.
Therefore, as the load increases, businesses can add more
computers and optimize overall performance.
• Cost-effectiveness: The cost-efficiency of a distributed system
depends on its latency, response time, bandwidth, and
throughput. Distributed systems work toward a common goal of
delivering high performance by minimizing latency and enhancing
response time and throughput. They do so on low-cost commodity
hardware, which keeps initial deployments and cluster expansions
easy while aiming to avoid data loss.
Architecture of Distributed Systems

Cloud-based software, the backbone of distributed systems, is a complicated
network of servers that anyone with an internet connection can access. In a
distributed system, components and connectors arrange themselves in a way
that eases communication. Components are modules with well-defined
interfaces that can be replaced or reused. Similarly, connectors are
communication links between modules that mediate coordination or
cooperation among components.

A distributed system is broadly divided into two essential concepts: software
architecture (further divided into layered architecture, object-based
architecture, data-centered architecture, and event-based architecture) and
system architecture (further divided into client-server architecture and
peer-to-peer architecture).

1. Software architecture

Software architecture is the logical organization of software components and
their interaction with other structures. It is at a lower level than system
architecture and focuses entirely on components; e.g., the web front end of
an ecommerce system is a component. The four main architectural styles of
distributed systems in software components entail:

i) Layered architecture

Layered architecture provides a modular approach to software. Separating
each component into its own layer makes the system more efficient. For
example, the open systems interconnection (OSI) model uses a layered
architecture. A request passes through the layers in sequence until it reaches
its goal. In some instances, layered architecture is implemented with
cross-layer coordination. Under cross-layer coordination, an interaction can
skip adjacent layers to fulfill the request, which can give better performance.

Layered Architecture

Layered architecture separates software components into units (layers). A
request goes from the top down, and the response goes from the
bottom up. The advantage of layered architecture is that it keeps things
orderly and lets each layer be modified independently without affecting the
rest of the system.
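The top-down request and bottom-up response can be sketched as three layers that each talk
only to the layer directly below. The class names and the in-memory data are hypothetical and
chosen only for illustration.

```python
class DataLayer:
    """Bottom layer: owns the stored data."""

    def __init__(self):
        self._rows = {1: "keyboard", 2: "monitor"}

    def fetch(self, item_id):
        return self._rows.get(item_id)


class BusinessLayer:
    """Middle layer: applies rules, knows only the data layer."""

    def __init__(self, data):
        self._data = data

    def describe(self, item_id):
        name = self._data.fetch(item_id)          # request goes down
        return None if name is None else name.upper()


class PresentationLayer:
    """Top layer: formats the answer, knows only the business layer."""

    def __init__(self, business):
        self._business = business

    def handle(self, item_id):
        result = self._business.describe(item_id)  # request goes down
        # The response travels back up and is formatted here.
        return "404 not found" if result is None else f"200 {result}"


app = PresentationLayer(BusinessLayer(DataLayer()))
print(app.handle(1))  # 200 KEYBOARD
```

Because each layer depends only on the one below it, DataLayer could be swapped for a real
database without touching PresentationLayer.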

ii) Object-based architecture

Object-based architecture centers around an arrangement of loosely coupled
objects with no specific architecture like layers. Unlike layered architecture,
object-based architecture doesn’t have to follow any steps in a sequence. Each
component is an object, and all the objects can interact through an interface
(or connector). Under object-based architecture, such interactions between
components can happen through a direct method call.

Object-based Architecture

At its core, communication between objects happens through method
invocations, often called remote procedure calls (RPC). Popular RPC-style
mechanisms include Java RMI, web services, and REST API calls. The primary
design consideration of these architectures is that they are less structured.
Here, component equals object, and connector equals RPC or RMI.

iii) Data-centered architecture

Data-centered architecture works on a central data repository, either active
or passive. Like most producer-consumer scenarios, the producer (business)
produces items to the common data store, and the consumer (individual) can
request data from it. Sometimes, this central repository can be just a simple
database.

Data-centered Architecture

In a data-centered system, all communication between objects happens through
a data storage system. The components are backed by a persistent storage
space such as an SQL database, and everything the nodes share is kept in this
data storage.

iv) Event-based architecture

In event-based architecture, the entire communication is through events.
When an event occurs, the system gets the notification. This means that
everyone subscribed to the event is notified and gains access to the
information. Sometimes, these events are data, and at other times they are
URLs to resources. As such, the receiver can process what information they
receive and act accordingly.

Event-Based Architecture

One significant advantage of event-based architecture is that the components
are loosely coupled, which means that it’s easy to add, remove, and
modify them. To better understand this, think of publisher-subscriber
systems, enterprise service buses, or akka.io. Another advantage of event-based
architecture is that it allows heterogeneous components to communicate with the
bus, regardless of their communication protocols.
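A publisher-subscriber bus can be sketched in a few lines. This is a minimal in-process
illustration, not a real message broker; the class EventBus and the topic name are
hypothetical.

```python
from collections import defaultdict


class EventBus:
    """Routes each published event to every handler subscribed to its topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        handlers = list(self._subscribers[topic])
        for handler in handlers:
            handler(payload)      # the publisher never calls a component by name
        return len(handlers)      # how many subscribers were notified


bus = EventBus()
billing_log, shipping_log = [], []

# Two independent components subscribe to the same event.
bus.subscribe("order.created", billing_log.append)
bus.subscribe("order.created", shipping_log.append)

bus.publish("order.created", {"order_id": 1})
print(billing_log, shipping_log)
```

The publisher knows only the topic name, so components can be added or removed without
changing it, which is the loose coupling described above.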

2. System architecture

System-level architecture focuses on the entire system and the placement of
components of a distributed system across multiple machines. The
client-server architecture and peer-to-peer architecture are the two major
system-level architectures that hold significance today. An example would be an
ecommerce system that contains a service layer, a database, and a web front end.

Client-server architecture

As the name suggests, client-server architecture consists of a client and a
server. The server is where all the work processes are, while the client is where
the user interacts with the service and other resources (remote server). The
client can then request from the server, and the server will respond
accordingly. Typically, only one server handles the remote side; however,
using multiple servers adds redundancy and improves safety.

Client-server Architecture

Client-server architecture has one standard design feature: centralized
security. Data such as usernames and passwords are stored in a secure
database that authorized server users can access. This makes it
more stable and secure than peer-to-peer, because the central security
database can control resource usage in a more meaningful way. The system is
much more stable and secure, even though it is not always the fastest option.
The disadvantages of client-server architecture are its single point of failure
and its limited scalability.

Peer-to-peer (P2P) architecture

A peer-to-peer network, also called a P2P network, works on the concept of
no central control in a distributed system. A node can either act as a client or
server at any given time once it joins the network. A node that requests
something is called a client, and one that provides something is called a
server. In general, each node is called a peer.

Peer-to-Peer Architecture

A node that joins the network can find a service provider in two ways. One way is
to register with a centralized lookup server, which will then direct the node to
the service provider. The other way is for the node to broadcast its service
request to every other node in the network, and whichever node responds will
provide the requested service.
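The two discovery approaches, a centralized lookup server and a broadcast to the other
nodes, can be sketched as follows. The classes Peer and LookupServer and the function
broadcast_find are hypothetical; the broadcast is modelled as a breadth-first flood over
neighbour links rather than real network traffic.

```python
from collections import deque


class Peer:
    """A node that may offer services and knows some neighbouring peers."""

    def __init__(self, name, services=()):
        self.name = name
        self.services = set(services)
        self.neighbors = []


class LookupServer:
    """Approach 1: a central index from service name to providers."""

    def __init__(self):
        self._index = {}

    def register(self, peer):
        for service in peer.services:
            self._index.setdefault(service, []).append(peer)

    def find(self, service):
        return list(self._index.get(service, []))


def broadcast_find(start, service):
    """Approach 2: flood the request; the first peer that can serve it answers."""
    seen = {start.name} | {p.name for p in start.neighbors}
    queue = deque(start.neighbors)
    while queue:
        peer = queue.popleft()
        if service in peer.services:
            return peer
        for neighbor in peer.neighbors:
            if neighbor.name not in seen:
                seen.add(neighbor.name)
                queue.append(neighbor)
    return None


a, b, c = Peer("a"), Peer("b"), Peer("c", ["print"])
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]

server = LookupServer()
server.register(c)
print(server.find("print")[0].name)       # c
print(broadcast_find(a, "print").name)    # c
```

The lookup server answers in one step but is a central point of failure; the broadcast
needs no central node but sends the request to many peers.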

Today's P2P networks fall into three categories:

• Structured P2P: The nodes in structured P2P follow a predefined
distributed data structure.
• Unstructured P2P: The nodes in unstructured P2P randomly
select their neighbors.
• Hybrid P2P: In a hybrid P2P, some nodes have unique functions
appointed to them in an orderly manner.
Key Components of a Distributed System

The three basic components of a distributed system include primary system
controller, system data store, and database. In a non-clustered environment,
optional components consist of user interfaces and secondary controllers.

1. Primary system controller

The primary system controller is the only controller in a distributed system
and keeps track of everything. It’s also responsible for controlling the dispatch
and management of server requests throughout the system. The executive
and mailbox services are installed automatically on the primary system
controller.

2. Secondary controller

The secondary controller is a process controller or a communications
controller. It’s responsible for regulating the flow of server processing requests
and managing the system’s translation load. It also governs communication
between the system and VANs or trading partners.

3. User-interface client

The user interface client is an additional element in the system that provides
users with important system information. This is not a part of the clustered
environment, and it does not operate on the same machines as the controller.
It provides functions that are necessary to monitor and control the system.

4. System datastore

Each system has only one data store for all shared data. The data store is
usually on the disk vault, whether clustered or not. For non-clustered
systems, this can be on one machine or distributed across several devices,
but all of these computers must have access to this datastore.

5. Database

In a distributed system, a relational database stores all data. Once the data
store locates the data, it shares it among multiple users. Relational databases
can be found in all data systems and allow multiple users to use the same
information simultaneously.

Examples of a Distributed System

When processing power is scarce, or when a system encounters unpredictable
changes, distributed systems are ideal, and they help balance the workload.
Hence distributed systems have boundless use cases varying from electronic
banking systems to multiplayer online games. Let’s check out more explicit
instances of distributed systems:

1. Networks

The 1970s saw the invention of Ethernet and LAN (local area networks), which
enabled computers to connect in the same area. Peer-to-peer networks
developed, and e-mail and the internet continue to be the biggest examples of
distributed systems.

2. Telecommunication networks

Telephone and cellular networks are other examples of peer-to-peer networks.
Telephone networks started as an early example of distributed
communication, and cellular networks are also a form of distributed
communication systems. With the implementation of Voice over Internet
Protocol (VoIP) communication systems, they grow more complex as distributed
communication networks.

3. Real-time systems

Real-time systems are not limited to specific industries. These systems can be
used and seen throughout the world in the airline, ride-sharing, logistics,
financial trading, massively multiplayer online games (MMOGs), and
ecommerce industries. The focus in such systems is on the correspondence
and processing of information with the need to convey data promptly to a huge
number of users who have an expressed interest in such data.

4. Parallel processors

Parallel computing splits specific tasks among multiple processors. This, in
turn, creates pieces to put together and form an extensive computational task.
Previously, parallel computing only focused on running software on multiple
threads or processors accessing the same data and memory. As operating
systems became more prevalent, they too fell into the category of parallel
processing.

5. Distributed database systems

A distributed database is spread out across numerous servers or regions.
Data can be replicated across several platforms. A distributed database
system can be either homogeneous or heterogeneous in nature. A
homogeneous distributed database uses the same database management
system and data model across all systems.

Adding new nodes and locations makes it easier to control and scale
performance. On the other hand, multiple data models and database
management systems are possible with heterogeneous distributed databases.
Gateways are used to translate data across nodes and are typically created
due to the merger of two or more applications or systems.

6. Distributed artificial intelligence

Distributed artificial intelligence is one of the many approaches of artificial
intelligence that is used for learning and entails complex learning algorithms,
large-scale systems, and decision making. It requires a large set of
computational data points located in various locations.

Standardised Communications Protocols


A communications protocol is a set of formal rules describing how to transmit
or exchange data, especially across a network. A standardised
communications protocol is one that has been codified as a standard.
Examples of these include WiFi, the Internet Protocol, and the Hypertext
Transfer Protocol (HTTP).

Layer Models

Modern communications protocols rarely operate in isolation and
depend on other protocols in a layered model known as a stack. Each layer in
a stack relies on those below it, and provides for the layers above. For
example, the internet works using the TCP/IP stack, which is divided into four
layers.

• The Link layer governs the direct connections between two devices, such
as a computer and a network switch or a phone and a mobile network
tower.
• The Internet layer routes traffic from a source network to a destination
network.
• The Transport layer carries traffic end to end between any two devices,
regardless of their network. For example, between your computer and a
server in a remote location.
• The Application layer is used commonly by two computer applications
that are communicating with each other. For example, a web browser
will use HTTP to access and retrieve data from a web server.

• From a web server via HTTP using a browser
• From a file server via File Transfer Protocol (FTP) using an FTP client application
• Through a well-documented Application Programming Interface (API).
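How each layer of the four-layer TCP/IP stack relies on the one below it can be sketched as
encapsulation: on the way down every layer wraps the data in its own header, and on the way
up every layer removes its own header again. The bracketed headers below are placeholders
invented for this sketch, not real protocol headers.

```python
# Layers listed from top (closest to the application) to bottom.
LAYERS = ["application", "transport", "internet", "link"]


def encapsulate(payload: str) -> str:
    """Sender side: pass the data down the stack, adding one header per layer."""
    for layer in LAYERS:
        payload = f"[{layer}]{payload}"
    return payload


def decapsulate(frame: str) -> str:
    """Receiver side: pass the frame up the stack, stripping one header per layer."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"expected {header} header")
        frame = frame[len(header):]
    return frame


frame = encapsulate("GET /index.html")
print(frame)               # [link][internet][transport][application]GET /index.html
print(decapsulate(frame))  # GET /index.html
```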
Logical Clock in Distributed System
Logical Clocks refer to implementing a protocol on all machines within your
distributed system, so that the machines are able to maintain consistent
ordering of events within some virtual timespan. A logical clock is a
mechanism for capturing chronological and causal relationships in a
distributed system. Distributed systems may have no physically
synchronous global clock, so a logical clock allows global ordering on events
from different processes in such systems.
Example:
When we go out, we make a full plan of which place to visit first, which
second, and so on. We don’t go to the second place first and then to
the first place. We always follow the order that was
planned beforehand. In a similar way, the operations on our PCs should be done
one by one in an organized way. Suppose we have more than 10 PCs in a
distributed system and every PC is doing its own work: how do we make
them work together? The solution to this is the LOGICAL CLOCK.
Method-1:
One approach to ordering events across processes is to synchronize the clocks.
This means that if one PC shows the time 2:00 pm, then every PC should show
the same time, which is not really possible. Not every clock can be synchronized
at the same time, so we can’t follow this method.
Method-2:
Taking the example into consideration, we assign the first place the number 1,
the second place 2, the third place 3, and so on. Then we always
know that the first place comes first, and so on. Similarly, if
we give each PC its own number, the work is organized in such a way
that the 1st PC completes its process first, then the 2nd, and so on.
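Method-2 numbers whole PCs. A widely used refinement, Lamport's logical clock, numbers
individual events instead: every process keeps a counter, increments it on each event, and on
receiving a message jumps past the sender's timestamp. The sketch below illustrates that
technique, which is named here as an addition to these notes; the class and method names are
hypothetical.

```python
class LamportClock:
    """Logical clock: a counter that respects the order of cause and effect."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """A local event happened."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, message_time):
        """Move past both our own time and the sender's timestamp."""
        self.time = max(self.time, message_time) + 1
        return self.time


pc1, pc2 = LamportClock(), LamportClock()

pc1.tick()                 # pc1: local event, time 1
stamp = pc1.send()         # pc1: send event, time 2
pc2.receive(stamp)         # pc2: receive event, time 3 (later than the send)
print(pc1.time, pc2.time)  # 2 3
```

No wall-clock time is involved: the receive event always gets a larger number than the send
event that caused it, which is the ordering that synchronized clocks (Method-1) could not
guarantee.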
Message delivery rules
Causal ordering of messages is one of the four semantics of multicast
communication namely unordered, totally ordered, causal, and sync-ordered
communication. Multicast communication methods vary according to the
message’s reliability guarantee and ordering guarantee. The causal ordering
of messages describes the causal relationship between a message send event
and a message receive event.
For example, if send(M1) -> send(M2) then every recipient of both the
messages M1 and M2 must receive the message M1 before receiving the
message M2. In Distributed Systems the causal ordering of messages is not
automatically guaranteed.

Reasons that may lead to violation of causal ordering of messages
1. It may happen due to a transmission delay.
2. Congestion in the network.
3. Failure of a system.
Protocols that are used to provide causal ordering of messages
1. Birman-Schiper-Stephenson Protocol
2. Schiper-Eggli-Sandoz Protocol
Both protocols require that the messages be delivered reliably,
and both prefer that there is no network partitioning between the systems.
The general idea of both protocols is to deliver a message to a process only
if the message immediately preceding it has been delivered to the process.
Otherwise, the message is not delivered immediately; instead, it is stored in a
buffer until the message preceding it has been delivered.
The ISIS System
The ISIS system was developed by Ken Birman and Joseph in 1987
and 1993. It is a framework for reliable distributed communication which is
achieved through the help of process groups. It is a programming toolkit
whose basic features consist of process group management calls and ordered
multicast primitives for communicating with the process group members.
ISIS provides multicast facilities such as unordered multicast (FBCAST),
causally ordered multicast (CBCAST), totally ordered multicast (ABCAST),
and sync-ordered multicast (GBCAST).
The ISIS CBCAST Protocol
ISIS uses vector timestamps to implement causally ordered multicast
between the members of a process group. It is assumed that all the messages
are multicast to all the members of the group including the sender. ISIS uses
UDP/IP as its basic transport facility and sends acknowledgments
and retransmits packets as necessary to achieve reliability. Messages from
a given member are sequenced and delivered in order. There is no
assumption that hardware support for broadcast or multicast exists. If IP
multicast is implemented, then ISIS can exploit it to send a single UDP
packet to the appropriate multicast address. IP multicast takes advantage of
hardware such as Ethernet for multicast facilities. Otherwise, packets are sent
point-to-point to the individual group members.
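Causally ordered delivery with vector timestamps can be sketched as follows. A message from
sender j with timestamp VT is delivered at a process only when VT[j] is exactly one more than
the process's own entry for j, and no other entry of VT is ahead of the process's clock;
otherwise it waits in a buffer. This is a simplified illustration, not the ISIS code: the
class name and message layout are hypothetical, there are no acknowledgments or
retransmissions, and a sender does not deliver its own messages to itself.

```python
class CausalProcess:
    """One group member that delivers multicast messages in causal order."""

    def __init__(self, pid, group_size):
        self.pid = pid
        self.clock = [0] * group_size   # vector timestamp
        self.buffer = []                # messages that arrived too early
        self.delivered = []             # payloads in delivery order

    def multicast(self, payload):
        self.clock[self.pid] += 1
        return (self.pid, list(self.clock), payload)

    def _deliverable(self, sender, stamp):
        next_from_sender = stamp[sender] == self.clock[sender] + 1
        nothing_missing = all(
            stamp[k] <= self.clock[k]
            for k in range(len(stamp)) if k != sender
        )
        return next_from_sender and nothing_missing

    def receive(self, message):
        self.buffer.append(message)
        progress = True
        while progress:                 # one delivery may unblock another
            progress = False
            for msg in list(self.buffer):
                sender, stamp, payload = msg
                if self._deliverable(sender, stamp):
                    self.buffer.remove(msg)
                    self.clock[sender] = stamp[sender]
                    self.delivered.append(payload)
                    progress = True


p0, p1, p2 = (CausalProcess(i, 3) for i in range(3))
m1 = p0.multicast("M1")
p1.receive(m1)
m2 = p1.multicast("M2")     # M2 causally follows M1

p2.receive(m2)              # arrives first, so it is buffered
p2.receive(m1)              # M1 is delivered, which releases M2
print(p2.delivered)         # ['M1', 'M2']
```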
Petri net concurrency
A petri net is a curious little graphical modeling language for control
flow in concurrency. They came up in this talk a few weeks ago: Petri-nets as
an Intermediate Representation for Heterogeneous Architectures, but what I
found interesting was how I could describe some common concurrency
structures using this modeling language.
Here is, for example, the well venerated lock:

The way to interpret the graph is thus: each circle is a “petri dish”
(place) that may contain some number of tokens. The square boxes
(transitions) are actions that would like to fire, but in order to do so all of the
petri dishes feeding into them must have tokens. It’s the sort of representation
that you could make into a board game of sorts!
If multiple transitions can fire off, we pick one of them and only that
one succeeds; the ability for a token to flow down one or another arrow
encodes nondeterminism in this model. In the lock diagram, only one branch
can grab the lock token in the middle, but they return it once they exit the
critical area (unlock).
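The firing rule and the lock described above can be simulated in a few lines. This is a
sketch under simple assumptions: every arc carries exactly one token, and the place and
transition names (a_wait, a_lock, and so on) are made up for the illustration.

```python
class PetriNet:
    """Places hold token counts; a transition fires only if all its inputs have a token."""

    def __init__(self, marking):
        self.marking = dict(marking)    # place name -> number of tokens
        self.transitions = {}           # transition name -> (inputs, outputs)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (list(inputs), list(outputs))

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking.get(place, 0) >= 1 for place in inputs)

    def fire(self, name):
        if not self.enabled(name):
            return False
        inputs, outputs = self.transitions[name]
        for place in inputs:            # consume one token from each input place
            self.marking[place] -= 1
        for place in outputs:           # produce one token in each output place
            self.marking[place] = self.marking.get(place, 0) + 1
        return True


# Two branches, A and B, compete for a single lock token.
net = PetriNet({"a_wait": 1, "b_wait": 1, "lock": 1})
net.add_transition("a_lock", ["a_wait", "lock"], ["a_critical"])
net.add_transition("a_unlock", ["a_critical"], ["lock"])
net.add_transition("b_lock", ["b_wait", "lock"], ["b_critical"])
net.add_transition("b_unlock", ["b_critical"], ["lock"])

net.fire("a_lock")             # A takes the lock token
print(net.enabled("b_lock"))   # False: B must wait
net.fire("a_unlock")           # A returns the token
print(net.enabled("b_lock"))   # True
```

Starting the "lock" place with more than one token turns the same net into a counting
semaphore.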
Here is a semaphore:

It’s exactly the same, except that the middle place may contain more
than one token. Of course, no one said that separate processes must wait
before signalling. We can implement a simple producer-consumer chain like
this:

Note that petri net places are analogous to MVar (), though it takes a
little care to ensure we are not manufacturing tokens out of thin air in Haskell,
due to the lack of linear types. You may also notice that petri nets say little
about data flow; we can imagine the tokens as data, but the formalism doesn’t
say much about what the tokens actually represent.
The 3 Cloud Computing Service Delivery Models
A cloud computing system is capable of providing secure, trustworthy,
and valuable computation to trusted users, who draw on a pool of
resources in the network within a stipulated period of time.
Software as a Service:
The cloud computing platform combines heterogeneous systems that
communicate over the Internet to meet users' application software
requirements. Enterprise IT offerings such as enterprise resource
management, customer relationship management and human resource
management serve the computational needs and requests of customers
worldwide. Customization plays an essential role in Software as a Service,
and using an application requires evaluating the potential SaaS Provider.
The Consumer can manage and use the software, in consultation with the
SaaS Provider, for a limited number of instances. The usage of application
software over the network should be measurable and scalable from every
Consumer perspective. The Software as a Service procedure maintains the
Consumer's ID, collects customer requirements, arranges the requested
demands and delivers the requested application on time and in an
interactive way. The primary problem exposed in Software as a Service is
Quality of Service (QoS), which is a recurring demand across Consumer
service requests. The services of a SaaS Provider should be ubiquitous. The
Service Provider should meter and account for the resources used by each
Consumer request, and both parties should abide by the values declared in
the Service Level Agreement (SLA).
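The metering idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields, the limit names and the numbers are hypothetical and do not come from any particular provider's SLA.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    """Metered usage for one Consumer over one period (hypothetical fields)."""
    consumer_id: str
    requests: int
    cpu_seconds: float


@dataclass
class SlaLimits:
    """Values agreed in the SLA (hypothetical, for illustration only)."""
    max_requests: int
    max_cpu_seconds: float


def sla_violations(usage: UsageRecord, limits: SlaLimits) -> list[str]:
    """Return the names of the SLA terms that the metered usage exceeds."""
    exceeded = []
    if usage.requests > limits.max_requests:
        exceeded.append("requests")
    if usage.cpu_seconds > limits.max_cpu_seconds:
        exceeded.append("cpu_seconds")
    return exceeded


usage = UsageRecord(consumer_id="consumer-001", requests=1200, cpu_seconds=340.0)
limits = SlaLimits(max_requests=1000, max_cpu_seconds=500.0)
print(sla_violations(usage, limits))  # ['requests']
```

Keeping the metered record and the agreed limits as separate objects mirrors the text: the Provider accounts for usage, and both parties compare it against the same SLA values.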

Platform as a Service:
The growing commitments of software usage are prioritized according to
the demanding situations. Adopting Platform as a Service in today's
computing environment is a cost-benefit approach. With Platform as a
Service, the virtualized use of system resources is easily monitored in the
cloud computing model. This model offers features such as license
contracts, a simple platform for complex requests, and a lower unit
installation cost. The resource elasticity of Platform as a Service
distributes resource utilization through virtualization, heterogeneous
resource sharing, broad network access, measured service, and service
orientation with advanced security levels. Virtualization and
paravirtualization, and the way they are executed, allow customer service
requests to be served efficiently. Security checkpoints are maintained at
each service level in Platform as a Service. Interoperability connects the
Consumer's service requests with the Provider's service delivery. Migration
allows the Consumer to seek the best Service Provider for their service
requests. The level of QoS determines the strength of the bond between
Consumer and Provider.
Infrastructure as a Service:
Purchasing hardware resources for existing computing needs may not be
simple from the Consumer's perspective. The overall picture of hardware and
its cost is often out of line with what can be purchased in those
demanding situations. Cloud computing addresses this issue by minimizing
cost and removing the large initial investment. Infrastructure as a Service
provides the services and operations of the host operating system, VMM
logs and DNS server logs in response to Consumer service requests.
Infrastructure as a Service involves many challenges, which shape the
empirical profile of the Consumer's strategies and actions when using the
service. Security extends to every computing aspect of this service in order
to maintain a consistent environment with scalability, flexibility and
availability. A range of rental schemes, service licences and optimal
services can be offered to the Consumer, subject to the agreed legal terms
and the availability of resources.
Panoramic view of Cloud Computing: BUILDING CLOUD COMPUTING
ENVIRONMENTS

Fig: Panoramic view of Cloud Computing

Ethical issues in cloud computing
Cloud computing is based on a paradigm shift with profound implications on
computing ethics. The main elements of this shift are:
1. the control is relinquished to third party services;
2. the data is stored on multiple sites administered by several
organizations; and
3. multiple services interoperate across the network.
Unauthorized access, data corruption, infrastructure failure, or
unavailability are some of the risks related to relinquishing the control to third
party services; moreover, it is difficult to identify the source of the problem
and the entity causing it. Systems can span the boundaries of multiple
organizations and cross the security borders, a process called
de-perimeterisation. As a result of de-perimeterisation “not only the border of
the organization's IT infrastructure blurs, also the border of the accountability
becomes less clear”.
The complex structure of cloud services can make it difficult to determine
who is responsible in case something undesirable happens. In a complex
chain of events or systems, many entities contribute to an action with
undesirable consequences, some of them have the opportunity to prevent
these consequences, and therefore no one can be held responsible, the so-
called “problem of many hands.”
Accountability is a necessary ingredient of cloud computing; adequate
information about how data is handled within the cloud and about allocation
of responsibility are key elements to enforcing ethics rules in cloud
computing. Recorded evidence allows us to assign responsibility; but there
can be tension between privacy and accountability and it is important to
establish what is being recorded, and who has access to the records.
Unwanted dependency on a cloud service provider, the so-called vendor lock-
in, is a serious concern and the current standardization efforts at NIST
attempt to address this problem. Another concern for the users is a future
with only a handful of companies which dominate the market and dictate
prices and policies.

Challenges of Cloud Computing


Cloud computing is a hot topic at the moment, and there is a lot of
ambiguity when it comes to managing its features and resources. Technology
is evolving, and as companies scale up, their need to use the latest Cloud
frameworks also increases. Some of the benefits introduced by cloud solutions
include data security, flexibility, efficiency, and high performance. Smoother
processes and improved collaboration between enterprises while reducing
costs are among its perks. However, the Cloud is not perfect and has its own
set of drawbacks when it comes to data management and privacy concerns.
Thus, there are various benefits and challenges of cloud computing.

1. Data Security and Privacy
Data security is a major concern when working with Cloud environments. It
is one of the major challenges in cloud computing because users remain
accountable for their data, and not all Cloud providers can assure 100%
data privacy. Lack of visibility and control tools, missing identity and access
management, data misuse, and Cloud misconfiguration are the common
causes of Cloud privacy leaks. There are also concerns about insecure
APIs, malicious insiders, and oversights or neglect in Cloud data
management.
2. Multi-Cloud Environments
Common issues and challenges with multi-cloud environments are
configuration errors, missing security patches, weak data governance, and
a lack of granularity. It is difficult to track the security requirements of
multiple clouds and to apply data management policies consistently across
all of them.
3. Performance Challenges
The performance of Cloud computing solutions depends on the vendors who
offer these services to clients, and if a Cloud vendor goes down, the business
gets affected too. It is one of the major challenges associated with cloud
computing.
4. Interoperability and Flexibility
Interoperability is a challenge when you try to move applications between two
or more Cloud ecosystems. It is one of the challenges faced in cloud
computing. Some common issues faced are:
• Rebuilding application stacks to match the target cloud environment's
specifications
• Handling data encryption during migration
• Setting up networks in the target cloud for operations
• Managing apps and services in the target cloud ecosystem

5. High Dependence on Network
Lack of sufficient internet bandwidth is a common problem when transferring
large volumes of information to and from Cloud data servers. It is one of the
various challenges in cloud computing. Data is highly vulnerable, and there
is a risk of sudden outages. Enterprises that want to lower hardware costs
without sacrificing performance need to ensure there is high bandwidth,
which will help prevent business losses from sudden outages.
6. Lack of Knowledge and Expertise
Organizations are finding it tough to find and hire the right Cloud talent,
which is another common challenge in cloud computing. There is a shortage
of professionals with the required qualifications in the industry. Workloads
are increasing, and the number of tools launched in the market is increasing.
Enterprises need good expertise in order to use these tools and find out which
ones are ideal for them.
7. Reliability and Availability
Frequent unavailability of Cloud services and a lack of reliability are two major
concerns in these ecosystems. Organizations are forced to seek additional
computing resources in order to keep up with changing business
requirements. If a Cloud vendor gets hacked or otherwise disrupted, the data of
organizations using its services is compromised. It is another one of the
many cloud security risks and challenges faced by the industry.
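To make the availability concern concrete, the arithmetic below converts an availability percentage into expected downtime per year, and shows how availability drops when a request depends on several services in a row. It is a rough sketch: the 99.9% figures are example values, not any vendor's published SLA, and a year is taken as 365 days.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def yearly_downtime_hours(availability_percent: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


def serial_availability(*availabilities_percent: float) -> float:
    """Availability of a chain in which every listed service must be up."""
    combined = 1.0
    for availability in availabilities_percent:
        combined *= availability / 100
    return combined * 100


print(f"{yearly_downtime_hours(99.9):.2f}")      # 8.76
print(f"{serial_availability(99.9, 99.9):.4f}")  # 99.8001
```

Two services that are each available 99.9% of the time give a combined availability below 99.9%, which is one reason dependence on a chain of Cloud services raises reliability concerns.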
8. Password Security
Account managers often use the same password to manage all their Cloud
accounts. Password management is a critical problem, and it is often found
that users resort to reused and weak passwords.
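One common way to avoid reused and weak passwords is to generate a separate random password for each account. The sketch below uses Python's standard `secrets` module; the account names, the 20-character default and the 12-character minimum are assumptions made for illustration, and in practice the generated values would be kept in a password manager rather than in a script.

```python
import secrets
import string

# Characters a generated password may contain.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Return a random password drawn from a cryptographically secure source."""
    if length < 12:
        raise ValueError("use at least 12 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# Hypothetical account names; each one gets its own password.
accounts = ["cloud-admin", "cloud-billing", "cloud-dev"]
passwords = {account: generate_password() for account in accounts}
print(len(set(passwords.values())) == len(accounts))
```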
9. Cost Management
Even though Cloud Service Providers (CSPs) offer a pay-as-you-go
subscription for services, the costs can add up. Hidden costs appear in the
form of underutilized resources in enterprises.
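The hidden cost of underutilized resources can be estimated with simple arithmetic. The sketch below assumes a flat hourly rate and an average utilization figure; the rate of 0.10 per hour, the 720 hours and the 25% utilization are made-up example values, not real pricing.

```python
def provisioned_cost(hourly_rate: float, hours: float) -> float:
    """Pay-as-you-go cost of keeping a resource provisioned for `hours`."""
    return hourly_rate * hours


def idle_cost(hourly_rate: float, hours: float, utilization: float) -> float:
    """Share of the cost that pays for capacity that was never used."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return provisioned_cost(hourly_rate, hours) * (1 - utilization)


# Example: one instance kept running for a 30-day month at 25% utilization.
print(f"{provisioned_cost(0.10, 720):.2f}")  # 72.00
print(f"{idle_cost(0.10, 720, 0.25):.2f}")   # 54.00
```

In this example three quarters of the bill pays for idle capacity, which is the kind of cost that does not show up in the headline hourly price.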
10. Lack of expertise
Cloud computing is a highly competitive field, and there are many
professionals who lack the required skills and knowledge to work in the
industry. There is also a huge gap in supply and demand for certified
individuals and many job vacancies.
11. Control or Governance
Good IT governance ensures that the right tools are used, and assets get
implemented according to procedures and agreed-to policies. Lack of
governance is a common problem, and companies use tools that do not align
with their vision. IT teams don't get total control of compliance, risk
management, and data quality checks, and there are many uncertainties
faced when migrating to the Cloud from traditional infrastructures.

12. Compliance
Cloud Service Providers (CSPs) are not always up to date when it comes to
data compliance policies. When a user transfers data from internal
servers to the Cloud, they can run into compliance issues with state laws and
regulations.
13. Multiple Cloud Management
Enterprises depend on multiple cloud environments as they scale up and
provision resources. Most companies follow a hybrid cloud strategy, and
many resort to multi-cloud, which brings its own security challenges. The
problem is that infrastructures grow increasingly complex and difficult to
manage as more cloud providers are added, especially given the
technological differences between them.
14. Migration
Migration of data to the Cloud takes time, and not all organizations are
prepared for it. Some report increased downtimes during the process, face
security issues, or have problems with data formatting and conversions.
Cloud migration projects can get expensive and are harder than anticipated.
15. Hybrid-Cloud Complexity
Hybrid-cloud complexity refers to the challenges that arise from mixing
computing, storage, and services across environments, with multi-cloud
security adding further difficulties. A hybrid cloud comprises private cloud
services, public Clouds (for example, Microsoft Azure and Amazon Web
Services), and on-premises infrastructure, orchestrated across various
platforms.
Net-Centric Computing in Cloud Computing
Net-centric computing is a set of principles that have been heavily
adopted by the nonprofit organization Cloud Security Alliance. Cloud
computing is here to stay, but the cloud ecosystem can be complex to
navigate. This article will look at net-centric principles and break them down
for you in plain English.
The cloud is a computer network of remote computing servers, made
accessible on-demand to users who may be located anywhere. The cloud
gives you the ability to access, store and share your information and data
from any Internet-connected device.
The cloud has revolutionized the way that companies store and share
information in comparison to traditional on-premise infrastructures.
However, not all organizations have yet taken advantage of this technology.
The Cloud Computing Service Provider industry includes firms that offer the
following:
IaaS (Infrastructure as a Service)
PaaS (Platform as a Service)
SaaS (Software as a Service)

SaaS is basically the application delivery over the Internet. The
application is installed on to the cloud provider’s servers and each user has
a web browser interface to access the applications. The data that you store
in this environment can be accessed from any device with an internet
connection.
PaaS offers a platform over the cloud where each user can access
resources such as databases, storage, and bandwidth with a single login.
The platform enables users to develop and deploy applications in which they
can use application programming interfaces (APIs).
IaaS provides storage, processor power, memory, operating systems,
and networking capabilities to customers so that they do not have to buy
and maintain their own computer system infrastructure.
Net-Centric:
Net-Centric is a way to manage your data, applications, and
infrastructure in the cloud. Net-centric cloud computing can be considered
an evolution of Software as a Service (SaaS). It leverages the power of the
Internet to provide an environment for data, applications, and infrastructure
on demand. It allows you to manage everything from one interface without
worrying about hardware or server management issues.
The term net-centric combines network-based computing with the
integration of various types of information technology resources (servers,
storage devices, computers) into centralized repositories that are
served using standard Web-based protocols such as HTTP or HTTPS via a
global computer communications network like the Internet.
Net-centric computing allows organizations to focus on their core
business needs without being held back by the software or hardware
limitations of their infrastructure. In other words, when an
organization adopts net-centric principles, it is able to completely
virtualize its IT footprint while still being able to take advantage of modern
networking technologies like LANs and WANs.
Net-centric cloud computing service is a combination of IaaS, PaaS,
and SaaS. What this means is that instead of buying hardware and software
for your own data center, you buy it from the cloud provider. This gives you
the ability to move your data to the cloud and access it from anywhere.
Net-centric computing service allows you to centralize your
applications with a single interface. It provides fully managed services
according to user’s specific requirements, which are invoked in real-time as
needed rather than being provided on-demand or already provisioned for
use. The concept of net-centric computing enables multiple distributed
clients to access a single entity’s applications in real-time.
Benefits of Net-Centric Computing:
Net-centric computing allows organizations to effectively manage their
IT infrastructure via a unified application that is more flexible and easier to
maintain, without the added overhead of operating multiple hardware
platforms. In turn, organizations of all sizes can now enjoy the same benefits
that larger, more traditional enterprises obtain from their own data
centers. The net-centric virtualization platform establishes a single
management point for security, performance, and capacity, as well as cloud
applications and services.
In cloud computing, there are many advantages over traditional data
center technologies. Cloud computing allows for agility on a business level
by not having to invest in maintaining multiple physical data centers.
Cloud computing has gained traction with both enterprises and
consumers. It is expected that CSPs will continue to embrace this technology
as it becomes the norm for organizations of all kinds. As a result, CISOs
need to be trained on how to adopt net-centric principles for managing the
cloud without limits in order to be successful within this new market.
What is Network Centric Content?
Network Centric Content is content designed with the purpose of being
centrally managed in an environment where it is distributed via a wide-area
network. Network Centric Content is designed to be consumed on any device,
in any location, and at any time. In short, network centric content is a new
way of creating and deploying digital assets. It means creating content once
and then making it available across all the digital channels where your
customers expect to find it. Network centric content is not just a single
approach. It’s an industry-wide movement to radically change the way
organizations create, manage, and deliver content across all of their digital
channels. It’s a significant shift away from the old, monolithic content
management systems (CMS) that were designed to be hosted on-premises and
built around a single application.
Network Centric Computing is the utilization of computer networks and
their components to enhance the delivery, scalability, and functionality of
information systems. Network centric computing infrastructure is designed to
enable end-to-end connectivity so that different systems and services can be
integrated for end users. In short, network centric computing is a method for
delivering computing services—typically software applications like Salesforce,
Workday, and Box — across a network rather than on dedicated hardware or
software installed locally.
Difference between Network Centric Computing and Network Centric
Content
• Network Centric Computing and Network Centric Content are
often confused with one another, but the two systems actually
have very different goals.
• Network Centric Content is designed to be consumed on any
device, in any location, and at any time.
• Network Centric Computing is designed to enable end-to-end
connectivity so that different systems and services can be
integrated for end users.

Advantages of Network-Centric Content
There are a number of advantages to adopting a network centric content
approach, including:
• Easy Scalability - The ability to quickly and easily scale up or down to
meet the needs of your organization, regardless of whether those needs
are seasonal or cyclical in nature.
• Reduced Costs - Reducing the amount of hardware, software, and
infrastructure required to support your organization.
• Improved Security - Ensuring appropriate levels of confidentiality,
availability, and integrity by using modern security technologies such
as encryption, key management, and identity and access management.
• Improved Customer Experience - Providing a consistent and seamless
experience for customers across all channels and locations, including
devices, offices, and regions.
What is P2P (Peer-to-Peer Process)?
A peer-to-peer network is a simple network of computers. It first came
into existence in the late 1970s. Here each computer acts as a node for file
sharing within the formed network. Here each node acts as a server and thus
there is no central server in the network. This allows the sharing of a huge
amount of data. The tasks are equally divided amongst the nodes. Each node
connected in the network shares an equal workload. For the network to stop
working, all the nodes need to individually stop working. This is because
each node works independently.
History of P2P Networks
Before the development of P2P, USENET came into existence in 1979.
The network enabled users to read and post messages. Unlike the forums
we use today, it did not have a central server. Instead, it copied new
messages to all the servers in the network.
• In the 1980s, the first use of P2P networks occurred after personal
computers were introduced.
• In August 1988, Internet Relay Chat (IRC) became the first P2P network
built to share text and chat.
• In June 1999, Napster was developed as file-sharing P2P
software. It could be used to share audio files as well. This software
was shut down due to the illegal sharing of files, but the concept of
network sharing, i.e., P2P, became popular.
• In June 2000, Gnutella became the first decentralized P2P file sharing
network. It allowed users to access files on other users’ computers
via a designated folder.
Types of P2P networks
Unstructured P2P networks: In this type of P2P network, each device is
able to make an equal contribution. This network is easy to build as devices
can be connected randomly in the network. But being unstructured, it
becomes difficult to find content. For example, Napster, Gnutella, etc.
Structured P2P networks: It is designed using software that creates a
virtual layer in order to put the nodes in a specific structure. These are not
easy to set up but can give easy access to users to the content. For example,
P-Grid, Kademlia, etc.

Hybrid P2P networks: It combines the features of both P2P networks and
client-server architecture. An example is a network in which a central
server is used to find a node.
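As an illustration of the "specific structure" used by structured networks, Kademlia (named above) measures how close two node IDs are by taking their bitwise XOR, and content is looked up at the nodes whose IDs are closest to the key. The sketch below shows only that distance calculation with toy 4-bit IDs; the node IDs and the key are made up, and real routing tables are not modelled.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance between two IDs: their bitwise XOR."""
    return a ^ b


def closest_nodes(target_key: int, node_ids: list[int], k: int = 2) -> list[int]:
    """Return the k node IDs with the smallest XOR distance to the key."""
    return sorted(node_ids, key=lambda node: xor_distance(node, target_key))[:k]


# Toy 4-bit node IDs and key, chosen only for this example.
node_ids = [0b0001, 0b0100, 0b0111, 0b1010, 0b1101]
target_key = 0b0110

print(closest_nodes(target_key, node_ids))  # [7, 4]
```

Because every node can compute the same distances, a lookup can move steadily toward the nodes responsible for a key, which is why content is easier to find than in an unstructured network.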
Features of P2P network
These networks do not involve a large number of nodes, usually less
than 12. All the computers in the network store their own data but this data
is accessible by the group.
Unlike client-server networks, P2P uses resources and also provides
them. This results in additional resources if the number of nodes increases.
It requires specialized software. It allows resource sharing among the
network.
Since the nodes act as clients and servers, there is a constant threat of
attack. Almost all OS today support P2P networks.
P2P Network Architecture
In the P2P network architecture, the computers connect with each
other in a workgroup to share files and access to the Internet and printers.
Each computer in the network has the same set of responsibilities and
capabilities.
The architecture is useful in residential areas, small offices, or small
companies where each computer acts as an independent workstation and
stores the data on its hard drive. Each computer in the network has the
ability to share data with other computers in the network.
The architecture is usually composed of workgroups of 12 or fewer
computers.

How Does P2P Network Work?


Let’s understand the working of the Peer-to-Peer network through an
example. Suppose, the user wants to download a file through the peer-to-
peer network then the download will be handled in this way:

• If the peer-to-peer software is not already installed, then the user first
has to install the peer-to-peer software on his computer.
• This creates a virtual network of peer-to-peer application users.
• The user then downloads the file, which is received in pieces that come
from multiple computers in the network that already have that file.
• The data is also sent from the user’s computer to other computers in
the network that ask for data that exists on the user’s computer.
• Thus, it can be said that in the peer-to-peer network the file transfer
load is distributed among the peer computers.
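The download steps above can be simulated in memory. The sketch below is a simplification under stated assumptions: there is no real networking, the `Peer` class and the rotation rule for choosing a source are invented for the example, and a SHA-256 checksum stands in for whatever integrity check a real P2P client performs.

```python
import hashlib


def split_into_pieces(data: bytes, piece_size: int) -> list[bytes]:
    """Cut a file into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


class Peer:
    """A computer in the network holding some pieces of the file."""

    def __init__(self, name: str, pieces: dict[int, bytes]):
        self.name = name
        self.pieces = dict(pieces)

    def has_piece(self, index: int) -> bool:
        return index in self.pieces

    def send_piece(self, index: int) -> bytes:
        return self.pieces[index]


def download(num_pieces: int, peers: list[Peer]) -> tuple[bytes, dict[int, str]]:
    """Fetch every piece, rotating between the peers that hold it."""
    received, sources = {}, {}
    for index in range(num_pieces):
        holders = [peer for peer in peers if peer.has_piece(index)]
        if not holders:
            raise LookupError(f"no peer has piece {index}")
        chosen = holders[index % len(holders)]  # spread the transfer load
        received[index] = chosen.send_piece(index)
        sources[index] = chosen.name
    return b"".join(received[i] for i in range(num_pieces)), sources


original = b"peer-to-peer file transfer"
pieces = split_into_pieces(original, 8)
peer_a = Peer("A", dict(enumerate(pieces)))          # has the whole file
peer_b = Peer("B", {i: pieces[i] for i in (1, 2, 3)})  # has part of it

assembled, sources = download(len(pieces), [peer_a, peer_b])
print(sources)  # {0: 'A', 1: 'B', 2: 'A', 3: 'B'}
print(hashlib.sha256(assembled).digest() == hashlib.sha256(original).digest())  # True
```

Once the download finishes, the user's computer holds all the pieces and could itself be added to the list of peers, which is how the file transfer load ends up shared.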
How to Use a P2P Network Efficiently?
First, secure your network with privacy solutions. Below are some of
the measures that keep a P2P network secure:
Share and download legal files:
Double-check the files that are being downloaded before sharing them
with other employees. It is very important to make sure that only legal files
are downloaded.
Design strategy for sharing:
Design a strategy that suits the underlying architecture in order to
manage applications and underlying data.
Keep security practices up-to-date:
Keep a check on the cyber security threats which might prevail in the
network. Invest in good quality software that can sustain attacks and
prevent the network from being exploited. Update your software regularly.
Scan all downloads:
Constantly check and scan all files for viruses before
downloading them. This helps to ensure that only safe files are downloaded.
If any file with a potential threat is detected, report it to the IT
staff.
Proper shutdown of P2P networking after use:
It is very important to shut down the software correctly to prevent
third parties from gaining unnecessary access to the files in the network. If
the windows are closed after file sharing but the software is still active, an
unauthorized user can still gain access to the network, which can be a major
security breach.
Applications of P2P Network
Below are some of the common uses of P2P network:
File sharing:
P2P network is the most convenient, cost-efficient method for file sharing for
businesses. Using this type of network there is no need for intermediate
servers to transfer the file.
Blockchain:
The P2P architecture is based on the concept of decentralization. When a
peer-to-peer network is used for a blockchain, every peer helps maintain
a complete replica of the records, which ensures the accuracy of
the data. At the same time, peer-to-peer networks also help ensure
security.

Direct messaging:
P2P network provides a secure, quick, and efficient way to
communicate. This is possible due to the use of encryption at both the peers
and access to easy messaging tools.
Collaboration:
The easy file sharing also helps to build collaboration among other
peers in the network.
File sharing networks:
Many P2P file sharing networks, such as G2 and eDonkey, have
popularized peer-to-peer technologies.
Content distribution:
In a P2P network, unlike the client-server system, the clients can
both provide and use resources. Thus, the content serving capacity of
P2P networks can actually increase as more users begin to access the
content.
IP Telephony:
Skype is one good example of a P2P application in VoIP.
Advantages of P2P Network
Easy to maintain:
The network is easy to maintain because each node is independent of
the other.
Less costly:
Since each node acts as a server, the cost of a central
server is saved. Thus, there is no need to buy an expensive server.
No network manager:
In a P2P network, each user manages his or her own computer,
so there is no need for a network manager.
Adding nodes is easy:
Adding, deleting, and repairing nodes in this network
is easy.
Less network traffic:
In a P2P network, there is less network traffic than in a client/ server
network.

Disadvantages of P2P Network

Data is vulnerable:
Because there is no central server, there is no central backup, so data
is always at risk of being lost.
Less secure:
It becomes difficult to secure the complete network because each node
is independent.
Slow performance:
In a P2P network, each computer is accessed by other computers in
the network, which slows down the performance of the user's computer.
Files hard to locate:
In a P2P network, the files are not centrally stored, rather they are
stored on individual computers which makes it difficult to locate the files.
Examples of P2P networks

P2P networks can be broadly categorized into three levels.
• The first level is the basic level, which uses a USB connection to create
a P2P network between two systems.
• The second is the intermediate level, which involves the usage of copper
wires in order to connect more than two systems.
• The third is the advanced level, which uses software to establish
protocols in order to manage numerous devices across the internet.

Difference between Concurrency and Parallelism

S.NO | Concurrency | Parallelism
1. | Concurrency is the task of running and managing multiple computations at the same time. | Parallelism is the task of running multiple computations simultaneously.
2. | Concurrency is achieved through the interleaved operation of processes on the central processing unit (CPU), in other words by context switching. | Parallelism is achieved by using multiple central processing units (CPUs).
3. | Concurrency can be done using a single processing unit. | Parallelism cannot be done using a single processing unit; it needs multiple processing units.
4. | Concurrency increases the amount of work finished at a time. | Parallelism improves the throughput and computational speed of the system.
5. | Concurrency deals with a lot of things simultaneously. | Parallelism does a lot of things simultaneously.
6. | Concurrency is a non-deterministic control flow approach. | Parallelism is a deterministic control flow approach.
7. | In concurrency, debugging is very hard. | In parallelism, debugging is also hard, but simpler than in concurrency.
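The interleaving mentioned in row 2 can be illustrated on a single thread. The sketch below is a toy round-robin scheduler built from Python generators; the task names and step counts are made up, and it switches tasks after every step, whereas a real operating system scheduler may switch at other points, which is where the non-determinism in row 6 comes from. Parallelism, by contrast, would need more than one processing unit and is not shown.

```python
from collections import deque


def task(name: str, steps: int):
    """A computation that can be paused after each step."""
    for step in range(1, steps + 1):
        yield f"{name}:{step}"


def run_concurrently(tasks) -> list[str]:
    """Interleave tasks on one processing unit by switching after every step."""
    ready = deque(tasks)
    trace = []
    while ready:
        current = ready.popleft()
        try:
            trace.append(next(current))  # run one step of the current task
            ready.append(current)        # then switch to the next task
        except StopIteration:
            pass                         # finished tasks leave the queue
    return trace


print(run_concurrently([task("A", 3), task("B", 2)]))
# ['A:1', 'B:1', 'A:2', 'B:2', 'A:3']
```

Both tasks make progress over the same period even though only one step runs at any instant, which is the sense in which concurrency manages multiple computations on a single CPU.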

Explain Microsoft Windows Azure?
Azure is Microsoft’s cloud platform, just like Google has its Google
Cloud and Amazon has its Amazon Web Services (AWS). Generally, it is
a platform through which we can use Microsoft’s resources. For example, to
set up a huge server, we will require huge investment, effort, physical space,
and so on. In such situations, Microsoft Azure comes to our rescue. It will
provide us with virtual machines, fast processing of data, analytical and
monitoring tools, and so on to make our work simpler. The pricing of Azure is
also simple and cost-effective. It is popularly termed “Pay As You Go”, which
means you pay only for what you use.
Azure History
Microsoft unveiled Windows Azure in October 2008, and it went live
in February 2010. Later, in 2014, Microsoft changed its name from Windows
Azure to Microsoft Azure. Azure provided a service platform for .NET services,
SQL Services, and many Live Services. Many people were still very skeptical
about “the cloud”. As an industry, we were entering a brave new world with
many possibilities. Microsoft Azure keeps getting bigger and better, with
more tools and more functionality being added. It has had two
releases so far: the well-known Microsoft Azure v1 and the
later Microsoft Azure v2. Microsoft Azure v1 was more JSON script-driven
than the new version v2, which has an interactive UI for simplification and easy
learning. Microsoft Azure v2 is still in preview.
Azure can help our business in the following ways-
Capital less: We don’t have to worry about the capital as Azure cuts out the
high cost of hardware. You simply pay as you go and enjoy a subscription-
based model that’s kind to your cash flow. Also, setting up an Azure account
is very easy. You simply register in Azure Portal and select your required
subscription and get going.
Less Operational Cost: Azure has a low operational cost because it runs on
its own servers, whose only job is to keep the cloud functional and bug-free. It is
usually far more reliable than your own on-location server.
Cost Effective: If we set up a server on our own, we need to hire a tech
support team to monitor it and make sure things are working fine. Also,
there might be situations where the tech support team takes too much
time to solve an issue with the server. In this regard, Azure is far more
pocket-friendly.
Easy Back-Up and Recovery options: Azure keeps backups of all your
valuable data. In disaster situations, you can recover all your data in a single
click without your business getting affected. Cloud-based backup and
recovery solutions save time, avoid large up-front investments and roll up
third-party expertise as part of the deal.
Easy to implement: It is very easy to implement your business models in
Azure. With a couple of one-click activities, you are good to go. There are even
several tutorials to help you learn and deploy faster.

Better Security: Azure provides more security than local servers. You can be
carefree about your critical data and business applications, as they stay safe in
the Azure Cloud. Even in natural disasters, where local resources can be harmed,
Azure comes to the rescue. The cloud is always on.
Work from anywhere: Azure gives you the freedom to work from anywhere
and everywhere. It just requires a network connection and credentials. And
with most serious Azure cloud services offering mobile apps, you’re not
restricted by which device you have to hand.
Increased collaboration: With Azure, teams can access, edit and share
documents anytime, from anywhere. They can work and achieve future goals
hand in hand. Another advantage of Azure is that it preserves records of
activity and data. Timestamps are one example of Azure’s record-keeping.
Timestamps improve team collaboration by establishing transparency and
increasing accountability.

Microsoft Azure Services


Compute: Includes Virtual Machines, Virtual Machine Scale Sets, Functions
for serverless computing, Batch for containerized batch workloads, Service
Fabric for microservices and container orchestration, and Cloud Services for
building cloud-based apps and APIs.
Networking: With Azure, you can use a variety of networking tools, such as
Virtual Network, which can connect to on-premises data centers; Load
Balancer; Application Gateway; VPN Gateway; Azure DNS for domain hosting;
Content Delivery Network; Traffic Manager; ExpressRoute dedicated private
network fiber connections; and Network Watcher monitoring and diagnostics.
Storage: Includes Blob, Queue, File, and Disk Storage, as well as a Data Lake
Store, Backup, and Site Recovery, among others.
Web + Mobile: Creating Web + Mobile applications is very easy as it includes
several services for building and deploying applications.
Containers: Azure includes Container Service, which supports Kubernetes,
DC/OS or Docker Swarm, and Container Registry, as well as tools for
microservices.
Databases: Azure also includes several SQL-based databases and related
tools.
Data + Analytics: Azure has big data tools such as HDInsight for Hadoop,
Spark, R Server, HBase, and Storm clusters.
AI + Cognitive Services: With Azure, you can develop applications with
artificial intelligence capabilities using services such as the Computer
Vision API, Face API, Bing Web Search, Video Indexer, and Language
Understanding Intelligent Service.
Internet of Things: Includes IoT Hub and IoT Edge services that can be
combined with a variety of machine learning, analytics, and communications
services.
Security + Identity: Includes Security Center, Azure Active Directory, Key
Vault, and Multi-Factor Authentication Services.

Developer Tools: Includes cloud development services like Visual Studio
Team Services, Azure DevTest Labs, HockeyApp mobile app deployment and
monitoring, Xamarin cross-platform mobile development, and more.
Amazon Web Services - Cloud Computing
In 2006, Amazon Web Services (AWS) started to offer IT services to the
market in the form of web services, which is now known as cloud
computing. With this cloud, we need not plan far in advance for servers
and other IT infrastructure, which takes up much time. Instead, these
services can spin up hundreds or thousands of servers in minutes and
deliver results faster. We pay only for what we use, with no up-front
expenses and no long-term commitments, which makes AWS cost efficient.
Today, AWS provides a highly reliable, scalable, low-cost infrastructure
platform in the cloud that powers a multitude of businesses in 190
countries around the world.

AWS CAF Platform perspective capabilities


Platform architecture – Establish and maintain guidelines, principles,
patterns, and guardrails for your cloud environment. A well-architected cloud
environment will help you accelerate implementation, reduce risk, and drive
cloud adoption. Create consensus within your organization for enterprise
standards that will drive cloud adoption. Define best practice
blueprints and guardrails to facilitate authentication, security, networking,
and logging and monitoring. Consider what workloads you may need to
retain on-premises due to latency, data processing, or data residency
requirements. Evaluate such hybrid cloud use cases as cloud bursting,
backup and disaster recovery to the cloud, distributed data processing, and
edge computing.
Data architecture – Design and evolve a fit-for-purpose data and analytics
architecture. A well-designed data and analytics architecture can help you
reduce complexity, cost, and technical debt while enabling you to gain
actionable insights from exponentially growing data volumes. Adopt a layered
and modular architecture that will allow you to use the right tool for the right
job as well as iteratively and incrementally evolve your architecture to meet
emerging requirements and use cases.

Platform engineering – Build a compliant multi-account cloud environment
with enhanced security features, and packaged, reusable cloud products. An
effective cloud environment will allow your teams to easily provision new
accounts, while ensuring that those accounts conform to organizational
policies. A curated set of cloud products will enable you to codify best
practices, helping you with governance while increasing the speed and
consistency of your cloud deployments. Deploy your best practice blueprints,
and detective and preventative guardrails. Integrate your cloud environment
with your existing ecosystem to enable desired hybrid cloud use cases.
Data engineering – Automate and orchestrate data flows across your
organization. Automated data and analytics platforms and pipelines may help
you improve productivity and accelerate time to market. Form cross-
functional data engineering teams comprising infrastructure and operations,
software engineering, and data management. Leverage metadata to
automate pipelines that consume raw and produce optimized data. Implement
relevant architectural guardrails and security controls, as well as monitoring,
logging, and alerting to help with pipeline failures. Identify common data
integration patterns and build reusable blueprints that abstract away the
complexity of pipeline development. Share blueprints with business analysts
and data scientists and enable them to operate using self-service methods.
Provisioning and orchestration – Create, manage, and distribute catalogs of
approved cloud products to end users. Maintaining consistent infrastructure
provisioning in a scalable and repeatable manner becomes more complex as
your organization grows.
Modern application development – Build well-architected cloud-native
applications. Modern application development practices can help you realize
the speed and agility that go with innovation. Using containers and serverless
technologies can help you optimize your resource utilization and
automatically scale from zero to peak demands. Consider decoupling your
applications by building them as independent microservices leveraging event-
driven architectures. Implement security in all layers and at each stage of the
application development lifecycle.
Continuous integration and continuous delivery – Evolve and improve
applications and services at a faster pace than organizations using traditional
software development and infrastructure management processes.
Adopting DevOps practices with continuous integration, testing,
and deployment will help you to become more agile so that you can innovate
faster, adapt to changing markets better, and grow more efficient at driving
business results. Implement continuous integration and continuous delivery
(CI/CD) pipelines.
Explain the cloud infrastructure from the Microsoft Windows Azure
perspective
Windows Azure
Windows Azure is a cloud-based operating system that enables business
applications, services and workloads to run in the cloud itself. It works
much like a traditional operating system on any hardware platform: it
allows applications to run in a virtual environment by providing them with
the necessary physical hardware components and a set of services.
Microsoft designed Azure to help .NET professionals strengthen their
capability to develop ASP.NET websites and XML and WCF web services. Its
main advantage is that it offers a distributed operating system in which
we can build, test, and deploy applications without worrying about the
front-end interface. Moreover, when using PaaS we need not think about the
choice of hardware, capacity, server, location of disk space, or computer
names. Interactions with this kind of virtual storage use standard
protocols such as REST and HTTP.
Benefits of Windows Azure
The Azure platform reduces the cost of IT management and eliminates the
need to build on-premises resources. Users can develop, alter, test and
deploy applications over the web with a minimum of resources. Likewise, a
user can create, debug, test and distribute web services very quickly. In
Azure, there is no need to purchase or configure a security solution, nor
to install and configure a database cluster. Azure carries out expensive
services off-premises, including large data processing, high-volume
computation and batch processing.
Windows Azure Platform
The Windows Azure platform is the foundation for running applications and
keeping data in the cloud. It comprises compute services, storage services
and the fabric. Windows Azure offers a wide range of capabilities:
computing services to run applications, storage services, and a framework
that supports several applications, hosts services, and manages them all
centrally.
The Azure platform is a group of three cloud technologies as shown below:

Windows Azure
Windows Azure provides a virtual Windows runtime for executing
applications and storing data on computers in Microsoft data centers. It
includes computational services, basic storage, queues, web servers,
management services, and load balancers. It also offers a local
development fabric for building and testing services before they are
deployed to Windows Azure in the cloud. Applications developed for
Windows Azure scale better, are more reliable, and require less
administration than those developed with the traditional Windows
programming model. Users pay only for the computing and storage they
consume, instead of maintaining an enormous set of servers.
AppFabric (.NET Services)
AppFabric is the major backbone of the Windows Azure platform. It is a
cloud-based infrastructure service for applications running in the cloud,
and it allows the creation of combined access and distributed messaging
across clouds and enterprises. The goal of the fabric is to bring together
massive distributed processing power in a unified manner. AppFabric is a
middleware component that consists of services such as Access Control,
Workflow Service and Service Bus.

SQL Azure
SQL Azure offers the core RDBMS as a service in the cloud environment.
Developers can access it using Tabular Data Stream (TDS), which is the
typical way to access on-premises SQL Server instances. Developers can
create tables, indexes and views, use stored procedures and define
triggers, just as in SQL Server. Application software can access SQL Azure
data using Entity Framework, ADO.NET and other Windows data access
interfaces. The significant benefit of SQL Azure is that management
requirements are greatly reduced, because developers need not worry about
operations such as monitoring disk usage and servicing log files.

Azure Marketplace
The Windows Azure Marketplace offers applications, data and web services
from leading commercial data providers and authorized public data sources.
It is divided into the following two categories:
App Market: It exposes the applications or services built by developers to
potential customers, so that they can easily choose from them to meet their
needs.

Data Market: Today, many organizations express their readiness to sell many
kinds of data, including demographic information, financial information, legal
information, and much more. Hence, Data Market offers a chance to expose
their offerings to more customers using Microsoft’s cloud platform. In simple
words, Data Market provides a single place to find, buy, and access a variety
of commercial datasets.
Azure Development Life Cycle
• Create a Windows Azure account and log in using Microsoft Live ID.
• Prepare the development fabric to build an application in the local cloud
platform.
• Test the application in the development fabric.
• Package the application for cloud deployment.
• Test the application on Windows Azure in the cloud.
• Deploy the application in the production farm.

Open-Source Cloud
Open-source cloud is any cloud service or solution that is built using
open-source software and technologies. This includes any public, private or
hybrid cloud model providing SaaS, IaaS, PaaS or XaaS built and operated
entirely on open-source technologies.
Techopedia Explains Open-Source Cloud
An open-source cloud is designed and developed using open-source
technologies and software such as:
• Open-source operating system, DBMS and software development
frameworks
• Open-source workflow and business applications
• Virtualization stack (hypervisor, virtualization management)
• Hardware with open-source firmware
Moreover, open-source cloud may also refer to any cloud service that
provides open-source software or services to end users or businesses.
Businesses and cloud providers can customize open-source cloud solutions
to a greater extent, which is generally prohibited in closed-source cloud
models. Open-source cloud solutions are generally interoperable with any
back-end platform and can easily be migrated to a different IT
infrastructure environment. OpenNebula, OpenStack and VirtualBox are
common examples of open-source cloud software.

What is Cloud Storage?


Cloud Storage is a mode of computer data storage in which digital data
is stored on servers in off-site locations. The servers are maintained by a third-
party provider who is responsible for hosting, managing, and securing data
stored on its infrastructure. The provider ensures that data on its servers is
always accessible via public or private internet connections.
Cloud Storage enables organizations to store, access, and maintain
data without owning and operating their own data centers, moving expenses
from a capital expenditure model to an operational one. Cloud Storage is
scalable, allowing organizations to expand or reduce their data footprint
depending on need. Google Cloud provides a variety of scalable options for
organizations to store their data in the cloud.
How does Cloud Storage work?
Cloud Storage uses remote servers to save data, such as files, business
data, videos, or images. Users upload data to servers via an internet
connection, where it is saved on a virtual machine on a physical server. To
maintain availability and provide redundancy, cloud providers will often
spread data to multiple virtual machines in data centers located across the
world. If storage needs increase, the cloud provider will spin up more virtual
machines to handle the load. Users can access data in Cloud Storage through
an internet connection and software such as a web portal, browser, or mobile
app, via an application programming interface (API).
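
The redundancy idea described above can be sketched in a few lines of Python. This is only an illustrative toy, not how any particular provider implements storage: the class names StorageNode and ReplicatedStore, the hash-based placement rule, and the default of two replicas are all assumptions made for the example.

```python
import hashlib


class StorageNode:
    """Hypothetical stand-in for one virtual machine that holds objects."""

    def __init__(self, name):
        self.name = name
        self.objects = {}   # key -> stored bytes
        self.online = True  # flipped to False to simulate an outage


class ReplicatedStore:
    """Toy store that writes every object to several nodes for redundancy."""

    def __init__(self, nodes, replicas=2):
        if replicas > len(nodes):
            raise ValueError("need at least as many nodes as replicas")
        self.nodes = nodes
        self.replicas = replicas

    def _placement(self, key):
        # Hash the key to pick a starting node, then use the following nodes.
        digest = hashlib.sha256(key.encode()).hexdigest()
        start = int(digest, 16) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.replicas)]

    def put(self, key, data):
        # Upload: save a copy of the data on each chosen node.
        for node in self._placement(key):
            node.objects[key] = data

    def get(self, key):
        # Download: read from the first replica that is still reachable.
        for node in self._placement(key):
            if node.online and key in node.objects:
                return node.objects[key]
        raise KeyError(key)
```

With three nodes and two replicas, an uploaded object stays readable even if the first node holding it goes offline, which is the availability property the paragraph describes.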

Cloud Storage is available in four different models:


Public
Public Cloud Storage is a model where an organization stores data in a
service provider’s data centers that are also utilized by other companies. Data
in public Cloud Storage is spread across multiple regions and is often offered
on a subscription or pay-as-you-go basis. Public Cloud Storage is considered
to be “elastic” which means that the data stored can be scaled up or down
depending on the needs of the organization. Public cloud providers typically
make data available from any device such as a smartphone or web portal.
Private
Private Cloud Storage is a model where an organization uses its own
servers and data centers to store data within its own network. Alternatively,
organizations can contract with cloud service providers for dedicated
servers and private connections that are not shared with any other organization.
Private clouds are typically used by organizations that require more control
over their data and have stringent compliance and security requirements.

Hybrid
A hybrid cloud model is a mix of private and public cloud storage
models. A hybrid cloud storage model allows organizations to decide which
data they want to store in which cloud. Sensitive data and data that must
meet strict compliance requirements may be stored in a private cloud while
less sensitive data is stored in the public cloud. A hybrid cloud storage
model typically has a layer of orchestration to integrate the two clouds. A
hybrid cloud offers flexibility and allows organizations to still scale up
with the public cloud if the need arises.
Multicloud
A multicloud storage model is when an organization sets up more than
one cloud model from more than one cloud service provider (public or private).
Organizations might choose a multicloud model if one cloud vendor offers
certain proprietary apps, if the organization requires data to be stored in
a specific country, if various teams are trained on different clouds, or if
the organization needs to serve different requirements that are not stated
in the providers’ Service Level Agreements. A multicloud model offers
organizations flexibility and redundancy.

Advantages of Cloud Storage


Total cost of ownership
Cloud Storage enables organizations to move from a capital expenditure
to an operational expenditure model, allowing them to adjust budgets and
resources quickly.
Elasticity
Cloud Storage is elastic and scalable, meaning that it can be scaled up
(more storage added) or down (less storage needed) depending on the
organization’s needs.
Flexibility
Cloud Storage offers organizations flexibility on how to store and access
data, deploy and budget resources, and architect their IT infrastructure.
Security
Most cloud providers offer robust security, including physical security
at data centers and cutting edge security at the software and application
levels. The best cloud providers offer zero trust architecture, identity
and access management, and encryption.
Sustainability
One of the greatest costs when operating on-premises data centers is
the overhead of energy consumption. The best cloud providers operate on
sustainable energy through renewable resources.
Redundancy
Redundancy (replicating data on multiple servers in different locations)
is an inherent trait in public clouds, allowing organizations to recover from
disasters while maintaining business continuity.
Disadvantages of Cloud Storage
Compliance
Certain industries such as finance and healthcare have stringent
requirements about how data is stored and accessed. Some public cloud
providers offer tools to maintain compliance with applicable rules and
regulations.
Latency
Traffic to and from the cloud can be delayed because of network traffic
congestion or slow internet connections.
Control
Storing data in public clouds relinquishes some control over access and
management of that data, entrusting that the cloud service provider will
always be able to make that data available and maintain its systems and
security.
Outages
While public cloud providers aim to ensure continuous availability,
outages sometimes do occur, making stored data unavailable.
Challenges
Storing data in the cloud is not a simple task. Despite its flexibility
and convenience, it presents several challenges for customers. Customers
must be able to:
• Provision additional storage on demand.
• Know and restrict the physical location of the stored data.
• Verify how data was erased.
• Have access to a documented process for disposing of data storage
hardware.
• Have administrator access control over data.

Explain energy use and ecological impact in cloud computing

Energy Efficiency in Cloud Computing


Cloud computing is internet-based computing that provides metered
services to consumers. It means accessing data from a centralized pool of
compute resources that can be ordered and consumed on demand. It also
provides computing resources through virtualization over the internet.
The data center is the most prominent component of cloud computing: it
contains the collection of servers on which business information is stored
and applications run. A data center, which includes servers, cables, air
conditioners, network equipment and so on, consumes a great deal of power
and releases a huge amount of carbon dioxide (CO2) into the environment.
One of the most important challenges faced in cloud computing is the
optimization of energy utilization. Hence the concept of green cloud
computing came into existence.
There are multiple techniques and algorithms used to minimize energy
consumption in the cloud.
Techniques include:
• Dynamic Voltage and Frequency Scaling (DVFS)
• Virtual Machine (VM) Migration and VM Consolidation
Algorithms are:
• Maximum Bin Packing
• Power Expand Min-Max and Minimization Migrations
• Highest Potential growth

The main purpose of all these approaches is to optimize the energy utilization
in cloud.
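
To make VM consolidation concrete, the sketch below packs VM loads onto as few hosts as possible so that the remaining hosts can be powered down. It uses the classic first-fit decreasing bin-packing heuristic as a stand-in; it is not necessarily the "Maximum Bin Packing" algorithm named above, and the function name consolidate and the percentage-based loads are assumptions for illustration.

```python
def consolidate(vm_loads, host_capacity):
    """Pack VM loads onto hosts using first-fit decreasing bin packing.

    vm_loads: load of each VM, e.g. as a percentage of one host's CPU.
    host_capacity: the most load a single host can carry.
    Returns a list of hosts, each a list of the VM loads placed on it.
    """
    hosts = []
    # Place the largest VMs first; this usually leaves fewer gaps.
    for load in sorted(vm_loads, reverse=True):
        if load > host_capacity:
            raise ValueError("a VM load exceeds the host capacity")
        for host in hosts:
            # First fit: use the first already-open host with enough room.
            if sum(host) + load <= host_capacity:
                host.append(load)
                break
        else:
            # No open host has room, so power on another one.
            hosts.append([load])
    return hosts
```

For example, five VMs with loads of 50, 20, 40, 30 and 60 percent fit on two hosts of capacity 100 instead of five, so three servers could be switched off or put to sleep.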
NIST defines cloud computing as follows: “Cloud computing is a model for
enabling ubiquitous, convenient, on-demand network access to a shared
pool of configurable computing resources (e.g., networks, servers, storage,
applications, and services) that can be rapidly provisioned and released with
minimal management effort or service provider interaction.” Nowadays,
most business enterprises and individual IT companies are opting for the
cloud in order to share business information.
The main expectation of a cloud service consumer is to have a reliable
service. To satisfy this expectation, data centers are established all over
the world, and each data center contains thousands of servers. Even a
small workload on a server consumes 50% of the power supply. Cloud service
providers ensure reliable, load-balanced services to consumers around the
world by keeping servers on all the time. To satisfy this SLA, the provider
has to supply power continuously to its data centers, which leads to a huge
amount of energy use by the data center and simultaneously increases the
cost of investment.
The major challenge is to use energy efficiently and hence develop
eco-friendly cloud computing.
Idle servers and resources in a data center waste a huge amount of
energy. Energy is also wasted when a server is overloaded. A few
techniques, such as load balancing, VM virtualization, VM migration,
resource allocation and job scheduling, are used to solve the problem. It
has also been found that transporting data between data centers and home
computers can consume even larger amounts of energy than storing it.
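
A rough way to see why lightly loaded servers waste energy is a simple linear power model, sketched below. The model itself, the 200 W peak figure and the function names are assumptions chosen for illustration; only the idea that a lightly loaded server already draws about half of its power comes from the notes above.

```python
def server_power(utilization, peak_watts=200.0, idle_fraction=0.5):
    """Estimated power draw in watts for a utilization between 0 and 1.

    Assumes power grows linearly from an idle floor up to peak_watts.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    idle_watts = peak_watts * idle_fraction
    return idle_watts + (peak_watts - idle_watts) * utilization


def cluster_power(utilizations, **model):
    """Total draw of the powered-on servers; switched-off ones are omitted."""
    return sum(server_power(u, **model) for u in utilizations)
```

Under these assumed numbers, four servers each running at 25% utilization draw 500 W in total, while one server carrying the same work at 100% draws 200 W. That gap is what load balancing, VM migration and consolidation try to recover.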
Green Computing
Green computing is the eco-friendly use of computers and their
resources. It is also defined as the study and practice of designing,
engineering, manufacturing and disposing of computing resources with
minimal environmental damage.

Green cloud computing is using Internet computing services from a
service provider that has taken measures to reduce its environmental
effect. In other words, green cloud computing is cloud computing with less
environmental impact.
Some measures taken by Internet service providers to make their
services greener are:
• Use renewable energy sources.
• Make the data center more energy efficient, for example by lowering
power usage effectiveness (PUE).
• Reuse waste heat from computer servers (e.g. to heat nearby
buildings).
• Make sure that all hardware is properly recycled at the end of its life.
• Use hardware that has a long lifespan and contains little to no toxic
materials.
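
PUE is commonly defined as the total energy used by the facility divided by the energy used by the IT equipment alone, so a value closer to 1.0 means less overhead for cooling, lighting and power distribution. The short sketch below just encodes that ratio; the function name and the sample figures in the usage note are illustrative assumptions, not measurements.

```python
def pue(total_facility_kwh, it_equipment_kwh):
    """Power usage effectiveness: total facility energy / IT energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    if total_facility_kwh < it_equipment_kwh:
        raise ValueError("total energy cannot be less than IT energy")
    return total_facility_kwh / it_equipment_kwh
```

For instance, a facility that draws 1500 kWh while its servers, storage and network gear use 1000 kWh has a PUE of 1.5, meaning half a unit of overhead for every unit of useful IT energy.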
What is the Shared Responsibility Model?
The Shared Responsibility Model is a security and compliance
framework that outlines the responsibilities of cloud service providers
(CSPs) and customers for securing every aspect of the cloud environment,
including hardware, infrastructure, endpoints, data, configurations,
settings, operating system (OS), network controls and access rights.
In its simplest terms, the Shared Responsibility Model dictates that
the cloud provider, such as Amazon Web Services (AWS), Microsoft Azure,
or Google Cloud Platform (GCP), must monitor and respond to security
threats related to the cloud itself and its underlying infrastructure.
Meanwhile, end users, including individuals and companies, are
responsible for protecting data and other assets they store in any cloud
environment.
Unfortunately, this notion of shared responsibility can be
misunderstood, leading to the assumption that cloud workloads – as well
as any applications, data or activity associated with them – are fully
protected by the cloud provider. This can result in users unknowingly
running workloads in a public cloud that are not fully protected, making
them vulnerable to attacks that target the operating system, data or
applications. Even securely configured workloads can become a target at
runtime, as they are vulnerable to zero-day exploits.
The Shared Responsibility Model in practice
Direct Control
While the Shared Responsibility Model is based on the idea that two
or more parties play a role in ensuring security of distinct elements within
the public cloud environment, it is important to note that the customer
and CSP do not share responsibility for the same asset.
Rather, the CSP or the customer has full and complete responsibility for
the security of all assets under their direct control, regardless of the service
model type.
For example, the customer will always have responsibility for data
security, compliance and access regardless of whether they are following a
SaaS, PaaS or IaaS model. Practically speaking, this is because CSPs have
no visibility into data that is stored in the public cloud and therefore
cannot effectively manage data security or access.
Customers are typically also responsible for:
• Identity and Access Management (IAM)
• User security and credentials
• Endpoint security
• Network security
• Security of workloads and containers
• Configurations
• APIs and middleware
• Code
Meanwhile, the cloud provider, such as Amazon, Microsoft or Google, is
responsible for the areas over which it has direct control.
This typically includes security of:
• The physical layer and all associated hardware and infrastructure
• The virtualization layer
• Network controls and provider services
• Facilities that run cloud resources
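
The split described above can be written down as a simple lookup, which is sometimes a handy way to check who owns a given asset. The sketch below is only an illustration built from the two lists in this section; the set names, the function name responsible_party and the fallback answer are assumptions, and a real assignment always depends on the provider and the SLA.

```python
# Areas the notes assign to the customer.
CUSTOMER_AREAS = {
    "data",
    "identity and access management",
    "user security and credentials",
    "endpoint security",
    "network security",
    "workloads and containers",
    "configurations",
    "apis and middleware",
    "code",
}

# Areas the notes assign to the cloud service provider.
PROVIDER_AREAS = {
    "physical layer",
    "virtualization layer",
    "network controls and provider services",
    "facilities",
}


def responsible_party(area):
    """Return who secures an area, or defer to the SLA if it is unlisted."""
    key = area.strip().lower()
    if key in CUSTOMER_AREAS:
        return "customer"
    if key in PROVIDER_AREAS:
        return "provider"
    return "check the SLA"
```

An area that appears in neither list, such as a managed firewall, falls through to "check the SLA", which mirrors the point that some responsibilities vary by provider and agreement.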
Divided responsibilities
In some IaaS and PaaS models, security responsibilities may vary
depending on the cloud provider or the terms outlined in the service level
agreement (SLA).
For example, when it comes to a network control like a firewall, the
cloud service provider may be responsible for providing the firewall service.
However, it is up to the user to manage all other aspects such as
configuration, rules, monitoring and response. While both parties play a
role in the security element, the responsibilities are still clearly defined
and divided.
Likewise, if a customer is using a public cloud data storage service
offered by a CSP, then the cloud provider is responsible for all aspects of
that cloud datacenter, including security, monitoring, maintenance and
updating. However, the customer is still wholly responsible for securing
any data within the cloud environment, as well as ensuring only authorized
users can access it.
Based on the concept of divided responsibility, no party has
authority over another in terms of how they protect their assets. For
example, a customer cannot dictate how or when their CSP performs
monitoring and testing. That said, the service agreement should outline
the steps the provider will take to protect customers, as well as how
documentation for that activity will be shared. Typically, cloud vendors
produce regular audit reports to confirm that they are taking the necessary
and proper steps to protect their customers.
Shared Responsibility Model Advantages
While a shared security model is complex and requires careful
consideration and coordination between the CSP and customer, the
approach offers several important benefits to users. These include:
Efficiency: Though the customer bears significant levels of responsibility
under the Shared Responsibility Model, some key aspects of security – such
as security of hardware, infrastructure and the virtualization layer – are
almost always managed by the CSP. In a traditional on-premises model, these
aspects were managed by the customer. The shift to the cloud frees up IT staff
to refocus efforts on other tasks and needs, as well as dedicate available
resources and investments to those areas for which they bear responsibility.
Enhanced protection: Cloud service providers are hyper-focused on the
security of their cloud environment and typically dedicate significant
resources to ensuring their customers are fully protected. As part of the
service agreement, CSPs conduct robust monitoring and testing, as well as
timely patching and updating.
Expertise: CSPs often have a higher level of knowledge and expertise when it
comes to the emerging field of cloud security. When customers engage a cloud
vendor, they benefit from the partner organization’s experience, assets and
resources.
User Experience
User experience (UX) is a concept in computing system and application
design that studies and evaluates human feelings and expressions when
using such systems. UX facilitates and enables the development of computing
systems that are centered on ease of use and accessibility for a human user.
Driving a significant impact through UX for Cloud Applications
To implement UX into cloud applications successfully, you need to consider
some important aspects as mentioned below:
Know your user first
Researching your users before designing for them is very important and
should be routinely followed. Perform quality checks on how the user interacts
with the application and regulate its main features as per user preferences.
Trying to understand the expectations of the user will be vital in designing the
user interface of the application.
Smooth User flow
If you want your product to be user-friendly, it would be necessary to create
a user flow that is seamless, logical, and intuitive. You should know what
your customer exactly wants to create and accordingly aim for an easy and
clean user flow to offer a smooth and minimal user journey.
Discuss first
Discuss multiple ideas before implementing the design to bring out the best
possible ideas and outcomes to improve the UX. Always review your design
and share it with relevant stakeholders to gather suggestions and to
maximize user input and satisfaction.
Adopting the right approach: UX Design principles
• Keep designs simple
Designs and systems should be kept simple, and complexity should be avoided
wherever possible, as simplicity tends to yield the greatest levels of
user acceptance and interaction. Remove unwanted elements and focus on
what’s important.
• Make designs easy to use and consistent
Consistency means designs and functionalities are uniform across all your
pages and products. Creating an interface that feels familiar can comfort
users and makes navigation through pages easier.
• Create designs that are more transparent, effective, and efficient
for users
Your design should reduce the user’s work steps and minimize the time
taken to reach their goals when they interact with your product. This will
help build the user’s interest as well.
• The design and structure should look good, feel great and work in
auto-mode
Design is an act of communication. A good interface is an entry point for
any user, where trust is established between the customer and the product.

• Assist your customer with video tutorials, instructions, pop-ups, and
documentation
Video tutorials, instructions, pop-ups, and documentation will help the user
to get a clearer perspective and understand the functionalities. It is the
easiest way to engage and help your customer.
• Optimize loading time
Web page loading speed is a crucial part of a site’s usability. You must
ensure the product page loads quickly to keep the customer engaged and
interested, as well as to boost the site’s crawl rate.
• Construct a strong and well-organized information architecture
With a strong information architecture (IA), it becomes significantly easier
to make key decisions regarding new features and implementations, to
understand timelines for product changes, and to follow user behavior
through multiple processes.
Implementing User Experience best practices to build CloudEnsure
We are constantly striving to deliver the best user experiences for our
customers. To fulfill client requirements our technical team has been
dedicatedly working on introducing new functionalities backed by in-depth
research, requirement gathering, and crafting the best design solutions.
Different Software License models in Cloud Computing
Cloud computing licensing models are typically on-demand, pay-per-use,
and short-term. Clients are advised not to enter into long-term agreements
or contracts that commit them to fixed volumes or time periods. The main
licensing models are described below, along with the issues each one
raises in the cloud.
1. Enterprise wide Model –
In this model, an independent software vendor (ISV) licenses software to
a complete enterprise. This category of license covers installation and
use of the software by personnel of the enterprise. Enterprise licenses
are not intended to be used by service providers to resell the software
to their clients. Many vendors have recognized this need and offer
licenses specifically for service providers.
2. Concurrent Users Model –
In this model, the client purchases a pool of licenses. How many licenses
can be checked out concurrently depends on the category of software
involved. Check-in and check-out of a license is automatic, bound to a web
session or to the running of an application. Generally, it is implemented
via application hooks that call a license manager service.
This kind of licensing works the same way in the cloud as in a private
network, provided the license terms allow it. In a multi-client cloud
service, the provider may want to subdivide the license pool to impose
allocations on individual clients, which the license manager software
may not support.
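The check-out and check-in behaviour described above can be sketched as a small in-memory pool. This is only an illustration: the `LicensePool` class, its method names, and the per-client quota dictionary are hypothetical and do not correspond to any real license manager product.

```python
class LicensePool:
    """Toy concurrent-user license pool (hypothetical, for illustration only)."""

    def __init__(self, total_seats, client_quotas=None):
        self.total_seats = total_seats
        # Optional per-client caps, modelling a subdivided pool in a
        # multi-client cloud service.
        self.client_quotas = client_quotas or {}
        self.sessions = {}  # session id -> client name

    def check_out(self, session_id, client):
        """Grant a seat to a session if the pool and the client's quota allow it."""
        if session_id in self.sessions:
            return True  # this session already holds a seat
        if len(self.sessions) >= self.total_seats:
            return False  # whole pool is exhausted
        used_by_client = sum(1 for c in self.sessions.values() if c == client)
        quota = self.client_quotas.get(client, self.total_seats)
        if used_by_client >= quota:
            return False  # client has used up its allocation
        self.sessions[session_id] = client
        return True

    def check_in(self, session_id):
        """Release the seat held by a session, if any."""
        self.sessions.pop(session_id, None)

    def seats_in_use(self):
        return len(self.sessions)


pool = LicensePool(total_seats=3, client_quotas={"acme": 2, "globex": 1})
first = pool.check_out("a1", "acme")
second = pool.check_out("a2", "acme")
third = pool.check_out("a3", "acme")     # refused: acme's quota is 2
other = pool.check_out("g1", "globex")
pool.check_in("a1")                      # session ends, seat returns to the pool
retry = pool.check_out("a3", "acme")
```

A seat is tied to a session identifier here, mirroring the idea that check-out is bound to a web session or a running application.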
3. Ownership – Copyright Holder Model –
In this model, top cloud service vendors usually prefer a combination of
free or open-source software and software from in-house development
projects. Companies that own the copyright, whether by building the
software or by purchasing it, can use the software for any purpose.

4. Named User Model –
In this model, a license is bound to a particular user. That user is
licensed to deploy the software on any kind of device and on a number of
devices simultaneously. Generally, licenses are tied to the internal
directory service of the organization, but cloud-based services use
internet identity providers. Named user licenses are often sold to an
organization as a pool, which can then be allocated and reassigned as needed.
5. Site-Wide Model –
In this model, site licenses are priced according to the overall size of
the client's site. The premise is that the site corresponds to a local area
network (LAN) or an internal organizational network, and that the software
can be installed from a common source and used on this network. Because
cloud computing deliberately abstracts away physical site and network
boundaries, this kind of license is an inconvenient fit for the cloud.
6. Token Based Model –
In this model, a physical license key, such as a read-only drive or dongle,
must be connected to the host executing the software. It is convenient for
clients because the key can move easily from one machine to another.
However, clouds and most virtualized infrastructure do not give clients
physical access to servers, so this type of license is not very convenient
in the cloud.
7. Host ID-Based Model –
In this model, several server hardware components are queried to
generate a unique host ID. The client gets a license key from the provider
that is tied to the host ID. This does not work well in cloud-based systems
for several reasons. In the cloud, hardware is mostly abstracted from the
clients. Even if a host ID can be created for the physical device, the
virtual server may be moved to other hardware as part of managing the
cloud. Clouds are designed for automated provisioning and flexible
scaling, which cannot be achieved if the software provider must be
contacted every time a host is moved or provisioned.
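To make the migration problem concrete, here is a minimal sketch of a host ID derived by hashing hardware identifiers. The component names, the sample values, and the choice of SHA-256 are assumptions made for illustration; real vendors use their own schemes.

```python
import hashlib


def host_id(components):
    """Derive a host ID by hashing hardware identifiers in a fixed key order."""
    joined = "|".join(f"{key}={components[key]}" for key in sorted(components))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


def license_valid(license_host_id, components):
    """A license is valid only on the host whose ID it was issued for."""
    return license_host_id == host_id(components)


# Hypothetical hardware identifiers of the host the license was issued for.
original_host = {
    "mac": "00:1a:2b:3c:4d:5e",
    "cpu_serial": "CPU-1234",
    "disk_serial": "DSK-9876",
}
# The same virtual server after the cloud moved it to other hardware.
migrated_host = {
    "mac": "00:aa:bb:cc:dd:ee",
    "cpu_serial": "CPU-5678",
    "disk_serial": "DSK-9876",
}

issued_license = host_id(original_host)
valid_before = license_valid(issued_license, original_host)
valid_after = license_valid(issued_license, migrated_host)
```

After the move the check fails even though nothing changed from the client's point of view, which is why this model clashes with automated provisioning.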
8. Free Open Source Model –
In this model, companies can get the software from open-source project
sites rather than from a paid commercial provider. This kind of licensing is
good for cloud service vendors as well as consumers. Customers are not
constrained, because the software is open source: any client or consumer
can use it, with no limits on the number of users, the location, or the
size of the hardware.
Cloud Computing Applications
Cloud service providers provide various applications in the field of art,
business, data storage and backup services, education, entertainment,
management, social networking, etc.

The most widely used cloud computing applications are given below –

1. Art Applications
Cloud computing offers various art applications for quickly and easily
designing attractive cards, booklets, and images. Some of the most commonly
used cloud art applications are given below:
i. Moo
Moo is one of the best cloud art applications. It is used for designing and
printing business cards, postcards, and mini cards.
ii. Vistaprint
Vistaprint allows us to easily design various printed marketing products such
as business cards, Postcards, Booklets, and wedding invitations cards.
iii. Adobe Creative Cloud
Adobe Creative Cloud is made for designers, artists, filmmakers, and other
creative professionals. It is a suite of apps that includes the Photoshop
image-editing program, Illustrator, InDesign, Typekit, Dreamweaver, XD, and
Audition.
2. Business Applications
Business applications are hosted by cloud service providers. Today, many
organizations rely on cloud business applications to grow their business.
The cloud also keeps business applications available to users 24/7.
The following are business applications of cloud computing -
i. MailChimp
MailChimp is an email publishing platform which provides various options
to design, send, and save templates for emails.
iii. Salesforce
Salesforce platform provides tools for sales, service, marketing, e-commerce,
and more. It also provides a cloud development platform.
iv. Chatter
Chatter helps us to share important information about the organization in
real time.
v. Bitrix24
Bitrix24 is a collaboration platform which provides communication,
management, and social collaboration tools.
vi. PayPal
PayPal offers a simple online payment mode using a secure internet
account. PayPal accepts payment through debit cards, credit cards, and
from PayPal account holders.

vii. Slack
Slack stands for Searchable Log of all Conversation and Knowledge. It
provides a user-friendly interface that helps us to create public and private
channels for communication.
viii. QuickBooks
QuickBooks works on the principle "Run Enterprise anytime, anywhere,
on any device." It provides online accounting solutions for businesses. It
allows more than 20 users to work simultaneously on the same system.
3. Data Storage and Backup Applications
Cloud computing allows us to store information (data, files, images, audios,
and videos) on the cloud and access this information using an internet
connection. Because the cloud provider is responsible for security, it
offers various backup and recovery applications for retrieving lost data.
A list of data storage and backup applications in the cloud is given below -
i. Box.com
Box provides an online environment for secure content management,
workflow, and collaboration. It allows us to store different files such as
Excel, Word, PDF, and images on the cloud. The main advantage of using Box
is that it provides drag-and-drop upload for files and integrates easily
with Office 365, G Suite, Salesforce, and more than 1400 other tools.
ii. Mozy
Mozy provides powerful online backup solutions for personal and
business data. It automatically schedules a backup each day at a specific
time.
iii. Joukuu
Joukuu provides a simple way to share and track cloud-based backup
files. Many users use Joukuu to search files and folders and to collaborate
on documents.
iv. Google G Suite
Google G Suite is one of the best cloud storage and backup applications. It
includes Google Calendar, Docs, Forms, Google+, and Hangouts, as well as
cloud storage and tools for managing cloud apps. The most popular app in
Google G Suite is Gmail, which offers free email services to users.
4. Education Applications
Cloud computing has become very popular in the education sector. It offers
various online distance learning platforms and student information portals
to students. The advantages of using the cloud in education are strong
virtual classroom environments, ease of access, secure data storage,
scalability, greater reach for students, and minimal hardware requirements
for the applications.
The following education applications are offered by the cloud -
i. Google Apps for Education
Google Apps for Education is the most widely used platform for free web-based
email, calendar, documents, and collaborative study.
ii. Chromebooks for Education
Chromebooks for Education is one of Google's most important projects. It
is designed to enhance innovation in education.
iii. Tablets with Google Play for Education
It allows educators to quickly bring the latest technology solutions into
the classroom and make them available to their students.
iv. AWS in Education
AWS cloud provides an education-friendly environment to universities,
community colleges, and schools.
5. Entertainment Applications
Entertainment industries use a multi-cloud strategy to interact with the target
audience. Cloud computing offers various entertainment applications such as
online games and video conferencing.
i. Online games
Today, cloud gaming has become one of the most important entertainment
media. It offers various online games that run remotely from the cloud.
Popular cloud gaming services include Shadow, GeForce Now, Vortex,
Project xCloud, and PlayStation Now.
ii. Video Conferencing Apps
Video conferencing apps provide a simple and instant connected experience.
They allow us to communicate with our business partners, friends, and
relatives using cloud-based video conferencing. The benefits of video
conferencing are that it reduces cost, increases efficiency, and removes
interoperability issues.
6. Management Applications
Cloud computing offers various cloud management tools which help admins
to manage all types of cloud activities, such as resource deployment, data
integration, and disaster recovery. These management tools also provide
administrative control over the platforms, applications, and infrastructure.
Some important management applications are -
i. Toggl
Toggl helps users to track allocated time period for a particular project.
ii. Evernote
Evernote allows you to sync and save your recorded notes, typed notes, and
other notes in one convenient place. It is available in both free and paid
versions.
It runs on platforms like Windows, macOS, Android, iOS, Browser, and Unix.
iii. Outright
Outright is used by managers for accounting. It helps to track income,
expenses, profits, and losses in real time.
iv. GoToMeeting
GoToMeeting provides video conferencing and online meeting apps, which
allow you to start a meeting with your business partners at any time, from
anywhere, using mobile phones or tablets. Using the GoToMeeting app, you
can perform management-related tasks such as joining meetings in
seconds, viewing presentations on the shared screen, getting alerts for
upcoming meetings, etc.
7. Social Applications
Social cloud applications allow a large number of users to connect with each
other using social networking applications such as Facebook, Twitter,
LinkedIn, etc.
The following are cloud-based social applications -
i. Facebook
Facebook is a social networking website which allows active users to share
files, photos, videos, status updates, and more with their friends,
relatives, and business partners using the cloud storage system. On
Facebook, we get notifications when our friends like and comment on our
posts.
ii. Twitter
Twitter is a social networking site. It is a microblogging system. It allows
users to follow high profile celebrities, friends, relatives, and receive news. It
sends and receives short posts called tweets.
iii. Yammer
Yammer is a team collaboration tool that allows a team of employees
to chat and share images, documents, and videos.
iv. LinkedIn
LinkedIn is a social network for students, freshers, and professionals.
Architecture of Cloud Computing
Cloud Computing Architecture :
The cloud architecture is divided into 2 parts i.e.
Frontend
Backend
The below figure represents an internal architectural view of cloud
computing.

Architecture of cloud computing is the combination of both SOA (Service
Oriented Architecture) and EDA (Event Driven Architecture). Client
infrastructure, application, service, runtime cloud, storage, infrastructure,
management, and security are all components of cloud computing
architecture.
Frontend :
Frontend of the cloud architecture refers to the client side of the cloud
computing system. It contains all the user interfaces and applications
that the client uses to access cloud computing services and resources,
for example a web browser used to access the cloud platform.
Client Infrastructure – Client Infrastructure is a part of the frontend
component. It contains the applications and user interfaces which are
required to access the cloud platform.
In other words, it provides a GUI (Graphical User Interface) to interact with
the cloud.
Backend :
Backend refers to the cloud itself, which is operated by the service
provider. It contains and manages the resources and provides security
mechanisms. It also includes large-scale storage, virtual applications,
virtual machines, traffic control mechanisms, deployment models, etc.
Application –
Application in backend refers to the software or platform that the client
accesses. It provides the service in the backend as per the client's
requirement.
Service –
Service in backend refers to the three major types of cloud-based services:
SaaS, PaaS, and IaaS. It also manages which type of service the user
accesses.
Runtime Cloud-
Runtime cloud in backend provides the execution and runtime
environment for the virtual machines.
Storage –
Storage in backend provides flexible and scalable storage service and
management of stored data.
Infrastructure –
Cloud Infrastructure in backend refers to the hardware and software
components of the cloud, including servers, storage, network devices,
virtualization software, etc.
Management –
Management in backend refers to the management of backend components
such as application, service, runtime cloud, storage, infrastructure, and
security mechanisms.
Security –
Security in backend refers to the implementation of different security
mechanisms in the backend to secure cloud resources, systems, files, and
infrastructure for end-users.
Internet –
Internet connection acts as the medium or a bridge between frontend and
backend and establishes the interaction and communication between
frontend and backend.
Database– Database in backend refers to the databases provided for storing
structured data, such as SQL and NoSQL databases. Examples of database
services include Amazon RDS, Microsoft Azure SQL Database, and Google
Cloud SQL.
Networking– Networking in backend refers to services that provide networking
infrastructure for applications in the cloud, such as load balancing, DNS, and
virtual private networks.
Analytics– Analytics in backend refers to services that provide analytics
capabilities for data in the cloud, such as warehousing, business
intelligence, and machine learning.

Benefits of Cloud Computing Architecture :
 Makes overall cloud computing system simpler.
 Improves data processing requirements.
 Helps in providing high security.
 Makes it more modularized.
 Results in better disaster recovery.
 Gives good user accessibility.
 Reduces IT operating costs.
 Provides high level reliability.
 Scalability.

What is Apache ZooKeeper?


Zookeeper is a distributed, open-source coordination service for
distributed applications. It exposes a simple set of primitives to implement
higher-level services for synchronization, configuration maintenance, and
groups and naming.
In a distributed system, there are multiple nodes or machines that need
to communicate with each other and coordinate their actions. ZooKeeper
provides a way to ensure that these nodes are aware of each other and can
coordinate their actions. It does this by maintaining a hierarchical tree of data
nodes called “Znodes“, which can be used to store and retrieve data and
maintain state information. ZooKeeper provides a set of primitives, such as
locks, barriers, and queues that can be used to coordinate the actions of
nodes in a distributed system. It also provides features such as leader
election, failover, and recovery, which can help ensure that the system is
resilient to failures. ZooKeeper is widely used in distributed systems such as
Hadoop, Kafka, and HBase, and it has become an essential component of
many distributed applications.
Why do we need it?
Coordination services: The integration and communication of services in a
distributed environment.
Coordination services are complex to get right. They are especially prone to
errors such as race conditions and deadlocks.
Race condition: Two or more systems try to perform the same task at the
same time, and the outcome depends on the order in which they run.
Deadlock: Two or more operations wait for each other indefinitely, so none
of them can proceed.
To make coordination in distributed environments easier, developers
created ZooKeeper, which relieves distributed applications of the
responsibility of implementing coordination services from scratch.
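A race condition is easiest to see on a single machine. The sketch below uses Python threads and a lock as a local stand-in for the kind of mutual exclusion a coordination service offers across machines; it does not use ZooKeeper itself, and the class and function names are hypothetical.

```python
import threading


class SafeCounter:
    """Counter whose read-modify-write step is protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could read the same old value
        # and one of the two updates would be lost (a race condition).
        with self._lock:
            self.value += 1


def run_workers(counter, workers=4, increments=1000):
    """Start several workers that all update the same counter."""
    def work():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


final_value = run_workers(SafeCounter(), workers=4, increments=1000)
```

With the lock in place the result is always workers × increments. In a distributed application the workers are separate processes on separate machines, so an in-process lock is not enough and a shared coordination service is needed instead.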
What is a distributed system?
Multiple computer systems working on a single problem.
It is a network that consists of autonomous computers that are connected
using distributed middleware.
Key Features: Concurrent, resource sharing, independent, global, greater
fault tolerance, and price/performance ratio is much better.
Key Goals: Transparency, Reliability, Performance, Scalability.
Challenges: Security, Fault, Coordination, and resource sharing.

Coordination Challenge
Coordination and configuration management are hard for a distributed
application that spans many systems. A common design stores the cluster
data on a master node, and worker (or slave) nodes get the data from this
master node. This design raises several issues:
 Single point of failure.
 Synchronization is not easy.
 Careful design and implementation are needed.
Apache Zookeeper
Apache Zookeeper is a distributed, open-source coordination service for
distributed systems. It provides a central place for distributed applications to
store data, communicate with one another, and coordinate activities.
Zookeeper is used in distributed systems to coordinate distributed processes
and services. It provides a simple, tree-structured data model, a simple API,
and a distributed protocol to ensure data consistency and availability.
Zookeeper is designed to be highly reliable and fault-tolerant, and it
handles high throughput, especially on read-dominant workloads.
Zookeeper is implemented in Java and is widely used in distributed
systems, particularly in the Hadoop ecosystem. It is an Apache Software
Foundation project and is released under the Apache License 2.0.
Architecture of Zookeeper
The ZooKeeper architecture consists of a hierarchy of nodes called
znodes, organized in a tree-like namespace similar to a file system. Each
znode can store data and has a set of permissions (an ACL) that control
access to it. At the root of the hierarchy is the root znode, and all other
znodes are its descendants: each znode can have children, grandchildren,
and so on.
Important Components in Zookeeper

ZooKeeper Services
Leader & Follower
Request Processor – Active in the Leader Node and responsible for processing
write requests. After processing, it sends changes to the follower nodes.
Atomic Broadcast – Present in both the Leader Node and Follower Nodes. It is
responsible for sending the changes to other nodes.
In-memory Databases (Replicated Databases) – Responsible for storing the
data in ZooKeeper. Every node contains its own database. Data is also
written to the file system, providing recoverability in case of any problems
with the cluster.

Other Components
Client – One of the nodes in our distributed application cluster. It accesses
information from the server and periodically sends a message to the server
to let the server know that the client is alive.
Server– Provides all the services to the client and acknowledges the
client's messages.
Ensemble– Group of Zookeeper servers. The minimum number of nodes that
are required to form an ensemble is 3.
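The figure of 3 follows from majority voting: a ZooKeeper ensemble stays available only while a majority of its servers are up. The short sketch below works through the arithmetic; the helper function names are chosen here for illustration.

```python
def quorum_size(servers):
    """Smallest number of servers that forms a strict majority."""
    return servers // 2 + 1


def tolerated_failures(servers):
    """How many servers can fail while a majority is still reachable."""
    return servers - quorum_size(servers)


# ensemble size -> (quorum size, failures tolerated)
quorum_table = {n: (quorum_size(n), tolerated_failures(n)) for n in range(1, 6)}
```

With 1 or 2 servers no failure can be tolerated, with 3 servers one failure can be, and a 4th server adds nothing over 3. This is why ensembles are usually sized with an odd number of servers.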
Zookeeper Data Model



In Zookeeper, data is stored in a hierarchical namespace, similar to a
file system. Each node in the namespace is called a Znode, and it can store
data and have children. Znodes are similar to files and directories in a file
system. Zookeeper provides a simple API for creating, reading, writing, and
deleting Znodes. It also provides watches, a mechanism for detecting
changes to the data stored in Znodes. Znodes maintain a stat structure
that includes: version number, ACL, timestamp, and data length.
Types of Znodes:
Persistent: Alive until they’re explicitly deleted.
Ephemeral: Alive only as long as the client session that created them is active.
Sequential: Either persistent or ephemeral; ZooKeeper appends a
monotonically increasing sequence number to the znode name.
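The three znode types can be illustrated with a toy in-memory tree. This is not the ZooKeeper client API: the `ToyZnodeTree` class and its methods are hypothetical, and only the observable behaviour (ephemeral nodes vanish with their session, sequential names get a zero-padded 10-digit counter) is meant to mirror ZooKeeper.

```python
class ToyZnodeTree:
    """In-memory imitation of a znode namespace (hypothetical, not ZooKeeper)."""

    def __init__(self):
        self.nodes = {"/": {"data": b"", "owner": None}}
        self.counters = {}  # parent path -> next sequence number

    def create(self, path, data=b"", ephemeral_owner=None, sequential=False):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if sequential:
            # Append a per-parent, monotonically increasing counter.
            number = self.counters.get(parent, 0)
            self.counters[parent] = number + 1
            path = f"{path}{number:010d}"
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        # owner is None for persistent znodes, a session id for ephemeral ones.
        self.nodes[path] = {"data": data, "owner": ephemeral_owner}
        return path

    def children(self, path):
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self.nodes
            if p != "/" and p.startswith(prefix) and "/" not in p[len(prefix):]
        )

    def close_session(self, session_id):
        """Ending a session deletes every ephemeral znode it created."""
        for p in [p for p, n in self.nodes.items() if n["owner"] == session_id]:
            del self.nodes[p]


tree = ToyZnodeTree()
tree.create("/app", b"config")  # persistent
worker_a = tree.create("/app/worker-", ephemeral_owner="session-1", sequential=True)
worker_b = tree.create("/app/worker-", ephemeral_owner="session-2", sequential=True)
before = tree.children("/app")
tree.close_session("session-1")  # e.g. the client crashed or disconnected
after = tree.children("/app")
```

Ephemeral sequential znodes like these are a common way to track which workers are alive, since the list of children shrinks by itself when a client goes away.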
Why do we need ZooKeeper in Hadoop?
Zookeeper is used to manage and coordinate the nodes in a Hadoop cluster,
including the NameNode, DataNode, and ResourceManager. In a Hadoop
cluster, Zookeeper helps to:
Maintain configuration information: Zookeeper stores the configuration
information for the Hadoop cluster, including the location of the NameNode,
DataNode, and ResourceManager.
Manage the state of the cluster: Zookeeper tracks the state of the nodes in the
Hadoop cluster and can be used to detect when a node has failed or become
unavailable.
Coordinate distributed processes: Zookeeper can be used to coordinate
distributed processes, such as job scheduling and resource allocation, across
the nodes in a Hadoop cluster.
Zookeeper helps to ensure the availability and reliability of a Hadoop cluster
by providing a central coordination service for the nodes in the cluster.

How Does ZooKeeper Work in Hadoop?
ZooKeeper presents a hierarchical namespace that resembles a file system
and exposes a simple set of APIs that enable clients to read and write data
in it. It stores its data in a tree of nodes called znodes, each of which
can be thought of as a file or a directory in a traditional file system.
ZooKeeper uses a consensus algorithm to ensure that all of its servers have
a consistent view of the data stored in the znodes. This means that if a
client writes data to a znode, that data will be replicated to the other
servers in the ZooKeeper ensemble.
One important feature of ZooKeeper is its ability to support the notion of a
“watch.” A watch allows a client to register for notifications when the data
stored in a znode changes. This can be useful for monitoring changes to the
data stored in ZooKeeper and reacting to those changes in a distributed
system.
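The watch mechanism can be sketched with a toy store that records callbacks. The `WatchedStore` class below is hypothetical and not the ZooKeeper API; it assumes the classic ZooKeeper behaviour in which a watch is a one-time trigger that must be set again after it fires.

```python
class WatchedStore:
    """Toy key-value store with one-shot watches (hypothetical, not ZooKeeper)."""

    def __init__(self):
        self.data = {}
        self.watches = {}  # path -> callbacks waiting for the next change

    def get_data(self, path, watch=None):
        """Read a value and optionally register a watch on the path."""
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return self.data.get(path)

    def set_data(self, path, value):
        """Write a value and fire, then discard, any watches on the path."""
        self.data[path] = value
        for callback in self.watches.pop(path, []):
            callback(path)


events = []
store = WatchedStore()
store.set_data("/config", b"v1")
store.get_data("/config", watch=events.append)  # register interest
store.set_data("/config", b"v2")                # watch fires once
store.set_data("/config", b"v3")                # no watch left, nothing fires
latest = store.get_data("/config")
```

A client that wants to keep following a znode therefore re-registers its watch each time it handles a notification.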
In Hadoop, ZooKeeper is used for a variety of purposes, including:
Storing configuration information: ZooKeeper is used to store configuration
information that is shared by multiple Hadoop components. For example, it
might be used to store the locations of NameNodes in a Hadoop cluster or the
addresses of JobTracker nodes.
Providing distributed synchronization: ZooKeeper is used to coordinate
the activities of various Hadoop components and ensure that they are working
together in a consistent manner. For example, it might be used to ensure that
only one NameNode is active at a time in a Hadoop cluster.
Maintaining naming: ZooKeeper is used to maintain a centralized naming
service for Hadoop components. This can be useful for identifying and locating
resources in a distributed system.
ZooKeeper is an essential component of Hadoop and plays a crucial role
in coordinating the activity of its various subcomponents.
Reading and Writing in Apache Zookeeper
ZooKeeper provides a simple and reliable interface for reading and writing
data. The data is stored in a hierarchical namespace, similar to a file system,
with nodes called znodes. Each znode can store data and have children
znodes. ZooKeeper clients can read and write data to these znodes by using
the getData() and setData() methods, respectively.
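As a rough sketch of how versioned reads and writes interact, the toy store below keeps a version number with each znode, as the stat structure does. The class is hypothetical; it assumes, as in the ZooKeeper Java API, that a write carries the version the client expects and that -1 means "any version".

```python
class BadVersionError(Exception):
    """Raised when a write is based on a stale version of the data."""


class ToyZnodeStore:
    """In-memory imitation of getData/setData with versions (not ZooKeeper)."""

    def __init__(self):
        self._nodes = {}  # path -> (data, version)

    def create(self, path, data):
        self._nodes[path] = (data, 0)

    def get_data(self, path):
        """Return the data together with its current version."""
        return self._nodes[path]

    def set_data(self, path, data, expected_version=-1):
        """Write only if the caller saw the latest version; -1 skips the check."""
        _, current = self._nodes[path]
        if expected_version != -1 and expected_version != current:
            raise BadVersionError(f"expected {expected_version}, found {current}")
        self._nodes[path] = (data, current + 1)
        return current + 1


store = ToyZnodeStore()
store.create("/config", b"a")
data, version = store.get_data("/config")
new_version = store.set_data("/config", b"b", expected_version=version)

try:
    # A second writer still holding the old version is rejected.
    store.set_data("/config", b"c", expected_version=version)
    stale_write_rejected = False
except BadVersionError:
    stale_write_rejected = True
```

The version check lets two clients update the same znode without silently overwriting each other's changes.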

MapReduce Architecture
MapReduce and HDFS are the two major components of Hadoop that make
it so powerful and efficient to use. MapReduce is a programming model for
efficient parallel processing of large data sets in a distributed manner.
The data is first split and then combined to produce the final result.
MapReduce libraries have been written in many programming languages,
with various optimizations. In Hadoop, MapReduce maps each job onto a
set of smaller equivalent tasks and then reduces their results, which lowers
the overhead on the cluster network and the processing power required.
A MapReduce job is mainly divided into two phases: the Map phase and the
Reduce phase.

MapReduce Architecture:

Components of MapReduce Architecture:

Client: The MapReduce client is the one who brings the job to
MapReduce for processing. There can be multiple clients that
continuously send jobs for processing to the Hadoop MapReduce Master.
Job: The MapReduce job is the actual work that the client wants done,
which is made up of many smaller tasks that the client wants to process
or execute.
Hadoop MapReduce Master: It divides the particular job into subsequent
job-parts.
Job-Parts: The tasks or sub-jobs that are obtained after dividing the main
job. The results of all the job-parts are combined to produce the final output.
Input Data: The data set that is fed to MapReduce for processing.
Output Data: The final result obtained after the processing.
In MapReduce, we have a client. The client submits a job of a
particular size to the Hadoop MapReduce Master. The MapReduce
master then divides this job into equivalent job-parts. These job-parts
are made available to the Map and Reduce tasks. The Map and Reduce
tasks contain the program required by the use case that the particular
company is solving; the developer writes the logic to fulfill that
requirement. The input data is fed to the Map task, and the Map generates
intermediate key-value pairs as its output. The output of Map, i.e. these
key-value pairs, is then fed to the Reducer, and the final output is stored
on HDFS. There can be any number of Map and Reduce tasks made available
for processing the data, as required. The Map and Reduce algorithms are
written to keep time and space complexity as low as possible.
The MapReduce task is mainly divided into 2 phases, i.e. the Map phase and
the Reduce phase.
Map: As the name suggests, its main use is to map the input data into key-
value pairs. The input to the map may itself be a key-value pair, where the
key can be an identifier such as an address and the value is the actual
content stored there. The Map() function is executed in its own memory
space on each of these input key-value pairs and generates intermediate
key-value pairs, which serve as input for the Reducer or Reduce() function.
Reduce: The intermediate key-value pairs that serve as input for the Reducer
are shuffled, sorted, and sent to the Reduce() function. The Reducer
aggregates or groups the data by key, as per the reducer algorithm
written by the developer.
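The two phases can be sketched on a single machine with the classic word-count example. This is plain Python, not Hadoop: the function names and the sample documents are made up for illustration, and everything runs sequentially in one process rather than across a cluster.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: turn one input record into intermediate (word, 1) pairs."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle and sort: group all intermediate values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reduce: aggregate the values collected for one key."""
    return key, sum(values)


def run_job(documents):
    intermediate = [
        pair
        for doc_id, text in documents.items()
        for pair in map_phase(doc_id, text)
    ]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Input key-value pairs: document id -> document text.
documents = {1: "the cloud stores data", 2: "the cloud scales"}
word_counts = run_job(documents)
```

In Hadoop the map calls would run on different nodes, close to the data blocks in HDFS, and the shuffle step would move the intermediate pairs across the network to the reducers.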
MapReduce programming offers several benefits to help you gain
valuable insights from your big data:
Scalability. Businesses can process petabytes of data stored in the Hadoop
Distributed File System (HDFS).
Flexibility. Hadoop enables easier access to multiple sources of data and
multiple types of data.
Speed. With parallel processing and minimal data movement, Hadoop offers
fast processing of massive amounts of data.
Simple. Developers can write code in a choice of languages, including Java,
C++ and Python.

High performance Computing


It is the use of parallel processing for running advanced application
programs efficiently, reliably, and quickly. The term applies especially to
systems that function above a teraflop (10^12 floating-point operations per
second). The term high-performance computing is occasionally used as a
synonym for supercomputing, although technically a supercomputer is a
system that performs at or near the currently highest operational rate for
computers. Some supercomputers work at more than a petaflop (10^15
floating-point operations per second). The most common users of HPC
systems are scientific researchers, engineers, and academic institutions.
Some government agencies, particularly the military, also rely on HPC for
complex applications.
High-performance Computers:
High Performance Computing (HPC) generally refers to the practice of
combining computing power to deliver far greater performance than a typical
desktop or workstation, in order to solve complex problems in science,
engineering, and business. Processors, memory, disks, and the OS are the
elements of a high-performance computer. The high-performance computers
of interest to small and medium-sized businesses today are really clusters
of computers. Each individual computer in a commonly configured small
cluster has between one and four processors, and today’s processors
typically have from 2 to 4 cores. HPC people often refer to the individual
computers in a cluster as nodes. A cluster of interest to a small business
could have as few as 4 nodes, or 16 cores. A common cluster size in many
businesses is between 16 and 64 nodes, or from 64 to 256 cores. The main
reason to use a cluster is that its individual nodes can work together to
solve a problem larger than any one computer can easily solve. The nodes
are connected so that they can communicate with each other to produce
meaningful work. There are two popular operating systems for HPC, Linux
and Windows. Most installations run Linux because of its legacy in
supercomputers and large-scale machines, but the choice depends on one's
requirements.
Importance of High performance Computing:
 It is used for scientific discoveries, game-changing innovations, and to
improve quality of life.
 It is a foundation for scientific & industrial advancements.
 As technologies like IoT, AI, and 3D imaging evolve, the amount of
data used by organizations grows exponentially; high-performance
computers are used to provide the computing capacity this requires.
 HPC is used to solve complex modeling problems in a spectrum of
disciplines. It includes AI, Nuclear Physics, Climate Modelling, etc.
 HPC is applied to business uses, data warehouses & transaction
processing.
Need of High performance Computing:
 It will complete a time-consuming operation in less time.
 It can complete an operation under a tight deadline and perform a high
number of operations per second.
 It is fast because the computation runs in parallel over many compute
elements (CPUs, GPUs, etc.), connected by a very fast network.
Need of ever increasing Performance :
 Climate modeling
 Drug discovery
 Data Analysis
 Protein folding
 Energy research
How Does HPC Work?
User/Scheduler → Compute cluster → Data storage
To create a high-performance computing architecture, multiple
computer servers are networked together to form a compute cluster.
Algorithms and software programs are executed simultaneously on the
servers, and the cluster is networked to data storage to retrieve the results.
All of these components work together to complete a diverse set of tasks.
To achieve maximum efficiency, each component must keep pace with the
others; otherwise, the performance of the entire HPC infrastructure
suffers.
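The scheduler, compute cluster, and result-collection flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`split_into_chunks`, `sum_of_squares`, `cluster_sum_of_squares`) are hypothetical, and threads on a single machine stand in for cluster nodes, so the sketch shows the structure of the work rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks nearly equal contiguous parts."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Work done by one (simulated) node on its share of the data."""
    return sum(x * x for x in chunk)


def cluster_sum_of_squares(items, n_nodes=4):
    """Scheduler: give one chunk to each worker, then combine partial results."""
    chunks = split_into_chunks(list(items), n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)
```

Calling `cluster_sum_of_squares(range(1, 11))` splits the ten numbers across four workers and adds up their partial sums. On a real cluster the workers would be separate machines coordinated by a job scheduler.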
Challenges with HPC
Cost: The cost of the hardware, software, and energy consumption is
enormous, making HPC systems exceedingly expensive to create and
operate. Additionally, the setup and management of HPC systems require
qualified workers, which raises the overall cost.
Scalability: HPC systems must be made scalable so they may be modified
or expanded as necessary to meet shifting demands. But creating a scalable
system is a difficult endeavour that necessitates thorough planning and
optimization.
Data Management: Data management can be difficult when using HPC
systems since they produce and process enormous volumes of data. These
data must be stored and accessed using sophisticated networking and
storage infrastructure, as well as tools for data analysis and visualization.
Programming: Parallel programming techniques, which can be more
difficult than conventional programming approaches, are frequently used in
HPC systems. It might be challenging for developers to learn how to create
and optimise algorithms for parallel processing.
Support for software and tools: To function effectively, HPC systems need
specific software and tools. The options available to users may be
constrained by the fact that not all software and tools are created to function
with HPC equipment.
Power consumption and cooling: To maintain the hardware functioning at
its best, specialised cooling technologies are needed for HPC systems’ high
heat production. Furthermore, HPC systems consume a lot of electricity,
which can be expensive and difficult to maintain.
Applications of HPC
High Performance Computing (HPC) is a term used to describe the use of
supercomputers and parallel processing strategies to carry out difficult
calculations and data analysis activities. From scientific research to
engineering and industrial design, HPC is employed in a wide range of
disciplines and applications. Here are a few of the most significant HPC use
cases and applications:
Scientific research: HPC is widely utilized in this sector, especially in areas
like physics, chemistry, and astronomy. With standard computer
techniques, it would be hard to model complex physical events, examine
massive data sets, or carry out sophisticated calculations.
Weather forecasting: The task of forecasting the weather is difficult and
data-intensive, requiring sophisticated algorithms and a lot of computational
power. Simulated weather models are executed on HPC computers to predict
weather patterns.
Healthcare: HPC is being used more and more in the medical field for
activities like drug discovery, genome sequencing, and image analysis.
Large volumes of medical data can be processed by HPC systems rapidly and
accurately, improving patient diagnosis and care.
Energy and environmental studies: HPC is employed to simulate and
model complex systems, such as climate change and renewable energy
sources, in the energy and environmental sciences. Researchers can use
HPC systems to streamline energy systems, cut carbon emissions, and
increase the resilience of our energy infrastructure.
Engineering and Design: HPC is used in engineering and design to model
and evaluate complex systems, like those found in vehicles, buildings, and
aeroplanes. Virtual simulations performed by HPC systems can assist
engineers in identifying potential problems and improving designs before
they are built.
Role of Cloud Computing in the Life Science Industry
The life science industry focuses on the fields of pharmaceuticals,
biotechnology, biomedical diagnostics and generally improving the lives of
organisms. In the case of pharmaceutical companies, they are essential for
developing and distributing medicines to treat or prevent disease and
infections. Biotechnology firms create and manufacture commercial products
relating to medical applications. They function slightly differently from
pharmaceutical companies because they research and exploit living
organisms to produce or develop a product. Biomedical diagnostic companies
develop instrumentation to detect and diagnose various diseases and
infections, which is critical for early intervention. Life science is one of two
branches of natural science and is concerned with living organisms. The other
branch, physical science, is concerned with non-living matter.
The life science industry functions on large amounts of data. Hence, it
requires an efficient and secure means of storing the data. The current
method of doing this is in separate systems or silos. An effective way of making
the data more accessible is by employing cloud-based platforms for storing
data and executing software. This allows processes to be accelerated because
of the global knowledge-sharing generated, which results in the faster rollout
of vaccines, medicines, and an understanding of chemical and biological
systems.
The benefits of cloud computing in the life science industry
Vaccines can take upwards of 10 years to develop. However, with the
advent of global knowledge sharing through cloud computing, that process can
in some cases be shortened to about one year by removing the barriers around
the physical location of data. In addition, better care can be provided to
patients using cloud computing by allowing for a more efficient response to
their needs. This is achieved by sharing knowledge efficiently through a
cloud-based platform that gives access to data globally, which results in
fewer rare cases being observed. Hence, patient recovery improves because of
earlier intervention and customized treatment.
One of the many benefits of cloud computing in the life science industry
is the ability to reduce IT operational costs by outsourcing IT maintenance
and support to the cloud provider. This enables the life science industry to
reallocate resources towards key objectives such as developing new vaccines
or medicines.
In addition, cloud applications in the industry allow for upgrades
without the need for on-site servicing. Hence, there is a growing number of
life science and pharmaceutical companies that are recognizing the value of
cloud computing. However, a major reservation to the technology is data
security.
Virtualization in Cloud Computing and Types
Virtualization is a technique for separating a service from the underlying
physical delivery of that service. It is the process of creating a virtual version
of something, such as computer hardware. It was initially developed during the
mainframe era. It involves using specialized software to create a virtual or
software-created version of a computing resource rather than the actual
version of the same resource. With the help of virtualization, multiple
operating systems and applications can run on the same machine and the
same hardware at the same time, increasing the utilization and flexibility of
hardware.
In other words, one of the main cost-effective, hardware-reducing, and
energy-saving techniques used by cloud providers is Virtualization.
Virtualization allows sharing of a single physical instance of a resource or
an application among multiple customers and organizations at one time. It
does this by assigning a logical name to physical storage and providing a
pointer to that physical resource on demand. The term virtualization is often
synonymous with hardware virtualization, which plays a fundamental role
in efficiently delivering Infrastructure-as-a-Service (IaaS) solutions for cloud
computing. Moreover, virtualization technologies provide a virtual
environment for not only executing applications but also for storage,
memory, and networking.
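The idea of assigning a logical name to physical storage and handing out a pointer to it on demand can be sketched as a lookup table. This is a toy model, not how any particular cloud provider implements it; the class `VirtualStorage`, its methods, and the device names are all hypothetical.

```python
class VirtualStorage:
    """Toy mapping from logical volume names to physical locations."""

    def __init__(self):
        # logical name -> (physical device, offset); hypothetical layout
        self._table = {}

    def assign(self, logical_name, device, offset):
        """Give a physical location a logical name that customers will use."""
        self._table[logical_name] = (device, offset)

    def resolve(self, logical_name):
        """Return the 'pointer' to the physical resource on demand."""
        return self._table[logical_name]

    def migrate(self, logical_name, new_device, new_offset):
        """Move the data; the logical name seen by the customer is unchanged."""
        if logical_name not in self._table:
            raise KeyError(logical_name)
        self._table[logical_name] = (new_device, new_offset)
```

Because customers only ever use the logical name, the provider can move data between devices without the customer noticing, which is what lets one physical resource pool be shared among many customers.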

Host Machine: The machine on which the virtual machine is going to be built
is known as Host Machine.
Guest Machine: The virtual machine is referred to as a Guest Machine.
Work of Virtualization in Cloud Computing
Virtualization has a prominent impact on cloud computing. In
cloud computing, users store data in the cloud, and with the help of
virtualization they gain the extra benefit of sharing the infrastructure.
Cloud vendors take care of the required physical resources, but they
can charge a significant amount for these services, which affects every user
or organization. Virtualization lets users or organizations have the
services a company requires maintained by external (third-party)
providers, which helps reduce costs to the company. This is how
virtualization works in cloud computing.
Benefits of Virtualization
 More flexible and efficient allocation of resources.
 Enhance development productivity.
 It lowers the cost of IT infrastructure.
 Remote access and rapid scalability.
 High availability and disaster recovery.
 Pay-per-use of the IT infrastructure on demand.
 Enables running multiple operating systems.

Drawbacks of Virtualization
 High Initial Investment: Moving to the cloud requires a very high initial
investment, although it helps reduce a company's costs over time.
 Learning New Infrastructure: As companies shift from servers
to the cloud, they need highly skilled staff who can work with
the cloud easily, which means hiring new staff or providing
training to current staff.
 Risk of Data: Hosting data on third-party resources can put
the data at risk, since it may be attacked by a
hacker or cracker.
Characteristics of Virtualization
 Increased Security: The ability to control the execution of a guest
program in a completely transparent manner opens new possibilities
for delivering a secure, controlled execution environment. All the
operations of the guest programs are generally performed against the
virtual machine, which then translates and applies them to the host
programs.
 Managed Execution: In particular, sharing, aggregation, emulation,
and isolation are the most relevant features.
 Sharing: Virtualization allows the creation of a separate computing
environment within the same host.
 Aggregation: It is possible to share physical resources among several
guests, but virtualization also allows aggregation, which is the
opposite process.

Types of Virtualization
1. Application Virtualization: Application virtualization gives a user
remote access to an application from a server. The server stores all
personal information and other characteristics of the application, but the
application can still run on a local workstation through the internet. An
example would be a user who needs to run two different versions of the same
software. Technologies that use application virtualization are hosted
applications and packaged applications.
2. Network Virtualization: The ability to run multiple virtual networks,
each with a separate control plane and data plane. These networks co-exist on
top of one physical network and can be managed by individual parties that are
potentially confidential to each other. Network virtualization provides a
facility to create and provision virtual networks, logical switches,
routers, firewalls, load balancers, Virtual Private Networks (VPN), and
workload security within days or even weeks.
3. Desktop Virtualization: Desktop virtualization allows the users’ OS to
be remotely stored on a server in the data center. It allows the user to access
their desktop virtually, from any location by a different machine. Users who
want specific operating systems other than Windows Server will need to have
a virtual desktop. The main benefits of desktop virtualization are user
mobility, portability, and easy management of software installation, updates,
and patches.
4. Storage Virtualization: Storage virtualization is an array of servers that
are managed by a virtual storage system. The servers aren’t aware of exactly
where their data is stored and instead function more like worker bees in a
hive. It allows storage from multiple sources to be managed and
utilized as a single repository. Storage virtualization software maintains
smooth operations, consistent performance, and a continuous suite of
advanced functions despite changes, breakdowns, and differences in the
underlying equipment.
5. Server Virtualization: This is a kind of virtualization in which the
masking of server resources takes place. Here, the central (physical)
server is divided into multiple virtual servers by changing the
identity number and processors, so each virtual server can run its own
operating system in an isolated manner, while each sub-server knows the
identity of the central server. It increases performance and reduces
operating cost by deploying the main server's resources as sub-server
resources. It is beneficial for virtual migration, reducing energy
consumption, reducing infrastructure costs, etc.
6. Data Virtualization: This is the kind of virtualization in which data
is collected from various sources and managed in a single place, without
the user needing to know technical details such as how the data is collected,
stored, and formatted. The data is then arranged logically so that a virtual
view of it can be accessed remotely by interested people, stakeholders, and
users through various cloud services. Many large companies provide
such services, including Oracle, IBM, AtScale, CData, etc.
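Returning to server virtualization (type 5 above), the division of one physical server into isolated virtual servers can be sketched as simple resource bookkeeping. The class `PhysicalServer` and its methods are hypothetical. As a simplifying assumption, the sketch refuses any request that exceeds the remaining capacity, whereas real hypervisors often allow resources to be overcommitted.

```python
class PhysicalServer:
    """Toy model of one physical host carved into virtual servers."""

    def __init__(self, cpus, memory_gb):
        self.cpus = cpus
        self.memory_gb = memory_gb
        self.guests = {}  # virtual server name -> (cpus, memory_gb)

    def free(self):
        """Return (free CPUs, free memory in GB) after current allocations."""
        used_cpus = sum(c for c, _ in self.guests.values())
        used_mem = sum(m for _, m in self.guests.values())
        return self.cpus - used_cpus, self.memory_gb - used_mem

    def create_virtual_server(self, name, cpus, memory_gb):
        """Allocate a virtual server if capacity allows; no overcommit."""
        free_cpus, free_mem = self.free()
        if cpus > free_cpus or memory_gb > free_mem:
            return False
        self.guests[name] = (cpus, memory_gb)
        return True
```

For example, a 16-CPU, 64 GB host can hold a 4-CPU web server and an 8-CPU database server, after which a further 8-CPU request is rejected because only 4 CPUs remain.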
Uses of Virtualization
 Data-integration
 Business-integration
 Service-oriented architecture data-services
 Searching organizational data
Types of Virtual Machines
We will study virtual machines, types of virtual machines, and
virtual machine languages. A virtual machine is like a simulated computer
system operating on your hardware. It partially uses the hardware of your system
(like CPU, RAM, disk space, etc.), but its space is completely separated from
your main system. Two virtual machines do not interfere with each other’s
working and functioning, nor can they access each other’s space, which gives
the illusion that we are using a totally different hardware system.

Types of Virtual Machines: You can classify virtual machines into two
types:
1. System Virtual Machine: This type of virtual machine gives us a
complete system platform and supports the execution of a complete virtual
operating system. Like VirtualBox, a system virtual machine provides
an environment in which an OS can be installed completely. The
hardware of the real machine is distributed between simulated operating
systems (for example, two of them) by the virtual machine monitor, and
programs and processes then run separately on the distributed hardware of
the simulated machines.
2. Process Virtual Machine: A process virtual machine, unlike a system
virtual machine, does not provide the facility to install a virtual
operating system completely. Rather, it creates a virtual environment of that
OS while some app or program is in use, and this environment is destroyed
as soon as we exit that app. Some apps run on the main OS, while virtual
machines are created to run other apps: because those programs require a
different OS, the process virtual machine provides it for as long as those
programs are running. Example: Wine software in Linux helps to run Windows
applications.
Virtual Machine Language: It is a type of language that can be understood
by different operating systems; it is platform-independent. To run a program
written in a language such as C, Python, or Java, we need a compiler or
interpreter that converts the code into a form the system can execute. A
virtual machine language works similarly: source code is translated into an
intermediate form (often called byte code) that the virtual machine executes.
If we want code that can be executed on different types of operating systems
(Windows, Linux, etc.), a virtual machine language is helpful.
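To make the idea of a virtual machine language concrete, here is a minimal sketch of a stack-based virtual machine. The instruction set (`PUSH`, `ADD`, `MUL`) and the function `run` are invented for this illustration and do not correspond to any real byte code format. The same instruction list would give the same result on any operating system that can run the interpreter, which is the sense in which such a language is platform-independent.

```python
def run(bytecode):
    """Execute a list of (opcode, argument) pairs on a tiny stack machine."""
    stack = []
    for op, arg in bytecode:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    # The result of the program is whatever is left on top of the stack.
    return stack.pop()


# Hypothetical program computing (2 + 3) * 4
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
```

Running `run(program)` evaluates the expression without the program needing to know anything about the host operating system or processor.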
Difference between Full Virtualization and Paravirtualization
1. Full Virtualization: Full virtualization was introduced by IBM in
1966. It is the first software solution for server virtualization and uses binary
translation and direct approach techniques. In full virtualization, the guest OS
is completely isolated by the virtual machine from the virtualization layer
and hardware. Microsoft and Parallels systems are examples of full
virtualization.
2. Paravirtualization: Paravirtualization is the category of CPU
virtualization which uses hypercalls for operations to handle instructions at
compile time. In paravirtualization, the guest OS is not completely isolated;
it is partially isolated by the virtual machine from the virtualization layer
and hardware. VMware and Xen are some examples of paravirtualization.
The differences between Full Virtualization and Paravirtualization are as
follows:
S.No. | Full Virtualization | Paravirtualization
1. | In full virtualization, virtual machines permit the execution of instructions with an unmodified OS running in an entirely isolated way. | In paravirtualization, a virtual machine does not implement full isolation of the OS but rather provides a different API, which is utilized when the OS is subjected to alteration.
2. | Full virtualization is less secure. | Paravirtualization is more secure than full virtualization.
3. | Full virtualization uses binary translation and a direct approach as a technique for operations. | Paravirtualization uses hypercalls at compile time for operations.
4. | Full virtualization is slower than paravirtualization in operation. | Paravirtualization is faster in operation as compared to full virtualization.
5. | Full virtualization is more portable and compatible. | Paravirtualization is less portable and compatible.
6. | Examples of full virtualization are Microsoft and Parallels systems. | Examples of paravirtualization are Microsoft Hyper-V, Citrix Xen, etc.
7. | It supports all guest operating systems without modification. | The guest operating system has to be modified and only a few operating systems support it.
8. | The guest operating system will issue hardware calls. | Using the drivers, the guest operating system will directly communicate with the hypervisor.
9. | It is less streamlined compared to paravirtualization. | It is more streamlined.
10. | It provides the best isolation. | It provides less isolation compared to full virtualization.
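Rows 7 and 8 of the comparison can be illustrated with a toy simulation. This is only a conceptual sketch: the `Hypervisor` class, the `OUT_DISK` instruction name, and both guest functions are hypothetical. Real hypervisors do this work at the CPU instruction level, not in Python. The point shown is that an unmodified guest issues what it believes is a hardware instruction, which the hypervisor must intercept, while a modified (paravirtualized) guest calls the hypervisor directly.

```python
class Hypervisor:
    """Toy hypervisor that counts how each kind of guest reaches it."""

    def __init__(self):
        self.traps = 0
        self.hypercalls = 0
        self.disk = {}  # block number -> data (simulated virtual disk)

    def trap_and_emulate(self, instruction, block, data):
        """Full virtualization: intercept a privileged instruction and emulate it."""
        self.traps += 1
        if instruction == "OUT_DISK":  # hypothetical instruction name
            self.disk[block] = data

    def hypercall_write(self, block, data):
        """Paravirtualization: the guest asks the hypervisor explicitly."""
        self.hypercalls += 1
        self.disk[block] = data


def unmodified_guest_write(hv, block, data):
    # The guest believes it is talking to real hardware.
    hv.trap_and_emulate("OUT_DISK", block, data)


def paravirtual_guest_write(hv, block, data):
    # The guest OS was modified to use the hypervisor's API.
    hv.hypercall_write(block, data)
```

Both guests end up with their data on the virtual disk. The difference lies in who had to change: the hypervisor does extra interception work in the first case, and the guest operating system is altered in the second.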

What is virtualized security?
Virtualized security, or security virtualization, refers to security
solutions that are software-based and designed to work within a virtualized
IT environment. This differs from traditional, hardware-based network
security, which is static and runs on devices such as traditional firewalls,
routers, and switches.
In contrast to hardware-based security, virtualized security is flexible
and dynamic. Instead of being tied to a device, it can be deployed anywhere
in the network and is often cloud-based. This is key for virtualized networks,
in which operators spin up workloads and applications dynamically;
virtualized security allows security services and functions to move around
with those dynamically created workloads.
Cloud security considerations (such as isolating multitenant
environments in public cloud environments) are also important to virtualized
security. The flexibility of virtualized security is helpful for securing hybrid
and multi-cloud environments, where data and workloads migrate around a
complicated ecosystem involving multiple vendors.
What are the benefits of virtualized security?
Virtualized security is now effectively necessary to keep up with the
complex security demands of a virtualized network, plus it’s more flexible and
efficient than traditional physical security. Here are some of its specific
benefits:
Cost-effectiveness: Virtualized security allows an enterprise to maintain a
secure network without a large increase in spending on expensive proprietary
hardware. Pricing for cloud-based virtualized security services is often
determined by usage, which can mean additional savings for organizations
that use resources efficiently.
Flexibility: Virtualized security functions can follow workloads anywhere,
which is crucial in a virtualized environment. It provides protection across
multiple data centers and in multi-cloud and hybrid cloud environments,
allowing an organization to take advantage of the full benefits of virtualization
while also keeping data secure.
Operational efficiency: Quicker and easier to deploy than hardware-based
security, virtualized security doesn’t require IT teams to set up and configure
multiple hardware appliances. Instead, they can set up security systems
through centralized software, enabling rapid scaling. Using software to run
security technology also allows security tasks to be automated, freeing up
additional time for IT teams.
Regulatory compliance: Traditional hardware-based security is static and
unable to keep up with the demands of a virtualized network, making
virtualized security a necessity for organizations that need to maintain
regulatory compliance.
How does virtualized security work?
Virtualized security can take the functions of traditional security
hardware appliances (such as firewalls and antivirus protection) and deploy
them via software. In addition, virtualized security can also perform additional
security functions. These functions are only possible due to the advantages
of virtualization, and are designed to address the specific security needs of a
virtualized environment.
For example, an enterprise can insert security controls (such as
encryption) between the application layer and the underlying infrastructure,
or use strategies such as micro-segmentation to reduce the potential attack
surface.
Virtualized security can be implemented as an application directly on
a bare metal hypervisor (a position it can leverage to provide
effective application monitoring) or as a hosted service on a virtual machine.
In either case, it can be quickly deployed where it is most effective, unlike
physical security, which is tied to a specific device.
What are the risks of virtualized security?
The increased complexity of virtualized security can be a challenge for
IT, which in turn leads to increased risk. It’s harder to keep track of workloads
and applications in a virtualized environment as they migrate across servers,
which makes it more difficult to monitor security policies and configurations.
And the ease of spinning up virtual machines can also contribute to security
holes.
It’s important to note, however, that many of these risks are already
present in a virtualized environment, whether security services are virtualized
or not. Following enterprise security best practices (such as spinning down
virtual machines when they are no longer needed and using automation to
keep security policies up to date) can help mitigate such risks.
How is physical security different from virtualized security?
Traditional physical security is hardware-based, and as a result, it’s
inflexible and static. The traditional approach depends on devices deployed at
strategic points across a network and is often focused on protecting the
network perimeter (as with a traditional firewall). However, the perimeter of a
virtualized, cloud-based network is necessarily porous and workloads and
applications are dynamically created, increasing the potential attack surface.
Traditional security also relies heavily upon port and protocol filtering,
an approach that’s ineffective in a virtualized environment where addresses
and ports are assigned dynamically. In such an environment, traditional
hardware-based security is not enough; a cloud-based network requires
virtualized security that can move around the network along with workloads
and applications.
What are the different types of virtualized security?
There are many features and types of virtualized security,
encompassing network security, application security, and cloud security.
Some virtualized security technologies are essentially updated, virtualized
versions of traditional security technology (such as next-generation firewalls).
Others are innovative new technologies that are built into the very fabric of
the virtualized network.
Some common types of virtualized security features include:
Segmentation, or making specific resources available only to specific
applications and users. This typically takes the form of controlling traffic
between different network segments or tiers.
Micro-segmentation, or applying specific security policies at the
workload level to create granular secure zones and limit an attacker’s ability
to move through the network. Micro-segmentation divides a data center into
segments and allows IT teams to define security controls for each segment
individually, bolstering the data center’s resistance to attack.
Isolation, or separating independent workloads and applications on the
same network. This is particularly important in a multitenant public
cloud environment, and can also be used to isolate virtual networks from the
underlying physical infrastructure, protecting the infrastructure from attack.
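Micro-segmentation, as described above, amounts to attaching policy to workloads rather than to network devices. A minimal sketch, assuming workloads carry simple labels such as "web", "app", and "db" (the labels, ports, and rule set here are all hypothetical), is a default-deny check against an explicit allow list:

```python
# Hypothetical allow list: (source label, destination label, destination port)
ALLOW_RULES = {
    ("web", "app", 8080),
    ("app", "db", 5432),
}


def is_allowed(src_label, dst_label, port, rules=ALLOW_RULES):
    """Default deny: traffic passes only if an explicit rule matches."""
    return (src_label, dst_label, port) in rules
```

With these rules the web tier can reach the application tier, but it cannot reach the database directly. This limits how far an attacker who compromises one workload can move through the network.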
Virtualization, Hypervisor
Hardware-based virtualization is a platform virtualization approach that
allows efficient full virtualization with the help of hardware capabilities,
primarily from the host processor. Full virtualization is used to simulate a
complete hardware environment, or virtual machine, in which an unmodified
guest operating system (using the same instruction set as the host machine)
executes in complete isolation.

Figure: The different logical layers of operating system-based virtualization,
in which the VM is first installed into a full host operating system and
subsequently used to generate virtual machines.
Hardware-level virtualization provides an abstract execution environment,
in terms of computer hardware, in which a guest OS can be run. In
this model, an operating system represents the guest, the physical computer
hardware represents the host, its emulation represents the virtual machine,
and the hypervisor represents the virtual machine manager. Hardware-based
virtualization is generally more efficient because the virtual machines are
allowed to interact with hardware without any intermediary action from the
host operating system. A fundamental component of
hardware virtualization is the hypervisor, or virtual machine manager
(VMM).
Basically, there are two types of Hypervisors which are described below:

Type-I hypervisors:
Hypervisors of type I run directly on top of the hardware. As a result,
they stand in for operating systems and communicate directly with the ISA
interface offered by the underlying hardware, which they replicate to allow
guest operating systems to be managed. Because it runs natively on
hardware, this sort of hypervisor is also known as a native virtual machine.
Type-II hypervisors:
To deliver virtualization services, Type II hypervisors require the
assistance of an operating system. This means they’re operating system-
managed applications that communicate with it via the ABI and simulate the
ISA of virtual hardware for guest operating systems. Because it is housed
within an operating system, this form of hypervisor is also known as a hosted
virtual machine.
A hypervisor has a simple user interface that needs some storage
space. It exists as a thin layer of software that performs hardware
management functions to establish a virtualization management layer. Device
drivers and support software are optimized for the provisioning of virtual
machines, while many standard operating system functions are not
implemented. Essentially, this type of virtualization system is used to
reduce the performance overhead inherent in the coordination that allows
multiple VMs to interact with the same hardware platform.
Hardware compatibility is another challenge for hardware-based
virtualization. The virtualization layer interacts directly with the host
hardware, so all the associated drivers and support software
must be compatible with the hypervisor. Hardware device drivers
available to other operating systems may not be available for hypervisor
platforms. Moreover, host management and administration
features may not include the range of advanced functions that are common
to operating systems.
Features of hardware-based virtualization are:
Isolation: Hardware-based virtualization provides strong isolation between
virtual machines, which means that any problems in one virtual machine
will not affect other virtual machines running on the same physical host.
Security: Hardware-based virtualization provides a high level of security as
each virtual machine is isolated from the host operating system and other
virtual machines, making it difficult for malicious code to spread from one
virtual machine to another.
Performance: Hardware-based virtualization provides good performance as
the hypervisor has direct access to the physical hardware, which means that
virtual machines can achieve close to native performance.
Resource allocation: Hardware-based virtualization allows for flexible
allocation of hardware resources such as CPU, memory, and I/O bandwidth
to virtual machines.
Snapshot and migration: Hardware-based virtualization allows for the
creation of snapshots, which can be used for backup and recovery purposes.
It also allows for live migration of virtual machines between physical hosts,
which can be used for load balancing and other purposes.
Support for multiple operating systems: Hardware-based virtualization
supports multiple operating systems, which allows for the consolidation of
workloads onto fewer physical machines, reducing hardware and
maintenance costs.
Compatibility: Hardware-based virtualization is compatible with most
modern operating systems, making it easy to integrate into existing IT
infrastructure.
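The snapshot feature listed above can be sketched as saving and restoring a copy of a virtual machine's state. The `VirtualMachine` class and its fields are hypothetical. As a simplifying assumption, the sketch copies the whole state each time, whereas production hypervisors typically use more space-efficient techniques such as copy-on-write.

```python
import copy


class VirtualMachine:
    """Toy VM whose entire state lives in one dictionary."""

    def __init__(self, name):
        self.name = name
        self.state = {"memory": {}, "disk": {}}
        self._snapshots = {}  # label -> saved copy of state

    def snapshot(self, label):
        """Save an independent copy of the current state under a label."""
        self._snapshots[label] = copy.deepcopy(self.state)

    def restore(self, label):
        """Roll the VM back to a previously saved state."""
        self.state = copy.deepcopy(self._snapshots[label])
```

A typical use is to take a snapshot before a risky change, such as a software upgrade, and restore it if the change goes wrong.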
Advantages of hardware-based virtualization –
It reduces the maintenance overhead of paravirtualization, as it reduces
(ideally, eliminates) the modifications needed in the guest operating system.
It also makes it considerably easier to attain good performance. A practical
benefit of hardware-based virtualization has been mentioned by VMware
engineers and Virtual Iron.
Disadvantages of hardware-based virtualization –
Hardware-based virtualization requires explicit support in the host CPU,
which may not be available on all x86/x86_64 processors. A “pure”
hardware-based virtualization approach, including the entire unmodified guest
operating system, involves many VM traps, and thus a rapid increase in CPU
overhead occurs, which limits the scalability and efficiency of server
consolidation. This performance hit can be mitigated by the use of
para-virtualized drivers; the combination has been called “hybrid virtualization”.
Evolution of Cloud Computing
Cloud computing is all about renting computing services. This idea
first came in the 1950s. In making cloud computing what it is today, five
technologies played a vital role: distributed systems and their
peripherals, virtualization, web 2.0, service orientation, and utility
computing.

Distributed Systems:
It is a composition of multiple independent systems, all of which are
presented to users as a single entity. The purpose of distributed systems
is to share resources and use them effectively and efficiently.
Distributed systems possess characteristics such as scalability,
concurrency, continuous availability, heterogeneity, and independence in
failures. But the main problem with this system was that all the systems
were required to be present at the same geographical location. To solve
this problem, distributed computing led to three more types of computing:
mainframe computing, cluster computing, and grid computing.
Mainframe computing:
Mainframes, which first came into existence in 1951, are highly
powerful and reliable computing machines. They are responsible for
handling large volumes of data, such as massive input-output operations.
Even today they are used for bulk processing tasks such as online
transactions. These systems have almost no downtime and high fault
tolerance. Compared with the distributed systems that preceded them, they
increased the processing capabilities of the system, but they were very
expensive. To reduce this cost, cluster computing came as an alternative to
mainframe technology.

Cluster computing:
In the 1980s, cluster computing came as an alternative to mainframe
computing. The machines in a cluster were connected to one another by a
high-bandwidth network. Clusters were far cheaper than mainframe systems
and equally capable of heavy computation. Also, new nodes could easily be
added to the cluster when required. Thus, the problem of cost was solved to
some extent, but the problem of geographical restrictions persisted. To
solve this, the concept of grid computing was introduced.
Grid computing:
In the 1990s, the concept of grid computing was introduced. Different
systems were placed at entirely different geographical locations and were
all connected via the internet. These systems belonged to different
organizations, and thus the grid consisted of heterogeneous nodes. Although
it solved some problems, new problems emerged as the distance between the
nodes increased. The main problem encountered was the low availability of
high-bandwidth connectivity, along with other network-related issues. Thus,
cloud computing is often referred to as the “successor of grid computing”.
Virtualization:
It was introduced nearly 40 years back. It refers to the process of creating
a virtual layer over the hardware, which allows the user to run multiple
instances simultaneously on that hardware. It is a key technology used in
cloud computing and the base on which major cloud computing services such
as Amazon EC2 and VMware vCloud work. Hardware virtualization is still one
of the most common types of virtualization.
Web 2.0:
It is the interface through which cloud computing services interact
with their clients. It is because of Web 2.0 that we have interactive and
dynamic web pages. It also increases flexibility among web pages. Popular
examples of Web 2.0 include Google Maps, Facebook, and Twitter. Needless
to say, social media is possible only because of this technology. It gained
major popularity in 2004.
Service orientation:
It acts as a reference model for cloud computing. It supports low-cost,
flexible, and evolvable applications. Two important concepts were introduced
in this computing model. These were Quality of Service (QoS) which also
includes the SLA (Service Level Agreement) and Software as a Service (SaaS).

Utility computing:
It is a computing model that defines service provisioning techniques for
compute services, along with other major services such as storage and
infrastructure, all of which are provisioned on a pay-per-use basis.
Evolving Higher Data Storage Capabilities
In the past two decades, due to the exponential rise in data usage,
data centers developed stringent requirements for greater storage
capacity per unit area and faster data transmission, and the industry
continued to evolve. Innovators focused on finding ways to achieve larger
capacity and faster throughput while using limited space and staying
within their power budget.
Flash technology became popular because of its small size and
ability to deliver faster insights using significantly lower power
consumption than hard drive technology. However, even though this
option solves some size and power problems, it has limitations. For
example, most flash devices can be written to only a certain number of
times before their memory cells wear out and the device fails.
Over the past 90 years, data storage evolved from magnetic drums
and tapes to hard disk drives, then to mixed media, flash, and finally
cloud storage. That’s where we are today, and as our storage needs
increase, innovation continues to evolve in multiple areas.
The Paradigm Shift to Data Storage at Edge
Big data plays a pivotal role in almost everything we do these days,
but it’s no longer enough to just have access to data-driven insights—
particularly if they are outdated and obsolete. As the amount of data
generated grows and data capture increasingly moves closer to edge
environments, urgent processing is critical to deliver timely intelligence
that reflects real-time circumstances.
Organizations are progressively experiencing more pressure to
obtain and apply insights rapidly, before situations change. This fact
makes it imperative for business leaders across all mainstream
industries to embrace active data and deploy ways of capturing and
transporting it for immediate processing.
The Challenges of Managing Big Data
To optimize AI for the future, we also need high-performance
systems. These could be storage or cloud-based systems, processed by
modern, data-hungry applications. The more data you feed these
applications, the faster they can run their algorithms and deliver
insights, whether these are for micro strategy tools or business
intelligence tools. This is usually called data mining, and, in the past,
we did it by putting the data into a warehouse and then running
applications to process it.

Different types of cloud storage models
Cloud storage models are models of cloud computing that store data
on the internet via cloud computing providers. These providers manage and
operate data storage as a service.
Cloud storage is basically an online storage of data. Data that is stored
can be accessed from multiple connected devices, which constitute a cloud.
Cloud storage can provide various benefits like greater accessibility and
reliability, strong protection of data backup, rapid deployment, and disaster
recovery purposes.
Moving to the cloud also decreases overall storage costs due to cutting
costs incurred on the purchase of storage devices and their maintenance.
As companies have started embracing the virtual disk model, the
landscape of the data center is shifting.
This model was pioneered in virtualization, and the cloud is now providing
new models that enable fully virtualized storage stacks.
The cloud environment tries to provide self-service with a precise
separation between application and infrastructure.
There are three cloud storage models:
• Instance storage: Virtual disks in the cloud
• Volume storage: SAN sans the physical
• Object storage: Web-scale NAS

Instance storage: Virtual disks in the cloud


In a traditional virtualized environment, the virtual disk is the
predominant storage model. Instance storage takes its name from this: it is
storage that is used like conventional virtual disks. It is important to
note that instance storage is a storage model, not a storage protocol, and
it can be implemented in numerous ways. For example, DAS (direct-attached
storage) is generally used to implement instance storage. It is often
called ephemeral storage because it is not highly reliable.
Advantages & Disadvantages
The hard drives that instance storage runs on are physically attached to
the EC2 hosts running the instances. The data on them lasts only as long as
the instances they are attached to.
Both instance store volumes and Elastic Block Storage (EBS) volumes reside
in the same Availability Zone (AZ) as the instance. EBS volumes can be
re-attached to a new EC2 instance, unless they are set to be deleted when
the attached EC2 instance is terminated.
Because instance storage offers speed but not persistence, it is usually
used for data that requires quick but temporary access, like swap or paging
files.
However, it is also used to store data that is regularly replicated to
multiple locations.
Also, EC2 instances that use instance storage for their root device must
keep a copy of their AMI on the instance store disk, because instance
storage does not persist any data; this is the reason for their longer boot
time compared with instances backed by EBS.
Volume storage: SAN sans the physical
Volume storage is also known as block storage. It supports operations like
read/write and holds the system files of running virtual machines.
As suggested by its name, data is stored in structured blocks and
volumes: files are split into equal-sized blocks, and each block has its own
address. Unlike objects, however, blocks don’t possess any metadata. The
fixed-size blocks, which together can hold large amounts of data, are
distributed amongst the storage nodes.
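As a rough illustration of the block model described above, the following
Python sketch splits a file's bytes into fixed-size blocks, gives each block
its own address, and spreads the blocks across storage nodes. The 4-byte
block size, the node names, and the round-robin placement are assumptions
chosen for readability; they do not describe any particular provider.

```python
# Minimal sketch of block (volume) storage: fixed-size blocks, each with its
# own address and no per-block metadata. Block size, node names and the
# round-robin placement are illustrative assumptions.

BLOCK_SIZE = 4  # bytes; real systems use far larger blocks


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Map block address -> block contents, zero-padding the last block."""
    blocks = {}
    for address, start in enumerate(range(0, len(data), block_size)):
        chunk = data[start:start + block_size]
        blocks[address] = chunk.ljust(block_size, b"\x00")
    return blocks


def place_blocks(blocks: dict, nodes: list) -> dict:
    """Distribute block addresses across storage nodes round-robin."""
    placement = {node: [] for node in nodes}
    for address in blocks:
        placement[nodes[address % len(nodes)]].append(address)
    return placement


blocks = split_into_blocks(b"hello block storage")
placement = place_blocks(blocks, ["node-a", "node-b"])
```

Because every block has the same size, the address of the block holding a
given byte offset can be computed directly as offset // BLOCK_SIZE, which is
what makes in-place reads and writes cheap in this model.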
Public cloud providers allow the creation of various file systems on their
block storage systems, thus enabling users to store persistent data such as
a database.
Additionally, an Amazon EBS volume is accessed from an Amazon EC2
instance through an AWS shared or dedicated network.
Another advantage of using volume/block storage is its backup
mechanism. For example, AWS EBS offers a snapshot feature that is
essentially an incremental point-in-time backup of your volume.
Object storage: Web-scale NAS
Cloud-native applications need space for storing data that is shared
between different VMs. Often, however, there is a need for spaces that can
extend to various data centers across multiple geographies, and this need
is met by object storage.
For example, Amazon Simple Storage Service (S3) provides a single
space across an entire region and, possibly, across the entire world.
Object storage stores data as objects, unlike other models, which use a
file hierarchy. It typically offers eventual consistency.
Each object consists of data, metadata, and a unique identifier.
What object storage does differently is that it tries to address
capabilities that are overlooked by other storage models, such as a
namespace, a directly programmable interface, and data distribution.
Object storage also saves a substantial amount of unstructured data.
This kind of storage is used for storing songs on audio applications, photos
on social media, or online services like Dropbox.
Advantages & Disadvantages
The ability to store a virtually unlimited number of files is one of the
many advantages of using object storage. Object storage offers an HTTP(S)
based interface and also maintains file revisions.
In this kind of storage, files are distributed across different nodes, which
means that to modify a file you need to upload a new revision of the entire
file. This can significantly impact performance.
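The object model described in this section (data plus metadata plus a unique
identifier, with whole-object revisions) can be sketched as follows. The
ObjectStore class, its method names, and the use of a SHA-256 digest as an
integrity tag are hypothetical and only illustrate the idea; they are not the
API of S3 or of any other service.

```python
import hashlib
import uuid
from typing import Optional


class ObjectStore:
    """Toy flat-namespace object store (hypothetical, for illustration only)."""

    def __init__(self):
        self._objects = {}  # object ID -> list of revisions, oldest first

    def put(self, data: bytes, metadata: dict,
            object_id: Optional[str] = None) -> str:
        """Store a whole object; writing to an existing ID adds a revision."""
        object_id = object_id or str(uuid.uuid4())
        revision = {
            "data": data,
            "metadata": dict(metadata),
            "digest": hashlib.sha256(data).hexdigest(),
        }
        self._objects.setdefault(object_id, []).append(revision)
        return object_id

    def get(self, object_id: str, revision: int = -1) -> dict:
        """Return one revision of the object (the latest by default)."""
        return self._objects[object_id][revision]


store = ObjectStore()
photo_id = store.put(b"v1 bytes", {"content-type": "image/jpeg"})
# Modifying the object means uploading the entire new content again.
store.put(b"v2 bytes, fully re-uploaded", {"content-type": "image/jpeg"},
          object_id=photo_id)
```

Note that there is no operation for changing part of an object: the second
put call replaces the whole content, which is the performance cost mentioned
above.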
Difference between File System and DBMS
File System :
The file system is basically a way of arranging the files in a storage
medium like a hard disk. The file system organizes the files and helps in the
retrieval of files when they are required. File systems consist of different files
which are grouped into directories. The directories further contain other
folders and files. The file system performs basic operations like management,
file naming, giving access rules, etc.

Example: NTFS (New Technology File System), EXT (Extended File System).

DBMS (Database Management System):


Database Management System is basically software that manages the
collection of related data. It is used for storing data and retrieving the data
effectively when it is needed. It also provides proper security measures for
protecting the data from unauthorized access. In Database Management
System the data can be fetched by SQL queries and relational algebra. It also
provides mechanisms for data recovery and data backup.
Example:
Oracle, MySQL, MS SQL server.

Difference between File System and DBMS:


Basis | File System | DBMS
Structure | The file system is a way of arranging the files in a storage medium within a computer. | DBMS is software for managing the database.
Data Redundancy | Redundant data can be present in a file system. | In DBMS there is no redundant data.
Backup and Recovery | It doesn’t provide backup and recovery of data if it is lost. | It provides backup and recovery of data even if it is lost.
Query processing | There is no efficient query processing in the file system. | Efficient query processing is there in DBMS.
Consistency | There is less data consistency in the file system. | There is more data consistency because of the process of normalization.
Complexity | It is less complex as compared to DBMS. | It has more complexity in handling as compared to the file system.
Security Constraints | File systems provide less security in comparison to DBMS. | DBMS has more security mechanisms as compared to file systems.
Cost | It is less expensive than DBMS. | It has a comparatively higher cost than a file system.
Data Independence | There is no data independence. | In DBMS data independence exists.
User Access | Only one user can access data at a time. | Multiple users can access data at a time.
Meaning | The user has to write procedures for managing databases. | The user is not required to write procedures.
Sharing | Data is distributed in many files, so it is not easy to share data. | Due to its centralized nature, sharing is easy.
Data Abstraction | It gives details of storage and representation of data. | It hides the internal details of the database.
Integrity Constraints | Integrity constraints are difficult to implement. | Integrity constraints are easy to implement.
Attributes | To access data in a file, the user requires attributes such as file name and file location. | No such attributes are required.
Example | Cobol, C++ | Oracle, SQL Server

Google File System


Google Inc. developed the Google File System (GFS), a scalable
distributed file system (DFS), to meet the company’s growing data processing
needs. GFS offers fault tolerance, reliability, scalability, availability,
and performance to large networks and connected nodes. GFS is made up of a
number of storage systems constructed from inexpensive commodity
hardware components. It is optimized to meet Google’s various data use and
storage requirements, such as its search engine, which generates enormous
volumes of data that must be stored.
• The Google File System capitalizes on the strengths of commercially
available servers while minimizing the effect of their hardware
weaknesses.
• GoogleFS is another name for GFS. It manages two types of data,
namely file metadata and file data.
The GFS node cluster consists of a single master and several chunk
servers that are continuously accessed by various client systems. Chunk
servers store data as Linux files on local disks. Stored data is split into
large (64 MB) chunks, and each chunk is replicated at least three times
across the network. The large chunk size reduces network overhead. GFS is
designed to meet Google’s large cluster requirements without burdening
applications. Files are stored in hierarchical directories identified by
path names. The master is in charge of managing metadata, including the
namespace, access control data, and mapping information. The master
communicates with each chunk server through periodic heartbeat messages
and keeps track of its status updates.
More than 1,000 nodes with 300 TB of disc storage capacity make up
the largest GFS clusters. This is available for constant access by hundreds
of clients.

Components of GFS
A group of computers makes up GFS. A cluster is just a group of
connected computers. There could be hundreds or even thousands of
computers in each cluster. There are three basic entities included in any
GFS cluster as follows:
GFS Clients: They can be computer programs or applications that request
files. Requests may be made to access and modify existing files or to add
new files to the system.
GFS Master Server: It serves as the cluster’s coordinator. It preserves a
record of the cluster’s actions in an operation log. Additionally, it keeps
track of the data that describes chunks, or metadata. The metadata tells
the master server which files the chunks belong to and where they fit
within the overall file.
GFS Chunk Servers: They are the GFS’s workhorses. They store file chunks of
64 MB each. The chunk servers do not send chunks to the master server;
instead, they deliver the requested chunks directly to the client.
To ensure reliability, GFS makes several copies of each chunk and stores
them on different chunk servers; the default is three copies. Each copy is
referred to as a replica.
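The division of labour between the three components can be sketched in a few
lines of Python. This is a simplified model, not Google's implementation: the
class names, the chunk handle "handle-0", the server names, and the file path
are all made up for the example, and the read is assumed to fall within a
single chunk.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, as stated above


class Master:
    """Holds only metadata: (file, chunk index) -> (handle, replica servers)."""

    def __init__(self):
        self.chunk_table = {}

    def register_chunk(self, filename, chunk_index, handle, servers):
        self.chunk_table[(filename, chunk_index)] = (handle, list(servers))

    def lookup(self, filename, chunk_index):
        return self.chunk_table[(filename, chunk_index)]


class ChunkServer:
    """Stores chunk contents keyed by chunk handle."""

    def __init__(self):
        self.chunks = {}

    def read(self, handle, offset_in_chunk, length):
        return self.chunks[handle][offset_in_chunk:offset_in_chunk + length]


def client_read(master, servers, filename, offset, length):
    """Ask the master where the chunk lives, then fetch it from a replica."""
    chunk_index, offset_in_chunk = divmod(offset, CHUNK_SIZE)
    handle, locations = master.lookup(filename, chunk_index)
    replica = servers[locations[0]]  # any replica would do; pick the first
    return replica.read(handle, offset_in_chunk, length)


master = Master()
servers = {name: ChunkServer() for name in ("cs1", "cs2", "cs3")}
for server in servers.values():
    server.chunks["handle-0"] = b"hello from chunk zero"  # three replicas
master.register_chunk("/logs/web.log", 0, "handle-0", servers)
data = client_read(master, servers, "/logs/web.log", offset=6, length=4)
```

The file bytes never pass through the Master object; it only answers the
lookup, which is why a single master can serve many clients.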
Features of GFS
• Namespace management and locking.
• Fault tolerance.
• Reduced client and master interaction because of the large chunk size.
• High availability.
• Critical data replication.
• Automatic and efficient data recovery.
• High aggregate throughput.
Advantages of GFS
• High availability: thanks to replication, data is still accessible even
if a few nodes fail. Component failures are treated as the norm rather
than the exception.
• High throughput: many nodes operate concurrently.
• Reliable storage: data that has been corrupted can be detected and
re-replicated.
Disadvantages of GFS
• Not the best fit for small files.
• The master may act as a bottleneck.
• No support for random writes.
• Suitable mainly for data that is written once and only read (or
appended to) later.

Introduction to Hadoop Distributed File System (HDFS)


As data volume and velocity grow, the data easily outgrows the storage
limit of a single machine. A solution is to store the data across a network
of machines. Such filesystems are called distributed filesystems. Since
data is stored across a network, all the complications of a network come
in. This is where Hadoop comes in. It provides one of the most reliable
filesystems. HDFS (Hadoop Distributed File System) is a filesystem design
that provides storage for extremely large files with a streaming data
access pattern, and it runs on commodity hardware. Let’s elaborate on the
terms:
Extremely large files: Here we are talking about data in the range of
petabytes (1 PB = 1000 TB).
Streaming Data Access Pattern: HDFS is designed on the principle of
write-once, read-many-times. Once data is written, large portions of the
dataset can be processed any number of times.
Commodity hardware: Hardware that is inexpensive and easily available in
the market. This is one of the features that especially distinguishes HDFS
from other file systems.
Nodes: An HDFS cluster typically consists of master and slave nodes.
NameNode (master node):
• Manages all the slave nodes and assigns work to them.
• Executes filesystem namespace operations like opening, closing, and
renaming files and directories.
• Should be deployed on reliable, high-configuration hardware, not on
commodity hardware.
DataNode (slave node):
• The actual worker nodes, which do the actual work such as reading,
writing, and processing.
• They also perform block creation, deletion, and replication upon
instruction from the master.
• They can be deployed on commodity hardware.
HDFS daemons: Daemons are processes running in the background.
NameNode daemon:
• Runs on the master node.
• Stores metadata (data about data) such as file paths, the number of
blocks, block IDs, etc.
• Requires a large amount of RAM.
• Stores metadata in RAM for fast retrieval, i.e. to reduce seek time,
though a persistent copy of it is kept on disk.
DataNode daemons:
• Run on slave nodes.
• Require a large amount of storage, as the data is actually stored here.
Data storage in HDFS: Now let’s see how the data is stored in a distributed
manner.

Let’s assume that a 100 TB file is inserted. The file is first divided into
blocks (10 TB each in this illustration; the default block size is 128 MB
in Hadoop 2.x and above), and the master node (namenode) decides where each
block is placed. These blocks are then stored across different datanodes
(slave nodes). The datanodes replicate the blocks among themselves, and the
information about which blocks they contain is sent to the master. The
default replication factor is 3, which means that 3 replicas of each block
are stored (including the original). The replication factor can be increased
or decreased in hdfs-site.xml (the dfs.replication property).
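To get a feel for the numbers, the short calculation below works out how
many blocks and replicas a 100 TB file produces with the default 128 MB
block size and a replication factor of 3. The function name hdfs_footprint
is made up for this example, and binary units (1 TB = 1024^4 bytes) are an
assumption.

```python
MB = 1024 ** 2
TB = 1024 ** 4


def hdfs_footprint(file_size, block_size=128 * MB, replication=3):
    """Return (number of blocks, number of block replicas, raw bytes stored)."""
    num_blocks = (file_size + block_size - 1) // block_size  # ceiling division
    return num_blocks, num_blocks * replication, file_size * replication


num_blocks, num_replicas, raw_bytes = hdfs_footprint(100 * TB)
print(num_blocks, num_replicas, raw_bytes // TB)
```

With these inputs the file is split into 819,200 blocks, stored as 2,457,600
block replicas that occupy 300 TB of raw disk space across the cluster.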
Terms related to HDFS:
HeartBeat: It is the signal that a datanode periodically sends to the
namenode. If the namenode doesn’t receive a heartbeat from a datanode, it
considers that datanode dead.
Balancing: If a datanode crashes, the blocks present on it are gone too, and
those blocks become under-replicated compared to the remaining blocks. The
master node (namenode) then signals the datanodes that hold replicas of the
lost blocks to replicate them, so that the overall distribution of blocks
is balanced.
Replication: It is done by the datanodes.
Note: No two replicas of the same block are present on the same datanode.
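The interplay of heartbeats, balancing, and replication can be sketched with
a toy namenode. Everything here is illustrative: the class and method names
are invented, the 30-second timeout is an assumed value rather than a Hadoop
default, and real re-replication copies block data between datanodes instead
of just updating a map.

```python
class NameNode:
    """Toy namenode: tracks heartbeats and block locations (illustrative)."""

    def __init__(self, replication=3, timeout=30.0):
        self.replication = replication
        self.timeout = timeout      # seconds without a heartbeat => dead (assumed)
        self.last_heartbeat = {}    # datanode -> time of last heartbeat
        self.block_locations = {}   # block ID -> set of datanodes holding it

    def heartbeat(self, datanode, now):
        self.last_heartbeat[datanode] = now

    def live_nodes(self, now):
        return {dn for dn, t in self.last_heartbeat.items()
                if now - t <= self.timeout}

    def rebalance(self, now):
        """Forget dead nodes and re-replicate under-replicated blocks."""
        live = self.live_nodes(now)
        for holders in self.block_locations.values():
            holders &= live
            if not holders:
                continue  # every replica is gone; nothing left to copy from
            # Never place two replicas of a block on the same datanode.
            candidates = sorted(live - holders)
            while len(holders) < self.replication and candidates:
                holders.add(candidates.pop(0))


nn = NameNode()
for dn in ("dn1", "dn2", "dn3", "dn4"):
    nn.heartbeat(dn, now=0.0)
nn.block_locations["blk_1"] = {"dn1", "dn2", "dn3"}
for dn in ("dn2", "dn3", "dn4"):  # dn1 stops sending heartbeats
    nn.heartbeat(dn, now=60.0)
nn.rebalance(now=60.0)
```

After the call to rebalance, the block that lost its replica on dn1 has been
given a new replica on dn4, restoring the replication factor of 3.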
Features:
• Distributed data storage.
• Blocks reduce seek time.
• The data is highly available as the same block is present at multiple
datanodes.
• Even if multiple datanodes are down we can still do our work, thus
making it highly reliable.
• High fault tolerance.
Limitations: Though HDFS provides many features, there are some areas
where it doesn’t work well.
Low latency data access: Applications that require low-latency access to
data, i.e. in the range of milliseconds, will not work well with HDFS,
because HDFS is designed for high data throughput, even at the cost of
latency.
Small file problem: Having lots of small files results in lots of seeks and
lots of movement from one datanode to another to retrieve each small file,
which is a very inefficient data access pattern.
Introduction to Google Cloud Bigtable
You can store terabytes or even petabytes of data in Google Cloud
Bigtable, a sparsely populated table that can scale to billions of rows and
thousands of columns. Each row is indexed by a single value, known as the
row key. Google Cloud Bigtable provides low-latency storage for massive
amounts of single-keyed data. It is an ideal data source for MapReduce
operations because it supports high read and write throughput at low
latency.
Applications can access Google Cloud Bigtable through a variety of
client libraries, including a supported extension to the Apache HBase
library for Java. Because of this, it is compatible with the existing Apache
ecosystem of open-source big data software.
Google Cloud Bigtable’s powerful backend servers offer a number of
advantages over a self-managed HBase installation, including:
Exceptional scalability: Google Cloud Bigtable scales in direct proportion
to the number of machines in your cluster. A self-managed HBase
installation has a design bottleneck that limits performance after a
certain threshold is reached. Google Cloud Bigtable does not have this
bottleneck, so you can scale your cluster up to handle more reads and
writes.
Ease of administration: Google Cloud Bigtable handles upgrades and restarts
transparently, and it automatically maintains high data durability. To
replicate your data, simply add a second cluster to your instance;
replication starts automatically. Simply define your table schemas, and
Google Cloud Bigtable will take care of the rest. There is no need to
manage replication or regions yourself.
Cluster scaling with minimal disruption: You can increase the size of a
Google Cloud Bigtable cluster for a few hours to handle a heavy load and
then reduce it again, all without any downtime. After you change a
cluster’s size, it typically takes just a few minutes under load for Google
Cloud Bigtable to balance performance across all of the nodes in your
cluster.
Why use BigTable?
Applications that require high throughput and scalability for
key/value data, where each value is typically no larger than 10 MB, should
use Google Cloud Bigtable. Google Cloud Bigtable also excels as a storage
engine for machine learning, stream processing, and batch MapReduce
operations.
All of the following forms of data can be stored in and queried using Google
Cloud Bigtable:
• Time-series information, such as CPU and memory utilization patterns
across various servers.
• Marketing information, such as consumer preferences and purchase
history.
• Financial information, including stock prices, currency exchange
rates, and transaction histories.
• Internet of Things data, such as consumption statistics from home
appliances and energy meters.
• Graph data, which includes details on the connections between users.
BigTable Storage Concept:
Google Cloud Bigtable stores data in massively scalable tables, each of
which is a sorted key/value map. A table is made up of rows, each of which
typically describes a single object, and columns, which contain individual
values for each row. Each row is indexed by a single row key, and columns
that are related to one another are typically grouped into a column family.
Each column is identified by the combination of its column family and a
column qualifier, which is a unique name within the column family.
Each row/column intersection can contain multiple cells. Each cell holds a
distinct timestamped version of the data for that row and column. Storing
multiple cells in a column preserves a history of how the recorded data for
that row and column has changed over time. Google Cloud Bigtable tables are
sparse: if a column is not used in a given row, it takes up no space.
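The storage concept above (a sorted map from row key to column family,
column qualifier, and timestamped cells) can be modelled with nested
dictionaries. This sketch does not use the Bigtable client library; the
SparseTable class, its methods, and the example row keys are invented for
illustration.

```python
from collections import defaultdict


class SparseTable:
    """Toy model of a sorted, sparse, multi-version key/value table."""

    def __init__(self):
        # row key -> {(family, qualifier): {timestamp: value}}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def write(self, row_key, family, qualifier, value, timestamp):
        self.rows[row_key][(family, qualifier)][timestamp] = value

    def read_latest(self, row_key, family, qualifier):
        cells = self.rows.get(row_key, {}).get((family, qualifier))
        if not cells:
            return None  # sparse: an unused column simply has no entry
        return cells[max(cells)]

    def scan(self, start_key, end_key):
        """Yield row keys in sorted order within [start_key, end_key)."""
        for key in sorted(self.rows):
            if start_key <= key < end_key:
                yield key


table = SparseTable()
table.write("server1#cpu", "metrics", "usage", b"41", timestamp=100)
table.write("server1#cpu", "metrics", "usage", b"57", timestamp=200)
table.write("server2#cpu", "metrics", "usage", b"12", timestamp=100)
```

Because rows are kept in row-key order, related rows can be read with one
range scan, which is why the choice of row key matters so much in practice.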

It’s important to note that data is never stored in the Google Cloud
Bigtable nodes themselves; rather, each node holds pointers to a set of
tablets that are stored on Colossus. As a result, rebalancing tablets from
one node to another is very fast, because the actual data is not copied;
Google Cloud Bigtable merely updates the pointers for each node. Recovery
from the failure of a Google Cloud Bigtable node is also quick, since only
metadata must be migrated to the replacement node, and no data is lost when
a node fails.
Load balancing
Each Google Cloud Bigtable zone is managed by a primary process, which
balances workload and data volume within clusters. This process splits
busier or larger tablets in half, merges less-accessed or smaller tablets
together, and moves tablets between nodes as needed. If a tablet
experiences a spike in traffic, Google Cloud Bigtable splits it in two and
then moves one of the new tablets to a different node. Because Google Cloud
Bigtable handles splitting, merging, and rebalancing automatically, you do
not have to manage your tablets manually.
Supported data types
For most purposes, Google Cloud Bigtable treats all data as raw
byte strings. The only time it tries to determine the type is for increment
operations, where the target must be a 64-bit integer encoded as an 8-byte
big-endian value.
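The encoding required for increment targets can be produced with Python's
standard struct module. The helper names below are made up, and treating the
integer as signed (format code "q") is an assumption of this sketch; the
increment itself is only simulated locally rather than sent to Bigtable.

```python
import struct


def encode_counter(value: int) -> bytes:
    """Encode a signed 64-bit integer as 8 big-endian bytes."""
    return struct.pack(">q", value)


def increment_cell(raw: bytes, delta: int) -> bytes:
    """Decode an 8-byte big-endian cell value, add delta, and re-encode it."""
    (current,) = struct.unpack(">q", raw)
    return encode_counter(current + delta)


cell = encode_counter(41)
cell = increment_cell(cell, 1)
```

A value stored in any other form, for example the ASCII text "41", is just
an opaque byte string and cannot be used as an increment target.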
Memory and disk usage
The sections that follow explain how various Google Cloud Bigtable features
affect the amount of memory and disk space used by your instance.
Unused columns
Columns that are not used in a Google Cloud Bigtable row take up no
space in that row. Each row is essentially a collection of key/value
entries, where the key is a combination of the column family, column
qualifier, and timestamp. If a row does not include a value for a certain
column, the key/value entry is simply not present.
Column qualifiers
Column qualifiers take up space in a row, since each column qualifier
used in a row is stored in that row. As a result, it is often efficient to
use column qualifiers as data.
Compactions
Google Cloud Bigtable periodically rewrites your tables to remove
deleted entries and to reorganize your data so that reads and writes are
more efficient. This process is called compaction. Google Cloud Bigtable
compacts your data automatically; there are no tuning options.
Mutations and deletions
Updates to a row take up extra storage space, because Google Cloud
Bigtable stores mutations sequentially and compacts them only periodically.
When Google Cloud Bigtable compacts a table, it removes values that are no
longer needed. If you change a cell’s value, both the original value and
the updated value are kept on disk until the data is compacted.
Deletions also take up extra storage space, at least initially, because
a deletion is actually a special kind of mutation. Until the table is
compacted, a deletion consumes additional storage rather than freeing up
space.
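The behaviour described above, where updates and deletions both add entries
until a compaction runs, can be illustrated with a toy append-only log. This
is a conceptual sketch only; the MutationLog class and the TOMBSTONE marker
are invented and say nothing about Bigtable's actual on-disk format.

```python
TOMBSTONE = object()  # marker written by a deletion


class MutationLog:
    """Toy append-only store: mutations pile up until compact() is called."""

    def __init__(self):
        self.entries = []  # (key, value) pairs in arrival order

    def put(self, key, value):
        self.entries.append((key, value))

    def delete(self, key):
        # A deletion is just another mutation, so it uses space at first.
        self.entries.append((key, TOMBSTONE))

    def compact(self):
        """Keep only the newest value per key and drop deleted keys."""
        latest = {}
        for key, value in self.entries:
            latest[key] = value
        self.entries = [(k, v) for k, v in latest.items()
                        if v is not TOMBSTONE]


log = MutationLog()
log.put("row1", "a")
log.put("row1", "b")   # update: the old value "a" is still stored
log.put("row2", "x")
log.delete("row2")     # deletion: adds an entry instead of freeing space
entries_before = len(log.entries)
log.compact()
entries_after = len(log.entries)
```

Four entries are held before compaction and only one afterwards, which
mirrors why storage use can temporarily grow after updates and deletions.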
Compression of data
Google Cloud Bigtable compresses your data automatically using an
intelligent algorithm. You cannot configure compression settings for your
table. However, it is useful to know how to store data so that it can be
compressed efficiently:
• Patterned data can be compressed more efficiently than random data.
Text, such as the page you’re reading right now, is a type of patterned
data.
• Compression works best when identical values are near each other,
either in the same row or in adjacent rows. If you arrange your row keys
so that rows with similar pieces of data are adjacent, the data can be
compressed efficiently.
• Compress values larger than 1 MiB before storing them in Google Cloud
Bigtable. This saves CPU cycles, server memory, and network bandwidth.
Google Cloud Bigtable automatically turns off compression for values
larger than 1 MiB.
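The difference between patterned and random data is easy to demonstrate.
The sketch below uses zlib from the Python standard library purely as a
stand-in; it is not the algorithm Google Cloud Bigtable uses, and the sample
record is made up.

```python
import os
import zlib

# Patterned data: the same short record repeated many times.
patterned = b"2024-01-01,server1,cpu=41\n" * 1000
# Random data of exactly the same length.
random_bytes = os.urandom(len(patterned))

patterned_ratio = len(zlib.compress(patterned)) / len(patterned)
random_ratio = len(zlib.compress(random_bytes)) / len(random_bytes)

print(f"patterned: {patterned_ratio:.3f}  random: {random_ratio:.3f}")
```

The repeated records shrink to a small fraction of their original size,
while the random bytes barely shrink at all, which is the reason for keeping
similar values close together.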
Data durability
When you use Google Cloud Bigtable, your data is stored on
Colossus, Google’s internal, highly durable file system, using storage
devices located in Google’s data centers. You do not need to run an HDFS
cluster or any other type of file system to use Google Cloud Bigtable.
Google uses proprietary storage methods to achieve data durability
beyond what standard HDFS three-way replication provides.
In addition, duplicate copies of your data are kept to enable disaster
recovery and protection against catastrophic events.
Consistency model
Single-cluster Google Cloud Bigtable instances provide strong
consistency.
Security
For security, you can assign IAM roles that prevent specific users from
creating new instances, reading from tables, or writing to tables. Anyone
who does not have access to your project, or who does not have an IAM role
with the necessary Google Cloud Bigtable permissions, cannot access any of
your tables.
Security can be managed at the level of projects, instances, and
tables. Google Cloud Bigtable does not support row-level, column-level, or
cell-level security restrictions.
Encryption
By default, all data stored within Google Cloud, including the data in
Google Cloud Bigtable tables, is encrypted at rest using the same hardened
key management systems that Google uses for its own encrypted data.
Customer-managed encryption keys (CMEK) give you more control over the keys
used to protect your Google Cloud Bigtable data at rest.
Backups
With Google Cloud Bigtable backups, you can save a copy of the schema and
data of a table and later restore it to a new table from the backup.
Backups help you recover from application-level data corruption and from
operator errors, such as accidentally deleting a table.
Google’s Megastore
Megastore is a storage system developed to meet the requirements of
today’s interactive online services. Megastore blends the scalability of a
NoSQL data store with the convenience of a traditional RDBMS in a novel way,
and provides both strong consistency guarantees and high availability. We
provide fully serializable ACID semantics within fine-grained partitions of
data. This partitioning allows us to synchronously replicate each write across
a wide area network with reasonable latency and support seamless failover
between data centers.
The mission
• Support Internet apps such as Google’s AppEngine.
• Scale to millions of users.
• Stay responsive to impatient users despite Internet latencies.
• Be easy for developers.
• Provide fault resilience, from drive failures to data center loss and
everything in between.
• Offer low-latency synchronous replication to distant sites.
The how
Scale by partitioning the data store and replicating each partition
separately, providing full ACID semantics within partitions but limited
consistency guarantees across them. Offer some traditional database features
if they scale with tolerable latency.
The key assumptions are that data for many apps can be partitioned,
for example by user, and that a selected set of DB features can make
developers productive.
Availability and scale: To achieve availability and global scale the designers
implemented two key architectural features:
• For availability, a synchronous, fault-tolerant log replicator optimized
for long-distance links
• For scale, data partitioned into small databases, each with its own
replicated log
Rather than implement a master/slave or optimistic replication strategy,
the team decided to use Paxos, a consensus algorithm that does not require
a master, with a novel extension. A single Paxos log would soon become a
bottleneck with millions of users, so each partition gets its own
replicated Paxos log.
Data is partitioned into entity groups, each of which is synchronously
replicated over a wide area, while the data itself is stored in NoSQL storage.
ACID transaction records within an entity group are replicated using Paxos.
For transactions across entity groups, the synchronous replication
requirement is relaxed and an asynchronous message queue is used. It is
therefore key that entity group boundaries reflect application usage and
user expectations.
Entities
An e-mail account is a natural entity group. But defining other entity
groups is more complex.
Geographic data lacks natural granularity. For example, the globe can be
divided into non-overlapping patches, each its own entity group. Changes that
span these geographic entity groups use (expensive) two-phase commits.
The design problem: entity groups large enough to make two-phase commits
uncommon but small enough to keep per-group transaction rates low.
Each entity group has a root table and may have child tables. Each child
table has a single root table. Example: a user’s root table may have each of
the user’s photo collections as a child. Most applications find natural
entity group boundaries.
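The partitioning idea can be illustrated with a minimal sketch, assuming entity groups are keyed by user. Operations inside one group commit together against that group's own log, while work that spans groups goes through an asynchronous queue. The names `EntityGroupStore`, `commit`, `send_cross_group`, and `deliver_all` are hypothetical and are not Megastore's API.

```python
from collections import defaultdict, deque


class EntityGroupStore:
    """Toy model: one append-only log per entity group (here, per user)."""

    def __init__(self):
        self.logs = defaultdict(list)   # entity group key -> committed transactions
        self.queue = deque()            # pending cross-group messages

    def commit(self, group, mutations):
        """Apply all mutations to a single group together; return the log position."""
        self.logs[group].append(list(mutations))
        return len(self.logs[group]) - 1

    def send_cross_group(self, src, dst, message):
        """Cross-group work is queued and applied later, not synchronously."""
        self.queue.append((src, dst, message))

    def deliver_all(self):
        """Drain the queue, committing each message in the receiver's group."""
        while self.queue:
            src, dst, message = self.queue.popleft()
            self.commit(dst, [("from", src, message)])


store = EntityGroupStore()
store.commit("alice", [("put", "photo/1", "beach.jpg")])
store.send_cross_group("alice", "bob", "shared album invite")
pending_before = len(store.logs["bob"])   # bob has not seen the invite yet
store.deliver_all()
pending_after = len(store.logs["bob"])    # applied once the queue drains
```

The gap between `pending_before` and `pending_after` is the relaxed consistency across groups that the text describes.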
API
The insight driving the API is that the big win is scalable performance rather
than a rich query language. Hence the focus on controlling physical locality
and hierarchical layouts.
For example, joins are implemented in application code. Queries specify scans
or lookups against particular tables and indexes. The application therefore
needs to understand the data schema to perform well.
Replication
Megastore uses Paxos to manage synchronous replication. To make Paxos
practical despite high latencies, the team developed some optimizations:
• Fast reads. Current reads are usually served from local replicas, since
most writes succeed on all replicas.
• Fast writes. Since most apps repeatedly write from the same region, the
initial writer is granted priority for further replica writes. Using local
replicas and reducing write contention for distant replicas minimizes latency.
• Replica types. In addition to full replicas, Megastore has 2 other replica
types. Witness replicas vote in Paxos rounds and store the write-ahead log
but do not store entity data or indexes, which keeps storage costs low. They
also act as tiebreakers when there isn’t a quorum. Read-only replicas are the
inverse: nonvoting replicas that contain full snapshots of the data. Their
data may be slightly stale, but they help disseminate the data over a wide
area without slowing writes.
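The roles of the three replica types can be sketched as a simple majority vote. This is only an illustration of who votes and who stores data, not the Paxos protocol itself. The replica names and the `write_accepted` helper are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    name: str
    votes: bool        # takes part in Paxos rounds
    stores_data: bool  # holds entity data and indexes


REPLICAS = [
    Replica("full-us", votes=True, stores_data=True),
    Replica("full-eu", votes=True, stores_data=True),
    Replica("witness-asia", votes=True, stores_data=False),
    Replica("readonly-sa", votes=False, stores_data=True),
]


def write_accepted(acks, replicas=REPLICAS):
    """A write needs acknowledgements from a majority of voting replicas."""
    voters = [r for r in replicas if r.votes]
    voter_acks = sum(1 for r in voters if r.name in acks)
    return voter_acks > len(voters) // 2


# One full replica is unreachable; the witness supplies the second vote.
ok_with_witness = write_accepted({"full-us", "witness-asia"})
# A read-only replica's acknowledgement does not count towards the quorum.
ok_with_readonly = write_accepted({"full-us", "readonly-sa"})
```

With three voters, two acknowledgements form a majority, so the witness can complete a quorum without storing any entity data.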
Architecture
What does Megastore look like in practice? Here’s an example.

Availability
Because local reads depend on the coordinator servers, their availability is
critical to maintaining Megastore’s performance. The coordinators use an
out-of-band protocol to track other coordinators and use Google’s Chubby
distributed lock service to obtain remote locks. If a coordinator loses a
majority of its locks, it will consider all entity groups in its purview to be
out of date until the locks are regained and the coordinator is current.
A variety of network and race conditions can affect coordinator
availability. The team believes the simplicity of the coordinator
architecture and the coordinators' light network traffic make the
availability risks acceptable.
Performance
Because Megastore is geographically distributed, application servers in
different locations may initiate writes to the same entity group
simultaneously. Only one of them will succeed and the other writers will have
to retry.
For applications that write only a few times per second per entity group,
e-mail for example, contention is insignificant.
For multiuser applications with higher write requirements, developers can
shard entity groups more finely or batch user operations into fewer
transactions. Fine-grained advisory locks and sequencing transactions are
other techniques to handle higher write loads.
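The "one writer wins, the others retry" behaviour can be sketched with a toy log in which each write names the position it expects to fill. The classes and function names are illustrative assumptions, and the model runs in a single process rather than over replicas.

```python
class EntityGroupLog:
    """Toy replicated log: a write names the position it expects to fill."""

    def __init__(self):
        self.entries = []

    def try_append(self, expected_position, value):
        """Succeed only if no other writer has filled this position first."""
        if expected_position != len(self.entries):
            return False
        self.entries.append(value)
        return True


def write_with_retry(log, value, max_attempts=5):
    """Re-read the log position and retry after losing a race."""
    for attempt in range(1, max_attempts + 1):
        position = len(log.entries)
        if log.try_append(position, value):
            return attempt
    raise RuntimeError("too much contention on this entity group")


log = EntityGroupLog()
position = len(log.entries)                     # two writers both read position 0
first = log.try_append(position, "from-us")     # wins the position
second = log.try_append(position, "from-eu")    # loses the race
attempts = write_with_retry(log, "from-eu")     # succeeds after a fresh read
```

Sharding entity groups more finely reduces how often two writers target the same log, and batching reduces how many positions are contended for.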
The real world
Megastore has been deployed for several years, and more than 100
production applications use it today. The paper provides figures on
availability and average latencies.
Introduction to AWS Simple Storage Service (AWS S3)
AWS Storage Services: AWS offers a wide range of storage services that can
be provisioned depending on your project requirements and use case. They
include options for highly confidential data, frequently accessed data, and
infrequently accessed data. You can choose among object storage, file
storage, and block storage services, as well as backup and data migration
options.
AWS Simple Storage Service (S3): S3 is the object storage service provided
by AWS. It is probably the most commonly used, go-to storage service for AWS
users, thanks to features such as extremely high availability, security, and
simple connection to other AWS services. AWS S3 suits many kinds of use
cases, such as mobile/web applications, big data, and machine learning.
AWS S3 Terminology:
Bucket: Data in S3 is stored in containers called buckets.
• Each bucket has its own set of policies and configuration, which gives
users more control over their data.
• Bucket names must be globally unique.
• A bucket can be thought of as a parent folder of data.
• There is a default limit of 100 buckets per AWS account, but it can be
increased on request to AWS support.
Bucket Owner: The person or organization that owns a particular bucket is
its bucket owner.
Import/Export Station: A machine that uploads or downloads data
to/from S3.
Key: A key, in S3, is a unique identifier for an object in a bucket. For
example, if your GFG.java file is stored in a bucket ‘ABC’
at javaPrograms/GFG.java, then ‘javaPrograms/GFG.java’ is the object key
for GFG.java.
• It is important to note that ‘bucketName+key’ is unique for all objects.
• This also means that there can be only one object for a key in a bucket. If
you upload 2 files with the same key, the file uploaded last overwrites
the one stored earlier.
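The one-object-per-key rule behaves like a dictionary keyed by the object key. The sketch below is a toy in-memory model, not the S3 API. `Bucket`, `put`, and `get` are illustrative names.

```python
class Bucket:
    """Toy in-memory bucket without versioning: one object per key."""

    def __init__(self, name):
        self.name = name
        self.objects = {}

    def put(self, key, body):
        self.objects[key] = body   # same key: the newer upload replaces the older

    def get(self, key):
        return self.objects[key]


bucket = Bucket("ABC")
bucket.put("javaPrograms/GFG.java", b"class GFG { /* v1 */ }")
bucket.put("javaPrograms/GFG.java", b"class GFG { /* v2 */ }")
current = bucket.get("javaPrograms/GFG.java")
object_count = len(bucket.objects)
```

After both uploads the bucket still holds a single object, and it is the second upload.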
Versioning: Versioning means always keeping a record of previously
uploaded files in S3. Points to note:
• Versioning is not enabled by default. Once enabled, it applies to all
objects in a bucket.
• Versioning keeps every copy of your file, so it adds the cost of storing
multiple copies of your data. For example, 10 versions of a 1 GB file are
billed as 10 GB of S3 storage.
• Versioning helps prevent unintended overwrites and deletions.
• Objects with the same key can be stored in a bucket if versioning
is enabled, since each has a unique version ID.
Null Object: The version ID of objects in a bucket where versioning is
suspended is null. Such objects may be referred to as null objects.
For buckets with versioning enabled, each version of a file has a specific
version ID.
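The storage-cost effect of versioning can be checked with a small model. This is a sketch under stated assumptions: the `VersionedBucket` class and its `v1`, `v2` style IDs are invented for illustration (real S3 version IDs are opaque strings), and 1000-byte bodies stand in for 1 GB files.

```python
import itertools


class VersionedBucket:
    """Toy model of a versioning-enabled bucket: every upload is kept."""

    def __init__(self):
        self.versions = {}                 # key -> list of (version_id, body)
        self._ids = itertools.count(1)

    def put(self, key, body):
        version_id = f"v{next(self._ids)}"
        self.versions.setdefault(key, []).append((version_id, body))
        return version_id

    def get(self, key, version_id=None):
        """Return the latest version, or a specific one if requested."""
        history = self.versions[key]
        if version_id is None:
            return history[-1][1]
        return dict(history)[version_id]

    def billed_bytes(self):
        """Every stored version counts towards the storage bill."""
        return sum(len(body) for h in self.versions.values() for _, body in h)


bucket = VersionedBucket()
first_id = bucket.put("report.csv", b"a" * 1000)
for _ in range(9):
    bucket.put("report.csv", b"b" * 1000)
total = bucket.billed_bytes()   # 10 versions x 1000 bytes each
```

Ten uploads under one key are billed as ten times the object size, and the first upload stays retrievable by its version ID.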
Object: Fundamental entity type stored in AWS S3.
Access Control Lists (ACL): A document that governs access to S3
buckets from outside your AWS account. Each bucket has its own ACL.
Bucket Policies: A document that governs access to S3 buckets from
within your AWS account; it controls which services and users have what
kind of access to your S3 bucket. Each bucket has its own bucket policy.
Lifecycle Rules: A cost-saving mechanism that can move your files to
Amazon S3 Glacier (the AWS data archive service) or to some other S3 storage
class for cheaper storage of old data, or delete the data completely after a
specified time.
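A lifecycle rule can be written down as plain data. The dictionary below follows the shape that, to the best of my knowledge, boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument. The rule ID, prefix, and day counts are made up, and `action_for_age` is a local helper for reading the rule, not an AWS call.

```python
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",      # hypothetical rule name
            "Filter": {"Prefix": "logs/"},         # hypothetical key prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}


def action_for_age(rule, age_days):
    """Local helper (not an AWS API): what the rule implies for an object's age."""
    if age_days >= rule["Expiration"]["Days"]:
        return "DELETE"
    storage_class = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            storage_class = transition["StorageClass"]
    return storage_class


rule = lifecycle_configuration["Rules"][0]
```

Under this rule an object is moved to cheaper storage at 30 and 90 days and removed after a year. Check the current AWS documentation before relying on the exact field names.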
Features of AWS S3:
Durability: AWS states that Amazon S3 is designed for 99.999999999%
durability (11 9’s) of objects over a given year. This means the chance of
losing a given object stored on S3 in a year is about one in 100 billion.
Availability: AWS S3 is designed for 99.99% up-time for the Standard
storage class.
Note that availability is about being able to access data, whereas durability
is about not losing data altogether.
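The two percentages are easier to compare once converted into concrete numbers. The sketch below does the arithmetic with exact fractions. The figure of 10 million stored objects is an assumed example.

```python
from fractions import Fraction

durability = Fraction(99_999_999_999, 100_000_000_000)   # 99.999999999 %
availability = Fraction(9_999, 10_000)                   # 99.99 %

annual_loss_probability = 1 - durability                 # per object, per year
one_in = 1 / annual_loss_probability                     # "one in N" form

objects_stored = 10_000_000                              # assumed example size
expected_losses_per_year = float(objects_stored * annual_loss_probability)

minutes_per_year = 365 * 24 * 60
allowed_downtime_minutes = float((1 - availability) * minutes_per_year)
```

Eleven nines of durability is a loss probability of 10^-11 per object per year, so 10 million objects would lose on average one object every 10,000 years. An availability of 99.99% allows roughly 52.6 minutes of downtime per year.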
Server-Side Encryption (SSE): AWS S3 supports three types of SSE
models:
SSE-S3: AWS S3 manages the encryption keys.
SSE-C: The customer provides and manages the encryption keys.
SSE-KMS: The AWS Key Management Service (KMS) manages the encryption
keys.
File size support: AWS S3 can hold objects ranging in size from 0 bytes to 5
terabytes. A 5 TB limit on object size is not a blocker for most
applications.
Unlimited storage space: AWS S3 places no limit on the total amount of data
you can store, which lets it scale to all kinds of use cases.
Pay as you use: Users are charged according to the S3 storage they
use.
Region-specific: Each S3 bucket is created in a specific AWS Region.
S3 storage classes:
AWS S3 provides multiple storage classes that offer different performance,
features, and cost structures.
Standard: Suitable for frequently accessed data that needs to be highly
available and durable.
Standard Infrequent Access (Standard IA): A cheaper storage class that, as
the name suggests, is best suited to infrequently accessed data such as log
files or data archives. Note that there may be a per-GB data retrieval fee
associated with the Standard IA class.
Intelligent Tiering: This storage class automatically classifies your files
as frequently or infrequently accessed and stores the infrequently
accessed data in infrequent access storage to save costs. It is useful when
access patterns to an S3 bucket are unpredictable.
One Zone Infrequent Access (One Zone IA): Most S3 storage classes store
copies of your files in a minimum of 3 Availability Zones. One Zone IA stores
the data in a single Availability Zone. It is recommended only for
infrequently accessed, non-essential data. There may be a per-GB cost for
data retrieval.
Reduced Redundancy Storage (RRS): All the other S3 classes are designed for
99.999999999% durability. RRS is designed for only 99.99% durability. AWS
no longer recommends RRS because of its lower durability. However, it can be
used to store non-essential data.
What is cloud security?
Preparing your business for future success starts with switching from
on-premises hardware to the cloud for your computing needs. The cloud gives
you access to more applications, improves data accessibility, helps your team
collaborate more effectively, and provides easier content management.
Some people may have reservations about switching to the cloud due to
security concerns, but a reliable cloud service provider (CSP) can put your
mind at ease and keep your data safe with highly secure cloud services.
Find out more about what cloud security is, the main types of cloud
environments you'll need security for, the importance of cloud security, and
its primary benefits.
Definition of cloud security
Cloud security, also known as cloud computing security, is a collection
of security measures designed to protect cloud-based infrastructure,
applications, and data. These measures ensure user and
device authentication, data and resource access control, and data
privacy protection. They also support regulatory data compliance. Cloud
security is employed in cloud environments to protect a company's data from
distributed denial of service (DDoS) attacks, malware, hackers, and
unauthorized user access or use.
Types of cloud environments
When you're looking for cloud-based security, you'll find three main
types of cloud environments to choose from. The top options on the market
include public clouds, private clouds, and hybrid clouds. Each of these
environments has different security concerns and benefits, so it's important
to know the difference between them:
1. Public clouds
Public cloud services are hosted by third-party cloud service providers.
A company doesn't have to set up anything to use the cloud, since the provider
handles it all. Usually, clients can access a provider's web services via web
browsers. Security features, such as access control, identity management,
and authentication, are crucial to public clouds.

2. Private clouds
Private clouds are typically more secure than public clouds, as they're
usually dedicated to a single group or user and rely on that group or user's
firewall. The isolated nature of these clouds helps them stay secure from
outside attacks since they're only accessible by one organization. However,
they still face security challenges from some threats, such as social
engineering and breaches. These clouds can also be difficult to scale as your
company's needs expand.
3. Hybrid clouds
Hybrid clouds combine the scalability of public clouds with the greater
control over resources that private clouds offer. These clouds connect multiple
environments, such as a private cloud and a public cloud that can scale more
easily based on demand. Successful hybrid clouds allow users to access all
their environments in a single integrated content management platform.
Why is cloud security important?
Cloud security is critical since most organizations are already using
cloud computing in one form or another. This high rate of adoption of public
cloud services is reflected in Gartner’s recent prediction that the worldwide
market for public cloud services will grow 23.1% in 2021.
IT professionals remain concerned about moving more data and
applications to the cloud due to security, governance, and compliance issues
when their content is stored in the cloud. They worry that highly sensitive
business information and intellectual property may be exposed through
accidental leaks or due to increasingly sophisticated cyber threats.
Cloud security benefits
Security in cloud computing is crucial to any company looking to keep
its applications and data protected from bad actors. Maintaining a strong
cloud security posture helps organizations achieve the now widely recognized
benefits of cloud computing. Cloud security comes with its own advantages
as well, helping you achieve lower upfront costs, reduced ongoing operational
and administrative costs, easier scaling, increased reliability and availability,
and improved DDoS protection.
Here are the top security benefits of cloud computing:
1. Lower upfront costs
One of the biggest advantages of using cloud computing is that you
don't need to pay for dedicated hardware. Not having to invest in dedicated
hardware helps you initially save a significant amount of money and can also
help you upgrade your security. CSPs will handle your security needs
proactively once you've hired them. This helps you save on costs and reduce
the risks associated with having to hire an internal security team to safeguard
dedicated hardware.
2. Reduced ongoing operational and administrative expenses
Cloud security can also lower your ongoing administrative and
operational expenses. A CSP will handle all your security needs for you,
removing the need to pay for staff to provide manual security updates and
configurations. You can also enjoy greater security, as the CSP will have
expert staff able to handle any of your security issues for you.

3. Increased reliability and availability
You need a secure way to immediately access your data. Cloud security
ensures your data and applications are readily available to authorized users.
You'll always have a reliable method to access your cloud applications and
information, helping you quickly take action on any potential security issues.
4. Centralized security
Cloud computing gives you a centralized location for data and
applications, with many endpoints and devices requiring security. Security
for cloud computing centrally manages all your applications, devices, and
data to ensure everything is protected. The centralized location allows cloud
security companies to more easily perform tasks, such as implementing
disaster recovery plans, streamlining network event monitoring, and
enhancing web filtering.
5. Greater ease of scaling
Cloud computing allows you to scale with new demands, providing more
applications and data storage whenever you need it. Cloud security easily
scales with your cloud computing services. When your needs change, the
centralized nature of cloud security allows you to easily integrate new
applications and other features without sacrificing your data's safety. Cloud
security can also scale during high traffic periods, providing more security
when you upgrade your cloud solution and scaling down when traffic
decreases.
6. Improved DDoS protection
Distributed Denial of Service (DDoS) attacks are some of the biggest
threats to cloud computing. These attacks aim a lot of traffic at servers at
once to cause harm. Cloud security protects your servers from these attacks
by monitoring and dispersing them.
What is a Privacy Impact Assessment (PIA)?
A Privacy Impact Assessment (PIA) is a type of impact assessment
conducted by an organization, such as a government agency or corporation, to
determine the impact that new technology projects, initiatives, or proposed
programs and policies might have on the privacy of individuals. It sets out
recommendations for managing, minimizing, or eliminating that impact. A key
goal of the PIA is to effectively communicate the privacy risks of new
technology initiatives. It also provides decision-makers with the information
necessary to make informed policy decisions based on an understanding of
the privacy risks and the options available for mitigating those risks. PIA
provides benefits to various stakeholders, including the initiating organization
itself as well as its customers. PIA implementations can help build trust with
stakeholders and the user community by demonstrating due diligence and
compliance with privacy best practices. Privacy issues that are not adequately
addressed can impact the community’s trust in an organization, project, or
policy.
Key benefits include:
• Reduces future costs in legal expenses, damage to reputation, and
potential negative publicity, by considering privacy issues early in a
project.
• Demonstrates to employees, contractors, customers, and citizens that your
organization is committed to protecting and upholding privacy rights.
• Provides a way to detect and mitigate privacy problems before they
occur to avoid costly or embarrassing privacy mistakes.
• Promotes and demonstrates awareness and understanding of privacy
issues within your organization.
• Provides evidence that an organization attempted to prevent privacy
risks.
• Helps your organization gain public trust and confidence.
When to Undertake a Privacy Impact Assessment?
The very first step of the PIA process is to determine whether it is
required. If an organization discovers that there is the potential that a project
they are about to undertake has a high risk of impact on user privacy, it
should carry out a privacy impact assessment. This will ensure that privacy
risks and impacts that may be associated with the project are identified and
mitigated.
4 components of the PIA process
The PIA process described in the various PIA frameworks and guidelines can be
summarized in the following four core components:
Project Initiation: The project initiation phase is the first phase of the PIA
process life cycle. It is at this stage that the actual scope of the PIA process is
determined and defined. If the details of the project are not yet clear at this
stage, the organization may choose to do a preliminary PIA to determine the
type of personal information involved in the project. The preliminary analysis
is important at the early stage of the project so that the resulting PIA process
can be considered in the project execution. Once the details of the project
become clearer and you discover that there is the potential that the project
has a high risk of impact on user privacy, you can then move on with a full
PIA.
Data Flow Analysis: The data flow analysis phase involves a description and
analysis of the detailed data flow, architecture, business processes, and
mapping out how the personal information flows through the organization as
a result of the intended technology implementation. The purpose of mapping
information flows is to describe how your project deals with personal
information. A clearly mapped information flow helps to identify privacy
issues in the PIA process. Diagrams and tables are usually used to depict the
flow of personal information, and the different types of personal information
to be used in the project.
Privacy Analysis: The privacy analysis phase examines the data flows in
the context of applicable privacy policies and legislation in order to expose
gaps that may lead to a breach of user privacy and public trust. Questionnaires are
usually used as a checklist that facilitates the identification of major privacy
risks and issues associated with the intended project. The questionnaires are
to be completed by the personnel involved with the movement of personal
information. This helps to gauge conformity with relevant privacy regulations
and other privacy best practices and to bring to the attention of the project
board or decision-makers any privacy issues directly or indirectly associated
with the project that may raise public concerns.
Privacy Impact Assessment Report: This is the final and most critical
component of the privacy impact assessment process. A privacy impact report
seeks to identify and document the privacy risks and their implications,
along with a discussion of possible remedies or mitigation plans. The report
should be submitted to the project board or project steering committee for
review and approval. The approved PIA report serves as an effective
communications tool that demonstrates a commitment to transparency and shows
that the project has been designed with privacy in mind.
Operating System Security
Protection refers to a mechanism that controls the access of programs,
processes, or users to the resources defined by a computer system. We can
take protection as a helper to multiprogramming operating systems so that
many users might safely share a common logical namespace such as a
directory or files.
Security can be attacked in the following ways:
• Authorization
• Browsing
• Trap doors
• Invalid Parameters
• Line Tapping
• Electronic Data Capture
• Lost Line
• Improper Access Controls
• Waste Recovery
• Rogue Software
Operating systems employ security and protection measures to prevent a
person from illegally using resources in a computer system, or interfering
with them in any manner. These measures
ensure that data and programs are used only by authorized users and only
in a desired manner, and that they are neither modified nor denied to
authorized users. Security measures deal with threats to resources that
come from outside a computer system, while protection measures deal with
internal threats. Passwords are the principal security tool.
A password requirement thwarts attempts by unauthorized persons to
masquerade as legitimate users of a system. The confidentiality of passwords
is upheld by encryption. Computer users need to share data and programs
stored in files with collaborators, and here is where an operating system’s
protection measures come in.
Security measures guard a user’s data and programs against interference
from persons or programs outside the operating system; we broadly refer to
such persons and their programs as nonusers.
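In practice, systems usually keep password confidentiality by storing a salted one-way hash rather than the password itself. The sketch below shows that approach with Python's standard library. The iteration count and the example password are assumptions chosen for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000   # assumed work factor; tune for your hardware


def hash_password(password, salt=None):
    """Return (salt, digest) using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


# Only the salt and digest are stored; the password itself is not kept.
salt, stored = hash_password("correct horse battery staple")
```

A login attempt is then checked with `verify_password(attempt, salt, stored)`, so someone who reads the stored record still has to guess the password.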
Buffer Overflow Technique
The buffer overflow technique can be employed to force a server
program to execute intruder-supplied code and so breach the host computer
system’s security. It has been used to devastating effect in mail servers
and Web servers.
How can a buffer overflow be used to launch a security attack?
1. The stack grows downward, i.e., toward smaller addresses in
memory. Consider its state just before the currently
executing function calls the function sample.
2. The code of the calling function pushes a return address and two
parameters of sample onto the stack. Each of these occupies four
bytes.
3. The code of sample allocates the variable beta and other variables on
the stack. Notice that the start address of beta is at the low end of
the memory allocated to it. The end address of beta adjoins the last
byte of the parameters.
4. The function sample copies 412 bytes into the variable beta. The first
408 bytes contain code whose execution would cause a security
violation. Bytes 409–412 contain the start address of this code. These
four bytes overwrite the return address in the stack.
5. The function sample executes a return statement. Control is
transferred to the address found in the stack entry that is expected to
contain the return address. Effectively, the code in variable beta is
invoked. It executes with the privileges of the calling function.
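The byte arithmetic in the steps above can be checked with a harmless simulation that models the stack frame as a byte array. It assumes beta occupies 400 bytes, which is what the 412-byte copy implies given 8 bytes of parameters and a 4-byte return address. The addresses and function names are made up, and nothing is executed.

```python
BETA_SIZE = 400      # assumed: 412 - 8 bytes of parameters - 4 bytes of return address
PARAM_BYTES = 8      # two 4-byte parameters
RETURN_BYTES = 4

# Offsets are measured upward from the start (lowest address) of beta.
RETURN_OFFSET = BETA_SIZE + PARAM_BYTES
FRAME_SIZE = RETURN_OFFSET + RETURN_BYTES

LEGIT_RETURN = (0x08048000).to_bytes(4, "little")   # made-up caller address
BETA_ADDRESS = (0xBFFFF000).to_bytes(4, "little")   # made-up address of beta


def new_frame():
    """Build the frame as it looks after sample allocates beta."""
    frame = bytearray(FRAME_SIZE)
    frame[RETURN_OFFSET:] = LEGIT_RETURN
    return frame


def unchecked_copy(frame, data):
    """Model a copy into beta that never checks beta's size."""
    frame[0:len(data)] = data


def checked_copy(frame, data):
    """The fix: refuse input that does not fit in beta."""
    if len(data) > BETA_SIZE:
        raise ValueError("input larger than beta")
    frame[0:len(data)] = data


def return_address(frame):
    return bytes(frame[RETURN_OFFSET:RETURN_OFFSET + RETURN_BYTES])


frame = new_frame()
payload = b"A" * 408 + BETA_ADDRESS      # 408 filler bytes, then 4 address bytes
unchecked_copy(frame, payload)
hijacked = return_address(frame) == BETA_ADDRESS
```

Bytes 409 to 412 of the input land exactly on the return-address slot, which is why a length check on the copy defeats the attack.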
1. Authorization: This means verifying access to system resources.
Intruders may guess or steal a password and use it. An intruder may use a
vendor-supplied password that is meant to be used by the system
administrator, or may find a password by trial and error. If a user logs
on and then leaves for a break, an intruder may use the terminal. An
intruder can also write a dummy login program that fools the user and
collects information for later use.
2. Authentication: Authentication is verification of a user’s identity.
Operating systems most often perform authentication by knowledge.
That is, a person claiming to be some user X is called upon to exhibit
some knowledge shared only between the OS and user X, such as a
password.
3. Browsing: File permissions are often very permissive, so an intruder can
easily browse system files. In this way the intruder may reach databases
and read confidential information.
4. Trap doors: Software designers sometimes want to modify their
programs after installation. For that purpose programmers keep secret
entry points that do not require any permission. These are called trap
doors. Intruders can use these trap doors.
5. Invalid Parameters: Passing invalid parameters can cause a security
violation.
6. Line Tapping: A tap on a communication line can be used to access or
modify confidential data.
7. Electronic data capture: Using wiretaps or a mechanism that picks up
screen radiation to recognize what is displayed on screen is termed
electronic data capture.
8. Lost Line: In networking, the line may get lost. In that case some
operating systems log the user out and allow access only after the user's
identity is verified again. Some operating systems cannot do this, so the
process is left floating and allows an intruder to access data.
9. Improper Access Controls: Some administrators may not plan all access
rights carefully, so some users may have too much access and others too
little.
10. Waste Recovery: When a block is deleted, its contents remain intact
until the block is allocated to another file. An intruder may use some
mechanism to scan these blocks.
11. Rogue Software: Programs written to create mischief. Some of the
programs under this are as follows:
Virtual Machine Security in Cloud
The term “Virtualized Security,” sometimes known as “security
virtualization,” describes security solutions that are software-based and
created to operate in a virtualized IT environment. This is distinct from
conventional hardware-based network security, which is static and is
supported by equipment like conventional switches, routers, and firewalls.
Virtualized security is flexible and adaptive, in contrast to hardware-based
security. It can be deployed anywhere on the network and is frequently
cloud-based so it is not bound to a specific device.
In cloud computing, where operators create workloads and
applications on demand, virtualized security enables security services and
functions to move around with those dynamically created workloads. This is
crucial for virtual machine security. It also matters for tasks such as
isolating multitenant setups in public cloud settings. Because data and
workloads move around a complex ecosystem involving several providers,
virtualized security’s flexibility is useful for securing hybrid and
multi-cloud settings.
Types of Hypervisors
Type-1 Hypervisors
It runs directly on bare-metal systems. Type 1 hypervisors
include LynxSecure, RTS Hypervisor, Oracle VM, Sun xVM Server, and
VirtualLogix VLX. Since they are placed on bare systems, type 1 hypervisors
do not have any host operating system.
Type-2 Hypervisor
It is a software interface that simulates the hardware that a system
typically communicates with, and it runs on top of a host operating system.
Examples of Type 2 hypervisors include VMware Fusion, Virtual Server
2005 R2, Windows Virtual PC, and VMware Workstation 6.0. Microsoft Hyper-V
and KVM, which are sometimes listed here, are generally classed as Type 1,
and containers are not hypervisors at all.
Type I Virtualization
In this design, the Virtual Machine Monitor (VMM) sits directly above
the hardware and intercepts all interactions between the VMs and the
hardware. On top of the VMM is a management VM that manages the other guest
VMs and handles most of the hardware interactions. The
Xen system is a common illustration of this kind of virtualization design.
Type II virtualization
Architectures of this kind, such as VMware Player, run the VMM as an
application within the host operating system (OS). I/O drivers
and guest VM management are the responsibilities of the host OS.
Service Provider Security
The system’s virtualization hardware shouldn’t be physically
accessible to anyone who is not authorized. Each VM can be given an access
control that can only be established through the Hypervisor, in order to
safeguard it against unwanted access by Cloud administrators. The three
fundamental tenets of access control (identity, authentication, and
authorization) prevent administrators from accessing data and system
components without authorization.
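The three tenets can be sketched as separate checks: a claimed identity, proof of that identity, and a permission lookup for the requested action. The credential table, VM name, and function names below are hypothetical. A real hypervisor would keep such records in protected storage and would not store tokens in plain text.

```python
import hmac

# Hypothetical records for illustration only.
CREDENTIALS = {"alice": "s3cret-token", "carol": "other-token"}
VM_PERMISSIONS = {"vm-42": {"alice": {"start", "stop"}}}


def authenticate(identity, token):
    """Authentication: does the caller prove the identity it claims?"""
    expected = CREDENTIALS.get(identity)
    return expected is not None and hmac.compare_digest(expected, token)


def authorize(identity, vm, action):
    """Authorization: may this identity perform the action on the VM?"""
    return action in VM_PERMISSIONS.get(vm, {}).get(identity, set())


def request(identity, token, vm, action):
    """Grant access only when both checks pass."""
    return authenticate(identity, token) and authorize(identity, vm, action)
```

In this model an administrator who authenticates correctly but has no entry for a VM, such as `carol` on `vm-42`, is still refused.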
Hypervisor Security
The Hypervisor’s code integrity is protected by a technology called
HyperSafe. It extends the hypervisor implementation to lock down
write-protected memory pages and so prohibits code changes. By restricting
access to its code, it defends the Hypervisor from control-flow hijacking
threats. The only way to carry out a VM escape attack is through a local
physical setting. Therefore, insider attacks must be prevented in the
physical Cloud environment. Additionally, the host OS and the interaction
between the guest machines need to be configured properly.
Virtual Machine Security
The administrator must set up a program or application that prevents
virtual machines from consuming additional resources without permission.
Additionally, a lightweight process that gathers logs from the VMs and
monitors them in real time, so that any VM tampering can be repaired, must
run on each virtual machine. Security best practices must be used to harden
the guest OS and any running applications. These practices include setting up
firewalls, host intrusion prevention systems (HIPS), anti-virus and
anti-spyware programs, web application protection, and log monitoring in
guest operating systems.
Guest Image Security
A policy to control the creation, use, storage, and deletion of images
must be in place for organizations that use virtualization. To find viruses,
worms, spyware, and rootkits that hide from security software running in a
guest OS, image files must be analyzed.
Benefits of Virtualized Security
Virtualized security is now practically required to meet the intricate
security requirements of a virtualized network, and it is also more adaptable
and effective than traditional physical security.
Cost-Effectiveness: Cloud computing’s virtual machine security enables
businesses to keep their networks secure without having to significantly
raise their expenditures on pricey proprietary hardware. Usage-based
pricing for cloud-based virtualized security services can result in significant
savings for businesses that manage their resources effectively.
Flexibility: It is essential in a virtualized environment that security
operations can follow workloads wherever they go. A company is able to
profit fully from virtualization while simultaneously maintaining data
security thanks to the protection it offers across various data centers, in
multi-cloud, and hybrid-cloud environments.
Operational Efficiency: Virtualized security can be deployed more quickly
and easily than hardware-based security because it doesn’t require IT
teams to set up and configure several hardware appliances. Instead, they
can quickly scale security systems by setting them up using centralized
software. Security-related duties can also be automated, which frees up
more time for IT employees.
Regulatory Compliance: Virtual machine security in cloud computing is a
requirement for enterprises that need to maintain regulatory compliance
because traditional hardware-based security is static and unable to keep up
with the demands of a virtualized network.
Virtual Machine Security Challenges
In a cloud context, more recent attacks may be carried out through VM
rootkits, hypervisor malware, or guest hopping and hijacking.
Man-in-the-middle attacks against VM migrations are another form of attack.
Passive attacks typically steal passwords or sensitive information. Active
attacks may alter the kernel’s data structures, seriously harming cloud
servers.
An intrusion detection system (IDS) can be host-based (HIDS) or
network-based (NIDS). Program shepherding can be used to supervise and check
the execution of code. Further protective solutions include the RIO dynamic
optimization infrastructure, the vSafe and vShield tools from VMware,
security compliance for hypervisors, and Intel vPro technology.
Four Steps to Ensure VM Security in Cloud Computing
Protect Hosted Elements by Segregation
To secure virtual machines in cloud computing, the first step is to
segregate the newly hosted components. For example, three features that
currently run on an edge device could be placed in the cloud either as part
of a private subnetwork that is invisible to users or as part of the service
data plane, with addresses that are accessible to network users.
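The two placement options in this example can be sketched with Python's standard `ipaddress` module. The subnet, the component names, and the addresses are assumptions invented for illustration. The check only reports whether a component's address falls inside the private subnetwork or is exposed on the service data plane.

```python
import ipaddress

# Hypothetical address plan: this private subnet is illustrative only.
PRIVATE_SUBNET = ipaddress.ip_network("10.0.5.0/24")

# Three hypothetical features moved from an edge device into the cloud.
components = {
    "feature-a": "10.0.5.10",
    "feature-b": "10.0.5.11",
    "feature-c": "203.0.113.20",  # documentation-range address, reachable by users
}


def placement(address: str) -> str:
    """Classify an address as private subnetwork or service data plane."""
    ip = ipaddress.ip_address(address)
    if ip in PRIVATE_SUBNET:
        return "private subnetwork"
    return "service data plane"


for name, address in components.items():
    print(f"{name}: {placement(address)}")
```

Listing the placements this way makes it easy to review which hosted components are visible to network users before the service goes live.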
All Components are Tested and Reviewed
The second step of cloud-virtual security is to confirm that virtual
features and functions comply with security standards before they are
allowed to be deployed. Virtual networking is subject to outside attacks,
which can be dangerous, but insider attacks can be disastrous. When a
feature with a backdoor security flaw is added to a service, it becomes part
of the service’s infrastructure and is far more likely to have unprotected
attack paths to other infrastructure elements.
Separate Management APIs to Protect the Network
The third step is to isolate the service from infrastructure management
and orchestration. Because management APIs are created to control features,
functions, and service behaviors, they will always pose a significant risk.
All such APIs should be protected, and those that oversee infrastructure
elements must be protected so that service users can never access them.
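The separation described in this step can be sketched as a default-deny permission table that keeps infrastructure management APIs out of reach of service users. The role names and API names are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical API names, grouped by the plane they belong to.
SERVICE_APIS = {"get_usage", "update_profile"}
INFRASTRUCTURE_APIS = {"migrate_vm", "resize_cluster"}

# Hypothetical roles: each role may call only the APIs of its own plane.
ALLOWED_APIS = {
    "service_user": SERVICE_APIS,
    "infrastructure_operator": INFRASTRUCTURE_APIS,
}


def is_call_permitted(role: str, api_name: str) -> bool:
    """Default deny: unknown roles and unlisted APIs are rejected."""
    return api_name in ALLOWED_APIS.get(role, set())


print(is_call_permitted("service_user", "get_usage"))
print(is_call_permitted("service_user", "migrate_vm"))
print(is_call_permitted("infrastructure_operator", "migrate_vm"))
```

In practice the two groups of APIs would also be exposed on separate network endpoints, so that a mistake in the permission table alone does not expose infrastructure functions.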
Security Issues in Cloud Computing:
Cloud computing provides many advantages, but it also raises security
issues. Some of these issues are described below.
Data Loss –
Data loss is one of the issues faced in cloud computing. It is also known
as data leakage. Our sensitive data is in the hands of somebody else, and we
do not have full control over our database. So, if hackers break the
security of the cloud service, they may gain access to our sensitive data or
personal files.
Interference of Hackers and Insecure APIs –
Cloud services are delivered over the Internet, and the easiest way to
communicate with the cloud is through an API. So it is important to protect
the interfaces and APIs that are used by external users. In addition, a few
cloud services are available in the public domain. These are the vulnerable
part of cloud computing because third parties may be able to access them.
User Account Hijacking –
Account hijacking is the most serious security issue in cloud computing.
If the account of a user or an organization is hijacked by a hacker, the
hacker has full authority to perform unauthorized activities.
Changing Service Provider –
Vendor lock-in is also an important security issue in cloud computing.
Many organizations face problems while shifting from one vendor to another.
For example, an organization that wants to shift from AWS Cloud to Google
Cloud Services has to move all of its data, and because the two cloud
services use different techniques and functions, it faces problems with
those as well.
Lack of Skill –
Working with the cloud, shifting to another service provider, adding an
extra feature, and learning how to use a feature are the main problems for
an IT company that does not have skilled employees. Working with cloud
computing therefore requires skilled staff.
Denial of Service (DoS) attack –
This type of attack occurs when the system receives too much traffic. DoS
attacks mostly target large organizations such as those in the banking and
government sectors. When a DoS attack occurs, services become unavailable
and data may be lost, and recovery requires a great amount of money as well
as time.
Data Breach
A data breach occurs when confidential data is viewed, accessed, or stolen
by a third party without authorization, for example when hackers break into
an organization's data.
Vendor lock-in
Vendor lock-in is one of the biggest security risks in cloud computing.
Organizations may face problems when transferring their services from one
vendor to another. Because different vendors provide different platforms,
moving from one cloud to another can be difficult.
Increased complexity strains IT staff
Migrating, integrating, and operating cloud services is complex for IT
staff. They need additional capabilities and skills to manage, integrate,
and maintain data in the cloud.
Spectre & Meltdown
Spectre and Meltdown are vulnerabilities that allow programs to view and
steal data that is currently being processed on a computer. They affect
personal computers, mobile devices, and the cloud. They can expose passwords
and personal information such as images, emails, and business documents held
in the memory of other running programs.
Denial of Service (DoS) attacks
Denial of service (DoS) attacks occur when the system receives more traffic
than the server can buffer. DoS attackers mostly target the web servers of
large organizations such as banks, media companies, and government
organizations. Recovering from an attack, including any lost data, costs the
organization a great deal of time and money.
Account hijacking
Account hijacking is a serious security risk in cloud computing. It is the
process in which an individual user's or an organization's cloud account
(bank account, e-mail account, or social media account) is stolen by
hackers. The hackers use the stolen account to perform unauthorized
activities.
Difference between Cloud Computing and Distributed Computing:
S.No. | CLOUD COMPUTING | DISTRIBUTED COMPUTING
01. | Cloud computing refers to providing on-demand IT resources/services such as servers, storage, databases, networking, analytics, and software over the internet. | Distributed computing refers to solving a problem over distributed autonomous computers that communicate with each other over a network.
02. | In simple terms, cloud computing is a computing technique that delivers hosted services over the internet to its users/customers. | In simple terms, distributed computing is a computing technique that allows multiple computers to communicate and work together to solve a single problem.
03. | It is classified into 4 types: Public Cloud, Private Cloud, Community Cloud, and Hybrid Cloud. | It is classified into 3 types: Distributed Computing Systems, Distributed Information Systems, and Distributed Pervasive Systems.
04. | Benefits of cloud computing include cost effectiveness, elasticity, reliability, economies of scale, and access to the global market. | Benefits of distributed computing include flexibility, reliability, and improved performance.
05. | Cloud computing provides services such as hardware, software, and networking resources through the internet. | Distributed computing helps complete computational tasks faster than a single computer, which would take a lot of time.
06. | The goal of cloud computing is to provide on-demand computing services over the internet on a pay-per-use model. | The goal of distributed computing is to distribute a single task among multiple computers and to solve it quickly by maintaining coordination between them.
07. | Characteristics of cloud computing include a shared pool of configurable computing resources, on-demand service, pay per use, and provisioning by service providers. | Characteristics of distributed computing include distributing a single task among computers so that the work progresses at the same time, and the use of Remote Procedure Calls and Remote Method Invocation for distributed computations.
08. | Disadvantages of cloud computing include less control (especially in public clouds), possible restrictions on available services, and cloud security concerns. | Disadvantages of distributed computing include the chance of node failures and communication problems caused by slow networks.
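The goal stated in row 06 for distributed computing, splitting a single task among several workers and combining their results, can be sketched as follows. This is a simplified illustration under stated assumptions: threads on one machine stand in for separate computers, and the function name `distributed_sum` and the chunking scheme are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def distributed_sum(numbers: List[int], workers: int = 4) -> int:
    """Split one summation task into chunks, sum each chunk on a worker, then combine."""
    # Ceiling division so that every element lands in exactly one chunk.
    chunk_size = max(1, -(-len(numbers) // workers))
    chunks = [numbers[i:i + chunk_size] for i in range(0, len(numbers), chunk_size)]

    # Each worker computes a partial result independently of the others.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))

    # Coordination step: combine the partial results into the final answer.
    return sum(partial_sums)


print(distributed_sum(list(range(1, 101))))
```

In a real distributed system the chunks would be sent to other machines over the network, for example through the Remote Procedure Calls mentioned in row 07, and the coordinator would also have to handle node failures.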
