Lecture 1 - 8 Introduction

The document outlines an introductory lecture on cloud computing, covering its definition, architecture, service models, and historical evolution. It discusses distributed computing, its motivations, properties, and applications, including examples like SETI@Home and Hadoop. Additionally, it highlights the benefits of HPC clusters, utility computing, and the importance of grid computing in scientific collaboration and resource sharing.

Lecture One

Introduction to the Course and to Cloud Computing

ODM&C Introduction
Introduction and course overview

What is the Cloud

Virtualization and Containers

The Cloud Computing architecture and service models

Data Management in the Cloud

Cloud security and economics

...and examples based on public clouds and scientific use cases.

Cloud Computing: an historical view

The Cloud Computing concept is the result of the evolution of computing, driven by technology improvements and by user requirements.

[Diagram: evolution of computing, from bottom to top: Centralized computing (mainframe) → Distributed computing → Cluster computing and Grid computing → Utility computing → HPC and HTC → Cloud Computing]

What is computing

“Computing is the process of using computer technology to complete a given goal-oriented task. [...] Computing may encompass the design and development of software and hardware systems for a broad range of purposes”

(Association for Computing Machinery, 2005)

Computing as a “numerical laboratory”

Each scientific instrument is critically dependent on computing for sensor control, data processing, international collaboration, and access.
Computational modeling and data analytics are applicable to all areas of science and engineering.
Computing captures and analyzes the torrent of experimental data being produced by a new generation of scientific instruments.

Distributed Computing

From a single computer to a “network” of collaborating systems.

“A distributed system is a collection of autonomous computers that are interconnected with each other and cooperate, thereby sharing resources such as printers and databases” (C. Leopold)

We first introduce the role of the network as the glue that binds multiple resources together.

Distributed Computing Motivations

Some applications are inherently distributed problems (they are solved most easily by the means of distributed computing)
Compute-intensive problems where communication is limited (High Throughput Computing)
Data-intensive problems: computing tasks deal with a large amount of data or very large data items
Distributed computing allows for “scavenging”: by integrating computers into a distributed system, excess computing power can be made available to other users or applications (e.g. Condor)
Robustness: no single point of failure
...and more
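The high-throughput pattern above (many independent tasks with little or no communication) can be sketched with Python's standard library. A real HTC system such as HTCondor would scavenge idle machines on a network for the same tasks; here a local process pool stands in for the cluster, and the workload function is invented for illustration.

```python
# Minimal sketch of the high-throughput pattern: many independent
# tasks, no communication between them. A real HTC system (e.g.
# HTCondor) would farm these out to idle machines on a network;
# here a process pool on one machine stands in for the cluster.
from concurrent.futures import ProcessPoolExecutor

def cpu_bound_task(n: int) -> int:
    """An embarrassingly parallel unit of work (hypothetical workload)."""
    return sum(i * i for i in range(n))

def run_campaign(inputs):
    # Each task is independent, so it can run in any order, anywhere.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(cpu_bound_task, inputs))

if __name__ == "__main__":
    print(run_campaign([10, 100, 1000]))  # [285, 328350, 332833500]
```

Because the tasks share nothing, losing one worker only loses its task, which is the robustness property listed above.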

Distributed computing properties

Fault tolerance
if a node fails, the whole system still works
each node plays a partial role (partial inputs and outputs)
node status is monitored
Resource sharing
Load sharing and balancing
computation is distributed over different nodes to spread the load across the whole system
Easy to expand (scalability)
Improved performance

Distributed computing Architecture
“interconnect processes running on different CPUs with some sort of communication system.”

Client-server: resource management is centralized at a server
3-tier architecture: the client intelligence is moved to a middle tier to simplify application development
Peer-to-peer: responsibilities are uniformly divided among all machines, known as peers, which serve as both clients and servers
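The client-server model can be illustrated with Python's standard library: a server centralizes a trivial resource (here, an invented "uppercase" service) and a client requests it over a socket. The service, addresses, and function names are assumptions for the example, not from the lecture.

```python
# Sketch of the client-server model: the resource (a trivial
# uppercase service) is managed centrally at a server; clients
# connect over the network to use it.
import socket
import threading

def server(sock: socket.socket) -> None:
    conn, _ = sock.accept()          # wait for one client
    with conn:
        data = conn.recv(1024)       # receive a request
        conn.sendall(data.upper())   # apply the centralized "resource"

def request(port: int, payload: bytes) -> bytes:
    with socket.create_connection(("127.0.0.1", port)) as c:
        c.sendall(payload)
        return c.recv(1024)

def demo() -> bytes:
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))       # let the OS pick a free port
    srv.listen(1)
    port = srv.getsockname()[1]
    t = threading.Thread(target=server, args=(srv,))
    t.start()
    reply = request(port, b"hello cloud")
    t.join()
    srv.close()
    return reply

if __name__ == "__main__":
    print(demo())                    # b'HELLO CLOUD'
```

In the 3-tier variant, the client would talk to a middle tier that in turn calls such a server; in peer-to-peer, every node would run both the `server` and `request` roles.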

Distributed Applications

“A distributed application is software that is executed or run on multiple computers within a network. These applications interact in order to achieve a specific goal or task.”

Examples of Distributed Systems

SETI@Home, Folding@Home…
Peer-to-Peer networks
High Availability Systems
Distributed databases
High Throughput Computing
Parallel Computing

...even the World Wide Web is a distributed system.

BigData distributed computing and Hadoop

The Apache Hadoop system implements a MapReduce model for data analytics
A distributed file system (HDFS) manages large numbers of large files, distributed (with block replication) across the storage of multiple resources
Tools provide a high-level programming model for the two-phase MapReduce model (e.g. Pig)
Hadoop can be coupled with streaming data (Storm and Flume), graph (Giraph), and relational data (Sqoop) support, and with tools (such as Mahout) for classification, recommendation, and prediction via supervised and unsupervised learning
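The two-phase MapReduce model can be sketched in plain Python using the canonical word-count example. Hadoop would run the map tasks next to the HDFS blocks and shuffle the key/value pairs across the network between the phases; here everything runs locally, so this is a sketch of the model, not of Hadoop itself.

```python
# Word count in the two-phase MapReduce style: the map phase emits
# (key, value) pairs, a shuffle/sort groups pairs by key, and the
# reduce phase aggregates each group.
from itertools import groupby
from operator import itemgetter

def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in the document."""
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(pairs):
    """Shuffle/sort by key, then reduce each group by summing counts."""
    counts = {}
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        counts[key] = sum(v for _, v in group)
    return counts

if __name__ == "__main__":
    docs = ["the cloud", "the grid and the cloud"]
    pairs = [p for d in docs for p in map_phase(d)]
    print(reduce_phase(pairs))  # {'and': 1, 'cloud': 2, 'grid': 1, 'the': 3}
```

Because each map call and each reduce group is independent, both phases parallelize naturally across the machines holding the data blocks.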

Example of MapReduce

https://www.guru99.com/introduction-to-mapreduce.html
Distributed applications in Astronomy using Hadoop

Hierarchical Equal Area iso-Latitude Pixelization (HEALPix).

Cluster Computing and HPC

A computer cluster is a group of linked computers, working together closely so that in many respects they form a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area networks.
Clusters are usually deployed to improve performance and/or availability over that provided by a single computer, while typically being much more cost-effective than single computers of comparable speed or availability.

Cluster Classification

HPC clusters components

Login node
Controller node
Computing Nodes
Parallel filesystem
System software

Benefit of HPC clusters

Cost-effective
much cheaper than a supercomputer with the same amount of computing power!
when a supercomputer crashes, everything crashes; when a single node or a few nodes of an HPC cluster fail, the cluster continues to function
Highly scalable
Multi-user shared environment: not everyone needs all the computing power all the time
higher utilization: can accommodate a variety of workloads (#CPUs, memory, etc.) at the same time
can be expanded, partitioned or shrunk, as needed

HPC clusters today

HPC clusters are heterogeneous environments where the computing power is provided by CPUs and accelerators.

FUGAKU: 48-core Armv8.2 2.2 GHz system, Tofu interconnect, 158,976 nodes, 7,299,072 cores, 415,530 TFlop/s

High Performance Data Analysis

The ability of increasingly powerful HPC systems to run data-intensive problems at larger scale, at higher resolution, and with more elements (e.g., inclusion of the carbon cycle in climate ensemble models)
The proliferation of larger, more complex scientific instruments and sensor networks, from "smart" power grids to the Large Hadron Collider and the Square Kilometre Array
The growth of stochastic modeling, parametric modeling and other iterative problem-solving methods, whose cumulative results produce large data volumes
The availability of newer advanced analytics methods and tools: MapReduce/Hadoop, graph analytics (NVIDIA IndeX), semantic analysis, knowledge discovery algorithms (IBM Watson), COMPSs and PyCOMPSs, and more
The escalating need to perform advanced analytics in near-real time, a need that is causing a new wave of commercial firms to adopt HPC for the first time

HPC Clusters usage

Complexity. HPC technology allows scientists to aim more complex, intelligent questions at their data infrastructures.
Time to value. Science faces ever-shortening innovation and production cycles. Analytics (including Hadoop and Spark) is moving from batch processing toward low-latency, interactive capabilities.
Variability. “Deep” vs. “wide”: a large amount of data vs. many variables.

How to use a cluster

Batch systems: a job is executed via a scheduler that optimizes the use of cluster resources
Applications must be launched through a shell script that loads the proper application environment (libraries, paths, tools)
Not suitable for interactive jobs (not completely true: some tricks can make it work)
Batch work is organized in queues with a limited computing time (applications need snapshot and restart capabilities)
Complex filesystem structure: home, scratch, data
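In practice the shell script mentioned above carries directives for the scheduler. A minimal sketch, assuming a SLURM-style batch system: the directive names, the `module load` line, the time limit, and the submitted command are illustrative assumptions, not this course's actual cluster configuration.

```python
# Build a minimal SLURM-style batch script: scheduler directives
# (#SBATCH), environment setup, then the application command.
# All directive values here are illustrative, not a real cluster's.
import subprocess

def render_job_script(job_name: str, hours: int, command: str) -> str:
    """Compose the job script the scheduler will execute."""
    return (
        "#!/bin/bash\n"
        f"#SBATCH --job-name={job_name}\n"
        f"#SBATCH --time={hours}:00:00\n"  # queues impose a time limit
        "#SBATCH --nodes=1\n"
        "module load python\n"             # load the proper environment
        f"{command}\n"
    )

def submit(script: str) -> None:
    """Hand the script to the scheduler (requires SLURM's sbatch)."""
    subprocess.run(["sbatch"], input=script, text=True, check=True)

if __name__ == "__main__":
    print(render_job_script("demo", 2, "python analyse.py"))
```

The scheduler then queues the job and starts it when nodes with the requested resources become free, which is why the model fits throughput rather than interactive work.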

Grid Computing

“a single seamless computational environment in which cycles, communication, and data are shared, and in which the workstation across the continent is no less than one down the hall”

“wide-area environment that transparently consists of workstations, personal computers, graphic rendering engines, supercomputers and non-traditional devices: e.g., TVs, toasters, etc.”

“[framework for] flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources”

“collection of geographically separated resources (people, computers, instruments, databases) connected by a high speed network [...distinguished by...] a software layer, often called middleware, which transforms a collection of independent resources into a single, coherent, virtual machine”

Why do we need Grid computing?

Going further in scientific knowledge

New high sensitivity sensors and instruments

Globally distributed collaborations

Delocalized knowledge

Scientific and technical knowledge is “distributed”

Laboratories are distributed

Scientific data are distributed

Exploiting under-utilized resources.

Virtual Organizations

Grid Concepts

Grid Middleware

It’s the software layer that glues all the resources together: everything that lies between the OS and the applications.

Examples of Grid Computing

Globus Alliance (Globus Toolkit)
gLite (EGEE middleware)
UNICORE (DE)
GridBus
GRIA

LHC data has been distributed on a tiered architecture based on the LHC Computing Grid (gLite) and processed using the LHC Grid.

Grid Limitations

Very rigid environment: all the resources must be installed, maintained and monitored homogeneously
Useful for applications that require an HTC environment, but a high level of complexity is introduced to use it efficiently
Licensing problems across different domains
Implementation limits due to the middleware used
Political challenges associated with resource sharing

Utility Computing

Utility computing is a theoretical concept, and Cloud Computing implements this concept in practice

“It is a service provisioning model in which a service provider makes computing resources and infrastructure available to customers and charges them for specific usage rather than a flat rate” (on-demand)

Low or no initial cost to obtain a resource (the resource is essentially rented)

Pay-per-use model: maximize the efficient use of resources while minimizing costs
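The pay-per-use model can be made concrete with a toy billing function; the tariff structure and price figures below are invented for illustration and are not any provider's actual prices.

```python
# Sketch of the pay-per-use idea behind utility computing: the
# customer is billed for metered usage rather than a flat rate.
# Tariff figures are invented for illustration.
from dataclasses import dataclass

@dataclass
class Tariff:
    per_cpu_hour: float   # price of one CPU for one hour (hypothetical)
    per_gb_month: float   # price of one GB stored for a month (hypothetical)

def pay_per_use(tariff: Tariff, cpu_hours: float, gb_months: float) -> float:
    """Bill only what was actually consumed: no flat rate, no idle cost."""
    return cpu_hours * tariff.per_cpu_hour + gb_months * tariff.per_gb_month

if __name__ == "__main__":
    tariff = Tariff(per_cpu_hour=0.05, per_gb_month=0.02)
    # 100 CPU hours of computing plus 50 GB stored for one month:
    print(pay_per_use(tariff, 100, 50))
```

The contrast with owning hardware is that the bill is zero when usage is zero, which is what makes the on-demand model attractive for bursty workloads.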

Main Concept of Utility Computing

1. Pay-per-use Pricing Business Model

2. Optimize resource utilization

3. Outsourcing

4. “infinite resource availability”

5. Access to applications or libraries

6. Automation

Utility Computing

The principle of utility computing is very simple: one company pays another company for services. The services include software rental, data storage space, use of applications or access to computer processing power. It all depends on what the client wants and what the company can offer.

Utility Computing in practice

Data backup
Data security
Partner competences
Defining an SLA
Getting value from chargeback

Computing evolution

End of lecture 1

Next time we will cover the basics of Cloud Computing and its architecture.

