Intro to HPCA Computer Architecture notes
Uploaded by
teleyob985
Notes on HPCA MTech
1.
Centre for Development of Advanced Computing
An Introduction to High Performance Computing and its Applications
Ashish P. Kuvelkar
Senior Director (HPC-Tech)
C-DAC, Pune
2.
© Centre for Development of Advanced Computing
Outline
• Introduction to HPC
• Architecting an HPC system
• Approach to Parallelization
• Parallelization Paradigms
• Applications in the areas of Science and Engineering
3.
What is HPC?
High Performance Computing
• A set of computing technologies for very fast numeric simulation, modelling and data processing
• Employed for specialised applications that require a lot of mathematical calculation
• Uses computing power to execute a few applications extremely fast
4.
(C) 2001, C-DAC
What is HPC? (continued)
Definition 1
• High Performance Computing (HPC) is the use of parallel processing for running advanced application programs efficiently, reliably and quickly.
• A supercomputer is a system that performs at or near the currently highest operational rate for computers.
Definition 2 (Wikipedia)
• High Performance Computing (HPC) uses supercomputers and computer clusters to solve advanced computation problems.
5.
Evolution of Supercomputers
• Supercomputers in the 1980s and 90s
  • Custom-built computer systems
  • Very expensive
• Supercomputers after the 1990s
  • Built using commodity off-the-shelf components
  • Use cluster computing techniques
6.
Supercomputers
[Images: Cray supercomputer; PARAM Yuva II]
7.
Components of a Cluster
[Diagram labels: Switch Fabric; Compute Nodes; Parallel File System; Tape Library/Backup Storage; HSM/Backup Server; Login Nodes; Accelerated Compute Nodes; Storage Acceleration; Networking Gateway; Primary Interconnect; Boot Servers/Management Nodes; 1GbE for administration; Local Network]
8.
HPC Software Stack
9.
Single CPU Systems
• Can run a single stream of code
• Performance can be improved by
  • Increasing ALU width
  • Increasing clock frequency
  • Making use of pipelining
  • Improving compilers
• But there is a limit to each of these techniques
• Parallel computing provides relief
10.
Why use Parallel Computing?
• Overcome the limitations of single CPU systems
• Sequential systems are slow
  • Calculations may take days, weeks or years
  • More CPUs can get the job done faster
• Sequential systems are small
  • The data set may not fit in memory
  • More CPUs can give access to more memory
• So, the advantages are
  • Save time
  • Solve bigger problems
11.
Single Processor Parallelism
• Instruction-level parallelism is achieved through
  • Pipelining
  • Superscalar implementation
• Multicore architecture
• Using advanced vector extensions
12.
Pipelined Processors
• A new instruction enters the pipeline every clock cycle
• Instruction parallelism = number of pipeline stages
Diagram source: Quora
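The statement that instruction parallelism equals the number of pipeline stages can be made concrete with a cycle count. The sketch below assumes an idealised pipeline with no stalls or hazards, which the slide does not state; the function names are illustrative only.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """The first instruction takes n_stages cycles to fill the pipeline;
    after that, one instruction completes every cycle (ideal case)."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def speedup(n_instructions: int, n_stages: int) -> float:
    """Ratio of unpipelined to pipelined cycle counts."""
    return unpipelined_cycles(n_instructions, n_stages) / pipelined_cycles(
        n_instructions, n_stages
    )
```

Under these assumptions the speedup approaches the number of stages as the instruction count grows, but never reaches it, because of the cycles spent filling the pipeline.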
13.
Superscalar
[Diagram labels: Cache/Memory; Fetch Unit; Decode/Issue Unit; Multiple Instructions; Execution Units (EU); Register File]
• Multiple execution units
• Sequential instructions, multiple issue
14.
Multicore Processor
• A single computing component with two or more independent processing units
• Each unit is called a core, which reads and executes program instructions
Source: Wikipedia
15.
Advanced Vector eXtensions
• Useful for algorithms that can take advantage of SIMD
• AVX is an extension to the x86 instruction set, supported by Intel and AMD processors
• Using AVX-512, applications can pack into 512-bit vectors
  • eight double-precision or sixteen single-precision floating-point values, giving up to 32 double-precision or 64 single-precision floating-point operations per clock cycle with two fused multiply-add units, or
  • eight 64-bit or sixteen 32-bit integers
• Accelerates performance for workloads such as
  • Scientific simulations, artificial intelligence (AI)/deep learning, image and audio/video processing
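The AVX-512 figures follow from simple arithmetic on the 512-bit register width. The sketch below assumes two fused multiply-add (FMA) units per core and counts one FMA as two floating-point operations; both are assumptions about a particular processor configuration, and the function names are illustrative.

```python
VECTOR_BITS = 512  # width of an AVX-512 register


def lanes(element_bits: int, vector_bits: int = VECTOR_BITS) -> int:
    """Number of elements of the given size that fit in one vector register."""
    return vector_bits // element_bits


def peak_flops_per_cycle(element_bits: int, fma_units: int = 2) -> int:
    """Peak floating-point operations per cycle.

    Assumes each FMA unit processes one full vector per cycle and that a
    fused multiply-add counts as two operations (one multiply, one add).
    """
    return lanes(element_bits) * 2 * fma_units
```

With these assumptions, 64-bit elements give 8 lanes and 32 operations per cycle, and 32-bit elements give 16 lanes and 64 operations per cycle.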
16.
Parallelization Approach
17.
Means of achieving parallelism
• Implicit Parallelism
  • Done by the compiler and runtime system
• Explicit Parallelism
  • Done by the programmer
18.
Implicit Parallelism
• Parallelism is exploited implicitly by the compiler and runtime system, which
  • Automatically detects potential parallelism in the program
  • Assigns the tasks for parallel execution
  • Controls and synchronizes execution
(+) Frees the programmer from the details of parallel execution
(+) It is a more general and flexible solution
(-) Very hard to achieve an efficient solution for many applications
19.
Explicit Parallelism
• It is the programmer who has to
  • Annotate the tasks for parallel execution
  • Assign tasks to processors
  • Control the execution and the synchronization points
(+) Experienced programmers achieve very efficient solutions for specific problems
(-) Programmers are responsible for all details
(-) Programmers must have deep knowledge of the computer architecture to achieve maximum performance
20.
Explicit Parallel Programming Models
Two dominant parallel programming models
• Shared-variable model
• Message-passing model
21.
Shared Memory Model
• Uses the concept of a single address space
• Typically used on SMP architectures
• Scalability is not good
(Contd…)
22.
Shared Memory Model
• Multiple threads operate independently but share the same memory resources
• Data is not explicitly allocated
• Changes made to a memory location by one process are visible to all other processes
• Communication is implicit
• Synchronization is explicit
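A minimal sketch of the shared memory model using Python threads: the counter lives in memory that every thread can see (implicit communication), while the lock is the explicit synchronization. The function name and parameters are hypothetical. In standard CPython, threads do not execute Python bytecode in parallel, so this illustrates the programming model rather than a speedup.

```python
import threading


def run_shared_counter(n_threads: int = 4, n_increments: int = 10_000) -> int:
    shared = {"total": 0}      # one memory location visible to every thread
    lock = threading.Lock()    # explicit synchronization primitive

    def worker() -> None:
        for _ in range(n_increments):
            with lock:         # only one thread updates the total at a time
                shared["total"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["total"]
```

No thread ever sends the total to another thread; each simply reads and writes the same location, which is why the lock is needed to keep the updates from interleaving.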
23.
Advantages & Disadvantages of Shared Memory Model
Advantages:
• Data sharing between threads is fast and uniform
• Global address space provides user-friendly programming
Disadvantages:
• Lack of scalability between memory and CPUs: adding more CPUs increases traffic on the shared memory path
• Programmer is responsible for specifying synchronization, e.g. locks
• Expensive
24.
Message Passing Model
25.
Characteristics of Message Passing Model
• Asynchronous parallelism
• Separate address spaces
• Explicit interaction
• Explicit allocation by user
26.
How Message Passing Model Works
• A parallel computation consists of a number of processes
• Each process has purely local variables
• There is no mechanism for any process to directly access the memory of another
• Sharing of data among processes is done by explicit message passing
• Data transfer requires cooperative operations by each process
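The message-passing discipline can be imitated in a few lines. The sketch below uses threads and `queue.Queue` as stand-ins for processes and communication channels, so the separation of address spaces is only a convention here; real message-passing programs typically run separate processes with a library such as MPI. All names are hypothetical.

```python
import queue
import threading


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    local_sum = 0                  # purely local variable
    while True:
        message = inbox.get()      # explicit receive
        if message is None:        # sentinel: no more data
            break
        local_sum += message
    outbox.put(local_sum)          # explicit send of the partial result


def run_message_passing(data, n_workers: int = 2) -> int:
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    workers = [threading.Thread(target=worker, args=(inbox, results))
               for inbox in inboxes]
    for w in workers:
        w.start()
    for i, value in enumerate(data):   # explicit allocation of data by the user
        inboxes[i % n_workers].put(value)
    for inbox in inboxes:
        inbox.put(None)
    for w in workers:
        w.join()
    return sum(results.get() for _ in range(n_workers))
```

Every transfer needs cooperation from both sides: the sender must `put` and the receiver must `get`, which mirrors the cooperative operations described in the last bullet.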
27.
Usefulness of Message Passing Model
• Extremely general model
• Essentially, any type of parallel computation can be cast in message passing form
• Can be implemented on a wide variety of platforms, from networks of workstations to single-processor machines
• Generally allows more control over data location and flow within a parallel application than, for example, the shared memory model
• Good scalability
28.
Parallelization Paradigms
29.
Ideal Situation!
• Each processor has unique work to do
• Communication among processes is largely unnecessary
• All processes do equal work
30.
Writing parallel codes
• Distribute the data to memories
• Distribute the code to processors
• Organize and synchronize the workflow
• Optimize the resource requirements by means of efficient algorithms and coding techniques
31.
Parallel Algorithm Paradigms
• Phase parallel
• Divide and conquer
• Pipeline
• Process farm
• Domain decomposition
32.
Phase Parallel Model
• The parallel program consists of a number of supersteps, and each superstep has two phases.
• In the computation phase, multiple processes each perform an independent computation.
• In the interaction phase, the processes perform one or more synchronous interaction operations, such as a barrier or a blocking communication.
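A minimal sketch of the phase parallel model, assuming threads as the processes and `threading.Barrier` as the synchronous interaction. The toy computation (add one locally, then replace each value by the global sum) is made up purely to show the two phases; names are hypothetical.

```python
import threading


def run_supersteps(n_workers: int = 3, n_steps: int = 2) -> list:
    barrier = threading.Barrier(n_workers)
    board = [0] * n_workers        # one published value per worker
    results = [0] * n_workers

    def worker(rank: int) -> None:
        x = rank                   # local state
        for _ in range(n_steps):
            # Computation phase: independent local work
            x = x + 1
            board[rank] = x
            # Interaction phase: wait until every worker has published
            barrier.wait()
            total = sum(board)
            # Second barrier: all reads finish before the next superstep writes
            barrier.wait()
            x = total
        results[rank] = x

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With three workers starting from 0, 1 and 2, the first superstep publishes 1, 2 and 3 and every worker ends it holding 6; the second publishes 7 three times and every worker ends holding 21.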
33.
Divide and Conquer Model
• A parent process divides its workload into several smaller pieces and assigns them to a number of child processes.
• The child processes then compute their workload in parallel and the results are merged by the parent.
• This paradigm is very natural for computations such as quick sort.
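The divide, compute and merge steps can be sketched with a thread pool standing in for the child processes. Note that this is a chunked sort followed by a merge, not quick sort itself; it was chosen because the merge step by the parent is easy to see. The function name and the `n_children` parameter are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def parallel_sort(data, n_children: int = 4) -> list:
    data = list(data)
    if not data:
        return []
    # Divide: the parent splits the workload into roughly equal pieces
    chunk = -(-len(data) // n_children)   # ceiling division
    pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    # Conquer: each child sorts its own piece
    with ThreadPoolExecutor(max_workers=n_children) as pool:
        sorted_pieces = list(pool.map(sorted, pieces))
    # Merge: the parent combines the children's results
    return list(heapq.merge(*sorted_pieces))
```

For CPU-bound work in CPython a process pool would be the more realistic choice; threads are used here only to keep the sketch short and portable.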
34.
Pipeline Model
• In the pipeline paradigm, a number of processes form a virtual pipeline.
• A continuous data stream is fed into the pipeline, and the processes execute at different pipeline stages simultaneously.
[Diagram label: Data Stream]
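A minimal sketch of a virtual pipeline, assuming one thread per stage and a queue between neighbouring stages. The helper names and the sentinel object are hypothetical.

```python
import queue
import threading

_DONE = object()   # sentinel marking the end of the data stream


def stage(func, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Apply func to every item from inbox and pass the result downstream."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)      # propagate end-of-stream to the next stage
            break
        outbox.put(func(item))


def run_pipeline(stream, funcs) -> list:
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
               for i, f in enumerate(funcs)]
    for t in threads:
        t.start()
    for item in stream:            # feed the data stream into the first stage
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

While the last stage is still working on the first item, earlier stages can already be processing later items, which is the simultaneous execution the slide describes. Output order matches input order because each stage is a single thread reading from a FIFO queue.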
35.
Process Farm Model
• Also known as the master-worker paradigm.
• A master process executes the essentially sequential part of the parallel program.
• It spawns a number of worker processes to execute the parallel workload.
• When a worker finishes its workload, it informs the master, which assigns a new workload to that worker.
• The coordination is done by the master.
[Diagram labels: Master; Worker; Worker; Worker]
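A minimal sketch of a process farm, assuming threads as the workers. Instead of the master handing out work on request, workers pull the next task from a shared queue as soon as they finish, which gives the same dynamic assignment with less code. The names `run_farm` and `work` are hypothetical.

```python
import queue
import threading


def run_farm(tasks, work, n_workers: int = 3) -> list:
    tasks = list(tasks)
    task_queue = queue.Queue()
    result_queue = queue.Queue()

    def worker() -> None:
        while True:
            item = task_queue.get()    # ask for the next workload
            if item is None:           # sentinel: the master has no more work
                break
            index, payload = item
            result_queue.put((index, work(payload)))   # report back

    # Master: spawn the workers, then hand out the workload
    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for index, payload in enumerate(tasks):
        task_queue.put((index, payload))
    for _ in workers:
        task_queue.put(None)

    # Master: collect results and restore the original task order
    results = [None] * len(tasks)
    for _ in tasks:
        index, value = result_queue.get()
        results[index] = value
    for w in workers:
        w.join()
    return results
```

A worker that finishes quickly simply takes another task, so uneven task sizes are balanced automatically.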
36.
Domain Decomposition
• This method solves a boundary value problem by splitting it into smaller boundary value problems on subdomains and iterating to coordinate the solution between adjacent subdomains.
[Diagram labels: Program; 1 Domain; n threads; n sub-domains]
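A small worked example of domain decomposition, under several assumptions not in the slide: the boundary value problem is the 1D Laplace equation u'' = 0 with fixed end values, each subdomain is updated with a Jacobi sweep, and the subdomains are processed one after another here rather than by separate threads. The function name and parameters are hypothetical.

```python
def solve_laplace_1d(n_points: int = 12, n_subdomains: int = 3,
                     left: float = 0.0, right: float = 1.0,
                     n_iterations: int = 2000) -> list:
    if n_points % n_subdomains != 0:
        raise ValueError("n_points must be divisible by n_subdomains")
    size = n_points // n_subdomains
    # Each subdomain stores [left halo, interior points..., right halo]
    subs = [[0.0] * (size + 2) for _ in range(n_subdomains)]

    for _ in range(n_iterations):
        # Coordination: copy edge values from adjacent subdomains into halos
        for k, sub in enumerate(subs):
            sub[0] = left if k == 0 else subs[k - 1][-2]
            sub[-1] = right if k == n_subdomains - 1 else subs[k + 1][1]
        # Local solve: one Jacobi sweep per subdomain, independent of the others
        subs = [
            [sub[0]]
            + [(sub[i - 1] + sub[i + 1]) / 2 for i in range(1, size + 1)]
            + [sub[-1]]
            for sub in subs
        ]

    # Stitch the interior points back into one global solution
    return [value for sub in subs for value in sub[1:-1]]
```

The exact solution is a straight line between the two boundary values, so the result can be checked against `left + (right - left) * (i + 1) / (n_points + 1)` for each point `i`. The local sweeps are the part that could be handed to one thread per subdomain.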
37.
Desirable Attributes for Parallel Algorithms
• Concurrency
  • Ability to perform many actions simultaneously
• Scalability
  • Resilience to increasing processor counts
• Data Locality
  • High ratio of local memory accesses to remote memory accesses (through communication)
• Modularity
  • Decomposition of complex entities into simpler components
38.
Massive processing power introduces I/O challenge
• Getting data to and from the processing units can take as long as the processing itself
• Requires careful software design and a deep understanding of algorithms and of the architecture of
  • Processors (cache effects, memory bandwidth)
  • GPU accelerators
  • Interconnects (Ethernet, InfiniBand, 10 Gigabit Ethernet)
  • Storage (local disks, NFS, parallel file systems)
39.
Application Areas of HPC in Science & Engineering
40.
HPC in Science
Space Science
• Applications in Astrophysics and Astronomy
Earth Science
• Applications in understanding Physical Properties of Geological Structures, Water Resource Modelling, Seismic Exploration
Atmospheric Science
• Applications in Climate and Weather Forecasting, Air Quality
41.
HPC in Science
Life Science
• Applications in Drug Designing, Genome Sequencing, Protein Folding
Nuclear Science
• Applications in Nuclear Power, Nuclear Medicine (cancer etc.), Defence
Nano Science
• Applications in Semiconductor Physics, Microfabrication, Molecular Biology, Exploration of New Materials
42.
HPC in Engineering
Crash Simulation
• Applications in Automobile and Mechanical Engineering
Aerodynamics Simulation & Aircraft Designing
• Applications in Aeronautics and Mechanical Engineering
Structural Analysis
• Applications in Civil Engineering and Architecture
43.
Multimedia and Animation
Graphical Animation
• Applications in Multimedia and Animation
• DreamWorks Animation SKG produces all its animated movies using HPC graphic technology
44.
Thank You
ashishk@cdac.in