Multicore Architectures
Subject Code:
L T P J C: 2 0 0 4 3
Preamble: This course provides knowledge of multicore architectures and lays the foundation for developing High Performance Applications on the OpenMP and CUDA parallel programming platforms. It also enables students to analyse the performance of HPC applications using various profiling tools.
Objectives of the course
To provide knowledge of the basics of multicore architectures
To understand the concepts of parallel computers and their programming models
To design and develop parallel programs
To practice parallel programming on the OpenMP and CUDA parallel programming platforms
To apply program optimizations to parallel programs
To analyse performance using profiling tools
To explore various contemporary tools and recent trends in the field of multicore architectures
Expected Outcome: After successfully completing the course, the student should be able to:
1) Describe various parallel programming models
2) Design and develop High Performance Applications using
contemporary tools
3) Improve performance of applications through program
optimizations
4) Analyse performance of parallel applications
Student Learning Outcomes
2. Having a clear understanding of the subject related concepts and of contemporary issues
11. Having interest in lifelong learning
14. Having an ability to design and conduct experiments, as well as to analyze and interpret data
Module 1: Introduction to Multi-Core Architectures (2 hrs, SLO 2)
Evolution of multicores through Moore's Law, Comparisons of single core, multi-core, multi-processing and hyper threading

Module 2: Parallel Computers and Programming (5 hrs, SLO 2)
Threading Concepts, Communication Architectures and Communication Costs, Thread Level Parallelism (TLP), Instruction Level Parallelism (ILP), Comparisons, Cache Hierarchy and Memory-level Parallelism, Cache Coherence, Parallel programming models, Shared Memory and Message Passing, Vectorization

Module 3: OpenMP Programming (Open Multi-Processing) (5 hrs, SLO 2)
Introduction to OpenMP, Parallel constructs, Runtime Library routines, Work-sharing constructs, Scheduling clauses, Data environment clauses, atomic, master, Nowait Clause, Barrier Construct

Module 4: CUDA Programming (Compute Unified Device Architecture) (6 hrs, SLO 2)
Introduction to GPU Computing, CUDA Programming Model, CUDA API, Simple Matrix Multiplication in CUDA, CUDA Memory Model, Shared Memory Matrix Multiplication, Additional CUDA API Features

Module 5: Performance Analysers (4 hrs, SLO 14)
Trace Analyzer and Collector (ITAC), VTune Amplifier XE, Energy Efficient Performance, Integrated Performance Primitives (IPP)

Module 6: Contemporary Tools (3 hrs, SLO 14)
MKL (Math Kernel Library), Threading Building Blocks, CUDA Tools

Module 7: HTC and MTC (3 hrs, SLO 14)
HTC (High Throughput Computing), MTC (Many Task Computing), Top 500 supercomputers in the world, Top 10 supercomputer architectural details, Exploring Linpack

Module 8: Recent Trends (2 hrs, SLO 11)

Total lecture hours: 30
Project: 60 hrs [Non-Contact] (SLO 17)
Projects may be given as group projects.
Design and development of high-performance applications through parallel programming platforms in the following areas:
Network Security
Data Compression
Image Processing
Bio-Medical
Information retrieval
Natural Language Processing
Health care Applications
Reference Books
1. Rob Farber, “CUDA Application Design and Development”, Morgan Kaufmann
Publishers, 2013
2. Shameem Akhter and Jason Roberts, “Multi-Core Programming”, 1st edition, Intel Press,
2012
3. Cameron Hughes, Tracey Hughes, “Professional Multicore Programming Design and
Implementation for C++ Developers”, Wiley, 2008
4. Robert Oshana, “Multicore Software Development Techniques: Applications, Tips, and
Tricks”, Newnes, 1st edition, 2015
5. David B. Kirk, Wen-mei W. Hwu, “Programming Massively Parallel Processors: A
Hands-on Approach (Applications of GPU Computing Series)”, 1st edition, Morgan
Kaufmann, 2010.
Multicore Architectures
Knowledge areas that contain topics and learning outcomes covered in the course
Knowledge Area | Total Hours of Coverage
Systems Fundamentals (SF) | 30
Body of Knowledge coverage
[List the Knowledge Units covered in whole or in part in the course. If in part, please indicate
which topics and/or learning outcomes are covered. For those not covered, you might want to
indicate whether they are covered in another course or not covered in your curriculum at all.
This section will likely be the most time-consuming to complete, but is the most valuable for
educators planning to adopt the CS2013 guidelines.]
KA | Knowledge Unit | Topics Covered | Hours
SF | Computer Organization | Evolution of multicores through Moore's Law, Comparisons of single core, multi-core, multi-processing and hyper threading | 2
SF | Parallelism | Threading Concepts, Communication Architectures and Communication Costs, TLP, ILP, Comparisons, Cache Hierarchy and Memory-level Parallelism, Cache Coherence, Parallel programming models, Shared Memory and Message Passing, Vectorization | 5
SF | Parallel Programming Language | Introduction to OpenMP, Parallel constructs, Runtime Library routines, Work-sharing constructs, Scheduling clauses, Data environment clauses, atomic, master, Nowait Clause, Barrier Construct | 5
SF | Heterogeneous architecture and its programming | Introduction to GPU Computing, CUDA Programming Model, CUDA API, Simple Matrix Multiplication in CUDA, CUDA Memory Model, Shared Memory Matrix Multiplication, Additional CUDA API Features | 6
SF | Program Analyzer | Trace Analyzer and Collector (ITAC), VTune Amplifier XE, Energy Efficient Performance, Integrated Performance Primitives (IPP) | 4
SF | Contemporary Tools | MKL (Math Kernel Library), Threading Building Blocks, CUDA Tools | 3
SF | HTC and MTC | HTC (High Throughput Computing), MTC (Many Task Computing), Top 500 supercomputers in the world, Top 10 supercomputer architectural details, Exploring Linpack | 3
SF | Recent Trends | Recent trends in the field of multicore architectures | 2
Total hours 30
What is covered in the course?
[A short description, and/or a concise list of topics - possibly from your course syllabus. (This is
likely to be your longest answer)]
Module 1: Introduction to Multi-Core Architecture
Evolution of multi-cores through Moore's Law, Comparisons of single core, multi-core, multi-processing and hyper threading.
Module 2: Parallel Computers and their Programming
Threading Concepts, Communication Architectures and Communication Costs, TLP, ILP,
Comparisons, Cache Hierarchy and Memory-level Parallelism, Cache Coherence, Parallel
programming models, Shared Memory and Message Passing, Vectorization
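As a concrete illustration of the shared-memory programming model and threading concepts listed in Module 2, the following sketch (illustrative only, not part of the prescribed syllabus; the file name and variable names are assumptions) runs two C++ threads that increment one counter in the same address space. The atomic update keeps the increment race-free; with a plain variable the same update would be a data race, which is the motivation for the cache-coherence and synchronization material in this module.

// shared_counter.cpp -- minimal sketch of the shared-memory model (hypothetical example)
// Build with, e.g., g++ -std=c++11 -pthread shared_counter.cpp
#include <atomic>
#include <cstdio>
#include <thread>

int main() {
    std::atomic<long> counter{0};              // one variable shared by both threads

    auto work = [&counter]() {
        for (int i = 0; i < 1000000; ++i)
            counter.fetch_add(1, std::memory_order_relaxed);  // race-free increment
    };

    std::thread t1(work), t2(work);            // thread-level parallelism (TLP)
    t1.join();
    t2.join();

    // With a plain 'long' and no synchronization the final value would be unpredictable.
    std::printf("counter = %ld (expected 2000000)\n", counter.load());
    return 0;
}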
Module 3: OpenMP Programming
Introduction to OpenMP, Parallel constructs, Runtime Library routines, Work-sharing constructs,
Scheduling clauses, Data environment clauses, atomic, master, Nowait Clause, Barrier Construct
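The constructs named in Module 3 can all appear in a few lines of code. The sketch below is a minimal, illustrative OpenMP program (not taken from the syllabus or the reference books; the array size, variable names and file name are assumptions) that touches the parallel construct, a work-sharing loop with a scheduling clause, data-environment clauses, atomic, master, nowait, an explicit barrier and one runtime library routine.

/* openmp_demo.c -- minimal sketch of the Module 3 constructs (hypothetical example)
   Build with, e.g., gcc -fopenmp openmp_demo.c -o openmp_demo */
#include <omp.h>
#include <stdio.h>

#define N 1000

int main(void) {
    double a[N];
    double sum = 0.0;
    int chunks_done = 0;

    /* parallel construct with data-environment clauses */
    #pragma omp parallel shared(a, chunks_done) reduction(+:sum)
    {
        int tid = omp_get_thread_num();        /* runtime library routine */

        /* work-sharing construct with a scheduling clause; nowait drops the
           implicit barrier at the end of this loop */
        #pragma omp for schedule(static) nowait
        for (int i = 0; i < N; i++)
            a[i] = 0.5 * i;

        /* atomic construct: safe update of a shared counter */
        #pragma omp atomic
        chunks_done++;

        /* explicit barrier construct: all writes to a[] complete before reading */
        #pragma omp barrier

        #pragma omp for schedule(dynamic, 64)
        for (int i = 0; i < N; i++)
            sum += a[i];                        /* combined by the reduction clause */

        /* master construct: executed by thread 0 only */
        #pragma omp master
        printf("thread %d reports %d threads participated\n",
               tid, omp_get_num_threads());
    }

    printf("sum = %f, chunks_done = %d\n", sum, chunks_done);
    return 0;
}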
Module 4: CUDA Programming
Introduction to GPU Computing, CUDA Programming Model, CUDA API, Simple Matrix Multiplication in CUDA, CUDA Memory Model, Shared Memory Matrix Multiplication, Additional CUDA API Features
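Module 4's two matrix-multiplication case studies can be summarised in one CUDA source file. The sketch below is illustrative only (the kernel names, tile size and file name are assumptions, and error checking is omitted for brevity): the simple kernel assigns one thread per output element and reads operands from global memory, while the shared-memory version stages TILE x TILE tiles on chip to cut global-memory traffic, which is the point of the CUDA memory model discussion.

// matmul_demo.cu -- minimal sketch of the Module 4 kernels (hypothetical example)
// Build with, e.g., nvcc matmul_demo.cu -o matmul_demo
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 16

// Simple matrix multiplication: one thread per element of C, global memory only.
__global__ void matmul_naive(const float *A, const float *B, float *C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float acc = 0.0f;
        for (int k = 0; k < n; ++k)
            acc += A[row * n + k] * B[k * n + col];
        C[row * n + col] = acc;
    }
}

// Shared-memory matrix multiplication: each block stages tiles of A and B on chip.
__global__ void matmul_tiled(const float *A, const float *B, float *C, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        As[threadIdx.y][threadIdx.x] = (row < n && aCol < n) ? A[row * n + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < n && col < n) ? B[bRow * n + col] : 0.0f;
        __syncthreads();                       // tile fully loaded before use

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();                       // done with this tile before overwriting
    }
    if (row < n && col < n)
        C[row * n + col] = acc;
}

int main() {
    const int n = 512;
    size_t bytes = n * n * sizeof(float);
    float *A, *B, *C;                          // unified memory keeps the host code short
    cudaMallocManaged(&A, bytes);
    cudaMallocManaged(&B, bytes);
    cudaMallocManaged(&C, bytes);
    for (int i = 0; i < n * n; ++i) { A[i] = 1.0f; B[i] = 2.0f; }

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    matmul_tiled<<<grid, block>>>(A, B, C, n); // swap in matmul_naive to compare
    cudaDeviceSynchronize();

    printf("C[0] = %f (expected %f)\n", C[0], 2.0f * n);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}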
Module 5: Performance Analysers
Trace analyzer and collector (ITAC), VTune Amplifier XE, Energy Efficient Performance,
Integrated Performance Primitives (IPP)
Module 6: Contemporary Tools
MKL (Math Kernel Library), Threading Building Blocks, CUDA Tools
Module 7: HTC and MTC
HTC (High Throughput Computing), MTC (Many Task Computing), Top 500 supercomputers in the world, Top 10 supercomputer architectural details, Exploring Linpack.
Module 8: Recent trends
What is the format of the course?
[Is it face to face, online or blended? How many contact hours? Does it have lectures, lab
sessions, discussion classes?]
This course is designed with 100 minutes of in-classroom sessions per week as well as 200 minutes of non-contact time spent on implementing a course-related project. Generally, this course should combine lectures, in-class discussion, case studies, guest lectures, mandatory off-class reading material, assignments and quizzes.
How are students assessed?
[What type, and number, of assignments are students expected to do? (papers, problem sets,
programming projects, etc.). How long do you expect students to spend on completing assessed
work?]
Students are assessed based on group activities, classroom discussion, assignments, quizzes, projects, continuous assessment tests (CAT), and a final assessment test.
Additional topics
[List notable topics covered in the course that you do not find in the CS2013 Body of
Knowledge]
CUDA Programming, Top 10 supercomputers in the world and benchmarks
Other comments
Nil
Session wise plan
Student Outcomes Covered: 2, 5, 9, 17
Sl. No. | Class Hours | Topic Covered | Level of Mastery | Reference Book | Remarks
1 | 2 | Evolution of multi-cores through Moore's Law, Comparisons of single core, multi-core, multi-processing and hyper threading. | Usage | 2 |
2 | 5 | Threading Concepts, Communication Architectures and Communication Costs, TLP, ILP, Comparisons, Cache Hierarchy and Memory-level Parallelism, Cache Coherence, Parallel programming models, Shared Memory and Message Passing, Vectorization | Usage | 2 |
3 | 5 | Introduction to OpenMP, Parallel constructs, Runtime Library routines, Work-sharing constructs, Scheduling clauses, Data environment clauses, atomic, master, Nowait Clause, Barrier Construct | Usage | 2 | Assignments
4 | 6 | Introduction to GPU Computing, CUDA Programming Model, CUDA API, Simple Matrix Multiplication in CUDA, CUDA Memory Model, Shared Memory Matrix Multiplication, Additional CUDA API Features | Usage | 1 | Assignments
5 | 4 | Trace Analyzer and Collector (ITAC), VTune Amplifier XE, Energy Efficient Performance, Integrated Performance Primitives (IPP) | Usage | 1 |
6 | 3 | MKL (Math Kernel Library), Threading Building Blocks, CUDA Tools | Usage | 2 |
7 | 3 | HTC (High Throughput Computing), MTC (Many Task Computing), Top 500 supercomputers in the world, Top 10 supercomputer architectural details, Exploring Linpack. | Familiarity |
8 | 2 | Recent Trends
Total: 30 hours