Introduction to Parallel
Programming with MPI
Lecture #1: Introduction
Andrea Mignone
Academic Year: 2024-2025
Dipartimento di Fisica
Turin University, Torino (TO)
Course Requisites
▪ In order to follow these lecture notes and the course material you will need
to have some acquaintance with
• Linux shell
• C / C++ or Fortran compiler
• Basic knowledge of numerical methods
▪ Further Reading & Links:
• The latest reference of the MPI standard: https://www.mpi-forum.org/docs/
Huge, but indispensable to understand what lies beyond the surface of the API
• Online tutorials:
• https://mpitutorial.com (by Wes Kendall)
• https://www.codingame.com/playgrounds/349/introduction-to-mpi/introduction-to-distributed-computing
• http://adl.stanford.edu/cme342/Lecture_Notes.html
The Need for Parallel Computing
▪ Memory- and CPU-intensive computations can be carried out using
parallelism.
▪ Parallel programming methods on parallel computers provide access to increased memory and CPU resources that are not available on serial computers. This allows large problems to be solved faster, or makes it possible to tackle problems that would simply not be feasible within the typical execution time of a single processor.
▪ Serial applications (codes) can be turned into parallel ones by fulfilling some requirements, which are typically hardware-dependent.
▪ Parallel programming paradigms rely on the use of message passing libraries. These libraries manage the transfer of data between instances of a parallel program unit running on multiple processors in a parallel computing architecture.
Parallel Programming Models
▪ Serial applications will not run automatically on parallel architectures (no such thing as automatic parallelism!).
▪ Any parallel programming model must specify how parallelism is achieved
through a set of program abstractions for fitting parallel activities from the
application to the underlying parallel hardware.
▪ It spans several layers: applications, programming languages, compilers, libraries, network communication, and I/O systems.
▪ Flynn’s taxonomy: a classification (proposed by M. J. Flynn in 1966) based on the number of simultaneous instruction and data streams seen by the processor during program execution.
▪ Flynn’s taxonomy is employed as a tool in the design of modern processors
and their functionality.
Flynn’s Taxonomy
▪ SISD (Single Instruction, Single Data): a sequential computer which exploits no
parallelism in either the instruction or data streams;
▪ SIMD (Single Instruction, Multiple Data): processes execute the same instruction (or operation) on different data elements.
• Example: an application where the same value is added to (or subtracted from) a large number of data points (e.g. multimedia applications).
• Advantage: processing multiple data elements at the same time with a single
instruction can noticeably improve the performance.
• Employed by vector computers.
▪ MISD (Multiple Instruction, Single Data): multiple instructions operate on one
data stream (uncommon architecture);
▪ MIMD (Multiple Instruction, Multiple Data): at any time, different processes
execute different instructions on different portions of data:
• Single Program, Multiple Data (SPMD)
• Multiple Programs, Multiple Data (MPMD)
SPMD Parallelism
▪ The majority of MPI programs are based on the Single-Program-Multiple-Data (SPMD) paradigm;
▪ In this model, all processors run simultaneously a copy of the same
program;
[Diagram: a single source file is compiled into one executable, and identical copies of that executable are run simultaneously by processes #0, #1, …, #N.]
▪ However, each process works on a separate copy of the data;
▪ Processes can follow different control paths during the execution, depending on the process ID.
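As a minimal sketch (not part of the course material: the branch and the messages are purely illustrative), the same SPMD executable can behave differently on each process simply by testing its rank:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char ** argv)
{
  int rank;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* each running copy obtains its own rank */

  if (rank == 0){        /* only process #0 follows this path...            */
    printf ("Rank 0: I could, e.g., read the input or gather the results\n");
  }else{                 /* ...while all other processes follow this one    */
    printf ("Rank %d: I work on my own portion of the data\n", rank);
  }

  MPI_Finalize();
  return 0;
}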
MPMD Parallelism
▪ In Multiple-Program-Multiple-Data (MPMD) parallelism, each task can execute a different program:
[Diagram: N different source files are compiled into N different programs (Program 1, Program 2, …, Program N), each of which is run by a different process (#0, #1, …, #N).]
Parallel Programming Models
▪ By far, SIMD and SPMD are the most dominant parallel models.
▪ In this course we will be concerned with SPMD only.
▪ SPMD programs can exploit two different memory models:
• Shared memory;
• Distributed memory.
▪ The latest generation of parallel computers now uses a mixed
shared/distributed memory architecture. Each node consists of a group of 2
to 32 (or more) processors connected via local, shared memory and the
multiprocessor nodes are, in turn, connected via a high-speed
communications fabric.
Parallel Architectures: Shared Memory
▪ In a shared memory computer, multiple processors share access to a global
memory space via a high-speed memory bus.
▪ This global memory space allows the processors to efficiently exchange or
share access to data.
▪ Typically, the number of processors used in shared memory architectures is
limited to only a handful (2 - 16) of processors. This is because the amount
of data that can be processed is limited by the bandwidth of the memory
bus connecting the processors.
Parallel Architectures: Distributed Memory
▪ Distributed memory parallel computers are essentially a collection of serial
computers (nodes) working together to solve a problem.
▪ Each node has rapid access to its own local memory, and access to the memory of other nodes via some sort of communication network, usually a proprietary high-speed one.
▪ Data are exchanged between nodes as messages over the network.
The Message Passing Interface (MPI)
▪ The Message Passing Interface (MPI) is a standardized, vendor-independent and portable message-passing standard, defining the syntax and semantics of a core set of library routines useful to a wide range of users writing portable message-passing programs in C, C++, and Fortran.
▪ MPI has over 40 participating organizations, including vendors, researchers,
software library developers, and users.
▪ The goal of MPI is to establish a portable, efficient, and flexible standard for
message passing that will be widely used for writing message passing
programs.
▪ MPI is not an IEEE or ISO standard, but it has, in fact, become the "industry standard" for writing message passing programs on HPC platforms.
What MPI is NOT
▪ MPI is not a programming language; it is, rather, the realization of a message-passing programming model.
▪ It is not a new way of parallel programming (rather, a realization of the old message-passing paradigm that was already around in other forms, such as POSIX sockets);
▪ It does not parallelize code automatically (rather, the programmer retains full manual control over all communications).
Downloading & Installing MPI
Two common implementations of MPI are:
▪ MPICH (http://www.mpich.org - recommended) is a high performance and
widely portable implementation of the Message Passing Interface (MPI)
standard. MPICH is distributed as source (with an open-source, freely
available license). It has been tested on several platforms, including Linux
(on IA32 and x86-64), Mac OS/X (PowerPC and Intel), Solaris (32- and 64-
bit), and Windows.
▪ The Open MPI Project (https://www.open-mpi.org) is an open source
Message Passing Interface implementation that is developed and maintained
by a consortium of academic, research, and industry partners.
MPICH is meant to be a high-quality reference implementation of the latest MPI standard and the basis for derivative implementations to meet special-purpose needs. Open MPI targets the common case, both in terms of usage and network conduits.
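As a hedged example (the package names below are assumptions that depend on your system and may differ), both implementations can often be installed directly from a package manager instead of being built from source:

> sudo apt install mpich       # Debian/Ubuntu (Open MPI: openmpi-bin libopenmpi-dev)
> brew install mpich           # macOS with Homebrew (Open MPI: open-mpi)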
Running MPI on a single CPU
▪ Modern laptop computers are equipped with more than one core (typically 2-6). In this case you can fully exploit the MPI library and achieve a performance gain.
▪ If you have multiple cores, each process will run on a separate core.
▪ If you ask for more processes than the available CPU cores, everything will still run, but with lower efficiency: MPI simply time-shares the processes over the available cores. So even on a single-CPU, single-core machine you can still use MPI (yes, you can run multi-process jobs on a single-CPU, single-core machine, just without any performance gain).
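For instance, on a 2-core laptop something like the following will still run, with the eight processes simply time-sharing the two cores (with Open MPI you may need to add the --oversubscribe option to allow more processes than available slots):

> mpirun -np 8 ./my_program    # more processes than cores: runs, but less efficiently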
Writing an MPI Program
MPI Group & Communicators
▪ An MPI group is a fixed, ordered set of unique MPI processes. In other
words, an MPI group is a collection of processes, e.g.
[Diagram: the eight processes 0-7 of MPI_COMM_WORLD are divided into two groups, GROUP_BLUE and GROUP_GREEN, each of which is then used to create its own communicator, COMM_BLUE and COMM_GREEN.]
▪ A communicator represents a process' membership in a larger process group.
It holds a group of processes that can communicate with each other.
▪ You can see a communicator as a box grouping processes together, allowing
them to communicate.
▪ First the group is created, including the desired processes, and then the group is used to create a communicator.
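A minimal sketch of this two-step procedure (the choice of collecting the even-ranked processes into the new group is purely an illustrative assumption):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char ** argv)
{
  int rank, size;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /* -- Step 1: extract the group of MPI_COMM_WORLD and build a
        sub-group containing only the even-ranked processes -- */

  MPI_Group world_group, even_group;
  MPI_Comm_group (MPI_COMM_WORLD, &world_group);

  int  nev = (size + 1)/2;
  int *even_ranks = malloc(nev*sizeof(int));
  for (int i = 0; i < nev; i++) even_ranks[i] = 2*i;

  MPI_Group_incl (world_group, nev, even_ranks, &even_group);

  /* -- Step 2: create a communicator from the group
        (processes not in the group obtain MPI_COMM_NULL) -- */

  MPI_Comm even_comm;
  MPI_Comm_create (MPI_COMM_WORLD, even_group, &even_comm);

  if (even_comm != MPI_COMM_NULL){
    int new_rank;
    MPI_Comm_rank (even_comm, &new_rank);
    printf ("World rank %d is rank %d in the even communicator\n", rank, new_rank);
    MPI_Comm_free (&even_comm);
  }

  free (even_ranks);
  MPI_Group_free (&even_group);
  MPI_Group_free (&world_group);
  MPI_Finalize();
  return 0;
}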
The Default Communicator
▪ The default communicator is called MPI_COMM_WORLD.
▪ It basically groups all the processes when the
program starts.
▪ The diagram shows a program which runs with five
processes. Every process is connected and can
communicate inside MPI_COMM_WORLD.
▪ The size of a communicator does not change once it
is created. Each process inside a communicator has
a unique number to identify it: rank of the process.
▪ In the previous example, the size of MPI_COMM_WORLD is 5. The rank of each process is the number inside each circle. The rank of a process always ranges from 0 to size - 1.
▪ MPI_COMM_WORLD is not the only communicator in
MPI. Custom communicators may also be created.
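For instance (a sketch based on the simple assumption that we want to split the processes according to the parity of their rank), the convenience routine MPI_Comm_split creates custom communicators in a single call:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char ** argv)
{
  int rank, new_rank;
  MPI_Comm new_comm;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);

  /* Processes with the same "color" (here: rank parity) end up in the
     same new communicator; within it, ranks are ordered by the "key". */

  MPI_Comm_split (MPI_COMM_WORLD, rank % 2, rank, &new_comm);

  MPI_Comm_rank (new_comm, &new_rank);
  printf ("World rank %d has rank %d in its new communicator\n", rank, new_rank);

  MPI_Comm_free (&new_comm);
  MPI_Finalize();
  return 0;
}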
Example #1: our 1st MPI Program
▪ [program1.c] Applications can be written in C, C++ or Fortran and appropriate calls to MPI can be added where required:

#include <mpi.h>     /* Include MPI library header file */
#include <stdio.h>

int main(int argc, char ** argv)
{
  int rank, size;

  /* -- Initialize the MPI execution environment -- */

  MPI_Init(&argc, &argv);

  /* -- Obtain the rank and size of the communicator -- */

  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  printf("I am rank # %d of %d\n", rank, size);

  /* -- Terminate MPI execution environment -- */

  MPI_Finalize();
  return 0;
}
Example #1: our 1st MPI Program
▪ For a serial application we normally compile and run the code
> gcc my_program.c -o my_program     # Compile the code
> ./my_program                       # run on a single processor
▪ For a parallel application using MPI:
> mpicc my_program.c -o my_program   # Compile the code using the MPI C compiler
> mpirun -np 4 ./my_program          # run on 4 processors
▪ The result of the execution should look like the following (the order in which the ranks print is not deterministic):
I am rank # 0 of 4
I am rank # 1 of 4
I am rank # 3 of 4
I am rank # 2 of 4
Example #2: Multiplication Tables
▪ [mult_tables.c] Suppose we now want different processors to compute the multiplication tables of 1, 2, 3, and 4:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char ** argv)
{
  int i, rank;

  /* -- Initialize MPI environment -- */

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  /* -- Create multiplication table -- */

  printf ("[Rank # %d]\n", rank);
  for (i = 1; i <= 10; i++){
    printf ("  %d\n", i*(rank+1));
  }

  MPI_Finalize();
  return 0;
}

▪ Note that each processor will create a different table, depending on its rank: rank #0 will print 1, 2, 3, …, while rank #1 will print 2, 4, 6, 8, …, and so on;
▪ The output may be chaotic and unpredictable, since all processes try to access the output stream (stdout) concurrently.
Example #3: solving an Initial Value ODE
▪ [multi_ode.c] We now wish to solve the pendulum 2nd-order ordinary differential equation

  d²θ/dt² = -(g/L) sin θ

for 0 < t < 10, using a different initial condition at t = 0 on each process.
▪ Using 4 processes we can have each task solving the same equation using a
different initial condition.
▪ A 2nd-order Runge-Kutta (RK2) scheme with Δt = 0.1 can be used. The code will be the same for all processes; however:
• Make sure the initial condition is assigned based on the rank;
• The output data file should be different for each process.
Example #3: solving an Initial Value ODE
▪ We cast the 2nd-order ODE as a system of two 1st-order ODEs:

  dθ/dt = ω ,      dω/dt = -(g/L) sin θ

▪ The RK2 method can be illustrated, e.g. in its predictor-corrector (Heun) form, as follows:

  Y* = Y^n + Δt R(Y^n)
  Y^(n+1) = Y^n + (Δt/2) [ R(Y^n) + R(Y*) ]

▪ Here Y and R are 2-element arrays containing the unknowns and the right-hand sides of the two 1st-order ODEs:

  Y = (θ, ω) ,      R(Y) = (ω, -(g/L) sin θ)
▪ The output file should be a multi-column text file containing
. . .
tn θn ωn
. . .
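The following is a minimal sketch of what such a program might look like (everything specific here, the output file names, the dimensionless choice g/L = 1, the Heun form of RK2 and the initial conditions θ(0) = 0.1 (rank+1), ω(0) = 0, is an illustrative assumption, not necessarily what multi_ode.c does):

#include <mpi.h>
#include <stdio.h>
#include <math.h>

#define G_OVER_L  1.0    /* assumed dimensionless pendulum, g/L = 1 */

/* Right-hand side of the system: Y = (theta, omega) */
void rhs (double *Y, double *R)
{
  R[0] = Y[1];                      /* d(theta)/dt =  omega            */
  R[1] = -G_OVER_L*sin(Y[0]);       /* d(omega)/dt = -(g/L) sin(theta) */
}

int main(int argc, char ** argv)
{
  int    rank;
  double t = 0.0, dt = 0.1, tstop = 10.0;
  double Y[2], Ys[2], R0[2], R1[2];
  char   fname[32];
  FILE  *fp;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  /* -- Initial condition assigned based on the rank (illustrative choice) -- */

  Y[0] = 0.1*(rank + 1);   /* theta(0) */
  Y[1] = 0.0;              /* omega(0) */

  /* -- Each process writes to its own output file -- */

  sprintf (fname, "pendulum.%02d.dat", rank);
  fp = fopen (fname, "w");

  /* -- Integrate with RK2 (predictor-corrector form) -- */

  while (t < tstop){
    fprintf (fp, "%12.6e  %12.6e  %12.6e\n", t, Y[0], Y[1]);

    rhs (Y, R0);                      /* predictor stage */
    Ys[0] = Y[0] + dt*R0[0];
    Ys[1] = Y[1] + dt*R0[1];

    rhs (Ys, R1);                     /* corrector stage */
    Y[0] += 0.5*dt*(R0[0] + R1[0]);
    Y[1] += 0.5*dt*(R0[1] + R1[1]);

    t += dt;
  }
  fclose (fp);

  MPI_Finalize();
  return 0;
}

Compiling with mpicc and running with mpirun -np 4 would then produce four files, pendulum.00.dat … pendulum.03.dat, one per initial condition.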
Visualizing Data
▪ We can now plot the 4 solutions of the ODE using, e.g., gnuplot.