Eijkhout HPCtutorials
Victor Eijkhout
Public draft This book is open for comments. What is missing or incomplete or unclear? Is material
presented in the wrong sequence? Kindly mail me with any comments you may have.
You may have found this book in any of a number of places; the authoritative download location is
https://theartofhpc.com/. That page also links to lulu.com where you can get a nicely printed copy.
Contents
Exercises
lesson Topic Book Slides in-class homework
1 Unix 1 unix 1.42
2 Git 5
3 Programming 2 programming 2.3 2.4
4 Libraries 2 programming
5 Debugging 11 root code
6 LaTeX 15 15.13
7 Make 3 3.1, 3.2
A good part of being an effective practitioner of High Performance Scientific Computing is what can be
called ‘HPC Carpentry’: a number of skills that are not scientific in nature, but that are still indispensable
to getting your work done.
The vast majority of scientific programming is done on the Unix platform so we start out with a tutorial
on Unix in chapter 1, followed by an explanation of how your code is handled by compilers, linkers,
and such in chapter 2.
Next you will learn about some tools that will increase your productivity and effectiveness:
• The Make utility is used for managing the building of projects; chapter 3.
• Source control systems store your code in such a way that you can undo changes, or maintain
multiple versions; in chapter 5 you will see the subversion software.
• Storing and exchanging scientific data becomes an important matter once your program starts to
produce results; in chapter 7 you will learn the use of HDF5.
• Visual output of program data is important, but too wide a topic to discuss here in great detail;
chapter 9 teaches you the basics of the gnuplot package, which is suitable for simple data plotting.
We also consider the activity of program development itself: chapter 10 considers how to code to prevent
errors, and chapter 11 teaches you to debug code with gdb. Chapter 13 contains some information on how
to write a program that uses more than one programming language.
Finally, chapter 15 teaches you about the LaTeX document system, so that you can report on your work in
beautifully typeset articles.
Many of the tutorials are very hands-on. Do them while sitting at a computer!
Table 1 gives a proposed lesson outline for the carpentry section of a course. The article by Wilson [24]
is a good read on the thinking behind this ‘HPC carpentry’.
Chapter 1
Unix intro
Unix is an Operating System (OS), that is, a layer of software between the user or a user program and the
hardware. It takes care of files and screen output, and it makes sure that many processes can exist side by
side on one system. However, it is not immediately visible to the user.
Most of this tutorial will work on any Unix-like platform; however, there is not just one Unix:
• Traditionally there are a few major flavors of Unix: AT&T or System V, and BSD. Apple has Darwin
which is close to BSD; IBM and HP have their own versions of Unix, and Linux is yet another
variant. These days many Unix versions adhere to the POSIX standard. The differences between
these are deep down and if you are taking this tutorial you probably won’t see them for quite a
while.
• Within Linux there are various Linux distributions such as Red Hat or Ubuntu. These mainly differ
in the organization of system files and again you probably need not worry about them.
• The issue of command shells will be discussed below. This actually forms the most visible
difference between different computers ‘running Unix’.
1.1 Shells
Most of the time that you use Unix, you are typing commands which are executed by an interpreter called
the shell. The shell makes the actual OS calls. There are a few possible Unix shells available:
• Most of this tutorial is focused on the sh or bash shell.
• For a variety of reasons (see for instance section 3.5), bash-like shells are to be preferred over the
csh or tcsh shell. These latter ones will not be covered in this tutorial.
• Recent versions of the Apple Mac OS have the zsh as default. While this shell has many things in
common with bash, we will point out differences explicitly.
1.2.1.1 ls
Without any argument, the ls command gives you a listing of files that are in your present location.
Exercise 1.1. Type ls. Does anything show up?
Intended outcome. If there are files in your directory, they will be listed; if there are none,
no output will be given. This is standard Unix behavior: no output does not mean that
something went wrong, it only means that there is nothing to report.
Exercise 1.2. If the ls command shows that there are files, do ls name on one of those. By
using an option, for instance ls -s name you can get more information about name.
Things to watch out for. If you mistype a name, or specify a name of a non-existing file,
you’ll get an error message.
The ls command can give you all sorts of information. In addition to the above ls -s for the size, there
is ls -l for the ‘long’ listing. It shows ownership and permissions (things we will get to later),
as well as the size and the date of last modification.
Remark 1 There are several dates associated with a file, corresponding to changes in content, changes in
permissions, and access of any sort. The stat command gives all of them.
1.2.1.2 cat
The cat command (short for ‘concatenate’) is often used to display files, but it can also be used to create
some simple content.
Exercise 1.3. Type cat > newfilename (where you can pick any filename) and type some
text. Conclude with Control-d on a line by itself: press the Control key and hold it
while you press the d key. Now use cat to view the contents of that file: cat newfilename.
1.2. Files and such
Intended outcome. In the first use of cat, text was appended from the terminal to a file;
in the second the file was cat’ed to the terminal output. You should see on your screen
precisely what you typed into the file.
Things to watch out for. Be sure to type Control-d as the first thing on the last line of
input. If you really get stuck, Control-c will usually get you out. Try this: start creating
a file with cat > filename and hit Control-c in the middle of a line. What are the
contents of your file?
Remark 2 Instead of Control-d you will often see the notation ^D. The capital letter is for historic reasons:
you use the control key and the lowercase letter.
1.2.1.3 man
The primary (though not always the most easily understood) source of information on Unix commands is the
man command, for ‘manual’. The descriptions available this way are referred to as the manual pages.
Exercise 1.4. Read the man page of the ls command: man ls. Find out the size and the time /
date of the last change to some files, for instance the file you just created.
Intended outcome. Did you find the ls -s and ls -l options? The first one lists the
size of each file, usually in kilobytes, the other gives all sorts of information about a file,
including things you will learn about later.
The man command puts you in a mode where you can view long text documents. This viewer is common
on Unix systems (it is available as the more or less system command), so memorize the following ways
of navigating: Use the space bar to go forward and the u key to go back up. Use g to go to the beginning
of the text, and G for the end. Use q to exit the viewer. If you really get stuck, Control-c will get you out.
Remark 3 If you already know what command you’re looking for, you can use man to get online information
about it. If you forget the name of a command, man -k keyword can help you find it.
1.2.1.4 touch
The touch command creates an empty file, or updates the timestamp of a file if it already exists. Use ls
-l to confirm this behavior.
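Both behaviors of touch can be seen in a short session. The following is only a sketch: it works in a scratch directory made by mktemp, and the file name marker is a hypothetical choice.

```shell
#!/bin/bash
# Work in a scratch directory so that no existing files are affected.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

touch marker                          # the file does not exist yet: an empty file is created
ls -l marker                          # the size column shows 0
size=$(wc -c < marker | tr -d ' ')    # byte count, with any padding removed
echo "size after first touch: $size"

echo "some text" > marker             # give the file some content
touch marker                          # the file exists: only its timestamp is updated
ls -l marker                          # same size as before this touch
kept=$(cat marker)                    # the content is unchanged
echo "content after second touch: $kept"
```

The second touch leaves the content alone; only the date in the long listing changes.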
Exercise 1.6. Rename a file. What happens if the target name already exists?
Files are deleted with rm. This command is dangerous: there is no undo. For this reason you can do rm
-i (for ‘interactive’) which asks for your confirmation for every file. See section 1.2.4 for more aggressive
removing.
Sometimes you want to refer to a file from two locations. This is not the same as having a copy: you want
to be able to edit either one, and have the other one change too. This can be done with ln: ‘link’.
This snippet creates a file and a link to it:
$ echo contents > arose
$ cd mydir
$ ln ../arose anyothername
$ cat anyothername
contents
$ echo morestuff >> anyothername
$ cd ..
$ cat arose
contents
morestuff
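To see that a link is a second name for the same file rather than a copy, you can compare it with an actual copy. This is a sketch under assumptions: the names original, linked, and copied are hypothetical, and the -ef test (true if two names refer to the same file) is a bash feature.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo contents > original
ln original linked            # second name for the same file
cp original copied            # independent copy, for comparison

echo morestuff >> linked      # edit through the link

ls -i original linked copied  # the first two report the same inode number
if [ original -ef linked ] ; then same=yes ; else same=no ; fi
echo "original and linked are the same file: $same"

lines_original=$(wc -l < original | tr -d ' ')
lines_copied=$(wc -l < copied | tr -d ' ')
echo "lines in original: $lines_original, lines in copy: $lines_copied"
```

The edit made through the link shows up in the original, while the copy still has its single line.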
1.2.2 Directories
Purpose. Here you will learn about the Unix directory tree, how to manipulate it and
how to move around in it.
A unix file system is a tree of directories, where a directory is a container for files or more directories. We
will display directories as follows:
/                 The root of the directory tree
    bin           Binary programs
    home          Location of users directories
The root of the Unix directory tree is indicated with a slash. Do ls / to see what files and directories
there are in the root. Note that the root is not the location where you start when you reboot your personal
machine, or when you log in to a server.
Exercise 1.9. The command to find out your current working directory is pwd. Your home
directory is your working directory immediately when you log in. Find out your home
directory.
Intended outcome. You will typically see something like /home/yourname or /Users/yourname.
This is system dependent.
Do ls to see the contents of the working directory. In the displays in this section, directory names will be
followed by a slash: dir/ but this character is not part of their name. You can get this output by using ls
-F, and you can tell your shell to use this output consistently by stating alias ls='ls -F' at the start of
your session. Example:
/home/you/
    adirectory/
    afile
Remark 4 If you need to create a directory several levels deep, you could
mkdir sub1
cd sub1
mkdir sub2
cd sub2
## et cetera
but it’s shorter to use the -p option (for ‘parent’) and write:
mkdir -p sub1/sub2/sub3
The command for going into another directory, that is, making it your working directory, is cd (‘change
directory’). It can be used in the following ways:
• cd Without any arguments, cd takes you to your home directory.
• cd <absolute path> An absolute path starts at the root of the directory tree, that is, starts
with /. The cd command takes you to that location.
• cd <relative path> A relative path is one that does not start at the root. This form of the cd
command takes you to <yourcurrentdir>/<relative path>.
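The difference between the absolute and relative forms can be tried out safely in a scratch directory. In this sketch the directory names sub1 and sub2 are hypothetical, and pwd -P is used so that the comparison is not confused by symbolic links in the path of the temporary directory.

```shell
#!/bin/bash
# Absolute, symlink-free path of a scratch directory.
top=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$top/sub1/sub2"

cd "$top"                  # absolute path: starts with /
pwd
cd sub1/sub2               # relative path: resolved against the current directory
relative_result=$(pwd -P)
echo "after relative cd: $relative_result"

cd "$top"
cd "$top/sub1/sub2"        # the same location, reached through an absolute path
absolute_result=$(pwd -P)
echo "after absolute cd: $absolute_result"
```

Both routes end in the same directory; the relative form only works if you start in the right place.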
Exercise 1.11. Do cd newdir and find out where you are in the directory tree with pwd.
Confirm with ls that the directory is empty. How would you get to this location using an
absolute path?
Intended outcome. pwd should tell you /home/you/newdir, and ls then has no output,
meaning there is nothing to list. The absolute path is /home/you/newdir.
Exercise 1.12. Let’s quickly create a file in this directory: touch onefile, and another
directory: mkdir otherdir. Do ls and confirm that there are a new file and directory.
Intended outcome. You should now have:
/home/you/
    newdir/           (you are here)
        onefile
        otherdir/
The ls command has a very useful option: with ls -a you see your regular files and hidden files, which
have a name that starts with a dot. Doing ls -a in your new directory should tell you that there are the
following files:
/home/you/
    newdir/           (you are here)
        .
        ..
        onefile
        otherdir/
The single dot is the current directory, and the double dot is the directory one level back.
Exercise 1.13. Predict where you will be after cd ./otherdir/.. and check to see if you were
right.
Intended outcome. The single dot sends you to the current directory, so that does not
change anything. The otherdir part makes that subdirectory your current working
directory. Finally, .. goes one level back. In other words, this command puts you right
back where you started.
Since your home directory is a special place, there are shortcuts for cd’ing to it: cd without arguments,
cd ~, and cd $HOME all get you back to your home.
Go to your home directory, and from there do ls newdir to check the contents of the first directory you
created, without having to go there.
Exercise 1.14. What does ls .. do?
Intended outcome. Recall that .. denotes the directory one level up in the tree: you should
see your own home directory, plus the directories of any other users.
Let’s practice the use of the single and double dot directory shortcuts.
Exercise 1.15. From your home directory:
mkdir -p sub1/sub2/sub3
cd sub1/sub2/sub3
touch a
What is the difference between cp -r newdir somedir where somedir is an existing directory, and cp
-r newdir thirddir where thirddir is not an existing directory?
1.2.3 Permissions
Purpose. In this section you will learn about how to give various users on your system
permission to do (or not to do) various things with your files.
Unix files, including directories, have permissions, indicating ‘who can do what with this file’. Actions
that can be performed on a file fall into three categories:
• reading r: any access to a file (displaying, getting information on it) that does not change the file;
• writing w: access to a file that changes its content, or even its metadata such as ‘date modified’;
• executing x: if the file is executable, to run it; if it is a directory, to enter it.
The people who can potentially access a file are divided into three classes too:
Examples:
chmod 766 file # set to rwxrw-rw-
chmod g+w file # give group write permission
chmod g=rx file # set group permissions
chmod o-w file # take away write permission from others
chmod o= file # take away all permissions from others.
chmod g+r,o-x file # give group read permission
# remove other execute permission
This is a legitimate shell script. What happens when you type ./com? Can you get the script executed?
In the three permission categories it is clear who ‘you’ and ‘others’ refer to. How about ‘group’? We’ll go
into that in section 1.13.
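The effect of the chmod examples above can be checked by looking at the first ten characters of the long listing after each change. The following is a sketch on a scratch file; the helper function mode is a hypothetical convenience, not a standard command.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch file

# Hypothetical helper: print only the ten permission characters of ls -l.
mode() { ls -l "$1" | cut -c 1-10 ; }

chmod 766 file      ; m1=$(mode file) ; echo "$m1"   # -rwxrw-rw-
chmod g=rx file     ; m2=$(mode file) ; echo "$m2"   # -rwxr-xrw-
chmod o= file       ; m3=$(mode file) ; echo "$m3"   # -rwxr-x---
chmod u-x,g+w file  ; m4=$(mode file) ; echo "$m4"   # -rw-rwx---
```

Each octal digit is the sum of r=4, w=2, x=1, so 766 gives the owner all three permissions and everyone else read and write.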
1.3. Text searching and regular expressions
Exercise 1.18. Suppose you’re an instructor and you want to make a ‘dropbox’ directory for
students to deposit homework assignments in. What would be an appropriate mode for
that directory? (Assume that you have co-teachers that are in your group, and who also
need to be able to see the contents. In other words, group permission should be identical
to the owner permission.)
Remark 5 There are more obscure permissions. For instance the setuid bit declares that the program should
run with the permissions of the owner of the file, rather than the user executing it. This is useful for system utilities
such as passwd or mkdir, which alter the password file and the directory structure, for which root privileges
are needed. Thanks to the setuid bit, a user can run these programs, which are then so designed that a user
can only make changes to their own password entry, and their own directories, respectively. The setuid bit is
set with chmod: chmod 4ugo file.
1.2.4 Wildcards
You already saw that ls filename gives you information about that one file, and ls gives you all files in
the current directory. To see files with certain conditions on their names, the wildcard mechanism exists.
The following wildcards exist:
* any number of characters
? any character.
Example:
%% ls
s sk ski skiing skill
%% ls ski*
ski skiing skill
The second command lists all files whose name starts with ski, followed by any number of other characters;
below you will see that in different contexts ski* means ‘sk followed by any number of i characters’.
Confusing, but that’s the way it is.
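You can watch the shell expand wildcards by giving them to echo instead of ls. This sketch recreates the five files of the example in a scratch directory.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch s sk ski skiing skill

star=$(echo ski*)        # ski followed by any number of characters
question=$(echo sk?)     # sk followed by exactly one character
echo "ski* expands to: $star"
echo "sk?  expands to: $question"
```

The expansion is done by the shell before the command runs: echo never sees the star, only the resulting list of names.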
You can use rm with wildcards, but this can be dangerous.
rm -f foo ## remove foo if it exists
rm -r foo ## remove directory foo with everything in it
rm -rf foo/* ## delete all contents of foo
Zsh note. Removing with a wildcard rm foo* is an error if there are no such files. Do setopt +o
nomatch to allow no matches to occur.
For this section you need at least one file that contains some amount of text. You can for instance get
random text from http://www.lipsum.com/feed/html.
The grep command can be used to search for a text expression in a file.
Exercise 1.19. Search for the letter q in your text file with grep q yourfile and search for it
in all files in your directory with grep q *. Try some other searches.
Intended outcome. In the first case, you get a listing of all lines that contain a q; in the
second case, grep also reports what file name the match was found in: qfile:this
line has q in it.
Things to watch out for. If the string you are looking for does not occur, grep will simply
not output anything. Remember that this is standard behavior for Unix commands if there
is nothing to report.
In addition to searching for literal strings, you can look for more general expressions.
^ the beginning of the line
$ the end of the line
. any character
* any number of repetitions
[xyz] any of the characters xyz
This looks like the wildcard mechanism you just saw (section 1.2.4) but it’s subtly different. Compare the
example above with:
%% cat s
sk
ski
skill
skiing
%% grep "ski*" s
sk
ski
skill
skiing
In the second case you search for a string consisting of sk and any number of i characters, including zero
of them.
Some more examples: you can find
• All lines that contain the letter ‘q’ with grep q yourfile;
• All lines that start with an ‘a’ with grep "^a" yourfile (if your search string contains special
characters, it is a good idea to use quote marks to enclose it);
• All lines that end with a digit with grep "[0-9]$" yourfile.
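These searches can be tried on a small generated file. The sketch below uses made-up sample lines and the -c option of grep, which counts matching lines instead of printing them.

```shell
#!/bin/bash
textfile=$(mktemp)
printf '%s\n' 'apple 1' 'banana' 'avocado' 'quince 42' 'sk' > "$textfile"

grep "^a" "$textfile"                        # prints the lines that start with an a
starts_a=$(grep -c "^a" "$textfile")         # apple 1, avocado
ends_digit=$(grep -c "[0-9]$" "$textfile")   # apple 1, quince 42
star_i=$(grep -c "ski*" "$textfile")         # sk plus zero or more i: the line sk
echo "$starts_a $ends_digit $star_i"
rm -f "$textfile"
```

The last search matches the line sk even though it contains no i, because the star allows zero repetitions.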
Exercise 1.20. Construct the search strings for finding
• lines that start with an uppercase character, and
• lines that contain exactly one character.
Intended outcome. For the first, use the range characters [], for the second use the period
to match any character.
Exercise 1.21. Add a few lines x = 1, x  = 2, x   = 3 (that is, have different numbers of
spaces between x and the equals sign) to your test file, and make grep commands to
search for all assignments to x.
1.4. Other useful commands: tar
The characters in the table above have special meanings. If you want to search that actual character, you
have to escape it.
Exercise 1.22. Make a test file that has both abc and a.c in it, on separate lines. Try the
commands grep "a.c" file, grep a\.c file, grep "a\.c" file.
Intended outcome. You will see that the period needs to be escaped, and the search string
needs to be quoted. In the absence of either, you will see that grep also finds the abc
string.
will display the characters in position 2–5 of every line of myfile. Make a test file and verify this example.
Maybe more useful, you can give cut a delimiter character and have it split a line on occurrences of
that delimiter. For instance, your system will most likely have a file /etc/passwd that contains user
information (see footnote 1), with every line consisting of fields separated by colons. For instance:
daemon:*:1:1:System Services:/var/root:/usr/bin/false
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
The seventh and last field is the login shell of the user; /bin/false indicates that the user is unable to
log in.
You can display users and their login shells with:
cut -d ":" -f 1,7 /etc/passwd
This tells cut to use the colon as delimiter, and to print fields 1 and 7.
1. This is traditionally the case; on Mac OS information about users is kept elsewhere and this file only contains system
services.
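Since the contents of /etc/passwd differ per system, the field selection of cut is easier to study on a file with known contents. This sketch copies two of the sample lines above into a temporary file.

```shell
#!/bin/bash
sample=$(mktemp)
cat > "$sample" <<'EOF'
daemon:*:1:1:System Services:/var/root:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
EOF

cut -d ":" -f 1,7 "$sample"                        # user name and login shell of every line
first=$(cut -d ":" -f 1,7 "$sample" | head -n 1)   # first line of that output
name_only=$(cut -d ":" -f 1 "$sample" | tail -n 1) # a single field: the last user name
echo "first: $first"
echo "last user: $name_only"
rm -f "$sample"
```

Note that cut keeps the delimiter between the selected fields in its output.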
Remark 6 Like any good programming language, the shell language has comments. Any line starting with
a hash character # is ignored.
to enable it.
If you type a command such as ls, the shell does not just rely on a list of commands: it will actually go
searching for a program by the name ls. This means that you can have multiple different commands with
the same name, and which one gets executed depends on which one is found first.
Exercise 1.23. What you may think of as ‘Unix commands’ are often just executable files in a
system directory. Do which ls, and do an ls -l on the result.
Intended outcome. The location of ls is something like /bin/ls. If you ls that, you will
see that it is probably owned by root. Its executable bits are probably set for all users.
1.5. Command execution
The list of locations where Unix searches for commands is the search path, which is stored in the environment
variable PATH (for more details see below).
Exercise 1.24. Do echo $PATH. Can you find the location of cd? Are there other commands in
the same location? Is the current directory ‘.’ in the path? If not, do export PATH=".:$PATH".
Now create an executable file cd in the current directory (see above for the basics), and do
cd.
Intended outcome. The path will be a list of colon-separated directories,
for instance /usr/bin:/usr/local/bin:/usr/X11R6/bin. If the working directory
is in the path, it will probably be at the end: /usr/X11R6/bin:. but most likely it will
not be there. If you put ‘.’ at the start of the path, unix will find the local cd command
before the system one.
Some people consider having the working directory in the path a security risk. If your directory is writable,
someone could put a malicious script named cd (or any other system command) in your directory, and
you would execute it unwittingly.
The safest way to execute a program in the current directory is:
./my_program
This holds both for compiled programs and shell scripts; section 1.9.1.
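The difference between an explicit path and a search through PATH can be made visible with command -v, which reports what the shell would execute. This is a sketch only: hello_hpc is a hypothetical script name, chosen because no system command is likely to be called that, and it assumes that scripts in the temporary directory are allowed to execute.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A tiny script with a hypothetical name.
printf '#!/bin/bash\necho "hello from the local script"\n' > hello_hpc
chmod +x hello_hpc

before=$(command -v hello_hpc || echo "not found")   # not on the search path yet
explicit=$(./hello_hpc)                              # explicit path: PATH is not consulted
PATH="$workdir:$PATH"                                # prepend the directory to the search path
after=$(command -v hello_hpc)                        # now the shell finds it by name

echo "before: $before"
echo "after:  $after"
echo "output: $explicit"
```

Prepending a specific directory is a narrower change than putting ‘.’ in the path, since it does not depend on where you happen to be.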
Remark 7 Not all Unix commands correspond to executables. The type command gives more information
than which:
$ type echo
echo is a shell builtin
$ type \ls
ls is an alias for ls -F
$ unalias ls
$ type ls
ls is /bin/ls
$ type module
module is a shell function from /usr/local/Cellar/lmod/8.7.2/init/zsh
1.5.2 Aliases
It is possible to define your own commands as aliases of existing commands.
Exercise 1.25. Do alias chdir=cd and convince yourself that now chdir works just like cd.
Do alias rm='rm -i'; look up the meaning of this in the man pages. Some people find
this alias a good idea; can you see why?
Intended outcome. The -i ‘interactive’ option for rm makes the command ask for
confirmation before each delete. Since unix does not have a trashcan that needs to be emptied
explicitly (as on Windows or the Mac OS), this can be a good idea.
This is convenient if you repeat the same two commands a number of times: you only need to up-arrow
once to repeat them both.
There is a problem: if you type
cc -o myprog myprog.c ; ./myprog
and the compilation fails, the program will still be executed, using an old version of the executable if that
exists. This is very confusing.
A better way is:
cc -o myprog myprog.c && ./myprog
which only executes the second command if the first one was successful.
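The difference between the two separators can be seen without a compiler. In this sketch the standard commands false and true stand in for a failing and a successful compilation.

```shell
#!/bin/bash
# 'false' always fails, 'true' always succeeds.
semicolon=$(false ; echo "ran")       # second command runs regardless of the failure
and_fail=$(false && echo "ran") || echo "(the first command failed)"
and_ok=$(true && echo "ran")          # second command runs because the first succeeded

echo "after failure with ;  -> '$semicolon'"
echo "after failure with && -> '$and_fail'"
echo "after success with && -> '$and_ok'"
```

Only the middle case produces an empty result: the echo after && was never executed.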
1.5.3.2 Pipelining
Instead of taking input from a file, or sending output to a file, it is possible to connect two commands
together, so that the second takes the output of the first as input. The syntax for this is cmdone | cmdtwo;
this is called a pipeline. For instance, grep a yourfile | grep b finds all lines that contain both an
a and a b.
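A pipeline can have more than two stages, and its input does not have to come from a file. The sketch below feeds five made-up words through the two grep commands and then through tr, which converts the surviving lines to uppercase.

```shell
#!/bin/bash
# printf writes one word per line; each stage reads the output of the previous one.
both=$(printf '%s\n' alpha bravo cab delta echo | grep a | grep b)
echo "$both"

shouted=$(printf '%s\n' alpha bravo cab delta echo | grep a | grep b | tr 'a-z' 'A-Z')
echo "$shouted"
```

Only bravo and cab contain both letters, so those are the two lines that reach the last stage.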
Exercise 1.26. Construct a pipeline that counts how many lines there are in your file that
contain the string th. Use the wc command (see above) to do the counting.
1.5.3.3 Backquoting
There are a few more ways to combine commands. Suppose you want to present the result of wc a bit
nicely. Type the following command
echo The line count is wc -l foo
where foo is the name of an existing file. The way to get the actual line count echoed is by the backquote:
echo The line count is `wc -l foo`
Anything in between backquotes is executed before the rest of the command line is evaluated.
Exercise 1.27. The way wc is used here, it prints the file name. Can you find a way to prevent
that from happening?
There is another mechanism for out-of-order evaluation:
echo "There are $( cat Makefile | wc -l ) lines"
This mechanism makes it possible to nest commands, but for compatibility and legacy purposes
backquotes may still be preferable when nesting is not needed.
This only catches the last command. You could for instance group the three commands in a subshell and
catch the output of that:
( configure ; make ; make install ) > installation.log 2>&1
The script reports that the file was created even though it wasn’t.
Improved script:
[nowrite] cat ../betterfile
#!/bin/bash
touch $1
if [ $? -eq 0 ] ; then
echo "Created file: $1"
else
echo "Problem creating file: $1"
fi
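The exit status that the improved script tests can also be inspected by hand. This is a sketch in a scratch directory; the name no_such_dir is hypothetical and is assumed not to exist, so that the second touch fails.

```shell
#!/bin/bash
workdir=$(mktemp -d)

touch "$workdir/newfile"          # succeeds: the directory exists
status_ok=$?                      # $? is the exit status of the last command

status_fail=0
# The part after || runs only on failure; $? then still holds the status of touch.
touch "$workdir/no_such_dir/newfile" 2>/dev/null || status_fail=$?

echo "status of successful touch: $status_ok"
echo "status of failing touch:    $status_fail"

# 'if' can also test a command directly, without going through $?.
if touch "$workdir/another" ; then
    echo "Created file"
else
    echo "Problem creating file"
fi
```

Zero means success and any nonzero value means failure, which is the opposite of the truth values in languages such as C.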
1.6. Input/output Redirection
Exercise 1.29. Type Control-z. This suspends the foreground process. It will give you a
number like [1] or [2] indicating that it is the first or second program that has been
suspended or put in the background. Now type bg to put this process in the background.
Confirm that there is no foreground process by hitting return, and doing an ls.
Intended outcome. After you put a process in the background, the terminal is available
again to accept foreground commands. If you hit return, you should see the command
prompt. However, the background process still keeps generating output.
Exercise 1.30. Type jobs to see the processes in the current session. If the process you just put
in the background was number 1, type fg %1. Confirm that it is a foreground process
again.
Intended outcome. If a shell is executing a program in the foreground, it will not accept
command input, so hitting return should only produce blank lines.
Exercise 1.31. When you have made the hello script a foreground process again, you can kill
it with Control-c. Try this. Start the script up again, this time as ./hello & which
immediately puts it in the background. You should also get output along the lines of [1]
12345 which tells you that it is the first job you put in the background, and that 12345
is its process ID. Kill the script with kill %1. Start it up again, and kill it by using the
process number.
Intended outcome. The command kill 12345 using the process number is usually enough
to kill a running program. Sometimes it is necessary to use kill -9 12345.
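In a script the job numbers of an interactive session are usually not available, but the process ID of the most recent background process is stored in the variable $!. The following sketch uses sleep as a stand-in for a long-running program.

```shell
#!/bin/bash
sleep 60 &                  # start a long-running process in the background
pid=$!                      # process ID of the last background process
echo "background process: $pid"

kill "$pid"                 # send the default TERM signal
status=0
wait "$pid" 2>/dev/null || status=$?   # collect the exit status of the killed process
echo "exit status after kill: $status"

# kill -0 sends no signal; it only tests whether the process still exists.
if kill -0 "$pid" 2>/dev/null ; then alive=yes ; else alive=no ; fi
echo "still running: $alive"
```

In bash a process that was ended by a signal reports 128 plus the signal number as its status.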
So far, the unix commands you have used have taken their input from your keyboard, or from a file named
on the command line; their output went to your screen. There are other possibilities for providing input
from a file, or for storing the output in a file.
Unix has three standard files that handle input and output:
Standard file
stdin is the file that provides input for processes.
stdout is the file where the output of a process is written.
stderr is the file where error output is written.
In an interactive session, all three files are connected to the user terminal. Using input or output redirection
then means that the input is taken or the output sent to a different file than the terminal.
Just as with the input, you can redirect the output of your program. In the simplest case, grep string
yourfile > outfile will take what normally goes to the terminal, and redirect the output to outfile.
The output file is created if it didn’t already exist, otherwise it is overwritten. (To append, use grep text
yourfile >> outfile.)
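The difference between overwriting and appending shows up in the line count of the output file. This sketch generates a three-line input file; the search strings are made up for the occasion.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'the quick\nbrown fox\nthe end\n' > yourfile

grep the yourfile > outfile       # creates outfile with the two matching lines
grep fox yourfile > outfile       # overwrites: only the fox line remains
after_overwrite=$(wc -l < outfile | tr -d ' ')

grep the yourfile >> outfile      # appends the two matching lines
after_append=$(wc -l < outfile | tr -d ' ')

echo "lines after overwrite: $after_overwrite, after append: $after_append"
```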
Exercise 1.32. Take one of the grep commands from the previous section, and send its output
to a file. Check that the contents of the file are identical to what appeared on your screen
before. Search for a string that does not appear in the file and send the output to a file.
What does this mean for the output file?
Intended outcome. Searching for a string that does not occur in a file gives no terminal
output. If you redirect the output of this grep to a file, it gives a zero size file. Check this
with ls and wc.
Exercise 1.33. Generate a text file that contains your information:
My user name is:
eijkhout
My home directory is:
/users/eijkhout
I made this script on:
isp.tacc.utexas.edu
1.7. Shell environment variables
Idiom                        Meaning
program 2>/dev/null          send only errors to the null device
program >/dev/null 2>&1      send output to dev-null, and errors to output
                             (note the counterintuitive sequence of specifications!)
program 2>&1 | less          send output and errors to less
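Why the order of the specifications matters can be seen by capturing what still reaches standard output. In this sketch the shell function program is a hypothetical stand-in that writes one line to each output stream.

```shell
#!/bin/bash
# Stand-in for a real program: one line to stdout, one line to stderr.
program() { echo "regular output" ; echo "error output" >&2 ; }

workdir=$(mktemp -d)

errors_hidden=$(program 2>/dev/null)       # errors discarded, output captured

program > "$workdir/both.log" 2>&1         # stdout to the file, then stderr to the same place
both_lines=$(wc -l < "$workdir/both.log" | tr -d ' ')

# Reversed order: stderr is attached to the old stdout before stdout moves to the file.
leaked=$(program 2>&1 > "$workdir/out_only.log")

echo "captured with 2>/dev/null: $errors_hidden"
echo "lines in both.log: $both_lines"
echo "escaped with reversed order: $leaked"
```

Redirections are processed left to right, and 2>&1 means ‘wherever stream 1 points at this moment’.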
Remark 8 This does not include variables you define yourself, unless you export them; see below.
Exercise 1.34. Check on the value of the PATH variable by typing echo $PATH. Also find the
value of PATH by piping env through grep.
We start by exploring the use of this dollar sign in relation to shell variables.
You see that the shell treats everything as a string, unless you explicitly tell it to take the value of a variable,
by putting a dollar in front of the name. A variable that has not been previously defined will print as a
blank string.
Shell variables can be set in a number of ways. The simplest is by an assignment as in other programming
languages.
When you do the next exercise, it is good to bear in mind that the shell is a text based language.
Exercise 1.35. Type a=5 on the command line. Check on its value with the echo command.
Define the variable b to another integer. Check on its value.
Now explore the values of a+b and $a+$b, both by echo’ing them, or by first assigning
them.
Intended outcome. The shell does not perform integer addition here: instead you get
a string with a plus-sign in it. (You will see how to do arithmetic on variables in
section 1.10.1.)
Things to watch out for. Be careful not to have spaces around the equals sign; also be sure to
use the dollar sign to print the value.
[] exit
exit
[] export a=21
[] /bin/bash
[] echo $a
21
[] exit
[]
The syntax where you set the value, as a prefix without using a separate command, sets the value
just for that one command.
1.8. Control structures
[]
That is, you defined the variable just for the execution of a single command.
In section 1.8 you will see that the for construct also defines a variable; section 1.9.1 shows some more
built-in variables that apply in shell scripts.
If you want to un-set an environment variable, there is the unset command.
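The three ways of setting a variable differ in who gets to see the value. The sketch below starts a child shell with bash -c to look at the variables from the outside; it assumes that a and b are not already present in your environment.

```shell
#!/bin/bash
a=5                                   # plain shell variable: not passed to child processes
in_child=$(bash -c 'echo "a=$a"')     # the child sees an empty value

export a                              # now a is part of the environment
in_child_exported=$(bash -c 'echo "a=$a"')

b=21 bash -c 'echo "b=$b"'            # prefix form: b exists for this one command only
after_prefix="b=${b}"                 # in the current shell b was never set

unset a                               # remove the variable again
after_unset="a=${a}"

echo "$in_child | $in_child_exported | $after_prefix | $after_unset"
```

The single quotes matter here: they prevent the current shell from substituting the variables before the child shell starts.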
1.8.1 Conditionals
The conditional of the bash shell is predictably called if, and it can be written over several lines:
if [ "$PATH" = "" ] ; then
    echo "Error: path is empty"
fi
or on a single line:
if [ `wc -l < file` -gt 100 ] ; then echo "file too long" ; fi
(The backquote is explained in section 1.5.3.3.) There are a number of tests defined, for instance -f
somefile tests for the existence of a file. Change your script so that it will report -1 if the file does
not exist.
The syntax of this is finicky:
• if and elif are followed by a conditional, followed by a semicolon.
• The brackets of the conditional need to have spaces surrounding them.
• There is no semicolon after then or else: they are immediately followed by some command.
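The rules above can be seen together in a three-way conditional. This is a sketch on a temporary file; the variable names are hypothetical, and the line count is read from standard input so that wc prints only the number.

```shell
#!/bin/bash
somefile=$(mktemp)                     # an existing scratch file
printf 'a\nb\nc\n' > "$somefile"
lines=$(wc -l < "$somefile" | tr -d ' ')

# Semicolon after each condition, spaces inside the brackets, none after then or else.
if [ ! -f "$somefile" ] ; then
    verdict="missing"
elif [ "$lines" -gt 100 ] ; then
    verdict="too long"
else
    verdict="ok"
fi
echo "$somefile: $verdict"
rm -f "$somefile"
```

Quoting the variables inside the brackets keeps the test well-formed even when a value is empty or contains spaces.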
Exercise 1.36. Bash conditionals have an elif keyword. Can you predict the error you get from
this:
if [ something ] ; then
foo
else if [ something_else ] ; then
bar
fi
Code it out and see if you were right.
Zsh note. The zsh shell has an extended conditional syntax with double square brackets. For
instance, pattern matching:
if [[ $myvar == *substring* ]] ; then ....
1.8.2 Looping
In addition to conditionals, the shell has loops. A for loop looks like
for var in listofitems ; do
something with $var
done
In a more meaningful example, here is how you would make backups of all your .c files:
for cfile in *.c ; do
cp "$cfile" "$cfile.bak"
done
Shell variables can be manipulated in a number of ways. Execute the following commands to see that you
can remove trailing characters from a variable:
[] a=b.c
[] echo ${a%.c}
b
(See section 1.10 on expansion.) With this as a hint, write a loop that renames all your .c files to .x
files.
The above construct loops over words, such as the output of ls. To do a numeric loop, use the command
seq:
[] seq 1 5
1
2
3
4
5
In the header of a for loop you write this command between backticks, as in for i in `seq 1 5`. The backticks
are necessary to have the seq command executed before the loop is evaluated.
You can break out of a loop with break; this can even have a numeric argument indicating how many
levels of loop to break out of.
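As an illustration of a numeric loop with an early exit, here is a minimal sketch. The variable names and the limit of 5 are arbitrary, and the $(( ... )) notation for arithmetic is borrowed from the later discussion of expansion.

```shell
#!/bin/bash
# Sum the integers 1..5, but leave the loop as soon as the sum exceeds 5.
total=0
for i in `seq 1 5` ; do        # the backticks run seq first; the loop sees: 1 2 3 4 5
  total=$(( total + i ))
  if [ $total -gt 5 ] ; then
    break                      # leave the innermost loop
  fi
done
echo "stopped at i=$i with total=$total"
```

The loop stops in the third iteration, when the running sum 1+2+3 first exceeds 5.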
1.9 Scripting
The unix shells are also programming environments. You will learn more about this aspect of unix in this
section.
and type ./script1 on the command line. Result? Make the file executable and try again.
Zsh note. If you use zsh, but you have bash scripts that you wrote in the past, they will
keep working. The ‘hash-bang’ line determines which shell executes the script, and it is perfectly possible
to have bash in the hash-bang line of your script, while using zsh for interactive use.
In order to write scripts that you want to invoke from anywhere, people typically put them in a directory
bin in their home directory. You would then add this directory to your search path, contained in PATH;
see section 1.5.1.
You will now learn how to incorporate this functionality in your scripts.
First of all, all commandline arguments and options are available as variables $1,$2 et cetera in the script,
and the number of command line arguments is available as $#:
#!/bin/bash
Formally:
variable    meaning
$#          number of arguments
$0          the name of the script
$1,$2,...   the arguments
$*,$@       the list of all arguments
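The following sketch prints some of these variables. To keep it self-contained it uses a function, show_args (a name made up for this example), as a stand-in for a script: inside a function $#, $1, $2 and $* refer to the function arguments in the same way, while $0 keeps referring to the script itself.

```shell
#!/bin/bash
# Hypothetical function standing in for the body of a script.
show_args () {
  echo "number of arguments: $#"
  echo "first argument: $1"
  echo "second argument: $2"
  echo "all arguments: $*"
}
# Two arguments; the quotes make the second one contain a space.
show_args alpha "beta gamma"
```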
Exercise 1.37. Write a script that takes as input a file name argument, and reports how many
lines are in that file.
Edit your script to test whether the file has less than 10 lines (use the foo -lt bar test),
and if it does, cat the file. Hint: you need to use backquotes inside the test.
Add a test to your script so that it will give a helpful message if you call it without any
arguments.
The standard way to parse arguments is using the shift command, which pops the first argument off the
list of arguments. Parsing the arguments in sequence then involves looking at $1, shifting, and looking at
the new $1.
Code:
# arguments.sh
while [ $# -gt 0 ] ; do
  echo "argument: $1"
  shift
done
Exercise 1.38. Write a script say.sh that prints its text argument. However, if you invoke it
with
./say.sh -n 7 "Hello world"
it should print it as many times as you indicated. Using the option -u:
./say.sh -u -n 7 "Goodbye cruel world"
should print the message in uppercase. Make sure that the order of the arguments does
not matter, and give an error message for any unrecognized option.
The variables $@ and $* have a different behavior with respect to double quotes. Let’s say we evaluate
myscript "1 2" 3, then
• Using $* is the list of arguments after removing quotes: myscript 1 2 3.
• Using "$*" is the list of arguments, with quotes removed, in quotes: myscript "1 2 3".
• Using "$@" preserves quotes: myscript "1 2" 3.
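You can check these three cases by counting how many arguments arrive at the receiving end. In this sketch, count_args and forward are hypothetical helper functions, written only for the demonstration.

```shell
#!/bin/bash
# count_args is a hypothetical helper that reports how many arguments it received.
count_args () { echo $# ; }

# forward is a hypothetical wrapper that passes its own arguments on in three ways.
forward () {
  echo "unquoted \$*: $(count_args $*)"
  echo "quoted \"\$*\": $(count_args "$*")"
  echo "quoted \"\$@\": $(count_args "$@")"
}
forward "1 2" 3
```

The unquoted form splits the first argument into two words, "$*" glues everything into a single argument, and only "$@" hands over the original two arguments.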
which makes the script abort if any command fails. The additional option
set -o pipefail
The crucial second line contains an ‘or’ condition: either some_command succeeds, or you set errcode to its
exit code. This compound command always succeeds, so now you can inspect the exit code.
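A minimal sketch of this idiom follows. It assumes that the abort-on-failure setting referred to is set -e, and it uses a placeholder function some_command that always fails with exit code 3; both are assumptions made for the illustration.

```shell
#!/bin/bash
set -e            # abort the script on the first failing command
set -o pipefail   # a pipeline fails if any command in it fails

# some_command is a placeholder; here it always fails with exit code 3.
some_command () { return 3 ; }

errcode=0
some_command || errcode=$?     # the 'or' list as a whole succeeds, so set -e does not abort
if [ $errcode -ne 0 ] ; then
  echo "some_command failed with exit code $errcode"
fi
```

Without the || part, the script would stop at the call to some_command and the message would never be printed.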
1.10 Expansion
The shell performs various kinds of expansion on a command line, that is, replacing part of the
command line with different text.
Brace expansion:
[] echo a{b,cc,ddd}e
abe acce addde
This can for instance be used to delete all files with the same base name and different extensions:
[] rm tmp.{c,s,o} # delete tmp.c tmp.s tmp.o
There are many variations on parameter expansion. Above you already saw that you can strip trailing
characters:
[] a=b.c
[] echo ${a%.c}
b
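A few more variations on parameter expansion are sketched here. The file name and the variable names are made up for the example; the operators themselves are standard shell syntax.

```shell
#!/bin/bash
# The file name is made up for the example.
f=src/prog.tar.gz

no_gz=${f%.gz}      # % : remove the shortest matching suffix
base=${f%%.*}       # %%: remove the longest matching suffix
no_dir=${f#*/}      # # : remove the shortest matching prefix
ext=${f##*.}        # ##: remove the longest matching prefix
fallback=${undefined_name:-none}   # use a default if the variable is unset or empty

echo "$no_gz"
echo "$base"
echo "$no_dir"
echo "$ext"
echo "$fallback"
```

The percent sign works on the end of the value and the hash sign on the beginning; doubling the character makes the match as long as possible.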
The backquote mechanism (section 1.5.3.3 above) is known as command substitution. It allows you to
evaluate part of a command and use it as input for another. For example, if you want to ask what type of
file the command ls is, do
[] file `which ls`
This first evaluates which ls, giving /bin/ls, and then evaluates file /bin/ls. As another example,
here we backquote a whole pipeline, and do a test on the result:
[] echo 123 > w
[] cat w
123
[] wc -c w
4 w
[] if [ `cat w | wc -c` -eq 4 ] ; then echo four ; fi
four
1.11. Startup files
You would do this, for instance, if you have edited your startup file.
Unfortunately, there are several startup files, and which one gets read is a complicated function of
circumstances. Here is a good common sense guideline:
• Have a .profile that does nothing but read the .bashrc:
# ~/.profile
if [ -f ~/.bashrc ]; then
source ~/.bashrc
fi
• You can put more than one command on a line, separated by semicolons: mkdir foo; cd foo.
The shell will execute these commands in sequence.
• Your input line is not a full command, for instance while [ 1 ]. The shell will recognize that
there is more to come, and use a different prompt to show you that it is waiting for the remainder
of the command.
• Your input line would be a legitimate command, but you want to type more on a second line. In
that case you can end your input line with a backslash character, and the shell will recognize that
it needs to hold off on executing your command. In effect, the backslash will hide (escape) the
return.
When the shell has collected a command line to execute, by using one or more of your input lines or only
part of one, as described just now, it will apply expansion to the command line (section 1.10). It will then
interpret the command line as a command and arguments, and proceed to invoke that command with the
arguments as found.
There are some subtleties here. If you type ls *.c, then the shell will recognize the wildcard character
and expand it to a command line, for instance ls foo.c bar.c. Then it will invoke the ls command
with the argument list foo.c bar.c. Note that ls does not receive *.c as argument! In cases where you
do want the unix command to receive an argument with a wildcard, you need to escape it so that the shell
will not expand it. For instance, find . -name \*.c will make the shell invoke find with arguments .
-name *.c.
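To see what a command actually receives, you can replace it by something that just prints its arguments. In the following sketch, show_args is a hypothetical stand-in for any command, and the two file names are created in a scratch directory so that the outcome does not depend on your own files.

```shell
#!/bin/bash
# show_args is a hypothetical stand-in for any command: it prints each argument it receives.
show_args () { for arg in "$@" ; do echo "arg: $arg" ; done ; }

workdir=$(mktemp -d)      # scratch directory, so that the example is reproducible
cd "$workdir"
touch foo.c bar.c

show_args *.c             # the shell expands the wildcard: two arguments
show_args \*.c            # escaped: the command receives the literal string *.c
show_args '*.c'           # quoting has the same effect as the backslash

cd - > /dev/null
rm -rf "$workdir"
```

The first call reports the two file names; the other two calls report a single argument *.c, which is what a command such as find needs to see.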
This still doesn’t tell you what Linux distribution you are on. For that, some of the following may work:
$ lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64
:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing
-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.9.2009 (Core)
Release: 7.9.2009
Codename: Core
1.13. The system and other users
or
$ ls /etc/*release
/etc/centos-release /etc/os-release@ /etc/redhat-release@ /etc/system-release@
$ cat /etc/*release
CentOS Linux release 7.9.2009 (Core)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
1.13.2 Users
Unix is a multi-user operating system. Thus, even if you use it on your own personal machine, you are a
user with an account and you may occasionally have to type in your username and password.
If you are on your personal machine, you may be the only user logged in. On university machines or other
servers, there will often be other users. Here are some commands relating to them.
whoami show your login name.
who show the other users currently logged in.
finger otheruser get information about another user; you can specify a user’s login name here, or
their real name, or other identifying information the system knows about.
While you can change the group of a file, at least between groups that you belong to, changing the owning
user of a file with chown needs root privileges. See section 1.13.4.
where the yourname can be omitted if you have the same name on both machines.
To only copy a file from one machine to another you can use the ‘secure copy’ scp, a secure variant of
‘remote copy’ rcp. The scp command is much like cp in syntax, except that the source or destination can
have a machine prefix.
To copy a file from the current machine to another, type:
1.15. The sed and awk tools
where yourname can again be omitted, and otherdirectory can be an absolute path, or a path relative
to your home directory:
# absolute path:
scp localfile yourname@othercomputer:/share/
# path relative to your home directory:
scp localfile yourname@othercomputer:mysubdirectory
Leaving the destination path empty puts the file in the remote home directory:
scp localfile yourname@othercomputer:
Note the colon at the end of this command: if you leave it out you get a local file with an ‘at’ in the name.
You can also copy a file from the remote machine. For instance, to copy a file, preserving the name:
scp yourname@othercomputer:otherdirectory/otherfile .
will apply the substitute command s/foo/bar/ to every line of myfile. The output is shown on your
screen so you should capture it in a new file; see section 1.6 for more on output redirection.
• If you have more than one edit, you can specify them with
sed -e 's/one/two/' -e 's/three/four/'
• If an edit needs to be done only on certain lines, you can specify that by prefixing the edit with
the match string. For instance
sed '/^a/s/b/c/'
only applies the edit on lines that start with an a. (See section 1.3 for regular expressions.)
You can also apply it on a numbered line:
sed '25s/foo/bar/'
• The a and i commands are for ‘append’ and ‘insert’ respectively. They are somewhat strange
in how they take their argument text: the command letter is followed by a backslash, with the
insert/append text on the next line(s), delimited by the closing quote of the command.
sed -e '/here/a\
appended text
' -e '/there/i\
inserted text
' -i file
• Traditionally, sed could only function in a stream, so the output file always had to be different
from the input. The GNU version, which is standard on Linux systems, has a flag -i which edits
‘in place’:
sed -e 's/ab/cd/' -e 's/ef/gh/' -i thefile
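The addressing forms above can be tried out without touching any file by feeding sed a few lines of text. The three input lines and the variable names in this sketch are made up for the example.

```shell
#!/bin/bash
# Sample input, made up for the example.
input=$(printf 'apple bob\nbanana bob\navocado bob\n')

# Two edits in one invocation.
two_edits=$(echo "$input" | sed -e 's/apple/APPLE/' -e 's/banana/BANANA/')

# Edit only the lines that start with an a.
a_lines=$(echo "$input" | sed '/^a/s/bob/carol/')

# Edit only line number 2.
line_two=$(echo "$input" | sed '2s/bob/carol/')

echo "$two_edits" ; echo "--"
echo "$a_lines"   ; echo "--"
echo "$line_two"
```

In the second case only the apple and avocado lines change; in the third case only the banana line changes.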
1.15.2 awk
The awk utility also operates on each line, but it can be described as having a memory. An awk program
consists of a sequence of pairs, where each pair consists of a match string and an action. The simplest
awk program is
cat somefile | awk '{ print }'
where the match string is omitted, meaning that all lines match, and the action is to print the line. Awk
breaks each line into fields separated by whitespace. A common application of awk is to print a certain
field:
awk '{print $2}' file
Exercise 1.39. Build a command pipeline that prints only the subroutine name of each
subroutine header. For this you first use sed to replace the parentheses by spaces, then awk
to print the subroutine name field.
Awk has variables with which it can remember things. For instance, instead of just printing the second
field of every line, you can make a list of them and print that later:
cat myfile | awk 'BEGIN {v="Fields:"} {v=v " " $2} END {print v}'
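You can try both the field printing and the accumulation on a few lines of made-up input; the input text and the shell variable names in this sketch are assumptions for the illustration.

```shell
#!/bin/bash
# Sample input, made up for the example: three lines of two fields each.
sample=$(printf 'a one\nb two\nc three\n')

# Print the second field of every line.
second_fields=$(echo "$sample" | awk '{print $2}')

# Collect the second fields in an awk variable and print them once, at the end.
collected=$(echo "$sample" | awk 'BEGIN {v="Fields:"} {v=v " " $2} END {print v}')

echo "$second_fields"
echo "$collected"
```

The first command prints three lines, one field each; the second prints the single line Fields: one two three.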
As another example of the use of variables, here is how you would print all lines in between a BEGIN and
END line:
cat myfile | awk '/END/ {p=0} p==1 {print} /BEGIN/ {p=1} '
Exercise 1.40. The placement of the match with BEGIN and END may seem strange. Rearrange
the awk program, test it out, and explain the results you get.
1.16. Review questions
For simplicity, we simulate this by making a directory submissions and two different
files student1.txt and student2.txt. After
submit_homework student1.txt
submit_homework student2.txt
there should be copies of both files in the submissions directory. Start by writing a
simple script; it should give a helpful message if you use it the wrong way.
Try to detect if a student is cheating. Explore the diff command to see if the submitted
file is identical to something already submitted: loop over all submitted files and
1. First print out all differences.
2. Count the differences.
3. Test if this count is zero.
Now refine your test by catching if the cheating student randomly inserted some spaces.
For a harder test: try to detect whether the cheating student inserted newlines. This cannot
be done with diff, but you could try tr to remove the newlines.
Chapter 2
This command can also tell you about binary files. Here the output differs by operating system.
$$ which ls
/bin/ls
# on a Mac laptop:
$$ file /bin/ls
/bin/ls: Mach-O 64-bit x86_64 executable
# on a Linux box
$$ file /bin/ls
/bin/ls: ELF 64-bit LSB executable, x86-64
2.1. File types in programming
Exercise 2.1. Apply the file command to sources for different programming languages. Can
you find out how file figures things out?
In figure 2.1 you find a brief summary of file types. We will now discuss them in more detail.
Text files
  Source       Program text that you write
  Header       also written by you, but not really program text.
Binary files
  Object file  The compiled result of a single source file
  Library      Multiple object files bundled together
  Executable   Binary file that can be invoked as a command
  Data files   Written and read by a program
2. Compilers and libraries
// binary_write.c
FILE *binfile;
binfile = fopen("binarydata.out","wb");
for (int i=0; i<10; i++)
  fwrite(&i,sizeof(int),1,binfile);
fclose(binfile);

// binary_read.c
binfile = fopen("binarydata.out","rb");
for (int i=0; i<10; i++) {
  int ival;
  fread(&ival,sizeof(int),1,binfile);
  printf("%d ",ival);
}
printf("\n");
Fortran works differently: each record, that is, the output of each Write statement, has the record
length (in bytes) before and after it.
2.2. Simple compilation
// binary_write.F90
Open(Unit=13,File="binarydata.out",Form="unformatted")
do i=0,9
  write(13) i
end do
Close(Unit=13)

0000040 05 00 00 00 04 00 00 00 04 00 00 00 06 00 00 00
0000050 04 00 00 00 04 00 00 00 07 00 00 00 04 00 00 00
0000060 04 00 00 00 08 00 00 00 04 00 00 00 04 00 00 00
0000070 09 00 00 00 04 00 00 00
In this tutorial you will mostly be concerned with executable binary files. We then distinguish between:
• program files, which are executable by themselves;
• object files, which are like bits of programs; and
• library files, which combine object files, but are not executable.
Object files come from the fact that your source is often spread over multiple source files, and these can
be compiled separately. In this way, an object file is a piece of an executable: by itself it does nothing, but
it can be combined with other object files to form an executable.
If you have a collection of object files that you need for more than one program, it is usually a good idea to
make a library: a bundle of object files that can be used to form an executable. Often, libraries are written
by an expert and contain code for specialized purposes such as linear algebra manipulations. Libraries
are important enough that they can be commercial, to be bought if you need expert code for a certain
purpose.
You will now learn how these types of files are created and used.
2.2.1 Compilers
Your main tool for turning source into a program is the compiler. Compilers are specific to a language:
you use a different compiler for C than for Fortran. You can also have two compilers for the same
language, but from different ‘vendors’. For instance, while many people use the open source gcc or clang
compiler families, companies like Intel and IBM offer compilers that may give more efficient code on their
processors.
#include <stdlib.h>
#include <stdio.h>
int main() {
printf("hello world\n");
return 0;
}
Compile this program with your favorite compiler; we will use gcc in this tutorial, but substitute your
own as desired.
TACC note. On TACC clusters, the Intel compiler icc is preferred.
As a result of the compilation, a file a.out is created, which is the executable.
%% gcc hello.c
%% ./a.out
hello world
You can get a more sensible program name with the -o option:
%% gcc -o helloprog hello.c
%% ./helloprog
hello world
3. The main tangible result of the compilation is hello.o, the object file, containing actual machine
language. We will go into this more below. The object file is not directly readable, but later you’ll
see the nm tool that can give you some information.
4. Finally, the linker hooks together the object file and system libraries into a self-contained
executable or a library file.
As an illustration of linking, let’s consider a program where the source is contained in more than one file.
However, you can also do it in steps, compiling each file separately and then linking them together. This
is illustrated in figure 2.3.
Output
[code/compile] makeseparatecompile:
clang -g -O2 -o oneprogram fooprog.o foosub.o
./oneprogram
hello world
The -c option tells the compiler to compile the source file, giving an object file. The third command then
acts as the linker, tying together the object files into an executable.
Exercise 2.3.
Exercise for separate compilation. Structure:
• Compile in one:
icc -o program fooprog.c foosub.c
• Compile in steps:
icc -c fooprog.c
icc -c foosub.c
2.2.5 Paths
If your program uses libraries, maybe even of your own making, you need to tell
1. the compiler where to find the header files, and
2. the linker where to find the library file.
(C++ knows ‘header only’ libraries, so the second step is not always needed.) These locations are indicated
by commandline options:
g++ -c mysource.cpp -I${SOMELIB_INC_DIR}
g++ -o myprogram mysource.o -L${SOMELIB_LIB_DIR} -lsomelib
(Instead of listing these on the commandline every time, you would of course put them in a makefile, or
use CMake to generate such commandlines.) The compile line has the -I option for ‘include’, that specifies
the location of the library header file. You can specify multiple include options.
The link line has the -L option for ‘library’, that specifies the location of the actual library files, and the
-l option for the library name. You can specify multiple library directories. The -l option is interpreted
as follows: -lsomelib makes the linker search for files
libsomelib.a
libsomelib.so
libsomelib.dylib
Lines with T indicate routines that are defined; lines with U indicate routines that are used but not defined
in this file. In this case, printf is a system routine that will be supplied in the linker stage.
gives:
[] nm foosub.o
0000000000000000 T __Z3barNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
U __ZNSt8ios_base4InitC1Ev
U __ZNSt8ios_base4InitD1Ev
U __ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l
U __ZSt4cout
You can more or less recognize the bar function in this. Add the option -C to nm to get demangled names:
[] nm -C foosub.o
0000000000000000 T bar(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
U std::ios_base::Init::Init()
U std::ios_base::Init::~Init()
U std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
U std::cout
You may also want to demangle names in linker errors, when your object files leave some references
undefined. For this, add an option -Wl,-demangle to the link line:
Compile: g++ -g -O2 -std=c++17 -c undefined.cxx
Link line: g++ -Wl,-demangle -o undefined undefined.o
Undefined symbols for architecture arm64:
"f(int, int)", referenced from:
_main in undefined.o
ld: symbol(s) not found for architecture arm64
collect2: error: ld returned 1 exit status
make[2]: *** [undefined] Error 1
#include <cxxabi.h>
char* abi::__cxa_demangle
( const char* mangled, char* output, size_t* length, int* status );
Remark 9 There are other tools with similar functionality to nm, such as otool, objdump, readelf.
Remark 10 Sometimes you will come across a stripped binary file, and nm will report No symbols. In that
case nm -D may help, which displays ‘dynamic symbols’.
Exercise 2.4. From level zero to one we get (in the above example; in general this depends on
the compiler) an improvement of 2× to 3×. Can you find an obvious factor of two?
Use the optimization report facility of your compiler to see what other optimizations are
applied. One of them is a good lesson in benchmark design!
Many compilers can generate a report of what optimizations they perform.
Generally, optimizations leave the semantics of your code intact. (Makes kinda sense, not?) However, at
higher levels, usually level 3, the compiler is at liberty to make transformations that are not legal according
to the language standard, but that in the majority of cases will still give the right outcome. For instance,
the C language specifies that arithmetic operations are evaluated left-to-right. Rearranging arithmetic
expressions is usually safe, but not always. Be careful when applying higher optimization levels!
See: https://stackoverflow.com/a/65017597/2044454.
2.3 Libraries
Purpose. In this section you will learn about creating libraries.
If you have written some subprograms, and you want to share them with other people (perhaps by selling
them), then handing over individual object files is inconvenient. Instead, the solution is to combine them
into a library. This section shows you the basic Unix mechanisms. You would typically use these in a
Makefile; if you use CMake it’s all done for you; see chapter 4.
First we look at static libraries, for which the archive utility ar is used. A static library is linked into your
executable, becoming part of it. This has the advantage that you can give such an executable to someone
else, and it will immediately work. On the other hand, this may lead to large executables; you will learn
about shared libraries next, which do not suffer from this problem.
Create a directory to contain your library, and create the library file there. The library can be linked into
your executable by explicitly giving its name, or by specifying a library path:
Output
[code/compilecxx] staticprogram:
---- running:
hello world
---- done
The nm command tells you what’s in the library, just like it did with object files, but now it also tells you
what object files are in the library:
We show this for C:
Output
[code/compilec] staticlib:
==== Making static library ====
for o in foosub.o ; do ar cr libs/libfoo.a ${o} ; done
nm libs/libfoo.a
foosub.o:
0000000000000000 T bar
0000000000000000 N .debug_info_seg
U printf
For C++ we show the output in figure 2.5, where we note the -C flag for name demangling.
../lib/libfoo.so(single module):
00000fc4 t __dyld_func_lookup
00000000 t __mh_dylib_header
00000fd2 T _bar
U _printf
00001000 d dyld__mach_header
00000fb0 t dyld_stub_binding_helper
Remark 12 On Apple OS Ventura the use of LD_LIBRARY_PATH is no longer supported for security reasons,
so using the rpath is the only option.
Use the command ldd to get information about what shared libraries your executable uses. (On Mac OS X,
use otool -L instead.)
Sometimes functionality is requested from a later version than is present in your system.
Output
[code/compilecxx] staticlib:
==== Making static library ====
for o in foosub.o ; do ar cr libs/libfoo.a ${o} ; done
nm -C libs/libfoo.a
foosub.o:
U __cxa_atexit
0000000000000000 N .debug_info_seg
U __dso_handle
U __gxx_personality_v0
0000000000000010 t __sti__$E
0000000000000000 T bar(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
0000000000000000 b _INTERNALaee936d8::std::__ioinit
0000000000000000 W std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::data() const
0000000000000000 W std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::size() const
U std::ios_base::Init::Init()
U std::ios_base::Init::~Init()
U std::cout
U std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char, std::char_traits<char>, std::allocator<char> >(std::basic_ostream<char, std::char_traits<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
Output
[code/compile] dynamicprogram:
.. running by itself:
Chapter 3
The Make utility helps you manage the building of projects: its main task is to facilitate rebuilding only
those parts of a multi-file project that need to be recompiled or rebuilt. This can save lots of time, since it
can replace a minutes-long full installation by a single file compilation.
Make is a Unix utility with a long history, and traditionally there are variants with slightly different
behavior, for instance on the various flavors of Unix such as HP-UX, AUX, IRIX. These days, it is advisable,
no matter the platform, to use the GNU version of Make which has some very powerful extensions; it is
available on all Unix platforms (on Linux it is the only available variant), and it is a de facto standard. The
manual is available at http://www.gnu.org/software/make/manual/make.html, or you can read the
book [14].
The examples in this tutorial will be for the C and Fortran languages, but Make can work with any
language, and in fact with things like TeX that are not really a language at all; see section 3.7.
3.1.1 C++
Make the following files:
foo.cxx
#include <iostream>
using std::cout;
#include "bar.h"
int main()
{
int a=2;
3. Managing projects with Make
bar.cxx
#include "bar.h"
int bar(int a)
{
int b=10;
return b*a;
}
bar.h
int bar(int);
and a makefile:
Makefile
fooprog : foo.o bar.o
	icpc -o fooprog foo.o bar.o
foo.o : foo.cxx
	icpc -c foo.cxx
bar.o : bar.cxx
	icpc -c bar.cxx
clean :
	rm -f *.o fooprog
3.1. A simple example
Expected outcome. The above rules are applied: make without arguments tries to build the first
target, fooprog. In order to build this, it needs the prerequisites foo.o and bar.o, which do not
exist. However, there are rules for making them, which make recursively invokes. Hence you see
two compilations, for foo.o and bar.o, and a link command for fooprog.
Caveats. Typos in the makefile or in file names can cause various errors. In particular, make sure
you use tabs and not spaces for the rule lines. Unfortunately, debugging a makefile is not simple.
Make’s error message will usually give you the line number in the make file where the error was
detected.
Exercise. Do make clean, followed by mv foo.cxx boo.cxx and make again. Explain the error message.
Restore the original file name.
Expected outcome. Make will complain that there is no rule to make foo.cxx. This error was
caused when foo.cxx was a prerequisite for making foo.o, and was found not to exist. Make
then went looking for a rule to make it and no rule for creating .cxx files exists.
Now add a second argument to the function bar. This would require you to edit all of bar.cxx, bar.h,
and foo.cxx, but let’s say we forget to edit the last two, so only edit bar.cxx. We will see how Make
can help you find the resulting error.
Exercise. Update the header file, and call make again. What happens, and what had you been hoping
would happen?
Expected outcome. Only the linker stage is done, and it gives the same error about an unresolved
reference. Were you hoping that the main program would be recompiled?
Caveats.
The way out of this problem is to tie the header file to the source files in the makefile.
In the makefile, change the line
foo.o : foo.cxx
to
foo.o : foo.cxx bar.h
which adds bar.h as a prerequisite for foo.o. This means that, in this case where foo.o already exists,
Make will check that foo.o is not older than any of its prerequisites. Since bar.h has been edited, it is
younger than foo.o, so foo.o needs to be reconstructed.
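The timestamp comparison that Make performs can be imitated in the shell with the -nt (‘newer than’) test. The sketch below is only an analogy, not how Make is implemented: needs_rebuild is a hypothetical helper, and the files are empty stand-ins created in a scratch directory with made-up dates.

```shell
#!/bin/bash
# needs_rebuild is a hypothetical helper mimicking Make's timestamp test:
# it succeeds if the target is missing or if any prerequisite is newer than the target.
needs_rebuild () {
  local target=$1 ; shift
  if [ ! -e "$target" ] ; then return 0 ; fi
  local prereq
  for prereq in "$@" ; do
    if [ "$prereq" -nt "$target" ] ; then return 0 ; fi
  done
  return 1
}

workdir=$(mktemp -d)
touch -t 202001010000 "$workdir/foo.cxx" "$workdir/bar.h"   # old sources
touch -t 202001020000 "$workdir/foo.o"                      # object file built later
needs_rebuild "$workdir/foo.o" "$workdir/foo.cxx" "$workdir/bar.h" && echo "rebuild" || echo "up to date"
touch -t 202001030000 "$workdir/bar.h"                      # header edited after the build
needs_rebuild "$workdir/foo.o" "$workdir/foo.cxx" "$workdir/bar.h" && echo "rebuild" || echo "up to date"
rm -rf "$workdir"
```

The first check reports that foo.o is up to date; after the header gets a later timestamp, the second check reports that a rebuild is needed, but only because bar.h was passed as a prerequisite.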
Remark 13 As already noted above, in C++ fewer errors are caught by this mechanism than in C, because
of polymorphism. You might wonder if it would be possible to generate header files automatically. This is of
course possible with suitable shell scripts, but tools such as Make (or CMake) do not have this built in.
3.1.2 C
Make the following files:
foo.c
#include "bar.h"
int c=3;
int d=4;
int main()
{
int a=2;
return(bar(a*c*d));
}
bar.c
#include "bar.h"
int bar(int a)
{
int b=10;
return(b*a);
}
bar.h
extern int bar(int);
and a makefile:
Makefile
fooprog : foo.o bar.o
	cc -o fooprog foo.o bar.o
foo.o : foo.c
	cc -c foo.c
bar.o : bar.c
	cc -c bar.c
clean :
	rm -f *.o fooprog
Expected outcome. The above rules are applied: make without arguments tries to build the first
target, fooprog. In order to build this, it needs the prerequisites foo.o and bar.o, which do not
exist. However, there are rules for making them, which make recursively invokes. Hence you see
two compilations, for foo.o and bar.o, and a link command for fooprog.
Caveats. Typos in the makefile or in file names can cause various errors. In particular, make sure
you use tabs and not spaces for the rule lines. Unfortunately, debugging a makefile is not simple.
Make’s error message will usually give you the line number in the make file where the error was
detected.
Exercise. Do make clean, followed by mv foo.c boo.c and make again. Explain the error message.
Restore the original file name.
Expected outcome. Make will complain that there is no rule to make foo.c. This error was caused
when foo.c was a prerequisite for making foo.o, and was found not to exist. Make then went
looking for a rule to make it and no rule for creating .c files exists.
Now add a second argument to the function bar. This requires you to edit bar.c and bar.h: go ahead
and make these edits. However, it also requires you to edit foo.c, but let us for now ‘forget’ to do that.
We will see how Make can help you find the resulting error.
Expected outcome. Even though conceptually foo.c would need to be recompiled since it uses
the bar function, Make did not do so because the makefile had no rule that forced it.
to
foo.o : foo.c bar.h
which adds bar.h as a prerequisite for foo.o. This means that, in this case where foo.o already exists,
Make will check that foo.o is not older than any of its prerequisites. Since bar.h has been edited, it is
younger than foo.o, so foo.o needs to be reconstructed.
Exercise. Confirm that the new makefile indeed causes foo.o to be recompiled if bar.h is changed. This
compilation will now give an error, since you ‘forgot’ to edit the use of the bar function.
3.1.3 Fortran
Make the following files:
foomain.F
      program foomain
      use foomod    ! module name assumed
      call func(1,2)
      end program
foomod.F
      module foomod
      contains
      subroutine func(a,b)
      integer a,b
      print *,a,b,c
      end subroutine func
      end module
and a makefile:
Makefile
fooprog : foomain.o foomod.o
	gfortran -o fooprog foomain.o foomod.o
foomain.o : foomain.F
	gfortran -c foomain.F
foomod.o : foomod.F
	gfortran -c foomod.F
clean :
	rm -f *.o fooprog
If you call make, the first rule in the makefile is executed. Do this, and explain what happens.
Exercise. Do make clean, followed by mv foomod.F boomod.F and make again. Explain the error message.
Restore the original file name.
Expected outcome. Make will complain that there is no rule to make foomod.F. This error was
caused when foomod.F was a prerequisite for foomod.o, and was found not to exist. Make then
went looking for a rule to make it, and no rule for making .F files exists.
3.2. Some general remarks
Expected outcome. Even though conceptually foomain.F would need to be recompiled, Make did
not do so because the makefile had no rule that forced it.
In the makefile, change the rule for foomain.o to
foomain.o : foomain.F foomod.o
which adds foomod.o as a prerequisite for foomain.o. This means that, in this case where foomain.o
already exists, Make will check that foomain.o is not older than any of its prerequisites. Recursively,
Make will then check if foomod.o needs to be updated, which is indeed the case. After recompiling
foomod.F, foomod.o is younger than foomain.o, so foomain.o will be reconstructed.
Exercise. Confirm that the corrected makefile indeed causes foomain.F to be recompiled.
Exercise. Edit your makefile as indicated. First do make clean, then make foo (C) or make fooprog
(Fortran).
Expected outcome. You should see the exact same compile and link lines as before.
Caveats. Unlike in the shell, where braces are optional, variable names in a makefile have to be in
braces or parentheses. Experiment with what happens if you forget the braces around a variable
name.
One advantage of using variables is that you can now change the compiler from the commandline:
make CC="icc -O2"
make FC="gfortran -g"
Exercise. Invoke Make as suggested (after make clean). Do you see the difference in your screen output?
Expected outcome. The compile lines now show the added compiler option -O2 or -g.
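The override mechanism can be tried in isolation. The following shell sketch is only an illustration: it writes a hypothetical two-line makefile with a made-up target show into a scratch directory, and it assumes that GNU Make is installed. The recipe merely echoes the value of CC, so no compiler is needed.

```shell
# Work in a scratch directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical makefile; printf turns \t into the tab that Make requires.
printf 'CC = gcc\nshow :\n\t@echo compiler is ${CC}\n' > Makefile

default_line=$(make show)                  # value from the makefile
override_line=$(make show CC="icc -O2")    # value from the commandline

echo "$default_line"
echo "$override_line"
```

A definition on the commandline takes precedence over the assignment inside the makefile, which is why the second call reports icc -O2.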
3.3. Variables and template rules
and use this variable instead of the program name in your makefile. This makes it easier to change your
mind about the name of the executable later.
Exercise. Edit your makefile to add this variable definition, and use it instead of the literal program name.
Construct a commandline so that your makefile will build the executable fooprog_v2.
Expected outcome. You need to specify the THEPROGRAM variable on the commandline using the
syntax make VAR=value.
Caveats. Make sure that there are no spaces around the equals sign in your commandline.
where the object file depends on the source file and another file.
We can take the commonalities and summarize them in one template rule (see footnote 1):
%.o : %.c
	${CC} -c $<
%.o : %.F
	${FC} -c $<
This states that any object file depends on the C or Fortran file with the same base name. To regenerate
the object file, invoke the C or Fortran compiler with the -c flag. These template rules can function as a
replacement for the multiple specific targets in the makefiles above, except for the rule for foo.o.
The dependence of foo.o on bar.h, or foomain.o on foomod.o, can be handled by adding a rule
# C
foo.o : bar.h
# Fortran
foomain.o : foomod.o
1. This mechanism is the first instance you’ll see that only exists in GNU make, though in this particular case there is a similar
mechanism in standard make. That will not be the case for the wildcard mechanism in the next section.
with no further instructions. This rule states: ‘if bar.h (respectively foomod.o) has changed, then foo.o
(respectively foomain.o) needs updating too’. Make will then search the makefile for a different rule that
states how this updating is done, and it will find the template rule.
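A dry run shows the template rule and the extra prerequisite working together. This is a sketch under assumptions: GNU Make is installed, the files are empty stand-ins created in a scratch directory, and the options -n (print commands without running them) and -W (pretend a file was just modified) are used so that no compiler is needed.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Makefile with one template rule and one rule that only adds a prerequisite.
# In the printf format, %% gives a literal % and \t gives the required tab.
printf 'CC = cc\n%%.o : %%.c\n\t${CC} -c $<\nfoo.o : bar.h\n' > Makefile

# Empty stand-in files; foo.o is created last, so it counts as up to date.
touch foo.c bar.h
touch foo.o

uptodate=$(make -n foo.o)            # nothing needs to be done
changed=$(make -n -W bar.h foo.o)    # pretend that bar.h was edited

echo "$uptodate"
echo "$changed"
```

The second call prints a compile line for foo.c: the rule without commands supplies the reason to rebuild, and the template rule supplies the command.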
Figure 3.1: File structure with main program and two library files.
Changing a source file only recompiles that file:
clang++ -c libf.cxx
clang++ -o main libmain.o libf.o libg.o
Changing the implementation header only recompiles the library:
clang++ -c libf.cxx
clang++ -c libg.cxx
clang++ -o main libmain.o libf.o libg.o
Changing the libapi.h recompiles everything:
clang++ -c libmain.cxx
clang++ -c libf.cxx
clang++ -c libg.cxx
clang++ -o main libmain.o libf.o libg.o
For Fortran we don’t have header files so we use modules everywhere; figure 3.3. If you know how to use
submodules, a Fortran2008 feature, you can make the next exercise as efficient as the C version.
Exercise 3.2. Write a makefile for the following structure:
• There is one main file libmain.f90, that uses a module api.f90;
• There are two low-level modules libf.f90 and libg.f90 that are used in api.f90.
If you use modules, you’ll likely be doing more compilation than needed. For the optimal
solution, use submodules.
Figure 3.3: File structure with main program and two library files.
3.3.3 Wildcards
Your makefile now uses one general rule for compiling any source file. Often, your source files will be
all the .c or .F files in your directory, so is there a way to state ‘compile everything in this directory’?
Indeed there is.
Add the following lines to your makefile, and use the variable COBJECTS or FOBJECTS wherever appropriate.
The command wildcard gives the result of ls, and you can manipulate file names with patsubst.
# wildcard: find all files that match a pattern
CSOURCES := ${wildcard *.c}
# pattern substitution: replace one pattern string by another
COBJECTS := ${patsubst %.c,%.o,${CSOURCES}}
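For intuition, here is a plain shell analogue of what the two functions compute. It is a sketch, not what Make runs internally; the scratch directory and the file names are made up for the illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch foo.c bar.c

# Like ${wildcard *.c}: the shell expands the pattern to the matching names.
csources=$(echo *.c)

# Like ${patsubst %.c,%.o,...}: replace the .c suffix of every name by .o.
cobjects=""
for src in $csources ; do
    cobjects="$cobjects ${src%.c}.o"
done
cobjects=${cobjects# }    # drop the leading space

echo "$csources"
echo "$cobjects"
```

The object names are derived purely from the source names; none of the .o files has to exist yet.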
3.3.5 Conditionals
There are various ways of making the behavior of a makefile dynamic. You can for instance put a shell
conditional in an action line. However, this can make for a cluttered makefile; an easier way is to use
makefile conditionals. There are two types of conditionals: tests on string equality, and tests on
environment variables.
The first type looks like
ifeq "${HOME}" "/home/thisisme"
# case where the executing user is me
else ifeq "${HOME}" "/home/buddyofmine"
# case for other user
else
# case where it's someone else
endif
The text in the true and false part can be almost any part of a makefile. For instance, it is possible to let one
of the action lines in a rule be conditionally included. However, most of the time you will use conditionals
to make the definition of variables dependent on some condition.
Exercise. Let’s say you want to use your makefile at home and at work. At work, your employer has a
paid license to the Intel compiler icc, but at home you use the open source GNU compiler gcc. Write a
makefile that will work in both places, setting the appropriate value for CC.
3.4 Miscellania
3.4.1 Phony targets
The example makefile contained a target clean. This uses the Make mechanisms to accomplish some
actions that are not related to file creation: calling make clean causes Make to reason ‘there is no file
called clean, so the following instructions need to be performed’. However, this does not actually cause
a file clean to spring into being, so calling make clean again will cause the same instructions to be
executed.
To indicate that this rule does not actually make the target, you use the .PHONY keyword:
.PHONY : clean
Most of the time, the makefile will actually work fine without this declaration, but the main benefit of
declaring a target to be phony is that the Make rule will still work, even if you have a file (or folder) named
clean.
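The effect of the declaration is easy to observe. The sketch below assumes GNU Make is installed; it uses a scratch directory, a made-up recipe that only echoes a word, and a deliberately created file named clean.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch clean    # a file that happens to have the same name as the target

# Without .PHONY: Make sees an existing file without prerequisites and does nothing.
printf 'clean :\n\t@echo cleaning\n' > Makefile
without_phony=$(make clean)

# With .PHONY: the recipe runs regardless of the file.
printf '.PHONY : clean\nclean :\n\t@echo cleaning\n' > Makefile
with_phony=$(make clean)

echo "without .PHONY: $without_phony"
echo "with .PHONY:    $with_phony"
```

The exact wording of the first message depends on the Make version; the point is only that the recipe is skipped.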
3.4.2 Directories
It’s a common strategy to have a directory for temporary material such as object files. So you would have
a rule
obj/%.o : %.c
	${CC} -c $< -o $@
This raises the question how the obj directory is created. You could do:
obj/%.o : %.c
	mkdir -p obj
	${CC} -c $< -o $@
obj :
	mkdir -p obj
obj/%.o : %.c | obj
	${CC} -c $< -o $@
This only tests for the existence of the object directory, but not its timestamp.
and likewise for make other. What goes wrong here is the use of $@.o as prerequisite. In GNU Make,
you can repair this as follows (see footnote 2):
.SECONDEXPANSION:
${PROGS} : $$@.o
	${CC} -o $@ $@.o ${list of libraries goes here}
Exercise. Write a second main program foosecond.c or foosecond.F, and change your makefile so
that the calls make foo and make foosecond both use the same rule.
2. Technical explanation: Make will now look at lines twice: the first time $$ gets converted to a single $, and in the second
pass $@ becomes the name of the target.
3.5. Shell scripting in a Makefile
In the makefiles you have seen so far, the command part was a single line. You can actually have as many
lines there as you want. For example, let us make a rule for making backups of the program you are
building.
Add a backup rule to your makefile. The first thing it needs to do is make a backup directory:
.PHONY : backup
backup :
	if [ ! -d backup ] ; then
	  mkdir backup
	fi
Did you type this? Unfortunately it does not work: every line in the command part of a makefile rule is
executed in a separate shell. Therefore, you need to write the whole command on one line:
backup :
	if [ ! -d backup ] ; then mkdir backup ; fi
(Writing a long command on a single line is only possible in the bash shell, not in the csh shell. This is one
reason for not using the latter.)
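The consequence of one shell per line can be imitated without Make. In this sketch each sh -c call stands in for one recipe line; it is an imitation of the behavior, under the assumption that Make hands every line to a fresh shell.

```shell
start_dir=$(pwd)

# Two "recipe lines", two shells: the cd in the first shell is forgotten.
sh -c 'cd /'
separate=$(sh -c 'pwd')

# One "recipe line", one shell: the cd is still in effect when pwd runs.
joined=$(sh -c 'cd / ; pwd')

echo "separate lines: $separate"
echo "joined line:    $joined"
```

The same reasoning explains why an if, its body, and the closing fi have to reach a single shell together.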
Next we do the actual copy:
backup :
	if [ ! -d backup ] ; then mkdir backup ; fi
	cp myprog backup/myprog
But this backup scheme only saves one version. Let us make a version that has the date in the name of
the saved program.
The Unix date command can customize its output by accepting a format string. Type the following:
date
This can be used in the makefile.
Exercise. Edit the cp command line so that the name of the backup file includes the current date.
Expected outcome. Hint: you need the backquote. Consult the Unix tutorial, section 1.5.3, if you
do not remember what backquotes do.
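As a small illustration of a format string combined with backquotes, the sketch below builds a dated name. The format +%Y-%m-%d and the variable name stamp are choices made for this example only; other formats work the same way.

```shell
# Backquotes replace the command by its output, here the date as year-month-day.
stamp=`date +%Y-%m-%d`

# The result can be pasted into a file name.
echo "myprog-$stamp"
```

Backquotes are convenient in a makefile because the alternative shell notation with a dollar sign and parentheses would be read by Make as a reference to one of its own variables.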
If you are defining shell variables in the command section of a makefile rule, you need to be aware of the
following. Extend your backup rule with a loop to copy the object files:
(This is not the best way to copy, but we use it for the purpose of demonstration.) This leads to an error
message, caused by the fact that Make expands $f as one of its own variables before the shell ever sees
the line. What works is:
backup :
	if [ ! -d backup ] ; then mkdir backup ; fi
	cp myprog backup/myprog
	for f in ${OBJS} ; do \
	  cp $$f backup ; \
	done
(In this case Make replaces the double dollar by a single one when it scans the commandline. During the
execution of the commandline, $f then expands to the proper filename.)
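The double dollar can be watched in action with a harmless loop. This sketch assumes GNU Make is installed and uses a hypothetical makefile with a made-up target list that echoes names instead of copying files.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical makefile: the shell loop variable f is written as $$f in the recipe.
printf 'OBJS = a.o b.o\nlist :\n\t@for f in ${OBJS} ; do echo copying $$f ; done\n' > Makefile

listing=$(make list)
echo "$listing"
```

Make substitutes ${OBJS} itself and reduces $$f to $f; only then does the shell run the loop and fill in f.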
and keep repeating this. There is a danger in this: if the make fails, for instance because of
compilation problems, your program will still be executed. Instead, write
make myprogram && ./myprogram -options
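The && operator only runs its right-hand side if the left-hand side succeeded. A minimal sketch, in which the made-up shell functions failing_build and working_build stand in for make myprogram:

```shell
# Stand-ins for a build: one fails, one succeeds (hypothetical names).
failing_build () { return 1 ; }
working_build () { return 0 ; }

ran_after_failure=no
failing_build && ran_after_failure=yes

ran_after_success=no
working_build && ran_after_success=yes

echo "after failing build: $ran_after_failure"
echo "after working build: $ran_after_success"
```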
3. There is a convention among software developers that a package can be installed by the sequence ./configure ; make ;
make install, meaning: Configure the build process for this computer, Do the actual build, Copy files to some system directory
such as /usr/bin.
3.7. A Makefile for LaTeX
info :
	@echo "The following are possible:"
	@echo " make"
	@echo " make clean"
Now make without explicit targets informs you of the capabilities of the makefile.
If your makefile gets longer, you might want to document each section like this. This runs into a problem:
you can not have two rules with the same target, info in this case. However, if you use a double colon it
is possible. Your makefile would have the following structure:
info ::
	@echo "The following targets are available:"
	@echo " make install"
install :
	# ..... instructions for installing
info ::
	@echo " make clean"
clean :
	# ..... instructions for cleaning
%.pdf : %.tex
	pdflatex $<
The command make myfile.pdf will invoke pdflatex myfile.tex, if needed, once. Next we repeat
invoking pdflatex until the log file no longer reports that further runs are needed:
%.pdf : %.tex
	pdflatex $<
	while [ `cat ${basename $@}.log | grep "Rerun to get" \
	    | wc -l` -gt 0 ] ; do \
	  pdflatex $< ; \
	done
We use the ${basename fn} macro to extract the base name without extension from the target name.
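The logic of the rerun loop can be tested without a LaTeX installation. In this sketch the function fake_pdflatex is a made-up stand-in that asks for a rerun on its first two passes only, and grep -q replaces the cat, grep, wc pipeline as an equivalent test.

```shell
workdir=$(mktemp -d)
log="$workdir/paper.log"
runs=0

# Hypothetical stand-in for pdflatex: writes a log that requests a rerun twice.
fake_pdflatex () {
    runs=$((runs + 1))
    if [ "$runs" -lt 3 ] ; then
        echo "LaTeX Warning: Label(s) may have changed. Rerun to get cross-references right." > "$log"
    else
        echo "Output written on paper.pdf" > "$log"
    fi
}

fake_pdflatex                             # the first, unconditional run
while grep -q "Rerun to get" "$log" ; do  # repeat while the log asks for it
    fake_pdflatex
done

echo "number of runs: $runs"
```

The loop stops as soon as a run leaves no rerun request in the log, so a document that is already stable is processed exactly once.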
In case the document has a bibliography or index, we run bibtex and makeindex.
%.pdf : %.tex
	pdflatex ${basename $@}
	-bibtex ${basename $@}
	-makeindex ${basename $@}
	while [ `cat ${basename $@}.log | grep "Rerun to get" \
	    | wc -l` -gt 0 ] ; do \
	  pdflatex ${basename $@} ; \
	done
The minus sign at the start of the line means that Make should not exit if these commands fail.
Finally, we would like to use Make’s facility for taking dependencies into account. We could write a
makefile that has the usual rules
mainfile.pdf : mainfile.tex includefile.tex
but we can also discover the include files explicitly. The following makefile is invoked with
make pdf TEXFILE=mainfile
The pdf rule then uses some shell scripting to discover the include files (but not recursively), and it calls
Make again, invoking another rule, and passing the dependencies explicitly.
pdf :
	export includes=`grep "^.input " ${TEXFILE}.tex \
	    | awk '{v=v FS $$2".tex"} END {print v}'` ; \
	${MAKE} ${TEXFILE}.pdf INCLUDES="$$includes"
This shell scripting can also be done outside the makefile, generating the makefile dynamically.
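The discovery step can be run on its own in the shell. The sketch below uses a made-up mainfile.tex with two input lines; note that outside a makefile the awk field is written $2, where the makefile needs $$2.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A small example document (hypothetical content); %s keeps the backslashes literal.
printf '%s\n' '\documentclass{article}' '\begin{document}' \
    '\input intro' '\input results' '\end{document}' > mainfile.tex

# Select the input lines, take the second word of each, and append .tex.
includes=`grep "^.input " mainfile.tex | awk '{v=v FS $2".tex"} END {print v}'`

echo "include files:$includes"
```

The awk program collects all names in one variable, so the result is a single space-separated list that can be passed on as INCLUDES.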
Chapter 4
Some people create the build directory in the source tree, in which case the CMake command is
cmake ..
Others put the build directory next to the source, in which case:
cmake ../src_directory
2. The build stage. Here the installation-specific compilation in the build directory is performed.
With Make as the ‘generator’ this would be
cd build
make
Alternatively, you could use generators such as Ninja, Visual Studio, or Xcode:
cmake -G Ninja
## the usual arguments
3. The install stage. This can move binary files to a permanent location, such as putting library files
in /usr/lib:
make install
or
cmake --install <build directory>
4. The Cmake build system
General directives
  cmake_minimum_required      specify minimum cmake version
  project                     name and version number of this project
  install                     specify directory where to install targets
Project building directives
  add_executable              specify executable name
  add_library                 specify library name
  add_subdirectory            specify subdirectory where cmake also needs to run
  target_sources              specify sources for a target
  target_link_libraries       specify executable and libraries to link into it
  target_include_directories  specify include directories, privately or publicly
  find_package                other package to use in this build
Utility stuff
  target_compile_options      literal options to include
  target_compile_features     things that will be translated by cmake into options
  target_compile_definitions  macro definitions to be set privately or publicly
  file                        define macro as file list
  message                     diagnostic to print, subject to level specification
Control
  if() else() endif()         conditional
4.1. CMake as build system
However, the install location already has to be set in the configuration stage. We will see later in
detail how this is done.
Summarizing, the out-of-source workflow as advocated in this tutorial is
ls some_package_1.0.0 # we are outside the source
ls some_package_1.0.0/CMakeLists.txt # source contains cmake file
mkdir builddir && cd builddir # goto build location
cmake -D CMAKE_INSTALL_PREFIX=../installdir \
../some_package_1.0.0
make
make install
[Figure: directory trees with src, build, and install directories.]
Usage requirements:
target_some_requirement( <target> INTERFACE <requirements> )
Combined:
target_some_requirement( <target> PUBLIC <requirements> )
4.1.2 Languages
CMake is largely aimed at C++, but it easily supports C as well. For Fortran support, first do
enable_language(Fortran)
Note the capitalization: this also holds for all variables such as CMAKE_Fortran_COMPILER.
CMake is driven by the CMakeLists.txt file. This needs to be in the root directory of your project. (You
can additionally have files by that name in subdirectories.)
Since CMake has changed quite a bit over the years, and is still evolving, it is a good idea to start each
script with a declaration of the (minimum) required version:
cmake_minimum_required( VERSION 3.12 )
You also need to declare a project name and version, which need not correspond to any file names:
project( myproject VERSION 1.0 )
4.2. Examples cases
Often, the name of the executable is the name of the project, so you’d specify:
add_executable( ${PROJECT_NAME} )
In order to move the executable to the install location, you need a clause
install( TARGETS myprogram DESTINATION . )
Without the DESTINATION clause, a default bin directory will be created; specifying DESTINATION foo will
put the target in a foo sub-directory of the installation directory.
[Figure: directory structure with CMakeLists.txt and program.cxx; a build directory containing Makefile, myprogram, and lots more; an install directory containing myprogram.]
In the figure we have also indicated the build directory, which from now on we will not show again. It
contains automatically generated files that are hard to decipher or debug. Yes, there is a Makefile, but
even for simple projects this is too complicated to debug by hand if your CMake installation misbehaves.
Here is the full CMakeLists.txt:
cmake_minimum_required( VERSION 3.12 )
project( singleprogram VERSION 1.0 )
add_executable( program )
target_sources( program PRIVATE program.cxx )
install( TARGETS program DESTINATION . )
You can also put the non-main source files in a separate directory:
add_executable( program )
target_sources( program PRIVATE program.cxx lib/aux.cxx lib/aux.h )
[Figure: directory structure with CMakeLists.txt, program.cxx, and a lib directory containing aux.cxx.]
However, often you will build libraries. We start by making a lib directory and indicating that header
files need to be found there:
target_include_directories( program PRIVATE lib )
Next, you need to link that library into the program:
target_link_libraries( program PRIVATE auxlib )
The PRIVATE clause means that the library is only for purposes of building the executable. (Use PUBLIC to
have the library be included in the installation; we will explore that in section 4.2.2.3.)
The full CMakeLists.txt:
cmake_minimum_required( VERSION 3.12 )
project( cmakeprogram VERSION 1.0 )
add_executable( program )
target_sources( program PRIVATE program.cxx )
Note that private shared libraries make no sense, as they will give runtime unresolved references.
On the other hand, if we edit a header file, the main program needs to be recompiled too:
----------------
touch a source file and make:
Consolidate compiler generated dependencies of target auxlib
[ 25%] Building CXX object CMakeFiles/auxlib.dir/aux.cxx.o
[ 50%] Linking CXX static library libauxlib.a
[ 50%] Built target auxlib
Consolidate compiler generated dependencies of target program
[ 75%] Linking CXX executable program
[100%] Built target program
or adding a runtime flag
cmake -D BUILD_SHARED_LIBS=TRUE
Note that we give a path to the library files. This is interpreted relative to the current directory, (as of
CMake-3.13); this current directory is available as CMAKE_CURRENT_SOURCE_DIR.
#include "aux.h"
int main() {
  aux1();
  aux2();
  return 0;
}
[Figure: directory structure with CMakeLists.txt, program.cxx, and a src directory.]
Usually, when you start making such a directory structure, you will also have sources in subdirectories. If
you only need to compile them into the main executable, you could list them in a variable
set( SOURCES program.cxx src/aux.cxx )
and use that variable. However, this is deprecated practice; it is recommended to use target_sources:
target_sources( program PRIVATE src/aux1.cxx src/aux2.cxx )
add_executable( program )
target_sources( program PRIVATE program.cxx )
## target_sources( program PRIVATE src/aux1.cxx src/aux2.cxx )
file( GLOB AUX_FILES "src/*.cxx" )
target_sources( program PRIVATE ${AUX_FILES} )
target_include_directories(
program PRIVATE
"${CMAKE_CURRENT_SOURCE_DIR}/inc" )
to build the library file from the sources indicated, and to install it in a lib subdirectory.
We also add a clause to install the header files in an include directory:
install( FILES aux.h DESTINATION include )
For installing multiple files, use
install(DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
    DESTINATION ${LIBRARY_OUTPUT_PATH}
    FILES_MATCHING PATTERN "*.h")
[Figure: install directory containing program, lib/libauxlib.so, and include/aux.h.]
One problem is to tell the executable where to find the library. For this we use the rpath mechanism. (See
section 2.3.3.) By default, CMake sets it so that the executable in the build location can find the library. If
you use a non-trivial install prefix, the following lines work:
set( CMAKE_INSTALL_RPATH "${CMAKE_INSTALL_PREFIX}/lib" )
set( CMAKE_INSTALL_RPATH_USE_LINK_PATH TRUE )
add_executable( program )
target_sources( program PRIVATE program.cxx )
add_subdirectory( lib )
target_include_directories(
auxlib PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}" )
target_link_libraries(
program PUBLIC auxlib )
4.3. Finding and using external packages
add_executable( program )
target_sources( program PRIVATE program.cxx )
target_include_directories(
program PUBLIC
${AUX_INCLUDE_DIR} )
target_link_libraries( program PUBLIC auxlib )
target_link_directories(
program PUBLIC
${AUX_LIBRARY_DIR} )
install( TARGETS program DESTINATION . )
Some libraries come with a FOOConfig.cmake file, which is searched for on the CMAKE_PREFIX_PATH by
find_package. You typically set this variable to the root of the package installation, and CMake will find
the directory of the .cmake file.
You can test the variables set:
find_library( FOOLIB foo )
if (FOOLIB)
target_link_libraries( myapp PRIVATE ${FOOLIB} )
else()
# throw an error
endif()
which you can then use in the target_include_directories, target_link_directories, and
target_link_libraries commands.
Name: @PROJECT_NAME@
Description: @CMAKE_PROJECT_DESCRIPTION@
Version: @PROJECT_VERSION@
Cflags: -I${includedir}
Libs: -L${libdir} -l@libtarget@
Here the at-signs delimit CMake macros that will be substituted when the .pc file is generated; Dollar
macros go into the file unchanged.
Generating the file is done by the following lines in the CMake configuration:
set( libtarget auxlib )
configure_file(
${CMAKE_CURRENT_SOURCE_DIR}/${PROJECT_NAME}.pc.in
${CMAKE_CURRENT_BINARY_DIR}/${PROJECT_NAME}.pc
@ONLY
)
install(
FILES ${CMAKE_CURRENT_BINARY_DIR}/${PROJECT_NAME}.pc
DESTINATION share/pkgconfig
)
4.3.5 Libraries
4.3.5.1 Example: MPI
(The files for this example are in tutorials/cmake/mpiprog.)
While many MPI implementations have a .pc file, it’s better to use the FindMPI module. This package
defines a number of variables that can be used to query the MPI found; for details see
https://cmake.org/cmake/help/latest/module/FindMPI.html. Sometimes it’s necessary to set the MPI_HOME
environment variable to aid in discovery of the MPI package.
C version:
cmake_minimum_required( VERSION 3.12 )
project( cxxprogram VERSION 1.0 )
Fortran version:
cmake_minimum_required( VERSION 3.12 )
project( ${PROJECT_NAME} VERSION 1.0 )
enable_language(Fortran)
find_package( MPI )
if( MPI_Fortran_HAVE_F08_MODULE )
else()
message( FATAL_ERROR "No f08 module for this MPI" )
endif()
4.3.5.3 CUDA
FindCUDA
Changed in version 3.27: This module is available only if policy CMP0146 is not set to N
Deprecated since version 3.10: Do not use this module in new code.
New in version 3.17: To find and use the CUDA toolkit libraries manually, use the FindCU
#include <iostream>
#include <vector>
using namespace std;
#include "mkl_cblas.h"
int main() {
vector<double> values{1,2,3,2,1};
auto maxloc = cblas_idamax ( values.size(),values.data(),1);
cout << "Max abs at: " << maxloc << " (s/b 2)" << '\n';
return 0;
}
The following configuration file lists the various options and such:
cmake_minimum_required( VERSION 3.12 )
project( mklconfigfind VERSION 1.0 )
## https://www.intel.com/content/www/us/en/develop/documentation/onemkl-linux-developer-guide/top/getting-started/cmake-config-for-onemkl.html
target_link_directories(
program PUBLIC
${PETSC_LIBRARY_DIRS} )
target_link_libraries(
program PUBLIC petsc )
add_subdirectory( prolib )
target_link_libraries( program PUBLIC prolib )
Library file:
project( prolib )
4.4. Customizing the compilation process
Alternatively, set the environment variables CC, CXX, FC to the explicit paths of the compilers. For example,
for Intel compilers:
export CC=`which icc`
export CXX=`which icpc`
export FC=`which ifort`
The variable CMAKE_CXX_COMPILE_FEATURES contains the list of all features you can set.
Optimization flags can be set by specifying the CMAKE_BUILD_TYPE:
Unfortunately, this seems to be the only way to influence optimization flags, other than explicitly setting
compiler flags; see next point.
The CMakeLists.txt file is a script, though it doesn’t much look like it.
• Instructions consist of a command, followed by a parenthesized list of arguments.
• (All arguments are strings: there are no numbers.)
• Each command needs to start on a new line, but otherwise whitespace and line breaks are ignored.
Comments start with a hash character.
4.5. CMake scripting
Instead of STATUS you can specify other logging levels (this parameter is actually called ‘mode’ in the
documentation); running for instance
cmake --log-level=NOTICE
4.5.3 Variables
Variables are set with set, or can be given on the commandline:
cmake -D MYVAR=myvalue
Variables can also be queried by the CMake script using the option command:
option( SOME_FLAG "A flag that has some function" defaultvalue )
Some variables are set by other commands. For instance the project command sets PROJECT_NAME and
PROJECT_VERSION.
4.5.4.2 Looping
while( myvalue LESS 50 )
message( stuff )
endwhile()
Chapter 5
In this tutorial you will learn git, currently the most popular version control (also source code control or
revision control) system. Other similar systems are Mercurial and Microsoft Sharepoint. Earlier systems
were SCCS, CVS, Subversion, Bitkeeper.
Version control is a system that tracks the history of a software project, by recording the successive
versions of the files of the project. These versions are recorded in a repository, either on the machine you
are working on, or remotely.
This has many practical advantages:
• It becomes possible to undo changes;
• Sharing a repository with another developer makes collaboration possible, including multiple
edits on the same file.
• A repository records the history of the project.
• You can have multiple versions of the project, for instance for exploring new features, or for
customization for certain users.
The use of a version control system is industry standard practice, and git is by far the most popular system
these days.
5. Source code control through Git
branches for exploring new features. These branches can be merged when you’re satisfied that a new
feature has been sufficiently tested.
5.2 Git
This lab should be done by two people, to simulate a group of programmers working on a joint project. You
can also do this on your own by using two clones of the repository, preferably opening two windows on
your computer.
This gives you a directory with the contents of the repository. If you leave out the local name, the directory
will have the name of the repository.
Clone an empty repository and check that it is indeed empty:
Cmd >> git clone https://github.com/TACC/empty.git empty
Out >>
Cloning into 'empty'...
warning: You appear to have cloned an empty repository.
Cmd >> cd empty
Cmd >> ls -a
Out >>
.
..
.git
Cmd >> git status
Out >>
On branch main
No commits yet
nothing to commit (create/copy files and use "git add" to track)
As you see, even an empty repository contains a directory .git. This contains bookkeeping information
about your repository; you will hardly ever look into it.
The disadvantage of this method, over cloning an empty repo, is that you now have to connect your
directory to a remote repository. See section 5.6.
For good measure, check the name again with git status.
Check branch names:
Cmd >> git branch -m main
Cmd >> git status
Out >>
On branch main
No commits yet
nothing to commit (create/copy files and use "git add" to track)
You need to run git add on your file to tell git that the file belongs to the repository. (You can add a single
file, or use a wildcard to add multiple.) However, this does not actually add the file: it moves it to the staging
area. The status now says that it is a change to be committed.
Add the file to the local repository:
Cmd >> git add firstfile
Cmd >> git status
Out >>
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: firstfile
If you need to check what changes you have made, git diff on that file will tell you the differences
between the edited, but not yet added or committed, file and the previous commit version.
See what the changes were with respect to the previously committed version:
Cmd >> git diff firstfile
Out >>
diff --git a/firstfile b/firstfile
index 257cc56..3bd1f0e 100644
--- a/firstfile
+++ b/firstfile
@@ -1 +1,2 @@
 foo
+bar
You now need to repeat git add and git commit on that file.
Commit the changes to the local repo:
Cmd >> git add firstfile
Cmd >> git commit -m "changes to first file"
Out >>
[main b1edf77] changes to first file
1 file changed, 1 insertion(+)
Cmd >> git status
Out >>
On branch main
nothing to commit, working tree clean
The changes are now in your local repo; you need to git push to update the upstream repo; see section 5.6.
Doing git log will give you the history of the repository, listing the commit numbers, and the messages
that you entered on those commits.
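The following sketch replays the add and commit steps in a scratch repository and then inspects the history. It assumes git is installed; the user name and email are made-up values that apply to these commits only, and the helper function commit is introduced just for this example.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q .
git symbolic-ref HEAD refs/heads/main    # call the initial branch main

# Helper for this sketch: commit with a throwaway identity.
commit () {
    git -c user.name="Example User" -c user.email="user@example.com" \
        commit -q -m "$1"
}

echo foo > firstfile
git add firstfile
commit "add first file"

echo bar >> firstfile
git add firstfile
commit "changes to first file"

# One line per commit: abbreviated commit id and message, newest first.
git log --oneline

# Only the messages, for use in scripts.
messages=$(git log --format=%s)
```

The commit ids differ from run to run, since they depend on the time and author of the commit; the messages are the stable part of the listing.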
Doing git checkout on that file gets the last committed version and puts it back in your working directory.
Restore the previously committed version:
Cmd >> git checkout firstfile
Out >>
Updated 1 path from the index
Cmd >> cat firstfile
Out >>
foo
bar
Cmd >> git status
Out >>
On branch main
nothing to commit, working tree clean
Now do:
git checkout sdf234987238947 -- myfile myotherfile
This will restore the file to its state before the last add and commit, and it will in general leave the
repository back in the state it was before that commit.
See that we have indeed undone the commit:
Cmd >> cat firstfile
Out >>
foo
Cmd >> git status
Out >>
On branch main
nothing to commit, working tree clean
However, the log will show that you have reverted a certain commit.
Cmd >> git log
Out >>
commit 3dca724a1902e8a5e3dba007c325542c6753a424
Author: Victor Eijkhout <eijkhout@tacc.utexas.edu>
Date: Sat Jan 29 14:14:42 2022 -0600
Revert "changes to first file"
The git reset command can also be used for various types of undo.
We have some changes, added to the local repository with git add and git commit
Committed changes:
Cmd >> git add newfile && git commit -m "adding first file"
Out >>
[main 8ce1de4] adding first file
1 file changed, 1 insertion(+)
create mode 100644 newfile
If the repository was created with git init, we need to connect it to some remote repository with
git remote add servername url
Finally, you can git push committed changes to this remote. Git doesn’t just push everything here: since
you can have multiple branches locally, and multiple upstreams remotely, you initially specify both:
git push -u servername branchname
but when you git push for the first time you get some permission-related errors.
Do
git remote -v
# output: origin https://username@bitbucket.org/username/reponame.git
git remote set-url origin git@bitbucket.org:username/reponame.git
Create another clone in person2. Normally the cloned repositories would be two user accounts, or the
accounts of one user on two machines.
Person 2 makes a clone.
Cmd >> git clone git@github.com:TACC/tinker.git person2
Out >>
Cloning into 'person2'...
Now the first user creates a file, adds, commits, and pushes it. (This of course requires an upstream to be
set, but since we did a git clone, this is automatically done.)
Person 1 adds a file and pushes it.
Cmd >> ( cd person1 && echo 123 >> p1 && git add p1 && git commit -m "add p1" && git push )
Out >>
[main 6f6b126] add p1
1 file changed, 1 insertion(+)
create mode 100644 p1
To github.com:TACC/tinker.git
8863559..6f6b126 main -> main
The second user now does a git pull to get these changes. Again, because we created the local repository
by git clone it is clear where the pull is coming from. The pull message will tell us what new files
are created, or how many other files were changed.
Person 2 pulls, getting the new file.
Cmd >> ( cd person2 && git pull )
Out >>
From github.com:TACC/tinker
8863559..6f6b126 main -> origin/main
Updating 8863559..6f6b126
Fast-forward
p1 | 1 +
1 file changed, 1 insertion(+)
create mode 100644 p1
The first user makes an edit on the first line; we confirm the state of the file.
Cmd >> ( cd person1 && sed -i -e '1s/1/one/' fourlines && cat fourlines )
Out >>
one
2
3
4
The other user also makes a change to the same file, but on line 4, so that there is no conflict.
Cmd >> ( cd person2 && sed -i -e '4s/4/four/' fourlines && cat fourlines )
Out >>
1
2
3
four
This change is added with git add and git commit, but we proceed more cautiously in pushing: first we
pull any changes made by others with
git pull --no-edit
git push
This change does not conflict, so we can pull and push.
Cmd >> ( cd person2 && git add fourlines && git commit -m "edit line four" && git pull --no-edit && git push )
Out >>
[main 27fb2b2] edit line four
1 file changed, 1 insertion(+), 1 deletion(-)
From github.com:TACC/tinker
fdd70b7..6767e3f main -> origin/main
Auto-merging fourlines
Merge made by the 'recursive' strategy.
fourlines | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
To github.com:TACC/tinker.git
6767e3f..62bd424 main -> main
Now if the first user does a pull, they see all the merged changes.
Person 1 pulls to get all the changes.
Cmd >> ( cd person1 && git pull && cat fourlines )
Out >>
From github.com:TACC/tinker
6767e3f..62bd424 main -> origin/main
Updating 6767e3f..62bd424
Fast-forward
fourlines | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
one
2
3
four
In the meantime, developer 2 makes another change, to the original file. This change can be added and
committed to the local repository without any problem.
Change the 2 on line two to two. We add and commit this to the local repository.
Cmd >> ( cd person2 && sed -i -e '2s/2/two/' fourlines && cat fourlines && git add fourlines && git commit -m "edit line two" )
Out >>
1
two
3
4
[main c9b6ded] edit line two
1 file changed, 1 insertion(+), 1 deletion(-)
However, if we try to git push this change to the remote repository, we get an error that the remote is
ahead of the local repository. So we first pull the state of the remote repository. In the previous section
this led to an automatic merge; not so here.
You can now edit the file by hand, or using some merge tool.
In between the chevron’ed lines you first get the HEAD, that is the local state, followed by the pulled
remote state. Edit this file, commit the merge and push.
Cmd >> ( cd person2 && cat fourlines )
Out >>
<<<<<<< HEAD
1
two
=======
one
2
>>>>>>> a10216da358649df80aaaeb94f1ceef909c2ed83
3
4
5.7 Branching
With a branch you can keep a completely separate version of all the files in your project.
Initially we have a file on the main branch.
We have a file, committed and all.
Cmd >> cat firstfile
Out >>
foo
Cmd >> git status
Out >>
On branch main
nothing to commit, working tree clean
If we switch back to the main branch, everything is as before when we made the dev branch.
The other branch is still unchanged.
Cmd >> git checkout main && cat firstfile && git status
Out >>
Switched to branch 'main'
foo
On branch main
nothing to commit, working tree clean
The first time you try to push a new branch you need to establish it upstream:
git push --set-upstream origin mynewbranch
We switch to the dev branch and make another file. The change in the main branch is indeed not here.
Cmd >> git checkout dev
Out >>
Switched to branch 'dev'
On line 4, change 4 to four. This change is far enough away from the other change that there should be
no conflict.
Cmd >> sed -i -e '4s/4/four/' fourlines && cat fourlines
Out >>
1
2
3
four
Cmd >> git add fourlines && git commit -m "edit line 4"
Out >>
[dev dbb0c03] edit line 4
1 file changed, 1 insertion(+), 1 deletion(-)
If two developers make changes on the same line, or on adjacent lines, git will not be able to merge and
you have to edit the file as in section 5.6.4.
[Figure 5.1: Add local changes to the remote repository (left); get changes that were made to the remote
repository (right). The diagram relates the working directory, the local repo, and the remote repo.]
5.8 Releases
At a certain point in your development process you may want to mark the current state of the repository
as ‘finished’. You can do this by
1. Attaching a tag to the state of the repository, or
2. Creating an archive: a released version that has the repo information stripped.
5.8.1 Tags
A tag is a marked state of the repository. There are two types of tags:
1. light-weight tags are no more than a synonym for a commit:
git tag v0.09
You list all tags with git tag, you get information on a tag with git show v0.1, and you push a tag to a
remote server with
git push origin v0.1
but beware that changes you now make can not be pushed to anything: this is a ‘detached HEAD’ state.
If you want to fix bugs in a tagged state, you can create a branch based on the tag:
git checkout -b version0.2 v0.1
In this section we will discuss libraries for dense linear algebra operations.
Dense linear algebra, that is linear algebra on matrices that are stored as two-dimensional arrays (as
opposed to sparse linear algebra; see HPC book, section 5.4, as well as the tutorial on PETSc in Parallel
Programming book, part III), has been standardized for a considerable time. The basic operations are defined
by the three levels of Basic Linear Algebra Subprograms (BLAS):
• Level 1 defines vector operations that are characterized by a single loop [13].
• Level 2 defines matrix vector operations, both explicit such as the matrix-vector product, and
implicit such as the solution of triangular systems [7].
• Level 3 defines matrix-matrix operations, most notably the matrix-matrix product [6].
The name ‘BLAS’ suggests a certain amount of generality, but the original authors were clear [13] that
these subprograms only covered dense linear algebra. Attempts to standardize sparse operations have
never met with equal success.
Based on these building blocks, libraries have been built that tackle the more sophisticated problems such
as solving linear systems, or computing eigenvalues or singular values. Linpack (see footnote 1) and Eispack
were the first to formalize these operations, using Blas Level 1 and Blas Level 2 respectively. A later
development, Lapack, uses the blocked operations of Blas Level 3. As you saw in HPC book, section 1.6.1,
this is needed to get high performance on cache-based CPUs.
With the advent of parallel computers, several projects arose that extended the Lapack functionality to
distributed computing, most notably Scalapack [4, 2], PLapack [23, 22], and most recently Elemental [19].
These packages are harder to use than Lapack because of the need for a two-dimensional cyclic
distribution; see HPC book, sections 7.2.3 and 7.3.2. We will not go into the details here.
1. The linear system solver from this package later became the Linpack benchmark; see HPC book, section 2.11.5.
6.1. Some general remarks
• Computational routines. These are the routines that drivers are built up out of. A user may have
occasion to call them by themselves.
• Auxiliary routines.
Expert driver names end in ’X’.
• Linear system solving.
  Simple drivers: -SV (e.g., DGESV). Solve 𝐴𝑋 = 𝐵, overwrite A with LU (with pivoting), overwrite B with X.
  Expert driver: -SVX. Also transpose solve, condition estimation, refinement, equilibration.
• Least squares problems. Drivers:
  xGELS: using QR or LQ under full-rank assumption
  xGELSY: "complete orthogonal factorization"
  xGELSS: using SVD
  xGELSD: using divide-and-conquer SVD (faster, but more workspace than xGELSS)
  Also: LSE and GLM, for linear equality constraints and the general linear model
• Eigenvalue routines.
  Symmetric/Hermitian: xSY or xHE (also SP, SB, ST)
    simple driver: -EV
    expert driver: -EVX
    divide and conquer: -EVD
    relatively robust representations: -EVR
  General (only xGE)
    Schur decomposition: -ES and -ESX
    eigenvalues: -EV and -EVX
  SVD (only xGE)
    simple driver: -SVD
    divide and conquer: -SDD
  Generalized symmetric (SY and HE; SP, SB)
    simple driver: -GV
    expert driver: -GVX
    divide and conquer: -GVD
  Generalized nonsymmetric:
    Schur: simple GGES, expert GGESX
    eigenvalues: simple GGEV, expert GGEVX
    SVD: GGSVD
On the other hand, many LAPACK routines can be based on the matrix-matrix product (BLAS routine
gemm), which, as you saw in HPC book, section 7.4.1, has the potential for a substantial fraction of
peak performance. To achieve this, you should use an optimized version, such as
A simple example:
// example1.F90
do i=1,n
xarray(i) = 1.d0
end do
call dscal(n,scale,xarray,1)
do i=1,n
if (.not.assert_equal( xarray(i),scale )) print *,"Error in index",i
end do
The same in C:
// example1c.cxx
xarray = new double[n]; yarray = new double[n];
Many routines have an increment parameter. For xscal that’s the final parameter:
// example2.F90
integer :: inc=2
call dscal(n/inc,scale,xarray,inc)
do i=1,n
if (mod(i,inc)==1) then
if (.not.assert_equal( xarray(i),scale )) print *,"Error in index",i
else
if (.not.assert_equal( xarray(i),1.d0 )) print *,"Error in index",i
end if
end do
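To make the role of the increment concrete, here is a minimal sketch in plain C of what a call such as dscal(n,scale,xarray,inc) computes. The name scale_strided is hypothetical: this is a reference loop for illustration, not the BLAS routine itself, and it assumes a positive increment.

```c
#include <stddef.h>

/* Reference loop with the semantics of xscal for a positive increment:
   scale n elements of x, touching only every inc-th array location.
   scale_strided is a hypothetical name, not part of BLAS. */
void scale_strided(int n, double alpha, double *x, int inc)
{
  for (int i = 0; i < n; i++)
    x[(size_t)i * (size_t)inc] *= alpha;
}
```

Note that the first argument counts the elements that are scaled, not the length of the array, which is why the Fortran example passes n/inc.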
The matrix-vector product xgemv computes 𝑦 ← 𝛼𝐴𝑥 + 𝛽𝑦, rather than 𝑦 ← 𝐴𝑥. The specification of
the matrix takes the M,N size parameters, and a character argument 'N' to indicate that the matrix is not
transposed. Both of the vectors have an increment argument.
subroutine dgemv(character TRANS,
integer M,integer N,
double precision ALPHA,
double precision, dimension(lda,*) A,integer LDA,
double precision, dimension(*) X,integer INCX,
double precision BETA,double precision, dimension(*) Y,integer INCY
)
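The meaning of the arguments may be easier to see in a plain loop. The following is a sketch in C of the non-transposed case, assuming column-major storage and positive increments; gemv_ref is a hypothetical name, and an optimized BLAS will be organized very differently internally.

```c
#include <stddef.h>

/* y <- alpha*A*x + beta*y for an m-by-n matrix A that is not transposed,
   stored column-major with leading dimension lda (the Fortran convention).
   gemv_ref is a hypothetical reference loop, not the BLAS routine. */
void gemv_ref(int m, int n, double alpha, const double *a, int lda,
              const double *x, int incx, double beta, double *y, int incy)
{
  for (int i = 0; i < m; i++) {
    double row_sum = 0.0;
    for (int j = 0; j < n; j++)
      row_sum += a[(size_t)i + (size_t)j * lda] * x[(size_t)j * incx];
    y[(size_t)i * incy] = alpha * row_sum + beta * y[(size_t)i * incy];
  }
}
```

Setting alpha to one and beta to zero gives the plain product 𝑦 ← 𝐴𝑥.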
The same example in C has an extra parameter to indicate whether the matrix is stored in row or column
major storage:
// example3c.cxx
for (int j=0; j<n; j++) {
xarray[j] = 1.;
for (int i=0; i<m; i++)
matrix[ i+j*m ] = 1.;
}
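The expression i+j*m in this loop is the column-major mapping from a matrix element (i,j) to a location in a one-dimensional array. As a small sketch, the two storage conventions can be written as index functions; the function names are hypothetical and indices are zero-based.

```c
#include <stddef.h>

/* Location of element (i,j) of a matrix with m rows, stored by columns. */
size_t index_column_major(size_t i, size_t j, size_t m)
{
  return i + j * m;
}

/* Location of element (i,j) of a matrix with n columns, stored by rows. */
size_t index_row_major(size_t i, size_t j, size_t n)
{
  return i * n + j;
}
```

The same element therefore lands in different places depending on the convention, which is why the C interface needs to be told which one the caller uses.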
There are many ways of storing data, in particular data that comes in arrays. A surprising number of
people store data in spreadsheets, then export them to ASCII files with comma or tab delimiters, and
expect other people (or other programs written by themselves) to read that in again. Such a process is
wasteful in several respects:
• The ascii representation of a number takes up much more space than the internal binary repre-
sentation. Ideally, you would want a file to be as compact as the representation in memory.
• Conversion to and from ascii is slow; it may also lead to loss of precision.
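Both points can be checked with a few lines of C. The sketch below uses two hypothetical helper functions: one reports how many characters a double takes when printed with a given number of significant digits, the other prints a number and reads it back.

```c
#include <stdio.h>
#include <stdlib.h>

/* Number of characters needed to print x with the given number of
   significant digits. */
int ascii_length(double x, int digits)
{
  return snprintf(NULL, 0, "%.*g", digits, x);
}

/* Write x as text with the given number of significant digits,
   then convert the text back to a double. */
double ascii_roundtrip(double x, int digits)
{
  char text[64];
  snprintf(text, sizeof text, "%.*g", digits, x);
  return strtod(text, NULL);
}
```

For a value such as 1.0/3.0, six digits do not reproduce the original number, while the seventeen digits that do reproduce an IEEE double take more than twice the eight bytes of the binary representation.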
For such reasons, it is desirable to have a file format that is based on binary storage. There are a few more
requirements on a useful file format:
• Since binary storage can differ between platforms, a good file format is platform-independent.
This will, for instance, prevent the confusion between big-endian and little-endian storage, as
well as conventions of 32 versus 64 bit floating point numbers.
• Application data can be heterogeneous, comprising integer, character, and floating point data.
Ideally, all this data should be stored together.
• Application data is also structured. This structure should be reflected in the stored form.
• It is desirable for a file format to be self-documenting. If you store a matrix and a right-hand side
vector in a file, wouldn’t it be nice if the file itself told you which of the stored numbers are the
matrix, which the vector, and what the sizes of the objects are?
This tutorial will introduce the HDF5 library, which fulfills these requirements. HDF5 is a large and
complicated library, so this tutorial will only touch on the basics. For further information, consult
http://www.hdfgroup.org/HDF5/. While you do this tutorial, keep your browser open on
http://www.hdfgroup.org/HDF5/doc/ or http://www.hdfgroup.org/HDF5/RM/RM_H5Front.html for the
exact syntax of the routines.
7.1 Setup
As described above, HDF5 is a file format that is machine-independent and self-documenting. Each HDF5
file is set up like a directory tree, with subdirectories, and leaf nodes which contain the actual data. This
means that data can be found in a file by referring to its name, rather than its location in the file. In this
section you will learn to write programs that write to and read from HDF5 files. In order to check that the
files are as you intend, you can use the h5dump utility on the command line.
Just a word about compatibility. The HDF5 format is not compatible with the older version HDF4, which
is no longer under development. You can still come across people using hdf4 for historic reasons. This
tutorial is based on HDF5 version 1.6. Some interfaces changed in the current version 1.8; in order to use
1.6 APIs with 1.8 software, add a flag -DH5_USE_16_API to your compile line.
7.1.1 Compilation
Include file for C:
#include <netcdf.h>
CMake for C:
find_package( PkgConfig REQUIRED )
pkg_check_modules( NETCDF REQUIRED netcdf )
target_include_directories(
${PROJECTNAME} PUBLIC
${NETCDF_INCLUDE_DIRS} )
target_link_libraries(
${PROJECTNAME} PUBLIC
${NETCDF_LIBRARIES} )
target_link_directories(
${PROJECTNAME} PUBLIC
${NETCDF_LIBRARY_DIRS} )
target_link_libraries(
${PROJECTNAME} PUBLIC netcdf )
target_include_directories(
${PROJECTNAME} PUBLIC
${NETCDFF_INCLUDE_DIRS}
)
target_link_libraries(
${PROJECTNAME} PUBLIC
${NETCDFF_LIBRARIES} ${NETCDF_LIBRARIES}
)
target_link_directories(
${PROJECTNAME} PUBLIC
${NETCDFF_LIBRARY_DIRS} ${NETCDF_LIBRARY_DIRS}
)
target_link_libraries(
${PROJECTNAME} PUBLIC netcdf )
Failure to create the object is indicated by a negative return parameter, so it would be a good idea to create
a file myh5defs.h containing:
#include "hdf5.h"
#define H5REPORT(e) \
{if (e<0) {printf("\nHDF5 error on line %d\n\n",__LINE__); \
return e;}}
hid_t h_id;
h_id = H5Xsomething(...); H5REPORT(h_id);
This file will be the container for a number of data items, organized like a directory tree.
Exercise. Create an HDF5 file by compiling and running the create.c example below.
Expected outcome. A file file.h5 should be created.
main() {
Note that an empty file corresponds to just the root of the directory tree that will hold the data.
7.3 Datasets
Next we create a dataset, in this example a 2D grid. To describe this, we first need to construct a dataspace:
dims[0] = 4; dims[1] = 6;
dataspace_id = H5Screate_simple(2, dims, NULL);
dataset_id = H5Dcreate(file_id, "/dset", dataspace_id, .... );
....
status = H5Dclose(dataset_id);
status = H5Sclose(dataspace_id);
Note that datasets and dataspaces need to be closed, just like files.
Exercise. Create a dataset by compiling and running the dataset.c code below.
Expected outcome. This creates a file dset.h5 that can be displayed with h5dump.
#include "myh5defs.h"
#define FILE "dset.h5"
main() {
The datafile contains such information as the size of the arrays you store. Still, you may want to add
related scalar information. For instance, if the array is output of a program, you could record with what
input parameter it was generated.
parmspace = H5Screate(H5S_SCALAR);
parm_id = H5Dcreate
(file_id,"/parm",H5T_NATIVE_INT,parmspace,H5P_DEFAULT);
Exercise. Add a scalar dataspace to the HDF5 file, by compiling and running the parmwrite.c code
below.
Expected outcome. A new file wdset.h5 is created.
#define FILE "pdset.h5"
main() {
%% h5dump wdset.h5
HDF5 "wdset.h5" {
GROUP "/" {
DATASET "dset" {
DATATYPE H5T_IEEE_F64LE
DATASPACE SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
DATA {
(0,0): 0.5, 1.5, 2.5, 3.5, 4.5, 5.5,
(1,0): 6.5, 7.5, 8.5, 9.5, 10.5, 11.5,
(2,0): 12.5, 13.5, 14.5, 15.5, 16.5, 17.5,
(3,0): 18.5, 19.5, 20.5, 21.5, 22.5, 23.5
}
}
DATASET "parm" {
DATATYPE H5T_STD_I32LE
DATASPACE SCALAR
DATA {
(0): 37
}
}
}
}
#include "myh5defs.h"
#define FILE "wdset.h5"
main() {
%% h5dump wdset.h5
HDF5 "wdset.h5" {
GROUP "/" {
DATASET "dset" {
DATATYPE H5T_IEEE_F64LE
DATASPACE SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
DATA {
(0,0): 0.5, 1.5, 2.5, 3.5, 4.5, 5.5,
(1,0): 6.5, 7.5, 8.5, 9.5, 10.5, 11.5,
(2,0): 12.5, 13.5, 14.5, 15.5, 16.5, 17.5,
(3,0): 18.5, 19.5, 20.5, 21.5, 22.5, 23.5
}
}
DATASET "parm" {
DATATYPE H5T_STD_I32LE
DATASPACE SCALAR
DATA {
(0): 37
}
}
}
}
If you look closely at the source and the dump, you see that the data types are declared as ‘native’, but
rendered as LE. The ‘native’ declaration makes the datatypes behave like the built-in C or Fortran data
types. Alternatively, you can explicitly indicate whether data is little-endian or big-endian. These terms
describe how the bytes of a data item are ordered in memory. Most architectures use little endian, as you
can see in the dump output, but, notably, IBM uses big endian.
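If you want to find out the byte order of the machine you are working on, a few lines of C suffice. This is a sketch with hypothetical function names; it is independent of HDF5, which does any necessary conversion for you.

```c
#include <stdint.h>
#include <string.h>

/* Return 1 if the least significant byte of an integer is stored first. */
int is_little_endian(void)
{
  uint32_t word = 1;
  unsigned char first_byte;
  memcpy(&first_byte, &word, 1);
  return first_byte == 1;
}

/* Reverse the byte order of a 32-bit integer, which converts
   between the little-endian and big-endian representation. */
uint32_t swap_bytes32(uint32_t v)
{
  return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
         ((v << 8) & 0x00ff0000u) | (v << 24);
}
```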
7.5 Reading
Now that we have a file with some data, we can do the mirror part of the story: reading from that file.
The essential commands are
h5file = H5Fopen( .... )
....
H5Dread( dataset, .... data .... )
where the H5Dread command has the same arguments as the corresponding H5Dwrite.
Exercise. Read data from the wdset.h5 file that you created in the previous exercise, by compiling and
running the allread.c example below.
Expected outcome. Running the allread executable will print the value 37 of the parameter, and
the value 8.5 of the (1,2) data point of the array.
Caveats. Make sure that you run parmwrite to create the input file.
#include "myh5defs.h"
#define FILE "wdset.h5"
main() {
herr_t status;
double data[24]; int parm;
status = H5Dread
(dataset,H5T_NATIVE_DOUBLE,H5S_ALL,H5S_ALL,H5P_DEFAULT,
data); H5REPORT(status);
printf("arbitrary data point [1,2]: %e\n",data[1*6+2]);
%% ./allread
parameter value: 37
arbitrary data point [1,2]: 8.500000e+00
Parallel I/O
Parallel I/O is a tricky subject. You can try to let all processors jointly write one file, or to write a file per
process and combine them later. With the standard mechanisms of your programming language there are
the following considerations:
• On clusters where the processes have individual file systems, the only way to write a single file
is to let it be generated by a single processor.
• Writing one file per process is easy to do, but
– You need a post-processing script;
– if the files are not on a shared file system (such as Lustre), it takes additional effort to bring
them together;
– if the files are on a shared file system, writing many files may be a burden on the metadata
server.
• On a shared file system it is possible for all processes to open the same file and set the file pointer
individually. This can be difficult if the amount of data per process is not uniform.
Illustrating the last point:
// pseek.c
FILE *pfile;
pfile = fopen("pseek.dat","w");
fseek(pfile,procid*sizeof(int),SEEK_CUR);
// fseek(pfile,procid*sizeof(char),SEEK_CUR);
fprintf(pfile,"%d\n",procid);
fclose(pfile);
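When the amount of data per process is not uniform, each process has to start writing where the lower-numbered processes stop. The following sketch computes these starting positions as an exclusive prefix sum; compute_offsets is a hypothetical helper, and in a real parallel code the sizes would first have to be communicated between the processes.

```c
/* Exclusive prefix sum: offsets[p] is the number of bytes written by
   processes 0..p-1, which is where process p should start writing.
   The return value is the total size of the file. */
long compute_offsets(const long *nbytes, int nprocs, long *offsets)
{
  long total = 0;
  for (int p = 0; p < nprocs; p++) {
    offsets[p] = total;
    total += nbytes[p];
  }
  return total;
}
```

Process procid would then position itself with fseek(pfile,offsets[procid],SEEK_SET).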
MPI also has its own portable I/O: MPI I/O, for which see chapter Parallel Programming book, chapter 10.
Alternatively, one could use a library such as hdf5; see chapter 7.
For a great discussion see [15], from which figures here are taken.
The gnuplot utility is a simple program for plotting sets of points or curves. This very short tutorial will
show you some of the basics. For more commands and options, see the manual
http://www.gnuplot.info/docs/gnuplot.html.
or fig, latex, pbm, et cetera. Note that this will only cause the pdf commands to be written to your
screen: you need to direct them to file with
set output "myplot.pdf"
9. Plotting with GNUplot
9.2 Plotting
The basic plot commands are plot for 2D, and splot (‘surface plot’) for 3D plotting.
you get a plot of 𝑓(𝑥) = 𝑥²; gnuplot will decide on the range for 𝑥. With
set xrange [0:1]
plot 1-x title "down", x**2 title "up"
you get two graphs in one plot, with the 𝑥 range limited to [0, 1], and the appropriate legends for the
graphs. The variable x is the default for plotting functions.
Plotting one function against another – or equivalently, plotting a parametric curve – goes like this:
set parametric
plot [t=0:1.57] cos(t),sin(t)
9.2.3 Customization
Plots can be customized in many ways. Some of these customizations use the set command. For instance,
9.3 Workflow
Imagine that your code produces a dataset that you want to plot, and you run your code for a number
of inputs. It would be nice if the plotting can be automated. Gnuplot itself does not have the facilities for
this, but with a little help from shell programming this is not hard to do.
Suppose you have data files
data1.dat data2.dat data3.dat
and you want to plot them with the same gnuplot commands. You could make a file plot.template:
set term pdf
set output "FILENAME.pdf"
plot "FILENAME.dat"
The string FILENAME can be replaced by the actual file names using, for instance sed:
for d in data1 data2 data3 ; do
cat plot.template | sed s/FILENAME/$d/ > plot.cmd
gnuplot plot.cmd
done
Sooner or later, and probably sooner than later, every programmer is confronted with code not behaving
as intended. In this section you will learn some techniques of dealing with this problem. At first we will see
a number of techniques for preventing errors; in the next chapter we will discuss debugging, the process
of finding the inevitable errors in a program, once they have occurred.
10.1.1 Assertions
In the things that can go wrong with a program we can distinguish between errors and bugs. Errors are
things that legitimately happen but that should not. File systems are common sources of errors: a program
wants to open a file but the file doesn’t exist because the user mistyped the name, or the program writes
to a file but the disk is full. Other errors can come from arithmetic, such as overflow errors.
On the other hand, a bug in a program is an occurrence that cannot legitimately occur. Of course, ‘le-
gitimately’ here means ‘according to the programmer’s intentions’. Bugs can often be described as ‘the
computer always does what you ask, not necessarily what you want’.
10.1. Defensive programming
Assertions serve to detect bugs in your program: an assertion is a predicate that should be true at a certain
point in your program. Thus, an assertion failing means that you didn’t code what you intended to code.
An assertion is typically a statement in your programming language, or a preprocessor macro; upon failure
of the assertion, your program will stop.
Some examples of assertions:
• If a subprogram has an array argument, it is a good idea to test whether the actual argument is a
null pointer before indexing into the array.
• Similarly, you could test a dynamically allocated data structure for not having a null pointer.
• If you calculate a numerical result for which certain mathematical properties hold, for instance
you are writing a sine function, for which the result has to be in [−1, 1], you should test whether
this property indeed holds for the result.
Assertions are often disabled in a program once it’s sufficiently tested. The reason for this is that assertions
can be expensive to execute. For instance, if you have a complicated data structure, you could write a
complicated integrity test, and perform that test in an assertion, which you put after every access to the
data structure.
Because assertions are often disabled in the ‘production’ version of a code, they should not affect any
stored data. If they do, your code may behave differently when you’re testing it with assertions, versus
how you use it in practice without them. This is also formulated as ‘assertions should not have side-effects’.
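As an illustration of assertions without side-effects, consider this sketch of a small fixed-size buffer. The type int_buffer and the function buffer_append are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#define CAPACITY 4
typedef struct { int items[CAPACITY]; int count; } int_buffer;

/* Append a value. The assertions only read data, so compiling with
   -DNDEBUG, which disables them, does not change what is stored. */
void buffer_append(int_buffer *buf, int value)
{
  assert(buf != NULL);             /* the argument has to exist */
  assert(buf->count < CAPACITY);   /* failing here is a bug in the caller */
  buf->items[buf->count] = value;
  buf->count++;                    /* the update is outside the assertion */
}
```

Writing the test as assert(buf->count++ < CAPACITY) would be the mistake described above: with assertions disabled the counter would no longer be incremented.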
which includes the literal text of the expression, the file name, and line number; and the program is
subsequently stopped. Here is an example:
#include<assert.h>
int main(void)
{
open_record(NULL);
}
which is used as
ASSERT(nItemsSet.gt.arraySize,"Too many elements set")
float value,result;
result = compute(value);
How do we handle the case where the user passes a negative number?
float compute(float val)
{
float result;
if (val<0) { /* then what? */
} else
result = ... sqrt(val) ... /* some computation */
return result;
}
We could print an error message and deliver some result, but the message may go unnoticed, and the
calling environment does not really receive any notification that something has gone wrong.
The following approach is more flexible:
int compute(float val,float *result)
{
if (val<0) {
return -1;
} else {
*result = ... sqrt(val) ... /* some computation */
}
return 0;
}
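A complete version of this pattern could look as follows. This is only a sketch: the helper newton_sqrt is a hypothetical stand-in for ‘some computation’, chosen so that the example does not depend on the math library.

```c
/* Newton iteration for the square root of a nonnegative number;
   a hypothetical stand-in for the actual computation. */
static float newton_sqrt(float val)
{
  if (val == 0.0f) return 0.0f;
  float x = val > 1.0f ? val : 1.0f;
  for (int it = 0; it < 50; it++)
    x = 0.5f * (x + val / x);
  return x;
}

/* Return 0 on success and -1 on invalid input; the numerical result
   is delivered through the pointer argument and is left untouched
   if an error is reported. */
int compute(float val, float *result)
{
  if (val < 0)
    return -1;
  *result = newton_sqrt(val);
  return 0;
}
```

The calling environment now decides what to do about the error, for instance with if (compute(value,&result)!=0) followed by its own recovery code.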
The C Preprocessor (CPP) has built-in macros that lend themselves to informative error reporting. The
following macro not only checks on an error condition, but also reports where the error occurred:
#define CHECK_FOR_ERROR(ierr) \
if (ierr!=0) { \
printf("Error %d detected in line %d of file %s\n",\
ierr,__LINE__,__FILE__); \
return -1 ; }
Note that this macro not only prints an error message, but also does a further return. This means that, if
you adopt this use of error codes systematically, you will get a full backtrace of the calling tree if an error
occurs. (In the Python language this is precisely the wrong approach since the backtrace is built-in.)
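To see how the backtrace comes about, here is a sketch with three hypothetical levels of a calling tree, using the macro from above. The function names are made up for this example.

```c
#include <stdio.h>

#define CHECK_FOR_ERROR(ierr) \
  if (ierr!=0) { \
    printf("Error %d detected in line %d of file %s\n",\
           ierr,__LINE__,__FILE__); \
    return -1 ; }

/* Lowest level: a nonzero return value signals an error. */
int innermost(int input)
{
  if (input < 0) return 1;
  return 0;
}

/* Each higher level checks the code of the level below it. */
int middle(int input)
{
  int ierr = innermost(input); CHECK_FOR_ERROR(ierr);
  return 0;
}

int outermost(int input)
{
  int ierr = middle(input); CHECK_FOR_ERROR(ierr);
  return 0;
}
```

Calling outermost with a negative input prints one message from middle and one from outermost, each with its own line number, so the output traces the path from the error back to the top-level call.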
10.2.1.1 C
The C language has arrays, but they suffer from ‘pointer decay’: they behave largely like pointers in
memory. Thus, bounds checking is hard, other than with external tools like Valgrind.
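One way to get some bounds checking in C without external tools is to carry the length along with the pointer and to do all accesses through a routine that tests the index. The following is a minimal sketch of that idea; the type float_array and the function array_set are hypothetical names, not a standard facility.

```c
#include <stddef.h>

/* A pointer bundled with the number of elements it points to. */
typedef struct { float *data; size_t length; } float_array;

/* Store value at index i; return 0 on success, -1 if the array
   does not exist or if i is out of bounds. */
int array_set(float_array *a, size_t i, float value)
{
  if (a == NULL || a->data == NULL || i >= a->length)
    return -1;
  a->data[i] = value;
  return 0;
}
```

The price is a test on every access, which is why such checks are usually limited to debugging runs.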
10.2.1.2 C++
C++ has containers such as std::vector which support bounds checking:
vector<float> x(25);
x.at(26) = y; // throws an exception
On the other hand, the C-style x[26] does not perform such checks.
10.2.1.3 Fortran
Fortran arrays are more restricted than C arrays, so compilers often support a flag for activating runtime
bounds checking. For gfortran that is -fcheck=bounds; the older spelling -fbounds-check is a deprecated alias.
The block of memory is allocated in each iteration, but the allocation of one iteration is no longer available
in the next. A similar example can be made with allocating inside a conditional.
It should be noted that this problem is far less serious in Fortran, where memory is deallocated automat-
ically as a variable goes out of scope.
There are various tools for detecting memory errors: Valgrind, DMALLOC, Electric Fence. For valgrind,
see section 11.8.
if this is available you should certainly make use of it. (The GNU C library has a function mcheck,
declared in mcheck.h, that serves a similar purpose.)
If you write in C, you will probably know the malloc and free calls:
int *ip;
ip = (int*) malloc(500*sizeof(int));
if (ip==0) {/* could not allocate memory */}
..... do stuff with ip .....
free(ip);
int *ip;
MYMALLOC(ip,500,int);
Runtime checks on memory usage (either by compiler-generated bounds checking, or through tools like
valgrind or Rational Purify) are expensive, but you can catch many problems by adding some functionality
to your malloc. What we will do here is to detect memory corruption after the fact.
We allocate a few integers to the left and right of the allocated object (line 1 in the code below), and put
a recognizable value in them (line 2 and 3), as well as the size of the object (line 2). We then return the
pointer to the actually requested memory area (line 4).
#define MEMCOOKIE 137
#define MYMALLOC(a,b,c) { \
char *aa; int *ii; \
aa = malloc(b*sizeof(c)+3*sizeof(int)); /* 1 */ \
ii = (int*)aa; ii[0] = b*sizeof(c); \
ii[1] = MEMCOOKIE; /* 2 */ \
aa = (char*)(ii+2); a = (c*)aa ; /* 4 */ \
aa = aa+b*sizeof(c); ii = (int*)aa; \
ii[0] = MEMCOOKIE; /* 3 */ \
}
Now you can write your own free, which tests whether the bounds of the object have not been written
over.
#define MYFREE(a) { \
char *aa; int *ii,n; ii = (int*)a; \
if (*(--ii)!=MEMCOOKIE) printf("object corrupted\n"); \
n = *(--ii); aa = (char*)a+n; ii = (int*)aa; \
if (*ii!=MEMCOOKIE) printf("object corrupted\n"); \
free((int*)(a)-2); \
}
You can extend this idea: in every allocated object, also store two pointers, so that the allocated memory
areas become a doubly linked list. You can then write a macro CHECKMEMORY which tests all your allocated
objects for corruption.
Such solutions to the memory corruption problem are fairly easy to write, and they carry little overhead.
There is a memory overhead of at most 5 integers per object, and there is practically no performance
penalty.
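The same idea can also be written with functions instead of macros, which makes it easier to test. The sketch below is one possible variant, not the macros above: the names guarded_malloc, guarded_check, and guarded_free are hypothetical, the header is padded to keep the user's memory aligned (this uses max_align_t from C11), and the cookies are copied with memcpy to avoid unaligned integer accesses.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MEMCOOKIE 137u
#define HEADER_BYTES sizeof(max_align_t)
_Static_assert(sizeof(max_align_t) >= sizeof(size_t) + sizeof(unsigned),
               "header too small for size and cookie");

/* Allocate nbytes, with the size and a cookie in front of the
   user's memory and a second cookie directly behind it. */
void *guarded_malloc(size_t nbytes)
{
  const unsigned cookie = MEMCOOKIE;
  char *raw = malloc(HEADER_BYTES + nbytes + sizeof cookie);
  if (raw == NULL) return NULL;
  char *user = raw + HEADER_BYTES;
  memcpy(raw, &nbytes, sizeof nbytes);                   /* object size     */
  memcpy(user - sizeof cookie, &cookie, sizeof cookie);  /* leading cookie  */
  memcpy(user + nbytes, &cookie, sizeof cookie);         /* trailing cookie */
  return user;
}

/* Return 0 if both cookies are intact, nonzero if one was overwritten. */
int guarded_check(const void *user_memory)
{
  const char *user = user_memory;
  size_t nbytes; unsigned lead, trail;
  memcpy(&nbytes, user - HEADER_BYTES, sizeof nbytes);
  memcpy(&lead, user - sizeof lead, sizeof lead);
  memcpy(&trail, user + nbytes, sizeof trail);
  return lead != MEMCOOKIE || trail != MEMCOOKIE;
}

/* Release memory that was obtained from guarded_malloc. */
void guarded_free(void *user_memory)
{
  if (user_memory != NULL)
    free((char *)user_memory - HEADER_BYTES);
}
```

Writing one element past the end of an array allocated this way overwrites the trailing cookie, which guarded_check then reports.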
(Instead of writing a wrapper for malloc, on some systems you can influence the behavior of the system
routine. On Linux, malloc calls hooks that can be replaced with your own routines; see
http://www.gnu.org/s/libc/manual/html_node/Hooks-for-Malloc.html.)
10.3 Testing
There are various philosophies for testing the correctness of a code.
• Correctness proving: the programmer draws up predicates that describe the intended behavior of
code fragments and proves by mathematical techniques that these predicates hold [10, 5].
• Unit testing: each routine is tested separately for correctness. This approach is often hard to do
for numerical codes, since with floating point numbers there is essentially an infinity of possible
inputs, and it is not easy to decide what would constitute a sufficient set of inputs.
• Integration testing: test subsystems.
• System testing: test the whole code. This is often appropriate for numerical codes, since we often
have model problems with known solutions, or there are properties such as bounds that need to
hold on the global solution.
• Test-driven design: the program development process is driven by the requirement that testing
is possible at all times.
With parallel codes we run into a new category of difficulties with testing. Many algorithms, when
executed in parallel, will execute operations in a slightly different order, leading to different roundoff
behavior. For instance, the parallel computation of a vector sum will use partial sums. Some algorithms have an
inherent damping of numerical errors, for instance stationary iterative methods (HPC book,
section 5.5.1), but others have no such built-in error correction (nonstationary methods; HPC book,
section 5.5.8). As a result, the same iterative process can take different numbers of iterations depending
on how many processors are used.
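The effect of summation order can be demonstrated without any parallelism. In this sketch, sum_blocked mimics what a number of processes would compute by forming partial sums over contiguous blocks; both function names are hypothetical.

```c
/* Sum in index order, as a sequential code would. */
float sum_sequential(const float *x, int n)
{
  float sum = 0.0f;
  for (int i = 0; i < n; i++)
    sum += x[i];
  return sum;
}

/* Sum in nblocks contiguous blocks, then add up the partial sums,
   mimicking the result of nblocks cooperating processes. */
float sum_blocked(const float *x, int n, int nblocks)
{
  float total = 0.0f;
  for (int b = 0; b < nblocks; b++) {
    int first = (int)((long)n * b / nblocks);
    int last = (int)((long)n * (b + 1) / nblocks);
    float partial = 0.0f;
    for (int i = first; i < last; i++)
      partial += x[i];
    total += partial;
  }
  return total;
}
```

For the four numbers 1e8, 1, -1e8, 1 in IEEE single precision, the sequential sum is 1, while the sum over two blocks is 0, because each block loses its small term to rounding. A test that compares against the sequential result therefore needs a tolerance.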
• Global state in your program makes it hard to test, since it carries information between tests.
• Tests should not reproduce the logic of your code: if the program logic is faulty, the test will be
too.
• Tests should be short, and obey the single-responsibility principle. Naming your tests is good to
keep them focused.
Debugging
Debugging is like being the detective in a crime movie where you are also the murderer.
(Filipe Fortes, 2013)
When a program misbehaves, debugging is the process of finding out why. There are various strategies
of finding errors in a program. The crudest one is debugging by print statements. If you have a notion of
where in your code the error arises, you can edit your code to insert print statements, recompile, rerun,
and see if the output gives you any suggestions. There are several problems with this:
• The edit/compile/run cycle is time consuming, especially since
• often the error will be caused by an earlier section of code, requiring you to edit, compile, and
rerun repeatedly. Furthermore,
• the amount of data produced by your program can be too large to display and inspect effectively,
and
• if your program is parallel, you probably need to print out data from all processors, making the
inspection process very tedious.
For these reasons, the best way to debug is by the use of an interactive debugger, a program that allows
you to monitor and control the behavior of a running program. In this section you will familiarize yourself
with gdb and lldb, the open source debuggers of the GNU and clang projects respectively. Other debuggers
are proprietary, and typically come with a compiler suite. Another distinction is that gdb is a commandline
debugger; there are graphical debuggers such as ddd (a frontend to gdb) or DDT and TotalView (debuggers
for parallel codes). We limit ourselves to gdb, since it incorporates the basic concepts common to all
debuggers.
In this tutorial you will debug a number of simple programs with gdb and valgrind. The files can be found
in the repository in the directory code/gdb.
Usually, you also need to lower the compiler optimization level: a production code will often be compiled
with flags such as -O2 or -xHost that try to make the code as fast as possible, but for debugging you need
to replace this by -O0 (‘oh-zero’). The reason is that higher levels will reorganize your code, making it
hard to relate the execution to the source (see footnote 1).
tutorials/gdb/c/hello.c
#include <stdlib.h>
#include <stdio.h>
int main() {
  printf("hello world\n");
  return 0;
}
%% cc -g -o hello hello.c
# regular invocation:
%% ./hello
hello world
# invocation from gdb:
%% gdb hello
GNU gdb 6.3.50-20050815 # ..... [version info]
Copyright 2004 Free Software Foundation, Inc. .... [copyright info] ....
(gdb) run
Starting program: /home/eijkhout/tutorials/gdb/hello
Reading symbols for shared libraries +. done
hello world
1. Typically, actual code motion is done by -O3, but at level -O2 the compiler will inline functions and make other simplifica-
tions.
Important note: the program was compiled with the debug flag -g. This causes the symbol table (that is,
the translation from machine addresses to program variables) and other debug information to be included
in the binary. This will make your binary larger than strictly necessary, and it can also make it slower, for
instance because the compiler will not perform certain optimizations (see footnote 2).
To illustrate the presence of the symbol table do
%% cc -g -o hello hello.c
%% gdb hello
GNU gdb 6.3.50-20050815 # ..... version info
(gdb) list
For a program with commandline input we give the arguments to the run command (Fortran users use
say.F):
2. Compiler optimizations are not supposed to change the semantics of a program, but sometimes do. This can lead to the
nightmare scenario where a program crashes or gives incorrect results, but magically works correctly when compiled with debug
and run in a debugger.
11.3.1 C programs
The following code has several errors. We will use the debugger to uncover them.
// square.c
int nmax,i;
float *squares,sum;
fscanf(stdin,"%d",nmax);
for (i=1; i<=nmax; i++) {
squares[i] = 1./(i*i); sum += squares[i];
}
printf("Sum: %e\n",sum);
%% cc -g -o square square.c
%% ./square
5000
Segmentation fault
The segmentation fault (other messages are possible too) indicates that we are accessing memory that we
are not allowed to, making the program exit. A debugger will quickly tell us where this happens:
%% gdb square
(gdb) run
50000
Apparently the error occurred in a function __svfscanf_l, which is not one of ours, but a system func-
tion. Using the backtrace (or bt, also where or w) command we display the call stack. This usually allows
us to find out where the error lies:
Displaying a stack trace
gdb:  (gdb) where
lldb: (lldb) thread backtrace
(gdb) where
#0 0x00007fff824295ca in __svfscanf_l ()
#1 0x00007fff8244011b in fscanf ()
#2 0x0000000100000e89 in main (argc=1, argv=0x7fff5fbfc7c0) at square.c:7
We take a close look at line 7, and see that we need to change nmax to &nmax.
There is still an error in our program:
(gdb) run
50000
We investigate further:
(gdb) print i
$1 = 11237
(gdb) print squares[i]
Cannot access memory at address 0x10000f000
(gdb) print squares
$2 = (float *) 0x0
Memory errors can also occur if we have a legitimate array, but we access it outside its bounds. The
following program fills an array, forward, and reads it out, backward. However, there is an indexing error
in the second loop.
// up.c
int nlocal = 100,i;
double s, *array = (double*) malloc(nlocal*sizeof(double));
for (i=0; i<nlocal; i++) {
double di = (double)i;
array[i] = 1/(di*di);
}
s = 0.;
for (i=nlocal-1; i>=0; i++) {
double di = (double)i;
s += array[i];
}
You see that the index where the debugger finally complains is quite a bit larger than the size of the array.
Exercise 11.1. Can you think of a reason why indexing out of bounds is not immediately fatal?
What would determine where it does become a problem? (Hint: how is computer memory
structured?)
In section 11.8 you will see a tool that spots any out-of-bound indexing.
Returning to square.c, where gdb reported squares as a null pointer: a close look at the code shows that we never allocated memory for squares.
Often the error in a program is sufficiently obscure that you need to investigate the program run in detail.
Compile the following program
// roots.c
float root(int n)
{
  float r;
  r = sqrt(n);
  return r;
}
int main() {
  feenableexcept(FE_INVALID | FE_OVERFLOW);
  int i;
  float x=0;
  for (i=100; i>-100; i--)
    x += root(i+5);
  printf("sum: %e\n",x);
}
but before you run the program, you set a breakpoint at main. This tells the execution to stop, or ‘break’,
in the main program.
(gdb) break main
Breakpoint 1 at 0x100000ea6: file root.c, line 14.
Now the program will stop at the first executable statement in main:
(gdb) run
Starting program: tutorials/gdb/c/roots
Reading symbols for shared libraries +. done
If execution is stopped at a breakpoint, you can do various things, such as issuing the step command:
Breakpoint 1, main () at roots.c:14
14 float x=0;
(gdb) step
15 for (i=100; i>-100; i--)
(gdb)
16 x += root(i);
(gdb)
(if you just hit return, the previously issued command is repeated). Do a number of steps in a row by
hitting return. What do you notice about the function and the loop?
Switch from doing step to doing next. Now what do you notice about the loop and the function?
Set another breakpoint: break 17 and do cont. What happens?
Rerun the program after you set a breakpoint on the line with the sqrt call. When the execution stops
there do where and list.
11.6 Breakpoints
If a problem occurs in a loop, it can be tedious to keep typing cont and inspecting the variable with print.
Instead you can make an existing breakpoint subject to a condition: with
condition 1 n<0
breakpoint 1 stops the program only if n<0 holds when that breakpoint is reached.
Set a breakpoint
gdb:  break foo.c:12
lldb: breakpoint set [ -f foo.c ] -l 12
gdb, conditional: break foo.c:12 if n>0
A condition such as x!=x stops the program when x has become NaN, using the fact that NaN is the only number not equal to itself.
Another possibility is to use ignore 1 50, which will not stop at breakpoint 1 the next 50 times.
Remove the existing breakpoint, redefine it with the condition n<0 and rerun your program. When the
program breaks, find for what value of the loop variable it happened. What is the sequence of commands
you use?
You can set a breakpoint in various ways:
• break foo.c to stop when code in a certain file is reached;
• break 123 to stop at a certain line in the current file;
• break foo to stop at subprogram foo;
• or various combinations, such as break foo.c:123.
Information about breakpoints:
• If you set many breakpoints, you can find out what they are with info breakpoints.
• You can remove breakpoints with delete n where n is the number of the breakpoint.
• If you restart your program with run without leaving gdb, the breakpoints stay in effect.
• If you leave gdb, the breakpoints are cleared but you can save them: save breakpoints <file>.
Use source <file> to read them in on the next gdb run.
• In languages with exceptions, such as C++, you can set a catchpoint:
Set a breakpoint for exceptions
gdb:  catch throw
lldb: break set -E C++
Finally, you can execute commands at a breakpoint:
break 45
command
print x
cont
end
This states that at line 45 variable x is to be printed, and execution should immediately continue.
If you want to run repeated gdb sessions on the same program, you may want to save and reload
breakpoints. This can be done with
save breakpoints filename
source filename
After the conditional, the allocated memory is not freed, but the pointer that pointed to it has gone away.
This last type especially can be hard to find. Memory leaks only surface when your program runs
out of memory. That in turn is detectable because your allocation will fail. It is a good idea to always
check the return result of your malloc or allocate statement!
As a first example, consider out of bound addressing, also known as buffer overflow:
MISSING SNIPPET corruptbound
This is unlikely to crash your code, but the results are unpredictable, and this is certainly a failure of your
program logic.
Valgrind indicates that this is an invalid read, what line it occurs on, and where the block was allocated:
==9112== Invalid read of size 4
==9112== at 0x40233B: main (outofbound.cpp:10)
==9112== Address 0x595fde8 is 0 bytes after a block of size 40 alloc'd
==9112== at 0x4C2A483: operator new(unsigned long) (vg_replace_malloc.c:344)
==9112== by 0x4023CD: allocate (new_allocator.h:111)
==9112== by 0x4023CD: allocate (alloc_traits.h:436)
==9112== by 0x4023CD: _M_allocate (stl_vector.h:296)
==9112== by 0x4023CD: _M_create_storage (stl_vector.h:311)
==9112== by 0x4023CD: _Vector_base (stl_vector.h:260)
==9112== by 0x4023CD: _Vector_base (stl_vector.h:258)
==9112== by 0x4023CD: vector (stl_vector.h:415)
==9112== by 0x4023CD: main (outofbound.cpp:9)
Remark 17 Buffer overflows are a well-known security risk, typically associated with reading string input
from a user source. Buffer overflows can be largely avoided by using C++ constructs such as cin and string
instead of sscanf and character arrays.
Valgrind is informative but cryptic, since it works on the bare memory, not on variables. Thus, these error
messages take some exegesis. They state that line 10 reads a 4-byte object immediately after a block of 40
bytes that was allocated. In other words: the code is reading outside the bounds of an allocated array.
The next example performs a read on an array that has already been free’d. In this simple case you will
actually get the expected output, but if the read comes much later than the free, the output can be anything.
MISSING SNIPPET corruptfree
Valgrind again states that this is an invalid read; it gives both where the block was allocated and where it
was freed.
On the other hand, if you forget to free memory you have a memory leak (just imagine allocation, and
not free’ing, in a loop)
MISSING SNIPPET corruptleak
which valgrind reports on:
==283234== LEAK SUMMARY:
==283234== definitely lost: 40,000 bytes in 1 blocks
==283234== indirectly lost: 0 bytes in 0 blocks
==283234== possibly lost: 0 bytes in 0 blocks
==283234== still reachable: 8 bytes in 1 blocks
==283234== suppressed: 0 bytes in 0 blocks
Memory leaks are much more rare in C++ than in C because of containers such as std::vector. However,
in sophisticated cases you may still do your own memory management, and you need to be aware of the
danger of memory leaks.
If you do your own memory management, there is also a danger of writing to an array pointer that has
not been allocated yet:
MISSING SNIPPET corruptinit
The behavior of this code depends on all sorts of things: if the pointer variable is zero, the code will crash.
On the other hand, if it contains some random value, the write may succeed; provided you are not writing
too far from that location.
The output here shows both the valgrind diagnosis, and the OS message when the program aborted:
==283234== LEAK SUMMARY:
==283234== definitely lost: 40,000 bytes in 1 blocks
==283234== indirectly lost: 0 bytes in 0 blocks
==283234== possibly lost: 0 bytes in 0 blocks
==283234== still reachable: 8 bytes in 1 blocks
==283234== suppressed: 0 bytes in 0 blocks
Suppose your program has an out-of-bounds error. Running with gdb, this error may only become appar-
ent if the bounds are exceeded by a large amount. On the other hand, if the code is linked with libefence,
the debugger will stop at the very first time the bounds are exceeded.
Parallel debugging
When a program misbehaves, debugging is the process of finding out why. There are various strategies
of finding errors in a program. The crudest one is debugging by print statements. If you have a notion of
where in your code the error arises, you can edit your code to insert print statements, recompile, rerun,
and see if the output gives you any suggestions. There are several problems with this:
• The edit/compile/run cycle is time consuming, especially since
• often the error will be caused by an earlier section of code, requiring you to edit, compile, and
rerun repeatedly. Furthermore,
• the amount of data produced by your program can be too large to display and inspect effectively,
and
• if your program is parallel, you probably need to print out data from all processors, making the
inspection process very tedious.
For these reasons, the best way to debug is by the use of an interactive debugger, a program that allows you
to monitor and control the behaviour of a running program. In this section you will familiarize yourself
with gdb, which is the open source debugger of the GNU project. Other debuggers are proprietary, and
typically come with a compiler suite. Another distinction is that gdb is a commandline debugger; there
are graphical debuggers such as ddd (a frontend to gdb) or DDT and TotalView (debuggers for parallel
codes). We limit ourselves to gdb, since it incorporates the basic concepts common to all debuggers.
In this tutorial you will debug a number of simple programs with gdb and valgrind. The files can be found
in the repository in the directory tutorials/debug_tutorial_files.
There are few low-budget solutions to parallel debugging. The main one is to create an xterm for each
process. We will describe this next. There are also commercial packages such as DDT and TotalView, that
offer a GUI. They are very convenient but also expensive. The Eclipse project has a parallel package, Eclipse
PTP, that includes a graphic debugger.
Debugging in parallel is harder than sequentially, because you will run into errors that are only due to the
interaction of processes, such as deadlock; see section HPC book, section 2.6.3.6.
As an example, consider this segment of MPI code:
MPI_Init(0,0);
// set comm, ntids, mytid
for (int it=0; ; it++) {
  double randomnumber = ntids * ( rand() / (double)RAND_MAX );
  printf("[%d] iteration %d, random %e\n",mytid,it,randomnumber);
  if (randomnumber>mytid && randomnumber<mytid+1./(ntids+1))
    MPI_Finalize();
}
MPI_Finalize();
Each process computes random numbers until a certain condition is satisfied, then exits. However, con-
sider introducing a barrier (or something that acts like it, such as a reduction):
for (int it=0; ; it++) {
  double randomnumber = ntids * ( rand() / (double)RAND_MAX );
  printf("[%d] iteration %d, random %e\n",mytid,it,randomnumber);
  if (randomnumber>mytid && randomnumber<mytid+1./(ntids+1))
    MPI_Finalize();
  MPI_Barrier(comm);
}
MPI_Finalize();
Now the execution will hang, and this is not due to any particular process: each process has a code path
from init to finalize that does not develop any memory errors or other runtime errors. However as soon as
one process reaches the finalize call in the conditional it will stop, and all other processes will be waiting
at the barrier.
Figure 12.1 shows the main display of the Allinea DDT debugger (http://www.allinea.com/products/ddt)
at the point where this code stops. Above the source panel you see that there are 16 processes, and
that the status is given for process 1. In the bottom display you see that out of 16 processes 15 are calling
MPI_Barrier on line 19, while one is at line 18. In the right display you see a listing of the local variables:
the value specific to process 1. A rudimentary graph displays the values over the processors: the value of
ntids is constant, that of mytid is linearly increasing, and it is constant except for one process.
Exercise 12.1. Make and run ring_1a. The program does not terminate and does not crash. In
the debugger you can interrupt the execution, and see that all processes are executing a
receive statement. This is probably a case of deadlock. Diagnose and fix the error.
Exercise 12.2. The author of ring_1c was very confused about how MPI works. Run the pro-
gram. While it terminates without a problem, the output is wrong. Set a breakpoint at
the send and receive statements to figure out what is happening.
One low-budget approach is to start the debugger through mpirun, opening one xterm per process. This
creates a number of xterm windows, each of which executes the command line gdb ./program. Because
these xterms have been started with mpirun, they actually form a communicator.
Problem1 This program has every process independently generate random numbers, and if the number
meets a certain condition, stops execution. There is no problem with this code as such, so let’s suppose
you simply want to monitor its execution.
• Compile abort.c. Don’t forget about the -g -O0 flags; if you use the makefile they are included
automatically.
• Run the program with DDT; you’ll see that it concludes successfully.
• Set a breakpoint at the Finalize statement in the subroutine, by clicking to the left of the line
number. Now if you run the program you’ll get a message that all processes are stopped at a
breakpoint. Pause the execution.
• The ‘Stacks’ tab will tell you that all processes are at the same point in the code, but they are not in
fact in the same iteration.
• You can for instance use the ‘Input/Output’ tabs to see what every process has been doing.
• Alternatively, use the variables pane on the right to examine the it variable. You can do that
for individual processes, but you can also control click on the it variable and choose View as
Array. Set up the display as a one-dimensional array and check the iteration numbers.
• Activate the barrier statement and rerun the code. Make sure you have no breakpoints. Reason
that the code will not complete, but just hang.
• Hit the general Pause button. Now what difference do you see in the ‘Stacks’ tab?
Problem2 Compile problem1.c and run it in DDT. You’ll get a dialog warning about an error condition.
• Pause the program in the dialog. Notice that only the root process is paused. If you want to inspect
other processes, press the general pause button. Do this.
• In the bottom panel click on Stacks. This gives you the ‘call stack’, which tells you what the
processes were doing when you paused them. Where is the root process in the execution? Where
are the others?
• From the call stack it is clear what the error was. Fix it and rerun with File > Restart Session.
Problem2
Language interoperability
Most of the time, a program is written in a single language, but in some circumstances it is
necessary or desirable to mix sources in more than one language for a single executable. One such case
is when a library is written in one language, but used by a program in another. In such a case, the library
writer will probably have made it easy for you to use the library; this section is for the case that you find
yourself in the place of the library writer. We will focus on the common case of interoperability between
C/C++ and Fortran or Python.
This issue is complicated by the fact that both languages have been around for a long time, and various
recent language standards have introduced mechanisms to facilitate interoperability. However, there is
still a lot of old code around, and not all compilers support the latest standards. Therefore, we discuss
both the old and the new solutions.
13.1. C/Fortran interoperability
After compilation you can use nm to investigate the binary object file:
%% nm fprog.o
0000000000000000 T _foo_
....
%% nm cprog.o
0000000000000000 T _foo
....
You see that internally the foo routine has different names: the Fortran name has an underscore appended.
This makes it hard to call a Fortran routine from C, or vice versa. The possible name mismatches are:
• The Fortran compiler appends an underscore. This is the most common case.
• Sometimes it can append two underscores.
• Typically the routine name is lowercase in the object file, but uppercase is a possibility too.
Since C is a popular language to write libraries in, this means that the problem is often solved in the C
library by:
• Appending an underscore to all C function names; or
• Including a simple wrapper call:
int SomeCFunction(int i,float f)
{
// this is the actual function
}
int SomeCFunction_(int i,float f)
{
return SomeCFunction(i,f);
}
The complex data types in C/C++ and Fortran are compatible with each other. Here is an example of a C++
program linking to the BLAS complex vector scaling routine zscal.
// zscale.cxx
extern "C" {
void zscal_(int*,double complex*,double complex*,int*);
}
complex double *xarray,*yarray, scale=2.;
xarray = new double complex[n]; yarray = new double complex[n];
zscal_(&n,&scale,xarray,&ione);
%% ifort -c fbind.F90
%% nm fbind.o
.... T _s
.... C _x
use iso_c_binding
The latest version of Fortran, unsupported by many compilers at this time, has mechanisms for interfacing
to C.
• There is a module that contains named kinds, so that one can declare
INTEGER(KIND=C_SHORT) :: i
• Fortran pointers are more complicated objects, so passing them to C is hard; Fortran2003 has a
mechanism to deal with C pointers, which are just addresses.
• Fortran derived types can be made compatible with C structures.
If you compile this and inspect the output with nm you get:
$ gcc -c foochar.c && nm foochar.o | grep bar
0000000000000000 T _bar
That is, apart from a leading underscore the symbol name is clear.
On the other hand, the identical program compiled as C++ gives
$ g++ -c foochar.c && nm foochar.o | grep bar
0000000000000000 T __Z3barPc
Why is this? Well, because of polymorphism, and the fact that methods can be included in classes, you can
not have a unique linker symbol for each function name. Instead this mangled symbol includes enough
information to make the symbol unique.
You can retrieve the meaning of this mangled symbol a number of ways. First of all, there is a demangling
utility c++filt:
c++filt __Z3barPc
bar(char*)
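To give a C++ routine a C-style linker symbol, wrap its declaration in extern "C", guarded so that the
header is also valid C:
#ifdef __cplusplus
extern "C" {
#endif
.
.
#ifdef __cplusplus
}
#endif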
You again get the same linker symbols as for C, so that the routine can be called from both C and Fortran.
If your main program is in C, you can use the C++ compiler as linker. If the main program is in Fortran,
you need to use the Fortran compiler as linker. It is then necessary to link in extra libraries for the C++
system routines. For instance, with the Intel compiler -lstdc++ -lc needs to be added to the link line.
The use of extern is also needed if you link other languages to a C++ main program. For instance, a
Fortran subprogram foo should be declared as
extern "C" {
void foo_();
}
13.3 Strings
Programming languages differ widely in how they handle strings.
• In C, a string is an array of characters; the end of the string is indicated by a null character, that
is the ascii character zero, which has an all zero bit pattern. This is called null termination.
• In Fortran, a string is an array of characters. The length is maintained in an internal variable, which
is passed as a hidden parameter to subroutines.
• In Pascal, a string is an array with an integer denoting the length in the first position. Since only
one byte is used for this, strings can not be longer than 255 characters in Pascal.
As you can see, passing strings between different languages is fraught with peril. This situation is made
even worse by the fact that passing strings as subroutine arguments is not standard.
Example: the main program in Fortran passes a string
Program Fstring
character(len=5) :: word = "Word"
call cstring(word)
end Program Fstring
which produces:
length = 5
<<Word >>
Recently, the ‘C/Fortran interoperability standard’ has provided a systematic solution to this.
can not be called from Fortran. There is a hack to get around this (check out the Fortran77 interface to
the Petsc routine VecGetValues) and with more cleverness you can use POINTER variables for this.
1. With a bit of cleverness and the right compiler, you can have a program that says print *,7 and prints 8 because of this.
13.5 Input/output
Both languages have their own system for handling input/output, and it is not really possible to meet in
the middle. Basically, if Fortran routines do I/O, the main program has to be in Fortran. Consequently, it
is best to isolate I/O as much as possible, and use C for I/O in mixed language programming.
3. You need to declare what the types are of the C routines in python:
test_add = mylib.test_add
test_add.argtypes = [ctypes.c_float, ctypes.c_float]
test_add.restype = ctypes.c_float
test_passing_array = mylib.test_passing_array
test_passing_array.argtypes = [ctypes.POINTER(ctypes.c_int), ctypes.c_int]
test_passing_array.restype = None
13.6.1 Swig
Another way to let C and python interact is through Swig.
Let’s assume you have C code that you want to use from Python. First of all, you need to supply an
interface file for the routines you want to use.
Source file:
}
{
  time_t ltime;
  time(&ltime);
  return ctime(&ltime);
}
Interface file:
%module example
%{
/* Put header files here or function declarations like below */
extern double My_variable;
extern int fact(int n);
extern int my_mod(int x, int y);
extern char *get_time();
%}
You now use a combination of Swig and the regular compiler to generate the interface:
swig -python example.i
${TACC_CC} -c example.c example_wrap.c \
-g -fPIC \
-I${TACC_PYTHON_INC}/python3.9
ld -shared example.o example_wrap.o -o _example.so
13.6.2 Boost
Another way to let C and python interact is through the Boost library.
Let’s start with a C/C++ file that was written for some other purpose, and with no knowledge of Python
or interoperability tools:
char const* greet()
{
return "hello, world";
}
With it, you should have a .h header file with the function signatures.
Next, you write a C++ file that uses the Boost tools:
#include <boost/python.hpp>
#include "hello.h"
BOOST_PYTHON_MODULE(hello_ext)
{
using namespace boost::python;
def("greet", greet);
}
The crucial step is compiling both C/C++ files together into a dynamic library:
icpc -shared -o hello_ext.so hello_ext.o hello.o \
-Wl,-rpath,/pythonboost/lib -L/pythonboost/lib -lboost_python39 \
-Wl,-rpath,/python/lib -L/python/lib -lpython3
You can now import this library in python, giving you access to the C function:
import hello_ext
print(hello_ext.greet())
Bit operations
In most of this book we consider numbers, such as integer or floating point representations of real num-
bers, as our lowest building blocks. Sometimes, however, it is necessary to dig deeper and consider the
actual representation of such numbers in terms of bits.
Various programming languages have support for bit operations. We will explore the various options. For
details on C++ and Fortran, see Introduction to Scientific Programming book, section 5.2.1 and Introduction
to Scientific Programming book, section 30.7 respectively.
In C, printf has format specifiers for octal and hexadecimal representation, but there is no format specifier
for binary. Instead use the following bit of magic:
void printBits(size_t const size, void const * const ptr)
{
  unsigned char *b = (unsigned char*) ptr;
  unsigned char byte;
  for (int i=size-1; i>=0; i--) {
    for (int j=7; j>=0; j--) {
      byte = (b[i] >> j) & 1;
      printf("%u", byte);
    }
  }
}
/* ... */
printBits(sizeof(i),&i);
14.1.2 Python
• The python int function converts a string to int. A second argument can indicate what base the
string is to be interpreted in:
five = int('101',2)
maxint32 = int('0xffffffff',16)
that allocates Nbytes of memory, where the first byte has an address that is a multiple
of aligned_bits.
15. LaTeX for scientific documentation
Originally, the latex compiler would output a device independent file format, named dvi, which could
then be translated to PostScript or PDF, or directly printed. These days, many people use the pdflatex
program which directly translates .tex files to .pdf files. This has the big advantage that the generated
PDF files have automatic cross linking and a side panel with table of contents. An illustration is found
below.
Let us do a simple example.
\documentclass{article}
\begin{document}
Hello world!
\end{document}
Exercise 15.1. Create a text file minimal.tex with the content as in figure 15.1. Try the com-
mand pdflatex minimal or latex minimal. Did you get a file minimal.pdf in the
first case or minimal.dvi in the second case? Use a pdf viewer, such as Adobe Reader,
or dvips respectively to view the output.
Things to watch out for. If you make a typo, TeX can be somewhat unfriendly. If you get
an error message and TeX is asking for input, typing x usually gets you out, or Ctrl-C.
Some systems allow you to type e to go directly into the editor to correct the typo.
\begin{document}
\end{document}
The ‘documentclass’ line needs a class name in between the braces; typical values are ‘article’ or ‘book’.
Some organizations have their own styles, for instance ‘ieeeproc’ is for proceedings of the IEEE.
All document text goes between the \begin{document} and \end{document} lines. (Matched ‘begin’
and ‘end’ lines are said to denote an ‘environment’, in this case the document environment.)
The part before \begin{document} is called the ‘preamble’. It contains customizations for this particular
document. For instance, a command to make the whole document double spaced would go in the preamble.
If you are using pdflatex to format your document, you want a line
\usepackage{hyperref}
here.
Have you noticed the following?
• The backslash character is special: it starts a LaTeX command.
• The braces are also special: they have various functions, such as indicating the argument of a
command.
• The percent character indicates that everything to the end of the line is a comment.
Exercise 15.2. Create a file first.tex with the content of figure 15.1 in it. Type some text in
the preamble, that is, before the \begin{document} line and run pdflatex on your file.
Intended outcome. You should get an error message because you are not allowed to
have text in the preamble. Only commands are allowed there; all text has to go after
\begin{document}.
Exercise 15.3. Edit your document: put some text in between the \begin{document} and
\end{document} lines. Let your text have both some long lines that go on for a while,
and some short ones. Put superfluous spaces between words, and at the beginning or end
of lines. Run pdflatex on your document and view the output.
Intended outcome. You notice that the white space in your input has been collapsed in
the output. TEX has its own notions about what space should look like, and you do not
have to concern yourself with this matter.
Exercise 15.4. Edit your document again, cutting and pasting the paragraph, but leaving a blank
line between the two copies. Paste it a third time, leaving several blank lines. Format, and
view the output.
Intended outcome. TEX interprets one or more blank lines as the separation between
paragraphs.
Exercise 15.5. Add \usepackage{pslatex} to the preamble and rerun pdflatex on your
document. What changed in the output?
Intended outcome. This should have the effect of changing the typeface from the default
to Times Roman.
Things to watch out for. Typefaces are notoriously unstandardized. Attempts to use dif-
ferent typefaces may or may not work. Little can be said about this in general.
Add the following line before the first paragraph:
\section{This is a section}
and a similar line before the second. Format. You see that LATEX automatically numbers the sections, and
that it handles indentation differently for the first paragraph after a heading.
Exercise 15.6. Replace article by artikel3 in the documentclass declaration line and refor-
mat your document. What changed?
Intended outcome. There are many documentclasses that implement the same commands
as article (or another standard style), but that have their own layout. Your document
should format without any problem, but get a better looking layout.
Things to watch out for. The artikel3 class is part of most distributions these days,
but you can get an error message about an unknown documentclass if it is missing or
if your environment is not set up correctly. This depends on your installation. If the file
seems missing, download the files from http://tug.org/texmf-dist/tex/latex/ntgclass/
and put them in your current directory; see also section 15.2.9.
15.2.3 Math
Purpose. In this section you will learn the basics of math typesetting.
One of the goals of the original TEX system was to facilitate the setting of mathematics. There are two
ways to have math in your document:
• Inline math is part of a paragraph, and is delimited by dollar signs.
• Display math is, as the name implies, displayed by itself.
Exercise 15.7. Put $x+y$ somewhere in a paragraph and format your document. Put \[x+y\]
somewhere in a paragraph and format.
Intended outcome. Formulas between single dollars are included in the paragraph where
you declare them. Formulas between \[...\] are typeset in a display.
For display equations with a number, use an equation environment. Try this.
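As a minimal sketch of such a numbered display (the formula itself is made up purely for illustration), you could put the following in your document body:

```latex
\begin{equation}
  a^2 + b^2 = c^2
\end{equation}
```

LATEX supplies the equation number at the right margin; you do not type it yourself.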
Here are some common things to do in math. Make sure to try them out.
• Subscripts and superscripts: $x_i^2$. If the sub or superscript is more than a single symbol, it
needs to be grouped: $x_{i+1}^{2n}$. If you need a brace in a formula, use $\{ \}$.
15.2.4 Referencing
Purpose. In this section you will see TEX’s cross referencing mechanism in action.
So far you have not seen LATEX do much that would save you any work. The cross referencing mechanism
of LATEX will definitely save you work: any counter that LATEX inserts (such as section numbers) can be
referenced by a label. As a result, the reference will always be correct.
Start with an example document that has at least two section headings. After your first section heading,
put the command \label{sec:first}, and put \label{sec:other} after the second section heading.
These label commands can go on the same line as the section command, or on the next. Now put
As we will see in section~\ref{sec:other}.
in the paragraph before the second section. (The tilde character denotes a non-breaking space.)
Exercise 15.9. Make these edits and format the document. Do you see the warning about an
undefined reference? Take a look at the output file. Format the document again, and
check the output again. Do you have any new files in your directory?
Intended outcome. On a first pass through a document, the TEX compiler will gather all
labels with their values in a .aux file. The document will display a double question mark
for any references that are unknown. In the second pass the correct values will be filled
in.
Things to watch out for. If after the second pass there are still undefined references, you
probably made a typo. If you use the bibtex utility for literature references, you will
regularly need three passes to get all references resolved correctly.
Above you saw that the equation environment gives displayed math with an equation number. You can
add a label to this environment to refer to the equation number.
Exercise 15.10. Write a formula in an equation environment, and add a label. Refer to this
label anywhere in the text. Format (twice) and check the output.
Intended outcome. The \label and \ref commands are used in the same way for formulas
as for section numbers. Note that you must use \begin/end{equation} rather than
\[...\] for the formula.
15.2.5 Lists
Purpose. In this section you will see the basics of lists.
Exercise 15.11. Add some lists to your document, including nested lists. Inspect the output.
Intended outcome. Nested lists will be indented further and the labeling and numbering
style changes with the list depth.
Exercise 15.12. Add a label to an item in an enumerate list and refer to it.
Intended outcome. Again, the \label and \ref commands work as before.
15.2.7 Graphics
Since you can not immediately see the output of what you are typing, sometimes the output may come
as a surprise. That is especially so with graphics. LATEX has no standard way of dealing with graphics, but
the following is a common set of commands:
\usepackage{graphicx} % this line in the preamble
The figure can be in any of a number of formats, except that PostScript figures (with extension .ps or
.eps) can not be used if you use pdflatex.
Since your figure is often not the right size, the include line will usually have something like:
\includegraphics[scale=.5]{myfigure}
A bigger problem is that figures can be too big to fit on the page if they are placed where you declare them.
For this reason, they are usually treated as ‘floating material’. Here is a typical declaration of a figure:
\begin{figure}[ht]
\includegraphics{myfigure}
\caption{This is a figure.}
\label{fig:first}
\end{figure}
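The optional argument of the figure environment controls the placement. The [ht] above asks for the figure
to be placed here if possible, and otherwise at the top of a page. As another example, [hbp]
declares that the figure has to be placed here if possible, at the bottom of the page if that’s not
possible, and on a page of its own if it is too big to fit on a page with text.
The declaration further contains:
• A caption to be put under the figure, including a figure number;
• A label so that you can refer to the figure number by its label: figure~\ref{fig:first}.
• And of course the figure material.
There are various ways to fine-tune the figure placement. For instance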
\begin{center}
\includegraphics{myfigure}
\end{center}
and format your document two more times. There should now be a bibliography in it, and a correct
citation. You will also see that files mydocument.bbl and mydocument.blg have been created.
15.3.1 Listings
The ‘listings’ package makes it possible to include source code, with coloring and indentation
automatically taken care of.
\documentclass{article}

\usepackage[pdftex]{hyperref}
\usepackage{pslatex}

%%%%
%%%% Import the listings package
%%%%
\usepackage{listings,xcolor}

%%%%
%%%% Set a basic code style
%%%% (see documentation for more options}
%%%%
\lstdefinestyle{reviewcode}{
  belowcaptionskip=1\baselineskip, breaklines=true,
  frame=L,
  xleftmargin=\parindent, showstringspaces=false,
  basicstyle=\footnotesize\ttfamily,
  keywordstyle=\bfseries\color{blue},
  commentstyle=\color{red!60!black},
  identifierstyle=\slshape\color{black},
  stringstyle=\color{green!60!black}, columns=fullflexible,
  keepspaces=true,tabsize=8,
}
\lstset{style=reviewcode}

\lstset{emph={ %% MPI commands
    MPI_Init,MPI_Initialized,MPI_Finalize,MPI_Finalized,MPI_Abort,
    MPI_Comm_size,MPI_Comm_rank,
    MPI_Send,MPI_Isend,MPI_Rsend,MPI_Irsend,MPI_Ssend,MPI_Issend,
    MPI_Recv,MPI_Irecv,MPI_Mrecv,MPI_Sendrecv,MPI_Sendrecv_replace,
  },emphstyle={\color{red!70!black}\bfseries}
}
\lstset{emph={[2] %% constants
    MPI_COMM_WORLD,MPI_STATUS_IGNORE,MPI_STATUSES_IGNORE,MPI_STATUS_SIZE,
    MPI_INT,MPI_INTEGER,
  },emphstyle={[2]\color{green!40!black}}
}
\lstset{emph={[3] %% types
    MPI_Aint,MPI_Comm,MPI_Count,MPI_Datatype,MPI_Errhandl
  },emphstyle={[3]\color{yellow!30!brown}\bfseries},
}

\begin{document}
\title{SSC 335: listings demo}
\author{Victor Eijkhout}
\date{today}
\maketitle

\section{C examples}

\lstset{language=C}
\begin{lstlisting}
int main() {
  MPI_Init();
  MPI_Comm comm = MPI_COMM_WORLD;
  if (x==y)
    MPI_Send( &x,1,MPI_INT,0,0,comm);
  else
    MPI_Recv( &y,1,MPI_INT,1,1,comm,MPI_STATUS_IGNORE);
  MPI_Finalize();
}
\end{lstlisting}

\section{Fortran examples}

\lstset{language=Fortran}
\begin{lstlisting}
Program myprogram
  Type(MPI_Comm) :: comm = MPI_COMM_WORLD
  call MPI_Init()
  if (.not. x==y ) then
    call MPI_Send( x,1,MPI_INTEGER,0,0,comm);
  else
    call MPI_Recv( y,1,MPI_INTEGER,1,1,comm,MPI_STATUS_IGNORE)
  end if
  call MPI_Finalize()
End Program myprogram
\end{lstlisting}

\end{document}
[Figure: formatted output of a short LATEX demo document by Victor Eijkhout, showing two sections, two exercises, inline and displayed formulas including a numbered equation (1), an itemized list, a table of contents, a list of figures, and a reference list with two entries.]
You have seen how to include graphics files, but it is also possible to let LATEX do the drawing. For this,
there is the tikz package. Here we show another package pgfplots that uses tikz to draw numerical plots.
[Figure: output of the pgfplots demo, a document titled ‘Two graphs’ in which two plots are set beside lorem ipsum text. The first plot has axis labels ‘Students Name’ (Tom, Jack, Hary, Liza, Henry) and ‘#Average Marks’; the second is labeled ‘#Annual Growth Percentage’.]
[Figure: formatted output of the listings demo, with sections ‘1 C examples’ and ‘2 Fortran examples’, each showing the corresponding MPI code typeset by the listings package.]
Much of the teaching in this book is geared towards enabling you to write fast code, whether this is
through the choice of the right method, or through optimal coding of a method. Consequently, you sometimes
want to measure just how fast your code is. If you have a simulation that runs for many hours, you’d
think just looking at the clock would be enough measurement. However, as you wonder whether your
code could be faster than it is, you need more detailed measurements. This tutorial will teach you some
ways to measure the behavior of your code in more or less detail.
16.1 Timers
There are various ways of timing your code, but mostly they come down to calling a timer routine twice
that tells you the clock values:
tstart = clockticks()
....
tend = clockticks()
runtime = (tend-tstart)/ticks_per_sec
16.1.1 Fortran
For instance, in Fortran there is the system_clock routine:
implicit none
INTEGER :: rate, tstart, tstop
REAL :: time
real :: a
integer :: i
with output
Clock frequency: 10000
1.000000 813802544 813826097 2.000000
16.1.2 C
In C there is the clock function; a timed run gives output such as:
clock resolution: 1000000
res: 1.000000e+00
start/stop: 0.000000e+00,2.310000e+00
Time: 2.310000e+00
Do you see a difference between the Fortran and C approaches? Hint: what happens in both cases when
the execution time becomes long? At what point do you run into trouble?
16.1.3 C++
While C routines are available in C++, there is also a new chrono library that can do many things, including
handling different time formats.
#include <chrono>
#include <iostream>
// record the starting time
std::chrono::system_clock::time_point start_time;
start_time = std::chrono::system_clock::now();
// ... code ...
auto duration =
  std::chrono::system_clock::now()-start_time;
// convert to whole milliseconds, then report in seconds
auto millisec_duration =
  std::chrono::duration_cast<std::chrono::milliseconds>(duration);
std::cout << "Time in seconds: "
  << .001 * millisec_duration.count() << std::endl;
For more details, see Introduction to Scientific Programming book, section 24.8.
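Unix systems also provide the gettimeofday call:
#include <stddef.h>
#include <sys/time.h>
double time00(void)
{
  struct timeval tp;
  gettimeofday(&tp, NULL);
  /* seconds plus microseconds, combined into one double */
  return( (double) (tp.tv_sec + tp.tv_usec/1000000.0) ); /* wall clock time */
}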
These timers have the advantage that they can distinguish between user time and system time, that is,
exclusively timing program execution or giving wallclock time including all system activities.
However, this approach of using processor-specific timers is not portable. For this reason, the PAPI package
(http://icl.cs.utk.edu/papi/) provides a uniform interface to hardware counters. You can see
this package in action in the codes in appendix HPC book, section 31.
In addition to timing, hardware counters can give you information about such things as cache misses
and instruction counters. A processor typically has only a limited number of counters, but they can be
assigned to various tasks. Additionally, PAPI has the concept of derived metrics.
Barrier();
tstart = Wtime();
Barrier();
duration = Wtime()-tstart;
16.3.1 gprof
The profiler of the GNU compiler, gprof, requires recompilation and linking with an extra flag:
% gcc -g -pg -c ./srcFile.c
% gcc -g -pg -o MyProgram ./srcFile.o
16.3.2 perf
Coming with most Unix distributions, perf does not require any instrumentation.
Run:
perf record myprogram myoptions
perf record --call-graph fp myprogram myoptions
The display may be interactive; the following gives a pure ascii display, limited to events that amount to more
than one percent, and printing out only the columns of percentage and routine name:
perf report --stdio \
--percent-limit=1 \
--fields=Overhead,Symbol
Example:
+ 14.15% 4.07% fsm.exe fsm.exe [.] std::vector<richdem::dephier::
Depression<double>, std::allocator<richdem::dephier::Depression<double> > >::at
+ 8.92% 4.58% fsm.exe fsm.exe [.] std::vector<richdem::dephier::
Depression<double>, std::allocator<richdem::dephier::Depression<double> > >::
_M_range_check
This shows that 14% of the time is spent in indexing with at, and that more than half of that went into
the range checking.
For graphical output you can use vtune-gui. Rather than analyzing results, this lets you set up, run, and
analyze an application.
16.4 Tracing
In profiling we are only concerned with aggregate information: how many times a routine was called,
and with what total/average/min/max runtime. However sometimes we want to know about the exact
timing of events. This is especially relevant in a parallel context when we care about load unbalance and
idle time.
Tools such as Vampir can collect trace information about events and in particular messages, and render
them in displays such as figure 16.2.
TAU
The TAU tool [20] (see http://www.cs.uoregon.edu/research/tau/home.php for the official
documentation) uses instrumentation to profile and trace your code. That is, it adds profiling and trace calls
to your code. You can then inspect the output after the run.
Profiling is the gathering and displaying of bulk statistics, for instance showing you which routines take
the most time, or whether communication takes a large portion of your runtime. When you get concerned
about performance, a good profiling tool is indispensable.
Tracing is the construction and displaying of time-dependent information on your program run, for
instance showing you if one process lags behind others. For understanding a program’s behaviour, and the
reasons behind profiling statistics, a tracing tool can be very insightful.
• You can have the instrumentation added at compile time. For this, you need to let TAU take over
the compilation in some sense.
1. TAU has its own makefiles. The names and locations depend on your installation, but typically
it will be something like
export TAU_MAKEFILE=$TAU_HOME/lib/Makefile.tau-mpi-pdt
2. Now you can invoke the TAU compilers tau_cc.sh, tau_cxx.sh, tau_f90.sh.
When you run your program you need to tell TAU what to do:
export TAU_TRACE=1
export TAU_PROFILE=1
export TRACEDIR=/some/dir
export PROFILEDIR=/some/dir
17.2 Instrumentation
Unlike such tools as VTune which profile your binary as-is, TAU can work by adding instrumentation
to your code: in effect it is a source-to-source translator that takes your code and turns it into one that
generates run-time statistics.
This instrumentation is largely done for you; you mostly need to recompile your code with a script that
does the source-to-source translation, and subsequently compiles that instrumented code. You could for
instance have the following in your makefile:
ifdef TACC_TAU_DIR
CC = tau_cc.sh
else
CC = mpicc
endif
% : %.c
<TAB>${CC} -o $@ $^
If TAU is to be used (which we detect here by checking for the environment variable TACC_TAU_DIR), we
define the CC variable as one of the TAU compilation scripts; otherwise we set it to a regular MPI compiler.
Fortran note. Cpp includes If your source contains
#include "something.h"
Remark 18 The PETSc library can be compiled with TAU instrumentation enabled by adding the
--with-perfstubs-tau=1 option at configuration time.
17.3 Running
You can now run your instrumented code; trace/profile output will be written to file if the environment
variables TAU_PROFILE and/or TAU_TRACE are set:
export TAU_PROFILE=1
export TAU_TRACE=1
A TAU run can generate many files: typically at least one per process. It is therefore advisable to create a
directory for your tracing and profiling information. You declare these directories to TAU by setting the
environment variables PROFILEDIR and TRACEDIR.
mkdir tau_trace
mkdir tau_profile
export PROFILEDIR=tau_profile
export TRACEDIR=tau_trace
TACC note. At TACC, use ibrun without a processor count; the count is derived from the queue
submission parameters.
While this example uses two separate directories, there is no harm in using the same for both.
17.4 Output
The tracing/profiling information is spread over many files, and hard to read as such. Therefore, you need
some further programs to consolidate and display the information.
You view profiling information with paraprof
paraprof tau_profile
If you skip the tau_timecorrect step, you can generate the slog2 file by:
tau2slog2 tau.trc tau.edf -o yourprogram.slog2
17.6 Examples
17.6.1 Bucket brigade
Let’s consider a bucket brigade implementation of a broadcast: each process sends its data to the next
higher rank.
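/* neighbours in the brigade; the end points have no predecessor or successor */
int sendto =
  ( procno<nprocs-1 ? procno+1 : MPI_PROC_NULL )
  ;
int recvfrom =
  ( procno>0 ? procno-1 : MPI_PROC_NULL )
  ;
/* leftdata and myvalue are taken to be scalar doubles here */
MPI_Recv( &leftdata,1,MPI_DOUBLE,recvfrom,0,comm,MPI_STATUS_IGNORE);
myvalue = leftdata;
MPI_Send( &myvalue,1,MPI_DOUBLE,sendto,0,comm);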
We implement the bucket brigade with blocking sends and receives: each process waits to receive from
its predecessor, before sending to its successor.
// bucketblock.c
if (procno>0)
MPI_Recv(leftdata, N, MPI_DOUBLE,recvfrom,0, comm, MPI_STATUS_IGNORE);
for (int i=0; i<N; i++)
myvalue[i] = (procno+1)*(procno+1) + leftdata[i];
if (procno<nprocs-1)
MPI_Send(myvalue,N, MPI_DOUBLE,sendto,0, comm);
The TAU trace of this is in figure 17.1, using 4 nodes of 4 ranks each. We see that the processes within
each node are fairly well synchronized, but there is less synchronization between the nodes. However,
the bucket brigade then imposes its own synchronization on the processes because each has to wait for
its predecessor, no matter if it posted the receive operation early.
Next, we introduce pipelining into this operation: each send is broken up into parts, and these parts are
sent and received with non-blocking calls.
// bucketpipenonblock.c
MPI_Request rrequests[PARTS];
for (int ipart=0; ipart<PARTS; ipart++) {
MPI_Irecv
(
leftdata+partition_starts[ipart],partition_sizes[ipart],
MPI_DOUBLE,recvfrom,ipart,comm,rrequests+ipart);
}
!! cgb.f
do i = 1, l2npcols
call mpi_irecv( d,
> 1,
> dp_type,
> reduce_exch_proc(i),
> i,
> mpi_comm_world,
> request,
> ierr )
call mpi_send( sum,
> 1,
> dp_type,
> reduce_exch_proc(i),
> i,
> mpi_comm_world,
> ierr )
sum = sum + d
enddo
We recognize this structure in the TAU trace: figure 17.3. Upon closer examination, we see how this
particular algorithm induces a lot of wait time. Figures 17.5 and 17.6 show a whole cascade of processes
Figure 17.6: Four stages of processes waiting caused by a single lagging process
SLURM
Supercomputer clusters can have a large number of nodes, but not enough to let all their users run
simultaneously, and at the scale that they want. Therefore, users are asked to submit jobs, which may start
executing immediately, or may have to wait until resources are available.
The decision when to run a job, and what resources to give it, is not done by a human operator, but by
software called a batch system. (The Stampede cluster at TACC ran close to 10 million jobs over its lifetime,
which corresponds to starting a job every 20 seconds.)
This tutorial will cover the basics of such systems, and in particular Simple Linux Utility for Resource
Management (SLURM).
18.2 Queues
Jobs often can not start immediately, because not enough resources are available, or because other jobs
may have higher priority (see section 18.7). It is thus typical for a job to be put on a queue, scheduled, and
started, by a batch system such as SLURM.
Batch systems do not put all jobs in one big pool: jobs are submitted to any of a number of queues, that
are all scheduled separately.
Queues can differ in the following ways:
• If a cluster has different processor types, those are typically in different queues. Also, there may
be separate queues for the nodes that have a Graphics Processing Unit (GPU) attached. Having
multiple queues means you have to decide what processor type you want your job to run on,
even if your executable is binary compatible with all of them.
• There can be ‘development’ queues, which have restrictive limits on runtime and node count, but
where jobs typically start faster.
• Some clusters have ‘premium’ queues, which have a higher charge rate, but offer higher priority.
• ‘Large memory nodes’ are typically also in a queue of their own.
• There can be further queues for jobs with large resource demands, such as large core counts, or
longer-than-normal runtimes.
For slurm, the sinfo command can tell you much about the queues.
# what queues are there?
sinfo -o "%P"
# what queues are there, and what is their status?
sinfo -o "%20P %.5a"
Exercise 18.2. Enter these commands. How many queues are there? Are they all operational at
the moment?
All options regarding the job run are contained in the script file, as we will now discuss.
As a result of your job submission you get a job id. After submission you can query your job with squeue:
squeue -j 123456
The squeue command reports various aspects of your job, such as its status (typically pending or running);
and if it is running, the queue (or ‘partition’) where it runs, its elapsed time, and the actual nodes where
it runs.
squeue -j 5807991
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
5807991 development packingt eijkhout R 0:04 2 c456-[012,034]
If you discover errors in your script after submitting it, including when it has started running, you can
cancel your job with scancel:
scancel 1234567
Common options (except parallelism related options which are discussed in section 18.5) are:
• -J: the jobname. This will be displayed when you call squeue.
• -o: name of the output file. This will contain all the stdout output of the script.
• -e: name of the error file. This will contain all the stderr output of the script, as well as slurm
error messages.
It can be a good idea to make the output and error file unique per job. To this purpose, the macro
%j is available, which at execution time expands to the job number. You will then get an output
file with a name such as myjob.o2384737.
• -p: the partition or queue. See above.
• -t hh:mm:ss: the maximum running time. If your job exceeds this, it will get cancelled. Two
considerations:
1. You can not specify a duration here that is longer than the queue limit.
2. The shorter your job, the more likely it is to get scheduled sooner rather than later.
• -w c452-[101-104,111-112,115] specific nodes to place the job.
• -A: the name of the account to which your job should be billed.
• --mail-user=you@where Slurm can notify you when a job starts or ends. You may for instance
want to connect to a job when it starts (to run top), or inspect the results when it’s done, but not
sit and stare at your terminal all day. The action of which you want to be notified is specified with
(among others) --mail-type=begin/end/fail/all
• --dependency=after:1234567 indicates that this job is to start only after job 1234567 has started. Use
afterany to wait until that job has finished, or afterok to start only if that job finished successfully.
(See https://cvw.cac.cornell.edu/slurm/submission_depend for more options.)
• --nodelist allows you to specify specific nodes. This can be good for getting reproducible
timings, but it will probably increase your wait time in the queue.
• --array=0-30 is a specification for ‘array jobs’: a task that needs to be executed for a range of
parameter values.
TACC note. Array jobs are not supported at TACC; use a launcher instead; see section 18.5.3.
• --mem=10000 specifies the desired amount of memory per node. Default units are megabytes,
but can be explicitly indicated with K/M/G/T.
TACC note. This option can not be used to request arbitrary memory: jobs always have access
to all available physical memory, and use of shared memory is not allowed.
See https://slurm.schedmd.com/sbatch.html for a full list.
Exercise 18.3. Write a script that executes the date command twice, with a sleep in between.
Submit the script and investigate the output.
18.4.2 Environment
Your job script acts like any other shell script when it is executed. In particular, it inherits the calling
environment with all its environment variables. Additionally, slurm defines a number of environment
variables, such as the job ID, the hostlist, and the node and process count.
It would be possible to specify only the node count or the core count, but that takes away flexibility:
• If a node has 40 cores, but your program stops scaling at 10 MPI ranks, you would use:
#SBATCH -N 1
#SBATCH -n 10
• If your processes use a large amount of memory, you may want to leave some cores unused. On
a 40-core node you would either use
#SBATCH -N 2
#SBATCH -n 40
or
#SBATCH -N 1
#SBATCH -n 20
Rather than specifying a total core count, you can also specify the core count per node with --ntasks-per-node.
Exercise 18.4. Go through the above examples and replace the -n option by an equivalent
--ntasks-per-node value.
Python note. Python MPI programs. Python programs using mpi4py should be treated like other MPI
programs, except that instead of an executable name you specify the python executable and the
script name:
ibrun python3 mympi4py.py
You can then ssh into the compute nodes of your job; normally, compute nodes are off-limits. This is
useful if you want to run top to see how your processes are doing.
18.9 Examples
Very sketchy section.
#!/bin/sh
you get the hostname of the login node from which your job was submitted.
Exercise 18.10. Which of these are shared with other users when your job is running:
• Memory;
• CPU;
• Disc space?
Exercise 18.11. What is the command for querying the status of your job?
• sinfo
• squeue
• sacct
Exercise 18.12. On 4 nodes with 40 cores each, what’s the largest program run, measured in
• MPI ranks;
• OpenMP threads?
SimGrid
Many readers of this book will have access to some sort of parallel machine so that they can run
simulations, maybe even some realistic scaling studies. However, not many people will have access to more
than one cluster type so that they can evaluate the influence of the interconnect. Even then, for didactic
purposes one would often wish for interconnect types (fully connected, linear processor array) that are
unlikely to be available.
In order to explore architectural issues pertaining to the network, we then resort to a simulation tool,
SimGrid.
Installation
Compilation You write plain MPI files, but compile them with the SimGrid compiler smpicc.
Running SimGrid has its own version of mpirun: smpirun. You need to supply this with options:
• -np 123456 for the number of (virtual) processors;
• -hostfile simgridhostfile which lists the names of these processors. You can basically
make these names up, but they have to be the ones defined in:
• -platform arch.xml which defines the connectivity between the processors.
For instance, with a hostfile of 8 hosts, a linearly connected network would be defined as:
<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid/simgrid.dtd">
<platform version="4">
<host id="host4" speed="1Mf"/>
<host id="host5" speed="1Mf"/>
<host id="host6" speed="1Mf"/>
<host id="host7" speed="1Mf"/>
<host id="host8" speed="1Mf"/>
<link id="link1" bandwidth="125MBps" latency="100us"/>
<!-- the routing: specify how the hosts are interconnected -->
<route src="host1" dst="host2"><link_ctn id="link1"/></route>
<route src="host2" dst="host3"><link_ctn id="link1"/></route>
<route src="host3" dst="host4"><link_ctn id="link1"/></route>
<route src="host4" dst="host5"><link_ctn id="link1"/></route>
<route src="host5" dst="host6"><link_ctn id="link1"/></route>
<route src="host6" dst="host7"><link_ctn id="link1"/></route>
<route src="host7" dst="host8"><link_ctn id="link1"/></route>
</zone>
</platform>
Bibliography
[1] Alfred V. Aho, Brian W. Kernighan, and Peter J. Weinberger. The Awk Programming Language.
Addison-Wesley Series in Computer Science. Addison-Wesley Publ., 1988. ISBN 020107981X,
9780201079814. [Cited on page 37.]
[2] L.S. Blackford, J. Choi, A. Cleary, E. D’Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling,
G. Henry, A. Petitet, K. Stanley, D. Walker, and R.C. Whaley. ScaLAPACK Users’ Guide. SIAM, 1997.
[Cited on page 122.]
[3] Netlib.org BLAS reference implementation. http://www.netlib.org/blas. [Cited on page 122.]
[4] Yaeyoung Choi, Jack J. Dongarra, Roldan Pozo, and David W. Walker. Scalapack: a scalable linear
algebra library for distributed memory concurrent computers. In Proceedings of the fourth symposium
on the frontiers of massively parallel computation (Frontiers ’92), McLean, Virginia, Oct 19–21, 1992,
pages 120–127, 1992. [Cited on page 122.]
[5] Edsger W. Dijkstra. Programming as a discipline of mathematical nature. Am. Math. Monthly, 81:608–
612, 1974. [Cited on page 151.]
[6] Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Iain Duff. A set of level 3 basic linear
algebra subprograms. ACM Transactions on Mathematical Software, 16(1):1–17, March 1990. [Cited on
page 122.]
[7] Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Richard J. Hanson. An extended set of
FORTRAN basic linear algebra subprograms. ACM Transactions on Mathematical Software, 14(1):1–
17, March 1988. [Cited on page 122.]
[8] Dale Dougherty and Arnold Robbins. sed & awk. O’Reilly Media, 2nd edition edition. Print ISBN:
978-1-56592-225-9 , ISBN 10:1-56592-225-5; Ebook ISBN: 978-1-4493-8700-6, ISBN 10:1-4493-8700-4.
[Cited on page 37.]
[9] Victor Eijkhout. The Science of TEX and LATEX. lulu.com, 2012. [Cited on page 41.]
[10] C. A. R. Hoare. An axiomatic basis for computer programming. Communications of the ACM, pages
576–580, October 1969. [Cited on page 151.]
[11] Helmut Kopka and Patrick W. Daly. A Guide to LATEX. Addison-Wesley, first published 1992. [Cited on
page 196.]
[12] L. Lamport. LATEX, a Document Preparation System. Addison-Wesley, 1986. [Cited on page 196.]
[13] C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh. Basic linear algebra subprograms for
fortran usage. ACM Trans. Math. Softw., 5(3):308–323, September 1979. [Cited on page 122.]
226
[14] Robert Mecklenburg. Managing Projects with GNU Make. O’Reilly Media, 3rd edition edition, 2004.
Print ISBN:978-0-596-00610-5 ISBN 10:0-596-00610-1 Ebook ISBN:978-0-596-10445-0 ISBN 10:0-596-
10445-6. [Cited on page 57.]
[15] Sandra Mendez, Sebastian Lührs, Volker Weinberg, Dominic Sloan-Murphy, and Andrew
Turner. Best practice guide - parallel i/o. https://prace-ri.eu/training-support/
best-practice-guides/best-practice-guide-parallel-io/, 02 2019. [Cited on page 139.]
[16] Frank Mittelbach, Michel Goossens, Johannes Braams, David Carlisle, and Chris Rowley. The LATEX
Companion, 2nd edition. Addison-Wesley, 2004. [Cited on page 196.]
[17] NASA Advaned Supercomputing Division. NAS parallel benchmarks. https://www.nas.nasa.
gov/publications/npb.html. [Cited on page 209.]
[18] Tobi Oetiker. The not so short introduction to LATEX. http://tobi.oetiker.ch/lshort/. [Cited on
pages 184 and 196.]
[19] Jack Poulson, Bryan Marker, Jeff R. Hammond, and Robert van de Geijn. Elemental: a new framework
for distributed memory dense matrix computations. ACM Transactions on Mathematical Software.
submitted. [Cited on page 122.]
[20] S. Shende and A. D. Malony. International Journal of High Performance Computing Applications,
20:287–331, 2006. [Cited on page 205.]
[21] TEX frequently asked questions. [Cited on page 196.]
[22] R. van de Geijn, Philip Alpatov, Greg Baker, Almadena Chtchelkanova, Joe Eaton, Carter Edwards,
Murthy Guddati, John Gunnels, Sam Guyer, Ken Klimkowski, Calvin Lin, Greg Morrow, Peter Nagel,
James Overfelt, and Michelle Pal. Parallel linear algebra package (PLAPACK): Release r0.1 (beta)
users’ guide. 1996. [Cited on page 122.]
[23] Robert A. van de Geijn. Using PLAPACK: Parallel Linear Algebra Package. The MIT Press, 1997. [Cited
on page 122.]
[24] Greg Wilson, D. A. Aruliah, C. Titus Brown, Neil P. Chue Hong, Matt Davis, Richard T. Guy, Steven
H. D. Haddock, Kathryn D. Huff, Ian M. Mitchell, Mark D. Plumbley, Ben Waugh, Ethan P. White,
and Paul Wilson. Best practices for scientific computing. PLOS Biology, 12(1):1–7, 01 2014. [Cited on
page 6.]
List of acronyms
MGS Modified Gram-Schmidt
ML Machine Learning
MPI Message Passing Interface
MPL Message Passing Library
MSI Modified-Shared-Invalid
MTA Multi-Threaded Architecture
MTSP Multiple Traveling Salesman Problem
NUMA Non-Uniform Memory Access
ODE Ordinary Differential Equation
OO Object-Oriented
OOP Object-Oriented Programming
OS Operating System
PGAS Partitioned Global Address Space
PDE Partial Differential Equation
PRAM Parallel Random Access Machine
RDMA Remote Direct Memory Access
RNG Random Number Generator
SAN Storage Area Network
SAS Software As a Service
SCS Shortest Common Superset
SFC Space-Filling Curve
SGD Stochastic Gradient Descent
SIMD Single Instruction Multiple Data
SIMT Single Instruction Multiple Thread
SLURM Simple Linux Utility for Resource Management
SM Streaming Multiprocessor
SMP Symmetric Multi Processing
SMT Simultaneous Multi-Threading
SOA Structure-Of-Arrays
SOR Successive Over-Relaxation
SSOR Symmetric Successive Over-Relaxation
SP Streaming Processor
SPMD Single Program Multiple Data
SPD Symmetric Positive Definite
SRAM Static Random-Access Memory
SSE Streaming SIMD Extensions
SSSP Single Source Shortest Path
STL Standard Template Library
TBB Threading Building Blocks (Intel)
TDD Test-Driven Development
TLB Translation Look-aside Buffer
TSP Traveling Salesman Problem
UB Undefined Behavior
UMA Uniform Memory Access
UPC Unified Parallel C
WAN Wide Area Network
ISBN 978-1-257-99254-6