Introduction to LAMMPS
Stan Moore
SC22 Student Cluster Competition
Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525. SAND2022-12475 PE
About Me
§ Stan Moore
§ One of the LAMMPS code developers at Sandia National
Laboratories in Albuquerque, New Mexico
§ Been at Sandia for 10 years
§ Main developer of the KOKKOS package in LAMMPS (runs
on GPUs and multi-core CPUs)
§ Expertise in long-range electrostatics
§ PhD in Chemical Engineering, dissertation on molecular
dynamics method development for predicting chemical
potential
2
Molecular Dynamics
§ Classical molecular dynamics (MD) models atom behavior
using Newton's laws of motion
§ Normally use an empirical expression for forces (no explicit
electrons)
§ Atom positions → forces → velocities → new positions
§ Spherical cutoff gives O(N) linear scaling, can simulate billions
of atoms on a supercomputer
3
LAMMPS Overview
§ Large-scale Atomic/Molecular Massively Parallel Simulator
§ https://lammps.org
§ Open source, C++ code
§ Bio, materials, mesoscale
§ Particle simulator at varying length and time scales
§ Electrons → atomistic → coarse-grained → continuum
§ Spatial-decomposition of simulation domain for parallelism
§ Energy minimization, dynamics, non-equilibrium MD
§ GPU and OpenMP enhanced
§ Can be coupled to other scales: QM, kMC, FE, CFD, …
4
Statistical Mechanics Basics
§ Statistical Mechanics: relates macroscopic observations
(such as temperature and pressure) to microscopic
states (i.e. atoms)
§ Phase space: a space in which all possible states of a
system are represented. For N particles: 6N-dimensional
phase space (3 position variables and 3 momentum
variables for each particle)
§ Ensemble: an idealization consisting of a large number of
virtual copies of a system, considered all at once, each of
which represents a possible state that the real system
might be in, i.e. a probability distribution for the state of
the system
Molecular Dynamics: What is it?
[Diagram: initial positions and velocities + interatomic potential → positions and velocities at many later times]
Mathematical Formulation
§ Classical mechanics
§ Atoms are point masses: r1, r2, ..., rN
§ Positions, velocities, forces: ri, vi, Fi
§ Potential energy function V(rN)
§ Newton's equations (6N coupled ODEs):
$$\frac{d\mathbf{r}_i}{dt} = \mathbf{v}_i, \qquad \frac{d\mathbf{v}_i}{dt} = \frac{\mathbf{F}_i}{m_i}, \qquad \mathbf{F}_i = -\frac{\partial}{\partial \mathbf{r}_i} V(\mathbf{r}^N)$$
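In practice these equations are integrated with a finite timestep Δt; plain dynamics in LAMMPS (fix nve) uses the velocity Verlet scheme:
$$\mathbf{v}_i\!\left(t+\tfrac{\Delta t}{2}\right) = \mathbf{v}_i(t) + \tfrac{\Delta t}{2}\,\frac{\mathbf{F}_i(t)}{m_i}, \qquad \mathbf{r}_i(t+\Delta t) = \mathbf{r}_i(t) + \Delta t\,\mathbf{v}_i\!\left(t+\tfrac{\Delta t}{2}\right), \qquad \mathbf{v}_i(t+\Delta t) = \mathbf{v}_i\!\left(t+\tfrac{\Delta t}{2}\right) + \tfrac{\Delta t}{2}\,\frac{\mathbf{F}_i(t+\Delta t)}{m_i}$$
where the new forces F_i(t+Δt) are evaluated after the position update.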
7
MD Versatility
[Figure: example application areas: materials science, coupling to solid mechanics, chemistry, biophysics, granular flow]
8
MD Time & Length Scales
§ Quantum mechanical electronic structure calculations (QM) provide an accurate description of mechanical and chemical changes on the atom scale, but are limited to ~1000 atoms
§ Atom-scale phenomena drive a lot of interesting physics, chemistry, materials science, mechanics, biology... but it usually plays out on a much larger scale
§ Mesoscale: much bigger than an atom, much smaller than a glass of soda
§ QM and continuum/mesoscale models (CM) cannot be directly compared; large-scale MD can bridge the gap
[Figure: time scale (10^-15 s to 10^-6 s) vs. length scale (Å to mm), spanning QM → large-scale MD simulation → mesoscale models → continuum]
Picture of soda glass: by Simon Cousins from High Wycombe, England - Bubbles, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=23020999
9
MD Basics
[Figure: 2D triclinic simulation cell]
§ Atoms can be modeled as points
(most common), finite-size spheres,
or other shapes (e.g. ellipsoids)
§ Can model atomic-scale (all-atom
model) or meso/continuum scale
with MD-like models
§ Typically use an orthogonal or
triclinic (skewed) simulation cell
§ Commonly use periodic boundary
conditions: reduces finite size
effects from boundaries and
simulates bulk conditions
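As a minimal sketch (the box dimensions and xy tilt below are arbitrary illustrative values), a periodic triclinic cell can be set up in a LAMMPS input with:
# sketch: fully periodic triclinic (skewed) cell; values are illustrative
units lj
boundary p p p
region box prism 0 10 0 10 0 10 2.0 0.0 0.0
create_box 1 box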
10
Simple Example: Crack
11
Interatomic Potentials
§ Quantum chemistry: solves Schrödinger equation (electron
interactions) to get forces on atoms. Accurate but very
computationally expensive and only feasible for small systems:
~1000 atoms
§ Molecular dynamics: uses empirical force fields, sometimes fit to
quantum data. Not as accurate but much faster
§ MD typically only considers pair-wise or three-body interactions,
scales as O(N) (billion atom simulations are considered huge)
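The simplest empirical pair potential, and the one shown in the figure below, is the Lennard-Jones 12-6 form:
$$V(r) = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]$$
where ε is the well depth and σ the particle diameter; the r^-12 term gives the repulsive wall and the r^-6 term the attractive tail.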
Lennard-Jones Potential
[Figure: interaction energy vs. pair-wise distance, showing the repulsive wall at short range and the attractive tail at long range]
12
Accuracy = Higher Cost
Moore’s Law for Interatomic Potentials
Plimpton and Thompson, MRS Bulletin (2012).
13
ReaxFF Potential
• Reactive model that captures bond breaking and formation
via bond order parameter
• Many complex equations required to capture physics
• Includes charge equilibration model (QEq)
$$BO_{ij} = \exp\left[\rho_{\sigma}\left(\frac{r_{ij}}{r_{0}^{\sigma}}\right)^{p_{\sigma}}\right] + \exp\left[\rho_{\pi}\left(\frac{r_{ij}}{r_{0}^{\pi}}\right)^{p_{\pi}}\right] + \exp\left[\rho_{\pi\pi}\left(\frac{r_{ij}}{r_{0}^{\pi\pi}}\right)^{p_{\pi\pi}}\right]$$
$$E_{system} = E_{bond} + E_{lone\ pair} + E_{over\ coord} + E_{under\ coord} + E_{valence\ angle} + E_{torsion} + E_{conjugation} + E_{H\ bonds} + E_{vdW} + E_{coulomb}$$
Chenoweth, van Duin, and Goddard, J. Phys. Chem. A 112, 1040-1053 (2008)
Strachan et al., J. Chem. Phys. 122, 054502 (2005)
Castro-Marcano et al., Combustion and Flame 159(3), 1272-1285 (2012)
Wood, van Duin, and Strachan, J. Phys. Chem. A 118, 885 (2014)
Rappe and Goddard, J. Phys. Chem. 95, 3358-3363 (1991)
14
SNAP Machine Learning Potential
§ Machine-learned interatomic potentials (IAPs) have three critical parts:
• Descriptors of the local environment
• Energy and force functions expressed in the descriptors
• Training (regression method) on a large amount of 'ground truth' energies and forces
§ Demonstrated ab initio accuracy in classical MD!
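As a sketch of how a SNAP potential is invoked in LAMMPS (the files below are the tantalum example potential distributed in the LAMMPS potentials directory):
# sketch: SNAP tantalum potential from the LAMMPS potentials directory
pair_style snap
pair_coeff * * Ta06A.snapcoeff Ta06A.snapparam Ta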
[Figures: training data fit with FitSNAP; SNAP performance on a V100 GPU, ~32x speedup in 3 years]
15
Neighbor Lists
§ Neighbor lists are a list of neighboring atoms within the
interaction cutoff + skin for each central atom
§ Extra skin allows lists to be built less often
[Figure: central atom with its interaction cutoff and surrounding skin distance]
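The skin distance and rebuild frequency are controlled by the neighbor and neigh_modify commands; a sketch using the settings from the bench/in.lj script:
# sketch: 0.3 sigma skin, binned neighbor lists, rebuild every 20 steps without distance check
neighbor 0.3 bin
neigh_modify every 20 delay 0 check no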
16
Newton Option
§ Setting the newton flag to off means that if two interacting atoms are on
different processors, both processors compute their
interaction and the resulting force information is not
communicated
§ Setting the newton flag to on saves computation but
increases communication
§ Performance depends on problem size, force cutoff lengths, a
machine’s compute/communication ratio, and how many
processors are being used
§ Newton off typically better for GPUs
newton on
newton off
17
Half Neighbor List
§ With newton flag on, each pair is stored only once (usually
better for CPUs), requires atomic operations for thread-safety
18
Full Neighbor List
§ Each pair stored twice which doubles computation but
reduces communication and doesn’t require atomic
operations for thread safety (can be faster on GPUs)
19
MPI Parallelization Approach
§ Domain decomposition: each processor owns a portion of the
simulation domain and atoms therein
[Figure: 2D simulation domain split into a 2x2 grid of processor subdomains (procs 1-4)]
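LAMMPS chooses the processor grid automatically, but it can be forced with the processors command; a sketch (the 2x2x1 grid is just an example for a 4-rank run):
# sketch: force a 2x2x1 processor grid
processors 2 2 1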
20
Ghost Atoms
§ The processor domain is also extended to include needed ghost
atoms (copies of atoms located on other processors)
§ Communicated via MPI (message passing interface)
[Figure: proc 1 subdomain showing its local atoms and the surrounding ghost atoms]
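Ghost atoms normally extend out to the largest pairwise cutoff plus the neighbor skin; if a larger ghost region is needed, the communication cutoff can be extended explicitly, as a sketch (the value 12.0 is arbitrary):
# sketch: extend the ghost-atom communication cutoff to 12.0 distance units
comm_modify cutoff 12.0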
21
Load-balancing
§ Balance (static) and fix balance (dynamic) commands
§ “shift” style operates by adjusting planar cuts between
processors
§ Works well for 1D density variations
§ solid/gas or liquid/gas interfaces
§ Less well for general 2D/3D variations
22
2D and 3D Load-balancing
§ “rcb” style is a tiling method, works better for 2D and 3D
variations
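As a sketch of both approaches (thresholds and intervals are illustrative, not recommendations):
# sketch: one-time static rebalance using planar cuts in x
balance 1.1 shift x 10 1.1
# sketch: alternative, dynamic rebalancing every 1000 steps with recursive coordinate bisectioning (rcb)
comm_style tiled
fix lb all balance 1000 1.1 rcb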
23
Molecular Topology
§ Bonds: constrained length between two atoms
§ Angles: constrained angle between three atoms
§ Dihedrals: interactions between quadruplets of atoms
§ Impropers: “improper” interactions between quadruplets of
atoms
bond_style harmonic
angle_style charmm
dihedral_style charmm
improper_style harmonic
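Each style also needs coefficients, set in the data file or with *_coeff commands; a sketch using harmonic styles with made-up values:
# sketch: harmonic bond (K, r0) and harmonic angle (K, theta0); values are illustrative
bond_style harmonic
bond_coeff 1 340.0 1.09
angle_style harmonic
angle_coeff 1 50.0 109.5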
24
Long-Range Electrostatics
§ Truncation doesn’t work well for charged systems due to
long-ranged nature of Coulombic interactions
§ Use Kspace style to add long-range electrostatics. PPPM
method usually fastest, uses FFTs
§ Specify a relative accuracy (e.g. 1e-4)
§ Use pair_style *coul/long such as
lj/cut/coul/long instead of *coul/cut
§ Can vary Coulomb cutoff length and get the same answer
pair_style lj/cut/coul/long 10.0
kspace_style pppm 1e-4
25
Basic MD Timestep
§ During each timestep (without a neighbor list build):
1. Initial integrate
2. MPI communication
3. Compute forces (pair, bonds, kspace, etc.)
4. Additional MPI communication (if newton flag on)
5. Final integrate
6. Output (if requested on this timestep)
*Computation of diagnostics (fixes or computes) can be
scattered throughout the timestep
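A minimal Lennard-Jones input, similar in spirit to bench/in.lj (values are illustrative), with comments mapping commands to the steps above:
# sketch: minimal LJ melt, loosely following bench/in.lj
units lj
atom_style atomic
lattice fcc 0.8442
region box block 0 10 0 10 0 10
create_box 1 box
create_atoms 1 box
mass 1 1.0
velocity all create 1.44 87287
pair_style lj/cut 2.5            # step 3: pair force computation
pair_coeff 1 1 1.0 1.0 2.5
neighbor 0.3 bin                 # neighbor list skin; lists rebuild as needed
fix 1 all nve                    # steps 1 and 5: velocity Verlet integration
thermo 100                       # step 6: thermodynamic output every 100 steps
run 1000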
26
LAMMPS Files
§ Input file: text file with LAMMPS commands used to run a
simulation
§ Log file: text file with thermodynamic output from simulation
§ Dump file: snapshots of per-atom properties, e.g. positions and forces
§ Restart file: binary checkpoint file with data needed to restart
simulation
§ Data file: text file that can be used to start or restart
simulation
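These files map onto specific commands; a sketch (the file names are placeholders):
# sketch: commands that produce these files; file names are placeholders
log log.melt                                              # thermodynamic output also written here
dump 1 all custom 100 dump.melt id type x y z fx fy fz    # dump snapshot every 100 steps
restart 1000 tmp.restart                                  # binary checkpoint every 1000 steps
write_data data.final                                     # write a text data file after the run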
27
Downloading LAMMPS
§ Github (https://github.com/lammps/lammps)
§ https://github.com/lammps/lammps/releases
§ Clone or download button, then download zip file
§ git clone … (beyond this tutorial)
§ LAMMPS Website (https://lammps.org)
§ Go to “download” link
§ Download gzipped tar file
§ Stable version: rigorous testing
§ Development version: still tested, but not as rigorously; has the latest
features, performance optimizations, and bug fixes (but could
also have new bugs)
28
Compiling LAMMPS
§ https://docs.lammps.org/Build.html
§ Need C++11 compiler (GNU, Intel, Clang)
§ Need MPI library, or can use the “STUBS” library
§ Many Makefiles in src/MAKE
§ LAMMPS also has CMake interface
29
Running LAMMPS
§ https://docs.lammps.org/Run_basics.html
§ Basic syntax: [executable] -in [input_script]
§ In serial:
./lmp_serial -in in.lj
§ In parallel:
mpirun -np 2 lmp_mpi -in in.lj
§ Many other command line options, see
https://docs.lammps.org/Run_options.html
30
Optional Packages
§ https://docs.lammps.org/Packages_list.html
§ LAMMPS is very modular and has several optional packages
§ For example, SNAP potential needs ML-SNAP package
installed
Traditional Make:
make yes-ml-snap
make no-ml-snap
CMake:
-D PKG_ML-SNAP=yes
31
Accelerator Packages
§ https://docs.lammps.org/Speed_packages.html
§ Some hardware, like GPUs and multithreaded CPUs, requires special code (e.g. OpenMP, CUDA) to take full advantage of the hardware
§ LAMMPS has 5 accelerator packages:
§ OPENMP
§ INTEL
§ OPT
§ GPU
§ KOKKOS
32
Running OPT Package
§ Compile LAMMPS with OPT package
§ Run with 8 MPI ranks:
mpiexec -np 8 ./lmp_exe -in in.lj -sf opt
§ -sf opt is the suffix command: automatically appends
/opt onto anything it can
§ For example, pair_style lj/cut automatically
becomes pair_style lj/cut/opt (no changes to
input file needed)
§ https://docs.lammps.org/suffix.html
33
OPENMP Package
§ https://docs.lammps.org/Speed_omp.html
§ Uses OpenMP to enable multithreading on CPUs
§ MPI parallelization in LAMMPS is almost always more
effective than OpenMP on CPUs
§ When running with MPI across multi-core nodes, MPI often
suffers from communication bottlenecks, so using
MPI+OpenMP per node could be faster
§ The more nodes per job and the more cores per node, the
more pronounced the bottleneck and the larger the benefit
from MPI+OpenMP
§ OPENMP package may vectorize (SIMD) better than vanilla
code
34
Running OPENMP Package
§ Compile LAMMPS with OPENMP package
§ Run with 2 MPI and 2 OpenMP threads:
export OMP_NUM_THREADS=2
mpiexec -np 2 ./lmp_exe -in in.lj -sf omp
35
INTEL Package
§ https://docs.lammps.org/Speed_intel.html
§ Allows code to vectorize and run well on Intel CPUs (with or
without OpenMP threading)
§ Can also be used in conjunction with the OPENMP package
§ Normally best performance out of all accelerator packages for
CPUs
§ Supports reduced precision: mixed FP64+FP32 or pure single
FP32
36
Running INTEL Package
§ Compile LAMMPS with INTEL package
§ To run using 2 MPI ranks and 2 threads on an Intel CPU:
mpiexec -np 2 ./lmp_exe -in in.lj -pk intel 0 omp 2 mode double -sf intel
§ -pk is the package command that sets package options, see
https://docs.lammps.org/package.html
37
GPU Package
§ https://docs.lammps.org/Speed_gpu.html
§ Designed for one or more GPUs coupled to many CPU cores
§ Only pair runs on GPU, fixes/bonds/computes run on CPU
§ Atom-based data (e.g. coordinates, forces) move back and
forth between the CPU(s) and GPU every timestep
§ Asynchronous force computations can be performed
simultaneously on the CPU(s) and GPU if using Kspace
§ Provides NVIDIA and more general OpenCL support
§ Supports reduced precision: mixed FP64+FP32 or pure single
FP32
38
Running GPU Package
§ Compile GPU library found in lib/gpu
§ Compile LAMMPS with GPU package
§ Run with 16 MPI ranks and 4 GPUs:
mpiexec -np 16 ./lmp_exe -in in.lj -sf gpu -pk gpu 4
§ Best to use CUDA MPS (Multi-Process Service) if using
multiple MPI ranks per GPU
§ Automatically overlaps pair-style on GPU with Kspace on CPU
39
Kokkos
§ Abstraction layer between programmer and next-generation
platforms
§ Allows the same C++ code to run on multiple hardware (GPU,
CPU)
§ Kokkos consists of two main parts:
1. Parallel dispatch—threaded kernels are launched and mapped onto
backend languages such as CUDA or OpenMP
2. Kokkos views—polymorphic memory layouts that can be optimized
for a specific hardware
§ Used on top of existing MPI parallelization (MPI + X)
§ See https://kokkos.github.io/kokkos-core-wiki for more info
40
LAMMPS KOKKOS Package
§ https://docs.lammps.org/Speed_kokkos.html
§ Need C++14 compiler
§ Supports OpenMP and GPUs
§ Designed so that everything (pair, fixes, computes, etc.) runs
on the GPU, minimal data transfer from GPU to CPU
§ GPU performance penalty if kernel isn’t ported to Kokkos
§ Only double precision FP64 support
§ Package options can toggle full and half neighbor list, newton
on/off, etc.
-pk kokkos newton on neigh half
§ https://docs.lammps.org/package.html
41
Running Kokkos Package
§ Compile LAMMPS with the KOKKOS package
§ Run with 4 MPI ranks and 4 GPUs:
mpiexec -np 4 ./lmp_exe -in in.lj -k on g 4 -sf kk
§ Run with 4 OpenMP threads:
./lmp_exe -in in.lj -k on t 4 -sf kk
42
Processor and Thread Affinity
§ Use mpirun command-line arguments (e.g. --bind-to
core) to control how MPI tasks and threads are assigned to
nodes and cores
§ Also use OpenMP variables such as OMP_PROC_BIND and
OMP_PLACES
§ One must also pay attention to NUMA bindings between
tasks, cores, and GPUs. For example, for a dual-socket system,
MPI tasks driving GPUs should be on the same socket as the
GPU
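One possible combination for a hybrid MPI+OpenMP CPU run (a sketch; --bind-to is Open MPI syntax and the exact flags vary between MPI implementations):
export OMP_NUM_THREADS=4
export OMP_PROC_BIND=spread
export OMP_PLACES=cores
mpiexec -np 8 --bind-to socket ./lmp_exe -in in.lj -sf omp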
43
Lennard-Jones Benchmark
§ lammps/bench/in.lj
§ Simple pair-wise model
§ Similar to argon liquid/gas
44
Measuring Performance
§ At the end of a run, LAMMPS prints a timing breakdown (Pair, Neigh, Comm, Modify, Output, etc.) to the screen and log file
§ For the KOKKOS package on GPUs, the timing breakdown won't be accurate without CUDA_LAUNCH_BLOCKING=1 (but that prevents kernel overlap and could slow down the simulation)
45
Performance of Different Potentials
46
Parallel MD Performance
§ MD parallelizes well: major parts of timestep (forces, neighbor list build,
time integration) can be done in parallel through domain decomposition
§ High communication overhead when strong scaling to a few hundred atoms/processor (depends on the cost of the force field)
§ Strong scaling: hold system size fixed while increasing processor count (#
of atoms/processor decreases)
§ Weak scaling: increase system size in proportion to increasing processor
count (# of atoms/processor remains constant)
§ For perfect strong scaling, doubling the processor count cuts the
simulation time in half
§ For perfect weak scaling, the simulation time stays exactly the same when
doubling the processor count
§ Harder to maintain parallel efficiency with strong scaling because the
compute time decreases relative to the communication time
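These statements can be quantified with the usual parallel-efficiency definitions, where $T_P$ is the time to solution on $P$ processors:
$$\eta_{\text{strong}} = \frac{T_1}{P\,T_P}, \qquad \eta_{\text{weak}} = \frac{T_1}{T_P} \ \ (\text{with problem size } N \propto P)$$
For example, perfect strong scaling from 8 to 16 processors means the time drops from $T_8$ to $T_8/2$, i.e. an efficiency of 1.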
47
Visualization Resources
§ LAMMPS “dump image” command:
https://docs.lammps.org/dump_image.html
§ VMD: https://www.ks.uiuc.edu/Research/vmd/
§ OVITO: https://www.ovito.org/about/ovito-pro/
48
Getting Help
§ Look at LAMMPS documentation, latest version here:
https://docs.lammps.org/Manual.html
§ Search the MatSci LAMMPS forum archives
https://matsci.org/lammps or join and post new questions
§ LAMMPS reference paper: gives an overview of the code
including its parallel algorithms, design features,
performance, and brief highlights of many of its materials
modeling capabilities
https://doi.org/10.1016/j.cpc.2021.108171
49
Thank You
Questions?
50