Test Descriptions and Options
All command-line options mentioned in the test descriptions are applicable to the ClusterKit binary (see Running ClusterKit).
Bandwidth Test (-d bw)
The bandwidth test utilizes nonblocking MPI_Isend and MPI_Irecv calls.
Options:
Iterations: -b <iters>, --biters=<iters> (default: 16)
Message Size: -B <size>, --bsize=<size> (default: 32 MB)
Unidirectional: -U, --unidirectional (send data in one direction only; default is bidirectional)
Tolerance: -u <tol>, --btol=<tol> (specify tolerance; see ClusterKit Evaluation Logic for Pairwise Tests)
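The measurement pattern this implies can be sketched as follows. This is illustrative C/MPI code, not ClusterKit's source; the NITERS and MSGSIZE constants simply mirror the -b and -B defaults, and the even/odd rank pairing is an assumption.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NITERS  16                    /* mirrors the -b default */
#define MSGSIZE (32 * 1024 * 1024)    /* mirrors the -B default: 32 MB */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int peer = rank ^ 1;              /* pair even/odd ranks (assumed pairing) */

    char *sendbuf = malloc(MSGSIZE), *recvbuf = malloc(MSGSIZE);
    MPI_Request reqs[2];

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < NITERS; i++) {
        /* bidirectional by default: each rank both sends and receives */
        MPI_Irecv(recvbuf, MSGSIZE, MPI_CHAR, peer, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(sendbuf, MSGSIZE, MPI_CHAR, peer, 0, MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    }
    double secs = MPI_Wtime() - t0;

    if (rank == 0)
        printf("bandwidth: %.2f MB/s\n", (double)NITERS * MSGSIZE / secs / 1e6);
    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}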
Latency Test (-d lat)
The latency test is performed with a series of MPI_Send and MPI_Recv calls, where one partner sends a message to the other, which then sends a message back. This process is repeated <iters> times.
Options:
Iterations: -l <iters>, --liters=<iters> (default: 1024)
Message Size: -L <size>, --lsize=<size> (default: 0 bytes)
Tolerance: -t <tol>, --ltol=<tol> (specify tolerance; see ClusterKit Evaluation Logic for Pairwise Tests)
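A minimal ping-pong sketch of the pattern just described (illustrative only; the even/odd send-first convention and the zero-byte payload mirroring the -L default are assumptions):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int peer = rank ^ 1, iters = 1024;    /* mirrors the -l default */
    char buf[1];                          /* payload size 0, as with -L 0 */

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank % 2 == 0) {              /* one partner sends first... */
            MPI_Send(buf, 0, MPI_CHAR, peer, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, 0, MPI_CHAR, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else {                          /* ...the other replies */
            MPI_Recv(buf, 0, MPI_CHAR, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, 0, MPI_CHAR, peer, 0, MPI_COMM_WORLD);
        }
    }
    /* one-way latency = round-trip time / 2 */
    double lat_us = (MPI_Wtime() - t0) / iters / 2 * 1e6;
    if (rank == 0)
        printf("latency: %.2f us\n", lat_us);
    MPI_Finalize();
    return 0;
}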
GPU-GPU Latency Test (-d gpu_gpu_lat)
Measures latency of GPU-to-GPU communication with MPI_Isend and MPI_Irecv.
Options:
Iterations: -k <iters>, --gpulati=<iters> (default: 1024)
Message Size: -K <size>, --gpulats=<size> (default: 0 bytes)
Tolerance: -t <tol>, --ltol=<tol> (specify tolerance; see ClusterKit Evaluation Logic for Pairwise Tests)
Per-GPU test: -z, --bygpu (test corresponding GPU pairs: GPU0-to-GPU0, GPU1-to-GPU1, etc.)
Use GPUDIRECT: -G, --gpudirect (use GPUDIRECT; default is to copy from GPU memory to host)
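The -G option selects between two transfer paths. A hedged sketch of both follows, assuming a CUDA-aware MPI for the GPUDIRECT path; send_from_gpu and its arguments are illustrative names, not ClusterKit's API:

#include <mpi.h>
#include <cuda_runtime.h>

/* dbuf: device buffer; hbuf: host staging buffer */
void send_from_gpu(void *dbuf, void *hbuf, int bytes, int peer, int use_gpudirect) {
    if (use_gpudirect) {
        /* -G: hand the device pointer straight to MPI (GPUDIRECT) */
        MPI_Send(dbuf, bytes, MPI_CHAR, peer, 0, MPI_COMM_WORLD);
    } else {
        /* default path: stage the data through host memory first */
        cudaMemcpy(hbuf, dbuf, bytes, cudaMemcpyDeviceToHost);
        MPI_Send(hbuf, bytes, MPI_CHAR, peer, 0, MPI_COMM_WORLD);
    }
}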
GPU-GPU Bandwidth Test (-d gpu_gpu_bw)
Measures bandwidth of GPU-to-GPU communication with MPI_Isend and MPI_Irecv.
Options:
Iterations: -a <iters>, --gpubwi=<iters> (default: 64)
Message Size: -A <size>, --gpubws=<size> (default: 1 MB)
Tolerance: -u <tol>, --btol=<tol> (specify tolerance; see ClusterKit Evaluation Logic for Pairwise Tests)
Per-GPU test: -z, --bygpu (test corresponding GPU pairs from different nodes: GPU0-to-GPU0, GPU1-to-GPU1, etc.)
Use GPUDIRECT: -G, --gpudirect (use GPUDIRECT; default is to copy from GPU memory to host)
NCCL GPU-GPU Bandwidth Test (-d nccl_bw)
Measures bandwidth of GPU-to-GPU communication with NCCL communication primitives.
Options:
Iterations: -a <iters>, --gpubwi=<iters> (default: 64)
Message Size: -A <size>, --gpubws=<size> (default: 1 MB)
Tolerance: -u <tol>, --btol=<tol> (specify tolerance; see ClusterKit Evaluation Logic for Pairwise Tests)
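The core of such a measurement might look like the following sketch, which assumes an already initialized ncclComm_t and CUDA stream (setup elided) and uses NCCL's point-to-point primitives (available since NCCL 2.7). nccl_bw_loop is an illustrative name, not ClusterKit's code:

#include <nccl.h>
#include <cuda_runtime.h>

/* returns MB/s for one direction's traffic */
double nccl_bw_loop(void *sendbuf, void *recvbuf, size_t bytes, int peer,
                    ncclComm_t comm, cudaStream_t stream, int iters) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, stream);
    for (int i = 0; i < iters; i++) {
        ncclGroupStart();                 /* fuse send and recv into one operation */
        ncclSend(sendbuf, bytes, ncclChar, peer, comm, stream);
        ncclRecv(recvbuf, bytes, ncclChar, peer, comm, stream);
        ncclGroupEnd();
    }
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);

    float ms;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return (double)bytes * iters / (ms / 1e3) / 1e6;
}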
NCCL GPU-GPU Latency Test (-d nccl_lat)
Measures latency of GPU-to-GPU communication with NCCL communication primitives.
Options:
Iterations: -k <iters>, --gpulati=<iters> (default: 1024)
Message Size: -K <size>, --gpulats=<size> (default: 0 bytes)
Collective Tests
Collective tests perform selected collective operations across all nodes in a defined scope.
Types of tests (each is passed as the argument to the -d option):
barrier
allreduce
bcast
alltoall
Options:
Iterations: -n <iters>, --niter=<iters> (default: 10000)
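A collective test reduces to timing the chosen operation in a loop. A minimal allreduce sketch (illustrative only, with the -n default hard-coded):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, iters = 10000;              /* mirrors the -n default */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    double in = (double)rank, out;

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++)
        MPI_Allreduce(&in, &out, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double avg_us = (MPI_Wtime() - t0) / iters * 1e6;

    if (rank == 0)
        printf("allreduce: %.2f us per call\n", avg_us);
    MPI_Finalize();
    return 0;
}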
NCCL Collective Tests
Performs NCCL collective operations among nodes in the same scope.
Types of Tests:
nccl_bcast
nccl_allreduce
nccl_reduce
nccl_allgather
nccl_reducescatter
Options:
Iterations: -n <iters>, --niter=<iters> (default: 10000)
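For context, an NCCL collective needs a communicator whose ncclUniqueId is typically exchanged over MPI. The following sketch runs nccl_allreduce-style traffic; the one-GPU-per-rank mapping and buffer size are assumptions, and error checking is omitted:

#include <mpi.h>
#include <nccl.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    int ndev;
    cudaGetDeviceCount(&ndev);
    cudaSetDevice(rank % ndev);           /* assumed one-GPU-per-rank mapping */

    ncclUniqueId id;
    if (rank == 0) ncclGetUniqueId(&id);
    MPI_Bcast(&id, sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD);

    ncclComm_t comm;
    ncclCommInitRank(&comm, nranks, id, rank);
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    size_t n = 1 << 20;                   /* element count is illustrative */
    float *dbuf;
    cudaMalloc((void **)&dbuf, n * sizeof(float));

    for (int i = 0; i < 10000; i++)       /* mirrors the -n default */
        ncclAllReduce(dbuf, dbuf, n, ncclFloat, ncclSum, comm, stream);
    cudaStreamSynchronize(stream);

    cudaFree(dbuf);
    ncclCommDestroy(comm);
    MPI_Finalize();
    return 0;
}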
Bisectional Bandwidth Test (-d bisect_bw)
Measures bisectional bandwidth by having corresponding nodes in different scopes communicate at the same time, which exposes potential interference between them.
Options:
Iterations: -b <iters>, --biters=<iters> (default: 16)
Message Size: -B <size>, --bsize=<size> (default: 32 MB)
Unidirectional: -U, --unidirectional (send data in one direction only)
Scope Order: --scope_order=<scope_order> (file that sets the order of scopes for testing)
Scope Order File Format: The file consists of lines formatted as follows:
<pass_num>,<scope1>,<scope2>
Example:
1,scope01,scope02
1,scope03,scope04
2,scope02,scope03
3,scope01,scope04
3,scope02,scope03
This instructs ClusterKit to execute three passes, testing the specified scope pairs in each pass.
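A few lines of C suffice to read the format. This is a hypothetical parser (the scope_order.txt filename is illustrative), not ClusterKit's own:

#include <stdio.h>

int main(void) {
    FILE *f = fopen("scope_order.txt", "r");
    int pass;
    char s1[64], s2[64];
    /* each line: <pass_num>,<scope1>,<scope2> */
    while (f && fscanf(f, "%d,%63[^,],%63s", &pass, s1, s2) == 3)
        printf("pass %d: %s <-> %s\n", pass, s1, s2);
    if (f)
        fclose(f);
    return 0;
}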
Memory Bandwidth Test (-d mb)
The memory bandwidth test can be conducted with one of the following operations:
ADD: a[i] = b[i] + c[i]
COPY: a[i] = b[i]
SCALE: a[i] = D * b[i]
TRIAD: a[i] = b[i] + D * c[i]
Options:
Iterations: -I <iters>, --mbiters=<iters> (default: 16)
Array Size: -I <size>, --mbsize=<size> (default: 4 * L3 cache size)
Test Type: -m <type>, --memtest=add|copy|scale|triad (default: TRIAD)
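These are the classic STREAM kernels; the default TRIAD measurement can be sketched as below. The array length here is illustrative (ClusterKit defaults to four times the L3 cache size):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)   /* illustrative array length */

int main(void) {
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    const double D = 3.0;
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < N; i++)
        a[i] = b[i] + D * c[i];           /* TRIAD */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    /* TRIAD touches three arrays: two reads plus one write */
    printf("%.2f MB/s\n", 3.0 * N * sizeof(double) / secs / 1e6);
    free(a); free(b); free(c);
    return 0;
}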
Effective Bandwidth Ordered Test (-d beff_o)
Rings of doubling size (2, 4, 8, and so on) are formed, and messages are passed in one direction around each ring based on rank ordering.
Options:
Iterations: -e <iters>, --beffi=<iters> (default: 512)
Message Size: -E <size>, --beffs=<size> (default: 32 MB)
Tolerance: -u <tol>, --btol=<tol> (specify tolerance; nodes with results worse than max * tolerance are considered "bad")
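One pass around a ring can be sketched with MPI_Sendrecv. Ring construction (subcommunicators of size 2, 4, 8, ...) is elided, and ring_pass is an illustrative name:

#include <mpi.h>

void ring_pass(char *sendbuf, char *recvbuf, int bytes, MPI_Comm ring, int iters) {
    int rank, size;
    MPI_Comm_rank(ring, &rank);
    MPI_Comm_size(ring, &size);
    int next = (rank + 1) % size;               /* messages flow one way */
    int prev = (rank + size - 1) % size;

    for (int i = 0; i < iters; i++)
        MPI_Sendrecv(sendbuf, bytes, MPI_CHAR, next, 0,
                     recvbuf, bytes, MPI_CHAR, prev, 0,
                     ring, MPI_STATUS_IGNORE);
}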
Effective Bandwidth Random Test (-d beff_or)
Similar to the ordered test, but rings are created randomly.
Options:
Iterations: -e <iters>, --beffi=<iters> (default: 512)
Message Size: -E <size>, --beffs=<size> (default: 32 MB)
Tolerance: -u <tol>, --btol=<tol> (specify tolerance; nodes with results worse than max * tolerance are considered "bad")
GPU Memory Bandwidth Test (-d gpumb)
Measures bandwidth for host-to-GPU and GPU-to-host memory transfers.
Options:
Iterations: -j <iters>, --gpumbi=<iters> (default: 16)
Message Size: -J <size>, --gpumbs=<size> (default: 0 bytes)
Tolerance: -u <tol>, --btol=<tol> (specify tolerance; nodes with results worse than max * tolerance are considered "bad")
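The host-to-GPU direction can be sketched as follows; the 64 MB transfer size and pinned host allocation are assumptions, and the GPU-to-host direction just swaps the cudaMemcpy arguments:

#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    size_t bytes = 64u * 1024 * 1024;     /* illustrative transfer size */
    int iters = 16;                       /* mirrors the -j default */
    char *h, *d;
    cudaMallocHost((void **)&h, bytes);   /* pinned host memory (assumed) */
    cudaMalloc((void **)&d, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < iters; i++)
        cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms;
    cudaEventElapsedTime(&ms, start, stop);
    printf("host-to-GPU: %.2f MB/s\n", (double)bytes * iters / (ms / 1e3) / 1e6);

    cudaFreeHost(h);
    cudaFree(d);
    return 0;
}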
GPU Neighbor Latency Test (-d gpu_neighbor_lat)
A restricted variant of the GPU-GPU latency test that measures communication only between GPUs on neighboring nodes.
Options:
Iterations: -k <iters>, --gpulati=<iters> (default: 1024)
Message Size: -K <size>, --gpulats=<size> (default: 0 bytes)
Use GPUDIRECT: -G, --gpudirect (use GPUDIRECT; default is to copy from GPU memory to host)
GPU Neighbor Bandwidth Test (-d gpu_neighbor_bw)
A restricted variant of the GPU-GPU bandwidth test that measures communication only between GPUs on neighboring nodes.
Options:
Iterations: -a <iters>, --gpubwi=<iters> (default: 64)
Message Size: -A <size>, --gpubws=<size> (default: 1 MB)
Use GPUDIRECT: -G, --gpudirect (use GPUDIRECT; default is to copy from GPU memory to host)