Benchmarks

The benchmark results were produced by the scripts in examples/benchmarks, e.g.:

examples/benchmarks/generate_configuration.sh lennard_jones
examples/benchmarks/run_benchmark.sh lennard_jones

The Tesla GPUs had ECC enabled; no overclocking or other tuning was done.

Simple Lennard-Jones fluid in 3 dimensions

Parameters:

64,000 particles, number density \(\rho = 0.4\sigma^{-3}\)

force: lennard_jones (\(r_c = 3\sigma, r_\text{skin} = 0.7\sigma\))

integrator: verlet (NVE, \(\delta t^* = 0.002\))
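
As a rough illustration of these parameters, the sketch below computes the edge length of a box holding 64,000 particles at a reduced number density of 0.4 and evaluates a Lennard-Jones pair potential truncated at \(r_c\). The cubic box geometry, the plain truncation without an energy shift, and the function names are assumptions made for this example; they are not taken from the benchmark scripts.

```python
# Reduced Lennard-Jones units: sigma = epsilon = 1.

def box_length(n_particles, density):
    """Edge length of a cubic box (assumed geometry) at the given number density."""
    return (n_particles / density) ** (1.0 / 3.0)

def lj_potential(r, epsilon=1.0, sigma=1.0, r_cut=3.0):
    """Lennard-Jones pair potential, zero beyond r_cut * sigma (no shift assumed)."""
    if r >= r_cut * sigma:
        return 0.0
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

L = box_length(64000, 0.4)
print(f"box edge length: {L:.2f} sigma")
print(f"U at the potential minimum: {lj_potential(2 ** (1 / 6)):.3f} epsilon")
```

With a neighbour list, pairs are collected up to \(r_c + r_\text{skin}\), so the list stays valid until particles have moved far enough to cross the skin.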

Hardware	time per MD step and particle	steps per second	FP precision	compilation details
Intel Xeon E5-2637v4	750 ns	20.9	double	GCC 8.3.0, -O3
NVIDIA H100	3.4 ns	4546	double-single	CUDA 11.2, -arch compute_80
	3.3 ns	4666	single	CUDA 11.2, -arch compute_80
NVIDIA A40	3.5 ns	4436	double-single	CUDA 11.5, -arch compute_80
	2.9 ns	5412	single	CUDA 11.5, -arch compute_80
NVIDIA A100	4.6 ns	3371	double-single	CUDA 11.5, -arch compute_80
	3.9 ns	4028	single	CUDA 11.5, -arch compute_80
NVIDIA Tesla V100-PCI	5.6 ns	2790	double-single	CUDA 9.2, -arch compute_61
	5.1 ns	3020	single	CUDA 9.2, -arch compute_61
NVIDIA GeForce RTX 2080 S	5.9 ns	2640	double-single	CUDA 9.2, -arch compute_61
	5.6 ns	2810	single	CUDA 9.2, -arch compute_61
NVIDIA GeForce RTX 2070	7.7 ns	2030	double-single	CUDA 9.2, -arch compute_61
	7.0 ns	2220	single	CUDA 9.2, -arch compute_61

Results were obtained from 1 independent measurement and are based on release version 1.0.0. Each run consisted of NVT equilibration at \(T^*=1.2\) over \(\Delta t^*=100\) (10⁴ steps), followed by benchmarking 10⁴ NVE steps 5 times in a row.
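
The two timing columns of the table are related through the particle number: the wall time per step is the time per step and particle multiplied by the number of particles, and the step rate is its inverse. A small sketch of this conversion, with function names chosen only for illustration:

```python
def steps_per_second(time_per_step_and_particle_ns, n_particles):
    """Step rate implied by the wall time per MD step and particle (in ns)."""
    time_per_step_s = time_per_step_and_particle_ns * 1e-9 * n_particles
    return 1.0 / time_per_step_s

def time_per_step_and_particle_ns(rate, n_particles):
    """Wall time per MD step and particle (in ns) implied by a step rate."""
    return 1e9 / (rate * n_particles)

# CPU row of the table: 750 ns per step and particle, 64,000 particles
rate = steps_per_second(750, 64000)
print(f"{rate:.1f} steps per second")
```

Small deviations from the tabulated step rates are expected, because the times in the table are rounded to two significant digits.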

Supercooled binary mixture (Kob-Andersen)

Parameters:

256,000 particles, number density \(\rho = 1.2\sigma^{-3}\)

force: lennard_jones with 2 particle species (80% \(A\), 20% \(B\))

(\(\epsilon_{AA}=1\), \(\epsilon_{AB}=1.5\), \(\epsilon_{BB}=0.5\), \(\sigma_{AA}=1\), \(\sigma_{AB}=0.8\), \(\sigma_{BB}=0.88\), \(r_c = 2.5\sigma\), \(r_\text{skin} = 0.3\sigma\), neighbour list occupancy: 70%)

integrator: verlet (NVE, \(\delta t^* = 0.001\))
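
The species-dependent interaction parameters listed above can be organised as a small lookup table. The sketch below is only an illustration: it assumes that the cutoff of 2.5 is measured in units of each pair's own \(\sigma_{\alpha\beta}\) and that the potential is truncated without a shift, neither of which is stated here, and all names are hypothetical.

```python
# Kob-Andersen parameters as listed above, in units of epsilon_AA and sigma_AA.
EPSILON = {("A", "A"): 1.0, ("A", "B"): 1.5, ("B", "B"): 0.5}
SIGMA = {("A", "A"): 1.0, ("A", "B"): 0.8, ("B", "B"): 0.88}
R_CUT = 2.5  # assumption: in units of the pair's own sigma

def pair_key(a, b):
    """Order the species labels so that (B, A) maps to the (A, B) entry."""
    return tuple(sorted((a, b)))

def ka_potential(r, a, b):
    """Truncated Lennard-Jones potential between species a and b at distance r."""
    key = pair_key(a, b)
    eps, sig = EPSILON[key], SIGMA[key]
    if r >= R_CUT * sig:
        return 0.0
    sr6 = (sig / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

def species_counts(n_particles, fraction_a=0.8):
    """Number of A and B particles for the given composition."""
    n_a = round(n_particles * fraction_a)
    return n_a, n_particles - n_a

print(species_counts(256000))
print(ka_potential(1.0, "A", "B"))
```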

Hardware	time per MD step and particle	steps per second	FP precision	compilation details
Intel Xeon E5-2637v4	744 ns	5.25	double	GCC 10.2.1, -O3
NVIDIA H100	1.79 ns	2178	double-single	CUDA 11.2, -arch compute_80
	1.74 ns	2238	single	CUDA 11.2, -arch compute_80
NVIDIA A40	2.56 ns	1528	double-single	CUDA 11.5, -arch compute_80
	2.26 ns	1728	single	CUDA 11.5, -arch compute_80
NVIDIA A100	3.10 ns	1260	double-single	CUDA 11.5, -arch compute_80
	2.91 ns	1343	single	CUDA 11.5, -arch compute_80
NVIDIA Tesla V100-PCI	3.83 ns	1020	double-single	CUDA 9.2, -arch compute_61
	3.65 ns	1070	single	CUDA 9.2, -arch compute_61
NVIDIA GeForce RTX 2080 S	5.17 ns	755	double-single	CUDA 9.2, -arch compute_61
	4.87 ns	802	single	CUDA 9.2, -arch compute_61
NVIDIA GeForce RTX 2070	6.63 ns	589	double-single	CUDA 9.2, -arch compute_61
	6.28 ns	621	single	CUDA 9.2, -arch compute_61

Results were obtained from 1 independent measurement and are based on release version 1.0.0. Each run consisted of NVT equilibration at \(T^*=0.7\) over \(\Delta t^*=100\) (2×10⁴ steps), followed by benchmarking 10⁴ NVE steps 5 times in a row.
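
The measurement protocol described above, several consecutive blocks of a fixed number of steps, can be mimicked with a generic timing loop. The sketch below is not the benchmark script itself: `advance` is a hypothetical callable standing in for one MD step, and timing with `time.perf_counter` is a choice made for this example.

```python
import time

def benchmark(advance, n_particles, steps_per_block=10_000, blocks=5):
    """Time `blocks` consecutive blocks of `steps_per_block` calls to `advance`.

    Returns one tuple (ns per step and particle, steps per second) per block.
    """
    results = []
    for _ in range(blocks):
        start = time.perf_counter()
        for _ in range(steps_per_block):
            advance()
        elapsed = time.perf_counter() - start
        time_per_step = elapsed / steps_per_block
        results.append((1e9 * time_per_step / n_particles, 1.0 / time_per_step))
    return results

# Usage with a dummy step function (placeholder for a real MD step)
counter = {"steps": 0}

def dummy_step():
    counter["steps"] += 1

for ns, rate in benchmark(dummy_step, n_particles=256_000, steps_per_block=1000):
    print(f"{ns:.3e} ns per step and particle, {rate:.3e} steps per second")
```

When the step function launches GPU kernels asynchronously, the device has to be synchronised before the clock is read; otherwise only the launch overhead is measured.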

Variant “tiny”

This benchmark tests an alternative implementation of the pair force calculation that uses loop unrolling. It is particularly suited to systems with a small number of particles.

Parameters:

4,096 particles; all other parameters are as above

neighbour lists are constructed directly, without binning to Verlet cells

re-ordering of particle data in memory is disabled

double-single floating point precision is enabled
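
To illustrate what constructing neighbour lists directly, without cell binning, amounts to, the following sketch tests all particle pairs against the list radius \(r_c + r_\text{skin}\), using the minimum-image convention in a cubic periodic box. It is a plain O(N²) reference in Python with hypothetical names and an assumed cubic box, not the GPU implementation being benchmarked.

```python
def minimum_image(d, box):
    """Wrap a coordinate difference into the interval [-box/2, box/2]."""
    return d - box * round(d / box)

def build_neighbour_lists(positions, box, r_cut, r_skin):
    """Return, for each particle, the indices of all others within r_cut + r_skin."""
    r_list_sq = (r_cut + r_skin) ** 2
    n = len(positions)
    neighbours = [[] for _ in range(n)]
    for i in range(n - 1):
        for j in range(i + 1, n):
            dist_sq = sum(
                minimum_image(a - b, box) ** 2
                for a, b in zip(positions[i], positions[j])
            )
            if dist_sq < r_list_sq:
                neighbours[i].append(j)
                neighbours[j].append(i)
    return neighbours

# Four particles in a box of edge length 10; particle 2 is a neighbour of
# particles 0 and 1 only through the periodic boundary.
positions = [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5), (9.5, 0.5, 0.5), (5.0, 5.0, 5.0)]
lists = build_neighbour_lists(positions, box=10.0, r_cut=2.5, r_skin=0.3)
print(lists)
```

The quadratic cost of the all-pairs test is acceptable only for small systems, which is consistent with this variant targeting a few thousand particles.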

Hardware	time per MD step and particle	steps per second	unroll force loop	compilation details
NVIDIA H100	23.6 ns	10319	yes	CUDA 11.2, -arch compute_80
	43.7 ns	5578	no	CUDA 11.2, -arch compute_80
NVIDIA A40	17.9 ns	13638	yes	CUDA 11.5, -arch compute_80
	30.5 ns	8013	no	CUDA 11.5, -arch compute_80
NVIDIA A100	22.6 ns	10823	yes	CUDA 11.5, -arch compute_80
	38.8 ns	6285	no	CUDA 11.5, -arch compute_80

Results were obtained from 1 independent measurement and are based on the pre-release version 1.0.0-67-g24afb4c68. Each run consisted of NVT equilibration at \(T^*=0.7\) over \(\Delta t^*=100\) (2×10⁴ steps), followed by benchmarking 10⁴ NVE steps 5 times in a row.
