Benchmarks

The benchmark results were produced by the scripts in examples/benchmarks, e.g.:

examples/benchmarks/generate_configuration.sh lennard_jones
examples/benchmarks/run_benchmark.sh lennard_jones

The Tesla GPUs had ECC enabled; no overclocking or other tweaking was done.

Simple Lennard-Jones fluid in 3 dimensions

Parameters:

  • 64,000 particles, number density \rho = 0.4\sigma^{-3}
  • force: lennard_jones (r_c = 3\sigma, r_\text{skin} = 0.7\sigma)
  • integrator: verlet (NVE, \delta t^* = 0.002)
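
For orientation, the particle number and number density above fix the size of the simulation box. The following is a minimal sketch of that arithmetic, assuming a cubic periodic box (the box shape is not stated here, and the function name is purely illustrative):

```python
def cubic_box_length(n_particles: int, number_density: float) -> float:
    """Edge length (in units of sigma) of a cubic box holding n_particles
    at the given number density (in units of sigma^-3)."""
    return (n_particles / number_density) ** (1.0 / 3.0)


# Parameters of the Lennard-Jones benchmark listed above
n_particles = 64_000
number_density = 0.4
r_cut, r_skin = 3.0, 0.7

box_length = cubic_box_length(n_particles, number_density)

# With periodic boundaries, cutoff plus skin should stay below half the box edge
fits = (r_cut + r_skin) < box_length / 2

print(f"box edge length: {box_length:.2f} sigma, cutoff fits: {fits}")
```

Under this assumption the box edge is many times larger than the cutoff plus skin, so the neighbour search covers only a small fraction of the box.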

Hardware                   Time per MD step and particle  Steps per second  FP precision   Compilation details
-------------------------  -----------------------------  ----------------  -------------  ---------------------------
Intel Xeon E5-2637v4       750 ns                         20.9              double         GCC 8.3.0, -O3
NVIDIA Tesla V100-PCI      5.6 ns                         2790              double-single  CUDA 9.2, -arch compute_61
                           5.1 ns                         3020              single         CUDA 9.2, -arch compute_61
NVIDIA GeForce RTX 2080 S  5.9 ns                         2640              double-single  CUDA 9.2, -arch compute_61
                           5.6 ns                         2810              single         CUDA 9.2, -arch compute_61
NVIDIA GeForce RTX 2070    7.7 ns                         2030              double-single  CUDA 9.2, -arch compute_61
                           7.0 ns                         2220              single         CUDA 9.2, -arch compute_61
NVIDIA A40                 3.5 ns                         4436              double-single  CUDA 11.5, -arch compute_80
                           2.9 ns                         5412              single         CUDA 11.5, -arch compute_80
NVIDIA A100                4.6 ns                         3371              double-single  CUDA 11.5, -arch compute_80
                           3.9 ns                         4028              single         CUDA 11.5, -arch compute_80

Results were obtained from 1 independent measurement and are based on release version 1.0.0. Each run consisted of NVT equilibration at T^*=1.2 over \Delta t^*=100 (10⁴ steps), followed by benchmarking 10⁴ NVE steps 5 times in a row.
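
The two performance figures reported for each configuration are related through the particle number: the wall time of one MD step is the time per step and particle multiplied by the number of particles, and the step rate is its inverse. The sketch below checks this for a few rows of the Lennard-Jones results. The function name and the choice of rows are illustrative only. Small deviations are expected, since the reported values are rounded.

```python
def steps_per_second(time_per_step_and_particle_ns: float, n_particles: int) -> float:
    """Convert the time per MD step and particle (in ns) to MD steps per second."""
    wall_time_per_step_s = time_per_step_and_particle_ns * 1e-9 * n_particles
    return 1.0 / wall_time_per_step_s


N_PARTICLES = 64_000  # Lennard-Jones benchmark

# (hardware, FP precision, time per step and particle in ns, reported steps/s)
rows = [
    ("Intel Xeon E5-2637v4", "double", 750.0, 20.9),
    ("NVIDIA Tesla V100-PCI", "single", 5.1, 3020.0),
    ("NVIDIA A40", "single", 2.9, 5412.0),
]

for hardware, precision, t_ns, reported in rows:
    computed = steps_per_second(t_ns, N_PARTICLES)
    deviation = abs(computed - reported) / reported
    print(f"{hardware} ({precision}): computed {computed:.1f} steps/s, "
          f"reported {reported}, deviation {deviation:.1%}")
```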

Supercooled binary mixture (Kob-Andersen)

Parameters:

  • 256,000 particles, number density \rho = 1.2\sigma^{-3}

  • force: lennard_jones with 2 particle species (80% A, 20% B)

    (\epsilon_{AA}=1, \epsilon_{AB}=1.5, \epsilon_{BB}=0.5, \sigma_{AA}=1, \sigma_{AB}=0.8, \sigma_{BB}=0.88, r_c = 2.5\sigma, r_\text{skin} = 0.3\sigma, neighbour list occupancy: 70%)

  • integrator: verlet (NVE, \delta t^* = 0.001)
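
To make the interaction parameters above concrete, here is a minimal sketch of the pair energy for the binary mixture. It is not the implementation used in the benchmark. Two assumptions are made for illustration: the cutoff r_c = 2.5\sigma is taken relative to the \sigma of the respective species pair, and the potential is simply truncated without an energy shift. The names EPSILON, SIGMA and pair_energy are hypothetical.

```python
# Interaction parameters as listed above, keyed by (sorted) species pair
EPSILON = {("A", "A"): 1.0, ("A", "B"): 1.5, ("B", "B"): 0.5}
SIGMA = {("A", "A"): 1.0, ("A", "B"): 0.8, ("B", "B"): 0.88}
R_CUT = 2.5  # assumption: in units of sigma of the respective pair


def pair_energy(r: float, species_i: str, species_j: str) -> float:
    """Truncated (unshifted) Lennard-Jones energy of two particles at distance r."""
    key = tuple(sorted((species_i, species_j)))
    eps, sig = EPSILON[key], SIGMA[key]
    if r >= R_CUT * sig:
        return 0.0  # beyond the cutoff the pair does not interact
    sr6 = (sig / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


# Composition: 80% A and 20% B out of 256,000 particles (integer arithmetic)
n_total = 256_000
n_a = n_total * 80 // 100
n_b = n_total - n_a

print(f"A particles: {n_a}, B particles: {n_b}")
print(f"A-B energy at the potential minimum: {pair_energy(0.8 * 2 ** (1 / 6), 'A', 'B'):.3f}")
```

The lookup sorts the species pair so that A-B and B-A share one parameter entry.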

Hardware                   Time per MD step and particle  Steps per second  FP precision   Compilation details
-------------------------  -----------------------------  ----------------  -------------  ---------------------------
Intel Xeon E5-2637v4       744 ns                         5.25              double         GCC 10.2.1, -O3
NVIDIA Tesla V100-PCI      3.83 ns                        1020              double-single  CUDA 9.2, -arch compute_61
                           3.65 ns                        1070              single         CUDA 9.2, -arch compute_61
NVIDIA GeForce RTX 2080 S  5.17 ns                        755               double-single  CUDA 9.2, -arch compute_61
                           4.87 ns                        802               single         CUDA 9.2, -arch compute_61
NVIDIA GeForce RTX 2070    6.63 ns                        589               double-single  CUDA 9.2, -arch compute_61
                           6.28 ns                        621               single         CUDA 9.2, -arch compute_61
NVIDIA A40                 2.56 ns                        1528              double-single  CUDA 11.5, -arch compute_80
                           2.26 ns                        1728              single         CUDA 11.5, -arch compute_80
NVIDIA A100                3.10 ns                        1260              double-single  CUDA 11.5, -arch compute_80
                           2.91 ns                        1343              single         CUDA 11.5, -arch compute_80

Results were obtained from 1 independent measurement and are based on release version 1.0.0. Each run consisted of NVT equilibration at T^*=0.7 over \Delta t^*=100 (2×10⁴ steps), followed by benchmarking 10⁴ NVE steps 5 times in a row.
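
One way to read the GPU results for the binary mixture is to compare the two floating-point precisions on the same device. The sketch below computes the relative extra time of double-single over single precision from the reported timings. The helper name is illustrative. Because each figure stems from a single measurement, the percentages should be read as indicative only.

```python
# Time per MD step and particle in ns: (double-single, single)
timings_ns = {
    "NVIDIA Tesla V100-PCI": (3.83, 3.65),
    "NVIDIA GeForce RTX 2080 S": (5.17, 4.87),
    "NVIDIA GeForce RTX 2070": (6.63, 6.28),
    "NVIDIA A40": (2.56, 2.26),
    "NVIDIA A100": (3.10, 2.91),
}


def precision_overhead(double_single_ns: float, single_ns: float) -> float:
    """Relative extra time of double-single precision compared to single precision."""
    return double_single_ns / single_ns - 1.0


overheads = {gpu: precision_overhead(*t) for gpu, t in timings_ns.items()}

for gpu, overhead in overheads.items():
    print(f"{gpu}: double-single is {overhead:.1%} slower than single")
```

For these timings the overhead lies between roughly 5% and 13%, with the largest value on the A40.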

Variant “tiny”

This benchmark tests an alternative implementation of the pair force calculation that uses loop unrolling. It is particularly suited for systems with a small number of particles.

Parameters:

  • 4,096 particles; all other parameters are as above
  • neighbour lists are constructed directly, without binning to Verlet cells
  • re-ordering of particle data in memory is disabled
  • double-single floating point precision is enabled

Hardware     Time per MD step and particle  Steps per second  Unroll force loop  Compilation details
-----------  -----------------------------  ----------------  -----------------  ---------------------------
NVIDIA A40   17.9 ns                        13638             true               CUDA 11.5, -arch compute_80
             30.5 ns                        8013              false              CUDA 11.5, -arch compute_80
NVIDIA A100  22.6 ns                        10823             true               CUDA 11.5, -arch compute_80
             38.8 ns                        6285              false              CUDA 11.5, -arch compute_80

Results were obtained from 1 independent measurement and are based on the pre-release version 1.0.0-67-g24afb4c68. Each run consisted of NVT equilibration at T^*=0.7 over \Delta t^*=100 (2×10⁴ steps), followed by benchmarking 10⁴ NVE steps 5 times in a row.
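
The effect of the unrolled force loop can be quantified directly from the timings of the "tiny" variant. The following sketch computes the speedup from unrolling and the wall time of a single MD step for the 4,096-particle system. The function name is illustrative and the numbers are simply those reported above.

```python
N_TINY = 4_096  # particles in the "tiny" variant

# Time per MD step and particle in ns: (unrolled force loop, regular force loop)
timings_ns = {
    "NVIDIA A40": (17.9, 30.5),
    "NVIDIA A100": (22.6, 38.8),
}


def unroll_speedup(unrolled_ns: float, regular_ns: float) -> float:
    """Factor by which the unrolled force loop is faster than the regular one."""
    return regular_ns / unrolled_ns


for gpu, (unrolled_ns, regular_ns) in timings_ns.items():
    speedup = unroll_speedup(unrolled_ns, regular_ns)
    step_time_us = unrolled_ns * N_TINY * 1e-3  # ns -> microseconds per MD step
    print(f"{gpu}: speedup {speedup:.2f}x, {step_time_us:.1f} us per MD step (unrolled)")
```

On both devices the unrolled variant is about 1.7 times faster. The time per step and particle is still several times larger than for the 256,000-particle system, although the tiny runs also differ in neighbour list construction and particle re-ordering, so the two figures are not directly comparable.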