Multi-GPU machines
To distribute multiple HALMD processes among the CUDA devices of a single machine, each device must be exclusively locked by its process. HALMD then chooses the first available CUDA device.
nvidia-smi tool
If your NVIDIA driver version comes with the nvidia-smi tool, set all CUDA devices to compute exclusive mode to restrict use to one process per device:
sudo nvidia-smi --gpu=0 --compute-mode-rules=1
sudo nvidia-smi --gpu=1 --compute-mode-rules=1
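On recent driver versions, nvidia-smi uses different option names for the same setting; a sketch, assuming devices 0 and 1:

```shell
# set each GPU to exclusive-process compute mode (current option names)
sudo nvidia-smi -i 0 -c EXCLUSIVE_PROCESS
sudo nvidia-smi -i 1 -c EXCLUSIVE_PROCESS
```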
nvlock tool
Warning
The mechanism used by nvlock to make certain GPUs invisible to the
NVIDIA driver no longer works with recent drivers, e.g., later than
version 346. Please use the environment variable CUDA_VISIBLE_DEVICES
instead. (Caveat: with this variable, the reported GPU IDs are
re-enumerated and always start at 0.)
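With CUDA_VISIBLE_DEVICES, assigning one GPU per process on a two-GPU machine can be sketched as follows; `halmd [...]` stands in for your actual command line, as in the examples below:

```shell
# launch one HALMD process per physical GPU; inside each process,
# the selected GPU is re-enumerated as device 0
CUDA_VISIBLE_DEVICES=0 halmd [...] &
CUDA_VISIBLE_DEVICES=1 halmd [...] &
wait
```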
If your NVIDIA driver version does not support the nvidia-smi tool, or if you
prefer not to set the devices to compute-exclusive mode, the nvlock tool
may be used to assign a GPU exclusively to each process:
nvlock halmd [...]
You may also directly use the preload library:
LD_PRELOAD=libnvlock.so halmd [...]
nvlock is available at https://github.com/halmd-org/nvcuda-tools and is compiled with:
cmake .
make
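The full sequence of obtaining and building the tool can be sketched as a standard git/cmake workflow, assuming git, cmake, and the build dependencies of nvcuda-tools are installed:

```shell
# fetch the sources and build in-source, as above
git clone https://github.com/halmd-org/nvcuda-tools
cd nvcuda-tools
cmake .
make
```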