H5MD data files

Why H5MD?

The H5MD file format presents a unique standard to store data for and from molecular simulations along with derived quantities such as physical observables.

H5MD builds on the technology of the “Hierarchical Data Format 5” (HDF5) , which is a well established scientific file format, with bindings for C, C++, Fortran, Python and support by Matlab, Mathematica, … An excellent overview is found in the documentation of the project HDF5 for Python.

Note

The output files of HAL’s MD package comply with H5MD version 1.0, published in

P. de Buyl, P. H. Colberg, and F. Höfling, H5MD: a structured, efficient, and portable file format for molecular data, Comput. Phys. Commun. 185, 1546 (2014), [arXiv:1308.6382]

Working with HDF5 files

Using the h5ls/h5dump tools

For a quick analysis of HDF5 data files, use the h5ls tool (bundled with the HDF5 library):

h5ls -v file.h5

Alternatively, the structure of a file may be inspected with the h5dump tool:

h5dump -A file.h5

The contents of individual groups or datasets may be displayed either by

h5ls -v file.h5/path/to/group
h5ls -d file.h5/path/to/dataset

or this way:

h5dump -g /path/to/group file.h5
h5dump -d /path/to/dataset file.h5

Using Python and h5py

h5py is a Python module wrapping the HDF5 library. It is based on NumPy, which implements a MATLAB-like interface to multi-dimensional arrays. This is where the H5MD format reveals its true strength, as NumPy allows arbitrary transformations of HDF5 datasets, all while using a real programming language.

As a simple example, we open an H5MD file and print a dataset:

import h5py
f = h5py.File("file.h5", "r")
d = f["path/to/dataset"]
print d
print d[0:5]
f.close()

Attributes may be read with the attrs class member:

print f["h5md"].attrs["version"]
if "observables" in f.keys():
    print f["observables"].attrs["dimension"]

For further information, refer to the Numpy and Scipy Documentation and the HDF5 for Python Documentation.