Hierarchical Data Format (HDF5)
HDF5 is a file format designed to store and organize large amounts of numerical data.
HDF5: API Specification
- binary (and therefore compact and efficient)
- handles multidimensional data
- has a wide range of types, including complex and double precision
- deals with endianness issues
- stores data hierarchically: datasets (ie files) are kept in a tree of groups (ie directories)
- stores metadata as attributes, attached to groups or to datasets.
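The hierarchy and attributes look like this in practice, sketched here with Python's h5py (one of the bindings listed below); the group, dataset and attribute names are invented for illustration:

```python
import h5py
import numpy as np

# Groups nest like directories; datasets sit inside them like files.
with h5py.File("example.h5", "w") as f:
    grp = f.create_group("run1/images")  # intermediate groups created automatically
    dset = grp.create_dataset("absorption", data=np.zeros((3, 2)))
    dset.attrs["exposure_ms"] = 10.0     # metadata attached to a dataset
    f["run1"].attrs["operator"] = "someone"  # ... or to a group

# Read back by path, just like a filesystem path.
with h5py.File("example.h5", "r") as f:
    print(f["run1/images/absorption"].shape)  # (3, 2)
    print(f["run1/images/absorption"].attrs["exposure_ms"])
```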
HDF5 can be read and written by:
- Mathematica (see below)
- Python (h5py), also PyTables (see below)
- Labview - but see below.
- Quite a lot of other languages which we don't use in the lab: perl, IDL, ...
Labview and HDF5
Labview has in the past used HDF5 as an internal format, but only wrote a limited and specific subset of it. General Labview support seems to be some time away. NI seems committed to their TDMS file format (proprietary, but well documented; doesn't support 2D arrays etc). This lavag post
seems to indicate NI tried HDF5 and found performance issues, which is slightly worrying, but we are not likely to need to append >100 separate datastreams. It also discusses their commitment to TDMS and to extending TDMS. This is disappointing given the existence of HDF5 and other open standards like XSIL.
The best available Labview library seems to be LVHDF5
- based on HDF5 1.6.5.
There is mailing list evidence that Tomi Maila is developing a library based on HDF5 1.8, which is more recent, but doesn't seem to be publicly released.
Some limitations need to be investigated. In particular:
: "Only conversion of 1-D LabVIEW arrays is supported. Note that datasets may still be of higher dimensionality. Array datatypes are typically found only if contained by a cluster"
Not sure what this means - need to try it and see. It is of course always possible to flatten a 2D array to 1D before storing, but that is highly undesirable.
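For reference, the flatten-and-restore workaround amounts to this (sketched in Python with numpy; in Labview the shape would have to be stashed somewhere, e.g. as an attribute):

```python
import numpy as np

# Flatten a 2D array to 1D before storing, keeping the shape as metadata,
# then reshape after reading back. Row-major (C) order throughout.
a = np.arange(1, 7).reshape(3, 2)   # the 2D array we want to store
flat = a.ravel()                    # 1-D view in row-major order
shape = a.shape                     # must be stored alongside the data

restored = flat.reshape(shape)      # round trip recovers the original
assert (restored == a).all()
```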
The biggest problem appears to be very slow data conversion via strings.
Directly calling the HDF5 DLL from LabVIEW
This is now a development project, documented at HDF5 Direct To LabVIEW.
GUIs to work with HDF5 files
There are also some nice GUI explorers such as ViTables
(in Python), HDF Explorer
(Windows only), and HDFView.
H5LT: Lightweight HDF5 interface
This lightweight C wrapper
looks promising if we need to roll our own interface. There is an H5LT tutorial. It is particularly attractive because we don't have to fuss with #defines. A good example is that rather than saying:
H5LTread_dataset (file_id, dset_name, H5T_NATIVE_INT, data);
which relies on H5T_NATIVE_INT being set somewhere, likely in a header file that Labview doesn't know about, we can just as well say:
H5LTread_dataset_int (file_id, dset_name, data);
which should be trivially callable from Labview. Similarly, creating a (possibly multidimensional) dataset is as easy as:
H5LTmake_dataset_int (file_id, DSET3_NAME, rank, dims, data_int_in);
Note that DSET3_NAME is not a #define, it's a string constant.
There's also H5LTdtype_to_text
which cheerfully converts opaque datatype enums to text strings, which something like Labview can then handle in a fairly platform- and library-revision-independent way.
As an example of how relatively easy this makes things, the code below:
- creates a new HDF file
- writes in a 3x2 matrix containing numbers 1,2,3,4,5,6
- closes the file
This takes three lines of actual code, as you'd hope!
#include "hdf5.h"
#include "hdf5_hl.h"
#define RANK 2
int main( void )
{
    hsize_t dims[RANK] = {3, 2};
    int data[6] = {1, 2, 3, 4, 5, 6};
    hid_t file_id = H5Fcreate ("ex_lite1.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT); // create a HDF5 file
    herr_t status = H5LTmake_dataset_int(file_id, "/dset", RANK, dims, data); // create and write an integer type dataset named "dset"
    status = H5Fclose (file_id); // close file
    return (int) status;
}
hdf5dll.dll with MingW32
This is how to compile the above code with the MingW32-tdm compiler (see, for example, Building DLLs for LabVIEW
for how this compiler is installed).
Add the line ... to stop the compiler arguing with the HDF5 includes about what ...
Then compile with
gcc -c ex_lite1.c -I"c:\Program Files\HDF5 1.8.6\include"
and link with
gcc -o ex_lite1.exe ex_lite1.o -L"c:\Program Files\HDF5 1.8.6\bin" -lhdf5dll -lhdf5_hldll
Info: resolving _H5T_NATIVE_INT_g by linking to __imp__H5T_NATIVE_INT_g (auto-import)
auto-importing has been activated without --enable-auto-import specified on the command line.
This should work unless it involves constant data structures referencing symbols
from auto-imported DLLs.
This whinging from the linker seems harmless, but I suppose it would be nice to know exactly what is going on. The obvious settings of C_INCLUDE_PATH don't seem to let us get rid of the ..., nor does LIBRARY_PATH. Perhaps this is spaces in the filenames? Slashes the wrong way around? Meh, can fix with a Makefile if necessary.
Running the resulting ex_lite1.exe cheerfully produces the example h5 file OK. Good.
Python and HDF5
A direct Python implementation of the C API, appropriately objecty, is h5py.
There's no 64-bit version on the h5py project site,
but you can get one here
if necessary.
PyTables is a different approach, see below.
Tabular data with columns of differing types can be stored in HDF5. You construct a compound type
(a struct in C), and then make an array of them. The struct variables are columns, the array member structs are rows. This is basically analogous to a single table in a relational database. Clearly, these tables are useful for storing multi-channel timeseries data amongst other things.
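A sketch of such a compound type in h5py (the field names "t", "ch0", "ch1" and the file name are invented for illustration):

```python
import h5py
import numpy as np

# A compound dtype: each field is a column, each array element is a row.
row_t = np.dtype([("t", "f8"), ("ch0", "f8"), ("ch1", "f8")])
rows = np.zeros(4, dtype=row_t)
rows["t"] = np.arange(4) * 0.001       # timestamps
rows["ch0"] = [1.0, 2.0, 3.0, 4.0]     # one channel of data

with h5py.File("table.h5", "w") as f:
    f.create_dataset("timeseries", data=rows)  # stored as an HDF5 compound dataset

with h5py.File("table.h5", "r") as f:
    back = f["timeseries"][...]
    print(back["ch0"])                 # whole columns come back by field name
```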
While tables can be created from low-level HDF5 library calls, this is tedious, and various libraries have evolved. The official HDF5 H5TABLES
interface is one. Pytables
may be a much easier solution for getting tabular data into and out of HDF5. It's very object oriented, and massively faster than we'll need.
Interestingly, unlike in an RDBMS, a column can contain not just atomic types like strings and numbers, but arrays or even other tables. This is the idea of a hierarchical table system. So arguably, multiple BECs in a single HDF5 (ie multiple shots with the same parameters) should be rows in a table, with the BEC images being columns, and everything else being columns in the table too. Hmm. Using paths requires lexical names like "/bec1/images/absorption". OTOH, other tools accessing hdf5 will likely cope much better if the tables are fairly flat.
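A minimal PyTables sketch of a shots-as-rows table with an array-valued column (the class, column and file names here are all invented for illustration):

```python
import numpy as np
import tables  # PyTables

# The row layout is declared as a class; each attribute is a column.
class Shot(tables.IsDescription):
    shot_id = tables.Int32Col()
    temperature = tables.Float64Col()
    image = tables.Float64Col(shape=(3, 2))   # an array-valued column

with tables.open_file("shots.h5", "w") as f:
    table = f.create_table("/", "shots", Shot)
    row = table.row
    for i in range(3):                        # three shots become three rows
        row["shot_id"] = i
        row["temperature"] = 100.0 + i
        row["image"] = np.zeros((3, 2))
        row.append()
    table.flush()

with tables.open_file("shots.h5", "r") as f:
    print(f.root.shots[1]["temperature"])     # 101.0
```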
The underlying files are still HDF5, and I don't think
it makes much use of metadata for its own purposes. So it shouldn't be hard to have Labview write in this format - well, it shouldn't be harder
than having Labview write anything else in HDF5.
Very encouragingly, the detailed PyTables manual
has this to say about interoperability with generic HDF:
PyTables can access a wide range of objects in generic HDF5 files, like compound type datasets (that can be mapped to Table objects), homogeneous datasets (that can be mapped to
Array objects) or variable length record datasets (that can be mapped to VLArray objects). Besides, if a dataset is not supported, it will be mapped to a special UnImplemented class (see Section 4.14), that will let the user see that the data is there, although it will be unreachable (still, you will be able to access the attributes and some metadata in the dataset). With that, PyTables probably can access and modify most of the HDF5 files out there.
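As a quick check of that claim, a homogeneous dataset written by h5py comes back as an Array object in PyTables (file and dataset names invented; assumes both libraries are installed):

```python
import h5py
import numpy as np
import tables

# Write a plain homogeneous dataset with h5py...
with h5py.File("generic.h5", "w") as f:
    f.create_dataset("plain", data=np.arange(6).reshape(3, 2))

# ...and read it back with PyTables, which maps it to an Array object.
with tables.open_file("generic.h5", "r") as f:
    arr = f.root.plain.read()
    print(arr.shape)   # (3, 2)
```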
ViTables is a GUI for inspecting HDF5 files in general, particularly aimed at fast access to large tabular data in PyTables format.
Installing it is slightly annoying. You need:
- Fairly recent Python, including numpy > 1.4.0. I used EPD version 7.0.1.
- PyQt4. Make sure you get the one for your version of Python! EPD-7 came with Python 2.7, so I got this one.
The binary download of ViTables didn't work, so I built it from the repository. Doing
hg clone http://hg.berlios.de/repos/vitables vitables_tip
gets the code, and then the usual
python setup.py install
seemed to do the trick. It didn't like starting from cygwin, but was fine running from a ...
-- Main.LincolnTurner - 18 Feb 2011
Mathematica and HDF5
Mathematica speaks HDF5
but compound data structures are not supported (they are ignored by ...).
A basic package to read H5Tables in Mathematica
is now more-or-less working. -- Main.LincolnTurner - 19 Mar 2011
Mathematica calls HDF5.exe in
<Mathematica install directory>\SystemFiles\Converters\Binaries\
which uses version 1.6.5 of the HDF5 library (in version 8 of Mathematica, at least). A very limited subset of the HDF5 functionality is exposed, in addition to the above problems.
-- Main.LincolnTurner - 07 Mar 2011