Danny Price recently left a comment to let me know about a new Python package he’s developing called hickle. The goal of “hickle” is to create a module that works like Python’s pickle module but stores its data in the HDF5 binary file format. This is a promising approach: I advocate storing binary data in HDF5 files whenever possible instead of creating yet another one-off binary file format that nobody will be able to read in ten years. The immediate advantage of using HDF5 to store pickled Python objects is that HDF5 files are portable across many platforms, while “pickled” objects may not be readable on a different platform.
The hickle developers have made a good start, but they have a long way to go before hickle will be useful to a wider audience. Right now, hickle can only store NumPy ndarrays and Python lists. If you only need to store lists and arrays, you might as well use a Python HDF5 binding such as PyTables or h5py directly. The power of the pickle module is that you can immediately serialize almost any Python object, however complex, store it on disk, and retrieve it later. hickle will only achieve its full potential once it replicates this functionality, and I’m not sure how difficult that will be. Ideally, you could derive a class from Pickler that uses Pickler’s methods to serialize an object, and then add your own method to write the serialized object to an HDF5 file.
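Here is a minimal sketch of that Pickler-subclass idea, assuming Python 3 and, for the HDF5 step only, that h5py and NumPy are installed. The names HDF5Pickler, serialize, dump_to_hdf5 and load_from_hdf5 are hypothetical and are not part of hickle or any other library. The inherited Pickler machinery does all the serialization; the added method simply writes the resulting bytes into an HDF5 file, wrapped in np.void as h5py recommends for raw binary data.

```python
import io
import pickle


class HDF5Pickler(pickle.Pickler):
    """Hypothetical sketch: reuse Pickler's serialization, then write
    the resulting byte stream into an HDF5 file."""

    def __init__(self, protocol=pickle.HIGHEST_PROTOCOL):
        # Pickler writes into this in-memory buffer instead of a real file.
        self._buffer = io.BytesIO()
        super().__init__(self._buffer, protocol=protocol)

    def serialize(self, obj):
        # Reset the buffer and the memo so each call produces a
        # standalone pickle that doesn't reference earlier objects.
        self._buffer.seek(0)
        self._buffer.truncate()
        self.clear_memo()
        # The inherited Pickler.dump() does the real work.
        self.dump(obj)
        return self._buffer.getvalue()

    def dump_to_hdf5(self, obj, path, name="pickle"):
        # Assumes h5py and NumPy are available; imported lazily so that
        # serialize() works without them.
        import h5py
        import numpy as np

        payload = self.serialize(obj)
        with h5py.File(path, "w") as f:
            # np.void maps raw bytes to HDF5's OPAQUE datatype.
            f.create_dataset(name, data=np.void(payload))


def load_from_hdf5(path, name="pickle"):
    # Only unpickle files you trust: pickle can execute arbitrary code.
    import h5py

    with h5py.File(path, "r") as f:
        payload = f[name][()].tobytes()
    return pickle.loads(payload)
```

For example, `HDF5Pickler().dump_to_hdf5({"a": [1, 2, 3]}, "data.h5")` followed by `load_from_hdf5("data.h5")` should return the original dictionary. Note that this stores the pickle stream as a single opaque blob, so other HDF5 tools can't see inside it. A full hickle-style solution would instead map each Python type onto native HDF5 groups and datasets.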
In a future post, I’ll describe some of the practical problems with using pickle files to store data, and try to organize some thoughts about how they might be solved.