I received several off-list replies to this, including a pointer to previous experiments by Data 61 colleagues that concluded LZO [0] was preferable to Gzip, due to faster compression and decompression times. For the representative heap below, LZO produces a 490M output. Compression and decompression take only a couple of seconds, compared with ~40s for compression and ~9s for decompression with Gzip. LZO also has the advantage of almost zero memory overhead for both operations. These measurements are fairly coarse grained, using the default settings of the respective command line tools but I expect them to be representative.
It looks available for most platforms, but is also ANSI C and GPLed so we could include it in-tree. I do not know what people's preferences are in this area.
From my perspective, LZO seems the clear front runner for now. Please let me know if you have an alternative suggestion or if you outright object to compressing saved states.
[0]: http://www.oberhumer.com/opensource/lzo/
On 12/02/16 09:11, Matthew Fernandez wrote:
Hi Poly folk,
I am an Isabelle/HOL user interested in reducing the on-disk size of my Isabelle/HOL heaps. These are read and written via the save state functionality in Poly/ML. Saved states compress very well via conventional algorithms. On a representative Isabelle/HOL heap I see the following:
$ ls -lh -r--r--r-- 1 matthew matthew 1.7G Feb 10 11:22 MyHeap -r--r--r-- 1 matthew matthew 279M Feb 10 11:22 MyHeap.bz2 -r--r--r-- 1 matthew matthew 331M Feb 10 11:22 MyHeap.gz -r--r--r-- 1 matthew matthew 173M Feb 10 11:22 MyHeap.xz
I would like to propose integrating compression into the save state functionality itself. The runtime of Bzip and XZ are probably slow enough to irritate some users, but Gzip is reasonably fast for the benefit it shows. Gzipping saved states would require linking against zlib or something similar.
David Matthews has suggested the save state code detect whether a state is compressed or not on loading and act accordingly, which would provide a nice transition path for users with existing saved states.
I'm interested in hearing others' opinions about this proposal. If we feel this is something reasonable to do, I'll work on implementing this. When replying to this email, please CC me directly as I'm not subscribed to the list.
Thanks, Matthew