On Mon, 15 Feb 2016, David Matthews wrote:
As far I'm aware Isabelle just uses the shared state facilities of Poly/ML. That provides the ability to save states that are the direct descendant of the executable or the descendant of another saved state.
I've just changed that to use PolyML.SaveState.saveChild and PolyML.SaveState.loadHierarchy -- see http://isabelle.in.tum.de/repos/isabelle/rev/0189fe0f6452
From a quick look at the code the main effect that child states have is that StateLoader::LoadFile needs to seek within the saved state file to get the name of the parent file. That has to be loaded before the child because the child may, almost certainly will, overwrite some of the parent data. That may affect how you compact the data. How well do the compression libraries cope with seeking within the file?
From my own point of view I'm concerned that compacting the heap files may add to the cost of loading.
I'd like to see what effect adding compaction has on it but it may be necessary to provide separate functions to save states with and without compaction. Loading is easier because the loader can look at the format. Note that there's no need to provide backwards compatibility because a saved state can only ever be reloaded into the executable that created it.
Overall I don't quite see a benefit to spend a lot of run-time and complexity of the implementation to compress a few GB. Even a small SSD has already 64-128 GB.
What I often see is that a 32bit poly process is unable to save a heap due to memory shortage. Adding compression in the same process address space would probably make the situation worse.
Note that for the present Isabelle setup, it is trivial to add heap compression outside the poly process, operating on the already saved heap file. What remains is the problem of loading it again without decompressing into a separate file.
Anyway, what is the state of more detailed measurements of the application?
It should be also worthwhile to step back it bit, and ask why the heaps are so large. There might be some Isabelle/ML programming mishaps in the application with persistent values that refer to Context.theory instead of Context.theory_id. Standard containers like the Simplifier or Classical context take care of that, but add-ons might not.
Makarius