[polyml] Reducing the size of on-disk saved state

16 Mar 2016


      On Mon, 15 Feb 2016, David Matthews wrote:
...
As far I'm aware Isabelle just uses the shared state facilities of 
Poly/ML. That provides the ability to save states that are the direct 
descendant of the executable or the descendant of another saved state.
I've just changed that to use PolyML.SaveState.saveChild and 
PolyML.SaveState.loadHierarchy -- see
http://isabelle.in.tum.de/repos/isabelle/rev/0189fe0f6452
...
From a quick look at the code the main effect that child states have is 
that StateLoader::LoadFile needs to seek within the saved state file to 
get the name of the parent file.  That has to be loaded before the child 
because the child may, almost certainly will, overwrite some of the 
parent data.  That may affect how you compact the data.  How well do the 
compression libraries cope with seeking within the file?
From my own point of view I'm concerned that compacting the heap files 
may add to the cost of loading.
I'd like to see what effect adding compaction has on it but it may be 
necessary to provide separate functions to save states with and without 
compaction.  Loading is easier because the loader can look at the 
format.  Note that there's no need to provide backwards compatibility 
because a saved state can only ever be reloaded into the executable that 
created it.
Overall I don't quite see a benefit to spend a lot of run-time and 
complexity of the implementation to compress a few GB. Even a small SSD 
has already 64-128 GB.
What I often see is that a 32bit poly process is unable to save a heap due 
to memory shortage. Adding compression in the same process address space 
would probably make the situation worse.
Note that for the present Isabelle setup, it is trivial to add heap 
compression outside the poly process, operating on the already saved heap 
file. What remains is the problem of loading it again without 
decompressing into a separate file.
Anyway, what is the state of more detailed measurements of the 
application?
It should be also worthwhile to step back it bit, and ask why the heaps 
are so large. There might be some Isabelle/ML programming mishaps in the 
application with persistent values that refer to Context.theory instead of 
Context.theory_id. Standard containers like the Simplifier or Classical 
context take care of that, but add-ons might not.
Makarius

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

[polyml] Reducing the size of on-disk saved state