On 18/02/16 05:34, David Matthews wrote:
On 17/02/2016 01:14, Matthew Fernandez wrote:
On 16/02/16 00:12, David Matthews wrote:
From a quick look at the code, the main effect that child states have is that StateLoader::LoadFile needs to seek within the saved state file to get the name of the parent file. That has to be loaded before the child because the child may, almost certainly will, overwrite some of the parent data. That may affect how you compact the data.
How well do the compression libraries cope with seeking within the file?
Admittedly this is not something I had thought to look for, and now that I do, I note there are seeks performed during state *saving* as well, where Poly/ML overwrites data at the start of the stream. A cursory glance at the LZO API suggests that a requirement to seek within the stream may well be a deal breaker... Rafal Kolanski, can you comment any more on this?
It may be possible to rework the code to avoid the seeks. Perhaps it would be easier to compact each section of the data separately rather than process the file as a whole, if that's possible. The seeks are just to move between sections.
Of more concern is that LZO is licensed under GPL rather than LGPL. Poly/ML is licensed under LGPL and that means that it cannot include or even link to LZO without coming under GPL. That doesn't preclude experimenting with it but for distribution I'd prefer a library that didn't have these problems.
David
I think the salient points at this stage are the following:

1. Poly/ML performs seeks on the save path to rewind the FD and update metadata, including byte offsets of other sections in the file. Here I'm referring to SaveRequest::Perform.
2. LZO is GPL v2, while Poly/ML is LGPL v2.1. Thanks David and Rob for correcting me; I had misread the licence.
3. LZO streams do not appear to be seekable. Gzip streams seem seekable only for reading, and this is acknowledged to be slow (a rough sketch of what that looks like is below).
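To make (3) concrete, this is roughly what seeking while reading looks like through zlib's gzFile interface. The file name, offset and buffer here are made up, and I haven't measured how slow the seek actually is on a state file of realistic size:

    #include <cstdio>   /* SEEK_SET */
    #include <zlib.h>

    /* Sketch only: read a value at a known offset out of a gzip-compressed
     * saved state. gzseek() works in read mode, but zlib implements a forward
     * seek by decompressing and discarding everything up to the target
     * offset, which is where the slowness comes from. */
    static bool readAtOffset(const char *path, z_off_t offset,
                             void *buf, unsigned len)
    {
        gzFile f = gzopen(path, "rb");
        if (f == NULL) return false;

        bool ok = gzseek(f, offset, SEEK_SET) == offset &&  /* decompress-and-discard */
                  gzread(f, buf, len) == (int)len;          /* then the bytes we want */

        gzclose(f);
        return ok;
    }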
The naïve ways I can see of working around (1) are either:

(a) construct the entire state in memory first and then stream it out to a compressed file (sketched below);
(b) effectively run the save state logic twice, so the first pass predicts the offset values and the second pass that does the actual writing can run linearly, uninterrupted; or
(c) write the state out, then compress it to a second file and delete the first.

None of these are particularly palatable to me. David, you mentioned that it might be possible to avoid the seeks. Did you have a different idea?
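For what it's worth, (a) is the option I can picture most clearly: an in-memory buffer takes the place of the file until the very end, so the existing "seek back and patch the metadata" pattern becomes plain indexing, and the compressed output is produced in one linear pass. All names below are invented and this is only a sketch of the idea, not how SaveRequest::Perform is structured today:

    #include <zlib.h>
    #include <cstring>
    #include <vector>

    /* Sketch of option (a): build the whole saved state image in memory,
     * patch earlier metadata by indexing rather than seeking, then compress
     * the finished image with a single linear write. */
    struct MemState {
        std::vector<unsigned char> buf;

        size_t tell() const { return buf.size(); }

        void write(const void *p, size_t n) {       /* append to the image */
            const unsigned char *c = static_cast<const unsigned char *>(p);
            buf.insert(buf.end(), c, c + n);
        }

        void patch(size_t offset, const void *p, size_t n) {
            /* fix up metadata written earlier; this replaces seek-and-rewrite */
            std::memcpy(buf.data() + offset, p, n);
        }

        bool flush(const char *path) const {        /* one linear compressed write */
            gzFile f = gzopen(path, "wb");
            if (f == NULL) return false;
            bool ok = gzwrite(f, buf.data(), (unsigned)buf.size()) == (int)buf.size();
            return gzclose(f) == Z_OK && ok;
        }
    };

The obvious downside is holding the entire uncompressed image in memory, which is exactly the resource I'm trying to conserve, so I'm not sold on it either.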
As you've noted, there are also seeks on the load path, but to me this is a lesser hurdle to overcome than the seeks on the save path.
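Coming back to your suggestion of compacting each section separately: if I've understood it, the attraction is that the compressed size of every section is known before anything hits the disk, so the header can be written first and the whole file emitted front-to-back, with no seeking on either the save or the load path. Something roughly like this, with zlib standing in for whichever library we settle on and the header layout invented purely for illustration:

    #include <zlib.h>
    #include <cstdio>
    #include <vector>

    /* Sketch: compress each section of the saved state independently, then
     * write a table of compressed lengths followed by the sections, so the
     * whole file goes out in one forward pass. */
    static bool writeCompressedSections(FILE *f,
            const std::vector<std::vector<unsigned char>> &sections)
    {
        std::vector<std::vector<unsigned char>> compressed;
        std::vector<uLong> sizes;

        /* Pass 1: compress every section into memory. */
        for (const std::vector<unsigned char> &sec : sections) {
            std::vector<unsigned char> out(compressBound(sec.size()));
            uLongf outLen = out.size();
            if (compress2(out.data(), &outLen, sec.data(), sec.size(),
                          Z_DEFAULT_COMPRESSION) != Z_OK)
                return false;
            out.resize(outLen);
            sizes.push_back(outLen);
            compressed.push_back(out);
        }

        /* Pass 2: all lengths are now known, so header and sections can be
         * written linearly; a reader computes any section's offset from the
         * table instead of us seeking back to record it. */
        uLong count = (uLong)sizes.size();
        if (fwrite(&count, sizeof count, 1, f) != 1) return false;
        for (uLong s : sizes)
            if (fwrite(&s, sizeof s, 1, f) != 1) return false;
        for (const std::vector<unsigned char> &c : compressed)
            if (fwrite(c.data(), 1, c.size(), f) != c.size()) return false;
        return true;
    }

The cost is holding all of the compressed sections in memory during the save, so it isn't free either. Is that the sort of thing you had in mind?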
As for (2), the licensing issue... this appears to be a show-stopper for using LZO. As I've said, I'm not wedded to any particular compression algorithm, so I'm happy to revert to Gzip or to another suggestion if there is one. For my own use case, my precious resources are RAM and disk space; runtime is not a concern to me, as this operation is already dwarfed by other things on my critical path. I suspect this is not the case for others, so it may make sense to implement this as an opt-in feature. As always, any and all comments welcome.