Description
Currently, dump_npz either destroys all existing arrays in an npz file or appends arrays whose names already exist in the file as duplicated entries. These are rather strange semantics, especially since loading individual arrays with load_npz doesn't follow them.
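Since an npz file is just a zip archive of .npy members, the duplicate-entry behavior can be reproduced with Python's zipfile alone. This is a minimal sketch (the byte strings stand in for real .npy payloads):

```python
import io
import zipfile

# Appending a member whose name already exists in the archive adds a
# duplicate entry instead of replacing the old one.
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w") as zf:
    zf.writestr("a.npy", b"old payload")
with zipfile.ZipFile(buf, mode="a") as zf:
    zf.writestr("a.npy", b"new payload")  # duplicate, not a replacement

with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    data = zf.read("a.npy")  # Python's zipfile happens to resolve to the last entry

print(names)  # ['a.npy', 'a.npy'] -- both entries survive in the archive
```

At the zip-format level, which copy a reader returns for the duplicated name is unspecified, which is exactly why the semantics are confusing for load_npz.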
More reasonable semantics would be to replace existing arrays. This requires a context object, with a role similar to HighFive::File.
NumPy doesn't support this either; it only overwrites all arrays at once.
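For comparison, NumPy's overwrite-everything behavior is easy to see with np.savez (a small sketch):

```python
import os
import tempfile
import numpy as np

# np.savez has no update mode: every call rewrites the whole archive,
# so the second call below discards the array saved by the first.
path = os.path.join(tempfile.mkdtemp(), "data.npz")
np.savez(path, a=np.arange(3))
np.savez(path, b=np.ones(2))

with np.load(path) as npz:
    files = sorted(npz.files)

print(files)  # ['b'] -- 'a' was lost by the second savez
```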
I checked out libzippp and libzip++; neither works with streams. This is primarily because dump_npy_stream and libzip are both "push" interfaces (thus both need to drive the main loop), so they cannot work together without writing a special stream class that serves as a pipe, letting libzip pull data from it...
I think there is a rather simple way to support replace semantics given a context object. When an npz_file is opened for update, append as usual while keeping the central directory as a data structure in memory. When closing, write a temporary file containing only the up-to-date arrays, finish it with the central directory, and atomically move it over the old file. (It's possible to shrink a file in place, but I guess that would invalidate too many I/O buffers.)