Description
Starting from version 2.21, git supports multi-pack-index, which allows for an O(log n) lookup among n objects spread across m packfiles (instead of O(m log(n/m))) without needing to do a garbage collection to compact all the objects into a single .pack file. This is desirable in cases where there are lots of delta islands or it is otherwise undesirable to merge large numbers of packfiles.
This should be relatively straightforward to implement. odb_pack.c would need to be modified so that the multi-pack-index's index is consulted first. Only if an OID is not found there should it fall back to the current behavior of iterating over all the .pack files, although packfile_load__cb() should be modified to ignore any files that are indexed in multi-pack-index (to avoid degenerating back into O(m log(n/m)) complexity).
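
The lookup order described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the actual odb_pack.c code: every type and function name here (sketch_oid, sketch_index, sketch_pack, sketch_backend, index_contains, backend_contains) is hypothetical, OIDs are modeled as plain 20-byte arrays, and each index is modeled as a sorted in-memory array. For brevity the sketch skips covered packs inside the lookup loop, whereas the proposal is to skip them when the packs are loaded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified types. None of these are libgit2 structures. */
#define SKETCH_OID_SIZE 20

typedef struct {
    unsigned char id[SKETCH_OID_SIZE];
} sketch_oid;

typedef struct {
    const sketch_oid *oids; /* sorted ascending by memcmp() */
    size_t count;
} sketch_index;

typedef struct {
    sketch_index index; /* stands in for the pack's own .idx */
    int in_midx;        /* nonzero if the multi-pack-index covers this pack */
} sketch_pack;

typedef struct {
    const sketch_index *midx; /* NULL when no multi-pack-index exists */
    const sketch_pack *packs;
    size_t pack_count;
} sketch_backend;

/* Binary search over one sorted OID table: O(log count). */
static int index_contains(const sketch_index *idx, const sketch_oid *oid)
{
    size_t lo = 0, hi = idx->count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = memcmp(idx->oids[mid].id, oid->id, SKETCH_OID_SIZE);

        if (cmp == 0)
            return 1;
        if (cmp < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}

/* Returns 1 if the OID is found, 0 otherwise. */
static int backend_contains(const sketch_backend *b, const sketch_oid *oid)
{
    size_t i;

    assert(b != NULL && oid != NULL);

    /* 1. Consult the multi-pack-index first: a single O(log n) search. */
    if (b->midx && index_contains(b->midx, oid))
        return 1;

    /* 2. Fall back to the individual packs, skipping any pack that the
     *    multi-pack-index already covers so it is not searched twice. */
    for (i = 0; i < b->pack_count; i++) {
        if (b->midx && b->packs[i].in_midx)
            continue;
        if (index_contains(&b->packs[i].index, oid))
            return 1;
    }
    return 0;
}
```

With this shape, the cost of a miss is one binary search over the multi-pack-index plus one binary search per pack that is not yet indexed, so the fallback loop stays cheap as long as most packs are covered.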
This can be split into four chunks for easier reviewing:
- Support for parsing multi-pack-index files, and a fuzzer because parsing is hard. midx: Introduce a parser for multi-pack-index files #5401 (merged)
- Support for reading existing multi-pack-index files. midx: Support multi-pack-index files in odb_pack.c #5403 (merged)
- Support for creating a new multi-pack-index file from a set of .pack files. midx: Add a way to write multi-pack-index files #5404
- Support for creating / updating a multi-pack-index file from an open repository. midx: Introduce git_odb_write_multi_pack_index() #5405
Relevant documentation: