ocrd workspace list-page: implement partitioning, #1140 #1141
This extends the functionality of `ocrd workspace list-page` with a number of options: different output formats, partitioning the list of pageIds into roughly equally sized chunks, and support for both pageId and numerical ranges. E.g. for a workspace with non-contiguous pageIds `PHYS_0001..PHYS_0006,PHYS_0008..PHYS_0009..PHYS_0021,PHYS_0023..PHYS_0029` (i.e. `PHYS_0007` and `PHYS_0021` missing, cf. the test workspace in the PR).

This uses `numpy.array_split` to do the chunking.

It's inefficient at the moment, since it calls `find_files` and then sorts the result, and I cannot rule out off-by-one errors in the indexing. But if the general behavior is what was asked for, I can optimize it properly.
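A minimal sketch of the described range selection and partitioning, for illustration only. `expand_range` and `partition` are hypothetical helpers, not the actual `ocrd` code or CLI options. It assumes page IDs sort correctly lexicographically (zero-padded, as in the test workspace) and that a "numerical range" means 1-based, inclusive positions in that sorted list. The chunking uses `numpy.array_split`, which allows uneven splits: the first `len % n` chunks each get one extra page.

```python
from typing import List, Sequence

import numpy as np


def expand_range(spec: str, page_ids: Sequence[str]) -> List[str]:
    """Hypothetical helper: resolve a comma-separated range spec against the
    page IDs that actually exist in the workspace.

    Assumed syntax: "PHYS_0005..PHYS_0009" (pageId range), "5..9" (numerical
    range = 1-based, inclusive positions in the sorted list) or a single ID.
    """
    ordered = sorted(page_ids)  # lexicographic; fine for zero-padded IDs
    selected: List[str] = []
    for part in spec.split(","):
        start, sep, end = part.strip().partition("..")
        if not sep:
            end = start  # a single page, not a range
        if start.isdigit() and end.isdigit():
            # Numerical range: convert 1-based inclusive bounds to a slice
            lo, hi = int(start) - 1, int(end)
        else:
            # pageId range: raises ValueError if an endpoint does not exist
            lo, hi = ordered.index(start), ordered.index(end) + 1
        selected.extend(ordered[lo:hi])
    return selected


def partition(page_ids: Sequence[str], n_chunks: int) -> List[List[str]]:
    """Split page IDs into n_chunks chunks whose sizes differ by at most one."""
    ids = list(page_ids)
    # Split the indices rather than the strings to keep plain str elements
    index_chunks = np.array_split(np.arange(len(ids)), n_chunks)
    return [[ids[i] for i in chunk] for chunk in index_chunks]


if __name__ == "__main__":
    # Mimics the test workspace: PHYS_0007 and PHYS_0021 are missing
    ids = [f"PHYS_{i:04d}" for i in range(1, 30) if i not in (7, 21)]
    print(expand_range("PHYS_0005..PHYS_0009", ids))
    print(expand_range("5..9", ids))
    print([len(chunk) for chunk in partition(ids, 4)])  # [7, 7, 7, 6]
```

With the gaps at `PHYS_0007` and `PHYS_0021`, the pageId range `PHYS_0005..PHYS_0009` selects four pages, while the numerical range `5..9` selects five (`PHYS_0005` through `PHYS_0010`). This is the kind of indexing difference where off-by-one errors tend to appear.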