


@kba
Member

@kba kba commented Aug 28, 2020

  • New class OcrdMetsFilter in ocrd_models that represents restrictions on files (currently inclusion/exclusion by fileGrp and mimetype)
  • ocrd workspace clone supports these new options:
    • --fileGrp-include
    • --fileGrp-exclude
    • --mimetype-include
    • --mimetype-exclude
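To make the intended include/exclude semantics concrete, here is a minimal sketch of a filter over fileGrp and mimetype. It is not the OcrdMetsFilter code from this PR: FileFilter and FileEntry are hypothetical names, and the sketch assumes that an empty include list means "no restriction" and that an exclusion always wins over an inclusion.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, NamedTuple


class FileEntry(NamedTuple):
    """Hypothetical stand-in for a METS file entry (not the ocrd_models OcrdFile API)."""
    file_grp: str
    mimetype: str


@dataclass
class FileFilter:
    """Hypothetical include/exclude filter on fileGrp and mimetype.

    Assumptions: an empty include list means "accept everything",
    and a value listed in an exclude list is always rejected.
    """
    filegrp_include: List[str] = field(default_factory=list)
    filegrp_exclude: List[str] = field(default_factory=list)
    mimetype_include: List[str] = field(default_factory=list)
    mimetype_exclude: List[str] = field(default_factory=list)

    @staticmethod
    def _passes(value: str, include: List[str], exclude: List[str]) -> bool:
        # Reject if there is an include list and the value is not on it.
        if include and value not in include:
            return False
        # Exclusions are checked last, so they override inclusions.
        return value not in exclude

    def accepts(self, f: FileEntry) -> bool:
        return (self._passes(f.file_grp, self.filegrp_include, self.filegrp_exclude)
                and self._passes(f.mimetype, self.mimetype_include, self.mimetype_exclude))

    def apply(self, files: Iterable[FileEntry]) -> List[FileEntry]:
        # Preserve input order, keep only accepted entries.
        return [f for f in files if self.accepts(f)]


if __name__ == "__main__":
    files = [
        FileEntry("OCR-D-IMG", "image/tiff"),
        FileEntry("OCR-D-GT-SEG", "application/vnd.prima.page+xml"),
        FileEntry("OCR-D-IMG-BIN", "image/png"),
    ]
    flt = FileFilter(filegrp_exclude=["OCR-D-IMG-BIN"],
                     mimetype_include=["image/tiff", "image/png"])
    print([f.file_grp for f in flt.apply(files)])  # ['OCR-D-IMG']
```

In this sketch the PAGE-XML file is dropped by the mimetype include list and OCR-D-IMG-BIN by the fileGrp exclude list, which is roughly what a clone restricted to source images would need.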

Proposed by @bertsky in #506

This started as a very rushed implementation because we needed the feature right away; the implementation has since been improved.

@codecov-commenter

codecov-commenter commented Aug 29, 2020

Codecov Report

Merging #582 into master will increase coverage by 0.66%.
The diff coverage is 99.03%.


@@            Coverage Diff             @@
##           master     #582      +/-   ##
==========================================
+ Coverage   84.60%   85.27%   +0.66%     
==========================================
  Files          49       50       +1     
  Lines        2813     2933     +120     
  Branches      550      577      +27     
==========================================
+ Hits         2380     2501     +121     
  Misses        332      332              
+ Partials      101      100       -1     
Impacted Files Coverage Δ
ocrd_utils/ocrd_utils/__init__.py 100.00% <ø> (ø)
ocrd_models/ocrd_models/ocrd_mets_filter.py 97.70% <97.70%> (ø)
ocrd/ocrd/cli/workspace.py 76.13% <100.00%> (-0.34%) ⬇️
ocrd/ocrd/decorators.py 95.78% <100.00%> (+4.12%) ⬆️
ocrd/ocrd/resolver.py 96.66% <100.00%> (+0.11%) ⬆️
ocrd_models/ocrd_models/__init__.py 100.00% <100.00%> (ø)
ocrd_models/ocrd_models/ocrd_mets.py 93.14% <100.00%> (+<0.01%) ⬆️
ocrd_models/ocrd_models/ocrd_xml_base.py 93.33% <100.00%> (+2.02%) ⬆️
ocrd_utils/ocrd_utils/str.py 90.81% <100.00%> (+1.28%) ⬆️
... and 1 more


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8dafbac...9fb27c0.

@kba
Member Author

kba commented Aug 29, 2020

Any preferences on the command line interface?

0

  --ID, --id PAT                  ID to include, string/regex/comma-separated
  --not-ID, --not-id PAT          ID to exclude, string/regex/comma-separated
  --mimetype PAT                  mimetype to include, string/regex/comma-separated
  --not-mimetype PAT              mimetype to exclude, string/regex/comma-separated
  --pageId, --pageid PAT          pageId to include, string/comma-separated
  --not-pageId, --not-pageid PAT  pageId to exclude, string/regex/comma-separated
  --fileGrp, --filegrp PAT        fileGrp to include, string/regex/comma-separated
  --not-fileGrp, --not-filegrp PAT
                                  fileGrp to exclude, string/regex/comma-separated

1

  --id PAT            ID to include, string/regex/comma-separated
  --not-id PAT        ID to exclude, string/regex/comma-separated
  --mimetype PAT      mimetype to include, string/regex/comma-separated
  --not-mimetype PAT  mimetype to exclude, string/regex/comma-separated
  --pageid PAT        pageId to include, string/comma-separated
  --not-pageid PAT    pageId to exclude, string/regex/comma-separated
  --filegrp PAT       fileGrp to include, string/regex/comma-separated
  --not-filegrp PAT   fileGrp to exclude, string/regex/comma-separated

2

  --id-include PAT        ID to include, string/regex/comma-separated
  --id-exclude PAT        ID to exclude, string/regex/comma-separated
  --mimetype-include PAT  mimetype to include, string/regex/comma-separated
  --mimetype-exclude PAT  mimetype to exclude, string/regex/comma-separated
  --pageid-include PAT    pageId to include, string/comma-separated
  --pageid-exclude PAT    pageId to exclude, string/regex/comma-separated
  --filegrp-include PAT   fileGrp to include, string/regex/comma-separated
  --filegrp-exclude PAT   fileGrp to exclude, string/regex/comma-separated

3

  --id PAT            ID to include, string/regex/comma-separated
  --not-ID PAT        ID to exclude, string/regex/comma-separated
  --mimetype PAT      mimetype to include, string/regex/comma-separated
  --not-mimetype PAT  mimetype to exclude, string/regex/comma-separated
  --pageid PAT        pageId to include, string/comma-separated
  --not-pageId PAT    pageId to exclude, string/regex/comma-separated
  --filegrp PAT       fileGrp to include, string/regex/comma-separated
  --not-fileGrp PAT   fileGrp to exclude, string/regex/comma-separated

kba added 8 commits August 30, 2020 00:29
OcrdMetsFilter: support regexes like OcrdMets.find_files does
OcrdMetsFilter: __str__ and test for no-arg call
OcrdMetsFilter: support mimetype, fileGrp, pageId, ID
OcrdMetsFilter: support lowercase synonyms
OcrdMetsFilter: more robust kwargs, regex matching
OcrdMetsFilter: more synonyms
ocrd workspace clone: support inclusion/exclusion in download by fileGrp, mimetype
ocrd workspace clone: Simplify filter logic, allow empty dict
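The "lowercase synonyms" and "more synonyms" commits suggest that several spellings of each keyword are accepted. One straightforward way to do that is a table mapping every accepted spelling to a canonical key; the table and normalize_kwargs below are a hypothetical illustration, not the actual synonym list of OcrdMetsFilter:

```python
from typing import Dict

# Hypothetical synonym table: accepted spelling -> canonical keyword.
_SYNONYMS = {
    "fileGrp": "fileGrp", "filegrp": "fileGrp", "file_grp": "fileGrp",
    "mimetype": "mimetype", "mimeType": "mimetype",
    "pageId": "pageId", "pageid": "pageId", "page_id": "pageId",
    "ID": "ID", "id": "ID",
}


def normalize_kwargs(**kwargs: str) -> Dict[str, str]:
    """Map synonym keywords to canonical ones, rejecting unknown or conflicting keys."""
    out: Dict[str, str] = {}
    for key, value in kwargs.items():
        try:
            canonical = _SYNONYMS[key]
        except KeyError:
            raise TypeError(f"unknown filter keyword: {key!r}") from None
        # Two spellings of the same keyword must not disagree.
        if canonical in out and out[canonical] != value:
            raise TypeError(f"conflicting values for {canonical!r}")
        out[canonical] = value
    return out


if __name__ == "__main__":
    print(normalize_kwargs(filegrp="OCR-D-IMG", id="FILE_0001"))
    # {'fileGrp': 'OCR-D-IMG', 'ID': 'FILE_0001'}
```

Raising on unknown keywords, instead of silently ignoring them, is one reading of "more robust kwargs": a typo like fielGrp then fails loudly instead of disabling the filter.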
@bertsky
Collaborator

bertsky commented Aug 31, 2020

> Any preferences on the command line interface?

I fail to see the difference between 1 and 3. But I would prefer the --not-* scheme over *-exclude/*-include.

What about a separate --not option that negates whichever option follows it, though?

Also, I think it would be better to use the same option names as the other workspace CLI commands:

  • -i | --file-id
  • -m | --mimetype
  • -g | --page-id
  • -G | --file-grp
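A --not modifier is easy to support because option parsers process options in command-line order. The sketch below uses Python's argparse purely for illustration (the ocrd CLI itself is built with click); NotAction and FilterAction are hypothetical, and each PAT is routed into a ..._include or ..._exclude list depending on whether --not came directly before it. It reuses the -G and -m short options listed above.

```python
import argparse


class NotAction(argparse.Action):
    """--not: mark the next filter option as an exclusion (takes no value)."""

    def __init__(self, option_strings, dest, **kwargs):
        super().__init__(option_strings, dest, nargs=0, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        setattr(namespace, self.dest, True)


class FilterAction(argparse.Action):
    """Append PAT to <dest>_include, or to <dest>_exclude if --not preceded it."""

    def __call__(self, parser, namespace, values, option_string=None):
        negate = getattr(namespace, "_negate_next", False)
        key = f"{self.dest}_{'exclude' if negate else 'include'}"
        setattr(namespace, key, (getattr(namespace, key, None) or []) + [values])
        # --not applies to exactly one following option.
        setattr(namespace, "_negate_next", False)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="clone-sketch")
    parser.add_argument("--not", action=NotAction, dest="_negate_next", default=False,
                        help="negate the filter option that follows")
    parser.add_argument("-G", "--file-grp", dest="file_grp", action=FilterAction, metavar="PAT")
    parser.add_argument("-m", "--mimetype", dest="mimetype", action=FilterAction, metavar="PAT")
    return parser


if __name__ == "__main__":
    ns = build_parser().parse_args(
        ["-G", "OCR-D-IMG", "--not", "-G", "OCR-D-IMG-BIN", "--not", "-m", "image/png"])
    print(ns.file_grp_include)   # ['OCR-D-IMG']
    print(ns.file_grp_exclude)   # ['OCR-D-IMG-BIN']
    print(ns.mimetype_exclude)   # ['image/png']
```

The trade-off is that the meaning of an option now depends on its neighbour, which is compact to type but harder to read in scripts than dedicated --not-* options.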

@kba
Member Author

kba commented Nov 23, 2023

The relevant part, filtering by file group, has now been implemented in #1139 with a simpler approach than the more generic one proposed here.

Since #1139 only targets file groups, and --file-grp/--not-file-grp would conflict with the regular --file-grp option, it uses -q/--include-file-grps and -Q/--exclude-file-grps instead.
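A sketch of how such include/exclude file group options could drive the selection of groups to clone. This is not the #1139 code: it assumes that -q and -Q may be repeated, take exact fileGrp names, and that omitting -q means "all file groups"; the actual implementation may parse them differently (for example as comma-separated lists). select_file_grps is a hypothetical helper.

```python
import argparse
from typing import List, Optional


def select_file_grps(all_grps: List[str],
                     include: Optional[List[str]],
                     exclude: Optional[List[str]]) -> List[str]:
    """Keep METS order; no -q means all groups, -Q always removes a group."""
    chosen = [g for g in all_grps if not include or g in include]
    return [g for g in chosen if not exclude or g not in exclude]


# Assumption: both options are repeatable and collect exact fileGrp names.
parser = argparse.ArgumentParser(prog="clone-sketch")
parser.add_argument("-q", "--include-file-grps", dest="include", action="append", metavar="GRP")
parser.add_argument("-Q", "--exclude-file-grps", dest="exclude", action="append", metavar="GRP")

if __name__ == "__main__":
    args = parser.parse_args(["-Q", "OCR-D-IMG-BIN", "-Q", "OCR-D-GT-SEG"])
    print(select_file_grps(["OCR-D-IMG", "OCR-D-IMG-BIN", "OCR-D-GT-SEG"],
                           args.include, args.exclude))  # ['OCR-D-IMG']
```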

@kba kba closed this Nov 23, 2023
@kba kba deleted the clone-filter branch November 23, 2023 12:41


4 participants