Keep local copies of files in a separate mets:FLocat #1079
# Conflicts: # ocrd_models/ocrd_models/ocrd_file.py
When the files are downloaded, the output shows one relative file path per line. When I undid the download, it returned just
Good point, I hadn't thought about that. Should be fixed in 0f26809. For the kant_aufklaerung_1784 test asset: and the reverse:
Yes, that output is more convenient. Great!
One early decision that has haunted us for years now is that we have been using a single `mets:FLocat` for both the original URL of a `mets:file` and the local copy in the workspace we use for processing.

This PR tries to solve #323 by changing `OcrdFile` and the download logic in `Resolver` and `Workspace`:

- The original URL stays in `mets:FLocat[@LOCTYPE="URL"]/xlink:href`.
- The local filename (`OcrdFile.local_filename`) will now be written to an additional `mets:FLocat[@LOCTYPE="OTHER"][@OTHERLOCTYPE="FILE"]/xlink:href`.
- The contract of `Workspace.download_file` is that after calling it with `OcrdFile f`, `f` will have a `local_filename` attribute, and that is what processors should use rather than the `url`.
- The logic in `Resolver.download_to_directory` and `Workspace.download_file` has been adapted accordingly.

The goal here is to make OCR-D processing non-invasive. Currently, once you run `ocrd workspace find --download`, the original URL is gone. With this PR, `ocrd workspace find --download` adds an additional `mets:FLocat`, which can then be removed after processing is finished (to be compliant with the DFG Viewer METS profile) with `ocrd workspace find --undo-download`.
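To illustrate the two-`mets:FLocat` layout described above, here is a minimal sketch that uses only Python's standard-library `xml.etree.ElementTree`. It does not use the actual `OcrdFile`, `Resolver` or `Workspace` code from this PR. The helper names (`add_local_copy`, `remove_local_copy`, `href`), the file ID, the URL and the local path are all hypothetical and chosen for illustration only.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
NS = {"mets": METS_NS, "xlink": XLINK_NS}
XLINK_HREF = f"{{{XLINK_NS}}}href"

# The two locations distinguished in the PR description
URL_XPATH = 'mets:FLocat[@LOCTYPE="URL"]'
LOCAL_XPATH = 'mets:FLocat[@LOCTYPE="OTHER"][@OTHERLOCTYPE="FILE"]'

# Hypothetical mets:file entry as it might look before any download
METS_FILE = """
<mets:file xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink"
           ID="FILE_0001" MIMETYPE="image/tiff">
  <mets:FLocat LOCTYPE="URL" xlink:href="https://example.org/images/0001.tif"/>
</mets:file>
"""


def add_local_copy(file_el, local_filename):
    """Record a local copy in its own FLocat, leaving the URL FLocat untouched."""
    flocat = file_el.find(LOCAL_XPATH, NS)
    if flocat is None:
        flocat = ET.SubElement(
            file_el,
            f"{{{METS_NS}}}FLocat",
            {"LOCTYPE": "OTHER", "OTHERLOCTYPE": "FILE"},
        )
    flocat.set(XLINK_HREF, local_filename)


def remove_local_copy(file_el):
    """Drop the local-copy FLocat again, roughly what an undo would amount to."""
    for flocat in file_el.findall(LOCAL_XPATH, NS):
        file_el.remove(flocat)


def href(file_el, xpath):
    """Return the xlink:href of the first FLocat matching xpath, or None."""
    el = file_el.find(xpath, NS)
    return None if el is None else el.get(XLINK_HREF)


file_el = ET.fromstring(METS_FILE)
add_local_copy(file_el, "OCR-D-IMG/0001.tif")
print(href(file_el, URL_XPATH))    # original URL is still present
print(href(file_el, LOCAL_XPATH))  # local copy recorded separately
remove_local_copy(file_el)
print(href(file_el, LOCAL_XPATH))  # local copy entry removed again
```

Because the local path lives in its own element, removing it leaves the `mets:file` as it was before the download, which is the non-invasive behaviour the PR aims for.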