@MehmedGIT
Contributor

@MehmedGIT MehmedGIT commented Oct 19, 2023

This PR fixes logging errors of the following type:

--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.7/logging/__init__.py", line 1028, in emit
    stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.

EDIT: The simple fix was to force InitLogging() only for Python processors, not for bashlib processors.
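For context, this kind of error can be reproduced with the standard library alone. The sketch below is not the OCR-D code: the logger name and both helper functions are made up for illustration. It assumes the cause is a handler that still holds a reference to a stream that has since been closed, and shows that re-creating the handler on a live stream makes logging work again.

```python
import contextlib
import io
import logging


def log_to_closed_stream():
    """Log through a handler whose stream was closed; return logging's report."""
    stream = io.StringIO()
    logger = logging.getLogger("closed_stream_demo")  # hypothetical logger name
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)  # the handler keeps a reference to `stream`
    logger.addHandler(handler)

    stream.close()  # e.g. a redirection target is closed while the handler lives on

    report = io.StringIO()
    with contextlib.redirect_stderr(report):
        # emit() fails on the closed stream; logging catches the exception
        # and prints a "--- Logging error ---" report to sys.stderr instead.
        logger.info("hello")
    logger.removeHandler(handler)
    return report.getvalue()


def log_after_reinit():
    """Re-create the handler on a fresh stream, then log successfully."""
    stream = io.StringIO()
    logger = logging.getLogger("closed_stream_demo")
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    logger.info("hello")
    logger.removeHandler(handler)
    return stream.getvalue()
```

Calling `log_to_closed_stream()` returns the report that `logging` would normally print to stderr, while `log_after_reinit()` returns the message that was actually written.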

Member

@kba kba left a comment

Any idea why the STDERR/STDOUT redirection context manager does not work for the run_cli method as well?

But if it fixes the problem, then let's run with it.

What do you mean by the bulk-add issue? Do you mean the --mets-server-url argument for the bashlib processors?

@MehmedGIT
Contributor Author

> Any idea why the STDERR/STDOUT redirection context manager does not work for the run_cli method as well?

Because run_processor, unlike run_cli, does not create a separate process. With run_processor we obtain the processor as a Python object and call it in-process, whereas for bashlib processors each call spawns a new process whose output has to be redirected separately.
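The difference can be sketched with the standard library. This is only an illustration of the general mechanism, not the actual run_processor or run_cli code: the function names and the child command are hypothetical. A redirection context manager swaps the Python-level `sys.stdout` object, whereas a child process inherits the underlying file descriptor and never sees that object.

```python
import contextlib
import io
import subprocess
import sys
import tempfile

# Hypothetical stand-in for a processor that runs as a separate process.
CHILD = [sys.executable, "-c", "print('from child')"]


def in_process():
    """An in-process call writes via sys.stdout, so redirect_stdout captures it."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        print("from parent")
    return buf.getvalue()


def subprocess_under_redirect():
    """The child inherits file descriptor 1, not the replaced sys.stdout object."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        subprocess.run(CHILD, check=True)
    return buf.getvalue()  # empty: the child's output bypassed `buf`


def subprocess_to_file():
    """Redirect the child's output explicitly, once per call."""
    with tempfile.TemporaryFile(mode="w+") as log_file:
        subprocess.run(CHILD, stdout=log_file, stderr=subprocess.STDOUT, check=True)
        log_file.seek(0)
        return log_file.read()
```

Only the third variant captures the child's output, which is why a subprocess-based call needs its own redirection each time it is invoked.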

> What do you mean by the bulk-add issue? Do you mean the --mets-server-url argument for the bashlib processors?

The ocrd workspace ... command in bashlib processors had to accept --mets-server-url for the bulk-add to work. We discussed that in our VC today.

@MehmedGIT
Contributor Author

MehmedGIT commented Oct 19, 2023

> Any idea why the STDERR/STDOUT redirection context manager does not work for the run_cli method as well?

Maybe there is a way to make it work without the extra redirection to a file inside run_cli, but I did not find a working solution.

EDIT: There is a way indeed. The problem was that InitLogging() was called in the wrong place.

@MehmedGIT
Contributor Author

olena-binarize runs. The changes are reflected in the local METS file but not in the METS server. Hence, the next processor fails to find the required data:

Traceback (most recent call last):
  File "/home/mm/Desktop/core/ocrd_network/ocrd_network/process_helpers.py", line 45, in invoke_processor
    log_level='DEBUG'
  File "/home/mm/Desktop/core/ocrd/ocrd/processor/helpers.py", line 134, in run_processor
    raise err
  File "/home/mm/Desktop/core/ocrd/ocrd/processor/helpers.py", line 131, in run_processor
    processor.process()
  File "/home/mm/venv37-ocrd/lib/python3.7/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_cropping.py", line 763, in process
    for (n, input_file) in enumerate(self.input_files):
  File "/home/mm/Desktop/core/ocrd/ocrd/processor/base.py", line 291, in input_files
    ret = self.zip_input_files(mimetype=None, on_error='abort')
  File "/home/mm/Desktop/core/ocrd/ocrd/processor/base.py", line 359, in zip_input_files
    raise ValueError(msg)
ValueError: Could not find any files for --page-id PHYS_0007 - compare 'PHYS_0007' with the output of 'orcd workspace list-page'.

@kba, do you have ideas?

@MehmedGIT
Contributor Author

MehmedGIT commented Oct 19, 2023

I think this is the reason, as we discussed in our VC.
OCR-D/ocrd_olena#94

It is not urgent to make bashlib processors work. However, we may want to track that in a separate issue and merge this PR.

@kba
Member

kba commented Oct 19, 2023

> olena-binarize runs. The changes are reflected in the local METS file but not in the METS server. Hence, the next processor fails to find the required data:
>
> Traceback (most recent call last):
>   File "/home/mm/Desktop/core/ocrd_network/ocrd_network/process_helpers.py", line 45, in invoke_processor
>     log_level='DEBUG'
>   File "/home/mm/Desktop/core/ocrd/ocrd/processor/helpers.py", line 134, in run_processor
>     raise err
>   File "/home/mm/Desktop/core/ocrd/ocrd/processor/helpers.py", line 131, in run_processor
>     processor.process()
>   File "/home/mm/venv37-ocrd/lib/python3.7/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_cropping.py", line 763, in process
>     for (n, input_file) in enumerate(self.input_files):
>   File "/home/mm/Desktop/core/ocrd/ocrd/processor/base.py", line 291, in input_files
>     ret = self.zip_input_files(mimetype=None, on_error='abort')
>   File "/home/mm/Desktop/core/ocrd/ocrd/processor/base.py", line 359, in zip_input_files
>     raise ValueError(msg)
> ValueError: Could not find any files for --page-id PHYS_0007 - compare 'PHYS_0007' with the output of 'orcd workspace list-page'.
>
> @kba, do you have ideas?

If ocrd-olena-binarize is not using the METS server but writing to the METS file directly, then the METS server's OcrdMets is out of date. AFAICT we don't currently have an endpoint that calls workspace.reload_mets().

So we either add a POST /reload endpoint that reloads the METS in the METS server, or we add support for --mets-server-url to ocrd-olena-binarize. Ideally we should do both.
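The staleness problem can be illustrated with a small standard-library sketch. Everything here is hypothetical: `CachedMets` is a stand-in for a server that keeps a parsed METS in memory, not the real METS server or OcrdMets class, and the XML is a simplified, namespace-free placeholder. It shows that a direct write to the file is invisible to the in-memory copy until something triggers a re-parse, which is what a reload endpoint would do.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


class CachedMets:
    """Hypothetical stand-in for a server holding a parsed METS in memory."""

    def __init__(self, path):
        self.path = path
        self.reload()

    def reload(self):
        """Re-parse the file from disk; a reload endpoint could call this."""
        self.root = ET.parse(self.path).getroot()

    def file_ids(self):
        """List the IDs of all <file> elements in the in-memory copy."""
        return [el.get("ID") for el in self.root.iter("file")]


def demo():
    """Return the file IDs seen before and after reloading the cached copy."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mets.xml")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write('<mets><file ID="IMG_0007"/></mets>')

        server = CachedMets(path)

        # A processor without server support writes to the file directly.
        tree = ET.parse(path)
        ET.SubElement(tree.getroot(), "file", ID="BIN_0007")
        tree.write(path)

        stale = server.file_ids()  # the in-memory copy has not changed
        server.reload()
        fresh = server.file_ids()  # now includes the directly written entry
        return stale, fresh
```

Adding server support to the processor avoids the direct write altogether, while a reload hook only repairs the in-memory copy after the fact, which is one reason doing both is attractive.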

Base automatically changed from fix-bashlib-log to master October 19, 2023 12:15
@kba kba merged commit cedb0c4 into master Oct 19, 2023
@kba kba deleted the fix-bashlib-log2 branch October 19, 2023 14:18