Conversation

@kba
Member

@kba kba commented Sep 26, 2023

This includes #1101 and updates the default logging config, both the builtin defaults and the config file, to offer the same behavior as the hard-coded logging setup in ocrd_network.

I did add a file handler for the Processing Server log, as it was before, so you get an idea of how to adapt the server cache logging as well.

I did not remove the hard-coded logging configuration in ocrd_network though, since @MehmedGIT & @joschrew know best what the exact behavior should be.

The easiest way to fine-tune the logging setup is to copy ocrd_utils/ocrd_logging.conf to $PWD (or $HOME), then remove the hard-coded logging setup, translate it into the config file syntax and add your changes to this PR. Then we can merge it into #1083 (as this PR proposes) or into master, if we decide to merge #1089 into master before finishing the logging stuff.
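For illustration, a minimal sketch of how such a copied config file could be loaded with the stdlib fileConfig mechanism that ocrd_logging.conf is written for; initLogging in ocrd_utils performs this lookup itself, so this only shows the mechanism, not OCR-D's actual code:

import logging.config
from pathlib import Path

# look for a user-provided ocrd_logging.conf in the current directory, then the home directory
for candidate in (Path.cwd() / "ocrd_logging.conf", Path.home() / "ocrd_logging.conf"):
    if candidate.exists():
        # keep loggers that were created before configuration alive
        logging.config.fileConfig(str(candidate), disable_existing_loggers=False)
        break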

Ideally, we should change the root logger from ocrd_network to ocrd.network. That way, we would only need one logger definition, but it's not urgent.
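A minimal sketch of why a dotted name under ocrd helps, assuming only the standard logging propagation rules (the concrete logger names are illustrative):

import logging

ocrd_logger = logging.getLogger("ocrd")
ocrd_logger.setLevel(logging.INFO)
ocrd_logger.addHandler(logging.StreamHandler())

# a child of "ocrd" propagates its records to the handler above automatically
logging.getLogger("ocrd.network.server").info("reaches the 'ocrd' handler")
# a separate root like "ocrd_network" needs its own logger definition in the config
logging.getLogger("ocrd_network.server").info("dropped unless configured separately")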

@kba kba requested review from MehmedGIT and joschrew September 26, 2023 17:17
@kba
Member Author

kba commented Sep 26, 2023

The test failure is due to a connection issue; the tests pass.

Contributor

@MehmedGIT MehmedGIT left a comment

When I ran a workflow to check whether everything was logged properly, I saw that only 2 main things were missing:

  1. The FastAPI and Uvicorn logging - I already suggested a solution. - Resolved! (The general mechanism is sketched after this list.)
  2. The logging that comes from the OCR-D processors is suppressed. I am not sure about the fix. This is the logging that is redirected by the Deployer with the xyz hack to get the pid. You can check the log files with the pattern /tmp/deployed_* to see that they have just 2 lines each - the initialization part.
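For point 1, a hedged sketch of the general mechanism only (not necessarily the solution referenced above): Uvicorn's loggers ("uvicorn", "uvicorn.error", "uvicorn.access") can be pointed at an existing handler; the file path here is illustrative.

import logging

server_log = logging.FileHandler("/tmp/ocrd_processing_server.log")  # illustrative path
for name in ("uvicorn", "uvicorn.error", "uvicorn.access"):
    uv_logger = logging.getLogger(name)
    uv_logger.setLevel(logging.INFO)
    uv_logger.addHandler(server_log)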

@MehmedGIT
Contributor

MehmedGIT commented Sep 28, 2023

What is still missing/not working properly regarding the logging (potentially new issues to be created):

  1. The Processor Servers cannot log to a file (under /tmp/ocrd_server_{processor_name}.{pid}.log); both their own output and the output of the specific OCR-D Processor are missing. Note that the Processing Server, which has a similar configuration to the Processor Server, can in fact log its output to a file (under /tmp/ocrd_processing_server.log). The separate log files for the caching mechanisms are also fine (under /tmp/ocrd_processing_server_cache_locked_pages.log and /tmp/ocrd_processing_server_cache_processing_requests.log).
  2. The Processing Workers can log their output (under /tmp/ocrd_worker_{processor_name}.{pid}.log) regarding the RabbitMQ message consumption and the final job status of the triggered OCR-D processor. This works as expected, considering that the output of the specific OCR-D processor should not go inside that log.
  3. The Deployer module, when deploying the Processing Workers, redirects the deployed processes' stdout+stderr matching the processor_name to the /tmp/deployed_{processor_name}.log file. That log file contains both the Processing Workers' output and the output of the OCR-D processors' separate processing steps (see the sketch after this list).
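To make point 3 concrete, a minimal sketch of that kind of redirect, under the assumption that a worker is started as a subprocess; the processor name and command line are placeholders, not the actual Deployer code.

import subprocess

processor_name = "ocrd-dummy"                       # placeholder
worker_cmd = ["echo", "processing worker started"]  # placeholder for the real worker command

with open(f"/tmp/deployed_{processor_name}.log", "a") as log_file:
    # stdout and stderr of the deployed process both end up in the per-processor log file
    subprocess.Popen(worker_cmd, stdout=log_file, stderr=subprocess.STDOUT)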

Suggestions:

  • It would be easier to wrap our heads around this if the run_processor and run_cli methods provided a way to redirect their stdout/stderr to a file path.
  • The run_cli method currently triggers a subprocess but only returns the return code of the finished process. If run_cli supported an extra parameter that accepts a file path and, when set, wrote the stdout+stderr of the finished process to that file path, that would be great (see the sketch after this list). At least it is the simplest approach I have in mind currently. Then the Processing Worker could just pass the correct logging file path based on the processing job_id.
  • The run_processor method is a bit different. Since no subprocess is started there, we would potentially have to find a way to get the stdout+stderr of the Pythonic processors. This could again happen by supporting an extra file path parameter which, when set, creates a logging file handler based on the provided path and duplicates the logging to that file.
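A minimal sketch of the run_cli extension suggested in the second bullet; the wrapper name and the log_filepath parameter are hypothetical, not the existing run_cli signature in ocrd.

import subprocess
from typing import List, Optional

def run_cli_sketch(cmd: List[str], log_filepath: Optional[str] = None) -> int:
    # run the processor CLI, optionally duplicating its stdout+stderr into a job-specific file
    result = subprocess.run(cmd, capture_output=True, text=True)
    if log_filepath:
        with open(log_filepath, "a") as log_file:
            log_file.write(result.stdout)
            log_file.write(result.stderr)
    return result.returncode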

@kba
Member Author

kba commented Sep 28, 2023

  • It would be easier to wrap our heads around this if the run_processor and run_cli methods provided a way to redirect their stdout/stderr to a file path.

For run_cli, it is easy to do and makes sense.

For run_processor, it's more complicated. Since these are called in the same process, redirecting STDOUT/STDERR is tricky. It is certainly possible to mess with sys.stdout, but there is a mechanism specifically for this - logging.

How about we add a processor-wide log attribute that holds the logger the processor uses in process, i.e. have

from ocrd_utils import getLogger

def __init__(...):
    self.log = getLogger('ocrd.ocrd-tesserocr-recognize')

Once we have instantiated the processor, we can add a logging.FileHandler to processor.log which points to a provided file path.
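A minimal sketch of that idea, using a stand-in class instead of a real processor; the logger name, class and log path are illustrative.

import logging

class DummyProcessor:
    """Stand-in for a processor exposing the proposed self.log attribute."""
    def __init__(self):
        self.log = logging.getLogger("ocrd.ocrd-tesserocr-recognize")
        self.log.setLevel(logging.INFO)

processor = DummyProcessor()
job_handler = logging.FileHandler("/tmp/ocrd_job_example.log")  # illustrative per-job path
job_handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s - %(message)s"))
processor.log.addHandler(job_handler)
processor.log.info("this record is now duplicated into the per-job file")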

@MehmedGIT
Contributor

Once we have instantiated the processor, we can add a logging.FileHandler to processor.log which points to a provided file path.

Not sure if that would be enough for what we are trying to achieve. Sure, adding a logging.FileHandler will capture everything related to that logger, but there will be no differentiation among multiple initializations of that specific processor. 3 Processing Workers initializing the same OCR-D processor will end up logging to the same file, since the logger name will be the same. This will not be the case if the logging.FileHandler is initialized inside the run_processor method. However, even then we still do not fully get what we are trying to achieve. Attaching the file handler to that specific logger will omit other relevant logging output from other loggers, say ocrd.workspace. Is that what we want to achieve, though?
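A minimal sketch of the last point, assuming only standard logging propagation (paths and names illustrative): a handler attached to the processor's own logger never sees records from sibling loggers such as "ocrd.workspace", because records only propagate upwards, not sideways.

import logging

handler = logging.FileHandler("/tmp/ocrd-tesserocr-recognize.log")  # illustrative path
logging.getLogger("ocrd.ocrd-tesserocr-recognize").addHandler(handler)

logging.getLogger("ocrd.ocrd-tesserocr-recognize").warning("captured in the per-processor file")
logging.getLogger("ocrd.workspace").warning("not captured by that handler")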

@MehmedGIT
Contributor

After #1109, it is now clear how we can achieve processing-job-level file logging.

@kba kba merged commit d253492 into workflow-endpoint Oct 11, 2023
@kba kba deleted the workflow-endpoint-logging-2023-09-26 branch October 11, 2023 11:25