@kba
Member

@kba kba commented Jan 4, 2024

No description provided.

@kba kba requested review from MehmedGIT and bertsky January 4, 2024 14:42
Contributor

@MehmedGIT MehmedGIT left a comment

Even with the suggested change, the processing job output is unaffected (good!), since the log level for jobs can be set separately (currently only programmatically; it defaults to DEBUG):

09:52:28.340 DEBUG ocrd.processor.helpers.run_processor - Running processor <class 'ocrd_cis.ocropy.binarize.OcropyBinarize'>
09:52:28.341 DEBUG ocrd.processor.helpers.run_processor - Processor instance <ocrd_cis.ocropy.binarize.OcropyBinarize object at 0x7fb50053d4c0> (ocrd-cis-ocropy-binarize v0.1.5 doing preprocessing/optimization/binarization)
09:52:28.341 DEBUG ocrd.mets_client[/tmp/ocrd_network_sockets/_home_mm_repos_ocrd_network_tests_ws29_data_mets_xml.sock] - find_files({'mimetype': None, 'page_id': 'PHYS_0007', 'file_grp': 'DEFAULT'})
09:52:28.342 DEBUG ocrd.processor.base - adding file FILE_0007_DEFAULT for page PHYS_0007 to input file group DEFAULT
09:52:28.343 INFO ocrd.workspace.download_file - 'local_filename' DEFAULT/FILE_0007_DEFAULT.jpg already within /home/mm/repos/ocrd_network_tests/ws29/data, nothing to do
09:52:28.384 DEBUG ocrd.mets_client[/tmp/ocrd_network_sockets/_home_mm_repos_ocrd_network_tests_ws29_data_mets_xml.sock] - find_files({'local_filename': 'DEFAULT/FILE_0007_DEFAULT.jpg'})
09:52:28.418 DEBUG ocrd.mets_client[/tmp/ocrd_network_sockets/_home_mm_repos_ocrd_network_tests_ws29_data_mets_xml.sock] - find_files({'local_filename': 'DEFAULT/FILE_0007_DEFAULT.jpg'})
09:52:28.437 DEBUG ocrd.workspace.image_from_page - page 'FILE_0007_DEFAULT' has  orientation=0 skew=0.00
09:52:31.289 DEBUG ocrd.workspace.add_file - outputfile file_grp=OCR-D-BINPAGE local_filename=OCR-D-BINPAGE/FILE_0007_OCR-D-BINPAGE.IMG-BIN.png content=True
09:52:31.293 INFO ocrd.workspace.save_image_file - created file ID: FILE_0007_OCR-D-BINPAGE.IMG-BIN, file_grp: OCR-D-BINPAGE, path: OCR-D-BINPAGE/FILE_0007_OCR-D-BINPAGE.IMG-BIN.png
09:52:31.293 DEBUG ocrd.workspace.add_file - outputfile file_grp=OCR-D-BINPAGE local_filename=OCR-D-BINPAGE/FILE_0007_OCR-D-BINPAGE.xml content=True
09:52:31.295 INFO ocrd.process.profile - Executing processor 'ocrd-cis-ocropy-binarize' took 2.954705s (wall) 2.868763s (CPU)( [--input-file-grp='DEFAULT' --output-file-grp='OCR-D-BINPAGE' --parameter='{"dpi": 300, "method": "ocropy", "threshold": 0.5, "grayscale": false, "maxskew": 0.0, "noise_maxsize": 0, "level-of-operation": "page"}' --page-id='PHYS_0007']

The output of the Processing Server, however, is affected: it now logs only at the INFO level. Achieving the previous state would require creating an ocrd_logging.conf file in the home directory (assuming all third-party libraries are known). Note that this will also have side effects on other projects. I have not tested it yet, but Operandi should be affected as well, since the root logger is affected. Having an ocrd_logging.conf file in the home directory may not always be ideal:
1) Depending on the user running on the HPC, the log results may differ (yet to be tested).
2) With the ocrd_networking module, when processors are deployed on different hosts, this may lead to log inconsistencies depending on whether a host has a log configuration or not. All hosts would potentially have to maintain the same ocrd_logging.conf file.
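
To illustrate the effect described above, here is a minimal sketch using only the Python standard library, not the actual OCR-D code. The logger names `demo.job` and `thirdparty.lib` and the function `demo_levels` are hypothetical. It shows that lowering the root logger to INFO drops DEBUG records from loggers that inherit their level, while a logger whose level is set explicitly keeps emitting DEBUG records:

```python
import io
import logging


def demo_levels():
    """Return the records that survive with root at INFO and one logger at DEBUG."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s - %(message)s"))

    root = logging.getLogger()
    previous_level = root.level
    root.addHandler(handler)
    root.setLevel(logging.INFO)  # the downgraded default for everything

    # Hypothetical job logger whose level is set programmatically.
    logging.getLogger("demo.job").setLevel(logging.DEBUG)
    try:
        # Child of demo.job: inherits DEBUG, so this record is emitted.
        logging.getLogger("demo.job.step").debug("job detail")
        # No explicit level: inherits INFO from the root logger.
        logging.getLogger("thirdparty.lib").debug("library detail")
        logging.getLogger("thirdparty.lib").info("library info")
    finally:
        root.removeHandler(handler)
        root.setLevel(previous_level)
    return stream.getvalue().splitlines()


if __name__ == "__main__":
    for line in demo_levels():
        print(line)
```

Restoring DEBUG output for a third-party library would therefore mean naming its logger explicitly, which is why the set of libraries has to be known in advance.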

This PR can be merged.

@kba
Member Author

kba commented Jan 8, 2024

Achieving the previous state would require creating an ocrd_logging.conf file in the home directory

It does not have to be in the home directory; it can also be in the working directory of the process that calls initLogging.
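
As a rough sketch of that lookup (an assumption-laden illustration, not the actual `initLogging` implementation): the helper names `find_logging_config` and `init_logging_sketch` are hypothetical, and the search order (working directory first, then home directory) is an assumption made for this example.

```python
import logging
import logging.config
from pathlib import Path

CONFIG_NAME = "ocrd_logging.conf"


def find_logging_config(search_dirs=None, name=CONFIG_NAME):
    """Return the first existing config file in search_dirs, or None."""
    if search_dirs is None:
        # Assumed order for this sketch: working directory, then home directory.
        search_dirs = [Path.cwd(), Path.home()]
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


def init_logging_sketch(search_dirs=None):
    """Apply the config file if one is found, else fall back to INFO."""
    config = find_logging_config(search_dirs)
    if config is not None:
        # fileConfig expects the standard [loggers]/[handlers]/[formatters] layout.
        logging.config.fileConfig(str(config), disable_existing_loggers=False)
    else:
        logging.basicConfig(level=logging.INFO)
    return config
```

With a lookup like this, a deployment can place the file next to the process it starts instead of relying on the account's home directory.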

Depending on the user running on the HPC, the log results may differ (yet to be tested).

Not sure I follow - why is the processing server deployed to HPC?

With the ocrd_networking module, when processors are deployed on different hosts, this may lead to log inconsistencies depending on whether a host has a log configuration or not. All hosts would potentially have to maintain the same ocrd_logging.conf file.

In practice - at least that is how I have been testing it - you would need to set up a dedicated account for the processing workers in any case, to set up a minimal shell environment (setting $PATH to include the venv for example) and restrict that account as much as possible, since it can be accessed via SSH. So for production deployment, especially to many hosts, users will use ansible or similar to automate that process. Deploying the ocrd_logging.conf would be one step of that. Do we have that part of the deployment process documented already, beyond https://pad.gwdg.de/Ty6IXzhIRa6AvDdC4kTy_g (which implies this setup is already done)?

@MehmedGIT
Contributor

Not sure I follow - why is the processing server deployed to HPC?

It is not related to that. The Operandi Server runs on a VM and has a different environment and file system than the ocrd_all container that runs inside the HPC. The ocrd_all Singularity image is stored in a shared location (scratch storage), and every Operandi user (through their GWDG account) reuses that image. I have just realized that this is not an issue. I could still try to provide the default logging configuration file via volume mapping, to avoid searching for it in the users' home directories.

In practice - at least that is how I have been testing it - you would need to set up a dedicated account for the processing workers in any case, to set up a minimal shell environment (setting $PATH to include the venv for example) and restrict that account as much as possible, since it can be accessed via SSH.

Yes, that is how it should be in practice.

Deploying the ocrd_logging.conf would be one step of that. Do we have that part of the deployment process documented already, beyond https://pad.gwdg.de/Ty6IXzhIRa6AvDdC4kTy_g (which implies this setup is already done)?

No.

@kba kba merged commit 65bdf92 into master Jan 8, 2024
@kba kba deleted the logging-downgrade-level branch January 8, 2024 12:29