Conversation

@bertsky
Collaborator

@bertsky bertsky commented Dec 12, 2025

  • _page_worker: remove the ThreadPool mechanism introduced in 3cc4780 (which broke processors that are not thread-safe, like TF/Keras)
  • drop the timeout in uniprocessing, since no mechanism works to stop computation there (not even _thread.interrupt_main() or signal.alarm() would interrupt I/O or C library calls like libtesseract)
  • replace the process pool executor with pebble, since neither stdlib's nor loky's ProcessPoolExecutor enforces timeouts on jobs
  • apply the max_seconds timeout only in ProcessPool mode, which in turn is only used when running with a METS Server
  • mark test_run_output_timeout as xfail
  • add test_run_output_metsserver_timeout

see OCR-D/ocrd_anybaseocr#115 (comment) for context (plus internal discussion)
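
To illustrate why an enforced timeout needs a separate process, here is a minimal stdlib-only sketch. It is not the code from this PR and does not use pebble; it only shows the underlying idea that a job stuck in a blocking call can still be stopped by killing the process that runs it. The name `run_with_timeout` is hypothetical, and the `fork` start method is an assumption that restricts the sketch to Unix.

```python
import multiprocessing as mp
import time


def run_with_timeout(func, args=(), max_seconds=1.0):
    """Run func(*args) in a child process and kill it after max_seconds.

    Returns True if the job finished in time, False if it had to be killed.
    (Hypothetical helper for illustration only.)
    """
    # Assumption: Unix with the "fork" start method, so the sketch stays
    # self-contained without an importable main module.
    ctx = mp.get_context("fork")
    proc = ctx.Process(target=func, args=args)
    proc.start()
    proc.join(max_seconds)  # wait for at most max_seconds
    if proc.is_alive():
        # The job overran: kill the whole process. Unlike an exception
        # raised inside the worker, this does not depend on the job ever
        # returning control to the Python interpreter.
        proc.kill()
        proc.join()
        return False
    return proc.exitcode == 0


if __name__ == "__main__":
    # time.sleep stands in for a long-running blocking call
    print(run_with_timeout(time.sleep, (0.05,), max_seconds=5.0))
    print(run_with_timeout(time.sleep, (30,), max_seconds=0.2))
```

The price of this approach is that the job's result has to travel back through some inter-process channel, which is the bookkeeping a pool implementation takes care of.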

Member

@kba kba left a comment

Looks reasonable.

@bertsky
Collaborator Author

bertsky commented Dec 16, 2025

I wonder whether we should still keep some mechanism in the page worker, though – for those cases where our timeout mechanism does work even in uniprocessing. Like interrupting I/O wait or CPU-bound Pythonic computation with signal(), or with _thread.interrupt_main(). But then maybe in the ProcessPool case we would have to avoid these two racing against each other...

If we don't do that, we at least still have to update documentation (i.e. TIMEOUT does not apply without METS Server).

And, regardless, perhaps it would be better to have some actual test cases that cover the pathological case (simulating a long-lasting C library call like libtesseract).

EDIT: BTW, it's the same with KeyboardInterrupt: it works only if in a subprocess, but libtesseract calls are not (!) interruptible. (Perhaps we should take that to tesserocr, though...)
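
For reference, a signal-based timeout of the kind discussed above could look roughly like the following sketch. It is not taken from the PR; `PageTimeout` and `call_with_alarm` are hypothetical names, and it assumes Unix and that it is called from the main thread. It covers the CPU-bound Pythonic case, while a long call inside a C library would still delay the handler until that call returns.

```python
import signal


class PageTimeout(Exception):
    """Raised when the wrapped call exceeds its time budget (hypothetical)."""


def call_with_alarm(func, max_seconds, *args):
    """Call func(*args), raising PageTimeout after max_seconds.

    Assumptions: Unix (SIGALRM exists) and invocation from the main thread,
    because signal handlers can only be installed there.
    """
    def on_alarm(signum, frame):
        # Python runs this handler between bytecode instructions of the
        # main thread, so it cannot fire in the middle of a long C call.
        raise PageTimeout(f"no result after {max_seconds}s")

    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, max_seconds)
    try:
        return func(*args)
    finally:
        # Always cancel the timer and restore the previous handler
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, old_handler)
```

Combining this with a pool-level timeout would require deciding which of the two fires first, which is the race mentioned above.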

@kba
Member

kba commented Dec 16, 2025

Just quickly on this point:

> If we don't do that, we at least still have to update documentation (i.e. TIMEOUT does not apply without METS Server).

What use case beyond experimenting/developing OCR-D is there for non-METS-server deployment? If timeout and parallelization are relevant factors, users should use processing and METS server.

@bertsky
Collaborator Author

bertsky commented Dec 17, 2025

> What use case beyond experimenting/developing OCR-D is there for non-METS-server deployment? If timeout and parallelization are relevant factors, users should use processing and METS server.

For simplicity and backwards-compatibility, we still want to support isolated runs of processor CLIs. It would be a shame if v3 envvars do not work there, too.
