Implementation of the resource manager server (issue #1294) #1309

MehmedGIT · 2025-02-18T13:39:19Z

A PR draft for addressing #1294. Still in progress.

What has been added so far:

Support ocrd network resmgr-server --address host:port for triggering Resource Manager Server (RMS) in the background
For each host mentioned in the PS config file, the deployer will deploy a resource manager server on port 45555 of that host
Basic list-available functionality for the RMS
Basic list-installed functionality for the RMS
A partial fix for #1251, "resmgr list-installed only knows about 3 processors with preconfigured resources" (see the detailed comment from @bertsky)
Refactoring of core resource manager

kba

So far so good, though I still need to understand how exactly this works

src/ocrd_network/resource_manager_server.py

MehmedGIT · 2025-03-04T12:47:07Z

I have difficulty deciding how the download functionality should work. There is a split of responsibilities between OcrdResourceManager.download and ocrd.cli.resmgr.download. The former method is used inside the latter one. If I reuse the former one, what should the endpoint itself return? It is just a file path as it is now, but paths will not matter for the end user of the endpoint. However, if I use the latter method, there are many side effects that we may not want in the download endpoint, for example the progress bar, the resource database, etc. Moreover, the logger prints are useful for the CLI, but should they also appear in the response?

I will use the ocrd.cli.resmgr.download method to simulate the CLI experience. But that is not optimal for the ocrd_network.

@bertsky, @kba

bertsky · 2025-03-04T13:33:34Z

@MehmedGIT indeed, this calls for more aggressive refactoring. IMO the high-level ocrd.cli.resmgr.download (checking executable is installed, checking resource is available, discerning local paths from URLs, resolving target location, providing resulting name/path) should become independent of the CLI, i.e. part of ocrd.resource_manager (but not necessarily OcrdResourceManager), and get passed the logger and progress_cb as kwargs.

Perhaps like this:

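import logging
from typing import Any, Callable, Optional

from ocrd.resource_manager import OcrdResourceManager

def download(
    logger: Optional[logging.Logger],
    progress_cb: Callable[[int], Any],
    resmgr: OcrdResourceManager,
    any_url: Optional[str],
    no_dynamic: bool,
    resource_type: str,
    path_in_archive: str,
    allow_uninstalled: bool,
    overwrite: bool,
    location: str,
    executable: str,
    name: str
):
    # current implementation of cli.resmgr.download
    ...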
    # current implementation of cli.resmgr.download

(or perhaps as kwargs with their own defaults, perhaps returning the fpath and usable resource name as tuple instead of just logging it)

The ResourceManagerServer could then call this to its own liking.

MehmedGIT · 2025-03-04T13:37:41Z

@bertsky, agree. For now, I will mimic the ocrd.cli.resmgr.download method and append the logger print statements to a list, which will be returned. Instead of sys.exit(1), I will raise HTTPException and will get rid of the progress bar but still keep the database till I decide how to refactor that away properly.
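One way to collect the logger output into a list, as described above, is a small logging handler. This is only a sketch: ListHandler, run_with_captured_logs and fake_download are hypothetical names, not part of ocrd or ocrd_network, and the messages are made up for illustration.

```python
import logging
from typing import Callable, List


class ListHandler(logging.Handler):
    """Logging handler that stores formatted messages in a list."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(self.format(record))


def run_with_captured_logs(func: Callable[[logging.Logger], None],
                           logger: logging.Logger) -> List[str]:
    """Run func(logger) and return everything it logged (hypothetical helper)."""
    handler = ListHandler()
    logger.addHandler(handler)
    try:
        func(logger)
    finally:
        # Always detach, so repeated requests do not accumulate handlers
        logger.removeHandler(handler)
    return handler.messages


def fake_download(logger: logging.Logger) -> None:
    # Stand-in for the real download logic
    logger.info("Downloading resource")
    logger.info("Installed resource")


log = logging.getLogger("rms-sketch")
log.setLevel(logging.INFO)
captured = run_with_captured_logs(fake_download, log)
```

The returned list could then be placed into the endpoint response body, while the CLI keeps its normal logging setup.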

bertsky · 2025-03-04T13:47:58Z

Indeed, the new/refactored download would also need a general exception (just ValueError?) instead of exit.

Perhaps it's a good time to remove the entire loathed user database facility from resmgr, too? (cf. #1251)

MehmedGIT · 2025-03-04T14:17:17Z

Indeed, the new/refactored download would also need a general exception (just ValueError?) instead of exit.

HTTPException instead of ValueError; we do not want to crash the resource manager server on a wrong request, right?

Perhaps it's a good time to remove the entire loathed user database facility from resmgr, too? (cf. #1251)

I would suggest reconsidering the wildcard ('*') options as well and requiring the processor and model name. Otherwise, the return of a response may take more than 30 minutes and effectively block the server for other requests if a wildcard is used for a processor name. At least, it does with the current implementation.

bertsky · 2025-03-04T14:36:51Z

Indeed, the new/refactored download would also need a general exception (just ValueError?) instead of exit.

HTTPException instead of ValueError; we do not want to crash the resource manager server on a wrong request, right?

For your interim implementation, yes. I was again referring to the proposed refactored ocrd.resource_manager.download function (which would need to accommodate both use-cases, so the ResourceManagerServer would re-raise into HTTPException, while the CLI would just catch and exit).
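A rough sketch of that split, assuming a shared function that raises ValueError and two thin callers. Everything here is illustrative: HTTPError stands in for the web framework's HTTPException, the two-argument download signature is simplified, and the returned path is a placeholder rather than the real target location.

```python
import sys
from typing import Tuple


class HTTPError(Exception):
    """Stand-in for a web framework's HTTP exception (hypothetical)."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def download(executable: str, name: str) -> Tuple[str, str]:
    """Shared logic: return (path, resource name) or raise ValueError."""
    if not executable or not name:
        raise ValueError("executable and name are both required")
    # Placeholder path; the real function would resolve the target location
    return f"/tmp/ocrd-resources/{executable}/{name}", name


def server_download(executable: str, name: str) -> dict:
    """Server-side caller: translate ValueError into an HTTP error."""
    try:
        fpath, resname = download(executable, name)
    except ValueError as exc:
        raise HTTPError(status_code=400, detail=str(exc)) from exc
    return {"path": fpath, "name": resname}


def cli_download(executable: str, name: str) -> None:
    """CLI-side caller: report the error and exit non-zero."""
    try:
        fpath, resname = download(executable, name)
    except ValueError as exc:
        print(f"Error: {exc}", file=sys.stderr)
        sys.exit(1)
    print(f"Installed {resname} under {fpath}")
```

With this shape, the shared function stays free of both sys.exit and HTTP concerns, and each front end decides how a failure is reported.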

Perhaps it's a good time to remove the entire loathed user database facility from resmgr, too? (cf. #1251)

I would suggest reconsidering the wildcard ('*') options as well and requiring the processor and model name. Otherwise, the return of a response may take more than 30 minutes and effectively block the server for other requests if a wildcard is used for a processor name. At least, it does with the current implementation.

Agreed, this should be blocked in the RMS.
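Blocking wildcards in the RMS could be as simple as a validation step before any download work starts. A minimal sketch, assuming a hypothetical helper require_explicit that is not part of the current code base:

```python
def require_explicit(executable: str, name: str) -> None:
    """Reject empty values and glob patterns for processor and model name."""
    for label, value in (("executable", executable), ("name", name)):
        # '*', '?' and '[' are the characters that start a glob pattern
        if not value or any(ch in value for ch in "*?["):
            raise ValueError(f"{label} must be given explicitly, got {value!r}")
```

The server endpoint would call this first and turn the ValueError into an error response, so a request with '*' fails fast instead of blocking the server.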

Perhaps the PS discovery can re-implement a similar behavior for convenience in the future, but I doubt there is much use for this. (It's more likely we will script this in some form externally, maybe along with the workflow repository.)

MehmedGIT · 2025-03-04T15:12:55Z

If @kba has no objections (since he was going to create a PR to fix the resmgr), I will do some basic refactoring without breaking the CLI behaviour.

Perhaps the PS discovery can re-implement a similar behavior for convenience in the future

I think for listing available/installed the wildcard is not an issue for now (but it may become one after refactoring the database away).

MehmedGIT · 2025-03-06T15:04:23Z

bb0b0cd should be a fix for the missing entries for the '*' glob, i.e., #1251. I manually set the XDG_DATA_HOME and XDG_CONFIG_HOME in my environment to point to the same path.

ocrd resmgr download '*' should now yield the expected behaviour, provided that ocrd resmgr list-available (and potentially ocrd resmgr list-installed) has been run first to update the resources.yml.

The issue was that not all entries found were saved properly in the database and to the resources.yml by invoking self.save_user_list(). Although that method is called inside self.add_to_user_database(), the latter was called inside an if branch, and not every time a new resource was found! Triggering self.save_user_list() at the end of list_installed() and list_available() does fix it. It may not be optimal to call it every time; however, I prefer doing that and fixing the resource manager over proceeding with a broken manager.
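The failure mode can be illustrated with a toy model. This is not the actual OcrdResourceManager code: ToyDatabase, its save condition and the sample entries are assumptions made up to show how a save that only happens inside a conditional branch can leave later entries unpersisted, and how one save at the end avoids that.

```python
from typing import Dict, Iterable, List, Tuple


class ToyDatabase:
    """In-memory entries plus a 'saved' copy standing in for resources.yml."""

    def __init__(self) -> None:
        self.entries: Dict[str, List[str]] = {}
        self.saved: Dict[str, List[str]] = {}
        self.save_count = 0

    def save(self) -> None:
        self.saved = {k: list(v) for k, v in self.entries.items()}
        self.save_count += 1

    def scan_conditional_save(self, found: Iterable[Tuple[str, str]]) -> None:
        for executable, resource in found:
            is_new_executable = executable not in self.entries
            self.entries.setdefault(executable, []).append(resource)
            if is_new_executable:
                # Only saves on the first resource of each executable,
                # so later resources never reach the saved copy
                self.save()

    def scan_save_at_end(self, found: Iterable[Tuple[str, str]]) -> None:
        for executable, resource in found:
            self.entries.setdefault(executable, []).append(resource)
        # One unconditional save after the whole scan
        self.save()


found = [("ocrd-a", "m1"), ("ocrd-a", "m2")]
before = ToyDatabase()
before.scan_conditional_save(found)
after = ToyDatabase()
after.scan_save_at_end(found)
```

In this toy, before.saved misses "m2" while after.saved contains both resources, with a single write.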

src/ocrd/cli/resmgr.py

bertsky · 2025-03-11T12:03:15Z

src/ocrd/resource_manager.py

-                resdict['path'] = str(res_filename)
+                # resdict['path'] = str(res_filename)


Why not store the path into the returned list of dicts anymore? (So far, it seems to be used only in print_resources, but why not?)

@bertsky, since the self.save_user_list() method writes to the resources.yml, the path occurred there as a key, which led to the failure of the OcrdResourceListValidator on subsequent loads of the yaml file. Unfortunately, simply saving before assigning the new path key does not help either. I did not want to make a deep copy of the entire database for the extra path key output, nor did I want to modify the ocrd_tool.schema.yml by adding an extra path field. I will not get rid of the path completely; I just need to figure out how to achieve the same behaviour optimally. Maybe that will become clear after I refactor the database itself.
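One possible direction, sketched under assumptions: build the returned dicts as shallow per-entry copies that carry the extra path key, so the stored entries (and hence the saved YAML) never see it. The helper with_paths and the sample data are hypothetical and not part of the resource manager.

```python
from typing import Dict, List


def with_paths(stored: List[Dict[str, str]],
               paths: Dict[str, str]) -> List[Dict[str, str]]:
    """Return copies of the stored entries, each extended by a 'path' key.

    Only entries with a known path are returned; 'stored' is left untouched.
    """
    return [
        {**resdict, "path": paths[resdict["name"]]}
        for resdict in stored
        if resdict["name"] in paths
    ]


stored_entries = [{"name": "model-a", "url": "https://example.org/a"},
                  {"name": "model-b", "url": "https://example.org/b"}]
listed = with_paths(stored_entries, {"model-a": "/data/ocrd-resources/ocrd-foo/model-a"})
```

Each copy is one small dict, so this avoids a deep copy of the whole database while keeping the schema free of a path field.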

Oh, I see. Yes, I also would not like to see path in the schema (as this syntax is meant to be shared between ocrd-tool.json developer descriptions and resource.yml installations).

But in this function the final save_user_list() seems redundant, because every add_to_user_database() will already invoke that (for each executable, before adding path).

I don't remember why we originally decided to save the file for every database update. It should be independent IMO.

Also, currently we do the saving twice for every processor, because add_to_user_database() also invokes list_available(), which now finally invokes save_user_list() as well.

But in this function the final save_user_list() seems redundant, because every add_to_user_database() will already invoke that (for each executable, before adding path).

True, it seems redundant, but that was part of the fix for the missing resources of #1251. Still not sure why. Also, add_to_user_database() is not called for resources found at module level.

I don't remember why we originally decided to save the file for every database update. It should be independent IMO.

Agree. I will optimize that when I get there. There are unnecessary saves to and loads from the yaml file for each discovered resource.

Also, currently we do the saving twice for every processor, because add_to_user_database() also invokes list_available(), which now finally invokes save_user_list() as well.

Right. I have also spotted that in the logs. I think a simpler search method is needed instead of relying on the list_available() method, which also has other side effects.

I also do not like the database deduplication method. Preventing duplication should perform better than adding and then trying to remove duplications afterwards.

offtopic, but just a few lines above (I cannot suggest outside of the narrow diff hunk context):

instead of...

elif str(res_filename.parent) == moduledir:

...please write...

elif str(res_filename.parent).startswith(moduledir):

(because there are many module-provided data files that are in subdirectories, which now end up as cwd resources)
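A note on the string comparison: a plain startswith on strings also matches sibling directories that merely share a prefix (for example /opt/mod and /opt/mod2). A path-based check avoids this. The following is a sketch of an alternative, not the change made in the PR; is_inside and the example paths are hypothetical.

```python
from pathlib import Path


def is_inside(candidate: Path, moduledir: str) -> bool:
    """True if candidate equals moduledir or lies anywhere below it."""
    module_path = Path(moduledir)
    return candidate == module_path or module_path in candidate.parents


# Parent directory of a resource file in a subdirectory of the module
in_subdir = is_inside(Path("/opt/mod/models/sub"), "/opt/mod")
# Sibling directory sharing the same string prefix
in_sibling = is_inside(Path("/opt/mod2/models"), "/opt/mod")
# What the plain string check would say for the sibling
string_check = str(Path("/opt/mod2/models")).startswith("/opt/mod")
```

Here in_subdir is true and in_sibling is false, whereas string_check is true for the sibling directory.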

But in this function the final save_user_list() seems redundant, because every add_to_user_database() will already invoke that (for each executable, before adding path).

True, it seems redundant, but that was part of the fix for the missing resources of #1251.

Yes, that did help – but does not search for and add executables to the database in all cases. So I would still consider this behaviour an open matter.

Still not sure why. Also, add_to_user_database() is not called for resources found at module level.

True. The more I think about it, the less I understand what might have been the original idea behind the user database.

Clearly, @kba wanted to save the time of searching executables and resources repeatedly, hence the shortcuts. But not being able to add executables to the database seems broken, and not having module resources show up as registered/installed resources, too.

I don't remember why we originally decided to save the file for every database update. It should be independent IMO.

Agree. I will optimize that when I get there. There are unnecessary saves to and loads from the yaml file for each discovered resource.

Yes, once we have a clearer idea about the database lifetime, it should be easy to reduce file I/O.

Also, currently we do the saving twice for every processor, because add_to_user_database() also invokes list_available(), which now finally invokes save_user_list() as well.

Right. I have also spotted that in the logs. I think a simpler search method is needed instead of relying on the list_available() method, which also has other side effects.

...like short-cutting via ocrd-all-tool.json if available?

I also do not like the database deduplication method. Preventing duplication should perform better than adding and then trying to remove duplications afterwards.

As discussed in the chat, the list (instead of dict) structure (and hence deduplication) might have been meant to allow keeping additional user-defined versions of the same resource name.
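Both concerns could be reconciled by preventing duplicates at insertion time with a key that still allows several versions of the same resource name. A minimal sketch, assuming (hypothetically) that a resource is identified by executable, name and url; ResourceIndex is not part of the current code.

```python
from typing import Dict, List, Tuple


class ResourceIndex:
    """Keeps one entry per (executable, name, url) instead of deduplicating later."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, str, str], dict] = {}

    def add(self, executable: str, resdict: dict) -> bool:
        """Insert resdict; return False if an identical key is already present."""
        key = (executable, resdict["name"], resdict.get("url", ""))
        if key in self._by_key:
            return False
        self._by_key[key] = resdict
        return True

    def for_executable(self, executable: str) -> List[dict]:
        """All entries of one executable, in insertion order."""
        return [res for (exe, _, _), res in self._by_key.items() if exe == executable]


index = ResourceIndex()
first = index.add("ocrd-foo", {"name": "model", "url": "https://example.org/v1"})
repeat = index.add("ocrd-foo", {"name": "model", "url": "https://example.org/v1"})
user_version = index.add("ocrd-foo", {"name": "model", "url": "file:///home/user/model"})
```

The exact repeat is rejected on insert, while the user-defined variant with a different url is kept alongside the original.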

The original idea behind writing to the database YAML was to a) speed up lookups b) automatically add new resources when doing a lookup and c) make it extensible, so users could add their own local resources.

But the resource manager currently does neither of those properly and it is very easy to create invalid data, add fixed paths that should be dynamic (module resources), have clashing names.

Also the way some of the functionality is handled in utils functions, other functionality in the manager class and yet other functionality in the CLI, is messy.

I'm open for (radical) refactoring.

src/ocrd_utils/os.py

bertsky · 2025-03-12T13:52:49Z

A fix for resmgr list-installed only knows about 3 processors with preconfigured resources #1251

note: this PR so far does not fix the described problems entirely (not even with #1315).

Still, new database entries (i.e. new executables) only get added when

calling list_available (or ocrd resmgr list-available) with a glob pattern or executable name, and dynamic=True
calling list_installed (or ocrd resmgr list-installed) with an executable name
calling list_installed (or ocrd resmgr list-installed) without an executable name, if it happens to already have a directory ocrd-... under /usr/local/share/ocrd-resources or $XDG_DATA_HOME/ocrd-resources
calling handle_resource (or ocrd resmgr download without -D) with an executable name

But not:

list_available without an executable or with dynamic=False, because it then short-circuits:

core/src/ocrd/resource_manager.py

Lines 128 to 143 in d760cf9

    
    def list_available(
        self, executable: str = None, dynamic: bool = True, name: str = None, database: Dict = None, url: str = None
    ):
        """
        List models available for download by processor
        """
        if not database:
            database = self.database
        if not executable:
            return database.items()
        if dynamic:
            self._search_executables(executable)
            self.save_user_list()
        found = False
        ret = []
        for k in database:

ocrd resmgr download "*" (for the same reason)
list_installed without an executable (for the same reason)

So ocrd resmgr download (for specific executables) now adds entries, and ocrd resmgr list-available (for the default ocrd-*) searches the PATH and adds respective entries. And for already added entries or explicit executables, in list_installed the module directory now gets searched. But we are still not doing a search in all expected circumstances. And we are nowhere utilising the list of executables in ocrd-all-tool.json if present (bypassing a PATH search).
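The ocrd-all-tool.json shortcut mentioned above might look roughly like this. It is a sketch under two assumptions: that ocrd-all-tool.json is a JSON object keyed by executable name, and that the fallback is a scan of PATH for executables named ocrd-*. The function list_executables is hypothetical and not the resource manager's actual search.

```python
import json
import os
from pathlib import Path
from typing import List, Optional


def list_executables(all_tool_json: Optional[Path], path_env: str) -> List[str]:
    """Prefer the keys of ocrd-all-tool.json; otherwise scan PATH for ocrd-*."""
    if all_tool_json is not None and all_tool_json.is_file():
        # Assumption: top-level keys are the executable names
        with open(all_tool_json, encoding="utf-8") as f:
            return sorted(json.load(f).keys())
    found = set()
    for directory in path_env.split(os.pathsep):
        if not directory or not os.path.isdir(directory):
            continue
        for entry in os.listdir(directory):
            full = os.path.join(directory, entry)
            if (entry.startswith("ocrd-") and os.path.isfile(full)
                    and os.access(full, os.X_OK)):
                found.add(entry)
    return sorted(found)
```

A caller would pass the location of ocrd-all-tool.json if it knows one, plus os.environ.get("PATH", ""); where that file lives is deliberately left open here.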

MehmedGIT · 2025-03-12T14:06:03Z

Noted and appended your comment as a reference to the top. Thanks!

bertsky · 2025-03-25T18:04:29Z

@kba what? wait! This was still a draft. And shouldn't you have merged #1315 into this first?

bertsky · 2025-03-25T18:32:04Z

https://github.com/OCR-D/core/releases/tag/v3.2.0 is also bogus – it does not include #1315

MehmedGIT · 2025-03-25T18:37:46Z

@kba what? wait! This was still a draft. And shouldn't you have merged #1315 into this first?

I did not react because I can also create another PR for the network client, request forwarding over the Processing Server, etc. However, I am also fine if this PR is reverted and I continue here. I agree about #1315.

kba · 2025-03-27T13:38:00Z

Yeah, I messed up, this was not intentional. I'll try to fix it :(

kba · 2025-03-27T13:59:43Z

@kba what? wait! This was still a draft. And shouldn't you have merged #1315 into this first?

I did not realize that #1315 being based on #1309 was the problem.

https://github.com/OCR-D/core/releases/tag/v3.2.0 is also bogus – it does not include #1315

master (and the release) does include #1315, that PR is only open because it is based on #1309 and #1315 has not been merged into #1309.

I revert the merge and release a hotfix without it.

We'll need to rename the branch and open a new PR though, GitHub does not allow continuing working on a merged branch AFAIK.

I should (and will from now on) create release branches so I spot these oversights beforehand.

MehmedGIT · 2025-03-27T14:06:13Z

We'll need to rename the branch and open a new PR though, GitHub does not allow continuing working on a merged branch AFAIK.

@kba Whatever is faster and easier on your end. I am fine with a new branch and PR to continue working on the RM Server.

bertsky · 2025-03-27T14:21:32Z

master (and the release) does include #1315, that PR is only open because it is based on #1309 and #1315 has not been merged into #1309.

Oh, I see. Sorry, did not notice.

I revert the merge and release a hotfix without it.

That's to avoid having an incomplete version of the ResourceManagerServer in master and release, right?

If so, I'm for that.

We'll need to rename the branch and open a new PR though, GitHub does not allow continuing working on a merged branch AFAIK.

Ok, if that's required, so be it. We should be careful not to lose sight of our discussion, insofar as it still matters, e.g.

Implementation of the resource manager server (issue #1294) #1309 (comment) on how to actually get to resmgr list-installed only knows about 3 processors with preconfigured resources #1251
Implementation of the resource manager server (issue #1294) #1309 (comment) on a refactored common base function ocrd.resource_manager.download for both ocrd.cli.resmgr.download and ocrd_network.ResourceManagerServer.download_resource (perhaps excluding the * behaviour though)

MehmedGIT added 10 commits February 18, 2025 14:02

add: resource manager server skeleton

a6907f1

extend: const NetworkLoggingDirs

7a6d113

add: logging file path method

ea77ef1

add: resource manager network agent skeleton

1c47bdb

simplify: hosts.py

322709e

add: pid logs

92a740e

refine runtime data logging

786527b

add: deploying blank resource manager server

ccf6818

add: shutdown resmgr server from host

3775ee4

implement list available and list installed

4f2ff6c

kba reviewed Feb 24, 2025

src/ocrd_network/resource_manager_server.py (outdated)

src/ocrd_network/resource_manager_server.py (outdated)

refactor: implement kba feedback

fc58c9b

MehmedGIT added 4 commits March 4, 2025 13:50

implement: cli download method

b06aa05

fix: docker requirement >= 7.1.0

e68cc55

This is required in order to fix: Not supported URL scheme http+docker

add: default values to download method

a6460bb

remove: the location check

d7bc2a0

implement proper download method

5844e97

add: log response messages

740a566

MehmedGIT added 2 commits March 5, 2025 17:22

simple resmgr refactoring without side effects

9328e7e

fix: properly save the database to resource list

bb0b0cd

kba reviewed Mar 10, 2025

src/ocrd/cli/resmgr.py (outdated)

kba reviewed Mar 10, 2025

src/ocrd/cli/resmgr.py (outdated)

refactor: split resource copying/downloading

1a65c9b

bertsky reviewed Mar 11, 2025

MehmedGIT added 6 commits March 11, 2025 14:03

add: typing to resource manager

c89293c

remove: progress bar from resgmr

99120c6

simplify download cli, expand handle_resource

82ff7c0

fix: consider module level subdirs in list-installed

5cbebc9

improve: install to resource_locations[0] instead of refusing install…

387bc77

…ation

fix: revert back save_user_list() inside list_installed

2f59e5d

bertsky reviewed Mar 11, 2025

src/ocrd_utils/os.py

MehmedGIT added 5 commits March 11, 2025 16:52

fix some tests: test_resource_manager

0635b0f

fix: extend anti-pattern list

474204a

fix test: cli test_resmgr.py

691a978

fix test: test_resource_manager.py

9f18657

readapt: resource_manager_server.py

d760cf9

merge master

bf85b85

kba merged commit 2e9dab5 into master Mar 25, 2025
6 of 22 checks passed

kba mentioned this pull request Mar 27, 2025

Revert "Merge remote-tracking branch 'bertsky/resmgr-type-checking'" #1317

Merged

MehmedGIT mentioned this pull request Mar 28, 2025

A continuation for #1309 that was accidentally merged #1318

Closed

This was referenced Mar 28, 2025

fix name vs subdir, add type checking for resource candidates #1315

Merged

Continuation of #1309: Implementation of the resource manager server (issue #1294) #1319

Merged

kba added a commit that referenced this pull request Dec 10, 2025

Merge pull request #1319 from OCR-D/1294-impl-rm-server-revert-revert

13f4696

Continuation of #1309: Implementation of the resource manager server (issue #1294)
