Description
[Not a priority, maybe not even necessary, just something to think about for the longer term]
I believe there is a use case where one might want to sync intermediate results from the agents to the launcher.
For example, suppose the user's function processes a batch of inputs one at a time, but has an expensive one-time start-up cost. Moreover, the user wants to use the intermediate results immediately (or there are so many inputs that they want to cache these results as they go, preferably in the launcher).
Serving LLMs (for inference) might be one such example: the user wants to set up their big LLM on multiple GPUs/nodes and process their API queries in this "live" fashion.
In Python, this reminds me of the generator paradigm, which yields results one at a time as they become ready.
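For illustration, a plain Python generator amortizes a one-time start-up cost across the whole batch and makes each result available as soon as it is computed (`expensive_setup` here is just a placeholder for something like loading a model):

```python
import time

def expensive_setup():
    # Placeholder for a costly one-time initialization (e.g. loading an LLM).
    time.sleep(1)
    return lambda x: x * 2

def process_batch(inputs):
    model = expensive_setup()  # paid once, not per input
    for x in inputs:
        yield model(x)  # each result is yielded as soon as it is ready

for result in process_batch(range(3)):
    print(result)  # consumed "live", without waiting for the full batch
```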
Maybe we can build a generator-like interface for our Launcher?
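A minimal sketch of what this could look like, assuming a queue-based channel between agents and the launcher (the `stream` function, `_agent` wrapper, and sentinel protocol below are all hypothetical, not part of the current API): each agent runs the user's generator function and forwards every yielded result to the launcher immediately, and the launcher re-exposes the combined results as a generator.

```python
import multiprocessing as mp

_DONE = "__agent_done__"  # hypothetical sentinel marking that an agent finished

def _agent(fn, args, queue):
    # Hypothetical agent-side wrapper: forward each intermediate result to
    # the launcher as it is produced, instead of returning them at the end.
    for result in fn(*args):
        queue.put(result)
    queue.put(_DONE)

def stream(fn, args, num_agents=2):
    # Hypothetical launcher-side generator: yields results as agents emit them.
    queue = mp.Queue()
    procs = [mp.Process(target=_agent, args=(fn, args, queue))
             for _ in range(num_agents)]
    for p in procs:
        p.start()
    finished = 0
    while finished < num_agents:
        item = queue.get()
        if item == _DONE:
            finished += 1
        else:
            yield item  # the caller can use (or cache) each result "live"
    for p in procs:
        p.join()

def work(n):
    # Stand-in for a user function that yields intermediate results.
    for i in range(n):
        yield i * i

if __name__ == "__main__":
    for result in stream(work, (3,)):
        print(result)
```

One nice property of this shape: the launcher-side caching mentioned above falls out for free, since the caller iterates over results in the launcher process and can store them wherever it likes.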