Improving fjåge.js timeouts #363

notthetup · 2025-07-28T08:02:17Z

This pull request improves the timeouts available on various Gateway and AgentID methods in fjage.js.

Gateway Improvements

Previously, the Gateway constructor took an options argument, new Gateway({timeout: 1000}), which set the default timeout for all requests, including fjåge container-level queries such as agentForService, agents, etc. This is still supported, but each request can now also be given its own timeout through an argument to the agentForService, agentsForService, agents and containsAgent methods.
The default timeout remains the same: 8 times the Gateway's timeout, which is itself 1000 ms by default.
With the new argument, you can specify a different timeout for each request, e.g. gw.agents(10000) sets the timeout for that request to 10 seconds instead of the default 8 seconds. You can also pass -1 to disable the timeout for that request, e.g. gw.agents(-1) waits indefinitely for the response.
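
To illustrate the per-request semantics described above, here is a minimal sketch of a promise-based timeout in which a negative value disables the timer. The names withTimeout, fakeQuery and demo are hypothetical and are not part of fjage.js, and rejecting on expiry is an assumption; the real gateway may report a timeout differently.

```javascript
// Hypothetical helper, not part of fjage.js: races a promise against a timer.
// A negative timeout (e.g. -1) disables the timer, so the call waits indefinitely.
function withTimeout(promise, timeoutMs) {
  if (timeoutMs < 0) return promise;
  let timer;
  const expiry = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs} ms`)),
      timeoutMs
    );
  });
  // Clear the timer once either side settles so it does not keep the process alive.
  return Promise.race([promise, expiry]).finally(() => clearTimeout(timer));
}

// Stand-in for a container query that answers after `delayMs`.
function fakeQuery(delayMs) {
  return new Promise(resolve => setTimeout(() => resolve(['agentA']), delayMs));
}

// 8 x the Gateway's 1000 ms default, as described above.
const DEFAULT_QUERY_TIMEOUT = 8 * 1000;

async function demo() {
  // Answers well within the default timeout.
  const fast = await withTimeout(fakeQuery(10), DEFAULT_QUERY_TIMEOUT);
  // The 20 ms timer fires before the 50 ms answer arrives.
  const slow = await withTimeout(fakeQuery(50), 20).catch(err => err.message);
  // -1 means no timer at all.
  const patient = await withTimeout(fakeQuery(50), -1);
  return { fast, slow, patient };
}
```

Calling demo() resolves to an object whose slow field holds the timeout error message, while fast and patient hold the query result.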

AgentID Improvements

Requests sent to an Agent through the AgentID methods (aid.get(), aid.set(), aid.request(), etc.) now derive their default timeout from the Gateway's timeout. Previously, these defaults were hard-coded to 1000 ms for request and 5000 ms for set and get. By copying the Gateway's timeout and multiplying it to match the previous values, we keep the default behavior consistent.
We also add a setTimeout method to the AgentID class, which sets the default timeout for all requests sent to that AgentID.
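
The layering described above can be sketched as follows. SketchGateway, SketchAgentID and effectiveTimeout are hypothetical stand-ins rather than the fjage.js source, the resolution order (per-call argument, then AgentID default, then Gateway default) is an assumption, and the multipliers mentioned above are left out for brevity.

```javascript
// Hypothetical stand-ins; class and field names are illustrative only.
class SketchGateway {
  constructor({ timeout = 1000 } = {}) {
    this.timeout = timeout;
  }
}

class SketchAgentID {
  constructor(name, gateway) {
    this.name = name;
    this.gateway = gateway;
    this.timeout = undefined; // unset until setTimeout() is called
  }

  // Per-AgentID default, mirroring the setTimeout method described above.
  setTimeout(timeoutMs) {
    this.timeout = timeoutMs;
  }

  // Assumed resolution order: per-call argument, AgentID default, Gateway default.
  effectiveTimeout(perCallMs) {
    if (perCallMs !== undefined) return perCallMs;
    if (this.timeout !== undefined) return this.timeout;
    return this.gateway.timeout;
  }
}

const gw = new SketchGateway({ timeout: 1000 });
const slowAgent = new SketchAgentID('slowAgent', gw);
const beforeOverride = slowAgent.effectiveTimeout(); // falls back to the Gateway
slowAgent.setTimeout(5000);
const afterOverride = slowAgent.effectiveTimeout(); // uses the AgentID default
const perCall = slowAgent.effectiveTimeout(250); // explicit argument wins
```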

ettersi · 2025-07-29T01:35:05Z

Re timeouts: It is already quite difficult to keep track of which timeout is being used if I don't specify one myself. This PR is going to make this problem even worse. That's not really an objection to the specific changes here, just voicing out some bigger-picture concerns I have.

Re metrics: The new metrics system touches many lines yet its inner workings and intended usage aren't documented very clearly. I'm afraid that over time this will result in quite a bit of code where no one quite understands what it does any more, and in particular the metrics feature itself may break soon if future maintainers (including ourselves) forget to maintain it properly. Maybe we should have a discussion around what problem this feature is trying to solve and what's the best way to achieve that goal.

notthetup · 2025-07-30T09:57:45Z

I removed all the bits about metrics. Just the changes to timeouts now.

It is already quite difficult to keep track of which timeout is being used if I don't specify one myself. This PR is going to make this problem even worse. That's not really an objection to the specific changes here, just voicing out some bigger-picture concerns I have.

The idea here is to allow the user of fjage.js to specify the timeout instead of using some magic numbers internally. If anything, I would suspect this would help make things clearer.

The other option would be to force a change of the API and force the user to define a timeout for each transaction. This would break everything though. :(

ettersi · 2025-07-31T00:10:56Z

It is already quite difficult to keep track of which timeout is being used if I don't specify one myself. This PR is going to make this problem even worse. That's not really an objection to the specific changes here, just voicing out some bigger-picture concerns I have.

The idea here is to allow the user of fjage.js to specify the timeout instead of using some magic numbers internally. If anything, I would suspect this would help make things clearer.

My concern is a scenario where we get some logs which indicate that a request timed out, and now we have to go figure out what timeout was used for this request. This would be straightforward if we used a single, global, immutable default timeout but can be tricky once you have gateway-level and AgentID-level timeout overrides that can be changed by anyone who has access to the respective objects.
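
As an illustration of this debugging concern (not something proposed in this thread), a layered timeout can at least report which layer supplied the value. The helpers resolveTimeout and describeTimeout below are hypothetical and do not exist in fjage.js.

```javascript
// Hypothetical helper: returns the timeout that applies and where it came from.
function resolveTimeout({ perCall, agentDefault, gatewayDefault }) {
  if (perCall !== undefined) return { ms: perCall, source: 'per-call argument' };
  if (agentDefault !== undefined) return { ms: agentDefault, source: 'AgentID default' };
  return { ms: gatewayDefault, source: 'Gateway default' };
}

// Builds a log line that names both the value and its origin.
function describeTimeout(action, resolved) {
  return `${action} timed out after ${resolved.ms} ms (${resolved.source})`;
}

const resolved = resolveTimeout({ agentDefault: 5000, gatewayDefault: 1000 });
const logLine = describeTimeout('aid.get', resolved);
```

With a line like this in the logs, the origin of the timeout is visible even when overrides were set elsewhere.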

notthetup · 2025-07-31T11:33:32Z

Another approach to solve the arbitrary multipliers on timeouts (which frankly I also don't like) could be this.

We expand the Gateway constructor to have 2 options instead:

new Gateway(..., { containerQueryTimeout: 8000, messageTimeout: 5000 })

We can set the defaults such that it works with the fjage tests. A user can set them appropriately for their setup.

These are then used as the default timeout for

containerQueryTimeout : agents, containsAgent, agentForService and agentsForService
messageTimeout : request

But each of the methods also gets a timeout argument to override the default value if required.

This however would break the external API for fjage.js. :( There are ways to make it backward compatible though.
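
One possible way to keep this backward compatible is sketched below. Only the option names timeout, containerQueryTimeout and messageTimeout come from this discussion; the function normalizeTimeoutOptions and its fallback rules (a legacy timeout maps to 8 times itself for container queries and to itself for messages) are assumptions made for illustration.

```javascript
// Hypothetical option normalisation; the fallback rules are assumptions.
function normalizeTimeoutOptions(opts = {}) {
  const legacy = opts.timeout; // the existing single-timeout option, if given
  return {
    // An explicit new option wins; otherwise derive from the legacy option;
    // otherwise use the example defaults from the proposal above.
    containerQueryTimeout:
      opts.containerQueryTimeout ?? (legacy !== undefined ? 8 * legacy : 8000),
    messageTimeout: opts.messageTimeout ?? legacy ?? 5000,
  };
}

const defaults = normalizeTimeoutOptions();
const legacyOnly = normalizeTimeoutOptions({ timeout: 2000 });
const mixed = normalizeTimeoutOptions({ timeout: 2000, messageTimeout: 300 });
```

Existing callers that pass only timeout would keep their current behavior under these assumed rules, while new callers could set the two options independently.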

The AgentID level timeout configuration (applicable to aid.set, aid.get and aid.request), is more targeted towards a use case where a known Agent is slow to respond (maybe because it's behind a slow network connection, etc). Being able to set that on an AgentID makes sense instead of changing it at every aid.set, aid.get and aid.request for that AgentID.

But this part can be considered separately from the part above, since they're not really related.

notthetup · 2025-08-01T01:50:14Z

We can set the defaults such that it works with the fjage tests. A user can set them appropriately for their setup.

I actually prototyped this. A default of 1000 ms for both seems enough for all the tests. I also tried it with https://github.com/org-arl/unetsockets and all of its tests also pass at 1000 ms.

ettersi · 2025-08-01T02:21:07Z

Timeouts are anyway supposed to be a last-resort fix to prevent the system from deadlocking, no? So maybe the solution is to just use the max of all the individual timeouts everywhere?

The AgentID level timeout configuration (applicable to aid.set, aid.get and aid.request), is more targeted towards a use case where a known Agent is slow to respond (maybe because it's behind a slow network connection, etc). Being able to set that on an AgentID makes sense instead of changing it at every aid.set, aid.get and aid.request for that AgentID.

I've switched to the latter approach in most of my code. Maybe not so much yet in the context of fjage, but definitely in contexts like when you have a multi-layer simulation code that depends on many parameters. It's a bit tedious to spell out at every layer all the parameters that are relevant for that layer, but it makes tracing the origin of parameter values much easier. Also, when you want to change a parameter somewhere, explicit parameters force you to go and propagate that change throughout the entire stack, which seems annoyingly tedious at first but is actually a good thing because it requires you to explicitly go through everything that could potentially have been broken by that change. My experience has been that ultimately that's the less frustrating process compared to making the change and praying really hard that everything else will just work. So the TL;DR is, I'm actually quite inclined towards passing through explicitly all the timeouts to the extent that that's needed (see my earlier point).

I actually prototyped this. A default of 1000 ms for both seems enough for all the tests. I also tried it with https://github.com/org-arl/unetsockets and all of its tests also pass at 1000 ms.

I'm assuming that's fully locally on your fairly performant laptop, though, right? There's no guarantee that by changing the default timeouts we won't break things all over the place once we push this to less performant and distributed systems, no?

notthetup · 2025-08-01T10:56:27Z

Timeouts are anyway supposed to be a last-resort fix to prevent the system from deadlocking, no? So maybe the solution is to just use the max of all the individual timeouts everywhere?

Good point. I was looking at reducing them, but all that does is reduce the latency of catching and dealing with errors. So instead we can set the default to something like 10000 ms and then we don't have to do any of these multipliers etc.

I'll update the PR accordingly.

Also, when you want to change a parameter somewhere, explicit parameters force you to go and propagate that change throughout the entire stack, which seems annoyingly tedious at first but is actually a good thing because it requires you to explicitly go through everything that could potentially have been broken by that change.

The issue here is always forgetting it in one place.

But let's leave this change aside for now. We can revisit it later if we need it.

ettersi · 2025-08-01T12:57:19Z

Also, when you want to change a parameter somewhere, explicit parameters force you to go and propagate that change throughout the entire stack, which seems annoyingly tedious at first but is actually a good thing because it requires you to explicitly go through everything that could potentially have been broken by that change.

The issue here is always forgetting it in one place.

Yeah, to optimise for this particular purpose the timeout argument would have to be mandatory. But that's of course breaking and annoying, so definitely not proposing this seriously. This whole discussion is just a tangent to this PR anyway, so happy to leave this for now as you suggested.

notthetup changed the title ~~chore(fjagejs): refactoring to split various functionality into indiv…~~ Improving fjåge.js timeouts Jul 28, 2025

notthetup force-pushed the fjage-js-timeout-improvements branch from 0d790f4 to 997659a Compare July 28, 2025 09:33

notthetup mentioned this pull request Jul 28, 2025

Split fjage.js functionality into files and expose JSONMessage #354

Merged

notthetup force-pushed the fjage-js-timeout-improvements branch 5 times, most recently from 4312a90 to 9e45afd Compare July 30, 2025 09:45

notthetup self-assigned this Jul 30, 2025

notthetup requested a review from ettersi July 30, 2025 09:57

notthetup marked this pull request as ready for review July 30, 2025 09:57

notthetup force-pushed the fjage-js-timeout-improvements branch from 9e45afd to b270bb3 Compare July 30, 2025 09:58

ettersi reviewed Jul 31, 2025

ettersi self-requested a review July 31, 2025 07:11

ettersi approved these changes Jul 31, 2025

notthetup force-pushed the fjage-js-timeout-improvements branch from b270bb3 to f267b21 Compare August 1, 2025 11:13

feat(fjagejs): adding timeouts to all the fjage actions (agents, agentsForService, etc) 1cc6e98

notthetup force-pushed the fjage-js-timeout-improvements branch from f267b21 to 1cc6e98 Compare August 1, 2025 11:14

notthetup requested a review from ettersi August 1, 2025 12:41

ettersi approved these changes Aug 1, 2025

notthetup merged commit 2ea6566 into master Aug 1, 2025
2 checks passed

notthetup deleted the fjage-js-timeout-improvements branch August 1, 2025 14:05
