ames, gall: fix "nacked-leave" logic #6954

yosoyubik · 2024-03-26T09:44:14Z

The gall-ames desync caused a %leave and a %cork to be sent to %ames one after the other, both to be delivered to the remote server's %ames. The change in #6759 made %gall send a %cork to %ames only once the %leave gets acked, at which point the %leave is removed from its queue. If a %leave gets %nacked, we start a timer that goes over every outstanding %leave and resends it (if there's only one message in the outstanding %gall queue, and it's a %leave). This is a problem because some of those %leaves may belong to %dead-flows and have not been acked, so we create a space leak in %ames' unsent-message queue.

This was made worse by the bug introduced in the %flub logic when the specific message we deliver to the vane is a %leave (see #6953), since now more %leaves are going to be %nacked. This was discovered on ~norsyr-torryn when multiple "...on closing bone, ignoring" messages were seen. Investigating this further revealed that these were %leave %pleas, and also that ~halbex-palheb was always %nacking the same %leave over and over. Looking into why this happened revealed the situation described in #6759.

Here we just "flag" %nacked %leaves by adding a %missing request, which is checked when the nacked-leaves timer fires so that %leaves belonging to dead flows are skipped.
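The following is a toy model of the before/after timer behavior, offered as one possible reading of the description above rather than a port of the Hoon code. The flow keys, the string tags, and the functions `on_leave_nacked`, `timer_before_fix`, and `timer_after_fix` are all hypothetical; it assumes the %missing marker is appended to a flow's outstanding queue when its %leave is %nacked.

```python
from collections import deque

# Hypothetical tags standing in for %gall's %leave request and the %missing flag.
LEAVE, MISSING = "leave", "missing"


def on_leave_nacked(outstanding, flow):
    """Flag a %nacked %leave by appending a %missing marker to its queue."""
    queue = outstanding[flow]
    if list(queue) == [LEAVE]:
        queue.append(MISSING)


def timer_before_fix(outstanding):
    """Old behavior: resend every flow whose queue holds exactly one %leave,
    including %leaves on dead flows that were never %nacked."""
    return [flow for flow, q in outstanding.items() if list(q) == [LEAVE]]


def timer_after_fix(outstanding):
    """New behavior: resend only %leaves that were flagged as %nacked.
    (Whether the flag is later cleared is not modeled here.)"""
    return [flow for flow, q in outstanding.items() if list(q) == [LEAVE, MISSING]]
```

With one live flow whose %leave gets %nacked and one dead flow whose %leave is still outstanding, the old timer resends both (leaking the dead flow's %leave into %ames), while the new one resends only the flagged %leave.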

I've also added trace logging (under the %odd flag) in %ames for sending %pleas on closing bones. This is more contentious: since this PR should fix what caused the "closing bone" message, the ~& shouldn't be hidden behind a logging flag, and the specific commit should probably be reverted.

The situation for the "...on closing bone" messages seems different from the nacked-leave scenario. I suspect that these are flows that came through the gall-ames desync and were marked as closing (%cork and %leave were sent to %ames at the same time), but the %leave got dropped on the server, so the client %ames just resends it, while %gall still kept the outstanding %leave in its queue. Then, because the bug in #6953 caused %leaves to be %nacked, all outstanding %leave requests were sent to %ames, even for flows that are closing.

This reverts commit 198fa38.

in the thread-calling HTTP interface. Specifying a `content-type` header of `application/x-urb-jam` makes the request body be interpreted as a uw-encoded jammed noun rather than JSON. Specifying an `accept` header of `application/x-urb-jam` makes the thread result in the response body be rendered as a uw-encoded jammed noun rather than JSON. In the latter case the output mark goes unused, since we can "render" the resulting noun directly without explicitly converting it. (This assumes that converting any mark to %noun always yields the same noun, which isn't guaranteed in theory but is always the case in practice.) This prepares spider for use in a noun-based version of js-http-api.
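The header handling described above can be sketched as a small dispatch function. This is an illustrative model, not spider's code: `negotiate` and `_media_types` are hypothetical names, and the sketch ignores `q=0` in `accept` and other content-negotiation subtleties.

```python
JAM_MIME = "application/x-urb-jam"


def _media_types(value):
    """Split a header value into bare, lowercased media types (parameters dropped)."""
    return [part.split(";")[0].strip().lower() for part in value.split(",") if part.strip()]


def negotiate(headers):
    """Return (request_format, response_format), each "jam" or "json".

    A "jam" response means the thread result noun is rendered directly,
    so the output mark is not needed for conversion.
    """
    h = {k.lower(): v for k, v in headers.items()}
    request_format = "jam" if _media_types(h.get("content-type", ""))[:1] == [JAM_MIME] else "json"
    response_format = "jam" if JAM_MIME in _media_types(h.get("accept", "")) else "json"
    return request_format, response_format
```

For example, `negotiate({"Accept": "text/html, application/x-urb-jam;q=0.9"})` yields `("json", "jam")`: the body is parsed as JSON, and the result is sent back as a jammed noun.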

This brings it in line with the serialization found in /mar/noun. The `@uw`-encoding was carried over from Eyre, which uses it for channels. In that context, outgoing jam bytes must be encoded, because newline characters (`0a` bytes) would break up the SSE data field. Because they're essentially part of the same protocol, Eyre mirrors this for incoming nouns. Even though PUT requests can carry arbitrary bytes just fine, the symmetry and protocol-wide consistency seem important. Here, we are dealing strictly with plain HTTP requests, and, to boot, strictly with requests that have indicated support for the `application/x-urb-jam` MIME type. We should have no qualms about raw jam bytes. They're also more compact and efficient.
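To make the SSE argument concrete, the sketch below frames a payload containing a `0a` byte as a single SSE event. Standard RFC 4648 base64 stands in for `@uw` here (they are different encodings; the point is only that neither emits newline bytes), and `sse_event` and `data_lines` are hypothetical helpers.

```python
import base64


def sse_event(payload: bytes) -> bytes:
    """Frame a payload as one SSE event: a single `data:` line plus a blank line."""
    return b"data: " + payload + b"\n\n"


def data_lines(frame: bytes):
    """Return the lines an SSE parser would see before the event terminator."""
    return frame.split(b"\n\n")[0].split(b"\n")


raw = b"\x01\x0a\x02"  # jam-like bytes that happen to contain a newline
# Raw bytes: the 0x0a splits the event, and the second line lacks a `data:` prefix,
# so a parser would not treat it as part of the data field.
print(data_lines(sse_event(raw)))
# Encoded bytes: a single, intact data line.
print(data_lines(sse_event(base64.b64encode(raw))))
```

A plain HTTP response body has no such line-based framing, which is why raw jam bytes are fine for the thread-calling interface.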

Apparently the operation triggered by this generator may cause the rift for the specified moon to be inaccurate if |moon-breach was run previously. Here, we detect whether the moon has been created before and, if so, recommend the other generators instead.

yosoyubik · 2024-04-10T08:20:16Z

~norsyr-torryn is running this fix with no noticeable issues, and the "...on closing bone" message no longer shows up.

yosoyubik added 3 commits March 26, 2024 07:37

ames: log with trace when ignoring messages

198fa38

gall: check if a %leave got %nacked before resending

216ce19

gall: keep %leave in the queue

551507c

yosoyubik mentioned this pull request Apr 9, 2024

ship hangs on hearing about breaches #6794

Open

yosoyubik and others added 7 commits April 10, 2024 10:09

Revert "ames: log with trace when ignoring messages"

3a3475e

This reverts commit 198fa38.

ames: on flub, check if pending-vane-ack queue is not empty

dec3853

gall: don't ack %leave for non-running agents

abcb73f

Merge branch 'develop' into yu/trace-for-memo

ae7393b

pkova approved these changes Apr 10, 2024

pkova merged commit 8d0df85 into develop Apr 10, 2024

pkova deleted the yu/trace-for-memo branch April 10, 2024 10:48

4 participants