arvo: refactors internals, adds error-handling #2366

joemfb · 2020-02-27T04:36:09Z

This PR refactors arvo's internal engines, and adds new mechanisms by which vanes can propagate error notifications.

The primary changes are:

|wink refactored into |me and |va
|is refactored into |le
top-level structures are modernized, grouped, and documented
new-style error notifications are added, in addition to the current style

The new engines -- namespaced in |part -- are inspired to varying degrees by the unreleased, incomplete neo/arvo (deleted in 7d4b35c). These changes were made to support the error-notification implementation, and generally improve the quality of arvo's internals. No changes have been made to arvo's external interface or persistent structures, so this extensive refactoring is still suitable for OTA release without staging or adaptation.

The new error notifications involve changes to arvo's internal loop, and the interface between arvo and its vanes:

$goof is added, a new error-notification structure (similar to $ares from the deep past)
%hurl is added, a new kernel action to propagate a $goof along with a %pass or %give
(unit goof) is added to the sample of vanes' +call and +take arms
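
As a rough illustration of the three interface changes listed above, here is a toy model in Python. It is not Hoon and not arvo's code. The `Goof` fields follow the description elsewhere in this PR ("bail mote and stack trace"); the names `Goof`, `ToyVane`, `dud`, `hurl`, and the shapes of tasks and moves are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Goof:
    """Toy stand-in for $goof: a bail mote plus a stack trace."""
    mote: str                 # hypothetical example value: "exit"
    tang: Tuple[str, ...]     # stack-trace lines (assumed representation)


def hurl(goof: Goof, move: tuple) -> tuple:
    """Toy %hurl: attach an error to a %pass or %give move."""
    if move[0] not in ("pass", "give"):
        raise ValueError("this sketch only wraps pass or give moves")
    return ("hurl", goof, move)


@dataclass
class ToyVane:
    """Hypothetical vane; models only the shape of the new sample."""
    name: str

    def call(self, duct: list, dud: Optional[Goof], task: tuple) -> list:
        # Mirrors "(unit goof) is added to the sample": dud is None on the
        # normal path and a Goof when an error is being reported.
        if dud is None:
            return [("give", duct, ("done", task[0]))]
        # With no dedicated handling, this toy vane just reports the error.
        return [("give", duct, ("crud", dud.mote, dud.tang))]
```

The point of the optional argument is that existing call sites keep the same shape: the normal path passes no error, and only the failure path carries one.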

Inside the vanes, new error notifications are "downcast" to the old style (ie, %crud everywhere but %ames, where they become %crud or %hole). The only change in the error-handling behavior of the vanes is in %behn's %drip handling. Errors therein are now propagated to the intended recipients (arriving in their +take arm, where they'll be merely printed).
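
The "downcast" described above might be pictured with the following Python sketch. It is a simplified model under stated assumptions, not the vanes' actual logic: the error is modelled as a `(mote, trace)` pair, the payloads carried by the `crud` and `hole` results are invented for illustration, and the rule that only %ames' %hear becomes %hole is inferred from the commit title "ames: downcast %hear error notification to %hole".

```python
from typing import Optional, Tuple

# (bail mote, stack trace); an assumed representation of $goof
Goof = Tuple[str, Tuple[str, ...]]


def downcast(vane: str, dud: Optional[Goof], task: tuple) -> tuple:
    """Turn a new-style error notification into an old-style task (toy model)."""
    if dud is None:
        # No error attached: the task passes through unchanged.
        return task
    mote, tang = dud
    if vane == "ames" and task[0] == "hear":
        # Assumed: a failed %hear in %ames is reported as %hole.
        return ("hole", task)
    # Everywhere else the error becomes an old-style %crud.
    return ("crud", mote, tang)
```

For example, `downcast("clay", ("exit", ("trace",)), ("info",))` yields a `crud` task, while the same error on an %ames `hear` task yields `hole`.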

Finally, the worker process is updated to send new error-notification events, including both the $goof (ie, bail mote and stack trace) and the original event.

This PR represents an incomplete but viable snapshot of the error-handling work. Additional changes are needed in the vanes (to handle more error notifications, or more fully) and the runtime, specifically the IPC protocol and I/O drivers (to precisely handle errors in error notifications). Additional improvements to arvo also follow from this work, most notably around upgrade. All such changes will be more disruptive than these, and harder to handle without strict versioning coordination between arvo and the runtime. These changes are a foundation upon which incrementally better error-handling can be built for the live network, while larger efforts continue in the background.

Calling the error-notification structure $goof is somewhat ... well, goofy. Some other candidates include fail, ruin, crud, flaw, lack, and miss. Feedback is requested.

The changes to arvo and the vanes must be released together, but no intermediate staging is needed, and the runtime changes need not be correlated. Since almost every commit in this PR merits a new pill, but none individually require one, I've departed from the recommended approach and saved the pill update for the end.

joemfb · 2020-02-28T22:18:43Z

@belisarius222, %init wouldn't work right now, as it happens in the "legacy boot" event, before the initial userspace commit. I don't know %ford's internals too well; maybe there's an easy way to check for the first %reef build. Hardcoding cases 1/2 might be fine. But the right way to do the pit short-circuit is for arvo to persist the source it's running, so %ford can scry and compare. I aim to add that minimal arvo filesystem soon(tm).

belisarius222 · 2020-02-28T22:43:03Z

Yeah, comparing source (or receiving source with pit) is the right way to solve this. Should we just do that? Working on that is a better use of time than this Ford hack, although it's not good to let this PR languish either. If not, I could add a hack to Ford to track whether we've built %reef on this desk before, and if not, use the .pit. I think it would be relatively straightforward.

~rovnys-ricfer https://urbit.org


joemfb · 2020-02-29T01:02:32Z

@belisarius222, I don't want to churn interfaces for short-term workarounds. And I don't want to add state adapters here, or disrupt the outer layers of arvo.

I've added back the pit-shortcircuit for %home and %base (which does get built first, you were right). This makes boot fast again. The only risk is making a bad pill (kernel source mismatch), which is already a risk. It should just be avoided. This seems fine for now, IMO, unless you'd prefer to prime the cache to the same effect. I just don't want to be blocked on this problem anymore.

belisarius222 · 2020-02-29T01:05:35Z

Ok, no this looks fine. Glad it worked.

belisarius222

LGTM

jtobin

@joemfb The kernel appears to reload fine when applying this (and ford-no-pit) on top of the latest release as a sanity check, but I do observe a single [%poke %bad-wire /] immediately afterwards.

I ran a |reboot after that and got a bunch of find.goof and find.wite errors and such. I made a trivial change to Arvo and recommitted to force it to reload, again observing [%poke %bad-wire /], and |reboot again afterwards produces the same find.goof and find.wite errors.

Anything to be concerned about here?

joemfb · 2020-03-02T20:39:59Z

@jtobin, thanks for checking. Was this a new fake-ship from the arvo tagged release?

I've pulled the %ford changes into #2384, and will reintegrate here once that's done. Pushing these changes OTA should work (and is why this milestone has been PR'd), but it still needs to be tested. As for [%poke %bad-wire ...], certain implicit constraints have been made explicit, and it looks like we're violating them in some cases. IMO, the best thing would be to track those cases down and either fix or special-case (ie, adapt) them.

joemfb · 2020-03-03T04:41:45Z

@jtobin, the [%poke %bad-wire ...] error was caused by the %vega reset notification using an invalid wire. I've fixed this, and added code to dynamically fix-up the %vega that will come from the old kernel on |reset. The find errors on |reload are most likely due to the kernel |reset failing -- |reload doesn't recompile the kernel, so it was attempting to load new vanes into an old kernel.

That being said, I'm not quite ready for this to be merged. I'd like to redo the master merge once #2384 is in, and clean things up generally. And I want to do a review myself.

Also worth noting, this does not need to be in or block the os1 release. Due to the issues we've been having with OTAs, it might be best to hold it until after os1. But we can discuss that separately.

joemfb · 2020-03-04T19:11:47Z

Ok, this is cleaned up and ready on my end. @belisarius222, please take a final look.

I've tested committing these changes to a fake-zod booted from the v0.10 branch. When #2247 is included, the OTA fails in %clay and %gall (%load-lost), due to type-of-type changes. (The old type nests in the new, but they need to mutually nest for vanes that store vases.) When #2247 is excluded, the upgrade applies cleanly.

@jtobin, I haven't tried merging this and then cherry-picking the merge commit; I'm not sure if that's sufficient to exclude the changes in question.

belisarius222

The recent changes all LGTM.

jtobin · 2020-03-05T10:42:05Z

On Wed, Mar 04, 2020 at 11:11:48AM -0800, Joe Bryan wrote:

> @jtobin, I haven't tried merging this and then cherry-picking the merge commit; I'm not sure if that's sufficient to exclude the changes in question.

No sweat, I'll check it out before merging.

* master: (484 commits)
  king: Slight CLI cleanup and fix test build.
  king: Add command-line flags to configure HTTP and HTTPS ports.
  groups: reduce metadata updates, removal
  chat: reducer handles metadata removal
  groups: exclude group metadata from channels list
  groups: set and surface group name metadata
  groups: remove dummy 'share' flow, 'default' group
  contacts: rename, migrate '~contacts' to '~groups'
  sh/release: rename vere release tarballs
  vere: patch version bump (v0.10.3 -> v0.10.4.rc1) [ci skip]
  pills: updated brass and solid
  chat: pull room contacts from associated group
  chat: spell 'permanent' correctly
  eyre: remove padding from 'access' input
  chat: only delete metadata for a chat if you created it
  chat: settings inputs add borders on focus
  vere: disables gc on |mass in the daemon process
  chat: remove console.log from metadataAction
  chat: style fixes during review, use metadata-hook
  chat: edit description, color settings
  ...

joemfb · 2020-03-05T20:18:24Z

I did one last re-merge (there was a legitimate, minor conflict in %ford as of os1-rc).

jtobin · 2020-03-06T11:11:42Z

The cherry-picked merge doesn't include the %spot hint changes, but it does include the %ford changes that can't yet go out OTA (per #2333 -- presumably this popped up in the conflict you encountered).

I think it's ok to merge this, as I don't plan to release any more non-surgical-hotfix updates prior to the OS1 release. This can probably just go out with OS1 proper.

jtobin · 2020-03-06T11:21:32Z

Alternatively, I can probably resolve the %ford conflict that occurs in the cherry-pick to take just the change relevant to this PR, and see if I can get it into the last release prior to OS1.

I'll give it a quick test just to check -- if anything looks dodgy I'll just hold off.

jtobin · 2020-03-06T12:07:51Z

> I'll give it a quick test just to check -- if anything looks dodgy I'll just hold off.

This appears fine. Will merge and push a new Arvo release candidate.

* origin/arvo-errors: (35 commits)
  pill: all
  vane: jet-hints all vanes for profiling
  arvo: refines crash printfs
  arvo: fix wire (and adapt old) for %vega reset notification
  arvo: removes all vase literals from |va
  arvo: removes all traces of meta-meta card reduction
  arvo: cleanup per review
  arvo: removes vestigial |is core
  arvo: remove refactoring comments
  arvo: replace $milt with $meta
  arvo: replace $mill with $maze
  worker: sends new error-notification events
  arvo: removes %gave, generalizes %hurl
  vane: prints error notifications where not handled
  behn: forward %drip error notifications, refactor %crud handling
  ames: downcast %hear error notification to %hole
  vane: downcast all error notifications to %crud
  arvo: removes (commented out) legacy event routing
  test: updates vane calling convention
  dill: "downcast" +call error notification to %crud
  ...

Signed-off-by: Jared Tobin <[email protected]>
(cherry picked from commit 6ccc843)

joemfb added 28 commits February 26, 2020 16:56

arvo: adds dynamic analysis from neo

e2f03a6

arvo: refactors and enables neo dynamic analysis

6d8261a

arvo: adds new vane and event-loop engines

201ffd1

arvo: enables new event-loop and vane engines

f5d8a3f

arvo: refactor relationships between engines

9b09689

arvo: restore original %xeno wires

624a403

arvo: removes obsolete engines

8187c58

arvo: groups and refactors (most) top-level arvo structures

02d811f

arvo: use cached reflexives over explicit vases

eb6b99d

arvo: adds errors to $wind and |le

25a983a

arvo: adds errors to |me

b118147

arvo: moves most new structures to top level

5f0f32d

arvo: passes errors to all vanes

df970ed

arvo: supports both old and new %crud events

9915e7f

arvo: clear error state on each loop iteration

7b4ef1f

dill: "downcast" +call error notification to %crud

fa71cc2

test: updates vane calling convention

c38222d

arvo: removes (commented out) legacy event routing

8e6dc99

vane: downcast all error notifications to %crud

4cae84d

ames: downcast %hear error notification to %hole

93eaff7

behn: forward %drip error notifications, refactor %crud handling

e59d56a

vane: prints error notifications where not handled

53d9798

arvo: removes %gave, generalizes %hurl

796478a

worker: sends new error-notification events

687affc

arvo: replace $mill with $maze

bf23110

arvo: replace $milt with $meta

ba753e3

arvo: remove refactoring comments

835e34d

arvo: removes vestigial |is core

16bcff9

joemfb requested review from belisarius222 and ixv February 27, 2020 04:36

belisarius222 approved these changes Feb 29, 2020


jtobin reviewed Mar 2, 2020


joemfb mentioned this pull request Mar 2, 2020

ford: restores pit-shortcircuit, but only during boot #2384

Merged

joemfb added 3 commits March 3, 2020 12:58

arvo: fix wire (and adapt old) for %vega reset notification

066e994

arvo: refines crash printfs

c94b5c2

vane: jet-hints all vanes for profiling

6322639

joemfb force-pushed the arvo-errors branch from d69c2b1 to 2bd192f Compare March 3, 2020 23:17

belisarius222 approved these changes Mar 4, 2020


joemfb added 2 commits March 5, 2020 11:56

pill: all

41ceac8

joemfb force-pushed the arvo-errors branch from 2bd192f to 41ceac8 Compare March 5, 2020 20:16

jtobin approved these changes Mar 6, 2020


jtobin merged commit 6ccc843 into master Mar 6, 2020

jtobin deleted the arvo-errors branch March 6, 2020 12:08

joemfb mentioned this pull request Mar 9, 2020

arvo: removes looping crash printf #2422

Merged

joemfb mentioned this pull request Apr 28, 2020

urbit-king: Expected @t, but got ^ #2721

Closed

joemfb mentioned this pull request Jul 1, 2020

+slurs still referring to %lient, %rver #1990

Closed
