joemfb commented Feb 27, 2020

This PR refactors arvo's internal engines, and adds new mechanisms by which vanes can propagate error notifications.

The primary changes are:

  • |wink refactored into |me and |va
  • |is refactored into |le
  • top-level structures are modernized, grouped, and documented
  • new-style error notifications are added, in addition to the current style

The new engines -- namespaced in |part -- are inspired to varying degrees by the unreleased, incomplete neo/arvo (deleted in 7d4b35c). These changes were made to support the error-notification implementation, and generally improve the quality of arvo's internals. No changes have been made to arvo's external interface or persistent structures, so this extensive refactoring is still suitable for OTA release without staging or adaptation.

The new error notifications involve changes to arvo's internal loop, and the interface between arvo and its vanes:

  • $goof is added, a new error-notification structure (similar to $ares from the deep past)
  • %hurl is added, a new kernel action to propagate a $goof along with a %pass or %give
  • (unit goof) is added to the sample of vanes' +call and +take arms
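
As a rough illustration of how these three pieces fit together, here is a minimal Python model. It is not the Hoon in this PR: the field names (`mote`, `trace`, `wire`, `note`, `gift`, `move`) and the `take` function are hypothetical stand-ins, assuming only that a $goof carries a bail mote and a stack trace, that %hurl pairs a $goof with a %pass or %give, and that an optional goof is handed to a vane's +take arm.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Goof:
    """Hypothetical stand-in for $goof: a bail mote plus a stack trace."""
    mote: str
    trace: Tuple[str, ...]


@dataclass(frozen=True)
class Pass:
    """Simplified %pass: a request sent along a wire."""
    wire: str
    note: str


@dataclass(frozen=True)
class Give:
    """Simplified %give: a response returned to the caller."""
    gift: str


@dataclass(frozen=True)
class Hurl:
    """Simplified %hurl: an error notification wrapped around a %pass or %give."""
    goof: Goof
    move: Union[Pass, Give]


def take(wire: str, sign: str, dud: Optional[Goof] = None) -> str:
    """Model of a +take arm whose sample gained an optional goof.

    With no goof the response is handled normally; with one, the arm
    is being told that processing of the original event failed.
    """
    if dud is None:
        return f"handled {sign} on {wire}"
    return f"error {dud.mote} on {wire}"
```

For example, `take("/timer", "wake")` models the ordinary path, while passing a `Goof` as the third argument models the error-notification path.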

Inside the vanes, new error notifications are "downcast" to the old style (ie, %crud everywhere but %ames, where they become %crud or %hole). The only change in the error-handling behavior of the vanes is in %behn's %drip handling. Errors therein are now propagated to the intended recipients (arriving in their +take arm, where they'll be merely printed).
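
The "downcast" can be pictured with a small Python sketch. This is an assumption-laden model, not the vane code: the `downcast` function, the tuple shapes, and the payloads are hypothetical. The only rule taken from this PR is that a failed %hear in %ames becomes %hole and everything else becomes %crud.

```python
from typing import Optional, Tuple

# Hypothetical goof shape: (bail mote, stack trace lines).
Goof = Tuple[str, Tuple[str, ...]]


def downcast(vane: str, task: str, goof: Optional[Goof]) -> Tuple[str, object]:
    """Map a new-style error notification onto an old-style task.

    Returns a (tag, payload) pair; the payloads are simplified placeholders.
    """
    if goof is None:
        # No error: the task is processed as usual.
        return (task, None)
    if vane == "ames" and task == "hear":
        # A packet that crashed on receipt is reported as %hole.
        return ("hole", None)
    # Everywhere else the error is reported old-style, as %crud.
    return ("crud", goof)
```

Keeping the mapping in one place like this mirrors the intent described above: vanes see the new notification but, for now, behave exactly as they did with the old error style.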

Finally, the worker process is updated to send new error-notification events, including both the $goof (ie, bail mote and stack trace) and the original event.

This PR represents an incomplete but viable snapshot of the error-handling work. Additional changes are needed in the vanes (to handle more error notifications, or more fully) and the runtime, specifically the IPC protocol and I/O drivers (to precisely handle errors in error notifications). Additional improvements to arvo also follow from this work, most notably around upgrade. All such changes will be more disruptive than these, and harder to handle without strict versioning coordination between arvo and the runtime. These changes are a foundation upon which incrementally better error-handling can be built for the live network, while larger efforts continue in the background.

Calling the error-notification structure $goof is somewhat ... well, goofy. Some other candidates include fail, ruin, crud, flaw, lack, and miss. Feedback is requested.

The changes to arvo and the vanes must be released together, but no intermediate staging is needed, and the runtime changes need not be correlated. Since almost every commit in this PR merits a new pill, but none individually require one, I've departed from the recommended approach and saved the pill update for the end.

joemfb requested review from belisarius222 and ixv February 27, 2020 04:36

joemfb commented Feb 28, 2020

@belisarius222, %init wouldn't work right now, as it happens in the "legacy boot" event, before the initial userspace commit. I don't know %ford's internals too well, maybe there's an easy way to check for the first %reef build. Hardcoding cases 1/2 might be fine. But the right way to do the pit short-circuit is for arvo to persist the source it's running, so %ford can scry and compare. I aim to add that minimal arvo filesystem soon(tm).

belisarius222 commented Feb 28, 2020 via email

joemfb commented Feb 29, 2020

@belisarius222, I don't want to churn interfaces for short-term workarounds. And I don't want to add state adapters here, or disrupt the outer layers of arvo.

I've added back the pit-shortcircuit for %home and %base (which does get built first, you were right). This makes boot fast again. The only risk is making a bad pill (kernel source mismatch), which is already a risk. It should just be avoided. This seems fine for now, IMO, unless you'd prefer to prime the cache to the same effect. I just don't want to be blocked on this problem anymore.

@belisarius222

Ok, no, this looks fine. Glad it worked.

belisarius222 left a comment

LGTM

jtobin left a comment

@joemfb The kernel appears to reload fine when applying this (and ford-no-pit) on top of the latest release as a sanity check, but I do observe a single [%poke %bad-wire /] immediately afterwards.

I ran a |reboot after that and got a bunch of find.goof and find.wite errors and such. I made a trivial change to Arvo and recommitted to force it to reload, again observing [%poke %bad-wire /], and |reboot again afterwards produces the same find.goof and find.wite errors.

Anything to be concerned about here?

joemfb commented Mar 2, 2020

@jtobin, thanks for checking. Was this a new fake-ship from the arvo tagged release?

I've pulled the %ford changes into #2384, and will reintegrate here once that's done. Pushing these changes OTA should work (and is why this milestone has been PR'd), but it still needs to be tested. As for [%poke %bad-wire ...], certain implicit constraints have been made explicit, and it looks like we're violating them in some cases. IMO, the best thing would be to track those cases down and either fix or special-case (ie, adapt) them.

joemfb commented Mar 3, 2020

@jtobin, the [%poke %bad-wire ...] error was caused by the %vega reset notification using an invalid wire. I've fixed this, and added code to dynamically fix-up the %vega that will come from the old kernel on |reset. The find errors on |reload are most likely due to the kernel |reset failing -- |reload doesn't recompile the kernel, so it was attempting to load new vanes into an old kernel.

That being said, I'm not quite ready for this to be merged. I'd like to redo the master merge once #2384 is in, and clean things up generally. And I want to do a review myself.

Also worth noting, this does not need to be in or block the os1 release. Due to the issues we've been having with OTAs, it might be best to hold it until after os1. But we can discuss that separately.

joemfb commented Mar 4, 2020

Ok, this is cleaned up and ready on my end. @belisarius222, please take a final look.

I've tested committing these changes to a fake-zod booted from the v0.10 branch. When #2247 is included, the OTA fails in %clay and %gall (%load-lost), due to type-of-type changes. (The old type nests in the new, but they need to mutually nest for vanes that store vases.) When #2247 is excluded, the upgrade applies cleanly.

@jtobin, I haven't tried merging this and then cherry-picking the merge commit; I'm not sure if that's sufficient to exclude the changes in question.

belisarius222 left a comment

The recent changes all LGTM.

jtobin commented Mar 5, 2020 via email

joemfb added 2 commits March 5, 2020 11:56
* master: (484 commits)
  king: Slight CLI cleanup and fix test build.
  king: Add command-line flags to configure HTTP and HTTPS ports.
  groups: reduce metadata updates, removal
  chat: reducer handles metadata removal
  groups: exclude group metadata from channels list
  groups: set and surface group name metadata
  groups: remove dummy 'share' flow, 'default' group
  contacts: rename, migrate '~contacts' to '~groups'
  sh/release: rename vere release tarballs
  vere: patch version bump (v0.10.3 -> v0.10.4.rc1) [ci skip]
  pills: updated brass and solid
  chat: pull room contacts from associated group
  chat: spell 'permanent' correctly
  eyre: remove padding from 'access' input
  chat: only delete metadata for a chat if you created it
  chat: settings inputs add borders on focus
  vere: disables gc on |mass in the daemon process
  chat: remove console.log from metadataAction
  chat: style fixes during review, use metadata-hook
  chat: edit description, color settings
  ...

joemfb commented Mar 5, 2020

I did one last re-merge (there was a legitimate, minor conflict in %ford as of os1-rc).

jtobin commented Mar 6, 2020

The cherry-picked merge doesn't include the %spot hint changes, but it does include the %ford changes that can't yet go out OTA (per #2333 -- presumably this popped up in the conflict you encountered).

I think it's ok to merge this, as I don't plan to release any more non-surgical-hotfix updates prior to the OS1 release. This can probably just go out with OS1 proper.

jtobin commented Mar 6, 2020

Alternatively, I can probably resolve the %ford conflict that occurs in the cherry-pick to take just the change relevant to this PR, and see if I can get it into the last release prior to OS1.

I'll give it a quick test just to check -- if anything looks dodgy I'll just hold off.

jtobin commented Mar 6, 2020

> I'll give it a quick test just to check -- if anything looks dodgy I'll just hold off.

This appears fine. Will merge and push a new Arvo release candidate.

jtobin merged commit 6ccc843 into master Mar 6, 2020
jtobin deleted the arvo-errors branch March 6, 2020 12:08
jtobin pushed a commit that referenced this pull request Mar 6, 2020
* origin/arvo-errors: (35 commits)
  pill: all
  vane: jet-hints all vanes for profiling
  arvo: refines crash printfs
  arvo: fix wire (and adapt old) for %vega reset notification
  arvo: removes all vase literals from |va
  arvo: removes all traces of meta-meta card reduction
  arvo: cleanup per review
  arvo: removes vestigial |is core
  arvo: remove refactoring comments
  arvo: replace $milt with $meta
  arvo: replace $mill with $maze
  worker: sends new error-notification events
  arvo: removes %gave, generalizes %hurl
  vane: prints error notifications where not handled
  behn: forward %drip error notifications, refactor %crud handling
  ames: downcast %hear error notification to %hole
  vane: downcast all error notifications to %crud
  arvo: removes (commented out) legacy event routing
  test: updates vane calling convention
  dill: "downcast" +call error notification to %crud
  ...

Signed-off-by: Jared Tobin <[email protected]>
(cherry picked from commit 6ccc843)