hoon: wip, make parsers operate on cords #5456

Fang- · 2021-11-25T21:54:45Z

This has been mentioned in passing in the past, so figured I'd take a stab at it for laughs. Initial results are... not very promising.

Taking out the jets and testing this using the /tests/sys/zuse/html de-xml and de-json tests, the old version is ~20% faster. Testing against (ream .^(@t %cx %/sys/zuse/hoon)) though, the old version is ~24 times faster than the parsers in this pr. That feels like way too big a difference, I have to wonder if I just did something very wrong here.

Having trouble comparing the jetted version, because when I try to start up vere with these changes I hit an atom assertion, something is trying to parse "$:type" using the jet (even with jet hints still commented out in hoon.hoon), but I'm not sure where that's being parsed (during the lite: arvo formula phase).

Could be interesting to compare the memory characteristics here as well, wish we had %duel as a memory-oriented corollary to %bout.

Opening this draft pr as an rfc. Pile on!

Unjetted version of this seems slower than unjetted version of tape parsers, and naively changing the jets breaks very basic functionality.

ohAitch · 2021-11-25T22:04:52Z

(rsh 3 q.tub) reallocates the whole rest of the string, which will indeed be worse than cdr on a tape. You want the input to be like, an offset-and-cord "slice", not a cord alone.
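To make the difference concrete, here is a minimal sketch contrasting the two ways of stepping past one byte. The names .txt, .rest and .pos are illustrative only and do not come from the PR.

```hoon
::  sketch only: .txt, .rest and .pos are hypothetical names
=/  txt=cord  'hello'
::  shifting: builds a new atom holding everything after the first byte
=/  rest=cord  (rsh 3 txt)
::  slicing: .txt is left untouched, only the offset advances
=/  pos=@ud  1
::  produces ['ello' 'e']
[rest `@t`(cut 3 [pos 1] txt)]
```

With the shifting approach every step of the parse produces another copy of the remaining input, whereas the offset approach only ever produces a new small number.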

Though also, when I did some small-scale testing of that, having to chase the tape pointers got swamped by having to chase the formula pointers for tree-walking the combinators themselves, to no visible improvement. (Even when q.tub was like, a dummy number denoting "n unspecified characters"). Not sure if the bytecode interpreter is clever enough to solve that at this point.

joemfb · 2021-11-27T16:05:38Z

This is a worthy experiment -- I'm surprised at how small this diff is!

@ohAitch is right, you'll want to keep a cursor into the input cord and +cut bytes out of it to avoid reallocating every intermediate stage (this will probably make the diff quite a bit bigger). See #5224 for this same transformation being applied to the naive rollups.

Better allocation patterns should result in significant performance improvements, but I don't know how it will compare to the current tape handling (especially when those tapes have decent locality due to inner-road slab allocation). There will be significantly more function calls ...

ohAitch · 2021-11-27T23:41:46Z

Yeah I wouldn't be too surprised if this was still straight-up slower unjetted due to the extra tuples and function calls; though once you get to multiple kb of input text probably locality or no the 24x memory overhead of tapes starts to sting. (Certainly tripping hoon.hoon will flush everything else out of multiple layers of cache, though that's an extreme case and in practice probably most bytes parsed are from inputs <200 characters.)
In a jet ofc, the cut is directly grabbing a byte from an array.

ohAitch · 2021-11-28T00:10:31Z

For a perhaps more direct comparison (and to keep diff size manageable), you could write

++  tx  ta  ::/tc
++  ta
  |%
  ++  type  tape
  ++  take  |=(a=cord ^-(type (trip a)))
  ++  done  |=(a=type ^-(? ?=(~ a)))
  ++  look  |=(a=type ^-(char ?~(a '' i.a)))
  ++  next  |=(a=type ^-(type t.+.a))  :: unchecked
  --
++  tc
  |%
  ++  type  (pair @u cord)
  ++  take  |=(a=cord ^-(type [0 a]))
  ++  done  |=(a=type ^-(? =(p.a (met 3 q.a))))
  ++  look  |=(a=type ^-(char (cut 3 [p.a 1] q.a)))
  ++  next  |=(a=type ^-(type a(p +(p.a))))  :: unchecked
  --
::
+$  nail  [p=hair q=type:tx]  :: etc

(You might want to cache the met / support slices that also omit part of the end of the cord, though for final jetted performance that's presumably worse bc it's just a slot lookup)
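As a rough illustration of the "cache the met" idea, the sketch below adds a hypothetical third core, +tm, with the same four arms as +ta and +tc but with the byte length stored next to the offset. The core name and the faces .pos, .len and .txt are assumptions made for this example; nothing like it appears in the PR.

```hoon
=>
  |%
  ::  hypothetical variant of +tc with the length measured once, up front
  ++  tm
    |%
    +$  type  [pos=@ud len=@ud txt=cord]
    ++  take  |=(a=cord ^-(type [0 (met 3 a) a]))
    ++  done  |=(a=type ^-(? (gte pos.a len.a)))
    ++  look  |=(a=type ^-(char (cut 3 [pos.a 1] txt.a)))
    ++  next  |=(a=type ^-(type a(pos +(pos.a))))  ::  unchecked
    --
  --
::  usage: count the 'l' bytes by stepping the cursor to the end
=/  cur=type:tm  (take:tm 'hello')
=|  n=@ud
|-  ^-  @ud
?:  (done:tm cur)  n
=?  n  =('l' (look:tm cur))  +(n)
$(cur (next:tm cur))
```

For the input 'hello' this produces 2. Whether the extra tuple element pays for itself is exactly the open question raised above: +done becomes a comparison of two stored numbers, at the cost of a larger state to copy on every +next.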

zalberico · 2022-10-25T19:27:15Z

Closing this one since it's an old RFC and there's no plan to work on it or merge it for now.

joemfb · 2022-10-27T22:29:15Z

In lieu of a better place to put and discuss "draft RFC with working code", I want to keep this kind of thing open.

Manual "theirs" strategy.

Changes the parsers' "continuation" from a tape representing the remainder of the parse input, to the original parse input paired with a pointer into it.

Includes a rewritten +inde (and deduplicates +iny by pointing to it explicitly), because the original implementation was difficult to make work in this new context, and was somewhat questionable anyway. (Degenerate cases would reparse the entire rest of the input...)

Note that for converting inputs from tapes, we must be cautious around two things:
- +crip would drop null characters, so we must +rep instead.
- Trailing null characters cannot be represented by the cord itself so must instead be represented by the .m in the parser input. Therefore, when converting from tape, measure the tape length first, instead of measuring the resulting atom.

Despite changes, /lib/der gets broken here. To be fixed at a later date if necessary.

Must come paired with updated jets.

From initial measurements, performance with these changes remains mostly unchanged, so the juice might not be worth the squeeze here.

urbit/urbit#5456 changes the parsers to operate on cords instead of tapes. This updates the parser jets to match those changes, retaining existing jets for older hoons per the same pattern as #918.

Fang- · 2025-12-15T23:06:37Z

Four years later, time to take another stab at this!

Changes the parsers' "continuation" from a tape representing the remainder of the parse input, to the original parse input paired with a pointer into it. We +cut bytes out of this input to process them.

Includes a rewritten +inde (and deduplicates +iny by pointing to it explicitly), because the original implementation was difficult to make work in this new context, and was somewhat questionable anyway. (Degenerate cases would reparse the entire rest of the input...)

Note that for converting inputs from tapes, we must be cautious around two things:

- +crip would drop null characters, so we must +rep instead.
- Trailing null characters cannot be represented by the cord itself so must instead be represented by the .m in the parser input. Therefore, when converting from tape, measure the tape length first, instead of measuring the resulting atom. (It's not common by any means, but without this the wasm tests break.)
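The two pitfalls can be seen on a small tape that contains both an embedded and a trailing null. This is only a sketch: the face .tap and the [length cord] shape of the last element are illustrative, and the actual parser input in the PR may be laid out differently.

```hoon
::  sketch only: .tap and the [length cord] pair are hypothetical
=/  tap=tape  ~['a' `@t`0 'b' `@t`0]
:*  (crip tap)                     ::  'ab': both nulls are dropped
    (met 3 (rep 3 tap))            ::  3: the trailing null is invisible in the atom
    [(lent tap) `@t`(rep 3 tap)]   ::  length 4, measured from the tape itself
==
```

+rep places each character at a fixed byte position, so the embedded null survives, but an atom has no way to record trailing zero bytes. That is why the length has to be taken from the tape with +lent before conversion.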

Despite best-effort changes, /lib/der gets broken here. I spent some time on it and thought I had it right, tests did pass! But for some reason now they don't anymore. Probably stupid, but I'll punt on it for now.

Updated jets are on vere's m/cord-parsers.

I took some measurements, and suspect the juice may not be worth the squeeze.

Using %bout, the time it takes to parse hoon.hoon gets reduced by at best 2%. Really, the difference is negligible and can easily be chalked up to other factors.

I wasn't sure how to measure the memory situation accurately here. I put a %meme hint before and after a +ream call. Only the "solid heap" displays any significant change.

In stock hoon, the solid heap grows by about 21 MB after parsing hoon.hoon.
With these changes, the heap grows by about 29 MB instead.

So it seems this implementation is worse than our baseline, memory-wise.
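For anyone wanting to repeat the comparison, the measurement described above might look roughly like the following. This is a reconstruction from the description, not the code that produced the numbers; the faces .txt and .ast are illustrative, and exactly when each %meme report is printed relative to the code it wraps is an assumption.

```hoon
::  sketch only: reconstructed from the prose, not the original test code
=/  txt=@t  .^(@t %cx %/sys/hoon/hoon)
::  memory report before the parse
~>  %meme
::  %bout reports how long the wrapped computation took
=/  ast=hoon  ~>(%bout (ream txt))
::  memory report after the parse, with the result still live
~>  %meme
~
```

Running the same expression on stock hoon and on this branch, with matching jets in each case, gives the two sets of numbers to compare.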

Maybe there is (again!) something stupid I've overlooked. It's certainly been too long since I wrote serious vere code, maybe I'm neglecting to refcount properly in the updated jets. Once again: RFC!

hoon: wip, make parsers operate on cords

486d82a

Fang- added the rfc label Nov 25, 2021

vere: forgotten changes

e56ab1b

ashelkovnykov mentioned this pull request Mar 16, 2022

vere, zuse: JSON Parsing/Serialization Jets #5566

Closed

zalberico closed this Oct 25, 2022

joemfb reopened this Oct 27, 2022

ashelkovnykov mentioned this pull request Apr 11, 2023

zuse: changes in preparation for JSON jets #6463

Merged

Fang- changed the base branch from master to next/kelvin/408 December 11, 2025 22:41

Fang- added 2 commits December 11, 2025 23:46

Merge branch 'next/kelvin/408' into m/cord-parsers

c8dce95
