hoon: wip, make parsers operate on cords #5456
base: next/kelvin/408
Unjetted version of this seems slower than unjetted version of tape parsers, and naively changing the jets breaks very functionality.
|
Though also, when I did some small-scale testing of that, having to chase the tape pointers got swamped by having to chase the formula pointers for tree-walking the combinators themselves, to no visible improvement. (Even when |
|
This is a worthy experiment -- I'm surprised at how small this diff is! @ohAitch is right, you'll want to keep a cursor into the input cord and Better allocation patterns should result in significant performance improvements, but I don't know how it will compare to the current tape handling (especially when those tapes have decent locality due to inner-road slab allocation). There will be significantly more function calls ... |
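To make the "cursor into the input" idea concrete, here is a minimal sketch in Python rather than Hoon. It models a tape as a linked list of cells and compares a parser whose continuation is the remaining tape with one whose continuation is the untouched input plus an offset. All names (`to_tape`, `just_tape`, `just_cursor`, `seq`) are hypothetical, and the sketch ignores line/column tracking and the real structure of Hoon's parser edges.

```python
from typing import Optional, Tuple

# Tape model: a linked list of (char, rest) cells, None for the empty tape.
Tape = Optional[Tuple[str, "Tape"]]


def to_tape(text: str) -> Tape:
    """Build the linked-list model of a tape from a Python string."""
    tape: Tape = None
    for ch in reversed(text):
        tape = (ch, tape)
    return tape


def just_tape(ch):
    """Match one character; the continuation is the remaining tape."""
    def parse(tape):
        if tape is not None and tape[0] == ch:
            return (ch, tape[1])
        return None
    return parse


def just_cursor(ch):
    """Match one character; the continuation is (input, advanced offset)."""
    def parse(state):
        text, pos = state
        if pos < len(text) and text[pos] == ch:
            return (ch, (text, pos + 1))
        return None
    return parse


def seq(p, q):
    """Run p, then q on p's continuation; works for either representation."""
    def parse(state):
        first = p(state)
        if first is None:
            return None
        second = q(first[1])
        if second is None:
            return None
        return ((first[0], second[0]), second[1])
    return parse


ab_tape = seq(just_tape("a"), just_tape("b"))
ab_cursor = seq(just_cursor("a"), just_cursor("b"))
print(ab_tape(to_tape("abc")))
print(ab_cursor(("abc", 0)))
```

In the cursor version every continuation is a fresh pair, which is one way to picture the "extra tuples" cost raised in this thread: the input itself is never copied or walked, but each step allocates a small state value.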
|
Yeah I wouldn't be too surprised if this was still straight-up slower unjetted due to the extra tuples and function calls; though once you get to multiple kb of input text probably locality or no the 24x memory overhead of tapes starts to sting. (Certainly |
|
For a perhaps more direct comparison (and to keep diff size manageable), you could write (You might want to cache the |
|
Closing this one since it's an old RFC and there's no plan to work on it or merge it for now. |
|
In lieu of a better place to put and discuss "draft RFC with working code", I want to keep this kind of thing open. |
Manual "theirs" strategy.
Changes the parsers' "continuation" from a tape representing the remainder of the parse input, to the original parse input paired with a pointer into it.
Includes a rewritten +inde (and deduplicates +iny by pointing to it explicitly), because the original implementation was difficult to make work in this new context, and was somewhat questionable anyway. (Degenerate cases would reparse the entire rest of the input...)
Note that for converting inputs from tapes, we must be cautious around two things:
- +crip would drop null characters, so we must +rep instead.
- Trailing null characters cannot be represented by the cord itself so must instead be represented by the .m in the parser input. Therefore, when converting from tape, measure the tape length first, instead of measuring the resulting atom.
Despite changes, /lib/der gets broken here. To be fixed at a later date if necessary.
Must come paired with updated jets.
From initial measurements, performance with these changes remains mostly unchanged, so the juice might not be worth the squeeze here.
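The two tape-conversion caveats in the commit message can be illustrated with a small Python model. This is not Hoon: it assumes a cord is a little-endian atom of bytes and a tape is a list of byte values. The helpers `crip_like`, `rep_like`, `met3` and `tape_to_input` are hypothetical names that only imitate the behaviour described for +crip, +rep, and the length field .m.

```python
def crip_like(tape):
    """Model of concatenating chars by their own width: a 0 byte has width 0 and vanishes."""
    out = bytearray()
    for c in tape:
        width = (c.bit_length() + 7) // 8  # 0 for a null character
        out += c.to_bytes(width, "little")
    return int.from_bytes(out, "little")


def rep_like(tape):
    """Model of fixed-width assembly: every char occupies exactly one byte."""
    return int.from_bytes(bytes(tape), "little")


def met3(atom):
    """Number of bytes needed to represent the atom (trailing nulls are invisible)."""
    return (atom.bit_length() + 7) // 8


def tape_to_input(tape):
    """Pair the length measured on the tape with the fixed-width atom."""
    return (len(tape), rep_like(tape))


tape = [ord("a"), 0, ord("b"), 0]  # "a\0b\0"
print(hex(crip_like(tape)))  # interior and trailing nulls are gone
print(hex(rep_like(tape)))   # interior null kept, trailing null not visible
print(met3(rep_like(tape)), tape_to_input(tape)[0])
```

Measuring the resulting atom reports three bytes for this input, while the tape has four characters, which is why the length has to be taken from the tape before conversion.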
urbit/urbit#5456 changes the parsers to operate on cords instead of tapes. This updates the parser jets to match those changes, retaining existing jets for older hoons per the same pattern as #918.
|
Four years later, time to take another stab at this! Changes the parsers' "continuation" from a tape representing the remainder of the parse input, to the original parse input paired with a pointer into it. We Includes a rewritten Note that for converting inputs from tapes, we must be cautious around two things:
Despite best-effort changes, Updated jets are on vere's I took some measurements, and suspect the juice may not be worth the squeeze. Using I wasn't sure how to measure the memory situation accurately here. I put a In stock hoon, the solid heap grows by about 21 MB after parsing hoon.hoon. So it seems this implementation is worse than our baseline, memory-wise. Maybe there is (again!) something stupid I've overlooked. It's certainly been too long since I wrote serious vere code, maybe I'm neglecting to refcount properly in the updated jets. Once again: RFC! |
This has been mentioned in passing in the past, so figured I'd take a stab at it for laughs. Initial results are... not very promising.
Taking out the jets and testing this using the /tests/sys/zuse/html de-xml and de-json tests, the old version is ~20% faster. Testing against (ream .^(@t %cx %/sys/zuse/hoon)) though, the old version is ~24 times faster than the parsers in this pr. That feels like way too big a difference, I have to wonder if I just did something very wrong here.
Having trouble comparing the jetted version, because when I try to start up vere with these changes I hit an atom assertion: something is trying to parse "$:type" using the jet (even with jet hints still commented out in hoon.hoon), but I'm not sure where that's being parsed (during the lite: arvo formula phase).
Could be interesting to compare the memory characteristics here as well; wish we had %duel as a memory-oriented corollary to %bout.
Opening this draft pr as an rfc. Pile on!