Fuse emitting and printing of trees in the backend #4917
Conversation
Performance measurements

Sorry for the swapped colors between inc and batch.

[Figures: Incremental (including warmup run), full and cropped (same data); Batch]

Supplemental info: script to calculate "Full Backend" timing.

```r
library(readr)
library(ggplot2)
library(dplyr)
d_full <- read_csv("logger-timings-batch.csv", col_names = c("variant", "op", "t_ns"), col_types = "ffn")
d_backend <- bind_cols(
d_full %>% filter(op == "Emitter") %>% select(variant, t_ns),
d_full %>% filter(op == "BasicBackend: Write result") %>% select(t_ns) %>% rename(t_ns2 = t_ns),
) %>% mutate(op = "Full Backend", t_ns = t_ns + t_ns2, t_ns2 = NULL)
d <- bind_rows(
d_full %>% filter(grepl('Emitter|BasicBackend', op)),
d_backend,
)
ggplot(d, aes(x = op, color = variant, y = t_ns)) + geom_boxplot()
```
I've had a look at this with YourKit. Findings:
Thoughts:
I don't think I'm comfortable proceeding with this given the potential negative impacts depending on the codebase. Options I see:
FWIW: I'm going to try and benchmark the second option :)
That is probably viable, yes.
Numbers look comparable (incremental):
I have made some progress here. @sjrd there are two points I'd like your input on:
TODOs for me:
Updated benchmarking results (incremental only): [figures: full and cropped]
Note to self: the adjustments to the JS printer are incorrect: there are semicolons after control-flow statements (e.g. if-else chains).
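For illustration, a minimal sketch of the rule the note is about, using a hypothetical toy printer (not the actual `Printers.scala` code): statements that end in a block, such as if-else chains, must not get a trailing semicolon, while expression statements must.

```scala
// Hypothetical toy printer illustrating the semicolon rule; the real
// Printers.scala handles the full JS AST.
object SemicolonRule {
  sealed trait Stat
  final case class ExprStat(code: String) extends Stat
  final case class IfElse(cond: String, thenp: String, elsep: String) extends Stat

  def printStat(s: Stat): String = s match {
    case ExprStat(code) =>
      code + ";" // expression statements need a terminating semicolon
    case IfElse(cond, thenp, elsep) =>
      s"if ($cond) { $thenp } else { $elsep }" // no trailing semicolon after a block
  }
}
```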
Force-pushed from 268775c to bea04d2.
The CI failure reveals a problem with the
Fixing the show (and running scripted locally) reveals another problem: we cannot cache GCC trees that easily. It seems they keep a reference to their parents (and siblings), so caching them potentially leaks large amounts of memory, but also doesn't actually let us re-use the nodes. Maybe the best strategy is to not cache AST transforms for GCC for now?
That's probably fine. When we run GCC we're not incremental anyway, because GCC itself isn't. One level of transformation is not going to change much, especially if we have to recreate new
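A hedged sketch of why such parent back-references defeat caching (an illustrative node type, not the actual GCC `Node` API): holding on to any cached node retains its whole enclosing tree, and the node cannot be re-attached under a new parent without mutation.

```scala
// Illustrative stand-in for a GCC-style AST node that knows its parent.
final class Node(val kind: String) {
  var parent: Node = null // set when attached; GCC nodes track parents/siblings

  def addChild(child: Node): Unit = {
    require(child.parent == null, "node is already attached to a tree")
    child.parent = this
  }
}

// Caching a leaf after a run keeps its whole root reachable via `parent`
// (a memory leak), and a second run cannot reuse the leaf under a fresh
// parent because it is still attached to the old tree.
```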
Force-pushed from 0885927 to b255236.
I have added a commit (to be squashed) to not cache GCC trees. TODO: write a unit test for the GCC backend that links twice consecutively (we should not have had to rely on a scripted test to catch this).
Ensures that we catch issues like this: scala-js#4917 (comment)
PR for test is here: #4924.
```scala
(_tree, false)
// Input has not changed and we were not invalidated.
// --> nothing has changed (we recompute to save memory).
(compute, false)
```
This seems weird. Now, both sides of the `if` call `compute`, and they do not use its result other than to return it. Shouldn't we return only the `Boolean` then, and let the caller do the `compute` themselves? That would avoid the tuple and, more importantly, the `extractChangedAndWithGlobals`.
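A minimal sketch of the suggested shape, with hypothetical names standing in for the actual linker types: the cache reports only whether the input changed, and the caller invokes the computation itself, so no tuple (and no `extractChangedAndWithGlobals`) is needed.

```scala
object ChangeTrackingSketch {
  /** Hypothetical input-tracking cache; all names here are illustrative. */
  final class InputTracker[I] {
    private var last: Option[I] = None

    /** Returns true iff `input` differs from the one seen last time. */
    def trackChanged(input: I): Boolean = {
      val changed = !last.contains(input)
      last = Some(input)
      changed
    }
  }

  private def markModuleInvalidated(): Unit = () // stand-in bookkeeping hook

  /** The caller performs `compute` itself; only the Boolean crosses the API. */
  def emit[I, R](tracker: InputTracker[I], input: I)(compute: => R): R = {
    if (tracker.trackChanged(input)) markModuleInvalidated()
    compute
  }
}
```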
Hum, yes. You have discovered what I have attempted to brush under the rug 😅 If I extract `compute` after calling `trackChanged`, `org.scalajs.linker.BasicLinkerBackendTest.noInvalidatedModuleInSecondRun` fails.

I'm still at the stage of debugging where I think "this cannot happen". So, all in all, I guess this clearly warrants further investigation :P
I found the issue: the alternative code was using a short-circuiting or-assignment (`||=`), so the RHS was not always executed. Putting it into a `val` first fixed the problem. (Updated.)
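For illustration, a self-contained sketch of the pitfall (hypothetical names): `changed ||= recompute()` desugars to `changed = changed || recompute()`, so once `changed` is true, the RHS, and any side effect it carries, is silently skipped.

```scala
object ShortCircuitPitfall {
  var changed = false

  def recompute(): Boolean = {
    println("recompute ran") // side effect we must not skip
    true
  }

  def main(args: Array[String]): Unit = {
    changed = true
    changed ||= recompute() // BUG: desugars to `changed = changed || recompute()`,
                            // so recompute() is never evaluated here

    // Fix: force evaluation into a val first, then combine.
    val didChange = recompute() // always runs
    changed = changed || didChange
  }
}
```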
This allows us to use the Emitter's powerful caching mechanism to directly cache printed trees (as byte buffers) and not cache JavaScript trees anymore at all. This reduces in-between run memory usage on the test suite from 1.12 GB (not GiB) to 1.00 GB on my machine (roughly 10%). Runtime performance (both batch and incremental) is unaffected. It is worth pointing out that, due to how the Emitter caches trees, classes that end up being emitted as ES6 classes will be held twice in memory (once as the individual methods, once as the entire class). On the test suite, this is the case for 710 out of 6538 classes.
In the next commit, we want to avoid caching entire classes because of the memory cost. However, the BasicLinkerBackend relies on the identity of the generated trees to detect changes: Since that identity will change if we stop caching them, we need to provide an explicit "changed" signal.
This reduces some memory overhead at negligible performance cost. Residual (post-link) memory benchmarks for the test suite: baseline 1.13 GB, new 1.01 GB.
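To make the overall idea concrete, here is a hedged sketch (illustrative names only, not the Emitter's actual API) of fusing emitting and printing: each tree is printed at most once, and the resulting UTF-8 bytes are cached, so no JavaScript trees need to be retained between runs.

```scala
import java.nio.charset.StandardCharsets

object FusedEmitPrintSketch {
  final case class JSTree(render: String) // stand-in for the real JS AST

  final class PrintedCache {
    private val cache = scala.collection.mutable.Map.empty[String, Array[Byte]]

    /** Prints `tree` at most once per `key`; later runs reuse the bytes. */
    def getOrPrint(key: String)(tree: => JSTree): Array[Byte] =
      cache.getOrElseUpdate(key, tree.render.getBytes(StandardCharsets.UTF_8))
  }
}
```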
Cool. Looks good! Thanks!
This allows us to use the Emitter's powerful caching mechanism to
directly cache printed trees (as byte buffers) and not cache
JavaScript trees anymore at all.
This reduces in-between run memory usage on the test suite from
1.13 GB (not GiB) to 1.01 GB on my machine (roughly 10%).
Runtime performance (both batch and incremental) is unaffected.
It is worth pointing out that, in order to avoid duplicate caching, we no longer cache full class trees.