Incremental linker #1626
Comments
Here's another datapoint for reference:
That's a one-character change to a string constant on a 2.7 GHz quad-core MacBook Pro. Suffice it to say I am fully supportive of incremental reachability analysis! |
Any news on this? Running PhantomJS tests for my small ~1000 LOC codebase on an i5 machine with an SSD currently clocks in at 15 seconds. In my experience, fastOptJS just takes too long to run. |
No news, I'm afraid. This is currently not a priority, as it is a non-functional requirement. We are focusing on functional requirements until 1.0.0 is released. |
@vjovanov this is the one you meant, right? |
Yes, that is the one. It can serve as a good starting point. |
@jvican this |
I chatted with @vjovanov about this. As a stop-gap solution, we could do the following: If only method bodies change, do not process removals (but process additions, which already are incremental). This will leave things dangling that are not reachable anymore after the method body changes (and essentially emit more code than necessary). But maybe that is an acceptable trade-off (maybe under a flag). |
It might be acceptable in a number of situations, yes. But since it won't produce the same code after a |
During development, dangling extra code is perfectly acceptable. Could the count of removals be tracked? If so, the flag could take the form of a threshold for full linking; once that many removals have accumulated, the linker could automatically perform a full link and reset the counter. This would prevent unbounded growth of the output. |
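A minimal sketch of the threshold idea above, assuming a hypothetical `RemovalBudget` helper that the linker would consult after each incremental run (the name and API are invented for illustration, not part of the actual linker):

```scala
// Hypothetical sketch: count the removals that were skipped during
// incremental runs and request a full link once a configurable budget
// is exceeded, so the emitted output cannot grow without bound.
final class RemovalBudget(maxSkippedRemovals: Int) {
  private var skipped = 0

  /** Records removals skipped in this run; returns true if a full link should be performed. */
  def shouldFullLink(removalsSkippedThisRun: Int): Boolean = {
    skipped += removalsSkippedThisRun
    if (skipped >= maxSkippedRemovals) {
      skipped = 0 // the full link cleans everything up, so reset the counter
      true
    } else {
      false
    }
  }
}
```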
FYI, I have started working on a skeleton for this. I feel like if we formalize |
I've been playing with this a bit, and the solution I have envisioned so far is essentially using the However, this way of dealing with incremental updates cannot remove cycles properly, much like reference counting cannot (e.g. if two methods only call each other, removing the last outside call to them leaves each with a nonzero caller count, so the dead cycle is never collected). It would not surprise me if we found parallels to our problem in (tracing) garbage collection; any pointers would be highly appreciated. Further, it also seems that the add-only approach would have to force a full re-link whenever a method that is reachable (in the incremental state of the analyzer) gets removed. Otherwise, it wouldn't be able to determine whether there is a linking error, or whether the method was only reachable due to leftover included code. |
@gzm0 Could you defer detecting the cycles to the point where a linking error occurs? If there is a linkage error in method X, then detect whether method X is in an unreachable cycle. If it is, remove the cycle and resume the incremental linking process (forgetting about the previous spurious error). Detecting whether method X is in an unreachable cycle takes time proportional to the size of that cycle (if method X is in such a cycle at the moment) or possibly proportional to a full link (if it isn't in a dead cycle, but then the linkage error would be legitimate, i.e. not related to a leftover unreachable method). I hope I've understood the topic well enough and my idea makes sense :) |
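A rough sketch of that deferred check, assuming a simplified caller graph (the `Method` type and its `callers` field are invented for the example; the real Analyzer data structures differ): when a linking error is reported for X, first verify that X is actually reachable from the live roots before treating the error as legitimate.

```scala
import scala.collection.mutable

// Invented example type: a method with back-edges to its callers.
final case class Method(name: String) {
  var callers: Set[Method] = Set.empty
}

object DeferredCycleCheck {
  /** Walks caller edges backwards from `x`; returns true if a live root is found.
   *  If false, `x` only survives as part of leftover (possibly cyclic) dead code,
   *  so a reported linking error on it would be spurious.
   */
  def reachableFromRoots(x: Method, roots: Set[Method]): Boolean = {
    val seen = mutable.Set.empty[Method]
    def walk(m: Method): Boolean =
      roots.contains(m) || (seen.add(m) && m.callers.exists(walk))
    walk(x)
  }
}
```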
What you say makes sense, yes. However, it would, by design, cause incremental linking to output something different than if you link from scratch. That would be a very unfortunate disadvantage, which I think we should try very hard to avoid. |
Yes, but the discussion above suggests that keeping it under a flag is a reasonable solution for the time being. Since incremental linking has been waiting for several years already, it's probably very desirable to have something without waiting another decade or so. It seems we would need the harder variant, i.e. https://en.wikipedia.org/wiki/Dynamic_connectivity#Fully_dynamic_connectivity . I haven't tried to understand it (that would take some time, and I have other priorities now), but after skimming, here are my thoughts:
Tracing GC has different requirements. The basic principle of a tracing GC is that a precise search for garbage is very costly, so such GCs use many heuristics that remove a lot of (but not all) garbage fast (i.e. they amortize the cost of a heap scan by waiting for a lot of garbage to accumulate and then, most of the time, removing only the garbage that is easy to remove). Only if there are repeated memory exhaustions is a precise full GC cycle used. From what I've read about G1GC, there are young GC cycles, mixed (young + part of old) GC cycles, full GC cycles and last-ditch full GC cycles. The difference between an ordinary full GC and a last-ditch GC is that the latter tries to be precise and find all garbage, so e.g. it forgoes imprecise optimizations (including parallelization) in some subtasks. There is at least one thing that high-performance tracing GCs use that we could use too, i.e. a parallel marking phase. See e.g. https://www.oracle.com/technical-resources/articles/java/g1gc.html
IIRC (but I remember that vaguely now) the G1GC uses work stealing to distribute reachability marking workload among threads. We would of course want to use all threads for that part as it wouldn't be a concurrent part in our case, i.e. we wouldn't need to worry about stealing CPU from application threads. Do other compilers (for whatever programming languages) have fully incremental linking, i.e. with fully incremental reachability analysis too? |
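For illustration, here is a minimal sketch of a parallel marking phase on a JVM `ForkJoinPool`, whose scheduler already does work stealing between worker threads; the `Node` type is invented and only stands in for whatever the Analyzer tracks per method or class.

```scala
import java.util.concurrent.{ForkJoinPool, RecursiveAction}
import java.util.concurrent.atomic.AtomicBoolean

// Invented example type: a reachability node with its outgoing call edges.
final class Node {
  var callees: Seq[Node] = Nil
  val marked = new AtomicBoolean(false)
}

// Marks one node and forks subtasks for its callees; idle workers steal them.
final class MarkTask(node: Node) extends RecursiveAction {
  override protected def compute(): Unit = {
    if (node.marked.compareAndSet(false, true)) {
      val subtasks = node.callees.map(n => new MarkTask(n))
      subtasks.foreach(_.fork())
      subtasks.foreach(_.join())
    }
  }
}

object ParallelMark {
  def apply(roots: Seq[Node]): Unit = {
    val pool = new ForkJoinPool() // one worker per available core by default
    val tasks = roots.map(r => pool.submit(new MarkTask(r)))
    tasks.foreach(_.join())
    pool.shutdown()
  }
}
```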
As far as I know, no. Scala is actually a bit of a pioneer in terms of incremental compilation pipelines. (I guess it had to, because recompiling everything is too slow.) |
Well, then perhaps it's worth mentioning the popular motto "done is better than perfect" :) Achieving a perfect solution in this case seems infeasible, and long linking times are probably discouraging people from building big projects with Scala.js. What are the problems with leftover unreachable methods after a hypothetical imprecise fastLink?
|
Are long linking times still actually an issue in 1.13.1? Although not incremental, base linking performance has significantly improved. It's back to being linear with a pretty low constant factor. It takes less than 500 ms for the two rounds of linking (combined) on the Scala.js test suite, which emits a 27 MB .js file. Are there applications that are so much bigger than that that this time becomes discouraging? |
I agree with @sjrd's sentiment here. At this point, I have the feeling that anything incremental that is correct (i.e. produces code with correct semantics or fails, but maybe generates too much code) would be so complex that we might be better off investing in batch performance improvements. I could also take a stab at parallelizing the linker: it is already fully asynchronous (due to file loading, which also needs to work on JS). However, at the moment, we execute all calculations on a single thread. So making it parallel will mainly mean adding locks and benchmarking to see whether the lock contention ends up faster or slower than queuing tasks ad hoc. |
I had the idea of parallelizing a few days ago. I think it should not be hard to "lock" on a per-class basis. Each class could have its own queue of jobs that concern it. Those jobs have to be executed sequentially, but jobs associated with separate classes can execute in parallel. The main dispatch method is |
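A sketch of what such per-class queues could look like, assuming jobs are keyed by class name and chained on a Future (all names here are invented for the illustration; the real dispatch code in the Analyzer is organized differently):

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical per-class job serializer: jobs submitted for the same class
// run strictly one after another, while jobs for different classes run in
// parallel on the execution context. Error handling is elided: a failed job
// would stall its class's queue in this simplified sketch.
final class PerClassJobQueues(implicit ec: ExecutionContext) {
  private val tails = new ConcurrentHashMap[String, Future[Unit]]()

  def submit(className: String)(job: () => Unit): Future[Unit] =
    tails.compute(className, (_, previous) => {
      val tail = if (previous == null) Future.unit else previous
      tail.map(_ => job()) // chain after the class's current tail
    })
}
```

Jobs that touch two classes at once (e.g. a subclass and its parent) would still need extra care, which is what the follow-up comments below get into.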
I've given parallelizing a closer look and IMO there are the following things that are non-trivial when attempting to simply use
It is worth pointing out that (1) isn't solved by having a per-class queue. However, since public method creation only depends on the ancestry chain, it should be possible to have a global locking order to avoid deadlocks (I need to verify this). We can observe that (2) and (3) can be solved with a per-class queue, but do not need to be; both of them have natural de-synchronization points:
- For (2), once we have updated the log and retrieved the targeted subclasses, we need not run under synchronization anymore (after this line).
- For (3), we simply run
Since we already have early abort branches if something has already been reached (we have to, to avoid infinite loops), we have a very natural point to designate the thread that will actually perform the downstream reachability (see the sketch below). @sjrd does that make sense? Did I miss anything? This might mean we get away without any task queues altogether (and maybe even just by using plain old boring Futures). |
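A minimal sketch of that early-abort pattern with plain Futures, assuming an invented `Node` with an atomic `reached` flag (this is not the actual Analyzer code, just the shape of the idea): the thread that wins the compare-and-set performs the downstream reachability, everyone else bails out immediately, and no explicit task queue is needed.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{ExecutionContext, Future}

// Invented example type: a node is reached at most once; the winner of the
// compare-and-set traverses its callees, all other threads abort early.
final class Node {
  var callees: List[Node] = Nil
  private val reached = new AtomicBoolean(false)

  def markReachable()(implicit ec: ExecutionContext): Future[Unit] = {
    if (!reached.compareAndSet(false, true)) {
      Future.unit // already reached by another thread: early abort
    } else {
      // This thread "won" the node: fan out to the callees in parallel.
      Future.traverse(callees)(_.markReachable()).map(_ => ())
    }
  }

  def isReached: Boolean = reached.get()
}
```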
It seems to make sense, though I'm on mobile this weekend so I don't have a clear view of everything. What I hoped to achieve with per-class queues was to avoid as much synchronization as possible. If work queues are enough, it's possible to do everything using lockless strategies. I wonder to what extent Or perhaps I'm just overly optimistic about what lockless has to offer. |
I've managed to make class loading thread safe and parallel: https://github.com/gzm0/scala-js/tree/class-loader (almost lock free). I've been looking at how call resolution works in the Analyzer, and I'm concerned that we have two points of non-determinism (even now):

Method lookup does not ignore missing methods. In the following setup:

```scala
class A
class B extends A

(???: A).foo() // missing method
(???: B).foo() // missing method
```

Depending on the order in which we resolve the calls in the last two lines, we'll report the missing method on either A or B.

Default bridge creation re-uses bridges from the parent. Take the following:

```scala
trait Foo {
  def x: Int = 1
}

class A extends Foo
class B extends A

(???: A).x
(???: B).x
```

If we resolve the call on A first, the default bridge created on A gets re-used when the call on B is resolved. However, if we resolve the call on B first, B gets its own bridge, so the set of bridges we generate depends on the resolution order.

IMO this second one is more severe, since it actually has the potential to change the emitted code. I suspect a proper solution to #2520 would fix this. |
Discovered as a simplification while working on scala-js#1626. This also exposes a latent bug in the class loading sequence: the Analyzer is built with the assumption that pending tasks never drop to zero until the whole analysis is completed. However, our loading sequence violated this assumption: since the constructor of LoadingClass scheduled info loading immediately, it was possible for info loading to complete before linking is requested on the class. If this condition happens on the initial calling thread (which itself is not tracked as a "task"), the pending task count would drop to zero.
FYI, I managed to get these timings on a local branch with a parallel linker:
I hope I can get it ready for review this weekend. |
I propose we close this in 1.14.0. Although we do not have an incremental linker, in 1.13.1 and 1.14.0 we managed to make significant performance improvements to the linker. Also, after quite some investigation, I'm fairly convinced that building a proper incremental linker is simply infeasible (due to cycle detection). While a leaky incremental linker seems feasible, I'm not convinced that the required effort would be worth the outcome: it seems we get a much bigger bang for the buck by investing in overall performance improvements (the main reason for this sentiment is likely the cost of test coverage). Opinions? |
I agree. We should be in a pretty good place performance-wise, now. |
Is it feasible to add special-case compiler (and linker) support for very quick processing of the kinds of changes mentioned in the initial posts, i.e. changes to literals and to source code formatting?
Such a mechanism would allow for some impressive demos (probably important for making a good impression on newcomers and commentators), but otherwise wouldn't be universally useful. |
This should be possible, yes. Essentially, what we could do (in Scala.js linker-internal parlance) is re-use the previous Analysis if none of the Infos changed. However, I'm not sure we can do this without incurring additional cost on linking, so we need to weigh this trade-off carefully. If you feel we should investigate this, may I suggest you open another issue to discuss it (IMHO the scope is quite different)? |
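Conceptually that stop-gap could look like the following cache, sketched with invented type parameters and an invented `fingerprint` function rather than the real linker types:

```scala
// Hypothetical sketch: keep the last Analysis together with a fingerprint of
// the Infos it was computed from, and re-run the analysis only when the
// fingerprint changes. Computing a cheap, reliable fingerprint is exactly the
// extra per-link cost the trade-off above is about.
final class AnalysisCache[Infos, Analysis](
    analyze: Infos => Analysis,
    fingerprint: Infos => Long
) {
  private var last: Option[(Long, Analysis)] = None

  def get(infos: Infos): Analysis = {
    val fp = fingerprint(infos)
    last match {
      case Some((prevFp, analysis)) if prevFp == fp =>
        analysis // nothing changed in the Infos: reuse the previous Analysis
      case _ =>
        val analysis = analyze(infos)
        last = Some((fp, analysis))
        analysis
    }
  }
}
```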
ok, I've made a separate issue: #4907 |
I'm closing this. Follow-up regarding the processing of non-info changing changes in #4907. |
Assigning this to 1.14.0, where we made the linker parallel. |
A while back on the Gitter chat, we were talking about the linker being slow.