Incremental linker #1626
Comments
Here's another datapoint for reference:
That's a one-character change to a string constant on a 2.7 GHz quad-core MacBook Pro. Suffice it to say I am fully supportive of incremental reachability analysis! |
Any news on this? Running PhantomJS tests for my small ~1000 LOC codebase on an i5 machine with an SSD currently clocks in at 15 seconds. In my experience, fastOptJS just takes too long to run. |
No news, I'm afraid. This is currently not a priority, as it is a non-functional requirement. We are focusing on functional requirements until 1.0.0 is released. |
@vjovanov this is the one you meant, right? |
Yes, that is the one. It can serve as a good starting point. |
@jvican this |
I chatted with @vjovanov about this. As a stop-gap solution, we could do the following: If only method bodies change, do not process removals (but process additions, which already are incremental). This will leave things dangling that are not reachable anymore after the method body changes (and essentially emit more code than necessary). But maybe that is an acceptable trade-off (maybe under a flag). |
It might be acceptable in a number of situations, yes. But since it won't produce the same code after a |
During development, dangling extra code is perfectly acceptable. Could the count of removals be tracked? If so, the flag could take the form of a threshold for full linking; once that many removals have accumulated, the linker could automatically perform a full link and reset the counter. This would prevent unbounded growth of the output. |
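A minimal sketch of the threshold idea above, assuming a hypothetical `RemovalBudget` helper that the linker would consult after each incremental run (the name and API are invented for illustration, not part of the actual linker):

```scala
// Hypothetical sketch: count the removals that were skipped during
// incremental runs and request a full link once a configurable budget
// is exceeded, so the emitted output cannot grow without bound.
final class RemovalBudget(maxSkippedRemovals: Int) {
  private var skipped = 0

  /** Records removals skipped in this run; returns true if a full link should be performed. */
  def shouldFullLink(removalsSkippedThisRun: Int): Boolean = {
    skipped += removalsSkippedThisRun
    if (skipped >= maxSkippedRemovals) {
      skipped = 0 // the full link cleans everything up, so reset the counter
      true
    } else {
      false
    }
  }
}
```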
FYI, I have started working on a skeleton for this. I feel like if we formalize |
I've been playing with this a bit, and the solution I have envisioned so far is essentially using the However, this way of dealing with incremental updates cannot remove cycles properly, much like reference counting cannot (e.g. if two methods only call each other, removing the last outside call to them leaves each with a nonzero caller count, so the dead cycle is never collected). It would not surprise me if we found parallels to our problem in (tracing) garbage collection; any pointers would be highly appreciated. Further, it also seems that the add-only approach would have to force a full re-link whenever a method that is reachable (in the incremental state of the analyzer) gets removed. Otherwise, it wouldn't be able to determine whether there is a linking error, or whether the method was only reachable due to leftover included code. |
@gzm0 Could you defer detecting the cycles to the point where a linking error occurs? If there is a linkage error in method X, then detect whether method X is in an unreachable cycle. If it is, remove the cycle and resume the incremental linking process (forgetting about the previous spurious error). Detecting whether method X is in an unreachable cycle takes time proportional to the size of that cycle (if method X is in such a cycle at the moment) or possibly proportional to a full link (if it isn't in a dead cycle, but then the linkage error would be legitimate, i.e. not related to a leftover unreachable method). I hope I've understood the topic well enough and my idea makes sense :) |
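A rough sketch of that deferred check, assuming a simplified caller graph (the `Method` type and its `callers` field are invented for the example; the real Analyzer data structures differ): when a linking error is reported for X, first verify that X is actually reachable from the live roots before treating the error as legitimate.

```scala
import scala.collection.mutable

// Invented example type: a method with back-edges to its callers.
final case class Method(name: String) {
  var callers: Set[Method] = Set.empty
}

object DeferredCycleCheck {
  /** Walks caller edges backwards from `x`; returns true if a live root is found.
   *  If false, `x` only survives as part of leftover (possibly cyclic) dead code,
   *  so a reported linking error on it would be spurious.
   */
  def reachableFromRoots(x: Method, roots: Set[Method]): Boolean = {
    val seen = mutable.Set.empty[Method]
    def walk(m: Method): Boolean =
      roots.contains(m) || (seen.add(m) && m.callers.exists(walk))
    walk(x)
  }
}
```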
What you say makes sense, yes. However, it would, by design, cause incremental linking to output something different than if you link from scratch. That would be a very unfortunate disadvantage, which I think we should try very hard to avoid. |
Yes, but the discussion above suggests that keeping it under a flag is a reasonable solution for the time being. Since incremental linking has been waiting for several years already, it's probably very desirable to have something without waiting another decade or so. It seems we would need the harder variant, i.e. https://en.wikipedia.org/wiki/Dynamic_connectivity#Fully_dynamic_connectivity . I haven't tried to understand it (that would take some time, and I have other priorities now), but after skimming, here are my thoughts:
Tracing GC has different requirements. The basic principle of a tracing GC is that a precise search for garbage is very costly, so such GCs use many heuristics that remove a lot of (but not all) garbage fast (i.e. they amortize the cost of a heap scan by waiting for a lot of garbage to accumulate and then, most of the time, removing only the garbage that is easy to remove). Only if there are repeated memory exhaustions is a precise full GC cycle used. From what I've read about G1GC, there are young GC cycles, mixed (young + part of old) GC cycles, full GC cycles and last-ditch full GC cycles. The difference between an ordinary full GC and a last-ditch GC is that the latter tries to be precise and find all garbage, so e.g. it forgoes imprecise optimizations (including parallelization) in some subtasks. There is at least one thing that high-performance tracing GCs use that we could use too, i.e. a parallel marking phase. See e.g. https://www.oracle.com/technical-resources/articles/java/g1gc.html
IIRC (but I remember that vaguely now) the G1GC uses work stealing to distribute reachability marking workload among threads. We would of course want to use all threads for that part as it wouldn't be a concurrent part in our case, i.e. we wouldn't need to worry about stealing CPU from application threads. Do other compilers (for whatever programming languages) have fully incremental linking, i.e. with fully incremental reachability analysis too? |
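For illustration, here is a minimal sketch of a parallel marking phase on a JVM `ForkJoinPool`, whose scheduler already does work stealing between worker threads; the `Node` type is invented and only stands in for whatever the Analyzer tracks per method or class.

```scala
import java.util.concurrent.{ForkJoinPool, RecursiveAction}
import java.util.concurrent.atomic.AtomicBoolean

// Invented example type: a reachability node with its outgoing call edges.
final class Node {
  var callees: Seq[Node] = Nil
  val marked = new AtomicBoolean(false)
}

// Marks one node and forks subtasks for its callees; idle workers steal them.
final class MarkTask(node: Node) extends RecursiveAction {
  override protected def compute(): Unit = {
    if (node.marked.compareAndSet(false, true)) {
      val subtasks = node.callees.map(n => new MarkTask(n))
      subtasks.foreach(_.fork())
      subtasks.foreach(_.join())
    }
  }
}

object ParallelMark {
  def apply(roots: Seq[Node]): Unit = {
    val pool = new ForkJoinPool() // one worker per available core by default
    val tasks = roots.map(r => pool.submit(new MarkTask(r)))
    tasks.foreach(_.join())
    pool.shutdown()
  }
}
```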
As far as I know, no. Scala is actually a bit of a pioneer in terms of incremental compilation pipelines. (I guess it had to, because recompiling everything is too slow.) |
Well, then perhaps it's worth mentioning the popular motto "done is better than perfect" :) Achieving a perfect solution in this case seems infeasible, and long linking times are probably discouraging people from building big projects with Scala.js. What are the problems with leftover unreachable methods after a hypothetical imprecise fastLink?
|
Are long linking times still actually an issue in 1.13.1? Although not incremental, base linking performance has significantly improved. It's back to being linear with a pretty low constant factor. It takes less than 500 ms for the two rounds of linking (combined) on the Scala.js test suite, which emits a 27 MB .js file. Are there applications that are so much bigger than that that this time becomes discouraging? |
I agree with @sjrd's sentiment here. At this point, I have the feeling that anything incremental that is correct (i.e. produces code with correct semantics or fails, but maybe generates too much code) would be so complex that we might be better off investing in batch performance improvements. I could also take a stab at parallelizing the linker: it is already fully asynchronous (due to file loading, which also needs to work on JS). However, at the moment, we execute all calculations on a single thread. So making it parallel will mainly mean adding locks and benchmarking to see whether the lock contention ends up faster or slower than queuing tasks ad hoc. |
I had the idea of parallelizing a few days ago. I think it should not be hard to "lock" on a per-class basis. Each class could have its own queue of jobs that concern it. Those jobs have to be executed sequentially, but jobs associated with separate classes can execute in parallel. The main dispatch method is |
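A sketch of what such per-class queues could look like, assuming jobs are keyed by class name and chained on a Future (all names here are invented for the illustration; the real dispatch code in the Analyzer is organized differently):

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical per-class job serializer: jobs submitted for the same class
// run strictly one after another, while jobs for different classes run in
// parallel on the execution context. Error handling is elided: a failed job
// would stall its class's queue in this simplified sketch.
final class PerClassJobQueues(implicit ec: ExecutionContext) {
  private val tails = new ConcurrentHashMap[String, Future[Unit]]()

  def submit(className: String)(job: () => Unit): Future[Unit] =
    tails.compute(className, (_, previous) => {
      val tail = if (previous == null) Future.unit else previous
      tail.map(_ => job()) // chain after the class's current tail
    })
}
```

Jobs that touch two classes at once (e.g. a subclass and its parent) would still need extra care, which is what the follow-up comments below get into.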
I've given parallelizing a closer look and IMO there are the following things that are non-trivial when attempting to simply use
It is worth pointing out that (1) isn't solved by having a per-class queue. However, since public method creation only depends on the ancestry chain, it should be possible to have a global locking order to avoid deadlocks (I need to verify this). We can observe that (2) and (3) can be solved with a per-class queue, but do not need to be; both of them have natural de-synchronization points:
- For (2), once we have updated the log and retrieved the targeted subclasses, we need not run under synchronization anymore (after this line).
- For (3), we simply run
Since we already have early abort branches if something has already been reached (we have to, to avoid infinite loops), we have a very natural point to designate the thread that will actually perform the downstream reachability (see the sketch below). @sjrd does that make sense? Did I miss anything? This might mean we get away without any task queues altogether (and maybe even just by using plain old boring Futures). |
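A minimal sketch of that early-abort pattern with plain Futures, assuming an invented `Node` with an atomic `reached` flag (this is not the actual Analyzer code, just the shape of the idea): the thread that wins the compare-and-set performs the downstream reachability, everyone else bails out immediately, and no explicit task queue is needed.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{ExecutionContext, Future}

// Invented example type: a node is reached at most once; the winner of the
// compare-and-set traverses its callees, all other threads abort early.
final class Node {
  var callees: List[Node] = Nil
  private val reached = new AtomicBoolean(false)

  def markReachable()(implicit ec: ExecutionContext): Future[Unit] = {
    if (!reached.compareAndSet(false, true)) {
      Future.unit // already reached by another thread: early abort
    } else {
      // This thread "won" the node: fan out to the callees in parallel.
      Future.traverse(callees)(_.markReachable()).map(_ => ())
    }
  }

  def isReached: Boolean = reached.get()
}
```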
It seems to make sense, though I'm on mobile this weekend so I don't have a clear view of everything. What I hoped to achieve with per-class queues was to avoid as much synchronization as possible. If work queues are enough, it's possible to do everything using lockless strategies. I wonder to what extent Or perhaps I'm just overly optimistic about what lockless has to offer. |
I've managed to make class loading thread safe and parallel: https://github.com/gzm0/scala-js/tree/class-loader (almost lock free). I've been looking at how call resolution works in the Analyzer, and I'm concerned that we have two points of non-determinism (even now):

Method lookup does not ignore missing methods. In the following setup:

```scala
class A
class B extends A

(???: A).foo() // missing method
(???: B).foo() // missing method
```

Depending on the order in which we resolve the calls in the last two lines, we'll report the missing method on either A or B.

Default bridge creation re-uses bridges from the parent. Take the following:

```scala
trait Foo {
  def x: Int = 1
}

class A extends Foo
class B extends A

(???: A).x
(???: B).x
```

If we resolve the call on A first, the default bridge created on A gets re-used when the call on B is resolved. However, if we resolve the call on B first, B gets its own bridge, so the set of bridges we generate depends on the resolution order.

IMO this second one is more severe, since it actually has the potential to change the emitted code. I suspect a proper solution to #2520 would fix this. |
Discovered as a simplification while working on scala-js#1626. This also exposes a latent bug in the class loading sequence: the Analyzer is built with the assumption that pending tasks never drop to zero until the whole analysis is completed. However, our loading sequence violated this assumption: since the constructor of LoadingClass scheduled info loading immediately, it was possible for info loading to complete before linking is requested on the class. If this condition happens on the initial calling thread (which itself is not tracked as a "task"), the pending task count would drop to zero.
FYI, I managed to get these timings on a local branch with a parallel linker:
I hope I can get it ready for review this weekend. |
I propose we close this in 1.14.0. Although we do not have an incremental linker, in 1.13.1 and 1.14.0 we managed to make significant performance improvements to the linker. Also, after quite some investigation, I'm fairly convinced that building a proper incremental linker is simply infeasible (due to cycle detection). While a leaky incremental linker seems feasible, I'm not convinced that the required effort would be worth the outcome: it seems we get a much bigger bang for the buck by investing in overall performance improvements (the main reason for this sentiment is likely the cost of test coverage). Opinions? |
I agree. We should be in a pretty good place performance-wise, now. |
Is it feasible to add special-case compiler (and linker) support for very quick processing of the kinds of changes mentioned in the initial posts, i.e. changes to literals and to source code formatting?
Such a mechanism would allow for some impressive demos (probably important for making a good impression on newcomers and commentators), but otherwise wouldn't be universally useful. |
This should be possible, yes. Essentially, what we could do (in Scala.js linker-internal parlance) is re-use the previous Analysis if none of the Infos changed. However, I'm not sure we can do this without incurring additional cost on linking, so we need to weigh this trade-off carefully. If you feel we should investigate this, may I suggest you open another issue to discuss it (IMHO the scope is quite different)? |
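Conceptually that stop-gap could look like the following cache, sketched with invented type parameters and an invented `fingerprint` function rather than the real linker types:

```scala
// Hypothetical sketch: keep the last Analysis together with a fingerprint of
// the Infos it was computed from, and re-run the analysis only when the
// fingerprint changes. Computing a cheap, reliable fingerprint is exactly the
// extra per-link cost the trade-off above is about.
final class AnalysisCache[Infos, Analysis](
    analyze: Infos => Analysis,
    fingerprint: Infos => Long
) {
  private var last: Option[(Long, Analysis)] = None

  def get(infos: Infos): Analysis = {
    val fp = fingerprint(infos)
    last match {
      case Some((prevFp, analysis)) if prevFp == fp =>
        analysis // nothing changed in the Infos: reuse the previous Analysis
      case _ =>
        val analysis = analyze(infos)
        last = Some((fp, analysis))
        analysis
    }
  }
}
```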
ok, I've made a separate issue: #4907 |
I'm closing this. Follow-up regarding the processing of non-info changing changes in #4907. |
Assigning this to 1.14.0, where we made the linker parallel. |
A while back on the Gitter chat, we were talking about the linker being slow.