Incremental linker #1626

Closed
japgolly opened this issue Apr 30, 2015 · 32 comments
Labels
enhancement Feature request (that does not concern language semantics, see "language")
Milestone: v1.14.0

Comments

@japgolly
Contributor

A while back on gitter chat, we were talking about the linker being slow.

ramnivas April 1 2015  
@sjrd as an additional data point to @matthughes, here is the output from my fastOptJS run
 https://gist.github.com/ramnivas/723b816bef73a4592a7f
The change I made was a trivial one ("foo" -> "fooo")

sjrd April 1 2015  
Seems you would also benefit from an incremental linker.

ramnivas April 1 2015  
That will be awesome!

japgolly April 1 2015  
Me too. This is my output after just adding a newline to a file:
[debug] Linker: Read info: 21185 us
[debug] Linker: Compute reachability: 381551 us
[debug] Linker: Assemble LinkedClasses: 1576527 us
[debug] Linker: cache stats: reused: 13811 -- invalidated: 6 -- trees read: 6
[debug] Linker: 1983960 us
[debug] Inc. optimizer: Batch mode: false
[debug] Inc. optimizer: Incremental part: 18653 us
[debug] Inc. optimizer: Optimizing 6 methods.
[debug] Inc. optimizer: Optimizer part: 1425 us
[debug] Inc. optimizer: 34441 us
[debug] Refiner: Compute reachability: 289414 us
[debug] Refiner: Assemble LinkedClasses: 18914 us
[debug] Refiner: 308436 us
[debug] Emitter: Class tree cache stats: reused: 3125 -- invalidated: 5
[debug] Emitter: Method tree cache stats: reused: 14569 -- invalidated: 6
[debug] Emitter (write output): 832415 us
@ramnivas Your times are massive! How large is your codebase, mate?

ramnivas April 1 2015  
@japgolly Only about 100 *.scala files (spread over 6 subprojects)
This is for the web part only
@japgolly How large is your codebase, mate?

japgolly April 1 2015  
Interesting. My SJS project is 131 files and 9762 LoC.
(Plus a bunch of lib deps like scalaz, scalajs-react, monocle, etc etc)
Your linker takes 80% more time than mine.
What machine are you running on?

ramnivas April 1 2015  
LoC for my project is at 3820
MacBook Pro, late model, 2.5 GHz i7, 16 GB RAM, SSD

japgolly April 1 2015  
My env = Arch Linux, i7-3770 CPU @ 3.40GHz (8 core), 16gb ram, SSDs
Cool, the machine difference probably explains why yours takes longer

ramnivas April 1 2015  
Right

japgolly April 1 2015  
It will be awesome if SJS gets an incremental linker. Saving 2, 3 seconds makes a big difference when you change something small and just want to refresh the browser.

ramnivas April 1 2015  
I definitely noticed that the machine difference is a significant factor. On another i5 machine, it takes about twice as long
Right. And like many, I tend to make many small changes, so it quickly adds up.
More important than the elapsed time is the disruption of the change-the-code, see-the-effect rhythm
@gzm0 gzm0 added the enhancement Feature request (that does not concern language semantics, see "language") label Apr 30, 2015
@easel

easel commented May 24, 2016

Here's another datapoint for reference:

[debug] Linker: Compute reachability: 724510 us
[debug] Linker: Assemble LinkedClasses: 85611 us
[debug] Basic Linking: 821394 us
[debug] Inc. optimizer: Batch mode: false
[debug] Inc. optimizer: Incremental part: 12683 us
[debug] Inc. optimizer: Optimizing 1 methods.
[debug] Inc. optimizer: Optimizer part: 9294 us
[debug] Inc. optimizer: 35224 us
[debug] Refiner: Compute reachability: 556680 us
[debug] Refiner: Assemble LinkedClasses: 26472 us
[debug] Refiner: 583272 us
[debug] Emitter: Class tree cache stats: reused: 1193 -- invalidated: 1629
[debug] Emitter: Method tree cache stats: reused: 14685 -- invalidated: 1
[debug] Emitter (write output): 318075 us
[debug] Global IR cache stats: reused: 2922 -- invalidated: 10 -- trees read: 9

That's a one-character change to a string constant on a MacBook Pro 2.7 GHz quad-core. Suffice it to say I am fully supportive of incremental reachability analysis!

@gzm0
Contributor

gzm0 commented May 24, 2016

@zalbia

zalbia commented Feb 8, 2017

Any news on this? Running PhantomJS tests on my i5/SSD machine for my small ~1000 LOC codebase currently clocks in at 15 seconds. In my experience fastOptJS just takes too long to run.

@sjrd
Member

sjrd commented Feb 8, 2017

No news, I'm afraid. This is currently not a priority, as it is a non-functional requirement. We are focusing on functional requirements until 1.0.0 is released.

@gzm0
Contributor

gzm0 commented May 14, 2019

@vjovanov this is the one you meant, right?

@vjovanov

Yes, that is the one. It can serve as a good starting point.

@gzm0 gzm0 added this to the Post-v1.0.0 milestone May 22, 2019
@gzm0
Contributor

gzm0 commented May 29, 2019

@jvican this

@gzm0 gzm0 removed this from the Post-v1.0.0 milestone Apr 8, 2020
@gzm0
Contributor

gzm0 commented Apr 11, 2021

I chatted with @vjovanov about this. As a stop-gap solution, we could do the following:

If only method bodies change, do not process removals (but process additions, which already are incremental). This will leave things dangling that are not reachable anymore after the method body changes (and essentially emit more code than necessary). But maybe that is an acceptable trade-off (maybe under a flag).
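
For illustration, a minimal, hedged sketch of this additions-only idea. All names are invented (MethodID and calleesOf stand in for the real linker types and IR access); this is not the actual linker API.

import scala.collection.mutable

object AdditiveReachability {
  type MethodID = String // placeholder for the real identifier type

  // Grow-only update: follow new call edges out of changed, still-reachable
  // methods, but never un-mark anything.
  def update(
      reachable: mutable.Set[MethodID],
      changedMethods: Set[MethodID],
      calleesOf: MethodID => Set[MethodID]): Unit = {
    val queue = mutable.Queue.empty[MethodID]
    queue ++= changedMethods.filter(reachable)
    while (queue.nonEmpty) {
      // add returns true only for newly reached methods, so each
      // method is expanded at most once.
      for (callee <- calleesOf(queue.dequeue()) if reachable.add(callee))
        queue += callee
    }
    // Removals are deliberately not processed: a method that lost its last
    // caller stays in reachable, so the output can only grow between runs.
  }
}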

@sjrd
Member

sjrd commented Apr 11, 2021

It might be acceptable in a number of situations, yes. But since it won't produce the same code after a clean, I think it should be behind some sort of flag.

@ramnivas

During development, dangling extra code is perfectly acceptable. Could the count of removals be tracked? If so, the flag could take the form of a threshold for full linking; once that many removals have occurred, the linker could automatically perform a full link and reset the counter. This would prevent unbounded growth of the output.
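
A tiny sketch of that threshold idea, with invented names; the real flag and counter would of course live in the linker's configuration.

final class RemovalBudget(maxSkippedRemovals: Int) {
  private var skipped = 0

  /** Record skipped removals; returns true when a full link is due. */
  def recordRemovals(count: Int): Boolean = {
    skipped += count
    if (skipped >= maxSkippedRemovals) {
      skipped = 0 // reset after requesting a full link
      true
    } else {
      false
    }
  }
}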

@gzm0
Contributor

gzm0 commented Feb 25, 2023

FYI, I have started working on a skeleton for this. I feel like if we formalize the From information, we can get this to work. But I think I just need to try it out.

@gzm0
Contributor

gzm0 commented May 18, 2023

I've been playing with this a bit, and the solution I have envisioned so far is essentially using the From information formally. So if a method doesn't call another anymore, we'd remove the From, and if all Froms are gone, the method isn't alive anymore.

However, this way of dealing with incremental updates cannot remove cycles properly, in the same way that reference counting cannot. It would not surprise me if we found parallels to our problem in (tracing) garbage collection. Any pointers would be highly appreciated.

Further, it also seems that the add-only approach would have to force a full re-link whenever a method that is reachable (in the incremental state of the analyzer) gets removed. Otherwise, it wouldn't be able to determine whether there is a linking error, or whether the method was only reachable due to leftover included code.
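
To make the reference-counting analogy concrete, here is an illustrative sketch with invented names: each method keeps the set of callers (Froms) that keep it alive and dies when that set empties, which leaks cycles exactly like reference counting does.

import scala.collection.mutable

object FromTracking {
  // Per-method set of callers ("Froms") that keep it alive.
  val froms = mutable.Map.empty[String, mutable.Set[String]]

  def removeCallEdge(caller: String, callee: String,
      calleesOf: String => Set[String]): Unit = {
    for (callers <- froms.get(callee)) {
      callers -= caller
      if (callers.isEmpty) {
        froms -= callee // the callee is dead...
        for (c <- calleesOf(callee)) // ...so retract its outgoing edges too
          removeCallEdge(callee, c, calleesOf)
      }
    }
  }
  // If a calls b and b calls a, and a's last outside caller disappears,
  // a's From set still contains b: neither is ever removed. This is the
  // reference-counting cycle leak described above.
}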

@tarsa

tarsa commented May 22, 2023

@gzm0 Could you defer detecting the cycles to the point where a linking error occurs? If there is a linkage error in method X, then detect whether method X is in an unreachable cycle. If it is, then remove the cycle and resume the incremental linking process (forgetting about the previous spurious error). Detecting whether method X is in an unreachable cycle takes time proportional to the size of that cycle (if method X is in such a cycle at that moment), or possibly time proportional to a full link (if it isn't in a dead cycle, but then the linkage error would be legit, i.e. not related to a leftover unreachable method).

I hope I've understood the topic well enough and my idea makes sense :)
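
A minimal sketch of that deferred check, with invented names (roots and calleesOf stand in for the analyzer's entry points and call edges):

import scala.collection.mutable

object DeadCycleCheck {
  // On a linkage error at x, scan forward from the entry points: if x is
  // not reachable, it was only kept alive by leftover code and can be
  // dropped (together with its cycle) before retrying. As noted above,
  // the worst case is a traversal proportional to a full link.
  def isLeftover(x: String, roots: Set[String],
      calleesOf: String => Set[String]): Boolean = {
    val seen = mutable.Set.empty[String]
    val queue = mutable.Queue.empty[String]
    queue ++= roots
    while (queue.nonEmpty) {
      val m = queue.dequeue()
      if (seen.add(m)) {
        if (m == x) return false // genuinely reachable: a real linking error
        queue ++= calleesOf(m)
      }
    }
    true // unreachable from the roots: only leftovers kept x alive
  }
}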

@sjrd
Member

sjrd commented May 22, 2023

What you say makes sense, yes. However, it would, by design, cause incremental linking to output something different than if you link from scratch. That would be a very unfortunate disadvantage, which I think we should try very hard to avoid.

@tarsa

tarsa commented May 23, 2023

What you say makes sense, yes. However, it would, by design, cause incremental linking to output something different than if you link from scratch. That would be a very unfortunate disadvantage, which I think we should try very hard to avoid.

Yes, but the discussion above suggests that keeping it under a flag is a reasonable solution for the time being. Since incremental linking has already been waiting several years, it's probably very desirable to have something without waiting another decade or so.

Related: https://en.wikipedia.org/wiki/Dynamic_connectivity

It seems we would need the harder variant, i.e. https://en.wikipedia.org/wiki/Dynamic_connectivity#Fully_dynamic_connectivity . I haven't tried to understand it (that would take some time, and I have other priorities now), but after skimming, here are my thoughts:

  • the implementation is probably very tricky, so it would probably be a source of many bugs
  • the solution with https://en.wikipedia.org/wiki/Dynamic_connectivity#The_Level_structure requires preprocessing and generating an extra forest of trees. The memory and time complexity of that preparation is probably not very bad, but it would still require much more time than a full link. Would users want to do that to have faster incremental linking afterwards? Also, the update operation has O(n) worst-case time and only the amortized cost is O(lg n * lg n). I suspect that many early updates (i.e. just after the initial build) will have a very high cost until the cost amortization kicks in, so the incremental linking benefits will be moot.
  • the solution with https://en.wikipedia.org/wiki/Dynamic_connectivity#The_Cutset_structure has better worst-case operation complexity (after creating the data structures), but the data structure has size O(n lg n lg n) (assuming that a number of size lg n is treated as 1 word in memory). That's way too much data to prepare (i.e. an even more time-consuming first incremental linking step) and keep in memory.

It would not surprise me if we found parallels of our problem in (tracing) garbage collection.

Tracing GC has different requirements. The basic principle of a tracing GC is that a precise search for garbage is very costly, so such GCs use many heuristics that remove a lot of (but not all) garbage fast (i.e. they amortize the cost of doing a heap scan by waiting for a lot of garbage to accumulate and then, most of the time, removing only the garbage that is easy to remove). Only if there are repeated memory exhaustions is a precise full GC cycle used. From what I've read about G1GC, there are young GC cycles, mixed (young + part of old) GC cycles, full GC cycles, and last-ditch full GC cycles. The difference between an ordinary full GC and a last-ditch GC is that the latter tries to be precise and find all garbage, so e.g. it forgoes imprecise optimizations (including parallelization) in some subtasks.

There is at least one thing that high-performance tracing GCs use that we could use too, i.e. a parallel marking phase. See e.g. https://www.oracle.com/technical-resources/articles/java/g1gc.html

-XX:ConcGCThreads=n
Sets the number of parallel marking threads. Sets n to approximately 1/4 of the number of parallel garbage collection threads (ParallelGCThreads).

IIRC (but I remember that vaguely now) the G1GC uses work stealing to distribute reachability marking workload among threads. We would of course want to use all threads for that part as it wouldn't be a concurrent part in our case, i.e. we wouldn't need to worry about stealing CPU from application threads.
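
As a hedged sketch of what such a work-stealing marking phase could look like on the JVM (all names invented; calleesOf stands for reading call edges out of the IR, and plain ForkJoin gives work stealing for free):

import java.util.concurrent.{ConcurrentHashMap, ForkJoinPool, RecursiveAction}

object ParallelMark {
  private val marked = ConcurrentHashMap.newKeySet[String]()

  private final class MarkTask(m: String, calleesOf: String => Set[String])
      extends RecursiveAction {
    def compute(): Unit = {
      // add is atomic: exactly one thread wins and expands each method.
      if (marked.add(m)) {
        val subtasks = calleesOf(m).toList.map(new MarkTask(_, calleesOf))
        subtasks.foreach(_.fork()) // idle workers steal these
        subtasks.foreach(_.join())
      }
    }
  }

  def run(roots: Set[String], calleesOf: String => Set[String]): Unit = {
    val tasks = roots.toList.map(new MarkTask(_, calleesOf))
    tasks.foreach(ForkJoinPool.commonPool().execute(_))
    tasks.foreach(_.join())
  }
}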

Do other compilers (for whatever programming languages) have fully incremental linking, i.e. with fully incremental reachability analysis too?

@sjrd
Member

sjrd commented May 23, 2023

Do other compilers (for whatever programming languages) have fully incremental linking, i.e. with fully incremental reachability analysis too?

As far as I know, no. Scala is actually a bit of a pioneer in terms of incremental compilation pipelines. (I guess it had to, because recompiling everything is too slow.)

@tarsa

tarsa commented May 23, 2023

Scala is actually a bit of a pioneer in terms of incremental compilation pipelines. (I guess it had to, because recompiling everything is too slow.)

Well, then perhaps it's worth mentioning the popular motto "done is better than perfect" :) Achieving a perfect solution in this case seems unfeasible, and long linking times are probably discouraging people from building big projects with Scala.js.

What are the problems with leftover unreachable methods after a hypothetical imprecise fastLink?

  • not identical output to a precise fastLink? Then hide it behind a flag (and add good documentation about that on the Scala.js webpage) or make a third linking mode (named e.g. dirtyFastLink), because there are probably a sufficient number of differences between precise and imprecise fastLinks to rename one of them.
  • bigger output size? Then do a precise fastLink if the output size after a series of imprecise fastLink steps grows more than e.g. 20% beyond the output size of the first imprecise fastLink in that series.
  • wrong behaviour (i.e. different from precise linking)? That would probably only affect reflective access (e.g. structural typing) and dynamic typing (e.g. exports to JS), so not the usual Scala code.

@sjrd
Member

sjrd commented May 23, 2023

Well, then perhaps it's worth mentioning the popular motto "done is better than perfect" :) Achieving a perfect solution in this case seems unfeasible, and long linking times are probably discouraging people from building big projects with Scala.js.

Are long linking times still actually an issue in 1.13.1? Although not incremental, base linking performance has significantly improved. It's back to being linear with a pretty low constant factor. It takes less than 500 ms for the two rounds of linking (combined) on the Scala.js test suite, which emits a 27 MB .js file. Are there applications that are so much bigger than that that this time becomes discouraging?

@gzm0
Contributor

gzm0 commented Jun 4, 2023

I agree with @sjrd's sentiment here. At this point, I have the feeling that anything incremental that is correct (i.e. produces code with correct semantics or fails, but maybe generates too much code) would be so complex that we might be better off investing in batch performance improvements.

I could also take a stab at parallelizing the linker: It is already fully asynchronous (due to file loading that also needs to work on JS). However, ATM, we execute all calculations on a single thread.

So making it parallel will mainly mean adding locks and benchmarking to see if the lock contention is faster or slower than queuing tasks ad-hoc.

@sjrd
Member

sjrd commented Jun 4, 2023

I had the idea of parallelizing a few days ago. I think it should not be hard to "lock" on a per-class basis. Each class could have its own queue of jobs that concern it. Those jobs have to be executed sequentially, but jobs associated with separate classes can execute in parallel. The main dispatch method is followReachabilityInfo, and it already loops on a per-class basis, sending one job per class. AFAICT, it shouldn't be too difficult to send those jobs to the class-specific queues instead.
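
A rough sketch of those per-class queues, assuming invented names: each class serializes its own jobs by chaining them onto a Future, while jobs for different classes run in parallel on the shared pool.

import java.util.concurrent.ConcurrentHashMap
import scala.concurrent.{ExecutionContext, Future}

object ClassQueues {
  final class ClassQueue(implicit ec: ExecutionContext) {
    private var tail: Future[Unit] = Future.unit
    def submit(job: () => Unit): Unit = synchronized {
      // Runs after all previously submitted jobs of this class
      // (assumes jobs don't throw; a real version would recover failures).
      tail = tail.map(_ => job())
    }
  }

  private val queues = new ConcurrentHashMap[String, ClassQueue]()

  def enqueue(className: String, job: () => Unit)(
      implicit ec: ExecutionContext): Unit =
    queues.computeIfAbsent(className, _ => new ClassQueue).submit(job)
}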

@gzm0
Contributor

gzm0 commented Jul 15, 2023

I've given parallelizing a closer look, and IMO the following things are non-trivial when attempting to simply use synchronized:

  1. Method synthesis. Classes modify their own publicMethodInfos for:
    • Missing methods
    • Reflective proxies
    • Default methods
  2. Dynamic call dispatch resolution (methodsCalledLog vs instantiatedSubclasses)
  3. Avoiding deadlocks in cyclic call graphs

It is worth pointing out that (1) isn't solved by having a per class queue. However, since public method creation only depends on the ancestry chain, it should be possible to have a global order for locking to avoid deadlocks (I need to verify this).

We can observe that (2) and (3) can be solved with a per-class queue, but they do not need to be: both of them have natural de-synchronization points:

For (2), once we have updated the log and retrieved the targeted subclasses, we need not run under synchronization anymore (after this line)

For (3), we simply run doReach without synchronization (this line).

Since we already have early abort branches if something has already been reached (we have to, to avoid infinite loops), we have a very natural point to designate the thread that will actually perform the downstream reachability.

@sjrd does that make sense? Did I miss anything?

This might mean we get away without any task queues altogether (and maybe even just by using plain old boring Futures).
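
For point (3), the "designated thread" idea can be sketched with a plain atomic (names invented): the compare-and-set that flips the flag doubles as the existing early-abort branch and elects the single thread that performs the downstream work.

import java.util.concurrent.atomic.AtomicBoolean

final class ReachedFlag {
  private val reached = new AtomicBoolean(false)

  def markReached(doReach: () => Unit): Unit = {
    if (reached.compareAndSet(false, true))
      doReach() // exactly one winner runs downstream reachability
    // all other threads fall through: the early-abort branch
  }
}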

@sjrd
Member

sjrd commented Jul 16, 2023

It seems to make sense, though I'm on mobile this weekend so I don't have a clear view of everything.

What I hoped to achieve with per-class queues was to avoid as much synchronization as possible. If work queues are enough, it's possible to do everything using lockless strategies. I wonder to what extent TrieMaps would solve the issues of (1). We might compute the same target twice during contention, but maybe it wouldn't happen often enough to outweigh the benefits of lockless.

Or perhaps I'm just overly optimistic about what lockless has to offer.

@gzm0
Contributor

gzm0 commented Aug 4, 2023

I've managed to make class loading thread-safe and parallel: https://github.com/gzm0/scala-js/tree/class-loader (almost lock-free).

I've been looking at how call resolution works in Analyzer, and I'm concerned that we have two points of non-determinism (even now):

Method lookup does not ignore missing methods.

In the following setup:

class A
class B extends A

(???: A).foo() // missing method
(???: B).foo() // missing method

Depending on which order we resolve the calls in the last two lines, we'll report the method B#foo as missing on A or missing on B: if the missing method is already present on A, we'll pick that. Otherwise we'll create a new missing method.

Default bridge creation re-uses bridges from the parent

Take the following:

trait Foo {
  def x: Int = 1
}

class A extends Foo
class B extends A

(???: A).x
(???: B).x

If we resolve B#x after A#x, A will already contain the default bridge to Foo#x; this code will find it and re-use it. In the emitted code, B will not have the default bridge.

However, if we resolve A#x after B#x, B will already have a default bridge, which A doesn't see / cannot use. So in the emitted code B will have the default bridge.

IMO this second one is more severe, since it actually has the potential to change the emitted code. I suspect a proper solution to #2520 would fix this.

gzm0 added a commit to gzm0/scala-js that referenced this issue Aug 5, 2023
Discovered as a simplification while working on scala-js#1626.
gzm0 added a commit to gzm0/scala-js that referenced this issue Aug 5, 2023
Discovered as a simplification while working on scala-js#1626.

This also exposes a latent bug in the class loading sequence: the
Analyzer is built with the assumption that pending tasks never drop to
zero until the whole analysis is completed.

However, our loading sequence violated this assumption: since the
constructor of LoadingClass scheduled info loading immediately, it was
possible for info loading to complete before linking is requested on
the class. If this condition happens on the initial calling
thread (which itself is not tracked as a "task"), the pending task
count would drop to zero.
gzm0 added a commit to gzm0/scala-js that referenced this issue Aug 5, 2023
Discovered as a simplification while working on scala-js#1626.
gzm0 added a commit to gzm0/scala-js that referenced this issue Aug 5, 2023
gzm0 added a commit to gzm0/scala-js that referenced this issue Aug 5, 2023
Discovered as a simplification while working on scala-js#1626.
gzm0 added a commit to gzm0/scala-js that referenced this issue Aug 6, 2023
@gzm0
Contributor

gzm0 commented Aug 19, 2023

FYI, I managed to get these timings on a local branch with a parallel linker:

  op                             variant   median(t_ns)
  Linker: Compute reachability   main         323617749
  Linker: Compute reachability   parallel     163542533
  Refiner: Compute reachability  main         255269676
  Refiner: Compute reachability  parallel     170701844

(benchmark plot omitted)

I hope I can get it ready for review this weekend.

@gzm0
Contributor

gzm0 commented Sep 23, 2023

I propose we close this in 1.14.0. Although we do not have an incremental linker, in 1.13.1 and 1.14.0 we managed to make significant performance improvements to the linker.

Also, after quite some investigation, I'm fairly convinced that building a proper incremental linker is simply infeasible (due to cycle detection).

While a leaky incremental linker seems feasible, I'm not convinced that the required effort would be worth the outcome: it seems we get a much bigger bang for the buck by investing in overall performance improvements (the main reason for this sentiment is likely the cost of test coverage).

Opinions?

@sjrd
Member

sjrd commented Sep 23, 2023

I agree. We should be in a pretty good place performance-wise, now.

@tarsa

tarsa commented Sep 24, 2023

Is it feasible to add special-case compiler (and linker) support for very quick processing of the changes mentioned in the initial posts, i.e. changes to literals and source code formatting?

#1626 (comment)

ramnivas April 1 2015
The change I made was a trivial one ("foo" -> "fooo")
japgolly April 1 2015
Me too This is my output from after just adding a newline to a file:

#1626 (comment)

That's a one-character change to a string constant

Such a mechanism would allow for some impressive demos (probably important to get a good opinion from newcomers and commentators), but otherwise wouldn't be universally useful.

@gzm0
Contributor

gzm0 commented Sep 24, 2023

Is it feasible to add special-case compiler (and linker) support for very quick processing of the changes mentioned in the initial posts, i.e. changes to literals and source code formatting?

This should be possible, yes. Essentially, what we could do (in Scala.js linker-internal parlance) is re-use the previous Analysis if none of the Infos changed.
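
A sketch of that short-circuit with hypothetical types and names (hashCode stands in for a proper content fingerprint of the Infos):

final class AnalysisCache[Infos, Analysis](analyze: Infos => Analysis) {
  private var last: Option[(Int, Analysis)] = None

  def get(infos: Infos): Analysis = {
    val fp = infos.hashCode() // stand-in for a real content fingerprint
    last match {
      case Some((`fp`, analysis)) =>
        analysis // Infos unchanged (e.g. literal-only edit): reuse
      case _ =>
        val analysis = analyze(infos)
        last = Some((fp, analysis))
        analysis
    }
  }
}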

However, I'm not sure we can do this without incurring additional cost on linking, so we need to weigh this trade-off carefully.

If you feel we should investigate this, may I suggest you open another issue to discuss this (IMHO the scope is quite different)?

@tarsa

tarsa commented Sep 24, 2023

If you feel we should investigate this, may I suggest you open another issue to discuss this (IMHO the scope is quite different)?

ok, I've made a separate issue: #4907

@gzm0
Contributor

gzm0 commented Jan 7, 2024

I'm closing this. Follow-up regarding the processing of non-info changing changes in #4907.

@gzm0 gzm0 closed this as completed Jan 7, 2024
@gzm0 gzm0 added this to the v1.14.0 milestone Jan 7, 2024
@gzm0
Contributor

gzm0 commented Jan 7, 2024

Assigning this to 1.14.0, where we made the linker parallel.


8 participants