-
Notifications
You must be signed in to change notification settings - Fork 395
Introduce NewLambda to synthesize instances of SAM types. #5003
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@gzm0 This is based on my experiments with trying to optimize the Wasm output. For everything else, I managed to make the Wasm output on average faster than Scala.js without touching the IR, or only in a way that also benefits JS (see #4998). For Scala functions, however, they remained desperately slow no matter what I tried. With these changes to the IR, overall Wasm reaches 0.85% the geomean run time wrt. JS -- so, 15% faster on average. When you get the chance, I'd like your broad opinion on this. Is this something we could commit to at this stage? Or should we only venture this far if and when Wasm gets more mature/battle-tested? There's obviously a catch-22 here: given the performance results, Scala.js-on-Wasm might not be production-viable without this change; but without production experience we might not implement it. That doesn't mean we couldn't wait at least a full release cycle and reports of Wasm being correct, even if it is slow. |
fd2b383
to
a98c757
Compare
I have been thinking about this, and I think what is happening is actually the other way around: the Scala.js compiler is over optimizing for the JS backend. Closures have two separate uses ATM IIUC:
The compiler is making the decision to perform the second optimization. However, it seems to only be an optimization that works well for the JS backend. This makes sense, since it has native support for lambdas, whereas WASM doesn't. How would the world look, if we instead moved the Scala lambda optimization from the compiler to the JS backend? IIUC to decide whether we can emit a Scala X class as a closure:
It seems to me that this is quite similar to the inlineable init optimization. And it also seems it has the potential to allow for more optimization (n.b. post optimizer closure optimization or closure optimization for classes where only a single method is used). WDYT? |
Ah and I forgot: IIUC if we do this, then the WASM backend shouldn't need any adjustment at all, because "typed closures" are just classes at this point (and IIUC the runtime layout would be quite similar, except for the vtable/typedata pointer overhead). |
Additional 2 things I forgot:
|
Identifying all these conditions would be really tricky. Just However, we could go the other way: not creating a class at all, and instead have a One benefit is that we can then apply the best optimization also for SAM interfaces other than Scala functions (e.g., In terms of code size, even for Wasm the single class carrying a Typed Closure is very beneficial. So expanding full classes is not necessarily the best strategy. As far as I can tell, the best strategy would be a closure-carrying single class, with optimized dispatch that eagerly type-tests for that single class. So when calling a That is something we could do with an LMF-style opcode, which we would have a hard time reverse engineering from already expanded classes. |
95a3e59
to
d7d256c
Compare
1785a96
to
f3e5d03
Compare
2a7e46b
to
58fe5f1
Compare
@gzm0 I added a commit that introduces As the commit message says, the implementation on the linker side is butchered at the moment. The idea is to support discussion of the IR changes at the moment. WDYT? If we add generalization to SAM types, this would also improve JS: all the lambdas for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the NewLambda
tree looks very promising, both in terms of simplifying the compiler and in giving more semantic information to the linker.
What I'm wondering is whether it is worth keeping the TypedClosure
abstraction in the core IR: if we could only have NewLambda
the complexity increase in the IR would be minimal. (if it helps the optimizer, we could still consider introducing typed closures as transients).
I don't think we can remove |
2e2513f
to
b23b5e5
Compare
Ah, sorry, I wasn't clear. I understand that we need a It might mean that we have to fall back to a full class when we need bridges for SAMs, but maybe that's OK? |
Ah, I see. So something like case class NewLambda(
descriptor: Descriptor,
captureParams: List[ParamDef],
params: List[ParamDef],
body: Tree,
captureValues: List[Tree]
) (note that That can work from an IR-only point of view, indeed. The main issue is that it's not clear how the Currently it generates a field of a My hunch is that answering these questions will yield a design that is even more complicated than having |
Ah, I haven't looked that far. Are we sure we want to desugar the lambdas? It feels to me:
|
I'm pretty sure we want to desugar the lambdas. If we don't, then every analysis and optimization that relates to the class hierarchy analysis gets more complicated. Figuring out the analyzer's job, in particular in the presence of reflective proxy and/or default methods, gets really tricky. In the optimizer, a method call now needs to handle actual target methods as well as hidden methods of lambdas. Currently there are very few changes in the optimizer, and they're all somewhat duplicating for |
I see, yeah that makes sense. So maybe what could be an in-between option is to disallow (free-standing) TypedClosures trees and types in serialized IR? This is somewhat similar to records (whether to have a TypedClosure inside a NewLambda or flatten it is TBD). This way, we can use TypedClosures in the linker pipeline but do not unnecessarily increase the IR interface. |
That seems doable. It will require a bit more work in the javalib IR cleaner (it currently lowers Scala functions to the underlying typed closures, but it could be custom functional interfaces instead), but nothing too dramatic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review. Sharing because I think it contains stuff worth iterating on already.
I have fully reviewed up until the "TODO: Continue here" comment in Analyzer.scala.
linker/shared/src/main/scala/org/scalajs/linker/analyzer/Analyzer.scala
Outdated
Show resolved
Hide resolved
linker/shared/src/main/scala/org/scalajs/linker/analyzer/Analyzer.scala
Outdated
Show resolved
Hide resolved
linker/shared/src/main/scala/org/scalajs/linker/analyzer/Analyzer.scala
Outdated
Show resolved
Hide resolved
linker/shared/src/main/scala/org/scalajs/linker/analyzer/Infos.scala
Outdated
Show resolved
Hide resolved
linker/shared/src/main/scala/org/scalajs/linker/analyzer/Infos.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed comments in ir/
and compiler/
so far.
I just realized that I do not understand at all, why we still need custom box processing in GenJSCode with this change. My understanding of the situation before this PR is that we use untyped closures to represent lambdas. As a result, we need to adjust boxes because the closure boundary requires boxed types (anys). However, with this change, we lift this limitation, so we should (?) be able to take the types scalac generates directly. As such, I would expect the necessary boxes to be already present. Just TBC: I do understand that boxing/unboxing can/will be required between the specific interface a method implements and its implementation, and potentially even between the I'll try to keep digging some more about this, but any help is appreciated. |
Yes, the JVM backend has to do a lot with that as well. The backend itself doesn't do much. The work is split between another phase ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some replies + review of WASM (only nits there).
linker/shared/src/main/scala/org/scalajs/linker/analyzer/Infos.scala
Outdated
Show resolved
Hide resolved
linker/shared/src/main/scala/org/scalajs/linker/backend/wasmemitter/FunctionEmitter.scala
Outdated
Show resolved
Hide resolved
linker/shared/src/main/scala/org/scalajs/linker/backend/wasmemitter/FunctionEmitter.scala
Show resolved
Hide resolved
linker/shared/src/main/scala/org/scalajs/linker/backend/wasmemitter/TypeTransformer.scala
Outdated
Show resolved
Hide resolved
linker/shared/src/main/scala/org/scalajs/linker/backend/wasmemitter/WasmContext.scala
Show resolved
Hide resolved
linker/shared/src/main/scala/org/scalajs/linker/backend/wasmemitter/WasmContext.scala
Show resolved
Hide resolved
linker/shared/src/main/scala/org/scalajs/linker/backend/wasmemitter/WasmContext.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believed I have addressed all the comments so far.
The main question that is probably still under debate is the design of SyntheticClassKind
.
linker/shared/src/main/scala/org/scalajs/linker/analyzer/Analyzer.scala
Outdated
Show resolved
Hide resolved
linker/shared/src/main/scala/org/scalajs/linker/analyzer/Infos.scala
Outdated
Show resolved
Hide resolved
linker/shared/src/main/scala/org/scalajs/linker/analyzer/Infos.scala
Outdated
Show resolved
Hide resolved
linker/shared/src/main/scala/org/scalajs/linker/backend/wasmemitter/FunctionEmitter.scala
Show resolved
Hide resolved
linker/shared/src/main/scala/org/scalajs/linker/backend/wasmemitter/FunctionEmitter.scala
Show resolved
Hide resolved
linker/shared/src/main/scala/org/scalajs/linker/backend/wasmemitter/FunctionEmitter.scala
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some rebuttals and minor comments I found.
It's a bit tricky to review ATM because of the rebase on the name refactoring (using commits for both review history and intended target history at the same time is not working too well with Github's comment tracking).
But I think we can nevertheless make progress on the overall design here.
linker/shared/src/main/scala/org/scalajs/linker/checker/FeatureSet.scala
Show resolved
Hide resolved
linker/shared/src/main/scala/org/scalajs/linker/analyzer/Infos.scala
Outdated
Show resolved
Hide resolved
linker/shared/src/main/scala/org/scalajs/linker/analyzer/Infos.scala
Outdated
Show resolved
Hide resolved
linker/shared/src/main/scala/org/scalajs/linker/backend/wasmemitter/FunctionEmitter.scala
Show resolved
Hide resolved
linker/shared/src/main/scala/org/scalajs/linker/backend/wasmemitter/FunctionEmitter.scala
Show resolved
Hide resolved
linker/shared/src/main/scala/org/scalajs/linker/analyzer/Analyzer.scala
Outdated
Show resolved
Hide resolved
Sorry. I expected #5136 to be mostly a no-brainer that would be quickly reviewed and merged 😅 |
linker/shared/src/main/scala/org/scalajs/linker/backend/wasmemitter/FunctionEmitter.scala
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review of LambdaSynthesizer: Either there is a big problem with transient type refs, or there is a big chunk I do not understand :) either way, probably worth iterating (and I'm taking a break anyways :P).
for (intf <- descriptor.interfaces) | ||
digestBuilder.updateUTF8String(intf.encoded) | ||
|
||
// FIXME This is not efficient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a reminder. OK to not address in this PR IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
linker/shared/src/main/scala/org/scalajs/linker/frontend/LambdaSynthesizer.scala
Outdated
Show resolved
Hide resolved
val suffixBuilder = new java.lang.StringBuilder(".$$Lambda$") | ||
for (b <- digest) { | ||
val i = b & 0xff | ||
suffixBuilder.append(Character.forDigit(i >> 4, 16)).append(Character.forDigit(i & 0x0f, 16)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: I have considered whether we should factor out / share this with InternalModuleIDGenerator
and I agree with the (implicit or explicit) decision not to.
linker/shared/src/main/scala/org/scalajs/linker/frontend/LambdaSynthesizer.scala
Show resolved
Hide resolved
def makeClassInfo(descriptor: NewLambda.Descriptor, className: ClassName): ClassInfo = { | ||
val methodInfos = Array.fill(MemberNamespace.Count)(Map.empty[MethodName, MethodInfo]) | ||
|
||
val fFieldName = FieldName(className, SimpleFieldName("f")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: factor out the name f
to deduplicate with makeClassDef
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some typos / optional things. LGTM to merge after squashing.
* but we might as well cache them together. | ||
*/ | ||
private val syntheticLambdaNamesCache = | ||
mutable.Map.empty[NewLambda.Descriptor, (ClassName, MethodName)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I agree with the design decision that the memory leak here is acceptable.
* | ||
* Although `NewLambda` nodes themselves are desugared in the `Desugarer`, | ||
* the corresponding synthetic *classes* already have an existence after the | ||
* `BasedLinker`. They must, since they must participate in the CHA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* `BasedLinker`. They must, since they must participate in the CHA | |
* `BaseLinker`. They must, since they must participate in the CRA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CHA = Class Hierarchy Analysis.
What is CRA?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems we have no tests at all for the new nodes.
Maybe that's OK in the first iteration: We've been working on this PR for quite a while. IDK. I'll leave it up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment here than on ClassDefChecker test.
Seems we have no new tests at all, but maybe that's OK for now, so we can get this PR out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The `NewLambda` node creates an instance of an anonymous class from a `Descriptor` and a closure `fun`. The `Descriptor` specifies the shape of the anonymous class: a super class, a list of interfaces to implement, and the name of a single non-constructor method to provide. The body of the method calls the `fun` closure. At link time, the Analyzer and BaseLinker synthesize a unique such anonymous class per `Descriptor`. In practice, all the lambdas for a given target type share a common `Descriptor`. This is notably the case for all the Scala functions of arity N. `NewLambda` replaces the need for special `AnonFunctionN` classes in the library. Instead, classes of the right shape are synthesized at link-time. The scheme can also be used for most LambdaMetaFactory-style lambdas, although our `NewLambda` does not support bridge generation. In the common case where no bridges are necessary, we now also generate a `NewLambda`. This generalizes the code size optimization of having only one class per `Descriptor` to non-Scala functions. In order to truly support LMF-style lambdas, the closure `fun` must take parameters that match the (erased) type in their superinterface. Previously, for Scala `FunctionN`, we knew by construction that the parameters and result types were always `any`, and so JS `Closure`s were good enough. Now, we need closures that can accept different types. This is where typed closures come into play (see below). --- When bridges are required, we still generate a custom class from the compiler backend. In that case, we statically inline the closure body in the produced SAM implementation. We have to do this not to expose typed closures across method calls. Moreover, we need the better static types for the parameters to be able to inline the closures without too much hassle. So this change *has* to be done in lockstep with the rest of this commit. --- A typed closure is a `Closure` that does not have any semantics for JS interop. This is stronger than `Char`, which is "merely" opaque to JS. A `Char` can still be passed to JS and has a meaningful `toString()`. A typed closure *cannot* be passed to JS in any way. That is enforced by making their type *not* a subtype of `any` (like record types). Since a typed closure has no JS interop semantics, it is free to strongly, statically type its parameters and result type. Additionally, we can freely choose its representation in the best possible way for the given target. On JS, that remains an arrow function. On Wasm, however, we represent it as a pair of `(capture data pointer, function pointer)`. This allows to compile them in an efficient way that does not require going through a JS bridge closure. The latter has been shown to have a devastating impact on performance when a Scala function is used in a tight loop. The type of a typed closure is a `ClosureType`. It records its parameter types and its result type. Closure types are non-variant: they are only subtypes of themselves. As mentioned, they are not subtypes of `any`. They are however subtypes of `void` and supertypes of `nothing`. Unfortunately, they must also be nullable to have a default value, so they have nullable and non-nullable alternatives. To call a typed closure, we introduce a dedicated application node `ApplyTypedClosure`. IR checking ensures that actual arguments match the expected parameter types. The result type is directly used as the type of the application. There are no changes to the source language. In particular, there is no way to express typed closures or their types at the user level. They are only used for `NewLambda` nodes. In fact, typed closures and `ApplyTypedClosure`s are not first-class at the IR level. Before desugaring, typed closures are only allowed as direct children of `NewLambda` nodes. Desugaring transforms `NewLambda` nodes into `New`s of the synthesized anonymous classes. At that point, the two typed closure nodes become first-class expression trees. --- For Scala functions, these changes have no real impact on the JS output (only marginal naming differences). On Wasm, however, they make Scala functions much, much faster. Before, a Scala function in a tight loop would cause a Wasm implementation to be, in the worst measured case, 20x slower than on JS. After these changes, similar benchmarks become significantly faster on Wasm than on JS.
Thanks a lot @gzm0 for reviewing this deep PR over the months! |
Now JSArrayConstr for Array[T](...) Seq[T](...) List[T](...) is the problem
Now JSArrayConstr for Array[T](...) Seq[T](...) List[T](...) is the problem
At this moment, this is a more a Request For Comments than a real PR. It is built on #4988 since the motivation is entirely about making the Wasm output fast instead of disastrously slow when Scala functions are involved.Was: Introduce typed closures.
Now the focus in more on
NewLambda
than typedClosure
nodes.The
NewLambda
node creates an instance of an anonymous class from aDescriptor
and a closurefun
. TheDescriptor
specifies the shape of the anonymous class: a super class, a list of interfaces to implement, and the name of a single non-constructor method to provide. The body of the method calls thefun
closure.At link time, the Analyzer and BaseLinker synthesize a unique such anonymous class per
Descriptor
. In practice, all the lambdas for a given target type share a commonDescriptor
. This is notably the case for all the Scala functions of arity N.NewLambda
replaces the need for specialAnonFunctionN
classes in the library. Instead, classes of the right shape are synthesized at link-time.The scheme can also be used for most
LambdaMetaFactory
-style lambdas, although ourNewLambda
does not support bridge generation. In the common case where no bridges are necessary, we now also generate aNewLambda
. This generalizes the code size optimization of having only one class perDescriptor
to non-Scala functions.In order to truly support LMF-style lambdas, the closure
fun
must take parameters that match the (erased) type in their superinterface. Previously, for ScalaFunctionN
, we knew by construction that the parameters and result types were alwaysany
, and so JSClosure
s were good enough. Now, we need closures that can accept different types. This is where typed closures come into play (see below).When bridges are required, we still generate a custom class from the compiler backend. In that case, we statically inline the closure body in the produced SAM implementation.
We have to do this not to expose typed closures across method calls. Moreover, we need the better static types for the parameters to be able to inline the closures without too much hassle. So this change has to be done in lockstep with the rest of this commit.
A typed closure is a
Closure
that does not have any semantics for JS interop. This is stronger thanChar
, which is "merely" opaque to JS. AChar
can still be passed to JS and has a meaningfultoString()
. A typed closure cannot be passed to JS in any way. That is enforced by making their type not a subtype ofany
(like record types).Since a typed closure has no JS interop semantics, it is free to strongly, statically type its parameters and result type.
Additionally, we can freely choose its representation in the best possible way for the given target. On JS, that remains an arrow function. On Wasm, however, we represent it as a pair of
(capture data pointer, function pointer)
. This allows to compile them in an efficient way that does not require going through a JS bridge closure. The latter has been shown to have a devastating impact on performance when a Scala function is used in a tight loop.The type of a typed closure is a
ClosureType
. It records its parameter types and its result type. Closure types are non-variant: they are only subtypes of themselves. As mentioned, they are not subtypes ofany
. They are however subtypes ofvoid
and supertypes ofnothing
. Unfortunately, they must also be nullable to have a default value, so they have nullable and non-nullable alternatives.To call a typed closure, we introduce a dedicated application node
ApplyTypedClosure
. IR checking ensures that actual arguments match the expected parameter types. The result type is directly used as the type of the application.There are no changes to the source language. In particular, there is no way to express typed closures or their types at the user level. They are only used for
NewLambda
nodes.In fact, typed closures and
ApplyTypedClosure
s are not first-class at the IR level. Before desugaring, typed closures are only allowed as direct children ofNewLambda
nodes. Desugaring transformsNewLambda
nodes intoNew
s of the synthesized anonymous classes. At that point, the two typed closure nodes become first-class expression trees.For Scala functions, these changes have no real impact on the JS output (only marginal naming differences). On Wasm, however, they make Scala functions much, much faster. Before, a Scala function in a tight loop would cause a Wasm implementation to be, in the worst measured case, 20x slower than on JS. After these changes, similar benchmarks become significantly faster on Wasm than on JS.