Fix #4482: Minify property names ourselves in fullLink when we don't use GCC. #4945
853e776 to d3d97db
This is nice. Are there any downsides to this approach other than it leaks into Emitter? (FWIW, I think we can fix that, but we can look at that later). IIUC, we'll still have the same capability of compressing different partitions of names (e.g. class hierarchy that doesn't intersect) if we want to implement this (call site might have to provide more context, like the static type of the receiver, but that's probably always available).
Huh, I didn't think about this. Do you think this will leak in performance even if we turn it off with a flag (just like dangerous vs. non-dangerous global refs)?
I don't think so. There is the theoretical inelegance of having a JS AST that's not entirely immutable, but that shouldn't hold back the idea.
Yes indeed. There should always be enough information available.
I'm not sure, but it might, yes.
It had been factored out in `SJSGen` because of one condition that happens to be repeated. However, it is clearer that it does the right thing at each of its call sites. Given the amount of information that needed to be passed to this helper, inlining it twice actually removes additional checks. Ultimately, it seems simpler this way.
d3d97db to 35d8019
Agreed.
I have been thinking a bit more about this, and while it sounds more appealing / clean at first, I'm not sure it gives us a practical advantage as soon as post transforms come into the picture:
I think we should go with this approach over #4930.
I have reviewed the minification checksizes and done a higher level review of the new approach.
I suggest we first squash before we proceed to a more detailed review.
linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/Emitter.scala
35d8019 to bc5867b
Squashed and addressed the two simple comments. I'll look at the
bc5867b to 4e3a9f2
Changes not concerning Emitter / BasicLinkerBackend look good. Only minor nits.
linker/shared/src/main/scala/org/scalajs/linker/backend/javascript/Printers.scala
linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/CoreJSLib.scala
linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/SJSGen.scala
linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/NameCompressor.scala
4e3a9f2 to abe332e
I've taken a look at the adjustments and replies to the comments, but I have not done a full in-detail review pass yet. (I'll do so anyways, but maybe the comments are already useful like this).
linker/shared/src/main/scala/org/scalajs/linker/backend/javascript/Printers.scala
linker/shared/src/main/scala/org/scalajs/linker/backend/javascript/Printers.scala
linker/shared/src/test/scala/org/scalajs/linker/backend/javascript/PrintersTest.scala
linker/shared/src/main/scala/org/scalajs/linker/backend/javascript/Printers.scala
linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/SJSGen.scala
linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/Emitter.scala
  val printedTrees = (trees: List[js.Tree]).asInstanceOf[List[js.PrintedTree]]
  for (printedTree <- printedTrees)
    jsFileWriter.write(printedTree.jsCode)
}
Does it make sense to abstract over this with an additional class, so we can avoid the cast to `List[js.PrintedTree]` (with a path-dependent type)?
Like:
abstract class BodyPrinter {
  type Tree
  val postTransformer: PostTransformer[Tree]
  def printWithoutSourceMap(trees: List[Tree], jsFileWriter: ByteArrayWriter): Unit
  def printWithSourceMap(/* snip */)
}
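To make the suggestion concrete, here is a minimal, self-contained sketch of how a type member lets the printer and its post-transformer agree on the tree type without a cast. Everything in it is a simplified stand-in and an assumption for illustration: `JSTree`, `PrintedTree`, the one-method `PostTransformer`, `PrintedBodyPrinter` and `emit` are hypothetical, and a `StringBuilder` replaces `ByteArrayWriter`. It is not the code that ended up in the PR.

```scala
object BodyPrinterSketch {
  // Hypothetical stand-ins for the linker's tree types.
  final case class JSTree(code: String)
  final case class PrintedTree(jsCode: String)

  // Simplified, assumed shape of a post-transformer producing trees of type T.
  trait PostTransformer[T] {
    def transformStats(trees: List[JSTree]): List[T]
  }

  abstract class BodyPrinter {
    type Tree
    val postTransformer: PostTransformer[Tree]
    def printWithoutSourceMap(trees: List[Tree], out: StringBuilder): Unit
  }

  // One concrete printer: its trees are already printed, so it only writes them out.
  object PrintedBodyPrinter extends BodyPrinter {
    type Tree = PrintedTree

    val postTransformer: PostTransformer[PrintedTree] =
      new PostTransformer[PrintedTree] {
        def transformStats(trees: List[JSTree]): List[PrintedTree] =
          trees.map(t => PrintedTree(t.code + ";\n"))
      }

    def printWithoutSourceMap(trees: List[PrintedTree], out: StringBuilder): Unit =
      trees.foreach(t => out.append(t.jsCode))
  }

  // The path-dependent type `printer.Tree` ties the two calls together,
  // so no asInstanceOf is needed here.
  def emit(printer: BodyPrinter, input: List[JSTree]): String = {
    val out = new StringBuilder
    val trees: List[printer.Tree] = printer.postTransformer.transformStats(input)
    printer.printWithoutSourceMap(trees, out)
    out.toString
  }
}
```

The cost is the extra class and one subclass per tree kind, which is the verbosity discussed in the replies.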
I added a SQUASH commit that implements this suggestion. It is more verbose, but it does get rid of the cast. Let me know if it looks good to you.
I think we should keep it, yes. It's a bit more verbose, but IMO the call flow is actually easier to trace / reason about.
Squashed then.
Full detailed review. Some minor stuff that I could have already caught earlier 🤷
linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/CoreJSLib.scala
linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/SJSGen.scala
abe332e to a6849ab
I believe I have addressed all the comments.
a6849ab to af17a6b
The new commit structure is very nice IMO.
Just one new comment regarding the debugging interface for `DelayedIdent`.
protected def debugString: String

private object resolver extends (() => String) {
Have you considered making the name resolver a trait in `DelayedIdent` (in the first commit already)?
It seems all usage sites need to use `object` syntax anyway to define it. So we might as well define two abstract methods on it, to avoid forgetting to implement it at the usage site (the `toString` of a lambda will be totally useless).
It had actually occurred to me, but I did not go through with it.
I have now introduced `DelayedIdent.Resolver`, with `resolve()` and `debugString`.
af17a6b to e5b3412
Just an outdated comment. Otherwise LGTM.
  assertEquals("x.<delayed:bar>;", tree.show)
}

// Even when `apply()` throws, `show` still succeeds based on `toString()`.
`apply()` -> `resolve()`
We introduce a new kind of node in JS ASTs: `DelayedIdent`. A delayed ident is like an `Ident`, but its `name` is provided by a resolver, to be determined later. This allows us to build a JS AST with `DelayedIdent`s whose final names will only be known later.

Since pretty-printing requires resolving the name, it might throw, and is therefore no longer well suited to `show` for debugging purposes. We therefore introduce `JSTreeShowPrinter`, which avoids resolving the names. Instead, it uses the `debugString` method of the resolver, which can be constructed to display meaningful debugging information.

`DelayedIdent` is not yet actually used in this commit, but will be in a subsequent commit for minifying property names.
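As an illustration of the idea only, the following sketch models a delayed ident on a tiny, hypothetical AST. `Tree`, `DotSelect`, `printTree`, `showTree` and `LateResolver` are assumptions made for this example and do not reproduce the linker's actual classes; only the `resolve()` / `debugString` pair and the `<delayed:...>` debug rendering are taken from the discussion above.

```scala
object DelayedIdentSketch {
  // Hypothetical, heavily simplified JS AST.
  sealed trait Tree
  final case class Ident(name: String) extends Tree
  final case class DelayedIdent(resolver: Resolver) extends Tree
  final case class DotSelect(qualifier: Tree, item: Tree) extends Tree

  // The two-method resolver interface mentioned in the review.
  trait Resolver {
    def resolve(): String
    def debugString: String
  }

  /** Final printing: resolves delayed names, so it may throw. */
  def printTree(tree: Tree): String = tree match {
    case Ident(name)            => name
    case DelayedIdent(resolver) => resolver.resolve()
    case DotSelect(qual, item)  => printTree(qual) + "." + printTree(item)
  }

  /** Debug printing: never calls resolve(), only debugString. */
  def showTree(tree: Tree): String = tree match {
    case Ident(name)            => name
    case DelayedIdent(resolver) => "<delayed:" + resolver.debugString + ">"
    case DotSelect(qual, item)  => showTree(qual) + "." + showTree(item)
  }

  /** Hypothetical resolver whose final name is assigned after AST construction. */
  final class LateResolver(val debugString: String) extends Resolver {
    private var resolved: Option[String] = None

    def assign(name: String): Unit = {
      resolved = Some(name)
    }

    def resolve(): String = resolved.getOrElse {
      throw new IllegalStateException("not resolved yet: " + debugString)
    }
  }
}
```

With this shape, a tree can be shown for debugging at any time, while printing it for real only works once every resolver has been given its final name.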
The only remaining public method is now `printStat(tree: Tree)`.
Previously, `SJSGen` generated `Ident`s for field members, but only names for method members. Generating the `Ident`s for methods was left in `ClassEmitter` and `Function`. Now, we concentrate that responsibility in `SJSGen` only. In addition, we make a clear distinction between idents generated for *definitions*, which receive an `OriginalName`, and those used for *use sites*, which never receive one.
When emitting ES modules, we cannot use Closure because it does not support ES modules the way we need it to. This results in files that are much larger than with other module kinds. Off-the-shelf JavaScript bundlers/minifiers can compensate for that to a large extent for local and file-local variables, but they do not have enough semantic information to do it on property names. We therefore add our own property name compressor.

When enabled, the emitter computes the frequency of every field and method name in the entire program. It then uses those frequencies to allocate short names to them, with the shortest ones allocated to the most used properties.

In order to compute the frequencies, we count how many times `genMethodName` is called for any particular `MethodName` (same for other kinds of names) during JS AST generation. That means that while we generate the JS AST, we do not know the final frequencies, and therefore the eventually allocated names. We use `DelayedIdent`s to defer the actual resolution until after the frequencies are computed.

Obviously, this breaks any sort of incremental behavior. Since we do not cache the frequency counts per calling method, we have to force re-generation of the whole AST at each run to re-count. Therefore, we invalidate all the emitter caches on every run when the new minifier is enabled. This should not be a problem as it is only intended to be used for fullLink. An alternative would be to store the counts along with global refs in `WithGlobals`, but the overhead would then leak pretty strongly on incremental runs that do not minify.

This strategy also prevents fusing AST generation and pretty-printing. When minifying, we demand that the `postTransformer` be `PostTransformer.Identity`. This adds a bit of handling to `BasicLinkerBackend` to deal with the two possible kinds of trees received from the emitter, but nothing too invasive.

We automatically enable the new minifier under fullLink when GCC is disabled. This can be overridden with a `scalaJSLinkerConfig` setting.
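The frequency-based allocation can be sketched as follows. This is a rough illustration under stated assumptions, not the PR's `NameCompressor`: the `Counter` class, the `shortNames` generator (lowercase letters only, shortest first) and the alphabetical tie-breaking rule are all hypothetical, and the sketch ignores any restrictions a real implementation may have to place on the generated names.

```scala
object NameCompressorSketch {
  // Assumption: candidate names are built from lowercase ASCII letters only.
  private val alphabet: Seq[String] = ('a' to 'z').map(_.toString)

  /** Lazily enumerates candidate names, shortest first: a..z, aa, ab, ... */
  def shortNames: Iterator[String] =
    Iterator.iterate[Seq[String]](alphabet) { prev =>
      for (p <- prev; c <- alphabet) yield p + c
    }.flatMap(_.iterator)

  /** Counts how often each name is requested while the AST is generated. */
  final class Counter {
    private val counts =
      scala.collection.mutable.Map.empty[String, Int].withDefaultValue(0)

    def record(name: String): Unit = counts(name) += 1

    def result: Map[String, Int] = counts.toMap
  }

  /** Gives the shortest names to the most frequently used original names. */
  def allocate(frequencies: Map[String, Int]): Map[String, String] = {
    // Descending frequency; ties broken by name to keep the result deterministic.
    val ordered = frequencies.toList
      .sortBy { case (name, count) => (-count, name) }
      .map(_._1)
    ordered.iterator.zip(shortNames).toMap
  }
}
```

In this model, every use of a name calls `record`, and `allocate` can only run once the whole program has been counted, which is why the real names have to be resolved after AST generation.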
e5b3412 to 280870d
This is exciting!
Beautiful!
Alternative to #4930 that does not require predicting occurrences, nor generating the full AST twice. Based on an idea by @kyouko-taiga.
The last commit contains the change of strategy. If we go for this alternative, IMO we should squash it into the "Minify property names" and "Compress the ancestor names" commits.