Fix #4482: Minify property names ourselves in fullLink when we don't use GCC. #4945

sjrd · 2024-02-16T10:52:16Z

Alternative to #4930 that does not require predicting occurrences, nor generating the full AST twice. Based on an idea by @kyouko-taiga.

The last commit contains the change of strategy. If we go for this alternative, IMO we should squash it into the "Minify property names" and "Compress the ancestor names" commits.

gzm0 · 2024-02-17T17:29:37Z

This is nice. Are there any downsides to this approach other than it leaks into Emitter? (FWIW, I think we can fix that, but we can look at that later). IIUC, we'll still have the same capability of compressing different partitions of names (e.g. class hierarchy that doesn't intersect) if we want to implement this (call site might have to provide more context, like the static type of the receiver, but that's probably always available).

An alternative would be to store the counts along
with global refs in WithGlobals, but the overhead would then
leak pretty strongly on incremental runs that do not minify.

Huh, I didn't think about this. Do you think this will leak in performance even if we turn it off with a flag (just like dangerous vs. non-dangerous global refs)?

sjrd · 2024-02-17T18:43:28Z

This is nice. Are there any downsides to this approach other than it leaks into Emitter? (FWIW, I think we can fix that, but we can look at that later).

I don't think so. There is the theoretical inelegance of having a JS AST that's not entirely immutable, but that shouldn't hold back the idea.

IIUC, we'll still have the same capability of compressing different partitions of names (e.g. class hierarchy that doesn't intersect) if we want to implement this (call site might have to provide more context, like the static type of the receiver, but that's probably always available).

Yes indeed. There should always be enough information available.

An alternative would be to store the counts along
with global refs in WithGlobals, but the overhead would then
leak pretty strongly on incremental runs that do not minify.

Huh, I didn't think about this. Do you think this will leak in performance even if we turn it off with a flag (just like dangerous vs. non-dangerous global refs)?

I'm not sure but it might, yes.

It had been factored out in `SJSGen` because of one condition that happens to be repeated. However, it is clearer that it does the right thing at each of its call sites. Given the amount of information that needed to be passed to this helper, inlining it twice actually removes additional checks. Ultimately, it seems simpler this way.

gzm0 · 2024-02-18T09:33:47Z

I don't think so. There is the theoretical inelegance of having a JS AST that's not entirely immutable, but that shouldn't hold back the idea.

Agreed.

An alternative would be to store the counts along
with global refs in WithGlobals, but the overhead would then
leak pretty strongly on incremental runs that do not minify.

I have been thinking a bit more about this, and while it sounds more appealing / clean at first, I'm not sure it gives us a practical advantage as soon as post transforms come into the picture:

- It would give us re-usable JS trees, but they are ephemeral anyways once we consider post transforms.
- It does not help us / simplify control flow: In either world we have to make sure we generate all the trees before we print them.

gzm0

I think we should go with this approach over #4930.

I have reviewed the minification checksizes and done a higher level review of the new approach.

I suggest we first squash before we proceed to a more detailed review.

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/Emitter.scala

project/Build.scala

Jenkinsfile

sjrd · 2024-02-18T11:05:55Z

Squashed and addressed the two simple comments. I'll look at the PostTransformManager comment later, but this way the rest of the PR should be reviewable.

gzm0

Changes not concerning Emitter / BasicLinkerBackend look good. Only minor nits.

linker/shared/src/main/scala/org/scalajs/linker/backend/javascript/Printers.scala

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/CoreJSLib.scala

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/SJSGen.scala

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/NameCompressor.scala

gzm0

I've taken a look at the adjustments and replies to the comments, but I have not done a full in-detail review pass yet. (I'll do so anyways, but maybe the comments are already useful like this).

linker/shared/src/main/scala/org/scalajs/linker/backend/javascript/Printers.scala

linker/shared/src/test/scala/org/scalajs/linker/backend/javascript/PrintersTest.scala

linker/shared/src/main/scala/org/scalajs/linker/backend/javascript/Printers.scala

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/SJSGen.scala

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/Emitter.scala

gzm0 · 2024-02-25T10:35:55Z

linker/shared/src/main/scala/org/scalajs/linker/backend/BasicLinkerBackend.scala

+            val printedTrees = (trees: List[js.Tree]).asInstanceOf[List[js.PrintedTree]]
+            for (printedTree <- printedTrees)
+              jsFileWriter.write(printedTree.jsCode)
+          }


Does it make sense to abstract over this with an additional class so we can avoid the cast to List[js.PrintedTree] (with a path-dependent type)?

Like:

    abstract class BodyPrinter {
      type Tree
      val postTransformer: PostTransformer[Tree]
      def printWithoutSourceMap(trees: List[Tree], jsFileWriter: ByteArrayWriter): Unit
      def printWithSourceMap(/* snip */)
    }

I added a SQUASH commit that implements this suggestion. It is more verbose, but it does get rid of the cast. Let me know if it looks good to you.

I think we should keep it yes. It's a bit more verbose, but IMO the call flow is actually easier to trace / reason about.

Squashed then.
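
For readers following the thread, here is a minimal, self-contained sketch of how a path-dependent type can remove the need for such a cast. It is not the code that was merged: `Tree`, `PrintedTree`, `PostTransformer`, `TreeType`, `write`, `render`, `emit`, `PrePrinted` and `Direct` are all simplified, hypothetical stand-ins chosen for illustration.

```scala
object BodyPrinterSketch {
  // Hypothetical stand-ins for the linker's tree types.
  final case class Tree(code: String)
  final case class PrintedTree(jsCode: String)

  // Hypothetical rendering step standing in for the real JS printer.
  def render(tree: Tree): String = tree.code + ";"

  trait PostTransformer[E] {
    def transform(trees: List[Tree]): List[E]
  }

  // Each printer fixes its own element type, so the trees produced by
  // `postTransformer` are statically known to be accepted by `write`.
  abstract class BodyPrinter {
    type TreeType
    val postTransformer: PostTransformer[TreeType]
    def write(trees: List[TreeType], out: StringBuilder): Unit
  }

  // Trees are printed during the transform; writing only copies the text.
  object PrePrinted extends BodyPrinter {
    type TreeType = PrintedTree
    val postTransformer: PostTransformer[PrintedTree] =
      new PostTransformer[PrintedTree] {
        def transform(trees: List[Tree]): List[PrintedTree] =
          trees.map(t => PrintedTree(render(t)))
      }
    def write(trees: List[PrintedTree], out: StringBuilder): Unit =
      trees.foreach(t => out.append(t.jsCode))
  }

  // Trees pass through unchanged and are printed while writing.
  object Direct extends BodyPrinter {
    type TreeType = Tree
    val postTransformer: PostTransformer[Tree] =
      new PostTransformer[Tree] {
        def transform(trees: List[Tree]): List[Tree] = trees
      }
    def write(trees: List[Tree], out: StringBuilder): Unit =
      trees.foreach(t => out.append(render(t)))
  }

  // `printer.TreeType` is path-dependent: no asInstanceOf is needed here.
  def emit(printer: BodyPrinter, input: List[Tree]): String = {
    val trees: List[printer.TreeType] = printer.postTransformer.transform(input)
    val out = new StringBuilder
    printer.write(trees, out)
    out.toString
  }
}
```

In this sketch the caller never names the concrete element type; the compiler ties the output of `postTransformer` to the input of `write` through `printer.TreeType`.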

gzm0

Full detailed review. Some minor stuff that I could have already caught earlier 🤷

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/CoreJSLib.scala

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/SJSGen.scala

sjrd · 2024-02-26T12:56:51Z

I believe I have addressed all the comments.

gzm0

The new commit structure is very nice IMO.

Just one new comment regarding the debugging interface for DelayedIdent.

gzm0 · 2024-03-02T09:10:09Z

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/NameCompressor.scala

+
+    protected def debugString: String
+
+    private object resolver extends (() => String) {


Have you considered making the name resolver a trait in DelayedIdent (in the first commit already)?

It seems all usage sites need to use object syntax anyways to define it. So we might as well define two abstract methods on it to avoid forgetting to implement it at the usage site (the toString of a lambda will be totally useless).

It had actually occurred to me, but I did not go through with it.

I now introduced DelayedIdent.Resolver with resolve() and debugString.

gzm0 · 2024-03-02T09:11:28Z

linker/shared/src/main/scala/org/scalajs/linker/backend/BasicLinkerBackend.scala

+            val printedTrees = (trees: List[js.Tree]).asInstanceOf[List[js.PrintedTree]]
+            for (printedTree <- printedTrees)
+              jsFileWriter.write(printedTree.jsCode)
+          }


I think we should keep it yes. It's a bit more verbose, but IMO the call flow is actually easier to trace / reason about.

gzm0

Just an outdated comment. Otherwise LGTM.

gzm0 · 2024-03-06T05:49:58Z

linker/shared/src/test/scala/org/scalajs/linker/backend/javascript/PrintersTest.scala

+      assertEquals("x.<delayed:bar>;", tree.show)
+    }
+
+    // Even when `apply()` throws, `show` still succeeds based on `toString()`.


apply() -> resolve()

We introduce a new kind of node in JS ASTs: `DelayedIdent`. A delayed ident is like an `Ident`, but its `name` is provided by a resolver, to be determined later. This allows us to build a JS AST with `DelayedIdent`s whose final names will only be known later. Since pretty-printing requires resolving the name, it might throw, so it is no longer well suited to `show` for debugging purposes. We therefore introduce `JSTreeShowPrinter`, which avoids resolving the names. Instead, it uses the `debugString` method of the resolver, which can be constructed to display meaningful debugging information. `DelayedIdent` is not yet actually used in this commit, but will be in a subsequent commit for minifying property names.
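
The idea can be illustrated with a small standalone sketch. This is not the linker's actual AST: only the names `DelayedIdent`, `Resolver`, `resolve()` and `debugString` come from the discussion above, while `PropertyName`, `printJS`, `showJS` and `LateResolver` are hypothetical helpers introduced for the example.

```scala
object DelayedIdentSketch {
  // Simplified stand-in for property names in a JS AST.
  sealed trait PropertyName

  final case class Ident(name: String) extends PropertyName

  // Like Ident, but the name is obtained from a resolver when needed.
  final class DelayedIdent(val resolver: DelayedIdent.Resolver)
      extends PropertyName

  object DelayedIdent {
    trait Resolver {
      /** Returns the final name; may throw if it is not known yet. */
      def resolve(): String

      /** Human-readable description that never requires resolution. */
      def debugString: String
    }
  }

  // Real output: needs the final name, so it may throw.
  def printJS(qualifier: String, prop: PropertyName): String = prop match {
    case Ident(name)     => s"$qualifier.$name;"
    case d: DelayedIdent => s"$qualifier.${d.resolver.resolve()};"
  }

  // Debug output: relies on debugString only, so it always succeeds.
  def showJS(qualifier: String, prop: PropertyName): String = prop match {
    case Ident(name)     => s"$qualifier.$name;"
    case d: DelayedIdent => s"$qualifier.<delayed:${d.resolver.debugString}>;"
  }

  // Hypothetical resolver whose name is allocated after the AST is built.
  final class LateResolver(val debugString: String)
      extends DelayedIdent.Resolver {
    private var allocated: Option[String] = None

    def allocate(name: String): Unit = {
      allocated = Some(name)
    }

    def resolve(): String = allocated.getOrElse(
        throw new IllegalStateException(s"unresolved name: $debugString"))
  }
}
```

With this shape, a tree containing an unresolved `DelayedIdent` can still be shown for debugging, while printing it for real fails until a name has been allocated.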

The only remaining public method is now `printStat(tree: Tree)`.

Previously, `SJSGen` generated `Ident`s for field members, but only names for method members. Generating the `Ident`s for methods was left in `ClassEmitter` and `Function`. Now, we concentrate that responsibility in `SJSGen` only. In addition, we make a clear distinction between idents generated for *definitions*, which receive an `OriginalName`, and those used for *use sites*, which never receive one.

When emitting ES modules, we cannot use Closure because it does not support ES modules the way we need it to. This results in files that are much larger than with other module kinds. Off-the-shelf JavaScript bundlers/minifiers can compensate for that to a large extent for local and file-local variables, but they do not have enough semantic information to do it on property names. We therefore add our own property name compressor.

When enabled, the emitter computes the frequency of every field and method name in the entire program. It then uses those frequencies to allocate short names to them, with the shortest ones allocated to the most used properties.

In order to compute the frequencies, we count how many times `genMethodName` is called for any particular `MethodName` (same for other kinds of names) during JS AST generation. That means that while we generate the JS AST, we do not know the final frequencies, and therefore the eventually allocated names. We use `DelayedIdent`s to defer the actual resolution until after the frequencies are computed.

Obviously, this breaks any sort of incremental behavior. Since we do not cache the frequency counts per calling method, we have to force re-generation of the whole AST at each run to re-count. Therefore, we invalidate all the emitter caches on every run when the new minifier is enabled. This should not be a problem as it is only intended to be used for fullLink. An alternative would be to store the counts along with global refs in `WithGlobals`, but the overhead would then leak pretty strongly on incremental runs that do not minify.

This strategy also prevents fusing AST generation and pretty-printing. When minifying, we demand that the `postTransformer` be `PostTransformer.Identity`. This adds a bit of handling to `BasicLinkerBackend` to deal with the two possible kinds of trees received from the emitter, but nothing too invasive.

We automatically enable the new minifier under fullLink when GCC is disabled. This can be overridden with a `scalaJSLinkerConfig` setting.
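
As a rough illustration of the frequency-based allocation described in this commit message, here is a minimal sketch. It is an assumption-laden toy, not the actual `NameCompressor`: `Counter`, `shortName` and `allocate` are hypothetical names, ties are broken alphabetically only to keep the result deterministic, and any names the real compressor may need to avoid are ignored.

```scala
object NameCompressorSketch {
  import scala.collection.mutable

  // Records one use each time a property name is requested,
  // standing in for the counting done during JS AST generation.
  final class Counter {
    private val counts = mutable.Map.empty[String, Int]

    def use(name: String): Unit =
      counts.update(name, counts.getOrElse(name, 0) + 1)

    def frequencies: Map[String, Int] = counts.toMap
  }

  // Maps 0, 1, ..., 25, 26, 27, ... to "a", "b", ..., "z", "aa", "ab", ...
  def shortName(index: Int): String = {
    var n = index
    var result = ""
    while (n >= 0) {
      result = ('a' + n % 26).toChar.toString + result
      n = n / 26 - 1
    }
    result
  }

  // Most frequently used names receive the shortest identifiers.
  def allocate(frequencies: Map[String, Int]): Map[String, String] = {
    val ordered = frequencies.toList.sortBy {
      case (name, count) => (-count, name)
    }
    ordered.zipWithIndex.map {
      case ((name, _), index) => name -> shortName(index)
    }.toMap
  }
}
```

Because `allocate` can only run once every use has been counted, the names are not available while the AST is being built, which is the gap that `DelayedIdent` fills.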

ngbinh · 2024-03-06T11:11:07Z

This is exciting!

kyouko-taiga · 2024-03-07T12:05:15Z

Beautiful!

sjrd requested a review from gzm0 February 16, 2024 10:52

sjrd force-pushed the own-short-names-emitter-2 branch from 853e776 to d3d97db on February 17, 2024 16:45

sjrd force-pushed the own-short-names-emitter-2 branch from d3d97db to 35d8019 on February 17, 2024 22:41

gzm0 requested changes Feb 18, 2024

sjrd force-pushed the own-short-names-emitter-2 branch from 35d8019 to bc5867b on February 18, 2024 11:05

sjrd force-pushed the own-short-names-emitter-2 branch from bc5867b to 4e3a9f2 on February 18, 2024 11:25

gzm0 requested changes Feb 18, 2024

sjrd force-pushed the own-short-names-emitter-2 branch from 4e3a9f2 to abe332e on February 18, 2024 16:45

sjrd requested a review from gzm0 February 19, 2024 03:56

sjrd mentioned this pull request Feb 23, 2024

More minification. #4931

Merged

gzm0 requested changes Feb 25, 2024

gzm0 mentioned this pull request Feb 25, 2024

Fix #4949: Always wrap object literals with (). #4952

Merged

gzm0 requested changes Feb 25, 2024

sjrd force-pushed the own-short-names-emitter-2 branch from abe332e to a6849ab on February 26, 2024 12:51

sjrd requested a review from gzm0 February 26, 2024 12:56

sjrd mentioned this pull request Feb 26, 2024

Fix #4482: Minify property names ourselves in fullLink when we don't use GCC. #4930

Closed

sjrd force-pushed the own-short-names-emitter-2 branch from a6849ab to af17a6b on February 26, 2024 13:41

gzm0 requested changes Mar 2, 2024

sjrd force-pushed the own-short-names-emitter-2 branch from af17a6b to e5b3412 on March 3, 2024 20:48

sjrd requested a review from gzm0 March 3, 2024 20:50

gzm0 approved these changes Mar 6, 2024

sjrd added 3 commits March 6, 2024 11:05

Make JSTreePrinter.printTree protected.

298c60a

The only remaining public method is now `printStat(tree: Tree)`.

Compress the ancestor names used for instance tests.

7928663

Minify core (internal) property names to one letter each.

280870d

sjrd force-pushed the own-short-names-emitter-2 branch from e5b3412 to 280870d on March 6, 2024 10:05

sjrd merged commit f4d39fa into scala-js:main Mar 6, 2024

sjrd deleted the own-short-names-emitter-2 branch March 6, 2024 14:33

gzm0 mentioned this pull request Mar 24, 2024

Cleanups to Emitter post transforms #4970

Merged

