Fix #4482: Minify property names ourselves in fullLink when we don't use GCC. #4930

sjrd · 2024-01-24T13:45:38Z

When emitting ES modules, we cannot use Closure because it does not support ES modules the way we need it to. This results in files that are much larger than with other module kinds.

Off-the-shelf JavaScript bundlers/minifiers can compensate for that to a large extent for local and file-local variables, but they do not have enough semantic information to do it on property names.

We therefore add our own property name compressor. When enabled, the emitter computes the frequency of every field and method name in the entire program. It then uses those frequencies to allocate short names to them, with the shortest ones allocated to the most used properties.
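
As a rough illustration of the allocation step described above (not the code in this PR), the following sketch counts nothing itself: it takes precomputed occurrence counts and hands the shortest names to the most frequent properties. `PropertyNameAllocation`, `candidateNames` and `allocate` are hypothetical names, the alphabet is restricted to lowercase letters for brevity, and ties are broken by the original name for stability.

```scala
object PropertyNameAllocation {
  // Assumption for illustration: only lowercase ASCII letters are used.
  private val alphabet: Vector[String] = ('a' to 'z').map(_.toString).toVector

  /** Lazily yields all names of length 1, then length 2, and so on. */
  def candidateNames: Iterator[String] = {
    Iterator
      .iterate(alphabet)(prev => for (p <- prev; c <- alphabet) yield p + c)
      .flatMap(names => names)
  }

  /** Maps each original property name to a short name.
   *
   *  The most frequently used properties get the shortest names.
   *  Names in `namesToAvoid` are never allocated.
   */
  def allocate(occurrences: Map[String, Int],
      namesToAvoid: Set[String]): Map[String, String] = {
    // Descending frequency, then original name as a stable tie-break.
    val ordered = occurrences.toList.sortBy { case (name, count) => (-count, name) }
    val fresh = candidateNames.filterNot(namesToAvoid)
    ordered.map { case (name, _) => name -> fresh.next() }.toMap
  }
}
```

For example, with `length` used 9 times, `hashCode` and `toString` used 5 times each, and `a` to be avoided, `length` receives `b`, then `hashCode` and `toString` receive `c` and `d`.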

Obviously, this breaks any sort of incremental behavior, so we also invalidate all the emitter caches on every run when the new minifier is enabled. This should not be a problem as it is only intended to be used for fullLink.

Since we have to walk the entire codebase to compute property frequencies, we take the opportunity to also compute the set of dangerous global refs. This way, when we minify, we can avoid the optimistic first attempt, and always guarantee that a single attempt avoids all the referenced global refs.

We automatically enable the new minifier under fullLink when GCC is disabled. This can be overridden with a scalaJSLinkerConfig setting.
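
A build fragment along these lines could be used to override the default. This is a sketch under assumptions: the description above does not name the setting, so `withMinify` is assumed here to be the method exposed on the linker's `StandardConfig`, and the `fullLinkJS` scoping assumes a project with the sbt-scalajs plugin enabled.

```scala
// build.sbt fragment (sketch; `withMinify` is an assumed name, see above)

// Opt out of the new minifier for fullLink, even when GCC is disabled.
Compile / fullLinkJS / scalaJSLinkerConfig ~= { _.withMinify(false) }
```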

Overall, when used in the context of a Scala.js + Vite application, these changes bring a 50% size reduction to the `npm run build` output of Vite: the output is half its previous size.

To be clear, this is comparing

fullLink without GCC without Minify but with Vite's minification (through rollup), versus
fullLink without GCC with Minify and also Vite's minification.

When comparing on our test suite without Vite's minification, the reduction is "only" 20%. But this is not really the important measure; we definitely expect this feature to be used in conjunction with a JS-only minifier.

ekrich · 2024-01-25T16:17:38Z

I was just wondering if you expect the size without GCC (SJS minimization) to be similar to GCC's?

sjrd · 2024-01-25T16:27:29Z

Not if you only apply the changes here. But if you combine that with any off-the-shelf JS minifier, such as rollup which is automatically part of vite, then I'm hoping we can get close to what GCC offers. If yes, over time we could phase out GCC entirely in favor of this new solution.

sjrd · 2024-01-25T16:44:04Z

This is now ready for review. There are further things we could do (for example, "ambiguating" different property names that are used in a disjoint set of classes to have the same name) but this covers the essentials. Indeed, we do not emit any property name that is not compressed anymore, except:

$classData, because we use it to identify Scala.js objects (maybe not the best idea, but I'm not touching that now),
public properties of $TypeData that are read by the code of java.lang.Class, and
the fields of $linkingInfo, which are publicly specified as well.

gzm0

> Off-the-shelf JavaScript bundlers/minifier can compensate for that to a large extent for local and file-local variables, but they do not have enough semantic information to do it on property names.

Just to check my understanding: We do not need to be concerned that aggressive property renaming leads to less opportunity in renaming local identifiers since their scopes are completely separate, right?

Higher level comments:

The first commit might be split off into a separate PR (IMHO valuable on its own)
Note the comment regarding using a full pass to collect names, rather than re-building name collection.

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/FunctionEmitter.scala

linker-interface/shared/src/main/scala/org/scalajs/linker/interface/StandardConfig.scala

gzm0 · 2024-01-28T10:07:07Z

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/Emitter.scala

@@ -42,13 +42,17 @@ final class Emitter(config: Emitter.Config) {

  private val uncachedKnowledge = new knowledgeGuardian.KnowledgeAccessor {}

+  private val nameCompressor: Option[NameCompressor] =
+    if (minify) Some(new NameCompressor(config))
+    else None


Shouldn't this be part of the state as well?

In theory, it probably should. But to create the state we need the lastMentionedDangerousGlobalRefs, and to compute those, we need the nameCompressor. So that would be tricky to manipulate. It seems easier to manage if it is outside.

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/Emitter.scala

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/NameCompressor.scala

sjrd · 2024-01-28T10:57:54Z

> > Off-the-shelf JavaScript bundlers/minifier can compensate for that to a large extent for local and file-local variables, but they do not have enough semantic information to do it on property names.
>
> Just to check my understanding: We do not need to be concerned that aggressive property renaming leads to less opportunity in renaming local identifiers since their scopes are completely separate, right?

Indeed. These are completely separate namespaces. And indeed I see Vite/rollup generate single-letter local variable names even after this transformation, which shows that it knows. ;)

sjrd · 2024-01-28T11:33:41Z

Addressed the easy comments so far. I need to think more about the one-pass/two-pass thing.

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/FunctionEmitter.scala

linker-interface/shared/src/main/scala/org/scalajs/linker/interface/StandardConfig.scala

gzm0 · 2024-01-28T12:27:12Z

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/Emitter.scala

@@ -42,13 +42,17 @@ final class Emitter(config: Emitter.Config) {

  private val uncachedKnowledge = new knowledgeGuardian.KnowledgeAccessor {}

+  private val nameCompressor: Option[NameCompressor] =
+    if (minify) Some(new NameCompressor(config))
+    else None


Right. So let's table that discussion until we're settled on how to collect the names we want to minify.

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/NameCompressor.scala

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/Emitter.scala

sjrd · 2024-01-29T17:21:31Z

I rebased on top of everything, and added a commit that checks that every allocated name is used at least once.

gzm0 · 2024-02-11T08:47:36Z

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/SJSGen.scala

-          if (semantics.strictFloats) genCallPolyfillableBuiltin(FroundBuiltin, expr)
-          else wg(UnaryOp(irt.JSUnaryOp.+, expr))
+          if (semantics.strictFloats)
+            genCallPolyfillableBuiltin(FroundBuiltin, List(expr), keepOnlyTrackedGlobalRefs = false)


Looking at this: I'm wondering if we should make this an implicit parameter.

Basically the rule we want is:

When called from ClassEmitter, always set to true.

When called from FunctionEmitter, always set to false.

Otherwise, forward.

This call site violates the forward rule, I assume simply to avoid syntactic bloat. An implicit could solve this and would also make the call sites in ClassEmitter and FunctionEmitter cleaner.
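
The three rules above can be sketched with an implicit parameter as follows. This is only an illustration of the idea, not the emitter's code: `GlobalRefTracking`, `genBuiltinCall`, `genFround`, `ClassEmitterLike` and `FunctionEmitterLike` are hypothetical stand-ins, and the helper returns a string instead of a JS tree so that the forwarded value is visible.

```scala
object ImplicitForwarding {
  // Hypothetical wrapper type, so the implicit is not a bare Boolean.
  final case class GlobalRefTracking(keepOnlyTracked: Boolean)

  // Stand-in for a helper such as genCallPolyfillableBuiltin:
  // it only reports which mode it received.
  def genBuiltinCall(name: String)(implicit tracking: GlobalRefTracking): String =
    s"$name(keepOnlyTracked=${tracking.keepOnlyTracked})"

  // Intermediate helper: the "otherwise, forward" rule needs no explicit argument.
  def genFround()(implicit tracking: GlobalRefTracking): String =
    genBuiltinCall("fround")

  // Called from the class emitter: always true.
  object ClassEmitterLike {
    implicit val tracking: GlobalRefTracking = GlobalRefTracking(keepOnlyTracked = true)
    def emit(): String = genFround()
  }

  // Called from the function emitter: always false.
  object FunctionEmitterLike {
    implicit val tracking: GlobalRefTracking = GlobalRefTracking(keepOnlyTracked = false)
    def emit(): String = genFround()
  }
}
```

With this shape, an intermediate helper cannot accidentally hard-code the flag, because it never mentions it.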

gzm0 · 2024-02-11T08:56:03Z

linker/shared/src/test/scala/org/scalajs/linker/LibrarySizeTest.scala

@@ -71,7 +71,7 @@ class LibrarySizeTest {

    testLinkedSizes(
      expectedFastLinkSize = 150063,
-      expectedFullLinkSizeWithoutClosure = 130664,
+      expectedFullLinkSizeWithoutClosure = 95734,


I think we should also have checksizes for the minifier. Otherwise I think just with this, our test coverage is a bit low.

gzm0 · 2024-02-11T09:00:43Z

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/Emitter.scala

@@ -42,13 +42,17 @@ final class Emitter(config: Emitter.Config) {

  private val uncachedKnowledge = new knowledgeGuardian.KnowledgeAccessor {}

+  private val nameCompressor: Option[NameCompressor] =
+    if (minify) Some(new NameCompressor(config))
+    else None


Since the compressor isn't incremental, could we model its result as a return value? (Whether or not it's the same object is irrelevant for the interface.)

Then the beginning of emit could look like this:

    if (minify) {
      val (compressedNames, dangerousGlobalRefs) = NameCompressor.compress(moduleSet, logger)
      state = new State(dangerousGlobalRefs, Some(compressedNames))
    } else if (state == null) {
      state = new State(dangerousGlobalRefs = Set.empty, compressedNames = None)
    }

This would also ensure that setting state = null after emission releases a maximum amount of memory.

gzm0 · 2024-02-11T09:17:22Z

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/NameCompressor.scala

+    val generator = new NameGenerator(namesToAvoid)
+
+    for (entry <- orderedEntries)
+      entry.allocatedName = generator.nextString()


Have you compared this (performance-wise) with an approach where we create one or two new maps just mapping to the generated name?

I haven't compared performance. Now that we track usages, though, we need more mutability after NameCompressor.compress is done anyway, so I don't think that would make anything simpler at this point.

gzm0 · 2024-02-11T09:24:30Z

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/NameCompressor.scala

+    }
+
+    /** Tie break for stability. */
+    private def tieBreak(x: PropertyNameEntry, y: PropertyNameEntry): Int = (x, y) match {


Have you considered making this the natural order of PropertyNameEntry? Then, allocatePropertyNames could look like this:

    val comparator: Comparator[PropertyNameEntry] = {
      Comparator.comparingInt[PropertyNameEntry](_.occurrences).reversed()
        .thenComparing(Comparator.naturalOrder[PropertyNameEntry]())
    }
    java.util.Arrays.sort(orderedEntries, comparator)

IMO this improves readability, since it keeps the (relevant) ordering condition more local. (FWIW, it might have a performance impact, IDK).
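
A self-contained sketch of that suggestion, assuming a simplified `Entry` class in place of `PropertyNameEntry` and a tie-break by name (the real tie-break logic is not shown in this thread):

```scala
import java.util.Comparator

object EntryOrdering {
  // Hypothetical simplified entry. Its natural order plays the role of the
  // tie-break; comparing by name is an assumption for illustration.
  final class Entry(val name: String, val occurrences: Int) extends Comparable[Entry] {
    def compareTo(that: Entry): Int = name.compareTo(that.name)
  }

  // Most frequent first, then natural order for stability.
  val comparator: Comparator[Entry] =
    Comparator.comparingInt[Entry]((e: Entry) => e.occurrences).reversed()
      .thenComparing(Comparator.naturalOrder[Entry]())

  /** Returns a sorted copy, leaving the input array untouched. */
  def sorted(entries: Array[Entry]): Array[Entry] = {
    val copy = entries.clone()
    java.util.Arrays.sort(copy, comparator)
    copy
  }
}
```

The ordering condition then lives in one place next to the sort call, while the tie-break stays a property of the entry type.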

gzm0 · 2024-02-11T09:34:08Z

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/NameCompressor.scala

+    }
+
+    def traverseModuleSet(moduleSet: ModuleSet): Unit = {
+      interfaceClassNames = (for {


Rather than having a var for this, make a factory/helper method? This will make it more explicit that it is only created once and never modified again.

gzm0 · 2024-02-11T09:37:24Z

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/NameCompressor.scala

+        case JSMethodDef(_, StringLiteral(name), _, _, _) =>
+          propertyNamesToAvoid += name
+        case JSPropertyDef(_, StringLiteral(name), _, _) =>
+          propertyNamesToAvoid += name


Drop a comment about how collisions on computed names are accepted (in JS minifiers in general)?

Expanded the comment on the declaration of propertyNamesToAvoid.

gzm0 · 2024-02-11T11:32:45Z

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/Emitter.scala

@@ -42,13 +42,17 @@ final class Emitter(config: Emitter.Config) {

  private val uncachedKnowledge = new knowledgeGuardian.KnowledgeAccessor {}

+  private val nameCompressor: Option[NameCompressor] =
+    if (minify) Some(new NameCompressor(config))
+    else None


Going along with this: The duplication of the logic in the NameCompressor and the rest of the Emitter still makes me quite uneasy (even with the usage validation, IMHO it is not very easy to verify that the counting logic is correct).

If we use a return type, we could return different implementations that track or do not track the exact count.

I realize you were concerned about not being able to approximate, but IIUC the only place where we do this right now is for Apply to hijacked classes. But this seems to be easy to do correctly:

isMaybeHijackedClass is already factored in a helper.

We can use the uncachedGlobalKnowledge to determine this in the NameCompressor.

Updating the global knowledge needs to be moved to emit from emitInternal but that does not seem to be an issue.

gzm0 · 2024-02-11T11:34:24Z

linker/shared/src/test/scala/org/scalajs/linker/backend/emitter/NameCompressorTest.scala

+
+    for (expected <- expectedSequenceStart)
+      assertEquals(expected, generator.nextString())
+  }


This is probably OK (given the amount of boilerplate otherwise), but I recommend this read:
https://testing.googleblog.com/2014/07/testing-on-toilet-dont-put-logic-in.html

I generally agree with not putting logic in tests, but here it was not reasonably possible.

This is compensated by the fact that the logic in the test should be as declarative as it gets, whereas the logic in the implementation is as imperative as it gets. The algorithms used are very different, so if they agree on the results, it's a good indicator that they're correct.
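
To make the argument concrete, here is a small sketch of the same cross-check on a toy generator. It is not the `NameGenerator` from this PR: the three-letter alphabet and the names `imperative` and `declarative` are assumptions, and the real generator must additionally skip names to avoid.

```scala
object GeneratorCrossCheck {
  // Tiny alphabet, for illustration only.
  private val chars: Vector[Char] = ('a' to 'c').toVector

  /** Imperative version: treats the index as a bijective base-N number. */
  def imperative(count: Int): List[String] = {
    val out = List.newBuilder[String]
    var i = 0
    while (i < count) {
      var n = i + 1
      val sb = new StringBuilder
      while (n > 0) {
        n -= 1
        sb.append(chars(n % chars.size)) // least significant character first
        n /= chars.size
      }
      out += sb.reverse.toString
      i += 1
    }
    out.result()
  }

  /** Declarative version: all names of length 1, then length 2, and so on. */
  def declarative(count: Int): List[String] = {
    val byLength = Iterator.iterate(chars.map(_.toString)) { prev =>
      for (p <- prev; c <- chars) yield p + c
    }
    byLength.flatMap(names => names).take(count).toList
  }
}
```

Because the two versions share no logic, agreement on a prefix such as `imperative(20) == declarative(20)` is reasonable evidence that both produce the intended sequence.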

gzm0 · 2024-02-11T11:44:17Z

linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/NameCompressor.scala

+  private sealed abstract class BaseEntry {
+    private var _occurrences: Int = 0
+    private var allocatedName: String = null
+    private var actuallyUsed: Boolean = false


If we decide to keep the "usage only" check with this high amount of mutability, we might want to re-use _occurrences to store usage (set it to 0 when used).

sjrd · 2024-02-14T10:51:42Z

Still to do:

~~add checksizes with the minifier~~
verify exact number of occurrences

It had been factored out in `SJSGen` because of one condition that happens to be repeated. However, it is clearer that it does the right thing at each of its call sites. Given the amount of information that needed to be passed to this helper, inlining it twice actually removes additional checks. Ultimately, it seems simpler this way.

sjrd · 2024-02-26T12:59:22Z

Superseded by #4945.

sjrd force-pushed the own-short-names-emitter branch from 53f63b9 to 0699850 Compare January 24, 2024 16:41

Lukah0173 mentioned this pull request Jan 25, 2024

Reduce JS bundle size scala-js/vite-plugin-scalajs#14

Open

sjrd force-pushed the own-short-names-emitter branch from 0699850 to a728097 Compare January 25, 2024 13:29

sjrd changed the title ~~WiP Minify property names ourselves in fullLink when we don't use GCC.~~ Fix #4482: Minify property names ourselves in fullLink when we don't use GCC. Jan 25, 2024

sjrd linked an issue Jan 25, 2024 that may be closed by this pull request

FullLink Scala.js-specific minifier #4482

Closed

sjrd marked this pull request as ready for review January 25, 2024 16:40

sjrd requested a review from gzm0 January 25, 2024 16:40

sjrd force-pushed the own-short-names-emitter branch 2 times, most recently from 447dbd2 to 0892dcf Compare January 26, 2024 11:05

gzm0 requested changes Jan 28, 2024

sjrd mentioned this pull request Jan 28, 2024

Do not put O (jl.Object) in the ancestors dictionaries. #4932

Merged

sjrd force-pushed the own-short-names-emitter branch from 0892dcf to c540f0f Compare January 28, 2024 11:32

gzm0 reviewed Jan 28, 2024

sjrd force-pushed the own-short-names-emitter branch 2 times, most recently from 7b0362c to ca33790 Compare January 29, 2024 17:17

sjrd requested a review from gzm0 January 29, 2024 17:21

sjrd force-pushed the own-short-names-emitter branch from ca33790 to 3ac03ad Compare January 29, 2024 19:06

gzm0 requested changes Feb 11, 2024

gzm0 mentioned this pull request Feb 11, 2024

Optimize isInstanceOf on Scala object for interfaces #3815

Open

sjrd force-pushed the own-short-names-emitter branch from 3ac03ad to abaf00a Compare February 14, 2024 10:45

sjrd mentioned this pull request Feb 14, 2024

Preparations before our own name minifier. #4944

Merged

sjrd force-pushed the own-short-names-emitter branch 2 times, most recently from c027782 to 6a00a87 Compare February 14, 2024 16:06

sjrd force-pushed the own-short-names-emitter branch 2 times, most recently from 2dc3af7 to ef4ce58 Compare February 15, 2024 12:57

sjrd mentioned this pull request Feb 16, 2024

Fix #4482: Minify property names ourselves in fullLink when we don't use GCC. #4945

Merged

sjrd force-pushed the own-short-names-emitter branch 3 times, most recently from 9909a78 to b7341d8 Compare February 17, 2024 16:48

sjrd added 5 commits February 17, 2024 23:21

Compress the ancestor names used for instance tests.

4eef04b

Minify core (internal) property names to one letter each.

51c7c5d

Check that we use all the compressed names that we allocate.

4198b89

sjrd force-pushed the own-short-names-emitter branch from b7341d8 to 4198b89 Compare February 17, 2024 22:44

sjrd closed this Feb 26, 2024

sjrd deleted the own-short-names-emitter branch September 2, 2024 08:26
