-
Notifications
You must be signed in to change notification settings - Fork 396
Fix #4482: Minify property names ourselves in fullLink when we don't use GCC. #4930
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
53f63b9
to
0699850
Compare
0699850
to
a728097
Compare
I was just wondering if you expect the size without GCC (SJS minimization) to be similar to GCC's? |
Not if you only apply the changes here. But if you combine that with any off-the-shelf JS minifier, such as |
This is now ready for review. There are further things we could do (for example, "ambiguating" different property names that are used in a disjoint set of classes to have the same name) but this covers the essentials. Indeed, we do not emit any property name that is not compressed anymore, except:
|
447dbd2
to
0892dcf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Off-the-shelf JavaScript bundlers/minifier can compensate for that to a large extent for local and file-local variables, but they do not have enough semantic information to do it on property names.
Just to check my understanding: We do not need to be concerned that aggressive property renaming leads to less opportunity in renaming local identifiers since their scopes are completely separate, right?
Higher level comments:
- The first commit might be split off into a separate PR (IMHO valuable on it's own)
- Note the comment regarding using a full pass to collect names, rather than re-building name collection.
linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/FunctionEmitter.scala
Outdated
Show resolved
Hide resolved
linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/FunctionEmitter.scala
Outdated
Show resolved
Hide resolved
linker-interface/shared/src/main/scala/org/scalajs/linker/interface/StandardConfig.scala
Show resolved
Hide resolved
@@ -42,13 +42,17 @@ final class Emitter(config: Emitter.Config) { | |||
|
|||
private val uncachedKnowledge = new knowledgeGuardian.KnowledgeAccessor {} | |||
|
|||
private val nameCompressor: Option[NameCompressor] = | |||
if (minify) Some(new NameCompressor(config)) | |||
else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be part of the state as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory, it probably should. But to create the state we need the lastMentionedDangerousGlobalRefs
, and to compute those, we need the nameCompressor
. So that would be tricky to manipulate. It seems easier to manage if it is outside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. So let's table that discussion until we're settled on how to collect the names we want to minify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the compressor isn't incremental, could we model it's result as a return value (whether or not it's the same object is irrelevant for the interface).
Then the beginning of emit
could look like this:
if (minify) {
val (compressedNames, dangerousGlobalRefs) = NameCompressor.compress(moduleSet, logger)
state = new State(dangerousGlobalRefs, Some(compressedNames))
} else if (state == null) {
state = new State(dangerousGlobalRefs = Set.empty, compressedNames = None)
}
This would also ensure that the state = null
after emission releases a maximum amount of memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going along with this: The duplication of the logic in the NameCompressor
and the rest of the Emitter
still makes me quite uneasy (even with the usage validation, IMHO it is not very easy to verify that the counting logic is correct).
If we use a return type, we could return different implementations that track or do not track the exact count.
I realize you were concern about not being able to approximate, but IIUC the only place where we do this right now is for Apply to hijacked classes. But this seems to be easy to do correctly:
isMaybeHijackedClass
is already factored in a helper.- We can use the
uncachedGlobalKnowledge
to determine this in theNameCompressor
. - Updating the global knowledge needs to be moved to
emit
fromemitInternal
but that does not seem to be an issue.
linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/Emitter.scala
Outdated
Show resolved
Hide resolved
linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/NameCompressor.scala
Outdated
Show resolved
Hide resolved
Indeed. These are completely separate namespaces. And indeed I see Vite/rollup generate single-letter local variable names even after this transformation, which shows that it knows. ;) |
0892dcf
to
c540f0f
Compare
Addressed the easy comments so far. I need to think more about the one-pass/two-pass thing. |
linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/FunctionEmitter.scala
Outdated
Show resolved
Hide resolved
linker-interface/shared/src/main/scala/org/scalajs/linker/interface/StandardConfig.scala
Show resolved
Hide resolved
@@ -42,13 +42,17 @@ final class Emitter(config: Emitter.Config) { | |||
|
|||
private val uncachedKnowledge = new knowledgeGuardian.KnowledgeAccessor {} | |||
|
|||
private val nameCompressor: Option[NameCompressor] = | |||
if (minify) Some(new NameCompressor(config)) | |||
else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. So let's table that discussion until we're settled on how to collect the names we want to minify.
linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/NameCompressor.scala
Show resolved
Hide resolved
linker/shared/src/main/scala/org/scalajs/linker/backend/emitter/Emitter.scala
Outdated
Show resolved
Hide resolved
7b0362c
to
ca33790
Compare
I rebased on top of everything, and added a commit that checks that every allocated name is used at least one. |
ca33790
to
3ac03ad
Compare
if (semantics.strictFloats) genCallPolyfillableBuiltin(FroundBuiltin, expr) | ||
else wg(UnaryOp(irt.JSUnaryOp.+, expr)) | ||
if (semantics.strictFloats) | ||
genCallPolyfillableBuiltin(FroundBuiltin, List(expr), keepOnlyTrackedGlobalRefs = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this: I'm wondering if we should make this an implicit parameter.
Basically the rule we want is:
- When called from
ClassEmitter
, always set totrue
. - When called from
FunctionEmitter
, always set tofalse
. - Otherwise, forward.
This call site violates the forward rule, I assume simply to avoid syntactic bloat. An implicit could solve this and would also make the call sites in ClassEmitter
and FunctionEmitter
cleaner.
@@ -71,7 +71,7 @@ class LibrarySizeTest { | |||
|
|||
testLinkedSizes( | |||
expectedFastLinkSize = 150063, | |||
expectedFullLinkSizeWithoutClosure = 130664, | |||
expectedFullLinkSizeWithoutClosure = 95734, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should also have checksizes for the minifier. Otherwise I think just with this, our test coverage is a bit low.
@@ -42,13 +42,17 @@ final class Emitter(config: Emitter.Config) { | |||
|
|||
private val uncachedKnowledge = new knowledgeGuardian.KnowledgeAccessor {} | |||
|
|||
private val nameCompressor: Option[NameCompressor] = | |||
if (minify) Some(new NameCompressor(config)) | |||
else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the compressor isn't incremental, could we model it's result as a return value (whether or not it's the same object is irrelevant for the interface).
Then the beginning of emit
could look like this:
if (minify) {
val (compressedNames, dangerousGlobalRefs) = NameCompressor.compress(moduleSet, logger)
state = new State(dangerousGlobalRefs, Some(compressedNames))
} else if (state == null) {
state = new State(dangerousGlobalRefs = Set.empty, compressedNames = None)
}
This would also ensure that the state = null
after emission releases a maximum amount of memory.
val generator = new NameGenerator(namesToAvoid) | ||
|
||
for (entry <- orderedEntries) | ||
entry.allocatedName = generator.nextString() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you compared this (performance-wise) with an approach where we create one or two new maps just mapping to the generated name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't compared performance. Now that we track usages, though, we need more mutability after NameCompressor.compress
is done anyway, so I don't think that would make anything simpler at this point.
} | ||
|
||
/** Tie break for stability. */ | ||
private def tieBreak(x: PropertyNameEntry, y: PropertyNameEntry): Int = (x, y) match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered making this the natural order of PropertyNameEntry
? Then, allocatePropertyNames
could look like this:
val comparator: Comparator[PropertyNameEntry] = {
Comparator.comparingInt(_.occurrences).reversed()
.andThen(Comparator.naturalOrder()))
}
java.util.Arrays.sort(orderedEntries, comparator)
IMO this improves readability, since it keeps the (relevant) ordering condition more local. (FWIW, it might have a performance impact, IDK).
} | ||
|
||
def traverseModuleSet(moduleSet: ModuleSet): Unit = { | ||
interfaceClassNames = (for { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than having a var
for this, make a factory/helper method? This will make it more explicit, that it is only created once and never modified again.
case JSMethodDef(_, StringLiteral(name), _, _, _) => | ||
propertyNamesToAvoid += name | ||
case JSPropertyDef(_, StringLiteral(name), _, _) => | ||
propertyNamesToAvoid += name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop a comment about how collisions on computed names are accepted (in JS minifiers in general)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expanded the comment on the declaration of propertyNamesToAvoid
.
@@ -42,13 +42,17 @@ final class Emitter(config: Emitter.Config) { | |||
|
|||
private val uncachedKnowledge = new knowledgeGuardian.KnowledgeAccessor {} | |||
|
|||
private val nameCompressor: Option[NameCompressor] = | |||
if (minify) Some(new NameCompressor(config)) | |||
else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going along with this: The duplication of the logic in the NameCompressor
and the rest of the Emitter
still makes me quite uneasy (even with the usage validation, IMHO it is not very easy to verify that the counting logic is correct).
If we use a return type, we could return different implementations that track or do not track the exact count.
I realize you were concern about not being able to approximate, but IIUC the only place where we do this right now is for Apply to hijacked classes. But this seems to be easy to do correctly:
isMaybeHijackedClass
is already factored in a helper.- We can use the
uncachedGlobalKnowledge
to determine this in theNameCompressor
. - Updating the global knowledge needs to be moved to
emit
fromemitInternal
but that does not seem to be an issue.
|
||
for (expected <- expectedSequenceStart) | ||
assertEquals(expected, generator.nextString()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably OK (given the amount of boilerplate otherwise), but I recommend this read:
https://testing.googleblog.com/2014/07/testing-on-toilet-dont-put-logic-in.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally agree with not to put logic in tests, but here it was not reasonably possible.
This is compensated by the fact that the logic in the test should be as declarative as it gets, whereas the logic in the implementation is as imperative as it gets. The algorithms used are very different, so if they agree on the results, it's a good indicator that they're correct.
private sealed abstract class BaseEntry { | ||
private var _occurrences: Int = 0 | ||
private var allocatedName: String = null | ||
private var actuallyUsed: Boolean = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we decide to keep the "usage only" check with this high amount of mutability, we might want to re-use _occurrences
to store usage (set it to 0 when used).
3ac03ad
to
abaf00a
Compare
Still to do:
|
c027782
to
6a00a87
Compare
2dc3af7
to
ef4ce58
Compare
9909a78
to
b7341d8
Compare
It had been factored out in `SJSGen` because of one condition that happens to be repeated. However, it is clearer that it does the right thing at each of its call sites. Given the amount of information that needed to be passed to this helper, inlining it twice actually removes additional checks. Ultimately, it seems simpler this way.
When emitting ES modules, we cannot use Closure because it does not support ES modules the way we need it to. This results in files that are much larger than with other module kinds. Off-the-shelf JavaScript bundlers/minifier can compensate for that to a large extent for local and file-local variables, but they do not have enough semantic information to do it on property names. We therefore add our own property name compressor. When enabled, the emitter computes the frequency of every field and method name in the entire program. It then uses those frequencies to allocate short names to them, with the shortest ones allocated to the most used properties. Obviously, this breaks any sort of incremental behavior, so we also invalidate all the emitter caches on every run when the new minifier is enabled. This should not be a problem as it is only intended to be used for fullLink. Since we have to walk the entire codebase to compute property frequencies, we take the opportunity to also compute the set of dangerous global refs. This way, when we minify, we can avoid the optimistic first attempt, and always guarantee that a single attempt avoids all the referenced global refs. We automatically enable the new minifier under fullLink when GCC is disabled. This can be overridden with a `scalaJSLinkerConfig` setting.
b7341d8
to
4198b89
Compare
Superseded by #4945. |
When emitting ES modules, we cannot use Closure because it does not support ES modules the way we need it to. This results in files that are much larger than with other module kinds.
Off-the-shelf JavaScript bundlers/minifier can compensate for that to a large extent for local and file-local variables, but they do not have enough semantic information to do it on property names.
We therefore add our own property name compressor. When enabled, the emitter computes the frequency of every field and method name in the entire program. It then uses those frequencies to allocate short names to them, with the shortest ones allocated to the most used properties.
Obviously, this breaks any sort of incremental behavior, so we also invalidate all the emitter caches on every run when the new minifier is enabled. This should not be a problem as it is only intended to be used for fullLink.
Since we have to walk the entire codebase to compute property frequencies, we take the opportunity to also compute the set of dangerous global refs. This way, when we minify, we can avoid the optimistic first attempt, and always guarantee that a single attempt avoids all the referenced global refs.
We automatically enable the new minifier under fullLink when GCC is disabled. This can be overridden with a
scalaJSLinkerConfig
setting.Overall, when used in the context of a Scala.js + Vite application, these change bring a 50% size to the
npm run build
output of Vite. So the size of the output is half that of before.To be clear, this is comparing
When comparing on our test suite without Vite's minification, the reduction is "only" of 20%. But this is not really the important measure; we definitely expect this feature to be used in conjunction with a JS-only minifier.