Fix #4482: Minify property names ourselves in fullLink when we don't use GCC. #4945

Merged
sjrd merged 7 commits into scala-js:main from own-short-names-emitter-2 on Mar 6, 2024

Conversation

sjrd
Member

@sjrd sjrd commented Feb 16, 2024

Alternative to #4930 that does not require predicting occurrences, nor generating the full AST twice. Based on an idea by @kyouko-taiga.

The last commit contains the change of strategy. If we go for this alternative, IMO we should squash it inside the "Minify property names" and "Compress the ancestor names".

@sjrd sjrd requested a review from gzm0 February 16, 2024 10:52
@sjrd sjrd force-pushed the own-short-names-emitter-2 branch from 853e776 to d3d97db Compare February 17, 2024 16:45
@gzm0
Contributor

gzm0 commented Feb 17, 2024

This is nice. Are there any downsides to this approach other than it leaks into Emitter? (FWIW, I think we can fix that, but we can look at that later). IIUC, we'll still have the same capability of compressing different partitions of names (e.g. class hierarchy that doesn't intersect) if we want to implement this (call site might have to provide more context, like the static type of the receiver, but that's probably always available).

> An alternative would be to store the counts along
> with global refs in WithGlobals, but the overhead would then
> leak pretty strongly on incremental runs that do not minify.

Huh, I didn't think about this. Do you think this will leak in performance even if we turn it off with a flag (just like dangerous vs. non-dangerous global refs)?

@sjrd
Member Author

sjrd commented Feb 17, 2024

> This is nice. Are there any downsides to this approach other than it leaks into Emitter? (FWIW, I think we can fix that, but we can look at that later).

I don't think so. There is the theoretical inelegance of having a JS AST that's not entirely immutable, but that shouldn't hold back the idea.

> IIUC, we'll still have the same capability of compressing different partitions of names (e.g. class hierarchy that doesn't intersect) if we want to implement this (call site might have to provide more context, like the static type of the receiver, but that's probably always available).

Yes indeed. There should always be enough information available.

> An alternative would be to store the counts along
> with global refs in WithGlobals, but the overhead would then
> leak pretty strongly on incremental runs that do not minify.
>
> Huh, I didn't think about this. Do you think this will leak in performance even if we turn it off with a flag (just like dangerous vs. non-dangerous global refs)?

I'm not sure but it might, yes.

It had been factored out in `SJSGen` because of one condition that
happens to be repeated. However, it is clearer that it does the
right thing at each of its call sites. Given the amount of
information that needed to be passed to this helper, inlining it
twice actually removes additional checks. Ultimately, it seems
simpler this way.
@sjrd sjrd force-pushed the own-short-names-emitter-2 branch from d3d97db to 35d8019 Compare February 17, 2024 22:41
@gzm0
Contributor

gzm0 commented Feb 18, 2024

> I don't think so. There is the theoretical inelegance of having a JS AST that's not entirely immutable, but that shouldn't hold back the idea.

Agreed.

> An alternative would be to store the counts along
> with global refs in WithGlobals, but the overhead would then
> leak pretty strongly on incremental runs that do not minify.

I have been thinking a bit more about this, and while it sounds more appealing / clean at first, I'm not sure it gives us a practical advantage as soon as post transforms come into the picture:

  • It would give us re-usable JS trees, but they are ephemeral anyways once we consider post transforms.
  • It does not help us / simplify control flow: In either world we have to make sure we generate all the trees before we print them.

Contributor

@gzm0 gzm0 left a comment

I think we should go with this approach over #4930.

I have reviewed the minification checksizes and done a higher level review of the new approach.

I suggest we first squash before we proceed to a more detailed review.

@sjrd sjrd force-pushed the own-short-names-emitter-2 branch from 35d8019 to bc5867b Compare February 18, 2024 11:05
@sjrd
Member Author

sjrd commented Feb 18, 2024

Squashed and addressed the two simple comments. I'll look at the PostTransformManager comment later, but this way the rest of the PR should be reviewable.

@sjrd sjrd force-pushed the own-short-names-emitter-2 branch from bc5867b to 4e3a9f2 Compare February 18, 2024 11:25
Contributor

@gzm0 gzm0 left a comment

Changes not concerning Emitter / BasicLinkerBackend look good. Only minor nits.

@sjrd sjrd force-pushed the own-short-names-emitter-2 branch from 4e3a9f2 to abe332e Compare February 18, 2024 16:45
@sjrd sjrd requested a review from gzm0 February 19, 2024 03:56
@sjrd sjrd mentioned this pull request Feb 23, 2024
Contributor

@gzm0 gzm0 left a comment

I've taken a look at the adjustments and replies to the comments, but I have not done a full in-detail review pass yet. (I'll do so anyways, but maybe the comments are already useful like this).

  val printedTrees = (trees: List[js.Tree]).asInstanceOf[List[js.PrintedTree]]
  for (printedTree <- printedTrees)
    jsFileWriter.write(printedTree.jsCode)
}
Contributor

Does it make sense to abstract over this with an additional class so we can avoid the cast to List[js.PrintedTree] (with a path-dependent type)?

Like:

abstract class BodyPrinter {
  type Tree
  
  val postTransformer: PostTransformer[Tree]

  def printWithoutSourceMap(trees: List[Tree], jsFileWriter: ByteArrayWriter): Unit
  def printWithSourceMap(/* snip */)
}
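
To illustrate the idea behind this suggestion, here is a minimal, self-contained sketch of how an abstract type member can tie the post transformer's output type to the printer's input type, so that no cast is needed. Every name below (`RawTree`, `PrintedTree`, `PostTransformer`, `PrePrintedBodyPrinter`, `emit`) is a simplified stand-in invented for this example, not the actual linker code.

```scala
object BodyPrinterSketch {
  // Simplified, hypothetical stand-ins for the linker's JS tree types.
  final case class RawTree(code: String)
  final case class PrintedTree(jsCode: String)

  // A post transformer produces trees of some type T.
  trait PostTransformer[T] {
    def transform(trees: List[RawTree]): List[T]
  }

  abstract class BodyPrinter {
    // Each concrete printer fixes the kind of tree it consumes.
    type Tree

    val postTransformer: PostTransformer[Tree]

    def print(trees: List[Tree], out: StringBuilder): Unit
  }

  // A printer for trees that already carry their printed JS code.
  object PrePrintedBodyPrinter extends BodyPrinter {
    type Tree = PrintedTree

    val postTransformer: PostTransformer[PrintedTree] =
      new PostTransformer[PrintedTree] {
        def transform(trees: List[RawTree]): List[PrintedTree] =
          trees.map(t => PrintedTree(t.code + ";\n"))
      }

    def print(trees: List[PrintedTree], out: StringBuilder): Unit =
      trees.foreach(t => out.append(t.jsCode))
  }

  // The caller never names the concrete tree type: the path-dependent
  // type `printer.Tree` links the transformer's output to the printer's
  // input, so the compiler checks the match and no cast is required.
  def emit(printer: BodyPrinter, input: List[RawTree]): String = {
    val out = new StringBuilder
    val trees: List[printer.Tree] = printer.postTransformer.transform(input)
    printer.print(trees, out)
    out.toString
  }
}
```

The cost is one extra class per kind of tree, which is the verbosity trade-off mentioned in the replies.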

Member Author

I added a SQUASH commit that implements this suggestion. It is more verbose, but it does get rid of the cast. Let me know if it looks good to you.

Contributor

I think we should keep it, yes. It's a bit more verbose, but IMO the call flow is actually easier to trace / reason about.

Member Author

Squashed then.

Contributor

@gzm0 gzm0 left a comment

Full detailed review. Some minor stuff that I could have already caught earlier 🤷

@sjrd sjrd force-pushed the own-short-names-emitter-2 branch from abe332e to a6849ab Compare February 26, 2024 12:51
@sjrd
Member Author

sjrd commented Feb 26, 2024

I believe I have addressed all the comments.

Contributor

@gzm0 gzm0 left a comment

The new commit structure is very nice IMO.

Just one new comment regarding the debugging interface for DelayedIdent.


protected def debugString: String

private object resolver extends (() => String) {
Contributor

Have you considered making the name resolver a trait in DelayedIdent (in the first commit already)?

It seems all usage sites need to use object syntax anyways to define it. So we might as well define two abstract methods on it to avoid forgetting to implement it at the usage site (the toString of a lambda will be totally useless).

Member Author

It had actually occurred to me, but I did not go through with it.

I now introduced DelayedIdent.Resolver with resolve() and debugString.

@sjrd sjrd force-pushed the own-short-names-emitter-2 branch from af17a6b to e5b3412 Compare March 3, 2024 20:48
@sjrd sjrd requested a review from gzm0 March 3, 2024 20:50
Contributor

@gzm0 gzm0 left a comment

Just an outdated comment. Otherwise LGTM.

assertEquals("x.<delayed:bar>;", tree.show)
}

// Even when `apply()` throws, `show` still succeeds based on `toString()`.
Contributor

apply() -> resolve()

sjrd added 3 commits March 6, 2024 11:05
We introduce a new kind of node in JS ASTs: `DelayedIdent`. A
delayed ident is like an `Ident`, but its `name` is provided by a
resolver, to be determined later. This allows us to build a JS
AST with `DelayedIdent`s whose final names will only be known
later.

Since pretty-printing requires resolving the name, it might throw
and is not so well suited to `show` for debugging purposes anymore.
We therefore introduce `JSTreeShowPrinter`, which avoids resolving
the names. Instead, it uses the `debugString` method of the
resolver, which can be constructed to display meaningful debugging
information.

`DelayedIdent` is not yet actually used in this commit, but will
be in a subsequent commit for minifying property names.
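
As a rough illustration of the mechanism described in this commit message, the following sketch models a tiny JS AST with a `DelayedIdent` node, a printer that resolves names, and a separate debugging printer that only uses `debugString`. It is a simplified assumption of how the pieces fit together: `DotSelect`, `printTree`, `showTree` and `LateResolver` are hypothetical names for this example, and only `DelayedIdent`, its `Resolver` with `resolve()` and `debugString`, and the `<delayed:...>` debug format come from the discussion in this PR.

```scala
object DelayedIdentSketch {
  sealed trait Tree

  final case class Ident(name: String) extends Tree

  // Like Ident, but the name is supplied by a resolver at a later time.
  final case class DelayedIdent(resolver: DelayedIdent.Resolver) extends Tree

  object DelayedIdent {
    trait Resolver {
      /** The final name; may throw if it is not known yet. */
      def resolve(): String

      /** A description for debugging; expected never to throw. */
      def debugString: String
    }
  }

  final case class DotSelect(qualifier: Tree, item: Tree) extends Tree

  // Real printer: it needs final names, so it resolves delayed idents.
  def printTree(tree: Tree): String = tree match {
    case Ident(name)        => name
    case DelayedIdent(r)    => r.resolve()
    case DotSelect(q, item) => printTree(q) + "." + printTree(item)
  }

  // Debugging printer: it never resolves, so it can be used at any time.
  def showTree(tree: Tree): String = tree match {
    case Ident(name)        => name
    case DelayedIdent(r)    => "<delayed:" + r.debugString + ">"
    case DotSelect(q, item) => showTree(q) + "." + showTree(item)
  }

  // Hypothetical resolver whose name is assigned after the tree is built.
  final class LateResolver(val debugString: String)
      extends DelayedIdent.Resolver {
    private var name: Option[String] = None

    def assign(finalName: String): Unit = {
      name = Some(finalName)
    }

    def resolve(): String = name.getOrElse(
      throw new IllegalStateException("unresolved name: " + debugString))
  }
}
```

In this sketch, `showTree` works on a tree whose names are still pending, while `printTree` only succeeds once every resolver has been given its name.
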

The only remaining public method is now `printStat(tree: Tree)`.

Previously, `SJSGen` generated `Ident`s for field members, but
only names for method members. Generating the `Ident`s for methods
was left in `ClassEmitter` and `Function`. Now, we concentrate that
responsibility in `SJSGen` only.

In addition, we make a clear distinction between idents generated
for *definitions*, which receive an `OriginalName`, and those used
for *use sites*, which never receive one.
sjrd added 3 commits March 6, 2024 11:05
When emitting ES modules, we cannot use Closure because it does not
support ES modules the way we need it to. This results in files
that are much larger than with other module kinds.

Off-the-shelf JavaScript bundlers/minifiers can compensate for that
to a large extent for local and file-local variables, but they do
not have enough semantic information to do it on property names.

We therefore add our own property name compressor. When enabled,
the emitter computes the frequency of every field and method name
in the entire program. It then uses those frequencies to allocate
short names to them, with the shortest ones allocated to the most
used properties.
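
The allocation step can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the emitter's actual algorithm: `ShortNameAllocator`, `nthName` and `allocate` are hypothetical names, the alphabet is limited to ASCII letters, and the sketch ignores reserved words and any names that would have to be preserved.

```scala
object ShortNameAllocator {
  private val Alphabet: Vector[Char] =
    (('a' to 'z') ++ ('A' to 'Z')).toVector

  /** The n-th candidate name, shortest first: a, b, ..., Z, aa, ab, ... */
  def nthName(n: Int): String = {
    val base = Alphabet.length
    var i = n
    var result = ""
    while (i >= 0) {
      result = Alphabet(i % base).toString + result
      i = i / base - 1
    }
    result
  }

  /** Maps each original name to a short name, most used names first. */
  def allocate(frequencies: Map[String, Int]): Map[String, String] = {
    // Sort by decreasing frequency; break ties by name so that the
    // result is deterministic.
    val sorted = frequencies.toList.sortBy {
      case (name, count) => (-count, name)
    }
    sorted.zipWithIndex.map {
      case ((name, _), index) => name -> nthName(index)
    }.toMap
  }
}
```

Because candidate names are generated in order of increasing length, the most frequent properties receive the one-letter names, which is what reduces the total output size.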

In order to compute the frequencies, we count how many times
`genMethodName` is called for any particular `MethodName` (same
for other kinds of names) during JS AST generation. That means
that while we generate the JS AST, we do not know the final
frequencies, and therefore the eventually allocated names. We
use `DelayedIdent`s to defer the actual resolution until after
the frequencies are computed.
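
The interplay between counting and deferred resolution can be sketched like this. It is an illustrative assumption rather than the real emitter code: `NameGen`, `genName` and `allocateNames` are hypothetical, names are plain strings, and the allocated names are simply `p0`, `p1`, ... in order of decreasing frequency.

```scala
import scala.collection.mutable

object CountingNameGen {
  final class NameGen {
    private val counts =
      mutable.Map.empty[String, Int].withDefaultValue(0)
    private var allocated: Option[Map[String, String]] = None

    /** Called for every use of `name` while the AST is being generated.
     *  Counts the use and returns a thunk that yields the final name
     *  once allocation has happened.
     */
    def genName(name: String): () => String = {
      counts(name) += 1
      () => allocated match {
        case Some(names) => names(name)
        case None =>
          throw new IllegalStateException("names are not allocated yet")
      }
    }

    /** Called once, after the whole AST has been generated. */
    def allocateNames(): Unit = {
      val byFrequency = counts.toList.sortBy { case (n, c) => (-c, n) }
      allocated = Some(byFrequency.zipWithIndex.map {
        case ((n, _), i) => n -> ("p" + i)
      }.toMap)
    }
  }
}
```

The thunks returned during generation play the role that `DelayedIdent` resolvers play in the commit: they can be embedded in the tree immediately, but only produce a name after all uses have been counted.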

Obviously, this breaks any sort of incremental behavior. Since we
do not cache the frequency counts per calling method, we have to
force re-generation of the whole AST at each run to re-count.
Therefore, we invalidate all the emitter caches on every run when
the new minifier is enabled. This should not be a problem as it
is only intended to be used for fullLink.

An alternative would be to store the counts along with global
refs in `WithGlobals`, but the overhead would then leak pretty
strongly on incremental runs that do not minify.

This strategy also prevents fusing AST generation and
pretty-printing. When minifying, we demand that the
`postTransformer` be `PostTransformer.Identity`. This adds a bit
of handling to `BasicLinkerBackend` to deal with the two possible
kinds of trees received from the emitter, but nothing too invasive.

We automatically enable the new minifier under fullLink when
GCC is disabled. This can be overridden with a `scalaJSLinkerConfig`
setting.
@sjrd sjrd force-pushed the own-short-names-emitter-2 branch from e5b3412 to 280870d Compare March 6, 2024 10:05
@ngbinh

ngbinh commented Mar 6, 2024

This is exciting!

@sjrd sjrd merged commit f4d39fa into scala-js:main Mar 6, 2024
@sjrd sjrd deleted the own-short-names-emitter-2 branch March 6, 2024 14:33
@kyouko-taiga

Beautiful!
