Compress SourceMapWriter.Fragment data #4965

Conversation
Draft because:

Those numbers look awesome! 🤣

It was just an inverted condition. In fact, it produced no source maps :P

Seems like this is also faster :) (attn: cropped plot, 4 outliers dropped). EDIT: These are incremental timings (I have not measured batch):
@@ -41,9 +41,11 @@ final class BasicLinkerBackend(config: LinkerBackendImpl.Config)
   private[this] var totalModules = 0
   private[this] val rewrittenModules = new AtomicInteger(0)

+  private[this] val fragmentIndex = new SourceMapWriter.Index
Note: We accept leaking old sources / names in incremental runs here (similar to how NameGen leaks as well).
Are we sure that this is only used (for writing) from a single thread at a time? Could you add a comment about that?
Yes, because the Emitter is in fact single threaded. This is probably also something we should fix. I'll make Index thread-safe and see if it affects the benchmarks. If it doesn't, we should probably keep the thread-safe version :).
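A thread-safe variant could be sketched along these lines. This is a hypothetical minimal version, assuming Index only needs "get or assign the next index" semantics for string keys; ThreadSafeIndex and indexOf are illustrative names, not the actual API:

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch of a thread-safe index table; names are illustrative.
final class ThreadSafeIndex {
  private val nextIndex = new AtomicInteger(0)
  private val indices = new ConcurrentHashMap[String, Integer]()

  /** Returns the existing index for `key`, or atomically assigns the next one. */
  def indexOf(key: String): Int =
    indices
      .computeIfAbsent(key, _ => Integer.valueOf(nextIndex.getAndIncrement()))
      .intValue()
}
```

computeIfAbsent makes the lookup-or-assign atomic per key, so concurrent writers cannot assign two indices to the same string.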
The approach definitely looks good overall. I have a few localized comments.
-      if (pendingColumnInGenerated >= 0)
-        doWriteSegment(pendingColumnInGenerated, pendingPos, pendingName)
+      if (pendingColumnInGenerated >= 0) {
+        // Allocate a name string *before* we write a fragment so we can cache it.
I don't understand what this comment is trying to say.
Obsolete with the revert. The idea was that we convert OriginalName to String before we create a fragment, so that when we write a fragment, we already have a String, so we're faster.
@@ -39,18 +40,18 @@ object SourceMapWriter {
   private final class NodePosStack {
     private var topIndex: Int = -1
     private var posStack: Array[Position] = new Array(128)
-    private var nameStack: Array[String] = new Array(128)
+    private var nameStack: Array[OriginalName] = new Array(128)
This is potentially problematic from a performance point of view. An array of some AnyVal class is an array of its boxed instances. So at the bytecode level we need to allocate instances of OriginalName to put them in this array.
The benchmarks suggest that this is not such a big deal in this case, or that it's well compensated by the other improvements. If it is the latter case, there is maybe more to gain by avoiding this.
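As an illustration of the boxing concern, here is a standalone sketch (Name is an invented value class, not the real OriginalName):

```scala
// Standalone sketch, not the real OriginalName: a value class avoids an
// allocation when passed around directly, but an Array of the value class
// is erased to an array of boxed instances, so every store allocates a box.
final class Name(val underlying: String) extends AnyVal

object BoxingDemo {
  val boxed: Array[Name] = new Array[Name](4) // elements are boxed Name instances
  boxed(0) = new Name("foo")                  // allocates a Name box on store

  val unboxed: Array[String] = Array("foo")   // no wrapper allocation
}
```

Keeping the underlying type (here String) in the array sidesteps the per-element box, which is why storing the raw representation can be faster.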
Ah, ah. I did not realize this. I've reverted the first commit that turned this into an original name.
It probably does not show up in the performance profile because this happens pre-cache (almost exclusively). So in an incremental run, we only hit this code path for very little code.
       // Source index field
       if (source eq lastSource) { // highly likely
         buffer(offset) = 'A' // 0 in Base64VLQ
         offset += 1
       } else {
-        val sourceIndex = sourceToIndex(source)
+        val sourceIndex = outIndex.sourceToIndex(source)
There is probably a huge performance win to get here, if we reuse the integer index from the global Index as the key for outIndex. Basically making outIndex a pair of Int -> Int maps rather than of String -> Int maps. And because the keys of such maps would be indices of a reified data structure, we can probably make those Int -> Int maps raw Array[Int]s. IIRC, these map lookups were still significant in the profiles after all the optimizations, so getting rid of them could have a large impact.
This does not have to be in this PR, though, of course.
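The suggested Int -> Int structure could look roughly like this (a hypothetical sketch; IntRemap and remap are illustrative names, and -1 is used as the "not yet seen" sentinel):

```scala
// Hypothetical sketch of the suggested optimization: remap global indices
// to per-output indices with a growable Array[Int] instead of a
// String-keyed hash map. -1 marks "not yet seen in this output".
final class IntRemap {
  private var table: Array[Int] = Array.fill(16)(-1)
  private var next: Int = 0

  def remap(globalIndex: Int): Int = {
    if (globalIndex >= table.length) {
      // Grow to the next power of two above globalIndex.
      val grown = Array.fill(Integer.highestOneBit(globalIndex) * 2)(-1)
      System.arraycopy(table, 0, grown, 0, table.length)
      table = grown
    }
    if (table(globalIndex) == -1) {
      table(globalIndex) = next
      next += 1
    }
    table(globalIndex)
  }
}
```

An array index is a single bounds-checked load, so this replaces hashing and equality checks on strings with plain memory accesses.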
     var nameIndex: Int = 0

     while (buf.hasRemaining()) {
       (buf.get(): @unchecked) match {
@unchecked is not necessary for Int matches, is it? @switch would be relevant, though.
Indeed. Fixed.
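For reference, the point can be illustrated with a minimal standalone example (not the PR's actual match; the cases are made up):

```scala
import scala.annotation.switch

// An Int match needs no @unchecked annotation, and @switch asks the
// compiler to emit a tableswitch/lookupswitch instruction (it warns at
// compile time if it cannot).
object SwitchDemo {
  def kind(tag: Int): String = (tag: @switch) match {
    case 0 => "end"
    case 1 => "pos"
    case 2 => "name"
    case _ => "unknown"
  }
}
```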
       if ((value & ~127) != 0)
         buffer(offset) = ((value & 127) | 128).toByte
       else
         buffer(offset) = (value & 127).toByte

Consider using hexadecimal notation when we use numbers for their bit patterns:

Suggested change:
-      if ((value & ~127) != 0)
-        buffer(offset) = ((value & 127) | 128).toByte
-      else
-        buffer(offset) = (value & 127).toByte
+      if ((value & ~0x7f) != 0)
+        buffer(offset) = ((value & 0x7f) | 0x80).toByte
+      else
+        buffer(offset) = (value & 0x7f).toByte
       value >>>= 1

       if (!neg) value
       else if (value == 0) Int.MinValue
Given what the numbers we read represent, Int.MinValue is not a realistic input for writeRawVLQ. Consider getting rid of this case to avoid one branch. (It might even remove two branches at the assembly level since it will basically become a SELECT.)
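The simplified sign decoding would look roughly like this. A hedged sketch only: it assumes the low bit carries the sign (as in source-map VLQs) and that Int.MinValue never occurs in practice; VLQSign and decodeSign are illustrative names:

```scala
// Illustrative sketch of the simplified sign decoding: the low bit of the
// raw value is the sign flag, the remaining bits are the magnitude. With
// the Int.MinValue case dropped, a single branch remains (which may
// compile down to a conditional select).
object VLQSign {
  def decodeSign(raw: Int): Int = {
    val neg = (raw & 1) != 0
    val value = raw >>> 1
    if (neg) -value else value
  }
}
```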
We use a similar compression strategy to the one source maps themselves use, to reduce the in-memory footprint. After linking the test suite, residual memory usage is as follows:

| what              | main [MB] | PR [MB] |
|-------------------|----------:|--------:|
| sbt overall heap  |       842 |     772 |
| backend retained  |       147 |      77 |
| frontend retained |       215 |     215 |
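The core of such a scheme is a continuation-bit varint, as in the writer code discussed in this review: the low 7 bits of each byte carry payload, and bit 0x80 is set while more bytes follow. A minimal illustrative sketch of the general technique (it mirrors the idea, not the PR's exact buffer management):

```scala
import java.io.ByteArrayOutputStream

// Illustrative continuation-bit varint writer: 7 payload bits per byte,
// high bit 0x80 set on every byte except the last.
object Varint {
  def write(out: ByteArrayOutputStream, value0: Int): Unit = {
    var value = value0
    while ((value & ~0x7f) != 0) {
      out.write((value & 0x7f) | 0x80) // more bytes follow
      value >>>= 7
    }
    out.write(value) // last byte, high bit clear
  }
}
```

Small values (the common case for source-map deltas) fit in a single byte, which is where the memory savings come from.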
Huh, the updated measurements look suspiciously like the baseline. (I'm willing to attribute the spread to the locking introduced.) I'll benchmark just the original first commit (changing String to OriginalName), maybe it makes things faster?

Ok, this is a bit annoying: it seems my benchmarking setup is insufficiently consistent: the spread I observe seems to be caused by the environment. However, I think for this PR it does not matter: we have sufficient evidence that it does not make matters worse, and the main goal is to reduce memory consumption, which it does.

[Benchmark plot; legend: "pr" = original PR (old)]