JIT size and perf improvements #7589
Merged
This optimization allows AsString to skip the dynamic call to `to_s` when the target object is a natural String. Like most indy method caching, it does not currently work in refined scopes, which makes this an optimization-in-waiting, since AsString is only used for possibly-refined code.
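As a rough illustration, the fast path can be modeled with `MethodHandles.guardWithTest`: if the target is a plain string, return it unchanged; otherwise fall back to a dynamic `to_s` call. The `RObject`/`RString` classes, the exact-class guard, and the helper names are hypothetical simplifications, not JRuby's actual AsString call site, and refinements are ignored entirely.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class AsStringSketch {
    // Hypothetical stand-in for a generic Ruby object.
    static class RObject {
        String toS() { return "#<RObject>"; }
    }

    // Hypothetical stand-in for a Ruby String; to_s on a natural String returns itself.
    static class RString extends RObject {
        final String value;
        RString(String value) { this.value = value; }
        @Override String toS() { return value; }
    }

    // Counts how often the slow, dynamic to_s path runs.
    static int toSCalls = 0;

    // Guard: exactly the String class, not a subclass (a simplification of "natural String").
    static boolean isNaturalString(RObject o) { return o.getClass() == RString.class; }

    // Fast path: hand back the string unchanged, with no to_s dispatch.
    static RObject passThrough(RObject o) { return o; }

    // Slow path: dispatch to_s and wrap the result as a string.
    static RObject callToS(RObject o) {
        toSCalls++;
        return new RString(o.toS());
    }

    static final MethodHandle AS_STRING;

    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType test = MethodType.methodType(boolean.class, RObject.class);
            MethodType conv = MethodType.methodType(RObject.class, RObject.class);
            AS_STRING = MethodHandles.guardWithTest(
                lookup.findStatic(AsStringSketch.class, "isNaturalString", test),
                lookup.findStatic(AsStringSketch.class, "passThrough", conv),
                lookup.findStatic(AsStringSketch.class, "callToS", conv));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static RObject asString(RObject o) {
        try {
            return (RObject) AS_STRING.invokeExact(o);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

A real call site would also need to guard against `to_s` being redefined and against refinements, which is exactly why the commit calls this an optimization-in-waiting.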
This reduces the bytecode emitted for DynamicScope variable accesses by doing the following:
* The depth and location are embedded into an indy call site rather than being pushed and passed every time.
* The nil retrieval is done as part of the site, reducing the entire operation to two pushes and an invokedynamic.
This does not have much effect on methods that do not contain blocks or eval, since they use JVM local variables. Methods that contain blocks improve by not emitting the location when accessing variables beyond the 10th in scope. Blocks improve by not emitting the depth push, the location (beyond the 10th variable in scope), or nil retrieval calls.
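A minimal sketch of the idea using plain `java.lang.invoke`: a generic accessor takes depth and index as arguments, and a per-site handle bakes them in with `MethodHandles.insertArguments`, folding the nil check into the same call. `Context`, `Scope`, `getValue`, `bindSite`, and `read` are hypothetical names, not JRuby's DynamicScope API.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class ScopeAccessSketch {
    // Hypothetical runtime context that owns the nil singleton.
    static final class Context {
        final Object nil = new Object() {
            @Override public String toString() { return "nil"; }
        };
    }

    // Hypothetical dynamic scope: a variable array plus a link to the enclosing scope.
    static final class Scope {
        final Object[] vars;
        final Scope parent;
        Scope(int size, Scope parent) { this.vars = new Object[size]; this.parent = parent; }
    }

    // Generic accessor: walk 'depth' parents, read slot 'index', map unset (null) to nil.
    static Object getValue(Context ctx, Scope scope, int depth, int index) {
        Scope s = scope;
        for (int i = 0; i < depth; i++) s = s.parent;
        Object value = s.vars[index];
        return value == null ? ctx.nil : value;
    }

    static final MethodHandle GET_VALUE;

    static {
        try {
            GET_VALUE = MethodHandles.lookup().findStatic(ScopeAccessSketch.class, "getValue",
                MethodType.methodType(Object.class, Context.class, Scope.class, int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // What a per-site bootstrap might produce: depth and index bound as constants,
    // leaving a handle of type (Context, Scope)Object.
    static MethodHandle bindSite(int depth, int index) {
        return MethodHandles.insertArguments(GET_VALUE, 2, depth, index);
    }

    // Invoke a bound site the way emitted code would: push context and scope, then call.
    static Object read(MethodHandle site, Context ctx, Scope scope) {
        try {
            return (Object) site.invokeExact(ctx, scope);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Because depth and index live in the bound handle, the caller's bytecode no longer carries them, which is where the size savings for blocks come from.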
Starting with an empty string saves the initial ByteList and byte[] allocation, but that saving is immediately lost when we append the first element of the compound string. This commit instead starts with a buffer sized to the compound string's estimated size.
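In Java terms the change amounts to choosing the initial capacity up front. The sketch below uses `StringBuilder` in place of JRuby's ByteList, and `DYNAMIC_GUESS` is a made-up per-interpolation estimate; JRuby's real size estimate may be computed differently.

```java
import java.util.List;

public class CompoundStringSketch {
    // Hypothetical guess for how long each interpolated (dynamic) piece will be.
    static final int DYNAMIC_GUESS = 4;

    // Estimate the final length from the static pieces plus a guess per dynamic piece.
    static int estimateSize(List<String> staticParts, int dynamicCount) {
        int size = 0;
        for (String part : staticParts) size += part.length();
        return size + dynamicCount * DYNAMIC_GUESS;
    }

    // Build a "foo#{x}bar"-style string: allocate once at the estimated size instead of
    // starting empty and growing on the first append.
    static String build(String prefix, Object dynamic, String suffix) {
        StringBuilder buf = new StringBuilder(estimateSize(List.of(prefix, suffix), 1));
        buf.append(prefix).append(dynamic).append(suffix);
        return buf.toString();
    }
}
```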
The branch was force-pushed from 4a07e11 to 724f935.
We have always pushed both the receiver object and the caller object for all calls in the JIT, because normal (non-functional) calls need the caller to check visibility. However, this is a wasted load for functional dispatches, which do not check visibility. Removing this unused argument shrinks every functional call by one ALOAD bytecode.
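The difference between the two call shapes can be sketched as two entry points, where only the normal one takes a caller. This is a hedged illustration: `RObj`, `Visibility`, `callNormal`, and `callFunctional` are hypothetical, and the visibility rule is simplified (Ruby's real rules for private and protected methods are more involved).

```java
import java.util.Map;

public class CallShapeSketch {
    enum Visibility { PUBLIC, PRIVATE }

    // Hypothetical receiver with a method table mapping name to visibility.
    static final class RObj {
        final Map<String, Visibility> methods;
        RObj(Map<String, Visibility> methods) { this.methods = methods; }
    }

    // Normal call (recv.foo): the caller is needed to decide whether a private method is visible.
    static String callNormal(RObj caller, RObj self, String name) {
        Visibility v = self.methods.get(name);
        if (v == null) return "NoMethodError";
        if (v == Visibility.PRIVATE && caller != self) return "NoMethodError (private)";
        return "called " + name;
    }

    // Functional call (plain foo): the receiver is implicitly self, so there is no visibility
    // check and no caller argument, i.e. one less value to load at every call site.
    static String callFunctional(RObj self, String name) {
        return self.methods.containsKey(name) ? "called " + name : "NoMethodError";
    }
}
```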
The branch was force-pushed from 302df91 to 41a4cc2.
Code like `a, b, c = 1` can be compiled as assigning 1 to `a` and nil to `b` and `c`, without creating an intermediate array. This skips any call to `to_ary`, which we might otherwise need to guard for; however, redefining that method on any literal type would break most Ruby code anyway. We could also insert a builtin check here and only call `to_ary` if it might have unknown side effects.
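The compile-time rewrite can be pictured as follows for a non-array literal on the right-hand side. `NIL` and `assignLiteral` are hypothetical; the returned array only models the target variables, whereas the real compiler emits the stores directly in bytecode.

```java
import java.util.Arrays;

public class MultipleAssignSketch {
    // Stand-in for Ruby nil in this sketch.
    static final Object NIL = "nil";

    // For `a, b, c = literal`: the first target gets the literal and every other target gets
    // nil, with no intermediate array built from the right-hand side and no to_ary call.
    static Object[] assignLiteral(Object literal, int targetCount) {
        Object[] targets = new Object[targetCount];
        Arrays.fill(targets, NIL);
        if (targetCount > 0) targets[0] = literal;
        return targets;
    }
}
```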
The logic for ivar get and set did some extra stack gymnastics to avoid evaluating the target object twice. However, in all known cases the target object is either literally `self` or a local variable containing an inlined `self`. Since these cases can all be copy-propagated, we can safely reevaluate them and avoid introducing a temporary variable or juggling the stack. This also allows calling the VariableAccessor methods directly, since the stack order now matches without manipulation. Indy mode just does all of this inside the call site.
* When the static components are FrozenString, emit them as such to avoid allocating a new ByteList every time.
* Construct the bufferString from scratch every time, so we can properly right-size the buffer. The previous code used a cached ByteList of the appropriate size, but since it was shared, any modification immediately caused it to be reallocated as a too-small string.
These changes roughly double the performance of a string like "foo#{$$}bar" in a simple benchmark on OpenJDK 19.
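A simplified sketch of both points for "foo#{$$}bar": the static pieces are created once and shared read-only (standing in for frozen strings), and each evaluation allocates its own buffer sized for exactly this result. Plain `byte[]` arrays replace ByteList here, and `PREFIX`, `SUFFIX`, and `interpolate` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class InterpolationSketch {
    // Static pieces, created once and never mutated, so evaluating the string
    // does not allocate new buffers for them.
    static final byte[] PREFIX = "foo".getBytes(StandardCharsets.UTF_8);
    static final byte[] SUFFIX = "bar".getBytes(StandardCharsets.UTF_8);

    // Each evaluation builds a fresh buffer of exactly the needed size, instead of starting
    // from a shared cached buffer that must be copied and regrown once it is modified.
    static byte[] interpolate(long pid) {
        byte[] dynamic = Long.toString(pid).getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[PREFIX.length + dynamic.length + SUFFIX.length];
        System.arraycopy(PREFIX, 0, out, 0, PREFIX.length);
        System.arraycopy(dynamic, 0, out, PREFIX.length, dynamic.length);
        System.arraycopy(SUFFIX, 0, out, PREFIX.length + dynamic.length, SUFFIX.length);
        return out;
    }
}
```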
This eliminates one load of a boolean value for splats of the form `[*ary]`.
This appears to only be used by the inliner.
headius added a commit that referenced this pull request on Feb 6, 2023
Not all IR instructions do the same amount of work in bytecode, so the actual bytecode size of methods with similar IR sizes will vary. However, this threshold does allow a key lexer method from the parser gem to compile (at 2998 IR instructions), and the resulting bytecode also appears to be compiled to native code by the JVM's JIT. We could probably go higher, but we would want to make sure that larger methods can still be native-JITed before blindly increasing the threshold. Relates to #7589 and work to get the parser gem to fully JIT.
edipofederle pushed a commit to edipofederle/jruby that referenced this pull request on Feb 8, 2023 (same commit message as the Feb 6, 2023 commit above).
headius added a commit to headius/jruby that referenced this pull request on Feb 9, 2023
Original logic used hardcoded argument offsets, which was broken by my work in jruby#7589 that removed the "caller" argument from self calls that don't need to check visibility. The new logic uses SmartBinder and Signature to manipulate incoming arguments by name rather than by index. Fixes jruby#7642
headius added a commit to headius/jruby that referenced this pull request on Aug 29, 2023
In jruby#7589 I reduced the call site size for functional calls ("self" calls) by omitting the caller argument normally used to check visibility. This change was never applied to the Java invokedynamic binding logic, resulting in arity mismatches when a Java method is called from within an instance of its own class, or when a Java method is called through an alias, since the alias is rebound as a self call. This change specializes the number of arguments to drop from the beginning of the argument list, based additionally on whether it is a functional call site:
* static and functional: drop 2 (context and self)
* static and normal: drop 3 (context, caller, and self)
* instance and functional: drop 1 (context)
* instance and normal: drop 2 (context and caller)
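The four cases reduce to a small lookup, and the adaptation itself can be pictured with `MethodHandles.dropArguments`. This is a hedged sketch: the leading call-site arguments are typed as `Object`, and `leadingArgsToDrop`, `adapt`, and `call` are hypothetical helpers rather than JRuby's actual binding code.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.Collections;

public class JavaCallDropSketch {
    // How many leading call-site arguments the Java target does not take,
    // following the four cases listed in the commit message.
    static int leadingArgsToDrop(boolean isStatic, boolean isFunctional) {
        if (isStatic) return isFunctional ? 2 : 3; // context, [caller,] self
        return isFunctional ? 1 : 2;               // context, [caller]; self becomes the receiver
    }

    // Adapt a Java method handle to a call site that passes extra leading arguments.
    static MethodHandle adapt(MethodHandle javaTarget, boolean isStatic, boolean isFunctional) {
        int drop = leadingArgsToDrop(isStatic, isFunctional);
        return MethodHandles.dropArguments(javaTarget, 0, Collections.<Class<?>>nCopies(drop, Object.class));
    }

    // Invoke an adapted handle with boxed arguments, wrapping any checked failure.
    static Object call(MethodHandle site, Object... args) {
        try {
            return site.invokeWithArguments(args);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Getting the count wrong by one (for example, dropping the caller on a functional site that never pushed it) produces exactly the arity mismatch described above.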
Working on various JIT improvements to produce more efficient code, and less of it.
See #7588 for additional ideas that could be added here.