

JIT size and perf improvements #7589


Merged
merged 13 commits into from
Feb 6, 2023

Conversation

@headius (Member) commented Jan 20, 2023

Working on various JIT improvements, to produce more efficient code and less of it.

See #7588 for additional ideas that could be added here.

This optimization allows AsString to skip doing the dynamic call
to `to_s` when the target object is a natural String. It does not
currently work in refined scopes, like most indy method caching,
which makes this an optimization-in-waiting since AsString is only
used for possibly-refined code.
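
As a rough illustration of the fast path described above, here is a plain-Ruby sketch. `Temperature` and `as_string` are hypothetical names, and treating "natural String" as "class is exactly String" is an assumption; this only models the behavior and is not how the AsString site is implemented in JRuby.

```ruby
# Hypothetical class whose to_s counts how often it is invoked.
class Temperature
  attr_reader :to_s_calls

  def initialize(degrees)
    @degrees = degrees
    @to_s_calls = 0
  end

  def to_s
    @to_s_calls += 1
    "#{@degrees}C"
  end
end

# Toy model of the fast path (assumption: "natural String" means the
# object's class is exactly String). Such objects are used as-is;
# everything else goes through the dynamic to_s call.
def as_string(obj)
  obj.instance_of?(String) ? obj : obj.to_s
end

temp = Temperature.new(21)
plain = "already a string"

from_object = as_string(temp)   # dispatches to Temperature#to_s
from_string = as_string(plain)  # returns the same String object
```

The second call returns the original object without any dispatch, which is the work the optimization avoids.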
@headius added this to the JRuby 9.4.1.0 milestone Jan 20, 2023
This reduces the bytecode emitted for DynamicScope variable
accesses by doing the following:

* The depth and location are embedded into an indy call site
  rather than being pushed and passed every time.
* The nil retrieval is done as part of the site to reduce the
  entire operation to two pushes and an invokedynamic.

This does not have much effect on methods that do not contain
blocks or eval, since they will use JVM local variables. Methods
that contain blocks improve by not emitting the location when
accessing variables beyond the 10th in scope. Blocks improve by
emitting neither the depth push, the location (beyond the 10th in
scope), nor the nil retrieval call.
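
The idea of embedding depth and location into the site can be sketched in plain Ruby. `ToyScope` and `variable_site` are hypothetical stand-ins, not JRuby's DynamicScope or its indy bootstrap; the lambda plays the role of a call site that was bound once with its depth and location.

```ruby
# Toy stand-in for a heap-allocated variable scope (not JRuby's DynamicScope).
class ToyScope
  attr_reader :parent, :values

  def initialize(size, parent = nil)
    @values = Array.new(size) # unset slots hold nil
    @parent = parent
  end

  # Walk `depth` parents up, then read the slot at `offset`.
  def get(depth, offset)
    scope = self
    depth.times { scope = scope.parent }
    scope.values[offset]
  end

  # Walk `depth` parents up, then write the slot at `offset`.
  def set(depth, offset, value)
    scope = self
    depth.times { scope = scope.parent }
    scope.values[offset] = value
  end
end

# Hypothetical "site": depth and offset are captured once when the site is
# created, so each access only has to supply the current scope.
def variable_site(depth, offset)
  ->(scope) { scope.get(depth, offset) }
end

method_scope = ToyScope.new(12)
block_scope = ToyScope.new(1, method_scope)
method_scope.set(0, 11, :twelfth)

read_twelfth = variable_site(1, 11)      # block reading the 12th method variable
site_value = read_twelfth.call(block_scope)
unset_value = variable_site(1, 0).call(block_scope) # never assigned, reads as nil
```

Every access through `read_twelfth` passes only the scope, which corresponds to the smaller per-access bytecode described above.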
Starting with an empty string saves the initial ByteList and
byte[] allocation, but that saving is immediately made moot when we
append the first element of the compound string. This commit instead
starts with a buffer of the compound string's estimated size.
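
The same pre-sizing idea can be shown at the Ruby level with `String.new(capacity:)`. This is only an analogy for the JIT change; the `pieces` array and the byte-size estimate are assumptions made for the example.

```ruby
# Pieces of a compound string such as "foo#{$$}bar".
pieces = ["foo", Process.pid.to_s, "bar"]

# Estimate the final size up front so the buffer is allocated once.
estimated = pieces.sum(&:bytesize)

# Start from a buffer with that capacity instead of an empty string that
# would have to grow on the first append.
buffer = String.new(capacity: estimated, encoding: Encoding::UTF_8)
pieces.each { |piece| buffer << piece }
```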
@headius force-pushed the jit_optz branch 5 times, most recently from 4a07e11 to 724f935 on January 23, 2023 04:28
We have always pushed both the receiver object and the caller
object for all calls in the JIT, because normal non-functional
calls need the caller to check visibility. However, this is a
wasted load for functional dispatches, which do not check
visibility. Removing this unused argument shrinks all functional
calls by one ALOAD bytecode.
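
For readers less familiar with the distinction, the difference between a functional call and a normal call is visible in plain Ruby. `Account` and its methods are hypothetical names used only for this sketch.

```ruby
class Account
  def summary
    # Functional call: no explicit receiver, so the target is always self
    # and a private method can be reached.
    "balance: #{formatted_balance}"
  end

  private

  def formatted_balance
    "42.00"
  end
end

account = Account.new
functional_result = account.summary

# Normal call: explicit receiver, so visibility has to be checked and the
# private method is rejected.
normal_call_error =
  begin
    account.formatted_balance
    nil
  rescue NoMethodError => e
    e
  end
```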
@headius force-pushed the jit_optz branch 3 times, most recently from 302df91 to 41a4cc2 on January 23, 2023 17:40
Code like `a, b, c = 1` can be compiled as assigning 1 to a and
nil to b and c, without creating an intermediate array. This skips
the call to `to_ary`, which we may need to guard for, although
defining that method on any literal type would break most Ruby code
anyway. We could also insert a builtin check here and only do the
to_ary if it might have unknown side effects.
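
The Ruby-level behavior being preserved looks like this. `Pair` is a hypothetical class added to show the `to_ary` case that a literal right-hand side never hits.

```ruby
# Literal right-hand side: Integer has no to_ary, so the value goes to the
# first variable and the rest become nil.
first, second, third = 1

# Hypothetical class that does define to_ary, for contrast.
class Pair
  attr_reader :to_ary_calls

  def initialize
    @to_ary_calls = 0
  end

  def to_ary
    @to_ary_calls += 1
    [:left, :right]
  end
end

pair = Pair.new
left, right, extra = pair # multiple assignment calls to_ary here
```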
The logic for ivar get and set did some extra stack gymnastics to
avoid evaluating the target object twice. However, all known cases
will have the target object either be literally `self` or a local
variable containing an inlined `self`. Since these cases can all
copy propagate, we can reevaluate them safely and avoid having to
introduce a temporary variable or juggle the stack around.

This also allows calling the VariableAccessor methods directly,
since the stack order now matches without manipulation.

Indy mode just does all this inside the call site.
@headius headius mentioned this pull request Jan 24, 2023
22 tasks
* When the static components are FrozenString, emit as such to
  avoid allocating a new ByteList every time.
* Construct the bufferString from scratch every time, so we can
  properly right-size the buffer. The previous code used a cached
  ByteList of the appropriate size, but since it was shared any
  modification immediately caused it to reallocate as a too-small
  string.

These changes roughly double the performance of a string like
"foo#{$$}bar" in a simple benchmark on OpenJDK 19.
This eliminates one load of a boolean value for splats of the form
[*ary].
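
For reference, this is the Ruby form in question; the variable names are only for illustration.

```ruby
ary = [1, 2, 3]

copy = [*ary]       # new array with the same elements
from_nil = [*nil]   # splatting nil yields an empty array
from_scalar = [*1]  # a non-array value is wrapped
```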
This appears to only be used by the inliner.
@headius merged commit 5edda1d into jruby:master Feb 6, 2023
@headius deleted the jit_optz branch February 6, 2023 08:35
headius added a commit that referenced this pull request Feb 6, 2023
Not all IR instructions do the same amount of work in bytecode,
so the actual bytecode size resulting from similarly-sized (in IR)
methods will vary, but this does allow a key lexer method from the
parser gem to compile (at 2998 IR instructions), and the resulting
bytecode also appears to be compiled to native code by the JVM JIT.
We could probably go higher, but we would want to make sure that
larger methods can still be compiled to native code before blindly
increasing the threshold.

Relates to #7589 and work to get the parser gem to fully JIT.
edipofederle pushed a commit to edipofederle/jruby that referenced this pull request Feb 8, 2023
headius added a commit to headius/jruby that referenced this pull request Feb 9, 2023
Original logic used hardcoded argument offsets, which was broken
by my work in jruby#7589 that removed the "caller" argument from self-
calls that don't need to check visibility.

The new logic uses SmartBinder and Signature to manipulate
incoming arguments by name rather than by index.

Fixes jruby#7642
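
Why hardcoded offsets broke can be sketched in plain Ruby. The argument names and values below are assumptions for illustration; the real fix uses SmartBinder and Signature on the Java side, which this sketch does not reproduce.

```ruby
# Assumed argument layouts before and after the caller was removed from
# functional call sites.
NORMAL_SITE_NAMES = %i[context caller self arg0].freeze
FUNCTIONAL_SITE_NAMES = %i[context self arg0].freeze

# Index-based lookup: silently assumes the caller is always present.
def self_by_index(values)
  values[2]
end

# Name-based lookup: finds self wherever it sits in the layout.
def self_by_name(names, values)
  values[names.index(:self)]
end

normal_values = [:ctx, :caller_obj, :receiver, 10]
functional_values = [:ctx, :receiver, 10]

by_index_normal = self_by_index(normal_values)         # still correct
by_index_functional = self_by_index(functional_values) # picks up arg0 instead of self
by_name_functional = self_by_name(FUNCTIONAL_SITE_NAMES, functional_values)
```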
headius added a commit to headius/jruby that referenced this pull request Aug 29, 2023
In jruby#7589 I reduced the call site size for functional
calls ("self" calls) by omitting the caller argument normally
used to check visibility. This change was never applied to the
Java invokedynamic binding logic, resulting in arity mismatches
when a Java method is called from within an instance of itself or
when calling a Java method through an alias, since the alias
is rebound as a self call.

This change specializes the number of arguments to drop from the
beginning of the argument list based additionally upon whether it
is a functional call site:

* static and functional: drop 2 (context and self)
* static and normal: drop 3 (context and caller and self)
* instance and functional: drop 1 (context)
* instance and normal: drop 2 (context and caller)
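
The four cases above reduce to a small rule, sketched here in Ruby. `leading_args_to_drop` is a hypothetical helper that mirrors the list; it is not the Java binding code from the commit.

```ruby
# Hypothetical helper: how many leading arguments to drop before binding
# to the Java method, following the four cases listed above.
def leading_args_to_drop(static:, functional:)
  count = 1                     # the context is always dropped
  count += 1 unless functional  # normal sites also carry the caller
  count += 1 if static          # static targets additionally drop self
  count
end
```

For example, `leading_args_to_drop(static: true, functional: false)` covers the "static and normal" case and returns 3.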