JIT size and perf improvements #7589

headius · 2023-01-20T19:00:29Z

Working on various JIT improvements, to produce more efficient code and less of it.

See #7588 for additional ideas that could be added here.

This optimization allows AsString to skip the dynamic call to `to_s` when the target object is a natural String. As with most indy method caching, it does not currently work in refined scopes, which makes this an optimization-in-waiting since AsString is only used for possibly-refined code.
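A minimal sketch of the fast path described above, in plain Java rather than JRuby's real types. The class `AsStringSketch`, its `dynamicToS` helper, and the call counter are all hypothetical; `java.lang.String` stands in for a "natural" Ruby String (one that cannot have an overridden `to_s`).

```java
final class AsStringSketch {
    // Counts how often the slow, dynamic path was taken (illustration only).
    int dynamicCalls = 0;

    // Stand-in for a dynamic `to_s` dispatch; the real call goes through a call site.
    private String dynamicToS(Object value) {
        dynamicCalls++;
        return String.valueOf(value);
    }

    String asString(Object value) {
        // Fast path: the value is already a plain string, so no dispatch is needed.
        // java.lang.String is final, so no subclass can change the result here.
        if (value instanceof String) {
            return (String) value;
        }
        // Slow path: fall back to the dynamic conversion.
        return dynamicToS(value);
    }
}
```

In this toy version the guard is a single type check, so a string argument never reaches the dispatch helper.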

This reduces the bytecode emitted for DynamicScope variable accesses by doing the following:

* The depth and location are embedded into an indy call site rather than being pushed and passed every time.
* The nil retrieval is done as part of the site to reduce the entire operation to two pushes and an invokedynamic.

This does not have much effect on methods that do not contain blocks or eval, since they will use JVM local variables. Methods that contain blocks improve by not emitting the location when accessing variables beyond the 10th in scope. Blocks improve by not emitting the depth push, the location (beyond the 10th in scope), or the nil retrieval calls.
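The idea of baking depth and location into the call site can be sketched with plain method handles. This is an illustration only: `ToyContext`, `ToyScope`, and `ScopeSite` are hypothetical stand-ins, and the assumption that the two pushed values are a context and a scope is mine, not something the commit states.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical stand-ins, not JRuby classes: a context that owns nil,
// and a scope with a parent link plus variable slots.
final class ToyContext {
    final Object nil;
    ToyContext(Object nil) { this.nil = nil; }
}

final class ToyScope {
    final ToyScope parent;
    final Object[] values;
    ToyScope(ToyScope parent, int size) {
        this.parent = parent;
        this.values = new Object[size];
    }
}

final class ScopeSite {
    private final MethodHandle handle; // type: (ToyContext, ToyScope)Object

    // Bind depth and location once, when the site is created,
    // so callers never have to push them again.
    ScopeSite(int depth, int location) {
        try {
            MethodHandle general = MethodHandles.lookup().findStatic(
                    ScopeSite.class, "getOrNil",
                    MethodType.methodType(Object.class,
                            ToyContext.class, ToyScope.class, int.class, int.class));
            this.handle = MethodHandles.insertArguments(general, 2, depth, location);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // General accessor: walk `depth` parents, read slot `location`,
    // and substitute nil for an unset slot inside the site itself.
    static Object getOrNil(ToyContext context, ToyScope scope, int depth, int location) {
        ToyScope s = scope;
        for (int i = 0; i < depth; i++) s = s.parent;
        Object value = s.values[location];
        return value == null ? context.nil : value;
    }

    // What the caller sees: two arguments, nothing else.
    Object get(ToyContext context, ToyScope scope) {
        try {
            return handle.invoke(context, scope);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

A site created as `new ScopeSite(1, 1)` reads slot 1 of the parent scope, and the caller passes only the context and the current scope on each access.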

Starting with an empty string saves us that initial bytelist and byte[] allocation, but it is immediately made moot when we append the first element of the compound string. This commit instead starts with the compound string estimated size.
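As a rough analogy in plain Java, the same trade-off shows up with `StringBuilder`: sizing the buffer up front avoids the first regrowth. The class name, the `DYNAMIC_PIECE_GUESS` constant, and the estimation formula below are assumptions for illustration; the commit does not say how JRuby computes its estimate.

```java
final class CompoundStringSketch {
    // Hypothetical guess for the length of each interpolated (dynamic) piece.
    static final int DYNAMIC_PIECE_GUESS = 4;

    // Estimate: exact length of the static parts plus a guess per dynamic piece.
    static int estimateSize(String[] staticParts, int dynamicCount) {
        int size = dynamicCount * DYNAMIC_PIECE_GUESS;
        for (String part : staticParts) size += part.length();
        return size;
    }

    // Build the compound string into a buffer that starts at the estimated size
    // instead of starting empty and growing on the first append.
    static String build(int estimatedSize, Object... pieces) {
        StringBuilder buffer = new StringBuilder(estimatedSize);
        for (Object piece : pieces) buffer.append(piece);
        return buffer.toString();
    }
}
```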

We have always pushed both the receiver object and the caller object for all calls in the JIT, because normal non-functional calls need the caller to check visibility. However, this is a wasted load for functional dispatches, which do not check visibility. Removing this unused argument shrinks all functional calls by one ALOAD bytecode.

Code like `a, b, c = 1` can be compiled as assigning 1 to a and nil to b and c, without creating an intermediate array. This skips any call to `to_ary` which we may need to guard for, but altering that method for any literal types will break most Ruby code. We could also insert a builtin check here and only do the to_ary if it might have unknown side effects.

The logic for ivar get and set did some extra stack gymnastics to avoid evaluating the target object twice. However, all known cases will have the target object either be literally `self` or a local variable containing an inlined `self`. Since these cases can all copy propagate, we can reevaluate them safely and avoid having to introduce a temporary variable or juggle the stack around. This also allows calling the VariableAccessor methods directly, since the stack order now matches without manipulation. Indy mode just does all this inside the call site.

* When the static components are FrozenString, emit as such to avoid allocating a new ByteList every time.
* Construct the bufferString from scratch every time, so we can properly right-size the buffer. The previous code used a cached ByteList of the appropriate size, but since it was shared any modification immediately caused it to reallocate as a too-small string.

These changes roughly double the performance of a string like "foo#{$$}bar" in a simple benchmark on OpenJDK 19.

This eliminates one load of a boolean value for splats of the form [*ary].

This appears to only be used by the inliner.

Not all IR instructions do the same amount of work in bytecode, so the actual bytecode size resulting from similarly-sized (in IR) methods will vary, but this does allow a key lexer method from the parser gem to compile (at 2998 IR instructions), and the resulting bytecode also appears to native JIT. We could probably go higher, but we would want to make sure that larger methods still can native JIT before blindly increasing the threshold. Relates to #7589 and work to get the parser gem to fully JIT.

The original logic used hardcoded argument offsets, which was broken by my work in jruby#7589 that removed the "caller" argument from self calls that don't need to check visibility. The new logic uses SmartBinder and Signature to manipulate incoming arguments by name rather than by index. Fixes jruby#7642.

In jruby#7589 I reduced the call site size for functional calls ("self" calls) by omitting the caller argument normally used to check visibility. This change was never applied to the Java invokedynamic binding logic, resulting in arity mismatches when a Java method is called from within an instance of itself or when calling a Java method through an alias, since the alias is rebound as a self call. This change specializes the number of arguments to drop from the beginning of the argument list based additionally upon whether it is a functional call site:

* static and functional: drop 2 (context and self)
* static and normal: drop 3 (context and caller and self)
* instance and functional: drop 1 (context)
* instance and normal: drop 2 (context and caller)
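The four cases above can be sketched with `MethodHandles.dropArguments`. This is not the JRuby binding code: `JavaCallBinding`, `greet`, and `callGreet` are hypothetical names, and typing every leading argument as `Object` is a simplification.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;

final class JavaCallBinding {
    // Number of leading call-site arguments to discard before the target's own parameters.
    static int leadingArgsToDrop(boolean isStatic, boolean isFunctional) {
        int drop = 1;               // the context is always dropped
        if (!isFunctional) drop++;  // normal sites also pass the caller
        if (isStatic) drop++;       // static targets have no use for self
        return drop;
    }

    // Wrap the target so it accepts, and ignores, the leading call-site arguments.
    static MethodHandle adapt(MethodHandle target, boolean isStatic, boolean isFunctional) {
        Class<?>[] ignored = new Class<?>[leadingArgsToDrop(isStatic, isFunctional)];
        Arrays.fill(ignored, Object.class);
        return MethodHandles.dropArguments(target, 0, ignored);
    }

    // Hypothetical static Java method being bound.
    static String greet(String name) { return "hello " + name; }

    // Invoke greet through an adapted handle, passing the full call-site argument list.
    static Object callGreet(boolean isFunctional, Object... siteArgs) {
        try {
            MethodHandle target = MethodHandles.lookup().findStatic(
                    JavaCallBinding.class, "greet",
                    MethodType.methodType(String.class, String.class));
            return adapt(target, true, isFunctional).invokeWithArguments(siteArgs);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Here `callGreet(true, "ctx", "self", "world")` models a functional site and `callGreet(false, "ctx", "caller", "self", "world")` a normal one. Using the wrong flag for a given argument list produces an arity mismatch, which is the class of failure the fix addresses.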

headius added this to the JRuby 9.4.1.0 milestone Jan 20, 2023

headius force-pushed the jit_optz branch from 620edc0 to b2d9f0f Compare January 20, 2023 19:23

headius added 2 commits January 20, 2023 18:20

Use estimated size for initial string

cace829

Starting with an empty string saves us that initial bytelist and byte[] allocation, but it is immediately made moot when we append the first element of the compound string. This commit instead starts with the compound string estimated size.

headius force-pushed the jit_optz branch from f6b2083 to cace829 Compare January 21, 2023 00:20

Fix AsString and encoding push for normal mode

423f0f9

headius force-pushed the jit_optz branch 5 times, most recently from 4a07e11 to 724f935 Compare January 23, 2023 04:28

headius force-pushed the jit_optz branch 3 times, most recently from 302df91 to 41a4cc2 Compare January 23, 2023 17:40

headius added 2 commits January 23, 2023 13:14

Use lambda

ed0fa29

headius force-pushed the jit_optz branch from 41a4cc2 to ed0fa29 Compare January 23, 2023 19:14

headius mentioned this pull request Jan 24, 2023

Possible JIT optimizations #7588

Open

22 tasks

headius added 4 commits January 23, 2023 23:21

Add Range support to dumper

dc99536

Specialize array splat logic

21f5b88

This eliminates one load of a boolean value for splats of the form [*ary].

Optimize args array arity checking

947eb74

This appears to only be used by the inliner.

headius force-pushed the jit_optz branch from 60649fb to 947eb74 Compare January 24, 2023 07:18

Move indy logic to indy branch compiler

56b1e92

headius merged commit 5edda1d into jruby:master Feb 6, 2023

headius deleted the jit_optz branch February 6, 2023 08:35

headius mentioned this pull request Feb 9, 2023

[9.4.1.0] ArrayIndexOutOfBoundsException #7642

Closed

headius mentioned this pull request Feb 9, 2023

Use SmartBinder to set up struct calls #7643

Merged
