Possible JIT optimizations

This is a list of optimizations I see that could be done in the JIT but that require more work than just changing the code the JIT emits (e.g. specialized invokedynamic call sites or helper code).

* [x] Calls to `block_given?` currently cause the method to deoptimize, since we need the method's frame to be able to retrieve the block from `RubyKernel#block_given_p`. This is unnecessary when we are in a normal method scope; we can just check the passed-in block directly, avoiding the deoptimization. This requires work in IR (use BlockGivenInstr rather than a call when in a method scope, possibly with a guard for user-defined `block_given?`). (Implemented for bare `block_given?` calls in #8170)
* [ ] #8189
* [x] BuildCompoundStringInstr emits a lot of code for each element being appended. This could be a single indy call with all inputs (if the order of evaluation and appending is not important) or N indy calls that each do all of the coercion and appending in one shot.
  * Part of #7589 reduces allocation and bytecode by pushing frozen strings for components, but it still uses encCrStrBufCat which has a lot of complex logic for encoding and CR negotiation that is not really needed here.
  * The bulk of this work will land with #8180 which pushes most of the string-building work into an invokedynamic call site, eliminating all static string and append code in the jitted code.
* [x] Keyword arguments "setCallInfo" could be rolled into the call operation itself; with indy that would eliminate the extra bytecode altogether, and without indy it could still eliminate the flags push by having specialized "setCallInfo" for different flags. This will eventually be moot once we push kwarg descriptors through all call paths. (completed in #7720)
* [ ] Interpolated strings could profile their final length, allocating that length for future interpolations. This would eliminate all but the first allocation of the resulting string data. It should be designed with some safety tolerances in place, e.g. so that a single gigantic result does not cause every later interpolation at that site to allocate a gigantic buffer forever.
* [x] BuildRangeInstr could have specialized versions for embeddable literals as either begin or end, avoiding the bytecode needed to emit such values only to consume them in the Range. (Implemented for fixnum and string ranges in #8176) (Additional tweaks also handled endless and beginless fixnum ranges. We will wait and see if any other forms are useful.)
* [ ] Class variables are currently uncached and not structured in a way that would lend itself to caching. More needs to be done than in just bytecode, but these could potentially be cached forever since they rarely refer to multiple values from a given call site.
* [ ] Global variables are likewise largely uncached, due to races and design issues with the current structure used to store them. True global variables could be cached nearly forever, and local global variables can be compiled to less intrusive state accesses.
* [ ] method_missing may be poorly optimized in the indy JIT, and has only basic optimizations (caching) in non-indy JIT. Ideally it should inline any trivial Ruby method_missing target.
* [ ] Specialized return values to reduce bytecode: for example, a method with no result could be called as `void` to avoid popping, or a method immediately used in a conditional could be called with a `boolean` return value and avoid calling `isTrue`. Calls guaranteed to return specific types could return those types and avoid a `checkcast`.
* [ ] #8487
* [ ] Splitting of block-receiving methods and polymorphic methods, similar to TruffleRuby.
* [ ] Java methods are no longer optimized in invokedynamic call sites. Further, even when they were optimized, only one arity was ever handled. (Basic support restored in https://github.com/jruby/jruby/pull/7789)
* [ ] Restore direct indy binding of user-defined method_missing. This was removed temporarily in jruby/jruby#7797 due to it breaking the argument list aggregated by a core method_missing error (which showed up during jruby/jruby#7732).
* [ ] Leaf closure scopes should never need to push a new DynamicScope. This currently works, except when a closure body contains any instruction that must access the parent dynamic scope itself (not just its variables). For example, adding a non-local `return` to an otherwise leaf scope will force it to allocate and use its own DynamicScope (#5933).
* [ ] Proper shape caching and per-object shapes. Some work has been done toward this end in https://github.com/jruby/jruby/pull/7516, with an old bug describing a need for shared shape caching in https://github.com/jruby/jruby/issues/156.
* [ ] String shaping optimizations. Frozen strings already have some specialized shapes, but we could do more to cache the hash code and similar values inside those shapes. We could also implement "embedded" strings that put small strings into the "header" of the object, as in CRuby. See https://bugs.ruby-lang.org/issues/20415 for an example. An attempt that does not appear to work is here: https://gist.github.com/headius/b4a8967b7e3bfbc9dc7aab7d5fa491ec
* [ ] Optimized argument forwarding with `...` as described in https://bugs.ruby-lang.org/issues/20425.
* [ ] Literal collections with literal elements could use reduced bytecode by embedding the literals into the indy call site that constructs the collection (e.g. `[1, 2, 3]` could be a single indy instruction with embedded longs). Similar ideas were implemented for CRuby in https://github.com/ruby/ruby/pull/9721. This could also include a fast path for building a Hash with all literal Symbol keys, since that appears to be a frequent pattern.
* [ ] Super methods, refined methods and sends do not inline. Super will usually be monomorphic, or low-morphic. Refined methods will usually be monomorphic, since once bound to a scope they will remain bound to that scope. Sends have a potential to become megamorphic but will frequently be used for only a few targets; even megamorphic cases could be optimized better via a dispatch chain or balanced search tree.
* [ ] The "normal" compilers in the JIT need more testing and could be optimized better; cached lazy values could be static final, with the script constructing them on initialization so that the JVM can constant-fold them.
* [ ] Methods that access other frame data need specialized call sites. A few examples: `__dir__` and attrs need access to the file (https://github.com/jruby/jruby/issues/8079), and refined methods need access to the scope. These currently force either a frame or a dynamic scope, or need a backtrace (not provided by the JIT). None of them should need to trigger deopt just to pass readily available values.
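
The BuildCompoundStringInstr item proposes a single indy call that receives every piece of an interpolated string at once. The JDK already does this for Java string concatenation through `java.lang.invoke.StringConcatFactory` (JDK 9+), so the minimal sketch below borrows it purely as an analogy. `InterpolationSite` and its recipe-based constructor are hypothetical; a JRuby version would append `RubyString`/`ByteList` data and negotiate encodings and code ranges, which this sketch ignores entirely.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatException;
import java.lang.invoke.StringConcatFactory;
import java.util.Arrays;

// Hypothetical helper (not JRuby code) built on the JDK's indy string concatenation:
// one call site receives every dynamic piece and builds the result in one invocation.
final class InterpolationSite {
    private final MethodHandle concat;

    // recipe: literal text, with the char '\1' marking each dynamic argument slot
    InterpolationSite(String recipe, int dynamicParts) {
        Class<?>[] params = new Class<?>[dynamicParts];
        Arrays.fill(params, Object.class); // every dynamic part is passed as Object
        MethodType type = MethodType.methodType(String.class, params);
        try {
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(), "interpolate", type, recipe);
            this.concat = site.dynamicInvoker();
        } catch (StringConcatException e) {
            throw new IllegalArgumentException("recipe does not match the part count", e);
        }
    }

    // Stringifies and appends all parts in one call instead of one append per part.
    String build(Object... parts) {
        try {
            return (String) concat.invokeWithArguments(parts);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

For example, `new InterpolationSite("Hello, \1! You are \1.", 2).build("Ruby", 42)` produces `"Hello, Ruby! You are 42."` through a single linked call site.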
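
For the item on profiling the final length of interpolated strings, here is one possible policy, sketched under assumptions: the hint grows immediately to fit a larger result, decays halfway toward smaller ones, and is clamped to a fixed cap so that one huge result cannot pin large allocations forever. `LengthProfile`, `maxHint`, and the halving decay are hypothetical choices rather than an existing JRuby design, and lengths here are Java chars rather than bytes.

```java
// Hypothetical per-call-site profile of interpolated string lengths (not a JRuby class).
final class LengthProfile {
    private final int maxHint; // safety cap: never presize beyond this many chars
    private int hint;          // current presize guess; 0 until a result is recorded

    LengthProfile(int maxHint) {
        if (maxHint <= 0) throw new IllegalArgumentException("maxHint must be positive");
        this.maxHint = maxHint;
    }

    int capacityHint() {
        return hint;
    }

    void record(int actualLength) {
        // Clamp so one gigantic result cannot make every later interpolation allocate a huge buffer.
        int capped = Math.min(actualLength, maxHint);
        if (capped >= hint) {
            hint = capped;               // grow at once to fit the larger result
        } else {
            hint -= (hint - capped) / 2; // decay halfway toward the smaller result
        }
    }

    // Builds the string with the profiled capacity, then feeds the final length back.
    static String interpolate(LengthProfile profile, Object... parts) {
        StringBuilder sb = new StringBuilder(profile.capacityHint());
        for (Object part : parts) {
            sb.append(part);
        }
        profile.record(sb.length());
        return sb.toString();
    }
}
```

The halving decay is arbitrary; a real profile might instead track a maximum over the last few results, trading some memory for fewer regrowths.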
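
The class variable item suggests that values could be cached at a call site nearly forever. On the JVM the usual tool for that is `java.lang.invoke.SwitchPoint`: the site binds the value as a constant, and any write invalidates the switch point so that dependent sites fall back and relink. The sketch below assumes a hypothetical `CVarTable` whose every write invalidates one shared `SwitchPoint`; JRuby's real class variable storage is structured differently, which is the part the item says needs more than bytecode work. The same pattern could plausibly apply to true global variables.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class-variable table: every write invalidates the current SwitchPoint.
final class CVarTable {
    private final Map<String, Object> values = new HashMap<>();
    private SwitchPoint switchPoint = new SwitchPoint();

    synchronized Object get(String name) { return values.get(name); }

    synchronized SwitchPoint currentSwitchPoint() { return switchPoint; }

    synchronized void set(String name, Object value) {
        values.put(name, value);
        SwitchPoint old = switchPoint;
        switchPoint = new SwitchPoint();
        SwitchPoint.invalidateAll(new SwitchPoint[] { old }); // dependent sites will relink
    }
}

// Hypothetical call site that reads one class variable through a guarded constant.
final class CVarReadSite {
    private static final MethodHandle RELINK;
    static {
        try {
            RELINK = MethodHandles.lookup().findVirtual(
                    CVarReadSite.class, "relinkAndRead", MethodType.methodType(Object.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private final CVarTable table;
    private final String name;
    private volatile MethodHandle target;
    int relinkCount; // illustration only: counts slow-path lookups

    CVarReadSite(CVarTable table, String name) {
        this.table = table;
        this.name = name;
        this.target = RELINK.bindTo(this); // the first read always takes the slow path
    }

    Object read() {
        try {
            return target.invoke();
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    Object relinkAndRead() {
        relinkCount++;
        // Capture the SwitchPoint before reading the value, so a concurrent write
        // invalidates this link instead of leaving a stale value cached.
        SwitchPoint sp = table.currentSwitchPoint();
        Object value = table.get(name);
        target = sp.guardWithTest(MethodHandles.constant(Object.class, value), RELINK.bindTo(this));
        return value;
    }
}
```

Repeated reads hit the constant without touching the table; only a write to the table sends the site back through `relinkAndRead`.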
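
For the literal collection item, the idea is that literal elements travel as static bootstrap arguments, so the call site needs no per-element bytecode. Java source cannot emit `invokedynamic` directly, so the minimal sketch below calls a hypothetical bootstrap method by hand to show what it would link. `LiteralArrayBootstrap`, `build`, and the use of `List<Long>` in place of a Ruby array are all assumptions for illustration.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.List;

// Hypothetical bootstrap for literal arrays such as [1, 2, 3] (not JRuby's API).
final class LiteralArrayBootstrap {
    private static final MethodHandle BUILD;
    static {
        try {
            BUILD = MethodHandles.lookup().findStatic(LiteralArrayBootstrap.class, "build",
                    MethodType.methodType(List.class, long[].class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // The elements would arrive as static bootstrap arguments, once per call site.
    static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type,
                              long... elements) {
        MethodHandle target = MethodHandles.insertArguments(BUILD, 0, (Object) elements.clone());
        return new ConstantCallSite(target.asType(type));
    }

    // Each evaluation still returns a fresh, mutable list, as a Ruby array literal must.
    static List<Long> build(long[] elements) {
        List<Long> list = new ArrayList<>(elements.length);
        for (long e : elements) {
            list.add(e);
        }
        return list;
    }

    // Convenience for invoking a linked site from plain Java code.
    @SuppressWarnings("unchecked")
    static List<Long> invoke(CallSite site) {
        try {
            return (List<Long>) site.dynamicInvoker().invoke();
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

The elements are baked into the linked handle, while `build` still allocates a new list on every evaluation so that mutating one result never affects the next.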
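
For `send` sites that see only a few targets, a dispatch chain caches each looked-up method behind a guard and stops growing after a fixed depth. The sketch below is a simplified, hypothetical version: it guards only on the method name (it assumes a single receiver class, whose methods are modeled as a `Map` of `UnaryOperator`s), ignores method redefinition, and uses an arbitrary depth limit of 3. A real JRuby site would also guard on the receiver's class and invalidate when methods change.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical dispatch chain for a `send` call site (not JRuby's API).
final class SendSite {
    static final int MAX_DEPTH = 3; // beyond this, the site is treated as megamorphic

    // One link in the chain: a guard (the method name) plus the cached target.
    private static final class Link {
        final String name;
        final UnaryOperator<Object> target;
        final Link next;

        Link(String name, UnaryOperator<Object> target, Link next) {
            this.name = name;
            this.target = target;
            this.next = next;
        }
    }

    private final Map<String, UnaryOperator<Object>> methodTable; // stand-in for a class's methods
    private Link chain; // most recently added link first
    private int depth;
    int slowLookups;    // illustration only: counts method table lookups

    SendSite(Map<String, UnaryOperator<Object>> methodTable) {
        this.methodTable = methodTable;
    }

    Object send(String name, Object arg) {
        for (Link link = chain; link != null; link = link.next) {
            if (link.name.equals(name)) {
                return link.target.apply(arg); // guard hit: no table lookup
            }
        }
        slowLookups++;
        UnaryOperator<Object> target = methodTable.get(name);
        if (target == null) {
            throw new IllegalArgumentException("undefined method " + name);
        }
        if (depth < MAX_DEPTH) { // cache only while the chain is short
            chain = new Link(name, target, chain);
            depth++;
        }
        return target.apply(arg);
    }
}
```

Once the chain is full the site stays correct but performs a table lookup on every miss, which is the megamorphic case where the item suggests a balanced search tree might do better than a linear chain.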

Possible JIT optimizations #7588