
Possible JIT optimizations #7588

@headius


This is a list of optimizations I see that could be done in the JIT, but which require more work than just changing the code the JIT emits (e.g. specialized invokedynamic call sites or helper code).

  • Calls to block_given? currently cause the method to deoptimize, since RubyKernel#block_given_p needs the method's frame to retrieve the block. This is unnecessary in a normal method scope, where we can check the passed-in block directly and avoid the deoptimization. This requires work in IR (use BlockGivenInstr rather than a call when in a method scope, possibly with a guard for a user-defined block_given?). (Implemented for bare block_given? calls in "Implement block_given? call as optimized instruction" #8170.)
  • Simplify BNEInstr forms to eliminate non-identity comparisons #8189
  • BuildCompoundStringInstr emits a lot of code for each element being appended. This could be a single indy call with all inputs (if the order of evaluation and appending is not important) or N indy calls that each do all of the coercion and appending in one shot.
    • Part of "JIT size and perf improvements" #7589 reduces allocation and bytecode by pushing frozen strings for components, but it still uses encCrStrBufCat, which has a lot of complex logic for encoding and CR negotiation that is not really needed here.
    • The bulk of this work will land with "Optimizations for dynamic string building" #8180, which pushes most of the string-building work into an invokedynamic call site, eliminating all static string and append code in the jitted code.
  • The keyword-argument "setCallInfo" call could be rolled into the call operation itself. With indy, that would eliminate the extra bytecode altogether; without indy, it could still eliminate the flags push by having specialized "setCallInfo" variants for different flags. This will eventually be moot once we push kwarg descriptors through all call paths. (Completed in "More indy call optz" #7720.)
  • Interpolated strings could profile their final length, allocating that length for future interpolations. This would eliminate all but the first allocation of the resulting string data. It should be designed with some safety tolerances in place, e.g. not allocating gigantic strings forever because one case produced a gigantic string.
  • BuildRangeInstr could have specialized versions for embeddable literals as either begin or end, avoiding the bytecode needed to emit such values only to consume them in the Range. (Implemented for fixnum and string ranges in "Simplify fixnum and string ranges" #8176. Additional tweaks also handled endless and beginless fixnum ranges; we will wait and see whether any other forms are useful.)
  • Class variables are currently uncached and not structured in a way that lends itself to caching. More work is needed than just changes to the emitted bytecode, but these could potentially be cached forever, since a given call site rarely sees more than one value.
  • Global variables are likewise largely uncached, due to races and design issues with the current structure used to store them. True global variables could be cached nearly forever, and local global variables can be compiled to less intrusive state accesses.
  • method_missing may be poorly optimized in the indy JIT, and has only basic optimizations (caching) in non-indy JIT. Ideally it should inline any trivial Ruby method_missing target.
  • Specialized return values to reduce bytecode: for example, a method with no result could be called as void to avoid popping, or a method immediately used in a conditional could be called with a boolean return value and avoid calling isTrue. Calls guaranteed to return specific types could return those types and avoid a checkcast.
  • Indy call sites can directly bind to only one arity #8487
  • Splitting of block-receiving methods and polymorphic methods, similar to TruffleRuby.
  • Java methods are no longer optimized in invokedynamic call sites. Further, even when they were optimized, they never handled more than one arity. (Basic support restored in "Java call optimizations" #7789.)
  • Restore direct indy binding of user-defined method_missing. This was temporarily removed in "Fix recent regressions on master" #7797 because it broke the argument list aggregated by a core method_missing error (which showed up during "Add more testing for invokedynamic modes" #7732).
  • Leaf closure scopes should never need to push a new DynamicScope. Currently this works unless the closure body contains an instruction that must access the parent dynamic scope itself (not just its variables). For example, adding a non-local return to an otherwise leafy scope will force it to allocate and use its own DynamicScope ("early return from a block is slow" #5933).
  • Proper shape caching and per-object shapes. Some work has been done toward this end in "Improvements to instance variable shaping" #7516, and an old bug, "Instance vars on dup'ed classes should cache the same" #156, describes the need for shared shape caching.
  • String shaping optimizations. Frozen strings already have some specialized shapes, but we could do more to cache the hash code, etc. inside those shapes. We could also implement "embedded" strings that put small strings into the "header" of the object, as in CRuby. See https://bugs.ruby-lang.org/issues/20415 for an example. An attempt that does not appear to work is here: https://gist.github.com/headius/b4a8967b7e3bfbc9dc7aab7d5fa491ec
  • Optimized argument forwarding with ... as described in https://bugs.ruby-lang.org/issues/20425.
  • Literal collections with literal elements could use reduced bytecode by embedding the literals into the indy call site that constructs the collection (e.g. [1, 2, 3] could be a single indy instruction with embedded longs). Similar ideas were implemented for CRuby in "Optimize compilation of large literal arrays" ruby/ruby#9721. This could also include a fast path for building a Hash with all-literal Symbol keys, since that appears to be a frequent pattern.
  • Super methods, refined methods and sends do not inline. Super calls will usually be monomorphic or low-morphic. Refined methods will usually be monomorphic, since once bound to a scope they remain bound to that scope. Sends can become megamorphic but will frequently target only a few methods; even megamorphic cases could be optimized better via a dispatch chain or a balanced search tree.
  • The "normal" compilers in the JIT need more testing and could be optimized better; cached lazy values could be static final, and the script could construct them on initialization so they would fold.
  • Methods that access other frame data need specialized call sites. A few examples: __dir__ and attrs need access to the file ("Inconsistency between MRI and JRuby source location." #8079), and refined methods need access to the scope. These currently force either a frame or a dynamic scope, or need a backtrace (which the JIT does not provide). None of them should need to trigger deoptimization just to pass readily available values.
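The BuildCompoundStringInstr item (and the direction of #8180) resembles what javac already does for Java's own `+` concatenation since JDK 9: one invokedynamic per expression, linked by `java.lang.invoke.StringConcatFactory`, with the literal segments baked into a recipe string. The sketch below calls that JDK API directly to show the shape of the idea. It is an analogy only: JRuby strings are not `java.lang.String` and need encoding and CR handling this does not attempt, and `ConcatSketch`/`greet` are made-up names.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

public final class ConcatSketch {
    // One linked handle covers the whole template: each "\u0001" in the recipe marks a dynamic
    // argument, and the text between them is constant, so there is no per-element append code.
    private static final MethodHandle GREETING;

    static {
        try {
            MethodType type = MethodType.methodType(String.class, String.class, int.class);
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(), "greeting", type, "Hello, \u0001! You have \u0001 messages.");
            GREETING = site.dynamicInvoker();
        } catch (Exception e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** All inputs go to the single call site at once; coercion and appending happen inside it. */
    public static String greet(String name, int count) {
        try {
            return (String) GREETING.invokeExact(name, count);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(greet("world", 3)); // Hello, world! You have 3 messages.
    }
}
```

The `int` argument is converted to text inside the linked handle, which is the "coercion and appending in one shot" behavior the item describes.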
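The length-profiling idea for interpolated strings could look roughly like the following minimal sketch, assuming one profile object per interpolation site. `InterpolationSite`, `DEFAULT_HINT`, and `MAX_HINT` are hypothetical names and values, and `StringBuilder` stands in for JRuby's own string buffer. The clamp is one possible form of the safety tolerance the item asks for.

```java
/**
 * Hypothetical per-site length profile for an interpolated string.
 * The site remembers the length of its last result and pre-sizes the next buffer to match,
 * clamped so that one gigantic result cannot inflate every later allocation.
 */
public final class InterpolationSite {
    static final int DEFAULT_HINT = 16;  // assumed starting capacity
    static final int MAX_HINT = 4096;    // assumed safety cap

    // A plain int: a racy update only yields a slightly worse size hint, never a wrong result.
    private int hint = DEFAULT_HINT;

    public String interpolate(Object... parts) {
        StringBuilder buffer = new StringBuilder(hint); // sized from the profile
        for (Object part : parts) {
            buffer.append(part);
        }
        record(buffer.length());
        return buffer.toString();
    }

    private void record(int length) {
        // Follow the most recent length, but stay within [DEFAULT_HINT, MAX_HINT].
        hint = Math.max(DEFAULT_HINT, Math.min(length, MAX_HINT));
    }

    public int hint() {
        return hint;
    }
}
```

Tracking only the most recent length means an outlier influences at most the next allocation; a decaying average or a high-water mark with periodic reset would be other reasonable policies.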
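For the class variable and global variable items, one standard JVM way to cache a rarely written value "nearly forever" is `java.lang.invoke.SwitchPoint`: each read site binds the current value as a constant behind a SwitchPoint guard, and a write invalidates that SwitchPoint so every guarded site falls back and relinks. This is a minimal sketch under that assumption, not JRuby's current storage. `Variable` and `ReadSite` are hypothetical, and the sketch ignores cross-thread publication of new call-site targets and the other races the global-variable item mentions.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.lang.invoke.SwitchPoint;

public final class SwitchPointVarSketch {
    /** Hypothetical storage for one class variable or global. */
    public static final class Variable {
        private Object value;
        private SwitchPoint switchPoint = new SwitchPoint();

        public Variable(Object value) { this.value = value; }

        public synchronized void set(Object newValue) {
            value = newValue;
            SwitchPoint old = switchPoint;
            switchPoint = new SwitchPoint();
            // Invalidation sends every guard built on the old SwitchPoint to its fallback.
            SwitchPoint.invalidateAll(new SwitchPoint[] { old });
        }
    }

    /** A read site that caches the current value as a constant until the next write. */
    public static final class ReadSite extends MutableCallSite {
        private static final MethodHandle FALLBACK;

        static {
            try {
                FALLBACK = MethodHandles.lookup().findVirtual(ReadSite.class, "fallback",
                        MethodType.methodType(Object.class));
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        private final Variable variable;
        private int misses;

        public ReadSite(Variable variable) {
            super(MethodType.methodType(Object.class));
            this.variable = variable;
            setTarget(FALLBACK.bindTo(this));
        }

        Object fallback() {
            misses++;
            SwitchPoint sp;
            Object current;
            synchronized (variable) { // read the value and its SwitchPoint as a consistent pair
                sp = variable.switchPoint;
                current = variable.value;
            }
            // If a write already invalidated sp, this guard just routes straight back here.
            MethodHandle constant = MethodHandles.constant(Object.class, current);
            setTarget(sp.guardWithTest(constant, FALLBACK.bindTo(this)));
            return current;
        }

        public Object read() {
            try {
                return (Object) dynamicInvoker().invokeExact();
            } catch (Throwable t) {
                throw new RuntimeException(t);
            }
        }

        public int misses() { return misses; }
    }
}
```

Reads between writes hit the constant without touching the variable's storage; each write costs one relink per site that reads it, which suits values that rarely change.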
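The specialized-return-value item can be illustrated with standard `java.lang.invoke` combinators. In this sketch, `MethodHandles.filterReturnValue` folds a truthiness check into the call site so a caller used as a condition receives a `boolean` directly, and `asType` to a `void` return discards the result so a caller in statement position has nothing to pop. `fetch` and `isTrue` are hypothetical stand-ins, with `null` playing the role of nil; the checkcast-avoidance case is not shown.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public final class ReturnShapeSketch {
    private static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
    private static final MethodHandle GENERIC_CALL; // (Object)Object, stand-in for a dynamic call
    private static final MethodHandle IS_TRUE;      // (Object)boolean

    static {
        try {
            GENERIC_CALL = LOOKUP.findStatic(ReturnShapeSketch.class, "fetch",
                    MethodType.methodType(Object.class, Object.class));
            IS_TRUE = LOOKUP.findStatic(ReturnShapeSketch.class, "isTrue",
                    MethodType.methodType(boolean.class, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Hypothetical callee: returns a value for "a" and nil (null) otherwise. */
    static Object fetch(Object key) {
        return "a".equals(key) ? "found" : null;
    }

    /** Stand-in for Ruby truthiness: only nil (null here) and false are falsy. */
    static boolean isTrue(Object value) {
        return value != null && !Boolean.FALSE.equals(value);
    }

    /** Shape for a call used directly as a condition: returns boolean, so the caller skips isTrue. */
    public static MethodHandle asCondition(MethodHandle target) {
        return MethodHandles.filterReturnValue(target, IS_TRUE);
    }

    /** Shape for a call whose result is unused: returns void, so the caller has nothing to pop. */
    public static MethodHandle asVoid(MethodHandle target) {
        return target.asType(target.type().changeReturnType(void.class));
    }

    public static boolean condition(Object key) {
        try {
            return (boolean) asCondition(GENERIC_CALL).invokeExact(key);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static MethodHandle genericCall() {
        return GENERIC_CALL;
    }
}
```

When a callee is known to return a Java boolean already, the filter could be dropped entirely, which is where the bytecode savings in the item would come from.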
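For the literal collections item, the elements can travel as static bootstrap arguments, since the JVM accepts long constants from the constant pool as invokedynamic bootstrap arguments. This minimal sketch assumes a hypothetical bootstrap, `LiteralArraySketch.bootstrap`, and uses `java.util.List<Long>` as a stand-in for a Ruby Array of Fixnums. The bootstrap is called directly here to simulate linking; in the JIT, the emitted invokedynamic instruction would name it and supply `1L, 2L, 3L` itself, so the calling bytecode pushes nothing.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.List;

public final class LiteralArraySketch {
    /** Hypothetical bootstrap for a literal array such as [1, 2, 3]; elements arrive as static args. */
    public static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type,
                                     long... elements) throws ReflectiveOperationException {
        MethodHandle build = lookup.findStatic(LiteralArraySketch.class, "build",
                MethodType.methodType(List.class, long[].class));
        // Bind a private copy of the literals into the handle; the resulting site takes no arguments.
        MethodHandle bound = MethodHandles.insertArguments(build, 0, (Object) elements.clone());
        return new ConstantCallSite(bound.asType(type));
    }

    /** Builds a fresh list on each call, since every evaluation of an array literal yields a new array. */
    public static List<Long> build(long[] elements) {
        List<Long> result = new ArrayList<>(elements.length);
        for (long element : elements) {
            result.add(element);
        }
        return result;
    }

    /** Simulates linking one call site for [1, 2, 3] and returns its invoker. */
    public static MethodHandle linkDemoSite() {
        try {
            return bootstrap(MethodHandles.lookup(), "literalArray",
                    MethodType.methodType(List.class), 1L, 2L, 3L).dynamicInvoker();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Evaluates the literal once, as jitted code would each time it reaches the instruction. */
    public static List<?> evaluate(MethodHandle site) {
        try {
            return (List<?>) site.invokeExact();
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Building a new list on every evaluation keeps Ruby's semantics that each evaluation of `[1, 2, 3]` yields a distinct, mutable array, while the element values themselves never appear in the caller's bytecode.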
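For sends that stay polymorphic, the dispatch-chain option mentioned in the inlining item can be built from `MethodHandles.guardWithTest`: each miss prepends a guard for the receiver's class, up to a depth limit, after which the site stops growing and always takes the slow lookup. This is a minimal sketch; `Circle`, `Square`, `ChainSite`, and `MAX_DEPTH` are hypothetical, the exact-class check stands in for whatever guard JRuby would actually use, and the balanced-search-tree alternative is not shown.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

public final class DispatchChainSketch {
    private static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();

    // Hypothetical receiver types standing in for Ruby classes.
    public static final class Circle { public String name() { return "circle"; } }
    public static final class Square { public String name() { return "square"; } }

    static boolean hasClass(Class<?> expected, Object receiver) {
        return receiver.getClass() == expected;
    }

    /** A call site that grows one guard per receiver class, up to MAX_DEPTH entries. */
    public static final class ChainSite extends MutableCallSite {
        static final int MAX_DEPTH = 4; // assumed limit before treating the site as megamorphic
        private static final MethodHandle CLASS_TEST;
        private static final MethodHandle MISS;

        static {
            try {
                CLASS_TEST = LOOKUP.findStatic(DispatchChainSketch.class, "hasClass",
                        MethodType.methodType(boolean.class, Class.class, Object.class));
                MISS = LOOKUP.findVirtual(ChainSite.class, "miss",
                        MethodType.methodType(Object.class, Object.class));
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        private final String methodName;
        private int depth;
        private int misses;

        public ChainSite(String methodName) {
            super(MethodType.methodType(Object.class, Object.class));
            this.methodName = methodName;
            setTarget(MISS.bindTo(this));
        }

        // Slow path: look up the target (assumed: a no-arg method returning String) and,
        // if room remains, prepend a guarded entry so later calls with this class skip the lookup.
        Object miss(Object receiver) throws Throwable {
            misses++;
            Class<?> receiverClass = receiver.getClass();
            MethodHandle target = LOOKUP.findVirtual(receiverClass, methodName,
                    MethodType.methodType(String.class)).asType(type());
            if (depth < MAX_DEPTH) {
                MethodHandle test = CLASS_TEST.bindTo(receiverClass);
                setTarget(MethodHandles.guardWithTest(test, target, getTarget()));
                depth++;
            }
            return target.invoke(receiver);
        }

        public Object call(Object receiver) {
            try {
                return (Object) dynamicInvoker().invokeExact(receiver);
            } catch (Throwable t) {
                throw new RuntimeException(t);
            }
        }

        public int misses() { return misses; }
    }
}
```

Because the innermost fallback is always the slow path, a site that exceeds the limit still works correctly; it simply stops caching new classes, which mirrors the "only a few targets" expectation in the item.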
