Member

@jhawthorn jhawthorn commented Nov 19, 2021

Previously when instance_exec or instance_eval was called on an object, that object would be given a singleton class so that method definitions inside the block would be added to the object rather than its class.

This commit aims to improve performance by delaying the creation of the singleton class unless/until one is needed for method definition, based on the discussion in #18276. Most of the time instance_eval is used without any method definition.

This was implemented by adding a flag to the cref indicating that it represents a singleton of the object rather than a class itself. In this case CREF_CLASS returns the object's existing class, but when we are defining a method (either via definemethod or VM_SPECIAL_OBJECT_CBASE, which is used for undef and alias), the singleton class is created and used instead.
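
As a rough illustration of the behaviour that has to be preserved (a minimal sketch; `greet` and the variable names are made up for this example), a `def` inside `instance_eval` still ends up on the receiver only, not on its class:

```ruby
# Hypothetical names; only instance_eval, singleton_methods and
# respond_to? are core Ruby here.
obj = Object.new
other = Object.new

obj.instance_eval do
  # This definition is the point at which obj needs a singleton class.
  def greet
    "hi"
  end
end

obj_methods = obj.singleton_methods         # methods defined on obj alone
other_responds = other.respond_to?(:greet)  # other Object instances are untouched

puts obj_methods.inspect
puts other_responds
```

The lazy approach only changes when the singleton class is allocated; where the method ends up should be the same as before.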

This also happens to fix what I believe is a bug. Previously instance_eval behaved differently with regards to constant access for true/false/nil than for all other objects. I don't think this was intentional.

String::Foo = "foo"
"".instance_eval("Foo")   # => "foo"
Integer::Foo = "foo"
123.instance_eval("Foo")  # => "foo"
TrueClass::Foo = "foo"
true.instance_eval("Foo") # NameError: uninitialized constant Foo

With this change TrueClass/NilClass/FalseClass behave the same as everything else.

This also slightly changes the error message when trying to define a method through instance_eval on an object which can't have a singleton class.

Before:

$ ruby -e '123.instance_eval { def foo; end }'
-e:1:in `block in <main>': no class/module to add method (TypeError)

After:

$ ./ruby -e '123.instance_eval { def foo; end }'
-e:1:in `block in <main>': can't define singleton (TypeError)

IMO this error is a small improvement on the original and better matches
the (both old and new) message when defining a method using def self.

$ ruby -e '123.instance_eval{ def self.foo; end }'
-e:1:in `block in <main>': can't define singleton (TypeError)

With this change we can observe that instance_eval doesn't change an object's class unless necessary.

Before

$ ruby -robjspace -e 'obj = Object.new; puts ObjectSpace.dump(obj); obj.instance_eval { self }; puts ObjectSpace.dump(obj)'
{"address":"0x562155967e00", "type":"OBJECT", "class":"0x56215571a8a0", "ivars":3, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x562155967e00", "type":"OBJECT", "class":"0x562155967ae0", "ivars":3, "memsize":40, "flags":{"wb_protected":true}}

(the "class" address changes)

After

$ ./ruby -robjspace -e 'obj = Object.new; puts ObjectSpace.dump(obj); obj.instance_eval { self }; puts ObjectSpace.dump(obj)'
{"address":"0x7fa089ce7698", "type":"OBJECT", "class":"0x7fa08d19e850", "ivars":3, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x7fa089ce7698", "type":"OBJECT", "class":"0x7fa08d19e850", "ivars":3, "memsize":40, "flags":{"wb_protected":true}}

(the "class" address remains the same)
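
The same check can be scripted. This is a sketch under the assumption that `ObjectSpace.dump` keeps reporting a `"class"` key as in the output above; `class_address` is a helper name invented for the example. It also adds a third step with a `def`, where the singleton class is expected to appear:

```ruby
require "json"
require "objspace"

# Hypothetical helper: the "class" address reported by ObjectSpace.dump.
def class_address(obj)
  JSON.parse(ObjectSpace.dump(obj))["class"]
end

obj = Object.new
before = class_address(obj)

obj.instance_eval { self }          # no method definition inside the block
after_eval = class_address(obj)

obj.instance_eval { def foo; end }  # a definition forces the singleton class
after_def = class_address(obj)

puts "unchanged after plain instance_eval: #{before == after_eval}"
puts "changed after def: #{before != after_def}"
```

On a Ruby without this change the first comparison would be expected to report a changed address, matching the "Before" output.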


This should be particularly helpful for Rails apps, which use instance_eval as part of the ActiveSupport::Callbacks mechanism when provided a Proc (which is common for developers to do). Under the interpreter, this should be faster due to not allocating a new singleton, and keeping method entries and inline caches valid. Under both MJIT and YJIT this should be even more helpful as we'll be able to use jitted methods on objects which previously had been given singleton classes.
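
To make the usage pattern concrete, here is a deliberately simplified sketch of a callback runner that evaluates developer-supplied Procs against a target object. `TinyCallbacks` and `Post` are hypothetical names; this is not the ActiveSupport::Callbacks implementation, only the general shape of "run a Proc with the record as self":

```ruby
# Hypothetical, simplified stand-in for a callback runner.
class TinyCallbacks
  def initialize
    @callbacks = []
  end

  # Stores a block to be run later against some target object.
  def add(&block)
    @callbacks << block
    self
  end

  # Runs each stored Proc with `target` as self. No methods are defined
  # inside the blocks, so no singleton class should be needed.
  def run(target)
    @callbacks.each { |callback| target.instance_exec(&callback) }
    target
  end
end

class Post
  attr_reader :log

  def initialize
    @log = []
  end
end

callbacks = TinyCallbacks.new
callbacks.add { @log << :before_save }
callbacks.add { @log << :touch }

post = callbacks.run(Post.new)
puts post.log.inspect
```

Every object passed through a runner like this used to receive its own singleton class, which is the cost this change avoids.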

I ran railsbench from yjit-bench on this (on my local AMD zen2 Linux machine) and the numbers look great (great enough that I'd love someone to double check this because it's in "too good to be true" territory 😳 🤞)

Before

end_time="2021-11-18 15:57:24 PST (-0800)"
yjit_opts=""
ruby_version="ruby 3.1.0dev (2021-11-18T23:49:36Z lazy_singleton 8ba9639805) [x86_64-linux]"
git_branch="lazy_singleton"
git_commit="8ba9639805"

----------  -----------  ----------  ---------  ----------  -----------  ------------
bench       interp (ms)  stddev (%)  yjit (ms)  stddev (%)  interp/yjit  yjit 1st itr
railsbench  2092.0       1.0         1644.6     1.8         1.27         1.24
----------  -----------  ----------  ---------  ----------  -----------  ------------
Legend:
- interp/yjit: ratio of interp/yjit time. Higher is better. Above 1 represents a speedup.
- 1st itr: ratio of interp/yjit time for the first benchmarking iteration.

After

end_time="2021-11-18 16:03:25 PST (-0800)"
yjit_opts=""
ruby_version="ruby 3.1.0dev (2021-11-18T21:57:23Z lazy_singleton f09b438e6b) [x86_64-linux]"
git_branch="lazy_singleton"
git_commit="f09b438e6b"

----------  -----------  ----------  ---------  ----------  -----------  ------------
bench       interp (ms)  stddev (%)  yjit (ms)  stddev (%)  interp/yjit  yjit 1st itr
railsbench  1908.1       1.1         1415.9     1.6         1.35         1.29
----------  -----------  ----------  ---------  ----------  -----------  ------------
Legend:
- interp/yjit: ratio of interp/yjit time. Higher is better. Above 1 represents a speedup.
- 1st itr: ratio of interp/yjit time for the first benchmarking iteration.

So this change makes YJIT 1.16x faster than it was previously, and the interpreter 1.09x faster than it used to be! (and for fun old_interp/new_yjit = 1.47)
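
For anyone double-checking, the ratios can be recomputed from the two tables. The sketch below copies the millisecond timings from above and truncates (rather than rounds) to two decimals, which is the assumption under which the quoted figures are reproduced; rounding would give a slightly higher interpreter ratio.

```ruby
# Timings in milliseconds, copied from the railsbench tables above.
before = { interp: 2092.0, yjit: 1644.6 }
after  = { interp: 1908.1, yjit: 1415.9 }

# Float#floor(2) truncates to two decimal places.
yjit_speedup           = (before[:yjit] / after[:yjit]).floor(2)
interp_speedup         = (before[:interp] / after[:interp]).floor(2)
old_interp_vs_new_yjit = (before[:interp] / after[:yjit]).floor(2)

puts "yjit: #{yjit_speedup}x"
puts "interp: #{interp_speedup}x"
puts "old_interp/new_yjit: #{old_interp_vs_new_yjit}"
```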

Ref: https://bugs.ruby-lang.org/issues/18354

Co-authored-by: Matthew Draper <[email protected]>
vm_eval.c Outdated
//rb_obj_info_dump(under);
// Make crefs log that this is a special lazy singleton?

cref = vm_cref_push(ec, under, ep, TRUE);
Member

It might be good to add an assertion VM_ASSERT(singleton ? under == CLASS_OF(self) : under == self). Personally I would prefer to remove the under argument and pass CLASS_OF(self) to vm_cref_push when singleton is true, but I don't know whether @ko1 would like that.

break;
}
cref = CREF_NEXT(cref);
}
Member

@ko1 Can CREF_CLASS(cref) no longer be zero? According to the code coverage, this loop does indeed seem to be unused.

https://rubyci.s3.amazonaws.com/coverage-latest-html/ruby/vm_insnhelper.c.gcov.html#920

Member Author

@jhawthorn jhawthorn Nov 21, 2021

Thanks for bringing this up. I couldn't find anywhere this happened (even before this change) and added an assertion to cref_new that klass was never 0 https://github.com/ruby/ruby/pull/5146/files#diff-2af2e7f2e1c28da5e9d99ad117cba1c4dabd8b0bc3081da88e414c55c6aa9549R251

Contributor

I have no idea... Let's try!

Member

@mame mame left a comment

I believe Rails is using instance_eval too much :trollface: Great improvement! 👏

Member

mame commented Nov 21, 2021

Oops, this change makes constant access slow.

# before the patch
$ time ./miniruby.orig -e 'C = 1; 100000000.times { C }'

real    0m2.716s
user    0m2.707s
sys     0m0.008s

# after the patch
$ time ./miniruby -e 'C = 1; 100000000.times { C }'

real    0m3.068s
user    0m3.064s
sys     0m0.005s

Can you reproduce this on your machine?

I thought opt_getinlinecache would work, but looks like it does not actually? I have no idea why.

Member Author

jhawthorn commented Nov 21, 2021

> Can you reproduce this on your machine?
>
> I thought opt_getinlinecache would work, but looks like it does not actually? I have no idea why.

Yes. I do see a slowdown as well 😩. I actually don't think it's the constant access that's slow but the .times loop itself... I don't yet understand why. I'll investigate this more this week.

$ time ./miniruby.orig -e '100000000.times { }'
./miniruby.orig -e '100000000.times { }'  2.25s user 0.00s system 99% cpu 2.255 total

$ time ./miniruby -e '100000000.times { }'
./miniruby -e '100000000.times { }'  2.45s user 0.00s system 99% cpu 2.459 total

EDIT: it seems to have something to do with what gcc happens to want to inline.

Member Author

jhawthorn commented Nov 22, 2021

I've included suggestions and think I have a better understanding of the performance change of that loop. I think what we're seeing is just a change to GCC's inlining.

The change seems to be visible just from a .times loop. I don't see much of a difference with or without the constant access (opt_getinlinecache is working).

$ time ./miniruby.orig -e '100000000.times { }'
./miniruby.orig -e '100000000.times { }'  2.25s user 0.00s system 99% cpu 2.255 total
$ time ./miniruby -e '100000000.times { }'
./miniruby -e '100000000.times { }'  2.45s user 0.00s system 99% cpu 2.459 total

Comparing the two using perf record/perf diff I get:

$ perf diff perf.data.orig perf.data
# Event 'cycles:u'
#
# Baseline  Delta Abs  Shared Object     Symbol
# ........  .........  ................  .....................................
#
               +4.42%  miniruby          [.] CALLER_SETUP_ARG
    21.51%     -3.57%  miniruby          [.] rb_vm_exec
     7.68%     +3.29%  miniruby          [.] vm_yield_setup_args
    44.22%     -2.31%  miniruby          [.] rb_yield_1
    24.13%     -2.11%  miniruby          [.] vm_exec_core
     0.15%     +0.24%  miniruby          [.] int_dotimes

What's interesting is that CALLER_SETUP_ARG shows up as "new" and more time is spent in vm_yield_setup_args. I don't think those are actually new, though: they were previously being inlined into their parent functions, which is why the parents now appear faster.

To test this assumption I made a commit jhawthorn@dc7f771 which inlines CALLER_SETUP_ARG and vm_callee_setup_block_arg, getting us similar performance to before (and the perf diff looks similar as well):

$ time ./miniruby -e '100000000.times { }'
./miniruby -e '100000000.times { }'  2.21s user 0.00s system 99% cpu 2.213 total

This took some trial and error. I also happened to find that inlining vm_yield_setup_args resulted in an even faster loop than before (e7a31a8):

$ time ./miniruby -e '100000000.times { }'
./miniruby -e '100000000.times { }'  2.00s user 0.00s system 99% cpu 1.997 total

What do you recommend? I don't love using FORCE_INLINE, but relying on GCC's decision making seems brittle here.

insns.def Outdated
(rb_num_t value_type)
()
(VALUE val)
// attr bool leaf = (value_type != VM_SPECIAL_OBJECT_CBASE); /* get cbase may allocate a singleton */
Contributor

Does allocation violate the leaf assumption?

Member Author

Ah, maybe it does not; I was mistaken. But I think this can also raise in the case of a special constant, so it can't be leaf. I will update the comment.

Contributor

ko1 commented Dec 1, 2021

I couldn't check everything (I forget this area...), but it seems okay.

@jhawthorn jhawthorn merged commit 733500e into ruby:master Dec 2, 2021
Contributor

ko1 commented Dec 3, 2021

Could you note the improvement in NEWS performance section?
