osyoyu
Contributor

@osyoyu osyoyu commented May 3, 2023

Add a new API rb_profile_thread_frames(), which is essentially a per-thread version of rb_profile_frames().

While the original rb_profile_frames() always returns results about the current active thread obtained by GET_EC(), this new API takes a Thread to be profiled as an argument.

This should come in handy when profiling I/O-bound programs such as webapps, since this new API allows us to learn about Threads performing I/O (which do not have the GVL).

Profiling worker threads (such as Sidekiq workers) may be another application.
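
As a rough Ruby-level analogue of the idea (not the C API proposed in this PR), Thread#backtrace_locations can already inspect a thread other than the current one, including one that is blocked and does not hold the GVL. The helper names sample_thread and blocking_work below are made up for illustration, and unlike rb_profile_frames() this approach allocates, so it is only a sketch of the concept:

```ruby
# Ruby-level analogue only: the PR adds a C API, while this sketch relies on
# Thread#backtrace_locations, which allocates and is not suited to a real sampler.

# Hypothetical helper: return up to `limit` formatted frames of `thread`.
def sample_thread(thread, limit = 16)
  locations = thread.backtrace_locations(0, limit) || [] # nil if the thread is dead
  locations.map { |loc| "#{loc.path}:#{loc.lineno}:in #{loc.label}" }
end

# Hypothetical workload: `sleep` stands in for a blocking I/O call.
def blocking_work
  sleep
end

worker = Thread.new { blocking_work }
Thread.pass until worker.status == "sleep" # wait until the worker is blocked

# Sample the worker from the main thread while the worker is not running.
frames = sample_thread(worker)
puts frames

worker.kill
worker.join
```

The point of the sketch is that the target thread is passed in explicitly rather than implied by whichever thread is currently running.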

Implements [Feature #10602]

@osyoyu osyoyu force-pushed the rb_profile_thread_frames branch from dd981ff to e6f7244 Compare May 3, 2023 13:49
@osyoyu osyoyu force-pushed the rb_profile_thread_frames branch from e6f7244 to 75ce94f Compare May 11, 2023 19:44
@osyoyu osyoyu changed the title [Feature #10602] Add new API rb_profile_thread_frames() [Feature #10602] Add new API rb_thread_profile_frames() May 11, 2023
@nateberkopec

For the use-case, see here.

@ivoanjo
Contributor

ivoanjo commented May 12, 2023

This is also useful for us at Datadog for the profiler in the ddtrace gem.

In fact, I ended up copy-pasting rb_profile_frames and keeping a separate copy inside the gem to be able to support sampling other threads: https://github.com/DataDog/dd-trace-rb/blob/f9b0f45ed93aa6ec9d36afa7c9a3c4b5b4a4706d/ext/ddtrace_profiling_native_extension/private_vm_api_access.c#L410 (among some other fixes to make it behave exactly like Thread#backtrace, etc).

On an additional related note, the backtracie gem (authored by me and @KJTsanaktsidis) also reimplements rb_profile_frames, but changes the API entirely, see our RubyKaigi 2022 talk for a discussion of why ;)

Edit: Our use-case for ddtrace is to build an accurate wall-time profiler, i.e. to also be able to look at stack traces for threads that are not currently executing.

@casperisfine
Contributor

I'm very much in favor of this. A few notes.

1: This is slightly different from https://bugs.ruby-lang.org/issues/10602, as it accepts VALUE thread instead of rb_thread_t *th. No big deal, but it would need to be clarified if this is meant to be discussed at a developer meeting.

2: I see the need to profile only a particular thread, but I also see the need to profile all threads, as in Thread.list.each { |th| capture(th) }. To do this efficiently in stackprof and other profilers, I think we'd need an efficient way to iterate over the live threads from C, and get some identifier for them. Alternatively, profilers can snapshot Thread.list when starting, but then it won't work well if the code being profiled spawns new threads.

3: There is the question of what to report when a thread is waiting for the GVL or stopped because GC is running. Currently since you only profile the running thread, it's not much of a consideration. But if you start profiling a specific thread, you need to properly report when it wasn't executing. Perhaps with a special frame? Perhaps with another API?
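
To illustrate point 2, here is a minimal Ruby sketch of why a thread list snapshotted at profiler start misses threads spawned later, while re-reading Thread.list on every sample does not. The helper sample_all_threads is hypothetical, and a real profiler would want to do this from C without allocating:

```ruby
# Hypothetical sampler: re-reads Thread.list on every sample so that threads
# spawned after profiling started are still observed.
def sample_all_threads
  Thread.list.each_with_object({}) do |th, samples|
    next if th == Thread.current      # skip the sampling thread itself
    samples[th] = th.backtrace || []  # Thread#backtrace is nil for dead threads
  end
end

# The alternative mentioned above: remember the threads that exist at start.
snapshot_at_start = Thread.list.dup

late = Thread.new { sleep }           # spawned after the snapshot was taken
Thread.pass until late.status == "sleep"

samples = sample_all_threads
puts "snapshot sees late thread:        #{snapshot_at_start.include?(late)}"
puts "per-sample list sees late thread: #{samples.key?(late)}"

late.kill
late.join
```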

@ivoanjo
Contributor

ivoanjo commented May 18, 2023

Great points!

2: I see the need to profile only a particular thread, but I also see the need to profile all threads, as in Thread.list.each { |th| capture(th) }. To do this efficiently in stackprof and other profilers, I think we'd need an efficient way to iterate over the live threads from C, and get some identifier for them. Alternatively, profilers can snapshot Thread.list when starting, but then it won't work well if the code being profiled spawns new threads.

The ddtrace gem actually ends up shipping with its own implementation of thread list, so that it can get an up-to-date list on every sample. Link to current impl

Having a thread list API visible would be great! (And yeah we're careful when and how we call this, e.g. making sure we're in the main ractor)

@osyoyu
Contributor Author

osyoyu commented May 18, 2023

Thank you all for the comments!

This is slightly different from https://bugs.ruby-lang.org/issues/10602, as it accepts VALUE thread instead of rb_thread_t *th.

This change is intentional, as rb_thread_t is not exposed outside of CRuby.

I think we'd need an efficient way to iterate over the live threads from C, and get some identifier for them.

Do you mean that calling Thread.list via rb_funcall would not be performant enough?

There is the question of what to report when a thread is waiting for the GVL or stopped because GC is running.

In such cases, I believe the current implementation will simply return the last stack trace executed. While I think that a rb_profile_frames()-ish API returning information about the execution state would be easy to use, I'm not sure if that would be the best design. Maybe checking the global GC state through rb_during_gc() before calling rb_thread_profile_frames() might suffice?

@osyoyu
Contributor Author

osyoyu commented May 18, 2023

One more point to add: Allowing profiling of threads which don't have the GVL lets us build a more precise wall-time profiler. Profiles obtained through the current API tend to show less I/O time than was actually spent. This is because rb_profile_frames() returns information about the thread which had the GVL, so threads performing I/O are less likely to be the one sampled.

An accurate wall-time profiler is especially desirable when profiling web apps, where I/O is frequent (HTTP responses and RDBMS calls).
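
The bias can be illustrated with a toy model. The numbers are made up (one request spending 90 ms in I/O and 10 ms on the CPU), and the model makes the simplifying assumption that a sampler limited to the GVL-holding thread never observes this thread while it is in I/O. In reality such samples are only less likely, not impossible:

```ruby
# Toy model with made-up numbers, not a measurement.
# One entry per millisecond of a single request handled by one thread:
# 90 ms of I/O (GVL released) followed by 10 ms of CPU work (GVL held).
TIMELINE = ([:io] * 90 + [:cpu] * 10).freeze

# Sampling a specific thread every `interval_ms` sees whatever it is doing.
def per_thread_samples(interval_ms)
  TIMELINE.each_slice(interval_ms).map(&:first)
end

# Simplifying assumption: a sampler that only sees the GVL holder drops every
# sample taken while this thread is in I/O.
def gvl_holder_samples(interval_ms)
  per_thread_samples(interval_ms).reject { |state| state == :io }
end

# Fraction of samples that landed in I/O.
def io_share(samples)
  return 0.0 if samples.empty?
  samples.count(:io).fdiv(samples.size)
end

per_thread = per_thread_samples(10)
gvl_only   = gvl_holder_samples(10)

puts "per-thread sampling: #{per_thread.size} samples, I/O share #{io_share(per_thread)}"
puts "GVL-holder sampling: #{gvl_only.size} samples, I/O share #{io_share(gvl_only)}"
```

In this model the per-thread sampler attributes most of the wall time to I/O, while the GVL-only sampler reports none of it.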

@casperisfine
Contributor

Do you mean that calling Thread.list via rb_funcall would not be performant enough?

It's not so much that it would be slow. It's more that an important behavior of rb_profile_frames is that it doesn't allocate.

rb_funcall(rb_cThread, rb_intern("list"), 0) would allocate, and may trigger GC, release the GVL, etc., which is very undesirable for profilers.
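
The allocation is easy to observe from Ruby. This small sketch (variable names are illustrative) shows that every Thread.list call returns a fresh Array, using the GC.stat(:total_allocated_objects) counter:

```ruby
# Each call builds a new Array, so two results are never the same object.
first  = Thread.list
second = Thread.list
puts "same object? #{first.equal?(second)}"

# Count objects allocated across a batch of calls.
before = GC.stat(:total_allocated_objects)
100.times { Thread.list }
allocated = GC.stat(:total_allocated_objects) - before

puts "objects allocated by 100 calls: #{allocated}"
```

A sampler that runs many times per second would pay this cost on every sample, which is why an allocation-free way to walk live threads from C is being asked for.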

Maybe checking the global GC state through rb_during_gc() before calling rb_thread_profile_frames() might suffice?

For GC, yes. But thinking about the "profile all threads" use case, I'd need to know for every given thread what "state" it is in (waiting for the GVL, running, etc).

All this may be considered a bit unrelated, but if we're to propose some new profiling APIs, I think we might as well consider that too.

An accurate wall-time profiler is wanted especially when you are profiling web apps, where I/O is frequent

Absolutely, which is why I'd like to expose thread status as well. If you want to profile a multi-threaded web app, you need to properly report whether the thread was doing I/O or was waiting on the GVL. That's a major difference that will prompt you to implement radically different solutions to optimize your application.
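
For reference, here is a sketch of what is already visible from Ruby via Thread#status, and where it falls short. The Sample struct and sample_with_status helper are hypothetical. A blocked thread reports "sleep", but a thread that is runnable and waiting for the GVL reports "run", just like the thread that is actually executing:

```ruby
# Hypothetical sample record: Thread#status plus the top frame of the backtrace.
Sample = Struct.new(:status, :top_frame)

def sample_with_status(thread)
  Sample.new(thread.status, thread.backtrace&.first)
end

blocked = Thread.new { sleep }       # stands in for a thread blocked on I/O
Thread.pass until blocked.status == "sleep"

busy = Thread.new { loop { } }       # CPU-bound thread competing for the GVL

blocked_sample = sample_with_status(blocked)
# The main thread is executing right here, so `busy` cannot be running at this
# moment, yet its status is still "run".
busy_sample = sample_with_status(busy)

puts "blocked thread: #{blocked_sample.status}"
puts "busy thread:    #{busy_sample.status}"

[blocked, busy].each(&:kill).each(&:join)
```

So Thread#status alone cannot separate "running" from "waiting on the GVL", which is the distinction being requested here.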

@casperisfine
Contributor

To clarify: I don't think the concerns I raised are really blocking, I just think it would make sense to come up with a new feature request that attempts to solve all of these at once.

There's interest for a new profiler in ruby-core: https://github.com/rubygsoc/rubygsoc/wiki/Ideas-List-%282023%29#make-a-new-profiler, so we can probably propose more than a simple incremental improvement.

Add a new API rb_profile_thread_frames(), which is essentially a
per-thread version of rb_profile_frames().

While the original rb_profile_frames() always returns results about the
current active thread obtained by GET_EC(), this new API takes a Thread
to be profiled as an argument.

This should come in handy when profiling I/O-bound programs such as
webapps, since this new API allows us to learn about Threads performing
I/O (which do not have the GVL).

Profiling worker threads (such as Sidekiq workers) may be another
application.

Implements [Feature ruby#10602]

Co-authored-by: Mike Perham <[email protected]>
@osyoyu osyoyu force-pushed the rb_profile_thread_frames branch from 75ce94f to bf477e7 Compare September 29, 2023 06:41
@osyoyu osyoyu changed the title [Feature #10602] Add new API rb_thread_profile_frames() [Feature #10602] Add new API rb_profile_thread_frames() Sep 29, 2023
@osyoyu
Contributor Author

osyoyu commented Sep 29, 2023

I've renamed this to rb_profile_thread_frames() from rb_thread_profile_frames() (flipped the profile and thread), assuming that these APIs should be categorized under the profile category.
Also I've added some tests.

@osyoyu osyoyu force-pushed the rb_profile_thread_frames branch from bf477e7 to 3bdfd91 Compare September 29, 2023 06:53
@osyoyu
Contributor Author

osyoyu commented Sep 29, 2023

@ko1 Can you take a look?

}
end

def test_profile_thread_frames
Contributor Author

@osyoyu osyoyu Sep 29, 2023

As profile_frames() itself is thoroughly tested through test_profile_frames, I have kept the tests for rb_profile_thread_frames() rather simple (they only check that it captures frames for the specified Thread).
