[Feature #10602] Add new API rb_profile_thread_frames() #7784
For the use-case, see here.
This is also useful for us at Datadog for the profiler in the In fact, I ended up copy-pasting On an additional related note, the backtracie gem (authored by me and @KJTsanaktsidis) also reimplements rb_profile_frames, but changes the API entirely, see our RubyKaigi 2022 talk for a discussion of why ;) Edit: Our use-case for ddtrace is to build an accurate wall-time profiler, e.g. be also able to look at stack traces for threads that are not currently executing. |
I'm very much in favor of this. A few notes.
1: This is slightly different from https://bugs.ruby-lang.org/issues/10602, as it accepts
2: I see the need to profile only a particular thread, but I also see the need to profile all threads
3: There is the question of what to report when a thread is waiting for the GVL or stopped because GC is running. Currently, since you only profile the running thread, it's not much of a consideration. But if you start profiling a specific thread, you need to properly report when it wasn't executing. Perhaps with a special frame? Perhaps with another API?
Great points!
The ddtrace gem actually ends up shipping with its own implementation of thread list, so that it can get an up-to-date list on every sample. Link to current impl
Having a thread list API visible would be great! (And yeah we're careful when and how we call this, e.g. making sure we're in the main ractor)
Thank you all for comments!
This change is intentional, as rb_thread_t is not exposed outside of CRuby.
Do you mean that calling Thread.list via rb_funcall would not be performant enough?
In such cases, I believe the current implementation will simply return the last stack trace executed. While I think that an rb_profile_frames()-ish API returning information about the execution state would be easy to use, I'm not sure if that would be the best design. Maybe checking the global GC state through rb_during_gc() before calling rb_thread_profile_frames() might suffice?
One more point to add: Allowing profiling of threads which don't have the GVL lets us build a more precise wall-time profiler. Profiles obtained through the current API tend to under-report I/O time. This is because rb_profile_frames() returns information about the thread which had the GVL - threads performing I/O have a lesser chance of being the one sampled. An accurate wall-time profiler is especially valuable when profiling web apps, where I/O is frequent (HTTP responses and RDBMS calls).
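To make the wall-time point concrete, here is a rough Ruby-level sketch. It does not use the C API from this PR: Thread#backtrace_locations stands in for a per-thread frame lookup, and the helper names `sample_thread` and `blocking_read` are made up for illustration. A sampler running on one thread inspects a worker that is blocked in a pipe read and does not hold the GVL.

```ruby
# Hypothetical Ruby-level sampler. Thread#backtrace_locations is used here as
# a stand-in for a per-thread C API; it is not the API added by this PR.
def sample_thread(thread, samples: 5, interval: 0.01)
  counts = Hash.new(0)
  samples.times do
    locations = thread.backtrace_locations || [] # nil once the thread is dead
    # Count each method name at most once per sample.
    locations.map(&:base_label).uniq.each { |name| counts[name] += 1 }
    sleep interval
  end
  counts
end

# Hypothetical I/O-bound work: blocks until the writer end is closed.
def blocking_read(io)
  io.read
end

reader, writer = IO.pipe
worker = Thread.new { blocking_read(reader) }
sleep 0.01 until worker.status == "sleep" # wait until the worker is blocked

counts = sample_thread(worker)
puts "samples that saw blocking_read: #{counts['blocking_read']}"

writer.close # unblock the worker so it can finish
worker.join
```

A sampler that only ever looks at the thread currently holding the GVL would see the sampler's own frames here, never `blocking_read`, which is the under-reporting described above.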
It's not so much that it would be slow. It's more that an important behavior of
For GC yes. But thinking about the "profile all threads" use case, I'd need to know for every given thread what "state" they're in (waiting for GVL, running, etc). All this may be considered a bit unrelated, but if we're to propose some new profiling APIs, I think we might as well consider that too.
Absolutely, which is why I'd like to expose thread status as well. If you want to profile a multi-threaded web app, you need to properly report whether the thread was doing I/O or was waiting on the GVL. That's a major difference, and it will prompt you to implement radically different solutions to optimize your application.
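For reference, a small sketch of what is visible from Ruby today, using the existing Thread#status. The helper names `classify` and `thread_states` are hypothetical. As far as I understand, "run" covers both the thread that is actually executing and threads that are runnable but waiting for the GVL, so this cannot make the distinction asked for above; it only separates blocked threads from runnable ones.

```ruby
# Hypothetical helper: map Thread#status values to coarse states.
def classify(status)
  case status
  when "run" then :runnable      # executing, or (assumed) waiting for the GVL
  when "sleep" then :blocked     # sleeping or blocked, e.g. on I/O or a queue
  when false, nil then :finished # terminated normally / with an exception
  else :other                    # e.g. "aborting"
  end
end

# Hypothetical helper: snapshot the coarse state of each given thread.
def thread_states(threads = Thread.list)
  threads.to_h { |t| [t, classify(t.status)] }
end

queue = Queue.new
sleeper = Thread.new { queue.pop } # blocks until something is pushed
sleep 0.01 until sleeper.status == "sleep"

thread_states([Thread.current, sleeper]).each do |thread, state|
  puts "#{thread.inspect}: #{state}"
end

queue << :done # let the sleeper finish
sleeper.join
```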
To clarify: I don't think the concerns I raised are really blocking, I just think it would make sense to come with a new feature request that attempts to solve all these at once. There's interest for a new profiler in ruby-core: https://github.com/rubygsoc/rubygsoc/wiki/Ideas-List-%282023%29#make-a-new-profiler, so we can probably propose more than a simple incremental improvement.
Add a new API rb_profile_thread_frames(), which is essentially a per-thread version of rb_profile_frames().
While the original rb_profile_frames() always returns results about the current active thread obtained by GET_EC(), this new API takes a Thread to be profiled as an argument.
This should come in handy when profiling I/O-bound programs such as webapps, since this new API allows us to learn about Threads performing I/O (which do not have the GVL).
Profiling worker threads (such as Sidekiq workers) may be another application.
Implements [Feature ruby#10602]
Co-authored-by: Mike Perham <[email protected]>
I've renamed this to |
@ko1 Can you take a look?
}
end

def test_profile_thread_frames
As profile_frames() itself is thoroughly tested through test_profile_frames, I have kept tests for rb_profile_thread_frames() rather simple (they only test that it captures frames for the specified Thread).
Add a new API rb_profile_thread_frames(), which is essentially a per-thread version of rb_profile_frames().
While the original rb_profile_frames() always returns results about the current active thread obtained by GET_EC(), this new API takes a Thread to be profiled as an argument.
This should come in handy when profiling I/O-bound programs such as webapps, since this new API allows us to learn about Threads performing I/O (which do not have the GVL).
Profiling worker threads (such as Sidekiq workers) may be another application.
Implements [Feature #10602]