Track on CPU events too #74
To avoid counting dead backends as still running on the CPU, we need to detect that a backend has exited. Starting with PG17, proc->pid is reset in ProcKill; on earlier versions we can check whether the process latch has been disowned. It is not nice to poke around in latch internals like this, but all the alternatives seem to involve scanning the bestatus array and correlating pids. It also makes sense to exclude ourselves, since we will always be on CPU while looking at wait events.
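For readers following along, the check described above can be sketched roughly as follows. This is a standalone illustration, not the code from the patch: `MockProc`, `MockLatch`, `should_sample_on_cpu` and the `pid_reset_on_exit` parameter are hypothetical stand-ins for the real PGPROC and Latch structures, and it assumes that a disowned latch has `owner_pid == 0` and that a `wait_event_info` of 0 means the backend is not waiting.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins, NOT the real PostgreSQL definitions. */
typedef struct MockLatch
{
    int         owner_pid;      /* assumed to be 0 once the latch is disowned */
} MockLatch;

typedef struct MockProc
{
    int         pid;            /* 0 for an unused slot (and after exit on PG17+) */
    MockLatch   procLatch;
    uint32_t    wait_event_info;    /* assumed: 0 means "not waiting" */
} MockProc;

/*
 * Decide whether a proc slot should be recorded as "on CPU".
 * pid_reset_on_exit models the PG17+ behavior where pid is reset at exit;
 * when it is false we fall back to checking the latch owner.
 */
static bool
should_sample_on_cpu(const MockProc *proc, int my_pid, bool pid_reset_on_exit)
{
    if (proc->pid == 0)
        return false;           /* empty slot or exited backend */
    if (!pid_reset_on_exit && proc->procLatch.owner_pid == 0)
        return false;           /* latch disowned: backend is dead */
    if (proc->pid == my_pid)
        return false;           /* the sampler itself is always on CPU */
    return proc->wait_event_info == 0;
}
```

The latch check only matters on versions where pid stays set after the backend exits, which is why it is guarded by the flag in this sketch.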
Should I add a GUC to turn this functionality on and off?
Thanks for working on this! I did some tests with pgbench and the results look good. Yes, I think we need a GUC to turn this on. Also, the pg_wait_sampling_current view needs to be patched similarly.
Defaults to false, meaning the previous behavior is retained. Update the pg_wait_sampling_current view to respect this flag.
Added a sample_cpu GUC and updated the pg_wait_sampling_current view. The GUC defaults to false for backwards compatibility. What is the preference on this? I think most people would want to see the events, so maybe it should default to true?
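As a rough illustration of the gating, assuming a `wait_event_info` of 0 means "no wait event": the sketch below is standalone, the `keep_sample` helper is hypothetical, and the plain static variable merely stands in for the real GUC rather than showing how it is registered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the sample_cpu GUC; false keeps the previous behavior. */
static bool sample_cpu = false;

/*
 * Hypothetical helper: decide whether a sample is recorded.
 * Samples with a wait event are always kept; samples without one
 * (assumed to mean "on CPU") are kept only when sample_cpu is on.
 */
static bool
keep_sample(uint32_t wait_event_info)
{
    if (wait_event_info != 0)
        return true;
    return sample_cpu;
}
```

With the flag off, on-CPU samples are skipped as before, so existing profiles and history sizing are unaffected.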
Thank you. Looks good, except for a couple of minor things.
Why not turn sample_cpu = on by default? Mainly for backward compatibility. I'm afraid that a blank event_type in the profile would puzzle unprepared users or break a UI. Also, turning it on could overflow the history buffer with new rows, requiring adjustment of history_size. Anyway, I'm not sure which default is more useful for a regular user. Does it make sense?
Thank you guys!
Verified that the latch disown mechanism works on at least PostgreSQL 12-16.
Resolves #10