Describe the bug
When scheduler:sample/0 is called from a spawned process, the time values for the normal schedulers are counted only from the moment the child process was started. This makes values reported from different processes inconsistent with each other: when samples taken in different child processes are passed to scheduler:utilization/2, the resulting utilization values are meaningless.
However, this behaviour only happens if the parent process has not itself called scheduler:sample/0 before -- after the parent process calls scheduler:sample/0, the values reported from the child processes will be consistent with each other.
[Note: I'm describing this in terms of scheduler:sample/0, but the same issue can be observed by calling erlang:statistics(scheduler_wall_time) directly]
To Reproduce
To simplify the resulting output, we'll call element(4, hd(element(2, scheduler:sample()))), which extracts the total time value of the first normal scheduler. Note that the behaviour can be observed for the total and active time values of all schedulers.
From an Erlang shell (erl), without any previous calls to scheduler:sample():
spawn(fun() -> erlang:display(element(4, hd(element(2, scheduler:sample())))) end).
timer:sleep(1000).
spawn(fun() -> erlang:display(element(4, hd(element(2, scheduler:sample())))) end).
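For reference, the same reproduction written against erlang:statistics(scheduler_wall_time) directly, as mentioned in the note above, might look like the following sketch. It assumes each child has to enable the flag itself, since erlang:statistics(scheduler_wall_time) returns undefined while the flag is disabled. The name Show is only an illustrative helper.

```erlang
%% Illustrative helper: enable the flag in the child, then print the
%% total time of the first scheduler entry ({SchedulerId, ActiveTime, TotalTime}).
Show = fun() ->
    erlang:system_flag(scheduler_wall_time, true),
    [{_Id, _Active, Total} | _] = lists:sort(erlang:statistics(scheduler_wall_time)),
    erlang:display(Total)
end.
spawn(Show).
timer:sleep(1000).
spawn(Show).
```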
Expected behavior
The time values printed by these two spawned processes should be at least 1000000 microseconds apart, so that calculating the utilization from them returns meaningful values.
Alternatively, if the value is meant to have a per-process scope, then it should be documented as such, and then the values of parent processes should not leak to child processes. Note that the following does correctly return values from the spawned processes that are at least 1000000 microseconds apart:
scheduler:sample().
spawn(fun() -> erlang:display(element(4, hd(element(2, scheduler:sample())))) end).
timer:sleep(1000).
spawn(fun() -> erlang:display(element(4, hd(element(2, scheduler:sample())))) end).
That is, the moment you call scheduler:sample/0 on the parent process, values obtained by calling it from its child processes have time values that are consistent with the parent process. My hunch is that calling scheduler:sample/0 from the parent process initialises some scheduler time counters that get passed to the child process.
EDIT: it seems that calling erlang:system_flag(scheduler_wall_time, true) from the parent process has the same effect. Calling scheduler:sample() implicitly sets that flag, but when it is called in a child process, the flag is not propagated to the parent.
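Based on the observation above, a possible workaround is to enable the flag once in a long-lived process before any child takes a sample. The sketch below is written for the Erlang shell. It assumes, as I understand the erlang:system_flag/2 documentation, that the scheduler_wall_time flag is reference counted per process and stays enabled only while a process that enabled it is alive. The names Parent and Sample are illustrative.

```erlang
%% Enable the flag in the long-lived parent (here: the shell process),
%% so the counters keep running across the lifetime of the children.
erlang:system_flag(scheduler_wall_time, true).

Parent = self().
%% Each child takes a sample and sends it back to the parent.
Sample = fun() -> Parent ! {sample, scheduler:sample()} end.

spawn(Sample).
S1 = receive {sample, A} -> A end.
timer:sleep(1000).
spawn(Sample).
S2 = receive {sample, B} -> B end.

%% Utilization over the interval between the two samples.
scheduler:utilization(S1, S2).
```

With the flag held by the parent, the two samples should share the same time base, so the values returned by scheduler:utilization/2 are comparable.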
Affected versions
Tested on OTP 24. Older versions may be affected as well.
Additional context
I found this issue when fetching samples from independent Elixir.Task processes and calculating the utilization from them.