Conversation

@max-au (Contributor) commented Nov 19, 2018

Since OTP 20, a MAJOR garbage collection can run on a dirty scheduler, so the DistEntry destructor may be called on a dirty scheduler as well. This, in turn, leads to an attempt to schedule a timer on a dirty scheduler too, which is impossible: it asserts on a debug build, while on a release build it appears to succeed but creates an infinite busy loop, since the aux work wakes the scheduler up but a dirty scheduler cannot execute aux work.
There is a similar method in erl_hl_timer; see erts_start_timer_callback.
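
For context, a minimal sketch of the pattern the description refers to, modeled on erts_start_timer_callback in erl_hl_timer.c. The helper start_timer_aux_work and the body details are illustrative assumptions, not the actual code from erl_dist.c:

/* Sketch only: mirrors the erts_start_timer_callback pattern.
 * start_timer_aux_work is a hypothetical helper that runs on
 * scheduler 1 and starts the deletion timer there. */
static void start_timer_aux_work(void *vdep);

static void
schedule_delete_dist_entry(DistEntry* dep)
{
    ErtsSchedulerData *esdp = erts_get_scheduler_data();

    if (esdp && !ERTS_SCHEDULER_IS_DIRTY(esdp)) {
        /* Normal scheduler: safe to start the timer directly. */
        /* ... set up and start the deletion timer here ... */
    }
    else {
        /* Dirty (or no) scheduler: timers cannot be started here,
         * so punt to scheduler 1, which is always online. */
        erts_schedule_misc_aux_work(1, start_timer_aux_work, (void *) dep);
    }
}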

@jhogberg added the team:VM (Assigned to OTP team VM) and testing (currently being tested, tag is used by OTP internal CI) labels Nov 20, 2018
@jhogberg self-assigned this Nov 20, 2018
@jhogberg (Contributor) left a comment


Thanks for the PR! It looks good aside from the lack of a test; do you think you could add one?

@@ -421,8 +421,15 @@ static void schedule_delete_dist_entry(DistEntry* dep)
*
* Note that timeouts do not guarantee thread progress.
@jhogberg (Contributor) commented:

Can you add a comment on why we're re-scheduling on the first scheduler?

@max-au (Contributor Author) commented:

I found this trick in erl_hl_timer.c, erts_start_timer_callback.
It is guaranteed that scheduler #1 is always online (and active), even on a system with a single core.
This is a super-rare event, because in most cases garbage collection of a dist entry happens on a normal scheduler, so it does not seem necessary to pick a random scheduler out of those that are online.
Writing a unit test for this case seems rather complicated. I'd probably suggest a different solution for all dirty schedulers: in erl_process.c, erts_schedule, when there is AUX WORK scheduled for a dirty scheduler, either abort the emulator (because internal state is broken) or silently reschedule the AUX WORK on a normal scheduler.

@jhogberg (Contributor) commented:

I think I was a bit unclear: I want a brief comment on why this dance is done in the first place, as it might not be immediately obvious. There's nothing wrong with picking scheduler 1 for this purpose.

As for the test, I think we can live without one if you haven't found a neat way to provoke this.

@jhogberg (Contributor) commented:

After discussing it internally, we think that it's best to abort the emulator when aux work is erroneously scheduled on a dirty scheduler. Feel free to add a commit that does this.
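
A minimal sketch of what such a guard could look like, assuming placement near the aux-work handling in erts_schedule (erl_process.c); the function name and message below are hypothetical, not the actual commit:

/* Hypothetical guard; the real assertion landed later in master. */
static void
handle_aux_work_checked(ErtsSchedulerData *esdp)
{
    if (ERTS_SCHEDULER_IS_DIRTY(esdp)) {
        /* Aux work must never reach a dirty scheduler; a release
         * build would otherwise busy-loop on it. Fail loudly. */
        erts_exit(ERTS_ABORT_EXIT,
                  "aux work scheduled on a dirty scheduler\n");
    }
    /* ... handle the pending aux-work flags as usual ... */
}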

@max-au (Contributor Author) commented:

Ah, I just misread what you'd written.
I added an explanatory comment.
I think crashing the emulator on an attempt to schedule any AUX work on a dirty scheduler should be a separate commit/PR, as it potentially touches a lot of other subsystems and may uncover even more issues.
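
The added comment is not quoted in this thread; an illustrative version of what it conveys, based on the discussion above:

/* Illustrative wording only; see erl_dist.c for the actual comment.
 *
 * Since OTP 20 a major GC, and hence the DistEntry destructor, can
 * run on a dirty scheduler. Timers cannot be created from dirty
 * schedulers, so the timer start is re-scheduled as misc aux work
 * on scheduler 1, which is guaranteed to always be online. */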

@jhogberg (Contributor) commented:

Great!
I'll add the assertion to master; we're a bit too close to the 21.2 release to include it there.

@jhogberg removed the testing (currently being tested, tag is used by OTP internal CI) label Nov 26, 2018
@max-au force-pushed the fix_aux_work_on_dcpu_sched branch from ad16e3e to 63077f5 on November 26, 2018 19:27
@jhogberg merged commit 39d52f3 into erlang:maint Nov 27, 2018
@jhogberg (Contributor) commented:

Merged, thanks again for the PR!

@max-au deleted the fix_aux_work_on_dcpu_sched branch January 25, 2022 16:19