[coop] Hybrid suspend #8068
luhenry left a comment:
Overall, I don't see why we would need different state transitions for the hybrid suspend and the preemptive/cooperative suspend cases. The thread state machine should not have any knowledge of how the threads are going to be suspended, and it should work the same regardless. Another sign there is no need for different async suspend transitions is that the resume transitions are the same.
mono/utils/mono-threads-coop.c
We should add a --enable-hybrid-suspend configure option, and it will eventually become the default.
We should just renumber STATE_BLOCKING in the enum. These values are neither part of the API nor stored anywhere other than in non-persistent memory.
We should rename that to STATE_BLOCKING_SELF_SUSPENDED so it's clear that there is a parallel between STATE_RUNNING and STATE_BLOCKING. It should be:
"RUNNING",
"DETACHED",
"ASYNC_SUSPENDED",
"SELF_SUSPENDED",
"ASYNC_SUSPEND_REQUESTED",
"BLOCKING",
"BLOCKING_ASYNC_SUSPENDED",
"BLOCKING_SELF_SUSPENDED",
"BLOCKING_ASYNC_SUSPEND_REQUESTED",
The corresponding enum at https://github.com/mono/mono/pull/8068/files#diff-6fa4afbec40d6b93ebd74cf63b06fe93R127 should also be modified.
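A minimal sketch of how the enum and the name table could be kept in lockstep, assuming the list above is complete and in this order. The `STATE_` identifiers mirror the proposed names; `ThreadState`, `STATE_MAX` and `state_name` are hypothetical and not taken from the PR.

```c
#include <stddef.h>

/* Hypothetical enum in the order proposed in the review comment. */
typedef enum {
	STATE_RUNNING,
	STATE_DETACHED,
	STATE_ASYNC_SUSPENDED,
	STATE_SELF_SUSPENDED,
	STATE_ASYNC_SUSPEND_REQUESTED,
	STATE_BLOCKING,
	STATE_BLOCKING_ASYNC_SUSPENDED,
	STATE_BLOCKING_SELF_SUSPENDED,
	STATE_BLOCKING_ASYNC_SUSPEND_REQUESTED,
	STATE_MAX /* number of states, not a real state */
} ThreadState;

/* Designated initializers tie each name to its enum value, so renumbering
 * the enum cannot silently shift the names. */
static const char *const state_names [STATE_MAX] = {
	[STATE_RUNNING] = "RUNNING",
	[STATE_DETACHED] = "DETACHED",
	[STATE_ASYNC_SUSPENDED] = "ASYNC_SUSPENDED",
	[STATE_SELF_SUSPENDED] = "SELF_SUSPENDED",
	[STATE_ASYNC_SUSPEND_REQUESTED] = "ASYNC_SUSPEND_REQUESTED",
	[STATE_BLOCKING] = "BLOCKING",
	[STATE_BLOCKING_ASYNC_SUSPENDED] = "BLOCKING_ASYNC_SUSPENDED",
	[STATE_BLOCKING_SELF_SUSPENDED] = "BLOCKING_SELF_SUSPENDED",
	[STATE_BLOCKING_ASYNC_SUSPEND_REQUESTED] = "BLOCKING_ASYNC_SUSPEND_REQUESTED",
};

/* Returns the printable name, or "UNKNOWN" for out-of-range or unnamed values. */
static const char *
state_name (ThreadState state)
{
	if ((unsigned) state >= (unsigned) STATE_MAX || state_names [state] == NULL)
		return "UNKNOWN";
	return state_names [state];
}
```

With designated initializers, a state added to the enum but forgotten in the table shows up as "UNKNOWN" instead of borrowing its neighbor's name.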
mono/utils/mono-threads.c
Why does this need to be a separate code path? The fact that we use cooperative or preemptive suspend doesn't depend on the state machine, as we want to keep the possibility to use both cooperative and preemptive suspend whatever the state of the thread is. The cases we want to support are:
- preemptive = preemptive for running + preemptive for blocking
- hybrid = cooperative for running + preemptive for blocking
- cooperative = cooperative for running + cooperative for blocking
The only necessary thing should be either to pass a parameter to begin_async_suspend for blocking or running, or to replace the calls to begin_async_suspend by begin_safepoint_suspend/begin_preemptive_suspend.
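One reading of the "pass a parameter" option, as a sketch: the suspend policy is a pure function of the configured mode and of whether the victim is running (GC Unsafe) or blocking (GC Safe), so the state machine does not need to know about it. `SuspendMode`, `SuspendTechnique` and `choose_technique` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical names for the three supported configurations listed above. */
typedef enum { MODE_PREEMPTIVE, MODE_HYBRID, MODE_COOPERATIVE } SuspendMode;
typedef enum { TECHNIQUE_PREEMPTIVE, TECHNIQUE_COOPERATIVE } SuspendTechnique;

/* The policy lives outside the state machine: the initiator only needs to
 * know whether the victim is blocking (GC Safe) or running (GC Unsafe). */
static SuspendTechnique
choose_technique (SuspendMode mode, bool victim_is_blocking)
{
	switch (mode) {
	case MODE_PREEMPTIVE:
		return TECHNIQUE_PREEMPTIVE;
	case MODE_COOPERATIVE:
		return TECHNIQUE_COOPERATIVE;
	case MODE_HYBRID:
	default:
		/* cooperative for running, preemptive for blocking */
		return victim_is_blocking ? TECHNIQUE_PREEMPTIVE : TECHNIQUE_COOPERATIVE;
	}
}
```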
Let me see if I'm getting this: have a single "begin suspending" transition that subsumes both request_async_suspension and request_hybrid_suspension; give it enough return codes so that we can switch on the results and in all cases do the correct begin_safepoint_suspend/begin_preemptive_suspend?
That makes sense, but there are a couple of problems:
- (minor) Right now ASYNC_SUSPENDED has a request_async_suspension self-loop, but no legal request_hybrid_suspension transition (if we decided to suspend that thread via safepointing, it couldn't have been preemptively suspended and shouldn't try to preemptively suspend for suspend_count > 1; this one is probably harmless to add). The symmetric situation holds for BLOCKING_ASYNC_SUSPENDED: it has a request_hybrid_suspension self-loop but no legal request_async_suspension transition. If we decided to suspend it preemptively, we shouldn't try to suspend it again via checkpointing. I think it's better when more things are illegal.
- (serious) In BLOCKING state, request_async_suspension loops back to BLOCKING, but request_hybrid_suspension goes to BLOCKING_SUSPEND_REQUESTED. This decision is driven purely by external considerations, not by the state machine state, so we need two distinct transitions for this case.
In general, the suspend transitions represent policy - different suspend techniques starting from the same states want different decisions. On the resume side it's different - the resume policy is encoded in the state that the thread entered when we suspended it.
Alternately, maybe we could split the blocking state into, let's say, BLOCKING_NONPREEMPTABLE and BLOCKING_PREEMPTABLE. BLOCKING_NONPREEMPTABLE would be kind of like the original meaning of BLOCKING: the thread is in a syscall or other code that we know for a fact can never touch managed code at all, and it will suspend at a safepoint on the done_blocking transition (maybe we even disallow abort_blocking for it). BLOCKING_PREEMPTABLE would be for unknown foreign code, which might turn out to be embedders who can do abort_blocking or hold on to managed pointers, and it would be preempted. In that case we push all the complexity into runtime code (we now need to do, let's say, BEGIN_SYSCALL/END_SYSCALL and BEGIN_FOREIGN/END_FOREIGN). On the bright side, we won't have to restart syscalls.
In the morning things look better. The solution is to do a two-phase suspend in blocking always, the same way that we do it in the running case (ie: request_suspension should move from Blocking to Blocking_Suspend_Requested, always, which communicates enough information to the suspend initiator to then decide on a suspend policy). The previous request_async_suspension transition from Blocking to Blocking was just an artifact of not having an explicit Blocking_Suspend_Requested state.
And there's a benefit for the state machine, too: we can establish the invariant that in Blocking the suspend count is always equal to 0 and in Blocking_Suspend_Requested it's always strictly positive.
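A small sketch of why the always-two-phase request makes this invariant checkable. It models only the two blocking states; `InvThread`, `invariant_holds`, `old_request` and `new_request` are hypothetical, and the model ignores the atomicity a real transition needs.

```c
#include <stdbool.h>

/* Hypothetical, reduced state set: only the states the invariant mentions. */
typedef enum { INV_BLOCKING, INV_BLOCKING_SUSPEND_REQUESTED } InvState;

typedef struct {
	InvState state;
	int suspend_count;
} InvThread;

/* BLOCKING => count == 0; BLOCKING_SUSPEND_REQUESTED => count > 0. */
static bool
invariant_holds (const InvThread *t)
{
	if (t->state == INV_BLOCKING)
		return t->suspend_count == 0;
	return t->suspend_count > 0;
}

/* Old behavior: request_async_suspension looped BLOCKING -> BLOCKING. */
static void
old_request (InvThread *t)
{
	t->suspend_count++;
}

/* New behavior: every request on a blocking thread is two-phase. */
static void
new_request (InvThread *t)
{
	t->state = INV_BLOCKING_SUSPEND_REQUESTED;
	t->suspend_count++;
}
```

With the old self-loop, a BLOCKING thread can carry a positive suspend count, so the count alone has to be consulted to know whether a suspend is pending; with the explicit request state, the state name says it.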
I'm going to update the PR to make it work this way.
Thanks @luhenry !
mono/utils/mono-threads.c
IMO, begin_cooperative_suspend better conveys that it is the "other way" of begin_preemptive_suspend.
mono/utils/mono-threads.h
The commentary should be the same as for STATE_ASYNC_SUSPENDED and STATE_SELF_SUSPENDED.
The commentary should be the same as for STATE_ASYNC_SUSPENDED and STATE_SELF_SUSPENDED.
We should probably just put it in a separate case statement, to make the code easier to understand and keep it consistent with the rest.
You can just add this case at https://github.com/mono/mono/pull/8068/files#diff-6efe503cfde87f09553ab6c97c061604R644
Force-pushed from 50888b8 to fd0c32d.
@luhenry updated: …
The …
Force-pushed from 9c15fc3 to 332114a.
mono/utils/mono-threads-coop.c
We want it for both USE_COOP_GC and USE_HYBRID_COOP.
Also, "Hybrid Coop" doesn't make much sense; "Hybrid Suspend" would be more meaningful.
mono/utils/mono-threads-coop.c
We should probably just duplicate the code for AbortBlockingWait and AbortBlockingNotifyAndWait to keep it consistent across the board.
We should merge this case with the previous one.
mono/utils/mono-threads.c
We should duplicate the code for ReqSuspendAlreadySuspendedBlocking and ReqSuspendInitSuspendBlocking
mono_threads_transition_request_suspension should be enough and provides a good counter-part to mono_threads_transition_request_resume.
The individual comments are nitpicks and are not blocking the PR in general; only the requested change at #8068 (comment) is.
Force-pushed from 56ec20c to bfc4545.
The hybrid cooperative suspend mechanism works like this:
- when a thread is executing in GC Unsafe mode, we use cooperative suspend and expect
the thread to periodically checkpoint its execution (as in the ordinary full
coop mode).
- when a thread is executing in GC Safe mode, we use a preemptive signal-based
suspend. This is new - previously in full coop we would allow BLOCKING
threads to continue executing and only suspend them when they wanted to go
from GC Safe to GC Unsafe (via the BLOCKING_SELF_SUSPENDED state).
There are two new states and one updated transition; the transition services
both running and blocking threads. The idea is that the suspend mechanism is
determined by the suspend initiator and is not embodied in the state machine.
The state machine just has to return distinctive enough values from the
transitions for the initiator to select a policy.
The resume mechanism is determined by the state that a thread finds itself in
when it is resumed, and by the original suspend policy.
The primary differences between the suspend mechanisms are: preemptive
suspend is two-phase, and its second phase, the finish_async_suspension
transition, does not apply to cooperative suspend; the cooperative policy
instead uses the poll transition when the thread is running.
We renamed mono_threads_transition_request_async_suspension to mono_threads_transition_request_suspension.
- It must be called by a suspend initiator on a victim thread that is not
itself.
- There can only be one suspend initiator for a victim at a time - if the
suspend initiator initiates a suspension it must follow through with the
whole protocol.
- If a victim is RUNNING we transition it to ASYNC_SUSPEND_REQUESTED and return
ReqSuspendInitSuspendRunning, which signals that the caller must initiate
suspension for a running thread. (Same as the old AsyncSuspendInitSuspend.)
- If a victim is BLOCKING we transition it to BLOCKING_SUSPEND_REQUESTED.
Note that this is different from the old request_async_suspension which just
incremented the suspend count and returned AsyncSuspendBlocking. Now we
return ReqSuspendInitSuspendBlocking. The return of
ReqSuspendInitSuspendBlocking means the initiator may signal the victim and
must wait to be notified of the suspension (in preemptive suspend it will be
notified by the signal handler, in cooperative when the blocking thread
attempts to exit from blocking mode).
- In BLOCKING_SUSPEND_REQUESTED we may increment the suspend count. This is
slightly too lax if we're going to be using preemptive suspend on blocking
threads: in that case we're in the middle of a two-phase suspend, and since
there is only one suspend initiator, we wouldn't expect another suspend to
be initiated until we have had a chance to finish suspending. We leave it to
the suspend initiator to rule this case out; the transition returns
ReqSuspendAlreadySuspendedBlocking from this state so that it can.
- In all the non-executing states (SELF_SUSPENDED, ASYNC_SUSPENDED,
BLOCKING_SELF_SUSPENDED, BLOCKING_ASYNC_SUSPENDED) we just increment the
suspend count.
- All other transitions are illegal.
- Note that BLOCKING must always have a suspend count of 0 and
BLOCKING_SUSPEND_REQUESTED must have suspend_count > 0.
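The request_suspension rules above, written as a single-threaded sketch. The state and result names follow the text; the `RS_` prefix, `Victim`, and `ReqSuspendIllegal` (standing in for an assertion failure) are hypothetical. A real transition must update state and count atomically because the victim races with the initiator; this model leaves that out.

```c
/* Simplified model of the victim's state; not mono's actual declarations. */
typedef enum {
	RS_RUNNING,
	RS_DETACHED,
	RS_ASYNC_SUSPEND_REQUESTED,
	RS_SELF_SUSPENDED,
	RS_ASYNC_SUSPENDED,
	RS_BLOCKING,
	RS_BLOCKING_SUSPEND_REQUESTED,
	RS_BLOCKING_SELF_SUSPENDED,
	RS_BLOCKING_ASYNC_SUSPENDED
} VictimState;

typedef enum {
	ReqSuspendAlreadySuspended,
	ReqSuspendAlreadySuspendedBlocking,
	ReqSuspendInitSuspendRunning,
	ReqSuspendInitSuspendBlocking,
	ReqSuspendIllegal /* hypothetical: stands in for an assertion failure */
} RequestSuspendResult;

typedef struct {
	VictimState state;
	int suspend_count;
} Victim;

/* Called by the single suspend initiator on a victim other than itself. */
static RequestSuspendResult
request_suspension (Victim *v)
{
	switch (v->state) {
	case RS_RUNNING:
		v->state = RS_ASYNC_SUSPEND_REQUESTED;
		v->suspend_count++;
		return ReqSuspendInitSuspendRunning;
	case RS_BLOCKING:
		/* Always two-phase: by the invariant the count goes from 0 to 1. */
		v->state = RS_BLOCKING_SUSPEND_REQUESTED;
		v->suspend_count++;
		return ReqSuspendInitSuspendBlocking;
	case RS_BLOCKING_SUSPEND_REQUESTED:
		/* Legal, but a hybrid initiator is expected never to see it. */
		v->suspend_count++;
		return ReqSuspendAlreadySuspendedBlocking;
	case RS_SELF_SUSPENDED:
	case RS_ASYNC_SUSPENDED:
	case RS_BLOCKING_SELF_SUSPENDED:
	case RS_BLOCKING_ASYNC_SUSPENDED:
		/* Not executing: just count the extra request. */
		v->suspend_count++;
		return ReqSuspendAlreadySuspended;
	default:
		/* RS_DETACHED and RS_ASYNC_SUSPEND_REQUESTED */
		return ReqSuspendIllegal;
	}
}
```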
We add two new states:
- STATE_BLOCKING_SUSPEND_REQUESTED - this is a new state that indicates that a
suspend initiator wants to suspend this victim thread.
- The thread is still executing in this state.
- We only transition into this state with
mono_threads_transition_request_suspension.
- The only legal transitions out from this state are:
- resume - decrements the suspend_count. If the suspend_count becomes 0 we
go back to BLOCKING, otherwise we stay in BLOCKING_SUSPEND_REQUESTED.
- finish_async_suspension (executed by the suspend signal handler to
finish the two phase preemptive suspend request and notify the suspend initiator),
- done_blocking and abort_blocking. These mean that a
suspend initiator requested a suspend but we got to done (or abort)
blocking before the signal was delivered. We return NotifyAndWait, and the
caller is supposed to send a notification to the suspend initiator and
wait for a resume (with suspend count == 0), which will return it to
RUNNING (the victim thread goes to STATE_BLOCKING_SELF_SUSPENDED).
- STATE_BLOCKING_ASYNC_SUSPENDED
- The thread is not executing in this state (it is waiting for a resume).
- a thread transitions into this state from STATE_BLOCKING_SUSPEND_REQUESTED
on a finish_async_suspension transition (see above).
- on a resume transition we decrement the suspend count. If it's still
positive we stay in this state. If it's 0 the thread transitions out of
this state. Unlike other resume transitions, in this case the thread goes
back to BLOCKING and resumes executing in GC Safe mode. The resume caller
must notify the resume initiator (i.e. this is a ResumeInitAsyncResume).
- on a request_suspension we stay in this state and just increment the
suspend count.
- all other transitions from this state are illegal.
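A sketch of the resume and finish_async_suspension transitions for the two new states only. `ResumeInitAsyncResume` is from the text; `BThread`, the `B_` prefix, `ResumeNothingToDo` and `ResumeIllegal` are hypothetical. finish_async_suspension is restricted here to the blocking path; how the real transition treats other states (for example after done_blocking won the race) is not modeled.

```c
#include <stdbool.h>

typedef enum {
	B_BLOCKING,
	B_BLOCKING_SUSPEND_REQUESTED,
	B_BLOCKING_ASYNC_SUSPENDED
} BState;

typedef enum {
	ResumeNothingToDo,     /* hypothetical: victim never stopped executing */
	ResumeInitAsyncResume, /* caller must wake the preemptively suspended victim */
	ResumeIllegal          /* hypothetical: stands in for an assertion failure */
} ResumeResult;

typedef struct {
	BState state;
	int suspend_count;
} BThread;

/* Run by the suspend signal handler: second phase of the preemptive suspend. */
static bool
finish_async_suspension (BThread *t)
{
	if (t->state != B_BLOCKING_SUSPEND_REQUESTED)
		return false;
	t->state = B_BLOCKING_ASYNC_SUSPENDED;
	return true;
}

static ResumeResult
resume (BThread *t)
{
	switch (t->state) {
	case B_BLOCKING_SUSPEND_REQUESTED:
		/* Still executing: drop the request once the count reaches 0. */
		if (--t->suspend_count == 0)
			t->state = B_BLOCKING;
		return ResumeNothingToDo;
	case B_BLOCKING_ASYNC_SUSPENDED:
		/* Back to BLOCKING (GC Safe), not RUNNING, when the count reaches 0. */
		if (--t->suspend_count == 0) {
			t->state = B_BLOCKING;
			return ResumeInitAsyncResume;
		}
		return ResumeNothingToDo;
	default:
		/* BLOCKING has suspend_count == 0, so a resume there is illegal. */
		return ResumeIllegal;
	}
}
```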
We add two new return values for abort_blocking and done_blocking:
- DoneBlockingNotifyAndWait and AbortBlockingNotifyAndWait. Similar to
SelfSuspendNotifyAndWait - there was a race between the second phase of a
preemptive suspend and another operation (in that case a poll, in this case a
done or abort blocking) and the other operation won, so its caller should
notify the suspend initiator. Since the thread is now suspended, the caller
should wait for the resume signal.
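A sketch of the done_blocking side of that race. `DoneBlockingNotifyAndWait` is from the text; `DbState`, `DoneBlockingOk` and `DoneBlockingIllegal` are hypothetical, and the suspend count is not modeled. abort_blocking would follow the same shape, returning AbortBlockingNotifyAndWait.

```c
typedef enum {
	DB_RUNNING,
	DB_BLOCKING,
	DB_BLOCKING_SUSPEND_REQUESTED,
	DB_BLOCKING_SELF_SUSPENDED
} DbState;

typedef enum {
	DoneBlockingOk,            /* hypothetical: back in GC Unsafe mode, keep running */
	DoneBlockingNotifyAndWait, /* notify the suspend initiator, then wait for resume */
	DoneBlockingIllegal        /* hypothetical: stands in for an assertion failure */
} DoneBlockingResult;

static DoneBlockingResult
done_blocking (DbState *state)
{
	switch (*state) {
	case DB_BLOCKING:
		*state = DB_RUNNING;
		return DoneBlockingOk;
	case DB_BLOCKING_SUSPEND_REQUESTED:
		/* We won the race with the suspend signal: suspend ourselves instead. */
		*state = DB_BLOCKING_SELF_SUSPENDED;
		return DoneBlockingNotifyAndWait;
	default:
		return DoneBlockingIllegal;
	}
}
```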
We add new results for request_suspension MonoRequestSuspendResult
(formerly MonoRequestAsyncSuspendResult):
- ReqSuspendAlreadySuspended means the thread is not executing and we just
incremented its suspend count and there's nothing else for the suspend
initiator to do. (The old AsyncSuspendAlreadySuspended)
- ReqSuspendAlreadySuspendedBlocking means the thread is in
BLOCKING_SUSPEND_REQUESTED and we initiated another suspend request. This
should only happen with full cooperative suspend - with hybrid suspend we
will use a two phase preemptive suspend on a blocking thread and since only
a single suspend initiator is active, this state should not be visible to
another suspend initiator.
- ReqSuspendInitSuspendRunning - Thread is executing in GC Unsafe mode and the
caller should suspend it. This is the old AsyncSuspendInitSuspend. In
full preemptive suspend this is the only suspend initiation action and the caller
should begin the preemptive suspend procedure. In full coop we expect the
victim thread to reach a safepoint and poll to finish suspending.
- ReqSuspendInitSuspendBlocking - Thread is executing in GC Safe mode and
the caller should initiate a suspend. In full coop the caller has nothing
to do - a thread executing in blocking mode is assumed to be suspended. In
hybrid suspend, the caller has to initiate a preemptive suspend.
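A sketch of how an initiator might act on each result in each mode, following the descriptions above. The result names are from the text; `SuspendMode`, `InitiatorAction` and `initiator_action` are hypothetical. The PR itself routes these cases to begin_suspend_for_running_thread and begin_suspend_for_blocking_thread rather than returning an action code; `ACTION_UNEXPECTED` marks results the text says a mode should never see.

```c
typedef enum {
	ReqSuspendAlreadySuspended,
	ReqSuspendAlreadySuspendedBlocking,
	ReqSuspendInitSuspendRunning,
	ReqSuspendInitSuspendBlocking
} RequestSuspendResult;

typedef enum { MODE_PREEMPTIVE, MODE_HYBRID, MODE_COOPERATIVE } SuspendMode;

typedef enum {
	ACTION_NONE,               /* nothing left for the initiator to start */
	ACTION_SIGNAL_VICTIM,      /* begin a preemptive, signal-based suspend */
	ACTION_WAIT_FOR_SAFEPOINT, /* victim will poll and suspend itself */
	ACTION_UNEXPECTED          /* result should not occur in this mode */
} InitiatorAction;

static InitiatorAction
initiator_action (SuspendMode mode, RequestSuspendResult result)
{
	switch (result) {
	case ReqSuspendAlreadySuspended:
		return ACTION_NONE;
	case ReqSuspendAlreadySuspendedBlocking:
		/* Only full coop should observe BLOCKING_SUSPEND_REQUESTED here. */
		return mode == MODE_COOPERATIVE ? ACTION_NONE : ACTION_UNEXPECTED;
	case ReqSuspendInitSuspendRunning:
		return mode == MODE_PREEMPTIVE ? ACTION_SIGNAL_VICTIM : ACTION_WAIT_FOR_SAFEPOINT;
	case ReqSuspendInitSuspendBlocking:
		if (mode == MODE_COOPERATIVE)
			return ACTION_NONE; /* blocking thread counts as suspended */
		return mode == MODE_HYBRID ? ACTION_SIGNAL_VICTIM : ACTION_UNEXPECTED;
	}
	return ACTION_UNEXPECTED;
}
```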
When the environment variable is set, Mono will use cooperative safepoint suspend for threads running in GC Unsafe mode (running managed code or native runtime code), and preemptive suspend for threads running in GC Safe mode (running native embedder or P/Invoke code, or blocking system calls). The environment variable is visible through mono_threads_is_hybrid_suspension_enabled (). This commit just adds the env var; it doesn't turn on hybrid coop suspend.
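A sketch of an environment check, assuming (as the PR description says) that both `MONO_ENABLE_COOP` and `MONO_ENABLE_HYBRID_COOP` must be set, and treating an empty value as unset. Both the empty-value rule and the helper names are assumptions; mono's actual mono_threads_is_hybrid_suspension_enabled () may parse or cache the value differently.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Assumption: any non-empty value counts as "set". */
static bool
env_flag_set (const char *value)
{
	return value != NULL && value [0] != '\0';
}

/* Pure helper, so the decision can be tested without touching the environment. */
static bool
hybrid_suspend_requested (const char *coop_value, const char *hybrid_value)
{
	return env_flag_set (coop_value) && env_flag_set (hybrid_value);
}

/* Hypothetical wrapper reading the two variables named in the PR description. */
static bool
hybrid_suspend_requested_from_env (void)
{
	return hybrid_suspend_requested (getenv ("MONO_ENABLE_COOP"),
					 getenv ("MONO_ENABLE_HYBRID_COOP"));
}
```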
If a thread in blocking mode needs to be suspended, suspend it using preemptive suspend.
Defaults to 'no'. Using `--with-hybrid-suspend=yes` requires `--with-cooperative-gc=yes`.
Force-pushed from bfc4545 to f214256.
Change begin_cooperative_suspend and begin_preemptive_suspend to return one of three results: suspend failed, suspend succeeded using cooperative suspend, or suspend succeeded using preemptive suspend. This is used by check_async_suspend to decide whether to check the MonoThreadInfo for the result of a suspend. (If a thread is suspended cooperatively, it doesn't make sense to check.) Fixes sporadic failures in mono/tests/monitor-abort.exe.
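A sketch of the three-valued result and of how a check_async_suspend-style function might use it. `BeginSuspendResult`, `VictimInfo`, its `signal_handler_saved_state` field and `check_async_suspend_sketch` are hypothetical stand-ins for the real MonoThreadInfo check. Treating a cooperative begin as success without inspecting the info is an assumption drawn from the parenthetical above.

```c
#include <stdbool.h>

typedef enum {
	BEGIN_SUSPEND_FAILED,
	BEGIN_SUSPEND_OK_COOPERATIVE,
	BEGIN_SUSPEND_OK_PREEMPTIVE
} BeginSuspendResult;

typedef struct {
	bool signal_handler_saved_state; /* hypothetical: written by the suspend signal handler */
} VictimInfo;

static bool
check_async_suspend_sketch (const VictimInfo *info, BeginSuspendResult begun)
{
	switch (begun) {
	case BEGIN_SUSPEND_FAILED:
		return false;
	case BEGIN_SUSPEND_OK_COOPERATIVE:
		/* No signal handler ran, so there is nothing in info to inspect. */
		return true;
	case BEGIN_SUSPEND_OK_PREEMPTIVE:
		return info->signal_handler_saved_state;
	}
	return false;
}
```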
Move it from mono-threads-state-machine.c to mono-threads.c and make it check how blocking threads are suspended before returning the saved state.
Now the only one that's broken is … The …
@luhenry I pushed a commit that wraps all the … I still see the occasional …
@lambdageek the longer term fix would be to have …
And maybe the test is flaky just because we have a lower chance of hitting what you observe with hybrid suspend.
@monojenkins build Linux ARMv5
@monojenkins merge
[coop] Hybrid suspend

## Summary

This is a new suspend mode where threads that are running managed or native runtime code (what we call "GC Unsafe") are suspended cooperatively using safepoints, but threads running either embedder code or P/Invoke code or blocking syscalls (what we call "GC Safe") are suspended preemptively using signals.

The motivation is to make an embedder-friendly coop suspend mechanism where the runtime will do GC Safe -> GC Unsafe transitions on calls to `MONO_API` functions, but the embedders themselves don't have to have any special knowledge of coop safepoints or state transitions.

To try this out, set **both** `MONO_ENABLE_COOP` and `MONO_ENABLE_HYBRID_COOP` environment variables.

## Implementation

[Figure: the new state machine. There are two new states, `Blocking_Suspend_Requested` and `Blocking_Async_Suspended`, and one updated transition, `req_s` (request suspension).]

### Suspend initiator, signal handler, self suspender

Basically we call `mono_threads_transition_request_suspension` instead of `mono_threads_transition_request_async_suspension`. Then if it says to suspend a running thread we call `begin_suspend_for_running_thread`; if it says to initiate a suspension of a blocking thread, we call `begin_suspend_for_blocking_thread`; and if it says do nothing, we do nothing.

**Questions**

1. We have an assumption in `safe_interrupt_thread` that we can return from a self interrupt if we're using coop suspend. I'm not sure if that makes sense for hybrid suspend, and it's not clear what to do about it.
2. In `mono_thread_info_safe_suspend_and_run` we expect that the callback won't return `KeepSuspended` if coop is enabled. I'm not sure what this means for hybrid.
3. A bunch of tests like `monitor-abort` and `appdomain-threadpool-unload` are failing if I enable hybrid suspend. Don't know why yet.

Commit migrated from mono/mono@529e486