Initial work of the subsystem plugin model (for minimal powershell) #13186
anmenaga merged 26 commits into PowerShell:master
Conversation
PSES could probably leverage the
src/System.Management.Automation/engine/Subsystem/Prediction/CommandPrediction.cs
Is it possible to ship these subsystems in modules instead of a new subsystem model? By shipping them in modules, it would be possible to update individual components of PowerShell, for example updating the help subsystem with a new feature or bug fix.
Modules will be the main approach to ship a subsystem for now. We considered leveraging NuGet packages directly, but it's still vague how to support that. A subsystem will be wrapped as a module; registration happens on module loading and unregistration happens on module unloading (well, if that subsystem allows unregistration).
@daxian-dbw is the help subsystem in the cards for this in the future?
Thanks for the clarification @daxian-dbw. It sounds like how
@vexx32 The current idea is to pull the help system out as a completely standalone module, which means it doesn't need to hook into the engine, but only expose the help cmdlets. Subsystems are more about components that are tightly coupled with the engine state and thus have to hook into the engine via some pre-defined interfaces/contracts.
@ThomasNieto Yes, very similar to that. The difference is that subsystems are not for custom extension, but only for separating existing components from the engine (though they may be used for new PS components in the future too).
PaulHigin left a comment
I am still looking through these changes, but it looks good to me so far. Just a few comments/questions.
src/System.Management.Automation/engine/Subsystem/Commands/GetSubsystemCommand.cs
src/System.Management.Automation/engine/Subsystem/SubsystemManager.cs
Is subsystem registration persisted, say in a configuration file?
It's not persisted by this API, but pre-registration of selected subsystems should be supported eventually, to allow shipping a customized PowerShell package that contains the selected subsystems out of the box.
Configuration could be one way; another option is to define a specific folder in $PSHOME for the core engine to discover the available subsystems.
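The "$PSHOME discovery folder" option mentioned above could look roughly like the following sketch. This is purely illustrative: the folder name "Subsystems" and the type/method names are assumptions, not part of this PR.

```csharp
using System.IO;
using System.Reflection;

// Hypothetical sketch: at startup, the core engine probes a well-known
// subfolder of $PSHOME for subsystem assemblies. The folder name
// "Subsystems" is an assumption, not something defined by the PR.
public static class SubsystemDiscovery
{
    public static void LoadPreRegisteredSubsystems(string psHome)
    {
        string dir = Path.Combine(psHome, "Subsystems");
        if (!Directory.Exists(dir))
        {
            return;
        }

        foreach (string dll in Directory.EnumerateFiles(dir, "*.dll"))
        {
            // Loading the assembly gives it a chance to run its own
            // registration logic (e.g. via a module initializer).
            Assembly.LoadFrom(dll);
        }
    }
}
```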
It will be interesting to think about how PowerShell core subsystems such as remoting, engine events, (debugger?) can be refactored to work within this framework. Besides cmdlets, there are also C# APIs, usually within their own namespaces but not always.
Yes, absolutely. I did a simple proof-of-concept experiment by separating out the tab completion. It was not fully done, but it showed this is doable. Of course, the tab completion component cannot be compared with the other components in terms of complexity, so it would be great if we can plan to refactor one of those components into a subsystem as a learning exercise.
For backward compatibility, cmdlets can be specially handled to stay in the Microsoft.PowerShell.Core source (namespace), and functions can be loaded into the global scope. As for APIs, type forwarding can be done to close the gap for binary implementations; for scripters, type resolution will work as always once the subsystem assemblies are loaded.
I'm very concerned that an implementer of IPredictor will try to access runspace state. If that happens, we may end up with a lot of difficult-to-troubleshoot deadlocks and/or state corruption. This also rules out any possibility of a PowerShell-based implementation.
I know y'all are probably looking to work ML into this, but I'm worried this will cause a lot of issues in the long term.
Thanks for the feedback!
A key design point is to prevent the predictor implementation from depending on PowerShell to get the prediction results, at least not the default Runspace that is being used by the host.
Time sensitivity is the main reason for this design. The method PredictInput is likely to be called for every keystroke, so it needs to finish very fast. I set the default timeout to 20 ms today, because in testing it seems 30 ms per keystroke is about the most we can afford without clearly noticeable lag when typing. The rendering also takes time, so a 20 ms timeout seems to be a reasonable default for the PredictInput call. With this extreme time limit, PredictInput cannot afford to run anything in PowerShell.
Note: 20 ms may still turn out to be too much when we hook it up with PSReadLine end-to-end.
As a comparison, today the time from PSReadLine reading the key to finishing rendering, for simple keys that are not bound, is just a few milliseconds (1-3 ms depending on the length of the total input).
The second reason is to allow maximum parallelization between the renderer (e.g. PSReadLine) and the predictors (still tightly related to the time sensitivity). The current implementation of CommandPrediction.PredictInput spins up tasks to call GetSuggestion on each predictor, and then returns a task that will finish when either all tasks are done or the timeout is up. On the PSReadLine side, it will call PredictInput early (as soon as the user input is known) and continue with its rendering preparation (e.g. GenerateRender) until the point where it needs the prediction results. That means a predictor implementation cannot assume that the default Runspace is put on hold when GetSuggestion is called, and thus it shouldn't depend on it.
The third reason is the difference between tab completion and prediction. I don't expect the prediction feature to replace tab completion, and it won't be able to. IMHO, unlike tab completion, prediction doesn't need to be that accurate, so it doesn't need to do things like resolving the CommandAst to a CommandProcessor or CommandInfo, which requires the state of the engine the host is running in.
A predictor implementation should be more ML-like, able to learn from history and other telemetry. CommandPrediction.LineAccepted is designed to be called when a line is accepted by the host. At that point, the PS engine will be busy running the accepted command line, and the latest history is provided to the predictor to do its thinking/learning in parallel. This gives the predictor a chance to spend more time on processing, making up for the extreme time limit placed on Predictor.GetSuggestion.
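To illustrate the contract described above, here is a minimal hedged sketch of a predictor: the method shapes (GetSuggestion, LineAccepted) are approximated from this discussion, and the actual IPredictor interface in the PR may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Minimal sketch of a predictor following the contract described above.
// The method shapes are approximated from this discussion; the real
// IPredictor interface may differ.
public class HistoryPredictor
{
    private readonly object _lock = new object();
    private readonly List<string> _acceptedLines = new List<string>();

    // Called on (potentially) every keystroke; must return well within the
    // ~20 ms budget, so it only scans an in-memory index.
    public List<string> GetSuggestion(string userInput, CancellationToken token)
    {
        var results = new List<string>();
        lock (_lock)
        {
            foreach (string line in _acceptedLines)
            {
                // Respect cancellation so the host can abandon this call at timeout.
                if (token.IsCancellationRequested) { break; }

                if (line.StartsWith(userInput, StringComparison.OrdinalIgnoreCase))
                {
                    results.Add(line);
                }
            }
        }

        return results;
    }

    // Called after a line is accepted; the engine is busy running the command,
    // so this is where more expensive "learning" can happen in parallel.
    public void LineAccepted(string acceptedLine)
    {
        Task.Run(() =>
        {
            lock (_lock) { _acceptedLines.Add(acceptedLine); }
        });
    }
}
```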
A key design point is to prevent the predictor implementation from depending on PowerShell to get the prediction results, at least not the default Runspace that is being used by the host.
Yeah that makes sense. My worry is that they'll hold onto a bit of state during registration, something that marshals back to the default runspace for a trivial call (like CommandInfo.Parameters).
Time sensitivity is the main reason for this design. The method PredictInput is likely to be called for every keystroke, so it needs to finish very fast.
What if there was a one-time wait handle that fires after ~50 ms here? I'm not sure how the UX would be tbh, it could go either way, but I think it's worth trying. That would greatly limit the number of times the API needs to be called and may end up with a very similar or better experience.
The third reason is about the difference between tab completion and prediction. I don't expect the prediction feature to replace tab completion and it also won't be able to. IMHO, unlike tab completion, prediction doesn't need to be that accurate, so it doesn't need to do things like resolving the CommandAst to CommandProcessor or CommandInfo, which requires the state of the engine where the host is running in.
There are definitely use cases for it though, like finishing a pipeline for instance. Let's say you have:
Get-ChildItem *.log
It might be desirable to have a prediction like:
Get-ChildItem *.log | Remove-Item
Now I'm not saying that's something I personally want, but it does seem inevitable to me that if this API sees a lot of use, there will be some popular state-based implementations.
@SeeminglyScience Sorry for the delay in my response, and again, I appreciate your feedback!
What if there was a one-time wait handle that fires after ~50 ms here?
Can you please elaborate a bit more on this? How do you think it would work with a one-time wait handle here?
Back to the topic of whether or not to allow predictors to access the default Runspace. Another important reason is that you can have multiple predictors registered, and with the current design, they will be triggered at about the same time. If predictors are allowed to access the default Runspace, then it's likely multiple predictors will try doing that at the same time, which will not work.
Can you please elaborate a bit more on this? How do you think it would work with a one-time wait handle here?
One time per key press, I mean. Basically a 50 ms timeout after each key press, so that it only triggers on a pause or when a prediction is accepted (e.g. the right arrow is pressed).
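This "one-time wait handle" idea is essentially a debounce. A rough sketch follows; the 50 ms figure comes from the comment above, and every name here is illustrative rather than from the PR.

```csharp
using System;
using System.Threading;

// Rough sketch of the debounce idea: every key press pushes the deadline out,
// so the prediction only fires after ~50 ms of quiet, and can be disarmed
// when a prediction is accepted. All names are illustrative.
public sealed class PredictionDebouncer : IDisposable
{
    private readonly Timer _timer;

    public PredictionDebouncer(Action firePrediction)
    {
        // The timer starts disarmed; OnKeyPress arms it.
        _timer = new Timer(_ => firePrediction(), null, Timeout.Infinite, Timeout.Infinite);
    }

    // Call on every key press: restarts the 50 ms countdown.
    public void OnKeyPress() => _timer.Change(50, Timeout.Infinite);

    // Call when a prediction is accepted: disarm the pending callback.
    public void Cancel() => _timer.Change(Timeout.Infinite, Timeout.Infinite);

    public void Dispose() => _timer.Dispose();
}
```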
Back to the topic of whether or not to allow predictors to access the default Runspace. Another important reason is that you can have multiple predictors registered, and with the current design, they will be triggered at about the same time. If predictors are allowed to access the default Runspace, then it's likely multiple predictors will try doing that at the same time, which will not work.
I'm not trying to make a case for directly supporting runspace access; I'm saying that if this API is popular, then state-based implementations are inevitable regardless. It won't be clear to the implementer exactly how bad that can be; it'll just look like an oversight in the API.
We could also take the same approach as VS Code does. For A-Z characters, they send only one message on the first keystroke and then filter on the client side as the user types more A-Z characters.
Example:
I
Gives a bunch of completions like: Invoke-Command, Install-Module, Import-Module... from PSES.
Once the user types:
In
the client side filters the results from the I query and doesn't send a message to PSES. Import-Module no longer shows up in completions, as expected.
Then PredictInput doesn't need to be run as frequently.
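The VS Code-style approach described above could be sketched like this. The CompletionCache type and the fetchFromProvider delegate are hypothetical names; the delegate stands in for the expensive PSES or PredictInput round trip.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of the client-side filtering idea: fetch the candidate list once on
// the first character, then narrow the cached list locally as more characters
// arrive. CompletionCache and fetchFromProvider are hypothetical names.
public class CompletionCache
{
    private string _seedPrefix;
    private List<string> _cached = new List<string>();

    public List<string> GetCompletions(string input, Func<string, List<string>> fetchFromProvider)
    {
        // Only hit the expensive provider when the cache can't answer,
        // i.e. the input no longer extends the prefix we fetched for.
        if (_seedPrefix == null || !input.StartsWith(_seedPrefix, StringComparison.OrdinalIgnoreCase))
        {
            _seedPrefix = input;
            _cached = fetchFromProvider(input);
        }

        // Typing "In" filters the results fetched for "I" with no round trip.
        return _cached
            .Where(c => c.StartsWith(input, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```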
Thank you both for the helpful ideas! We definitely need to consider this when hooking up PSReadLine with the prediction APIs.
Feel free to reach out when you get to that point 😃
Since the implementer can't be PowerShell anyway, should this API also be async? If it's synchronous and the external code ends up calling async APIs, then you might risk thread starvation in environments with low thread-pool sizes.
I thought about making it async, but didn't go that way.
Here is my reasoning: there is a strict time limit on how long this method should run, 20 ms by default for now, so doing async operations in this method feels like wasting time on extra overhead (scheduling, context switching, etc.). That's why I chose to make it synchronous, kind of like a suggestion/hint that synchronous operations are preferred in this method's implementation.
Any task that isn't completed when a timeout occurs will be ignored; is that intended?
Edit: Missed the WhenAll, thanks @daxian-dbw! I do share @vexx32's concerns regarding failed and incomplete tasks though.
I'm not sure I understand your question. Let me explain my intent here more clearly.
The code below is to make PredictInput stop waiting after timeout. WhenAny will return when either all tasks are completed or the time is up.
await Task.WhenAny(
Task.WhenAll(tasks),
Task.Delay(millisecondsTimeout, cancellationToken)).ConfigureAwait(false);
When code reaches foreach (Task<PredictionResult> task in tasks), it's possible that:
- some or even all of the tasks are not yet completed;
- some or even all of the tasks failed;
- some or even all of the tasks succeeded.
For all of those possible cases, we only care about the tasks that have IsCompletedSuccessfully set. The rest are ignored intentionally.
Please let me know if that doesn't make sense, thanks!
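Putting the pieces of this pattern together, an end-to-end sketch could look like the following. This is simplified: predictors are modeled as plain delegates returning a string, rather than the actual predictor interface and PredictionResult type.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// End-to-end sketch of the pattern discussed here: run each predictor in its
// own task, wait for all of them OR a timeout (whichever comes first), then
// keep only the successfully completed results. Predictors are simplified to
// delegates; the real API uses a predictor interface and PredictionResult.
public static class PredictionSketch
{
    public static async Task<List<string>> PredictAsync(
        IReadOnlyList<Func<CancellationToken, string>> predictors,
        int millisecondsTimeout)
    {
        using var cancellationSource = new CancellationTokenSource();
        var tasks = new List<Task<string>>();
        foreach (var predictor in predictors)
        {
            var p = predictor;
            tasks.Add(Task.Run(() => p(cancellationSource.Token)));
        }

        // Returns when all predictor tasks finish or the timeout elapses.
        await Task.WhenAny(
            Task.WhenAll(tasks),
            Task.Delay(millisecondsTimeout)).ConfigureAwait(false);

        // Signal any still-running predictors that their results won't be used.
        cancellationSource.Cancel();

        var results = new List<string>();
        foreach (var task in tasks)
        {
            // Failed or unfinished tasks simply contribute no result.
            if (task.IsCompletedSuccessfully) { results.Add(task.Result); }
        }

        return results;
    }
}
```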
@daxian-dbw how are we intending to handle tasks that failed? I'm not seeing any handling for failed tasks, nor anything for tasks that may hang indefinitely (should we have a timeout?).
It sounds kind of problematic for any kind of debugging or just general error handling that tasks that don't complete or error out never see the light of day again.
how are we intending to handle tasks that failed? I'm not seeing any handling for any failed tasks,
From the perspective of PredictInput, a failed or not-yet-completed task is equivalent to "there is no prediction result from that predictor". It doesn't need to care why the task failed, and it shouldn't propagate any failures to the caller (the readline host).
nor anything that may be hanging indefinitely (should we have a timeout?).
A timeout is built in to make sure Task.WhenAny returns when the time is up.
As for an individual task, it may hang depending on what predictor.GetSuggestion does. cancellationSource.Token is passed to predictor.GetSuggestion(context, cancellationSource.Token) in the task, and when Task.WhenAny returns, cancellationSource.Cancel() is called to signal cancellation to predictor.GetSuggestion; so if the predictor chooses to respect the cancellation token, it will be able to finish running promptly.
That sounds kind of problematic for any kind of debugging or just general error handling that tasks that don't complete or error out never see the light of day again.
Hope my explanation helps. It doesn't affect debugging, since a task cannot be cancelled from outside; when attached to a debugger, you can always see what predictor.GetSuggestion is doing in case a hang or error happens.
Also, the task created in PredictInput is very trivial, so if a hang or error happens, it must be rooted in predictor.GetSuggestion. It's the responsibility of a predictor implementation to handle possible errors and avoid hangs. An API that consumes the predictor implementations only needs to make sure that arbitrary wrong behavior in a predictor implementation won't affect the stability of the public API.
Current SMA.dll size is ~20 MB with R2R (~10% of the full directory size) and ~7 MB without R2R (~3% of the full directory size).
...stem.Management.Automation/FormatAndOutput/DefaultFormatters/PowerShellCore_format_ps1xml.cs
src/System.Management.Automation/engine/CommandCompletion/CompletionAnalysis.cs
src/System.Management.Automation/engine/Subsystem/SubsystemInfo.cs
@iSazonov sorry for the delay in my response. You are talking about the whole pwsh package size. If you have a scenario that requires the full set of features, then there is not much we can do about the disk footprint. But when you have a scenario that needs only a subset of PowerShell features, there is a chance to reduce the footprint. For example, say an Azure service wants to use the PowerShell language for local in-proc processing; then the remoting and job components and their dependencies are just a burden. The goal of minimal powershell is to make the engine minimal and pluggable, so you can get only what you need and don't waste additional disk footprint.
Thank you all for the review and comments. I think I've addressed or responded to all the comments.
src/System.Management.Automation/engine/Subsystem/CommandPrediction/CommandPrediction.cs
It looks like the Prediction subsystem is still in SMA, but one of the goals of subsystems was pulling them out of SMA. Any reason Predictions didn't start outside of SMA? Also, I'm interested in understanding how this work will interact with:
The
So far, I don't think a PSHost needs to interact with subsystems. When remoting is refactored as a subsystem, it may need to interact with PSHost, but the host doesn't need to know whether it comes from
Not sure what you are asking here.
For now, you will have to build 2 assemblies for the module: one targeting PowerShell Standard, which can be used on pre-7.1 PowerShell, and one targeting the 7.1 nuget packages, which is used when the PS version is 7.1. It's the same situation as the new debugging APIs added in 7.0; they are not in PowerShell Standard. How are you using them in PSES today?
Which dll is it in?
Oops! I think I was going to ask if the PowerShell SDK will be responsible for pulling in subsystem dlls or if the user needs to do that.
It will be a plugin DLL, made available as a PowerShell module, which will register itself to
For subsystems that don't yet exist, such as remoting and eventing, the shipping vehicles will be modules and nuget packages (how to support nuget packages directly is not yet known). The user can choose to reference the minimal core nuget package and pull in additional subsystems as needed via modules, or the user can choose to reference the minimal core nuget package as well as additional subsystem nuget packages and pre-configure the subsystem registrations via APIs.
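Registration on module load, as described above, could be sketched roughly as follows. SubsystemManager and SubsystemKind are names from this PR, but the exact signatures below are illustrative, and MyPredictor is a hypothetical predictor implementation with a hypothetical Id property.

```csharp
using System;
using System.Management.Automation;
using System.Management.Automation.Subsystem;

// Hedged sketch of a subsystem module registering itself on load, per the
// "registration happens on module loading" design above. Exact signatures
// are illustrative; MyPredictor (and its Id property) are hypothetical.
public class Init : IModuleAssemblyInitializer, IModuleAssemblyCleanup
{
    private static readonly MyPredictor s_predictor = new MyPredictor();

    // Runs on Import-Module: hook the predictor into the engine.
    public void OnImport()
    {
        SubsystemManager.RegisterSubsystem(SubsystemKind.CommandPredictor, s_predictor);
    }

    // Runs on Remove-Module: unhook it (if the subsystem allows unregistration).
    public void OnRemove(PSModuleInfo module)
    {
        SubsystemManager.UnregisterSubsystem(SubsystemKind.CommandPredictor, s_predictor.Id);
    }
}
```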
Ah, so this PR does not include an implementation of a
Also, I guess we need to wait till we try to pull out remoting and eventing to see what the nuget package landscape will be.
I stumbled over this earlier too. It seems the names could be more intuitively clear.
/// <param name="astTokens">The <see cref="Token"/> objects from parsing the current command line input.</param>
/// <param name="millisecondsTimeout">The milliseconds to timeout.</param>
/// <returns>A list of <see cref="PredictionResult"/> objects.</returns>
public static async Task<List<PredictionResult>?> PredictInput(Ast ast, Token[] astTokens, int millisecondsTimeout)
Would making this public static async IAsyncEnumerable<PredictionResult> be better?
I think this would be a nice change to make
I don't see how this can apply to this method. When await Task.WhenAny returns, there is no delay between getting the final results, so yield doesn't apply. Also, on the readline host side, the processing of individual results is not isolated from each other as the rendering requires all data to be available, so it's not clear to me how the IAsyncEnumerable can be consumed. We can get back to this when the need is clear.
FYI. It would be good to have this in
@daxian-dbw We need docs for this experimental feature. Please open a docs issue and provide the necessary information. |
PR Summary
The subsystem plugin model is designed to make it possible to break down the components in System.Management.Automation.dll into individual subsystems (each of which resides in its own assembly), so that we can get a core PowerShell engine that takes a minimal disk footprint, which meanwhile can be brought up to a fully fledged PowerShell by registering those subsystems with the core engine.
Currently, only the CommandPredictor subsystem is supported, which will be used along with PSReadLine to provide custom prediction plugins. In the future, Job, CommandCompleter, Remoting, and other components that are currently in S.M.A.dll could be made subsystems and moved out of S.M.A.dll.
PR Context
Goal: seek review and early feedback.
The experimental flag PSSubsystemPluginModel is used to indicate that this is experimental and subject to change. Register-Subsystem/Unregister-Subsystem cmdlets are NOT planned, because this targets binary subsystem implementations, which should deal with registration/unregistration via APIs.
PR Checklist
.h, .cpp, .cs, .ps1, and .psm1 files have the correct copyright header. Add WIP: or [WIP] to the beginning of the title (the WIP bot will keep its status check at Pending while the prefix is present) and remove the prefix when the PR is ready.
PSSubsystemPluginModel