Initial work of the subsystem plugin model (for minimal powershell)#13186

Merged
anmenaga merged 26 commits into PowerShell:master from daxian-dbw:subsystem
Aug 21, 2020

Conversation

@daxian-dbw (Member) commented Jul 15, 2020

PR Summary

The subsystem plugin model is designed to make it possible to break the components in System.Management.Automation.dll down into individual subsystems (each of which resides in its own assembly), so that we can get a core PowerShell engine with a minimal disk footprint, which can then be brought up to a fully fledged PowerShell by registering those subsystems with the core engine.

Currently, only the CommandPredictor subsystem is supported; it will be used along with PSReadLine to provide custom prediction plugins. In the future, Job, CommandCompleter, Remoting, and other components that currently live in S.M.A.dll could be made subsystems and moved out of S.M.A.dll.

PR Context

Goal: seek review and early feedback.
The experimental flag PSSubsystemPluginModel is used to indicate that this is experimental and subject to change.

Register/Unregister-Subsystem cmdlets are NOT planned because this targets binary subsystem implementations, which should deal with registration/unregistration via APIs.

PR Checklist

@TylerLeonhardt (Member)

PSES could probably leverage the CommandPredictor in completions or other places. As this progresses, we should open an issue in PSES to support this new capability and define what the experience should be like.

@ThomasNieto (Contributor)

Is it possible to ship these subsystems in modules instead of a new subsystem model? By shipping them in modules, it would be possible to update individual components of PowerShell, for example updating the help subsystem with a new feature or bug fix.

@daxian-dbw (Member, Author) commented Jul 16, 2020

Modules will be the main approach to ship a subsystem for now. We considered leveraging NuGet packages directly, but it's still vague how to support that. A subsystem will be wrapped as a module; registration happens on module loading, and unregistration happens on module unloading (well, if that subsystem allows unregistration).
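To make that load/unload lifecycle concrete, here is a minimal self-contained sketch. The real mechanisms are the engine's SubsystemManager plus the module hooks IModuleAssemblyInitializer / IModuleAssemblyCleanup, but every type below (ToySubsystemManager, ISubsystem, ToyPredictor) is an invented stand-in, not the experimental API:

```csharp
using System;
using System.Collections.Concurrent;

// Toy model of the register-on-import / unregister-on-remove lifecycle
// described above. All names here are illustrative stand-ins.
public interface ISubsystem
{
    string Name { get; }
}

public static class ToySubsystemManager
{
    private static readonly ConcurrentDictionary<Type, ISubsystem> s_registry =
        new ConcurrentDictionary<Type, ISubsystem>();

    // Called from a module's "OnImport"-style hook when it is loaded.
    public static void Register<T>(T implementation) where T : ISubsystem
        => s_registry[typeof(T)] = implementation;

    // Called from a module's "OnRemove"-style hook, if unregistration is allowed.
    public static bool Unregister<T>() where T : ISubsystem
        => s_registry.TryRemove(typeof(T), out _);

    // How the engine would look up a registered subsystem implementation.
    public static T? Get<T>() where T : class, ISubsystem
        => s_registry.TryGetValue(typeof(T), out ISubsystem? s) ? (T)s : null;
}

public sealed class ToyPredictor : ISubsystem
{
    public string Name => "ToyPredictor";
}
```

In this model, a binary module's import hook would call `ToySubsystemManager.Register<ToyPredictor>(new ToyPredictor())`, and its unload hook would call `ToySubsystemManager.Unregister<ToyPredictor>()`.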

@vexx32 (Collaborator) commented Jul 16, 2020

@daxian-dbw is the help subsystem in the cards for this in the future?

@ThomasNieto (Contributor)

Thanks for the clarification @daxian-dbw. It sounds similar to how PSProviders have Get-PSProvider but get shipped and loaded/unloaded with a module.

@daxian-dbw (Member, Author) commented Jul 16, 2020

@vexx32 The current idea is to pull the help system out as a completely standalone module, which means it doesn't need to hook into the engine, but only expose the help cmdlets. Subsystems are more about components that are tightly coupled with the engine state and thus have to hook into the engine via some pre-defined interfaces/contracts.

@daxian-dbw (Member, Author)

It sounds similar to how PSProviders have Get-PSProvider but get shipped and loaded/unloaded with a module.

@ThomasNieto Yes, very similar to that. The Provider or JobAdapter, for example, allow extensions to a specific component of PowerShell, and the subsystem idea is to make that more general, applicable to the individual components of PowerShell themselves.

The difference is that a subsystem is not for custom extension, but only for separating existing components from the engine (though it may be used for new PS components in the future too).

@PaulHigin (Contributor) left a comment

I am still looking through these changes, but it looks good to me so far. Just a few comments/questions.

Contributor:

Is subsystem registration persisted, say in a configuration file?

Member Author:

It's not persisted by this API, but pre-registration of selected subsystems should eventually be supported, to allow shipping a customized PowerShell package that contains the selected subsystems out of the box.
Configuration could be one way; another option is to define a specific folder in $PSHOME for the core engine to discover the available subsystems.

Contributor:

It will be interesting to think about how PowerShell core subsystems such as remoting, engine events, (debugger?) can be refactored to work within this framework. Besides cmdlets, there are also C# APIs, usually within their own namespaces, but not always.

@daxian-dbw (Member, Author) commented Jul 17, 2020

Yes, absolutely. I did a simple proof-of-concept experiment by separating out the tab completion. It was not fully done, but it showed this is doable. Of course, the tab completion component cannot be compared with the other components in terms of complexity, so it would be great if we can plan for refactoring one of those components into a subsystem as a learning exercise.

For backward compatibility, cmdlets can be specially handled to stay in the Microsoft.PowerShell.Core source (namespace), and functions can be loaded into the global scope. As for APIs, type forwarding can be done to close the gap for binary implementations; for scripters, type resolution will work as always once the subsystem assemblies are loaded.

Contributor:

I'm very concerned that an implementer of IPredictor will try to access runspace state. If that happens, we may end up with a lot of difficult-to-troubleshoot deadlocks and/or state corruption. This also rules out any possibility of a PowerShell-based implementation.

I know y'all are probably looking to work ML into this, but I'm worried this will cause a lot of issues in the long term.

@daxian-dbw (Member, Author) commented Jul 17, 2020

Thanks for the feedback!

A key design point is to prevent the predictor implementation from depending on PowerShell to get the prediction results, at least not the default Runspace that is being used by the host.

Time sensitivity is the main reason for this design. The method PredictInput is likely to be called for every keystroke, so it needs to finish very fast. I set the default timeout to 20 ms today, because in testing it seems 30 ms per keystroke is about the most we can afford without clearly noticeable lag when typing. The rendering also takes time, so a 20 ms timeout seems to be a reasonable default for the PredictInput call. With this extreme time limit, PredictInput cannot afford to run anything in PowerShell.

Note: 20 ms may still turn out to be too much when we hook it up with PSReadLine end-to-end.
As a comparison, today the time from PSReadLine reading the key to it finishing rendering for simple keys that are not bound is just a few milliseconds (1 - 3 ms depending on the length of the total input)

The second reason is to allow maximum parallelization between the renderer (e.g. PSReadLine) and the predictors (still tightly related to the time sensitivity). The current implementation of CommandPrediction.PredictInput spins up tasks to call GetSuggestion on each predictor, and then it returns a task that will finish when either all tasks are done or the timeout is up. On the PSReadLine side, it will call PredictInput early (as soon as the user input is known) and continue with its rendering preparation (e.g. GenerateRender) until the point where it needs to get the prediction results. That means a predictor implementation cannot assume that the default Runspace is put on hold when GetSuggestion is called, and thus it shouldn't depend on it.

The third reason is about the difference between tab completion and prediction. I don't expect the prediction feature to replace tab completion and it also won't be able to. IMHO, unlike tab completion, prediction doesn't need to be that accurate, so it doesn't need to do things like resolving the CommandAst to CommandProcessor or CommandInfo, which requires the state of the engine where the host is running in.

A predictor implementation should be more ML-like, able to learn from history and other telemetry. CommandPrediction.LineAccepted is designed to be called when a line is accepted by the host. At that point, the PS engine will be busy running the accepted command line, and the latest history is provided to the predictor so it can do its thinking/learning in parallel. This is a chance for the predictor to spend more time on processing, to make up for the extreme time limit placed on Predictor.GetSuggestion.
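The split described above (cheap lookup on the per-keystroke hot path, heavier learning when a line is accepted) can be sketched as follows. The class and method signatures here are invented stand-ins, not the experimental PowerShell API:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Illustrative predictor: GetSuggestion does only an in-memory prefix lookup,
// while all "learning" happens in LineAcceptedAsync, off the hot path.
public sealed class HistoryPredictor
{
    // Frequency table built from accepted command lines.
    private readonly ConcurrentDictionary<string, int> _counts =
        new ConcurrentDictionary<string, int>(StringComparer.OrdinalIgnoreCase);

    // Called after a line is accepted; it can afford real work because the
    // engine is busy running the command at that point anyway.
    public Task LineAcceptedAsync(string acceptedLine)
        => Task.Run(() => _counts.AddOrUpdate(acceptedLine, 1, (_, n) => n + 1));

    // Called per keystroke; no Runspace access and no PowerShell invocation,
    // just a prefix lookup over what was learned so far.
    public IReadOnlyList<string> GetSuggestion(string input, CancellationToken token)
        => _counts.Where(kvp => kvp.Key.StartsWith(input, StringComparison.OrdinalIgnoreCase))
                  .OrderByDescending(kvp => kvp.Value)
                  .Take(5)
                  .Select(kvp => kvp.Key)
                  .ToList();
}
```

After `LineAcceptedAsync("Get-ChildItem *.log")` completes, `GetSuggestion("Get-", token)` returns that line without ever touching the engine.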

Contributor:

A key design point is to prevent the predictor implementation from depending on PowerShell to get the prediction results, at least not the default Runspace that is being used by the host.

Yeah that makes sense. My worry is that they'll hold onto a bit of state during registration, something that marshals back to the default runspace for a trivial call (like CommandInfo.Parameters).

Time sensitivity is the main reason for this design. The method PredictInput is likely to be called for every keystroke, so it needs to finish very fast.

What if there was a one-time wait handle that fires after ~50 ms here? I'm not sure how the UX would be, tbh; it could go either way, but I think it's worth trying. That would greatly limit how often the API needs to be called and may end up with a very similar or better experience.

The third reason is about the difference between tab completion and prediction. I don't expect the prediction feature to replace tab completion and it also won't be able to. IMHO, unlike tab completion, prediction doesn't need to be that accurate, so it doesn't need to do things like resolving the CommandAst to CommandProcessor or CommandInfo, which requires the state of the engine where the host is running in.

There are definitely use cases for it, though. Like finishing a pipeline, for instance. Let's say you have:

Get-ChildItem *.log

It might be desirable to have a prediction like:

Get-ChildItem *.log | Remove-Item

Now I'm not saying that's something I personally want, but it does seem inevitable to me that if this API sees a lot of use, there will be some popular state-based implementations.

Member Author:

@SeeminglyScience Sorry for my delayed response, and again, I appreciate your feedback!

What if there was a one time wait handle, that fires after ~50ms here.

Can you please elaborate a bit more about this? How do you think it will work with a one time wait handle here?

Back to the topic of whether or not to allow predictors to access the default Runspace. Another important reason is that you can have multiple predictors registered, and with the current design, they will be triggered at about the same time. If predictors are allowed to access the default Runspace, then it's likely multiple predictors will try doing that at the same time, which will not work.

Contributor:

Can you please elaborate a bit more about this? How do you think it will work with a one time wait handle here?

One time per key press, I mean. Basically a 50 ms timeout after each key press, so that it only triggers on a pause or when a prediction is accepted (e.g. the right arrow is pressed).
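The "fire after a pause" idea reads like a classic debounce; here is a self-contained sketch of it. Each keystroke cancels the pending prediction and schedules a new one ~50 ms out, so predictors only run once the user stops typing. All names are illustrative, not part of the PR's API:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Debounce sketch: a new key press supersedes the previous pending prediction.
public sealed class PredictionDebouncer
{
    private CancellationTokenSource? _cts;

    public Task<string?> OnKeyPressAsync(string input, Func<string, string> predict)
    {
        _cts?.Cancel();                        // supersede the previous pending call
        _cts = new CancellationTokenSource();
        CancellationToken token = _cts.Token;

        return Task.Run(async () =>
        {
            try
            {
                await Task.Delay(50, token);   // wait out the typing pause
                return (string?)predict(input);
            }
            catch (TaskCanceledException)
            {
                return null;                   // a newer keystroke arrived first
            }
        });
    }
}
```

Two rapid key presses yield `null` for the first call and the actual prediction for the second, so `predict` runs at most once per pause rather than once per keystroke.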

Back to the topic of whether or not to allow predictors to access the default Runspace. Another important reason is that you can have multiple predictors registered, and with the current design, they will be triggered at about the same time. If predictors are allowed to access the default Runspace, then it's likely multiple predictors will try doing that at the same time, which will not work.

I'm not trying to make a case for directly supporting runspace access; I'm saying that if this API is popular, then state-based implementations are inevitable regardless. It won't be clear to the implementer exactly how bad that can be; it'll just look like an oversight in the API.

Member:

We could also take the same approach as VS Code does. For A-Z characters, they send only one message on the first keystroke and then filter client-side as the user types more A-Z characters.

Example:

I

Gives a bunch of completions like: Invoke-Command, Install-Module, Import-Module... from PSES.

Once the user types:

In

The client then filters the I results locally and doesn't send a message to PSES. Import-Module no longer shows up in completions, as expected.

Then PredictInput doesn't need to be run as frequently.
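The client-side narrowing step described above is simple to sketch; `ClientSideFilter.Refine` is an invented name for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// VS Code-style approach: fetch results once on the first letter, then
// narrow them client-side as the user keeps typing, instead of calling
// PredictInput on every keystroke.
public static class ClientSideFilter
{
    public static IReadOnlyList<string> Refine(IReadOnlyList<string> cached, string typedSoFar)
        => cached.Where(c => c.StartsWith(typedSoFar, StringComparison.OrdinalIgnoreCase))
                 .ToList();
}
```

With the cached results for `I` being Invoke-Command, Install-Module, and Import-Module, `Refine(cached, "In")` keeps the first two and drops Import-Module, with no round trip.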

@daxian-dbw (Member, Author) commented Aug 4, 2020

Thank you both for the helpful ideas! We definitely need to consider this when hooking up PSReadLine with the prediction APIs.

Member:

Feel free to reach out when you get to that point 😃

Contributor:

Since the implementer can't be PowerShell anyway, should this API also be async? If it's synchronous and the external code ends up calling async APIs, then you might risk thread starvation in environments with low thread pool sizes.

Member Author:

I thought about making it async, but didn't go that way.
Here is my reasoning: there is a strict time limit on how long this method should run (20 ms by default for now), so doing async operations in this method feels like wasting time on extra overhead (scheduling, context switching, etc.). That's why I chose to make it synchronous, as a kind of suggestion/hint that synchronous operations are preferred in implementations of this method.

@SeeminglyScience (Contributor) commented Jul 18, 2020

Any task that isn't completed when a timeout occurs will be ignored; is that intended?

Edit: Missed the WhenAll, thanks @daxian-dbw! I do share @vexx32's concerns regarding failed and incomplete tasks, though.

Member Author:

I'm not sure I understand your question, so let me explain my intent here more clearly.
The code below makes PredictInput stop waiting after the timeout. WhenAny will return when either all tasks are completed or the time is up.

                await Task.WhenAny(
                    Task.WhenAll(tasks),
                    Task.Delay(millisecondsTimeout, cancellationToken)).ConfigureAwait(false);

When code reaches foreach (Task<PredictionResult> task in tasks), it's possible that

  • some or even all of tasks are not yet completed;
  • some or even all of tasks failed;
  • some or even all of tasks succeeded;

For all of those possible cases, we only care about the tasks for which IsCompletedSuccessfully is true. The rest are ignored intentionally.

Please let me know if that doesn't make sense, thanks!
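The three outcomes above can be seen end to end in a self-contained sketch of the same pattern. `TimedGather` and the `Func`-based predictors are invented for illustration; the real code lives in `CommandPrediction.PredictInput`:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Wait until either all predictor tasks finish or the timeout elapses,
// then keep only the tasks that completed successfully in time.
public static class TimedGather
{
    public static async Task<List<string>> GatherAsync(
        IEnumerable<Func<CancellationToken, string>> predictors,
        int millisecondsTimeout)
    {
        var cancellationSource = new CancellationTokenSource();
        var tasks = new List<Task<string>>();
        foreach (var predictor in predictors)
        {
            tasks.Add(Task.Run(() => predictor(cancellationSource.Token)));
        }

        // Returns when all tasks are done OR when the time is up.
        await Task.WhenAny(
            Task.WhenAll(tasks),
            Task.Delay(millisecondsTimeout)).ConfigureAwait(false);

        // Signal cancellation to any predictor that is still running.
        cancellationSource.Cancel();

        var results = new List<string>();
        foreach (Task<string> task in tasks)
        {
            // Faulted or not-yet-completed tasks mean "no prediction result".
            if (task.IsCompletedSuccessfully)
            {
                results.Add(task.Result);
            }
        }

        return results;
    }
}
```

A predictor that sleeps past the timeout or that throws simply contributes no result; only the fast, successful one is collected, without any exception reaching the caller.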

Collaborator:

@daxian-dbw how are we intending to handle tasks that failed? I'm not seeing any handling for failed tasks, nor for anything that may hang indefinitely (should we have a timeout?).

That sounds kind of problematic for any kind of debugging or just general error handling: tasks that don't complete or that error out never see the light of day again.

@daxian-dbw (Member, Author) commented Jul 31, 2020

how are we intending to handle tasks that failed? I'm not seeing any handling for any failed tasks,

From the perspective of PredictInput, a failed or a not-yet-completed task is equivalent to "there is no prediction result from that predictor". It doesn't need to care why it fails and it shouldn't propagate any failures to the caller (readline host).

nor anything that may be hanging indefinitely (should we have a timeout?).

A timeout is built in to make sure Task.WhenAny returns when the time is up.

As for an individual task, it may hang depending on what predictor.GetSuggestion does. cancellationSource.Token is passed to predictor.GetSuggestion(context, cancellationSource.Token) in the task, and when Task.WhenAny returns, cancellationSource.Cancel() is called to signal the cancellation to predictor.GetSuggestion, so if the predictor chooses to respect the cancellation token, it should be able to finish running.

That sounds kind of problematic for any kind of debugging or just general error handling that tasks that don't complete or error out never see the light of day again.

Hope my explanation helps. It doesn't affect debugging, since a task cannot be cancelled from outside; when attached to a debugger, you can always see what predictor.GetSuggestion is doing in case a hang or error happens.

Also, the task created in PredictInput is very trivial, so if a hang or error happens, it must be rooted in predictor.GetSuggestion. It's the responsibility of a predictor implementation to handle possible errors and avoid hangs. An API that consumes the predictor implementation only needs to make sure that arbitrarily wrong behavior in the predictor implementation won't affect the stability of the public API.

@iSazonov (Collaborator)

so that we can get a core PowerShell engine that takes minimal disk footprint

The current SMA.dll size is ~20 MB with R2R (~10% of the full dir size) and ~7 MB without R2R (~3% of the full dir size).
I wonder how could we noticeably reduce "disk footprint" with the plugin model?

@ghost ghost added the Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept label Jul 25, 2020
@ghost ghost removed the Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept label Jul 30, 2020
@daxian-dbw (Member, Author)

The current SMA.dll size is ~20 MB with R2R (~10% of the full dir size) and ~7 MB without R2R (~3% of the full dir size).
I wonder how could we noticeably reduce "disk footprint" with the plugin model?

@iSazonov sorry for my delayed response. You are talking about the whole pwsh package size. If you have a scenario that requires the full set of features, then there is not much we can do about the disk footprint. But when you have scenarios where you only need a subset of PowerShell features, there is a chance to reduce the footprint.

For example, say an Azure service wants to use the PowerShell language for local in-proc processing; then the remoting and job components and their dependencies are just a burden. The goal of minimal PowerShell is to make the engine minimal and pluggable, so you can get only what you need and don't waste additional disk footprint.

@daxian-dbw (Member, Author)

Thank you all for the review and comments. I think I've addressed or responded to all of them.
Let's keep the discussion going; I appreciate any additional feedback!

@TylerLeonhardt (Member)

It looks like the Prediction subsystem is still in SMA but one of the goals of subsystems was pulling them out of SMA. Any reason Predictions didn't start outside of SMA?

Also, I'm interested in understanding how this work will interact with:

  • Hosting scenarios using the PowerShell SDK
    • How does a PSHost interact with subsystems?
    • PowerShell SDK currently pulls in SMA... what
  • Referencing scenarios using PowerShell Standard
    • If I am a PowerShell module referencing PowerShell Standard, how could I leverage a subsystem only when I'm running in PS 7.next?

@daxian-dbw (Member, Author)

It looks like the Prediction subsystem is still in SMA but one of the goals of subsystems was pulling them out of SMA. Any reason Predictions didn't start outside of SMA?

The CommandPredictor subsystem is not in SMA. Without a predictor implementation, the APIs exposed from CommandPrediction do nothing. CommandPrediction exposes the interfaces for a user to consume the CommandPredictor implementations, if any are registered. So CommandPrediction is not the "prediction subsystem", but just a thin layer between users and the CommandPredictor subsystem.

Hosting scenarios using the PowerShell SDK - How does a PSHost interact with subsystems?

So far, I don't think a PSHost needs to interact with subsystems. When remoting is refactored as a subsystem, it may need to interact with PSHost, but the host doesn't need to know whether it comes from S.M.A.Core or a PowerShell.Subsystem.Remoting.dll.

PowerShell SDK currently pulls in SMA... what

Not sure what you are asking here.

If I am a PowerShell module referencing PowerShell Standard, how could I leverage a subsystem only when I'm running in PS 7.next?

For now, you will have to build 2 assemblies for the module: one targeting PowerShell Standard, which can be used on pre-7.1 PowerShell, and one targeting the 7.1 NuGet packages, which is used when the PS version is 7.1. It's in the same situation as the new debugging APIs added in 7.0; they are not in PowerShell Standard. How are you using them in PSES today?

@TylerLeonhardt (Member)

The CommandPredictor subsystem is not in SMA.

Which dll is it in?

Not sure what you are asking here.

Oops! I think I was going to ask whether the PowerShell SDK will be responsible for pulling in subsystem DLLs or whether the user needs to do that.

@daxian-dbw (Member, Author)

The CommandPredictor subsystem is not in SMA.

Which dll is it in?

It will be a plugin DLL, made available as a PowerShell module, which will register itself with the SubsystemManager during module import.

I think I was going to ask if the PowerShell SDK will be responsible for pulling in subsystem dlls or if the user needs to do that.

For the CommandPredictor subsystem specifically, the user needs to import the module that implements the predictor.

For subsystems that don't yet exist, such as remoting and eventing, the shipping vehicles will be modules and NuGet packages (how to support NuGet packages directly is not yet known). The user can choose to reference the minimal core NuGet package and pull in additional subsystems as needed via modules, or reference the minimal core NuGet package as well as additional subsystem NuGet packages and pre-configure the subsystem registrations via APIs.

@TylerLeonhardt (Member)

Ah, so this PR does not include an implementation of a CommandPredictor. I see now.

Also, I guess we need to wait until we try to pull out remoting and eventing to see what the NuGet package landscape will be.

@iSazonov (Collaborator)

So CommandPrediction is not the "prediction subsystem", but just a thin layer between users and the CommandPredictor subsystem.

I stumbled over this earlier too. It seems the names could be more intuitively clear.

/// <param name="astTokens">The <see cref="Token"/> objects from parsing the current command line input.</param>
/// <param name="millisecondsTimeout">The milliseconds to timeout.</param>
/// <returns>A list of <see cref="PredictionResult"/> objects.</returns>
public static async Task<List<PredictionResult>?> PredictInput(Ast ast, Token[] astTokens, int millisecondsTimeout)
Member:

Would making this public static async IAsyncEnumerable<PredictionResult> be better?

Collaborator:

I think this would be a nice change to make

Member Author:

I don't see how this can apply to this method. When await Task.WhenAny returns, there is no delay before getting the final results, so yield doesn't apply. Also, on the readline host side, the processing of individual results is not isolated from each other, as the rendering requires all data to be available, so it's not clear to me how the IAsyncEnumerable would be consumed. We can get back to this when the need is clear.

@anmenaga anmenaga added the CL-Engine Indicates that a PR should be marked as an engine change in the Change Log label Aug 19, 2020
@anmenaga

FYI: it would be good to have this in PowerShell 7.1.0-preview.7, and currently the requirements are met.
Be advised: I'll be merging this in 24 hours, so this is the last chance to post anything blocking this PR.
@TylerLeonhardt @rjmholt @theJasonHelmick @adityapatwardhan @iSazonov @SeeminglyScience @vexx32 @PaulHigin

@anmenaga anmenaga merged commit fc4c9cb into PowerShell:master Aug 21, 2020
@daxian-dbw daxian-dbw deleted the subsystem branch August 21, 2020 17:33
@daxian-dbw daxian-dbw added this to the 7.1.0-preview.7 milestone Aug 21, 2020
@ghost commented Sep 8, 2020

🎉 v7.1.0-preview.7 has been released, which incorporates this pull request. 🎉


@sdwheeler (Collaborator)

@daxian-dbw We need docs for this experimental feature. Please open a docs issue and provide the necessary information.
