Expose resource based auto-tuner options #559
pyo3 = { version = "0.20", features = ["extension-module", "abi3-py38", "anyhow"] }
pyo3-asyncio = { version = "0.20", features = ["tokio-runtime"] }
pythonize = "0.20"
What was the impetus for this upgrade? Is it just for the enum? Not that I disagree, just want to understand. Also, why not upgrade to latest version? (I am assuming latest involves more refactoring?)
I wanted to get `Duration` conversion but that's only in latest, I kept what I could. There is no `pyo3-asyncio` that supports latest
Why update at all if you couldn't update enough to get the feature you wanted? (not that I disagree w/ an update for update's sake, just checking if there was something else in 0.20 you're leveraging)
@@ -73,7 +105,7 @@ pub fn new_worker(
         config,
         client.retry_client.clone().into_inner(),
     )
-    .map_err(|err| PyValueError::new_err(format!("Failed creating worker: {}", err)))?;
+    .context("Failed creating worker")?;
Just to understand, can you confirm with these `context` changes what the error string looks like to Python users now? I just want to see exactly what the concatenation looks like since it's not explicit anymore.
Yeah, I'll double check what it looks like, but I doubt it's materially different
if !all_are_same {
    return Err(PyValueError::new_err(
        "All resource-based slot suppliers must have the same ResourceBasedTunerOptions",
    ));
}
Is this a temporary limitation?
No. It always needs to be this way, otherwise whatever slot type has lower targets could be completely starved out by the one with higher targets. Since you're targeting system resource use, it doesn't really make sense to have different targets within the process.
This makes sense, I just wanted to confirm before some of my other suggestions in this PR
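The same-options rule discussed above can be sketched in Python as follows. This is only an illustration: the actual check lives in the Rust bridge, and `TunerOptions`, `SlotSupplier`, and `check_shared_options` are hypothetical stand-ins rather than the SDK's real types.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TunerOptions:
    # Hypothetical stand-in for ResourceBasedTunerOptions.
    target_memory_usage: float
    target_cpu_usage: float


@dataclass(frozen=True)
class SlotSupplier:
    # Hypothetical: tuner_options is None for a fixed-size supplier.
    name: str
    tuner_options: Optional[TunerOptions] = None


def check_shared_options(suppliers: Sequence[SlotSupplier]) -> Optional[TunerOptions]:
    """Return the one options value shared by all resource-based suppliers.

    Raises ValueError if two resource-based suppliers disagree, since the
    targets describe the whole process rather than a single slot type.
    """
    # Frozen dataclasses are hashable, so equal options collapse in the set.
    distinct = {s.tuner_options for s in suppliers if s.tuner_options is not None}
    if len(distinct) > 1:
        raise ValueError(
            "All resource-based slot suppliers must have the same "
            "ResourceBasedTunerOptions"
        )
    return next(iter(distinct), None)
```

Fixed-size suppliers are ignored by the check, so mixing one fixed supplier with two resource-based suppliers that share targets passes.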
// Need pyo3 0.21+ for this to be std Duration
ramp_throttle_ms: u64,
- // Need pyo3 0.21+ for this to be std Duration
- ramp_throttle_ms: u64,
+ ramp_throttle_millis: u64,
We do this in plenty of other places, it's fine for the bridge (we usually use `_millis` suffix)
Sure, it'd be nicer if it just worked though
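For context, passing a duration across the bridge as an integer millisecond field might look like the sketch below on the Python side. The helper `to_millis` is hypothetical and not part of the SDK; only the `_millis` suffix convention comes from the thread.

```python
from datetime import timedelta


def to_millis(duration: timedelta) -> int:
    """Convert a timedelta to whole milliseconds for a u64-style bridge field."""
    if duration < timedelta(0):
        raise ValueError("duration must be non-negative")
    # Floor division by a 1 ms timedelta is exact integer arithmetic,
    # so there is no float rounding as with total_seconds() * 1000.
    return duration // timedelta(milliseconds=1)


# Hypothetical usage: the value that would fill ramp_throttle_millis.
ramp_throttle_millis = to_millis(timedelta(milliseconds=50))
```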
temporalio/worker/_tuning.py
Outdated
@dataclass(frozen=True)
class ResourceBasedTunerOptions:
- class ResourceBasedTunerOptions:
+ class ResourceBasedTunerConfig:
While we try to use kwargs as much as possible, I can understand the use case here, but I think we should suffix it with `Config` (we've done this in a few other places)
temporalio/worker/_tuning.py
Outdated
        raise NotImplementedError


class ResourceBasedTuner(WorkerTuner):
I am not sure this should exist now that I think about it. I think this can be on worker tuner:
class WorkerTuner(ABC):
    @staticmethod
    def create_resource_based(
        *,
        target_memory_usage: float,
        target_cpu_usage: float,
        workflow_slot_config: Optional[ResourceBasedSlotConfig] = None,
        activity_slot_config: Optional[ResourceBasedSlotConfig] = None,
        local_activity_slot_config: Optional[ResourceBasedSlotConfig] = None,
    ) -> WorkerTuner:
        # Create composite tuner
I have no strong opinion on those trailing 3 kwarg names (e.g. can be `workflow_config:`)
K, I can go with that
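A self-contained sketch of the suggested shape, with the factory on the base class and the concrete tuner kept private. The fields of `ResourceBasedSlotConfig` and the `_ResourceBasedTuner` class are assumptions made for illustration; the thread does not show the real definitions.

```python
from abc import ABC
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResourceBasedSlotConfig:
    # Hypothetical fields; the real config's fields are not shown in this thread.
    minimum_slots: Optional[int] = None
    maximum_slots: Optional[int] = None


class WorkerTuner(ABC):
    @staticmethod
    def create_resource_based(
        *,
        target_memory_usage: float,
        target_cpu_usage: float,
        workflow_slot_config: Optional[ResourceBasedSlotConfig] = None,
        activity_slot_config: Optional[ResourceBasedSlotConfig] = None,
        local_activity_slot_config: Optional[ResourceBasedSlotConfig] = None,
    ) -> "WorkerTuner":
        # One pair of targets is shared by all three slot types, so callers
        # cannot construct mismatched tuner options in the first place.
        default = ResourceBasedSlotConfig()
        return _ResourceBasedTuner(
            target_memory_usage=target_memory_usage,
            target_cpu_usage=target_cpu_usage,
            workflow_slot_config=workflow_slot_config or default,
            activity_slot_config=activity_slot_config or default,
            local_activity_slot_config=local_activity_slot_config or default,
        )


@dataclass(frozen=True)
class _ResourceBasedTuner(WorkerTuner):
    # Private implementation (hypothetical); users only see WorkerTuner.
    target_memory_usage: float
    target_cpu_usage: float
    workflow_slot_config: ResourceBasedSlotConfig
    activity_slot_config: ResourceBasedSlotConfig
    local_activity_slot_config: ResourceBasedSlotConfig
```

Because the targets are plain keyword arguments on one call, this design also removes the need for a separate options class for them.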
temporalio/worker/_tuning.py
Outdated
    The resource based tuner is currently experimental.
    """

    slot_options: ResourceBasedSlotOptions
This technically doesn't need to be required, though I am ok that it is.
temporalio/worker/_tuning.py
Outdated
@dataclass(frozen=True)
class CompositeTuner(WorkerTuner):
I am now thinking that this class doesn't need to be exposed either but instead can also be a static method on worker tuner that accepts these three kwargs and returns a worker tuner. There is no value from a user POV to expose a new type vs just a method to make a worker tuner from these three values.
(note, I will disagree with myself here when it comes to .NET)
temporalio/worker/_worker.py
Outdated
if isinstance(max_workers, int) and max_workers < (
    max_concurrent_activities or 0
):
I wonder if there is a way for us to introspect the tuner at this time to find its max activities
activity_slot_supplier: temporalio.bridge.worker.SlotSupplier
local_activity_slot_supplier: temporalio.bridge.worker.SlotSupplier

if tuner is not None:
Another approach is to just have `WorkerTuner` have a private `_create_bridge_tuner` method, so something like:
if tuner:
    if max_concurrent_workflow_tasks or max_concurrent_activities or max_concurrent_local_activities:
        raise ValueError(
            "Cannot specify max_concurrent_workflow_tasks, max_concurrent_activities, "
            "or max_concurrent_local_activities when also specifying tuner"
        )
else:
    tuner = WorkerTuner.create_fixed(
        max_concurrent_workflow_tasks=max_concurrent_workflow_tasks,
        max_concurrent_activities=max_concurrent_activities,
        max_concurrent_local_activities=max_concurrent_local_activities,
    )
bridge_tuner = tuner._create_bridge_tuner()
That at least moves the defaults and all tuning code to one place and only requires importing the WorkerTuner
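The mutual-exclusion logic above can be exercised in isolation with a small sketch. `FixedTuner` and `resolve_tuner` are hypothetical names standing in for whatever `WorkerTuner.create_fixed` would return and for the surrounding worker constructor code.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class FixedTuner:
    # Hypothetical stand-in for the result of WorkerTuner.create_fixed.
    max_concurrent_workflow_tasks: Optional[int]
    max_concurrent_activities: Optional[int]
    max_concurrent_local_activities: Optional[int]


def resolve_tuner(
    tuner: Optional[Any] = None,
    *,
    max_concurrent_workflow_tasks: Optional[int] = None,
    max_concurrent_activities: Optional[int] = None,
    max_concurrent_local_activities: Optional[int] = None,
) -> Any:
    """Return the explicit tuner, or a fixed tuner built from the max_* options."""
    limits = (
        max_concurrent_workflow_tasks,
        max_concurrent_activities,
        max_concurrent_local_activities,
    )
    if tuner is not None:
        # "is not None" (rather than truthiness) also rejects an explicit 0.
        if any(limit is not None for limit in limits):
            raise ValueError(
                "Cannot specify max_concurrent_workflow_tasks, max_concurrent_activities, "
                "or max_concurrent_local_activities when also specifying tuner"
            )
        return tuner
    return FixedTuner(*limits)
```

Note one deliberate difference from the snippet in the comment: the comment's `or` chain treats `0` as unset, while this sketch treats any non-`None` value as a conflict.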
LGTM, just one minor class name suggestion
temporalio/worker/_tuning.py
Outdated
@dataclass(frozen=True)
class ResourceBasedSlotOptions:
- class ResourceBasedSlotOptions:
+ class ResourceBasedSlotConfig:
Let's change this to `Config` too
What was changed
Expose resource based auto-tuner
Why?
Adding in all SDKs
Checklist
Closes
How was this tested:
Added some tests, manual verification with omes / existing core tests
Any docs updates needed?
Will be after experimental phase