Expose resource based auto-tuner options #559

Sushisource · 2024-06-24T23:07:06Z

What was changed

Expose resource based auto-tuner

Why?

Adding in all SDKs

Checklist

Closes
How was this tested:
Added some tests, manual verification with omes / existing core tests
Any docs updates needed?
Will be after experimental phase

cretz · 2024-06-25T13:55:33Z

temporalio/bridge/Cargo.toml

+pyo3 = { version = "0.20", features = ["extension-module", "abi3-py38", "anyhow"] }
+pyo3-asyncio = { version = "0.20", features = ["tokio-runtime"] }
+pythonize = "0.20"


What was the impetus for this upgrade? Is it just for the enum? Not that I disagree, just want to understand. Also, why not upgrade to latest version? (I am assuming latest involves more refactoring?)

I wanted to get the Duration conversion, but that's only in the latest release, so I kept what I could. There is no pyo3-asyncio release that supports the latest pyo3.

Why update at all if you couldn't update enough to get the feature you wanted? (not that I disagree w/ an update for update's sake, just checking if there was something else in 0.20 you're leveraging)

cretz · 2024-06-25T13:58:03Z

temporalio/bridge/src/worker.rs

@@ -73,7 +105,7 @@ pub fn new_worker(
        config,
        client.retry_client.clone().into_inner(),
    )
-    .map_err(|err| PyValueError::new_err(format!("Failed creating worker: {}", err)))?;
+    .context("Failed creating worker")?;


Just to understand, can you confirm with these context changes what the error string looks like to Python users now? I just want to see exactly what the concatenation looks like since it's not explicit anymore.

Yeah, I'll double check what it looks like, but I doubt it's materially different

cretz · 2024-06-25T14:04:39Z

temporalio/bridge/src/worker.rs

+        if !all_are_same {
+            return Err(PyValueError::new_err(
+                "All resource-based slot suppliers must have the same ResourceBasedTunerOptions",
+            ));
+        }


Is this a temporary limitation?

No. It always needs to be this way, otherwise whatever slot type has lower targets could be completely starved out by the one with higher targets. Since you're targeting system resource use, it doesn't really make sense to have different targets within the process.

This makes sense, I just wanted to confirm before some of my other suggestions in this PR
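
The constraint discussed above can be sketched in Python. This is only an illustration: the actual check lives in the Rust bridge, and `TunerOptions`, `FixedSupplier`, `ResourceBasedSupplier`, and `check_same_options` are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class TunerOptions:
    """Hypothetical stand-in for the shared resource targets."""

    target_memory_usage: float
    target_cpu_usage: float


@dataclass(frozen=True)
class FixedSupplier:
    """Hypothetical fixed-size slot supplier; it has no resource targets."""

    num_slots: int


@dataclass(frozen=True)
class ResourceBasedSupplier:
    """Hypothetical resource-based slot supplier."""

    tuner_options: TunerOptions


Supplier = Union[FixedSupplier, ResourceBasedSupplier]


def check_same_options(suppliers: Sequence[Supplier]) -> None:
    """Raise ValueError if the resource-based suppliers disagree on targets."""
    # Frozen dataclasses are hashable, so equal options collapse in the set.
    distinct = {
        s.tuner_options for s in suppliers if isinstance(s, ResourceBasedSupplier)
    }
    if len(distinct) > 1:
        raise ValueError(
            "All resource-based slot suppliers must have the same "
            "ResourceBasedTunerOptions"
        )
```

Fixed-size suppliers are ignored by the check, so mixing one fixed supplier with two resource-based suppliers is accepted as long as the two resource-based ones share targets.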

cretz · 2024-06-25T14:05:46Z

temporalio/bridge/src/worker.rs

+    // Need pyo3 0.21+ for this to be std Duration
+    ramp_throttle_ms: u64,


Suggested change

     // Need pyo3 0.21+ for this to be std Duration
-    ramp_throttle_ms: u64,
+    ramp_throttle_millis: u64,

We do this in plenty of other places, it's fine for the bridge (we usually use _millis suffix)

Sure, it'd be nicer if it just worked though
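
For illustration, the Python side of such a field could convert a `timedelta` to whole milliseconds before handing it to the bridge. This is a minimal sketch; `to_millis` is a hypothetical helper, not a function from the SDK.

```python
from datetime import timedelta


def to_millis(duration: timedelta) -> int:
    """Convert a timedelta to whole milliseconds, rounding down."""
    # Floor-dividing one timedelta by another yields an int.
    return duration // timedelta(milliseconds=1)
```

A value such as `to_millis(timedelta(milliseconds=50))` would then be what a `ramp_throttle_millis: u64` field receives.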

cretz · 2024-06-25T14:10:42Z

temporalio/worker/_tuning.py

+
+
+@dataclass(frozen=True)
+class ResourceBasedTunerOptions:


Suggested change

-class ResourceBasedTunerOptions:
+class ResourceBasedTunerConfig:

While we try to use kwargs as much as possible, I can understand the use case here, but I think we should suffix it with Config (we've done this in a few other places)

cretz · 2024-06-25T14:20:41Z

temporalio/worker/_tuning.py

+        raise NotImplementedError
+
+
+class ResourceBasedTuner(WorkerTuner):


I am not sure this should exist now that I think about it. I think this can be on worker tuner:

    class WorkerTuner(ABC):
        @staticmethod
        def create_resource_based(
            *,
            target_memory_usage: float,
            target_cpu_usage: float,
            workflow_slot_config: Optional[ResourceBasedSlotConfig] = None,
            activity_slot_config: Optional[ResourceBasedSlotConfig] = None,
            local_activity_slot_config: Optional[ResourceBasedSlotConfig] = None,
        ) -> WorkerTuner:
            # Create composite tuner

I have no strong opinion on those trailing 3 kwarg names (e.g. can be workflow_config:)

K, I can go with that
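
A runnable sketch of the factory proposed above might look like the following. It is not the merged implementation: the `minimum_slots` and `maximum_slots` fields, the `_ResourceBasedSupplier` class, and the `_CompositeTuner` class are assumptions made for illustration.

```python
from __future__ import annotations

from abc import ABC
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResourceBasedSlotConfig:
    """Per-slot-type settings. The field names here are assumptions."""

    minimum_slots: Optional[int] = None
    maximum_slots: Optional[int] = None


@dataclass(frozen=True)
class _ResourceBasedSupplier:
    """Hypothetical supplier: shared resource targets plus per-slot settings."""

    target_memory_usage: float
    target_cpu_usage: float
    slot_config: ResourceBasedSlotConfig


class WorkerTuner(ABC):
    @staticmethod
    def create_resource_based(
        *,
        target_memory_usage: float,
        target_cpu_usage: float,
        workflow_slot_config: Optional[ResourceBasedSlotConfig] = None,
        activity_slot_config: Optional[ResourceBasedSlotConfig] = None,
        local_activity_slot_config: Optional[ResourceBasedSlotConfig] = None,
    ) -> WorkerTuner:
        def supplier(
            config: Optional[ResourceBasedSlotConfig],
        ) -> _ResourceBasedSupplier:
            # Every supplier gets the same targets, so no slot type is starved.
            return _ResourceBasedSupplier(
                target_memory_usage,
                target_cpu_usage,
                config or ResourceBasedSlotConfig(),
            )

        return _CompositeTuner(
            supplier(workflow_slot_config),
            supplier(activity_slot_config),
            supplier(local_activity_slot_config),
        )


@dataclass(frozen=True)
class _CompositeTuner(WorkerTuner):
    """Private implementation; callers only ever see WorkerTuner."""

    workflow_supplier: _ResourceBasedSupplier
    activity_supplier: _ResourceBasedSupplier
    local_activity_supplier: _ResourceBasedSupplier
```

Because the targets are passed once and copied into all three suppliers, the "same options" rule enforced by the bridge holds by construction.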

cretz · 2024-06-25T14:25:43Z

temporalio/worker/_tuning.py

+        The resource based tuner is currently experimental.
+    """
+
+    slot_options: ResourceBasedSlotOptions


This technically doesn't need to be required, though I am ok that it is.

cretz · 2024-06-25T14:27:27Z

temporalio/worker/_tuning.py

+
+
+@dataclass(frozen=True)
+class CompositeTuner(WorkerTuner):


I am now thinking that this class doesn't need to be exposed either but instead can also be a static method on worker tuner that accepts these three kwargs and returns a worker tuner. There is no value from a user POV to expose a new type vs just a method to make a worker tuner from these three values.

(note, I will disagree with myself here when it comes to .NET)
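
The idea of hiding the composite type behind a static method can be sketched as below. `create_composite`, `FixedSizeSlotSupplier`, and `_CompositeTuner` are hypothetical names used only to show the shape of the API, under the assumption that the three kwargs are one supplier per slot type.

```python
from __future__ import annotations

from abc import ABC
from dataclasses import dataclass


@dataclass(frozen=True)
class FixedSizeSlotSupplier:
    """Hypothetical supplier that hands out at most num_slots slots."""

    num_slots: int


class WorkerTuner(ABC):
    @staticmethod
    def create_composite(
        *,
        workflow_supplier: FixedSizeSlotSupplier,
        activity_supplier: FixedSizeSlotSupplier,
        local_activity_supplier: FixedSizeSlotSupplier,
    ) -> WorkerTuner:
        # The concrete class stays private; users only depend on WorkerTuner.
        return _CompositeTuner(
            workflow_supplier, activity_supplier, local_activity_supplier
        )


@dataclass(frozen=True)
class _CompositeTuner(WorkerTuner):
    workflow_supplier: FixedSizeSlotSupplier
    activity_supplier: FixedSizeSlotSupplier
    local_activity_supplier: FixedSizeSlotSupplier
```

With this shape, the public surface grows by one method instead of one class, which is the trade-off the comment argues for.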

cretz · 2024-06-25T14:29:03Z

temporalio/worker/_worker.py

+            if isinstance(max_workers, int) and max_workers < (
+                max_concurrent_activities or 0
+            ):


I wonder if there is a way for us to introspect the tuner at this time to find its max activities
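
One way such introspection could work is sketched here. The supplier classes and both helper functions are hypothetical; `_max_workers` is a private CPython attribute of `ThreadPoolExecutor`, so the sketch reads it defensively with `getattr` and skips the check when it is unavailable.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class FixedSizeSlotSupplier:
    """Hypothetical supplier with a hard slot count."""

    num_slots: int


@dataclass(frozen=True)
class ResourceBasedSlotSupplier:
    """Hypothetical supplier whose slot count floats up to maximum_slots."""

    maximum_slots: int


ActivitySupplier = Union[FixedSizeSlotSupplier, ResourceBasedSlotSupplier]


def max_activity_slots(supplier: ActivitySupplier) -> int:
    """Return the most activity slots the supplier could ever hand out."""
    if isinstance(supplier, FixedSizeSlotSupplier):
        return supplier.num_slots
    return supplier.maximum_slots


def check_executor_capacity(executor: Executor, supplier: ActivitySupplier) -> None:
    """Raise ValueError if the executor cannot run every activity slot at once."""
    # Private attribute: best effort only, absent on custom executors.
    max_workers = getattr(executor, "_max_workers", None)
    needed = max_activity_slots(supplier)
    if isinstance(max_workers, int) and max_workers < needed:
        raise ValueError(
            f"Activity executor has {max_workers} workers but the tuner may "
            f"allow up to {needed} concurrent activities"
        )
```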

cretz · 2024-06-25T14:33:15Z

temporalio/worker/_worker.py

+        activity_slot_supplier: temporalio.bridge.worker.SlotSupplier
+        local_activity_slot_supplier: temporalio.bridge.worker.SlotSupplier
+
+        if tuner is not None:


Another approach is to just have WorkerTuner have a private _create_bridge_tuner method, so something like:

    if tuner:
        if max_concurrent_workflow_tasks or max_concurrent_activities or max_concurrent_local_activities:
            raise ValueError(
                "Cannot specify max_concurrent_workflow_tasks, max_concurrent_activities, "
                "or max_concurrent_local_activities when also specifying tuner"
            )
    else:
        tuner = WorkerTuner.create_fixed(
            max_concurrent_workflow_tasks=max_concurrent_workflow_tasks,
            max_concurrent_activities=max_concurrent_activities,
            max_concurrent_local_activities=max_concurrent_local_activities,
        )
    bridge_tuner = tuner._create_bridge_tuner()

That at least moves the defaults and all tuning code to one place and only requires importing the WorkerTuner
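
The mutual-exclusion and defaulting logic described above can be exercised with a self-contained sketch. The simplified `WorkerTuner` here, the `resolve_tuner` helper, and the `DEFAULT_SLOTS` value are all assumptions for illustration and do not reflect the SDK's actual defaults or bridge types.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

DEFAULT_SLOTS = 100  # assumed default, for illustration only


@dataclass(frozen=True)
class WorkerTuner:
    """Simplified hypothetical tuner: one fixed slot limit per task type."""

    workflow_slots: int
    activity_slots: int
    local_activity_slots: int

    @staticmethod
    def create_fixed(
        *,
        max_concurrent_workflow_tasks: Optional[int],
        max_concurrent_activities: Optional[int],
        max_concurrent_local_activities: Optional[int],
    ) -> WorkerTuner:
        def or_default(value: Optional[int]) -> int:
            return DEFAULT_SLOTS if value is None else value

        # All defaults are applied in this one place.
        return WorkerTuner(
            or_default(max_concurrent_workflow_tasks),
            or_default(max_concurrent_activities),
            or_default(max_concurrent_local_activities),
        )


def resolve_tuner(
    tuner: Optional[WorkerTuner] = None,
    *,
    max_concurrent_workflow_tasks: Optional[int] = None,
    max_concurrent_activities: Optional[int] = None,
    max_concurrent_local_activities: Optional[int] = None,
) -> WorkerTuner:
    """Return the explicit tuner, or build a fixed one from the limits."""
    limits = (
        max_concurrent_workflow_tasks,
        max_concurrent_activities,
        max_concurrent_local_activities,
    )
    if tuner is not None:
        if any(limit is not None for limit in limits):
            raise ValueError(
                "Cannot specify max_concurrent_workflow_tasks, "
                "max_concurrent_activities, or max_concurrent_local_activities "
                "when also specifying tuner"
            )
        return tuner
    return WorkerTuner.create_fixed(
        max_concurrent_workflow_tasks=max_concurrent_workflow_tasks,
        max_concurrent_activities=max_concurrent_activities,
        max_concurrent_local_activities=max_concurrent_local_activities,
    )
```

Unlike the truthiness test in the comment, this sketch compares against `None`, so an explicit limit of `0` still counts as "specified".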

cretz

LGTM, just one minor class name suggestion

cretz · 2024-06-25T18:09:10Z

temporalio/worker/_tuning.py

+
+
+@dataclass(frozen=True)
+class ResourceBasedSlotOptions:


Suggested change

-class ResourceBasedSlotOptions:
+class ResourceBasedSlotConfig:

Let's change this to Config too

Sushisource added 6 commits June 24, 2024 10:47

Autotune progress from laptop

56ef361

Conversions are compiling / basic tests running

1929bda

Added basic tests

80e41ee

Add docstrings

3e2a000

Fix the mutual exclusion test

cf25933

Proper default specification for slot options

bcd8262

Sushisource requested a review from a team as a code owner June 24, 2024 23:07

cretz reviewed Jun 25, 2024

Sushisource added 3 commits June 25, 2024 09:55

Rename options->config

cd18bb2

Expose fewer things

0c5cedf

Augment check for max workers

fc4a86d

cretz reviewed Jun 25, 2024

One last rename

b20cacd

cretz approved these changes Jun 26, 2024

Merge branch 'main' into autotune

abe8c6f

Sushisource enabled auto-merge (squash) June 26, 2024 18:10

Sushisource merged commit 7ac4445 into main Jun 26, 2024
12 checks passed

Sushisource deleted the autotune branch June 26, 2024 18:24

gregbrowndev mentioned this pull request Oct 9, 2024

[Feature Request] Activity specific worker tuning #663

Closed
