Expose resource based auto-tuner options #559

Merged on Jun 26, 2024 (11 commits)

Conversation

Sushisource
Member

What was changed

Expose resource based auto-tuner

Why?

Adding in all SDKs

Checklist

  1. Closes

  2. How was this tested:
    Added some tests, manual verification with omes / existing core tests

  3. Any docs updates needed?
    Will be after experimental phase

@Sushisource Sushisource requested a review from a team as a code owner June 24, 2024 23:07
Comment on lines +17 to +19
pyo3 = { version = "0.20", features = ["extension-module", "abi3-py38", "anyhow"] }
pyo3-asyncio = { version = "0.20", features = ["tokio-runtime"] }
pythonize = "0.20"
Member

@cretz cretz Jun 25, 2024

What was the impetus for this upgrade? Is it just for the enum? Not that I disagree, just want to understand. Also, why not upgrade to latest version? (I am assuming latest involves more refactoring?)

Member Author

I wanted the Duration conversion, but that's only in the latest pyo3, so I upgraded as far as I could. No pyo3-asyncio release supports the latest version.

Member

Why update at all if you couldn't update enough to get the feature you wanted? (not that I disagree w/ an update for update's sake, just checking if there was something else in 0.20 you're leveraging)

@@ -73,7 +105,7 @@ pub fn new_worker(
         config,
         client.retry_client.clone().into_inner(),
     )
-    .map_err(|err| PyValueError::new_err(format!("Failed creating worker: {}", err)))?;
+    .context("Failed creating worker")?;
Member

Just to understand, can you confirm with these context changes what the error string looks like to Python users now? I just want to see exactly what the concatenation looks like since it's not explicit anymore.

Member Author

Yeah, I'll double check what it looks like, but I doubt it's materially different

Comment on lines +343 to +347
if !all_are_same {
return Err(PyValueError::new_err(
"All resource-based slot suppliers must have the same ResourceBasedTunerOptions",
));
}
Member

Is this a temporary limitation?

Member Author

No. It always needs to be this way, otherwise whatever slot type has lower targets could be completely starved out by the one with higher targets. Since you're targeting system resource use, it doesn't really make sense to have different targets within the process.
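
The same constraint can be sketched on the Python side. This is only an illustration, not the bridge code: `TunerOptions`, `ResourceBasedSupplier`, `FixedSupplier`, and `shared_tuner_options` are hypothetical names, and fixed-size suppliers are assumed to be exempt from the check.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TunerOptions:
    """Hypothetical stand-in for ResourceBasedTunerOptions."""

    target_memory_usage: float
    target_cpu_usage: float


@dataclass(frozen=True)
class ResourceBasedSupplier:
    """Hypothetical resource-based slot supplier carrying its tuner options."""

    tuner_options: TunerOptions


@dataclass(frozen=True)
class FixedSupplier:
    """Hypothetical fixed-size slot supplier; it has no tuner options."""

    num_slots: int


def shared_tuner_options(suppliers: Sequence[object]) -> Optional[TunerOptions]:
    """Return the one set of options shared by all resource-based suppliers.

    Returns None when no supplier is resource-based and raises ValueError
    when two resource-based suppliers disagree.
    """
    found = [s.tuner_options for s in suppliers if isinstance(s, ResourceBasedSupplier)]
    # Frozen dataclasses compare by value, so equal targets count as "the same".
    if any(opts != found[0] for opts in found[1:]):
        raise ValueError(
            "All resource-based slot suppliers must have the same ResourceBasedTunerOptions"
        )
    return found[0] if found else None
```

Because the targets describe the whole process, the check compares values rather than object identity: two separately built option objects with equal targets pass.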

Member

This makes sense, I just wanted to confirm before some of my other suggestions in this PR

Comment on lines +76 to +77
// Need pyo3 0.21+ for this to be std Duration
ramp_throttle_ms: u64,
Member

Suggested change
- // Need pyo3 0.21+ for this to be std Duration
- ramp_throttle_ms: u64,
+ ramp_throttle_millis: u64,

We do this in plenty of other places, it's fine for the bridge (we usually use _millis suffix)

Member Author

Sure, it'd be nicer if it just worked though
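
For context, a minimal sketch of what the Python caller would do before handing a duration to a `_millis` field on the bridge. The helper name `to_millis` is an assumption, not something from this PR, and rejecting negative values is a choice made here because the Rust field is a `u64`.

```python
from datetime import timedelta


def to_millis(duration: timedelta) -> int:
    """Convert a timedelta to whole milliseconds, rejecting negative values."""
    if duration < timedelta(0):
        raise ValueError("duration must not be negative")
    # Floor-dividing one timedelta by another yields an int, so no float
    # rounding is involved; sub-millisecond remainders are truncated.
    return duration // timedelta(milliseconds=1)
```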



@dataclass(frozen=True)
class ResourceBasedTunerOptions:
Member

Suggested change
- class ResourceBasedTunerOptions:
+ class ResourceBasedTunerConfig:

While we try to use kwargs as much as possible, I can understand the use case here, but I think we should suffix it with Config (we've done this in a few other places)

raise NotImplementedError


class ResourceBasedTuner(WorkerTuner):
Member

I am not sure this should exist now that I think about it. I think this can be on worker tuner:

class WorkerTuner(ABC):
    @staticmethod
    def create_resource_based(
        *,
        target_memory_usage: float,
        target_cpu_usage: float,
        workflow_slot_config: Optional[ResourceBasedSlotConfig] = None,
        activity_slot_config: Optional[ResourceBasedSlotConfig] = None,
        local_activity_slot_config: Optional[ResourceBasedSlotConfig] = None,
    ) -> WorkerTuner:
        # Create composite tuner
        ...

I have no strong opinion on those trailing 3 kwarg names (e.g. can be workflow_config:)

Member Author

K, I can go with that
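
A self-contained sketch of how the proposed factory could be filled in. The kwarg names come from the suggestion above; the fields of `ResourceBasedSlotConfig`, the `ResourceBasedSlotSupplier` type, and the private `_CompositeTuner` are assumptions made for illustration and may not match the merged code.

```python
from abc import ABC
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResourceBasedTunerConfig:
    """Process-wide resource targets, expressed as fractions."""

    target_memory_usage: float
    target_cpu_usage: float


@dataclass(frozen=True)
class ResourceBasedSlotConfig:
    """Per-slot-type settings; these field names are assumptions."""

    minimum_slots: Optional[int] = None
    maximum_slots: Optional[int] = None


@dataclass(frozen=True)
class ResourceBasedSlotSupplier:
    """Pairs one slot type's settings with the shared tuner config."""

    slot_config: ResourceBasedSlotConfig
    tuner_config: ResourceBasedTunerConfig


class WorkerTuner(ABC):
    @staticmethod
    def create_resource_based(
        *,
        target_memory_usage: float,
        target_cpu_usage: float,
        workflow_slot_config: Optional[ResourceBasedSlotConfig] = None,
        activity_slot_config: Optional[ResourceBasedSlotConfig] = None,
        local_activity_slot_config: Optional[ResourceBasedSlotConfig] = None,
    ) -> "WorkerTuner":
        # The tuner config is built once and shared by all three suppliers,
        # so the "same options everywhere" rule holds by construction.
        shared = ResourceBasedTunerConfig(target_memory_usage, target_cpu_usage)

        def supplier(cfg: Optional[ResourceBasedSlotConfig]) -> ResourceBasedSlotSupplier:
            return ResourceBasedSlotSupplier(cfg or ResourceBasedSlotConfig(), shared)

        return _CompositeTuner(
            workflow_supplier=supplier(workflow_slot_config),
            activity_supplier=supplier(activity_slot_config),
            local_activity_supplier=supplier(local_activity_slot_config),
        )


@dataclass(frozen=True)
class _CompositeTuner(WorkerTuner):
    """Private implementation; callers only ever see WorkerTuner."""

    workflow_supplier: ResourceBasedSlotSupplier
    activity_supplier: ResourceBasedSlotSupplier
    local_activity_supplier: ResourceBasedSlotSupplier
```

A call such as `WorkerTuner.create_resource_based(target_memory_usage=0.8, target_cpu_usage=0.9)` would then give the caller a `WorkerTuner` without exposing any new public type.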

The resource based tuner is currently experimental.
"""

slot_options: ResourceBasedSlotOptions
Member

This technically doesn't need to be required, though I am ok that it is.



@dataclass(frozen=True)
class CompositeTuner(WorkerTuner):
Member

@cretz cretz Jun 25, 2024

I am now thinking that this class doesn't need to be exposed either but instead can also be a static method on worker tuner that accepts these three kwargs and returns a worker tuner. There is no value from a user POV to expose a new type vs just a method to make a worker tuner from these three values.

(note, I will disagree with myself here when it comes to .NET)

Comment on lines 286 to 288
if isinstance(max_workers, int) and max_workers < (
max_concurrent_activities or 0
):
Member

I wonder if there is a way for us to introspect the tuner at this time to find its max activities
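
One way such introspection might look, purely as a sketch: it assumes the activity slot supplier is either a fixed-size supplier with a `num_slots` field or a resource-based one with an optional `maximum_slots` field. Those types, field names, and both helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class FixedSizeSlotSupplier:
    """Hypothetical supplier that always offers exactly num_slots slots."""

    num_slots: int


@dataclass(frozen=True)
class ResourceBasedSlotSupplier:
    """Hypothetical supplier whose upper bound may be unset."""

    maximum_slots: Optional[int] = None


SlotSupplier = Union[FixedSizeSlotSupplier, ResourceBasedSlotSupplier]


def max_activity_slots(supplier: SlotSupplier) -> Optional[int]:
    """Return the supplier's upper bound on slots, or None if unbounded."""
    if isinstance(supplier, FixedSizeSlotSupplier):
        return supplier.num_slots
    return supplier.maximum_slots


def check_executor_capacity(max_workers: Optional[int], supplier: SlotSupplier) -> None:
    """Raise if the executor cannot run as many activities as the tuner allows."""
    limit = max_activity_slots(supplier)
    # When either side is unknown there is nothing to compare, so skip the check.
    if isinstance(max_workers, int) and limit is not None and max_workers < limit:
        raise ValueError(
            f"Activity executor has {max_workers} workers but the tuner allows {limit} activities"
        )
```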

activity_slot_supplier: temporalio.bridge.worker.SlotSupplier
local_activity_slot_supplier: temporalio.bridge.worker.SlotSupplier

if tuner is not None:
Member

@cretz cretz Jun 25, 2024

Another approach is to just have WorkerTuner have a private _create_bridge_tuner method, so something like:

if tuner:
    if max_concurrent_workflow_tasks or max_concurrent_activities or max_concurrent_local_activities:
        raise ValueError(
            "Cannot specify max_concurrent_workflow_tasks, max_concurrent_activities, "
            "or max_concurrent_local_activities when also specifying tuner"
        )
else:
    tuner = WorkerTuner.create_fixed(
        max_concurrent_workflow_tasks=max_concurrent_workflow_tasks,
        max_concurrent_activities=max_concurrent_activities,
        max_concurrent_local_activities=max_concurrent_local_activities,
    )
bridge_tuner = tuner._create_bridge_tuner()

That at least moves the defaults and all tuning code to one place and only requires importing the WorkerTuner
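
A runnable sketch of that flow, under stated assumptions: `BridgeTuner` stands in for whatever the bridge expects, `_FixedTuner` and `resolve_bridge_tuner` are hypothetical names, and the default of 100 slots is a placeholder rather than a value taken from this thread. It also checks `is not None` instead of truthiness, so an explicit `0` still counts as "specified".

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional

# Assumed default; the thread does not state the real value.
_DEFAULT_SLOTS = 100


@dataclass(frozen=True)
class BridgeTuner:
    """Hypothetical stand-in for the bridge-level tuner configuration."""

    workflow_slots: int
    activity_slots: int
    local_activity_slots: int


def _or_default(value: Optional[int]) -> int:
    return _DEFAULT_SLOTS if value is None else value


class WorkerTuner(ABC):
    @staticmethod
    def create_fixed(
        *,
        max_concurrent_workflow_tasks: Optional[int] = None,
        max_concurrent_activities: Optional[int] = None,
        max_concurrent_local_activities: Optional[int] = None,
    ) -> "WorkerTuner":
        # Defaults are applied here, so they live in exactly one place.
        return _FixedTuner(
            _or_default(max_concurrent_workflow_tasks),
            _or_default(max_concurrent_activities),
            _or_default(max_concurrent_local_activities),
        )

    @abstractmethod
    def _create_bridge_tuner(self) -> BridgeTuner:
        raise NotImplementedError


@dataclass(frozen=True)
class _FixedTuner(WorkerTuner):
    workflow_slots: int
    activity_slots: int
    local_activity_slots: int

    def _create_bridge_tuner(self) -> BridgeTuner:
        return BridgeTuner(self.workflow_slots, self.activity_slots, self.local_activity_slots)


def resolve_bridge_tuner(
    tuner: Optional[WorkerTuner] = None,
    *,
    max_concurrent_workflow_tasks: Optional[int] = None,
    max_concurrent_activities: Optional[int] = None,
    max_concurrent_local_activities: Optional[int] = None,
) -> BridgeTuner:
    """Accept either a tuner or the legacy limits, never both."""
    limits = (
        max_concurrent_workflow_tasks,
        max_concurrent_activities,
        max_concurrent_local_activities,
    )
    if tuner is not None:
        if any(limit is not None for limit in limits):
            raise ValueError(
                "Cannot specify max_concurrent_workflow_tasks, max_concurrent_activities, "
                "or max_concurrent_local_activities when also specifying tuner"
            )
    else:
        tuner = WorkerTuner.create_fixed(
            max_concurrent_workflow_tasks=max_concurrent_workflow_tasks,
            max_concurrent_activities=max_concurrent_activities,
            max_concurrent_local_activities=max_concurrent_local_activities,
        )
    return tuner._create_bridge_tuner()
```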

Member

@cretz cretz left a comment

LGTM, just one minor class name suggestion



@dataclass(frozen=True)
class ResourceBasedSlotOptions:
Member

Suggested change
- class ResourceBasedSlotOptions:
+ class ResourceBasedSlotConfig:

Let's change this to Config too

@Sushisource Sushisource enabled auto-merge (squash) June 26, 2024 18:10
@Sushisource Sushisource merged commit 7ac4445 into main Jun 26, 2024
12 checks passed
@Sushisource Sushisource deleted the autotune branch June 26, 2024 18:24
2 participants