Collaborator

@jwhite242 jwhite242 commented Aug 15, 2025

Expands available options to pass through to flux, replacing the args dictionary in the batch block which only passed through shell options (--setopt/-o) to the launcher (flux run). Adds two dictionaries to the batch block:

  • allocation_args: enable attaching arbitrary numbers of options to the allocation (jobspec) corresponding to the cli equivalents: -o/--setopt, -S/--setattr, and --conf
  • launcher_args: replaces args for attaching to launcher, but enables passing through any option prefix, not just -o/--setopt

Deprecates args, but leaves it intact as the only option for older flux adapters (< 0.49) pending future removal.

Adds flux directives with all allocation options to serialized step scripts to enable reproducibility and direct cli submission by users to match capability of other script adapters.

Exposes new exclusive syntax for step keys to enable separate allocation/launcher control: only flux uses the launcher option, other schedulers to follow.

Adds script writing and job submission tests for both flux and slurm.

Comment on lines 60 to 61
| Multi-letter | `--` | `=` | `setopt: {foo: bar} | `--setopt=foo=bar` |
| Boolean flag w/key | as above | as above | `setopt: {foobar: } | `--setopt=foobar` |
Member

Missing backticks in the 4th column here, which is making the formatting on mkdocs all wonky

Member

@bgunnar5 bgunnar5 left a comment

Not sure if this branch is totally done yet but I had some time to do a review now.

Comment on lines 110 to 115
# Handle old scalar syntax which applied to allocations only
if not isinstance(step_exclusive, dict):
    return {
        "allocation": step_exclusive,
        "launcher": False,
    }
Member

Do we want to deprecate this in the future? If so, maybe we should add a warning log statement here to notify users.

Collaborator Author

That is a good question. Should we deprecate this shorthand? It is more often useful at the allocation level (scheduling to core scheduled partitions being one of the few use cases I can think of), so maybe useful to keep it. But I've no strong opinions on it either way.
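
To make the trade-off in this thread concrete, here is a minimal sketch of the normalization with the suggested log statement added. The function name `normalize_exclusive`, the message text, and the handling of partially filled dicts are assumptions for illustration, not the code in this PR.

```python
import logging

LOGGER = logging.getLogger(__name__)


def normalize_exclusive(step_exclusive):
    """Return exclusive settings as a dict with 'allocation' and 'launcher' keys."""
    if not isinstance(step_exclusive, dict):
        # Old scalar syntax: applies to the allocation only.
        # Hypothetical notice; only worth logging if the shorthand is deprecated.
        LOGGER.warning(
            "Scalar 'exclusive' applies to the allocation only; "
            "use the dict form to control the launcher."
        )
        return {"allocation": step_exclusive, "launcher": False}

    # New dict syntax: assume missing keys default to False.
    return {"allocation": False, "launcher": False, **step_exclusive}
```

If the shorthand stays supported, the warning could simply be dropped or lowered to debug level.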

return cls.known_alloc_arg_types

@classmethod
def addtl_alloc_arg_type_map(cls, option):
Member

May want to rename this to make it more obvious that this is a getter method. Maybe get_addtl_alloc_arg_type?

Collaborator Author

Yeah, that's probably good for this version. In the subsequent PRs that will generalize this, maybe something a little more descriptive like 'get_normalized..'? Though it'll take some iteration to keep it from being 30 chars long.

Comment on lines 127 to 134
# for av_name, av_value in arg_value.items():
# value_str = render_arg_value(av_name, av_value)
# yield "{prefix}{key}{sep}{value}".format(
# prefix=arg_info['prefix'],
# key=arg_key,
# sep=arg_info['sep'],
# value=value_str
# )
Member

Still needed? Same question goes for render_arg_value() above

addtl_batch_args = {}

# May want to also support setattr_shell_option at some point?
for batch_arg_type in ["attributes", "shell_options", "conf"]:
Member

use cls.addtl_alloc_arg_types instead of hardcoded list unless you have a specific reason for this?

Comment on lines +204 to +206
if conf_dict:
    LOGGER.warn("'conf' options not currently supported with "
                " nested=False. Ignoring.")
Member

is this a flux limitation or a limitation we're enforcing ourselves?

Collaborator Author

Not so much a limitation as part of the design. There's no argument for passing it in like with the from_nest and from_batch command methods, likely because unlike those two, this one doesn't create a nested broker to attach it to.

Longer term, I think this option in Maestro may get deprecated unless we find a good use case for it; its behavior is quite different from the from_nest_command option.

Member

got it, that makes sense

Comment on lines 126 to 128
self._attr_prefixes = ['S', 'setattr']
self._opts_prefixes = ['o', 'setopt']
self._conf_prefixes = ['conf'] # No abbreviated form for this in flux docs
Member

are these used? Seems like the same thing as the _allocation_args_map

Comment on lines 188 to 192
"nodes": f"{self._flux_directive}" + "-N {nodes}",
# NOTE: always use seconds to guard against upstream default behavior changes
"walltime": f"{self._flux_directive}" + "-t {walltime}s",
"queue": f"{self._flux_directive}" + "-q {queue}",
"bank": f"{self._flux_directive}" + "--bank {bank}",
Member

don't think you need '+' in these strings; can just use f"{self._flux_directive} -N {nodes}" and similar entries for walltime, queue, and bank

Collaborator Author

It's more to defer the rendering: walltime, nodes (and later queue/bank) are step-specific values, whereas the flux_directive is adapter-specific. The `+`-joined strings get rendered later, when writing out the batch script for individual steps.
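
A small sketch of that deferral, for readers following along. It assumes the directive prefix is `"#flux: "` (taken from the serialized script lines quoted later in this review); the `render_header` helper is hypothetical.

```python
flux_directive = "#flux: "  # adapter-specific, known when the adapter is built

# The f-string half is rendered immediately. The plain-string half keeps its
# {nodes}/{walltime} placeholders until str.format is called for a given step.
header_templates = {
    "nodes": f"{flux_directive}" + "-N {nodes}",
    "walltime": f"{flux_directive}" + "-t {walltime}s",
}


def render_header(step_values):
    """Fill in the step-specific values for each template that has one."""
    return [
        template.format(**step_values)
        for key, template in header_templates.items()
        if key in step_values
    ]


lines = render_header({"nodes": 2, "walltime": 300})
```

A single f-string with doubled braces, such as `f"{flux_directive}-N {{nodes}}"`, produces the same template string, so either spelling keeps the step values deferred.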

Comment on lines 291 to 292
# TODO: add better mechanism for tracking which args
# actually get used; dicts can't do this..
Member

dataclass use-case? 👀
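
One way the dataclass idea could look, as a rough sketch only: a thin wrapper that records which keys were read so the leftovers can be reported. `TrackedArgs`, `take`, and `unused` are hypothetical names, not part of the PR.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedArgs:
    """Wrap an args dict and remember which keys were actually read."""

    values: dict
    used: set = field(default_factory=set)

    def take(self, key, default=None):
        """Return the value for key and mark it as used if present."""
        if key in self.values:
            self.used.add(key)
            return self.values[key]
        return default

    def unused(self):
        """Keys supplied by the user that nothing consumed."""
        return sorted(set(self.values) - self.used)


args = TrackedArgs({"setopt": {"foo": "bar"}, "conf": {"resource.rediscover": "true"}})
args.take("setopt")
leftover = args.unused()
```

The leftover keys could then feed a single warning about ignored options instead of scattered per-key checks.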

Comment on lines 453 to 459
# Handle the exclusive flags, updating batch block settings (default)
# with what's set in the step
step_exclusive_given = "exclusive" in step.run
step_exclusive = self._exclusive
if step_exclusive_given:
    # Override the default with this step's setting
    step_exclusive.update(self.get_exclusive(step.run.get("exclusive", False)))
Member

@bgunnar5 bgunnar5 Sep 22, 2025

nearly identical code is in get_header and get_parallelize_command. Consider moving logic to a method to avoid duplication.
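
A possible shape for that shared method, sketched as a free function so it runs on its own. `get_exclusive` here is a simplified stand-in for the adapter's normalizer and `resolve_step_exclusive` is a hypothetical name.

```python
def get_exclusive(value):
    """Simplified stand-in for the adapter's normalizer (assumption)."""
    if isinstance(value, dict):
        return dict(value)
    return {"allocation": value, "launcher": False}


def resolve_step_exclusive(batch_default, step_run):
    """Merge a step's 'exclusive' setting over the batch-block default."""
    # Copy first so the batch-level default is not modified by one step.
    resolved = dict(batch_default)
    if "exclusive" in step_run:
        resolved.update(get_exclusive(step_run["exclusive"]))
    return resolved


default = {"allocation": False, "launcher": False}
step_settings = resolve_step_exclusive(default, {"exclusive": {"launcher": True}})
```

The copy is a deliberate choice in this sketch: updating the default dict in place would let one step's override carry over to later steps. Whether that matters for the PR depends on code not shown in this thread.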


# Set up the output directory.
out_dir = environment.remove("OUTPUT_PATH")
if output_path:
Member

should output_path be a default argument output_path=None since we have this check?

Member

@bgunnar5 bgunnar5 left a comment

Looks good. A lot of these suggestions are just removing commented code and print statements. Really nothing major

Comment on lines 57 to 62
| **Key Type** | **Prefix** | **Separator** | **Example YAML** | **Example CLI Input/Directive** |
| :- | :- | :- | :- | :- |
| Single letter | `-` | `" "` (space) | `o: {bar: 42}` | `-o bar=42` |
| Multi-letter | `--` | `=` | `setopt: {foo: bar}` | `--setopt=foo=bar` |
| Boolean flag w/key | as above | as above | `setopt: {foobar: }` | `--setopt=foobar` |
| Boolean flag w/o key | as above | as above | `exclusive: ` | `--exclusive` |
Member

In the Example YAML Column here, you may want each of these to be multi-line instead like users are used to seeing? I've no strong opinion on this, just thinking from users POV.

I've done this in a table before with html tags. So for that first example it'd become:

<pre><code><span>o:</span></br><span>  bar:</span></br><span>    42</span></code></pre>

Ugly in raw text but formats nicely in the table.
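
For reference, the mapping in the table quoted in this thread can be sketched in a few lines of Python. This is an illustration of the documented rules only, not the adapter's implementation; `render_args` is a hypothetical name, and it assumes empty YAML values load as `None`.

```python
def render_args(args):
    """Yield CLI-style option strings from a nested dict of options."""
    for key, value in args.items():
        # Single-letter keys use '-' and a space; longer keys use '--' and '='.
        prefix, sep = ("-", " ") if len(key) == 1 else ("--", "=")
        if value is None:
            # Boolean flag without a key, e.g. `exclusive:`
            yield f"{prefix}{key}"
            continue
        for name, val in value.items():
            if val is None:
                # Boolean flag with a key, e.g. `setopt: {foobar: }`
                yield f"{prefix}{key}{sep}{name}"
            else:
                yield f"{prefix}{key}{sep}{name}={val}"


rendered = list(render_args({
    "o": {"bar": 42},
    "setopt": {"foo": "bar", "foobar": None},
    "exclusive": None,
}))
```

Each entry in `rendered` corresponds to one row of the table's last column.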

### Extra Flux Args
----

As of :material-tag:`1.1.12`, the flux adapter takes advantage of new argument pass through for scheduler options that Maestro cannot abstract away. This is done via `allocation_args` and `launcher_args` in the batch block, which expand upon the previous `args` input which only applied to `$(LAUNCHER)`. There are some caveats here due to the way Maestro talks to flux. The current flux adapters all use the Python APIs from Flux to build the batch jobs, with the batch script being serialized separately instead of submitted directly as with the other schedulers. A consequence of this is that the `allocation_args` map to specific call points on that Python API, and thus the option pass through is not quite arbitrary. There are 4 currently supported options for allocations which cover a majority of use cases (open an issue and let us know if one you need isn't covered!):
Member

Suggested change
As of :material-tag:`1.1.12`, the flux adapter takes advantage of new argument pass through for scheduler options that Maestro cannot abstract away. This is done via `allocation_args` and `launcher_args` in the batch block, which expand upon the previous `args` input which only applied to `$(LAUNCHER)`. There are some caveats here due to the way Maestro talks to flux. The current flux adapters all use the Python APIs from Flux to build the batch jobs, with the batch script being serialized separately instead of submitted directly as with the other schedulers. A consequence of this is that the `allocation_args` map to specific call points on that Python API, and thus the option pass through is not quite arbitrary. There are 4 currently supported options for allocations which cover a majority of use cases (open an issue and let us know if there's one you need that isn't covered!):

      resource.rediscover: "true" # Use string "true" for Flux compatibility, not "True" or bool True
  launcher_args:
    setopt:
      optiona: # Boolean flag, no value needed
Member

is this supposed to be "option a" or "optional"?

Collaborator Author

optiona; just a made up name for some random option to demonstrate the syntax mapping from yaml to batch script

#flux: --setattr=foobar=whoops
#flux: --conf=resource.rediscover=true
flux run -n 1 -N 1 -c 1 --setopt=optiona myapplication
Member

same question, "option a" or "optional"? From this context I'm assuming the former in which case we may want to change the naming convention for clarity?

Comment on lines +264 to +266
.. note::
Should we have an enum for these or something vs random strings?
Member

I like this idea. Could probably help with safeguarding against unsupported options
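
A rough sketch of what that safeguard might look like, assuming "these" refers to the option-type strings that appear elsewhere in this review (`"attributes"`, `"shell_options"`, `"conf"`). The names `AllocArgType` and `parse_alloc_arg_type` are hypothetical.

```python
from enum import Enum


class AllocArgType(Enum):
    """Option types accepted for allocations (values assumed from this review)."""

    ATTRIBUTES = "attributes"
    SHELL_OPTIONS = "shell_options"
    CONF = "conf"


def parse_alloc_arg_type(name):
    """Convert a raw string to an AllocArgType, rejecting unsupported names."""
    try:
        return AllocArgType(name)
    except ValueError:
        supported = ", ".join(t.value for t in AllocArgType)
        raise ValueError(
            f"Unsupported option type {name!r}; expected one of: {supported}"
        ) from None
```

Iterating over `AllocArgType` would also give a single source of truth in place of a hardcoded list of strings.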

Comment on lines +473 to +475
# if addtl:
# args.append("-o")
# args.append(",".join(addtl))
Member

Suggested change
# if addtl:
# args.append("-o")
# args.append(",".join(addtl))

Comment on lines 108 to 114
if self._allocation_args:
    self._allocation_args = self._interface.normalize_additional_args(self._allocation_args)
    # pprint(f"{self._allocation_args=}")

if self._launcher_args:
    self._launcher_args = self._interface.normalize_additional_args(self._launcher_args)
    # pprint(f"{self._launcher_args=}")
Member

Suggested change
if self._allocation_args:
    self._allocation_args = self._interface.normalize_additional_args(self._allocation_args)
if self._launcher_args:
    self._launcher_args = self._interface.normalize_additional_args(self._launcher_args)

# "Flux URI must be specified in batch or stored in the "
# "environment under 'FLUX_URI'")

# NOTE: Host doesn't seem to matter for FLUX. sbatch assumes that the
Member

This __init__ method is getting pretty large. May want to consider splitting into separate methods in the future


alloc_eflags = [self._allocation_args.pop(ekey, None) for ekey in exclusive_keys]
if alloc_eflags:
    if step_exclusive:
Member

should this be if step_exclusive and self._exclusive['allocation'] like the launcher section below? If so, you may just want a common method for this shared logic

Collaborator Author

Thanks for commenting on this one. Looking through the rest of the exclusive handling, it appears these new modes aren't actually tested either, which is troublesome, so let me go add that and verify the right thing's happening here. The behaviors are a little different given that the old method treated exclusive as allocation only. I might defer some of the refactoring to one of the follow-ons. I think more of this needs to be in the script adapter base class, with more normalization done upstream, and set up for a more layered approach that adds the step keys in the header/writescript/submit funcs and only handles the batch block source in the init. Ugh, I've made a mess, lol.

Comment on lines 428 to 455
# Normalize the allocation args to api flux.job.JobspecV1 expects
packed_alloc_args = self._interface.pack_addtl_batch_args(self._allocation_args)
# print(f"{normalized_alloc_args=}")

# Set up placeholder for the queue/bank attributes if not already added by user
# in the allocation_args
# NOTE: should we flatten these treedict style? conf looks like no treedict, but
# the others look like they support it even via python api
# if "system" not in normalized_alloc_args["attributes"]:
# normalized_alloc_args["attributes"]["system"] = {}

# Add queue and bank
queue = self._batch["queue"]
if queue == "":
    queue = None
bank = self._batch["bank"]
if bank == "":
    bank = None

# if self._batch["queue"]:
# normalized_alloc_args["attributes"]["system"]["queue"] = self._batch["queue"]
# if self._batch["bank"]:
# # TODO: revisit whether it makes sense to add bank if queue is empty ->
# # nested brokers usually have neither, and bank falls through silently..
# normalized_alloc_args["attributes"]["system"]["bank"] = self._batch["bank"]

# pprint(f"Packed alloc args for {step.name}:")
# pprint(packed_alloc_args)
Member

Suggested change
# Normalize the allocation args to api flux.job.JobspecV1 expects
packed_alloc_args = self._interface.pack_addtl_batch_args(self._allocation_args)
# Add queue and bank
queue = self._batch["queue"]
if queue == "":
    queue = None
bank = self._batch["bank"]
if bank == "":
    bank = None

Member

@bgunnar5 bgunnar5 left a comment

Great work!
