Extra Flux Options #459
base: develop
…llocation control
…andling in flux interfaces
…gs. Sync up slurm with new syntax
docs/Maestro/scheduling.md
Outdated
| | Multi-letter | `--` | `=` | `setopt: {foo: bar} | `--setopt=foo=bar` | | ||
| | Boolean flag w/key | as above | as above | `setopt: {foobar: } | `--setopt=foobar` | |
Missing backticks in the 4th column here, which is making the formatting on mkdocs all wonky.
… through allocation and and launcher args
Not sure if this branch is totally done yet but I had some time to do a review now.
| # Handle old scalar syntax which applied to allocatios only | ||
| if not isinstance(step_exclusive, dict): | ||
| return { | ||
| "allocation": step_exclusive, | ||
| "launcher": False, | ||
| } |
Do we want to deprecate this in the future? If so, maybe we should add a warning log statement here to notify users.
That is a good question. Should we deprecate this shorthand? It is more often useful at the allocation level (scheduling to core scheduled partitions being one of the few use cases I can think of), so maybe useful to keep it. But I've no strong opinions on it either way.
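If the scalar shorthand is eventually deprecated, the warning suggested above could look roughly like the following. This is a minimal sketch, not the PR's implementation: `normalize_exclusive` is a hypothetical standalone name, the warning text is made up, and the handling of the dict form (missing keys default to `False`) is an assumption.

```python
import logging

LOGGER = logging.getLogger(__name__)


def normalize_exclusive(step_exclusive):
    """Return exclusive settings as {'allocation': ..., 'launcher': ...}."""
    # Old scalar syntax: applies to the allocation only.
    if not isinstance(step_exclusive, dict):
        LOGGER.warning(
            "Scalar 'exclusive' is shorthand for the allocation only; "
            "consider the explicit allocation/launcher mapping."
        )
        return {"allocation": step_exclusive, "launcher": False}

    # Assumption: keys missing from the mapping default to False.
    return {
        "allocation": step_exclusive.get("allocation", False),
        "launcher": step_exclusive.get("launcher", False),
    }
```

Keeping the shorthand while logging a notice lets existing specs keep working during any transition period.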
| return cls.known_alloc_arg_types | ||
|
|
||
| @classmethod | ||
| def addtl_alloc_arg_type_map(cls, option): |
May want to rename this to make it more obvious that this is a getter method. Maybe get_addtl_alloc_arg_type?
Yeah, that's probably good for this version. In the subsequent PRs that will generalize this, maybe something a little more descriptive like 'get_normalized..', though with some iteration so it's not 30 chars long.
| # for av_name, av_value in arg_value.items(): | ||
| # value_str = render_arg_value(av_name, av_value) | ||
| # yield "{prefix}{key}{sep}{value}".format( | ||
| # prefix=arg_info['prefix'], | ||
| # key=arg_key, | ||
| # sep=arg_info['sep'], | ||
| # value=value_str | ||
| # ) |
Still needed? Same question goes for render_arg_value() above
| addtl_batch_args = {} | ||
|
|
||
| # May want to also support setattr_shell_option at some point? | ||
| for batch_arg_type in ["attributes", "shell_options", "conf"]: |
use cls.addtl_alloc_arg_types instead of hardcoded list unless you have a specific reason for this?
| if conf_dict: | ||
| LOGGER.warn("'conf' options not currently supported with " | ||
| " nested=False. Ignoring.") |
is this a flux limitation or a limitation we're enforcing ourselves?
Not so much a limitation as part of the design. There's no argument for passing it in like with the from_nest and from_batch command methods, likely because unlike those two, this one doesn't create a nested broker to attach it to.
Longer term, I think this option in Maestro may get deprecated unless we find a good use case for it; its behavior is quite different from the from_nest_command option.
got it, that makes sense
| self._attr_prefixes = ['S', 'setattr'] | ||
| self._opts_prefixes = ['o', 'setopt'] | ||
| self._conf_prefixes = ['conf'] # No abbreviated form for this in flux docs |
are these used? Seems like the same thing as the _allocation_args_map
| "nodes": f"{self._flux_directive}" + "-N {nodes}", | ||
| # NOTE: always use seconds to guard against upstream default behavior changes | ||
| "walltime": f"{self._flux_directive}" + "-t {walltime}s", | ||
| "queue": f"{self._flux_directive}" + "-q {queue}", | ||
| "bank": f"{self._flux_directive}" + "--bank {bank}", |
don't think you need '+' in these strings; can just use f"{self._flux_directive} -N {nodes}" and similar entries for walltime, queue, and bank
It's more to defer them: walltime, nodes, and later queue/bank are step-specific values, whereas the flux_directive is adapter-specific. The '+'-joined strings get rendered later, when the batch script is written out for individual steps.
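The deferral described here can be sketched as follows. The directive prefix `"#flux: "` and the step values are assumptions for illustration; the point is that only the adapter-level prefix is interpolated immediately, while `{nodes}` and `{walltime}` stay as literal placeholders until `str.format` is called per step. Note that a plain `f"{flux_directive} -N {nodes}"` would try to evaluate `nodes` at definition time, so a single f-string would need doubled braces to get the same effect.

```python
# Assumption: adapter-level directive prefix, as in "#flux: --conf=..."
flux_directive = "#flux: "

# The adapter-specific part is filled in now; "{nodes}" and "{walltime}"
# remain literal placeholders in the resulting template strings.
header_templates = {
    "nodes": f"{flux_directive}" + "-N {nodes}",
    "walltime": f"{flux_directive}" + "-t {walltime}s",
}

# Equivalent single f-string: doubled braces escape the deferred field.
assert header_templates["nodes"] == f"{flux_directive}-N {{nodes}}"

# Later, when a specific step's script is written out:
step_values = {"nodes": 2, "walltime": 3600}
header = [tmpl.format(**step_values) for tmpl in header_templates.values()]
```

Either spelling works; the concatenated form makes the split between "rendered now" and "rendered later" visible at a glance.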
| # TODO: add better mechanism for tracking whicn args | ||
| # actually get used; dicts can't do this.. |
dataclass use-case? 👀
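One way a dataclass could track which args actually get used is sketched below. `TrackedArgs`, `take`, and `unused` are hypothetical names, not part of Maestro; the idea is simply to record each key as it is read so leftovers can be reported afterwards.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class TrackedArgs:
    """Wrap an args mapping and remember which keys were read."""

    values: Dict[str, Any] = field(default_factory=dict)
    used: Set[str] = field(default_factory=set)

    def take(self, key: str, default: Any = None) -> Any:
        """Return the value for key and mark it as used if present."""
        if key in self.values:
            self.used.add(key)
            return self.values[key]
        return default

    def unused(self) -> List[str]:
        """List keys that were supplied but never read."""
        return [key for key in self.values if key not in self.used]


args = TrackedArgs({"setopt": {"foo": "bar"}, "conf": {}})
args.take("setopt")
leftover = args.unused()  # keys the adapter never consumed
```

An adapter could log `leftover` as a warning so silently ignored options become visible to users.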
| # Handle the exclusive flags, updating batch block settings (default) | ||
| # with what's set in the step | ||
| step_exclusive_given = "exclusive" in step.run | ||
| step_exclusive = self._exclusive | ||
| if step_exclusive_given: | ||
| # Override the default with this step's setting | ||
| step_exclusive.update(self.get_exclusive(step.run.get("exclusive", False))) |
nearly identical code is in get_header and get_parallelize_command. Consider moving logic to a method to avoid duplication.
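A shared helper along these lines could replace the repeated blocks. This is a sketch under assumptions: `resolve_step_exclusive` is a hypothetical name, and `normalize` stands in for the adapter's `get_exclusive`. It also copies the batch-level defaults first; in the snippet as quoted, `step_exclusive = self._exclusive` followed by `.update(...)` would modify the shared default dict in place, which may not be intended.

```python
def resolve_step_exclusive(batch_exclusive, step_run, normalize):
    """Merge a step's 'exclusive' setting over the batch-block defaults."""
    resolved = dict(batch_exclusive)  # copy so the defaults are not mutated
    if "exclusive" in step_run:
        resolved.update(normalize(step_run["exclusive"]))
    return resolved


def _normalize(value):
    # Stand-in for get_exclusive: scalars apply to the allocation only.
    if isinstance(value, dict):
        return value
    return {"allocation": value, "launcher": False}


defaults = {"allocation": False, "launcher": False}
step = resolve_step_exclusive(defaults, {"exclusive": True}, _normalize)
untouched = resolve_step_exclusive(defaults, {}, _normalize)
```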
|
|
||
| # Set up the output directory. | ||
| out_dir = environment.remove("OUTPUT_PATH") | ||
| if output_path: |
should output_path be a default argument output_path=None since we have this check?
…eeding different formats
…gs ended up in completely different subtrees
… normalization of alloc args
Looks good. A lot of these suggestions are just removing commented code and print statements. Really nothing major
docs/Maestro/scheduling.md
Outdated
| | **Key Type** | **Prefix** | **Separator** | **Example YAML** | **Example CLI Input/Directive** | | ||
| | :- | :- | :- | :- | :- | | ||
| | Single letter | `-` | `" "` (space) | `o: {bar: 42}` | `-o bar=42` | | ||
| | Multi-letter | `--` | `=` | `setopt: {foo: bar}` | `--setopt=foo=bar` | | ||
| | Boolean flag w/key | as above | as above | `setopt: {foobar: }` | `--setopt=foobar` | | ||
| | Boolean flag w/o key | as above | as above | `exclusive: ` | `--exclusive` | |
In the Example YAML Column here, you may want each of these to be multi-line instead like users are used to seeing? I've no strong opinion on this, just thinking from users POV.
I've done this in a table before with html tags. So for that first example it'd become:
<pre><code><span>o:</span><br><span> bar:</span><br><span> 42</span></code></pre>
Ugly in raw text but formats nicely in the table.
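For reference, the prefix and separator rules in the table above can be sketched in a few lines. `render_option` is a hypothetical helper, not Maestro's actual rendering code, and it assumes an empty YAML value is loaded as `None`.

```python
def render_option(name, value=None):
    """Render one YAML entry into CLI-style option strings."""
    # Single-letter keys use "-" and a space; multi-letter keys use "--" and "=".
    prefix, sep = ("-", " ") if len(name) == 1 else ("--", "=")

    # Boolean flag without a key, e.g. `exclusive: `
    if value is None:
        return [f"{prefix}{name}"]

    # Nested mapping, e.g. `setopt: {foo: bar}` or `setopt: {foobar: }`
    if isinstance(value, dict):
        rendered = []
        for key, val in value.items():
            item = key if val is None else f"{key}={val}"
            rendered.append(f"{prefix}{name}{sep}{item}")
        return rendered

    # Plain scalar value
    return [f"{prefix}{name}{sep}{value}"]


examples = [
    render_option("o", {"bar": 42}),
    render_option("setopt", {"foo": "bar"}),
    render_option("setopt", {"foobar": None}),
    render_option("exclusive"),
]
```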
docs/Maestro/scheduling.md
Outdated
| ### Extra Flux Args | ||
| ---- | ||
|
|
||
| As of :material-tag:`1.1.12`, the flux adapter takes advantage of new argument pass through for scheduler options that Maestro cannot abstract away. This is done via `allocation_args` and `launcher_args` in the batch block, which expand upon the previous `args` input which only applied to `$(LAUNCHER)`. There are some caveat's here due to the way Maestro talks to flux. The current flux adapters all use the python api's from Flux to build the batch jobs, with the serialized batch script being serialized separately instead of submitted directly as with the other schedulers. A consequence of this is the `allocation_args` map to specific call points on that python api, and thus the option pass through is not quite arbitrary. There are 4 currently supported options for allocations which cover a majority of usecases (open an issue and let us know if one you need isn't covered!): |
| As of :material-tag:`1.1.12`, the flux adapter takes advantage of new argument pass through for scheduler options that Maestro cannot abstract away. This is done via `allocation_args` and `launcher_args` in the batch block, which expand upon the previous `args` input which only applied to `$(LAUNCHER)`. There are some caveat's here due to the way Maestro talks to flux. The current flux adapters all use the python api's from Flux to build the batch jobs, with the serialized batch script being serialized separately instead of submitted directly as with the other schedulers. A consequence of this is the `allocation_args` map to specific call points on that python api, and thus the option pass through is not quite arbitrary. There are 4 currently supported options for allocations which cover a majority of usecases (open an issue and let us know if one you need isn't covered!): | |
| As of :material-tag:`1.1.12`, the flux adapter takes advantage of new argument pass through for scheduler options that Maestro cannot abstract away. This is done via `allocation_args` and `launcher_args` in the batch block, which expand upon the previous `args` input which only applied to `$(LAUNCHER)`. There are some caveat's here due to the way Maestro talks to flux. The current flux adapters all use the python api's from Flux to build the batch jobs, with the serialized batch script being serialized separately instead of submitted directly as with the other schedulers. A consequence of this is the `allocation_args` map to specific call points on that python api, and thus the option pass through is not quite arbitrary. There are 4 currently supported options for allocations which cover a majority of usecases (open an issue and let us know if there's one you need that isn't covered!): |
docs/Maestro/scheduling.md
Outdated
| resource.rediscover: "true" # Use string "true" for Flux compatibility, not "True" or bool True | ||
| launcher_args: | ||
| setopt: | ||
| optiona: # Boolean flag, no value needed |
is this supposed to be "option a" or "optional"?
optiona; just a made up name for some random option to demonstrate the syntax mapping from yaml to batch script
| #flux: --setattr=foobar=whoops | ||
| #flux: --conf=resource.rediscover=true | ||
| flux run -n 1 -N 1 -c 1 --setopt=optiona myapplication |
same question, "option a" or "optional"? From this context I'm assuming the former in which case we may want to change the naming convention for clarity?
| .. note:: | ||
| Should we have an enum for these or something vs random strings? |
I like this idea. Could probably help with safeguarding against unsupported options
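A rough idea of what such an enum might look like, assuming "these" refers to the arg type names from the hardcoded list quoted earlier (`attributes`, `shell_options`, `conf`). `AllocArgType` and `parse_alloc_arg_type` are hypothetical names.

```python
from enum import Enum


class AllocArgType(str, Enum):
    """Allocation arg categories (names taken from the hardcoded list)."""

    ATTRIBUTES = "attributes"
    SHELL_OPTIONS = "shell_options"
    CONF = "conf"


def parse_alloc_arg_type(name):
    """Convert a raw string to AllocArgType, rejecting unsupported names."""
    try:
        return AllocArgType(name)
    except ValueError:
        supported = ", ".join(t.value for t in AllocArgType)
        raise ValueError(
            f"Unsupported allocation arg type {name!r}; "
            f"expected one of: {supported}"
        ) from None
```

Because the enum mixes in `str`, members still compare equal to the plain strings, so existing dictionary keys would keep working while unsupported names fail early with a clear message.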
| # if addtl: | ||
| # args.append("-o") | ||
| # args.append(",".join(addtl)) |
| # if addtl: | |
| # args.append("-o") | |
| # args.append(",".join(addtl)) |
| if self._allocation_args: | ||
| self._allocation_args = self._interface.normalize_additional_args(self._allocation_args) | ||
| # pprint(f"{self._allocation_args=}") | ||
|
|
||
| if self._launcher_args: | ||
| self._launcher_args = self._interface.normalize_additional_args(self._launcher_args) | ||
| # pprint(f"{self._launcher_args=}") |
| if self._allocation_args: | |
| self._allocation_args = self._interface.normalize_additional_args(self._allocation_args) | |
| # pprint(f"{self._allocation_args=}") | |
| if self._launcher_args: | |
| self._launcher_args = self._interface.normalize_additional_args(self._launcher_args) | |
| # pprint(f"{self._launcher_args=}") | |
| if self._allocation_args: | |
| self._allocation_args = self._interface.normalize_additional_args(self._allocation_args) | |
| if self._launcher_args: | |
| self._launcher_args = self._interface.normalize_additional_args(self._launcher_args) |
| # "Flux URI must be specified in batch or stored in the " | ||
| # "environment under 'FLUX_URI'") | ||
|
|
||
| # NOTE: Host doesn"t seem to matter for FLUX. sbatch assumes that the |
This __init__ method is getting pretty large. May want to consider splitting into separate methods in the future
|
|
||
| alloc_eflags = [self._allocation_args.pop(ekey, None) for ekey in exclusive_keys] | ||
| if alloc_eflags: | ||
| if step_exclusive: |
should this be if step_exclusive and self._exclusive['allocation'] like the launcher section below? If so, you may just want a common method for this shared logic
Thanks for commenting on this one. Looking through the rest of the exclusive handling, it appears these new modes aren't actually tested either, which is troublesome, so let me go add that and verify the right thing's happening here. The behaviors are a little different given that the old method was exclusive = allocation only. Might defer some of the refactoring to one of the follow-ons. I think more of this needs to be in the script adapter base class, with more normalization done upstream, and set up for a more layered approach: add the step keys in the header/writescript/submit funcs and only handle the batch block source in the init. Ugh, I've made a mess, lol.
| # Normalize the allocation args to api flux.job.JobspecV1 expects | ||
| packed_alloc_args = self._interface.pack_addtl_batch_args(self._allocation_args) | ||
| # print(f"{normalized_alloc_args=}") | ||
|
|
||
| # Setup placholder for the queue/bank attributes if no already added by user | ||
| # in the allocation_args | ||
| # NOTE: should we flatten these treedict style? conf looks like no treedict, but | ||
| # the others look like they support it even via python api | ||
| # if "system" not in normalized_alloc_args["attributes"]: | ||
| # normalized_alloc_args["attributes"]["system"] = {} | ||
|
|
||
| # Add queue and bank | ||
| queue = self._batch["queue"] | ||
| if queue == "": | ||
| queue = None | ||
| bank = self._batch["bank"] | ||
| if bank == "": | ||
| bank = None | ||
|
|
||
| # if self._batch["queue"]: | ||
| # normalized_alloc_args["attributes"]["system"]["queue"] = self._batch["queue"] | ||
| # if self._batch["bank"]: | ||
| # # TODO: revisit whether it makes sense to add bank if queue is empty -> | ||
| # # nested brokers usually have neither, and bank falls through silently.. | ||
| # normalized_alloc_args["attributes"]["system"]["bank"] = self._batch["bank"] | ||
|
|
||
| # pprint(f"Packed alloc args for {step.name}:") | ||
| # pprint(packed_alloc_args) |
| # Normalize the allocation args to api flux.job.JobspecV1 expects | |
| packed_alloc_args = self._interface.pack_addtl_batch_args(self._allocation_args) | |
| # print(f"{normalized_alloc_args=}") | |
| # Setup placholder for the queue/bank attributes if no already added by user | |
| # in the allocation_args | |
| # NOTE: should we flatten these treedict style? conf looks like no treedict, but | |
| # the others look like they support it even via python api | |
| # if "system" not in normalized_alloc_args["attributes"]: | |
| # normalized_alloc_args["attributes"]["system"] = {} | |
| # Add queue and bank | |
| queue = self._batch["queue"] | |
| if queue == "": | |
| queue = None | |
| bank = self._batch["bank"] | |
| if bank == "": | |
| bank = None | |
| # if self._batch["queue"]: | |
| # normalized_alloc_args["attributes"]["system"]["queue"] = self._batch["queue"] | |
| # if self._batch["bank"]: | |
| # # TODO: revisit whether it makes sense to add bank if queue is empty -> | |
| # # nested brokers usually have neither, and bank falls through silently.. | |
| # normalized_alloc_args["attributes"]["system"]["bank"] = self._batch["bank"] | |
| # pprint(f"Packed alloc args for {step.name}:") | |
| # pprint(packed_alloc_args) | |
| # Normalize the allocation args to api flux.job.JobspecV1 expects | |
| packed_alloc_args = self._interface.pack_addtl_batch_args(self._allocation_args) | |
| # Add queue and bank | |
| queue = self._batch["queue"] | |
| if queue == "": | |
| queue = None | |
| bank = self._batch["bank"] | |
| if bank == "": | |
| bank = None | |
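The repeated empty-string checks for queue and bank could also be folded into a tiny helper. This is only a sketch: `empty_to_none` is a hypothetical name and the `batch` values are made up for illustration.

```python
def empty_to_none(value):
    """Map an empty string to None; leave any other value unchanged."""
    return None if value == "" else value


batch = {"queue": "", "bank": "guests"}  # illustrative batch block values
queue = empty_to_none(batch["queue"])
bank = empty_to_none(batch["bank"])
```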
… defaults. Add additional script test case and slurm script tests.
Great work!
Expands available options to pass through to flux, replacing the `args` dictionary in the batch block, which only passed through shell options (`--setopt`/`-o`) to the launcher (`flux run`). Adds two dictionaries to the batch block:

- `allocation_args`: enables attaching arbitrary numbers of options to the allocation (jobspec) corresponding to the cli equivalents: `-o`/`--setopt`, `-S`/`--setattr`, and `--conf`
- `launcher_args`: replaces `args` for attaching to the launcher, but enables passing through any option prefix, not just `-o`/`--setopt`

Deprecates `args`, but leaves it intact as the only option for older flux adapters (< 0.49) pending future removal.

Adds flux directives with all allocation options to serialized step scripts to enable reproducibility and direct cli submission by users to match capability of other script adapters.

Exposes new exclusive syntax for step keys to enable separate allocation/launcher control: only flux uses the launcher option, other schedulers to follow.

Adds script writing and job submission tests for both flux and slurm.