Slurm Script Adapter Bug Fix #305
# If neither Procs nor Nodes exist, throw an error
procs = resources.get("procs")
nodes = resources.get("nodes")
if procs == None and nodes == None:
This condition won't work because `procs` and `nodes` are never set to `None`. You also can't achieve that with the `get` call because they will exist in the resources as empty strings (`""`). Try this conditional: `if not procs and not nodes:` -- this clause will work because Python equates `""` to a "False" value in an `if` clause.
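The difference between the two checks can be seen in a minimal sketch. The `resources` dict below is a hypothetical stand-in, assuming (as the comment above states) that unset entries default to empty strings:

```python
# Hypothetical resources mapping: unset keys hold "" rather than None.
resources = {"nodes": "", "procs": ""}

procs = resources.get("procs")
nodes = resources.get("nodes")

# The original check never fires, because "" is not None.
none_check = procs is None and nodes is None

# The suggested check fires, because "" is falsy in a boolean context.
falsy_check = not procs and not nodes

print(none_check, falsy_check)
```

The truthiness check also covers the case where a key is missing entirely, since `get` then returns `None`, which is falsy as well.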
"present for Script to proceed." | ||
LOGGER.error(err_msg) | ||
rt_err_msg = "No explicit resources specified. At least one of " | ||
"Nodes or Procs must be set to a non-zero value." |
The message is fine, but it could be clearer. When we talked earlier, I forgot to mention that we can use the step's entries to write a specific error message. In this case, make sure that nodes and procs are capitalized exactly as they are in the specification. You can also use the step to print the name of the step that's missing these entries so that the user knows exactly where to look.
"to be present for Script to proceed.".format(step.name)
LOGGER.error(err_msg)
rt_err_msg = "No explicit resources specified in {}. At least one of "
"(nodes) or (procs) must be set to a non-zero value.".format(step.name)
Sorry, I entirely mis-typed. When I said parentheses I totally meant quotations. Sorry about that. You can also use the same error message for both the logging and exception.
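A sketch of what this suggestion could look like: one message, naming the step and quoting the keys as they appear in the specification, reused for both the log entry and the exception. The `check_resources` helper and the `SimpleNamespace` step are hypothetical stand-ins, not the adapter's actual code:

```python
import logging
from types import SimpleNamespace

LOGGER = logging.getLogger(__name__)


def check_resources(step, resources):
    """Raise if neither 'nodes' nor 'procs' is set for the given step."""
    if not resources.get("procs") and not resources.get("nodes"):
        # Build the message once and reuse it for logging and the exception.
        err_msg = (
            'No explicit resources specified in {}. At least one of '
            '"nodes" or "procs" must be set to a non-zero value.'
        ).format(step.name)
        LOGGER.error(err_msg)
        raise RuntimeError(err_msg)


# Hypothetical step object; only the name attribute is needed here.
step = SimpleNamespace(name="run-sim")
try:
    check_resources(step, {"nodes": "", "procs": ""})
except RuntimeError as exc:
    message = str(exc)
```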
' value.'.format(step.name)
LOGGER.error(err_msg)
raise RuntimeError(err_msg)
elif procs or (procs and not nodes):
This conditional is slightly incorrect -- you need to check if `procs` is in the `batch` section, not if it's non-zero.
From my notes, the goal of this conditional check is to handle the case where SBATCH ntasks is not present even when `procs: 4` is in the step. So if `procs` is in the step, doesn't that mean it won't be in the batch section? In addition, `procs` is obtained via `resources.get('procs')`, and `resources` is composed of the batch and run dicts.
The `procs` key can be in both the batch and the step. So the only time that the `procs` part of the conditional will be false is if it's not set in either.
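As a rough illustration of a lookup over combined batch and step settings, here is a sketch using `collections.ChainMap`. The lookup order (step first, then batch) and the `lookup_procs` helper are assumptions made for the example, and the first three cases assume each section only contains the keys it actually sets:

```python
from collections import ChainMap


def lookup_procs(step_run, batch):
    """Look up 'procs' in the step entries first, then in the batch section."""
    resources = ChainMap(step_run, batch)
    return resources.get("procs")


only_batch = lookup_procs({}, {"procs": "8"})   # falls through to batch
only_step = lookup_procs({"procs": "4"}, {})    # found in the step
neither = lookup_procs({}, {})                  # set in neither section

# Caveat: an empty-string entry in the first mapping shadows a real
# value in the second one, so the result here is falsy.
shadowed = lookup_procs({"procs": ""}, {"procs": "8"})
```

With this layout the lookup alone cannot tell whether `procs` came from the batch section or from the step, which is why a separate membership check on the batch section is needed for that question.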
So I noticed that in the `__init__` you also check for the `procs` in the `batch` section. With this conditional here, you will add `procs` to the header twice. I don't think SLURM will throw an error, but the header should be as clean as possible.
In `__init__`, instead of adding the `--ntasks` key, add it here. Simply save the `--ntasks` header line in the `__init__` so that it's in the same place as everything else. Something like:
def __init__(self, **kwargs):
    ...
    ntask_header = "#SBATCH --ntasks={procs}"
    ...
A suggestion on the exception: move it to be right after the ChainMap construction -- that'll make it so the header throws the exception without the extra logic. If that exception case is true, we can't do anything.
After we do the loop for adding items to the header, `nodes` will be there if it's specified (it's set to `""` by default); otherwise we won't have `procs` in the header at all at this point. This is now where you should check for either the hint (`procs in self._batch`) or whether `nodes` is not set (because we need to give SLURM some sort of resource to use). In theory, you can even move the `exclusive` addition to be later as well so that you're handling resources in one block.
Does that make sense?
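Put together, the flow described above might look roughly like the sketch below. The class, its attribute names other than `self._batch`, and the way batch and step settings are merged are all hypothetical; it only illustrates the ordering of the steps (store the `--ntasks` line in `__init__`, raise right after the resources are built, then handle resources in one block):

```python
class SlurmHeaderSketch:
    """Hypothetical, stripped-down stand-in for the SLURM script adapter."""

    def __init__(self, **kwargs):
        # Batch settings as given by the user (assumed storage).
        self._batch = dict(kwargs)
        # Header templates kept together; --ntasks is stored, not yet added.
        self._header = {"nodes": "#SBATCH --nodes={nodes}"}
        self._ntask_header = "#SBATCH --ntasks={procs}"

    def get_header(self, step_name, step_run):
        # Combine batch and step settings; non-empty step values win.
        resources = dict(self._batch)
        resources.update({k: v for k, v in step_run.items() if v})

        procs = resources.get("procs")
        nodes = resources.get("nodes")

        # Fail early, right after the resources are constructed.
        if not procs and not nodes:
            raise RuntimeError(
                "No explicit resources specified in {}.".format(step_name)
            )

        # Loop over the regular header entries, skipping unset values.
        lines = [
            template.format(**resources)
            for key, template in self._header.items()
            if resources.get(key)
        ]

        # One resource block: add --ntasks exactly once, either because the
        # batch section hints at procs or because nodes is not set.
        if procs and ("procs" in self._batch or not nodes):
            lines.append(self._ntask_header.format(procs=procs))
        return lines


batch_hint = SlurmHeaderSketch(procs="8").get_header("sim", {"nodes": "2"})
step_only = SlurmHeaderSketch(nodes="").get_header("sim", {"procs": "4"})
```

Keeping the `--ntasks` decision in a single place is what prevents the line from being emitted twice.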
There's a subtle bug here too -- it looks like the |
In regards to |
@youngjeffrey -- Sorry, but stood up I mean when the |
In regards to my latest commit, |
Sorry it took me so long to get back, have been busy with other things. I found a case that doesn't work (see below). It turns out that the use of the
Do you think it's better to replace the |
Nodes now show up on the Later in the code, I slightly changed the conditional statement for adding |
Honestly, with how the |
We should also rebase your branch so that you get the latest tests, so that you can actually run the tests. |
@youngjeffrey -- I just made some commits to change the construction of the resources and capture if
…ptadapter.py, added extra conditional
…ption to get_header
7ef27c0 to 2939001
* Testing by adding comment
* Initial commit, reformatted checking for Nodes and Procs in slurmscriptadapter.py, added extra conditional
* Removed condition for checking Node, moved condition for raising exception to get_header
* Changed code for condition that handles no nodes or procs
* Tweaked the error message
* Minor variable mispelling
* Handled additional case of nodes absent and procs present to add procs to SBATCH
* In the case of Nodes absence and Procs presence, it will be added to SBATCH
* Nodes now appear in SBATCH header even if uninitialized in a step
* Changed Chain Map to regular Dictionary
* Remove extra loop and change resource construction.
* Cleared styling issues, removed time stamped outputs from Samples directory
* Some refactor clean up.

Co-authored-by: Frank Di Natale <[email protected]>
* Addition of user enabled workspace hashing (#145) * Addition of hashing to Study parameterization. * Addition of the hashws option to argparse. * Addition of a warning note for users who use labels in steps. * Update setup.py to 1.1.4dev * More generalized FluxScriptAdapter (#149) * Addition of a more general flux ScriptAdapter. * Addition of some casting from int to str * Corrected "gpus" to "ngpus" * Rework jobspec construction to make a valid jobspec. * Check for empty value for cores per task. * README tweak to update quickstart link. (#139) * typos. fixes #141 (#154) * Update to setup.py to reflect dev version 1.0 * Correction to safe pathing for missed cases and make_safe_path enhancements. (#157) * Made pickle and log path string safe for pathing. * Tweaks to make_safe_path to include a base path. * Updates to make_safe_path usage * Correction to not modify the iterator copy. * Correction to fix the format of output status time to avoid a comma that breaks printing. (#160) * Addition of a utility function for formatting times to H:M:S * _StepRecord time methods now call the new utility function. * Tweaks to add days to the format to avoid 3 digit hours. * Tweak to formatting. * Made the day format more parsable. * Removal of _stage_linear since it is now not needed. (#156) * Removal of _stage_linear since it is now not needed. * Addition of linear LULESH samples. * Update the dev to 1.1. * Addition of pargs for passing parameters to custom parameter generation (#152) * Addition of a utility method to create a dictionary from a list of key-value pairs. * Addition of the pargs interface for passing parameters to custom parameter generation. * Addition of a Monte Carlo example that accepts pargs. * Addition of pargs check for dependency on pgen. * Addition of clearer error message for malformed parameters. * Update setup.py * Added confirmation message after launching a study (#163) * Addition of tag to LULESH git dependency. 
(#169) * Script Adapter Plugin (#167) (#170) Fixes #167 * added pytest to requirements added Pipfile and pipenv settings * Added property key to Abstract.ScriptAdapter (#167) Also added impementation and tests to verify that existing functionality isn't changed * updated factory to use key when registering adapters(#167) * cleanedup linelength * cleaned up imports to be specific to module (#167) * added tests to verify exception for unknown adapter * moved adapters tests to individual files * added test to verify scriptadapter functionality (#167) updated gitignore to have testing and pycharm ignores testing existing adapters in factory (#167) added test to verify factories.keys matches get_valid_adapters (#167) added copyright to file * updated __init__ modules to do dynamic includes * removed unneeeded imports * updated dependency versions * fixed all flake8 errors * updated to run flake8 and pytest when run locally * updated tests to have documentation about purpose and function as requested in #170 * fixed line length * Removal of nose from requirements. * updated to remove nose from the requirements * PyYAML vulnerability fix (#171) * Locking the version of PyYAML to be above 2.1 because of an arbitrary code execution vulnerability. * Addition of a version condition to pyyaml to patch a vulnerability. * Update of Pipfile.lock to match Pipefile. * Minor tweak to indentation for flake8 failure. * fixed pyyaml to requirements (#172) * Addition of a loader to the yaml load call. (#174) Fixes #173 * Addition of a loader to the yaml load call. * Addition of a catch if the loader attribute is missing. * Correction to install enum34 for Python versions < 3.4 (#176) * Moved enum34 to condition dependent on Python<3.4. * Addition of conditional enum34 install for requirements.txt. * Correction of requirements.txt syntax for python version. * Addition of a Dockerfile for tutorials and ease of trying out. (#178) * Addition of a Dockerfile for quick tutorials. 
* Tweaks for Docker and addition of git. * Tweak to Docker file for caching. * Addition of Docker documentation. * Tweaks to Docker documentation. * Removal of markdown ## * Take out shebang from shell definition and add it when script is written. (#181) * Take out shebang from shell definiton and at it when script is written. * Include shebang in cmd and fix format of string written to file. * Correction to message when stating no to launch. * Enhance shell batch setting to apply to scheduler scripts. (#183) * Extension of shebang feature to allow users to specify shells. * Addition of debug message to print kwargs. * Addition of kwargs. * Addition of basic batch settings to LULESH sample. * Addition of kwargs to Flux adapters. * Docstring tweaks. * Docstring update. * Fixes the addition of the shebang header for SLURM (#184) * Docstring correction for LocalAdapter. * Correction to addition of exec line at top of scripts. * Correction to an accidental reassignment of cmd. * Removal of an assignment of self._exec in SLURM adapter. * Change to transition adapter returns to Record objects. (#177) * Addition of a Record class for storing general data. * Addition of SubmissionRecord type. * Update to the order of for record parameters. * Changes to StepRecord to expect SubmissionRecord returns. * Updates to SLURM and local adapters to use SubmissionRecords. * Slight tweak to LocalAdapter docstring. * Tweak to have SubmissionRecord initialize its base. * Addition of CancellationRecord class. * Changes to CancellationRecord to map based on status. * Additional interface additions and tweaks. * Changes to have cancel use CancellationRecords. * Update to ExecutionGraph to use records. * Updates to SLURM and local adapters to use SubmissionRecords. * Slight tweak to LocalAdapter docstring. * Addition of CancellationRecord class. * Additional interface additions and tweaks. * Changes to have cancel use CancellationRecords. * Cherry pick of execution commit. 
* Removal of redundant "get" definiton. * Addition of a SLURM enabled LULESH sample specification. * Addition of output for stdout and stderr for Local adapter. * Correction of file to open. * Addition of 3.7 to testing stack. * Added 3.7 to tox.ini. * Removal of py37 in testing. * Addition of build status badge. * Update SLURM sample spec to add missing walltime. * Addition of documentation that covers the set up of a simple study (#168) * Addition of simple Hello World spec. * Addition of basics page to index. * Addition of hello_world documentation. * Additions to hello_world. * More documentation in single step section. * Continued edits to Hello World. * Addition of parameter section. * Addition of a note about %% token. * Addition of directory structure. * Continuation of parameter documentation. * Removal of the depends key. * Addition of the env section description. * Addition of a link to Docker documentation for Dockerfiles. * Addition of single parameter hello world. * Correction of double colons. * Correction of indentation. * Addition of print out to verify output. * Addition of sample specifications for multi and single params. * Addition of more documentation for single param. * Additional output to show parameter results. * Correction to formatting. * Addition of samples. * Addition of simple Hello World spec. * Addition of basics page to index. * Addition of hello_world documentation. * Additions to hello_world. * More documentation in single step section. * Continued edits to Hello World. * Addition of parameter section. * Addition of a note about %% token. * Addition of directory structure. * Continuation of parameter documentation. * Removal of the depends key. * Addition of the env section description. * Addition of a link to Docker documentation for Dockerfiles. * Addition of single parameter hello world. * Correction of double colons. * Correction of indentation. * Addition of print out to verify output. 
* Addition of sample specifications for multi and single params. * Addition of more documentation for single param. * Additional output to show parameter results. * Correction to formatting. * Updates to docstrings for data structures. * Updates to clear Sphinx warnings. * Removal of escape on the *args becuase of flake8 failure. * Clean up of existing hello world specs. * Addition of multistep example spec. * Removal of * to fix sphinx errors. * Correction to some docstrings. * Tweaks to specs for consistent naming. * Finished multi-step parameterized example. * Tweaks to hello world docs. * Addition of link to examples on GitHub. * Correction of link to examples. * Correction of link to examples (again). * Removal of Pipfile.lock. * Additions to gitignore for vscode and pipenv. * Marking for v1.1.4 release. * Corrected a missed merge for release v1.1.4 * Extend the Specification interface to break out loading from streams. (#198) * Closes #198 * Addition of loading specification "from_str". * Updates to Specification docstrings. * Updates to abstract Specification to change from str to stream. * Updates to YAMLSpecification to use the new stream API. * Removal of IOString * Update to the YAMLSpecification load stream method. * Quickfix: Addition of the accidental removal of the path member variable. * Updating the version to 1.1.5dev (forgotten previously). * Correction to versioning for install. * Addition of version information to package and command line (#205) * Addition of version information. * Tweak to have setup.py pull from __version__ * Addition of command line arg to print version. * Pinning version for release 1.1.5 * Addition of 1.1.5a to line up with PyPi labeling. * Increment up to get rid of a0 * Addition of a simple example and logo to the README. (#208) * Addition of the Maestro logo. * Logo and hello world addition. * Addition parameter section. * Slight tweak to parameter section. * Addition of a reference to samples folder. 
* Update __init__.py to tick version to dev version. * Enhances pgen to have access to environment specified in a study. (#209) * Add the OUTPUT_PATH and SPECROOT to kwargs for pgen. * Addition of the spec constructed environment. * Remove "study_env" from pgen kwargs. * Update to pgen function parameters. * Update lulesh examples to have pgen vars. * Correction to docstring ordering. * Updates to add scheduled workflows and reorganize. * Correction to HPC wikipedia link. * Updates to the classifiers for setup.py * Addition of long text setting. * Correction of missing quote. * Drop support for Python 3.4 (#218) * Removal of enum34 and Python 3.4 classifiers. PyYAML no longer supports Python 3.4 which is forcing Maestro to also drop support as it has a direct dependency. * Addition of python requirements, download url, and maintainer. * Re-add py2.7. Note: py2.7 unofficially supported. * Re-add enum34 for py2.7. * Removal of py34 from tox tests. * Removal of py3.4 from travis. * Applies workspace substitution to the restart command. (#219) Fixes #217 * Sub in the new restart command. * Addition of restart workspaces to sub. * Fix for WORKSPACE substitutions into restart. * Correction to override restart instead of cmd. * Test/interfaces/lsf (#215) * Implementation of a ScriptAdapter for the IBM LSF Scheduler. Initial implementation of an LSF adapter. Addition of LSF to the interface factory. Correction to time format LSF correction. Tweak to correct for casting Further tweaks to the LSFScriptAdapter Adjustments to the states the LSF adapter can return. More tweaks to LSF states and status checking. Update to the batch setting docstring Tweak to make wallclock time entries two digits Bugfix to the previous commit. Signed-off-by: Francesco Di Natale <[email protected]> * Addition of GPU support. * Addition of a cancel method. * Addition of reservation submissions. * Tweak to use the -nrs flag for jsrun. * Changes to resource allocation parameters. 
* Removal of some batch headers for LSF. * Tweak to error code for NOJOBS status. * Tweak to skip lines that are part of prev status line. * Correction of --nrs * Correction of task batch key to nodes. * Tweaks to _substitute_parallel_command. Now only pass in step.run by copy and append the popped keys as step resources "snodes" and "sprocs". * Correction to LSF adapter to correct node specification. * Tweaks to checking status of LSF jobs. A tweak to formatting of the output for bjobs. With the new format we get termination reason, which allows us to check for a timed out status. * Correction to bjobs formatting and nojob check. * Corrections to how nodes and procs are being passed. * Correction of the bkill command with multiple job ids. * Tweaks to check_jobs for LSF adapter. * Implementation of a ScriptAdapter for the IBM LSF Scheduler. Initial implementation of an LSF adapter. Addition of LSF to the interface factory. Correction to time format LSF correction. Tweak to correct for casting Further tweaks to the LSFScriptAdapter Adjustments to the states the LSF adapter can return. More tweaks to LSF states and status checking. Update to the batch setting docstring Tweak to make wallclock time entries two digits Bugfix to the previous commit. Signed-off-by: Francesco Di Natale <[email protected]> * Addition of GPU support. * Addition of a cancel method. * Tweak to use the -nrs flag for jsrun. * Changes to resource allocation parameters. * Removal of some batch headers for LSF. * Tweak to error code for NOJOBS status. * Tweak to skip lines that are part of prev status line. * Correction of task batch key to nodes. * Tweaks to _substitute_parallel_command. Now only pass in step.run by copy and append the popped keys as step resources "snodes" and "sprocs". * Correction to LSF adapter to correct node specification. * Tweaks to checking status of LSF jobs. A tweak to formatting of the output for bjobs. 
With the new format we get termination reason, which allows us to check for a timed out status. * Correction to bjobs formatting and nojob check. * Corrections to how nodes and procs are being passed. * Correction of the bkill command with multiple job ids. * Tweaks to check_jobs for LSF adapter. * Addition of LSF to key for LSFScriptAdapter. * Correction of lsf key. * Addition of a debug statement to catch status command. * Correction of bjobs command. * Additions to status checks. * Rearraging some debug logging. * Testing to see if .split works. * Further LSF tweaks. * Revert back to split with strip. * Removal of -q option due to excessive filtering. * Correction of a missed merge * Correction to use new Records structures. * Style fix for line 207. * Correction to SubmissionRecord creation. * Decode output in check_status to enforce str type. * Decode output in submit. * Addition of retcode to a logger statement. * Sets log and err out for SLURM to parameterized step name (#220) Fixes #213 * First attempt at log name fix. * Correction to header formatting. * Update to dev0 to differentiate for pre-release. * Ticked up version to 1.1.7dev1. * Update Maestro logo link to full link for PyPi. * An update to Maestro's description. * Adding Neptune to the list on Planets (#222) Signed-off-by: Adrien M. Bernede <[email protected]> * Small README tweak. * Updated the study step name in the README.md file (#227) * Updated the package version in the Sphinx docs (#229) * Added a link to Maestro's documentation (#231) * Added a link to the documentation in the README.md * Added a documentation section to the README.md * Improve the performance of expansion (#232) * Addition of override for ExectionGraph to not check cycles. * Addition of documentation for justification of override. * Addition of a newline due to style. * Addition of dill as a dependency. * Fix pickle and unpickle to use dill. 
* Updated the description in the setup.py file (#233) * Added dill to the requirements.txt file (#235) * Fix to add PID to local log names. (#241) * Refactor to move DAG operations to the back end Conductor (#242) * Removal of SimObject references. * Addition of PickleInterface. * Derive Study and ExecutionGraph from PickleInterface. * Some style clean up. * Clean up unused dill import. * Checking in to develop on another machine. * Start of pre-check. * Removal of precheck. * Tweaks to Maestro and Conductor. * Initial interface and refactor to Conductor class. * Refactor of monitor_study * Tweaks to backend conductor to use Conductor class. * Tweaks to Maestro frontend to use Conductor class. * Minor bug fixes in Conductor class. * Minor tweaks to user cancel in Conductor class. * Port status to the Conductor class. * Continued additions to port to Conductor class. * Slight fix to fix flake8 violation. * Removal of named argument *, python2.7 does not support it. * Refactor to remove parser and logging from Conductor class. * Style clean up to fix flake8 errors. * Updates to the docstrings for PickleInterface. * Updates to the Conductor docstrings. * Small flake8 error fix. * Added pre-commit to enable flake8 checks (#244) Added pre-commit to enable flake8 checks before a commit is accepted. Also reordered requirements.txt to more easily determine which are for development. * Bugfix for logging that didn't appear in submodules (#247) * Improved logging setup. * Transition to a LoggerUtil class. * Addition of docstring to LoggerUtility + cleanup. 
* Added spec verification via jsonschema added checks for valid keys branch updates working on validation added schema file updates fixed spec fixed spec added jsonschema to deps updates ran black on yamlspecification.py specified newest jsonschema version added manifest added include_package_data to setup.py reformatted json experimental package_data fixed path fixed path fixed path again reverted newline added check for empty strings reworked exception logic implemented reviewer suggestions, shifted exception logic, renamed redundant variables renamed variable removed unused import added missing `self.verify_environment()` call Co-Authored-By: Francesco Di Natale <[email protected]> paths and git dependencies are now array types Co-Authored-By: Francesco Di Natale <[email protected]> removed redundant logic swapped number type to integer moved env schema validation to top, which avoids some types of ambiguous errors removed test yaml removed some additionalProperties restrictions unknown commit removed debug print * Reformatted and added color to logger messages. (#252) Closes #248 Added color to logging and converted some info messages to debug. added colors and cleaned up logger corrected formatting added to dependencies reverted message log level change added debug format debug logging format now works flake8 fix * Bug fix for unintentional variable section requirement. (#256) Closes #255. A bug fix that corrects an unintentional assumption that the variable section in a specification will always exist. * Update to broken venv documentation link. * Addition of a simple dry-run capability. (#259) * Addition of a simple dry-run capability. * Addition of a DRYRUN state. * Tweak to reduce sleep time for dry run. * Renamed dryrun to dry to reduce redundancy. * Enable autoyes when dry running is invoked. 
* enabled raw sbatch errors to be logged (#262) * enabled raw sbatch errors to be logged * tweaks/correction suggested by Frank * reduced line length * fixed flake8 error in slurm-adapter * Tweaks and fixes to the SLURM header. (#263) * Tweaks and fixes to the SLURM header. Adds the ability to specify GPUs, fixes reservation pairing with accounts, and now uses a ChainMap to handle the internal key conflicts between the batch and step settings. Also introduces the exclusive key. Changed to full parameter names for clarity. * Addition of chainmap package for python2.7 * A check to see if procs is included in the header. Fixes #234 and includes ntasks in the header if the batch section includes the key. * modified executiongraph to round datetimes to nearest second * Adds a check for UNKNOWN state. (#266) Fixed #264 -- When testing #263, SLURM ran out of memory during the completing stage and aborted the jobs and left the job in an unknown state. This PR fixes this issue by defaulting to a failure when the status is found to be UNKNOWN. * Adds a check for UNKNOWN state. * Correction of bad variable name "state" * Tweak to treat UNKNOWN as failed. * Change marking of failed state to unknown. * Some fixes for style and credit from #265 * Official 1.1.7 release. * Official start to 1.1.8dev0. * Addition of README as long description. (#269) * Addition of README as long description. * Dropping encoding as it's not supported in 2.7 * Pgen docs (#275) * Initial pgen docs with itertools example * Add pargs example * Add pgen using numpy plus helper function for 1D distribution * Fix typos, update image * Initial port of complete Parameters documentation * Fix up api docs warnings, add missing script adapters * Add subsection on accessing env block variables inside pgen * Literal imports and some style tweaks. * Some minor title and header tweaks. * Remove out of date notes * Renamed itertools_pgen to reference LULESH. 
* Update doc strings on parameter generator samples * Make flake8 happy * Misc cleanup and formatting, adding more links and internal references * Fix up section listing, in-text moniker/function formatting Co-authored-by: Frank Di Natale <[email protected]> * Replace unicode quote with ascii quote (#277) * Updated the docs release version to 1.1.7 (#278) * Modified validation logic to skip over variable tokens. (#279) Co-authored-by: Francesco Di Natale <[email protected]> * Inheritable validation and spec module (#280) * added checks for valid keys * branch updates * working on validation * added schema file * updates * fixed spec * fixed spec * refactor updates * verification logic * removed test field * made specification module * removed TODO * fixed flake8 style * schema name is now hardcoded * updated MANIFEST * adjusted imports * tuples -> lists Co-authored-by: Francesco Di Natale <[email protected]> * Tick up to 1.1.8 patch version. * Correction of package data path to schema. * whitespace fixes from pre-commit checks * Start of version to 1.1.9dev0 * Updated the docs version to 1.1.8 (#288) * Added a link to the maestro sheetmusic repo and added descriptions for the (#287) documentation links * Added introduction documentation from the README.md file (#289) * Docs/formatting (#296) * added some missing spaces in multiline text strings * added some missing spaces in multiline text strings * Update parameters.rst (#298) * Update parameters.rst Make as explicit as possible that if you run with PGEN, then just comment out the global.parameters block in your YAML spec file. * Update parameters.rst * Update parameters.rst * Update parameters.rst Made changes according to Francesco DiNatale's suggestion. * Correct documentation compilation warnings (#299) * Updates to correct indentation errors. * Addition of newly compile docs. * Indendations added to Record.get docstring. 
* Bugfix: Division by string instead of number (#302) When computing CPUs per node on LSF. * Addition of processing multiple directories and globs for status (#301) * Addition of processing multiple directories and globs for status * Minimal edits after first commit * Shortened code to allow more glob functionality * Removed glob, added nargs+ to conductor.py, rearranged code in maestro.py * Removed some redundant code * Reduced more code. Works for multiple directory inputs (* included). Absolute path added * Added refinement features to allow ease of interpreting results * Additional refinement. No more logging for status command. * Flake8 and minor style tweaking * Updated the documentation to pull the recent version (#306) * Updated the Sphinx documentation to pull Maestro’s version from the package * Updated formatting for flake8 check * Updated the documentation for installing Maestro (#308) * Updated the Dockerfile and documentation (#312) * Updated the documentation for building the Docker image and updated the Dockerfile to use Python 3 * Updated the Dockerfile format * Removal of Python 2.7 and addition of newer Python 3 versions (#315) * Removal of py27 and additions of py37+8 * Removal of 3.7 from travis * Tweak to remove py37 matrix * simple bug fix in the YAMLSpec description setter (#319) * Conversion to Poetry Build System (#316) * added pytest and coverage settings * initial change to poetry * remove setup.py and convert to pyproject.toml * fixes for tox and travis pipelines * Changes to support python >3.4 and formatting updates. 
* adding athey1 as a maintainer * fixed tab spacing * fixed spacing * added jessica as a maintainer * added jeremy as a maintainer * YAMLSpecification Testing (#317) * initial testing on loading spec * black settings added * reformatted by black * added test for validation errors * added test to check for missing key in step * updated schema to add check for non-null variables * added check for study steps * added test for multiple dependencies of the same name * changed yaml spec testing for easier reading * added test of global parameters * fixed test execution to verify error thrown * added output_path test * formatting fix * added tests for steps generation and params generation * added tests for get_study_env * initial testing on loading spec * black settings added * reformatted by black * added test for validation errors * added test to check for missing key in step * updated schema to add check for non-null variables * added check for study steps * added test for multiple dependencies of the same name * changed yaml spec testing for easier reading * added test of global parameters * fixed test execution to verify error thrown * added output_path test * formatting fix * added tests for steps generation and params generation * added tests for get_study_env * updated to add testing for description setter * added test for spec.name setter * pulled spec_path code into a pytest fixture * added pytest and coverage config * added pytest-cov as a tox requirement * added report.xml to gitignore * changed line length for black * added coverage and junit reports to tox * Added coverage appending for test runs * Add name and real_name StudyStep properties. (#314) * Add name and real_name StudyStep properties. There now needs to be an aliasing of nickname with a step name when requested because adapters expect to use the name. StudyStep objects now return their nickname if the nickname is set. 
Objects such as the ExecutionGraph must now use the real_name for logistic tracking, while adapters can continue to use name. * Addition of extension property for Flux. * Some typo and string style tweaks. * Re-add setup.py to retain standard editable install. * Slurm Script Adapter Bug Fix (#305) * Testing by adding comment * Initial commit, reformatted checking for Nodes and Procs in slurmscriptadapter.py, added extra conditional * Removed condition for checking Node, moved condition for raising exception to get_header * Changed code for condition that handles no nodes or procs * Tweaked the error message * Minor variable mispelling * Handled additional case of nodes absent and procs present to add procs to SBATCH * In the case of Nodes absence and Procs presence, it will be added to SBATCH * Nodes now appear in SBATCH header even if uninitialized in a step * Changed Chain Map to regular Dictionary * Remove extra loop and change resource construction. * Cleared styling issues, removed time stamped outputs from Samples directory * Some refactor clean up. Co-authored-by: Frank Di Natale <[email protected]> * WIP: Add job identifier to status command (#323) * Job ID shows up in Maestro Status, edited executiongraph.py write_status function * Changed sleep time back to original. Job ID gets last index. * Fixed a casting issue (int to str). + style Co-authored-by: Frank Di Natale <[email protected]> * Fixes #324 (#325) * A fix to use communication instead of wait. (#328) * Updated README to correct badges. * Corrected badge colors for PePy * Removal of Python3.5 from testing (#339) * Removal of Python3.5 * Removal of Python 3.5 from testing. * Updates to the FluxScriptAdapter to use the latest Flux. (#251) * Updates to use new Flux. * Moves Flux imports to FluxScriptAdapter. * Updates to cancel to use the new Flux0.16.0 interface. * Added a check for the Flux URI in the environment. * Addition of Record classes to FluxAdapter. 
* Addition of a modified exec in Flux Adapter. * Correction to the shell header. * Add walltime to the header only when present. * Correction for checking the value of walltime instead of existence. * Tweak to overlook missing keys. * Modified checks for procs and cancellation return. * A calculation of a sensible cores per task when not specified. * Additional sensible set for non-values. * Use of direct import instead of __import__. * Correction to __import__ for Flux. * Readdition of core per task computation. * Tweak to pass nodes as tasks. * Updates to use new Flux. * Moves Flux imports to FluxScriptAdapter. * Updates to cancel to use the new Flux0.16.0 interface. * Added a check for the Flux URI in the environment. * Modified checks for procs and cancellation return. * Use of direct import instead of __import__. * Correction to __import__ for Flux. * Readdition of core per task computation. * Tweak to pass nodes as tasks. * Addition of _flux util module. * Updates to Flux interfacing. * Rename module to get rid of dots. * Tweak to import flux.constants * Renamed 0.16.0 module and added new status functions. * Switch jobs to a list, since ID is included. * Tweak to construct set of joined tuples. * Addition of missing flux.job import. * Tweak to cast set to a list. * Correction of a misnamed key reference. * Correction to incorrect add for attrs set. * Tweak to have the jobid of errored checks returned. * Addition of mapping statuses to strings. * Corrected bad reference to tuple. * Move the FluxInterface to abstracts. * Update flux 0.17.0 interface. * Move flux_0_16_0 to flux_0_17_0 to reflect targeted version. * Tweaks to fix an import error for flux.core.inner.raw * Tweaks to base FluxInterface with new API methods. * Addition of method to get latest interface. * Addition of new API calls to 0.17.0 * Tweaks to 0.17.0 * Addition of a missing comma. * Addition of a missing commas. * Swap out of status call. * Reintroduction of addtl args keyword. 
* Updates to header construction. * Updates to status checking. * Correction to submission record creation. * Updates to get_statuses. * Cast jobids back to a str. * Test instantiating new handle for status. * Correction to logging error call. * Addition of logging to status check. * Change to print exception as string. * Delayed check for results field to prevent key error. * Reverting to ints for job identifiers. * Readdition of resulttostr (accidental deletion). * Readded check to verify we reuse the flux handle. * Fix to a bad reference to old handle variable. * Addition of debug logging. * Addition of submission to FluxInterface. * Addition of submit to flux0_17_0 * Correction to FluxScriptAdapter to use new submit. * Shift to ABC subclass. * Removal of Singleton base class. * Correction of the FluxInterface name to remove _ * Fixed ngpus type in parameter list. * Fixed ngpus type in submit call. * Reordered improperly ordered return values. * Addition of debug logging for FluxInterace 0.17.0 * Correction to a bad key reference to "jobid" * Addition of missed completed state + correction to abbrev. * Removal of Singleton import for FluxInterface. * Flake8 fix. * Removal of 0.11.0 FluxInterface. * Addition of cancellation to FluxInterface. * Addition of a first pass cancel method. * Tweaks to change when sub-brokers are used. * Addition of cancel call to the interface. * Update to logger formatting. * Update to logger formatting to make verbose. * Additional debug logging for returncode. * Tweak to expected return code. * Tweaks to cancellation API * Addition of missed classmethod decorator. * Addition of statuses for unknown and file not found. * Addition of debug logging in Flux interface 0.17.0 * Correction to cast integers to str * Addition of naming for jobs. * Tweak to FluxAdapter to pass in job name. * Correction of a bug that forced GPUs to 0. * Addition of cores per task. * Removal of the Jobspec debug print. * Addition of a Flux sample spec. 
* Addition of a 0.18.0 Flux backend. * Tweak to add walltime to backend parameters for submit. * Removal of a debug print of command line. * Addition of docker related files to test flux. * Clean up of commented docker lines. * Addition of walltime support. * Updates to specification validation to support Flux. * Tweaks to Flux test specification. * Addition of TIMEOUT state. * Removal of verbose logging. * Check to make sure walltime is set to a value in 0.18.0 * Fix to correct inf walltime for Flux 0.19.0 * Additional debug logging. * Removal of verbose step logging. * Shift to using from_nest_command * Correction to put script path in list for JobSpec. * Tweak to correct from_nest_command call. * Fixes to catch failed Flux imports. * Removal of Python 3.5 from testing. * Removal of Python3.5 * Addition of ceiling gpu_per_slot calc. * Correction of bad attribute reference. * Addition of cancellation for multiple directories and globs. (#338) * Initial implementation of cancel glob. * Tweaks to prompt user for cancel. * update .coveragerc to ignore tox and venvs (#346) * Spectrum adapter remove/lingering in-class import. (#352) * Commit 38567e7: Spectrum adapter remove/lingering in-class import. Commit 51fb0fd: Remove Spectrum adapter tests Short term: This fixes a bug where a stray import prevented the FluxScriptAdapter from being pickled for interprocess communication. Long term: Cleans up the code from the stale Spectrum adapter and makes it so that we have less dead code. It prevents us having to waste time looking through code that is no longer used. Remove Spectrum adapter tests * Tweaks to fix imports. * Update docker instructions. * Module nicknaming to prevent aliasing. * Logging bugfix to include record name. * Limit to cryptography<3 for build (#353) Short-term goal: The goal for now was to get TravisCI testing Python3.6 again. It was previously pulling source and trying to build cryptography at a more recent version from PyPi. 
This fix makes a wheel version available so that the recent change to include rust doesn't prevent testing. Long-term: Eventually we will want to move to the most recent cryptography, but for now it breaks 3.6. The current specified schedule for 3.6 is that currently it is only receiving security updates until 12-2021 (when it will presumably be EOL). We will revisit this when we deprecate 3.6 support. * Update to python versions >=3.6 * Add support for Flux core 0.26.0 (#357) * flux: fix version typo in comment * flux: add 0.26.0 adaptor Notable differences from 0.18.0 version: - Use Flux's builtin statustostr & remove adaptor-local statustostr and attr list - Drop the individual calls to `job_id_list` for a single call to `JobList` * Removal of forced Flux default version. * Addition of [email protected] dockerfile. * Removal of unused cb_args variable. Co-authored-by: Francesco Di Natale <[email protected]> * Ticked up the version to 1.1.9dev1 * Flux update (#359) * flux0.26: add version checking during connection Problem: there are several versions of the Flux adaptor and it is very easy to use too new of an adaptor with too old of a Flux version (and vice-versa) Solution: check both the adaptor version and the Flux broker version when making a connection via a new Flux handle and ensure the versions match. If the adaptor version is newer, log an error. If the broker version is newer, log a debug message letting the user know they might benefit from choosing a newer adaptor. * flux0.26: handle the case where the ngpus argument is a string Problem: the `ngpus` argument to `submit` (as well as `parallelize`) can sometimes be a string (e.g., when it isn't the default of 0). 
This causes validation errors in the Flux jobspec class, which does runtime type checking of the value. Solution: explicitly check if `ngpus` is a string, specifically a digit, and if so, convert it to an `int` before passing to Flux. * flux0.26: fix number of slots for nested flux launches Problem: when launching with multiple tasks in a nested Flux instance (i.e., `force_broker = True`), the number of slots for the nested instance should be set as the number of tasks, not the number of nodes. * flux: make `force_broker` the default action Problem: when creating maestro specs that need to be portable across schedulers, it is assumed that a `${LAUNCHER}` call is needed in many scripts, but when `force_broker` is `False`, this results in a doubling up of the parallelism. The call to the `submit` function within the Flux adaptor launches multiple processes and then the `${LAUNCHER}` call within the script gets expanded by the `parallelize` function to also launch multiple processes. Solution: make `force_broker` the default behavior to maximize compatibility with other schedulers. If users want to avoid the extra overhead of a nested Flux instance, they can always opt out of the default behavior with `force_broker = False`, and then elide their use of the `${LAUNCHER}` variable in the script. * Moved integer check to FluxScriptAdapter main class. * Move the version check to FluxInterface base class. * Update prior interfaces to use connect_to_flux method. * Fix for an empty gpu string in a specification. * Removal of gpu conversion in 0.26.0 Flux backend. Co-authored-by: Francesco Di Natale <[email protected]> * Update to fix the addition of -g erroneously (#361) * Removal of TravisCI due to transition to GitHub Actions. * Update pyproject version to match setup.py * Addition of a first pass GH Action linter. 
* Status tweaks (#358) * Add bfs/dfs ordered status output * Prototype rich formatted status tables with cli layout switch * Fix bfs ordering option, make it default * Remove unhelpful log output.. * Add rich dependency * Add rich dependency to setup.py * Quit whining flake8... * More flake8 drama.. * Cache status ordering to improve scalability * Add bfs/dfs ordered status output * Prototype rich formatted status tables with cli layout switch * Fix bfs ordering option, make it default * Remove unhelpful log output.. * Quit whining flake8... * More flake8 drama.. * Cache status ordering to improve scalability * Tweak to python version for rich. * Attach params to step records to enable use in status output * Add param name:value table in narrow status layout * Check for new status column to enable backwards compatibility * Checkpoint on renderer factory implementation * Cleanup after successful test * Remove extraneous debug logging * Fix bad indent, tweak themes to play nice on different terminal themes * Unit test for flat status layout * Fix erroneous whitespace * Add step root to workspace in status of parameterized steps * Rework status tests, add narrow layout test, rebaseline * Remove stray debug printing * Add help target and some documentation to the make file * Document layouts, add some test comments * Rename status renderer test file so pytest can find it automatically * Fix bug causing duplicate entries in narrow layout * Update narrow layout test baseline * Add layout screenshots to docs * Add legacy table format back to the status layout options * Convert base renderer to abstract, implement auto registration and auto layout cli choice list * Add some proper google style doc strings to render factory * Docstring, arg defaults change for narrow renderer * Sync up layout methods and doc strings of status renderers * Fix up docstrings on params for StepRecords * Fix incorrect return type documentation * Bug fixes/style tweaks on narrow status layout * 
Remove debug output * Add documentation for legacy status layout Co-authored-by: Francesco Di Natale <[email protected]> * Slurm QOS specification. (#365) * qos feature added for Slurm * revert naming of this file * Addition of reservation to bsub headers. (#367) * Additional flexibility to walltime parsing for Flux (#369) * Tweaks to make walltime more flexible. * Correction to walltime check in Flux backend. * Additional tweaks for robustness for convert walltime. * Allow more types for walltime to be specified. * Reintroduce conversion call. * Addition of Flux connection to try/catch (#375) * Avoid overriding previous lines in Lulesh example (#382) * Flux Adapter Clean up and Addition of Job Priority (#379) * Additional logic for walltime parsing. * Removal of forcing broker when multinode. * Renamed the use_broker setting to nested. * Addition of urgencies to job submission. * Correction to Urgency mapping. * Bugfix from isdigit to isnumeric. * Addition of from_str for StepUrgency * Added parsing of urgency to FluxAdapter. * Addition of a float type urgency. * Removed exception when finding entries that are not strings. * Update to add get_priority to SchedulerScriptAdapter * Addition of urgency mapping for Flux backend. * Update to the Flux adapter to support urgency modifications. * Rename StepUrgency to StepPriority * Pass on types that can't be substituted. * Removal of annotations to support 3.6 * Corrections to the passing of urgency values. * Change enums to lower case. * Addition of sensible capitalization of priority. * Tweaks to Flux example for new keys. * Removal of resource checks. * Addition of Workflow Community Initiative information. (#385) * Addition of icon form logo. * Addition of Workflow Community Initiative info. * Bump paramiko from 2.7.2 to 2.10.1 (#387) Bumps [paramiko](https://github.com/paramiko/paramiko) from 2.7.2 to 2.10.1. 
- [Release notes](https://github.com/paramiko/paramiko/releases) - [Changelog](https://github.com/paramiko/paramiko/blob/main/NEWS) - [Commits](paramiko/paramiko@2.7.2...2.10.1) --- updated-dependencies: - dependency-name: paramiko dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Add black and update versioning. (#389) * Enable more complete configurability of jsrun launcher (#384) * initial patch for more complete configuration of jsrun launcher * First pass at documenting usage of the lsf/jsrun launcher * Add corresponding maestro steps for each jsrun variant * Add one example of a memory hungry application * Build table mapping jsrun to maestro step keys * Add binding controls, rename keys from snake case, document defaults * Fix up straggling snake case keys * Improve debugging info in schema error messages * Fix rs_per_node, make gpu binding optional since it's new in lsf 10.1 * Update binding flag in examples, add note about gpu binding availability * Add initial lsfscriptadapter tests * Initial pass at general batch block documentation * Remove old commentary * Remove unneeded nodes/procs math in jsrun launcher substitution * Remove unneeded loggin output * Remove more debugging log outputs * Update lsf examples to match json schema for resource specification keys * Cleanup the cpus per rs machinery, schema * Add openmp and mpi lulesh study to exercise lsf resource specification keys * Document the sample lsf lulesh specification * Updating dependency versions. (#392) * Correct conditional to correct empty string catch. (#393) * Move to purely poetry install (#394) * Removal of setup.py for poetry editable. * Addition of install testing. * Make python versions strings to avoid 3.1 Python version. * Version tick for dev2. 
* Ci/test and release (#396) * First pass at re-enabling tests in ci * Update to newer poetry gh action, tweak cache setting * Fix missing } * Fix incorrect gh action repository name * Disable venv cache * Test out doing flake8 linting with poetry * Revert flake8 linting to separate run for nicer reporting * Test reusable python matrix * Revert reuse test, sync up steps between pip/poetry * Remove missing dependency * force linting pass before running expensive install/pytest steps Co-authored-by: Elsa Gonsiorowski, PhD <[email protected]> Co-authored-by: Francesco Di Natale <[email protected]> Co-authored-by: jsemler <[email protected]> Co-authored-by: Kevin Athey <[email protected]> Co-authored-by: Joe Koning <[email protected]> Co-authored-by: Adrien Bernede <[email protected]> Co-authored-by: Kevin Athey <[email protected]> Co-authored-by: Bay <[email protected]> Co-authored-by: Benjamin Bay <[email protected]> Co-authored-by: crkrenn <[email protected]> Co-authored-by: Christopher R. Krenn <[email protected]> Co-authored-by: Jeremy White <[email protected]> Co-authored-by: Tanim Islam <[email protected]> Co-authored-by: Kenny Weiss <[email protected]> Co-authored-by: Jeffrey Mei <[email protected]> Co-authored-by: scottwedge <[email protected]> Co-authored-by: Stephen Herbein <[email protected]> Co-authored-by: Tobias Duswald <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
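The `ngpus` handling described in the Flux update (#359) entry of this log (convert a digit string to an `int` before handing it to Flux, and tolerate an empty gpu string in a specification) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code that was merged: the function name `normalize_ngpus` and its `default` parameter are hypothetical, and the real check lives in the FluxScriptAdapter class.

```python
def normalize_ngpus(ngpus, default=0):
    """Return ngpus as an int (hypothetical helper, not Maestro's API).

    Accepts an int, a string of digits, or an empty string. An empty
    string is treated as "not specified" and mapped to `default`.
    """
    if isinstance(ngpus, int):
        # Already the type that runtime type checking would expect.
        return ngpus
    if isinstance(ngpus, str):
        stripped = ngpus.strip()
        if not stripped:
            # Empty gpu entry in the specification: fall back to default.
            return default
        if stripped.isdigit():
            return int(stripped)
    # Anything else is rejected early instead of failing later on.
    raise ValueError("ngpus must be an int or a digit string, got {!r}".format(ngpus))
```

Normalizing the value once, before submission, keeps the type handling in one place rather than repeating it in each backend version.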
* Addition of user enabled workspace hashing (#145) * Addition of hashing to Study parameterization. * Addition of the hashws option to argparse. * Addition of a warning note for users who use labels in steps. * Update setup.py to 1.1.4dev * More generalized FluxScriptAdapter (#149) * Addition of a more general flux ScriptAdapter. * Addition of some casting from int to str * Corrected "gpus" to "ngpus" * Rework jobspec construction to make a valid jobspec. * Check for empty value for cores per task. * README tweak to update quickstart link. (#139) * typos. fixes #141 (#154) * Update to setup.py to reflect dev version 1.0 * Correction to safe pathing for missed cases and make_safe_path enhancements. (#157) * Made pickle and log path string safe for pathing. * Tweaks to make_safe_path to include a base path. * Updates to make_safe_path usage * Correction to not modify the iterator copy. * Correction to fix the format of output status time to avoid a comma that breaks printing. (#160) * Addition of a utility function for formatting times to H:M:S * _StepRecord time methods now call the new utility function. * Tweaks to add days to the format to avoid 3 digit hours. * Tweak to formatting. * Made the day format more parsable. * Removal of _stage_linear since it is now not needed. (#156) * Removal of _stage_linear since it is now not needed. * Addition of linear LULESH samples. * Update the dev to 1.1. * Addition of pargs for passing parameters to custom parameter generation (#152) * Addition of a utility method to create a dictionary from a list of key-value pairs. * Addition of the pargs interface for passing parameters to custom parameter generation. * Addition of a Monte Carlo example that accepts pargs. * Addition of pargs check for dependency on pgen. * Addition of clearer error message for malformed parameters. * Update setup.py * Added confirmation message after launching a study (#163) * Addition of tag to LULESH git dependency. 
(#169) * Script Adapter Plugin (#167) (#170) Fixes #167 * added pytest to requirements added Pipfile and pipenv settings * Added property key to Abstract.ScriptAdapter (#167) Also added impementation and tests to verify that existing functionality isn't changed * updated factory to use key when registering adapters(#167) * cleanedup linelength * cleaned up imports to be specific to module (#167) * added tests to verify exception for unknown adapter * moved adapters tests to individual files * added test to verify scriptadapter functionality (#167) updated gitignore to have testing and pycharm ignores testing existing adapters in factory (#167) added test to verify factories.keys matches get_valid_adapters (#167) added copyright to file * updated __init__ modules to do dynamic includes * removed unneeeded imports * updated dependency versions * fixed all flake8 errors * updated to run flake8 and pytest when run locally * updated tests to have documentation about purpose and function as requested in #170 * fixed line length * Removal of nose from requirements. * updated to remove nose from the requirements * PyYAML vulnerability fix (#171) * Locking the version of PyYAML to be above 2.1 because of an arbitrary code execution vulnerability. * Addition of a version condition to pyyaml to patch a vulnerability. * Update of Pipfile.lock to match Pipefile. * Minor tweak to indentation for flake8 failure. * fixed pyyaml to requirements (#172) * Addition of a loader to the yaml load call. (#174) Fixes #173 * Addition of a loader to the yaml load call. * Addition of a catch if the loader attribute is missing. * Correction to install enum34 for Python versions < 3.4 (#176) * Moved enum34 to condition dependent on Python<3.4. * Addition of conditional enum34 install for requirements.txt. * Correction of requirements.txt syntax for python version. * Addition of a Dockerfile for tutorials and ease of trying out. (#178) * Addition of a Dockerfile for quick tutorials. 
* Tweaks for Docker and addition of git. * Tweak to Docker file for caching. * Addition of Docker documentation. * Tweaks to Docker documentation. * Removal of markdown ## * Take out shebang from shell definition and add it when script is written. (#181) * Take out shebang from shell definition and add it when script is written. * Include shebang in cmd and fix format of string written to file. * Correction to message when stating no to launch. * Enhance shell batch setting to apply to scheduler scripts. (#183) * Extension of shebang feature to allow users to specify shells. * Addition of debug message to print kwargs. * Addition of kwargs. * Addition of basic batch settings to LULESH sample. * Addition of kwargs to Flux adapters. * Docstring tweaks. * Docstring update. * Fixes the addition of the shebang header for SLURM (#184) * Docstring correction for LocalAdapter. * Correction to addition of exec line at top of scripts. * Correction to an accidental reassignment of cmd. * Removal of an assignment of self._exec in SLURM adapter. * Change to transition adapter returns to Record objects. (#177) * Addition of a Record class for storing general data. * Addition of SubmissionRecord type. * Update to the order of record parameters. * Changes to StepRecord to expect SubmissionRecord returns. * Updates to SLURM and local adapters to use SubmissionRecords. * Slight tweak to LocalAdapter docstring. * Tweak to have SubmissionRecord initialize its base. * Addition of CancellationRecord class. * Changes to CancellationRecord to map based on status. * Additional interface additions and tweaks. * Changes to have cancel use CancellationRecords. * Update to ExecutionGraph to use records. * Updates to SLURM and local adapters to use SubmissionRecords. * Slight tweak to LocalAdapter docstring. * Addition of CancellationRecord class. * Additional interface additions and tweaks. * Changes to have cancel use CancellationRecords. * Cherry pick of execution commit. 
* Removal of redundant "get" definiton. * Addition of a SLURM enabled LULESH sample specification. * Addition of output for stdout and stderr for Local adapter. * Correction of file to open. * Addition of 3.7 to testing stack. * Added 3.7 to tox.ini. * Removal of py37 in testing. * Addition of build status badge. * Update SLURM sample spec to add missing walltime. * Addition of documentation that covers the set up of a simple study (#168) * Addition of simple Hello World spec. * Addition of basics page to index. * Addition of hello_world documentation. * Additions to hello_world. * More documentation in single step section. * Continued edits to Hello World. * Addition of parameter section. * Addition of a note about %% token. * Addition of directory structure. * Continuation of parameter documentation. * Removal of the depends key. * Addition of the env section description. * Addition of a link to Docker documentation for Dockerfiles. * Addition of single parameter hello world. * Correction of double colons. * Correction of indentation. * Addition of print out to verify output. * Addition of sample specifications for multi and single params. * Addition of more documentation for single param. * Additional output to show parameter results. * Correction to formatting. * Addition of samples. * Addition of simple Hello World spec. * Addition of basics page to index. * Addition of hello_world documentation. * Additions to hello_world. * More documentation in single step section. * Continued edits to Hello World. * Addition of parameter section. * Addition of a note about %% token. * Addition of directory structure. * Continuation of parameter documentation. * Removal of the depends key. * Addition of the env section description. * Addition of a link to Docker documentation for Dockerfiles. * Addition of single parameter hello world. * Correction of double colons. * Correction of indentation. * Addition of print out to verify output. 
* Addition of sample specifications for multi and single params. * Addition of more documentation for single param. * Additional output to show parameter results. * Correction to formatting. * Updates to docstrings for data structures. * Updates to clear Sphinx warnings. * Removal of escape on the *args becuase of flake8 failure. * Clean up of existing hello world specs. * Addition of multistep example spec. * Removal of * to fix sphinx errors. * Correction to some docstrings. * Tweaks to specs for consistent naming. * Finished multi-step parameterized example. * Tweaks to hello world docs. * Addition of link to examples on GitHub. * Correction of link to examples. * Correction of link to examples (again). * Removal of Pipfile.lock. * Additions to gitignore for vscode and pipenv. * Marking for v1.1.4 release. * Corrected a missed merge for release v1.1.4 * Extend the Specification interface to break out loading from streams. (#198) * Closes #198 * Addition of loading specification "from_str". * Updates to Specification docstrings. * Updates to abstract Specification to change from str to stream. * Updates to YAMLSpecification to use the new stream API. * Removal of IOString * Update to the YAMLSpecification load stream method. * Quickfix: Addition of the accidental removal of the path member variable. * Updating the version to 1.1.5dev (forgotten previously). * Correction to versioning for install. * Addition of version information to package and command line (#205) * Addition of version information. * Tweak to have setup.py pull from __version__ * Addition of command line arg to print version. * Pinning version for release 1.1.5 * Addition of 1.1.5a to line up with PyPi labeling. * Increment up to get rid of a0 * Addition of a simple example and logo to the README. (#208) * Addition of the Maestro logo. * Logo and hello world addition. * Addition parameter section. * Slight tweak to parameter section. * Addition of a reference to samples folder. 
* Update __init__.py to tick version to dev version. * Enhances pgen to have access to environment specified in a study. (#209) * Add the OUTPUT_PATH and SPECROOT to kwargs for pgen. * Addition of the spec constructed environment. * Remove "study_env" from pgen kwargs. * Update to pgen function parameters. * Update lulesh examples to have pgen vars. * Correction to docstring ordering. * Updates to add scheduled workflows and reorganize. * Correction to HPC wikipedia link. * Updates to the classifiers for setup.py * Addition of long text setting. * Correction of missing quote. * Drop support for Python 3.4 (#218) * Removal of enum34 and Python 3.4 classifiers. PyYAML no longer supports Python 3.4 which is forcing Maestro to also drop support as it has a direct dependency. * Addition of python requirements, download url, and maintainer. * Re-add py2.7. Note: py2.7 unofficially supported. * Re-add enum34 for py2.7. * Removal of py34 from tox tests. * Removal of py3.4 from travis. * Applies workspace substitution to the restart command. (#219) Fixes #217 * Sub in the new restart command. * Addition of restart workspaces to sub. * Fix for WORKSPACE substitutions into restart. * Correction to override restart instead of cmd. * Test/interfaces/lsf (#215) * Implementation of a ScriptAdapter for the IBM LSF Scheduler. Initial implementation of an LSF adapter. Addition of LSF to the interface factory. Correction to time format LSF correction. Tweak to correct for casting Further tweaks to the LSFScriptAdapter Adjustments to the states the LSF adapter can return. More tweaks to LSF states and status checking. Update to the batch setting docstring Tweak to make wallclock time entries two digits Bugfix to the previous commit. Signed-off-by: Francesco Di Natale <[email protected]> * Addition of GPU support. * Addition of a cancel method. * Addition of reservation submissions. * Tweak to use the -nrs flag for jsrun. * Changes to resource allocation parameters. 
* Removal of some batch headers for LSF. * Tweak to error code for NOJOBS status. * Tweak to skip lines that are part of prev status line. * Correction of --nrs * Correction of task batch key to nodes. * Tweaks to _substitute_parallel_command. Now only pass in step.run by copy and append the popped keys as step resources "snodes" and "sprocs". * Correction to LSF adapter to correct node specification. * Tweaks to checking status of LSF jobs. A tweak to formatting of the output for bjobs. With the new format we get termination reason, which allows us to check for a timed out status. * Correction to bjobs formatting and nojob check. * Corrections to how nodes and procs are being passed. * Correction of the bkill command with multiple job ids. * Tweaks to check_jobs for LSF adapter. * Implementation of a ScriptAdapter for the IBM LSF Scheduler. Initial implementation of an LSF adapter. Addition of LSF to the interface factory. Correction to time format LSF correction. Tweak to correct for casting Further tweaks to the LSFScriptAdapter Adjustments to the states the LSF adapter can return. More tweaks to LSF states and status checking. Update to the batch setting docstring Tweak to make wallclock time entries two digits Bugfix to the previous commit. Signed-off-by: Francesco Di Natale <[email protected]> * Addition of GPU support. * Addition of a cancel method. * Tweak to use the -nrs flag for jsrun. * Changes to resource allocation parameters. * Removal of some batch headers for LSF. * Tweak to error code for NOJOBS status. * Tweak to skip lines that are part of prev status line. * Correction of task batch key to nodes. * Tweaks to _substitute_parallel_command. Now only pass in step.run by copy and append the popped keys as step resources "snodes" and "sprocs". * Correction to LSF adapter to correct node specification. * Tweaks to checking status of LSF jobs. A tweak to formatting of the output for bjobs. 
With the new format we get termination reason, which allows us to check for a timed out status. * Correction to bjobs formatting and nojob check. * Corrections to how nodes and procs are being passed. * Correction of the bkill command with multiple job ids. * Tweaks to check_jobs for LSF adapter. * Addition of LSF to key for LSFScriptAdapter. * Correction of lsf key. * Addition of a debug statement to catch status command. * Correction of bjobs command. * Additions to status checks. * Rearranging some debug logging. * Testing to see if .split works. * Further LSF tweaks. * Revert back to split with strip. * Removal of -q option due to excessive filtering. * Correction of a missed merge * Correction to use new Records structures. * Style fix for line 207. * Correction to SubmissionRecord creation. * Decode output in check_status to enforce str type. * Decode output in submit. * Addition of retcode to a logger statement. * Sets log and err out for SLURM to parameterized step name (#220) Fixes #213 * First attempt at log name fix. * Correction to header formatting. * Update to dev0 to differentiate for pre-release. * Ticked up version to 1.1.7dev1. * Update Maestro logo link to full link for PyPi. * An update to Maestro's description. * Adding Neptune to the list on Planets (#222) Signed-off-by: Adrien M. Bernede <[email protected]> * Small README tweak. * Updated the study step name in the README.md file (#227) * Updated the package version in the Sphinx docs (#229) * Added a link to Maestro's documentation (#231) * Added a link to the documentation in the README.md * Added a documentation section to the README.md * Improve the performance of expansion (#232) * Addition of override for ExecutionGraph to not check cycles. * Addition of documentation for justification of override. * Addition of a newline due to style. * Addition of dill as a dependency. * Fix pickle and unpickle to use dill. 
* Updated the description in the setup.py file (#233) * Added dill to the requirements.txt file (#235) * Fix to add PID to local log names. (#241) * Refactor to move DAG operations to the back end Conductor (#242) * Removal of SimObject references. * Addition of PickleInterface. * Derive Study and ExecutionGraph from PickleInterface. * Some style clean up. * Clean up unused dill import. * Checking in to develop on another machine. * Start of pre-check. * Removal of precheck. * Tweaks to Maestro and Conductor. * Initial interface and refactor to Conductor class. * Refactor of monitor_study * Tweaks to backend conductor to use Conductor class. * Tweaks to Maestro frontend to use Conductor class. * Minor bug fixes in Conductor class. * Minor tweaks to user cancel in Conductor class. * Port status to the Conductor class. * Continued additions to port to Conductor class. * Slight fix to fix flake8 violation. * Removal of named argument *, python2.7 does not support it. * Refactor to remove parser and logging from Conductor class. * Style clean up to fix flake8 errors. * Updates to the docstrings for PickleInterface. * Updates to the Conductor docstrings. * Small flake8 error fix. * Added pre-commit to enable flake8 checks (#244) Added pre-commit to enable flake8 checks before a commit is accepted. Also reordered requirements.txt to more easily determine which are for development. * Bugfix for logging that didn't appear in submodules (#247) * Improved logging setup. * Transition to a LoggerUtil class. * Addition of docstring to LoggerUtility + cleanup. 
* Added spec verification via jsonschema added checks for valid keys branch updates working on validation added schema file updates fixed spec fixed spec added jsonschema to deps updates ran black on yamlspecification.py specified newest jsonschema version added manifest added include_package_data to setup.py reformatted json experimental package_data fixed path fixed path fixed path again reverted newline added check for empty strings reworked exception logic implemented reviewer suggestions, shifted exception logic, renamed redundant variables renamed variable removed unused import added missing `self.verify_environment()` call Co-Authored-By: Francesco Di Natale <[email protected]> paths and git dependencies are now array types Co-Authored-By: Francesco Di Natale <[email protected]> removed redundant logic swapped number type to integer moved env schema validation to top, which avoids some types of ambiguous errors removed test yaml removed some additionalProperties restrictions unknown commit removed debug print * Reformatted and added color to logger messages. (#252) Closes #248 Added color to logging and converted some info messages to debug. added colors and cleaned up logger corrected formatting added to dependencies reverted message log level change added debug format debug logging format now works flake8 fix * Bug fix for unintentional variable section requirement. (#256) Closes #255. A bug fix that corrects an unintentional assumption that the variable section in a specification will always exist. * Update to broken venv documentation link. * Addition of a simple dry-run capability. (#259) * Addition of a simple dry-run capability. * Addition of a DRYRUN state. * Tweak to reduce sleep time for dry run. * Renamed dryrun to dry to reduce redundancy. * Enable autoyes when dry running is invoked. 
* enabled raw sbatch errors to be logged (#262) * enabled raw sbatch errors to be logged * tweaks/correction suggested by Frank * reduced line length * fixed flake8 error in slurm-adapter * Tweaks and fixes to the SLURM header. (#263) * Tweaks and fixes to the SLURM header. Adds the ability to specify GPUs, fixes reservation pairing with accounts, and now uses a ChainMap to handle the internal key conflicts between the batch and step settings. Also introduces the exclusive key. Changed to full parameter names for clarity. * Addition of chainmap package for python2.7 * A check to see if procs is included in the header. Fixes #234 and includes ntasks in the header if the batch section includes the key. * modified executiongraph to round datetimes to nearest second * Adds a check for UNKNOWN state. (#266) Fixed #264 -- When testing #263, SLURM ran out of memory during the completing stage and aborted the jobs and left the job in an unknown state. This PR fixes this issue by defaulting to a failure when the status is found to be UNKNOWN. * Adds a check for UNKNOWN state. * Correction of bad variable name "state" * Tweak to treat UNKNOWN as failed. * Change marking of failed state to unknown. * Some fixes for style and credit from #265 * Official 1.1.7 release. * Official start to 1.1.8dev0. * Addition of README as long description. (#269) * Addition of README as long description. * Dropping encoding as it's not supported in 2.7 * Pgen docs (#275) * Initial pgen docs with itertools example * Add pargs example * Add pgen using numpy plus helper function for 1D distribution * Fix typos, update image * Initial port of complete Parameters documentation * Fix up api docs warnings, add missing script adapters * Add subsection on accessing env block variables inside pgen * Literal imports and some style tweaks. * Some minor title and header tweaks. * Remove out of date notes * Renamed itertools_pgen to reference LULESH. 
* Update doc strings on parameter generator samples * Make flake8 happy * Misc cleanup and formatting, adding more links and internal references * Fix up section listing, in-text moniker/function formatting Co-authored-by: Frank Di Natale <[email protected]> * Replace unicode quote with ascii quote (#277) * Updated the docs release version to 1.1.7 (#278) * Modified validation logic to skip over variable tokens. (#279) Co-authored-by: Francesco Di Natale <[email protected]> * Inheritable validation and spec module (#280) * added checks for valid keys * branch updates * working on validation * added schema file * updates * fixed spec * fixed spec * refactor updates * verification logic * removed test field * made specification module * removed TODO * fixed flake8 style * schema name is now hardcoded * updated MANIFEST * adjusted imports * tuples -> lists Co-authored-by: Francesco Di Natale <[email protected]> * Tick up to 1.1.8 patch version. * Correction of package data path to schema. * whitespace fixes from pre-commit checks * Start of version to 1.1.9dev0 * Updated the docs version to 1.1.8 (#288) * Added a link to the maestro sheetmusic repo and added descriptions for the (#287) documentation links * Added introduction documentation from the README.md file (#289) * Docs/formatting (#296) * added some missing spaces in multiline text strings * added some missing spaces in multiline text strings * Update parameters.rst (#298) * Update parameters.rst Make as explicit as possible that if you run with PGEN, then just comment out the global.parameters block in your YAML spec file. * Update parameters.rst * Update parameters.rst * Update parameters.rst Made changes according to Francesco DiNatale's suggestion. * Correct documentation compilation warnings (#299) * Updates to correct indentation errors. * Addition of newly compiled docs. * Indentations added to Record.get docstring. 
* Bugfix: Division by string instead of number (#302) When computing CPUs per node on LSF. * Addition of processing multiple directories and globs for status (#301) * Addition of processing multiple directories and globs for status * Minimal edits after first commit * Shortened code to allow more glob functionality * Removed glob, added nargs+ to conductor.py, rearranged code in maestro.py * Removed some redundant code * Reduced more code. Works for multiple directory inputs (* included). Absolute path added * Added refinement features to allow ease of interpreting results * Additional refinement. No more logging for status command. * Flake8 and minor style tweaking * Updated the documentation to pull the recent version (#306) * Updated the Sphinx documentation to pull Maestro’s version from the package * Updated formatting for flake8 check * Updated the documentation for installing Maestro (#308) * Updated the Dockerfile and documentation (#312) * Updated the documentation for building the Docker image and updated the Dockerfile to use Python 3 * Updated the Dockerfile format * Removal of Python 2.7 and addition of newer Python 3 versions (#315) * Removal of py27 and additions of py37+8 * Removal of 3.7 from travis * Tweak to remove py37 matrix * simple bug fix in the YAMLSpec description setter (#319) * Conversion to Poetry Build System (#316) * added pytest and coverage settings * initial change to poetry * remove setup.py and convert to pyproject.toml * fixes for tox and travis pipelines * Changes to support python >3.4 and formatting updates. 
* adding athey1 as a maintainer * fixed tab spacing * fixed spacing * added jessica as a maintainer * added jeremy as a maintainer * YAMLSpecification Testing (#317) * initial testing on loading spec * black settings added * reformatted by black * added test for validation errors * added test to check for missing key in step * updated schema to add check for non-null variables * added check for study steps * added test for multiple dependencies of the same name * changed yaml spec testing for easier reading * added test of global parameters * fixed test execution to verify error thrown * added output_path test * formatting fix * added tests for steps generation and params generation * added tests for get_study_env * initial testing on loading spec * black settings added * reformatted by black * added test for validation errors * added test to check for missing key in step * updated schema to add check for non-null variables * added check for study steps * added test for multiple dependencies of the same name * changed yaml spec testing for easier reading * added test of global parameters * fixed test execution to verify error thrown * added output_path test * formatting fix * added tests for steps generation and params generation * added tests for get_study_env * updated to add testing for description setter * added test for spec.name setter * pulled spec_path code into a pytest fixture * added pytest and coverage config * added pytest-cov as a tox requirement * added report.xml to gitignore * changed line length for black * added coverage and junit reports to tox * Added coverage appending for test runs * Add name and real_name StudyStep properties. (#314) * Add name and real_name StudyStep properties. There now needs to be an aliasing of nickname with a step name when requested because adapters expect to use the name. StudyStep objects now return their nickname if the nickname is set. 
Objects such as the ExecutionGraph must now use the real_name for logistic tracking, while adapters can continue to use name. * Addition of extension property for Flux. * Some typo and string style tweaks. * Re-add setup.py to retain standard editable install. * Slurm Script Adapter Bug Fix (#305) * Testing by adding comment * Initial commit, reformatted checking for Nodes and Procs in slurmscriptadapter.py, added extra conditional * Removed condition for checking Node, moved condition for raising exception to get_header * Changed code for condition that handles no nodes or procs * Tweaked the error message * Minor variable misspelling * Handled additional case of nodes absent and procs present to add procs to SBATCH * In the case of Nodes absence and Procs presence, it will be added to SBATCH * Nodes now appear in SBATCH header even if uninitialized in a step * Changed Chain Map to regular Dictionary * Remove extra loop and change resource construction. * Cleared styling issues, removed time stamped outputs from Samples directory * Some refactor clean up. Co-authored-by: Frank Di Natale <[email protected]> * WIP: Add job identifier to status command (#323) * Job ID shows up in Maestro Status, edited executiongraph.py write_status function * Changed sleep time back to original. Job ID gets last index. * Fixed a casting issue (int to str). + style Co-authored-by: Frank Di Natale <[email protected]> * Fixes #324 (#325) * A fix to use communication instead of wait. (#328) * Updated README to correct badges. * Corrected badge colors for PePy * Removal of Python3.5 from testing (#339) * Removal of Python3.5 * Removal of Python 3.5 from testing. * Updates to the FluxScriptAdapter to use the latest Flux. (#251) * Updates to use new Flux. * Moves Flux imports to FluxScriptAdapter. * Updates to cancel to use the new Flux0.16.0 interface. * Added a check for the Flux URI in the environment. * Addition of Record classes to FluxAdapter. 
* Addition of a modified exec in Flux Adapter. * Correction to the shell header. * Add walltime to the header only when present. * Correction for checking the value of walltime instead of existence. * Tweak to overlook missing keys. * Modified checks for procs and cancellation return. * A calculation of a sensible cores per task when not specified. * Additional sensible set for non-values. * Use of direct import instead of __import__. * Correction to __import__ for Flux. * Readdition of core per task computation. * Tweak to pass nodes as tasks. * Updates to use new Flux. * Moves Flux imports to FluxScriptAdapter. * Updates to cancel to use the new Flux0.16.0 interface. * Added a check for the Flux URI in the environment. * Modified checks for procs and cancellation return. * Use of direct import instead of __import__. * Correction to __import__ for Flux. * Readdition of core per task computation. * Tweak to pass nodes as tasks. * Addition of _flux util module. * Updates to Flux interfacing. * Rename module to get rid of dots. * Tweak to import flux.constants * Renamed 0.16.0 module and added new status functions. * Switch jobs to a list, since ID is included. * Tweak to construct set of joined tuples. * Addition of missing flux.job import. * Tweak to cast set to a list. * Correction of a misnamed key reference. * Correction to incorrect add for attrs set. * Tweak to have the jobid of errored checks returned. * Addition of mapping statuses to strings. * Corrected bad reference to tuple. * Move the FluxInterface to abstracts. * Update flux 0.17.0 interface. * Move flux_0_16_0 to flux_0_17_0 to reflect targeted version. * Tweaks to fix an import error for flux.core.inner.raw * Tweaks to base FluxInterface with new API methods. * Addition of method to get latest interface. * Addition of new API calls to 0.17.0 * Tweaks to 0.17.0 * Addition of a missing comma. * Addition of missing commas. * Swap out of status call. * Reintroduction of addtl args keyword. 
* Updates to header construction. * Updates to status checking. * Correction to submission record creation. * Updates to get_statuses. * Cast jobids back to a str. * Test instantiating new handle for status. * Correction to logging error call. * Addition of logging to status check. * Change to print exception as string. * Delayed check for results field to prevent key error. * Reverting to ints for job identifiers. * Readdition of resulttostr (accidental deletion). * Readded check to verify we reuse the flux handle. * Fix to a bad reference to old handle variable. * Addition of debug logging. * Addition of submission to FluxInterface. * Addition of submit to flux0_17_0 * Correction to FluxScriptAdapter to use new submit. * Shift to ABC subclass. * Removal of Singleton base class. * Correction of the FluxInterface name to remove _ * Fixed ngpus type in parameter list. * Fixed ngpus type in submit call. * Reordered improperly ordered return values. * Addition of debug logging for FluxInterface 0.17.0 * Correction to a bad key reference to "jobid" * Addition of missed completed state + correction to abbrev. * Removal of Singleton import for FluxInterface. * Flake8 fix. * Removal of 0.11.0 FluxInterface. * Addition of cancellation to FluxInterface. * Addition of a first pass cancel method. * Tweaks to change when sub-brokers are used. * Addition of cancel call to the interface. * Update to logger formatting. * Update to logger formatting to make verbose. * Additional debug logging for returncode. * Tweak to expected return code. * Tweaks to cancellation API * Addition of missed classmethod decorator. * Addition of statuses for unknown and file not found. * Addition of debug logging in Flux interface 0.17.0 * Correction to cast integers to str * Addition of naming for jobs. * Tweak to FluxAdapter to pass in job name. * Correction of a bug that forced GPUs to 0. * Addition of cores per task. * Removal of the Jobspec debug print. * Addition of a Flux sample spec. 
* Addition of a 0.18.0 Flux backend. * Tweak to add walltime to backend parameters for submit. * Removal of a debug print of command line. * Addition of docker related files to test flux. * Clean up of commented docker lines. * Addition of walltime support. * Updates to specification validation to support Flux. * Tweaks to Flux test specification. * Addition of TIMEOUT state. * Removal of verbose logging. * Check to make sure walltime is set to a value in 0.18.0 * Fix to correct inf walltime for Flux 0.19.0 * Additional debug logging. * Removal of verbose step logging. * Shift to using from_nest_command * Correction to put script path in list for JobSpec. * Tweak to correct from_nest_command call. * Fixes to catch failed Flux imports. * Removal of Python 3.5 from testing. * Removal of Python3.5 * Addition of ceiling gpu_per_slot calc. * Correction of bad attribute reference. * Addition of cancellation for multiple directories and globs. (#338) * Initial implementation of cancel glob. * Tweaks to prompt user for cancel. * update .coveragerc to ignore tox and venvs (#346) * Spectrum adapter remove/lingering in-class import. (#352) * Commit 38567e7: Spectrum adapter remove/lingering in-class import. Commit 51fb0fd: Remove Spectrum adapter tests Short term: This fixes a bug where a stray import prevented the FluxScriptAdapter from being pickled for interprocess communication. Long term: Cleans up the code from the stale Spectrum adapter and makes it so that we have less dead code. It prevents us having to waste time looking through code that is no longer used. Remove Spectrum adapter tests * Tweaks to fix imports. * Update docker instructions. * Module nicknaming to prevent aliasing. * Logging bugfix to include record name. * Limit to cryptography<3 for build (#353) Short-term goal: The goal for now was to get TravisCI testing Python3.6 again. It was previously pulling source and trying to build cryptography at a more recent version from PyPi. 
This fix makes a wheel version available so that the recent change to include rust doesn't prevent testing. Long-term: Eventually we will want to move to the most recent cryptography, but for now it breaks 3.6. The current specified schedule for 3.6 is that currently it is only receiving security updates until 12-2021 (when it will presumably be EOL). We will revisit this when we deprecate 3.6 support. * Update to python versions >=3.6 * Add support for Flux core 0.26.0 (#357) * flux: fix version typo in comment * flux: add 0.26.0 adaptor Notable differences from 0.18.0 version: - Use Flux's builtin statustostr & remove adaptor-local statustostr and attr list - Drop the individual calls to `job_id_list` for a single call to `JobList` * Removal of forced Flux default version. * Addition of [email protected] dockerfile. * Removal of unused cb_args variable. Co-authored-by: Francesco Di Natale <[email protected]> * Ticked up the version to 1.1.9dev1 * Flux update (#359) * flux0.26: add version checking during connection Problem: there are several versions of the Flux adaptor and it is very easy to use too new of an adaptor with too old of a Flux version (and vice-versa) Solution: check both the adaptor version and the Flux broker version when making a connection via a new Flux handle and ensure the versions match. If the adaptor version is newer, log an error. If the broker version is newer, log a debug message letting the user know they might benefit from choosing a newer adaptor. * flux0.26: handle the case where the ngpus argument is a string Problem: the `ngpus` argument to `submit` (as well as `parallelize`) can sometimes be a string (e.g., when it isn't the default of 0). 
This causes validation errors in the Flux jobspec class, which does runtime type checking of the value. Solution: explicitly check if `ngpus` is a string, specifically a digit, and if so, convert it to an `int` before passing to Flux * flux0.26: fix number of slots for nested flux launches Problem: when launching with multiple tasks in a nested Flux instance (i.e., `force_broker = True`), then the number of slots for the nested instance should be set as the number of tasks not the number of nodes * flux: make `force_broker` the default action Problem: when creating maestro specs that need to be portable across schedulers, it is assumed that a `${LAUNCHER}` call is needed in many scripts, but when `force_broker` is `False`, this results in a doubling up of the parallelism. The call to the `submit` function within the Flux adaptor launches multiple processes and then the `${LAUNCHER}` call within the script gets expanded by the `parallelize` function to also launch multiple processes. Solution: make `force_broker` the default behavior to maximize compatibility with other schedulers. If users want to avoid the extra overhead of a nested Flux instance, they can always opt out of the default behavior with `force_broker = False`, and then elide their use of the `${LAUNCHER}` variable in the script. * Moved integer check to FluxScriptAdapter main class. * Move the version check to FluxInterface base class. * Update prior interfaces to use connect_to_flux method. * Fix for an empty gpu string in a specification. * Removal of gpu conversion in 0.26.0 Flux backend. Co-authored-by: Francesco Di Natale <[email protected]> * Update to fix the addition of -g erroneously (#361) * Removal of TravisCI due to transition to GitHub Actions. * Update pyproject version to match setup.py * Addition of a first pass GH Action linter. 
* Status tweaks (#358) * Add bfs/dfs ordered status output * Prototype rich formatted status tables with cli layout switch * Fix bfs ordering option, make it default * Remove unhelpful log output.. * Add rich dependency * Add rich dependency to setup.py * Quit whining flake8... * More flake8 drama.. * Cache status ordering to improve scalability * Add bfs/dfs ordered status output * Prototype rich formatted status tables with cli layout switch * Fix bfs ordering option, make it default * Remove unhelpful log output.. * Quit whining flake8... * More flake8 drama.. * Cache status ordering to improve scalability * Tweak to python version for rich. * Attach params to step records to enable use in status output * Add param name:value table in narrow status layout * Check for new status column to enable backwards compatibility * Checkpoint on renderer factory implementation * Cleanup after successful test * Remove extraneous debug logging * Fix bad indent, tweak themes to play nice on different terminal themes * Unit test for flat status layout * Fix erroneous whitespace * Add step root to workspace in status of parameterized steps * Rework status tests, add narrow layout test, rebaseline * Remove stray debug printing * Add help target and some documentation to the make file * Document layouts, add some test comments * Rename status renderer test file so pytest can find it automatically * Fix bug causing duplicate entries in narrow layout * Update narrow layout test baseline * Add layout screenshots to docs * Add legacy table format back to the status layout options * Convert base renderer to abstract, implement auto registration and auto layout cli choice list * Add some proper google style doc strings to render factory * Docstring, arg defaults change for narrow renderer * Sync up layout methods and doc strings of status renderers * Fix up docstrings on params for StepRecords * Fix incorrect return type documentation * Bug fixes/style tweaks on narrow status layout * 
Remove debug output * Add documentation for legacy status layout Co-authored-by: Francesco Di Natale <[email protected]> * Slurm QOS specification. (#365) * qos feature added for Slurm * revert naming of this file * Addition of reservation to bsub headers. (#367) * Additional flexibility to walltime parsing for Flux (#369) * Tweaks to make walltime more flexible. * Correction to walltime check in Flux backend. * Additional tweaks for robustness for convert walltime. * Allow more types for walltime to be specified. * Reintroduce conversion call. * Addition of Flux connection to try/catch (#375) * Avoid overriding previous lines in Lulesh example (#382) * Flux Adapter Clean up and Addition of Job Priority (#379) * Additional logic for walltime parsing. * Removal of forcing broker when multinode. * Renamed the use_broker setting to nested. * Addition of urgencies to job submission. * Correction to Urgency mapping. * Bugfix from isdigit to isnumeric. * Addition of from_str for StepUrgency * Added parsing of urgency to FluxAdapter. * Addition of a float type urgency. * Removed exception when finding entries that are not strings. * Update to add get_priority to SchedulerScriptAdapter * Addition of urgency mapping for Flux backend. * Update to the Flux adapter to support urgency modifications. * Rename StepUrgency to StepPriority * Pass on types that can't be substituted. * Removal of annotations to support 3.6 * Corrections to the passing of urgency values. * Change enums to lower case. * Addition of sensible capitalization of priority. * Tweaks to Flux example for new keys. * Removal of resource checks. * Addition of Workflow Community Initiative information. (#385) * Addition of icon form logo. * Addition of Workflow Community Initiative info. * Bump paramiko from 2.7.2 to 2.10.1 (#387) Bumps [paramiko](https://github.com/paramiko/paramiko) from 2.7.2 to 2.10.1. 
- [Release notes](https://github.com/paramiko/paramiko/releases) - [Changelog](https://github.com/paramiko/paramiko/blob/main/NEWS) - [Commits](paramiko/paramiko@2.7.2...2.10.1) --- updated-dependencies: - dependency-name: paramiko dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Add black and update versioning. (#389) * Enable more complete configurability of jsrun launcher (#384) * initial patch for more complete configuration of jsrun launcher * First pass at documenting usage of the lsf/jsrun launcher * Add corresponding maestro steps for each jsrun variant * Add one example of a memory hungry application * Build table mapping jsrun to maestro step keys * Add binding controls, rename keys from snake case, document defaults * Fix up straggling snake case keys * Improve debugging info in schema error messages * Fix rs_per_node, make gpu binding optional since it's new in lsf 10.1 * Update binding flag in examples, add note about gpu binding availability * Add initial lsfscriptadapter tests * Initial pass at general batch block documentation * Remove old commentary * Remove unneeded nodes/procs math in jsrun launcher substitution * Remove unneeded logging output * Remove more debugging log outputs * Update lsf examples to match json schema for resource specification keys * Cleanup the cpus per rs machinery, schema * Add openmp and mpi lulesh study to exercise lsf resource specification keys * Document the sample lsf lulesh specification * Updating dependency versions. (#392) * Correct conditional to correct empty string catch. (#393) * Move to purely poetry install (#394) * Removal of setup.py for poetry editable. * Addition of install testing. * Make python versions strings to avoid 3.1 Python version. * Version tick for dev2. 
* Ci/test and release (#396) * First pass at re-enabling tests in ci * Update to newer poetry gh action, tweak cache setting * Fix missing } * Fix incorrect gh action repository name * Disable venv cache * Test out doing flake8 linting with poetry * Revert flake8 linting to separate run for nicer reporting * Test reusable python matrix * Revert reuse test, sync up steps between pip/poetry * Remove missing dependency * force linting pass before running expensive install/pytest steps Co-authored-by: Elsa Gonsiorowski, PhD <[email protected]> Co-authored-by: Francesco Di Natale <[email protected]> Co-authored-by: jsemler <[email protected]> Co-authored-by: Kevin Athey <[email protected]> Co-authored-by: Joe Koning <[email protected]> Co-authored-by: Adrien Bernede <[email protected]> Co-authored-by: Kevin Athey <[email protected]> Co-authored-by: Bay <[email protected]> Co-authored-by: Benjamin Bay <[email protected]> Co-authored-by: crkrenn <[email protected]> Co-authored-by: Christopher R. Krenn <[email protected]> Co-authored-by: Jeremy White <[email protected]> Co-authored-by: Tanim Islam <[email protected]> Co-authored-by: Kenny Weiss <[email protected]> Co-authored-by: Jeffrey Mei <[email protected]> Co-authored-by: scottwedge <[email protected]> Co-authored-by: Stephen Herbein <[email protected]> Co-authored-by: Tobias Duswald <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> 1.1.9 Release version bump.
#297