refactor: remove external dependencies and architectural revamp #170
base: dev
Conversation
- Add new omnibenchmark.model module with pydantic-based validation
- Remove external omni-schema dependency
- Remove old validation infrastructure
- Update project dependencies to remove omni-schema
  BREAKING CHANGE: YAML benchmark parsing now uses internal validation
- Add omnibenchmark.dag module with lightweight graph implementation
- Refactor benchmark internals to use custom graph (_graph.py, _dag_builder.py)
- Remove networkx dependency and old DAG implementation
- Restructure benchmark node handling (_node.py, _paths.py)
- Add visualization support (_dot.py, _mermaid.py)
- Update all I/O modules to work with new model system
- Remove omnibenchmark.sync module
- Update utilities module for compatibility
- Add comprehensive versioning module with git integration
- Include version management, git utilities, and exception handling
- Support for version validation and management workflows
- Update CLI commands to work with new model and DAG systems
- Adapt Snakemake workflows for new benchmark structure
- Update workflow scripts and formatters for compatibility
- Add pixi.toml and pixi.lock for modern Python dependency management
- Add pixi CI pipeline, disable old pipeline
- Add strict type checking configuration and scripts
- Update development environment and testing configuration
- Add tox.ini for standardized testing
- Add comprehensive test coverage for new model and versioning modules
- Refactor existing tests to work with new architecture
- Add test factories and fixtures for improved test maintainability
- Update all test data files for new validation system
- Add documentation templates and reference materials
- Update README, CHANGELOG, and contributing guidelines
Force-pushed 7e8a6b7 to c453048
hmm, is boto3 not installed by default? Running clustbench fails on this branch but runs on main. pip install from run_omnibenchmark (with $OB_BRANCH being this branch):
Really amazing work!! It eliminates a lot of technical debt that this project has accumulated. Thanks a lot for it!
I have just a couple of questions, suggestions and discussions to start.
- Regarding the explicit api_version field: I'm not a fan of adding this to benchmark YAML files, since it introduces unnecessary complexity and maintenance burden for users. Moreover, it might lead to confusion with the version field. I believe the schema structure itself should provide enough information to determine compatibility, and we can infer an "api_version" from it.
- Future work prioritization: Given that there are a lot of TODOs and this is just the start of a refactoring, can you provide a list of the future work required to address them, ordered by the priority that you see?
def _find_duplicates(items: List[str]) -> List[str]:
    """Find duplicate items in a list."""
    from collections import Counter

    counts = Counter(items)
    return [item for item, count in counts.items() if count > 1]


def _is_url(string: str) -> bool:
    """Check if the string is a valid URL using urlparse."""
    from urllib.parse import urlparse

    try:
        result = urlparse(string)
        return all([result.scheme, result.netloc])
    except ValueError:
        return False
_find_duplicates and _is_url are helper methods that I've seen declared in a couple of places already. Let's factor them out.
you're right, they're duplicated in model.validation and model.benchmark. TBD
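A minimal sketch of what the factoring-out could look like (the module path omnibenchmark/model/_helpers.py is hypothetical; the bodies simply mirror the snippet above):

```python
# omnibenchmark/model/_helpers.py (hypothetical location for the shared helpers)
from collections import Counter
from typing import List
from urllib.parse import urlparse


def find_duplicates(items: List[str]) -> List[str]:
    """Return the items that appear more than once in the list."""
    counts = Counter(items)
    return [item for item, count in counts.items() if count > 1]


def is_url(string: str) -> bool:
    """Check whether the string is a valid URL using urlparse."""
    try:
        result = urlparse(string)
        return all([result.scheme, result.netloc])
    except ValueError:
        return False
```

Both validation call sites could then import these instead of redeclaring them locally.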
def _validate_environment_path(
    self, env: SoftwareEnvironment, benchmark_dir: Path
) -> List[str]:
    """Validate software environment path based on backend.
    ARCHITECTURAL WARNING: This method performs system-specific validation
    that violates the principle of keeping models abstract and declarative.
    TODO: Move system-specific checks (file existence, envmodule availability)
    to BenchmarkExecution or a separate validation layer. The model should
    only validate data structure and logical consistency, not system state.
    """
    errors: List[str] = []
Why not then move the whole validation logic into the Validator superclass?
I suspect we want to leave just the syntactic validation at this level (the pydantic model for syntax), and have all the consistency checks (at the benchmark level), plus the module metadata and result validation, under a unifying validators module.
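To illustrate the idea (this is only a hypothetical sketch, not the planned API), such a unifying module could expose a common interface and aggregate errors on top of the pydantic layer:

```python
from typing import List, Protocol


class Validator(Protocol):
    """Anything that can inspect a benchmark object and report problems."""

    def validate(self, benchmark: object) -> List[str]:
        """Return a list of human-readable error messages (empty if OK)."""
        ...


def run_validators(benchmark: object, validators: List[Validator]) -> List[str]:
    """Run consistency checks after the syntactic (pydantic) layer has passed."""
    errors: List[str] = []
    for validator in validators:
        errors.extend(validator.validate(benchmark))
    return errors
```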
"""Pydantic models for Omnibenchmark.""" | ||
|
||
from omnibenchmark.model.benchmark import ( | ||
# Base classes | ||
IdentifiableEntity, | ||
DescribableEntity, | ||
# Enums | ||
APIVersion, | ||
SoftwareBackendEnum, | ||
RepositoryType, | ||
StorageAPIEnum, | ||
# Core models |
I would move the model package under benchmark, since it anyway contains models related to the benchmark.
as we discussed, my plan is to move some interfaces, exceptions etc from benchmark to core, to avoid circular dependencies between validators, model and "benchmark" (in essence, "benchmark" should be very minimal after further refactoring between modules)
class SimpleDAG:
    """A simple directed acyclic graph implementation."""

    def __init__(self) -> None:
        """Initialize an empty DAG."""
        self.nodes: Set[Any] = set()
        self._edges: Dict[Any, Set[Any]] = defaultdict(set)
        self.predecessors: Dict[Any, Set[Any]] = defaultdict(set)
        self.node_attrs: Dict[Any, Dict[str, Any]] = defaultdict(dict)
Really nicely structured and tested! This will cover all the use cases that we have so far. Thanks!
push:
  branches:
    - main
    - dev
wouldn't this run twice when PR-ing from dev to main?
I wonder whether there is any push, other than to main, that needs to be checked without a PR / review first.
pull_request:
push:
  branches:
    - main
also, would a workflow_dispatch: make sense?
wouldn't this run twice when PR-ing from dev to main?

run twice: yes, I think this is pretty standard, no? You want to test before (the PR discussion) and after you merge (because sometimes, like in this same PR, squashing or merging introduces errors, mostly due to bad resolution of conflicts, bad rebasing, etc.).
Re. merging from dev to main, same: we're interested in catching divergence when tagging on main and doing hotfixes that might not be reconciled properly with dev.
tldr; I think it does no harm
- feat(cli)!: --local argument has been renamed to --local-storage
- feat: Support passing of extra arguments from CLI -> Snakemake for `run` commands
- feat: add extra profiler to the snakemake execution (#151)
- refactor!: remove dependency on omni-schema and networkx
isn't there much more going on, e.g. the test framework simplification?
"benchmarker": "Your Name", | ||
"author": "Your Name", |
difference between benchmarker and author?
"version": "1.0.0", | ||
"url": "https://example.com", | ||
"email": "[email protected]", | ||
"path": "/path/to/file", |
path to what?
"container": "example/container:latest", | ||
"image": "example/image:latest", |
difference between these two? Same kind of comments apply to the rest of the example dict
elif prop_type == "integer":
    example_value = 1
elif prop_type == "number":
    example_value = 1.0
elif prop_type == "boolean":
    example_value = True
Most things (arguments, parameters) react well to a 1 value; would it make more sense to set the defaults to 0s, 0.0s and Falses instead, so we reduce the chances that they are meaningful?
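A minimal sketch of the suggested change, assuming the surrounding code maps JSON-schema property types to example values (the helper name and mapping are illustrative):

```python
# Hypothetical mapping illustrating the suggestion: use "inert" defaults
# (0, 0.0, False) so generated placeholder values are less likely to be
# mistaken for meaningful ones.
_EXAMPLE_DEFAULTS = {
    "integer": 0,
    "number": 0.0,
    "boolean": False,
}


def example_value_for(prop_type: str, fallback: str = "example_value"):
    """Return an inert placeholder for a JSON-schema property type."""
    return _EXAMPLE_DEFAULTS.get(prop_type, fallback)
```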
# required (string), Commit hash
commit: c0ffee4
unrelated to this update but conceptually related: does this ob digest alternatives such as:
- full-length commit hashes (Mark attempted to use full commits)
- branch names
- tags?
hmm, afaik we enforce commits and not refs like branch/tag; I understood that was intentional (per the omni-schema spec). Re. the long form (full-length hashes), I think we should be permissive, yes.
thanks. I think it would be good to have an (undocumented) branch-name spec for fast prototyping. But it defeats the whole repro purpose, so we should ask the team again.
hmm, maybe it's something that we can allow via validation rules (--strict).
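As a hedged illustration of that idea (the function name and exact policy are hypothetical), a permissive check could always accept abbreviated or full commit hashes, and tolerate other refs only when strict validation is off:

```python
import re

# Abbreviated (>= 7 chars) or full 40-character SHA-1 hashes.
_HEX_SHA = re.compile(r"^[0-9a-f]{7,40}$")


def is_acceptable_repo_ref(value: str, strict: bool = True) -> bool:
    """Return True for a commit-ish value; outside strict mode, also accept
    any plausible git ref name (branch or tag) for fast prototyping."""
    if _HEX_SHA.fullmatch(value.lower()):
        return True
    return not strict and bool(re.fullmatch(r"[\w][\w./-]*", value))
```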
apptainer: example_apptainer

# optional (string), Docker image
docker: example_docker
Is this meant so apptainer uses a docker image directly?
that, and accounting for our new plugin that does allow docker execution via podman/udocker
hmm, then perhaps we should get rid of the ORAS (protocol) constraints if they're still in place. If we use standard snmk plugins we can get images from anywhere, e.g. dockerhub and so on.
# optional (string), Environment module name
envmodule: example_envmodule

# optional (string), EasyBuild config path
full path or basename? meant to make sense of the robotspath and the easybuild config
# required (string), Unique identifier
id: example_id

# required (string), File path
are they paths or basenames?
class Benchmark:
    def __init__(self, benchmark_yaml: Path, out_dir: Path = Path("out")):
        # base path is always the location of the benchmark YAML file
this is a super-important point, because it has to do with the .snakemake folder and the like, beyond the output dir. Shall we document that? I'm thinking of users wanting to have their YAML git-tracked somewhere in their home, but the ob run happening on a scratch filesystem or similar. They might want to copy (symlink?) the YAML to the scratch and run ob there, right?
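For reference, a minimal sketch of the behaviour under discussion, assuming the constructor derives its working base from the YAML's parent directory (the class and attribute names here are illustrative, not the actual implementation):

```python
from pathlib import Path


class BenchmarkSketch:
    """Illustrative only; mirrors the constructor shown above."""

    def __init__(self, benchmark_yaml: Path, out_dir: Path = Path("out")):
        # The base path is the directory containing the benchmark YAML;
        # per the discussion above, this (rather than the current working
        # directory) is where relative paths and Snakemake state would land.
        self.base_dir = Path(benchmark_yaml).resolve().parent
        self.out_dir = out_dir
```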
""" | ||
# Count in-degrees | ||
in_degree: Dict[Any, int] = defaultdict(int) | ||
for node in self.nodes: |
hmm, self.nodes is a Set, so this iteration is not deterministic. Which is the point of this function, I know; it may feel like I'm making no sense, and perhaps it is nonsense, but: you're trying to sort the nodes. In order to sort the nodes, wouldn't it be safer to do so after pre-sorting them (not topologically) with sorted(self.nodes) or similar? That would avoid future poltergeists with complex DAGs with plenty of excludes, intermediate metric collectors, etc., where randomly accessing the (node) set might be adding unwanted extra freedom.
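A minimal sketch of the suggested pre-sorting, assuming the method implements Kahn's algorithm over the nodes/edges structures shown earlier (the standalone function and the key=str tie-break are illustrative, not the PR's implementation):

```python
from collections import defaultdict, deque
from typing import Any, Dict, List, Set


def deterministic_topological_sort(
    nodes: Set[Any], edges: Dict[Any, Set[Any]]
) -> List[Any]:
    """Kahn's algorithm with a sorted() pre-pass so ties break the same way
    on every run (key=str keeps it working for unorderable node types).
    Cycle detection is omitted for brevity."""
    ordered = sorted(nodes, key=str)
    in_degree: Dict[Any, int] = defaultdict(int)
    for node in ordered:
        for succ in sorted(edges.get(node, ()), key=str):
            in_degree[succ] += 1
    # Seed the queue with zero in-degree nodes in a stable order.
    queue = deque(n for n in ordered if in_degree[n] == 0)
    result: List[Any] = []
    while queue:
        node = queue.popleft()
        result.append(node)
        for succ in sorted(edges.get(node, ()), key=str):
            in_degree[succ] -= 1
            if in_degree[succ] == 0:
                queue.append(succ)
    return result
```

The pre-sorting only fixes the tie-breaking order; the topological constraints themselves are unchanged.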
the only purpose of this module is to drop the networkx dependency, which was adding significantly to startup time.

Re. "You're trying to sort the nodes": this PR does not introduce any new behavior, it just mimics the parts of the networkx API that we were already using.

that said, I think this discussion is linked to how we want to deal with the intermediate representation of the config file and the workflow-specific serialization of the blocks (snakemake or others).

in a sense, all the DAG parsing is redundant since snakemake will build its own. However, I would argue that we need simple DAG operations to have a canonical representation and to validate that the tree(s) are loop-free. Otherwise we're letting snakemake (or backend X) blow up errors directly to the user.
I don't think the snmk trick we're using (nesting results of B as a folder of its parent A) can give rise to loops. Perhaps this has to do with the mermaid/layout plotting only?
for module in all_modules.values():  # type: ignore
    if module.software_environment not in env_ids:  # type: ignore
        errors.append(
            f"Software environment with id '{module.software_environment}' is not defined."  # type: ignore
what does "not defined" mean? "unavailable" perhaps, or "unloadable"? Worth reporting the modulepath then, so the user has more info / debugging opportunities?
I think "unavailable" or "not loadable" speaks to the ability to load them (maybe because the path cannot be resolved). In this case, they're missing from the definition (or declaration, in the C sense). Would that work better?
Hmm, something like this?
Error: Missing software environment. Module X is not defined within the benchmarking YAML: it should be listed as part of the software_environments stanza within the benchmarking YAML header.
Generally, stanzas for envmodules are shaped as follows (e.g., for a module loadable with module load bwa/0.7.17):
software_environments:
  bwa:
    description: "a bwa module"
    envmodule: "bwa/0.7.17"
Or something like that?
Unless the error had to do with being unloadable. If so, the error might better report the trace of the failed module purge; module load command instead, as well as the modulepath?
if self.software_backend == SoftwareBackendEnum.conda:  # type: ignore
    if not env.conda:  # type: ignore
        errors.append(
            f"Conda backend requires conda configuration for environment '{env.id}'"  # type: ignore
I think this error is not very clear. Is it capturing that the "conda.yml" is not available at the given path? If so, the user would benefit from something like "Failed to find a conda YAML at ./path/to/file, as required to run module/metric collector X using conda".
no, it's just doing a consistency check.
Since we're here: I think there's a possible easy simplification of this whole logic, instead of so many if..else branches.
Instead of managing conda, apptainer, etc. as different fields in the environment, I think what we need is type-checking a single entity, environment:
available_environments = List[Environment]
(possibly qualified by checking paths for existence, but that can come up later in the validation process), where
any((environment.type == benchmark.backend for environment in declared_environments))
needs to be True for a given benchmark definition.
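A rough sketch of that idea (all names below are illustrative, not the actual model classes):

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class BackendType(str, Enum):
    conda = "conda"
    apptainer = "apptainer"
    docker = "docker"
    envmodule = "envmodule"


@dataclass
class Environment:
    id: str
    type: BackendType
    source: str  # conda YAML path, image reference, or module name


def backend_is_satisfied(backend: BackendType, declared: List[Environment]) -> bool:
    """True if at least one declared environment matches the selected backend;
    existence/loadability checks would happen later in the validation process."""
    return any(env.type == backend for env in declared)
```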
but how can we type-check a module name, which only has a meaning after sourcing a bash function (e.g., the module load X verb for X), or something that is just a file (a conda yaml, because snmk does generate the env itself)? Either we evaluate whether the YAML has the specification (so it's not empty), or whether it works (e.g., the module is loadable, the conda yml is well formed, the apptainer image is present or downloadable, etc.).
Or we could go for validating files, e.g. looking for the module lua file, the conda env yml, or the apptainer local/remote image?
elif self.software_backend == SoftwareBackendEnum.docker:  # type: ignore
    if not env.apptainer and not env.docker:  # type: ignore
        errors.append(
            f"Docker backend requires apptainer configuration for environment '{env.id}'"  # type: ignore
Again, this message is not very clear: is the image (local? ORAS?) unavailable, or is the YAML stanza missing? And what does it mean to discuss docker and apptainer separately?
(Meaning, I'd make the traces longer / more verbose, because these user errors are likely to happen during the first ob usage attempts, so we'd better be extra informative there.)
    return created_version


def update_benchmark_version_and_commit(
@btraven00 I'm reading the versioning code; do you have a design / thoughts document somewhere? It may have been shared already and I just can't find it... thanks!
Force-pushed c453048 to 4561d44
Welcome to Codecov 🎉 Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests. Thanks for integrating Codecov - We've got you covered ☂️
Force-pushed 147d686 to 4967dca
boto3 is not expected to be installed by default; it's not a core dependency. It was split out into the s3 extras set a few releases ago.
this is an important discussion. api_version is just a rename (for clarity) of yaml_spec_version. I personally suspect we don't really need this, since now we're shipping the model version together with the omnibenchmark version. I think we can drop it for now and do as you say, infer compatibility. It boils down to documenting breaking changes and trying to keep field compatibility with deprecation warnings for a few release cycles (as I'm doing with, e.g., api_version or the storage section). Post 1.0 I feel we should enforce MAJOR API versions, like, for instance, how Docker versioning works, but I agree that this is going to be more of a nuisance until ob gets real adoption. We can handle that with proper docs. OTOH, it's an optional field, so it does not really do harm. @imallona how do you feel about this?
But then the
@btraven00 re: the We could document in the ob changelog which ob versions are incompatible with which
I've been checking this PR and I've realized I cannot really compare it to the old code to try to figure out possible bugs/design problems, given that logic, implementation and tests are all coupled. So I cannot easily run old tests against this version, nor new tests against the old version, to compare. It's a massive number of lines; I've only gone through ca. 3000.
I understand and respect the reasons for a major rewrite in the context of starting a new thesis, though.
To be able to approve this PR I think it would work best to have an external integration (well, benchmarking) test suite I can run side by side against this and 0.3.x.
This offers advantages over switching to claude code for reviewing. Mainly, this suite could be used in the future for sanity checks. (I'm aware many of the changes lately and on this PR have been about making tests faster, but I've encountered uninstallable/nonfunctional releases with green tests, so.) I don't really mind if it takes three hours to run, nor do I envision it being triggered on push.
That test suite could, for instance, use multiple clustbench variants (in topology and execution modalities), including (but not restricted to): either micromamba or apptainer, either miniforge or micromamba, single core vs multicore, keep logs vs not, keep going vs not, mis-specifying input/output paths in many ways, adding a failing step (replacing a module's remote with a dummy module that does not conform to the output generation specs), adding some excludes, removing the metric collector, adding multiple metric collectors, testing on HPC vs not, etc.
hmm, I'll check, but I'd say that's a problem with clustering_example. Is it really using s3? Why does it need boto3 if the answer is no?
Overview
This PR contains an architectural refactor that removes external dependencies (omni-schema, networkx) and updates the omnibenchmark codebase with internal implementations and improved tooling.
Goals
- Remove the omni-schema and networkx dependencies (Closes: consider alternatives to linkml-runtime dependency #63, #..)
Major Changes
Core Architecture
- omnibenchmark.model module with pydantic validation
- omnibenchmark.dag replaces the networkx dependency
- omnibenchmark.versioning for central version handling
Build System & Tooling
💥 Breaking Changes
YAML Benchmark Parsing: Minimal changes to benchmark definition parsing due to internal validation system. Existing benchmark files should work with minimal or no modifications.
Architectural Justification
Problematic I/O Patterns: The previous architecture had I/O modules directly manipulating YAML files for versioning purposes, creating:
The new architecture addresses these issues with:
Testing
Commit Breakdown
This PR is organized into logical, reviewable commits:
🔍 Review Strategy
Suggested Review Order:
- omnibenchmark/model/ and omnibenchmark/dag/ as core changes
🚀 Migration Guide
For users upgrading:
✅ Checklist
🤝 Reviewers
Please pay special attention to:
- omnibenchmark/model/validation.py
- docs/templates/benchmark_template.yaml
- omnibenchmark/dag/simple_dag.py
- omnibenchmark/benchmark.py. Further splitting is needed here, but I tried to keep the changes "minimal" (yes, really). The BenchmarkExecution model keeps the same interface but is now split into "pure" and "impure" parts (abstract model vs. computational environment).
Note: This refactor maintains functional compatibility while significantly improving code maintainability and type safety and reducing external dependencies. The decomposed commit structure should make reviewing manageable despite the large scope.