refactor: remove external dependencies and architectural revamp #170
base: dev
Conversation
- Add new omnibenchmark.model module with pydantic-based validation
- Remove external omni-schema dependency
- Remove old validation infrastructure
- Update project dependencies to remove omni-schema
  BREAKING CHANGE: YAML benchmark parsing now uses internal validation
- Add omnibenchmark.dag module with lightweight graph implementation
- Refactor benchmark internals to use custom graph (_graph.py, _dag_builder.py)
- Remove networkx dependency and old DAG implementation
- Restructure benchmark node handling (_node.py, _paths.py)
- Add visualization support (_dot.py, _mermaid.py)
- Update all I/O modules to work with new model system
- Remove omnibenchmark.sync module
- Update utilities module for compatibility
- Add comprehensive versioning module with git integration
- Include version management, git utilities, and exception handling
- Support for version validation and management workflows
- Update CLI commands to work with new model and DAG systems
- Adapt Snakemake workflows for new benchmark structure
- Update workflow scripts and formatters for compatibility
- Add pixi.toml and pixi.lock for modern Python dependency management
- Add pixi CI pipeline, disable old pipeline
- Add strict type checking configuration and scripts
- Update development environment and testing configuration
- Add tox.ini for standardized testing
- Add comprehensive test coverage for new model and versioning modules
- Refactor existing tests to work with new architecture
- Add test factories and fixtures for improved test maintainability
- Update all test data files for new validation system
- Add documentation templates and reference materials
- Update README, CHANGELOG, and contributing guidelines
Force-pushed 7e8a6b7 to c453048
hmm, is boto3 not installed by default? Running clustbench fails on this branch but runs on main. pip install from run_omnibenchmark (with $OB_BRANCH being this branch):
Really amazing work!! It eliminates a lot of technical debt that this project has accumulated. Thanks a lot for it!
I have just a couple of questions, suggestions and discussions to start.
- Regarding the explicit api_version field: I'm not a fan of adding this to benchmark YAML files, since it introduces unnecessary complexity and maintenance burden for users. Moreover, it might lead to confusion with the version field. I believe the schema structure itself should provide enough information to determine compatibility, and we can infer an "api_version" from it.
- Future work prioritization: Given that there are a lot of TODOs and this is just the start of a refactoring, can you provide a list of the future work required to address them, ordered by the priority that you see?
def _find_duplicates(items: List[str]) -> List[str]:
    """Find duplicate items in a list."""
    from collections import Counter

    counts = Counter(items)
    return [item for item, count in counts.items() if count > 1]


def _is_url(string: str) -> bool:
    """Check if the string is a valid URL using urlparse."""
    from urllib.parse import urlparse

    try:
        result = urlparse(string)
        return all([result.scheme, result.netloc])
    except ValueError:
        return False
_find_duplicates and _is_url are helper methods that I've seen declared in a couple of places already. Let's factor them out.
you're right, they're duplicated in model.validation and model.benchmark. TBD
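A minimal sketch of what the factoring-out could look like (the module path omnibenchmark/model/_helpers.py is hypothetical; the bodies simply mirror the snippet above):

```python
# omnibenchmark/model/_helpers.py (hypothetical location for the shared helpers)
from collections import Counter
from typing import List
from urllib.parse import urlparse


def find_duplicates(items: List[str]) -> List[str]:
    """Return the items that appear more than once in the list."""
    counts = Counter(items)
    return [item for item, count in counts.items() if count > 1]


def is_url(string: str) -> bool:
    """Check whether the string is a valid URL using urlparse."""
    try:
        result = urlparse(string)
        return all([result.scheme, result.netloc])
    except ValueError:
        return False
```

Both validation call sites could then import these instead of redeclaring them locally.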
def _validate_environment_path(
    self, env: SoftwareEnvironment, benchmark_dir: Path
) -> List[str]:
    """Validate software environment path based on backend.
    ARCHITECTURAL WARNING: This method performs system-specific validation
    that violates the principle of keeping models abstract and declarative.
    TODO: Move system-specific checks (file existence, envmodule availability)
    to BenchmarkExecution or a separate validation layer. The model should
    only validate data structure and logical consistency, not system state.
    """
    errors: List[str] = []
Why not then move the whole validation logic into the Validator superclass?
I suspect we want to leave just the syntactic validation at this level (the pydantic model for syntax), and have all the consistency checks (at the benchmark level), plus the module metadata and result validation, under a unifying validators module.
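To illustrate the idea (this is only a hypothetical sketch, not the planned API), such a unifying module could expose a common interface and aggregate errors on top of the pydantic layer:

```python
from typing import List, Protocol


class Validator(Protocol):
    """Anything that can inspect a benchmark object and report problems."""

    def validate(self, benchmark: object) -> List[str]:
        """Return a list of human-readable error messages (empty if OK)."""
        ...


def run_validators(benchmark: object, validators: List[Validator]) -> List[str]:
    """Run consistency checks after the syntactic (pydantic) layer has passed."""
    errors: List[str] = []
    for validator in validators:
        errors.extend(validator.validate(benchmark))
    return errors
```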
"""Pydantic models for Omnibenchmark.""" | ||
|
||
from omnibenchmark.model.benchmark import ( | ||
# Base classes | ||
IdentifiableEntity, | ||
DescribableEntity, | ||
# Enums | ||
APIVersion, | ||
SoftwareBackendEnum, | ||
RepositoryType, | ||
StorageAPIEnum, | ||
# Core models |
I would move the model package under benchmark, since it anyway contains models related to the benchmark.
as we discussed, my plan is to move some interfaces, exceptions etc from benchmark to core, to avoid circular dependencies between validators, model and "benchmark" (in essence, "benchmark" should be very minimal after further refactoring between modules)
class SimpleDAG:
    """A simple directed acyclic graph implementation."""

    def __init__(self) -> None:
        """Initialize an empty DAG."""
        self.nodes: Set[Any] = set()
        self._edges: Dict[Any, Set[Any]] = defaultdict(set)
        self.predecessors: Dict[Any, Set[Any]] = defaultdict(set)
        self.node_attrs: Dict[Any, Dict[str, Any]] = defaultdict(dict)
Really nicely structured and tested! This will cover all the use cases that we have so far. Thanks!
push:
  branches:
    - main
    - dev
wouldn't this run twice when PR-ing from dev to main?
I wonder whether there is any push, other than to main, that needs to be checked without a PR / review first.
pull_request:
push:
  branches:
    - main
also, would a workflow_dispatch: make sense?
wouldn't this run twice when PR-ing from dev to main?

run twice: yes, I think this is pretty standard, no? You want to test before (the PR discussion) and after you merge (because sometimes, like in this same PR, squashing or merging introduces errors, mostly due to bad resolution of conflicts, bad rebasing, etc.).
Re. merging from dev to main, same: we're interested in catching divergence when tagging on main and doing hotfixes that might not be reconciled properly with dev.
tldr; I think it does no harm
- feat(cli)!: --local argument has been renamed to --local-storage
- feat: Support passing of extra arguments from CLI -> Snakemake for `run` commands
- feat: add extra profiler to the snakemake execution (#151)
- refactor!: remove dependency on omni-schema and networkx
isn't there much more going on, e.g. the test framework simplification?
"benchmarker": "Your Name", | ||
"author": "Your Name", |
difference between benchmarker and author?
"version": "1.0.0", | ||
"url": "https://example.com", | ||
"email": "[email protected]", | ||
"path": "/path/to/file", |
path to what?
"container": "example/container:latest", | ||
"image": "example/image:latest", |
difference between these two? Same kind of comments apply to the rest of the example dict
elif prop_type == "integer":
    example_value = 1
elif prop_type == "number":
    example_value = 1.0
elif prop_type == "boolean":
    example_value = True
Most things (arguments, parameters) react well to a 1 value; would it make more sense to set the defaults to 0s, 0.0s and Falses instead, so we reduce the chances that they are meaningful?
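A minimal sketch of the suggested change, assuming the surrounding code maps JSON-schema property types to example values (the helper name and mapping are illustrative):

```python
# Hypothetical mapping illustrating the suggestion: use "inert" defaults
# (0, 0.0, False) so generated placeholder values are less likely to be
# mistaken for meaningful ones.
_EXAMPLE_DEFAULTS = {
    "integer": 0,
    "number": 0.0,
    "boolean": False,
}


def example_value_for(prop_type: str, fallback: str = "example_value"):
    """Return an inert placeholder for a JSON-schema property type."""
    return _EXAMPLE_DEFAULTS.get(prop_type, fallback)
```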
# required (string), Commit hash
commit: c0ffee4
unrelated to this update but conceptually related: does this ob digest alternatives such as:
- full-length commit hashes (Mark attempted to use full commits)
- branch names
- tags?
hmm, afaik we enforce commits and not refs like branch/tag; I understood that was intentional (per the omni-schema spec). Re. the long form (full-length hashes), I think we should be permissive, yes.
thanks. I think it would be good to have an (undocumented) branch-name spec for fast prototyping. But it defeats the whole repro purpose, so we should ask the team again.
hmm, maybe it's something that we can allow via validation rules (--strict).
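As a hedged illustration of that idea (the function name and exact policy are hypothetical), a permissive check could always accept abbreviated or full commit hashes, and tolerate other refs only when strict validation is off:

```python
import re

# Abbreviated (>= 7 chars) or full 40-character SHA-1 hashes.
_HEX_SHA = re.compile(r"^[0-9a-f]{7,40}$")


def is_acceptable_repo_ref(value: str, strict: bool = True) -> bool:
    """Return True for a commit-ish value; outside strict mode, also accept
    any plausible git ref name (branch or tag) for fast prototyping."""
    if _HEX_SHA.fullmatch(value.lower()):
        return True
    return not strict and bool(re.fullmatch(r"[\w][\w./-]*", value))
```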
apptainer: example_apptainer

# optional (string), Docker image
docker: example_docker
Is this meant so apptainer uses a docker image directly?
that, and accounting for our new plugin that does allow docker execution via podman/udocker
hmm, then perhaps we should get rid of the ORAS (protocol) constraints if they're still in place. If we use standard snmk plugins we can get images from anywhere, e.g. dockerhub and so on.
# optional (string), Environment module name
envmodule: example_envmodule

# optional (string), EasyBuild config path
full path or basename? meant to make sense of the robotspath and the easybuild config
# required (string), Unique identifier
id: example_id

# required (string), File path
are they paths or basenames?
class Benchmark:
    def __init__(self, benchmark_yaml: Path, out_dir: Path = Path("out")):
        # base path is always the location of the benchmark YAML file
this is a super-important point, because it has to do with the .snakemake folder and the like, beyond the output dir. Shall we document that? I'm thinking of users wanting to have their YAML git-tracked somewhere in their home, but the ob run happening on a scratch filesystem or similar. They might want to copy (symlink?) the YAML to the scratch and run ob there, right?
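For reference, a minimal sketch of the behaviour under discussion, assuming the constructor derives its working base from the YAML's parent directory (the class and attribute names here are illustrative, not the actual implementation):

```python
from pathlib import Path


class BenchmarkSketch:
    """Illustrative only; mirrors the constructor shown above."""

    def __init__(self, benchmark_yaml: Path, out_dir: Path = Path("out")):
        # The base path is the directory containing the benchmark YAML;
        # per the discussion above, this (rather than the current working
        # directory) is where relative paths and Snakemake state would land.
        self.base_dir = Path(benchmark_yaml).resolve().parent
        self.out_dir = out_dir
```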
""" | ||
# Count in-degrees | ||
in_degree: Dict[Any, int] = defaultdict(int) | ||
for node in self.nodes: |
hmm, self.nodes is a Set, so this iteration is not deterministic. Which is the point of this function, I know; it may feel like I'm making no sense, and perhaps it is nonsense, but: you're trying to sort the nodes. In order to sort the nodes, wouldn't it be safer to do so after pre-sorting them (not topologically) with sorted(self.nodes) or similar? That would avoid future poltergeists with complex DAGs with plenty of excludes, intermediate metric collectors, etc., where randomly accessing the (node) set might be adding unwanted extra freedom.
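A minimal sketch of the suggested pre-sorting, assuming the method implements Kahn's algorithm over the nodes/edges structures shown earlier (the standalone function and the key=str tie-break are illustrative, not the PR's implementation):

```python
from collections import defaultdict, deque
from typing import Any, Dict, List, Set


def deterministic_topological_sort(
    nodes: Set[Any], edges: Dict[Any, Set[Any]]
) -> List[Any]:
    """Kahn's algorithm with a sorted() pre-pass so ties break the same way
    on every run (key=str keeps it working for unorderable node types).
    Cycle detection is omitted for brevity."""
    ordered = sorted(nodes, key=str)
    in_degree: Dict[Any, int] = defaultdict(int)
    for node in ordered:
        for succ in sorted(edges.get(node, ()), key=str):
            in_degree[succ] += 1
    # Seed the queue with zero in-degree nodes in a stable order.
    queue = deque(n for n in ordered if in_degree[n] == 0)
    result: List[Any] = []
    while queue:
        node = queue.popleft()
        result.append(node)
        for succ in sorted(edges.get(node, ()), key=str):
            in_degree[succ] -= 1
            if in_degree[succ] == 0:
                queue.append(succ)
    return result
```

The pre-sorting only fixes the tie-breaking order; the topological constraints themselves are unchanged.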
the only purpose of this module is to drop the networkx dependency, which was adding significantly to startup time.

Re. "You're trying to sort the nodes": this PR does not introduce any new behavior, it just mimics the parts of the networkx API that we were already using.

that said, I think this discussion is linked to how we want to deal with the intermediate representation of the config file and the workflow-specific serialization of the blocks (snakemake or others).

in a sense, all the DAG parsing is redundant since snakemake will build its own. However, I would argue that we need simple DAG operations to have a canonical representation and to validate that the tree(s) are loop-free. Otherwise we're letting snakemake (or backend X) blow up errors directly to the user.
I don't think the snmk trick we're using (nesting results of B as a folder of its parent A) can give rise to loops. Perhaps this has to do with the mermaid/layout plotting only?
for module in all_modules.values():  # type: ignore
    if module.software_environment not in env_ids:  # type: ignore
        errors.append(
            f"Software environment with id '{module.software_environment}' is not defined."  # type: ignore
what does "not defined" mean? "unavailable" perhaps, or "unloadable"? Worth reporting the modulepath then, so the user has more info / debugging opportunities?
I think "unavailable" or "not loadable" speaks to the ability to load them (maybe because the path cannot be resolved). In this case, they're missing from the definition (or declaration, in the C sense). Would that work better?
Hmm, something like this?
Error: Missing software environment. Module X is not defined within the benchmarking YAML: it should be listed as part of the software_environments stanza within the benchmarking YAML header.
Generally, stanzas for envmodules are shaped as follows (e.g., for a module loadable with module load bwa/0.7.17):
software_environments:
  bwa:
    description: "a bwa module"
    envmodule: "bwa/0.7.17"
Or something like that?
Unless the error had to do with being unloadable. If so, the error might better report the trace of the failed module purge; module load command instead, as well as the modulepath?
if self.software_backend == SoftwareBackendEnum.conda:  # type: ignore
    if not env.conda:  # type: ignore
        errors.append(
            f"Conda backend requires conda configuration for environment '{env.id}'"  # type: ignore
I think this error is not very clear. Is it capturing that the "conda.yml" is not available at the given path? If so, the user would benefit from something like "Failed to find a conda YAML at ./path/to/file, as required to run module/metric collector X using conda".
no, it's just doing a consistency check.
Since we're here: I think there's a possible easy simplification of this whole logic, instead of so many if..else branches.
Instead of managing conda, apptainer, etc. as different fields in the environment, I think what we need is type-checking a single entity, environment:
available_environments = List[Environment]
(possibly qualified by checking paths for existence, but that can come up later in the validation process), where
any((environment.type == benchmark.backend for environment in declared_environments))
needs to be True for a given benchmark definition.
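A rough sketch of that idea (all names below are illustrative, not the actual model classes):

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class BackendType(str, Enum):
    conda = "conda"
    apptainer = "apptainer"
    docker = "docker"
    envmodule = "envmodule"


@dataclass
class Environment:
    id: str
    type: BackendType
    source: str  # conda YAML path, image reference, or module name


def backend_is_satisfied(backend: BackendType, declared: List[Environment]) -> bool:
    """True if at least one declared environment matches the selected backend;
    existence/loadability checks would happen later in the validation process."""
    return any(env.type == backend for env in declared)
```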
but how can we type-check a module name, which only has a meaning after sourcing a bash function (e.g., the module load X verb for X), or something that is just a file (a conda yaml, because snmk does generate the env itself)? Either we evaluate whether the YAML has the specification (so it's not empty), or whether it works (e.g., the module is loadable, the conda yml is well formed, the apptainer image is present or downloadable, etc.).
Or we could go for validating files, e.g. looking for the module lua file, the conda env yml, or the apptainer local/remote image?
elif self.software_backend == SoftwareBackendEnum.docker:  # type: ignore
    if not env.apptainer and not env.docker:  # type: ignore
        errors.append(
            f"Docker backend requires apptainer configuration for environment '{env.id}'"  # type: ignore
Again, this message is not very clear: is the image (local? ORAS?) unavailable, or is the YAML stanza missing? And what does it mean to discuss docker and apptainer separately?
(Meaning, I'd make the traces longer / more verbose, because these user errors are likely to happen during the first ob usage attempts, so we'd better be extra informative there.)
    return created_version


def update_benchmark_version_and_commit(
@btraven00 I'm reading the versioning code; do you have a design / thoughts document somewhere? It may have been shared already and I just can't find it... thanks!
Force-pushed c453048 to 4561d44
Welcome to Codecov 🎉 Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests. Thanks for integrating Codecov - We've got you covered ☂️
Force-pushed 147d686 to 4967dca
boto3 is not expected to be installed by default; it's not a core dependency. It was split out into the s3 extras set a few releases ago.
this is an important discussion. api_version is just a rename (for clarity) of yaml_spec_version. I personally suspect we don't really need this, since now we're shipping the model version together with the omnibenchmark version. I think we can drop it for now and do as you say, infer compatibility. It boils down to documenting breaking changes and trying to keep field compatibility with deprecation warnings for a few release cycles (as I'm doing with, e.g., api_version or the storage section). Post 1.0 I feel we should enforce MAJOR API versions, like, for instance, how Docker versioning works, but I agree that this is going to be more of a nuisance until ob gets real adoption. We can handle that with proper docs. OTOH, it's an optional field, so it does not really do harm. @imallona how do you feel about this?
But then the
@btraven00 re: the We could document in the ob changelog which ob versions are incompatible with which
I've been checking this PR and I've realized I cannot really compare it to the old code to try to figure out possible bugs/design problems, given that logic, implementation and tests are all coupled. So I cannot easily run old tests against this version, nor new tests against the old version, to compare. It's a massive number of lines; I've only gone through ca. 3000.
I understand and respect the reasons for a major rewrite in the context of starting a new thesis, though.
To be able to approve this PR I think it would work best to have an external integration (well, benchmarking) test suite I can run side by side against this and 0.3.x.
This offers advantages over switching to claude code for reviewing. Mainly, this suite could be used in the future for sanity checks. (I'm aware many of the changes lately and on this PR have been about making tests faster, but I've encountered uninstallable/nonfunctional releases with green tests, so.) I don't really mind if it takes three hours to run, nor do I envision it being triggered on push.
That test suite could, for instance, use multiple clustbench variants (in topology and execution modalities), including (but not restricted to): either micromamba or apptainer, either miniforge or micromamba, single core vs multicore, keep logs vs not, keep going vs not, mis-specifying input/output paths in many ways, adding a failing step (replacing a module's remote with a dummy module that does not conform to the output generation specs), adding some excludes, removing the metric collector, adding multiple metric collectors, testing on HPC vs not, etc.
hmm, I'll check, but I'd say that's a problem with clustering_example. Is it really using s3? Why does it need boto3 if the answer is no?
Overview
This PR contains an architectural refactor that removes external dependencies (omni-schema, networkx) and updates the omnibenchmark codebase with internal implementations and improved tooling.
Goals
- Remove the omni-schema and networkx dependencies (Closes: consider alternatives to linkml-runtime dependency #63, #..)
Major Changes
Core Architecture
- omnibenchmark.model module with pydantic validation
- omnibenchmark.dag replaces the networkx dependency
- omnibenchmark.versioning for central version handling
Build System & Tooling
💥 Breaking Changes
YAML Benchmark Parsing: Minimal changes to benchmark definition parsing due to internal validation system. Existing benchmark files should work with minimal or no modifications.
Architectural Justification
Problematic I/O Patterns: The previous architecture had I/O modules directly manipulating YAML files for versioning purposes, creating:
The new architecture addresses these issues with:
Testing
Commit Breakdown
This PR is organized into logical, reviewable commits:
🔍 Review Strategy
Suggested Review Order:
- omnibenchmark/model/ and omnibenchmark/dag/ as core changes
🚀 Migration Guide
For users upgrading:
✅ Checklist
🤝 Reviewers
Please pay special attention to:
- omnibenchmark/model/validation.py
- docs/templates/benchmark_template.yaml
- omnibenchmark/dag/simple_dag.py
- omnibenchmark/benchmark.py. Further splitting is needed here, but I tried to keep the changes "minimal" (yes, really). The BenchmarkExecution model keeps the same interface but is now split into "pure" and "impure" parts (abstract model vs. computational environment).
Note: This refactor maintains functional compatibility while significantly improving code maintainability and type safety and reducing external dependencies. The decomposed commit structure should make reviewing manageable despite the large scope.