Multi-location inference per orchestrator pod#473
Conversation
…tion format Introduces the new config parsing layer that expects: - orchestrator_config: shared settings (model, stream type, azure, etc.) - location_config: list of per-location entries with hls_hydrophone_id and optional model_config_overrides load_model() no longer applies overrides at load time — per-location overrides are resolved at inference time via resolve_location_model_config(). Part of #469. https://claude.ai/code/session_01Dio4ds5gm21JjhpjiWiW8y
… support - run_live_hls(): processes multiple locations sequentially within each time-aligned iteration. Pre-builds per-location HLS clients and model configs. Each location is wrapped in try/except for error isolation. - run_date_range(): extracts existing DateRangeHLS logic unchanged (single-location only). - _process_segment() now takes explicit hls_hydrophone_id and log_prefix params for per-location observability (e.g. "[rpi_orcasound_lab] ..."). - __main__ updated to use parse_config() and dispatch to the appropriate run function based on hls_stream_type. Part of #469. https://claude.ai/code/session_01Dio4ds5gm21JjhpjiWiW8y
… format - All test orch_configs (DateRangeHLS, LiveHLS, Fail, Negative, Positive, LocalDebug) converted to nest shared settings under orchestrator_config and per-location settings under location_config list. - New multi-location LiveHLS test config: LiveHLS_MultiLocation.yml (orcasound_lab + bush_point). - New consolidated deploy configmap: inference-pod-0-configmap.yaml with all 8 locations in a single config (replaces per-location configmaps). Part of #469. https://claude.ai/code/session_01Dio4ds5gm21JjhpjiWiW8y
Adds test_livehls_multilocation_smoke() that runs the orchestrator with LiveHLS_MultiLocation.yml (2 locations: orcasound_lab + bush_point) and verifies both location prefixes appear in log output. Part of #469. https://claude.ai/code/session_01Dio4ds5gm21JjhpjiWiW8y
resolve_location_model_config() now applies two layers of overrides: 1. orchestrator_config.model_config_overrides (shared defaults) 2. location_config[].model_config_overrides (per-location, takes precedence) This simplifies configs — common values like max_batch_size and pred_global_threshold go in orchestrator_config, with only location-specific deviations in location_config (e.g. north_sjc threshold=0.95). Updated deploy configmap and test configs accordingly. Part of #469. https://claude.ai/code/session_01Dio4ds5gm21JjhpjiWiW8y
There was a problem hiding this comment.
Pull request overview
Refactors the Python InferenceSystem live orchestrator to process multiple hydrophone locations sequentially within a single pod/loop iteration, aiming to reduce idle time and per-location memory overhead (fixes #469).
Changes:
- Introduces a new YAML config schema: shared
orchestrator_config+location_configlist, with optional per-locationmodel_config_overrides. - Splits the orchestrator execution into
run_live_hls()(multi-location) andrun_date_range()(single-location), and adds per-location log prefixes. - Updates test orchestrator configs to the new schema and adds a multi-location LiveHLS smoke test; adds a new consolidated Kubernetes ConfigMap.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| InferenceSystem/src/LiveInferenceOrchestrator.py | Adds new config parsing + multi-location LiveHLS loop and per-location override resolution. |
| InferenceSystem/tests/test_orchestrator.py | Adds a multi-location LiveHLS smoke test asserting per-location log prefixes. |
| InferenceSystem/tests/orch_configs/Positive/DateRangeHLS_SunsetBay.yml | Migrates DateRange config to new orchestrator_config/location_config format. |
| InferenceSystem/tests/orch_configs/Positive/DateRangeHLS_PortTownsend.yml | Migrates DateRange config to new format. |
| InferenceSystem/tests/orch_configs/Positive/DateRangeHLS_OrcasoundLab.yml | Migrates DateRange config to new format. |
| InferenceSystem/tests/orch_configs/Positive/DateRangeHLS_NorthSJC.yml | Migrates DateRange config to new format. |
| InferenceSystem/tests/orch_configs/Positive/DateRangeHLS_MastCenter.yml | Migrates DateRange config to new format. |
| InferenceSystem/tests/orch_configs/Positive/DateRangeHLS_BushPoint.yml | Migrates DateRange config to new format. |
| InferenceSystem/tests/orch_configs/Positive/DateRangeHLS_AndrewsBay.yml | Migrates DateRange config to new format. |
| InferenceSystem/tests/orch_configs/Negative/DateRangeHLS_PointRobinson.yml | Migrates DateRange config to new format. |
| InferenceSystem/tests/orch_configs/LocalDebug/DateRangeHLS_SunsetBay_AzureUploadTest.yml | Migrates local debug DateRange config to new format. |
| InferenceSystem/tests/orch_configs/LiveHLS/LiveHLS_OrcasoundLab.yml | Migrates LiveHLS config to new format (single location). |
| InferenceSystem/tests/orch_configs/LiveHLS/LiveHLS_MultiLocation.yml | Adds a new multi-location LiveHLS test config with per-location override. |
| InferenceSystem/tests/orch_configs/Fail/DateRangeHLS_NoAudio2.yml | Migrates fail-case DateRange config to new format. |
| InferenceSystem/tests/orch_configs/Fail/DateRangeHLS_NoAudio.yml | Migrates fail-case DateRange config to new format. |
| InferenceSystem/tests/orch_configs/Fail/DateRangeHLS_IncompleteMinute.yml | Migrates fail-case DateRange config to new format. |
| InferenceSystem/deploy/inference-pod-0-configmap.yaml | Adds a consolidated ConfigMap containing multiple locations under the new schema. |
|
|
||
|
|
||
| def test_livehls_multilocation_smoke(): | ||
| """Smoke test: multi-location LiveHLS processes both locations per iteration.""" |
There was a problem hiding this comment.
Docstring says the multi-location smoke test processes “both locations”, but the test config asserts three location prefixes (orcasound_lab, bush_point, north_sjc). Update the docstring to match the actual expectation to avoid confusion when maintaining the test.
| """Smoke test: multi-location LiveHLS processes both locations per iteration.""" | |
| """Smoke test: multi-location LiveHLS processes all configured locations (3) per iteration.""" |
| orch_config = raw_config["orchestrator_config"] | ||
| locations = raw_config["location_config"] | ||
| for loc in locations: |
There was a problem hiding this comment.
parse_config() unconditionally indexes raw_config["orchestrator_config"] / ["location_config"], so any existing single-location configs (e.g., the current Kubernetes hydrophone-configs ConfigMaps under InferenceSystem/deploy/*-configmap.yaml) will raise a KeyError and crash the pod. Consider either (a) updating the existing deploy manifests/ConfigMaps to the new nested format, or (b) making parse_config backward-compatible (detect old flat format and wrap into a single-entry location_config), and in all cases raise a clear ValueError when required keys are missing.
| max_live_iterations=args.max_live_iterations, | ||
| ) | ||
|
|
||
| elif hls_stream_type == "DateRangeHLS": |
There was a problem hiding this comment.
In the DateRangeHLS path, the code silently picks locations[0] and ignores any additional entries in location_config. It would be safer to validate that exactly one location is provided for DateRangeHLS (and raise a helpful error if not), or explicitly iterate all locations if multi-location DateRangeHLS is intended.
| elif hls_stream_type == "DateRangeHLS": | |
| elif hls_stream_type == "DateRangeHLS": | |
| # DateRangeHLS currently supports exactly one location; validate configuration | |
| if len(locations) != 1: | |
| location_ids = [loc.get("location_id") for loc in locations] | |
| raise ValueError( | |
| f"DateRangeHLS requires exactly one location, but got " | |
| f"{len(locations)} (location_ids={location_ids}). " | |
| "Please update your orchestrator configuration." | |
| ) |
| def parse_config(raw_config): | ||
| """Parse YAML config into (orch_config, locations). | ||
|
|
||
| Expects the new multi-location format:: | ||
|
|
||
| orchestrator_config: | ||
| model_id: ... | ||
| model_config_path: ... | ||
| ... | ||
| location_config: | ||
| - location_id: "rpi_orcasound_lab" | ||
| hls_hydrophone_id: "rpi_orcasound_lab" | ||
| model_config_overrides: { ... } | ||
|
|
||
| Returns | ||
| ------- | ||
| orch_config : dict | ||
| Shared orchestrator settings (from ``orchestrator_config`` key). | ||
| locations : list[dict] | ||
| Per-location dicts. Each has at least ``location_id`` and | ||
| ``hls_hydrophone_id``, plus optional ``model_config_overrides``. | ||
| """ | ||
| orch_config = raw_config["orchestrator_config"] | ||
| locations = raw_config["location_config"] | ||
| for loc in locations: | ||
| if "hls_hydrophone_id" not in loc: | ||
| loc["hls_hydrophone_id"] = loc["location_id"] | ||
| return orch_config, locations |
There was a problem hiding this comment.
The new config parsing/override behavior (parse_config defaulting of hls_hydrophone_id + resolve_location_model_config precedence) isn’t directly unit-tested. Adding focused tests for: old-vs-new config format handling, hls_hydrophone_id defaulting, and override precedence/isolation would make this refactor safer and help catch future config regressions.
Summary
This PR fixes #469. Allows processing >4 locations per VM node without unnecessary memory overhead.
LiveInferenceOrchestrator.pyto support multiple hydrophone locations per container, reducing idle time from ~80% per loop iterationorchestrator_config+location_configlist with per-locationhls_hydrophone_idand optionalmodel_config_overridesdetect_srkw_from_file()run_loop()intorun_date_range()(single-location) andrun_live_hls()(multi-location with sequential processing per iteration)[location_id]prefixes in logs for observability withtry/excepterror isolation so one failing location doesn't block othersinference-pod-0-configmap.yamlwith all 8 locations in a single configTest plan
test_orchestrator.pytests pass (DateRangeHLS positive/negative/fail cases)test_livehls_smokepasses with single-location LiveHLS configtest_livehls_multilocation_smokepasses — verifies both[rpi_orcasound_lab]and[rpi_bush_point]prefixes appear in outputhls_hydrophone_iddefaulting)