Conversation


@bdvllrs bdvllrs commented Jun 27, 2025

Description

Introduces input-level pre-processors and output-level post-processors.

What problem does this change solve?

Currently, the pre-processors are created by the runner and applied indiscriminately to all input sources.

This is problematic in the case of a cutout, where we might want different pre-processors for the different sources (e.g. use `remove_nans` for the local source and nothing for the global source).

Similarly for post-processors when using the tee output.
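Concretely, the motivating cutout case could look something like the sketch below. The key names here are purely illustrative and not necessarily the exact schema introduced by this PR:

```yaml
# Illustrative only: hypothetical key names, not the actual schema
input:
  cutout:
    lam:
      grib: lam.grib
      pre_processors:
        - remove_nans          # NaN handling only for the local source
    global:
      grib: global.grib        # global source: no pre-processors
```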

What issue or task does this change relate to?

Fixes #251.

Additional notes

As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/


📚 Documentation preview 📚: https://anemoi-inference--260.org.readthedocs.build/en/260/

@github-actions github-actions bot added the documentation, config, and enhancement labels and removed the documentation and config labels Jun 27, 2025

@HCookie HCookie left a comment


Thanks for the PR. It matches the overall OOD.
However, I think I disagree with applying the top-level pre_processors in each input. Consider the cutout input: it makes more sense to apply the top-level processors to the combined input state rather than to each sub-input. Potentially the output should then follow the same steps: apply the top level, then delegate to the sub-outputs.
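The layering suggested above can be sketched in plain Python. The names here are hypothetical, not the actual anemoi-inference API: each sub-input runs its own pre-processors, the sub-states are combined (e.g. a cutout merge), and only then do the top-level processors see the merged state.

```python
# Hypothetical sketch of the proposed ordering (illustrative names,
# not the real anemoi-inference API).
def create_input_state(sub_inputs, top_level_processors, combine):
    states = []
    for fetch, processors in sub_inputs:
        state = fetch()                      # raw state from this source
        for processor in processors:        # per-input pre-processors
            state = processor(state)
        states.append(state)
    combined = combine(states)              # merge sub-states into one
    for processor in top_level_processors:  # top level, on the merged state
        combined = processor(combined)
    return combined
```

In the real code the processors are objects with a `process` method rather than bare callables; callables are used here only to keep the sketch short.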


HCookie commented Jul 1, 2025

Additionally, due to a just-merged PR that touched the outputs, there are a number of merge conflicts; let me know if you need a hand resolving them.

Comment on lines 99 to 119

```python
# pre_processors = self.pre_processors
post_processors = self.post_processors

input_state = input.create_input_state(date=self.config.date)

# This hook is needed for the coupled runner
self.input_state_hook(input_state)

state = Output.reduce(input_state)
for processor in post_processors:
    state = processor.process(state)

output.open(state)
output.write_initial_state(state)

for state in self.run(input_state=input_state, lead_time=lead_time):
    for processor in post_processors:
        LOG.info("Post processor: %s", processor)
        state = processor.process(state)
```

I think I agree with Harrison: keep the top-level processors applied here, and the inner-level processors in the input/output. It does make the code more fragmented, but this way the top-level processor is guaranteed to be applied to the model state, and the two levels are fully independent.

To me it makes sense that top-level processors are tied to the runner and inner-level processors are tied to the input/output object.

@bdvllrs bdvllrs Jul 7, 2025


I agree that in practice this makes more sense; however, input pre-processors are applied on `ekd.FieldList` and not `State`, so they can't easily be applied from outside the inputs. Indeed, `state["fields"]` is not a `FieldList` but a dict of numpy arrays.

Note that pre-processors were already applied in the ekd input (and only there) before (I assume for this reason).

There is, however, no problem applying the post-processors in `execute`, as they already expect a `State`.
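For illustration, here is a minimal sketch of a post-processor operating on a `State` as described above, i.e. a dict whose `"fields"` entry maps variable names to numpy arrays. The `remove_nans` semantics shown here are assumed for the sake of the example, not taken from the anemoi codebase.

```python
import numpy as np

def remove_nans(state):
    """Illustrative post-processor: replace NaNs in every field with 0.0.

    Assumes the State layout described above: state["fields"] is a
    dict mapping variable names to numpy arrays (not a FieldList).
    """
    fields = {
        name: np.nan_to_num(values, nan=0.0)
        for name, values in state["fields"].items()
    }
    return {**state, "fields": fields}

state = {"date": "2025-06-27", "fields": {"2t": np.array([280.0, np.nan, 275.5])}}
processed = remove_nans(state)
```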

@bdvllrs bdvllrs force-pushed the feat/per-input-processors branch from 153a795 to 3c89ee9 Compare July 7, 2025 09:13
@github-actions github-actions bot added the documentation and config labels Jul 7, 2025

bdvllrs commented Jul 7, 2025

As explained in response to @gmertes' comment (#260 (comment)), this reasoning would be ideal. However, because of the current structure, it is complicated for pre-processors to be applied in the runner. Maybe we can work on a refactoring of the pre-processors in a future PR?

@bdvllrs bdvllrs requested review from HCookie and gmertes July 7, 2025 10:03
@gmertes gmertes left a comment


Ah yes, sorry, I forgot that that is indeed the reason why pre-processors are applied in the input. I agree to keep it like this; we could consider refactoring it in another PR, but it's also more practical to do pre-processing on the `FieldList` instead of the `State`/numpy.

@bdvllrs bdvllrs force-pushed the feat/per-input-processors branch from d4319e8 to e9a7613 Compare July 7, 2025 14:52
@bdvllrs bdvllrs force-pushed the feat/per-input-processors branch from e9a7613 to 5e97e5e Compare July 7, 2025 14:55
@bdvllrs bdvllrs requested a review from gmertes July 7, 2025 15:03
@gmertes gmertes left a comment


Awesome work, many thanks!

@gmertes gmertes merged commit 59664cb into main Jul 9, 2025
72 checks passed
@gmertes gmertes deleted the feat/per-input-processors branch July 9, 2025 09:17
gmertes pushed a commit that referenced this pull request Aug 4, 2025
🤖 Automated Release PR

This PR was created by `release-please` to prepare the next release.
Once merged:

1. A new version tag will be created
2. A GitHub release will be published
3. The changelog will be updated

Changes to be included in the next release:
---


## [0.7.0](0.6.3...0.7.0) (2025-08-04)
This release brings a change to the default accumulation behaviour.
Prior to this release, accumulated fields were accumulated from the
beginning of the forecast. Now, the default is to write accumulated
fields unchanged as output by the model.

Users who wish to keep the old behaviour and accumulate fields from the beginning of the forecast need to add the `accumulate_from_start_of_forecast` post-processor to the config, like so:

```yaml
post_processors:
    - accumulate_from_start_of_forecast
```

### ⚠ BREAKING CHANGES

* Stop accumulating from start of forecast by default
([#265](#265))
* Drop python 3.9 support

### Features

* Add logging control to mars
([#268](#268))
([e95f184](e95f184))
* Add Zarr Output
([#275](#275))
([6c04b44](6c04b44))
* Allow for `Runner.run` to return torch
([#263](#263))
([77330f7](77330f7))
* Extend GribOutput class to write to FileLike Objects
([#269](#269))
([b9770e2](b9770e2))
* Inner-level processors
([#260](#260))
([59664cb](59664cb))
* Move anemoi-inference metadata command to anemoi-utils
([#257](#257))
([d735be5](d735be5))
* Option to pass extra kwargs to `predict_step`
([#283](#283))
([1d9eb02](1d9eb02))
* **outputs:** Extend tee to enable postprocessors
([#294](#294))
([2684293](2684293))
* **post-processors:** Add `assign_mask` post-processor
([#287](#287))
([0313909](0313909))
* **post-processors:** Extraction post-processors
([#285](#285))
([7205af1](7205af1))
* Remove python 3.9 from pyproject.toml
([#290](#290))
([0adbddd](0adbddd))
* Stop accumulating from start of forecast by default
([#265](#265))
([21826fb](21826fb))
* Temporal interpolation runner
([#227](#227))
([74048d9](74048d9))
* **waves:** Add ability to update `typed_variables` from config
([#202](#202))
([c02c45a](c02c45a))


### Bug Fixes

* Add area to template lookup dictionary
([#284](#284))
([0c5c812](0c5c812))
* Allow input preprocessors to patch data request
([#286](#286))
([833cb6f](833cb6f))
* Be less helpful
([#295](#295))
([a134f78](a134f78))
* Checkpoint patching
([#203](#203))
([77b90c0](77b90c0))
* **grib:** Ocean grib encoding
([#282](#282))
([b6afaac](b6afaac))
* **plot output:** Cast numpy values to float32
([#288](#288))
([3cc6915](3cc6915)),
closes [#276](#276)
* Provenance git dict reference issue
([#259](#259))
([2d70411](2d70411))
* Tensor not detached in debug mode
([#279](#279))
([d9efac5](d9efac5))
* Use data frequency in interpolator inference to be consistent with
training ([#266](#266))
([feac2a4](feac2a4))

---
> [!IMPORTANT]
> Please do not change the PR title, manifest file, or any other
automatically generated content in this PR unless you understand the
implications. Changes here can break the release process.
> ⚠️ Merging this PR will:
> - Create a new release
> - Trigger deployment pipelines
> - Update package versions

 **Before merging:**
 - Ensure all tests pass
 - Review the changelog carefully
 - Get required approvals

[Release-please
documentation](https://github.com/googleapis/release-please)

Development

Successfully merging this pull request may close these issues.

Different pre-processors for different input sources in cutout
