Conversation


@bdvllrs bdvllrs commented Jun 27, 2025

Description

Introduces input-level pre-processors and output-level post-processors.

What problem does this change solve?

Currently, the pre-processors are created by the runner and applied indiscriminately to all input sources.

This is problematic in the case of a cutout, where we might want different pre-processors for the different sources (e.g. use `remove_nans` for the local source and nothing for the global source).

Similarly for post-processors when using the tee output.
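Concretely, the motivating cutout case could look something like the sketch below. The key names here are purely illustrative and not necessarily the exact schema introduced by this PR:

```yaml
# Illustrative only: hypothetical key names, not the actual schema
input:
  cutout:
    lam:
      grib: lam.grib
      pre_processors:
        - remove_nans          # NaN handling only for the local source
    global:
      grib: global.grib        # global source: no pre-processors
```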

What issue or task does this change relate to?

Fixes #251.

Additional notes

As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/


📚 Documentation preview 📚: https://anemoi-inference--260.org.readthedocs.build/en/260/

@github-actions github-actions bot added the documentation, config, and enhancement labels and removed the documentation and config labels Jun 27, 2025

@HCookie HCookie left a comment


Thanks for the PR. It matches the overall OOD.
However, I think I disagree with applying the top-level pre_processors in each input. Consider the cutout input: it makes more sense to apply the top-level processors to the combined input state rather than to each sub-input. Potentially the output should then follow the same steps: apply the top level, then delegate to the sub-outputs.
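The layering suggested above can be sketched in plain Python. The names here are hypothetical, not the actual anemoi-inference API: each sub-input runs its own pre-processors, the sub-states are combined (e.g. a cutout merge), and only then do the top-level processors see the merged state.

```python
# Hypothetical sketch of the proposed ordering (illustrative names,
# not the real anemoi-inference API).
def create_input_state(sub_inputs, top_level_processors, combine):
    states = []
    for fetch, processors in sub_inputs:
        state = fetch()                      # raw state from this source
        for processor in processors:        # per-input pre-processors
            state = processor(state)
        states.append(state)
    combined = combine(states)              # merge sub-states into one
    for processor in top_level_processors:  # top level, on the merged state
        combined = processor(combined)
    return combined
```

In the real code the processors are objects with a `process` method rather than bare callables; callables are used here only to keep the sketch short.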


HCookie commented Jul 1, 2025

Additionally, due to a just-merged PR that touched the outputs, there are a number of merge conflicts; let me know if you need a hand resolving them.

Comment on lines 99 to 119

```python
# pre_processors = self.pre_processors
post_processors = self.post_processors

input_state = input.create_input_state(date=self.config.date)

# This hook is needed for the coupled runner
self.input_state_hook(input_state)

state = Output.reduce(input_state)
for processor in post_processors:
    state = processor.process(state)

output.open(state)
output.write_initial_state(state)

for state in self.run(input_state=input_state, lead_time=lead_time):
    for processor in post_processors:
        LOG.info("Post processor: %s", processor)
        state = processor.process(state)
```

I think I agree with Harrison: keep the top-level processors applied here, and the inner-level processors in the input/output. It does make the code more fragmented, but this way the top-level processor is guaranteed to be applied to the model state, and the two levels are fully independent.

To me it makes sense that top-level processors are tied to the runner and inner-level processors are tied to the input/output object.

@bdvllrs bdvllrs Jul 7, 2025


I agree that in practice this makes more sense; however, input pre-processors are applied on `ekd.FieldList` and not `State`, so they can't easily be applied from outside the inputs. Indeed, `state["fields"]` is not a `FieldList` but a dict of numpy arrays.

Note that pre-processors were already applied in the ekd input (and only there) before (I assume for this reason).

There is, however, no problem applying the post-processors in `execute`, as they already expect a `State`.
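For illustration, here is a minimal sketch of a post-processor operating on a `State` as described above, i.e. a dict whose `"fields"` entry maps variable names to numpy arrays. The `remove_nans` semantics shown here are assumed for the sake of the example, not taken from the anemoi codebase.

```python
import numpy as np

def remove_nans(state):
    """Illustrative post-processor: replace NaNs in every field with 0.0.

    Assumes the State layout described above: state["fields"] is a
    dict mapping variable names to numpy arrays (not a FieldList).
    """
    fields = {
        name: np.nan_to_num(values, nan=0.0)
        for name, values in state["fields"].items()
    }
    return {**state, "fields": fields}

state = {"date": "2025-06-27", "fields": {"2t": np.array([280.0, np.nan, 275.5])}}
processed = remove_nans(state)
```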

@bdvllrs bdvllrs force-pushed the feat/per-input-processors branch from 153a795 to 3c89ee9 Compare July 7, 2025 09:13
@github-actions github-actions bot added the documentation and config labels Jul 7, 2025

bdvllrs commented Jul 7, 2025

As explained in response to @gmertes' comment (#260 (comment)), this reasoning would be ideal. However, because of the current structure, it is complicated for pre-processors to be applied in the runner. Maybe we can work on a refactoring of the pre-processors in a future PR?

@bdvllrs bdvllrs requested review from HCookie and gmertes July 7, 2025 10:03
@gmertes gmertes left a comment


Ah yes, sorry, I forgot that that is indeed the reason why pre-processors are applied in the input. I agree to keep it like this; we could consider refactoring it in another PR, but it's also more practical to do pre-processing on the `FieldList` instead of the `State`/numpy.

@bdvllrs bdvllrs force-pushed the feat/per-input-processors branch from d4319e8 to e9a7613 Compare July 7, 2025 14:52
@bdvllrs bdvllrs force-pushed the feat/per-input-processors branch from e9a7613 to 5e97e5e Compare July 7, 2025 14:55
@bdvllrs bdvllrs requested a review from gmertes July 7, 2025 15:03
@gmertes gmertes left a comment


Awesome work, many thanks!

@gmertes gmertes merged commit 59664cb into main Jul 9, 2025
72 checks passed
@gmertes gmertes deleted the feat/per-input-processors branch July 9, 2025 09:17
gmertes pushed a commit that referenced this pull request Aug 4, 2025
🤖 Automated Release PR

This PR was created by `release-please` to prepare the next release.
Once merged:

1. A new version tag will be created
2. A GitHub release will be published
3. The changelog will be updated

Changes to be included in the next release:
---


## [0.7.0](0.6.3...0.7.0) (2025-08-04)
This release brings a change to the default accumulation behaviour.
Prior to this release, accumulated fields were accumulated from the
beginning of the forecast. Now, the default is to write accumulated
fields unchanged as output by the model.

Users who wish to keep the old behaviour and accumulate fields from the beginning of the forecast need to add the `accumulate_from_start_of_forecast` post-processor to the config, like so:

```yaml
post_processors:
    - accumulate_from_start_of_forecast
```

### ⚠ BREAKING CHANGES

* Stop accumulating from start of forecast by default
([#265](#265))
* Drop python 3.9 support

### Features

* Add logging control to mars
([#268](#268))
([e95f184](e95f184))
* Add Zarr Output
([#275](#275))
([6c04b44](6c04b44))
* Allow for `Runner.run` to return torch
([#263](#263))
([77330f7](77330f7))
* Extend GribOutput class to write to FileLike Objects
([#269](#269))
([b9770e2](b9770e2))
* Inner-level processors
([#260](#260))
([59664cb](59664cb))
* Move anemoi-inference metadata command to anemoi-utils
([#257](#257))
([d735be5](d735be5))
* Option to pass extra kwargs to `predict_step`
([#283](#283))
([1d9eb02](1d9eb02))
* **outputs:** Extend tee to enable postprocessors
([#294](#294))
([2684293](2684293))
* **post-processors:** Add `assign_mask` post-processor
([#287](#287))
([0313909](0313909))
* **post-processors:** Extraction post-processors
([#285](#285))
([7205af1](7205af1))
* Remove python 3.9 from pyproject.toml
([#290](#290))
([0adbddd](0adbddd))
* Stop accumulating from start of forecast by default
([#265](#265))
([21826fb](21826fb))
* Temporal interpolation runner
([#227](#227))
([74048d9](74048d9))
* **waves:** Add ability to update `typed_variables` from config
([#202](#202))
([c02c45a](c02c45a))


### Bug Fixes

* Add area to template lookup dictionary
([#284](#284))
([0c5c812](0c5c812))
* Allow input preprocessors to patch data request
([#286](#286))
([833cb6f](833cb6f))
* Be less helpful
([#295](#295))
([a134f78](a134f78))
* Checkpoint patching
([#203](#203))
([77b90c0](77b90c0))
* **grib:** Ocean grib encoding
([#282](#282))
([b6afaac](b6afaac))
* **plot output:** Cast numpy values to float32
([#288](#288))
([3cc6915](3cc6915)),
closes [#276](#276)
* Provenance git dict reference issue
([#259](#259))
([2d70411](2d70411))
* Tensor not detached in debug mode
([#279](#279))
([d9efac5](d9efac5))
* Use data frequency in interpolator inference to be consistent with
training ([#266](#266))
([feac2a4](feac2a4))

---
> [!IMPORTANT]
> Please do not change the PR title, manifest file, or any other
automatically generated content in this PR unless you understand the
implications. Changes here can break the release process.
> ⚠️ Merging this PR will:
> - Create a new release
> - Trigger deployment pipelines
> - Update package versions

 **Before merging:**
 - Ensure all tests pass
 - Review the changelog carefully
 - Get required approvals

[Release-please
documentation](https://github.com/googleapis/release-please)

Development

Successfully merging this pull request may close these issues.

Different pre-processors for different input sources in cutout
