feat: inner-level processors #260
Conversation
Thanks for the PR. It matches the overall OOD.
However, I think I disagree with applying the top-level pre_processors in each input. Consider the cutout input: it makes more sense to apply the top-level processors to the combined input state rather than to each sub-input. Potentially the output should then follow the same steps: apply the top-level processors, then delegate to the sub-outputs (see the sketch below).
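A rough sketch of the ordering being suggested here; `sub_inputs`, `combine`, and the class shape are assumed for illustration, not the actual anemoi-inference API:

```python
class CutoutInput:
    """Hypothetical sketch only: names are assumed, not the actual API."""

    def __init__(self, sub_inputs, pre_processors):
        self.sub_inputs = sub_inputs          # e.g. [local_input, global_input]
        self.pre_processors = pre_processors  # the runner's top-level processors

    def create_input_state(self, date):
        # Each sub-input builds its own part of the state.
        states = [sub.create_input_state(date=date) for sub in self.sub_inputs]

        # Combine first (the cutout merge step)...
        combined = self.combine(states)

        # ...then apply the top-level pre-processors once, to the combined
        # state, rather than once per sub-input.
        for processor in self.pre_processors:
            combined = processor.process(combined)
        return combined

    def combine(self, states):
        # Placeholder for the actual cutout merge logic.
        raise NotImplementedError
```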
Additionally, due to a just-merged PR that touched the outputs, there are a number of merge conflicts; let me know if you need a hand resolving them.
The excerpt under review, from the runner's `execute` method:

```python
# pre_processors = self.pre_processors
post_processors = self.post_processors

input_state = input.create_input_state(date=self.config.date)

# This hook is needed for the coupled runner
self.input_state_hook(input_state)

state = Output.reduce(input_state)
for processor in post_processors:
    state = processor.process(state)

output.open(state)
output.write_initial_state(state)

for state in self.run(input_state=input_state, lead_time=lead_time):
    for processor in post_processors:
        LOG.info("Post processor: %s", processor)
        state = processor.process(state)
```
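Note that in this excerpt the top-level post-processors run twice: once on the reduced initial state before `output.write_initial_state`, and once on each state yielded by `self.run`.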
I think I agree with Harrison: keep the top-level processors applied here, and the inner-level processors in the input/output. It does make the code more fragmented, but this way the top-level processors are guaranteed to be applied to the model state, and the two levels are fully independent.
To me it makes sense that top-level processors are tied to the runner, and inner-level processors are tied to the input/output object.
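A minimal illustration of that split; the class shapes below are assumed for the sake of the sketch, not the actual anemoi-inference class definitions:

```python
# Hypothetical illustration of where each kind of processor would live.
class Runner:
    # Top-level processors: owned by the runner, guaranteed to run
    # on the full model state.
    pre_processors: list
    post_processors: list

class Input:
    # Inner-level pre-processors: scoped to one input
    # (e.g. a single source of a cutout).
    pre_processors: list

class Output:
    # Inner-level post-processors: scoped to one output
    # (e.g. a single branch of a tee).
    post_processors: list
```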
I agree that in practice this makes more sense. However, input pre-processors are applied on an ekd.FieldList, not on a State, so they can't easily be applied from outside the inputs. Indeed, state["fields"] is not a FieldList but a dict of numpy arrays.
Note that pre-processors were already applied in the ekd input (and only there) before, I assume for this reason.
There is, however, no problem applying the post-processors in execute, as they already expect a State.
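To illustrate the mismatch described above; the State layout here is assumed from the discussion, not copied from the codebase:

```python
import numpy as np

# Inside an input, pre-processors operate on an earthkit-data FieldList,
# e.g. (pseudo-calls, shown as comments):
#
#     fields = ekd.from_source("file", "input.grib")   # ekd.FieldList
#     fields = pre_processor.process(fields)           # still a FieldList
#
# By the time the runner sees the input state, the fields are plain
# numpy arrays, so a FieldList-based pre-processor can no longer run:
input_state = {
    "date": "2025-08-04",
    "fields": {
        "2t": np.zeros(40320),   # numpy array, not an ekd.Field
        "msl": np.zeros(40320),
    },
}

# Post-processors, by contrast, already expect a State, which is why
# applying them in `execute` (as in the diff above) is unproblematic.
```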
As explained in my response to @gmertes' comment (#260 (comment)), this reasoning would be ideal. However, because of the current structure, it is complicated to apply pre-processors in the runner. Maybe we can work on a refactoring of the pre-processors in a future PR?
gmertes left a comment
Ah yes, sorry, I forgot that that is indeed the reason why pre-processors are applied in the input. I agree with keeping it like this; we could consider refactoring it in another PR, but it's also more practical to do pre-processing on the FieldList instead of the State/numpy.
gmertes left a comment
Awesome work, many thanks!
🤖 Automated Release PR

This PR was created by `release-please` to prepare the next release. Once merged:

1. A new version tag will be created
2. A GitHub release will be published
3. The changelog will be updated

Changes to be included in the next release:

---

## [0.7.0](0.6.3...0.7.0) (2025-08-04)

This release brings a change to the default accumulation behaviour. Prior to this release, accumulated fields were accumulated from the beginning of the forecast. Now, the default is to write accumulated fields unchanged, as output by the model. Users who wish to keep the old behaviour and accumulate fields from the beginning of the forecast need to add the `accumulate_from_start_of_forecast` post-processor to the config, like so:

```yaml
post_processors:
  - accumulate_from_start_of_forecast
```

### ⚠ BREAKING CHANGES

* Stop accumulating from start of forecast by default ([#265](#265))
* Drop python 3.9 support

### Features

* Add logging control to mars ([#268](#268)) ([e95f184](e95f184))
* Add Zarr Output ([#275](#275)) ([6c04b44](6c04b44))
* Allow for `Runner.run` to return torch ([#263](#263)) ([77330f7](77330f7))
* Extend GribOutput class to write to FileLike Objects ([#269](#269)) ([b9770e2](b9770e2))
* Inner-level processors ([#260](#260)) ([59664cb](59664cb))
* Move anemoi-inference metadata command to anemoi-utils ([#257](#257)) ([d735be5](d735be5))
* Option to pass extra kwargs to `predict_step` ([#283](#283)) ([1d9eb02](1d9eb02))
* **outputs:** Extend tee to enable postprocessors ([#294](#294)) ([2684293](2684293))
* **post-processors:** Add `assign_mask` post-processor ([#287](#287)) ([0313909](0313909))
* **post-processors:** Extraction post-processors ([#285](#285)) ([7205af1](7205af1))
* Remove python 3.9 from pyproject.toml ([#290](#290)) ([0adbddd](0adbddd))
* Stop accumulating from start of forecast by default ([#265](#265)) ([21826fb](21826fb))
* Temporal interpolation runner ([#227](#227)) ([74048d9](74048d9))
* **waves:** Add ability to update `typed_variables` from config ([#202](#202)) ([c02c45a](c02c45a))

### Bug Fixes

* Add area to template lookup dictionary ([#284](#284)) ([0c5c812](0c5c812))
* Allow input preprocessors to patch data request ([#286](#286)) ([833cb6f](833cb6f))
* Be less helpful ([#295](#295)) ([a134f78](a134f78))
* Checkpoint patching ([#203](#203)) ([77b90c0](77b90c0))
* **grib:** Ocean grib encoding ([#282](#282)) ([b6afaac](b6afaac))
* **plot output:** Cast numpy values to float32 ([#288](#288)) ([3cc6915](3cc6915)), closes [#276](#276)
* Provenance git dict reference issue ([#259](#259)) ([2d70411](2d70411))
* Tensor not detached in debug mode ([#279](#279)) ([d9efac5](d9efac5))
* Use data frequency in interpolator inference to be consistent with training ([#266](#266)) ([feac2a4](feac2a4))

---

> [!IMPORTANT]
> Please do not change the PR title, manifest file, or any other automatically generated content in this PR unless you understand the implications. Changes here can break the release process.
>
> ⚠️ Merging this PR will:
> - Create a new release
> - Trigger deployment pipelines
> - Update package versions

**Before merging:**
- Ensure all tests pass
- Review the changelog carefully
- Get required approvals

[Release-please documentation](https://github.com/googleapis/release-please)
Description
Introduces input-level pre-processors and output-level post-processors.
What problem does this change solve?
Currently, the pre-processors are created by the runner and applied indiscriminately to all input sources.
This is problematic in the case of a cutout where we might want different pre-processors for the different sources (e.g. use remove_nans for the local source and nothing for the global source).
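A hypothetical config sketch of the per-input and per-output control this enables, covering both the cutout case above and the tee output mentioned next; the keys and structure are illustrative, not the final schema:

```yaml
# Illustrative only: key names and structure are assumed,
# not the final config schema.
input:
  cutout:
    lam:
      dataset: my-local-dataset
      pre_processors:
        - remove_nans          # only the local source gets this
    global:
      dataset: my-global-dataset

output:
  tee:
    - grib:
        path: forecast.grib
        post_processors:
          - accumulate_from_start_of_forecast
    - netcdf:
        path: forecast.nc
```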
Similarly for post-processors when using the tee output (see the config sketch above).
What issue or task does this change relate to?
Fixes #251.
Additional notes
As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines, please refer to https://anemoi.readthedocs.io/en/latest/
📚 Documentation preview 📚: https://anemoi-inference--260.org.readthedocs.build/en/260/