Generic Multiprocessing Functions for Raster Processing #669
Great! All good for me on the generic implementation, following our discussion in GlacioHack/xdem#704 😉. I'll just add some line-by-line comments on small things we can adjust to make the naming more interchangeable with Dask, which will be especially useful once we start documenting things!
@rhugonnet Thanks for your feedback, I have made the small changes you mentioned in your review 😃
@rhugonnet I have made a few changes to adapt I have also created a
I see... I think you won't be able to overload here, you might have to In Xarray, they use Potentially, we could also mirror that structure.
And great idea for the I'm not sure if we need to have
@rhugonnet @adehecq @adebardo I have changed the implementation of the multiprocessing as discussed on 28/03/2025 :
Perfect, thanks! For the documentation page: Other small remarks:
On this last point, we could think of defining a
Thanks for the work to both of you :)
Resolves #670.
Context
To enable efficient raster processing on large datasets without exceeding memory limits, this PR introduces generic multiprocessing functions within geoutils. These functions allow users to apply any processing function to raster data using a tiling approach with overlap handling. The approach minimizes memory usage by processing tiles separately and writing results directly to disk.
Features
This PR introduces the following key functions:
- map_overlap_multiproc_save: This function divides the input raster into overlapping tiles, processes them in parallel using multiprocessing, and writes the processed results to an output raster file.
- map_multiproc_collect: This function splits an input raster into overlapping tiles, processes them in parallel, and returns the results as a list. It is intended for cases where func does not return a Raster, but instead returns arbitrary values (e.g., numerical statistics, feature extractions, etc.).
- apply_func_block: A helper function that loads a specific raster tile, applies the provided function, and removes padding to avoid edge effects.
- load_raster_tile: Loads a specific tile from a raster based on given bounding box coordinates.
- remove_tile_padding: Removes extra padding from tiles after processing to mitigate edge artifacts.
- MultiprocConfig: Configuration class for handling multiprocessing parameters.
These functions are designed to be generic and reusable for various raster-processing tasks.
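To make the tile, overlap, and padding-removal steps concrete, here is a minimal sketch of the idea in plain Python. It is not the geoutils implementation: every name in it (tile_windows, load_padded_tile, strip_padding, map_overlap, mean3x3) is hypothetical, a list of lists stands in for a raster on disk, and a sequential loop stands in for the pool of worker processes.

```python
from typing import Callable, List, Tuple

Grid = List[List[float]]
Window = Tuple[int, int, int, int]  # (row_start, row_stop, col_start, col_stop)


def tile_windows(n_rows: int, n_cols: int, chunk: int) -> List[Window]:
    """Split a (n_rows, n_cols) grid into non-overlapping windows of at most chunk x chunk."""
    return [
        (r, min(r + chunk, n_rows), c, min(c + chunk, n_cols))
        for r in range(0, n_rows, chunk)
        for c in range(0, n_cols, chunk)
    ]


def load_padded_tile(grid: Grid, win: Window, depth: int) -> Tuple[Grid, Window]:
    """Read a window plus `depth` extra cells per side, clipped at the grid edges."""
    r0, r1, c0, c1 = win
    pr0, pr1 = max(r0 - depth, 0), min(r1 + depth, len(grid))
    pc0, pc1 = max(c0 - depth, 0), min(c1 + depth, len(grid[0]))
    tile = [row[pc0:pc1] for row in grid[pr0:pr1]]
    return tile, (pr0, pr1, pc0, pc1)


def strip_padding(tile: Grid, win: Window, padded: Window) -> Grid:
    """Keep only the cells of the processed tile that belong to the unpadded window."""
    r0, r1, c0, c1 = win
    pr0, _, pc0, _ = padded
    return [row[c0 - pc0 : c1 - pc0] for row in tile[r0 - pr0 : r1 - pr0]]


def map_overlap(func: Callable[[Grid], Grid], grid: Grid, chunk: int, depth: int) -> Grid:
    """Apply `func` tile by tile with `depth` cells of overlap and reassemble the result."""
    out = [[0.0] * len(grid[0]) for _ in grid]
    for win in tile_windows(len(grid), len(grid[0]), chunk):
        tile, padded = load_padded_tile(grid, win, depth)
        core = strip_padding(func(tile), win, padded)
        r0, _, c0, c1 = win
        for i, row in enumerate(core):
            out[r0 + i][c0:c1] = row  # write the unpadded result back in place
    return out


def mean3x3(tile: Grid) -> Grid:
    """3x3 mean filter; neighbours outside the tile are simply ignored."""
    n_rows, n_cols = len(tile), len(tile[0])
    out = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            vals = [
                tile[a][b]
                for a in range(max(i - 1, 0), min(i + 2, n_rows))
                for b in range(max(j - 1, 0), min(j + 2, n_cols))
            ]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


grid = [[float(r * 7 + c) for c in range(7)] for r in range(5)]
whole = mean3x3(grid)                                    # reference: filter the full grid
tiled = map_overlap(mean3x3, grid, chunk=3, depth=1)     # overlap covers the 3x3 kernel
no_overlap = map_overlap(mean3x3, grid, chunk=3, depth=0)  # seams see truncated neighbourhoods
```

With an overlap of one cell, which matches the reach of the 3x3 kernel, the tiled result is identical to filtering the whole grid at once. With no overlap, cells along tile seams are computed from a truncated neighbourhood and differ from the reference, which is the edge effect that padding removal is meant to avoid.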
Tests
To ensure the correctness of these functions, tests have been implemented:
- map_multiproc with a simple function (Raster.copy()), to verify that tiles are processed correctly and that the raster is not loaded during processing.
- load_raster_tile, to verify that it loads the right tile.
Documentation
A documentation page has been written to explain how to use these generic multiprocessing functions, and they have been added to the API.
Example Usage
A basic example demonstrating the usage of map_overlap_multiproc_save with a simple function (Raster.copy()):