@rhugonnet
Member

@rhugonnet rhugonnet commented Apr 14, 2025

This PR passes the tolerance argument only for Rasterio versions above 1.4.3 (to avoid passing an unrecognized keyword argument on to GDAL with older versions), and re-structures chunked operations so they are organized more logically (which also avoids circular import conflicts).

Tolerance argument

I realized we forgot to add this @vschaffn:

    import rasterio as rio
    from packaging.version import Version
    if Version(rio.__version__) > Version("1.4.3"):
        reproj_kwargs.update({"tolerance": 0})
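A minimal sketch of how this version gate behaves, wrapped in a hypothetical helper `build_reproj_kwargs` (not a function from the codebase) so that several version strings can be checked without installing different Rasterio releases. It assumes, as the check above does, that `tolerance` should only be forwarded for versions strictly greater than 1.4.3, and it uses `packaging.version.Version` for the comparison.

```python
from packaging.version import Version


def build_reproj_kwargs(rio_version, **base_kwargs):
    """Hypothetical helper: copy the base reprojection kwargs and add
    ``tolerance`` only when the Rasterio version is strictly above 1.4.3."""
    kwargs = dict(base_kwargs)
    if Version(rio_version) > Version("1.4.3"):
        kwargs["tolerance"] = 0
    return kwargs


# Usage with hypothetical version strings; in practice, pass rio.__version__.
for v in ["1.3.11", "1.4.3", "1.4.4", "1.10.0"]:
    print(v, build_reproj_kwargs(v, num_threads=4))
```

Because `packaging` compares release segments numerically, "1.10.0" is correctly treated as newer than "1.4.3", which a plain string comparison would get wrong.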

Organization of chunked operations

Additionally, this PR re-writes _reproject() to call a single, consistent _rio_reproject() function, either directly or through _reproject_block() within Multiprocessing or Dask (previously, two different code paths performed the reprojection).
This ensures consistent behaviour (output dtype, nodata, number of threads, memory limit, etc.).
It will also be useful for adding Dask support in #446.
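The pattern can be sketched with toy stand-ins: a single core function holds the computation and its output conventions, and the chunked path only splits the input and delegates to that same function for each block. All names here (`_core_transform`, `_transform_block`, `transform`) are hypothetical, the "transformation" is an element-wise offset rather than a reprojection, and a standard-library thread pool stands in for Multiprocessing or Dask. Real reprojection blocks also need destination-window and overlap handling, which this sketch omits.

```python
from concurrent.futures import ThreadPoolExecutor


def _core_transform(values, offset, dtype=float):
    """Hypothetical stand-in for _rio_reproject: the single place where the
    computation and its output conventions (here, the dtype) are defined."""
    return [dtype(v + offset) for v in values]


def _transform_block(args):
    """Hypothetical stand-in for _reproject_block: unpack one chunk and
    delegate to the core function."""
    block, offset = args
    return _core_transform(block, offset)


def transform(values, offset, chunk_size=None):
    """Hypothetical stand-in for _reproject: call the core function directly,
    or split the input into chunks and process them in a worker pool."""
    if chunk_size is None:
        return _core_transform(values, offset)
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    # A thread pool stands in for Multiprocessing or Dask workers.
    with ThreadPoolExecutor() as pool:
        results = pool.map(_transform_block, [(chunk, offset) for chunk in chunks])
        # Executor.map yields results in input order, so blocks reassemble correctly.
        return [v for block in results for v in block]


data = [1, 2, 3, 4, 5]
print(transform(data, 10))
print(transform(data, 10, chunk_size=2))
```

Because both paths go through `_core_transform`, a change to the output dtype (or, in the real case, nodata or thread settings) applies to the direct and chunked results alike.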

In terms of structure, this PR adds a _geotransformations.py module to hold all geotransformation subfunctions that need to be re-used by other modules (e.g., for the Dask/Multiprocessing chunked implementations of these functions).
It also moves the generic chunked functions out of delayed_dask.py into a new chunked.py module, from which they are called by both multiproc.py (for Multiprocessing) and dask.py (renamed from delayed_dask.py, for Dask).

Original problem

The only problem was the code structure: having _rio_reproject in raster/geotransformations causes a circular import. It needs to be imported by _reproject_block in raster/distributed_computing/, which is used by _reproject_multiproc (or its Dask counterpart) in that same folder, which in turn is imported by raster/geotransformations.

So I've left _rio_reproject in raster/distributed_computing/ for now, but we might have to think about how this will affect our architecture, as there will be the exact same problem for every chunked operation we implement!
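The cycle can be reproduced with a throwaway two-module package written to a temporary directory. The package name `circ_demo` and the function bodies are hypothetical; only the import pattern mirrors the description above: `geotransformations` imports the chunked helper, and the chunked helper imports the core function back from `geotransformations`. The import fails with an `ImportError` because `geotransformations` is only partially initialized when `chunked` asks it for `_rio_reproject`.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical package mirroring the problem: "geotransformations" imports the
# chunked helper, which in turn needs the core function back from it.
FILES = {
    "__init__.py": "",
    "geotransformations.py": """
        from circ_demo.chunked import _reproject_block

        def _rio_reproject(x):
            return x + 1

        def reproject(x):
            return _reproject_block(x)
    """,
    "chunked.py": """
        from circ_demo.geotransformations import _rio_reproject

        def _reproject_block(x):
            return _rio_reproject(x)
    """,
}


def try_import(root, files, modname):
    """Write the package under ``root`` and try to import ``modname``;
    return the module on success or the ImportError on failure."""
    pkg = Path(root) / "circ_demo"
    pkg.mkdir()
    for fname, src in files.items():
        (pkg / fname).write_text(textwrap.dedent(src))
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        return importlib.import_module(modname)
    except ImportError as err:
        return err
    finally:
        sys.path.remove(root)


with tempfile.TemporaryDirectory() as tmp:
    result = try_import(tmp, FILES, "circ_demo.geotransformations")

print(type(result).__name__, "-", result)
```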

@rhugonnet rhugonnet requested a review from vschaffn April 14, 2025 11:37
Contributor

@vschaffn vschaffn left a comment

OK for the version check. Maybe we should open a new issue to update the Rasterio version requirement at its next release, which would let us remove the line you added.
Then, about _rio_reproject: I agree that it is better to have only one reprojection function, but I also agree that the architecture is not good, and I find it very confusing to have the main reprojection code in delayed_dask. I don't know whether it is better to do it this way or to keep two separate reprojection functions for the time being, until we find an architecture better suited to the current multiprocessing developments.

@rhugonnet
Member Author

rhugonnet commented Apr 15, 2025

@vschaffn Thanks for the review!

Your first point on the version check: this line will work* on past and future versions of Rasterio, so we don't need to change or remove it 🙂! (And we don't want to restrict our dependency versions just for a keyword argument.)

Your second point: Agreed.
I have changed the structure as follows to fix this. It's not an invasive change (unlike, e.g., creating a new folder): we simply add a _module.py whenever subfunctions are also used by other modules (as for chunked operations):

raster
    geotransformations (core functions importing from "_geotransformations", "dask" and "multiproc")
    _geotransformations (new module for subfunctions required across several modules, including "chunked")
    distributed_computing/
        dask (rename of "delayed_dask", moved out chunked functions common to "dask" and "multiproc" to "chunked")
        multiproc (untouched)
        chunked (new module for chunked implementations required by both "dask" and "multiproc")
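A toy version of this layout, flattened into one hypothetical package (`layout_demo`) and written to a temporary directory, shows why it avoids the cycle: every import points one way, from `geotransformations` down to `dask`/`multiproc`, then to `chunked`, then to `_geotransformations`. The function bodies are placeholders (an element-wise `+ 1` instead of a reprojection, and serial loops instead of real Dask or Multiprocessing scheduling), and the fixed chunk size of 2 is arbitrary.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical miniature of the proposed layout. Imports only point "downward":
#   geotransformations -> dask / multiproc -> chunked -> _geotransformations
FILES = {
    "__init__.py": "",
    "_geotransformations.py": """
        def _rio_reproject(block):
            # Placeholder for the single core reprojection routine.
            return [v + 1 for v in block]
    """,
    "chunked.py": """
        from layout_demo._geotransformations import _rio_reproject

        def _reproject_block(block):
            return _rio_reproject(block)
    """,
    "multiproc.py": """
        from layout_demo.chunked import _reproject_block

        def _reproject_multiproc(blocks):
            return [_reproject_block(b) for b in blocks]
    """,
    "dask.py": """
        from layout_demo.chunked import _reproject_block

        def _reproject_dask(blocks):
            return [_reproject_block(b) for b in blocks]
    """,
    "geotransformations.py": """
        from layout_demo._geotransformations import _rio_reproject
        from layout_demo.dask import _reproject_dask
        from layout_demo.multiproc import _reproject_multiproc

        def reproject(values, backend=None):
            if backend is None:
                return _rio_reproject(values)
            blocks = [values[i:i + 2] for i in range(0, len(values), 2)]
            run = _reproject_dask if backend == "dask" else _reproject_multiproc
            return [v for block in run(blocks) for v in block]
    """,
}

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "layout_demo"
    pkg.mkdir()
    for fname, src in FILES.items():
        (pkg / fname).write_text(textwrap.dedent(src))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Imports cleanly: no module asks for a name from a half-loaded parent.
        geo = importlib.import_module("layout_demo.geotransformations")
    finally:
        sys.path.remove(tmp)

data = [1, 2, 3, 4, 5]
print(geo.reproject(data))
print(geo.reproject(data, backend="dask"))
print(geo.reproject(data, backend="multiproc"))
```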

@rhugonnet rhugonnet changed the title Use tolerance in reproject() only for Rasterio>1.4.3 and re-structure _rio_reproject Use tolerance in reproject() only for Rasterio>1.4.3 and re-structure chunked ops Apr 15, 2025
@vschaffn
Contributor

vschaffn commented Apr 15, 2025

Adding a _geotransformations.py file is a great idea, and so is the new chunked.py: this properly simplifies the internal dependencies, which were perhaps a little too complex in my version 😄
You just forgot to add the licence header to the files you created.
Otherwise, all good for me!

@rhugonnet rhugonnet merged commit f4956d1 into GlacioHack:main Apr 16, 2025
11 of 13 checks passed
@rhugonnet rhugonnet deleted the rio_tolerance_reproj branch April 16, 2025 20:10
2 participants