Add support for dask and zarr arrays #805
base: main
@pllim, no hurry.
Not sure if I'm qualified to review this one, as I have never used dask or zarr. Maybe someone from SunPy like @nabobalis is a better person to review?
@pllim, how about just a basic code review?
@ejeschke, sure, I'll have a look after you resolve the conflict.
Branch updated from 29366f5 to b0ce57e.
@pllim, @nabobalis, thank you! Yes, I'd be pleased to have any code review. You can check out this gist to see the tests I did.
Branch updated from b0ce57e to 8f6d51b.
It looks good to me. That is some tricky reshaping to be done for zarr.
I would be happy to review this, but won't be able to get to it until next week. If you could post a review request for me, it will end up on my list. :)
@Cadair, I can't add you as a reviewer, but I added you as an assignee.
I feel like the three test modules could be merged into one, avoiding code duplication by subclassing and changing a few things here and there at setup.
I wonder if a rebase will fix the RTD build for this PR.
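A minimal sketch of the subclassing layout suggested in this review, assuming pytest as the runner. The class names, the `_wrap` hook, the test body, and the chunk size are hypothetical, not taken from the PR; a zarr subclass would follow the same pattern as the dask one.

```python
import numpy as np


class BaseSlicingTests:
    """Shared checks; subclasses only override how the test data is wrapped.

    The name does not start with "Test", so pytest will not collect this
    base class on its own, only its subclasses.
    """

    def _wrap(self, data_np):
        raise NotImplementedError

    def _get_data(self, shape):
        # Deterministic 2D pattern (min(i, j) at each pixel) shared by all backends.
        return self._wrap(np.min(np.indices(shape), axis=0))

    def test_cutout(self):
        data = self._get_data((100, 50))
        # np.asarray computes lazy (e.g. dask) slices into a plain ndarray.
        cut = np.asarray(data[10:20, 5:15])
        assert cut.shape == (10, 10)
        assert cut[0, 0] == 5  # min(10, 5)


class TestNumpy(BaseSlicingTests):
    def _wrap(self, data_np):
        return data_np


class TestDask(BaseSlicingTests):
    def _wrap(self, data_np):
        # Imported lazily and skipped (rather than failed) when dask is missing.
        import pytest
        da = pytest.importorskip("dask.array")
        return da.from_array(data_np, chunks=(25, 25))
```

Each backend module then shrinks to a few lines that say how to build its array type, while the assertions live in one place.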
        arr = arr.reshape(shape)
        return arr

    else:
Can you guarantee that d_obj would definitely be a Dask object at this point? Or should this be an elif instead?
Good question. The current code assumes that by process of elimination. It should probably be an elif, with an exception raised if nothing matched earlier. The problem is that I want to detect these cases without having to import zarr or dask, because this becomes the basic slicing function. So if a good duck-typing test could be devised, that might be a possibility...
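One way to detect the array type without importing zarr or dask is sketched below. It is not true duck typing but an `isinstance` check gated on `sys.modules`: if an object really is a zarr or dask array, the library that created it must already be imported, so the lookup never triggers an import. The helper name `array_kind` is hypothetical, and the sketch assumes `zarr.Array` and `dask.array.Array` are the relevant array classes.

```python
import sys

import numpy as np


def array_kind(obj):
    """Return 'numpy', 'zarr' or 'dask' for a supported array, else raise.

    Hypothetical helper: zarr/dask are only consulted if they are already
    present in sys.modules, so calling this never imports them.
    """
    if isinstance(obj, np.ndarray):
        return 'numpy'

    zarr = sys.modules.get('zarr')
    if zarr is not None and isinstance(obj, zarr.Array):
        return 'zarr'

    dask_array = sys.modules.get('dask.array')
    if dask_array is not None and isinstance(obj, dask_array.Array):
        return 'dask'

    # Fail loudly instead of assuming "dask by process of elimination".
    raise TypeError(f"unsupported array type: {type(obj).__name__}")
```

With a helper like this, the slicing function can branch with `if`/`elif`/`elif` on the returned kind, and an unexpected input surfaces as a `TypeError` instead of being treated as dask.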
At this point I think the code for this is OK; we can refactor it later if a better test for dask arrays can be found.
Re: RTD failure, we'll revisit it if it persists after your next round of edits.
Branch updated from feab24a to ea31b70.
@Cadair, do you have some examples of large images (too large for RAM) that you open using dask arrays?
@pllim, I believe I addressed all your points. Please have another look.
Maybe for a future PR? I want to add some more tests to the new
ginga/tests/test_dask.py (outdated)
    def _2ddata(self, shape, data_np=None):
        if data_np is None:
            data_np = np.asarray([min(i, j)
I think this can be optimized, especially if you are using sizable shape values:
In [30]: shape = (1000, 500)
In [31]: %timeit np.min(np.indices(shape), axis=0)
2.52 ms ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [32]: %timeit np.asarray([min(i, j) for i in range(shape[0]) for j in range(shape[1])]).reshape(shape)
92.7 ms ± 198 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
This comment also applies to all similar occurrences throughout.
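The two expressions timed in this review produce the same array; the sketch below checks that directly. The function names are hypothetical, and `np.minimum.outer` is an additional vectorized variant not mentioned in the thread.

```python
import numpy as np


def min_index_slow(shape):
    # Original test-helper approach: a Python-level loop over every pixel.
    return np.asarray([min(i, j)
                       for i in range(shape[0])
                       for j in range(shape[1])]).reshape(shape)


def min_index_fast(shape):
    # Reviewer's suggestion: np.indices(shape) has shape (2, rows, cols),
    # so the minimum over axis 0 is min(i, j) at each pixel.
    return np.min(np.indices(shape), axis=0)


def min_index_outer(shape):
    # Extra variant: the ufunc outer product broadcasts the row indices
    # against the column indices without building the (2, rows, cols) stack.
    return np.minimum.outer(np.arange(shape[0]), np.arange(shape[1]))


shape = (1000, 500)
ref = min_index_slow(shape)
assert np.array_equal(ref, min_index_fast(shape))
assert np.array_equal(ref, min_index_outer(shape))
```

Because the vectorized forms are exact replacements, swapping them into every similar test helper changes only the runtime, not the test data.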
@ejeschke, do you not wish to address this one? Either way is fine by me, but I just want to make sure it was not overlooked.
Whoops, OK... I pushed another commit. Have a look.
ginga/tests/test_numpy.py (outdated)
    def _get_data(self, shape, data_np=None):
        if data_np is None:
            data_np = np.random.randint(0, 10000, shape)
Why not also remove random here?
Good catch.
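Two ways the random test data could be made reproducible are sketched below. Both helper names and strategies are illustrative assumptions, not what the PR actually commits: a fully deterministic ramp kept within the original `[0, 10000)` range, and a seeded generator that keeps random-looking values but produces the same array on every run.

```python
import numpy as np


def get_data(shape, data_np=None):
    # Hypothetical deterministic stand-in for np.random.randint(0, 10000, shape):
    # a ramp of consecutive integers, wrapped to stay within [0, 10000).
    if data_np is None:
        data_np = np.arange(np.prod(shape)).reshape(shape) % 10000
    return data_np


def get_data_seeded(shape, data_np=None, seed=0):
    # Alternative: random-looking values, but a fixed seed makes every run
    # generate the identical array.
    if data_np is None:
        rng = np.random.default_rng(seed)
        data_np = rng.integers(0, 10000, size=shape)
    return data_np
```

Either way, a failing assertion can be reproduced exactly, which is the point of removing the unseeded `np.random` call.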
Branch updated from 9007741 to e52cef2.
Rebased; still passing all tests with the latest conda installs of
I haven't been using these two packages, so as long as tests pass and you are happy with it, feel free to merge.
Branch updated from 2f7edaa to c940b8b.
Branch updated from c940b8b to 94a87b7.
Branch updated from c93dbcf to d891f13.
This adds support for dask and zarr arrays into BaseImage-derived objects (e.g. AstroImage), e.g.
    >>> aimg = AstroImage()
    >>> aimg.load_data(dask_arr)
These images can then be loaded directly into a Ginga viewer.
Three new pytest files are added: one each for numpy, dask and zarr.
Developer documentation has been updated.
Co-Authored-By: P. L. Lim <[email protected]>