@Mikejmnez Mikejmnez commented Jul 18, 2025

…r discovery

This pull request demonstrates downloading MERRA-2 data:

  • Uses the CMR to access only the granules from Jan/1/2023 to Jan/31/2023
    (1 month, 31 granules; each granule holds hourly data)

  • Uses Hyrax to subset before download: all lons, -80 < lat < -45

# code not shown queries the CMR for URLs, and uses constraint expressions to keep only `["/T2M", "/U2M", "/V2M", "/SLP"]` and their dimension arrays

len(opendap_urls)
>>> 31
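The CMR query itself is not shown above, but a minimal sketch of the kind of granule search it performs might look like the following. The endpoint and the `collection_concept_id` / `temporal` parameters are standard CMR Search API parameters; the concept ID matches the collection that appears in the cached dap URL later on, and the temporal window matches the Jan/1/2023 to Jan/31/2023 range. This is an illustrative reconstruction, not the actual code from the PR.

```python
from urllib.parse import urlencode

# Hypothetical sketch of the CMR granule search (the real code is not shown).
# C1276812863-GES_DISC is the MERRA-2 M2T1NXSLV collection referenced in the
# cached opendap URL; the temporal window selects the 31 January 2023 granules.
CMR_GRANULES = "https://cmr.earthdata.nasa.gov/search/granules.json"
params = {
    "collection_concept_id": "C1276812863-GES_DISC",
    "temporal": "2023-01-01T00:00:00Z,2023-01-31T23:59:59Z",
    "page_size": 100,
}
query_url = CMR_GRANULES + "?" + urlencode(params)
```

The JSON response would then be parsed to extract one OPeNDAP URL per granule, yielding the 31 `opendap_urls` above.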

%%time
consolidate_metadata(opendap_urls, concat_dim='time', session=my_session)
>>> datacube has dimensions {'lon[0:1:575]', 'lat[0:1:360]'} , and concat dim: `time`
CPU times: user 530 ms, sys: 163 ms, total: 694 ms
Wall time: 16.5 s

%%time
ds = xr.open_mfdataset(opendap_urls, engine='pydap', session=my_session, combine='nested', concat_dim="time", chunks={'lat':90}, batch=True)
>>> CPU times: user 557 ms, sys: 75 ms, total: 632 ms
Wall time: 704 ms


# lazily compute the daily mean - does not trigger computation 
%%time
nds = ds.isel(lat=slice(20, 90))
nds = nds.resample(time="1D").mean()
>>> CPU times: user 158 ms, sys: 2.01 ms, total: 160 ms
Wall time: 161 ms
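For readers unfamiliar with what `resample(time="1D").mean()` reduces to, here is a standard-library sketch of the same reduction: group hourly samples by calendar day and average each group. xarray performs this lazily over dask chunks; this eager toy version only illustrates the arithmetic.

```python
from datetime import datetime, timedelta
from itertools import groupby
from statistics import mean

# Two days of synthetic hourly samples: value cycles 0..23 each day.
start = datetime(2023, 1, 1)
hours = [(start + timedelta(hours=h), float(h % 24)) for h in range(48)]

# Group consecutive hourly samples by calendar day, then average each day.
daily = {day: mean(v for _, v in grp)
         for day, grp in groupby(hours, key=lambda t: t[0].date())}
```

Each day averages the values 0..23, so every daily mean comes out to 11.5 here.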

######### Store the subset data in a local `Test.nc4` file.
# Before storing, the code below triggers all of the computations that were lazy. That is, it downloads the dap responses for `T2M`, `U2M`, `V2M` and `SLP` (31 granules; one dap URL downloads all data for that granule). Then, before writing to disk, xarray computes the daily time average per variable.

%%time
nds.to_netcdf("Test.nc4", mode='w')
>>> CPU times: user 3.03 s, sys: 2.26 s, total: 5.29 s
Wall time: 35.3 s


# check that the dap response of a granule is consistent:
my_session.cache.urls()[5].replace("%5B", "[").replace("%5D","]").replace("%3A",":").replace("%3B",";")
>>> 'https://opendap.earthdata.nasa.gov/collections/C1276812863-GES_DISC/granules/M2T1NXSLV.5.12.4:MERRA2_400.tavg1_2d_slv_Nx.20230103.nc4.dap?dap4.ce=SLP[0:1:23][0:1:89][0:1:575];T2M[0:1:23][0:1:89][0:1:575];U2M[0:1:23][0:1:89][0:1:575];V2M[0:1:23][0:1:89][0:1:575]&dap4.checksum=true'
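As a side note, the chained `str.replace` calls above can be replaced with the standard library's `urllib.parse.unquote`, which decodes all percent-escapes at once. A small sketch:

```python
from urllib.parse import unquote

# Decode the percent-encoded constraint expression in one call instead of
# chaining str.replace for each escape (%5B -> [, %5D -> ], %3A -> :, ...).
encoded = "dap4.ce=SLP%5B0%3A1%3A23%5D%5B0%3A1%3A89%5D%5B0%3A1%3A575%5D"
decoded = unquote(encoded)
```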

The CE: `dap4.ce=SLP[0:1:23][0:1:89][0:1:575];T2M[0:1:23][0:1:89][0:1:575];U2M[0:1:23][0:1:89][0:1:575];V2M[0:1:23][0:1:89][0:1:575]&dap4.checksum=true`

shows that Hyrax did receive the request to slice all arrays along the latitude dimension (`[0:1:89]`), and that the cached URL contains all four variables (as opposed to, previously, a single dap URL per variable).
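Such a constraint expression is just a `;`-joined list of variables, each followed by its `[start:stride:stop]` index ranges. A sketch of how one could be assembled (illustrative only; this is not code from the PR):

```python
# Build a dap4 constraint expression like the one in the cached URL:
# every variable gets the same time/lat/lon index ranges, with the
# latitude range subset to [0:1:89] by the Hyrax server-side slicing.
variables = ["SLP", "T2M", "U2M", "V2M"]
dims = "[0:1:23][0:1:89][0:1:575]"  # time, lat (subset), lon
ce = ";".join(var + dims for var in variables)
query = "dap4.ce=" + ce + "&dap4.checksum=true"
```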

Without `batch=True`, the last step (`to_netcdf`) takes 1 min 30 s instead of 35.3 s.

@Mikejmnez Mikejmnez marked this pull request as ready for review August 4, 2025 06:52
@jgallagher59701 jgallagher59701 left a comment

This looks good, but I wonder: should batch mode be False by default? It looks like it is set to False on line 89 of client.py, but then on line 124 it looks like it's True by default.

It might just be that the diffs are hard to read, but I thought I'd mention it.

@Mikejmnez Mikejmnez replied:

Oh yes! Thanks for pointing that out. Originally I set it to True by default, but xarray cannot pass that argument down directly, so I made it False while figuring out a different way to pass batch=True instead.

@Mikejmnez Mikejmnez merged commit ce432f9 into main Aug 4, 2025
9 checks passed
@Mikejmnez Mikejmnez deleted the iss529 branch August 4, 2025 21:12

Successfully merging this pull request may close these issues.

Make sure Consolidate Metadata integrates well with batching mode

3 participants