Conversation

@Mikejmnez
Collaborator

@Mikejmnez Mikejmnez commented Jul 1, 2025

The following Pull Request:

  • Closes add CRC32_checksum #516
  • Tests are added to demonstrate the new or fixed behavior, and all tests pass
    on a local environment.

@Mikejmnez Mikejmnez marked this pull request as ready for review July 2, 2025 01:53
@Mikejmnez Mikejmnez requested a review from jgallagher59701 July 2, 2025 02:02
@Mikejmnez
Collaborator Author

Mikejmnez commented Jul 2, 2025

@ndp-opendap @jgallagher59701
One of the things I am struggling with is that I see no difference whether I append dap4.checksum=true or dap4.checksum=false to the DAP URL. I seem to get the same response either way.

Example

Request a 1D array of sequential int16 values. This is the variable Y in this dataset.

import requests
session = requests.Session()

r_true = session.get("http://test.opendap.org/opendap/dap4/SimpleGroup.nc4.h5.dap?dap4.ce=/SimpleGroup/Y%5B0:1:39%5D&dap4.checksum=true")

r_false = session.get("http://test.opendap.org/opendap/dap4/SimpleGroup.nc4.h5.dap?dap4.ce=/SimpleGroup/Y%5B0:1:39%5D&dap4.checksum=false")

# test for differences between the two response bodies:
r_true.content == r_false.content
>>> True

So no difference in the two responses...
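The two request URLs above differ only in the value of the dap4.checksum parameter. A minimal sketch of building such a pair, assuming a small helper named with_checksum (hypothetical, not part of pydap or requests), could look like this:

```python
from urllib.parse import urlsplit


def with_checksum(url: str, enabled: bool) -> str:
    """Append dap4.checksum=true|false to a DAP4 URL, leaving the rest untouched.

    Hypothetical helper for illustration only; it is not a pydap API.
    """
    # Use "&" when the URL already carries a query string, "?" otherwise.
    separator = "&" if urlsplit(url).query else "?"
    flag = "true" if enabled else "false"
    return f"{url}{separator}dap4.checksum={flag}"


base = (
    "http://test.opendap.org/opendap/dap4/SimpleGroup.nc4.h5.dap"
    "?dap4.ce=/SimpleGroup/Y%5B0:1:39%5D"
)
url_true = with_checksum(base, True)
url_false = with_checksum(base, False)
```

Each URL can then be passed to session.get(...) and the .content bytes of the two responses compared.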

@ndp-opendap
Collaborator

There are no differences because test.opendap.org is ignoring the dap4.checksum query string parameter and always returning checksums :)

@Mikejmnez
Collaborator Author

There are no differences because test.opendap.org is ignoring the dap4.checksum query string parameter and always returning checksums :)

OK! I missed the part about ignoring the checksum query string parameter...

@jhrg jhrg left a comment

Looks good.

@Mikejmnez
Collaborator Author

Mikejmnez commented Jul 2, 2025

@jgallagher59701 @ndp-opendap Last comment regarding checksums:

  • TDS optional checksum query parameter works and the responses do differ (between options true|false).
  • TDS checksum differs from that of Hyrax...
  • When I compute the checksum from the data I download from either the TDS or Hyrax, I get the same value as the one computed by the Hyrax data server.

Example

I looked at a test file available on the test TDS. I downloaded it and also placed it on the test.opendap.org server, so the two datasets are identical.

Now, I will compare the data values in the array temperature, and its checksum.

import zlib

import numpy as np
from pydap.client import open_url

hyx_url = "http://test.opendap.org/opendap/dap4/mydata1.nc"
tds_url = "https://thredds-test.unidata.ucar.edu/thredds/dap4/testdata/mydata1.nc"

### NOTE
# pydap's internals can now append `&dap4.checksum=true` to the URL when the user passes checksum=True as an argument below

# create a pydap array with the TDS url
pyds_TDS = open_url(tds_url, protocol='dap4', checksum=True) 

# pydap array with the Hyrax url
pyds_HYX = open_url(hyx_url, protocol='dap4', checksum=True)

print("TDS crc32 checksum: ", pyds_TDS['temperature'][:].attributes['_DAP4_Checksum_CRC32'])
>>> TDS crc32 checksum:  3486137609 # <----------------------- checksum computed by thredds data server

print("Hyrax crc32 checksum: ", pyds_HYX['temperature'][:].attributes['_DAP4_Checksum_CRC32'])
>>> Hyrax crc32 checksum:  3457240079 # <----------------------- checksum computed by hyrax data server

So the checksums differ between the two data servers...

I now look at the data itself and compare the values and the checksum I get using Python's zlib.crc32 function.

# check data is identical between the two dap responses:

temp_TDS = pyds_TDS['temperature'][:].data # downloads the entire data array from TDS
temp_HYX = pyds_HYX['temperature'][:].data # downloads the entire data array from Hyrax

np.testing.assert_equal(temp_TDS, temp_HYX) # raises an AssertionError if the arrays differ; otherwise returns None
zlib.crc32(temp_TDS.reshape(1, np.prod(temp_TDS.shape))) == zlib.crc32(temp_HYX.reshape(1, np.prod(temp_HYX.shape)))
>>> True

So the data is identical whether I download it from TDS or Hyrax. The checksums are also the same when computed with zlib.crc32 directly from the numpy arrays.

print("python zlib checksum: ", zlib.crc32(temp_TDS.reshape(1, np.prod(temp_TDS.shape))))
>>> python zlib checksum:  3457240079 # <----------------------- checksum computed by Python's zlib.crc32 function
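The local verification step above can be sketched with the standard library alone. This is only an illustration: the helper names crc32_of_int16 and matches_reported are hypothetical, the sample values are made up (40 sequential int16 values, not the temperature array), and the little-endian default is an assumption rather than a statement about what either server does.

```python
import struct
import zlib


def crc32_of_int16(values, byteorder="<"):
    """Return the unsigned CRC32 of `values` packed as 16-bit signed integers.

    `byteorder` is a struct prefix: "<" little-endian (assumed default), ">" big-endian.
    """
    payload = struct.pack(f"{byteorder}{len(values)}h", *values)
    # Mask to an unsigned 32-bit value, the form the checksum attribute is printed in above.
    return zlib.crc32(payload) & 0xFFFFFFFF


def matches_reported(values, reported, byteorder="<"):
    """Compare a locally computed CRC32 with a server-reported value (int or numeric string)."""
    return crc32_of_int16(values, byteorder) == int(reported)


sample = list(range(40))  # illustrative data only
local = crc32_of_int16(sample)
print("local crc32:", local)
print("matches reported value:", matches_reported(sample, str(local)))
```

The checksum is computed over raw bytes, so the same numeric values packed with a different byte order or data type produce a different byte stream and generally a different CRC32.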

@Mikejmnez Mikejmnez merged commit 5e5cb1a into main Jul 2, 2025
9 checks passed
@Mikejmnez Mikejmnez deleted the checksums branch July 2, 2025 20:01


3 participants