FEAT: Implementing hash support in QuadDtype#245
Conversation
| assert hash(QuadPrecision("-inf")) == hash(float("-inf")) | ||
| # Standard PyHASH_INF values | ||
| assert hash(QuadPrecision("inf")) == 314159 | ||
| assert hash(QuadPrecision("-inf")) == -314159 |
There was a problem hiding this comment.
TIL, this is a great easter egg.
There was a problem hiding this comment.
Haha that's indeed pretty awesome
There was a problem hiding this comment.
My one guess was since the language is π-thon right, hence π (3.14159) xD
ngoldbaum
left a comment
There was a problem hiding this comment.
LGTM mostly - I left a few comments below.
| // Check for NaN - use pointer hash (each NaN instance gets unique hash) | ||
| // This prevents hash table catastrophic pileups from NaN instances | ||
| if (Sleef_iunordq1(value, value)) { | ||
| return _Py_HashPointer((void *)self); |
There was a problem hiding this comment.
Py_HashPointer was added in newer versions of the C API. You should probably vendor pythoncapi-compat and use the definition exposed by that header. That way when, in the future, CPython removes this symbol, your code won't break.
In general you should avoid directly using private symbols in the C API - they can change or break without notice, even in minor version bumps.
| # define PYHASH_BITS 31 | ||
| #endif | ||
| #define PYHASH_MODULUS (((Py_uhash_t)1 << PYHASH_BITS) - 1) | ||
| #define PYHASH_INF 314159 |
There was a problem hiding this comment.
All these symbols are public in the CPython C API starting in Python 3.13. You can use the compat header like I explained in my other comment for older versions.
quaddtype/tests/test_quaddtype.py
Outdated
| """Test hash works for extreme values without errors.""" | ||
| quad_val = QuadPrecision(value) | ||
| h = hash(quad_val) | ||
| assert isinstance(h, int) |
There was a problem hiding this comment.
Is there any way you can check that these are the "correct" hash values you expect from the algorithm? Probably also best to use values that aren't exactly representable as doubles - I don't know if these values are. And for values that are representable exactly as doubles - check that the hash you get is identical to the hash of the python float with the same value.
There was a problem hiding this comment.
Got it, I'll edit this
There was a problem hiding this comment.
so one way can be since Python support arbitrary long integers so we can use intergers that are power of 2 and bigger than double's max range.
The comparison will be between corresponding quadprecision value and python int
There was a problem hiding this comment.
oh mpmath also supports, this will be straightforward then, so I'll add both above approach as well the mpmath comparison
1e500 is out of double range
In [9]: hash(QuadPrecision("1e500")) == hash(mp.mpf("1e500"))
Out[9]: True| x = (Py_uhash_t)-2; | ||
| } | ||
|
|
||
| return (Py_hash_t)x; |
There was a problem hiding this comment.
I'm not an expert on this, but this does appear to implement the same algorithm as _Py_HashDouble.
There was a problem hiding this comment.
yeah it does, its just the here mantissa has more bits to be consumed inside the loop and some SLEEF specific handlings.
I didn't saw any place where it needs to be specialized for 128-bit values atleast from the comments and articles available.
|
These changes will resolve all the reviews, I just wonder when to run this from probably within the meson file itself, so that later during includes those headers would get updated? |
I think you're only supposed to run that script once, manually, and commit it, and there's no need to automatically run it in the build. |
I see, in prev commit I manually edited the files to use the header, just ran the command and nothing changed so hopefully files are in good condition |
|
Great, merging this in! |
Took me a while to understand the CPython's implementation, hope I mapped it correctly here.
Implementation is referenced from https://github.com/python/cpython/blob/20b69aac0d19a5e5358362410d9710887762f0e7/Python/pyhash.c#L87
This is also present inside comment, writing here as well for explicit expectations
This implements the same algorithm as CPython's
_Py_HashDouble, adapted for quad precision floating point. The algorithm computes a hash based on the reduction of the value modulo the primeP = 2**PYHASH_BITS - 1Key invariant: hash(x) == hash(y) whenever x and y are numerically equal, even if x and y have different types. This ensures that:
hash(QuadPrecision(1.0)) == hash(1.0) == hash(1)The algorithm:
infreturnsPYHASH_INF,nanuses pointer hashfrexp(v) = m * 2^e2^PYHASH_BITS ≡ 1 mod P)closes numpy/numpy-quaddtype#9