calculate_log2_keysize in dictobject.c incorrect #133703


Closed
colesbury opened this issue May 8, 2025 · 4 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@colesbury
Contributor

colesbury commented May 8, 2025

Bug report

Bug description:

Originally reported by @ThomasBr0 in #132762

The _Py_bit_length() and _BitScanReverse64() code paths are incorrect and don't always return the smallest log2 key size to fit the desired number of items. The bug is benign in the sense that the dictionaries are sometimes too big, but they're never too small.

For example, calculate_log2_keysize(7) = 4 but calculate_log2_keysize(8) = 3, even though a log2 key size of 3 (dk_size 8) is enough for both.

/* Find the smallest dk_size >= minsize. */
static inline uint8_t
calculate_log2_keysize(Py_ssize_t minsize)
{
#if SIZEOF_LONG == SIZEOF_SIZE_T
    minsize = (minsize | PyDict_MINSIZE) - 1;
    return _Py_bit_length(minsize | (PyDict_MINSIZE-1));
#elif defined(_MSC_VER)
    // On 64bit Windows, sizeof(long) == 4.
    minsize = (minsize | PyDict_MINSIZE) - 1;
    unsigned long msb;
    _BitScanReverse64(&msb, (uint64_t)minsize);
    return (uint8_t)(msb + 1);
#else
    uint8_t log2_size;
    for (log2_size = PyDict_LOG_MINSIZE;
            (((Py_ssize_t)1) << log2_size) < minsize;
            log2_size++)
        ;
    return log2_size;
#endif
}
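
To make the mismatch easy to reproduce outside CPython, here is a minimal standalone sketch that compares the bit-trick path quoted above with the loop fallback. It is not the CPython source: `MINSIZE`, `LOG_MINSIZE`, `bit_length()`, and the `log2_keysize_*` names are stand-ins introduced for illustration, `long` replaces `Py_ssize_t`, and the values 8 and 3 are assumed for `PyDict_MINSIZE` and `PyDict_LOG_MINSIZE`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values standing in for PyDict_MINSIZE / PyDict_LOG_MINSIZE. */
#define MINSIZE 8
#define LOG_MINSIZE 3

/* Portable stand-in for _Py_bit_length(): bits needed to represent x. */
static int bit_length(unsigned long x)
{
    int n = 0;
    while (x != 0) {
        n++;
        x >>= 1;
    }
    return n;
}

/* The bit-trick path exactly as quoted in the report. */
static uint8_t log2_keysize_reported(long minsize)
{
    minsize = (minsize | MINSIZE) - 1;
    return (uint8_t)bit_length((unsigned long)(minsize | (MINSIZE - 1)));
}

/* The loop fallback from the same function, used here as the reference. */
static uint8_t log2_keysize_loop(long minsize)
{
    uint8_t log2_size;
    for (log2_size = LOG_MINSIZE; (1L << log2_size) < minsize; log2_size++)
        ;
    return log2_size;
}

/* Checks the two values given in the report against the reference. */
static void check_reported_examples(void)
{
    assert(log2_keysize_reported(7) == 4);  /* one step too large */
    assert(log2_keysize_reported(8) == 3);
    assert(log2_keysize_loop(7) == 3);      /* smallest size that fits */
    assert(log2_keysize_loop(8) == 3);
}
```

Under these assumptions the bit-trick result is sometimes larger than the loop result but never smaller, which matches the "too big, never too small" description.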

cc @methane @markshannon

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

Linked PRs

@colesbury colesbury added the type-bug An unexpected behavior, bug, or error label May 8, 2025
@colesbury colesbury added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label May 8, 2025
@methane
Member

methane commented May 9, 2025

Uh, it is my bug.

-     minsize = (minsize | PyDict_MINSIZE) - 1; 
+     minsize = (minsize - 1) | (PyDict_MINSIZE - 1);

@angela-tarantula
Contributor

angela-tarantula commented May 9, 2025

Thanks @methane for catching this:

-     minsize = (minsize | PyDict_MINSIZE) - 1; 
+     minsize = (minsize - 1) | (PyDict_MINSIZE - 1);

With that change, you can also simplify the return statement:

-     return _Py_bit_length(minsize | (PyDict_MINSIZE-1));
+     return _Py_bit_length(minsize);

Suggestion: for clarity you might instead write

minsize = Py_MAX(minsize, PyDict_MINSIZE) - 1;
return _Py_bit_length(minsize);

This makes the lower bound explicit, reads more directly at a glance, and the performance difference (if any) should be negligible. The Py_MAX variant even “works” for minsize <= 0, although zero or negative sizes aren’t expected in practice.
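
As a rough way to check the max-based suggestion, the sketch below compares it against the loop fallback over a range of sizes, including zero and negative values. It is a standalone approximation, not the committed fix: `MAX_OF` stands in for CPython's `Py_MAX`, `bit_length()` for `_Py_bit_length()`, `long` for `Py_ssize_t`, and the constants 8 and 3 are assumed values for `PyDict_MINSIZE` and `PyDict_LOG_MINSIZE`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values standing in for PyDict_MINSIZE / PyDict_LOG_MINSIZE. */
#define MINSIZE 8
#define LOG_MINSIZE 3

/* Stand-in for CPython's Py_MAX macro. */
#define MAX_OF(a, b) ((a) > (b) ? (a) : (b))

/* Portable stand-in for _Py_bit_length(): bits needed to represent x. */
static int bit_length(unsigned long x)
{
    int n = 0;
    while (x != 0) {
        n++;
        x >>= 1;
    }
    return n;
}

/* The max-based variant suggested in the comment above. */
static uint8_t log2_keysize_max(long minsize)
{
    minsize = MAX_OF(minsize, MINSIZE) - 1;
    return (uint8_t)bit_length((unsigned long)minsize);
}

/* Loop fallback from the original function, used as the reference. */
static uint8_t log2_keysize_loop(long minsize)
{
    uint8_t log2_size;
    for (log2_size = LOG_MINSIZE; (1L << log2_size) < minsize; log2_size++)
        ;
    return log2_size;
}

/* Asserts that both variants agree for every minsize in [lo, hi]. */
static void check_max_variant(long lo, long hi)
{
    for (long m = lo; m <= hi; m++) {
        assert(log2_keysize_max(m) == log2_keysize_loop(m));
    }
}
```

Calling `check_max_variant(-4, 100000)` passes under these assumptions, which suggests the max-based form returns the smallest fitting size across that range.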

methane added a commit to methane/cpython that referenced this issue May 10, 2025
@methane
Member

methane commented May 10, 2025

Sorry, I was wrong. (minsize - 1) | (PyDict_MINSIZE - 1) is not correct when minsize=0.

Py_MAX() compiles to a conditional move on both arm64 and amd64.
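
The minsize=0 case can be seen in a small standalone sketch. With the OR-based expression, `0 - 1` is `-1`, OR-ing keeps every bit set, and the bit length then equals the full width of the integer type instead of 3. The names below (`MINSIZE`, `LOG_MINSIZE`, `MAX_OF`, `bit_length`, `log2_keysize_or`, `log2_keysize_max`) are illustrative stand-ins, `long` replaces `Py_ssize_t`, and two's-complement integers are assumed.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumed values standing in for PyDict_MINSIZE / PyDict_LOG_MINSIZE. */
#define MINSIZE 8
#define LOG_MINSIZE 3

/* Stand-in for CPython's Py_MAX macro. */
#define MAX_OF(a, b) ((a) > (b) ? (a) : (b))

/* Portable stand-in for _Py_bit_length(): bits needed to represent x. */
static int bit_length(unsigned long x)
{
    int n = 0;
    while (x != 0) {
        n++;
        x >>= 1;
    }
    return n;
}

/* OR-based variant: (minsize - 1) | (MINSIZE - 1). */
static uint8_t log2_keysize_or(long minsize)
{
    minsize = (minsize - 1) | (MINSIZE - 1);
    return (uint8_t)bit_length((unsigned long)minsize);
}

/* Max-based variant: clamps to MINSIZE before subtracting 1. */
static uint8_t log2_keysize_max(long minsize)
{
    minsize = MAX_OF(minsize, MINSIZE) - 1;
    return (uint8_t)bit_length((unsigned long)minsize);
}

/* Shows where the two variants diverge. */
static void check_zero_case(void)
{
    /* minsize == 0: the OR variant sees all bits set. */
    assert(log2_keysize_or(0) == (int)(sizeof(unsigned long) * CHAR_BIT));
    /* The max variant clamps first and stays at the minimum. */
    assert(log2_keysize_max(0) == LOG_MINSIZE);
    /* For positive sizes the two variants agree. */
    assert(log2_keysize_or(1) == log2_keysize_max(1));
    assert(log2_keysize_or(9) == log2_keysize_max(9));
}
```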

@angela-tarantula
Contributor

Well, you were correct to say I was wrong to assume the performance difference is necessarily negligible in this context. I’m going to look more closely into assembly-level costs. But yeah, Py_MAX() is needed for when minsize == 0.

miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 11, 2025
(cherry picked from commit 92337f666e8a076a68305a8d6dc8bc9c095000e9)

Co-authored-by: Inada Naoki <[email protected]>
methane added a commit that referenced this issue May 11, 2025
@methane methane closed this as completed May 11, 2025
methane added a commit that referenced this issue May 11, 2025
(cherry picked from commit 92337f6)
Co-authored-by: Inada Naoki <[email protected]>