High clustering of hash function #20

@ENikS

I've noticed that the dictionary resizes unusually often, even when the load factor is still only around 50-55%.
Digging deeper, I found that the mixing algorithm used in the implementation is to blame:

        protected static int ReduceHashToIndex(int fullHash, int lenMask)
        {
            var h = (uint)fullHash;

            // xor-shift some upper bits down, in case if variations are mostly in high bits
            // and scatter the bits a little to break up clusters if hashes are periodic (like 42, 43, 44, ...)
            // long clusters can cause long reprobes. small clusters are ok though.
            h = h ^ h >> 15;
            h = h ^ h >> 8;
            h = h + (h >> 3) * 2654435769u;

            return (int)h & lenMask;
        }

I've attached two screenshots of GLSL shaders using the hash functions. The first screenshot is the original Wang/Jenkins hash method used by Dr. Cliff Click.

[Screenshot: Jenkins]

The second screenshot is the hash used by the C# implementation:
[Screenshot: Sadov]

As you can clearly see, there is a lot of clustering and, as a result, unnecessary resizing of even moderately empty tables.
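For anyone who wants a number rather than a picture, here is a rough sketch of how the clustering could be measured. It reuses the arithmetic of `ReduceHashToIndex` from above (wrapped in `unchecked` so it behaves the same under checked builds) and feeds it sequential keys. The class name `HashClusteringProbe`, the helper `Measure`, the table size, and the choice of sequential keys are all my own assumptions for illustration and are not part of the library.

```csharp
using System;

// Hypothetical probe, not part of the library: counts how many distinct slots
// a batch of sequential keys lands in and the longest run of adjacent occupied slots.
public static class HashClusteringProbe
{
    // Same arithmetic as the snippet quoted in this issue, with explicit parentheses.
    public static int ReduceHashToIndex(int fullHash, int lenMask)
    {
        unchecked
        {
            var h = (uint)fullHash;
            h = h ^ (h >> 15);
            h = h ^ (h >> 8);
            h = h + (h >> 3) * 2654435769u;
            return (int)h & lenMask;
        }
    }

    // tableSize is assumed to be a power of two, so tableSize - 1 works as a mask.
    public static (int DistinctSlots, int LongestRun) Measure(int tableSize, int keyCount)
    {
        var occupied = new bool[tableSize];
        int lenMask = tableSize - 1;

        // Mark the home slot of each sequential key 0, 1, 2, ...
        for (int key = 0; key < keyCount; key++)
        {
            occupied[ReduceHashToIndex(key, lenMask)] = true;
        }

        int distinct = 0, longest = 0, run = 0;
        foreach (bool slot in occupied)
        {
            if (slot)
            {
                distinct++;
                run++;
                if (run > longest) longest = run;
            }
            else
            {
                run = 0; // a free slot ends the current cluster (wrap-around is ignored)
            }
        }
        return (distinct, longest);
    }

    public static void Main()
    {
        const int tableSize = 1 << 16;      // assumed table size
        const int keyCount = tableSize / 2; // roughly 50% load
        var (distinct, longest) = Measure(tableSize, keyCount);
        Console.WriteLine($"keys={keyCount} distinct slots={distinct} longest run={longest}");
    }
}
```

Few distinct slots relative to the number of keys, or a very long run, would point to the kind of clustering visible in the screenshots. Swapping another mixer into `ReduceHashToIndex` gives a side-by-side comparison on the same key pattern.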
