Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Commit e09d7a1

Browse files
committed
Improve speed of hash index build.
In the initial data sort, if the bucket numbers are the same then next sort on the hash value. Because index pages are kept in hash value order, this gains a little speed by allowing the eventual tuple insertions to be done sequentially, avoiding repeated data movement within PageAddItem. This seems to be good for overall speedup of 5%-9%, depending on the incoming data. Simon Riggs, reviewed by Amit Kapila Discussion: https://postgr.es/m/CANbhV-FG-1ZNMBuwhUF7AxxJz3u5137dYL-o6hchK1V_dMw86g@mail.gmail.com
1 parent 70a437a commit e09d7a1

File tree

2 files changed

+21
-5
lines changed

2 files changed

+21
-5
lines changed

src/backend/access/hash/hashsort.c

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -42,9 +42,10 @@ struct HSpool
4242
Relation index;
4343

4444
/*
45-
* We sort the hash keys based on the buckets they belong to. Below masks
46-
* are used in _hash_hashkey2bucket to determine the bucket of given hash
47-
* key.
45+
* We sort the hash keys based on the buckets they belong to, then by the
46+
* hash values themselves, to optimize insertions onto hash pages. The
47+
* masks below are used in _hash_hashkey2bucket to determine the bucket of
48+
* a given hash key.
4849
*/
4950
uint32 high_mask;
5051
uint32 low_mask;

src/backend/utils/sort/tuplesortvariants.c

Lines changed: 17 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1387,14 +1387,17 @@ comparetup_index_hash(const SortTuple *a, const SortTuple *b,
13871387
{
13881388
Bucket bucket1;
13891389
Bucket bucket2;
1390+
uint32 hash1;
1391+
uint32 hash2;
13901392
IndexTuple tuple1;
13911393
IndexTuple tuple2;
13921394
TuplesortPublic *base = TuplesortstateGetPublic(state);
13931395
TuplesortIndexHashArg *arg = (TuplesortIndexHashArg *) base->arg;
13941396

13951397
/*
1396-
* Fetch hash keys and mask off bits we don't want to sort by. We know
1397-
* that the first column of the index tuple is the hash key.
1398+
* Fetch hash keys and mask off bits we don't want to sort by, so that the
1399+
* initial sort is just on the bucket number. We know that the first
1400+
* column of the index tuple is the hash key.
13981401
*/
13991402
Assert(!a->isnull1);
14001403
bucket1 = _hash_hashkey2bucket(DatumGetUInt32(a->datum1),
@@ -1409,6 +1412,18 @@ comparetup_index_hash(const SortTuple *a, const SortTuple *b,
14091412
else if (bucket1 < bucket2)
14101413
return -1;
14111414

1415+
/*
1416+
* If bucket values are equal, sort by hash values. This allows us to
1417+
* insert directly onto bucket/overflow pages, where the index tuples are
1418+
* stored in hash order to allow fast binary search within each page.
1419+
*/
1420+
hash1 = DatumGetUInt32(a->datum1);
1421+
hash2 = DatumGetUInt32(b->datum1);
1422+
if (hash1 > hash2)
1423+
return 1;
1424+
else if (hash1 < hash2)
1425+
return -1;
1426+
14121427
/*
14131428
* If hash values are equal, we sort on ItemPointer. This does not affect
14141429
* validity of the finished index, but it may be useful to have index

0 commit comments

Comments
 (0)