Commit 3fccbd9

Handle ENOENT status when querying NUMA node
We've assumed that touching the memory is sufficient for a page to be located on one of the NUMA nodes. But a page may be moved to swap after we touch it, due to memory pressure.

We touch the memory before querying the status, but there is no guarantee it won't be moved to swap in the meantime. The touching happens only on the first call, so later calls are more likely to be affected. The batching widens the window too.

It's up to the kernel if/when pages get moved to swap. We have to accept ENOENT (-2) as a valid result, and handle it without failing. This patch simply treats it as an unknown node, and returns NULL in the two affected views (pg_shmem_allocations_numa and pg_buffercache_numa).

Hugepages cannot be swapped out, so this affects only regular pages.

Reported by Christoph Berg, investigation and fix by me. Backpatch to 18, where the two views were introduced.

Reported-by: Christoph Berg <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 18
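The three-way handling described in the message above (valid node, ENOENT, anything else) can be sketched as a small standalone classifier. This is an illustrative sketch only, not code from the patch: the enum `PageStatusKind` and the function `classify_page_status` are hypothetical names, and it assumes the per-page status is either a node number or a negated errno value, as the message's "ENOENT (-2)" implies.

```c
#include <errno.h>

/* Hypothetical classification of one per-page status value. */
typedef enum
{
    PAGE_ON_NODE,       /* status is a NUMA node number in [0, max_node] */
    PAGE_NODE_UNKNOWN,  /* -ENOENT: page not present, e.g. moved to swap */
    PAGE_STATUS_ERROR   /* any other value is unexpected */
} PageStatusKind;

/*
 * Classify a status value. max_node is the highest valid node number
 * (an assumption mirroring max_nodes in the diff).
 */
static PageStatusKind
classify_page_status(int status, int max_node)
{
    if (status >= 0 && status <= max_node)
        return PAGE_ON_NODE;
    if (status == -ENOENT)
        return PAGE_NODE_UNKNOWN;
    return PAGE_STATUS_ERROR;
}
```

In terms of the patch, `PAGE_NODE_UNKNOWN` corresponds to the case reported as NULL in the two views, and `PAGE_STATUS_ERROR` to the case that still raises an error in shmem.c.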
1 parent 302879b commit 3fccbd9

2 files changed: +35 -9 lines changed

contrib/pg_buffercache/pg_buffercache_pages.c

Lines changed: 10 additions & 2 deletions
@@ -551,8 +551,16 @@ pg_buffercache_os_pages_internal(FunctionCallInfo fcinfo, bool include_numa)

         if (fctx->include_numa)
         {
-            values[2] = Int32GetDatum(fctx->record[i].numa_node);
-            nulls[2] = false;
+            /* status is valid node number */
+            if (fctx->record[i].numa_node >= 0)
+            {
+                values[2] = Int32GetDatum(fctx->record[i].numa_node);
+                nulls[2] = false;
+            } else {
+                /* some kind of error (e.g. pages moved to swap) */
+                values[2] = (Datum) 0;
+                nulls[2] = true;
+            }
         }
         else
         {

src/backend/storage/ipc/shmem.c

Lines changed: 25 additions & 7 deletions
@@ -599,7 +599,7 @@ pg_get_shmem_allocations_numa(PG_FUNCTION_ARGS)
     InitMaterializedSRF(fcinfo, 0);

     max_nodes = pg_numa_get_max_node();
-    nodes = palloc_array(Size, max_nodes + 1);
+    nodes = palloc_array(Size, max_nodes + 2);

     /*
      * Shared memory allocations can vary in size and may not align with OS
@@ -635,7 +635,6 @@ pg_get_shmem_allocations_numa(PG_FUNCTION_ARGS)
     hash_seq_init(&hstat, ShmemIndex);

     /* output all allocated entries */
-    memset(nulls, 0, sizeof(nulls));
     while ((ent = (ShmemIndexEnt *) hash_seq_search(&hstat)) != NULL)
     {
         int i;
@@ -684,22 +683,33 @@ pg_get_shmem_allocations_numa(PG_FUNCTION_ARGS)
             elog(ERROR, "failed NUMA pages inquiry status: %m");

         /* Count number of NUMA nodes used for this shared memory entry */
-        memset(nodes, 0, sizeof(Size) * (max_nodes + 1));
+        memset(nodes, 0, sizeof(Size) * (max_nodes + 2));

         for (i = 0; i < shm_ent_page_count; i++)
         {
             int s = pages_status[i];

             /* Ensure we are adding only valid index to the array */
-            if (s < 0 || s > max_nodes)
+            if (s >= 0 && s <= max_nodes)
+            {
+                /* valid NUMA node */
+                nodes[s]++;
+                continue;
+            }
+            else if (s == -2)
             {
-                elog(ERROR, "invalid NUMA node id outside of allowed range "
-                     "[0, " UINT64_FORMAT "]: %d", max_nodes, s);
+                /* -2 means ENOENT (e.g. page was moved to swap) */
+                nodes[max_nodes + 1]++;
+                continue;
             }

-            nodes[s]++;
+            elog(ERROR, "invalid NUMA node id outside of allowed range "
+                 "[0, " UINT64_FORMAT "]: %d", max_nodes, s);
         }

+        /* no NULLs for regular nodes */
+        memset(nulls, 0, sizeof(nulls));
+
         /*
          * Add one entry for each NUMA node, including those without allocated
          * memory for this segment.
@@ -713,6 +723,14 @@ pg_get_shmem_allocations_numa(PG_FUNCTION_ARGS)
             tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
                                  values, nulls);
         }
+
+        /* The last entry is used for pages without a NUMA node. */
+        nulls[1] = true;
+        values[0] = CStringGetTextDatum(ent->key);
+        values[2] = Int64GetDatum(nodes[max_nodes + 1] * os_page_size);
+
+        tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+                             values, nulls);
     }

     LWLockRelease(ShmemIndexLock);
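
The shmem.c change grows the counter array from `max_nodes + 1` to `max_nodes + 2` slots, so that the last slot can collect pages whose status is ENOENT. A minimal standalone sketch of that counting scheme follows. It is not the patch's code: `tally_pages` and its parameters are hypothetical, it uses plain `size_t` counters instead of PostgreSQL's `Size`, and it returns -1 where the real function raises an error via `elog`.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Tally pages per NUMA node (hypothetical helper, for illustration).
 *
 * counts must have max_node + 2 slots: slots [0, max_node] are real
 * nodes, and slot max_node + 1 collects pages with status -ENOENT
 * (e.g. pages that were moved to swap).
 *
 * Returns 0 on success, or -1 if an unexpected status is found.
 */
static int
tally_pages(const int *status, size_t npages, int max_node, size_t *counts)
{
    memset(counts, 0, sizeof(size_t) * ((size_t) max_node + 2));

    for (size_t i = 0; i < npages; i++)
    {
        int s = status[i];

        if (s >= 0 && s <= max_node)
            counts[s]++;                /* valid NUMA node */
        else if (s == -ENOENT)
            counts[max_node + 1]++;     /* page without a NUMA node */
        else
            return -1;                  /* unexpected status */
    }

    return 0;
}
```

Multiplying the last slot by the OS page size gives the byte count that the patch reports in the extra row, where the node column is NULL.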
