Commit 53cfe40

Block signals while allocating DSM memory.
On Linux, we call posix_fallocate() on shm_open()'d memory to avoid later potential SIGBUS (see commit 899bd78).

Based on field reports of systems stuck in an EINTR retry loop there, we made it possible to break out of that loop via slightly odd coding where the CHECK_FOR_INTERRUPTS() call was somewhat removed from the loop (see commit 422952e).

On further reflection, that was not a great choice for at least two reasons:

1. If interrupts were held, the CHECK_FOR_INTERRUPTS() would do nothing and the EINTR error would be surfaced to the user.

2. If EINTR was reported but neither QueryCancelPending nor ProcDiePending was set, then we'd dutifully retry, but with a bit more understanding of how posix_fallocate() works, it's now clear that you can get into a loop that never terminates. posix_fallocate() is not a function that can do some of the job and tell you about progress if it's interrupted; it has to undo what it's done so far and report EINTR, and if signals keep arriving faster than it can complete (cf. recovery conflict signals), you're stuck.

Therefore, for now, we'll simply block most signals to guarantee progress. SIGQUIT is not blocked (see InitPostmasterChild()), because its expected handler doesn't return, and unblockable signals like SIGCONT are not expected to arrive at a high rate. For good measure, we'll include the ftruncate() call in the blocked region, and add a retry loop.

Back-patch to all supported releases.

Reported-by: Alvaro Herrera <[email protected]>
Reported-by: Nicola Contu <[email protected]>
Reviewed-by: Alvaro Herrera <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/20220701154105.jjfutmngoedgiad3%40alvherre.pgsql
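The approach described in this message can be sketched outside PostgreSQL with plain POSIX calls. What follows is a minimal illustration under stated assumptions, not the committed code: `resize_with_signals_blocked()` and `demo_resize()` are hypothetical names, `sigprocmask()` stands in for `PG_SETMASK`, a "fill the set, then remove SIGQUIT" mask only approximates `BlockSig`, and the `size > 0` guard and the `-1` return convention are choices made for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: resize fd to "size" bytes while signals are blocked.
 * Returns 0 on success, or -1 with errno set on failure.
 */
static int
resize_with_signals_blocked(int fd, off_t size)
{
    sigset_t block;
    sigset_t old;
    int rc;
    int save_errno;

    assert(fd >= 0);

    /* Block everything blockable except SIGQUIT (assumed stand-in for BlockSig). */
    sigfillset(&block);
    sigdelset(&block, SIGQUIT);
    if (sigprocmask(SIG_SETMASK, &block, &old) < 0)
        return -1;

    /* With most signals blocked, EINTR can only come from signals like SIGCONT. */
    do
    {
        rc = ftruncate(fd, size);
    } while (rc < 0 && errno == EINTR);

    /* posix_fallocate() rejects a zero length, so skip it in that case. */
    if (rc == 0 && size > 0)
    {
        do
        {
            rc = posix_fallocate(fd, 0, size);
        } while (rc == EINTR);

        /* posix_fallocate() returns an error number instead of setting errno. */
        if (rc != 0)
        {
            errno = rc;
            rc = -1;
        }
    }

    /* Restore the caller's mask without disturbing errno. */
    save_errno = errno;
    sigprocmask(SIG_SETMASK, &old, NULL);
    errno = save_errno;

    return rc;
}

/* Hypothetical driver: resize a temporary file and report its final size. */
static long
demo_resize(long size)
{
    FILE *fp = tmpfile();
    struct stat st;
    long result = -1;

    if (fp == NULL)
        return -1;
    if (resize_with_signals_blocked(fileno(fp), (off_t) size) == 0 &&
        fstat(fileno(fp), &st) == 0)
        result = (long) st.st_size;
    fclose(fp);
    return result;
}
```

Calling `demo_resize(4096)` should report a 4096-byte file on a system where temporary files can be preallocated. The retry loops stay in place even with signals blocked because, as the message notes, some signals cannot be blocked.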
1 parent 7c5953b commit 53cfe40

1 file changed: +23 -14 lines changed

src/backend/storage/ipc/dsm_impl.c

Lines changed: 23 additions & 14 deletions
@@ -61,8 +61,9 @@
 #ifdef HAVE_SYS_SHM_H
 #include <sys/shm.h>
 #endif
-#include "pgstat.h"
 
+#include "libpq/pqsignal.h" /* for PG_SETMASK macro */
+#include "pgstat.h"
 #include "portability/mem.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
@@ -333,14 +334,6 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
             shm_unlink(name);
         errno = save_errno;
 
-        /*
-         * If we received a query cancel or termination signal, we will have
-         * EINTR set here. If the caller said that errors are OK here, check
-         * for interrupts immediately.
-         */
-        if (errno == EINTR && elevel >= ERROR)
-            CHECK_FOR_INTERRUPTS();
-
         ereport(elevel,
                 (errcode_for_dynamic_shared_memory(),
                  errmsg("could not resize shared memory segment \"%s\" to %zu bytes: %m",
@@ -415,9 +408,21 @@ static int
 dsm_impl_posix_resize(int fd, off_t size)
 {
     int rc;
+    int save_errno;
+
+    /*
+     * Block all blockable signals, except SIGQUIT. posix_fallocate() can run
+     * for quite a long time, and is an all-or-nothing operation. If we
+     * allowed SIGUSR1 to interrupt us repeatedly (for example, due to recovery
+     * conflicts), the retry loop might never succeed.
+     */
+    PG_SETMASK(&BlockSig);
 
     /* Truncate (or extend) the file to the requested size. */
-    rc = ftruncate(fd, size);
+    do
+    {
+        rc = ftruncate(fd, size);
+    } while (rc < 0 && errno == EINTR);
 
     /*
      * On Linux, a shm_open fd is backed by a tmpfs file. After resizing with
@@ -431,14 +436,14 @@ dsm_impl_posix_resize(int fd, off_t size)
     if (rc == 0)
     {
         /*
-         * We may get interrupted. If so, just retry unless there is an
-         * interrupt pending. This avoids the possibility of looping forever
-         * if another backend is repeatedly trying to interrupt us.
+         * We still use a traditional EINTR retry loop to handle SIGCONT.
+         * posix_fallocate() doesn't restart automatically, and we don't want
+         * this to fail if you attach a debugger.
          */
         do
         {
             rc = posix_fallocate(fd, 0, size);
-        } while (rc == EINTR && !(ProcDiePending || QueryCancelPending));
+        } while (rc == EINTR);
 
         /*
          * The caller expects errno to be set, but posix_fallocate() doesn't
@@ -449,6 +454,10 @@ dsm_impl_posix_resize(int fd, off_t size)
     }
 #endif /* HAVE_POSIX_FALLOCATE && __linux__ */
 
+    save_errno = errno;
+    PG_SETMASK(&UnBlockSig);
+    errno = save_errno;
+
     return rc;
 }

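The last hunk saves and restores errno around the call that unblocks signals. The commit does not spell out why, but one plausible reason is that a signal which arrived while blocked stays pending and its handler runs as soon as the mask is lifted, at which point the handler may overwrite errno. The sketch below illustrates that behaviour with plain POSIX calls; `clobbering_handler()` and `demo_pending_signal()` are hypothetical names, SIGUSR1 and the errno values are arbitrary choices, and a single-threaded process is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t handler_runs = 0;

/* A deliberately careless handler that does not preserve errno. */
static void
clobbering_handler(int signo)
{
    (void) signo;
    handler_runs++;
    errno = EAGAIN;
}

/*
 * Returns 1 if the signal was held back while blocked, delivered on unblock,
 * clobbered errno at that moment, and the saved value was restored afterwards.
 */
static int
demo_pending_signal(void)
{
    struct sigaction sa;
    struct sigaction old_sa;
    sigset_t usr1;
    int runs_before = handler_runs;
    int runs_while_blocked;
    int runs_after_unblock;
    int errno_after_unblock;
    int errno_restored;
    int save_errno;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = clobbering_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old_sa) < 0)
        return 0;

    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    sigprocmask(SIG_BLOCK, &usr1, NULL);

    /* The signal is generated now but stays pending while blocked. */
    raise(SIGUSR1);
    runs_while_blocked = handler_runs - runs_before;

    /* Pretend an earlier system call failed and left this errno behind. */
    errno = ENOSPC;

    save_errno = errno;
    sigprocmask(SIG_UNBLOCK, &usr1, NULL);  /* pending handler runs here */
    errno_after_unblock = errno;
    errno = save_errno;
    errno_restored = errno;

    runs_after_unblock = handler_runs - runs_before;
    sigaction(SIGUSR1, &old_sa, NULL);

    return runs_while_blocked == 0 &&
        runs_after_unblock == 1 &&
        errno_after_unblock == EAGAIN &&
        errno_restored == ENOSPC;
}
```

Without the save and restore, a caller inspecting errno after the resize would see whatever the handler left behind rather than the error from the failed system call.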