
Commit bc7fcab

Read from the same worker repeatedly until it returns no tuple.
The original coding read tuples from workers in round-robin fashion, but performance testing shows that it works much better to read enough to empty one queue before moving on to the next. I believe the reason for this is that, with the old approach, we could easily wake up a worker repeatedly to write only one new tuple into the shm_mq each time. With this approach, by the time the process gets scheduled, it has a decent chance of being able to fill the entire buffer in one go.

Patch by me. Dilip Kumar helped with performance testing.
1 parent 51d152f commit bc7fcab
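
To make the control-flow change concrete, here is a minimal, self-contained sketch (not PostgreSQL source) of the reading strategy the commit adopts: keep issuing non-blocking reads against the current queue until it comes up empty, and only then advance round-robin; after a full lap with no tuple, the real executor would wait on its latch. The toy_queue type and try_read() helper are illustrative stand-ins for the TupleQueueReader machinery.

/*
 * Toy illustration only -- not PostgreSQL source.  "toy_queue" and
 * "try_read" stand in for a worker's tuple queue and a non-blocking,
 * TupleQueueReaderNext()-style read.
 */
#include <stdio.h>

#define NQUEUES 3
#define QSIZE   4

typedef struct toy_queue
{
    int     items[QSIZE];
    int     head;
    int     nitems;
} toy_queue;

/* Non-blocking read: returns 1 and stores an item, or 0 if the queue is empty. */
static int
try_read(toy_queue *q, int *item)
{
    if (q->nitems == 0)
        return 0;
    *item = q->items[q->head++];
    q->nitems--;
    return 1;
}

int
main(void)
{
    toy_queue   queues[NQUEUES] = {
        {{1, 2, 3, 4}, 0, 4}, {{5, 6}, 0, 2}, {{7, 8, 9}, 0, 3}
    };
    int         next = 0;       /* like gatherstate->nextreader */
    int         waitpos = 0;    /* queue we started the current lap on */

    for (;;)
    {
        int     item;

        if (try_read(&queues[next], &item))
        {
            printf("queue %d -> %d\n", next, item);
            waitpos = next;     /* progress made; restart the lap */
            continue;           /* keep reading the same queue, as the patch does */
        }

        /* Only when the current queue is empty do we advance round-robin. */
        next = (next + 1) % NQUEUES;

        /* A full lap with no tuple: the real executor would WaitLatch() here. */
        if (next == waitpos)
            break;
    }
    return 0;
}

Running this prints each queue's items in one contiguous run, which is the behavioural change the diff below makes to nodeGather.c.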

1 file changed: +10, -4 lines

src/backend/executor/nodeGather.c

Lines changed: 10 additions & 4 deletions
@@ -359,14 +359,20 @@ gather_readnext(GatherState *gatherstate)
 			continue;
 		}
 
-		/* Advance nextreader pointer in round-robin fashion. */
-		gatherstate->nextreader =
-			(gatherstate->nextreader + 1) % gatherstate->nreaders;
-
 		/* If we got a tuple, return it. */
 		if (tup)
 			return tup;
 
+		/*
+		 * Advance nextreader pointer in round-robin fashion. Note that we
+		 * only reach this code if we weren't able to get a tuple from the
+		 * current worker. We used to advance the nextreader pointer after
+		 * every tuple, but it turns out to be much more efficient to keep
+		 * reading from the same queue until that would require blocking.
+		 */
+		gatherstate->nextreader =
+			(gatherstate->nextreader + 1) % gatherstate->nreaders;
+
 		/* Have we visited every TupleQueueReader? */
 		if (gatherstate->nextreader == waitpos)
 		{
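
As a back-of-envelope complement to the commit message's point about filling the shm_mq in one go, the sketch below compares how often a worker might need to be scheduled under the two policies. The buffer_slots and total_tuples values are made-up parameters for illustration, not PostgreSQL constants, and the per-tuple figure reflects the worst case the commit message describes.

#include <stdio.h>

int
main(void)
{
    const int   buffer_slots = 64;      /* assumed queue capacity, in tuples */
    const int   total_tuples = 10000;   /* assumed tuples one worker produces */

    /* Old policy: the leader could take one tuple per visit, freeing one
     * slot, so the worker can end up woken roughly once per tuple. */
    int         per_tuple_wakeups = total_tuples;

    /* New policy: the leader empties the queue before moving on, so each
     * wakeup lets the worker refill up to buffer_slots tuples. */
    int         per_buffer_wakeups =
        (total_tuples + buffer_slots - 1) / buffer_slots;

    printf("advance after every tuple: %d wakeups\n", per_tuple_wakeups);
    printf("drain queue before moving: %d wakeups\n", per_buffer_wakeups);
    return 0;
}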
