/*
** 2011-07-09
**
** The author disclaims copyright to this source code. In place of
** a legal notice, here is a blessing:
**
**    May you do good and not evil.
**    May you find forgiveness for yourself and forgive others.
**    May you share freely, never taking more than you give.
**
*************************************************************************
** This file contains code for the VdbeSorter object, used in concert with
** a VdbeCursor to sort large numbers of keys for CREATE INDEX statements
** or by SELECT statements with ORDER BY clauses that cannot be satisfied
** using indexes and without LIMIT clauses.
**
** The VdbeSorter object implements a multi-threaded external merge sort
** algorithm that is efficient even if the number of elements being sorted
** exceeds the available memory.
**
** Here is the (internal, non-API) interface between this module and the
** rest of the SQLite system:
**
**    sqlite3VdbeSorterInit()       Create a new VdbeSorter object.
**
**    sqlite3VdbeSorterWrite()      Add a single new row to the VdbeSorter
**                                  object. The row is a binary blob in the
**                                  OP_MakeRecord format that contains both
**                                  the ORDER BY key columns and result columns
**                                  in the case of a SELECT w/ ORDER BY, or
**                                  the complete record for an index entry
**                                  in the case of a CREATE INDEX.
**
**    sqlite3VdbeSorterRewind()     Sort all content previously added.
**                                  Position the read cursor on the
**                                  first sorted element.
**
**    sqlite3VdbeSorterNext()       Advance the read cursor to the next sorted
**                                  element.
**
**    sqlite3VdbeSorterRowkey()     Return the complete binary blob for the
**                                  row currently under the read cursor.
**
**    sqlite3VdbeSorterCompare()    Compare the binary blob for the row
**                                  currently under the read cursor against
**                                  another binary blob X and report if
**                                  X is strictly less than the read cursor.
**                                  Used to enforce uniqueness in a
**                                  CREATE UNIQUE INDEX statement.
**
**    sqlite3VdbeSorterClose()      Close the VdbeSorter object and reclaim
**                                  all resources.
**
**    sqlite3VdbeSorterReset()      Refurbish the VdbeSorter for reuse. This
**                                  is like Close() followed by Init() only
**                                  much faster.
**
** The interfaces above must be called in a particular order. Write() can
** only occur in between Init()/Reset() and Rewind(). Next(), Rowkey(), and
** Compare() can only occur in between Rewind() and Close()/Reset(). That is:
**
**   Init()
**   for each record: Write()
**   Rewind()
**     Rowkey()/Compare()
**   Next()
**   Close()
**
** Algorithm:
**
** Records passed to the sorter via calls to Write() are initially held
** unsorted in main memory. Assuming the amount of memory used never exceeds
** a threshold, when Rewind() is called the set of records is sorted using
** an in-memory merge sort. In this case, no temporary files are required
** and subsequent calls to Rowkey(), Next() and Compare() read records
** directly from main memory.
**
** If the amount of space used to store records in main memory exceeds the
** threshold, then the set of records currently in memory is sorted and
** written to a temporary file in "Packed Memory Array" (PMA) format.
** A PMA created at this point is known as a "level-0 PMA". Higher levels
** of PMAs may be created by merging existing PMAs together - for example
** merging two or more level-0 PMAs together creates a level-1 PMA.
**
** The threshold for the amount of main memory to use before flushing
** records to a PMA is roughly the same as the limit configured for the
** page-cache of the main database. Specifically, the threshold is set to
** the value returned by "PRAGMA main.page_size" multiplied by
** that returned by "PRAGMA main.cache_size", in bytes.
**
** If the sorter is running in single-threaded mode, then all PMAs generated
** are appended to a single temporary file. Or, if the sorter is running in
** multi-threaded mode then up to (N+1) temporary files may be opened, where
** N is the configured number of worker threads. In this case, instead of
** sorting the records and writing the PMA to a temporary file itself, the
** calling thread usually launches a worker thread to do so. Except, if
** there are already N worker threads running, the main thread does the work
** itself.
**
** The sorter is running in multi-threaded mode if (a) the library was built
** with pre-processor symbol SQLITE_MAX_WORKER_THREADS set to a value greater
** than zero, and (b) worker threads have been enabled at runtime by calling
** "PRAGMA threads=N" with some value of N greater than 0.
**
** When Rewind() is called, any data remaining in memory is flushed to a
** final PMA. So at this point the data is stored in some number of sorted
** PMAs within temporary files on disk.
**
** If there are fewer than SORTER_MAX_MERGE_COUNT PMAs in total and the
** sorter is running in single-threaded mode, then these PMAs are merged
** incrementally as keys are retrieved from the sorter by the VDBE. The
** MergeEngine object, described in further detail below, performs this
** merge.
**
** Or, if running in multi-threaded mode, then a background thread is
** launched to merge the existing PMAs. Once the background thread has
** merged T bytes of data into a single sorted PMA, the main thread
** begins reading keys from that PMA while the background thread proceeds
** with merging the next T bytes of data. And so on.
**
** Parameter T is set to half the value of the memory threshold used
** by Write() above to determine when to create a new PMA.
**
** If there are more than SORTER_MAX_MERGE_COUNT PMAs in total when
** Rewind() is called, then a hierarchy of incremental-merges is used.
** First, T bytes of data from the first SORTER_MAX_MERGE_COUNT PMAs on
** disk are merged together. Then T bytes of data from the second set, and
** so on, such that no operation ever merges more than SORTER_MAX_MERGE_COUNT
** PMAs at a time. This is done to improve locality.
**
** If running in multi-threaded mode and there are more than
** SORTER_MAX_MERGE_COUNT PMAs on disk when Rewind() is called, then more
** than one background thread may be created. Specifically, there may be
** one background thread for each temporary file on disk, and one background
** thread to merge the output of each of the others to a single PMA for
** the main thread to read from.
*/
#include "sqliteInt.h"
#include "vdbeInt.h"

/*
** If SQLITE_DEBUG_SORTER_THREADS is defined, this module outputs various
** messages to stderr that may be helpful in understanding the performance
** characteristics of the sorter in multi-threaded mode.
*/
#if 0
# define SQLITE_DEBUG_SORTER_THREADS 1
#endif

/*
** Hard-coded maximum amount of data to accumulate in memory before flushing
** to a level 0 PMA. The purpose of this limit is to prevent various integer
** overflows. 512MiB.
*/
#define SQLITE_MAX_PMASZ    (1<<29)

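/*
** Illustrative sketch, not part of SQLite. The header comment describes the
** flush threshold as page_size multiplied by cache_size, and the macro above
** caps the amount held in memory at 512MiB. The helpers below show that
** arithmetic under simplifying assumptions: cache_size is a positive page
** count, and no minimum PMA size applies. The names demo_pma_threshold(),
** demo_merge_chunk() and DEMO_MAX_PMASZ are hypothetical; the real sorter
** may apply further adjustments.
*/

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_PMASZ (1<<29)   /* Same 512MiB value as SQLITE_MAX_PMASZ */

/* Bytes of records to accumulate before flushing a level-0 PMA, assuming
** nCachePage is a positive number of pages of pgsz bytes each. */
static int64_t demo_pma_threshold(int pgsz, int nCachePage){
  int64_t n;
  assert( pgsz>0 && nCachePage>0 );
  n = (int64_t)pgsz * (int64_t)nCachePage;   /* 64-bit to avoid overflow */
  if( n>DEMO_MAX_PMASZ ) n = DEMO_MAX_PMASZ;
  return n;
}

/* Parameter T from the header comment: half of the flush threshold. */
static int64_t demo_merge_chunk(int64_t nThreshold){
  return nThreshold/2;
}
```

/*
** With a 4096-byte page and 2000 cache pages, demo_pma_threshold() gives
** 8192000 bytes and demo_merge_chunk() gives 4096000 bytes.
*/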
/*
** Private objects used by the sorter
*/
typedef struct MergeEngine MergeEngine;     /* Merge PMAs together */
typedef struct PmaReader PmaReader;         /* Incrementally read one PMA */
typedef struct PmaWriter PmaWriter;         /* Incrementally write one PMA */
typedef struct SorterRecord SorterRecord;   /* A record being sorted */
typedef struct SortSubtask SortSubtask;     /* A sub-task in the sort process */
typedef struct SorterFile SorterFile;       /* Temporary file object wrapper */
typedef struct SorterList SorterList;       /* In-memory list of records */
typedef struct IncrMerger IncrMerger;       /* Read & merge multiple PMAs */

/*
** A container for a temp file handle and the current amount of data
** stored in the file.
*/
struct SorterFile {
  sqlite3_file *pFd;              /* File handle */
  i64 iEof;                       /* Bytes of data stored in pFd */
};

/*
** An in-memory list of objects to be sorted.
**
** If aMemory==0 then each object is allocated separately and the objects
** are connected using SorterRecord.u.pNext. If aMemory!=0 then all objects
** are stored in the aMemory[] bulk memory, one right after the other, and
** are connected using SorterRecord.u.iNext.
*/
struct SorterList {
  SorterRecord *pList;            /* Linked list of records */
  u8 *aMemory;                    /* If non-NULL, bulk memory to hold pList */
  i64 szPMA;                      /* Size of pList as PMA in bytes */
};

/*
** The MergeEngine object is used to combine two or more smaller PMAs into
** one big PMA using a merge operation. Separate PMAs all need to be
** combined into one big PMA in order to be able to step through the sorted
** records in order.
**
** The aReadr[] array contains a PmaReader object for each of the PMAs being
** merged. An aReadr[] object either points to a valid key or else is at EOF.
** ("EOF" means "End Of File". When aReadr[] is at EOF there is no more data.)
** For the purposes of the paragraphs below, we assume that the array is
** actually N elements in size, where N is the smallest power of 2 greater
** than or equal to the number of PMAs being merged. The extra aReadr[]
** elements are treated as if they are empty (always at EOF).
**
** The aTree[] array is also N elements in size. The value of N is stored in
** the MergeEngine.nTree variable.
**
** The final (N/2) elements of aTree[] contain the results of comparing
** pairs of PMA keys together. Element i contains the result of
** comparing aReadr[2*i-N] and aReadr[2*i-N+1]. The aTree element is set to
** the index of whichever key is smaller.
**
** For the purposes of this comparison, EOF is considered greater than any
** other key value. If the keys are equal (only possible with two EOF
** values), it doesn't matter which index is stored.
**
** The (N/4) elements of aTree[] that precede the final (N/2) described
** above contain the index of the smallest of each block of 4 PmaReaders.
** And so on, so that aTree[1] contains the index of the PmaReader that
** currently points to the smallest key value. aTree[0] is unused.
**
** Example:
**
**     aReadr[0] -> Banana
**     aReadr[1] -> Feijoa
**     aReadr[2] -> Elderberry
**     aReadr[3] -> Currant
**     aReadr[4] -> Grapefruit
**     aReadr[5] -> Apple
**     aReadr[6] -> Durian
**     aReadr[7] -> EOF
**
**     aTree[] = { X, 5   0, 5    0, 3, 5, 6 }
**
** The current element is "Apple" (the value of the key indicated by
** PmaReader 5). When the Next() operation is invoked, PmaReader 5 will
** be advanced to the next key in its segment. Say the next key is
** "Eggplant":
**
**     aReadr[5] -> Eggplant
**
** The contents of aTree[] are updated first by comparing the new PmaReader
** 5 key to the current key of PmaReader 4 (still "Grapefruit"). The PmaReader
** 5 value is still smaller, so aTree[6] is set to 5. And so on up the tree.
** The value of PmaReader 6 - "Durian" - is now smaller than that of PmaReader
** 5, so aTree[3] is set to 6. Key 0 is smaller than key 6 (Banana<Durian),
** so the value written into element 1 of the array is 0. As follows:
**
**     aTree[] = { X, 0   0, 6    0, 3, 5, 6 }
**
** In other words, each time we advance to the next sorter element, log2(N)
** key comparison operations are required, where N is the number of segments
** being merged (rounded up to the next power of 2).
*/
struct MergeEngine {
  int nTree;                 /* Used size of aTree/aReadr (power of 2) */
  SortSubtask *pTask;        /* Used by this thread only */
  int *aTree;                /* Current state of incremental merge */
  PmaReader *aReadr;         /* Array of PmaReaders to merge data from */
};

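/*
** Illustrative sketch, not part of SQLite. It rebuilds the aTree[] layout
** described in the MergeEngine comment, using plain C strings in place of
** PMA keys and a NULL pointer in place of EOF. The names DEMO_N,
** demo_smaller(), demo_tree_slot(), demo_tree_build() and
** demo_tree_advance() are hypothetical. The real code compares serialized
** records through PmaReader objects, which this sketch does not attempt.
*/

```c
#include <assert.h>
#include <string.h>

#define DEMO_N 8   /* Number of readers; must be a power of 2 */

/* Return whichever of reader indexes i and j holds the smaller key.
** A NULL key means EOF, which compares greater than any real key. */
static int demo_smaller(const char **aKey, int i, int j){
  if( aKey[i]==0 ) return j;
  if( aKey[j]==0 ) return i;
  return strcmp(aKey[i], aKey[j])<=0 ? i : j;
}

/* Recompute aTree[iOut]. Slots in the final half of aTree[] compare a fixed
** pair of readers; every other slot compares the winners of its two
** children. */
static void demo_tree_slot(const char **aKey, int *aTree, int iOut){
  int i1, i2;
  if( iOut>=DEMO_N/2 ){
    i1 = 2*iOut - DEMO_N;
    i2 = i1 + 1;
  }else{
    i1 = aTree[iOut*2];
    i2 = aTree[iOut*2+1];
  }
  aTree[iOut] = demo_smaller(aKey, i1, i2);
}

/* Populate the whole tree, from the last slot up to the root at aTree[1]. */
static void demo_tree_build(const char **aKey, int *aTree){
  int i;
  aTree[0] = -1;                       /* aTree[0] is unused */
  for(i=DEMO_N-1; i>0; i--) demo_tree_slot(aKey, aTree, i);
}

/* After reader iReader has moved to a new key, redo only the log2(DEMO_N)
** comparisons on the path from that reader up to the root. */
static void demo_tree_advance(const char **aKey, int *aTree, int iReader){
  int i;
  assert( iReader>=0 && iReader<DEMO_N );
  for(i=(DEMO_N+iReader)/2; i>0; i=i/2) demo_tree_slot(aKey, aTree, i);
}
```

/*
** Building the tree over the eight fruit keys from the example leaves 5 in
** aTree[1]. Replacing key 5 with "Eggplant" and calling
** demo_tree_advance(aKey, aTree, 5) touches only slots 6, 3 and 1, and
** leaves 0 in aTree[1], matching the second aTree[] listing in the comment.
*/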
/*
** This object represents a single thread of control in a sort operation.
** Exactly VdbeSorter.nTask instances of this object are allocated
** as part of each VdbeSorter object. Instances are never allocated any
** other way. VdbeSorter.nTask is set to the number of worker threads allowed
** (see SQLITE_CONFIG_WORKER_THREADS) plus one (the main thread). Thus for
** single-threaded operation, there is exactly one instance of this object
** and for multi-threaded operation there are two or more instances.
**
** Essentially, this structure contains all those fields of the VdbeSorter
** structure for which each thread requires a separate instance. For example,
** each thread requires its own UnpackedRecord object to unpack records
** as part of comparison operations.
**
** Before a background thread is launched, variable bDone is set to 0. Then,
** right before it exits, the thread itself sets bDone to 1. This is used for
** two purposes:
**
**   1. When flushing the contents of memory to a level-0 PMA on disk, to
**      attempt to select a SortSubtask for which there is not already an
**      active background thread (since doing so causes the main thread
**      to block until it finishes).
**
**   2. If SQLITE_DEBUG_SORTER_THREADS is defined, to determine if a call
**      to sqlite3ThreadJoin() is likely to block. Cases that are likely to
**      block provoke debugging output.
**
** In both cases, the effects of the main thread seeing (bDone==0) even
** after the thread has finished are not dire. So we don't worry about
** memory barriers and such here.
*/
typedef int (*SorterCompare)(SortSubtask*,int*,const void*,int,const void*,int);
struct SortSubtask {
  SQLiteThread *pThread;          /* Background thread, if any */
  int bDone;                      /* Set if thread is finished but not joined */
  int nPMA;                       /* Number of PMAs currently in file */
  VdbeSorter *pSorter;            /* Sorter that owns this sub-task */
  UnpackedRecord *pUnpacked;      /* Space to unpack a record */
  SorterList list;                /* List for thread to write to a PMA */
  SorterCompare xCompare;         /* Compare function to use */
  SorterFile file;                /* Temp file for level-0 PMAs */
  SorterFile file2;               /* Space for other PMAs */
};


/*
** Main sorter structure. A single instance of this is allocated for each
** sorter cursor created by the VDBE.
**
** mxKeysize:
**   As records are added to the sorter by calls to sqlite3VdbeSorterWrite(),
**   this variable is updated so as to be set to the size on disk of the
**   largest record in the sorter.
*/
struct VdbeSorter {
  int mnPmaSize;                  /* Minimum PMA size, in bytes */
  int mxPmaSize;                  /* Maximum PMA size, in bytes. 0==no limit */
  int mxKeysize;                  /* Largest serialized key seen so far */
  int pgsz;                       /* Main database page size */
  PmaReader *pReader;             /* Read data from here after Rewind() */
  MergeEngine *pMerger;           /* Or here, if bUseThreads==0 */
  sqlite3 *db;                    /* Database connection */
  KeyInfo *pKeyInfo;              /* How to compare records */
  UnpackedRecord *pUnpacked;      /* Used by VdbeSorterCompare() */
  SorterList list;                /* List of in-memory records */
  int iMemory;                    /* Offset of free space in list.aMemory */
  int nMemory;                    /* Size of list.aMemory allocation in bytes */
  u8 bUsePMA;                     /* True if one or more PMAs created */
  u8 bUseThreads;                 /* True to use background threads */
  u8 iPrev;                       /* Previous thread used to flush PMA */
  u8 nTask;                       /* Size of aTask[] array */
  u8 typeMask;                    /* Mask of SORTER_TYPE_* flags */
  SortSubtask aTask[FLEXARRAY];   /* One or more subtasks */
};

/* Size (in bytes) of a VdbeSorter object that works with N or fewer subtasks */
#define SZ_VDBESORTER(N) (offsetof(VdbeSorter,aTask)+(N)*sizeof(SortSubtask))

#define SORTER_TYPE_INTEGER 0x01
#define SORTER_TYPE_TEXT    0x02

/*
** An instance of the following object is used to read records out of a
** PMA, in sorted order. The next key to be read is cached in nKey/aKey.
** aKey might point into aMap or into aBuffer. If neither of those locations
** contain a contiguous representation of the key, then aAlloc is allocated
** and the key is copied into aAlloc and aKey is made to point to aAlloc.
**
** pFd==0 at EOF.
*/
struct PmaReader {
  i64 iReadOff;               /* Current read offset */
  i64 iEof;                   /* 1 byte past EOF for this PmaReader */
  int nAlloc;                 /* Bytes of space at aAlloc */
  int nKey;                   /* Number of bytes in key */
  sqlite3_file *pFd;          /* File handle we are reading from */
  u8 *aAlloc;                 /* Space for aKey if aBuffer and aMap won't work */
  u8 *aKey;                   /* Pointer to current key */
  u8 *aBuffer;                /* Current read buffer */
  int nBuffer;                /* Size of read buffer in bytes */
  u8 *aMap;                   /* Pointer to mapping of entire file */
  IncrMerger *pIncr;          /* Incremental merger */
};

/*
** Normally, a PmaReader object iterates through an existing PMA stored
** within a temp file. However, if the PmaReader.pIncr variable points to
** an object of the following type, it may be used to iterate/merge through
** multiple PMAs simultaneously.
**
** There are two types of IncrMerger object - single (bUseThread==0) and
** multi-threaded (bUseThread==1).
**
** A multi-threaded IncrMerger object uses two temporary files - aFile[0]
** and aFile[1]. Neither file is allowed to grow to more than mxSz bytes in
** size. When the IncrMerger is initialized, it reads enough data from
** pMerger to populate aFile[0]. It then sets variables within the
** corresponding PmaReader object to read from that file and kicks off
** a background thread to populate aFile[1] with the next mxSz bytes of
** sorted record data from pMerger.
**
** When the PmaReader reaches the end of aFile[0], it blocks until the
** background thread has finished populating aFile[1]. It then exchanges
** the contents of the aFile[0] and aFile[1] variables within this structure,
** sets the PmaReader fields to read from the new aFile[0] and kicks off
** another background thread to populate the new aFile[1]. And so on, until
** the contents of pMerger are exhausted.
**
** A single-threaded IncrMerger does not open any temporary files of its
** own. Instead, it has exclusive access to mxSz bytes of space beginning
** at offset iStartOff of file pTask->file2. And instead of using a
** background thread to prepare data for the PmaReader, with a
** single-threaded IncrMerger the allocated part of pTask->file2 is
** "refilled" with keys from pMerger by the calling thread whenever the
** PmaReader runs out of data.
*/
struct IncrMerger {
  SortSubtask *pTask;             /* Task that owns this merger */
  MergeEngine *pMerger;           /* Merge engine thread reads data from */
  i64 iStartOff;                  /* Offset to start writing file at */
  int mxSz;                       /* Maximum bytes of data to store */
  int bEof;                       /* Set to true when merge is finished */
  int bUseThread;                 /* True to use a bg thread for this object */
  SorterFile aFile[2];            /* aFile[0] for reading, [1] for writing */
};

/*
** An instance of this object is used for writing a PMA.
**
** The PMA is written one record at a time. Each record is of an arbitrary
** size. But I/O is more efficient if it occurs in page-sized blocks where
** each block is aligned on a page boundary. This object caches writes to
** the PMA so that aligned, page-size blocks are written.
*/
struct PmaWriter {
  int eFWErr;                     /* Non-zero if in an error state */
  u8 *aBuffer;                    /* Pointer to write buffer */
  int nBuffer;                    /* Size of write buffer in bytes */
  int iBufStart;                  /* First byte of buffer to write */
  int iBufEnd;                    /* Last byte of buffer to write */
  i64 iWriteOff;                  /* Offset of start of buffer in file */
  sqlite3_file *pFd;              /* File handle to write to */
};

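/*
** Illustrative sketch, not part of SQLite. It shows one way the buffering
** described for PmaWriter can produce page-aligned writes: records of
** arbitrary size are copied into a one-page buffer, and the buffer is
** written out whenever it fills. To stay self-contained, an in-memory byte
** array stands in for the temp file, and the page size is a tiny 8 bytes.
** DemoWriter, DEMO_PGSZ and the demo_writer_*() functions are hypothetical
** names, and here iBufEnd is treated as one past the last buffered byte.
*/

```c
#include <assert.h>
#include <string.h>

#define DEMO_PGSZ 8                 /* Hypothetical page size, in bytes */

typedef struct DemoWriter DemoWriter;
struct DemoWriter {
  unsigned char aBuffer[DEMO_PGSZ]; /* One page of buffered data */
  int iBufStart;                    /* First byte of buffer to write */
  int iBufEnd;                      /* One past the last byte to write */
  long iWriteOff;                   /* Offset of start of buffer in file */
  unsigned char *aFile;             /* In-memory stand-in for the temp file */
  int nWrite;                       /* Number of write operations issued */
};

/* Begin writing at file offset iStart, which need not be page aligned. */
static void demo_writer_init(DemoWriter *p, unsigned char *aFile, long iStart){
  memset(p, 0, sizeof(*p));
  p->aFile = aFile;
  p->iBufStart = p->iBufEnd = (int)(iStart % DEMO_PGSZ);
  p->iWriteOff = iStart - p->iBufStart;
}

/* Copy the buffered bytes, if any, to their place in the file. */
static void demo_writer_flush(DemoWriter *p){
  if( p->iBufEnd>p->iBufStart ){
    memcpy(&p->aFile[p->iWriteOff + p->iBufStart],
           &p->aBuffer[p->iBufStart], (size_t)(p->iBufEnd - p->iBufStart));
    p->nWrite++;
  }
}

/* Append nData bytes. A write is issued only when the page buffer fills,
** so every write after the first starts on a page boundary. */
static void demo_writer_blob(DemoWriter *p, const void *pData, int nData){
  const unsigned char *a = (const unsigned char*)pData;
  int nRem = nData;
  while( nRem>0 ){
    int nCopy = nRem;
    if( nCopy>(DEMO_PGSZ - p->iBufEnd) ) nCopy = DEMO_PGSZ - p->iBufEnd;
    memcpy(&p->aBuffer[p->iBufEnd], &a[nData-nRem], (size_t)nCopy);
    p->iBufEnd += nCopy;
    if( p->iBufEnd==DEMO_PGSZ ){
      demo_writer_flush(p);
      p->iBufStart = p->iBufEnd = 0;
      p->iWriteOff += DEMO_PGSZ;
    }
    assert( p->iBufEnd<DEMO_PGSZ );
    nRem -= nCopy;
  }
}

/* Flush the final partial page and return the new end-of-file offset. */
static long demo_writer_finish(DemoWriter *p){
  demo_writer_flush(p);
  return p->iWriteOff + p->iBufEnd;
}
```

/*
** Starting at offset 3 and appending a 10-byte blob issues two writes: five
** bytes at offset 3 to complete the first page, then five bytes at the
** page-aligned offset 8. demo_writer_finish() then returns 13.
*/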
/*
** This object is the header on a single record while that record is being
** held in memory and prior to being written out as part of a PMA.
**
** How the linked list is connected depends on how memory is being managed
** by this module. If using a separate allocation for each in-memory record
** (VdbeSorter.list.aMemory==0), then the list is always connected using the
** SorterRecord.u.pNext pointers.
**
** Or, if using the single large allocation method (VdbeSorter.list.aMemory!=0),
** then while records are being accumulated the list is linked using the
** SorterRecord.u.iNext offset. This is because the aMemory[] array may
** be sqlite3Realloc()ed while records are being accumulated. Once the VM
** has finished passing records to the sorter, or when the in-memory buffer
** is full, the list is sorted. As part of the sorting process, it is
** converted to use the SorterRecord.u.pNext pointers. See function
** vdbeSorterSort() for details.
*/
struct SorterRecord {
  int nVal;                       /* Size of the record in bytes */
  union {
    SorterRecord *pNext;          /* Pointer to next record in list */
    int iNext;                    /* Offset within aMemory of next record */
  } u;
  /* The data for the record immediately follows this header */
};

/* Return a pointer to the buffer containing the record data for SorterRecord
** object p. Should be used as if:
**
**   void *SRVAL(SorterRecord *p) { return (void*)&p[1]; }
*/
#define SRVAL(p) ((void*)((SorterRecord*)(p) + 1))


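/*
** Illustrative sketch, not part of SQLite. It shows the "header followed
** immediately by payload" layout that SRVAL() relies on, using the
** separate-allocation style (one malloc() per record). DemoRecord,
** DEMO_SRVAL() and demo_record_new() are hypothetical stand-ins so that the
** example compiles on its own; the real module allocates through SQLite's
** own memory routines.
*/

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct DemoRecord DemoRecord;
struct DemoRecord {
  int nVal;                       /* Size of the payload in bytes */
  union {
    DemoRecord *pNext;            /* Pointer to next record in list */
    int iNext;                    /* Offset of next record in bulk memory */
  } u;
  /* The payload immediately follows this header */
};

/* Same pointer arithmetic as SRVAL(): step over exactly one header. */
#define DEMO_SRVAL(p) ((void*)((DemoRecord*)(p) + 1))

/* Allocate a header plus nVal payload bytes in a single block and copy the
** payload in. Returns NULL if the allocation fails. The caller releases
** the record with free(). */
static DemoRecord *demo_record_new(const void *pVal, int nVal){
  DemoRecord *p;
  assert( nVal>=0 );
  p = malloc(sizeof(DemoRecord) + (size_t)nVal);
  if( p==0 ) return 0;
  p->nVal = nVal;
  p->u.pNext = 0;
  if( nVal>0 ) memcpy(DEMO_SRVAL(p), pVal, (size_t)nVal);
  return p;
}
```

/*
** For a record created with demo_record_new("key", 3), DEMO_SRVAL(p) points
** sizeof(DemoRecord) bytes past p, at the three payload bytes.
*/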
/* Maximum number of PMAs that a single MergeEngine can merge */
#define SORTER_MAX_MERGE_COUNT 16

static int vdbeIncrSwap(IncrMerger*);
static void vdbeIncrFree(IncrMerger *);

/*
** Free all memory belonging to the PmaReader object passed as the
** argument. All structure fields are set to zero before returning.
*/
static void vdbePmaReaderClear(PmaReader *pReadr){
  sqlite3_free(pReadr->aAlloc);
  sqlite3_free(pReadr->aBuffer);
  if( pReadr->aMap ) sqlite3OsUnfetch(pReadr->pFd, 0, pReadr->aMap);
  vdbeIncrFree(pReadr->pIncr);
  memset(pReadr, 0, sizeof(PmaReader));
}

480/*
drhac651962014-07-28 14:54:50481** Read the next nByte bytes of data from the PMA p.
dan3b2c9b32012-07-23 19:25:39482** If successful, set *ppOut to point to a buffer containing the data
483** and return SQLITE_OK. Otherwise, if an error occurs, return an SQLite
484** error code.
485**
drhac651962014-07-28 14:54:50486** The buffer returned in *ppOut is only valid until the
dan3b2c9b32012-07-23 19:25:39487** next call to this function.
488*/
drha634fb12014-04-03 02:54:27489static int vdbePmaReadBlob(
drhde823be2014-05-20 11:03:53490 PmaReader *p, /* PmaReader from which to take the blob */
dan3b2c9b32012-07-23 19:25:39491 int nByte, /* Bytes of data to read */
492 u8 **ppOut /* OUT: Pointer to buffer containing data */
493){
dan9d0c0ea2012-07-26 09:21:14494 int iBuf; /* Offset within buffer to read from */
495 int nAvail; /* Bytes of data available in buffer */
danface0872014-03-27 17:23:41496
497 if( p->aMap ){
498 *ppOut = &p->aMap[p->iReadOff];
499 p->iReadOff += nByte;
500 return SQLITE_OK;
501 }
502
dan3b2c9b32012-07-23 19:25:39503 assert( p->aBuffer );
504
larrybrbc917382023-06-07 08:40:31505 /* If there is no more data to be read from the buffer, read the next
dan9d0c0ea2012-07-26 09:21:14506 ** p->nBuffer bytes of data from the file into it. Or, if there are less
507 ** than p->nBuffer bytes remaining in the PMA, read all remaining data. */
dan3b2c9b32012-07-23 19:25:39508 iBuf = p->iReadOff % p->nBuffer;
509 if( iBuf==0 ){
dan9d0c0ea2012-07-26 09:21:14510 int nRead; /* Bytes to read from disk */
511 int rc; /* sqlite3OsRead() return code */
dan3b2c9b32012-07-23 19:25:39512
dan9d0c0ea2012-07-26 09:21:14513 /* Determine how many bytes of data to read. */
dand4e97e82012-10-26 19:22:45514 if( (p->iEof - p->iReadOff) > (i64)p->nBuffer ){
515 nRead = p->nBuffer;
516 }else{
517 nRead = (int)(p->iEof - p->iReadOff);
518 }
dan3b2c9b32012-07-23 19:25:39519 assert( nRead>0 );
dan9d0c0ea2012-07-26 09:21:14520
drhde823be2014-05-20 11:03:53521 /* Readr data from the file. Return early if an error occurs. */
drha4c8ca02014-07-28 17:18:28522 rc = sqlite3OsRead(p->pFd, p->aBuffer, nRead, p->iReadOff);
dan3b2c9b32012-07-23 19:25:39523 assert( rc!=SQLITE_IOERR_SHORT_READ );
524 if( rc!=SQLITE_OK ) return rc;
525 }
larrybrbc917382023-06-07 08:40:31526 nAvail = p->nBuffer - iBuf;
dan3b2c9b32012-07-23 19:25:39527
528 if( nByte<=nAvail ){
dan9d0c0ea2012-07-26 09:21:14529 /* The requested data is available in the in-memory buffer. In this
larrybrbc917382023-06-07 08:40:31530 ** case there is no need to make a copy of the data, just return a
dan9d0c0ea2012-07-26 09:21:14531 ** pointer into the buffer to the caller. */
dan3b2c9b32012-07-23 19:25:39532 *ppOut = &p->aBuffer[iBuf];
533 p->iReadOff += nByte;
534 }else{
dan9d0c0ea2012-07-26 09:21:14535 /* The requested data is not all available in the in-memory buffer.
536 ** In this case, allocate space at p->aAlloc[] to copy the requested
537 ** range into. Then return a copy of pointer p->aAlloc to the caller. */
538 int nRem; /* Bytes remaining to copy */
539
540 /* Extend the p->aAlloc[] allocation if required. */
dan3b2c9b32012-07-23 19:25:39541 if( p->nAlloc<nByte ){
danf8768412014-03-17 15:43:05542 u8 *aNew;
drh0aa32312019-04-13 04:01:12543 sqlite3_int64 nNew = MAX(128, 2*(sqlite3_int64)p->nAlloc);
dan3b2c9b32012-07-23 19:25:39544 while( nByte>nNew ) nNew = nNew*2;
danf8768412014-03-17 15:43:05545 aNew = sqlite3Realloc(p->aAlloc, nNew);
mistachkinfad30392016-02-13 23:43:46546 if( !aNew ) return SQLITE_NOMEM_BKPT;
dan09ac7ec2012-08-06 19:28:20547 p->nAlloc = nNew;
danf8768412014-03-17 15:43:05548 p->aAlloc = aNew;
dan3b2c9b32012-07-23 19:25:39549 }
550
dan9d0c0ea2012-07-26 09:21:14551 /* Copy as much data as is available in the buffer into the start of
552 ** p->aAlloc[]. */
dan3b2c9b32012-07-23 19:25:39553 memcpy(p->aAlloc, &p->aBuffer[iBuf], nAvail);
554 p->iReadOff += nAvail;
555 nRem = nByte - nAvail;
dan9d0c0ea2012-07-26 09:21:14556
557 /* The following loop copies up to p->nBuffer bytes per iteration into
558 ** the p->aAlloc[] buffer. */
dan3b2c9b32012-07-23 19:25:39559 while( nRem>0 ){
drha634fb12014-04-03 02:54:27560 int rc; /* vdbePmaReadBlob() return code */
dan9d0c0ea2012-07-26 09:21:14561 int nCopy; /* Number of bytes to copy */
drh92d317f2024-08-07 14:54:54562 u8 *aNext = 0; /* Pointer to buffer to copy data from */
dan3b2c9b32012-07-23 19:25:39563
564 nCopy = nRem;
565 if( nRem>p->nBuffer ) nCopy = p->nBuffer;
drha634fb12014-04-03 02:54:27566 rc = vdbePmaReadBlob(p, nCopy, &aNext);
dan3b2c9b32012-07-23 19:25:39567 if( rc!=SQLITE_OK ) return rc;
568 assert( aNext!=p->aAlloc );
drhc76520c2024-08-07 15:17:37569 assert( aNext!=0 );
dan3b2c9b32012-07-23 19:25:39570 memcpy(&p->aAlloc[nByte - nRem], aNext, nCopy);
571 nRem -= nCopy;
572 }
573
574 *ppOut = p->aAlloc;
575 }
576
577 return SQLITE_OK;
578}
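/*
** Illustrative sketch only, not part of SQLite. The allocation-growth
** policy used above for p->aAlloc[] can be isolated as a small pure
** function: start from at least 128 bytes or twice the current size,
** then keep doubling until the request fits. The name sketchGrowAlloc()
** and its parameters are hypothetical and exist only for this example.
*/

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return the new allocation size for a buffer that
** currently holds nAlloc bytes and must hold at least nByte bytes. */
static int64_t sketchGrowAlloc(int64_t nAlloc, int64_t nByte){
  int64_t nNew = 2*nAlloc;
  if( nNew<128 ) nNew = 128;            /* Same floor as MAX(128, 2*nAlloc) */
  while( nByte>nNew ) nNew = nNew*2;    /* Double until the request fits */
  return nNew;
}
```

/*
** Doubling keeps the number of reallocations logarithmic in the size of
** the largest record that has to be copied out of the page buffer.
*/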
579
580/*
581** Read a varint from the stream of data accessed by p. Set *pnOut to
582** the value read.
583*/
drha634fb12014-04-03 02:54:27584static int vdbePmaReadVarint(PmaReader *p, u64 *pnOut){
dan3b2c9b32012-07-23 19:25:39585 int iBuf;
586
danface0872014-03-27 17:23:41587 if( p->aMap ){
588 p->iReadOff += sqlite3GetVarint(&p->aMap[p->iReadOff], pnOut);
dan3b2c9b32012-07-23 19:25:39589 }else{
danface0872014-03-27 17:23:41590 iBuf = p->iReadOff % p->nBuffer;
591 if( iBuf && (p->nBuffer-iBuf)>=9 ){
592 p->iReadOff += sqlite3GetVarint(&p->aBuffer[iBuf], pnOut);
593 }else{
594 u8 aVarint[16], *a;
595 int i = 0, rc;
596 do{
drha634fb12014-04-03 02:54:27597 rc = vdbePmaReadBlob(p, 1, &a);
danface0872014-03-27 17:23:41598 if( rc ) return rc;
599 aVarint[(i++)&0xf] = a[0];
600 }while( (a[0]&0x80)!=0 );
601 sqlite3GetVarint(aVarint, pnOut);
602 }
dan3b2c9b32012-07-23 19:25:39603 }
604
605 return SQLITE_OK;
606}
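/*
** Illustrative sketch only, not part of SQLite. The slow path above
** gathers bytes one at a time until it sees a byte with the high bit
** clear, then decodes them. A minimal decoder, assuming the usual SQLite
** varint layout (big-endian, seven payload bits in each of the first
** eight bytes, all eight bits of a ninth byte), might look like this.
** sketchGetVarint() is a hypothetical stand-in, not sqlite3GetVarint().
*/

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decode a varint of 1 to 9 bytes starting at z[0].
** Store the value in *pOut and return the number of bytes consumed. */
static int sketchGetVarint(const unsigned char *z, uint64_t *pOut){
  uint64_t v = 0;
  int i;
  for(i=0; i<8; i++){
    v = (v<<7) | (uint64_t)(z[i] & 0x7f);
    if( (z[i] & 0x80)==0 ){
      *pOut = v;
      return i+1;
    }
  }
  /* Ninth byte: all eight bits are payload */
  v = (v<<8) | (uint64_t)z[8];
  *pOut = v;
  return 9;
}
```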
607
dan1a088a82014-04-15 19:52:34608/*
609** Attempt to memory map file pFile. If successful, set *pp to point to the
larrybrbc917382023-06-07 08:40:31610** new mapping and return SQLITE_OK. If the mapping is not attempted
dan1a088a82014-04-15 19:52:34611** (because the file is too large or the VFS layer is configured not to use
612** mmap), return SQLITE_OK and set *pp to NULL.
613**
614** Or, if an error occurs, return an SQLite error code. The final value of
615** *pp is undefined in this case.
616*/
dand30ab3d2014-04-09 20:04:17617static int vdbeSorterMapFile(SortSubtask *pTask, SorterFile *pFile, u8 **pp){
618 int rc = SQLITE_OK;
dan1a088a82014-04-15 19:52:34619 if( pFile->iEof<=(i64)(pTask->pSorter->db->nMaxSorterMmap) ){
daned7bcba2014-09-15 16:50:34620 sqlite3_file *pFd = pFile->pFd;
621 if( pFd->pMethods->iVersion>=3 ){
622 rc = sqlite3OsFetch(pFd, 0, (int)pFile->iEof, (void**)pp);
623 testcase( rc!=SQLITE_OK );
624 }
dand30ab3d2014-04-09 20:04:17625 }
626 return rc;
627}
dan3b2c9b32012-07-23 19:25:39628
629/*
drh8a4865f2014-07-28 18:57:40630** Attach PmaReader pReadr to file pFile (if it is not already attached to
larrybrbc917382023-06-07 08:40:31631** that file) and seek it to offset iOff within the file. Return SQLITE_OK
dan1a088a82014-04-15 19:52:34632** if successful, or an SQLite error code if an error occurs.
dana20fde62011-07-12 14:28:05633*/
dan1a088a82014-04-15 19:52:34634static int vdbePmaReaderSeek(
635 SortSubtask *pTask, /* Task context */
drhac651962014-07-28 14:54:50636 PmaReader *pReadr, /* Reader whose cursor is to be moved */
dan1a088a82014-04-15 19:52:34637 SorterFile *pFile, /* Sorter file to read from */
638 i64 iOff /* Offset in pFile */
danc6e73452011-08-04 12:14:04639){
dand30ab3d2014-04-09 20:04:17640 int rc = SQLITE_OK;
danc6e73452011-08-04 12:14:04641
drhde823be2014-05-20 11:03:53642 assert( pReadr->pIncr==0 || pReadr->pIncr->bEof==0 );
dand30ab3d2014-04-09 20:04:17643
drhc0fea3c2014-07-30 18:47:12644 if( sqlite3FaultSim(201) ) return SQLITE_IOERR_READ;
drhde823be2014-05-20 11:03:53645 if( pReadr->aMap ){
drha4c8ca02014-07-28 17:18:28646 sqlite3OsUnfetch(pReadr->pFd, 0, pReadr->aMap);
drhde823be2014-05-20 11:03:53647 pReadr->aMap = 0;
dana20fde62011-07-12 14:28:05648 }
drhde823be2014-05-20 11:03:53649 pReadr->iReadOff = iOff;
650 pReadr->iEof = pFile->iEof;
drha4c8ca02014-07-28 17:18:28651 pReadr->pFd = pFile->pFd;
danc6e73452011-08-04 12:14:04652
drhde823be2014-05-20 11:03:53653 rc = vdbeSorterMapFile(pTask, pFile, &pReadr->aMap);
654 if( rc==SQLITE_OK && pReadr->aMap==0 ){
dan1a088a82014-04-15 19:52:34655 int pgsz = pTask->pSorter->pgsz;
drhde823be2014-05-20 11:03:53656 int iBuf = pReadr->iReadOff % pgsz;
657 if( pReadr->aBuffer==0 ){
658 pReadr->aBuffer = (u8*)sqlite3Malloc(pgsz);
mistachkinfad30392016-02-13 23:43:46659 if( pReadr->aBuffer==0 ) rc = SQLITE_NOMEM_BKPT;
drhde823be2014-05-20 11:03:53660 pReadr->nBuffer = pgsz;
dan1a088a82014-04-15 19:52:34661 }
dan22ace892014-04-15 20:52:27662 if( rc==SQLITE_OK && iBuf ){
dan1a088a82014-04-15 19:52:34663 int nRead = pgsz - iBuf;
drhde823be2014-05-20 11:03:53664 if( (pReadr->iReadOff + nRead) > pReadr->iEof ){
665 nRead = (int)(pReadr->iEof - pReadr->iReadOff);
dan4be4c402014-04-11 19:43:07666 }
dan1a088a82014-04-15 19:52:34667 rc = sqlite3OsRead(
drha4c8ca02014-07-28 17:18:28668 pReadr->pFd, &pReadr->aBuffer[iBuf], nRead, pReadr->iReadOff
dan1a088a82014-04-15 19:52:34669 );
drhac651962014-07-28 14:54:50670 testcase( rc!=SQLITE_OK );
dand30ab3d2014-04-09 20:04:17671 }
dan1e74e602011-08-06 12:01:58672 }
673
674 return rc;
675}
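/*
** Illustrative sketch only, not part of SQLite. When no memory mapping
** is available, the seek above pre-loads the tail of the page that
** contains the target offset, clamped so that it never reads past EOF.
** The size computation can be written as a pure function. The name
** sketchFirstReadSize() and its parameters are hypothetical.
*/

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes to pre-load when seeking to
** iReadOff in a file of iEof bytes using a buffer of pgsz bytes.
** Returns 0 when iReadOff is page-aligned, since nothing is pre-loaded
** in that case. */
static int sketchFirstReadSize(int64_t iReadOff, int64_t iEof, int pgsz){
  int iBuf = (int)(iReadOff % pgsz);
  int nRead;
  if( iBuf==0 ) return 0;
  nRead = pgsz - iBuf;                      /* Rest of the current page */
  if( iReadOff+nRead > iEof ){
    nRead = (int)(iEof - iReadOff);         /* Clamp to end of file */
  }
  return nRead;
}
```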
676
dana20fde62011-07-12 14:28:05677/*
drhde823be2014-05-20 11:03:53678** Advance PmaReader pReadr to the next key in its PMA. Return SQLITE_OK if
dana20fde62011-07-12 14:28:05679** no error occurs, or an SQLite error code if one does.
dana20fde62011-07-12 14:28:05680*/
drhde823be2014-05-20 11:03:53681static int vdbePmaReaderNext(PmaReader *pReadr){
dand30ab3d2014-04-09 20:04:17682 int rc = SQLITE_OK; /* Return Code */
dana20fde62011-07-12 14:28:05683 u64 nRec = 0; /* Size of record in bytes */
dan3b2c9b32012-07-23 19:25:39684
dan1e74e602011-08-06 12:01:58685
drhde823be2014-05-20 11:03:53686 if( pReadr->iReadOff>=pReadr->iEof ){
687 IncrMerger *pIncr = pReadr->pIncr;
dand30ab3d2014-04-09 20:04:17688 int bEof = 1;
dan1a088a82014-04-15 19:52:34689 if( pIncr ){
690 rc = vdbeIncrSwap(pIncr);
691 if( rc==SQLITE_OK && pIncr->bEof==0 ){
692 rc = vdbePmaReaderSeek(
drhde823be2014-05-20 11:03:53693 pIncr->pTask, pReadr, &pIncr->aFile[0], pIncr->iStartOff
dan1a088a82014-04-15 19:52:34694 );
dand30ab3d2014-04-09 20:04:17695 bEof = 0;
dan407fae02012-07-23 20:10:35696 }
dan3b2c9b32012-07-23 19:25:39697 }
698
dand30ab3d2014-04-09 20:04:17699 if( bEof ){
700 /* This is an EOF condition */
drhde823be2014-05-20 11:03:53701 vdbePmaReaderClear(pReadr);
drhac651962014-07-28 14:54:50702 testcase( rc!=SQLITE_OK );
dand30ab3d2014-04-09 20:04:17703 return rc;
dan3b2c9b32012-07-23 19:25:39704 }
dan1e74e602011-08-06 12:01:58705 }
dan3b2c9b32012-07-23 19:25:39706
dan1e74e602011-08-06 12:01:58707 if( rc==SQLITE_OK ){
drhde823be2014-05-20 11:03:53708 rc = vdbePmaReadVarint(pReadr, &nRec);
dand30ab3d2014-04-09 20:04:17709 }
dana20fde62011-07-12 14:28:05710 if( rc==SQLITE_OK ){
drhde823be2014-05-20 11:03:53711 pReadr->nKey = (int)nRec;
712 rc = vdbePmaReadBlob(pReadr, (int)nRec, &pReadr->aKey);
drhac651962014-07-28 14:54:50713 testcase( rc!=SQLITE_OK );
dana20fde62011-07-12 14:28:05714 }
715
716 return rc;
717}
718
719/*
drhde823be2014-05-20 11:03:53720** Initialize PmaReader pReadr to scan through the PMA stored in file pFile
larrybrbc917382023-06-07 08:40:31721** starting at offset iStart and ending at offset iEof-1. This function
722** leaves the PmaReader pointing to the first key in the PMA (or EOF if the
dana20fde62011-07-12 14:28:05723** PMA is empty).
dand30ab3d2014-04-09 20:04:17724**
larrybrbc917382023-06-07 08:40:31725** If the pnByte parameter is NULL, then it is assumed that the file
dand30ab3d2014-04-09 20:04:17726** contains a single PMA, and that that PMA omits the initial length varint.
dana20fde62011-07-12 14:28:05727*/
drha634fb12014-04-03 02:54:27728static int vdbePmaReaderInit(
dand30ab3d2014-04-09 20:04:17729 SortSubtask *pTask, /* Task context */
730 SorterFile *pFile, /* Sorter file to read from */
731 i64 iStart, /* Start offset in pFile */
drhde823be2014-05-20 11:03:53732 PmaReader *pReadr, /* PmaReader to populate */
dana20fde62011-07-12 14:28:05733 i64 *pnByte /* IN/OUT: Increment this value by PMA size */
734){
dan1a088a82014-04-15 19:52:34735 int rc;
dana20fde62011-07-12 14:28:05736
dand30ab3d2014-04-09 20:04:17737 assert( pFile->iEof>iStart );
drhde823be2014-05-20 11:03:53738 assert( pReadr->aAlloc==0 && pReadr->nAlloc==0 );
739 assert( pReadr->aBuffer==0 );
740 assert( pReadr->aMap==0 );
dana20fde62011-07-12 14:28:05741
drhde823be2014-05-20 11:03:53742 rc = vdbePmaReaderSeek(pTask, pReadr, pFile, iStart);
danface0872014-03-27 17:23:41743 if( rc==SQLITE_OK ){
drhd1dd7502016-01-12 14:10:05744 u64 nByte = 0; /* Size of PMA in bytes */
drhde823be2014-05-20 11:03:53745 rc = vdbePmaReadVarint(pReadr, &nByte);
746 pReadr->iEof = pReadr->iReadOff + nByte;
danface0872014-03-27 17:23:41747 *pnByte += nByte;
dan1e74e602011-08-06 12:01:58748 }
749
750 if( rc==SQLITE_OK ){
drhde823be2014-05-20 11:03:53751 rc = vdbePmaReaderNext(pReadr);
dan1e74e602011-08-06 12:01:58752 }
753 return rc;
dana20fde62011-07-12 14:28:05754}
755
dan5134d132011-09-02 10:31:11756/*
dan7004f3f2015-03-30 12:06:26757** A version of vdbeSorterCompare() that assumes that it has already been
larrybrbc917382023-06-07 08:40:31758** determined that the first field of key1 is equal to the first field of
dan7004f3f2015-03-30 12:06:26759** key2.
760*/
761static int vdbeSorterCompareTail(
762 SortSubtask *pTask, /* Subtask context (for pKeyInfo) */
763 int *pbKey2Cached, /* True if pTask->pUnpacked is pKey2 */
764 const void *pKey1, int nKey1, /* Left side of comparison */
765 const void *pKey2, int nKey2 /* Right side of comparison */
766){
767 UnpackedRecord *r2 = pTask->pUnpacked;
768 if( *pbKey2Cached==0 ){
drh8658a8d2025-06-02 13:54:33769 sqlite3VdbeRecordUnpack(nKey2, pKey2, r2);
dan7004f3f2015-03-30 12:06:26770 *pbKey2Cached = 1;
771 }
772 return sqlite3VdbeRecordCompareWithSkip(nKey1, pKey1, r2, 1);
773}
dan5134d132011-09-02 10:31:11774
775/*
larrybrbc917382023-06-07 08:40:31776** Compare key1 (buffer pKey1, size nKey1 bytes) with key2 (buffer pKey2,
drha634fb12014-04-03 02:54:27777** size nKey2 bytes). Use (pTask->pKeyInfo) for the collation sequences
danfad9f9a2014-04-01 18:41:51778** used by the comparison. Return the result of the comparison.
dan5134d132011-09-02 10:31:11779**
dana9d91112015-03-28 19:56:41780** If IN/OUT parameter *pbKey2Cached is true when this function is called,
781** it is assumed that (pTask->pUnpacked) contains the unpacked version
782** of key2. If it is false, (pTask->pUnpacked) is populated with the unpacked
783** version of key2 and *pbKey2Cached set to true before returning.
dan8b1ea142011-09-03 14:36:13784**
drha634fb12014-04-03 02:54:27785** If an OOM error is encountered, (pTask->pUnpacked->error_rc) is set
danfad9f9a2014-04-01 18:41:51786** to SQLITE_NOMEM.
dan5134d132011-09-02 10:31:11787*/
danfad9f9a2014-04-01 18:41:51788static int vdbeSorterCompare(
drha634fb12014-04-03 02:54:27789 SortSubtask *pTask, /* Subtask context (for pKeyInfo) */
dana9d91112015-03-28 19:56:41790 int *pbKey2Cached, /* True if pTask->pUnpacked is pKey2 */
drhc041c162012-07-24 19:46:38791 const void *pKey1, int nKey1, /* Left side of comparison */
danfad9f9a2014-04-01 18:41:51792 const void *pKey2, int nKey2 /* Right side of comparison */
dan5134d132011-09-02 10:31:11793){
drha634fb12014-04-03 02:54:27794 UnpackedRecord *r2 = pTask->pUnpacked;
dana9d91112015-03-28 19:56:41795 if( !*pbKey2Cached ){
drh8658a8d2025-06-02 13:54:33796 sqlite3VdbeRecordUnpack(nKey2, pKey2, r2);
dana9d91112015-03-28 19:56:41797 *pbKey2Cached = 1;
dan8b1ea142011-09-03 14:36:13798 }
drh75179de2014-09-16 14:37:35799 return sqlite3VdbeRecordCompare(nKey1, pKey1, r2);
dana20fde62011-07-12 14:28:05800}
801
802/*
dana9d91112015-03-28 19:56:41803** A specially optimized version of vdbeSorterCompare() that assumes that
804** the first field of each key is a TEXT value and that the collation
805** sequence to compare them with is BINARY.
806*/
807static int vdbeSorterCompareText(
808 SortSubtask *pTask, /* Subtask context (for pKeyInfo) */
809 int *pbKey2Cached, /* True if pTask->pUnpacked is pKey2 */
810 const void *pKey1, int nKey1, /* Left side of comparison */
811 const void *pKey2, int nKey2 /* Right side of comparison */
812){
813 const u8 * const p1 = (const u8 * const)pKey1;
814 const u8 * const p2 = (const u8 * const)pKey2;
815 const u8 * const v1 = &p1[ p1[0] ]; /* Pointer to value 1 */
816 const u8 * const v2 = &p2[ p2[0] ]; /* Pointer to value 2 */
817
818 int n1;
819 int n2;
820 int res;
821
drh02a95eb2020-01-28 20:27:42822 getVarint32NR(&p1[1], n1);
823 getVarint32NR(&p2[1], n2);
drhae2ac852017-05-27 22:42:36824 res = memcmp(v1, v2, (MIN(n1, n2) - 13)/2);
dana9d91112015-03-28 19:56:41825 if( res==0 ){
826 res = n1 - n2;
827 }
828
829 if( res==0 ){
drha485ad12017-08-02 22:43:14830 if( pTask->pSorter->pKeyInfo->nKeyField>1 ){
dan7004f3f2015-03-30 12:06:26831 res = vdbeSorterCompareTail(
832 pTask, pbKey2Cached, pKey1, nKey1, pKey2, nKey2
833 );
dana9d91112015-03-28 19:56:41834 }
835 }else{
drh8658a8d2025-06-02 13:54:33836 assert( pTask->pSorter->pKeyInfo->aSortFlags!=0 );
dan6e118922019-08-12 16:36:38837 assert( !(pTask->pSorter->pKeyInfo->aSortFlags[0]&KEYINFO_ORDER_BIGNULL) );
838 if( pTask->pSorter->pKeyInfo->aSortFlags[0] ){
dana9d91112015-03-28 19:56:41839 res = res * -1;
840 }
841 }
842
843 return res;
844}
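/*
** Illustrative sketch only, not part of SQLite. The routine above relies
** on the record-format rule that a TEXT value has an odd serial type
** N>=13 and occupies (N-13)/2 bytes. A stripped-down version of the
** first-field comparison, ignoring sort flags and the remaining fields,
** is shown below. Both function names are hypothetical.
*/

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: byte length of a TEXT value with the given
** serial type (an odd value >= 13). */
static int sketchTextLen(int serialType){
  return (serialType - 13)/2;
}

/* Hypothetical helper: compare two TEXT values under BINARY collation.
** n1 and n2 are serial types, v1 and v2 point at the value bytes. */
static int sketchCompareText(
  const unsigned char *v1, int n1,
  const unsigned char *v2, int n2
){
  int nMin = n1<n2 ? n1 : n2;
  int res = memcmp(v1, v2, (size_t)sketchTextLen(nMin));
  if( res==0 ) res = n1 - n2;     /* Shorter string sorts first */
  return res;
}
```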
845
846/*
847** A specially optimized version of vdbeSorterCompare() that assumes that
848** the first field of each key is an INTEGER value.
849*/
850static int vdbeSorterCompareInt(
851 SortSubtask *pTask, /* Subtask context (for pKeyInfo) */
852 int *pbKey2Cached, /* True if pTask->pUnpacked is pKey2 */
853 const void *pKey1, int nKey1, /* Left side of comparison */
854 const void *pKey2, int nKey2 /* Right side of comparison */
855){
856 const u8 * const p1 = (const u8 * const)pKey1;
857 const u8 * const p2 = (const u8 * const)pKey2;
858 const int s1 = p1[1]; /* Left hand serial type */
859 const int s2 = p2[1]; /* Right hand serial type */
860 const u8 * const v1 = &p1[ p1[0] ]; /* Pointer to value 1 */
861 const u8 * const v2 = &p2[ p2[0] ]; /* Pointer to value 2 */
862 int res; /* Return value */
863
864 assert( (s1>0 && s1<7) || s1==8 || s1==9 );
865 assert( (s2>0 && s2<7) || s2==8 || s2==9 );
866
drhcaab5f42017-04-03 12:04:39867 if( s1==s2 ){
    /* The two values have the same serial type, and hence the same width.
    ** Compare them byte by byte, most significant byte first. */
869 static const u8 aLen[] = {0, 1, 2, 3, 4, 6, 8, 0, 0, 0 };
870 const u8 n = aLen[s1];
871 int i;
872 res = 0;
873 for(i=0; i<n; i++){
874 if( (res = v1[i] - v2[i])!=0 ){
875 if( ((v1[0] ^ v2[0]) & 0x80)!=0 ){
876 res = v1[0] & 0x80 ? -1 : +1;
877 }
878 break;
879 }
880 }
881 }else if( s1>7 && s2>7 ){
dana9d91112015-03-28 19:56:41882 res = s1 - s2;
883 }else{
drhcaab5f42017-04-03 12:04:39884 if( s2>7 ){
885 res = +1;
886 }else if( s1>7 ){
887 res = -1;
dana9d91112015-03-28 19:56:41888 }else{
drhcaab5f42017-04-03 12:04:39889 res = s1 - s2;
890 }
891 assert( res!=0 );
dana9d91112015-03-28 19:56:41892
drhcaab5f42017-04-03 12:04:39893 if( res>0 ){
894 if( *v1 & 0x80 ) res = -1;
895 }else{
896 if( *v2 & 0x80 ) res = +1;
dana9d91112015-03-28 19:56:41897 }
898 }
899
drh8658a8d2025-06-02 13:54:33900 assert( pTask->pSorter->pKeyInfo->aSortFlags!=0 );
dana9d91112015-03-28 19:56:41901 if( res==0 ){
drha485ad12017-08-02 22:43:14902 if( pTask->pSorter->pKeyInfo->nKeyField>1 ){
dan7004f3f2015-03-30 12:06:26903 res = vdbeSorterCompareTail(
904 pTask, pbKey2Cached, pKey1, nKey1, pKey2, nKey2
905 );
dana9d91112015-03-28 19:56:41906 }
dan6e118922019-08-12 16:36:38907 }else if( pTask->pSorter->pKeyInfo->aSortFlags[0] ){
908 assert( !(pTask->pSorter->pKeyInfo->aSortFlags[0]&KEYINFO_ORDER_BIGNULL) );
dana9d91112015-03-28 19:56:41909 res = res * -1;
910 }
911
912 return res;
dana20fde62011-07-12 14:28:05913}
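/*
** Illustrative sketch only, not part of SQLite. For two integers stored
** with the same width as big-endian two's-complement values, the loop in
** the s1==s2 branch above can be isolated as follows. The first differing
** byte decides the result, except when the sign bits differ, in which
** case the negative value is the smaller one. sketchCompareBeInt() is a
** hypothetical name used only here.
*/

```c
#include <assert.h>

/* Hypothetical helper: compare two n-byte big-endian two's-complement
** integers. Returns <0, 0 or >0. */
static int sketchCompareBeInt(
  const unsigned char *v1,
  const unsigned char *v2,
  int n
){
  int i;
  for(i=0; i<n; i++){
    int res = v1[i] - v2[i];
    if( res!=0 ){
      if( ((v1[0] ^ v2[0]) & 0x80)!=0 ){
        /* Signs differ: the value with the sign bit set is smaller */
        return (v1[0] & 0x80) ? -1 : +1;
      }
      return res;
    }
  }
  return 0;
}
```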
914
/*
** Initialize the temporary index cursor just opened as a sorter cursor.
**
** Usually, the sorter module uses the value of (pCsr->pKeyInfo->nKeyField)
** to determine the number of fields that should be compared from the
** records being sorted. However, if the value passed as argument nField
** is non-zero and the sorter is able to guarantee a stable sort, nField
** is used instead. This is used when sorting records for a CREATE INDEX
** statement. In this case, keys are always delivered to the sorter in
** order of the primary key, which happens to make up the final part
** of the records being sorted. So if the sort is stable, there is never
** any reason to compare PK fields and they can be ignored for a small
** performance boost.
**
** The sorter can guarantee a stable sort when running in single-threaded
** mode, but not in multi-threaded mode.
**
** SQLITE_OK is returned if successful, or an SQLite error code otherwise.
*/
drha634fb12014-04-03 02:54:27934int sqlite3VdbeSorterInit(
935 sqlite3 *db, /* Database connection (for malloc()) */
936 int nField, /* Number of key fields in each record */
937 VdbeCursor *pCsr /* Cursor that holds the new sorter */
938){
dan5134d132011-09-02 10:31:11939 int pgsz; /* Page size of main database */
drha634fb12014-04-03 02:54:27940 int i; /* Used to iterate through aTask[] */
drh34163c62011-09-02 21:42:33941 VdbeSorter *pSorter; /* The new sorter */
danf8768412014-03-17 15:43:05942 KeyInfo *pKeyInfo; /* Copy of pCsr->pKeyInfo with db==0 */
943 int szKeyInfo; /* Size of pCsr->pKeyInfo in bytes */
drhef86b942025-02-17 17:33:14944 i64 sz; /* Size of pSorter in bytes */
dan2f170012014-03-28 19:18:16945 int rc = SQLITE_OK;
drhb0f935e2014-05-12 15:30:00946#if SQLITE_MAX_WORKER_THREADS==0
drh8f0dab32014-05-16 12:18:08947# define nWorker 0
drhb0f935e2014-05-12 15:30:00948#else
drh111544c2014-08-29 16:20:47949 int nWorker;
950#endif
951
952 /* Initialize the upper limit on the number of worker threads */
953#if SQLITE_MAX_WORKER_THREADS>0
954 if( sqlite3TempInMemory(db) || sqlite3GlobalConfig.bCoreMutex==0 ){
955 nWorker = 0;
956 }else{
957 nWorker = db->aLimit[SQLITE_LIMIT_WORKER_THREADS];
958 }
drh028696c2014-08-25 23:44:44959#endif
960
961 /* Do not allow the total number of threads (main thread + all workers)
962 ** to exceed the maximum merge count */
963#if SQLITE_MAX_WORKER_THREADS>=SORTER_MAX_MERGE_COUNT
964 if( nWorker>=SORTER_MAX_MERGE_COUNT ){
965 nWorker = SORTER_MAX_MERGE_COUNT-1;
966 }
drhb0f935e2014-05-12 15:30:00967#endif
dan5134d132011-09-02 10:31:11968
drhb2486682022-01-03 01:43:28969 assert( pCsr->pKeyInfo );
970 assert( !pCsr->isEphemeral );
drhc960dcb2015-11-20 19:22:01971 assert( pCsr->eCurType==CURTYPE_SORTER );
drhef86b942025-02-17 17:33:14972 assert( sizeof(KeyInfo) + UMXV(pCsr->pKeyInfo->nKeyField)*sizeof(CollSeq*)
973 < 0x7fffffff );
drh7590bfd2025-06-02 09:49:07974 assert( pCsr->pKeyInfo->nKeyField<=pCsr->pKeyInfo->nAllField );
975 szKeyInfo = SZ_KEYINFO(pCsr->pKeyInfo->nAllField);
drhcebf06c2025-03-14 18:10:02976 sz = SZ_VDBESORTER(nWorker+1);
danb3f56fd2014-03-31 19:57:34977
978 pSorter = (VdbeSorter*)sqlite3DbMallocZero(db, sz + szKeyInfo);
drhc960dcb2015-11-20 19:22:01979 pCsr->uc.pSorter = pSorter;
drh34163c62011-09-02 21:42:33980 if( pSorter==0 ){
mistachkinfad30392016-02-13 23:43:46981 rc = SQLITE_NOMEM_BKPT;
dan2f170012014-03-28 19:18:16982 }else{
danebd2ecd2020-09-07 11:14:27983 Btree *pBt = db->aDb[0].pBt;
dan1a088a82014-04-15 19:52:34984 pSorter->pKeyInfo = pKeyInfo = (KeyInfo*)((u8*)pSorter + sz);
dan2f170012014-03-28 19:18:16985 memcpy(pKeyInfo, pCsr->pKeyInfo, szKeyInfo);
986 pKeyInfo->db = 0;
dan57a14092015-03-26 11:55:03987 if( nField && nWorker==0 ){
drha485ad12017-08-02 22:43:14988 pKeyInfo->nKeyField = nField;
drh7590bfd2025-06-02 09:49:07989 assert( nField<=pCsr->pKeyInfo->nAllField );
dan57a14092015-03-26 11:55:03990 }
    /* It is OK that pKeyInfo reuses the aSortFlags field from pCsr->pKeyInfo,
    ** since the pCsr->pKeyInfo->aSortFlags[] array is invariant and lives
    ** longer than pSorter. */
994 assert( pKeyInfo->aSortFlags==pCsr->pKeyInfo->aSortFlags );
danebd2ecd2020-09-07 11:14:27995 sqlite3BtreeEnter(pBt);
996 pSorter->pgsz = pgsz = sqlite3BtreeGetPageSize(pBt);
997 sqlite3BtreeLeave(pBt);
drha634fb12014-04-03 02:54:27998 pSorter->nTask = nWorker + 1;
mistachkincdabd7b2015-10-14 20:34:57999 pSorter->iPrev = (u8)(nWorker - 1);
dand30ab3d2014-04-09 20:04:171000 pSorter->bUseThreads = (pSorter->nTask>1);
dan1a088a82014-04-15 19:52:341001 pSorter->db = db;
drha634fb12014-04-03 02:54:271002 for(i=0; i<pSorter->nTask; i++){
1003 SortSubtask *pTask = &pSorter->aTask[i];
dand30ab3d2014-04-09 20:04:171004 pTask->pSorter = pSorter;
dan2f170012014-03-28 19:18:161005 }
dan5134d132011-09-02 10:31:111006
dan2f170012014-03-28 19:18:161007 if( !sqlite3TempInMemory(db) ){
      i64 mxCache;           /* Cache size in bytes */
drh3bd17912015-01-02 15:55:291009 u32 szPma = sqlite3GlobalConfig.szPma;
1010 pSorter->mnPmaSize = szPma * pgsz;
danfc26f7c2016-04-14 15:44:371011
dan2f170012014-03-28 19:18:161012 mxCache = db->aDb[0].pSchema->cache_size;
danfc26f7c2016-04-14 15:44:371013 if( mxCache<0 ){
1014 /* A negative cache-size value C indicates that the cache is abs(C)
1015 ** KiB in size. */
1016 mxCache = mxCache * -1024;
1017 }else{
1018 mxCache = mxCache * pgsz;
1019 }
1020 mxCache = MIN(mxCache, SQLITE_MAX_PMASZ);
1021 pSorter->mxPmaSize = MAX(pSorter->mnPmaSize, (int)mxCache);
dan2f170012014-03-28 19:18:161022
drhb2a0f752017-08-28 15:51:351023 /* Avoid large memory allocations if the application has requested
1024 ** SQLITE_CONFIG_SMALL_MALLOC. */
1025 if( sqlite3GlobalConfig.bSmallMalloc==0 ){
dan2f170012014-03-28 19:18:161026 assert( pSorter->iMemory==0 );
1027 pSorter->nMemory = pgsz;
dan82a8a9f2014-04-12 19:34:441028 pSorter->list.aMemory = (u8*)sqlite3Malloc(pgsz);
mistachkinfad30392016-02-13 23:43:461029 if( !pSorter->list.aMemory ) rc = SQLITE_NOMEM_BKPT;
dan2f170012014-03-28 19:18:161030 }
dan69719522014-03-27 19:25:021031 }
dan57a14092015-03-26 11:55:031032
larrybrbc917382023-06-07 08:40:311033 if( pKeyInfo->nAllField<13
dana9d91112015-03-28 19:56:411034 && (pKeyInfo->aColl[0]==0 || pKeyInfo->aColl[0]==db->pDfltColl)
dan6e118922019-08-12 16:36:381035 && (pKeyInfo->aSortFlags[0] & KEYINFO_ORDER_BIGNULL)==0
dana9d91112015-03-28 19:56:411036 ){
dan57a14092015-03-26 11:55:031037 pSorter->typeMask = SORTER_TYPE_INTEGER | SORTER_TYPE_TEXT;
1038 }
drhca892a72011-09-03 00:17:511039 }
dan5134d132011-09-02 10:31:111040
dan2f170012014-03-28 19:18:161041 return rc;
dan5134d132011-09-02 10:31:111042}
drh8f0dab32014-05-16 12:18:081043#undef nWorker /* Defined at the top of this function */
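/*
** Illustrative sketch only, not part of SQLite. The PMA size limit
** computed in sqlite3VdbeSorterInit() follows from the cache-size
** setting: a negative value C means abs(C) KiB, a non-negative value is
** a page count. The result is capped and then raised to at least the
** minimum PMA size. All names below are hypothetical; mxAllowed stands
** in for SQLITE_MAX_PMASZ, whose actual value is not assumed here.
*/

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a cache_size setting to bytes. */
static int64_t sketchCacheBytes(int64_t cacheSize, int pgsz){
  if( cacheSize<0 ) return cacheSize * -1024;   /* abs(C) KiB */
  return cacheSize * pgsz;                      /* C pages */
}

/* Hypothetical helper: upper PMA size given the minimum PMA size
** mnPmaSize and a hard cap mxAllowed, both in bytes. */
static int64_t sketchMxPmaSize(
  int64_t cacheSize, int pgsz,
  int64_t mnPmaSize, int64_t mxAllowed
){
  int64_t mxCache = sketchCacheBytes(cacheSize, pgsz);
  if( mxCache>mxAllowed ) mxCache = mxAllowed;
  return mxCache>mnPmaSize ? mxCache : mnPmaSize;
}
```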
dan5134d132011-09-02 10:31:111044
1045/*
1046** Free the list of sorted records starting at pRecord.
1047*/
1048static void vdbeSorterRecordFree(sqlite3 *db, SorterRecord *pRecord){
1049 SorterRecord *p;
1050 SorterRecord *pNext;
1051 for(p=pRecord; p; p=pNext){
dan69719522014-03-27 19:25:021052 pNext = p->u.pNext;
dan5134d132011-09-02 10:31:111053 sqlite3DbFree(db, p);
1054 }
dana20fde62011-07-12 14:28:051055}
1056
1057/*
larrybrbc917382023-06-07 08:40:311058** Free all resources owned by the object indicated by argument pTask. All
drha634fb12014-04-03 02:54:271059** fields of *pTask are zeroed before returning.
danf8768412014-03-17 15:43:051060*/
drha634fb12014-04-03 02:54:271061static void vdbeSortSubtaskCleanup(sqlite3 *db, SortSubtask *pTask){
1062 sqlite3DbFree(db, pTask->pUnpacked);
drh5f4a4792014-05-16 20:24:511063#if SQLITE_MAX_WORKER_THREADS>0
  /* pTask->list.aMemory can only be non-zero if it was handed memory
  ** from the main thread. That only occurs when SQLITE_MAX_WORKER_THREADS>0 */
1066 if( pTask->list.aMemory ){
dan82a8a9f2014-04-12 19:34:441067 sqlite3_free(pTask->list.aMemory);
drh5f4a4792014-05-16 20:24:511068 }else
1069#endif
1070 {
1071 assert( pTask->list.aMemory==0 );
1072 vdbeSorterRecordFree(0, pTask->list.pList);
dan2f170012014-03-28 19:18:161073 }
dand30ab3d2014-04-09 20:04:171074 if( pTask->file.pFd ){
1075 sqlite3OsCloseFree(pTask->file.pFd);
danf8768412014-03-17 15:43:051076 }
dan4be4c402014-04-11 19:43:071077 if( pTask->file2.pFd ){
1078 sqlite3OsCloseFree(pTask->file2.pFd);
dan4be4c402014-04-11 19:43:071079 }
dan96974bd2015-04-11 20:20:291080 memset(pTask, 0, sizeof(SortSubtask));
danf8768412014-03-17 15:43:051081}
1082
dan82a8a9f2014-04-12 19:34:441083#ifdef SQLITE_DEBUG_SORTER_THREADS
1084static void vdbeSorterWorkDebug(SortSubtask *pTask, const char *zEvent){
1085 i64 t;
1086 int iTask = (pTask - pTask->pSorter->aTask);
dana9f43d72014-04-17 08:57:171087 sqlite3OsCurrentTimeInt64(pTask->pSorter->db->pVfs, &t);
dan82a8a9f2014-04-12 19:34:441088 fprintf(stderr, "%lld:%d %s\n", t, iTask, zEvent);
1089}
drh958d2612014-04-18 13:40:071090static void vdbeSorterRewindDebug(const char *zEvent){
drha959bf52021-06-15 15:15:401091 i64 t = 0;
1092 sqlite3_vfs *pVfs = sqlite3_vfs_find(0);
1093 if( ALWAYS(pVfs) ) sqlite3OsCurrentTimeInt64(pVfs, &t);
dan82a8a9f2014-04-12 19:34:441094 fprintf(stderr, "%lld:X %s\n", t, zEvent);
1095}
1096static void vdbeSorterPopulateDebug(
1097 SortSubtask *pTask,
1098 const char *zEvent
1099){
1100 i64 t;
1101 int iTask = (pTask - pTask->pSorter->aTask);
dana9f43d72014-04-17 08:57:171102 sqlite3OsCurrentTimeInt64(pTask->pSorter->db->pVfs, &t);
dan82a8a9f2014-04-12 19:34:441103 fprintf(stderr, "%lld:bg%d %s\n", t, iTask, zEvent);
1104}
1105static void vdbeSorterBlockDebug(
1106 SortSubtask *pTask,
1107 int bBlocked,
1108 const char *zEvent
1109){
1110 if( bBlocked ){
1111 i64 t;
dana9f43d72014-04-17 08:57:171112 sqlite3OsCurrentTimeInt64(pTask->pSorter->db->pVfs, &t);
dan82a8a9f2014-04-12 19:34:441113 fprintf(stderr, "%lld:main %s\n", t, zEvent);
1114 }
1115}
1116#else
1117# define vdbeSorterWorkDebug(x,y)
drh958d2612014-04-18 13:40:071118# define vdbeSorterRewindDebug(y)
dan82a8a9f2014-04-12 19:34:441119# define vdbeSorterPopulateDebug(x,y)
1120# define vdbeSorterBlockDebug(x,y,z)
1121#endif
1122
danb3f56fd2014-03-31 19:57:341123#if SQLITE_MAX_WORKER_THREADS>0
/*
** Join thread pTask->pThread.
*/
dan1a088a82014-04-15 19:52:341127static int vdbeSorterJoinThread(SortSubtask *pTask){
dan82a8a9f2014-04-12 19:34:441128 int rc = SQLITE_OK;
dan1a088a82014-04-15 19:52:341129 if( pTask->pThread ){
dan82a8a9f2014-04-12 19:34:441130#ifdef SQLITE_DEBUG_SORTER_THREADS
dan1a088a82014-04-15 19:52:341131 int bDone = pTask->bDone;
dan82a8a9f2014-04-12 19:34:441132#endif
drhb92284d2014-07-29 18:46:301133 void *pRet = SQLITE_INT_TO_PTR(SQLITE_ERROR);
dan82a8a9f2014-04-12 19:34:441134 vdbeSorterBlockDebug(pTask, !bDone, "enter");
drhb92284d2014-07-29 18:46:301135 (void)sqlite3ThreadJoin(pTask->pThread, &pRet);
dan82a8a9f2014-04-12 19:34:441136 vdbeSorterBlockDebug(pTask, !bDone, "exit");
drhb92284d2014-07-29 18:46:301137 rc = SQLITE_PTR_TO_INT(pRet);
dan1a088a82014-04-15 19:52:341138 assert( pTask->bDone==1 );
1139 pTask->bDone = 0;
1140 pTask->pThread = 0;
dan82a8a9f2014-04-12 19:34:441141 }
1142 return rc;
1143}
1144
1145/*
1146** Launch a background thread to run xTask(pIn).
1147*/
1148static int vdbeSorterCreateThread(
dan1a088a82014-04-15 19:52:341149 SortSubtask *pTask, /* Thread will use this task object */
dan82a8a9f2014-04-12 19:34:441150 void *(*xTask)(void*), /* Routine to run in a separate thread */
1151 void *pIn /* Argument passed into xTask() */
1152){
dan1a088a82014-04-15 19:52:341153 assert( pTask->pThread==0 && pTask->bDone==0 );
1154 return sqlite3ThreadCreate(&pTask->pThread, xTask, pIn);
dan82a8a9f2014-04-12 19:34:441155}
1156
1157/*
larrybrbc917382023-06-07 08:40:311158** Join all outstanding threads launched by SorterWrite() to create
dan82a8a9f2014-04-12 19:34:441159** level-0 PMAs.
1160*/
danf8768412014-03-17 15:43:051161static int vdbeSorterJoinAll(VdbeSorter *pSorter, int rcin){
1162 int rc = rcin;
1163 int i;
dan0d3a4082014-05-05 15:58:401164
  /* This function is always called by the main user thread.
  **
  ** If this function is being called after SorterRewind() has been called,
  ** it is possible that thread pSorter->aTask[pSorter->nTask-1].pThread
  ** is currently attempting to join one of the other threads. To avoid a race
  ** condition where this thread also attempts to join the same object, join
  ** thread pSorter->aTask[pSorter->nTask-1].pThread first. */
1172 for(i=pSorter->nTask-1; i>=0; i--){
drha634fb12014-04-03 02:54:271173 SortSubtask *pTask = &pSorter->aTask[i];
dan1a088a82014-04-15 19:52:341174 int rc2 = vdbeSorterJoinThread(pTask);
dan82a8a9f2014-04-12 19:34:441175 if( rc==SQLITE_OK ) rc = rc2;
danf8768412014-03-17 15:43:051176 }
1177 return rc;
1178}
danb3f56fd2014-03-31 19:57:341179#else
1180# define vdbeSorterJoinAll(x,rcin) (rcin)
dan1a088a82014-04-15 19:52:341181# define vdbeSorterJoinThread(pTask) SQLITE_OK
danb3f56fd2014-03-31 19:57:341182#endif
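/*
** Illustrative sketch only, not part of SQLite. vdbeSorterJoinAll()
** walks the tasks from last to first and keeps the first non-OK result
** it encounters, starting from the code passed in by the caller. The
** error-folding rule, with threads replaced by an array of already-known
** result codes, can be modelled like this. sketchJoinAll() and aRc[] are
** hypothetical, and 0 plays the role of SQLITE_OK.
*/

```c
#include <assert.h>

/* Hypothetical helper: fold per-task result codes aRc[0..nTask-1] into
** a single code, visiting the last task first and never overwriting an
** error that has already been recorded. */
static int sketchJoinAll(const int *aRc, int nTask, int rcin){
  int rc = rcin;
  int i;
  for(i=nTask-1; i>=0; i--){
    int rc2 = aRc[i];
    if( rc==0 ) rc = rc2;
  }
  return rc;
}
```

/*
** Every task is still visited after an error has been recorded, which
** mirrors the way the real routine joins every thread regardless of rc.
*/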
danf8768412014-03-17 15:43:051183
1184/*
drhac651962014-07-28 14:54:501185** Allocate a new MergeEngine object capable of handling up to
1186** nReader PmaReader inputs.
1187**
1188** nReader is automatically rounded up to the next power of two.
1189** nReader may not exceed SORTER_MAX_MERGE_COUNT even after rounding up.
danf8768412014-03-17 15:43:051190*/
drhde823be2014-05-20 11:03:531191static MergeEngine *vdbeMergeEngineNew(int nReader){
1192 int N = 2; /* Smallest power of two >= nReader */
drhef86b942025-02-17 17:33:141193 i64 nByte; /* Total bytes of space to allocate */
drha634fb12014-04-03 02:54:271194 MergeEngine *pNew; /* Pointer to allocated object to return */
danf8768412014-03-17 15:43:051195
drhde823be2014-05-20 11:03:531196 assert( nReader<=SORTER_MAX_MERGE_COUNT );
dand30ab3d2014-04-09 20:04:171197
drhde823be2014-05-20 11:03:531198 while( N<nReader ) N += N;
drha634fb12014-04-03 02:54:271199 nByte = sizeof(MergeEngine) + N * (sizeof(int) + sizeof(PmaReader));
danf8768412014-03-17 15:43:051200
drh190d6952014-05-16 17:31:421201 pNew = sqlite3FaultSim(100) ? 0 : (MergeEngine*)sqlite3MallocZero(nByte);
danf8768412014-03-17 15:43:051202 if( pNew ){
1203 pNew->nTree = N;
drhac651962014-07-28 14:54:501204 pNew->pTask = 0;
drhde823be2014-05-20 11:03:531205 pNew->aReadr = (PmaReader*)&pNew[1];
1206 pNew->aTree = (int*)&pNew->aReadr[N];
danf8768412014-03-17 15:43:051207 }
danf8768412014-03-17 15:43:051208 return pNew;
1209}
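/*
** Illustrative sketch only, not part of SQLite. The rounding step in
** vdbeMergeEngineNew() starts at 2 and doubles until the value is at
** least nReader, so the merge tree always has a power-of-two number of
** leaves and never fewer than two. sketchRoundUpPow2() is a hypothetical
** name for that loop.
*/

```c
#include <assert.h>

/* Hypothetical helper: smallest power of two that is >= nReader and
** also >= 2. */
static int sketchRoundUpPow2(int nReader){
  int N = 2;
  while( N<nReader ) N += N;
  return N;
}
```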
1210
1211/*
drha634fb12014-04-03 02:54:271212** Free the MergeEngine object passed as the only argument.
drh5c2b3142014-03-25 13:17:411213*/
drha634fb12014-04-03 02:54:271214static void vdbeMergeEngineFree(MergeEngine *pMerger){
drh5c2b3142014-03-25 13:17:411215 int i;
1216 if( pMerger ){
1217 for(i=0; i<pMerger->nTree; i++){
drhde823be2014-05-20 11:03:531218 vdbePmaReaderClear(&pMerger->aReadr[i]);
drh5c2b3142014-03-25 13:17:411219 }
1220 }
drh5c2b3142014-03-25 13:17:411221 sqlite3_free(pMerger);
1222}
1223
1224/*
dan1a088a82014-04-15 19:52:341225** Free all resources associated with the IncrMerger object indicated by
1226** the first argument.
1227*/
1228static void vdbeIncrFree(IncrMerger *pIncr){
1229 if( pIncr ){
1230#if SQLITE_MAX_WORKER_THREADS>0
1231 if( pIncr->bUseThread ){
1232 vdbeSorterJoinThread(pIncr->pTask);
1233 if( pIncr->aFile[0].pFd ) sqlite3OsCloseFree(pIncr->aFile[0].pFd);
1234 if( pIncr->aFile[1].pFd ) sqlite3OsCloseFree(pIncr->aFile[1].pFd);
1235 }
1236#endif
1237 vdbeMergeEngineFree(pIncr->pMerger);
1238 sqlite3_free(pIncr);
1239 }
1240}

/*
** Reset a sorting cursor back to its original empty state.
*/
void sqlite3VdbeSorterReset(sqlite3 *db, VdbeSorter *pSorter){
  int i;
  (void)vdbeSorterJoinAll(pSorter, SQLITE_OK);
  assert( pSorter->bUseThreads || pSorter->pReader==0 );
#if SQLITE_MAX_WORKER_THREADS>0
  if( pSorter->pReader ){
    vdbePmaReaderClear(pSorter->pReader);
    sqlite3DbFree(db, pSorter->pReader);
    pSorter->pReader = 0;
  }
#endif
  vdbeMergeEngineFree(pSorter->pMerger);
  pSorter->pMerger = 0;
  for(i=0; i<pSorter->nTask; i++){
    SortSubtask *pTask = &pSorter->aTask[i];
    vdbeSortSubtaskCleanup(db, pTask);
    pTask->pSorter = pSorter;
  }
  if( pSorter->list.aMemory==0 ){
    vdbeSorterRecordFree(0, pSorter->list.pList);
  }
  pSorter->list.pList = 0;
  pSorter->list.szPMA = 0;
  pSorter->bUsePMA = 0;
  pSorter->iMemory = 0;
  pSorter->mxKeysize = 0;
  sqlite3DbFree(db, pSorter->pUnpacked);
  pSorter->pUnpacked = 0;
}

/*
** Free any cursor components allocated by sqlite3VdbeSorterXXX routines.
*/
void sqlite3VdbeSorterClose(sqlite3 *db, VdbeCursor *pCsr){
  VdbeSorter *pSorter;
  assert( pCsr->eCurType==CURTYPE_SORTER );
  pSorter = pCsr->uc.pSorter;
  if( pSorter ){
    sqlite3VdbeSorterReset(db, pSorter);
    sqlite3_free(pSorter->list.aMemory);
    sqlite3DbFree(db, pSorter);
    pCsr->uc.pSorter = 0;
  }
}

#if SQLITE_MAX_MMAP_SIZE>0
/*
** The first argument is a file-handle open on a temporary file. The file
** is guaranteed to be nByte bytes or smaller in size. This function
** attempts to extend the file to nByte bytes in size and to ensure that
** the VFS has memory mapped it.
**
** Whether or not the file does end up memory mapped of course depends on
** the specific VFS implementation.
*/
static void vdbeSorterExtendFile(sqlite3 *db, sqlite3_file *pFd, i64 nByte){
  if( nByte<=(i64)(db->nMaxSorterMmap) && pFd->pMethods->iVersion>=3 ){
    void *p = 0;
    int chunksize = 4*1024;
    sqlite3OsFileControlHint(pFd, SQLITE_FCNTL_CHUNK_SIZE, &chunksize);
    sqlite3OsFileControlHint(pFd, SQLITE_FCNTL_SIZE_HINT, &nByte);
    sqlite3OsFetch(pFd, 0, (int)nByte, &p);
    if( p ) sqlite3OsUnfetch(pFd, 0, p);
  }
}
#else
# define vdbeSorterExtendFile(x,y,z)
#endif

/*
** Allocate space for a file-handle and open a temporary file. If successful,
** set *ppFd to point to the malloc'd file-handle and return SQLITE_OK.
** Otherwise, set *ppFd to 0 and return an SQLite error code.
*/
static int vdbeSorterOpenTempFile(
  sqlite3 *db,                    /* Database handle doing sort */
  i64 nExtend,                    /* Attempt to extend file to this size */
  sqlite3_file **ppFd
){
  int rc;
  if( sqlite3FaultSim(202) ) return SQLITE_IOERR_ACCESS;
  rc = sqlite3OsOpenMalloc(db->pVfs, 0, ppFd,
      SQLITE_OPEN_TEMP_JOURNAL |
      SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE |
      SQLITE_OPEN_EXCLUSIVE | SQLITE_OPEN_DELETEONCLOSE, &rc
  );
  if( rc==SQLITE_OK ){
    i64 max = SQLITE_MAX_MMAP_SIZE;
    sqlite3OsFileControlHint(*ppFd, SQLITE_FCNTL_MMAP_SIZE, (void*)&max);
    if( nExtend>0 ){
      vdbeSorterExtendFile(db, *ppFd, nExtend);
    }
  }
  return rc;
}

/*
** If it has not already been allocated, allocate the UnpackedRecord
** structure at pTask->pUnpacked. Return SQLITE_OK if successful (or
** if no allocation was required), or SQLITE_NOMEM otherwise.
*/
static int vdbeSortAllocUnpacked(SortSubtask *pTask){
  if( pTask->pUnpacked==0 ){
    pTask->pUnpacked = sqlite3VdbeAllocUnpackedRecord(pTask->pSorter->pKeyInfo);
    if( pTask->pUnpacked==0 ) return SQLITE_NOMEM_BKPT;
    pTask->pUnpacked->nField = pTask->pSorter->pKeyInfo->nKeyField;
    pTask->pUnpacked->errCode = 0;
  }
  return SQLITE_OK;
}

/*
** Merge the two sorted lists p1 and p2 into a single list.
*/
static SorterRecord *vdbeSorterMerge(
  SortSubtask *pTask,             /* Calling thread context */
  SorterRecord *p1,               /* First list to merge */
  SorterRecord *p2                /* Second list to merge */
){
  SorterRecord *pFinal = 0;
  SorterRecord **pp = &pFinal;
  int bCached = 0;

  assert( p1!=0 && p2!=0 );
  for(;;){
    int res;
    res = pTask->xCompare(
        pTask, &bCached, SRVAL(p1), p1->nVal, SRVAL(p2), p2->nVal
    );

    if( res<=0 ){
      *pp = p1;
      pp = &p1->u.pNext;
      p1 = p1->u.pNext;
      if( p1==0 ){
        *pp = p2;
        break;
      }
    }else{
      *pp = p2;
      pp = &p2->u.pNext;
      p2 = p2->u.pNext;
      bCached = 0;
      if( p2==0 ){
        *pp = p1;
        break;
      }
    }
  }
  return pFinal;
}

/*
** Return the SorterCompare function to compare values collected by the
** sorter object passed as the only argument.
*/
static SorterCompare vdbeSorterGetCompare(VdbeSorter *p){
  if( p->typeMask==SORTER_TYPE_INTEGER ){
    return vdbeSorterCompareInt;
  }else if( p->typeMask==SORTER_TYPE_TEXT ){
    return vdbeSorterCompareText;
  }
  return vdbeSorterCompare;
}

/*
** Sort the linked list of records headed at pList->pList. Return
** SQLITE_OK if successful, or an SQLite error code (e.g. SQLITE_NOMEM) if
** an error occurs.
*/
static int vdbeSorterSort(SortSubtask *pTask, SorterList *pList){
  int i;
  SorterRecord *p;
  int rc;
  SorterRecord *aSlot[64];

  rc = vdbeSortAllocUnpacked(pTask);
  if( rc!=SQLITE_OK ) return rc;

  p = pList->pList;
  pTask->xCompare = vdbeSorterGetCompare(pTask->pSorter);
  memset(aSlot, 0, sizeof(aSlot));

  while( p ){
    SorterRecord *pNext;
    if( pList->aMemory ){
      if( (u8*)p==pList->aMemory ){
        pNext = 0;
      }else{
        assert( p->u.iNext<sqlite3MallocSize(pList->aMemory) );
        pNext = (SorterRecord*)&pList->aMemory[p->u.iNext];
      }
    }else{
      pNext = p->u.pNext;
    }

    p->u.pNext = 0;
    for(i=0; aSlot[i]; i++){
      p = vdbeSorterMerge(pTask, p, aSlot[i]);
      /* Each aSlot[] holds twice as much as the previous. So we cannot
      ** use up all 64 aSlot[] entries with only a 64-bit address space,
      ** and the index i checked below always stays in range. */
      assert( i<ArraySize(aSlot) );
      aSlot[i] = 0;
    }
    aSlot[i] = p;
    p = pNext;
  }

  p = 0;
  for(i=0; i<ArraySize(aSlot); i++){
    if( aSlot[i]==0 ) continue;
    p = p ? vdbeSorterMerge(pTask, p, aSlot[i]) : aSlot[i];
  }
  pList->pList = p;

  assert( pTask->pUnpacked->errCode==SQLITE_OK
       || pTask->pUnpacked->errCode==SQLITE_NOMEM
  );
  return pTask->pUnpacked->errCode;
}
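
/*
** Illustrative sketch only; not part of the sorter. It shows the same
** bottom-up merge sort idea on a plain list of integers: slot i holds
** either nothing or a sorted run of 2^i elements, so inserting one
** element behaves like incrementing a binary counter. ExampleNode,
** exampleMerge() and exampleSort() are hypothetical names, and the
** integer key stands in for the sorter's record comparison.
*/
#include <stddef.h>

typedef struct ExampleNode ExampleNode;
struct ExampleNode {
  int iKey;                       /* Sort key */
  ExampleNode *pNext;             /* Next node in list, or NULL */
};

/* Merge two sorted lists. On equal keys the element from p1 goes first. */
static ExampleNode *exampleMerge(ExampleNode *p1, ExampleNode *p2){
  ExampleNode *pFinal = 0;
  ExampleNode **pp = &pFinal;     /* Where to link the next element */
  while( p1 && p2 ){
    if( p1->iKey<=p2->iKey ){
      *pp = p1; pp = &p1->pNext; p1 = p1->pNext;
    }else{
      *pp = p2; pp = &p2->pNext; p2 = p2->pNext;
    }
  }
  *pp = p1 ? p1 : p2;             /* Append whatever remains */
  return pFinal;
}

/* Sort pList in ascending key order and return the new head. */
static ExampleNode *exampleSort(ExampleNode *pList){
  ExampleNode *aSlot[64];
  ExampleNode *p = pList;
  int i;
  for(i=0; i<64; i++) aSlot[i] = 0;
  while( p ){
    ExampleNode *pNext = p->pNext;
    p->pNext = 0;
    /* Carry: merge with each occupied slot until an empty one is found.
    ** Slot i holds 2^i elements, so i cannot reach 64 in practice. */
    for(i=0; aSlot[i]; i++){
      p = exampleMerge(p, aSlot[i]);
      aSlot[i] = 0;
    }
    aSlot[i] = p;
    p = pNext;
  }
  /* Fold the remaining runs together */
  p = 0;
  for(i=0; i<64; i++){
    if( aSlot[i]==0 ) continue;
    p = p ? exampleMerge(p, aSlot[i]) : aSlot[i];
  }
  return p;
}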

/*
** Initialize a PMA-writer object.
*/
static void vdbePmaWriterInit(
  sqlite3_file *pFd,              /* File handle to write to */
  PmaWriter *p,                   /* Object to populate */
  int nBuf,                       /* Buffer size */
  i64 iStart                      /* Offset of pFd to begin writing at */
){
  memset(p, 0, sizeof(PmaWriter));
  p->aBuffer = (u8*)sqlite3Malloc(nBuf);
  if( !p->aBuffer ){
    p->eFWErr = SQLITE_NOMEM_BKPT;
  }else{
    p->iBufEnd = p->iBufStart = (iStart % nBuf);
    p->iWriteOff = iStart - p->iBufStart;
    p->nBuffer = nBuf;
    p->pFd = pFd;
  }
}

/*
** Write nData bytes of data to the PMA. This function returns nothing.
** If an error occurs it is recorded in p->eFWErr, after which further
** writes are no-ops. The error is reported by vdbePmaWriterFinish().
*/
static void vdbePmaWriteBlob(PmaWriter *p, u8 *pData, int nData){
  int nRem = nData;
  while( nRem>0 && p->eFWErr==0 ){
    int nCopy = nRem;
    if( nCopy>(p->nBuffer - p->iBufEnd) ){
      nCopy = p->nBuffer - p->iBufEnd;
    }

    memcpy(&p->aBuffer[p->iBufEnd], &pData[nData-nRem], nCopy);
    p->iBufEnd += nCopy;
    if( p->iBufEnd==p->nBuffer ){
      p->eFWErr = sqlite3OsWrite(p->pFd,
          &p->aBuffer[p->iBufStart], p->iBufEnd - p->iBufStart,
          p->iWriteOff + p->iBufStart
      );
      p->iBufStart = p->iBufEnd = 0;
      p->iWriteOff += p->nBuffer;
    }
    assert( p->iBufEnd<p->nBuffer );

    nRem -= nCopy;
  }
}
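
/*
** Illustrative sketch only; not part of the sorter. It works through the
** offset arithmetic used by the PMA writer: the buffer is treated as a
** window aligned to a multiple of nBuf in the file, so a write starting
** at iStart begins part-way into the buffer and the first flush is
** shorter than nBuf. ExampleWriterPos, exampleWriterPos() and
** exampleFirstFlushSize() are hypothetical names.
*/
typedef struct ExampleWriterPos ExampleWriterPos;
struct ExampleWriterPos {
  int iBufStart;                  /* Buffer offset of the first byte to write */
  long long iWriteOff;            /* File offset that maps to buffer offset 0 */
};

static ExampleWriterPos exampleWriterPos(long long iStart, int nBuf){
  ExampleWriterPos pos;
  pos.iBufStart = (int)(iStart % nBuf);
  pos.iWriteOff = iStart - pos.iBufStart;
  return pos;
}

/* Bytes accepted before the buffer fills and is flushed the first time */
static int exampleFirstFlushSize(long long iStart, int nBuf){
  return nBuf - (int)(iStart % nBuf);
}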

/*
** Flush any buffered data to disk and clean up the PMA-writer object.
** The results of using the PMA-writer after this call are undefined.
** Return SQLITE_OK if flushing the buffered data succeeds or is not
** required. Otherwise, return an SQLite error code.
**
** Before returning, set *piEof to the offset immediately following the
** last byte written to the file.
*/
static int vdbePmaWriterFinish(PmaWriter *p, i64 *piEof){
  int rc;
  if( p->eFWErr==0 && ALWAYS(p->aBuffer) && p->iBufEnd>p->iBufStart ){
    p->eFWErr = sqlite3OsWrite(p->pFd,
        &p->aBuffer[p->iBufStart], p->iBufEnd - p->iBufStart,
        p->iWriteOff + p->iBufStart
    );
  }
  *piEof = (p->iWriteOff + p->iBufEnd);
  sqlite3_free(p->aBuffer);
  rc = p->eFWErr;
  memset(p, 0, sizeof(PmaWriter));
  return rc;
}

/*
** Write value iVal encoded as a varint to the PMA. This function returns
** nothing. As with vdbePmaWriteBlob(), any error is recorded in
** p->eFWErr and reported by vdbePmaWriterFinish().
*/
static void vdbePmaWriteVarint(PmaWriter *p, u64 iVal){
  int nByte;
  u8 aByte[10];
  nByte = sqlite3PutVarint(aByte, iVal);
  vdbePmaWriteBlob(p, aByte, nByte);
}

/*
** Write the current contents of in-memory linked-list pList to a level-0
** PMA in the temp file belonging to sub-task pTask. Return SQLITE_OK if
** successful, or an SQLite error code otherwise.
**
** The format of a PMA is:
**
**     * A varint. This varint contains the total number of bytes of content
**       in the PMA (not including the varint itself).
**
**     * One or more records packed end-to-end in order of ascending keys.
**       Each record consists of a varint followed by a blob of data (the
**       key). The varint is the number of bytes in the blob of data.
*/
static int vdbeSorterListToPMA(SortSubtask *pTask, SorterList *pList){
  sqlite3 *db = pTask->pSorter->db;
  int rc = SQLITE_OK;             /* Return code */
  PmaWriter writer;               /* Object used to write to the file */

#ifdef SQLITE_DEBUG
  /* Set iSz to the expected size of file pTask->file after writing the PMA.
  ** This is used by an assert() statement at the end of this function. */
  i64 iSz = pList->szPMA + sqlite3VarintLen(pList->szPMA) + pTask->file.iEof;
#endif

  vdbeSorterWorkDebug(pTask, "enter");
  memset(&writer, 0, sizeof(PmaWriter));
  assert( pList->szPMA>0 );

  /* If the first temporary PMA file has not been opened, open it now. */
  if( pTask->file.pFd==0 ){
    rc = vdbeSorterOpenTempFile(db, 0, &pTask->file.pFd);
    assert( rc!=SQLITE_OK || pTask->file.pFd );
    assert( pTask->file.iEof==0 );
    assert( pTask->nPMA==0 );
  }

  /* Try to get the file to memory map */
  if( rc==SQLITE_OK ){
    vdbeSorterExtendFile(db, pTask->file.pFd, pTask->file.iEof+pList->szPMA+9);
  }

  /* Sort the list */
  if( rc==SQLITE_OK ){
    rc = vdbeSorterSort(pTask, pList);
  }

  if( rc==SQLITE_OK ){
    SorterRecord *p;
    SorterRecord *pNext = 0;

    vdbePmaWriterInit(pTask->file.pFd, &writer, pTask->pSorter->pgsz,
                      pTask->file.iEof);
    pTask->nPMA++;
    vdbePmaWriteVarint(&writer, pList->szPMA);
    for(p=pList->pList; p; p=pNext){
      pNext = p->u.pNext;
      vdbePmaWriteVarint(&writer, p->nVal);
      vdbePmaWriteBlob(&writer, SRVAL(p), p->nVal);
      if( pList->aMemory==0 ) sqlite3_free(p);
    }
    pList->pList = p;
    rc = vdbePmaWriterFinish(&writer, &pTask->file.iEof);
  }

  vdbeSorterWorkDebug(pTask, "exit");
  assert( rc!=SQLITE_OK || pList->pList==0 );
  assert( rc!=SQLITE_OK || pTask->file.iEof==iSz );
  return rc;
}
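
/*
** Illustrative sketch only; not part of the sorter. It computes how many
** bytes a PMA occupies under the layout described above: each record
** costs its blob size plus a length varint, and the PMA as a whole is
** prefixed by one more varint holding the content size.
** exampleVarintLen() and examplePmaBytes() are hypothetical names.
** exampleVarintLen() is a stand-in that assumes the usual SQLite varint
** (7 bits per byte, at most 9 bytes); it is not the library routine.
*/
static int exampleVarintLen(unsigned long long v){
  int i;
  for(i=1; i<9 && (v >>= 7)!=0; i++){}
  return i;
}

/* Total bytes written for a PMA holding nRec blobs of the given sizes */
static long long examplePmaBytes(const int *aSize, int nRec){
  long long szPMA = 0;            /* Content bytes, excluding the prefix */
  int i;
  for(i=0; i<nRec; i++){
    szPMA += aSize[i] + exampleVarintLen((unsigned long long)aSize[i]);
  }
  return szPMA + exampleVarintLen((unsigned long long)szPMA);
}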

/*
** Advance the MergeEngine to its next entry.
** Set *pbEof to true if there is no next entry because
** the MergeEngine has reached the end of all its inputs.
**
** Return SQLITE_OK if successful or an error code if an error occurs.
*/
static int vdbeMergeEngineStep(
  MergeEngine *pMerger,      /* The merge engine to advance to the next row */
  int *pbEof                 /* Set TRUE at EOF. Set false for more content */
){
  int rc;
  int iPrev = pMerger->aTree[1];  /* Index of PmaReader to advance */
  SortSubtask *pTask = pMerger->pTask;

  /* Advance the current PmaReader */
  rc = vdbePmaReaderNext(&pMerger->aReadr[iPrev]);

  /* Update contents of aTree[] */
  if( rc==SQLITE_OK ){
    int i;                      /* Index of aTree[] to recalculate */
    PmaReader *pReadr1;         /* First PmaReader to compare */
    PmaReader *pReadr2;         /* Second PmaReader to compare */
    int bCached = 0;

    /* Find the first two PmaReaders to compare. The one that was just
    ** advanced (iPrev) and the one next to it in the array. */
    pReadr1 = &pMerger->aReadr[(iPrev & 0xFFFE)];
    pReadr2 = &pMerger->aReadr[(iPrev | 0x0001)];

    for(i=(pMerger->nTree+iPrev)/2; i>0; i=i/2){
      /* Compare pReadr1 and pReadr2. Store the result in variable iRes. */
      int iRes;
      if( pReadr1->pFd==0 ){
        iRes = +1;
      }else if( pReadr2->pFd==0 ){
        iRes = -1;
      }else{
        iRes = pTask->xCompare(pTask, &bCached,
            pReadr1->aKey, pReadr1->nKey, pReadr2->aKey, pReadr2->nKey
        );
      }

      /* If pReadr1 contained the smaller value, set aTree[i] to its index.
      ** Then set pReadr2 to the next PmaReader to compare to pReadr1. In this
      ** case there is no cache of pReadr2 in pTask->pUnpacked, so set
      ** pKey2 to point to the record belonging to pReadr2.
      **
      ** Alternatively, if pReadr2 contains the smaller of the two values,
      ** set aTree[i] to its index and update pReadr1. If vdbeSorterCompare()
      ** was actually called above, then pTask->pUnpacked now contains
      ** a value equivalent to pReadr2. So set pKey2 to NULL to prevent
      ** vdbeSorterCompare() from decoding pReadr2 again.
      **
      ** If the two values were equal, then the value from the oldest
      ** PMA should be considered smaller. The VdbeSorter.aReadr[] array
      ** is sorted from oldest to newest, so pReadr1 contains older values
      ** than pReadr2 iff (pReadr1<pReadr2). */
      if( iRes<0 || (iRes==0 && pReadr1<pReadr2) ){
        pMerger->aTree[i] = (int)(pReadr1 - pMerger->aReadr);
        pReadr2 = &pMerger->aReadr[ pMerger->aTree[i ^ 0x0001] ];
        bCached = 0;
      }else{
        if( pReadr1->pFd ) bCached = 0;
        pMerger->aTree[i] = (int)(pReadr2 - pMerger->aReadr);
        pReadr1 = &pMerger->aReadr[ pMerger->aTree[i ^ 0x0001] ];
      }
    }
    *pbEof = (pMerger->aReadr[pMerger->aTree[1]].pFd==0);
  }

  return (rc==SQLITE_OK ? pTask->pUnpacked->errCode : rc);
}
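
/*
** Illustrative sketch only; not part of the sorter. It shows the same
** tournament-tree update on four in-memory integer arrays: aTree[1]
** names the reader holding the smallest current key, and after that
** reader advances only the comparisons on its path to the root are
** redone. All Example* names are hypothetical. An exhausted reader
** plays the role of a PmaReader whose pFd is 0, and the initial build
** step is an assumption, since the real one is outside this part of
** the file.
*/
#define EXAMPLE_N 4               /* Number of readers; a power of two */

typedef struct ExampleReader ExampleReader;
struct ExampleReader {
  const int *a;                   /* Sorted input values */
  int n;                          /* Number of values in a[] */
  int i;                          /* Index of the current value */
};

typedef struct ExampleMerger ExampleMerger;
struct ExampleMerger {
  ExampleReader aReadr[EXAMPLE_N];
  int aTree[EXAMPLE_N];           /* aTree[0] is unused */
};

static int exampleEof(const ExampleReader *p){ return p->i>=p->n; }

/* Return whichever of readers i1 and i2 holds the smaller current key.
** An exhausted reader always loses. Ties go to the lower index. */
static int exampleWinner(const ExampleMerger *p, int i1, int i2){
  const ExampleReader *p1 = &p->aReadr[i1];
  const ExampleReader *p2 = &p->aReadr[i2];
  if( exampleEof(p1) ) return i2;
  if( exampleEof(p2) ) return i1;
  if( p1->a[p1->i]!=p2->a[p2->i] ){
    return p1->a[p1->i]<p2->a[p2->i] ? i1 : i2;
  }
  return i1<i2 ? i1 : i2;
}

/* Fill in every aTree[] entry, working from the leaves to the root */
static void exampleBuild(ExampleMerger *p){
  int i;
  p->aTree[0] = 0;
  for(i=EXAMPLE_N-1; i>0; i--){
    int i1, i2;
    if( i>=EXAMPLE_N/2 ){
      i1 = (i - EXAMPLE_N/2)*2;   /* Lowest level compares readers directly */
      i2 = i1 + 1;
    }else{
      i1 = p->aTree[i*2];         /* Upper levels compare child winners */
      i2 = p->aTree[i*2+1];
    }
    p->aTree[i] = exampleWinner(p, i1, i2);
  }
}

/* Advance the winning reader, then repair its path up to the root */
static void exampleStep(ExampleMerger *p){
  int iPrev = p->aTree[1];
  int i1 = iPrev & ~1;
  int i2 = iPrev | 1;
  int i;
  p->aReadr[iPrev].i++;
  for(i=(EXAMPLE_N+iPrev)/2; i>0; i=i/2){
    int w = exampleWinner(p, i1, i2);
    p->aTree[i] = w;
    /* The next opponent is the winner stored in the sibling node */
    if( w==i1 ){ i2 = p->aTree[i^1]; }else{ i1 = p->aTree[i^1]; }
  }
}

/* Merge all readers into aOut[]. Return the number of values written. */
static int exampleMergeAll(ExampleMerger *p, int *aOut, int nOut){
  int n = 0;
  exampleBuild(p);
  while( n<nOut && !exampleEof(&p->aReadr[p->aTree[1]]) ){
    const ExampleReader *pR = &p->aReadr[p->aTree[1]];
    aOut[n++] = pR->a[pR->i];
    exampleStep(p);
  }
  return n;
}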

#if SQLITE_MAX_WORKER_THREADS>0
/*
** The main routine for background threads that write level-0 PMAs.
*/
static void *vdbeSorterFlushThread(void *pCtx){
  SortSubtask *pTask = (SortSubtask*)pCtx;
  int rc;                         /* Return code */
  assert( pTask->bDone==0 );
  rc = vdbeSorterListToPMA(pTask, &pTask->list);
  pTask->bDone = 1;
  return SQLITE_INT_TO_PTR(rc);
}
#endif /* SQLITE_MAX_WORKER_THREADS>0 */

/*
** Flush the current contents of VdbeSorter.list to a new PMA, possibly
** using a background thread.
*/
static int vdbeSorterFlushPMA(VdbeSorter *pSorter){
#if SQLITE_MAX_WORKER_THREADS==0
  pSorter->bUsePMA = 1;
  return vdbeSorterListToPMA(&pSorter->aTask[0], &pSorter->list);
#else
  int rc = SQLITE_OK;
  int i;
  SortSubtask *pTask = 0;    /* Thread context used to create new PMA */
  int nWorker = (pSorter->nTask-1);

  /* Set the flag to indicate that at least one PMA has been written.
  ** Or will be, anyhow. */
  pSorter->bUsePMA = 1;

  /* Select a sub-task to sort and flush the current list of in-memory
  ** records to disk. If the sorter is running in multi-threaded mode,
  ** round-robin between the first (pSorter->nTask-1) tasks. Except, if
  ** the background thread from a sub-task's previous turn is still running,
  ** skip it. If the first (pSorter->nTask-1) sub-tasks are all still busy,
  ** fall back to using the final sub-task. The first (pSorter->nTask-1)
  ** sub-tasks are preferred as they use background threads - the final
  ** sub-task uses the main thread. */
  for(i=0; i<nWorker; i++){
    int iTest = (pSorter->iPrev + i + 1) % nWorker;
    pTask = &pSorter->aTask[iTest];
    if( pTask->bDone ){
      rc = vdbeSorterJoinThread(pTask);
    }
    if( rc!=SQLITE_OK || pTask->pThread==0 ) break;
  }

  if( rc==SQLITE_OK ){
    if( i==nWorker ){
      /* Use the foreground thread for this operation */
      rc = vdbeSorterListToPMA(&pSorter->aTask[nWorker], &pSorter->list);
    }else{
      /* Launch a background thread for this operation */
      u8 *aMem;
      void *pCtx;

      assert( pTask!=0 );
      assert( pTask->pThread==0 && pTask->bDone==0 );
      assert( pTask->list.pList==0 );
      assert( pTask->list.aMemory==0 || pSorter->list.aMemory!=0 );

      aMem = pTask->list.aMemory;
      pCtx = (void*)pTask;
      pSorter->iPrev = (u8)(pTask - pSorter->aTask);
      pTask->list = pSorter->list;
      pSorter->list.pList = 0;
      pSorter->list.szPMA = 0;
      if( aMem ){
        pSorter->list.aMemory = aMem;
        pSorter->nMemory = sqlite3MallocSize(aMem);
      }else if( pSorter->list.aMemory ){
        pSorter->list.aMemory = sqlite3Malloc(pSorter->nMemory);
        if( !pSorter->list.aMemory ) return SQLITE_NOMEM_BKPT;
      }

      rc = vdbeSorterCreateThread(pTask, vdbeSorterFlushThread, pCtx);
    }
  }

  return rc;
#endif /* SQLITE_MAX_WORKER_THREADS!=0 */
}
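
/*
** Illustrative sketch only; not part of the sorter. It isolates the
** round-robin choice made above: start with the worker after the one
** used last time, skip any that are busy, and report nWorker when none
** is free so that the caller falls back to the foreground task.
** examplePickWorker() and the aBusy[] flag array are hypothetical. The
** real code also joins finished threads as it scans, which is omitted.
*/
static int examplePickWorker(const int *aBusy, int nWorker, int iPrev){
  int i;
  for(i=0; i<nWorker; i++){
    int iTest = (iPrev + i + 1) % nWorker;
    if( aBusy[iTest]==0 ) return iTest;
  }
  return nWorker;                 /* All workers busy, or there are none */
}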
1781
1782/*
dan5134d132011-09-02 10:31:111783** Add a record to the sorter.
dana20fde62011-07-12 14:28:051784*/
dan5134d132011-09-02 10:31:111785int sqlite3VdbeSorterWrite(
drhac4f0032014-04-02 18:58:491786 const VdbeCursor *pCsr, /* Sorter cursor */
dan5134d132011-09-02 10:31:111787 Mem *pVal /* Memory cell containing record */
1788){
drhc960dcb2015-11-20 19:22:011789 VdbeSorter *pSorter;
dan7733a4d2011-09-02 18:03:161790 int rc = SQLITE_OK; /* Return Code */
1791 SorterRecord *pNew; /* New list element */
dan69719522014-03-27 19:25:021792 int bFlush; /* True to flush contents of memory to PMA */
drh568643f2023-10-06 12:15:011793 i64 nReq; /* Bytes of memory required */
1794 i64 nPMA; /* Bytes of PMA space required */
dan57a14092015-03-26 11:55:031795 int t; /* serial type of first record field */
1796
drhc960dcb2015-11-20 19:22:011797 assert( pCsr->eCurType==CURTYPE_SORTER );
1798 pSorter = pCsr->uc.pSorter;
drh02a95eb2020-01-28 20:27:421799 getVarint32NR((const u8*)&pVal->z[1], t);
dan57a14092015-03-26 11:55:031800 if( t>0 && t<10 && t!=7 ){
1801 pSorter->typeMask &= SORTER_TYPE_INTEGER;
dana9d91112015-03-28 19:56:411802 }else if( t>10 && (t & 0x01) ){
dan57a14092015-03-26 11:55:031803 pSorter->typeMask &= SORTER_TYPE_TEXT;
1804 }else{
1805 pSorter->typeMask = 0;
1806 }
dan69719522014-03-27 19:25:021807
dan5134d132011-09-02 10:31:111808 assert( pSorter );
drh2a5d9902011-08-26 00:34:451809
dan69719522014-03-27 19:25:021810 /* Figure out whether or not the current contents of memory should be
1811 ** flushed to a PMA before continuing. If so, do so.
1812 **
1813 ** If using the single large allocation mode (pSorter->aMemory!=0), then
1814 ** flush the contents of memory to a new PMA if (a) at least one value is
1815 ** already in memory and (b) the new value will not fit in memory.
larrybrbc917382023-06-07 08:40:311816 **
dan69719522014-03-27 19:25:021817 ** Or, if using separate allocations for each record, flush the contents
1818 ** of memory to a PMA if either of the following are true:
dan5134d132011-09-02 10:31:111819 **
larrybrbc917382023-06-07 08:40:311820 ** * The total memory allocated for the in-memory list is greater
dan5134d132011-09-02 10:31:111821 ** than (page-size * cache-size), or
1822 **
larrybrbc917382023-06-07 08:40:311823 ** * The total memory allocated for the in-memory list is greater
dan5134d132011-09-02 10:31:111824 ** than (page-size * 10) and sqlite3HeapNearlyFull() returns true.
1825 */
dan69719522014-03-27 19:25:021826 nReq = pVal->n + sizeof(SorterRecord);
1827 nPMA = pVal->n + sqlite3VarintLen(pVal->n);
dane7c84cc2014-03-29 09:34:451828 if( pSorter->mxPmaSize ){
dan82a8a9f2014-04-12 19:34:441829 if( pSorter->list.aMemory ){
dane7c84cc2014-03-29 09:34:451830 bFlush = pSorter->iMemory && (pSorter->iMemory+nReq) > pSorter->mxPmaSize;
1831 }else{
1832 bFlush = (
dan82a8a9f2014-04-12 19:34:441833 (pSorter->list.szPMA > pSorter->mxPmaSize)
1834 || (pSorter->list.szPMA > pSorter->mnPmaSize && sqlite3HeapNearlyFull())
dane7c84cc2014-03-29 09:34:451835 );
1836 }
1837 if( bFlush ){
      rc = vdbeSorterFlushPMA(pSorter);
      pSorter->list.szPMA = 0;
      pSorter->iMemory = 0;
      assert( rc!=SQLITE_OK || pSorter->list.pList==0 );
    }
  }

  pSorter->list.szPMA += nPMA;
  if( nPMA>pSorter->mxKeysize ){
    pSorter->mxKeysize = nPMA;
  }

  if( pSorter->list.aMemory ){
    int nMin = pSorter->iMemory + nReq;

    if( nMin>pSorter->nMemory ){
      u8 *aNew;
      sqlite3_int64 nNew = 2 * (sqlite3_int64)pSorter->nMemory;
      int iListOff = -1;
      if( pSorter->list.pList ){
        iListOff = (u8*)pSorter->list.pList - pSorter->list.aMemory;
      }
      while( nNew < nMin ) nNew = nNew*2;
      if( nNew > pSorter->mxPmaSize ) nNew = pSorter->mxPmaSize;
      if( nNew < nMin ) nNew = nMin;
      aNew = sqlite3Realloc(pSorter->list.aMemory, nNew);
      if( !aNew ) return SQLITE_NOMEM_BKPT;
      if( iListOff>=0 ){
        pSorter->list.pList = (SorterRecord*)&aNew[iListOff];
      }
      pSorter->list.aMemory = aNew;
      pSorter->nMemory = nNew;
    }

    pNew = (SorterRecord*)&pSorter->list.aMemory[pSorter->iMemory];
    pSorter->iMemory += ROUND8(nReq);
    if( pSorter->list.pList ){
      pNew->u.iNext = (int)((u8*)(pSorter->list.pList) - pSorter->list.aMemory);
    }
  }else{
    pNew = (SorterRecord *)sqlite3Malloc(nReq);
    if( pNew==0 ){
      return SQLITE_NOMEM_BKPT;
    }
    pNew->u.pNext = pSorter->list.pList;
  }

  memcpy(SRVAL(pNew), pVal->z, pVal->n);
  pNew->nVal = pVal->n;
  pSorter->list.pList = pNew;

  return rc;
}
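
/*
** Illustrative sketch, not part of SQLite. The growth policy used above
** (double the buffer until it can hold the new record, cap the result at a
** maximum, but never go below the required minimum) and the offset-based
** fix-up of a pointer into the buffer across realloc() can be shown in
** isolation. DemoBuf, demoGrowSize() and demoGrow() are hypothetical names
** introduced for this example only, and plain realloc() stands in for
** sqlite3Realloc().
*/
```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the sorter's in-memory list state. */
typedef struct DemoBuf DemoBuf;
struct DemoBuf {
  unsigned char *aMem;    /* Backing allocation */
  int64_t nMem;           /* Current size of aMem in bytes */
  unsigned char *pHead;   /* Points into aMem, or NULL if the list is empty */
};

/* Choose a new buffer size: double nCur until it reaches nMin, cap the
** result at mxSize, but never return less than nMin. */
static int64_t demoGrowSize(int64_t nCur, int64_t nMin, int64_t mxSize){
  int64_t nNew = 2*nCur;
  assert( nCur>0 );
  while( nNew<nMin ) nNew = nNew*2;
  if( nNew>mxSize ) nNew = mxSize;
  if( nNew<nMin ) nNew = nMin;
  return nNew;
}

/* Grow p->aMem so that it holds at least nMin bytes. Because realloc() may
** move the allocation, pHead is saved as an offset before the call and
** rebuilt from the new base address afterwards. Returns 0 on success or
** -1 if the allocation fails (in which case *p is left unchanged). */
static int demoGrow(DemoBuf *p, int64_t nMin, int64_t mxSize){
  int64_t nNew;
  int64_t iHeadOff = -1;
  unsigned char *aNew;
  if( nMin<=p->nMem ) return 0;
  nNew = demoGrowSize(p->nMem, nMin, mxSize);
  if( p->pHead ) iHeadOff = p->pHead - p->aMem;
  aNew = realloc(p->aMem, (size_t)nNew);
  if( aNew==0 ) return -1;
  if( iHeadOff>=0 ) p->pHead = &aNew[iHeadOff];
  p->aMem = aNew;
  p->nMem = nNew;
  return 0;
}
```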

/*
** Read keys from pIncr->pMerger and populate pIncr->aFile[1]. The format
** of the data stored in aFile[1] is the same as that used by regular PMAs,
** except that the number-of-bytes varint is omitted from the start.
*/
static int vdbeIncrPopulate(IncrMerger *pIncr){
  int rc = SQLITE_OK;
  int rc2;
  i64 iStart = pIncr->iStartOff;
  SorterFile *pOut = &pIncr->aFile[1];
  SortSubtask *pTask = pIncr->pTask;
  MergeEngine *pMerger = pIncr->pMerger;
  PmaWriter writer;
  assert( pIncr->bEof==0 );

  vdbeSorterPopulateDebug(pTask, "enter");

  vdbePmaWriterInit(pOut->pFd, &writer, pTask->pSorter->pgsz, iStart);
  while( rc==SQLITE_OK ){
    int dummy;
    PmaReader *pReader = &pMerger->aReadr[ pMerger->aTree[1] ];
    int nKey = pReader->nKey;
    i64 iEof = writer.iWriteOff + writer.iBufEnd;

    /* Check if the output file is full or if the input has been exhausted.
    ** In either case exit the loop. */
    if( pReader->pFd==0 ) break;
    if( (iEof + nKey + sqlite3VarintLen(nKey))>(iStart + pIncr->mxSz) ) break;

    /* Write the next key to the output. */
    vdbePmaWriteVarint(&writer, nKey);
    vdbePmaWriteBlob(&writer, pReader->aKey, nKey);
    assert( pIncr->pMerger->pTask==pTask );
    rc = vdbeMergeEngineStep(pIncr->pMerger, &dummy);
  }

  rc2 = vdbePmaWriterFinish(&writer, &pOut->iEof);
  if( rc==SQLITE_OK ) rc = rc2;
  vdbeSorterPopulateDebug(pTask, "exit");
  return rc;
}
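
/*
** Illustrative sketch, not part of SQLite. The "is the output region full"
** test in the loop above asks whether the next record (a length prefix
** followed by nKey bytes of key data) still fits in the region
** [iStart, iStart+mxSz). demoVarintLen() and demoKeyFits() are hypothetical
** names. demoVarintLen() assumes a 7-bits-per-byte length encoding and is
** only a stand-in for sqlite3VarintLen(); it is not claimed to match it
** for every 64-bit input.
*/
```c
#include <assert.h>
#include <stdint.h>

/* Number of bytes needed to store v using 7 payload bits per byte. */
static int demoVarintLen(uint64_t v){
  int n = 1;
  while( (v >>= 7)!=0 ) n++;
  return n;
}

/* Return 1 if a record of nKey bytes, plus its length prefix, can be
** appended at offset iEof without running past iStart+mxSz. Return 0
** otherwise. */
static int demoKeyFits(int64_t iEof, int nKey, int64_t iStart, int64_t mxSz){
  assert( nKey>=0 );
  return (iEof + nKey + demoVarintLen((uint64_t)nKey)) <= (iStart + mxSz);
}
```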

#if SQLITE_MAX_WORKER_THREADS>0
/*
** The main routine for background threads that populate aFile[1] of
** multi-threaded IncrMerger objects.
*/
static void *vdbeIncrPopulateThread(void *pCtx){
  IncrMerger *pIncr = (IncrMerger*)pCtx;
  void *pRet = SQLITE_INT_TO_PTR( vdbeIncrPopulate(pIncr) );
  pIncr->pTask->bDone = 1;
  return pRet;
}

/*
** Launch a background thread to populate aFile[1] of pIncr.
*/
static int vdbeIncrBgPopulate(IncrMerger *pIncr){
  void *p = (void*)pIncr;
  assert( pIncr->bUseThread );
  return vdbeSorterCreateThread(pIncr->pTask, vdbeIncrPopulateThread, p);
}
#endif

/*
** This function is called when the PmaReader corresponding to pIncr has
** finished reading the contents of aFile[0]. Its purpose is to "refill"
** aFile[0] so that the PmaReader can start rereading it from the
** beginning.
**
** For single-threaded objects, this is accomplished by literally reading
** keys from pIncr->pMerger and repopulating aFile[0].
**
** For multi-threaded objects, all that is required is to wait until the
** background thread is finished (if it is not already) and then swap
** aFile[0] and aFile[1] in place. If the contents of pMerger have not
** been exhausted, this function also launches a new background thread
** to populate the new aFile[1].
**
** SQLITE_OK is returned on success, or an SQLite error code otherwise.
*/
static int vdbeIncrSwap(IncrMerger *pIncr){
  int rc = SQLITE_OK;

#if SQLITE_MAX_WORKER_THREADS>0
  if( pIncr->bUseThread ){
    rc = vdbeSorterJoinThread(pIncr->pTask);

    if( rc==SQLITE_OK ){
      SorterFile f0 = pIncr->aFile[0];
      pIncr->aFile[0] = pIncr->aFile[1];
      pIncr->aFile[1] = f0;
    }

    if( rc==SQLITE_OK ){
      if( pIncr->aFile[0].iEof==pIncr->iStartOff ){
        pIncr->bEof = 1;
      }else{
        rc = vdbeIncrBgPopulate(pIncr);
      }
    }
  }else
#endif
  {
    rc = vdbeIncrPopulate(pIncr);
    pIncr->aFile[0] = pIncr->aFile[1];
    if( pIncr->aFile[0].iEof==pIncr->iStartOff ){
      pIncr->bEof = 1;
    }
  }

  return rc;
}
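
/*
** Illustrative sketch, not part of SQLite. The swap-then-refill pattern of
** the multi-threaded branch above can be modelled sequentially: the reader
** consumes aFile[0] while a producer fills aFile[1]; on each swap the two
** change roles, and an empty freshly swapped-in file signals end of input.
** All names here (DemoFile, DemoIncr, demoPopulate(), demoSwap(), nLeft)
** are hypothetical, and the producer runs inline rather than in a
** background thread.
*/
```c
#include <assert.h>

typedef struct DemoFile { int iEof; } DemoFile;

typedef struct DemoIncr {
  DemoFile aFile[2];   /* aFile[0] is being read, aFile[1] is being filled */
  int iStartOff;       /* Offset at which each file's data begins */
  int bEof;            /* Set once the swapped-in file is empty */
  int nLeft;           /* Units of input the producer has yet to emit */
  int mxSz;            /* Maximum units written per populate call */
} DemoIncr;

/* Producer: write up to mxSz units of the remaining input to aFile[1]. */
static void demoPopulate(DemoIncr *p){
  int n;
  assert( p->mxSz>0 );
  n = (p->nLeft<p->mxSz) ? p->nLeft : p->mxSz;
  p->aFile[1].iEof = p->iStartOff + n;
  p->nLeft -= n;
}

/* Swap the two files. If the file that is now aFile[0] is empty, flag
** EOF. Otherwise start refilling the new aFile[1]. */
static void demoSwap(DemoIncr *p){
  DemoFile f0 = p->aFile[0];
  p->aFile[0] = p->aFile[1];
  p->aFile[1] = f0;
  if( p->aFile[0].iEof==p->iStartOff ){
    p->bEof = 1;
  }else{
    demoPopulate(p);  /* a threaded version would launch this in background */
  }
}
```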

/*
** Allocate a new IncrMerger object to read data from pMerger and write a
** pointer to it to *ppOut.
**
** If an OOM condition is encountered, set *ppOut to NULL and return
** SQLITE_NOMEM. In this case free the pMerger argument before returning.
*/
static int vdbeIncrMergerNew(
  SortSubtask *pTask,     /* The thread that will be using the new IncrMerger */
  MergeEngine *pMerger,   /* The MergeEngine that the IncrMerger will control */
  IncrMerger **ppOut      /* Write the new IncrMerger here */
){
  int rc = SQLITE_OK;
  IncrMerger *pIncr = *ppOut = (IncrMerger*)
       (sqlite3FaultSim(100) ? 0 : sqlite3MallocZero(sizeof(*pIncr)));
  if( pIncr ){
    pIncr->pMerger = pMerger;
    pIncr->pTask = pTask;
    pIncr->mxSz = MAX(pTask->pSorter->mxKeysize+9,pTask->pSorter->mxPmaSize/2);
    pTask->file2.iEof += pIncr->mxSz;
  }else{
    vdbeMergeEngineFree(pMerger);
    rc = SQLITE_NOMEM_BKPT;
  }
  assert( *ppOut!=0 || rc!=SQLITE_OK );
  return rc;
}

#if SQLITE_MAX_WORKER_THREADS>0
/*
** Set the "use-threads" flag on object pIncr.
*/
static void vdbeIncrMergerSetThreads(IncrMerger *pIncr){
  pIncr->bUseThread = 1;
  pIncr->pTask->file2.iEof -= pIncr->mxSz;
}
#endif /* SQLITE_MAX_WORKER_THREADS>0 */



/*
** Recompute pMerger->aTree[iOut] by comparing the next keys on the
** two PmaReaders that feed that entry. Neither of the PmaReaders
** is advanced. This routine merely does the comparison.
*/
static void vdbeMergeEngineCompare(
  MergeEngine *pMerger,  /* Merge engine containing PmaReaders to compare */
  int iOut               /* Store the result in pMerger->aTree[iOut] */
){
  int i1;
  int i2;
  int iRes;
  PmaReader *p1;
  PmaReader *p2;

  assert( iOut<pMerger->nTree && iOut>0 );

  if( iOut>=(pMerger->nTree/2) ){
    i1 = (iOut - pMerger->nTree/2) * 2;
    i2 = i1 + 1;
  }else{
    i1 = pMerger->aTree[iOut*2];
    i2 = pMerger->aTree[iOut*2+1];
  }

  p1 = &pMerger->aReadr[i1];
  p2 = &pMerger->aReadr[i2];

  if( p1->pFd==0 ){
    iRes = i2;
  }else if( p2->pFd==0 ){
    iRes = i1;
  }else{
    SortSubtask *pTask = pMerger->pTask;
    int bCached = 0;
    int res;
    assert( pTask->pUnpacked!=0 );  /* from vdbeSortSubtaskMain() */
    res = pTask->xCompare(
        pTask, &bCached, p1->aKey, p1->nKey, p2->aKey, p2->nKey
    );
    if( res<=0 ){
      iRes = i1;
    }else{
      iRes = i2;
    }
  }

  pMerger->aTree[iOut] = iRes;
}
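
/*
** Illustrative sketch, not part of SQLite. The index arithmetic above
** treats aTree[] as a tournament tree: entries in the upper half of the
** array compare two adjacent inputs directly, entries in the lower half
** compare the winners recorded by their two children, and aTree[1] ends up
** naming the input that holds the smallest key. The sketch below uses plain
** int keys and an "exhausted" flag in place of PmaReader objects. DemoMerge,
** demoCompare() and demoWinner() are hypothetical names, and the fixed size
** of 4 inputs is an assumption made to keep the example short.
*/
```c
#include <assert.h>

#define DEMO_NTREE 4   /* Number of inputs. Must be a power of two. */

typedef struct DemoMerge {
  int aKey[DEMO_NTREE];    /* Current key of each input */
  int aDone[DEMO_NTREE];   /* Non-zero once input i is exhausted */
  int aTree[DEMO_NTREE];   /* aTree[1] is the index of the overall winner */
} DemoMerge;

/* Recompute aTree[iOut] from the two entries that feed it. */
static void demoCompare(DemoMerge *p, int iOut){
  int i1, i2;
  assert( iOut>0 && iOut<DEMO_NTREE );
  if( iOut>=DEMO_NTREE/2 ){
    i1 = (iOut - DEMO_NTREE/2) * 2;   /* leaf level: adjacent inputs */
    i2 = i1 + 1;
  }else{
    i1 = p->aTree[iOut*2];            /* inner level: winners of children */
    i2 = p->aTree[iOut*2+1];
  }
  if( p->aDone[i1] ){
    p->aTree[iOut] = i2;
  }else if( p->aDone[i2] ){
    p->aTree[iOut] = i1;
  }else{
    p->aTree[iOut] = (p->aKey[i1]<=p->aKey[i2]) ? i1 : i2;
  }
}

/* Rebuild the whole tree bottom-up and return the winning input index. */
static int demoWinner(DemoMerge *p){
  int i;
  for(i=DEMO_NTREE-1; i>0; i--) demoCompare(p, i);
  return p->aTree[1];
}
```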

/*
** Allowed values for the eMode parameter to vdbeMergeEngineInit()
** and vdbePmaReaderIncrMergeInit().
**
** Only INCRINIT_NORMAL is valid in single-threaded builds (when
** SQLITE_MAX_WORKER_THREADS==0). The other values are only used
** when there are one or more separate worker threads.
*/
#define INCRINIT_NORMAL 0
#define INCRINIT_TASK   1
#define INCRINIT_ROOT   2

/*
** Forward reference required as the vdbeMergeEngineInit() and
** vdbePmaReaderIncrInit() routines are called mutually recursively when
** building a merge tree.
*/
static int vdbePmaReaderIncrInit(PmaReader *pReadr, int eMode);

/*
** Initialize the MergeEngine object passed as the second argument. Once this
** function returns, the first key of merged data may be read from the
** MergeEngine object in the usual fashion.
**
** If argument eMode is INCRINIT_ROOT, then it is assumed that any IncrMerger
** objects attached to the PmaReader objects that the merger reads from have
** already been populated, but that they have not yet populated aFile[0] and
** set the PmaReader objects up to read from it. In this case all that is
** required is to call vdbePmaReaderNext() on each PmaReader to point it at
** its first key.
**
** Otherwise, if eMode is any value other than INCRINIT_ROOT, then use
** vdbePmaReaderIncrInit() to initialize each PmaReader that feeds data
** to pMerger.
**
** SQLITE_OK is returned if successful, or an SQLite error code otherwise.
*/
static int vdbeMergeEngineInit(
  SortSubtask *pTask,             /* Thread that will run pMerger */
  MergeEngine *pMerger,           /* MergeEngine to initialize */
  int eMode                       /* One of the INCRINIT_XXX constants */
){
  int rc = SQLITE_OK;             /* Return code */
  int i;                          /* For looping over PmaReader objects */
  int nTree;                      /* Number of subtrees to merge */

  /* Failure to allocate the merge engine would have been detected prior to
  ** invoking this routine */
  assert( pMerger!=0 );

  /* eMode is always INCRINIT_NORMAL in single-threaded mode */
  assert( SQLITE_MAX_WORKER_THREADS>0 || eMode==INCRINIT_NORMAL );

  /* Verify that the MergeEngine is assigned to a single thread */
  assert( pMerger->pTask==0 );
  pMerger->pTask = pTask;

  nTree = pMerger->nTree;
  for(i=0; i<nTree; i++){
    if( SQLITE_MAX_WORKER_THREADS>0 && eMode==INCRINIT_ROOT ){
      /* PmaReaders should normally be initialized in order, as if they are
      ** reading from the same temp file this makes for more linear file IO.
      ** However, in the INCRINIT_ROOT case, if PmaReader aReadr[nTask-1] is
      ** in use it will block the vdbePmaReaderNext() call while it uses
      ** the main thread to fill its buffer. So calling vdbePmaReaderNext()
      ** on this PmaReader before any of the multi-threaded PmaReaders takes
      ** better advantage of multi-processor hardware. */
      rc = vdbePmaReaderNext(&pMerger->aReadr[nTree-i-1]);
    }else{
      rc = vdbePmaReaderIncrInit(&pMerger->aReadr[i], INCRINIT_NORMAL);
    }
    if( rc!=SQLITE_OK ) return rc;
  }

  for(i=pMerger->nTree-1; i>0; i--){
    vdbeMergeEngineCompare(pMerger, i);
  }
  return pTask->pUnpacked->errCode;
}

/*
** The PmaReader passed as the first argument is guaranteed to be an
** incremental-reader (pReadr->pIncr!=0). This function serves to open
** and/or initialize the temp file related fields of the IncrMerger
** object at (pReadr->pIncr).
**
** If argument eMode is set to INCRINIT_NORMAL, then all PmaReaders
** in the sub-tree headed by pReadr are also initialized. Data is then
** loaded into the buffers belonging to pReadr and it is set to point to
** the first key in its range.
**
** If argument eMode is set to INCRINIT_TASK, then pReadr is guaranteed
** to be a multi-threaded PmaReader and this function is being called in a
** background thread. In this case all PmaReaders in the sub-tree are
** initialized as for INCRINIT_NORMAL and the aFile[1] buffer belonging to
** pReadr is populated. However, pReadr itself is not set up to point
** to its first key. A call to vdbePmaReaderNext() is still required to do
** that.
**
** The reason this function does not call vdbePmaReaderNext() immediately
** in the INCRINIT_TASK case is that vdbePmaReaderNext() assumes that it has
** to block on thread (pTask->thread) before accessing aFile[1]. But, since
** this entire function is being run by thread (pTask->thread), that will
** lead to the current background thread attempting to join itself.
**
** Finally, if argument eMode is set to INCRINIT_ROOT, it may be assumed
** that pReadr->pIncr is a multi-threaded IncrMerger object, and that all
** child-trees have already been initialized using IncrInit(INCRINIT_TASK).
** In this case vdbePmaReaderNext() is called on all child PmaReaders and
** the current PmaReader is set to point to the first key in its range.
**
** SQLITE_OK is returned if successful, or an SQLite error code otherwise.
*/
static int vdbePmaReaderIncrMergeInit(PmaReader *pReadr, int eMode){
  int rc = SQLITE_OK;
  IncrMerger *pIncr = pReadr->pIncr;
  SortSubtask *pTask = pIncr->pTask;
  sqlite3 *db = pTask->pSorter->db;

  /* eMode is always INCRINIT_NORMAL in single-threaded mode */
  assert( SQLITE_MAX_WORKER_THREADS>0 || eMode==INCRINIT_NORMAL );

  rc = vdbeMergeEngineInit(pTask, pIncr->pMerger, eMode);

  /* Set up the required files for pIncr. A multi-threaded IncrMerger object
  ** requires two temp files to itself, whereas a single-threaded object
  ** only requires a region of pTask->file2. */
  if( rc==SQLITE_OK ){
    int mxSz = pIncr->mxSz;
#if SQLITE_MAX_WORKER_THREADS>0
    if( pIncr->bUseThread ){
      rc = vdbeSorterOpenTempFile(db, mxSz, &pIncr->aFile[0].pFd);
      if( rc==SQLITE_OK ){
        rc = vdbeSorterOpenTempFile(db, mxSz, &pIncr->aFile[1].pFd);
      }
    }else
#endif
    /*if( !pIncr->bUseThread )*/{
      if( pTask->file2.pFd==0 ){
        assert( pTask->file2.iEof>0 );
        rc = vdbeSorterOpenTempFile(db, pTask->file2.iEof, &pTask->file2.pFd);
        pTask->file2.iEof = 0;
      }
      if( rc==SQLITE_OK ){
        pIncr->aFile[1].pFd = pTask->file2.pFd;
        pIncr->iStartOff = pTask->file2.iEof;
        pTask->file2.iEof += mxSz;
      }
    }
  }

#if SQLITE_MAX_WORKER_THREADS>0
  if( rc==SQLITE_OK && pIncr->bUseThread ){
    /* Use the current thread to populate aFile[1], even though this
    ** PmaReader is multi-threaded. If this is an INCRINIT_TASK object,
    ** then this function is already running in background thread
    ** pIncr->pTask->thread.
    **
    ** If this is the INCRINIT_ROOT object, then it is running in the
    ** main VDBE thread. But that is OK, as that thread cannot return
    ** control to the VDBE or proceed with anything useful until the
    ** first results are ready from this merger object anyway.
    */
    assert( eMode==INCRINIT_ROOT || eMode==INCRINIT_TASK );
    rc = vdbeIncrPopulate(pIncr);
  }
#endif

  if( rc==SQLITE_OK && (SQLITE_MAX_WORKER_THREADS==0 || eMode!=INCRINIT_TASK) ){
    rc = vdbePmaReaderNext(pReadr);
  }

  return rc;
}

#if SQLITE_MAX_WORKER_THREADS>0
/*
** The main routine for vdbePmaReaderIncrMergeInit() operations run in
** background threads.
*/
static void *vdbePmaReaderBgIncrInit(void *pCtx){
  PmaReader *pReader = (PmaReader*)pCtx;
  void *pRet = SQLITE_INT_TO_PTR(
                  vdbePmaReaderIncrMergeInit(pReader,INCRINIT_TASK)
               );
  pReader->pIncr->pTask->bDone = 1;
  return pRet;
}
#endif

/*
** If the PmaReader passed as the first argument is not an incremental-reader
** (if pReadr->pIncr==0), then this function is a no-op. Otherwise, it invokes
** the vdbePmaReaderIncrMergeInit() function with the parameters passed to
** this routine to initialize the incremental merge.
**
** If the IncrMerger object is multi-threaded (IncrMerger.bUseThread==1),
** then a background thread is launched to call vdbePmaReaderIncrMergeInit().
** Or, if the IncrMerger is single-threaded, the same function is called
** using the current thread.
*/
static int vdbePmaReaderIncrInit(PmaReader *pReadr, int eMode){
  IncrMerger *pIncr = pReadr->pIncr;   /* Incremental merger */
  int rc = SQLITE_OK;                  /* Return code */
  if( pIncr ){
#if SQLITE_MAX_WORKER_THREADS>0
    assert( pIncr->bUseThread==0 || eMode==INCRINIT_TASK );
    if( pIncr->bUseThread ){
      void *pCtx = (void*)pReadr;
      rc = vdbeSorterCreateThread(pIncr->pTask, vdbePmaReaderBgIncrInit, pCtx);
    }else
#endif
    {
      rc = vdbePmaReaderIncrMergeInit(pReadr, eMode);
    }
  }
  return rc;
}

/*
** Allocate a new MergeEngine object to merge the contents of nPMA level-0
** PMAs from pTask->file. If no error occurs, set *ppOut to point to
** the new object and return SQLITE_OK. Or, if an error does occur, set *ppOut
** to NULL and return an SQLite error code.
**
** When this function is called, *piOffset is set to the offset of the
** first PMA to read from pTask->file. Assuming no error occurs, it is
** set to the offset immediately following the last byte of the last
** PMA before returning. If an error does occur, then the final value of
** *piOffset is undefined.
*/
static int vdbeMergeEngineLevel0(
  SortSubtask *pTask,             /* Sorter task to read from */
  int nPMA,                       /* Number of PMAs to read */
  i64 *piOffset,                  /* IN/OUT: Read offset in pTask->file */
  MergeEngine **ppOut             /* OUT: New merge-engine */
){
  MergeEngine *pNew;              /* Merge engine to return */
  i64 iOff = *piOffset;
  int i;
  int rc = SQLITE_OK;

  *ppOut = pNew = vdbeMergeEngineNew(nPMA);
  if( pNew==0 ) rc = SQLITE_NOMEM_BKPT;

  for(i=0; i<nPMA && rc==SQLITE_OK; i++){
    i64 nDummy = 0;
    PmaReader *pReadr = &pNew->aReadr[i];
    rc = vdbePmaReaderInit(pTask, &pTask->file, iOff, pReadr, &nDummy);
    iOff = pReadr->iEof;
  }

  if( rc!=SQLITE_OK ){
    vdbeMergeEngineFree(pNew);
    *ppOut = 0;
  }
  *piOffset = iOff;
  return rc;
}

/*
** Return the depth of a tree comprising nPMA PMAs, assuming a fanout of
** SORTER_MAX_MERGE_COUNT. The returned value does not include leaf nodes.
**
** i.e.
**
**   nPMA<=16    -> TreeDepth() == 0
**   nPMA<=256   -> TreeDepth() == 1
**   nPMA<=65536 -> TreeDepth() == 2
*/
static int vdbeSorterTreeDepth(int nPMA){
  int nDepth = 0;
  i64 nDiv = SORTER_MAX_MERGE_COUNT;
  while( nDiv < (i64)nPMA ){
    nDiv = nDiv * SORTER_MAX_MERGE_COUNT;
    nDepth++;
  }
  return nDepth;
}

/*
** pRoot is the root of an incremental merge-tree with depth nDepth (according
** to vdbeSorterTreeDepth()). pLeaf is the iSeq'th leaf to be added to the
** tree, counting from zero. This function adds pLeaf to the tree.
**
** If successful, SQLITE_OK is returned. If an error occurs, an SQLite error
** code is returned and pLeaf is freed.
*/
static int vdbeSorterAddToTree(
  SortSubtask *pTask,             /* Task context */
  int nDepth,                     /* Depth of tree according to TreeDepth() */
  int iSeq,                       /* Sequence number of leaf within tree */
  MergeEngine *pRoot,             /* Root of tree */
  MergeEngine *pLeaf              /* Leaf to add to tree */
){
  int rc = SQLITE_OK;
  int nDiv = 1;
  int i;
  MergeEngine *p = pRoot;
  IncrMerger *pIncr;

  rc = vdbeIncrMergerNew(pTask, pLeaf, &pIncr);

  for(i=1; i<nDepth; i++){
    nDiv = nDiv * SORTER_MAX_MERGE_COUNT;
  }

  for(i=1; i<nDepth && rc==SQLITE_OK; i++){
    int iIter = (iSeq / nDiv) % SORTER_MAX_MERGE_COUNT;
    PmaReader *pReadr = &p->aReadr[iIter];

    if( pReadr->pIncr==0 ){
      MergeEngine *pNew = vdbeMergeEngineNew(SORTER_MAX_MERGE_COUNT);
      if( pNew==0 ){
        rc = SQLITE_NOMEM_BKPT;
      }else{
        rc = vdbeIncrMergerNew(pTask, pNew, &pReadr->pIncr);
      }
    }
    if( rc==SQLITE_OK ){
      p = pReadr->pIncr->pMerger;
      nDiv = nDiv / SORTER_MAX_MERGE_COUNT;
    }
  }

  if( rc==SQLITE_OK ){
    p->aReadr[iSeq % SORTER_MAX_MERGE_COUNT].pIncr = pIncr;
  }else{
    vdbeIncrFree(pIncr);
  }
  return rc;
}
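
/*
** Illustrative sketch, not part of SQLite. The routine above locates the
** slot for leaf number iSeq by reading iSeq as a number in base
** SORTER_MAX_MERGE_COUNT: the most significant digit picks the child of the
** root, the next digit picks the child below that, and the least
** significant digit picks the final slot. The sketch records that sequence
** of slot indexes without allocating any tree nodes. DEMO_FANOUT (assumed
** to be 16) and demoLeafPath() are hypothetical names.
*/
```c
#include <assert.h>

#define DEMO_FANOUT 16   /* Assumed value of SORTER_MAX_MERGE_COUNT */

/* Write the slot index visited at each level, from the root downwards,
** into aSlot[] and return the number of entries written (always nDepth).
** aSlot[] must have room for at least nDepth entries. */
static int demoLeafPath(int nDepth, int iSeq, int *aSlot){
  int nDiv = 1;
  int i;
  int n = 0;
  assert( nDepth>=1 && iSeq>=0 );
  for(i=1; i<nDepth; i++){
    nDiv = nDiv * DEMO_FANOUT;
  }
  for(i=1; i<nDepth; i++){
    aSlot[n++] = (iSeq / nDiv) % DEMO_FANOUT;   /* interior levels */
    nDiv = nDiv / DEMO_FANOUT;
  }
  aSlot[n++] = iSeq % DEMO_FANOUT;              /* slot holding the leaf */
  return n;
}
```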

/*
** This function is called as part of a SorterRewind() operation on a sorter
** that has already written two or more level-0 PMAs to one or more temp
** files. It builds a tree of MergeEngine/IncrMerger/PmaReader objects that
** can be used to incrementally merge all PMAs on disk.
**
** If successful, SQLITE_OK is returned and *ppOut set to point to the
** MergeEngine object at the root of the tree before returning. Or, if an
** error occurs, an SQLite error code is returned and the final value
** of *ppOut is undefined.
*/
static int vdbeSorterMergeTreeBuild(
  VdbeSorter *pSorter,       /* The VDBE cursor that implements the sort */
  MergeEngine **ppOut        /* Write the MergeEngine here */
){
  MergeEngine *pMain = 0;
  int rc = SQLITE_OK;
  int iTask;

#if SQLITE_MAX_WORKER_THREADS>0
  /* If the sorter uses more than one task, then create the top-level
  ** MergeEngine here. This MergeEngine will read data from exactly
  ** one PmaReader per sub-task. */
  assert( pSorter->bUseThreads || pSorter->nTask==1 );
  if( pSorter->nTask>1 ){
    pMain = vdbeMergeEngineNew(pSorter->nTask);
    if( pMain==0 ) rc = SQLITE_NOMEM_BKPT;
  }
#endif

  for(iTask=0; rc==SQLITE_OK && iTask<pSorter->nTask; iTask++){
    SortSubtask *pTask = &pSorter->aTask[iTask];
    assert( pTask->nPMA>0 || SQLITE_MAX_WORKER_THREADS>0 );
    if( SQLITE_MAX_WORKER_THREADS==0 || pTask->nPMA ){
      MergeEngine *pRoot = 0;     /* Root node of tree for this task */
      int nDepth = vdbeSorterTreeDepth(pTask->nPMA);
      i64 iReadOff = 0;

      if( pTask->nPMA<=SORTER_MAX_MERGE_COUNT ){
        rc = vdbeMergeEngineLevel0(pTask, pTask->nPMA, &iReadOff, &pRoot);
      }else{
        int i;
        int iSeq = 0;
        pRoot = vdbeMergeEngineNew(SORTER_MAX_MERGE_COUNT);
        if( pRoot==0 ) rc = SQLITE_NOMEM_BKPT;
        for(i=0; i<pTask->nPMA && rc==SQLITE_OK; i += SORTER_MAX_MERGE_COUNT){
          MergeEngine *pMerger = 0; /* New level-0 PMA merger */
          int nReader;              /* Number of level-0 PMAs to merge */

          nReader = MIN(pTask->nPMA - i, SORTER_MAX_MERGE_COUNT);
          rc = vdbeMergeEngineLevel0(pTask, nReader, &iReadOff, &pMerger);
          if( rc==SQLITE_OK ){
            rc = vdbeSorterAddToTree(pTask, nDepth, iSeq++, pRoot, pMerger);
          }
        }
      }

      if( rc==SQLITE_OK ){
#if SQLITE_MAX_WORKER_THREADS>0
        if( pMain!=0 ){
          rc = vdbeIncrMergerNew(pTask, pRoot, &pMain->aReadr[iTask].pIncr);
        }else
#endif
        {
          assert( pMain==0 );
          pMain = pRoot;
        }
      }else{
        vdbeMergeEngineFree(pRoot);
      }
    }
  }

  if( rc!=SQLITE_OK ){
    vdbeMergeEngineFree(pMain);
    pMain = 0;
  }
  *ppOut = pMain;
  return rc;
}
2508
2509/*
dandb30fc42014-04-16 17:41:222510** This function is called as part of an sqlite3VdbeSorterRewind() operation
2511** on a sorter that has written two or more PMAs to temporary files. It sets
2512** up either VdbeSorter.pMerger (for single threaded sorters) or pReader
2513** (for multi-threaded sorters) so that it can be used to iterate through
2514** all records stored in the sorter.
2515**
2516** SQLITE_OK is returned if successful, or an SQLite error code otherwise.
dand30ab3d2014-04-09 20:04:172517*/
static int vdbeSorterSetupMerge(VdbeSorter *pSorter){
  int rc;                         /* Return code */
  SortSubtask *pTask0 = &pSorter->aTask[0];
  MergeEngine *pMain = 0;
#if SQLITE_MAX_WORKER_THREADS
  sqlite3 *db = pTask0->pSorter->db;
  int i;
  SorterCompare xCompare = vdbeSorterGetCompare(pSorter);
  for(i=0; i<pSorter->nTask; i++){
    pSorter->aTask[i].xCompare = xCompare;
  }
#endif

  rc = vdbeSorterMergeTreeBuild(pSorter, &pMain);
  if( rc==SQLITE_OK ){
#if SQLITE_MAX_WORKER_THREADS
    assert( pSorter->bUseThreads==0 || pSorter->nTask>1 );
    if( pSorter->bUseThreads ){
      int iTask;
      PmaReader *pReadr = 0;
      SortSubtask *pLast = &pSorter->aTask[pSorter->nTask-1];
      rc = vdbeSortAllocUnpacked(pLast);
      if( rc==SQLITE_OK ){
        pReadr = (PmaReader*)sqlite3DbMallocZero(db, sizeof(PmaReader));
        pSorter->pReader = pReadr;
        if( pReadr==0 ) rc = SQLITE_NOMEM_BKPT;
      }
      if( rc==SQLITE_OK ){
        rc = vdbeIncrMergerNew(pLast, pMain, &pReadr->pIncr);
        if( rc==SQLITE_OK ){
          vdbeIncrMergerSetThreads(pReadr->pIncr);
          for(iTask=0; iTask<(pSorter->nTask-1); iTask++){
            IncrMerger *pIncr;
            if( (pIncr = pMain->aReadr[iTask].pIncr) ){
              vdbeIncrMergerSetThreads(pIncr);
              assert( pIncr->pTask!=pLast );
            }
          }
          for(iTask=0; rc==SQLITE_OK && iTask<pSorter->nTask; iTask++){
            /* Check that:
            **
            **   a) The incremental merge object is configured to use the
            **      right task, and
            **   b) If it is using task (nTask-1), it is configured to run
            **      in single-threaded mode. This is important, as the
            **      root merge (INCRINIT_ROOT) will be using the same task
            **      object.
            */
            PmaReader *p = &pMain->aReadr[iTask];
            assert( p->pIncr==0 || (
                (p->pIncr->pTask==&pSorter->aTask[iTask])             /* a */
             && (iTask!=pSorter->nTask-1 || p->pIncr->bUseThread==0)  /* b */
            ));
            rc = vdbePmaReaderIncrInit(p, INCRINIT_TASK);
          }
        }
        pMain = 0;
      }
      if( rc==SQLITE_OK ){
        rc = vdbePmaReaderIncrMergeInit(pReadr, INCRINIT_ROOT);
      }
    }else
#endif
    {
      rc = vdbeMergeEngineInit(pTask0, pMain, INCRINIT_NORMAL);
      pSorter->pMerger = pMain;
      pMain = 0;
    }
  }

  if( rc!=SQLITE_OK ){
    vdbeMergeEngineFree(pMain);
  }
  return rc;
}


/*
** Once the sorter has been populated by calls to sqlite3VdbeSorterWrite,
** this function is called to prepare for iterating through the records
** in sorted order.
*/
int sqlite3VdbeSorterRewind(const VdbeCursor *pCsr, int *pbEof){
  VdbeSorter *pSorter;
  int rc = SQLITE_OK;             /* Return code */

  assert( pCsr->eCurType==CURTYPE_SORTER );
  pSorter = pCsr->uc.pSorter;
  assert( pSorter );

  /* If no data has been written to disk, then do not do so now. Instead,
  ** sort the in-memory list (VdbeSorter.list). The vdbe layer will read
  ** data directly from that list. */
  if( pSorter->bUsePMA==0 ){
    if( pSorter->list.pList ){
      *pbEof = 0;
      rc = vdbeSorterSort(&pSorter->aTask[0], &pSorter->list);
    }else{
      *pbEof = 1;
    }
    return rc;
  }

  /* Write the current in-memory list to a PMA. When the
  ** sqlite3VdbeSorterWrite() function flushes the contents of memory to
  ** disk, it always creates a new list consisting of a single key
  ** immediately afterwards. So the list is never empty at this point. */
  assert( pSorter->list.pList );
  rc = vdbeSorterFlushPMA(pSorter);

  /* Join all threads */
  rc = vdbeSorterJoinAll(pSorter, rc);

  vdbeSorterRewindDebug("rewind");

  /* Assuming no errors have occurred, set up a merger structure to
  ** incrementally read and merge all remaining PMAs. */
  assert( pSorter->pReader==0 );
  if( rc==SQLITE_OK ){
    rc = vdbeSorterSetupMerge(pSorter);
    *pbEof = 0;
  }

  vdbeSorterRewindDebug("rewinddone");
  return rc;
}

/*
** Advance to the next element in the sorter. Return value:
**
**    SQLITE_OK     success
**    SQLITE_DONE   end of data
**    otherwise     some kind of error.
*/
int sqlite3VdbeSorterNext(sqlite3 *db, const VdbeCursor *pCsr){
  VdbeSorter *pSorter;
  int rc;                         /* Return code */

  assert( pCsr->eCurType==CURTYPE_SORTER );
  pSorter = pCsr->uc.pSorter;
  assert( pSorter->bUsePMA || (pSorter->pReader==0 && pSorter->pMerger==0) );
  if( pSorter->bUsePMA ){
    assert( pSorter->pReader==0 || pSorter->pMerger==0 );
    assert( pSorter->bUseThreads==0 || pSorter->pReader );
    assert( pSorter->bUseThreads==1 || pSorter->pMerger );
#if SQLITE_MAX_WORKER_THREADS>0
    if( pSorter->bUseThreads ){
      rc = vdbePmaReaderNext(pSorter->pReader);
      if( rc==SQLITE_OK && pSorter->pReader->pFd==0 ) rc = SQLITE_DONE;
    }else
#endif
    /*if( !pSorter->bUseThreads )*/ {
      int res = 0;
      assert( pSorter->pMerger!=0 );
      assert( pSorter->pMerger->pTask==(&pSorter->aTask[0]) );
      rc = vdbeMergeEngineStep(pSorter->pMerger, &res);
      if( rc==SQLITE_OK && res ) rc = SQLITE_DONE;
    }
  }else{
    SorterRecord *pFree = pSorter->list.pList;
    pSorter->list.pList = pFree->u.pNext;
    pFree->u.pNext = 0;
    if( pSorter->list.aMemory==0 ) vdbeSorterRecordFree(db, pFree);
    rc = pSorter->list.pList ? SQLITE_OK : SQLITE_DONE;
  }
  return rc;
}
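
/*
** Illustrative sketch only, not part of SQLite. It mirrors the in-memory
** branch of sqlite3VdbeSorterNext() above: the record under the read
** cursor is the head of a sorted linked list, and advancing means
** unlinking and freeing that head. ToyRecord, ToyList, toyListPush(),
** toyListNext(), TOY_OK and TOY_DONE are hypothetical names, and plain
** malloc()/free() stand in for the sorter's own allocation scheme.
*/

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for SQLITE_OK and SQLITE_DONE. */
enum { TOY_OK = 0, TOY_DONE = 101 };

/* Hypothetical record type: one node of a singly-linked list. */
typedef struct ToyRecord ToyRecord;
struct ToyRecord {
  int val;                        /* Payload (a real sorter stores a blob) */
  ToyRecord *pNext;               /* Next record in sorted order */
};

typedef struct ToyList {
  ToyRecord *pList;               /* Head of list == current record */
} ToyList;

/* Push a new record onto the front of the list. Return 0 on success,
** or 1 if the allocation fails. */
int toyListPush(ToyList *p, int val){
  ToyRecord *pNew = (ToyRecord*)malloc(sizeof(*pNew));
  if( pNew==0 ) return 1;
  pNew->val = val;
  pNew->pNext = p->pList;
  p->pList = pNew;
  return 0;
}

/* Advance to the next record: unlink and free the current head.
** Return TOY_OK if another record is available, or TOY_DONE if the
** list is now empty. The list must not be empty when this is called. */
int toyListNext(ToyList *p){
  ToyRecord *pFree = p->pList;
  assert( pFree!=0 );
  p->pList = pFree->pNext;
  pFree->pNext = 0;
  free(pFree);
  return p->pList ? TOY_OK : TOY_DONE;
}
```

/*
** In this sketch the caller reads p->pList->val before each call to
** toyListNext() and stops as soon as TOY_DONE is returned, which is the
** same protocol the SQLITE_OK/SQLITE_DONE return values describe above.
*/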

/*
** Return a pointer to a buffer owned by the sorter that contains the
** current key.
*/
static void *vdbeSorterRowkey(
  const VdbeSorter *pSorter,      /* Sorter object */
  int *pnKey                      /* OUT: Size of current key in bytes */
){
  void *pKey;
  if( pSorter->bUsePMA ){
    PmaReader *pReader;
#if SQLITE_MAX_WORKER_THREADS>0
    if( pSorter->bUseThreads ){
      pReader = pSorter->pReader;
    }else
#endif
    /*if( !pSorter->bUseThreads )*/{
      pReader = &pSorter->pMerger->aReadr[pSorter->pMerger->aTree[1]];
    }
    *pnKey = pReader->nKey;
    pKey = pReader->aKey;
  }else{
    *pnKey = pSorter->list.pList->nVal;
    pKey = SRVAL(pSorter->list.pList);
  }
  return pKey;
}
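
/*
** Illustrative sketch only, not part of SQLite. The expression
** aReadr[aTree[1]] above suggests that aTree[1] holds the index of the
** reader whose current key is smallest. The toy below shows one way such
** a tournament tree can work for four sorted integer runs. The tree
** layout, and the names ToyReader, ToyMerge, toyMergeInit(),
** toyMergeStep() and toyMergeCurrent(), are assumptions made for this
** example and are not copied from the MergeEngine code.
*/

```c
#include <assert.h>

#define TOY_NREADER 4             /* Must be a power of two in this sketch */

typedef struct ToyReader {
  const int *aVal;                /* Sorted run of values */
  int nVal;                       /* Number of values in aVal[] */
  int iPos;                       /* Current position. iPos>=nVal means EOF */
} ToyReader;

typedef struct ToyMerge {
  ToyReader aReadr[TOY_NREADER];  /* One reader per sorted run */
  int aTree[TOY_NREADER];         /* aTree[1] is the overall winner */
} ToyMerge;

/* Return whichever of reader indexes i1 and i2 should be read first.
** A reader at EOF always loses. Ties go to the lower index. */
static int toyWinner(const ToyMerge *p, int i1, int i2){
  const ToyReader *p1 = &p->aReadr[i1];
  const ToyReader *p2 = &p->aReadr[i2];
  if( p1->iPos>=p1->nVal ) return i2;
  if( p2->iPos>=p2->nVal ) return i1;
  return (p2->aVal[p2->iPos] < p1->aVal[p1->iPos]) ? i2 : i1;
}

/* Recompute tree slot iOut from its two children. Slots in the lower
** half of the tree compare pairs of readers directly. */
static void toyCompare(ToyMerge *p, int iOut){
  int i1, i2;
  if( iOut>=TOY_NREADER/2 ){
    i1 = (iOut - TOY_NREADER/2) * 2;
    i2 = i1 + 1;
  }else{
    i1 = p->aTree[iOut*2];
    i2 = p->aTree[iOut*2+1];
  }
  p->aTree[iOut] = toyWinner(p, i1, i2);
}

/* Populate every slot of the tree, from the bottom up. */
void toyMergeInit(ToyMerge *p){
  int i;
  for(i=TOY_NREADER-1; i>0; i--) toyCompare(p, i);
}

/* Advance the winning reader, then replay only the comparisons on its
** path to the root. Return 1 once every reader is at EOF, else 0. */
int toyMergeStep(ToyMerge *p){
  int iPrev = p->aTree[1];
  int i;
  p->aReadr[iPrev].iPos++;
  for(i=(TOY_NREADER+iPrev)/2; i>0; i=i/2) toyCompare(p, i);
  return p->aReadr[p->aTree[1]].iPos >= p->aReadr[p->aTree[1]].nVal;
}

/* Return the smallest current value. Must not be called at EOF. */
int toyMergeCurrent(const ToyMerge *p){
  const ToyReader *pReadr = &p->aReadr[p->aTree[1]];
  assert( pReadr->iPos<pReadr->nVal );
  return pReadr->aVal[pReadr->iPos];
}
```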

/*
** Copy the current sorter key into the memory cell pOut.
*/
int sqlite3VdbeSorterRowkey(const VdbeCursor *pCsr, Mem *pOut){
  VdbeSorter *pSorter;
  void *pKey; int nKey;           /* Sorter key to copy into pOut */

  assert( pCsr->eCurType==CURTYPE_SORTER );
  pSorter = pCsr->uc.pSorter;
  pKey = vdbeSorterRowkey(pSorter, &nKey);
  if( sqlite3VdbeMemClearAndResize(pOut, nKey) ){
    return SQLITE_NOMEM_BKPT;
  }
  pOut->n = nKey;
  MemSetTypeFlag(pOut, MEM_Blob);
  memcpy(pOut->z, pKey, nKey);

  return SQLITE_OK;
}

/*
** Compare the key in memory cell pVal with the key that the sorter cursor
** passed as the first argument currently points to. For the purposes of
** the comparison, ignore the rowid field at the end of each record.
**
** If any of the first nKeyCol fields of the sorter cursor key is NULL,
** the keys are treated as unequal and *pRes is set to -1, even if pVal
** also contains NULL values.
**
** If an error occurs, return an SQLite error code (e.g. SQLITE_NOMEM).
** Otherwise, set *pRes to a negative, zero or positive value if the
** key in pVal is smaller than, equal to or larger than the current sorter
** key.
**
** This routine forms the core of the OP_SorterCompare opcode, which in
** turn is used to verify uniqueness when constructing a UNIQUE INDEX.
*/
int sqlite3VdbeSorterCompare(
  const VdbeCursor *pCsr,         /* Sorter cursor */
  Mem *pVal,                      /* Value to compare to current sorter key */
  int nKeyCol,                    /* Compare this many columns */
  int *pRes                       /* OUT: Result of comparison */
){
  VdbeSorter *pSorter;
  UnpackedRecord *r2;
  KeyInfo *pKeyInfo;
  int i;
  void *pKey; int nKey;           /* Sorter key to compare pVal with */

  assert( pCsr->eCurType==CURTYPE_SORTER );
  pSorter = pCsr->uc.pSorter;
  r2 = pSorter->pUnpacked;
  pKeyInfo = pCsr->pKeyInfo;
  if( r2==0 ){
    r2 = pSorter->pUnpacked = sqlite3VdbeAllocUnpackedRecord(pKeyInfo);
    if( r2==0 ) return SQLITE_NOMEM_BKPT;
    r2->nField = nKeyCol;
  }
  assert( r2->nField==nKeyCol );

  pKey = vdbeSorterRowkey(pSorter, &nKey);
  sqlite3VdbeRecordUnpack(nKey, pKey, r2);
  for(i=0; i<nKeyCol; i++){
    if( r2->aMem[i].flags & MEM_Null ){
      *pRes = -1;
      return SQLITE_OK;
    }
  }

  *pRes = sqlite3VdbeRecordCompare(pVal->n, pVal->z, r2);
  return SQLITE_OK;
}
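
/*
** Illustrative sketch only, not part of SQLite. It restates the NULL rule
** used by sqlite3VdbeSorterCompare() above on a toy record made of
** nullable integers: a NULL in any of the first nKeyCol fields of the
** current sorter key forces a non-zero result, so such a key can never
** be reported as a duplicate. ToyField and toySorterCompare() are
** hypothetical names. Ordering a NULL in aVal[] before any non-NULL
** value is a simplification of the real record comparison.
*/

```c
#include <assert.h>

/* Hypothetical field type: an integer that may be NULL. */
typedef struct ToyField {
  int isNull;                     /* True if this field is NULL */
  int val;                        /* Value, ignored if isNull is set */
} ToyField;

/* Compare the first nKeyCol fields of aVal[] against the current sorter
** key aCur[]. Fields beyond nKeyCol (for example a trailing rowid) are
** ignored. Set *pRes to a negative, zero or positive value if aVal is
** smaller than, equal to or larger than aCur. */
void toySorterCompare(
  const ToyField *aCur,           /* Current sorter key */
  const ToyField *aVal,           /* Key to compare against aCur */
  int nKeyCol,                    /* Compare this many fields */
  int *pRes                       /* OUT: Result of comparison */
){
  int i;
  assert( nKeyCol>=0 );

  /* Rule from the function above: any NULL in the sorter key makes the
  ** keys compare as unequal, whatever aVal[] contains. */
  for(i=0; i<nKeyCol; i++){
    if( aCur[i].isNull ){
      *pRes = -1;
      return;
    }
  }

  /* No NULLs in aCur[]. Do an ordinary field-by-field comparison. */
  for(i=0; i<nKeyCol; i++){
    if( aVal[i].isNull || aVal[i].val<aCur[i].val ){
      *pRes = -1;
      return;
    }
    if( aVal[i].val>aCur[i].val ){
      *pRes = 1;
      return;
    }
  }
  *pRes = 0;
}
```

/*
** A uniqueness check built on this sketch would report a conflict only
** when *pRes comes back as zero, which is why two keys that both contain
** NULL are never flagged as duplicates.
*/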