Why Sort?
A classic problem in computer science!
Data requested in sorted order
External Sorting e.g., find students in increasing gpa order
Sorting is first step in bulk loading B+ tree index.
Chapter 13 Sorting useful for eliminating duplicate copies in a
collection of records (Why?)
Sort-merge join algorithm involves sorting.
Problem: sort 1Gb of data with 1Mb of RAM.
why not virtual memory?
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2
2-Way Sort: Requires 3 Buffers Two-Way External Merge Sort
3,4 6,2 9,4 8,7 5,6 3,1 2 Input file
Each pass we read + write PASS 0
Pass 1: Read a page, sort it, write it. each page in file. 3,4 2,6 4,9 7,8 5,6 1,3 2 1-page runs
PASS 1
only one buffer page is used N pages in the file => the 2,3 4,7 1,3
2-page runs
8,9 5,6 2
Pass 2, 3, …, etc.: number of passes 4,6
PASS 2
three buffer pages used. = log2 N + 1 2,3
4,4 1,2 4-page runs
So toal cost is: 6,7 3,5
( ) 8,9 6
INPUT 1 2 N log 2 N + 1 PASS 3
OUTPUT 1,2
Idea: Divide and conquer: 2,3
INPUT 2
sort subfiles and merge 3,4 8-page runs
4,5
Main memory buffers Disk
Disk 6,6
7,8
9
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4
General External Merge Sort Cost of External Merge Sort
* More than 3 buffer pages. How can we utilize them?
To sort a file with N pages using B buffer pages: Number of passes: 1 + log B −1 N / B
Pass 0: use B buffer pages. Produce N / B sorted runs of B Cost = 2N * (# of passes)
pages each. E.g., with 5 buffer pages, to sort 108 page file:
Pass 2, …, etc.: merge B-1 runs. Pass 0: 108 / 5 = 22 sorted runs of 5 pages each
INPUT 1 (last run is only 3 pages)
Pass 1: 22 / 4 = 6 sorted runs of 20 pages each
... ... (last run is only 8 pages)
INPUT 2
... OUTPUT
Pass 2: 2 sorted runs, 80 pages and 28 pages
INPUT B-1 Pass 3: Sorted file of 108 pages
Disk Disk
B Main memory buffers
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 5 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 6
Number of Passes of External Sort Internal Sort Algorithm
Quicksort is a fast way to sort in memory.
N B=3 B=5 B=9 B=17 B=129 B=257
An alternative is “tournament sort” (a.k.a.
100 7 4 3 2 1 1 “heapsort”)
1,000 10 5 4 3 2 2 Top: Read in B blocks
10,000 13 7 5 4 2 2 Output: move smallest record to output buffer
100,000 17 9 6 5 3 3 Read in a new record r
1,000,000 20 10 7 5 3 3 insert r into “heap”
10,000,000 23 12 8 6 4 3 if r not smallest, then GOTO Output
100,000,000 26 14 9 7 4 4 else remove r from “heap”
1,000,000,000 30 15 10 8 5 4 output “heap” in order; GOTO Top
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 7 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 8
More on Heapsort I/O for External Merge Sort
Fact: average length of a run in heapsort is 2B … longer runs often means fewer passes!
The “snowplow” analogy Actually, do I/O a page at a time
Worst-Case: In fact, read a block of pages sequentially!
What is min length of a run? Suggests we should make each buffer
How does this arise? (input/output) be a block of pages.
Best-Case: B But this will reduce fan-out during merge passes!
What is max length of a run? In practice, most files still sorted in 2-3 passes.
How does this arise?
Quicksort is faster, but ...
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 9 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 10
Number of Passes of Optimized Sort Double Buffering
To reduce wait time for I/O request to
N B=1,000 B=5,000 B=10,000
complete, can prefetch into `shadow block’.
100 1 1 1
Potentially, more passes; in practice, most files still
1,000 1 1 1 sorted in 2-3 passes.
10,000 2 2 1
100,000 3 2 2 INPUT 1
INPUT 1'
1,000,000 3 2 2
INPUT 2
10,000,000 4 3 3 INPUT 2'
OUTPUT
OUTPUT'
100,000,000 5 3 3 b
1,000,000,000 5 4 3 Disk INPUT k
block size
Disk
INPUT k'
* Block size = 32, initial pass produces runs of size 2B. B main memory buffers, k-way merge
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 11 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 12
Sorting Records! Using B+ Trees for Sorting
Sorting has become a blood sport! Scenario: Table to be sorted has B+ tree index on
Parallel sorting is the name of the game ...
sorting column(s).
Idea: Can retrieve records in order by traversing
Datamation: Sort 1M records of size 100 bytes
leaf pages.
Typical DBMS: 15 minutes
World record: 3.5 seconds Is this a good idea?
• 12-CPU SGI machine, 96 disks, 2GB of RAM Cases to consider:
New benchmarks proposed: B+ tree is clustered Good idea!
Minute Sort: How many can you sort in 1 minute? B+ tree is not clustered Could be a very bad idea!
Dollar Sort: How many can you sort for $1.00?
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 13 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 14
Clustered B+ Tree Used for Sorting Unclustered B+ Tree Used for Sorting
Cost: root to the left-
Alternative (2) for data entries; each data
most leaf, then retrieve
Index
(Directs search)
entry contains rid of a data record. In general,
all leaf pages one I/O per data record!
(Alternative 1)
Data Entries
If Alternative 2 is used? ("Sequence set") Index
(Directs search)
Additional cost of
retrieving data records:
each page fetched just Data Entries
("Sequence set")
once. Data Records
* Always better than external sorting!
Data Records
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 15 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 16
External Sorting vs. Unclustered Index Summary
N Sorting p=1 p=10 p=100 External sorting is important; DBMS may dedicate
part of buffer pool for sorting!
100 200 100 1,000 10,000
External merge sort minimizes disk I/O cost:
1,000 2,000 1,000 10,000 100,000
Pass 0: Produces sorted runs of size B (# buffer pages).
10,000 40,000 10,000 100,000 1,000,000 Later passes: merge runs.
100,000 600,000 100,000 1,000,000 10,000,000 # of runs merged at a time depends on B, and block size.
1,000,000 8,000,000 1,000,000 10,000,000 100,000,000 Larger block size means less I/O cost per page.
10,000,000 80,000,000 10,000,000 100,000,000 1,000,000,000 Larger block size means smaller # runs merged.
In practice, # of runs rarely more than 2 or 3.
* p: # of records per page
* B=1,000 and block size=32 for sorting
* p=100 is the more realistic value.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 17 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 18
Summary, cont.
Choice of internal sort algorithm may matter:
Quicksort: Quick!
Heap/tournament sort: slower (2x), longer runs
The best sorts are wildly fast:
Despite 40+ years of research, we’re still
improving!
Clustered B+ tree is good for sorting;
unclustered tree is usually very bad.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 19