MonetDB/X100: a (very) fast column-store
Marcin Zukowski, Peter Boncz,
Sándor Héman, Niels Nes
CWI, Amsterdam, The Netherlands
Disclaimer a different one
This talk is about data-intensive applications
- Data warehousing, analytical processing
- Scientific data, information retrieval
Transaction processing is a different story
A lot of content, wake up!
MonetDB/X100 overview 2
Outline
Traditional database performance
Improvements in MonetDB
MonetDB/X100
- Query execution
- Storage
Database performance
TPC-H 1GB, Query 1
- Selects 98% of fact table (6M rows), computes net prices and aggregates all
Performance:
- C program: ?
- MySQL: 26.2s
- DBMS X: 28.1s
Database performance
TPC-H 1GB, Query 1
- Selects 98% of fact table (6M rows), computes net prices and aggregates all
Performance:
- C program: 0.2s
- MySQL: 26.2s
- DBMS X: 28.1s
Database performance analyzed
Why so slow?
- Inefficient data storage format
- Inefficient query processing model
N-ary storage model (NSM)
Fixed-width attributes in a record
101  Joe     27  Black
103  Edward  21  Scissorhand
Real-life NSM implementation
Slotted pages, example:
101  27  Joe Black
103  21  Edward Scissorhand
NSM problems
Poor bandwidth use
- Always read all the attributes
- Terrible on disk
- Bad in memory
Complex attribute access
- Variable-length fields
- Null fields
Column stores to the rescue!
Store attributes separately
Read only attributes used by a query
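A minimal C sketch of the two layouts, assuming the example records hold an `id`, an `age`, and a `name` (the attribute names, struct names, and function names are illustrative assumptions, not MonetDB code). Summing one attribute walks over whole records in the NSM layout, but only over one dense array in the column layout.

```c
#include <assert.h>
#include <stddef.h>

/* NSM: one fixed-width record per tuple; all attributes stored together. */
struct person_row {
    int  id;
    int  age;
    char name[24];
};

/* Column store: one array per attribute (hypothetical layout). */
struct people_columns {
    int   *id;
    int   *age;
    char (*name)[24];
    size_t n;
};

/* Walks over whole records even though only `age` is needed. */
long sum_age_nsm(const struct person_row *rows, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += rows[i].age;
    return sum;
}

/* Reads only the `age` column. */
long sum_age_columns(const struct people_columns *t)
{
    assert(t != NULL);
    long sum = 0;
    for (size_t i = 0; i < t->n; i++)
        sum += t->age[i];
    return sum;
}
```

Both functions return the same result; the difference is how many bytes have to travel from disk or memory to compute it.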
Traditional column stores
Data path
- Read columns from disk
- Convert into NSM
- Use NSM-based processing
Examples: Sybase IQ, Vertica
Not enough!
- Only I/O problem addressed
How databases run a query
Query
SELECT name, salary*.19 AS tax FROM employee WHERE age > 25
Database operators
Tuple-at-a-time iterator interface:
- open()
- next(): tuple
- close()
next() is called:
- for each operator
- for each tuple
Complex code repeated over and over
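The iterator interface above can be sketched in C as follows. This is an illustrative sketch, not code from any real DBMS: the `struct op` layout, the `make_*` helpers, and the `total_tax` driver are assumptions, and the driver sums the computed tax only to have something to consume the tuples.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuple type for the employee example. */
struct tuple {
    const char *name;
    double      salary;
    int         age;
};

/* Every operator exposes open/next/close through function pointers. */
struct op {
    void (*open)(struct op *self);
    const struct tuple *(*next)(struct op *self);  /* NULL at end of input */
    void (*close)(struct op *self);
    struct op *child;                               /* input operator, if any */
    const struct tuple *data;                       /* scan only */
    size_t n, pos;                                  /* scan only */
    int min_age;                                    /* select only */
};

static void scan_open(struct op *s) { s->pos = 0; }
static const struct tuple *scan_next(struct op *s)
{
    return s->pos < s->n ? &s->data[s->pos++] : NULL;
}
static void scan_close(struct op *s) { (void)s; }

static void select_open(struct op *s) { s->child->open(s->child); }
static const struct tuple *select_next(struct op *s)
{
    const struct tuple *t;
    /* One indirect call into the child for every input tuple. */
    while ((t = s->child->next(s->child)) != NULL)
        if (t->age > s->min_age)
            return t;
    return NULL;
}
static void select_close(struct op *s) { s->child->close(s->child); }

struct op make_scan(const struct tuple *data, size_t n)
{
    struct op s = { scan_open, scan_next, scan_close, NULL, data, n, 0, 0 };
    return s;
}

struct op make_select_age_gt(struct op *child, int min_age)
{
    assert(child != NULL);
    struct op s = { select_open, select_next, select_close, child, NULL, 0, 0, min_age };
    return s;
}

/* Drives the plan one tuple at a time and sums salary * 0.19. */
double total_tax(struct op *root)
{
    double total = 0.0;
    const struct tuple *t;
    root->open(root);
    while ((t = root->next(root)) != NULL)
        total += t->salary * 0.19;
    root->close(root);
    return total;
}
```

Even in this tiny plan, each qualifying tuple costs two indirect calls before a single multiplication happens.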
Primitive functions
Provide data-specific computational functionality
Called once for every operation on every tuple
- Even worse with complex tuple representation
Perform one operation (e.g. addition) in one call
DBMS performance - IPT
Lots of repeated, unnecessary code
- Operator logic
- Function calls
- Attribute access
Most instructions NOT processing any actual data!
High instructions-per-tuple (IPT) factor
Modern CPUs
New CPU features over last 20 years
- Instruction and data cache
- Deep pipeline with out-of-order execution
- Superscalar features: multiple instructions at once
- SIMD instructions (SSE)
Great for e.g. multimedia processing but bad for database code!
DBMS performance - CPI
CPU-unfriendly code
- Complicated code
- Poor use of CPU cache (both data and instructions)
- Processing one value at a time
- Compilers can't help much
High cycles-per-instruction (CPI) factor
DBMS performance
Performance factors:
- High instructions-per-tuple
- High cycles-per-instruction
- Very high cycles-per-tuple (CPT)
Others can do better
Scientific computing,
How can we?
MonetDB
MonetDB 1993-now, developed at CWI
- Peter's PhD
- Column store
- Improves computational efficiency
Predecessor of MonetDB/X100
MonetDB: a column store
save disk I/O when scan-intensive queries need a few columns
MonetDB: a column store
save disk I/O when scan-intensive queries need a few columns
avoid an expression interpreter to improve computational efficiency
MonetDB in action
SELECT id, name, (age-30)*50 as bonus
FROM people
WHERE age > 30
MonetDB in action
SELECT id, name, (age-30)*50 as bonus
FROM people
WHERE age > 30
CPU efficiency depends on "nice" code
- out-of-order execution
- few dependencies (control, data)
- compiler support
Compilers love simple loops over arrays
- loop-pipelining
- automatic SIMD
    int select_gt_float(oid* res, float* column, float val, int n)
    {
      int i, j = 0;   /* j counts the qualifying positions written so far */
      for (i = 0; i < n; i++)
        if (column[i] > val) res[j++] = i;
      return j;
    }
Simple, hard-coded semantics in operators
MonetDB: a column store
save disk I/O when scan-intensive queries need a few columns
avoid an expression interpreter to improve computational efficiency
- Simple algebra: Monet Interpreter Language (MIL)
- Hard-coded operator semantics: no function calls
- Array-like processing
MonetDB problem
SELECT id, name, (age-30)*50 as bonus
FROM people
WHERE age > 30
MATERIALIZED intermediate results
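A rough C sketch of what column-at-a-time execution with materialized intermediates could look like for the `bonus` expression. It is not MIL and not MonetDB source: the function names and the exact sequence of steps are assumptions made for illustration. Every step produces a full-size intermediate column before the next step starts.

```c
#include <assert.h>
#include <stdlib.h>

/* Selection over a full column: writes qualifying positions to res. */
static size_t select_gt_int(size_t *res, const int *col, int val, size_t n)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++)
        if (col[i] > val) res[j++] = i;
    return j;
}

/* Computes (age-30)*50 for age > 30; returns a malloc'ed array of *out_n values. */
int *bonus_column_at_a_time(const int *age, size_t n, size_t *out_n)
{
    assert(age != NULL && out_n != NULL);
    size_t cap = n ? n : 1;                        /* avoid zero-size malloc */
    size_t *pos   = malloc(cap * sizeof *pos);     /* intermediate 1 */
    int    *sel   = malloc(cap * sizeof *sel);     /* intermediate 2 */
    int    *diff  = malloc(cap * sizeof *diff);    /* intermediate 3 */
    int    *bonus = malloc(cap * sizeof *bonus);   /* final result */
    if (!pos || !sel || !diff || !bonus) {
        free(pos); free(sel); free(diff); free(bonus);
        *out_n = 0;
        return NULL;
    }
    size_t m = select_gt_int(pos, age, 30, n);                /* WHERE age > 30 */
    for (size_t i = 0; i < m; i++) sel[i]   = age[pos[i]];    /* fetch age      */
    for (size_t i = 0; i < m; i++) diff[i]  = sel[i] - 30;    /* age - 30       */
    for (size_t i = 0; i < m; i++) bonus[i] = diff[i] * 50;   /* ... * 50       */
    free(pos); free(sel); free(diff);
    *out_n = m;
    return bonus;                                  /* caller frees */
}
```

Each loop is simple and compiler-friendly, but every intermediate is as large as the selected part of the table, so for big inputs the work is dominated by writing and re-reading memory.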
Materialization problem
Extra main-memory bandwidth
- Performance is sub-optimal
- but still faster than anything else (5 years ago)
Reduces scalability
- Can't afford writing to disk
- Only effective for limited data sizes and not all query types
MonetDB: a Faustian Pact
You want efficiency
- Simple hard-coded operators
I take scalability
- Result materialization
Supports SQL and XQuery
Open-source download: monetdb.cwi.nl
- C program: 0.2s
- MonetDB: 3.7s
- MySQL: 26.2s
- DBMS X: 28.1s
MonetDB/X100
My PhD thesis
Motivation:
- let's fix MonetDB scalability
- and improve the performance on the way
Core ideas:
- New execution model
- High performance column storage
Typical Relational DBMS Engine
Query
SELECT name, salary*.19 AS tax FROM employee WHERE age > 25
MonetDB/X100: Vectors
Vector contains data of multiple tuples (~100-1000)
All operations work on entire vectors
Effect: much less operator.next() and primitive calls.
Vectors
Column slices as unary arrays
NOT: Vertical is a better table storage layout than horizontal (though we still think it often is)
RATIONALE:
- Simple array operations are well-supported by compilers
- SIMD friendly layout
- Assumed cache-resident
Vectorized Primitives
    int select_lt_int_col_int_val(int *res, int *col, int val, int n)
    {
      int i, j = 0;   /* j counts the qualifying positions written so far */
      for (i = 0; i < n; i++)
        if (col[i] < val) res[j++] = i;
      return j;
    }
Most primitives take just 0.5 (!) to 10 cycles per tuple
- 10-100+ times faster than tuple-at-a-time
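A minimal sketch of how such primitives might be chained into a pipeline that processes one vector at a time, here for `(age-30)*50` with `age > 30`. The vector size of 1024, the map primitive names, and the way selected positions are passed along are assumptions for illustration, not the X100 implementation.

```c
#include <assert.h>
#include <stddef.h>

#define VECTOR_SIZE 1024   /* assumed; the slides suggest ~100-1000 tuples */

/* Selection primitive: positions within the vector where col[i] > val. */
static int select_gt_int_col_int_val(int *res, const int *col, int val, int n)
{
    int j = 0;
    for (int i = 0; i < n; i++)
        if (col[i] > val) res[j++] = i;
    return j;
}

/* Map primitives that only touch the selected positions. */
static void map_sub_int_col_int_val(int *out, const int *col, int val,
                                    const int *sel, int n)
{
    for (int i = 0; i < n; i++) out[sel[i]] = col[sel[i]] - val;
}

static void map_mul_int_col_int_val(int *out, const int *col, int val,
                                    const int *sel, int n)
{
    for (int i = 0; i < n; i++) out[sel[i]] = col[sel[i]] * val;
}

/* Writes the bonus of every qualifying tuple to out; returns how many. */
size_t bonus_vectorized(const int *age, size_t n, int *out)
{
    assert(age != NULL && out != NULL);
    /* Small buffers, reused for every vector, meant to stay cache-resident. */
    int sel[VECTOR_SIZE], diff[VECTOR_SIZE], bonus[VECTOR_SIZE];
    size_t produced = 0;
    for (size_t off = 0; off < n; off += VECTOR_SIZE) {
        size_t left = n - off;
        int len = (int)(left < (size_t)VECTOR_SIZE ? left : (size_t)VECTOR_SIZE);
        int m = select_gt_int_col_int_val(sel, age + off, 30, len);
        map_sub_int_col_int_val(diff, age + off, 30, sel, m);
        map_mul_int_col_int_val(bonus, diff, 50, sel, m);
        for (int i = 0; i < m; i++) out[produced++] = bonus[sel[i]];
    }
    return produced;
}
```

The per-call overhead is paid once per vector instead of once per tuple, and the intermediates never grow beyond `VECTOR_SIZE` values.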
MonetDB/X100
Both efficiency
- Vectorized primitives
and scalability
- Pipelined query evaluation
Performance:
- C program: 0.2s
- MonetDB/X100: 0.6s
- MonetDB: 3.7s
- MySQL: 26.2s
- DBMS X: 28.1s
Memory Hierarchy
X100 query engine
CPU cache
RAM
ColumnBM (buffer manager)
(RAID) Disk(s)
Optimal Vector size?
All vectors together should fit the CPU cache
Depends on the query
Varying the Vector size
Less and less operator.next() and primitive function calls ("interpretation overhead")
Varying the Vector size
Vectors start to exceed the CPU cache, causing additional memory traffic
MonetDB/MIL materializes columns
[Figure: memory hierarchy with the X100 query engine and MonetDB/MIL: CPU cache, RAM, ColumnBM (buffer manager), (RAID) disk(s)]
Why is X100 so fast?
Reduced interpretation overhead
- 100x fewer function calls
Good CPU cache use
- High locality in the primitives
- Cache-conscious data placement
No tuple navigation
- Primitives only see arrays
Vectorization allows algorithmic optimization
CPU and compiler-friendly function bodies
- Multiple work units, loop-pipelining, SIMD
Feeding the Beast
X100 uses < 100 cycles per tuple for TPC-H Q1
- Q1 has ~30 bytes of used columns per tuple
- 3GHz CPU core eats 900MB/s
No problem for RAM
But disk-based data?
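The 900MB/s figure follows from the numbers on this slide: a 3GHz core at 100 cycles per tuple handles 30 million tuples per second, and at ~30 bytes per tuple that is 900MB/s. A small helper (the function name is made up for this note) makes the arithmetic explicit:

```c
#include <assert.h>

/* Bytes per second one core consumes, given its clock rate, the cycles
 * spent per tuple, and the bytes of used columns per tuple. */
double scan_bandwidth_bytes_per_sec(double clock_hz, double cycles_per_tuple,
                                    double bytes_per_tuple)
{
    assert(cycles_per_tuple > 0.0);
    double tuples_per_sec = clock_hz / cycles_per_tuple;
    return tuples_per_sec * bytes_per_tuple;
}
```

Calling it with `3e9`, `100.0`, and `30.0` gives 9e8 bytes per second; since Q1 needs fewer than 100 cycles per tuple, the real demand is at least that high.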
Using Disk in the 21st century
Poor random disk access needs to be compensated with more and more disk heads (tens, hundreds, thousands!)
You're better off with scanning!
Using Disk in Data Warehousing
Goals:
1. Scan-based disk access *only* (full or partial scans)
2. Minimize bandwidth
3. Benefit, not suffer from concurrency
Database strategies:
- replicate tables in multiple orders (goal 1)
- clustering join-tables in foreign-key order (goal 1)
- keep dimension tables in RAM (goal 1&2)
- scan-optimized indices (goal 1&2)
- use a column-store (goal 2)
- increase disk bandwidth with lightweight compression (goal 2)
- coordinate concurrent disk access (goals 2&3)
Feeding the Beast (1)
Two ideas pursued:
- Lightweight compression to enhance disk bandwidth
- Maximizing disk scan sharing in concurrent queries.
Compression to improve I/O bandwidth
0.9GB/s query consumption
1/3 CPU for decompression
1.8GB/s needed
new lightweight compression schemes
Key Ingredients
Compress relations on a per-column basis
Easy to exploit redundancy
Keep data compressed in main-memory
More data can be buffered
Decompress vector at a time
Minimize main-memory overhead
Use light-weight, CPU-efficient algorithms
Exploit processing power of modern CPUs
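To make "decompress vector at a time" concrete, here is a sketch using plain frame-of-reference coding, where each value is stored as an 8-bit offset from a per-column base. This is a generic textbook scheme swapped in for illustration and is not claimed to be one of the schemes used in MonetDB/X100; the struct, the function names, and the vector size are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VECTOR_SIZE 1024   /* assumed vector length */

/* A compressed column: every value is base + an 8-bit code. */
struct for_column {
    int32_t        base;
    const uint8_t *codes;
    size_t         n;
};

/* Decompresses up to VECTOR_SIZE values starting at off; returns the count. */
size_t for_decompress_vector(const struct for_column *c, size_t off, int32_t *out)
{
    assert(c != NULL && out != NULL);
    if (off >= c->n) return 0;
    size_t left = c->n - off;
    size_t len = left < (size_t)VECTOR_SIZE ? left : (size_t)VECTOR_SIZE;
    for (size_t i = 0; i < len; i++)
        out[i] = c->base + (int32_t)c->codes[off + i];
    return len;
}

/* Sums the column while only one decompressed vector exists at any time. */
int64_t for_sum(const struct for_column *c)
{
    int32_t vec[VECTOR_SIZE];
    int64_t sum = 0;
    size_t len;
    for (size_t off = 0; (len = for_decompress_vector(c, off, vec)) > 0; off += len)
        for (size_t i = 0; i < len; i++)
            sum += vec[i];
    return sum;
}
```

The column stays compressed in RAM (one byte per value here instead of four), and the decompressed data lives only in a small buffer that is reused for every vector.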
Results
TPC-H 100 GB
TPC-H   Compression   MonetDB/X100 on 1 CPU                        DB2, 8 CPUs
query   ratio         4 disks               12 disks               142 disks
                      Speedup   Time (s)    Speedup   Time (s)     Time (s)
01      4.33          4.41      69.6        1.29      50.9         111.9
03      3.04          3.10      11.3        1.48       6.0          15.1
04      8.15          7.58       2.4        2.67       1.8          12.5
05      3.81          3.55      15.3        1.06      16.2          84.0
06      4.39          4.50      10.7        2.35       4.6          17.1
07      1.71          1.66      72.0        0.84      40.8          86.5
Linear speedup with slow disks
Decent improvement with fast disks
Competes with DB2 using ~10x fewer resources
Feeding the Beast (2)
Two ideas pursued:
- Lightweight compression to enhance disk bandwidth
- Maximizing disk scan sharing in concurrent queries.
Concurrent scans
Multiple queries scanning the same table
- Different start times
- Different scan ranges
Compete for disk access and buffer space
FCFS request scheduling: poor latency
Normal scans in real life
Shared scans
Observation: queries often do not need data in a sequential order
Idea: make queries share the scanning process
Two existing types:
- Attach
- Elevator
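As a rough illustration of the "attach" idea, assume it means that a new query joins an ongoing scan at its current position, reads to the end of the table, and then wraps around for the part it missed. The slides do not spell this out, so both that reading and the helper below (name and chunk numbering included) are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fills order[0..nchunks-1] with the chunk numbers a query reads when it
 * attaches to a scan currently positioned at chunk `current`:
 * current..nchunks-1 first, then 0..current-1.
 */
void attach_chunk_order(size_t nchunks, size_t current, size_t *order)
{
    assert(order != NULL && (nchunks == 0 || current < nchunks));
    for (size_t i = 0; i < nchunks; i++)
        order[i] = (current + i) % nchunks;
}
```

With 5 chunks and the running scan at chunk 3, the attached query reads chunks 3, 4, 0, 1, 2, sharing the first two reads with the scan it joined.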
Attach in real life
Elevator in real life
Existing shared scans
Benefits
- Fewer I/O operations
- Better data reuse
Problems
- Sharing decisions static (when a query starts)
- Misses opportunities in a dynamic environment
- Not sensitive to different query types
Relevance scans
Core ideas
- Dynamically adapt to the current situation
- Allow fully arbitrary data order
Goals:
- Maximize data sharing
- Optimize latency and throughput
- Work for different types of queries
Relevance in real life
Results
Conclusion
Presented MonetDB/X100
- A new database kernel developed at CWI
- Uses block-oriented iterator model (vectorization): works amazingly well
So fast
- must reduce hunger for hard disks
Column storage specialized in sequential access
+ Lightweight compression schemes (gives ~ factor 3)
+ Cooperative Bandwidth Sharing (gives ~ factor 2)
Good performance results
- Fastest raw 100GB TPC-H performance around (** not fair)
- Beats IR systems on Terabyte TREC
Literature
MonetDB
- P.A. Boncz. MIL Primitives for Querying a Fragmented World. VLDB Journal, 1999.
- P.A. Boncz. Monet: A Next-Generation DBMS Kernel For Query-Intensive Applications. Ph.D. thesis, 2002.
MonetDB/X100
- P.A. Boncz, M. Zukowski, N. Nes. MonetDB/X100: Hyper-pipelining Query Execution. CIDR 2005.
- M. Zukowski, S. Heman, N. Nes, P.A. Boncz. Super-Scalar RAM-CPU Cache Compression. ICDE 2006.
- M. Zukowski, S. Heman, N. Nes, P.A. Boncz. Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS. VLDB 2007.
Other
- S. Padmanabhan, T. Malkemus, R. Agarwal, A. Jhingran. Block oriented processing of relational database operations in modern computer architectures. ICDE 2001.
- J. Goldstein, R. Ramakrishnan, U. Shaft. Compressing relations and indexes. ICDE 1998.
- M. Stonebraker, et al. C-Store: A Column Oriented DBMS. VLDB 2005.
- D.J. Abadi, S.R. Madden, M.C. Ferreira. Integrating Compression and Execution in Column-Oriented Database Systems. SIGMOD 2006.
- S. Harizopoulos, V. Liang, D. Abadi, S. Madden. Performance Tradeoffs in Read-Optimized Databases. VLDB 2006.
- D.J. Abadi, D.S. Myers, D.J. DeWitt, S.R. Madden. Materialization Strategies in a Column-Oriented DBMS. ICDE 2007.
- C.A. Lang, B. Bhattacharjee, T. Malkemus, S. Padmanabhan, K. Wong. Increasing buffer-locality for multiple relational table scans through grouping and throttling. ICDE 2007.
The End
Thank you! Questions?