Advanced Databases
8 Physical data structures and query optimization
Studying the internals of DB technology: why?
DBMSs provide transparent services, so transparent that, so far, we could ignore many implementation details: DBMSs have always been a black box.
So why should we open the box?
Knowing how it works may help to use it better.
Some services are provided separately.
DataBase Management System DBMS
A system (software product) capable of managing data collections which are:
large ((much) larger than the central memory available on the computers that run the software);
persistent (with a lifetime which is independent of single executions of the programs that access them);
shared (in use by several applications at a time);
guaranteeing reliability (i.e., tolerance to hardware and software failures) and privacy (by disciplining and controlling all accesses).
Access and query manager
Architecture (from the top layer down): SQL -> Query manager -> Access methods manager -> Buffer manager -> Secondary memory manager -> Secondary memory
Technology of DBMSs - topics
Query management ("optimization")
Physical data structures and access structures
Buffer and secondary memory management
Reliability control
Concurrency control
Distributed architectures
Main and Secondary memory (1)
Programs can only refer to data stored in main memory.
Databases must be stored (mainly) in secondary memory for two reasons: size and persistence.
Data stored in secondary memory can only be used after being transferred to main memory (which explains the terms "main" and "secondary").
Main and Secondary memory (2)
Secondary memory devices are organized in blocks of (usually) fixed length (order of magnitude: a few KB).
The only available operations on such devices are reading and writing one page, i.e., the byte stream corresponding to a block.
For convenience and simplicity, we will use block and page as synonyms.
Main and Secondary memory (3)
Secondary memory access:
seek time (8-12 ms): head positioning
latency time (2-8 ms): disk rotation
transfer time (~1 ms): data transfer
On average, an access hardly takes less than 10 ms overall.
The cost of an access to secondary memory is 4 orders of magnitude higher than that of an access to main memory.
In "I/O bound" applications the cost depends exclusively on the number of accesses to secondary memory.
Main and Secondary memory (4)
New storage technologies: SSD
No seek or rotational latency: only transfer time.
A different model, yet (surprisingly, apparently) similar performance for DB-like loads.
Continuous writes stress the erasing process (which is the weak link).
Other approaches:
Main memory databases: again, different cost models.
Efficient when read operations significantly outnumber writes.
DBMS and file system (1)
The File System (FS) is the component of the Operating System which manages access to secondary memory.
DBMSs make limited use of FS functionalities: to create and delete files, and to read and write single blocks or sequences of consecutive blocks.
The DBMS directly manages the file organization, both in terms of the distribution of records within blocks and with respect to the internal structure of each block.
DBMS and file system (2)
The DBMS manages the blocks of allocated files as if they were a single large space in secondary memory, and builds in such space the physical structures with which tables are implemented.
A file is typically dedicated to a single table, but a file may also contain data belonging to more than one table, and the tuples of one table may be split across more than one file.
Blocks and records
Blocks (the "physical" components of a file) and records (the "logical" components) generally have different sizes:
The size of a block depends on the file system.
The size of a record depends on the needs of applications and normally varies within a file.
Block Factor
The block factor is the number of records within a block.
SR: size of a record (assumed constant in the file for simplicity: "fixed-length records")
SB: size of a block
If SB > SR, there may be many records in each block; the block factor is SB / SR rounded down:
$\lfloor S_B / S_R \rfloor$
The rest of the space can be:
used ("spanned" records, or "hung-up" records, which continue in the next block)
not used ("unspanned" records)
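A minimal Python sketch of the block-factor arithmetic, assuming fixed-length records and ignoring block and page headers; the function names and the sample sizes (SB = 4096 bytes, SR = 300 bytes, 1000 records) are illustrative assumptions.

```python
import math


def block_factor(sb: int, sr: int) -> int:
    """Whole records that fit in one block: floor(SB / SR)."""
    return sb // sr


def blocks_needed(n_records: int, sb: int, sr: int, spanned: bool) -> int:
    """Blocks needed for n_records fixed-length records (headers ignored)."""
    if spanned:
        # Spanned records may continue in the next block: no space is wasted.
        return math.ceil(n_records * sr / sb)
    if sr > sb:
        raise ValueError("unspanned records must fit in a single block")
    # Unspanned: the leftover SB - bf * SR bytes of each block stay unused.
    return math.ceil(n_records / block_factor(sb, sr))


bf = block_factor(4096, 300)                      # 13 records per block
unspanned = blocks_needed(1000, 4096, 300, spanned=False)
spanned = blocks_needed(1000, 4096, 300, spanned=True)
print(bf, unspanned, spanned)                     # 13 77 74
```

With unspanned records each block here wastes 4096 - 13 x 300 = 196 bytes, which is why the spanned layout needs three fewer blocks.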
Physical access structures
Used for the efficient storage and manipulation of data within the DBMS.
Encoded as access methods, that is, software modules providing data access and manipulation primitives for each physical access structure.
Each DBMS has a distinctive and limited set of access methods.
We will consider three types of data access structures:
Sequential
Hash-based
Tree-based (or index-based)
Organization of tuples within pages
Each access method has its own page organization.
In the case of sequential and hash-based methods each page has:
An initial part (block header) and a final part (block trailer) containing control information used by the file system
An initial part (page header) and a final part (page trailer) containing control information about the access method
A page dictionary, which contains pointers to each item of useful elementary data contained in the page
A useful part, which contains the data; in general, the page dictionary and the useful data grow as opposing stacks
A checksum, to detect corrupted data
Tree structures have a different page organization.
Organization of tuples within pages
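Figure: layout of a page. Block header and page header at the start; the page dictionary (pointers *t1, *t2, *t3) growing as one stack; the useful part of the page (tuple t1, tuple t2, tuple t3) growing as the opposing stack; a checksum; page trailer and block trailer at the end. The block header and trailer hold control information used by the file system; the page header and trailer hold control information about the access method.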
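The opposing-stacks layout can be sketched in Python as a "slotted page". This is a simplified illustration, not any DBMS's actual format: the header size, trailer size (standing in for checksum and trailers) and 4-byte dictionary entries are hypothetical values, and the checksum itself is not computed.

```python
from typing import List, Optional, Tuple


class SlottedPage:
    """Page whose dictionary and tuple area grow as opposing stacks.

    Hypothetical layout: HEADER bytes at the start, TRAILER bytes (checksum
    and trailers) at the end, SLOT_SIZE bytes per dictionary entry.
    """

    HEADER = 16
    TRAILER = 8
    SLOT_SIZE = 4

    def __init__(self, size: int = 4096) -> None:
        self.data = bytearray(size)
        # Dictionary: (offset, length) of each tuple; it grows right after the header.
        self.slots: List[Tuple[int, int]] = []
        # Tuples are stacked from the end of the page toward the dictionary.
        self.free_end = size - self.TRAILER

    def free_space(self) -> int:
        dict_end = self.HEADER + self.SLOT_SIZE * len(self.slots)
        return self.free_end - dict_end

    def insert(self, tup: bytes) -> Optional[int]:
        """Store tup and return its slot number, or None if a new page is needed."""
        if len(tup) + self.SLOT_SIZE > self.free_space():
            return None
        self.free_end -= len(tup)
        self.data[self.free_end:self.free_end + len(tup)] = tup
        self.slots.append((self.free_end, len(tup)))
        return len(self.slots) - 1

    def read(self, slot: int) -> bytes:
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])


page = SlottedPage(64)
s0 = page.insert(b"hello")
s1 = page.insert(b"x" * 27)
s2 = page.insert(b"!")  # no room left: None
print(s0, s1, s2, page.read(0), page.free_space())
```

Because both areas grow toward each other, free space is always one contiguous gap in the middle, and either side can grow until the two stacks meet.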
Page manager primitives
Insertion and update of a tuple:
may require a reorganization of the page (if there is enough space to store the extra bytes) or even the use of a new page (if there is not enough space).
Deletion of a tuple:
often carried out by marking the tuple as invalid.
Access to a field of a particular tuple:
after identifying the tuple by means of its key or its offset, the field is identified according to its offset and length.
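Field access by offset and length, and deletion by marking, can be sketched with Python's standard `struct` module. The record layout below (a validity flag, an integer id, a 20-byte name, an integer salary) is a hypothetical example, not a layout taken from the slides.

```python
import struct

# Hypothetical fixed-length record layout: field -> (offset, struct format).
# Byte 0 is a validity flag, used to mark deleted tuples as invalid.
LAYOUT = {
    "valid": (0, "<?"),
    "id": (1, "<i"),
    "name": (5, "<20s"),
    "salary": (25, "<i"),
}
RECORD_SIZE = 29  # 1 + 4 + 20 + 4 bytes, no padding ("<" = little-endian, packed)


def write_field(rec: bytearray, field: str, value) -> None:
    offset, fmt = LAYOUT[field]
    if isinstance(value, str):
        value = value.encode()  # "20s" pads shorter values with NUL bytes
    struct.pack_into(fmt, rec, offset, value)


def read_field(rec: bytes, field: str):
    # The field is located purely by its offset and its length (given by the format).
    offset, fmt = LAYOUT[field]
    (value,) = struct.unpack_from(fmt, rec, offset)
    return value.rstrip(b"\0").decode() if isinstance(value, bytes) else value


def delete(rec: bytearray) -> None:
    write_field(rec, "valid", False)  # logical deletion: the bytes stay in place


rec = bytearray(RECORD_SIZE)
for field, value in [("valid", True), ("id", 7), ("name", "Rossi"), ("salary", 3000)]:
    write_field(rec, field, value)
print(read_field(rec, "name"), read_field(rec, "salary"))  # Rossi 3000
delete(rec)
print(read_field(rec, "valid"))                            # False
```

Marking instead of physically removing keeps the offsets of the other tuples stable; the space can be reclaimed later by a page reorganization.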
Sequential structures
Characterized by a sequential arrangement of tuples in the secondary memory.
Three cases: entry-sequenced, array, sequentially-ordered.
In an entry-sequenced organization, the sequence of the tuples is dictated by their order of entry.
In an array organization, the tuples (all of the same size) are arranged as in an array, and their positions depend on the values of an index (or indexes).
In a sequentially-ordered organization, the position of each tuple in the sequence depends on the value of a key field that induces the ordering.
Entry-sequenced sequential structure
Optimal for carrying out sequential reading and writing operations.
Optimal for space occupancy, as it uses all the blocks available for files and all the space within the blocks.
Not optimal with respect to:
searching specific data units;
updates that increase the size of a tuple.
Array sequential structure
Possible only when the tuples are of fixed length.
Made of n adjacent blocks, each block with m slots available to store m tuples.
Each tuple has a numeric index i and is placed in the i-th position of the array.
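Since all tuples have the same size, the position of tuple i can be computed directly rather than searched. A minimal sketch, assuming 0-based indexes and a hypothetical per-block header of `header_size` bytes:

```python
from typing import Tuple


def locate(i: int, m: int, tuple_size: int, block_size: int,
           header_size: int = 0) -> Tuple[int, int, int]:
    """Return (block number, slot in block, byte offset in file) of tuple i.

    Assumes 0-based indexes, m slots per block, fixed-size tuples, and a
    hypothetical per-block header of header_size bytes.
    """
    if header_size + m * tuple_size > block_size:
        raise ValueError("m tuples do not fit in a block")
    block, slot = divmod(i, m)
    return block, slot, block * block_size + header_size + slot * tuple_size


print(locate(25, m=10, tuple_size=100, block_size=4096, header_size=32))  # (2, 5, 8724)
```

One block read suffices to reach any tuple, which is the main advantage of the array organization.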
Sequentially-ordered sequential structure
Each tuple has a position based on the value of the key field.
Historically, such structures were used on sequential devices (tapes); they have fallen out of use, except for data streams and system logs.
The main problems are insertions or updates that increase the physical space: they require reordering techniques for the tuples already present.
Options to avoid global reorderings:
Differential files (example: yellow pages)
Leaving a certain number of slots free at the time of first loading, followed by local reordering operations
Integrating the sequentially-ordered file with an overflow file, where new tuples are inserted into blocks linked to form an overflow chain
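The overflow-file option can be sketched in Python as follows. This is a deliberate simplification: the ordered main area is a single sorted list (standing in for ordered blocks) and the overflow chain is a single list in arrival order; the class and method names are hypothetical.

```python
import bisect
import heapq
from typing import Iterable, Iterator, List


class OrderedFileWithOverflow:
    """Sequentially-ordered file plus an overflow area (simplified)."""

    def __init__(self, keys: Iterable[int]) -> None:
        self.main: List[int] = sorted(keys)  # first loading, in key order
        self.overflow: List[int] = []

    def insert(self, key: int) -> None:
        self.overflow.append(key)  # no global reordering at insertion time

    def contains(self, key: int) -> bool:
        i = bisect.bisect_left(self.main, key)  # binary search on the ordered part
        if i < len(self.main) and self.main[i] == key:
            return True
        return key in self.overflow  # the overflow chain is scanned linearly

    def scan_in_order(self) -> Iterator[int]:
        return heapq.merge(self.main, sorted(self.overflow))

    def reorganize(self) -> None:
        """Periodic reorganization: fold the overflow chain back into the main area."""
        self.main = list(self.scan_in_order())
        self.overflow = []


f = OrderedFileWithOverflow([10, 30, 20])
f.insert(25)
print(f.contains(25), f.contains(26), list(f.scan_in_order()))  # True False [10, 20, 25, 30]
```

Insertions stay cheap, but every lookup that misses the main area pays for a scan of the overflow chain, so the file must be reorganized periodically.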
Hash-based access structures
Ensure an efficient associative access to data, based on the value of a key field.
A hash-based structure has B blocks (often adjacent).
A hash algorithm is applied to the key field and returns a value between zero and B-1. This value is interpreted as the position of the block in the file, and used both for reading and writing the block.
This is the most efficient technique for queries with equality predicates, but it is rather inefficient for queries with interval predicates.
Features of hash-based structures
Primitive interface: hash(fileId, Key): BlockId
The implementation consists of two parts:
folding transforms the key values so that they become positive integer values, uniformly distributed over a large range;
hashing transforms the positive binary number into a number between zero and B - 1.
Performance is optimal if the file is larger than necessary. Let T be the number of tuples expected for the file and F the average number of tuples stored in each page; then a good choice for B is
$B = \frac{T}{0.8 \times F}$
so that only 80% of the available space is used.
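The two-part implementation and the choice of B can be sketched in Python. The folding function here is a hypothetical choice (a polynomial hash of the key's bytes modulo the prime 2^61 - 1); Python's built-in `hash()` is avoided because it is salted per process for strings. The `fileId` parameter of the primitive interface only selects the file (and therefore its B), so the sketch passes B directly.

```python
import math


def fold(key: str) -> int:
    """Folding: map the key to a non-negative integer spread over a large range."""
    h = 0
    for byte in key.encode():
        h = (h * 31 + byte) % (2**61 - 1)
    return h


def hash_block(key: str, b: int) -> int:
    """Hashing: reduce the folded value to a block number in 0 .. B-1."""
    return fold(key) % b


def choose_b(t: int, f: int, fill: float = 0.8) -> int:
    """B = T / (0.8 * F), rounded up, so that about 80% of the space is used."""
    return math.ceil(t / (fill * f))


B = choose_b(t=10_000, f=10)  # 1250 blocks
print(B, hash_block("rossi", B))
```

The same function is used both to write a tuple and to read it back, so an equality lookup costs one block access when there is no overflow.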
Collisions
Collisions occur when the same block number is associated with too many tuples. They are critical when the maximum number of tuples per block is exceeded.
Collisions are solved by adding an overflow chain, which gives the additional cost of scanning the chain.
The average length of the overflow chain is a function of the ratio T/(F x B) and of the average number F of tuples per page:
F \ T/(F x B)   0.5     0.6     0.7     0.8     0.9
1               0.5     0.75    1.167   2.0     4.495
2               0.177   0.293   0.494   0.903   2.146
3               0.087   0.158   0.286   0.554   1.377
5               0.031   0.066   0.136   0.289   0.777
10              0.005   0.015   0.042   0.110   0.345
An example
40 records, hash table with 50 positions (M mod 50): 1 collision of 4 values, 2 collisions of 3 values, 5 collisions of 2 values
M 60600 66301 205751 205802 200902 116202 200604 66005 116455 200205 201159 205610 201260 102360 205460 205912 205762 200464 205617 205667
M mod 50 0 1 1 2 2 2 4 5 5 5 9 10 10 10 10 12 12 14 17 17
M 200268 205619 210522 205724 205977 205478 200430 210533 205887 200138 102338 102690 115541 206092 205693 205845 200296 205796 200498 206049
M mod 50 18 19 22 24 27 28 30 33 37 38 38 40 41 42 43 45 46 46 48 49
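The collision counts stated for this example can be checked mechanically. A short Python sketch using the 40 keys listed above and the hash function M mod 50; the helper name `collision_profile` is hypothetical.

```python
from collections import Counter

# The 40 keys M of the example above.
KEYS = [
    60600, 66301, 205751, 205802, 200902, 116202, 200604, 66005, 116455, 200205,
    201159, 205610, 201260, 102360, 205460, 205912, 205762, 200464, 205617, 205667,
    200268, 205619, 210522, 205724, 205977, 205478, 200430, 210533, 205887, 200138,
    102338, 102690, 115541, 206092, 205693, 205845, 200296, 205796, 200498, 206049,
]


def collision_profile(keys, b):
    """Map n -> number of positions that received exactly n keys, for n >= 2."""
    per_position = Counter(k % b for k in keys)
    return Counter(n for n in per_position.values() if n >= 2)


profile = collision_profile(KEYS, 50)
print(sorted(profile.items(), reverse=True))  # [(4, 1), (3, 2), (2, 5)]
```

Even with 10 spare positions (40 keys over 50 positions), only 28 positions are used: collisions are frequent well before the table is full.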
About hashing
Performs best for direct access based on equality for values of the key.
Collisions (overflow) are typically managed with linked blocks in an area called overflow file.
Inefficient for access based on interval predicates or based on the value of non-key attributes.
Hash files "degenerate" if the extra space is too small (it should be at least 120% of the minimum required space) and if the file size changes a lot over time.
Tree structures
The most frequently used in relational DBMSs: SQL indexes are implemented in this way.
Give associative access based on the value of a key, with no constraints on the physical location of the tuples.
Note: the primary key of the relational model and the keys for hash-based and tree structures are different concepts.
Index file
Index: an auxiliary structure for the efficient access to the records of a file based upon the values of a given field or combination of fields, called the index key.
The index concept: the analytic index at the end of a book, seen as a list of (term, page list) pairs in alphabetical order.
The index key is not a primary key!
Tree structures
Each tree has: one root node, several intermediate nodes, several leaf nodes.
Each node corresponds to a block.
Figure: a two-level tree; the root node (first level) points to the leaf nodes (second level), which contain the keys bice, dino, mauro, paolo, renzo, teresa and pointers to tuples (arbitrarily organized).
The links between the nodes are established by pointers to mass memory.
In general, each node has a large number of descendants (fan-out), and therefore the majority of pages are leaf nodes.
In a balanced tree, the lengths of the paths from the root node to the leaf nodes are all equal. Balanced trees give optimal performance.
Structure of the tree nodes
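Each node has the form P0 K1 P1 ... Ki Pi ... KF PF, where K1 < ... < KF are key values and P0, ..., PF are pointers:
P0 points to the sub-tree with keys $K < K_1$
Pi points to the sub-tree with keys $K_i \le K < K_{i+1}$
PF points to the sub-tree with keys $K \ge K_F$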
Primary vs. secondary indexes
Indexes can be used as a primary access structure: tuples are stored in the index nodes or in a file ordered according to the index key (also called a clustered index).
Possibly a sparse index: with fewer index entries than the number of tuples of the file, since the tuples are ordered in the file.
Indexes are (more) often used as secondary access structures: tuples are stored according to another structure (hashed, entry-sequenced, another index with a different key), and the index nodes only contain key values and pointers.
Necessarily a dense index: one index entry pointing to every tuple in the file is required (otherwise the tuple cannot be reached through the index).
B and B+ trees
B+ trees:
The leaf nodes are linked in a chain ordered by the key.
Support interval queries efficiently.
The most used by relational DBMSs.
B trees:
No sequential connection for leaf nodes.
Intermediate nodes use two pointers for each key value Ki: one points directly to the block that contains the tuple corresponding to Ki; the other points to a sub-tree with keys greater than Ki and less than Ki+1.
An example of B+ tree
Figure: a two-level B+ tree. The root node (first level) contains key values and pointers to index nodes; the leaf nodes (second level) contain the keys bice, dino, mauro, paolo, renzo, teresa and pointers to blocks of tuples (arbitrarily organized).
An example of B tree
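Figure: a B tree. The root contains k1, k6, k10, each with a direct pointer to its tuple (t(k1), t(k6), t(k10)); its sub-trees contain k2, k3, k4, k5 and k7, k8, k9, each key again with a direct pointer to its tuple t(ki).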
Search technique
Looking for a tuple with key value V, at each intermediate node P0 K1 P1 ... Ki Pi ... KF PF:
if $V < K_1$, follow P0;
if $V \ge K_F$, follow PF;
otherwise, follow Pj such that $K_j \le V < K_{j+1}$.
The leaf nodes can be organized in two ways:
In key-sequenced trees, tuples are contained in the leaves.
In indirect trees, leaf nodes contain pointers to the tuples, which are allocated with any other primary mechanism (entry-sequenced, hash, key-sequenced, ...).
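The search rule above can be sketched in Python. The node classes and the small tree over the names used in the slides are hypothetical; `bisect_right` returns the number of keys not greater than V, which is exactly the index of the pointer to follow (0 for P0, F for PF).

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Any, List, Optional, Union


@dataclass
class Leaf:
    keys: List[str]
    tuples: List[Any]  # key-sequenced: the tuples; indirect tree: pointers to them


@dataclass
class Inner:
    keys: List[str]                       # K1 .. KF, increasing
    children: List[Union["Inner", Leaf]]  # P0 .. PF (one more than keys)


def search(node: Union[Inner, Leaf], v: str) -> Optional[Any]:
    while isinstance(node, Inner):
        # bisect_right counts the keys <= v: 0 if v < K1 (follow P0),
        # F if v >= KF (follow PF), otherwise j with Kj <= v < Kj+1 (follow Pj).
        node = node.children[bisect_right(node.keys, v)]
    i = bisect_left(node.keys, v)
    if i < len(node.keys) and node.keys[i] == v:
        return node.tuples[i]
    return None


def leaf(*names: str) -> Leaf:
    return Leaf(list(names), [f"t({n})" for n in names])


# Hypothetical two-level tree over the names used in the slides.
root = Inner(["mauro", "renzo"],
             [leaf("bice", "dino"), leaf("mauro", "paolo"), leaf("renzo", "teresa")])
print(search(root, "paolo"), search(root, "zeno"))  # t(paolo) None
```

Each loop iteration corresponds to reading one node (one block), so the cost of a search equals the depth of the tree.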
Split and Merge operations
SPLIT: required when the insertion of a new tuple cannot be done locally to a node. It causes an increase of pointers in the parent node and thus could recursively cause another split.
MERGE: required when two adjacent nodes have entries that could be condensed into a single node. It is done in order to keep a high node filling and minimal paths from the root to the leaves. It causes a decrease of pointers in the parent node and thus could recursively cause another merge.
Split and merge
Figure: starting from an initial situation, (a) inserting k3 causes a split of a node; (b) deleting k2 causes a merge of two nodes.
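A leaf-level sketch of both operations in Python, assuming integer keys and a node capacity counted in keys; propagating the separator (or its removal) to the parent is left out. Copying the first key of the right half up as separator is one common B+ tree convention, used here as an assumption.

```python
from bisect import insort
from typing import List, Optional, Tuple


def insert_into_leaf(leaf: List[int], key: int, capacity: int
                     ) -> Tuple[List[int], Optional[List[int]], Optional[int]]:
    """Insert key into a sorted leaf holding at most `capacity` keys.

    Returns (leaf, None, None) if the insertion is local, otherwise
    (left, right, separator): the separator must be added to the parent node,
    which may in turn overflow and split.
    """
    keys = list(leaf)
    insort(keys, key)
    if len(keys) <= capacity:
        return keys, None, None
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]


def merge_leaves(left: List[int], right: List[int], capacity: int) -> Optional[List[int]]:
    """Condense two adjacent leaves if their entries fit in one node.

    On success the parent loses one pointer (and the separator between them),
    which may in turn trigger a merge at the upper level.
    """
    if len(left) + len(right) <= capacity:
        return left + right
    return None


print(insert_into_leaf([1, 2, 4, 5], 3, capacity=4))  # ([1, 2], [3, 4, 5], 3)
print(merge_leaves([1], [3, 4], capacity=4))           # [1, 3, 4]
```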
Index usage
Syntax in SQL:
create [unique] index IndexName on TableName(AttributeList)
drop index IndexName
Every table should have:
A primary index, with key-sequenced structure, normally unique, on the primary key
Several secondary indexes, both unique and not unique, on the attributes most used for selections and joins
Secondary indexes are added progressively, checking that the system actually uses them, and without excess.
Query optimization
Optimizer: an important module in the architecture of a DBMS.
It receives a query written in SQL and produces an access program in object or internal format, which uses the data access methods.
Steps:
Lexical, syntactic and semantic analysis
Translation into an internal representation
Algebraic optimization
Cost-based optimization
Code generation
Internal representation of queries
A tree representation, similar to that of relational algebra:
Leaf nodes correspond to the physical data structures (tables, indexes, files).
Intermediate nodes represent physical data access operations that are supported by the access methods.
Typical operations include sequential scans, orderings, indexed accesses and various methods for evaluating joins and aggregate queries, as well as materialization choices for intermediate results.
Query optimization input-output
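Input: query in SQL
SELECT R.a FROM R, S, T WHERE R.a = S.a AND R.b = T.b
Output: execution plan
Figure: the initial algebraic tree (project over select(R.a=S.a, R.b=T.b) over the Cartesian products cartProd(cartProd(R, S), T)) and the resulting execution plan, in which scan R, scan S and scan T feed the build and probe operators (build1, probe1, build2, probe2, build3), topped by dupElim&project.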
Approaches to query execution
Compile and store: the query is compiled once and executed many times.
The internal code is stored in the DBMS, together with an indication of the dependencies of the code on the particular versions of catalog used at compile time.
On relevant changes of the catalog, the compilation of the query is invalidated and repeated.
Compile and go: immediate execution, no storage.
Even if not stored, the code may live for a while in the DBMS and be available for other executions.
Relation profiles
Profiles contain quantitative information about tables and are stored in the data dictionary:
the cardinality (number of tuples) of each table T
the dimension in bytes of each attribute Aj in T
the number of distinct values of each attribute Aj in T
the minimum and maximum values of each attribute Aj in T
Periodically calculated by activating appropriate system primitives (for example, the update statistics command).
Used in cost-based optimization for estimating the size of the intermediate results produced by the query execution plan.
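A Python sketch of computing a profile and using it for a size estimate. The byte size of each attribute is omitted, and the estimate card / val(A) for a selection A = v relies on an assumption of uniformly distributed values that the slides do not state; table and function names are hypothetical.

```python
from typing import Any, Dict, List


def compute_profile(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Cardinality of the table plus, for each attribute, distinct values, min and max."""
    prof: Dict[str, Any] = {"card": len(rows), "attrs": {}}
    for attr in (rows[0].keys() if rows else []):
        values = [r[attr] for r in rows]
        prof["attrs"][attr] = {"val": len(set(values)), "min": min(values), "max": max(values)}
    return prof


def estimate_eq_selection(prof: Dict[str, Any], attr: str) -> float:
    """Estimated size of a selection attr = v: card / val(attr), assuming uniform values."""
    return prof["card"] / prof["attrs"][attr]["val"]


employee = [
    {"dept": "sales", "salary": 3000},
    {"dept": "hr", "salary": 2800},
    {"dept": "sales", "salary": 3100},
    {"dept": "it", "salary": 3000},
]
p = compute_profile(employee)
print(p["card"], p["attrs"]["dept"]["val"], estimate_eq_selection(p, "dept"))
```

Because profiles are recomputed only periodically, such estimates may be stale between two runs of the statistics command.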
Sequential scan
Performs a sequential access to all the tuples of a table or of an intermediate result, at the same time executing various operations, such as:
Projection to a set of attributes
Selection on a simple predicate (of type: Ai = v)
Sort (ordering)
Insertions, deletions, and modifications of the tuples currently accessed during the scan
Primitives: open, next, read, modify, insert, delete, close
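A Python sketch of a scan exposing some of these primitives (open, next, read, modify, close; insert and delete are omitted for brevity). The class, its in-memory list of dict tuples and the `employee` data are hypothetical; the interface only mirrors the primitive names in the slide.

```python
from typing import Any, Dict, List, Optional


class SequentialScan:
    """Scan with on-the-fly selection (attr = value) and projection."""

    def __init__(self, table: List[Dict[str, Any]], attr: Optional[str] = None,
                 value: Any = None, projection: Optional[List[str]] = None) -> None:
        self.table, self.attr, self.value, self.projection = table, attr, value, projection
        self.pos: Optional[int] = None

    def open(self) -> None:
        self.pos = -1  # positioned before the first tuple

    def next(self) -> bool:
        """Advance to the next tuple satisfying the predicate; False at end of table."""
        while self.pos + 1 < len(self.table):
            self.pos += 1
            if self.attr is None or self.table[self.pos][self.attr] == self.value:
                return True
        return False

    def read(self) -> Dict[str, Any]:
        tup = self.table[self.pos]
        return dict(tup) if self.projection is None else {a: tup[a] for a in self.projection}

    def modify(self, attr: str, new_value: Any) -> None:
        self.table[self.pos][attr] = new_value  # update the current tuple in place

    def close(self) -> None:
        self.pos = None


employee = [
    {"name": "Rossi", "dept": "sales", "salary": 3000},
    {"name": "Bianchi", "dept": "hr", "salary": 2800},
    {"name": "Verdi", "dept": "sales", "salary": 3100},
]
scan = SequentialScan(employee, "dept", "sales", ["name", "salary"])
scan.open()
names = []
while scan.next():
    t = scan.read()
    names.append(t["name"])
    scan.modify("salary", t["salary"] + 100)  # raise for every selected tuple
scan.close()
print(names, [e["salary"] for e in employee])  # ['Rossi', 'Verdi'] [3100, 2800, 3200]
```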
Sort
This operation is used for ordering the data according to the value of one or more attributes. We distinguish:
Sort in main memory, typically performed by means of ad-hoc algorithms
Sort of large files, which cannot be transferred to main memory, performed by sorting smaller parts that fit in memory and then merging the already sorted parts
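The sort of large files can be sketched in Python as a two-phase external merge sort. Simplifications, stated as assumptions: the sorted runs are kept in lists instead of temporary files, and all runs are merged in a single pass with the standard `heapq.merge`, which presumes enough buffers for one page per run.

```python
import heapq
from typing import Iterable, Iterator, List


def external_sort(records: Iterable[int], run_size: int) -> Iterator[int]:
    """Two-phase external merge sort (sketch).

    Phase 1: read run_size records at a time (what fits in memory), sort them,
    and keep each sorted run (a real DBMS would write it to a temporary file).
    Phase 2: k-way merge of all the sorted runs.
    """
    runs: List[List[int]] = []
    buffer: List[int] = []
    for r in records:
        buffer.append(r)
        if len(buffer) == run_size:
            runs.append(sorted(buffer))
            buffer = []
    if buffer:
        runs.append(sorted(buffer))
    return heapq.merge(*runs)


print(list(external_sort([5, 3, 9, 1, 7, 2, 8], run_size=3)))  # [1, 2, 3, 5, 7, 8, 9]
```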
Indexed access
Indexes are used when queries include:
simple predicates (of the type Ai = v)
interval predicates (of the type $v_1 \le A_i \le v_2$)
These predicates are said to be supported by indexes built on Ai.
With conjunctions of supported predicates, the DBMS chooses the most selective supported predicate for the primary access, and evaluates the other predicates in main memory.
With disjunctions of predicates:
if any of them is not supported, a scan is needed;
if all are supported, indexes can be used (on all of them), and then duplicate elimination is normally required.
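The two rules can be sketched in Python. Each predicate is represented, as an assumption, by its attribute and an estimated selectivity (fraction of tuples satisfying it, so smaller means more selective); real optimizers derive such estimates from the profiles. Attribute names and numbers are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# (attribute, estimated selectivity): smaller selectivity = more selective predicate.
Predicate = Tuple[str, float]


def conjunction_access(conjuncts: List[Predicate], indexed: Set[str]
                       ) -> Tuple[Optional[str], List[Predicate]]:
    """Return (attribute for indexed access, or None for a scan; predicates checked in memory)."""
    supported = [i for i, (attr, _) in enumerate(conjuncts) if attr in indexed]
    if not supported:
        return None, list(conjuncts)
    best = min(supported, key=lambda i: conjuncts[i][1])
    return conjuncts[best][0], conjuncts[:best] + conjuncts[best + 1:]


def disjunction_access(disjuncts: List[Predicate], indexed: Set[str]) -> str:
    if all(attr in indexed for attr, _ in disjuncts):
        return "index access for each predicate, then duplicate elimination"
    return "scan"


indexed = {"dept", "salary"}
print(conjunction_access([("dept", 0.3), ("city", 0.05), ("salary", 0.5)], indexed))
# ('dept', [('city', 0.05), ('salary', 0.5)]): city is more selective but not supported
print(disjunction_access([("dept", 0.3), ("city", 0.05)], indexed))  # scan
```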
Join Methods
Joins are the most frequent (and costly) operations in DBMSs.
There are several methods for join evaluation, among which: nested-loop, merge-scan and hashed. These three methods are based, respectively, on scanning, ordering, and hashing.
Nested-loop join
Figure: for each tuple of the external table (external scan), the tuples of the internal table with the same value a of the join attribute A are retrieved by an internal scan or by an indexed access.
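A Python sketch of nested-loop join over in-memory lists of tuples, in both variants shown in the figure: repeated internal scan, and indexed access to the internal table. Here a dict from join value to matching tuples stands in for the index; function names and data are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple

Row = Tuple[Any, ...]


def nested_loop_join(external: Sequence[Row], internal: Sequence[Row],
                     ext_col: int, int_col: int) -> List[Tuple[Row, Row]]:
    result = []
    for e in external:        # external scan: once
        for i in internal:    # internal scan: once per external tuple
            if e[ext_col] == i[int_col]:
                result.append((e, i))
    return result


def build_index(rows: Sequence[Row], col: int) -> Dict[Any, List[Row]]:
    index: Dict[Any, List[Row]] = defaultdict(list)
    for r in rows:
        index[r[col]].append(r)
    return index


def indexed_nested_loop_join(external: Sequence[Row], internal_index: Dict[Any, List[Row]],
                             ext_col: int) -> List[Tuple[Row, Row]]:
    # The internal scan is replaced by an indexed access on the join value.
    return [(e, i) for e in external for i in internal_index.get(e[ext_col], [])]


R = [("r1", "a"), ("r2", "b"), ("r3", "c")]
S = [("a", 10), ("a", 11), ("c", 30)]
print(nested_loop_join(R, S, ext_col=1, int_col=0))
print(indexed_nested_loop_join(R, build_index(S, 0), ext_col=1))
```

The plain variant reads the internal table once per external tuple, so the smaller table (or the one with an index on the join attribute) is usually the better internal choice.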
Merge-scan join
Figure: both tables are ordered on the join attribute A (left table: a, b, b, c, c, e, f; right table: a, a, b, c, e, e, g) and are read in parallel by a left scan and a right scan, matching equal values.
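A Python sketch of merge-scan join on inputs already ordered on the join attribute, run on the values of the figure; each table is modeled, as an assumption, by one-column tuples.

```python
from typing import Any, List, Sequence, Tuple

Row = Tuple[Any, ...]


def merge_scan_join(left: Sequence[Row], right: Sequence[Row],
                    lcol: int, rcol: int) -> List[Tuple[Row, Row]]:
    """Both inputs must already be ordered on the join attribute."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lv, rv = left[i][lcol], right[j][rcol]
        if lv < rv:
            i += 1  # advance the left scan
        elif lv > rv:
            j += 1  # advance the right scan
        else:
            # Find the runs of equal values on both sides and pair them up.
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][lcol] == lv:
                i_end += 1
            while j_end < len(right) and right[j_end][rcol] == rv:
                j_end += 1
            result.extend((lt, rt) for lt in left[i:i_end] for rt in right[j:j_end])
            i, j = i_end, j_end
    return result


left = [(v,) for v in "abbccef"]   # left table of the figure, ordered on A
right = [(v,) for v in "aabceeg"]  # right table of the figure, ordered on A
pairs = merge_scan_join(left, right, 0, 0)
print([lt[0] for lt, _ in pairs])  # ['a', 'a', 'b', 'b', 'c', 'c', 'e', 'e']
```

Each table is read once; if an input is not already ordered, the cost of sorting it must be added.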
Hashed join
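Figure: the same hash function is applied to the join attribute A of the left table (values d, e, a, c, ...) and of the right table (values e, m, a, a, ...), so that tuples with the same value (e.g., a) fall into corresponding buckets, hash(a), and only those buckets need to be compared.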
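A Python sketch of the partitioned hashed join depicted in the figure: both tables are split into buckets with the same hash function, and only corresponding buckets are compared. The bucket count is a hypothetical parameter; Python's `hash()` is consistent within one run, which is all this needs, but bucket order varies between runs, so the result is sorted before printing.

```python
from typing import Any, List, Sequence, Tuple

Row = Tuple[Any, ...]


def hashed_join(left: Sequence[Row], right: Sequence[Row], lcol: int, rcol: int,
                n_buckets: int = 4) -> List[Tuple[Row, Row]]:
    """Partition both tables with the same hash function, then join matching buckets."""
    left_buckets: List[List[Row]] = [[] for _ in range(n_buckets)]
    right_buckets: List[List[Row]] = [[] for _ in range(n_buckets)]
    for t in left:
        left_buckets[hash(t[lcol]) % n_buckets].append(t)
    for t in right:
        right_buckets[hash(t[rcol]) % n_buckets].append(t)
    result = []
    for lb, rb in zip(left_buckets, right_buckets):
        # Different values may share a bucket, so the values are still compared.
        result.extend((lt, rt) for lt in lb for rt in rb if lt[lcol] == rt[rcol])
    return result


left = [(v,) for v in "deac"]
right = [(v,) for v in "emaa"]
pairs = hashed_join(left, right, 0, 0)
print(sorted(pairs))  # [(('a',), ('a',)), (('a',), ('a',)), (('e',), ('e',))]
```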
Cost-based optimization
An optimization problem, whose decisions are:
The data access operations to execute (e.g., scan vs. index access)
The order of operations (e.g., the join order)
The method to use for each operation (e.g., choosing the join method)
Parallelism and pipelining can improve performance.
Further options appear in selecting a plan within a distributed context.
Approach to query optimization
Optimization approach:
Make use of profiles and of approximate cost formulas.
Construct a decision tree, in which each node corresponds to a choice; each leaf node corresponds to a specific execution plan.
Assign to each plan a cost:
$C_{total} = C_{I/O} \cdot n_{I/O} + C_{cpu} \cdot n_{cpu}$
where $n_{I/O}$ and $n_{cpu}$ are the numbers of I/O and CPU operations of the plan, and $C_{I/O}$ and $C_{cpu}$ their unit costs.
Choose the plan with the lowest cost, based on operations research techniques (branch and bound).
Optimizers should obtain good solutions in a very short time.
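A Python sketch of applying the cost formula to a few candidate plans (leaves of a decision tree) and picking the cheapest. The unit costs and the (n_I/O, n_cpu) estimates are hypothetical numbers, not derived from real profiles, and the sketch simply compares all listed plans exhaustively; a real optimizer prunes the tree, for example with branch and bound.

```python
from typing import Dict, Tuple

# Hypothetical unit costs: one I/O is much more expensive than one CPU operation.
C_IO = 1.0
C_CPU = 0.001


def plan_cost(n_io: int, n_cpu: int) -> float:
    """C_total = C_IO * n_IO + C_CPU * n_CPU."""
    return C_IO * n_io + C_CPU * n_cpu


# Hypothetical (n_IO, n_CPU) estimates, as a cost model might derive them from profiles.
candidates: Dict[str, Tuple[int, int]] = {
    "nested-loop, S internal": (12_000, 500_000),
    "merge-scan": (3_000, 800_000),
    "hash-join, hash on S": (2_500, 600_000),
}
costs = {name: plan_cost(*est) for name, est in candidates.items()}
best = min(costs, key=costs.get)
print(best)  # hash-join, hash on S
```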
An example of decision tree
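Figure: decision tree for $R \bowtie_1 S \bowtie_2 T$. The first level chooses which pair is joined first: $(R \bowtie S)$, $(R \bowtie T)$ or $(S \bowtie T)$. Below $(R \bowtie S)$, the next level chooses the method for $\bowtie_1$: nested-loop with R internal, nested-loop with R external, merge-scan, hash-join with hash on R, or hash-join with hash on S. The following level chooses the method for $\bowtie_2$: nested-loop with T internal (strategy 1), nested-loop with T external (strategy 2), merge-scan (strategy 3), hash-join with hash on T (strategy 4), or hash-join with hash on $(R \bowtie S)$ (strategy 5); the remaining branches are elided (.............).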
Query processing components
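Figure: the user's query is processed by the Analyzer, which produces an internal representation; the Optimizer, using the Catalog and the Statistic profiles, turns it into an execution plan; the Execution layer runs the plan against mass memory (data, indexes) and returns the result to the user.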
Centralized architecture (DBMS)
Figure: the same components (Catalog, Statistic profiles, Analyzer, Optimizer, Execution layer, and mass memory with data and indexes) all reside within a single DBMS, which receives the user's query and returns the result.
Distributed database with master-slave optimization
Figure: the user submits the query to site S1 (master), which holds the distributed catalogs and distributed profiles, runs the Analyzer and the Optimizer, and produces a distributed execution plan; the plan is carried out by the execution layers of the sites, including the slave sites S2 and S3, each on its own database.
Distributed optimization with negotiation
Figure: site S1 (distributed catalogs and profiles, Analyzer, Optimizer, Execution layer) receives the user's query; its Optimizer negotiates the distributed execution plan with the Optimizers of sites S2 and S3 (distributed negotiation); each site executes its part through its own Execution layer on its own database.
Distributed system with mediator and wrappers
Figure: the user's query goes to the Mediator (Analyzer, Optimizer, distributed execution plan, Execution layer), which accesses heterogeneous sources through wrappers, each running in its own execution environment: wrapper W1 (Web source), wrapper W2 (file system) and wrapper W3 (program); a database also appears among the accessed sources.
Overall view: components of a DBMS
Figure: components of a DBMS: query analyzer, optimizer, (transaction) access manager, file manager, concurrency manager with lock tables, reliability manager with log and dump, and buffer manager in main memory; the stored data comprise user data, system data, indexes and statistics.