INFO445 Advanced Database Design, Management, and Maintenance
Lecture 5: Advanced RDBMS
Professor Melody Y. Ivory-Ndiaye
Admin
Upcoming assignment(s)
Next Monday by noon
Lab Assignment #2
Exam #1 (wks 1-5)
Next Thursday
Last Time
How data and indexes are represented in an RDBMS
What is the system catalog?
Query processing is a complex process
What are the components? What are the steps?
Need to explore alternative ways to execute a query and choose the most efficient approach (query evaluation plan, or QEP)
How is a QEP determined? What is the benefit of understanding QEPs?
Outline
Interface
QEPs
Storing data
SQL Commands
Reproduced from Database Management Systems, by Ramakrishnan and Gehrke, pg 20
Activity I
Activity II
Steps in Typical QEP Optimization
1. Deconstruct conjunctive (i.e., AND) selections into a sequence of single selection operations
2. Move selection operations down the query tree for the earliest possible execution
3. Execute first those selection and join operations that will produce the smallest relations
4. Replace Cartesian product (i.e., cross-product) operations that are followed by a selection condition by join operations
5. Deconstruct and move as far down the tree as possible lists of projection attributes, creating new projections where needed
6. Identify those subtrees whose operations can be pipelined, and execute them using pipelining
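As a rough illustration of heuristics 2 and 4, the Python sketch below compares a cross product followed by a selection against a plan that selects first and then joins. The relations `students` and `enrolled`, their fields, and their contents are hypothetical and invented for this example; the sketch only counts intermediate tuples and is not how a real optimizer is implemented.

```python
# Hypothetical toy relations (not from the lecture).
students = [{"sid": i, "major": "INFO" if i % 4 == 0 else "CSE"} for i in range(100)]
enrolled = [{"sid": i % 100, "course": f"C{i % 7}"} for i in range(300)]

def naive_plan(students, enrolled):
    """Cross product first, then apply the join condition and the selection."""
    product = [(s, e) for s in students for e in enrolled]
    result = [(s, e) for s, e in product
              if s["sid"] == e["sid"] and s["major"] == "INFO"]
    return result, len(product)  # len(product) = size of the intermediate relation

def pushed_down_plan(students, enrolled):
    """Apply the selection early, then join on sid with a hash lookup."""
    info_students = [s for s in students if s["major"] == "INFO"]
    by_sid = {s["sid"]: s for s in info_students}
    result = [(by_sid[e["sid"]], e) for e in enrolled if e["sid"] in by_sid]
    return result, len(info_students)  # size of the intermediate relation

naive_result, naive_intermediate = naive_plan(students, enrolled)
fast_result, fast_intermediate = pushed_down_plan(students, enrolled)

def as_pairs(result):
    """Normalize a result to sorted (sid, course) pairs so plans can be compared."""
    return sorted((s["sid"], e["course"]) for s, e in result)

print(naive_intermediate, fast_intermediate)
print(as_pairs(naive_result) == as_pairs(fast_result))
```

Both plans return the same answer, but the first builds every student-enrollment pair before discarding most of them, while the second only ever holds the selected students.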
Break
Storing Data
How is data stored in a computer system and how does it affect DB design?
How is physical storage structured and how does it affect DB design?
How does a DBMS manage space on disks?
Data Storage in Computer Systems
Memory hierarchy
How does DBMS use each level? Implications?
CPU
Primary storage
Secondary storage
Tertiary storage
???
Data Storage in Computer Systems
DBMS stores information on (hard) disks
Referred to as physical storage
This has major implications for DBMS design
READ: transfer data from disk to main memory (RAM)
WRITE: transfer data from RAM to disk
Both are high-cost operations relative to in-memory operations, so they must be planned carefully!
Physical Storage (Disks)
Support random access (vs. sequential access)
Data is stored and retrieved in units called disk blocks or pages
Unlike RAM, time to retrieve a disk page varies depending upon its location on disk
Relative placement of pages on disk has major impact on DBMS performance!
Structure of Disks
The platters spin (say, 90 rps)
The arm assembly is moved in or out to position a head on a desired track
Tracks under heads make a cylinder
Only one head reads/writes at any one time
Disk block or page size is a multiple of sector size (which is fixed)
Access time = seek time + rotational delay + transfer time
[Figure: structure of a disk, labeling the spindle, platters, tracks, sectors, disk head, arm assembly, and arm movement]
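The access-time formula above can be worked through numerically. The sketch below uses the 90 rps figure from the slide and takes the average rotational delay to be half a rotation; the seek time (9 ms), transfer time (1 ms per page), and page count are assumed values chosen only for illustration. It also assumes, as a simplification, that pages stored contiguously pay the seek and rotational delay once.

```python
def access_time_ms(seek_ms, rotations_per_sec, transfer_ms):
    """Access time = seek time + rotational delay + transfer time (all in ms)."""
    full_rotation_ms = 1000.0 / rotations_per_sec
    # Assumption: on average the desired sector is half a rotation away.
    rotational_delay_ms = full_rotation_ms / 2
    return seek_ms + rotational_delay_ms + transfer_ms

# Assumed values for illustration: 9 ms seek, 1 ms to transfer one page.
SEEK_MS, RPS, TRANSFER_MS = 9.0, 90, 1.0
n_pages = 100

one_access = access_time_ms(SEEK_MS, RPS, TRANSFER_MS)

# Pages scattered across the disk: every page pays the full access time.
random_total_ms = n_pages * one_access

# Pages stored contiguously (simplified): one seek and rotational delay,
# then only transfer time for each remaining page.
sequential_total_ms = one_access + (n_pages - 1) * TRANSFER_MS

print(f"one access:  {one_access:.2f} ms")
print(f"random:      {random_total_ms:.2f} ms")
print(f"sequential:  {sequential_total_ms:.2f} ms")
```

Under these assumed numbers, reading the same 100 pages is more than ten times slower when they are scattered, which is why the relative placement of pages matters so much.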
How does a DBMS take advantage of the structure of disks?
Disk Space Manager
Lowest layer of DBMS software manages space on disk
Higher levels call upon this layer to:
allocate/de-allocate a page
read/write a page
Request for a sequence of pages must be satisfied by allocating the pages sequentially on disk!
Keeps track of free disk space
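A minimal sketch of these responsibilities, assuming free space is tracked with a simple in-memory bitmap (one common choice; a linked list of free blocks is another). The class name `DiskSpaceManager` and its methods are hypothetical and are not taken from any particular DBMS.

```python
class DiskSpaceManager:
    """Toy disk space manager that tracks free pages with a bitmap (hypothetical)."""

    def __init__(self, num_pages):
        self.free = [True] * num_pages  # True means the page is unallocated

    def allocate(self, count=1):
        """Allocate `count` contiguous pages; return the id of the first one."""
        run_start, run_len = 0, 0
        for page_id, is_free in enumerate(self.free):
            if not is_free:
                run_len = 0          # the run of free pages is broken
                continue
            if run_len == 0:
                run_start = page_id  # a new run of free pages starts here
            run_len += 1
            if run_len == count:
                for p in range(run_start, run_start + count):
                    self.free[p] = False
                return run_start
        raise RuntimeError("no contiguous run of free pages is large enough")

    def deallocate(self, first_page, count=1):
        """Mark `count` pages starting at `first_page` as free again."""
        for p in range(first_page, first_page + count):
            self.free[p] = True

    def free_page_count(self):
        return sum(self.free)

dsm = DiskSpaceManager(num_pages=8)
a = dsm.allocate(3)      # pages 0-2
b = dsm.allocate(2)      # pages 3-4
dsm.deallocate(a, 3)     # pages 0-2 become free again
c = dsm.allocate(2)      # first fit: reuses pages 0-1
print(a, b, c, dsm.free_page_count())
```

Note that after these calls four pages are free (2, 5, 6, 7) but a request for four sequential pages would fail, since no run of four adjacent free pages exists.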
Buffer Manager
Minimize disk accesses
Data must be in RAM for DBMS to operate on it
Maintain table of <frame#, pageid> pairs
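[Figure: page requests from higher levels are served from the buffer pool in main memory, which holds disk pages and free frames; pages are transferred between the buffer pool and the DB on disk]
Choice of frame dictated by replacement policy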
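The buffer manager's behavior can be sketched as follows. This is a toy model under stated assumptions: the replacement policy is LRU (the slide does not name a policy), the "disk" is a plain dictionary, and the class `BufferManager` with its `pin`/`unpin` methods is hypothetical. For brevity the table is keyed by page id rather than by frame number.

```python
from collections import OrderedDict

class BufferManager:
    """Toy buffer pool with LRU replacement (LRU is an assumed policy)."""

    def __init__(self, num_frames, disk):
        self.num_frames = num_frames
        self.disk = disk             # dict: page_id -> data, stands in for the disk
        self.frames = OrderedDict()  # page_id -> frame; order tracks recency of use
        self.disk_reads = 0

    def pin(self, page_id):
        """Return the frame holding the page, reading from disk only on a miss."""
        if page_id in self.frames:
            self.frames.move_to_end(page_id)  # mark as most recently used
        else:
            if len(self.frames) >= self.num_frames:
                self._evict()
            self.frames[page_id] = {"data": self.disk[page_id],
                                    "pin_count": 0, "dirty": False}
            self.disk_reads += 1
        frame = self.frames[page_id]
        frame["pin_count"] += 1
        return frame

    def unpin(self, page_id, dirty=False):
        """Release the page; `dirty` says whether the caller modified it."""
        frame = self.frames[page_id]
        frame["pin_count"] -= 1
        frame["dirty"] = frame["dirty"] or dirty

    def _evict(self):
        # Least recently used page that nobody is currently using.
        victim = next((pid for pid, f in self.frames.items()
                       if f["pin_count"] == 0), None)
        if victim is None:
            raise RuntimeError("all frames are pinned")
        if self.frames[victim]["dirty"]:
            self.disk[victim] = self.frames[victim]["data"]  # write back first
        del self.frames[victim]

disk = {1: "A", 2: "B", 3: "C"}
bm = BufferManager(num_frames=2, disk=disk)
for page_id in (1, 2, 1, 3):   # the second request for page 1 is a hit
    bm.pin(page_id)
    bm.unpin(page_id)
print(list(bm.frames), bm.disk_reads)
```

Four page requests cost only three disk reads here, and page 2 is the one evicted because page 1 was used more recently.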
Data Storage in DBMS
DB fields are organized into records
Different record formats (fixed- or variable-length)
Pages store collections of records
Has slots for x records
Depends on record format
Maintains a directory (index) of slot usage (pointers to records)
Files store collections of pages
Maintains a directory (index) of pages (pointers to pages)
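One way to picture a page with a slot directory is the sketch below, which assumes variable-length records. The class `SlottedPage`, its 64-byte capacity, and its methods are hypothetical; the sketch ignores the space taken by the directory itself and does not compact the page after a delete.

```python
class SlottedPage:
    """Toy page of variable-length records with a slot directory (hypothetical)."""

    def __init__(self, capacity_bytes=64):
        self.capacity = capacity_bytes
        self.data = bytearray()  # record bytes, packed from the start of the page
        self.slots = []          # slot number -> (offset, length), or None if deleted

    def free_space(self):
        return self.capacity - len(self.data)

    def insert(self, record: bytes) -> int:
        """Store the record and return its slot number."""
        if len(record) > self.free_space():
            raise ValueError("record does not fit in this page")
        entry = (len(self.data), len(record))  # pointer to the record
        self.data += record
        for slot_no, existing in enumerate(self.slots):
            if existing is None:               # reuse a deleted slot
                self.slots[slot_no] = entry
                return slot_no
        self.slots.append(entry)
        return len(self.slots) - 1

    def read(self, slot_no: int) -> bytes:
        entry = self.slots[slot_no]
        if entry is None:
            raise KeyError(f"slot {slot_no} is empty")
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, slot_no: int) -> None:
        self.slots[slot_no] = None  # bytes are not reclaimed in this sketch

page = SlottedPage()
a = page.insert(b"alice")
b = page.insert(b"bob")
page.delete(a)
c = page.insert(b"carol")   # reuses the slot freed by the delete
print(a, b, c, page.read(b), page.read(c), page.free_space())
```

Because other structures refer to a record by its slot number rather than its byte offset, the page is free to move records around internally without invalidating those references.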
Indexes
Data structure to organize data records on disk (hash, tree)
Optimize retrieval operations
Retrieve records satisfying search conditions on search key fields
Different types of indexes
Primary: index on a set of fields that includes the primary key
Secondary: index on other sets of fields
Clustered: data records organized in the same order as the index
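To make the hash vs. tree distinction concrete, the sketch below builds two in-memory stand-ins: a hash index for equality searches and a sorted list of entries (playing the role of a tree index's leaf level) for range searches. The `records` data, field names, and helper functions are hypothetical, and a record's position in the list stands in for its record id.

```python
import bisect
from collections import defaultdict

# Hypothetical data file: list position plays the role of the record id (rid).
records = [
    {"sid": 53, "name": "Ana", "age": 19},
    {"sid": 31, "name": "Ben", "age": 22},
    {"sid": 47, "name": "Cy",  "age": 19},
    {"sid": 12, "name": "Dee", "age": 25},
]

def build_hash_index(records, field):
    """Map each search-key value to the rids of the matching records."""
    index = defaultdict(list)
    for rid, rec in enumerate(records):
        index[rec[field]].append(rid)
    return index

def build_sorted_index(records, field):
    """Sorted (key, rid) entries; a stand-in for the leaf level of a tree index."""
    return sorted((rec[field], rid) for rid, rec in enumerate(records))

def range_lookup(sorted_index, low, high):
    """Return the rids of records with low <= key <= high."""
    start = bisect.bisect_left(sorted_index, (low, -1))
    end = bisect.bisect_right(sorted_index, (high, float("inf")))
    return [rid for _, rid in sorted_index[start:end]]

age_index = build_hash_index(records, "age")    # good for equality: age = 19
sid_index = build_sorted_index(records, "sid")  # good for ranges: 30 <= sid <= 50

teens = age_index.get(19, [])
mid_sids = range_lookup(sid_index, 30, 50)
print(teens, mid_sids)
print([records[rid]["name"] for rid in mid_sids])
```

Neither lookup scans the whole file. Note that the records here are not stored in sid order, so `sid_index` behaves like an unclustered index: consecutive index entries point to scattered positions in the data.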
Activity III
In Summary
QEPs
Optimization heuristics? How can understanding them help us to write more efficient queries?
Storing data
How data is stored and its implications? What components play a role?