DBMS Chapter 9 - Exercise Question Answers
Exercise 9.1
The most important difference between a disk and a tape is that disks support direct
(random) access while tapes only support sequential access. This makes disks significantly
faster and more suitable for DBMS workloads where frequent and random access to data is
needed.
Exercise 9.2
Seek time: The time taken by the disk arm to move the heads to the track where the data is
stored.
Rotational delay: The time it takes for the desired sector of the disk to rotate under the
read-write head.
Transfer time: The time taken to actually transfer data once the read-write head is
positioned correctly.
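Putting the three components together, the average time to access one block is seek time plus average rotational delay plus transfer time. A minimal sketch of that sum (the 10 ms seek, 5400 rpm, and track/block sizes below are illustrative assumptions, not values fixed by the definitions above):

```python
# Average time to access one block: seek + rotational delay + transfer.
# All the sample numbers here are assumed for illustration.
def access_time_ms(seek_ms, rpm, block_bytes, track_bytes):
    rotation_ms = 60_000 / rpm               # time for one full rotation
    avg_rotational_delay = rotation_ms / 2   # on average, wait half a rotation
    transfer_ms = rotation_ms * block_bytes / track_bytes
    return seek_ms + avg_rotational_delay + transfer_ms

# e.g. 10 ms seek, 5400 rpm, 1024-byte block on a 25,600-byte track
print(round(access_time_ms(10, 5400, 1024, 25_600), 2))  # 16.0 ms
```

Note that seek and rotational delay dominate: the transfer itself is under half a millisecond here.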
Exercise 9.3
Besides faster access, the key difference is the nature of volatility. Main memory is volatile
and loses data on power loss. In contrast, disk maintains persistence, retaining data even
after a power failure, which is essential for data durability in DBMS.
Exercise 9.4
To store a frequently scanned sequential file, pages should be stored in contiguous disk
blocks, ideally within the same cylinder or adjacent cylinders to minimize seek time.
Placing these blocks on the outer tracks can also increase transfer rates, since outer tracks
typically hold more sectors and therefore deliver more data per revolution.
Exercise 9.5
1. Capacity of a track = 512 bytes/sector * 50 sectors = 25,600 bytes
Capacity per surface = 25,600 bytes * 2000 tracks = 51,200,000 bytes (51.2 MB)
Total capacity = 51.2 MB * 10 surfaces = 512 MB
2. Number of cylinders = 2000 (equal to number of tracks per surface)
3. Valid block sizes must be a multiple of 512 bytes. So, 256 is invalid, 2048 and 51200 are
valid.
4. Max rotational delay = 1 rotation = 60 sec / 5400 rpm = 11.11 ms
5. Transfer rate = 25,600 bytes per 11.11 ms ≈ 2.3 MB/sec
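A short script to double-check these numbers (the parameters are those given in the exercise):

```python
# Sanity check of the Exercise 9.5 arithmetic: 512-byte sectors,
# 50 sectors/track, 2000 tracks/surface, 10 surfaces, 5400 rpm.
sector, sectors_per_track, tracks, surfaces, rpm = 512, 50, 2000, 10, 5400

track_bytes = sector * sectors_per_track            # 25,600 bytes per track
surface_bytes = track_bytes * tracks                # 51,200,000 bytes per surface
disk_bytes = surface_bytes * surfaces               # 512,000,000 bytes (512 MB)

rotation_ms = 60_000 / rpm                          # 11.11 ms: max rotational delay
transfer_rate = track_bytes / (rotation_ms / 1000)  # one track passes per rotation

print(disk_bytes, round(rotation_ms, 2), round(transfer_rate / 1e6, 1))
# 512000000 11.11 2.3
```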
Exercise 9.6
1. 1024 / 100 = 10 records per block (24 bytes per block are unused)
2. 100,000 / 10 = 10,000 blocks. Each 25,600-byte track holds 25,600 / 1024 = 25 blocks, so
one surface holds 2000 tracks * 25 blocks = 50,000 blocks; 1 surface is sufficient.
3. The disk has 10 surfaces * 50,000 blocks = 500,000 blocks, each holding 10 records, so at
most 500,000 * 10 = 5,000,000 records fit.
4. Block 1 of track 1 on the next surface would hold page 2 if the heads do not operate in
parallel. If they do, the corresponding blocks on all surfaces can be read simultaneously.
5. Sequential read: 10,000 blocks span 10,000 / 25 = 400 tracks, each read in one rotation of
11.11 ms, so ≈ 400 * 11.11 ms ≈ 4.4 sec (ignoring track-to-track seeks). With all 10 heads
reading in parallel, ≈ 0.44 sec.
6. Random read = 10,000 * (avg seek + avg rotational delay + transfer) ≈ 10,000 * (10 + 5.56
+ 0.44) ms ≈ 160 seconds, assuming a 10 ms average seek time.
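These figures can be re-derived mechanically. A quick sketch, assuming the Exercise 9.5 disk geometry and 1024-byte blocks (note that each 25,600-byte track then holds 25 blocks):

```python
# Exercise 9.6 arithmetic, assuming the disk from 9.5 and 1024-byte blocks.
block, record = 1024, 100
records_per_block = block // record              # 10 (24 bytes wasted per block)
blocks_needed = 100_000 // records_per_block     # 10,000 blocks for 100,000 records

blocks_per_track = 25_600 // block               # 25
blocks_per_surface = 2000 * blocks_per_track     # 50,000 -> one surface suffices

total_blocks = 10 * blocks_per_surface           # 500,000 on the whole disk
max_records = total_blocks * records_per_block   # 5,000,000

print(records_per_block, blocks_needed, blocks_per_surface, max_records)
# 10 10000 50000 5000000
```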
Exercise 9.7
If a page is in the buffer but not pinned (pin_count = 0), its frame can be chosen for
replacement. If pinned, the page is actively in use and must not be replaced or have its
frame reused.
Exercise 9.8
A buffer manager writes a page to disk when:
- The page is dirty and chosen for replacement.
- During checkpoints.
- When explicitly forced by a transaction commit or WAL requirement.
Exercise 9.9
A page is 'pinned' while a transaction or other DBMS component is using it, and unpinned
when the operation completes. The code requesting the page is responsible for pinning and
unpinning it through the buffer manager's interface.
Exercise 9.10
When a page is modified, it's marked dirty. The buffer manager ensures it is flushed to disk
(write-back) when required — typically during replacement or commit. WAL ensures
changes are logged before they’re flushed.
Exercise 9.11
If all pages in the buffer are dirty, one must be written back to disk before replacement. This
can introduce delays unless writes are proactively managed.
Exercise 9.12
Sequential flooding occurs when a large sequential scan displaces all pages in the buffer
pool, pushing out other useful pages. LRU replacement policies are especially susceptible to
this issue.
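The effect is easy to reproduce with a toy LRU pool (an illustrative sketch, not a real buffer manager): scanning a file even one page larger than the pool, twice, yields zero cache hits.

```python
from collections import OrderedDict

# Toy LRU buffer pool to illustrate sequential flooding: a scan larger than
# the pool evicts every page, so even an immediately repeated scan hits nothing.
class LRUPool:
    def __init__(self, frames):
        self.frames, self.pool, self.hits, self.misses = frames, OrderedDict(), 0, 0

    def access(self, page):
        if page in self.pool:
            self.hits += 1
            self.pool.move_to_end(page)       # mark as most recently used
        else:
            self.misses += 1
            if len(self.pool) >= self.frames:
                self.pool.popitem(last=False)  # evict least recently used
            self.pool[page] = True

pool = LRUPool(frames=4)
for _ in range(2):                 # scan a 5-page file twice
    for page in range(5):
        pool.access(page)
print(pool.hits, pool.misses)      # 0 10 -- every access misses under LRU
```

An MRU policy would keep 4 of the 5 pages resident and hit on them in the second pass, which is why MRU is often suggested for repeated sequential scans.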
Exercise 9.13
DBMS buffer managers support pinning, prefetching, forced writes, and management
policies tailored to query patterns — which typical OS buffer managers don’t provide.
Exercise 9.14
Prefetching is the act of loading pages before they are explicitly requested, based on
anticipated access patterns. It helps reduce wait times and improves sequential access
efficiency.
Exercise 9.15
1. Disk-controlled prefetching may not align with DBMS’s knowledge of query plans.
2. Multiple queries with different access patterns may evict each other’s pages.
3. DBMS-managed prefetching is context-aware and smarter.
4. Segmented disk caches help but don’t eliminate the need for DBMS prefetching.
Exercise 9.16
Two record formats:
1. Fixed-length: Simple and fast but wastes space with variable-length fields.
2. Variable-length with offsets: Space-efficient and flexible but requires more complex
access logic.
Exercise 9.17
Two page formats:
1. Slotted pages: Allow record movement, support variable-length records.
2. Fixed layout: Simple and fast but inflexible to changes.
Trade-off: Slotted pages are more flexible, fixed pages are faster.
Exercise 9.18
1. Fixed-size directory is simple but wastes space and limits capacity.
2. Sorting can be achieved by rearranging slot pointers rather than records, preserving
record IDs.
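The slot-directory indirection described above can be sketched as follows (names and layout are illustrative): record ids refer to slot numbers, so compacting or rearranging the record bytes only updates offsets, never ids.

```python
# Minimal slotted-page sketch: the slot directory maps slot numbers (the
# record id within the page) to byte offsets, so records can be moved
# without changing their ids. Illustrative only, not a real page format.
class SlottedPage:
    def __init__(self):
        self.data = bytearray()
        self.slots = []                    # slot i -> (offset, length)

    def insert(self, record: bytes) -> int:
        self.slots.append((len(self.data), len(record)))
        self.data += record
        return len(self.slots) - 1         # record id = slot number

    def read(self, rid: int) -> bytes:
        off, length = self.slots[rid]
        return bytes(self.data[off:off + length])

    def compact(self):
        # Rewrite records contiguously; only offsets change, never rids.
        packed = bytearray()
        for i, (off, length) in enumerate(self.slots):
            self.slots[i] = (len(packed), length)
            packed += self.data[off:off + length]
        self.data = packed

page = SlottedPage()
rid = page.insert(b"alice")
page.insert(b"bob")
page.compact()
print(page.read(rid))                      # b'alice' -- rid survives compaction
```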
Exercise 9.19
1. List-based: Simple but hard to search; Directory-based: Easier navigation, scales better.
Directory preferred for variable-length.
2. Use slotted pages that can link to either full or free-space lists.
Exercise 9.20
1. Keeping a list of all pages plus a list of free-space pages gives fast traversal of the whole
file but stores some pages on two lists; keeping separate lists of full pages and free-space
pages avoids that duplication and is more memory-efficient.
2. Use a slotted page layout with space tracking metadata.
Exercise 9.21
1. Place on inner track to reduce contention.
2. Outer track to benefit from higher sequential throughput.
3. Spread across mid-to-outer tracks for balanced access.
4. Outer tracks for best performance.
Exercise 9.22
Pin count tracks how many processes are using a page. A simple flag would not handle
multiple concurrent accesses correctly, whereas a count allows accurate tracking.
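This is a sketch of the per-frame bookkeeping (field and method names are illustrative): a count survives interleaved pin/unpin calls from several users, whereas a boolean would be cleared by the first unpin even while other transactions still hold the page.

```python
# Per-frame bookkeeping sketch: pin_count counts concurrent users, so a
# frame becomes eligible for replacement only when it drops back to zero.
class Frame:
    def __init__(self, page_id):
        self.page_id, self.pin_count, self.dirty = page_id, 0, False

    def pin(self):
        self.pin_count += 1

    def unpin(self, dirtied=False):
        assert self.pin_count > 0, "unpin without matching pin"
        self.pin_count -= 1
        self.dirty = self.dirty or dirtied

    def replaceable(self):
        return self.pin_count == 0

f = Frame(page_id=7)
f.pin(); f.pin()         # two transactions use the page concurrently
f.unpin()                # first finishes -- the page must stay
print(f.replaceable())   # False: the count tracks the remaining user
f.unpin(dirtied=True)
print(f.replaceable(), f.dirty)  # True True: replaceable, but must be flushed
```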
DBMS Chapter 9 Review Questions - Detailed Answers
1. Explain the term memory hierarchy. What are the differences between primary,
secondary, and tertiary storage? Give examples of each. Which of these is volatile, and
which are persistent? Why is persistent storage more important for a DBMS than, say, a
program that generates prime numbers?
Memory hierarchy refers to the structured organization of different types of storage,
ordered by speed, cost, and size. It includes:
- Primary Storage: Fast and directly accessible by the CPU (e.g., RAM). It is volatile.
- Secondary Storage: Slower, persistent storage (e.g., HDD, SSD).
- Tertiary Storage: Used for archiving (e.g., magnetic tapes, optical disks). Slowest and
cheapest.
DBMSs require persistent storage because they need to retain data across sessions and
power failures, unlike a prime number generator which can regenerate results anytime.
2. Why are disks used so widely in a DBMS? What are their advantages over main
memory and tapes? What are their relative disadvantages?
Disks are preferred in DBMSs for their large storage capacity, persistence, and relatively low
cost. Compared to RAM, disks are cheaper and non-volatile. Compared to tapes, disks offer
random access, which is essential for performance. However, disks are slower than RAM
and more fragile than tapes for long-term storage.
3. What is a disk block or page? How are blocks arranged in a disk? How does this affect
the time to access a block? Discuss seek time, rotational delay, and transfer time.
A disk block/page is the minimum unit of data transfer between disk and memory. Blocks
are arranged in concentric tracks on platters, grouped into cylinders across surfaces. Access
time is impacted by:
- Seek Time: Time to move the head to the track.
- Rotational Delay: Time waiting for the block to rotate under the head.
- Transfer Time: Time to read/write the block data.
4. Explain how careful placement of pages on the disk to exploit the geometry of a disk
can minimize the seek time and rotational delay when pages are read sequentially.
Sequentially storing pages on adjacent tracks or cylinders reduces seek time. Placing related
pages within the same cylinder or adjacent cylinders and aligning their positions across
surfaces reduces rotational delay and head movement, enhancing sequential read
performance.
5. Explain what a RAID system is and how it improves performance and reliability.
Discuss striping and its impact on performance and redundancy and its impact on
reliability. What are the trade-offs between reliability and performance in the different
RAID organizations called RAID levels?
RAID (Redundant Array of Independent Disks) is a storage system combining multiple disks
for redundancy and/or performance. Striping distributes data across disks for faster access.
Redundancy (mirroring/parity) ensures data availability upon failure. Trade-offs:
- RAID 0: High performance, no redundancy.
- RAID 1: Mirroring, high reliability, low write performance.
- RAID 5/6: Balanced performance and fault tolerance using parity bits.
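The parity mechanism behind RAID 5 is just a bytewise XOR: the parity block is the XOR of the data blocks in a stripe, so any single lost block equals the XOR of all the surviving ones. A small sketch:

```python
from functools import reduce

# RAID 5 parity recovery in miniature: parity = XOR of the data blocks,
# so a single missing block is the XOR of all remaining blocks + parity.
def parity(blocks):
    return bytes(reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]       # stripe units on three data disks
p = parity(data)                          # stored on the parity disk

lost = 1                                  # the disk holding b"BBBB" fails
survivors = [blk for i, blk in enumerate(data) if i != lost] + [p]
print(parity(survivors))                  # b'BBBB' -- reconstructed
```

This is also why RAID 5 writes are costly: updating one block requires reading and rewriting the stripe's parity as well.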
6. What is the role of the DBMS disk space manager? Why do database systems not rely
on the operating system instead?
The disk space manager manages data allocation and layout on disk for efficiency. DBMSs
bypass OS to optimize disk usage patterns, control page layout, prefetching, caching, and to
support features like recovery, concurrency control, and logging.
7. Why does every page request in a DBMS go through the buffer manager? What is the
buffer pool? What is the difference between a frame in a buffer pool, a page in a file, and
a block on a disk?
The buffer manager caches disk pages in main memory to minimize disk I/O. The buffer
pool is a memory area where pages are temporarily stored. A frame is a slot in the buffer
pool, a page is a logical DB unit, and a block is the physical unit on disk.
8. What information does the buffer manager maintain for each page in the buffer pool?
What information is maintained for each frame? What is the significance of pin_count
and the dirty flag for a page? Under what conditions can a page in the pool be replaced?
Under what conditions must a replaced page be written back to disk?
Each frame has metadata: page ID, pin_count (active users), dirty flag (if modified). A page
with pin_count=0 can be replaced. If dirty, it must be flushed to disk before replacement to
ensure durability.
9. Why does the buffer manager have to replace pages in the buffer pool? How is a page
chosen for replacement? What is sequential flooding, and what replacement policy
causes it?
When the buffer pool is full, pages are replaced. Replacement is based on policies like LRU.
Sequential flooding occurs when sequential reads evict all other useful pages. LRU causes
this by evicting least recently used pages that might still be needed.
10. A DBMS buffer manager can often predict the access pattern for disk pages. How
does it utilize this ability to minimize I/O costs? Discuss prefetching. What is forcing,
and why is it required to support the write-ahead log protocol in a DBMS?
Prefetching loads pages in advance based on predicted access patterns, overlapping I/O
with computation and reducing latency. Forcing means writing a page to disk immediately,
regardless of the replacement policy. The write-ahead log (WAL) protocol requires that the
log records describing a change be forced to disk before the corresponding data page is
written, and that a transaction's log records be forced at commit time, so that recovery can
redo or undo changes after a crash.
11. Why is the abstraction of a file of records important? How is the software in a DBMS
layered to take advantage of this?
It abstracts low-level storage, enabling modularity. DBMS layers: File manager (manages
pages), buffer manager (in-memory pages), record manager (manages records in files). This
separation enhances flexibility and maintainability.
12. What is a heap file? How are pages organized in a heap file? Discuss list versus
directory organizations.
A heap file stores unordered records. Pages can be linked via a list or indexed in a directory:
- List: Simpler but slower to search.
- Directory: Faster search/update by storing page metadata.
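A toy directory organization might look like the following sketch (the entry layout and page size are illustrative assumptions): each directory entry records a page id and its free space, so finding room for a new record scans only the small directory rather than every data page.

```python
# Toy heap-file directory: entries track free bytes per page, so inserts
# scan the directory, not the data pages. Illustrative sketch only.
PAGE_SIZE = 1024

class HeapFile:
    def __init__(self):
        self.directory = []               # list of [page_id, free_bytes]

    def page_with_space(self, need):
        for entry in self.directory:
            if entry[1] >= need:
                return entry[0]
        self.directory.append([len(self.directory), PAGE_SIZE])
        return self.directory[-1][0]

    def insert(self, record_len):
        pid = self.page_with_space(record_len)
        self.directory[pid][1] -= record_len
        return pid                        # page chosen for the record

hf = HeapFile()
print(hf.insert(600), hf.insert(600), hf.insert(300))  # 0 1 0
```

The third insert lands back on page 0 because the directory shows it still has 424 free bytes; a list organization would have to walk the data pages themselves to discover this.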
13. Describe how records are arranged on a page. What is a slot, and how are slots used
to identify records? How do slots enable us to move records on a page without altering
the record's identifier? What are the differences in page organizations for fixed-length
and variable-length records?
Records are stored in slots, which map record ID to its location. Slot table enables record
movement without changing IDs. Fixed-length records are simple to manage. Variable-
length records need extra metadata (offsets) and complicate free space management.
14. What are the differences in how fields are arranged within fixed-length and variable-
length records? For variable-length records, explain how the array of offsets
organization provides direct access to a specific field and supports null values.
Fixed-length fields are stored consecutively. Variable-length fields use an offset array to
store field start positions, enabling direct access and null marking (e.g., by offset = -1). This
organization supports fast reads and flexible data structures.
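The array-of-offsets idea can be sketched as follows (the header encoding, the -1 null sentinel, and the helper names are illustrative choices, not a standard format): a header of field-end offsets gives direct access to field i without scanning earlier fields, and a sentinel offset marks a null.

```python
import struct

# Array-of-offsets record layout sketch: a count, then one signed 32-bit
# end-offset per field (-1 = null), then the concatenated field bytes.
def pack(fields):
    body, offsets, end = b"", [], 0
    for f in fields:
        if f is None:
            offsets.append(-1)            # null field: no bytes stored
        else:
            body += f
            end += len(f)
            offsets.append(end)
    return struct.pack(f"<H{len(fields)}i", len(fields), *offsets) + body

def field(record, i):
    n, = struct.unpack_from("<H", record)
    offsets = struct.unpack_from(f"<{n}i", record, 2)
    if offsets[i] == -1:
        return None
    # start = end offset of the nearest preceding non-null field
    start = next((o for o in reversed(offsets[:i]) if o != -1), 0)
    return record[2 + 4 * n + start:2 + 4 * n + offsets[i]]

rec = pack([b"bob", None, b"cs"])
print(field(rec, 0), field(rec, 1), field(rec, 2))  # b'bob' None b'cs'
```

Note that the null field costs only its 4-byte offset entry, and every field is reachable in constant time from the header.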