Data Storage and Access Methods: Min Song IS698

The document discusses physical database design and access methods. It describes how physical records are stored on disk using blocks and how different access methods like sequential, indexed sequential, random, and hashed access work. It also covers key physical design decisions around storage format, data arrangement, indexes, and query optimization that impact performance.

Uploaded by

Mohammad Hamayun Shams


Data Storage and Access Methods

Min Song IS698

Database Design Process

[Diagram: conceptual requirements for Applications 1 to 4, the External Model for each application, the Conceptual Model, the Logical Model, the Internal Model, and Physical Design.]

Physical Database Design

Many physical database design decisions are implicit in the technology adopted. Organizations may also have standards or an information architecture that specifies operating systems, DBMS, and data access languages, which constrains the range of possible physical implementations. We will be concerned with some of the possible physical implementation issues.

Physical Database Design

The primary goal of physical database design is data processing efficiency.
We will concentrate on the choices often available to optimize the performance of database services.
Physical database design requires information gathered during earlier stages of the design process.

Physical Design Information

Information needed for physical file and database design includes:
Normalized relations, plus size estimates for them
Definitions of each attribute
Descriptions of where and when data are used (entered, retrieved, deleted, updated) and how often
Expectations and requirements for response time, data security, backup, recovery, retention, and integrity
Descriptions of the technologies used to implement the database

Physical Design Decisions

There are several critical decisions that will affect the integrity and performance of the system:
Storage format
Physical record composition
Data arrangement
Indexes
Query optimization and performance tuning

Storage Format
Choosing the storage format of each field (attribute).
The DBMS provides a set of data types that can be used for the physical storage of fields in the database.
The data type (format) is chosen to minimize storage space and maximize data integrity.

Objectives of data type selection

Minimize storage space
Represent all possible values
Improve data integrity
Support all data manipulations

The correct data type should, in minimal space, represent every possible value (but eliminate illegal values) for the associated attribute and support the required data manipulations (e.g. numerical or string operations).

Access Data Types

Numeric (1, 2, 4, or 8 bytes; fixed or float)
Text (255 max)
Memo (64000 max)
Date/Time (8 bytes)
Currency (8 bytes; 15 digits + 4 decimal digits)
Autonumber (4 bytes)
Yes/No (1 bit)
OLE (limited only by disk space)
Hyperlinks (up to 64000 chars)

Access Numeric types

Byte: stores numbers from 0 to 255 (no fractions). 1 byte.
Integer: stores numbers from -32,768 to 32,767 (no fractions). 2 bytes.
Long Integer (default): stores numbers from -2,147,483,648 to 2,147,483,647 (no fractions). 4 bytes.
Single: stores numbers from -3.402823E38 to -1.401298E-45 for negative values and from 1.401298E-45 to 3.402823E38 for positive values. 4 bytes.
Double: stores numbers from -1.79769313486231E308 to -4.94065645841247E-324 for negative values and from 4.94065645841247E-324 to 1.79769313486231E308 for positive values. Decimal precision 15. 8 bytes.
Replication ID: globally unique identifier (GUID). Decimal precision N/A. 16 bytes.

Designing Physical Records

A physical record is a group of fields stored in adjacent memory locations and retrieved together as a unit.
Fields may be fixed-length or variable-length.

Data Storage
Storing data: disks
Buffer manager
Representing relational data on a disk

The Memory Hierarchy

Processor cache: access time 10 nanoseconds; 512K.
Main memory (= disk cache): volatile; 256M-1G; access time 10-100 nanoseconds.
Disk: persistent; 10-100 GB storage; transfer rate 5-10 MB/s; access time 10-15 msecs.
Tape: 1.5 MB/s transfer rate; 280 GB typical capacity; only sequential access; not for operational data.

Main Memory
Fastest, most expensive (excluding cache)
Today: 512 MB is common even on PCs
Many databases could fit in memory
New industry trend: main memory databases, e.g. TimesTen

Main issue is volatility

Secondary Storage
Disks
Slower, cheaper than main memory
Persistent
The unit of disk I/O = block
Typically 1 block = 4k
A disk block is also called a disk page or simply a page

Used with a main memory buffer

Block
The blocking factor (bfr) for a file is the average number of records stored in a disk block.
Suppose the block size of a database system is 2000 bytes, the Customer table has an average record length of 190 bytes, and each block carries 100 bytes of overhead.
What is the blocking factor?
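One way to work the question is sketched below. It assumes that records are not split across blocks and that the 100 bytes of overhead are charged once per block; the function name `blocking_factor` is made up for this illustration.

```python
def blocking_factor(block_size, block_overhead, record_length):
    """Whole records that fit in one block after subtracting the block overhead."""
    usable_bytes = block_size - block_overhead
    return usable_bytes // record_length

# Numbers from the slide: 2000-byte block, 100 bytes overhead, 190-byte records.
bfr = blocking_factor(2000, 100, 190)
print(bfr)  # 10, because (2000 - 100) / 190 = 10 exactly
```

Under these assumptions the 1900 usable bytes hold exactly 10 Customer records per block.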

The Mechanics of Disk

Mechanical characteristics:
Rotation speed (5400 RPM)
Disk head
Number of platters (1-30)
Number of tracks (<= 10000)
Number of sectors (256/track)
Number of bytes per sector (2^9 = 512)
Block size (2^12 = 4096)

[Diagram labels: spindle, platters, tracks, sector, cylinder, arm assembly, arm movement.]

Important Disk Access Characteristics

Block access time = disk latency + transfer time
Disk latency = seek time + rotational latency
Seek time = time for the head to reach the right track: 10 ms to 40 ms
Rotational latency = rotation time to get to the right sector
Time for one rotation = 10 ms
Average rotational latency = 10 ms / 2 = 5 ms
Transfer rate = typically 5-10 MB/s
Disks read/write one block at a time (typically 4 kB)
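The formulas above can be combined into a rough estimate for reading one block. This is only a sketch: it takes the low end of the seek range (10 ms), a 10 ms rotation, a 4096-byte block, and a 5 MB/s transfer rate with 1 MB assumed to be 1,000,000 bytes. The function name is hypothetical.

```python
def block_access_time_ms(seek_ms, rotation_ms, block_bytes, transfer_mb_per_s):
    """Block access time = seek time + average rotational latency + transfer time."""
    rotational_latency_ms = rotation_ms / 2  # on average, half a rotation
    transfer_ms = block_bytes / (transfer_mb_per_s * 1_000_000) * 1000
    return seek_ms + rotational_latency_ms + transfer_ms

t = block_access_time_ms(seek_ms=10, rotation_ms=10, block_bytes=4096, transfer_mb_per_s=5)
print(f"{t:.2f} ms")  # about 15.82 ms
```

With these numbers the transfer itself takes under 1 ms; seek time and rotational latency dominate the cost of a block access.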

Representing Data Elements

Relational database elements:
CREATE TABLE Product (
    pid INT PRIMARY KEY,
    name CHAR(20),
    description VARCHAR(200),
    maker CHAR(10) REFERENCES Company(name)
)

A tuple is represented as a record

Record Formats: Fixed Length

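Record layout: fields F1, F2, F3, F4 stored one after another; L1 and L2 are the lengths of F1 and F2

Base address (B)

Address of F3 = B + L1 + L2

Information about field types is the same for all records in a file and is stored in the system catalogs.
Finding the ith field does not require a scan of the record: its address is B plus the lengths of the preceding fields.
Note the importance of schema information!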
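The address calculation for fixed-length records can be sketched as follows. The field lengths in `FIELD_LENGTHS` and the base address 1000 are invented for the example; in a real system the lengths would come from the schema in the system catalog.

```python
# Hypothetical lengths L1..L4 in bytes for fields F1..F4.
FIELD_LENGTHS = [4, 20, 10, 8]

def field_address(base, i):
    """Address of the i-th field (1-based): base plus the lengths of fields 1..i-1."""
    return base + sum(FIELD_LENGTHS[: i - 1])

# Address of F3 = B + L1 + L2
print(field_address(1000, 3))  # 1024
```

Because every record in the file shares the same lengths, the offset of each field can be computed once and reused for all records.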

Record Header
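Record layout: header | F1 | F2 | F3 | F4, with field lengths L1 to L4
The header holds a pointer to the schema, the record length, and a timestamp.

Need the header because:
The schema may change; for a while, new and old records may coexist
Records from different relations may coexist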

Variable Length Records

Record layout: header | F1 | F2 | F3 | F4, with field lengths L1 to L4
The header holds the record length and other header information.

Place the fixed-length fields first: F1, F2
Then the variable-length fields: F3, F4
Null values take only 2 bytes
Sometimes they take 0 bytes (when at the end)
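A minimal sketch of this layout, using Python's `struct` module, is shown below. It is an illustration under assumptions, not a real DBMS format: the header is just a 2-byte record length, the fixed-length fields are `pid` (4 bytes) and `name` (20 bytes) from the Product table, and a single variable-length field, `description`, comes last so that its length follows from the record length.

```python
import struct

HEADER = "<H"      # 2-byte record length (assumed header format)
FIXED = "<i20s"    # pid INT (4 bytes) + name CHAR(20)
HEADER_SIZE = struct.calcsize(HEADER)  # 2
FIXED_SIZE = struct.calcsize(FIXED)    # 24

def pack_record(pid, name, description):
    """Header, then fixed-length fields, then the variable-length field."""
    fixed = struct.pack(FIXED, pid, name.encode("ascii"))  # name is null-padded to 20 bytes
    var = description.encode("utf-8")
    total = HEADER_SIZE + len(fixed) + len(var)
    return struct.pack(HEADER, total) + fixed + var

def unpack_record(buf):
    (total,) = struct.unpack_from(HEADER, buf, 0)
    pid, name = struct.unpack_from(FIXED, buf, HEADER_SIZE)
    description = buf[HEADER_SIZE + FIXED_SIZE : total].decode("utf-8")
    return pid, name.rstrip(b"\x00").decode("ascii"), description

rec = pack_record(7, "gizmo", "a small widget")
print(len(rec))            # 40 = 2 + 24 + 14
print(unpack_record(rec))  # (7, 'gizmo', 'a small widget')
```

In this sketch an empty trailing description occupies 0 bytes, which is one way to read the remark that a null at the end of a record can take no space.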

Records With Referencing Fields

Record layout: header | F1 | F2 | F3, with field lengths L1 to L3
The header holds the record length and other header information.

E.g. to represent one-many or many-many relationships

Storing Records in Blocks

Blocks have a fixed size (typically 4k)
[Diagram: a block holding records R1, R2, R3, R4.]

Spanning Records Across Blocks

[Diagram: a record spanning two blocks, each with its own block header.]

Used when records are very large
Or even for medium-size records: saves space in blocks

BLOB
Binary large objects
Supported by modern database systems
E.g. images, sounds, etc.
Storage: attempt to cluster blocks together

Modifications: Insertion
File is unsorted:
Add it to the end

File is sorted:
Is there space in the right block?
Yes: we are lucky, store it there

Is there space in a neighboring block?
Look 1-2 blocks to the left/right, shift records

If all else fails, create an overflow block
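The insertion steps for a sorted file can be sketched as below. This is a simplified illustration, not how any particular DBMS does it: blocks are Python lists of keys, `CAPACITY` is a made-up block capacity, only one neighbor on each side is checked (the slide suggests 1-2), and overflow blocks are modeled as a dictionary keyed by block index.

```python
import bisect

CAPACITY = 3  # hypothetical number of records per block

def insert_sorted(blocks, overflow, key):
    """Insert key into a sorted file; returns which case applied."""
    # The "right" block is the first one whose largest key is >= key, else the last block.
    i = next((j for j, b in enumerate(blocks) if b and key <= b[-1]), len(blocks) - 1)
    if len(blocks[i]) < CAPACITY:
        bisect.insort(blocks[i], key)
        return "in place"
    if i + 1 < len(blocks) and len(blocks[i + 1]) < CAPACITY:
        bisect.insort(blocks[i], key)
        blocks[i + 1].insert(0, blocks[i].pop())      # shift largest record right
        return "shifted right"
    if i > 0 and len(blocks[i - 1]) < CAPACITY:
        bisect.insort(blocks[i], key)
        blocks[i - 1].append(blocks[i].pop(0))        # shift smallest record left
        return "shifted left"
    overflow.setdefault(i, []).append(key)            # all else failed
    return "overflow"

blocks = [[10, 20, 30], [40, 50], [60, 70, 80]]
overflow = {}
results = [insert_sorted(blocks, overflow, k) for k in (25, 45, 90)]
print(results)   # ['shifted right', 'overflow', 'overflow']
print(blocks)    # [[10, 20, 25], [30, 40, 50], [60, 70, 80]]
print(overflow)  # {1: [45], 2: [90]}
```

Once every block is full, each further insert lands in an overflow block, which is why such files eventually need reorganizing.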

Overflow Blocks
[Diagram: consecutive blocks n-1, n, and n+1, with an attached overflow block.]

After a while the file starts being dominated by overflow blocks: time to reorganize

Modifications: Deletions
Free space in the block, shift records
It may be possible to eliminate an overflow block

Modifications: Updates
If the new record is shorter than the previous one: easy
If it is longer: need to shift records or create overflow blocks

Physical Addresses
Each block and each record has a physical address that consists of:
The host
The disk
The cylinder number
The track number
The block within the track
For records: an offset in the block (sometimes this is in the block's header)

Logical Addresses
Logical address: a string of bytes (10-16)
More flexible: can move blocks/records around
But need a translation table:

Logical address | Physical address
L1 | P1
L2 | P2
L3 | P3
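The translation table can be pictured as a simple mapping. The sketch below uses the placeholder addresses from the table (L1 to L3, P1 to P3); `P9` and the function names are invented for the example.

```python
# Translation table: logical address -> physical address
translation = {"L1": "P1", "L2": "P2", "L3": "P3"}

def resolve(logical):
    """Look up where a block or record currently lives."""
    return translation[logical]

def move(logical, new_physical):
    """Relocate a block or record: only the table entry changes."""
    translation[logical] = new_physical

before = resolve("L2")
move("L2", "P9")
after = resolve("L2")
print(before, after)  # P2 P9
```

Anything that refers to `L2` keeps working after the move, which is the flexibility logical addresses buy at the cost of one extra lookup.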

Main Memory Address

When a block is read into main memory, it receives a main memory address
The buffer manager has another translation table:

Memory address | Logical address
M1 | L1
M2 | L2
M3 | L3

Designing Physical/Internal Model
Overview
Terminology
Access methods

Physical Design
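Internal Model/Physical Model
[Diagram: a user request reaches the DBMS through Interface 1; the DBMS, holding the external model and the internal model access methods, calls the operating system access methods through Interface 2; these reach the database through Interface 3.]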

Physical Design
Interface 1: User request to the DBMS. The user presents a query, and the DBMS determines which physical databases are needed to resolve the query.
Interface 2: The DBMS uses an internal model access method to access the data stored in a logical database.
Interface 3: The internal model access methods and OS access methods access the physical records of the database.

Physical File Design

Physical file: a portion of secondary storage (disk space) allocated for the purpose of storing physical records
Pointer: a field of data that can be used to locate a related field or record of data
Access method: an operating system algorithm for storing and locating data in secondary storage
Page: the amount of data read or written in one disk input or output operation

Internal Model Access Methods

Many types of access methods:
Physical Sequential
Indexed Sequential
Indexed Random
Inverted
Direct
Hashed

Differences in:
Access efficiency
Storage efficiency

Physical Sequential
Key values of the physical records are in logical sequence
Main use is for dump and restore
Access method may be used for storage as well as retrieval
Storage efficiency is near 100%
Access efficiency is poor (unless physical records are of fixed size)

Indexed Sequential
Key values of the physical records are in logical sequence
Access method may be used for storage and retrieval
An index of key values is maintained, with entries for the highest key values per block(s)
Access efficiency depends on the levels of index, storage allocated for the index, number of database records, and amount of overflow
Storage efficiency depends on the size of the index and the volatility of the database

Index Sequential
Index:
Actual Value | Address (Block Number)
Dumpling | 1
Harty | 2
Texaci | 3
...

Data File:
Block 1: Adams, Becker, Dumpling
Block 2: Getta, Harty
Block 3: Mobile, Sunoci, Texaci
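A lookup against this kind of index can be sketched as follows, using the names from the example. The index keeps the highest key of each block, so a binary search finds the first index entry that is greater than or equal to the search key, and only that block is scanned. The function name `find_block` and the in-memory dictionaries are assumptions made for the illustration.

```python
from bisect import bisect_left

# Index: highest key value per block, in ascending order.
index_keys = ["Dumpling", "Harty", "Texaci"]
index_blocks = [1, 2, 3]

data_file = {
    1: ["Adams", "Becker", "Dumpling"],
    2: ["Getta", "Harty"],
    3: ["Mobile", "Sunoci", "Texaci"],
}

def find_block(key):
    """Return the block number holding key, or None if the key is absent."""
    pos = bisect_left(index_keys, key)      # first entry with highest key >= key
    if pos == len(index_keys):
        return None                         # larger than every key in the file
    block_no = index_blocks[pos]
    return block_no if key in data_file[block_no] else None

print(find_block("Getta"))   # 2
print(find_block("Texaci"))  # 3
print(find_block("Carter"))  # None
```

Only one data block is read per lookup here; with overflow blocks or more index levels the cost grows, as the efficiency notes above point out.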

Indexed Sequential: Two Levels

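First-level index (addresses of second-level index blocks):
Key Value | Address
385 | 7
678 | 8
805 | 9

Second-level index, block 7:
Key Value | Address
150 | 1
385 | 2

Second-level index, block 8:
Key Value | Address
536 | 3
678 | 4

Second-level index, block 9:
Key Value | Address
785 | 5
805 | 6

Data blocks:
Block 1: 001, 003, ..., 150
Block 2: 251, ..., 385
Block 3: 455, 480, ..., 536
Block 4: 605, 610, ..., 678
Block 5: 705, 710, ..., 785
Block 6: 791, ..., 805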

Indexed Random
Key values of the physical records are not necessarily in logical sequence
The index may be stored and accessed with the Indexed Sequential access method
The index has an entry for every database record, and the index keys are in ascending logical sequence
Database records are not necessarily in ascending sequence
Access method may be used for storage and retrieval

Indexed Random
Index:
Actual Value | Address (Block Number)
Adams | 2
Becker | 1
Dumpling | 3
Getta | 2
Harty | ...

Data File:
Block 1: Becker, Harty
Block 2: Adams, Getta
Block 3: Dumpling

B-tree
[Diagram: a root node with keys F, P, Z; three child nodes with keys B, D, F; H, L, P; and R, S, Z; and leaf entries such as Aces, Boilers, Cars, Devils, Flyers, Hawkeyes, Hoosiers, Minors, Panthers, Seminoles.]

Inverted
Key values of the physical records are not necessarily in logical sequence
Access method is better used for retrieval
An index may be built for every field to be inverted
Access efficiency depends on the number of database records, the levels of index, and the storage allocated for the index

Inverted
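Index on Course Number (the index also carries an Address / Block Number column with values 1, 2, 3):
Actual Value | Record addresses
CH 145 | 101, 103, 104
CS 201 | 102
CS 623 | 105, 106
PH 345 | ...

Data file:
Student name | Course Number
Adams | CH 145
Becker | CS 201
Dumpling | CH 145
Getta | CH 145
Harty | CS 623
Mobile | CS 623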
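Building an inverted index on the course number can be sketched as below. The record addresses 101 to 106 are assigned to the students in file order, which is an assumption inferred from the address lists in the example; the function name is hypothetical.

```python
from collections import defaultdict

# Data file: record address -> (student name, course number)
records = {
    101: ("Adams", "CH 145"),
    102: ("Becker", "CS 201"),
    103: ("Dumpling", "CH 145"),
    104: ("Getta", "CH 145"),
    105: ("Harty", "CS 623"),
    106: ("Mobile", "CS 623"),
}

def build_inverted_index(records):
    """Map each course number to the addresses of all records containing it."""
    index = defaultdict(list)
    for address in sorted(records):
        _name, course = records[address]
        index[course].append(address)
    return dict(index)

course_index = build_inverted_index(records)
print(course_index["CH 145"])  # [101, 103, 104]
print(course_index["CS 623"])  # [105, 106]
```

A query such as "who takes CH 145" reads one index entry and then exactly the listed records, with no scan of the data file.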

Direct
Key values of the physical records are not necessarily in logical sequence
There is a one-to-one correspondence between a record key and the physical address of the record
May be used for storage and retrieval
Access efficiency is always 1
Storage efficiency depends on the density of keys
No duplicate keys permitted

Hashing
Key values of the physical records are not necessarily in logical sequence
Many key values may share the same physical address (block)
May be used for storage and retrieval
Access efficiency depends on the distribution of keys, the algorithm for key transformation, and the space allocated
Storage efficiency depends on the distribution of keys and the algorithm used for key transformation
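The idea of a key transformation can be sketched with the simple division (modulo) method. Both the choice of hash function and the value of `NUM_BLOCKS` are assumptions made for illustration; the slides do not name a specific algorithm.

```python
NUM_BLOCKS = 5  # hypothetical number of blocks allocated to the file

def block_for(key):
    """Key transformation: map a numeric key to a block address."""
    return key % NUM_BLOCKS

def load(keys):
    """Place each key in the block chosen by the hash function."""
    blocks = {b: [] for b in range(NUM_BLOCKS)}
    for k in keys:
        blocks[block_for(k)].append(k)
    return blocks

blocks = load([101, 102, 103, 104, 105, 106, 111])
print(blocks[1])  # [101, 106, 111]  (three keys share block 1)
print(blocks[0])  # [105]
```

How evenly the keys spread over the blocks depends on the key distribution and the transformation chosen, which is what drives both access and storage efficiency.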

Comparative Access Methods

Factor | Sequential | Indexed | Hashed
Storage space | No wasted space | No wasted space for data, but extra space for index | More space needed for addition and deletion of records after initial load
Sequential retrieval on primary key | Very fast | Moderately fast | Impractical
Random retrieval | Impractical | Moderately fast | Very fast
Multiple key retrieval | Possible, but needs a full scan | Very fast with multiple indexes | Not possible
Deleting records | Can create wasted space | OK if dynamic | Very easy
Adding records | Requires rewriting file | OK if dynamic | Very easy
Updating records | Usually requires rewriting file | Easy, but requires maintenance of indexes | Very easy
