Data Concepts 293
13.10 Data Independence
Data independence is the ability of a database to modify a schema definition at
one level without affecting the schema at the next higher level (for example, the
ability to change the conceptual schema without affecting the external schemas
or application programs). Data independence is possible because when the schema
is changed at one level, the schema at the next level remains unchanged; only
the mapping between the two levels changes. The two types of data independence are:
1. Physical Data Independence  2. Logical Data Independence
Fig 13.9 Data Independence
Physical Data Independence
All schemas are logical; the actual data is stored in binary form on a storage
medium such as a hard disk, floppy disk, drum, tape or SD card. System designers
choose to organize, access and process records and files in different ways
depending on the type of application and the needs of users. The three file
organizations commonly used in DBMS/RDBMS data processing applications are
sequential, direct and the indexed sequential access method (ISAM). The selection
of a particular file organization depends upon the application. To access a
record, a key field (a unique identifying value found in every record of the
file) is used.
Serial File Organization : With serial file organization, records are arranged
one after another, in no particular order other than the chronological order in
which they are added to the file. Serial organization is commonly found in
transaction data, where records are created in the file in the order in which
the transactions take place. Serial file organization provides fast access to
the next record in sequence, storage on economical media and easy file backup;
however, updating is slow in this file organization.
Sequential File Organization : Records are stored one after another in
ascending or descending order of the key field of the records. For example, in
a payroll file, records are stored in order of employee id. Sequentially
organized files processed by computer systems are normally stored on storage
media such as magnetic tape, punched cards, or magnetic disks. To access these
records, the computer must read the file in sequence from the beginning. The
first record is read and processed first, then the second record in the file
sequence, and so on. To locate a particular record, the computer program must
read each record in sequence and compare its key field to the one that is
needed. The search ends only when the desired key matches the key field of the
record currently read. On average, about half of the file has to be searched
to retrieve a desired record from a sequential file.
Fig 13.10 File organisation
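The search just described can be sketched in a few lines of Python; the employee records and field names below are invented for illustration, not taken from any particular system:

```python
# Sketch of a sequential file search: records are read one after
# another and each key field is compared with the key being sought.

payroll_file = [                       # records in ascending employee-id order
    {"emp_id": 101, "name": "Anu"},
    {"emp_id": 104, "name": "Bala"},
    {"emp_id": 109, "name": "Chitra"},
    {"emp_id": 115, "name": "Dev"},
]

def sequential_search(records, key):
    """Read each record in sequence; stop when the key matches."""
    reads = 0
    for record in records:
        reads += 1
        if record["emp_id"] == key:
            return record, reads       # desired key matched: search ends
        if record["emp_id"] > key:
            break                      # keys are sorted, so the key is absent
    return None, reads

record, reads = sequential_search(payroll_file, 109)
```

Note that locating employee 109 requires reading three records, illustrating why, on average, about half the file is read per retrieval.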
Random/Direct Access File Organization: Direct access file organization allows
immediate, direct access to individual records in the file. The records are
stored and retrieved using a relative record number, which gives the position
of the record in the file. This type of organization also allows the file to be
accessed sequentially. The primary storage in a computer truly provides direct
access, and some devices outside the CPU can also provide this feature: the
direct access storage devices, of which there are several types including disks
and other mass storage, have the capability of directly reaching any location.
Self (Direct) Addressing: Under self (direct) addressing, a record's key is used
as its relative address. Therefore, the record's address can be computed from
the record key and the physical address of the first record in the file.
Advantage: with self-addressing there is no need to store an index.
Disadvantages: the records must be of fixed length, and if some records are
deleted their space remains empty.
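The address computation can be sketched as follows; the record length and base address are assumed values chosen only for illustration:

```python
# Self (direct) addressing sketch: with fixed-length records, the key
# itself yields the record's address, so no index needs to be stored.

RECORD_LENGTH = 64    # records must be of fixed length for this scheme
BASE_ADDRESS = 4096   # physical address of the first record (key 1)

def record_address(key):
    """Compute the record's address directly from its key."""
    return BASE_ADDRESS + (key - 1) * RECORD_LENGTH
```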
Random access method : Records are stored on disk by using a hashing algorithm.
The key field is fed through the hashing algorithm and a relative address is
created. This address gives the position on the disk where the record is to be
stored. A desired record can be accessed directly using the randomizing
(hashing) procedure, without accessing any other records in the file. A
randomizing procedure is characterized by the fact that records are stored in
such a way that there is no relationship between the keys of adjacent records.
The technique converts the record's key number to a physical location,
represented by a disk address, through a computational procedure.
Advantages : Access to, and retrieval of, a record is quick and direct.
Transactions need not be sorted and placed in sequence prior to processing.
Best suited for online transactions.
Disadvantages: Address generation overhead is involved in accessing each record
due to the hashing function.
May be less efficient in the use of storage space than sequentially organized
files.
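A minimal sketch of hashed storage and retrieval, assuming a toy modulo hash over seven slots; a real system would also handle collisions, which this sketch omits:

```python
# Random (hashed) file organization sketch: the key field is fed
# through a hashing algorithm to produce a relative address.

NUM_SLOTS = 7
disk = [None] * NUM_SLOTS             # stand-in for record slots on disk

def hash_address(key):
    """Convert the record key to a relative address computationally."""
    return key % NUM_SLOTS

def store(key, data):
    disk[hash_address(key)] = (key, data)

def fetch(key):
    """Direct retrieval: no other records in the file are read."""
    slot = disk[hash_address(key)]
    return slot[1] if slot is not None and slot[0] == key else None

store(250, "invoice-250")
store(398, "invoice-398")
```

Notice that keys 250 and 398 land in unrelated slots: adjacent records bear no key relationship, exactly as the randomizing procedure requires.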
Indexed Sequential Access Method (ISAM): ISAM is a hybrid between sequential
and direct access file organization. The records within the file are stored
sequentially, but direct access to individual records is possible through an
index. Indexing permits access to selected records without searching the entire
file.
Advantages: ISAM permits efficient and economical use of sequential processing
techniques when the activity ratio is high.
It permits relatively efficient direct access processing of records when the
activity ratio is low.
Cylinder    Highest record key
            in the cylinder
1           84
2           250
3           398
4           479

Cylinder 1 track index          Cylinder 2 track index
Track   Highest record key      Track   Highest record key
        in the cylinder                 in the cylinder
1       84                      1       95
2       250                     2       110
3       398                     3       175
4       479                     4       250

Fig 13.11 Table describing ISAM
Disadvantages: Files must be stored on a direct-access storage device; hence
relatively expensive hardware and software resources are required.
Access to records may be slower than in a direct file.
Less efficient in the use of storage space than some other alternatives.
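The two-level index lookup can be sketched as follows; the cylinder and track index contents and the stored keys below are made-up sample values for illustration:

```python
# ISAM lookup sketch: consult the cylinder index, then that cylinder's
# track index, to narrow the search before a short sequential scan.

cylinder_index = [(1, 479), (2, 730)]            # (cylinder, highest key)
track_index = {
    1: [(1, 84), (2, 250), (3, 398), (4, 479)],  # (track, highest key)
    2: [(1, 520), (2, 610), (3, 680), (4, 730)],
}
tracks = {                       # keys stored sequentially on each track
    (1, 2): [110, 175, 250],
    (2, 1): [490, 505, 520],
}

def isam_find(key):
    """Two index lookups, then a sequential scan of one track only."""
    for cyl, highest in cylinder_index:
        if key <= highest:
            break                # key lies in this cylinder
    for trk, highest in track_index[cyl]:
        if key <= highest:
            break                # key lies on this track
    return key in tracks.get((cyl, trk), [])
```

Only one track is scanned sequentially, so the whole file is never searched.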
Different types of Architecture
The design of a Database Management System depends highly on its architecture,
which can be centralized, decentralized or hierarchical. DBMS architecture can
be seen as single-tier or multi-tier. An n-tier architecture divides the whole
system into related but independent modules, each of which can be independently
modified, altered or replaced.
Database architecture is logically divided into three types:
1. Logical one-tier (1-tier) architecture
2. Logical two-tier Client/Server architecture
3. Logical three-tier Client/Server architecture
Logical one-tier (1-tier) architecture
In 1-tier architecture, the DBMS is the only entity: the user sits directly on
the DBMS and uses it. Any changes made here are made directly on the DBMS
itself. This architecture does not provide handy tools for end users; it is
preferred by database designers and programmers.
Fig 13.12 Logical one-tier architecture
Logical two-tier Client/Server architecture
Two-tier Client/Server architecture is used for user interface programs and
application programs that run on the client side. An interface called ODBC
(Open Database Connectivity) provides an API that allows client-side programs
to call the DBMS. Most DBMS vendors provide ODBC drivers. A client program may
connect to several DBMSs. In this architecture some variation of the client is
also possible: in some DBMSs, more functionality, such as the data dictionary
and query optimization, is transferred to the client. Such clients are called
data servers.
Fig 13.13 Two-tier architecture
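A client program calls the DBMS through a call-level API of this kind. The sketch below uses Python's built-in sqlite3 module as a stand-in for an ODBC driver, since both follow the same connect/execute/fetch pattern; with a real ODBC driver, the connection string would name a DSN instead of an in-memory database, and the table here is invented for illustration:

```python
import sqlite3  # stand-in for an ODBC driver (same call pattern)

# Client side: open a connection, send SQL through the API, fetch rows.
conn = sqlite3.connect(":memory:")    # with ODBC, a DSN would be named here
cur = conn.cursor()
cur.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO employee VALUES (101, 'Anu')")
cur.execute("SELECT name FROM employee WHERE emp_id = ?", (101,))
row = cur.fetchone()                  # the DBMS returns the result row
conn.close()
```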
Three-tier Client / Server Architecture
Three-tier Client/Server database architecture is the architecture commonly
used for web applications. An intermediate layer, called the application server
or web server, stores the web connectivity software and the business logic
(constraints) part of the application, which is used to access the right amount
of data from the database server. This layer acts as a medium for sending
partially processed data between the database server and the client.
Fig 13.14 Three-tier architecture
Database (Data) Tier: At this tier, only the database resides. The database,
along with its query processing languages, sits in layer 3 of the 3-tier
architecture. It also contains all relations and their constraints.
Application (Middle) Tier: At this tier reside the application server and the
programs that access the database. For a user, this application tier presents
an abstracted view of the database; users are unaware of any existence of the
database beyond the application. For the database tier, the application tier is
its user, and the database tier is not aware of any user beyond the application
tier. This tier thus works as a mediator between the two.
User (Presentation) Tier: End users sit at this tier. From a user's point of
view, this tier is everything: users do not know about any existence or form of
the database beyond this layer. At this layer, multiple views of the database
can be provided by the application; all views are generated by programs residing
in the application tier. A multiple-tier database architecture is highly
modifiable, as almost all its components are independent and can be changed
independently.
13.11 Database Model
A database model, or simply a data model, is an abstract model that describes
how data is represented and used. A data model consists of a set of data
structures and conceptual tools used to describe the structure (data types,
relationships, and constraints) of a database.
A data model not only describes the structure of the data; it also defines a
set of operations that can be performed on the data. A data model generally
consists of a data model theory, which is a formal description of how data may
be structured and used, and a data model instance, which is a practical data
model designed for a particular application. The process of applying a data
model theory to create a data model instance is known as data modeling.
The main objective of a database system is to highlight only the essential
features and to hide the storage and data organization details from the user.
This is known as data abstraction. A database model provides the necessary
means to achieve data abstraction.
A Database model defines the logical design of data. The model describes the
relationships between different parts of the data.
In the history of database design, three models have been in use:
* Hierarchical Model
* Network Model
* Relational Model
13.11.1 Hierarchical Model
The hierarchical data model is the oldest type of data model, developed by IBM
in 1968. This data model organizes the data in a tree-like structure, in which
each child node (also known as dependents) can have only one parent node.
The database based on the hierarchical data model comprises a set of records
connected to one another through links. The link is an association between two
or more records. The top of the tree structure consists of a single node that does
not have any parent and is called the root node.
The root may have any number of dependents; each of these dependents may
have any number of lower level dependents. Each child node can have only one
parent node and a parent node can have any number of (many) child nodes. It,
therefore, represents only one-to-one and one-to-many relationships. The
collection of same type of records is known as a record type.
For simplicity, only a few fields of each record type are shown. One complete
record of each record type represents a node.
300 Data Concepts
In this model each entity has only one parent but can have several children. At
the top of the hierarchy there is only one entity, which is called the root.
Hierarchical Model Example
Fig 13.15 Hierarchical Model
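The one-parent, many-children constraint can be sketched as a simple tree structure; the record names below are invented for illustration:

```python
# Hierarchical model sketch: every record has exactly one parent link,
# so the data forms a tree rooted at a single parent-less node.

class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent           # only one parent is allowed
        self.children = []             # any number of dependents
        if parent is not None:
            parent.children.append(self)

root = Node("College")                 # the root has no parent
science = Node("Science", parent=root)
arts = Node("Arts", parent=root)
physics = Node("Physics", parent=science)

def path_to_root(node):
    """Follow the single parent link upward: the only access path."""
    path = [node.name]
    while node.parent is not None:
        node = node.parent
        path.append(node.name)
    return path
```

Because each node stores exactly one parent link, every record is reachable through exactly one path from the root, which is what makes access so predictable.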
Advantage: In the hierarchical data model, data access is quite predictable in
the structure; therefore, both retrieval and updates can be highly optimized by
the DBMS.
Disadvantage: The main drawback of this model is that the links are 'hard coded'
into the data structure, that is, the links are permanently established and
cannot be modified. This hard coding makes the hierarchical model rigid. In
addition, the physical links make it difficult to expand or modify the database,
and changes require substantial redesigning effort.
13.11.2 Network Model
The first specification of network data model was presented by Conference on
Data Systems Languages (CODASYL) in 1969, followed by the second specification
in 1971. It is powerful but complicated. In a network model the data is also
represented by a collection of records, and relationships among data are
represented by links. However, the link in a network data model represents an
association between precisely two records. Like hierarchical data model, each
record of a particular record type represents a node. However, unlike hierarchical
data model, all the nodes are linked to each other without any hierarchy. The
main difference between hierarchical and network data model is that in
Data Concepts 301
hierarchical data model, the data is organized in the form of trees and in network
data model, the data is organized in the form of graphs.
In the network model, entities are organized in a graph, in which some
entities can be accessed through several path
Fig 13.16 Network Model of database
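The graph of two-record links can be sketched as follows; the owner and member record names are invented for illustration:

```python
# Network model sketch: each link associates precisely two records, and
# a record may have several owners, so the data forms a graph.

links = set()                          # each element links two records

def link(owner, member):
    links.add((owner, member))

link("Course:DBMS", "Student:Anu")     # Anu reachable via her course...
link("Hostel:A", "Student:Anu")        # ...and also via her hostel
link("Course:DBMS", "Student:Bala")

def owners_of(member):
    """All records from which this record can be reached."""
    return sorted(owner for owner, m in links if m == member)
```

Here the record for Anu has two parents, which a hierarchical tree could not express.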
Advantage: In the network data model, a parent node can have many child nodes
and a child can also have many parent nodes. Thus, the network model permits
the modeling of many-to-many relationships in data.
Disadvantage: In the network data model, it can be quite complicated to maintain
all the links, and a single broken link can lead to problems in the database. In
addition, since there are no restrictions on the number of relationships, the
database design can become complex.
13.11.3 Relational Model
The relational data model was developed by E. F. Codd in 1970. In the relational
data model, unlike the hierarchical and network models, there are no physical
links. All data is maintained in the form of tables (generally, known as relations)
consisting of rows and columns. Each row (record) represents an entity and a
column (field) represents an attribute of the entity. The relationship between
the two tables is implemented through a common attribute in the tables and not
by physical links or pointers. This makes the querying much easier in a relational
database system than in the hierarchical or network database systems. Thus,
the relational model has become more programmer friendly and much more
dominant and popular in both industrial and academic scenarios. Oracle, Sybase,
DB2, Ingres, Informix and MS-SQL Server are a few of the popular relational
DBMSs.
In this model, data is organized in two-dimensional tables called relations.
The tables or relations are related to each other.
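A minimal sketch of two relations related through a common attribute, using Python's built-in sqlite3 module; the table and column names are invented for illustration:

```python
import sqlite3

# Relational model sketch: two tables related through the common
# attribute dept_id, not through physical links or pointers.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dname TEXT)")
cur.execute("CREATE TABLE emp (emp_id INTEGER PRIMARY KEY,"
            " ename TEXT, dept_id INTEGER)")
cur.execute("INSERT INTO dept VALUES (10, 'Sales')")
cur.execute("INSERT INTO emp VALUES (1, 'Anu', 10)")
# The relationship is expressed in the query via the shared attribute.
cur.execute("SELECT e.ename, d.dname FROM emp e"
            " JOIN dept d ON e.dept_id = d.dept_id")
rows = cur.fetchall()
conn.close()
```

No pointer connects the two tables; the join is computed from matching values of dept_id at query time.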
Fig 13.17 Relational Model of database
Basic Rules for the Relational Data Model
13.12 Codd’s Rule
E.F Codd was a Computer Scientist who invented Relational model for Database
management. Based on relational model, Relation database was created. Codd
proposed 13 rules popularly known as Codd’s 12 rules to test DBMS’s concept
against his relational model. Codd’s rule actually define what quality a DBMS
requires in order to become a Relational Database Management System(RDBMS).
Till now, there is hardly any commercial product that follows all the 13 Codd’s
rules. Even Oracle follows only eight and half out(8.5) of 13. The Codd’s 12 rules
are as follows.
Rule zero
This rule states that for a system to qualify as an RDBMS, it must be
able to manage the database entirely through its relational capabilities.
Rule 1 : Information rule
All information (including metadata) is to be represented as stored data
in cells of tables. The rows and columns have to be strictly unordered.
Rule 2 : Guaranteed Access
Each unique piece of data (atomic value) should be accessible by :
Table Name + Primary Key (row) + Attribute (column).
NOTE : The ability to access data directly via a POINTER is a violation of this rule.
Rule 3 : Systematic treatment of NULL