SIC 2112_ICS 2206 Introduction to Database Systems
Course outline
Introduction: Definition of data, information, DBMS and database systems. Types of database models:
file-based, hierarchical, network, relational, object-based. Relational data models: entities, attributes, domain
and atomicity. Database design phases: conceptual, logical and physical database design. Normalization.
SQL: Data definition language, Data manipulation language. Implementation and manipulation of a
database: data entry (append, edit, delete); fields (names, types, size, index). Manipulating records: sorting,
finding/searching. Queries, deleting, updating. Views, appending and deleting. Maintaining a database.
Reports: merging, labels, forms/screens, printing. Limitations of relational database management systems
such as MS Access, Oracle and MySQL.
Introduction
The database is now such an integral part of our day-to-day life that often we’re not aware we are using one.
To start our discussion of database systems, we briefly examine some of their applications. For the purposes
of this discussion, we consider a database to be a collection of related data and the Database Management
System (DBMS) to be the software that manages and controls access to the database. We also use the term
application program to mean a computer program that interacts with the database in some way, and we use the
more inclusive term database system to mean the collection of application programs that interact with the
database, together with the DBMS and the database itself. More precise definitions are provided later.
The Database approach
The limitations of the file-based approach discussed earlier can be attributed to two factors:
(1) The definition of the data is embedded in the application programs, rather than being stored
separately and independently.
(2) There is no control over the access and manipulation of data beyond that imposed by the application
programs.
To address these limitations, a new approach, “the database approach”, emerged, built around the database
and the Database Management System (DBMS). In this approach, the definition of data is separated from the
application programs. An object has an internal definition and an external definition. The users of the object see
only the external definition and are unaware of how the object is defined internally and how it functions. One
advantage of this approach, known as “data abstraction”, is that the internal definition of an object can be
changed without affecting its users, provided the external definition remains the same.
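As an illustration of data abstraction, the following minimal sketch uses Python's built-in sqlite3 module, with SQLite standing in for a DBMS (an assumption; the notes do not prescribe a product). A table plays the role of the internal definition and a view plays the role of the external definition. The names staff_internal and StaffDirectory are hypothetical. The internal layout is changed, yet the user-side query keeps returning the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Internal definition (hypothetical storage layout): one column for the full name.
conn.execute("CREATE TABLE staff_internal (staffNo TEXT PRIMARY KEY, fullName TEXT)")
conn.execute("INSERT INTO staff_internal VALUES ('SG37', 'Ann Beech')")

# External definition: the view is all that users of the object see.
conn.execute("CREATE VIEW StaffDirectory AS SELECT staffNo, fullName AS name FROM staff_internal")


def read_directory():
    """A user-side query written only against the external definition."""
    return conn.execute("SELECT staffNo, name FROM StaffDirectory").fetchall()


before = read_directory()

# Change the internal definition: store first and last names separately.
conn.execute("DROP VIEW StaffDirectory")
conn.execute("DROP TABLE staff_internal")
conn.execute("CREATE TABLE staff_internal (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT)")
conn.execute("INSERT INTO staff_internal VALUES ('SG37', 'Ann', 'Beech')")

# Re-define the view so that the external definition stays exactly the same.
conn.execute(
    "CREATE VIEW StaffDirectory AS "
    "SELECT staffNo, fName || ' ' || lName AS name FROM staff_internal"
)

after = read_directory()
print(before == after)  # True: the user-side query is unchanged and still works
```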
Definition of terms
Data- Data is defined as facts that have meaning to the user in his/her environment. These facts
can include text, graphics, sound and even video.
Database- A database is defined as a shared collection of logically related data (and a description
of this data), designed to meet the information needs of an organization.
The database is a single, possibly large repository of data, which can be used simultaneously by many
departments and users. All data that is required by these users is integrated with a minimum amount of
duplication. Importantly, the database is normally not owned by any one department or user but is a shared
corporate resource. As well as holding the organization’s operational data, the database also holds a
description of this data. For this reason, a database is also defined as a self-describing collection of
integrated records.
The description of the data (that is the meta-data – the ‘data about data’) is known as the system catalog or
data dictionary. It is the self-describing nature of a database that provides what’s known as data
independence. This means that if new data structures are added to the database or existing structures in the
database are modified then the application programs that use the database are unaffected, provided they
don’t directly depend upon what has been modified. For example, if we add a new column to a table or
create a new table, existing applications are unaffected. However, if we remove a column from a table that
an application program uses, then that application program is affected by this change and must be modified
accordingly.
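The self-describing catalog and the resulting data independence can be sketched with sqlite3 as well, assuming SQLite as the DBMS. PRAGMA table_info reads the column descriptions SQLite stores about a table; the telNo column is a hypothetical addition. The existing query, which names only the columns it needs, returns the same rows after the new column is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, street TEXT, city TEXT, postCode TEXT)")
conn.execute("INSERT INTO Branch VALUES ('B005', '22 Deer Rd', 'London', 'SW1 4EH')")


def catalog_columns(table):
    """Read column names from the meta-data SQLite keeps about a table."""
    # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
    # The table name is interpolated directly, so only pass trusted names.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def branch_cities():
    """An existing application query that names only the columns it uses."""
    return conn.execute("SELECT branchNo, city FROM Branch").fetchall()


columns_before = catalog_columns("Branch")
result_before = branch_cities()

# Modify the structure: add a new column (telNo is a hypothetical example).
conn.execute("ALTER TABLE Branch ADD COLUMN telNo TEXT")

columns_after = catalog_columns("Branch")
result_after = branch_cities()

print(columns_before)                 # ['branchNo', 'street', 'city', 'postCode']
print(columns_after)                  # ['branchNo', 'street', 'city', 'postCode', 'telNo']
print(result_before == result_after)  # True: the application is unaffected
```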
The final term in the definition of a database that should be explained is ‘logically related’. When we
analyze the organization’s information needs, we attempt to identify the important objects (entities) that
need to be represented in the database, their attributes, and the logical relationships between these objects. An
entity is a distinct object (person, place, thing, concept or event) in the organization that is to be represented
in the database. An attribute is a property that describes some aspect of the object and a relationship is an
association between entities.
The Database Management System (DBMS)
A DBMS is defined as a “software system that enables users to define, create, and maintain the database
and also provides controlled access to the database”.
A DBMS interacts with the users, application programs and the database. It provides the following facilities:
Allows users to define the database, usually through a Data Definition Language (DDL). The DDL
allows users to specify the data types, structures and constraints on the data to be stored in the database.
Allows users to insert, update, delete and retrieve data from the database, usually through a Data
Manipulation Language (DML).
Provides controlled access to the database. For instance, it may provide:
- A security system, which prevents unauthorized users from accessing the database.
- An integrity system, which maintains the consistency of the stored data.
- A concurrency control system, which allows shared access to the database.
- A recovery control system, which restores the database to a previous consistent state
following a hardware or software failure.
- A user-accessible catalog, which contains descriptions of the data in the database.
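A small sketch of the DDL, DML and integrity facilities, again assuming SQLite through Python's sqlite3 module. The CREATE TABLE statement is DDL; the INSERT, UPDATE, DELETE and SELECT statements are DML. The column types, the CHECK rule and the 600 pay rise are illustrative choices, not part of the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure, data types and constraints of a table.
conn.execute("""
    CREATE TABLE Staff (
        staffNo  TEXT NOT NULL PRIMARY KEY,
        fName    TEXT NOT NULL,
        lName    TEXT NOT NULL,
        position TEXT,
        salary   INTEGER CHECK (salary > 0)
    )
""")

# DML: insert, update, delete and retrieve data.
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?, ?, ?)",
    [
        ("SG37", "Ann", "Beech", "Assistant", 12000),
        ("SG14", "David", "Ford", "Supervisor", 18000),
        ("SA9", "Mary", "Howe", "Assistant", 9000),
    ],
)
conn.execute("UPDATE Staff SET salary = salary + 600 WHERE staffNo = ?", ("SG37",))  # hypothetical raise
conn.execute("DELETE FROM Staff WHERE staffNo = ?", ("SA9",))
rows = conn.execute("SELECT staffNo, salary FROM Staff ORDER BY staffNo").fetchall()
print(rows)  # [('SG14', 18000), ('SG37', 12600)]

# Integrity system: a row that breaks a declared constraint is rejected.
rejected = False
try:
    conn.execute("INSERT INTO Staff VALUES ('SL41', 'Julie', 'Lee', 'Assistant', -5)")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```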
Figure 2: Database Management System
(Database) application programs
Application program- A computer program that interacts with the database by issuing an
appropriate request (typically an SQL statement) to the DBMS.
Users interact with the database through a number of application programs that are used to create and
maintain the database and to generate information. These programs can be conventional batch applications
or, more typically nowadays, online applications. The application programs may be written in
some programming language or in some higher-level fourth-generation language.
Figure 2 above illustrates the database approach. It shows three departments using their application
programs to access the database through the DBMS. Each set of departmental application programs handles
data entry, data maintenance, and the generation of reports. The physical structure and storage of the data are
managed by the DBMS.
Components of a DBMS
There are five major components of the DBMS environment, namely hardware, software,
data, procedures and people, as explained below:
1) Hardware- These are the computer system(s) that the DBMS and the application programs run on.
This can range from a single PC, to a single mainframe, to a network of computers.
2) Software- This includes the DBMS software and the application programs, together with the
operating system, including network software if the DBMS is being used over a network.
3) Data -Data acts as a bridge between the hardware and software components and the human
components. As we’ve already said, the database contains both the operational data and the meta-
data (the ‘data about data’).
4) Procedures- These are the instructions and rules that govern the design and use of the database. This
may include instructions on how to log on to the DBMS, make backup copies of the database, and
how to handle hardware or software failures.
5) People- These include the database designers, database administrators (DBAs), application
programmers and end-users.
Data Models and Conceptual Modeling
1. Object-Based Data Models- They use concepts such as entities, attributes and relationships.
Examples of object-based models are the Entity-Relationship, Semantic, Functional and Object-oriented
models.
2. Record-Based Data Models- In a record-based model, the database consists of a number of fixed-format
records, possibly of differing types. Each record type defines a fixed number of fields, each typically of a
fixed length. Record-based data models are used to specify the overall structure of the database and a
higher-level description of the implementation. There are three principal types of record-based logical
data models: the hierarchical, network and relational data models. Modern commercial database systems
are based on the relational model, whereas the early database systems were based on the network and
hierarchical models. The relational model adopts a declarative approach to database processing (i.e. it
specifies what data is to be retrieved), while the network and hierarchical models adopt a navigational
approach (i.e. they specify how the data is to be retrieved).
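The difference between the declarative and navigational approaches can be sketched in Python. The dictionary of linked records is a hypothetical stand-in for a navigational database, and SQLite (through sqlite3) stands in for a relational DBMS; both answer the same question (which staff work at branches in a given city?) using the branch and staff data in these notes.

```python
import sqlite3

# Navigational style (hypothetical linked records): each branch record points
# directly at its staff records, and the program follows those links itself.
branches = {
    "B003": {"city": "Glasgow", "staff": ["SG37", "SG14", "SG5"]},
    "B005": {"city": "London", "staff": ["SL21", "SL41"]},
    "B007": {"city": "Aberdeen", "staff": ["SA9"]},
}


def navigational_query(city):
    """Say HOW to find the data: visit every branch, then follow its staff links."""
    found = []
    for branch in branches.values():
        if branch["city"] == city:
            found.extend(branch["staff"])
    return sorted(found)


# Declarative style: load the same data into tables and state WHAT is wanted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, city TEXT)")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, branchNo TEXT)")
for branch_no, branch in branches.items():
    conn.execute("INSERT INTO Branch VALUES (?, ?)", (branch_no, branch["city"]))
    for staff_no in branch["staff"]:
        conn.execute("INSERT INTO Staff VALUES (?, ?)", (staff_no, branch_no))


def declarative_query(city):
    """Say WHAT is wanted; the DBMS decides how to find it."""
    rows = conn.execute(
        "SELECT s.staffNo FROM Staff s JOIN Branch b ON s.branchNo = b.branchNo "
        "WHERE b.city = ? ORDER BY s.staffNo",
        (city,),
    )
    return [row[0] for row in rows]


print(navigational_query("Glasgow"))  # ['SG14', 'SG37', 'SG5']
print(declarative_query("Glasgow") == navigational_query("Glasgow"))  # True
```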
3. Network Data Model- In this model, data is represented as collections of records, and
relationships are represented by sets. The records are organized as graph structures, with records
appearing as nodes (segments) and sets as edges in the graph. The most popular network DBMS is
Computer Associates' IDMS/R. To illustrate the network model, consider the tables below, which show the
branch and staff details:
Branch
branchNo  street        city      postCode
B005      22 Deer Rd    London    SW1 4EH
B007      16 Argyll St  Aberdeen  AB2 3SU
B003      163 Main St   Glasgow   G11 9QX
B004      32 Manse Rd   Bristol   BS99 1NZ
B002      56 Clover Dr  London    NW10 6EU
Staff
staffNo  fName  lName  position    sex  DOB        salary  branchNo
SL21     John   White  Manager     M    1-Oct-45   30000   B005
SG37     Ann    Beech  Assistant   F    10-Nov-60  12000   B003
SG14     David  Ford   Supervisor  M    24-Mar-58  18000   B003
SA9      Mary   Howe   Assistant   F    19-Feb-70  9000    B007
SG5      Susan  Brand  Manager     F    3-Jun-40   24000   B003
SL41     Julie  Lee    Assistant   F    13-Jun-65  9000    B005
Figure 3 below shows an instance of a network schema for the same set of data.
Figure 3: Network model for the above data (diagram not reproduced; it shows branch records such as B005 and B007 as nodes linked by sets to staff records such as SL41, SL21, SG37, SA9, SG14 and SG5)
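A rough sketch of the network model's records and sets in Python, under the assumption that a set can be modelled as an owner record holding a list of member records. The set names Has and HoldsPosition, and the Record class, are hypothetical. Because a staff record may belong to sets of different types, it can have more than one owner, which is what makes the structure a graph rather than a tree.

```python
class Record:
    """A network-model record: a node in the graph."""

    def __init__(self, key):
        self.key = key
        self.members = {}  # set type -> member records owned by this record
        self.owners = {}   # set type -> owner record of the set this record belongs to


def connect(set_type, owner, member):
    """Insert `member` into `owner`'s occurrence of `set_type` (an edge in the graph)."""
    # A record may be a member of only one occurrence of each set type,
    # but it may be a member of several different set types.
    if set_type in member.owners:
        raise ValueError(f"{member.key} is already a member of a {set_type} set")
    owner.members.setdefault(set_type, []).append(member)
    member.owners[set_type] = owner


b005, b007 = Record("B005"), Record("B007")
manager, assistant = Record("Manager"), Record("Assistant")
sl21, sl41, sa9 = Record("SL21"), Record("SL41"), Record("SA9")

for owner, member in [(b005, sl21), (b005, sl41), (b007, sa9)]:
    connect("Has", owner, member)
for owner, member in [(manager, sl21), (assistant, sl41), (assistant, sa9)]:
    connect("HoldsPosition", owner, member)

# Navigation follows links from record to record.
print([s.key for s in b005.members["Has"]])  # ['SL21', 'SL41']
print(sl21.owners["Has"].key, sl21.owners["HoldsPosition"].key)  # B005 Manager
```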
4. Hierarchical Data Model- It is a restricted type of network model. Data is represented as a
collection of records, and relationships are represented by sets. However, it allows a node to have
only one parent. It is represented as a tree graph, with records appearing as nodes (segments) and sets
as edges. The main hierarchical DBMS is IBM's IMS. Figure 4 below shows the hierarchical model
for the same data.
Figure 4: Hierarchical model (diagram not reproduced; it shows branch records such as B002 and B007 as parent nodes, with staff records such as SL41, SL21, SG37, SG14, SG5 and SA9 as their child nodes)
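For contrast, a sketch of the hierarchical restriction in Python, assuming each branch is the root of its own tree. The Segment class is hypothetical; giving every segment a single parent reference is what enforces the one-parent rule, so a staff segment cannot also sit beneath, say, a position segment without being duplicated.

```python
class Segment:
    """A hierarchical-model record (segment): a node in a tree."""

    def __init__(self, key, parent=None):
        self.key = key
        self.parent = parent  # a single parent reference: the tree restriction
        self.children = []
        if parent is not None:
            parent.children.append(self)


def preorder(segment):
    """Walk a tree from its root downwards, parent before children."""
    yield segment.key
    for child in segment.children:
        yield from preorder(child)


# Each branch is the root of its own tree; staff segments hang beneath it.
b005 = Segment("B005")
b007 = Segment("B007")
for staff_no in ("SL21", "SL41"):
    Segment(staff_no, parent=b005)
Segment("SA9", parent=b007)

order = [key for root in (b005, b007) for key in preorder(root)]
print(order)  # ['B005', 'SL21', 'SL41', 'B007', 'SA9']
```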
5. The Relational Model
The relational model is based on the mathematical concept of a relation, which is physically represented as a
table. In the relational model, data is organized into rows and columns.
Terminology
Relation/File- A relation is a table with columns and rows.
A relational DBMS (RDBMS) requires only that the database be perceived by the user as tables. In the
relational model, relations are used to hold information about the objects represented in the database. A
relation is represented as a table in which the rows of the table correspond to individual records and the
table columns correspond to attributes or fields. Attributes can appear in any order and the relation will still
be the same relation, and therefore convey the same meaning.
For example, the information on branch offices is represented by the Branch relation, with columns for
attributes branchNo (the branch number), street, city and postCode. Similarly, the information on staff is
represented by the Staff relation, with columns for attributes staffNo (the staff number), fName (first name),
lName (last name), position, sex, DOB (date of birth), salary and branchNo (the number of the branch the
staff member works at). Figure 5 below shows instances of the Branch and Staff relations. As you can see
from this figure, a column contains values for a single attribute; for example, the branchNo column contains
only branch numbers.
Branch
branchNo  street        city      postCode
B005      22 Deer Rd    London    SW1 4EH
B007      16 Argyll St  Aberdeen  AB2 3SU
B003      163 Main St   Glasgow   G11 9QX
B004      32 Manse Rd   Bristol   BS99 1NZ
B002      56 Clover Dr  London    NW10 6EU
Staff
staffNo  fName  lName  position    sex  DOB        salary  branchNo
SL21     John   White  Manager     M    1-Oct-45   30000   B005
SG37     Ann    Beech  Assistant   F    10-Nov-60  12000   B003
SG14     David  Ford   Supervisor  M    24-Mar-58  18000   B003
SA9      Mary   Howe   Assistant   F    19-Feb-70  9000    B007
SG5      Susan  Brand  Manager     F    3-Jun-40   24000   B003
SL41     Julie  Lee    Assistant   F    13-Jun-65  9000    B005
Figure 5: Branch and Staff Relations.
Attribute/field- An attribute is a named column of a relation.
Domain- A domain is the set of allowable values for one or more attributes.
Every attribute in a relational database is defined on a domain. Domains may be distinct for each attribute,
or two or more attributes may be associated with the same domain. Figure 6 below shows the domains for
some of the attributes of the Branch and Staff relations.
Figure 6: Domains for some attributes of the Branch and Staff relations.
Attribute  Domain Name     Meaning                              Domain Definition
branchNo   BranchNumbers   Set of all possible branch numbers   Character: size 4, range B001–B999
street     StreetNames     Set of all possible street names     Character: size 25
staffNo    StaffNumbers    Set of all possible staff numbers    Character: size 25, range S0001–S9999
position   StaffPositions  Set of all possible staff positions  One of Director, Manager, Supervisor, Assistant, Buyer
salary     StaffSalaries   Possible values of staff salaries    Monetary: 8 digits, range $10,000.00–$100,000.00
DOB        DateOfBirth     Possible values of birth dates       Date: range from 1-Jan-20, format dd-mmm-yy
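Domains such as those in Figure 6 can be approximated with CHECK constraints. This sketch assumes SQLite via Python's sqlite3 module; the GLOB pattern for branch numbers and the sample rows (including the hypothetical staff numbers SX1 to SX3) are illustrative, and since SQLite has no separate CREATE DOMAIN statement, each rule is attached to its column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Staff (
        staffNo  TEXT NOT NULL PRIMARY KEY,
        position TEXT CHECK (position IN ('Director', 'Manager', 'Supervisor', 'Assistant', 'Buyer')),
        salary   NUMERIC CHECK (salary BETWEEN 10000.00 AND 100000.00),
        branchNo TEXT CHECK (branchNo GLOB 'B[0-9][0-9][0-9]' AND branchNo <> 'B000')
    )
""")


def try_insert(row):
    """Return True if the row satisfies every domain CHECK, False if it is rejected."""
    try:
        conn.execute("INSERT INTO Staff VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(("SG14", "Supervisor", 18000, "B003")),  # within every domain
    try_insert(("SX1", "Cleaner", 18000, "B003")),      # position not in StaffPositions
    try_insert(("SX2", "Assistant", 500, "B003")),      # salary outside StaffSalaries
    try_insert(("SX3", "Assistant", 12000, "X003")),    # branchNo outside BranchNumbers
]
print(results)  # [True, False, False, False]
```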
Tuple- A tuple is defined as a record/row of a relation.
The structure of a relation together with a specification of the domains and any other restrictions on possible
values is sometimes called its intension, which is usually fixed unless the meaning of a relation is changed to
include additional attributes. The tuples are called the extension (state) of a relation, which changes over
time.
Relational database- A collection of normalized relations with distinct relation names.
A relational database consists of tables that are appropriately structured. The appropriateness is obtained
through the process of normalization, discussed later.
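Degree- The degree of a relation is the number of attributes it contains.
Cardinality- The cardinality of a relation is the number of tuples it contains. For example, the Staff
relation in Figure 5 has degree 8 and cardinality 6.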
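Degree and cardinality can be read directly from a DBMS. In this sketch (SQLite through sqlite3, an assumption), the degree is taken from cursor.description, which holds one entry per result column, and the cardinality from COUNT(*). Deleting a row changes the extension, and hence the cardinality, but not the degree.

```python
import sqlite3

STAFF_ROWS = [
    ("SL21", "John", "White", "Manager", "M", "1-Oct-45", 30000, "B005"),
    ("SG37", "Ann", "Beech", "Assistant", "F", "10-Nov-60", 12000, "B003"),
    ("SG14", "David", "Ford", "Supervisor", "M", "24-Mar-58", 18000, "B003"),
    ("SA9", "Mary", "Howe", "Assistant", "F", "19-Feb-70", 9000, "B007"),
    ("SG5", "Susan", "Brand", "Manager", "F", "3-Jun-40", 24000, "B003"),
    ("SL41", "Julie", "Lee", "Assistant", "F", "13-Jun-65", 9000, "B005"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo, fName, lName, position, sex, DOB, salary, branchNo)")
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?, ?, ?, ?, ?, ?)", STAFF_ROWS)


def degree(table):
    """Number of attributes: one cursor.description entry per result column."""
    # LIMIT 0 fetches no rows but still reports the column descriptions.
    return len(conn.execute(f"SELECT * FROM {table} LIMIT 0").description)


def cardinality(table):
    """Number of tuples currently stored in the table."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


print(degree("Staff"), cardinality("Staff"))  # 8 6

# The extension changes over time; the intension (and so the degree) does not.
conn.execute("DELETE FROM Staff WHERE staffNo = 'SA9'")
print(degree("Staff"), cardinality("Staff"))  # 8 5
```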
Properties of relational tables
A relational table has the following properties:
The table has a name that is distinct from all other tables in the database.
Each cell of the table contains exactly one value. (For example, it would be wrong to store several
telephone numbers for a single branch in a single cell.) In other words, tables don’t contain repeating
groups of data. A relational table that satisfies this property is said to be normalized, or in first
normal form.
Each column has a distinct name.
The values of a column are all from the same domain.
The order of columns has no significance. In other words, provided a column name is moved along
with the column values, we can interchange columns.
Each record is distinct; there are no duplicate records.
The order of records has no significance, theoretically. (However, in practice, the order may affect
the efficiency of accessing records.)
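Several of these properties follow from treating a relation as a mathematical set. The sketch below, in Python, models a relation as a set of tuples, each tuple being a set of (attribute, value) pairs; the helper name as_relation is hypothetical. Note that SQL tables themselves may hold duplicate rows unless a key or UNIQUE constraint is declared, so this models the theory rather than any particular DBMS.

```python
def as_relation(columns, rows):
    """Model a relation as a set of tuples; each tuple is a set of (attribute, value) pairs."""
    return frozenset(frozenset(zip(columns, row)) for row in rows)


r1 = as_relation(["branchNo", "city"], [("B005", "London"), ("B007", "Aberdeen")])

# The same data with the columns swapped, the rows reordered and one row repeated.
r2 = as_relation(
    ["city", "branchNo"],
    [("Aberdeen", "B007"), ("London", "B005"), ("London", "B005")],
)

print(r1 == r2)  # True: column order, row order and the duplicate make no difference
print(len(r2))   # 2: a set cannot hold the same tuple twice
```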
Relational keys
As stated above, each record in a table must be unique. This means that we need to be able to identify a
column or combination of columns (called relational keys) that provides uniqueness. In this section, we
explain the terminology used for relational keys.
1) Superkey- A column or set of columns that uniquely identifies a record within a table.
Since a superkey may contain additional columns that are not necessary for unique identification, we’re
interested in identifying superkeys that contain only the minimum number of columns necessary for unique
identification.
2) Candidate key- A superkey that contains only the minimum number of columns necessary for
unique identification.
A candidate key for a table has two properties:
Uniqueness- In each record, the values of the candidate key uniquely identify that record.
Irreducibility-No proper subset of the candidate key has the uniqueness property.
Consider the Branch table shown below:
Branch
branchNo  street        city      postCode
B005      22 Deer Rd    London    SW1 4EH
B007      16 Argyll St  Aberdeen  AB2 3SU
B003      163 Main St   Glasgow   G11 9QX
B004      32 Manse Rd   Bristol   BS99 1NZ
B002      56 Clover Dr  London    NW10 6EU
For a given value of city, we would expect to be able to determine several branches (for example, London
has two branches). This column, therefore, cannot be selected as a candidate key. On the other hand, since
each branch has a unique branch number, for a given value of the branch number, branchNo, we can
determine at most one record, so that branchNo is a candidate key. Similarly, as no two branches can be
located in the same postcode, postCode is also a candidate key for the Branch table.
There may be several candidate keys for a table. Consider, for example, a table called Role, which
represents the characters played by actors in videos. The table comprises an actor number (actorNo), a
catalog number (catalogNo), and the name of the character played (character), as shown in Figure 7 below:
Figure 7: Role table
actorNo  catalogNo  character
A1002    207132     James Bond
A3006    330553     Frodo Baggins
A8401    902355     Harry Potter
A2019    634817     Captain Steve Hiller
A2019    445624     Agent J
A7525    445624     Agent K
A4343    781132     Shrek
For a given actor number, actorNo, there may be several different videos the actor has starred in. Similarly,
for a given catalog number, catalogNo, there may be several actors who have starred in this video.
Therefore, actorNo by itself or catalogNo by itself cannot be selected as a candidate key. However, the
combination of actorNo and catalogNo identifies at most one record. When a key consists of more than one
column, it’s called a composite key. In the above table the composite key will be (actorNo, catalogNo).
N.B. Be careful not to look at sample data and try to deduce the candidate key(s), unless you are certain the
sample is representative of the data that will be stored in the table. Generally, an instance of a table cannot
be used to prove that a column or combination of columns is a candidate key. The fact that there are no
duplicates for the values that appear at a particular moment in time does not guarantee that duplicates are not
possible. However, the presence of duplicates in an instance can be used to show that some column
combination is not a candidate key. Identifying a candidate key requires that we know the ‘real world’
meaning of the column(s) involved so that we can decide whether duplicates are possible. Only by using this
semantic information can we be certain that a column combination is a candidate key. For example, from the
data presented in Figure 5 (staff table) above, we may think that a suitable candidate key for the Staff table
would be fName or lName, the employee’s first name and last name respectively. However, although there
is only a single value of John White in this table just now, a new member of staff with the same name could
join the company, which would therefore prevent the choice of name as a candidate key.
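The warning above can be made concrete with a Python sketch over the Role data in Figure 7. The helpers unique_in_instance and irreducible_in_instance are hypothetical. A False result disproves that a column combination is a candidate key; a True result only means the sample has not disproved it. Here character happens to be unique in the sample, but nothing stops a later video from featuring another James Bond, so semantics, not data, must decide.

```python
from itertools import combinations

ROLE_COLUMNS = ("actorNo", "catalogNo", "character")
ROLE_ROWS = [
    ("A1002", "207132", "James Bond"),
    ("A3006", "330553", "Frodo Baggins"),
    ("A8401", "902355", "Harry Potter"),
    ("A2019", "634817", "Captain Steve Hiller"),
    ("A2019", "445624", "Agent J"),
    ("A7525", "445624", "Agent K"),
    ("A4343", "781132", "Shrek"),
]


def unique_in_instance(key, columns=ROLE_COLUMNS, rows=ROLE_ROWS):
    """True if no two rows of THIS instance share values for the columns in `key`."""
    positions = [columns.index(name) for name in key]
    projected = [tuple(row[p] for p in positions) for row in rows]
    return len(projected) == len(set(projected))


def irreducible_in_instance(key, columns=ROLE_COLUMNS, rows=ROLE_ROWS):
    """True if `key` is unique here and no proper subset of it is unique here."""
    if not unique_in_instance(key, columns, rows):
        return False
    return not any(
        unique_in_instance(subset, columns, rows)
        for size in range(1, len(key))
        for subset in combinations(key, size)
    )


print(unique_in_instance(("actorNo",)))                   # False: A2019 appears twice
print(unique_in_instance(("catalogNo",)))                 # False: 445624 appears twice
print(irreducible_in_instance(("actorNo", "catalogNo")))  # True
print(unique_in_instance(("character",)))                 # True, yet not a candidate key
```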
3) Primary key- The candidate key that is selected to identify records uniquely within the table.
Since a table has no duplicate records, it’s always possible to uniquely identify each record. This means that
a table always has a primary key. In the worst case, the entire set of columns could serve as the primary key,
but usually some smaller subset is sufficient to distinguish the records. The candidate keys that are not
selected to be the primary key are called alternate keys. For the Branch table, if we choose branchNo as the
primary key, postCode would then be an alternate key. For the Role table, there is only one candidate key,
comprising actorNo and catalogNo, so these columns would automatically form the primary key.
4) Foreign key- A column, or set of columns, within one table that matches the candidate key of some
(possibly the same) table.
When a column appears in more than one table, its appearance usually represents a relationship between
records of the two tables. For example, in the staff table (Figure 5) the inclusion of branchNo on the Staff
table is quite deliberate. This links branches to the details of staff working there. In the Branch table,
branchNo is the primary key. However, in the Staff table the branchNo column exists to match staff to the
branch they work in. In the Staff table, branchNo is a foreign key. We say that the column branchNo in the
Staff table targets or references the primary key column branchNo in the home table, Branch. In this
situation, the Staff table is also known as the child table and the Branch table as the parent table.
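The keys above can be declared in SQL DDL. This sketch assumes SQLite through sqlite3; the column lists are trimmed to what the example needs, and the second Branch row is a deliberately invalid, hypothetical row used only to show that the alternate key postCode is enforced. PRAGMA foreign_key_list then reads the foreign key back from the catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Branch (
        branchNo TEXT NOT NULL PRIMARY KEY,   -- candidate key chosen as primary key
        street   TEXT,
        city     TEXT,
        postCode TEXT NOT NULL UNIQUE         -- alternate key
    );
    CREATE TABLE Staff (
        staffNo  TEXT NOT NULL PRIMARY KEY,
        lName    TEXT,
        branchNo TEXT REFERENCES Branch (branchNo)  -- foreign key: Staff is the child table
    );
    CREATE TABLE Role (
        actorNo     TEXT NOT NULL,
        catalogNo   TEXT NOT NULL,
        "character" TEXT,
        PRIMARY KEY (actorNo, catalogNo)      -- composite primary key
    );
""")

conn.execute("INSERT INTO Branch VALUES ('B005', '22 Deer Rd', 'London', 'SW1 4EH')")

# Hypothetical bad row: a second branch reusing B005's postcode breaks the alternate key.
try:
    conn.execute("INSERT INTO Branch VALUES ('B002', '56 Clover Dr', 'London', 'SW1 4EH')")
    alternate_key_enforced = False
except sqlite3.IntegrityError:
    alternate_key_enforced = True

# The same actor in two videos is fine; repeating an (actorNo, catalogNo) pair is not.
conn.execute("INSERT INTO Role VALUES ('A2019', '634817', 'Captain Steve Hiller')")
conn.execute("INSERT INTO Role VALUES ('A2019', '445624', 'Agent J')")
try:
    conn.execute("INSERT INTO Role VALUES ('A2019', '445624', 'Agent J')")
    composite_key_enforced = False
except sqlite3.IntegrityError:
    composite_key_enforced = True

# The catalog records which column references which parent table.
fk = conn.execute("PRAGMA foreign_key_list(Staff)").fetchone()
print(alternate_key_enforced, composite_key_enforced)  # True True
print(fk[2], fk[3], fk[4])                              # Branch branchNo branchNo
```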
Representing relational databases
A relational database consists of one or more tables. The common convention for representing a description
of a relational database is to give the name of each table, followed by the column names in parentheses.
Normally, the primary key is underlined. Figure 8 below shows the description of the relational database for
the DreamHome video rental company:
Branch (branchNo, street, city, state, zipCode, mgrStaffNo)
Staff (staffNo, name, position, salary, branchNo)
Video (catalogNo, title, category, dailyRental, price, directorNo)
Director (directorNo, directorName)
Actor (actorNo, actorName)
Role (actorNo, catalogNo, character)
Member (memberNo, fName, lName, address)
Registration (branchNo, memberNo, staffNo, dateJoined)
RentalAgreement (rentalNo, dateOut, dateReturn, memberNo, videoNo)
VideoForRent (videoNo, available, catalogNo, branchNo)
Figure 8: Description of the DreamHome database.
Relational integrity
As mentioned earlier, a relational data model comprises a structural part, a manipulative part and a set of
integrity rules which ensure that the data is accurate. In this section, we discuss the relational integrity rules.
Since every column has an associated domain, there are constraints (called domain constraints) that form
restrictions on the set of values allowed for the columns of tables. In addition, there are two important
integrity rules, which are constraints or restrictions that apply to all instances of the database. The two
principal rules for the relational model are known as entity integrity and referential integrity. Other types of
integrity constraints are multiplicity and general constraints. Before we define these terms, it is necessary to
understand the concept of nulls.
Null- A null represents a value for a column that is currently unknown or is not applicable for this record.
A null can be taken to mean ‘unknown’. It can also mean that a value is not applicable to a particular record,
or it could just mean that no value has yet been supplied. Nulls are a way to deal with incomplete or
exceptional data. However, a null is not the same as a zero numeric value or a text string filled with spaces.
Zeros and spaces are values, but a null represents the absence of a value. Therefore, nulls should be treated
differently from other values. For example, suppose it was possible for a branch to be temporarily without a
manager, perhaps because the manager has recently left and a new manager has not yet been appointed. In
this case, the value for the corresponding mgrStaffNo column would be undefined. Without nulls, it
becomes necessary to introduce false data to represent this state or to add additional columns that may not be
meaningful to the user. In this example, we may try to represent the absence of a manager with the value
‘None at present’. Alternatively, we may add a new column ‘currentManager’ to the Branch table, which
contains a value Y (Yes), if there is a manager, and N (No), otherwise. Both these approaches can be
confusing to anyone using the database.
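The difference between a null and a value can be seen in a short sqlite3 sketch (SQLite assumed). The mgrStaffNo values follow the Staff table, in which B007 has no manager. Any comparison with NULL using = yields NULL (unknown), so the IS NULL test must be used to find missing values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Branch (branchNo TEXT NOT NULL PRIMARY KEY, mgrStaffNo TEXT)")
conn.executemany(
    "INSERT INTO Branch VALUES (?, ?)",
    [("B005", "SL21"), ("B003", "SG5"), ("B007", None)],  # Python None is stored as NULL
)

# NULL is not a value: comparing it with '=' gives NULL (unknown), even against 0 or ''.
comparisons = conn.execute("SELECT NULL = NULL, NULL = 0, NULL = ''").fetchone()
print(comparisons)  # (None, None, None)

# So '= NULL' matches nothing, while IS NULL finds the branch without a manager.
eq_null = conn.execute("SELECT COUNT(*) FROM Branch WHERE mgrStaffNo = NULL").fetchone()[0]
is_null = conn.execute("SELECT branchNo FROM Branch WHERE mgrStaffNo IS NULL").fetchall()
print(eq_null, is_null)  # 0 [('B007',)]
```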
Relational Integrity Rules
1) Entity integrity- This rule applies to the primary keys of base tables/relations. A base table is a
named table whose records are physically stored in the database. The entity integrity rule states that,
“In a base table/relation, no column/attribute of a primary key can be null”.
By definition, a primary key is a minimal identifier that is used to identify records uniquely. This means that
no subset of the primary key is sufficient to provide unique identification of records. If we allow a null for
any part of a primary key, we’re implying that not all the columns are needed to distinguish between
records, which contradicts the definition of the primary key. For example, as branchNo is the primary key of
the Branch table, we should not be able to insert a record into the Branch table with a null for the branchNo
column.
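Entity integrity is usually enforced by the DBMS itself, but it is worth checking how a given product behaves. In SQLite, for historical reasons, a PRIMARY KEY column that is not an INTEGER PRIMARY KEY (in an ordinary rowid table) accepts nulls unless it is also declared NOT NULL. The sketch below, using sqlite3 and a hypothetical LaxBranch table, shows both cases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Entity integrity declared explicitly: the primary key column is NOT NULL.
conn.execute("CREATE TABLE Branch (branchNo TEXT NOT NULL PRIMARY KEY, city TEXT)")
# Hypothetical table relying on PRIMARY KEY alone, which SQLite does not treat as NOT NULL here.
conn.execute("CREATE TABLE LaxBranch (branchNo TEXT PRIMARY KEY, city TEXT)")


def accepts_null_key(table):
    """Try to insert a record whose primary key is null; report whether it was stored."""
    try:
        conn.execute(f"INSERT INTO {table} VALUES (NULL, 'London')")
        return True
    except sqlite3.IntegrityError:
        return False


strict_result = accepts_null_key("Branch")
lax_result = accepts_null_key("LaxBranch")
print(strict_result)  # False: entity integrity enforced
print(lax_result)     # True: the SQLite quirk
```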
2) Referential integrity- This rule applies to foreign keys. It states that, “If a foreign key exists in a
table, either the foreign key value must match a candidate key/primary key value of some record in
its home table or the foreign key value must be wholly null”.
The branchNo in the Staff table (Figure 5) is a foreign key targeting the branchNo column in the home
(parent) table, Branch. It should not be possible to create a staff record with branch number B300, for
example, unless there is already a record for branch number B300 in the Branch table. However, we should
be able to create a new staff record with a null in the branchNo column to allow for the situation where a
new member of staff has joined the company but has not yet been assigned to a particular branch.
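A sketch of referential integrity in SQLite via sqlite3. SQLite checks foreign keys only when PRAGMA foreign_keys is switched on for the connection. The staff numbers SX1 and SX2 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.execute("CREATE TABLE Branch (branchNo TEXT NOT NULL PRIMARY KEY)")
conn.execute(
    "CREATE TABLE Staff (staffNo TEXT NOT NULL PRIMARY KEY, "
    "branchNo TEXT REFERENCES Branch (branchNo))"
)
conn.executemany("INSERT INTO Branch VALUES (?)", [("B003",), ("B005",), ("B007",)])


def add_staff(staff_no, branch_no):
    """Insert a staff record; report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO Staff VALUES (?, ?)", (staff_no, branch_no))
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


outcomes = [
    add_staff("SG37", "B003"),  # accepted: B003 exists in Branch
    add_staff("SX1", "B300"),   # rejected: there is no branch B300
    add_staff("SX2", None),     # accepted: a wholly null foreign key is allowed
]
print(outcomes)  # ['accepted', 'rejected', 'accepted']
```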
3) General Constraints- These are additional rules specified by the users or database administrators of a
database that define or constrain some aspect of the enterprise.
It is possible for users to specify additional constraints that the data must satisfy. For example, if an upper
limit of 20 has been placed upon the number of staff that may work in a branch office, then the user must be
able to specify this general constraint and expect the DBMS to enforce it. In this case, it should not be
possible to add a new staff member to a branch if that branch already has 20 staff members.
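A general constraint like the staff limit can be enforced with a trigger. This sketch assumes SQLite, where RAISE(ABORT, ...) inside a trigger aborts the statement with a constraint error (surfaced by Python as sqlite3.IntegrityError). The trigger name staff_limit and the staff numbers are hypothetical.

```python
import sqlite3

MAX_STAFF_PER_BRANCH = 20  # the upper limit used in the example above

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT NOT NULL PRIMARY KEY, branchNo TEXT)")
conn.execute(f"""
    CREATE TRIGGER staff_limit BEFORE INSERT ON Staff
    WHEN (SELECT COUNT(*) FROM Staff WHERE branchNo = NEW.branchNo) >= {MAX_STAFF_PER_BRANCH}
    BEGIN
        SELECT RAISE(ABORT, 'branch already has the maximum number of staff');
    END
""")

# Fill branch B003 up to the limit (staff numbers are made up for the demo).
for i in range(MAX_STAFF_PER_BRANCH):
    conn.execute("INSERT INTO Staff VALUES (?, 'B003')", (f"S{i:04d}",))

try:
    conn.execute("INSERT INTO Staff VALUES ('S9999', 'B003')")
    limit_enforced = False
except sqlite3.IntegrityError:
    limit_enforced = True

conn.execute("INSERT INTO Staff VALUES ('S8888', 'B005')")  # another branch is unaffected
count_b003 = conn.execute("SELECT COUNT(*) FROM Staff WHERE branchNo = 'B003'").fetchone()[0]
print(limit_enforced, count_b003)  # True 20
```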
4) Multiplicity- This defines the number of occurrences of one entity (such as a branch) that may relate
to a single occurrence of an associated entity (such as a member of staff). For example, one branch
may have one or more staff (a one-to-many relationship), but one staff member cannot belong to more
than one branch.