BBM 6th Sem Database Notes Updated 2024
Database (DB)
Definition: A database is a collection of information that is organized so that it can easily be accessed, managed,
and updated.
Databases and database technology have had a major impact on the growing use of computers. It is fair to say that
databases play a critical role in almost all areas where computers are used, including business, electronic
commerce, social media, engineering, medicine, genetics, law, education, and library science. The word database is
so commonly used that we must begin by defining what a database is. Our initial definition is quite general. A
database is a collection of related data. By data, we mean known facts that can be recorded and that have implicit
meaning. For example, consider the names, telephone numbers, and addresses of the people you know. Nowadays,
this data is typically stored in mobile phones, which have their own simple database software. This data can also be
recorded in an indexed address book or stored on a hard drive, using a personal computer and software such as
Microsoft Access or Excel. This collection of related data with an implicit meaning is a database. The preceding
definition of database is quite general; for example, we may consider the collection of words that make up this
page of text to be related data and hence to constitute a database. However, the common use of the term
database is usually more restricted. A database has the following implicit properties:
■ A database represents some aspect of the real world, sometimes called the mini-world or the universe of
discourse (UoD). Changes to the mini-world are reflected in the database.
■ A database is a logically coherent collection of data with some inherent meaning. A random assortment of data
cannot correctly be referred to as a database.
■ A database is designed, built, and populated with data for a specific purpose. It has an intended group of users
and some preconceived applications in which these users are interested.
In other words, a database has some source from which data is derived, some degree of interaction with events in
the real world, and an audience that is actively interested in its contents. The end users of a database may perform
business transactions (for example, a customer buys a camera) or events may happen (for example, an employee
has a baby) that cause the information in the database to change. In order for a database to be accurate and
reliable at all times, it must be a true reflection of the mini-world that it represents; therefore, changes must be
reflected in the database as soon as possible.
A database may be generated and maintained manually or it may be computerized. For example, a library card
catalog is a database that may be created and maintained manually. A computerized database may be created and
maintained either by a group of application programs written specifically for that task or by a database
management system.
A Database Management System (DBMS) is a computerized system that enables users to create and maintain a
database. The DBMS is a general-purpose software system that facilitates the processes of defining, constructing,
manipulating, and sharing databases among various users and applications. Defining a database involves specifying
the data types, structures, and constraints of the data to be stored in the database.
The database definition or descriptive information is also stored by the DBMS in the form of a database catalog or
dictionary; it is called meta-data. For example, the catalog contains tables describing all the tables in a database
(their names, sizes, and the number of rows in each) and tables describing the columns of each table (their names,
the tables they belong to, and the type of data stored in each column).
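As a small illustration, in a DBMS that exposes the standard information_schema catalog views (MySQL, for example), this meta-data can itself be queried with ordinary SQL; company_db is a hypothetical database name:
-- Tables in the database, with their approximate row counts
SELECT table_name, table_rows
FROM information_schema.tables
WHERE table_schema = 'company_db';
-- Columns of every table and the type of data stored in each
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'company_db';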
Constructing the database is the process of storing the data on some storage medium that is controlled by the
DBMS. Manipulating a database includes functions such as querying the database to retrieve specific data, updating
the database to reflect changes in the mini-world, and generating reports from the data. Sharing a database allows
multiple users and programs to access the database simultaneously.
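A hedged sketch of these three activities, using a hypothetical student table (exact data types vary by product):
-- Defining: specify data types, structure, and constraints
CREATE TABLE student (
    student_id INT PRIMARY KEY,
    name       VARCHAR(50) NOT NULL,
    phone      VARCHAR(15)
);
-- Constructing: store the data on the DBMS-controlled storage medium
INSERT INTO student VALUES (1, 'Sita Rai', '9812345678');
-- Manipulating: query the data and update it to reflect the mini-world
SELECT name, phone FROM student WHERE student_id = 1;
UPDATE student SET phone = '9800000000' WHERE student_id = 1;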
An application program accesses the database by sending queries or requests for data to the DBMS. A query
typically causes some data to be retrieved; a transaction may cause some data to be read and some data to be
written into the database. Other important functions provided by the DBMS include protecting the database
and maintaining it over a long period of time. Protection includes system protection against hardware or software
malfunction (or crashes) and security protection against unauthorized or malicious access. A typical large database
may have a life cycle of many years, so the DBMS must be able to maintain the database system by allowing the
system to evolve as requirements change over time.
Database Management System (DBMS) Vs. File Management System (FMS)
A data management system (DMS) is a combination of computer software, hardware, and information
designed to electronically manipulate data via computer processing. Two types of data management systems
are DBMSs and FMSs. In simple terms, a File Management System (FMS) is a data management system that
allows access to a single file or table at a time. FMSs accommodate flat files that have no relation to other files.
The FMS was the predecessor of the Database Management System (DBMS), which allows access to multiple files
or tables at a time.
File Management System (FMS):
This typical file management system is supported by a conventional operating system. The system stores
permanent records in various files, and it needs different application programs to extract records from, and add
records to, the appropriate files. Before database management systems (DBMSs) came along, organizations usually
stored information in such systems. Keeping organizational information in a file-processing system has a number of
advantages:
• Simpler to use
• Less expensive
• Fits the needs of many small businesses and home users
• Popular FMSs are packaged along with the operating systems of personal computers (e.g., Microsoft
Cardfile and Microsoft Works)
Major Disadvantages:
A file management system also has a number of major disadvantages:
1) Data redundancy and inconsistency:
Because files and application programs are created by different programmers over a long period, the same
information may be duplicated in several files. This redundancy wastes storage space and may lead to data
inconsistency, where the various copies of the same data no longer agree.
2) Difficulty in accessing data:
A conventional file-processing environment does not allow needed data to be retrieved in a convenient and
efficient manner; a new application program has to be written for every new kind of request.
3) Data isolation:
Because data are scattered in various files, and files may be in different formats, writing new application programs
to retrieve the appropriate data is difficult.
4) Integrity problems:
The data values stored in the database must satisfy certain types of consistency constraints. For example, the
balance of a bank account may never fall below a prescribed amount (say, $25). Developers enforce these
constraints in the system by adding appropriate code in the various application programs. However, when new
constraints are added, it is difficult to change the programs to enforce them. The problem is harder when
constraints involve several data items from different files.
5) Atomicity problems:
A computer system, like any other mechanical or electrical device, is subject to failure.
In many applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that existed
prior to the failure. Consider a program to transfer $50 from account A to account B.
If a system failure occurs during the execution of the program, it is possible that the $50 was removed from
account A but was not credited to account B, resulting in an inconsistent database state. This inconsistent state
must be removed. The fund transfer must be atomic; that is, if a failure occurs, any updates already performed must
be undone. It is difficult to ensure atomicity in a conventional file-processing system.
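For contrast, a DBMS makes the transfer atomic with a transaction. A minimal sketch, assuming a hypothetical account table (transaction syntax varies slightly by product):
START TRANSACTION;
UPDATE account SET balance = balance - 50 WHERE account_no = 'A';
UPDATE account SET balance = balance + 50 WHERE account_no = 'B';
COMMIT;
-- If a failure occurs before COMMIT, the DBMS rolls the transaction back,
-- so account A is never debited without account B being credited.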
6) Concurrent-access anomalies:
For the sake of overall performance of the system and faster response, many systems allow multiple users to
update the data simultaneously. In such an environment, interaction of concurrent updates may result in
inconsistent data. Consider bank account A, containing $500. If two customers withdraw funds (say $50 and $100
respectively) from account A at about the same time, the result of the concurrent executions may leave the
account in an incorrect (or inconsistent) state. To guard against this possibility, the system must maintain some
form of supervision. But supervision is difficult to provide because data may be accessed by many different
application programs that have not been coordinated. A DBMS provides locking mechanisms to guard against such
anomalies.
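A hedged sketch of such supervision using row-level locking (SELECT ... FOR UPDATE is supported by MySQL, Oracle, and PostgreSQL; the account table is hypothetical):
START TRANSACTION;
-- Lock the row so the two withdrawals cannot read the balance at the same time
SELECT balance FROM account WHERE account_no = 'A' FOR UPDATE;
UPDATE account SET balance = balance - 50 WHERE account_no = 'A';
COMMIT;  -- releases the lock; the second withdrawal now sees the updated balance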
7) Security Problem:
Not every user of the database system should be able to access all the data. Based on a user's role or privileges,
each user is restricted to the data he or she is allowed to access.
Database Management System (DBMS):
A database consists of logically related data stored in a single repository. It provides the following advantages over
the file management approach:
• Reduction of Redundancies
• Shared Data
• Data Independence
• Improved Integrity
• Efficient Data Access
• Multiple User Interfaces
• Representing complex relationship among data
• Improved Security
• Improved Backup and Recovery
• Support for concurrent transactions
Reduction of Redundancies:
In database approach data can be stored at a single place or with controlled redundancy under DBMS,
which saves space and does not permit inconsistency.
Shared Data:
A DBMS allows the sharing of database under its control by any number of application programs or
users. A database belongs to the entire organization and is shared by all authorized users.
Data Independence:
A database management system separates data descriptions from the data itself, so applications are
not affected by changes in the way the data is stored. This is called Data Independence, where details
of data are not exposed. The DBMS provides an abstract view of the data and hides the storage details.
For example, the interface or window to the data that the DBMS presents to a user may stay the same
even though the internal structure of the data is changed.
Improved Integrity:
Data Integrity refers to validity and consistency of data. Data Integrity means that the data should be
accurate and consistent. This is done by providing some checks or constraints. These are consistency
rules that the database is not permitted to violate. Constraints may apply to data items within a record
or relationships between records. For example, the age of an employee can be between 18 and 70 years
only. While entering the data for the age of an employee, the database should check this.
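Such a rule can be declared once in the schema instead of in every application program. A sketch with a hypothetical employee table:
CREATE TABLE employee (
    emp_id INT PRIMARY KEY,
    name   VARCHAR(50),
    age    INT CHECK (age BETWEEN 18 AND 70)  -- consistency rule enforced by the DBMS
);
-- The DBMS rejects this row because it violates the constraint:
INSERT INTO employee VALUES (1, 'Ram', 75);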
Efficient Data Access:
A DBMS utilizes a variety of techniques to store and retrieve data efficiently, even for unforeseen queries. A
full-featured DBMS should be able to provide services to end users so that they can efficiently retrieve
data almost immediately.
Multiple User Interfaces:
DBMS should be able to provide a variety of interfaces. This includes ─
a. query language for casual users,
b. programming language interfaces for application programmers,
c. forms and codes for parametric users,
d. menu driven interfaces, and
e. natural language interfaces for standalone users; these interfaces are still not available in standard form
in commercial databases.
Representing complex relationship among data:
A database may include varieties of data interrelated to each other in many ways. A DBMS must have the capability
to represent a variety of relationships among the data as well as to retrieve and update related data easily and
efficiently.
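In a relational DBMS, such relationships are declared with foreign keys. A minimal sketch with hypothetical customer and orders tables:
CREATE TABLE customer (
    cust_id INT PRIMARY KEY,
    name    VARCHAR(50)
);
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    cust_id  INT REFERENCES customer(cust_id)  -- each order is related to one customer
);
-- Retrieving related data across the relationship:
SELECT c.name, o.order_id
FROM customer c JOIN orders o ON c.cust_id = o.cust_id;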
Improved Security:
Data is vital to any organization and also confidential. In a shared system where multiple users share the
data, all information should not be shared by all users. For example, the salary of the employees should
not be visible to anyone other than the department dealing in this. Hence, database should be protected
from unauthorized users. This is done by Database Administrator (DBA) by providing the usernames and
passwords only to authorized users as well as granting privileges or the type of operation allowed. This is
done by using a security and authorization subsystem. Only authorized users may use the database, and their access
types can be restricted to retrieval, insert, update, or delete, or any combination of these. For example, the Branch Manager
of any company may have access to all data whereas the Sales Assistant may not have access to salary details.
Improved Backup and Recovery:
A DBMS provides facilities for recovering from hardware and software failures. A backup and recovery
subsystem is responsible for this. If a program fails, it restores the database to the state it was in
before the execution of the program.
Support for concurrent transactions:
A transaction is defined as the unit of work. For example, a bank may be involved in a transaction where
an amount of Rs.5000/- is transferred from account X to account Y. A DBMS also allows multiple
transactions to occur simultaneously.
DBMS ARCHITECTURE (Three Schema Architecture or Three Level of Abstraction)
Database Management Systems are very complex, sophisticated software applications that provide reliable
management of large amounts of data. There are two different ways to look at the architecture of a DBMS:
The logical DBMS architecture and the physical DBMS architecture.
The logical architecture deals with the way data is stored and presented to users, while the physical architecture is
concerned with the software components that make up a DBMS.
The logical architecture describes how data in the database is perceived by users. It is not concerned
with how the data is handled and processed by the DBMS, but only with how it looks. The method of
data storage on the underlying file system is not revealed, and the users can manipulate the data
without worrying about where it is located or how it is actually stored. The physical architecture describes the
software components used to enter and process data, and how these software components are related and
interconnected. This results in the database having different levels of abstraction. There are three levels of
abstraction:
1. The external schema or view level
2. The conceptual schema level
3. The internal or physical schema level
The external or view level:
The external or view level is the highest level of abstraction of the database. It provides a window on the
conceptual view, which allows the user to see only the data of interest to them. The user can be either
an application program or an end user. There can be many external views, as any number of external
schemas can be defined and they can overlap each other. It consists of the definition of the logical records
and relationships in the external view. It also contains the methods for deriving the objects, such as
entities, attributes and relationships, in the external view from the conceptual view.
The conceptual schema level:
The conceptual level describes the structure of the whole database for the community of users. It describes
what data is stored in the database and what relationships exist among those data, while hiding the details
of the physical storage structures.
The internal or physical schema level:
The internal level is the lowest level of abstraction. It describes how the data is physically stored, covering
details such as file organization, storage structures, and access paths.
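As a sketch, an external view can be defined over a hypothetical employee table so that a user sees only the data of interest (a confidential salary column stays hidden):
CREATE VIEW employee_contacts AS
SELECT emp_id, name, phone
FROM employee;   -- users of this view never see the salary column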
Introduction to Data Models
Definition: A data model is a collection of concepts that can be used to describe the structure of a database.
A data model is an abstract model that organizes elements of data and standardizes how they relate to one another
and to properties of the real world. For instance, a data model may specify that a data element representing a car
comprise a number of other elements which in turn represent the color, size and owner of the car. According to
Hoberman (2009), "A data model is a wayfinding tool for both business and IT professionals, which uses a set of
symbols and text to precisely explain a subset of real information to improve communication within the
organization and thereby lead to a more flexible and stable application environment."
The main aim of data models is to support the development of information systems by providing the definition and
format of data. According to West and Fowler (1999) "if this is done consistently across systems then compatibility
of data can be achieved. If the same data structures are used to store and access data then different applications
can share data. The results of this are indicated above. However, systems and interfaces often cost more than they
should, to build, operate, and maintain. They may also constrain the business rather than support it. A major cause
is that the quality of the data models implemented in systems and interfaces is poor".
Three perspectives:
A data model instance may be one of three kinds according to ANSI in 1975:
Conceptual data model : describes the semantics of a domain, being the scope of the model. For example, it may
be a model of the interest area of an organization or industry. This consists of entity classes, representing kinds of
things of significance in the domain, and relationship assertions about associations between pairs of entity classes.
A conceptual schema specifies the kinds of facts or propositions that can be expressed using the model. In that
sense, it defines the allowed expressions in an artificial 'language' with a scope that is limited by the scope of the
model. The number of objects should be very small and focused on key concepts. Try to limit this model to one
page, although for extremely large organizations or complex projects, the model might span two or more pages.
Logical data model : describes the semantics, as represented by a particular data manipulation technology. This
consists of descriptions of tables and columns, object oriented classes, and XML tags, among other things.
Physical data model : describes the physical means by which data are stored. This is concerned with partitions,
CPUs, tablespaces, and the like.
Database model
Flat model
This may not strictly qualify as a data model. The flat (or table) model consists of a single, two-dimensional array of
data elements, where all members of a given column are assumed to be similar values, and all members of a row
are assumed to be related to one another.
Hierarchical model
In this model data is organized into a tree-like structure, implying a single upward link in each record to describe
the nesting, and a sort field to keep the records in a particular order in each same-level list.
Network model
This model organizes data using two fundamental constructs, called records and sets. Records contain fields, and
sets define one-to-many relationships between records: one owner, many members.
Entity Relationship Model:
The ER data model is based on a perception of the real world that consists of a collection of basic objects, called
entities, and relationships among these objects. In an ER model, a database can be modeled as a collection of
entities and relationships among entities. It is one of the conceptual data models and describes the information
used by an organization in a way that is independent of any implementation-level issues and details. The overall
logical structure of a database can be expressed graphically by an E-R diagram.
Relational model
Relational database Model is a collection of relations. A relation is nothing but a table of values. Every row in the
table represents a collection of related data values. These rows in the table denote a real-world entity or
relationship. The table name and column names are helpful to interpret the meaning of values in each row. The
data are represented as a set of relations. In the relational model, data are stored as tables. However, the physical
storage of the data is independent of the way the data are logically organized.
Object-Oriented model
Similar to a relational database model, but objects, classes and inheritance are directly supported in database
schemas and in the query language. The main concepts of an OODBMS are encapsulation, inheritance and
polymorphism. In the object-oriented data model, data and their relationships are contained in a single structure,
which is referred to as an object. Real-world problems are represented as objects with different attributes, and
objects have multiple relationships between them. Basically, it is a combination of object-oriented programming
and the relational database model.
Object:
An object is an abstraction of a real-world entity, or we can say it is an instance of a class. Objects encapsulate
data and code into a single unit, which provides data abstraction by hiding the implementation details from the
user. For example: instances of Publication, Books, Reviewer.
Encapsulation:
Encapsulation is the ability to group data and mechanisms into a single object to provide access protection.
Through this process, pieces of information and details of how an object works are hidden, resulting in data and
function security. Classes interact with each other through methods without the need to know how particular
methods work.
Inheritance:
Inheritance creates a hierarchical relationship between related classes while making parts of code reusable.
Defining new types inherits all the existing class fields and methods plus further extends them. The existing class
is the parent class, while the child class extends the parent.
(Figure: example class hierarchy involving Publication, Books, and Reviewer.)
Polymorphism:
Polymorphism is originally a Greek word that means the ability to take multiple forms. In object-oriented
paradigm, polymorphism implies using operations in different ways, depending upon the instance they are
operating upon. Polymorphism allows objects with different internal structures to have a common external
interface. Polymorphism is particularly effective while implementing inheritance.
Database Languages: DDL, DML
Database languages are used for read, update and store data in a database. There are several such languages that
can be used for this purpose; one of them is SQL (Structured Query Language). A DBMS must provide appropriate
languages and interfaces for each category of users to express database queries and updates. Database Languages
are used to create and maintain a database on a computer. Many database systems, such as Oracle, MySQL,
MS Access, dBase and FoxPro, provide such languages. SQL statements commonly used in Oracle, MS-SQL, MySQL
and MS Access can be categorized as data definition language (DDL) and data manipulation language (DML).
• To drop databases or tables – DROP (the DROP command removes an entire database or table)
Syntax:
DROP DATABASE database_name
DROP TABLE table_name
• To delete all rows in a table – TRUNCATE (TRUNCATE removes all rows from a table but keeps the table structure)
Syntax:
TRUNCATE TABLE table_name
• To grant access rights to users – GRANT (the GRANT command gives users access privileges on database objects.)
Syntax:
GRANT privilege_name
ON object_name
TO {user_name | PUBLIC | role_name}
[WITH GRANT OPTION]
privilege_name is the access right or privilege granted to the user. Some of the access rights are ALL,
EXECUTE, and SELECT.
object_name is the name of a database object like TABLE, VIEW, STORED PROC and SEQUENCE.
user_name is the name of the user to whom an access right is being granted.
PUBLIC is used to grant access rights to all users.
ROLES are a set of privileges grouped together.
WITH GRANT OPTION - allows a user to grant access rights to other users.
• To revoke access from user – REVOKE (REVOKE command removes user access rights or privileges to the
database objects.)
Syntax:
REVOKE privilege_name
ON object_name
FROM {user_name |PUBLIC |role_name}
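For example (the user name and table are hypothetical):
-- Allow user ram to read and update the employee table,
-- and to pass those rights on to other users
GRANT SELECT, UPDATE ON employee TO ram WITH GRANT OPTION;
-- Take the update right back
REVOKE UPDATE ON employee FROM ram;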
Database users:
Database users are the ones who actually use and benefit from the database. There are different types of users
depending on their needs and the way they access the database.
Application Programmers - They are the developers who interact with the database by means of DML queries.
These DML queries are written in the application programs like C, C++, JAVA, Pascal etc. These queries are
converted into object code to communicate with the database. For example, writing a C program to generate the
report of employees who are working in particular department will involve a query to fetch the data from database.
It will include an embedded SQL query in the C Program.
Sophisticated Users - They are database developers, who write SQL queries to select/insert/delete/update data.
They do not use any application or programs to request the database. They directly interact with the database by
means of query language like SQL. These users will be scientists, engineers, analysts who thoroughly study SQL and
DBMS to apply the concepts in their requirement. In short, we can say this category includes designers and
developers of DBMS and SQL.
Specialized Users - These are also sophisticated users, but they write special database application programs. They
are the developers who develop the complex programs to the requirement.
Stand-alone Users - These users have a stand-alone database for their personal use. These kinds of databases
use ready-made database packages that provide menus and graphical interfaces.
Naive Users - These are the users who use existing applications to interact with the database. For example,
online library systems, ticket booking systems, ATMs, etc. have existing applications, and users use them to
interact with the database to fulfill their requests.
Database Administrators (DBA):
One of the main reasons for having the database management system is to have control of both data and
programs accessing that data. The person having such control over the system is called the database
administrator (DBA).
The life cycle of database starts from designing, implementing to administration of it. A database for any kind of
requirement needs to be designed perfectly so that it should work without any issues. Once all the design is
complete, it needs to be installed. Once this step is complete, users start using the database. The database grows
as the data grows in the database. When the database becomes huge, its performance comes down, accessing
the data becomes a challenge, and unused space accumulates, making the database unnecessarily large. This
administration and maintenance of the database is taken care of by the database administrator (DBA).
A DBA has many responsibilities; a well-performing database is in the hands of the DBA.
• Installing and upgrading the DBMS Servers: - DBA is responsible for installing a new DBMS server for the
new projects. He is also responsible for upgrading these servers as new versions come on the
market or as requirements change. If an upgrade of an existing server fails, he should be able
to revert the changes back to the older version, thus keeping the DBMS working. He is also
responsible for applying service packs, hot fixes, and patches to the DBMS servers.
• Design and implementation: - Designing the database and implementing is also DBA’s responsibility. He
should be able to decide proper memory management, file organizations, error handling, log maintenance
etc. for the database.
• Performance tuning: - Since database is huge and it will have lots of tables, data, constraints and indices,
there will be variations in the performance from time to time. Also, because of some designing issues or
data growth, the database will not work as expected. It is the responsibility of the DBA to tune the database
performance, making sure all the queries and programs run in fractions of a second.
• Migrate database servers: - Sometimes, users using oracle would like to shift to SQL server or Netezza. It is
the responsibility of DBA to make sure that migration happens without any failure, and there is no data
loss.
• Backup and Recovery: - Proper backup and recovery programs need to be developed by the DBA and
maintained by him. This is one of the main responsibilities of the DBA. Data/objects should be backed up
regularly so that if there is any crash, they can be recovered without much effort and data loss.
• Security: - DBA is responsible for creating various database users and roles, and giving them different levels
of access rights.
• Documentation: - The DBA should properly document all his activities so that if he quits or a new DBA
comes in, the newcomer can understand the database without much effort. He should document his
installation, backup, recovery, and security procedures, and keep various reports about database
performance.
Types of DBA:
There are different kinds of DBA depending on the responsibility that he owns.
• Administrative DBA - This DBA is mainly concerned with installing, and maintaining DBMS servers. His
prime tasks are installing, backups, recovery, security, replications, memory management, configurations
and tuning. He is mainly responsible for all administrative tasks of a database.
• Development DBA - He is responsible for creating queries and procedures as per the requirements. Basically,
his task is similar to that of any database developer.
• Database Architect - Database architect is responsible for creating and maintaining the users, roles, access
rights, tables, views, constraints and indexes. He is mainly responsible for designing the structure of the
database depending on the requirement. These structures will be used by developers and development
DBA to code.
• Data Warehouse DBA - This DBA maintains the data and procedures coming from various sources into
the data warehouse. These sources can be files or other programs, so data and programs come from
different places. A good DBA keeps the performance and function levels of these sources at the same
pace to make the data warehouse work.
• Application DBA - He acts like a bridge between the application program and the database. He makes sure
all the application programs are optimized to interact with the database, and he ensures that all the
activities, from installing, upgrading, and patching to maintaining, backup, and recovery, work without
any issues.
Transaction Management
A transaction is one or more SQL statements that make up a unit of work performed against the database, and
either all the statements in a transaction are committed as a unit or all the statements are rolled back as a unit.
This unit of work typically satisfies a user request and ensures data integrity. For example, when you use a computer to
transfer money from one bank account to another, the request involves a transaction: updating values stored in the
database for both accounts. For a transaction to be completed and database changes to be made permanent, a
transaction must be completed in its entirety.
For example, transferring an amount of 100 from account A to account B involves the following operations:
Read A;
A = A - 100;
Write A;
Read B;
B = B + 100;
Write B;
A transaction is a logical unit of work that contains one or more SQL statements. A transaction is an atomic unit.
The effects of all the SQL statements in a transaction can be either all committed (applied to the database) or all
rolled back (undone from the database).
A transaction begins with the first executable SQL statement. A transaction ends when it is committed or rolled
back, either explicitly with a COMMIT or ROLLBACK statement or implicitly.
To illustrate the concept of a transaction, consider a banking database. When a bank customer transfers money
from a savings account to a checking account, the transaction can consist of three separate operations:
• Decrement the savings account
• Increment the checking account
• Record the transaction in the transaction journal
SQL must allow for two situations. If all three SQL statements can be performed to maintain the accounts in proper
balance, the effects of the transaction can be applied to the database. However, if a problem such as insufficient
funds, invalid account number, or a hardware failure prevents one or two of the statements in the transaction from
completing, the entire transaction must be rolled back so that the balance of all accounts is correct.
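A hedged sketch of the three operations as one transaction (table and column names are illustrative; transaction syntax varies by product):
START TRANSACTION;
UPDATE savings_account  SET balance = balance - 500 WHERE acc_no = 101;  -- decrement savings
UPDATE checking_account SET balance = balance + 500 WHERE acc_no = 101;  -- increment checking
INSERT INTO txn_journal (acc_no, amount, txn_type)                       -- record the transfer
VALUES (101, 500, 'TRANSFER');
COMMIT;   -- apply all three together; on any problem, ROLLBACK undoes all three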
A SQL statement that runs successfully is different from a committed transaction. Executing successfully means that
a single statement was:
• Parsed
• Found to be a valid SQL construction
• Run without error as an atomic unit. For example, all rows of a multi row update are changed.
However, until the transaction that contains the statement is committed, the transaction can be rolled back, and all
of the changes of the statement can be undone. A statement, rather than a transaction, runs successfully.
Committing means that a user has explicitly or implicitly requested that the changes in the transaction be made
permanent. An explicit request occurs when the user issues a COMMIT statement. An implicit request occurs after
normal termination of an application or completion of a data definition language (DDL) operation. The changes
made by the SQL statement(s) of a transaction become permanent and visible to other users only after that
transaction commits. Queries that are issued after the transaction commits will see the committed changes.
You can name a transaction using the SET TRANSACTION ... NAME statement before you start the transaction. This
makes it easier to monitor long-running transactions and to resolve in-doubt distributed transactions.
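For example, in Oracle (the transaction name here is illustrative):
SET TRANSACTION NAME 'transfer_funds';  -- must be the first statement of the transaction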
In a database, each transaction should maintain the ACID (Atomicity, Consistency, Isolation, Durability) properties
to meet the consistency and integrity requirements of the database.
Transaction Management: ACID (Atomicity, Consistency, Isolation, Durability)
The ACID model is one of the oldest and most important concepts of database theory. It sets forward four goals
that every database management system must strive to achieve: atomicity, consistency, isolation and durability. No
database that fails to meet any of these four goals can be considered reliable. ACID (Atomicity, Consistency,
Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably. In the
context of databases, a single logical operation on the data is called a transaction.
Atomicity:
Atomicity states that database modifications must follow an all or nothing rule. Each transaction is said to be
atomic. If one part of the transaction fails, the entire transaction fails. It is critical that the database management
system maintain the atomic nature of transactions in spite of any DBMS, operating system or hardware failure.
Consistency:
Consistency states that only valid data will be written to the database. If, for some reason, a transaction is executed
that violates the database's consistency rules, the entire transaction will be rolled back and the database will be
restored to a state consistent with those rules. On the other hand, if a transaction successfully executes, it will take
the database from one state that is consistent with the rules to another state that is also consistent with the rules.
Isolation:
It’s safe to say that at any given time on Amazon, there is far more than one transaction occurring on the platform…
In fact, an incredibly huge amount of database transactions are occurring simultaneously! For a database, isolation
refers to the ability to concurrently process multiple transactions in a way that one does not affect another. So,
imagine you and your neighbor are both trying to buy something from the same e-commerce platform at the same
time. There are 10 items for sale: your neighbor wants five and you want six. Isolation means that one of those
transactions would be completed ahead of the other one. In other words, if your neighbor clicked first, they will get
five items, and only five items will be remaining in stock. So you will only get to buy five items. If you clicked first,
you will get the six items you want, and they will only get four. Thus, isolation ensures that eleven items aren't sold
when only ten exist.
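A hedged sketch of one such purchase run at the strictest isolation level (item is a hypothetical table; SET TRANSACTION ISOLATION LEVEL is the SQL standard form, issued before the transaction starts):
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
SELECT stock FROM item WHERE item_id = 7 FOR UPDATE;   -- see the real remaining stock
UPDATE item SET stock = stock - 5 WHERE item_id = 7;   -- sell five of the ten items
COMMIT;   -- the neighbour's transaction now sees only five remaining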
Durability:
Durability ensures that any transaction committed to the database will not be lost. Durability is ensured through
the use of database backups and transaction logs that facilitate the restoration of committed transactions in spite
of any subsequent software or hardware failures.
States of Transaction
In a database, the transaction can be in one of the following states:
Active state
• The active state is the first state of every transaction. In this state, the transaction is being executed.
• For example: Insertion or deletion or updating a record is done here. But all the records are still not saved
to the database.
Partially committed
• In the partially committed state, a transaction has executed its final operation, but the data is still not saved
to the database.
• For example, in a transaction that calculates total marks, the final step of displaying the total is executed in
this state.
Committed
• A transaction is said to be in a committed state if it executes all its operations successfully. In this state, all
the effects are now permanently saved on the database system.
Failed state
• If any of the checks made by the database recovery system fails, then the transaction is said to be in the
failed state.
• In the total-marks example, if the database is not able to execute the query that fetches the marks, the
transaction will fail to execute.
Aborted
• If any of the checks fail and the transaction has reached a failed state then the database recovery system
will make sure that the database is in its previous consistent state. If not then it will abort or roll back the
transaction to bring the database into a consistent state.
• If the transaction fails in the middle of its execution, then all the operations it has already executed are
rolled back to restore the consistent state that existed before the transaction started.
• After aborting the transaction, the database recovery module will select one of the two operations:
✓ Re-start the transaction
✓ Kill the transaction
Serializability in DBMS
A schedule is serializable if it is equivalent to a serial schedule. A concurrent schedule must produce the same
result as if the transactions were executed serially, that is, one after another. A schedule is the sequence in which
actions such as read, write, abort, and commit are performed.
Example
Take two transactions, T1 and T2. If both transactions are performed without interfering with each other, the
schedule is called a serial schedule. (Figure: serial schedule of T1 followed by T2.)
Non-serial schedule − when the operations of transactions T1 and T2 overlap (interleave).
(Figure: non-serial schedule with interleaved operations of T1 and T2.)
Types of serializability
There are two types of serializability −
View serializability
A schedule is view serializable if it is view equivalent to a serial schedule.
A schedule and a serial schedule are view equivalent if they satisfy the following rules −
If T1 reads the initial value of A in one schedule, then T1 also reads the initial value of A in the other.
If T1 reads a value of A written by T2 in one schedule, then T1 also reads the value of A written by T2 in the other.
If T1 performs the final write of A in one schedule, then T1 also performs the final write of A in the other.
Conflict serializability
It orders any conflicting operations in the same way as some serial execution. A pair of operations is said to conflict
if they operate on the same data item and one of them is a write operation.
That means −
Read_i(X) Read_j(X) − non-conflicting read-read operation
Read_i(X) Write_j(X) − conflicting read-write operation
Write_i(X) Read_j(X) − conflicting write-read operation
Write_i(X) Write_j(X) − conflicting write-write operation
Recoverable schedule
A schedule is recoverable if each transaction commits only after every transaction whose written values it has
read has committed. For example, consider a schedule in which transaction T2 reads a value written by
transaction T1, and the commit of T2 occurs after the commit of T1. Hence, it is a recoverable schedule.
Recoverable schedules are further divided into cascadeless and strict schedules −
Cascadeless schedule
In a cascadeless schedule, a transaction reads a value only after the transaction that wrote it has committed.
For example, if the updated value of X is read by transaction T2 only after the commit of transaction T1, the
schedule is a cascadeless schedule.
Strict schedule
In a strict schedule, a transaction reads or writes a value written by another transaction only after that
transaction commits. For example, if transaction T2 reads and writes the value written by transaction T1 only
after T1 commits, the schedule is a strict schedule.
Non-Recoverable Schedule
A schedule that is not recoverable is non-recoverable. If transaction T2 reads a value written by transaction T1
and T2 commits before T1 commits, the schedule is non-recoverable: if T1 then fails and aborts, T2 has already
committed on the basis of a value that never became permanent, and its commit cannot be undone.
Concurrency Control
Concurrency Control is the management procedure that is required for controlling concurrent execution of the
operations that take place on a database.
Concurrent Execution
• In a multi-user system, multiple users can access and use the same database at one time, which is known as
the concurrent execution of the database. It means that the same database is executed simultaneously on
a multi-user system by different users.
• While working with database transactions, multiple users may need to use the database to perform
different operations, and in that case concurrent execution of the database is performed.
• The simultaneous execution should be performed in an interleaved manner, and no operation should affect
the other executing operations; this maintains the consistency of the database. However, the concurrent
execution of transaction operations raises several challenging problems that need to be solved.
Problems with Concurrent Execution
In a database transaction, the two main operations are READ and WRITE. These operations need to be managed
carefully during the concurrent execution of transactions, because if they are interleaved carelessly the data may
become inconsistent. The following problems occur with the concurrent execution of operations:
Lost Update Problems (W - W Conflict)
The problem occurs when two different database transactions perform read/write operations on the same database items
in an interleaved manner (i.e., concurrent execution) in a way that makes the values of the items incorrect, hence making
the database inconsistent. For example:
Consider the below diagram where two transactions TX and TY, are performed on the same account A where the balance of
account A is $300.
1. At time t1, transaction TX reads the value of account A, i.e., $300 (only read).
2. At time t2, transaction TX deducts $50 from account A that becomes $250 (only deducted and not updated/write).
3. Alternately, at time t3, transaction TY reads the value of account A that will be $300 only because TX didn't update the
value yet.
4. At time t4, transaction TY adds $100 to account A that becomes $400 (only added but not updated/write).
5. At time t6, transaction TX writes the value of account A that will be updated as $250 only, as TY didn't update the
value yet.
6. Similarly, at time t7, transaction TY writes the values of account A, so it will write as done at time t4 that will be $400.
It means the value written by TX is lost, i.e., $250 is lost.
Unrepeatable Read Problem (W - R Conflict)
Consider two transactions, TX and TY, performing read/write operations on account A, which has an available
balance of $300. The diagram is shown below:
1. At time t1, transaction TX reads the value from account A, i.e., $300.
2. At time t2, transaction TY reads the value from account A, i.e., $300.
3. At time t3, transaction TY updates the value of account A by adding $100 to the available balance, and then it
becomes $400.
4. At time t4, transaction TY writes the updated value, i.e., $400.
5. After that, at time t5, transaction TX reads the available value of account A, and that will be read as $400.
6. It means that within the same transaction TX, two different values of account A are read: $300 initially and,
after the update made by transaction TY, $400. This is an unrepeatable read and is therefore known as the
unrepeatable read problem.
Concurrency Control Protocols
Different concurrency control protocols offer different trade-offs between the amount of concurrency they allow and
the amount of overhead that they impose. The following are the concurrency control techniques in DBMS:
1. Lock-Based Protocols
2. Two Phase Locking Protocol
3. Timestamp-Based Protocols
4. Validation-Based Protocols
1. Lock-based Protocols
A lock-based protocol in DBMS is a mechanism in which a transaction cannot read or write data until it acquires an
appropriate lock. Lock-based protocols help to eliminate concurrency problems between simultaneous transactions by
locking a data item so that only one transaction uses it at a time.
Lock-based protocols are:
1. Binary Locks:
A binary lock on a data item can be in one of two states: locked or unlocked.
2. Shared Lock (S):
A shared lock is also called a read-only lock. With a shared lock, the data item can be shared between
transactions, because a transaction holding only a shared lock never has permission to update the data item.
For example,
consider a case where two transactions are reading the account balance of a person. The database lets them
both read by placing a shared lock. However, if another transaction wants to update that account's balance,
the shared lock prevents it until the reading process is over.
3. Exclusive Lock (X):
With the Exclusive Lock, a data item can be read as well as written. This is exclusive and can’t be held concurrently
on the same data item. X-lock is requested using lock-x instruction. Transactions may unlock the data item after
finishing the ‘write’ operation.
For example,
when a transaction needs to update the account balance of a person, you can allow this by placing an X
lock on the data item. Therefore, when a second transaction wants to read or write, the exclusive lock prevents this operation.
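In Oracle-style SQL, such locks can also be requested explicitly (most DBMSs acquire them implicitly; the account table is hypothetical):
LOCK TABLE account IN SHARE MODE;      -- shared (read-only) lock
LOCK TABLE account IN EXCLUSIVE MODE;  -- exclusive (read-write) lock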
4. Simplistic Lock Protocol
This type of lock-based protocol allows a transaction to obtain a lock on every object
before beginning the 'write' operation. Transactions may unlock the data item after
finishing the 'write' operation.
5. Pre-claiming Locking
The pre-claiming lock protocol evaluates a transaction's operations and creates a list of
the data items required before execution begins. If all the requested locks are granted,
the transaction executes; when all of its operations are over, it releases all the locks.
2. Two-Phase Locking (2PL) Protocol
In the two-phase locking protocol, each transaction acquires and releases locks in two phases: a growing phase,
in which it may obtain locks but not release any, and a shrinking phase, in which it may release locks but not
obtain any new ones.
Strict Two-Phase Locking (Strict-2PL)
The first phase of Strict-2PL is similar to 2PL: after acquiring all the locks, the transaction continues to execute
normally. The only difference between 2PL and Strict-2PL is that Strict-2PL does not release a lock immediately
after using it. Strict-2PL waits until the whole transaction commits, and then it releases all the locks at one time.
Thus, the Strict-2PL protocol does not have a shrinking phase of lock release.
3. Timestamp-Based Protocols
Timestamp based Protocol in DBMS is an algorithm which uses the System Time or Logical Counter as a timestamp
to serialize the execution of concurrent transactions. The Timestamp-based protocol ensures that every conflicting
read and write operations are executed in a timestamp order.
The older transaction is always given priority in this method. It uses system time to determine the time stamp of
the transaction. This is the most commonly used concurrency protocol.
Lock-based protocols manage the order between conflicting transactions at the time they execute, whereas
timestamp-based protocols resolve conflicts as soon as an operation is created.
Example:
Suppose there are three transactions T1, T2, and T3.
T1 has entered the system at time 0010
T2 has entered the system at 0020
T3 has entered the system at 0030
Priority will be given to transaction T1, then transaction T2 and lastly Transaction T3.
4. Validation Based Protocol
The validation-based protocol in DBMS, also known as the Optimistic Concurrency Control technique, is a method
that avoids locking while a transaction executes. In this protocol, local copies of the transaction data are updated
rather than the data itself, which results in less interference during the execution of the transaction.
The Validation based Protocol is performed in the following three phases:
Read Phase
Validation Phase
Write Phase
Read Phase
In the Read Phase, the data values from the database can be read by a transaction but the write operation or
updates are only applied to the local data copies, not the actual database.
Validation Phase
In Validation Phase, the data is checked to ensure that there is no violation of serializability while applying the
transaction updates to the database.
Write Phase
In the Write Phase, the updates are applied to the database if the validation is successful; otherwise, the updates
are not applied, and the transaction is rolled back.
Database Recovery Techniques
As in any computer system, failures can happen in a database system, yet the data stored in the database should
be available whenever it is needed. Database recovery means restoring the data when it gets deleted, corrupted,
or damaged accidentally. Atomicity must be preserved: either a transaction completes and its effects are reflected
permanently in the database, or it must not affect the database at all. So, database recovery and database
recovery techniques are a must in a DBMS.
Types of failure-
• A computer failure (system crash). A hardware, software, or network error occurs in the computer system.
• A transaction or system error. Some operation in the transaction may cause it to fail, such as integer
overflow or division by zero.
• Disk failure. Some disk blocks may lose their data because of a read or write malfunction.
• Physical problems and catastrophes. This refers to an endless list of problems, including power failure,
fire, flood, or earthquake.
Recovery:
• The database is restored to the most recent consistent state just before the time of failure.
• It takes care of atomicity and durability properties of a transaction.
System log:
The system must keep information about the changes that were applied to data items by the various transactions.
This information is typically kept in the system log.
Log buffer: stored in volatile memory (main memory).
Log file: stored in non-volatile memory (disk).
Recovery Based on Deferred Update: Also called No-UNDO/REDO approach.
• The idea is to postpone any actual updates to the database on disk until the transaction reaches its commit
point.
• During transaction execution, the updates are recorded only in the log and force-written to disk only after the
transaction reaches its commit point.
• If a transaction fails before reaching its commit point, there is no need to undo any operations because the
transaction has not affected the database on disk in any way
Shadow Paging:
• It maintains two copies of database: Current directory and Shadow directory.
• When a transaction begins executing, the current directory—whose entries point to the most recent or
current database state—is copied into a shadow directory.
• The current directory is used by the transaction.
• During transaction execution, the shadow directory is never modified.
• When the transaction commits, the current directory becomes the new committed directory, replacing the
shadow directory.
• To recover from a failure, the shadow directory is reinstated as the current directory, restoring the state
before the transaction.
• It is a NO-UNDO/NO-REDO recovery technique.
Data Governance
Data governance is a broad category that includes internal policies and procedures controlling the management of
data. Data governance is the process of organizing, securing, managing, and presenting data using methods and
technologies that ensure it remains correct, consistent, and accessible to verified users.
Data governance is the process of:
Organizing — identifying all your data sources and getting all your data in one place.
Securing — making sure all your data is compliant with data privacy regulations and internal company policies.
Managing and presenting data — after you’ve nailed down your organization’s data, you need to decide how you
present this data to your team.
Using methods and technologies — like modern data governance platforms — that ensure the data remains correct,
consistent, and accessible to the people in your organization who have permission to access it; in short,
verified users.
Importance of Data Governance
Data is arguably the most important asset that organizations have. Data governance helps to ensure that data is
usable, accessible and protected. Effective data governance leads to better data analytics, which in turn leads to
better decision making and improved operations support. Further, it helps to avoid data inconsistencies or errors in
data, which lead to integrity issues, poor decision making, and a variety of organizational problems.
Data Governance can be described in three core objectives of access, literacy, and quality.
• Access includes all of the company’s data so that data is easily discoverable and protected for compliance.
• Since everyone has access to the data, they need to understand it, so data literacy is a high priority.
• Data quality can be monitored and users report data quality issues which can then be fixed, increasing the
data’s trustworthiness in the data’s lifecycle.
Data governance drivers
For effective data governance, it is vital to understand what factors are driving data governance to a point of
urgency.
Master Data Management
When we have multiple applications doing different business functions, they always require common data, like
customers, employees, chart of accounts, and materials. To overcome challenges in the enterprise, the IT
department generally ensures that one application is the master of a specific data element, while others are only
used to deal with customer prospects. It is essential to integrate these applications and have a single sign-on.
Integrations
With multiple applications doing many business functions, it is of great importance to integrate them. For example,
the CRM system may need to combine with the financial system to complete the purchase and invoicing process.
As data flows during the integration, it requires management. These integrations are mostly custom written and
need support from IT staff. This issue can also be referred to as a data governance problem regarding business
rules. Generally, a support team is incorporated. Its job is to manage these integrations.
BI & Analytics
As BI and Analytics are becoming more prominent, every business unit is hiring their own data scientists. With the
growing abundance of data, it is becoming challenging for them to provide access to data and knowledge about
that data. Hence, most of the business units are demanding automated data governance.
Data Privacy and Financial Regulations
As companies store various kinds of data in different databases, they need to manage the data and all the problems
associated with it because of compliance. These regulations have strict rules that deal with how organizations can
capture and store customer data.
Data governance initiatives
Data governance initiatives improve the quality of data by assigning a team responsible for the data's accuracy,
completeness, consistency, timeliness, validity, and uniqueness. This team usually consists of executive leadership,
project management, line-of-business managers, and data stewards. The team usually employs some form of
methodology for tracking and improving enterprise data, and tools for data mapping, profiling, cleansing, and
monitoring.
Data governance initiatives may be aimed at achieving a number of objectives including offering better visibility to
internal and external customers (such as supply chain management). Many data governance initiatives are also
inspired by past attempts to fix information quality at the departmental level, leading to incongruent and
redundant data quality processes. Most large companies have many applications and databases that can't easily
share information. Therefore, knowledge workers within large organizations often don't have access to the data
they need to best do their jobs. When they do have access to the data, the data quality may be poor. By setting up
a data governance practice or corporate data authority (individual or area responsible for determining how to
proceed, in the best interest of the business, when a data issue arises), these problems can be mitigated.
Database Management
Database Management allows a person to organize, store, and retrieve data from a computer. Database
Management can also describe the data storage, operations, and security practices of a database administrator
(DBA) throughout the life cycle of the data. Managing a database involves designing, implementing, and supporting
stored data to maximize its value.
Data maintenance
Database Maintenance is a term we use to describe a set of tasks that are all run with the intention to improve
your database. There are routines meant to help performance, free up disk space, check for data errors, check for
hardware faults, update internal statistics, and many other obscure (but important) things.
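As an illustration, a few such maintenance tasks can be run directly in SQL Server; this is a minimal sketch, and the database and table names (SchoolDB, Student) are hypothetical:

DBCC CHECKDB ('SchoolDB');            -- check the database for corruption and consistency errors
UPDATE STATISTICS Student;            -- refresh the optimizer's internal statistics for a table
ALTER INDEX ALL ON Student REBUILD;   -- rebuild fragmented indexes to reclaim space and speed up access
GO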
Data quality rules are an integral component of data governance, which is the process of developing and
establishing a defined, agreed-upon set of rules and standards by which all data across an organization is governed.
Effective data governance should harmonize data from various data sources, create and monitor data usage
policies, and eliminate inconsistencies and inaccuracies that would otherwise negatively impact data analytics
accuracy and regulatory compliance.
Data Quality Management: Data Cleansing, Data Integrity, Data Enrichment, Data Quality
There are six main dimensions of data quality: accuracy, completeness, consistency, validity, uniqueness, and
timeliness.
Accuracy: The data should reflect actual, real-world scenarios; the measure of accuracy can be confirmed with a
verifiable source.
Completeness: Completeness is a measure of the data’s ability to effectively deliver all the required values that
are available.
Consistency: Data consistency refers to the uniformity of data as it moves across networks and applications. The same data values stored in different locations should not conflict with one another.
Validity: Data should be collected according to defined business rules and parameters, and should conform to
the right format and fall within the right range.
Uniqueness: Uniqueness ensures there are no duplications or overlapping of values across all data sets. Data
cleansing and deduplication can help remedy a low uniqueness score.
Timeliness: Timely data is data that is available when it is required. Data may be updated in real time to ensure
that it is readily available and accessible.
Data cleansing
Data cleaning is a process by which inaccurate, poorly formatted, or otherwise messy data is organized and
corrected. Data cleansing or data cleaning is the process of identifying and correcting corrupt, incomplete,
duplicated, incorrect, and irrelevant data from a reference set, table, or database.
Data issues typically arise through user entry errors, incomplete data capture, non-standard formats, and data
integration issues.
Data cleansing is an essential process for preparing data for further use whether in operational processes or
downstream analysis. It can be performed best with data quality tools. These tools function in a variety of ways,
from correcting simple typographical errors to validating values against a known true reference set.
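For example, one simple quality check that many of these tools automate is detecting duplicate rows with a grouping query; this sketch assumes a Customer table with Name and Phone columns:

-- List name/phone combinations that occur more than once (candidates for deduplication)
SELECT Name, Phone, COUNT(*) AS Occurrences
FROM Customer
GROUP BY Name, Phone
HAVING COUNT(*) > 1;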
Data enrichment
Data enrichment is the process of taking external data from third-party sources and adding it to your existing database. The goal is to get further insights from your data to improve your marketing or sales approach.
Sometimes you can get the information from databases—the more data you have, the more you can understand
patterns and extract information like first names and genders.
Other times you can get this data from scraping company websites and platforms like LinkedIn.
The data enrichment process improves on the data you already have. This process, sometimes referred to as
‘appending’, enables you to fill in gaps in your database, such as gender, company, age, and so on.
Data security
Data security is critical to public and private sector organizations for a variety of reasons. First, there’s the legal and
moral obligation that companies have to protect their user and customer data from falling into the wrong hands.
Financial firms, for example, may be subject to the Payment Card Industry Data Security Standard (PCI DSS) that
forces companies to take all reasonable measures to protect user data.
Then there’s the reputational risk of a data breach or hack. If you don’t take data security seriously, your reputation
can be permanently damaged in the event of a publicized, high-profile breach or hack. Not to mention the financial
and logistical consequences if a data breach occurs. You’ll need to spend time and money to assess and repair the
damage, as well as determine which business processes failed and what needs to be improved.
Types of Data Security
Access Controls
This type of data security measure involves limiting both physical and digital access to critical systems and data. It includes making sure all computers and devices are protected with mandatory login credentials, and that physical spaces can be entered only by authorized personnel.
Authentication
Similar to access controls, authentication refers specifically to accurately identifying users before they are given access to data. It usually involves things like passwords, PINs, security tokens, swipe cards, or biometrics.
Data Erasure
You’ll want to dispose of data properly and on a regular basis. Data erasure employs software to completely
overwrite data on any storage device and is more secure than standard data wiping. Data erasure verifies that the
data is unrecoverable and therefore won’t fall into the wrong hands.
Data Masking
By using data masking software, information is hidden by obscuring letters and numbers with proxy characters. This
effectively masks key information even if an unauthorized party gains access to it. The data changes back to its
original form only when an authorized user receives it.
Encryption
A computer algorithm transforms text characters into an unreadable format via encryption keys. Only authorized
users with the proper corresponding keys can unlock and access the information. Everything from files and a
database to email communications can — and should — be encrypted to some extent.
Database System Structure
DBMS (Database Management System) acts as an interface between the user and the database. The user requests
the DBMS to perform various operations (insert, delete, update and retrieval) on the database. The components of
DBMS perform these requested operations on the database and provide necessary data to the users. The various
components of DBMS are:
Data Definition Language Compiler:
The DDL Compiler converts the data definition statements into a set of tables.
These tables contain the metadata concerning the database and are in a form that can be used by other
components of DBMS.
• To create the database instance – CREATE (creates a new database or database object)
• To alter the structure of the database – ALTER (the ALTER TABLE statement is used to add, delete, or modify columns in an existing table)
• To drop database instances – DROP (the DROP command removes a table from the database)
• To remove all the data in a table – TRUNCATE (TRUNCATE removes all rows from a table but keeps the table structure)
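A minimal sketch of these four DDL statements (the College database and Student table are illustrative):

CREATE DATABASE College;
GO
CREATE TABLE Student (RegNo INT PRIMARY KEY, Name VARCHAR(50));
ALTER TABLE Student ADD Address VARCHAR(100);   -- add a new column to the table
TRUNCATE TABLE Student;                         -- remove all rows but keep the table
DROP TABLE Student;                             -- remove the table itself
GO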
Data Manager:
The data manager is the central software component of the DBMS. It is sometimes referred to as the database
control system. One of the functions of the data manager is to convert operations in the user’s queries, coming directly via the query processor or indirectly via an application program, from the user’s logical view to a physical file system. The data manager is responsible for interfacing with the file system. In addition, the tasks of enforcing
constraints to maintain the consistency and integrity of the data, as well as its security, are also performed by the
data manager. Synchronizing the simultaneous operations performed by concurrent users is under the control of
the data manager. It is also entrusted with the backup and recovery operations.
File Manager:
Responsibility for the structure of the files and managing the file space rests with the file manager. It is also
responsible for locating the block containing the required record, requesting this block from the disk manager, and
transmitting the required record to the data manager. The file manager can be implemented using an interface to
the existing file subsystem provided by the operating system of the host computer or it can include a file subsystem
written especially for DBMS.
Disk Manager:
The disk manager is part of the operating system of the host computer and all physical input and output operations
are performed by it. The disk manager transfers the block or page requested by the file manager so that the latter
need not be concerned with the physical characteristics of the underlying storage media.
Query Processor:
The database user retrieves data by formulating a query in the data manipulation language provided with the
database. The query processor is used to interpret the online user’s query and convert it into an efficient series of
operations in a form capable of being sent to the data manager for execution. The query processor uses the data
dictionary to find the structure of the relevant portion of the database and uses the information in modifying the
query and preparing an optimal plan to access the database.
Database Schemas:
Movies(title, year, length, genre, studioName, producerC#)
StarsIn(movieTitle, movieYear, starName)
MovieStar(name, address, gender, birthdate)
In a data model, it is important to distinguish between the description of the database and the database itself. The
description of a database is called the database schema, which is specified during database design and is not
expected to change frequently. Most data models have certain conventions for displaying schemas as diagrams. A
displayed schema is called a schema diagram. Diagram displays the structure of each record type but not the actual
instances of records.
Examples of Schema:
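For instance, the Movies schema above could be declared in SQL as follows; this is a sketch, and the data types and the choice of (title, year) as the primary key are assumptions:

CREATE TABLE Movies (
    title      VARCHAR(100),
    year       INT,
    length     INT,
    genre      VARCHAR(30),
    studioName VARCHAR(50),
    producerC# INT,
    PRIMARY KEY (title, year)   -- assumed key: the same title may recur in different years
);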
Data Files:
Data files contain the data portion of the database.
Data Dictionary:
A Data Dictionary stores information about the structure of the database. A comprehensive data
dictionary would provide the definition of data items, how they fit into the data structure and how they
relate to other entities in the database. In DBMS, the data dictionary stores the information concerning
the external, conceptual, and internal levels of the database. In the case of a table, the data dictionary provides information about:
• Its name
• Security information like who is the owner of the table, when was it created, and when it was last accessed.
• Physical information like where is the data stored for this table
• Structural information like its attribute names and its data types, constraints and indexes.
• The definitions of all database objects like tables, views, constraints, indexes, clusters, synonyms,
sequences, procedures, functions, packages, triggers etc.
• It stores the information about how much space is allocated for each object and how much space has been
used by them
• Any default values that a column can have are stored
• Database user names - schemas
• Access rights for schemas on each of the objects
• Last updated and last accessed information about the object
• Any other database information
All of this information is stored in the form of tables in the data dictionary.
Application architectures
Today's database professionals face several options when considering which architecture to employ to address the various needs of their employers and/or clients. The following text provides an overview of three main categories of database architectures and their sub-categories, and offers some insight into the benefits of each.
Application Logic
Database architectures can be distinguished by examining the way application logic is distributed throughout the
system. Application logic consists of three components: Presentation Logic, Processing Logic, and Storage Logic.
The presentation logic component is responsible for formatting and presenting data on the user's screen. The processing logic component handles data processing logic, business rules logic, and data management logic. Finally, the storage logic component is responsible for the storage and retrieval of data from physical devices such as a hard drive or RAM.
By determining which tier(s) these components are processed on, we can get a good idea of what type of architecture and subtype we are dealing with.
Yet another way one-tier architectures have appeared is in mainframe computing. In this outdated model, large machines provide directly connected, unintelligent terminals with the means to access, view and manipulate data. Even though this may look like a client-server system, all of the processing power (for both data and applications) occurs on a single machine, so we have a one-tier architecture.
One-tier architectures can be beneficial when we are dealing with data that is relevant to a single user (or small
number of users) and we have a relatively small amount of data. They are somewhat inexpensive to deploy and
maintain.
N-Tier Client/Server Architectures
Most n-tier database architectures exist in a three-tier configuration. In this architecture the client/server model
expands to include a middle tier (business tier), which is an application server that houses the business logic. This
middle tier relieves the client application(s) and database server of some of their processing duties by translating
client calls into database queries and translating data from the database into client data in return. Consequently,
the client and server never talk directly to one another.
A variation of the n-tier architecture is the web-based n-tier application. These systems combine the scalability
benefits of n-tier client/server systems with the rich user interface of web-based systems.
Because the middle tier in a three-tier architecture contains the business logic, there is greatly increased scalability
and isolation of the business logic, as well as added flexibility in the choice of database vendors.
Relational Data Models
A data model is a conceptual representation of the data structures (tables) required for a database and is very powerful in expressing and communicating business requirements. A data model visually represents the nature of the data, the business rules governing the data, and how it will be organized in the database. A data model comprises two parts: logical design and physical design. Data models are created in either a top-down or a bottom-up approach. In the top-down approach, data models are created by understanding and analyzing the business requirements. In the bottom-up approach, data models are created from existing databases that have no data models. IDEF1X is a common notation used in creating data models since it is more descriptive.
A data model helps the functional and technical teams in designing the database. The functional team normally consists of one or more business analysts, business managers, subject matter experts, end users, etc., and the technical team consists of one or more programmers, DBAs, etc. Data modelers are responsible for designing the data model; they communicate with the functional team to get the business requirements and with the technical team to implement the database.
Logical and Physical Data Modeling Objects:
In a data model, there is one main subject area, which comprises all objects present in all subject areas, and other subject areas based on their processes or business domains. Each subject area contains the objects relevant to it, and subject areas are very useful for understanding the data model and for generating reports and printouts based on the main subject area or the other subject areas. In a telecommunication data model, there may be several subject areas like Service Request, Service Order, Ticketing and the Main Subject Area. In a mortgage data model, there may be several subject areas like borrower, loan, underwriting and the main subject area. Usually subject areas are created around the main business processes. In telecommunications (telephone service subscription by a customer), a service request is the process of receiving a request from the customer through phone, email, fax, etc. A service order is the next process, which approves the service request and provides the telephone line subscription to the customer. Ticketing is the process by which complaints are gathered from the customer and problems are resolved.
For example:

Logical data model object: Entity. An entity is the business representation of a table present in a database. Example: COUNTRY.
Physical data model object: Table. A table is comprised of rows and columns, which store data in a database. Example: CNTRY.

Logical data model object: Attribute. An attribute is the business representation of a column present in a database. Example: Country Code, Country Name.
Physical data model object: Column. A column is a data item that stores the data for that particular item. Example: CNTRYCODE, CNTRYNAME.
Constraints:
Database Constraints in DBMS:
Database constraints are restrictions on the contents of the database or on database operations. It is a condition
specified on a database schema that restricts the data to be inserted in an instance of the database.
Constraints in the database provide a way to guarantee that:
✓ the values of individual columns are valid.
✓ in a table, rows have a valid primary key or unique key values.
✓ in a dependent table, rows have valid foreign key values that reference rows in a parent table.
➢ Domain Constraints:
A domain is a set of permissible values that can be given to an attribute. So every attribute in a table has a specific domain, and values cannot be assigned to an attribute outside its domain. Domain constraints specify the set of values an attribute can take: the value of each attribute X must be an atomic value from the domain of X. The data types associated with domains include integer, character, string, date, time, currency, etc. An attribute value must come from the corresponding domain.
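A minimal sketch of domain constraints, where each column is restricted to atomic values from its declared domain (the table and column names are illustrative):

CREATE TABLE Employee (
    EmpId    INT,                                   -- domain: integers
    Name     VARCHAR(50),                           -- domain: character strings
    Salary   DECIMAL(10,2),                         -- domain: currency values
    HireDate DATE,                                  -- domain: dates
    Gender   CHAR(1) CHECK (Gender IN ('M', 'F'))   -- domain narrowed further by a CHECK
);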
➢ Key Constraints:
Keys are attributes or sets of attributes that uniquely identify an entity within its entity set. An entity set E can have multiple keys, out of which one key will be designated as the primary key. A primary key must have unique, non-null values in the relational table. In a subclass hierarchy, only the root entity set has a key or primary key, and that primary key must serve as the key for all entities in the hierarchy.
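For example, a key constraint can be declared by marking one candidate key as the primary key; this sketch reuses the RegistrationNo and RollNo attributes discussed later in this chapter:

CREATE TABLE Student (
    RegistrationNo INT PRIMARY KEY,   -- the chosen primary key: unique and not null
    RollNo         INT UNIQUE,        -- another candidate key, kept unique
    Name           VARCHAR(50)
);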
➢ Integrity Rule 2 (Referential Integrity Rule or Constraint):
Integrity Rule 2 is also called the referential integrity constraint. This rule states that if a foreign key in Table 1 refers to the primary key of Table 2, then every value of the foreign key in Table 1 must either be null or be available in Table 2.
Let the table in which the foreign key is defined be called the foreign table or detail table (Table 1 above), and the table that defines the primary key referenced by the foreign key be called the master table or primary table (Table 2 above). Then the following properties must hold:
• Records cannot be inserted into the foreign table if corresponding records do not exist in the master table.
• Records of the master (primary) table cannot be deleted or updated if corresponding records in the detail table exist.
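A sketch of how these properties are enforced in SQL, with Department as the master table and Employee as the foreign (detail) table; the names are illustrative:

CREATE TABLE Department (                      -- master (primary) table
    DeptId INT PRIMARY KEY
);
CREATE TABLE Employee (                        -- foreign (detail) table
    EmpId  INT PRIMARY KEY,
    DeptId INT REFERENCES Department(DeptId)   -- must be NULL or match an existing DeptId
);
-- Fails: department 99 does not exist in the master table
INSERT INTO Employee VALUES (1, 99);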
➢ General Constraints:
General constraints are the arbitrary constraints that should hold in the database. Domain Constraints, Key
Constraints, Tuple Uniqueness Constraints, Single Value Constraints, Integrity Rule 1 (Entity Integrity) and 2
(Referential Integrity Constraints) are considered to be a fundamental part of the relational data model. However,
sometimes it is necessary to specify more general constraints like the CHECK Constraints or the Range Constraints
etc.
Check constraints ensure that only specific values are allowed in certain columns. For example, if there is a need to allow only three values for a colour, such as ‘Bakers Chocolate’, ‘Glistening Grey’ and ‘Superior White’, then we can apply a check constraint. Any other value, like ‘GREEN’, would yield an error.
Range constraints are implemented with BETWEEN and NOT BETWEEN. For example, if it is a requirement that student ages be within 16 to 35, we can apply a range constraint for it.
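The colour and age rules just described might look like this (a sketch; the table names are illustrative):

CREATE TABLE Product (
    Color VARCHAR(20)
        CHECK (Color IN ('Bakers Chocolate', 'Glistening Grey', 'Superior White'))
);
CREATE TABLE Applicant (
    Age INT CHECK (Age BETWEEN 16 AND 35)   -- range constraint on student age
);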
Database Keys:
A key is an attribute or a set of attributes in a relation that identifies a tuple in the relation. Keys are defined in a table to access or sequence the stored data quickly and smoothly. They are also used to create relationships between different tables. Keys are a very important part of a relational database: they are used to establish and identify relationships between tables, and they ensure that each record within a table can be uniquely identified by a combination of one or more fields.
Types of Keys:
Following are the different types of keys.
• Primary Key
• Composite key
• Candidate key
• Super key
• Alternate Key
• Foreign key
✓ Primary Key
A primary key is a candidate key that is selected by the database designer to identify tuples uniquely in a relation. A
relation may contain many candidate keys. When the designer selects one of them to identify a tuple in the
relation, it becomes a primary key. It means that if there is only one candidate key, it will be automatically selected
as primary key.
✓ Composite key
A key that consists of two or more attributes that together uniquely identify an entity occurrence is called a composite key. No single attribute that makes up a composite key is a key in its own right. If the primary key is defined on more than one attribute, it is a composite key.
✓ Candidate key
A candidate key is a super key that contains no extra attribute; it consists of the minimum possible attributes. A super key like {RegistrationNo, Name} contains an extra field, Name. It can be used to identify a tuple uniquely in the relation, but it does not consist of the minimum possible attributes, as RegistrationNo alone can identify a tuple in the relation. It means that {RegistrationNo, Name} is a super key but not a candidate key, because it contains an extra field. On the other hand, RegistrationNo is both a super key and a candidate key.
✓ Super key:
A super key is an attribute or combination of attributes in a relation that identifies a tuple uniquely within the
relation. A super key is the most general type of key. For example, suppose a relation STUDENT consists of the attributes RegistrationNo, Name, FatherName, Class and Address. The only one of these attributes that can uniquely identify a tuple in the relation is RegistrationNo. The Name attribute cannot identify a tuple because two or more students may have the same name; similarly FatherName, Class and Address cannot be used to identify a tuple. It means that RegistrationNo is a super key for the relation. Any attribute or set of attributes combined with the super key RegistrationNo is also a super key. A combination of two attributes {RegistrationNo, Name} is therefore a super key, since this combination can also be used to identify a tuple in the relation. Similarly {RegistrationNo, Class} and {RegistrationNo, Name, Class} are super keys.
✓ Alternate Key
The candidate keys that are not selected as primary key are known as alternate keys. Suppose STUDENT relation
contains different attributes such as RegNo, RollNo, Name and Class. The attributes RegNo and RollNo can be used
to identify each student in the table. If RegNo is selected as the primary key, then the RollNo attribute is known as an alternate key.
✓ Foreign key
A foreign key is an attribute or set of attributes in a relation whose values match a primary key in another relation.
The relation in which foreign key is created is known as Dependent Table or Child Table. The relation to which the
foreign key refers is known as Parent Table. The key connects to another relation when a relationship is established
between two relations. A relation may contain more than one foreign key.
A foreign key is generally a primary key from one table that appears as a field in another where the first table has a
relationship to the second. In other words, if we had a table A with a primary key X that linked to a table B where X
was a field in B, then X would be a foreign key in B.
An example might be a student table that contains the course_id of the course the student is attending. Another table lists the courses on offer, with course_id as its primary key. The two tables are linked through course_id, and as such course_id would be a foreign key in the student table.
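Declared in SQL, the student/course example might look like this sketch (the column types are assumed):

CREATE TABLE Course (
    course_id INT PRIMARY KEY,                    -- primary key of the parent table
    title     VARCHAR(50)
);
CREATE TABLE Student (
    student_id INT PRIMARY KEY,
    course_id  INT REFERENCES Course(course_id)   -- foreign key in the child table
);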
Prime and non-prime attributes
Attributes that are part of any candidate key of a relation are called prime attributes; the others are non-prime attributes. For example, STUD_NO in the STUDENT relation is a prime attribute, and the others are non-prime attributes.
Closure Of Functional Dependency
The closure of a functional dependency means the complete set of all possible attributes that can be functionally derived from a given functional dependency using the inference rules known as Armstrong’s axioms.
If “F” is a functional dependency, then the closure of the functional dependency can be denoted using “{F}+”.
There are three steps to calculate closure of functional dependency. These are:
Step-1 : Add the attributes which are present on Left Hand Side in the original functional dependency.
Step-2 : Now, add the attributes present on the Right Hand Side of the functional dependency.
Step-3 : With the help of attributes present on Right Hand Side, check the other attributes that can be derived from
the other given functional dependencies. Repeat this process until all the possible attributes which can be derived
are added in the closure.
The Algorithm
✓ The procedure shown in the previous example can be generalized to an algorithm. Assume we are given
the set of functional dependencies FD and a set of attributes X. The algorithm is as follows:
✓ Add the attributes contained in the attribute set X to the result set X+.
✓ Add the attributes to the result set X+ which can be functionally determined from the attributes already
contained in the result set.
✓ Repeat step 2 until no more attributes can be added to the result set X+.
Example 1
We are given the relation R(A, B, C, D, E). This means that the table R has five columns: A, B, C, D, and E. We
are also given the set of functional dependencies: {A->B, B->C, C->D, D->E}. What is {A}+?
• First, we add A to {A}+.
• What columns can be determined given A? We have A -> B, so we can determine B. Therefore, {A}+ is now
{A, B}.
• What columns can be determined given A and B? We have B -> C in the functional dependencies, so we can
determine C. Therefore, {A}+ is now {A, B, C}.
• Now, we have A, B, and C. What other columns can we determine? Well, we have C -> D, so we can add D
to {A}+.
• Now, we have A, B, C, and D. Can we add anything else to it? Yes, since D -> E, we can add E to {A}+.
• We have used all of the columns in R and all of the functional dependencies. {A}+ = {A, B, C, D, E}.
Example 2
Let’s look at another example. We are given R(A, B, C, D, E, F). The functional dependencies are {AB->C, BC->AD, D-
>E, CF->B}. What is {A, B}+?
• We start with {A, B}.
• What columns can we determine, given A and B? We have AB -> C, so we can add C to {A, B}+.
• We now have A, B, and C. What other columns can we determine? We have BC -> AD. We already have A in
{A, B}+, so we can add D.
• So, we now have A, B, C, and D. What else can we add? We have D -> E, so we can add E to {A, B}+.
• Now {A, B}+ is {A, B, C, D, E}. Can we add anything else? No. We have one more functional dependency in
our set that we did not use: CF -> B. We can’t use this dependency because F is not in {A, B}+.
• Thus, {A, B}+ is {A, B, C, D, E}.
Find Super and Candidate Key from Attribute Closure
Example: Consider R(A, B, C, D) with the functional dependencies {A->B, B->C, C->D}. The closure {A}+ = {A, B, C, D} covers every attribute of R, so A is a super key; since no proper subset of {A} determines all attributes, A is also a candidate key. By contrast, {B}+ = {B, C, D} does not contain A, so B is neither.
Design issues
Database design is an area that is frequently overlooked when performance tuning and optimization are
considered. In fact, when a database is small, poor design might not cause problems. However, as the database
grows, so does the number of problems instigated by poor logical and physical design.
Designing a database requires an understanding of both the business functions you want to model and the
database concepts and features used to represent those business functions. As a developer, you do not have to do
this on your own. There are people and resources within UCS who are willing and able to assist you with designing
the database and its core elements. It is important to accurately design a database to model the business because
it can be time consuming to change the design of a database significantly once implemented. A well-designed
database also performs better. When designing a database, consider:
o The purpose of the database and how it affects the design. Create a database plan to fit your purpose.
o Database normalization rules that prevent mistakes in the database design
o Protection of your data integrity
o Security requirements of the database and user permissions
o Performance needs of the application
❖ Database changes
Any changes to a database or permissions need to be requested by sending a database change request to the DBA.
The DBA will be the primary person who can move new database changes to the production servers. In the event
that the DBA is unavailable the group leaders will have access to production as well.
❖ Normalization
The logical design of the database, including the tables and the relationships between them, is the core of an
optimized relational database. A good logical database design can lay the foundation for optimal database and
application performance. A poor logical database design can impair the performance of the entire system.
Normalizing a logical database design involves using formal methods to separate the data into multiple, related
tables. A greater number of narrow tables (with fewer columns) is characteristic of a normalized database. A few
wide tables (with more columns) are characteristic of a non-normalized database.
In relational-database design theory, normalization rules identify certain attributes that must be present or absent
in a well-designed database. However, there are a few rules that can help you achieve a sound database design:
o A table should have a numeric or unique identifier (GUID) primary key.
o A table should store data for only a single type of entity, and all fields should relate directly to the key. For example, avoid storing information about a student and his/her test scores in the same table.
o A table should avoid nullable columns.
o A table should use default values where appropriate.
o A table should not have repeating values or columns, for example TEST_SCORE_1, TEST_SCORE_2, and so on (a normalized alternative is sketched below).
As normalization increases, so do the number and complexity of joins required to retrieve data. Too many complex
relational joins between too many tables can hinder performance. Reasonable normalization often includes few
regularly executed queries that use joins involving more than four tables.
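For instance, the repeating TEST_SCORE_1, TEST_SCORE_2 columns mentioned above can be normalized into a separate child table; a sketch with assumed names:

-- Instead of Student(StudentId, Name, TEST_SCORE_1, TEST_SCORE_2, ...):
CREATE TABLE Student (
    StudentId INT PRIMARY KEY,
    Name      VARCHAR(50)
);
CREATE TABLE TestScore (
    StudentId INT REFERENCES Student(StudentId),
    TestNo    INT,
    Score     INT,
    PRIMARY KEY (StudentId, TestNo)   -- one row per student per test
);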
❖ Data integrity
Enforcing data integrity ensures the quality of the data in the database. One of the more common forms of data
integrity is referential integrity.
Referential integrity preserves the defined relationships between tables when records are entered or deleted. In
SQL, referential integrity is based on relationships between foreign keys and primary keys or between foreign keys
and unique keys. Referential integrity ensures that key values are consistent across tables. Such consistency
requires that there be no references to nonexistent values and that if a key value changes, all references to it
change consistently throughout the database.
❖ Data security
One of the functions of a database is to protect the data by preventing certain users from seeing or changing highly
sensitive data and preventing all users from making costly mistakes. For this reason, each application will use a
separate user with specific permissions limited to only the data needed to provide a successful implementation.
The Entity Relationship Database Model
In DBMS, an entity–relationship model (ER model) is a data model for describing the data or information aspects of
a business domain or its process requirements, in an abstract way that lends itself to ultimately being implemented
in a database such as a relational database. The main components of ER models are entities (things) and the
relationships that can exist among them.
An entity–relationship model is the result of using a systematic process to describe and define a subject area of
business data. It does not define business processes; it only visualizes business data. The data is represented as
components (entities) that are linked with each other by relationships that express the dependencies and
requirements between them, such as: one building may be divided into zero or more apartments, but one
apartment can only be located in one building. Entities may have various properties (attributes) that characterize
them. Diagrams created to represent these entities, attributes, and relationships graphically are called entity–
relationship diagrams.
The three-schema approach to database design uses three levels of ER models that may be developed.
✓ Conceptual data model
This is the highest level ER model in that it contains the least granular detail but establishes the overall scope of
what is to be included within the model set. The conceptual ER model normally defines master reference data
entities that are commonly used by the organization. Developing an enterprise-wide conceptual ER model is useful
to support documenting the data architecture for an organization.
✓ Logical data model
A logical ER model does not require a conceptual ER model, especially if the scope of the logical ER model
includes only the development of a distinct information system. The logical ER model contains more detail than the
conceptual ER model. In addition to master data entities, operational and transactional data entities are now
defined. The details of each data entity are developed and the relationships between these data entities are
established. The logical ER model is however developed independent of technology into which it can be
implemented.
✓ Physical data model
One or more physical ER models may be developed from each logical ER model. The physical ER model is
normally developed to be instantiated as a database. Therefore, each physical ER model must contain enough
detail to produce a database and each physical ER model is technology dependent since each database
management system is somewhat different.
The physical model is normally instantiated in the structural metadata of a database management system as
relational database objects such as database tables, database indexes such as unique key indexes, and database
constraints such as a foreign key constraint or a commonality constraint. The ER model is also normally used to
design modifications to the relational database objects and to maintain the structural metadata of the database.
Entities
An entity is represented by a rectangle which contains the entity’s name, for example Employee, Manager, Department, etc.
Entities:
Let us first answer the question: what is an entity?
• An entity is an object of concern used to represent things in the real world, e.g., car, table, book, etc.
• An entity need not be a physical object; it can also represent a concept in the real world, e.g., project, loan, etc.
• It represents a class of things, not any one instance; e.g., the ‘STUDENT’ entity has instances ‘Ramesh’ and ‘Mohan’.
Entity Set or Entity Type:
A collection of a similar kind of entities is called an Entity Set or entity type.
Example:
For the COLLEGE database described earlier, the objects of concern are students, faculty, courses and departments. The collection of all student entities forms an entity set STUDENT; similarly, the collection of all courses forms an entity set COURSE. Entity sets need not be disjoint. For example, an entity may be part of the entity set STUDENT, the entity set FACULTY, and the entity set PERSON.
Weak Entities:
• Weak entities are existence dependent.
• They cannot exist without the entity with which they have a relationship.
• Their primary key is derived from the primary key of the parent entity.
Example:
The spouse table is a weak entity because its primary key is dependent on the employee table. Without a corresponding employee record, the spouse record could not exist.
Attributes:
Let us first answer the question:
What is an attribute?
An attribute is a property used to describe the specific feature of the entity. So to describe an entity
entirely, a set of attributes is used. For example, a student entity may be described by the student’s name, age,
address, course, etc. An entity will have a value for each of its attributes. For example for a particular student the
following values can be assigned:
RollNo: 124
Name: Numa Limbu
Age: 23
Address: Dharan-9, Koshi, Nepal.
Course: B.Sc. (Computer)
Types of attributes
Attributes attached to an entity can be of various types.
Simple
The attribute that cannot be further divided into smaller parts and represents the basic meaning is called a
simple attribute. For example, the ‘First name’, ‘Last name’ and ‘age’ attributes of a person entity represent simple attributes.
Composite
Attributes that can be further divided into smaller units, where each individual unit carries a specific meaning. For example, the NAME attribute of an employee entity can be sub-divided into First name, Last name and Middle name.
Single valued
Attributes having a single value for a particular entity. For Example, Age is a single valued attribute of a
student entity.
Multi-valued
Attributes that can have more than one value for a particular entity are called multi-valued attributes. Different entities may have different numbers of values for these kinds of attributes. For multi-valued attributes we must also specify the minimum and maximum number of values that can be attached. For example, the phone number of a person entity is a multi-valued attribute.
Derived
Attributes that are not stored directly but can be derived from stored attributes are called derived attributes. For example, the years of service of a ‘person’ entity can be determined from the current date and the person’s date of joining. Similarly, the total salary of a ‘person’ can be calculated from the ‘basic salary’ attribute of the ‘person’.
Relationships:
Let us first define the term relationships. A relationship can be defined as:
• A connection or set of associations, or
• A rule for communication among entities.
Example: In the college database, the association between the student and course entities, i.e., “Student opts course”, is an example of a relationship.
Relationship sets
A relationship set is a set of relationships of the same type. For example, consider the relationship between the two entity sets student and course: the collection of all instances of the relationship ‘opts’ forms a relationship set, also called a relationship type.
Degree
The degree of a relationship type is the number of participating entity types.
A relationship between two entities is called a binary relationship. A relationship among three entities is called a ternary relationship. Similarly, a relationship among n entities is called an n-ary relationship.
Relationship Cardinality
Cardinality (the number of elements of a set) specifies the number of instances of one entity that can be associated with another entity participating in a relationship. Based on cardinality, binary relationships can be further classified into the following categories:
One-to-one relationship:
An entity in A is associated with at most one entity in B, and an entity in B is associated
with at most one entity in A.
Example: Relationship between college and principal
One college can have at the most one principal and one principal can be assigned to only one college.
Similarly we can define the relationship between university and Vice Chancellor.
One-to-many relationship:
An entity in A is associated with any number of entities in B. An entity in B is associated
with at most one entity in A.
Example: Relationship between department and faculty.
One department can appoint any number of faculty members but a faculty member is assigned to only one
department.
Many-to-one relationship:
An entity in A is associated with at most one entity in B. An entity in B is associated with any number in A.
Example: Relationship between course and instructor. An instructor can teach various courses but a course can be
taught only by one instructor. Please note this is an assumption.
Many-to-many relationship:
Entities in A and B are associated with any number of entities from each other.
Example:
Taught by Relationship between course and faculty. One faculty member can be assigned to teach many courses and
one course may be taught by many faculty members.
Example: University Entity Relationship Cardinality Diagram
Extended E-R Features:
Specialization and Generalization Entity relationship diagram
The ER Model has the power of expressing database entities in a conceptual hierarchical manner. As the hierarchy
goes up, it generalizes the view of entities, and as we go deep in the hierarchy, it gives us the detail of every entity
included.
Going up in this structure is called generalization, where entities are clubbed together to represent a more generalized
view. For example, a particular student named Mira can be generalized along with all the students. The entity shall
be a student, and further, the student is a person. The reverse is called specialization where a person is a student, and
that student is Mira.
Generalization:
As mentioned above, the process of generalizing entities, where the generalized entity contains the properties of all the entities it generalizes, is called generalization. In generalization, a number of entities are brought together into one generalized entity based on their similar characteristics. For example, pigeon, house sparrow, crow and dove can all be generalized as Birds.
Generalization is a bottom-up approach in which two or more lower level entities combine to form a higher level entity. In generalization, the higher level entity can also combine with other lower level entities to make a still higher level entity.
Specialization:
Specialization is the opposite of generalization. In specialization, a group of entities is divided into sub-groups based
on their characteristics. Take a group ‘Person’ for example. A person has name, date of birth, gender, etc. These
properties are common in all persons, human beings. But in a company, persons can be identified as employee,
employer, customer, or vendor, based on what role they play in the company.
Similarly, in a school database, persons can be specialized as teacher, student, or a staff, based on what role they play
in school as entities.
Specialization is thus a top-down approach in which one higher level entity can be broken down into two or more lower level entities. In specialization, some higher level entities may not have any lower-level entity sets at all.
Inheritance:
We use all of the above features of the ER model in order to create classes of objects in object-oriented programming. The details of entities are generally hidden from the user; this process is known as abstraction.
Inheritance is an important feature of Generalization and Specialization. It allows lower-level entities to inherit the
attributes of higher-level entities.
For example, the attributes of a Person class such as name, age, and gender can be inherited by lower-level entities
such as Student or Teacher.
Aggregation:
Aggregation is a process in which a relationship between two entities is treated as a single entity. Here, the relationship between Center and Course acts as an entity in a relationship with Visitor.
Specialization and Generalization Entity relationship diagram
Codd's Twelve Rules:
Many references to the twelve rules include a thirteenth rule - or rule zero: A relational database management
system (DBMS) must manage its stored data using only its relational capabilities.
1. Information Rule
The data stored in a database, whether it is user data or metadata, must be a value of some table cell. Everything in a database must be stored in a table format.
2. Guaranteed Access Rule
Each and every datum (atomic value) is guaranteed to be logically accessible by resorting to a combination of table
name, primary key value, and column name.
3. Systematic Treatment of Null Values
Null values (distinct from empty character string or a string of blank characters and distinct from zero or any other
number) are supported in the fully relational DBMS for representing missing information in a systematic way,
independent of data type.
4. Dynamic Online Catalog Based on the Relational Model
The structure description of the entire database must be stored in an online catalog, known as data dictionary,
which can be accessed by authorized users. Users can use the same query language to access the catalog which
they use to access the database itself.
5. Comprehensive Data Sublanguage Rule
A relational system may support several languages and various modes of terminal use. However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings, and that is comprehensive in supporting all of the following:
a. data definition
b. view definition
c. data manipulation (interactive and by program)
d. integrity constraints
e. authorization
f. transaction boundaries (begin, commit, and rollback).
10. Integrity Independence
A database must be independent of the application that uses it. All its integrity constraints can be independently modified without the need for any change in the application. This rule makes a database independent of the front-end application and its interface.
11. Distribution Independence
The data manipulation sublanguage of a relational DBMS must enable application programs and terminal activities
to remain logically unimpaired whether and whenever data are physically centralized or distributed.
12. Nonsubversion Rule
If a relational system has or supports a low-level (single-record-at-a-time) language, that low-level language cannot
be used to subvert or bypass the integrity rules or constraints expressed in the higher-level (multiple-records-at-a-
time) relational language.
Schema Diagram:
In a data model, it is important to distinguish between the description of the database and the database itself. The
description of a database is called the database schema, which is specified during database design and is not
expected to change frequently. Most data models have certain conventions for displaying schemas as diagrams. A
displayed schema is called a schema diagram. Figure shows a schema diagram for the database; the diagram
displays the structure of each record type but not the actual instances of records.
Relational Algebra
Relational algebra is a procedural query language, which takes instances of relations as input and yields instances of
relations as output. It uses operators to perform queries. An operator can be either unary or binary. They accept
relations as their input and yield relations as their output. Relational algebra is performed recursively on a relation
and intermediate results are also considered relations.
The fundamental operations of relational algebra are as follows −
• Select
• Project
• Union
• Set Difference
• Cartesian product
• Rename
Select Operation (σ):
The SELECT operator is denoted by the σ (sigma) symbol. It is used as an expression to choose tuples that meet the selection condition:
σ<selection condition>(R)
The select operation selects tuples that satisfy a given predicate.
Ex: find all employees born after 1st Jan 1950 (assuming a dob attribute):
σ dob > '01/JAN/1950'(employee)
For example −
σsubject = "database"(Books)
Output − Selects tuples from books where subject is 'database'.
σsubject = "database" and price = "450"(Books)
Output − Selects tuples from books where subject is 'database' and 'price' is 450.
σsubject = "database" and price = "450" or year > "2010"(Books)
Output − Selects tuples from books where subject is 'database' and 'price' is 450 or those books published after
2010.
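In SQL, the select operation corresponds to the WHERE clause; a sketch, assuming a Books table with subject and price columns:

-- σ subject = "database" and price = "450" (Books)
SELECT * FROM Books
WHERE subject = 'database' AND price = 450;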
Union Operation (∪):
It performs binary union between two given relations and is defined as −
r ∪ s = { t | t ∈ r or t ∈ s}
Notation: r ∪ s
Example: R = {1,2,3,4} and S = {3,4,5}
R ∪ S = {1,2,3,4,5}
Where r and s are either database relations or relation result sets (temporary relations).
For a union operation to be valid, the following conditions must hold −
• r, and s must have the same number of attributes.
• Attribute domains must be compatible.
• Duplicate tuples are automatically eliminated.
Another example:
∏ author (Books) ∪ ∏ author (Articles)
Output − Projects the names of the authors who have either written a book or an article or both.
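The same projection-and-union query can be written in SQL, assuming Books and Articles tables that each have an author column:

-- ∏ author (Books) ∪ ∏ author (Articles)
SELECT author FROM Books
UNION                          -- duplicate tuples are eliminated automatically
SELECT author FROM Articles;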
Intersection (∩):
The INTERSECTION operation on relations R and S, symbolized by R ∩ S, includes the tuples that appear in both R and S.
Example: R = {1,2,3,4} and S = {3,4,5}
R ∩ S = {3,4}
Another example:
∏ author (Books) ∩ ∏ author (Articles)
Output − Projects the names of the authors who have written both a book and an article.
Set Difference (−):
The set difference operation finds all the tuples that are present in r but not in s. Notation: r − s
∏ author (Books) − ∏ author (Articles)
Output − Provides the names of authors who have written books but not articles.
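In SQL these two operations correspond to INTERSECT and EXCEPT; a sketch over the same assumed tables (note that not every DBMS supports both keywords):

SELECT author FROM Books
INTERSECT                      -- authors who wrote both a book and an article
SELECT author FROM Articles;

SELECT author FROM Books
EXCEPT                         -- authors who wrote books but no articles
SELECT author FROM Articles;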
Cartesian Product (Χ):
The Cartesian product combines information from two different relations into one. It creates a relation that has all the attributes of R and S, allowing all possible combinations of tuples from R and S in the result. The notation used is Χ.
E.g. the set of ordered pairs from R and S:
R = {1,2}, S = {3,4}
R Χ S = {(1,3), (1,4), (2,3), (2,4)}
Notation: r Χ s
Where r and s are relations and their output is defined as −
r Χ s = { q t | q ∈ r and t ∈ s}
σauthor = 'MMC'(Books Χ Articles)
Output − Yields a relation, which shows all the books and articles written by MMC.
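In SQL the Cartesian product is written as CROSS JOIN; a sketch of the query above:

-- σ author = 'MMC' (Books Χ Articles)
SELECT * FROM Books CROSS JOIN Articles
WHERE Books.author = 'MMC';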
Join (⋈):
A join combines tuples from two relations that satisfy a given join condition. The notation used is
R ⋈<join condition> S
This can also be used to define composition of relations. For example, the composition of Employee and Dept is their join, projected on all but the common attribute DeptName.
It is usually required that R and S have at least one common attribute; if this constraint is omitted and R and S have no common attributes, then the natural join becomes exactly the Cartesian product.
(The example here showed the tables Car(CarModel, CarPrice), with a row CarA, 20,000, and Boat(BoatModel, BoatPrice), with a row Boat1, 10,000, joined on a price condition into a result with attributes CarModel, CarPrice, BoatModel, BoatPrice.)
Semijoin (⋉)(⋊):
The left semijoin, written as R ⋉ S where R and S are relations, is similar to the natural join. The result of this semijoin is the set of all tuples in R for which there is a tuple in S that is equal on their common attribute names. For an example consider the tables Employee and Dept and their semijoin:
Employee
Name     EmpId  DeptName
Harry    3415   Finance
Sally    2241   Sales
George   3401   Finance
Harriet  2202   Production

Dept
DeptName    Manager
Sales       Bob
Sales       Thomas
Production  Katie
Production  Mark

Employee ⋉ Dept
Name     EmpId  DeptName
Sally    2241   Sales
Harriet  2202   Production
The semijoin can be simulated using the natural join as follows. If a1, ..., an are the attribute names of R, then
R ⋉ S = ∏ a1, ..., an (R ⋈ S)
Since we can simulate the natural join with the basic operators, it follows that this also holds for the semijoin.
Antijoin (▷):
The antijoin, written as R ▷ S where R and S are relations, is similar to the semijoin, but the result of an antijoin is only those tuples in R for which there is no tuple in S that is equal on their common attribute names.
For an example consider the tables Employee and Dept and their antijoin:
Employee
Name     EmpId  DeptName
Harry    3415   Finance
Sally    2241   Sales
George   3401   Finance
Harriet  2202   Production

Dept
DeptName    Manager
Sales       Sally
Production  Harriet

Employee ▷ Dept
Name    EmpId  DeptName
Harry   3415   Finance
George  3401   Finance
Division (÷)
The division is a binary operation that is written as R ÷ S. The result consists of the restrictions of tuples in R to the
attribute names unique to R, i.e., in the header of R but not in the header of S, for which it holds that all their
combinations with tuples in S are present in R. For an example see the tables Completed and DBProject and their division:
Completed
Student  Task
Fred     Database1
Fred     Database2
Fred     Compiler1
Eugene   Database1
Eugene   Compiler1
Sarah    Database1
Sarah    Database2

DBProject
Task
Database1
Database2

Completed ÷ DBProject
Student
Fred
Sarah
If DBProject contains all the tasks of the Database project, then the result of the division above contains exactly the
students who have completed both of the tasks in the Database project.
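Relational division has no direct SQL operator; it is usually expressed with a double NOT EXISTS, as in this sketch of Completed ÷ DBProject:

-- Students for whom no DBProject task is missing from their Completed rows
SELECT DISTINCT c.Student
FROM Completed c
WHERE NOT EXISTS (
    SELECT * FROM DBProject d
    WHERE NOT EXISTS (
        SELECT * FROM Completed c2
        WHERE c2.Student = c.Student AND c2.Task = d.Task
    )
);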
Outer joins
Whereas the result of a join (or inner join) consists of tuples formed by combining matching tuples in the two
operands, an outer join contains those tuples and additionally some tuples formed by extending an unmatched
tuple in one of the operands by "fill" values for each of the attributes of the other operand.
Three outer join operators are defined: left outer join, right outer join, and full outer join.
Employee
Name     EmpId  DeptName
Harry    3415   Finance
Sally    2241   Sales
George   3401   Finance
Harriet  2202   Sales
Tim      1123   Executive

Dept
DeptName    Manager
Sales       Harriet
Production  Charles

Employee ⟕ Dept
Name     EmpId  DeptName   Manager
Harry    3415   Finance    ω
Sally    2241   Sales      Harriet
George   3401   Finance    ω
Harriet  2202   Sales      Harriet
Tim      1123   Executive  ω
In the resulting relation, tuples in Employee that share no common values in the common attribute names with tuples in Dept take a null value, ω, for the attributes of Dept. Since there are no tuples in Dept with a DeptName of Finance or Executive, ω occurs in the resulting relation where tuples in Employee have a DeptName of Finance or Executive.
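In SQL the left outer join above would be written as follows (a sketch):

SELECT e.Name, e.EmpId, e.DeptName, d.Manager
FROM Employee e
LEFT OUTER JOIN Dept d ON e.DeptName = d.DeptName;
-- unmatched Employee rows appear with NULL (ω) in the Manager column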
For an example, consider the tables Employee and Dept above and their full outer join:
Employee ⟗ Dept:
Name     EmpId   DeptName    Manager
Harry    3415    Finance     ω
Sally    2241    Sales       Harriet
George   3401    Finance     ω
Harriet  2202    Sales       Harriet
Tim      1123    Executive   ω
ω        ω       Production  Charles
The full outer join keeps unmatched tuples from both operands: unmatched Employee rows are padded with ω on
Dept's attributes, and the unmatched Production department is padded with ω on Employee's attributes.
Entity integrity: Entity integrity concerns the concept of a primary key. Entity integrity is an integrity rule which
states that every table must have a primary key and that the column or columns chosen to be the primary key
should be unique and not null.
Referential integrity: Referential integrity concerns the concept of a foreign key. The referential integrity rule
states that any foreign-key value can only be in one of two states. The usual state of affairs is that the foreign-key
value refers to a primary key value of some table in the database. Occasionally, and this will depend on the rules of
the data owner, a foreign-key value can be null. In this case we are explicitly saying that either there is no
relationship between the objects represented in the database or that this relationship is unknown.
For example, consider the two tables tblGender and tblPerson. If you delete the row with ID = 1 from the tblGender
table, then the row with ID = 3 in the tblPerson table becomes an orphan record: you will no longer be able to tell
the gender for that row. Cascading referential integrity constraints can be used to define the actions SQL Server
should take when this happens. By default, we get an error and the DELETE or UPDATE statement is rolled back.
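A sketch of how such a cascading action could be declared in SQL Server; the column names (ID, GenderID) are assumptions, and SET NULL is only one of the available actions (NO ACTION, CASCADE, SET NULL, SET DEFAULT):

CREATE TABLE tblGender (
    ID     INT PRIMARY KEY,
    Gender NVARCHAR(20) NOT NULL);

CREATE TABLE tblPerson (
    ID       INT PRIMARY KEY,
    Name     NVARCHAR(50) NOT NULL,
    GenderID INT NULL
        REFERENCES tblGender (ID)
        ON DELETE SET NULL);  -- deleting a gender orphans no row; the FK becomes NULL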
Domain integrity: Domain integrity specifies that all columns in a relational database must be declared upon a
defined domain. The primary unit of data in the relational data model is the data item. Such data items are said to
be non-decomposable or atomic. A domain is a set of values of the same type. Domains are therefore pools of
values from which actual values appearing in the columns of a table are drawn.
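Data types, NOT NULL, and CHECK constraints are the usual SQL tools for declaring such domains; a minimal sketch with illustrative names:

CREATE TABLE Worker (
    WorkerId INT          PRIMARY KEY,
    Salary   DECIMAL(9,2) NOT NULL CHECK (Salary > 0),               -- domain: positive amounts
    Grade    CHAR(1)      NOT NULL CHECK (Grade IN ('A','B','C')));  -- domain: three grades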
Triggers:
A database trigger is procedural code that is automatically executed in response to certain events on a particular
table or view in a database. Triggers are mostly used for maintaining the integrity of the information in the
database. For example, when a new record (representing a new worker) is added to the employees table, new
records should also be created in the taxes, vacations, and salaries tables.
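A sketch of such a trigger in Transact-SQL, assuming an employees table with an emp_id column and a taxes table keyed on the same column (the vacations and salaries inserts would follow the same pattern):

CREATE TRIGGER trg_NewWorker
ON employees
AFTER INSERT
AS
BEGIN
    -- "inserted" is the pseudo-table holding the newly added employee rows
    INSERT INTO taxes (emp_id)
    SELECT emp_id FROM inserted;
END;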
DML Triggers
A DML trigger is a special type of stored procedure that automatically takes effect when a data manipulation
language (DML) event takes place that affects the table or view defined in the trigger. DML events include INSERT,
UPDATE, or DELETE statements. DML triggers can be used to enforce business rules and data integrity, query other
tables, and include complex Transact-SQL statements. The trigger and the statement that fires it are treated as a
single transaction, which can be rolled back from within the trigger. If a severe error is detected (for example,
insufficient disk space), the entire transaction automatically rolls back.
Example: DML trigger with a reminder message
The following DML trigger prints a message to the client when anyone tries to add or change data in the Customer
table in the database.
CREATE TRIGGER reminder1
ON Sales.Customer
AFTER INSERT, UPDATE
AS RAISERROR ('Notify Customer Relations', 16, 10);
GO
DDL Triggers
DDL triggers fire in response to a variety of Data Definition Language (DDL) events. These events primarily
correspond to Transact-SQL statements that start with the keywords CREATE, ALTER, DROP, GRANT, DENY, REVOKE
or UPDATE STATISTICS. Certain system stored procedures that perform DDL-like operations can also fire DDL
triggers.
Use DDL triggers when you want to do the following:
✓ Prevent certain changes to your database schema.
✓ Have something occur in the database in response to a change in your database schema.
✓ Record changes or events in the database schema.
In the following example, the DDL trigger safety will fire whenever a DROP_TABLE or ALTER_TABLE event occurs in the
database.
CREATE TRIGGER safety
ON DATABASE
FOR DROP_TABLE, ALTER_TABLE
AS
PRINT 'You must disable Trigger "safety" to drop or alter tables!'
ROLLBACK;
Assertion:
An assertion is a database object that uses a check constraint to limit data values you can enter into the database
as a whole. Both assertions and constraints are specified as check conditions that the DBMS can evaluate to either
TRUE or FALSE. However, while a constraint uses a check condition that acts on a single table to limit the values
assigned to columns in that table, the check condition in an assertion involves multiple tables and the data
relationships among them. Because an assertion applies to the database as a whole, you use the CREATE
ASSERTION statement to create an assertion as part of the database definition. (Conversely, since a constraint
applies to only a single table, you apply [define] the constraint when you create the table.)
For example, if you want to prevent investors from withdrawing more than a certain amount of money from your
hedge fund, you could create an assertion using the following SQL statement:
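A sketch of such an assertion. Note that CREATE ASSERTION is defined in the SQL standard but is not implemented by most products (SQL Server included); the table names, columns, and the 500,000 limit here are assumptions:

CREATE ASSERTION MAXIMUM_WITHDRAWAL
CHECK (NOT EXISTS (
    SELECT 1
    FROM INVESTOR AS I JOIN WITHDRAWALS AS W
         ON W.investor_id = I.investor_id
    GROUP BY I.investor_id
    HAVING SUM(W.amount) > 500000));  -- no investor may withdraw more than the limit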
Thus, the general syntax used to create an assertion is:
CREATE ASSERTION <assertion_name> CHECK (<search_condition>);
Once you add the MAXIMUM_WITHDRAWAL ASSERTION to the database definition, the DBMS will check to make
sure that the assertion remains TRUE each time you execute an SQL statement that modifies either the INVESTOR
or WITHDRAWALS tables. As such, each time the user or application program attempts to execute an INSERT,
UPDATE, or DELETE statement on one of the tables in the assertion's CHECK clause, the DBMS checks the check
condition against the database, including the proposed modification. If the check condition remains TRUE, the
DBMS carries out the modification. If the modification makes the check condition FALSE, the DBMS does not
perform the modification and returns an error code indicating that the statement was unsuccessful due to an
assertion violation.
Example: Each CLASS is taken by many STUDENTs, and each STUDENT can take many CLASSes.
There may be many rows in the CLASS table for any given row in the STUDENT table, and there can be many rows
in the STUDENT table for any given row in the CLASS table. Implemented directly, an M:N relationship creates a lot
of redundancy: the same tuple occurs many times in a given table, so tuples and their attributes are repeated many
times, occupying space and leading to errors and efficiency problems.
Bridge Entity:
The problem inherent in the many-to-many relationship can be avoided by creating a composite entity, also called a
bridge entity or associative entity. Such tables are used to link the tables that were originally related in an M:N
relationship.
The composite entity structure includes, as foreign keys, at least the primary keys of the tables that are to be
linked. The designer has two options when defining a composite table's primary key: use a combination of those
foreign keys or create a new primary key. A sketch of such a table follows.
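A sketch of a bridge table for the STUDENT/CLASS example, using the first option (the combined foreign keys serve as the primary key); the names are illustrative:

CREATE TABLE Enrollment (
    StudentID INT NOT NULL REFERENCES Student (StudentID),
    ClassID   INT NOT NULL REFERENCES Class (ClassID),
    Grade     CHAR(2) NULL,  -- an attribute of the relationship itself
    CONSTRAINT PK_Enrollment PRIMARY KEY (StudentID, ClassID));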
Distributed Database Management Systems
When an organization is geographically dispersed, it may choose to store its databases on a central database server
or to distribute them to local servers (or a combination of both). A distributed database is a single logical database
that is spread physically across computers in multiple locations that are connected by a data communications
network. We emphasize that a distributed database is truly a database, not a loose collection of files.
The distributed database is still centrally administered as a corporate resource while providing local flexibility and
customization. The network must allow the users to share the data; thus, a user (or program) at location A must be
able to access (and perhaps update) data at location B. The sites of a distributed system may be spread over a large
area (e.g., country or the world) or over a small area (e.g., a building or campus). The computers may range from
PCs to large-scale servers or even supercomputers. A distributed database requires multiple instances of a database
management system (or several DBMSs), running at each remote site. The degree to which these different DBMS
instances cooperate, or work in partnership, and whether there is a master site that coordinates requests involving
data from multiple sites distinguish different types of distributed database environments. It is important to
distinguish between distributed and decentralized databases. A decentralized database is also stored on computers
at multiple locations; however, the computers are not interconnected by the network and database software that
would make the data appear to be one logical database. Thus, users at the various sites cannot share data. A
decentralized database is best regarded as a collection of independent databases, rather than having the
geographical distribution of a single database.
A homogeneous distributed database environment has the following characteristics:
• The same DBMS is used at each location.
• All data are managed by the distributed DBMS.
• All users access the database through one global schema or database definition.
• The global schema is simply the union of all the local database schemas.
It is difficult in most organizations to force a homogeneous environment, yet heterogeneous environments are
much more difficult to manage. A heterogeneous environment is defined by the following characteristics:
• Data are distributed across all the nodes.
• Different DBMSs may be used at each node.
• Some users require only local access to databases, which can be accomplished by using only the local DBMS and
schema.
• A global schema exists, which allows local users to access remote data.
Collaborative Servers:
• Servers can serve queries or be clients and query other servers
• Support indirect queries
Peer-to-Peer Architecture:
• Scalability and flexibility in growing and shrinking
• All nodes have the same role and functionality
• Harder to manage because all machines are autonomous and loosely coupled
Objectives and Trade-offs:
A major objective of distributed databases is to provide ease of access to data for users at many different locations.
To meet this objective, the distributed database system must provide location transparency, which means that a
user (or user program) using data for querying or updating need not know the location of the data. Any request to
retrieve or update data from any site is automatically forwarded by the system to the site or sites related to the
processing request. Ideally, the user is unaware of the distribution of data, and all data in the network appear as a
single logical database stored at one site. In this ideal case, a single query can join data from tables in multiple sites
as if the data were all in one site.
A second objective of distributed databases is local autonomy, which is the capability to administer a local database
and to operate independently when connections to other nodes have failed. With local autonomy, each site has the
capability to control local data, administer security, and log transactions and recover when local failures occur and
to provide full access to local data to local users when any central or coordinating site cannot operate. In this case,
data are locally owned and managed, even though they are accessible from remote sites. This implies that there is
no reliance on a central site.
Asynchronous distributed database technology keeps copies of replicated data at different nodes so that local
servers can access data without reaching out across the network. With asynchronous technology, there is usually
some delay in propagating data updates across the remote databases, so some degree of at least temporary
inconsistency is tolerated. Asynchronous technology tends to have acceptable response time because updates
happen locally and data replicas are synchronized in batches at predetermined intervals, but it may be more
complex to plan and design to ensure exactly the right level of data integrity and consistency across the nodes.
Compared with centralized databases, either form of a distributed database has numerous advantages. The
following are the most important of them:
• Increased reliability and availability When a centralized system fails, the database is unavailable to all users. A
distributed system will continue to function at some reduced level, however, even when a component fails. The
reliability and availability will depend (among other things) on the way the data are distributed (discussed in the
following sections).
• Local control Distributing the data encourages local groups to exercise greater control over “their” data, which
promotes improved data integrity and administration. At the same time, users can access nonlocal data when
necessary. Hardware can be chosen for the local site to match the local, not global, data processing work.
• Modular growth Suppose that an organization expands to a new location or adds a new workgroup. It is often
easier and more economical to add a local computer and its associated data to the distributed network than to
expand a large central computer. Also, there is less chance of disruption to existing users than is the case when a
central computer system is modified or expanded.
• Lower communication costs With a distributed system, data can be located closer to their point of use. This can
reduce communication costs, compared with a central system.
• Faster response Depending on the way data are distributed, most requests for data by users at a particular site
can be satisfied by data stored at that site. This speeds up query processing since communication and central
computer delays are minimized. It may also be possible to split complex queries into subqueries that can be
processed in parallel at several sites, providing even faster response.
Distributed databases also carry costs and disadvantages, including the following:
• Processing overhead The various sites must exchange messages and perform additional calculations to ensure
proper coordination among data at the different sites.
• Data integrity A by-product of the increased complexity and need for coordination is the additional exposure to
improper updating and other problems of data integrity.
• Slow response If the data are not distributed properly according to their usage, or if queries are not formulated
correctly, response to requests for data can be extremely slow.
Data replication
A popular option for data distribution as well as for fault tolerance of a database is to store a separate copy of the
database at each of two or more sites. Replication may allow an IS organization to move a database off a
centralized mainframe onto less expensive departmental or location-specific servers, close to end users. Replication
may use either synchronous or asynchronous distributed database technologies, although asynchronous
technologies are more typical in a replicated environment. If a copy is stored at every site, we have the case of full
replication, which may be impractical except for only relatively small databases. However, as disk storage and
network technology costs have decreased, full data replication, or mirror images, have become more common,
especially for “always on” services, such as electronic commerce and search engines.
Snapshot replication:
Different schemes exist for updating data copies. Some applications, such as those for decision
support and data warehousing or mining, which often do not require current data, are supported by simple table
copying or periodic snapshots. This might work as follows, assuming that multiple sites are updating the same data.
First, updates from all replicated sites are periodically collected at a master, or primary, site, where all the updates
are made to form a consolidated record of all changes. With some distributed DBMSs, this list of changes is
collected in a snapshot log, which is a table of row identifiers for the records to go into the snapshot. Then a read-
only snapshot of the replicated portion of the database is taken at the master site. Finally, the snapshot is sent to
each site where there is a copy. (It is often said that these other sites “subscribe” to the data owned at the primary
site.) This is called a full refresh of the database. Alternatively, only those pages that have changed since the last
snapshot can be sent, which is called a differential, or incremental, refresh. In this case, a snapshot log for each
replicated table is joined with the associated base table to form the set of changed rows to be sent to the replicated
sites.
• The data processor (DP) is the software component residing on each computer that stores and retrieves data
located at the site. The DP is also known as the data manager (DM). A data processor may even be a centralized
DBMS.
Distributed Database Design
There are in general several design alternatives.
Top-down approach:
First the general concepts and the global framework are defined, and then the details.
Bottom-up approach:
First the detailed modules are defined, and then the global framework. If the system is built from scratch, the
top-down method is more common. If the system must match existing systems, or some modules already exist,
the bottom-up method is usually used.
General design steps:
- analysis of the external, application requirements
- design of the global schema
- design of the fragmentation
- design of the distribution schema
- design of the local schemes
- design of the local physical layers
DDBMS -specific design steps:
- design of the fragmentation
- design of the distribution schema
During the requirement analysis phase, also the fragmentation and distribution requirements are considered.
Bottom-Up Design:
Usually existing and heterogeneous databases are integrated into a common distributed system.
Steps of integration:
- Common data model selection: As the different component databases may have different data models, and the
DDBMS should be based on a single, common data model, the first step is to convert the different models into a
common model. The common model is usually an intermediate model, different from the data models of the
components.
- Translation of each local schema into the common model: The different schema element descriptions should be
converted into this common model.
- Integration of the local schemas into a common global schema: Besides collecting the component
descriptions, the integration should deal with matching the different semantic elements and with
resolving the different types of inconsistency.
- Design of the translation between the global and local schemas: To access all of the components in a homogeneous
way, a conversion procedure should be applied.
Main aspects of the fragmentation:
Granularity:
The granularity determines the level of database storage at which the fragmentation can be performed. If it is too
fine (field level), it requires a lot of management cost. If it is too coarse (user level), unnecessary elements are
replicated, causing a higher cost.
Fragmentation strategy:
- Horizontal fragmentation: The granularity is at the tuple level, and the attribute values of the tuple
determine the corresponding fragment. The relation is partitioned horizontally into fragments.
- Vertical fragmentation: The assignment of the data elements in a table is based on the schema position of the
data. In this case the different projections are the contents of the fragments.
- Mixed fragmentation: The data are fragmented by both the vertical and the horizontal methods.
Horizontal Fragmentation
The relation is partitioned horizontally into fragments.
Primary fragmentation: the assignment of a tuple depends on the attribute values of that tuple.
Derived fragmentation: the assignment of a tuple depends not on the attributes of this tuple, but on the attributes
of another tuple (or tuples). The fragmentation is described by an expression whose value for every tuple
determines the corresponding fragment.
Vertical Fragmentation
The assignment of the data elements in a table is based on the attribute identifier of the data. In this case the
different projections are the contents of the fragments. A sketch of both strategies follows.
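A sketch of both strategies, assuming a Customer(cust_id, name, region, credit_limit) table; the fragment names and the region predicate are illustrative:

-- Horizontal fragmentation: tuples are assigned by an attribute value
SELECT * INTO Customer_East FROM Customer WHERE region = 'East';
SELECT * INTO Customer_West FROM Customer WHERE region = 'West';

-- Vertical fragmentation: projections that each keep the tuple identifier
SELECT cust_id, name         INTO Customer_Public  FROM Customer;
SELECT cust_id, credit_limit INTO Customer_Private FROM Customer;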
Fragmentation rule:
Every attribute must belong to at least one fragment, and every fragment must have a tuple identifier.
Vertical partitioning:
Every attribute is contained in only one fragment.
Vertical clustering:
An attribute may be contained in more than one fragment. Vertical clustering causes replication of some data
elements. Replication is more advantageous for read-only applications than for read-write applications; in the
latter case, the same update operation must be performed on several sites.
Identification of the fragmentation:
The fragmentation of a relation schema R into R1 and R2 is advisable only when there are different
applications that use either R1 or R2, but not both.
Database Security
Database management systems are increasingly being used to store information about all aspects of an enterprise.
The data stored in a DBMS is often vital to the business interests of the organization and is regarded as a corporate
asset. In addition to protecting the intrinsic value of the data, corporations must consider ways to ensure privacy
and to control access to data that must not be revealed to certain groups of users for various reasons.
There are three main objectives to consider while designing a secure database application:
1. Secrecy: Information should not be disclosed to unauthorized users. For example,
a student should not be allowed to examine other students' grades.
2. Integrity: Only authorized users should be allowed to modify data. For example, students may be allowed to see
their grades, yet not allowed to modify them.
3. Availability: Authorized users should not be denied access. For example, an instructor who wishes to change a
grade should be allowed to do so.
Types of Security
Database security is a broad area that addresses many issues, including the following:
■ Various legal and ethical issues regarding the right to access certain information—for example, some
information may be deemed to be private and cannot be accessed legally by unauthorized organizations or
persons. In the
United States, there are numerous laws governing privacy of information.
■ Policy issues at the governmental, institutional, or corporate level regarding what kinds of information
should not be made publicly available—for example, credit ratings and personal medical records.
■ System-related issues such as the system levels at which various security functions should be enforced—for
example, whether a security function should be handled at the physical hardware level, the operating system
level, or the DBMS level.
■ The need in some organizations to identify multiple security levels and to categorize the data and users based
on these classifications—for example, top secret, secret, confidential, and unclassified. The security policy of
the
organization with respect to permitting access to various classifications of data must be enforced.
Threats to Databases
Threats to databases can result in the loss or degradation of some or all of the following commonly accepted
security goals: integrity, availability, and confidentiality.
■ Loss of integrity. Database integrity refers to the requirement that information be protected from improper
modification. Modification of data includes creating, inserting, and updating data; changing the status of data;
and deleting data. Integrity is lost if unauthorized changes are made to the data by either intentional or
accidental acts. If the loss of system or data integrity is not corrected, continued use of the contaminated
system or corrupted data
could result in inaccuracy, fraud, or erroneous decisions.
■ Loss of availability. Database availability refers to making objects available to a human user or a program
that has a legitimate right to those data objects. Loss of availability occurs when the user or program
cannot access these objects.
■ Loss of confidentiality. Database confidentiality refers to the protection of data from unauthorized
disclosure. The impact of unauthorized disclosure of confidential information can range from violation of the
Data Privacy Act. Unauthorized, unanticipated, or unintentional disclosure could result in loss of public
confidence, embarrassment, or legal action against the organization.
To protect databases against such threats, a DBMS typically provides two kinds of security mechanisms:
■ Discretionary security mechanisms. These are used to grant privileges to users, including the capability to
access specific data files, records, or fields in a specified mode (such as read, insert, delete, or update).
■ Mandatory security mechanisms. These are used to enforce multilevel security by classifying the data and
users into various security classes (or levels) and then implementing the appropriate security policy of the
organization. For example, a typical security policy is to permit users at a certain classification (or clearance)
level to see only the data items classified at the user’s own (or lower) classification level. An extension of this
is role-based
security, which enforces policies and privileges based on the concept of organizational roles.
Control Measures
Four main control measures are used to provide security of data in databases:
■ Access control
■ Inference control
■ Flow control
■ Data encryption
✓ Access control
A security problem common to computer systems is that of preventing unauthorized persons from accessing
the system itself, either to obtain information or to make malicious changes in a portion of the database. The
security mechanism of a DBMS must include provisions for restricting access to the database system as a
whole. This function, called access control, is handled by creating user accounts and passwords to control the
login process by the DBMS.
✓ Inference control
Statistical databases are used to provide statistical information or summaries of values based on various
criteria. For example, a database for population statistics may provide statistics based on age groups, income
levels, household size, education levels, and other criteria. Statistical database users such as government
statisticians or market research firms are allowed to access the database to retrieve statistical information
about a population but not to access the detailed confidential information about specific individuals. Security
for statistical databases must ensure that information about individuals cannot be accessed. It is sometimes
possible to deduce or infer certain facts concerning individuals from queries that involve only summary
statistics on groups; consequently, this must not be permitted either. This problem is called statistical
database security. The corresponding control measures are called inference control measures.
✓ Flow control
Another security issue is that of flow control, which prevents information from flowing in such a way that it
reaches unauthorized users. Channels along which information flows implicitly, in ways that violate the security
policy of an organization, are called covert channels.
✓ Data encryption
A final control measure is data encryption, which is used to protect sensitive data (such as credit card
numbers) that is transmitted via some type of communications network. Encryption can be used to provide
additional protection for sensitive portions of a database as well. The data is encoded using some coding
algorithm. An unauthorized user who accesses encoded data will have difficulty deciphering it, but authorized
users are given decoding or decrypting algorithms (or keys) to decipher the data. Encrypting techniques that
are very difficult to decode without a key
have been developed for military applications. However, encrypted database records are used today in both
private organizations and governmental and military applications. In fact, state and federal laws prescribe
encryption for any system that deals with legally protected personal information.
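As one concrete illustration, SQL Server can encrypt individual values with a passphrase; the CreditCards table and its columns below are assumptions, and real deployments would normally use managed keys or certificates instead:

-- Encrypt each card number into a varbinary column
UPDATE CreditCards
SET card_no_enc = ENCRYPTBYPASSPHRASE(N'a strong passphrase', card_no);

-- Authorized users holding the passphrase (the "key") can decrypt
SELECT CONVERT(NVARCHAR(32),
       DECRYPTBYPASSPHRASE(N'a strong passphrase', card_no_enc)) AS card_no
FROM CreditCards;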
SQL Authorization
Authorization:
• A file system identifies certain privileges on the objects (files) it manages.
o Typically read, write, execute.
• A file system identifies certain participants to whom privileges may be granted.
o Typically the owner, a group, all users.
Database Objects
➢ The objects on which privileges exist include stored tables and views.
➢ Other privileges are the right to create objects of a type, e.g., triggers.
➢ Views form an important tool for access control.
Example: Views as Access Control:
✓ We might not want to give the SELECT privilege on Emps(name, addr, salary).
✓ But it is safer to give SELECT on:
CREATE VIEW SafeEmps AS
SELECT name, addr FROM Emps;
✓ Queries on SafeEmps do not require SELECT on Emps, just on SafeEmps.
Granting Privileges
✓ You have all possible privileges on the objects, such as relations, that you create.
✓ You may grant privileges to other users (authorization ID’s), including PUBLIC.
✓ You may also grant privileges WITH GRANT OPTION, which lets the grantee also grant this
privilege.
To grant privileges, say:
GRANT <list of privileges>
ON <relation or other object>
TO <list of authorization ID’s>;
If you want the recipient(s) to be able to pass the privilege(s) to others add:
WITH GRANT OPTION
Example: GRANT:
Suppose you are the owner of Sells. You may say:
GRANT SELECT, UPDATE(price)
ON Sells
TO sally;
Now Sally has the right to issue any query on Sells and can update the price component only.
Suppose we also grant:
GRANT UPDATE ON Sells TO sally
WITH GRANT OPTION;
Now, Sally not only can update any attribute of Sells, but can grant to others the privilege UPDATE ON Sells.
Also, she can grant more specific privileges like UPDATE(price) ON Sells.
Revoking Privileges
REVOKE <list of privileges>
ON <relation or other object>
FROM <list of authorization ID’s>;
✓ Your grant of these privileges can no longer be used by these users to justify their use of the
privilege.
But they may still have the privilege because they obtained it independently from elsewhere.
REFERENCES: Ability to create a constraint that refers to the table.
ALTER: Ability to perform ALTER TABLE statements to change the table definition.
ALL: ALL does not grant all permissions for the table. Rather, it grants the ANSI-92
permissions, which are SELECT, INSERT, UPDATE, DELETE, and REFERENCES.
object: The name of the database object that you are granting permissions for. In the case of granting privileges on a
table, this would be the table name.
user: The name of the user that will be granted these privileges.
Example
Let's look at some examples of how to grant privileges on tables in SQL Server.
For example, if you wanted to grant SELECT, INSERT, UPDATE, and DELETE privileges on a table called
employees to a user name smithj, you would run the following GRANT statement:
GRANT SELECT, INSERT, UPDATE, DELETE ON employees TO smithj;
You can also use the ALL keyword to indicate that you wish to grant the ANSI-92 permissions (i.e., SELECT,
INSERT, UPDATE, DELETE, and REFERENCES) to a user named smithj. For example:
GRANT ALL ON employees TO smithj;
If you wanted to grant only SELECT access on the employees table to all users, you could grant the privileges to
the public role. For example:
GRANT SELECT ON employees TO public;
user: The name of the user that will have these privileges revoked.
Example
Let's look at some examples of how to revoke privileges on tables in SQL Server.
For example, if you wanted to revoke DELETE privileges on a table called employees from a user named
anderson, you would run the following REVOKE statement:
REVOKE DELETE ON employees FROM anderson;
If you wanted to revoke ALL ANSI-92 permissions (i.e., SELECT, INSERT, UPDATE, DELETE, and REFERENCES)
on a table for a user named anderson, you could use the ALL keyword as follows:
REVOKE ALL ON employees FROM anderson;
If you had granted SELECT privileges to the public role (i.e., all users) on the employees table and you wanted to
revoke these privileges, you could run the following REVOKE statement:
REVOKE SELECT ON employees FROM public;
Designing a Good Database
A properly designed database provides access to up-to-date, accurate information. Because a correct
design is essential to achieving goals in working with a database, investing the time required to learn the
principles of good design makes sense. Certain principles guide the database design process. The first
principle is that duplicate information (also called redundant data) is bad, because it wastes space and
increases the likelihood of errors and inconsistencies. The second principle is that the correctness and
completeness of information is important. If your database contains incorrect information, any reports that
pull information from the database will also contain incorrect information. As a result, any decisions you make
that are based on those reports will then be misinformed.
simple like "The customer database keeps a list of customer information for the purpose of producing
mailings and reports." If the database is more complex or is used by many people, as often occurs in a
corporate setting, the purpose could easily be a paragraph or more and should include when and how each
person will use the database. The idea is to have a well-developed mission statement that can be referred to
throughout the design process. Having such a statement helps you focus on your goals when you make
decisions.
A key point to remember is that you should break each piece of information into its smallest useful parts. In
the case of a name, to make the last name readily available, you will break the name into two parts — First
Name and Last Name. To sort a report by last name, for example, it helps to have the customer's last name
stored separately. In general, if you want to sort, search, calculate, or report based on an item of information,
you should put that item in its own field.
Think about the questions you might want the database to answer. For instance, how many sales of your
featured product did you close last month? Where do your best customers live? Who is the supplier for your
best-selling product? Anticipating these questions helps you zero in on additional items to record.
The major entities shown here are the products, the suppliers, the customers, and the orders. Therefore, it
makes sense to start out with these four tables: one for facts about products, one for facts about suppliers, one
for facts about customers, and one for facts about orders. Although this doesn’t complete the list, it is a good
starting point. You can continue to refine this list until you have a design that works well.
When you first review the preliminary list of items, you might be tempted to place them all in a single table,
instead of the four shown in the preceding illustration. You will learn here why that is a bad idea. Consider for
a moment, the table shown here:
In this case, each row contains information about both the product and its supplier. Because you can have
many products from the same supplier, the supplier name and address information has to be repeated many
times. This wastes disk space. Recording the supplier information only once in a separate Suppliers table, and
then linking that table to the Products table, is a much better solution.
A second problem with this design comes about when you need to modify information about the supplier. For
example, suppose you need to change a supplier's address. Because it appears in many places, you might
accidentally change the address in one place but forget to change it in the others. Recording the supplier’s
address in only one place solves the problem. When you design your database, always try to record each fact
just once. If you find yourself repeating the same information in more than one place, such as the address for a
particular supplier, place that information in a separate table.
Finally, suppose there is only one product supplied by XYZ, and you want to delete the product, but retain the
supplier name and address information. How would you delete the product record without also losing the
supplier information? You can't. Because each record contains facts about a product, as well as facts about a
supplier, you cannot delete one without deleting the other. To keep these facts separate, you must split the
one table into two: one table for product information, and another table for supplier information. Deleting a
product record should delete only the facts about the product, not the facts about the supplier.
Once you have chosen the subject that is represented by a table, columns in that table should store facts only
about the subject. For instance, the product table should store facts only about products. Because the supplier
address is a fact about the supplier, and not a fact about the product, it belongs in the supplier table.
❖ Turning information items into columns
To determine the columns in a table, decide what information you need to track about the subject recorded in
the table. For example, for the Customers table, Name, Address, City-State-Zip, Send e-mail, Salutation and E-
mail address comprise a good starting list of columns. Each record in the table contains the same set of
columns, so you can store Name, Address, City-State-Zip, Send e-mail, Salutation and E-mail address
information for each record. For example, the address column contains customers’ addresses. Each record
contains data about one customer, and the address field contains the address for that customer.
Once you have determined the initial set of columns for each table, you can further refine the columns. For
example, it makes sense to store the customer name as two separate columns: first name and last name, so
that you can sort, search, and index on just those columns. Similarly, the address actually consists of five
separate components, address, city, state, postal code, and country/region, and it also makes sense to store
them in separate columns. If you want to perform a search, filter or sort operation by state, for example, you
need the state information stored in a separate column.
You should also consider whether the database will hold information that is of domestic origin only, or
international, as well. For instance, if you plan to store international addresses, it is better to have a Region
column instead of State, because such a column can accommodate both domestic states and the regions of
other countries/regions. Similarly, Postal Code makes more sense than Zip Code if you are going to store
international addresses.
The following list shows a few tips for determining your columns.
• Don’t include calculated data
In most cases, you should not store the result of calculations in tables. Instead, you can have Access perform
the calculations when you want to see the result. For example, suppose there is a Products On Order report
that displays the subtotal of units on order for each category of product in the database. However, there is no
Units On Order subtotal column in any table. Instead, the Products table includes a Units On Order column that
stores the units on order for each product. Using that data, Access calculates the subtotal each time you print
the report. The subtotal itself should not be stored in a table.
• Store information in its smallest logical parts
You may be tempted to have a single field for full names, or for product names along with product
descriptions. If you combine more than one kind of information in a field, it is difficult to retrieve individual
facts later. Try to break down information into logical parts; for example, create separate fields for first and
last name, or for product name, category, and description.
❖ Specifying primary keys
Each table should include a column or set of columns that uniquely identifies each row stored in the table. This
is often a unique identification number, such as an employee ID number or a serial number. In database
terminology, this information is called the primary key of the table. Access uses primary key fields to quickly
associate data from multiple tables and bring the data together for you.
If you already have a unique identifier for a table, such as a product number that uniquely identifies each
product in your catalog, you can use that identifier as the table’s primary key — but only if the values in this
column will always be different for each record. You cannot have duplicate values in a primary key. For
example, don’t use people’s names as a primary key, because names are not unique. You could easily have two
people with the same name in the same table.
A primary key must always have a value. If a column's value can become unassigned or unknown (a missing
value) at some point, it can't be used as a component in a primary key.
You should always choose a primary key whose value will not change. In a database that uses more than one
table, a table’s primary key can be used as a reference in other tables. If the primary key changes, the change
must also be applied everywhere the key is referenced. Using a primary key that will not change reduces the
chance that the primary key might become out of sync with other tables that reference it.
Often, an arbitrary unique number is used as the primary key. For example, you might assign each order a
unique order number. The order number's only purpose is to identify an order. Once assigned, it never
changes.
If you don’t have in mind a column or set of columns that might make a good primary key, consider using a
column that has the AutoNumber data type. When you use the AutoNumber data type, Access automatically
assigns a value for you. Such an identifier is factless; it contains no factual information describing the row that
it represents. Factless identifiers are ideal for use as a primary key because they do not change. A primary key
that contains facts about a row — a telephone number or a customer name, for example — is more likely to
change, because the factual information itself might change.
A column set to the AutoNumber data type often makes a good primary key: no two product IDs are the
same.
In some cases, you may want to use two or more fields that, together, provide the primary key of a table. For
example, an Order Details table that stores line items for orders would use two columns in its primary key:
Order ID and Product ID. When a primary key employs more than one column, it is also called a composite
key.
For the product sales database, you can create an AutoNumber column for each of the tables to serve as
primary key: ProductID for the Products table, OrderID for the Orders table, CustomerID for the Customers
table, and SupplierID for the Suppliers table.
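Access's AutoNumber corresponds roughly to an IDENTITY column in SQL Server; a sketch for the Products table (the other three tables follow the same pattern, and the extra columns here are illustrative):

CREATE TABLE Products (
    ProductID  INT IDENTITY(1,1) PRIMARY KEY,  -- factless, automatically assigned
    Name       NVARCHAR(100) NOT NULL,
    Price      DECIMAL(9,2) NULL,
    SupplierID INT NULL);  -- foreign key to Suppliers, discussed below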
Information in a single form can come from several tables at once: the Customers table, the Employees table, the
Orders table, the Products table, and the Order Details table.
Access is a relational database management system. In a relational database, you divide your information into
separate, subject-based tables. You then use table relationships to bring the information together as needed.
To represent a one-to-many relationship in your database design, take the primary key on the "one" side of
the relationship and add it as an additional column or columns to the table on the "many" side of the
relationship. In this case, for example, you add the Supplier ID column from the Suppliers table to the Products
table. Access can then use the supplier ID number in the Products table to locate the correct supplier for each
product.
The Supplier ID column in the Products table is called a foreign key. A foreign key is another table’s primary
key. The Supplier ID column in the Products table is a foreign key because it is also the primary key in the
Suppliers table.
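Declaring that pairing in SQL is a one-line constraint; a sketch assuming Products and Suppliers tables like those sketched earlier:

ALTER TABLE Products
ADD CONSTRAINT FK_Products_Suppliers
    FOREIGN KEY (SupplierID) REFERENCES Suppliers (SupplierID);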
You provide the basis for joining related tables by establishing pairings of primary keys and foreign keys. If
you are not sure which tables should share a common column, identifying a one-to-many relationship ensures
that the two tables involved will, indeed, require a shared column.
Each record in the Order Details table represents one line item on an order. The Order Details table’s primary
key consists of two fields — the foreign keys from the Orders and the Products tables. Using the Order ID field
alone doesn’t work as the primary key for this table, because one order can have many line items. The Order
ID is repeated for each line item on an order, so the field doesn’t contain unique values. Using the Product ID
field alone doesn’t work either, because one product can appear on many different orders. But together, the
two fields always produce a unique value for each record.
In the product sales database, the Orders table and the Products table are not related to each other directly.
Instead, they are related indirectly through the Order Details table. The many-to-many relationship between
orders and products is represented in the database by using two one-to-many relationships:
• The Orders table and Order Details table have a one-to-many relationship. Each order can have more
than one line item, but each line item is connected to only one order.
• The Products table and Order Details table have a one-to-many relationship. Each product can have
many line items associated with it, but each line item refers to only one product.
From the Order Details table, you can determine all of the products on a particular order. You can also
determine all of the orders for a particular product.
After incorporating the Order Details table, the list of tables and fields might look something like this:
❖ Creating a one-to-one relationship
Another type of relationship is the one-to-one relationship. For instance, suppose you need to record some
special supplementary product information that you will need rarely or that only applies to a few products.
Because you don't need the information often, and because storing the information in the Products table
would result in empty space for every product to which it doesn’t apply, you place it in a separate table. Like
the Products table, you use the ProductID as the primary key. The relationship between this supplemental
table and the Product table is a one-to-one relationship. For each record in the Product table, there exists a
single matching record in the supplemental table. When you do identify such a relationship, both tables must
share a common field.
When you detect the need for a one-to-one relationship in your database, consider whether you can put the
information from the two tables together in one table. If you don’t want to do that for some reason, perhaps
because it would result in a lot of empty space, the following list shows how you would represent the
relationship in your design:
• If the two tables have the same subject, you can probably set up the relationship by using the same
primary key in both tables.
• If the two tables have different subjects with different primary keys, choose one of the tables (either
one) and insert its primary key in the other table as a foreign key.
Determining the relationships between tables helps you ensure that you have the right tables and columns.
When a one-to-one or one-to-many relationship exists, the tables involved need to share a common column or
columns. When a many-to-many relationship exists, a third table is needed to represent the relationship.
Trying out your design with sample data helps highlight potential problems — for example, you might need to add a column that you forgot to insert
during your design phase, or you may have a table that you should split into two tables to remove duplication.
See if you can use the database to get the answers you want. Create rough drafts of your forms and reports
and see if they show the data you expect. Look for unnecessary duplication of data and, when you find any,
alter your design to eliminate it.
As you try out your initial database, you will probably discover room for improvement. Here are a few things
to check for:
• Did you forget any columns? If so, does the information belong in the existing tables? If it is
information about something else, you may need to create another table. Create a column for every
information item you need to track. If the information can’t be calculated from other columns, it is
likely that you will need a new column for it.
• Are any columns unnecessary because they can be calculated from existing fields? If an information
item can be calculated from other existing columns — a discounted price calculated from the retail
price, for example — it is usually better to do just that, and avoid creating a new column.
• Are you repeatedly entering duplicate information in one of your tables? If so, you probably need to
divide the table into two tables that have a one-to-many relationship.
• Do you have tables with many fields, a limited number of records, and many empty fields in individual
records? If so, think about redesigning the table so it has fewer fields and more records.
• Has each information item been broken into its smallest useful parts? If you need to report, sort,
search, or calculate on an item of information, put that item in its own column.
• Does each column contain a fact about the table's subject? If a column does not contain information
about the table's subject, it belongs in a different table.
• Are all relationships between tables represented, either by common fields or by a third table? One-to-
one and one-to- many relationships require common columns. Many-to-many relationships require a
third table.
Without normalization, it becomes difficult to handle and update the database without facing data loss.
Insertion, update, and deletion anomalies are very frequent if the database is not normalized.
Anomalies in DBMS:
If a database design is not perfect, it may contain anomalies, which are like a bad dream for any database
administrator. Managing a database with anomalies is next to impossible.
There are three types of anomalies that occur when the database is not normalized. These are –
Insertion anomaly
Update anomaly
Deletion anomaly
Let’s take an example to understand this.
Example: Suppose a manufacturing company stores the employee details in a table named employee that has
four attributes: emp_id for storing employee’s id, emp_name for storing employee’s name, emp_address for
storing employee’s address and emp_dept for storing the department details in which the employee works. At
some point of time the table looks like this:
emp_id   emp_name   emp_address   emp_dept
101      Rick       Delhi         D001
101      Rick       Delhi         D002
123      Maggie     Agra          D890
166      Glenn      Chennai       D900
The above table is not normalized. We will see the problems that we face when a table is not normalized.
Update anomaly:
In the above table we have two rows for employee Rick as he belongs to two departments of the company. If
we want to update the address of Rick then we have to update the same in two rows or the data will become
inconsistent. If somehow, the correct address gets updated in one department but not in other then as per the
database, Rick would be having two different addresses, which is not correct and would lead to inconsistent
data.
Insert anomaly:
Suppose a new employee joins the company, who is under training and currently not assigned to any
department then we would not be able to insert the data into the table if emp_dept field doesn’t allow nulls.
Delete anomaly:
Suppose that at some point the company closes department D890. Deleting the rows that have emp_dept as
D890 would also delete the information of employee Maggie, since she is assigned only to this department.
Normalization Rule:
Normalization rule are divided into following normal form.
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF (Boyce-Codd Normal Form)
As per First Normal Form, no two rows of data may contain repeating groups of information; each set of
columns must have a unique value, such that multiple columns cannot be used to fetch the same row. Each
table should be organized into rows, and each row should have a primary key that distinguishes it as unique.
The primary key is usually a single column, but sometimes more than one column can be combined to create a
single primary key. For example, consider a table which is not in First Normal Form:
Student Table :
Student Age Subject
Adam 15 Biology, Maths
Alex 14 Maths
Stuart 17 Maths
In First Normal Form, no row may have a column in which more than one value is saved (such as "Biology,
Maths", separated with commas). Rather, we must separate such data into multiple rows.
If we follow Second Normal Form, then every non-prime attribute should be fully functionally dependent on the
prime key attributes. That is, if X → A holds, then there should not be any proper subset Y of X for which Y → A also
holds. (Y is a proper subset of X when every element of Y is in X but Y ≠ X; for example, if X = {1,2,3,4}, then
Y = {1,2} is a proper subset of X.)
We see here in Student_Project relation that the prime key attributes are Stu_ID and Proj_ID. According to the rule,
non-key attributes, i.e. Stu_Name and Proj_Name must be dependent upon both and not on any of the prime key
attribute individually. But we find that Stu_Name can be identified by Stu_ID and Proj_Name can be identified by
Proj_ID independently. This is called partial dependency, which is not allowed in Second Normal Form.
We break the relation in two — one holding Stu_ID with Stu_Name and one holding Proj_ID with Proj_Name — so no partial dependency remains.
Example:
As per the Second Normal Form there must not be any partial dependency of any column on primary key. It means
that for a table that has concatenated primary key, each column in the table that is not part of the primary key
must depend upon the entire concatenated key for its existence. If any column depends only on one part of the
concatenated key, then the table fails Second normal form.
In example of First Normal Form there are two rows for Adam, to include multiple subjects that he has opted for.
While this is searchable, and follows First normal form, it is an inefficient use of space. Also in the above Table in
First Normal Form, while the candidate key is {Student, Subject}, Age of Student only depends on Student column,
which is incorrect as per Second Normal Form. To achieve second normal form, it would be helpful to split out the
subjects into an independent table, and match them up using the student names as foreign keys.
The new Student table (table 1: Student) will be:
Student  Age
Adam     15
Alex     14
Stuart   17
In the Student table the candidate key will be the Student column, because the only other column, Age, is dependent on it.
The new Subject table introduced for 2NF (table 2: Subject) will be:
Student Subject
Adam Biology
Adam Maths
Alex Maths
Stuart Maths
In the Subject table the candidate key will be the {Student, Subject} combination. Now both of the above tables
qualify for Second Normal Form and will never suffer from update anomalies. (There are a few complex cases in
which a table in Second Normal Form still suffers update anomalies; Third Normal Form handles those
scenarios.)
We find that in the above Student_detail relation, Stu_ID is the key and the only prime attribute. City can be
identified by Stu_ID as well as by Zip itself. Neither Zip is a superkey nor is City a prime attribute. Additionally,
Stu_ID → Zip → City, so there exists a transitive dependency.
To bring this relation into third normal form, we break the relation into two relations as follows −
Example 1:
Third Normal Form requires that every non-prime attribute of a table depend on the primary key directly; in other words, no non-prime attribute may be determined by another non-prime attribute. Any such transitive functional dependency must be removed, and the table must also be in Second Normal Form.
Employee Table:
emp_id emp_name emp_zip emp_state emp_city emp_district
1101 Lilly 292008 UK Pauri Bhagwan
1201 Steve 222999 MP Gwalior Ratan
In this table emp_id is the primary key, but emp_state, emp_city and emp_district depend upon emp_zip. The dependency between emp_zip and those fields is called a transitive dependency. Hence, to apply 3NF, we move emp_state, emp_city and emp_district to a new table with emp_zip as the primary key, leaving the Employee table with just emp_id, emp_name and emp_zip.
Employee_zip table:
emp_zip emp_state emp_city emp_district
282005 UP Agra Dayal Bagh
222008 TN Chennai M-City
282007 TN Chennai Urrapakkam
292008 UK Pauri Bhagwan
222999 MP Gwalior Ratan
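To see that this 3NF split loses no information, here is a minimal Python sketch (illustrative only; it uses the two employee rows recoverable from the example above) that joins the decomposed tables back together:

# Decomposed tables from the 3NF example above.
employee = [
    {"emp_id": 1101, "emp_name": "Lilly", "emp_zip": "292008"},
    {"emp_id": 1201, "emp_name": "Steve", "emp_zip": "222999"},
]
employee_zip = {
    "292008": {"emp_state": "UK", "emp_city": "Pauri", "emp_district": "Bhagwan"},
    "222999": {"emp_state": "MP", "emp_city": "Gwalior", "emp_district": "Ratan"},
}

# A natural join on emp_zip rebuilds the original wide table, so the 3NF
# decomposition is lossless, while each zip's details are now stored only once.
for emp in employee:
    print({**emp, **employee_zip[emp["emp_zip"]]})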
Boyce-Codd Normal Form (BCNF) is a stricter extension of Third Normal Form. BCNF states that −
• for any non-trivial functional dependency X → A, X must be a super-key.
In the decomposed design, Stu_ID is the super-key in the relation Student_Detail and Zip is the super-key in the relation ZipCodes. So,
Stu_ID → Stu_Name, Zip
and
Zip → City
which confirms that both relations are in BCNF.
BCNF is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is stricter than 3NF: a table complies with BCNF if it is in 3NF and, for every functional dependency X → Y, X is a super key of the table.
Example: Suppose there is a company wherein employees work in more than one department. They store the data in a single table:
emp_id emp_nationality emp_dept dept_type dept_no_of_emp
Here the candidate key is {emp_id, emp_dept}, yet emp_id → emp_nationality and emp_dept → {dept_type, dept_no_of_emp}, so neither determinant is a super key and the table violates BCNF. To bring it into BCNF, it is decomposed into three tables:
emp_nationality table:
emp_id emp_nationality
1001 Austrian
1002 American
emp_dept table:
emp_dept dept_type dept_no_of_emp
Production and planning D001 200
stores D001 250
design and technical support D134 100
Purchasing department D134 600
emp_dept_mapping table:
emp_id emp_dept
1001 Production and planning
1001 stores
1002 design and technical support
1002 Purchasing department
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This design is now in BCNF, since in both functional dependencies the left-hand side is a key.
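The BCNF condition can be checked mechanically. The following Python sketch (an illustration; it assumes the keys listed above are the only candidate keys) verifies that, in every table, each applicable functional dependency has a superkey on its left-hand side:

tables = {
    "emp_nationality":  {"attrs": {"emp_id", "emp_nationality"},
                         "key": {"emp_id"}},
    "emp_dept":         {"attrs": {"emp_dept", "dept_type", "dept_no_of_emp"},
                         "key": {"emp_dept"}},
    "emp_dept_mapping": {"attrs": {"emp_id", "emp_dept"},
                         "key": {"emp_id", "emp_dept"}},
}

fds = [
    ({"emp_id"}, {"emp_nationality"}),
    ({"emp_dept"}, {"dept_type", "dept_no_of_emp"}),
]

for name, table in tables.items():
    for lhs, rhs in fds:
        # An FD applies to a table only if all its attributes appear there.
        if (lhs | rhs) <= table["attrs"]:
            # BCNF: the determinant must contain a candidate key (be a superkey).
            assert table["key"] <= lhs, name + " violates BCNF"
print("All three tables are in BCNF for the given FDs.")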
Functional Dependencies:
A functional dependency is a relationship between two attributes, typically between the PK and other non-key attributes within the table. For any relation R, attribute Y is functionally dependent on attribute X (usually the PK) if, for every valid instance of X, that value of X uniquely determines the value of Y.
X ———–> Y
The left-hand side of the FD is called the determinant, and the right-hand side is the dependent.
Examples:
SID ———-> Name, Address, Birthdate
SID determines Name, Address and Birthdate: given SID, we can determine any of the other attributes within the table.
SID, Course ———> DateCompleted
SID and Course together determine DateCompleted. Functional dependencies work for a composite PK in the same way.
ISBN ———–> Title
ISBN determines title.
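A functional dependency can also be tested against concrete data: X → Y holds only if rows that agree on X always agree on Y. The Python sketch below illustrates this; the sample rows and values are invented for demonstration:

def fd_holds(rows, x_cols, y_cols):
    """Return True if the functional dependency x_cols -> y_cols holds in rows."""
    seen = {}
    for row in rows:
        x = tuple(row[c] for c in x_cols)
        y = tuple(row[c] for c in y_cols)
        if x in seen and seen[x] != y:
            return False      # same determinant value, different dependent value
        seen[x] = y
    return True

rows = [
    {"SID": 1, "Name": "Maya", "Course": "DB", "DateCompleted": "2024-01-10"},
    {"SID": 1, "Name": "Maya", "Course": "SQL", "DateCompleted": "2024-03-02"},
    {"SID": 2, "Name": "Hari", "Course": "DB", "DateCompleted": "2024-02-15"},
]

print(fd_holds(rows, ["SID"], ["Name"]))                     # True
print(fd_holds(rows, ["SID"], ["DateCompleted"]))            # False
print(fd_holds(rows, ["SID", "Course"], ["DateCompleted"]))  # True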
Consider a sample table with columns A, B, C, D and E. From our understanding of primary keys, A is a primary key.
Since the values of E are always the same, it follows that: A → E, B → E, C → E, D → E.
However, we cannot generally summarize the above by ABCD → E. In general: A → E, B → E, AB → E.
Other observations:
• Combinations of BC are unique, therefore BC → ADE.
• Combinations of BD are unique, therefore BD → ACE.
• If C values match, so do D values, therefore C → D; however, D values don't determine C values, so D does not determine C.
Looking at the actual data makes it much clearer which attributes are dependents and which are determinants.
Inference Rules
Armstrong’s axioms are a set of axioms (or, more precisely, inference rules) used to infer all the functional
dependencies on a relational database. They were developed by William W. Armstrong.
Let R(U) be a relation scheme over the set of attributes U. We will use the letters X, Y, Z to represent any subsets of U and, for short, write XY for the union of two sets of attributes X and Y instead of the usual X ∪ Y.
For example, if R = {1,2,3,4} and S = {3,4,5}, then R ∪ S = {1,2,3,4,5}.
Axiom of reflexivity: (Partial dependency)
If Y is a subset of X, then X determines Y (X → Y). A dependency on only part of a composite key is a partial dependency. For example:
StudentNo, course —> studentName, address, city, prov, pc, grade, dateCompleted
This situation is not desirable, because every non-key attribute has to be fully dependent on the PK; here the student information is only 'partially' dependent on the PK, namely on StudentNo alone.
To fix this problem, we need to break the table down into two as follows:
StudentNo, course, grade, dateCompleted
StudentNo, studentName, address, city, prov, pc
Axiom of transitivity
If X determines Y and Y determines Z, then X must also determine Z.
Consider a table with attributes StudentNo, studentName, address, city, prov, pc, ProgramID, ProgramName, where StudentNo —> ProgramID (X —> Y) and ProgramID —> ProgramName (Y —> Z).
This situation is not desirable, because a non-key attribute depends on another non-key attribute.
To fix this problem, we need to break this table into two: one to hold information about the student and the other to hold information about the program. However, we still need to leave a FK in the student table, so that we can determine which program the student is enrolled in.
StudentNo —> studentName, address, city, prov, pc, ProgramID
ProgramID —> ProgramName
Additional rules:
Union
If X determines Y and X determines Z, then X determines the combination of Y and Z (X → YZ).
Decomposition
If X determines Y and Z (X → YZ), then X determines Y and X determines Z separately. This is the reverse of union. If you have a table that appears to contain two entities determined by the same PK, consider breaking it up into two tables.
Dependency Diagram
A dependency diagram illustrates the various dependencies that may exist in a non-normalized table. The following dependencies are identified:
ProjectNo and EmpNo combined form the PK.
Partial Dependencies:
ProjectNo —> ProjName
EmpNo —> EmpName, DeptNo
Transitive Dependency:
DeptNo —> DeptName
Full Dependency:
ProjectNo, EmpNo —> HrsWork
Remember:
PD – Partial Dependency
TD – Transitive Dependency
FD – Full Dependency
A functional dependency with multiple attributes is shown below, for the functional dependency
Order#, Prod# —> Quantity.
A derived functional dependency involving a partial key dependency is shown in the figure below.
The arrow connected to the outer rectangle, which represents Order#, Prod# —> Product, can be deleted without loss of information.
A derived functional dependency involving a transitive dependency is shown in the figure below.
The arrow which represents Order# —> Supplier can be deleted without loss of information.
Closure Of Functional Dependency
The closure of functional dependency means the complete set of all attributes that can be functionally derived from a given set of functional dependencies using the inference rules known as Armstrong's rules.
If F is a set of functional dependencies, then its closure is denoted {F}+; similarly, the closure of an attribute set X is denoted {X}+.
There are three steps to calculate closure of functional dependency. These are:
Step-1 : Add the attributes which are present on Left Hand Side in the original functional dependency.
Step-2 : Now, add the attributes present on the Right Hand Side of the functional dependency.
Step-3 : With the help of attributes present on Right Hand Side, check the other attributes that can be derived from
the other given functional dependencies. Repeat this process until all the possible attributes which can be derived
are added in the closure.
The Algorithm
✓ The procedure described above can be generalized to an algorithm (a code sketch follows the worked examples below). Assume we are given the set of functional dependencies FD and a set of attributes X. The algorithm is as follows:
✓ Add the attributes contained in the attribute set X to the result set X+.
✓ Add to the result set X+ those attributes that can be functionally determined from the attributes already contained in the result set.
✓ Repeat step 2 until no more attributes can be added to the result set X+.
Example 1
We are given the relation R(A, B, C, D, E). This means that the table R has five columns: A, B, C, D, and E. We
are also given the set of functional dependencies: {A->B, B->C, C->D, D->E}.
What is {A}+?
• First, we add A to {A}+.
• What columns can be determined given A? We have A -> B, so we can determine B. Therefore, {A}+ is now
{A, B}.
• What columns can be determined given A and B? We have B -> C in the functional dependencies, so we can
determine C. Therefore, {A}+ is now {A, B, C}.
• Now, we have A, B, and C. What other columns can we determine? Well, we have C -> D, so we can add D
to {A}+.
• Now, we have A, B, C, and D. Can we add anything else to it? Yes, since D -> E, we can add E to {A}+.
• We have used all of the columns in R and all of the functional dependencies. {A}+ = {A, B, C, D, E}.
Example 2
Let’s look at another example. We are given R(A, B, C, D, E, F). The functional dependencies are {AB->C, BC->AD, D->E, CF->B}. What is {A, B}+?
• We start with {A, B}.
• What columns can we determine, given A and B? We have AB -> C, so we can add C to {A, B}+.
• We now have A, B, and C. What other columns can we determine? We have BC -> AD. We already have A in
{A, B}+, so we can add D.
• So, we now have A, B, C, and D. What else can we add? We have D -> E, so we can add E to {A, B}+.
• Now {A, B}+ is {A, B, C, D, E}. Can we add anything else? No. We have one more functional dependency in
our set that we did not use: CF -> B. We can’t use this dependency because F is not in {A, B}+.
• Thus, {A, B}+ is {A, B, C, D, E}.
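The closure algorithm sketched earlier translates directly into code. The following Python function (a minimal sketch of the three-step procedure above) reproduces both worked examples:

def closure(attrs, fds):
    """Compute X+ for the attribute set attrs under FDs given as (lhs, rhs) pairs."""
    result = set(attrs)                  # step 1: X+ starts as X itself
    changed = True
    while changed:                       # step 3: repeat until nothing changes
        changed = False
        for lhs, rhs in fds:
            # step 2: if the whole determinant is already in X+, then
            # everything it determines belongs in X+ as well
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Example 1: R(A, B, C, D, E) with {A->B, B->C, C->D, D->E}
fds1 = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"D"}), ({"D"}, {"E"})]
print(sorted(closure({"A"}, fds1)))        # ['A', 'B', 'C', 'D', 'E']

# Example 2: R(A, B, C, D, E, F) with {AB->C, BC->AD, D->E, CF->B}
fds2 = [({"A", "B"}, {"C"}), ({"B", "C"}, {"A", "D"}),
        ({"D"}, {"E"}), ({"C", "F"}, {"B"})]
print(sorted(closure({"A", "B"}, fds2)))   # ['A', 'B', 'C', 'D', 'E'] (CF->B never fires)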
Prime and non-prime attributes
Attributes which are part of any candidate key of a relation are called prime attributes; the others are non-prime attributes. For example, STUD_NO in the STUDENT relation is a prime attribute and the others are non-prime attributes.
Multivalued Dependency:
Multivalued dependency is a generalization of the functional dependency concept that significantly helps in designing and optimizing a relational database structure. Let R(X1,...,Xm, Y1,...,Yn, Z1,...,Zr) be a relation with m + n + r column names. For notational convenience, we write X for {X1,...,Xm}, Y for {Y1,...,Yn} and Z for {Z1,...,Zr}, so the relation is R(X, Y, Z).
A multivalued dependency is a statement of the form X →→ Y, where X and Y are sets of attributes and Z is the set of all attributes of R that are in neither X nor Y. The multivalued dependency X →→ Y holds in R if, for all tuples r1 and r2 in R with r1[X] = r2[X], there are tuples r3 and r4 in R such that r3[X] = r1[X], r3[Y] = r1[Y], r3[Z] = r2[Z]; and r4[X] = r1[X], r4[Y] = r2[Y], r4[Z] = r1[Z].
X →→ Y holds in R if and only if X →→ Y − X holds in R. When the sets X, Y and Z form a partition of U, it is convenient to write a tuple r of R as (x, y, z), where x, y and z denote the projections of r onto X, Y and Z. Letters from the start of the alphabet (A, B, C, D, ...) denote single attributes and letters such as X, Y, Z denote sets of attributes; XY is the union of X and Y, and a string of attributes A1 A2 ... An denotes the set {A1, A2, ..., An}.
Define Y_xz to be {y : (x, y, z) ∈ R}; Y_xz is nonempty if and only if x and z appear together in a tuple of R. The multivalued dependency X →→ Y is said to hold for R(X, Y, Z) if Y_xz depends only on x, that is, if Y_xz = Y_xz' for every x, z, z' such that Y_xz and Y_xz' are nonempty. Multivalued dependencies provide a necessary and sufficient condition for a relation to be decomposable into two of its projections without loss of information.
For instance, we can decompose R(X, Y, Z) into R1(X, Y) and R2(X, Z), where R is the set of tuples (x, y, z) such that (x, y) is a tuple of R1 and (x, z) is a tuple of R2. In a database, a Project_Employee_Task entity with attributes {Project_Name, Employee_Name, Task_Name} can be decomposed into a Project_Employee entity {Project_Name, Employee_Name} and a Project_Task entity {Project_Name, Task_Name}.
If X and Y are disjoint and the functional dependency X → Y holds for a relation R, then the multivalued dependency X →→ Y also holds for R. By the multivalued dependency theorem, X →→ Y holds for the relation R(X, Y, Z) if and only if R is the join of its projections R1(X, Y) and R2(X, Z).
R(X, Y, Z) is the join of its projections R1(X, Y) and R2(X, Z) if and only if the following condition holds: whenever (x, y, z) and (x, y', z') are tuples of R, then (x, y', z) and (x, y, z') are tuples of R as well. Since the right-hand side of this "if and only if" is symmetric in the roles of Y and Z, it follows immediately that X →→ Y holds for the relation R(X, Y, Z) if and only if X →→ Z holds.
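The theorem can be demonstrated in code: a relation satisfies X →→ Y exactly when it equals the join of its projections R1(X, Y) and R2(X, Z). The Python sketch below uses the Project/Employee/Task decomposition mentioned above, with invented sample rows:

# R(Project, Employee, Task): within project P1, employees and tasks are
# independent, so Project ->> Employee holds. Sample rows are invented.
R = {("P1", "Asha", "Design"), ("P1", "Asha", "Coding"),
     ("P1", "Bibek", "Design"), ("P1", "Bibek", "Coding")}

r1 = {(x, y) for x, y, z in R}          # projection R1(Project, Employee)
r2 = {(x, z) for x, y, z in R}          # projection R2(Project, Task)

# natural join of R1 and R2 on the shared Project column
joined = {(x, y, z) for x, y in r1 for x2, z in r2 if x == x2}
print(joined == R)    # True: the decomposition is lossless, so the MVD holds

# Drop one tuple and the MVD no longer holds: the join re-creates the
# dropped tuple, so the two-way decomposition becomes lossy.
R2 = R - {("P1", "Bibek", "Coding")}
r1 = {(x, y) for x, y, z in R2}
r2 = {(x, z) for x, y, z in R2}
joined = {(x, y, z) for x, y in r1 for x2, z in r2 if x == x2}
print(joined == R2)   # False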
Join Dependency
Multivalued dependencies support lossless decomposition of a relation R based on trivial multivalued dependencies. For instance, a relation R(A B C) is decomposed into relations R1(A B) and R2(A C) based on the trivial multivalued dependency A →→ B.
Join dependency supports lossless decomposition of a relation R based on nontrivial multivalued dependencies. For instance, a relation R(A B C) is decomposed into relations R1(A B), R2(B C) and R3(A C) based on nontrivial multivalued dependencies.
Let R = {R1, R2, ..., Rn} be a set of relation schemes over a universe of attributes U. The relation r(R) satisfies the join dependency *[R1, R2, ..., Rn] if r decomposes losslessly onto R1, R2, ..., Rn.
A join dependency *[R1, R2, ..., Rn] over R is trivial if it is satisfied by every relation r(R).
Fourth Normal Form
A relation R is in Fourth Normal Form (4NF) if and only if both of the following conditions are satisfied:
1. R is already in 3NF or BCNF.
2. R contains no non-trivial multi-valued dependencies.
What is Multi-Valued Dependency (MVD)?
MVD is a dependency in which one attribute value is potentially a 'multi-valued fact' about another. Consider the following table:
Customer_Name Address
Raj New Delhi
Raj Amritsar
Suneet Amritsar
In this example, 'Address' is a multi-valued fact about 'Customer_Name' and the converse is also true. For example, the attribute 'Address' takes on the two values 'New Delhi' and 'Amritsar' for the single 'Customer_Name' value 'Raj', and the attribute 'Customer_Name' takes on the values 'Raj' and 'Suneet' for the single 'Address' value 'Amritsar'.
MVD can be defined informally as follows: MVDs occur when two or more independent multi-valued facts about the same attribute occur within the same table. That is, if a relation R has attributes A, B and C, and B and C are multi-valued facts about A, represented as A →→ B and A →→ C, then a multivalued dependency exists only if B and C are independent of each other.
There are two things to note about this definition.
Firstly, in order for a table to contain an MVD, it must have three or more attributes.
Secondly, it is possible to have a table containing two or more attributes which are interdependent multi-valued facts about another attribute. This does not give rise to an MVD; the attributes giving rise to the multi-valued facts must be independent of each other. Consider the following table:
The table lists students, the textbooks they have borrowed, the librarians issuing them and the dates of borrowing. It contains three multi-valued facts about students: the books they have borrowed, the librarians who have issued these books to them, and the dates upon which the books were borrowed. However, these multi-valued facts are not independent of each other: there is clearly an association between librarians, the textbooks they have issued and the dates upon which they issued the books. Therefore, there are no MVDs in the table. Note that there is no redundant information in this table. The fact that student 'Ankit', for example, has borrowed the book 'Mechanics' is recorded twice, but these are different borrowings, one in April and the other in June, and therefore constitute different items of information.
Now consider another example involving Course, Student_name and Text_book.
This table lists students, the courses they attend and the textbooks they use for those courses. The textbooks are prescribed by the authorities for each course; the students have no say in the matter. Clearly the attributes 'Student_name' and 'Text_book' are multi-valued facts about the attribute 'Course'. However, since a student has no influence over the textbooks to be used for a course, these multi-valued facts about courses are independent of each other. Thus the table contains MVDs, which are represented by a double arrow (→→).
In the above database the following MVDs exist:
Course →→ Student_name
Course →→ Text_book
Here, Student_name and Text_book are independent of each other.
Now we can easily check that all the above anomalies of the STUDENT_COURSE_BOOK database are removed. For example, if a new student now joins a course, we have to make only one insertion in the COURSE_STUDENT table, and if a new book is introduced for a course, we again make only a single entry in the COURSE_BOOK table. The modified database thus eliminates the redundancy problem, which also solves the update problems.
In all of the further normal forms discussed so far, non-loss decomposition was achieved by decomposing a single table into two separate tables. Non-loss decomposition is possible because of the availability of the join operator as part of the relational model. In considering 5NF, consideration must be given to tables where non-loss decomposition can only be achieved by decomposition into three or more separate tables. Such decomposition is not always possible, as is shown by the following example.
Consider the table
AGENT_COMPANY_PRODUCT (Agent, Company, Product_Name)
This table lists agents, the companies they work for and the products they sell for those companies. The agents do not necessarily sell all the products supplied by the companies they do business with. An example of this table might be:
The table is necessary in order to show all the information required. Suneet, for example, sells ABC's Nuts and Screws, but not ABC's Bolts. Raj is not an agent for CDE and does not sell ABC's Nuts or Screws. The table is in 4NF because it contains no multi-valued dependency. It does, however, contain an element of redundancy, in that it records the fact that Suneet is an agent for ABC twice; and there is no way of eliminating this redundancy without losing information. Suppose that the table is decomposed into its two projections, P1 and P2.
The redundancy has been eliminated, but the information about which companies make which products, and which of these products they supply to which agents, has been lost. The natural join of these projections over the 'Agent' column is:
The table resulting from this join is spurious, since the asterisked row of the table contains incorrect information.
Now suppose that the original table were decomposed into three tables: the two projections P1 and P2 already shown, and the final possible projection, P3.
If a join is taken of all three projections, first of P1 and P2 with the (spurious) result shown above, and then of this result with P3 over the 'Company' and 'Product_Name' columns, the following table is obtained:
This still contains a spurious row. The order in which the joins are performed makes no difference to the final result. It is simply not possible to decompose the 'AGENT_COMPANY_PRODUCT' table, populated as shown, without losing information. Thus, it has to be accepted that it is not possible to eliminate all redundancies using normalization techniques, because it cannot be assumed that all decompositions will be non-loss.
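The spurious-row behaviour described above can be reproduced in code. Since the original AGENT_COMPANY_PRODUCT rows are not shown in these notes, the Python sketch below assumes sample data consistent with the prose (Suneet sells ABC's Nuts and Screws but not its Bolts; Raj sells only ABC's Bolts; Suneet is also taken to sell CDE's Bolts):

# Assumed sample data consistent with the description above.
acp = {("Suneet", "ABC", "Nuts"), ("Suneet", "ABC", "Screws"),
       ("Suneet", "CDE", "Bolts"), ("Raj", "ABC", "Bolts")}

p1 = {(a, c) for a, c, p in acp}        # P1(Agent, Company)
p2 = {(a, p) for a, c, p in acp}        # P2(Agent, Product)
p3 = {(c, p) for a, c, p in acp}        # P3(Company, Product)

# Join P1 and P2 over Agent, then join the result with P3 over
# (Company, Product).
join12 = {(a, c, p) for a, c in p1 for a2, p in p2 if a == a2}
join123 = {(a, c, p) for a, c, p in join12 if (c, p) in p3}

# One spurious row survives the three-way join, so this populated table
# cannot be decomposed without losing information.
print(join123 - acp)   # {('Suneet', 'ABC', 'Bolts')}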
But now consider the different case where, if an agent is an agent for a company and that company makes a product, then he always sells that product for the company. Under these circumstances, the 'agent_company_product' table is as shown below:
The assumption is that ABC makes both Nuts and Bolts and that CDE makes Bolts only. This table can be decomposed into its three projections without loss of information, as demonstrated below.
All redundancy has been removed. If the natural join of P1 and P2 is taken, the result is:
The spurious row is asterisked. Now, if this result is joined with P3 over the 'Company' and 'Product_Name' columns, the following table is obtained:
This is a correct recomposition of the original table, and non-loss decomposition into the three projections has been achieved. Again, the order in which the joins are performed does not affect the final result. The original table therefore violated 5NF, simply because it was non-loss decomposable into its three projections.
In the first case exemplified above, non-loss decomposition of the 'agent_company_product' table was not possible; in the second it was. If a table is non-loss decomposable, as in the second case, it is said to be in violation of 5NF. The difference, of course, lay in certain semantic properties of the information being represented. These properties were not discoverable simply by looking at the table; they had to be supplemented by further information about the relationship between products, agents and companies.
Detecting that a table violates 5NF is very difficult in practice, and for this reason this normal form has little if any practical application. The theoretical concept of fifth normal form is discussed in the following paragraphs.
Suppose that the statement "the 'agent_company_product' table is equal to the join of its three projections" is to hold true. This is another way of saying that it can be non-loss decomposed into its three projections, and is equivalent to saying:
IF the tuple 'agent X, company Y' appears in P1
AND the tuple 'agent X, product Z' appears in P2
AND the tuple 'company Y, product Z' appears in P3
THEN the row 'agent X, company Y, product Z' must have appeared in 'agent_company_product'.
If the reader cares to re-examine the projections P1, P2 and P3 from the two versions of the table illustrated earlier, it will be seen that the earlier version, which was in 5NF, does not conform to the above rule, whereas the later version, which violated 5NF, does.
The rule is referred to as a Join Dependency, because it holds good only if a table can be reconstituted without loss of information from the join of certain specified projections of it.
The notation used for a join dependency on a table T is:
*(X, Y, ..., Z)
where X, Y, ..., Z are projections of T.
Table T is said to satisfy the above join dependency if it is equal to the join of the projections X, Y, ..., Z.
Thus, the second example given of the table 'agent_company_product' can be said to satisfy the join dependency:
*(P1, P2, P3)
In the discussion of the other further normal forms use was made of the concepts of functional and multi-valued
dependencies. In dealing with 5NF the concept of join dependency has been introduced (in a very informal way).
***Thank You***
Tutor: Biran Limbu
Course detail and pedagogy of
BBM (Bachelor of Business Management) 6th Semester
COM 312: Database Management
Credits: 3
Lecture Hours: 48
Course Objectives
The main objective of this module is to provide strong theoretical and practical knowledge of the
database management system.
Course Description
Database system, Data Abstraction, Data Models, Database users, Entity-Relation Model, Constraints,
E-R Diagrams, Design of E-R Database Schema, Relational Data Model, Structure of Relational
Database, Relational Algebra, Fundamental Operations, Additional Operations, Modifying the
database, Structured Query Language, Data Definition Language, Data Manipulation Language,
Transaction Control Language, Join operations, Integrity Constraints, Assertion, Triggers, Relational
database design issues, Normalization, Database Governance, Database Management, Transaction
Management.
Course Details
Unit 1: Introduction LH 6
Database Management Systems
Purpose of Database Systems
Data Abstraction
Data Models
• The E-R Model
• The Object-Oriented Model
• The Relational Model
• The Network Model
• The Hierarchical Model
• Physical Data Models
Instances and Schemes
Data Independence
Database Administrator
Database Users
Application Architecture (One-tier, Two-tier and n-tier)
Overall Database System Structure and Components
Unit 2: Entity-Relationship Model LH 6
2.1 Entities and Entity Sets
2.2 Relationships and Relationship Sets
2.3 Attributes
2.4 Mapping Constraints
2.5 Keys (Super key, Candidate key and Primary key)
2.5.1 Primary Keys for Entity Sets and Relationship Sets
2.6 The Entity Relationship Diagram
2.7 Reducing E-R Diagrams to Tables
2.7.1 Representation of Strong Entity Sets
2.7.2 Representation of Weak Entity Sets
2.7.3 Representation of Relationship Sets
2.8 Generalization and Specialization
2.9 Aggregation
2.10 Mapping Cardinalities
2.10.1 Representation of Mapping Cardinalities in E-R Diagram
2.11 Use of Entity or Relationship Sets
2.12 Use of Extended E-R Features
2.13 Design of an E-R Database Scheme ( Case study)
Unit 3: Relational Model LH 7
3.1 Structure of Relational Database
3.2 Basic Structure
3.3 Database Scheme
3.4 Keys
3.5 Query Languages
3.6 The Relational Algebra
3.6.1 Fundamental Operations
3.6.2 Formal Definition of Relational Algebra
3.6.3 Additional Operations
3.7 Modifying the Database
3.7.1 Deletion
3.7.2 Insertions
3.7.3 Updating
3.8 Views and View Definition in Relational Algebra
Unit 4: Structured Query Language (SQL) LH 6
4.1 Background
4.2 Data Definition Language
4.2.1 Domain Types in SQL
4.2.2 Schema Definition in SQL
4.3 Data Manipulation Language
4.3.1 The select Clause
4.3.2 The where Clause
4.3.3 The from Clause
4.3.4 The Rename Operation
4.3.5 Tuple Variables
4.3.6 String Operations
4.3.7 Ordering the Display of Tuples
4.3.8 Duplicate Tuples
4.4 Set Operations
4.5 Aggregate Functions
4.6 Null Values
4.7 Nested Subqueries
4.7.1 Set Membership
4.7.2 Set Comparison
4.7.3 Test for Empty Relations
4.7.4 Test for the Absence of Duplicate Tuples
4.8 Derived Relations
4.8.1 Views
4.9 Modification of the Database
4.9.1 Deletion
4.9.2 Insertion
4.9.3 Updates
4.9.4 Update of a View
4.10 Joined Relations
4.10.1 Join types and Conditions
4.11 Embedded SQL
4.12 Dynamic SQL
4.13 Transaction Control Language (Commit, Rollback)
Unit 5: Integrity Constraints LH 3
5.1 Domain Constraints
5.2 Referential Integrity
5.2.1 Basic Concepts
5.2.2 Referential Integrity in the E-R Model
5.2.3 Database Modification
5.2.4 Referential Integrity in SQL
5.3 Assertions
5.4 Triggers
Unit 6: Relational Database Design LH 5
6.1 Pitfalls in Relational DB Design
6.1.1 Representation of Information
6.1.2 Anomalies
6.2 Functional Dependencies
6.2.1 Basic Concepts
6.2.2 Closure of a Set of Functional Dependencies
6.2.3 Closure of Attribute Sets
6.3 Decomposition
6.3.1 Lossless-Join Decomposition
6.3.2 Dependency Preservation
6.4 Normalization
6.4.1 First Normal Form
6.4.2 Second Normal Form
6.4.3 Third Normal Form
6.4.4 Boyce-Codd Normal Form
6.4.5 Comparison of BCNF and 3NF
Unit 7: Data Governance LH 4
7.1 Introduction
7.2 Data governance drivers
7.3 Data governance initiatives
Unit 8: Database Management LH 6
8.1 Data maintenance
8.2 Data quality Management: Data cleansing, data integrity, Data enrichment, Data quality
8.3 Data Security Management: Data access, Data erasure, Data Privacy, Data Security
Unit 9: Transaction Management LH 5
9.1 ACID Properties
9.2 Transaction States
9.2.1 Implementation of Atomicity and Durability
9.2.2 Serializability
9.2.3 Basic Concept of Concurrency Control and Recovery
9.2.4 Locking Protocols
Note:
➢ The students are required to undertake a project work. The project work can be done individually or in a group (at most 4-5 students). The format of the project report is as follows:
o Project Description
o Description of entities or object considered in the project
o Algorithm or Diagram showing description of project
o Conclusion of the project
The project report should be original, and the reproduction of others’ work is strictly
prohibited. Number of pages of the report should be at least 4.
Total Lecture Hours: 48