BBM 6th Sem Database Notes Updated 2024
Database (DB)
Definition: A database is a collection of information that is organized so that it can easily be accessed, managed,
and updated.
Databases and database technology have had a major impact on the growing use of computers. It is fair to say that
databases play a critical role in almost all areas where computers are used, including business, electronic
commerce, social media, engineering, medicine, genetics, law, education, and library science. The word database is
so commonly used that we must begin by defining what a database is. Our initial definition is quite general. A
database is a collection of related data. By data, we mean known facts that can be recorded and that have implicit
meaning. For example, consider the names, telephone numbers, and addresses of the people you know. Nowadays,
this data is typically stored in mobile phones, which have their own simple database software. This data can also be
recorded in an indexed address book or stored on a hard drive, using a personal computer and software such as
Microsoft Access or Excel. This collection of related data with an implicit meaning is a database. The preceding
definition of database is quite general; for example, we may consider the collection of words that make up this
page of text to be related data and hence to constitute a database. However, the common use of the term
database is usually more restricted. A database has the following implicit properties:
■ A database represents some aspect of the real world, sometimes called the mini-world or the universe of
discourse (UoD). Changes to the mini-world are reflected in the database.
■ A database is a logically coherent collection of data with some inherent meaning. A random assortment of data
cannot correctly be referred to as a database.
■ A database is designed, built, and populated with data for a specific purpose. It has an intended group of users
and some preconceived applications in which these users are interested.
In other words, a database has some source from which data is derived, some degree of interaction with events in
the real world, and an audience that is actively interested in its contents. The end users of a database may perform
business transactions (for example, a customer buys a camera) or events may happen (for example, an employee
has a baby) that cause the information in the database to change. In order for a database to be accurate and
reliable at all times, it must be a true reflection of the mini-world that it represents; therefore, changes must be
reflected in the database as soon as possible.
A database may be generated and maintained manually or it may be computerized. For example, a library card
catalog is a database that may be created and maintained manually. A computerized database may be created and
maintained either by a group of application programs written specifically for that task or by a database
management system.
A Database Management System (DBMS) is a computerized system that enables users to create and maintain a
database. The DBMS is a general-purpose software system that facilitates the processes of defining, constructing,
manipulating, and sharing databases among various users and applications. Defining a database involves specifying
the data types, structures, and constraints of the data to be stored in the database.
The database definition or descriptive information is also stored by the DBMS in the form of a database catalog or
dictionary; it is called meta-data. For example, the catalog contains tables describing all the tables in a database
(their names, sizes, and the number of rows in each) and tables describing the columns of each table (their names,
the tables they belong to, and the type of data stored in each column).
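As a small illustration, in a DBMS that exposes the standard information_schema catalog views (MySQL, for example), this meta-data can itself be queried with ordinary SQL; company_db is a hypothetical database name:
-- Tables in the database, with their approximate row counts
SELECT table_name, table_rows
FROM information_schema.tables
WHERE table_schema = 'company_db';
-- Columns of every table and the type of data stored in each
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'company_db';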
Constructing the database is the process of storing the data on some storage medium that is controlled by the
DBMS. Manipulating a database includes functions such as querying the database to retrieve specific data, updating
the database to reflect changes in the mini-world, and generating reports from the data. Sharing a database allows
multiple users and programs to access the database simultaneously.
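A hedged sketch of these three activities, using a hypothetical student table (exact data types vary by product):
-- Defining: specify data types, structure, and constraints
CREATE TABLE student (
    student_id INT PRIMARY KEY,
    name       VARCHAR(50) NOT NULL,
    phone      VARCHAR(15)
);
-- Constructing: store the data on the DBMS-controlled storage medium
INSERT INTO student VALUES (1, 'Sita Rai', '9812345678');
-- Manipulating: query the data and update it to reflect the mini-world
SELECT name, phone FROM student WHERE student_id = 1;
UPDATE student SET phone = '9800000000' WHERE student_id = 1;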
An application program accesses the database by sending queries or requests for data to the DBMS. A query
typically causes some data to be retrieved; a transaction may cause some data to be read and some data to be
written into the database. Other important functions provided by the DBMS include protecting the database
and maintaining it over a long period of time. Protection includes system protection against hardware or software
malfunction (or crashes) and security protection against unauthorized or malicious access. A typical large database
may have a life cycle of many years, so the DBMS must be able to maintain the database system by allowing the
system to evolve as requirements change over time.
Database Management System (DBMS) Vs. File Management System (FMS)
A data management system (DMS) is a combination of computer software, hardware, and information
designed to electronically manipulate data via computer processing. Two types of data management systems
are DBMSs and FMSs. In simple terms, a File Management System (FMS) is a data management system that
allows access to a single file or table at a time. FMSs accommodate flat files that have no relation to other files.
The FMS was the predecessor of the Database Management System (DBMS), which allows access to multiple files
or tables at a time.
File Management System (FMS):
This typical file management system is supported by a conventional operating system. The system stores
permanent records in various files, and it needs different application programs to extract records from, and add
records to, the appropriate files. Before database management systems (DBMSs) came along, organizations usually
stored information in such systems. Keeping organizational information in a file-processing system has a number of
advantages:
• Simpler to use
• Less expensive
• Fits the needs of many small businesses and home users
• Popular FMSs are packaged along with the operating systems of personal computers (e.g., Microsoft
Cardfile and Microsoft Works)
Major Disadvantages:
A file management system also has a number of major disadvantages:
1) Data redundancy and inconsistency:
Because files and application programs are created by different programmers over a long period, the same
information may be duplicated in several files. This redundancy wastes storage space and may lead to data
inconsistency, where the various copies of the same data no longer agree.
2) Difficulty in accessing data:
A conventional file-processing environment does not allow needed data to be retrieved in a convenient and
efficient manner; a new application program has to be written for every new kind of request.
3) Data isolation:
Because data are scattered in various files, and files may be in different formats, writing new application programs
to retrieve the appropriate data is difficult.
4) Integrity problems:
The data values stored in the database must satisfy certain types of consistency constraints. For example, the
balance of a bank account may never fall below a prescribed amount (say, $25). Developers enforce these
constraints in the system by adding appropriate code in the various application programs. However, when new
constraints are added, it is difficult to change the programs to enforce them. The problem is harder when
constraints involve several data items from different files.
5) Atomicity problems:
A computer system, like any other mechanical or electrical device, is subject to failure.
In many applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that existed
prior to the failure. Consider a program to transfer $50 from account A to account B.
If a system failure occurs during the execution of the program, it is possible that the $50 was removed from
account A but was not credited to account B, resulting in an inconsistent database state. This inconsistent state
must be removed. The fund transfer must be atomic; that is, if a failure occurs, any updates already performed must
be undone. It is difficult to ensure atomicity in a conventional file-processing system.
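For contrast, a DBMS makes the transfer atomic with a transaction. A minimal sketch, assuming a hypothetical account table (transaction syntax varies slightly by product):
START TRANSACTION;
UPDATE account SET balance = balance - 50 WHERE account_no = 'A';
UPDATE account SET balance = balance + 50 WHERE account_no = 'B';
COMMIT;
-- If a failure occurs before COMMIT, the DBMS rolls the transaction back,
-- so account A is never debited without account B being credited.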
6) Concurrent-access anomalies:
For the sake of overall performance of the system and faster response, many systems allow multiple users to
update the data simultaneously. In such an environment, interaction of concurrent updates may result in
inconsistent data. Consider bank account A, containing $500. If two customers withdraw funds (say $50 and $100
respectively) from account A at about the same time, the result of the concurrent executions may leave the
account in an incorrect (or inconsistent) state. To guard against this possibility, the system must maintain some
form of supervision. But supervision is difficult to provide because data may be accessed by many different
application programs that have not been coordinated. A DBMS provides locking mechanisms to guard against such
anomalies.
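A hedged sketch of such supervision using row-level locking (SELECT ... FOR UPDATE is supported by MySQL, Oracle, and PostgreSQL; the account table is hypothetical):
START TRANSACTION;
-- Lock the row so the two withdrawals cannot read the balance at the same time
SELECT balance FROM account WHERE account_no = 'A' FOR UPDATE;
UPDATE account SET balance = balance - 50 WHERE account_no = 'A';
COMMIT;  -- releases the lock; the second withdrawal now sees the updated balance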
7) Security Problem:
Not every user of the database system should be able to access all the data. Based on a user's role or privileges,
each user is restricted to the data he or she is allowed to access.
Database Management System (DBMS):
A database consists of logically related data stored in a single repository. It provides the following advantages over
the file management approach:
• Reduction of Redundancies
• Shared Data
• Data Independence
• Improved Integrity
• Efficient Data Access
• Multiple User Interfaces
• Representing complex relationship among data
• Improved Security
• Improved Backup and Recovery
• Support for concurrent transactions
Reduction of Redundancies:
In database approach data can be stored at a single place or with controlled redundancy under DBMS,
which saves space and does not permit inconsistency.
Shared Data:
A DBMS allows the sharing of database under its control by any number of application programs or
users. A database belongs to the entire organization and is shared by all authorized users.
Data Independence:
A database management system separates data descriptions from the data itself, so applications are
not affected by changes in the way the data is stored. This is called Data Independence, where details
of data are not exposed. The DBMS provides an abstract view of the data and hides the storage details.
For example, the interface or window to the data that the DBMS presents to a user may stay the same
even though the internal structure of the data is changed.
Improved Integrity:
Data Integrity refers to validity and consistency of data. Data Integrity means that the data should be
accurate and consistent. This is done by providing some checks or constraints. These are consistency
rules that the database is not permitted to violate. Constraints may apply to data items within a record
or relationships between records. For example, the age of an employee can be between 18 and 70 years
only. While entering the data for the age of an employee, the database should check this.
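Such a rule can be declared once in the schema instead of in every application program. A sketch with a hypothetical employee table:
CREATE TABLE employee (
    emp_id INT PRIMARY KEY,
    name   VARCHAR(50),
    age    INT CHECK (age BETWEEN 18 AND 70)  -- consistency rule enforced by the DBMS
);
-- The DBMS rejects this row because it violates the constraint:
INSERT INTO employee VALUES (1, 'Ram', 75);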
Efficient Data Access:
A DBMS utilizes a variety of techniques to store and retrieve data efficiently, even for unforeseen queries. A
full-featured DBMS should be able to provide services to end users so that they can efficiently retrieve
data almost immediately.
Multiple User Interfaces:
DBMS should be able to provide a variety of interfaces. This includes ─
a. query language for casual users,
b. programming language interfaces for application programmers,
c. forms and codes for parametric users,
d. menu driven interfaces, and
e. natural language interfaces for standalone users; these interfaces are still not available in standard form
in commercial databases.
Representing complex relationship among data:
A database may include varieties of data interrelated to each other in many ways. A DBMS must have the capability
to represent a variety of relationships among the data as well as to retrieve and update related data easily and
efficiently.
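In a relational DBMS, such relationships are declared with foreign keys. A minimal sketch with hypothetical customer and orders tables:
CREATE TABLE customer (
    cust_id INT PRIMARY KEY,
    name    VARCHAR(50)
);
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    cust_id  INT REFERENCES customer(cust_id)  -- each order is related to one customer
);
-- Retrieving related data across the relationship:
SELECT c.name, o.order_id
FROM customer c JOIN orders o ON c.cust_id = o.cust_id;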
Improved Security:
Data is vital to any organization and also confidential. In a shared system where multiple users share the
data, all information should not be shared by all users. For example, the salary of the employees should
not be visible to anyone other than the department dealing in this. Hence, database should be protected
from unauthorized users. This is done by Database Administrator (DBA) by providing the usernames and
passwords only to authorized users as well as granting privileges or the type of operation allowed. This is
done by using a security and authorization subsystem. Only authorized users may use the database, and their access
types can be restricted to retrieval, insert, update, or delete, or any combination of these. For example, the Branch Manager
of any company may have access to all data whereas the Sales Assistant may not have access to salary details.
Improved Backup and Recovery:
A DBMS provides facilities for recovering from hardware and software failures. A backup and recovery
subsystem is responsible for this. If a program fails, it restores the database to the state it was in
before the execution of the program.
Support for concurrent transactions:
A transaction is defined as the unit of work. For example, a bank may be involved in a transaction where
an amount of Rs.5000/- is transferred from account X to account Y. A DBMS also allows multiple
transactions to occur simultaneously.
DBMS ARCHITECTURE (Three Schema Architecture or Three Level of Abstraction)
Database Management Systems are very complex, sophisticated software applications that provide reliable
management of large amounts of data. There are two different ways to look at the architecture of a DBMS:
The logical DBMS architecture and the physical DBMS architecture.
The logical architecture deals with the way data is stored and presented to users, while the physical architecture is
concerned with the software components that make up a DBMS.
The logical architecture describes how data in the database is perceived by users. It is not concerned
with how the data is handled and processed by the DBMS, but only with how it looks. The method of
data storage on the underlying file system is not revealed, and the users can manipulate the data
without worrying about where it is located or how it is actually stored. The physical architecture describes the
software components used to enter and process data, and how these software components are related and
interconnected. This results in the database having different levels of abstraction. There are three levels of
abstraction:
1. The external schema or view level
2. The conceptual schema level
3. The internal or physical schema level
The external or view level:
The external or view level is the highest level of abstraction of the database. It provides a window on the
conceptual view, which allows the user to see only the data of interest to them. The user can be either
an application program or an end user. There can be many external views, as any number of external
schemas can be defined and they can overlap each other. It consists of the definition of the logical records
and relationships in the external view. It also contains the methods for deriving the objects, such as
entities, attributes and relationships, in the external view from the conceptual view.
The conceptual schema level:
The conceptual level describes the structure of the whole database for the community of users. It describes
what data is stored in the database and what relationships exist among those data, while hiding the details
of the physical storage structures.
The internal or physical schema level:
The internal level is the lowest level of abstraction. It describes how the data is physically stored, covering
details such as file organization, storage structures, and access paths.
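As a sketch, an external view can be defined over a hypothetical employee table so that a user sees only the data of interest (a confidential salary column stays hidden):
CREATE VIEW employee_contacts AS
SELECT emp_id, name, phone
FROM employee;   -- users of this view never see the salary column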
Introduction to Data Models
Definition: A data model is a collection of concepts that can be used to describe the structure of a database.
A data model is an abstract model that organizes elements of data and standardizes how they relate to one another
and to properties of the real world. For instance, a data model may specify that a data element representing a car
comprise a number of other elements which in turn represent the color, size and owner of the car. According to
Hoberman (2009), "A data model is a wayfinding tool for both business and IT professionals, which uses a set of
symbols and text to precisely explain a subset of real information to improve communication within the
organization and thereby lead to a more flexible and stable application environment."
The main aim of data models is to support the development of information systems by providing the definition and
format of data. According to West and Fowler (1999) "if this is done consistently across systems then compatibility
of data can be achieved. If the same data structures are used to store and access data then different applications
can share data. The results of this are indicated above. However, systems and interfaces often cost more than they
should, to build, operate, and maintain. They may also constrain the business rather than support it. A major cause
is that the quality of the data models implemented in systems and interfaces is poor".
Three perspectives:
A data model instance may be one of three kinds according to ANSI in 1975:
Conceptual data model : describes the semantics of a domain, being the scope of the model. For example, it may
be a model of the interest area of an organization or industry. This consists of entity classes, representing kinds of
things of significance in the domain, and relationship assertions about associations between pairs of entity classes.
A conceptual schema specifies the kinds of facts or propositions that can be expressed using the model. In that
sense, it defines the allowed expressions in an artificial 'language' with a scope that is limited by the scope of the
model. The number of objects should be very small and focused on key concepts. Try to limit this model to one
page, although for extremely large organizations or complex projects, the model might span two or more pages.
Logical data model : describes the semantics, as represented by a particular data manipulation technology. This
consists of descriptions of tables and columns, object oriented classes, and XML tags, among other things.
Physical data model : describes the physical means by which data are stored. This is concerned with partitions,
CPUs, tablespaces, and the like.
Database model
Flat model
This may not strictly qualify as a data model. The flat (or table) model consists of a single, two-dimensional array of
data elements, where all members of a given column are assumed to be similar values, and all members of a row
are assumed to be related to one another.
Hierarchical model
In this model data is organized into a tree-like structure, implying a single upward link in each record to describe
the nesting, and a sort field to keep the records in a particular order in each same-level list.
Network model
This model organizes data using two fundamental constructs, called records and sets. Records contain fields, and
sets define one-to-many relationships between records: one owner, many members.
Entity Relationship Model:
The ER data model is based on a perception of the real world that consists of a collection of basic objects, called
entities, and relationships among these objects. In an ER model, a database can be modeled as a collection of
entities and relationships among entities. It is one of the conceptual data models and describes the information
used by an organization in a way that is independent of any implementation-level issues and details. The overall
logical structure of a database can be expressed graphically by an E-R diagram.
Relational model
Relational database Model is a collection of relations. A relation is nothing but a table of values. Every row in the
table represents a collection of related data values. These rows in the table denote a real-world entity or
relationship. The table name and column names are helpful to interpret the meaning of values in each row. The
data are represented as a set of relations. In the relational model, data are stored as tables. However, the physical
storage of the data is independent of the way the data are logically organized.
Object-Oriented model
Similar to a relational database model, but objects, classes and inheritance are directly supported in database
schemas and in the query language. The main concepts of an OODBMS are encapsulation, inheritance and
polymorphism. In the object-oriented data model, data and their relationships are contained in a single structure,
which is referred to as an object. Real-world problems are represented as objects with different attributes, and
objects have multiple relationships between them. Basically, it is a combination of object-oriented programming
and the relational database model.
Object:
An object is an abstraction of a real-world entity, or we can say it is an instance of a class. Objects encapsulate
data and code into a single unit, which provides data abstraction by hiding the implementation details from the
user. For example: instances of Publication, Books, Reviewer.
Encapsulation:
Encapsulation is the ability to group data and mechanisms into a single object to provide access protection.
Through this process, pieces of information and details of how an object works are hidden, resulting in data and
function security. Classes interact with each other through methods without the need to know how particular
methods work.
Inheritance:
Inheritance creates a hierarchical relationship between related classes while making parts of code reusable.
Defining new types inherits all the existing class fields and methods plus further extends them. The existing class
is the parent class, while the child class extends the parent.
(Figure: example class hierarchy involving Publication, Books, and Reviewer.)
Polymorphism:
Polymorphism is originally a Greek word that means the ability to take multiple forms. In object-oriented
paradigm, polymorphism implies using operations in different ways, depending upon the instance they are
operating upon. Polymorphism allows objects with different internal structures to have a common external
interface. Polymorphism is particularly effective while implementing inheritance.
Database Languages: DDL, DML
Database languages are used for read, update and store data in a database. There are several such languages that
can be used for this purpose; one of them is SQL (Structured Query Language). A DBMS must provide appropriate
languages and interfaces for each category of users to express database queries and updates. Database Languages
are used to create and maintain a database on a computer. Many database systems, such as Oracle, MySQL,
MS Access, dBase and FoxPro, provide such languages. SQL statements commonly used in Oracle, MS-SQL, MySQL
and MS Access can be categorized as data definition language (DDL) and data manipulation language (DML).
• To drop databases or tables – DROP (the DROP command removes an entire database or table)
Syntax:
DROP DATABASE database_name
DROP TABLE table_name
• To delete all rows in a table – TRUNCATE (TRUNCATE removes all rows from a table but keeps the table structure)
Syntax:
TRUNCATE TABLE table_name
• To grant access rights to users – GRANT (the GRANT command gives users access privileges on database objects.)
Syntax:
GRANT privilege_name
ON object_name
TO {user_name | PUBLIC | role_name}
[WITH GRANT OPTION]
privilege_name is the access right or privilege granted to the user. Some of the access rights are ALL,
EXECUTE, and SELECT.
object_name is the name of a database object like TABLE, VIEW, STORED PROC and SEQUENCE.
user_name is the name of the user to whom an access right is being granted.
PUBLIC is used to grant access rights to all users.
ROLES are a set of privileges grouped together.
WITH GRANT OPTION - allows a user to grant access rights to other users.
• To revoke access from user – REVOKE (REVOKE command removes user access rights or privileges to the
database objects.)
Syntax:
REVOKE privilege_name
ON object_name
FROM {user_name |PUBLIC |role_name}
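For example (the user name and table are hypothetical):
-- Allow user ram to read and update the employee table,
-- and to pass those rights on to other users
GRANT SELECT, UPDATE ON employee TO ram WITH GRANT OPTION;
-- Take the update right back
REVOKE UPDATE ON employee FROM ram;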
Database users:
Database users are the ones who actually use and benefit from the database. There are different types of users
depending on their needs and the way they access the database.
Application Programmers - They are the developers who interact with the database by means of DML queries.
These DML queries are written in the application programs like C, C++, JAVA, Pascal etc. These queries are
converted into object code to communicate with the database. For example, writing a C program to generate the
report of employees who are working in particular department will involve a query to fetch the data from database.
It will include an embedded SQL query in the C Program.
Sophisticated Users - They are database developers, who write SQL queries to select/insert/delete/update data.
They do not use any application or programs to request the database. They directly interact with the database by
means of query language like SQL. These users will be scientists, engineers, analysts who thoroughly study SQL and
DBMS to apply the concepts in their requirement. In short, we can say this category includes designers and
developers of DBMS and SQL.
Specialized Users - These are also sophisticated users, but they write special database application programs. They
are the developers who develop the complex programs to the requirement.
Stand-alone Users - These users have a stand-alone database for their personal use. These kinds of databases
use ready-made database packages that provide menus and graphical interfaces.
Naive Users - These are the users who use existing applications to interact with the database. For example,
online library systems, ticket booking systems, ATMs, etc. have existing applications, and users use them to
interact with the database to fulfill their requests.
Database Administrators (DBA):
One of the main reasons for having the database management system is to have control of both data and
programs accessing that data. The person having such control over the system is called the database
administrator (DBA).
The life cycle of database starts from designing, implementing to administration of it. A database for any kind of
requirement needs to be designed perfectly so that it should work without any issues. Once all the design is
complete, it needs to be installed. Once this step is complete, users start using the database. The database grows
as the data grows in the database. When the database becomes huge, its performance comes down, accessing
the data becomes a challenge, and unused space accumulates, making the database unnecessarily large. This
administration and maintenance of the database is taken care of by the database administrator (DBA).
A DBA has many responsibilities; a well-performing database is in the hands of the DBA.
• Installing and upgrading the DBMS Servers: - DBA is responsible for installing a new DBMS server for the
new projects. He is also responsible for upgrading these servers as new versions come on the
market or as requirements change. If an upgrade of an existing server fails, he should be able
to revert the changes back to the older version, thus keeping the DBMS working. He is also
responsible for applying service packs, hot fixes, and patches to the DBMS servers.
• Design and implementation: - Designing the database and implementing is also DBA’s responsibility. He
should be able to decide proper memory management, file organizations, error handling, log maintenance
etc. for the database.
• Performance tuning: - Since database is huge and it will have lots of tables, data, constraints and indices,
there will be variations in the performance from time to time. Also, because of some designing issues or
data growth, the database will not work as expected. It is the responsibility of the DBA to tune the database
performance, making sure all the queries and programs run in fractions of a second.
• Migrate database servers: - Sometimes, users using oracle would like to shift to SQL server or Netezza. It is
the responsibility of DBA to make sure that migration happens without any failure, and there is no data
loss.
• Backup and Recovery: - Proper backup and recovery programs need to be developed by the DBA and
maintained by him. This is one of the main responsibilities of the DBA. Data/objects should be backed up
regularly so that if there is any crash, they can be recovered without much effort and data loss.
• Security: - DBA is responsible for creating various database users and roles, and giving them different levels
of access rights.
• Documentation: - The DBA should properly document all his activities so that if he quits or a new DBA
comes in, the newcomer can understand the database without much effort. He should document his
installation, backup, recovery, and security procedures, and keep various reports about database
performance.
Types of DBA:
There are different kinds of DBA depending on the responsibility that he owns.
• Administrative DBA - This DBA is mainly concerned with installing, and maintaining DBMS servers. His
prime tasks are installing, backups, recovery, security, replications, memory management, configurations
and tuning. He is mainly responsible for all administrative tasks of a database.
• Development DBA - He is responsible for creating queries and procedures as per the requirements. Basically,
his task is similar to that of any database developer.
• Database Architect - Database architect is responsible for creating and maintaining the users, roles, access
rights, tables, views, constraints and indexes. He is mainly responsible for designing the structure of the
database depending on the requirement. These structures will be used by developers and development
DBA to code.
• Data Warehouse DBA - This DBA maintains the data and procedures coming from various sources into
the data warehouse. These sources can be files or other programs, so data and programs come from
different places. A good DBA keeps the performance and function levels of these sources at the same
pace to make the data warehouse work.
• Application DBA - He acts like a bridge between the application program and the database. He makes sure
all the application programs are optimized to interact with the database, and he ensures that all the
activities, from installing, upgrading, and patching to maintaining, backup, and recovery, work without
any issues.
Transaction Management
A transaction is one or more SQL statements that make up a unit of work performed against the database, and
either all the statements in a transaction are committed as a unit or all the statements are rolled back as a unit.
This unit of work typically satisfies a user request and ensures data integrity. For example, when you use a computer to
transfer money from one bank account to another, the request involves a transaction: updating values stored in the
database for both accounts. For a transaction to be completed and database changes to be made permanent, a
transaction must be completed in its entirety.
For example, transferring an amount of 100 from account A to account B involves the following operations:
Read A;
A = A - 100;
Write A;
Read B;
B = B + 100;
Write B;
A transaction is a logical unit of work that contains one or more SQL statements. A transaction is an atomic unit.
The effects of all the SQL statements in a transaction can be either all committed (applied to the database) or all
rolled back (undone from the database).
A transaction begins with the first executable SQL statement. A transaction ends when it is committed or rolled
back, either explicitly with a COMMIT or ROLLBACK statement or implicitly.
To illustrate the concept of a transaction, consider a banking database. When a bank customer transfers money
from a savings account to a checking account, the transaction can consist of three separate operations:
• Decrement the savings account
• Increment the checking account
• Record the transaction in the transaction journal
SQL must allow for two situations. If all three SQL statements can be performed to maintain the accounts in proper
balance, the effects of the transaction can be applied to the database. However, if a problem such as insufficient
funds, invalid account number, or a hardware failure prevents one or two of the statements in the transaction from
completing, the entire transaction must be rolled back so that the balance of all accounts is correct.
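A hedged sketch of the three operations as one transaction (table and column names are illustrative; transaction syntax varies by product):
START TRANSACTION;
UPDATE savings_account  SET balance = balance - 500 WHERE acc_no = 101;  -- decrement savings
UPDATE checking_account SET balance = balance + 500 WHERE acc_no = 101;  -- increment checking
INSERT INTO txn_journal (acc_no, amount, txn_type)                       -- record the transfer
VALUES (101, 500, 'TRANSFER');
COMMIT;   -- apply all three together; on any problem, ROLLBACK undoes all three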
A SQL statement that runs successfully is different from a committed transaction. Executing successfully means that
a single statement was:
• Parsed
• Found to be a valid SQL construction
• Run without error as an atomic unit. For example, all rows of a multi row update are changed.
However, until the transaction that contains the statement is committed, the transaction can be rolled back, and all
of the changes of the statement can be undone. A statement, rather than a transaction, runs successfully.
Committing means that a user has explicitly or implicitly requested that the changes in the transaction be made
permanent. An explicit request occurs when the user issues a COMMIT statement. An implicit request occurs after
normal termination of an application or completion of a data definition language (DDL) operation. The changes
made by the SQL statement(s) of a transaction become permanent and visible to other users only after that
transaction commits. Queries that are issued after the transaction commits will see the committed changes.
You can name a transaction using the SET TRANSACTION ... NAME statement before you start the transaction. This
makes it easier to monitor long-running transactions and to resolve in-doubt distributed transactions.
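For example, in Oracle (the transaction name here is illustrative):
SET TRANSACTION NAME 'transfer_funds';  -- must be the first statement of the transaction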
In a database, each transaction should maintain the ACID (Atomicity, Consistency, Isolation, Durability) properties
to meet the consistency and integrity requirements of the database.
Transaction Management: ACID (Atomicity, Consistency, Isolation, Durability)
The ACID model is one of the oldest and most important concepts of database theory. It sets forward four goals
that every database management system must strive to achieve: atomicity, consistency, isolation and durability. No
database that fails to meet any of these four goals can be considered reliable. ACID (Atomicity, Consistency,
Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably. In the
context of databases, a single logical operation on the data is called a transaction.
Atomicity:
Atomicity states that database modifications must follow an all or nothing rule. Each transaction is said to be
atomic. If one part of the transaction fails, the entire transaction fails. It is critical that the database management
system maintain the atomic nature of transactions in spite of any DBMS, operating system or hardware failure.
Consistency:
Consistency states that only valid data will be written to the database. If, for some reason, a transaction is executed
that violates the database's consistency rules, the entire transaction will be rolled back and the database will be
restored to a state consistent with those rules. On the other hand, if a transaction successfully executes, it will take
the database from one state that is consistent with the rules to another state that is also consistent with the rules.
Isolation:
It’s safe to say that at any given time on Amazon, there is far more than one transaction occurring on the platform…
In fact, an incredibly huge amount of database transactions are occurring simultaneously! For a database, isolation
refers to the ability to concurrently process multiple transactions in a way that one does not affect another. So,
imagine you and your neighbor are both trying to buy something from the same e-commerce platform at the same
time. There are 10 items for sale: your neighbor wants five and you want six. Isolation means that one of those
transactions would be completed ahead of the other one. In other words, if your neighbor clicked first, they will get
five items, and only five items will be remaining in stock. So you will only get to buy five items. If you clicked first,
you will get the six items you want, and they will only get four. Thus, isolation ensures that eleven items aren't sold
when only ten exist.
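A hedged sketch of one such purchase run at the strictest isolation level (item is a hypothetical table; SET TRANSACTION ISOLATION LEVEL is the SQL standard form, issued before the transaction starts):
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
SELECT stock FROM item WHERE item_id = 7 FOR UPDATE;   -- see the real remaining stock
UPDATE item SET stock = stock - 5 WHERE item_id = 7;   -- sell five of the ten items
COMMIT;   -- the neighbour's transaction now sees only five remaining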
Durability:
Durability ensures that any transaction committed to the database will not be lost. Durability is ensured through
the use of database backups and transaction logs that facilitate the restoration of committed transactions in spite
of any subsequent software or hardware failures.
States of Transaction
In a database, the transaction can be in one of the following states:
Active state
• The active state is the first state of every transaction. In this state, the transaction is being executed.
• For example: Insertion or deletion or updating a record is done here. But all the records are still not saved
to the database.
Partially committed
• In the partially committed state, a transaction has executed its final operation, but the data is still not saved
to the database.
• For example, in a transaction that calculates total marks, the final step of displaying the total is executed in
this state.
Committed
• A transaction is said to be in a committed state if it executes all its operations successfully. In this state, all
the effects are now permanently saved on the database system.
Failed state
• If any of the checks made by the database recovery system fails, then the transaction is said to be in the
failed state.
• In the total-marks example, if the database is not able to execute the query that fetches the marks, the
transaction will fail to execute.
Aborted
• If any of the checks fail and the transaction has reached a failed state then the database recovery system
will make sure that the database is in its previous consistent state. If not then it will abort or roll back the
transaction to bring the database into a consistent state.
• If the transaction fails in the middle of its execution, then all the operations it has already executed are
rolled back to restore the consistent state that existed before the transaction started.
• After aborting the transaction, the database recovery module will select one of the two operations:
✓ Re-start the transaction
✓ Kill the transaction
Serializability in DBMS
A schedule is serializable if it is equivalent to a serial schedule. A concurrent schedule must produce the same
result as if the transactions were executed serially, that is, one after another. A schedule is the sequence in which
actions such as read, write, abort, and commit are performed.
Example
Take two transactions, T1 and T2. If both transactions are performed without interfering with each other, the
schedule is called a serial schedule. (Figure: serial schedule of T1 followed by T2.)
Non-serial schedule − when the operations of transactions T1 and T2 overlap (interleave).
(Figure: non-serial schedule with interleaved operations of T1 and T2.)
Types of serializability
There are two types of serializability −
View serializability
A schedule is view serializable if it is view equivalent to a serial schedule.
A schedule and a serial schedule are view equivalent if they satisfy the following rules −
If T1 reads the initial value of A in one schedule, then T1 also reads the initial value of A in the other.
If T1 reads a value of A written by T2 in one schedule, then T1 also reads the value of A written by T2 in the other.
If T1 performs the final write of A in one schedule, then T1 also performs the final write of A in the other.
Conflict serializability
It orders any conflicting operations in the same way as some serial execution. A pair of operations is said to conflict
if they operate on the same data item and one of them is a write operation.
That means −
Read_i(X) Read_j(X) − non-conflicting read-read operation
Read_i(X) Write_j(X) − conflicting read-write operation
Write_i(X) Read_j(X) − conflicting write-read operation
Write_i(X) Write_j(X) − conflicting write-write operation
Recoverable schedule
A schedule is recoverable if each transaction commits only after every transaction whose written values it has
read has committed. For example, consider a schedule in which transaction T2 reads a value written by
transaction T1, and the commit of T2 occurs after the commit of T1. Hence, it is a recoverable schedule.
Recoverable schedules are further divided into cascadeless and strict schedules −
Cascadeless schedule
In a cascadeless schedule, a transaction reads a value only after the transaction that wrote it has committed.
For example, if the updated value of X is read by transaction T2 only after the commit of transaction T1, the
schedule is a cascadeless schedule.
Strict schedule
In a strict schedule, a transaction reads or writes a value written by another transaction only after that
transaction commits. For example, if transaction T2 reads and writes the value written by transaction T1 only
after T1 commits, the schedule is a strict schedule.
Non-Recoverable Schedule
A schedule that is not recoverable is non-recoverable. If transaction T2 reads a value written by transaction T1
and T2 commits before T1 commits, the schedule is non-recoverable: if T1 then fails and aborts, T2 has already
committed on the basis of a value that never became permanent, and its commit cannot be undone.
Concurrency Control
Concurrency Control is the management procedure that is required for controlling concurrent execution of the
operations that take place on a database.
Concurrent Execution
• In a multi-user system, multiple users can access and use the same database at one time, which is known as
the concurrent execution of the database. It means that the same database is executed simultaneously on
a multi-user system by different users.
• While working with database transactions, multiple users may need to use the database to perform
different operations, and in that case concurrent execution of the database is performed.
• The simultaneous execution should be performed in an interleaved manner, and no operation should affect
the other executing operations; this maintains the consistency of the database. However, the concurrent
execution of transaction operations raises several challenging problems that need to be solved.
Problems with Concurrent Execution
In a database transaction, the two main operations are READ and WRITE. These operations need to be managed
carefully during the concurrent execution of transactions, because if they are interleaved carelessly the data may
become inconsistent. The following problems occur with the concurrent execution of operations:
Lost Update Problems (W - W Conflict)
The problem occurs when two different database transactions perform read/write operations on the same database items
in an interleaved manner (i.e., concurrent execution) in a way that makes the values of the items incorrect, hence making
the database inconsistent. For example:
Consider the below diagram where two transactions TX and TY, are performed on the same account A where the balance of
account A is $300.
1. At time t1, transaction TX reads the value of account A, i.e., $300 (only read).
2. At time t2, transaction TX deducts $50 from account A that becomes $250 (only deducted and not updated/write).
3. Alternately, at time t3, transaction TY reads the value of account A that will be $300 only because TX didn't update the
value yet.
4. At time t4, transaction TY adds $100 to account A that becomes $400 (only added but not updated/write).
5. At time t6, transaction TX writes the value of account A that will be updated as $250 only, as TY didn't update the
value yet.
6. Similarly, at time t7, transaction TY writes the values of account A, so it will write as done at time t4 that will be $400.
It means the value written by TX is lost, i.e., $250 is lost.
Unrepeatable Read Problem (W - R Conflict)
Consider two transactions, TX and TY, performing read/write operations on account A, which has an available
balance of $300. The diagram is shown below:
1. At time t1, transaction TX reads the value from account A, i.e., $300.
2. At time t2, transaction TY reads the value from account A, i.e., $300.
3. At time t3, transaction TY updates the value of account A by adding $100 to the available balance, and then it
becomes $400.
4. At time t4, transaction TY writes the updated value, i.e., $400.
5. After that, at time t5, transaction TX reads the available value of account A, and that will be read as $400.
6. It means that within the same transaction TX, two different values of account A are read: $300 initially and,
after the update made by transaction TY, $400. This is an unrepeatable read and is therefore known as the
unrepeatable read problem.
Concurrency Control Protocols
Different concurrency control protocols offer different trade-offs between the amount of concurrency they allow and
the amount of overhead that they impose. The following are the concurrency control techniques in DBMS:
1. Lock-Based Protocols
2. Two Phase Locking Protocol
3. Timestamp-Based Protocols
4. Validation-Based Protocols
1. Lock-based Protocols
A lock-based protocol in DBMS is a mechanism in which a transaction cannot read or write data until it acquires an
appropriate lock. Lock-based protocols help to eliminate concurrency problems between simultaneous transactions by
locking a data item so that only one transaction uses it at a time.
Lock-based protocols are:
1. Binary Locks:
A binary lock on a data item can be in one of two states: locked or unlocked.
2. Shared Lock (S):
A shared lock is also called a read-only lock. With a shared lock, the data item can be shared between
transactions, because a transaction holding only a shared lock never has permission to update the data item.
For example,
consider a case where two transactions are reading the account balance of a person. The database lets them
both read by placing a shared lock. However, if another transaction wants to update that account's balance,
the shared lock prevents it until the reading process is over.
3. Exclusive Lock (X):
With the Exclusive Lock, a data item can be read as well as written. This is exclusive and can’t be held concurrently
on the same data item. X-lock is requested using lock-x instruction. Transactions may unlock the data item after
finishing the ‘write’ operation.
For example,
when a transaction needs to update the account balance of a person, you can allow this by placing an X
lock on the data item. Therefore, when a second transaction wants to read or write, the exclusive lock prevents this operation.
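In Oracle-style SQL, such locks can also be requested explicitly (most DBMSs acquire them implicitly; the account table is hypothetical):
LOCK TABLE account IN SHARE MODE;      -- shared (read-only) lock
LOCK TABLE account IN EXCLUSIVE MODE;  -- exclusive (read-write) lock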
4. Simplistic Lock Protocol
This type of lock-based protocol allows a transaction to obtain a lock on every object
before beginning the 'write' operation. Transactions may unlock the data item after
finishing the 'write' operation.
5. Pre-claiming Locking
The pre-claiming lock protocol evaluates a transaction's operations and creates a list of
the data items required before execution begins. If all the requested locks are granted,
the transaction executes; when all of its operations are over, it releases all the locks.
2. Two-Phase Locking (2PL) Protocol
In the two-phase locking protocol, each transaction acquires and releases locks in two phases: a growing phase,
in which it may obtain locks but not release any, and a shrinking phase, in which it may release locks but not
obtain any new ones.
Strict Two-Phase Locking (Strict-2PL)
The first phase of Strict-2PL is similar to 2PL: after acquiring all the locks, the transaction continues to execute
normally. The only difference between 2PL and Strict-2PL is that Strict-2PL does not release a lock immediately
after using it. Strict-2PL waits until the whole transaction commits, and then it releases all the locks at one time.
Thus, the Strict-2PL protocol does not have a shrinking phase of lock release.
3. Timestamp-Based Protocols
Timestamp based Protocol in DBMS is an algorithm which uses the System Time or Logical Counter as a timestamp
to serialize the execution of concurrent transactions. The Timestamp-based protocol ensures that every conflicting
read and write operations are executed in a timestamp order.
The older transaction is always given priority in this method. It uses system time to determine the time stamp of
the transaction. This is the most commonly used concurrency protocol.
Lock-based protocols manage the order between conflicting transactions at the time they execute, whereas
timestamp-based protocols resolve conflicts as soon as an operation is created.
Example:
Suppose there are three transactions T1, T2, and T3.
T1 has entered the system at time 0010
T2 has entered the system at 0020
T3 has entered the system at 0030
Priority will be given to transaction T1, then transaction T2 and lastly Transaction T3.
4. Validation Based Protocol
The validation-based protocol in DBMS, also known as the Optimistic Concurrency Control technique, is a method
that avoids locking while a transaction executes. In this protocol, local copies of the transaction data are updated
rather than the data itself, which results in less interference during the execution of the transaction.
The Validation based Protocol is performed in the following three phases:
Read Phase
Validation Phase
Write Phase
Read Phase
In the Read Phase, the data values from the database can be read by a transaction but the write operation or
updates are only applied to the local data copies, not the actual database.
Validation Phase
In Validation Phase, the data is checked to ensure that there is no violation of serializability while applying the
transaction updates to the database.
Write Phase
In the Write Phase, the updates are applied to the database if the validation is successful; otherwise, the updates
are not applied, and the transaction is rolled back.
Database Recovery Techniques
As in any computer system, failures can happen in a database system, yet the data stored in the database should
be available whenever it is needed. Database recovery means restoring the data when it gets deleted, corrupted,
or damaged accidentally. Atomicity must be preserved: either a transaction completes and its effects are reflected
permanently in the database, or it must not affect the database at all. So, database recovery and database
recovery techniques are a must in a DBMS.
Types of failure-
• A computer failure (system crash). A hardware, software, or network error occurs in the computer system.
• A transaction or system error. Some operation in the transaction may cause it to fail, such as integer
overflow or division by zero.
• Disk failure. Some disk blocks may lose their data because of a read or write malfunction.
• Physical problems and catastrophes. This refers to an endless list of problems, including power failure,
fire, flood, or earthquake.
Recovery:
• The database is restored to the most recent consistent state just before the time of failure.
• It takes care of atomicity and durability properties of a transaction.
System log:
The system must keep information about the changes that were applied to data items by the various transactions.
This information is typically kept in the system log.
Log buffer: stored in volatile memory (main memory).
Log file: stored in non-volatile memory (disk).
Recovery Based on Deferred Update: Also called No-UNDO/REDO approach.
• The idea is to postpone any actual updates to the database on disk until the transaction reaches its commit
point.
• During transaction execution, the updates are recorded only in the log and force-written to disk only after the
transaction reaches its commit point.
• If a transaction fails before reaching its commit point, there is no need to undo any operations because the
transaction has not affected the database on disk in any way
Shadow Paging:
• It maintains two copies of database: Current directory and Shadow directory.
• When a transaction begins executing, the current directory—whose entries point to the most recent or
current database state—is copied into a shadow directory.
• The current directory is used by the transaction.
• During transaction execution, the shadow directory is never modified.
• When the transaction commits, the current directory becomes the new committed directory, replacing the
shadow directory.
• To recover from a failure, the shadow directory is reinstated as the current directory, restoring the state
before the transaction.
• It is a NO-UNDO/NO-REDO recovery technique.
Data Governance
Data governance is a broad category that includes internal policies and procedures controlling the management of
data. Data governance is the process of organizing, securing, managing, and presenting data using methods and
technologies that ensure it remains correct, consistent, and accessible to verified users.
Data governance is the process of:
Organizing — identifying all your data sources and getting all your data in one place.
Securing — making sure all your data is compliant with data privacy regulations and internal company policies.
Managing and presenting data — after you’ve nailed down your organization’s data, you need to decide how you
present this data to your team.
Using methods and technologies — like modern data governance platforms — that ensure the data remains correct,
consistent, and accessible to the people in your organization who have permission to access it; in short,
verified users.
Importance of Data Governance
Data is arguably the most important asset that organizations have. Data governance helps to ensure that data is
usable, accessible and protected. Effective data governance leads to better data analytics, which in turn leads to
better decision making and improved operations support. Further, it helps to avoid data inconsistencies or errors in
data, which lead to integrity issues, poor decision making, and a variety of organizational problems.
Data Governance can be described in three core objectives of access, literacy, and quality.
• Access includes all of the company’s data so that data is easily discoverable and protected for compliance.
• Since everyone has access to the data, they need to understand it, so data literacy is a high priority.
• Data quality can be monitored and users report data quality issues which can then be fixed, increasing the
data’s trustworthiness in the data’s lifecycle.
Data governance drivers
For effective data governance, it is vital to understand what factors are driving data governance to a point of
urgency.
Master Data Management
When we have multiple applications doing different business functions, they always require common data, like
customers, employees, chart of accounts, and materials. To overcome challenges in the enterprise, the IT
department generally ensures that one application is the master of a specific data element, while others are only
used to deal with customer prospects. It is essential to integrate these applications and have a single sign-on.
Integrations
With multiple applications doing many business functions, it is of great importance to integrate them. For example,
the CRM system may need to combine with the financial system to complete the purchase and invoicing process.
As data flows during the integration, it requires management. These integrations are mostly custom written and
need support from IT staff. This issue can also be referred to as a data governance problem regarding business
rules. Generally, a support team is incorporated. Its job is to manage these integrations.
BI & Analytics
As BI and Analytics are becoming more prominent, every business unit is hiring their own data scientists. With the
growing abundance of data, it is becoming challenging for them to provide access to data and knowledge about
that data. Hence, most of the business units are demanding automated data governance.
Data Privacy and Financial Regulations
As companies store various kinds of data in different databases, they need to manage the data and all the problems
associated with it because of compliance. These regulations have strict rules that deal with how organizations can
capture and store customer data.
Data governance initiatives
Data governance initiatives improve the quality of data by assigning a team responsible for the data's accuracy,
completeness, consistency, timeliness, validity, and uniqueness. This team usually consists of executive leadership,
project management, line-of-business managers, and data stewards. The team usually employs some form of
methodology for tracking and improving enterprise data, and tools for data mapping, profiling, cleansing, and
monitoring.
Data governance initiatives may be aimed at achieving a number of objectives including offering better visibility to
internal and external customers (such as supply chain management). Many data governance initiatives are also
inspired by past attempts to fix information quality at the departmental level, leading to incongruent and
redundant data quality processes. Most large companies have many applications and databases that can't easily
share information. Therefore, knowledge workers within large organizations often don't have access to the data
they need to best do their jobs. When they do have access to the data, the data quality may be poor. By setting up
a data governance practice or corporate data authority (individual or area responsible for determining how to
proceed, in the best interest of the business, when a data issue arises), these problems can be mitigated.
Database Management
Database Management allows a person to organize, store, and retrieve data from a computer. Database
Management can also describe the data storage, operations, and security practices of a database administrator
(DBA) throughout the life cycle of the data. Managing a database involves designing, implementing, and supporting
stored data to maximize its value.
Data maintenance
Database Maintenance is a term we use to describe a set of tasks that are all run with the intention to improve
your database. There are routines meant to help performance, free up disk space, check for data errors, check for
hardware faults, update internal statistics, and many other obscure (but important) things.
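As an illustration, a few such maintenance tasks can be run directly in SQL Server; this is a minimal sketch, and the database and table names (SchoolDB, Student) are hypothetical:

DBCC CHECKDB ('SchoolDB');            -- check the database for corruption and consistency errors
UPDATE STATISTICS Student;            -- refresh the optimizer's internal statistics for a table
ALTER INDEX ALL ON Student REBUILD;   -- rebuild fragmented indexes to reclaim space and speed up access
GO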
Data quality rules are an integral component of data governance, which is the process of developing and
establishing a defined, agreed-upon set of rules and standards by which all data across an organization is governed.
Effective data governance should harmonize data from various data sources, create and monitor data usage
policies, and eliminate inconsistencies and inaccuracies that would otherwise negatively impact data analytics
accuracy and regulatory compliance.
Data Quality Management: Data Cleansing, Data Integrity, Data Enrichment, Data Quality
There are six main dimensions of data quality: accuracy, completeness, consistency, validity, uniqueness, and
timeliness.
Accuracy: The data should reflect actual, real-world scenarios; the measure of accuracy can be confirmed with a
verifiable source.
Completeness: Completeness is a measure of the data’s ability to effectively deliver all the required values that
are available.
Consistency: Data consistency refers to the uniformity of data as it moves across networks and applications. The same data values stored in different locations should not conflict with one another.
Validity: Data should be collected according to defined business rules and parameters, and should conform to
the right format and fall within the right range.
Uniqueness: Uniqueness ensures there are no duplications or overlapping of values across all data sets. Data
cleansing and deduplication can help remedy a low uniqueness score.
Timeliness: Timely data is data that is available when it is required. Data may be updated in real time to ensure
that it is readily available and accessible.
Data cleansing
Data cleaning is a process by which inaccurate, poorly formatted, or otherwise messy data is organized and
corrected. Data cleansing or data cleaning is the process of identifying and correcting corrupt, incomplete,
duplicated, incorrect, and irrelevant data from a reference set, table, or database.
Data issues typically arise through user entry errors, incomplete data capture, non-standard formats, and data
integration issues.
Data cleansing is an essential process for preparing data for further use whether in operational processes or
downstream analysis. It can be performed best with data quality tools. These tools function in a variety of ways,
from correcting simple typographical errors to validating values against a known true reference set.
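For example, one simple quality check that many of these tools automate is detecting duplicate rows with a grouping query; this sketch assumes a Customer table with Name and Phone columns:

-- List name/phone combinations that occur more than once (candidates for deduplication)
SELECT Name, Phone, COUNT(*) AS Occurrences
FROM Customer
GROUP BY Name, Phone
HAVING COUNT(*) > 1;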
Data enrichment
Data enrichment is the process of taking external data from third-party sources and adding it to your existing database. The goal is to get further insights from your data to improve your marketing or sales approach.
Sometimes you can get the information from databases—the more data you have, the more you can understand
patterns and extract information like first names and genders.
Other times you can get this data from scraping company websites and platforms like LinkedIn.
The data enrichment process improves on the data you already have. This process, sometimes referred to as
‘appending’, enables you to fill in gaps in your database, such as gender, company, age, and so on.
Data security
Data security is critical to public and private sector organizations for a variety of reasons. First, there’s the legal and
moral obligation that companies have to protect their user and customer data from falling into the wrong hands.
Financial firms, for example, may be subject to the Payment Card Industry Data Security Standard (PCI DSS) that
forces companies to take all reasonable measures to protect user data.
Then there’s the reputational risk of a data breach or hack. If you don’t take data security seriously, your reputation
can be permanently damaged in the event of a publicized, high-profile breach or hack. Not to mention the financial
and logistical consequences if a data breach occurs. You’ll need to spend time and money to assess and repair the
damage, as well as determine which business processes failed and what needs to be improved.
Types of Data Security
Access Controls
This type of data security measure involves limiting both physical and digital access to critical systems and data. It includes making sure all computers and devices are protected with mandatory login credentials, and that physical spaces can be entered only by authorized personnel.
Authentication
Similar to access controls, authentication refers specifically to accurately identifying users before they are given access to data. It usually involves things like passwords, PINs, security tokens, swipe cards, or biometrics.
Data Erasure
You’ll want to dispose of data properly and on a regular basis. Data erasure employs software to completely
overwrite data on any storage device and is more secure than standard data wiping. Data erasure verifies that the
data is unrecoverable and therefore won’t fall into the wrong hands.
Data Masking
By using data masking software, information is hidden by obscuring letters and numbers with proxy characters. This
effectively masks key information even if an unauthorized party gains access to it. The data changes back to its
original form only when an authorized user receives it.
Encryption
A computer algorithm transforms text characters into an unreadable format via encryption keys. Only authorized
users with the proper corresponding keys can unlock and access the information. Everything from files and a
database to email communications can — and should — be encrypted to some extent.
Database System Structure
DBMS (Database Management System) acts as an interface between the user and the database. The user requests
the DBMS to perform various operations (insert, delete, update and retrieval) on the database. The components of
DBMS perform these requested operations on the database and provide necessary data to the users. The various
components of DBMS are:
Data Definition Language Compiler:
The DDL Compiler converts the data definition statements into a set of tables.
These tables contain the metadata concerning the database and are in a form that can be used by other
components of DBMS.
• To create the database instance – CREATE (creates a new database or database object)
• To alter the structure of the database – ALTER (the ALTER TABLE statement is used to add, delete, or modify columns in an existing table)
• To drop database instances – DROP (the DROP command removes a table from the database)
• To remove all the data in a table – TRUNCATE (TRUNCATE removes all rows from a table but keeps the table structure)
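A minimal sketch of these four DDL statements (the College database and Student table are illustrative):

CREATE DATABASE College;
GO
CREATE TABLE Student (RegNo INT PRIMARY KEY, Name VARCHAR(50));
ALTER TABLE Student ADD Address VARCHAR(100);   -- add a new column to the table
TRUNCATE TABLE Student;                         -- remove all rows but keep the table
DROP TABLE Student;                             -- remove the table itself
GO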
Data Manager:
The data manager is the central software component of the DBMS. It is sometimes referred to as the database
control system. One of the functions of the data manager is to convert operations in the user’s queries, coming directly via the query processor or indirectly via an application program, from the user’s logical view to a physical file system. The data manager is responsible for interfacing with the file system. In addition, the tasks of enforcing
constraints to maintain the consistency and integrity of the data, as well as its security, are also performed by the
data manager. Synchronizing the simultaneous operations performed by concurrent users is under the control of
the data manager. It is also entrusted with the backup and recovery operations.
File Manager:
Responsibility for the structure of the files and managing the file space rests with the file manager. It is also
responsible for locating the block containing the required record, requesting this block from the disk manager, and
transmitting the required record to the data manager. The file manager can be implemented using an interface to
the existing file subsystem provided by the operating system of the host computer or it can include a file subsystem
written especially for DBMS.
Disk Manager:
The disk manager is part of the operating system of the host computer and all physical input and output operations
are performed by it. The disk manager transfers the block or page requested by the file manager so that the latter
need not be concerned with the physical characteristics of the underlying storage media.
Query Processor:
The database user retrieves data by formulating a query in the data manipulation language provided with the
database. The query processor is used to interpret the online user’s query and convert it into an efficient series of
operations in a form capable of being sent to the data manager for execution. The query processor uses the data
dictionary to find the structure of the relevant portion of the database and uses the information in modifying the
query and preparing an optimal plan to access the database.
Database Schemas:
Movies(title, year, length, genre, studioName, producerC#)
StarsIn(movieTitle, movieYear, starName)
MovieStar(name, address, gender, birthdate)
In a data model, it is important to distinguish between the description of the database and the database itself. The
description of a database is called the database schema, which is specified during database design and is not
expected to change frequently. Most data models have certain conventions for displaying schemas as diagrams. A
displayed schema is called a schema diagram. Diagram displays the structure of each record type but not the actual
instances of records.
Examples of Schema:
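For instance, the Movies schema above could be declared in SQL as follows; this is a sketch, and the data types and the choice of (title, year) as the primary key are assumptions:

CREATE TABLE Movies (
    title      VARCHAR(100),
    year       INT,
    length     INT,
    genre      VARCHAR(30),
    studioName VARCHAR(50),
    producerC# INT,
    PRIMARY KEY (title, year)   -- assumed key: the same title may recur in different years
);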
Data Files:
Data files contain the data portion of the database.
Data Dictionary:
A Data Dictionary stores information about the structure of the database. A comprehensive data
dictionary would provide the definition of data items, how they fit into the data structure and how they
relate to other entities in the database. In DBMS, the data dictionary stores the information concerning
the external, conceptual, and internal levels of the database. In the case of a table, the data dictionary provides information about:
• Its name
• Security information like who is the owner of the table, when was it created, and when it was last accessed.
• Physical information like where is the data stored for this table
• Structural information like its attribute names and its data types, constraints and indexes.
• The definitions of all database objects like tables, views, constraints, indexes, clusters, synonyms,
sequences, procedures, functions, packages, triggers etc.
• It stores the information about how much space is allocated for each object and how much space has been
used by them
• Any default values that a column can have are stored
• Database user names - schemas
• Access rights for schemas on each of the objects
• Last updated and last accessed information about the object
• Any other database information
All of this information is stored in the form of tables in the data dictionary.
Application architectures
Today's database professionals face several options when considering which architecture to employ to address the various needs of their employers and/or clients. The following text provides an overview of three main categories of database architectures and their sub-categories, and offers some insight into the benefits of each.
Application Logic
Database architectures can be distinguished by examining the way application logic is distributed throughout the
system. Application logic consists of three components: Presentation Logic, Processing Logic, and Storage Logic.
The presentation logic component is responsible for formatting and presenting data on the user's screen. The processing logic component handles data processing logic, business rules logic, and data management logic. Finally, the storage logic component is responsible for the storage and retrieval of data from physical devices such as a hard drive or RAM.
By determining which tier(s) these components are processed on, we can get a good idea of what type of architecture and subtype we are dealing with.
Yet another way one-tier architectures have appeared is in mainframe computing. In this outdated model, large machines provide directly connected, unintelligent terminals with the means to access, view and manipulate data. Even though this may look like a client-server system, all of the processing power (for both data and applications) occurs on a single machine, so we have a one-tier architecture.
One-tier architectures can be beneficial when we are dealing with data that is relevant to a single user (or small
number of users) and we have a relatively small amount of data. They are somewhat inexpensive to deploy and
maintain.
N-Tier Client/Server Architectures
Most n-tier database architectures exist in a three-tier configuration. In this architecture the client/server model
expands to include a middle tier (business tier), which is an application server that houses the business logic. This
middle tier relieves the client application(s) and database server of some of their processing duties by translating
client calls into database queries and translating data from the database into client data in return. Consequently,
the client and server never talk directly to one another.
A variation of the n-tier architecture is the web-based n-tier application. These systems combine the scalability
benefits of n-tier client/server systems with the rich user interface of web-based systems.
Because the middle tier in a three-tier architecture contains the business logic, there is greatly increased scalability
and isolation of the business logic, as well as added flexibility in the choice of database vendors.
Relational Data Models
A data model is a conceptual representation of the data structures (tables) required for a database and is very powerful in expressing and communicating business requirements. A data model visually represents the nature of the data, the business rules governing the data, and how it will be organized in the database. A data model comprises two parts: logical design and physical design. Data models are created in either a top-down or a bottom-up approach. In the top-down approach, data models are created by understanding and analyzing the business requirements. In the bottom-up approach, data models are created from existing databases that have no data models. IDEF1X is a common notation used in creating data models since it is more descriptive.
A data model helps the functional and technical teams in designing the database. The functional team normally consists of one or more business analysts, business managers, subject matter experts, end users, etc., and the technical team consists of one or more programmers, DBAs, etc. Data modelers are responsible for designing the data model; they communicate with the functional team to get the business requirements and with the technical team to implement the database.
Logical and Physical Data Modeling Objects:
In a data model, there is one main subject area, which comprises all objects present in all subject areas, and other subject areas based on their processes or business domains. Each subject area contains the objects relevant to it, and subject areas are very useful for understanding the data model and for generating reports and printouts based on the main subject area or the other subject areas. In a telecommunication data model, there may be several subject areas like Service Request, Service Order, Ticketing and the Main Subject Area. In a mortgage data model, there may be several subject areas like borrower, loan, underwriting and the main subject area. Usually subject areas are created around the main business processes. In telecommunications (telephone service subscription by a customer), a service request is the process of receiving a request from the customer through phone, email, fax, etc. A service order is the next process, which approves the service request and provides the telephone line subscription to the customer. Ticketing is the process by which complaints are gathered from the customer and problems are resolved.
For example:

Logical data model object: Entity. An entity is the business representation of a table present in a database. Example: COUNTRY.
Physical data model object: Table. A table is comprised of rows and columns, which store data in a database. Example: CNTRY.

Logical data model object: Attribute. An attribute is the business representation of a column present in a database. Example: Country Code, Country Name.
Physical data model object: Column. A column is a data item that stores the data for that particular item. Example: CNTRYCODE, CNTRYNAME.
Constraints:
Database Constraints in DBMS:
Database constraints are restrictions on the contents of the database or on database operations. It is a condition
specified on a database schema that restricts the data to be inserted in an instance of the database.
Constraints in the database provide a way to guarantee that:
✓ the values of individual columns are valid.
✓ in a table, rows have a valid primary key or unique key values.
✓ in a dependent table, rows have valid foreign key values that reference rows in a parent table.
➢ Domain Constraints:
A domain is a set of permissible values that can be given to an attribute. So every attribute in a table has a specific domain, and values cannot be assigned to an attribute outside its domain. Domain constraints specify the set of values an attribute can take: the value of each attribute X must be an atomic value from the domain of X. The data types associated with domains include integer, character, string, date, time, currency, etc. An attribute value must come from the corresponding domain.
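A minimal sketch of domain constraints, where each column is restricted to atomic values from its declared domain (the table and column names are illustrative):

CREATE TABLE Employee (
    EmpId    INT,                                   -- domain: integers
    Name     VARCHAR(50),                           -- domain: character strings
    Salary   DECIMAL(10,2),                         -- domain: currency values
    HireDate DATE,                                  -- domain: dates
    Gender   CHAR(1) CHECK (Gender IN ('M', 'F'))   -- domain narrowed further by a CHECK
);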
➢ Key Constraints:
Keys are attributes or sets of attributes that uniquely identify an entity within its entity set. An entity set E can have multiple keys, out of which one key will be designated as the primary key. A primary key must have unique, non-null values in the relational table. In a subclass hierarchy, only the root entity set has a key or primary key, and that primary key must serve as the key for all entities in the hierarchy.
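For example, a key constraint can be declared by marking one candidate key as the primary key; this sketch reuses the RegistrationNo and RollNo attributes discussed later in this chapter:

CREATE TABLE Student (
    RegistrationNo INT PRIMARY KEY,   -- the chosen primary key: unique and not null
    RollNo         INT UNIQUE,        -- another candidate key, kept unique
    Name           VARCHAR(50)
);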
➢ Integrity Rule 2 (Referential Integrity Rule or Constraint):
Integrity Rule 2 is also called the referential integrity constraint. This rule states that if a foreign key in Table 1 refers to the primary key of Table 2, then every value of the foreign key in Table 1 must either be null or be available in Table 2.
Let the table in which the foreign key is defined be called the foreign table or detail table (Table 1 above), and the table that defines the primary key referenced by the foreign key be called the master table or primary table (Table 2 above). Then the following properties must hold:
• Records cannot be inserted into the foreign table if corresponding records do not exist in the master table.
• Records of the master (primary) table cannot be deleted or updated if corresponding records in the detail table exist.
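A sketch of how these properties are enforced in SQL, with Department as the master table and Employee as the foreign (detail) table; the names are illustrative:

CREATE TABLE Department (                      -- master (primary) table
    DeptId INT PRIMARY KEY
);
CREATE TABLE Employee (                        -- foreign (detail) table
    EmpId  INT PRIMARY KEY,
    DeptId INT REFERENCES Department(DeptId)   -- must be NULL or match an existing DeptId
);
-- Fails: department 99 does not exist in the master table
INSERT INTO Employee VALUES (1, 99);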
➢ General Constraints:
General constraints are the arbitrary constraints that should hold in the database. Domain Constraints, Key
Constraints, Tuple Uniqueness Constraints, Single Value Constraints, Integrity Rule 1 (Entity Integrity) and 2
(Referential Integrity Constraints) are considered to be a fundamental part of the relational data model. However,
sometimes it is necessary to specify more general constraints like the CHECK Constraints or the Range Constraints
etc.
Check constraints ensure that only specific values are allowed in certain columns. For example, if there is a need to allow only three values for a colour, such as ‘Bakers Chocolate’, ‘Glistening Grey’ and ‘Superior White’, then we can apply a check constraint. Any other value, like ‘GREEN’, would yield an error.
Range constraints are implemented with BETWEEN and NOT BETWEEN. For example, if it is a requirement that student ages be within 16 to 35, we can apply a range constraint for it.
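The colour and age rules just described might look like this (a sketch; the table names are illustrative):

CREATE TABLE Product (
    Color VARCHAR(20)
        CHECK (Color IN ('Bakers Chocolate', 'Glistening Grey', 'Superior White'))
);
CREATE TABLE Applicant (
    Age INT CHECK (Age BETWEEN 16 AND 35)   -- range constraint on student age
);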
Database Keys:
A key is an attribute or a set of attributes in a relation that identifies a tuple in the relation. Keys are defined in a table to access or sequence the stored data quickly and smoothly. They are also used to create relationships between different tables. Keys are a very important part of a relational database: they are used to establish and identify relationships between tables, and they ensure that each record within a table can be uniquely identified by a combination of one or more fields.
Types of Keys:
Following are the different types of keys.
• Primary Key
• Composite key
• Candidate key
• Super key
• Alternate Key
• Foreign key
✓ Primary Key
A primary key is a candidate key that is selected by the database designer to identify tuples uniquely in a relation. A
relation may contain many candidate keys. When the designer selects one of them to identify a tuple in the
relation, it becomes a primary key. It means that if there is only one candidate key, it will be automatically selected
as primary key.
✓ Composite key
A key that consists of two or more attributes that together uniquely identify an entity occurrence is called a composite key. No single attribute that makes up a composite key is a key in its own right. If the primary key is defined on more than one attribute, it is a composite key.
✓ Candidate key
A candidate key is a super key that contains no extra attribute; it consists of the minimum possible attributes. A super key like {RegistrationNo, Name} contains an extra field, Name. It can be used to identify a tuple uniquely in the relation, but it does not consist of the minimum possible attributes, as RegistrationNo alone can identify a tuple in the relation. It means that {RegistrationNo, Name} is a super key but not a candidate key, because it contains an extra field. On the other hand, RegistrationNo is both a super key and a candidate key.
✓ Super key:
A super key is an attribute or combination of attributes in a relation that identifies a tuple uniquely within the
relation. A super key is the most general type of key. For example, suppose a relation STUDENT consists of the attributes RegistrationNo, Name, FatherName, Class and Address. The only one of these attributes that can uniquely identify a tuple in the relation is RegistrationNo. The Name attribute cannot identify a tuple because two or more students may have the same name; similarly FatherName, Class and Address cannot be used to identify a tuple. It means that RegistrationNo is a super key for the relation. Any attribute or set of attributes combined with the super key RegistrationNo is also a super key. A combination of two attributes {RegistrationNo, Name} is therefore a super key, since this combination can also be used to identify a tuple in the relation. Similarly {RegistrationNo, Class} and {RegistrationNo, Name, Class} are super keys.
✓ Alternate Key
The candidate keys that are not selected as primary key are known as alternate keys. Suppose STUDENT relation
contains different attributes such as RegNo, RollNo, Name and Class. The attributes RegNo and RollNo can be used
to identify each student in the table. If RegNo is selected as the primary key, then the RollNo attribute is known as an alternate key.
✓ Foreign key
A foreign key is an attribute or set of attributes in a relation whose values match a primary key in another relation.
The relation in which foreign key is created is known as Dependent Table or Child Table. The relation to which the
foreign key refers is known as Parent Table. The key connects to another relation when a relationship is established
between two relations. A relation may contain more than one foreign key.
A foreign key is generally a primary key from one table that appears as a field in another where the first table has a
relationship to the second. In other words, if we had a table A with a primary key X that linked to a table B where X
was a field in B, then X would be a foreign key in B.
An example might be a student table that contains the course_id of the course the student is attending. Another table lists the courses on offer, with course_id as its primary key. The two tables are linked through course_id, and as such course_id would be a foreign key in the student table.
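Declared in SQL, the student/course example might look like this sketch (the column types are assumed):

CREATE TABLE Course (
    course_id INT PRIMARY KEY,                    -- primary key of the parent table
    title     VARCHAR(50)
);
CREATE TABLE Student (
    student_id INT PRIMARY KEY,
    course_id  INT REFERENCES Course(course_id)   -- foreign key in the child table
);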
Prime and non-prime attributes
Attributes that are part of any candidate key of a relation are called prime attributes; the others are non-prime attributes. For example, STUD_NO in the STUDENT relation is a prime attribute, and the others are non-prime attributes.
Closure Of Functional Dependency
The closure of a functional dependency means the complete set of all possible attributes that can be functionally derived from a given functional dependency using the inference rules known as Armstrong’s axioms.
If “F” is a functional dependency, then the closure of the functional dependency can be denoted using “{F}+”.
There are three steps to calculate closure of functional dependency. These are:
Step-1 : Add the attributes which are present on Left Hand Side in the original functional dependency.
Step-2 : Now, add the attributes present on the Right Hand Side of the functional dependency.
Step-3 : With the help of attributes present on Right Hand Side, check the other attributes that can be derived from
the other given functional dependencies. Repeat this process until all the possible attributes which can be derived
are added in the closure.
The Algorithm
✓ The procedure shown in the previous example can be generalized to an algorithm. Assume we are given
the set of functional dependencies FD and a set of attributes X. The algorithm is as follows:
✓ Add the attributes contained in the attribute set X to the result set X+.
✓ Add the attributes to the result set X+ which can be functionally determined from the attributes already
contained in the result set.
✓ Repeat step 2 until no more attributes can be added to the result set X+.
Example 1
We are given the relation R(A, B, C, D, E). This means that the table R has five columns: A, B, C, D, and E. We
are also given the set of functional dependencies: {A->B, B->C, C->D, D->E}. What is {A}+?
• First, we add A to {A}+.
• What columns can be determined given A? We have A -> B, so we can determine B. Therefore, {A}+ is now
{A, B}.
• What columns can be determined given A and B? We have B -> C in the functional dependencies, so we can
determine C. Therefore, {A}+ is now {A, B, C}.
• Now, we have A, B, and C. What other columns can we determine? Well, we have C -> D, so we can add D
to {A}+.
• Now, we have A, B, C, and D. Can we add anything else to it? Yes, since D -> E, we can add E to {A}+.
• We have used all of the columns in R and all of the functional dependencies. {A}+ = {A, B, C, D, E}.
Example 2
Let’s look at another example. We are given R(A, B, C, D, E, F). The functional dependencies are {AB->C, BC->AD, D-
>E, CF->B}. What is {A, B}+?
• We start with {A, B}.
• What columns can we determine, given A and B? We have AB -> C, so we can add C to {A, B}+.
• We now have A, B, and C. What other columns can we determine? We have BC -> AD. We already have A in
{A, B}+, so we can add D.
• So, we now have A, B, C, and D. What else can we add? We have D -> E, so we can add E to {A, B}+.
• Now {A, B}+ is {A, B, C, D, E}. Can we add anything else? No. We have one more functional dependency in
our set that we did not use: CF -> B. We can’t use this dependency because F is not in {A, B}+.
• Thus, {A, B}+ is {A, B, C, D, E}.
Find Super and Candidate Key from Attribute Closure
Example: Consider R(A, B, C, D) with the functional dependencies {A->B, B->C, C->D}. The closure {A}+ = {A, B, C, D} covers every attribute of R, so A is a super key; since no proper subset of {A} determines all attributes, A is also a candidate key. By contrast, {B}+ = {B, C, D} does not contain A, so B is neither.
Design issues
Database design is an area that is frequently overlooked when performance tuning and optimization are
considered. In fact, when a database is small, poor design might not cause problems. However, as the database
grows, so does the number of problems instigated by poor logical and physical design.
Designing a database requires an understanding of both the business functions you want to model and the
database concepts and features used to represent those business functions. As a developer, you do not have to do
this on your own. There are people and resources within UCS who are willing and able to assist you with designing
the database and its core elements. It is important to accurately design a database to model the business because
it can be time consuming to change the design of a database significantly once implemented. A well-designed
database also performs better. When designing a database, consider:
o The purpose of the database and how it affects the design. Create a database plan to fit your purpose.
o Database normalization rules that prevent mistakes in the database design
o Protection of your data integrity
o Security requirements of the database and user permissions
o Performance needs of the application
❖ Database changes
Any changes to a database or permissions need to be requested by sending a database change request to the DBA.
The DBA will be the primary person who can move new database changes to the production servers. In the event
that the DBA is unavailable the group leaders will have access to production as well.
❖ Normalization
The logical design of the database, including the tables and the relationships between them, is the core of an
optimized relational database. A good logical database design can lay the foundation for optimal database and
application performance. A poor logical database design can impair the performance of the entire system.
Normalizing a logical database design involves using formal methods to separate the data into multiple, related
tables. A greater number of narrow tables (with fewer columns) is characteristic of a normalized database. A few
wide tables (with more columns) are characteristic of a non-normalized database.
In relational-database design theory, normalization rules identify certain attributes that must be present or absent
in a well-designed database. However, there are a few rules that can help you achieve a sound database design:
o A table should have a numeric or unique identifier (GUID) primary key.
o A table should store data for only a single type of entity, and all fields should relate directly to the key. For example, avoid storing information about a student and his/her test scores in the same table.
o A table should avoid nullable columns.
o A table should use default values where appropriate.
o A table should not have repeating values or columns, for example TEST_SCORE_1, TEST_SCORE_2, and so on (a normalized alternative is sketched below).
As normalization increases, so do the number and complexity of joins required to retrieve data. Too many complex
relational joins between too many tables can hinder performance. Reasonable normalization often includes few
regularly executed queries that use joins involving more than four tables.
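For instance, the repeating TEST_SCORE_1, TEST_SCORE_2 columns mentioned above can be normalized into a separate child table; a sketch with assumed names:

-- Instead of Student(StudentId, Name, TEST_SCORE_1, TEST_SCORE_2, ...):
CREATE TABLE Student (
    StudentId INT PRIMARY KEY,
    Name      VARCHAR(50)
);
CREATE TABLE TestScore (
    StudentId INT REFERENCES Student(StudentId),
    TestNo    INT,
    Score     INT,
    PRIMARY KEY (StudentId, TestNo)   -- one row per student per test
);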
❖ Data integrity
Enforcing data integrity ensures the quality of the data in the database. One of the more common forms of data
integrity is referential integrity.
Referential integrity preserves the defined relationships between tables when records are entered or deleted. In
SQL, referential integrity is based on relationships between foreign keys and primary keys or between foreign keys
and unique keys. Referential integrity ensures that key values are consistent across tables. Such consistency
requires that there be no references to nonexistent values and that if a key value changes, all references to it
change consistently throughout the database.
❖ Data security
One of the functions of a database is to protect the data by preventing certain users from seeing or changing highly
sensitive data and preventing all users from making costly mistakes. For this reason, each application will use a
separate user with specific permissions limited to only the data needed to provide a successful implementation.
The Entity Relationship Database Model
In DBMS, an entity–relationship model (ER model) is a data model for describing the data or information aspects of
a business domain or its process requirements, in an abstract way that lends itself to ultimately being implemented
in a database such as a relational database. The main components of ER models are entities (things) and the
relationships that can exist among them.
An entity–relationship model is the result of using a systematic process to describe and define a subject area of
business data. It does not define business processes; it only visualizes business data. The data is represented as
components (entities) that are linked with each other by relationships that express the dependencies and
requirements between them, such as: one building may be divided into zero or more apartments, but one
apartment can only be located in one building. Entities may have various properties (attributes) that characterize
them. Diagrams created to represent these entities, attributes, and relationships graphically are called entity–
relationship diagrams.
The three-schema approach to database design uses three levels of ER models that may be developed.
✓ Conceptual data model
This is the highest level ER model in that it contains the least granular detail but establishes the overall scope of
what is to be included within the model set. The conceptual ER model normally defines master reference data
entities that are commonly used by the organization. Developing an enterprise-wide conceptual ER model is useful
to support documenting the data architecture for an organization.
✓ Logical data model
A logical ER model does not require a conceptual ER model, especially if the scope of the logical ER model
includes only the development of a distinct information system. The logical ER model contains more detail than the
conceptual ER model. In addition to master data entities, operational and transactional data entities are now
defined. The details of each data entity are developed and the relationships between these data entities are
established. The logical ER model is however developed independent of technology into which it can be
implemented.
✓ Physical data model
One or more physical ER models may be developed from each logical ER model. The physical ER model is
normally developed to be instantiated as a database. Therefore, each physical ER model must contain enough
detail to produce a database and each physical ER model is technology dependent since each database
management system is somewhat different.
The physical model is normally instantiated in the structural metadata of a database management system as
relational database objects such as database tables, database indexes such as unique key indexes, and database
constraints such as a foreign key constraint or a commonality constraint. The ER model is also normally used to
design modifications to the relational database objects and to maintain the structural metadata of the database.
Entities
An entity is represented by a rectangle which contains the entity’s name, for example Employee, Manager, Department, etc.
Entities:
Let us first answer the question: what is an entity?
• An entity is an object of concern used to represent things in the real world, e.g., car, table, book, etc.
• An entity need not be a physical object; it can also represent a concept in the real world, e.g., project, loan, etc.
• It represents a class of things, not any one instance; e.g., the ‘STUDENT’ entity has instances ‘Ramesh’ and ‘Mohan’.
Entity Set or Entity Type:
A collection of a similar kind of entities is called an Entity Set or entity type.
Example:
For the COLLEGE database described earlier, the objects of concern are students, faculty, courses and departments. The collection of all student entities forms an entity set STUDENT; similarly, the collection of all courses forms an entity set COURSE. Entity sets need not be disjoint. For example, an entity may be part of the entity set STUDENT, the entity set FACULTY, and the entity set PERSON.
Weak Entities:
• Weak entities are existence dependent.
• They cannot exist without the entity with which they have a relationship.
• Their primary key is derived from the primary key of the parent entity.
Example:
The spouse table is a weak entity because its primary key is dependent on the employee table. Without a corresponding employee record, the spouse record could not exist.
Attributes:
Let us first answer the question:
What is an attribute?
An attribute is a property used to describe the specific feature of the entity. So to describe an entity
entirely, a set of attributes is used. For example, a student entity may be described by the student’s name, age,
address, course, etc. An entity will have a value for each of its attributes. For example for a particular student the
following values can be assigned:
RollNo: 124
Name: Numa Limbu
Age: 23
Address: Dharan-9, Koshi, Nepal.
Course: B.Sc. (Computer)
Types of attributes
Attributes attached to an entity can be of various types.
Simple
The attribute that cannot be further divided into smaller parts and represents the basic meaning is called a
simple attribute. For example, the ‘First name’, ‘Last name’ and ‘age’ attributes of a person entity represent simple attributes.
Composite
Attributes that can be further divided into smaller units, where each individual unit carries a specific meaning. For example, the NAME attribute of an employee entity can be sub-divided into First name, Last name and Middle name.
Single valued
Attributes having a single value for a particular entity. For Example, Age is a single valued attribute of a
student entity.
Multi-valued
Attributes that can have more than one value for a particular entity are called multi-valued attributes. Different entities may have different numbers of values for these kinds of attributes. For multi-valued attributes we must also specify the minimum and maximum number of values that can be attached. For example, the phone number of a person entity is a multi-valued attribute.
Derived
Attributes that are not stored directly but can be derived from stored attributes are called derived attributes. For example, the years of service of a ‘person’ entity can be determined from the current date and the person’s date of joining. Similarly, the total salary of a ‘person’ can be calculated from the ‘basic salary’ attribute of the ‘person’.
Relationships:
Let us first define the term relationships. A relationship can be defined as:
• A connection or set of associations, or
• A rule for communication among entities.
Example: In the college database, the association between the student and course entities, i.e., “Student opts course”, is an example of a relationship.
Relationship sets
A relationship set is a set of relationships of the same type. For example, consider the relationship between the two entity sets student and course: the collection of all instances of the relationship ‘opts’ forms a relationship set, also called a relationship type.
Degree
The degree of a relationship type is the number of participating entity types.
A relationship between two entities is called a binary relationship. A relationship among three entities is called a ternary relationship. Similarly, a relationship among n entities is called an n-ary relationship.
Relationship Cardinality
Cardinality (the number of elements of a set) specifies the number of instances of one entity that can be associated with another entity participating in a relationship. Based on cardinality, binary relationships can be further classified into the following categories:
One-to-one relationship:
An entity in A is associated with at most one entity in B, and an entity in B is associated
with at most one entity in A.
Example: Relationship between college and principal
One college can have at the most one principal and one principal can be assigned to only one college.
Similarly we can define the relationship between university and Vice Chancellor.
One-to-many relationship:
An entity in A is associated with any number of entities in B. An entity in B is associated
with at most one entity in A.
Example: Relationship between department and faculty.
One department can appoint any number of faculty members but a faculty member is assigned to only one
department.
Many-to-one relationship:
An entity in A is associated with at most one entity in B. An entity in B is associated with any number in A.
Example: Relationship between course and instructor. An instructor can teach various courses but a course can be
taught only by one instructor. Please note this is an assumption.
Many-to-many relationship:
Entities in A and B are associated with any number of entities from each other.
Example:
Taught by Relationship between course and faculty. One faculty member can be assigned to teach many courses and
one course may be taught by many faculty members.
Example: University Entity Relationship Cardinality Diagram
Extended E-R Features:
Specialization and Generalization Entity relationship diagram
The ER Model has the power of expressing database entities in a conceptual hierarchical manner. As the hierarchy
goes up, it generalizes the view of entities, and as we go deep in the hierarchy, it gives us the detail of every entity
included.
Going up in this structure is called generalization, where entities are clubbed together to represent a more generalized
view. For example, a particular student named Mira can be generalized along with all the students. The entity shall
be a student, and further, the student is a person. The reverse is called specialization where a person is a student, and
that student is Mira.
Generalization:
As mentioned above, the process of generalizing entities, where the generalized entity contains the properties of all the entities it generalizes, is called generalization. In generalization, a number of entities are brought together into one generalized entity based on their similar characteristics. For example, pigeon, house sparrow, crow and dove can all be generalized as Birds.
Generalization is a bottom-up approach in which two or more lower level entities combine to form a higher level entity. In generalization, the higher level entity can also combine with other lower level entities to make a still higher level entity.
Specialization:
Specialization is the opposite of generalization. In specialization, a group of entities is divided into sub-groups based
on their characteristics. Take a group ‘Person’ for example. A person has name, date of birth, gender, etc. These
properties are common in all persons, human beings. But in a company, persons can be identified as employee,
employer, customer, or vendor, based on what role they play in the company.
Similarly, in a school database, persons can be specialized as teacher, student, or a staff, based on what role they play
in school as entities.
Specialization is thus a top-down approach in which one higher level entity can be broken down into two or more lower level entities. In specialization, some higher level entities may not have any lower-level entity sets at all.
Inheritance:
We use all of the above features of the ER model in order to create classes of objects in object-oriented programming. The details of entities are generally hidden from the user; this process is known as abstraction.
Inheritance is an important feature of Generalization and Specialization. It allows lower-level entities to inherit the
attributes of higher-level entities.
For example, the attributes of a Person class such as name, age, and gender can be inherited by lower-level entities
such as Student or Teacher.
Aggregation:
Aggregation is a process in which a relationship between two entities is treated as a single entity. Here, the relationship between Center and Course acts as an entity in a relationship with Visitor.
Specialization and Generalization Entity relationship diagram
Codd's Twelve Rules:
Many references to the twelve rules include a thirteenth rule - or rule zero: A relational database management
system (DBMS) must manage its stored data using only its relational capabilities.
1. Information Rule
The data stored in a database, whether it is user data or metadata, must be a value of some table cell. Everything in a database must be stored in a table format.
2. Guaranteed Access Rule
Each and every datum (atomic value) is guaranteed to be logically accessible by resorting to a combination of table
name, primary key value, and column name.
3. Systematic Treatment of Null Values
Null values (distinct from empty character string or a string of blank characters and distinct from zero or any other
number) are supported in the fully relational DBMS for representing missing information in a systematic way,
independent of data type.
4. Dynamic Online Catalog Based on the Relational Model
The structure description of the entire database must be stored in an online catalog, known as data dictionary,
which can be accessed by authorized users. Users can use the same query language to access the catalog which
they use to access the database itself.
5. Comprehensive Data Sublanguage Rule
A relational system may support several languages and various modes of terminal use. However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings, and that is comprehensive in supporting all of the following:
a. data definition
b. view definition
c. data manipulation (interactive and by program)
d. integrity constraints
e. authorization
f. transaction boundaries (begin, commit, and rollback).
10. Integrity Independence
A database must be independent of the application that uses it. All its integrity constraints can be independently modified without the need for any change in the application. This rule makes a database independent of the front-end application and its interface.
11. Distribution Independence
The data manipulation sublanguage of a relational DBMS must enable application programs and terminal activities
to remain logically unimpaired whether and whenever data are physically centralized or distributed.
12. Nonsubversion Rule
If a relational system has or supports a low-level (single-record-at-a-time) language, that low-level language cannot
be used to subvert or bypass the integrity rules or constraints expressed in the higher-level (multiple-records-at-a-
time) relational language.
Schema Diagram:
In a data model, it is important to distinguish between the description of the database and the database itself. The
description of a database is called the database schema, which is specified during database design and is not
expected to change frequently. Most data models have certain conventions for displaying schemas as diagrams. A
displayed schema is called a schema diagram. Figure shows a schema diagram for the database; the diagram
displays the structure of each record type but not the actual instances of records.
Relational Algebra
Relational algebra is a procedural query language, which takes instances of relations as input and yields instances of
relations as output. It uses operators to perform queries. An operator can be either unary or binary. They accept
relations as their input and yield relations as their output. Relational algebra is performed recursively on a relation
and intermediate results are also considered relations.
The fundamental operations of relational algebra are as follows −
• Select
• Project
• Union
• Set Difference
• Cartesian product
• Rename
Select Operation (σ):
The SELECT operator is denoted by the σ (sigma) symbol. It is used as an expression to choose tuples that meet the selection condition:
σ<selection condition>(R)
The select operation selects tuples that satisfy a given predicate.
Ex: find all employees born after 1st Jan 1950 (assuming a dob attribute):
σ dob > '01/JAN/1950'(employee)
For example −
σsubject = "database"(Books)
Output − Selects tuples from books where subject is 'database'.
σsubject = "database" and price = "450"(Books)
Output − Selects tuples from books where subject is 'database' and 'price' is 450.
σsubject = "database" and price = "450" or year > "2010"(Books)
Output − Selects tuples from books where subject is 'database' and 'price' is 450 or those books published after
2010.
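In SQL, the select operation corresponds to the WHERE clause; a sketch, assuming a Books table with subject and price columns:

-- σ subject = "database" and price = "450" (Books)
SELECT * FROM Books
WHERE subject = 'database' AND price = 450;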
Union Operation (∪):
It performs binary union between two given relations and is defined as −
r ∪ s = { t | t ∈ r or t ∈ s}
Notation: r ∪ s
Example: R = {1,2,3,4} and S = {3,4,5}
R ∪ S = {1,2,3,4,5}
Where r and s are either database relations or relation result sets (temporary relations).
For a union operation to be valid, the following conditions must hold −
• r, and s must have the same number of attributes.
• Attribute domains must be compatible.
• Duplicate tuples are automatically eliminated.
Another example:
∏ author (Books) ∪ ∏ author (Articles)
Output − Projects the names of the authors who have either written a book or an article or both.
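The same projection-and-union query can be written in SQL, assuming Books and Articles tables that each have an author column:

-- ∏ author (Books) ∪ ∏ author (Articles)
SELECT author FROM Books
UNION                          -- duplicate tuples are eliminated automatically
SELECT author FROM Articles;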
Intersection (∩):
The INTERSECTION operation on relations R and S, symbolized by R ∩ S, includes the tuples that appear in both R and S.
Example: R = {1,2,3,4} and S = {3,4,5}
R ∩ S = {3,4}
Another example:
∏ author (Books) ∩ ∏ author (Articles)
Output − Projects the names of the authors who have written both a book and an article.
Set Difference (−):
The set difference operation finds all the tuples that are present in r but not in s. Notation: r − s
∏ author (Books) − ∏ author (Articles)
Output − Provides the names of authors who have written books but not articles.
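In SQL these two operations correspond to INTERSECT and EXCEPT; a sketch over the same assumed tables (note that not every DBMS supports both keywords):

SELECT author FROM Books
INTERSECT                      -- authors who wrote both a book and an article
SELECT author FROM Articles;

SELECT author FROM Books
EXCEPT                         -- authors who wrote books but no articles
SELECT author FROM Articles;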
Cartesian Product (Χ):
The Cartesian product combines information from two different relations into one. It creates a relation that has all the attributes of R and S, allowing all possible combinations of tuples from R and S in the result. The notation used is Χ.
E.g. the set of ordered pairs from R and S:
R = {1,2}, S = {3,4}
R Χ S = {(1,3), (1,4), (2,3), (2,4)}
Notation: r Χ s
Where r and s are relations and their output is defined as −
r Χ s = { q t | q ∈ r and t ∈ s}
σauthor = 'MMC'(Books Χ Articles)
Output − Yields a relation, which shows all the books and articles written by MMC.
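In SQL the Cartesian product is written as CROSS JOIN; a sketch of the query above:

-- σ author = 'MMC' (Books Χ Articles)
SELECT * FROM Books CROSS JOIN Articles
WHERE Books.author = 'MMC';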
Join (⋈):
A join combines tuples from two relations that satisfy a given join condition. The notation used is
R ⋈<join condition> S
This can also be used to define composition of relations. For example, the composition of Employee and Dept is their join, projected on all but the common attribute DeptName.
It is usually required that R and S have at least one common attribute; if this constraint is omitted and R and S have no common attributes, then the natural join becomes exactly the Cartesian product.
(The example here showed the tables Car(CarModel, CarPrice), with a row CarA, 20,000, and Boat(BoatModel, BoatPrice), with a row Boat1, 10,000, joined on a price condition into a result with attributes CarModel, CarPrice, BoatModel, BoatPrice.)
Semijoin (⋉)(⋊):
The left semijoin, written as R ⋉ S where R and S are relations, is similar to the natural join. The result of this semijoin is the set of all tuples in R for which there is a tuple in S that is equal on their common attribute names. For an example consider the tables Employee and Dept and their semijoin:
Employee
Name     EmpId  DeptName
Harry    3415   Finance
Sally    2241   Sales
George   3401   Finance
Harriet  2202   Production

Dept
DeptName    Manager
Sales       Bob
Sales       Thomas
Production  Katie
Production  Mark

Employee ⋉ Dept
Name     EmpId  DeptName
Sally    2241   Sales
Harriet  2202   Production
The semijoin can be simulated using the natural join as follows. If a1, ..., an are the attribute names of R, then
R ⋉ S = ∏ a1, ..., an (R ⋈ S)
Since we can simulate the natural join with the basic operators, it follows that this also holds for the semijoin.
Antijoin (▷):
The antijoin, written as R ▷ S where R and S are relations, is similar to the semijoin, but the result of an antijoin is only those tuples in R for which there is no tuple in S that is equal on their common attribute names.
For an example consider the tables Employee and Dept and their antijoin:
Employee
Name     EmpId  DeptName
Harry    3415   Finance
Sally    2241   Sales
George   3401   Finance
Harriet  2202   Production

Dept
DeptName    Manager
Sales       Sally
Production  Harriet

Employee ▷ Dept
Name    EmpId  DeptName
Harry   3415   Finance
George  3401   Finance
Division (÷)
The division is a binary operation that is written as R ÷ S. The result consists of the restrictions of tuples in R to the
attribute names unique to R, i.e., in the header of R but not in the header of S, for which it holds that all their
combinations with tuples in S are present in R. For an example see the tables Completed and DBProject and their division:
Completed
Student  Task
Fred     Database1
Fred     Database2
Fred     Compiler1
Eugene   Database1
Eugene   Compiler1
Sarah    Database1
Sarah    Database2

DBProject
Task
Database1
Database2

Completed ÷ DBProject
Student
Fred
Sarah
If DBProject contains all the tasks of the Database project, then the result of the division above contains exactly the
students who have completed both of the tasks in the Database project.
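Relational division has no direct SQL operator; it is usually expressed with a double NOT EXISTS, as in this sketch of Completed ÷ DBProject:

-- Students for whom no DBProject task is missing from their Completed rows
SELECT DISTINCT c.Student
FROM Completed c
WHERE NOT EXISTS (
    SELECT * FROM DBProject d
    WHERE NOT EXISTS (
        SELECT * FROM Completed c2
        WHERE c2.Student = c.Student AND c2.Task = d.Task
    )
);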
Outer joins
Whereas the result of a join (or inner join) consists of tuples formed by combining matching tuples in the two
operands, an outer join contains those tuples and additionally some tuples formed by extending an unmatched
tuple in one of the operands by "fill" values for each of the attributes of the other operand.
Three outer join operators are defined: left outer join, right outer join, and full outer join.
Employee
Name     EmpId  DeptName
Harry    3415   Finance
Sally    2241   Sales
George   3401   Finance
Harriet  2202   Sales
Tim      1123   Executive

Dept
DeptName    Manager
Sales       Harriet
Production  Charles

Employee ⟕ Dept
Name     EmpId  DeptName   Manager
Harry    3415   Finance    ω
Sally    2241   Sales      Harriet
George   3401   Finance    ω
Harriet  2202   Sales      Harriet
Tim      1123   Executive  ω
In the resulting relation, tuples in Employee that share no common values in the common attribute names with tuples in Dept take a null value, ω, for the attributes of Dept. Since there are no tuples in Dept with a DeptName of Finance or Executive, ω occurs in the resulting relation where tuples in Employee have a DeptName of Finance or Executive.
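In SQL the left outer join above would be written as follows (a sketch):

SELECT e.Name, e.EmpId, e.DeptName, d.Manager
FROM Employee e
LEFT OUTER JOIN Dept d ON e.DeptName = d.DeptName;
-- unmatched Employee rows appear with NULL (ω) in the Manager column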
For an example, consider the tables Employee and Dept above and their full outer join:
Employee ⟗ Dept:
Name     EmpId   DeptName    Manager
Harry    3415    Finance     ω
Sally    2241    Sales       Harriet
George   3401    Finance     ω
Harriet  2202    Sales       Harriet
Tim      1123    Executive   ω
ω        ω       Production  Charles
The full outer join keeps unmatched tuples from both operands: unmatched Employee rows are padded with ω on
Dept's attributes, and the unmatched Production department is padded with ω on Employee's attributes.
Entity integrity: Entity integrity concerns the concept of a primary key. Entity integrity is an integrity rule which
states that every table must have a primary key and that the column or columns chosen to be the primary key
should be unique and not null.
Referential integrity: Referential integrity concerns the concept of a foreign key. The referential integrity rule
states that any foreign-key value can only be in one of two states. The usual state of affairs is that the foreign-key
value refers to a primary key value of some table in the database. Occasionally, and this will depend on the rules of
the data owner, a foreign-key value can be null. In this case we are explicitly saying that either there is no
relationship between the objects represented in the database or that this relationship is unknown.
For example, consider the two tables tblGender and tblPerson. If you delete the row with ID = 1 from the tblGender
table, then the row with ID = 3 in the tblPerson table becomes an orphan record: you will no longer be able to tell
the gender for that row. Cascading referential integrity constraints can be used to define the actions SQL Server
should take when this happens. By default, we get an error and the DELETE or UPDATE statement is rolled back.
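A sketch of how such a cascading action could be declared in SQL Server; the column names (ID, GenderID) are assumptions, and SET NULL is only one of the available actions (NO ACTION, CASCADE, SET NULL, SET DEFAULT):

CREATE TABLE tblGender (
    ID     INT PRIMARY KEY,
    Gender NVARCHAR(20) NOT NULL);

CREATE TABLE tblPerson (
    ID       INT PRIMARY KEY,
    Name     NVARCHAR(50) NOT NULL,
    GenderID INT NULL
        REFERENCES tblGender (ID)
        ON DELETE SET NULL);  -- deleting a gender orphans no row; the FK becomes NULL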
Domain integrity: Domain integrity specifies that all columns in a relational database must be declared upon a
defined domain. The primary unit of data in the relational data model is the data item. Such data items are said to
be non-decomposable or atomic. A domain is a set of values of the same type. Domains are therefore pools of
values from which actual values appearing in the columns of a table are drawn.
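Data types, NOT NULL, and CHECK constraints are the usual SQL tools for declaring such domains; a minimal sketch with illustrative names:

CREATE TABLE Worker (
    WorkerId INT          PRIMARY KEY,
    Salary   DECIMAL(9,2) NOT NULL CHECK (Salary > 0),               -- domain: positive amounts
    Grade    CHAR(1)      NOT NULL CHECK (Grade IN ('A','B','C')));  -- domain: three grades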
Triggers:
A database trigger is procedural code that is automatically executed in response to certain events on a particular
table or view in a database. Triggers are mostly used for maintaining the integrity of the information in the
database. For example, when a new record (representing a new worker) is added to the employees table, new
records should also be created in the taxes, vacations, and salaries tables.
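A sketch of such a trigger in Transact-SQL, assuming an employees table with an emp_id column and a taxes table keyed on the same column (the vacations and salaries inserts would follow the same pattern):

CREATE TRIGGER trg_NewWorker
ON employees
AFTER INSERT
AS
BEGIN
    -- "inserted" is the pseudo-table holding the newly added employee rows
    INSERT INTO taxes (emp_id)
    SELECT emp_id FROM inserted;
END;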
DML Triggers
A DML trigger is a special type of stored procedure that automatically takes effect when a data manipulation
language (DML) event takes place that affects the table or view defined in the trigger. DML events include INSERT,
UPDATE, or DELETE statements. DML triggers can be used to enforce business rules and data integrity, query other
tables, and include complex Transact-SQL statements. The trigger and the statement that fires it are treated as a
single transaction, which can be rolled back from within the trigger. If a severe error is detected (for example,
insufficient disk space), the entire transaction automatically rolls back.
Example: DML trigger with a reminder message
The following DML trigger prints a message to the client when anyone tries to add or change data in the Customer
table in the database.
CREATE TRIGGER reminder1
ON Sales.Customer
AFTER INSERT, UPDATE
AS RAISERROR ('Notify Customer Relations', 16, 10);
GO
DDL Triggers
DDL triggers fire in response to a variety of Data Definition Language (DDL) events. These events primarily
correspond to Transact-SQL statements that start with the keywords CREATE, ALTER, DROP, GRANT, DENY, REVOKE
or UPDATE STATISTICS. Certain system stored procedures that perform DDL-like operations can also fire DDL
triggers.
Use DDL triggers when you want to do the following:
✓ Prevent certain changes to your database schema.
✓ Have something occur in the database in response to a change in your database schema.
✓ Record changes or events in the database schema.
In the following example, the DDL trigger safety will fire whenever a DROP_TABLE or ALTER_TABLE event occurs in the
database.
CREATE TRIGGER safety
ON DATABASE
FOR DROP_TABLE, ALTER_TABLE
AS
PRINT 'You must disable Trigger "safety" to drop or alter tables!'
ROLLBACK;
Assertion:
An assertion is a database object that uses a check constraint to limit data values you can enter into the database
as a whole. Both assertions and constraints are specified as check conditions that the DBMS can evaluate to either
TRUE or FALSE. However, while a constraint uses a check condition that acts on a single table to limit the values
assigned to columns in that table, the check condition in an assertion involves multiple tables and the data
relationships among them. Because an assertion applies to the database as a whole, you use the CREATE
ASSERTION statement to create an assertion as part of the database definition. (Conversely, since a constraint
applies to only a single table, you apply [define] the constraint when you create the table.)
For example, if you want to prevent investors from withdrawing more than a certain amount of money from your
hedge fund, you could create an assertion using the following SQL statement:
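A sketch of such an assertion. Note that CREATE ASSERTION is defined in the SQL standard but is not implemented by most products (SQL Server included); the table names, columns, and the 500,000 limit here are assumptions:

CREATE ASSERTION MAXIMUM_WITHDRAWAL
CHECK (NOT EXISTS (
    SELECT 1
    FROM INVESTOR AS I JOIN WITHDRAWALS AS W
         ON W.investor_id = I.investor_id
    GROUP BY I.investor_id
    HAVING SUM(W.amount) > 500000));  -- no investor may withdraw more than the limit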
Thus, the general syntax used to create an assertion is:
CREATE ASSERTION <assertion_name> CHECK (<search_condition>);
Once you add the MAXIMUM_WITHDRAWAL ASSERTION to the database definition, the DBMS will check to make
sure that the assertion remains TRUE each time you execute an SQL statement that modifies either the INVESTOR
or WITHDRAWALS tables. As such, each time the user or application program attempts to execute an INSERT,
UPDATE, or DELETE statement on one of the tables in the assertion's CHECK clause, the DBMS checks the check
condition against the database, including the proposed modification. If the check condition remains TRUE, the
DBMS carries out the modification. If the modification makes the check condition FALSE, the DBMS does not
perform the modification and returns an error code indicating that the statement was unsuccessful due to an
assertion violation.
Example: Each CLASS is taken by many STUDENTs, and each STUDENT can take many CLASSes.
There may be many rows in the CLASS table for any given row in the STUDENT table, and there can be many rows
in the STUDENT table for any given row in the CLASS table. Implemented directly, an M:N relationship creates a lot
of redundancy: the same tuple occurs many times in a given table, so tuples and their attributes are repeated many
times, occupying space and leading to errors and efficiency problems.
Bridge Entity:
The problem inherent in the many-to-many relationship can be avoided by creating a composite entity, also called a
bridge entity or associative entity. Such tables are used to link the tables that were originally related in an M:N
relationship.
The composite entity structure includes, as foreign keys, at least the primary keys of the tables that are to be
linked. The designer has two options when defining a composite table's primary key: use a combination of those
foreign keys or create a new primary key. A sketch of such a table follows.
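A sketch of a bridge table for the STUDENT/CLASS example, using the first option (the combined foreign keys serve as the primary key); the names are illustrative:

CREATE TABLE Enrollment (
    StudentID INT NOT NULL REFERENCES Student (StudentID),
    ClassID   INT NOT NULL REFERENCES Class (ClassID),
    Grade     CHAR(2) NULL,  -- an attribute of the relationship itself
    CONSTRAINT PK_Enrollment PRIMARY KEY (StudentID, ClassID));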
Distributed Database Management Systems
When an organization is geographically dispersed, it may choose to store its databases on a central database server
or to distribute them to local servers (or a combination of both). A distributed database is a single logical database
that is spread physically across computers in multiple locations that are connected by a data communications
network. We emphasize that a distributed database is truly a database, not a loose collection of files.
The distributed database is still centrally administered as a corporate resource while providing local flexibility and
customization. The network must allow the users to share the data; thus, a user (or program) at location A must be
able to access (and perhaps update) data at location B. The sites of a distributed system may be spread over a large
area (e.g., country or the world) or over a small area (e.g., a building or campus). The computers may range from
PCs to large-scale servers or even supercomputers. A distributed database requires multiple instances of a database
management system (or several DBMSs), running at each remote site. The degree to which these different DBMS
instances cooperate, or work in partnership, and whether there is a master site that coordinates requests involving
data from multiple sites distinguish different types of distributed database environments. It is important to
distinguish between distributed and decentralized databases. A decentralized database is also stored on computers
at multiple locations; however, the computers are not interconnected by the network and database software that
would make the data appear to be one logical database. Thus, users at the various sites cannot share data. A
decentralized database is best regarded as a collection of independent databases, rather than having the
geographical distribution of a single database.
A homogeneous distributed database environment has the following characteristics:
• The same DBMS is used at each location.
• All data are managed by the distributed DBMS.
• All users access the database through one global schema or database definition.
• The global schema is simply the union of all the local database schemas.
It is difficult in most organizations to force a homogeneous environment, yet heterogeneous environments are
much more difficult to manage. A heterogeneous environment is defined by the following characteristics:
• Data are distributed across all the nodes.
• Different DBMSs may be used at each node.
• Some users require only local access to databases, which can be accomplished by using only the local DBMS and
schema.
• A global schema exists, which allows local users to access remote data.
Collaborative Servers:
• Servers can serve queries or be clients and query other servers
• Support indirect queries
Peer-to-Peer Architecture:
• Scalability and flexibility in growing and shrinking
• All nodes have the same role and functionality
• Harder to manage because all machines are autonomous and loosely coupled
Objectives and Trade-offs:
A major objective of distributed databases is to provide ease of access to data for users at many different locations.
To meet this objective, the distributed database system must provide location transparency, which means that a
user (or user program) using data for querying or updating need not know the location of the data. Any request to
retrieve or update data from any site is automatically forwarded by the system to the site or sites related to the
processing request. Ideally, the user is unaware of the distribution of data, and all data in the network appear as a
single logical database stored at one site. In this ideal case, a single query can join data from tables in multiple sites
as if the data were all in one site.
A second objective of distributed databases is local autonomy, which is the capability to administer a local database
and to operate independently when connections to other nodes have failed. With local autonomy, each site has the
capability to control local data, administer security, and log transactions and recover when local failures occur and
to provide full access to local data to local users when any central or coordinating site cannot operate. In this case,
data are locally owned and managed, even though they are accessible from remote sites. This implies that there is
no reliance on a central site.
Asynchronous distributed database technology keeps copies of replicated data at different nodes so that local
servers can access data without reaching out across the network. With asynchronous technology, there is usually
some delay in propagating data updates across the remote databases, so some degree of at least temporary
inconsistency is tolerated. Asynchronous technology tends to have acceptable response time because updates
happen locally and data replicas are synchronized in batches at predetermined intervals, but it may be more
complex to plan and design to ensure exactly the right level of data integrity and consistency across the nodes.
Compared with centralized databases, either form of a distributed database has numerous advantages. The
following are the most important of them:
• Increased reliability and availability When a centralized system fails, the database is unavailable to all users. A
distributed system will continue to function at some reduced level, however, even when a component fails. The
reliability and availability will depend (among other things) on the way the data are distributed (discussed in the
following sections).
• Local control Distributing the data encourages local groups to exercise greater control over “their” data, which
promotes improved data integrity and administration. At the same time, users can access nonlocal data when
necessary. Hardware can be chosen for the local site to match the local, not global, data processing work.
• Modular growth Suppose that an organization expands to a new location or adds a new workgroup. It is often
easier and more economical to add a local computer and its associated data to the distributed network than to
expand a large central computer. Also, there is less chance of disruption to existing users than is the case when a
central computer system is modified or expanded.
• Lower communication costs With a distributed system, data can be located closer to their point of use. This can
reduce communication costs, compared with a central system.
• Faster response Depending on the way data are distributed, most requests for data by users at a particular site
can be satisfied by data stored at that site. This speeds up query processing since communication and central
computer delays are minimized. It may also be possible to split complex queries into subqueries that can be
processed in parallel at several sites, providing even faster response.
Distributed databases also carry costs and disadvantages, including the following:
• Processing overhead The various sites must exchange messages and perform additional calculations to ensure
proper coordination among data at the different sites.
• Data integrity A by-product of the increased complexity and need for coordination is the additional exposure to
improper updating and other problems of data integrity.
• Slow response If the data are not distributed properly according to their usage, or if queries are not formulated
correctly, response to requests for data can be extremely slow.
Data replication
A popular option for data distribution as well as for fault tolerance of a database is to store a separate copy of the
database at each of two or more sites. Replication may allow an IS organization to move a database off a
centralized mainframe onto less expensive departmental or location-specific servers, close to end users. Replication
may use either synchronous or asynchronous distributed database technologies, although asynchronous
technologies are more typical in a replicated environment. If a copy is stored at every site, we have the case of full
replication, which may be impractical except for only relatively small databases. However, as disk storage and
network technology costs have decreased, full data replication, or mirror images, have become more common,
especially for “always on” services, such as electronic commerce and search engines.
Snapshot replication:
Different schemes exist for updating data copies. Some applications, such as those for decision
support and data warehousing or mining, which often do not require current data, are supported by simple table
copying or periodic snapshots. This might work as follows, assuming that multiple sites are updating the same data.
First, updates from all replicated sites are periodically collected at a master, or primary, site, where all the updates
are made to form a consolidated record of all changes. With some distributed DBMSs, this list of changes is
collected in a snapshot log, which is a table of row identifiers for the records to go into the snapshot. Then a read-
only snapshot of the replicated portion of the database is taken at the master site. Finally, the snapshot is sent to
each site where there is a copy. (It is often said that these other sites “subscribe” to the data owned at the primary
site.) This is called a full refresh of the database. Alternatively, only those pages that have changed since the last
snapshot can be sent, which is called a differential, or incremental, refresh. In this case, a snapshot log for each
replicated table is joined with the associated base table to form the set of changed rows to be sent to the replicated
sites.
• The data processor (DP) is the software component residing on each computer that stores and retrieves data
located at the site. The DP is also known as the data manager (DM). A data processor may even be a centralized
DBMS.
Distributed Database Design
There are in general several design alternatives.
Top-down approach:
First the general concepts and the global framework are defined, and then the details.
Bottom-up approach:
First the detailed modules are defined, and then the global framework. If the system is built from scratch, the
top-down method is more common. If the system must match existing systems, or some modules already exist,
the bottom-up method is usually used.
General design steps:
- analysis of the external, application requirements
- design of the global schema
- design of the fragmentation
- design of the distribution schema
- design of the local schemes
- design of the local physical layers
DDBMS -specific design steps:
- design of the fragmentation
- design of the distribution schema
During the requirement analysis phase, also the fragmentation and distribution requirements are considered.
Bottom-Up Design:
Usually existing and heterogeneous databases are integrated into a common distributed system.
Steps of integration:
- Common data model selection: As the different component databases may have different data models, and the
DDBMS should be based on a single, common data model, the first step is to convert the different models into a
common model. The common model is usually an intermediate model, different from the data models of the
components.
- Translation of each local schema into the common model: The different schema element descriptions should be
converted into this common model.
- Integration of the local schemas into a common global schema: Besides collecting the component
descriptions, the integration should deal with matching the different semantic elements and with
resolving the different types of inconsistency.
- Design of the translation between the global and local schemas: To access all of the components in a homogeneous
way, a conversion procedure should be applied.
Main aspects of the fragmentation:
Granularity:
The granularity determines the level of database storage at which the fragmentation can be performed. If it is too
fine (field level), it requires a lot of management cost. If it is too coarse (user level), unnecessary elements are
replicated, causing a higher cost.
Fragmentation strategy:
- Horizontal fragmentation: The granularity is at the tuple level, and the attribute values of the tuple
determine the corresponding fragment. The relation is partitioned horizontally into fragments.
- Vertical fragmentation: The assignment of the data elements in a table is based on the schema position of the
data. In this case the different projections are the contents of the fragments.
- Mixed fragmentation: The data are fragmented by both the vertical and the horizontal methods.
Horizontal Fragmentation
The relation is partitioned horizontally into fragments.
Primary fragmentation: the assignment of a tuple depends on the attribute values of that tuple.
Derived fragmentation: the assignment of a tuple depends not on the attributes of this tuple, but on the attributes
of another tuple (or tuples). The fragmentation is described by an expression whose value for every tuple
determines the corresponding fragment.
Vertical Fragmentation
The assignment of the data elements in a table is based on the attribute identifier of the data. In this case the
different projections are the contents of the fragments. A sketch of both strategies follows.
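A sketch of both strategies, assuming a Customer(cust_id, name, region, credit_limit) table; the fragment names and the region predicate are illustrative:

-- Horizontal fragmentation: tuples are assigned by an attribute value
SELECT * INTO Customer_East FROM Customer WHERE region = 'East';
SELECT * INTO Customer_West FROM Customer WHERE region = 'West';

-- Vertical fragmentation: projections that each keep the tuple identifier
SELECT cust_id, name         INTO Customer_Public  FROM Customer;
SELECT cust_id, credit_limit INTO Customer_Private FROM Customer;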
Fragmentation rule:
Every attribute must belong to at least one fragment, and every fragment must have a tuple identifier.
Vertical partitioning:
Every attribute is contained in only one fragment.
Vertical clustering:
An attribute may be contained in more than one fragment. Vertical clustering causes replication of some data
elements. Replication is more advantageous for read-only applications than for read-write applications; in the
latter case, the same update operation must be performed on several sites.
Identification of the fragmentation:
The fragmentation of a relation schema R into R1 and R2 is advisable only when there are different
applications that use either R1 or R2, but not both.
Database Security
Database management systems are increasingly being used to store information about all aspects of an enterprise.
The data stored in a DBMS is often vital to the business interests of the organization and is regarded as a corporate
asset. In addition to protecting the intrinsic value of the data, corporations must consider ways to ensure privacy
and to control access to data that must not be revealed to certain groups of users for various reasons.
There are three main objectives to consider while designing a secure database application:
1. Secrecy: Information should not be disclosed to unauthorized users. For example,
a student should not be allowed to examine other students' grades.
2. Integrity: Only authorized users should be allowed to modify data. For example, students may be allowed to see
their grades, yet not allowed to modify them.
3. Availability: Authorized users should not be denied access. For example, an instructor who wishes to change a
grade should be allowed to do so.
Types of Security
Database security is a broad area that addresses many issues, including the following:
■ Various legal and ethical issues regarding the right to access certain information—for example, some
information may be deemed to be private and cannot be accessed legally by unauthorized organizations or
persons. In the
United States, there are numerous laws governing privacy of information.
■ Policy issues at the governmental, institutional, or corporate level regarding what kinds of information
should not be made publicly available—for example, credit ratings and personal medical records.
■ System-related issues such as the system levels at which various security functions should be enforced—for
example, whether a security function should be handled at the physical hardware level, the operating system
level, or the DBMS level.
■ The need in some organizations to identify multiple security levels and to categorize the data and users based
on these classifications—for example, top secret, secret, confidential, and unclassified. The security policy of
the
organization with respect to permitting access to various classifications of data must be enforced.
Threats to Databases
Threats to databases can result in the loss or degradation of some or all of the following commonly accepted
security goals: integrity, availability, and confidentiality.
■ Loss of integrity. Database integrity refers to the requirement that information be protected from improper
modification. Modification of data includes creating, inserting, and updating data; changing the status of data;
and deleting data. Integrity is lost if unauthorized changes are made to the data by either intentional or
accidental acts. If the loss of system or data integrity is not corrected, continued use of the contaminated
system or corrupted data
could result in inaccuracy, fraud, or erroneous decisions.
■ Loss of availability. Database availability refers to making objects available to a human user or a program
that has a legitimate right to those data objects. Loss of availability occurs when the user or program
cannot access these objects.
■ Loss of confidentiality. Database confidentiality refers to the protection of data from unauthorized
disclosure. The impact of unauthorized disclosure of confidential information can range from violation of the
Data Privacy Act. Unauthorized, unanticipated, or unintentional disclosure could result in loss of public
confidence, embarrassment, or legal action against the organization.
To protect databases against such threats, a DBMS typically provides two kinds of security mechanisms:
■ Discretionary security mechanisms. These are used to grant privileges to users, including the capability to
access specific data files, records, or fields in a specified mode (such as read, insert, delete, or update).
■ Mandatory security mechanisms. These are used to enforce multilevel security by classifying the data and
users into various security classes (or levels) and then implementing the appropriate security policy of the
organization. For example, a typical security policy is to permit users at a certain classification (or clearance)
level to see only the data items classified at the user’s own (or lower) classification level. An extension of this
is role-based
security, which enforces policies and privileges based on the concept of organizational roles.
Control Measures
Four main control measures are used to provide security of data in databases:
■ Access control
■ Inference control
■ Flow control
■ Data encryption
✓ Access control
A security problem common to computer systems is that of preventing unauthorized persons from accessing
the system itself, either to obtain information or to make malicious changes in a portion of the database. The
security mechanism of a DBMS must include provisions for restricting access to the database system as a
whole. This function, called access control, is handled by creating user accounts and passwords to control the
login process by the DBMS.
✓ Inference control
Statistical databases are used to provide statistical information or summaries of values based on various
criteria. For example, a database for population statistics may provide statistics based on age groups, income
levels, household size, education levels, and other criteria. Statistical database users such as government
statisticians or market research firms are allowed to access the database to retrieve statistical information
about a population but not to access the detailed confidential information about specific individuals. Security
for statistical databases must ensure that information about individuals cannot be accessed. It is sometimes
possible to deduce or infer certain facts concerning individuals from queries that involve only summary
statistics on groups; consequently, this must not be permitted either. This problem is called statistical
database security. The corresponding control measures are called inference control measures.
✓ Flow control
Another security issue is that of flow control, which prevents information from flowing in such a way that it
reaches unauthorized users. Channels along which information flows implicitly, in ways that violate the security
policy of an organization, are called covert channels.
✓ Data encryption
A final control measure is data encryption, which is used to protect sensitive data (such as credit card
numbers) that is transmitted via some type of communications network. Encryption can be used to provide
additional protection for sensitive portions of a database as well. The data is encoded using some coding
algorithm. An unauthorized user who accesses encoded data will have difficulty deciphering it, but authorized
users are given decoding or decrypting algorithms (or keys) to decipher the data. Encrypting techniques that
are very difficult to decode without a key
have been developed for military applications. However, encrypted database records are used today in both
private organizations and governmental and military applications. In fact, state and federal laws prescribe
encryption for any system that deals with legally protected personal information.
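As one concrete illustration, SQL Server can encrypt individual values with a passphrase; the CreditCards table and its columns below are assumptions, and real deployments would normally use managed keys or certificates instead:

-- Encrypt each card number into a varbinary column
UPDATE CreditCards
SET card_no_enc = ENCRYPTBYPASSPHRASE(N'a strong passphrase', card_no);

-- Authorized users holding the passphrase (the "key") can decrypt
SELECT CONVERT(NVARCHAR(32),
       DECRYPTBYPASSPHRASE(N'a strong passphrase', card_no_enc)) AS card_no
FROM CreditCards;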
SQL Authorization
Authorization:
• A file system identifies certain privileges on the objects (files) it manages.
o Typically read, write, execute.
• A file system identifies certain participants to whom privileges may be granted.
o Typically the owner, a group, all users.
Database Objects
➢ The objects on which privileges exist include stored tables and views.
➢ Other privileges are the right to create objects of a type, e.g., triggers.
➢ Views form an important tool for access control.
Example: Views as Access Control:
✓ We might not want to give the SELECT privilege on Emps(name, addr, salary).
✓ But it is safer to give SELECT on:
CREATE VIEW SafeEmps AS
SELECT name, addr FROM Emps;
✓ Queries on SafeEmps do not require SELECT on Emps, just on SafeEmps.
Granting Privileges
✓ You have all possible privileges on the objects, such as relations, that you create.
✓ You may grant privileges to other users (authorization ID’s), including PUBLIC.
✓ You may also grant privileges WITH GRANT OPTION, which lets the grantee also grant this
privilege.
To grant privileges, say:
GRANT <list of privileges>
ON <relation or other object>
TO <list of authorization ID’s>;
If you want the recipient(s) to be able to pass the privilege(s) to others add:
WITH GRANT OPTION
Example: GRANT:
Suppose you are the owner of Sells. You may say:
GRANT SELECT, UPDATE(price)
ON Sells
TO sally;
Now Sally has the right to issue any query on Sells and can update the price component only.
Suppose we also grant:
GRANT UPDATE ON Sells TO sally
WITH GRANT OPTION;
Now, Sally not only can update any attribute of Sells, but can grant to others the privilege UPDATE ON Sells.
Also, she can grant more specific privileges like UPDATE(price) ON Sells.
Revoking Privileges
REVOKE <list of privileges>
ON <relation or other object>
FROM <list of authorization ID’s>;
✓ Your grant of these privileges can no longer be used by these users to justify their use of the
privilege.
But they may still have the privilege because they obtained it independently from elsewhere.
REFERENCES: Ability to create a constraint that refers to the table.
ALTER: Ability to perform ALTER TABLE statements to change the table definition.
ALL: ALL does not grant all permissions for the table. Rather, it grants the ANSI-92
permissions, which are SELECT, INSERT, UPDATE, DELETE, and REFERENCES.
object: The name of the database object that you are granting permissions for. In the case of granting privileges on a
table, this would be the table name.
user: The name of the user that will be granted these privileges.
Example
Let's look at some examples of how to grant privileges on tables in SQL Server.
For example, if you wanted to grant SELECT, INSERT, UPDATE, and DELETE privileges on a table called
employees to a user name smithj, you would run the following GRANT statement:
GRANT SELECT, INSERT, UPDATE, DELETE ON employees TO smithj;
You can also use the ALL keyword to indicate that you wish to grant the ANSI-92 permissions (i.e., SELECT,
INSERT, UPDATE, DELETE, and REFERENCES) to a user named smithj. For example:
GRANT ALL ON employees TO smithj;
If you wanted to grant only SELECT access on the employees table to all users, you could grant the privileges to
the public role. For example:
GRANT SELECT ON employees TO public;
user: The name of the user that will have these privileges revoked.
Example
Let's look at some examples of how to revoke privileges on tables in SQL Server.
For example, if you wanted to revoke DELETE privileges on a table called employees from a user named
anderson, you would run the following REVOKE statement:
REVOKE DELETE ON employees FROM anderson;
If you wanted to revoke ALL ANSI-92 permissions (i.e., SELECT, INSERT, UPDATE, DELETE, and REFERENCES)
on a table for a user named anderson, you could use the ALL keyword as follows:
REVOKE ALL ON employees FROM anderson;
If you had granted SELECT privileges to the public role (i.e., all users) on the employees table and you wanted to
revoke these privileges, you could run the following REVOKE statement:
REVOKE SELECT ON employees FROM public;
Designing a Good Database
A properly designed database provides access to up-to-date, accurate information. Because a correct
design is essential to achieving goals in working with a database, investing the time required to learn the
principles of good design makes sense. Certain principles guide the database design process. The first
principle is that duplicate information (also called redundant data) is bad, because it wastes space and
increases the likelihood of errors and inconsistencies. The second principle is that the correctness and
completeness of information is important. If your database contains incorrect information, any reports that
pull information from the database will also contain incorrect information. As a result, any decisions you make
that are based on those reports will then be misinformed.
simple like "The customer database keeps a list of customer information for the purpose of producing
mailings and reports." If the database is more complex or is used by many people, as often occurs in a
corporate setting, the purpose could easily be a paragraph or more and should include when and how each
person will use the database. The idea is to have a well-developed mission statement that can be referred to
throughout the design process. Having such a statement helps you focus on your goals when you make
decisions.
A key point to remember is that you should break each piece of information into its smallest useful parts. In
the case of a name, to make the last name readily available, you will break the name into two parts — First
Name and Last Name. To sort a report by last name, for example, it helps to have the customer's last name
stored separately. In general, if you want to sort, search, calculate, or report based on an item of information,
you should put that item in its own field.
Think about the questions you might want the database to answer. For instance, how many sales of your
featured product did you close last month? Where do your best customers live? Who is the supplier for your
best-selling product? Anticipating these questions helps you zero in on additional items to record.
The major entities shown here are the products, the suppliers, the customers, and the orders. Therefore, it
makes sense to start out with these four tables: one for facts about products, one for facts about suppliers, one
for facts about customers, and one for facts about orders. Although this doesn’t complete the list, it is a good
starting point. You can continue to refine this list until you have a design that works well.
When you first review the preliminary list of items, you might be tempted to place them all in a single table,
instead of the four shown in the preceding illustration. You will learn here why that is a bad idea. Consider for
a moment, the table shown here:
In this case, each row contains information about both the product and its supplier. Because you can have
many products from the same supplier, the supplier name and address information has to be repeated many
times. This wastes disk space. Recording the supplier information only once in a separate Suppliers table, and
then linking that table to the Products table, is a much better solution.
A second problem with this design comes about when you need to modify information about the supplier. For
example, suppose you need to change a supplier's address. Because it appears in many places, you might
accidentally change the address in one place but forget to change it in the others. Recording the supplier’s
address in only one place solves the problem. When you design your database, always try to record each fact
just once. If you find yourself repeating the same information in more than one place, such as the address for a
particular supplier, place that information in a separate table.
Finally, suppose there is only one product supplied by XYZ, and you want to delete the product, but retain the
supplier name and address information. How would you delete the product record without also losing the
supplier information? You can't. Because each record contains facts about a product, as well as facts about a
supplier, you cannot delete one without deleting the other. To keep these facts separate, you must split the
one table into two: one table for product information, and another table for supplier information. Deleting a
product record should delete only the facts about the product, not the facts about the supplier.
Once you have chosen the subject that is represented by a table, columns in that table should store facts only
about the subject. For instance, the product table should store facts only about products. Because the supplier
address is a fact about the supplier, and not a fact about the product, it belongs in the supplier table.
❖ Turning information items into columns
To determine the columns in a table, decide what information you need to track about the subject recorded in
the table. For example, for the Customers table, Name, Address, City-State-Zip, Send e-mail, Salutation and E-
mail address comprise a good starting list of columns. Each record in the table contains the same set of
columns, so you can store Name, Address, City-State-Zip, Send e-mail, Salutation and E-mail address
information for each record. For example, the address column contains customers’ addresses. Each record
contains data about one customer, and the address field contains the address for that customer.
Once you have determined the initial set of columns for each table, you can further refine the columns. For
example, it makes sense to store the customer name as two separate columns: first name and last name, so
that you can sort, search, and index on just those columns. Similarly, the address actually consists of five
separate components, address, city, state, postal code, and country/region, and it also makes sense to store
them in separate columns. If you want to perform a search, filter or sort operation by state, for example, you
need the state information stored in a separate column.
You should also consider whether the database will hold information that is of domestic origin only, or
international, as well. For instance, if you plan to store international addresses, it is better to have a Region
column instead of State, because such a column can accommodate both domestic states and the regions of
other countries/regions. Similarly, Postal Code makes more sense than Zip Code if you are going to store
international addresses.
The following list shows a few tips for determining your columns.
• Don’t include calculated data
In most cases, you should not store the result of calculations in tables. Instead, you can have Access perform
the calculations when you want to see the result. For example, suppose there is a Products On Order report
that displays the subtotal of units on order for each category of product in the database. However, there is no
Units On Order subtotal column in any table. Instead, the Products table includes a Units On Order column that
stores the units on order for each product. Using that data, Access calculates the subtotal each time you print
the report. The subtotal itself should not be stored in a table.
• Store information in its smallest logical parts
You may be tempted to have a single field for full names, or for product names along with product
descriptions. If you combine more than one kind of information in a field, it is difficult to retrieve individual
facts later. Try to break down information into logical parts; for example, create separate fields for first and
last name, or for product name, category, and description.
❖ Specifying primary keys
Each table should include a column or set of columns that uniquely identifies each row stored in the table. This
is often a unique identification number, such as an employee ID number or a serial number. In database
terminology, this information is called the primary key of the table. Access uses primary key fields to quickly
associate data from multiple tables and bring the data together for you.
If you already have a unique identifier for a table, such as a product number that uniquely identifies each
product in your catalog, you can use that identifier as the table’s primary key — but only if the values in this
column will always be different for each record. You cannot have duplicate values in a primary key. For
example, don’t use people’s names as a primary key, because names are not unique. You could easily have two
people with the same name in the same table.
A primary key must always have a value. If a column's value can become unassigned or unknown (a missing
value) at some point, it can't be used as a component in a primary key.
You should always choose a primary key whose value will not change. In a database that uses more than one
table, a table’s primary key can be used as a reference in other tables. If the primary key changes, the change
must also be applied everywhere the key is referenced. Using a primary key that will not change reduces the
chance that the primary key might become out of sync with other tables that reference it.
Often, an arbitrary unique number is used as the primary key. For example, you might assign each order a
unique order number. The order number's only purpose is to identify an order. Once assigned, it never
changes.
If you don’t have in mind a column or set of columns that might make a good primary key, consider using a
column that has the AutoNumber data type. When you use the AutoNumber data type, Access automatically
assigns a value for you. Such an identifier is factless; it contains no factual information describing the row that
it represents. Factless identifiers are ideal for use as a primary key because they do not change. A primary key
that contains facts about a row — a telephone number or a customer name, for example — is more likely to
change, because the factual information itself might change.
A column set to the AutoNumber data type often makes a good primary key: no two product IDs are the
same.
In some cases, you may want to use two or more fields that, together, provide the primary key of a table. For
example, an Order Details table that stores line items for orders would use two columns in its primary key:
Order ID and Product ID. When a primary key employs more than one column, it is also called a composite
key.
For the product sales database, you can create an AutoNumber column for each of the tables to serve as
primary key: ProductID for the Products table, OrderID for the Orders table, CustomerID for the Customers
table, and SupplierID for the Suppliers table.
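Access's AutoNumber corresponds roughly to an IDENTITY column in SQL Server; a sketch for the Products table (the other three tables follow the same pattern, and the extra columns here are illustrative):

CREATE TABLE Products (
    ProductID  INT IDENTITY(1,1) PRIMARY KEY,  -- factless, automatically assigned
    Name       NVARCHAR(100) NOT NULL,
    Price      DECIMAL(9,2) NULL,
    SupplierID INT NULL);  -- foreign key to Suppliers, discussed below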
Information in a single form can come from several tables at once: the Customers table, the Employees table, the
Orders table, the Products table, and the Order Details table.
Access is a relational database management system. In a relational database, you divide your information into
separate, subject-based tables. You then use table relationships to bring the information together as needed.
To represent a one-to-many relationship in your database design, take the primary key on the "one" side of
the relationship and add it as an additional column or columns to the table on the "many" side of the
relationship. In this case, for example, you add the Supplier ID column from the Suppliers table to the Products
table. Access can then use the supplier ID number in the Products table to locate the correct supplier for each
product.
The Supplier ID column in the Products table is called a foreign key. A foreign key is another table’s primary
key. The Supplier ID column in the Products table is a foreign key because it is also the primary key in the
Suppliers table.
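Declaring that pairing in SQL is a one-line constraint; a sketch assuming Products and Suppliers tables like those sketched earlier:

ALTER TABLE Products
ADD CONSTRAINT FK_Products_Suppliers
    FOREIGN KEY (SupplierID) REFERENCES Suppliers (SupplierID);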
You provide the basis for joining related tables by establishing pairings of primary keys and foreign keys. If
you are not sure which tables should share a common column, identifying a one-to-many relationship ensures
that the two tables involved will, indeed, require a shared column.
Each record in the Order Details table represents one line item on an order. The Order Details table’s primary
key consists of two fields — the foreign keys from the Orders and the Products tables. Using the Order ID field
alone doesn’t work as the primary key for this table, because one order can have many line items. The Order
ID is repeated for each line item on an order, so the field doesn’t contain unique values. Using the Product ID
field alone doesn’t work either, because one product can appear on many different orders. But together, the
two fields always produce a unique value for each record.
In the product sales database, the Orders table and the Products table are not related to each other directly.
Instead, they are related indirectly through the Order Details table. The many-to-many relationship between
orders and products is represented in the database by using two one-to-many relationships:
• The Orders table and Order Details table have a one-to-many relationship. Each order can have more
than one line item, but each line item is connected to only one order.
• The Products table and Order Details table have a one-to-many relationship. Each product can have
many line items associated with it, but each line item refers to only one product.
From the Order Details table, you can determine all of the products on a particular order. You can also
determine all of the orders for a particular product.
After incorporating the Order Details table, the list of tables and fields might look something like this:
❖ Creating a one-to-one relationship
Another type of relationship is the one-to-one relationship. For instance, suppose you need to record some
special supplementary product information that you will need rarely or that only applies to a few products.
Because you don't need the information often, and because storing the information in the Products table
would result in empty space for every product to which it doesn’t apply, you place it in a separate table. Like
the Products table, you use the ProductID as the primary key. The relationship between this supplemental
table and the Product table is a one-to-one relationship. For each record in the Product table, there exists a
single matching record in the supplemental table. When you do identify such a relationship, both tables must
share a common field.
When you detect the need for a one-to-one relationship in your database, consider whether you can put the
information from the two tables together in one table. If you don’t want to do that for some reason, perhaps
because it would result in a lot of empty space, the following list shows how you would represent the
relationship in your design:
• If the two tables have the same subject, you can probably set up the relationship by using the same
primary key in both tables.
• If the two tables have different subjects with different primary keys, choose one of the tables (either
one) and insert its primary key in the other table as a foreign key.
Determining the relationships between tables helps you ensure that you have the right tables and columns.
When a one-to-one or one-to-many relationship exists, the tables involved need to share a common column or
columns. When a many-to-many relationship exists, a third table is needed to represent the relationship.
Trying out your design with sample data helps highlight potential problems — for example, you might need to add a column that you forgot to insert
during your design phase, or you may have a table that you should split into two tables to remove duplication.
See if you can use the database to get the answers you want. Create rough drafts of your forms and reports
and see if they show the data you expect. Look for unnecessary duplication of data and, when you find any,
alter your design to eliminate it.
As you try out your initial database, you will probably discover room for improvement. Here are a few things
to check for:
• Did you forget any columns? If so, does the information belong in the existing tables? If it is
information about something else, you may need to create another table. Create a column for every
information item you need to track. If the information can’t be calculated from other columns, it is
likely that you will need a new column for it.
• Are any columns unnecessary because they can be calculated from existing fields? If an information
item can be calculated from other existing columns — a discounted price calculated from the retail
price, for example — it is usually better to do just that, and avoid creating a new column.
• Are you repeatedly entering duplicate information in one of your tables? If so, you probably need to
divide the table into two tables that have a one-to-many relationship.
• Do you have tables with many fields, a limited number of records, and many empty fields in individual
records? If so, think about redesigning the table so it has fewer fields and more records.
• Has each information item been broken into its smallest useful parts? If you need to report, sort,
search, or calculate on an item of information, put that item in its own column.
• Does each column contain a fact about the table's subject? If a column does not contain information
about the table's subject, it belongs in a different table.
• Are all relationships between tables represented, either by common fields or by a third table? One-to-
one and one-to- many relationships require common columns. Many-to-many relationships require a
third table.
Without normalization, it becomes difficult to handle and update the database without facing data loss.
Insertion, update, and deletion anomalies are very frequent if the database is not normalized.
Anomalies in DBMS:
If a database design is not perfect, it may contain anomalies, which are like a bad dream for any database
administrator. Managing a database with anomalies is next to impossible.
There are three types of anomalies that occur when the database is not normalized. These are –
Insertion anomaly
Update anomaly
Deletion anomaly
Let’s take an example to understand this.
Example: Suppose a manufacturing company stores the employee details in a table named employee that has
four attributes: emp_id for storing employee’s id, emp_name for storing employee’s name, emp_address for
storing employee’s address and emp_dept for storing the department details in which the employee works. At
some point of time the table looks like this:
emp_id   emp_name   emp_address   emp_dept
101      Rick       Delhi         D001
101      Rick       Delhi         D002
123      Maggie     Agra          D890
166      Glenn      Chennai       D900
The above table is not normalized. We will see the problems that we face when a table is not normalized.
Update anomaly:
In the above table we have two rows for employee Rick as he belongs to two departments of the company. If
we want to update the address of Rick then we have to update the same in two rows or the data will become
inconsistent. If somehow, the correct address gets updated in one department but not in other then as per the
database, Rick would be having two different addresses, which is not correct and would lead to inconsistent
data.
Insert anomaly:
Suppose a new employee joins the company, who is under training and currently not assigned to any
department then we would not be able to insert the data into the table if emp_dept field doesn’t allow nulls.
Delete anomaly:
Suppose that at some point the company closes department D890. Deleting the rows that have emp_dept as
D890 would also delete the information of employee Maggie, since she is assigned only to this department.
Normalization Rule:
Normalization rule are divided into following normal form.
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF (Boyce-Codd Normal Form)
As per First Normal Form, no two rows of data may contain repeating groups of information; each set of
columns must have a unique value, such that multiple columns cannot be used to fetch the same row. Each
table should be organized into rows, and each row should have a primary key that distinguishes it as unique.
The primary key is usually a single column, but sometimes more than one column can be combined to create a
single primary key. For example, consider a table which is not in First Normal Form:
Student Table :
Student Age Subject
Adam 15 Biology, Maths
Alex 14 Maths
Stuart 17 Maths
In First Normal Form, no row may have a column in which more than one value is saved (such as "Biology,
Maths", separated with commas). Rather, we must separate such data into multiple rows.
If we follow Second Normal Form, then every non-prime attribute should be fully functionally dependent on the
prime key attributes. That is, if X → A holds, then there should not be any proper subset Y of X for which Y → A also
holds. (Y is a proper subset of X when every element of Y is in X but Y ≠ X; for example, if X = {1,2,3,4}, then
Y = {1,2} is a proper subset of X.)
We see here in Student_Project relation that the prime key attributes are Stu_ID and Proj_ID. According to the rule,
non-key attributes, i.e. Stu_Name and Proj_Name must be dependent upon both and not on any of the prime key
attribute individually. But we find that Stu_Name can be identified by Stu_ID and Proj_Name can be identified by
Proj_ID independently. This is called partial dependency, which is not allowed in Second Normal Form.
We break the relation in two — one holding Stu_ID with Stu_Name and one holding Proj_ID with Proj_Name — so no partial dependency remains.
Example:
As per the Second Normal Form there must not be any partial dependency of any column on primary key. It means
that for a table that has concatenated primary key, each column in the table that is not part of the primary key
must depend upon the entire concatenated key for its existence. If any column depends only on one part of the
concatenated key, then the table fails Second normal form.
In example of First Normal Form there are two rows for Adam, to include multiple subjects that he has opted for.
While this is searchable, and follows First normal form, it is an inefficient use of space. Also in the above Table in
First Normal Form, while the candidate key is {Student, Subject}, Age of Student only depends on Student column,
which is incorrect as per Second Normal Form. To achieve second normal form, it would be helpful to split out the
subjects into an independent table, and match them up using the student names as foreign keys.
The new Student table (table 1: Student) will be:
Student  Age
Adam     15
Alex     14
Stuart   17
In the Student table the candidate key will be the Student column, because the only other column, Age, is dependent on it.
The new Subject table introduced for 2NF (table 2: Subject) will be:
Student Subject
Adam Biology
Adam Maths
Alex Maths
Stuart Maths
In the Subject table the candidate key will be the {Student, Subject} combination. Now both of the above tables
qualify for Second Normal Form and will never suffer from update anomalies. (There are a few complex cases in
which a table in Second Normal Form still suffers update anomalies; Third Normal Form handles those
scenarios.)
We find that in the above Student_detail relation, Stu_ID is the key and the only prime attribute. City can be
identified by Stu_ID as well as by Zip itself. Neither Zip is a superkey nor is City a prime attribute. Additionally,
Stu_ID → Zip → City, so there exists a transitive dependency.
To bring this relation into third normal form, we break the relation into two relations as follows −
Example 1:
Third Normal Form requires that every non-prime attribute of a table depend on the primary key directly; in other words, no non-prime attribute may be determined by another non-prime attribute. Any such transitive functional dependency must be removed, and the table must also be in Second Normal Form.
Employee Table:
emp_id emp_name emp_zip emp_state emp_city emp_district
1101 Lilly 292008 UK Pauri Bhagwan
1201 Steve 222999 MP Gwalior Ratan
In this table emp_id is the primary key, but emp_state, emp_city and emp_district depend upon emp_zip. The dependency between emp_zip and those fields is called a transitive dependency. Hence, to apply 3NF, we move emp_state, emp_city and emp_district to a new table with emp_zip as the primary key, leaving the Employee table with just emp_id, emp_name and emp_zip.
Employee_zip table:
emp_zip emp_state emp_city emp_district
282005 UP Agra Dayal Bagh
222008 TN Chennai M-City
282007 TN Chennai Urrapakkam
292008 UK Pauri Bhagwan
222999 MP Gwalior Ratan
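To see that this 3NF split loses no information, here is a minimal Python sketch (illustrative only; it uses the two employee rows recoverable from the example above) that joins the decomposed tables back together:

# Decomposed tables from the 3NF example above.
employee = [
    {"emp_id": 1101, "emp_name": "Lilly", "emp_zip": "292008"},
    {"emp_id": 1201, "emp_name": "Steve", "emp_zip": "222999"},
]
employee_zip = {
    "292008": {"emp_state": "UK", "emp_city": "Pauri", "emp_district": "Bhagwan"},
    "222999": {"emp_state": "MP", "emp_city": "Gwalior", "emp_district": "Ratan"},
}

# A natural join on emp_zip rebuilds the original wide table, so the 3NF
# decomposition is lossless, while each zip's details are now stored only once.
for emp in employee:
    print({**emp, **employee_zip[emp["emp_zip"]]})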
Boyce-Codd Normal Form (BCNF) is a stricter extension of Third Normal Form. BCNF states that −
• for any non-trivial functional dependency X → A, X must be a super-key.
In the decomposed design, Stu_ID is the super-key in the relation Student_Detail and Zip is the super-key in the relation ZipCodes. So,
Stu_ID → Stu_Name, Zip
and
Zip → City
which confirms that both relations are in BCNF.
BCNF is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is stricter than 3NF: a table complies with BCNF if it is in 3NF and, for every functional dependency X → Y, X is a super key of the table.
Example: Suppose there is a company wherein employees work in more than one department. They store the data in a single table:
emp_id emp_nationality emp_dept dept_type dept_no_of_emp
Here the candidate key is {emp_id, emp_dept}, yet emp_id → emp_nationality and emp_dept → {dept_type, dept_no_of_emp}, so neither determinant is a super key and the table violates BCNF. To bring it into BCNF, it is decomposed into three tables:
emp_nationality table:
emp_id emp_nationality
1001 Austrian
1002 American
emp_dept table:
emp_dept dept_type dept_no_of_emp
Production and planning D001 200
stores D001 250
design and technical support D134 100
Purchasing department D134 600
emp_dept_mapping table:
emp_id emp_dept
1001 Production and planning
1001 stores
1002 design and technical support
1002 Purchasing department
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This design is now in BCNF, since in both functional dependencies the left-hand side is a key.
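The BCNF condition can be checked mechanically. The following Python sketch (an illustration; it assumes the keys listed above are the only candidate keys) verifies that, in every table, each applicable functional dependency has a superkey on its left-hand side:

tables = {
    "emp_nationality":  {"attrs": {"emp_id", "emp_nationality"},
                         "key": {"emp_id"}},
    "emp_dept":         {"attrs": {"emp_dept", "dept_type", "dept_no_of_emp"},
                         "key": {"emp_dept"}},
    "emp_dept_mapping": {"attrs": {"emp_id", "emp_dept"},
                         "key": {"emp_id", "emp_dept"}},
}

fds = [
    ({"emp_id"}, {"emp_nationality"}),
    ({"emp_dept"}, {"dept_type", "dept_no_of_emp"}),
]

for name, table in tables.items():
    for lhs, rhs in fds:
        # An FD applies to a table only if all its attributes appear there.
        if (lhs | rhs) <= table["attrs"]:
            # BCNF: the determinant must contain a candidate key (be a superkey).
            assert table["key"] <= lhs, name + " violates BCNF"
print("All three tables are in BCNF for the given FDs.")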
Functional Dependencies:
A functional dependency is a relationship between two attributes, typically between the PK and other non-key attributes within the table. For any relation R, attribute Y is functionally dependent on attribute X (usually the PK) if, for every valid instance of X, that value of X uniquely determines the value of Y.
X ———–> Y
The left-hand side of the FD is called the determinant, and the right-hand side is the dependent.
Examples:
SID ———-> Name, Address, Birthdate
SID determines Name, Address and Birthdate: given SID, we can determine any of the other attributes within the table.
SID, Course ———> DateCompleted
SID and Course together determine DateCompleted. Functional dependencies work for a composite PK in the same way.
ISBN ———–> Title
ISBN determines title.
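A functional dependency can also be tested against concrete data: X → Y holds only if rows that agree on X always agree on Y. The Python sketch below illustrates this; the sample rows and values are invented for demonstration:

def fd_holds(rows, x_cols, y_cols):
    """Return True if the functional dependency x_cols -> y_cols holds in rows."""
    seen = {}
    for row in rows:
        x = tuple(row[c] for c in x_cols)
        y = tuple(row[c] for c in y_cols)
        if x in seen and seen[x] != y:
            return False      # same determinant value, different dependent value
        seen[x] = y
    return True

rows = [
    {"SID": 1, "Name": "Maya", "Course": "DB", "DateCompleted": "2024-01-10"},
    {"SID": 1, "Name": "Maya", "Course": "SQL", "DateCompleted": "2024-03-02"},
    {"SID": 2, "Name": "Hari", "Course": "DB", "DateCompleted": "2024-02-15"},
]

print(fd_holds(rows, ["SID"], ["Name"]))                     # True
print(fd_holds(rows, ["SID"], ["DateCompleted"]))            # False
print(fd_holds(rows, ["SID", "Course"], ["DateCompleted"]))  # True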
Consider a sample table with columns A, B, C, D and E. From our understanding of primary keys, A is a primary key.
Since the values of E are always the same, it follows that: A → E, B → E, C → E, D → E.
However, we cannot generally summarize the above by ABCD → E. In general: A → E, B → E, AB → E.
Other observations:
• Combinations of BC are unique, therefore BC → ADE.
• Combinations of BD are unique, therefore BD → ACE.
• If C values match, so do D values, therefore C → D; however, D values don't determine C values, so D does not determine C.
Looking at the actual data makes it much clearer which attributes are dependents and which are determinants.
Inference Rules
Armstrong’s axioms are a set of axioms (or, more precisely, inference rules) used to infer all the functional
dependencies on a relational database. They were developed by William W. Armstrong.
Let R(U) be a relation scheme over the set of attributes U. We will use the letters X, Y, Z to represent any subsets of U and, for short, write XY for the union of two sets of attributes X and Y instead of the usual X ∪ Y.
For example, if R = {1,2,3,4} and S = {3,4,5}, then R ∪ S = {1,2,3,4,5}.
Axiom of reflexivity: (Partial dependency)
If Y is a subset of X, then X determines Y (X → Y). A dependency on only part of a composite key is a partial dependency. For example:
StudentNo, course —> studentName, address, city, prov, pc, grade, dateCompleted
This situation is not desirable, because every non-key attribute has to be fully dependent on the PK; here the student information is only 'partially' dependent on the PK, namely on StudentNo alone.
To fix this problem, we need to break the table down into two as follows:
StudentNo, course, grade, dateCompleted
StudentNo, studentName, address, city, prov, pc
Axiom of transitivity
If X determines Y and Y determines Z, then X must also determine Z.
Consider a table with attributes StudentNo, studentName, address, city, prov, pc, ProgramID, ProgramName, where StudentNo —> ProgramID (X —> Y) and ProgramID —> ProgramName (Y —> Z).
This situation is not desirable, because a non-key attribute depends on another non-key attribute.
To fix this problem, we need to break this table into two: one to hold information about the student and the other to hold information about the program. However, we still need to leave a FK in the student table, so that we can determine which program the student is enrolled in.
StudentNo —> studentName, address, city, prov, pc, ProgramID
ProgramID —> ProgramName
Additional rules:
Union
If X determines Y and X determines Z, then X determines the combination of Y and Z (X → YZ).
Decomposition
If X determines Y and Z (X → YZ), then X determines Y and X determines Z separately. This is the reverse of union. If you have a table that appears to contain two entities determined by the same PK, consider breaking it up into two tables.
Dependency Diagram
A dependency diagram illustrates the various dependencies that may exist in a non-normalized table. The following dependencies are identified:
ProjectNo and EmpNo combined form the PK.
Partial Dependencies:
ProjectNo —> ProjName
EmpNo —> EmpName, DeptNo
Transitive Dependency:
DeptNo —> DeptName
Full Dependency:
ProjectNo, EmpNo —> HrsWork
Remember:
PD – Partial Dependency
TD – Transitive Dependency
FD – Full Dependency
A functional dependency with multiple attributes is shown below, for the functional dependency
Order#, Prod# —> Quantity.
A derived functional dependency involving a partial key dependency is shown in the figure below.
The arrow connected to the outer rectangle, which represents Order#, Prod# —> Product, can be deleted without loss of information.
A derived functional dependency involving a transitive dependency is shown in the figure below.
The arrow which represents Order# —> Supplier can be deleted without loss of information.
Closure Of Functional Dependency
The closure of functional dependency means the complete set of all attributes that can be functionally derived from a given set of functional dependencies using the inference rules known as Armstrong's rules.
If F is a set of functional dependencies, then its closure is denoted {F}+; similarly, the closure of an attribute set X is denoted {X}+.
There are three steps to calculate closure of functional dependency. These are:
Step-1 : Add the attributes which are present on Left Hand Side in the original functional dependency.
Step-2 : Now, add the attributes present on the Right Hand Side of the functional dependency.
Step-3 : With the help of attributes present on Right Hand Side, check the other attributes that can be derived from
the other given functional dependencies. Repeat this process until all the possible attributes which can be derived
are added in the closure.
The Algorithm
✓ The procedure described above can be generalized to an algorithm (a code sketch follows the worked examples below). Assume we are given the set of functional dependencies FD and a set of attributes X. The algorithm is as follows:
✓ Add the attributes contained in the attribute set X to the result set X+.
✓ Add to the result set X+ those attributes that can be functionally determined from the attributes already contained in the result set.
✓ Repeat step 2 until no more attributes can be added to the result set X+.
Example 1
We are given the relation R(A, B, C, D, E). This means that the table R has five columns: A, B, C, D, and E. We
are also given the set of functional dependencies: {A->B, B->C, C->D, D->E}.
What is {A}+?
• First, we add A to {A}+.
• What columns can be determined given A? We have A -> B, so we can determine B. Therefore, {A}+ is now
{A, B}.
• What columns can be determined given A and B? We have B -> C in the functional dependencies, so we can
determine C. Therefore, {A}+ is now {A, B, C}.
• Now, we have A, B, and C. What other columns can we determine? Well, we have C -> D, so we can add D
to {A}+.
• Now, we have A, B, C, and D. Can we add anything else to it? Yes, since D -> E, we can add E to {A}+.
• We have used all of the columns in R and all of the functional dependencies. {A}+ = {A, B, C, D, E}.
Example 2
Let’s look at another example. We are given R(A, B, C, D, E, F). The functional dependencies are {AB->C, BC->AD, D->E, CF->B}. What is {A, B}+?
• We start with {A, B}.
• What columns can we determine, given A and B? We have AB -> C, so we can add C to {A, B}+.
• We now have A, B, and C. What other columns can we determine? We have BC -> AD. We already have A in
{A, B}+, so we can add D.
• So, we now have A, B, C, and D. What else can we add? We have D -> E, so we can add E to {A, B}+.
• Now {A, B}+ is {A, B, C, D, E}. Can we add anything else? No. We have one more functional dependency in
our set that we did not use: CF -> B. We can’t use this dependency because F is not in {A, B}+.
• Thus, {A, B}+ is {A, B, C, D, E}.
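The closure algorithm sketched earlier translates directly into code. The following Python function (a minimal sketch of the three-step procedure above) reproduces both worked examples:

def closure(attrs, fds):
    """Compute X+ for the attribute set attrs under FDs given as (lhs, rhs) pairs."""
    result = set(attrs)                  # step 1: X+ starts as X itself
    changed = True
    while changed:                       # step 3: repeat until nothing changes
        changed = False
        for lhs, rhs in fds:
            # step 2: if the whole determinant is already in X+, then
            # everything it determines belongs in X+ as well
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Example 1: R(A, B, C, D, E) with {A->B, B->C, C->D, D->E}
fds1 = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"D"}), ({"D"}, {"E"})]
print(sorted(closure({"A"}, fds1)))        # ['A', 'B', 'C', 'D', 'E']

# Example 2: R(A, B, C, D, E, F) with {AB->C, BC->AD, D->E, CF->B}
fds2 = [({"A", "B"}, {"C"}), ({"B", "C"}, {"A", "D"}),
        ({"D"}, {"E"}), ({"C", "F"}, {"B"})]
print(sorted(closure({"A", "B"}, fds2)))   # ['A', 'B', 'C', 'D', 'E'] (CF->B never fires)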
Prime and non-prime attributes
Attributes which are part of any candidate key of a relation are called prime attributes; the others are non-prime attributes. For example, STUD_NO in the STUDENT relation is a prime attribute and the others are non-prime attributes.
Multivalued Dependency:
Multivalued dependency is a generalization of the functional dependency concept that significantly helps in designing and optimizing a relational database structure. Let R(X1,...,Xm, Y1,...,Yn, Z1,...,Zr) be a relation with m + n + r column names. For notational convenience, we write X for {X1,...,Xm}, Y for {Y1,...,Yn} and Z for {Z1,...,Zr}, so the relation is R(X, Y, Z).
A multivalued dependency is a statement of the form X →→ Y, where X and Y are sets of attributes and Z is the set of all attributes of R that are in neither X nor Y. The multivalued dependency X →→ Y holds in R if, for all tuples r1 and r2 in R with r1[X] = r2[X], there are tuples r3 and r4 in R such that r3[X] = r1[X], r3[Y] = r1[Y], r3[Z] = r2[Z]; and r4[X] = r1[X], r4[Y] = r2[Y], r4[Z] = r1[Z].
X →→ Y holds in R if and only if X →→ Y − X holds in R. When the sets X, Y and Z form a partition of U, it is convenient to write a tuple r of R as (x, y, z), where x, y and z denote the projections of r onto X, Y and Z. Letters from the start of the alphabet (A, B, C, D, ...) denote single attributes and letters such as X, Y, Z denote sets of attributes; XY is the union of X and Y, and a string of attributes A1 A2 ... An denotes the set {A1, A2, ..., An}.
Define Y_xz to be {y : (x, y, z) ∈ R}; Y_xz is nonempty if and only if x and z appear together in a tuple of R. The multivalued dependency X →→ Y is said to hold for R(X, Y, Z) if Y_xz depends only on x, that is, if Y_xz = Y_xz' for every x, z, z' such that Y_xz and Y_xz' are nonempty. Multivalued dependencies provide a necessary and sufficient condition for a relation to be decomposable into two of its projections without loss of information.
For instance, we can decompose R(X, Y, Z) into R1(X, Y) and R2(X, Z), where R is the set of tuples (x, y, z) such that (x, y) is a tuple of R1 and (x, z) is a tuple of R2. In a database, a Project_Employee_Task entity with attributes {Project_Name, Employee_Name, Task_Name} can be decomposed into a Project_Employee entity {Project_Name, Employee_Name} and a Project_Task entity {Project_Name, Task_Name}.
If X and Y are disjoint and the functional dependency X → Y holds for a relation R, then the multivalued dependency X →→ Y also holds for R. By the multivalued dependency theorem, X →→ Y holds for the relation R(X, Y, Z) if and only if R is the join of its projections R1(X, Y) and R2(X, Z).
R(X, Y, Z) is the join of its projections R1(X, Y) and R2(X, Z) if and only if the following condition holds: whenever (x, y, z) and (x, y', z') are tuples of R, then (x, y', z) and (x, y, z') are tuples of R as well. Since the right-hand side of this "if and only if" is symmetric in the roles of Y and Z, it follows immediately that X →→ Y holds for the relation R(X, Y, Z) if and only if X →→ Z holds.
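The theorem can be demonstrated in code: a relation satisfies X →→ Y exactly when it equals the join of its projections R1(X, Y) and R2(X, Z). The Python sketch below uses the Project/Employee/Task decomposition mentioned above, with invented sample rows:

# R(Project, Employee, Task): within project P1, employees and tasks are
# independent, so Project ->> Employee holds. Sample rows are invented.
R = {("P1", "Asha", "Design"), ("P1", "Asha", "Coding"),
     ("P1", "Bibek", "Design"), ("P1", "Bibek", "Coding")}

r1 = {(x, y) for x, y, z in R}          # projection R1(Project, Employee)
r2 = {(x, z) for x, y, z in R}          # projection R2(Project, Task)

# natural join of R1 and R2 on the shared Project column
joined = {(x, y, z) for x, y in r1 for x2, z in r2 if x == x2}
print(joined == R)    # True: the decomposition is lossless, so the MVD holds

# Drop one tuple and the MVD no longer holds: the join re-creates the
# dropped tuple, so the two-way decomposition becomes lossy.
R2 = R - {("P1", "Bibek", "Coding")}
r1 = {(x, y) for x, y, z in R2}
r2 = {(x, z) for x, y, z in R2}
joined = {(x, y, z) for x, y in r1 for x2, z in r2 if x == x2}
print(joined == R2)   # False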
Join Dependency
Multivalued dependencies support lossless decomposition of a relation R based on trivial multivalued dependencies. For instance, a relation R(A B C) is decomposed into relations R1(A B) and R2(A C) based on the trivial multivalued dependency A →→ B.
Join dependency supports lossless decomposition of a relation R based on nontrivial multivalued dependencies. For instance, a relation R(A B C) is decomposed into relations R1(A B), R2(B C) and R3(A C) based on nontrivial multivalued dependencies.
Let R = {R1, R2, ..., Rn} be a set of relation schemes over a universe of attributes U. The relation r(R) satisfies the join dependency *[R1, R2, ..., Rn] if r decomposes losslessly onto R1, R2, ..., Rn.
A join dependency *[R1, R2, ..., Rn] over R is trivial if it is satisfied by every relation r(R).
Fourth Normal Form
A relation R is in Fourth Normal Form (4NF) if and only if both of the following conditions are satisfied:
1. R is already in 3NF or BCNF.
2. R contains no non-trivial multi-valued dependencies.
What is Multi-Valued Dependency (MVD)?
MVD is a dependency in which one attribute value is potentially a 'multi-valued fact' about another. Consider the following table:
Customer_Name Address
Raj New Delhi
Raj Amritsar
Suneet Amritsar
In this example, 'Address' is a multi-valued fact about 'Customer_Name' and the converse is also true. For example, the attribute 'Address' takes on the two values 'New Delhi' and 'Amritsar' for the single 'Customer_Name' value 'Raj', and the attribute 'Customer_Name' takes on the values 'Raj' and 'Suneet' for the single 'Address' value 'Amritsar'.
MVD can be defined informally as follows: MVDs occur when two or more independent multi-valued facts about the same attribute occur within the same table. That is, if a relation R has attributes A, B and C, and B and C are multi-valued facts about A, represented as A →→ B and A →→ C, then a multivalued dependency exists only if B and C are independent of each other.
There are two things to note about this definition.
Firstly, in order for a table to contain an MVD, it must have three or more attributes.
Secondly, it is possible to have a table containing two or more attributes which are interdependent multi-valued facts about another attribute. This does not give rise to an MVD; the attributes giving rise to the multi-valued facts must be independent of each other. Consider the following table:
The table lists students, the textbooks they have borrowed, the librarians issuing them and the dates of borrowing. It contains three multi-valued facts about students: the books they have borrowed, the librarians who have issued these books to them, and the dates upon which the books were borrowed. However, these multi-valued facts are not independent of each other: there is clearly an association between librarians, the textbooks they have issued and the dates upon which they issued the books. Therefore, there are no MVDs in the table. Note that there is no redundant information in this table. The fact that student 'Ankit', for example, has borrowed the book 'Mechanics' is recorded twice, but these are different borrowings, one in April and the other in June, and therefore constitute different items of information.
Now consider another example involving Course, Student_name and Text_book.
This table lists students, the courses they attend and the textbooks they use for those courses. The textbooks are prescribed by the authorities for each course; the students have no say in the matter. Clearly the attributes 'Student_name' and 'Text_book' are multi-valued facts about the attribute 'Course'. However, since a student has no influence over the textbooks to be used for a course, these multi-valued facts about courses are independent of each other. Thus the table contains MVDs, which are represented by a double arrow (→→).
In the above database the following MVDs exist:
Course →→ Student_name
Course →→ Text_book
Here, Student_name and Text_book are independent of each other.
Now we can easily check that all the above anomalies of the STUDENT_COURSE_BOOK database are removed. For example, if a new student now joins a course, we have to make only one insertion in the COURSE_STUDENT table, and if a new book is introduced for a course, we again make only a single entry in the COURSE_BOOK table. The modified database thus eliminates the redundancy problem, which also solves the update problems.
In all of the further normal forms discussed so far, non-loss decomposition was achieved by decomposing a single table into two separate tables. Non-loss decomposition is possible because of the availability of the join operator as part of the relational model. In considering 5NF, consideration must be given to tables where non-loss decomposition can only be achieved by decomposition into three or more separate tables. Such decomposition is not always possible, as is shown by the following example.
Consider the table
AGENT_COMPANY_PRODUCT (Agent, Company, Product_Name)
This table lists agents, the companies they work for and the products they sell for those companies. The agents do not necessarily sell all the products supplied by the companies they do business with. An example of this table might be:
The table is necessary in order to show all the information required. Suneet, for example, sells ABC's Nuts and Screws, but not ABC's Bolts. Raj is not an agent for CDE and does not sell ABC's Nuts or Screws. The table is in 4NF because it contains no multi-valued dependency. It does, however, contain an element of redundancy, in that it records the fact that Suneet is an agent for ABC twice; and there is no way of eliminating this redundancy without losing information. Suppose that the table is decomposed into its two projections, P1 and P2.
The redundancy has been eliminated, but the information about which companies make which products, and which of these products they supply to which agents, has been lost. The natural join of these projections over the 'Agent' column is:
The table resulting from this join is spurious, since the asterisked row of the table contains incorrect information.
Now suppose that the original table were decomposed into three tables: the two projections P1 and P2 already shown, and the final possible projection, P3.
If a join is taken of all three projections, first of P1 and P2 with the (spurious) result shown above, and then of this result with P3 over the 'Company' and 'Product_Name' columns, the following table is obtained:
This still contains a spurious row. The order in which the joins are performed makes no difference to the final result. It is simply not possible to decompose the 'AGENT_COMPANY_PRODUCT' table, populated as shown, without losing information. Thus, it has to be accepted that it is not possible to eliminate all redundancies using normalization techniques, because it cannot be assumed that all decompositions will be non-loss.
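The spurious-row behaviour described above can be reproduced in code. Since the original AGENT_COMPANY_PRODUCT rows are not shown in these notes, the Python sketch below assumes sample data consistent with the prose (Suneet sells ABC's Nuts and Screws but not its Bolts; Raj sells only ABC's Bolts; Suneet is also taken to sell CDE's Bolts):

# Assumed sample data consistent with the description above.
acp = {("Suneet", "ABC", "Nuts"), ("Suneet", "ABC", "Screws"),
       ("Suneet", "CDE", "Bolts"), ("Raj", "ABC", "Bolts")}

p1 = {(a, c) for a, c, p in acp}        # P1(Agent, Company)
p2 = {(a, p) for a, c, p in acp}        # P2(Agent, Product)
p3 = {(c, p) for a, c, p in acp}        # P3(Company, Product)

# Join P1 and P2 over Agent, then join the result with P3 over
# (Company, Product).
join12 = {(a, c, p) for a, c in p1 for a2, p in p2 if a == a2}
join123 = {(a, c, p) for a, c, p in join12 if (c, p) in p3}

# One spurious row survives the three-way join, so this populated table
# cannot be decomposed without losing information.
print(join123 - acp)   # {('Suneet', 'ABC', 'Bolts')}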
But now consider the different case where, if an agent is an agent for a company and that company makes a product, then he always sells that product for the company. Under these circumstances, the 'agent_company_product' table is as shown below:
The assumption is that ABC makes both Nuts and Bolts and that CDE makes Bolts only. This table can be decomposed into its three projections without loss of information, as demonstrated below.
All redundancy has been removed. If the natural join of P1 and P2 is taken, the result is:
The spurious row is asterisked. Now, if this result is joined with P3 over the 'Company' and 'Product_Name' columns, the following table is obtained:
This is a correct recomposition of the original table, and non-loss decomposition into the three projections has been achieved. Again, the order in which the joins are performed does not affect the final result. The original table therefore violated 5NF, simply because it was non-loss decomposable into its three projections.
In the first case exemplified above, non-loss decomposition of the 'agent_company_product' table was not possible; in the second it was. If a table is non-loss decomposable, as in the second case, it is said to be in violation of 5NF. The difference, of course, lay in certain semantic properties of the information being represented. These properties were not discoverable simply by looking at the table; they had to be supplemented by further information about the relationship between products, agents and companies.
Detecting that a table violates 5NF is very difficult in practice, and for this reason this normal form has little if any practical application. The theoretical concept of fifth normal form is discussed in the following paragraphs.
Suppose that the statement "the 'agent_company_product' table is equal to the join of its three projections" is to hold true. This is another way of saying that it can be non-loss decomposed into its three projections, and is equivalent to saying:
IF the tuple 'agent X, company Y' appears in P1
AND the tuple 'agent X, product Z' appears in P2
AND the tuple 'company Y, product Z' appears in P3
THEN the row 'agent X, company Y, product Z' must have appeared in 'agent_company_product'.
If the reader cares to re-examine the projections P1, P2 and P3 from the two versions of the table illustrated earlier, it will be seen that the earlier version, which was in 5NF, does not conform to the above rule, whereas the later version, which violated 5NF, does.
The rule is referred to as a Join Dependency, because it holds good only if a table can be reconstituted without loss of information from the join of certain specified projections of it.
The notation used for a join dependency on a table T is:
*(X, Y, ..., Z)
where X, Y, ..., Z are projections of T.
Table T is said to satisfy the above join dependency if it is equal to the join of the projections X, Y, ..., Z.
Thus, the second example given of the table 'agent_company_product' can be said to satisfy the join dependency:
*(P1, P2, P3)
In the discussion of the other further normal forms use was made of the concepts of functional and multi-valued
dependencies. In dealing with 5NF the concept of join dependency has been introduced (in a very informal way).
***Thank You***
Tutor: Biran Limbu
Course detail and pedagogy of
BBM (Bachelor of Business Management) 6th Semester
COM 312: Database Management
Credits: 3
Lecture Hours: 48
Course Objectives
The main objective of this module is to provide strong theoretical and practical knowledge of the
database management system.
Course Description
Database system, Data Abstraction, Data Models, Database users, Entity-Relation Model, Constraints,
E-R Diagrams, Design of E-R Database Schema, Relational Data Model, Structure of Relational
Database, Relational Algebra, Fundamental Operations, Additional Operations, Modifying the
database, Structured Query Language, Data Definition Language, Data Manipulation Language,
Transaction Control Language, Join operations, Integrity Constraints, Assertion, Triggers, Relational
database design issues, Normalization, Database Governance, Database Management, Transaction
Management.
Course Details
Unit 1: Introduction LH 6
Database Management Systems
Purpose of Database Systems
Data Abstraction
Data Models
• The E-R Model
• The Object-Oriented Model
• The Relational Model
• The Network Model
• The Hierarchical Model
• Physical Data Models
Instances and Schemes
Data Independence
Database Administrator
Database Users
Application Architecture (One-tier, Two-tier and n-tier)
Overall Database System Structure and Components
Unit 2: Entity-Relationship Model LH 6
2.1 Entities and Entity Sets
2.2 Relationships and Relationship Sets
2.3 Attributes
2.4 Mapping Constraints
2.5 Keys (Super key, Candidate key and Primary key)
2.5.1 Primary Keys for Entity Sets and Relationship Sets
2.6 The Entity Relationship Diagram
2.7 Reducing E-R Diagrams to Tables
2.7.1 Representation of Strong Entity Sets
2.7.2 Representation of Weak Entity Sets
2.7.3 Representation of Relationship Sets
2.8 Generalization and Specialization
2.9 Aggregation
2.10 Mapping Cardinalities
2.10.1 Representation of Mapping Cardinalities in E-R Diagram
2.11 Use of Entity or Relationship Sets
2.12 Use of Extended E-R Features
2.13 Design of an E-R Database Scheme ( Case study)
Unit 3: Relational Model LH 7
3.1 Structure of Relational Database
3.2 Basic Structure
3.3 Database Scheme
3.4 Keys
3.5 Query Languages
3.6 The Relational Algebra
3.6.1 Fundamental Operations
3.6.2 Formal Definition of Relational Algebra
3.6.3 Additional Operations
3.7 Modifying the Database
3.7.1 Deletion
3.7.2 Insertions
3.7.3 Updating
3.8 Views and View Definition in Relational Algebra
Unit 4: Structured Query Language (SQL) LH 6
4.1 Background
4.2 Data Definition Language
4.2.1 Domain Types in SQL
4.2.2 Schema Definition in SQL
4.3 Data Manipulation Language
4.3.1 The select Clause
4.3.2 The where Clause
4.3.3 The from Clause
4.3.4 The Rename Operation
4.3.5 Tuple Variables
4.3.6 String Operations
4.3.7 Ordering the Display of Tuples
4.3.8 Duplicate Tuples
4.4 Set Operations
4.5 Aggregate Functions
4.6 Null Values
4.7 Nested Subqueries
4.7.1 Set Membership
4.7.2 Set Comparison
4.7.3 Test for Empty Relations
4.7.4 Test for the Absence of Duplicate Tuples
4.8 Derived Relations
4.8.1 Views
4.9 Modification of the Database
4.9.1 Deletion
4.9.2 Insertion
4.9.3 Updates
4.9.4 Update of a View
4.10 Joined Relations
4.10.1 Join types and Conditions
4.11 Embedded SQL
4.12 Dynamic SQL
4.13 Transaction Control Language (Commit, Rollback)
Unit 5: Integrity Constraints LH 3
5.1 Domain Constraints
5.2 Referential Integrity
5.2.1 Basic Concepts
5.2.2 Referential Integrity in the E-R Model
5.2.3 Database Modification
5.2.4 Referential Integrity in SQL
5.3 Assertions
5.4 Triggers
Unit 6: Relational Database Design LH 5
6.1 Pitfalls in Relational DB Design
6.1.1 Representation of Information
6.1.2 Anomalies
6.2 Functional Dependencies
6.2.1 Basic Concepts
6.2.2 Closure of a Set of Functional Dependencies
6.2.3 Closure of Attribute Sets
6.3 Decomposition
6.3.1 Lossless-Join Decomposition
6.3.2 Dependency Preservation
6.4 Normalization
6.4.1 First Normal Form
6.4.2 Second Normal Form
6.4.3 Third Normal Form
6.4.4 Boyce-Codd Normal Form
6.4.5 Comparison of BCNF and 3NF
Unit 7: Data Governance LH 4
7.1 Introduction
7.2 Data governance drivers
7.3 Data governance initiatives
Unit 8: Database Management LH 6
8.1 Data maintenance
8.2 Data quality Management: Data cleansing, data integrity, Data enrichment, Data quality
8.3 Data Security Management: Data access, Data erasure, Data Privacy, Data Security
Unit 9: Transaction Management LH 5
9.1 ACID Properties
9.2 Transaction States
9.2.1 Implementation of Atomicity and Durability
9.2.2 Serializability
9.2.3 Basic Concept of Concurrency Control and Recovery
9.2.4 Locking Protocols
Note:
➢ The students are required to undertake a project work. The project work can be done individually or in a group (at most 4-5 students). The format of the project report is as follows:
o Project Description
o Description of entities or object considered in the project
o Algorithm or Diagram showing description of project
o Conclusion of the project
The project report should be original, and the reproduction of others’ work is strictly
prohibited. Number of pages of the report should be at least 4.
Total Lecture Hours: 48