Com 312 Lecture Note-1
Introduction
• A database is an organized collection of logically interrelated data (e.g., the data of a college).
The information is organized in such a way that a computer program can quickly select the pieces
of data it needs, and so that the data can be easily accessed, managed, updated and deleted.
Week 2
Understand data view and data model
Understand database administrators, users and languages.
A database system is a collection of interrelated data and a set of programs that allow users to
access and modify these data. A major purpose of a database is to provide users with an abstract
view of data. That is, the system hides certain details of how the data are stored and maintained.
DATA ABSTRACTION
Underlying the structure of a database is the data model: a collection of conceptual tools for
describing data, data relationships, data semantics, and consistency constraints. A data model
provides a way to describe the design of a database at the physical, logical, and view levels. For
the system to be usable, it must retrieve data efficiently. The need for efficiency has led
designers to use complex data structures to represent data in the database. Since many database-
system users are not computer trained, developers hide the complexity from users through
several levels of abstraction, to simplify users’ interaction with the system:
Physical Level: The lowest level of abstraction describes how the data are actually stored. The
physical level describes complex low-level data structures.
Logical Level: The next higher level of abstraction describes what data are stored in the database,
and what relationships exist among those data. The logical level thus describes the entire database
in terms of a small number of relatively simple structures. Although implementation of the simple
structures at the logical level may involve complex physical-level structures, the user of the
logical level does not need to be aware of this complexity. This is referred to as physical data
independence. Database administrators, who must decide what information to keep in the
database, use the logical level of abstraction.
Prepared by BABA SALEH
View Level: The highest level of abstraction describes only part of the entire database. Even
though the logical level uses simpler structures, complexity remains because of the variety of
information stored in a large database. Many users of the database system do not need all this
information; instead, they need to access only a part of the database. The view level of
abstraction exists to simplify their interaction with the system. The system may provide many
views for the same database.
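The view level can be sketched with Python's built-in sqlite3 module. This is only an illustration: the instructor table, its columns, and its rows are hypothetical and do not come from these notes. The view exposes part of the stored data (names and departments) and hides the rest (salaries).

```python
import sqlite3

# In-memory database; the table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [(1, "Amina", "Computer Science", 9000.0), (2, "Bello", "Physics", 8500.0)],
)

# View level: users of this view see names and departments, but not salaries.
conn.execute("CREATE VIEW instructor_public AS SELECT id, name, dept FROM instructor")

view_rows = conn.execute("SELECT * FROM instructor_public ORDER BY id").fetchall()
print(view_rows)
```

Several such views can be defined over the same table, one for each group of users.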
Database Administrator
One of the main reasons for using DBMSs is to have central control of both the data and the
programs that access those data. A person who has such central control over the system is called
a database administrator (DBA). The functions of a DBA include:
Schema definition. The DBA creates the original database schema by executing a set of data
definition statements in the DDL.
Storage structure and access-method definition.
Schema and physical-organization modification. The DBA carries out changes to the schema
and physical organization to reflect the changing needs of the organization, or to alter the
physical organization to improve performance.
Granting of authorization for data access. By granting different types of authorization, the
database administrator can regulate which parts of the database various users can access. The
authorization information is kept in a special system structure that the database system consults
whenever someone attempts to access the data in the system.
Routine maintenance. Examples of the database administrator’s routine maintenance activities
are:
Periodically backing up the database, either onto tapes or onto remote servers, to prevent
loss of data in case of disasters such as flooding.
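Some of the DBA functions listed above can be sketched with Python's built-in sqlite3 module. The student table and its columns are assumptions made for illustration. SQLite has no user accounts, so granting of authorization is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema definition: a DDL statement creates the original schema.
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Schema modification: the schema changes as the organization's needs change.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")

# Access-method definition: an index to speed up lookups by name.
conn.execute("CREATE INDEX idx_student_name ON student (name)")

conn.execute("INSERT INTO student VALUES (101, 'Abdullahi', NULL)")
conn.commit()

# Routine maintenance: copy the whole database to a backup connection.
backup_conn = sqlite3.connect(":memory:")
conn.backup(backup_conn)

columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]
backed_up = backup_conn.execute("SELECT student_id, name FROM student").fetchall()
print(columns, backed_up)
```

In a real deployment the backup target would be a file or a remote server rather than a second in-memory database.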
Week 3 & 4
Different types of data model: hierarchical, network and relational model
Continued Different types of data model: hierarchical, network and relational models
A database model defines the logical design and structure of a database and defines how data
will be stored, accessed and updated in a database management system. While the Relational
Model is the most widely used database model, there are other models too:
Hierarchical Model
Network Model
Entity-Relationship Model
Relational Model
Hierarchical Model
This database model organizes data into a tree-like structure with a single root, to which all
other data is linked. The hierarchy starts from the root data and expands like a tree, adding child
nodes to the parent nodes.
In this model, a child node has only a single parent node.
This model efficiently describes many real-world relationships, such as the index of a book, recipes, etc.
In the hierarchical model, data is organized into a tree-like structure with a one-to-many relationship
between two different types of data; for example, one department can have many courses, many
professors and, of course, many students.
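The single-parent rule can be illustrated with a small Python sketch. The node names are hypothetical; the point is only that every node stores exactly one parent, so there is exactly one path from any node up to the root.

```python
# Each child records exactly one parent; the root has parent None.
# Node names are hypothetical examples.
parent = {
    "Department": None,
    "Courses": "Department",
    "Professors": "Department",
    "Students": "Department",
    "COM 312": "Courses",
}

def children(node):
    """Return the child nodes of `node`, in insertion order."""
    return [child for child, p in parent.items() if p == node]

def path_to_root(node):
    """Follow the single-parent links from `node` up to the root."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

print(children("Department"))
print(path_to_root("COM 312"))
```

Because a dictionary maps each key to one value, this representation cannot give a node two parents, which mirrors the restriction of the hierarchical model.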
Relational Model
In this model, data is organized in two-dimensional tables and the relationship is maintained by
storing a common field.
This model was introduced by E. F. Codd in 1970, and since then it has been the most widely used
database model.
The basic structure of data in the relational model is tables. All the information related to a
particular type is stored in the rows of that table.
Hence, tables are also known as relations in the relational model.
In the coming tutorials we will learn how to design tables, normalize them to reduce data
redundancy and how to use Structured Query Language (SQL) to access data from tables.
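As a minimal sketch of how a common field maintains a relationship between two tables, the following uses Python's sqlite3 module. The department and course tables, their columns, and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute(
    "CREATE TABLE course ("
    " course_id TEXT PRIMARY KEY,"
    " title TEXT,"
    " dept_id INTEGER REFERENCES department(dept_id))"  # the common field
)
conn.execute("INSERT INTO department VALUES (1, 'Computer Science')")
conn.executemany(
    "INSERT INTO course VALUES (?, ?, ?)",
    [("C1", "Hypothetical Course One", 1), ("C2", "Hypothetical Course Two", 1)],
)

# The relationship is recovered by matching the common field dept_id.
joined = conn.execute(
    "SELECT c.course_id, d.dept_name"
    " FROM course AS c JOIN department AS d ON c.dept_id = d.dept_id"
    " ORDER BY c.course_id"
).fetchall()
print(joined)
```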
Up to this point in the text, we have assumed a given database schema and studied how queries
and updates are expressed. We now consider how to design a database schema in the first place.
In this portion, we focus on the entity relationship data model (E-R), which provides a means of
identifying entities to be represented in the database and how those entities are related.
Ultimately, the database design will be expressed in terms of a relational database design and an
associated set of constraints. We show in this portion how an E-R design can be transformed into
a set of relation schemas and how some of the constraints can be captured in that design. We
then consider in detail whether a set of relation schemas is a good or bad database
design and study the process of creating good designs using a broader set of constraints. These
two topics cover the fundamental concepts of database design.
The entity-relationship (E-R) data model was developed to facilitate database design by
allowing specification of an enterprise schema that represents the overall logical structure of a
database.
The E-R model is very useful in mapping the meanings and interactions of real-world enterprises
onto a conceptual schema. Because of this usefulness, many database-design tools draw on
concepts from the E-R model. The E-R data model employs three basic concepts: entity sets,
relationship sets, and attributes, which we study first. The E-R model also has an associated
diagrammatic representation, the E-R diagram, which we will study later in this chapter.
Entity Sets
An entity is a “thing” or “object” in the real world that is distinguishable from all other objects.
For example, each person in a university is an entity. An entity has a set of properties, and the
values for some set of properties may uniquely identify an entity. For instance, a person may
have a person id property whose value uniquely identifies that person. Thus, the value
677-89-9011 for person id would uniquely identify one particular person in the university. Similarly,
courses can be thought of as entities, and course id uniquely identifies a course entity in the
university. An entity may be concrete, such as a person or a book, or it may be abstract, such as a
course, a course offering, or a flight reservation.
An entity set is a set of entities of the same type that share the same properties, or attributes. The
set of all people who are instructors at a given university, for example, can be defined as the
entity set instructor. Similarly, the entity set student might represent the set of all students in the
university.
In the process of modeling, we often use the term entity set in the abstract, without referring to a
particular set of individual entities. We use the term extension of the entity set to refer to the
actual collection of entities belonging to the entity set. Thus, the set of actual instructors in the
university forms the extension of the entity set instructor. The above distinction is similar to the
difference between a relation and a relation instance.
Entity sets do not need to be disjoint. For example, it is possible to define the entity set of all
people in a university (person). A person entity may be an instructor entity, a student entity,
both, or neither.
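The idea that entity sets need not be disjoint can be sketched with ordinary Python sets. Apart from the identifier 677-89-9011 quoted above, the identifiers and names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    person_id: str  # the property that uniquely identifies the entity
    name: str

# Only the first identifier comes from the text; the rest is made up.
a = Person("677-89-9011", "Person A")
b = Person("000-00-0002", "Person B")
c = Person("000-00-0003", "Person C")
d = Person("000-00-0004", "Person D")

# Extensions of three entity sets.
person = {a, b, c, d}
instructor = {a, b}
student = {b, c}

both = instructor & student                # instructor and student at once
neither = person - (instructor | student)  # in person, but in neither subset

print(both, neither)
```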
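TYPES OF ENTITY SETS
An entity set is of one of two types: a strong entity set or a weak entity set.
1. Strong Entity Set-
A strong entity set is an entity set that contains sufficient attributes to uniquely identify its
entities.
In other words, a primary key exists for a strong entity set.
Example-
In this ER diagram,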
Two strong entity sets “Student” and “Course” are related to each other.
Student ID and Student name are the attributes of entity set “Student”.
Student ID is the primary key using which any student can be identified uniquely.
Course ID and Course name are the attributes of entity set “Course”.
Course ID is the primary key using which any course can be identified uniquely.
Double line between Student and relationship set signifies total participation.
It suggests that each student must be enrolled in at least one course.
Single line between Course and relationship set signifies partial participation.
It suggests that there might exist some courses for which no enrollments are made.
2. Weak Entity Set-
A weak entity set is an entity set that does not contain sufficient attributes to uniquely identify its
entities.
In other words, a primary key does not exist for a weak entity set.
However, it contains a partial key, called a discriminator.
On its own, the discriminator identifies only a group of entities in the entity set, not a single entity.
The discriminator is represented by underlining it with a dashed line.
Example-
In this ER diagram,
One strong entity set “Building” and one weak entity set “Apartment” are related to each
other.
Strong entity set “Building” has building number as its primary key.
Door number is the discriminator of the weak entity set “Apartment”.
This is because door number alone cannot identify an apartment uniquely, as there may be
apartments in several other buildings with the same door number.
Double line between Apartment and relationship set signifies total participation.
It suggests that each apartment must be present in at least one building.
Single line between Building and relationship set signifies partial participation.
It suggests that there might exist some buildings which have no apartments.
A single line is used for the representation of the connection between the strong entity set and
the relationship.
A double line is used for the representation of the connection between the weak entity set and
the relationship set.
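When the Building and Apartment example is mapped to tables, the weak entity set is commonly given a key made of the owner's primary key plus the discriminator. The following sqlite3 sketch shows this under assumed table and column names; the building and door numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE building (building_no INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE apartment (
        building_no INTEGER NOT NULL REFERENCES building(building_no),
        door_no     INTEGER NOT NULL,          -- discriminator (partial key)
        PRIMARY KEY (building_no, door_no)     -- owner key + discriminator
    )
""")
conn.executemany("INSERT INTO building VALUES (?)", [(10,), (20,)])

# The same door number may appear in different buildings.
conn.executemany("INSERT INTO apartment VALUES (?, ?)", [(10, 1), (20, 1)])

# The same door number twice in one building violates the composite key.
try:
    conn.execute("INSERT INTO apartment VALUES (10, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

apartments = conn.execute(
    "SELECT building_no, door_no FROM apartment ORDER BY building_no"
).fetchall()
print(duplicate_rejected, apartments)
```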
https://www.gatevidyalay.com/entity-sets-in-dbms/
Sid Sname
101 AB Abdullahi
102 PQ John
103 RS Kabiru
104 UV Rabiu
105 NM Usman
Week 7
Understand pitfalls in relational-database design
An update anomaly exists when one or more instances of duplicated data are updated, but not all;
that is, information is changed in one tuple but not in another. In STU_DETAIL, if we want to change
the name of the instructor of Cource_ID cap301, then all the matching tuples in the table must be
updated, but for some reason ...
Transitive Dependency
Book -> Author: Here, the Book attribute determines the Author attribute. If you know the book
name, you can learn the author's name. However, Author does not determine Book, because an
author can write multiple books. For instance, knowing the author's name Orson Scott Card
does not tell us the book name.
Author -> Author_Nationality: Likewise, the Author attribute determines the Author_Nationality,
but not the other way around: just because we know the nationality does not mean we can
determine the author.
Together these give the chain Book -> Author -> Author_Nationality, so Book determines
Author_Nationality transitively.
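A functional dependency can be tested against sample rows with a few lines of Python. This is a sketch with illustrative rows, and the function name determines is an assumption. Note that sample data can only refute a dependency; it cannot prove that the dependency holds for all possible data.

```python
def determines(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in `rows` (no lhs value maps to two rhs values)."""
    seen = {}
    for row in rows:
        key = row[lhs]
        if key in seen and seen[key] != row[rhs]:
            return False
        seen[key] = row[rhs]
    return True

# Illustrative sample rows.
books = [
    {"Book": "Ender's Game", "Author": "Orson Scott Card", "Author_Nationality": "United States"},
    {"Book": "Children of the Mind", "Author": "Orson Scott Card", "Author_Nationality": "United States"},
    {"Book": "Dune", "Author": "Frank Herbert", "Author_Nationality": "United States"},
    {"Book": "Twenty Thousand Leagues Under the Sea", "Author": "Jules Verne", "Author_Nationality": "France"},
]

print(determines(books, "Book", "Author"))                # Book -> Author
print(determines(books, "Author", "Book"))                # Author -> Book
print(determines(books, "Author", "Author_Nationality"))  # Author -> Author_Nationality
print(determines(books, "Book", "Author_Nationality"))    # the transitive dependency
```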
Multi-valued Dependency
Multi-valued dependency arises from m:n (many-to-many) relationships.
We say a multi-valued dependency exists between two data items when one value of the first data
item gives a collection of values of the second data item, i.e., it multi-determines the second data
item.
For example, imagine a car company that manufactures many models of car, but always makes
both red and blue colors of each model. If you have a table that contains the model name, color
and year of each car the company manufactures, there is a multivalued dependency in that table.
If there is a row for a certain model name and year in blue, there must also be a similar row
corresponding to the red version of that same car.
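The car example can be checked mechanically. The sketch below tests whether X multi-determines Y in a set of rows: whenever two rows agree on X, the row that combines the Y value of the first with the remaining values of the second must also be present. The function name mvd_holds, the model name and the years are hypothetical.

```python
def mvd_holds(rows, x, y, z):
    """Check the MVD x ->> y on `rows`; z lists the remaining attributes."""
    def proj(row, attrs):
        return tuple(row[a] for a in attrs)

    present = {(proj(r, x), proj(r, y), proj(r, z)) for r in rows}
    for x1, y1, _z1 in present:
        for x2, _y2, z2 in present:
            # Rows agreeing on x must allow their y and z parts to be swapped.
            if x1 == x2 and (x1, y1, z2) not in present:
                return False
    return True

cars = [
    {"model": "M1", "year": 2019, "color": "red"},
    {"model": "M1", "year": 2019, "color": "blue"},
    {"model": "M1", "year": 2020, "color": "red"},
    {"model": "M1", "year": 2020, "color": "blue"},
]

complete = mvd_holds(cars, ["model"], ["color"], ["year"])
# Dropping the 2020 blue row breaks the dependency.
incomplete = mvd_holds(cars[:3], ["model"], ["color"], ["year"])
print(complete, incomplete)
```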
Join Dependency
Join Dependency: If a relation can be decomposed into smaller relations and the join of the smaller
relations gives back exactly the tuples of the parent relation, we say the relation satisfies a join
dependency.
For example, if relation R is decomposed into r1, r2, ..., rn, then joining them again should
give exactly R (without any loss).
Decomposition
Decomposition means splitting a relation into two or more subrelations.
Decomposition can be of two types:
Lossless join decomposition.
Lossy decomposition.
Student_table:
Lossy JOIN
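The difference between the two types of decomposition can be sketched in plain Python. The helper names project, natural_join and as_set, as well as the sample student rows, are assumptions made for illustration. A decomposition is lossless when joining the parts gives back exactly the original relation, and lossy when the join produces extra (spurious) tuples.

```python
def project(rows, attrs):
    """Project `rows` onto `attrs` and remove duplicate tuples."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in unique]

def natural_join(left, right):
    """Join two relations on the attributes they have in common."""
    joined = []
    for lrow in left:
        for rrow in right:
            common = set(lrow) & set(rrow)
            if all(lrow[a] == rrow[a] for a in common):
                joined.append({**lrow, **rrow})
    return joined

def as_set(rows):
    """Order-independent form of a relation, used for comparison."""
    return {frozenset(row.items()) for row in rows}

# Hypothetical sample relation.
student = [
    {"sid": 101, "name": "Abdullahi", "dept": "CS"},
    {"sid": 102, "name": "John", "dept": "CS"},
]

# Lossless: the shared attribute sid is a key, so the join restores the relation.
lossless = natural_join(project(student, ["sid", "name"]), project(student, ["sid", "dept"]))

# Lossy: the shared attribute dept is not a key, so the join invents tuples.
lossy = natural_join(project(student, ["name", "dept"]), project(student, ["sid", "dept"]))

print(as_set(lossless) == as_set(student), len(as_set(lossy)))
```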
NORMAL FORMS
The normal forms (abbrev. NF) of relational database theory provide criteria for determining a
table's degree of vulnerability to logical inconsistencies and anomalies.
The higher the normal form applicable to a table, the less vulnerable it is to inconsistencies and
anomalies.
Each table has a "highest normal form" (HNF): by definition, a table always meets the
requirements of its HNF and of all normal forms lower than its HNF.
The designer then becomes aware of a requirement to record multiple telephone numbers for
some customers. He reasons that the simplest way of doing this is to allow the "Telephone
Number" field in any given record to contain more than one value.
Customer_table:
Customer ID   FirstName   SurName     Telephone_no
123           Robert      Ingram      08035462722
456           Jane        Wright      08020335576
                                      07035456633
789           Maria       Fernandez   08077223456
Student Table :
In First Normal Form, no row may have a column in which more than one value is stored,
for example as a comma-separated list.
Instead, such data must be separated into multiple rows.
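The conversion to First Normal Form can be sketched in Python using the customer data shown earlier. The function name to_first_normal_form is an assumption. Each multi-valued Telephone_no field is split so that every row holds a single value.

```python
# Unnormalized rows: Telephone_no may hold several values in one field.
customers = [
    (123, "Robert", "Ingram", ["08035462722"]),
    (456, "Jane", "Wright", ["08020335576", "07035456633"]),
    (789, "Maria", "Fernandez", ["08077223456"]),
]

def to_first_normal_form(rows):
    """Emit one row per (customer, telephone number) pair."""
    return [
        (cust_id, first, last, phone)
        for cust_id, first, last, phones in rows
        for phone in phones
    ]

customers_1nf = to_first_normal_form(customers)
for row in customers_1nf:
    print(row)
```

In the resulting table, Customer ID alone no longer identifies a row; the pair (Customer ID, Telephone_no) does.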
• Neither {Employee} nor {Skill} is a candidate key for the table. This is because a given
Employee might need to appear more than once (he might have multiple Skills), and a given
Skill might need to appear more than once (it might be possessed by multiple Employees). Only
the composite key {Employee, Skill} qualifies as a candidate key for the table.
• The remaining attribute, Current Work Location, is dependent on only part of the candidate
key, namely Employee. Therefore the table is not in 2NF.
• Note the redundancy in the way Current Work Locations are represented: we are told three times
that Jones works at 114 Main Street, and twice that Ellis works at 73 Industrial Way.
• This redundancy makes the table vulnerable to update anomalies: it is, for example, possible to
update Jones' work location on his "Typing" and "Shorthand" records and not update his
"Whittling" record. The resulting data would imply contradictory answers to the question "What
is Jones' current work location?"
A 2NF alternative to this design would represent the same information in two tables: an
"Employees" table with candidate key {Employee}, and an "Employees' Skills" table with
candidate key {Employee, Skill}:
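The two-table design can be sketched in Python. The facts about Jones and the two work locations come from the text above; the two skill names for Ellis are hypothetical placeholders, as is the new address used in the update.

```python
# Flat (non-2NF) table: (Employee, Skill, Current Work Location).
# Ellis's skill names are hypothetical placeholders.
employees_skills_flat = [
    ("Jones", "Typing", "114 Main Street"),
    ("Jones", "Shorthand", "114 Main Street"),
    ("Jones", "Whittling", "114 Main Street"),
    ("Ellis", "Skill A", "73 Industrial Way"),
    ("Ellis", "Skill B", "73 Industrial Way"),
]

# "Employees" table, candidate key {Employee}: each location is stored once.
employees = {emp: loc for emp, _skill, loc in employees_skills_flat}

# "Employees' Skills" table, candidate key {Employee, Skill}.
employee_skills = {(emp, skill) for emp, skill, _loc in employees_skills_flat}

# An update now touches exactly one place, so contradictory answers cannot arise.
employees["Jones"] = "1 New Street"  # hypothetical new address

print(employees)
print(sorted(employee_skills))
```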
• In this table Student_id is the primary key, but street, city and state depend upon Zip. The
dependency between Zip and the other fields is called a transitive dependency. Hence, to apply 3NF,
we need to move street, city and state to a new table, with Zip as the primary key.
• New Student_Detail Table :
• Address Table :
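The 3NF split described above can be sketched with sqlite3. The column names follow the text; the table names, the zip value, and all row data are hypothetical. After the split, an address is stored once per Zip, so changing it is a single-row update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Address table: Zip is the primary key and determines street, city and state.
conn.execute(
    "CREATE TABLE address (Zip TEXT PRIMARY KEY, Street TEXT, City TEXT, State TEXT)"
)
# Student detail table: keeps only Zip as a reference to the address.
conn.execute(
    "CREATE TABLE student_detail ("
    " Student_id INTEGER PRIMARY KEY,"
    " Student_name TEXT,"
    " Zip TEXT REFERENCES address(Zip))"
)

# Hypothetical sample data.
conn.execute("INSERT INTO address VALUES ('Z1', 'Old Road', 'City X', 'State Y')")
conn.executemany(
    "INSERT INTO student_detail VALUES (?, ?, ?)",
    [(101, "Abdullahi", "Z1"), (102, "John", "Z1")],
)

# One update changes the street for every student with that Zip.
conn.execute("UPDATE address SET Street = 'Renamed Road' WHERE Zip = 'Z1'")

rows = conn.execute(
    "SELECT s.Student_id, a.Street"
    " FROM student_detail AS s JOIN address AS a ON s.Zip = a.Zip"
    " ORDER BY s.Student_id"
).fetchall()
print(rows)
```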
BCNF
• It is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is stricter than
3NF. A table complies with BCNF if:
• R is in Third Normal Form, and
• for each nontrivial functional dependency (X -> Y), X is a superkey of the table (that is, X is a
candidate key or contains one).
• In the above 3NF example, STUDENT_ID is the Primary key in STUDENT table and ZIP is
the primary key in the ZIPCODE table. There is no other key column in each of the tables which
determines the functional dependency. Hence it's in BCNF form. That is, with STUDENT_ID,
we can retrieve STUDENT_NAME and ZIP from STUDENT table. Similarly, with ZIP value,
we can retrieve STREET and CITY from ZIPCODE table.
• BCNF is more restrictive than 3NF, which helps in normalizing the table further. A relation in
3NF may still have a little redundancy left, which BCNF removes.
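The BCNF condition can be tested with an attribute-closure computation. The sketch below uses assumed function names (closure, is_bcnf) and the STUDENT and ZIPCODE attributes mentioned above. It checks only the functional dependencies passed to it, which is enough for these small examples.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the functional dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_bcnf(relation, fds):
    """True if every nontrivial given FD has a superkey on its left-hand side."""
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency, always allowed
        if not set(relation) <= closure(lhs, fds):
            return False  # lhs does not determine every attribute
    return True

fd_student = (["STUDENT_ID"], ["STUDENT_NAME", "ZIP"])
fd_zip = (["ZIP"], ["STREET", "CITY"])

combined = ["STUDENT_ID", "STUDENT_NAME", "ZIP", "STREET", "CITY"]
student = ["STUDENT_ID", "STUDENT_NAME", "ZIP"]
zipcode = ["ZIP", "STREET", "CITY"]

print(is_bcnf(combined, [fd_student, fd_zip]))  # ZIP is not a superkey here
print(is_bcnf(student, [fd_student]))
print(is_bcnf(zipcode, [fd_zip]))
```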
4NF
• Fourth Normal Form (4NF) is a stronger normal form than BCNF, as it prevents tables
from containing nontrivial multi-valued dependencies (MVDs) and hence data redundancy.
• Clearly the attributes 'Student_name' and 'Text_book' are multivalued facts about the attribute
'Course'.
• However, since a student has no influence over the text books to be used for a course, these
multi-valued facts about courses are independent of each other. Thus the table contains an MVD.
• Multi-valued facts are written with a double arrow (--> -->). In the above database, the
following MVDs exist:
Course --> --> Student_name
Course --> --> Text_book
Here, Student_name and Text_book are independent of each other.
Rule to transform a relation into Fourth Normal Form
A relation R having A, B, and C as attributes can be non-loss decomposed into two projections
R1(A,B) and R2(A,C) if and only if the MVD A --> --> B|C holds in R. Looking again at the
undecomposed COURSE_STUDENT_BOOK table, it contains a multi-valued dependency as
shown below:
Course --> --> Student_name
Course --> --> Text_book
To put it into 4NF, two separate tables are formed as shown below:
COURSE_STUDENT (Course, Student_name)
COURSE_BOOK (Course, Text_book)
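The decomposition rule can be checked on a small example in Python. The course, student and book values are hypothetical. Because students and text books are independent facts about a course, joining the two projections on Course gives back exactly the original table.

```python
# Undecomposed table: (Course, Student_name, Text_book). Values are hypothetical.
course_student_book = {
    ("Physics", "Amina", "Book A"),
    ("Physics", "Amina", "Book B"),
    ("Physics", "Bello", "Book A"),
    ("Physics", "Bello", "Book B"),
}

# The two projections that put the data into 4NF.
course_student = {(course, student) for course, student, _book in course_student_book}
course_book = {(course, book) for course, _student, book in course_student_book}

# Joining the projections on Course reconstructs the original table.
rejoined = {
    (course, student, book)
    for course, student in course_student
    for course2, book in course_book
    if course == course2
}

print(len(course_student), len(course_book), rejoined == course_student_book)
```

The four original rows are now represented by two rows in each smaller table, and adding a new text book for the course needs only one new row.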
Week 8
Understand domain-key normal form
DOMAIN-KEY NORMAL FORM
A relation is in Domain-Key Normal Form (DKNF) when every constraint on the relation follows
from its domain constraints and key constraints, so that insertion and deletion anomalies are not
present in the database.
Domain-Key Normal Form is the highest form of normalization. The reason is that the insertion,
deletion and update anomalies are removed: all constraints are verified by checking the domain
and key constraints.
A table can be in Domain-Key Normal Form only if it is also in 4NF, 3NF and the other lower
normal forms. It is based on the following constraints:
Domain Constraint
The values of an attribute must come from a given set of values; for example, EmployeeID should
be four digits long:
EmpID EmpName EmpAge
0921 Tom 33
0922 Jack 31
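Domain and key constraints of this kind can be declared so that the database itself enforces them. The sqlite3 sketch below uses the sample rows from the table above; the table name employee, the helper try_insert, and the extra test rows are assumptions. EmpID is stored as text so that the leading zero is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        -- Key constraint (PRIMARY KEY) and domain constraint (exactly four digits).
        EmpID   TEXT PRIMARY KEY CHECK (EmpID GLOB '[0-9][0-9][0-9][0-9]'),
        EmpName TEXT NOT NULL,
        EmpAge  INTEGER
    )
""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("0921", "Tom", 33), ("0922", "Jack", 31)],
)

def try_insert(emp_id, name, age):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (emp_id, name, age))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("92", "Ann", 40))    # violates the domain constraint
print(try_insert("0921", "Ann", 40))  # violates the key constraint
print(try_insert("0923", "Ann", 40))  # satisfies both constraints
```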
Week 10
Understand nested sub-queries
Understand derived relations and views
Week 11 & 12
Understand views
Understand joined relations
Understand data definition language and embedded SQL.
Week 13
Understand centralized systems
Understand client-server systems
Week 14 & 15
Understand parallel systems
Understand distributed systems and network types