Data Models
How to structure data
What is a Data Model?
Having formed a model of the enterprise, we now need to
represent the data.
The data model tells us the structure of the database.
Historically, three data models:
Hierarchical data model
Network data model
Relational data model
Hierarchical and Network Data Models
Hierarchical and network data models have been
superseded by the relational data model.
Reasons:
Lack of expressive power
E.g. one cannot directly express many-to-many
relationships in the hierarchical model.
More closely tied to the underlying implementation.
Hence, less data independence.
Relational data model has a clean mathematical basis.
The Relational Model
Due to Codd.
Everything is represented as a relation in the
mathematical sense. Relations are also called tables.
A database therefore is a collection of tables, each of
which has a unique name, and each of which is described
by a schema.
In addition, Codd defined a data manipulation language.
Example of Schemas in the Relational Model
Example of a representation of entity sets:
Student(sid,name,addr)
Course(cid,title,eid)
Empl(eid, ename, deptid)
Dept(deptid, dname, loc)
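Primary keys are underlined in the original slides; here, the
primary key is the first attribute listed in each schema.
Recall that a primary key is one that uniquely identifies an
entity.
An entity is represented by a row in a table.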
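As a rough sketch, the four example schemas could be declared as tables using Python's built-in sqlite3 module. The table and attribute names come from the schemas above; the column types and the choice of SQLite are assumptions made only for illustration.

```python
import sqlite3

# In-memory database. Table and column names follow the example schemas;
# the column types (TEXT) are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (
    sid  TEXT NOT NULL PRIMARY KEY,
    name TEXT,
    addr TEXT
);
CREATE TABLE Course (
    cid   TEXT NOT NULL PRIMARY KEY,
    title TEXT,
    eid   TEXT
);
CREATE TABLE Empl (
    eid    TEXT NOT NULL PRIMARY KEY,
    ename  TEXT,
    deptid TEXT
);
CREATE TABLE Dept (
    deptid TEXT NOT NULL PRIMARY KEY,
    dname  TEXT,
    loc    TEXT
);
""")

# The database is now a collection of uniquely named tables.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['Course', 'Dept', 'Empl', 'Student']
```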
More Example Schemas
Relationship sets between entity sets are also
represented in tables.
Example of a table corresponding to a relationship:
Enrol(sid, cid, grade)
Again, a relationship is represented by a row (or a tuple)
in a table.
Relational Databases: Basic Concepts I
Attribute:
A column in a table
Domain
The set of values from which the values of an attribute
are drawn.
Null value
A special value, meaning not known or not
applicable.
Relation schema
A set of attribute names
Relational Databases: Basic Concepts II
Tuple
A set of values, one for each attribute in the relation
schema over which the tuple is defined, i.e. a mapping
from attributes to the appropriate domains
Relation instance
A set of tuples over the schema of the relation
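The definitions above can be mirrored almost literally in plain Python. The sketch below treats a schema as a set of attribute names, a tuple as an immutable attribute-to-value mapping, and a relation instance as a set of such tuples. The helper name make_tuple and the sample values are hypothetical, and None stands in for the null value.

```python
# A relation schema: a set of attribute names (from the Student example).
student_schema = frozenset({"sid", "name", "addr"})

def make_tuple(schema, **values):
    """Build a tuple as an immutable attribute -> value mapping over `schema`."""
    if set(values) != set(schema):
        raise ValueError("a tuple needs exactly one value per schema attribute")
    return frozenset(values.items())

# A relation instance: a set of tuples, so an exact duplicate collapses.
# The sample values are made up; None plays the role of the null value.
student_instance = {
    make_tuple(student_schema, sid="s1", name="Ann", addr="Main St"),
    make_tuple(student_schema, sid="s2", name="Bob", addr=None),
    make_tuple(student_schema, sid="s1", name="Ann", addr="Main St"),
}
print(len(student_instance))  # 2
```

Because the instance is a set, it has no duplicate tuples and no inherent row order, which matches the mathematical view of a relation.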
Relational Databases: Basic Concepts III
Relational Database
A set of relations, each with a unique name
Normalized Relation
A relation in which every value is atomic (nondecomposable). Hence, every attribute in every tuple
has a single value.
Keys
Candidate Key
A minimal set of attributes that uniquely identifies a
tuple
Primary Key
The candidate key chosen as the identifying key of the
relation
Alternate Key
A candidate key that is not the primary key
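To make "minimal" and "uniquely identifies" concrete, here is a small sketch that tests a set of attributes against one relation instance. Note the assumption: a key is really a property of the schema (it must hold for every legal instance), so a check on sample data can only rule a key out, never prove it. The function names and sample rows are hypothetical.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on all attributes in `attrs`."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def is_candidate_key(rows, attrs):
    """True if `attrs` identifies rows uniquely and no proper subset does."""
    attrs = tuple(attrs)
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(len(attrs))
        for subset in combinations(attrs, size)
    )

# Made-up enrolment rows over the attributes sid, cid, grade.
rows = [
    {"sid": "s1", "cid": "c1", "grade": "A"},
    {"sid": "s1", "cid": "c2", "grade": "B"},
    {"sid": "s2", "cid": "c1", "grade": "A"},
]
print(is_candidate_key(rows, ["sid", "cid"]))           # True
print(is_candidate_key(rows, ["sid"]))                  # False: not unique
print(is_candidate_key(rows, ["sid", "cid", "grade"]))  # False: not minimal
```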
Foreign Key
An attribute (or set of attributes) in relation R1 which also
occurs as the primary key of relation R2.
R2 is called the referenced relation.
Foreign keys are also called connection keys or
reference attributes.
Integrity Rules: Entity Constraint
Entity constraint
All attributes in a primary key must be non-null.
Motivation: The primary key uniquely identifies an
entity in an entity set, so a null in the key would leave
a tuple whose entity cannot be identified.
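A minimal sketch of the entity constraint in action, assuming SQLite via Python's sqlite3 module and made-up sample rows. NOT NULL is written explicitly because SQLite does not imply it for PRIMARY KEY columns of ordinary tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out: SQLite does not imply it for PRIMARY KEY
# columns on ordinary (rowid) tables.
conn.execute(
    "CREATE TABLE Student (sid TEXT NOT NULL PRIMARY KEY, name TEXT, addr TEXT)")

# A null in a non-key attribute is accepted.
conn.execute("INSERT INTO Student VALUES ('s1', 'Ann', NULL)")

# A null in the primary key violates the entity constraint.
try:
    conn.execute("INSERT INTO Student VALUES (NULL, 'Bob', 'Main St')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```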
Integrity Rules: Referential Integrity
Referential integrity
A database cannot contain a tuple with a non-null value for a
foreign key that does not match a primary key value in
the referenced relation.
Or, a non-null foreign key must refer to a tuple that exists.
Motivation: If referential integrity were violated, we
could have relationships between entities that we do
not have any information about.
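The rule can be illustrated with the Empl and Dept schemas from earlier, again as a sketch that assumes SQLite through Python's sqlite3 module. The sample rows are made up, and the PRAGMA is needed because SQLite leaves foreign key enforcement off by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.executescript("""
CREATE TABLE Dept (deptid TEXT NOT NULL PRIMARY KEY, dname TEXT, loc TEXT);
CREATE TABLE Empl (
    eid    TEXT NOT NULL PRIMARY KEY,
    ename  TEXT,
    deptid TEXT REFERENCES Dept(deptid)   -- foreign key; Dept is the referenced relation
);
INSERT INTO Dept VALUES ('d1', 'Sales', 'HQ');
INSERT INTO Empl VALUES ('e1', 'Ann', 'd1');
""")

# 'd9' matches no primary key value in Dept, so the tuple is rejected.
try:
    conn.execute("INSERT INTO Empl VALUES ('e2', 'Bob', 'd9')")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

# A null foreign key refers to nothing and is allowed.
conn.execute("INSERT INTO Empl VALUES ('e3', 'Cy', NULL)")
empl_count = conn.execute("SELECT COUNT(*) FROM Empl").fetchone()[0]

print(dangling_rejected, empl_count)  # True 2
```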
Converting ER Diagrams to Relational Schemas (tables)
A database which conforms to an E-R diagram can be
represented by a collection of relations (tables).
For each entity set and relationship set there is a unique
relation that is assigned the name of the corresponding
entity set or relationship set.
Each schema has a number of columns (generally
corresponding to attributes), which have unique names.
Representing Entity Sets as Relations (tables)
A strong entity set reduces to a relation with the attributes
of the entity.
Course(cid, ctitle)
A weak entity set becomes a table that includes the
primary key of its identifying strong entity set along with
the attributes of the weak entity set.
Example:
Section(cid, secno, quota)
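One way to realize the Section example, sketched with Python's sqlite3 module: the key of Section is taken to be the pair (cid, secno), which is an assumption consistent with cid being borrowed from Course and secno distinguishing sections within a course. The column types and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Course (cid TEXT NOT NULL PRIMARY KEY, ctitle TEXT);
CREATE TABLE Section (
    cid   TEXT    NOT NULL REFERENCES Course(cid),  -- key of the strong entity set
    secno INTEGER NOT NULL,                         -- distinguishes sections of one course
    quota INTEGER,
    PRIMARY KEY (cid, secno)
);
INSERT INTO Course VALUES ('c1', 'Databases'), ('c2', 'Networks');
INSERT INTO Section VALUES ('c1', 1, 40), ('c1', 2, 40), ('c2', 1, 30);
""")

# secno alone is not unique: section 1 exists in two courses.
sections_numbered_1 = conn.execute(
    "SELECT COUNT(*) FROM Section WHERE secno = 1").fetchone()[0]

# Repeating the full key (cid, secno) is rejected.
try:
    conn.execute("INSERT INTO Section VALUES ('c1', 1, 25)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A section cannot exist without its course.
try:
    conn.execute("INSERT INTO Section VALUES ('c9', 1, 25)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(sections_numbered_1, duplicate_rejected, orphan_rejected)  # 2 True True
```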
Representing Relationship Sets as Relations (tables)
A many-to-many relationship set is represented as a
relation with attributes for the primary keys of the two
participating entity sets, and any attributes of the
relationship set.
Example:
Enrol(sid, cid, grade)
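The enrolment relation could be sketched as follows with Python's sqlite3 module. The table is named Enrol here, as in the earlier example; treating (sid, cid) as its primary key, the column types, and the sample rows are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Student (sid TEXT NOT NULL PRIMARY KEY, name TEXT, addr TEXT);
CREATE TABLE Course  (cid TEXT NOT NULL PRIMARY KEY, ctitle TEXT);
CREATE TABLE Enrol (
    sid   TEXT NOT NULL REFERENCES Student(sid),
    cid   TEXT NOT NULL REFERENCES Course(cid),
    grade TEXT,                       -- attribute of the relationship itself
    PRIMARY KEY (sid, cid)
);
INSERT INTO Student VALUES ('s1', 'Ann', NULL), ('s2', 'Bob', NULL);
INSERT INTO Course  VALUES ('c1', 'Databases'), ('c2', 'Networks');
INSERT INTO Enrol   VALUES ('s1', 'c1', 'A'), ('s1', 'c2', 'B'), ('s2', 'c1', 'A');
""")

# Many-to-many: one student has several courses, one course has several students.
courses_of_s1 = [r[0] for r in conn.execute(
    "SELECT cid FROM Enrol WHERE sid = 's1' ORDER BY cid")]
students_in_c1 = [r[0] for r in conn.execute(
    "SELECT sid FROM Enrol WHERE cid = 'c1' ORDER BY sid")]

print(courses_of_s1, students_in_c1)  # ['c1', 'c2'] ['s1', 's2']
```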
Total Participation
The participation of an entity set in a relationship set is
total if each entity in the set occurs in at least one
relationship in that relationship set.
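The definition translates into a simple set check. In this sketch a relationship set is modelled as a set of pairs, and the hypothetical helper is_total asks whether every entity appears at the given position in at least one pair. The sample data is made up.

```python
def is_total(entities, relationships, position):
    """True if every entity occurs at `position` in at least one relationship."""
    participating = {rel[position] for rel in relationships}
    return set(entities) <= participating

# Made-up sample data: students, courses and enrolment pairs (sid, cid).
students = {"s1", "s2"}
courses = {"c1", "c2", "c3"}
enrol = {("s1", "c1"), ("s2", "c1"), ("s2", "c2")}

print(is_total(students, enrol, 0))  # True: every student is enrolled somewhere
print(is_total(courses, enrol, 1))   # False: c3 has no enrolments
```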
Representing Relationship Sets as Relations (tables)
Many-to-one and one-to-many relationship sets that
are total on the many-side can be represented by
adding an extra attribute to the many side,
containing the primary key of the one side.
Example:
Course(cid,ctitle,lid)
If participation is not total on the many side,
replacing the relationship table by an extra attribute
in the relation corresponding to the many side could
result in null values.
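A sketch of the extra-attribute representation and of the null values that appear when participation is not total. The source does not say what lid refers to; the Lecturer table below is a hypothetical "one" side invented for illustration, as are the column types and sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Lecturer is a hypothetical "one" side; it does not appear in the source.
CREATE TABLE Lecturer (lid TEXT NOT NULL PRIMARY KEY, lname TEXT);
CREATE TABLE Course (
    cid    TEXT NOT NULL PRIMARY KEY,
    ctitle TEXT,
    lid    TEXT REFERENCES Lecturer(lid)   -- extra attribute on the many side
);
INSERT INTO Lecturer VALUES ('l1', 'Lee');
INSERT INTO Course VALUES ('c1', 'Databases', 'l1'),
                          ('c2', 'Networks', NULL);  -- takes part in no relationship
""")

# Courses that do not participate show up as nulls in the extra attribute.
unassigned = [r[0] for r in conn.execute(
    "SELECT cid FROM Course WHERE lid IS NULL ORDER BY cid")]
print(unassigned)  # ['c2']
```

If participation were total on the many side, lid could be declared NOT NULL and no such rows could exist.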
Representing Relationship Sets as Relations (tables)
For one-to-one relationship sets an extra attribute can be
added to either of the tables corresponding to the two
entity sets.
The relation corresponding to a relationship set linking a
weak entity set to its identifying strong entity set is
redundant.
Example: The section entity already contains the
attributes that would appear in the relation (i.e. cid and
secno).
Representing Specialization
Method 1:
Create a relation for the higher-level entity.
Create a relation for each lower-level entity set,
include primary key of higher-level entity set and local
attributes.
Example:
Student(sid,sname)
Graduate(sid,supervisor)
Undergraduate(sid,credits)
Drawback: getting information about a student may
require accessing two relations (i.e. both the low-level
and the high-level relation).
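The drawback is easy to see in a small sketch of Method 1 using Python's sqlite3 module: assembling everything known about a graduate student takes a join of the high-level and low-level relations. Column types and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Student (sid TEXT NOT NULL PRIMARY KEY, sname TEXT);
CREATE TABLE Graduate (
    sid        TEXT NOT NULL PRIMARY KEY REFERENCES Student(sid),
    supervisor TEXT
);
CREATE TABLE Undergraduate (
    sid     TEXT NOT NULL PRIMARY KEY REFERENCES Student(sid),
    credits INTEGER
);
INSERT INTO Student VALUES ('s1', 'Ann'), ('s2', 'Bob');
INSERT INTO Graduate VALUES ('s1', 'Dr. Wu');
INSERT INTO Undergraduate VALUES ('s2', 60);
""")

# Full information about graduate student s1 needs both relations.
row = conn.execute("""
    SELECT s.sid, s.sname, g.supervisor
    FROM Student AS s JOIN Graduate AS g ON g.sid = s.sid
    WHERE s.sid = 's1'
""").fetchone()
print(row)  # ('s1', 'Ann', 'Dr. Wu')
```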
Representing Specialization
Method 2:
Form a relation for each entity set with all local and
inherited attributes.
Example:
Undergraduate(sid,sname,credits)
Graduate(sid,sname,supervisor)
Student(sid,sname)
If specialization is total, the schema for the generalized
entity set (student) is not required to store information.
It can be defined as a view relation containing the union of
the specialization relations.
Drawback: some amount of redundancy may be
introduced in the design.
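For a total specialization, Method 2 might look like the following sketch in Python's sqlite3 module, with Student defined as a view over the two lower-level relations rather than stored. Column types and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Undergraduate (sid TEXT NOT NULL PRIMARY KEY, sname TEXT, credits INTEGER);
CREATE TABLE Graduate      (sid TEXT NOT NULL PRIMARY KEY, sname TEXT, supervisor TEXT);

-- Student stores nothing itself: it is the union of the specializations.
CREATE VIEW Student AS
    SELECT sid, sname FROM Undergraduate
    UNION
    SELECT sid, sname FROM Graduate;

INSERT INTO Undergraduate VALUES ('s2', 'Bob', 60);
INSERT INTO Graduate      VALUES ('s1', 'Ann', 'Dr. Wu');
""")

students = conn.execute("SELECT sid, sname FROM Student ORDER BY sid").fetchall()
print(students)  # [('s1', 'Ann'), ('s2', 'Bob')]
```

The redundancy mentioned above shows up when an entity belongs to more than one lower-level set: its inherited attributes (here sname) would then be stored once per specialization.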
Representing Aggregation
Loan-Manager (employee_id, customer_id, loan_id)
Relation Borrows is redundant
What is the primary key in this relation?
[Figure: E-R diagram with entity sets Customer, Loan and Employee and relationship sets Borrows and Loan-Manager]