Advanced Database Technologies
Advanced Data Modeling
Lecture By:
Teklu Urgessa (PhD in Engineering)
1
Objectives
• In this Lecture, you will learn:
– About the extended entity relationship (EER)
model’s main constructs
– How entity clusters are used to represent
multiple entities and relationships
– The characteristics of good primary keys and
how to select them
– How to use flexible solutions for special data
modeling cases
– What issues to check for when developing data
models based on EER diagrams
2
The Extended Entity
Relationship Model
• Result of adding more semantic constructs to
original entity relationship (ER) model
• Diagram using this model is called an EER
diagram (EERD)
3
Entity Supertypes and Subtypes
• Entity supertype
– Generic entity type related to one or more entity
subtypes
– Contains common characteristics
• Entity subtypes
– Contains unique characteristics of each entity
subtype
4
5
Specialization Hierarchy
• Depicts arrangement of higher-level entity
supertypes and lower-level entity subtypes
• Relationships described in terms of “IS-A”
relationships
• Subtype exists only within context of supertype
• Every subtype has only one supertype to which
it is directly related
• Can have many levels of supertype/subtype
relationships
6
7
Inheritance
• Enables entity subtype to inherit attributes and
relationships of supertype
• All entity subtypes inherit their primary key
attribute from their supertype
• At implementation level, supertype and its
subtype(s) maintain a 1:1 relationship
• Entity subtypes inherit all relationships in which
supertype entity participates
• Lower-level subtypes inherit all attributes and
relationships from all upper level-supertypes
8
9
Subtype Discriminator
• Attribute in supertype entity
– Determines to which entity subtype each
supertype occurrence is related
• Default comparison condition for subtype
discriminator attribute is equality comparison
• Subtype discriminator may be based on other
comparison condition
10
Disjoint and Overlapping Constraints
• Disjoint subtypes
– Also known as non-overlapping subtypes
– Subtypes that contain unique subset of
supertype entity set
• Overlapping subtypes
– Subtypes that contain nonunique subsets of
supertype entity set
11
12
13
Completeness Constraint
• Specifies whether entity supertype occurrence
must be a member of at least one subtype
• Partial completeness
– Symbolized by a circle over a single line
– Some supertype occurrences that are not
members of any subtype
• Total completeness
– Symbolized by a circle over a double line
– Every supertype occurrence must be member of
at least one subtype
14
15
Specialization and Generalization
• Specialization
– Identifies more specific entity subtypes from
higher-level entity supertype
– Top-down process
– Based on grouping unique characteristics and
relationships of the subtypes
16
Specialization and Generalization
(continued)
• Generalization
– Identifies more generic entity supertype from
lower-level entity subtypes
– Bottom-up process
– Based on grouping common characteristics and
relationships of the subtypes
17
Entity Clustering
• “Virtual” entity type used to represent multiple
entities and relationships in ERD
• Considered “virtual” or “abstract” because it is
not actually an entity in final ERD
• Temporary entity used to represent multiple
entities and relationships
• Eliminate undesirable consequences
– Avoid display of attributes when entity
clusters are used
18
19
Entity Integrity:
Selecting Primary Keys
• Primary key most important characteristic of an
entity
– Single attribute or some combination of
attributes
• Primary key’s function is to guarantee entity
integrity
• Primary keys and foreign keys work together to
implement relationships
• Properly selecting primary key has direct
bearing on efficiency and effectiveness
20
Natural Keys and Primary Keys
• Natural key is a real-world identifier used to
uniquely identify real-world objects
– Familiar to end users and forms part of their
day-to-day business vocabulary
• Generally data modeler uses natural identifier
as primary key of entity being modeled
• May instead use composite primary key or
surrogate key
21
Primary Key Guidelines
• Attribute that uniquely identifies entity instances
in an entity set
– Could also be combination of attributes
• Main function is to uniquely identify an entity
instance or row within a table
• Guarantee entity integrity, not to “describe” the
entity
• Primary keys and foreign keys implement
relationships among entities
– Behind the scenes, hidden from user
22
23
When to Use Composite Primary Keys
• Composite primary keys useful in two cases:
– As identifiers of composite entities
• Where each primary key combination allowed
once in M:N relationship
– As identifiers of weak entities
• Where weak entity has a strong identifying
relationship with the parent entity
• Automatically provides benefit of ensuring that
there cannot be duplicate values
24
25
When to Use Composite Primary Keys
(continued)
• When used as identifiers of weak entities
normally used to represent:
– Real-world object that is existent-dependent on
another real-world object
– Real-world object that is represented in data
model as two separate entities in strong
identifying relationship
• Dependent entity exists only when it is related
to parent entity
26
When To Use Surrogate Primary Keys
• Especially helpful when there is:
– No natural key
– Selected candidate key has embedded semantic
contents
– Selected candidate key is too long or
cumbersome
27
When To Use Surrogate Primary Keys
(continued)
• If you use surrogate key
– Ensure that candidate key of entity in question
performs properly
– Use “unique index” and “not null” constraints
28
29
Design Cases:
Learning Flexible Database Design
• Data modeling and design requires skills
acquired through experience
• Experience acquired through practice
• Four special design cases that highlight:
– Importance of flexible design
– Proper identification of primary keys
– Placement of foreign keys
30
Design Case #1: Implementing 1:1
Relationships
• Foreign keys work with primary keys to properly
implement relationships in relational model
• Put primary key of the “one” side on the “many”
side as foreign key
– Primary key: parent entity
– Foreign key: dependent entity
31
Design Case #1: Implementing 1:1
Relationships (continued)
• In 1:1 relationship two options:
– Place a foreign key in both entities (not
recommended)
– Place a foreign key in one of the entities
• Primary key of one of the two entities appears as
foreign key of other
32
33
34
Design Case #2: Maintaining History
of Time-Variant Data
• Normally, existing attribute values replaced with
new value without regard to previous value
• Time-variant data:
– Values change over time
– Must keep a history of data changes
• Keeping history of time-variant data equivalent
to having a multivalued attribute in your entity
• Must create new entity in 1:M relationships with
original entity
• New entity contains new value, date of change
35
36
37
Design Case #3: Fan Traps
• Design trap occurs when relationship is
improperly or incompletely identified
– Represented in a way not consistent with the
real world
• Most common design trap is known as fan trap
• Fan trap occurs when one entity is in two 1:M
relationships to other entities
– Produces an association among other entities
not expressed in the model
38
39
40
Design Case #4:
Redundant Relationships
• Redundancy is seldom a good thing in
database environment
• Occur when there are multiple relationship
paths between related entities
• Main concern is that redundant relationships
remain consistent across model
• Some designs use redundant relationships to
simplify the design
41
42
Data Modeling Checklist
• Data modeling translates specific real-world
environment into data model
– Represents real-world data, users, processes,
interactions
• EERM enables the designer to add more
semantic content to the model
• Data modeling checklist helps ensure data
modeling tasks successfully performed
• Based on concepts and tools learned since
Chapter 3
43
44
Summary
• Extended entity relationship (EER) model adds
semantics to ER model
– Adds semantics via entity supertypes, subtypes,
and clusters
– Entity supertype is a generic entity type related
to one or more entity subtypes
• Specialization hierarchy
– Depicts arrangement and relationships between
entity supertypes and entity subtypes
• Inheritance means an entity subtype inherits
attributes and relationships of supertype
45
Summary (continued)
• Subtype discriminator determines which entity
subtype the supertype occurrence is related to:
– Partial or total completeness
– Specialization vs. generalization
• Entity cluster is “virtual” entity type
– Represents multiple entities and relationships in
ERD
– Formed by combining multiple interrelated
entities and relationships into a single object
46
Summary (continued)
• Natural keys are identifiers that exist in real
world
– Sometimes make good primary keys
• Characteristics of primary keys:
– Must have unique values
– Should be nonintelligent
– Must not change over time
– Preferably numeric or composed of single
attribute
47
Summary (continued)
• Composite keys are useful to represent
– M:N relationships
– Weak (strong-identifying) entities
• Surrogate primary keys are useful when no
suitable natural key makes primary key
• In a 1:1 relationship, place the PK of mandatory
entity
– As FK in optional entity
– As FK in entity that causes least number of nulls
– As FK where the role is played
48
Summary (continued)
• Time-variant data
– Data whose values change over time
– Requires keeping a history of changes
• To maintain history of time-variant data:
– Create entity containing the new value, date of
change, other time-relevant data
– Entity maintains 1:M relationship with entity for
which history maintained
49
Summary (continued)
• Fan trap:
– One entity in two 1:M relationships to other
entities
– Association among the other entities not
expressed in model
• Redundant relationships occur when multiple
relationship paths between related entities
– Main concern is that they remain consistent
across the model
• Data modeling checklist provides way to check
that the ERD meets minimum requirements
50