Mid-term DBS (Lec 1 - Lec 5)
Database System (Đại học Hà Nội)
Mid-term DBS
Lec 1: The database environment and development process
1. Basic concepts and definitions
- What is a database?
+ Defined as an organized collection of logically related data. Databases are used to
store, manipulate, and retrieve data in nearly every type of organization, including
business, health care, education, government, and libraries.
- Data
+ Stored representations of meaningful objects and events.
+ There are two types of data:
    Structured: numbers, text, dates
    Unstructured: images, video, documents
- Information
+ Data processed to increase the knowledge of the person using the data. (Information has
meaning; raw data on its own does not.)
- Metadata
+ Data that describes the properties and context of user data.
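The difference between user data and metadata can be sketched with Python's built-in `sqlite3` module. The `student` table and its columns are made up for illustration; `PRAGMA table_info` is SQLite's own way of exposing column metadata, and other DBMSs use different catalog queries.

```python
import sqlite3

# Hypothetical STUDENT table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL, enrolled DATE)"
)
conn.execute("INSERT INTO student VALUES (1, 'An', '2024-09-01')")

# User data: the stored facts themselves.
data = conn.execute("SELECT student_id, name, enrolled FROM student").fetchall()

# Metadata: column names, declared types, and NOT NULL flags that describe the data.
# PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
metadata = [(row[1], row[2], bool(row[3])) for row in conn.execute("PRAGMA table_info(student)")]

print(data)
print(metadata)
```

The first list holds occurrences of data; the second describes their properties (name, type, whether a value is required).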
2. Disadvantages of file processing systems
- Program-data dependence
- Duplication of data
- Limited data sharing
- Lengthy development times
3. Elements of the database approach
- Data models
+ Graphical diagram capturing the nature and relationships of data.
+ Enterprise data model: high-level entities and relationships for the organization.
+ Project data model: more detailed view, matching the data structure in a database or data
warehouse.
- Entities
+ Noun form describing a person, place, object, event, or concept.
+ Composed of attributes.
- Relationships
+ Between entities.
+ Usually one-to-many (1:M) or many-to-many (M:N), but could also be one-to-one (1:1).
- Relational databases
+ Database technology involving tables (relations) representing entities and
primary/foreign keys representing relationships.
- Problems:
+ With program-data dependence
    Each application programmer must maintain his/her own data.
    Each application program needs to include code for the metadata of each file.
    Each application program must have its own processing routines for reading, inserting, updating, and deleting data.
    Lack of coordination and central control.
    Non-standard file formats.
+ With data redundancy
    Waste of space to have duplicate data.
    Causes more maintenance headaches.
    The biggest problem: data changes in one file could cause inconsistencies.
    Compromises in data integrity.
- Solution: The database approach
+ Central repository of shared data.
+ Data is managed by a controlling agent.
+ Stored in a standardized, convenient form.
- Advantages of the database approach
+ Program-data independence
+ Planned data redundancy
+ Improved data consistency
+ Improved data sharing
+ Increased productivity of application development
+ Enforcement of standards
+ Improved data quality
+ Improved data accessibility and responsiveness
+ Reduced program maintenance
+ Improved decision support
- Costs and risks of the database approach
+ New, specialized personnel
+ Installation and management cost and complexity (training, infrastructure)
+ Conversion costs
+ Need for explicit backup and recovery
+ Organizational conflict
4. Components of the database environment
- Computer-aided software engineering (CASE) tools: CASE tools are automated tools used to
design databases and application programs. These tools help with the creation of data models
and in some cases can also help automatically generate the “code” needed to create the
database.
- Repository: A repository is a centralized knowledge base for all data definitions, data
relationships, screen and report formats, and other system components. A repository
contains an extended set of metadata important for managing databases as well as
other components of an information system.
- DBMS: A DBMS is a software system that is used to create, maintain, and provide
controlled access to user databases.
- Database: A database is an organized collection of logically related data, usually
designed to meet the information needs of multiple users in an organization. It is
important to distinguish between the database and the repository. The repository
contains definitions of data, whereas the database contains occurrences of data.
- Application programs: Computer-based application programs are used to create and
maintain the database and provide information to users.
- User interface: The user interface includes languages, menus, and other facilities by which
users interact with various system components, such as CASE tools, application programs, the
DBMS, and the repository.
- Data and database administrators: Data administrators are persons who are responsible
for the overall management of data resources in an organization. Database
administrators are responsible for physical database design and for managing technical
issues in the database environment.
- System developers: System developers are persons such as systems analysts and
programmers who design new application programs. System developers often use CASE
tools for system requirements analysis and program design.
- End users: End users are persons throughout the organization who add, delete, and
modify data in the database and who request or receive information from it. All user
interactions with the database must be routed through the DBMS.
5. Two approaches to database and IS (information systems) development
- SDLC:
+ Systems development life cycle
+ Detailed, well-planned development process
+ Time-consuming, but comprehensive
+ Long development cycle
- Prototyping:
+ Rapid application development (RAD)
+ Cursory attempt at conceptual data modeling
+ Define the database during development of the initial prototype.
+ Repeat implementation and maintenance activities with
new prototype versions.
6. The range of database applications
- Personal databases:
+ Designed to support one user.
+ Have long resided on personal computers (PCs), including laptops, and increasingly on
smartphones and PDAs.
+ The purpose is to give the user the ability to manage (store, update, delete, and retrieve)
small amounts of data in an efficient manner.
- Two-tier client/server databases:
+ Each member of the workgroup has a computer, and the computers are linked by
means of a network (wired or wireless LAN). In most cases, each computer has a copy of a
specialized application (client) which provides the user interface as well as the business
logic through which the data is manipulated. The database itself and the DBMS are
stored on a central device called the “database server,” which is also connected to the
network. Thus, each member of the workgroup has access to the shared data.
- Multitier client/server databases:
+ In a three-tiered architecture, the user interface is accessible on the individual users’
computers. This user interface may either be Web browser based or written using
programming languages such as Visual Basic.NET, Visual C#, or Java. The application
layer/Web server layer contains the business logic required to accomplish the business
transactions requested by the users. This layer in turn talks to the database server.
- Enterprise applications:
+ An enterprise application/database is one whose scope is the entire organization or
enterprise (or, at least, many different departments). Such databases are intended to
support organization-wide operations and decision making. The evolution of enterprise
databases has resulted in two major developments: enterprise resource planning (ERP)
systems and data warehousing implementations.
+ Enterprise resource planning (ERP) systems: A business management system that
integrates all functions of the enterprise, such as manufacturing, sales, finance,
marketing, inventory, accounting, and human resources. ERP systems are software
applications that provide the data necessary for the enterprise to examine and manage
its activities.
+ Data warehousing implementations: A data warehouse is an integrated decision support
database whose content is derived from the various operational databases.
Lec 2: Modeling data in the organization
1. The E-R model: an overview
- Definitions:
+ An entity-relationship model (E-R model) is a detailed, logical representation of the
data for an organization or for a business area.
- E-R model constructs:
+ Entities:
    Entity instance: person, place, object, event, or concept (often corresponds to a row in a table)
    Entity type: collection of entities (often corresponds to a table)
+ Relationships:
    Relationship instance: link between entities (corresponds to primary key-foreign key equivalencies in related tables)
    Relationship type: category of relationship; link between entity types
+ Attributes:
    Properties or characteristics of an entity or relationship type (often correspond to a field in a table)
2. Modeling the rules of the organization
- Business rules:
+ Are statements that define or constrain some aspect of the business
+ Are derived from policies, procedures, events, functions
+ Assert business structure
+ Control/influence business behavior
+ Are expressed in terms familiar to end users
+ Are automated through DBMS software
- A good business rule is:
+ Declarative: what, not how
+ Precise: clear, agreed-upon meaning
+ Atomic: one statement
+ Consistent: internally and externally
+ Expressible: structured, natural language
+ Distinct: non-redundant
+ Business-oriented: understood by business people
- A good data name is:
+ Related to business, not technical, characteristics
+ Meaningful and self-documenting
+ Unique
+ Readable
+ Composed of words from an approved list
+ Repeatable
+ Written in standard syntax
- Data definitions:
+ Explanation of a term or fact
    Term: word or phrase with a specific meaning
    Fact: association between two or more terms
+ Guidelines for good data definitions
    A concise description of essential data meaning
    Gathered in conjunction with systems requirements
    Accompanied by diagrams
    Achieved by consensus, and iteratively refined
3. Modeling entities
- Entity: a person, a place, an object, an event, or a concept in the user environment
about which the organization wishes to maintain data.
- Entity type: a collection of entities that share common properties or characteristics.
- Entity instance: a single occurrence of an entity type.
- An entity:
+ should be
    An object that will have many instances in the database
    An object that will be composed of multiple attributes
    An object that we are trying to model
+ shouldn’t be
    A user of the database system
    An output of the database system
- Strong entity
+ exists independently of other types of entities.
+ has its own unique identifier (identifier underlined with a single line)
- Weak entity
+ dependent on a strong entity (identifying owner); cannot exist on its own
+ does not have a unique identifier (only a partial identifier)
+ entity box and partial identifier have double lines
- Identifying relationship
+ links strong entities to weak entities
- Entity names:
+ Singular noun
+ Specific to the organization
+ Concise, or an abbreviation
+ For event entities, the result not the process
+ Name consistent for all diagrams
- Entity definitions:
+ Start with “An X is…”
+ Describe unique characteristics of each instance
+ Are explicit about what is and is not the entity
+ State when an instance is created or destroyed
+ Note changes to other entity types
+ Note history that should be kept
4. Attributes
- Attribute: property or characteristic of an entity or relationship type
- Classifications of attributes:
+ Required versus optional attributes
+ Simple versus composite attributes
+ Single-valued versus multivalued attributes
+ Stored versus derived attributes
+ Identifier attributes
- Required attributes: must have a value for every entity (or relationship) instance with
which they are associated.
- Optional attributes: may not have a value for every entity (or relationship) instance with
which they are associated.
- Simple attributes: A simple attribute is an attribute that cannot be further subdivided
into components.
- Composite attributes: A composite attribute has meaningful component parts (attributes).
- Multivalued attributes: may take on more than one value for a given entity (or
relationship) instance.
- Derived attributes: values can be calculated from related attribute values (not physically
stored in the database).
- Naming attributes:
+ Name should be a singular noun or noun phrase
+ Name should be unique
+ Name should follow a standard format
+ Similar attributes of different entity types should use the same qualifiers and classes
- Defining attributes:
+ State what the attribute is and possibly why it is important
+ Make it clear what is and is not included in the attribute’s value
+ Include aliases in documentation
+ State source of values
+ State whether the attribute value can change once set
+ Specify required vs. optional
+ State min and max number of occurrences allowed
+ Indicate relationships with other attributes
- Identifiers (keys):
+ Identifier (key): an attribute (or combination of attributes) that uniquely identifies
individual instances of an entity type
+ Simple versus composite identifier
+ Candidate identifier: an attribute that could be an identifier; it satisfies the
requirements for being an identifier
+ Choose identifiers that:
    Will not change in value
    Will not be null
+ Avoid intelligent identifiers (e.g., containing locations or people that might change)
+ Substitute new, simple keys for long, composite keys
5. Modeling relationships
- Relationship types vs. relationship instances
+ The relationship type is modeled as lines between entity types; the instance is
between specific entity instances
- Relationships can have attributes
+ These describe features pertaining to the association between the entities in the
relationship
- Two entities can have more than one type of relationship between them (multiple
relationships)
- Associative entity: combination of relationship and entity
- Degree of relationships:
+ The degree is the number of entity types that participate in the relationship
+ Unary: A unary relationship is a relationship between the instances of a single entity
type. (Unary relationships are also called recursive relationships.)
+ Binary: A binary relationship is a relationship between the instances of two entity
types and is the most common type of relationship encountered in data modeling.
+ Ternary: A ternary relationship is a simultaneous relationship among the instances of
three entity types.
- Cardinality of relationships:
+ One-to-one: Each entity in the relationship will have exactly one related entity
+ One-to-many: An entity on one side of the relationship can have many related entities,
but an entity on the other side will have a maximum of one related entity
+ Many-to-many: Entities on both sides of the relationship can have many related
entities on the other side
- Cardinality constraints
+ Cardinality constraints: the number of instances of one entity that can or must be
associated with each instance of another entity
+ Minimum cardinality
    If zero, then optional
    If one or more, then mandatory
+ Maximum cardinality
    The maximum number of instances
- Associative entities:
+ An entity: has attributes
+ A relationship: links entities together
+ When should a relationship with attributes instead be an associative entity?
    All relationships for the associative entity should be many
    The associative entity could have meaning independent of the other entities
    The associative entity preferably has a unique identifier, and should also have other attributes
    The associative entity may participate in other relationships other than the entities of the associated relationship
    Ternary relationships should be converted to associative entities
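As a rough illustration of an associative entity, the sketch below stores a many-to-many relationship between two hypothetical entity types, STUDENT and COURSE, using Python's built-in `sqlite3`. The table and column names are assumptions made for this example. ENROLLMENT links the two entities and carries its own attribute (`grade`), which is what makes it more than a plain relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course  (course_id  TEXT PRIMARY KEY, title TEXT NOT NULL);
-- Associative entity: links STUDENT and COURSE and has its own attribute (grade).
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    course_id  TEXT    NOT NULL REFERENCES course(course_id),
    grade      TEXT,
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO student VALUES (1, 'An'), (2, 'Binh');
INSERT INTO course  VALUES ('DBS', 'Database Systems'), ('OS', 'Operating Systems');
INSERT INTO enrollment VALUES (1, 'DBS', 'A'), (1, 'OS', 'B'), (2, 'DBS', 'B');
""")

# Many-to-many: one student takes many courses, and one course has many students.
courses_of_an = [r[0] for r in conn.execute(
    "SELECT course_id FROM enrollment WHERE student_id = 1 ORDER BY course_id")]
students_in_dbs = [r[0] for r in conn.execute(
    "SELECT student_id FROM enrollment WHERE course_id = 'DBS' ORDER BY student_id")]
print(courses_of_an, students_in_dbs)
```

Both relationships into `enrollment` are "many", matching the first guideline in the list above.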
Lec 3: The enhanced E-R model
1. Representing supertypes and subtypes
- Enhanced E-R model: extends the original E-R model with new modeling constructs
- Subtype: A subgrouping of the entities in an entity type that has attributes distinct from
those in other subgroupings
- Supertype: A generic entity type that has a relationship with one or more subtypes
- Attribute inheritance:
+ Subtype entities inherit values of all attributes of the supertype
+ An instance of a subtype is also an instance of the supertype
- Relationships and subtypes:
+ Relationships at the supertype level indicate that all subtypes will participate in the
relationship
+ The instances of a subtype may participate in a relationship unique to that subtype. In
this situation, the relationship is shown at the subtype level
2. Representing specialization and generalization
- Generalization: The process of defining a more general entity type from a set of more
specialized entity types. (BOTTOM-UP)
- Specialization: The process of defining one or more subtypes of the supertype and
forming supertype/subtype relationships. (TOP-DOWN)
3. Specifying constraints in supertype/subtype relationships
- Completeness constraints: Whether an instance of a supertype must also be a member
of at least one subtype
+ Total specialization rule: Yes (double line)
+ Partial specialization rule: No (single line)
- Disjointness constraints: Whether an instance of a supertype may simultaneously be a
member of two (or more) subtypes
+ Disjoint rule: An instance of the supertype can be only ONE of the subtypes
+ Overlap rule: An instance of the supertype could be more than one of the subtypes
- Subtype discriminator: An attribute of the supertype whose values determine the target
subtype(s)
+ Disjoint: a simple attribute with alternative values to indicate the possible subtypes
+ Overlapping: a composite attribute whose subparts pertain to different subtypes.
Each subpart contains a Boolean value to indicate whether or not the instance belongs
to the associated subtype
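The two kinds of subtype discriminator can be sketched in a few lines of Python. The subtype names (HOURLY, SALARIED, CONSULTANT, MANUFACTURED, PURCHASED) and the one-letter codes are hypothetical and chosen only for illustration.

```python
# Hypothetical discriminator codes for an EMPLOYEE supertype.
EMPLOYEE_TYPES = {"H": "HOURLY", "S": "SALARIED", "C": "CONSULTANT"}

def subtypes_disjoint(employee_type):
    """Disjoint rule: one simple attribute value selects exactly one subtype."""
    return [EMPLOYEE_TYPES[employee_type]]

def subtypes_overlap(flags):
    """Overlap rule: a composite discriminator holds one Boolean per subtype."""
    return [name for name, is_member in flags.items() if is_member]

print(subtypes_disjoint("S"))
print(subtypes_overlap({"MANUFACTURED": True, "PURCHASED": True}))
```

With the disjoint rule the result always has exactly one entry; with the overlap rule it can have several, and under partial specialization it could be empty.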
4. Entity clusters
- Problem: EER diagrams are difficult to read when there are too many entities and
relationships.
- Solution: Group entities and relationships into entity clusters.
- Entity cluster: Set of one or more entity types and associated relationships grouped
into a single abstract entity type
Lec 4: Logical database design and the relational model
1. The relational data model
a. Components
- Data structure: Tables (relations), rows, columns
- Data manipulation: Powerful SQL operations for retrieving and modifying data
- Data integrity: Mechanisms for implementing business rules that maintain integrity of
manipulated data
b. Relation
- A relation is a named, two-dimensional table of data.
- A table consists of rows (records) and columns (attributes or fields).
- Requirements for a table to qualify as a relation:
+ It must have a unique name.
+ Every attribute value must be atomic (not multivalued, not composite).
+ Every row must be unique (can’t have two rows with exactly the same values for all
their fields).
+ Attributes (columns) in tables must have unique names.
+ The order of the columns must be irrelevant.
+ The order of the rows must be irrelevant.
Note: All relations are in 1st Normal Form.
c. Key fields
- Serve two main purposes:
+ Primary keys are unique identifiers of the relation. Examples include employee
numbers, social security numbers, etc. This guarantees that all rows are unique.
+ Foreign keys are identifiers that enable a dependent relation (on the many side of a
relationship) to refer to its parent relation (on the one side of the relationship).
- Keys can be simple (a single field) or composite (more than one field).
- Keys are usually used as indexes to speed up the response to user queries.
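A minimal sketch of the two kinds of key, using Python's built-in `sqlite3`. The `department` and `employee` tables, their columns, and the index name are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT NOT NULL);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER REFERENCES department(dept_id)  -- foreign key on the many side
);
-- Index on the foreign key column to speed up lookups by department.
CREATE INDEX idx_employee_dept ON employee(dept_id);
INSERT INTO department VALUES (10, 'Sales');
INSERT INTO employee VALUES (1, 'An', 10), (2, 'Binh', 10);
""")

# The primary key keeps rows unique: a second row with emp_id = 1 is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (1, 'Chi', 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The foreign key lets each employee row refer to its parent department row.
rows = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employee e JOIN department d ON d.dept_id = e.dept_id
    ORDER BY e.emp_id
""").fetchall()
print(duplicate_rejected, rows)
```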
2. Integrity constraints
- Domain constraints: Allowable values for an attribute.
- Entity integrity: No primary key attribute may be null. All primary key fields MUST
contain data values.
- Referential integrity: Rules that maintain consistency between the rows of two related
tables.
+ The rule states that any foreign key value (in the relation on the many side) MUST match a
primary key value in the relation on the one side (or the foreign key can be null).
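The three kinds of constraint can be tried out with Python's built-in `sqlite3`. This is a sketch under a few assumptions: the `customer` and `orders` tables are invented for the example; SQLite only checks foreign keys after `PRAGMA foreign_keys = ON`; and SQLite needs `NOT NULL` spelled out on a non-INTEGER primary key, whereas most other DBMSs imply it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
CREATE TABLE customer (
    customer_id TEXT PRIMARY KEY NOT NULL,               -- entity integrity
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    TEXT PRIMARY KEY NOT NULL,
    customer_id TEXT REFERENCES customer(customer_id),   -- referential integrity, null allowed
    quantity    INTEGER NOT NULL CHECK (quantity > 0)    -- domain constraint
);
INSERT INTO customer VALUES ('C1', 'An');
""")

def rejected(sql):
    """Return True when the DBMS refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

entity_violation = rejected("INSERT INTO customer VALUES (NULL, 'Binh')")      # null primary key
domain_violation = rejected("INSERT INTO orders VALUES ('O1', 'C1', 0)")       # quantity outside domain
referential_violation = rejected("INSERT INTO orders VALUES ('O2', 'C9', 5)")  # no such customer
null_fk_allowed = not rejected("INSERT INTO orders VALUES ('O3', NULL, 5)")    # null foreign key is fine
print(entity_violation, domain_violation, referential_violation, null_fk_allowed)
```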
3. Transforming an ERD into relations
- Mapping regular entities to relations
+ Simple attributes: E-R attributes map directly onto the relation
+ Composite attributes: Use only their simple, component attributes
+ Multivalued attributes: Become a separate relation with a foreign key taken from the
superior entity
- Mapping weak entities
+ Becomes a separate relation with a foreign key taken from the superior entity
+ Primary key composed of:
    Partial identifier of the weak entity
    Primary key of the identifying relation (strong entity)
- Mapping binary relationships
+ One-to-many: Primary key on the one side becomes a foreign key on the many side
+ Many-to-many: Create a new relation with the primary keys of the two entities as its
primary key
+ One-to-one: Primary key on the mandatory side becomes a foreign key on the optional side
- Mapping associative entities
+ Identifier not assigned
    Default primary key for the association relation is composed of the primary keys of the two entities (as in an M:N relationship)
+ Identifier assigned
    Used when it is natural and familiar to end users
    Used when the default identifier may not be unique
- Mapping unary relationships
+ One-to-many: Recursive foreign key in the same relation
+ Many-to-many: Two relations:
    One for the entity type
    One for an associative relation in which the primary key has two attributes, both taken from the primary key of the entity
- Mapping ternary (and n-ary) relationships
+ One relation for each entity and one for the associative entity
+ Associative entity has foreign keys to each entity in the relationship
- Mapping supertype/subtype relationships
+ One relation for the supertype and one for each subtype
+ Supertype attributes (including identifier and subtype discriminator) go into the supertype
relation
+ Subtype attributes go into each subtype; the primary key of the supertype relation also
becomes the primary key of the subtype relation
+ 1:1 relationship established between the supertype and each subtype, with the supertype as
primary table
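The supertype/subtype mapping rules above can be sketched as follows with Python's built-in `sqlite3`. The EMPLOYEE supertype, its two subtypes, and the `emp_type` discriminator codes are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Supertype relation: shared attributes, identifier, and subtype discriminator.
CREATE TABLE employee (
    emp_id   INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    emp_type TEXT NOT NULL CHECK (emp_type IN ('H', 'S'))
);
-- Subtype relations: the supertype key is both primary key and foreign key (1:1).
CREATE TABLE hourly_employee (
    emp_id      INTEGER PRIMARY KEY REFERENCES employee(emp_id),
    hourly_rate REAL NOT NULL
);
CREATE TABLE salaried_employee (
    emp_id        INTEGER PRIMARY KEY REFERENCES employee(emp_id),
    annual_salary REAL NOT NULL
);
INSERT INTO employee VALUES (1, 'An', 'H'), (2, 'Binh', 'S');
INSERT INTO hourly_employee VALUES (1, 12.5);
INSERT INTO salaried_employee VALUES (2, 30000);
""")

# Joining a subtype to the supertype recovers the inherited attributes.
hourly = conn.execute("""
    SELECT e.emp_id, e.name, h.hourly_rate
    FROM employee e JOIN hourly_employee h ON h.emp_id = e.emp_id
""").fetchall()
print(hourly)
```

Because the subtype key is also a foreign key to the supertype, a subtype row cannot exist without its supertype row.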
4. Introduction to normalization
- Normalization is a formal process for deciding which attributes should be grouped
together in a relation so that all anomalies are removed.
- Data normalization:
+ Primarily a tool to validate and improve a logical design so that it satisfies certain
constraints that avoid unnecessary duplication of data
+ The process of decomposing relations with anomalies to produce smaller, well-structured
relations.
- Well-structured relations:
+ A relation that contains minimal data redundancy and allows users to insert, delete,
and update rows without causing data inconsistencies.
+ Three types of anomalies
    Insertion anomalies: when we can’t insert data because some other data is missing.
    Update anomalies: when the same information must be updated in several different places.
    Deletion anomalies: when deleting one piece of data inadvertently causes other data to be lost.
+ Goal is to avoid anomalies:
    Insertion anomaly: adding new rows forces the user to create duplicate data
    Deletion anomaly: deleting rows may cause a loss of data that would be needed for other future rows
    Modification anomaly: changing data in a row forces changes to other rows because of duplication
    General rule of thumb: A table should not pertain to more than one entity type.
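To make the anomalies concrete, here is a small sketch with Python's built-in `sqlite3`. The `employee_course` table is a made-up example of a table that pertains to two entity types (employees and courses) at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mixing EMPLOYEE facts and COURSE facts.
conn.executescript("""
CREATE TABLE employee_course (
    emp_id INTEGER, name TEXT, course TEXT, course_fee INTEGER,
    PRIMARY KEY (emp_id, course)
);
INSERT INTO employee_course VALUES
    (100, 'An',   'SQL',  200),
    (110, 'Binh', 'SQL',  200),
    (120, 'Chi',  'Java', 300);
""")

# Modification anomaly: changing the fee in only one row leaves two fees for SQL.
conn.execute("UPDATE employee_course SET course_fee = 250 WHERE emp_id = 100 AND course = 'SQL'")
sql_fees = sorted(r[0] for r in conn.execute(
    "SELECT DISTINCT course_fee FROM employee_course WHERE course = 'SQL'"))

# Deletion anomaly: removing Chi's row also removes the only record of the Java fee.
conn.execute("DELETE FROM employee_course WHERE emp_id = 120")
java_rows = conn.execute(
    "SELECT COUNT(*) FROM employee_course WHERE course = 'Java'").fetchone()[0]
print(sql_fees, java_rows)
```

An insertion anomaly shows up in the same table: a new course cannot be recorded until some employee takes it, because `emp_id` is part of the primary key.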
- Functional dependencies and keys
+ Functional dependency: The value of one attribute (the determinant) determines the
value of another attribute.
+ Candidate key:
    A unique identifier. One of the candidate keys will become the primary key.
    Each non-key field is functionally dependent on every candidate key.
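A functional dependency can be checked against sample rows with a short Python helper. The function name `holds` and the sample data are assumptions for this sketch. Note that sample data can only disprove a dependency; whether it really holds is a business rule.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent is not violated by the given rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # Same determinant value with a different dependent value breaks the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True

rows = [
    {"emp_id": 100, "name": "An",   "course": "SQL",  "fee": 200},
    {"emp_id": 100, "name": "An",   "course": "Java", "fee": 300},
    {"emp_id": 110, "name": "Binh", "course": "SQL",  "fee": 200},
]
print(holds(rows, ["emp_id"], ["name"]))           # emp_id determines name
print(holds(rows, ["emp_id"], ["course"]))         # emp_id does not determine course
print(holds(rows, ["emp_id", "course"], ["fee"]))  # the composite key determines fee
```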
5. Normal forms
- First normal form
+ No multivalued attributes
+ Every attribute value is atomic
+ All relations are in 1st Normal Form.
- Second normal form
+ 1NF PLUS every non-key attribute is fully functionally dependent on the ENTIRE
primary key
    Every non-key attribute must be defined by the entire key, not by only part of the key
    No partial functional dependencies
- Third normal form
+ 2NF PLUS no transitive dependencies (functional dependencies on non-primary-key
attributes)
+ Note: This is called transitive because the primary key is a determinant for another
attribute, which in turn is a determinant for a third
+ Solution: A non-key determinant with transitive dependencies goes into a new table; the
non-key determinant becomes the primary key in the new table and stays as a foreign key in the
old table
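The third-normal-form fix can be sketched with Python's built-in `sqlite3`. The `sales_flat` table and its data are invented for illustration, with the assumed dependencies `cust_id -> salesperson` and `salesperson -> region`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Before: cust_id -> salesperson -> region is a transitive dependency.
CREATE TABLE sales_flat (cust_id INTEGER PRIMARY KEY, name TEXT, salesperson TEXT, region TEXT);
INSERT INTO sales_flat VALUES
    (1, 'An',   'Smith', 'South'),
    (2, 'Binh', 'Hicks', 'West'),
    (3, 'Chi',  'Smith', 'South');

-- After: the non-key determinant gets its own table and stays behind as a foreign key.
CREATE TABLE salesperson (salesperson TEXT PRIMARY KEY, region TEXT);
CREATE TABLE customer (
    cust_id INTEGER PRIMARY KEY, name TEXT,
    salesperson TEXT REFERENCES salesperson(salesperson)
);
INSERT INTO salesperson SELECT DISTINCT salesperson, region FROM sales_flat;
INSERT INTO customer SELECT cust_id, name, salesperson FROM sales_flat;
""")

# Joining the two new tables rebuilds the original rows, so no information is lost.
rebuilt = conn.execute("""
    SELECT c.cust_id, c.name, c.salesperson, s.region
    FROM customer c JOIN salesperson s ON s.salesperson = c.salesperson
    ORDER BY c.cust_id
""").fetchall()
original = conn.execute("SELECT * FROM sales_flat ORDER BY cust_id").fetchall()
region_rows = conn.execute("SELECT COUNT(*) FROM salesperson").fetchone()[0]
print(rebuilt == original, region_rows)
```

Each salesperson's region is now stored once instead of once per customer.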
Lec 5: Physical database design and performance
1. The physical database design process
- Purpose: translate the logical description of data into the technical specifications for
storing and retrieving data
- Goal: create a design for storing data that will provide adequate performance and
ensure database integrity, security, and recoverability
- Designing physical files and databases requires certain information that should have
been collected and produced during prior systems development phases. The
information needed for physical file and database design includes these requirements:
+ Normalized relations, including estimates for the range of the number of rows in each
table
+ Definitions of each attribute, along with physical specifications such as maximum
possible length
+ Descriptions of where and when data are used in various ways (entered, retrieved,
deleted, and updated, including typical frequencies of these events)
+ Expectations or requirements for response time and data security, backup, recovery,
retention, and integrity
+ Descriptions of the technologies (database management systems) used for
implementing the database
- Physical database design requires several critical decisions that will affect the
integrity and performance of the application system. These key decisions include the
following:
+ Choosing the storage format (called data type) for each attribute from the logical data
model. The format and associated parameters are chosen to maximize data integrity
and to minimize storage space.
+ Giving the database management system guidance regarding how to group attributes
from the logical data model into physical records.
+ Giving the database management system guidance regarding how to arrange similarly
structured records in secondary memory (primarily hard disks), using a structure (called
a file organization) so that individual and groups of records can be stored, retrieved, and
updated rapidly.
+ Selecting structures (including indexes and the overall database architecture) for storing
and connecting files to make retrieving related data more efficient.
+ Preparing strategies for handling queries against the database that will optimize
performance and take advantage of the file organizations and indexes that you have
specified. Efficient database structures will be of benefit only if queries and the
database management systems that handle those queries are tuned to intelligently use
those structures.
2. Designing fields
- Designing fields:
+ Field: smallest unit of application data recognized by system software
+ Field design
    Choosing data type
    Coding, compression, encryption
    Controlling data integrity
- Choosing data types
+ Selecting a data type involves four objectives that will have different relative levels of
importance for different applications:
    Represent all possible values.
    Improve data integrity.
    Support all data manipulations.
    Minimize storage space.
- Field data integrity:
+ Default value: assumed value if no explicit value is given
+ Range control: allowable value limitations (constraints or validation rules)
+ Null value control: allowing or prohibiting empty fields
+ Referential integrity: range control (and null value allowances) for foreign-key to
primary-key match-ups
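The first three field-integrity controls map onto column constraints in SQL. Below is a sketch with Python's built-in `sqlite3`; the `product` table, its columns, and the price limits are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE product (
    product_id TEXT PRIMARY KEY NOT NULL,
    name       TEXT NOT NULL,                                        -- null value control
    status     TEXT NOT NULL DEFAULT 'active',                       -- default value
    unit_price REAL NOT NULL CHECK (unit_price BETWEEN 0 AND 10000)  -- range control
)
""")

# No status is supplied, so the default value is used.
conn.execute("INSERT INTO product (product_id, name, unit_price) VALUES ('P1', 'Desk', 120.0)")
default_status = conn.execute(
    "SELECT status FROM product WHERE product_id = 'P1'").fetchone()[0]

def rejected(sql):
    """Return True when the DBMS refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

range_rejected = rejected(
    "INSERT INTO product (product_id, name, unit_price) VALUES ('P2', 'Lamp', -5)")
null_rejected = rejected(
    "INSERT INTO product (product_id, name, unit_price) VALUES ('P3', NULL, 10)")
print(default_status, range_rejected, null_rejected)
```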
- Handling missing data: Missing data is inevitable. The following methods can be used to
handle missing data:
+ Substitute an estimate of the missing value (e.g., using a formula)
+ Construct a report listing missing values
+ In programs, ignore missing data unless the value is significant (sensitivity testing)
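A rough Python sketch of the three methods. The `readings` data is invented, and using the mean of the known values as the estimate is only one possible formula.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
readings = [{"id": 1, "weight": 10.0}, {"id": 2, "weight": None}, {"id": 3, "weight": 14.0}]

# Report: list which records have missing values.
missing_ids = [r["id"] for r in readings if r["weight"] is None]

# Substitute an estimate: here, the mean of the known values.
known = [r["weight"] for r in readings if r["weight"] is not None]
estimate = mean(known)
filled = [r["weight"] if r["weight"] is not None else estimate for r in readings]

# Sensitivity testing: compare a result with missing data ignored vs. estimated
# to judge whether the missing values matter for this calculation.
total_ignoring = sum(known)
total_filled = sum(filled)
print(missing_ids, estimate, total_ignoring, total_filled)
```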
3. Denormalizing data
- Definition: Transforming normalized relations into non-normalized physical record
specifications.
- Benefits:
+ Can improve performance (speed) by reducing the number of table lookups (i.e., reducing
the number of necessary join queries)
- Costs (due to data duplication):
+ Wasted storage space
+ Data integrity/consistency threats
- Common denormalization opportunities:
+ One-to-one relationship
+ Many-to-many relationship with non-key attributes (associative entity)
+ Reference data (1:N relationship where the 1-side has data not used in any other
relationship).
- Denormalize with caution
+ Denormalization can:
    Increase the chance of errors and inconsistencies
    Reintroduce anomalies
    Force reprogramming when business rules change
+ Perhaps other methods could be used to improve the performance of joins
    Organization of tables in the database (file organization and clustering)
    Proper query design and optimization
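The trade-off can be seen in a small sketch with Python's built-in `sqlite3`. The `customer`, `orders`, and `orders_denorm` tables are hypothetical; `orders_denorm` copies the customer's city into every order row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER REFERENCES customer(cust_id));
INSERT INTO customer VALUES (1, 'Hanoi');
INSERT INTO orders VALUES (10, 1), (11, 1);

-- Denormalized copy: the customer's city is duplicated into every order row.
CREATE TABLE orders_denorm AS
    SELECT o.order_id, o.cust_id, c.city
    FROM orders o JOIN customer c ON c.cust_id = o.cust_id;
""")

# Benefit: the city can be read without a join.
cities_no_join = conn.execute(
    "SELECT order_id, city FROM orders_denorm ORDER BY order_id").fetchall()

# Cost: an update to the source table does not reach the duplicated copies.
conn.execute("UPDATE customer SET city = 'Hue' WHERE cust_id = 1")
stale = conn.execute("""
    SELECT COUNT(*) FROM orders_denorm d JOIN customer c ON c.cust_id = d.cust_id
    WHERE d.city <> c.city
""").fetchone()[0]
print(cities_no_join, stale)
```

After the update, both rows in `orders_denorm` still hold the old city, which is the update anomaly that normalization was meant to remove.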