
Unit 3

The document discusses relational database design, focusing on key concepts such as functional dependencies, normalization forms (1NF, 2NF, 3NF, BCNF), and the importance of lossless decompositions. It explains how to identify and decompose schemas to eliminate redundancy while preserving data integrity. Additionally, it outlines algorithms for computing functional dependencies and ensuring that decompositions are both dependency-preserving and lossless.

Uploaded by

shailaja.cse

Relational Database Design

n Features of Good Relational Design


n Atomic Domains and First Normal Form
n Decomposition Using Functional Dependencies
n Functional Dependency Theory
n Algorithms for Functional Dependencies
n Decomposition Using Multivalued Dependencies
n More Normal Forms
n Database-Design Process
n Modeling Temporal Data
Combine Schemas?
n Suppose we combine instructor and department into inst_dept
l (No connection to relationship set inst_dept)
n Result is possible repetition of information
A Combined Schema Without Repetition
n Consider combining relations
l sec_class(sec_id, building, room_number) and
l section(course_id, sec_id, semester, year)
into one relation
l section(course_id, sec_id, semester, year,
building, room_number)
n No repetition in this case
What About Smaller Schemas?
n Suppose we had started with inst_dept. How would we know to split up
(decompose) it into instructor and department?
n Write a rule “if there were a schema (dept_name, building, budget), then
dept_name would be a candidate key”
n Denote as a functional dependency:
dept_name → building, budget
n In inst_dept, because dept_name is not a candidate key, the building
and budget of a department may have to be repeated.
l This indicates the need to decompose inst_dept
n Not all decompositions are good. Suppose we decompose
employee(ID, name, street, city, salary) into
employee1 (ID, name)
employee2 (name, street, city, salary)
n The next slide shows how we lose information -- we cannot reconstruct
the original employee relation -- and so, this is a lossy decomposition.
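The loss of information can be seen in a small Python sketch. The two sample tuples below are hypothetical (they are not taken from the slides); the only assumption that matters is that two different employees share the same name.

```python
# Hypothetical sample data: two distinct employees who share the name "Kim".
employee = {
    (57766, "Kim", "Main", "Perryridge", 75000),
    (98776, "Kim", "North", "Hampton", 67000),
}

# Project onto employee1(ID, name) and employee2(name, street, city, salary).
employee1 = {(i, n) for (i, n, s, c, sal) in employee}
employee2 = {(n, s, c, sal) for (i, n, s, c, sal) in employee}

# Natural join of the two projections on the shared attribute `name`.
rejoined = {
    (i, n1, s, c, sal)
    for (i, n1) in employee1
    for (n2, s, c, sal) in employee2
    if n1 == n2
}

# Tuples that appear after the join but were never in the original relation.
spurious = rejoined - employee
print(len(employee), len(rejoined), len(spurious))
```

The join returns more tuples than the original relation, and we can no longer tell which address and salary belong to which ID. Having more tuples means having less information.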
A Lossy Decomposition
Example of Lossless-Join Decomposition

n Lossless-join decomposition
n Decomposition of R = (A, B, C) into
R1 = (A, B) and R2 = (B, C)

r:
A  B  C
α  1  A
β  2  B

π A,B (r):
A  B
α  1
β  2

π B,C (r):
B  C
1  A
2  B

π A,B (r) ⋈ π B,C (r):
A  B  C
α  1  A
β  2  B
First Normal Form
n Domain is atomic if its elements are considered to be indivisible units
l Examples of non-atomic domains:
• Set of names, composite attributes
• Identification numbers like CS101 that can be broken up into
parts
n A relational schema R is in first normal form if the domains of all
attributes of R are atomic
n Non-atomic values complicate storage and encourage redundant
(repeated) storage of data
l Example: Set of accounts stored with each customer, and set of
owners stored with each account
l We assume all relations are in first normal form (and revisit this in
Chapter 22: Object Based Databases)
Functional Dependencies (Cont.)
n Let R be a relation schema
α ⊆ R and β ⊆ R
n The functional dependency
α → β
holds on R if and only if for any legal relations r(R), whenever any
two tuples t1 and t2 of r agree on the attributes α, they also agree
on the attributes β. That is,
t1[α] = t2[α] ⇒ t1[β] = t2[β]
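The definition translates directly into a check on one relation instance. The sketch below is illustrative only: the function name `fd_holds` and the sample rows are assumptions, not part of the slides. An instance can only refute a functional dependency; passing the check on one instance does not prove the dependency holds on the schema.

```python
def fd_holds(rows, alpha, beta):
    """Return True if alpha -> beta is satisfied by this one instance.

    rows  : list of dicts mapping attribute name -> value
    alpha : attribute names on the left-hand side
    beta  : attribute names on the right-hand side
    """
    seen = {}  # value of t[alpha] -> value of t[beta] first seen with it
    for t in rows:
        key = tuple(t[a] for a in alpha)
        val = tuple(t[b] for b in beta)
        # Two tuples agree on alpha but disagree on beta: the FD is violated.
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical instance with a few inst_dept attributes.
rows = [
    {"ID": 1, "dept_name": "Physics", "building": "Watson"},
    {"ID": 2, "dept_name": "Physics", "building": "Watson"},
    {"ID": 3, "dept_name": "Music", "building": "Packard"},
]

print(fd_holds(rows, ["dept_name"], ["building"]))
print(fd_holds(rows, ["building"], ["ID"]))
```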
Functional Dependencies (Cont.)
n A functional dependency is trivial if it is satisfied by all instances of a
relation
l Example:
• ID, name → ID
• name → name
l In general, α → β is trivial if β ⊆ α
Closure of a Set of Functional
Dependencies

n Given a set F of functional dependencies, there are certain other
functional dependencies that are logically implied by F.
l For example: If A → B and B → C, then we can infer that A → C
n The set of all functional dependencies logically implied by F is the
closure of F.
n We denote the closure of F by F+.
n F+ is a superset of F.
Closure of a Set of Functional
Dependencies

n We can find F+, the closure of F, by repeatedly applying
Armstrong’s Axioms:
l if β ⊆ α, then α → β (reflexivity)
l if α → β, then γα → γβ (augmentation)
l if α → β, and β → γ, then α → γ (transitivity)
n These rules are
l sound (generate only functional dependencies that actually hold),
and
l complete (generate all functional dependencies that hold).
Example
n R = (A, B, C, G, H, I)
F = { A → B
      A → C
      CG → H
      CG → I
      B → H }
n some members of F+
l A → H
• by transitivity from A → B and B → H
l AG → I
• by augmenting A → C with G, to get AG → CG
and then transitivity with CG → I
l CG → HI
• by augmenting CG → I to infer CG → CGI,
and augmenting CG → H to infer CGI → HI,
and then transitivity
Procedure for Computing F+
n To compute the closure of a set of functional dependencies F:

F+ = F
repeat
    for each functional dependency f in F+
        apply reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            then add the resulting functional dependency to F+
until F+ does not change any further

NOTE: We shall see an alternative procedure for this task later
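Enumerating all of F+ is expensive. A common shortcut, sketched here under the assumption that we only want to test whether one particular dependency α → β is in F+, is to compute the attribute closure of α and check that it contains β. The function names are illustrative; the set F is the one from the example with R = (A, B, C, G, H, I).

```python
def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already implied, the right side is too.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

def implies(fds, lhs, rhs):
    """True if lhs -> rhs is logically implied by fds."""
    return set(rhs) <= attribute_closure(lhs, fds)

# F from the example; single-letter attributes, so "CG" means {C, G}.
F = [
    ({"A"}, {"B"}),
    ({"A"}, {"C"}),
    ({"C", "G"}, {"H"}),
    ({"C", "G"}, {"I"}),
    ({"B"}, {"H"}),
]

print(implies(F, "A", "H"), implies(F, "AG", "I"), implies(F, "CG", "HI"))
```

All three members of F+ listed in the example pass this test, while a dependency such as B → A does not.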


Closure of Functional Dependencies
(Cont.)
n Additional rules:
l If α → β holds and α → γ holds, then α → βγ holds (union)
l If α → βγ holds, then α → β holds and α → γ holds
(decomposition)
l If α → β holds and γβ → δ holds, then αγ → δ holds
(pseudotransitivity)
The above rules can be inferred from Armstrong’s axioms.
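As an illustration, here is one way to derive the union rule using only augmentation and transitivity. The starting assumptions are that α → β and α → γ both hold.

```latex
\begin{align*}
\alpha \to \beta
  &\;\Rightarrow\; \alpha \to \alpha\beta
  && \text{(augmentation with } \alpha\text{, using } \alpha\alpha = \alpha\text{)}\\
\alpha \to \gamma
  &\;\Rightarrow\; \alpha\beta \to \beta\gamma
  && \text{(augmentation with } \beta\text{)}\\
\alpha \to \alpha\beta,\;\; \alpha\beta \to \beta\gamma
  &\;\Rightarrow\; \alpha \to \beta\gamma
  && \text{(transitivity)}
\end{align*}
```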
Boyce-Codd Normal Form

A relation schema R is in BCNF with respect to a set F of
functional dependencies if for all functional dependencies in F+ of
the form

α → β

where α ⊆ R and β ⊆ R, at least one of the following holds:

n α → β is trivial (i.e., β ⊆ α)
n α is a superkey for R

Example schema not in BCNF:

inst_dept (ID, name, salary, dept_name, building, budget)

because dept_name → building, budget
holds on inst_dept, but dept_name is not a superkey
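The test can be sketched in Python. To decide whether the original schema R is in BCNF, it is enough to examine the dependencies in the given set F rather than all of F+; this shortcut does not carry over to the schemas produced by a decomposition. The dependency ID → name, salary, dept_name is an assumption added for the example (the slides state only the dept_name dependency), and the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Dependencies in `fds` that are non-trivial and whose left side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not set(schema) <= closure(lhs, fds)
    ]

inst_dept = {"ID", "name", "salary", "dept_name", "building", "budget"}
F = [
    ({"ID"}, {"name", "salary", "dept_name"}),  # assumed: ID identifies an instructor
    ({"dept_name"}, {"building", "budget"}),    # from the slide
]

print(bcnf_violations(inst_dept, F))
```

Only the dept_name dependency is reported, because the closure of ID covers the whole schema while the closure of dept_name does not.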
Decomposing a Schema into BCNF
n Suppose we have a schema R and a non-trivial dependency α → β causes a
violation of BCNF.

We decompose R into:

• (α ∪ β)
• (R − (β − α))
n In our example,
l α = dept_name
l β = building, budget
and inst_dept is replaced by
l (α ∪ β) = (dept_name, building, budget)
l (R − (β − α)) = (ID, name, salary, dept_name)


BCNF Decomposition Algorithm

result := {R};
done := false;
compute F+;
while (not done) do
    if (there is a schema Ri in result that is not in BCNF)
    then begin
        let α → β be a nontrivial functional dependency that
            holds on Ri such that α → Ri is not in F+,
            and α ∩ β = ∅;
        result := (result – Ri) ∪ (Ri – β) ∪ (α, β);
    end
    else done := true;

Note: each Ri is in BCNF, and decomposition is lossless-join.


Example of BCNF Decomposition
n R = (A, B, C)
F = { A → B
      B → C }
Key = {A}
n R is not in BCNF (B → C but B is not a superkey)
n Decomposition
l R1 = (B, C)
l R2 = (A, B)
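A minimal Python sketch of the decomposition loop, applied to this example. Instead of materializing F+, it searches the subsets α of each Ri and uses attribute closure to find a violating dependency; that search is exponential in the number of attributes and is meant only as an illustration. All function names are assumptions.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(ri, fds):
    """Return (alpha, beta) for some BCNF violation on schema ri, or None."""
    for size in range(1, len(ri)):
        for combo in combinations(sorted(ri), size):
            alpha = set(combo)
            reach = closure(alpha, fds) & ri  # attributes of ri determined by alpha
            beta = reach - alpha              # guarantees alpha and beta are disjoint
            if beta and reach != ri:          # non-trivial, and alpha is not a superkey of ri
                return alpha, beta
    return None

def bcnf_decompose(schema, fds):
    result = [set(schema)]
    while True:
        for ri in result:
            violation = find_violation(ri, fds)
            if violation:
                alpha, beta = violation
                result.remove(ri)
                result += [ri - beta, alpha | beta]  # (Ri - beta) and (alpha, beta)
                break
        else:
            return result  # no schema violates BCNF any more

F = [({"A"}, {"B"}), ({"B"}, {"C"})]
decomposition = bcnf_decompose({"A", "B", "C"}, F)
print(decomposition)
```

For R = (A, B, C) the loop finds the violation B → C and returns the two schemas (A, B) and (B, C), matching the slide.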
Third Normal Form
n A relation schema R is in third normal form (3NF) if for all:
α → β in F+
at least one of the following holds:
l α → β is trivial (i.e., β ⊆ α)
l α is a superkey for R
l Each attribute A in β – α is contained in a candidate key for R.
(NOTE: each attribute may be in a different candidate key)
n If a relation is in BCNF it is in 3NF (since in BCNF one of the first two
conditions above must hold).
n Third condition is a minimal relaxation of BCNF to ensure dependency
preservation
3NF Example
n Relation dept_advisor:
l dept_advisor (s_ID, i_ID, dept_name)
F = {s_ID, dept_name → i_ID, i_ID → dept_name}
l Two candidate keys: (s_ID, dept_name) and (i_ID, s_ID)
l R is in 3NF
• s_ID, dept_name → i_ID
– (s_ID, dept_name) is a superkey
• i_ID → dept_name
– dept_name is contained in a candidate key
3NF Decomposition Algorithm

Let Fc be a canonical cover for F;
i := 0;
for each functional dependency α → β in Fc do
    if none of the schemas Rj, 1 ≤ j ≤ i contains αβ
    then begin
        i := i + 1;
        Ri := αβ
    end
if none of the schemas Rj, 1 ≤ j ≤ i contains a candidate key for R
then begin
    i := i + 1;
    Ri := any candidate key for R;
end
/* Optionally, remove redundant relations */
repeat
    if any schema Rj is contained in another schema Rk
    then /* delete Rj */
        Rj := Ri;
        i := i – 1;
until no more schemas can be deleted
return (R1, R2, ..., Ri)
3NF Decomposition: An Example

n Relation schema:
cust_banker_branch = (customer_id, employee_id, branch_name, type )
n The functional dependencies for this relation schema are:
1. customer_id, employee_id → branch_name, type
2. employee_id → branch_name
3. customer_id, branch_name → employee_id
n We first compute a canonical cover
l branch_name is extraneous in the r.h.s. of the 1st dependency
l No other attribute is extraneous, so we get Fc =
customer_id, employee_id → type
employee_id → branch_name
customer_id, branch_name → employee_id
3NF Decomposition Example (Cont.)
n The for loop generates the following 3NF schema:
(customer_id, employee_id, type)
(employee_id, branch_name)
(customer_id, branch_name, employee_id)
l Observe that (customer_id, employee_id, type) contains a
candidate key of the original schema, so no further relation schema
needs to be added
n At the end of the for loop, detect and delete schemas, such as (employee_id,
branch_name), which are subsets of other schemas
l the result will not depend on the order in which FDs are considered
n The resultant simplified 3NF schema is:
(customer_id, employee_id, type)
(customer_id, branch_name, employee_id)
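The synthesis steps can be replayed in Python. This is a sketch under one simplifying assumption: the canonical cover Fc is supplied by hand (it is the one computed in the example), so the code does not compute canonical covers itself. The test "contains a candidate key" is implemented as "is a superkey of R", which is equivalent. Function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_key(schema, fds):
    """Shrink the full schema to one candidate key by dropping attributes greedily."""
    key = set(schema)
    for attr in sorted(schema):
        if closure(key - {attr}, fds) >= set(schema):
            key.discard(attr)
    return key

def synthesize_3nf(schema, fc):
    result = []
    for lhs, rhs in fc:
        ab = lhs | rhs
        if not any(ab <= rj for rj in result):
            result.append(ab)
    # A schema contains a candidate key exactly when it is a superkey of R.
    if not any(closure(rj, fc) >= set(schema) for rj in result):
        result.append(candidate_key(schema, fc))
    # Optionally drop schemas that are contained in another schema.
    return [rj for rj in result if not any(rj < rk for rk in result)]

R = {"customer_id", "employee_id", "branch_name", "type"}
Fc = [
    ({"customer_id", "employee_id"}, {"type"}),
    ({"employee_id"}, {"branch_name"}),
    ({"customer_id", "branch_name"}, {"employee_id"}),
]
schemas = synthesize_3nf(R, Fc)
print(schemas)
```

The result contains the same two schemas as the simplified 3NF schema above: (employee_id, branch_name) is dropped because it is a subset of (customer_id, branch_name, employee_id).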
END
Proof of Correctness of 3NF
Decomposition Algorithm
Correctness of 3NF Decomposition
Algorithm
n 3NF decomposition algorithm is dependency preserving (since there
is a relation for every FD in Fc)
n Decomposition is lossless
l A candidate key (C ) is in one of the relations Ri in decomposition
l Closure of candidate key under Fc must contain all attributes in
R.
l Follow the steps of attribute closure algorithm to show there is
only one tuple in the join result for each tuple in Ri
Correctness of 3NF Decomposition
Algorithm (Cont’d.)

Claim: if a relation Ri is in the decomposition generated by the
above algorithm, then Ri satisfies 3NF.
n Let Ri be generated from the dependency α → β
n Let γ → B be any non-trivial functional dependency on Ri. (We need only
consider FDs whose right-hand side is a single attribute.)
n Now, B can be in either β or α but not in both. Consider each case
separately.
Correctness of 3NF Decomposition
(Cont’d.)
n Case 1: If B is in β:
l If γ is a superkey, the 2nd condition of 3NF is satisfied
l Otherwise α must contain some attribute not in γ
l Since γ → B is in F+ it must be derivable from Fc, by using attribute
closure on γ.
l Attribute closure cannot have used α → β. If it had been used, α must
be contained in the attribute closure of γ, which is not possible, since
we assumed γ is not a superkey.
l Now, using α → (β – {B}) and γ → B, we can derive α → B
(since γ ⊆ αβ, and B ∉ γ since γ → B is non-trivial)
l Then, B is extraneous in the right-hand side of α → β, which is not
possible since α → β is in Fc.
l Thus, if B is in β then γ must be a superkey, and the second
condition of 3NF must be satisfied.
Correctness of 3NF Decomposition
(Cont’d.)
n Case 2: B is in α.
l Since α is a candidate key, the third alternative in the definition of
3NF is trivially satisfied.
l In fact, we cannot show that γ is a superkey.
l This shows exactly why the third alternative is present in the
definition of 3NF.
Q.E.D.
Figure 8.02
Figure 8.03
Figure 8.04
Figure 8.05
Figure 8.06
Figure 8.14
Figure 8.15
Figure 8.17
Chapter 14: Transactions
n Transaction Concept
n Transaction State
n Concurrent Executions
n Serializability
n Recoverability
n Implementation of Isolation
n Transaction Definition in SQL
n Testing for Serializability
Transaction Concept
n A transaction is a unit of program execution that accesses and
possibly updates various data items.
n E.g. transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
n Two main issues to deal with:
l Failures of various kinds, such as hardware failures and system
crashes
l Concurrent execution of multiple transactions
Example of Fund Transfer
n Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
n Atomicity requirement
l if the transaction fails after step 3 and before step 6, money will be “lost”
leading to an inconsistent database state
• Failure could be due to software or hardware
l the system should ensure that updates of a partially executed transaction
are not reflected in the database
n Durability requirement — once the user has been notified that the transaction
has completed (i.e., the transfer of the $50 has taken place), the updates to the
database by the transaction must persist even if there are software or
hardware failures.
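The atomicity requirement can be tried out with Python's built-in `sqlite3` module. This is a sketch, not how any particular database system implements recovery: the table layout, the starting balances, and the `fail_midway` switch that simulates a failure after step 3 are all assumptions made for the example. Because the database is in memory, the sketch says nothing about durability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 200)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst; undo everything if a failure occurs."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        if fail_midway:
            # Simulated failure after the write to src (step 3), before step 6.
            raise RuntimeError("simulated failure")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        conn.commit()
    except RuntimeError:
        conn.rollback()  # the partial update is not reflected in the database

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

transfer(conn, "A", "B", 50, fail_midway=True)
after_failure = balances(conn)

transfer(conn, "A", "B", 50)
after_success = balances(conn)
print(after_failure, after_success)
```

After the simulated failure the balances are unchanged, so no money is "lost"; after the successful run both updates are visible.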
Example of Fund Transfer (Cont.)
n Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
n Consistency requirement in above example:
l the sum of A and B is unchanged by the execution of the transaction
n In general, consistency requirements include
• Explicitly specified integrity constraints such as primary keys and foreign
keys
• Implicit integrity constraints
– e.g. sum of balances of all accounts, minus sum of loan amounts
must equal value of cash-in-hand
l A transaction must see a consistent database.
l During transaction execution the database may be temporarily inconsistent.
l When the transaction completes successfully the database must be
consistent
• Erroneous transaction logic can lead to inconsistency
Example of Fund Transfer (Cont.)
n Isolation requirement — if between steps 3 and 6, another
transaction T2 is allowed to access the partially updated database, it
will see an inconsistent database (the sum A + B will be less than it
should be).
T1                      T2
1. read(A)
2. A := A – 50
3. write(A)
                        read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
n Isolation can be ensured trivially by running transactions serially
l that is, one after the other.
n However, executing multiple transactions concurrently has significant
benefits, as we will see later.
ACID Properties
A transaction is a unit of program execution that accesses and possibly
updates various data items. To preserve the integrity of data the database
system must ensure:
n Atomicity. Either all operations of the transaction are properly reflected
in the database or none are.
n Consistency. Execution of a transaction in isolation preserves the
consistency of the database.
n Isolation. Although multiple transactions may execute concurrently,
each transaction must be unaware of other concurrently executing
transactions. Intermediate transaction results must be hidden from other
concurrently executed transactions.
l That is, for every pair of transactions Ti and Tj, it appears to Ti that
either Tj finished execution before Ti started, or Tj started execution
after Ti finished.
n Durability. After a transaction completes successfully, the changes it
has made to the database persist, even if there are system failures.
Transaction State
n Active – the initial state; the transaction stays in this state while it is
executing
n Partially committed – after the final statement has been executed.
n Failed -- after the discovery that normal execution can no longer
proceed.
n Aborted – after the transaction has been rolled back and the
database restored to its state prior to the start of the transaction.
Two options after it has been aborted:
l restart the transaction
• can be done only if no internal logical error
l kill the transaction
n Committed – after successful completion.
Transaction State (Cont.)
Concurrent Executions
n Multiple transactions are allowed to run concurrently in the system.
Advantages are:
l increased processor and disk utilization, leading to better
transaction throughput
• E.g. one transaction can be using the CPU while another is
reading from or writing to the disk
l reduced average response time for transactions: short
transactions need not wait behind long ones.
n Concurrency control schemes – mechanisms to achieve isolation
l that is, to control the interaction among the concurrent
transactions in order to prevent them from destroying the
consistency of the database
• Will study in Chapter 16, after studying notion of correctness
of concurrent executions.
Schedules
n Schedule – a sequence of instructions that specifies the chronological
order in which instructions of concurrent transactions are executed
l a schedule for a set of transactions must consist of all instructions
of those transactions
l must preserve the order in which the instructions appear in each
individual transaction.
n A transaction that successfully completes its execution will have a
commit instruction as the last statement
l by default, a transaction is assumed to execute a commit instruction as its
last step
n A transaction that fails to successfully complete its execution will have
an abort instruction as the last statement
Schedule 1
n Let T1 transfer $50 from A to B, and T2 transfer 10% of the
balance from A to B.
n A serial schedule in which T1 is followed by T2 :
Schedule 2
• A serial schedule where T2 is followed by T1
Schedule 3
n Let T1 and T2 be the transactions defined previously. The
following schedule is not a serial schedule, but it is equivalent
to Schedule 1.

In Schedules 1, 2 and 3, the sum A + B is preserved.
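The two serial schedules can be simulated in a few lines of Python. The starting balances (A = 1000, B = 2000) are an assumption made for the example, and the 10% is computed with integer arithmetic to avoid rounding noise.

```python
def t1(db):
    """T1: transfer $50 from A to B."""
    db["A"] -= 50
    db["B"] += 50

def t2(db):
    """T2: transfer 10% of the balance of A to B."""
    temp = db["A"] * 10 // 100
    db["A"] -= temp
    db["B"] += temp

def run(serial_order, start):
    """Run whole transactions one after the other on a copy of `start`."""
    db = dict(start)
    for txn in serial_order:
        txn(db)
    return db

start = {"A": 1000, "B": 2000}    # assumed starting balances
schedule1 = run([t1, t2], start)  # T1 followed by T2
schedule2 = run([t2, t1], start)  # T2 followed by T1
print(schedule1, schedule2)
```

The two serial orders end in different final balances, yet both preserve A + B. Consistency does not require that every serial order give the same result.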


Schedule 4
n The following concurrent schedule does not preserve the
value of (A + B ).
Serializability
n Basic Assumption – Each transaction preserves database
consistency.
n Thus serial execution of a set of transactions preserves
database consistency.
n A (possibly concurrent) schedule is serializable if it is
equivalent to a serial schedule. Different forms of schedule
equivalence give rise to the notions of:
1. conflict serializability
2. view serializability
Conflicting Instructions
n Instructions li and lj of transactions Ti and Tj respectively, conflict
if and only if there exists some item Q accessed by both li and lj,
and at least one of these instructions wrote Q.
1. li = read(Q), lj = read(Q). li and lj don’t conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict
4. li = write(Q), lj = write(Q). They conflict
n Intuitively, a conflict between li and lj forces a (logical) temporal
order between them.
l If li and lj are consecutive in a schedule and they do not
conflict, their results would remain the same even if they had
been interchanged in the schedule.
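The four cases reduce to a one-line test. In this sketch an instruction is represented as a tuple (transaction, action, item), which is a representation chosen for the example rather than anything prescribed by the slides.

```python
def conflicts(op_i, op_j):
    """True if two instructions of different transactions conflict.

    Each op is (transaction, action, item) with action "read" or "write".
    """
    (ti, ai, qi), (tj, aj, qj) = op_i, op_j
    # Same item, different transactions, and at least one of the two is a write.
    return ti != tj and qi == qj and "write" in (ai, aj)

print(conflicts(("Ti", "read", "Q"), ("Tj", "read", "Q")))
print(conflicts(("Ti", "read", "Q"), ("Tj", "write", "Q")))
```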
Conflict Serializability
n If a schedule S can be transformed into a schedule S´ by a series of
swaps of non-conflicting instructions, we say that S and S´ are
conflict equivalent.
n We say that a schedule S is conflict serializable if it is conflict
equivalent to a serial schedule
Conflict Serializability (Cont.)
n Schedule 3 can be transformed into Schedule 6, a serial
schedule where T2 follows T1, by a series of swaps of
non-conflicting instructions. Therefore Schedule 3 is conflict
serializable.

Schedule 3 Schedule 6
Conflict Serializability (Cont.)

n Example of a schedule that is not conflict serializable:

n We are unable to swap instructions in the above schedule to
obtain either the serial schedule < T3, T4 >, or the serial
schedule < T4, T3 >.
View Serializability
n Let S and S´ be two schedules with the same set of transactions. S
and S´ are view equivalent if the following three conditions are met,
for each data item Q,
1. If in schedule S, transaction Ti reads the initial value of Q, then in
schedule S´ also transaction Ti must read the initial value of Q.
2. If in schedule S transaction Ti executes read(Q), and that value
was produced by transaction Tj (if any), then in schedule S´ also
transaction Ti must read the value of Q that was produced by the
same write(Q) operation of transaction Tj.
3. The transaction (if any) that performs the final write(Q) operation
in schedule S must also perform the final write(Q) operation in
schedule S´.
As can be seen, view equivalence is also based purely on reads and
writes.
View Serializability (Cont.)
n A schedule S is view serializable if it is view equivalent to a serial
schedule.
n Every conflict serializable schedule is also view serializable.
n Below is a schedule which is view-serializable but not conflict
serializable.

n What serial schedule is the above equivalent to?


n Every view serializable schedule that is not conflict serializable has
blind writes.
Testing for Serializability
n Consider some schedule of a set of transactions T1, T2, ..., Tn
n Precedence graph — a directed graph where the vertices
are the transactions (names).
n We draw an arc from Ti to Tj if the two transactions conflict,
and Ti accessed the data item on which the conflict arose
earlier.
n We may label the arc by the item that was accessed.
n Example 1
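Independently of the slide's own example, the construction can be sketched in Python. The sketch relies on the standard result that a schedule is conflict serializable if and only if its precedence graph is acyclic. Both sample schedules are hypothetical and are not reconstructions of the figures; instructions are (transaction, action, item) tuples.

```python
def precedence_graph(schedule):
    """Set of arcs (Ti, Tj): Ti accessed a conflicting item before Tj."""
    edges = set()
    for i, (ti, ai, qi) in enumerate(schedule):
        for tj, aj, qj in schedule[i + 1:]:
            if ti != tj and qi == qj and "write" in (ai, aj):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by `edges`."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = {}  # node -> "visiting" or "done"

    def visit(u):
        state[u] = "visiting"
        for v in graph[u]:
            if state.get(v) == "visiting" or (v not in state and visit(v)):
                return True
        state[u] = "done"
        return False

    return any(u not in state and visit(u) for u in graph)

def conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))

# Hypothetical schedules used only to exercise the functions.
interleaved = [
    ("T1", "read", "A"), ("T1", "write", "A"),
    ("T2", "read", "A"), ("T2", "write", "A"),
    ("T1", "read", "B"), ("T1", "write", "B"),
    ("T2", "read", "B"), ("T2", "write", "B"),
]
crossing = [("T3", "read", "Q"), ("T4", "write", "Q"), ("T3", "write", "Q")]

print(precedence_graph(interleaved), conflict_serializable(interleaved))
print(conflict_serializable(crossing))
```

The first schedule has the single arc T1 → T2 and is conflict serializable; the second has arcs in both directions between T3 and T4, so it is not.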
Recoverable Schedules
Need to address the effect of transaction failures on concurrently
running transactions.
n Recoverable schedule — if a transaction Tj reads a data item
previously written by a transaction Ti , then the commit operation of Ti
appears before the commit operation of Tj.
n The following schedule (Schedule 11) is not recoverable if T9 commits
immediately after the read

n If T8 should abort, T9 would have read (and possibly shown to the user)
an inconsistent database state. Hence, the database must ensure that
schedules are recoverable.
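The definition of a recoverable schedule can be checked mechanically. In this sketch a schedule is a list of (transaction, action, item) tuples with actions "read", "write" and "commit"; a read is attributed to the most recent writer of the item, and aborts are not modelled. The operations in `schedule11` are an assumed reconstruction based on the description above (T9 reads an item written by T8 and commits first), since the figure itself is not available here.

```python
def reads_from(schedule):
    """Yield (reader, writer) for each read of an item last written by another transaction."""
    last_writer = {}
    for txn, action, item in schedule:
        if action == "write":
            last_writer[item] = txn
        elif action == "read" and last_writer.get(item, txn) != txn:
            yield txn, last_writer[item]

def is_recoverable(schedule):
    commit_pos = {t: p for p, (t, a, _) in enumerate(schedule) if a == "commit"}
    for reader, writer in reads_from(schedule):
        if reader in commit_pos:
            # The writer must have committed, and before the reader did.
            if writer not in commit_pos or commit_pos[writer] > commit_pos[reader]:
                return False
    return True

# Assumed reconstruction: T9 commits immediately after reading T8's write.
schedule11 = [
    ("T8", "read", "A"), ("T8", "write", "A"),
    ("T9", "read", "A"), ("T9", "commit", None),
    ("T8", "read", "B"),
]
# Same operations, but T9 delays its commit until after T8 has committed.
delayed = [
    ("T8", "read", "A"), ("T8", "write", "A"),
    ("T9", "read", "A"),
    ("T8", "read", "B"), ("T8", "commit", None),
    ("T9", "commit", None),
]

print(is_recoverable(schedule11), is_recoverable(delayed))
```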
Cascading Rollbacks
n Cascading rollback – a single transaction failure leads to a
series of transaction rollbacks. Consider the following schedule
where none of the transactions has yet committed (so the
schedule is recoverable)

If T10 fails, T11 and T12 must also be rolled back.


n Can lead to the undoing of a significant amount of work
Cascadeless Schedules
n Cascadeless schedules — cascading rollbacks cannot occur; for
each pair of transactions Ti and Tj such that Tj reads a data item
previously written by Ti, the commit operation of Ti appears before the
read operation of Tj.
n Every cascadeless schedule is also recoverable
n It is desirable to restrict the schedules to those that are cascadeless
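A check for the cascadeless property follows the same pattern. As an interpretation made for this sketch, "a data item previously written by Ti" is taken to mean the most recent write of that item. The sample schedules are hypothetical, with transaction names borrowed from the cascading rollback discussion.

```python
def is_cascadeless(schedule):
    """True if every read of another transaction's write happens after that writer commits.

    schedule: list of (transaction, action, item); action is
    "read", "write" or "commit" (item is None for commit).
    """
    last_writer = {}   # item -> transaction that wrote it most recently
    committed = set()
    for txn, action, item in schedule:
        if action == "commit":
            committed.add(txn)
        elif action == "write":
            last_writer[item] = txn
        elif action == "read":
            writer = last_writer.get(item)
            # Reading uncommitted data written by someone else permits a cascade.
            if writer is not None and writer != txn and writer not in committed:
                return False
    return True

# Hypothetical schedule: T11 and T12 read values written by uncommitted transactions.
cascading = [
    ("T10", "read", "A"), ("T10", "write", "A"),
    ("T11", "read", "A"), ("T11", "write", "A"),
    ("T12", "read", "A"),
]
# Same work, but each reader waits for the writer's commit.
safe = [
    ("T10", "read", "A"), ("T10", "write", "A"), ("T10", "commit", None),
    ("T11", "read", "A"), ("T11", "write", "A"), ("T11", "commit", None),
    ("T12", "read", "A"),
]

print(is_cascadeless(cascading), is_cascadeless(safe))
```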
