Query Language
In simple words, a language that is used to store and
retrieve data from a database is known as a query
language. For example: SQL.
There are two types of query language:
1.Procedural Query language
2.Non-procedural query language
1. Procedural Query language:
In a procedural query language, the user instructs the
system to perform a series of operations to produce the
desired result. Here the user tells the system what data
to retrieve from the database and how to retrieve it.
For example, suppose you ask your younger brother to
make a cup of tea. If you just tell him to make tea
without describing the process, that is like a
non-procedural language. If you instead tell him each
step (switch on the stove, boil the water, add milk and
so on), that is like a procedural language.
2. Non-procedural query language:
In a non-procedural query language, the user instructs
the system to produce the desired result without
describing the step-by-step process. Here the user tells
the system what data to retrieve from the database but
not how to retrieve it.
Now let’s get back to our main topic of relational
algebra and relational calculus.
Relational Algebra:
Relational algebra is a conceptual procedural query
language used on the relational model.
Relational Calculus:
Relational calculus is a conceptual non-procedural query
language used on the relational model.
Note:
I have used the word conceptual while describing
relational algebra and relational calculus because they
are theoretical mathematical systems, not practical
implementations. SQL is a practical implementation of
relational algebra and relational calculus.
Relational Algebra, Calculus, RDBMS & SQL:
Relational algebra and calculus are the theoretical
concepts used on relational model.
RDBMS is a practical implementation of relational
model.
SQL is a practical implementation of relational algebra
and calculus.
What is Relational Algebra in DBMS?
Relational algebra is a procedural query language that
works on the relational model. The purpose of a query
language is to retrieve data from a database or to
perform operations such as insert, update and delete on
the data. Saying that relational algebra is a procedural
query language means that a query tells the system what
data to retrieve and how to retrieve it.
On the other hand, relational calculus is a
non-procedural query language, which means a query tells
the system what data to retrieve but not how to retrieve
it. Relational calculus is discussed separately.
Types of operations in relational algebra
We have divided these operations into two categories:
1. Basic Operations
2. Derived Operations
Basic/Fundamental Operations:
1. Select (σ)
2. Project (∏)
3. Union (∪)
4. Set Difference (-)
5. Cartesian product (X)
6. Rename (ρ)
Derived Operations:
1. Natural Join (⋈)
2. Left, Right, Full outer join (⟕, ⟖, ⟗)
3. Intersection (∩)
4. Division (÷)
Let’s discuss these operations one by one with the help
of examples.
Select Operator (σ)
The select operator is denoted by sigma (σ) and is used
to find the tuples (or rows) in a relation (or table)
that satisfy a given condition.
If you know a little SQL, you can think of it as the
WHERE clause, which serves the same purpose.
Syntax of Select Operator (σ)
σ Condition/Predicate(Relation/Table name)
Select Operator (σ) Example
Table: CUSTOMER
---------------
Customer_Id Customer_Name Customer_City
----------- ------------- -------------
C10100 Steve Agra
C10111 Raghu Agra
C10115 Chaitanya Noida
C10117 Ajeet Delhi
C10118 Carl Delhi
Query:
σ Customer_City="Agra" (CUSTOMER)
Output:
Customer_Id Customer_Name Customer_City
----------- ------------- -------------
C10100 Steve Agra
C10111 Raghu Agra
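As a rough illustration, the select operation can be sketched in Python by modelling a relation as a list of dictionaries. The helper name select and this representation are assumptions made for the example; they are not part of relational algebra itself.

```python
# A relation modelled as a list of rows, each row a dict
# (an assumption made for this sketch).
CUSTOMER = [
    {"Customer_Id": "C10100", "Customer_Name": "Steve", "Customer_City": "Agra"},
    {"Customer_Id": "C10111", "Customer_Name": "Raghu", "Customer_City": "Agra"},
    {"Customer_Id": "C10115", "Customer_Name": "Chaitanya", "Customer_City": "Noida"},
    {"Customer_Id": "C10117", "Customer_Name": "Ajeet", "Customer_City": "Delhi"},
    {"Customer_Id": "C10118", "Customer_Name": "Carl", "Customer_City": "Delhi"},
]


def select(relation, predicate):
    """Hypothetical helper: keep only the rows for which predicate(row) is true."""
    return [row for row in relation if predicate(row)]


# Counterpart of: σ Customer_City="Agra" (CUSTOMER)
agra_customers = select(CUSTOMER, lambda row: row["Customer_City"] == "Agra")
for row in agra_customers:
    print(row["Customer_Id"], row["Customer_Name"], row["Customer_City"])
```

The predicate plays the role of the condition written as a subscript of σ, and every column of a matching row is kept.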
Project Operator (∏)
The project operator is denoted by the ∏ symbol and is
used to select the desired columns (or attributes) from a
table (or relation).
The project operator in relational algebra is similar to
the column list of a SELECT statement in SQL. Unlike a
plain SELECT, projection also removes duplicate rows from
its result, as SELECT DISTINCT does.
Syntax of Project Operator (∏)
∏ column_name1, column_name2, ...., column_nameN (table_name)
Project Operator (∏) Example
In this example, we have a table CUSTOMER with three
columns, we want to fetch only two columns of the table,
which we can do with the help of Project Operator ∏.
Table: CUSTOMER
Customer_Id Customer_Name Customer_City
----------- ------------- -------------
C10100 Steve Agra
C10111 Raghu Agra
C10115 Chaitanya Noida
C10117 Ajeet Delhi
C10118 Carl Delhi
Query:
∏ Customer_Name, Customer_City (CUSTOMER)
Output:
Customer_Name Customer_City
------------- -------------
Steve Agra
Raghu Agra
Chaitanya Noida
Ajeet Delhi
Carl Delhi
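The projection above can be sketched in Python as follows. The helper name project and the tuple-based representation are assumptions for illustration. Because a relation is a set, the sketch also drops duplicate rows from the result.

```python
# Column names and rows of the CUSTOMER relation (representation assumed for this sketch).
COLUMNS = ("Customer_Id", "Customer_Name", "Customer_City")
CUSTOMER = [
    ("C10100", "Steve", "Agra"),
    ("C10111", "Raghu", "Agra"),
    ("C10115", "Chaitanya", "Noida"),
    ("C10117", "Ajeet", "Delhi"),
    ("C10118", "Carl", "Delhi"),
]


def project(rows, columns, wanted):
    """Hypothetical helper: keep only the wanted columns and drop duplicate rows."""
    positions = [columns.index(name) for name in wanted]
    result = []
    for row in rows:
        projected = tuple(row[i] for i in positions)
        if projected not in result:  # a relation never holds the same tuple twice
            result.append(projected)
    return result


# Counterpart of: ∏ Customer_Name, Customer_City (CUSTOMER)
names_and_cities = project(CUSTOMER, COLUMNS, ["Customer_Name", "Customer_City"])

# Projecting on the city alone shows the duplicate removal: five rows become three.
cities = project(CUSTOMER, COLUMNS, ["Customer_City"])
print(names_and_cities)
print(cities)
```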
Union Operator (∪)
The union operator is denoted by the ∪ symbol and is used
to select all the rows (tuples) from two tables
(relations).
Let’s discuss the union operator a bit more. Say we have
two relations R1 and R2 that have the same columns, and
we want all the tuples (rows) from both relations. We can
then apply the union operator to these relations.
Note: Rows (tuples) that are present in both tables
appear only once in the union. In short, there are no
duplicates after the union operation.
Syntax of Union Operator (∪)
table_name1 ∪ table_name2
Union Operator (∪) Example
Table 1: COURSE
Course_Id Student_Name Student_Id
--------- ------------ ----------
C101 Aditya S901
C104 Aditya S901
C106 Steve S911
C109 Paul S921
C115 Lucy S931
Table 2: STUDENT
Student_Id Student_Name Student_Age
------------ ---------- -----------
S901 Aditya 19
S911 Steve 18
S921 Paul 19
S931 Lucy 17
S941 Carl 16
S951 Rick 18
Query:
∏ Student_Name (COURSE) ∪ ∏ Student_Name (STUDENT)
Output:
Student_Name
------------
Aditya
Carl
Paul
Lucy
Rick
Steve
Note: As you can see, there are no duplicate names in the
output, even though some names appear in both tables and
the name Aditya appears twice in the COURSE table itself.
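The same union can be sketched with Python sets, which drop duplicates automatically. Modelling the rows as tuples is an assumption made only for this example.

```python
# COURSE rows: (Course_Id, Student_Name, Student_Id)
COURSE = [
    ("C101", "Aditya", "S901"),
    ("C104", "Aditya", "S901"),
    ("C106", "Steve", "S911"),
    ("C109", "Paul", "S921"),
    ("C115", "Lucy", "S931"),
]
# STUDENT rows: (Student_Id, Student_Name, Student_Age)
STUDENT = [
    ("S901", "Aditya", 19),
    ("S911", "Steve", 18),
    ("S921", "Paul", 19),
    ("S931", "Lucy", 17),
    ("S941", "Carl", 16),
    ("S951", "Rick", 18),
]

# ∏ Student_Name (COURSE) and ∏ Student_Name (STUDENT) as sets of names
course_names = {name for (_course_id, name, _student_id) in COURSE}
student_names = {name for (_student_id, name, _age) in STUDENT}

# The set union keeps each name once, however often it occurs in the inputs.
all_names = course_names | student_names
print(sorted(all_names))
```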
Intersection Operator (∩)
The intersection operator is denoted by the ∩ symbol and
is used to select the common rows (tuples) from two
tables (relations).
Say we have two relations R1 and R2 that have the same
columns, and we want all the tuples (rows) that are
present in both relations. In that case we can apply the
intersection operation to these two relations: R1 ∩ R2.
Note: Only rows that are present in both tables appear in
the result set.
Syntax of Intersection Operator (∩)
table_name1 ∩ table_name2
Intersection Operator (∩) Example
Let’s take the same example that we used above.
Table 1: COURSE
Course_Id Student_Name Student_Id
--------- ------------ ----------
C101 Aditya S901
C104 Aditya S901
C106 Steve S911
C109 Paul S921
C115 Lucy S931
Table 2: STUDENT
Student_Id Student_Name Student_Age
------------ ---------- -----------
S901 Aditya 19
S911 Steve 18
S921 Paul 19
S931 Lucy 17
S941 Carl 16
S951 Rick 18
Query:
∏ Student_Name (COURSE) ∩ ∏ Student_Name (STUDENT)
Output:
Student_Name
------------
Aditya
Steve
Paul
Lucy
Set Difference (-)
Set difference is denoted by the – symbol. Say we have
two relations R1 and R2, and we want all the tuples
(rows) that are present in R1 but not in R2. This can be
done using the set difference R1 – R2.
Syntax of Set Difference (-)
table_name1 - table_name2
Set Difference (-) Example
Let’s take the same COURSE and STUDENT tables that we
have seen above.
Query:
Let’s write a query to select the student names that are
present in the STUDENT table but not in the COURSE table.
∏ Student_Name (STUDENT) - ∏ Student_Name (COURSE)
Output:
Student_Name
------------
Carl
Rick
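A short Python sketch of the set difference, using the projected name sets from the tables above. It also shows why intersection counts as a derived operation: R1 ∩ R2 can be written as R1 - (R1 - R2). The variable names are illustrative assumptions.

```python
# ∏ Student_Name (STUDENT) and ∏ Student_Name (COURSE), written out as sets
student_names = {"Aditya", "Steve", "Paul", "Lucy", "Carl", "Rick"}
course_names = {"Aditya", "Steve", "Paul", "Lucy"}

# Set difference: names in STUDENT that do not occur in COURSE
not_enrolled = student_names - course_names
print(sorted(not_enrolled))

# Intersection expressed only through set difference: R1 - (R1 - R2)
common = student_names - (student_names - course_names)
print(sorted(common))
```

Note that set difference is not symmetric: with these tables, course_names - student_names would be empty.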
Cartesian product (X)
The Cartesian product is denoted by the X symbol. Say we
have two relations R1 and R2. The Cartesian product of
these two relations (R1 X R2) combines each tuple of the
first relation R1 with each tuple of the second relation
R2. The example below makes this concrete.
Syntax of Cartesian product (X)
R1 X R2
Cartesian product (X) Example
Table 1: R
Col_A Col_B
----- ------
AA 100
BB 200
CC 300
Table 2: S
Col_X Col_Y
----- -----
XX 99
YY 11
ZZ 101
Query:
Let’s find the Cartesian product of tables R and S.
R X S
Output:
Col_A Col_B Col_X Col_Y
----- ------ ------ ------
AA 100 XX 99
AA 100 YY 11
AA 100 ZZ 101
BB 200 XX 99
BB 200 YY 11
BB 200 ZZ 101
CC 300 XX 99
CC 300 YY 11
CC 300 ZZ 101
Note: The number of rows in the output is always the
product of the number of rows in each table. In our
example, table R has 3 rows and table S has 3 rows, so
the output has 3×3 = 9 rows.
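The pairing of every row of R with every row of S can be sketched with itertools.product from the Python standard library. Representing rows as tuples is an assumption made for this example.

```python
from itertools import product

# Rows of R: (Col_A, Col_B) and rows of S: (Col_X, Col_Y)
R = [("AA", 100), ("BB", 200), ("CC", 300)]
S = [("XX", 99), ("YY", 11), ("ZZ", 101)]

# Each output row is one row of R followed by one row of S.
R_x_S = [r + s for r, s in product(R, S)]

for row in R_x_S:
    print(*row)

# The row count is the product of the two input row counts.
print(len(R_x_S) == len(R) * len(S))
```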
Rename (ρ)
The rename (ρ) operation can be used to rename a relation
or an attribute of a relation.
Rename (ρ) Syntax:
ρ(new_relation_name, old_relation_name)
Rename (ρ) Example
Let’s say we have a table CUSTOMER. We fetch the customer
names and rename the resulting relation to CUST_NAMES.
Table: CUSTOMER
Customer_Id Customer_Name Customer_City
----------- ------------- -------------
C10100 Steve Agra
C10111 Raghu Agra
C10115 Chaitanya Noida
C10117 Ajeet Delhi
C10118 Carl Delhi
Query:
ρ(CUST_NAMES, ∏(Customer_Name)(CUSTOMER))
Output:
CUST_NAMES
----------
Steve
Raghu
Chaitanya
Ajeet
Carl
What is Relational Calculus?
Relational calculus is a non-procedural query language
that tells the system what data to retrieve but doesn’t
tell it how to retrieve that data.
Types of Relational Calculus
1. Tuple Relational Calculus (TRC)
Tuple relational calculus is used for selecting those
tuples that satisfy the given condition.
Table: Student
First_Name Last_Name Age
---------- --------- ----
Ajeet Singh 30
Chaitanya Singh 31
Rajeev Bhatia 27
Carl Pratap 28
Let’s write some relational calculus queries.
Query to display the last name of those students whose
age is greater than 30
{ t.Last_Name | Student(t) AND t.Age > 30 }
In the above query you can see two parts separated by
the | symbol. The second part defines the condition, and
the first part specifies the fields we want to display
for the selected tuples.
The result of the above query would be:
Last_Name
---------
Singh
Query to display all the details of students where Last
name is ‘Singh’
{ t | Student(t) AND t.Last_Name = 'Singh' }
Output:
First_Name Last_Name Age
---------- --------- ----
Ajeet Singh 30
Chaitanya Singh 31
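Tuple relational calculus reads much like a Python comprehension: a variable t ranges over whole tuples, and a condition filters them. The sketch below mirrors the two queries above; the namedtuple representation is an assumption made for illustration.

```python
from collections import namedtuple

Student = namedtuple("Student", ["First_Name", "Last_Name", "Age"])

STUDENT = [
    Student("Ajeet", "Singh", 30),
    Student("Chaitanya", "Singh", 31),
    Student("Rajeev", "Bhatia", 27),
    Student("Carl", "Pratap", 28),
]

# { t.Last_Name | Student(t) AND t.Age > 30 }
last_names = {t.Last_Name for t in STUDENT if t.Age > 30}

# { t | Student(t) AND t.Last_Name = 'Singh' }
singhs = [t for t in STUDENT if t.Last_Name == "Singh"]

print(last_names)
for t in singhs:
    print(t.First_Name, t.Last_Name, t.Age)
```

In both cases the comprehension states which tuples are wanted and leaves the order of evaluation to the interpreter, which is the non-procedural flavour of the calculus.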
2. Domain Relational Calculus (DRC)
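In domain relational calculus, the variables range over
the values (domains) of individual attributes rather
than over whole tuples, and records are filtered by
conditions on those values.
Again we take the same table to understand how DRC
works.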
Table: Student
First_Name Last_Name Age
---------- --------- ----
Ajeet Singh 30
Chaitanya Singh 31
Rajeev Bhatia 27
Carl Pratap 28
Query to find the first name and age of students where
student age is greater than 27
{< First_Name, Age > | < First_Name, Last_Name, Age > ∈ Student ∧ Age > 27}
Note:
The symbols used for logical operators are: ∧ for AND, ∨
for OR and ¬ for NOT.
Output:
First_Name Age
---------- ----
Ajeet 30
Chaitanya 31
Carl 28
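In a Python sketch, the difference from tuple relational calculus is that the variables stand for individual attribute values rather than whole tuples. The names below are illustrative assumptions.

```python
# Student tuples: (First_Name, Last_Name, Age)
STUDENT = {
    ("Ajeet", "Singh", 30),
    ("Chaitanya", "Singh", 31),
    ("Rajeev", "Bhatia", 27),
    ("Carl", "Pratap", 28),
}

# One variable per attribute: first_name, last_name and age each range over
# the values of a single column, and only first_name and age are returned.
result = {
    (first_name, age)
    for (first_name, last_name, age) in STUDENT
    if age > 27
}
print(sorted(result))
```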
Types of keys in DBMS
Primary Key – A primary key is a column or set of columns
in a table that uniquely identifies tuples (rows) in that
table.
Super Key – A super key is a set of one or more columns
(attributes) that uniquely identifies rows in a table.
Candidate Key – A super key with no redundant attribute
is known as a candidate key.
Alternate Key – Out of all candidate keys, only one gets
selected as the primary key; the remaining keys are known
as alternate or secondary keys.
Composite Key – A key that consists of more than one
attribute to uniquely identify rows (also known as
records or tuples) in a table is called a composite key.
Foreign Key – Foreign keys are the columns of a table
that point to the primary key of another table. They act
as a cross-reference between tables.
Definition: A primary key is a minimal set of attributes
(columns) in a table that uniquely identifies tuples (rows)
in that table.
Primary Key Example in DBMS
Let’s take an example to understand the concept of a
primary key. In the following table, there are three
attributes: Stu_Id, Stu_Name & Stu_Age. Out of these
three attributes, one attribute or a set of more than one
attribute can be a primary key.
Attribute Stu_Name alone cannot be a primary key, as
more than one student can have the same name.
Attribute Stu_Age alone cannot be a primary key, as
more than one student can have the same age.
Attribute Stu_Id alone is a primary key, as each student
has a unique id that identifies the student record in
the table.
Note: In some cases a single attribute cannot uniquely
identify a record in a table. In that case we try to find
a set of attributes that can uniquely identify a row.
We will see an example of this after this one.
Table Name: STUDENT
Stu_Id Stu_Name Stu_Age
101 Steve 23
102 John 24
103 Robert 28
104 Steve 29
105 Carl 29
Points to Note regarding Primary Key
We usually denote it by underlining the attribute name
(column name).
The value of the primary key should be unique for each
row of the table. The column(s) that make up the key
cannot contain duplicate values.
The attribute(s) marked as primary key are not allowed
to have null values.
A primary key does not have to be a single attribute
(column). It can be a set of more than one attribute.
For example, {Stu_Id, Stu_Name} collectively can identify
the tuples in the above table, but we do not choose it
as the primary key because Stu_Id alone is enough to
uniquely identify the rows, and we always go for the
minimal set. That said, we should choose more than one
column as the primary key only when no single column can
uniquely identify the tuples in the table.
Another example of primary key – More than one
attribute
Consider the table ORDER, which keeps the daily record
of the purchases made by customers. This table has three
attributes: Customer_ID, Product_ID & Order_Quantity.
Customer_ID alone cannot be a primary key, as a single
customer can place more than one order, so more than one
row can have the same Customer_ID value. As we see in the
following example, customer id 1011 has placed two
orders, with product ids 9023 and 9111.
Product_ID alone cannot be a primary key, as more than
one customer can place an order for the same product, so
more than one row can have the same product id. In the
following table, customer ids 1011 & 1122 placed an
order for the same product (product id 9023).
Order_Quantity alone cannot be a primary key, as more
than one customer can place an order for the same
quantity.
Since none of the attributes alone can be a primary key,
let’s try to find a set of attributes that plays that
role.
{Customer_ID, Product_ID} together identify the rows
uniquely, so this set is the primary key for this table.
Table Name: ORDER
Customer_ID Product_ID Order_Quantity
1011 9023 10
1122 9023 15
1099 9031 20
1177 9031 18
1011 9111 50
Note: While choosing a set of attributes for a primary
key, we always choose the minimal set, the one with the
fewest attributes. For example, if two sets can each
identify the rows of a table, the set with fewer
attributes should be chosen as the primary key.
How to define primary key in RDBMS?
In the above example, we already had a table with data
and were trying to understand the purpose and meaning of
the primary key. In practice, we generally define the
primary key during table creation. We can define it later
as well, but that rarely happens in real-world scenarios.
Let’s say we want to create the table discussed above,
with the customer id and product id together working as
the primary key. We can do that in SQL like this:
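-- ORDER is a reserved word in SQL, so many systems require
-- the table name to be quoted (for example "ORDER").
CREATE TABLE ORDER
(
Customer_ID int not null,
Product_ID int not null,
Order_Quantity int not null,
PRIMARY KEY (Customer_ID, Product_ID)
);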
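To see a composite primary key being enforced, here is a minimal sketch using the sqlite3 module from the Python standard library with an in-memory database. SQLite is only an assumption chosen for convenience, and the table name is double-quoted because ORDER is a reserved word. The last insert, with quantity 99, is a made-up row used to trigger the constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE "ORDER" (
        Customer_ID    INTEGER NOT NULL,
        Product_ID     INTEGER NOT NULL,
        Order_Quantity INTEGER NOT NULL,
        PRIMARY KEY (Customer_ID, Product_ID)
    )
    """
)

# The five rows of the ORDER table shown earlier.
rows = [
    (1011, 9023, 10),
    (1122, 9023, 15),
    (1099, 9031, 20),
    (1177, 9031, 18),
    (1011, 9111, 50),
]
conn.executemany('INSERT INTO "ORDER" VALUES (?, ?, ?)', rows)

# Hypothetical extra row that repeats the key (1011, 9023); the key must reject it.
try:
    conn.execute('INSERT INTO "ORDER" VALUES (?, ?, ?)', (1011, 9023, 99))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute('SELECT COUNT(*) FROM "ORDER"').fetchone()[0]
print(duplicate_rejected, row_count)
```

Repeating a Customer_ID or a Product_ID on its own is accepted; only a repeated pair violates the key.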
Suppose we didn’t define the primary key while creating
the table. Then we can define it later like this:
CREATE TABLE ORDER
(
Customer_ID int not null,
Product_ID int not null
);
ALTER TABLE ORDER
ADD CONSTRAINT PK_Order PRIMARY KEY
(Customer_ID, Product_ID);
Another way:
When we have only one attribute as the primary key, as in
the first example of the STUDENT table, we can also
define the key like this:
Create table STUDENT
(
Stu_Id int primary key,
Stu_Name varchar(255) not null,
Stu_Age int not null
)
Super key in DBMS
BY CHAITANYA SINGH | FILED UNDER: DBMS
Definition of Super Key in DBMS: A super key is a set
of one or more attributes (columns) that can uniquely
identify a row in a table. DBMS beginners often confuse
super keys with candidate keys, so we will also discuss
the candidate key and its relation to the super key in
this article.
How is a candidate key different from a super key?
The answer is simple: candidate keys are selected from
the set of super keys. The only thing we take care of
while selecting a candidate key is that it should not
have any redundant attribute. That is why candidate keys
are also called minimal super keys.
Let’s take an example to understand this:
Table: Employee
Emp_SSN Emp_Number Emp_Name
--------- ---------- --------
123456789 226 Steve
999999321 227 Ajeet
888997212 228 Chaitanya
777778888 229 Robert
Super keys: The above table has the following super keys.
Each of the following sets is able to uniquely identify
a row of the Employee table.
{Emp_SSN}
{Emp_Number}
{Emp_SSN, Emp_Number}
{Emp_SSN, Emp_Name}
{Emp_SSN, Emp_Number, Emp_Name}
{Emp_Number, Emp_Name}
Candidate Keys: As I mentioned in the beginning, a
candidate key is a minimal super key with no redundant
attributes. The following two super keys are chosen from
the above sets, as they contain no redundant attributes.
{Emp_SSN}
{Emp_Number}
Only these two sets are candidate keys, as all the other
sets contain redundant attributes that are not necessary
for unique identification.
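The step from super keys to candidate keys can be sketched in Python. The sketch assumes two functional dependencies that the text implies but does not state outright: Emp_SSN determines every other attribute, and so does Emp_Number. The helper names closure, super_keys and candidate_keys are hypothetical.

```python
from itertools import combinations

ATTRIBUTES = ("Emp_SSN", "Emp_Number", "Emp_Name")

# Assumed dependencies: each unique column determines the other two.
FDS = [
    ({"Emp_SSN"}, {"Emp_Number", "Emp_Name"}),
    ({"Emp_Number"}, {"Emp_SSN", "Emp_Name"}),
]


def closure(attrs, fds):
    """All attributes that can be derived from attrs using the dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def super_keys(attributes, fds):
    """Every attribute subset whose closure covers the whole table."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if closure(combo, fds) == set(attributes):
                keys.append(set(combo))
    return keys


def candidate_keys(keys):
    """Super keys that contain no smaller super key, i.e. the minimal ones."""
    return [k for k in keys if not any(other < k for other in keys)]


supers = super_keys(ATTRIBUTES, FDS)
candidates = candidate_keys(supers)
print(len(supers), candidates)
```

The sketch works from declared dependencies instead of inspecting the four sample rows, because keys are a property of the schema: in this small sample Emp_Name happens to be unique too, yet names are not guaranteed to stay unique.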
Super key vs Candidate Key
I have been getting a lot of comments about the
confusion between super keys and candidate keys. Let me
give you a clear explanation.
1. First, understand that all candidate keys are super
keys. This is because candidate keys are chosen from the
super keys.
2. How do we choose candidate keys from the set of super
keys? We look for those keys from which we cannot remove
any attribute. In the above example, we have not chosen
{Emp_SSN, Emp_Name} as a candidate key because {Emp_SSN}
alone can identify a unique row in the table, so
Emp_Name is redundant.
Primary key:
A primary key is selected from the set of candidate keys.
This is done by the database admin or database designer.
Either {Emp_SSN} or {Emp_Number} can be chosen as the
primary key for the table Employee.
Candidate Key in DBMS
Definition of Candidate Key in DBMS: A super key with no
redundant attribute is known as a candidate key.
Candidate keys are selected from the set of super keys;
the only requirement is that a candidate key should not
have any redundant attributes. That is why they are also
called minimal super keys.
Candidate Key Example
Let’s take the example of a table Employee. This table
has three attributes: Emp_Id, Emp_Number & Emp_Name.
Here Emp_Id & Emp_Number have unique values, while
Emp_Name can have duplicate values, as more than one
employee can have the same name.
Emp_Id Emp_Number Emp_Name
------ ---------- --------
E01 2264 Steve
E22 2278 Ajeet
E23 2288 Chaitanya
E45 2290 Robert
Which super keys can the above table have?
1. {Emp_Id}
2. {Emp_Number}
3. {Emp_Id, Emp_Number}
4. {Emp_Id, Emp_Name}
5. {Emp_Id, Emp_Number, Emp_Name}
6. {Emp_Number, Emp_Name}
Let’s select the candidate keys from the above set of
super keys.
1. {Emp_Id} – No redundant attributes
2. {Emp_Number} – No redundant attributes
3. {Emp_Id, Emp_Number} – Redundant attribute. Either of
these attributes alone is a minimal super key, as both
columns have unique values.
4. {Emp_Id, Emp_Name} – Redundant attribute
Emp_Name.
5. {Emp_Id, Emp_Number, Emp_Name} – Redundant
attributes. Emp_Id or Emp_Number alone is sufficient to
uniquely identify a row of the Employee table.
6. {Emp_Number, Emp_Name} – Redundant attribute
Emp_Name.
The candidate keys we have selected are:
{Emp_Id}
{Emp_Number}
Note: A primary key is selected from the set of
candidate keys. That means either Emp_Id or Emp_Number
can be the primary key. The decision is made by the DBA
(database administrator).
Foreign key in DBMS
Definition: Foreign keys are the columns of a table that
point to the primary key of another table. They act as a
cross-reference between tables.
For example:
In the example below, the Stu_Id column in the
Course_enrollment table is a foreign key, as it points
to the primary key of the Student table.
Course_enrollment table:
Course_Id Stu_Id
C01 101
C02 102
C03 101
C05 102
C06 103
C07 102
Student table:
Stu_Id Stu_Name Stu_Age
101 Chaitanya 22
102 Arya 26
103 Bran 25
104 Jon 21
Note: In practice, a foreign key does not have to
reference the primary key of another table. If it points
to a unique column (not necessarily the primary key) of
another table, it is still a foreign key. So a more
precise definition would be: foreign keys are the
columns of a table that point to a candidate key of
another table.
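What a foreign key guarantees can be sketched in Python as a check that every Stu_Id in Course_enrollment exists in Student. The helper name dangling_references and the extra row ("C09", 999) are assumptions made for illustration.

```python
# Student rows keyed by Stu_Id: Stu_Id -> (Stu_Name, Stu_Age)
STUDENT = {
    101: ("Chaitanya", 22),
    102: ("Arya", 26),
    103: ("Bran", 25),
    104: ("Jon", 21),
}

# Course_enrollment rows: (Course_Id, Stu_Id)
COURSE_ENROLLMENT = [
    ("C01", 101),
    ("C02", 102),
    ("C03", 101),
    ("C05", 102),
    ("C06", 103),
    ("C07", 102),
]


def dangling_references(enrollments, students):
    """Hypothetical helper: enrollment rows whose Stu_Id matches no student."""
    return [
        (course_id, stu_id)
        for course_id, stu_id in enrollments
        if stu_id not in students
    ]


violations = dangling_references(COURSE_ENROLLMENT, STUDENT)

# A made-up row that references a student id missing from the Student table.
violations_with_bad_row = dangling_references(
    COURSE_ENROLLMENT + [("C09", 999)], STUDENT
)
print(violations, violations_with_bad_row)
```

A DBMS with the foreign key declared would refuse to store such a row in the first place. Student 104 has no enrollment, and that is fine: the constraint only applies in one direction.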
Composite key in DBMS
Definition of Composite key: A key that has more than
one attribute is known as a composite key. It is also
known as a compound key.
Note: Any key, such as a super key, primary key or
candidate key, can be called a composite key if it has
more than one attribute.
Composite key Example
Let’s consider a table Sales. This table has four columns
(attributes): cust_Id, order_Id, product_code &
product_count.
Table – Sales
cust_Id order_Id product_code product_count
-------- -------- ------------ -------------
C01 O001 P007 23
C02 O123 P007 19
C02 O123 P230 82
C01 O001 P890 42
None of these columns alone can play the role of a key in
this table.
Column cust_Id alone cannot be a key, as the same
customer can place multiple orders, so the same customer
can have multiple entries.
Column order_Id alone cannot be a key, as the same order
can contain multiple products, so the same order_Id can
be present multiple times.
Column product_code alone cannot be a key, as more than
one customer can place an order for the same product.
Column product_count alone cannot be a key, because two
orders can be placed for the same product count.
Based on this, it is safe to assume that the key must
have more than one attribute:
Key in above table: {cust_Id, product_code}
This is a composite key, as it is made up of more than
one attribute.
Alternate key in DBMS
As we have seen in the candidate key guide, a table can
have multiple candidate keys. Among these candidate
keys, only one gets selected as the primary key; the
remaining keys are known as alternate or secondary keys.
Alternate Key Example
Let’s take an example to understand the alternate key
concept. Here we have a table Employee with three
attributes: Emp_Id, Emp_Number & Emp_Name.
Table: Employee
Emp_Id Emp_Number Emp_Name
------ ---------- --------
E01 2264 Steve
E22 2278 Ajeet
E23 2288 Chaitanya
E45 2290 Robert
There are two candidate keys in the above table:
{Emp_Id}
{Emp_Number}
The DBA (database administrator) can choose any of the
above keys as the primary key. Let’s say Emp_Id is chosen
as the primary key.
Since we have selected Emp_Id as the primary key, the
remaining key Emp_Number is called the alternate or
secondary key.
Normalization in DBMS: 1NF, 2NF, 3NF and BCNF in
Database
Normalization is a process of organizing the data in a
database to avoid data redundancy and insertion, update
and deletion anomalies. Let’s discuss anomalies first,
then we will discuss normal forms with examples.
Anomalies in DBMS
There are three types of anomalies that occur when the
database is not normalized: insertion, update and
deletion anomalies. Let’s take an example to understand
this.
Example: Suppose a manufacturing company stores
employee details in a table named employee that has four
attributes: emp_id for the employee’s id, emp_name for
the employee’s name, emp_address for the employee’s
address and emp_dept for the department in which the
employee works. At some point in time the table looks
like this:
emp_id emp_name emp_address emp_dept
101 Rick Delhi D001
101 Rick Delhi D002
123 Maggie Agra D890
166 Glenn Chennai D900
166 Glenn Chennai D004
The above table is not normalized. Let’s look at the
problems we face when a table is not normalized.
Update anomaly: In the above table we have two rows
for employee Rick, as he belongs to two departments of
the company. If we want to update Rick’s address, we
have to update it in both rows or the data will become
inconsistent. If the correct address gets updated in one
row but not in the other, then as per the database Rick
would have two different addresses, which is not correct
and leads to inconsistent data.
Insert anomaly: Suppose a new employee joins the
company who is under training and not currently assigned
to any department. We would not be able to insert the
data into the table if the emp_dept field doesn’t allow
nulls.
Delete anomaly: Suppose that at some point the company
closes department D890. Deleting the rows that have
emp_dept as D890 would also delete the information of
employee Maggie, since she is assigned only to this
department.
To overcome these anomalies we need to normalize the
data. In the next section we will discuss normalization.
Normalization
Here are the most commonly used normal forms:
First normal form(1NF)
Second normal form(2NF)
Third normal form(3NF)
Boyce & Codd normal form (BCNF)
First normal form (1NF)
As per the rule of first normal form, an attribute (column)
of a table cannot hold multiple values. It should hold only
atomic values.
Example: Suppose a company wants to store the
names and contact details of its employees. It creates a
table that looks like this:
emp_id emp_name emp_address emp_mobile
101 Herschel New Delhi 8912312390
102 Jon Kanpur 8812121212, 9900012222
103 Ron Chennai 7778881212
104 Lester Bangalore 9990000123, 8123450987
Two employees (Jon & Lester) have two mobile numbers
each, so the company stored them in the same field, as
you can see in the table above.
This table is not in 1NF, as the rule says “each
attribute of a table must have atomic (single) values”,
and the emp_mobile values for employees Jon & Lester
violate that rule.
To make the table comply with 1NF we should have the
data like this:
emp_id emp_name emp_address emp_mobile
101 Herschel New Delhi 8912312390
102 Jon Kanpur 8812121212
102 Jon Kanpur 9900012222
103 Ron Chennai 7778881212
104 Lester Bangalore 9990000123
104 Lester Bangalore 8123450987
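The conversion to 1NF can be sketched in Python: every multi-valued emp_mobile field is split so that each row carries exactly one number. Holding the numbers in a list is an assumption made for this example.

```python
# Unnormalized rows: the last field holds one or more mobile numbers.
unnormalized = [
    (101, "Herschel", "New Delhi", ["8912312390"]),
    (102, "Jon", "Kanpur", ["8812121212", "9900012222"]),
    (103, "Ron", "Chennai", ["7778881212"]),
    (104, "Lester", "Bangalore", ["9990000123", "8123450987"]),
]

# 1NF: one output row per (employee, mobile number) pair, so every field is atomic.
first_normal_form = [
    (emp_id, name, address, mobile)
    for emp_id, name, address, mobiles in unnormalized
    for mobile in mobiles
]

for row in first_normal_form:
    print(*row)
```

The price of atomic values here is repetition: the name and address of Jon and Lester now appear twice, which is the kind of redundancy the later normal forms deal with.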
Second normal form (2NF)
A table is said to be in 2NF if both of the following
conditions hold:
Table is in 1NF (First normal form)
No non-prime attribute is dependent on a proper
subset of any candidate key of the table.
An attribute that is not part of any candidate key is
known as a non-prime attribute.
Example: Suppose a school wants to store the data of
teachers and the subjects they teach. Since a teacher
can teach more than one subject, the table can have
multiple rows for the same teacher. They create a table
that looks like this:
teacher_id subject teacher_age
111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
Candidate Keys: {teacher_id, subject}
Non prime attribute: teacher_age
The table is in 1NF because each attribute has atomic
values. However, it is not in 2NF because the non-prime
attribute teacher_age depends on teacher_id alone, which
is a proper subset of the candidate key. This violates
the rule for 2NF, which says “no non-prime attribute is
dependent on a proper subset of any candidate key of
the table”.
To make the table comply with 2NF we can break it into
two tables like this:
teacher_details table:
teacher_id teacher_age
111 38
222 38
333 40
teacher_subject table:
teacher_id subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
Now the tables comply with Second normal form (2NF).
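The decomposition can be sketched in Python as two projections, followed by a join on teacher_id to confirm that no information is lost. The variable names are illustrative assumptions.

```python
# Original rows: (teacher_id, subject, teacher_age)
TEACHER = {
    (111, "Maths", 38),
    (111, "Physics", 38),
    (222, "Biology", 38),
    (333, "Physics", 40),
    (333, "Chemistry", 40),
}

# Project onto the two smaller tables; set semantics remove the repeated ages.
teacher_details = {(tid, age) for tid, _subject, age in TEACHER}
teacher_subject = {(tid, subject) for tid, subject, _age in TEACHER}

# Joining the two tables on teacher_id rebuilds the original rows exactly.
rejoined = {
    (tid, subject, age)
    for tid, subject in teacher_subject
    for details_tid, age in teacher_details
    if tid == details_tid
}
print(rejoined == TEACHER)
```

Each teacher's age is now stored once, so changing it touches a single row instead of one row per subject.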
Third Normal form (3NF)
A table design is said to be in 3NF if both of the
following conditions hold:
Table must be in 2NF
Transitive functional dependency of a non-prime
attribute on any super key should be removed.
An attribute that is not part of any candidate key is
known as a non-prime attribute.
In other words, 3NF can be explained like this: A table
is in 3NF if it is in 2NF and, for each functional
dependency X -> Y, at least one of the following
conditions holds:
X is a super key of the table
Y is a prime attribute of the table
An attribute that is part of one of the candidate keys
is known as a prime attribute.
Example: Suppose a company wants to store the
complete address of each employee, they create a table
named employee_details that looks like this:
emp_id emp_name emp_zip emp_state emp_city emp_district
1001 John 282005 UP Agra Dayal Bagh
1002 Ajeet 222008 TN Chennai M-City
1006 Lora 282007 TN Chennai Urrapakkam
1101 Lilly 292008 UK Pauri Bhagwan
1201 Steve 222999 MP Gwalior Ratan
Super keys: {emp_id}, {emp_id, emp_name}, {emp_id,
emp_name, emp_zip}…so on
Candidate Keys: {emp_id}
Non-prime attributes: all attributes except emp_id are
non-prime as they are not part of any candidate keys.
Here, emp_state, emp_city & emp_district depend on
emp_zip, and emp_zip depends on emp_id. That makes the
non-prime attributes (emp_state, emp_city &
emp_district) transitively dependent on the super key
(emp_id). This violates the rule of 3NF.
To make this table comply with 3NF we have to break it
into two tables to remove the transitive dependency:
employee table:
emp_id emp_name emp_zip
1001 John 282005
1002 Ajeet 222008
1006 Lora 282007
1101 Lilly 292008
1201 Steve 222999
employee_zip table:
emp_zip emp_state emp_city emp_district
282005 UP Agra Dayal Bagh
222008 TN Chennai M-City
282007 TN Chennai Urrapakkam
292008 UK Pauri Bhagwan
222999 MP Gwalior Ratan
Boyce Codd normal form (BCNF)
It is an advanced version of 3NF, which is why it is
also referred to as 3.5NF. BCNF is stricter than 3NF. A
table complies with BCNF if it is in 3NF and, for every
non-trivial functional dependency X -> Y, X is a super
key of the table.
Example: Suppose there is a company wherein employees
work in more than one department. They store the data
like this:
emp_id emp_nationality emp_dept dept_type dept_no_of_emp
1001 Austrian Production and planning D001 200
1001 Austrian stores D001 250
1002 American design and technical support D134 100
1002 American Purchasing department D134 600
Functional dependencies in the table above:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate key: {emp_id, emp_dept}
The table is not in BCNF, as neither emp_id nor emp_dept
alone is a key, yet each of them is the left side of a
functional dependency.
To make the table comply with BCNF we can break it into
three tables like this:
emp_nationality table:
emp_id emp_nationality
1001 Austrian
1002 American
emp_dept table:
emp_dept dept_type dept_no_of_emp
Production and planning D001 200
stores D001 250
design and technical support D134 100
Purchasing department D134 600
emp_dept_mapping table:
emp_id emp_dept
1001 Production and planning
1001 stores
1002 design and technical support
1002 Purchasing department
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF, as in both functional dependencies
the left side is a key of its table.
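The BCNF condition can be checked mechanically with attribute closures. The sketch below is a simplification under stated assumptions: the helper names are hypothetical, and for each table it considers only the listed dependencies whose attributes all lie inside that table. That is enough for this example, but it is not a general algorithm for projecting dependencies onto a decomposition.

```python
FDS = [
    ({"emp_id"}, {"emp_nationality"}),
    ({"emp_dept"}, {"dept_type", "dept_no_of_emp"}),
]


def closure(attrs, fds):
    """All attributes that can be derived from attrs using the dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Dependencies inside the table whose left side is not a super key of it."""
    attributes = set(attributes)
    applicable = [(lhs, rhs) for lhs, rhs in fds if (lhs | rhs) <= attributes]
    return [
        (lhs, rhs)
        for lhs, rhs in applicable
        if not attributes <= closure(lhs, applicable)
    ]


ORIGINAL = {"emp_id", "emp_nationality", "emp_dept", "dept_type", "dept_no_of_emp"}
EMP_NATIONALITY = {"emp_id", "emp_nationality"}
EMP_DEPT = {"emp_dept", "dept_type", "dept_no_of_emp"}
EMP_DEPT_MAPPING = {"emp_id", "emp_dept"}

original_violations = bcnf_violations(ORIGINAL, FDS)
decomposed_violations = [
    bcnf_violations(table, FDS)
    for table in (EMP_NATIONALITY, EMP_DEPT, EMP_DEPT_MAPPING)
]
print(len(original_violations), decomposed_violations)
```

Both dependencies violate BCNF in the original table, and none do in the three smaller tables.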
Functional dependency in DBMS
The attributes of a table are said to be dependent on
each other when an attribute of the table uniquely
identifies another attribute of the same table.
For example: Suppose we have a student table with
attributes Stu_Id, Stu_Name, Stu_Age. Here the Stu_Id
attribute uniquely identifies the Stu_Name attribute of
the student table, because if we know the student id we
can tell the student name associated with it. This is
known as functional dependency and can be written as
Stu_Id -> Stu_Name, or in words: Stu_Name is
functionally dependent on Stu_Id.
Formally:
If column A of a table uniquely identifies column B of
the same table, then it can be represented as A -> B
(attribute B is functionally dependent on attribute A).
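Whether a dependency A -> B holds in a given set of rows can be tested by checking that rows which agree on A also agree on B. The sketch below uses the STUDENT rows from the primary key example; the helper name holds is an assumption. Sample data can only refute a dependency: a dependency that holds in these rows is not thereby guaranteed for every future row.

```python
STUDENT = [
    {"Stu_Id": 101, "Stu_Name": "Steve", "Stu_Age": 23},
    {"Stu_Id": 102, "Stu_Name": "John", "Stu_Age": 24},
    {"Stu_Id": 103, "Stu_Name": "Robert", "Stu_Age": 28},
    {"Stu_Id": 104, "Stu_Name": "Steve", "Stu_Age": 29},
    {"Stu_Id": 105, "Stu_Name": "Carl", "Stu_Age": 29},
]


def holds(rows, lhs, rhs):
    """True if rows that agree on the lhs columns always agree on the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards
        if seen.setdefault(key, value) != value:
            return False
    return True


id_determines_name = holds(STUDENT, ["Stu_Id"], ["Stu_Name"])
name_determines_age = holds(STUDENT, ["Stu_Name"], ["Stu_Age"])
print(id_determines_name, name_determines_age)
```

Stu_Name -> Stu_Age fails because the two students named Steve have different ages.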
Types of Functional Dependencies
Trivial functional dependency
non-trivial functional dependency
Multivalued dependency
Transitive dependency