KGiSL Institute of Technology
Affiliated to Anna University, Chennai
KGiSL Campus, Saravanampatti
Coimbatore – 641 035
Department of Management Studies
Subject Name : INFORMATION MANAGEMENT
Subject Code : BA4106
Class : I MBA
Handled By : P.BANUCHITTARA MBA
Assistant Professor
UNIT – III DATABASE
MANAGEMENT SYSTEMS
SYLLABUS
DBMS – types and evolution, RDBMS, OODBMS,
RODBMS, Data warehousing, Data Mart, Data mining.
DBMS – Database Management System
A database is a collection of data related by some
aspect. Data is a collection of facts and figures that
can be processed to produce information; mostly,
data represents recordable facts.
A database management system stores data in such
a way that it is easier to retrieve and manipulate,
and helps to produce information.
DEFINITION
A Database Management System (DBMS) is
software that permits users to create, update,
delete and retrieve the data or records stored.
It allows users to manipulate data as well as to create
databases according to their requirements.
1. Real-world entity
Data management systems have been designed keeping in
mind the needs of business organizations. They help businesses
manage large amounts of data efficiently. These systems store
huge volumes of data and provide easy ways to search through
them.
Some examples of such applications are Microsoft Access, Oracle,
MySQL, etc.
2. Relational databases
Relational databases were first introduced
in the 1970s. In this type of database, each
record contains fields called attributes. Each
attribute represents one piece of information
about a particular object.
3. Structured query language
Structured Query Language was developed in the
1970s and standardized in the 1980s. It provides a way
to write queries against a database. Queries written in
SQL are known as structured queries because they use
predefined structures to define relationships among
entities.
4. ACID properties
A DBMS adheres to the concepts
of Atomicity, Consistency, Isolation, and Durability,
the ACID properties. These concepts are applied to
transactions, which operate on the data in a database.
In multi-transactional environments, the ACID
properties help the database stay healthy in case
of failure.
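As an illustration of atomicity and durability, the following minimal Python sketch (using the standard-library sqlite3 module and a made-up accounts table) shows how a transaction either commits fully or is rolled back entirely on failure:

import sqlite3

# Minimal sketch: a hypothetical funds transfer either commits
# completely or is rolled back completely on failure (atomicity).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO accounts VALUES (1, 500.0), (2, 300.0)")
conn.commit()

try:
    conn.execute("UPDATE accounts SET balance = balance - 200 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 200 WHERE id = 2")
    conn.commit()      # both updates become permanent (durability)
except sqlite3.Error:
    conn.rollback()    # neither update is applied (atomicity)

print(conn.execute("SELECT * FROM accounts").fetchall())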
5. Multiuser and concurrent access
Data can be accessed and manipulated in parallel with the help
of the DBMS. Though there are restrictions on transactions when
users attempt to handle the same data item, users always remain
unaware of them.
The DBMS offers multiple views for different users. A user in the
Sales department will have a different view of the database from a
person working in the Production department. This feature lets
users have a customized view of the database according to their
requirements.
Multi-User Access
Concurrent Access
ADMINISTRATORS:
A group of users maintains the DBMS and is responsible for
administering the database.
They create user access rights and apply limitations to maintain
isolation and enforce security.
Administrators also look after DBMS resources like the system
license, required software applications and tools, and other
hardware-related maintenance.
DESIGNER:
This is the group of people who work on the
design of the database.
Development of the actual database starts with
requirement analysis, followed by a good design process.
They identify and design the whole set of entities,
relations, constraints and views.
END USERS:
This group contains the persons who actually take
advantage of the database system.
End users can be simple viewers who pay attention to
the logs or market rates, or they can be as sophisticated
as a business analyst who makes the most of it.
DBMS ARCHITECTURE
The DBMS design depends upon its
architecture. The basic client/server
architecture is used to deal with a large number
of PCs, web servers, database servers and
other components that are connected with
networks.
1-Tier Architecture
• In this architecture, the database is directly available to the
user, which means the user can sit directly on the DBMS and
use it.
• Any changes done here are made directly on the database
itself. It doesn't provide a handy tool for end users.
• The 1-Tier architecture is used for development of local
applications, where programmers can communicate directly
with the database for a quick response.
2-Tier Architecture
• The 2-Tier architecture is the same as a basic
client-server architecture. In the two-tier
architecture, applications on the client end can
directly communicate with the database on the
server side. For this interaction, APIs such as
ODBC and JDBC are used.
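As a rough sketch of the client-side call pattern in a 2-tier application: Python's DB-API drivers follow the same connect/cursor/execute pattern that ODBC and JDBC drivers expose. The standard-library sqlite3 module stands in here for a networked database driver, and the table name is made up for illustration.

import sqlite3  # stand-in for an ODBC/JDBC-style driver

# Client-side code talks to the database server directly through a driver API.
conn = sqlite3.connect(":memory:")   # in a real 2-tier setup: a server DSN/URL
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
cur.execute("INSERT INTO customers VALUES (?, ?)", (1, "Asha"))
cur.execute("SELECT name FROM customers WHERE id = ?", (1,))
print(cur.fetchone())                # ('Asha',)
conn.commit()
conn.close()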
3-Tier Architecture
• The 3-Tier architecture contains another layer
between the client and server. In this
architecture, the client cannot communicate
directly with the server.
• The 3-Tier architecture is used in the case of
large web applications.
DBMS - Data Models:
A data model gives us an idea of how the final
system will look after its complete implementation.
It defines the data elements and the relationships
between the data elements.
Data models are used to show how data is stored,
connected, accessed and updated in the database
management system.
Some of the data models in DBMS are:
Hierarchical Model
Network Model
Entity-Relationship Model
Relational Model
Object-Oriented Data Model
Object-Relational Data Model
Flat Data Model
Semi-Structured Data Model
Associative Data Model
Context Data Model
1. Hierarchical Model
The hierarchical model was the first DBMS model.
This model organizes the data in a hierarchical tree structure.
The hierarchy starts from the root, which holds the root data,
and then expands in the form of a tree, adding child nodes to
the parent nodes.
This model easily represents some real-world relationships
such as food recipes, the sitemap of a website, etc.
Example: We can represent the relationship between the shoes
present on a shopping website in the following way:
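The original slide shows this as a tree diagram. As a rough illustration (the category names are made up), the hierarchy can be sketched in Python as a tree in which every child has exactly one parent:

# Hierarchical model: every node has exactly one parent.
shoes = {
    "Shoes": {                      # root node
        "Men":   {"Formal": {}, "Sports": {}},
        "Women": {"Heels": {}, "Sneakers": {}},
    }
}

def print_tree(node, depth=0):
    # Walk the tree from the root, printing each child under its parent.
    for name, children in node.items():
        print("  " * depth + name)
        print_tree(children, depth + 1)

print_tree(shoes)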
2. Network Model
This model is an extension of the
hierarchical model. It was the most popular model
before the relational model. It is the same as the
hierarchical model; the only difference is that a
record can have more than one parent. It replaces
the hierarchical tree with a graph.
Example: In the example below we can see that node student has
two parents i.e. CSE Department and Library. This was earlier not
possible in the hierarchical model.
3. Entity-Relationship Model
The Entity-Relationship Model, or simply ER Model, is a
high-level data model diagram. In this model, we
represent the real-world problem in pictorial form
to make it easy for the stakeholders to understand.
Its three main components are:
1. Entities
2. Attributes
3. Relationships
4. Relational Model
The relational model is the most widely used model. In this
model, the data is maintained in the form of a two-
dimensional table.
All the information is stored in the form of rows and
columns.
The basic structure of the relational model is the table, so
tables are also called relations in the relational model.
Example: In this example, we have an Employee table.
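The slide shows this as an Employee table. A minimal sketch of the same idea in Python using the standard-library sqlite3 module (the column names and values are assumed for illustration):

import sqlite3

# A relation (table): data is kept in rows and columns.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Employee (
                   emp_id INTEGER PRIMARY KEY,
                   name   TEXT,
                   dept   TEXT,
                   salary REAL)""")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)",
                 [(1, "Priya", "Sales", 35000.0),
                  (2, "Rahul", "Production", 40000.0)])

# Each row describes one employee; each column is one attribute.
for row in conn.execute("SELECT emp_id, name, dept FROM Employee"):
    print(row)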
5. Object-Oriented Data Model
Real-world problems are more closely represented
through the object-oriented data model.
In this model, both the data and the relationships are
present in a single structure known as an object.
We can store audio, video, images, etc. in the database,
which was not possible in the relational model (although
you can store audio and video in a relational database, it
is advised not to).
In this model, two or more objects are connected
through links. We use these links to relate one object to
other objects. This can be understood from the example
given below.
6.Object-Relational Model
As the name suggests, it is a combination of the relational
model and the object-oriented model. This model was built to
fill the gap between the object-oriented model and the
relational model. It offers many advanced features; for example,
we can make complex data types according to our requirements
using the existing data types. The problem with this model is
that it can get complex and difficult to handle, so a proper
understanding of the model is required.
7. Flat Data Model
It is a simple model in which the database
is represented as a table consisting of rows
and columns. To access any data, the
computer has to read the entire table. This
makes the model slow and inefficient.
8. Semi-Structured Model
Semi-structured model is an evolved form
of the relational model. We cannot
differentiate between data and schema in
this model.
9. Associative Data Model
The associative data model is a model in which the data is
divided into two parts. Everything that has an
independent existence is called an entity, and the
relationships among these entities are called associations.
The two parts the data is divided into are called items and
links.
• Item: Items contain a name and an identifier (some
numeric value).
• Links: Links contain an identifier, source, verb and
subject.
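As a rough sketch of items and links (the names, identifiers and the flight example are illustrative only), the two parts of the associative model can be represented like this in Python:

# Associative data model: data is split into items and links.
# Items: (identifier, name); Links: (identifier, source, verb, subject).
items = {
    10: "Flight BA1234",
    20: "10:25 am",
    30: "London Heathrow",
}
links = [
    # (link id, source, verb, subject)
    (100, 10, "arrived at", 30),
    (101, 100, "at", 20),   # a link can itself be the source of another link
]

def describe(link_id):
    # Resolve a link into readable text, following nested links if needed.
    _, source, verb, subject = next(l for l in links if l[0] == link_id)
    src = items.get(source) or describe(source)
    return f"{src} {verb} {items[subject]}"

print(describe(101))   # Flight BA1234 arrived at London Heathrow at 10:25 am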
10. Context Data Model
Context Data Model is a collection of
several models. This consists of models like
network model, relational models etc. Using
this model we can do various types of tasks
which are not possible using any model
alone.
Evolution of Database
• Data modeling and databases evolved together, and their
history dates back to the 1960s.
The database evolution happened in five “waves”:
The first wave consisted of network, hierarchical, inverted
list, and (in the 1990s) object-oriented DBMSs; it took
place from roughly 1960 to 1999.
• The relational wave introduced all of the SQL products (and
a few non-SQL) around 1990 and began to lose users
around 2008.
• The decision support wave introduced Online
Analytical Processing (OLAP) and specialized
DBMSs around 1990, and is still in full force
today.
• The graph wave began with the Semantic Web
stack from the World Wide Web Consortium in
1999, with property graphs appearing around
2008.
• The NoSQL wave includes big data and much
more; it began in 2008.
A relational database management
system (RDBMS) is a collection of programs and
capabilities that enable IT teams and others to create,
update, administer and otherwise interact with a
relational database. The RDBMS is the most popular
database system among organizations across the
world.
The relational database is popular for two
major reasons:
• Relational databases can be used with
little or no training.
• Database entries can be modified without
specifying the entire body.
Properties of Relational Tables
In a relational database, tables follow certain
properties, which are given below:
Values are atomic
Each row is unique
Column values are of the same kind
The sequence of columns is insignificant
The sequence of rows is insignificant
Each column has a unique name
OODBMS – Object oriented Database Management
System
An object-oriented database management system
(OODBMS) is a database management system that supports the
creation and modeling of data as objects.
OODBMS also includes support for classes of objects and
the inheritance of class properties, and incorporates methods,
subclasses and their objects.
Most of the object databases also offer some kind of
query language, permitting objects to be found through a
declarative programming approach.
Object-oriented database derivation is the integration of object-
oriented programming language systems and persistent systems. The
power of object-oriented databases comes from the cyclical treatment
of both persistent data, as found in databases, and transient data, as
found in executing programs.
Object-oriented databases use small, reusable chunks of software
called objects. The objects themselves are stored in the object-oriented
database.
Each object consists of two elements:
1. A piece of data (e.g., sound, video, text, or graphics).
2. Instructions, or software programs called methods, for what to do with the
data.
Disadvantages of Object-Oriented Databases
• Object-oriented databases have the following disadvantages:
• Object-oriented databases are more expensive to develop.
• Most organizations are unwilling to abandon their existing
databases and convert to them.
Even so, the benefits of object-oriented databases are compelling.
The ability to mix and match reusable objects provides
incredible multimedia capability.
QUERY PROCESSING
Upper levels of the data integration problem:
• How to construct mappings from sources to a
single mediated schema
• How queries posed over the mediated schema
are reformulated over the sources
BASIC STEPS IN QUERY PROCESSING
• Parsing and translation
• Optimization
• Evaluation
SQL
SQL (Structured Query Language) is a computer language
for storing, manipulating and retrieving data stored in a
relational database.
SQL is the standard language for relational database systems. All
relational database management systems such as MySQL, MS Access,
Oracle, Sybase, Informix, PostgreSQL and SQL Server use SQL as the
standard database language.
Why SQL?
Allows users to access data in relational database management
systems.
Allows users to describe the data.
Allows users to define the data in a database and manipulate that data.
Allows SQL to be embedded within other languages using SQL modules,
libraries & pre-compilers.
Allows users to create and drop databases and tables.
Allows users to create views, stored procedures and functions in a database.
Allows users to set permissions on tables, procedures and views.
SQL Process
When you execute an SQL command for any
RDBMS, the system determines the best way to carry out
your request, and the SQL engine figures out how to
interpret the task.
There are various components included in this process,
such as the Query Dispatcher, Optimization Engines,
Classic Query Engine and SQL Query Engine. The classic
query engine handles all non-SQL queries, but the SQL
query engine won't handle logical files.
SQL COMMANDS
The standard SQL commands to interact with
relational databases are CREATE, SELECT, INSERT,
UPDATE, DELETE and DROP. These commands can
be classified into groups based on their nature.
DDL - Data Definition Language
COMMAND DESCRIPTION
CREATE Creates a new table, a view of a table, or other object in the database
ALTER Modifies an existing database object, such as a table
DROP Deletes an entire table, a view of a table, or other object in the database
DML - DATA MANIPULATION
LANGUAGE
COMMAND DESCRIPTION
SELECT Retrieves certain records from
one or more tables
INSERT Creates a record
UPDATE Modifies records
DELETE Deletes records
DCL - Data Control Language
COMMAND DESCRIPTION
GRANT Gives a privilege to a user
REVOKE Takes back privileges granted to a user
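A minimal runnable sketch of the DDL and DML commands above, executed from Python through the standard-library sqlite3 module (the table and column names are made up; GRANT and REVOKE are omitted because SQLite has no user-privilege system):

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: CREATE a table, then ALTER it.
cur.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
cur.execute("ALTER TABLE student ADD COLUMN dept TEXT")

# DML: INSERT, UPDATE, SELECT and DELETE records.
cur.execute("INSERT INTO student VALUES (1, 'Kavya', 'MBA')")
cur.execute("UPDATE student SET dept = 'MBA - I' WHERE roll_no = 1")
print(cur.execute("SELECT * FROM student").fetchall())
cur.execute("DELETE FROM student WHERE roll_no = 1")

# DDL: DROP removes the whole table.
cur.execute("DROP TABLE student")
conn.commit()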
CONCURRENCY MANAGEMENT
• In a multiprogramming environment where more than one
transaction can be executed concurrently, there is a need for
protocols to control the concurrency of transactions so as to ensure
the atomicity and isolation properties of transactions.
• Concurrency control protocols that ensure serializability of
transactions are most desirable. Concurrency control protocols
can be broadly divided into two categories:
Lock-based protocols
Timestamp-based protocols
TYPES OF LOCK PROTOCOLS
Simplistic
Pre-claiming
Two Phase Locking - 2PL
Strict Two Phase Locking
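As a rough sketch of the two-phase locking idea (an illustration, not a full lock manager): a transaction acquires locks only during its growing phase, and once it releases any lock it enters the shrinking phase and may not acquire more.

# Minimal two-phase locking (2PL) sketch.
class Transaction2PL:
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot lock {item} in shrinking phase")
        self.locks.add(item)          # growing phase: acquiring is allowed

    def unlock(self, item):
        self.shrinking = True         # first release ends the growing phase
        self.locks.discard(item)

t = Transaction2PL("T1")
t.lock("A")
t.lock("B")       # growing phase: OK
t.unlock("A")     # shrinking phase begins
# t.lock("C")     # would raise: acquiring after releasing violates 2PL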
Timestamp-based protocols
This protocol uses either the system time or a logical
counter as the timestamp.
Every transaction has a timestamp associated with
it, and the ordering is determined by the age of the
transaction.
In addition, every data item is given the latest read
and write timestamps.
TIME-STAMP ORDERING PROTOCOL
The timestamp-ordering protocol ensures serializability among
transactions in their conflicting read and write operations. It is
the responsibility of the protocol system that the conflicting pair of
tasks is executed according to the timestamp values of the
transactions.
» The timestamp of transaction Ti is denoted as TS(Ti).
» The read timestamp of data item X is denoted by R-timestamp(X).
» The write timestamp of data item X is denoted by W-timestamp(X).
The timestamp-ordering protocol works as follows:
If a transaction Ti issues a read(X) operation:
– If TS(Ti) < W-timestamp(X)
• Operation rejected.
– If TS(Ti) >= W-timestamp(X)
• Operation executed.
– All data-item timestamps updated.
If a transaction Ti issues a write(X) operation:
– If TS(Ti) < R-timestamp(X)
• Operation rejected.
– If TS(Ti) < W-timestamp(X)
• Operation rejected and Ti rolled back.
– Otherwise, operation executed.
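A minimal Python sketch of the checks above (a simplified model in which timestamps are plain numbers and a rejected operation simply reports that the transaction should be rolled back and restarted):

# Simplified timestamp-ordering checks for read(X) and write(X).
r_ts = {}   # data item -> largest timestamp of any transaction that read it
w_ts = {}   # data item -> timestamp of the transaction that last wrote it

def read(ts, x):
    if ts < w_ts.get(x, 0):
        return f"T{ts}: read({x}) rejected, roll back"   # X already overwritten by a younger transaction
    r_ts[x] = max(r_ts.get(x, 0), ts)                    # update the read timestamp
    return f"T{ts}: read({x}) executed"

def write(ts, x):
    if ts < r_ts.get(x, 0) or ts < w_ts.get(x, 0):
        return f"T{ts}: write({x}) rejected, roll back"  # conflicts with a later read/write
    w_ts[x] = ts                                         # update the write timestamp
    return f"T{ts}: write({x}) executed"

print(write(5, "X"))   # executed
print(read(3, "X"))    # rejected: T3 is older than writer T5
print(read(7, "X"))    # executed
print(write(6, "X"))   # rejected: T7 has already read X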
DATA WAREHOUSE
A data warehouse is a technology for data management and data analysis.
Goal: to integrate enterprise-wide corporate data
into a single repository from which users can easily run
queries.
BENEFITS
• The major benefit of data warehousing is a high return
on investment.
• Increased productivity of corporate decision-makers
PROBLEMS
• Underestimation of resources for data loading
• Hidden problems with source systems
• Required data not captured
• Increased end-user demands
• Data homogenization
• High demand for resources
• Data ownership
• High maintenance
• Long-duration projects
• Complexity of integration
MAIN COMPONENTS
Operational data sources
Operational data store
Query manager
End-user access tools
DATA FLOW
Inflow – The processes associated with the extraction, cleansing,
and loading of the data from the source systems into the data
warehouse.
Upflow – The processes associated with adding value to the data in
the warehouse through summarizing, packaging, and
distribution of the data.
Downflow – The processes associated with archiving and
backing up of data in the warehouse.
Tools and Technologies
The critical steps in the construction of a data warehouse are:
• Extraction
• Cleansing
• Transformation
After these critical steps, loading the results into the target system
can be carried out either by separate products or by a single
integrated solution, which falls into one of the following categories:
code generators
database data replication tools
dynamic transformation engines
For the various types of metadata and the day-to-day operations of
the data warehouse, the administration and management tools must
be capable of supporting the following tasks:
• Monitoring data loading from multiple sources
• Data quality and integrity checks
• Managing and updating metadata
• Monitoring database performance to ensure efficient query
response times and resource utilization
• Auditing data warehouse usage to provide user chargeback
information
• Replicating, subsetting, and distributing data
• Maintaining efficient data storage management
• Purging data
• Archiving and backing up data
• Implementing recovery following failure
DATA MART
A data mart is a simple form of a data warehouse that is
focused on a single subject (or functional area), such as sales,
finance or marketing. Data marts are often built and controlled
by a single department within an organization. Given their
single-subject focus, data marts usually draw data from only a
few sources. The sources could be internal operational
systems, a central data warehouse, or external data.
DEPENDENT AND INDEPENDENT DATA MARTS
There are two basic types of data marts:
Dependent
Independent
The main difference between independent and dependent
data marts is how you populate the data mart; that is, how
you get data out of the sources and into the data mart.
STEPS IN IMPLEMENTING A DATA MART
Simply stated, the major steps in implementing a data mart are to design
the schema, construct the physical storage, populate the data mart with data
from source systems, access it to make informed decisions, and manage it over
time.
Designing
Constructing
Populating
Accessing
Managing
DESIGNING
The design step is first in the data mart process. This step
covers all of the tasks from initiating the request for a data
mart through gathering information about the requirements,
and developing the logical and physical design of the data
mart. The design step involves the following tasks:
Gathering the business and technical requirements
Identifying data sources
Selecting the appropriate subset of data
Designing the logical and physical structure of the data
mart
CONSTRUCTING
This step includes creating the physical database and the logical
structures associated with the data mart to provide fast and
efficient access to the data. This step involves the following tasks:
– Creating the physical database and storage structures, such as
tablespaces, associated with the data mart
– Creating the schema objects, such as tables and indexes
defined in the design step
– Determining how best to set up the tables and the access
structures
POPULATING
The populating step covers all of the tasks related to getting the
data from the source, cleaning it up, modifying it to the right format
and level of detail, and moving it into the data mart. More formally
stated, the populating step involves the following tasks:
Mapping data sources to target data structures
Extracting data
Cleansing and transforming the data
Loading data into the data mart
Creating and storing metadata
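A rough sketch, under assumed source and target formats (the records and column names are made up), of the extract, cleanse/transform and load sequence described above:

import sqlite3

# Extract: pull raw records from a hypothetical source system.
source_rows = [
    {"order_id": "101", "region": " south ", "amount": "1200.50"},
    {"order_id": "102", "region": "NORTH",  "amount": "  850"},
]

# Cleanse and transform: fix data types, trim and standardize values.
clean_rows = [
    (int(r["order_id"]), r["region"].strip().title(), float(r["amount"]))
    for r in source_rows
]

# Load: move the cleaned data into the data mart table.
mart = sqlite3.connect(":memory:")
mart.execute("CREATE TABLE sales_mart (order_id INTEGER, region TEXT, amount REAL)")
mart.executemany("INSERT INTO sales_mart VALUES (?, ?, ?)", clean_rows)
mart.commit()

print(mart.execute("SELECT region, SUM(amount) FROM sales_mart GROUP BY region").fetchall())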
ACCESSING
The accessing step involves putting the data to use: querying the data,
analyzing it, creating reports, charts, and graphs, and publishing these.
Typically, the end user uses a graphical front-end tool to submit queries to
the database and display the results of the queries. The accessing step
requires that you perform the following tasks:
– Set up an intermediate layer for the front-end tool to use. This layer,
the metalayer, translates database structures and object names into
business terms, so that the end user can interact with the data mart
using terms that relate to the business function.
– Maintain and manage these business interfaces.
– Set up and manage database structures, like summarized tables, that
help queries submitted through the front-end tool execute quickly and
efficiently.
MANAGING
This step involves managing the data mart over its lifetime. In
this step, you perform management tasks such as the
following:
Providing secure access to the data
Managing the growth of the data
Optimizing the system for better performance
Ensuring the availability of data even with system failures
DATA MART ISSUES
Data mart functionality: the capabilities of data marts have
increased with the growth in their popularity.
Data mart size: performance deteriorates as data marts grow
in size, so there is a need to reduce the size of data marts to
gain improvements in performance.
Data mart load performance: two critical components are
end-user response time and data-loading performance; to improve
the latter, incremental DB updating is needed so that only cells
affected by a change are updated and not the entire MDDB
structure.