Unit I DBMS Notes
Programme : BCA
CONTENTS
1 Definition
2 Data Vs Information
3 Introduction to Database
4 Components of DBMS Application
5 Role of DBMS
6 Types of Databases
7 Importance of Database Design
8 Evolution of File system Data processing
9 Difference between File system & DBMS
10 Problems with File system
11 Database System- Environment
12 Data modelling and Data model
13 Data model basic building blocks
14 Business rule
15 Types of Data Model/Evolution of Data Model
16 Characteristics of Big database
17 Degrees of data abstraction
1. DEFINITION
Data
Data is the raw material that can be processed by any computing
machine.
Metadata
Metadata, or data about data, is the data through which the end-user data are
integrated and managed. The metadata describe the data characteristics
and the set of relationships that link the data found within the database.
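As a small illustration of metadata, the sketch below uses Python's built-in sqlite3 module (SQLite is one of the personal databases listed later). The student table and its columns are hypothetical. SQLite keeps metadata about its own tables, and PRAGMA table_info reports each column's name and declared type, which is data about the data rather than the data itself.

```python
import sqlite3

# In-memory database; the student table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL, marks INTEGER)")
conn.execute("INSERT INTO student VALUES (1, 'Asha', 87)")  # end-user data

# Metadata: PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]
print(columns)  # [('roll_no', 'INTEGER'), ('name', 'TEXT'), ('marks', 'INTEGER')]

# The catalog table sqlite_master records every table the DBMS manages.
tables = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['student']
```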
Information
Information is data that has been converted into a more useful or
intelligent form.
For example, a report card sheet.
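To make the distinction concrete, here is a minimal Python sketch built on hypothetical subject marks: the raw marks are data, and the total, percentage and result shown on a report card are information derived from them. The maximum of 100 per subject and the pass mark of 40 are assumptions chosen for illustration.

```python
# Raw data: marks scored in each subject (hypothetical values).
marks = {"DBMS": 78, "C Programming": 65, "Mathematics": 35}

def report_card(marks, max_per_subject=100, pass_mark=40):
    """Process raw marks (data) into a summary (information)."""
    total = sum(marks.values())
    percentage = total * 100 / (max_per_subject * len(marks))
    failed = [subject for subject, m in marks.items() if m < pass_mark]
    return {
        "total": total,
        "percentage": round(percentage, 2),
        "result": "PASS" if not failed else "FAIL",
        "failed_subjects": failed,
    }

card = report_card(marks)
print(card["total"], card["percentage"], card["result"])  # 178 59.33 FAIL
```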
Knowledge
The human mind purposefully organizes the information and evaluates it
to produce knowledge.
2. DATA Vs INFORMATION
Data | Information
This is the primary level of intelligence. | It is a secondary level of intelligence.
3. INTRODUCTION TO DATABASE:
A database management system (DBMS) is a collection of programs that
manages the database structure and controls access to the data stored in
the database.
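Applications do not read the database files directly; they send requests (here, SQL statements) to the DBMS, which defines the structure, stores the rows and returns results. A minimal sketch using SQLite through Python's sqlite3 module, with a hypothetical employee table, covering create, read, update and delete:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the DBMS manages this (temporary) database

# Define the database structure.
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")

# Create: insert rows (values are passed as parameters, separate from the SQL text).
conn.executemany("INSERT INTO employee (emp_id, name, dept) VALUES (?, ?, ?)",
                 [(1, "Ravi", "Sales"), (2, "Meena", "HR"), (3, "Arjun", "Sales")])

# Update and delete.
conn.execute("UPDATE employee SET dept = ? WHERE emp_id = ?", ("HR", 3))
conn.execute("DELETE FROM employee WHERE emp_id = ?", (1,))
conn.commit()

# Read: the DBMS finds the matching rows; the program never touches the storage layout.
hr_staff = conn.execute("SELECT name FROM employee WHERE dept = 'HR' ORDER BY emp_id").fetchall()
print(hr_staff)  # [('Meena',), ('Arjun',)]
```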
4. COMPONENTS OF DBMS APPLICATIONS
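1. Hardware
Physical devices such as computers, servers, storage devices and network equipment on which the DBMS runs.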
2. Software
Actual DBMS software like MySQL, Oracle, PostgreSQL.
3. Data
Raw facts stored in structured or unstructured formats.
4. Procedures
Instructions and rules for using DBMS effectively.
5. People
Database Administrators (DBA) – Manage security, performance, user
access.
Developers – Build applications using the database.
End Users – Use applications to access the database (e.g., students,
employees).
5. ROLE OF DBMS
Data Management and Storage:
A DBMS handles the complex tasks of storing, organizing, and
managing large volumes of data in a structured format, typically in
tables.
Intermediary:
It serves as a bridge between users/applications and the physical
database, translating data requests into a format the database
understands and back into a usable format for the user.
Data Control:
A DBMS enforces rules and constraints to ensure data accuracy,
consistency, and security, which are often challenging with traditional
file-based systems.
User Interface:
It provides interfaces, often using query languages like SQL, for users
to create, read, update, and delete data efficiently.
Data sharing: A DBMS provides a platform for sharing data across
multiple applications and users, which can increase productivity and
collaboration.
Data consistency and accuracy: DBMS enforces integrity constraints
to maintain valid data.
Data organization: A DBMS provides a systematic approach to
organizing data in a structured way, which makes it easier to retrieve
and manage data efficiently.
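Efficient data access and retrieval: A DBMS uses techniques such as
indexing and query optimization to access and retrieve data efficiently,
which boosts system performance and user satisfaction.
Concurrency and atomicity: The DBMS allows multiple users to access
the data concurrently by using synchronization techniques such as
locking, and it maintains atomicity, so that each transaction is either
completed fully or not at all.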
Scalability and flexibility: DBMS is highly scalable and can easily
accommodate changes in data volumes and user requirements.
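Two of these roles, enforcing integrity constraints and keeping transactions atomic, can be observed directly. The sketch below uses SQLite via Python's sqlite3 module; the account table, account numbers and amounts are hypothetical. A CHECK constraint stops any balance from going negative, and because both updates of a transfer run inside one transaction, a failed transfer is rolled back as a whole rather than leaving only the credit applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Integrity constraint: the DBMS itself rejects negative balances.
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(101, 500), (102, 200)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts as one atomic unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE acc_no = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE acc_no = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK constraint failed; the credit above was undone as well

ok = transfer(conn, 102, 101, 300)  # account 102 holds only 200, so this must fail
balances = dict(conn.execute("SELECT acc_no, balance FROM account ORDER BY acc_no"))
print(ok, balances)  # False {101: 500, 102: 200}
```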
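6. TYPES OF DATABASES:
1. Hierarchical Databases
A hierarchical database organizes data in a tree-like structure in which
each child record has only one parent record.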
2. Network Databases
A network database builds on the hierarchical model but allows child
records to be linked to multiple parent records, creating a web-like
structure of interconnected data.
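3. Object-Oriented Databases
An object-oriented database stores data as objects that combine data
(attributes) with behavior (methods), as in object-oriented programming.
4. Relational Databases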
Relational databases are the most widely used type of database today.
They store data in tables, with rows representing records and columns
representing attributes of the records.
5. Cloud Databases
A cloud database operates in a virtual environment hosted on
cloud computing platforms. It is designed for storing, managing,
and accessing data over the internet. Examples of cloud platforms
that offer such databases are:
Amazon Web Services (AWS)
Google Cloud Platform (GCP)
6. Centralized Databases
A centralized database is a database stored and managed at
a single location, such as a central server or data centre.
7. Personal Databases
A personal database is a small-scale database designed for a
single user, typically used on personal computers or mobile
devices.
Examples are:
Microsoft Access
SQLite
8. Operational Databases
An operational database is designed to manage and process
real-time data for daily operations within organizations and
businesses.
9. NoSQL Databases
A NoSQL database (short for "non-SQL" or "non-relational")
provides a mechanism for storing and retrieving data that does
not rely on traditional table-based relational models.
7. IMPORTANCE OF DATABASE DESIGN
4. Supports Scalability
5. Enhances Security
8. EVOLUTION OF FILE SYSTEM DATA PROCESSING
1. Manual (Paper-Based) System
Data was stored in paper files, ledgers, and registers, making retrieval slow
and prone to errors.
Advantages:
Disadvantages:
2. File-Based System (1950s–1960s)
Advantages:
Disadvantages:
Programming languages like COBOL and C were used to store data in flat
files with sequential, indexed, or random access.
Advantages:
Disadvantages:
Advantages:
Faster access than flat files.
Disadvantages:
In the relational model, proposed by E. F. Codd, data was stored in tables
(rows and columns) and managed using SQL.
Advantages:
Disadvantages:
Databases supported complex data types such as images, audio, and video using
object-oriented concepts.
Advantages:
Disadvantages:
7. Distributed & Client-Server Databases (1990s–2000s)
Data was stored across multiple servers but appeared as a single unified
database to users.
Advantages:
Disadvantages:
Databases were designed for unstructured and massive amounts of data, used in
social media, IoT, and real-time apps.
Advantages:
Disadvantages:
Data is stored and managed on cloud platforms like AWS, Google Cloud, and
Azure.
Advantages:
Disadvantages:
9. DIFFERENCE BETWEEN FILE SYSTEM AND DBMS
Basics | File System | DBMS
Cost | It is less expensive than DBMS. | It has a comparatively higher cost than a file system.
Data Independence | There is no data independence. | Data independence exists, mainly of two types: 1) Logical Data Independence, 2) Physical Data Independence.
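A rough sketch of what logical data independence buys, using hypothetical names: a file-system style program that reads a comma-separated record by field position breaks when a new field is inserted into the file format, while a query that names its columns keeps working after the DBMS table gains a column.

```python
import sqlite3

# File-system style: the program depends on the exact record layout.
def city_from_record(line):
    return line.split(",")[1]          # assumes the layout name,city

old_line = "Kiran,Pune"
new_line = "Kiran,9876543210,Pune"     # format changed: phone inserted as field 2
print(city_from_record(old_line))      # Pune
print(city_from_record(new_line))      # 9876543210 (wrong: the program must be modified)

# DBMS style: the query names the columns it needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, city TEXT)")
conn.execute("INSERT INTO customer VALUES ('Kiran', 'Pune')")
query = "SELECT city FROM customer WHERE name = 'Kiran'"
before = conn.execute(query).fetchone()[0]

conn.execute("ALTER TABLE customer ADD COLUMN phone TEXT")   # the logical schema changes
after = conn.execute(query).fetchone()[0]                    # same query, same answer
print(before, after)                   # Pune Pune
```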
10. PROBLEMS WITH FILE SYSTEM
Data Redundancy
Data Inconsistency
Program–Data Dependence
If the file format changes, all programs using that file must also be
modified.
Concurrent Access Anomalies
Multiple users accessing or updating files at the same time can cause
conflicts or data corruption.
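Data redundancy and the inconsistency it causes can be shown in a few lines. In this hypothetical sketch, two offices keep their own files (modelled as Python dictionaries), and both store a student's address. Updating only one copy leaves the files disagreeing. A DBMS avoids this by storing the address once and letting other records refer to it by key.

```python
# Two independent "files", each with its own copy of the student's address (redundancy).
library_file = {"S101": {"name": "Priya", "address": "12 MG Road"}}
accounts_file = {"S101": {"name": "Priya", "address": "12 MG Road", "fees_due": 0}}

# The student moves, but only the accounts office updates its file.
accounts_file["S101"]["address"] = "45 Park Street"

def inconsistent_ids(file_a, file_b, field):
    """Return keys whose copies of `field` disagree between two files."""
    common = file_a.keys() & file_b.keys()
    return [k for k in common if file_a[k][field] != file_b[k][field]]

print(inconsistent_ids(library_file, accounts_file, "address"))  # ['S101']
```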
11. DATABASE SYSTEM ENVIRONMENT
The database system is composed of five major parts:
1. Hardware
2. Software
3. People
4. Procedures
5. Data
1. Hardware.
Hardware refers to all of the system’s physical devices, including
computers, storage devices, printers, network devices, and other devices.
2. Software.
Although the most readily identified software is the DBMS itself, three
types of software are needed to make the database system function fully:
operating system software, DBMS software, and application programs
and utilities.
-- Operating system software manages all hardware components and makes it
possible for all other software to run on the computers. Examples of operating
system software include Microsoft Windows, Linux, Mac OS, UNIX, and MVS.
-- DBMS software manages the database within the database system. Some
examples of DBMS software include Microsoft’s SQL Server, Oracle
Corporation’s Oracle, MySQL (now owned by Oracle), and IBM’s DB2.
-- Application programs and utility software are used to access and manipulate
data in the DBMS and to manage the computer environment in which data
access and manipulation take place.
3. People. This component includes all users of the database system.
Five types of users can be identified in a database system:
1. System administrators
2. Database administrators
3. Database designers
4. System analysts and programmers
5. End users
4. Procedures. Procedures are the instructions and rules that govern the
design and use of the database system.
5. Data. The word data covers the collection of facts stored in the database.
13. DATA MODEL BASIC BUILDING BLOCKS
3. A relationship describes an association among entities. For
example, a relationship exists between customers and agents that can be
described as follows: an agent can serve many customers, and each
customer may be served by one agent.
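In a relational implementation, the agent-customer relationship above is usually represented by a foreign key: each customer row stores the key of the one agent who serves it, while an agent's key may appear in many customer rows. A minimal SQLite sketch via Python; the table names, column names and values are hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.execute("CREATE TABLE agent (agent_code INTEGER PRIMARY KEY, agent_name TEXT)")
# 1:M: each customer references at most one agent; one agent can appear in many rows.
conn.execute("""CREATE TABLE customer (
                    cus_code   INTEGER PRIMARY KEY,
                    cus_name   TEXT,
                    agent_code INTEGER REFERENCES agent(agent_code))""")

conn.executemany("INSERT INTO agent VALUES (?, ?)", [(501, "Suresh"), (502, "Anita")])
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(10010, "Ramesh", 502), (10011, "Divya", 501), (10012, "Farhan", 502)])

# How many customers does each agent serve?
rows = conn.execute("""SELECT a.agent_name, COUNT(c.cus_code)
                       FROM agent a LEFT JOIN customer c ON c.agent_code = a.agent_code
                       GROUP BY a.agent_code ORDER BY a.agent_code""").fetchall()
print(rows)  # [('Suresh', 1), ('Anita', 2)]

# A customer pointing to a non-existent agent is rejected by the DBMS.
rejected = False
try:
    conn.execute("INSERT INTO customer VALUES (10013, 'Nisha', 999)")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```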
15. THE EVOLUTION OF DATA MODELS
2. NETWORK MODEL
The network model was created to represent complex data relationships
more effectively than the hierarchical model, to improve database
performance, and to impose a database standard. Unlike the hierarchical
model, the network model allows a record to have more than one parent.
• The schema is the conceptual organization of the entire database as
viewed by the database administrator.
• The subschema defines the portion of the database “seen” by the
application programs that actually produce the desired information from
the data within the database.
• A data manipulation language (DML) defines the environment in
which data can be managed and is used to work with the data in the
database.
• A schema data definition language (DDL) enables the database
administrator to define the schema components.
3. THE RELATIONAL MODEL
• The relational model was introduced in 1970 by E. F. Codd of IBM in
his landmark paper “A Relational Model of Data for Large Shared Data Banks”.
• Each row in a relation is called a tuple. Each column represents an
attribute. The relational model also describes a precise set of data manipulation
constructs based on advanced mathematical concepts.
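Because a relation is a set of tuples, two basic relational operations can be written as ordinary set manipulations. The sketch below is illustrative only and is not how a DBMS implements them: a hypothetical STUDENT relation with attributes (roll_no, name, dept), a selection (σ) that keeps the tuples satisfying a condition, and a projection (π) that keeps chosen attributes and drops duplicate tuples.

```python
# A relation: a set of tuples over the attributes (roll_no, name, dept).
ATTRS = ("roll_no", "name", "dept")
student = {
    (1, "Asha", "BCA"),
    (2, "Vikram", "BCom"),
    (3, "Neha", "BCA"),
}

def select(relation, predicate):
    """Selection (sigma): keep tuples that satisfy the predicate."""
    return {t for t in relation if predicate(dict(zip(ATTRS, t)))}

def project(relation, *attrs):
    """Projection (pi): keep only the named attributes; the set removes duplicates."""
    idx = [ATTRS.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in relation}

bca = select(student, lambda row: row["dept"] == "BCA")
print(sorted(bca))                       # [(1, 'Asha', 'BCA'), (3, 'Neha', 'BCA')]
print(sorted(project(student, "dept")))  # [('BCA',), ('BCom',)]
```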
4. THE ENTITY RELATIONSHIP MODEL
ER models are normally represented in an entity relationship
diagram (ERD), which uses graphical representations to model
database components.
The ER model is based on the following components:
• Entity.
An entity is represented in the ERD by a rectangle, also known as
an entity box. The name of the entity, a noun, is written in the center
of the rectangle.
The entity name is generally written in capital letters and in singular
form: PAINTER rather than PAINTERS, and EMPLOYEE rather
than EMPLOYEES.
Each entity consists of a set of attributes that describes particular
characteristics of the entity.
For example, the entity EMPLOYEE will have attributes such as a
Social Security number, a last name, and a first name.
• Relationships.
Relationships describe associations among data. Most relationships
describe associations between two entities. Three types of data
relationships are commonly used: one-to-many (1:M), many-to-many (M:N),
and one-to-one (1:1).
The ER model uses the term connectivity to label the relationship types.
The name of the relationship is usually an active or passive verb. For
example, a PAINTER paints many PAINTINGs, an EMPLOYEE learns
many SKILLs, and an EMPLOYEE manages a STORE.
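When an ERD is implemented in a relational DBMS, a 1:M relationship becomes a foreign key, but an M:N relationship such as EMPLOYEE learns SKILL needs a separate bridge (associative) table that holds pairs of keys. A sketch in SQLite via Python, with hypothetical table names, column names and rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT);
    CREATE TABLE skill    (skill_id INTEGER PRIMARY KEY, skill_name TEXT);
    -- Bridge table: one row per (employee, skill) pair turns M:N into two 1:M links.
    CREATE TABLE learns (
        emp_id   INTEGER REFERENCES employee(emp_id),
        skill_id INTEGER REFERENCES skill(skill_id),
        PRIMARY KEY (emp_id, skill_id)
    );
    INSERT INTO employee VALUES (1, 'Kavya'), (2, 'Rahul');
    INSERT INTO skill    VALUES (10, 'SQL'), (20, 'Java');
    INSERT INTO learns   VALUES (1, 10), (1, 20), (2, 10);
""")

# Many skills per employee, and many employees per skill.
sql_learners = conn.execute("""
    SELECT e.emp_name FROM employee e
    JOIN learns l ON l.emp_id = e.emp_id
    JOIN skill s  ON s.skill_id = l.skill_id
    WHERE s.skill_name = 'SQL' ORDER BY e.emp_id""").fetchall()
print(sql_learners)  # [('Kavya',), ('Rahul',)]
```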
5. THE OBJECT-ORIENTED (OO) MODEL
• In the object-oriented data model (OODM), both data and their
relationships are contained in a single structure known as an object. In
turn, the OODM is the basis for the object-oriented database
management system (OODBMS).
• An object is an abstraction of a real-world entity.
Attributes describe the properties of an object. For example, a
PERSON object includes the attributes Name, Social Security
Number, and Date of Birth.
• Objects that share similar characteristics are grouped in classes. A
class is a collection of similar objects with shared structure
(attributes) and behavior (methods).
• Classes are organized in a class hierarchy.
• Inheritance is the ability of an object within the class hierarchy to inherit
the attributes and methods of the classes above it.
Object-oriented data models are typically depicted using Unified Modeling
Language (UML) class diagrams.
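The OO terms above map directly onto an object-oriented programming language. In this Python sketch, the Person class follows the PERSON example; the Employee subclass, the age_on method and all values are hypothetical. A class bundles attributes with a method, and a subclass inherits both from the class above it.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Person:
    # Attributes describe the properties of the object.
    name: str
    ssn: str
    date_of_birth: date

    # A method: behavior shared by every object of the class.
    def age_on(self, day: date) -> int:
        had_birthday = (day.month, day.day) >= (self.date_of_birth.month, self.date_of_birth.day)
        return day.year - self.date_of_birth.year - (0 if had_birthday else 1)

@dataclass
class Employee(Person):
    # Inherits name, ssn, date_of_birth and age_on() from Person (class hierarchy).
    salary: int = 0

e = Employee("Anil", "123-45-6789", date(1990, 8, 15), salary=50000)
print(e.age_on(date(2024, 8, 14)))   # 33
print(isinstance(e, Person))         # True
```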
6. OBJECT/RELATIONAL AND XML
• Facing the demand to support more complex data representations,
relational database vendors extended the relational model, creating the
extended relational data model (ERDM).
• The ERDM adds many of the OO model’s features within the
inherently simpler relational database structure.
7. EMERGING DATA MODELS: BIG DATA AND NOSQL
• Big Data refers to a movement to find new and better ways to
manage large amounts of web- and sensor-generated data and derive
business insight from it, while simultaneously providing high
performance and scalability at a reasonable cost.
8. NOSQL DATABASES
NoSQL refers to a new generation of databases that address the
specific challenges of the Big Data era.
They are not based on the relational model and SQL, hence the name
NoSQL.
They support distributed database architectures.
They provide high scalability, high availability, and fault tolerance.
They support very large amounts of sparse data.
They are geared toward performance rather than transaction consistency.
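NoSQL products differ widely, so the following is only a toy sketch of two ideas from the list above: data stored as key-value pairs spread (partitioned) across several nodes by hashing the key, and values stored as schema-less documents, so sparse records simply omit the attributes they lack. The class TinyKeyValueStore is hypothetical and is not the API of any real NoSQL product; each "node" here is just a Python dictionary.

```python
import json
import zlib

class TinyKeyValueStore:
    """Toy in-memory sketch of a partitioned key-value/document store."""

    def __init__(self, num_nodes=3):
        # Each "node" is a dict here; a real system would place nodes on separate servers.
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, key):
        # Stable hash, so the same key always maps to the same node.
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def put(self, key, document):
        # No fixed schema: each document is stored as JSON with whatever fields it has.
        self._node_for(key)[key] = json.dumps(document)

    def get(self, key):
        raw = self._node_for(key).get(key)
        return None if raw is None else json.loads(raw)

store = TinyKeyValueStore()
# Sparse data: documents need not share the same attributes (no NULL-filled columns).
store.put("user:1", {"name": "Asha", "city": "Delhi"})
store.put("user:2", {"name": "Ravi", "twitter": "@ravi", "interests": ["cricket", "music"]})
print(store.get("user:2")["interests"])  # ['cricket', 'music']
```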
16. CHARACTERISTICS OF BIG DATA DATABASES:
Big Data is described by three basic characteristics: volume, velocity,
and variety, or the 3 Vs.
1. Volume refers to the amount of data being stored. With the adoption and
growth of the Internet and social media, companies have multiplied the ways
to reach customers, and with them the amount of data they collect and store.
2. Velocity refers not only to the speed with which data grows but also to the
need to process these data quickly in order to generate information and
insight.
• The velocity of data growth is also due to the increase in the number of
different data streams from which data is being piped to the organization
(via the web, e-commerce, Tweets, Facebook posts, emails, sensors, GPS,
and so on).
3. Variety refers to the fact that the data being collected comes in multiple
different data formats.
Some of the most frequently used Big Data technologies are Hadoop,
MapReduce, and NoSQL databases.
• Hadoop is a Java-based, open source, high-speed, fault-tolerant
distributed storage and computational framework. Hadoop has several
modules, but the two main components are Hadoop Distributed File
System (HDFS) and MapReduce.
• Hadoop Distributed File System (HDFS) is a highly distributed, fault-
tolerant file storage system designed to manage large amounts of data at
high speeds.
• MapReduce is an open source application programming interface (API)
that provides fast data analytics services. MapReduce distributes the
processing of the data among thousands of nodes in parallel.
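The MapReduce idea can be imitated on a single machine. This pure-Python sketch of the classic word-count job uses hypothetical function names and does not use Hadoop's actual Java API: a map phase emits (word, 1) pairs for each input split, a shuffle groups the pairs by key, and a reduce phase sums each group. In Hadoop, the splits and the reduce keys would be processed on different nodes in parallel.

```python
from collections import defaultdict

def map_phase(split):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle(mapped_pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key."""
    return key, sum(values)

splits = ["big data needs big storage", "data drives insight"]   # e.g. blocks of a file
mapped = [pair for split in splits for pair in map_phase(split)]   # parallel in Hadoop
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["big"], counts["data"])  # 2 2
```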
17. DEGREES OF DATA ABSTRACTION
• In the early 1970s, the American National Standards Institute (ANSI)
Standards Planning and Requirements Committee (SPARC) defined a
framework for data modeling based on degrees of data abstraction.
• The resulting ANSI/SPARC architecture defines three levels of data
abstraction: external, conceptual, and internal.
1. EXTERNAL MODEL
• The external model is the end users’ view of the data
environment. The term end users refers to people who use the
application programs to manipulate the data and generate
information.
• A specific representation of an external view is known as an
external schema.
Example entity relationships in an external view:
• A PROFESSOR may teach many CLASSes, and each CLASS is taught
by only one PROFESSOR; there is a 1:M relationship between
PROFESSOR and CLASS.
• A CLASS may ENROLL many STUDENTs, and each STUDENT may
ENROLL in many CLASSes, thus creating an M:N relationship between
STUDENT and CLASS.
• Each COURSE may generate many CLASSes, but each CLASS
references a single COURSE.
• Finally, a CLASS requires one ROOM, but a ROOM may be scheduled
for many CLASSes. That is, each classroom may be used for several
classes: one at 9:00 a.m., one at 11:00 a.m., and one at 1:00 p.m., for
example. In other words, there is a 1:M relationship between ROOM and
CLASS.
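In a relational DBMS, an external schema is commonly implemented with views: each business unit queries a view that exposes only the part of the database it needs. Below is a sketch with hypothetical table names, column names and rows, loosely following the PROFESSOR, CLASS and ROOM relationships above: a scheduling office sees a class_schedule view that does not expose professors' salaries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE professor (prof_id INTEGER PRIMARY KEY, prof_name TEXT, salary INTEGER);
    CREATE TABLE room      (room_code TEXT PRIMARY KEY, capacity INTEGER);
    CREATE TABLE class (
        class_code TEXT PRIMARY KEY,
        class_time TEXT,
        prof_id    INTEGER REFERENCES professor(prof_id),  -- 1:M PROFESSOR to CLASS
        room_code  TEXT    REFERENCES room(room_code)      -- 1:M ROOM to CLASS
    );
    INSERT INTO professor VALUES (1, 'Dr. Rao', 90000);
    INSERT INTO room      VALUES ('R101', 40);
    INSERT INTO class     VALUES ('DBMS-A', '09:00', 1, 'R101'),
                                 ('DBMS-B', '11:00', 1, 'R101');

    -- External schema for the scheduling office: no salary column is exposed.
    CREATE VIEW class_schedule AS
        SELECT c.class_code, c.class_time, c.room_code, p.prof_name
        FROM class c JOIN professor p ON p.prof_id = c.prof_id;
""")

rows = conn.execute("SELECT * FROM class_schedule ORDER BY class_time").fetchall()
print(rows)  # [('DBMS-A', '09:00', 'R101', 'Dr. Rao'), ('DBMS-B', '11:00', 'R101', 'Dr. Rao')]
```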
Advantages
• It is easy to identify specific data required to support each business
unit’s operations.
• It makes the designer’s job easy by providing feedback about the
model’s adequacy.
• It helps to ensure security constraints in the database design.
Damaging an entire database is more difficult when each business unit
works with only a subset of data.
• It makes application program development much simpler.
2. CONCEPTUAL MODEL
• The conceptual model represents a global view of the entire
database by the entire organization. That is, the conceptual model
integrates all external views (entities, relationships, constraints,
and processes) into a single global view of the data in the
enterprise.
• Also known as a conceptual schema, it is the basis for the
identification and high-level description of the main data objects
(avoiding any database model-specific details).
ADVANTAGES
• First, it provides a bird’s-eye (macro level) view of the data environment
that is relatively easy to understand.
• Second, the conceptual model is independent of both software and
hardware.
• Software independence means that the model does not depend on the
DBMS software used to implement the model.
• Hardware independence means that the model does not depend on the
hardware used in the implementation of the model.
3. INTERNAL MODEL
• Once a specific DBMS has been selected, the internal model maps the
conceptual model to the DBMS. The internal model is the representation
of the database as “seen” by the DBMS.
• In other words, the internal model requires the designer to match the
conceptual model’s characteristics and constraints to those of the selected
implementation model.
An internal schema depicts a specific representation of an internal model, using
the database constructs supported by the chosen database.
4. PHYSICAL MODEL
• The physical model operates at the lowest level of abstraction,
describing the way data are saved on storage media such as magnetic,
solid state, or optical media.
• The physical model requires the definition of both the physical storage
devices and the (physical) access methods required to reach the data
within those storage devices, making it both software and hardware
dependent.
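Physical storage details are normally handled by the DBMS, but some of them can be influenced. The sketch below (SQLite via Python; the table, index and data are hypothetical) adds an index, which is a physical access structure, and uses SQLite's EXPLAIN QUERY PLAN to show whether the DBMS will scan the whole table or search through the index. The exact wording of the plan text varies between SQLite versions, so the comments describe it only loosely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO student (name, city) VALUES (?, ?)",
                 [(f"student{i}", "Pune" if i % 2 else "Delhi") for i in range(1000)])

def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable 'detail' column.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT name FROM student WHERE city = 'Pune'"
before = plan(query)
print(before)   # reports a full SCAN of the student table

# Same logical query, different physical access method.
conn.execute("CREATE INDEX idx_student_city ON student(city)")
after = plan(query)
print(after)    # reports a SEARCH of student using idx_student_city
```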