Unit I DBMS Notes

Uploaded by

yasothapriya.m

Course Code : U24CSB372

Course Name : Database Management System

Programme : BCA

CONTENTS
1 Definition
2 Data Vs Information
3 Introduction to Database
4 Components of DBMS Applications
5 Role of DBMS
6 Advantages of Database Management System
7 Types of Databases
8 Importance of Database Design
9 Evolution of File System Data Processing
10 Difference between File System and DBMS
11 Problems with File System
12 The Database System Environment
13 Data Modeling and Data Models
14 Data Model Basic Building Blocks
15 Business Rule
16 Types of Data Model / Evolution of Data Models
17 Characteristics of Big Data Databases
18 Degrees of Data Abstraction

1. DEFINITION

1.1 DATABASE SYSTEM:

A database system is a combination of software, hardware, data, and
procedures used to store, manage, and retrieve information efficiently. It is
designed to handle large amounts of structured data while ensuring security,
integrity, and ease of access.

A database system has two main components:

• Database – An organized collection of data (e.g., student records, bank
transactions, product inventories).
• Database Management System (DBMS) – The software that manages the
database. It provides tools and functions to:

o Store, retrieve, and update data
o Ensure data security
o Maintain data integrity
o Support concurrent access
o Handle backups and recovery

Example:

• In a banking system, the database stores customer details, account
balances, and transactions.
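
The banking example can be tried out with a small sketch. It uses Python's built-in sqlite3 module as a stand-in DBMS; the table name `account`, its columns, and the sample customers are assumptions made only for illustration.

```python
import sqlite3

# An in-memory SQLite database plays the role of the "database".
conn = sqlite3.connect(":memory:")

# Hypothetical table for the banking example.
conn.execute(
    "CREATE TABLE account ("
    " account_no INTEGER PRIMARY KEY,"
    " customer_name TEXT NOT NULL,"
    " balance REAL NOT NULL)"
)

# Store: add two customers with their balances.
conn.execute("INSERT INTO account VALUES (?, ?, ?)", (101, "Asha", 5000.0))
conn.execute("INSERT INTO account VALUES (?, ?, ?)", (102, "Ravi", 1200.0))

# Update: record a deposit of 500 into account 102.
conn.execute(
    "UPDATE account SET balance = balance + ? WHERE account_no = ?",
    (500.0, 102),
)
conn.commit()

# Retrieve: read back all accounts.
rows = conn.execute(
    "SELECT account_no, customer_name, balance FROM account ORDER BY account_no"
).fetchall()
print(rows)
```

The application only issues requests; how the rows are laid out on disk is left to the DBMS.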
1.2 DEFINITION:
DATA, METADATA, INFORMATION, KNOWLEDGE

Data
Data is the raw material that can be processed by any computing machine.

For example: employee name, product name, name of the student, marks of the
student, mobile number, image, etc.

Metadata
Metadata, or data about data, is the information through which the end-user
data are integrated and managed. The metadata describe the data
characteristics and the set of relationships that link the data found within
the database.

Information
Information is data that has been converted into a more useful or
intelligible form.

For example: report card sheet.

Knowledge
The human mind purposefully organizes information and evaluates it to produce
knowledge.
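
A minimal sketch of the first three terms, assuming a hypothetical `marks` table in SQLite: the stored rows are the data, the column descriptions returned by `PRAGMA table_info` are the metadata, and the per-student totals (a simple report card) are the information.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (student TEXT, subject TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO marks VALUES (?, ?, ?)",
    [
        ("Anu", "Maths", 80),
        ("Anu", "English", 70),
        ("Bala", "Maths", 60),
        ("Bala", "English", 90),
    ],
)

# Data: the raw facts exactly as stored.
data = conn.execute("SELECT student, subject, score FROM marks").fetchall()

# Metadata: data about the data. Each PRAGMA row is
# (cid, name, type, notnull, dflt_value, pk); keep the name and type.
metadata = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(marks)")]

# Information: the data processed into a report card per student.
report_card = conn.execute(
    "SELECT student, SUM(score), AVG(score) FROM marks "
    "GROUP BY student ORDER BY student"
).fetchall()

print(metadata)
print(report_card)
```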

2. DATA Vs INFORMATION

Data | Information
Data is the raw fact. | It is a processed form of data.
It is not significant to a business. | It is significant to a business.
Data is an atomic level piece of information. | It is a collection of data.
Example: Product name, Name of student. | Example: Report card of student.
It is a phenomenal fact. | It is organized data.
This is the primary level of intelligence. | It is a secondary level of intelligence.
May or may not be meaningful. | Always meaningful.
Understanding is difficult. | Understanding is easy.

3. INTRODUCTION TO DATABASE:
A database management system (DBMS) is a collection of programs that manages
the database structure and controls access to the data stored in the
database.

• It allows users to create, update, and query databases efficiently.
• It ensures data integrity, consistency, and security across multiple users
and applications.
• It reduces data redundancy and inconsistency through centralized control.
• It supports concurrent data access, transaction management, and automatic
backups.

A DBMS acts as a bridge between a central database and multiple clients,
including apps and users. It uses APIs to handle data requests, enabling apps
and users to interact with the database securely and efficiently without
directly accessing the data.

4. COMPONENTS OF DBMS APPLICATIONS

Any DBMS-based application is made up of six key components that work
together to handle data effectively.

1. Hardware
Physical devices like servers, disks, input-output devices.
Examples: Personal computer hard disk, RAM, network devices used for DBMS
operations.

2. Software
Actual DBMS software like MySQL, Oracle, PostgreSQL.

3. Data
Raw facts stored in structured or unstructured formats.

4. Procedures
Instructions and rules for using DBMS effectively.

5. Database Access Language
Used to interact with the database (create, read, update, delete data).
Examples: SQL, MyAccess, Oracle PL/SQL.
DDL (Data Definition Language) – CREATE, ALTER, DROP
DML (Data Manipulation Language) – INSERT, UPDATE, DELETE

6. People
Users interacting with DBMS at different levels:
• Database Administrators (DBA) – Manage security, performance, user
access.
• Developers – Build applications using the database.
• End Users – Use applications to access the database (e.g., students,
employees).
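
The split between DDL and DML under component 5 can be seen in a short sketch. It runs the statements through Python's sqlite3 module; the `student` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: statements that define or change the structure.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE student ADD COLUMN city TEXT")

# DML: statements that change the rows inside that structure.
conn.execute("INSERT INTO student (roll_no, name, city) VALUES (1, 'Anu', 'Salem')")
conn.execute("INSERT INTO student (roll_no, name, city) VALUES (2, 'Bala', 'Erode')")
conn.execute("UPDATE student SET city = 'Chennai' WHERE roll_no = 2")
conn.execute("DELETE FROM student WHERE roll_no = 1")

remaining = conn.execute("SELECT roll_no, name, city FROM student").fetchall()
print(remaining)

# DDL again: DROP removes the table itself, not just its rows.
conn.execute("DROP TABLE student")
tables_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(tables_left)
```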

5. ROLE OF DBMS
• Data Management and Storage:
A DBMS handles the complex tasks of storing, organizing, and managing large
volumes of data in a structured format, typically in tables.
• Intermediary:
It serves as a bridge between users/applications and the physical database,
translating data requests into a format the database understands and back
into a usable format for the user.
• Data Control:
A DBMS enforces rules and constraints to ensure data accuracy, consistency,
and security, which are often challenging with traditional file-based
systems.
• User Interface:
It provides interfaces, often using query languages like SQL, for users to
create, read, update, and delete data efficiently.

6. ADVANTAGES OF DATABASE MANAGEMENT SYSTEM

The advantages of database management systems are:

• Data security: DBMS enhances data security through access control and
encryption.
• Data integration: DBMS unifies data from different sources into a
centralized system.
• Data abstraction: DBMS hides the internal storage details from users
through several levels of data abstraction, so users can interact with the
system easily while developers remain free to tune the database for
efficiency.
• Reduction in data redundancy: DBMS avoids data duplication by enforcing
unique constraints. It removes unnecessary repetitive entries in databases.
• Data sharing: A DBMS provides a platform for sharing data across multiple
applications and users, which can increase productivity and collaboration.
• Data consistency and accuracy: DBMS enforces integrity constraints to
maintain valid data.
• Data organization: A DBMS provides a systematic approach to organizing
data in a structured way, which makes it easier to retrieve and manage data
efficiently.
• Efficient data access and retrieval: DBMS allows for efficient data access
and retrieval. It boosts system performance and user satisfaction.
• Concurrency and atomicity: The DBMS allows concurrent access by multiple
users by using synchronization techniques, and keeps each transaction atomic
(it either completes fully or has no effect).
• Scalability and flexibility: DBMS is highly scalable and can easily
accommodate changes in data volumes and user requirements.
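
Atomicity and integrity constraints from the list above can be sketched together. This is only an illustration using SQLite through Python; the `account` table, the `transfer` helper, and the amounts are assumptions. A transfer that would make a balance negative violates the CHECK constraint, and the whole transaction is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " account_no INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move amount from source to target as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_no = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_no = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; nothing was saved.
        return False


ok = transfer(conn, 1, 2, 30)       # allowed: balances become 70 and 80
failed = transfer(conn, 1, 2, 500)  # rejected: the credit to account 2 is undone too
balances = conn.execute(
    "SELECT account_no, balance FROM account ORDER BY account_no"
).fetchall()
print(ok, failed, balances)
```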

7. TYPES OF DATABASES:

1. Hierarchical Databases

Hierarchical databases organize data in a tree-like structure, where each
parent record can have multiple child records.

2. Network Databases

Network databases build on the hierarchical model but allow child records to
be linked to multiple parent records, creating a web-like structure of
interconnected data.

3. Object-Oriented Databases

Object-oriented databases are based on the principles of object-oriented
programming (OOP), where data is stored as objects. These objects include
attributes (data) and methods (functions), making them easily referenced and
manipulated.

4. Relational Databases

Relational databases are the most widely used type of database today. They
store data in tables, with rows representing records and columns representing
attributes of the records.

5. Cloud Databases

A cloud database operates in a virtual environment hosted on cloud computing
platforms. It is designed for storing, managing, and processing data over the
internet. Examples of cloud platforms are:
• Amazon Web Services (AWS)
• Google Cloud Platform (GCP)

6. Centralized Databases

A centralized database is a database stored and managed at a single location,
such as a central server or data centre.

7. Personal Databases

A personal database is a small-scale database designed for a single user,
typically used on personal computers or mobile devices.
Examples are:
• Microsoft Access
• SQLite

8. Operational Databases

An operational database is designed to manage and process real-time data for
daily operations within organizations and businesses.

9. NoSQL Databases

A NoSQL database (short for "non-SQL" or "non-relational") provides a
mechanism for storing and retrieving data that does not rely on traditional
table-based relational models.

8. IMPORTANCE OF DATABASE DESIGN

Database design is the process of structuring data into well-organized
tables, relationships, and rules so that it can be stored, retrieved, and
managed efficiently. A good design is the foundation of any reliable database
system.

🔑 Reasons Why Database Design is Important

1. Avoids Data Redundancy

• Good design ensures the same data is not stored in multiple places
unnecessarily.
• Example: Instead of storing customer details in every order record, they
are stored once in a “Customer” table.

2. Ensures Data Integrity & Accuracy

• Proper relationships and constraints prevent invalid or inconsistent data.
• Example: A student ID in the “Results” table must match an existing
student in the “Students” table.

3. Improves Query Performance

• Well-designed databases use indexes and normalized structures to speed up
data retrieval.
• Example: Searching for a product in an optimized database is much faster
than in a poorly designed one.

4. Supports Scalability

• A strong design can handle growing amounts of data without major
restructuring.
• Example: An e-commerce database can grow from thousands to millions of
customers smoothly.

5. Enhances Security

• Sensitive information (like passwords, medical data, financial records)
can be stored separately and securely with access controls.

6. Easier Maintenance & Updates

• A properly structured database is easier to modify when business rules
change.
• Example: Adding a new payment method in a normalized database only needs
changes in one place.

7. Reduces Cost & Saves Time

• Poorly designed databases require frequent fixes, migrations, and
performance tuning, which increases cost.
• Good design avoids these long-term problems.
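
Reasons 1 to 3 can be illustrated with a small sketch, assuming hypothetical `customer` and `orders` tables in SQLite. Customer details are stored once, a foreign key keeps orders pointing at real customers, and an index supports lookups by customer. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        phone       TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        item        TEXT NOT NULL
    );
    CREATE INDEX idx_orders_customer ON orders(customer_id);
    """
)

conn.execute("INSERT INTO customer VALUES (1, 'Meena', '9000000001')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, "Pen"), (11, 1, "Notebook")],
)

# Redundancy avoided: one UPDATE changes the phone number seen by every order.
conn.execute("UPDATE customer SET phone = '9000000002' WHERE customer_id = 1")
phones = conn.execute(
    "SELECT o.order_id, c.phone FROM orders o "
    "JOIN customer c ON c.customer_id = o.customer_id ORDER BY o.order_id"
).fetchall()

# Integrity: an order for a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (12, 99, 'Eraser')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(phones, orphan_rejected)
```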

9. EVOLUTION OF FILE SYSTEM DATA PROCESSING

1. Manual File System

Data was stored in paper files, ledgers, and registers, making retrieval slow
and prone to errors.

Advantages:

• Simple and easy to use.
• No need for technical knowledge.

Disadvantages:

• Time-consuming and inefficient.
• High chances of data loss, duplication, and inconsistency.
• Difficult to share and secure.

2. File-Based System (1950s–1960s)

Data was stored in electronic files on computers, and each application
maintained its own separate files.

Advantages:

• Faster than manual storage.
• Easy to maintain small amounts of data.

Disadvantages:

• Data redundancy and inconsistency.
• Poor security and access control.
• Difficult to modify or update structure.

3. Traditional File Processing System (1960s–1970s)

Programming languages like COBOL and C were used to store data in flat files
with sequential, indexed, or random access.

Advantages:

• Better access methods compared to earlier file systems.
• Suitable for small-scale applications.

Disadvantages:

• Strong program–data dependence.
• Poor data sharing and concurrency control.
• Still prone to redundancy and inconsistency.

4. Hierarchical & Network Databases (Late 1960s–1970s)

Data was organized in tree-like (hierarchical) or graph-like (network)
structures to improve sharing and relationships.

Advantages:

• Better data relationships and organization.
• Faster access than flat files.

Disadvantages:

• Complex to design and manage.
• Rigid structure with little flexibility.

5. Relational Database Systems (1970s–1980s)

In the relational model, proposed by E. F. Codd, data is stored in tables
(rows and columns) and managed using SQL.

Advantages:

• Greatly reduced redundancy and inconsistency.
• Easier querying and data management.
• Stronger security and data integrity.

Disadvantages:

• Slower for very large or unstructured data.
• Can be expensive to implement.

6. Object-Oriented & Multimedia Databases (1980s–1990s)

Databases supported complex data types such as images, audio, and video using
object-oriented concepts.

Advantages:

• Efficient for multimedia and complex applications.
• Supports real-world data representation.

Disadvantages:

• More complex than relational databases.
• Less widely adopted compared to RDBMS.

7. Distributed & Client-Server Databases (1990s–2000s)

Data was stored across multiple servers but appeared as a single unified
database to users.

Advantages:

• Improved scalability and reliability.
• Faster access for users in different locations.

Disadvantages:

• Complex management and synchronization.
• Higher cost of setup and maintenance.

8. NoSQL & Big Data Databases (2000s–Present)

Databases were designed for unstructured and massive amounts of data, used in
social media, IoT, and real-time apps.

Advantages:

• Handles huge and unstructured datasets.
• High scalability and performance.

Disadvantages:

• Less consistency compared to RDBMS.
• Limited standardization across systems.

9. Cloud Databases (2010s–Present)

Data is stored and managed on cloud platforms like AWS, Google Cloud, and
Azure.

Advantages:

• Highly scalable and accessible from anywhere.
• Lower hardware costs and easy backup.

Disadvantages:

• Dependent on internet connectivity.
• Data privacy and security concerns.

10. DIFFERENCE BETWEEN FILE SYSTEM AND DBMS

Basics | File System | DBMS
Structure | The file system is a way of arranging the files in a storage medium within a computer. | DBMS is software for managing the database.
Data Redundancy | Redundant data can be present in a file system. | A DBMS controls and minimizes redundant data.
Backup and Recovery | It doesn't provide an inbuilt mechanism for backup and recovery of data if it is lost. | It provides built-in tools for backup and recovery of data if it is lost.
Query Processing | There is no efficient query processing in the file system. | Efficient query processing is available in DBMS.
Consistency | There is less data consistency in the file system. | There is more data consistency because of the process of normalization.
Complexity | It is less complex as compared to DBMS. | It has more complexity in handling as compared to the file system.
Security Constraints | File systems provide less security in comparison to DBMS. | DBMS has more security mechanisms as compared to file systems.
Cost | It is less expensive than DBMS. | It has a comparatively higher cost than a file system.
Data Independence | There is no data independence. | In DBMS data independence exists, mainly of two types: 1) Logical Data Independence. 2) Physical Data Independence.
User Access | Only one user can access data at a time. | Multiple users can access data at a time.
Meaning | The user has to write procedures for managing databases. | The users are not required to write procedures.
Sharing | Data is distributed in many files, so it is not easy to share data. | Due to its centralized nature, data sharing is easy.
Data Abstraction | It gives details of storage and representation of data. | It hides the internal details of the database.
Integrity Constraints | Integrity constraints are difficult to implement. | Integrity constraints are easy to implement.
Attributes | To access data in a file, the user requires attributes such as file name and file location. | No such attributes are required.
Example | Cobol, C++ | Oracle, SQL Server

11. PROBLEMS WITH FILE SYSTEM

• Data Redundancy

o The same data is stored in multiple files.
o Example: A student’s name and address stored separately in “Library
File,” “Exam File,” and “Hostel File.”
o This wastes storage and creates unnecessary duplication.

• Data Inconsistency

o Since the same data is stored in different places, updates may not be
reflected everywhere.
o Example: If a student changes their phone number, it may be updated in the
Exam File but not in the Hostel File.

• Lack of Data Sharing

o Data is isolated in different files, making it difficult for multiple
applications or users to access the same information efficiently.

• Poor Data Security

o File systems generally have limited security.
o Anyone with access to the file can read or modify it without restrictions.

• No Data Integrity / Constraints

o File systems cannot enforce rules like “roll number must be unique” or
“marks cannot be negative.”
o This leads to invalid or inaccurate data.

• Difficulty in Accessing Data

o Complex queries (like finding “all students who borrowed a book and also
paid hostel fees”) are very hard to perform in file systems.

• Program–Data Dependence

o The structure of data files is tightly linked to the application programs.
o If file format changes, all programs using that file must also be
modified.

• Poor Concurrency Control

o Multiple users accessing or updating files at the same time can cause
conflicts or data corruption.

• Lack of Backup & Recovery

o File systems have no proper mechanisms for automatic backup or recovery in
case of failure.
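
The integrity problem can be made concrete with a short sketch. A flat CSV file (held in memory here) accepts a duplicate roll number and negative marks without complaint, while a table with the same rules declared as constraints rejects them. The `result` table, the sample rows, and the use of SQLite are assumptions for illustration.

```python
import csv
import io
import sqlite3

# Flat file: nothing stops a duplicate roll number or negative marks.
flat_file = io.StringIO()
writer = csv.writer(flat_file)
writer.writerow([1, "Anu", 85])
writer.writerow([1, "Bala", -10])  # duplicate roll number AND negative marks
file_rows = flat_file.getvalue().splitlines()

# DBMS: the same rules are declared once and enforced on every insert.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE result ("
    " roll_no INTEGER PRIMARY KEY,"                 # roll number must be unique
    " name TEXT NOT NULL,"
    " marks INTEGER NOT NULL CHECK (marks >= 0))"   # marks cannot be negative
)
conn.execute("INSERT INTO result VALUES (1, 'Anu', 85)")

rejected = []
for row in [(1, "Bala", 70), (2, "Chitra", -10)]:
    try:
        conn.execute("INSERT INTO result VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row)

stored = conn.execute("SELECT roll_no, name, marks FROM result").fetchall()
print(len(file_rows), rejected, stored)
```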

DATABASE SYSTEM:

12. THE DATABASE SYSTEM ENVIRONMENT

The term database system refers to an organization of components that define
and regulate the collection, storage, management, and use of data within a
database environment.

FIVE MAJOR PARTS

1. Hardware
2. Software
3. People
4. Procedures
5. Data

1. Hardware.
Hardware refers to all of the system’s physical devices, including computers,
storage devices, printers, network devices, and other devices.
2. Software.
Although the most readily identified software is the DBMS itself, three types
of software are needed to make the database system function fully: operating
system software, DBMS software, and application programs and utilities.
-- Operating system software manages all hardware components and makes it
possible for all other software to run on the computers. Examples of
operating system software include Microsoft Windows, Linux, Mac OS, UNIX, and
MVS.
-- DBMS software manages the database within the database system. Some
examples of DBMS software include Microsoft’s SQL Server, Oracle
Corporation’s Oracle and MySQL, and IBM’s DB2.
-- Application programs and utility software are used to access and manipulate
data in the DBMS and to manage the computer environment in which data access
and manipulation take place.
3. People. This component includes all users of the database system. Five
types of users can be identified in a database system:
1. system administrators,
2. database administrators,
3. database designers,
4. system analysts and programmers, and
5. end users.

-- System administrators oversee the database system’s general operations.
-- Database administrators, also known as DBAs, manage the DBMS and ensure
that the database is functioning properly.
-- Database designers design the database structure.
-- System analysts and programmers design and implement the application
programs.
-- End users are the people who use the application programs to run the
organization’s daily operations.

4. Procedures. Procedures are the instructions and rules that govern the
design and use of the database system.
5. Data. The word data covers the collection of facts stored in the database.

13. DATA MODELING AND DATA MODELS


Database design focuses on how the database structure will be used to store
and manage end-user data.
A data model is a relatively simple representation, usually graphical, of more
complex real-world data structures. In general terms, a model is an abstraction
of a more complex real-world object or event.

13.1 The Importance of Data Models

Data models can facilitate interaction among the designer, the applications
programmer, and the end user.
A well-developed data model can even foster improved understanding of the
organization for which the database design is developed. In short, data
models are a communication tool.

14. DATA MODEL BASIC BUILDING BLOCKS


The basic building blocks of all data models are entities, attributes,
relationships, and constraints.

1. An entity is a person, place, thing, or event about which data will be
collected and stored. An entity represents a particular type of object in the
real world.

Example: a CUSTOMER entity, with instances such as John and Mohan.

2. An attribute is a characteristic of an entity.
For example, a CUSTOMER entity would be described by attributes such as
customer last name, customer first name, customer phone number, customer
address, and customer credit limit. Attributes are the equivalent of fields
in file systems.

3. A relationship describes an association among entities. For example, a
relationship exists between customers and agents that can be described as
follows: an agent can serve many customers, and each customer may be served
by one agent.

• Data models use three types of relationships: one-to-many, many-to-many,
and one-to-one. Database designers usually use the shorthand notations 1:M or
1..*, M:N or *..*, and 1:1 or 1..1, respectively.
• One-to-many (1:M or 1..*) relationship. A painter creates many different
paintings, but each is painted by only one painter. Thus, the painter (the
“one”) is related to the paintings (the “many”). Therefore, database
designers label the relationship “PAINTER paints PAINTING” as 1:M.
• Many-to-many (M:N or *..*) relationship. An employee may learn many job
skills, and each job skill may be learned by many employees.
• One-to-one (1:1 or 1..1) relationship. A retail company’s management
structure may require that each of its stores be managed by a single
employee.

4. A constraint is a restriction placed on the data. Constraints are important
because they help to ensure data integrity. Constraints are normally
expressed in the form of rules, e.g.:
• Each class must have one and only one teacher.
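
One common way to carry the three relationship types into relational tables is sketched below. The table and column names are hypothetical, and SQLite is used only as a convenient engine: a foreign key on the "many" side gives 1:M, a linking table gives M:N, and a UNIQUE foreign key gives 1:1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- 1:M  PAINTER paints PAINTING: the foreign key sits on the "many" side.
    CREATE TABLE painter (painter_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE painting (
        painting_id INTEGER PRIMARY KEY,
        title       TEXT NOT NULL,
        painter_id  INTEGER NOT NULL REFERENCES painter(painter_id)
    );

    -- M:N  EMPLOYEE learns SKILL: a linking table holds the pairs.
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE skill (skill_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE employee_skill (
        employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
        skill_id    INTEGER NOT NULL REFERENCES skill(skill_id),
        PRIMARY KEY (employee_id, skill_id)
    );

    -- 1:1  EMPLOYEE manages STORE: UNIQUE stops one employee managing two stores.
    CREATE TABLE store (
        store_id   INTEGER PRIMARY KEY,
        manager_id INTEGER NOT NULL UNIQUE REFERENCES employee(employee_id)
    );
    """
)

conn.execute("INSERT INTO painter VALUES (1, 'Ravi Varma')")
conn.executemany(
    "INSERT INTO painting VALUES (?, ?, ?)",
    [(1, "Sunrise", 1), (2, "Harvest", 1)],
)
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "John"), (2, "Mohan")])
conn.executemany("INSERT INTO skill VALUES (?, ?)", [(1, "SQL"), (2, "Welding")])
conn.executemany(
    "INSERT INTO employee_skill VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)]
)
conn.execute("INSERT INTO store VALUES (1, 1)")

paintings_by_painter = conn.execute(
    "SELECT COUNT(*) FROM painting WHERE painter_id = 1"
).fetchone()[0]
employees_with_sql = conn.execute(
    "SELECT COUNT(*) FROM employee_skill WHERE skill_id = 1"
).fetchone()[0]

try:
    conn.execute("INSERT INTO store VALUES (2, 1)")  # same manager, second store
    second_store_rejected = False
except sqlite3.IntegrityError:
    second_store_rejected = True

print(paintings_by_painter, employees_with_sql, second_store_rejected)
```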

15. BUSINESS RULE


A business rule is a brief, precise, and unambiguous description of a policy,
procedure, or principle within a specific organization.
To be effective, business rules must be easy to understand. Business rules
describe, in simple language, the main and distinguishing characteristics of
the data as viewed by the company. Examples of business rules are as follows:
• A customer may generate many invoices.
• An invoice is generated by only one customer.
• A training session cannot be scheduled for fewer than 10 employees or for
more than 30 employees.

Discovering Business Rules
The process of identifying and documenting business rules is essential to
database design for several reasons:
• They help to standardize the company’s view of data.
• They can be a communications tool between users and designers.
• They allow the designer to understand the nature, role, and scope of the
data.
• They allow the designer to understand business processes.
• They allow the designer to develop appropriate relationship participation
rules and constraints and to create an accurate data model.

Translating Business Rules into Data Model Components
Business rules set the stage for the proper identification of entities,
attributes, relationships, and constraints.
As a general rule, a noun in a business rule will translate into an entity in
the model, and a verb (active or passive) that associates the nouns will
translate into a relationship among the entities.
For example, the business rule “a customer may generate many invoices”
contains two nouns (customer and invoices) and a verb (generate) that
associates the nouns.

Naming Conventions
During the translation of business rules to data model components, you
identify entities, attributes, relationships, and constraints.
This identification process includes naming the object in a way that makes it
unique and distinguishable from other objects in the problem domain.
Therefore, it is important to pay special attention to how you name the
objects you are discovering.
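
As a rough illustration of the translation step, the three example rules can be written as table definitions. The nouns become the hypothetical tables `customer`, `invoice`, and `training_session`, the verb "generate" becomes a foreign key, and the 10 to 30 limit becomes a CHECK constraint. SQLite and the helper `try_insert` are assumptions of this sketch, not part of the rules themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- "An invoice is generated by only one customer": one mandatory customer_id.
    CREATE TABLE invoice (
        invoice_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      REAL NOT NULL
    );

    -- "Not fewer than 10 and not more than 30 employees per training session".
    CREATE TABLE training_session (
        session_id INTEGER PRIMARY KEY,
        enrolled   INTEGER NOT NULL CHECK (enrolled BETWEEN 10 AND 30)
    );
    """
)


def try_insert(sql, params):
    """Return True if the DBMS accepts the row, False if a rule rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


conn.execute("INSERT INTO customer VALUES (1, 'Kumar')")
invoice_sql = "INSERT INTO invoice VALUES (?, ?, ?)"
session_sql = "INSERT INTO training_session VALUES (?, ?)"

results = {
    "first invoice": try_insert(invoice_sql, (100, 1, 250.0)),
    "second invoice, same customer": try_insert(invoice_sql, (101, 1, 90.0)),
    "invoice without a customer": try_insert(invoice_sql, (102, None, 10.0)),
    "session of 5": try_insert(session_sql, (1, 5)),
    "session of 20": try_insert(session_sql, (2, 20)),
    "session of 31": try_insert(session_sql, (3, 31)),
}
print(results)
```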

The Evolution of Data Models

16. TYPES OF DATA MODEL

1. HIERARCHICAL MODEL

• The hierarchical model was developed in the 1960s to manage large amounts
of data for complex manufacturing projects, such as the Apollo rocket that
landed on the moon in 1969.
• The hierarchical structure contains levels, or segments.
• A segment is the equivalent of a file system’s record type. Within the
hierarchy, a higher layer is perceived as the parent of the segment directly
beneath it, which is called the child.
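
The parent and child structure can be pictured with a small tree in Python. This is only a sketch of the idea; the `Segment` class and the segment names are hypothetical. Each segment has exactly one parent, so every record is reached by a single path from the root.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One segment in the hierarchy; it may own any number of child segments."""
    name: str
    children: list = field(default_factory=list)

    def add_child(self, name):
        child = Segment(name)
        self.children.append(child)
        return child


def paths(segment, prefix=()):
    """Yield the path from the root to every segment, parents before children."""
    here = prefix + (segment.name,)
    yield here
    for child in segment.children:
        yield from paths(child, here)


# Hypothetical manufacturing hierarchy.
root = Segment("Final assembly")
component_a = root.add_child("Component A")
root.add_child("Component B")
component_a.add_child("Part A1")

all_paths = list(paths(root))
for path in all_paths:
    print(" > ".join(path))
```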

2. NETWORK MODEL
The network model was created to represent complex data relationships more
effectively than the hierarchical model, to improve database performance, and
to impose a database standard. Unlike the hierarchical model, the network
model allows a record to have more than one parent.
• The schema is the conceptual organization of the entire database as viewed
by the database administrator.
• The subschema defines the portion of the database “seen” by the
application programs that actually produce the desired information from the
data within the database.
• A data manipulation language (DML) defines the environment in which data
can be managed and is used to work with the data in the database.
• A schema data definition language (DDL) enables the database administrator
to define the schema components.
3. THE RELATIONAL MODEL
• The relational model was introduced in 1970 by E. F. Codd of IBM in his
landmark paper “A Relational Model of Data for Large Shared Data Banks”.
• Each row in a relation is called a tuple. Each column represents an
attribute. The relational model also describes a precise set of data
manipulation constructs based on advanced mathematical concepts.
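
A relation can be pictured as a set of tuples that share the same attributes. The sketch below models a hypothetical STUDENT relation in plain Python and implements two of the classic relational operations, selection (choose rows) and projection (choose columns). It illustrates the idea only and is not how a DBMS is built.

```python
from typing import NamedTuple


class Student(NamedTuple):
    """One tuple (row); the field names are the attributes (columns)."""
    roll_no: int
    name: str
    city: str


# A relation is a set of tuples: no duplicates, no inherent order.
relation = {
    Student(1, "Anu", "Salem"),
    Student(2, "Bala", "Erode"),
    Student(3, "Chitra", "Salem"),
}


def select(rel, predicate):
    """Selection: keep only the tuples that satisfy the predicate."""
    return {t for t in rel if predicate(t)}


def project(rel, *attributes):
    """Projection: keep only the named attributes; duplicates collapse."""
    return {tuple(getattr(t, a) for a in attributes) for t in rel}


from_salem = select(relation, lambda t: t.city == "Salem")
cities = project(relation, "city")

print(sorted(t.name for t in from_salem))
print(sorted(cities))
```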

4. THE ENTITY RELATIONSHIP MODEL
• ER models are normally represented in an entity relationship diagram
(ERD), which uses graphical representations to model database components.
• The ER model is based on the following components:

Entity.
• An entity is represented in the ERD by a rectangle, also known as an
entity box. The name of the entity, a noun, is written in the center of the
rectangle.
• The entity name is generally written in capital letters and in singular
form: PAINTER rather than PAINTERS, and EMPLOYEE rather than EMPLOYEES.
• Each entity consists of a set of attributes that describes particular
characteristics of the entity.
• For example, the entity EMPLOYEE will have attributes such as a Social
Security number, a last name, and a first name.

Relationships.
• Relationships describe associations among data. Most relationships
describe associations between two entities. When the basic data model
components were introduced, three types of data relationships were
illustrated: one-to-many (1:M), many-to-many (M:N), and one-to-one (1:1).
• The ER model uses the term connectivity to label the relationship types.
The name of the relationship is usually an active or passive verb. For
example, a PAINTER paints many PAINTINGs, an EMPLOYEE learns many SKILLs, and
an EMPLOYEE manages a STORE.

5. THE OBJECT-ORIENTED (OO) MODEL
• In the object-oriented data model (OODM), both data and their
relationships are contained in a single structure known as an object. In
turn, the OODM is the basis for the object-oriented database management
system (OODBMS).
• An object is an abstraction of a real-world entity.
• Attributes describe the properties of an object. For example, a PERSON
object includes the attributes Name, Social Security Number, and Date of
Birth.
• Objects that share similar characteristics are grouped in classes. A class
is a collection of similar objects with shared structure (attributes) and
behavior (methods).
• Classes are organized in a class hierarchy.
• Inheritance is the ability of an object within the class hierarchy to
inherit the attributes and methods of the classes above it.
• Object-oriented data models are typically depicted using Unified Modeling
Language (UML) class diagrams.
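
The terms object, class, attribute, method, and inheritance map directly onto an object-oriented language. A minimal Python sketch, assuming a hypothetical `Employee` class below `Person` in the class hierarchy and an illustrative `age_on` method:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Person:
    """A class: shared structure (attributes) and behavior (methods)."""
    name: str
    date_of_birth: date

    def age_on(self, day: date) -> int:
        """Method: completed years of age on the given day."""
        years = day.year - self.date_of_birth.year
        if (day.month, day.day) < (self.date_of_birth.month, self.date_of_birth.day):
            years -= 1  # birthday not yet reached in that year
        return years


@dataclass
class Employee(Person):
    """Sits below Person in the hierarchy and inherits its attributes and methods."""
    employee_id: int


emp = Employee("Anu", date(2000, 5, 20), 7)  # an object (instance) of Employee
age = emp.age_on(date(2024, 5, 19))          # inherited method, one day before the birthday
print(emp.name, emp.employee_id, age, isinstance(emp, Person))
```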

6. OBJECT/RELATIONAL AND XML
• Facing the demand to support more complex data representations,
relational database vendors extended the relational model, creating the
extended relational data model (ERDM).
• The ERDM adds many of the OO model’s features within the inherently
simpler relational database structure.
7. Emerging Data Models: Big Data and NoSQL
• Big Data refers to a movement to find new and better ways to manage large
amounts of web and sensor-generated data and derive business insight from
it, while simultaneously providing high performance and scalability at a
reasonable cost.
8. NOSQL DATABASES
• NoSQL refers to a new generation of databases that address the specific
challenges of the Big Data era.
• They are not based on the relational model and SQL, hence the name NoSQL.
• They support distributed database architectures.
• They provide high scalability, high availability, and fault tolerance.
• They support very large amounts of sparse data.
• They are geared toward performance rather than transaction consistency.
17. CHARACTERISTICS OF BIG DATA DATABASES:
Big Data databases are characterized by volume, velocity, and variety, known
as the 3 Vs.
1. Volume refers to the amounts of data being stored. With the adoption and
growth of the Internet and social media, companies have multiplied the ways
to reach customers.
2. Velocity refers not only to the speed with which data grows but also to the
need to process these data quickly in order to generate information and
insight. The velocity of data growth is also due to the increase in the
number of different data streams from which data is being piped to the
organization (via the web, e-commerce, Tweets, Facebook posts, emails,
sensors, GPS, and so on).
3. Variety refers to the fact that the data being collected comes in multiple
different data formats.

Some of the most frequently used Big Data technologies are Hadoop,
MapReduce, and NoSQL databases.
• Hadoop is a Java based, open source, high speed, fault-tolerant
distributed storage and computational framework. Hadoop has several modules,
but the two main components are Hadoop Distributed File System (HDFS) and
MapReduce.
• Hadoop Distributed File System (HDFS) is a highly distributed,
fault-tolerant file storage system designed to manage large amounts of data
at high speeds.
• MapReduce is an open source application programming interface (API) that
provides fast data analytics services. MapReduce distributes the processing
of the data among thousands of nodes in parallel.
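
The MapReduce idea can be sketched in a few lines of plain Python. This is a single-process simulation of the map, shuffle, and reduce steps for a word count; it does not use the Hadoop API, and the function names and sample documents are assumptions for illustration. In a real cluster the map and reduce calls would run on many nodes in parallel.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


documents = ["big data needs big storage", "data grows fast"]

# Each document could be mapped on a different node; here it is a simple loop.
mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(word_counts)
```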
18. DEGREES OF DATA ABSTRACTION
• In the early 1970s, the American National Standards Institute (ANSI)
Standards Planning and Requirements Committee (SPARC) defined a framework
for data modeling based on degrees of data abstraction.
• The resulting ANSI/SPARC architecture defines three levels of data
abstraction: external, conceptual, and internal.

1. EXTERNAL MODEL

• The external model is the end users’ view of the data environment. The
term end users refers to people who use the application programs to
manipulate the data and generate information.
• A specific representation of an external view is known as an external
schema.

Entity relationship
• A PROFESSOR may teach many CLASSes, and each CLASS is taught by only one
PROFESSOR; there is a 1:M relationship between PROFESSOR and CLASS.
• A CLASS may ENROLL many STUDENTs, and each STUDENT may ENROLL in many
CLASSes, thus creating an M:N relationship between STUDENT and CLASS.
• Each COURSE may generate many CLASSes, but each CLASS references a single
COURSE.
• Finally, a CLASS requires one ROOM, but a ROOM may be scheduled for many
CLASSes. That is, each classroom may be used for several classes: one at
9:00 a.m., one at 11:00 a.m., and one at 1:00 p.m., for example. In other
words, there is a 1:M relationship between ROOM and CLASS.

Advantages
• It is easy to identify specific data required to support each business
unit’s operations.
• It makes the designer’s job easy by providing feedback about the model’s
adequacy.
• It helps to ensure security constraints in the database design. Damaging
an entire database is more difficult when each business unit works with only
a subset of data.
• It makes application program development much simpler.
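
In a relational DBMS, an external view is often provided to one group of users as an SQL view. The sketch below is an assumption-laden illustration: the tables `professor` and `class_section` (standing in for CLASS), the `timetable` view, and the sample rows are hypothetical. The view shows class titles, rooms, and professor names, while the salary column stays hidden from that group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE professor (
        prof_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        salary  INTEGER NOT NULL
    );
    CREATE TABLE class_section (
        class_id INTEGER PRIMARY KEY,
        title    TEXT NOT NULL,
        room     TEXT NOT NULL,
        prof_id  INTEGER NOT NULL REFERENCES professor(prof_id)
    );

    -- External view for timetable users: only a subset of the stored data.
    CREATE VIEW timetable AS
        SELECT c.title, c.room, p.name AS professor
        FROM class_section c
        JOIN professor p ON p.prof_id = c.prof_id;
    """
)

conn.execute("INSERT INTO professor VALUES (1, 'Dr. Rao', 90000)")
conn.executemany(
    "INSERT INTO class_section VALUES (?, ?, ?, ?)",
    [(10, "DBMS", "R101", 1), (11, "Networks", "R102", 1)],
)

cursor = conn.execute("SELECT * FROM timetable ORDER BY title")
view_columns = [column[0] for column in cursor.description]
timetable = cursor.fetchall()
print(view_columns)
print(timetable)
```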
2. CONCEPTUAL MODEL
• The conceptual model represents a global view of the entire database by
the entire organization. That is, the conceptual model integrates all
external views (entities, relationships, constraints, and processes) into a
single global view of the data in the enterprise.
• Also known as a conceptual schema, it is the basis for the identification
and high-level description of the main data objects (avoiding any database
model-specific details).

ADVANTAGES
• First, it provides a bird’s-eye (macro level) view of the data environment
that is relatively easy to understand.
• Second, the conceptual model is independent of both software and
hardware.
• Software independence means that the model does not depend on the DBMS
software used to implement the model.
• Hardware independence means that the model does not depend on the hardware
used in the implementation of the model.
3. INTERNAL MODEL
• Once a specific DBMS has been selected, the internal model maps the
conceptual model to the DBMS. The internal model is the representation of
the database as “seen” by the DBMS.
• In other words, the internal model requires the designer to match the
conceptual model’s characteristics and constraints to those of the selected
implementation model.
• An internal schema depicts a specific representation of an internal model,
using the database constructs supported by the chosen database.

4. PHYSICAL MODEL
• The physical model operates at the lowest level of abstraction, describing
the way data are saved on storage media such as magnetic, solid state, or
optical media.
• The physical model requires the definition of both the physical storage
devices and the (physical) access methods required to reach the data within
those storage devices, making it both software and hardware dependent.