Chapter 2: Database Management System
2.1. Introduction
For financial and/or legal reasons, organizations collect and store vast amounts of data about employees, customers, finances, vendors, inventory, competitors, and markets, to name only a few. The availability of data matters because people generally make better decisions when more relevant data is available to them.
Honelign,2012 1
Data cannot be understood until it is analyzed. As the manager begins to process and analyze the data, it eventually begins to tell a story. A computer cannot process data unless it is organized in special ways: into characters, fields, records, files, and databases.
Character
A character is the most basic element of data that can be observed and manipulated, e.g., a letter, a digit, or a symbol such as $, #, or ?.
Attribute/Field
A field contains an item of data, that is, a character or a group of related characters. For instance, the related text characters "John Smith" make up a name in the name field.
An attribute is a descriptive property of an entity. Synonyms include data element, property, and field. Generally there are four types of fields: primary key, secondary key, foreign key, and descriptive (non-key) fields.
A primary key is the attribute, or combination of attributes, that uniquely identifies a specific row in a table. A secondary key is an alternative identifier for a database. It may identify either a single record (as a primary key does) or a subset of records. A foreign key is an attribute in one table that is the primary key of another table. Foreign keys are used to link tables.
Foreign keys are pointers to the records of a different file in a database. A foreign key in one file requires the existence of the corresponding primary key in the other table or file; otherwise it does not point to anything. A descriptive field is any other non-key field that stores business data.
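The key types above can be illustrated with a minimal sketch in Python using the standard-library sqlite3 module. The table and column names (customer, sales_order, cust_no, city) and the sample rows are hypothetical. The sketch shows a primary key, a descriptive field, an index on a secondary key, and a foreign key that SQLite rejects when it would point to a customer that does not exist.

```python
import sqlite3

# In-memory SQLite database used purely for illustration.
conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE customer (
        cust_no  INTEGER PRIMARY KEY,   -- primary key
        name     TEXT NOT NULL,         -- descriptive (non-key) field
        city     TEXT                   -- descriptive field used as a secondary key
    )""")
conn.execute("""
    CREATE TABLE sales_order (
        order_no INTEGER PRIMARY KEY,
        cust_no  INTEGER NOT NULL REFERENCES customer(cust_no),  -- foreign key
        amount   REAL
    )""")
# Index on the secondary key so that a subset of records can be found quickly.
conn.execute("CREATE INDEX idx_customer_city ON customer(city)")

conn.execute("INSERT INTO customer VALUES (1, 'John Smith', 'Adama')")
conn.execute("INSERT INTO sales_order VALUES (100, 1, 250.0)")  # points to an existing customer

try:
    # Customer 99 does not exist, so this foreign key would point to nothing.
    conn.execute("INSERT INTO sales_order VALUES (101, 99, 75.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)  # True
```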
Record
A record is composed of a group of related fields. A record contains a collection of attributes related to an entity such as a person or product. For example, a payroll record would contain an employee's name, address, social security number, and title.
Database File
A database file is a collection of related records. A database file is sometimes called a table. A file may be composed of a complete list of individuals on a mailing list, including their addresses and telephone numbers. Files are frequently categorized by the purpose or application for which they are intended. Common examples include mailing lists, quality control files, inventory files, and document files.
Database
A database is composed of related files that are consolidated, organized, and stored together. One collection of related files might pertain to employee information. Another collection of related files might contain sports statistics.
Database Management System
A database management system (DBMS) is a software package that enables users to edit, link, and update files as needs dictate. Database management systems are used to access and manipulate data in a database.
There are two basic approaches to data management: the flat-file approach and the database approach.
2.2. Flat-File Versus Database Environments
In a flat-file approach, each user group owns its data, and that data is not usually available to others, even within the organization. Thus, the same data element may be duplicated across many user files. This is called data redundancy.
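Figure: Flat-File Environment. User 1, User 2, and User 3 each submit transactions to their own program (Program 1, 2, and 3), and each program uses its own data file: A, B, C; X, B, Y; and L, B, M respectively. Data element B is stored in all three files.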
Data Redundancy and Flat-File Problems
Data storage - creates excessive storage costs for paper documents and/or magnetic media.
Data updating - any changes or additions must be performed multiple times.
Currency of information - there is a potential problem of failing to update all affected files.
Task-data dependency - the user is unable to obtain additional information as his or her needs change.
Database Approach
The database approach pools the data of all user groups into a common database that is shared by all users.
Advantages of the Database Approach
Data sharing through a centralized database resolves the flat-file problems:
No data redundancy - data is stored only once, eliminating data redundancy and reducing storage costs.
Single update - because data is in only one place, it requires only a single update procedure, reducing the time and cost of keeping the database current.
Current values - a change to the database made by any user yields current data values for all other users.
Task-data independence - as users' information needs expand beyond their immediate domain, the new needs can be satisfied more easily than under the flat-file approach.
Disadvantages of the Database Approach
Can be costly to implement: additional hardware, software, storage, and network resources are required.
Can only run in certain operating environments, which may make it unsuitable for some system configurations.
Because it is so different from the file-oriented approach, the database approach requires training users, and there may be inertia or resistance to change.
2.3. Elements of the Database Approach
The four major elements of the database approach are users, the database management system (DBMS), the database administrator (DBA), and the physical database.
2.3.1. Users
Users access data in two ways: via user programs that send data access requests to the DBMS, and through direct query, which requires no formal user program.
2.3.2. DBMS
The purpose of the DBMS is to provide controlled access to the database. The DBMS is a special software system programmed to know which data elements each user is authorized to access and to deny unauthorized requests for data.
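A minimal sketch of the authorization idea in Python, assuming a hypothetical table that maps each user to the data elements he or she may access. The user names and data element names are illustrative only; a real DBMS keeps this information in its own catalog and enforces it internally.

```python
# Hypothetical authorization table: user -> data elements the user may access.
AUTHORIZED = {
    "payroll_clerk": {"emp_name", "hourly_rate"},
    "sales_clerk": {"cust_name", "credit_limit"},
}


def request_data(user, element):
    """Grant the request only if the element is in the user's authorized set."""
    if element not in AUTHORIZED.get(user, set()):
        raise PermissionError(f"{user} is not authorized to access {element}")
    return f"{element} granted to {user}"


print(request_data("payroll_clerk", "hourly_rate"))  # hourly_rate granted to payroll_clerk
try:
    request_data("sales_clerk", "hourly_rate")
except PermissionError as err:
    print(err)  # sales_clerk is not authorized to access hourly_rate
```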
Typical DBMS features are:
Program development - the DBMS contains application development software.
Backup and recovery - the DBMS periodically makes backup copies of the database.
Database usage reporting - captures statistics on database usage (who, when, etc.).
Database access - authorizes access to sections of the database. Database access is facilitated by three software modules: the data definition language (DDL), the data manipulation language (DML), and the query language.
Data Definition Language (DDL)
DDL is a programming language used to define the database to the DBMS. The DDL identifies the names and the relationships of all data elements, records, and files that constitute the database. The DDL defines the database on three levels called views:
Internal view - presents the physical arrangement of records. There is only one internal view of the database.
Conceptual view - a logical and abstract representation of the database. There is only one conceptual view of the database.
User view - defines how a particular user sees the portion of the database that the user is permitted to view. There are many user views of a database.
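A minimal sketch, assuming SQLite as the DBMS and Python's sqlite3 module as the interface: CREATE TABLE plays the role of the DDL for the conceptual view, and CREATE VIEW defines one user view that exposes only part of the data. The internal view (how SQLite lays records out on disk) is managed by the DBMS itself and is not written by the user here. The employee table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual view: one logical definition of the whole employee table.
conn.execute("""
    CREATE TABLE employee (
        emp_no      INTEGER PRIMARY KEY,
        name        TEXT,
        dept        TEXT,
        hourly_rate REAL
    )""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Abebe", "Sales", 12.5), (2, "Sara", "Payroll", 14.0)],
)

# User view: a named subset for the sales group that hides dept and hourly_rate.
conn.execute(
    "CREATE VIEW sales_staff AS "
    "SELECT emp_no, name FROM employee WHERE dept = 'Sales'"
)

rows = conn.execute("SELECT * FROM sales_staff").fetchall()
print(rows)  # [(1, 'Abebe')]
```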
Data Manipulation Language (DML)
DML is the proprietary programming language that a particular DBMS uses to retrieve, process, and store data. Entire user programs may be written in the DML, or selected DML commands can be inserted into programs written in general-purpose languages such as COBOL and FORTRAN.
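To illustrate DML commands embedded in a host program, the sketch below uses Python in place of COBOL or FORTRAN and SQLite's SQL as the DML. The customer table, the balance column, and the post_receipt function are hypothetical.

```python
import sqlite3


def post_receipt(conn, cust_no, amount):
    """Host-language routine with embedded DML commands (UPDATE, SELECT)."""
    conn.execute(
        "UPDATE customer SET balance = balance - ? WHERE cust_no = ?",
        (amount, cust_no),
    )
    conn.commit()
    (balance,) = conn.execute(
        "SELECT balance FROM customer WHERE cust_no = ?", (cust_no,)
    ).fetchone()
    return balance


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO customer VALUES (1, 500.0)")
print(post_receipt(conn, 1, 120.0))  # 380.0
```

The `?` placeholders pass values separately from the SQL text, which is the usual way to embed DML in a host program.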
Query Language
The query capability permits end users and professional programmers to access data in the database without the need for conventional programs. ANSI's Structured Query Language (SQL) is a fourth-generation language that has emerged as the standard query language. SQL is a nonprocedural language with many commands that allow users to input, retrieve, and modify data easily. The SELECT command is a powerful tool for retrieving data.
SQL is an efficient data processing tool that requires far less training in computer concepts and fewer programming skills than many languages. This feature places ad hoc reporting and data processing capability in the hands of the user/manager. By reducing reliance on professional programmers, managers are better able to deal with problems as they arise. The example in the next figure illustrates the use of the SELECT command to produce a user report from a database called Inventory.
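As a stand-in for the Inventory report figure, here is a minimal sketch of a SELECT report run through Python's sqlite3 module. The Inventory columns (item_no, description, qty_on_hand, reorder_pt) and the rows are assumptions for illustration, not the figure's actual contents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Inventory table; column names are assumptions.
conn.execute("""
    CREATE TABLE Inventory (
        item_no     INTEGER PRIMARY KEY,
        description TEXT,
        qty_on_hand INTEGER,
        reorder_pt  INTEGER
    )""")
conn.executemany(
    "INSERT INTO Inventory VALUES (?, ?, ?, ?)",
    [(1, "Bolt", 40, 100), (2, "Nut", 500, 200), (3, "Washer", 15, 50)],
)

# Ad hoc report: items whose stock has fallen below the reorder point.
report = conn.execute("""
    SELECT item_no, description, qty_on_hand
    FROM Inventory
    WHERE qty_on_hand < reorder_pt
    ORDER BY item_no
""").fetchall()

for row in report:
    print(row)
# (1, 'Bolt', 40)
# (3, 'Washer', 15)
```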
2.3.3. The Database Administrator
The DBA is responsible for managing the database resource. The sharing of a common database by multiple users requires organization, coordination, rules, and guidelines to protect the integrity of the database. The duties of the DBA fall into the following areas: database planning, database design, database implementation, database operation and maintenance, database change and growth, and the creation and maintenance of the data dictionary.
Functions of the DBA
Organizational Interactions of the DBA
Of particular importance is the relationship among the DBA, the end users, and the systems professionals of the organization. As information needs arise, users send formal requests for computer applications to the systems professionals (programmers) of the organization. The requests are handled through formal systems development procedures, which produce the programmed applications.
The user requests also go to the DBA, who evaluates them to determine the user's database needs. Once these are established, the DBA grants the user access authority by programming the user's view (subschema). This relationship is shown as the lines between the user and the DBA and between the DBA and the DDL module in the DBMS. By keeping access authority separate from systems development (application programming), the organization is better able to control and protect the database. Intentional and unintentional attempts at unauthorized access are more likely to be discovered when these two groups work independently.
The Data Dictionary
Another important function of the DBA is the creation and maintenance of the data dictionary. The data dictionary describes every data element in the database. This enables all users (and programmers) to share a common view of the data resource and greatly facilitates the analysis of user needs.
2.3.4. The Physical Database
The physical database is the fourth major element of the database approach and the lowest level of the database. It consists of magnetic spots on magnetic disks. The other levels of the database (for example, the user view, conceptual view, and internal view) are abstract representations of the physical level. At the physical level, the database is a collection of records and files. This section deals with the data structures used in the physical database.
Data structures
Data structures are the bricks and mortar of the database. They allow records to be located, stored, and retrieved, and they enable movement from one record to another.
In general, data structures must support the following file processing operations:
Retrieve a record from the file based on its primary key value
Insert a record into a file
Update a record in the file
Read a complete file of records
Find the next record in a file
Scan a file for records with common secondary keys
Delete a record from a file
Components of data structures
Data structures have two components: organization and access method.
Organization of a file refers to the way records are physically arranged on secondary storage devices. This may be either sequential or random. The records in sequential files are stored in contiguous locations that occupy a specified area of disk space. Records in random files are stored without regard for their physical relationship to other records of the same file.
Access method is the technique used to locate records and to navigate through the database. During data processing, the access method program responds to requests for data from the user's application, and locates and retrieves or stores the record.
Criteria for data structure selection
No single structure is best for all processing tasks. Therefore, the following criteria are used to select a data structure:
Rapid file access and data retrieval
Efficient use of disk storage
High throughput for transaction processing
Protection from data loss
Ease of recovery from system failure
Accommodation of file growth
Four basic data structures
i. Sequential data structures
ii. Indexed data structures
iii. Hashing data structures
iv. Pointer data structures
(See the figures in Data structures-2.3.pptx.)
i. Sequential structure
Also called the sequential access method. Records in the file lie in contiguous storage spaces in a specified sequence arranged by their primary key. Sequential files are simple and easy to process, but the structure does not permit accessing a record directly. Thus, it is efficient only for the following operations:
Read a complete file of records
Find the next record in a file
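A minimal Python sketch, assuming the sequential file is modeled as an in-memory list of (key, name) records sorted by primary key. It counts record reads to show why direct retrieval of one record is slow, while finding the next record is cheap. The records and function names are hypothetical.

```python
# A sequential file modeled as a list of records sorted by primary key.
records = [(101, "Abebe"), (105, "Sara"), (112, "Tesfaye"), (130, "Almaz")]


def find_by_key(seq_file, key):
    """Sequential access: read records from the start until the key is found.
    Returns (record or None, number_of_records_read)."""
    reads = 0
    for record in seq_file:
        reads += 1
        if record[0] == key:
            return record, reads
        if record[0] > key:  # file is in key order, so we can stop early
            break
    return None, reads


def next_record(seq_file, position):
    """Finding the next record is cheap: it is simply the following position."""
    return seq_file[position + 1] if position + 1 < len(seq_file) else None


print(find_by_key(records, 130))  # ((130, 'Almaz'), 4)
```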
ii. Indexed structure
An indexed structure contains both the actual data file and a separate index that is itself a file of record addresses. The index contains the numerical value of the physical disk storage location (cylinder, surface, and record block) for each record in the associated data file. The data file itself may be organized either sequentially or randomly.
Indexed Random File
Records in an indexed random file are dispersed throughout a disk without regard for their physical proximity to other records. A record's physical location is unimportant as long as the operating system software can find it when needed. When a new record is added to the file, the data management software randomly selects a vacant disk location, stores the record, and adds the new address to the index.
The physical organization of the index itself may be either sequential (by key value) or random.
Advantages: indexed random files are efficient in the following single-record processing operations:
Retrieve a record from the file based on its primary key value
Insert a record into a file
Update a record in the file
Scan a file for records with common secondary keys
They also make efficient use of disk storage.
Disadvantage: they are not efficient for operations that involve processing a large portion of a file.
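The following Python sketch models an indexed random file under simplifying assumptions: the "disk" is a fixed list of slots, the index is a dictionary from primary key to slot address, and the class name IndexedRandomFile is hypothetical. Insertion picks a random vacant slot and records its address in the index, as described above.

```python
import random


class IndexedRandomFile:
    """Toy model: records sit in randomly chosen vacant slots of a 'disk',
    and a separate index maps each primary key to its slot address."""

    def __init__(self, n_slots, seed=0):
        self.disk = [None] * n_slots  # simulated disk locations
        self.index = {}               # primary key -> slot address
        self.rng = random.Random(seed)

    def insert(self, key, data):
        vacant = [i for i, slot in enumerate(self.disk) if slot is None]
        if not vacant:
            raise RuntimeError("disk full")
        address = self.rng.choice(vacant)  # random vacant location
        self.disk[address] = (key, data)
        self.index[key] = address          # add the new address to the index
        return address

    def retrieve(self, key):
        address = self.index.get(key)      # one index lookup, then one disk read
        return None if address is None else self.disk[address]

    def update(self, key, data):
        self.disk[self.index[key]] = (key, data)


f = IndexedRandomFile(n_slots=10)
f.insert(101, "Abebe")
f.insert(105, "Sara")
print(f.retrieve(105))  # (105, 'Sara')
```

Reading the whole file in key order would require sorting the index and jumping between scattered slots, which reflects the disadvantage noted above.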
Indexed Sequential Files
An indexed sequential file uses an index in conjunction with a sequential file organization. It allows both direct access to individual records and batch processing of the entire file. An example is the indexed sequential access method (ISAM). The ISAM structure is used for very large files that require routine batch processing and a moderate degree of individual record processing. ISAM is moderately effective for the following operations:
Retrieve a record from the file based on its primary key value
Update a record in the file
Direct access speed is sacrificed to achieve very efficient performance in the following operations:
Read a complete file of records
Find the next record in a file
Scan a file for records with common secondary keys
Disadvantage: ISAM is inefficient for record insertion. This problem can be resolved by storing new records in an overflow area that is physically separate from the other data records in the file.
An ISAM file has three physical components: the indexes, the prime data storage area, and the overflow area.
ISAM is a popular option for large and stable files that need both direct access and batch processing, but not for highly volatile files.
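A toy Python model of ISAM, assuming a sparse index that stores the highest key in each block of the sorted prime area, and an unsorted overflow list that receives inserted records. The class ISAMFile and its block size are hypothetical; real ISAM implementations typically keep cylinder and track indexes on disk.

```python
import bisect


class ISAMFile:
    """Toy ISAM: a sorted prime data area split into blocks, a sparse index
    holding the highest key in each block, and a separate overflow area."""

    def __init__(self, sorted_records, block_size=3):
        self.blocks = [sorted_records[i:i + block_size]
                       for i in range(0, len(sorted_records), block_size)]
        self.index = [block[-1][0] for block in self.blocks]  # highest key per block
        self.overflow = []                                    # new records go here

    def retrieve(self, key):
        b = bisect.bisect_left(self.index, key)  # first block whose max key >= key
        if b < len(self.blocks):
            for k, data in self.blocks[b]:       # short scan within one block
                if k == key:
                    return data
        for k, data in self.overflow:            # fall back to the overflow area
            if k == key:
                return data
        return None

    def insert(self, key, data):
        # The prime area is not reorganized; the record is placed in overflow.
        self.overflow.append((key, data))

    def read_all(self):
        """Batch processing: the whole file in key order (prime + overflow)."""
        prime = [rec for block in self.blocks for rec in block]
        return sorted(prime + self.overflow)


isam = ISAMFile([(10, "a"), (20, "b"), (30, "c"), (40, "d"), (50, "e")], block_size=2)
isam.insert(25, "x")
print(isam.retrieve(30), isam.retrieve(25))  # c x
print([k for k, _ in isam.read_all()])       # [10, 20, 25, 30, 40, 50]
```

The linear scan of the overflow list illustrates why insert-heavy (volatile) files degrade ISAM performance.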
iii. Hashing Structure
A hashing structure employs an algorithm that converts the primary key of a record directly into a storage address. For example, dividing a prime number by the key gives 99997 / 15943 = 6.27215705... The residual (the digits after the decimal point, 27215705) translates to cylinder 272, surface 15, record 705.
Hashing eliminates the need for a separate index. It uses a random file organization, because the process of calculating residuals and converting them into storage locations produces widely dispersed record addresses.
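The division-and-residual example can be reproduced in Python. This sketch assumes, as in the example, that the residual is the first eight digits after the decimal point of prime / key, split into three digits for the cylinder, two for the surface, and three for the record. It uses integer arithmetic to avoid floating-point rounding. The function name hash_address is hypothetical.

```python
PRIME = 99997  # prime number used in the example above


def hash_address(key, prime=PRIME):
    """Divide the prime by the key, keep the first 8 digits after the decimal
    point (the residual), and split them into cylinder/surface/record."""
    residual = (prime % key) * 10**8 // key  # exact integer arithmetic
    cylinder = residual // 100_000           # first 3 digits
    surface = residual // 1_000 % 100        # next 2 digits
    record = residual % 1_000                # last 3 digits
    return cylinder, surface, record


print(hash_address(15943))  # (272, 15, 705)
```

Because different keys can produce the same residual digits, a real hashed file also needs a rule for handling collisions.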
Advantage: access speed for the following operations:
Retrieve a record from the file based on its primary key value
Insert a record into a file
Update a record in the file
Scan a file for records with common secondary keys
Disadvantages:
It does not use storage space efficiently, because some disk locations will never be selected by the algorithm.
Collisions (the reverse of the first problem): two different records may be assigned the same address, which slows down access (see the book, p. 421).
iv. Pointer structures
A pointer structure creates a linked-list file. Records in this type of file are randomly distributed, but pointers provide connections between records in the same file or in different files (see Fig. 9-14, p. 423).
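A minimal Python sketch of a linked-list file, assuming a dictionary stands in for the disk (address to record) and each record stores the address of the next record in key order. The addresses, keys, and function names are hypothetical.

```python
# Simulated disk: address -> (key, data, address of next record or None).
# Records are scattered, but each one points to the next record in key order.
disk = {
    807: (101, "Abebe", 233),
    233: (105, "Sara", 512),
    512: (112, "Tesfaye", None),  # end of the list
}
HEAD = 807  # address of the first record in the list


def traverse(disk, head):
    """Follow the pointers from the head record and return the keys in order."""
    keys, address = [], head
    while address is not None:
        key, _data, next_address = disk[address]
        keys.append(key)
        address = next_address
    return keys


def insert_after(disk, prev_address, new_address, key, data):
    """Insert a record anywhere on disk by adjusting two pointers."""
    prev_key, prev_data, old_next = disk[prev_address]
    disk[new_address] = (key, data, old_next)
    disk[prev_address] = (prev_key, prev_data, new_address)


print(traverse(disk, HEAD))  # [101, 105, 112]
insert_after(disk, 233, 64, 108, "Almaz")
print(traverse(disk, HEAD))  # [101, 105, 108, 112]
```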
Types of Pointers
There are three types of pointers:
1. Physical address pointer
Contains the actual disk storage location (cylinder, surface, and record number), which allows the system to access records directly.
Advantage: speed.
Disadvantages: the pointers must change whenever a related record is moved from one disk location to another, which is a problem when disks are periodically reorganized or copied. Physical pointers also bear no logical relationship to the records they identify.
2. Relative address pointer
Contains the relative position of a record in the file (see Fig. 9-15, p. 424).
3. Logical key pointer
Contains the primary key of the related record. A hashing algorithm converts this key value into the record's physical address.
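The pointer types differ in how much translation is needed to reach a record. This Python sketch assumes fixed-length records of 64 bytes starting at byte offset 4096 and a toy modulo hash; all constants and names are hypothetical. A physical address pointer would store the final offset directly, a relative address pointer stores a position that must be converted, and a logical key pointer stores the key, which must first be hashed.

```python
RECORD_SIZE = 64   # assumed fixed record length in bytes
FILE_START = 4096  # assumed byte offset where the file begins on disk


def from_relative(position):
    """Relative address pointer (1-based position) -> physical byte offset."""
    return FILE_START + (position - 1) * RECORD_SIZE


def from_logical_key(key, n_slots=1000):
    """Logical key pointer: hash the primary key to a relative position,
    then convert that position to a physical offset."""
    position = key % n_slots + 1  # toy hashing algorithm (an assumption)
    return from_relative(position)


print(from_relative(135))        # 12672
print(from_logical_key(15943))   # 64448
```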
2.4. Database Models
What is a data model?
It is an abstract representation of the data about entities and their relationships in an organization. Its purpose is to represent entity attributes in a way that is understandable to users.
Three conceptual models
The hierarchical model and the network model are termed navigational models because they possess explicit links (paths) among data elements. The relational model possesses implicit linkage among data elements.
Database Terms
Entity - anything about which the organization wishes to capture data. Entities may be physical, such as inventories, customers, or employees. They may also be conceptual, such as sales, accounts receivable (AR), or accounts payable (AP).
Data elements (attributes) - the data elements that define an entity. For example, an Employee entity may be defined by the following partial set of attributes: Name, Address, Job Skill, Years of Service, and Hourly Rate of Pay.
Record type - a group of data elements that logically pertain to an entity.
Record associations - the relationships that exist among record types.
Record associations
Record types exist in relation to other record types. This is called an association. There are three types of record associations:
One-to-one (1:1), e.g., an employee record to a year-to-date earnings record
One-to-many (1:M), e.g., a customer record to sales order records
Many-to-many (M:M, a two-way relationship), e.g., inventory records to vendor records
2.4.1. The Hierarchical model
The hierarchical model is constructed of sets of files. Each set contains a parent and a child. A file can be the child in one set and the parent in another set, but it cannot be both within the same set. Files at the same level with the same parent are called siblings. This structure is also called a tree structure. The file at the most aggregated level in the tree is the root segment, and the file at the most detailed level in a particular branch is called a leaf.
The only way to access data at lower levels in the tree is from the root, via the pointers down the navigational path to the desired records; that is, the model allows only one path to any record.
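A minimal Python sketch of navigation in a hierarchical database, assuming a hypothetical customer → sales invoice → line item tree held in nested dictionaries. It shows that an invoice can be reached only by starting at the root and walking down the branches.

```python
# Hypothetical tree: Customer (root) -> Sales Invoice -> Line Item (leaf).
customers = {
    "C1": {"name": "Abebe",
           "invoices": {"INV-1": {"amount": 300, "line_items": ["bolts", "nuts"]}}},
    "C2": {"name": "Sara",
           "invoices": {"INV-2": {"amount": 120, "line_items": ["washers"]}}},
}


def find_line_items(invoice_no):
    """There is no direct entry point to an invoice: start at the root and
    walk down each customer branch until the invoice is found."""
    for cust in customers.values():                 # root segment
        invoice = cust["invoices"].get(invoice_no)  # child segment
        if invoice is not None:
            return invoice["line_items"]            # leaf segment
    return None


print(find_line_items("INV-2"))  # ['washers']
```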
Limitations of the hierarchical data model
The following rules, which govern the hierarchical model, reveal its operating constraints:
1. A parent record may have one or more child records. For example, in the figure above, the customer is the parent of both sales invoice and cash receipts.
2. No child record can have more than one parent. Therefore, a many-to-many record association is impossible (a limitation).
2.4.2. The Network Database Model
The network model allows a child record to have multiple parents; this is its principal distinguishing feature. It allows multiple paths to a single record, so many-to-many record associations are possible (each file in a set can be both parent and child). Navigating an M:M association requires creating a separate link file that contains pointer records in a linked-list structure, along with accounting data.
2.4.3. The Relational Model
The relational model has its own terminology:
Relation - a database table
Attributes - data elements, which form the columns
Tuples (records) - which form the rows
Data - the values at the intersection of rows and columns
A system is relational if it:
1. Represents data in the form of two-dimensional tables, such as the database table called Customer.
2. Supports the relational algebra functions of restrict, project, and join.
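The three relational algebra functions can be sketched in plain Python, treating a table as a list of dictionaries. The customer and invoice tables and the function names are hypothetical; a relational DBMS performs these operations through SQL (WHERE for restrict, the column list for project, and JOIN).

```python
customer = [
    {"cust_no": 1, "name": "Abebe", "city": "Adama"},
    {"cust_no": 2, "name": "Sara", "city": "Hawassa"},
]
invoice = [
    {"inv_no": 10, "cust_no": 1, "amount": 300},
    {"inv_no": 11, "cust_no": 2, "amount": 120},
    {"inv_no": 12, "cust_no": 1, "amount": 80},
]


def restrict(table, predicate):
    """Keep only the rows (tuples) that satisfy the condition."""
    return [row for row in table if predicate(row)]


def project(table, columns):
    """Keep only the named columns (attributes), dropping duplicate rows."""
    result = []
    for row in table:
        new_row = {c: row[c] for c in columns}
        if new_row not in result:
            result.append(new_row)
    return result


def join(left, right, attr):
    """Combine rows of two tables that share the same value of a common attribute."""
    return [{**left_row, **right_row}
            for left_row in left for right_row in right
            if left_row[attr] == right_row[attr]]


big = restrict(invoice, lambda row: row["amount"] > 100)
print([row["inv_no"] for row in big])                                  # [10, 11]
print(project(customer, ["city"]))                                     # [{'city': 'Adama'}, {'city': 'Hawassa'}]
print([row["inv_no"] for row in join(customer, invoice, "cust_no")])  # [10, 12, 11]
```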
Properly designed tables possess the following four characteristics:
1. All occurrences at the intersection of a row and a column are a single value. No multiple values or repeating groups are allowed.
2. All attribute values in any column must be of the same class.
3. Each column in a given table must be uniquely named. However, different tables may contain columns with the same name.
4. Each row in the table must be unique in at least one attribute. This attribute is the primary key.
Data linkage in the relational model
Implicit linkage (absence of explicit pointers): data is presented as a collection of independent tables (there is no tree or network structure). Relations are formed by an attribute common to both tables in the relation (there are no pointers or explicit links).
Advantages of Relational Tables
Removes all three anomalies: the data update anomaly, the data insertion anomaly, and the data deletion anomaly.
Various items of interest (customers, inventory, sales) are stored in separate tables.
Space is used efficiently.
Very flexible: users can form ad hoc relationships.
Foreign Key Assignment
The nature of the association between two tables determines the method used for assigning foreign keys:
1:1 - either table's primary key can be embedded as the foreign key in the other table.
1:M - the primary key of the table on the "one" side is embedded as a foreign key in the table on the "many" side.
M:M - neither primary key is embedded as a foreign key; instead, a separate link table containing the keys of the related tables must be created.
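A minimal SQLite sketch of the M:M rule, using the inventory/vendor association from the record-association examples. The table names, columns, and rows are hypothetical. The link table inventory_vendor holds the keys of both related tables, and its primary key is their combination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE inventory (item_no INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE vendor    (vendor_no INTEGER PRIMARY KEY, name TEXT);
    -- Link table: its primary key is the combination of the two foreign keys.
    CREATE TABLE inventory_vendor (
        item_no   INTEGER REFERENCES inventory(item_no),
        vendor_no INTEGER REFERENCES vendor(vendor_no),
        PRIMARY KEY (item_no, vendor_no)
    );
    INSERT INTO inventory VALUES (1, 'Bolt'), (2, 'Nut');
    INSERT INTO vendor    VALUES (7, 'Acme'), (8, 'Zenith');
    -- Bolt is supplied by both vendors; Acme supplies both items.
    INSERT INTO inventory_vendor VALUES (1, 7), (1, 8), (2, 7);
""")

vendors_of_bolt = conn.execute("""
    SELECT v.name
    FROM vendor v JOIN inventory_vendor iv ON v.vendor_no = iv.vendor_no
    WHERE iv.item_no = 1
    ORDER BY v.name
""").fetchall()
print(vendors_of_bolt)  # [('Acme',), ('Zenith',)]
```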
2.5. Data normalization and its importance
Correctly designed database tables are critical to the success of the DBMS. Poorly designed tables can cause operational problems that restrict, or even deny, users' access to the information they need. Data normalization is a process that promotes effective database design by grouping data attributes into tables that comply with specific conditions.
Importance
Tables that have not been normalized are associated with three types of problems called anomalies: the update anomaly, the insertion anomaly, and the deletion anomaly. Data normalization is important because it makes database tables free from these anomalies.
The Normalization Process
Data normalization is the process of systematically reducing a complex table to a set of simple, efficient tables that meet two conditions:
1. All nonkey attributes in the table are dependent on (defined by) the primary key.
2. All nonkey attributes are independent of the other nonkey attributes.
When these conditions are met, the table in question is in third normal form (3NF).
Accountants and Data Normalization
The update anomaly can generate conflicting and obsolete database values. The insertion anomaly can result in unrecorded transactions and incomplete audit trails. The deletion anomaly can cause the loss of accounting records and the destruction of audit trails. Accountants should have an understanding of the data normalization process and be able to determine whether a database is properly normalized.
Normalization process
Step 1. Identify and remove any repeating groups. Repeating groups are multiple data values at the intersection of a row and a column. When this is done, the table is in first normal form (1NF).
Step 2. Identify and remove any partial dependencies. These are nonkey attributes that depend on (are defined by) only part of the primary key. This condition can exist only when the primary key is a composite key. At this point the table is in second normal form (2NF).
Step 3. Remove any transitive dependencies. These are nonkey attributes that depend on another nonkey attribute in the table. At this point the table is in 3NF (free from all three anomalies).
EXAMPLE: The tables that follow walk through these steps.
Table 1. Unnormalized database of Student Enrollment

Stdnt#  Stdnt   Major  Course    Crse desc  Instr  Office hrs  Loc  Tel no  Grade
86432   Sethi   Acctg  Acct 315  Fin Acct   Ray    9-11        442  8-4545  A
86432   Sethi   Acctg  Acct 324  Mgt Acct   Paul   8-11        448  8-8945  A
86432   Sethi   Acctg  Math 21   Calc       Jones  1-3         323  8-2345  B
86789   Archer  Mgt    Mgt 1     Intr Mgt   Buel   4-5         463  8-3436  C
86789   Archer  Mgt    Hist 1    Us Hist    Patch  9-11        342  8-2378  B
98653   Mills   Acctg  Acct 1    Intr Acct  Ray    9-11        442  8-4545  B
98653   Mills   Acctg  Math 21   Calc       Jones  1-3         323  8-2345  B
98653   Mills   Acctg  Mgt 1     Intr Mgt   Buel   4-5         463  8-3436  C
Table 1 is decomposed into Table 2 (Student) and Table 3 (Course Grade):

Table 2. Student (3NF)

Stdnt#  Stdnt   Major
86432   Sethi   Acctg
86789   Archer  Mgt
98653   Mills   Acctg

Table 3. Course Grade (1NF)

Stdnt#  Course    Crse desc  Instr  Office hrs  Loc  Tel no  Grade
86432   Acct 315  Fin Acct   Ray    9-11        442  8-4545  A
86432   Acct 324  Mgt Acct   Paul   8-11        448  8-8945  A
86432   Math 21   Calc       Jones  1-3         323  8-2345  B
86789   Mgt 1     Intr Mgt   Buel   4-5         463  8-3436  C
86789   Hist 1    Us Hist    Patch  9-11        342  8-2378  B
98653   Acct 1    Intr Acct  Ray    9-11        442  8-4545  B
98653   Math 21   Calc       Jones  1-3         323  8-2345  B
98653   Mgt 1     Intr Mgt   Buel   4-5         463  8-3436  C
Table 3 is decomposed into Table 4 (Student Grade) and Table 5 (Course Instructor):

Table 4. Student Grade (3NF)

Stdnt#  Course    Grade
86432   Acct 315  A
86432   Acct 324  A
86432   Math 21   B
86789   Mgt 1     C
86789   Hist 1    B
98653   Acct 1    B
98653   Math 21   B
98653   Mgt 1     C

Table 5. Course Instructor (2NF)

Course    Crse desc  Instr  Office hrs  Loc  Tel no
Acct 315  Fin Acct   Ray    9-11        442  8-4545
Acct 324  Mgt Acct   Paul   8-11        448  8-8945
Math 21   Calc       Jones  1-3         323  8-2345
Mgt 1     Intr Mgt   Buel   4-5         463  8-3436
Hist 1    Us Hist    Patch  9-11        342  8-2378
Acct 1    Intr Acct  Ray    9-11        442  8-4545
Table 5 is decomposed into Table 6 (Course) and Table 7 (Instructor):

Table 6. Course (3NF)

Course    Crse desc  Instr
Acct 315  Fin Acct   Ray
Acct 324  Mgt Acct   Paul
Math 21   Calc       Jones
Mgt 1     Intr Mgt   Buel
Hist 1    Us Hist    Patch
Acct 1    Intr Acct  Ray

Table 7. Instructor (3NF)

Instr  Office hrs  Loc  Tel no
Ray    9-11        442  8-4545
Paul   8-11        448  8-8945
Jones  1-3         323  8-2345
Buel   4-5         463  8-3436
Patch  9-11        342  8-2378
The final 3NF design therefore consists of Table 2 (Student), Table 4 (Student Grade), Table 6 (Course), and Table 7 (Instructor).
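The decomposition can be checked with a short Python sketch that loads the data of Table 1, tests the functional dependencies behind Steps 2 and 3, and projects Table 1 onto the columns of Tables 2, 4, 6, and 7. The helper names holds and project are hypothetical.

```python
# Data of Table 1 (Unnormalized database of Student Enrollment).
COLS = ["stdnt_no", "stdnt", "major", "course", "crse_desc",
        "instr", "office_hrs", "loc", "tel_no", "grade"]
RAW = [
    (86432, "Sethi", "Acctg", "Acct 315", "Fin Acct", "Ray", "9-11", 442, "8-4545", "A"),
    (86432, "Sethi", "Acctg", "Acct 324", "Mgt Acct", "Paul", "8-11", 448, "8-8945", "A"),
    (86432, "Sethi", "Acctg", "Math 21", "Calc", "Jones", "1-3", 323, "8-2345", "B"),
    (86789, "Archer", "Mgt", "Mgt 1", "Intr Mgt", "Buel", "4-5", 463, "8-3436", "C"),
    (86789, "Archer", "Mgt", "Hist 1", "Us Hist", "Patch", "9-11", 342, "8-2378", "B"),
    (98653, "Mills", "Acctg", "Acct 1", "Intr Acct", "Ray", "9-11", 442, "8-4545", "B"),
    (98653, "Mills", "Acctg", "Math 21", "Calc", "Jones", "1-3", 323, "8-2345", "B"),
    (98653, "Mills", "Acctg", "Mgt 1", "Intr Mgt", "Buel", "4-5", 463, "8-3436", "C"),
]
table1 = [dict(zip(COLS, row)) for row in RAW]


def holds(rows, lhs, rhs):
    """Return True if the attributes in lhs functionally determine those in rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def project(rows, cols):
    """Keep only cols and drop duplicate rows, giving a new table."""
    return sorted({tuple(row[c] for c in cols) for row in rows})


# Step 2 (partial dependency): Course, only part of the key (Stdnt#, Course),
# determines the course description and instructor.
assert holds(table1, ["course"], ["crse_desc", "instr"])
# Step 3 (transitive dependency): Instr, a nonkey attribute, determines
# office hours, location, and telephone number.
assert holds(table1, ["instr"], ["office_hrs", "loc", "tel_no"])

student = project(table1, ["stdnt_no", "stdnt", "major"])               # Table 2
student_grade = project(table1, ["stdnt_no", "course", "grade"])        # Table 4
course = project(table1, ["course", "crse_desc", "instr"])              # Table 6
instructor = project(table1, ["instr", "office_hrs", "loc", "tel_no"])  # Table 7
print(len(student), len(student_grade), len(course), len(instructor))  # 3 8 6 5
```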
Making a Relational Database Using Microsoft Access
2.6. Database control issues
Control techniques for dealing with database exposures fall into two general categories: backup controls and access controls.
Backup controls ensure that a current copy of the database exists at all times. Access controls ensure that only authorized users access the database and that those who do so perform only authorized actions.
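As one example of a backup control, Python's sqlite3 module provides Connection.backup, which copies a live database into another database. The sketch below uses a hypothetical account table and in-memory databases; in practice the backup copy would be written to separate storage media.

```python
import sqlite3

live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE account (acct_no INTEGER PRIMARY KEY, balance REAL)")
live.execute("INSERT INTO account VALUES (1, 1000.0)")
live.commit()

# Backup control: copy the whole live database into a separate backup database.
backup = sqlite3.connect(":memory:")  # in practice, a file on separate media
live.backup(backup)

# Simulated loss of data in the live database.
live.execute("DELETE FROM account")
live.commit()

# Recovery: the backup copy still holds the data.
restored = backup.execute("SELECT * FROM account").fetchall()
print(restored)  # [(1, 1000.0)]
```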