BIDGOLI
7 Revised for HKCC
CCN1007 course
DATABASE SYSTEMS, DATA WAREHOUSES, AND DATA MARTS
Copyright ©2016 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly
accessible website, in whole or in part.
LEARNING OUTCOMES
LO1 Define a database and a database management system.
LO2 Explain logical database design and the relational database model.
LO3 Define the components of a database management system.
LO4 Summarize recent trends in database design and use.
LO5 Explain the components and functions of a data warehouse.
LO6 Describe the functions of a data mart.
MIS7 | CH3
Databases
• Database
‐ Collection of related data that can be stored in a central location or in multiple locations
‐ Usually a group of files
‐ File
- Group of related records
- All files are integrated, so information in different files can be linked
‐ Record
- Group of related fields
• Data hierarchy
‐ Structure and organization of data, which
involves fields, records, and files
Databases (cont’d)
– In a student file, for example, the fields consist of social security number, student name, and address
– All the fields storing information for Mary Smith constitute one record
– The three student records together make up the student file
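The field, record, and file levels of this example can be sketched in Python, with a dataclass standing in for the record layout and a list standing in for the file. Apart from Mary Smith, the student names are hypothetical, and all social security numbers and addresses are placeholders.

```python
from dataclasses import dataclass

# A record groups related fields; each attribute below is one field.
@dataclass
class StudentRecord:
    ssn: str      # social security number (placeholder values)
    name: str     # student name
    address: str  # student address

# A file groups related records. All values are hypothetical.
student_file = [
    StudentRecord("000-00-0001", "Mary Smith", "12 Example Road"),
    StudentRecord("000-00-0002", "John Lee", "34 Sample Street"),
    StudentRecord("000-00-0003", "Ann Wong", "56 Demo Avenue"),
]

# All the fields storing information for Mary Smith form one record.
mary = next(r for r in student_file if r.name == "Mary Smith")
print(mary)
print(len(student_file), "records in the student file")
```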
Databases (cont’d)
• Critical component of information systems
‐ Any analysis performed is based on the data available in the database
• Database management system (DBMS)
‐ Software for creating, storing, maintaining, and accessing database files
‐ Makes using databases more efficient
• “Flat files” system in the past
‐ Data wasn’t arranged in a hierarchy
‐ No relation among the “flat files”
‐ Same data could be stored in more than one file,
creating data redundancy
Databases (cont’d)
• “Flat files” system in the past
‐ Data redundancy takes up unnecessary storage space
‐ Data is not updated consistently in all files, resulting in conflicting reports generated from these files
• Database advantages over a flat file system
‐ Generate more information from the same data
‐ Handle complex requests more easily
‐ Reduce data redundancy
‐ Reduce storage space
‐ Easily maintain relationships among data
‐ More sophisticated security measures
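The update problem with flat files, and how a database avoids it, can be illustrated with a small Python sketch. Two hypothetical flat files (billing and shipping) each keep their own copy of a customer's address, so updating only one copy produces conflicting reports. In the database version, the address is stored once and every report looks it up by customer key. All names and values are hypothetical.

```python
# Flat-file approach: the same address is duplicated in two independent files.
billing_file = [{"customer": "C001", "address": "1 Old Street"}]
shipping_file = [{"customer": "C001", "address": "1 Old Street"}]

# Only the billing file is updated when the customer moves.
billing_file[0]["address"] = "9 New Road"

def flat_file_addresses(customer_id):
    """Collect every address stored for a customer across the flat files."""
    return {
        row["address"]
        for f in (billing_file, shipping_file)
        for row in f
        if row["customer"] == customer_id
    }

# Database approach: the address lives in one place; other data refers to it by key.
customers = {"C001": {"address": "1 Old Street"}}
billing = [{"customer": "C001", "amount": 250}]
shipping = [{"customer": "C001", "item": "Textbook"}]

customers["C001"]["address"] = "9 New Road"  # one update, one copy

def db_addresses(customer_id):
    """Every report reads the address from the single customers table."""
    return {
        customers[row["customer"]]["address"]
        for row in billing + shipping
        if row["customer"] == customer_id
    }

print(flat_file_addresses("C001"))  # two conflicting addresses
print(db_addresses("C001"))         # one consistent address
```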
Exhibit 3.2 Interaction Between the User,
DBMS and Database
i. The user issues a request
ii. DBMS searches the database
iii. DBMS returns the information to the user
Types of Data in a Database
• Internal data
‐ E.g. transaction records, sales records, and so
forth
‐ Collected within organization
‐ Often stored in organization’s internal databases
• External data
‐ Comes from a variety of sources
- E.g. competitors, customers, and suppliers
‐ Often stored in a data warehouse
Logical Database Design
Information in a database can be viewed in two ways:
• Physical view
‐ How data is stored on and retrieved from storage media, such as hard disks and CDs
‐ Only one physical view of data for each database
• Logical view
‐ How information appears to users
‐ How it can be organized and retrieved
‐ Can have more than one logical view of data, depending
on the user
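SQL views are one common way a DBMS offers several logical views of the same stored data. In the sketch below (Python's built-in sqlite3 module; the table, view, and column names are hypothetical), the student data is stored once, and how SQLite lays it out on disk (the physical view) stays hidden from users. Two views then present different logical views to two hypothetical groups of users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One stored table: the single underlying copy of the data.
conn.execute("""CREATE TABLE students (
    student_id INTEGER PRIMARY KEY,
    name TEXT, address TEXT, gpa REAL)""")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [(1, "Mary Smith", "12 Example Road", 3.6),
     (2, "John Lee", "34 Sample Street", 3.1)],
)

# Logical view for a hypothetical registry office: contact details only.
conn.execute("CREATE VIEW registry_view AS SELECT name, address FROM students")
# Logical view for a hypothetical academic advisor: grades only.
conn.execute("CREATE VIEW advisor_view AS SELECT name, gpa FROM students")

print(conn.execute("SELECT * FROM registry_view ORDER BY name").fetchall())
print(conn.execute("SELECT * FROM advisor_view ORDER BY name").fetchall())
```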
Logical Database Design (cont’d)
• Data model
‐ Determines how data is created, represented,
organized and maintained
‐ Usually includes three components:
- Data structure
- Operations
- Integrity rules
Relational Model
• Relational model
‐ Uses a two-dimensional table of rows and columns of
data
‐ Rows are records, and columns are fields (attributes)
• Data dictionary
‐ Stores the definitions of each table and its fields, which make up the logical structure of a relational database
‐ Also stores:
- Field name – e.g. student name, age, admission date
- Field data type – e.g. text, date, and number
- Default value – the value used if none is entered
- Validation rule – a rule that determines whether a value is valid
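Many relational DBMSs record data dictionary entries when a table is defined. The sketch below uses SQLite through Python's sqlite3 module (table and column names are hypothetical; the age range is an assumed rule). It declares field names, data types, a default value, and a CHECK constraint acting as a validation rule, then reads the stored definitions back with SQLite's `PRAGMA table_info`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE students (
    student_id     INTEGER PRIMARY KEY,                    -- field name and data type
    name           TEXT NOT NULL,
    age            INTEGER CHECK (age BETWEEN 15 AND 99),  -- validation rule
    admission_date DATE DEFAULT CURRENT_DATE               -- default value
)""")

# Read the definitions back: (cid, name, type, notnull, default, pk).
for column in conn.execute("PRAGMA table_info(students)"):
    print(column)

# The validation rule rejects an out-of-range age.
try:
    conn.execute("INSERT INTO students (name, age) VALUES ('Mary Smith', 7)")
except sqlite3.IntegrityError as err:
    print("Rejected:", err)
```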
Relational Model (cont’d)
• Primary key
‐ Unique identifier (e.g. student ID) for every record
• Foreign key
‐ Primary key of one table that appears in another table
‐ Establishes relationships between tables (i.e. data can be linked and retrieved across tables)
• Normalization
‐ Process to improve database efficiency
‐ Eliminates redundant data and ensures only related
data is stored in a table
‐ E.g. storing customer names in only one table
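A normalized two-table design can be sketched with SQLite in Python (all table names and values are hypothetical). Customer names are stored only in the customers table, whose primary key `customer_id` appears as a foreign key in the orders table. A join links the tables back together, and SQLite rejects an order that refers to a customer who does not exist. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Customer names are stored once (normalization).
conn.execute("""CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,   -- primary key: unique for every record
    name TEXT NOT NULL)""")

# Orders refer to customers by key instead of repeating the name.
conn.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),  -- foreign key
    amount REAL)""")

conn.execute("INSERT INTO customers VALUES (1, 'Mary Smith')")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 99.5), (11, 1, 20.0)])

# The foreign key lets related data be retrieved across tables.
rows = conn.execute("""
    SELECT o.order_id, c.name, o.amount
    FROM orders o JOIN customers c ON o.customer_id = c.customer_id
    ORDER BY o.order_id""").fetchall()
print(rows)

# An order for a non-existent customer violates the foreign key.
try:
    conn.execute("INSERT INTO orders VALUES (12, 42, 5.0)")
except sqlite3.IntegrityError as err:
    print("Rejected:", err)
```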
Relational Model (cont’d)
• Data stored in a relational model is
retrieved by using operations that pick and
combine data from one or more tables
• Data retrieval operations
‐ Select: searches data in a table and retrieves
records based on certain criteria or conditions
‐ Project: pares down a table by eliminating
columns (fields) according to certain criteria
‐ Join: combines two tables based on a common
field
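The three retrieval operations can be sketched as small Python functions over tables represented as lists of dictionaries (one dictionary per record). This illustrates the operations themselves, not how a DBMS implements them internally; strictly, relational projection also removes duplicate rows, a step this sketch omits. The tables and values are hypothetical.

```python
def select(table, condition):
    """Select: keep only the records (rows) that satisfy the condition."""
    return [row for row in table if condition(row)]

def project(table, fields):
    """Project: keep only the named fields (columns), dropping the rest."""
    return [{f: row[f] for f in fields} for row in table]

def join(left, right, field):
    """Join: combine rows from two tables that share a value in a common field."""
    return [{**l, **r} for l in left for r in right if l[field] == r[field]]

students = [
    {"student_id": 1, "name": "Mary Smith", "major": "MIS"},
    {"student_id": 2, "name": "John Lee", "major": "Finance"},
]
grades = [
    {"student_id": 1, "course": "CCN1007", "grade": "A"},
    {"student_id": 2, "course": "CCN1007", "grade": "B"},
]

mis_students = select(students, lambda r: r["major"] == "MIS")
names_only = project(students, ["name"])
report = project(join(students, grades, "student_id"), ["name", "grade"])
print(mis_students)
print(names_only)
print(report)
```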
Relational Model (cont’d)
• Data retrieval examples
‐ Select operation
‐ Project operation
Relational Model (cont’d)
• Data retrieval example – join operation
Components of a DBMS
Database engine
Data definition
Data manipulation
Application generation
Data administration
Database Engine
• Heart of DBMS software
• Responsible for data storage, manipulation,
and retrieval
• Converts logical requests from users into
their physical equivalents (e.g. reports) by
interacting with other components of the
DBMS
Data Definition
• Create and maintain the data dictionary
• Define the structure of files in a database
• Make changes to a database’s structure,
such as:
‐ Adding fields
‐ Deleting fields
‐ Changing field size
‐ Changing data type
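In SQL-based DBMSs, structural changes like these are made with data definition statements such as `CREATE TABLE` and `ALTER TABLE`. A minimal SQLite sketch of adding a field follows (hypothetical names). SQLite supports only a few `ALTER TABLE` forms, so changing a field's data type there means rebuilding the table; many other DBMSs offer a direct statement, with syntax that varies by product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO students (name) VALUES ('Mary Smith')")

def field_names(table):
    """Read the current structure from SQLite's catalog."""
    return [col[1] for col in conn.execute(f"PRAGMA table_info({table})")]

print(field_names("students"))  # ['student_id', 'name']

# Change the structure: add a field; existing records get the default value.
conn.execute("ALTER TABLE students ADD COLUMN email TEXT DEFAULT 'unknown'")
print(field_names("students"))  # ['student_id', 'name', 'email']
print(conn.execute("SELECT name, email FROM students").fetchall())  # [('Mary Smith', 'unknown')]
```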
Data Manipulation
• Add, delete, modify, and retrieve records from a
database
• A query language is used:
‐ Structured Query Language (SQL)
- Standard fourth-generation query
language used by many DBMS packages
- Uses keywords (e.g. the SELECT
statement) to specify actions to take
‐ Query by example (QBE)
- Construct a statement made up of query
forms
- Graphical interface
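The four data manipulation tasks map onto SQL's `INSERT`, `UPDATE`, `DELETE`, and `SELECT` statements. The sketch below runs them through Python's sqlite3 module against an in-memory database; the table and values are hypothetical, and the `?` placeholders let the DBMS insert values safely instead of pasting them into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT, major TEXT)")

# Add records
conn.executemany("INSERT INTO students (name, major) VALUES (?, ?)",
                 [("Mary Smith", "MIS"), ("John Lee", "Finance"), ("Ann Wong", "MIS")])

# Modify a record
conn.execute("UPDATE students SET major = ? WHERE name = ?", ("MIS", "John Lee"))

# Delete a record
conn.execute("DELETE FROM students WHERE name = ?", ("Ann Wong",))

# Retrieve records that meet a condition
rows = conn.execute(
    "SELECT name FROM students WHERE major = ? ORDER BY name", ("MIS",)
).fetchall()
print(rows)  # [('John Lee',), ('Mary Smith',)]
conn.commit()
```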
Application Generation
• Design elements of an application using a
database, such as:
‐ Data entry screens
‐ Interactive menus
‐ Interfaces with other programming languages
• For example, create a form or generate a report
• Typically used by IT professionals and
database administrators
Data Administration
• Used by IT professionals and database
administrators for:
‐ Backup and recovery
‐ Security
‐ Change management
• Determine who has permission to perform create, read, update, and delete (CRUD) operations
• Database administrator (DBA)
‐ Responsible for database design and management
‐ Can be an individual or a department (for complex databases)
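CRUD permission checks can be sketched with the authorizer hook in Python's sqlite3 module (`Connection.set_authorizer`), which SQLite consults while compiling each statement. The role names and their permissions are hypothetical. Server DBMSs usually express the same idea with user accounts and `GRANT`/`REVOKE` statements, which SQLite itself does not provide.

```python
import sqlite3

# Hypothetical roles and the CRUD operations each may perform.
PERMISSIONS = {
    "dba": {"create", "read", "update", "delete"},
    "clerk": {"create", "read", "update"},
    "analyst": {"read"},
}

# SQLite authorizer action codes mapped to CRUD operations.
ACTION_TO_CRUD = {
    sqlite3.SQLITE_INSERT: "create",
    sqlite3.SQLITE_READ: "read",
    sqlite3.SQLITE_UPDATE: "update",
    sqlite3.SQLITE_DELETE: "delete",
}

def apply_role(conn, role):
    """Install an authorizer that denies CRUD operations the role lacks."""
    allowed = PERMISSIONS[role]

    def authorizer(action, arg1, arg2, db_name, source):
        operation = ACTION_TO_CRUD.get(action)
        if operation is None or operation in allowed:
            return sqlite3.SQLITE_OK
        return sqlite3.SQLITE_DENY

    conn.set_authorizer(authorizer)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO students (name) VALUES ('Mary Smith')")
conn.commit()

apply_role(conn, "analyst")
print(conn.execute("SELECT name FROM students").fetchall())  # reading is allowed
try:
    conn.execute("DELETE FROM students WHERE student_id = 1")
except sqlite3.DatabaseError as err:
    print("Denied:", err)
```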
Recent Trends in Database Design and
Use
• Data‐driven Web sites
• Natural language processing (details will be
covered in Chapter 13)
• Distributed databases
• Client/server databases
Data-Driven Web Sites
• Data-driven Web site
‐ Interface to a database
‐ Retrieves data and allows users to enter data
- i.e. provides dynamic content without requiring changes to the page's HTML code
• Improves access to information
‐ Users' experiences are more interactive
• Useful for:
‐ E-commerce sites that need frequent updates
‐ News sites that need regular updating of content
‐ Forums and discussion groups
‐ Subscription services, such as newsletters
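The central idea (the page's HTML template stays the same while its content comes from the database) can be sketched in Python without a web server. The table, template, and function names are hypothetical; a real site would run similar code inside a web framework for each incoming request.

```python
import sqlite3
from html import escape

# The HTML template never changes; only the data in the database does.
TEMPLATE = "<html><body><h1>Latest News</h1><ul>{items}</ul></body></html>"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE news (id INTEGER PRIMARY KEY, headline TEXT)")

def render_page():
    """Build the page from whatever headlines the database currently holds."""
    rows = conn.execute("SELECT headline FROM news ORDER BY id DESC").fetchall()
    items = "".join(f"<li>{escape(headline)}</li>" for (headline,) in rows)
    return TEMPLATE.format(items=items)

conn.execute("INSERT INTO news (headline) VALUES ('Campus library extends hours')")
print(render_page())

# New data is entered; the next page view shows it with no HTML edits.
conn.execute("INSERT INTO news (headline) VALUES ('Exam timetable released')")
print(render_page())
```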
Distributed Databases
• Distributed database
‐ Data is stored on multiple servers placed throughout an organization (in contrast to a central database for all users)
• Main reasons for choosing
‐ Minimizes the effects of computer failures
‐ Helps reduce communication costs for remote users
‐ Supports distributed processing
‐ Not limited by data’s physical location
• Raises security concerns
‐ Multiple access points from inside and outside the organization
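A toy Python sketch of the idea, under stated assumptions: each site is simulated by its own in-memory SQLite database holding that location's customer records (all site names and data are hypothetical), and a query is sent to every available site and the partial results are merged. When one site is down, the others still answer, which is how distribution limits the effect of a computer failure. Real distributed DBMSs also handle replication, consistency, and distributed transactions, which this sketch omits.

```python
import sqlite3

def make_site(rows):
    """Simulate one server holding the records for its own location."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    return conn

sites = {
    "hong_kong": make_site([("Mary Smith", "Hong Kong")]),
    "london": make_site([("John Lee", "London")]),
    "toronto": make_site([("Ann Wong", "Toronto")]),
}
down = {"london"}  # pretend this server has failed

def query_all_sites(sql):
    """Send the query to every available site and merge the results."""
    results, unavailable = [], []
    for name, conn in sites.items():
        if name in down:
            unavailable.append(name)
            continue
        results.extend(conn.execute(sql).fetchall())
    return results, unavailable

rows, missing = query_all_sites("SELECT name, city FROM customers")
print(rows)     # records from the sites that are still up
print(missing)  # ['london']
```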
Client/Server Databases
• Client/server database
‐ Users’ workstations (clients) linked in a local
area network (LAN) to share the services of a
single server
‐ Clients send requests to the server
‐ Server processes data and returns only the records that meet the request
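This division of labour can be sketched in Python by simulating the network with JSON messages. No real LAN is involved, and the class names, message format, and data are hypothetical. The client sends only a request; the server runs the query against its database and replies with just the matching records, not the whole file.

```python
import json
import sqlite3

class DatabaseServer:
    """Holds the shared database and processes requests from clients."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE students (name TEXT, major TEXT)")
        self.conn.executemany("INSERT INTO students VALUES (?, ?)", [
            ("Mary Smith", "MIS"), ("John Lee", "Finance"), ("Ann Wong", "MIS")])

    def handle(self, message):
        """Decode a request, query the database, reply with matching rows only."""
        request = json.loads(message)
        rows = self.conn.execute(
            "SELECT name FROM students WHERE major = ? ORDER BY name",
            (request["major"],)).fetchall()
        return json.dumps([name for (name,) in rows])

class Client:
    """A user's workstation: builds requests and reads replies."""

    def __init__(self, server):
        self.server = server  # stands in for a LAN connection

    def students_in(self, major):
        reply = self.server.handle(json.dumps({"major": major}))
        return json.loads(reply)

server = DatabaseServer()
print(Client(server).students_in("MIS"))  # ['Ann Wong', 'Mary Smith']
```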
Data Warehouses
• Data warehouse
‐ Collection of data (from a variety of sources) used to
support decision-making applications and generate
business intelligence
• Stores multidimensional data (i.e. hypercubes)
• Characteristics of data stored in a data warehouse
‐ Subject oriented (focused on a specific area)
‐ Integrated (comes from different sources)
‐ Time variant (categorized based on time, i.e. historical)
‐ Type of data (captures aggregated data)
‐ Purpose (for analytical use)
Exhibit 3.6 A Data Warehouse Configuration
Four major components:
i. Input
ii. Extraction, transformation, and loading (i.e. ETL)
iii. Storage
iv. Output
Input
• Data comes from a variety of sources:
‐ External data sources
‐ Databases
‐ Transaction files
‐ ERP (enterprise resource planning) systems
‐ CRM (customer relationship management)
systems
Extraction, Transformation, and Loading (ETL)
• Extraction, transformation, and loading (ETL)
• Extraction
‐ Collecting data from a variety of sources
‐ Converting data into a format that can be used in
transformation processing
• Transformation processing
‐ Ensures that data meets the data warehouse's needs
• Loading
‐ Process of transferring data to the data warehouse
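A minimal ETL pipeline in Python, assuming two hypothetical sources that record sales in different formats: a CSV export from a transaction file, and records from another system with day-first dates and amounts in cents. Extraction reads both, transformation converts them to one common format, and loading inserts the result into a warehouse table (an in-memory SQLite database stands in for the data warehouse).

```python
import csv
import io
import sqlite3
from datetime import datetime

# Source 1: CSV export, ISO dates, amounts in dollars (hypothetical data).
CSV_SOURCE = "date,region,amount\n2016-01-05,North,120.50\n2016-01-06,South,80.00\n"
# Source 2: another system, day/month/year dates, amounts in cents.
OTHER_SOURCE = [{"day": "07/01/2016", "area": "north", "cents": 4500}]

def extract():
    """Collect raw records from every source."""
    return list(csv.DictReader(io.StringIO(CSV_SOURCE))), OTHER_SOURCE

def transform(csv_rows, other_rows):
    """Convert both sources into (ISO date, region, amount in dollars)."""
    clean = [(r["date"], r["region"].title(), float(r["amount"])) for r in csv_rows]
    for r in other_rows:
        iso = datetime.strptime(r["day"], "%d/%m/%Y").date().isoformat()
        clean.append((iso, r["area"].title(), r["cents"] / 100))
    return clean

def load(rows, warehouse):
    """Transfer the cleaned records into the warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS sales (sale_date TEXT, region TEXT, amount REAL)")
    warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

warehouse = sqlite3.connect(":memory:")
load(transform(*extract()), warehouse)
print(warehouse.execute("SELECT * FROM sales ORDER BY sale_date").fetchall())
```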
Storage
Collected information is organized in a data
warehouse as:
• Raw data (information in original form)
• Summary data (subtotals of various
categories)
• Metadata (information about data)
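The three kinds of stored information can be sketched with SQLite in Python (hypothetical tables and values): raw sales records as loaded, summary data computed as subtotals by category with `GROUP BY`, and metadata describing where each table came from and when it was loaded. Keeping metadata in an ordinary table is just one simple option.

```python
import sqlite3

wh = sqlite3.connect(":memory:")

# Raw data: records kept in their original, detailed form.
wh.execute("CREATE TABLE raw_sales (sale_date TEXT, product TEXT, amount REAL)")
wh.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", [
    ("2016-01-05", "Laptop", 900.0),
    ("2016-01-05", "Phone", 400.0),
    ("2016-01-06", "Laptop", 950.0),
])

# Summary data: subtotals for each product category.
wh.execute("""CREATE TABLE sales_by_product AS
              SELECT product, SUM(amount) AS total FROM raw_sales GROUP BY product""")

# Metadata: information about the data itself.
wh.execute("CREATE TABLE metadata (table_name TEXT, source TEXT, loaded_on TEXT)")
wh.executemany("INSERT INTO metadata VALUES (?, ?, ?)", [
    ("raw_sales", "point-of-sale transaction file", "2016-01-07"),
    ("sales_by_product", "derived from raw_sales", "2016-01-07"),
])

print(wh.execute("SELECT * FROM sales_by_product ORDER BY product").fetchall())
# [('Laptop', 1850.0), ('Phone', 400.0)]
```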
Output
• A data warehouse supports different types of analysis
‐ Generates reports for decision making
• Online analytical processing (OLAP)
‐ Generates business intelligence
‐ Uses multiple sources of information and
provides multidimensional analysis
‐ Hypercube (similar to a multidimensional spreadsheet)
‐ Performs trend analysis
‐ Drill down and drill up features for accessing
multilayer information
Exhibit 3.7 Slicing and Dicing Data
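A small hypercube can be sketched in Python as a dictionary keyed by (product, region, year) coordinates, with sales figures as the cells. Slicing fixes one dimension, dicing keeps a sub-cube by restricting several dimensions, and drilling up (rolling up) sums away dimensions; drilling down is the reverse, returning to the more detailed cells. The dimensions and figures are hypothetical.

```python
from collections import defaultdict

# Cells of a 3-D cube: (product, region, year) -> sales.
cube = {
    ("Laptop", "North", 2015): 100, ("Laptop", "South", 2015): 80,
    ("Phone", "North", 2015): 60,   ("Phone", "South", 2015): 90,
    ("Laptop", "North", 2016): 120, ("Laptop", "South", 2016): 70,
    ("Phone", "North", 2016): 75,   ("Phone", "South", 2016): 110,
}

def slice_cube(cube, year):
    """Slice: fix one dimension (year) and keep the remaining 2-D layer."""
    return {(p, r): v for (p, r, y), v in cube.items() if y == year}

def dice(cube, products, regions):
    """Dice: keep a sub-cube limited to chosen members of several dimensions."""
    return {k: v for k, v in cube.items() if k[0] in products and k[1] in regions}

def drill_up(cube, keep):
    """Drill up (roll up): sum over every dimension not listed in keep (indexes)."""
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[tuple(key[i] for i in keep)] += value
    return dict(totals)

print(slice_cube(cube, 2016))
print(dice(cube, {"Phone"}, {"South"}))
print(drill_up(cube, keep=[2]))  # {(2015,): 330, (2016,): 375}
```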
Output (cont’d)
• Data-mining analysis
‐ Discovers patterns and relationships
• Reports for decision making
• A data warehouse allows you to:
‐ Cross-reference segments of an organization’s
operations for comparison purposes
‐ Find patterns and trends that can’t be found
with databases
‐ Analyze large amounts of historical data quickly
Data Mart
• Data mart
‐ Smaller version of data warehouse
‐ Used by single department or function
• Advantages over data warehouses
‐ Better targets users, as it is designed for a specific department or division
‐ Faster data access because of smaller size
‐ Less expensive
• More limited scope than data warehouses
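One simple way to build a data mart, sketched with SQLite in Python: copy from the warehouse only the subset one department needs, here a hypothetical marketing department's sales totals by region. Table and column names are hypothetical. In practice, data marts are often fed by their own ETL process and are sometimes built directly from source systems rather than from a warehouse.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Enterprise-wide warehouse table covering every department (hypothetical data).
db.execute("CREATE TABLE warehouse_sales (department TEXT, region TEXT, product TEXT, amount REAL)")
db.executemany("INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)", [
    ("Marketing", "North", "Laptop", 500.0),
    ("Marketing", "South", "Phone", 300.0),
    ("Marketing", "North", "Phone", 200.0),
    ("Finance", "North", "Laptop", 1000.0),
])

# Data mart: a smaller table holding only what the marketing department uses.
db.execute("""CREATE TABLE marketing_mart AS
              SELECT region, SUM(amount) AS total
              FROM warehouse_sales
              WHERE department = 'Marketing'
              GROUP BY region""")

print(db.execute("SELECT * FROM marketing_mart ORDER BY region").fetchall())
# [('North', 700.0), ('South', 300.0)]
```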
Big Data Era
• Big data: Data so voluminous that conventional computing methods are unable to process and manage it efficiently
• Involves three dimensions, known as the 3 Vs:
- Volume: Quantity of transactions
- Variety: Combination of structured and
unstructured data
- Velocity: Speed with which data needs to be
gathered and processed
Database Marketing
• Database marketing: Using an organization's database of customers and potential customers to promote the products or services that the organization offers.
Database Marketing (cont’d)
• A database marketing campaign:
‐ Calculates customer lifetime value (CLTV)
‐ Conducts a recency, frequency, and monetary
analysis (RFM)
‐ Uses different techniques to communicate
effectively with customers.
‐ Uses different techniques to monitor customers' behavior across a number of retail channels, including the organization's Web site, mobile apps, and social media.
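The slide names these analyses without defining them, so the Python sketch below assumes one common, simplified form of each. Recency is days since the last purchase, frequency is the number of purchases, and monetary value is total spending. Customer lifetime value is taken as average purchase value × purchases per year × an assumed number of years as a customer. All customer data, the analysis date, the one-year observation period, and the three-year lifespan are hypothetical.

```python
from datetime import date

# Purchase history: customer -> list of (purchase date, amount). Hypothetical.
purchases = {
    "C001": [(date(2016, 1, 10), 120.0), (date(2016, 3, 2), 80.0), (date(2016, 6, 20), 100.0)],
    "C002": [(date(2015, 11, 5), 400.0)],
}
TODAY = date(2016, 7, 1)  # assumed analysis date
EXPECTED_YEARS = 3        # assumed customer lifespan in years

def rfm(history, today):
    """Recency (days since last purchase), frequency (count), monetary (total)."""
    last = max(d for d, _ in history)
    return {
        "recency": (today - last).days,
        "frequency": len(history),
        "monetary": sum(amount for _, amount in history),
    }

def cltv(history, years_observed, expected_years):
    """Simplified CLTV: average purchase x purchases per year x expected years."""
    avg_purchase = sum(a for _, a in history) / len(history)
    purchases_per_year = len(history) / years_observed
    return avg_purchase * purchases_per_year * expected_years

for customer, history in purchases.items():
    # Assumes each history covers one year of observation.
    print(customer, rfm(history, TODAY), round(cltv(history, 1.0, EXPECTED_YEARS), 2))
```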
Factors in the Growth and Popularity of
Big Data
Mobile and wireless technology
Popularity of social networks
Enhanced power and sophistication of
smartphones and handheld devices
SUMMARY
• Databases
• Accessing files
• Design principles
• Components
• Recent trends
• Data warehouses and data marts
• Industries benefit from big data analytics
and gain a competitive advantage