Eval of Business Performance - Module 1

The document provides an overview of data warehousing including key concepts and terminology. It discusses what a data warehouse is, why it is separated from operational databases, its features and applications. It also covers different types of data warehouse and compares data warehouses to operational databases.

Uploaded by

Daniela Samia

[Evaluation of Business Performance]

[Introduction to Data Warehousing]

Module 1: Data Warehousing

Course Learning Outcomes:

1. To gain an overview of data warehousing
2. To understand the concepts of data warehousing
3. To learn the terminology used in data warehousing

Introduction
The term "Data Warehouse" was first coined by Bill Inmon in 1990.
According to Inmon, a data warehouse is a subject-oriented, integrated,
time-variant, and non-volatile collection of data. This data helps analysts
make informed decisions in an organization.

An operational database changes frequently, often daily, because of the
transactions that take place. Suppose a business executive wants to analyze
earlier feedback on a product, a supplier, or a consumer. If only the
operational database is available, there is nothing to analyze, because
transactions have already overwritten the earlier data.

A data warehouse provides generalized and consolidated data in a
multidimensional view. Along with this view, a data warehouse also provides
Online Analytical Processing (OLAP) tools. These tools support interactive
and effective analysis of data in a multidimensional space. This analysis
results in data generalization and data mining.

Data mining functions such as association, clustering, classification, and
prediction can be integrated with OLAP operations to enhance the interactive
mining of knowledge at multiple levels of abstraction. That is why the data
warehouse has become an important platform for data analysis and online
analytical processing.
Understanding a Data Warehouse
• A data warehouse is a database that is kept separate from the organization's operational
database.
• A data warehouse is not updated frequently.
• It holds consolidated historical data, which helps the organization to analyze its
business.
• A data warehouse helps executives to organize, understand, and use their data to make
strategic decisions.
• Data warehouse systems help in the integration of a diversity of application systems.
• A data warehouse system helps in consolidated historical data analysis.


Why a Data Warehouse is separated from Operational Databases
A data warehouse is kept separate from operational databases for the following reasons:
• An operational database is built for well-known tasks and workloads such as
searching for particular records, indexing, etc. In contrast, data warehouse queries are often
complex and present a general form of data.
• Operational databases support concurrent processing of multiple transactions.
Concurrency control and recovery mechanisms are required for operational databases to
ensure robustness and consistency of the database.
• An operational database query allows both read and modify operations, while an OLAP query
needs only read-only access to stored data.
• An operational database maintains current data. On the other hand, a data warehouse
maintains historical data.
Data Warehouse Features
The key features of a data warehouse are discussed below:
1. Subject Oriented − A data warehouse is subject oriented because it provides information
around a subject rather than the organization's ongoing operations. These subjects can be
products, customers, suppliers, sales, revenue, etc. A data warehouse does not focus on the
ongoing operations; rather, it focuses on modelling and analysis of data for decision making.
2. Integrated − A data warehouse is constructed by integrating data from heterogeneous
sources such as relational databases, flat files, etc. This integration enhances the effective
analysis of data.
3. Time Variant − The data collected in a data warehouse is identified with a particular time
period. The data in a data warehouse provides information from a historical point of view.
4. Non-volatile − Non-volatile means the previous data is not erased when new data is added
to it. A data warehouse is kept separate from the operational database, and therefore
frequent changes in the operational database are not reflected in the data warehouse.
Note − A data warehouse does not require transaction processing, recovery, and concurrency
controls, because it is physically stored separately from the operational database.
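
The time-variant and non-volatile features can be illustrated with a minimal sketch. It assumes a toy operational store that is overwritten in place and a warehouse that only ever appends time-stamped snapshots. The names `operational`, `warehouse`, and `load_snapshot` are hypothetical and chosen for illustration only.

```python
from datetime import date

# Operational store: one row per product, overwritten in place (current data only).
operational = {"P1": {"price": 10.0}}

# Warehouse: append-only list of time-stamped snapshots (hypothetical layout).
warehouse = []


def load_snapshot(as_of):
    """Copy the current operational rows into the warehouse, stamped with a date."""
    for product_id, row in operational.items():
        warehouse.append({"product_id": product_id, "as_of": as_of, **row})


load_snapshot(date(2020, 1, 31))
operational["P1"]["price"] = 12.0  # a transaction overwrites the old value
load_snapshot(date(2020, 2, 29))

# The operational store only knows the current price ...
print(operational["P1"]["price"])

# ... while the warehouse still holds every earlier snapshot.
history = [(r["as_of"].isoformat(), r["price"]) for r in warehouse]
print(history)
```

Each load adds rows and never edits or deletes earlier ones, so the January price remains available for analysis after the operational record has changed.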
Data Warehouse Applications
As discussed before, a data warehouse helps business executives to organize, analyze, and use their
data for decision making. A data warehouse serves as an integral part of a plan-execute-assess
"closed-loop" feedback system for enterprise management. Data warehouses are widely used in the
following fields:
• Financial services
• Banking services
• Consumer goods
• Retail sectors
• Controlled manufacturing

Types of Data Warehouse Applications

Information processing, analytical processing, and data mining are the three types of data
warehouse applications that are discussed below:

• Information Processing − A data warehouse allows the data stored in it to be processed. The
data can be processed by means of querying, basic statistical analysis, and reporting using
crosstabs, tables, charts, or graphs.
• Analytical Processing − A data warehouse supports analytical processing of the
information stored in it. The data can be analyzed by means of basic OLAP operations,
including slice-and-dice, drill down, drill up, and pivoting.
• Data Mining − Data mining supports knowledge discovery by finding hidden patterns and
associations, constructing analytical models, and performing classification and prediction. These
mining results can be presented using visualization tools.
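
The basic OLAP operations named above can be sketched on a handful of fact rows. This is a simplified illustration in plain Python, not how an OLAP tool is implemented. The row layout, the sample values, and the helper `aggregate` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical fact rows: (quarter, month, item, city, units_sold).
facts = [
    ("Q1", "Jan", "phone", "Delhi", 5),
    ("Q1", "Feb", "phone", "Delhi", 7),
    ("Q1", "Jan", "laptop", "Delhi", 2),
    ("Q1", "Jan", "phone", "Mumbai", 4),
    ("Q2", "Apr", "laptop", "Mumbai", 3),
]


def aggregate(rows, key):
    """Sum units_sold grouped by the tuple that key(row) returns."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[4]
    return dict(totals)


# Slice: fix one dimension to a single value (city = Delhi).
delhi = [r for r in facts if r[3] == "Delhi"]

# Dice: restrict two or more dimensions to chosen values.
diced = [r for r in facts if r[3] in {"Delhi", "Mumbai"} and r[2] == "phone"]

# Drill up (roll up): summarize months into quarters.
by_quarter = aggregate(facts, lambda r: (r[0],))

# Drill down: move from quarters to the finer month level.
by_month = aggregate(facts, lambda r: (r[0], r[1]))

# Pivot: view the same totals with item as rows and city as columns.
pivot = defaultdict(dict)
for (item, city), units in aggregate(facts, lambda r: (r[2], r[3])).items():
    pivot[item][city] = units

print(by_quarter)
print(by_month)
print(dict(pivot))
```

All five operations only read the fact rows, which matches the read-only nature of OLAP queries described earlier.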
The following table compares a data warehouse (OLAP) with an operational database (OLTP):

| Sr.No. | Data Warehouse (OLAP) | Operational Database (OLTP) |
| --- | --- | --- |
| 1 | It involves historical processing of information. | It involves day-to-day processing. |
| 2 | OLAP systems are used by knowledge workers such as executives, managers, and analysts. | OLTP systems are used by clerks, DBAs, or database professionals. |
| 3 | It is used to analyze the business. | It is used to run the business. |
| 4 | It focuses on information out. | It focuses on data in. |
| 5 | It is based on Star Schema, Snowflake Schema, and Fact Constellation Schema. | It is based on the Entity Relationship Model. |
| 6 | It is subject oriented. | It is application oriented. |
| 7 | It contains historical data. | It contains current data. |
| 8 | It provides summarized and consolidated data. | It provides primitive and highly detailed data. |
| 9 | It provides a summarized and multidimensional view of data. | It provides a detailed and flat relational view of data. |
| 10 | The number of users is in hundreds. | The number of users is in thousands. |
| 11 | The number of records accessed is in millions. | The number of records accessed is in tens. |
| 12 | The database size is from 100 GB to 100 TB. | The database size is from 100 MB to 100 GB. |
| 13 | It is highly flexible. | It provides high performance. |
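
The difference between the two workloads can be sketched with Python's built-in `sqlite3` module. For brevity, both queries run against one small in-memory table, whereas in practice they would run on separate systems. The `orders` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, item TEXT, qty INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (item, qty, status) VALUES (?, ?, ?)",
    [("pen", 10, "open"), ("pen", 5, "shipped"), ("ink", 2, "open")],
)

# OLTP-style work: a short transaction that reads and modifies one record.
with conn:
    conn.execute("UPDATE orders SET status = 'shipped' WHERE order_id = ?", (1,))

# OLAP-style work: a read-only query that scans and summarizes many records.
totals = conn.execute(
    "SELECT item, SUM(qty) FROM orders GROUP BY item ORDER BY item"
).fetchall()
print(totals)
```

The update touches a single row by key, while the summary query reads every row and changes nothing, which mirrors rows 1, 8, and 11 of the comparison.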

Terminologies
This section discusses some of the most commonly used terms in data
warehousing.
Metadata
Metadata is simply defined as data about data. Data that is used to represent other data is
known as metadata. For example, the index of a book serves as metadata for the contents of the
book. In other words, we can say that metadata is the summarized data that leads us to the detailed
data.
In terms of a data warehouse, we can define metadata as follows:
• Metadata is a road map to the data warehouse.
• Metadata in a data warehouse defines the warehouse objects.
• Metadata acts as a directory. This directory helps the decision support system to locate the
contents of a data warehouse.
Metadata Repository
A metadata repository is an integral part of a data warehouse system. It contains the following
metadata:
• Business metadata − It contains the data ownership information, business definitions, and changing
policies.
• Operational metadata − It includes currency of data and data lineage. Currency of data refers to the data
being active, archived, or purged. Lineage of data means the history of the data migrated and the
transformations applied to it.
• Data for mapping from the operational environment to the data warehouse − This metadata includes source
databases and their contents, data extraction, data partitioning, cleaning, transformation rules, and data
refresh and purging rules.
• The algorithms for summarization − It includes dimension algorithms, data on granularity,
aggregation, summarizing, etc.
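
The four kinds of metadata listed above could be recorded per warehouse table in many ways. The following is one possible sketch. The class `TableMetadata`, its field names, and the sample entry are all hypothetical and are not taken from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Hypothetical metadata entry for one warehouse table."""
    name: str
    # Business metadata
    owner: str
    business_definition: str
    # Operational metadata: "active", "archived", or "purged"
    currency: str
    # Operational metadata: transformations applied, in order
    lineage: list = field(default_factory=list)
    # Mapping from the operational environment
    source_tables: list = field(default_factory=list)
    refresh_rule: str = ""
    # Summarization
    granularity: str = ""


repository = {}


def register(entry):
    """Store an entry under its table name so it can be looked up later."""
    repository[entry.name] = entry


register(TableMetadata(
    name="sales_fact",
    owner="Sales department",
    business_definition="One row per item sold per branch per day",
    currency="active",
    lineage=["extracted from orders", "currency converted", "aggregated by day"],
    source_tables=["orders", "order_lines"],
    refresh_rule="nightly",
    granularity="daily",
))

# The repository acts as a directory: look up where a table's data came from.
entry = repository["sales_fact"]
print(entry.source_tables, entry.currency)
```

Looking up an entry by table name is the "directory" role of metadata described earlier in this section.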
Data Cube
A data cube helps us represent data in multiple dimensions. It is defined by dimensions and facts.
The dimensions are the entities with respect to which an enterprise keeps its records.
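
As a rough sketch of this definition, a cube can be modelled as a mapping from one value per dimension to a numeric fact. The dimension names follow the sales illustration in this section. The dimension values and sales figures are made up for the example, and `two_d_view` is a hypothetical helper.

```python
# Dimensions (the values are hypothetical).
times = ["Q1", "Q2"]
items = ["mobile", "modem"]
locations = ["New Delhi", "Mumbai"]

# Facts: one measure (units sold) per combination of dimension values.
cube = {
    ("Q1", "mobile", "New Delhi"): 10,
    ("Q1", "modem", "New Delhi"): 4,
    ("Q2", "mobile", "New Delhi"): 12,
    ("Q2", "modem", "New Delhi"): 6,
    ("Q1", "mobile", "Mumbai"): 8,
    ("Q1", "modem", "Mumbai"): 3,
    ("Q2", "mobile", "Mumbai"): 9,
    ("Q2", "modem", "Mumbai"): 5,
}


def two_d_view(location):
    """Fix the location dimension and return a time-by-item table."""
    return {
        t: {i: cube.get((t, i, location), 0) for i in items}
        for t in times
    }


delhi_view = two_d_view("New Delhi")
print(delhi_view)

# Aggregating over two dimensions leaves a total per item.
total_mobile = sum(units for (t, i, loc), units in cube.items() if i == "mobile")
print(total_mobile)
```

Fixing one dimension of the 3-D cube yields a 2-D table, which is the relationship between the 2-D and 3-D views discussed in the illustration.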
Illustration of Data Cube
Suppose a company wants to keep track of its sales records with the help of a sales data warehouse
with respect to time, item, branch, and location. These dimensions make it possible to keep track of
monthly sales and of the branch at which the items were sold. There is a table associated with each
dimension. This table is known as a dimension table. For example, the "item" dimension table may
have attributes such as item_name, item_type, and item_brand.
The following table represents the 2-D view of sales data for a company with respect to the time,
item, and location dimensions.

In this 2-D table, however, the records are given with respect to time and item only. The sales for New
Delhi are shown with respect to the time and item dimensions, according to the type of items sold. If we
want to view the sales data with one more dimension, say, the location dimension, then the 3-D
view would be useful. The 3-D view of the sales data with respect to time, item, and location is
shown in the table below:

The above 3-D table can be represented as a 3-D data cube as shown in the following figure:

Data Mart
Data marts contain a subset of organization-wide data that is valuable to specific groups of
people in an organization. In other words, a data mart contains only the data that is specific
to a particular group. For example, the marketing data mart may contain only data related to
items, customers, and sales. Data marts are confined to subjects.
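
The idea of a data mart as a subject-limited subset can be sketched as follows. The warehouse is reduced to a dictionary of subjects for illustration. The sample rows and the helper `build_data_mart` are hypothetical.

```python
# Hypothetical organization-wide warehouse: subject name -> rows.
warehouse = {
    "items": [{"item_name": "pen", "item_type": "stationery"}],
    "customers": [{"customer": "A. Cruz", "city": "Manila"}],
    "sales": [{"item_name": "pen", "customer": "A. Cruz", "units": 3}],
    "payroll": [{"employee": "E1", "salary": 1000}],
    "suppliers": [{"supplier": "S1", "item_name": "pen"}],
}


def build_data_mart(source, subjects):
    """Return a copy of only the subjects that a particular group needs."""
    return {name: list(source[name]) for name in subjects}


# The marketing group only needs items, customers, and sales.
marketing_mart = build_data_mart(warehouse, ["items", "customers", "sales"])
print(sorted(marketing_mart))
```

Subjects outside the group's interest, such as payroll here, never reach the mart, which keeps it small and specific to the department.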
Points to Remember About Data Marts
• Windows-based or Unix/Linux-based servers are used to implement data marts. They
are implemented on low-cost servers.
• The implementation cycle of a data mart is measured in short periods of time, i.e., in
weeks rather than months or years.
• The life cycle of data marts may become complex in the long run if their planning and design
are not organization-wide.
• Data marts are small in size.
• Data marts are customized by department.
• The source of a data mart is a departmentally structured data warehouse.
• Data marts are flexible.
The following figure shows a graphical representation of data marts.

Virtual Warehouse
A set of views over operational databases is known as a virtual warehouse. It is easy to build a
virtual warehouse. Building a virtual warehouse requires excess capacity on operational database
servers.
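
A virtual warehouse can be sketched with an ordinary SQL view, here using Python's built-in `sqlite3` module. The `orders` table and the `sales_by_branch` view are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, branch TEXT, amount REAL);
    INSERT INTO orders (branch, amount)
        VALUES ('North', 100.0), ('North', 50.0), ('South', 70.0);

    -- The "virtual warehouse": a view, not a separate copy of the data.
    CREATE VIEW sales_by_branch AS
        SELECT branch, SUM(amount) AS total FROM orders GROUP BY branch;
""")

query = "SELECT branch, total FROM sales_by_branch ORDER BY branch"
before = conn.execute(query).fetchall()

# Nothing is materialized, so each query runs against the operational table
# itself. This is where the extra capacity on the operational server is needed.
conn.execute("INSERT INTO orders (branch, amount) VALUES ('South', 30.0)")
after = conn.execute(query).fetchall()

print(before)
print(after)
```

Creating the view takes a single statement, which reflects why a virtual warehouse is easy to build, and the view always reflects the current operational data because it stores none of its own.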

References and Supplementary Materials


Online Supplementary Reading Materials
1. Learn DWH, https://www.tutorialspoint.com/dwh/dwh_overview.htm; March 11,
2020
