Thanks to visit codestin.com
Credit goes to www.scribd.com

0% found this document useful (0 votes)
5 views48 pages

Data Warehousing Concepts Using Oracle Technology

Data warehousing Concepts Using Oracle Technology

Uploaded by

Tamer
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views48 pages

Data Warehousing Concepts Using Oracle Technology

Data warehousing Concepts Using Oracle Technology

Uploaded by

Tamer
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 48

Data warehousing Concepts Using Oracle Technology

Knowledge Transfer Session


Prepared By

Author: Tamer Shalaby


Technical Director
[email protected]
https://blog.data-and-analytics.com
*DATAWAREHOUSING-CONCEPTS*

Data Warehousing Concepts Page 1


Objective:

To discuss/explain technical details about the data warehousing concepts.

Data Warehousing Concepts Page 2


Example on data flow from a daily OLTP Banking database to data-warehouse database that will be used for analytical dashboards

Data Warehousing Concepts Page 3


Overview of the Oracle data warehousing implementation

■ What is a Data Warehouse?

■ Data Warehouse Architectures

■ Extracting Information from a Data Warehouse

Data Warehousing Concepts Page 4


■ What is a Data Warehouse?

• A data warehouse is a relational database that is designed for query and analysis rather than for
transaction processing.

• It usually contains historical data derived from transaction data, but can include data from other
sources.

• Data warehouses separate analysis workload from transaction workload and enable an
organization to consolidate data from several sources.

This helps in:

■ Maintaining historical records

■ Analyzing the data, to gain a better understanding of the business and to improve the business.

Data Warehousing Concepts Page 5


Nonvolatile
■ Means that, once logged into the data warehouse, data should not change.
This is logical because the purpose of a data warehouse is to enable you to analyze what has occurred.

Data Warehousing Concepts Page 6


Data Warehouse Architecture with a Staging Area :

• You need to clean and process your operational data before putting it into the warehouse, most
data warehouses use a staging area.

• A staging area simplifies building summaries and general warehouse management.

• The following Figure illustrates this typical architecture.

• This is the suggested model to use .

Data Warehousing Concepts Page 7


Design in Data-warehouses

Data Warehousing Concepts Page 8


• Define the business requirements and agreed upon the scope of your application.

• Create a conceptual design.

• Now you need to translate your requirements into a system deliverable.

• To do so, you create the logical and physical design for the data warehouse. You then define
2 steps:

Logical
Physical

Data Warehousing Concepts Page 9


Creating a Logical Design :

A logical design is conceptual and abstract.

You do not deal with the physical implementation details yet.

You deal only with defining the types of information that you need.

• You can use to model your organization's logical information requirements is entity-
relationship modeling.

• Entity-relationship modeling involves identifying the things of importance (entities), the


properties of these things (attributes), and how they are related to one another
(relationships).

Data Warehousing Concepts Page 10


Data Warehousing Schemas

A schema is a collection of database objects, including tables, views, indexes, and


synonyms.
You can arrange schema objects in the schema models designed for data warehousing in a
variety of ways.

Most data warehouses use a dimensional model.

Data Warehousing Concepts Page 11


• The most natural way to model a data warehouse is as a star schema,
where only one join establishes the relationship between the fact table and any one of
the dimension tables.

• A star schema optimizes performance by keeping queries simple and providing fast
response time.

Data Warehousing Concepts Page 12


• Fact tables and dimension tables are the two types of objects commonly used in
dimensional data warehouse schemas.

• Fact Tables

Are the large tables in your data warehouse schema that store business measurements.
Fact tables typically contain facts and foreign keys to the dimension tables.
Fact tables represent data, usually numeric and additive, that can be analyzed and
examined.

• Requirements of Fact Tables

You must define a fact table for each star schema. From a modeling standpoint, the
primary key of the fact table is usually a composite key that is made up of all of its
foreign keys.

Data Warehousing Concepts Page 13


Examples on Fact Tables include

sales,
cost,
profit.

Data Warehousing Concepts Page 14


Dimension tables,

• also known as lookup or reference tables, contain the relatively static data in the data
warehouse.

• Dimension tables store the information you normally use to contain queries.

• Dimension tables are usually textual and descriptive and you can use them as the row
headers of the result set.

• Dimension data is typically collected at the lowest level of detail and then aggregated
into higher level totals that are more useful for analysis.

These natural rollups or aggregations within a dimension table are called hierarchies.

Examples on dimension tables :


customers ,
products.

Data Warehousing Concepts Page 15


Hierarchies

Hierarchies are logical structures that use ordered levels as a means of organizing data.

A hierarchy can be used to define data aggregation.

For example:
In a time dimension, a hierarchy might aggregate data from the month level to the quarter
level to the year level.

A hierarchy can also be used to define a navigational drill path and to establish a family
structure.

Within a hierarchy, each level is logically connected to the levels above and below it.
Data values at lower levels aggregate into the data values at higher levels.

Levels :
A level represents a position in a hierarchy. For example, a time dimension might have a
hierarchy that represents data at the month, quarter, and year levels.

Data Warehousing Concepts Page 16


Level Relationships :

Level relationships specify top-to-bottom ordering of levels from most general (the root) to
most specific information.

They define the parent-child relationship between the levels in a hierarchy.

Data Warehousing Concepts Page 17


Data Warehousing Objects: Relationships

Relationships guarantee business integrity.

An example is :
if a business sells something, there is obviously a customer and a product.

Designing a relationship between the sales information in the fact table and the dimension
tables products and customers enforces the business rules in databases.

Data Warehousing Concepts Page 18


Example of Data Warehousing Objects and Their Relationships

The Figure above illustrates a common example of a sales fact table and dimension tables
customers, products, promotions, times, and channels.

Data Warehousing Concepts Page 19


Moving from Logical to Physical Design

• Logical design :
is what you draw with a pen and paper before building your data warehouse.

• Physical design :
is the creation of the database with SQL statements.

During the logical design phase, you defined a model for your data warehouse consisting
of entities, attributes, and relationships.

The entities are linked together using relationships.

Attributes are used to describe the entities.

The unique identifier (UID) distinguishes between one instance of an entity and another.

Data Warehousing Concepts Page 20


During the physical design process, you translate the expected schemas into actual
database structures. You have to map:
■ Entities to tables
■ Relationships to foreign key constraints
■ Attributes to columns
■ Primary unique identifiers to primary key constraints
■ Unique identifiers to unique key constraints

Data Warehousing Concepts Page 21


Physical Design Structures

Once you have converted your logical design to a physical one, you will need to create
some or all of the following structures:

■ Tablespaces
■ Tables and Partitioned Tables
■ Views
■ Integrity Constraints
■ Dimensions

Additionally, the following structures may be created for performance improvement:

■ Indexes
■ Materialized Views

Data Warehousing Concepts Page 22


Tablespaces
A tablespace consists of one or more datafiles, which are physical structures within the
operating system you are using.

Tables and Partitioned Tables


Tables are the basic unit of data storage. They are the container for the expected amount of
raw data in your data warehouse.

Using partitioned tables instead of nonpartitioned ones addresses the key problem of
supporting very large data volumes by allowing you to divide them into smaller and
more manageable pieces

Views
A view is a tailored presentation of the data contained in one or more tables or other
views. A view takes the output of a query and treats it as a table.

Data Warehousing Concepts Page 23


Integrity Constraints
Integrity constraints are used to enforce business rules associated with your database
and to prevent having invalid information in the tables.

In data warehousing environments, constraints are only used for query


rewrite. NOT NULL constraints are particularly common in data warehouses.

Indexes
Indexes are optional structures associated with tables or clusters.

Bitmap indexes are very common in data warehousing environments.

Bitmap indexes are optimized index structures for set-oriented operations. Additionally,
they are necessary for some optimized data access methods such as star transformations.

Data Warehousing Concepts Page 24


Materialized Views
Materialized views are query results that have been stored in advance so long-running
calculations are not necessary when you actually execute your SQL statements.

From a physical design point of view, materialized views resemble tables

Dimensions
A dimension is a schema object that defines hierarchical relationships between columns or
column sets.

Data Warehousing Concepts Page 25


Bitmap Indexes

Bitmap indexes are widely used in data warehousing environments.

The environments typically have large amounts of data and ad hoc queries, but a low level
of concurrent DML transactions.

For such applications, bitmap indexing provides:

■ Reduced response time for large classes of ad hoc queries.


■ Reduced storage requirements compared to other indexing techniques.
■ Dramatic performance gains even on hardware with a relatively small number of CPUs
or
a small amount of memory.
■ Efficient maintenance during parallel DML and loads.

Data Warehousing Concepts Page 26


• Bitmap indexes are typically only a fraction of the size of the indexed data in the table.

• An index provides pointers to the rows in a table that contain a given key value.

• A regular index stores a list of rowids for each key corresponding to the rows with
that key value.

• In a bitmap index, a bitmap for each key value replaces a list of rowids.

Data Warehousing Concepts Page 27


Bitmap indexes are most effective for queries that contain multiple conditions in the WHERE
clause. Rows that satisfy some, but not all, conditions are filtered out before the table itself
is accessed. This improves response time, often dramatically.

As the bitmaps from bitmap indexes can be combined quickly, it is usually best to use
single-column bitmap indexes.

When creating bitmap indexes, you should use NOLOGGING and COMPUTE
STATISTICS. In addition, you should keep in mind that bitmap indexes are usually easier
to destroy and re-create than to maintain.

Data Warehousing Concepts Page 28


Benefits for Data Warehousing Applications

Bitmap indexes are primarily intended for data warehousing applications where users
query the data rather than update it.

They are not suitable for OLTP applications with large numbers of concurrent transactions
modifying the data.

Parallel query and parallel DML work with bitmap indexes

Data Warehousing Concepts Page 29


How to Determine Candidates for Using a Bitmap Index
Bitmap indexes should help when either the fact table is queried alone, and there are
predicates on the indexed column, or when the fact table is joined with two or more
dimension tables, and there are indexes on foreign key columns in the fact table, and
predicates on dimension table columns.

A fact table column is a candidate for a bitmap index when the following conditions
are met:

■ There are 100 or more rows for each distinct value in the indexed column.
When this limit is met, the bitmap index will be much smaller than a regular index, and
you will be able to create the index much faster than a regular index.

An example would be one million distinct values in a multi-billion row table.


And either of the following are true:
■ The indexed column will be restricted in queries (referenced in the WHERE clause).
or
■ The indexed column is a foreign key for a dimension table. In this case, such an
index will make star transformation more likely.

Data Warehousing Concepts Page 30


Dimensions

■ What are Dimensions?

■ Creating Dimensions

■ Viewing Dimensions

Data Warehousing Concepts Page 31


What are Dimensions?

A dimension is a structure that categorizes data in order to enable users to answer


business questions.

Commonly used dimensions are :

customers,

products,

time.

The fact information is stored in what is called a fact table, whereas the dimensional
information is stored in dimension tables.

In Oracle Database, the dimensional information itself is stored in a dimension table.

Data Warehousing Concepts Page 32


In addition, the database object dimension helps to organize and group dimensional
Information into hierarchies.

Data Warehousing Concepts Page 33


This represents natural 1:n relationships between columns or column groups (the levels
of a hierarchy) that cannot be represented with constraint conditions.

Going up a level in the hierarchy is called rolling up the data and going down a level in the
hierarchy is called drilling down the data.

Within the customer dimension,

customers roll up to city.


Then cities roll up to state.
Then states roll up to country.
Then countries roll up to subregion.
Finally, subregions roll up to region,

as shown in the opposite Figure =>

Data Warehousing Concepts Page 34


Data Warehousing Concepts Page 35
Data analysis typically starts at higher levels in the dimensional hierarchy and gradually
drills down if the situation warrants such analysis.

Creating Dimensions

Before you can create a dimension object, the dimension tables must exist in the
database possibly containing the dimension data.

Example:
If you create a customer dimension, one or more tables must exist that contain the city,
state, and
country information.

In a star schema data warehouse, these dimension tables already exist.

This customer dimension contains a single hierarchy with a geographical rollup, with
arrows drawn from the child level to the parent level .

Data Warehousing Concepts Page 36


If levels of a hierarchy come from different tables, such a hierarchy is either a fully or
partially normalized hierarchy. This section shows how to define a normalized hierarchy.

Suppose the tracking of a customer's location is done by city, state, and country.

This data is stored in the tables customers and countries.


The customer dimension customers_dim is partially normalized because the data
entities cust_id and country_id are taken from different tables.

Data Warehousing Concepts Page 37


The clause JOIN KEY within the dimension definition specifies how to join together the
levels in the hierarchy.
The dimension statement is partially shown in the following.

CREATE DIMENSION customers_dim


LEVEL customer IS (customers.cust_id)
LEVEL city IS (customers.cust_city)
LEVEL state IS (customers.cust_state_province)
LEVEL country IS (countries.country_id)
LEVEL subregion IS (countries.country_subregion)
LEVEL region IS (countries.country_region)
HIERARCHY geog_rollup (
customer CHILD OF
city CHILD OF
state CHILD OF
country CHILD OF
subregion CHILD OF
region
JOIN KEY (customers.country_id) REFERENCES country);

Data Warehousing Concepts Page 38


How to view the created dimension :

You then see the following output results:


EXECUTE DBMS_DIMENSION.DESCRIBE_DIMENSION('SH.CHANNELS_DIM');

 Result you get :


DIMENSION SH.CHANNELS_DIM
LEVEL CHANNEL IS SH.CHANNELS.CHANNEL_ID
LEVEL CHANNEL_CLASS IS SH.CHANNELS.CHANNEL_CLASS
HIERARCHY CHANNEL_ROLLUP (
CHANNEL CHILD OF
CHANNEL_CLASS)
ATTRIBUTE CHANNEL LEVEL CHANNEL DETERMINES
SH.CHANNELS.CHANNEL_DESC
ATTRIBUTE CHANNEL_CLASS LEVEL CHANNEL_CLASS DETERMINES
SH.CHANNELS.CHANNEL_CLASS

Data Warehousing Concepts Page 39


Flow diagram for datawarehouse architecture example

Data Warehousing Concepts Page 40


Overview of Extraction, Transformation, and Loading

■ Overview of ETL in Data Warehouses

■ ETL Tools for Data Warehouses

Data Warehousing Concepts Page 41


Overview of ETL in Data Warehouses

You need to load your data warehouse regularly so that it can serve its purpose of
facilitating business analysis.

To do this, data from one or more operational systems needs to be extracted and copied
into the data warehouse.

The challenge in data warehouse environments is to integrate, rearrange and consolidate


large volumes of data over many systems, thereby providing a new unified information
base for
business intelligence.

The process of extracting data from source systems and bringing it into the data
warehouse is commonly called ETL, which stands for extraction, transformation, and
loading.

Note that ETL refers to a broad process, and not three well-defined steps.

Data Warehousing Concepts Page 42


The acronym ETL is perhaps too simplistic, because it omits the transportation phase and
implies that each of the other phases of the process is distinct. Nevertheless, the entire
process is known as ETL.

Data Warehousing Concepts Page 43


ETL Basics in Data Warehousing

What happens during the ETL process?

The following tasks are the main actions in the


process.

Extraction of Data

During extraction, the desired data is identified and extracted from many different
sources, including database systems and applications.

The size of the extracted data varies from hundreds of kilobytes up to gigabytes,
depending on the source system and the business situation.

The same is true for the time delta between two (logically) identical extractions: the time
span may vary between days/hours and minutes to near real-time.

Data Warehousing Concepts Page 44


Transportation of Data

After data is extracted, it has to be physically transported to the target system or to an


intermediate system for further processing.

Depending on the chosen way of transportation, some transformations can be done during
this process, too.

Data Warehousing Concepts Page 45


ETL Tools for Data Warehouses

We are going to use Oracle Data Integrator in (example), as an ETL tool that will provide:

1- extracting data from the XMLs of T24 core banking (example)


2- Transformation of data , columns , if required as per business requirements.
3- Loading the data in the target data warehouse database .

Oracle Data Integrator :

streamlines the high performance movement and transformation of data between


heterogeneous systems in :
-batch,
-real time,
-synchronous,
-asynchronous modes;

Data Warehousing Concepts Page 46


It enhances user productivity with design approach and built-in connectivity to all major
databases, data warehouse.

In our case , the source technology to be used is : XML files

The example environment above uses the following elements:


Data Warehousing Concepts Page 47
• The Repository: The Repository contains all of the metadata required for the training
examples. It is hosted, for these evaluation purposes, in a supplied database.
• Orders Application: An application for tracking customer orders, hosted in a supplied database
(the "srcdemo" sample database).
• Parameters (File): Flat files (ASCII) issued from the production system containing a list of
sales representatives and the segmentation of ages into age ranges.
• Sales Administration: The administration or tracking of sales, hosted in another supplied
database (the "trgdemo" sample database).

• This data warehouse is populated with our transformations.

Data Warehousing Concepts Page 48

You might also like