Unit 3 - Data Warehousing
Data Warehouse
“A data warehouse is a large collection of business data
used to help an organization make decisions.”
A data warehouse is typically used to connect and analyze
business data from heterogeneous sources.
Features or Characteristics of Data Warehouse
1. Subject Oriented:
A data warehouse is subject oriented because it provides
information around a subject rather than the organization's
ongoing operations.
These subjects can be products, customers, suppliers, sales,
revenue, etc.
A data warehouse does not focus on the ongoing
operations; rather, it focuses on modeling and analysis of
data for decision making.
2. Integrated:
A data warehouse is constructed by integrating data from
heterogeneous sources such as relational databases, flat
files, etc.
Integration makes naming conventions, formats, and units
consistent across sources, which enables effective analysis of
the data.
3. Time Variant:
The data collected in a data warehouse is identified with a
particular time period.
The data in a data warehouse provides information from
the historical point of view.
4. Non-volatile:
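Non-volatile means the previous data is not erased when
new data is added to it.
Data in the warehouse is loaded and read, but is typically not
updated or deleted the way it is in operational systems.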
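The time-variant and non-volatile properties can be sketched together: every loaded row is tagged with the period it describes, and a new load only appends rows instead of overwriting old ones. This is a minimal illustration, assuming a hypothetical in-memory list called `warehouse` and made-up customer balances; a real warehouse would use database tables.

```python
from datetime import date

# Hypothetical warehouse table: each row carries the date it describes
# (time variant), and loads only append rows (non-volatile).
warehouse: list[dict] = []

def load_snapshot(rows: list[dict], as_of: date) -> None:
    """Append a new snapshot; earlier snapshots are never overwritten or deleted."""
    for row in rows:
        warehouse.append({**row, "as_of": as_of})

def history(customer: str) -> list[tuple[date, float]]:
    """Return one customer's balance across all loaded periods."""
    return [(r["as_of"], r["balance"]) for r in warehouse if r["customer"] == customer]

# Two monthly loads for the same customer: both versions are kept.
load_snapshot([{"customer": "C1", "balance": 100.0}], date(2024, 1, 31))
load_snapshot([{"customer": "C1", "balance": 250.0}], date(2024, 2, 29))

print(history("C1"))
```

An operational system would usually keep only the latest balance (250.0); the warehouse keeps both, which is what allows analysis from a historical point of view.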
Benefits / Advantages of a Data Warehouse
1. Delivers enhanced business intelligence
Decision makers can access information from various sources
through a single platform.
Data warehouses can also be applied to many business
processes, for instance, market segmentation, sales, risk,
inventory, and financial management.
2. Saves time
A data warehouse standardizes, preserves, and stores data
from distinct sources.
Since critical data is available to all users in one place, they
can make informed decisions on key aspects without gathering
data from each source themselves, which saves time and money.
3. Enhances data quality and consistency
A data warehouse converts data from multiple sources into
a consistent format.
Since the data from across the organization is standardized,
each department will produce results that are consistent.
This will lead to more accurate data, which will become the
basis for solid decisions.
4. Provides competitive advantage
Data warehouses help companies get a holistic view of their
current standing and evaluate opportunities and risks, which
gives them a competitive advantage.
5. Improves the decision-making process
Data warehousing provides better insights to decision
makers by maintaining a database of current and historical
data.
Disadvantages of Data Warehouse
A data warehouse stores a large amount of data, so it
requires high maintenance.
Constructing a data warehouse for a large organization is a
complex task & can take many years to complete.
Administration of a data warehouse is also complex &
requires high-level skills & a team with technical expertise.
Quality control of a data warehouse is difficult, because
issues of both data quality & data consistency must be handled.
Application of Data Warehouse
Organizations in almost every industry, large or small, can use
a data warehouse to connect their disparate sources for
forecasting, analysis, reporting, business intelligence, and
robust decision-making. Some of the most common applications
of data warehousing across different industries are listed below.
1. Banking:
With a suitable data warehousing solution, bankers can
manage all their available resources more effectively.
They can better analyze their consumer data, government
regulations, and market trends to make better decisions.
2. Finance:
The application of data warehousing in the financial
industry is similar to that in the banking sector.
A suitable solution helps the finance industry analyze
customer expenses, which enables firms to outline better
strategies to maximize profits.
3. Education:
The educational sector uses data warehousing to get an
integrated view of its student and faculty data.
It gives educational institutions access to up-to-date and
historical data in one place.
4. Healthcare:
Another critical use of data warehouses is in the Healthcare
sector.
All the clinical, financial, and employee data are stored in
the warehouse, and analysis is run to derive valuable
insights for planning the best use of resources.
5. Manufacturing & Distribution:
With an effective data warehousing solution, organizations
in the manufacturing & distribution sector can organize all
their data under one roof and predict market changes,
analyze the latest trends, view development areas, and
finally make result-driven decisions.
6. Retailing:
Retailers are the intermediaries between wholesalers and end
customers, which is why they need to maintain the records of
both parties.
Data warehousing helps them store these records in an
organized manner.
7. Insurance:
In the insurance sector, data warehousing is used to
maintain existing customers' records and analyze them to
identify client trends that bring more customers to the
business.
8. Services:
In the services sector, data warehousing is used for
maintaining customer details, financial records, and
resources to analyze patterns and boost decision-making for
positive outcomes.
Architecture of Data Warehouse
1. Single-Tier Architecture
Single-tier architecture is not frequently used in practice.
Its purpose is to minimize the amount of data stored; to
reach this goal, it removes data redundancies.
As the figure shows, the only layer physically available is the
source layer.
In this method, data warehouses are virtual. This means
that the data warehouse is implemented as a
multidimensional view of operational data created by
specific middleware, or an intermediate processing layer.
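A virtual (single-tier) warehouse can be approximated with a SQL view: the summarized, analysis-oriented view is defined over the operational table, but no second copy of the data is stored. This is a sketch under an assumption: a SQLite `VIEW` stands in for the "specific middleware" mentioned above, and the `orders` table and its rows are hypothetical.

```python
import sqlite3

# Operational (source) data: the only layer that is physically stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "North", "Pen", 10.0), (2, "North", "Book", 25.0), (3, "South", "Pen", 5.0)],
)

# The "warehouse" is only a view: a region x product summary computed on demand,
# so no data is copied and nothing is stored twice.
conn.execute("""
    CREATE VIEW sales_by_region_product AS
    SELECT region, product, SUM(amount) AS total_sales
    FROM orders
    GROUP BY region, product
""")

rows = conn.execute(
    "SELECT region, product, total_sales FROM sales_by_region_product ORDER BY region, product"
).fetchall()
print(rows)

# A new operational row is visible in the view immediately, because the view stores nothing.
conn.execute("INSERT INTO orders VALUES (4, 'South', 'Pen', 7.0)")
south = conn.execute(
    "SELECT total_sales FROM sales_by_region_product WHERE region = 'South'"
).fetchone()
print(south)
```

The trade-off matches the description above: redundancy is minimized, but every analytical query runs against the operational data.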
2. Two-Tier Architecture
1. Source layer:
A data warehouse system uses heterogeneous sources of
data.
That data is initially stored in corporate relational databases
or legacy databases, or it may come from an information
system outside the corporate walls.
2. Data Staging:
The data stored in the sources should be extracted, cleansed
to remove inconsistencies and fill gaps, and integrated to
merge heterogeneous sources into one standard schema.
So-called Extraction, Transformation, and Loading (ETL)
tools can combine heterogeneous schemata and extract,
transform, cleanse, validate, filter, and load source data into
a data warehouse.
3. Data Warehouse layer:
Information is saved to a single, logically centralized
repository: the data warehouse.
The data warehouse can be accessed directly, but it can
also be used as a source for creating data marts & metadata.
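The data staging step can be sketched as a tiny ETL job: extract from two sources with different column names, transform them into one standard schema (renaming columns, trimming and capitalizing names, filling a missing country with a default), and load the result into a warehouse table. Everything here is hypothetical and chosen for illustration: the source layouts, the `dim_customer` table, and the `"UNKNOWN"` default. Real ETL tools add validation, filtering, and incremental loading.

```python
import csv
import io
import sqlite3

# Extract: two hypothetical heterogeneous sources with different column names.
crm_csv = "cust_id,cust_name,country\n1,Asha,IN\n2,Ben,\n"   # flat file from a CRM
billing_records = [                                          # records from a billing system
    {"customer_no": 3, "name": "  carla ", "country_code": "br"},
]

DEFAULT_COUNTRY = "UNKNOWN"  # assumed default used to fill missing values

def transform_crm(row: dict) -> tuple:
    # Map CRM column names onto the common schema; cleanse and fill gaps.
    return (int(row["cust_id"]), row["cust_name"].strip().title(),
            (row["country"] or DEFAULT_COUNTRY).upper())

def transform_billing(rec: dict) -> tuple:
    # Map billing column names onto the same common schema.
    return (int(rec["customer_no"]), rec["name"].strip().title(),
            (rec["country_code"] or DEFAULT_COUNTRY).upper())

# Transform: integrate both sources into one list of rows with one schema.
rows = [transform_crm(r) for r in csv.DictReader(io.StringIO(crm_csv))]
rows += [transform_billing(r) for r in billing_records]

# Load: write the cleansed, integrated rows into one warehouse table.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
dw.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", rows)
dw.commit()

loaded = dw.execute("SELECT * FROM dim_customer ORDER BY customer_id").fetchall()
print(loaded)
```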
3. Three-Tier Architecture:
The three-tier approach is the most widely used architecture
for data warehouse systems.
Essentially, it consists of three tiers:
1. Bottom Tier:
Bottom tier is the database of the warehouse, where the
cleansed and transformed data is loaded.
2. Middle Tier:
The middle tier is the application layer giving an abstracted
view of the database.
It arranges the data to make it more suitable for analysis.
This is done with an OLAP server, implemented using the
ROLAP or MOLAP model.
3. Top Tier:
The top tier is where the user accesses and interacts with
the data.
It represents the front-end client layer, which includes
reporting, query, analysis, and data mining tools.
Components of Data Warehouse:
1. Data Warehouse Database:
It is a central component of data warehouse.
This is the place where the data are stored.
This database is implemented on the RDBMS technology.
2. Meta-Data
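Data about data is called metadata.
E.g. - The value stored in a student_name field is data; the
description of that field (its column name, data type, and the
source system it was loaded from) is metadata.
It is used for building, maintaining & managing the data
warehouse.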
3. Extract, Transform, Load (ETL) Tool:
This tool finds the different names used for the same data
arriving from different sources & replaces them with a
common name.
In case of missing data, the ETL tool populates it with
default data. (That means if a value is not available in the
source, the ETL tool fills in a predefined default value.)
4. Data Marts:
A data mart is a smaller, subject-specific subset of a data
warehouse.
Its size is typically less than 100 GB.
5. Query Tools:
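They allow people to interact with the data warehouse.
E.g. - Reporting tools and query tools that read data with
SELECT statements (warehouse data is normally not updated
or deleted in place).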
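The difference between data and metadata can be seen directly in a database: the rows are the data, while the catalog describing the table is metadata. This sketch uses SQLite's `PRAGMA table_info`, which returns one `(cid, name, type, notnull, dflt_value, pk)` row per column. The `student` table, its row, and the `catalog` dictionary of business metadata are hypothetical examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, student_name TEXT NOT NULL)")
conn.execute("INSERT INTO student VALUES (1, 'Ravi')")

# Data: the stored values themselves, read with a SELECT query.
data = conn.execute("SELECT student_name FROM student").fetchall()
print(data)  # [('Ravi',)]

# Technical metadata: SQLite's description of each column of the table.
columns = conn.execute("PRAGMA table_info(student)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    print(name, col_type, "NOT NULL" if notnull else "NULL allowed", "PK" if pk else "")

# Business metadata (where the data came from, how often it is refreshed) is
# usually kept in a separate catalog; this dict is a hypothetical stand-in.
catalog = {"student": {"source_system": "admissions", "refresh": "daily"}}
print(catalog["student"]["source_system"])
```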
Online Analytical Processing (OLAP):
It is a category of software technology that enables analysts
& managers to analyze the complex data derived from the
data warehouse.
OLAP allows managers & analysts to gain insight into
information through fast & consistent access to it.
OLAP Architecture / Types of OLAP:
1. Relational OLAP(ROLAP)
2. Multidimensional OLAP(MOLAP)
3. Hybrid OLAP(HOLAP)
1. Relational OLAP:
ROLAP is used to store & manage warehouse data in
relational tables.
ROLAP uses a relational DBMS.
ROLAP tools analyze large volumes of data.
ROLAP tools store & analyze highly volatile &
changeable data.
2. Multidimensional OLAP:
MOLAP stores data in pre-computed multidimensional arrays
(cubes).
MOLAP is limited in how much detailed data it can contain.
Information retrieval is fast. (Query processing is fast)
Very easy to use.
DBMS facility is weak.
Can perform complex computations.
3. Hybrid OLAP:
It combines the advantages of both ROLAP & MOLAP.
It typically keeps detailed data in relational tables and
aggregated data in MOLAP cubes.
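The practical difference between ROLAP and MOLAP is where aggregation happens. The sketch below answers the same question ("total sales for a year") in both styles: ROLAP keeps the facts in a relational table and aggregates at query time with SQL, while MOLAP pre-computes every combination of dimension values into a cube, so a query becomes a lookup. The sales figures are hypothetical, and a Python dictionary keyed by `(year, region)` with `"ALL"` meaning "rolled up over this dimension" is an assumed stand-in for the multidimensional array a real MOLAP engine would use.

```python
import sqlite3
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (year, region, sales amount).
facts = [
    ("2023", "North", 100.0),
    ("2023", "South", 80.0),
    ("2024", "North", 120.0),
    ("2024", "South", 90.0),
]

# ROLAP style: detailed facts stay in a relational table;
# aggregates are computed at query time with SQL.
rdb = sqlite3.connect(":memory:")
rdb.execute("CREATE TABLE sales_fact (year TEXT, region TEXT, amount REAL)")
rdb.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", facts)

def rolap_total(year: str) -> float:
    (total,) = rdb.execute(
        "SELECT SUM(amount) FROM sales_fact WHERE year = ?", (year,)
    ).fetchone()
    return total

# MOLAP style: every (year, region) combination, including roll-ups
# marked "ALL", is pre-computed once when the cube is built.
cube: dict[tuple[str, str], float] = defaultdict(float)
for year, region, amount in facts:
    for y, r in product((year, "ALL"), (region, "ALL")):
        cube[(y, r)] += amount

def molap_total(year: str) -> float:
    return cube[(year, "ALL")]  # a lookup; no scan of detailed rows

print(rolap_total("2024"), molap_total("2024"))
```

The cube answers faster but must be rebuilt when facts change, while the relational table absorbs changing data easily; a HOLAP setup would keep `sales_fact` for detail and the cube for the aggregates.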
Data Mart:
A data mart can be called a subset of a data warehouse.
A data mart focuses only on a particular function of an
organization (e.g. one department) and is maintained by a
single authority only.
Types of Data Mart:
There are three types of data marts:
1. Dependent Data Mart –
A dependent data mart is created by extracting data from the
central repository, the data warehouse.
First, the data warehouse is created by extracting data
(through an ETL tool) from external sources, and then the data
mart is created from the data warehouse.
A dependent data mart is created in the top-down approach of
data warehouse architecture.
This model of data mart is used by large organizations.
2. Independent Data Mart –
An independent data mart is created directly from external
sources instead of the data warehouse.
First, the data mart is created by extracting data from external
sources, and then the data warehouse is created from the data
present in the data marts.
An independent data mart is designed in the bottom-up
approach of data warehouse architecture.
This model of data mart is used by small organizations and
is comparatively cost-effective.
3. Hybrid Data Mart –
A hybrid data mart combines data from the data warehouse
with data from other sources.
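The two ways of building a data mart described above can be contrasted in a few statements. In this sketch, a dependent data mart is derived from the warehouse with `CREATE TABLE ... AS SELECT`, keeping only one department's rows (top-down), while an independent data mart is loaded straight from an operational extract without passing through the warehouse (bottom-up). The table names, departments, and amounts are hypothetical, and both marts share one SQLite connection only for brevity; in practice a mart often lives in its own database.

```python
import sqlite3

# Hypothetical central warehouse holding data for several departments.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw_transactions (dept TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO dw_transactions VALUES (?, ?, ?)",
    [("Sales", "Laptop", 900.0), ("Sales", "Mouse", 20.0), ("HR", "Training", 300.0)],
)

# Dependent data mart (top-down): built from the warehouse,
# restricted to the Sales department only.
conn.execute("""
    CREATE TABLE sales_mart AS
    SELECT item, amount FROM dw_transactions WHERE dept = 'Sales'
""")

# Independent data mart (bottom-up): loaded directly from an operational
# source, bypassing the warehouse. The extract below is hypothetical.
hr_source = [("Training", 300.0), ("Recruitment", 150.0)]
conn.execute("CREATE TABLE hr_mart (item TEXT, amount REAL)")
conn.executemany("INSERT INTO hr_mart VALUES (?, ?)", hr_source)

sales_summary = conn.execute("SELECT COUNT(*), SUM(amount) FROM sales_mart").fetchone()
hr_summary = conn.execute("SELECT COUNT(*), SUM(amount) FROM hr_mart").fetchone()
print(sales_summary, hr_summary)
```

Note that `hr_mart` contains a row ("Recruitment") the warehouse has never seen, which illustrates why independent marts are cheaper to start but harder to keep consistent with the rest of the organization.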
Advantages of Data Mart:
1. Implementing a data mart takes less time than implementing
a data warehouse, as a data mart is designed for a particular
department of an organization.
2. Organizations can choose the model of data mart depending
on cost and their business needs.
3. Data can be easily accessed from data mart.
4. The cost of implementing a data mart is less than the cost
of building a data warehouse.
5. Data marts are flexible.
6. Data marts are smaller in size.
Disadvantages of Data Mart:
1. Since a data mart stores data related only to a specific
function, it does not store the huge volume of data related to
every department of an organization, as a data warehouse does.
2. Creating too many data marts can become cumbersome.
Difference between Data Warehouse and Data Mart