DW Notes

The document provides an overview of data warehousing concepts, including the distinction between OLTP and OLAP systems, data warehouse design methodologies, and architectural components. It discusses dimensional modeling, fact and dimension tables, surrogate keys, slowly changing dimensions, and the importance of data cubes for analysis. Additionally, it covers advanced modeling concepts, query performance enhancement techniques, and the implications of sparsity in data aggregation.

Data Warehouse:

Module 1: Intro

Software systems can be broadly classified as:

OLTP (Online Transaction Processing): Operational systems required to run the business applications, holding
information about the business processes. Highly transactional, with large volumes of data. Normalized to reduce redundancy:
more tables result from applying 1NF, 2NF, and 3NF, which avoids update, delete, and insert anomalies. Built for faster inserts, updates,
and deletes. As more data comes into the system, data quality is maintained and redundancy is reduced. These are "data in" (data
capture) systems.

OLAP (Online Analytical Processing): Decision support systems holding information about strategies. Subject oriented (each data mart is about a
subject). Built for faster analysis and search; read-mostly systems with few updates except during loading (ETL).
Denormalized into few tables, with integrated data in one place for faster access to the data --> "data out" systems.

Module 2: Data Warehouse Design

 ER modelling is used for OLTP to optimize data updates through normalization, which creates more
tables to reduce redundancy and avoid data anomalies (delete, insert, update).
 ER modelling cannot be used in a DW, as normalization does not support fast data retrieval over large
volumes, because more tables (and joins) are involved.
 Dimensional modelling is used in a DW.
 Fact or Measure = function(Dimensions or factors)
 Facts or measures are numeric and additive. Ex: Sale Amt = f(product, location, time)
 Dimension tables have textual, descriptive, or numerical attributes.
 Fact tables have facts or measures, which are numeric, and also foreign keys from dimension tables, which are
surrogate keys (not intelligent or smart keys).
 All the foreign keys (dimension entries) in the fact table are surrogate keys and together form a composite primary key;
each takes 4 bytes and holds an integer value. They are the new primary keys in the dimension tables.
 Types of facts: additive (the fact is additive across all dimensions); semi-additive (the fact is additive across some dimensions.
Ex: account balance is not additive w.r.t. time; yesterday 10k and today 15k does not mean the balance is 25k. But across the
customer dimension it is additive: if I have 10k and another customer has 5k, the total balance for both is 15k); and non-
additive.
 Star schema: dimension tables are not interconnected, but all are connected through the fact table.
 Snowflake: dimension tables are normalised into sub-tables, and such normalised dimension tables are
inter-connected.
 Design steps (see the SQL sketch after this list):
o Step 1: Identify the business process
o Step 2: Declare the grain (level of detail, ex: per day, per month)
o Step 3: Identify the dimensions
o Step 4: Identify the facts
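
A minimal star schema sketch in SQL following the four design steps above (retail sales at daily grain; all table and column names are illustrative assumptions, not from the source):

-- Dimension table: the surrogate key is the new primary key;
-- the natural key from OLTP is kept as an ordinary attribute.
CREATE TABLE dim_product (
    product_key   INTEGER PRIMARY KEY,   -- surrogate key (4-byte integer)
    product_id    VARCHAR(20),           -- natural/production key from OLTP
    product_name  VARCHAR(100),
    category      VARCHAR(50)
);

CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,
    full_date     DATE,
    month_name    VARCHAR(10),
    year_no       INTEGER
);

-- Fact table: numeric, additive measures plus surrogate foreign keys;
-- the foreign keys together form the composite primary key.
CREATE TABLE fact_sales (
    product_key   INTEGER REFERENCES dim_product (product_key),
    date_key      INTEGER REFERENCES dim_date (date_key),
    sale_amt      DECIMAL(12,2),         -- additive fact
    units_sold    INTEGER,               -- additive fact
    PRIMARY KEY (product_key, date_key)
);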

Data Cubes:Users of decision support systems often see data in the form of data cubes. The
cube is used to represent data along some measure of interest. Although called a "cube", it can
be 2-dimensional, 3-dimensional, or higher-dimensional. Each dimension represents some
attribute in the database and the cells in the data cube represent the measure of interest.
Inmon (top-down): high risk, high reward. Kimball (bottom-up): low risk, low reward.

Database sizing:

FACT TABLE SIZE

 3 years of data

 100 stores

 Daily grain

 60,000 SKUs
 Sparsity = 10% (10% of total products are sold per day, i.e. 6,000 SKUs)

 4 dimension keys (16 bytes)

 4 facts (16 bytes)

Total size = 3 x 365 x 100 x 6,000 x 32 bytes ≈ 21 GB (~20 GB)

Dimensional versus normalized approach for storage of data


 The dimensional approach refers to Ralph Kimball's approach, in which it is stated that the data
warehouse should be modeled using a dimensional model/star schema. The normalized approach, also
called the 3NF model (Third Normal Form), refers to Bill Inmon's approach, in which it is stated that the
data warehouse should be modeled using an E-R model/normalized model.
 In the normalized approach, the data in the data warehouse are stored following, to a degree, database
normalization rules.
 In the bottom-up approach, data marts are first created to provide reporting and analytical capabilities for
specific business processes. These data marts can then be integrated to create a comprehensive data
warehouse.
 A data mart is the access layer of the data warehouse environment that is used to get data out to the
users. The data mart is a subset of the data warehouse and is usually oriented to a specific business line
or team. Whereas data warehouses have an enterprise-wide depth, the information in data marts
pertains to a single department.
 Dependent data marts: the DW is implemented first, then data marts are created from it.

 Independent data marts: data marts are created first, and the aggregate of them constitutes the DW.
 A coverage factless table contains all possible combinations. Coverage minus the fact table gives information that is not
part of the fact table; it covers all possibilities (see the query sketch below).
 So factless-fact entries give information about missing events in the fact table.
 It has no measures; it only has foreign keys from dimension tables.
 It contains no measure data; it only tracks events (happened and not happened).
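
A sketch of how a coverage factless table answers "what did not happen", e.g. promoted products that recorded no sale (table and column names are illustrative assumptions):

-- Covered combinations minus actual sales = events that never happened.
SELECT c.product_key, c.store_key, c.date_key
FROM   factless_promotion_coverage c
LEFT JOIN fact_sales s
       ON  s.product_key = c.product_key
       AND s.store_key   = c.store_key
       AND s.date_key    = c.date_key
WHERE  s.product_key IS NULL;   -- in coverage but missing from the fact table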

Topic 5 - Data warehouse architecture:

• Source Systems

• Data Staging Area (a storage area where extracted data is cleaned and transformed; the ETL work area)

• Presentation Servers (target physical machines on which DW data is organized and stored for querying; data marts live here)

• Data Mart/Super Marts

• Data Warehouse

• Operational Data Store

• OLAP (used by end users through a client-server architecture; a front-end tool for the DW that enables easy reporting
capabilities without SQL knowledge)

ODS (Operational Data Store, optional): half operational (volatile, current-valued) & half DSS (subject oriented,
integrated)

• ODS is particularly useful when:

– the ETL process of the main DW delays the availability of data
– only aggregated data is available in the DW

• Update classes:
– Class I – updates of data from operational systems to the ODS are synchronous
– Class II – updates between the operational environment & the ODS occur within a 2-3 hour frame
– Class III – synchronization of updates occurs overnight
– Class IV – updates into the ODS from the DW are unscheduled
• Data in the DW is analyzed and periodically placed in the ODS
• For example – customer profile data:
• Customer name & ID
• Customer volume – high/low
• Customer profitability – high/low
• Customer frequency of activity – very frequent/very infrequent
• Customer likes & dislikes

OLAP
 A data warehouse serves as a repository to store historical data that can be used for
analysis.
 OLAP (Online Analytical Processing) can be used to analyze and evaluate data in a
warehouse.
 The warehouse has data coming from varied sources. OLAP tools help organize data in
the warehouse using multidimensional models.

Topic 6, 7: Case study

 Factless tables
 Conformed dimensions, where dimension tables are reused.

Module 5: Topic 8 and 9: Advanced Modelling concepts


Surrogate keys

 Natural keys, production keys, smart keys, or intelligent keys (previously the primary key
from OLTP).
 Surrogate keys are artificial integer keys. 4-byte integers (up to ~4 billion rows) take less space than smart
keys. The natural primary key is replaced by a surrogate key.
 Advantages: less space + faster joins than using natural keys, as comparison in joins involves
integers + easy handling of changing dimensions.

Slowly Changing Dimensions

 Type 1: Overwrite the old value in the dimension table; no history is maintained. Easy and fast to
implement. Only the latest value is reflected.
 Type 2: Add a new dimension row; history is maintained. The new record has the same natural
key, but a new surrogate key is created for the new row. Effective-date columns can be added (see the SQL sketch after this list).
 Type 3: Add a new column (new attribute) in the same record, so previous and new
values are both listed. Full history is not available; only recent history is maintained + when more
attributes change, too many columns might get added.
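
A minimal Type 2 sketch, assuming an effective-date pair on the dimension and a hypothetical new surrogate key value (all names and values are illustrative):

-- Type 2: close out the currently active row for natural key 'C123' ...
UPDATE dim_customer
SET    end_date = DATE '2024-06-30'
WHERE  customer_id = 'C123'
  AND  end_date IS NULL;        -- the currently active row

-- ... then insert a new row with the same natural key but a NEW surrogate key.
INSERT INTO dim_customer (customer_key, customer_id, city, start_date, end_date)
VALUES (50123, 'C123', 'Pune', DATE '2024-07-01', NULL);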

Generating Surrogate keys:

 First time load: assign new surrogate keys.


 Data refresh:
o First find whether the record exists in the dimension table; if not, assign a new surrogate key.
o If it exists, find whether the record has changed, using checksums of each record. If there is a change,
find the attributes which changed and load the changes according to Type 1, 2, or 3.
o If it is Type 2, assign a new surrogate key (a new row is added).
 Lookup tables can be used in the refresh activity.

Lookup Tables: Each dimension table has a lookup table which holds mappings between surrogate
keys and natural keys. Only the latest surrogate key is stored when multiple rows exist with the same
natural key (previously the primary key from OLTP).

Advantages:

 Makes generation of SKs faster


 Helps the refresh activity in the dimension
 Helps populate the fact table faster (see the SQL sketch below)
 Always contains the latest dimension record (latest SK)
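
A sketch of using the lookup table while populating the fact table: incoming staged rows carry natural keys, which the lookup maps to the latest surrogate key (table names are illustrative assumptions):

-- Resolve natural keys to the latest surrogate keys during the fact load.
INSERT INTO fact_sales (product_key, date_key, sale_amt, units_sold)
SELECT lk.product_key,                  -- latest surrogate key from the lookup
       s.date_key,
       s.sale_amt,
       s.units_sold
FROM   staging_sales s
JOIN   lookup_product lk
       ON lk.product_id = s.product_id; -- natural key from OLTP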

Rapidly Changing Monster Dimensions:

 Problem: Type 2 is not recommended in such cases, as the number of rows will explode.


 Solution: Mini-dimension. The rapidly changing attributes are separated out into a separate mini-dimension
table. The fact table has a foreign key to this new mini-dimension table as well. Both the primary
dimension (from which the mini-dimension is separated) and the mini-dimension are related through the fact or factless table, as in the sketch below.
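
A DDL sketch of the mini-dimension pattern: the rapidly changing attributes move to their own small table, and the fact table carries a foreign key to both the primary dimension and the mini-dimension (illustrative names):

-- Mini-dimension holding only the rapidly changing attributes.
CREATE TABLE dim_customer_demographics (
    demographics_key INTEGER PRIMARY KEY,   -- its own surrogate key
    age_band         VARCHAR(10),
    income_band      VARCHAR(10)
);

-- The fact table relates the primary dimension and the mini-dimension.
CREATE TABLE fact_account_activity (
    customer_key     INTEGER,       -- primary customer dimension
    demographics_key INTEGER,       -- mini-dimension
    date_key         INTEGER,
    txn_amount       DECIMAL(12,2)
);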

Snowflaking & Outriggers:

 Snowflakes are not recommended: dimensions are further normalised.


 Outriggers: permissible snowflakes. Outriggers have special characteristics that make the snowflake model
permissible: limited normalization, applied only to attribute sets which are highly correlated. Outriggers are an
exception and should not be considered a rule; they should not be used unless there is a need.

[Diagrams: (1) Star schema with a mini-dimension: Dimension1 through Dimension4 and the Mini Dimension each connect directly to the Fact table. (2) Snowflake vs. outrigger: in the snowflake, dimension tables are normalised into interconnected sub-tables; in the outrigger, a single permissible sub-table hangs off one dimension.]

 When all dimensions are normalised it is a snowflake; when only some dimensions are normalised, it is called
a starflake.

 A centipede fact table is a normalized fact table. The modeller may decide to normalize the
fact table instead of snowflaking the dimension tables.

Time Dimension:

The time dimension is explicitly added in the DW and is not something coming from the OLTP systems. This avoids deriving time
attributes on the fly, which saves time when accessing or analyzing millions of records.

Conformed dimensions: (conformed = of similar type)

 A conformed dimension can exist as a single dimension table that relates to multiple
fact tables within the same data warehouse, or as identical dimension tables in separate data
marts.
 Dimensions can be reused (created physically) in different data marts or star schemas in three different ways:
identical copies; only some columns/attributes; or only some rows/records.

Role Playing Dimensions

 When the same dimension/attributes are used multiple times within the same fact table, it complicates the join
operation.
 Solution: views are created to separate the roles. Each view is a virtual dimension entity and has its own
SK entry in the fact table like any other dimension table, as sketched below.
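
A sketch of the role-playing solution: one physical date dimension exposed through views, one per role, assuming the fact table carries one date key per role (names are illustrative):

-- One physical dim_date, two virtual role-playing dimensions.
CREATE VIEW dim_order_date AS
SELECT date_key AS order_date_key, full_date AS order_date
FROM   dim_date;

CREATE VIEW dim_ship_date AS
SELECT date_key AS ship_date_key, full_date AS ship_date
FROM   dim_date;

-- The fact table then joins each role unambiguously.
SELECT o.order_date, sh.ship_date, f.sale_amt
FROM   fact_sales f
JOIN   dim_order_date o  ON o.order_date_key = f.order_date_key
JOIN   dim_ship_date  sh ON sh.ship_date_key = f.ship_date_key;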

Multi-valued dimensions: (same cell with multiple values)

Ex: joint accounts in a bank; multiple salespersons involved in one sale.

M7: Query Performance Enhancing Techniques (Concepts of physical design)

There are several strategies to improve query response time in the data warehouse
context: indexing techniques, materialized views, and partitioning of
data.

Description/Plan/Reference and brief description:

RL7.1.1 = Aggregation (pre-calculated summarization of the base fact table through a new star schema with fewer record rows)
RL7.1.2 = Sparsity Failure (the sparsity problem)
RL7.2.1 = Shrunken, Lost, & Collapsed Dimensions
RL7.3.1 = Aggregate Navigator (helps convert aggregate-unaware SQL queries from users into aggregate-aware queries)
RL7.3.2 = Aggregate Navigation Algorithm (tries the smallest aggregate star, then the next smallest, until results are returned)
RL7.4.1 = Partitioning
RL7.4.2 = Partitioning w.r.t. Time
RL7.5.1 = View Materialization
RL7.5.2 = Selection of Views to Materialize
RL7.6.1 = View Maintenance Strategies
RL7.6.2 = Incremental Maintenance Algorithms
RL7.7.1 = Bitmap Indices
RL7.7.2 = Bitmap Compression Strategies
CS7.1.1 = Data Warehouse performance challenges (T1, Ch 18)
CS7.1.2 = Concepts of physical design (T1, Ch 18)

Aggregation (Summarization in separate star schemas)


 Aggregate fact tables are merely summaries of the most granular data at
higher levels along the dimension hierarchies.
An aggregate is a fact table representing a summarization of base-level fact table data.
 Aggregates are precalculated summaries that are stored in the data warehouse to improve query performance.
 Improves query performance by a factor of 100 to even 1000, as the number of record rows decreases with
aggregation.
 Ex: if a fact table contains daily granular sales data, another summary table (another star
schema) is created with a summary of the sales details per month, so the total number of records is reduced in the
second instance. Similarly, if sales per product sub-category are stored in the base fact table, a
summary/aggregate table can be created with sales per category. (See the SQL sketch after this list.)
 Aggregation can be done one-way (with 1 dimension), two-way, or multiway.
 With n dimensions, 2^n aggregates can be created, including the base table (with no aggregation).
 With one-way aggregation the sparsity increases; with two-way the increase is more.
 Sparsity (antonym: density) refers to how much of the fact table is occupied, i.e. the % of possible events that actually occur.
 So with increasing levels of aggregation (2, 3, ... n ways), the sparsity percentage increases, which results in
more records than expected; this causes sparsity failure.
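
A sketch of building a one-way aggregate (the monthly summary of the daily fact from the example above); the aggregate lives in its own table/star (names reuse the earlier illustrative schema):

-- One-way aggregate: daily grain rolled up to monthly grain.
CREATE TABLE agg_sales_monthly AS
SELECT d.year_no,
       d.month_name,
       f.product_key,
       SUM(f.sale_amt)   AS total_sale_amt,
       SUM(f.units_sold) AS total_units
FROM   fact_sales f
JOIN   dim_date d ON d.date_key = f.date_key
GROUP  BY d.year_no, d.month_name, f.product_key;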

Effect of Sparsity on Aggregation

Consider the case of the grocery chain with 300 stores, 40,000 products in each store, but only 4,000 selling in each store in a day. As discussed
earlier, assuming that you keep records for 5 years or 1825 days, the maximum
number of base fact table rows is calculated as follows:
Product = 40,000
Store = 300
Time = 1825
Maximum number of base fact table rows = 40,000 x 300 x 1825 = 21.9 billion (≈ 22 billion)

Because only 4,000 products sell in each store in a day, not all of these 22 billion rows are
occupied. Because of this sparsity, only 10% of the rows are occupied. Therefore, the real
estimate of the number of base table rows is about 2 billion.

Now let us see what happens when you form aggregates. Scrutinize a one-way aggregate:
brand totals by store by day. Calculate the maximum number of rows in this one-way
aggregate:
Brand = 80
Store = 300
Time = 1825
Maximum number of aggregate table rows = 80 x 300 x 1825 = 43,800,000

While creating the one-way aggregate, you will notice that the sparsity for this aggregate
is not 10% as in the case of the base table. This is because when you aggregate by brand,
more of the brand codes will participate in combinations with store and time codes. The sparsity
of the one-way aggregate would be about 50%, resulting in a real estimate of 21,900,000 rows.
If the sparsity had remained at the 10% applicable to the base table, the real estimate of the
number of rows in the aggregate table would be much less.
When you go for higher levels of aggregates, the sparsity percentage moves up and even
reaches 100%.

Experienced data warehousing practitioners have a suggestion. When you form aggregates,
make sure that each aggregate table row summarizes at least 10 rows in the lower
level table. If you increase this to 20 rows or more, it would be really remarkable.

Aggregate Navigator

 Why should aggregates be hidden from end users? If they are not hidden, any change in the aggregates
will force the users to change their queries. To keep the two independent, Aggregate Navigator
middleware software is used.
 When a user sends a query targeting the base table, the AN converts the query to target an
appropriate aggregate.

Aggregate navigator algorithm:

 Start from the smallest aggregate fact table; if the query can be answered, that table is treated as the base
fact table and its aggregated dimensions are used in query processing.
 If it fails, try the next smallest, and so on.
 In the worst case, the base table itself is queried, if none of the aggregate tables satisfies the dimensions in the
query.

Partitioning:
 Fact tables are generally very large. Large tables are not easy to manage. During the load
process, the entire table must be closed to users. Backup and recovery of large tables also
pose difficulties because of their sheer size.
 Partitioning divides large database tables into manageable parts. Partitioning helps load fact
tables faster, supports an incremental backup process, and speeds up retrieval. It can be horizontal or vertical.
 Partitioning w.r.t. time means partitioning into year or time ranges, which lets data retrieval, backup,
loading, and archiving work on time-based partitions.

 Range Partitioning: Each partition is specified by a range of values of the partitioning key (e.g. a trade table can be
partitioned by the year of the trade date); see the sketch below.
 List Partitioning: Each partition is specified by a list of values of the partitioning key (e.g. sales data can be partitioned by
region, in which countries falling under NA, EMEA, etc. are grouped under separate lists).
 Hash Partitioning: A hash algorithm is applied to the partitioning key to determine the partition for a given row.
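
A sketch of range partitioning w.r.t. time, using PostgreSQL-style declarative partitioning (syntax varies by RDBMS; names are illustrative):

-- Fact table partitioned by year of the sale date.
CREATE TABLE fact_sales_p (
    date_key  INTEGER,
    sale_date DATE,
    sale_amt  DECIMAL(12,2)
) PARTITION BY RANGE (sale_date);

CREATE TABLE fact_sales_2023 PARTITION OF fact_sales_p
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

CREATE TABLE fact_sales_2024 PARTITION OF fact_sales_p
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Loading, backup, and archiving can now target one yearly partition at a time.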

View Materialization:

 Problems with views: when a view is created, just the view definition is saved in the DB; no rows/data are
fetched or exist physically.
 Whenever a query runs on a view, the view definition is executed.
 So the overall query response time on a view is very large on big data sets or in joins.
 A view is a virtual table and is nothing but a saved SQL query.

A materialized view is a database object that contains the results of a query. It contains data which can be
queried, and it can be refreshed. Materialized views are pre-computed important, expensive, and frequently required
results.
Maintenance or Synchronization:

A view maintenance policy is a decision about when a view is refreshed,


independent of whether the refresh is incremental or not. A view can be refreshed
within the same transaction that updates the underlying tables. This
is called immediate view maintenance. The update transaction is slowed
by the refresh step, and the impact of refresh increases with the number of
materialized views that depend on the updated table.
Alternatively, we can defer refreshing the view. Updates are captured in a log
and applied subsequently to the materialized views. There are several deferred
view maintenance policies:
1. Lazy: The materialized view V is refreshed at the time a query is evaluated
using V, if V is not already consistent with its underlying base tables. This
approach slows down queries rather than updates, in contrast to immediate
view maintenance.
2. Periodic: The materialized view is refreshed at fixed intervals (daily or weekly).
3. Forced: The materialized view is refreshed after a certain number of
changes have been made to the underlying tables.
Problems:
In periodic and forced view maintenance, queries see an instance of the
materialized view that is not consistent with the current state of the underlying
tables.

Examples

Create a materialized view of columns from the customer and bookorder tables.
CREATE MATERIALIZED VIEW custorder AS
SELECT custno, custname, ordno, book
FROM customer, bookorder
WHERE customer.custno=bookorder.custno;

Create a materialized view of columns x1 and y1 from the t1 table.


CREATE MATERIALIZED VIEW v1 AS SELECT x1, y1 FROM t1
PRIMARY KEY (x1) UNIQUE HASH (x1) PAGES=100;
Policy                 Current value                     Delay in query response
Immediate              Always                            Less probable
Lazy                   Always                            Highly probable
Periodic               May not be, when not refreshed    Less probable
Forced / Event-based   May not be, when not refreshed    Less probable

Incremental Updates on materialized views:

 A change in the base tables can be propagated to materialized views by retaining the old view and
just applying the delta on the view, incrementally.
 Ex: If V = R join S, and Ir tuples are inserted into R, then the new materialized view can be
obtained as Vnew = Rnew join S.
Since Rnew = R union Ir, Vnew = (R union Ir) join S,
which can be written as Vnew = (R join S) union (Ir join S).
 So Vnew = Vold union (Ir join S); only the delta join needs to be computed (see the SQL sketch below).
 Similarly, when Dr tuples are deleted, Vnew = Vold - (Dr join S).
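
The insert case above written as SQL: the delta table Ir is joined with S and the result is appended to the stored view, instead of recomputing R join S from scratch (table names are illustrative assumptions):

-- Incremental refresh: V_new = V_old UNION (I_r JOIN S).
INSERT INTO v_materialized (r_key, r_val, s_val)
SELECT i.r_key, i.r_val, s.s_val
FROM   i_r AS i                 -- newly inserted R tuples (the delta)
JOIN   s ON s.r_key = i.r_key;  -- join the delta with S only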

Bitmap Indexing: Indexing in general improves query performance, but the downside is that with
large data tables, the index will also be very large. Since a data warehouse has large datasets,
normal indexing techniques may not be efficient. Bitmap indexes are generally used in data
warehouse tables on low-cardinality columns, i.e., columns with few distinct values. Ex: a table
may have 1 million records, but the gender column holds only two values; similarly a state
column. For every distinct value, a bit is used: for 2 distinct values, 2 bits per row are used. If the value is
present, the bit is 1, else 0. For each row of data, a bit vector position exists along
with the rowid. Whichever rows meet the condition yield the rowids which need to be fetched.
If there are multiple bitmap indexes on multiple columns in a table, then based on the query
conditions, bitmap operators (AND/OR) are applied on the bit vectors to get the resultant record rows. A syntax sketch follows below.

In a bitmap index, even if there are 10 million records with 1000 columns, the index on a
column will have 10 million rows but only a few columns, equal to the number of distinct
values of the indexed column + 1.
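
Oracle-style bitmap index creation on low-cardinality columns (bitmap indexes are a vendor feature, not standard SQL; names are illustrative):

-- One bitmap per distinct value; the optimizer can AND/OR bitmaps
-- from several such indexes before fetching rowids.
CREATE BITMAP INDEX idx_cust_gender ON customer (gender);
CREATE BITMAP INDEX idx_cust_state  ON customer (state);

-- A query combining both conditions can be answered by a bitmap AND:
SELECT COUNT(*) FROM customer
WHERE  gender = 'F' AND state = 'KA';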

M6 :OLAP &Multi-Dimension Databases:

OLAP Operations:

Roll up
The roll-up operation (also called drill-up or aggregation) performs aggregation
on a data cube, either by climbing up a concept hierarchy for a dimension or by removing one or
more dimensions, i.e. dimension reduction.
Roll Down
The roll-down operation (also called drill down) is the reverse of roll up. It navigates from
less detailed data to more detailed data. It can be realized by either stepping down a
concept hierarchy for a dimension or introducing additional dimensions.
Slicing
Slice performs a selection on one dimension of the given cube, thus resulting in a
subcube.
Dicing
The dice operation defines a subcube by performing a selection on two or more
dimensions. For example, applying the selection (time = day 3 OR time = day 4) AND
(temperature = cool OR temperature = hot) to the original cube yields a smaller, still
two-dimensional, subcube.

 (Data cubes) more organised -> faster DML -> faster retrieval -> faster search operations -> fast
analysis
 When is it appropriate: when a fact is a function of all the dimensions; or else empty cells will
exist.
 Recommendation: EDW -> RDBMS; data marts -> MDDB
 MOLAP -> OLAP systems using MDDBs (Multi-dimensional OLAP)
 ROLAP -> uses an RDBMS (Relational OLAP)
M9: Support for DW in RDBMS:
SQL new operators for DW (sketches of these operators follow below):

 Rollup operator: used for roll-up operations. In a hierarchy, when data shown at the
daily level is converted to the weekly or monthly level, it is called roll-up.
 Cube: operator which creates aggregates over all combinations of the listed dimensions.
 Window queries: help analyse date windows around a particular date. Ex: 5 days
before and after 10th August.
 Top N queries: Ex: top 5 selling products.
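
Sketches of the operators above in standard SQL (table and column names are illustrative; window-frame syntax varies slightly by RDBMS):

-- ROLLUP: daily rows aggregated up the year > month hierarchy.
SELECT year_no, month_name, SUM(sale_amt) AS total_sales
FROM   sales
GROUP  BY ROLLUP (year_no, month_name);

-- CUBE: aggregates over every combination of the listed dimensions.
SELECT year_no, region, SUM(sale_amt) AS total_sales
FROM   sales
GROUP  BY CUBE (year_no, region);

-- Window query: moving sum over 5 days before and after each date.
SELECT sale_date,
       SUM(sale_amt) OVER (ORDER BY sale_date
                           RANGE BETWEEN INTERVAL '5' DAY PRECEDING
                                     AND INTERVAL '5' DAY FOLLOWING) AS window_sum
FROM   sales;

-- Top N query: top 5 selling products.
SELECT product_id, SUM(sale_amt) AS total_sales
FROM   sales
GROUP  BY product_id
ORDER  BY total_sales DESC
FETCH FIRST 5 ROWS ONLY;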
