Data Warehousing for Analysts

A data warehouse integrates data from multiple sources to support reporting, queries, and decision making. It contains consolidated historical data kept separately from operational databases. A data warehouse helps executives analyze data to make strategic decisions. It uses data extraction, cleaning, transformation, loading, and refreshing to improve data quality. Data marts contain subsets of data specific to groups within an organization. Common schemas include star and snowflake, both built from fact and dimension tables.

Uploaded by

kotovi5317

Data Warehouse

Data: meaningful information.

Data warehouse:
A data warehouse is constructed by integrating data from multiple heterogeneous sources to support analytical reporting,
structured and/or ad hoc queries, and decision making. Data warehousing involves data cleaning, data integration, and data
consolidation.
Understanding a Data Warehouse
•A data warehouse is a database kept separate from the organization's operational database.
•A data warehouse is not updated frequently.
•It holds consolidated historical data, which helps the organization analyze its business.
•A data warehouse helps executives organize, understand, and use their data to make strategic decisions.
•Data warehouse systems help integrate a diversity of application systems.
Functions of Data Warehouse Tools and Utilities
The following are the functions of data warehouse tools and utilities −
•Data Extraction − Involves gathering data from multiple heterogeneous sources.
•Data Cleaning − Involves finding and correcting the errors in data.
•Data Transformation − Involves converting the data from legacy format to warehouse format.
•Data Loading − Involves sorting, summarizing, consolidating, checking integrity, and building indices and partitions.
•Refreshing − Involves propagating updates from the data sources to the warehouse.
Note − Data cleaning and data transformation are important steps in improving the quality of data and data mining results.
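The functions above can be sketched in SQL as follows. This is only an illustration under assumptions: the tables stg_customer and dw_customer, their columns, and the sample rows are hypothetical and do not come from any particular tool.

```sql
-- Hypothetical staging table holding raw rows extracted from a source system.
CREATE TABLE stg_customer (
    customer_id INTEGER,
    full_name   VARCHAR(100),
    state_code  VARCHAR(10)
);

-- Hypothetical warehouse table in the target (warehouse) format.
CREATE TABLE dw_customer (
    customer_id INTEGER PRIMARY KEY,
    full_name   VARCHAR(100) NOT NULL,
    state_code  CHAR(2)      NOT NULL
);

-- Extraction (simulated): rows as they arrived from the source.
INSERT INTO stg_customer VALUES (1, '  Charlie Brown ', 'il');
INSERT INTO stg_customer VALUES (2, 'Dana White', 'CA');
INSERT INTO stg_customer VALUES (3, NULL, 'TX');  -- rejected by the cleaning rule below

-- Cleaning (drop rows without a name), transformation (trim, upper-case),
-- and loading (insert into the warehouse table) in one statement.
INSERT INTO dw_customer (customer_id, full_name, state_code)
SELECT customer_id,
       TRIM(full_name),
       UPPER(TRIM(state_code))
FROM stg_customer
WHERE full_name IS NOT NULL;

-- Loading also covers building indices on the loaded data.
CREATE INDEX idx_dw_customer_state ON dw_customer (state_code);
```

After the load, dw_customer holds customers 1 and 2 with trimmed names and upper-case state codes. A refresh would repeat the load for rows that are new or changed in the source.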
Data Mart
• A data mart contains a subset of organization-wide data. This subset is valuable to specific groups within an organization.
• In other words, data marts contain data specific to a particular group. For example, the marketing data mart
may contain data related to items, customers, and sales. Data marts are confined to subjects.
• Points to remember about data marts −
• Windows-based or Unix/Linux-based servers are used to implement data marts. They are implemented on low-cost servers.
• The implementation cycle of a data mart is measured in short periods of time, i.e., in weeks rather than months or years.
• The life cycle of a data mart may become complex in the long run if its planning and design are not organization-wide.
• Data marts are small in size.
• Data marts are customized by department.
• The source of a data mart is a departmentally structured data warehouse.
• Data marts are flexible.
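One simple way to picture a data mart is as a subject-specific slice of the warehouse. The sketch below models a marketing mart as a view. The table dw_sales, its department column, and the sample rows are assumptions made for illustration; real data marts are often separate physical stores rather than views.

```sql
-- Hypothetical warehouse-level sales table covering all departments.
CREATE TABLE dw_sales (
    sale_id     INTEGER PRIMARY KEY,
    department  VARCHAR(20)    NOT NULL,
    item_name   VARCHAR(50)    NOT NULL,
    customer_id INTEGER        NOT NULL,
    amount      DECIMAL(10, 2) NOT NULL
);

INSERT INTO dw_sales VALUES (1, 'marketing', 'Brochure', 101, 20.00);
INSERT INTO dw_sales VALUES (2, 'marketing', 'Banner',   102, 75.00);
INSERT INTO dw_sales VALUES (3, 'finance',   'Ledger',   103, 40.00);

-- The marketing mart exposes only the items, customers, and sales
-- that the marketing group needs.
CREATE VIEW marketing_mart AS
SELECT sale_id, item_name, customer_id, amount
FROM dw_sales
WHERE department = 'marketing';

-- Marketing analysts query the mart instead of the full warehouse.
SELECT item_name, SUM(amount) AS total_amount
FROM marketing_mart
GROUP BY item_name;
```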
Enterprise Warehouse:
• An enterprise warehouse collects all the information and subjects spanning an entire organization.
• It provides enterprise-wide data integration.
• The data is integrated from operational systems and external information providers.
• The data volume can vary from a few gigabytes to hundreds of gigabytes, terabytes, or beyond.
Star schema
• A star schema is an architectural structure used to design and implement data warehouse systems, in which
a single fact table is connected to multiple dimension tables. Its diagram resembles a star, with the fact table at the center.
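A minimal star schema might look like the sketch below: one fact table whose foreign keys point at three dimension tables. All table and column names (fact_sales, dim_date, dim_product, dim_customer) are hypothetical.

```sql
-- Dimension tables: descriptive attributes, one row per member.
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,
    full_date DATE    NOT NULL,
    year_num  INTEGER NOT NULL,
    month_num INTEGER NOT NULL
);

CREATE TABLE dim_product (
    product_key   INTEGER PRIMARY KEY,
    product_name  VARCHAR(50) NOT NULL,
    category_name VARCHAR(50) NOT NULL   -- kept inline: dimensions are denormalized
);

CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,
    customer_name VARCHAR(50) NOT NULL,
    state_name    VARCHAR(30) NOT NULL
);

-- The single fact table at the center of the star.
CREATE TABLE fact_sales (
    date_key     INTEGER        NOT NULL REFERENCES dim_date (date_key),
    product_key  INTEGER        NOT NULL REFERENCES dim_product (product_key),
    customer_key INTEGER        NOT NULL REFERENCES dim_customer (customer_key),
    quantity     INTEGER        NOT NULL,
    sales_amount DECIMAL(10, 2) NOT NULL
);

-- A typical star query: join the fact table to the dimensions it needs,
-- then aggregate the measures.
SELECT d.year_num, p.category_name, SUM(f.sales_amount) AS total_sales
FROM fact_sales f
JOIN dim_date d    ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY d.year_num, p.category_name;
```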

Snowflake schema
• The snowflake schema consists of one fact table linked to many dimension tables, which can in turn be linked to other
dimension tables through many-to-one relationships. Tables in a snowflake schema are generally normalized to the third normal
form. Each dimension table represents exactly one level in a hierarchy.
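As a sketch of the difference from a star schema, the item dimension below is split so that each table holds one level of the item hierarchy. The names fact_orders, dim_item, and dim_item_category are hypothetical.

```sql
-- Top level of the hierarchy: one row per category.
CREATE TABLE dim_item_category (
    category_key  INTEGER PRIMARY KEY,
    category_name VARCHAR(50) NOT NULL
);

-- Lower level: many items belong to one category (many-to-one).
CREATE TABLE dim_item (
    item_key     INTEGER PRIMARY KEY,
    item_name    VARCHAR(50) NOT NULL,
    category_key INTEGER     NOT NULL REFERENCES dim_item_category (category_key)
);

-- The fact table still references only the lowest-level dimension.
CREATE TABLE fact_orders (
    order_id     INTEGER PRIMARY KEY,
    item_key     INTEGER        NOT NULL REFERENCES dim_item (item_key),
    order_amount DECIMAL(10, 2) NOT NULL
);

-- Reaching the category now takes one extra join compared with a star schema.
SELECT c.category_name, SUM(f.order_amount) AS total_amount
FROM fact_orders f
JOIN dim_item i          ON f.item_key = i.item_key
JOIN dim_item_category c ON i.category_key = c.category_key
GROUP BY c.category_name;
```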
Table:
• A table consists of columns and rows. In relational databases and flat file databases, a table is a set of data elements (values) using a
model of vertical columns (identifiable by name) and horizontal rows, the cell being the unit where a row and column intersect. A
table has a specified number of columns, but can have any number of rows.
Dimension & Fact Tables:
Dimension: Dimensions store the textual descriptions of the business. With the help of dimensions you can easily identify and interpret the measures.
The different types of dimension tables are listed below:
Types of Dimension Tables
Slowly Changing Dimensions: This is the most common dimension type. Some attributes of a dimension change over
time, and whether the history of those changes should be preserved in the data warehouse depends on the business
requirement. Such an attribute is called a slowly changing attribute, and a dimension containing such an attribute is called a slowly changing
dimension (SCD). E.g., a home address doesn't change often, so it is an SCD attribute.
Types of SCD:
Type 1: In a Type 1 Slowly Changing Dimension, the new information simply overwrites the original information. In other words, no
history is kept.
Advantages: SCD-1 is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of
the old information.
Disadvantages: All history is lost. With this methodology, it is not possible to trace back in history. For example, if a customer named
Charlie moves from Illinois to California, the company would not be able to know that Charlie lived in Illinois before.
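A minimal sketch of Type 1 handling, assuming a hypothetical table dim_customer_t1 and a made-up surrogate key value:

```sql
CREATE TABLE dim_customer_t1 (
    customer_key  INTEGER PRIMARY KEY,
    customer_name VARCHAR(50) NOT NULL,
    state_name    VARCHAR(30) NOT NULL
);

INSERT INTO dim_customer_t1 VALUES (1001, 'Charlie', 'Illinois');

-- Charlie moves: the new value overwrites the old one in place.
UPDATE dim_customer_t1
SET state_name = 'California'
WHERE customer_key = 1001;

-- Only the current state remains; 'Illinois' can no longer be recovered.
SELECT customer_key, customer_name, state_name
FROM dim_customer_t1;
```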
Type 2: In a Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information.
Therefore, both the original and the new record are present. The new record gets its own primary key.
Advantages:
• Type 2 is the most widely used SCD technique in data warehousing because it preserves the entire history of changes.
Disadvantages:
• Complex ETL is required to perform change data capture and the SCD Type 2 process.
• A new record is inserted every time there is a change, which causes the table to grow quickly.
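A minimal sketch of Type 2 handling. The table dim_customer_t2, the key values, and the dates are hypothetical, and the valid_from, valid_to, and is_current columns are a common convention assumed here rather than something the notes above prescribe.

```sql
CREATE TABLE dim_customer_t2 (
    customer_key  INTEGER PRIMARY KEY,   -- surrogate key: a new value per version
    customer_id   INTEGER     NOT NULL,  -- business key: stable across versions
    customer_name VARCHAR(50) NOT NULL,
    state_name    VARCHAR(30) NOT NULL,
    valid_from    DATE        NOT NULL,
    valid_to      DATE,                  -- NULL while the version is still open
    is_current    INTEGER     NOT NULL   -- 1 = current version, 0 = historical
);

INSERT INTO dim_customer_t2
VALUES (1001, 1, 'Charlie', 'Illinois', '2015-01-01', NULL, 1);

-- Step 1: close the existing version.
UPDATE dim_customer_t2
SET valid_to = '2018-06-01', is_current = 0
WHERE customer_id = 1 AND is_current = 1;

-- Step 2: add the new version with its own primary key.
INSERT INTO dim_customer_t2
VALUES (1002, 1, 'Charlie', 'California', '2018-06-01', NULL, 1);

-- Both the original and the new record are present.
SELECT customer_key, state_name, valid_from, valid_to, is_current
FROM dim_customer_t2
WHERE customer_id = 1
ORDER BY valid_from;
```

Facts loaded before the move keep pointing at key 1001 and facts loaded afterwards point at key 1002, which is how the history is preserved.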

Type 3: In a Type 3 Slowly Changing Dimension, there are two columns for the particular attribute of
interest, one holding the original value and one holding the current value. There is also a column
that indicates when the current value became active.
Advantages:
• Does not increase the size of the table, since new information is updated in the same row.
• Allows us to store some part of the history.
Disadvantages:
• Type 3 cannot keep all history when an attribute changes more than once. For example, if Charlie later moves to
Texas on April 15, 2020, the California information will be lost.
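A minimal sketch of Type 3 handling, assuming a hypothetical table dim_customer_t3. The date of the first move is made up for illustration; the Texas date is the one from the example above.

```sql
CREATE TABLE dim_customer_t3 (
    customer_key   INTEGER PRIMARY KEY,
    customer_name  VARCHAR(50) NOT NULL,
    original_state VARCHAR(30) NOT NULL,
    current_state  VARCHAR(30) NOT NULL,
    effective_date DATE                   -- when current_state became active
);

INSERT INTO dim_customer_t3
VALUES (1001, 'Charlie', 'Illinois', 'Illinois', NULL);

-- First move: original_state is untouched, only the current value changes.
UPDATE dim_customer_t3
SET current_state = 'California', effective_date = '2018-06-01'
WHERE customer_key = 1001;

-- Second move: California is overwritten and cannot be recovered.
UPDATE dim_customer_t3
SET current_state = 'Texas', effective_date = '2020-04-15'
WHERE customer_key = 1001;

-- The row now shows Illinois (original) and Texas (current) only.
SELECT customer_name, original_state, current_state, effective_date
FROM dim_customer_t3;
```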
• Rapidly Changing Dimensions: A dimension attribute that changes frequently is a rapidly changing attribute. If you
don't need to track the changes, the rapidly changing attribute is no problem, but if you do need to track the
changes, using a standard slowly changing dimension technique can greatly inflate the size of the
dimension. One solution is to move the attribute to its own dimension, with a separate foreign key in the fact table.
This new dimension is called a rapidly changing dimension. E.g., body temperature is a rapidly changing attribute.
• Junk Dimensions: A junk dimension is a single table that combines different, unrelated attributes to avoid
having a large number of foreign keys in the fact table. Junk dimensions are often created to manage the foreign
keys created by rapidly changing dimensions. Examples of such attributes are flags, weights, and BMI (body mass index).
• Degenerate Dimensions: A degenerate dimension is a dimension attribute stored as part of the fact table, and
not in a separate dimension table. These are essentially dimension keys for which there are no other attributes. In a
data warehouse, these are often used as the result of a drill-through query to analyze the source of an aggregated
number in a report. You can use these values to trace back to transactions in the OLTP system. For example, a receipt
number has no dimension table associated with it; it is kept for reference only.
• Conformed Dimensions: A dimension that is used in multiple locations is called a conformed dimension. A conformed
dimension may be used with multiple fact tables in a single database, or across multiple data marts or data
warehouses. For example, both the marketing and sales departments can use the same Customer dimension
for their reporting.
• Static Dimensions: Static dimensions are not extracted from the original data source, but are created within the
context of the data warehouse. A static dimension can be loaded manually - for example with status codes - or it can
be generated by a procedure, such as a date or time dimension.
• Role Playing Dimensions: A role-playing dimension is one where the same dimension key - along with its associated
attributes - can be joined to more than one foreign key in the fact table. For example, a fact table may include foreign
keys for both ship date and delivery date. But the same date dimension attributes apply to each foreign key, so you
can join the same dimension table to both foreign keys. Here the date dimension takes multiple roles, mapping ship
date as well as delivery date, hence the name role-playing dimension. For example, you can use a date
dimension for “date of sale”, as well as “date of delivery”, or “date of hire”.
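The role-playing case can be sketched with table aliases: the same date dimension is joined twice, once per role. The names dim_date_rp and fact_shipment are hypothetical.

```sql
CREATE TABLE dim_date_rp (
    date_key  INTEGER PRIMARY KEY,
    full_date DATE NOT NULL
);

-- Two foreign keys in the fact table reference the same dimension.
CREATE TABLE fact_shipment (
    shipment_id       INTEGER PRIMARY KEY,
    ship_date_key     INTEGER NOT NULL REFERENCES dim_date_rp (date_key),
    delivery_date_key INTEGER NOT NULL REFERENCES dim_date_rp (date_key)
);

INSERT INTO dim_date_rp VALUES (20200413, '2020-04-13');
INSERT INTO dim_date_rp VALUES (20200415, '2020-04-15');
INSERT INTO fact_shipment VALUES (1, 20200413, 20200415);

-- Each alias gives the date dimension a different role.
SELECT f.shipment_id,
       sd.full_date AS ship_date,
       dd.full_date AS delivery_date
FROM fact_shipment f
JOIN dim_date_rp sd ON f.ship_date_key = sd.date_key
JOIN dim_date_rp dd ON f.delivery_date_key = dd.date_key;
```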
Types of Facts: There are three types of facts:
• Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table.
• Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but
not the others.
• Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact
table.
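The three fact types can be illustrated with a hypothetical daily account snapshot. The table fact_account_daily, its columns, and the sample values are assumptions: deposit_amount is treated as additive, balance as semi-additive, and interest_rate as non-additive.

```sql
CREATE TABLE fact_account_daily (
    date_key       INTEGER        NOT NULL,
    account_key    INTEGER        NOT NULL,
    deposit_amount DECIMAL(12, 2) NOT NULL,  -- additive
    balance        DECIMAL(12, 2) NOT NULL,  -- semi-additive
    interest_rate  DECIMAL(5, 2)  NOT NULL,  -- non-additive
    PRIMARY KEY (date_key, account_key)
);

INSERT INTO fact_account_daily VALUES (20240101, 1, 100.00, 1000.00, 1.50);
INSERT INTO fact_account_daily VALUES (20240101, 2,  50.00,  500.00, 2.00);
INSERT INTO fact_account_daily VALUES (20240102, 1,   0.00, 1000.00, 1.50);
INSERT INTO fact_account_daily VALUES (20240102, 2,  25.00,  525.00, 2.00);

-- Additive: deposits can be summed across accounts and across dates.
SELECT SUM(deposit_amount) AS total_deposits
FROM fact_account_daily;

-- Semi-additive: balances can be summed across accounts for one date ...
SELECT date_key, SUM(balance) AS total_balance
FROM fact_account_daily
GROUP BY date_key;

-- ... but not across dates; over time an average (or the latest value) is used.
SELECT account_key, AVG(balance) AS average_balance
FROM fact_account_daily
GROUP BY account_key;

-- Non-additive: a rate is never summed; it is reported as-is or averaged.
SELECT account_key, AVG(interest_rate) AS average_rate
FROM fact_account_daily
GROUP BY account_key;
```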
Indexes:
• Primary Index − A primary index is defined on the primary key of the data file, where the data file is already ordered
by the primary key. The primary index is an ordered file whose records are of fixed length with two fields.
• Secondary Index − A secondary index may be generated from a field which is a candidate key and has a unique value
in every record, or from a non-key field with duplicate values.
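In SQL, indexes are usually declared rather than built by hand. In the sketch below the table customer_idx_demo and its columns are hypothetical; the physical layout of each index depends on the database engine.

```sql
CREATE TABLE customer_idx_demo (
    customer_id INTEGER PRIMARY KEY,    -- most engines index the primary key automatically
    email       VARCHAR(100) NOT NULL,
    state_name  VARCHAR(30)
);

-- Secondary index on a candidate key: every record has a unique value.
CREATE UNIQUE INDEX idx_customer_email ON customer_idx_demo (email);

-- A non-unique secondary index on a non-key column is also common.
CREATE INDEX idx_customer_state ON customer_idx_demo (state_name);
```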
Joins:
Inner Join: selects records that have matching values in both tables.
SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name;

Left Outer Join: returns all records from the left table (table1), and the matching records from the right table (table2). If there is
no match, the columns from the right side are NULL.
SELECT column_name(s)
FROM table1
LEFT JOIN table2
ON table1.column_name = table2.column_name;

Right Outer Join: returns all records from the right table (table2), and the matching records from the left table (table1). If there is
no match, the columns from the left side are NULL.
SELECT column_name(s)
FROM table1
RIGHT JOIN table2
ON table1.column_name = table2.column_name;

Full Outer Join: returns all records from both the left (table1) and right (table2) tables, matched where possible. Columns from the side without a match are NULL.
SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2
ON table1.column_name = table2.column_name
WHERE condition;
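The join types above can be compared on a small example. The tables customers and orders and their rows are hypothetical. Some engines (MySQL, for example) do not support FULL OUTER JOIN.

```sql
CREATE TABLE customers (
    customer_id   INTEGER PRIMARY KEY,
    customer_name VARCHAR(50) NOT NULL
);

CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER,
    amount      DECIMAL(10, 2) NOT NULL
);

INSERT INTO customers VALUES (1, 'Charlie');
INSERT INTO customers VALUES (2, 'Dana');       -- has no orders
INSERT INTO orders VALUES (10, 1, 25.00);
INSERT INTO orders VALUES (11, 3, 40.00);       -- customer 3 does not exist

-- Inner join: only the matching pair (Charlie, order 10).
SELECT c.customer_name, o.order_id
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;

-- Left join: Charlie with order 10, and Dana with a NULL order_id.
SELECT c.customer_name, o.order_id
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;

-- Full outer join: the two rows above plus order 11 with a NULL customer_name.
SELECT c.customer_name, o.order_id
FROM customers c
FULL OUTER JOIN orders o ON c.customer_id = o.customer_id;
```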
Union & Union All
• The UNION operator is used to combine the result-set of two or more SELECT statements.
• Every SELECT statement within UNION must have the same number of columns.
• The columns must also have similar data types.
• The columns in every SELECT statement must also be in the same order.
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;

The UNION operator selects only distinct values by default. To allow duplicate values, use UNION ALL:
SELECT column_name(s) FROM table1
UNION ALL
SELECT column_name(s) FROM table2;
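A small worked example of the difference, using two hypothetical tables that share one city:

```sql
CREATE TABLE customer_cities (city VARCHAR(30) NOT NULL);
CREATE TABLE supplier_cities (city VARCHAR(30) NOT NULL);

INSERT INTO customer_cities VALUES ('Chicago');
INSERT INTO customer_cities VALUES ('Austin');
INSERT INTO supplier_cities VALUES ('Chicago');
INSERT INTO supplier_cities VALUES ('Denver');

-- UNION removes duplicates: Austin, Chicago, Denver (3 rows).
SELECT city FROM customer_cities
UNION
SELECT city FROM supplier_cities
ORDER BY city;

-- UNION ALL keeps duplicates: Chicago appears twice (4 rows).
SELECT city FROM customer_cities
UNION ALL
SELECT city FROM supplier_cities
ORDER BY city;
```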

Self Join: A self join is a regular join, but the table is joined with itself.
SELECT column_name(s)
FROM table1 T1, table1 T2
WHERE condition;
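A common use of a self join is a hierarchy stored in a single table. The employees table and its rows below are hypothetical.

```sql
CREATE TABLE employees (
    employee_id   INTEGER PRIMARY KEY,
    employee_name VARCHAR(50) NOT NULL,
    manager_id    INTEGER               -- refers to employee_id in the same table
);

INSERT INTO employees VALUES (1, 'Alice', NULL);  -- top of the hierarchy
INSERT INTO employees VALUES (2, 'Bob', 1);
INSERT INTO employees VALUES (3, 'Carol', 1);

-- The table is listed twice under different aliases: e for the employee row,
-- m for the manager row. Alice is omitted because she has no manager.
SELECT e.employee_name AS employee, m.employee_name AS manager
FROM employees e, employees m
WHERE e.manager_id = m.employee_id;
```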
Thank You
References:
SQL: https://www.w3schools.com/sql/default.asp
Practice: https://www.w3schools.com/sql/trysql.asp?filename=trysql_select_join_inner
