Week 2 Lectures

The document outlines the architecture of data warehouses, detailing components such as source systems, ETL processes, and data storage layers. It emphasizes the importance of data integration, management, and practical design of a data warehouse schema for analyzing sales performance. Key concepts include data storage types, optimization techniques, and the role of metadata and reporting tools in facilitating decision-making.

Uploaded by

moroansoma23
Copyright
© All Rights Reserved

Week 2: Data Warehouse Architecture

• Lecture Topics:
o Data Warehouse Architecture Components
o Data Storage and Management
o Data Integration
• Readings:
o "What is a Data Warehouse?"[2]
• Practical:
o Designing a simple data warehouse schema

Lecture Notes: Data Warehouse Architecture and Key Components


1. Data Warehouse Architecture Components
A data warehouse is a centralized repository that stores integrated data from multiple sources,
designed to support decision-making processes. The architecture typically comprises the
following components:
a) Source Systems
• These are operational systems (e.g., ERP, CRM, transactional databases) from which
data is extracted.
• Examples: Sales databases, customer support systems, or financial systems.
b) ETL (Extract, Transform, Load) Processes
• ETL tools extract data from source systems, transform it to fit the data warehouse
schema, and load it into the warehouse.
• Practical Example: A retailer extracts daily sales data, cleans it to remove duplicates,
aggregates sales by store, and loads it into the data warehouse.
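The retailer example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the row layout, the `raw_sales` sample data, and the function names are hypothetical, and a plain dictionary stands in for the warehouse table.

```python
from collections import defaultdict

# Hypothetical raw rows as they might arrive from a point-of-sale system.
raw_sales = [
    {"sale_id": 1, "store": "North", "amount": 120.0},
    {"sale_id": 2, "store": "North", "amount": 80.0},
    {"sale_id": 2, "store": "North", "amount": 80.0},  # duplicate record
    {"sale_id": 3, "store": "South", "amount": 50.0},
]

def extract():
    """Extract: return the day's raw rows (a real job would read a source system)."""
    return list(raw_sales)

def transform(rows):
    """Transform: skip repeated sale_ids, then total the amounts per store."""
    seen = set()
    totals = defaultdict(float)
    for row in rows:
        if row["sale_id"] in seen:
            continue  # duplicate, already counted
        seen.add(row["sale_id"])
        totals[row["store"]] += row["amount"]
    return dict(totals)

def load(totals, warehouse):
    """Load: write the aggregated totals into the stand-in warehouse table."""
    warehouse.update(totals)
    return warehouse

warehouse = load(transform(extract()), {})
print(warehouse)  # {'North': 200.0, 'South': 50.0}
```

Keeping the three stages as separate functions mirrors the E, T, and L steps, so each one can be tested or replaced on its own.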
c) Staging Area
• Temporary storage area where data is cleaned, deduplicated, and transformed before
loading into the warehouse.
• Example: A logistics company might use a staging area to align shipment tracking
data formats from different systems.
d) Data Storage Layer
• Contains the organized, structured, and optimized data for querying and analysis.
• Composed of fact tables (storing measurable data) and dimension tables (storing
descriptive attributes).
o Example: A sales fact table might have sales amounts, while the product
dimension table contains product descriptions.
e) Metadata Layer
• Stores information about the data, such as definitions, structure, and lineage.
• Practical Example: Metadata helps analysts understand what "monthly revenue"
represents and how it was calculated.
f) Query and Reporting Tools
• Allow users to query the data and generate reports or dashboards.
• Example: A financial analyst uses a BI tool like Tableau to visualize quarterly
revenue trends.
g) Data Marts
• Subsets of the data warehouse, tailored for specific departments or business units.
• Example: A marketing data mart might focus on campaign performance metrics.

2. Data Storage and Management


Efficient data storage and management ensure high performance and scalability for a data
warehouse.
a) Storage Types
• Relational Databases: Traditional databases like Oracle, SQL Server.
• Columnar Databases: Optimized for analytical queries, e.g., Amazon Redshift,
Snowflake.
• Cloud-Based Solutions: Offer scalability and flexibility, e.g., Google BigQuery.
b) Storage Optimization Techniques
• Partitioning: Splitting large tables into smaller parts for faster queries.
o Example: Partitioning a sales table by year.
• Indexing: Creating indexes for frequently queried fields.
o Example: Adding an index on the "product_id" column to speed up product
searches.
• Compression: Reducing data size without losing information.
o Example: Storing numeric data in compressed formats.
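The three techniques above can be tried on a small scale with Python's standard library. The sketch below is illustrative only: the `sales` rows and the index name `idx_sales_product` are made up, SQLite stands in for a real warehouse engine, and production systems implement partitioning and compression inside the database rather than in application code.

```python
import sqlite3
import zlib
from collections import defaultdict

# Hypothetical rows: (sale_id, sale_date, product_id, revenue)
sales = [
    (1, "2022-11-03", 101, 40.0),
    (2, "2023-01-15", 102, 25.0),
    (3, "2023-02-20", 101, 60.0),
]

# Partitioning: group rows by year so a query for one year skips the others.
partitions = defaultdict(list)
for row in sales:
    year = int(row[1][:4])
    partitions[year].append(row)

# Indexing: create an index on product_id and ask SQLite how it would run
# a lookup on that column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER, sale_date TEXT, "
    "product_id INTEGER, revenue REAL)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", sales)
conn.execute("CREATE INDEX idx_sales_product ON sales (product_id)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM sales WHERE product_id = ?", (101,)
).fetchall()
plan_text = " ".join(str(step[-1]) for step in plan)  # last field is the detail

# Compression: zlib is lossless, so decompressing restores the exact bytes.
column = ",".join(["19.99"] * 1000).encode("utf-8")
compressed = zlib.compress(column)

print(sorted(partitions))              # years that have a partition
print(plan_text)                       # should mention idx_sales_product
print(len(column), len(compressed))    # size before and after compression
```

The query plan text names the index when SQLite decides to use it, which is a quick way to confirm that an index actually helps a given query.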
c) Data Backup and Recovery
• Regular backups allow the warehouse to be restored after hardware failures, data corruption, or accidental deletion.
• Example: A company schedules nightly backups of their data warehouse.
d) Data Security
• Ensures sensitive information is protected through access controls, encryption, and
monitoring.
• Example: A healthcare provider encrypts patient data stored in the warehouse.

3. Data Integration
Data integration is the process of combining data from various sources into a unified view,
crucial for analytics and decision-making.
a) Types of Data Integration
• ETL (Extract, Transform, Load): Data is extracted from sources, transformed, and
loaded into the warehouse.
• ELT (Extract, Load, Transform): Data is loaded into the warehouse first and then
transformed.
o Example: ELT is commonly used in modern cloud-based warehouses like Snowflake.
• Data Virtualization: Provides real-time access to source data without copying it into a separate physical store.
o Example: A business user queries live data from multiple databases without
moving it to a central repository.
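The difference between ETL and ELT is mainly where the transformation runs. As a rough sketch of the ELT pattern, the example below loads messy rows unchanged and then cleans them with SQL inside the database. SQLite is used here only as a stand-in for a cloud warehouse, and the table names `raw_orders` and `orders_clean` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: raw rows go into the warehouse unchanged, including messy values.
conn.execute("CREATE TABLE raw_orders (order_id INTEGER, region TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, " north ", "100.50"), (2, "NORTH", "20"), (3, "south", "7.25")],
)

# Transform: the cleanup runs inside the warehouse, as SQL over the raw table.
conn.execute(
    """
    CREATE TABLE orders_clean AS
    SELECT order_id,
           UPPER(TRIM(region)) AS region,
           CAST(amount AS REAL) AS amount
    FROM raw_orders
    """
)

totals = dict(
    conn.execute(
        "SELECT region, SUM(amount) FROM orders_clean GROUP BY region ORDER BY region"
    ).fetchall()
)
print(totals)  # {'NORTH': 120.5, 'SOUTH': 7.25}
```

In an ETL pipeline the same trimming and casting would happen before the insert, so the warehouse would never hold the raw values.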
b) Data Transformation
• Standardizing data formats, cleansing errors, and enriching data.
• Example: Converting date formats from "MM/DD/YYYY" to "YYYY-MM-DD."
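The date conversion in the example can be written with Python's `datetime` module. The function name `to_iso_date` is an illustrative choice, not part of any particular ETL tool.

```python
from datetime import datetime

def to_iso_date(us_date: str) -> str:
    """Convert 'MM/DD/YYYY' to 'YYYY-MM-DD'; raises ValueError on bad input."""
    return datetime.strptime(us_date, "%m/%d/%Y").strftime("%Y-%m-%d")

print(to_iso_date("03/15/2023"))  # 2023-03-15
```

Parsing into a real date first, rather than rearranging substrings, means impossible dates such as "13/45/2023" are rejected instead of being loaded silently.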
c) Data Consolidation Challenges
• Data Silos: Disconnected systems lead to incomplete views.
o Solution: Use APIs or middleware to connect systems.
• Data Quality Issues: Errors or inconsistencies affect trust.
o Solution: Implement data quality checks during integration.
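A data quality check during integration can be as simple as validating each row before it is loaded. The sketch below assumes two hypothetical rules (a required `customer_id` and a non-negative numeric `amount`); real pipelines would define rules from the business requirements.

```python
def check_row(row):
    """Return a list of problems found in one incoming row (empty if clean)."""
    problems = []
    if row.get("customer_id") is None:
        problems.append("missing customer_id")
    amount = row.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        problems.append("amount must be a non-negative number")
    return problems

def split_by_quality(rows):
    """Separate rows that pass every check from rows that fail at least one."""
    accepted, rejected = [], []
    for row in rows:
        problems = check_row(row)
        if problems:
            rejected.append((row, problems))
        else:
            accepted.append(row)
    return accepted, rejected

incoming = [
    {"customer_id": 7, "amount": 19.5},
    {"customer_id": None, "amount": 5.0},
    {"customer_id": 9, "amount": -3},
]
accepted, rejected = split_by_quality(incoming)
print(len(accepted), len(rejected))  # 1 2
```

Keeping the rejected rows together with their reasons makes it possible to report quality issues back to the owners of the source system.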
d) Tools for Data Integration
• Popular tools include Informatica, Talend, Apache NiFi, and Microsoft Azure Data
Factory.
• Example: Informatica integrates customer data from CRM and ERP systems into a
centralized warehouse.

Conclusion
Understanding these components (data warehouse architecture, data storage and
management, and data integration) lays the foundation for implementing robust, scalable,
and efficient analytical solutions. Practical use of these principles enables organizations to
transform raw data into actionable insights.

4. Practical: Designing a Simple Data Warehouse Schema


Objective: To design a basic schema for a retail business to analyze sales performance.
Steps:
1. Identify Business Requirements:
o Analyze the business’s key metrics, such as total sales, revenue by region, and
product performance.
o Example: A retail chain wants to track daily sales across stores and identify
top-selling products.
2. Define Fact and Dimension Tables:
o Fact Table:
▪ Name: Sales_Fact
▪ Attributes: Sale_ID, Date_ID, Store_ID, Product_ID, Revenue,
Quantity_Sold
o Dimension Tables:
▪ Date_Dimension: Date_ID, Date, Month, Quarter, Year
▪ Store_Dimension: Store_ID, Store_Name, Region, City
▪ Product_Dimension: Product_ID, Product_Name, Category, Brand
3. Create the Schema:
o Star Schema Design:
▪ The Sales_Fact table is at the center, with foreign keys linking to
dimension tables.
o Example:

Sales_Fact
-------------------
Sale_ID | Date_ID | Store_ID | Product_ID | Revenue | Quantity_Sold

Date_Dimension
-------------------
Date_ID | Date | Month | Quarter | Year

Store_Dimension
-------------------
Store_ID | Store_Name | Region | City

Product_Dimension
-------------------
Product_ID | Product_Name | Category | Brand
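One possible way to create this star schema is shown below using SQLite from Python. The table and column names come from the design above; the column types and the choice of SQLite are assumptions made for the sketch, and a production warehouse would use its own DDL dialect.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Date_Dimension (
    Date_ID INTEGER PRIMARY KEY,
    Date TEXT, Month INTEGER, Quarter INTEGER, Year INTEGER
);
CREATE TABLE Store_Dimension (
    Store_ID INTEGER PRIMARY KEY,
    Store_Name TEXT, Region TEXT, City TEXT
);
CREATE TABLE Product_Dimension (
    Product_ID INTEGER PRIMARY KEY,
    Product_Name TEXT, Category TEXT, Brand TEXT
);
-- The fact table sits at the center and references each dimension.
CREATE TABLE Sales_Fact (
    Sale_ID INTEGER PRIMARY KEY,
    Date_ID INTEGER REFERENCES Date_Dimension(Date_ID),
    Store_ID INTEGER REFERENCES Store_Dimension(Store_ID),
    Product_ID INTEGER REFERENCES Product_Dimension(Product_ID),
    Revenue REAL,
    Quantity_Sold INTEGER
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(SCHEMA)

tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

With foreign keys enforced, a fact row that points at a missing dimension row is rejected, which protects the links that the star schema depends on.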
4. Populate Data:
o Collect data from the source systems and transform it to match the schema.
o Example: Load daily sales transactions into the Sales_Fact table and update
the dimension tables with store and product details.
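When source transactions identify stores and products by name, the load step has to map those names to dimension IDs before writing fact rows. The sketch below shows one simple way to do that with in-memory dictionaries. The sample `transactions` and the helper `get_or_create_key` are hypothetical, and the dimensions here hold only the key mapping, not the full descriptive attributes.

```python
# Hypothetical source transactions, keyed by natural names rather than IDs.
transactions = [
    {"date": "2023-01-15", "store": "Downtown", "product": "Espresso Beans",
     "revenue": 30.0, "qty": 2},
    {"date": "2023-01-15", "store": "Airport", "product": "Green Tea",
     "revenue": 12.0, "qty": 3},
]

def get_or_create_key(dimension, natural_key):
    """Return the ID for natural_key, adding a new dimension entry if needed."""
    if natural_key not in dimension:
        dimension[natural_key] = len(dimension) + 1
    return dimension[natural_key]

date_dim, store_dim, product_dim = {}, {}, {}
sales_fact = []

for sale_id, t in enumerate(transactions, start=1):
    sales_fact.append({
        "Sale_ID": sale_id,
        "Date_ID": get_or_create_key(date_dim, t["date"]),
        "Store_ID": get_or_create_key(store_dim, t["store"]),
        "Product_ID": get_or_create_key(product_dim, t["product"]),
        "Revenue": t["revenue"],
        "Quantity_Sold": t["qty"],
    })

print(store_dim)      # {'Downtown': 1, 'Airport': 2}
print(sales_fact[1])  # second fact row, with IDs in place of names
```

Both transactions share the same date, so they reuse a single Date_Dimension entry instead of creating two.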
5. Test and Query:
o Verify the schema by running queries.
o Example Query: "Find the total revenue by product category in Q1 2023."
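The example query can be expressed as a join between the fact table and two dimensions. The sketch below runs it in SQLite with a handful of made-up rows; the sample products, brands, and revenue figures are invented for illustration, and the Store_Dimension table is omitted because this query does not need it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Date_Dimension (Date_ID INTEGER PRIMARY KEY, Date TEXT,
                             Month INTEGER, Quarter INTEGER, Year INTEGER);
CREATE TABLE Product_Dimension (Product_ID INTEGER PRIMARY KEY, Product_Name TEXT,
                                Category TEXT, Brand TEXT);
CREATE TABLE Sales_Fact (Sale_ID INTEGER PRIMARY KEY, Date_ID INTEGER,
                         Store_ID INTEGER, Product_ID INTEGER,
                         Revenue REAL, Quantity_Sold INTEGER);

-- Made-up sample rows: one date in Q1 2023 and one in Q2 2023.
INSERT INTO Date_Dimension VALUES (1, '2023-01-15', 1, 1, 2023),
                                  (2, '2023-05-02', 5, 2, 2023);
INSERT INTO Product_Dimension VALUES (1, 'Espresso Beans', 'Coffee', 'BrandA'),
                                     (2, 'Green Tea', 'Tea', 'BrandB');
INSERT INTO Sales_Fact VALUES (1, 1, 1, 1, 30.0, 2),
                              (2, 1, 1, 2, 12.0, 3),
                              (3, 1, 2, 1, 15.0, 1),
                              (4, 2, 1, 2, 99.0, 9);
""")

QUERY = """
SELECT p.Category, SUM(f.Revenue) AS Total_Revenue
FROM Sales_Fact AS f
JOIN Date_Dimension AS d ON d.Date_ID = f.Date_ID
JOIN Product_Dimension AS p ON p.Product_ID = f.Product_ID
WHERE d.Year = 2023 AND d.Quarter = 1
GROUP BY p.Category
ORDER BY p.Category
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [('Coffee', 45.0), ('Tea', 12.0)]
```

The Q2 sale of 99.0 is excluded by the filter on Date_Dimension, which shows how dimension attributes drive the slicing of fact data.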
