Mid-Term Exam Notes: Data Warehousing
1. Definition of Data Warehousing
Data Warehousing:
A process of transforming data into information and making it available to users in a timely
manner to support decision-making. (Forrester Research, 1996)
Data Warehouse:
A subject-oriented, integrated, time-variant, and non-volatile collection of data used
primarily for organizational decision-making. (Bill Inmon, 1996)
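The time-variant and non-volatile properties in Inmon's definition can be illustrated with a small Python sketch. The names (`warehouse_rows`, `load_snapshot`, `balance_as_of`) and the account data are hypothetical. Each load appends a dated snapshot instead of overwriting the previous value, so the warehouse can answer "as of" questions that an update-in-place operational system cannot.

```python
from datetime import date
from typing import Dict, List, Optional

# Hypothetical warehouse table: rows are only ever appended (non-volatile).
warehouse_rows: List[dict] = []

def load_snapshot(load_date: date, balances: Dict[str, float]) -> None:
    """Append one dated snapshot of source data; existing rows are never updated or deleted."""
    for account, balance in balances.items():
        warehouse_rows.append({"load_date": load_date, "account": account, "balance": balance})

def balance_as_of(account: str, as_of: date) -> Optional[float]:
    """Time-variant lookup: the most recent snapshot taken on or before `as_of`."""
    matches = [r for r in warehouse_rows
               if r["account"] == account and r["load_date"] <= as_of]
    if not matches:
        return None
    return max(matches, key=lambda r: r["load_date"])["balance"]

# An operational system would overwrite 500.0 with 750.0; the warehouse keeps both.
load_snapshot(date(2024, 1, 31), {"A-100": 500.0})
load_snapshot(date(2024, 2, 29), {"A-100": 750.0})

print(balance_as_of("A-100", date(2024, 2, 15)))  # 500.0
print(balance_as_of("A-100", date(2024, 3, 1)))   # 750.0
```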
2. Key Features of Data Warehousing
• Integrated Data: Combines data from various sources across the enterprise.
• Historical Data: Helps analyze trends and patterns over time.
• Summarized Data: Provides high-level insights for decision-making.
• What-If Analysis: Supports scenario-based evaluations.
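Integration and summarization can be illustrated with two hypothetical source systems that record the same kind of fact differently (region codes versus region names, cents versus dollars). The sketch below, with made-up data and function names, maps both sources onto one schema before producing a summarized view. Real integration pipelines must also resolve conflicting keys, formats, and data-quality problems.

```python
from collections import defaultdict

# Hypothetical source systems with inconsistent conventions.
store_sales = [  # amounts in cents, one-letter region codes
    {"region": "N", "amount_cents": 12500},
    {"region": "S", "amount_cents": 8000},
]
web_sales = [  # amounts in dollars, full region names
    {"region_name": "North", "amount": 40.0},
    {"region_name": "South", "amount": 20.0},
]

REGION_CODES = {"N": "North", "S": "South"}

def integrate(store, web):
    """Map both sources onto one schema: source, region name, amount in dollars."""
    unified = []
    for row in store:
        unified.append({"source": "store",
                        "region": REGION_CODES[row["region"]],
                        "amount": row["amount_cents"] / 100})
    for row in web:
        unified.append({"source": "web",
                        "region": row["region_name"],
                        "amount": row["amount"]})
    return unified

def summarize_by_region(rows):
    """Summarized view: total sales per region across all sources."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)

unified = integrate(store_sales, web_sales)
print(summarize_by_region(unified))  # {'North': 165.0, 'South': 100.0}
```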
3. Evolution of Data Systems
• 1960s: Batch reports; inflexible and expensive.
• 1970s: Terminal-based DSS (Decision Support Systems) and EIS (Executive Information Systems).
• 1980s: Desktop tools with query capabilities, but limited to operational databases.
• 1990s: Integrated data warehousing with OLAP engines for advanced decision support.
4. Differences: OLTP vs. Data Warehouse
Aspect              OLTP                           Data Warehouse
Purpose             Run business operations        Analyze business data
Data Type           Current, detailed              Historical, summarized
Access              Repetitive, structured tasks   Ad-hoc, multidimensional queries
Users               Clerks, salespeople            Managers, knowledge workers
Performance Metric  Transaction throughput         Query throughput
Database Size       100MB–100GB                    100GB–terabytes
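The contrast in the table can be sketched with Python's built-in `sqlite3` module. For illustration only, both workloads run against one in-memory database with a hypothetical `orders` table. The OLTP-style functions each touch a single row inside a short transaction, while the warehouse-style function scans and aggregates the whole history. In a real deployment these two workloads would run on separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, "
             "order_year INTEGER, amount REAL)")

# OLTP-style work: short transactions that each insert or update one row.
def place_order(customer, year, amount):
    with conn:  # the connection context manager commits on success
        cur = conn.execute(
            "INSERT INTO orders (customer, order_year, amount) VALUES (?, ?, ?)",
            (customer, year, amount))
    return cur.lastrowid

def correct_amount(order_id, new_amount):
    with conn:
        conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?",
                     (new_amount, order_id))

# Warehouse-style work: an ad-hoc query that scans and aggregates all rows.
def revenue_by_year():
    return conn.execute(
        "SELECT order_year, SUM(amount) FROM orders "
        "GROUP BY order_year ORDER BY order_year"
    ).fetchall()

place_order("alice", 2023, 100.0)
oid = place_order("bob", 2023, 50.0)
place_order("alice", 2024, 80.0)
correct_amount(oid, 60.0)
print(revenue_by_year())  # [(2023, 160.0), (2024, 80.0)]
```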
5. Why Separate a Data Warehouse?
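• Performance: OLTP systems are tuned for short transactions (indexing, concurrency control,
recovery); running complex analytical queries on them would degrade operational performance.
• Specialized Design: Data warehouses need their own data organization, access methods, and
implementation techniques to support multidimensional views and queries.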
6. Decision Support & OLAP
Decision Support:
• Helps manage and control business operations.
• Relies on historical data organized for analysis and inquiry.
• Uses ad-hoc queries to understand business trends.
OLAP (Online Analytical Processing):
• Facilitates complex queries like:
- 'Which customers are most likely to switch to competitors?'
- 'What promotions have the largest revenue impact?'
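The second question above can be expressed as an ad-hoc aggregate query. This is a minimal sketch using `sqlite3`, assuming a hypothetical `sales` table in which each sale carries at most one promotion. "Revenue impact" is simplified here to total revenue attributed to each promotion; a real analysis would compare against a baseline of sales without the promotion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, promotion TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales (promotion, revenue) VALUES (?, ?)",
    [("spring_sale", 120.0), ("spring_sale", 80.0), ("free_shipping", 150.0),
     ("loyalty_bonus", 40.0), (None, 90.0)],  # None means no promotion applied
)

# Ad-hoc analytical query: revenue attributed to each promotion, largest first.
top_promotions = conn.execute(
    """
    SELECT promotion, SUM(revenue) AS total_revenue, COUNT(*) AS num_sales
    FROM sales
    WHERE promotion IS NOT NULL
    GROUP BY promotion
    ORDER BY total_revenue DESC
    """
).fetchall()

for promotion, total, n in top_promotions:
    print(promotion, total, n)
```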
7. Data Representation in Warehouses
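• Data Cube: Multidimensional representation in which measures (e.g., sales) are organized along
dimensions (e.g., time, product, location), so data can be aggregated at different levels.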
• Schemas:
- Star Schema: Simplified structure with one fact table and multiple dimension tables.
- Snowflake Schema: Variant of the star schema in which dimension tables are normalized into
additional related tables.
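A data cube over n dimensions (without concept hierarchies) consists of 2^n group-bys, or cuboids, ranging from the apex cuboid (the grand total) to the base cuboid (grouped by every dimension). The sketch below computes all of them in plain Python over hypothetical sales facts; the function name `compute_cube` is illustrative, and real OLAP engines typically materialize only selected cuboids rather than all of them.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: three dimensions and one measure (sales).
facts = [
    {"year": 2023, "product": "TV", "city": "Austin", "sales": 10},
    {"year": 2023, "product": "Phone", "city": "Boston", "sales": 5},
    {"year": 2024, "product": "TV", "city": "Boston", "sales": 7},
]
DIMENSIONS = ("year", "product", "city")

def compute_cube(rows, dims, measure):
    """Return {dimension subset: {group key: total}} for all 2**len(dims) cuboids."""
    cube = {}
    for k in range(len(dims) + 1):
        for group_by in combinations(dims, k):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in group_by)
                totals[key] += row[measure]
            cube[group_by] = dict(totals)
    return cube

cube = compute_cube(facts, DIMENSIONS, "sales")
print(len(cube))           # 8 cuboids for 3 dimensions
print(cube[()])            # apex cuboid: {(): 22}
print(cube[("product",)])  # {('TV',): 17, ('Phone',): 5}
```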
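A star schema can be sketched in SQLite: one fact table holding foreign keys and numeric measures, surrounded by dimension tables with descriptive attributes. The table names (`fact_sales`, `dim_time`, `dim_product`, `dim_store`), columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes, one row per member.
    CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT);

    -- Fact table: foreign keys to each dimension plus numeric measures.
    CREATE TABLE fact_sales (
        time_id    INTEGER REFERENCES dim_time(time_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        units_sold INTEGER,
        revenue    REAL
    );

    INSERT INTO dim_time    VALUES (1, 2024, 'Q1'), (2, 2024, 'Q2');
    INSERT INTO dim_product VALUES (1, 'TV', 'Electronics'), (2, 'Sofa', 'Furniture');
    INSERT INTO dim_store   VALUES (1, 'Austin'), (2, 'Boston');
    INSERT INTO fact_sales  VALUES
        (1, 1, 1, 3, 1500.0),
        (1, 2, 2, 1,  800.0),
        (2, 1, 2, 2, 1000.0);
    """
)

# Typical star join: fact table joined to its dimensions, grouped by their attributes.
rows = conn.execute(
    """
    SELECT t.quarter, p.category, SUM(f.revenue) AS revenue
    FROM fact_sales AS f
    JOIN dim_time    AS t ON f.time_id = t.time_id
    JOIN dim_product AS p ON f.product_id = p.product_id
    GROUP BY t.quarter, p.category
    ORDER BY t.quarter, p.category
    """
).fetchall()
print(rows)
```

In a snowflake version of this sketch, `category` would move into its own table (for example, a hypothetical `dim_category`) referenced from `dim_product`, adding one more join to the query.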
8. Data Warehouse vs. Data Mart
• Data Warehouse: Centralized repository for the entire organization.
• Data Mart: Department-specific, smaller, and customized subset of the data warehouse.
Characteristics of a Data Mart:
• Small, flexible, customized.
• Department-focused.
• Sources data from the central warehouse.
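A data mart sourced from the central warehouse can be sketched as a derived table restricted to one department's scope and pre-summarized for its users. The table names, the `product_line` column, and the data are hypothetical, and the mart is built here with a single SQLite `CREATE TABLE ... AS SELECT`; a real mart would be refreshed as the warehouse changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Central warehouse table covering every product line (hypothetical).
    CREATE TABLE wh_sales (product_line TEXT, region TEXT, revenue REAL);
    INSERT INTO wh_sales VALUES
        ('Electronics', 'North', 1200.0),
        ('Electronics', 'South',  300.0),
        ('Furniture',   'North',  900.0),
        ('Electronics', 'North',  100.0);

    -- Department-specific mart: only the Electronics department's data,
    -- summarized by region for that department's users.
    CREATE TABLE mart_electronics_by_region AS
        SELECT region, SUM(revenue) AS revenue
        FROM wh_sales
        WHERE product_line = 'Electronics'
        GROUP BY region;
    """
)

mart_rows = conn.execute(
    "SELECT region, revenue FROM mart_electronics_by_region ORDER BY region"
).fetchall()
print(mart_rows)  # [('North', 1300.0), ('South', 300.0)]
```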
9. Applications of Data Warehousing
• Business Insights:
- Understand customer loyalty and transaction behaviors.
- Evaluate the impact of pricing strategies on ROI.
- Forecast trends using historical data.
• Operational Improvements:
- Manage inventory effectively.
- Enhance supplier collaborations.
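Forecasting from historical data can take many forms, and the notes do not name a method. As one simple assumed example, the sketch below fits an ordinary least-squares linear trend to a hypothetical monthly sales history and extrapolates it; the function names are illustrative.

```python
def fit_linear_trend(values):
    """Ordinary least-squares fit of y = intercept + slope * t for t = 0, 1, ..., n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two historical points")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept, slope

def forecast(values, periods_ahead):
    """Extrapolate the fitted trend for the next `periods_ahead` periods."""
    intercept, slope = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + h) for h in range(periods_ahead)]

monthly_sales = [100.0, 110.0, 120.0, 130.0]  # hypothetical history
print(forecast(monthly_sales, 2))  # [140.0, 150.0]
```

A straight-line trend ignores seasonality and sudden shifts, so it serves only as a baseline.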
10. Wal-Mart Case Study
• Old Paradigm: Inventory management and supplier promotions managed separately.
• New Paradigm: Just-in-time restocking and supplier integration with daily updates.
• System Highlights: NCR system with 24 TB of disk space, processing billions of rows.
11. Problems with Data Mart-Centric Solutions
• Lack of integration with the broader enterprise data.
• Redundancy and inefficiency in data management.
True Warehouse: A unified and centralized approach to managing organizational data for
decision support.