Data Warehousing - Lecture - 5

The document discusses the components and processes involved in data warehousing, including the sourcing of external data, the staging area for data extraction, transformation, and loading, as well as data storage and information delivery mechanisms. It highlights the importance of a separate staging area for data preparation and the need for metadata to facilitate user access to the data warehouse. The document also outlines the distinct functions of data loading and the different types of users who interact with the data warehouse.

Uploaded by

h20240086

Data Warehousing

Dr. L. Rajya Lakshmi


Data warehouse: source data
• External data:
• Data from external sources, such as industry-related statistics produced by
external agencies and national statistical offices
• Used to spot industry trends and to compare performance
• The format of external data may not match that of the enterprise data
Data warehouse: staging area
• Functions to be performed to make data ready to store are:
• Extraction
• Transformation
• Loading
• Data staging:
• Provides a place and a set of functions to clean, change, combine, convert,
and prepare source data for storage in the data warehouse
• Why a separate staging area?
• Data come from different operational systems
• To create a subject-oriented view of the data
• Separate area is required to prepare the data
Data warehouse: staging area
• Data extraction
• Has to deal with numerous data sources
• An appropriate technique has to be identified for each source
• Sources include relational database systems, legacy systems, flat files, etc.
• Extraction can use available tools or in-house programs
• Extracted data is usually (though not necessarily) kept in a separate
environment, from which it can be moved easily into the data warehouse
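The point about choosing a technique per source can be sketched in Python. This is only an illustration under assumptions: the flat-file contents, the `customers` table, and the function names are hypothetical, and an in-memory SQLite database stands in for a relational source.

```python
import csv
import io
import sqlite3

def extract_from_flat_file(text):
    """Extract rows from CSV text (stands in for a flat-file source)."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row, source="flat_file") for row in reader]

def extract_from_rdbms(conn):
    """Extract rows from a relational source with a SQL query."""
    cur = conn.execute("SELECT cust_id, name FROM customers ORDER BY cust_id")
    return [{"cust_id": str(cid), "name": name, "source": "rdbms"}
            for cid, name in cur]

# Hypothetical source 1: a flat file.
flat_file_text = "cust_id,name\n101,Asha\n102,Ravi\n"

# Hypothetical source 2: a relational database (in-memory SQLite here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(201, "Meera"), (202, "John")])

# Extracted rows land in one separate staging structure,
# tagged with the source they came from.
staging_area = extract_from_flat_file(flat_file_text) + extract_from_rdbms(conn)
```

Each source gets its own extractor, but all rows arrive in the staging structure in a common shape, which is what makes the later transformation step tractable.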
Data warehouse: staging area
• Data transformation
• More challenging than data extraction
• Transformation tasks should be adaptable to revised data as well
• First you clean the data
• Correction of spelling mistakes, resolving conflicts
• Providing default values for data elements with missing values
• Deletion of duplicates, etc.
• Standardization of data elements
• Standardize data types and field lengths for the same data elements
• If two or more terms from different systems have the same meaning, then we
have to resolve the synonyms
• If the same term has different meanings in different systems, then we have to
resolve the homonyms
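The cleaning and standardization steps above can be sketched as follows. This is a minimal illustration, not a prescribed method: the field names, the synonym table, and the default value are all hypothetical.

```python
# Hypothetical synonym table: field names used by different source systems
# mapped to one standard name in the warehouse.
FIELD_SYNONYMS = {"cust_name": "customer_name", "client": "customer_name"}

# Hypothetical default values for data elements with missing values.
DEFAULTS = {"city": "UNKNOWN"}

def standardize(record):
    """Resolve synonyms, normalize text values, and fill in defaults."""
    out = {}
    for field, value in record.items():
        std_field = FIELD_SYNONYMS.get(field, field)
        text = value.strip() if isinstance(value, str) else ""
        out[std_field] = text.title() if text else None
    for field, default in DEFAULTS.items():
        if out.get(field) is None:
            out[field] = default
    return out

def clean(records):
    """Standardize every record, then delete exact duplicates."""
    seen = set()
    result = []
    for rec in map(standardize, records):
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            result.append(rec)
    return result

raw = [
    {"cust_name": " asha rao ", "city": "pune"},   # from system A
    {"client": "Asha Rao", "city": "Pune"},        # same customer, system B
    {"customer_name": "ravi k", "city": ""},       # missing city
]
cleaned = clean(raw)
```

Note that the duplicate is only detectable after the synonym is resolved and the values are normalized, which is one reason cleaning and standardization are done before merging.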
Data warehouse: staging area
• Data transformation
• Combine the pieces of data from different sources
• The combining task may draw on data from a single source record or on data
elements from multiple records
• Purging useless data
• Sorting and merging on a large scale
• Assignment of surrogate keys?
• Summarization
• Ex: Total units of each product sold at each store
• After transformation, you have a collection of integrated data that is
cleaned, standardized, and summarized
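Surrogate key assignment and the summarization example (total units of each product sold at each store) can be sketched as below. The sales rows and the key scheme (sequential integers starting at 1) are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import count

# Hypothetical transaction-level rows identified by natural (source) keys.
sales = [
    {"store": "S1", "product": "P-100", "units": 3},
    {"store": "S1", "product": "P-100", "units": 2},
    {"store": "S1", "product": "P-200", "units": 1},
    {"store": "S2", "product": "P-100", "units": 4},
]

def assign_surrogate_keys(natural_keys):
    """Map each distinct natural key to a system-generated integer key."""
    counter = count(1)
    mapping = {}
    for nk in natural_keys:
        if nk not in mapping:
            mapping[nk] = next(counter)
    return mapping

product_keys = assign_surrogate_keys(row["product"] for row in sales)
store_keys = assign_surrogate_keys(row["store"] for row in sales)

# Summarization: total units of each product sold at each store,
# keyed by (store surrogate key, product surrogate key).
units_sold = defaultdict(int)
for row in sales:
    key = (store_keys[row["store"]], product_keys[row["product"]])
    units_sold[key] += row["units"]
```

The surrogate keys carry no business meaning, so the warehouse rows stay stable even if a source system later changes or reuses its own identifiers.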
Data warehouse: staging area
• Two distinct functions for data loading
• What are they?
• Initial load
• Incremental data load
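The difference between the two loading functions can be sketched with an in-memory SQLite table. This is an illustration under assumptions: the `product_dim` table and its rows are hypothetical, and the incremental load here simply overwrites changed rows, whereas a real warehouse may instead preserve history.

```python
import sqlite3

def initial_load(conn, rows):
    """Populate the warehouse table from scratch with the full data set."""
    conn.execute("DROP TABLE IF EXISTS product_dim")
    conn.execute(
        "CREATE TABLE product_dim "
        "(product_id TEXT PRIMARY KEY, name TEXT, price REAL)"
    )
    conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)", rows)
    conn.commit()

def incremental_load(conn, changed_rows):
    """Apply only new or changed rows picked up since the previous load."""
    conn.executemany(
        "INSERT OR REPLACE INTO product_dim VALUES (?, ?, ?)", changed_rows
    )
    conn.commit()

conn = sqlite3.connect(":memory:")

# Initial load: everything, once.
initial_load(conn, [("P-100", "Pen", 1.0), ("P-200", "Notebook", 3.5)])

# Incremental load: one changed price and one new product.
incremental_load(conn, [("P-200", "Notebook", 4.0), ("P-300", "Eraser", 0.5)])

warehouse_rows = conn.execute(
    "SELECT product_id, name, price FROM product_dim ORDER BY product_id"
).fetchall()
```

The initial load moves the full volume of data one time, while the incremental load runs periodically and touches only the rows that changed.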
Data warehouse: data storage
• The storage area of a data warehouse is kept separate from that of
operational systems. Why?
• The two systems serve different purposes
• The data formats of the two systems differ
• Operational data could change very frequently
• Data in DW repository has to be stable
• DW data repositories are open (generally RDBMSs)
• DWs also employ multidimensional databases (MDDBs), especially for summary data
Data warehouse: information delivery
• Ad hoc reports: for casual users
• Complex queries, multi-dimensional and statistical analysis: business
analysts and power users
• Executive information systems: senior executives and high-level
managers
• Can provide online queries and reports
Data warehouse: metadata
• Similar to the data dictionary or data catalogue of a database management
system
• Data about the data in the data warehouse (Yellow pages for data
warehouse)
• Three categories
• Operational metadata
• Extraction and transformation metadata
• End-user metadata
• Operational metadata: information about the original operational data sources
• Extraction and transformation metadata: extraction frequency, extraction
methods, business rules for extraction, and information about all data
transformations
• End-user metadata: navigational map of the data warehouse; enables end-users
to get the required information using their own business terminology
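The three metadata categories can be sketched as a small lookup structure. Everything in it is hypothetical (the `units_sold` column, the `pos_system` source, the field names, and the business terms); it only illustrates how end-user metadata lets a business term lead to the underlying data and its lineage.

```python
# Hypothetical metadata for one warehouse column, split into the
# three categories named above.
metadata = {
    "operational": {
        "units_sold": {"source_system": "pos_system", "source_field": "QTY"},
    },
    "extraction_transformation": {
        "units_sold": {
            "frequency": "daily",
            "method": "incremental",
            "rule": "cast QTY to integer; sum per store and product",
        },
    },
    # End-user metadata: business terminology -> warehouse column.
    "end_user": {
        "units sold": "units_sold",
        "quantity sold": "units_sold",
    },
}

def describe(business_term):
    """Resolve a business term to its column and report its lineage."""
    column = metadata["end_user"][business_term.lower()]
    return {
        "column": column,
        "source": metadata["operational"][column],
        "etl": metadata["extraction_transformation"][column],
    }

info = describe("Quantity Sold")
```

A user who only knows the phrase "quantity sold" can still locate the column, and can see where the data came from and how it was transformed.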
