KCA012: Data Warehousing & Data Mining
UNIT-1
Data Warehouse Introduction
A data warehouse is a collection of data marts representing historical data from
different operations in the company.
The term Data Warehouse was coined by Bill Inmon in 1990, which he defined in
the following way: "A data warehouse is a subject-oriented, integrated, time-
variant and non-volatile collection of data in support of management's decision
making process".
A data warehouse is constructed by integrating data from multiple
heterogeneous sources.
A data warehouse is a database, which is kept separate from the
organization's operational database.
It possesses consolidated historical data, which helps the organization to
analyze its business.
A data warehouse helps executives to organize, understand, and use their
data to make strategic decisions.
Data warehouse systems help in integrating a diversity of application systems.
A data warehouse system helps in consolidated historical data analysis.
A data warehouse is an information system that contains historical and cumulative
data from single or multiple sources. It simplifies the reporting and analysis
process of the organization and provides a single version of the truth for
decision making and forecasting.
Characteristics of Data warehouse
Subject-Oriented
Integrated
Time-variant
Non-volatile
Subject Oriented − A data warehouse is subject oriented because it provides
information around a subject rather than the organization's ongoing
operations. These subjects can be product, customers, suppliers, sales,
revenue, etc. A data warehouse does not focus on the ongoing operations,
rather it focuses on modeling and analysis of data for decision making.
Integrated − A data warehouse is constructed by integrating data from
heterogeneous sources such as relational databases, flat files, etc. This
integration enhances the effective analysis of data.
Time Variant − The data collected in a data warehouse is identified with a
particular time period. The data in a data warehouse provides information
from the historical point of view.
Non-volatile − Non-volatile means that previous data is not erased when new
data is added. A data warehouse is kept separate from the operational
database, and therefore frequent changes in the operational database are not
reflected in the data warehouse.
DATA WAREHOUSE COMPONENTS
The data warehouse is based on an RDBMS server, which is a central information
repository surrounded by some key components that make the entire environment
functional, manageable, and accessible.
There are mainly five components of Data Warehouse:
Data Warehouse Database:
The central database is the foundation of the data warehousing environment. This
database is implemented on RDBMS technology. However, this kind of
implementation is constrained by the fact that a traditional RDBMS is
optimized for transactional database processing and not for data warehousing. For
instance, ad-hoc queries, multi-table joins, and aggregates are resource intensive
and slow down performance.
Sourcing, Acquisition, Clean-up and Transformation Tools (ETL)
The data sourcing, transformation, and migration tools are used for performing all
the conversions, summarizations, and all the changes needed to transform data into
a unified format in the data warehouse. They are also called Extract, Transform
and Load (ETL) Tools.
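As a rough illustration of what an ETL tool does, here is a minimal Python sketch
using only the standard csv and sqlite3 modules. The file name, column names, and
cleaning rules (daily_sales.csv, customer, item, amount) are invented for the
example and are not taken from the text.

    import csv
    import sqlite3

    def extract(path):
        # Extract: read raw transactional records from a flat-file source.
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

    def transform(rows):
        # Transform: clean and unify the data (trim text, standardize case,
        # convert amounts to one numeric format, drop records that cannot be fixed).
        cleaned = []
        for row in rows:
            try:
                cleaned.append({
                    "customer": row["customer"].strip().title(),
                    "item": row["item"].strip().lower(),
                    "amount": round(float(row["amount"]), 2),
                })
            except (KeyError, ValueError):
                continue  # skip malformed records
        return cleaned

    def load(rows, db_path="warehouse.db"):
        # Load: write the unified records into a warehouse table.
        con = sqlite3.connect(db_path)
        con.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, item TEXT, amount REAL)")
        con.executemany("INSERT INTO sales VALUES (:customer, :item, :amount)", rows)
        con.commit()
        con.close()

    if __name__ == "__main__":
        load(transform(extract("daily_sales.csv")))

A commercial ETL tool performs the same three steps at a much larger scale, adding
scheduling, logging, and support for many more source and target types.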
Metadata
The name metadata suggests some high-level technological concept, but it is quite
simple: metadata is data about data, and it defines the data warehouse. It is
used for building, maintaining, and managing the data warehouse. Metadata can be
classified into the following categories:
1. Technical metadata: This kind of metadata contains information about the
warehouse that is used by data warehouse designers and administrators.
2. Business metadata: This kind of metadata contains detail that gives end-users
an easy way to understand the information stored in the data warehouse.
Query Tools
One of the primary objectives of data warehousing is to provide information to
businesses for making strategic decisions. Query tools allow users to interact
with the data warehouse system.
These tools fall into four different categories:
1. Query and reporting tools
2. Application Development tools
3. Data mining tools
4. OLAP tools
Data Marts
A data mart is an access layer which is used to get data out to the users. It is
presented as an option to a large data warehouse, as it takes less time and
money to build. However, there is no standard definition of a data mart; it
differs from person to person.
BUILDING A DATA WAREHOUSE
In general, building any data warehouse consists of the following steps:
1. Extracting the transactional data from the data sources into a staging area
2. Transforming the transactional data
3. Loading the transformed data into a dimensional database
4. Building pre-calculated summary values to speed up report generation
5. Building (or purchasing) a front-end reporting tool
Extracting Transactional Data:
A large part of building a data warehouse is pulling data from various data
sources and placing it in a central storage area.
Transforming Transactional Data:
An equally important and challenging step after extracting is transforming and
relating the data extracted from multiple sources.
Creating a Dimensional Model:
The third step in building a data warehouse is coming up with a dimensional
model. Most modern transactional systems are built using the relational model.
The relational database is highly normalized; when designing such a system, the
goal is to eliminate data redundancy. A dimensional model, by contrast, is
deliberately denormalized so that analytical queries are simpler and faster.
Loading the Data:
After you've built a dimensional model, it's time to populate it with the data in
the staging database. This step only sounds trivial; it might involve combining
several columns together or splitting one field into several columns.
Generating Pre calculated Summary Values:
The next step is generating the pre-calculated summary values, which are
commonly referred to as aggregations. This step has been tremendously simplified
by tools such as SQL Server Analysis Services (formerly OLAP Services).
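Conceptually, an aggregation is just a pre-computed GROUP BY result stored for
later reuse. The sketch below uses an invented sales(item, quarter, amount) fact
table with made-up numbers; a tool such as Analysis Services manages many such
summaries automatically.

    import sqlite3

    # Illustrative fact table; column names and values are made up for the sketch.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (item TEXT, quarter TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
        ("tv", "Q1", 825.0), ("tv", "Q2", 769.0), ("phone", "Q1", 605.0),
    ])

    # The aggregation is a pre-computed GROUP BY stored as its own table, so reports
    # can read this small summary instead of re-scanning the whole fact table.
    con.execute("""
        CREATE TABLE sales_by_item_quarter AS
        SELECT item, quarter, SUM(amount) AS total_amount, COUNT(*) AS row_count
        FROM sales
        GROUP BY item, quarter
    """)
    print(con.execute("SELECT * FROM sales_by_item_quarter").fetchall())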
Building (or Purchasing) a Front-End Reporting Tool
After you've built the dimensional database and the aggregations you can decide
how sophisticated your reporting tools need to be. If you just need the drill-down
capabilities, and your users have Microsoft Office 2000 on their desktops, the
Pivot Table Service of Microsoft Excel 2000 will do the job.
MAPPING THE DATA WAREHOUSE TO A MULTIPROCESSOR
ARCHITECTURE
The functions of a data warehouse are based on relational database technology,
which can be implemented in a parallel manner. There are two advantages of having
a parallel relational database technology for a data warehouse:
Linear speed-up: the ability to reduce response time proportionally for the same
workload as the number of processors is increased.
Linear scale-up: the ability to provide the same performance on the same
requests as the database size (and the hardware) increases.
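A small worked example with hypothetical timings makes the two terms concrete:

    # Hypothetical elapsed times in seconds.
    t_1cpu_small_job = 400.0   # 1 processor, original workload
    t_4cpu_small_job = 100.0   # 4 processors, same workload
    t_4cpu_big_job = 400.0     # 4 processors, workload four times larger

    speedup = t_1cpu_small_job / t_4cpu_small_job   # 4.0 with 4 CPUs -> linear speed-up
    scaleup = t_1cpu_small_job / t_4cpu_big_job     # 1.0 -> linear scale-up
    print(speedup, scaleup)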
Types of parallelism:
Inter-query parallelism: different server threads or processes handle multiple
requests at the same time.
Intra-query parallelism: this form of parallelism decomposes a serial SQL query
into lower-level operations such as scan, join, and sort. These lower-level
operations are then executed concurrently, in parallel.
Intra-query parallelism can be done in either of two ways:
Horizontal parallelism: the database is partitioned across multiple disks, and
parallel processing occurs within a specific task that is performed concurrently
on different processors against different sets of data.
Vertical parallelism: this occurs among different tasks. All query components
such as scan, join, and sort are executed in parallel in a pipelined fashion. In
other words, the output from one task becomes the input to another task.
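The pipelined (vertical) idea can be sketched in a few lines of Python using one
thread per operator and a queue between them; the table, rows, and operators here
are toy stand-ins for illustration, not a real query engine.

    import threading
    import queue

    rows = [("tv", 120), ("radio", 30), ("tv", 200), ("phone", 90)]  # toy data
    pipe = queue.Queue(maxsize=2)   # bounded buffer between the two tasks
    DONE = object()                 # sentinel marking the end of the stream

    def scan_task():
        # First operator: scan the (toy) table and push rows downstream.
        for row in rows:
            pipe.put(row)
        pipe.put(DONE)

    def aggregate_task(result):
        # Second operator: consume rows as they arrive and sum amounts per item,
        # so it starts working before the scan has finished.
        while True:
            row = pipe.get()
            if row is DONE:
                break
            item, amount = row
            result[item] = result.get(item, 0) + amount

    totals = {}
    producer = threading.Thread(target=scan_task)
    consumer = threading.Thread(target=aggregate_task, args=(totals,))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    print(totals)   # {'tv': 320, 'radio': 30, 'phone': 90}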
Types of DBMS Parallelism
1. Data Partitioning: Data partitioning is the key component for effective
parallel execution of database operations. Data partitioning can be done in two
ways:
Random partitioning: includes random data striping across multiple disks on a
single server. Another option for random partitioning is round-robin
partitioning, in which each record is placed on the next disk assigned to the
database.
Intelligent partitioning: assumes that the DBMS knows where a specific record is
located and does not waste time searching for it across all disks.
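A minimal sketch of the two approaches, with plain Python lists standing in for
disks; the record layout and the choice of four disks are assumptions made only
for illustration.

    NUM_DISKS = 4
    disks = [[] for _ in range(NUM_DISKS)]

    def round_robin_partition(records):
        # Random/round-robin partitioning: records are spread evenly, but a later
        # lookup has to search every disk.
        for i, record in enumerate(records):
            disks[i % NUM_DISKS].append(record)

    def hash_partition(records, key):
        # Intelligent (hash) partitioning: the DBMS can recompute hash(key) to go
        # straight to the disk holding a record instead of searching all disks.
        # (A real DBMS uses a stable hash; Python's hash() is only for the sketch.)
        for record in records:
            disks[hash(record[key]) % NUM_DISKS].append(record)

    records = [{"id": i, "customer": "c" + str(i % 3)} for i in range(10)]
    hash_partition(records, key="customer")   # co-locates all rows of a customer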
2. Database architectures for parallel processing
There are three DBMS software architecture styles for parallel processing:
Shared-memory or shared-everything architecture
Shared-disk architecture
Shared-nothing architecture
2.1 Shared Memory Architecture
Tightly coupled shared-memory systems, illustrated in the following figure, have
the following characteristics:
Multiple PUs share memory.
Each PU has full access to all shared memory through a common bus.
Communication between nodes occurs via shared memory.
Performance is limited by the bandwidth of the memory bus.
This architecture is simple to implement and provides a single system image; a
typical example is an RDBMS implemented on an SMP (symmetric multiprocessor)
machine.
2.2 Shared Disk Architecture
Shared-disk systems are typically loosely coupled. Such systems, illustrated in
the following figure, have the following characteristics:
Each node consists of one or more PUs and associated memory.
Memory is not shared between nodes.
Communication occurs over a common high-speed bus.
Each node has access to the same disks and other resources.
A node can be an SMP if the hardware supports it.
Bandwidth of the high-speed bus limits the number of nodes (scalability) of
the system.
A Distributed Lock Manager (DLM) is required.
2.3 Shared Nothing Architecture
Shared nothing systems are typically loosely coupled. In shared nothing systems
only one CPU is connected to a given disk. If a table or database is located on that
disk, access depends entirely on the PU which owns it.
Shared nothing systems are concerned with access to disks, not access to memory.
Adding more PUs and disks can improve scale up.
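The shared-nothing principle can be mimicked with worker processes: each worker
owns one partition, aggregates it locally, and only the small partial results are
merged. The partitions and values below are toy data; a real MPP system does this
across separate nodes and disks.

    from multiprocessing import Pool
    from collections import Counter

    partitions = [  # each partition would live on its own node/disk
        [("tv", 120), ("radio", 30)],
        [("tv", 200), ("phone", 90)],
    ]

    def local_aggregate(partition):
        # Each worker touches only the partition it "owns".
        totals = Counter()
        for item, amount in partition:
            totals[item] += amount
        return totals

    if __name__ == "__main__":
        with Pool(processes=len(partitions)) as pool:
            partials = pool.map(local_aggregate, partitions)
        merged = sum(partials, Counter())  # combine the small partial results
        print(dict(merged))                # {'tv': 320, 'radio': 30, 'phone': 90}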
Draw the 3-tier data warehouse architecture. Explain ETL process.
Generally, a data warehouse adopts a three-tier architecture. Following are the
three tiers of the data warehouse architecture.
Bottom Tier − The bottom tier of the architecture is the data warehouse database
server. It is the relational database system. We use back-end tools and utilities
to feed data into the bottom tier. These back-end tools and utilities perform the
Extract, Clean, Load, and Refresh functions.
Middle Tier − In the middle tier, we have the OLAP server, which can be
implemented in either of the following ways:
By Relational OLAP (ROLAP), which is an extended relational database management
system. ROLAP maps the operations on multidimensional data to standard relational
operations.
By Multidimensional OLAP (MOLAP), which directly implements multidimensional data
and operations.
Top-Tier − This tier is the front-end client layer. This layer holds the query tools
and reporting tools, analysis tools and data mining tools.
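To make the ROLAP mapping mentioned in the middle tier concrete: a
multidimensional request such as "dollars sold by item and quarter" or "slice on
one city" becomes an ordinary GROUP BY or WHERE clause. The sketch below uses
sqlite3 with an invented sales table and made-up figures.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (item TEXT, quarter TEXT, city TEXT, dollars REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
        ("phone", "Q1", "Vancouver", 605.0), ("phone", "Q1", "Toronto", 512.0),
        ("tv",    "Q2", "Vancouver", 769.0), ("tv",    "Q2", "Toronto", 640.0),
    ])

    # Roll-up "dollars sold by item and quarter" becomes an ordinary GROUP BY.
    print(con.execute(
        "SELECT item, quarter, SUM(dollars) FROM sales GROUP BY item, quarter").fetchall())

    # Slicing the cube on city = 'Vancouver' becomes an ordinary WHERE clause.
    print(con.execute(
        "SELECT item, quarter, dollars FROM sales WHERE city = 'Vancouver'").fetchall())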
Difference between Database System and Data Warehouse:
Database System:
A database system is used in the traditional way of storing and retrieving data.
The major task of a database system is to perform query processing. These systems
are generally referred to as online transaction processing (OLTP) systems and are
used for the day-to-day operations of an organization.
Data Warehouse:
A data warehouse is the place where a huge amount of data is stored. It is meant
for users or knowledge workers in the role of data analysis and decision making.
These systems are supposed to organize and present data in different formats and
forms in order to serve the needs of specific users for specific purposes. These
systems are referred to as online analytical processing (OLAP) systems.
Database System | Data Warehouse
It supports operational processes. | It supports analysis and performance reporting.
Capture and maintain the data. | Explore the data.
Current data. | Multiple years of history.
Data is balanced within the scope of this one system. | Data must be integrated and balanced from multiple systems.
Data is updated when a transaction occurs. | Data is updated on scheduled processes.
Data verification occurs when entry is done. | Data verification occurs after the fact.
100 MB to GB. | 100 GB to TB.
ER based. | Star/Snowflake.
Application oriented. | Subject oriented.
Primitive and highly detailed. | Summarized and consolidated.
Flat relational. | Multidimensional.
MULTIDIMENSIONAL DATA MODEL
The multidimensional data model stores data in the form of a data cube. Data is
most often pictured as a two- or three-dimensional cube, although a cube may have
any number of dimensions.
A data cube allows data to be viewed in multiple dimensions. Dimensions are
entities with respect to which an organization wants to keep records. For
example, in a store's sales records, dimensions allow the store to keep track of
things like monthly sales of items and the branches and locations at which they
were sold. A multidimensional database helps to provide data-related answers to
complex business queries quickly and accurately. Data warehouses and Online
Analytical Processing (OLAP) tools are based on a multidimensional data model.
OLAP in data warehousing enables users to view data from different angles and
dimensions.
The multidimensional data model is a method of ordering data in the database so
that the contents are well arranged and easy to assemble.
Unlike relational databases, which give users access to data in the form of
record-oriented queries, the multidimensional data model allows users to pose
analytical questions associated with market or business trends and to receive
answers to their requests comparatively fast.
OLAP (online analytical processing) and data warehousing use multidimensional
databases to show multiple dimensions of the data to users.
Working on a Multidimensional Data Model
The following stages should be followed by every project for building a Multi-
Dimensional Data Model:
Stage 1: Assembling data from the client: In the first stage, the correct data is
collected from the client. Software professionals usually explain to the client
the range of data that can be obtained with the selected technology and then
collect the complete data in detail.
Stage 2: Grouping different segments of the system: In the second stage, all the
data is recognized and classified into the sections it belongs to, which makes it
problem-free to apply step by step.
Stage 3: Noticing the different proportions: The third stage is the basis on
which the design of the system rests. In this stage, the main factors are
recognized according to the user's point of view. These factors are also known as
"dimensions".
Stage 4: Preparing the actual-time factors and their respective qualities: In the
fourth stage, the factors recognized in the previous step are used to identify
their related qualities. These qualities are also known as "attributes" in the
database.
Stage 5: Finding the actuality of factors which are listed previously and their
qualities: In the fifth stage, the facts are separated and differentiated from
the factors collected earlier. These facts play a significant role in the
arrangement of a multidimensional data model.
Stage 6: Building the Schema to place the data, with respect to the information
collected from the steps above: In the sixth stage, on the basis of the data which
was collected previously, a Schema is built.
For Example:
1. Let us take the example of a firm. The revenue cost of a firm can be recognized
on the basis of different factors such as geographical location of firm’s workplace,
products of the firm, advertisements done, time utilized to flourish a product, etc.
Let us take the example of the data of a factory which sells products per quarter in
Bangalore. The data is represented in the table given below:
In the above presentation, the factory's sales for Bangalore are shown with
respect to the time dimension, which is organized into quarters, and the item
dimension, which is sorted according to the kind of item sold. The facts are
represented in rupees (in thousands).
Now, if we desire to view the data of the sales in a three-dimensional table, then
it is represented in the diagram given below.
Let us consider the data according to item, time and location (like Kolkata, Delhi,
and Mumbai). Here is the table:
This data can be represented in the form of three dimensions conceptually, which
is shown in the image below:
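The same idea can be sketched with pandas: grouping the facts by location yields
one 2-D item-by-quarter table per city, and together these slices form the 3-D
cube. The item names and numbers below are made up for illustration only.

    import pandas as pd

    # Illustrative sales facts in thousands of rupees; the values are made up.
    facts = pd.DataFrame([
        ("Kolkata", "Q1", "keyboard", 150), ("Kolkata", "Q1", "mouse", 100),
        ("Kolkata", "Q2", "keyboard", 170), ("Kolkata", "Q2", "mouse", 120),
        ("Delhi",   "Q1", "keyboard", 200), ("Delhi",   "Q1", "mouse", 140),
        ("Delhi",   "Q2", "keyboard", 210), ("Delhi",   "Q2", "mouse", 160),
    ], columns=["location", "quarter", "item", "sales"])

    # The 3-D cube (item x quarter x location) viewed as a series of 2-D tables:
    # each location gives one item-by-quarter slice.
    for location, slice_2d in facts.groupby("location"):
        print(location)
        print(slice_2d.pivot_table(index="item", columns="quarter",
                                   values="sales", aggfunc="sum"))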
Advantages of Multi-Dimensional Data Model
The following are the advantages of a multi-dimensional data model:
A multi-dimensional data model is easy to handle.
It is easy to maintain.
Its performance is better than that of normal databases (e.g. relational
databases).
The representation of data is better than in traditional databases, because
multidimensional databases are multi-viewed and carry different types of factors.
It is workable for complex systems and applications, unlike simple
one-dimensional database systems.
Disadvantages of Multi-Dimensional Data Model
The following are the disadvantages of a Multi-Dimensional Data Model:
The multidimensional data model is slightly complicated in nature, and it
requires professionals to recognize and examine the data in the database.
If the system fails while a multidimensional data model is being worked on, there
is a great effect on the working of the system.
Because it is complicated in nature, such databases generally have to be dynamic
in design.
Data Cube
A data cube enables data to be modeled and viewed in several dimensions. It is
represented by dimensions and facts. In other terms, dimensions are the views or
entities related to which an organization is required to keep records.
Data is grouped or combined into multidimensional matrices called data cubes. The
data cube method has a few alternative names or variants, such as
"multidimensional databases," "materialized views," and "OLAP (On-Line Analytical
Processing)."
For example, a relation with the schema sales(part, supplier, customer,
sale-price) can be materialized into a set of eight views, as shown in the
figure, where psc indicates a view consisting of aggregate function values (such
as total sales) computed by grouping the three attributes part, supplier, and
customer; p indicates a view composed of the corresponding aggregate function
values calculated by grouping on part alone; and so on.
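The eight views correspond to the 2^3 subsets of {part, supplier, customer}, so
they can be enumerated mechanically. The sketch below uses a few invented sales
tuples purely to show the grouping logic.

    from itertools import combinations
    from collections import defaultdict

    # Toy facts: (part, supplier, customer, sale_price); the values are illustrative.
    sales = [
        ("cpu", "acme", "alice", 300.0),
        ("cpu", "acme", "bob", 310.0),
        ("disk", "zenith", "alice", 120.0),
    ]
    dims = ("part", "supplier", "customer")

    # Every subset of the three dimensions defines one of the eight views (cuboids).
    for r in range(len(dims) + 1):
        for group_by in combinations(dims, r):
            totals = defaultdict(float)
            for part, supplier, customer, price in sales:
                row = {"part": part, "supplier": supplier, "customer": customer}
                key = tuple(row[d] for d in group_by)
                totals[key] += price
            name = "".join(d[0] for d in group_by) or "apex (no grouping)"
            print(name, dict(totals))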
A data cube is created from a subset of attributes in the database.
The model views data in the form of a data cube. OLAP tools are based on the
multidimensional data model. Data cubes usually model n-dimensional data.
A data cube enables data to be modeled and viewed in multiple dimensions. A
multidimensional data model is organized around a central theme, such as sales or
transactions.
Example: In the 2-D representation, we will look at the All Electronics sales
data for items sold per quarter in the city of Vancouver. The measure displayed
is dollars sold (in thousands).
3-Dimensional Cuboids
Let us suppose we would like to view the sales data with a third dimension. For
example, suppose we would like to view the data according to time and item, as
well as location, for the cities Chicago, New York, Toronto, and Vancouver. The
measure displayed is dollars sold (in thousands). These 3-D data are shown in the
table, represented as a series of 2-D tables.
Let us suppose that we would like to view our sales data with an additional fourth
dimension, such as a supplier.
In data warehousing, the data cubes are n-dimensional. The cuboid which holds the
lowest level of summarization is called a base cuboid.
For example, the 4-D cuboid in the figure is the base cuboid for the given time,
item, location, and supplier dimensions.
The figure shows a 4-D data cube representation of sales data, according to the
dimensions time, item, location, and supplier. The measure displayed is dollars
sold (in thousands).
The topmost 0-D cuboid, which holds the highest level of summarization, is
known as the apex cuboid. In this example, this is the total sales, or dollars sold,
summarized over all four dimensions.
The lattice of cuboids forms a data cube. The figure shows the lattice of cuboids
making up a 4-D data cube for the dimensions time, item, location, and supplier.
Each cuboid represents a different degree of summarization.
Question: Explain star, snowflakes and fact constellation schema.
OR
SCHEMAS FOR MULTI-DIMENSIONAL DATA MODEL
A schema is a logical description of the entire database. It includes the name
and description of records of all record types, including all associated data
items and aggregates. Much like a database, a data warehouse also requires a
schema to be maintained. A database uses the relational model, while a data
warehouse uses the Star, Snowflake, or Fact Constellation schema.
Star Schema
Each dimension in a star schema is represented with only one-dimension
table.
This dimension table contains the set of attributes.
The following diagram shows the sales data of a company with respect to the
four dimensions, namely time, item, branch, and location.
There is a fact table at the center. It contains the keys to each of four
dimensions.
The fact table also contains the attributes, namely dollars sold and units sold.
Star Schema
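A minimal version of this star schema can be written as ordinary SQL (issued here
through sqlite3); the exact column lists are abbreviated assumptions for
illustration, not the full attribute sets of any particular figure.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    -- One denormalized table per dimension.
    CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, day TEXT, month TEXT, quarter TEXT, year INTEGER);
    CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT, supplier_type TEXT);
    CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
    CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT, state TEXT, country TEXT);

    -- Central fact table holding the keys of all four dimensions plus the measures.
    CREATE TABLE sales_fact (
        time_key INTEGER REFERENCES time_dim(time_key),
        item_key INTEGER REFERENCES item_dim(item_key),
        branch_key INTEGER REFERENCES branch_dim(branch_key),
        location_key INTEGER REFERENCES location_dim(location_key),
        dollars_sold REAL,
        units_sold INTEGER
    );
    """)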
Snowflake Schema
Some dimension tables in the Snowflake schema are normalized.
The normalization splits up the data into additional tables.
Unlike in the star schema, the dimension tables in a snowflake schema are
normalized. For example, the item dimension table of the star schema is
normalized and split into two dimension tables, namely the item and supplier
tables.
Now the item dimension table contains the attributes item_key, item_name,
type, brand, and supplier-key.
The supplier key is linked to the supplier dimension table. The supplier
dimension table contains the attributes supplier_key and supplier_type.
Snowflake Schema
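The snowflaked item dimension described above can likewise be sketched in SQL:
the supplier attributes move to their own table, and the item table keeps only a
supplier_key pointing at it. Column lists are again abbreviated for illustration.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    -- Supplier details are normalized out of the item dimension.
    CREATE TABLE supplier_dim (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
    CREATE TABLE item_dim (
        item_key INTEGER PRIMARY KEY,
        item_name TEXT,
        type TEXT,
        brand TEXT,
        supplier_key INTEGER REFERENCES supplier_dim(supplier_key)
    );
    """)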
Fact Constellation Schema
A fact constellation has multiple fact tables. It is also known as galaxy
schema.
The following diagram shows two fact tables, namely sales and shipping.
The sales fact table is the same as that in the star schema.
The shipping fact table has the five dimensions, namely item_key, time_key,
shipper_key, from_location, to_location.
The shipping fact table also contains two measures, namely dollars sold and
units sold.
It is also possible to share dimension tables between fact tables. For
example, time, item, and location dimension tables are shared between the
sales and shipping fact table.
Fact Constellation Schema
Give the difference between star and fact constellation multidimensional data
models.
A star schema has a single fact table linked to its dimension tables, whereas a
fact constellation (galaxy) schema has multiple fact tables that share dimension
tables among them.
Data Warehouse Applications
Data warehouses are widely used in the following fields −
Financial services
Banking services
Consumer goods
Retail sectors
Controlled manufacturing
Data Mart
A data mart is focused on a single functional area of an organization and contains a
subset of data stored in a Data Warehouse.
A data mart is a condensed version of a data warehouse and is designed for use by
a specific department, unit, or set of users in an organization, e.g., Marketing,
Sales, HR, or Finance. It is often controlled by a single department in an
organization.
Types of Data Mart
There are three main types of data mart:
Dependent: A dependent data mart is created by drawing data from an existing
central data warehouse. It is a logical or physical subset of a larger data
warehouse, and it extracts the records it needs from the warehouse. In this
technique, because the data warehouse creates the data mart, there is no need for
data mart integration. It is also known as a top-down approach.
Independent: An independent data mart is created without the use of a central
data warehouse, drawing data directly from operational sources, external sources,
or both. In this approach, because all the data marts are designed independently,
integration of the data marts is required. It is also termed a bottom-up
approach, as the data marts are integrated to develop a data warehouse.
Hybrid: This type of data mart can take data from data warehouses or operational
systems. It allows us to combine input from sources other than a data warehouse.