Module1 Part3

Uploaded by

amvarshney123


Operational Information vs. Strategic Information

Operational Information
• Operational computer systems provide the information needed to run day-to-day operations and answer daily questions.
• Also called online transaction processing (OLTP) systems
• Data is read or manipulated with each transaction
• Transactions/queries are simple and easy to write
• Usually for middle management

Examples
• Sales systems
• Hotel reservation systems
• Railway reservation systems
Operational System

• OLTP systems are used to run the day-to-day core business of a company.

• Optimized to handle large numbers of simple read/write transactions
Strategic Information

• Data sets are mounting everywhere, but in their raw form they are not useful for decision support
• Decision making requires answers to complex questions drawn from integrated data.
• Decision makers want to know which product lines to expand and which markets to strengthen.

They need Strategic Information

• Strategic information is required by executives and managers to formulate business strategies, establish goals, set objectives, and monitor results.

Example:
• Gain market share by 10% in the next 3 years.
• Bring 3 new products to market in 2 years.
A New Type of System Environment is Required

Desired features are

• Data is designed for analytical tasks
• Data comes from multiple applications
• Read-intensive data usage
• Direct interaction with the system by users, without IT assistance
• Content is stable and updated periodically
• Content includes current and historical data
• Ability for users to run queries and get results online
• Ability for users to initiate reports
Data warehouse

• A decision support database that is maintained separately from the organization's operational database

• Supports information processing by providing a solid platform of consolidated, historical data for analysis.

• Also allows a large number of reports to be created using mining tools.
Data Warehouse

• A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process.

Data warehousing
• The process of constructing and using data warehouses.
• The process of extracting operational data, transforming it into informational data, and loading it into a central data store (the warehouse).
Characteristics of Data warehouse

• Subject-oriented
• Integrated
• Time-variant
• Non-volatile
1. Subject-oriented data

• Organized around major subjects, such as customer, product, and sales.

• Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.

• Provides a simple and concise view of particular subject issues by excluding data that is not useful in the decision support process.

• In operational systems, data is stored by individual application or business process, for example data about an individual order or customer.

• For example, in the banking industry, the data sets for savings or checking accounts contain data about that particular application.

• In the DW, by contrast, data is stored by real-world business subjects or events, not by application.

• In the DW, the subject is the organizing principle.
• Subjects vary from enterprise to enterprise.
2. Integrated data

• Data in the DW comes from several operational systems.

• Different data sets have different file formats.

Example: data for the subject Account comes from 3 different data sources.

• Before moving the data into the data warehouse, you have to go through a process of transformation, consolidation, and integration of the source data.

• Some of the items that need standardization:
  • Naming conventions
  • Codes
  • Data attributes
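The standardization of naming conventions, codes, and data attributes can be sketched as follows. This is a minimal illustration only: the three source systems, their field names, the code mappings, and the paise-to-rupee conversion are all assumptions invented for the example, not part of the slides.

```python
# Hypothetical records for the subject "Account" from three source systems.
# Field names, codes, and units are assumptions made for illustration only.
savings_rows = [{"acct_no": "S-101", "type": "SAV", "bal": "2500.00"}]
checking_rows = [{"AccountNumber": "C-202", "AcctType": 2, "Balance": 130.5}]
loans_rows = [{"id": "L-303", "kind": "loan", "balance_paise": 7500000}]

# One agreed code set for the warehouse (standardized codes).
TYPE_CODES = {"SAV": "SAVINGS", 2: "CHECKING", "loan": "LOAN"}


def standardize(row, key_field, type_field, balance_field, scale=1.0):
    """Map one source row onto the warehouse naming convention and units."""
    return {
        "account_id": row[key_field],
        "account_type": TYPE_CODES[row[type_field]],
        "balance": round(float(row[balance_field]) * scale, 2),
    }


account_subject = (
    [standardize(r, "acct_no", "type", "bal") for r in savings_rows]
    + [standardize(r, "AccountNumber", "AcctType", "Balance") for r in checking_rows]
    # The hypothetical loans system stores balances in paise; convert to rupees.
    + [standardize(r, "id", "kind", "balance_paise", scale=0.01) for r in loans_rows]
)

for record in account_subject:
    print(record)
```

After this step every record uses the same field names, the same type codes, and the same unit, regardless of which application produced it.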
3. Time-variant data

• In operational systems, the stored data contains current values. For example, in a savings account system the balance is the customer's current balance.

• The data in the DW, by contrast, is meant for analysis and decision making.

• Comparative analysis is one of the best techniques for evaluating business performance, and time is a critical factor for comparative analysis.

• Every data structure in the DW contains a time element.

• So, the DW has to contain historical data as well as current values.

• Data is stored as snapshots over past and current periods.

The time-variant nature of the data in a data warehouse:
• Allows for analysis of the past
• Relates information to the present
• Enables forecasts for the future
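Storing data as dated snapshots is what makes comparison over time possible. The sketch below assumes a hypothetical list of month-end balances for one savings account; the dates, values, and function names are illustrative only.

```python
from datetime import date

# Hypothetical month-end balance snapshots for one savings account,
# kept in chronological order.
snapshots = [
    (date(2023, 1, 31), 1000.0),
    (date(2023, 2, 28), 1200.0),
    (date(2023, 3, 31), 1500.0),
]


def balance_as_of(snaps, when):
    """Return the most recent snapshot value on or before `when`."""
    eligible = [value for day, value in snaps if day <= when]
    if not eligible:
        raise ValueError("no snapshot on or before the requested date")
    return eligible[-1]


def period_changes(snaps):
    """Comparative analysis: change in value between consecutive snapshots."""
    return [later[1] - earlier[1] for earlier, later in zip(snaps, snaps[1:])]


print(balance_as_of(snapshots, date(2023, 2, 15)))  # uses the January snapshot
print(period_changes(snapshots))
```

An operational system would only hold the latest balance, so neither question could be answered from it directly.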
4. Non-volatile data

• Data from operational systems is moved into the DW at specific intervals
• Not every business transaction updates the DW
• Data in the DW is not deleted
• Data in the DW is not changed by individual transactions
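One way to picture non-volatility is an append-only store that accepts periodic batch loads and offers no update or delete operation. The class below is a hypothetical sketch, not a description of any particular warehouse product.

```python
class WarehouseTable:
    """Append-only store: rows arrive in periodic batches and are never edited."""

    def __init__(self):
        self._rows = []

    def load_batch(self, load_date, rows):
        # Each batch is stamped with its load date and appended.
        for row in rows:
            self._rows.append({**row, "load_date": load_date})

    def rows(self):
        # Hand out copies so callers cannot modify stored data in place.
        return [dict(row) for row in self._rows]


table = WarehouseTable()
table.load_batch("2023-01-31", [{"account_id": "S-101", "balance": 1000.0}])
table.load_batch("2023-02-28", [{"account_id": "S-101", "balance": 1200.0}])
print(len(table.rows()))
```

The February load adds a new row instead of overwriting the January one, so both values remain available for analysis.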
OLAP AND OLTP

OLTP vs. OLAP

                          OLTP                              OLAP
Users                     Clerk, IT professional            Knowledge worker
Function                  Day-to-day operations             Long-term informational requirements,
                                                            decision support
DB design                 Application-oriented              Subject-oriented
Data                      Current, up-to-date, detailed,    Historical, summarized, multidimensional,
                          flat relational                   integrated, consolidated
Unit of work              Short, simple transaction         Complex query
Access                    Read/write, index/hash on         Lots of scans
                          primary key
No. of records accessed   Tens                              Millions
No. of users              Thousands                         Hundreds
DB size                   100 MB to GB                      100 GB to TB
Metric                    Transaction throughput            Query throughput, response time

ASET 16/55
Example of OLTP queries

Query 1: What is the salary of Mr. Mishra?

Query 2: What is the address and phone number of the person in charge of the supplies department?

Query 3: How many employees received an excellent rating in the latest appraisal?
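Queries like these touch only a few rows of current data. The sketch below runs Query 1 and Query 3 against an in-memory SQLite database; the table layout, names, and values are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, "
    "salary REAL, appraisal TEXT)"
)
conn.executemany(
    "INSERT INTO employee (emp_id, name, salary, appraisal) VALUES (?, ?, ?, ?)",
    [
        (1, "Mishra", 52000.0, "excellent"),
        (2, "Rao", 48000.0, "good"),
        (3, "Khan", 61000.0, "excellent"),
    ],
)

# Query 1: point lookup that returns a single row.
salary = conn.execute(
    "SELECT salary FROM employee WHERE name = ?", ("Mishra",)
).fetchone()[0]

# Query 3: simple count over current data.
excellent_count = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE appraisal = ?", ("excellent",)
).fetchone()[0]

print(salary, excellent_count)
conn.close()
```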
Example of OLAP queries

Query 1: How is employee attrition changing over the years across the company?

Query 2: Is there a correlation between the geographical location of a company unit and excellent employee appraisals?

Query 3: Is it financially viable to continue our manufacturing unit in Noida?
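Analytical queries like Query 1 scan and aggregate historical data instead of looking up individual records. The sketch below computes an attrition rate per year with SQLite; the `headcount` table, its columns, and all figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE headcount (year INTEGER, unit TEXT, staff INTEGER, exits INTEGER)"
)
conn.executemany(
    "INSERT INTO headcount VALUES (?, ?, ?, ?)",
    [
        (2021, "Noida", 200, 20),
        (2021, "Pune", 100, 5),
        (2022, "Noida", 210, 42),
        (2022, "Pune", 110, 11),
    ],
)

# Attrition rate (percent) per year, aggregated across all company units.
attrition_by_year = conn.execute(
    "SELECT year, ROUND(100.0 * SUM(exits) / SUM(staff), 1) "
    "FROM headcount GROUP BY year ORDER BY year"
).fetchall()

print(attrition_by_year)
conn.close()
```

Every row of the table contributes to the result, which is the "lots of scans" access pattern typical of OLAP.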

OLAP AND OLTP

OLAP

Online analytical processing (OLAP) is computer processing that enables a user to easily and selectively extract and view data from different viewpoints.

OLAP allows users to analyse database information from multiple database systems at once.

OLAP data is stored in multidimensional databases.
Data Warehouse: Architecture
Data Source

Production Data
• Comes from the various operational systems of the enterprise.

Internal Data
• Private documents, customer profiles, departmental databases, etc.

External Data
• Statistical data produced by external agencies, used for comparing performance against other organizations.

Archived Data
• In every operational system, old data is periodically stored in archive files or on disk storage. This data is also required because the data warehouse keeps historical snapshots of data.
Data staging component
• After data is extracted, it has to be prepared
• Data extracted from the sources needs to be changed, converted, and made ready in a suitable format

• Three major functions make the data ready (ETL)
  • Extract
  • Transform
  • Load

• The staging area provides a place and a set of functions to
  • Clean
  • Change
  • Combine
  • Convert
ETL (Extraction, Transformation, and Loading) process

Data extraction
• Get data from multiple, heterogeneous, and external sources
Data cleaning
• Detect errors in the data and rectify them when possible
Data transformation
• Convert data from legacy or host format to warehouse format
Load
• Sort, summarize, consolidate, compute views, and check integrity
Refresh
• Propagate the updates from the data sources to the warehouse
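The extraction, cleaning, transformation, and load steps can be sketched as a small pipeline. This is a toy example under stated assumptions: the CSV layout, the column names, and the in-memory `warehouse` dictionary are invented for illustration, and cleaning here simply drops rows it cannot repair.

```python
import csv
import io

# Hypothetical extract from a legacy source; column names are assumptions.
RAW_CSV = """cust_id,amount,sale_date
101, 250.00 ,2023-01-05
102,abc,2023-01-06
101,100.50,2023-01-07
"""


def extract(text):
    """Extraction: read rows from the source format."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Cleaning: drop rows whose amount cannot be parsed as a number."""
    good = []
    for row in rows:
        try:
            float(row["amount"])
        except ValueError:
            continue
        good.append(row)
    return good


def transform(rows):
    """Transformation: convert to warehouse field names and types."""
    return [
        {
            "customer_id": int(r["cust_id"]),
            "amount": float(r["amount"]),
            "date": r["sale_date"],
        }
        for r in rows
    ]


def load(store, rows):
    """Load: append detail rows and maintain a summary per customer."""
    store["sales"].extend(rows)
    totals = store["total_by_customer"]
    for r in rows:
        totals[r["customer_id"]] = totals.get(r["customer_id"], 0.0) + r["amount"]


warehouse = {"sales": [], "total_by_customer": {}}
load(warehouse, transform(clean(extract(RAW_CSV))))
print(warehouse["total_by_customer"])
```

A refresh would call the same pipeline again with the rows that arrived since the previous load.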

Data Loading: Data Movement to the Data Warehouse
Data Marts
• A data mart is a subset of a data warehouse and is oriented to a specific purpose.

• A data warehouse can be seen as a collection of data marts.

• Data marts can also be seen as small warehouses for OLAP activities within a given segment.

• For example, in the Indian railway system, express train reservation is one segment of the railway, so this segment can be considered a data mart.
Data Mart

The data mart is a subset of the data warehouse and is usually oriented to a specific business line or team.

Whereas data warehouses have an enterprise-wide depth, the information in data marts pertains to a single department.
How Data Mart is Different from a Data Warehouse?
Metadata component

• Metadata is the data about the data in the data warehouse.

• Metadata in a data warehouse contains the answers to questions about the data in the data warehouse.

• Serves as a directory of the contents of the data warehouse

Metadata can hold all kinds of information about DW data like:

• Source for any extracted data.
• Use of that DW data.
• Any kind of data and its values.
• Features of data.
• Transformation logic for extracted data.
• DW tables and their attributes.
• Timestamps
Metadata repository contents
Metadata is created for the data names and definitions of the given warehouse. Additional metadata is created for timestamping extracted data, for recording the source of the extracted data, and for noting missing fields that were added by the data cleaning or integration process.

The repository contains:
• Description of the data warehouse structure
• Operational metadata
• Details of summarization
• Mapping from the operational environment to the data warehouse
• Data related to system performance
• Business metadata
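A metadata repository can be pictured as a directory of entries, one per warehouse column, recording source, transformation logic, and extraction timestamp. The record layout and sample entries below are hypothetical; real repositories hold far more detail.

```python
from dataclasses import dataclass


@dataclass
class ColumnMetadata:
    """One directory entry describing a warehouse column (fields are illustrative)."""
    table: str
    column: str
    source_system: str
    transformation: str
    extracted_at: str


repository = [
    ColumnMetadata("account", "balance", "savings_app",
                   "convert text to decimal", "2023-01-31T23:00"),
    ColumnMetadata("account", "account_type", "checking_app",
                   "map numeric code to name", "2023-01-31T23:00"),
]


def describe(repo, table, column):
    """Answer: where did this column come from and how was it derived?"""
    for entry in repo:
        if entry.table == table and entry.column == column:
            return entry
    return None


print(describe(repository, "account", "balance"))
```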
INTRODUCTION: DM AND KDD PROCESS

Cluster Analysis
How many clusters do you expect?
• Possibility 1
• Possibility 2

Outlier Detection

Classification

A data mining technique used to predict group membership for data.

Two methods:

• Crisp classification: given an input, the classifier returns its label.
• Probabilistic classification: given an input, the classifier returns the probabilities that it belongs to each class. This is useful when some mistakes can be more costly than others.
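The difference between the two methods can be sketched with a one-feature logistic model. The weight, bias, and thresholds below are arbitrary assumptions chosen for illustration, not fitted values.

```python
import math


def probability_positive(x, weight=1.5, bias=-3.0):
    """Probabilistic classifier: logistic score for a single feature."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))


def crisp_label(x, threshold=0.5):
    """Crisp classifier: turn the probability into a single label."""
    return "positive" if probability_positive(x) >= threshold else "negative"


p = probability_positive(2.0)  # 1.5 * 2.0 - 3.0 = 0, so the probability is 0.5
print(p, crisp_label(2.0), crisp_label(2.0, threshold=0.7))
```

Because the probability is available, the threshold can be raised when a false positive is the more costly mistake, or lowered when a false negative is.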


Regression / Forecasting
• Considers the statistical correlation (mapping) in the data without any prior assumption about the functional form of the data distribution.

Curve fitting
• Finds a well-defined, known function underlying the data.
• Theory and domain expertise help in choosing this function.
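As a small instance of curve fitting, the sketch below assumes the known function is a straight line and estimates its slope and intercept by ordinary least squares. The data points are made up and lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
slope, intercept = fit_line(xs, ys)

# Forecast the value at an unseen point using the fitted line.
forecast = slope * 5.0 + intercept
print(slope, intercept, forecast)
```

Choosing a line here is the "theory/expertise" step: the fit only estimates the parameters of a form that was decided in advance.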

SUPERVISED AND UNSUPERVISED LEARNING

Machine Learning
Machine learning gives a computer the capability to learn without being explicitly programmed.

Types of learning:

Supervised learning
• Training data includes both the inputs and the desired results.
• Correct target results are known for some cases, and these are provided during training.
• Constructing proper training, validation, and test sets is important.
• Fast and accurate.
• Should be able to generalize.

Unsupervised learning
• The correct results are not provided during training.
• Can be used to cluster the input data into classes on the basis of their statistical properties only.
• Cluster significance and labeling
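Clustering without labels can be sketched with a tiny k-means on one-dimensional data. The number of clusters (set by the starting centers), the values, and the iteration count are assumptions picked for the example; k-means is used here as one common clustering technique, not as the method the slides prescribe.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny k-means for one-dimensional data with given starting centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        # Assignment step: each value joins its nearest center.
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters


values = [1.0, 1.2, 0.8, 8.0, 8.4, 7.6]
centers, clusters = kmeans_1d(values, centers=[0.0, 10.0])
print(centers, clusters)
```

The algorithm groups the values purely by their statistical closeness; deciding what each group means, and how many groups to ask for, is left to the analyst.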
Supervised Machine Learning Methods

Classification: Classification refers to taking an input value and mapping it to a discrete value. In classification problems, the output typically consists of classes or categories. Examples include predicting which objects are present in an image (a cat or a dog) or whether it is going to rain today.

Regression: Regression is related to continuous data (value functions). In regression, the predicted output values are real numbers. It deals with problems such as predicting the price of a house or the trend in a stock price at a given time.

Some of the most common algorithms in supervised learning include Support Vector Machines (SVM), Logistic Regression, Naive Bayes, Neural Networks, K-nearest neighbor (KNN), and Random Forest.
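Of the algorithms listed, K-nearest neighbor is short enough to sketch in full. The labeled points below are hypothetical, and squared Euclidean distance with a simple majority vote is one common choice among several.

```python
from collections import Counter


def knn_predict(training, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled points."""
    by_distance = sorted(
        training,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled training data: (features, label).
training = [
    ((1.0, 1.0), "cat"),
    ((1.2, 0.8), "cat"),
    ((0.9, 1.1), "cat"),
    ((5.0, 5.0), "dog"),
    ((5.2, 4.8), "dog"),
    ((4.9, 5.1), "dog"),
]

print(knn_predict(training, (1.1, 1.0)), knn_predict(training, (5.1, 5.0)))
```

The training data supplies both inputs and desired results, which is what makes this a supervised method.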
Supervised Machine Learning Applications
• Predictive analytics (house prices, stock exchange prices, etc.)
• Text recognition
• Spam detection
• Customer sentiment analysis
• Object detection (e.g. face detection)
Unsupervised Learning Methods
• Clustering
• Association
• Dimensionality reduction
Example: Data Mining on Web framework

Web mining includes:

• Data cleaning
• Data integration from multiple sources
• Warehousing the data
• Data cube construction
• Data selection for data mining
• Data mining
• Presentation of mining results
• Patterns/knowledge used or stored in the knowledge base
