DSS Chapter 5

Uploaded by

kkindamughrabi04

DSS Chapter 5: Data Mining

Why data mining?

- More intense competition at the global scale (differentiation)

- Recognition of the value in data sources that are continuously growing.
- Availability of quality data on customers, vendors, web, transactions
- Exponential increase in data processing and storage capabilities, decrease in cost
- Need to gain a better understanding of customers and own operations, and to solve complex org
problems
- Generally, data mining is a way to develop intelligence from data that an org collects,
organizes, stores and analyzes.

Data Mining: process of discovering patterns, relationships and insights within large datasets,
typically through computational algorithms and statistical techniques (BIDA)

- Involves extracting valuable knowledge and actionable info from vast amounts of
data, which may be structured, semi-structured or unstructured
- Data mining is primarily concerned with knowledge discovery.
- Aims to find patterns, trends, associations and anomalies within data.

Data mining importance: by uncovering patterns, trends and relationships within data, data
mining enables orgs to make informed decisions, optimize processes and gain a competitive
advantage.

Data mining (Fayyad et al 1996) : the nontrivial process of identifying valid, novel (new),
potentially useful and ultimately understandable patterns in data stored in structured databases.

• Data mining is positioned at the intersection of many disciplines.

• Data mining and AI are closely related fields; they often intersect and complement each other.

Data: a collection of facts usually obtained as the result of experiences, observations or
experiments.

- May consist of numbers, words, images, voice recordings etc.

- Data is the lowest level of abstraction (from which info and knowledge are derived)

Structured data is what data mining algorithms use. It can be classified as:

1. Categorical, ex: race, gender, age group, and educational level. Can be subdivided
into:
a. Nominal Data: simple codes assigned to objects as labels (tags) which are not
measurements (single, married, divorced, yes/no)
b. Ordinal data: labels that also represent the rank order among them (high, medium,
low)

2. Numeric, ex: age, number of children, total household income, temperature, miles. Can
be subdivided into interval data (ex: temperature in Celsius) or ratio data (ex: income, miles).
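As a small illustration of the distinction above, the sketch below encodes one customer record. The field names, the rank mapping and the `encode_record` helper are hypothetical, chosen only for this example: the nominal value becomes one-hot flags, the ordinal value becomes a rank, and the numeric value is kept as a number.

```python
# Hypothetical rank order for an ordinal attribute (assumed for illustration).
ORDINAL_RANK = {"low": 0, "medium": 1, "high": 2}

def encode_record(record, marital_levels):
    """Turn one record into a numeric feature list.

    - nominal value -> one-hot flags (labels have no order)
    - ordinal value -> integer rank (order matters, distances do not)
    - numeric value -> kept as a number
    """
    one_hot = [1 if record["marital_status"] == level else 0
               for level in marital_levels]
    rank = ORDINAL_RANK[record["credit_rating"]]
    return one_hot + [rank, float(record["income"])]

levels = ["single", "married", "divorced"]
customer = {"marital_status": "married", "credit_rating": "high", "income": 52000}
features = encode_record(customer, levels)
print(features)  # [0, 1, 0, 2, 52000.0]
```

One-hot flags are used for the nominal attribute because assigning it a single number would suggest an order that does not exist.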

• Data mining extracts patterns from data:

- A data mining pattern is a recurring and meaningful observation or structure within a
dataset that is not immediately apparent but can be identified through statistical,
mathematical, or computational techniques.
- These patterns are derived from the data through various techniques and algorithms and
can provide valuable insights into the underlying information

Types of Patterns

1. Association: find the commonly co-occurring groupings of things (ex: toothpaste and
toothbrush)
2. Cluster (segmentation): identify natural groupings of things based on their known
characteristics (ex: demographics, similar things together)
3. Prediction: tell the nature of future occurrences of certain events based on what has
happened in the past (2 methods: classification and regression)
4. Sequential relationships: discover the logical sequence of actions or events (ex: a patient reports
symptoms, receives a diagnosis, starts medication, follows up with doctor)

Regression analysis: a statistical method for examining the relationship
between two or more variables of interest. It is basically classification where we forecast a
number instead of a category.
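A minimal sketch of regression with a single input variable, fitted by ordinary least squares. The `fit_line` helper and the spend/sales figures are made up for illustration; real projects would normally rely on a statistics library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: advertising spend vs. sales, both in arbitrary units.
spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
slope, intercept = fit_line(spend, sales)
forecast = intercept + slope * 6   # predicted sales for a spend of 6
print(slope, intercept, forecast)  # 2.0 1.0 13.0
```

The output is a number (a sales forecast), which is what separates this from a classification model that would output a category.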

Data mining techniques:

1. Time-series forecasting: extrapolate future values from a sequence (trend) of past observations (part of sequence or link analysis).

2. Visualization: to gain a clearer understanding of underlying relationships, easier and
faster.

Types of data mining

1. Hypothesis-driven: data mining using the right-sized data (through surveys)

2. Discovery-driven: data mining (without preconceived hypotheses) using as big data as
possible.

Data mining applications

• Customer relationship mgt (CRM):

- maximize return on marketing campaigns (customer profiling)

- improve customer retention (churn analysis/customer attrition)

- maximize customer value (cross-selling, up-selling)

• Banking and other financial:

- Automate the loan application process by predicting defaulters

- Detecting fraudulent transactions
- Maximize customer value
- Optimize cash reserves

• Retailing and logistics

- Optimize inventory levels at different locations based on predicted sales volumes
- Improve the store layout and sales promotions (with market-basket analysis)
Market-basket analysis: finds associations between pairs of products purchased together to identify
patterns of co-occurrence.
- Optimize logistics by predicting seasonal effects
- Minimize losses due to limited shelf life (by analyzing sensory and RFID data)
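The market-basket analysis mentioned in the list above can be sketched as counting how often each pair of products shows up in the same basket. The `pair_support` helper and the baskets are hypothetical; this shows only the co-occurrence counting step, not a full association-rule algorithm.

```python
from collections import Counter
from itertools import combinations

def pair_support(baskets):
    """Return the fraction of baskets in which each product pair appears together."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) gives each unordered pair a single canonical form
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

# Made-up transactions for illustration.
baskets = [
    ["toothpaste", "toothbrush", "soap"],
    ["toothpaste", "toothbrush"],
    ["soap", "shampoo"],
    ["toothpaste", "toothbrush", "shampoo"],
]
support = pair_support(baskets)
best_pair = max(support, key=support.get)
print(best_pair, support[best_pair])  # ('toothbrush', 'toothpaste') 0.75
```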

• Manufacturing and Maintenance

- Predict/prevent machinery failure (condition-based maintenance)

- Identify anomalies (irregularities)
- Discover novel patterns to improve product quality

• Brokerage and securities trading

- Predict changes on certain bond prices

- Forecast the direction of stock fluctuations
- Assess the effect of events on market movements
- Identify and prevent fraudulent activities in trading

• Insurance

- Forecast claim costs for better business planning

- Determine optimal rate plans
- Optimize marketing
- Identify and prevent fraudulent claim activities

Data mining processes

(a systematic way to conduct data mining projects)

- Based on best practices, several processes are proposed to maximize the chances of
success in conducting data mining projects

Processes: can be workflows or simple step-by-step approaches

Most common standard processes (methodology)

- CRISP-DM: Cross-industry standard process for data mining

- SEMMA: sample, explore, modify, model and assess
- KDD: knowledge discovery in databases

- The data mining process is iterative, and adjustments may be made at various stages
based on the insights gained and the performance of the models.

Dirty data: incomplete, missing, duplicate, inaccurate, inconsistent

Data cleaning: a crucial step in data analysis involving refining, correcting and preparing raw
data for meaningful insights and accurate decision making.
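A minimal sketch of two cleaning steps on dirty data: removing duplicate rows and filling a missing value. The record layout and the choice of mean imputation are assumptions made for this example; other strategies (dropping rows, median, model-based filling) may suit a real dataset better.

```python
def clean(records):
    """Drop exact duplicates and fill missing ages with the mean of known ages."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["id"], rec["age"])
        if key not in seen:          # duplicate rows are skipped
            seen.add(key)
            unique.append(dict(rec))
    known = [r["age"] for r in unique if r["age"] is not None]
    mean_age = sum(known) / len(known)
    for r in unique:
        if r["age"] is None:         # missing value -> simple mean imputation (assumed strategy)
            r["age"] = mean_age
    return unique

raw = [
    {"id": 1, "age": 30},
    {"id": 1, "age": 30},    # duplicate
    {"id": 2, "age": None},  # missing
    {"id": 3, "age": 40},
]
cleaned = clean(raw)
print(cleaned)
```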

• Main purpose of data transformation: to improve the quality and structure of data, making
it easier for data mining algorithms to uncover patterns, insights and relationships.

- Proper transformation can lead to better model performance, accuracy, and more actionable
insights.

Data Mining Process: CRISP-DM

- Standardized process (methodology). Most popular.

- Proposed in the mid 1990s by a European consortium of companies
- Nonproprietary (free) standard methodology.

Process (steps):

Step 1. Business understanding

Step 2. Data understanding

Step 3. Data preparation

Step 4. Model building

Step 5. Testing and evaluation

Step 6. Deployment

- Steps 1 to 3 together are commonly cited as accounting for about 85% of total project time

- The process is highly repetitive and experimental

(Data preparation) Normalization: usually involves adjusting values to a common scale so
that different variables can be compared on a similar scale, which is essential for comparing data
accurately and effectively in analysis.
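One common way to put variables on a common scale is min-max normalization, sketched below. The notes do not say which normalization method is meant, so this choice and the `min_max_normalize` helper are assumptions for illustration.

```python
def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Made-up columns measured on very different scales.
ages = [20, 30, 40]
incomes = [20000, 50000, 80000]
print(min_max_normalize(ages))     # [0.0, 0.5, 1.0]
print(min_max_normalize(incomes))  # [0.0, 0.5, 1.0]
```

After rescaling, age and income contribute on the same scale, so a distance-based method is not dominated by the variable with the larger raw numbers.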

Data Mining Process: SEMMA

- Sample: begin with a statistically representative sample of the data

- Explore: apply exploratory statistical and visualization techniques
- Modify: select and transform the most significant predictive variables
- Model: model the variables to predict outcomes
- Assess: confirm the model's accuracy

• CRISP-DM and SEMMA are driven by a highly iterative experimentation cycle

Data mining = explaining the past (by data exploration) and predicting the future by means of
data analysis.

Data Mining Process: Classification (most frequently used data mining method)

- Classification is a predictive data mining method/technique that assigns items in a data
set to target classes, grouping records into a class based on their characteristics
- The goal of classification is to accurately predict the target class for each case in the data.
For example, a classification model could be used to identify loan applications as low,
medium or high credit risks, or the weather as sunny, rainy or cloudy
- It's part of the machine-learning family; learns patterns from past data.

Used for: spam filtering, language detection, search of similar documents, recognition and fraud
detection.

- If what is being predicted is a class label (sunny, rainy or cloudy), the prediction problem is
called classification
- If it is a numeric value (temperature, 20°C), the prediction problem is called regression

Classification terminology:

- Row = example or instance
- Column = attribute
- Output attribute = the one we want to determine/predict
- Input attribute = everything else
- Nominal attribute = values that are "names" of categories
- Numeric attribute = values that are numbers

• Estimation methodologies for Classification

Simple split (or holdout or test sample estimation): split the data into 2 mutually exclusive sets, one for training the model and one for testing it.

- Main criticism: it assumes that the data in the 2 subsets are of the same kind (same
properties/characteristics)
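The simple split described above can be sketched as a shuffle followed by a cut. The 70/30 proportion, the fixed seed and the `simple_split` helper are assumptions for this example; the notes do not prescribe a ratio.

```python
import random

def simple_split(rows, test_fraction=0.3, seed=0):
    """Shuffle the rows, then cut them into disjoint training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(10))   # stand-in for 10 data records
train_set, test_set = simple_split(rows)
print(len(train_set), len(test_set))  # 7 3
```

Shuffling before the cut is what makes it more plausible that both subsets share the same properties, which is the assumption the criticism above points at.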

Classification techniques/algorithms

- Decision tree analysis (machine-learning technique), most popular classification technique in
the data mining arena
- Statistical analysis
- Neural networks
- Support vector machines
- Rough sets
- Case-based reasoning (CBR)
- Bayesian classifiers
- Genetic algorithms

Classification technique – ANN:

ANN (artificial neural network): a type of artificial intelligence that imitates some functions of the human mind. It has a
natural tendency to store experiential knowledge.

- Can learn by example

- The use of ANN in the solution of a task initially involves a learning phase, which is
when the network extracts the patterns, thereby creating a specific representation of the
problem.
- Described as one of the best techniques to model the stock market (does not contain
standard formulas and may be easily adapted to market changes)

ANN applications:

1. Speech-to-text transcription
2. Handwriting recognition
3. Weather prediction
4. Facial recognition
5. Chatbots
6. Stock market prediction
7. Delivery route planning and optimization

Classification techniques – Decision Trees

- Employs the divide-and-conquer method

- Repeatedly divides a training set until each division consists of examples from one class

A general algorithm for decision tree building:

1. Create a root node and assign all the training data to it.

2. Select the best splitting attribute.

3. Add a branch to the root node for each value of the split. Split the data into mutually exclusive
subsets along the lines of the specific split.

4. Repeat steps 2 and 3 for every leaf node until the stopping criteria are reached.
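The four steps above can be sketched as a short recursive function. Several things here are assumptions for illustration: the notes do not say how the "best" attribute is chosen, so this sketch picks the split with the lowest weighted entropy; all attributes are treated as categorical; and the loan data is made up.

```python
import math
from collections import Counter

def entropy(rows, target):
    """Impurity of the class labels in a set of rows (0 means one class only)."""
    counts = Counter(r[target] for r in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def best_attribute(rows, attributes, target):
    """Pick the attribute whose split leaves the lowest weighted entropy."""
    def split_entropy(attr):
        groups = Counter(r[attr] for r in rows)
        return sum(
            (n / len(rows)) * entropy([r for r in rows if r[attr] == v], target)
            for v, n in groups.items()
        )
    return min(attributes, key=split_entropy)

def build_tree(rows, attributes, target):
    # Step 1: the first call is the root node and receives all training data.
    classes = Counter(r[target] for r in rows)
    # Stopping criteria: the node is pure, or no attributes are left to split on.
    if len(classes) == 1 or not attributes:
        return classes.most_common(1)[0][0]          # leaf = majority class
    attr = best_attribute(rows, attributes, target)  # step 2
    remaining = [a for a in attributes if a != attr]
    branches = {}
    for value in set(r[attr] for r in rows):         # step 3: one branch per value
        subset = [r for r in rows if r[attr] == value]
        branches[value] = build_tree(subset, remaining, target)  # step 4
    return (attr, branches)

def predict(tree, row):
    """Follow branches until a leaf (a class label) is reached."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree

training = [
    {"income": "high", "debt": "low",  "risk": "low"},
    {"income": "high", "debt": "high", "risk": "medium"},
    {"income": "low",  "debt": "low",  "risk": "medium"},
    {"income": "low",  "debt": "high", "risk": "high"},
]
tree = build_tree(training, ["income", "debt"], "risk")
print(predict(tree, {"income": "low", "debt": "high"}))  # high
```

This sketch raises a `KeyError` for attribute values never seen in training; production tools handle that case, along with numeric attributes and pruning.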

Cluster Analysis (aka segmentation):

- Finding groups of objects such that the objects in a group are similar or related to one
another and different from or unrelated to the objects in other groups.
- Used for automatic identification of natural groupings of things, based on their known
characteristics, e.g., demographics, shapes, colors, photos, etc.
▪ Part of the machine-learning family.
▪ Employs unsupervised learning.
▪ Learns the clusters of things from past data, then assigns new instances
(data) to those clusters.
▪ There is NO output variable.
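A minimal sketch of learning clusters from past data and then assigning a new instance. The notes do not name a clustering algorithm; k-means on a single numeric attribute is used here as an assumed, commonly taught example, and the ages and starting centers are made up.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on one numeric attribute, starting from given centers."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its members (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

def assign(value, centers):
    """Assign a new instance to the closest learned cluster."""
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))

ages = [21, 23, 25, 61, 63, 65]            # past data, no output variable
centers = kmeans_1d(ages, centers=[20, 70])
print(centers)              # [23.0, 63.0]
print(assign(30, centers))  # 0
```

No class labels are supplied anywhere, which is what makes this unsupervised: the two groups emerge from the values alone.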

Data mining software: software for identifying patterns, analyzing data, and transforming
unstructured data into structured and valuable information that can be used to make informed
business decisions.

- Data mining software allows the organization to analyze data from a wide range of
databases and detect patterns

Data mining tools: main aim is to find data, extract data, refine data, distribute the information
and monetize it.

- Data mining is important because it extracts insights from data, whether it's structured or
unstructured.

- Structured data refers to data that has been organized into columns and rows for efficient
modification.

Data mining = an interdisciplinary science that combines computer science and
mathematical algorithms carried out by a machine.

Data mining: a powerful analytical tool that enables business executives to advance from
describing the nature of the past to predicting the future. → increase revenue, reduce expenses,
identify fraud, and locate business opportunities, offering a whole new realm of competitive
advantage.

Data Mining Myths
