Data Mining and Other Analogous Disciplines
There is some controversy over where the boundaries lie between data mining and analogous disciplines such as statistics, artificial intelligence, and so on. Some argue that data mining is nothing but statistics wrapped in business jargon that makes it a sellable product. Others, by contrast, find in it a set of specific problems and methods that make it distinct from other disciplines.
The fact is that, in practice, almost all of the models and algorithms used in data mining (neural networks, regression and classification trees, logistic models, principal components analysis, etc.) enjoy a relatively long tradition in other fields.
5.1 On Statistics
Certainly, data mining draws from statistics, taking from it the following techniques (a brief code sketch after the list illustrates each one):
Analysis of variance: evaluates whether there are significant differences between the means of one or more continuous variables across different populations.
Regression: defines the relationship between one or more variables and a set of variables that predict the former.
Chi-square test: used to test the hypothesis of dependence (or independence) between variables.
Clustering analysis: allows a population of individuals characterized by multiple attributes (binary, qualitative, or quantitative) to be classified into a given number of groups, based on the similarities or differences among the individuals.
Discriminant analysis: allows individuals to be classified into previously established groups; it finds the classification rule for the elements of these groups and thereby better identifies which variables define membership in each group.
Time series: allows the evolution of a variable over time to be studied in order to make predictions, based on that knowledge and under the assumption that no structural changes will occur.
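As a rough illustration, the following sketch (my own addition, not from the source; it assumes NumPy, SciPy, and scikit-learn are installed, and the Iris dataset and thresholds are purely illustrative) exercises each of the techniques above in a few lines:

```python
import numpy as np
from scipy import stats
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LinearRegression

iris = load_iris()
X, y = iris.data, iris.target

# Analysis of variance: do mean sepal lengths differ across the three species?
f_stat, p_val = stats.f_oneway(*[X[y == k, 0] for k in np.unique(y)])
print(f"ANOVA: F = {f_stat:.2f}, p = {p_val:.3g}")

# Regression: relate petal length to the other three measurements.
reg = LinearRegression().fit(X[:, [0, 1, 3]], X[:, 2])
print(f"Regression R^2 = {reg.score(X[:, [0, 1, 3]], X[:, 2]):.3f}")

# Chi-square test: is a discretized attribute independent of the species?
wide = X[:, 1] > np.median(X[:, 1])
table = [[np.sum((y == k) & wide), np.sum((y == k) & ~wide)] for k in np.unique(y)]
chi2, p, dof, _ = stats.chi2_contingency(table)
print(f"Chi-square: chi2 = {chi2:.2f}, p = {p:.3g}")

# Clustering analysis: group the individuals without using the labels.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("Cluster sizes:", np.bincount(clusters))

# Discriminant analysis: classify into the previously established groups.
lda = LinearDiscriminantAnalysis().fit(X, y)
print(f"LDA training accuracy = {lda.score(X, y):.3f}")

# Time series: a naive forecast from past values, assuming no structural change.
series = np.sin(np.linspace(0, 20, 120)) + np.random.default_rng(0).normal(0, 0.1, 120)
print(f"Moving-average forecast of next value: {series[-12:].mean():.3f}")
```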
All traditional data mining tools assume that the data used to build the models contain the information necessary to achieve the desired purpose: obtaining knowledge that can be applied to the business (or problem) to achieve a benefit (or solution).
The drawback is that this is not necessarily true. Moreover, there is a still bigger problem: once the model is built, it is not possible to know whether it has captured all of the information available in the data. For this reason, the common practice is to build several models with different parameters and see whether any achieves better results.
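As a hedged sketch of that practice (assuming scikit-learn; the dataset and the parameter grids are illustrative, not from the source), the loop below cross-validates a few candidate models with different parameters and reports which setting does best:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Candidate model families, each with a small grid of parameters to try.
candidates = {
    "classification tree": (DecisionTreeClassifier(random_state=0),
                            {"max_depth": [3, 5, None]}),
    "logistic model": (LogisticRegression(max_iter=5000),
                       {"C": [0.1, 1.0, 10.0]}),
}

# Cross-validate every parameter setting and report the best per model.
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=5).fit(X, y)
    print(f"{name}: best {search.best_params_}, CV accuracy {search.best_score_:.3f}")
```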
A relatively new approach to data analysis solves these problems, making the practice of data mining resemble a science more than an art.
In 1948, Claude Shannon published a paper called "A Mathematical Theory of Communication." The field it founded came to be called Information Theory, and it laid the foundations of communication and the encoding of information. Shannon proposed a way to measure the amount of information, expressed in bits. In 1999, Dorian Pyle published a book called "Data Preparation for Data Mining" in which he proposes a way to use Information Theory to analyze data. In this new approach, a database is a channel that transmits information. At one end is the real world, which generates the data the business captures. At the other end are all of the business's important situations and problems. Information flows from the real world, through the data, to the business problems.
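To make Shannon's measure concrete, here is a minimal sketch (my own, not from Pyle's book; the "plan" field and "churned" outcome are toy values) of entropy in bits and of the mutual information between a data field and a business outcome:

```python
import math
from collections import Counter

def entropy_bits(values):
    """Shannon entropy H(X) = -sum p(x) * log2 p(x), in bits."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information_bits(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y): bits that X carries about Y."""
    return entropy_bits(xs) + entropy_bits(ys) - entropy_bits(list(zip(xs, ys)))

# Toy channel: a data field ("plan") and a business outcome ("churned").
plan = ["a", "a", "b", "b", "a", "b", "a", "b"]
churned = [0, 0, 1, 1, 0, 1, 0, 0]
print(f"H(churn) = {entropy_bits(churned):.3f} bits")
print(f"I(plan; churn) = {mutual_information_bits(plan, churned):.3f} bits")
```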
With this perspective, and using Information Theory, it is possible to measure the amount of information available in the data and what portion of it can be used to solve the business problem. As a practical example, one might find that the data contain 65% of the information needed to predict which customers will terminate their contracts. In that case, if the final model is able to make predictions with 60% accuracy, it can be concluded that the tool that generated the model did a good job of capturing the available information. If, on the other hand, the model had achieved an accuracy of only 10%, then trying other models, or even other tools, could be worthwhile.
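As a hedged illustration of how a figure like that 65% might arise (a simplification, not Pyle's exact procedure), the ratio I(X;Y)/H(Y) estimates the fraction of the information needed to predict an outcome Y that a data field X actually carries:

```python
import math
from collections import Counter

def H(vs):
    """Shannon entropy in bits."""
    n = len(vs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(vs).values())

plan = ["a", "a", "b", "b", "a", "b", "a", "b"]        # data field
churned = [0, 0, 1, 1, 0, 1, 0, 0]                     # outcome to predict
mi = H(plan) + H(churned) - H(list(zip(plan, churned)))  # I(plan; churn)
print(f"Data carries {mi / H(churned):.0%} of the information needed to predict churn")
```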
The ability to measure the information contained in data has other important advantages. Analyzing the data from this new perspective generates an information map that makes much of the prior preparation of the data unnecessary, a task that is absolutely essential for good results but consumes an enormous amount of time.
It also becomes possible to select an optimal set of variables containing the information needed to create a prediction model.
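A minimal sketch of that selection step, assuming scikit-learn (the dataset and the number of variables kept are illustrative): rank the variables by their estimated mutual information with the target and keep the most informative ones.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)

# Score each variable by mutual information with the target; keep the top 5.
selector = SelectKBest(score_func=mutual_info_classif, k=5).fit(X, y)
print("Indices of the selected variables:", selector.get_support(indices=True))
```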
Once the variables have been processed to create the information map, and those that provide the most information have been selected, the choice of the tool used to create the model ceases to be important, since most of the work was done in the previous steps.