Lecture 1 - Data Mining 101

This document provides an overview of data mining concepts from a lecture. It defines data mining as the process of discovering patterns in large amounts of data. It describes the typical steps in the knowledge discovery process including data cleaning, integration, selection, transformation, mining, evaluation, and presentation. It outlines the different types of data that can be mined, including database, data warehouse, transactional, and other structured and unstructured data. It also discusses the various types of patterns that can be mined, such as frequent patterns, associations, correlations, classification models, clustering, and outliers. Finally, it briefly introduces some common data mining technologies like statistics, machine learning, and database and data warehouse systems.

Uploaded by

Reymar Ventura

Lecture 1 – Data Mining 101

Department of Computing and Information Sciences

Outline
• What is Data Mining?
• What Kinds of Data can be Mined?
• What Kinds of Patterns can be Mined?
• Data Mining Technologies
• Data Mining Applications
• Major Issues in Data Mining
What is Data Mining?

• Data mining is the process of discovering interesting patterns and knowledge
from large amounts of data.
• The data sources can include databases, data warehouses, the Web, other
information repositories, or data that are streamed into the system dynamically.

Han, J., Kamber, M., and Pei, J. (2012). Data Mining Concepts and Techniques 3rd Edition. Elsevier.
What is Data Mining?
• The Knowledge Discovery Process is an iterative
sequence of the following steps:
1. Data cleaning (to remove noise and inconsistent data)
2. Data integration (where multiple data sources may be
combined)
3. Data selection (where data relevant to the analysis
task are retrieved from the database)
4. Data transformation (where data are transformed and
consolidated into forms appropriate for mining by
performing summary or aggregation operations)
5. Data mining (an essential process where intelligent
methods are applied to extract data patterns)
6. Pattern evaluation (to identify the truly interesting
patterns representing knowledge based on
interestingness measures)
7. Knowledge presentation (where visualization and
knowledge representation techniques are used to
present mined knowledge to users)
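The seven steps above can be sketched as a tiny pipeline. This is only an illustration under stated assumptions: the records, the two "stores", and every function name (clean, integrate, select, transform, mine, evaluate, present) are hypothetical and do not come from the lecture or the textbook.

```python
from collections import Counter

# Hypothetical toy sources; None marks a noisy or missing value.
store_a = [{"id": 1, "item": "computer", "price": 900.0},
           {"id": 2, "item": "software", "price": None}]
store_b = [{"id": 3, "item": "computer", "price": 950.0},
           {"id": 4, "item": "printer", "price": 120.0}]

def clean(records):
    # Step 1: drop records that contain missing values.
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    # Step 2: combine several sources into one collection.
    return [r for src in sources for r in src]

def select(records, fields):
    # Step 3: keep only the attributes relevant to the task.
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    # Step 4: aggregate into a form suited to mining (item -> count).
    return Counter(r["item"] for r in records)

def mine(counts, min_count):
    # Step 5: extract a simple pattern: items seen at least min_count times.
    return {item: n for item, n in counts.items() if n >= min_count}

def evaluate(patterns, total):
    # Step 6: score each pattern with an interestingness measure
    # (here, simply its relative frequency).
    return {item: n / total for item, n in patterns.items()}

def present(scored):
    # Step 7: render the mined knowledge for the user.
    return [f"{item}: {share:.0%} of records" for item, share in sorted(scored.items())]

records = integrate(clean(store_a), clean(store_b))
selected = select(records, ["item"])
counts = transform(selected)
patterns = mine(counts, min_count=2)
report = present(evaluate(patterns, total=len(selected)))
print(report)  # ['computer: 67% of records']
```

In practice the process is iterative, so the output of a later step often sends the analyst back to an earlier one.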
What Kinds of Data Can Be Mined?
• As a general technology, data mining can be applied to any kind of data as
long as the data are meaningful for a target application.
• Basic forms of data:
• Database Data
• Data Warehouse Data
• Transactional Data
• Other forms of data:
• Data Streams
• Ordered/Sequence Data
• Graph or Networked Data
• Spatial data
• Text Data
• Multimedia Data
• Web Data
What Kinds of Data Can Be Mined?
• Database Data
• A database system, also called a database management system (DBMS), consists of a
collection of interrelated data, known as a database, and a set of software programs to
manage and access the data.
• A relational database is a collection of tables, each of which is assigned a unique name. Each
table consists of a set of attributes (columns or fields) and usually stores a large set of tuples
(records or rows).
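As a minimal sketch of a relational table, the example below uses Python's built-in sqlite3 module with an in-memory database. The table name customer, its attributes, and the rows are assumptions made up for illustration.

```python
import sqlite3

# An in-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A table has a unique name and a set of attributes (columns).
conn.execute(
    "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)

# Each inserted tuple is one record (row).
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Ana", 34), (2, "Ben", 27), (3, "Cruz", 41)],
)

# A query retrieves the tuples that satisfy a condition.
rows = conn.execute(
    "SELECT name FROM customer WHERE age > 30 ORDER BY name"
).fetchall()
print(rows)  # [('Ana',), ('Cruz',)]
conn.close()
```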

What Kinds of Data Can Be Mined?
• Data Warehouses
• A data warehouse is a repository of information collected from multiple sources, stored
under a unified schema, and usually residing at a single site.
• Data warehouses are constructed via a process of data cleaning, data integration, data
transformation, data loading, and periodic data refreshing.
• A data warehouse is usually modeled by a multidimensional data structure, called a data
cube, in which each dimension corresponds to an attribute or a set of attributes in the
schema, and each cell stores the value of some aggregate measure.
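A data cube can be approximated, for illustration, as a dictionary whose keys are dimension values and whose values are an aggregate measure. The sales facts, the three dimensions (quarter, item, city), and the helper roll_up are hypothetical; real warehouses use dedicated OLAP engines.

```python
from collections import defaultdict

# Hypothetical sales facts: (quarter, item, city, amount).
sales = [
    ("Q1", "computer", "Manila", 400.0),
    ("Q1", "computer", "Manila", 100.0),
    ("Q1", "software", "Cebu", 50.0),
    ("Q2", "computer", "Cebu", 300.0),
]

# Each cell is addressed by one value per dimension and stores
# an aggregate measure (here, the sum of sales amounts).
cube = defaultdict(float)
for quarter, item, city, amount in sales:
    cube[(quarter, item, city)] += amount

def roll_up(cube, keep):
    """Aggregate away every dimension whose index is not in keep."""
    out = defaultdict(float)
    for key, value in cube.items():
        out[tuple(key[i] for i in keep)] += value
    return dict(out)

by_quarter = roll_up(cube, keep=(0,))
by_item = roll_up(cube, keep=(1,))
print(cube[("Q1", "computer", "Manila")])  # 500.0
print(by_quarter)  # {('Q1',): 550.0, ('Q2',): 300.0}
```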

What Kinds of Data Can Be Mined?
• Data Warehouses

[Figures: Typical Data Warehouse Framework; Multidimensional Data Cube]

What Kinds of Data Can Be Mined?
• Transactional Data
• Each record in a transactional database captures a transaction, such as a customer’s
purchase, a flight booking, or a user’s clicks on a web page.
• A transaction typically includes a unique transaction identity number (trans ID) and a list of
the items making up the transaction, such as the items purchased in the transaction.

What Kinds of Patterns Can Be Mined?
• Data Characterization
• Data Characterization is a summarization of the general characteristics or features of a target class of
data.
• The output of data characterization can be presented using pie charts, bar charts, curves,
multidimensional data cubes, and multidimensional tables, including crosstabs.
• The resulting descriptions can also be presented as generalized relations or in rule form (called
characteristic rules).
• Data Discrimination
• Data discrimination is a comparison of the general features of the target class data objects against the
general features of objects from one or multiple contrasting classes.
• The target and contrasting classes can be specified by a user, and the corresponding data objects can be
retrieved through database queries.

What Kinds of Patterns Can Be Mined?
• Frequent Patterns, Associations and Correlations
• Frequent patterns, as the name suggests, are patterns that occur frequently in data.
• There are many kinds of frequent patterns, including frequent itemsets, frequent subsequences (also
known as sequential patterns), and frequent substructures.

• An example of an association rule is
buys(X, "computer") ⇒ buys(X, "software") [support = 1%, confidence = 50%],
where X is a variable representing a customer. A confidence, or certainty, of 50% means that if a customer
buys a computer, there is a 50% chance that she will buy software as well. A 1% support means that 1% of
all the transactions under analysis show that computer and software are purchased together.
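Support and confidence, as described above for the computer-and-software example, can be computed directly from a set of transactions. The sketch below assumes four made-up transactions, so its numbers differ from the 1% and 50% quoted above; the transaction IDs and function names are hypothetical.

```python
# Hypothetical transactions: transaction ID -> set of purchased items.
transactions = {
    "T100": {"computer", "software"},
    "T200": {"computer"},
    "T300": {"printer"},
    "T400": {"software", "printer"},
}

def support(itemset, transactions):
    """Fraction of all transactions that contain every item in itemset."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

s = support({"computer", "software"}, transactions)
c = confidence({"computer"}, {"software"}, transactions)
print(s)  # 0.25 (1 of 4 transactions contains both items)
print(c)  # 0.5  (1 of the 2 computer transactions also contains software)
```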

What Kinds of Patterns Can Be Mined?
• Factors (Predictive Analysis)
• Classification is the process of finding a model (or function) that describes and distinguishes data
classes or concepts.
• The model is derived based on the analysis of a set of training data (i.e., data objects for which the
class labels are known).
• The model is used to predict the class label of objects for which the class label is unknown.
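To make the train-then-predict idea concrete, here is a minimal sketch using a nearest-centroid classifier. This technique is chosen only because it is short; the lecture does not prescribe it, and the training points and labels "A" and "B" are made up.

```python
from collections import defaultdict
from math import dist

# Hypothetical training data: (feature vector, known class label).
training = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((5.0, 5.0), "B"),
    ((6.0, 5.5), "B"),
]

def fit(training):
    """Derive the model: one centroid (mean point) per class label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (x, y), label in training:
        acc = sums[label]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}

def predict(model, point):
    """Predict the label whose centroid is closest to the unlabeled point."""
    return min(model, key=lambda label: dist(model[label], point))

model = fit(training)
print(predict(model, (2.0, 1.5)))  # A
print(predict(model, (5.0, 6.0)))  # B
```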

What Kinds of Patterns Can Be Mined?
• Factors (Predictive Analysis)
• Regression models continuous-valued functions.
• This is used to predict missing or unavailable numerical data values rather than (discrete) class labels.
• The term prediction refers to both numeric prediction and class label prediction.
• Regression analysis is a statistical methodology that is most often used for numeric prediction, although
other methods exist as well.
• Regression also encompasses the identification of distribution trends based on the available data.
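Numeric prediction can be illustrated with simple least-squares linear regression, one of several possible regression methods. The data points below are made up so that they lie exactly on y = 2x + 1, and fit_line is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)

# Predict a numeric value that is not in the data.
predicted = slope * 5.0 + intercept
print(slope, intercept, predicted)  # 2.0 1.0 11.0
```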

What Kinds of Patterns Can Be Mined?
• Groups (Cluster Analysis)
• Unlike classification and regression, which
analyze class-labeled (training) data sets,
clustering analyzes data objects without
consulting class labels.
• In many cases, class-labeled data may simply
not exist at the beginning. Clustering can be
used to generate class labels for a group of
data.
• The objects are clustered or grouped based on
the principle of maximizing the intraclass
similarity and minimizing the interclass
similarity.
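Grouping without class labels can be sketched with a small one-dimensional k-means routine. K-means is just one clustering method and is used here as an assumption; it groups values by closeness to a cluster center, which serves as a proxy for similarity. The values and starting centers are made up.

```python
def k_means_1d(values, centers, iterations=10):
    """Alternate between assigning values to the nearest center and updating centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical unlabeled data with two obvious groups.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = k_means_1d(values, centers=[0.0, 5.0])
print(centers)   # [2.0, 11.0]
print(clusters)  # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```

The two resulting groups could then be given names and used as class labels.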

What Kinds of Patterns Can Be Mined?
• Abnormalities (Outlier Analysis)
• A data set may contain objects that do not
comply with the general behavior or model of
the data.
• These data objects are outliers. Many data
mining methods discard outliers as noise or
exceptions.
• However, in some applications (e.g., fraud
detection) the rare events can be more
interesting than the more regularly occurring
ones.
• The analysis of outlier data is referred to as
outlier analysis or anomaly mining.
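One simple way to flag outliers, assumed here for illustration, is the z-score: a value is flagged when it lies more than a chosen number of standard deviations from the mean. The amounts and the threshold of 2.0 are hypothetical, and other methods exist.

```python
from statistics import mean, pstdev

def z_score_outliers(values, threshold=2.0):
    """Return the values that lie more than threshold standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical purchase amounts; one is far larger than the rest.
amounts = [20.0, 22.0, 19.0, 21.0, 23.0, 20.0, 500.0]
outliers = z_score_outliers(amounts)
print(outliers)  # [500.0]
```

In a fraud detection setting, the flagged value would be kept for inspection rather than discarded as noise.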

Data Mining Technologies
• Statistics
• Statistics studies the collection, analysis,
interpretation or explanation, and
presentation of data.
• Data mining has an inherent connection with
statistics.
• A statistical model is a set of mathematical
functions that describe the behavior of the
objects in a target class in terms of random
variables and their associated probability
distributions.

Data Mining Technologies
• Machine Learning
• Machine learning investigates how computers
can learn (or improve their performance) based
on data.
• A main research area is for computer programs
to automatically learn to recognize complex
patterns and make intelligent decisions based
on data.

Data Mining Technologies
• Database System and Data Warehouse
• Database systems research focuses on the
creation, maintenance, and use of databases
for organizations and end-users.
• Database systems researchers have established
highly recognized principles in data models,
query languages, query processing and
optimization methods, data storage, and
indexing and accessing methods.
• Recent database systems have built systematic
data analysis capabilities on database data
using data warehousing and data mining
facilities.
• A data warehouse integrates data originating
from multiple sources and various timeframes.

Data Mining Technologies
• Information Retrieval
• Information retrieval (IR) is the science of
searching for documents or information in
documents.
• Documents can be text or multimedia, and
may reside on the Web.
• The differences between traditional
information retrieval and database systems are
twofold:
• the data under search are unstructured;
• the queries are formed mainly by keywords, which do
not have complex structures (unlike SQL queries in
database systems)
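Keyword search over unstructured text can be sketched with an inverted index, a common IR data structure. The three documents, their IDs, and the function names are made up, and real systems add ranking, stemming, and much more.

```python
from collections import defaultdict

# Hypothetical unstructured documents.
documents = {
    "d1": "data mining discovers patterns in data",
    "d2": "a database stores structured data",
    "d3": "web search engines rank documents",
}

def build_index(documents):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the IDs of documents that contain every keyword in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(documents)
print(sorted(search(index, "data")))         # ['d1', 'd2']
print(sorted(search(index, "data mining")))  # ['d1']
```

The query is a plain list of keywords with no structure, in contrast to an SQL query against a table.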

Data Mining Applications
• Where there are data, there are data mining applications.
• Business intelligence (BI) technologies provide historical, current, and predictive views of business
operations.
• A Web search engine is a specialized computer server that searches for information on the Web. The search
results of a user query are often returned as a list (sometimes called hits).
• In Financial Analysis, the banking and finance industry relies on high-quality, reliable data. In loan markets,
financial and user data can be used for a variety of purposes, like predicting loan payments and determining
credit ratings.
• Network resources can face threats and actions that intrude on their confidentiality or integrity. Therefore,
detection of intrusion (Intrusion Detection) has emerged as a crucial data mining practice.
• In biology, data mining practices are common in genomics, proteomics, and biomedical research. From
characterizing patients’ behavior and predicting office visits to identifying medical therapies for their
illnesses, data science techniques provide multiple advantages.

Major Issues in Data Mining
• Mining Methodology
• Researchers have been vigorously developing new data mining methodologies. This involves the investigation of new kinds of
knowledge, mining in multidimensional space, integrating methods from other disciplines, and the consideration of semantic ties
among data objects.

• User Interaction
• The user plays an important role in the data mining process. Interesting areas of research include how to interact with a data mining
system, how to incorporate a user’s background knowledge in mining, and how to visualize and comprehend data mining results.

• Efficiency and Scalability
• Efficiency and scalability are always considered when comparing data mining algorithms. As data amounts continue to multiply, these
two factors are especially critical.

• Diversity of Database Types
• The wide diversity of database types brings about challenges to data mining.

• Data Mining and Society
• With data mining penetrating our everyday lives, it is important to study the impact of data mining on society.
• Data mining will help scientific discovery, business management, economy recovery, and security protection (e.g., the real-time
discovery of intruders and cyberattacks). However, it also poses the risk of disclosing an individual’s personal information, which
motivates research on privacy-preserving data mining.