
Data Mining Lecture 1 Arabic


Data Warehousing and Data Mining

INTRODUCTION

Lecture 1

Dr. Sultan Yahya Al-Sultan


Assistant Professor
Books
▶ Required Text
▶ Data Mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber, Morgan Kaufmann, ISBN 1-55860-489-8
▶ Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining, 2nd Edition. Pearson / Addison Wesley.
▶ Attached books
▶ Data mining: Concepts and Techniques
"Introduction to Data Mining":
1. Overview: The book provides a comprehensive introduction to the concept of data mining and its importance in extracting knowledge from large datasets.
2. Basic Concepts: It explains fundamental concepts in data mining such as exploratory analysis, classification, clustering, and association analysis.
3. Key Techniques: The book covers important techniques such as neural networks, decision trees, factor analysis, and classification using various algorithms.
4. Big Data Analysis: The book also addresses how to deal with big data and use data mining techniques to extract knowledge from it.
5. Predictive Analysis: It presents methods for predictive analysis and how to use data mining techniques to predict trends and future events.
6. Practical Applications: The book discusses case studies and practical applications of data mining in various fields such as marketing, healthcare, and finance.
Course structure
▶ The course has two parts:
▶ Lectures - Introduction to the main topics
▶ Projects (done in groups)
▶ 1 project, or
▶ 1 research project (in this course we will follow the project option).

Grading

▶ Attendance: … %
▶ Interaction: … %
▶ Assignments: … % …
▶ Midterm + others: … %
▶ Projects: … %
▶ Final Exam: … %

Course Topics
▶ Introduction
▶ Data pre-processing
▶ Data warehousing
▶ Association rules and sequential patterns
▶ Classification (supervised learning)
▶ Clustering (unsupervised learning)
▶ Post-processing of data mining results
▶ Measures of Interestingness
▶ Objective Measures
▶ Subjective Measures

Rules

• Keep your phone silent

• Be in class on time

We are a…….
Lecture Outline
▶ Motivation: Why Data Mining?
▶ What is Data Mining?
▶ History of Data Mining
▶ Data Mining Tasks
▶ Data Mining Techniques
▶ Data Mining Applications
▶ Are all the Patterns Interesting?
▶ Issues in Data Mining
Data Mining Motivation
“The key in business is to know something that
nobody else knows.”
— Aristotle Onassis

“To understand is to perceive (recognize) patterns.”


— Sir Isaiah Berlin
Necessity is the Mother of Invention

▶ Data explosion
▶ Huge amounts of data are stored in databases, data warehouses and other information repositories
▶ This data calls for data collection tools and advanced database technology, so that new patterns and trends can be mined and discovered
▶ We are drowning in a sea of data, but starving for (in search of) knowledge!
Necessity is the Mother of Invention

▶ We are drowning in data, but starving for knowledge!
▶ Solution: Data Mining
▶ Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases
Data Warehousing and Storage

Analysis and extraction

Why Data mining?

▶ Huge amounts of data


▶ Electronic records of our decisions
▶ Choices in the supermarket
▶ Financial records
▶ Our comings and goings
▶ Data rich – but information poor
▶ We want to dig
Why Data Mining?
From a managerial perspective:
▶ Analyzing trends
▶ Wealth generation
▶ Security
▶ Strategic decision making

Why Data Mining?
▶ Huge amounts of data, for example:
▶ Google has petabytes of web data
▶ Facebook has billions of active users
▶ Amazon handles millions of visits per day


Data vs. Information
Society produces massive amounts of data
▶ business, science, medicine, economics, sports, …

▶ Raw data on its own is useless
▶ We need techniques to automatically extract information
▶ Data: raw facts
▶ Information: patterns that come from processed data
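
The distinction between raw facts and processed information can be illustrated with a minimal sketch. The purchase records, the `summarize_by_product` helper and all names below are hypothetical, chosen only for illustration; the point is that the individual rows say little, while an aggregate over them starts to answer a question.

```python
from collections import defaultdict

# Hypothetical raw data: (customer, product, amount) purchase records.
raw_data = [
    ("alice", "milk", 2.0),
    ("bob", "bread", 1.5),
    ("alice", "bread", 1.5),
    ("carol", "milk", 2.0),
    ("bob", "milk", 2.0),
]


def summarize_by_product(records):
    """Aggregate raw purchase records into total revenue per product."""
    totals = defaultdict(float)
    for _customer, product, amount in records:
        totals[product] += amount
    return dict(totals)


# Information: a processed view of the raw facts.
information = summarize_by_product(raw_data)
best_seller = max(information, key=information.get)

print(information)
print("Highest revenue product:", best_seller)
```

This is only summarization, the simplest form of processing; data mining goes further and searches for patterns that a summary like this does not show.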


What is DATA MINING?
▶ Extracting or “mining” knowledge from large amounts of
data
▶ Discovery and modeling of hidden patterns (not previously known) in large volumes of data
▶ Extraction of implicit, previously unknown, unexpected, novel and potentially very useful information from data
………..
Needs of different levels of Management

▶ Operational Level – Control: Data oriented

▶ Tactical (Middle) Level – Planning and Control: Information oriented

▶ Strategic (Top) Level – Fundamentally Planning: Knowledge oriented
Knowledge pyramid

Wisdom
Intelligence = Knowledge + experience
Knowledge = Information + rules
Information = Data + context
Data
Data Mining
▶ Look for hidden patterns and trends in data that are not immediately apparent from summarizing the data
▶ No query…
▶ …but an “interestingness criterion”


Data Mining

Data + Interestingness criteria = Hidden patterns

▶ Type of data
▶ Type of interestingness criteria
▶ Type of patterns
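
The idea of "data + interestingness criteria = hidden patterns" can be sketched with one common criterion, minimum support, applied to market-basket data. This is a minimal illustration under stated assumptions, not a method prescribed by the lecture: the transactions, the `frequent_pairs` function and the 0.6 threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (the fraction of transactions that
    contain both items) is at least min_support."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Interestingness criterion (an assumption): a pair must occur in >= 60% of baskets.
patterns = frequent_pairs(transactions, min_support=0.6)
print(patterns)
```

No query names the pair in advance; the criterion alone decides which co-occurrences are reported. Changing the criterion (for example, lowering `min_support`) changes which patterns count as interesting.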
Data Mining is NOT
▶ Data Warehousing
▶ (Deductive) query processing
▶ SQL/ Reporting
▶ Software Agents
▶ Expert Systems
▶ Online Analytical Processing (OLAP)
▶ Statistical Analysis Tool
▶ Data visualization
Great Opportunities to Solve Society’s Major Problems

▶ Improving health care and reducing costs
▶ Predicting the impact of climate change
▶ Finding alternative/green energy sources
▶ Reducing hunger and poverty by increasing agriculture production

Data Mining Challenges

▶ Problem 1: most patterns are not interesting

▶ Problem 2: patterns may be inexact or completely spurious when noisy data are present
Multidisciplinary Field

Data mining draws on:
▶ Database Technology
▶ Statistics
▶ Machine Learning
▶ Visualization
▶ Artificial Intelligence
▶ Other Disciplines
Data Mining History
▶ Emerged in the late 1980s
▶ Grew through the 1990s
▶ Roots traced back along three family lines
▶ Classical Statistics
▶ Artificial Intelligence
▶ Machine Learning (and, later, Deep Learning)
Statistics
▶ Foundation of most DM technologies
▶ Regression analysis
▶ Standard distribution
▶ Deviation/Variance
▶ Cluster analysis
▶ Confidence intervals
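
Three of the statistical building blocks listed above (regression analysis, deviation/variance and confidence intervals) can be sketched with the Python standard library alone. The sample values and the `fit_line` helper are hypothetical, and the interval uses the normal critical value 1.96 as a simplifying assumption; for a sample this small a t-distribution value would be more appropriate.

```python
import math
import statistics

# Hypothetical sample: advertising spend (x) and sales (y) over six months.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Regression analysis: how does y change with x?
slope, intercept = fit_line(x, y)

# Deviation / variance: how spread out are the y values?
variance = statistics.variance(y)  # sample variance
std_dev = statistics.stdev(y)      # sample standard deviation

# Confidence interval: approximate 95% interval for the mean of y
# (normal approximation, assumed here for simplicity).
half_width = 1.96 * std_dev / math.sqrt(len(y))
mean_y = statistics.mean(y)
ci = (mean_y - half_width, mean_y + half_width)

print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"variance={variance:.3f} std_dev={std_dev:.3f}")
print(f"95% CI for mean(y): ({ci[0]:.2f}, {ci[1]:.2f})")
```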
Data Mining vs Statistical Inference

Statistics:

Conceptual model (hypothesis) → Statistical reasoning → “Proof” (validation of hypothesis)
Data Mining vs Statistical Inference

Data mining:

Data → Mining algorithm based on interestingness → Pattern (model, rule, hypothesis) discovery
Artificial Intelligence
▶ Artificial intelligence is the branch of computer science that aims to create computers that think like humans.
▶ Heuristics vs. Statistics
▶ Human-thought-like processing
▶ Requires vast computer processing power
▶ Supercomputers
Machine Learning
▶ Union of statistics and AI
▶ Combines AI heuristics with advanced statistical analysis
▶ Machine learning lets computer programs
▶ learn about the data they study
▶ make different decisions based on the quality of the studied data
▶ use statistics for fundamental concepts, adding more advanced AI heuristics and algorithms
Data Mining: A KDD Process

▶ Data mining: the core of the knowledge discovery process.

KDD Process (figure): Databases → Data Cleaning and Data Integration (data preprocessing) → Data Warehouse → Data Selection → Task-relevant Data → Data Mining → Pattern Evaluation
Steps of a KDD Process
1. First, ask:
▶ What is the application domain?
▶ What are the relevant prior knowledge and the goals of the application?

2. Creating a target data set: data selection

3. Data cleaning and preprocessing (may take 60% of the effort!)

4. Data reduction and transformation:
▶ Find useful features
▶ Variable reduction
▶ Invariant (constant) representation
▶ Choosing the functions of data mining: summarization, classification, regression, association, clustering
▶ Choosing the mining algorithm(s)


Steps of a KDD Process
5. Data mining: search for patterns of interest

6. Pattern evaluation and knowledge presentation
▶ Visualization
▶ Transformation
▶ Removing redundant patterns

▶ Finally, use of the discovered knowledge
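
The KDD steps listed above can be sketched end to end as a chain of small functions. This is a toy illustration, not a prescribed implementation: the customer records, the field layout, the age bucketing and the `min_count` threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical source records: (customer_id, age, region, notes).
# None marks a missing value.
records = [
    (1, 23, "North", "n/a"),
    (2, None, "South", "n/a"),
    (3, 37, " north ", "n/a"),
    (4, 41, "South", "n/a"),
    (4, 41, "South", "n/a"),  # duplicate row
    (5, 29, "North", "n/a"),
    (6, 35, "South", "n/a"),
]


def select(rows):
    """Step 2, data selection: keep only the task-relevant fields."""
    return [(cid, age, region) for cid, age, region, _notes in rows]


def clean(rows):
    """Step 3, cleaning: drop rows with missing values, normalise the
    region text, and remove exact duplicates."""
    seen, cleaned = set(), []
    for cid, age, region in rows:
        if age is None or region is None:
            continue
        row = (cid, age, region.strip().lower())
        if row not in seen:
            seen.add(row)
            cleaned.append(row)
    return cleaned


def transform(rows):
    """Step 4, reduction and transformation: replace the exact age with a
    coarse age group and drop the customer id."""
    return [("under_30" if age < 30 else "30_plus", region)
            for _cid, age, region in rows]


def mine(rows):
    """Step 5, data mining: count how often each combination occurs."""
    return Counter(rows)


def evaluate(patterns, min_count):
    """Step 6, pattern evaluation: keep only sufficiently frequent patterns."""
    return {p: c for p, c in patterns.items() if c >= min_count}


knowledge = evaluate(mine(transform(clean(select(records)))), min_count=2)
print(knowledge)
```

Even in this toy version most of the code deals with selection, cleaning and transformation rather than with the mining step itself, which is consistent with the remark that preprocessing can dominate the effort.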


Data Mining and Business Intelligence
Layers from bottom to top, in order of increasing potential to support business decisions, with the typical role at each level:
▶ Data Sources: Paper, Files, Information Providers, Database Systems, OLTP (DBA)
▶ Data Warehouses / Data Marts, OLAP… (DBA)
▶ Data Exploration: Statistical Analysis, Querying and Reporting (Data Analyst)
▶ Data Mining: Information Discovery (Data Analyst)
▶ Data Presentation: Visualization Techniques (Business Analyst)
▶ Making Decisions (End User)
Learning Algorithms
▶ Fundamental idea:

learn rules/patterns/relationships
automatically from the data
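
As a minimal sketch of learning a rule automatically from data, the code below learns a single threshold rule (a one-level decision tree, often called a decision stump) from labelled examples. The data set, the `learn_threshold` function and the study-hours scenario are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical labelled data: (hours_studied, passed_exam).
examples = [
    (1.0, False), (2.0, False), (3.0, False),
    (4.5, True), (5.0, True), (6.5, True),
]


def learn_threshold(examples):
    """Learn a rule of the form 'predict True when value >= t'.

    Tries the midpoint between each pair of neighbouring values and keeps
    the threshold with the fewest errors on the training examples."""
    values = sorted(v for v, _label in examples)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        return sum((v >= t) != label for v, label in examples)

    return min(candidates, key=errors)


threshold = learn_threshold(examples)


def predict(value):
    """Apply the learned rule to a new, unseen value."""
    return value >= threshold


print("Learned rule: pass if hours_studied >=", threshold)
print(predict(4.0), predict(2.0))
```

Nobody wrote the threshold by hand; it comes from the examples, and different training data would produce a different rule.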
Data Mining Tasks
1. Prediction Tasks
▶ Use some variables to predict unknown or future values of other variables
2. Description Tasks
▶ Find human-interpretable patterns that describe the data.

3. Common data mining tasks


▶ Classification [Predictive]
▶ Clustering [Descriptive]
▶ Association Rule Discovery [Descriptive]
▶ Sequential Pattern Discovery [Descriptive]
▶ Regression [Predictive]
▶ Deviation Detection [Predictive]
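
The difference between a predictive and a descriptive task can be sketched on the same one-dimensional data. The code uses a nearest-centroid classifier and a basic two-cluster k-means loop as stand-ins chosen for brevity; the income values, labels and function names are hypothetical, and `two_means` assumes the data contain at least two distinct values.

```python
import statistics

# Hypothetical data: yearly incomes in thousands, with known segments.
incomes = [20, 22, 25, 60, 65, 70]
labels = ["low", "low", "low", "high", "high", "high"]


def nearest_centroid_fit(points, labels):
    """Classification (predictive): store the mean of each labelled class."""
    by_label = {}
    for p, lab in zip(points, labels):
        by_label.setdefault(lab, []).append(p)
    return {lab: statistics.mean(vals) for lab, vals in by_label.items()}


def nearest_centroid_predict(centroids, p):
    """Predict the label whose class mean is closest to p."""
    return min(centroids, key=lambda lab: abs(p - centroids[lab]))


def two_means(points, iterations=10):
    """Clustering (descriptive): split unlabelled points into two groups
    with a basic k-means loop (k = 2), started from the min and the max."""
    low, high = min(points), max(points)
    for _ in range(iterations):
        group_low = [p for p in points if abs(p - low) <= abs(p - high)]
        group_high = [p for p in points if abs(p - low) > abs(p - high)]
        low, high = statistics.mean(group_low), statistics.mean(group_high)
    return sorted(group_low), sorted(group_high)


# Predictive: labels are known for past data and predicted for new values.
centroids = nearest_centroid_fit(incomes, labels)
print(nearest_centroid_predict(centroids, 30))
print(nearest_centroid_predict(centroids, 58))

# Descriptive: no labels are used; the grouping is discovered from the data.
clusters = two_means(incomes)
print(clusters)
```

The classifier needs the `labels` list to learn from, while the clustering step never sees it; that is the practical meaning of supervised versus unsupervised learning in the course topics.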
The End

My best wishes for success to all of you….



Dr. Sultan Yahya Al-Sultan


Assistant Professor
