
DM Lec01

This document outlines an introductory lecture on data mining, covering its necessity, objectives, and key tools and technologies. It highlights the importance of data mining in various application domains, such as marketing and retail, and discusses the syllabus and examination pattern for the course. The session aims to equip students with foundational knowledge about data mining processes and techniques.

Uploaded by

offadarsh

Data Mining

Lecture I
Introduction to Data Mining, Syllabus & Examination Pattern
Dr. Tamal Mondal

Symbiosis International (Deemed University)
Session Objectives
By the end of this session, you will be able to:
1. Explain why data mining is popular and how it fits into the natural progression of information technology.
2. Identify the key basis of data mining in information technology.
3. Explore the fundamental goals and objectives of data mining in information technology and retrieval.
4. Describe the types of data and the related tools and technologies used in the mining process.
5. Identify application domains where data mining can be used for information retrieval, predictive analytics and more.
6. Understand the syllabus and examination pattern of the course.
What Necessitates Data Mining?
• Voluminous data generation.
• Traditional and conventional techniques.
• Data generation varies.
• Necessity of utilizing such data.
Data Mining at Conceptual Level
Solution? Data Mining
What Necessitates Data Mining?
Key Basis of Data Mining
• Analyzing Data: mathematical, statistical and computing tools.
• Identifying Patterns and Trends: evaluating the relationships, visualization.
• Decision or Prediction: learning trends for generalizing future insights.
Objectives of Data Mining
• Identifying patterns and trends.
• Predictive analysis.
• Detecting anomalies.
• Object segmentation.
• Improving decision making.
Data Mining Examples
Mobile service providers use data mining to design marketing campaigns and keep clients from switching to rival suppliers.

Data mining technologies can predict "churn," the number of consumers who are looking to switch providers, from a vast amount of data, including billing details, emails, texts, web data flows, and customer service records.

Based on these outcomes, each customer is given a probability score. Customers who are more likely to churn can then receive offers and incentives from the mobile service provider. Major service providers, such as those offering phone, gas, and broadband, frequently use this type of mining.
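The scoring step can be illustrated with a minimal sketch. The lecture does not name a model, so this assumes a logistic function; the feature names, weights, bias and threshold are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical weights, for illustration only. A real model would learn these
# from historical billing, usage and customer-service data.
WEIGHTS = {"late_payments": 0.8, "support_calls": 0.5, "months_as_customer": -0.05}
BIAS = -1.0

def churn_probability(customer):
    """Map a customer's features to a churn score between 0 and 1 (logistic function)."""
    z = BIAS + sum(WEIGHTS[name] * customer[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

# Two made-up customers.
customers = {
    "A": {"late_payments": 3, "support_calls": 4, "months_as_customer": 6},
    "B": {"late_payments": 0, "support_calls": 0, "months_as_customer": 48},
}

scores = {cid: churn_probability(features) for cid, features in customers.items()}

# Customers above an assumed threshold would be targeted with retention offers.
THRESHOLD = 0.5
to_target = [cid for cid, p in scores.items() if p > THRESHOLD]
print(scores, to_target)
```

With these made-up numbers, only customer A scores above the threshold and would receive an offer.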
Data Mining Examples
Data mining helps supermarket and retail business owners better understand their clients' preferences. The data mining technologies reveal the clients' purchasing inclinations based on their past purchases.

The supermarkets use these findings to arrange products on shelves and launch promotions, such as coupons for complementary goods and exclusive price cuts on specific products.

RFM grouping is the foundation of these efforts. RFM stands for recency, frequency, and monetary value. Each segment gets its own marketing strategies and promotions. A consumer who spends a lot of money but does so infrequently would receive different treatment than one who makes smaller purchases every two to three days.
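As a rough sketch of how the three RFM values could be derived from a purchase log: the data, the reference date and the function name `rfm_table` below are hypothetical, and real systems usually go on to bin these values into scored segments.

```python
from datetime import date

# Hypothetical purchase log: (customer_id, purchase_date, amount).
purchases = [
    ("c1", date(2024, 1, 5), 900.0),  # one large, infrequent purchase
    ("c2", date(2024, 3, 1), 20.0),   # small purchases every few days
    ("c2", date(2024, 3, 4), 25.0),
    ("c2", date(2024, 3, 7), 15.0),
]
today = date(2024, 3, 10)  # assumed reference date

def rfm_table(rows, reference_date):
    """Return {customer: (recency_in_days, frequency, monetary_total)}."""
    table = {}
    for customer, day, amount in rows:
        last, count, total = table.get(customer, (None, 0, 0.0))
        if last is None or day > last:
            last = day  # remember the most recent purchase date
        table[customer] = (last, count + 1, total + amount)
    return {
        customer: ((reference_date - last).days, count, total)
        for customer, (last, count, total) in table.items()
    }

rfm = rfm_table(purchases, today)
print(rfm)
```

Here c1 is the high-spending, infrequent customer and c2 the frequent, low-spending one, so the two would fall into different segments.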
Data Mining Tools & Technologies
I. Different Forms of Data
• Structured Database Models (Relational Model).
• Data Warehouses.
• Online Data Repositories.
• Transactional data.
• Others (data streams, ordered/sequence data, graph or networked data, spatial data, text data, multimedia data).
Data Mining Tools & Technologies
I. Different Forms of Data
• Structured Database Models (Relational Model).
Database Languages
Data Mining Tools & Technologies
I. Different Forms of Data
• Data Warehouses. (Structured Database Models)
Data Mining Tools & Technologies
I. Different Forms of Data
• Online Data Repositories.
[Images: CrisisLex, CrisisNLP, Data.gov]
Data Mining Tools & Technologies
I. Different Forms of Data
• Transactional data. (Semi-Structured)
  • Examples: a customer's purchase, a travel reservation, a user's clicks on a web page, and many more.
  • Each record holds a TransactionID, additional details and other related schema.
Data Mining Tools & Technologies
I. Different Forms of Data
• Others (data streams, ordered/sequence data, graph or networked data, spatial data, text data, multimedia data). (Semi-Structured or Unstructured)
Data Mining Tools & Technologies
I. Different Forms of Data
• Stored Data: database, flat files, Excel sheets etc.
• Streaming of Data
Data Mining Tools & Technologies
II. Pattern Recognition
• Descriptive mining tasks characterize data attributes in a target data set.
  Preprocessing steps: handling missing values and outliers, understanding data through statistical operations, exploring class characteristics based on features, feature engineering and many more.
• Predictive mining tasks use induction on current data to produce predictions.
  Exploring models for describing or distinguishing data objects (regression or classification).
• Clustering
  The data set is grouped into segments based on distances or similarity scores.
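Distance-based grouping can be illustrated with k-means, which the syllabus lists under partitioning-based clustering. This is a minimal sketch on one-dimensional data; the values, the starting centroids and the function name `kmeans_1d` are made up for illustration.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on 1-D data: assign each value to its nearest centroid,
    then move each centroid to the mean of its assigned values."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]  # hypothetical data
centroids, clusters = kmeans_1d(values, centroids=[1.0, 12.0])
print(centroids, clusters)
```

The two obvious groups (values near 2 and values near 11) end up in separate segments.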
Data Mining Tools & Technologies
II. Pattern Recognition
• Descriptive Mining
  • Data summarization on facts – facts like product category, producer, location, or time.
  • Data cubes – multidimensional data aggregation on facts (3D and more); operations like drill down, roll up etc. are applicable.
  • The output of such multidimensional summarization can be presented in various forms, such as pie charts, bar charts, curves, multidimensional data cubes, multidimensional tables, and contingency tables.
• Mining Frequent Patterns, Associations and Correlations
  • Notion of attributes – numerical/categorical, nominal/ordinal/interval/ratio, discrete/continuous.
  • Description & association of attributes – central tendency, variance, correlation, contingency table, scatter plots.
  • Associating patterns (single dimension)
    • Suppose that a webstore manager wants to know which items are frequently purchased together (i.e., in the same transaction). An example of such a rule, mined from the transactional database, is
      buys(X, “computer”) ⇒ buys(X, “webcam”) [support = 1%, confidence = 50%]
      where X is a variable representing a customer. A confidence, or certainty, of 50% means that if a customer buys a computer, there is a 50% chance that she will buy a webcam as well. A support of 1% means that 1% of all the transactions under analysis show that computer and webcam are purchased together.
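The two measures can be computed directly from a transaction list. The sketch below uses a tiny hypothetical set of transactions, so the resulting numbers differ from the 1% and 50% quoted in the rule above.

```python
# Hypothetical transactions; each one is the set of items bought together.
transactions = [
    {"computer", "webcam"},
    {"computer"},
    {"printer"},
    {"computer", "webcam", "printer"},
    {"webcam"},
]

def support(itemset, transactions):
    """Fraction of all transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

# Rule: buys(X, "computer") => buys(X, "webcam")
s = support({"computer", "webcam"}, transactions)
c = confidence({"computer"}, {"webcam"}, transactions)
print(f"support = {s:.0%}, confidence = {c:.0%}")
```

Two of the five transactions contain both items, and two of the three computer purchases also include a webcam.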
Data Mining Tools & Technologies
II. Pattern Recognition
• Descriptive Mining
  • Associating patterns (multi-dimensional)
    • Suppose mining the same database generates another association rule,
      age(X, “20..29”) ∧ income(X, “40K..49K”) ⇒ buys(X, “laptop”) [support = 0.5%, confidence = 60%]
      The rule indicates that of all the customers under study, 0.5% are 20 to 29 years old with an income of $40,000 to $49,000 and have purchased a laptop (computer). There is a 60% probability that a customer in this age and income group will purchase a laptop.
    • Typically, association rules are discarded as uninteresting if they do not satisfy both a minimum support threshold and a minimum confidence threshold.

• Predictive Mining
  • Classification is the process of finding a model (or function) that describes and distinguishes data classes or concepts. The model may be represented as classification rules (i.e., IF-THEN rules), a decision tree, a mathematical formula, or a learned neural network.
  • Regression analysis is a statistical methodology that is most often used for numeric prediction, although other methods exist as well.
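Numeric prediction with regression can be sketched using ordinary least squares for a single input variable. The data points and the function name `fit_line` are hypothetical; this is only one simple form of regression analysis.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # predict y for an unseen x = 5
print(slope, intercept, prediction)
```

The model is learned from the current data and then applied to a new input, which is the induction step that predictive mining relies on.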
Data Mining Tools & Technologies
II. Pattern Recognition
• Predictive Mining
Data Mining Tools & Technologies
III. Mathematical Tools
• Summarizing Data Properties
  • Central tendency (mean, median, mode) and measures of dispersion (variance, standard deviation).
• Drawing Conclusions from Data
  • Hypothesis testing and confidence intervals.
• Describing Uncertainties and Randomness
  • Probability, Bayesian models, hidden Markov models.
• Minimizing Errors / Maximizing Accuracy
  • Calculus, prediction, optimization, modelling.
• Data Transformation for Complicated Data Sets
  • Linear algebra, matrix operations etc.
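The summary measures in the first item can be computed with Python's standard `statistics` module. The sample values below are made up, and the population forms of variance and standard deviation are used as an assumption (`statistics.variance` and `statistics.stdev` give the sample versions).

```python
import statistics

data = [2, 4, 4, 5, 7, 8]  # hypothetical observations

# Central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Dispersion (population variance and standard deviation)
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

print(mean, median, mode, variance, std_dev)
```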
Data Mining Tools & Technologies
IV. Machine Learning
• Supervised Learning
• Unsupervised Learning
• Ensemble Learning
• Deep Learning
• Semi-Supervised Learning
V. Programming Languages, Libraries and Development Environment
• Python: NumPy, pandas, scikit-learn etc.
• R: dplyr, tidyr, and caret
• SQL (Structured Query Language)
Data Mining Application Domain

https://data-flair.training/blogs/data-science-applications/
Syllabus Coverage
Introduction (Lecture 01)
1. Need for Data Mining
2. Key Idea Behind Data Mining
3. Objectives & Goals of Data Mining
4. Data, Related Tools & Techniques
5. Different Application Domains – Business, Social Media, Agriculture, and many more

Know about Data & Patterns (Lecture 02 – Lecture 04)
1. Data Objects & Attributes
2. Statistical Measures
3. Visualization
4. Introducing Patterns – Class, Associations, Correlations, Interestingness
5. Pattern Classification & Regression

Data Warehousing: Fundamental Concept (Lecture 05 – Lecture 06)
1. Introduction
2. Models & Architecture
3. Processing & Meta Data
4. Multidimensional Data Modeling

Introduction to Classification & Regression (Lecture 07 – Lecture 08)
1. Basic Concept
2. General Approach to Regression & Classification
3. Basic Regression Analysis – Linear & Non-linear
4. Rule based Classification – IF-THEN, 1R, Decision Trees
5. Covering Rules
6. Model Evaluation & Selection

Clustering Analysis – Fundamentals & Techniques (Lecture 09)
1. Overview of Clustering
2. Partitioning Based Clustering – K-Means, Expectation Maximization
3. Hierarchical Methods: Distance-based Agglomerative and Divisive Clustering
4. Conceptual Clustering: Cobweb
5. Clustering Evaluation

Solved Case Studies on Data Mining using Programming Tools (Lecture 10 – Lecture 12)
1. Financial Data Analysis
2. Air Quality Monitoring
3. Social Media Analysis
4. Data Mining and Recommender Systems
Examination Pattern

Decomposition of Internal Marks (60)

• Quiz (Total 10 Marks): objective-type questions; 10 questions of 1 mark each; all questions are compulsory.
• Experiential Learning Assignment (Total 20 Marks): problem-solving questions; 2 questions of 10 marks each; all questions are compulsory.
• Mid-Term Examination (Total 30 Marks): subjective questions; 3 questions of 10 marks each.

External Examination (40)


Session Outcomes
In this session you learned about:
1. Requirement & Key Basis of Data Mining Process
2. Key Objectives of Data Mining.
3. Introduction to the Data Mining Tools & Technologies
4. Various Application Domains of Data Mining.
Thank You
