Data Mining
Lecture I
Introduction to Data Mining, Syllabus & Examination Pattern
Dr. Tamal Mondal
Symbiosis International (Deemed University)
Session Objectives
By the end of this session, you will be able to:
1. Explain why data mining has become popular and how it fits into the
natural progression of information technology.
2. Identify the key basis of data mining in information technology.
3. Explore the fundamental goals and objectives of data mining in information
technology and retrieval.
4. Get introduced to the types of data and the related tools and technologies
used in the mining process.
5. Identify application domains where data mining can be used for
information retrieval, predictive analytics, and more.
6. Understand the Syllabus and Examination Pattern of the Course.
What Necessitates Data Mining?
Voluminous Data Generation.
Traditional and Conventional Techniques.
Data Generation Varies.
Necessity of Utilizing Such Data.
Data Mining at Conceptual Level
Solution??? Data Mining
Key Basis of Data Mining
Analyzing Data: Mathematical, Statistical and Computing Tools.
Identifying Patterns and Trends: Evaluating the Relationships, Visualization.
Decision or Prediction: Learning Trends for Generalizing Future Insights.
Objectives of Data Mining
Identifying patterns and trends.
Predictive Analysis.
Detecting anomalies.
Object Segmentation.
Improving decision making.
Data Mining Examples
Mobile service providers use data mining to design marketing
campaigns and keep clients from switching to rival suppliers.
Data mining technologies can predict "churn," that is, which
consumers are likely to switch providers, using a vast amount of
data, including billing details, emails, texts, web data flows,
and customer service records.
Based on these inputs, each customer is given a probability score.
Customers who are more likely to churn can then receive offers
and incentives from the mobile service providers. Major service
providers, such as phone, gas, and broadband companies,
frequently use this type of mining.
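The scoring step can be sketched roughly as follows. This is only an illustration: the feature names, weights, bias and the 0.5 cut-off are hypothetical values chosen for the example, and a real provider would learn such a model from historical data.

```python
import math

# Hypothetical weights; a real model would learn these from past customers.
WEIGHTS = {"support_calls": 0.8, "late_bills": 0.6, "years_as_customer": -0.5}
BIAS = -2.0

def churn_score(customer):
    """Map a customer's features to a churn probability with a logistic function."""
    z = BIAS + sum(WEIGHTS[name] * customer[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

# Two made-up customers for illustration.
customers = {
    "A": {"support_calls": 5, "late_bills": 2, "years_as_customer": 1},
    "B": {"support_calls": 0, "late_bills": 0, "years_as_customer": 6},
}

scores = {cid: churn_score(c) for cid, c in customers.items()}
# Customers at or above the (assumed) threshold would receive a retention offer.
at_risk = [cid for cid, score in scores.items() if score >= 0.5]
print(scores, at_risk)
```

Customer A, with many support calls and a short tenure, scores far higher than customer B and is the only one flagged for an offer.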
Data Mining Examples
Supermarket and retail business owners can better understand
their clients' preferences thanks to data mining. The data
mining technologies display the clients' purchasing inclinations
based on their past purchases.
The supermarkets use these findings to help them arrange
products on shelves and launch promotions, such as coupons
for complementary goods and exclusive price cuts on specific
products.
RFM grouping is the foundation of these efforts. RFM stands
for recency, frequency, and monetary value.
These segments have specific marketing strategies and
promotions. A consumer who spends a lot of money but does
so infrequently would receive different treatment than one
who makes smaller purchases every two to three days.
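A minimal sketch of how the three RFM values might be computed from a purchase log is given below. The customers, dates, amounts and the reference date are made up for illustration.

```python
from datetime import date

# Hypothetical purchase log: (customer_id, purchase_date, amount).
purchases = [
    ("C1", date(2024, 1, 5), 900.0),
    ("C1", date(2024, 3, 20), 1100.0),
    ("C2", date(2024, 3, 25), 40.0),
    ("C2", date(2024, 3, 27), 35.0),
    ("C2", date(2024, 3, 29), 50.0),
]

def rfm_table(rows, today):
    """Return {customer: (recency_days, frequency, monetary_total)}."""
    table = {}
    for customer, day, amount in rows:
        recency, frequency, monetary = table.get(customer, (None, 0, 0.0))
        days_ago = (today - day).days
        # Recency is the gap since the most recent purchase.
        recency = days_ago if recency is None else min(recency, days_ago)
        table[customer] = (recency, frequency + 1, monetary + amount)
    return table

rfm = rfm_table(purchases, today=date(2024, 3, 31))
for customer, (r, f, m) in sorted(rfm.items()):
    print(customer, r, f, m)
```

Here C1 is the big but infrequent spender and C2 the frequent small buyer, so the two would fall into different segments.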
Data Mining Tools & Technologies
I. Different Forms of Data
Structured Database Models (Relational Model).
Data Warehouses.
Online Data Repositories.
Transactional data.
Others (data streams, ordered/sequence data, graph
or networked data, spatial data, text data, multimedia
data).
Data Mining Tools & Technologies
I. Different Forms of Data
Structured Database Models (Relational Model).
Database Languages
Data Mining Tools & Technologies
I. Different Forms of Data
Data Warehouses. (Structured Database Models)
Data Mining Tools & Technologies
I. Different Forms of Data
Online Data Repositories.
Examples: CrisisLex, CrisisNLP, Data.gov.
Data Mining Tools & Technologies
I. Different Forms of Data
Transactional data. (Semi-Structured)
Examples: a customer's purchase, a travel reservation, a user's clicks on a web page, and many more.
Typical fields: TransactionID, additional details and other related schema.
Data Mining Tools & Technologies
I. Different Forms of Data
Others (data streams, ordered/sequence data, graph or
networked data, spatial data, text data, multimedia
data). (Semi Structured or Unstructured)
Data Mining Tools & Technologies
I. Different Forms of Data
Stored Data: Database, Flat Files, Excel Sheets etc.
Streaming of Data
Data Mining Tools & Technologies
II. Pattern Recognition
Descriptive mining tasks characterize data attributes in
a target data set.
Preprocessing Steps – Missing Values and Outliers, Understanding Data
through Statistical Operations, Exploring Class Characteristics Based on
Features, Feature Engineering and many more.
Predictive mining tasks use induction on current data
to produce predictions.
Exploring models for describing or distinguishing data objects
(Regression or Classification).
Clustering
The data set is grouped into segments based on the distances or similarity
scores between its data objects.
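As a rough illustration of distance-based grouping, the sketch below runs a plain k-means loop on one-dimensional values. The spending figures and the starting centers are assumptions made for the example, not data from the course.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on numbers: assign each value to its nearest center,
    then move each center to the mean of its assigned values."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical monthly spend of six customers.
spend = [12, 15, 14, 80, 85, 90]
centers, clusters = kmeans_1d(spend, centers=[10.0, 100.0])
print(centers, clusters)
```

The low spenders and the high spenders end up in separate segments because each value is closer to the members of its own group than to the other group.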
Data Mining Tools & Technologies
II. Pattern Recognition
Descriptive Mining
Data summarization on Facts – Facts like product category, producer, location, or time.
Data Cubes – Multidimensional Data Aggregation on Facts (3D and More), Operations like
Drill Down, Roll Up etc. are applicable.
The output of such multidimensional summarization can be presented in various forms,
such as pie charts, bar charts, curves, multidimensional data cubes, multidimensional
tables, and contingency tables.
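The roll-up operation mentioned above can be sketched as a simple aggregation that drops one dimension at a time. The sales facts and the helper name `roll_up` are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical sales facts: (product_category, location, quarter, units_sold).
facts = [
    ("laptop", "Pune", "Q1", 10),
    ("laptop", "Pune", "Q2", 15),
    ("laptop", "Delhi", "Q1", 7),
    ("webcam", "Pune", "Q1", 20),
    ("webcam", "Delhi", "Q2", 5),
]

def roll_up(rows, keep):
    """Sum units over every dimension not listed in `keep` (column indexes)."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in keep)
        totals[key] += row[-1]
    return dict(totals)

by_category_location = roll_up(facts, keep=(0, 1))  # time dimension rolled up
by_category = roll_up(facts, keep=(0,))             # location rolled up as well
print(by_category_location)
print(by_category)
```

Drill-down is the reverse direction: moving from `by_category` back to the finer `by_category_location` view.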
Mining Frequent Patterns, Associations and Correlations
Notion of Attributes – Numerical/Categorical, Nominal/Ordinal/Interval/Ratio,
Discrete/Continuous.
Description & Association of Attributes – Central Tendency, Variance, Correlation, Contingency
Table, Scatter Plots.
Associating Patterns (Single Dimension)
Suppose that a webstore manager wants to know which items are frequently purchased together (i.e.,
in the same transaction). An example of such a rule, mined from the transactional database, is
buys(X, “computer”) ⇒ buys(X, “webcam”) [support = 1%, confidence = 50%]
where X is a variable representing a customer. A confidence, or certainty, of 50% means that if a customer
buys a computer, there is a 50% chance that she will buy a webcam as well. A support of 1% means that 1% of all the transactions
under analysis show that a computer and a webcam are purchased together.
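Support and confidence for such a rule can be computed directly from the transactions. The sketch below uses five made-up baskets, so the resulting numbers differ from the 1% and 50% quoted in the rule above.

```python
# Hypothetical transactions, each a set of purchased items.
transactions = [
    {"computer", "webcam"},
    {"computer", "mouse"},
    {"printer"},
    {"computer", "webcam", "mouse"},
    {"webcam"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that
    also contain the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

rule_support = support({"computer", "webcam"}, transactions)
rule_confidence = confidence({"computer"}, {"webcam"}, transactions)
print(rule_support, rule_confidence)
```

Two of the five baskets contain both items, and two of the three computer baskets also contain a webcam, which gives a support of 40% and a confidence of about 67%.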
Data Mining Tools & Technologies
II. Pattern Recognition
Descriptive Mining
Associating Patterns (Multi-Dimensional)
Suppose, mining the same database generates another association rule, age(X, “20..29”) ∧ income(X,
“40K..49K”) ⇒ buys(X, “laptop”), [support = 0.5%, confidence = 60%] . The rule indicates that of all its
customers under study, 0.5% are 20 to 29 years old with an income of $40,000 to $49,000 and have
purchased a laptop (computer). There is a 60% probability that a customer in this age and income group
will purchase a laptop.
Typically, association rules are discarded as uninteresting if they do not satisfy both a minimum support
threshold and a minimum confidence threshold.
Predictive Mining
Classification is the process of finding a model (or function) that describes and distinguishes data classes
or concepts. The model may be represented as classification rules (i.e., IF-THEN rules), a decision tree, a
mathematical formula, or a learned neural network.
Regression analysis is a statistical methodology that is most often used for numeric prediction, although
other methods exist as well.
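For numeric prediction, the simplest case is fitting a straight line by ordinary least squares. The sketch below assumes a single input variable and uses made-up data points.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations that lie exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

# Use the fitted line to predict the value for an unseen input.
prediction = slope * 5 + intercept
print(slope, intercept, prediction)
```

Because the sample points lie exactly on a line, the fit recovers that line; real data would leave residual error that the fit minimizes.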
Data Mining Tools & Technologies
III. Mathematical Tools
Summarizing Data Properties.
Central tendency (mean, median, mode) and measures of dispersion
(variance, standard deviation)
Drawing Conclusions from Data
Hypothesis Testing and Confidence Interval.
Describing Uncertainties and Randomness
Probability, Bayesian Models, Hidden Markov Models.
Minimizing Errors / Maximizing Accuracy
Calculus, Prediction, Optimization, Modelling.
Data Transformation for Complicated Data sets
Linear Algebra, Matrix Operations etc.
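As a small illustration of summarizing data properties, Python's standard `statistics` module covers the measures of central tendency and dispersion listed above. The sample values are made up, and the population variants of variance and standard deviation are used here as an assumption.

```python
import statistics

# Hypothetical sample of observations.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)            # arithmetic average
median = statistics.median(values)        # middle of the sorted values
mode = statistics.mode(values)            # most frequent value
variance = statistics.pvariance(values)   # population variance
std_dev = statistics.pstdev(values)       # population standard deviation

print(mean, median, mode, variance, std_dev)
```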
Data Mining Tools & Technologies
IV. Machine Learning
Supervised Learning
Unsupervised Learning
Ensemble Learning
Deep Learning
Semi-Supervised Learning
V. Programming Languages, Libraries and Development Environment
Python - NumPy, Pandas, and Scikit-learn etc.
R - dplyr, tidyr, and caret
SQL (Structured Query Language)
Data Mining Application Domain
https://data-flair.training/blogs/data-science-applications/
Syllabus Coverage
Introduction (Lecture 01)
1. Need for Data Mining
2. Key Idea Behind Data Mining
3. Objectives & Goals of Data Mining
4. Data, Related Tools & Techniques
5. Different Application Domains – Business, Social Media, Agriculture, and many more
Know about Data & Patterns (Lecture 02 – Lecture 04)
1. Data Objects & Attributes
2. Statistical Measures
3. Visualization
4. Introducing Patterns – Class, Associations, Correlations, Interestingness
5. Pattern Classification & Regression
Data Warehousing: Fundamental Concept (Lecture 05 – Lecture 06)
1. Introduction
2. Models & Architecture
3. Processing & Meta Data
4. Multidimensional Data Modeling
Introduction to Classification & Regression (Lecture 07 – Lecture 08)
1. Basic Concept
2. General Approach to Regression & Classification
3. Basic Regression Analysis – Linear & Non-linear
4. Rule based Classification – IF-THEN, 1R, Decision Trees
5. Covering Rules
6. Model Evaluation & Selection
Clustering Analysis – Fundamentals & Techniques (Lecture 09)
1. Overview of Clustering
2. Partitioning Based Clustering – K-Means, Expectation Maximization
3. Hierarchical Methods: Distance-based Agglomerative and Divisive Clustering
4. Conceptual Clustering: Cobweb
5. Clustering Evaluation
Solved Case Studies on Data Mining using Programming Tools (Lecture 10 – Lecture 12)
1. Financial Data Analysis
2. Air Quality Monitoring
3. Social Media Analysis
4. Data Mining and Recommender Systems
Examination Pattern
Breakdown of Internal Marks (60)
Quiz (Total 10 Marks): Question Type – Objective; Number of Questions – 10 (1 mark each);
all questions are compulsory.
Experiential Learning Assignment (Total 20 Marks): Question Type – Problem Solving; Number of
Questions – 2 (10 marks each); all questions are compulsory.
Mid-Term Examination (Total 30 Marks): Question Type – Subjective; Number of Questions – 3
(10 marks each).
External Examination (40)
Session Outcomes
In this session you learned about:
1. Requirement & Key Basis of Data Mining Process
2. Key Objectives of Data Mining.
3. Introduction to the Data Mining Tools & Technologies
4. Various Application Domains of Data Mining.
Thank You