Unit 2 Introduction To Data Mining

Uploaded by narayan

Data Warehousing & Data Mining
B.Sc. CSIT, 7th Sem
Unit 2: Introduction to Data Mining

Motivation for Data Mining

• Data mining is the process of extracting information or knowledge from huge amounts of data; in short, it is mining knowledge from data.
• Data mining has attracted a great deal of attention in the information industry in recent years because huge amounts of data are widely available and there is an imminent need to turn such data into useful information and knowledge.
• The data available in industry are of little use until they are converted into useful information.
• It is therefore necessary to analyze huge amounts of data and extract useful information from them.
• The information and knowledge gained can be used for applications ranging from business management, production control, and market analysis to engineering design and science exploration.

Introduction: Data Mining

• Data mining is the process of discovering interesting patterns and knowledge from huge amounts of data.
• Data mining is also known as knowledge mining, knowledge extraction, data/pattern analysis, data archaeology, and data dredging.
• Data mining is an essential step in the process of KDD (Knowledge Discovery from Data).
• The information or knowledge extracted by data mining can be used for applications such as:
  - Market analysis
  - Fraud detection
  - Customer retention
  - Production control
  - Science exploration

What Kinds of Data Can Be Mined?

Data mining can be applied to the following kinds of data.

1. Database data
• A database management system (DBMS) consists of a collection of interrelated data, known as a database, and a set of programs to access those data.
• A relational database is a collection of tables, each of which is assigned a unique name.
• Each table consists of a set of attributes (columns or fields) and usually stores a large set of tuples (rows or records).

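As a minimal sketch of these terms, the snippet below uses Python's standard-library sqlite3 module to build one relational table in memory. The table name customer, its columns, and its rows are hypothetical examples and do not come from these notes.

```python
import sqlite3

# Hypothetical table "customer": each column is an attribute (field),
# and each inserted row is a tuple (record).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
rows = [(1, "Alice", "Chicago"), (2, "Bob", "Toronto"), (3, "Carol", "Chicago")]
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)

# Attribute names are available from the cursor's column metadata.
cursor = conn.execute("SELECT * FROM customer")
attributes = [column[0] for column in cursor.description]
tuples = cursor.fetchall()

# A simple relational query: how many customers live in each city?
city_counts = conn.execute(
    "SELECT city, COUNT(*) FROM customer GROUP BY city ORDER BY city"
).fetchall()
conn.close()

print(attributes)   # ['cust_id', 'name', 'city']
print(city_counts)  # [('Chicago', 2), ('Toronto', 1)]
```
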
2. Data warehouse data
• A data warehouse is a repository of information collected from multiple, heterogeneous sources and stored at a single site.
• A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data that supports management's decision-making process.
• Data warehouses are constructed via a process of data cleaning, data integration, data transformation, data loading, and periodic data refreshing.
• A data warehouse is usually modeled by a multidimensional data structure called a data cube, which allows data to be modeled and viewed in multiple dimensions.
• Each dimension of a data cube corresponds to an attribute or a set of attributes in the schema, and each cell stores the value of some aggregate measure.

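The cube idea can be sketched in plain Python. Each cell is keyed by a combination of dimension values and holds an aggregated measure. A roll-up is then performed by summing away one or more dimensions, which is roll-up by dimension reduction. The dimensions item_type, city, and quarter, the measure units_sold, and the facts are all hypothetical. A real warehouse would use an OLAP engine rather than dictionaries.

```python
from collections import defaultdict

# Hypothetical sales facts: (item_type, city, quarter, units_sold).
# item_type, city and quarter are dimensions; units_sold is the measure.
facts = [
    ("computer", "Chicago", "Q1", 10),
    ("computer", "Toronto", "Q1", 4),
    ("printer", "Chicago", "Q1", 7),
    ("computer", "Chicago", "Q2", 6),
    ("printer", "Toronto", "Q2", 3),
]

def build_cube(rows):
    """Base cuboid: one cell per (item_type, city, quarter) storing SUM(units_sold)."""
    cells = defaultdict(int)
    for item, city, quarter, units in rows:
        cells[(item, city, quarter)] += units
    return dict(cells)

def roll_up(cells, keep):
    """Roll up by dimension reduction: keep only the dimension positions in
    `keep` and sum the measure over every dimension that is dropped."""
    result = defaultdict(int)
    for key, value in cells.items():
        result[tuple(key[i] for i in keep)] += value
    return dict(result)

cube = build_cube(facts)
by_item_quarter = roll_up(cube, keep=(0, 2))  # drop the city dimension
by_item = roll_up(cube, keep=(0,))            # drop city and quarter
print(by_item)  # {('computer',): 20, ('printer',): 10}
```
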
3. Transactional data
• A transactional database captures transactions such as customer purchases, flight bookings, etc.
• A transaction typically includes a unique transaction identifier (TID) and a set of items associated with that transaction.

4. Other kinds of data
a) Time-series data
b) Spatial data
c) Multimedia data
d) Web data, etc.

Functionalities of Data Mining

• Data mining functionalities are used to specify the kinds of patterns to be found in data mining tasks.
• In general, such tasks can be classified into two categories: descriptive and predictive.
• Descriptive mining tasks characterize properties of the data in a target data set.
• Predictive mining tasks perform induction on the current data in order to make predictions.
• The data mining functionalities, and the kinds of patterns they discover, are described below.

1. Class/Concept Description: Characterization and Discrimination

• Data entries can be associated with classes or concepts.
• For example, in the AllElectronics store, classes of items for sale include computers and printers, and concepts of customers include big spenders and budget spenders.

Data characterization
• Data characterization is a summarization of the general characteristics or features of a target class of data. Methods for effective data characterization include:
  - Simple data summaries based on statistical measures and plots
  - The data cube-based OLAP roll-up operation
• The output of data characterization can be presented in various forms, such as pie charts, bar charts, curves, and multidimensional data cubes.

Data discrimination
• Data discrimination is a comparison of the general features of target-class data objects against the general features of objects from one or more contrasting classes.
• The target and contrasting classes can be specified by the user.
• The forms of output presentation are similar to those for characteristic descriptions, although discrimination descriptions should include comparative measures that help to distinguish between the target and contrasting classes.

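The sketch below illustrates both ideas on a hypothetical table of customers labeled as big or budget spenders, using only the Python standard library. Characterization summarizes the target class as a share per age group. Discrimination compares that summary with a contrasting class through a difference in shares. This difference is one possible comparative measure, chosen here only for illustration.

```python
from collections import Counter

# Hypothetical customer records: (spender_class, age_group).
customers = [
    ("big", "30-39"), ("big", "40-49"), ("big", "30-39"), ("big", "30-39"),
    ("budget", "20-29"), ("budget", "20-29"), ("budget", "30-39"), ("budget", "60+"),
]

def characterize(records, target):
    """Characterization: share of each age_group within the target class."""
    ages = [age for cls, age in records if cls == target]
    counts = Counter(ages)
    return {age: counts[age] / len(ages) for age in sorted(counts)}

def discriminate(records, target, contrast):
    """Discrimination: per age_group, target-class share minus contrasting-class
    share (a simple comparative measure; positive values favor the target)."""
    t = characterize(records, target)
    c = characterize(records, contrast)
    return {age: t.get(age, 0.0) - c.get(age, 0.0) for age in sorted(set(t) | set(c))}

print(characterize(customers, "big"))
print(discriminate(customers, "big", "budget"))
```

In this toy data, big spenders are concentrated in the 30-39 group (a share difference of +0.5). The 20-29 group appears only among budget spenders.
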
2. Mining Frequent Patterns and Associations

• Frequent patterns are patterns that occur frequently in data. There are many kinds of frequent patterns, including frequent itemsets and frequent subsequences.
• A frequent itemset typically refers to a set of items that often appear together in a transactional data set. An example is milk and bread, which many customers frequently buy together in grocery stores.
• A frequent subsequence is an ordered pattern. An example is customers tending to purchase first a laptop, then a digital camera, and then a memory card.
• Mining frequent patterns leads to the discovery of interesting associations.

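Support counting, the core of frequent itemset mining, can be sketched by brute force over a few hypothetical transactions. The support of an itemset is the fraction of transactions that contain it. This exhaustive enumeration is for illustration only. Algorithms such as Apriori avoid it by pruning candidates, using the property that every subset of a frequent itemset must also be frequent. The transactions and the 0.5 minimum support are assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactional data: TID -> set of items bought together.
transactions = {
    "T100": {"milk", "bread", "butter"},
    "T200": {"milk", "bread"},
    "T300": {"bread", "eggs"},
    "T400": {"milk", "bread", "eggs"},
}

def frequent_itemsets(db, min_support, max_size=2):
    """Count every itemset of size 1..max_size and keep those whose support
    (fraction of transactions containing the itemset) is >= min_support."""
    counts = Counter()
    for items in db.values():
        for k in range(1, max_size + 1):
            for itemset in combinations(sorted(items), k):
                counts[itemset] += 1
    n = len(db)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

freq = frequent_itemsets(transactions, min_support=0.5)

# An association rule milk => bread: confidence = support(milk, bread) / support(milk).
confidence = freq[("bread", "milk")] / freq[("milk",)]
print(freq[("bread", "milk")], confidence)  # 0.75 1.0
```
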
3. Classification

• Classification is the process of finding a model (or function) that describes and distinguishes data classes or concepts. The model can then be used to predict the class of objects whose class label is unknown.
• Classification uses predefined classes to which objects are assigned.
• The model is derived from the analysis of a set of training data (data objects for which the class labels are known).
• The model can be represented in various forms, such as if-then rules, decision trees, or neural networks.

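As a toy illustration of deriving a model from training data, the sketch below learns a one-level decision tree, also called a decision stump. The stump is a single if-then rule on one numeric attribute. The incomes, the labels, and the choice of a stump are assumptions made for illustration, not a method prescribed by these notes.

```python
# Hypothetical training data: (income_in_thousands, class_label).
training = [(20, "budget"), (25, "budget"), (32, "budget"), (38, "budget"),
            (45, "big"), (50, "big"), (61, "big")]

def learn_stump(data):
    """Learn the rule 'if income > t then big else budget'.

    Tries the midpoint between each pair of consecutive sorted incomes and
    keeps the threshold with the fewest training errors."""
    values = sorted(v for v, _ in data)
    best_t, best_err = None, len(data) + 1
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        err = sum((v > t) != (label == "big") for v, label in data)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

threshold = learn_stump(training)

def classify(income):
    """Apply the learned if-then rule to a new, unlabeled object."""
    return "big" if income > threshold else "budget"

print(threshold, classify(55), classify(30))  # 41.5 big budget
```
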
5. Cluster analysis

• Unlike classification, which analyzes class-labeled (training) data sets, clustering analyzes data objects without consulting class labels.
• In many cases, class-labeled data may simply not exist at the beginning.
• Clustering identifies similarities between objects and groups them according to the characteristics they have in common. These characteristics also differentiate each group from other groups of objects.
• Clustering can be used to generate class labels for a group of data.
• Objects are clustered on the principle of maximizing intraclass similarity and minimizing interclass similarity. Clusters are formed so that objects within a cluster have high similarity to one another but are rather dissimilar to objects in other clusters.

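k-means is one widely used clustering method; the notes do not name a specific algorithm. It makes the intraclass/interclass principle concrete by repeatedly assigning each object to its nearest cluster center and then moving each center to the mean of its members. The 2-D points, k = 2, the iteration count, and the seed below are hypothetical.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on 2-D points: assign each point to its nearest centroid,
    then move each centroid to the mean of the points assigned to it."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centers: k distinct data points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute centers; keep the old center if a cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical objects forming two well-separated groups.
points = [(1, 1), (1.5, 2), (2, 1), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(sorted(sorted(c) for c in clusters))
```

No class labels are used anywhere. The two groups emerge only from distances between objects.
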
6. Outlier analysis

• Outlier: a data object that does not comply with the general behavior or model of the data.
• Many data mining methods discard outliers as noise or exceptions.
• However, in some applications the rare events can be more interesting than the more regularly occurring ones. This makes outlier analysis useful in fraud detection and rare-event analysis.
• Outliers may be detected using statistical tests that assume a distribution or probability model for the data. They may also be detected using distance measures, where objects that are remote from any cluster are considered outliers.

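A minimal statistical test of this kind is the z-score rule. It assumes the data are roughly normally distributed and flags values lying more than a chosen number of standard deviations from the mean. The purchase amounts and the threshold of 2 are assumptions for illustration.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Flag values more than `threshold` (population) standard deviations from
    the mean. Assumes a roughly normal distribution; the threshold is tunable."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / std > threshold]

# Hypothetical purchase amounts for one customer; 250 is unusually large.
amounts = [52, 48, 50, 47, 53, 49, 51, 250]
outliers = zscore_outliers(amounts)
print(outliers)  # [250]
```

A single extreme value also inflates the mean and standard deviation it is measured against. For this reason, robust measures such as the median and the interquartile range are often preferred on small samples.
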
KDD (Knowledge Discovery in Databases)

• Knowledge discovery in databases is the process of discovering useful knowledge from a collection of data.
• Knowledge discovery consists of an iterative sequence of the following steps:
  1. Data cleaning
  2. Data integration
  3. Data selection
  4. Data transformation
  5. Data mining
  6. Pattern evaluation
  7. Knowledge presentation
• Data mining is an essential step in the KDD process.

Stages of KDD

Figure: Data mining as an essential step in the process of KDD

• Data cleaning: removes noise and inconsistent data.
• Data integration: combines data from multiple heterogeneous sources.
• Data selection: retrieves the data relevant to the analysis task from the database.
• Data transformation: transforms the data into forms appropriate for mining, for example by performing summary or aggregation operations.
• Data mining: an essential process in which intelligent methods are applied to extract data patterns.
• Pattern evaluation: identifies the truly interesting patterns representing knowledge, based on interestingness measures.
• Knowledge presentation: uses visualization and knowledge presentation techniques to present the mined knowledge to users.

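The following toy sketch chains the seven stages as small Python functions so that their order is visible in code. Every function, record, and threshold here is hypothetical. In particular, "share of units" stands in for an interestingness measure. Real KDD uses far richer methods at each stage and iterates over the stages rather than running them once.

```python
# Hypothetical raw records from two branch databases; None marks a missing value.
branch_a = [
    {"cust": "C1", "item": "milk", "qty": 2},
    {"cust": "C2", "item": "bread", "qty": None},
]
branch_b = [
    {"cust": "C3", "item": "milk", "qty": 5},
    {"cust": "C1", "item": "eggs", "qty": 12},
]

def clean(rows):                    # 1. Data cleaning: drop incomplete records
    return [r for r in rows if all(v is not None for v in r.values())]

def integrate(*sources):            # 2. Data integration: combine the sources
    return [r for source in sources for r in source]

def select(rows, items):            # 3. Data selection: keep task-relevant records
    return [r for r in rows if r["item"] in items]

def transform(rows):                # 4. Data transformation: aggregate qty per item
    totals = {}
    for r in rows:
        totals[r["item"]] = totals.get(r["item"], 0) + r["qty"]
    return totals

def mine(totals, min_qty):          # 5. Data mining: items sold in large quantity
    return {item: q for item, q in totals.items() if q >= min_qty}

def evaluate(patterns, min_share):  # 6. Pattern evaluation: share of units as interestingness
    total = sum(patterns.values())
    return {item: q / total for item, q in patterns.items() if q / total >= min_share}

def present(knowledge):             # 7. Knowledge presentation: readable report
    for item, share in sorted(knowledge.items()):
        print(f"{item}: {share:.0%} of units among frequent items")

data = integrate(clean(branch_a), clean(branch_b))
totals = transform(select(data, {"milk", "eggs"}))
patterns = mine(totals, min_qty=5)
knowledge = evaluate(patterns, min_share=0.5)
present(knowledge)  # prints "eggs: 63% of units among frequent items"
```
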
Data Objects and Attribute Types

• A data object represents an entity. In a sales database, for example, the objects may be customers, store items, and sales.
• Data objects are typically described by attributes.
• An attribute is a data field representing a characteristic or feature of a data object.
• Attributes describing a customer object can include, for example, customer_id, name, and address.
• Data objects can also be referred to as samples, examples, instances, or objects.
• The type of an attribute (nominal, binary, ordinal, or numeric) is determined by the set of possible values the attribute can have.

Types of Attributes

1. Nominal attributes
• The values of a nominal attribute are symbols or names of things.
• Each value represents some kind of category, code, or state.
• Example: suppose that hair_color and marital_status are two attributes describing person objects. In our application, possible values of hair_color are black, brown, and white, and marital_status can take on the values single, married, and divorced. Both hair_color and marital_status are nominal attributes.

2. Binary attributes
• A binary attribute is a nominal attribute with only two categories or states, 0 or 1, where 0 typically means that the attribute is absent and 1 means that it is present.
• Example: given the attribute smoker describing a patient object, 1 indicates that the patient smokes, while 0 indicates that the patient does not.

3. Ordinal attributes
• An ordinal attribute is an attribute whose possible values have a meaningful order or ranking among them, but the magnitude between successive values is not known.
• Example: suppose that drink_size corresponds to the size of drinks available at a restaurant. This ordinal attribute has three possible values: small, medium, and large. The values have a meaningful sequence; however, we cannot tell from the values how much bigger, say, a large is than a medium.

4. Numeric attributes
• A numeric attribute is quantitative; that is, it is a measurable quantity represented in integer or real values.
• Numeric attributes can be interval-scaled or ratio-scaled.
• Interval-scaled attributes are measured on a scale of equal-size units. Their values have an order, which allows us to compare and quantify the difference between values. Examples: calendar dates, temperatures in Celsius or Fahrenheit.
• If a measurement is ratio-scaled, we can speak of a value as being a multiple (ratio) of another value. Examples: temperature in Kelvin, length, counts, elapsed time (e.g., the time to run a race).

Note: The Celsius and Fahrenheit temperature scales have arbitrary zero points and are therefore not ratio scales. However, zero on the Kelvin scale is absolute zero, which makes the Kelvin scale a ratio scale.

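A short worked example shows why only ratio-scaled values support statements like "twice as much". The two temperatures are hypothetical, and the conversion uses the standard offset of 273.15 between Celsius and Kelvin.

```python
def celsius_to_kelvin(c):
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return c + 273.15

cool_c, warm_c = 10.0, 20.0

# Differences are meaningful on both scales: 10 degrees either way.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are meaningful only on the ratio (Kelvin) scale.
naive_ratio = warm_c / cool_c                                       # 2.0, misleading
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)  # about 1.035

print(f"difference: {diff_c} C = {diff_k:.2f} K")
print(f"naive ratio {naive_ratio}, Kelvin ratio {true_ratio:.3f}")
```

So 20 °C is not "twice as hot" as 10 °C. Its absolute temperature is only about 3.5% higher.
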
5. Discrete versus continuous attributes
• A discrete attribute has a finite or countably infinite set of values, which may or may not be represented as integers.
• The attributes hair_color, medical_test, and drink_size each have a finite number of values and so are discrete.
• An attribute is countably infinite if its set of possible values is infinite but the values can be put in a one-to-one correspondence with the natural numbers.
• For example, the attribute customer_id is countably infinite: the number of customers can grow without bound, but in reality the actual set of values is countable.
• If an attribute is not discrete, it is continuous. Continuous attributes are typically represented as floating-point variables.

Basic Statistical Descriptions of Data

• Basic statistical descriptions can be used to identify properties of the data and to highlight which data values should be treated as noise or outliers.
• The three areas of basic statistical descriptions are:

1. Measures of central tendency
- Measure the location of the middle or center of a data distribution.
- Measures of central tendency include the mean, median, mode, and midrange (the midrange of a data set is the average of its minimum and maximum values).

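The four measures can be computed with Python's standard statistics module, as in the minimal sketch below. The salary values are a small hypothetical sample. Note that statistics.multimode returns every most-frequent value, so a bimodal sample yields two modes.

```python
import statistics

# Hypothetical sample: annual salaries in thousands, sorted for readability.
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

mean = statistics.mean(salaries)                # sum / count = 696 / 12
median = statistics.median(salaries)            # average of the 6th and 7th values
modes = statistics.multimode(salaries)          # every most-frequent value
midrange = (min(salaries) + max(salaries)) / 2  # average of min and max

print(mean, median, modes, midrange)  # 58 54.0 [52, 70] 70.0
```

Here the single large value 110 pulls the mean (58) above the median (54). This is one reason the median is often preferred for skewed data.
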
2. Dispersion of the data
- Measures how the data are spread out.
- Common data dispersion measures are the range, quartiles, and interquartile range (IQR); the five-number summary (the minimum value, first quartile, median, third quartile, and maximum value of the data set) and box plots; and the variance and standard deviation of the data.
- These measures are useful for identifying outliers.

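The sketch below computes these dispersion measures for the same kind of hypothetical salary sample using the statistics module. Quartile conventions differ between textbooks and tools. This code uses Python's default "exclusive" method, so hand calculations may give slightly different Q1 and Q3. The 1.5 × IQR fence is a common rule of thumb, not the only choice.

```python
import statistics

# Hypothetical sample: annual salaries in thousands, sorted for readability.
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

q1, med, q3 = statistics.quantiles(salaries, n=4)  # default method="exclusive"
five_number = (min(salaries), q1, med, q3, max(salaries))
data_range = max(salaries) - min(salaries)
iqr = q3 - q1
variance = statistics.pvariance(salaries)  # population variance
std_dev = statistics.pstdev(salaries)      # population standard deviation

# Rule of thumb: flag values more than 1.5 * IQR below Q1 or above Q3.
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
suspected = [x for x in salaries if x < low_fence or x > high_fence]

print(five_number)  # (30, 47.75, 54.0, 68.25, 110)
print(suspected)    # [110]
```
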
3. Graphical data presentation
- Graphical displays are used to visually inspect the data.
- Most statistical or graphical data presentation software packages include bar charts, pie charts, and line graphs.
- Other popular displays of data summaries and distributions include quantile plots, quantile-quantile plots, histograms, and scatter plots.

Note: For more details, see Section 2.2 of the Kamber textbook (Han, Kamber, and Pei, Data Mining: Concepts and Techniques).

Issues in Data Mining

Data mining algorithms are complex, and data are often not available from a single source. These factors create several issues.
• The major issues are:
  1) Mining methodology and user interaction issues
  2) Performance issues
  3) Diverse data types issues

Figure: Data mining issues

Mining Methodology and User Interaction Issues

a) Mining different kinds of knowledge in databases - Different users may be interested in different kinds of knowledge. Therefore, data mining should cover a broad range of knowledge discovery tasks.
b) Interactive mining of knowledge at multiple levels of abstraction - The data mining process should be highly interactive. It is therefore important to build flexible user interfaces and an exploratory mining environment that facilitate users' interaction with the system. Interactive mining should allow users to dynamically change the focus of a search and to refine mining requests based on returned results.
c) Incorporation of background knowledge - Background knowledge, constraints, rules, and other information about the domain under study should be incorporated into the knowledge discovery process. Such knowledge can be used for pattern evaluation as well as to guide the search toward interesting patterns.
d) Data mining query languages and ad hoc data mining - Relational query languages (such as SQL) have played an important role in flexible searching because they allow users to pose ad hoc queries. Similarly, high-level data mining query languages that allow users to describe ad hoc mining tasks should be developed. These languages should be integrated with a data warehouse query language and optimized for efficient and flexible data mining.

e) Presentation and visualization of data mining results - How can a data mining system present mining results vividly and flexibly, so that the discovered knowledge is easily understood and directly usable by humans? Once patterns are discovered, they need to be expressed in high-level languages and visual representations that are easy to understand.
f) Handling noisy or incomplete data - Data cleaning methods are required to handle noise and incomplete objects while mining data regularities. Without such methods, the accuracy of the discovered patterns will be poor.
g) Pattern evaluation - Not all the patterns generated by the mining process are interesting, and what makes a pattern interesting may vary from user to user. Therefore, techniques are needed to assess the interestingness of discovered patterns based on subjective measures. Discovered patterns may be uninteresting because they either represent common knowledge or lack novelty.

Performance Issues

a) Efficiency and scalability of data mining algorithms - To effectively extract information from huge amounts of data in databases, data mining algorithms must be efficient and scalable. In other words, the running time of a data mining algorithm must be predictable, short, and acceptable to applications.
b) Parallel, distributed, and incremental mining algorithms - The huge size of databases, the wide distribution of data, and the complexity of some data mining methods motivate the development of parallel and distributed data mining algorithms. These algorithms divide the data into partitions, which are processed in parallel, and then merge the results from the partitions.
In addition, the high cost of some data mining processes and the incremental nature of input motivate incremental data mining. Incremental mining incorporates new data updates without having to mine the entire data set again from scratch.

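The partition-and-merge idea, together with an incremental update, can be sketched with item counting, which is a basic building block of frequent pattern mining. The transactions and the two-way split are hypothetical. Python threads stand in for separate processors or machines. In CPython, threads do not speed up pure-Python counting because of the global interpreter lock, so this sketch shows only the structure of the computation.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_items(partition):
    """Local step: count how many transactions in one partition contain each item."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))
    return counts

def parallel_counts(transactions, n_parts=2):
    """Split the data into partitions, count each one concurrently, then merge."""
    if not transactions:
        return Counter()
    size = (len(transactions) + n_parts - 1) // n_parts
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(count_items, parts))
    total = Counter()
    for partial in partials:  # merge step: add up the partition results
        total.update(partial)
    return total

# Hypothetical transactions.
db = [["milk", "bread"], ["bread", "eggs"], ["milk", "bread", "eggs"], ["milk"]]
counts = parallel_counts(db)

# Incremental mining: fold in a new batch without recounting the old data.
new_batch = [["bread", "butter"]]
counts.update(count_items(new_batch))
print(counts["bread"], counts["butter"])  # 4 1
```
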
Diverse Data Types Issues

a) Handling of relational and complex types of data - A database may contain complex data objects, multimedia data objects, spatial data, temporal data, etc. It is unrealistic to expect one system to mine all these kinds of data.
b) Mining information from heterogeneous databases and global information systems - Data are available from different data sources on LANs or WANs. These data sources may be structured, semi-structured, or unstructured, so mining knowledge from them adds challenges to data mining.

Applications of Data Mining

• Data analysis and decision support
  - Market analysis and management: target marketing, customer relationship management (CRM), market basket analysis, cross-selling (selling an additional product or service to an existing customer), and market segmentation
  - Risk analysis and management: forecasting, customer retention, quality control, and competitive analysis
  - Fraud detection and detection of unusual patterns (outliers)

END
