
Overview of Data Mining

Introduction to data mining:

Today, data is being generated at a rapid pace. Every time we click, make
a purchase, or interact online, we create valuable information. Businesses
analyze this information to make smarter decisions, understand customer
behavior, and stay competitive in the market; this process of analysis is
called data mining.

1.1. Define data mining:


Data mining is the process of extracting insights from large
datasets using statistical and computational techniques. It can
involve structured, semi-structured or unstructured data stored in
databases, data warehouses or data lakes. The goal is to uncover
hidden patterns and relationships to support informed decision-
making and predictions using methods like clustering, classification,
regression and anomaly detection.
Data mining is widely used in industries such as marketing, finance,
healthcare, and telecommunications. For example, it helps identify
customer segments in marketing or detect disease risk factors in
healthcare. However, it also raises ethical concerns, particularly regarding
privacy and the misuse of personal data, requiring careful safeguards.

1.2. List the types of data mining:


Data mining is the process of discovering useful patterns and knowledge from large data sets. Based
on the kind of data or purpose, data mining is divided into different types:
 Classification: a technique used to categorize data into predefined classes or
categories based on the features or attributes of the data instances.
 Regression
 Clustering
 Association Rule
 Anomaly Detection
 Time Series Analysis
 Neural Networks
 Decision Trees

1.3. List the advantages of data mining:


1. Discover Hidden Patterns

You can find relationships and trends in huge piles of data, like knowing that people who
buy bread often buy butter too.

2. Smarter Decisions

Instead of guessing, decisions are based on real facts and past data, leading to better
outcomes.

3. Predict the Future

Data mining can help predict future events, like what products customers will buy or when
machines might break down.

4. Better Customer Experiences

By analyzing shopping habits, businesses can send personalized offers or recommendations,
making customers feel special.

5. Detect Fraud & Reduce Risk

It can spot unusual behavior (like fake bank transactions) early, helping prevent fraud and
reduce risks.

6. Save Money & Improve Efficiency

By finding which processes are slow or wasteful, companies can fix them, saving time and
money.

7. Handle Massive Data (Big Data)

With big data tools, you can analyze data that is too large or complex for normal methods,
giving even deeper insights.

8. Real-Time Insights with Big Data

You can process live data to get insights instantly, useful for things like detecting fraud as it
happens or adjusting prices in real time.

1.4. List the disadvantages of data mining:


Here are the main disadvantages of data mining, explained in simple terms:

1. Privacy Worries

Data mining often involves collecting personal information. This can lead to privacy breaches
if the data is misused or shared illegally.

2. Security Risks

Storing large amounts of data attracts hackers. If systems aren't secure, sensitive info (like
financial or personal records) can be stolen.

3. High Costs

It requires expensive software, powerful computers, and skilled experts. Small businesses
might find the investment too costly.

4. Need Skilled People

Using data-mining tools well needs special training. Without that expertise, it's easy to make
mistakes or misinterpret results.

5. Poor Data Means Poor Results

If the data is incomplete, inconsistent, or wrong, the insights will also be misleading.

6. Overfitting / False Discoveries

Sometimes the system finds patterns that are just coincidences; these don't hold true in real
situations.

7. Complex & Hard to Scale

Handling huge, varied datasets is tricky. The tools and processes get more complex as data grows.

8. Ethical & Bias Problems

Data may reflect stereotypes or unfair trends. If unchecked, models can reinforce
discrimination.

1.5. Applications of Data Mining


Data mining finds applications across numerous fields, such as:
 Business: Analyzing customer data to identify trends and patterns that
inform marketing strategies and enhance sales.
 Healthcare: Identifying patterns in patient data to inform treatment
decisions and improve patient outcomes.
 Multimedia: Extracting insights from unstructured data like text and images
using natural language processing and computer vision.
1.6. List the challenges of implementation in data mining:
Challenges of Data Mining Implementation:

1. Data Quality Issues
o Incomplete, noisy, or inconsistent data can reduce the accuracy of results.
2. Large Volume of Data
o Handling and processing huge amounts of data is time-consuming and needs
powerful systems.
3. Data Privacy and Security
o Protecting sensitive data during mining is a major concern.
4. Integration from Multiple Sources
o Combining data from different formats and sources is difficult.
5. High Cost of Implementation
o Advanced tools, storage, and skilled professionals are expensive.
6. Complexity of Algorithms
o Many mining algorithms are difficult to understand and implement.
7. Changing Data
o Data keeps updating, so the results can become outdated quickly.
8. Interpretation of Results
o Understanding the output and converting it into useful decisions is not easy.
9. Scalability
o Algorithms must work efficiently as data size grows.
10. Legal and Ethical Issues
o Using customer or user data can raise legal and ethical concerns.

1.7. Evolution of data mining

Data mining has evolved significantly since its early beginnings in the
1960s, driven by advancements in computing power, storage, and
processing capabilities. Initially, it was a manual, coding-intensive process,
but it has grown into a sophisticated field utilizing powerful algorithms and
techniques to extract valuable insights from vast datasets.
Here's a more detailed look at the evolution:

1. Early Stages (1960s-1980s):

2. Formalization and Expansion (1990s):

3. Modern Era (2000s - Present):


1. Early Stages (1960s-1980s):
 Roots in AI:
Data mining emerged from the field of artificial intelligence, initially referred to as
"knowledge discovery in databases" (KDD).
 Manual Coding:
Early data mining involved extensive manual coding and specialized expertise for
data preparation, analysis, and interpretation.
 Emergence of Basic Techniques:
Techniques like clustering, classification, and decision trees were developed.
2. Formalization and Expansion (1990s):

 Increased Popularity:
The 1990s saw a surge in popularity, with the establishment of dedicated
conferences and the widespread adoption of data mining in commercial settings.
 KDD Focus:
The term "Knowledge Discovery in Databases" (KDD) became prominent,
emphasizing the process of extracting useful patterns from data.
 Advancements in Algorithms:
Algorithms like association rule mining (e.g., Apriori) and support vector machines
were developed and refined.
 Rise of Data Warehousing:
Data warehouses became common for storing large volumes of data for analysis.
 Impact of Loyalty Cards:
Customer loyalty programs generated massive datasets that fueled the growth of
data mining in retail.
3. Modern Era (2000s - Present):
 Big Data and Cloud Computing:
The rise of big data technologies like Hadoop and Spark, along with cloud
computing platforms (AWS, Azure, GCP), enabled the analysis of massive,
unstructured datasets.
 Integration with Machine Learning:
Data mining techniques are now deeply integrated with machine learning, including
deep learning, NLP, and reinforcement learning.
 Real-time Mining:
Scalable infrastructure and advancements in processing power have facilitated
real-time data mining and analysis.
 Broader Applications:
Data mining is now applied across diverse industries, including finance, healthcare,
marketing, research, and more.

 Focus on Non-Standard Data:
The field is increasingly addressing the challenges of mining non-tabular data, such
as text, images, and videos.
 Evolutionary Algorithms:
Techniques like evolutionary algorithms are used to emulate natural evolution in
data mining processes, optimizing rules and models.
 Continuous Evaluation:
Data mining evaluation remains crucial for assessing the effectiveness and
efficiency of different methods, models, and algorithms.
1.8. List and explain the data mining techniques:
Data Mining is the process of discovering useful patterns and insights
from large amounts of data. It brings together data science, information
technology, and domain knowledge to turn the collected information
into something valuable.

Some common data mining techniques include:
1. Association
Association analysis looks for patterns where certain items or conditions
tend to appear together in a dataset. It's commonly used in market basket
analysis to see which products are often bought together. One method,
called associative classification, generates rules from the data and uses
them to build a model for predictions.
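As a rough illustration, here is a minimal Python sketch of market basket analysis; the transactions and the minimum-support threshold are invented for the example:

```python
from itertools import combinations
from collections import Counter

# Invented example baskets for a market basket analysis sketch
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

# Count how often each pair of items appears together
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

n = len(transactions)
for (a, b), count in pair_counts.items():
    support = count / n              # fraction of baskets containing both items
    if support >= 0.5:               # hypothetical minimum-support threshold
        print(f"{a} and {b} appear together in {support:.0%} of baskets")
```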
2. Classification
Classification builds models to sort data into different categories. The
model is trained on data with known labels and is then used to predict
labels for unknown data.
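For instance, a minimal classification sketch with scikit-learn might look like this (the tiny labelled dataset is invented):

```python
from sklearn.tree import DecisionTreeClassifier

# Invented training data: features are [age, income], labels are
# 0 = "did not buy" and 1 = "bought"
X = [[25, 30000], [40, 80000], [35, 60000], [22, 20000]]
y = [0, 1, 1, 0]

model = DecisionTreeClassifier().fit(X, y)   # train on data with known labels
print(model.predict([[30, 70000]]))          # predict the label for unseen data
```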
3. Prediction
Prediction is similar to classification, but instead of predicting categories, it
predicts continuous values (like numbers). The goal is to build a model
that can estimate the value of a specific attribute for new data.
4. Clustering
Clustering groups similar data points together without using predefined
categories. It helps discover hidden patterns in the data by organizing
objects into clusters where items in each cluster are more similar to each
other than to those in other clusters.
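A minimal clustering sketch with scikit-learn's k-means, on invented two-dimensional points, could look like this:

```python
from sklearn.cluster import KMeans

# Invented points forming two obvious groups, around x = 1 and x = 10
points = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)   # cluster assignment for each point, no labels were given
```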
5. Regression
Regression is used to predict continuous values, like prices or
temperatures, based on past data. There are two main types: linear
regression, which looks for a straight-line relationship, and multiple linear
regression, which uses more variables to make predictions.
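As a small illustration, a linear regression sketch with scikit-learn (the house sizes and prices are made up) might be:

```python
from sklearn.linear_model import LinearRegression

# Invented past data: house size in square metres and its sale price
X = [[50], [80], [100], [120]]
y = [150000, 240000, 300000, 360000]

model = LinearRegression().fit(X, y)
print(model.predict([[90]]))   # estimate a continuous value for a 90 m2 house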
6. Artificial Neural Network (ANN) Classifier
An artificial neural network (ANN) is a model inspired by how the human
brain works. It learns from data by adjusting connections between artificial
neurons. Neural networks are great for recognizing complex patterns but
require a lot of training and can be hard to interpret.
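A minimal sketch with scikit-learn's MLPClassifier, trained on the classic XOR pattern that a straight line cannot separate (all settings here are illustrative choices):

```python
from sklearn.neural_network import MLPClassifier

# The XOR pattern: output is 1 only when exactly one input is 1
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# One hidden layer of 8 neurons; solver and layer size are illustrative
ann = MLPClassifier(hidden_layer_sizes=(8,), solver="lbfgs",
                    max_iter=1000, random_state=1).fit(X, y)
print(ann.predict(X))   # should ideally reproduce [0 1 1 0]
```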
7. Outlier Detection
Outlier detection identifies data points that are very different from the rest
of the data. These unusual points, called outliers, can be spotted using
statistical methods or by checking if they are far away from other data
points.
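A simple statistical approach is the z-score rule; here is a minimal sketch with invented values and a hypothetical cutoff of 2:

```python
import numpy as np

values = np.array([10, 12, 11, 13, 12, 95])         # 95 is clearly unusual
z_scores = (values - values.mean()) / values.std()  # distance from the mean
print(values[np.abs(z_scores) > 2])                 # hypothetical cutoff -> [95]
```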
8. Genetic Algorithm
Genetic algorithms are inspired by natural selection. They solve problems
by evolving solutions over several generations. Each solution is like a
"species," and the fittest solutions are kept and improved over time,
simulating "survival of the fittest" to find the best solution to a problem.
1.9. Explain the data mining implementation process
1. Business Understanding:
It focuses on understanding the project goals and requirements from a business
point of view, then converting this knowledge into a data mining problem definition;
afterward, a preliminary plan is designed to accomplish the target.
Tasks:

 Determine business objectives


 Assess situation
 Determine data mining goals
 Produce a project plan
2. Data Understanding:
Data understanding starts with initial data collection and proceeds with activities
to become familiar with the data, identify data quality issues, gain first insights
into the data, or detect interesting subsets that suggest hypotheses about hidden information.

Tasks:

 Collect initial data


 Describe data
 Explore data
 Verify data quality
3. Data Preparation:
This phase usually takes the most time. It covers all operations needed to construct
the final dataset from the original raw data. Data preparation tasks are likely to be
performed several times, and not in any prescribed order.

Tasks:

 Select data
 Clean data
 Construct data
 Integrate data
 Format data
4. Modeling:
In modeling, various modeling techniques are selected and applied, and their
parameters are calibrated to optimal values. Some techniques have particular
requirements on the form of the data, so stepping back to the data preparation
phase may be necessary.

Tasks:

 Select modeling technique


 Generate test design
 Build model
 Assess model
5. Evaluation:
This phase evaluates the model thoroughly and reviews the steps executed to build it,
to ensure that the business objectives are properly achieved. A main objective of
evaluation is to determine whether some significant business issue has not been
considered adequately. At the end of this phase, a decision on the use of the
data mining results should be reached.

Tasks:

 Evaluate results
 Review process
 Determine next steps
6. Deployment
The concept of deployment in data mining refers to the application of a model for
prediction on new data. The deployment phase can be as simple as
generating a report or as complex as implementing a repeatable data mining process.

Tasks

 Plan deployment
 Plan monitoring and maintenance
 Produce final report
 Review project

1.10. Explain Data Mining Architecture:


The architecture of Data Mining:
Basic Working:
1. It all starts when the user puts up certain data mining requests; these
requests are then sent to data mining engines for pattern evaluation.
2. These applications try to find the solution to the query using the
already present database.
3. The metadata then extracted is sent for proper analysis to the data
mining engine which sometimes interacts with pattern evaluation
modules to determine the result.
4. This result is then sent to the front end in an easily understandable
manner using a suitable interface.
A detailed description of parts of data mining architecture is shown:
1. Data Sources: Database, World Wide Web(WWW), and data
warehouse are parts of data sources. The data in these sources may
be in the form of plain text, spreadsheets, or other forms of media like
photos or videos. WWW is one of the biggest sources of data.
2. Database Server: The database server contains the actual data ready
to be processed. It performs the task of handling data retrieval as per
the request of the user.
3. Data Mining Engine: It is one of the core components of the data
mining architecture that performs all kinds of data mining techniques
like association, classification, characterization, clustering, prediction,
etc.
4. Pattern Evaluation Modules: They are responsible for finding
interesting patterns in the data and sometimes they also interact with
the database servers for producing the result of the user requests.
5. Graphic User Interface: Since the user cannot fully understand the
complexity of the data mining process, the graphical user interface helps
the user to communicate effectively with the data mining system.
6. Knowledge Base: Knowledge Base is an important part of the data
mining engine that is quite beneficial in guiding the search for the result
patterns. Data mining engines may also sometimes get inputs from the
knowledge base. This knowledge base may contain data from user
experiences. The objective of the knowledge base is to make the result
more accurate and reliable.
Types of Data Mining architecture:
1. No Coupling: The no-coupling architecture retrieves data directly from
particular data sources rather than from a database, even though a database
would otherwise be quite an efficient and accurate way to do the same. The
no-coupling architecture is weak and is only used for performing very simple
data mining processes.
2. Loose Coupling: In loose coupling architecture, the data mining system
retrieves data from a database and stores its results in that database. This
architecture is used for memory-based data mining.
3. Semi-Tight Coupling: It tends to use various advantageous features
of the data warehouse systems. It includes sorting, indexing, and
aggregation. In this architecture, an intermediate result can be stored
in the database for better performance.
4. Tight coupling: In this architecture, a data warehouse is considered
one of its most important components whose features are employed for
performing data mining tasks. This architecture provides scalability,
performance, and integrated information.

1.11. Explain KDD (Knowledge Discovery in Databases):


Knowledge Discovery in Databases (KDD) refers to the complete process
of uncovering valuable knowledge from large datasets. It starts with the
selection of relevant data, followed by preprocessing to clean and
organize it, transformation to prepare it for analysis, data mining to
uncover patterns and relationships, and concludes with the evaluation and
interpretation of results, ultimately producing valuable knowledge or
insights. KDD is widely utilized in fields like machine learning, pattern
recognition, statistics, artificial intelligence, and data visualization.
The KDD process is iterative, involving repeated refinements to ensure
the accuracy and reliability of the knowledge extracted. The whole
process consists of the following steps:
1. Data Selection
2. Data Cleaning and Preprocessing
3. Data Transformation and Reduction
4. Data Mining
5. Evaluation and Interpretation of Results

1. Data Selection
Data Selection is the initial step in the Knowledge Discovery in Databases
(KDD) process, where relevant data is identified and chosen for analysis.
It involves selecting a dataset or focusing on specific variables, samples,
or subsets of data that will be used to extract meaningful insights.
 It ensures that only the most relevant data is used for analysis,
improving efficiency and accuracy.
 It involves selecting the entire dataset or narrowing it down to particular
features or subsets based on the task’s goals.
 Data is selected after thoroughly understanding the application domain.
By carefully selecting data, we ensure that the KDD process delivers
accurate, relevant, and actionable insights.
2. Data Cleaning
In the KDD process, Data Cleaning is essential for ensuring that the
dataset is accurate and reliable by correcting errors, handling missing
values, removing duplicates, and addressing noisy or outlier data.
 Missing Values: Gaps in data are filled with the mean or most
probable value to maintain dataset completeness.
 Noisy Data: Noise is reduced using techniques like binning, regression,
or clustering to smooth or group the data.
 Removing Duplicates: Duplicate records are removed to maintain
consistency and avoid errors in analysis.
Data cleaning is crucial in KDD to enhance the quality of the data and
improve the effectiveness of data mining.
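As a small illustration of these cleaning steps, here is a minimal pandas sketch on an invented table:

```python
import pandas as pd

# Invented table with a missing value in each column and one duplicate row
df = pd.DataFrame({"age": [25, None, 35, 25],
                   "income": [30000, 40000, None, 30000]})

df = df.fillna(df.mean(numeric_only=True))   # fill gaps with the column mean
df = df.drop_duplicates()                    # remove duplicate records
print(df)
```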
3. Data Transformation and Reduction
Data Transformation in KDD involves converting data into a format that is
more suitable for analysis.
 Normalization: Scaling data to a common range for consistency
across variables.
 Discretization: Converting continuous data into discrete categories for
simpler analysis.
 Data Aggregation: Summarizing multiple data points (e.g., averages
or totals) to simplify analysis.
 Concept Hierarchy Generation: Organizing data into hierarchies for a
clearer, higher-level view.
Data Reduction helps simplify the dataset while preserving key
information.
 Dimensionality Reduction (e.g., PCA): Reducing the number of
variables while keeping essential data.
 Numerosity Reduction: Reducing data points using methods like
sampling to maintain critical patterns.
 Data Compression: Compacting data for easier storage and
processing.
Together, these techniques ensure that the data is ready for deeper
analysis and mining.
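For example, normalization and discretization can be sketched with pandas as follows (the income values and the three bands are invented):

```python
import pandas as pd

incomes = pd.Series([20000, 35000, 50000, 80000])   # invented values

# Normalization: rescale values to the common range [0, 1]
normalized = (incomes - incomes.min()) / (incomes.max() - incomes.min())

# Discretization: convert continuous values into discrete categories
bands = pd.cut(incomes, bins=3, labels=["low", "medium", "high"])

print(normalized.tolist())
print(list(bands))
```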
4. Data Mining
Data Mining is the process of discovering valuable, previously unknown
patterns from large datasets through automatic or semi-automatic means.
It involves exploring vast amounts of data to extract useful information that
can drive decision-making.
Key characteristics of data mining patterns include:
 Validity: Patterns that hold true even with new data.
 Novelty: Insights that are non-obvious and surprising.
 Usefulness: Information that can be acted upon for practical outcomes.
 Understandability: Patterns that are interpretable and meaningful to
humans.
In the KDD process, choosing the data mining task is critical. Depending
on the objective, the task could involve classification, regression,
clustering, or association rule mining. After determining the task, selecting
the appropriate data mining algorithms is essential. These algorithms are
chosen based on their ability to efficiently and accurately identify patterns
that align with the goals of the analysis.
5. Evaluation and Interpretation of Results
Evaluation in KDD involves assessing the patterns identified during data
mining to determine their relevance and usefulness. It includes calculating
the "interestingness score" for each pattern, which helps to identify
valuable insights. Visualization and summarization techniques are then
applied to make the data more understandable and accessible for the
user.
Interpretation of Results focuses on presenting these insights in a way that is meaningful and
actionable. By effectively communicating the findings, decision-makers can use the results to drive
informed actions and strategies.

1.12. List and explain the data mining tools:

Data mining tools are software applications that automate the process of
extracting valuable insights, patterns, and relationships from large datasets.

Here is a list of popular data mining tools along with simple explanations useful for exam
purposes, especially for diploma or undergraduate level:

1. RapidMiner

 Type: Open-source (also has a commercial version)


 Use: Data preparation, machine learning, deep learning, text mining.
 Why it’s used: It provides a drag-and-drop interface, so you don't need to write
code.
 Best for: Beginners and researchers who want fast results.

2. WEKA (Waikato Environment for Knowledge Analysis)

 Type: Open-source
 Use: Data analysis, data preprocessing, classification, clustering.
 Why it’s used: It has GUI-based tools and is good for educational purposes.
 Best for: Students and researchers.
3. KNIME (Konstanz Information Miner)

 Type: Open-source
 Use: Visual data analytics, ETL (Extract, Transform, Load), machine learning.
 Why it’s used: It allows visual workflows for analyzing data.
 Best for: Business analysts and data scientists.

4. Orange

 Type: Open-source
 Use: Data visualization, machine learning, interactive data analysis.
 Why it’s used: It provides widgets and has easy drag-and-drop features.
 Best for: Beginners in machine learning and teaching environments.

5. SAS (Statistical Analysis System)

 Type: Commercial
 Use: Advanced analytics, data management, and business intelligence.
 Why it’s used: Strong in predictive modeling and enterprise-level data analysis.
 Best for: Enterprises and large-scale organizations.

6. R (with RStudio)

 Type: Open-source
 Use: Statistical computing, graphics, data analysis.
 Why it’s used: It is very powerful for statistical modeling and data mining.
 Best for: Statisticians and researchers.

7. Python (with libraries like Pandas, Scikit-learn, NumPy)

 Type: Open-source programming language


 Use: General-purpose + powerful for machine learning and data mining.
 Why it’s used: It's flexible and used in real-world projects.
 Best for: Developers and data scientists.

8. Tableau

 Type: Commercial
 Use: Data visualization and reporting.
 Why it’s used: It makes it easy to visualize patterns and trends.
 Best for: Business intelligence professionals.

9. Excel (with Data Analysis ToolPak)

 Type: Commercial
 Use: Simple data mining tasks like classification, regression, basic statistics.
 Why it’s used: Easy to use and available in most workplaces.
 Best for: Beginners and small businesses.

1.13. List the major differences between data mining and machine learning:
 Purpose: Data mining focuses on discovering patterns and useful information in
existing large datasets; machine learning focuses on building systems that learn
from data and improve automatically with experience.
 Human involvement: Data mining usually requires human analysts to guide the
process and interpret the discovered patterns; once trained, machine learning
models can make predictions with little human intervention.
 Foundations: Data mining draws on databases, statistics, and machine learning
techniques; machine learning draws on statistics and algorithms such as
regression, decision trees, and neural networks.
 Use of results: Data mining produces insights about past data; machine learning
uses what it has learned to make predictions on new data.
1.14. State the importance of data analytics:
Data Analytics is the process of collecting, organizing, and studying data to
find useful information, understand what is happening, and make better
decisions. In simple words, it helps people and businesses learn from data:
what worked in the past, what is happening now, and what might happen in
the future.
Data Analytics is very important in today’s world because it helps people and businesses make better
decisions using data. Here are some key points:

1. Better Decision Making
2. Finding Trends and Patterns
3. Saving Time and Money
4. Improving Products and Services
5. Risk Management
6. Better Customer Experience
7. Competitive Advantage

1.15. List and explain the phases of data analytics:


The phases of data analytics can be summarized as: Discovery, Data
Preparation, Model Planning, Model Building, Communicating Results, and
Operationalization.

Here's a more detailed breakdown:


1. Discovery:
This initial phase focuses on understanding the business problem, defining
objectives, and identifying relevant data sources. It involves scoping the data
analytics project and collaborating with stakeholders to clarify requirements.
2. Data Preparation:
Once the problem and data sources are identified, the data needs to be prepared
for analysis. This includes collecting, cleaning, and transforming the data to ensure
its quality and suitability for modeling.
3. Model Planning:
In this phase, the data scientist or analyst designs the data model. This involves
choosing the appropriate analytical techniques and algorithms based on the
problem and the nature of the data.
4. Model Building:
This phase involves building and executing the model using the chosen
techniques. It includes training the model on the prepared data, evaluating its
performance, and making adjustments as needed.
5. Communicating Results:
The findings from the model are then communicated to stakeholders. This often
involves creating visualizations and reports to present the insights in a clear and
understandable manner.
6. Operationalization:
Finally, the results are put into action. This phase involves deploying the insights
into business processes, monitoring their effectiveness, and making further
adjustments as needed.

1.16. Differentiate between data mining and data analytics:

Below are the differences between Data Mining and Data Analytics:


 Definition: Data mining is the process of extracting important patterns from large
datasets. Data analytics is the process of analysing and organizing raw data in order
to determine useful information and support decisions.
 Function: Data mining is used for discovering hidden patterns in raw data sets.
Data analytics involves all the operations in examining data sets to find conclusions.
 Data set: In data mining, data sets are generally large and structured,
semi-structured, or unstructured. In data analytics, the dataset can be large, medium,
or small, and is also structured.
 Models: Data mining often requires mathematical and statistical models. Data
analytics uses analytical and business intelligence models.
 Visualization: Data mining generally does not require visualization. Data analytics
surely requires data visualization.
 Goal: The prime goal of data mining is to make data usable. Data analytics is used
to make data-driven decisions.
 Required knowledge: Data mining involves the intersection of machine learning,
statistics, and databases. Data analytics requires knowledge of computer science,
statistics, mathematics, subject knowledge, and AI/machine learning.
 Also known as: Data mining is also known as knowledge discovery in databases
(KDD). Data analysis can be divided into descriptive statistics, exploratory data
analysis, and confirmatory data analysis.
 Output: Data mining shows the data trends and patterns. In data analytics, the
output is a verified or discarded hypothesis.

1.18. Explain text data mining:


Text Data Mining (also called Text Mining or Text Analytics) is the process of extracting
useful and meaningful information from unstructured text data. It is like finding
patterns, facts, or knowledge from large amounts of written content, such as emails,
social media posts, reports, articles, etc.

✅ Key Steps in Text Data Mining:

1. Text Preprocessing
o Cleaning the text (removing punctuation, special characters)
o Removing stop words (like "is", "the", "and")
o Converting text to lowercase
o Tokenization (splitting text into words)
2. Text Representation
o Changing text into a numerical format for analysis (e.g., Bag of Words,
TF-IDF, or word embeddings)
3. Pattern Discovery or Analysis
o Finding useful patterns using:
 Classification (e.g., spam or not spam)
 Clustering (grouping similar documents)
 Sentiment analysis (positive or negative opinion)
 Topic modeling (finding topics in large documents)
4. Interpretation and Evaluation
o Interpreting the mined patterns to make decisions or gain insights
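
As a rough end-to-end illustration of these steps, here is a minimal scikit-learn sketch that represents invented messages with TF-IDF and trains a simple spam classifier:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Invented messages and labels: 1 = spam, 0 = not spam
texts = ["win a free prize now", "meeting at 10 tomorrow",
         "free cash offer, click now", "lunch with the team"]
labels = [1, 0, 1, 0]

# Preprocessing + representation: lowercase, drop stop words, build TF-IDF matrix
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(texts)

# Pattern discovery: a simple classifier over the TF-IDF features
model = MultinomialNB().fit(X, labels)
print(model.predict(vectorizer.transform(["claim your free prize"])))  # likely [1]
```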

Text Mining Process

Conventional Process of Text Mining

 Gathering unstructured information from various sources available in
various document formats, for example plain text, web pages, PDF records, etc.
 Pre-processing and data cleansing tasks are performed to identify and
eliminate inconsistency in the data. The data cleansing process makes sure
to capture the genuine text; it is performed to eliminate stop words and apply
stemming (the process of identifying the root of a certain word) before
indexing the data.
 Processing and controlling tasks are applied to review and further
clean the data set.
 Pattern analysis is implemented in a Management Information System.
 Information processed in the above steps is used to extract important
and applicable data for a powerful and convenient decision-making
process and for trend analysis.

Common Applications:

 Email spam detection


 Sentiment analysis of customer reviews
 Chatbot conversations
 Legal or medical document analysis
 News article classification

1.19. Differentiate between classification and clustering in data mining:
 Type: Classification is used for supervised learning; clustering is used for
unsupervised learning.
 Basic: Classification is the process of classifying the input instances based on
their corresponding class labels; clustering groups the instances based on their
similarity, without the help of class labels.
 Need: Classification has labels, so a training and testing dataset is needed for
verifying the model created; clustering has no need of a training and testing dataset.
 Complexity: Classification is more complex as compared to clustering; clustering
is less complex as compared to classification.
 Example algorithms: Classification uses logistic regression, the Naive Bayes
classifier, support vector machines, etc.; clustering uses the k-means clustering
algorithm, the fuzzy c-means clustering algorithm, the Gaussian (EM) clustering
algorithm, etc.
