
Overview of Data Mining

Introduction to data mining:

Today, data is being generated at a rapid pace. Every time we click, make
a purchase, or interact online, we create valuable information. Businesses
analyze this information to make smarter decisions, understand customer
behavior, and stay competitive in the market; this process of analysis is
called data mining.

1.1. Define data mining:


Data mining is the process of extracting insights from large
datasets using statistical and computational techniques. It can
involve structured, semi-structured or unstructured data stored in
databases, data warehouses or data lakes. The goal is to uncover
hidden patterns and relationships to support informed decision-
making and predictions using methods like clustering, classification,
regression and anomaly detection.
Data mining is widely used in industries such as marketing, finance,
healthcare, and telecommunications. For example, it helps identify
customer segments in marketing or detect disease risk factors in
healthcare. However, it also raises ethical concerns, particularly regarding
privacy and the misuse of personal data, requiring careful safeguards.

1.2. List the types of data mining:


Data mining is the process of discovering useful patterns and knowledge from large data sets. Based
on the kind of data or purpose, data mining is divided into different types:
 Classification: a technique used to categorize data into predefined classes or
categories based on the features or attributes of the data instances.
 Regression
 Clustering
 Association Rule
 Anomaly Detection
 Time Series Analysis
 Neural Networks
 Decision Trees

1.3. List the advantages of data mining:


1. Discover Hidden Patterns

You can find relationships and trends in huge piles of data, like knowing that people who
buy bread often buy butter too.

2. Smarter Decisions

Instead of guessing, decisions are based on real facts and past data, leading to better
outcomes.

3. Predict the Future

Data mining can help predict future events, like what products customers will buy or when
machines might break down.

4. Better Customer Experiences

By analyzing shopping habits, businesses can send personalized offers or recommendations,
making customers feel special.

5. Detect Fraud & Reduce Risk

It can spot unusual behavior (like fake bank transactions) early, helping prevent fraud and
reduce risks.

6. Save Money & Improve Efficiency

By finding which processes are slow or wasteful, companies can fix them, saving time and
money.

7. Handle Massive Data (Big Data)

With big data tools, you can analyze data that is too large or complex for normal methods,
giving even deeper insights.

8. Real-Time Insights with Big Data

You can process live data to get insights instantly, useful for things like detecting fraud as it
happens or adjusting prices in real time.

1.4. List the disadvantages of data mining:


Here are the main disadvantages of data mining, explained in simple terms:

1. Privacy Worries

Data mining often involves collecting personal information. This can lead to privacy breaches
if the data is misused or shared illegally.

2. Security Risks

Storing large amounts of data attracts hackers. If systems aren't secure, sensitive info (like
financial or personal records) can be stolen.

3. High Costs

It requires expensive software, powerful computers, and skilled experts. Small businesses
might find the investment too costly.

4. Need Skilled People

Using data-mining tools well needs special training. Without that expertise, it's easy to make
mistakes or misinterpret results.

5. Poor Data Means Poor Results

If the data is incomplete, inconsistent, or wrong, the insights will also be misleading.

6. Overfitting / False Discoveries

Sometimes the system finds patterns that are just coincidences; these don't hold true in real
situations.

7. Complex & Hard to Scale

Handling huge, varied datasets is tricky. The tools and processes get more complex as data grows.

8. Ethical & Bias Problems

Data may reflect stereotypes or unfair trends. If unchecked, models can reinforce
discrimination.

1.5. Applications of Data Mining


Data mining finds applications across numerous fields, such as:
 Business: Analyzing customer data to identify trends and patterns that
inform marketing strategies and enhance sales.
 Healthcare: Identifying patterns in patient data to inform treatment
decisions and improve patient outcomes.
 Multimedia: Extracting insights from unstructured data like text and images
using natural language processing and computer vision.
1.6. List the challenges of implementation in data mining:
Challenges of Data Mining Implementation:

1. Data Quality Issues
o Incomplete, noisy, or inconsistent data can reduce the accuracy of results.
2. Large Volume of Data
o Handling and processing huge amounts of data is time-consuming and needs
powerful systems.
3. Data Privacy and Security
o Protecting sensitive data during mining is a major concern.
4. Integration from Multiple Sources
o Combining data from different formats and sources is difficult.
5. High Cost of Implementation
o Advanced tools, storage, and skilled professionals are expensive.
6. Complexity of Algorithms
o Many mining algorithms are difficult to understand and implement.
7. Changing Data
o Data keeps updating, so the results can become outdated quickly.
8. Interpretation of Results
o Understanding the output and converting it into useful decisions is not easy.
9. Scalability
o Algorithms must work efficiently as data size grows.
10. Legal and Ethical Issues
o Using customer or user data can raise legal and ethical concerns.

1.7. Evolution of data mining

Data mining has evolved significantly since its early beginnings in the
1960s, driven by advancements in computing power, storage, and
processing capabilities. Initially, it was a manual, coding-intensive process,
but it has grown into a sophisticated field utilizing powerful algorithms and
techniques to extract valuable insights from vast datasets.
Here's a more detailed look at the evolution:

1. Early Stages (1960s-1980s):

2. Formalization and Expansion (1990s):

3. Modern Era (2000s - Present):


1. Early Stages (1960s-1980s):
 Roots in AI:
Data mining emerged from the field of artificial intelligence, initially referred to as
"knowledge discovery in databases" (KDD).
 Manual Coding:
Early data mining involved extensive manual coding and specialized expertise for
data preparation, analysis, and interpretation.
 Emergence of Basic Techniques:
Techniques like clustering, classification, and decision trees were developed.
2. Formalization and Expansion (1990s):

 Increased Popularity:
The 1990s saw a surge in popularity, with the establishment of dedicated
conferences and the widespread adoption of data mining in commercial settings.
 KDD Focus:
The term "Knowledge Discovery in Databases" (KDD) became prominent,
emphasizing the process of extracting useful patterns from data.
 Advancements in Algorithms:
Algorithms like association rule mining (e.g., Apriori) and support vector machines
were developed and refined.
 Rise of Data Warehousing:
Data warehouses became common for storing large volumes of data for analysis.
 Impact of Loyalty Cards:
Customer loyalty programs generated massive datasets that fueled the growth of
data mining in retail.
3. Modern Era (2000s - Present):
 Big Data and Cloud Computing:
The rise of big data technologies like Hadoop and Spark, along with cloud
computing platforms (AWS, Azure, GCP), enabled the analysis of massive,
unstructured datasets.
 Integration with Machine Learning:
Data mining techniques are now deeply integrated with machine learning, including
deep learning, NLP, and reinforcement learning.
 Real-time Mining:
Scalable infrastructure and advancements in processing power have facilitated
real-time data mining and analysis.
 Broader Applications:
Data mining is now applied across diverse industries, including finance, healthcare,
marketing, research, and more.

 Focus on Non-Standard Data:
The field is increasingly addressing the challenges of mining non-tabular data, such
as text, images, and videos.
 Evolutionary Algorithms:
Techniques like evolutionary algorithms are used to emulate natural evolution in
data mining processes, optimizing rules and models.
 Continuous Evaluation:
Data mining evaluation remains crucial for assessing the effectiveness and
efficiency of different methods, models, and algorithms.
1.8. List and explain the data mining techniques:
Data Mining is the process of discovering useful patterns and insights
from large amounts of data. It brings together data science, information
technology, and domain knowledge to turn the collected information
into something valuable.

Some common data mining techniques include:
1. Association
Association analysis looks for patterns where certain items or conditions
tend to appear together in a dataset. It's commonly used in market basket
analysis to see which products are often bought together. One method,
called associative classification, generates rules from the data and uses
them to build a model for predictions.
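As a rough illustration, here is a minimal Python sketch of market basket analysis; the transactions and the minimum-support threshold are invented for the example:

```python
from itertools import combinations
from collections import Counter

# Invented example baskets for a market basket analysis sketch
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

# Count how often each pair of items appears together
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

n = len(transactions)
for (a, b), count in pair_counts.items():
    support = count / n              # fraction of baskets containing both items
    if support >= 0.5:               # hypothetical minimum-support threshold
        print(f"{a} and {b} appear together in {support:.0%} of baskets")
```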
2. Classification
Classification builds models to sort data into different categories. The
model is trained on data with known labels and is then used to predict
labels for unknown data.
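For instance, a minimal classification sketch with scikit-learn might look like this (the tiny labelled dataset is invented):

```python
from sklearn.tree import DecisionTreeClassifier

# Invented training data: features are [age, income], labels are
# 0 = "did not buy" and 1 = "bought"
X = [[25, 30000], [40, 80000], [35, 60000], [22, 20000]]
y = [0, 1, 1, 0]

model = DecisionTreeClassifier().fit(X, y)   # train on data with known labels
print(model.predict([[30, 70000]]))          # predict the label for unseen data
```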
3. Prediction
Prediction is similar to classification, but instead of predicting categories, it
predicts continuous values (like numbers). The goal is to build a model
that can estimate the value of a specific attribute for new data.
4. Clustering
Clustering groups similar data points together without using predefined
categories. It helps discover hidden patterns in the data by organizing
objects into clusters where items in each cluster are more similar to each
other than to those in other clusters.
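A minimal clustering sketch with scikit-learn's k-means, on invented two-dimensional points, could look like this:

```python
from sklearn.cluster import KMeans

# Invented points forming two obvious groups, around x = 1 and x = 10
points = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)   # cluster assignment for each point, no labels were given
```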
5. Regression
Regression is used to predict continuous values, like prices or
temperatures, based on past data. There are two main types: linear
regression, which looks for a straight-line relationship, and multiple linear
regression, which uses more variables to make predictions.
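As a small illustration, a linear regression sketch with scikit-learn (the house sizes and prices are made up) might be:

```python
from sklearn.linear_model import LinearRegression

# Invented past data: house size in square metres and its sale price
X = [[50], [80], [100], [120]]
y = [150000, 240000, 300000, 360000]

model = LinearRegression().fit(X, y)
print(model.predict([[90]]))   # estimate a continuous value for a 90 m2 house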
6. Artificial Neural Network (ANN) Classifier
An artificial neural network (ANN) is a model inspired by how the human
brain works. It learns from data by adjusting connections between artificial
neurons. Neural networks are great for recognizing complex patterns but
require a lot of training and can be hard to interpret.
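A minimal sketch with scikit-learn's MLPClassifier, trained on the classic XOR pattern that a straight line cannot separate (all settings here are illustrative choices):

```python
from sklearn.neural_network import MLPClassifier

# The XOR pattern: output is 1 only when exactly one input is 1
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# One hidden layer of 8 neurons; solver and layer size are illustrative
ann = MLPClassifier(hidden_layer_sizes=(8,), solver="lbfgs",
                    max_iter=1000, random_state=1).fit(X, y)
print(ann.predict(X))   # should ideally reproduce [0 1 1 0]
```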
7. Outlier Detection
Outlier detection identifies data points that are very different from the rest
of the data. These unusual points, called outliers, can be spotted using
statistical methods or by checking if they are far away from other data
points.
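A simple statistical approach is the z-score rule; here is a minimal sketch with invented values and a hypothetical cutoff of 2:

```python
import numpy as np

values = np.array([10, 12, 11, 13, 12, 95])         # 95 is clearly unusual
z_scores = (values - values.mean()) / values.std()  # distance from the mean
print(values[np.abs(z_scores) > 2])                 # hypothetical cutoff -> [95]
```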
8. Genetic Algorithm
Genetic algorithms are inspired by natural selection. They solve problems
by evolving solutions over several generations. Each solution is like a
"species," and the fittest solutions are kept and improved over time,
simulating "survival of the fittest" to find the best solution to a problem.
1.9. Explain the data mining implementation process
1. Business Understanding:
It focuses on understanding the project goals and requirements from a business
point of view, then converting this knowledge into a data mining problem definition;
afterward, a preliminary plan is designed to accomplish the target.
Tasks:

 Determine business objectives


 Assess situation
 Determine data mining goals
 Produce a project plan
2. Data Understanding:
Data understanding starts with initial data collection and proceeds with activities
to become familiar with the data, identify data quality issues, gain first insights
into the data, or detect interesting subsets that suggest hypotheses about hidden information.

Tasks:

 Collect initial data


 Describe data
 Explore data
 Verify data quality
3. Data Preparation:
This phase usually takes the most time. It covers all operations needed to construct
the final dataset from the original raw data. Data preparation tasks are likely to be
performed several times, and not in any prescribed order.

Tasks:

 Select data
 Clean data
 Construct data
 Integrate data
 Format data
4. Modeling:
In modeling, various modeling techniques are selected and applied, and their
parameters are calibrated to optimal values. Some techniques have particular
requirements on the form of the data, so stepping back to the data preparation
phase may be necessary.

Tasks:

 Select modeling technique


 Generate test design
 Build model
 Assess model
5. Evaluation:
This phase evaluates the model thoroughly and reviews the steps executed to build it,
to ensure that the business objectives are properly achieved. A main objective of
evaluation is to determine whether some significant business issue has not been
considered adequately. At the end of this phase, a decision on the use of the
data mining results should be reached.

Tasks:

 Evaluate results
 Review process
 Determine next steps
6. Deployment
The concept of deployment in data mining refers to the application of a model for
prediction on new data. The deployment phase can be as simple as
generating a report or as complex as implementing a repeatable data mining process.

Tasks

 Plan deployment
 Plan monitoring and maintenance
 Produce final report
 Review project

1.10. Explain Data Mining Architecture:


The architecture of Data Mining:
Basic Working:
1. It all starts when the user puts up certain data mining requests; these
requests are then sent to data mining engines for pattern evaluation.
2. These applications try to find the solution to the query using the
already present database.
3. The metadata then extracted is sent for proper analysis to the data
mining engine which sometimes interacts with pattern evaluation
modules to determine the result.
4. This result is then sent to the front end in an easily understandable
manner using a suitable interface.
A detailed description of parts of data mining architecture is shown:
1. Data Sources: Database, World Wide Web(WWW), and data
warehouse are parts of data sources. The data in these sources may
be in the form of plain text, spreadsheets, or other forms of media like
photos or videos. WWW is one of the biggest sources of data.
2. Database Server: The database server contains the actual data ready
to be processed. It performs the task of handling data retrieval as per
the request of the user.
3. Data Mining Engine: It is one of the core components of the data
mining architecture that performs all kinds of data mining techniques
like association, classification, characterization, clustering, prediction,
etc.
4. Pattern Evaluation Modules: They are responsible for finding
interesting patterns in the data and sometimes they also interact with
the database servers for producing the result of the user requests.
5. Graphic User Interface: Since the user cannot fully understand the
complexity of the data mining process, the graphical user interface helps
the user to communicate effectively with the data mining system.
6. Knowledge Base: Knowledge Base is an important part of the data
mining engine that is quite beneficial in guiding the search for the result
patterns. Data mining engines may also sometimes get inputs from the
knowledge base. This knowledge base may contain data from user
experiences. The objective of the knowledge base is to make the result
more accurate and reliable.
Types of Data Mining architecture:
1. No Coupling: The no-coupling architecture retrieves data directly from
particular data sources rather than from a database, even though a database
would otherwise be quite an efficient and accurate way to do the same. The
no-coupling architecture is weak and is only used for performing very simple
data mining processes.
2. Loose Coupling: In loose coupling architecture, the data mining system
retrieves data from a database and stores its results in that database. This
architecture is used for memory-based data mining.
3. Semi-Tight Coupling: It tends to use various advantageous features
of the data warehouse systems. It includes sorting, indexing, and
aggregation. In this architecture, an intermediate result can be stored
in the database for better performance.
4. Tight coupling: In this architecture, a data warehouse is considered
one of its most important components whose features are employed for
performing data mining tasks. This architecture provides scalability,
performance, and integrated information.

1.11. Explain KDD (Knowledge Discovery in Databases):


Knowledge Discovery in Databases (KDD) refers to the complete process
of uncovering valuable knowledge from large datasets. It starts with the
selection of relevant data, followed by preprocessing to clean and
organize it, transformation to prepare it for analysis, data mining to
uncover patterns and relationships, and concludes with the evaluation and
interpretation of results, ultimately producing valuable knowledge or
insights. KDD is widely utilized in fields like machine learning, pattern
recognition, statistics, artificial intelligence, and data visualization.
The KDD process is iterative, involving repeated refinements to ensure
the accuracy and reliability of the knowledge extracted. The whole
process consists of the following steps:
1. Data Selection
2. Data Cleaning and Preprocessing
3. Data Transformation and Reduction
4. Data Mining
5. Evaluation and Interpretation of Results

1. Data Selection
Data Selection is the initial step in the Knowledge Discovery in Databases
(KDD) process, where relevant data is identified and chosen for analysis.
It involves selecting a dataset or focusing on specific variables, samples,
or subsets of data that will be used to extract meaningful insights.
 It ensures that only the most relevant data is used for analysis,
improving efficiency and accuracy.
 It involves selecting the entire dataset or narrowing it down to particular
features or subsets based on the task’s goals.
 Data is selected after thoroughly understanding the application domain.
By carefully selecting data, we ensure that the KDD process delivers
accurate, relevant, and actionable insights.
2. Data Cleaning
In the KDD process, Data Cleaning is essential for ensuring that the
dataset is accurate and reliable by correcting errors, handling missing
values, removing duplicates, and addressing noisy or outlier data.
 Missing Values: Gaps in data are filled with the mean or most
probable value to maintain dataset completeness.
 Noisy Data: Noise is reduced using techniques like binning, regression,
or clustering to smooth or group the data.
 Removing Duplicates: Duplicate records are removed to maintain
consistency and avoid errors in analysis.
Data cleaning is crucial in KDD to enhance the quality of the data and
improve the effectiveness of data mining.
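As a small illustration of these cleaning steps, here is a minimal pandas sketch on an invented table:

```python
import pandas as pd

# Invented table with a missing value in each column and one duplicate row
df = pd.DataFrame({"age": [25, None, 35, 25],
                   "income": [30000, 40000, None, 30000]})

df = df.fillna(df.mean(numeric_only=True))   # fill gaps with the column mean
df = df.drop_duplicates()                    # remove duplicate records
print(df)
```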
3. Data Transformation and Reduction
Data Transformation in KDD involves converting data into a format that is
more suitable for analysis.
 Normalization: Scaling data to a common range for consistency
across variables.
 Discretization: Converting continuous data into discrete categories for
simpler analysis.
 Data Aggregation: Summarizing multiple data points (e.g., averages
or totals) to simplify analysis.
 Concept Hierarchy Generation: Organizing data into hierarchies for a
clearer, higher-level view.
Data Reduction helps simplify the dataset while preserving key
information.
 Dimensionality Reduction (e.g., PCA): Reducing the number of
variables while keeping essential data.
 Numerosity Reduction: Reducing data points using methods like
sampling to maintain critical patterns.
 Data Compression: Compacting data for easier storage and
processing.
Together, these techniques ensure that the data is ready for deeper
analysis and mining.
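For example, normalization and discretization can be sketched with pandas as follows (the income values and the three bands are invented):

```python
import pandas as pd

incomes = pd.Series([20000, 35000, 50000, 80000])   # invented values

# Normalization: rescale values to the common range [0, 1]
normalized = (incomes - incomes.min()) / (incomes.max() - incomes.min())

# Discretization: convert continuous values into discrete categories
bands = pd.cut(incomes, bins=3, labels=["low", "medium", "high"])

print(normalized.tolist())
print(list(bands))
```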
4. Data Mining
Data Mining is the process of discovering valuable, previously unknown
patterns from large datasets through automatic or semi-automatic means.
It involves exploring vast amounts of data to extract useful information that
can drive decision-making.
Key characteristics of data mining patterns include:
 Validity: Patterns that hold true even with new data.
 Novelty: Insights that are non-obvious and surprising.
 Usefulness: Information that can be acted upon for practical outcomes.
 Understandability: Patterns that are interpretable and meaningful to
humans.
In the KDD process, choosing the data mining task is critical. Depending
on the objective, the task could involve classification, regression,
clustering, or association rule mining. After determining the task, selecting
the appropriate data mining algorithms is essential. These algorithms are
chosen based on their ability to efficiently and accurately identify patterns
that align with the goals of the analysis.
5. Evaluation and Interpretation of Results
Evaluation in KDD involves assessing the patterns identified during data
mining to determine their relevance and usefulness. It includes calculating
the "interestingness score" for each pattern, which helps to identify
valuable insights. Visualization and summarization techniques are then
applied to make the data more understandable and accessible for the
user.
Interpretation of Results focuses on presenting these insights in a way that is meaningful and
actionable. By effectively communicating the findings, decision-makers can use the results to drive
informed actions and strategies.

1.12. List and explain the data mining tools:

Data mining tools are software applications that automate the process of
extracting valuable insights, patterns, and relationships from large datasets.

Here is a list of popular data mining tools along with simple explanations useful for exam
purposes, especially for diploma or undergraduate level:

1. RapidMiner

 Type: Open-source (also has a commercial version)


 Use: Data preparation, machine learning, deep learning, text mining.
 Why it’s used: It provides a drag-and-drop interface, so you don't need to write
code.
 Best for: Beginners and researchers who want fast results.

2. WEKA (Waikato Environment for Knowledge Analysis)

 Type: Open-source
 Use: Data analysis, data preprocessing, classification, clustering.
 Why it’s used: It has GUI-based tools and is good for educational purposes.
 Best for: Students and researchers.
3. KNIME (Konstanz Information Miner)

 Type: Open-source
 Use: Visual data analytics, ETL (Extract, Transform, Load), machine learning.
 Why it’s used: It allows visual workflows for analyzing data.
 Best for: Business analysts and data scientists.

4. Orange

 Type: Open-source
 Use: Data visualization, machine learning, interactive data analysis.
 Why it’s used: It provides widgets and has easy drag-and-drop features.
 Best for: Beginners in machine learning and teaching environments.

5. SAS (Statistical Analysis System)

 Type: Commercial
 Use: Advanced analytics, data management, and business intelligence.
 Why it’s used: Strong in predictive modeling and enterprise-level data analysis.
 Best for: Enterprises and large-scale organizations.

6. R (with RStudio)

 Type: Open-source
 Use: Statistical computing, graphics, data analysis.
 Why it’s used: It is very powerful for statistical modeling and data mining.
 Best for: Statisticians and researchers.

7. Python (with libraries like Pandas, Scikit-learn, NumPy)

 Type: Open-source programming language


 Use: General-purpose + powerful for machine learning and data mining.
 Why it’s used: It's flexible and used in real-world projects.
 Best for: Developers and data scientists.

8. Tableau

 Type: Commercial
 Use: Data visualization and reporting.
 Why it’s used: It makes it easy to visualize patterns and trends.
 Best for: Business intelligence professionals.

9. Excel (with Data Analysis ToolPak)

 Type: Commercial
 Use: Simple data mining tasks like classification, regression, basic statistics.
 Why it’s used: Easy to use and available in most workplaces.
 Best for: Beginners and small businesses.

1.13. List the major differences between data mining and machine learning:
 Purpose: Data mining focuses on discovering patterns and useful information in
existing large datasets; machine learning focuses on building systems that learn
from data and improve automatically with experience.
 Human involvement: Data mining usually requires human analysts to guide the
process and interpret the discovered patterns; once trained, machine learning
models can make predictions with little human intervention.
 Foundations: Data mining draws on databases, statistics, and machine learning
techniques; machine learning draws on statistics and algorithms such as
regression, decision trees, and neural networks.
 Use of results: Data mining produces insights about past data; machine learning
uses what it has learned to make predictions on new data.
1.14. State the importance of data analytics:
Data Analytics is the process of collecting, organizing, and studying data to
find useful information, understand what is happening, and make better
decisions. In simple words, it helps people and businesses learn from data:
what worked in the past, what is happening now, and what might happen in
the future.
Data Analytics is very important in today’s world because it helps people and businesses make better
decisions using data. Here are some key points:

1. Better Decision Making
2. Finding Trends and Patterns
3. Saving Time and Money
4. Improving Products and Services
5. Risk Management
6. Better Customer Experience
7. Competitive Advantage

1.15. List and explain the phases of data analytics:


The phases of data analytics can be summarized as: Discovery, Data
Preparation, Model Planning, Model Building, Communicating Results, and
Operationalization.

Here's a more detailed breakdown:


1. Discovery:
This initial phase focuses on understanding the business problem, defining
objectives, and identifying relevant data sources. It involves scoping the data
analytics project and collaborating with stakeholders to clarify requirements.
2. Data Preparation:
Once the problem and data sources are identified, the data needs to be prepared
for analysis. This includes collecting, cleaning, and transforming the data to ensure
its quality and suitability for modeling.
3. Model Planning:
In this phase, the data scientist or analyst designs the data model. This involves
choosing the appropriate analytical techniques and algorithms based on the
problem and the nature of the data.
4. Model Building:
This phase involves building and executing the model using the chosen
techniques. It includes training the model on the prepared data, evaluating its
performance, and making adjustments as needed.
5. Communicating Results:
The findings from the model are then communicated to stakeholders. This often
involves creating visualizations and reports to present the insights in a clear and
understandable manner.
6. Operationalization:
Finally, the results are put into action. This phase involves deploying the insights
into business processes, monitoring their effectiveness, and making further
adjustments as needed.

1.16. Differentiate between data mining and data analytics:

Below are the differences between Data Mining and Data Analytics:


 Definition: Data mining is the process of extracting important patterns from large
datasets. Data analytics is the process of analysing and organizing raw data in order
to determine useful information and support decisions.
 Function: Data mining is used for discovering hidden patterns in raw data sets.
Data analytics involves all the operations in examining data sets to find conclusions.
 Data set: In data mining, data sets are generally large and structured,
semi-structured, or unstructured. In data analytics, the dataset can be large, medium,
or small, and is also structured.
 Models: Data mining often requires mathematical and statistical models. Data
analytics uses analytical and business intelligence models.
 Visualization: Data mining generally does not require visualization. Data analytics
surely requires data visualization.
 Goal: The prime goal of data mining is to make data usable. Data analytics is used
to make data-driven decisions.
 Required knowledge: Data mining involves the intersection of machine learning,
statistics, and databases. Data analytics requires knowledge of computer science,
statistics, mathematics, subject knowledge, and AI/machine learning.
 Also known as: Data mining is also known as knowledge discovery in databases
(KDD). Data analysis can be divided into descriptive statistics, exploratory data
analysis, and confirmatory data analysis.
 Output: Data mining shows the data trends and patterns. In data analytics, the
output is a verified or discarded hypothesis.

1.18. Explain text data mining:


Text Data Mining (also called Text Mining or Text Analytics) is the process of extracting
useful and meaningful information from unstructured text data. It is like finding
patterns, facts, or knowledge from large amounts of written content, such as emails,
social media posts, reports, articles, etc.

✅ Key Steps in Text Data Mining:

1. Text Preprocessing
o Cleaning the text (removing punctuation, special characters)
o Removing stop words (like "is", "the", "and")
o Converting text to lowercase
o Tokenization (splitting text into words)
2. Text Representation
o Changing text into a numerical format for analysis (e.g., Bag of Words,
TF-IDF, or word embeddings)
3. Pattern Discovery or Analysis
o Finding useful patterns using:
 Classification (e.g., spam or not spam)
 Clustering (grouping similar documents)
 Sentiment analysis (positive or negative opinion)
 Topic modeling (finding topics in large documents)
4. Interpretation and Evaluation
o Interpreting the mined patterns to make decisions or gain insights
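
As a rough end-to-end illustration of these steps, here is a minimal scikit-learn sketch that represents invented messages with TF-IDF and trains a simple spam classifier:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Invented messages and labels: 1 = spam, 0 = not spam
texts = ["win a free prize now", "meeting at 10 tomorrow",
         "free cash offer, click now", "lunch with the team"]
labels = [1, 0, 1, 0]

# Preprocessing + representation: lowercase, drop stop words, build TF-IDF matrix
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(texts)

# Pattern discovery: a simple classifier over the TF-IDF features
model = MultinomialNB().fit(X, labels)
print(model.predict(vectorizer.transform(["claim your free prize"])))  # likely [1]
```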

Text Mining Process

Conventional Process of Text Mining

 Gathering unstructured information from various sources available in
various document formats, for example plain text, web pages, PDF records, etc.
 Pre-processing and data cleansing tasks are performed to identify and
eliminate inconsistency in the data. The data cleansing process makes sure
to capture the genuine text; it is performed to eliminate stop words and apply
stemming (the process of identifying the root of a certain word) before
indexing the data.
 Processing and controlling tasks are applied to review and further
clean the data set.
 Pattern analysis is implemented in a Management Information System.
 Information processed in the above steps is used to extract important
and applicable data for a powerful and convenient decision-making
process and for trend analysis.

Common Applications:

 Email spam detection


 Sentiment analysis of customer reviews
 Chatbot conversations
 Legal or medical document analysis
 News article classification

1.19. Differentiate between classification and clustering in data mining:
 Type: Classification is used for supervised learning; clustering is used for
unsupervised learning.
 Basic: Classification is the process of classifying the input instances based on
their corresponding class labels; clustering groups the instances based on their
similarity, without the help of class labels.
 Need: Classification has labels, so a training and testing dataset is needed for
verifying the model created; clustering has no need of a training and testing dataset.
 Complexity: Classification is more complex as compared to clustering; clustering
is less complex as compared to classification.
 Example algorithms: Classification uses logistic regression, the Naive Bayes
classifier, support vector machines, etc.; clustering uses the k-means clustering
algorithm, the fuzzy c-means clustering algorithm, the Gaussian (EM) clustering
algorithm, etc.
