DWDM Unit 4

The document discusses the benefits and applications of data mining, highlighting its role in extracting useful information from large datasets across various sectors such as banking, healthcare, and education. It emphasizes the importance of data mining techniques in decision-making, pattern recognition, and predictive analytics. Additionally, it introduces tools like WEKA, RapidMiner, and IBM Watson that facilitate data mining processes and enhance data analysis capabilities.

Uploaded by

raghuvartandon15


BCA VI SEM

UNIT 4
Syllabus
Benefits of Data Mining
 Data is a set of discrete objective facts about an event or a process
that have little use by themselves unless converted into information.
 We have been collecting numerous data, from simple numerical
measurements and text documents to more complex information
such as spatial data, multimedia channels, and hypertext
documents.
 Nowadays, large quantities of data are being accumulated.
 The amount of data collected is said to almost double every year.
 To extract information or seek knowledge from this massive data,
data mining techniques are used.
 Data mining is used in almost all places where a large amount of data
is stored and processed.
 For example, banks typically use data mining to find
prospective customers who could be interested in credit cards,
personal loans, or insurance. Since banks have the
transaction details and detailed profiles of their customers, they
analyze this data to find patterns that help them
predict which customers could be interested in, say, a personal
loan.
Data Mining
 Basically, the motive behind mining data, whether
commercial or scientific, is the same – the need to find
useful information in data to enable better decision-
making or a better understanding of the world around
us.
 “Extraction of interesting information or patterns from
data in large databases is known as data mining.”

Application of Data Mining
 Scientific Analysis: Scientific simulations generate
large volumes of data every day. This includes data
collected from nuclear laboratories, data about human
psychology, etc. Data mining techniques are capable of
analyzing this data. We can now capture and store new
data faster than we can analyze the data already
accumulated. Examples of scientific analysis:
 Sequence analysis in bioinformatics
 Classification of astronomical objects
 Medical decision support.
Application of Data Mining
 Intrusion Detection: A network intrusion refers to any
unauthorized activity on a digital network.
 Network intrusions often involve stealing valuable network
resources. Data mining techniques play a vital role in
detecting intrusions, network attacks, and anomalies.
 These techniques help in selecting and refining useful and
relevant information from large data sets.
 Data mining techniques help classify relevant data for an
Intrusion Detection System, which raises alarms when
network traffic indicates a foreign invasion of the system.
For example:
 Detect security violations
 Misuse Detection
 Anomaly Detection
Application of Data Mining
 Business Transactions: Every transaction in the business
industry is recorded in perpetuity.
 Such transactions are usually time-related and can be
inter-business deals or intra-business operations.
 Using this data effectively and within a reasonable
time frame for competitive decision-making is the most
important problem to solve for businesses that
struggle to survive in a highly competitive world.
 Data mining helps to analyze these business transactions
and to inform marketing approaches and decision-making.
Examples:
 Direct mail targeting
 Stock trading
 Customer segmentation
 Churn prediction (Churn prediction is one of the most
popular Big Data use cases in business)
Application of Data Mining
 Market Basket Analysis: Market Basket Analysis is a technique
for the careful study of the purchases made by customers in a
supermarket. It identifies patterns of items that customers
frequently purchase together. Companies can use this analysis to
promote deals, offers, and sales, and data mining techniques
help to carry it out. Examples:
 Data mining concepts are used in sales and marketing to
provide better customer service, improve cross-selling
opportunities, and increase direct mail response rates.
 Customer retention is supported through pattern identification
and prediction of likely defections.
 Risk assessment and fraud detection also use data mining
concepts to identify inappropriate or unusual behavior.
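The frequent-purchase patterns described above are usually quantified with two measures: support (how often items appear together) and confidence (how often one item appears given another). The following is a minimal Python sketch of both, assuming a small set of hypothetical transactions invented for illustration; the function names are not taken from any particular tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration.
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
    {"bread", "butter", "milk"},
]


def pair_support(baskets):
    """Return the fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: count / total for pair, count in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Estimate how often `consequent` is bought when `antecedent` is bought."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    both = sum(1 for b in with_antecedent if consequent in b)
    return both / len(with_antecedent)


support = pair_support(transactions)
print(support[("bread", "butter")])
print(confidence(transactions, "bread", "butter"))
```

With this sample data, bread and butter appear together in 3 of 5 baskets (support 0.6), and butter appears in 3 of the 4 baskets that contain bread (confidence 0.75).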
Application of Data Mining
 Education: For analyzing the education sector, data
mining uses the Educational Data Mining (EDM) method.
 This method generates patterns that can be used both
by learners and educators.
 Using EDM, we can perform educational tasks such as:
 Predicting student admission to higher education
 Student profiling
 Predicting student performance
 Evaluating teachers' teaching performance
 Curriculum development
 Predicting student placement opportunities
Application of Data Mining
 Research: Data mining techniques can perform prediction, classification,
clustering, association, and grouping of data in the
research area.
 The rules generated by data mining are used to obtain results. In most
technical research in data mining, we create a training model and a testing
model.
 The training/testing approach is a strategy to measure the accuracy of the
proposed model. It is called Train/Test because we split the data set into
two sets: a training data set and a testing data set. The training data set is used
to build the model, whereas the testing data set is used to evaluate
it. Examples:
 Classification of uncertain data.
 Information-based clustering.
 Decision support system
 Web Mining
 Domain-driven data mining
 IoT (Internet of Things) and Cybersecurity
 Smart farming IoT (Internet of Things)
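The Train/Test strategy described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the data set, the 70/30 split, and the threshold "model" are all hypothetical and chosen only to keep the example short.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and split it into training and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train_set):
    """'Train' a toy model: pick the cut-off that classifies the training set best."""
    candidates = sorted(x for x, _ in train_set)
    return max(candidates, key=lambda t: sum((x >= t) == y for x, y in train_set))


def accuracy(predict, test_set):
    """Fraction of test records whose predicted label matches the true label."""
    correct = sum(1 for features, label in test_set if predict(features) == label)
    return correct / len(test_set)


# Hypothetical labelled data: (hours studied, passed the exam).
data = [(hours, hours >= 5) for hours in range(10)]

train, test = train_test_split(data)
threshold = fit_threshold(train)
score = accuracy(lambda hours: hours >= threshold, test)
print(len(train), len(test), score)
```

The key design point is that `fit_threshold` only ever sees the training set, so the accuracy measured on the testing set reflects how the model behaves on data it was not built from.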
Application of Data Mining
 Healthcare and Insurance: A pharmaceutical company can
examine its recent sales force activity and its outcomes to
improve the targeting of high-value physicians and figure
out which marketing activities will have the best effect in
the upcoming months. In the insurance
sector, data mining can help to predict which customers
will buy new policies, identify behavior patterns of risky
customers, and identify fraudulent behavior of customers.
 Claims analysis, i.e., which medical procedures are claimed
together.
 Identify successful medical therapies for different illnesses.
 Characterize patient behavior to predict office visits.
Application of Data Mining
 Transportation: A diversified transportation
company with a large direct sales force can apply data
mining to identify the best prospects for its services.
 A large consumer merchandise organization can apply
data mining to improve its sales process to
retailers.
 Determine the distribution schedules among outlets.
 Analyze loading patterns.
Application of Data Mining
 Financial/Banking Sector: A credit card company
can leverage its vast warehouse of customer
transaction data to identify customers most likely to
be interested in a new credit product.
 Credit card fraud detection.
 Identify ‘Loyal’ customers.
 Extraction of information related to customers.
 Determine credit card spending by customer groups.
Data Mining and Recommender
Systems
 Data mining makes use of various methodologies in
statistics and different algorithms, like classification
models, clustering, and regression models, to extract the
insights present in a large set of data.
 It helps us to predict an outcome based on the history of
events that have taken place.
 For example, we can estimate the amount a person spends
per month from his previous transactions, or find items
that customers frequently buy together, like bread,
butter, and jam. Market trends can also be analyzed, like
the demand for umbrellas during the rainy season and the
demand for ice cream during the summer. The main objective
is to analyze the patterns present in the data set and obtain
useful information for the target at hand.
Data Mining and Recommender
Systems
 The recommender system mainly deals with the likes and
dislikes of the users.
 Its major objective is to recommend to a user an item that
the user has a high chance of liking or needing,
based on his previous purchases.
 It is like having a personalized team that understands
our likes and dislikes and helps us make unbiased decisions
about a particular item, making use of the large amount of
data generated day by day in the repositories.
 The aim of recommender systems is to supply easily
accessible, high-quality recommendations for the user
community. In effect, the goal is an affordable and
efficient personal advisor.
Recommender System
 Recommendation engines are information filtering
systems that offer appropriate recommendations
based on customer preferences. Recommendation
systems are being widely adopted across diverse
sectors, from OTT (Over-the-Top) platforms to
e-commerce websites.
Data Mining and Recommender
Systems
 A recommendation engine or recommendation system
generates relevant recommendations through the
following steps:
 Data Mining
 Data Analysis
 Data Modelling
Data Mining and Recommender
Systems
 Data mining is the process of converting raw data
into useful information. Data mining techniques are
used to extract and discover patterns in large
data sets. They have extensive application in
recommendation systems, where the recommendation
engines collect data related to the actions and
attributes of users. A recommendation system uses
various data mining techniques such as:
 Clustering
 Classification Technique
 Association Rules
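To make the idea of recommending from collected user actions concrete, here is a minimal Python sketch. It does not use clustering, classification, or association rules. It swaps in a simpler neighbourhood approach (user-based collaborative filtering with cosine similarity), chosen only because it fits in a few lines. The users, items, and ratings are hypothetical.

```python
from math import sqrt

# Hypothetical user ratings on a 1-5 scale, invented for illustration.
ratings = {
    "asha": {"film_a": 5, "film_b": 4, "film_c": 1},
    "bilal": {"film_a": 4, "film_b": 5, "film_d": 4},
    "chen": {"film_c": 5, "film_d": 2, "film_e": 4},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    shared = set(u) & set(v)
    dot = sum(u[item] * v[item] for item in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(user, all_ratings, top_n=2):
    """Rank items the user has not rated by similarity-weighted ratings of other users."""
    scores = {}
    for other, other_ratings in all_ratings.items():
        if other == user:
            continue
        similarity = cosine(all_ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in all_ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("asha", ratings))
```

In this sample, asha's ratings resemble bilal's far more than chen's, so the item bilal rated highly (film_d) is ranked ahead of film_e.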
Data mining Tools
 Data mining tools have the objective of discovering
patterns/trends/groupings among large sets of data
and transforming data into more refined information.
 A data mining tool is a framework, such as R Studio or
Tableau, that allows you to perform different types of
data mining analysis.
 We can run various algorithms, such as clustering
or classification, on the data set and visualize the
results in the tool itself.
 Data mining tools are frameworks that give us
better insight into our data and the phenomena that the
data represents.
WEKA
 Weka is a comprehensive software package that lets you
preprocess big data, apply different machine
learning algorithms to it, and compare the various
outputs.
 This software makes it easy to work with big data and
train a machine using machine learning algorithms.
 WEKA is open source software that provides tools for
data preprocessing, implementations of several
machine learning algorithms, and visualization tools
so that you can develop machine learning techniques
and apply them to real-world data mining problems.
WEKA

 WEKA supports several clustering algorithms such as EM,
FilteredClusterer, HierarchicalClusterer, SimpleKMeans
and so on. You should understand these algorithms
completely to fully exploit the WEKA capabilities.
 As in the case of classification, WEKA allows you to
visualize the detected clusters graphically.
What can WEKA do?
WEKA
WEKA supports a large number of file formats for the data. Here is the
complete list −
 arff
 arff.gz
 bsi
 csv
 dat
 data
 json
 json.gz
 libsvm
 m
 names
 xrff
 xrff.gz
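Of the formats listed above, ARFF is the one used in the case study that follows. An ARFF file is plain text: a header that declares the relation and its attributes, followed by a data section with one comma-separated instance per line. The Python sketch below parses a small hand-written fragment in the style of the iris data. The `parse_arff` helper is hypothetical and deliberately simplified (it ignores quoting, sparse rows, and other features of the full format).

```python
# A hand-written ARFF fragment in the style of iris.arff (three sample rows only).
ARFF_TEXT = """\
% Lines starting with a percent sign are comments.
@relation iris
@attribute sepallength numeric
@attribute sepalwidth numeric
@attribute petallength numeric
@attribute petalwidth numeric
@attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}
@data
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""


def parse_arff(text):
    """Parse a simple ARFF text into (attributes, rows); simplified sketch."""
    attributes, rows, in_data = [], [], False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lowered = line.lower()
        if in_data:
            rows.append(line.split(","))
        elif lowered.startswith("@attribute"):
            # "@attribute <name> <type>": the type may contain commas and braces.
            _, name, attr_type = line.split(None, 2)
            attributes.append((name, attr_type))
        elif lowered.startswith("@data"):
            in_data = True
    return attributes, rows


attributes, rows = parse_arff(ARFF_TEXT)
for name, attr_type in attributes:
    print(name, "->", attr_type)
print(len(rows), "instances")
```

The nominal `class` attribute lists its allowed values in braces, which is how WEKA knows it has three distinct values.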
WEKA
Case Study – on Clustering using IRIS
dataset
 To demonstrate the clustering, we will use the
provided iris database.
 The data set contains three classes of 50 instances
each.
 Each class refers to a type of iris plant.
IRIS - Loading Data in WEKA
In the WEKA Explorer, select the Preprocess tab.
Click on the Open file ... option and select the iris.arff file in the file selection dialog.
When you load the data, the Explorer shows a summary of the data set.
 You can observe that there are 150 instances and 5
attributes. The names of the attributes are listed
as sepallength, sepalwidth, petallength, petalwidth, and
class.
 The first four attributes are of numeric type while the class
is a nominal type with 3 distinct values. Examine each
attribute to understand the features of the database. We
will not do any preprocessing on this data and will proceed
straight to model building.
IRIS - Clustering in WEKA
 Click on the Cluster tab to apply the clustering algorithms to our loaded data.
 Click on the Choose button.
 Now, select EM as the clustering algorithm.
 In the Cluster mode sub window, select the Classes to clusters evaluation option.
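EM (expectation-maximization) alternates between two steps: estimating how strongly each instance belongs to each cluster, then re-estimating each cluster's parameters from those memberships. The Python sketch below shows the idea on a single numeric attribute with two Gaussian clusters. It is not WEKA's implementation, and the petal-length values are hypothetical numbers chosen to form two obvious groups.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_1d(values, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with EM; returns (means, labels)."""
    # Start the two cluster means at the extremes of the data.
    means = [min(values), max(values)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of each cluster for each value.
        resp = []
        for x in values:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean, and variance of each cluster.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var, 1e-6)  # guard against a zero variance
    # Assign each value to the cluster with the larger responsibility.
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return means, labels


# Hypothetical petal lengths in cm: a short-petal group and a long-petal group.
petal_lengths = [1.3, 1.4, 1.5, 1.4, 1.6, 4.5, 4.7, 5.1, 4.9, 5.0]
means, labels = em_1d(petal_lengths)
print([round(m, 2) for m in means])
print(labels)
```

The class labels play no part in the fitting itself. They are only needed afterwards, when clusters are compared with classes.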
IRIS - Examining Output
 Click on the Start button to process the data. After a while, the results will be presented on the screen.
 From the output screen, you can observe that:
 There are 5 clusters detected in the database.
 Cluster 0 represents setosa, Cluster 1 represents
virginica, Cluster 2 represents versicolor, while the last two
clusters do not have any class associated with them.
 If you scroll up the output window, you will also see some
statistics that give the mean and standard deviation for
each of the attributes in the various detected clusters.
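The idea behind comparing clusters with classes can be sketched as follows: label each cluster with a class, then count the instances whose true class differs from their cluster's label. The Python sketch below uses a simplified majority-vote mapping, which is an assumption and not necessarily how WEKA assigns classes to clusters. The cluster assignments are hypothetical.

```python
from collections import Counter, defaultdict


def classes_to_clusters(cluster_ids, class_labels):
    """Label each cluster with its majority class and return (mapping, error rate)."""
    members = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        members[cluster][label] += 1
    # Simplification: each cluster takes the most common class among its members.
    mapping = {c: counts.most_common(1)[0][0] for c, counts in members.items()}
    errors = sum(
        1 for c, label in zip(cluster_ids, class_labels) if mapping[c] != label
    )
    return mapping, errors / len(class_labels)


# Hypothetical cluster assignments for ten instances and their true classes.
clusters = [0, 0, 0, 1, 1, 1, 2, 2, 1, 2]
classes = ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 4

mapping, error_rate = classes_to_clusters(clusters, classes)
print(mapping)
print(error_rate)
```

Here one virginica instance falls into the cluster labelled versicolor, so 1 of 10 instances is counted as incorrectly clustered.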
IRIS - Visualizing Clusters
 To visualize the clusters, right click on the EM result in the Result list.
 Select Visualize cluster assignments.
 As in the case of classification, you will notice the distinction between
the correctly and incorrectly identified instances. You can play around
by changing the X and Y axes to analyze the results.
RapidMiner – data mining tool
 RapidMiner is a software tool for data and text mining
that originated as a free, open source project.

 It is used for business and commercial applications as well as
research, education, rapid prototyping, training, and application
development. It supports the steps of the machine learning
process, including data preparation, results visualization, model
validation, and optimization.

 RapidMiner provides its server on-site as well as in public or
private cloud infrastructure. It is based on a client/server model.
RapidMiner comes with template-based frameworks
that enable fast delivery.
RapidMiner
 RapidMiner provides data mining and machine
learning procedures including: data loading and
transformation (ETL), data preprocessing and visualization,
predictive analytics and statistical modeling, evaluation, and
deployment.
 RapidMiner can connect to a wide variety of data
sources, such as Oracle, Microsoft SQL Server, and MySQL, and
can read Excel and Access files as well as numerous other data
formats.
 RapidMiner provides a GUI to design an analytical process
(reading data from source, transformations, applying
algorithm). All GUI changes are stored in an XML
(eXtensible Markup Language) file and then this file is read
by RapidMiner to run the analyses.
IBM Watson for data mining
 IBM Watson is a data analytics processor that leverages
natural language processing to help industries such as
healthcare, finance, retail and more make better business
decisions.
 IBM Watson is a data analysis platform
integrated with advanced application programming
interfaces, software-as-a-service applications, and
specialized tooling.
 It leverages these tools for complex data analysis use cases
and can be integrated with different platforms for
optimizing daily tasks and enabling businesses to make the
right decisions.
Key features of IBM Watson
 Cloud environment
IBM Watson’s cloud availability means companies can start
small and pay for what they use. In addition, this means
businesses won’t have to invest in in-house computing
devices or hardware, which can be expensive.

 API integration
IBM Watson is integrated with various APIs, allowing
developers to combine different features of Watson into the
business apps.
Key features of IBM Watson
 Watson Assistant
Assistant builds better virtual agents to quickly get accurate
answers across applications and devices from customer
service to internal IT help desk and human resources
teams. It delivers consistent and intelligent customer care
across all channels and touchpoints with conversational AI.

 Watson Code Assistant
Code Assistant enables developers with various levels of
expertise to write code with AI-generated
recommendations, making it easier for anyone to write
code.
IBM Watson vs. other analytics
applications
IBM Watson stands out from other analytics applications
because it focuses on artificial intelligence and cognitive
computing capabilities. While traditional analytics
applications provide valuable insights based on historical
data and statistical methods, Watson goes beyond that by
incorporating AI, machine learning, natural language
processing and other advanced technologies:

 AI-powered insights: IBM Watson’s AI offers the ability to
analyze unstructured data like text, images and audio to
provide deeper insights from diverse data sources,
unlocking valuable information that would otherwise
remain untapped.
IBM Watson vs. other analytics
applications
 Natural language understanding: NLU capabilities enable users to
interact with systems using natural language queries, making IBM
Watson more user-friendly and accessible to a broader range of users
and allowing nontechnical stakeholders to gain insights and make
data-driven decisions easily.
 Machine learning integration: IBM Watson seamlessly integrates
with ML algorithms, allowing users to build predictive models and
perform advanced analytics tasks as well as streamlining the process of
developing and deploying AI-driven solutions.
 Domain-specific solutions: IBM Watson’s domain-specific
applications come with pretrained models, making it quicker for
businesses to adopt AI in their specific fields.
 Natural language generation: IBM Watson’s NLG capabilities enable
it to generate human-like written responses or summaries, which can
be beneficial for creating reports, communicating insights and
automating content generation.
Use cases of IBM Watson
 Healthcare
Watson’s ability to process and understand large amounts of complex
data makes it extremely valuable in healthcare. Watson can analyze
medical literature, clinical guidelines and patient records to assist
doctors in diagnosing diseases and suggesting treatments.
 Finance
Watson has been used in the financial industry to enhance customer
service, risk management and financial forecasting. In customer
service, Watson’s natural language processing capabilities can be used
to build sophisticated chatbots that handle customer inquiries.
 Retail
Watson’s machine learning and natural language processing abilities
create personalized shopping experiences in retail. For instance,
Watson can analyze a customer’s shopping history and preferences to
recommend products they might be interested in.
