DWDM Unit 4

The document discusses the benefits and applications of data mining, highlighting its role in extracting useful information from large datasets across various sectors such as banking, healthcare, and education. It emphasizes the importance of data mining techniques in decision-making, pattern recognition, and predictive analytics. Additionally, it introduces tools like WEKA, RapidMiner, and IBM Watson that facilitate data mining processes and enhance data analysis capabilities.

Uploaded by raghuvartandon15

BCA VI SEM

UNIT 4
Syllabus
Benefits of Data Mining
 Data is a set of discrete objective facts about an event or a process
that have little use by themselves unless converted into information.
 We have been collecting a wide variety of data, from simple numerical
measurements and text documents to more complex information
such as spatial data, multimedia channels, and hypertext
documents.
 Nowadays, large quantities of data are being accumulated.
 The amount of data collected is said to almost double every year.
 Data mining techniques are used to extract knowledge from this
massive volume of data.
 Data mining is used in almost all places where a large amount of data
is stored and processed.
 For example, banks typically use data mining to find prospective
customers who could be interested in credit cards, personal loans,
or insurance. Since banks have the transaction details and detailed
profiles of their customers, they analyze this data to find patterns
that help them predict which customers could be interested in
personal loans, etc.
Data Mining
 Basically, the motive behind mining data, whether
commercial or scientific, is the same – the need to find
useful information in data to enable better decision-
making or a better understanding of the world around
us.
 “Extraction of interesting information or patterns from
data in large databases is known as data mining.”

Application of Data Mining
 Scientific Analysis: Scientific simulations generate large volumes of
data every day, including data collected from nuclear laboratories,
data about human psychology, etc. Data mining techniques are capable
of analyzing this data. We can now capture and store new data faster
than we can analyze the data already accumulated. Examples of
scientific analysis:
 Sequence analysis in bioinformatics
 Classification of astronomical objects
 Medical decision support
Application of Data Mining
 Intrusion Detection: A network intrusion refers to any
unauthorized activity on a digital network.
 Network intrusions often involve stealing valuable network
resources. Data mining techniques play a vital role in detecting
intrusions, network attacks, and anomalies.
 These techniques help in selecting and refining useful and
relevant information from large data sets.
 Data mining techniques help classify the relevant data for an
Intrusion Detection System (IDS). An IDS raises alarms when network
traffic indicates a foreign intrusion into the system. For example:
 Detect security violations
 Misuse detection
 Anomaly detection
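Anomaly detection, the last item in the list, can be illustrated with a minimal sketch. It assumes a hypothetical list of connection attempts per minute from one host and flags any minute whose count lies more than a chosen number of standard deviations from the mean (its z-score). Real intrusion detection systems use far richer features and models; the function name `flag_anomalies` and the threshold of 2.0 are illustrative assumptions.

```python
from statistics import mean, stdev


def flag_anomalies(counts, threshold=2.0):
    """Return the indices of values whose z-score exceeds `threshold`.

    A z-score measures how many standard deviations a value lies from
    the mean; unusually large scores are treated as anomalies.
    """
    mu = mean(counts)
    sigma = stdev(counts)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]


# Hypothetical connection attempts per minute observed from one host.
traffic = [12, 15, 11, 14, 13, 12, 16, 240, 14, 13]
print(flag_anomalies(traffic))  # [7]  (the burst of 240 attempts)
```

Because a single extreme value also inflates the standard deviation, small samples need a modest threshold; more robust measures, such as the median absolute deviation, are often preferred in practice.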
Application of Data Mining
 Business Transactions: Every business transaction is recorded and
stored permanently.
 Such transactions are usually time-related and can be
inter-business deals or intra-business operations.
 Using this data effectively, within a reasonable time frame, for
competitive decision-making is one of the most important problems
to solve for businesses that struggle to survive in a highly
competitive world.
 Data mining helps analyze these business transactions to identify
marketing approaches and support decision-making.
Examples:
 Direct mail targeting
 Stock trading
 Customer segmentation
 Churn prediction (churn prediction is one of the most
popular Big Data use cases in business)
Application of Data Mining
 Market Basket Analysis: Market Basket Analysis is a technique
for carefully studying the purchases made by customers in a
supermarket. It identifies patterns of items that customers
frequently purchase together. This analysis can help companies
promote deals, offers, and sales, and data mining techniques help
accomplish this analysis task. Examples:
 Sales and marketing use data mining concepts to provide better
customer service, improve cross-selling opportunities, and
increase direct mail response rates.
 Data mining makes customer retention possible through pattern
identification and prediction of likely defections.
 Risk assessment and fraud detection also use data mining to
identify inappropriate or unusual behavior.
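The frequent-purchase patterns behind market basket analysis are usually measured with two quantities: support (how often items appear together) and confidence (how often a basket containing A also contains B). The sketch below is a simplified illustration restricted to item pairs; a full algorithm such as Apriori also grows larger itemsets. The basket data and the name `pair_rules` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    support(A, B)    = fraction of baskets containing both A and B
    confidence(A->B) = baskets with A and B / baskets with A
    """
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n
        if support < min_support:
            continue  # pair is not frequent enough
        rules.append((a, b, support, both / item_counts[a]))
        rules.append((b, a, support, both / item_counts[b]))
    return rules


# Hypothetical supermarket baskets.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
    {"bread", "butter", "milk"},
]
for a, b, s, c in sorted(pair_rules(baskets, min_support=0.4)):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

In this toy data, "jam -> butter" has confidence 1.00 because every basket with jam also contains butter, which is the kind of pattern a store could use for a joint offer.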
Application of Data Mining
 Education: For analyzing the education sector, data
mining uses the Educational Data Mining (EDM) method.
 This method generates patterns that can be used both
by learners and educators.
 Using EDM, we can perform educational tasks such as:
 Predicting student admission to higher education
 Student profiling
 Predicting student performance
 Evaluating teachers' teaching performance
 Curriculum development
 Predicting student placement opportunities
Application of Data Mining
 Research: Data mining techniques can perform prediction,
classification, clustering, association, and grouping of data in
research.
 The rules generated by data mining are useful for finding
results. In most technical research in data mining, we create a
training model and a testing model.
 The train/test model is a strategy to measure the accuracy of the
proposed model. It is called Train/Test because we split the data set
into two sets: a training data set and a testing data set. The training
data set is used to build the model, whereas the testing data set is
used to evaluate it. Examples:
 Classification of uncertain data
 Information-based clustering
 Decision support systems
 Web mining
 Domain-driven data mining
 IoT (Internet of Things) and cybersecurity
 Smart farming with IoT (Internet of Things)
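The train/test strategy described in the Research item can be sketched in a few lines. As assumptions for illustration, the data are hypothetical (hours studied, exam result) pairs, the model is a one-nearest-neighbour classifier, and the function names are hypothetical. The key point is that the model only sees the training rows, while its accuracy is measured on the held-out testing rows.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle the rows and split them into (training set, testing set)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def predict_1nn(train, x):
    """Predict the label of the training row whose feature is closest to x."""
    return min(train, key=lambda row: abs(row[0] - x))[1]


def accuracy(train, test):
    """Fraction of test rows whose label the model predicts correctly."""
    correct = sum(predict_1nn(train, x) == label for x, label in test)
    return correct / len(test)


# Hypothetical data: (hours studied, exam result).
rows = [(h, "pass" if h >= 5 else "fail") for h in range(10)]
train, test = train_test_split(rows)
print(len(train), len(test))  # 7 3
print(accuracy(train, test))
```

Evaluating on rows the model never saw gives a more honest estimate than measuring accuracy on the training data itself.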
Application of Data Mining
 Healthcare and Insurance: The pharmaceutical sector can examine
its recent sales force activity and the outcomes to improve the
targeting of high-value physicians and determine which marketing
activities will have the best effect in the upcoming months. In the
insurance sector, data mining can help predict which customers
will buy new policies, identify behavior patterns of risky
customers, and identify fraudulent behavior of customers.
 Claims analysis, i.e., which medical procedures are claimed
together.
 Identifying successful medical therapies for different illnesses.
 Characterizing patient behavior to predict office visits.
Application of Data Mining
 Transportation: A diversified transportation
company with a large direct sales force can apply data
mining to identify the best prospects for its services.
 A large consumer merchandise organization can apply
data mining to improve its business cycle with
retailers.
 Determine the distribution schedules among outlets.
 Analyze loading patterns.
Application of Data Mining
 Financial/Banking Sector: A credit card company
can leverage its vast warehouse of customer
transaction data to identify customers most likely to
be interested in a new credit product.
 Credit card fraud detection.
 Identify ‘Loyal’ customers.
 Extraction of information related to customers.
 Determine credit card spending by customer groups.
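At its simplest, determining credit card spending by customer groups is an aggregation: total and average spend per group. The sketch below assumes hypothetical transactions that are already labelled with a customer group; in practice the groups themselves might come from a clustering step. The function name `spending_by_group` is hypothetical.

```python
from collections import defaultdict


def spending_by_group(transactions):
    """Return {group: (total spend, average spend per transaction)}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, amount in transactions:
        totals[group] += amount
        counts[group] += 1
    return {g: (totals[g], totals[g] / counts[g]) for g in totals}


# Hypothetical card transactions labelled with a customer group.
transactions = [
    ("students", 120.0),
    ("professionals", 900.0),
    ("students", 80.0),
    ("retirees", 300.0),
    ("professionals", 700.0),
]
for group, (total, avg) in spending_by_group(transactions).items():
    print(f"{group}: total={total:.2f}, average={avg:.2f}")
```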
Data Mining and Recommender Systems
 Data mining makes use of various methodologies in
statistics and different algorithms, like classification
models, clustering, and regression models, to extract the
insights present in large data sets.
 It helps us predict outcomes based on the history of
events that have taken place.
 For example, we can predict the amount a person spends each
month based on their previous transactions, or find the items
that customers frequently buy together, like bread, butter, and
jam. Market trends can also be analyzed, like the demand for
umbrellas during the rainy season and the demand for ice
cream during the summer. The main objective here is to
analyze the patterns present in the data set and obtain
useful information for the required target.
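Predicting how much a person will spend next month from their previous transactions can start from a very simple baseline: the average of the most recent months. This is a minimal sketch with hypothetical figures and a hypothetical function name; a real system would typically use regression or time-series models that also capture seasonal effects such as the umbrella and ice cream examples.

```python
def forecast_next_month(monthly_spend, window=3):
    """Forecast next month's spend as the mean of the last `window` months."""
    if len(monthly_spend) < window:
        raise ValueError("need at least `window` months of history")
    recent = monthly_spend[-window:]
    return sum(recent) / window


# Hypothetical monthly spend of one customer, oldest month first.
history = [420.0, 380.0, 410.0, 450.0, 430.0]
print(forecast_next_month(history))  # 430.0
```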
Data Mining and Recommender Systems
 The recommender system mainly deals with the likes and
dislikes of the users.
 Its major objective is to recommend items that a user is
likely to like or need, based on the user's previous
purchases.
 It is like having a personal team that understands our
likes and dislikes and helps us make decisions about a
particular item, drawing on the large amount of data
generated in repositories every day.
 The aim of recommender systems is to supply easily
accessible, high-quality recommendations to the user
community, efficiently and in a personalized way.
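One common way to turn users' likes and dislikes into recommendations is user-based collaborative filtering: find users whose past ratings resemble the target user's, then predict ratings for items the target user has not seen yet. The sketch below uses cosine similarity on hypothetical 1-5 ratings. It is one illustrative technique among several (content-based filtering and matrix factorization are others), not the method of any particular product, and the function names are hypothetical.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity between two {item: rating} dicts (missing = 0)."""
    dot = sum(u[item] * v[item] for item in set(u) & set(v))
    if dot == 0:
        return 0.0
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def recommend(ratings, user, top_n=2):
    """Predict ratings for items `user` has not rated, weighted by similar users."""
    weighted_sum, sim_total = {}, {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, rating in their_ratings.items():
            if item in ratings[user]:
                continue  # only recommend unseen items
            weighted_sum[item] = weighted_sum.get(item, 0.0) + sim * rating
            sim_total[item] = sim_total.get(item, 0.0) + sim
    predicted = {item: weighted_sum[item] / sim_total[item] for item in weighted_sum}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Hypothetical 1-5 ratings from four users.
ratings = {
    "asha": {"umbrella": 5, "raincoat": 4},
    "ben": {"umbrella": 4, "raincoat": 5, "boots": 4},
    "chen": {"ice cream": 5, "sunglasses": 4},
    "dev": {"umbrella": 5, "boots": 5, "sunglasses": 1},
}
print(recommend(ratings, "asha"))
```

Here "boots" ranks first for asha because the two users most similar to her both rated boots highly, while chen, who shares no rated items with her, has no influence.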
Recommender System
 Recommendation engines are information filtering
systems that offer appropriate recommendations
based on customer preferences. Recommendation
systems are being widely adopted across diverse
sectors, from OTT (Over-the-Top) platforms to
e-commerce websites.
Data Mining and Recommender Systems
 A recommendation engine or recommendation system
generates relevant recommendations through the
following steps:
 Data Mining
 Data Analysis
 Data Modelling
Data Mining and Recommender Systems
 Data mining is the process of converting raw data
into useful information. The data mining technique is
used in extracting and discovering patterns in large
data sets. It has extensive application in
recommendation systems, where the recommendation
engines collect data related to the actions and
attributes of users. A recommendation system uses
various data mining techniques, such as:
 Clustering
 Classification
 Association rules
Data mining Tools
 Data mining tools have the objective of discovering
patterns/trends/groupings among large sets of data
and transforming data into more refined information.
 A data mining tool is a framework, such as RStudio or Tableau,
that allows you to perform different types of data mining
analysis.
 We can run various algorithms, such as clustering
or classification, on the data set and visualize the
results within the tool itself.
 Data mining tools provide better insights into our data
and the phenomena that the data represent.
WEKA
 WEKA is comprehensive software that lets you
preprocess big data, apply different machine
learning algorithms to it, and compare the
outputs.
 This software makes it easy to work with big data and
train models using machine learning algorithms.
 WEKA is open-source software that provides tools for
data preprocessing, implementations of several
machine learning algorithms, and visualization tools,
so that you can develop machine learning techniques
and apply them to real-world data mining problems.
WEKA

 WEKA supports several clustering algorithms, such as EM,
FilteredClusterer, HierarchicalClusterer, SimpleKMeans,
and so on. You should understand these algorithms
completely to fully exploit WEKA's capabilities.
 As in the case of classification, WEKA allows you to
visualize the detected clusters graphically.
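SimpleKMeans, one of the algorithms listed, implements the k-means idea: repeatedly assign each point to its nearest centroid, then move each centroid to the mean of its assigned points, until no point changes cluster. The following is a minimal pure-Python sketch of that idea, not WEKA's code. For reproducibility it simply uses the first k points as starting centroids (WEKA's own initialization differs), and the six two-dimensional points are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, max_iter=100):
    """Cluster `points` into k groups; return (assignment, centroids)."""
    centroids = [tuple(p) for p in points[:k]]  # naive initialization
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment, centroids


# Hypothetical (petal length, petal width) measurements.
points = [(1.4, 0.2), (4.7, 1.4), (1.3, 0.2), (4.5, 1.5), (1.5, 0.3), (4.9, 1.5)]
labels, centers = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Because k-means depends on the starting centroids, tools usually let you set a random seed or restart several times and keep the best result.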
What can WEKA do?
WEKA
WEKA supports a large number of file formats for data. Here is the
complete list:
 arff
 arff.gz
 bsi
 csv
 dat
 data
 json
 json.gz
 libsvm
 m
 names
 xrff
 xrff.gz
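ARFF (Attribute-Relation File Format), the first format in the list, is WEKA's native format: a header declares the relation name and the type of each attribute, and an @DATA section holds comma-separated rows. The sketch below builds such a file from Python data using the iris attribute names. The function `to_arff` is hypothetical, only two sample rows are shown, and it does not quote values containing spaces or commas, which a real ARFF writer would need to handle.

```python
def to_arff(relation, attributes, rows):
    """Build ARFF text.

    attributes: list of (name, type) pairs, where type is "NUMERIC"
    or a list of allowed nominal values.
    """
    lines = [f"@RELATION {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            kind = "{" + ",".join(kind) + "}"  # nominal attribute
        lines.append(f"@ATTRIBUTE {name} {kind}")
    lines += ["", "@DATA"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


classes = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
attributes = [
    ("sepallength", "NUMERIC"),
    ("sepalwidth", "NUMERIC"),
    ("petallength", "NUMERIC"),
    ("petalwidth", "NUMERIC"),
    ("class", classes),
]
rows = [(5.1, 3.5, 1.4, 0.2, "Iris-setosa"), (7.0, 3.2, 4.7, 1.4, "Iris-versicolor")]
print(to_arff("iris", attributes, rows))
```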
WEKA
Case Study: Clustering Using the IRIS Dataset
 To demonstrate clustering, we will use the iris data set
provided with WEKA.
 The data set contains three classes of 50 instances
each.
 Each class refers to a type of iris plant.
IRIS - Loading Data in WEKA
In the WEKA Explorer, select the Preprocess tab.
Click the Open file... option and select the iris.arff file
in the file selection dialog.
When you load the data, WEKA displays a summary of it in the
Preprocess tab.
 You can observe that there are 150 instances and 5
attributes. The names of the attributes are listed
as sepallength, sepalwidth, petallength, petalwidth, and
class.
 The first four attributes are of numeric type, while class
is a nominal type with 3 distinct values. Examine each
attribute to understand the features of the database. We
will not do any preprocessing on this data and will proceed
straightaway to model building.
IRIS - Clustering in WEKA
 Click the Cluster tab to apply the clustering algorithms
to the loaded data.
 Click the Choose button.
 Select EM as the clustering algorithm.
 In the Cluster mode sub-window, select the Classes to
clusters evaluation option.
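EM (expectation maximization) treats the data as a mixture of probability distributions. It alternates an E-step, which computes how strongly each cluster is responsible for each point, with an M-step, which re-estimates each cluster's weight, mean, and variance from those responsibilities. The sketch below is a simplified version with a single attribute, Gaussian clusters, and a fixed number of clusters; WEKA's EM works on all attributes and can also choose the number of clusters itself by cross-validation. The petal-length values, starting means, and function names are hypothetical.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return exp(-((x - mu) ** 2) / (2 * var)) / sqrt(2 * pi * var)


def em_1d(xs, initial_means, iterations=50):
    """Fit a 1-D Gaussian mixture; return (means, hard cluster labels)."""
    k, n = len(initial_means), len(xs)
    means = list(initial_means)
    overall = sum(xs) / n
    variances = [sum((x - overall) ** 2 for x in xs) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each cluster for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each cluster.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # guard against collapsing to zero
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, labels


# Hypothetical petal lengths from two kinds of flower.
petal_lengths = [1.3, 1.4, 1.5, 1.4, 1.6, 4.5, 4.7, 4.4, 4.9, 4.6]
means, labels = em_1d(petal_lengths, initial_means=[1.0, 5.0])
print([round(m, 2) for m in means])  # [1.44, 4.62]
print(labels)
```

Unlike k-means, EM assigns soft memberships during fitting; the hard labels above simply pick the most responsible cluster for each point.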
IRIS - Examining Output
 Click the Start button to process the data. After
a while, the results will be presented on the screen.
 The output of the data processing is displayed in the
output window.
 From the output screen, you can observe that:
 There are 5 clusters detected in the database.
 Cluster 0 represents setosa, Cluster 1 represents
virginica, and Cluster 2 represents versicolor, while the last two
clusters do not have any class associated with them.
 If you scroll up the output window, you will also see some
statistics that give the mean and standard deviation for
each of the attributes in the various detected clusters.
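Classes to clusters evaluation can be understood as matching each cluster to a class so that the number of correctly clustered instances is as large as possible, with each class used at most once. This one-to-one matching is consistent with the output described above, where two surplus clusters have no class. The sketch below illustrates the idea with a brute-force search over all one-to-one assignments; it is an assumption about how such a mapping can be computed, not WEKA's source code, and the small cluster and class lists are hypothetical.

```python
from collections import Counter
from itertools import permutations


def classes_to_clusters(clusters, classes):
    """Match clusters to classes one-to-one, maximizing correct instances.

    Returns (mapping, incorrectly_clustered); a cluster mapped to None
    has no class. Brute force, so only suitable for a handful of clusters.
    """
    cluster_ids = sorted(set(clusters))
    class_ids = sorted(set(classes))
    pair_counts = Counter(zip(clusters, classes))  # (cluster, class) -> count
    # Pad with None so that surplus clusters can remain unlabelled.
    options = class_ids + [None] * max(0, len(cluster_ids) - len(class_ids))
    best_mapping, best_correct = None, -1
    for choice in permutations(options, len(cluster_ids)):
        correct = sum(
            pair_counts[(c, cls)] for c, cls in zip(cluster_ids, choice) if cls is not None
        )
        if correct > best_correct:
            best_mapping, best_correct = dict(zip(cluster_ids, choice)), correct
    return best_mapping, len(clusters) - best_correct


# Hypothetical result: 5 clusters found for 11 labelled flowers.
clusters = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 4]
classes = ["setosa"] * 3 + ["virginica"] * 3 + ["versicolor"] * 3 + ["virginica", "versicolor"]
mapping, wrong = classes_to_clusters(clusters, classes)
print(mapping)  # {0: 'setosa', 1: 'virginica', 2: 'versicolor', 3: None, 4: None}
print(wrong)    # 2
```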
IRIS - Visualizing Clusters
 To visualize the clusters, right-click the EM result
in the Result list.
 Select Visualize cluster assignments.
 As in the case of classification, you will notice the distinction between
the correctly and incorrectly identified instances. You can play around
by changing the X and Y axes to analyze the results.
RapidMiner – data mining tool
 RapidMiner is a free, open-source software tool for
data and text mining.
 It is used for business and commercial applications as well as
for research, education, rapid prototyping, training, and application
development. It also supports the machine learning process,
including data preparation, results visualization, model
validation, and optimization.
 RapidMiner Server can run on-site as well as in public or
private cloud infrastructure. It is based on a client/server model.
RapidMiner comes with template-based frameworks
that enable fast delivery.
RapidMiner
 RapidMiner provides data mining and machine
learning procedures including data loading and
transformation (ETL), data preprocessing and visualization,
predictive analytics and statistical modeling, evaluation, and
deployment.
 RapidMiner can connect to a wide variety of data
sources, such as Oracle, Microsoft SQL Server, and MySQL, and can
read Excel and Access files as well as numerous other data
formats.
 RapidMiner provides a GUI to design an analytical process
(reading data from a source, applying transformations, applying
algorithms). All changes made in the GUI are stored in an XML
(eXtensible Markup Language) file, which RapidMiner then reads
to run the analyses.
IBM Watson for data mining
 IBM Watson is a data analytics processor that leverages
natural language processing to help industries such as
healthcare, finance, retail and more make better business
decisions.
 IBM Watson is a data analysis platform that integrates
advanced application programming interfaces (APIs),
software-as-a-service applications, and specialized
tooling.
 It leverages these tools for complex data analysis use cases
and can be integrated with different platforms for
optimizing daily tasks and enabling businesses to make the
right decisions.
Key features of IBM Watson
 Cloud environment
IBM Watson’s cloud availability means companies can start
small and pay for what they use. In addition, this means
businesses won’t have to invest in in-house computing
devices or hardware, which can be expensive.

 API integration
IBM Watson is integrated with various APIs, allowing
developers to combine different features of Watson into their
business apps.
Key features of IBM Watson
 Watson Assistant
Assistant builds better virtual agents to quickly get accurate
answers across applications and devices from customer
service to internal IT help desk and human resources
teams. It delivers consistent and intelligent customer care
across all channels and touchpoints with conversational AI.

 Watson Code Assistant
Code Assistant enables developers with various levels of
expertise to write code with AI-generated
recommendations, making it easier for anyone to write
code.
IBM Watson vs. other analytics applications
IBM Watson stands out from other analytics applications
because it focuses on artificial intelligence and cognitive
computing capabilities. While traditional analytics
applications provide valuable insights based on historical
data and statistical methods, Watson goes beyond that by
incorporating AI, machine learning, natural language
processing and other advanced technologies:

 AI-powered insights: IBM Watson’s AI offers the ability to
analyze unstructured data like text, images and audio to
provide deeper insights from diverse data sources,
unlocking valuable information that would otherwise
remain untapped.
IBM Watson vs. other analytics applications
 Natural language understanding: NLU capabilities enable users to
interact with systems using natural language queries, making IBM
Watson more user-friendly and accessible to a broader range of users
and allowing nontechnical stakeholders to gain insights and make
data-driven decisions easily.
 Machine learning integration: IBM Watson seamlessly integrates
with ML algorithms, allowing users to build predictive models and
perform advanced analytics tasks as well as streamlining the process of
developing and deploying AI-driven solutions.
 Domain-specific solutions: IBM Watson’s domain-specific
applications come with pretrained models, making it quicker for
businesses to adopt AI in their specific fields.
 Natural language generation: IBM Watson’s NLG capabilities enable
it to generate human-like written responses or summaries, which can
be beneficial for creating reports, communicating insights and
automating content generation.
Use cases of IBM Watson
 Healthcare
Watson’s ability to process and understand large amounts of complex
data makes it extremely valuable in healthcare. Watson can analyze
medical literature, clinical guidelines and patient records to assist
doctors in diagnosing diseases and suggesting treatments.
 Finance
Watson has been used in the financial industry to enhance customer
service, risk management and financial forecasting. Watson’s natural
language processing capabilities in customer service can build
sophisticated chatbots that handle customer inquiries.
 Retail
Watson’s machine learning and natural language processing abilities
create personalized shopping experiences in retail. For instance,
Watson can analyze a customer’s shopping history and preferences to
recommend products they might be interested in.
