Data Mining Complete Lab Manual - DRSNR

The document outlines a data mining lab course focused on using the WEKA toolkit for various data mining tasks, including data pre-processing, classification, clustering, and association rule mining. It details procedures for installing WEKA, understanding the ARFF file format, and performing specific algorithms such as Apriori and J48. The lab includes hands-on activities with datasets to explore data attributes, handle missing values, and visualize results.


Subject: DATA MINING LAB

Class : VI CSE-C

Faculty : Dr S. NageswaraRao
INDEX

1 Explore WEKA Data Mining/Machine Learning Toolkit

2 i) Study the .arff file format


ii) Explore the available data sets in WEKA. Load and observe each
dataset.
3 Perform data pre-processing tasks
i) Handling missing values
ii) Applying normalization on numeric attributes
4 Load each dataset into Weka and run Apriori algorithm
5 Demonstrate performing classification on data sets

6 Extract if-then rules from the decision tree generated by the classifier

7 Load labelled dataset into Weka and perform Naive-bayes classification and k-
Nearest Neighbour classification. Interpret the results obtained.

8 i) Cluster the given data set using k-means clustering algorithm


ii) Explore visualization features of Weka to visualize the clusters

9 i) Apply the hierarchical and grid-based clustering techniques on the given data
set
ii) Explore visualization features of Weka to visualize the clusters
10 Apply PCA (Principal Component Analysis) dimensionality reduction technique
on the given data set.
11 Demonstrate performing Regression on data sets

12 Credit Risk Assessment – The German Credit Data



1) Explore WEKA Data Mining/Machine Learning Toolkit

(i). Downloading and/or installation of WEKA data mining toolkit

Procedure:
1. Go to the Weka website, http://www.cs.waikato.ac.nz/ml/weka/, and download the software.
On the left-hand side, click on the link that says download.
2. Select the appropriate link corresponding to the version of the software based on your
operating system and whether or not you already have a Java VM running on your machine (if you
don't know what a Java VM is, then you probably don't).
3. The link will forward you to a site where you can download the software from a mirror site.
Save the self-extracting executable to disk and then double click on it to install Weka. Answer
yes or next to the questions during the installation.
4. Click yes to accept the Java agreement if necessary. After you install the program Weka
should appear on your start menu under Programs (if you are using Windows).
5. To run Weka from the Start menu, select Programs, then Weka. You will see the Weka GUI
Chooser. Select Explorer. The Weka Explorer will then launch.

Data Mining Lab Dept of CSE KSRMC

(ii). Understand the features of WEKA toolkit such as Explorer, Knowledge


Flow interface, Experimenter, command-line interface.

The Weka GUI Chooser (class weka.gui.GUIChooser) provides a starting point for launching
Weka's main GUI applications and supporting tools. If one prefers an MDI ("multiple document
interface") appearance, then this is provided by an alternative launcher called "Main"
(class weka.gui.Main).
The GUI Chooser consists of four buttons, one for each of the four major Weka applications,
and four menus.

The buttons can be used to start the following applications:


Explorer - An environment for exploring data with WEKA
a) Click on the "Explorer" button to bring up the Explorer window.
b) Make sure the "Preprocess" tab is highlighted.
c) Open a new file by clicking on "Open file..." and choosing a file with the ".arff"
extension from the "data" directory.
d) Attributes appear in the window below.
e) Click on the attributes to see the visualization on the right.
f) Click "Visualize All" to see them all.

Experimenter - An environment for performing experiments and conducting statistical tests
between learning schemes.
a) Experimenter is for comparing results.
b) Under the "Setup" tab click "New".
c) Click on "Add new..." under the "Datasets" frame. Choose a couple of .arff files from
the "data" directory, one at a time.
d) Click on "Add new..." under the "Algorithms" frame. Choose several algorithms, one at a time,
by clicking "OK" in the window and then "Add new..." again.
e) Under the "Run" tab click "Start".


f) Wait for WEKA to finish.

g) Under the "Analyse" tab click on "Experiment" to see the results.

Knowledge Flow - This environment supports essentially the same functions as the Explorer but
with a drag-and-drop interface. One advantage is that it supports incremental learning.
SimpleCLI - Provides a simple command-line interface that allows direct execution of WEKA
commands for operating systems that do not provide their own command line interface.
(iii).Navigate the options available in the WEKA (ex. Select attributes panel, Preprocess
panel, classify panel, Cluster panel, Associate panel and Visualize panel)

When the Explorer is first started only the first tab is active; the others are greyed out. This is
because it is necessary to open (and potentially pre-process) a data set before starting to explore
the data.
The tabs are as follows:
1. Preprocess. Choose and modify the data being acted on.
2. Classify. Train and test learning schemes that classify or perform regression.
3. Cluster. Learn clusters for the data.
4. Associate. Learn association rules for the data.
5. Select attributes. Select the most relevant attributes in the data.
6. Visualize. View an interactive 2D plot of the data.
Once the tabs are active, clicking on them flicks between different screens, on which the
respective actions can be performed. The bottom area of the window (including the status box,
the log button, and the Weka bird) stays visible regardless of which section you are in.

1. Preprocessing

Loading Data:

The first four buttons at the top of the preprocess section enable you to load data into WEKA:


1. Open file.... Brings up a dialog box allowing you to browse for the data file on the local file
system.
2. Open URL.... Asks for a Uniform Resource Locator address for where the data is stored.
3. Open DB.... Reads data from a database. (Note that to make this work you might have to edit
the file in weka/experiment/DatabaseUtils.props.)
4. Generate.... Enables you to generate artificial data from a variety of DataGenerators.
Using the Open file... button you can read files in a variety of formats:
WEKA's ARFF format, CSV format, C4.5 format, or serialized Instances format. ARFF files
typically have a .arff extension, CSV files a .csv extension, C4.5 files a .data and .names
extension, and serialized Instances objects a .bsi extension.
2. Classification:

Selecting a Classifier

At the top of the classify section is the Classifier box. This box has a text field that gives the
name of the currently selected classifier and its options. Clicking on the text box with the left
mouse button brings up a GenericObjectEditor dialog box, just the same as for filters, that you
can use to configure the options of the current classifier. With a right click (or Alt+Shift+left
click) you can once again copy the setup string to the clipboard or display the properties in a
GenericObjectEditor dialog box. The Choose button allows you to choose one of the classifiers
that are available in WEKA.
Test Options


The result of applying the chosen classifier will be tested according to the options that are set by
clicking in the Test options box. There are four test modes:
1. Use training set: The classifier is evaluated on how well it predicts the class of the instances
it was trained on.
2. Supplied test set: The classifier is evaluated on how well it predicts the class of a set of
instances loaded from a file. Clicking the Set... button brings up a dialog allowing you to
choose the file to test on.
3. Cross-validation: The classifier is evaluated by cross-validation, using the number of folds
that are entered in the Folds text field.
4. Percentage split: The classifier is evaluated on how well it predicts a certain percentage of the
data which is held out for testing. The amount of data held out depends on the value entered in
the % field.
3. Clustering:

Cluster Modes:

The Cluster mode box is used to choose what to cluster and how to evaluate the results. The first
three options are the same as for classification: Use training set, Supplied test set and
Percentage split.
4. Associating:


Setting Up
This panel contains schemes for learning association rules, and the learners are chosen and
configured in the same way as the clusterers, filters, and classifiers in the other panels.

5. Selecting Attributes:

Searching and Evaluating


Attribute selection involves searching through all possible combinations of attributes in the data
to find which subset of attributes works best for prediction. To do this, two objects must be set
up: an attribute evaluator and a search method. The evaluator determines what method is used to
assign a worth to each subset of attributes. The search method determines what style of search is
performed.
6. Visualizing:

WEKA's visualization section allows you to visualize 2D plots of the current relation.

2.i) Study the .arff file format

An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list
of instances sharing a set of attributes. ARFF files were developed by the Machine Learning
Project at the Department of Computer Science of The University of Waikato for use with
the Weka machine learning software.

Overview

ARFF files have two distinct sections. The first section is the Header information, which is
followed by the Data information.

The Header of the ARFF file contains the name of the relation, a list of the attributes (the
columns in the data), and their types. An example header on the standard IRIS dataset looks like
this:

% 1. Title: Iris Plants Database


%
% 2. Sources:
% (a) Creator: R.A. Fisher
% (b) Donor: Michael Marshall (MARSHALL%[email protected])
% (c) Date: July, 1988

%
@RELATION iris

@ATTRIBUTE sepallength NUMERIC


@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

The Data of the ARFF file looks like the following:

@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
Lines that begin with a % are comments.
The @RELATION, @ATTRIBUTE and @DATA declarations are case insensitive.
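To make the format concrete, a header and data section like the one above can be read with a few lines of plain Python. This is an illustrative stdlib-only sketch that handles only the subset of ARFF shown above (comments, @RELATION, @ATTRIBUTE, @DATA, comma-separated rows), not Weka's full parser:

```python
def parse_arff(text):
    """Parse a minimal ARFF string into (relation, attributes, data rows)."""
    relation, attributes, data = None, [], []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('%'):
            continue  # skip blank lines and comments
        lower = line.lower()
        if lower.startswith('@relation'):
            relation = line.split(None, 1)[1]
        elif lower.startswith('@attribute'):
            name, atype = line.split(None, 2)[1:3]
            attributes.append((name, atype))
        elif lower.startswith('@data'):
            in_data = True
        elif in_data:
            data.append(line.split(','))
    return relation, attributes, data

sample = """
@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,Iris-setosa
4.9,Iris-setosa
"""
rel, attrs, rows = parse_arff(sample)
print(rel, len(attrs), len(rows))  # iris 2 2
```

Note the declarations are matched case-insensitively here, mirroring the rule stated above.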

2.ii) Explore the available data sets in WEKA

There are 23 different datasets available in Weka (under C:\Program Files\Weka-3-6\data) by
default for testing purposes. All the datasets are in .arff format.


Load a data set (ex. Weather dataset, Iris dataset, etc.)

Procedure:
1. Open the weka tool and select the explorer option.
2. New window will be opened which consists of different options (Preprocess, Association etc.)
3. In the Preprocess tab, click the "Open file" option.
4. Go to C:\Program Files\Weka-3-6\data to find the different existing .arff datasets.
5. Click on any dataset to load the data; it will then be displayed as shown below.

Load each dataset and observe the following:


Here we have taken IRIS.arff dataset as sample for observing all the below things.

List the attribute names and their types

There are 5 attributes and their datatypes present in the above loaded dataset (IRIS.arff):
sepallength – Numeric
sepalwidth – Numeric
petallength – Numeric
petalwidth – Numeric
class – Nominal
i. Number of records in each dataset

There are total 150 records (Instances) in dataset (IRIS.arff).

ii. Identify the class attribute (if any)

There is one class attribute which consists of 3 labels. They are:


1. Iris-setosa
2. Iris-versicolor
3. Iris-virginica

iii. Plot Histogram


iv. Determine the number of records for each class.

There is one class attribute (150 records) which consists of 3 labels. They are shown below
1. Iris-setosa - 50 records
2. Iris-versicolor – 50 records
3. Iris-virginica – 50 records

v. Visualize the data in various dimensions


3) Perform data pre-processing tasks


o Handling missing values
o Applying normalization on numeric attributes

Handling missing values


Description: Missing values of numeric attributes are replaced with the mean value; missing
values of nominal attributes are replaced with the most frequently occurring value (the mode).

In Weka this is implemented with the following steps:


1) Select the data set called "cpu.with.vendor"
2) Select choose—unsupervised—attribute—ReplaceWithMissingValues; it introduces
some missing values in each attribute.
3) The missing values can then be handled using either of the options
 choose—unsupervised—attribute—ReplaceMissingValues
 choose—unsupervised—attribute—ReplaceMissingWithUserConstant
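The replacement rule described above (mean for numeric, most frequent value for nominal) can be sketched in plain Python. This is an illustrative stdlib-only sketch, not Weka's implementation; the toy table and the use of None for a missing value are assumptions for the example:

```python
from collections import Counter

def replace_missing(rows, numeric_cols):
    """Replace None values: column mean for numeric columns, mode for the rest."""
    cols = list(zip(*rows))
    filled = []
    for i, col in enumerate(cols):
        present = [v for v in col if v is not None]
        if i in numeric_cols:
            fill = sum(present) / len(present)        # mean of observed values
        else:
            fill = Counter(present).most_common(1)[0][0]  # most frequent value
        filled.append([fill if v is None else v for v in col])
    return [list(r) for r in zip(*filled)]

data = [[1.0, 'a'], [None, 'b'], [3.0, None], [4.0, 'a']]
out = replace_missing(data, numeric_cols={0})
print(out)  # numeric gap -> column mean, nominal gap -> most frequent value
```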

Applying normalization on numeric attributes

Description: In normalization, all numeric attribute values are rescaled into the default range of
[0, 1].

In Weka this is implemented with the following steps:

1) Select the data set called "cpu"

2) Select choose—unsupervised—attribute—Normalize.
It normalizes all attribute values into the default range of 0 to 1; this range can be changed by
double-clicking on the Normalize filter text and editing its scale (-S) and translation (-T)
options.
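The rescaling the Normalize filter performs can be sketched as follows (an illustrative stdlib-only sketch; the sample values are made up):

```python
def normalize(values, low=0.0, high=1.0):
    """Min-max normalize a numeric column into [low, high]; Weka's default is [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [low for _ in values]  # constant column maps to the lower bound
    return [low + (v - lo) * (high - low) / span for v in values]

out = normalize([125, 29, 29, 26])  # illustrative values
print(out)  # max -> 1.0, min -> 0.0, everything else in between
```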

4) LOAD EACH DATASET INTO WEKA AND RUN APRIORI ALGORITHM


Step 1: Open the data file in Weka Explorer. It is presumed that the required data fields have been
discretized. In this example it is the age attribute.

Step 2: Clicking on the Associate tab will bring up the interface for the association rule algorithms.

Step 3: We will use the Apriori algorithm. This is the default algorithm.

Step 4: In order to change the parameters for the run (e.g., support, confidence) we click on the
text box immediately to the right of the Choose button.
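The frequent-itemset stage of Apriori can be sketched in plain Python (an illustrative stdlib-only sketch of support counting and candidate generation, not Weka's implementation; the toy transactions are made up):

```python
from itertools import combinations

def apriori_frequent(transactions, min_support):
    """Return frequent itemsets as a dict {frozenset: support}."""
    n = len(transactions)
    tx = [frozenset(t) for t in transactions]
    items = {i for t in tx for i in t}
    current = {frozenset([i]) for i in items}  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # count the support of every candidate
        counts = {c: sum(1 for t in tx if c <= t) for c in current}
        survivors = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(survivors)
        # join surviving k-itemsets into (k+1)-itemset candidates
        keys = list(survivors)
        current = {a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1}
        k += 1
    return frequent

tx = [{'bread', 'milk'}, {'bread', 'butter'}, {'milk', 'butter'},
      {'bread', 'milk', 'butter'}]
freq = apriori_frequent(tx, min_support=0.5)
print(sorted(tuple(sorted(s)) for s in freq))
```

Association rules are then derived from these itemsets by comparing the supports of an itemset and its subsets against the confidence threshold.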


5) Demonstrate performing classification on data sets


Procedure for J48:
1. Load the dataset (contact-lenses.arff) into the Weka tool
2. Go to the Classify tab; in the left-hand navigation bar we can see different classification
algorithms under the trees section.
3. Select the J48 algorithm; in More options select the output entropy evaluation
measures and click on the Start option.
4. Then we will get the classifier output, entropy values and Kappa statistic as represented below.
5. In the below screenshot, we can run classifiers with different test options (Cross-validation, Use
training set, Percentage split, Supplied test set).

A. Use training set: The classifier is evaluated on how well it predicts the class of the instances it
was trained on.
B. Supplied test set: The classifier is evaluated on how well it predicts the class of a set of instances
loaded from a file. Clicking the Set... button brings up a dialog allowing you to choose the file to test
on.


C. Cross-validation: The classifier is evaluated by cross-validation, using the number of folds that
are entered in the Folds text field.
D. Percentage split: The classifier is evaluated on how well it predicts a certain percentage of the
data which is held out for testing. The amount of data held out depends on the value entered in the %
field.
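J48 chooses its splits using entropy-based measures; the underlying computation can be sketched as follows (an illustrative stdlib-only sketch on a made-up weather-style toy set, not Weka's code):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr_index, labels):
    """Information gain from splitting `rows` on the attribute at attr_index."""
    n = len(rows)
    split = {}
    for row, label in zip(rows, labels):
        split.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(sub) / n * entropy(sub) for sub in split.values())
    return entropy(labels) - remainder

# toy data: one nominal attribute [outlook], label = play?
rows = [['sunny'], ['sunny'], ['overcast'], ['rain']]
labels = ['no', 'no', 'yes', 'yes']
print(round(info_gain(rows, 0, labels), 3))  # 1.0: outlook separates the classes perfectly
```

At each node, J48 (C4.5) picks the attribute with the best such score (it actually uses gain ratio, a normalized variant of this gain).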

6) Extract if-then rules from the decision tree generated by the classifier

Procedure:
1. Load the dataset (Iris-2D.arff) into the Weka tool
2. Go to the Classify tab; in the left-hand navigation bar we can see the different classification
algorithms, with DecisionStump under the trees section.
3. Select the DecisionStump algorithm and click on the Start option with the "Use training set"
test option enabled.
4. Then we will get the detailed accuracy by class, consisting of F-measure, TP rate, FP rate,
Precision and Recall values, and the Confusion Matrix as represented below.
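The kind of one-level if-then rules such a classifier yields can be sketched in plain Python. This is a simplified OneR-style illustration (one rule per attribute value; Weka's DecisionStump splits on a single value), with a made-up outlook attribute and labels:

```python
from collections import Counter

def one_level_rules(rows, attr_index, labels):
    """Fit a one-level rule set on a nominal attribute: value -> majority class."""
    branches = {}
    for row, label in zip(rows, labels):
        branches.setdefault(row[attr_index], []).append(label)
    return {value: Counter(lbls).most_common(1)[0][0]
            for value, lbls in branches.items()}

rows = [['sunny'], ['sunny'], ['overcast'], ['rain'], ['rain']]
labels = ['no', 'no', 'yes', 'yes', 'no']
rules = one_level_rules(rows, 0, labels)
for value, cls in sorted(rules.items()):
    print(f"IF outlook = {value} THEN class = {cls}")
```

Reading if-then rules off a full J48 tree works the same way: each root-to-leaf path becomes one rule whose conditions are the tests along the path.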

7) Load labelled dataset into Weka and perform Naive-bayes classification and k-Nearest
Neighbour classification. Interpret the results obtained.

Procedure for Naïve-Bayes:


1. Load the dataset (Iris-2D.arff) into the Weka tool
2. Go to the Classify tab; in the left-hand navigation bar we can see different classification
algorithms under the bayes section.
3. Select the NaiveBayes algorithm and click on the Start option with the "Use training set" test
option enabled.
4. Then we will get the detailed accuracy by class, consisting of F-measure, TP rate, FP rate,
Precision and Recall values, and the Confusion Matrix as represented below.
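The computation behind Naive Bayes (class prior times the product of per-attribute likelihoods, here Gaussian for numeric attributes) can be sketched as follows. This is a stdlib-only illustration on made-up Iris-like values, not Weka's NaiveBayes:

```python
from math import pi, sqrt, exp, prod

def nb_fit(X, y):
    """Estimate prior, per-attribute mean and variance for each class."""
    model = {}
    for cls in set(y):
        rows = [x for x, label in zip(X, y) if label == cls]
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        varis = [max(sum((v - m) ** 2 for v in c) / len(c), 1e-9)  # clamp zero variance
                 for c, m in zip(cols, means)]
        model[cls] = (len(rows) / len(X), means, varis)
    return model

def nb_predict(model, x):
    def pdf(v, m, var):  # Gaussian density
        return exp(-(v - m) ** 2 / (2 * var)) / sqrt(2 * pi * var)
    scores = {cls: prior * prod(pdf(v, m, s) for v, m, s in zip(x, means, varis))
              for cls, (prior, means, varis) in model.items()}
    return max(scores, key=scores.get)

# made-up [petallength, petalwidth] values
X = [[1.4, 0.2], [1.3, 0.3], [4.7, 1.4], [4.5, 1.5]]
y = ['Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor']
model = nb_fit(X, y)
print(nb_predict(model, [1.5, 0.25]))  # Iris-setosa
```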


Procedure for K-Nearest Neighbour (IBK):


1. Load the dataset (Iris-2D.arff) into the Weka tool


2. Go to the Classify tab; in the left-hand navigation bar we can see different classification
algorithms under the lazy section.
3. Select the K-Nearest Neighbour (IBk) algorithm and click on the Start option with the "Use
training set" test option enabled.
4. Then we will get the detailed accuracy by class, consisting of F-measure, TP rate, FP rate,
Precision and Recall values, and the Confusion Matrix as represented below.
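The idea behind IBk, a majority vote among the k nearest training instances, can be sketched in a few lines (an illustrative stdlib-only sketch with made-up Iris-like points, not Weka's IBk):

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

X = [[1.4, 0.2], [1.3, 0.3], [1.5, 0.2], [4.7, 1.4], [4.5, 1.5], [4.9, 1.5]]
y = ['Iris-setosa'] * 3 + ['Iris-versicolor'] * 3
print(knn_predict(X, y, [1.45, 0.25], k=3))  # Iris-setosa
```

Because there is no training phase beyond storing the data, Weka files this under the "lazy" section.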

8) i) Cluster the given data set using k-means clustering algorithm


ii) Explore visualization features of Weka to visualize the clusters

k-means clustering

1. Load the dataset (Iris.arff) into the Weka tool

2. Go to the Cluster tab, where we can see the different clustering algorithms.
3. Select the SimpleKMeans algorithm and click on the Start option with the "Use training set" test
option enabled.
4. Then we will get the sum of squared errors, centroids, number of iterations and clustered instances
as represented below.
5. If we right-click on SimpleKMeans in the result list, we will get more options, in which "Visualize
cluster assignments" should be selected for getting the cluster visualization as shown below.
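The iteration SimpleKMeans runs (Lloyd's algorithm: assign points to the nearest centroid, then recompute centroids) can be sketched in plain Python. This is an illustrative stdlib-only sketch with made-up 2D points; Weka seeds the centroids randomly, while here the first k points are used:

```python
from math import dist
from statistics import mean

def kmeans(points, k, iters=20):
    """Lloyd's algorithm: returns (centroids, assignments, sum of squared errors)."""
    centroids = points[:k]  # naive seeding for the sketch
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist(p, centroids[i]))].append(p)
        centroids = [tuple(map(mean, zip(*c))) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    assign = [min(range(k), key=lambda i: dist(p, centroids[i])) for p in points]
    sse = sum(dist(p, centroids[a]) ** 2 for p, a in zip(points, assign))
    return centroids, assign, sse

pts = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, assign, sse = kmeans(pts, k=2)
print(assign, round(sse, 3))
```

The "sum of squared errors" Weka reports is the same SSE quantity computed here: the total squared distance of every instance to its cluster centroid.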


Explore visualization features of Weka to visualize the clusters

 If we right-click on SimpleKMeans in the result list, we will get more options, in which
"Visualize cluster assignments" should be selected for getting the cluster visualization as shown
below.


 In that cluster visualization we have different features to explore: changing the X-axis,
Y-axis, Colour, Jitter and Select Instance (Rectangle, Polygon and Polyline) options gives
different views of the clusters.

 As shown in the above screenshot, all the dataset (Iris.arff) tuples are represented on the X-axis,
and similarly for the Y-axis. Each cluster gets a different colour; in the above figure there are
two clusters, represented in blue and red. In Select Instance we can choose different shapes for
selecting a clustered area; in the below screenshot the rectangle shape is selected.


 By this visualization feature we can observe different clustering outputs for a dataset by
changing the X-axis, Y-axis, Colour and Jitter options.


9) Apply the hierarchical and grid-based clustering techniques on the given data set

1. Load the dataset (Iris.arff) into the Weka tool

2. Go to the Cluster tab, where we can see the different clustering algorithms.
3. Select the HierarchicalClusterer algorithm and click on the Start option with the "Use training
set" test option enabled.
4. Then we will get the clustered instances and the cluster hierarchy output as represented below.
5. If we right-click on HierarchicalClusterer in the result list, we will get more options, in which
"Visualize cluster assignments" should be selected for getting the cluster visualization as shown
below.
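The agglomerative process HierarchicalClusterer performs can be sketched with single linkage: start with one cluster per point and repeatedly merge the two closest clusters. This is an illustrative stdlib-only sketch on made-up 2D points; Weka supports several link types (single, complete, average, and so on):

```python
from math import dist

def single_linkage(points, k):
    """Agglomerative clustering: merge the closest pair until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between the closest members
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the closest pair of clusters
    return clusters

pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
print(single_linkage(pts, 2))
```

Recording the order and distance of the merges yields the dendrogram that hierarchical clusterers display.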


10) Apply PCA (Principal Component Analysis) dimensionality reduction technique on the given
data set.

1. Load the dataset (Iris.arff) into weka tool


2. Go to Filter -> unsupervised -> attribute -> PrincipalComponents
3. Click on the PrincipalComponents filter text to open its options, in which the maximum number
of attributes can be set to 2
4. Visualize the reduced dimensions by clicking the Visualize option in the upper right
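What the PrincipalComponents filter computes, the direction of maximum variance in the centered data, can be sketched with power iteration on the covariance matrix. This is a stdlib-only illustration on made-up 2D rows; Weka uses a full eigendecomposition and keeps as many components as requested:

```python
def pca_first_component(rows, iters=200):
    """First principal component (unit vector) via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[v - m for v, m in zip(r, means)] for r in rows]
    # sample covariance matrix (d x d)
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]  # converges to the dominant eigenvector
    return v

rows = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]]
pc1 = pca_first_component(rows)
print([round(x, 3) for x in pc1])
```

Projecting each centered row onto this vector gives the first reduced dimension; repeating on the residual gives the next component.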


11) Demonstrate performing Regression on data sets
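A minimal example of what regression does, fitting y = a*x + b by ordinary least squares on one numeric attribute, can be sketched as follows (an illustrative stdlib-only sketch with made-up data; Weka's LinearRegression handles multiple attributes and attribute selection):

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx  # the fitted line passes through the mean point
    return a, b

a, b = linear_fit([1, 2, 3, 4], [3, 5, 7, 9])  # data generated from y = 2x + 1
print(a, b)  # 2.0 1.0
```

In the Weka Explorer this corresponds to choosing a numeric class attribute in the Classify tab and selecting a scheme such as LinearRegression under the functions section.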
