Lab Manual
of
Data Mining & Data Warehousing
(CSN447)
MASTER OF COMPUTER APPLICATIONS
Session 2024-25
SCHOOL OF COMPUTING
DIT University
Submitted to:
Dr. R. K. Saini
School of Computing
DIT University, Dehradun

Submitted by:
Sawan
1000026853
MCA (P1)
EXPERIMENT 1: Exploring the WEKA Tool
WHAT IS WEKA?
WEKA is an open-source software suite that provides tools for data preprocessing, implementations of several machine learning algorithms, and visualization facilities, so that you can develop machine learning techniques and apply them to real-world data mining problems. What WEKA offers is summarized in the following diagram.
STEP 1: Installing the WEKA tool in a virtual machine using Internet Explorer.
The WEKA GUI Chooser application will start, and you will see the following:
The GUI Chooser application allows you to run five different types of applications as listed here:
· Explorer
· Experimenter
· KnowledgeFlow
· Workbench
· Simple CLI
WEKA TOOL INTERFACE
STEP 2: Opening the Weka Explorer and Loading a Dataset Using the Following Path:
When you click the Explorer button in the Applications selector, the following screen opens.
On the top, you will see several tabs as listed here:
· Preprocess
· Classify
· Cluster
· Associate
· Select Attributes
· Visualize
Click on the Open file... button. A directory navigator window opens, as shown in the following screen; follow the path below to load the data.
C:\Program Files\Weka-3-8-6\data
STEP 3: Visualizing The Given Data Graphically
STEP 4: Every Attribute, Along With Its Graph, Supports Some Conclusion
For example, for the duration attribute of the labour dataset, the graph and conclusion are as follows:
Here we can see, from the graph as well as from the values precomputed by the Weka tool, that:
The mean of the duration is: 2.161
The standard deviation is: 0.707
Wage increase in the second year, its graph, and conclusion:
Here we can see, from the graph as well as from the values precomputed by the Weka tool, that:
The mean of the wage increase in the second year is: 3.972
The standard deviation is: 1.164
Weekly working hours, its graph, and conclusion:
Here we can see, from the graph as well as from the values precomputed by the Weka tool, that:
The mean of the weekly working hours is: 38.039
The standard deviation is: 2.506
Wage increase in the third year, its graph, and conclusion:
Here we can see, from the graph as well as from the values precomputed by the Weka tool, that:
The mean of the wage increase in the third year is: 3.913
The standard deviation is: 1.304
EXPERIMENT 2: Creating a New ARFF File
Code:
@relation Student
@attribute stud_id numeric
@attribute stud_name string
@attribute stud_age numeric
@attribute stud_dept {CSE, IT, BCA, MCA}
@attribute stud_gender {male, female}
@attribute stud_marks numeric
@attribute stud_city string
@attribute stud_Dob date "dd-MM-yyyy"
@data
1, Riya, 22, CSE, female, 90, Shamli, 22-03-2003
2, Tanish, 21, IT, female, 91, Aligarh, 17-01-2004
3, Parul, 22, MCA, female, 89, Kota, 05-10-2003
4, Shreyashi, 25, CSE, female, 87, Kotdwar, 13-11-1997
5, Ayesha, 23, IT, female, 85, Mumbai, 10-10-2000
6, Tanu, 21, CSE, female, 88, Jaipur, 15-05-2002
7, Siddharth, 24, IT, male, 86, Pune, 20-08-1999
8, Shreya, 20, MCA, female, 92, Chennai, 25-12-2003
9, Rahul, 22, CSE, male, 89, Hyderabad, 03-07-2002
10, Neha, 23, IT, female, 84, Bengaluru, 18-09-2001
Images:
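To check that the file is well-formed, it can be loaded in the Explorer's Preprocess tab, or read programmatically. A minimal Java sketch, assuming weka.jar (Weka 3.8) is on the classpath and the file above is saved as Student.arff (the file name is an assumption):

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LoadStudent {
    public static void main(String[] args) throws Exception {
        // Read the ARFF file created above; adjust the path to your machine
        Instances data = DataSource.read("Student.arff");
        System.out.println(data.toSummaryString()); // per-attribute statistics
        System.out.println("Instances loaded: " + data.numInstances());
    }
}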
EXPERIMENT 3: Data Pre-Processing Techniques on a Dataset
Pre-processing involves the following operations; a combined Java sketch covering all three appears after the list.
1. Converting nominal attributes to binary, and numeric attributes to nominal (in the form of 0 or 1).
Output:
2. Detecting missing values and replacing them with a user-defined constant or a system-generated value.
Output:
3. Detecting outliers and removing them using the interquartile range.
Output:
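The three operations above can also be scripted with Weka's filter classes. A minimal sketch, assuming weka.jar (Weka 3.8) is on the classpath and the Student.arff file from Experiment 2 is used (the file name and choice of dataset are assumptions; Weka also offers a ReplaceMissingWithUserConstant filter for the user-constant variant):

import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.InterquartileRange;
import weka.filters.unsupervised.attribute.NominalToBinary;
import weka.filters.unsupervised.attribute.ReplaceMissingValues;
import weka.filters.unsupervised.instance.RemoveWithValues;

public class PreprocessDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("Student.arff"); // path is an assumption

        // 1. Convert nominal attributes into 0/1 indicator attributes
        NominalToBinary n2b = new NominalToBinary();
        n2b.setInputFormat(data);
        Instances binary = Filter.useFilter(data, n2b);
        System.out.println(binary.toSummaryString());

        // 2. Replace missing values with the attribute mean (numeric) or mode (nominal)
        ReplaceMissingValues rmv = new ReplaceMissingValues();
        rmv.setInputFormat(data);
        Instances filled = Filter.useFilter(data, rmv);

        // 3. Flag outliers via the interquartile range; the filter appends
        //    nominal "Outlier" and "ExtremeValue" attributes to each instance
        InterquartileRange iqr = new InterquartileRange();
        iqr.setInputFormat(filled);
        Instances flagged = Filter.useFilter(filled, iqr);

        // Drop the instances whose "Outlier" value is "yes" (the last label)
        RemoveWithValues rwv = new RemoveWithValues();
        rwv.setOptions(Utils.splitOptions(
                "-C " + (flagged.attribute("Outlier").index() + 1) + " -L last"));
        rwv.setInputFormat(flagged);
        Instances cleaned = Filter.useFilter(flagged, rwv);
        System.out.println(cleaned.numInstances() + " instances after outlier removal");
    }
}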
EXPERIMENT 4: Create a cube and illustrate the following OLAP operations:
1) Roll-up 2) Drill-down 3) Slice 4) Dice 5) Pivot
Roll-up:
(a) Concept: This operation aggregates data to a higher level. Essentially, it
reduces the granularity of the data.
(b) Example: You may roll up from "Month" to "Quarter" or "Region" to
"Country."
(c) Visual: Imagine a cube with dimensions like Time (Month), Product
(Category), and Region. Rolling up from "Month" to "Quarter" would
consolidate all the monthly data into quarterly data.
Drill-down:
(a) Concept: This is the opposite of rollup. It allows you to go from higher
levels of aggregation to more detailed data.
(b) Example: You drill down from "Country" to "Region" or from "Year" to
"Month."
(c) Visual: In a cube, this would involve expanding a higher-level value (like
"Year") into more detailed values (like "Month").
Slice:
(a) Concept: The slice operation is used to view a subset of the cube by
fixing one of the dimensions to a specific value.
(b) Example: You might slice the cube to look at data for a specific year
(e.g., 2023) while keeping other dimensions, like product and region,
unchanged.
(c) Visual: Imagine you have a cube with three dimensions: Time
(Month), Product (Category), and Region. If you slice by Time =
"2023," you'll have a 2D view of the data for just that year.
Dice:
(a) Concept: This operation is similar to slicing, but you select specific
values from multiple dimensions, creating a smaller cube.
(b) Example: You might dice the cube to focus on a specific "Region"
and "Product" while selecting all months.
(c) Visual: If you dice by "Region = North America" and "Product = Laptops," the result is a smaller cube that only has data for those selections, keeping time (e.g., month) as a dimension.
Pivot:
(a) Concept: The pivot operation rotates the data, changing the
orientation of the cube to view it from different perspectives.
(b) Example: If you have a table with regions as rows and months as
columns, pivoting would flip the rows and columns, making months
rows and regions columns.
(c) Visual: Pivoting would involve rotating the cube so you can change
how dimensions are organized (rows vs columns).
Code:
@relation sales_cube
@attribute Month {Jan, Feb, Mar}
@attribute City {Delhi, Mumbai, Bangalore}
@attribute Product {Laptop, Phone, Tablet}
@attribute Sales numeric
@data
Jan,Delhi,Laptop,120
Jan,Delhi,Phone,100
Jan,Mumbai,Laptop,150
Feb,Delhi,Laptop,130
Feb,Mumbai,Phone,170
Feb,Bangalore,Tablet,90
Mar,Delhi,Tablet,200
Mar,Mumbai,Phone,180
Mar,Bangalore,Laptop,160
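To make roll-up concrete, the sketch below collapses the City and Product dimensions of the cube above and totals Sales per Month; from the data, the expected totals are Jan: 370, Feb: 390, Mar: 540. A minimal Java sketch, assuming weka.jar is on the classpath and the cube is saved as sales_cube.arff (the file name is an assumption):

import java.util.LinkedHashMap;
import java.util.Map;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RollupDemo {
    public static void main(String[] args) throws Exception {
        Instances cube = DataSource.read("sales_cube.arff"); // path is an assumption
        int month = cube.attribute("Month").index();
        int sales = cube.attribute("Sales").index();
        // Roll-up: aggregate away City and Product, keeping only the Month dimension
        Map<String, Double> totals = new LinkedHashMap<>();
        for (Instance row : cube) {
            totals.merge(row.stringValue(month), row.value(sales), Double::sum);
        }
        totals.forEach((m, t) -> System.out.println(m + ": " + t));
    }
}

A slice (e.g., Month = Jan) or a dice (e.g., City = Delhi and Product = Laptop) would instead keep only the rows matching the fixed dimension values.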
Output:
EXPERIMENT 5: Demonstrating Regression on a Dataset
Code:
@relation house_prices
@attribute size numeric
@attribute bedrooms numeric
@attribute price numeric
@data
1000, 3, 200000
1200, 3, 250000
1500, 4, 300000
800, 2, 180000
950, 2, 190000
1400, 4, 280000
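In the Explorer, this regression is run by choosing functions > LinearRegression in the Classify tab. Equivalently, via the Weka Java API, a minimal sketch, assuming weka.jar is on the classpath and the data above is saved as house_prices.arff (the file name is an assumption):

import weka.classifiers.functions.LinearRegression;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RegressionDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("house_prices.arff"); // path is an assumption
        data.setClassIndex(data.attribute("price").index());   // predict price
        LinearRegression lr = new LinearRegression();
        lr.buildClassifier(data);
        System.out.println(lr); // prints the fitted model: price as a linear function of size and bedrooms
    }
}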
Output:
EXPERIMENT 6: Implementation of Decision Tree and Random Forest Tree
Induction
Weka's J48 classifier is an implementation of C4.5, a commonly used decision tree algorithm.
Steps to Apply Decision Tree in Weka:
1. Load the dataset:
o Open Weka.
o Click on Explorer.
o Click on Open file and select the dataset you want to use (e.g., .arff or .csv format).
2. Choose the J48 classifier:
o In the "Classify" tab, click the Choose button.
o Under trees, select J48 (this is Weka's implementation of the C4.5 algorithm).
3. Set the options (optional):
o Click on the text next to J48 to open the options window.
o Here, you can change parameters such as:
▪ confidenceFactor (default: 0.25): the confidence level used for pruning.
▪ minNumObj (default: 2): the minimum number of objects (instances) at a leaf node.
▪ unpruned: if checked, disables pruning (producing a fully grown tree).
For example:
o To use a confidence factor of 0.1, type -C 0.1 in the options field.
4. Train the model:
o Click the Start button to build the decision tree model.
5. View the output:
o Once the model has finished building, Weka shows the decision tree structure in the result area, along with accuracy and other performance metrics such as precision, recall, and F1-score. A scripted equivalent is sketched below.
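A minimal J48 sketch via the Weka Java API, assuming weka.jar (Weka 3.8) is on the classpath and Weka's bundled weather.nominal.arff is used (the file path is an assumption):

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

public class J48Demo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("weather.nominal.arff"); // path is an assumption
        data.setClassIndex(data.numAttributes() - 1);             // class = play
        J48 tree = new J48();
        tree.setOptions(Utils.splitOptions("-C 0.25 -M 2"));      // default pruning options
        tree.buildClassifier(data);
        System.out.println(tree);                                 // the induced decision tree
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));   // 10-fold cross-validation
        System.out.println(eval.toSummaryString());               // accuracy and error measures
    }
}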
Using Random Forest in Weka
A Random Forest is an ensemble method that builds multiple decision trees and combines their predictions.
Weka has an implementation called Random Forest.
Steps to Apply Random Forest in Weka:
1. Load the dataset:
o Similar to the Decision Tree, open Weka, click on Explorer, and load your dataset.
2. Choose the Random Forest classifier:
o In the "Classify" tab, click the Choose button.
o Under trees, select RandomForest.
3. Set the options (optional):
o Click on the text next to RandomForest to open the options window.
o The default parameters include:
▪ numIterations (-I, default: 100): the number of trees to build.
▪ maxDepth (default: 0, meaning no limit on tree depth).
▪ seed: the random seed, for reproducibility.
▪ bagSizePercent (default: 100): the percentage of the training set used for building each tree (via bootstrapping).
For example:
o To use 200 trees, set -I 200.
o To limit the depth of trees to 10, set -depth 10.
4. Train the model:
o Click the Start button to build the random forest model.
5. View the output:
o After training, Weka displays performance metrics such as accuracy and the confusion matrix. You can also examine the structure of the individual trees if needed. A scripted equivalent is sketched below.
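A matching Java sketch for the random forest, under the same assumptions (weka.jar on the classpath, file path adjusted to your dataset):

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

public class RandomForestDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("weather.nominal.arff"); // path is an assumption
        data.setClassIndex(data.numAttributes() - 1);
        RandomForest rf = new RandomForest();
        rf.setOptions(Utils.splitOptions("-I 200 -depth 10")); // 200 trees, depth limit 10
        rf.buildClassifier(data);
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(rf, data, 10, new Random(1));  // 10-fold cross-validation
        System.out.println(eval.toSummaryString());
    }
}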
Visualization:
Experiment 7: Implementation of the Naïve Bayesian Classifier
Code:
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
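In the Classify tab this corresponds to choosing bayes > NaiveBayes. A minimal Java sketch, assuming weka.jar is on the classpath and the data above is saved as weather.arff (the file name is an assumption):

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class NaiveBayesDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("weather.arff"); // path is an assumption
        data.setClassIndex(data.numAttributes() - 1);     // class = play
        NaiveBayes nb = new NaiveBayes();
        nb.buildClassifier(data);
        System.out.println(nb); // class priors and per-attribute conditional counts
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(nb, data, 10, new Random(1)); // 10-fold cross-validation
        System.out.println(eval.toSummaryString());
    }
}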
Output:
Experiment 8: Calculating Information Gain Measures
Code:
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
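Information gain for an attribute A is Gain(S, A) = Entropy(S) - Σv (|Sv| / |S|) · Entropy(Sv), where v ranges over the values of A. For this 14-instance weather dataset, Entropy(S) = -(9/14)·log2(9/14) - (5/14)·log2(5/14) ≈ 0.940 bits, and outlook ranks highest with a gain of about 0.247 bits. In Weka this is computed with InfoGainAttributeEval under the Select Attributes tab; a minimal Java sketch, assuming weka.jar is on the classpath and the data above is saved as weather.arff (the file name is an assumption):

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class InfoGainDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("weather.arff"); // path is an assumption
        data.setClassIndex(data.numAttributes() - 1);     // class = play
        AttributeSelection sel = new AttributeSelection();
        sel.setEvaluator(new InfoGainAttributeEval()); // information gain w.r.t. play
        sel.setSearch(new Ranker());                   // rank all attributes by gain
        sel.SelectAttributes(data);
        System.out.println(sel.toResultsString());     // gains and resulting ranking
    }
}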
Output:
Experiment 9: Implementation of the Apriori Algorithm for Market Basket Analysis
Code:
@relation market_basket
@attribute bread {t, f}
@attribute milk {t, f}
@attribute butter {t, f}
@attribute beer {t, f}
@attribute eggs {t, f}
@data
t, t, f, f, t
t, f, t, f, f
f, t, t, t, t
t, t, t, f, f
f, f, t, t, t
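In the Associate tab this corresponds to the Apriori associator. A minimal Java sketch, assuming weka.jar is on the classpath and the data above is saved as market_basket.arff (the file name is an assumption); with only 5 transactions, the minimum support is lowered so that rules can be found:

import weka.associations.Apriori;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

public class AprioriDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("market_basket.arff"); // path is an assumption
        Apriori apriori = new Apriori();
        // Up to 10 rules, minimum confidence 0.8, minimum support 0.4 (2 of 5 baskets)
        apriori.setOptions(Utils.splitOptions("-N 10 -C 0.8 -M 0.4"));
        apriori.buildAssociations(data);
        System.out.println(apriori); // the mined association rules
    }
}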
Output:
Experiment 10: Implementation of the K-means Algorithm for Clustering
Code:
@relation simple_clusters
@attribute height numeric
@attribute weight numeric
@data
170, 60
180, 80
160, 55
175, 77
165, 58
155, 50
185, 85
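In the Cluster tab this corresponds to choosing SimpleKMeans. A minimal Java sketch, assuming weka.jar is on the classpath and the data above is saved as simple_clusters.arff (the file name and the choice of k = 2 are assumptions):

import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class KMeansDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("simple_clusters.arff"); // path is an assumption
        SimpleKMeans kmeans = new SimpleKMeans();
        kmeans.setNumClusters(2); // e.g. a lighter group and a heavier group
        kmeans.setSeed(10);       // fixed seed for reproducible centroids
        kmeans.buildClusterer(data);
        System.out.println(kmeans); // centroids and cluster sizes
        for (int i = 0; i < data.numInstances(); i++) {
            System.out.println(data.instance(i) + " -> cluster "
                    + kmeans.clusterInstance(data.instance(i)));
        }
    }
}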
Output: