Lab Manual
of
Data Mining & Data Warehousing
(CSN447)
MASTER OF COMPUTER APPLICATION
Session 2024-25
SCHOOL OF COMPUTING
DIT University
Submitted to: Dr. R.K. Saini, School of Computing, DIT University, Dehradun
Submitted by: Sargam Arora, 1000026809
EXPERIMENT 1: Exploring Weka tool
WHAT IS WEKA?
WEKA is open-source software that provides tools for data preprocessing, implementations of several
machine learning algorithms, and visualization tools, so that you can develop machine learning
techniques and apply them to real-world data mining problems. What WEKA offers is summarized in
the following diagram.
STEP 1: Installing the WEKA tool in a virtual machine using Internet Explorer.
The WEKA GUI Chooser application will start, and you will see the following:
The GUI Chooser application allows you to run five different types of applications, as listed here:
· Explorer
· Experimenter
· KnowledgeFlow
· Workbench
· Simple CLI
WEKA TOOL INTERFACE
STEP 2: Opening Weka Explorer And Opening Dataset Using The Following Path:
When you click on the Explorer button in the Applications selector, it opens the following screen.
On the top, you will see several tabs as listed here:
· Preprocess
· Classify
· Cluster
· Associate
· Select Attributes
· Visualize
Click on the Open file... button. A directory navigator window opens, as shown in the following
screen. Navigate to the path below to load the data.
C:\Program Files\Weka-3-8-6\data
STEP 3: Visualizing The Given Data Graphically
STEP 4: Every Attribute Along With Its Graph Has Some Conclusions
For example, the time duration attribute of the labour dataset, its graph, and its conclusion are as follows:
From the graph, as well as from the values pre-calculated by the WEKA tool, we can see:
The mean of the time duration is: 2.161
The standard deviation is: 0.707
Increase of wage in the second year, its graph, and its conclusion:
From the graph, as well as from the values pre-calculated by the WEKA tool, we can see:
The mean of the wage increment in the second year is: 3.972
The standard deviation is: 1.164
Working hours, its graph, and its conclusion:
From the graph, as well as from the values pre-calculated by the WEKA tool, we can see:
The mean of the working hours is: 38.039
The standard deviation is: 2.506
Increase of wage in the third year, its graph, and its conclusion:
From the graph, as well as from the values pre-calculated by the WEKA tool, we can see:
The mean of the wage increment in the third year is: 3.913
The standard deviation is: 1.304
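The mean and standard deviation shown for each numeric attribute can be reproduced by hand. The following is a minimal sketch, assuming the sample standard deviation (dividing by n - 1) and that missing values are skipped. The function name summarize and the input values are hypothetical and are not taken from the labour dataset.

```python
import statistics


def summarize(values):
    """Return (mean, sample standard deviation), each rounded to 3 decimals.

    Missing values are represented as None and are skipped (an assumption).
    """
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    std_dev = statistics.stdev(present)  # sample std dev: divides by n - 1
    return round(mean, 3), round(std_dev, 3)


# Hypothetical attribute values, used only to illustrate the calculation.
durations = [1, 2, 3, None, 2, 3]
mean, std_dev = summarize(durations)
print(f"Mean: {mean}")
print(f"StdDev: {std_dev}")
```

Comparing such hand-computed values with the statistics panel in the Preprocess tab is a quick way to check that an attribute was loaded with the expected type.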
EXPERIMENT 2: Creating a new ARFF FILE
Step 1:
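As a rough illustration of what an ARFF file contains: it has a header with an @relation line and one @attribute line per column, followed by an @data section with comma-separated rows, where ? marks a missing value. The sketch below builds such a file from Python. The relation name, the attributes, the rows, and the file name student.arff are all hypothetical, and values containing spaces or commas would additionally need quoting, which this sketch does not handle.

```python
def build_arff(relation, attributes, rows):
    """Build ARFF text from a relation name, attribute specs, and data rows.

    attributes: list of (name, type) pairs. The type is an ARFF keyword such
    as "numeric" or "string", or a list of labels for a nominal attribute.
    rows: list of tuples; None is written as "?" (ARFF's missing-value mark).
    """
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            kind = "{" + ",".join(kind) + "}"  # nominal: {label1,label2,...}
        lines.append(f"@attribute {name} {kind}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical example data.
    attributes = [("name", "string"), ("age", "numeric"), ("grade", ["A", "B", "C"])]
    rows = [("Asha", 21, "A"), ("Ravi", None, "B")]
    arff_text = build_arff("student", attributes, rows)
    with open("student.arff", "w", encoding="utf-8") as handle:
        handle.write(arff_text)
    print(arff_text)
```

The resulting file can then be loaded through the Open file... button in the Explorer, in the same way as the bundled datasets.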
EXPERIMENT 3: Data Processing Techniques on Dataset
Pre-Processing involves
1. Converting Nominal to Binary (in the form of 0 or 1) and Numeric to Nominal.
AGE
EDUCATION_NUM
BEFORE
AFTER
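The idea behind these two conversions can be sketched outside WEKA as follows. This is only an illustration of the concept and not WEKA's own filter code. The function names and sample values are hypothetical.

```python
def nominal_to_binary(values):
    """Create one 0/1 indicator column for each distinct nominal label."""
    labels = sorted(set(values))
    return {label: [1 if v == label else 0 for v in values] for label in labels}


def numeric_to_nominal(values):
    """Treat every distinct number as a category label.

    Returns the converted column and the sorted list of labels.
    """
    labels = [str(v) for v in sorted(set(values))]
    return [str(v) for v in values], labels


# Hypothetical sample values.
print(nominal_to_binary(["Private", "State-gov", "Private"]))
print(numeric_to_nominal([9, 13, 9]))
```

After the numeric-to-nominal step the values no longer have an order or a mean. They are only labels, which is why the attribute's summary changes from statistics to label counts.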
2. Detecting missing values and replacing them with a user-defined constant or a
system-generated value.
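A minimal sketch of this step is given below. It assumes that the system-generated value is the mean for a numeric attribute and the most frequent label (mode) for a nominal attribute, which is how WEKA's ReplaceMissingValues filter behaves. The function name replace_missing and the sample values are hypothetical, and missing entries are represented as None.

```python
import statistics


def replace_missing(values, constant=None):
    """Replace None entries with a constant, or with the mean/mode if none is given."""
    present = [v for v in values if v is not None]
    if constant is not None:
        fill = constant  # user-defined constant value
    elif all(isinstance(v, (int, float)) for v in present):
        fill = statistics.mean(present)  # numeric attribute: use the mean
    else:
        fill = statistics.mode(present)  # nominal attribute: use the mode
    return [fill if v is None else v for v in values]


# Hypothetical sample values.
print(replace_missing([40, None, 50]))
print(replace_missing(["Private", None, "Private", "State-gov"]))
print(replace_missing([40, None, 50], constant=0))
```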
3. Detecting the outliers and removing them using the interquartile range.
NO OUTLIERS IN AGE ATTRIBUTE
OUTLIERS PRESENT IN FNLWGT
NO OUTLIERS IN CAPITAL_LOSS ATTRIBUTE
NO OUTLIERS IN CAPITAL_GAIN ATTRIBUTE
HOURS_PER_WEEK
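The interquartile-range rule can be sketched as follows: a value is flagged as an outlier when it lies more than factor x IQR below the first quartile or above the third quartile, where IQR = Q3 - Q1. The factor of 1.5 used here is the common textbook choice and is an assumption. WEKA's InterquartileRange filter has its own configurable outlier factor, and its quartile calculation may differ slightly. The function name and sample values are hypothetical.

```python
import statistics


def remove_outliers_iqr(values, factor=1.5):
    """Split values into (kept, outliers) using the interquartile-range rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - factor * iqr
    upper = q3 + factor * iqr
    kept = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    return kept, outliers


# Hypothetical sample values with one obvious outlier.
kept, outliers = remove_outliers_iqr([10, 12, 11, 13, 12, 11, 100])
print("Kept:", kept)
print("Outliers:", outliers)
```

An attribute reported as having no outliers corresponds to an empty outliers list, so the data is returned unchanged.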
EXPERIMENT 4: Create a cube and illustrate the following OLAP operations.
1) Roll-up 2) Drill-down 3) Slice 4) Dice 5) Pivot
5) Pivot: It rotates the cube, sub-cube, or rolled-up or drilled-down cube, thus
changing the view of the cube.
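The five operations can be illustrated on a tiny in-memory cube. This is a minimal sketch and not the tool used in the lab. The sales data, the dimension names (year, quarter, city, item), and the function names are all hypothetical. Roll-up aggregates to a coarser level, drill-down moves to a finer level, slice fixes one dimension to a single value, dice selects a sub-cube by restricting several dimensions, and pivot swaps the axes of the view.

```python
from collections import defaultdict

DIMS = ("year", "quarter", "city", "item")

# Hypothetical fact table: (year, quarter, city, item, units sold).
SALES = [
    (2023, "Q1", "Delhi", "Laptop", 10),
    (2023, "Q1", "Mumbai", "Laptop", 7),
    (2023, "Q2", "Delhi", "Phone", 12),
    (2023, "Q2", "Mumbai", "Phone", 9),
    (2024, "Q1", "Delhi", "Laptop", 14),
    (2024, "Q1", "Mumbai", "Phone", 11),
]


def aggregate(facts, dims):
    """Sum the units grouped by the given dimensions."""
    positions = [DIMS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[p] for p in positions)] += row[-1]
    return dict(totals)


def slice_cube(facts, dim, value):
    """Slice: fix a single dimension to one value."""
    position = DIMS.index(dim)
    return [row for row in facts if row[position] == value]


def dice_cube(facts, **allowed):
    """Dice: keep rows whose values fall in the allowed set for each dimension."""
    return [
        row for row in facts
        if all(row[DIMS.index(d)] in values for d, values in allowed.items())
    ]


def pivot(table):
    """Pivot: swap the two axes of a two-dimensional aggregate."""
    return {(b, a): total for (a, b), total in table.items()}


rollup = aggregate(SALES, ["year"])                 # quarter level -> year level
drilldown = aggregate(SALES, ["year", "quarter"])   # year level -> quarter level
sliced = aggregate(slice_cube(SALES, "year", 2023), ["city", "item"])
diced = aggregate(dice_cube(SALES, year={2023}, city={"Delhi"}), ["city", "item"])
pivoted = pivot(aggregate(SALES, ["city", "item"]))  # (city, item) -> (item, city)

for name, view in [("Roll-up", rollup), ("Drill-down", drilldown),
                   ("Slice", sliced), ("Dice", diced), ("Pivot", pivoted)]:
    print(name, view)
```

Pivot does not change any totals. It only changes which dimension is read along which axis, which matches the description of rotating the cube.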