Lecture 12
Main Features
Weka is freely available software developed at the University of Waikato, New Zealand. You can download it from the following link:
https://www.cs.waikato.ac.nz/ml/weka/
Weka contains tools for data pre-processing, classification,
clustering, association rules, and visualization (the Weka
Knowledge Explorer).
It also provides an environment for comparing learning
algorithms (the Experimenter).
It is also well-suited for developing new data mining or
machine learning schemes.
WEKA: versions
There are several versions of WEKA:
WEKA 3.0: “command-line”
WEKA 3.2: “GUI version” adds graphical user
interfaces
WEKA 3.3: “development version” with lots of
improvements
These slides use a mixture of snapshots of
WEKA 3.3 (soon to be WEKA 3.4) and WEKA 3.9.
WEKA Knowledge Explorer
Preprocess: Choose and modify the data
Classify: Train and test learning schemes that classify
Cluster: Learn clusters for the data
Associate: Learn association rules for the data
Select attributes: Find the most relevant attributes in the data
Visualize: View an interactive 2D plot of the data
WEKA Explorer: Pre-processing the Data
Data can be imported from a file in various
formats: ARFF, CSV, C4.5, binary
Data can also be read from a URL or from
an SQL database (using JDBC)
Pre-processing tools in WEKA are called
“filters”
WEKA contains filters for:
Discretization, normalization, attribute
selection, transforming, …
WEKA only deals with “flat” files: the data must be
converted to ARFF format before any algorithm can be
applied (see the sketch below).
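As a rough sketch of these steps through WEKA's Java API (class names are from the 3.x releases; the file names are only placeholders), a CSV file can be imported, passed through a discretization filter, and saved in ARFF format:

import java.io.File;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;

public class PreprocessSketch {
    public static void main(String[] args) throws Exception {
        // Import a "flat" CSV file (placeholder name)
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("weather.csv"));
        Instances data = loader.getDataSet();

        // Apply a pre-processing "filter": unsupervised discretization
        Discretize discretize = new Discretize();
        discretize.setInputFormat(data);
        Instances discretized = Filter.useFilter(data, discretize);

        // Write the result out in ARFF format
        ArffSaver saver = new ArffSaver();
        saver.setInstances(discretized);
        saver.setFile(new File("weather.arff"));
        saver.writeBatch();
    }
}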
The dataset’s name is declared with @relation
The attribute information is declared with @attribute
The data section begins with @data
Data: a list of instances, with attribute values separated by
commas.
By default, the class is the last attribute in the
ARFF file.
Numeric Attributes and Missing Values
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny, 85, 85, FALSE, no
sunny, 80, 90, TRUE, no
overcast, 83, 86, FALSE, yes
rainy, 70, 96, FALSE, yes
...
Numeric Attributes and Missing Values
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny, 85, 85, FALSE, no
sunny, 80, 90, TRUE, no
overcast, 83, 86, FALSE, ?
rainy, 70, 96, ?, yes
...
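A minimal sketch of loading such a file through the Java API and counting the “?” entries per attribute (the file name weather.arff is an assumption):

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class MissingValueSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("weather.arff"); // placeholder file name
        data.setClassIndex(data.numAttributes() - 1);     // class = last attribute (play)

        // Count how many values of each attribute are marked "?" in the data section
        for (int i = 0; i < data.numAttributes(); i++) {
            int missing = data.attributeStats(i).missingCount;
            System.out.println(data.attribute(i).name() + ": " + missing + " missing");
        }
    }
}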
Explorer: building “classifiers”
Classifiers in WEKA are models for
predicting nominal or numeric quantities
Implemented learning schemes include:
Decision trees and lists, instance-based classifiers,
support vector machines, multi-layer perceptrons,
logistic regression, Bayes nets, …
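As an illustration (a sketch, not the Explorer GUI itself), one of these schemes, the J48 decision tree learner, can be trained and evaluated with 10-fold cross-validation through the Java API; the dataset name is a placeholder:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ClassifySketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("weather.arff"); // placeholder dataset
        data.setClassIndex(data.numAttributes() - 1);     // nominal class: play

        J48 tree = new J48();                             // C4.5-style decision tree
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1)); // 10-fold CV
        System.out.println(eval.toSummaryString());

        tree.buildClassifier(data);                       // train on all data to print the tree
        System.out.println(tree);
    }
}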
Explorer: clustering data
WEKA contains “clusterers” for finding
groups of similar instances in a dataset
Implemented schemes are:
k-Means, EM, Cobweb, X-means, FarthestFirst
Clusters can be visualized
Evaluation is based on the log-likelihood if the
clustering scheme produces a probability
distribution
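As a sketch of the same idea in code (assuming a placeholder dataset name), the probabilistic EM clusterer can be built and evaluated; because EM produces a probability distribution, the evaluation reports a log-likelihood. The class attribute is removed first, since clustering is unsupervised:

import weka.clusterers.ClusterEvaluation;
import weka.clusterers.EM;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class ClusterSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("weather.arff"); // placeholder dataset

        // Drop the class attribute (last) before clustering
        Remove remove = new Remove();
        remove.setAttributeIndices("last");
        remove.setInputFormat(data);
        Instances input = Filter.useFilter(data, remove);

        EM em = new EM();          // probabilistic clusterer
        em.setNumClusters(2);      // fix the number of clusters (optional)
        em.buildClusterer(input);

        ClusterEvaluation eval = new ClusterEvaluation();
        eval.setClusterer(em);
        eval.evaluateClusterer(input);
        System.out.println(eval.clusterResultsToString());
        System.out.println("Log-likelihood: " + eval.getLogLikelihood());
    }
}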
Explorer: finding associations
WEKA contains an implementation of the
Apriori algorithm for learning association rules
Works only with discrete data
Can identify statistical dependencies between
groups of attributes:
milk, butter ⇒ bread, eggs (with confidence 0.9)
Apriori can compute all rules that have a
given minimum support and exceed a given
confidence
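A minimal sketch of running Apriori through the Java API with a given minimum support and confidence; the dataset name is a placeholder, and the data must contain only nominal (discrete) attributes:

import weka.associations.Apriori;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AssociationSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("weather.nominal.arff"); // placeholder nominal dataset

        Apriori apriori = new Apriori();
        apriori.setLowerBoundMinSupport(0.2); // minimum support
        apriori.setMinMetric(0.9);            // minimum confidence
        apriori.setNumRules(10);              // report the 10 best rules
        apriori.buildAssociations(data);

        System.out.println(apriori);          // prints the discovered rules
    }
}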
Explorer: attribute selection
This panel can be used to investigate which (subsets of)
attributes are the most predictive
Attribute selection methods consist of two parts:
A search method: best-first, forward selection, random,
exhaustive, genetic algorithm, ranking
An evaluation method: correlation-based, wrapper,
information gain, chi-squared, …
Very flexible: WEKA allows (almost) arbitrary
combinations of these two
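As one example of such a combination (a sketch with a placeholder dataset name), a correlation-based evaluator (CfsSubsetEval) can be paired with best-first search through the Java API:

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AttributeSelectionSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("weather.arff"); // placeholder dataset
        data.setClassIndex(data.numAttributes() - 1);

        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new CfsSubsetEval());  // correlation-based evaluation
        selector.setSearch(new BestFirst());         // best-first search
        selector.SelectAttributes(data);             // note the capital S (legacy API)

        System.out.println(selector.toResultsString());
        // selectedAttributes() returns the chosen indices (the class index is appended last)
        for (int index : selector.selectedAttributes()) {
            System.out.println("Selected: " + data.attribute(index).name());
        }
    }
}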
Explorer: data visualization
Visualization is very useful in practice: e.g. it
helps to determine the difficulty of the learning
problem
WEKA can visualize single attributes (1-d)
and pairs of attributes (2-d)
To do: rotating 3-d visualizations (Xgobi-style)
Color-coded class values
“Jitter” option to deal with nominal attributes
(and to detect “hidden” data points)
Performing experiments
The Experimenter makes it easy to compare
the performance of different learning
schemes
For classification and regression problems
Results can be written to a file or a database
Evaluation options: cross-validation,
learning curve
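The Experimenter itself is a GUI tool; as a rough programmatic stand-in (not the Experimenter), two schemes can be compared on the same cross-validation folds with the Evaluation class. The dataset name and the choice of schemes are assumptions:

import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CompareSchemesSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("weather.arff"); // placeholder dataset
        data.setClassIndex(data.numAttributes() - 1);

        Classifier[] schemes = { new J48(), new NaiveBayes() };
        for (Classifier scheme : schemes) {
            Evaluation eval = new Evaluation(data);
            // Same seed => same folds for both schemes
            eval.crossValidateModel(scheme, data, 10, new Random(1));
            System.out.printf("%s: %.2f%% correct%n",
                    scheme.getClass().getSimpleName(), eval.pctCorrect());
        }
    }
}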
Resources:
WEKA is available at
http://www.cs.waikato.ac.nz/ml/weka
The site also has a list of projects based on WEKA
Tutorial:
http://prdownloads.sourceforge.net/weka/weka.ppt