Lecture 12 - Weka Tutorial

Weka is free software developed at the University of Waikato, offering tools for data pre-processing, classification, clustering, association rules, and visualization. It is available in several versions, including command-line and GUI versions, and lets users preprocess data, build classifiers, and visualize results. Weka also includes features for comparing learning algorithms and running experiments with different learning schemes.

Lecture 12

Main Features
 Weka is freely available software developed at the
University of Waikato, New Zealand. You can download it from
the following link:
https://www.cs.waikato.ac.nz/ml/weka/

 Weka contains tools for data pre-processing, classification,
clustering, association rules, and visualization (Weka
Knowledge Explorer)

 Environment for comparing learning algorithms
(Experimenter)

 It is also well-suited for developing new data mining or
machine learning schemes.
WEKA: versions
 There are several versions of WEKA:

WEKA 3.0: “command-line”

WEKA 3.2: “GUI version” adds graphical user
interfaces

WEKA 3.3: “development version” with lots of
improvements
 These slides use a mixture of snapshots of
WEKA 3.3 (soon to be WEKA 3.4) and WEKA 3.9.
WEKA Knowledge Explorer
 Preprocess: Choose and modify the data
 Classify: Train and test learning schemes that classify
 Cluster: Learn clusters for the data
 Associate: Learn association rules for the data
 Select attributes: Find the most relevant attributes in the data
 Visualize: View an interactive 2D plot of the data
WEKA Explorer: Pre-processing the Data
 Data can be imported from a file in various
formats: ARFF, CSV, C4.5, binary
 Data can also be read from a URL or from
an SQL database (using JDBC)
 Pre-processing tools in WEKA are called
“filters”
 WEKA contains filters for:

Discretization, normalization, attribute
selection, transforming, …
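To make two of the filter types named above concrete, here is a minimal Python sketch of min-max normalization and equal-width discretization. It only illustrates the ideas and is not WEKA's filter code; the function names `normalize` and `discretize` and the bin count are assumptions chosen for this example.

```python
def normalize(values):
    """Rescale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant column: nothing to rescale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, bins):
    """Map numeric values to bin indices 0..bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would land in bin `bins`, so clamp it to the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Temperature values taken from the weather example in these slides.
temperature = [85, 80, 83, 70]
scaled = normalize(temperature)
binned = discretize(temperature, bins=3)
```

With three bins the range 70 to 85 is cut into intervals of width 5, so only the coolest day falls into the first bin.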
WEKA only deals with “flat” files
 The data must be converted to ARFF format
before applying any algorithm.

The dataset’s name: @relation

The attribute information: @attribute

The data section begins with @data

Data: a list of instances with the attribute values
being separated by commas.

By default, the class is the last attribute in the
ARFF file.
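The ARFF layout described above can be read with a few lines of Python. The sketch below is a simplified reader written for illustration only: it assumes unquoted values and handles just the numeric and nominal attribute declarations used in these slides. The name `parse_arff` is hypothetical and not part of WEKA.

```python
def parse_arff(text):
    """Parse a small ARFF string into (relation, attributes, rows)."""
    relation = None
    attributes = []      # (name, "numeric") or (name, [nominal values])
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):    # blank line or ARFF comment
            continue
        lower = line.lower()
        if in_data:
            rows.append([v.strip() for v in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):            # nominal: {a, b, c}
                spec = [v.strip() for v in spec.strip("{}").split(",")]
            attributes.append((name, spec))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


SAMPLE = """\
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny, 85, no
overcast, 83, yes
"""

relation, attributes, rows = parse_arff(SAMPLE)
class_name = attributes[-1][0]    # by default the class is the last attribute
```

Values are kept as strings here; converting numeric columns is left to the caller.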
Numeric attribute and Missing Value
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny, 85, 85, FALSE, no
sunny, 80, 90, TRUE, no
overcast, 83, 86, FALSE, yes
rainy, 70, 96, FALSE, yes
...
Numeric attribute and Missing Value
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny, 85, 85, FALSE, no
sunny, 80, 90, TRUE, no
overcast, 83, 86, FALSE, ?
rainy, 70, 96, ?, yes
...
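A missing value is written as `?` in the data section. One common way to deal with it before learning is to fill numeric gaps with the column mean and nominal gaps with the most frequent value. The Python sketch below shows that idea under those assumptions; `impute_column` is a hypothetical helper, not a WEKA function, and the sample columns are made up for illustration.

```python
from collections import Counter

MISSING = "?"    # ARFF marker for a missing value


def impute_column(column, numeric):
    """Replace '?' with the mean (numeric) or the mode (nominal) of a column."""
    present = [v for v in column if v != MISSING]
    if numeric:
        fill = sum(float(v) for v in present) / len(present)
        return [fill if v == MISSING else float(v) for v in column]
    fill = Counter(present).most_common(1)[0][0]
    return [fill if v == MISSING else v for v in column]


# Illustrative columns as they would look after reading an ARFF data section.
windy = ["FALSE", "TRUE", "FALSE", "?"]
humidity = ["85", "90", "?", "96"]

filled_windy = impute_column(windy, numeric=False)
filled_humidity = impute_column(humidity, numeric=True)
```

Whether to impute the class attribute at all is a separate decision; instances with a missing class are often simply left out of training.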
Explorer: building “classifiers”
 Classifiers in WEKA are models for
predicting nominal or numeric quantities
 Implemented learning schemes include:

Decision trees and lists, instance-based
classifiers, support vector machines, multi-
layer perceptrons, logistic regression, Bayes’
nets, …
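As a small illustration of one family listed above, instance-based classifiers, the following Python sketch predicts the label of the single nearest training instance (1-nearest-neighbour with Euclidean distance). It is a toy version for intuition, not WEKA's implementation; the function name and the query point are assumptions.

```python
import math


def nearest_neighbour(train, query):
    """Return the label of the training instance closest to `query`.

    train: list of (feature_vector, label) pairs with numeric features.
    """
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(features, query)))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# (temperature, humidity) -> play, taken from the weather example.
train = [((85, 85), "no"), ((80, 90), "no"), ((83, 86), "yes"), ((70, 96), "yes")]
prediction = nearest_neighbour(train, (72, 95))
```

The query (72, 95) lies closest to the rainy day (70, 96), so its label is used as the prediction.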
Explorer: clustering data
 WEKA contains “clusterers” for finding
groups of similar instances in a dataset
 Implemented schemes are:

k-Means, EM, Cobweb, X-means, FarthestFirst
 Clusters can be visualized
 Evaluation is based on log-likelihood if the
clustering scheme produces a probability
distribution
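The first scheme in the list, k-Means, alternates between assigning each instance to its nearest centroid and moving each centroid to the mean of its cluster. A minimal Python sketch follows. The initial centroids are passed in explicitly so the run is deterministic, which is a simplification made for this example; the helper names are hypothetical and this is not WEKA's code.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:                      # assignment step
            clusters[nearest(p, centroids)].append(p)
        # Update step; an empty cluster keeps its previous centroid.
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, [nearest(p, centroids) for p in points]


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, centroids=[points[0], points[3]])
```

A fixed iteration count is used instead of a convergence test to keep the sketch short.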
Explorer: finding associations
 WEKA contains an implementation of the
Apriori algorithm for learning association rules

Works only with discrete data
 Can identify statistical dependencies between
groups of attributes:

milk, butter ⇒ bread, eggs (with confidence 0.9)
 Apriori can compute all rules that have a
given minimum support and exceed a given
confidence
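Support and confidence, the two thresholds mentioned above, can be computed directly from a list of transactions. The Python sketch below does only that for a single rule. It does not implement Apriori's candidate generation and pruning, and the transactions are invented for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:                 # the rule never fires
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Made-up market baskets for the rule: milk, butter => bread, eggs
transactions = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "eggs"},
    {"bread"},
]

rule_support = support(transactions, {"milk", "butter", "bread", "eggs"})
rule_confidence = confidence(transactions, {"milk", "butter"}, {"bread", "eggs"})
```

Here three baskets contain milk and butter and two of them also contain bread and eggs, so the rule's confidence is 2/3.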
Explorer: attribute selection
 Panel that can be used to investigate which
(subsets of) attributes are the most predictive
ones
 Attribute selection methods contain two parts:

A search method: best-first, forward selection, random,
exhaustive, genetic algorithm, ranking

An evaluation method: correlation-based, wrapper,
information gain, chi-squared, …
 Very flexible: WEKA allows (almost) arbitrary
combinations of these two
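As one concrete pairing of the two parts, the sketch below combines a ranking search with information gain as the evaluation method, in plain Python. It handles nominal attributes only, and the function names and the tiny dataset are assumptions for illustration rather than WEKA's implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in class entropy obtained by splitting on a nominal attribute."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Four instances from the weather example (nominal attributes only).
data = {
    "outlook": ["sunny", "sunny", "overcast", "rainy"],
    "windy": ["FALSE", "TRUE", "FALSE", "FALSE"],
}
play = ["no", "no", "yes", "yes"]

# "Ranking" search: score every attribute on its own and sort by score.
ranked = sorted(((name, information_gain(col, play)) for name, col in data.items()),
                key=lambda pair: pair[1], reverse=True)
```

On these four rows outlook separates the classes perfectly, so it is ranked above windy.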
Explorer: data visualization
 Visualization very useful in practice: e.g.
helps to determine difficulty of the learning
problem
 WEKA can visualize single attributes (1-d)
and pairs of attributes (2-d)

To do: rotating 3-d visualizations (Xgobi-style)
 Color-coded class values
 “Jitter” option to deal with nominal attributes
(and to detect “hidden” data points)
Performing experiments
 Experimenter makes it easy to compare
the performance of different learning
schemes

 For classification and regression problems

 Results can be written into file or database

 Evaluation options: cross-validation,
learning curve
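Cross-validation, the first evaluation option above, splits the data into k folds and tests on each fold in turn while training on the rest. The Python sketch below shows the mechanics with a majority-class baseline as the learning scheme. It does no shuffling or stratification, and all names are hypothetical; it is not the Experimenter's code.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for each of k folds over n instances."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f_idx, fold in enumerate(folds) if f_idx != i for j in fold]
        yield train, test


def majority_baseline(train_x, train_y, test_x):
    """Always predict the most frequent training label."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return [most_common] * len(test_x)


def cross_validate(instances, labels, k, fit_predict):
    """Return the accuracy of `fit_predict` estimated by k-fold cross-validation."""
    correct = 0
    for train, test in k_fold_indices(len(labels), k):
        predictions = fit_predict([instances[i] for i in train],
                                  [labels[i] for i in train],
                                  [instances[i] for i in test])
        correct += sum(p == labels[i] for p, i in zip(predictions, test))
    return correct / len(labels)


instances = list(range(10))             # placeholder feature values
labels = ["yes"] * 7 + ["no"] * 3
accuracy = cross_validate(instances, labels, k=5, fit_predict=majority_baseline)
```

Any other learning scheme can be compared on the same folds by passing a different `fit_predict` function with the same three arguments.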
Resources:
 WEKA is available at
http://www.cs.waikato.ac.nz/ml/weka

 Also has a list of projects based on WEKA

 Tutorial:
http://prdownloads.sourceforge.net/weka/weka.ppt
