Attribute Oriented Analysis

The document discusses attribute-oriented analysis, including attribute generalization which simplifies attributes with many values, attribute relevance analysis to identify the most important attributes, and class comparison to distinguish properties between a target class and contrasting classes. It also covers different types of attributes like nominal, binary, ordinal and numeric attributes. Statistical measures explored include measures of central tendency like mean, median and mode, and dispersion measures like range, quartiles and standard deviation to describe how data is distributed.

Uploaded by

Joveliza Angelia Trongcoso

Attribute-Oriented Analysis

Presenter: Joveliza A. Trongcoso

Topic Outline

• Attribute generalization
• Attribute relevance
• Class comparison
• Statistical measures
• Experiments with Weka – using filters and statistics

Data Objects

• A data object represents an entity.
• Example: in a sales database, the objects may be customers, store items, and sales.
• Data objects are typically described by attributes.
• If the data objects are stored in a database, they are data tuples. That is, the rows of a database correspond to the data objects, and the columns correspond to the attributes.
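
As a minimal sketch of the row/column correspondence, the Python snippet here models a few customer objects as tuples. The `Customer` type and the sample values are hypothetical; only the attribute names customer_ID, name, and address are taken from the slides.

```python
from typing import NamedTuple


class Customer(NamedTuple):
    """One data object (a row / data tuple); each field is an attribute (a column)."""
    customer_ID: int
    name: str
    address: str


# Hypothetical sample rows, invented purely for illustration.
customers = [
    Customer(1, "Ana", "12 Mango St"),
    Customer(2, "Ben", "7 Acacia Ave"),
]

# The attributes are the column names shared by every data object.
attributes = Customer._fields

# Reading one column gives the observed values of a single attribute.
names = [c.name for c in customers]

print(attributes)
print(names)
```

Each `Customer` instance plays the role of a data tuple, and `names` is what the slides later call the observations for the attribute name.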

What is an Attribute?

• A data field representing a characteristic or feature of a data object.
• The nouns attribute, dimension, feature, and variable are often used interchangeably in the literature.
• Attributes describing a customer object can include, for example, customer_ID, name, and address.

What is an Attribute? (cont’d)

• Observations – the observed values for a given attribute
• Attribute vector (or feature vector) – the set of attributes used to describe a given object
• Univariate – a distribution of data involving one attribute (or variable)
• Bivariate – a distribution involving two attributes

Types of Attributes

1. Nominal
2. Binary
3. Ordinal
4. Numeric

Nominal Attributes

• The values of a nominal attribute are symbols or names of things.

• Nominal attributes are also referred to as categorical.

Example: hair_color and marital_status

Binary Attributes

• A nominal attribute with only two categories or states: 0 or 1.
• 0 typically means that the attribute is absent; 1 means that it is present.
• Binary attributes are referred to as Boolean if the two states correspond to true and false.

Example: the attribute smoker describing a patient object. 1 indicates that the patient smokes, while 0 indicates that the patient does not.

Binary Attributes (cont’d)

• A binary attribute is symmetric if both of its states are equally valuable and carry the same weight.
• A binary attribute is asymmetric if the outcomes of the states are not equally important.

Ordinal Attributes

• An attribute with possible values that have a meaningful order or ranking among them, although the magnitude between successive values is not known.

Example:

drink_size (small, medium, and large)

professional_rank (private, private first class, specialist, corporal, and sergeant)

Ordinal Attributes (cont’d)

• Ordinal attributes are often used in surveys for ratings.

Example: customer satisfaction with the following ordinal categories:

0: very dissatisfied,
1: somewhat dissatisfied,
2: neutral,
3: satisfied, and
4: very satisfied.
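
The satisfaction coding can be sketched in Python as follows. The survey responses are made up for illustration. The point is that the codes preserve order, so a median is meaningful, while the gaps between codes are not assumed to be equal.

```python
from statistics import median_low

# Ordinal codes for customer satisfaction, as listed on the slide.
SATISFACTION = {
    "very dissatisfied": 0,
    "somewhat dissatisfied": 1,
    "neutral": 2,
    "satisfied": 3,
    "very satisfied": 4,
}
LABELS = {code: label for label, code in SATISFACTION.items()}

# Hypothetical survey responses (not from the slides).
responses = ["satisfied", "neutral", "very satisfied", "satisfied", "somewhat dissatisfied"]

codes = sorted(SATISFACTION[r] for r in responses)

# median_low always returns an actual data value, so it maps back to a label.
median_label = LABELS[median_low(codes)]

print(codes)         # [1, 2, 3, 3, 4]
print(median_label)  # satisfied
```

`median_low` is used instead of `median` so that the result is always one of the defined categories rather than a value halfway between two codes.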

Numeric Attributes

• A numeric attribute is quantitative; that is, it is a measurable quantity, represented in integer or real values.
• Numeric attributes can be interval-scaled or ratio-scaled.

Numeric Attributes (cont’d)

Interval-Scaled Attributes
• measured on a scale of equal-size units
• values have order
• allow us to compare and quantify the difference between values

Example:
• temperatures of 20°C and 15°C (a difference of 5 degrees)
• calendar dates 2010 and 2022 (12 years apart)
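
A small worked example may help show why differences, but not ratios, are meaningful on an interval scale. The sketch converts the two slide temperatures to Kelvin (a scale with a true zero point); the helper name `celsius_to_kelvin` is introduced here only for illustration.

```python
def celsius_to_kelvin(c: float) -> float:
    """Kelvin has a true zero point; Celsius does not."""
    return c + 273.15


t_high, t_low = 20.0, 15.0  # the two temperatures from the slide, in °C

# Differences are meaningful on an interval scale and survive a change of zero point.
diff_c = t_high - t_low
diff_k = celsius_to_kelvin(t_high) - celsius_to_kelvin(t_low)

# Ratios are not: the "ratio" depends on where zero happens to sit.
ratio_c = t_high / t_low
ratio_k = celsius_to_kelvin(t_high) / celsius_to_kelvin(t_low)

print(diff_c, round(diff_k, 2))
print(round(ratio_c, 3), round(ratio_k, 3))
```

The difference is 5 degrees on both scales, while the two ratios disagree, which is why saying that 20°C is "1.33 times as warm" as 15°C is not meaningful.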

Numeric Attributes (cont’d)

Ratio-Scaled Attributes
• a numeric attribute with an inherent zero-point, so a value can be described as a multiple (or ratio) of another
• values are ordered, and we can also compute the difference between values, as well as the mean, median, and mode

Example:
• count attributes such as years_of_experience (e.g., the objects are employees)
• number_of_words (e.g., the objects are documents)
• additional examples include attributes that measure weight, height, and latitude and longitude coordinates (e.g., when clustering houses)

Attribute Generalization

• Attribute generalization is based on the following rule: “if there is a large set of distinct values for an attribute, then a generalization operator should be selected and applied to the attribute.”
• Nominal attributes: the operation defines a sub-cube by performing a selection on two or more dimensions (dropping condition).
• Structured attributes: climbing up the concept hierarchy is used, replacing the value in an attribute-value pair with a more general one. On a data cube, the operation performs aggregation, either by climbing up a concept hierarchy for a dimension or by dimension reduction.

Attribute Generalization (cont’d)

Example (set representation):

Generalization: Y1 = {x2 = hot, x3 = high, x4 = weak}, that is, X1 with its first and last attributes dropped.
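
Both generalization operators can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not give the full tuple X1, so the values of x1 and x5 are invented, and the city-to-country hierarchy and the helper names `drop_conditions` and `climb` are hypothetical.

```python
# Hypothetical tuple X1: the slide only shows x2..x4, so the values of
# x1 and x5 below are invented for illustration.
X1 = {"x1": "sunny", "x2": "hot", "x3": "high", "x4": "weak", "x5": "no"}


def drop_conditions(obj: dict, attrs: set) -> dict:
    """Dropping condition: remove the attribute-value pairs named in attrs."""
    return {k: v for k, v in obj.items() if k not in attrs}


# Hypothetical one-level concept hierarchy (city -> country).
CITY_TO_COUNTRY = {"Cebu": "Philippines", "Davao": "Philippines", "Osaka": "Japan"}


def climb(value: str, hierarchy: dict) -> str:
    """Concept-hierarchy climbing: replace a value with its more general parent."""
    return hierarchy.get(value, value)  # leave unknown values unchanged


Y1 = drop_conditions(X1, {"x1", "x5"})
print(Y1)  # {'x2': 'hot', 'x3': 'high', 'x4': 'weak'}

cities = ["Cebu", "Davao", "Osaka", "Cebu"]
countries = [climb(c, CITY_TO_COUNTRY) for c in cities]
print(len(set(cities)), "->", len(set(countries)), "distinct values")  # 3 -> 2 distinct values
```

Dropping conditions shortens the description of each object, while climbing the hierarchy keeps the attribute but reduces the number of distinct values it can take.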

Attribute Relevance

Attribute relevance analysis is done in order to filter out statistically irrelevant or weakly relevant attributes, and to retain or even rank the most relevant attributes for the descriptive mining task at hand.
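
The slides do not name a specific relevance measure. As one possible illustration, the sketch here ranks attributes by information gain, a commonly used measure; this choice, the function names, and the student records are all assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting `rows` on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical student records (invented for illustration).
rows = [
    {"major": "CS", "gender": "F", "status": "graduate"},
    {"major": "CS", "gender": "M", "status": "graduate"},
    {"major": "Math", "gender": "F", "status": "undergraduate"},
    {"major": "Math", "gender": "M", "status": "undergraduate"},
]

ranking = sorted(
    ["major", "gender"],
    key=lambda a: information_gain(rows, a, "status"),
    reverse=True,
)
print(ranking)  # ['major', 'gender']
```

In this toy data, major separates the two classes perfectly (gain of 1 bit) while gender tells us nothing (gain of 0), so gender would be the attribute to filter out.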

Class Comparison

• Class discrimination or comparison (hereafter referred to as class comparison) mines descriptions that distinguish a target class from its contrasting classes.
• The target and contrasting classes must be comparable, in the sense that they share similar dimensions and attributes.

Class Comparison (cont’d)

Example: a class comparison describing the graduate and undergraduate students at Big University.

Mining a class comparison. Suppose that you would like to compare the general properties of the graduate and undergraduate students at Big University, given the attributes name, gender, major, birth_place, birth_date, residence, phone#, and gpa.
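
One way to picture the outcome of such a comparison is to put the distribution of a generalized attribute side by side for the two classes. The sketch assumes the tuples have already been generalized to (status, major) pairs and that contrast is measured as a percentage share within each class; the records, the status attribute, and the function name `count_percent` are hypothetical.

```python
from collections import Counter

# Hypothetical, already-generalized student tuples: (status, major).
students = [
    ("graduate", "Science"),
    ("graduate", "Science"),
    ("graduate", "Business"),
    ("undergraduate", "Science"),
    ("undergraduate", "Business"),
    ("undergraduate", "Business"),
    ("undergraduate", "Business"),
]


def count_percent(records, target_class):
    """Share (in %) of each generalized value within one class."""
    values = [major for status, major in records if status == target_class]
    counts = Counter(values)
    return {v: 100 * n / len(values) for v, n in counts.items()}


target = count_percent(students, "graduate")         # target class
contrast = count_percent(students, "undergraduate")  # contrasting class

# Put the two distributions side by side for each generalized value.
for major in sorted(set(target) | set(contrast)):
    print(f"{major:10s} grad {target.get(major, 0):5.1f}%  undergrad {contrast.get(major, 0):5.1f}%")
```

Values whose share differs strongly between the two columns are the ones that help distinguish the target class from the contrasting class.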

Class Comparison (cont’d)

This data mining task can be expressed in DMQL as follows:

[DMQL query not reproduced: it appeared as an image in the slides.]

Statistical Description of Data

What

1. Measures of central tendency
• Mean, median, mode
• Location of the center of a data distribution
• Where do most of the attribute's values fall?
2. Dispersion measures
• Range, quartiles, interquartile range, five-number summary and box plots, variance and standard deviation
• Describe how the data are spread out

Why

• Basic statistical descriptions are used in data analysis to get an overall picture of the data.
• The statistical measures can reveal issues such as extreme outliers and large deviations in the values of attributes.

What is an Outlier?

• A data value that differs significantly from the other values.
• Outliers affect the mean of the data but have little effect on the median or mode.
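
The sensitivity of the mean to outliers can be checked with the salary values listed under Measures of Central Tendency. Treating 110 as the outlier is an assumption made for this sketch.

```python
from statistics import mean, median

# Salary values (in thousands of dollars) from the central-tendency example.
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]
without_outlier = [s for s in salaries if s != 110]  # drop the extreme value

mean_shift = mean(salaries) - mean(without_outlier)
median_shift = median(salaries) - median(without_outlier)

print(round(mean_shift, 2))  # 4.73
print(median_shift)          # 2.0
```

Removing a single extreme value moves the mean by about $4,730 but the median by only $2,000.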

Measures of Central Tendency

Example: We have the following values for salary (in thousands of dollars): 30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110.

Mean – Average value of a numeric attribute.
• Mean salary is $58,000.

Median – Middle value of a numeric attribute.
• Sort the values in increasing order.
• If N is odd, the median is the middle value of the ordered set.
• If N is even, the median is the average of the two middlemost values.
• Median is $54,000.

Mode – Most common value of an attribute.
• It can be determined for qualitative and quantitative attributes.
• The data from the example are bimodal.
• Modes are $52,000 and $70,000.
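
The three figures for the salary example can be reproduced with Python's standard `statistics` module (`multimode` requires Python 3.8 or later). This is a quick check, not part of the original slides.

```python
from statistics import mean, median, multimode

# Salary values in thousands of dollars, as given in the example.
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

print(mean(salaries))       # 58
print(median(salaries))     # 54.0
print(multimode(salaries))  # [52, 70]
```

With N = 12 (even), the median is the average of the 6th and 7th sorted values, (52 + 56) / 2 = 54.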

Dispersion Measures

Example: We take the data for any attribute X, sorted in increasing numeric order.

Range – The difference between the largest and smallest values of the attribute.

Quantiles – Points taken at regular intervals of the data distribution, dividing it into essentially equal-size consecutive sets.
• 2-quantile – the data point dividing the lower and upper halves of the data (the median)
• 4-quantiles – three data points that divide the data into four equal parts (the quartiles)
• 100-quantiles – 99 data points that divide the data into 100 equal parts (the percentiles)
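
As a rough illustration, the range and the q-quantiles can be computed with `statistics.quantiles` from the Python standard library, reusing the salary values from the central-tendency example. Note that this function interpolates between data points, so its cut points may differ slightly from values read off by hand.

```python
from statistics import quantiles

salaries = sorted([30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110])

value_range = salaries[-1] - salaries[0]

# n=2 yields one cut point (the median); n=4 yields three (the quartiles);
# n=100 yields 99 (the percentiles).
median_cut = quantiles(salaries, n=2)
quartiles = quantiles(salaries, n=4)
percentiles = quantiles(salaries, n=100)

print(value_range)       # 80
print(median_cut)        # [54.0]
print(len(quartiles))    # 3
print(len(percentiles))  # 99
```

In general, dividing the data into q parts always takes q − 1 cut points.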

Dispersion Measures (Quartile)

[Figure: a plot of the data distribution for an attribute X.]

• First quartile Q1 – 25th percentile. Cuts off the lowest 25% of the data.
• Second quartile Q2 – 50th percentile. This is the median, which gives the center of the data distribution.
• Third quartile Q3 – 75th percentile. Cuts off the lowest 75% (or highest 25%) of the data.

The distance between Q1 and Q3 gives the range covered by the middle half of the data. This distance is called the interquartile range: IQR = Q3 − Q1.
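
A minimal sketch of the interquartile range and the five-number summary, again using the salary values from the central-tendency example. It relies on the default "exclusive" method of `statistics.quantiles`; other quartile conventions give somewhat different Q1 and Q3 for the same data.

```python
from statistics import quantiles

salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

# Default method="exclusive"; other conventions give slightly different Q1/Q3.
q1, q2, q3 = quantiles(salaries, n=4)
iqr = q3 - q1

five_number_summary = (min(salaries), q1, q2, q3, max(salaries))

print(five_number_summary)  # (30, 47.75, 54.0, 68.25, 110)
print(iqr)                  # 20.5
```

The five-number summary (minimum, Q1, median, Q3, maximum) is exactly what a box plot draws.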

Experiments with Weka using Filters
