Data Analysis and Variable Classification

The document discusses different types of variables and how to classify data. There are two main types of variables: qualitative, which describe qualities in a non-numerical format, and quantitative, which are numerical. Qualitative variables can be categorical and measured on nominal or ordinal scales, while quantitative variables can be discrete, continuous, interval or ratio. The document provides examples of each variable type and recommends ways to analyze and represent the different data types, such as using frequency tables for categorical data or histograms for continuous data.

Uploaded by

Maruthi Maddipatla

Overview

Describing and Interpreting Data

The way you analyze data depends on the type of data/variables that you are evaluating.
Data can be classified in several ways.

Variable
 A variable is an item of data.
 Examples of variables include gender, investment type, test scores, and weight.
The values of these variables vary from one observation to another.

Types/Classifications of Variables
 Qualitative: Non-numerical quality
 Quantitative: Numerical
 Discrete: counts
 Continuous: measures

Qualitative Data
 This data describes the quality of something in a non-numerical format.
 Counts can be applied to qualitative data, but you cannot measure this type of variable, and nominal
categories have no natural order.
Examples are gender, marital status, geographical region of an organization, job title, etc.
 Qualitative data is usually treated as Categorical Data.
With categorical data, the observations can be sorted into non-overlapping categories according to a
characteristic.
 For example, shirts can be sorted according to color; the characteristic 'color' can have non-
overlapping categories: white, black, red, etc. People can be sorted by gender with
categories male and female.
 Categories should be chosen carefully since a bad choice can prejudice the outcome. Every
value of a data set should belong to one and only one category.
 Measurement Scale
 Nominal: classifies with no ranking (e.g. color, investment type...)
 Ordinal: classifies with ranking (e.g. product satisfaction, grades…)
 Analyze qualitative data using:
 Frequency tables, Contingency tables (for 2 variables)
 Modes - most frequently occurring
 Graphs: Bar Charts, Pie Charts, Pareto Charts
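
The frequency table and mode listed above can be sketched in a few lines of Python using only the standard library. The list of shirt colors is made-up illustration data, not taken from this handout.

```python
from collections import Counter

# Hypothetical observations of a nominal variable (illustration only).
colors = ["red", "blue", "red", "white", "black", "red", "blue", "white"]

# Frequency table: each observation falls into exactly one category.
freq = Counter(colors)

# Mode: the most frequently occurring category.
mode, mode_count = freq.most_common(1)[0]

# Print the categories from most to least frequent.
for category, count in freq.most_common():
    print(f"{category:<6} {count}")
print("mode:", mode)
```

Only counting is used here, which is all that a nominal scale supports.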

Quantitative Data
 Quantitative or numerical data arise when the observations are frequencies or measurements.
 Discrete Data
 The data are said to be discrete if the measurements are integers (e.g. number of employees
of a company, number of incorrect answers on a test, number of participants in a
program…)
 Continuous Data
 The data are said to be continuous if the measurements can take on any value, usually
within some range (e.g. weight). Age and income are continuous quantitative variables.
For continuous variables, arithmetic operations such as differences and averages make
sense.
 Analysis can take almost any form:
 Create groups or categories and generate frequency tables.
 Effective graphs include: Histograms, Stem-and-Leaf plots, Dot Plots, Box plots, and
XY Scatter Plots (2 variables).
 All descriptive statistics can be applied.

Goodson/ sln12 1
 Measurement Scale
 Interval: ordered, and the difference between values is meaningful (e.g. standardized scores...)
 Ratio: ordered, the difference between values is meaningful, and the scale has a true 0, so ratios of
values are also meaningful (e.g. weight)

Note: Some “quantitative” variables can be treated only as ranks; they have a natural order, but these
values are not strictly measured (ordinal data). Examples are: 1) age group (taking the values child,
teen, adult, senior), and 2) Likert Scale data (responses such as strongly agree, agree, neutral, disagree,
strongly disagree). For these variables, the distinction between adjacent points on the scale is not
necessarily the same, and the ratio of values is not meaningful.
 Analyze using:
 Frequency tables
 Mode, Median, Quartiles
 Graphs: Bar Charts, Dot Plots, Pie Charts, and Line Charts (2 variables)
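
As a rough sketch of how a median can be found for ordinal data, the Python below sorts Likert responses by their position on the scale and picks the middle one. The responses are hypothetical, and the helper names (`SCALE`, `ranked`) are chosen here for illustration.

```python
# Likert labels in their natural order, lowest to highest agreement.
SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

# Hypothetical survey responses (illustration only).
responses = ["agree", "neutral", "strongly agree", "agree", "disagree",
             "agree", "neutral"]

# Sort by position on the scale; the labels are ranks, not measurements.
ranked = sorted(responses, key=SCALE.index)

# With an odd number of responses, the median is the middle ranked label.
median_label = ranked[len(ranked) // 2]
print(median_label)
```

The sketch deliberately avoids averaging rank codes, since the distance between adjacent points on the scale is not necessarily the same.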

Tables

Frequency Table/Frequency Distribution

A frequency or relative frequency table is used to summarize categorical (nominal and ordinal) data. It can
also be used to summarize continuous data when the data set has been divided into meaningful groups.

Count the number of observations that fall into each category. The number associated with each category
is called the frequency, and the collection of frequencies over all categories gives the frequency
distribution of that variable. Generally, a frequency distribution has 5 to 15 classes.
 It presents data in a useful form and allows for a visual interpretation.
 It enables analysis of the data set, including where the data are concentrated / clustered,
the range of values, and any extreme values.
Frequency Table for Qualitative Data

Color Preferences of Customers

Frequency Distribution for Quantitative Data

Table 1
Frequency Distribution
of Time (min)

Time    Count
110     1
115     2
120     4
125     3
130     5
135     3
140     4
145     2
150     1

Note Table 1: There are 9 classes. The frequency of the first class is 1; i.e. there is 1 value within the
class; the class has a midpoint of 110.

The relative frequency is a number which describes the proportion of observations falling in a given
category. Instead of counts, we report relative frequencies or percentages.
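
As a small worked example, the relative frequencies for the Time data in Table 1 can be computed as follows. This is a sketch in Python; the variable names are chosen here for illustration.

```python
# Counts from Table 1 (class midpoint in minutes -> frequency).
table1 = {110: 1, 115: 2, 120: 4, 125: 3, 130: 5,
          135: 3, 140: 4, 145: 2, 150: 1}

total = sum(table1.values())  # total number of observations

# Relative frequency = class frequency / total number of observations.
relative = {midpoint: count / total for midpoint, count in table1.items()}

for midpoint, share in relative.items():
    print(f"{midpoint:>4} {share:6.1%}")
```

The relative frequencies always sum to 1 (100%), which gives a quick check on the arithmetic.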

CEO Compensation (x$1 mil.)

Contingency Table
A contingency table cross tabulates data using two or more categorical variables to allow for analysis of
relationships between the variables.

Table C: Employee Time at Company (. 3 yrs.) by Prior Related Experience Rating

Count of Prior Related Exp. Rating, by Stayed3Yrs

Prior Related Exp. Rating    No    Yes    Grand Total
Very Good                     8      6             14
Good                         16     15             31
Fair                          8      9             17
Minimal                       2      2              4
Grand Total                  34     32             66
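
One way to read a contingency table is to compare rows on a common footing. The Python sketch below recomputes the margins of Table C and the share of each rating group in the Yes column; the names `table_c` and `yes_share` are illustrative.

```python
# Counts from Table C: rating -> (Stayed3Yrs = No, Stayed3Yrs = Yes).
table_c = {
    "Very Good": (8, 6),
    "Good": (16, 15),
    "Fair": (8, 9),
    "Minimal": (2, 2),
}

# Row totals (the "Grand Total" column).
row_totals = {rating: no + yes for rating, (no, yes) in table_c.items()}

# Share of each rating group that falls in the Yes column.
yes_share = {rating: yes / row_totals[rating]
             for rating, (no, yes) in table_c.items()}

# Column totals and grand total (the "Grand Total" row).
col_no = sum(no for no, _ in table_c.values())
col_yes = sum(yes for _, yes in table_c.values())
grand_total = col_no + col_yes

for rating, share in yes_share.items():
    print(f"{rating:<10} {share:.0%} of {row_totals[rating]}")
```

Row percentages make the groups comparable even though the group sizes differ (4 employees rated Minimal versus 31 rated Good).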

Graphs
Note: Excel will create any graph that you specify, even if the graph that you select is not appropriate for
the data. Remember to consider the type of data that you have before selecting your graph.

Graphs Used for Categorical/ Qualitative Data

Pie Charts
A circle is divided proportionately and shows what percentage of the whole falls into each category. The
size of each slice of the pie varies according to the percentage in each category.
 These charts are simple to understand.
 They convey information regarding the relative size of groups more readily than does a table.

Pie Chart of Color Preferences: Red 44%, Blue 32%, Yellow 16%, Green 8%

Bar Charts
Bar charts also show percentages in various categories and allow comparison between categories.
 The vertical scale is frequencies, relative frequencies, or percentages.
 The horizontal scale shows categories.
 Consider the following in constructing bar charts.
 all bars should have the same width
 leave gaps between the bars (because there is no connection between them)
 bars can be in any order.
 Bar charts can be used to represent two categorical variables simultaneously.

Bar Chart: Color Preference of Customers (vertical axis: N, 0 to 15; horizontal axis: Color, in the order Red, Blue, Yellow, Green)

As presented above, the bar chart is also called a Pareto chart because the vertical bars are plotted in
descending order by frequency (i.e. red is the most frequent selection … green is the least frequent).
Pareto charts are used to separate the “vital few” from the “trivial many.”
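
The Pareto ordering can be sketched in Python as a sort by descending frequency. The percentages are the color preference shares as read from the pie chart; the running cumulative percentage is a common addition to Pareto charts that this handout does not describe, so treat it as an assumption.

```python
# Color preference percentages, as read from the pie chart.
shares = {"Green": 8, "Yellow": 16, "Red": 44, "Blue": 32}

# Pareto order: categories sorted by descending frequency.
pareto = sorted(shares.items(), key=lambda item: item[1], reverse=True)

# Running cumulative percentage (an assumed extra, see the note above).
cumulative = []
running = 0
for color, pct in pareto:
    running += pct
    cumulative.append((color, pct, running))

for color, pct, cum in cumulative:
    print(f"{color:<7} {pct:>3}% {cum:>4}%")
```

In this ordering the first two categories (Red and Blue) already account for 76% of the preferences, which is the "vital few" idea in miniature.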
Graphs for Measured/Continuous Quantitative Data
 Stem and Leaf
 Histograms
 Percentage Polygons
 XY Scatter Charts (2 variables)
 Line Graphs (e.g. time series)
 Box plots

Stem and Leaf Plots

A stem-and-leaf plot puts data into groups (called stems) so that the values within each group (the leaves)
branch out to the right on each row. The advantage of a stem and leaf plot is that it utilizes the data as a
part of the graph.

Note the first line. The first stem is 10. It is followed by four leaves, each 9. This means that the
original data has four values of 10.9.
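
A minimal sketch of how such a plot can be built, assuming data recorded to one decimal place so that the whole-number part is the stem and the final digit is the leaf. The values are hypothetical; they include four copies of 10.9 only to mirror the note above.

```python
from collections import defaultdict

# Hypothetical data with one decimal place (illustration only).
values = [10.9, 10.9, 10.9, 10.9, 11.2, 11.5, 12.0, 12.3, 12.3]

plot = defaultdict(list)
for v in sorted(values):
    # Work in tenths so the stem is the whole part and the leaf the last digit.
    stem, leaf = divmod(round(v * 10), 10)
    plot[stem].append(leaf)

# One row per stem, with the leaves branching out to the right.
for stem in sorted(plot):
    print(f"{stem:>3} | {''.join(str(leaf) for leaf in plot[stem])}")
```

Each row keeps the actual digits of the data, which is the advantage of this plot over a histogram.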

Histograms

Histograms show the frequency distributions of continuous variables. They look similar to Bar Charts,
but they are drawn without gaps between the bars because the x-axis is used to represent the class
intervals (on a continuum). However, many current software packages (e.g. Excel) do not easily make this
distinction.
 The data is divided into non-overlapping intervals (usually 5 to 15 intervals).
 Intervals generally have the same length.
 The number of values in each interval is counted (the class frequency).
 Sometimes relative frequencies or percentages are used. (Divide the class frequency by the total
number of observations.)
 Rectangles are drawn over each interval. (The area of each rectangle = relative frequency of the interval.
If intervals are not all of the same length, then heights have to be scaled so that each area is
proportional to the frequency for that interval.)
 Shifts in data concentration may show up when different class boundaries are chosen. As the size of
the data set increases, the impact of changes in the class boundaries is greatly reduced.
 When comparing two or more groups with different sample sizes, you must use either a relative
frequency or a percentage distribution.
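
The scaling rule for unequal intervals can be illustrated with a short Python sketch. The class intervals and counts are hypothetical; the last interval is deliberately twice as wide as the others.

```python
# Hypothetical class intervals (lower, upper) and class frequencies.
classes = [((0, 10), 4), ((10, 20), 10), ((20, 40), 6)]

total = sum(count for _, count in classes)

heights = []
for (lower, upper), count in classes:
    rel_freq = count / total      # area of the rectangle
    width = upper - lower
    height = rel_freq / width     # height = area / width
    heights.append(height)
    print(f"[{lower}, {upper})  rel. freq. {rel_freq:.2f}  height {height:.3f}")
```

Plotting the raw count of 6 for the wide interval would make it look as prominent as a 10-unit interval holding 6 values; dividing by the width keeps each area proportional to the frequency.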

Histogram of Time for CEO Compensation

Note: Excel does not give midpoints; it uses bins, each of which represents a range of values.
 The upper boundary of a bin is explicitly given; no value in the bin exceeds the upper
boundary.
 All the values in the bin are greater than the lower boundary.
 See posted examples for constructing histograms with Excel.
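
The bin convention in the note above (greater than the lower boundary, not exceeding the upper boundary) can be sketched in Python as follows. The boundaries and data are hypothetical, and the extra final slot for values above the largest boundary is an assumption of this sketch.

```python
from bisect import bisect_left

# Hypothetical upper bin boundaries and data (illustration only).
upper_bounds = [10, 20, 30]
data = [5, 10, 11, 20, 20, 25, 30]

# One count per bin, plus an extra slot for values above the last boundary.
counts = [0] * (len(upper_bounds) + 1)
for x in data:
    # Index of the first boundary >= x, so a value equal to a boundary
    # stays in the bin that the boundary closes.
    counts[bisect_left(upper_bounds, x)] += 1

print(counts)
```

Here the value 10 is counted in the first bin and 20 in the second, because a value equal to an upper boundary does not exceed it.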

Frequency Polygon for CEO Compensation

Use midpoints to represent the data.

Ogive for CEO Compensation

Cumulative percentages are plotted along the Y axis.
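
The CEO compensation figures are not reproduced in this text, so the sketch below computes the cumulative percentages for an ogive from the Time data in Table 1 instead. An ogive usually plots each cumulative percentage at the upper boundary of its class; Table 1 lists only midpoints, so the classes are labeled by midpoint here.

```python
from itertools import accumulate

# Class midpoints (min) and counts from Table 1.
midpoints = [110, 115, 120, 125, 130, 135, 140, 145, 150]
counts = [1, 2, 4, 3, 5, 3, 4, 2, 1]

total = sum(counts)

# Running totals, then expressed as percentages of all observations.
cum_counts = list(accumulate(counts))
cum_pct = [100 * c / total for c in cum_counts]

for m, pct in zip(midpoints, cum_pct):
    print(f"{m:>4} {pct:6.1f}%")
```

The last cumulative percentage is always 100, and the class where the curve crosses 50% contains the median.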

XY Scatter Chart
This type of chart should be used with two variables when both of the variables are quantitative and
continuous.
 Plot pairs of values using the rectangular coordinate system to examine the relationship between the two
variables.

Scatter Chart: Worker-Hours by Lot Size (vertical axis: Hours, 0 to 180; horizontal axis: Lot Size, 0 to 100)

A Line Chart is similar to the scatter chart; however, it can be used when the values of the independent
variable (shown on the horizontal axis) are ranked values (i.e. they do not have to be continuous
variables). It is also used for time series plots.

Basic Principles for Constructing All Plots
 Data should stand out clearly from background.
 Keep the graph as simple as possible.
 The information should be clearly labeled and include:
 title
 axes, bars, pie segments, etc. - include units that are needed to interpret data
 axis labels
 scale including starting points. The vertical axis will typically begin at 0.
 Sources of data should be identified, as appropriate.
 Do not clutter the graphs with unnecessary information or graphical components.
 Do not put too much information or data on one graph.
 Sometimes, you have to try several approaches before selecting an appropriate graph.

Some practical advice for constructing graphs includes the following.

 Every bit of ink on a graphic requires a reason. And nearly always that reason should be that the ink
presents new information. In most cases, non-data ink clutters up the data. Avoid content-free
decoration, including chart junk.
 Type should be clear, precise, and modest. Usually - type in upper and lower case.
 The grid should usually be muted or completely suppressed so that its presence is only implicit - lest
it compete with the data. Dark lines are chart junk. They carry no information, clutter up the graphic
and generate graphic activity unrelated to data information.
 The representation of numbers, as physically measured on the surface of the graphic itself, should be
directly proportional to the numerical quantities represented.
 Clear, detailed, and thorough labeling should be used to defeat graphical distortion and ambiguity.
Label important events in the data.
 Show data variation, not design variation.
 Graphical elegance is often found in simplicity of design and complexity of data.

To describe / interpret the data, consider the following.

 Shape of the Distribution
 Symmetry
 Modality: most frequently occurring value
 Unimodal or bimodal or uniform
 Skewness
 Centrality – mid range of values
 Spread – range of values
 Extreme values - outliers
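
A few of these descriptions can be computed directly. The Python sketch below uses the standard `statistics` module on the Time data from Table 1, with every observation represented by its class midpoint, which is an approximation to the raw data. Comparing the mean with the median is only a rough indication of symmetry or skewness.

```python
import statistics

# Table 1 counts (class midpoint in minutes -> frequency).
table1 = {110: 1, 115: 2, 120: 4, 125: 3, 130: 5,
          135: 3, 140: 4, 145: 2, 150: 1}

# Expand the table into one value per observation (midpoint approximation).
values = [m for m, count in table1.items() for _ in range(count)]

mean = statistics.mean(values)            # centrality
median = statistics.median(values)        # centrality
modes = statistics.multimode(values)      # two entries would suggest bimodality
value_range = max(values) - min(values)   # spread

print("mean:", mean, "median:", median, "modes:", modes, "range:", value_range)
```

For this data the mean and median coincide and there is a single mode, which is consistent with a roughly symmetric, unimodal distribution.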

In interpreting graphs, consider:

 Horizontal and vertical scales: what is the relationship? Are the distances between, for example, 10
and 20 the same on each axis? If not, the interpretation may be distorted.
 The center point - of particular importance in comparing two histograms. Look at the starting point
of the vertical scale - does it start at 0? How could this affect the interpretation of the data?
