INTRODUCTION TO INFORMATION VISUALIZATION
BMI 6340: Health Information Visualization and Visual Analytics
Todd R. Johnson, PhD
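ANSCOMBE’S QUARTET (1973)
Statistically identical data sets: all four share the same
Mean
Variance
Correlation
Regression line
(to two or three decimal places), yet they look very different when graphed.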
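The quartet's point can be checked numerically. The sketch below is a minimal Python example, assuming the standard published values of Anscombe's four data sets (they are not reproduced on the slide); it computes the statistics listed above with only the standard library. Every set prints essentially the same numbers, which is why the data has to be graphed to see how the sets differ.

```python
from statistics import mean, variance

# Anscombe's quartet: sets I-III share the same x values; set IV has its own.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Mean, sample variance, Pearson correlation, and least-squares line."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx  # least-squares regression slope
    return {
        "mean_x": mx, "mean_y": my,
        "var_x": variance(x), "var_y": variance(y),  # sample variances
        "r": sxy / (sxx * syy) ** 0.5,               # Pearson correlation
        "slope": slope, "intercept": my - slope * mx,
    }


for name, (x, y) in QUARTET.items():
    s = summarize(x, y)
    print(f"{name:>3}: mean x={s['mean_x']:.1f} var x={s['var_x']:.1f} "
          f"mean y={s['mean_y']:.2f} var y={s['var_y']:.2f} r={s['r']:.2f} "
          f"line: y = {s['intercept']:.2f} + {s['slope']:.2f}x")
```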
WHAT IS INFORMATION VISUALIZATION?
The use of computer-supported interactive visual representations of abstract data to amplify cognition.
Card, Mackinlay and
Key terms: computer supported; interactive (vs. non-interactive); amplify cognition
Abstract data: no natural physical form (vs. an image of a real object)
Information Visualization? Examples:
Static image?
Zoomable satellite view?
Zoomable view plus click on icons to get more details
Pie chart with slices labeled Sky, Shady side of pyramid, and Sunny side of pyramid
WHAT IS DATA VISUALIZATION?
“The use of visual representation to explore, make sense of, and communicate data.”
Stephen Few, author, scientist
Transforms Data into Information
Syllabus Review
BMI 6340 Health Information Visualization and Visual Analytics
3 Semester Credit Hours
Requires 9-20 hours of effort each week
Learning Objectives
Primary
Learn the theory behind effective
visualizations
Gain practical experience designing
effective visualizations
Secondary
Learn how to use Tableau to create effective
information visualizations
Why Tableau?
Directly supports visualizations as mappings from data to
visual representations of that data
More than any other visualization tool, Tableau allows us
to focus on design instead of the mechanics (e.g., data
manipulation and programming) behind effective
visualizations
Often automatically follows sound theory for creating
visualizations
Produces highly interactive visualizations with little effort
Available free to students and faculty through the Tableau
for Teaching program. Tableau Public is available for free
to everyone.
Prereqs/Coreqs
Access to Internet
Downloading Tableau, data, examples,
readings, etc.
Canvas for online course material and
discussions
Computer capable of running Tableau (Mac or
Windows)
Instructor
Todd R. Johnson, PhD
Office phone: 713-500-3913
860 UCT
7000 Fannin, Suite 600
Houston, TX 77030
Office Hours
Online help sessions held once each week
(see Canvas for schedule and session link)
Otherwise, schedule a one-on-one meeting with the
course TA or me
Online Help Sessions
Q & A Session open to all students
Be sure to have a mic and headphones and be prepared to
share your screen
Questions taken in the following order
Questions about the next assignment due
Questions about most recent lecture
Questions about all other assignments or lectures (past and
future if you are working ahead)
Questions about class in general
Questions about anything else related to visualization in
healthcare, skills needed to help get a job in visualization, etc.
Grading
Requirement | Percentage of Total Points
Questions embedded in Weekly Review videos and pre-quizzes | 0%
Weekly Review Quizzes | 15%
Homeworks | 40%
Midterm Exam | 10%
Midterm Project | 10%
Term Project | 25%
Total | 100%
Late Assignment
Policy
25% penalty for each day after the due date.
Applies to all graded assignments, quizzes, exams, and
projects
Example:
Due Date: December 1 by midnight
Turned in: December 2 at 12:30am (counted as 1 day late!)
Score: 100 out of 100 points
Grade: 100 - .25*100 = 75 points
Assessments turned in after the 4th day will not be graded.
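The late-policy arithmetic can be sketched in a few lines of Python. These are hypothetical helpers, not part of Canvas. They assume the penalty is taken from the earned score (the example above, with a perfect score, cannot distinguish earned from maximum points), read "December 1 by midnight" as the end of December 1, and use an arbitrary year.

```python
import math
from datetime import datetime, timedelta
from typing import Optional

PENALTY_PER_DAY = 0.25  # 25% penalty for each day late
MAX_DAYS_LATE = 4       # work turned in after the 4th day is not graded


def days_late(due: datetime, submitted: datetime) -> int:
    """Any part of a day after the deadline counts as a full day late."""
    if submitted <= due:
        return 0
    return math.ceil((submitted - due) / timedelta(days=1))


def late_grade(score: float, due: datetime, submitted: datetime) -> Optional[float]:
    """Apply the late penalty to the earned score (an assumption).

    Returns None when the submission is too late to be graded.
    """
    n = days_late(due, submitted)
    if n > MAX_DAYS_LATE:
        return None
    return score * (1 - PENALTY_PER_DAY * n)


# Hypothetical year; "December 1 by midnight" taken as 00:00 on December 2.
due = datetime(2024, 12, 2, 0, 0)
submitted = datetime(2024, 12, 2, 0, 30)
print(days_late(due, submitted), late_grade(100, due, submitted))  # 1 75.0
```

Using a ceiling on the elapsed fraction of a day reproduces the "12:30am counts as 1 day late" rule in the example.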
Solutions
Quizzes and Midterm Exam
Correct answers will be available 4 days after the due
date (due to the late submission policy)
Open one of your attempts to see the correct
answers
Homeworks
Walkthroughs (if available) will be available
approximately 4 days after the homework due date.
Each will appear in the module in which the
homework was assigned.
Weekly Review
Quizzes (15%)
Intended to reinforce weekly material (formative
assessments)
Open book, open notes, and open web. Just do your own work so that
you learn the material.
Online in Canvas
You will have one week to complete the questions from
the time they are posted
You can retake the questions as many times as you like,
but questions are pulled pseudo-randomly from multiple
question banks.
Canvas keeps the highest score
Homeworks (40%)
Usually one per week
Hands-on exercises to create or extend visualizations
I expect you to work individually on these
assignments
You can consult with me or your classmates for help,
especially regarding how to use Tableau, but your
work should be your own
I do expect you to use the internet for help with
Tableau and even design ideas. This is all part of
learning how to learn new concepts and new tools. It
is also essential for getting design inspiration.
Homework Grading
Rubrics
Each homework has a detailed grading rubric
to show exactly how much each element of
the homework is worth.
The TA will include a grade and possible
comments in the rubric for your submission.
Midterm Exam (10%)
Online Canvas Quiz
You can submit as many times as you like
until 4 days after the due date (but beware
late penalty).
Grade is taken from last submission.
Correct answers not shown until after due
date.
You will have approximately one week to do
the exam
Midterm Project
(10%)
Tableau project covering material covered in the
first half of the semester
You have approximately one week to
complete it
Term Project (25%)
Fairly open-ended
Create a Tableau dashboard using real data
(supplied by the instructor).
Dashboard must meet specific user needs
that you are given
You must justify your design
You will have several weeks to do the term
project
Textbook
Plus selected readings
Topics
Theory and practical guidelines for designing
effective visualizations
Tools for creating Information Visualizations
Emphasis on Tableau Data Visualization
Software
Brief discussions and examples using other
information visualization tools
Visual analysis for specific kinds of data
Dashboards
Time-Series Data
Data collected at equal (or nearly equal)
intervals
Is the proportion of visits increasing over time?
Ranking Relations
Is Clinic 2 better or worse
than Clinic 5?
Where does Clinic 4 rank
among all clinics?
Interested in rank order, not
magnitude of differences
Figure: unsorted vs. sorted barcharts. An unsorted barchart does not work well; sorting from low to high supports rank comparisons.
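A minimal sketch of why sorting supports rank questions, assuming made-up scores for five clinics where a higher score is better (the chart's underlying data is not available here):

```python
# Hypothetical clinic scores (illustrative values only; higher = better).
scores = {"Clinic 1": 78, "Clinic 2": 85, "Clinic 3": 71, "Clinic 4": 90, "Clinic 5": 82}

# Sorting from low to high turns each clinic's position into its rank.
ranked = sorted(scores.items(), key=lambda item: item[1])
for position, (clinic, value) in enumerate(ranked, start=1):
    print(position, clinic, value)

# Rank lookup (1 = lowest score) and a pairwise comparison.
rank_of = {clinic: pos for pos, (clinic, _) in enumerate(ranked, start=1)}
print("Clinic 4 rank:", rank_of["Clinic 4"])
print("Clinic 2 better than Clinic 5?", scores["Clinic 2"] > scores["Clinic 5"])
```

In the sorted list, "Where does Clinic 4 rank?" becomes a position read-off rather than a series of comparisons, which is what a sorted barchart does visually.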
Part to Whole
Relations
What proportion
does one value
contribute to a
whole?
Which clinic sees
the highest
proportion of
patients in 2009?
Which clinic sees
the lowest
proportion of
patients in 2010?
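Part-to-whole questions depend on converting counts into shares of a total. A small Python sketch, using hypothetical visit counts for three clinics (not the slide's data):

```python
# Hypothetical visit counts by year and clinic (illustrative values only).
visits = {
    2009: {"Clinic A": 120, "Clinic B": 300, "Clinic C": 180},
    2010: {"Clinic A": 150, "Clinic B": 210, "Clinic C": 240},
}


def proportions(counts):
    """Convert raw counts into each part's share of the whole (shares sum to 1)."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


for year, counts in visits.items():
    shares = proportions(counts)
    highest = max(shares, key=shares.get)
    lowest = min(shares, key=shares.get)
    print(year, {k: round(v, 3) for k, v in shares.items()},
          "highest:", highest, "lowest:", lowest)
```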
Deviations
How does one set of
values differ from a
reference set of
values?
How much did
each unit miss or
exceed its
readmission rate
goal?
Figure: bullet charts, sorted from best to worst
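Deviation questions compare each value with its own reference value. The sketch below uses hypothetical readmission rates and goals (the slide's figures are not reproduced) and assumes a lower readmission rate is better:

```python
# Hypothetical readmission rates and goals, in percent (illustrative only).
units = {
    "Unit 1": {"rate": 12.0, "goal": 10.0},
    "Unit 2": {"rate": 8.5, "goal": 10.0},
    "Unit 3": {"rate": 10.0, "goal": 10.0},
    "Unit 4": {"rate": 14.5, "goal": 12.0},
}

# Deviation = actual - goal; for readmissions a negative deviation beats the goal.
deviations = {name: u["rate"] - u["goal"] for name, u in units.items()}

# Sort from best (furthest below goal) to worst (furthest above goal).
ordered = sorted(deviations.items(), key=lambda item: item[1])
for name, d in ordered:
    status = "beat goal" if d < 0 else "met goal" if d == 0 else "missed goal"
    print(f"{name}: {d:+.1f} points ({status})")
```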
Distributions
How are
quantitative
values spread
across their
range?
What is the
distribution of
Birthweight by
Race?
Correlations
Example: length of stay (LOS) vs. another variable
Multivariate Data
Items described by a
common set of variables
Patients: age, height,
weight, gender, race
Countries: population,
health spending per
person, GDP, etc.
Questions
Which items are alike or
similar? (Which patients
are like me?)
How can we group items?
Which set of variables and
values leads to a particular
outcome?
Figure: Parallel Coordinates Plot
Temporal Event
Relations
Temporal sequence of events
Point events
Events with a duration (start
and end)
Different types of events (e.g.,
admission, surgery, discharge)
Events may take place at non-
uniform time periods
Time periods and actual date and
time of measurement may vary
by subject
Questions
What happens to patients…
EventFlow
Signal Detection and
Quality Improvement
Geographic Data
We will not cover…
Data governance
Data quality, other than the use of
visualization to discover potential quality
issues
Needs analysis (essential for visualizations)
Less common (but still useful) graphs and
tools
Advanced visualization theories: grammar of
graphics, types of visualization tools, etc.
A Closer Look at
Data Visualization
Data → Mapping (Encoding) → Visual Representation → Perception + Knowledge → Information
Data:
Country | Region | Income per Person | Life Expectancy | Population
China | Asia | 9502 | 75 | 1.35 B
Congo | Sub-Saharan Africa | 403 | 50 | 70 M
Mapping (Encoding): Circle + Tooltip; X loc; Y loc; Size; Color
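A mapping can be thought of as a function from data variables to visual properties. The sketch below applies one such mapping to the two rows in the table; the specific assignment (income to X, life expectancy to Y, population to size, region to color, country to the tooltip) is an assumption modeled on Gapminder-style bubble charts, not read from the slide, and `encode` is a hypothetical helper.

```python
# The two rows from the table, with population written out in full.
rows = [
    {"country": "China", "region": "Asia", "income": 9502, "life_exp": 75, "population": 1.35e9},
    {"country": "Congo", "region": "Sub-Saharan Africa", "income": 403, "life_exp": 50, "population": 70e6},
]

# Assumed mapping (encoding): visual property of a circle -> data variable.
MAPPING = {
    "x": "income",
    "y": "life_exp",
    "size": "population",
    "color": "region",
    "tooltip": "country",
}


def encode(row, mapping):
    """Apply a mapping: return the visual properties for one data row."""
    return {channel: row[variable] for channel, variable in mapping.items()}


for row in rows:
    print(encode(row, MAPPING))

# Swapping two assignments changes the picture without changing the data.
swapped = {**MAPPING, "x": "population", "size": "income"}
print(encode(rows[0], swapped))
```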
Data: same China and Congo table as above
Mapping (Encoding): Size, X
Visual Representation
Changing the mapping changes the visual representation
Data: same China and Congo table as above
Mapping (Encoding) → Visual Representation
What is the mapping?
All Kinds of Mappings Are Possible, But Not All Are Good
What’s the mapping?
Figure: bar chart encoded with bar height, bar color, and bar order
Not all mappings are good!
An effective mapping depends on
Characteristics of the data
How we perceive visual objects and relationships
The viewer’s information need(s)
The viewer’s background knowledge
Let’s look at these in detail.
Diagram: Data → Mapping (Encoding) → Visual Representation → Perception + Knowledge → Information
Case Study: Clinic Visits
What’s the Mapping?
Data:
Clinic | Year | Visits (%)
1 | 2008 | 26.3
2 | 2008 | 73.7
1 | 2009 | 23.611
2 | 2009 | 76.389
Mapping: Clinic → Row; Year → Column; Visits (%) → Hindu-Arabic Numeral
Clinic | 2008 | 2009 | 2010 | 2011 | 2012
Clinic 1 | 26.3 | 23.611 | 37.681 | 43.089 | 62.338
Clinic 2 | 73.7 | 76.389 | 62.319 | 56.911 | 37.662
What proportion of patients
visited Clinic 1 in 2009?
Find a specific data point
Easy with the table
Only two clinics
Dates in chronological order
Very precise
Which clinic has a higher
proportion of patient visits in
2012?
Answer: Clinic 1
Comparison of two data points
A little more difficult
Requires cognitive comparison of magnitude
Spatial proximity of the two values makes it easier
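The two questions so far can be written as operations on the table. A minimal Python sketch using the clinic percentages from the table (`lookup` and `higher_in` are hypothetical helper names):

```python
YEARS = [2008, 2009, 2010, 2011, 2012]
# Percentage of patient visits by clinic and year, from the table.
VISITS = {
    "Clinic 1": [26.3, 23.611, 37.681, 43.089, 62.338],
    "Clinic 2": [73.7, 76.389, 62.319, 56.911, 37.662],
}


def lookup(clinic, year):
    """Find a specific data point: one row, one column."""
    return VISITS[clinic][YEARS.index(year)]


def higher_in(year):
    """Compare two data points: which clinic has the larger share that year?"""
    return max(VISITS, key=lambda clinic: lookup(clinic, year))


print(lookup("Clinic 1", 2009))  # 23.611
print(higher_in(2012))           # Clinic 1
```

A lookup touches one cell; the comparison already needs two cells plus a magnitude judgment, mirroring the increase in difficulty described above.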
Is the proportion of patients
visiting Clinic 1 generally
increasing over time?
Answer: Yes
Synthesis of multiple differences to determine an
overall trend
Much harder
Requires multiple comparisons and judgement
about slight variations (small decrease in 2008-
2009)
From 2008 to 2011 does one
clinic consistently see more
patients?
Answer: Yes (Clinic 2)
Multiple comparisons
Easier than trend estimation?
Between which two years does one
clinic overtake the other in terms
of proportion of patients seen?
Comparison of pairs of values and detection
of shift in proportion
Hardest yet?
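The trend, consistency, and crossover questions each require several comparisons. This sketch spells them out in Python using the table values, to show how much more work each takes than a single lookup:

```python
YEARS = [2008, 2009, 2010, 2011, 2012]
CLINIC_1 = [26.3, 23.611, 37.681, 43.089, 62.338]  # % of visits
CLINIC_2 = [73.7, 76.389, 62.319, 56.911, 37.662]

# Trend: year-over-year changes for Clinic 1; mostly positive suggests an increase.
changes = [later - earlier for earlier, later in zip(CLINIC_1, CLINIC_1[1:])]
increases = sum(1 for c in changes if c > 0)

# Consistency: does Clinic 2 see more patients than Clinic 1 in every year 2008-2011?
consistent = all(c2 > c1 for year, c1, c2 in zip(YEARS, CLINIC_1, CLINIC_2) if year <= 2011)

# Crossover: adjacent years between which the leading clinic changes.
leader = ["Clinic 1" if c1 > c2 else "Clinic 2" for c1, c2 in zip(CLINIC_1, CLINIC_2)]
crossovers = [(YEARS[i], YEARS[i + 1]) for i in range(len(YEARS) - 1)
              if leader[i] != leader[i + 1]]

print([round(c, 1) for c in changes])  # about -2.7, +14.1, +5.4, +19.2
print(f"{increases} of {len(changes)} changes are increases")
print("Clinic 2 higher every year 2008-2011:", consistent)
print("Crossover between:", crossovers)
```

The crossover check finds a single change of leader, between 2011 and 2012.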
If the trends continue, approximately
what proportion of patient visits will
Clinic 1 have in 2013?
Around 70?
Synthesis to determine trend, plus projection
Very difficult
Calculate growth: year-over-year changes of about -3, +14, +5, +19 (an average of about +9 per year), so 2013 ≈ 62 + 9 ≈ 71
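The projection can be made explicit as a naive linear extrapolation: average the year-over-year changes and add that average to the 2012 value. This is one simple reading of "if the trends continue", not the only reasonable model.

```python
CLINIC_1 = [26.3, 23.611, 37.681, 43.089, 62.338]  # 2008-2012, % of visits

# Average year-over-year change (equivalently (last - first) / number of steps).
changes = [later - earlier for earlier, later in zip(CLINIC_1, CLINIC_1[1:])]
avg_change = sum(changes) / len(changes)

# Extend the last observed value by one average step.
projection_2013 = CLINIC_1[-1] + avg_change
print(round(avg_change, 1), round(projection_2013, 1))  # 9.0 71.3
```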
LET’S TRY AGAIN
WITH GRAPHS
What are the
mappings?
What proportion of patients
visited Clinic 1 in 2009?
WHICH IS EASIEST?
Which clinic has a higher
proportion of patient visits in
2012?
WHICH IS EASIEST?
Is the proportion of patients
visiting Clinic 1 generally
increasing over time?
WHICH IS EASIEST?
From 2008 to 2011 does one
clinic consistently see more
patients?
WHICH IS EASIEST?
Between which two years does one
clinic overtake the other in terms
of proportion of patients seen?
WHICH IS EASIEST?
If the trends continue, approximately
what proportion of patient visits will
Clinic 1 have in 2013?
Around 70?
WHICH IS EASIEST?
Key Points
The best visualization (mapping) depends on the data
and the information need(s)
For looking up exact values, tables can be better than
graphs
Graphs are often better for comparison and synthesis
(trends, projections)
To pick the best graph, you need to know your data, the
users’ information need(s), and the users’ background
knowledge
Since users can have more than one information need,
more than one graph might be needed for the same
data
All Kinds of Mappings Are Possible, But Not All Are Good
Why is this a bad mapping?
To answer this, we need to understand variables, measurement scales, and visual perception.
Figure: bar chart encoded with bar height, bar color, and bar order
Variables and Measurement Scales
Variable: A characteristic of an observational unit that can take on more than one value
Measurement Scale: All possible values for a variable
Observational Unit: Patient
Variable | Measurement Scale
Age (in years) | {0, 1, 2, 3,…}
Age Group | {infant, toddler, adolescent, teen, adult}
Age Range | {0-5, 6-12, 13-19, 20-25, 26 and older}
Gender | {Male, Female}
Gender | {Male, Female, Transgender, Other}
Categorical
Quantitative
Stevens’ Scale Types
Formal Properties | Nominal (=, ≠) | Ordinal (<, >) | Interval (-) | Ratio (÷)
Category (equality) | X | X | X | X
Magnitude (greater or less) | | X | X | X
Equal Interval (equality of differences) | | | X | X
Absolute Zero | | | | X
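Because each scale type keeps every property of the one before it and adds one more, the table can be encoded as an ordered type. A minimal Python sketch (`Scale`, `PROPERTY_MIN_SCALE`, and `properties` are hypothetical names):

```python
from enum import IntEnum


class Scale(IntEnum):
    """Stevens' scale types, ordered from fewest to most formal properties."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Formal property (and the operation it licenses) -> weakest scale that has it.
PROPERTY_MIN_SCALE = {
    "category (=, !=)": Scale.NOMINAL,
    "magnitude (<, >)": Scale.ORDINAL,
    "equal interval (-)": Scale.INTERVAL,
    "absolute zero (/)": Scale.RATIO,
}


def properties(scale):
    """List the formal properties a scale supports, per the table."""
    return [p for p, minimum in PROPERTY_MIN_SCALE.items() if scale >= minimum]


for s in Scale:
    print(s.name, properties(s))
```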
Nominal
Values are non-numeric with no meaningful
order OR values are numbers used as names
or labels, where the value of the number is
meaningless
Category (=, ≠) ✓ | Magnitude (<, >) | Equal Interval (-) | Absolute Zero (÷)
Ordinal Scale
Values have a meaningful order, but differences don’t matter (intervals between
values are not equal)
Age range: {0-5, 6-12, 13-19, 20-25, 26 and older}
Rank order, such as finish place in a race: {1st, 2nd, 3rd, …}
Survey response: {Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree}
Category (=, ≠) ✓ | Magnitude (<, >) ✓ | Equal Interval (-) | Absolute Zero (÷)
Interval Scale
Order and differences matter, but ratios of
two values do not matter (intervals
between values are equal)
Discharge Time - Arrival Time = LOS (length of stay): meaningful
Discharge Time / Arrival Time = ?: not meaningful
Category (=, ≠) ✓ | Magnitude (<, >) ✓ | Equal Interval (-) ✓ | Absolute Zero (÷)
These two patients’ temperatures dropped by an equal amount:
104 °F - 103 °F = 1 °F
102 °F - 101 °F = 1 °F
Category (=, ≠) ✓ | Magnitude (<, >) ✓ | Equal Interval (-) ✓ | Absolute Zero (÷)
Ratio Scale
Numbers tell us how much of one thing we have in
comparison to another
Has a non-arbitrary absolute zero, meaning total
absence of the characteristic being measured
Category (=, ≠) ✓ | Magnitude (<, >) ✓ | Equal Interval (-) ✓ | Absolute Zero (÷) ✓
Celsius vs. Kelvin
0 K is the absolute absence of heat
Note the location of 0 °C (273.15 K) relative to absolute zero
20 °C is not twice the heat of 10 °C:
10 °C = 283.15 K
20 °C = 293.15 K
20 °C is just 1.035 times as much heat as 10 °C: 293.15 / 283.15 = 1.035
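The Celsius/Kelvin arithmetic as a short Python check. Dividing Celsius values treats an arbitrary zero as if it were absolute; converting to Kelvin first gives the meaningful ratio.

```python
def celsius_to_kelvin(c):
    """Shift to the Kelvin scale, whose zero is absolute."""
    return c + 273.15


naive = 20 / 10                                         # wrongly treats Celsius as ratio
actual = celsius_to_kelvin(20) / celsius_to_kelvin(10)  # 293.15 / 283.15
print(naive, round(actual, 3))  # 2.0 1.035
```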
Scales and Mappings
To create an accurate mapping
Choose visual properties that match the
scale of the data as closely as possible
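The matching rule can be sketched as a lookup from visual property to scale. The scale assigned to each visual property below is a simplifying assumption for illustration, and `is_accurate` is a hypothetical helper; the first check mirrors the countries-to-labeled-bars example in this lecture.

```python
from enum import IntEnum


class Scale(IntEnum):
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Assumed scale of each visual property (a simplification, not a standard table).
VISUAL_PROPERTY_SCALE = {
    "bar height": Scale.RATIO,              # zero-based length: twice as tall = twice as much
    "position order": Scale.ORDINAL,
    "hue (color)": Scale.NOMINAL,           # distinct colors have no inherent order
    "distinct labeled bars": Scale.NOMINAL,
}


def is_accurate(data_scale, visual_property):
    """A mapping is accurate when the data's scale matches the property's scale."""
    return VISUAL_PROPERTY_SCALE[visual_property] == data_scale


print(is_accurate(Scale.NOMINAL, "distinct labeled bars"))  # countries -> labeled bars
print(is_accurate(Scale.RATIO, "bar height"))               # income -> bar height
print(is_accurate(Scale.RATIO, "hue (color)"))              # hue cannot show ratios
print(is_accurate(Scale.NOMINAL, "bar height"))             # implies magnitudes that don't exist
```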
All Kinds of Mappings Are Possible, But Not All Are Good
Why is this a bad mapping?
The scales of the data and the mappings are mismatched.
Figure: bar chart with Bar Height, Bar Color, and Bars ordered, each annotated with its scale (Ratio or Nominal)
Country | Region | Income per Person | Life Expectancy | Population
China | Asia | 9502 | 75 | 1.35 B
Congo | Sub-Saharan Africa | 403 | 50 | 70 M
US | North America | 41231 | 79 | 310 M
Countries (Nominal) to Distinct Labeled Bars (Nominal property of …)
What’s wrong with this graph?
The Y axis is no longer uniform
It looks like a ratio scale, but isn’t
An Accurate Mapping is
Necessary But Not Sufficient
for an Effective Mapping
Summary
Information visualization is the use of computer-supported,
interactive visualizations of abstract data
to amplify cognition
A subtype of data visualization
A data visualization
maps data to a visual representation
uses visual perception to turn data into information
The best visualization depends on the data, the
users’ information needs and their background
knowledge
Summary
To know what a visualization means, we must know
the mapping used to encode it
Data can be measured on 4 scales
Nominal (categorical)
Ordinal
Interval
Ratio
An accurate mapping is one in which the scale of the
data matches the scale of the visual property
Assignments
Download and install Tableau
Try to work through the getting started
tutorial
Do the weekly review questions
Read this week’s and next week’s reading
assignments
Next Week
Perception and Memory
How they affect visual representations and
vice versa
Making effective use of color