Displaying Public Health Data

The document discusses tools for organizing and displaying public health data in an annual summary report. It recommends using tables, graphs, and charts to summarize data clearly and effectively. These tools can demonstrate trends, distributions, and relationships in the data. They allow health departments to communicate epidemiological findings to others efficiently.

Uploaded by

Nurul Khalda
4

DISPLAYING PUBLIC HEALTH DATA


Imagine that you work in a county or state health department. The
department must prepare an annual summary of the individual
surveillance reports and other public health data from the year that just
ended. This summary needs to display trends and patterns in a concise and
understandable manner. You have been selected to prepare this annual
summary. What tools might you use to organize and display the data?

Most annual reports use a combination of tables, graphs, and charts to summarize and display
data clearly and effectively. Tables and graphs can be used to summarize a few dozen records or
a few million. They are used every day by epidemiologists to summarize and better understand
the data they or others have collected. They can demonstrate distributions, trends, and
relationships in the data that are not apparent from looking at individual records. Thus, tables and
graphs are critical tools for descriptive and analytic epidemiology. In addition, remembering the
adage that a picture is worth a thousand words, you can use tables and graphs to communicate
epidemiologic findings to others efficiently and effectively. This lesson covers tabular and
graphic techniques for data display; interpretation was covered in Lessons 2 and 3.

Objectives
After completing this lesson and answering the questions in the exercises, you will be able to:
• Prepare and interpret one-, two-, or three-variable tables and composite tables (including
creating class intervals)
• Prepare and interpret arithmetic-scale line graphs, semilogarithmic-scale line graphs,
histograms, frequency polygons, bar charts, pie charts, maps, and area maps
• State the value and proper use of population pyramids, cumulative frequency graphs,
survival curves, scatter diagrams, box plots, dot plots, forest plots, and tree plots
• Identify when to use each type of table and graph

Major Sections
Introduction to Tables and Graphs (page 4-2)
Tables (page 4-4)
Graphs (page 4-23)
Other Data Displays (page 4-43)
Using Computer Technology (page 4-64)
Summary (page 4-67)

Displaying Public Health Data


Page 4-1
Introduction to Tables and Graphs
Data analysis is an important component of public health practice.
In examining data, one must first determine the data type in order
to select the appropriate display format. The data to be displayed
will be in one of the following categories:
• Nominal
• Ordinal
• Discrete
• Continuous

Nominal measurements have no intrinsic order, and the differences
between levels of the variable have no meaning. In epidemiology,
sex, race, or exposure category (yes/no) are examples of nominal
measurements. Ordinal variables do have an intrinsic order, but,
again, the size of the differences between levels is not meaningful.
Examples of ordinal variables are “low, medium, high” or perhaps
categories of other variables (e.g., age ranges). Discrete variables
have values that are integers (e.g., number of ill persons who were
exposed to a risk factor). Finally, continuous variables can have any
value in a range (e.g., amount of time between a meal being served
and onset of gastrointestinal symptoms; infant mortality rate).

Before constructing any display of epidemiologic data, it is
important to first determine the point to be conveyed. Are you
highlighting a change from past patterns in the data? Are you
showing a difference in incidence by geographic area or by some
predetermined risk factor? What is the interpretation you want the
reader to reach? Your answers to these questions will help to
determine the choice of display.

To analyze data effectively, an epidemiologist must become
familiar with the data before applying analytic techniques. The
epidemiologist may begin by examining individual records such as
those contained in a line listing. This review is usually followed
by a table that summarizes the data. Sometimes, the resulting tables
are the only analysis that is needed, particularly when the amount
of data is small and relationships are straightforward.

When the data are more complex, graphs and charts can help the
epidemiologist visualize broader patterns and trends and identify
variations from those trends. Variations in data may represent
important new findings or only errors in typing or coding which
need to be corrected. Thus, tables and graphs can be helpful tools
to aid in verifying and analyzing the data.

Once an analysis is complete, tables and graphs further serve as
useful visual aids for describing the data to others. When preparing
tables and graphs, keep in mind that your primary purpose is to
communicate information.

Tables and graphs can be presented using a variety of media. In
epidemiology, the most common media are print and projection.
This lesson will focus on creating effective and attractive tables
and graphs for print and will also offer suggestions for projection.
At the end of the lesson, summary tables list all of the techniques
presented, with guidelines for their use.

Tables
A table is a set of data arranged in rows and columns. Almost any
quantitative information can be organized into a table. Tables are
useful for demonstrating patterns, exceptions, differences, and
other relationships. In addition, tables usually serve as the basis for
preparing additional visual displays of data, such as graphs and
charts, in which some of the details may be lost.

Tables designed to present data to others should be as simple as
possible.1 Two or three small tables, each focusing on a different
aspect of the data, are easier to understand than a single large table
that contains many details or variables.

A table in a printed publication should be self-explanatory. If a
table is taken out of its original context, it should still convey all
the information necessary for the reader to understand the data. To
create a table that is self-explanatory, follow the guidelines below.

More About Constructing Tables
• Use a clear and concise title that describes person, place and time — what, where, and when — of the data in the
table. Precede the title with a table number.
• Label each row and each column and include the units of measurement for the data (for example, years, mm Hg,
mg/dl, rate per 100,000).

• Show totals for rows and columns, where appropriate. If you show percentages (%), also give their total (always
100).

• Identify missing or unknown data either within the table (for example, Table 4.11) or in a footnote below the table.
• Explain any codes, abbreviations, or symbols in a footnote (for example, Syphilis P&S = primary and secondary
syphilis).
• Note exclusions in a footnote (e.g., 1 case and 2 controls with unknown family history were excluded from this
analysis).

• Note the source of the data below the table or in a footnote if the data are not original.

One-variable tables
In descriptive epidemiology, the most basic table is a simple
frequency distribution with only one variable, such as Table 4.1a,
which displays the number of reported cases of primary and secondary
syphilis in the United States in 2002, by age group.2 (Frequency
distributions are discussed in Lesson 2.) In this type of frequency
distribution table, the first column shows the values or categories
of the variable represented by the data, such as age or sex. The
second column shows the number of persons or events that fall into
each category. In constructing any table, the choice of columns
follows from the interpretation to be made. In Table 4.1a, the point
the analyst wishes to make is the role of age as a risk factor for
syphilis. Thus, age group is chosen as column 1 and case count as
column 2.

Often, an additional column lists the percentage of persons or
events in each category (see Table 4.1b). The percentages shown in
Table 4.1b actually add up to 99.9% rather than 100.0% because each
was rounded to one decimal place. Totals of 99.9% or 100.1% are
common in tables that show rounded percentages. Nonetheless, the
total percentage should be displayed as 100.0%, with a footnote
explaining that the difference is due to rounding.

To create a frequency distribution from a data set in the Epi Info
Analysis Module: select Frequencies, then choose the variable under
Frequencies of. (Since Epi Info 3 is the recommended version, only
commands for this version are provided in the text; corresponding
commands for Epi Info 6 are offered at the end of the lesson.)

Adding a percent column shows the relative burden of illness; for
example, Table 4.1b shows that the single age group contributing the
most cases is persons aged 35–39 years (19.9%). Adding a cumulative
percent column (e.g., Table 4.1c) allows the public health analyst to
illustrate the potential impact of a targeted intervention. Here, a
fully effective intervention to prevent syphilis among adolescents
and young adults (under age 35) would prevent almost half (46.7%) of
the cases in this population.

The one-variable table can be further modified to show cumulative
frequency and/or cumulative percentage, as in Table 4.1c. From this
table, you can see at a glance that 46.7% of the primary and
secondary syphilis cases occurred in persons younger than age 35
years, meaning that over half (53.3%) occurred in persons aged 35
years or older. Note that the choice of age groupings will affect
the interpretation of your data.3

Table 4.1a Reported Cases of Primary and Secondary Syphilis by Age — United States, 2002
Age Group (years) Number of Cases

<14 21
15–19 351
20–24 842
25–29 895
30–34 1,097
35–39 1,367
40–44 1,023
45–54 982
≥55 284
Total 6,862
Data Source: Centers for Disease Control and Prevention. Sexually Transmitted Disease Surveillance 2002. Atlanta: U.S. Department
of Health and Human Services; 2003.

Table 4.1b Reported Cases of Primary and Secondary Syphilis by Age — United States, 2002
CASES

Age Group (years) Number Percent

<14 21 0.3
15–19 351 5.1
20–24 842 12.3
25–29 895 13.0
30–34 1,097 16.0
35–39 1,367 19.9
40–44 1,023 14.9
45–54 982 14.3
≥55 284 4.1
Total 6,862 100.0*
* Actual total of percentages for this table is 99.9% and does not add to 100.0% due to rounding error.

Data Source: Centers for Disease Control and Prevention. Sexually Transmitted Disease Surveillance 2002. Atlanta: U.S. Department
of Health and Human Services; 2003.

Table 4.1c Reported Cases of Primary and Secondary Syphilis by Age — United States, 2002
CASES

Age Group (years) Number Percent Cumulative Percent

<14 21 0.3 0.3
15–19 351 5.1 5.4
20–24 842 12.3 17.7
25–29 895 13.0 30.7
30–34 1,097 16.0 46.7
35–39 1,367 19.9 66.6
40–44 1,023 14.9 81.6
45–54 982 14.3 95.9
≥55 284 4.1 100.0
Total 6,862 100.0* 100.0*
* Percentages do not add to 100.0% due to rounding error.

Data Source: Centers for Disease Control and Prevention. Sexually Transmitted Disease Surveillance 2002. Atlanta: U.S. Department
of Health and Human Services; 2003.
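The percent and cumulative percent columns of Tables 4.1b and 4.1c can be reproduced from the counts in Table 4.1a. The minimal Python sketch below is only an illustration (the function name `frequency_table` is hypothetical, not an Epi Info command). It computes each cumulative percent from the running count rather than by adding the rounded percents, which is why its 40–44 row matches the 81.6 in Table 4.1c instead of the 81.5 you would get by summing rounded values. It also shows how the rounded percents total 99.9.

```python
# Counts of reported P&S syphilis cases by age group, copied from Table 4.1a.
AGE_GROUPS = ["<14", "15–19", "20–24", "25–29", "30–34",
              "35–39", "40–44", "45–54", "≥55"]
CASES = [21, 351, 842, 895, 1097, 1367, 1023, 982, 284]


def frequency_table(labels, counts, decimals=1):
    """Return rows of (label, count, percent, cumulative percent) and the total.

    Cumulative percent is computed from the cumulative count, not by summing
    the rounded percents, so rounding errors do not accumulate down the column.
    """
    total = sum(counts)
    rows = []
    running = 0
    for label, n in zip(labels, counts):
        running += n
        rows.append((label, n,
                     round(100 * n / total, decimals),
                     round(100 * running / total, decimals)))
    return rows, total


rows, total = frequency_table(AGE_GROUPS, CASES)
for label, n, pct, cum in rows:
    print(f"{label:>6} {n:>6,} {pct:>6.1f} {cum:>6.1f}")
print(f"{'Total':>6} {total:>6,}")

# Adding up the rounded percents shows why Table 4.1b needs a rounding footnote.
print(round(sum(r[2] for r in rows), 1))
```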

Two- and three-variable tables
Tables 4.1a, 4.1b, and 4.1c show case counts (frequency) by a
single variable, e.g., age. Data can also be cross-tabulated to show
counts by an additional variable. Table 4.2 shows the number of
syphilis cases cross-classified by both age group and sex of the
patient.

Table 4.2 Reported Cases of Primary and Secondary Syphilis by Age and Sex — United States, 2002
NUMBER OF CASES

Age Group (years) Male Female Total

<14 9 12 21
15–19 135 216 351
20–24 533 309 842
25–29 668 227 895
30–34 877 220 1,097
35–39 1,121 246 1,367
40–44 845 178 1,023
45–54 825 157 982
≥55 255 29 284
Total 5,268 1,594 6,862
Data Source: Centers for Disease Control and Prevention. Sexually Transmitted Disease Surveillance 2002. Atlanta: U.S. Department
of Health and Human Services; 2003.
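A cross-tabulation such as Table 4.2 is built by counting records jointly by two variables and then adding the row and column totals. The sketch below shows one way to do that in Python from a line listing. The function `crosstab`, the field names `age_group` and `sex`, and the five records are all hypothetical and only illustrate the mechanics; they are not the surveillance data behind Table 4.2.

```python
from collections import Counter


def crosstab(records, row_key, col_key):
    """Cross-tabulate records by two variables, adding row and column totals.

    Categories appear in the order they are first seen in the data; sort the
    records (or reorder the result) into a sensible order for publication.
    """
    cells = Counter((rec[row_key], rec[col_key]) for rec in records)
    rows = list(dict.fromkeys(rec[row_key] for rec in records))
    cols = list(dict.fromkeys(rec[col_key] for rec in records))

    table = {}
    for row in rows:
        table[row] = {col: cells[(row, col)] for col in cols}
        table[row]["Total"] = sum(table[row][col] for col in cols)
    table["Total"] = {col: sum(table[row][col] for row in rows) for col in cols}
    table["Total"]["Total"] = len(records)
    return table, cols


# Hypothetical line listing, for illustration only.
records = [
    {"age_group": "15–19", "sex": "Female"},
    {"age_group": "15–19", "sex": "Male"},
    {"age_group": "20–24", "sex": "Male"},
    {"age_group": "20–24", "sex": "Male"},
    {"age_group": "20–24", "sex": "Female"},
]

table, cols = crosstab(records, "age_group", "sex")
print("Age group", *cols, "Total")
for row, counts in table.items():
    print(row, *(counts[c] for c in cols), counts["Total"])
```

The grand total is taken from the number of records, so it doubles as a check that every record landed in exactly one cell.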

A two-variable table with data categorized jointly by those two
variables is known as a contingency table. A special type of
contingency table, in which each of the two variables has two
categories, is called a two-by-two table and is a favorite among
epidemiologists. Two-by-two tables are convenient for comparing
persons with and without the exposure and those with and without the
disease. From these data, epidemiologists can assess the
relationship, if any, between the exposure and the disease. Table 4.3
is a two-by-two table that shows one of the key findings from an
investigation of carbon monoxide poisoning following an ice storm and
prolonged power failure in Maine.4 In the table, the exposure
variable, location of the power generator, has two categories: inside
or outside the home. Similarly, the outcome variable, carbon monoxide
poisoning, has two categories: cases (number of persons who became
ill) and controls (number of persons who did not become ill).

To create a two-variable table from a data set in the Analysis
Module: select Frequencies, then choose the variable under
Frequencies of. The output shows the table with row and column
percentages, plus chi-square and p-value. For a two-by-two table, the
output also provides the odds ratio, risk ratio, risk difference, and
confidence intervals. Note that for a cohort study, the row
percentage in the cells of ill patients is the attack proportion,
sometimes called the attack rate.

Table 4.3 Generator Location and Risk of Carbon Monoxide Poisoning After an Ice Storm — Maine, 1998
                                         NUMBER OF
Generator location                   Cases   Controls   Total
Inside home or attached structure       23         23      46
Outside home                             4        139     143
Total                                   27        162     189
Data Source: Daley RW, Smith A, Paz-Argandona E, Mallilay J, McGeehin M. An outbreak of carbon monoxide poisoning after a
major ice storm in Maine. J Emerg Med 2000;18:87–93.

Table 4.4 illustrates a generic format and standard notation for a
two-by-two table. Disease status (e.g., ill versus well, sometimes
denoted cases vs. controls in a case-control study) is usually
designated along the top of the table, and exposure status (e.g.,
exposed versus not exposed) is designated along the side. The
letters a, b, c, and d within the four cells of the two-by-two table
refer to the number of persons with the disease status indicated
above and the exposure status indicated to its left. For example, in
Table 4.4, “c” represents the number of persons in the study who are
ill but who did not have the exposure being studied. Note that “Hi”
denotes the horizontal (row) totals; H1 and H0 represent the total
number of exposed and unexposed persons, respectively. “Vi” denotes
the vertical (column) totals; V1 and V0 represent the total number of
ill and well persons (or cases and controls), respectively. The total
number of subjects included in the two-by-two table is represented
by the letter T (or N).
Table 4.4 General Format and Notation for a Two-by-Two Table
             Ill          Well         Total         Attack Rate (Risk)
Exposed      a            b            a + b = H1    a / (a + b)
Unexposed    c            d            c + d = H0    c / (c + d)
Total        a + c = V1   b + d = V0   T             V1 / T
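The notation in Table 4.4 maps directly onto a few lines of arithmetic. The sketch below is a minimal Python illustration, assuming cohort-type data where attack rates are meaningful; the class name `TwoByTwo` and the counts are hypothetical. For case-control data such as Table 4.3, attack rates cannot be computed from the table and the odds ratio (ad/bc) is used instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cell counts laid out as in Table 4.4 (rows = exposure, columns = illness)."""
    a: int  # exposed and ill
    b: int  # exposed and well
    c: int  # unexposed and ill
    d: int  # unexposed and well

    @property
    def H1(self) -> int:  # row total: exposed
        return self.a + self.b

    @property
    def H0(self) -> int:  # row total: unexposed
        return self.c + self.d

    @property
    def V1(self) -> int:  # column total: ill
        return self.a + self.c

    @property
    def V0(self) -> int:  # column total: well
        return self.b + self.d

    @property
    def T(self) -> int:  # grand total
        return self.a + self.b + self.c + self.d

    def attack_rates(self) -> dict:
        """Attack rate (risk) among the exposed, the unexposed, and everyone."""
        return {
            "exposed": self.a / self.H1,
            "unexposed": self.c / self.H0,
            "overall": self.V1 / self.T,
        }

    def risk_ratio(self) -> float:
        # (a / H1) / (c / H0), rearranged so only one division is performed.
        return (self.a * self.H0) / (self.c * self.H1)


# Hypothetical cohort counts, chosen only to keep the arithmetic easy to follow.
t = TwoByTwo(a=30, b=20, c=10, d=40)
print(t.H1, t.H0, t.V1, t.V0, t.T)
print(t.attack_rates(), t.risk_ratio())
```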

When producing a table to display either in print or projection, it is
generally best to limit the number of variables to one or two. One
exception to this rule occurs when a third variable modifies the
effect (technically, produces an interaction) of the first two. Table
4.5 is intended to convey the way in which race/ethnicity may modify
the effect of age and sex on incidence of syphilis. Because three-way
tables are often hard to understand, they should be used only when
ample explanation and discussion are possible.

Table 4.5 Number of Reported Cases of Primary and Secondary Syphilis, by Race/Ethnicity, Age, and
Sex — United States, 2002
Race/ethnicity Age Group (years) Male Female Total

American Indian/Alaskan Native <14 1 0 1
15–19 0 1 1
20–24 5 3 8
25–29 3 1 4
30–34 1 2 3
35–39 3 5 8
40–44 4 3 7
45–54 8 8 16
≥55 2 1 3
Total 27 24 51

Asian/Pacific Islander <14 1 1 2
15–19 0 2 2
20–24 9 4 13
25–29 16 1 17
30–34 21 1 22
35–39 14 1 15
40–44 14 1 15
45–54 8 0 8
≥55 0 0 0
Total 83 11 94

Black, Non-Hispanic <14 3 9 12
15–19 89 164 253
20–24 313 233 546
25–29 322 163 485
30–34 310 166 476
35–39 385 183 568
40–44 305 142 447
45–54 370 112 482
≥55 129 23 152
Total 2,226 1,195 3,421

Hispanic <14 1 1 2
15–19 37 25 62
20–24 117 29 146
25–29 139 26 165
30–34 172 20 192
35–39 178 22 200
40–44 93 9 102
45–54 69 14 83
≥55 18 1 19
Total 824 147 971

White, Non-Hispanic <14 3 1 4
15–19 9 24 33
20–24 89 40 129
25–29 188 36 224
30–34 373 31 404
35–39 541 35 576
40–44 429 23 452
45–54 370 23 393
≥55 106 4 110
Total 2,108 217 2,325
Data Source: Centers for Disease Control and Prevention. Sexually Transmitted Disease Surveillance 2002. Atlanta: U.S. Department
of Health and Human Services; 2003. p. 118.
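One quick way to see why race/ethnicity is worth keeping as a third variable in Table 4.5 is to compare the male-to-female case ratio within each stratum. The sketch below does that with the stratum totals copied from the table, and also checks that collapsing the strata reproduces the totals row of Table 4.2. The dictionary and function names are hypothetical, and these are ratios of case counts, not rates, so they reflect population composition as well as risk; rates such as those in Table 4.7 would be needed for a fuller comparison.

```python
# Male and female case totals for each race/ethnicity stratum,
# copied from the "Total" rows of Table 4.5.
STRATUM_TOTALS = {
    "American Indian/Alaskan Native": (27, 24),
    "Asian/Pacific Islander": (83, 11),
    "Black, Non-Hispanic": (2226, 1195),
    "Hispanic": (824, 147),
    "White, Non-Hispanic": (2108, 217),
}


def male_to_female_ratios(totals):
    """Ratio of male to female case counts within each stratum."""
    return {group: male / female for group, (male, female) in totals.items()}


ratios = male_to_female_ratios(STRATUM_TOTALS)
for group, ratio in ratios.items():
    print(f"{group:<32}{ratio:6.1f}")

# Collapsing the strata should reproduce the totals row of Table 4.2.
males = sum(m for m, _ in STRATUM_TOTALS.values())
females = sum(f for _, f in STRATUM_TOTALS.values())
print(males, females, males + females)
```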
Exercise 4.1
The data in Table 4.6 describe characteristics of the 38 persons who ate
food at or from a church supper in Texas in August 2001. Fifteen of these
persons later developed botulism.5

1. Construct a table of the illness (botulism) by age group. Use botulism status (yes/no) as the
column labels and age groups as the row labels.

2. Construct a two-by-two table of the illness (botulism) by exposure to chicken.

3. Construct a two-by-two table of the illness (botulism) by exposure to chili.

4. Construct a three-way table of illness (botulism) by exposure to chili and chili leftovers.

Check your answers on page 4-72

Table 4.6 Line Listing for Exercise 4.1
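Columns: ID; Age; Attended Supper; Case; Date of Onset; Case Status; Ate Any Food; Ate Chili; Ate Chicken; Ate Chili Leftovers
(Y = yes; N = no; Unk = unknown. For non-cases, Date of Onset is shown as a hyphen (-) and Case Status is left blank.)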

1 1 Y N - Y Y Y N
2 3 Y Y 8/27 Lab-confirmed Y Y N N
3 7 Y Y 8/31 Lab-confirmed Y Y N N
4 7 Y N - Y Y Y N
5 10 Y N - Y Y N Y
6 17 Y Y 8/28 Lab-confirmed Y Y Y N
7 21 Y N - N N N N
8 23 Y N - Y Y N N
9 25 Y Y 8/26 Epi-linked Y Y N N
10 29 N Y 8/28 Lab-confirmed Y Unk Unk Y
11 38 Y N - N N N N
12 39 Y N - N N N N
13 41 Y N - Y Y Y N
14 41 Y N - N N N N
15 42 Y Y 8/26 Lab-confirmed Y Y Unk N
16 45 Y Y 8/26 Lab-confirmed Y Y Y Y
17 45 Y Y 8/27 Epi-linked Y Y Y N
18 46 Y N - Y N Y N
19 47 Y N - Y N Y N
20 48 Y Y 9/1 Lab-confirmed Y Y Unk N
21 50 Y Y 8/29 Epi-linked Y Y N N
22 50 Y N - Y N Y N
23 50 Y N - Y N N Y
24 52 Y Y 8/28 Lab-confirmed Y Y Y N
25 52 Y N - N N N N
26 53 Y Y 8/27 Epi-linked Y Y Y N
27 53 Y N - Y Y Y N
28 62 Y Y 8/27 Epi-linked Y Y Y N
29 62 Y N - Y N Y N
30 63 Y N - N N N N
31 67 Y N - N N N N
32 68 Y N - N N N N
33 69 Y N - Y Y Y N
34 71 Y N - Y N Y N
35 72 Y Y 8/27 Lab-confirmed Y Y Y N
36 74 Y N - Y Y N N
37 74 Y N - Y N Y N
38 78 Y Y 8/25 Epi-linked Y Y Y N
Data Source: Kalluri P, Crowe C, Reller M, Gaul L, Hayslett J, Barth S, Eliasberg S, Ferreira J, Holt K, Bengston S, Hendricks K, Sobel
J. An outbreak of foodborne botulism associated with food sold at a salvage store in Texas. Clin Infect Dis 2003;37:1490–5.

Tables of statistical measures other than frequency
Tables 4.1–4.5 show case counts (frequency). The cells of a table
could also display averages, rates, relative risks, or other
epidemiological measures. As with any table, the title and/or
headings must clearly identify what data are presented. For
example, the title of Table 4.7 indicates that the data for reported
cases of primary and secondary syphilis are rates rather than
numbers.

Table 4.7 Rate per 100,000 Population for Reported Cases of Primary and Secondary Syphilis, by Age and Race — United States, 2002
Age Group   Am. Indian/     Asian/        Black,          Hispanic   White,          Total
(years)     Alaska Native   Pacific Is.   Non-Hispanic               Non-Hispanic
10–14       0.0             0.1           0.3             0.1        0.0             0.1
15–19       0.5             0.2           8.6             1.9        0.3             1.7
20–24       5.0             1.5           20.7            4.3        1.1             4.4
25–29       2.7             1.6           19.1            4.9        1.8             4.6
30–34       2.0             2.2           18.2            6.1        3.0             5.4
35–39       4.8             1.6           20.1            7.1        3.6             6.0
40–44       4.5             1.6           16.6            4.4        2.8             4.6
45–54       6.1             0.6           11.8            2.7        1.4             2.6
55–64       1.4             0.0           4.6             0.6        0.5             0.9
65+         0.8             0.0           1.5             0.5        0.1             0.2
Totals      2.4             0.9           9.8             2.7        1.2             2.4
Data Source: Centers for Disease Control and Prevention. Sexually Transmitted Disease Surveillance 2002. Atlanta: U.S. Department
of Health and Human Services; 2003.
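Each cell of Table 4.7 is a rate rather than a count: the number of cases divided by the population in that group over the same period, multiplied by 100,000. The population denominators behind Table 4.7 are not given in this lesson, so the minimal sketch below uses hypothetical numbers; the function name `rate_per_100k` is also hypothetical.

```python
def rate_per_100k(cases: int, population: int) -> float:
    """Cases per 100,000 population, for cases and population from the same period."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so whole-number inputs give an exact result when possible.
    return cases * 100_000 / population


# Hypothetical numbers: 12 cases in a population of 480,000.
print(rate_per_100k(12, 480_000))  # 2.5
```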

Composite tables
To conserve space in a report or manuscript, several tables are
sometimes combined into one. For example, epidemiologists often
create simple frequency distributions by age, sex, and other
demographic variables as separate tables, but editors may combine
them into one large composite table for publication. Table 4.8 is an
example of a composite table from the investigation of carbon
monoxide poisoning following the power failure in Maine.4

It is important to realize that this type of table should not be
interpreted as a three-way table. The data in Table 4.8 have not
been arrayed to indicate the interrelationship of sex, age, smoking,
and disposition from medical care. Rather, several one-variable
tables (each independently assessing the number of cases by one of
these variables) have simply been concatenated to conserve space.
So this table would not help in assessing, for example, whether
smoking modifies the risk of illness by age. This difference also
explains why a column total summing all of the rows would be
inappropriate and meaningless for Table 4.8.

Table 4.8 Number and Percentage of Confirmed Cases of Carbon Monoxide Poisoning Identified from
Four Hospitals, by Selected Characteristics — Maine, January 1998
CASES

Characteristic Number Percent

Total cases 100 100

Sex (female) 59 59
Age (years)
0–3 5 5
4–12 17 17
13–18 9 9
19–64 52 52
≥65 17 17
Smokers 20 20
Disposition
Released from ED* 83 83
Admitted to hospital 11 11
Transferred 5 5
Died 1 1
* ED = Emergency department

Data Source: Daley RW, Smith A, Paz-Argandona E, Mallilay J, McGeehin M. An outbreak of carbon monoxide poisoning after a
major ice storm in Maine. J Emerg Med 2000;18:87–93.

Table shells
Although you cannot analyze data before you have collected them,
epidemiologists anticipate and design their analyses in advance to
delineate what the study is going to convey, and to expedite the
analysis once the data are collected. In fact, most protocols, which
are written before a study can be conducted, require a description
of how the data will be analyzed. As part of the analysis plan, you
can develop table shells that show how the data will be organized
and displayed. Table shells are tables that are complete except for
the data. They show titles, headings, and categories. In developing
table shells that include continuous variables such as age, we
create more categories than we may later use, in order to disclose
any interesting patterns and quirks in the data.

The following table shells were designed before conducting a
case-control study of fractures related to falls in community-dwelling
elderly persons. The researchers were particularly
interested in assessing whether vigorous and/or mild physical
activity was associated with a lower risk of fall-related fractures.

Table shells of epidemiologic studies usually follow a standard
sequence from descriptive to analytic. The first and second tables
in the sequence usually cover clinical features of the health event
and demographic characteristics of the subjects. Next, the analyst
portrays the association of most interest to the researchers, in this
case, the association between physical activity and fracture.
Subsequent tables may present stratified or adjusted analyses,
refinements, and subset analyses. Of course, once the data are
available and used for these tables, additional analyses will come
to mind and should be pursued.

This sequence of table shells provides a systematic and logical
approach to the analysis. The first two tables (Table shells 4.9a and
4.9b), describing the health problem of interest and the population
studied, provide the background a reader would need to put the
analytic results in perspective.

Table Shell 4.9a Anatomic Site of Fall-related Fractures Sustained by Participants, SAFE Study — Miami,
1987–1989
Fracture Site Number (Percent)

Skull ____ ( )
Spine ____ ( )
Clavicle (collarbone) ____ ( )
Scapula (shoulder blade) ____ ( )
Humerus (upper arm) ____ ( )
Radius / ulna (lower arm) ____ ( )
Bones of the hand ____ ( )
Ribs, sternum ____ ( )
Pelvis ____ ( )
Neck of femur (hip) ____ ( )
Other parts of femur (upper leg) ____ ( )
Patella (knee) ____ ( )
Tibia / fibula (lower leg) ____ ( )
Ankle ____ ( )
Bones of the foot ____ ( )
Adapted from: Stevens, JA, Powell KE, Smith SM, Wingo PA, Sattin RW. Physical activity, functional limitations, and the risk of fall-
related fractures in community-dwelling elderly. Annals of Epidemiology 1997;7:54–61.

Table Shell 4.9b Selected Characteristics of Case and Control Participants, SAFE Study — Miami, 1987–
1989
CASES CONTROLS

Number (Percent) Number (Percent)

Age 65–74 ____ ( ) ____ ( )
75–84 ____ ( ) ____ ( )
≥85 ____ ( ) ____ ( )

Sex Male ____ ( ) ____ ( )
Female ____ ( ) ____ ( )

Race White ____ ( ) ____ ( )
Black ____ ( ) ____ ( )
Other ____ ( ) ____ ( )
Unknown ____ ( ) ____ ( )

Ethnicity Hispanic ____ ( ) ____ ( )
Non-Hispanic ____ ( ) ____ ( )
Unknown ____ ( ) ____ ( )

Hours/day spent on feet
<1 ____ ( ) ____ ( )
2–4 ____ ( ) ____ ( )
5–7 ____ ( ) ____ ( )
>8 ____ ( ) ____ ( )

Smoking status
Never smoked ____ ( ) ____ ( )
Former smoker ____ ( ) ____ ( )
Current smoker ____ ( ) ____ ( )
Unknown ____ ( ) ____ ( )

Alcohol use (drinks / week)
None ____ ( ) ____ ( )
<1 ____ ( ) ____ ( )
1–3 ____ ( ) ____ ( )
>4 ____ ( ) ____ ( )
Unknown ____ ( ) ____ ( )
Adapted from: Stevens, JA, Powell KE, Smith SM, Wingo PA, Sattin RW. Physical activity, functional limitations, and the risk of fall-
related fractures in community-dwelling elderly. Annals of Epidemiology 1997;7:54–61.

Once Table shells 4.9a and 4.9b have been filled in to describe the
cases and controls in this study, we are ready to refine the analysis
by demonstrating the variability of the data as assessed by
statistical confidence intervals. Because this is a case-control
study, we have chosen the odds ratio as the measure of association
(see Lesson 3). Table shell 4.9c illustrates a useful display for this
information.

Table Shell 4.9c Relationship Between Physical Activity (Vigorous and Mild) and Fracture, SAFE Study
— Miami, 1987–1989
                          CASES              CONTROLS           Odds Ratio
                          Number (Percent)   Number (Percent)   (95% Confidence Interval)
Vigorous Activity   Yes   ____ ( )           ____ ( )           _____ (____ – ____)
                    No    ____ ( )           ____ ( )

Mild Activity       Yes   ____ ( )           ____ ( )           _____ (____ – ____)
                    No    ____ ( )           ____ ( )
Adapted from: Stevens, JA, Powell KE, Smith SM, Wingo PA, Sattin RW. Physical activity, functional limitations, and the risk of fall-
related fractures in community-dwelling elderly. Annals of Epidemiology 1997;7:54–61.
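To show how the “Odds Ratio (95% Confidence Interval)” cells of a shell like 4.9c get filled in, here is a minimal Python sketch. Because Table shell 4.9c has no data, it borrows the counts from Table 4.3. It uses the Woolf (logit) approximation for the interval, which is one common method but not necessarily the one Epi Info or the SAFE investigators used; the function name `odds_ratio_woolf_ci` is hypothetical.

```python
import math


def odds_ratio_woolf_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with an approximate confidence interval (Woolf method).

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    z = 1.96 gives an approximate 95% interval. Every cell must be > 0,
    because the method takes the log of the odds ratio and uses 1/cell.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("the Woolf interval needs every cell count > 0")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts from Table 4.3: generator inside the home (exposed) vs outside.
or_, lo, hi = odds_ratio_woolf_ci(a=23, b=23, c=4, d=139)
print(f"OR = {or_:.1f} (95% CI {lo:.1f}–{hi:.1f})")
```

With these counts the sketch gives an odds ratio of 34.75 and an interval of roughly 11 to 110; the width of the interval reflects the small number of cases (4) in the unexposed row.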

Creating class intervals
If the epidemiologic hypothesis for the investigation involves
variables such as “gender” or “exposure to a risk factor (yes/no),”
the construction of tables as described thus far in this lesson
should be straightforward. Often, however, the presumed risk factor
may not be so conveniently packaged. We may need to investigate an
infection acquired as a result of hospitalization, for which “days of
hospitalization” may be relevant; for many chronic conditions, blood
pressure is an important factor; if we are interested in the effect
of alcohol consumption on health risk, number of drinks per week may
be an important measurement. These examples illustrate relevant
variables that have a broader range of possible responses than are
easily handled by the methods described earlier in this lesson. One
solution in this case is to create class intervals for your data,
keeping the following guidelines in mind:

• Class intervals should be mutually exclusive and exhaustive. In
plain language, that means that each individual in your data set
should fit uniquely into one class interval, and all persons
should fit into some class interval. So, for example, age ranges
should not overlap. Most measures follow conventional
rounding rules (see sidebar).

Conventional Rounding Rules
If a fraction is greater than .5, round it up (e.g., round 6.6 to 7).
If a fraction is less than .5, round it down (e.g., round 6.4 to 6).
If a fraction is exactly .5, it is recommended that you round it to
the even value (e.g., round both 5.5 and 6.5 to 6). More common, and
also acceptable, is to round it up (e.g., round 6.5 to 7).
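The two ways of handling an exact .5 in the rounding rules above correspond to “round half to even” and “round half up.” The sketch below compares them using Python's standard `decimal` module; the helper name `round_to` is hypothetical. Values are passed as strings because binary floating-point numbers cannot represent many decimals exactly (Python's documentation notes, for example, that the built-in `round(2.675, 2)` gives 2.67), which can make a value that looks like an exact .5 round the “wrong” way.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_to(value: str, places: str = "1", half: str = ROUND_HALF_EVEN) -> Decimal:
    """Round a decimal string; places="1" rounds to whole numbers, "0.1" to one decimal."""
    return Decimal(value).quantize(Decimal(places), rounding=half)


for v in ["6.6", "6.4", "5.5", "6.5"]:
    # Columns: original value, round half to even, round half up
    print(v, round_to(v), round_to(v, half=ROUND_HALF_UP))
```

Only the exact halves differ between the two rules: 5.5 becomes 6 either way, while 6.5 becomes 6 under round-half-to-even and 7 under round-half-up.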

A general tip is to use a large number of class intervals for the
initial analysis to gain an appreciation for the variability of your
data. You can combine your categories later.

• Use principles of biologic plausibility when constructing


categories. For example, when analyzing infant and childhood
mortality, we might use categories of 0–12 months (since
neonatal problems are different epidemiologically from those of
other childhood problems), 1–5 years (since these result

Displaying Public Health Data


Page 4-16
from causes of death primarily outside of institutions), and 5–
10 years (since these may result from risks in school settings).
Table 4.10 illustrates age groups that are sensible for the study
of various health conditions that are behaviorally-related.
• A natural baseline group should be kept as a distinct category. Often the baseline group will include those who have not had an exposure, e.g., non-smokers (0 cigarettes per day).

• If you wish to calculate rates to illustrate the relative risk of adverse health events by these categories of risk factors, be sure that the intervals you choose for the classes of your data are the same as the intervals for the denominators that you will find in readily available data. For example, to compute rates of infant mortality by maternal age, you must find data on the number of live-born infants by age of mother; in determining age groupings, consider what categories are used by the United States Census Bureau.

• Always consider a category for “unknown” or “not stated.”

CDC’s National Center for Health Statistics uses the following age categorizations:
<1 infants
1–4 toddlers
5–14 adolescents
15–24 teens and young adults
25–44 adults
45–64 older adults
≥65 elderly
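Values that end in exactly .5 can be rounded by either of the two conventions described above, and software can apply the chosen one explicitly. The sketch below is a minimal illustration that assumes values arrive as decimal strings. The helper name `round_to_integer` is hypothetical. It uses Python's standard `decimal` module so that a value such as 6.5 is treated as exactly halfway.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_to_integer(value: str, rule: str = ROUND_HALF_EVEN) -> int:
    """Round a decimal string to a whole number using the given decimal rounding rule."""
    # Decimal works on the exact decimal digits, so "6.5" really is halfway.
    return int(Decimal(value).quantize(Decimal("1"), rounding=rule))


# Round half to even: both 5.5 and 6.5 become 6.
print(round_to_integer("5.5"), round_to_integer("6.5"))   # 6 6
# Round half up: 6.5 becomes 7.
print(round_to_integer("6.5", ROUND_HALF_UP))              # 7
```

Python's built-in `round()` also rounds halves to even (`round(6.5)` returns 6). With floats, however, a value such as 2.675 is stored slightly below its decimal form, so `round(2.675, 2)` gives 2.67. That is why the sketch starts from strings.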

Table 4.10 Age Groupings Used for Different Conditions, as Reported in Surveillance Summaries, CDC, 2003
Overweight in Adults⁷: 18–24 years, 25–34, 35–44, 45–54, 55–64, 65–74, ≥75, Total
Traumatic Brain Injury⁸: ≤4 years, 5–14, 15–19, 20–24, 25–34, 35–44, 45–64, ≥65, Total
Pregnancy-Related Mortality⁹: ≤19 years, 20–24, 25–29, 30–34, 35–39, ≥40, Total
HIV/AIDS¹⁰: <13 years, 13–14, 15–24, 25–34, 35–44, 45–54, 55–64, ≥65, Total
Vaccine Adverse Events¹¹: <1 year, 1–6, 7–17, 18–64, ≥65, Total
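The guideline that class intervals be mutually exclusive and exhaustive can be checked mechanically. The sketch below is a hypothetical helper; `check_intervals` does not come from any library. It makes three assumptions:

- ages are recorded in whole years;
- both limits of each interval are inclusive;
- an open-ended top category ("65 years and older") is written as `(65, None)`.

It reports gaps and overlaps.

```python
from typing import List, Optional, Tuple

# (lower, upper) in whole years, both limits inclusive; upper=None means "and older".
Interval = Tuple[int, Optional[int]]


def _span(a: int, b: int) -> str:
    return f"age {a}" if a == b else f"ages {a} to {b}"


def check_intervals(intervals: List[Interval], youngest: int = 0) -> List[str]:
    """Report gaps and overlaps in a set of age class intervals."""
    problems: List[str] = []
    next_age: Optional[int] = youngest     # first age not yet covered; None once open-ended
    for lower, upper in sorted(intervals, key=lambda iv: iv[0]):
        if next_age is None:
            problems.append(f"overlap: interval starting at {lower} lies inside an open-ended class")
            continue
        if lower > next_age:
            problems.append(f"gap: {_span(next_age, lower - 1)} not covered")
        elif lower < next_age:
            end = next_age - 1 if upper is None else min(upper, next_age - 1)
            problems.append(f"overlap: {_span(lower, end)} in more than one interval")
        next_age = None if upper is None else max(next_age, upper + 1)
    if next_age is not None:
        problems.append(f"ages {next_age} and older not covered")
    return problems


# Vaccine adverse event age groups (<1, 1-6, 7-17, 18-64, 65 and older).
print(check_intervals([(0, 0), (1, 6), (7, 17), (18, 64), (65, None)]))   # []
# A hypothetical flawed scheme: 0-4, 4-9, 15 and older.
print(check_intervals([(0, 4), (4, 9), (15, None)]))
# ['overlap: age 4 in more than one interval', 'gap: ages 10 to 14 not covered']
```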

In addition to these guidelines for creating class intervals, the analyst must decide how many intervals to portray. If no natural or standard class intervals are apparent, the strategies below may be helpful.

Strategy 1: Divide the data into groups of similar size


A particularly appropriate approach if you plan to create area maps (see later section on Maps) is to create a number of class intervals, each with the same number of observations. For example, to portray lung cancer incidence rates among men by state in 2001, one might group the rates into four class intervals, each with 10–12 observations:
Table 4.11 Rates of Lung Cancer in Men by State (and the District of Columbia), 2001
Rate | Number of States in the US | Cumulative Frequency
22.1–48.3 | 11 | 11
48.4–53.3 | 11 | 22
53.4–58.7 | 12 | 34
58.8–73.3 | 10 | 44
Missing data | 7 | 51
Data Source: U.S. Cancer Statistics Working Group. United States Cancer Statistics: 2002 Incidence and Mortality. Atlanta: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention and National Cancer Institute; 2005.
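Strategy 1 amounts to sorting the observations and cutting the sorted list into pieces of nearly equal size. A minimal sketch follows; the function name and the rates are hypothetical. As in the worked example for Table 4.13, the resulting limits may still need hand adjustment so that adjacent intervals leave no gap and tied values are not split across two intervals.

```python
from typing import List, Tuple


def equal_count_groups(values: List[float], k: int) -> List[Tuple[float, float, int]]:
    """Sort values and split them into k groups of (nearly) equal size.

    Returns (lowest, highest, count) per group. When the values do not divide
    evenly, the first groups each receive one extra observation.
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    base, extra = divmod(len(ordered), k)
    groups, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        chunk = ordered[start:start + size]
        groups.append((chunk[0], chunk[-1], size))
        start += size
    return groups


# Hypothetical rates for 10 areas, grouped into 3 classes of similar size.
rates = [31.0, 55.2, 47.9, 62.4, 40.1, 58.8, 50.5, 44.3, 69.0, 53.7]
for low, high, n in equal_count_groups(rates, 3):
    print(f"{low:.1f}-{high:.1f}: {n} areas")
```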

Strategy 2: Base intervals on mean and standard deviation


With this strategy, you can create three, four, or six class intervals. First, calculate the mean and standard deviation of the distribution of data. (Lesson 2 covers the calculation of these measures.) Then use the mean plus or minus different multiples of the standard deviation to establish the upper limits for the intervals. This strategy is most appropriate for large data sets. For example, suppose you are investigating a scoring system for the preparedness of health departments to respond to emerging and urgent threats. You have devised a series of evaluation questions that yields a score from 0 to 100, with 100 being highest. You conduct a survey and find that the scores for health departments in your jurisdiction range from 19 to 82; the mean of the scores is 50, and the standard deviation is 10. Here, the strategy for establishing six intervals for these data specifies:
Upper limit of interval 6 = maximum value = 82
Upper limit of interval 5 = 50 + 20 = 70
Upper limit of interval 4 = 50 + 10 = 60
Upper limit of interval 3 = 50
Upper limit of interval 2 = 50 − 10 = 40
Upper limit of interval 1 = 50 − 20 = 30
Lower limit of interval 1 = 19

If you then select the obvious lower limit for each upper limit, you have the six intervals:
Interval 6 = 71–82
Interval 5 = 61–70
Interval 4 = 51–60
Interval 3 = 41–50
Interval 2 = 31–40
Interval 1 = 19–30
You can create three or four intervals by combining adjacent intervals.
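The limits in the preparedness example can be reproduced with a few lines of arithmetic. This sketch assumes, as the example does, whole-number scores, so each lower limit is one point above the previous upper limit. With data recorded to one decimal place, the step would be 0.1 instead. The function names are hypothetical.

```python
from typing import List, Tuple


def sd_upper_limits(mean: float, sd: float, maximum: float, n_intervals: int = 6) -> List[float]:
    """Upper limits for class intervals built from the mean and standard deviation."""
    if n_intervals == 6:
        multiples = [-2, -1, 0, 1, 2]      # mean - 2SD ... mean + 2SD, then the maximum
    elif n_intervals == 4:
        multiples = [-1, 0, 1]             # mean - 1SD, mean, mean + 1SD, then the maximum
    else:
        raise ValueError("this sketch only handles 4 or 6 intervals")
    return [mean + m * sd for m in multiples] + [maximum]


def whole_number_intervals(uppers: List[int], minimum: int) -> List[Tuple[int, int]]:
    """Pair each upper limit with the 'obvious' lower limit for whole-number scores."""
    lowers = [minimum] + [u + 1 for u in uppers[:-1]]
    return list(zip(lowers, uppers))


# Preparedness scores from the text: range 19-82, mean 50, standard deviation 10.
uppers = sd_upper_limits(mean=50, sd=10, maximum=82)
print(uppers)                                   # [30, 40, 50, 60, 70, 82]
print(whole_number_intervals(uppers, minimum=19))
# [(19, 30), (31, 40), (41, 50), (51, 60), (61, 70), (71, 82)]
```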

Strategy 3: Divide the range into equal class intervals


This method is the simplest and most commonly used, and is most readily adapted to graphs. The selection of groups or categories is often arbitrary, but must be consistent (for example, age groups by 5 or 10 years throughout the data set). To use equal class intervals, do the following:

1. Find the range of the values in your data set. That is, find the difference between the maximum value (or some slightly larger convenient value) and zero (or the minimum value).

2. Decide how many class intervals (groups or categories) you want to have. For tables, choose between four and eight class intervals. For graphs and maps, choose between three and six class intervals. The number will depend on what aspects of the data you want to highlight.

3. Find what size of class interval to use by dividing the range by the number of class intervals you have decided on.

4. Begin with the minimum value as the lower limit of your first interval and specify class intervals of whatever size you calculated until you reach the maximum value in your data.
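The four steps for equal class intervals translate directly into code. This is a sketch under two assumptions. First, the range runs from the minimum value rather than zero. Second, limits are rounded to the precision of the data (one decimal place by default), with each new lower limit one unit of that precision above the previous upper limit. The example uses the minimum and maximum lung cancer death rates from Table 4.13 (39.7 and 116.1); the function name is hypothetical.

```python
from typing import List, Tuple


def equal_width_limits(minimum: float, maximum: float, k: int,
                       decimals: int = 1) -> List[Tuple[float, float]]:
    """(lower, upper) limits of k equal-width class intervals covering minimum..maximum."""
    width = (maximum - minimum) / k                 # step 3: range / number of intervals
    step = 10 ** -decimals                          # smallest unit of the data, e.g. 0.1
    uppers = [round(minimum + i * width, decimals) for i in range(1, k + 1)]
    lowers = [round(minimum, decimals)] + [round(u + step, decimals) for u in uppers[:-1]]
    return list(zip(lowers, uppers))


# Minimum and maximum lung cancer death rates from Table 4.13, four intervals.
for low, high in equal_width_limits(39.7, 116.1, 4):
    print(f"{low:.1f}-{high:.1f}")
```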

For example, suppose you want to display 52 observations: for each state (including Puerto Rico and the District of Columbia), the percentage of men over age 40 who had been screened for prostate cancer within the past two years, as reported in 2004. You could create five categories, each containing the number of states with percentages of screened men in the given range.
Table 4.12 Percentage of Men Over Age 40 Screened for Prostate Cancer, by State (including Puerto Rico and the District of Columbia), 2004
Percentage | Number of States | Cumulative Frequency
40.0–44.9 | 3 | 3
45.0–49.9 | 18 | 21
50.0–54.9 | 25 | 46
55.0–59.9 | 5 | 51
60.0–64.9 | 1 | 52
Data Source: Behavioral Risk Factor Surveillance System [Internet]. Atlanta: Centers for Disease Control and Prevention. Available from: http://www.cdc.gov/brfss/.
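The Cumulative Frequency column in Tables 4.11 and 4.12 is a running total of the counts. A minimal sketch using `itertools.accumulate` from Python's standard library reproduces it for Table 4.12. The final value should equal the total number of observations, which is a quick check that the intervals are exhaustive.

```python
from itertools import accumulate

# Class intervals and counts from Table 4.12 (52 states and territories).
intervals = ["40.0-44.9", "45.0-49.9", "50.0-54.9", "55.0-59.9", "60.0-64.9"]
counts = [3, 18, 25, 5, 1]

# Cumulative frequency = running total of the counts, lowest interval first.
cumulative = list(accumulate(counts))
for label, n, cum in zip(intervals, counts, cumulative):
    print(f"{label}  {n:>3}  {cum:>3}")

print(cumulative)   # [3, 21, 46, 51, 52]
```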

EXAMPLE: Creating Class Interval Categories

Use each strategy to create four class interval categories by using the lung cancer mortality rates shown in Table
4.13.
Table 4.13 Age-adjusted Lung Cancer Death Rates per 100,000 Population, in Rank Order by State — United States, 2000
Rank | State | Rate per 100,000 | Rank | State | Rate per 100,000
1 | Kentucky | 116.1 | 26 | Florida | 75.3
2 | Mississippi | 111.7 | 27 | Kansas | 74.5
3 | West Virginia | 104.1 | 28 | Massachusetts | 73.6
4 | Tennessee | 103.4 | 29 | Alaska | 72.9
5 | Alabama | 100.8 | 30 | Oregon | 72.7
6 | Louisiana | 99.2 | 31 | New Hampshire | 71.2
7 | Arkansas | 99.1 | 32 | New Jersey | 71.2
8 | North Carolina | 94.6 | 33 | Washington | 71.2
9 | Georgia | 93.2 | 34 | Vermont | 70.2
10 | South Carolina | 92.4 | 35 | South Dakota | 68.1
11 | Indiana | 91.6 | 36 | Wisconsin | 67.0
12 | Oklahoma | 89.4 | 37 | Montana | 66.5
13 | Missouri | 88.5 | 38 | Connecticut | 66.4
14 | Ohio | 85.6 | 39 | New York | 66.2
15 | Virginia | 83.0 | 40 | Nebraska | 65.6
16 | Maine | 80.2 | 41 | North Dakota | 64.9
17 | Illinois | 80.0 | 42 | Wyoming | 64.4
18 | Texas | 79.3 | 43 | Arizona | 62.0
19 | Maryland | 79.2 | 44 | Minnesota | 60.7
20 | Nevada | 78.7 | 45 | California | 60.1
21 | Delaware | 78.2 | 46 | Idaho | 59.7
22 | Rhode Island | 77.9 | 47 | New Mexico | 52.3
23 | Iowa | 77.0 | 48 | Colorado | 52.1
24 | Michigan | 76.7 | 49 | Hawaii | 49.8
25 | Pennsylvania | 76.5 | 50 | Utah | 39.7
Total United States: 76.9
Data Source: Stewart SL, King JB, Thompson TD, Friedman C, Wingo PA. Cancer Mortality–United States, 1990-2000. In: Surveillance Summaries, June 4, 2004. MMWR 2004;53 (No. SS-3):23–30.

Strategy 1: Divide the data into groups of similar size


(Note: If the states in Table 4.13 had been listed alphabetically rather than in rank order, the first step would have
been to sort the data into rank order by rate. Fortunately, this has already been done.)

1. Divide the list into four groups of equal size:

50 states / 4 = 12.5 states per group. Because states can’t be cut in half, use two groups of 12 states and two groups of 13 states. Missouri (#13) could go into either the first or second group, and Connecticut (#38) could go into either the third or fourth group. Arbitrarily putting Missouri in the second group and Connecticut in the third results in the following groups:
a. Kentucky through Oklahoma (States 1–12)
b. Missouri through Pennsylvania (States 13–25)
c. Florida through Connecticut (States 26–38)
d. New York through Utah (States 39–50)

2. Identify the rate for the first and last state in each group:
a. Oklahoma through Kentucky 89.4–116.1
b. Pennsylvania through Missouri 76.5–88.5
c. Connecticut through Florida 66.4–75.3
d. Utah through New York 39.7–66.2


3. Adjust the limits of each interval so no gap exists between the end of one class interval and beginning of the
next. Deciding how to adjust the limits is somewhat arbitrary — you could split the difference, or use a
convenient round number.
a. Oklahoma through Kentucky 89.0–116.1
b. Pennsylvania through Missouri 76.0–88.9
c. Connecticut through Florida 66.3–75.9
d. Utah through New York 39.7–66.2
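Steps 1 and 2 can be checked with a short script. This sketch assumes the rates are already in rank order, as in Table 4.13. It takes the group sizes as input so that the arbitrary placement of Missouri and Connecticut stays explicit; the function name is hypothetical. It returns the lowest and highest rate in each group before the limits are adjusted in step 3.

```python
from typing import List, Tuple

# Age-adjusted lung cancer death rates per 100,000 from Table 4.13, in rank order (1-50).
RATES = [116.1, 111.7, 104.1, 103.4, 100.8, 99.2, 99.1, 94.6, 93.2, 92.4,
         91.6, 89.4, 88.5, 85.6, 83.0, 80.2, 80.0, 79.3, 79.2, 78.7,
         78.2, 77.9, 77.0, 76.7, 76.5, 75.3, 74.5, 73.6, 72.9, 72.7,
         71.2, 71.2, 71.2, 70.2, 68.1, 67.0, 66.5, 66.4, 66.2, 65.6,
         64.9, 64.4, 62.0, 60.7, 60.1, 59.7, 52.3, 52.1, 49.8, 39.7]


def groups_by_size(ranked: List[float], sizes: List[int]) -> List[Tuple[float, float]]:
    """Split values already in rank order into consecutive groups; return (lowest, highest) per group."""
    if sum(sizes) != len(ranked):
        raise ValueError("group sizes must add up to the number of values")
    limits, start = [], 0
    for size in sizes:
        chunk = ranked[start:start + size]
        limits.append((min(chunk), max(chunk)))
        start += size
    return limits


# Groups of 12, 13, 13, and 12 states: Missouri (#13) in group 2, Connecticut (#38) in group 3.
print(groups_by_size(RATES, [12, 13, 13, 12]))
# [(89.4, 116.1), (76.5, 88.5), (66.4, 75.3), (39.7, 66.2)]
```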

Strategy 2: Base intervals on mean and standard deviation


1. Calculate the mean and standard deviation (see Lesson 2 for instructions on calculating these measures):
Mean = 77.1
Standard deviation = 16.1

2. Find the upper limits of four intervals:


a. Upper limit of interval 4 = maximum value = 116.1
b. Upper limit of interval 3 = mean + 1 standard deviation = 77.1 + 16.1 = 93.2
c. Upper limit of interval 2 = mean = 77.1
d. Upper limit of interval 1 = mean – 1 standard deviation = 77.1 – 16.1 = 61.0
e. Lower limit of interval 1 = minimum value = 39.7

3. Select the lower limit for each upper limit to define four full intervals. Specify the states that fall into each interval. (Note: To place the states with the highest rates first, reverse the order of the intervals.)
a. North Carolina through Kentucky (8 states) 93.3–116.1
b. Rhode Island through Georgia (14 states) 77.2–93.2
c. Arizona through Iowa (21 states) 61.1–77.1
d. Utah through Minnesota (7 states) 39.7–61.0
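The mean, standard deviation, and state counts in this strategy can be verified with Python's standard `statistics` and `bisect` modules. The sketch makes three choices:

- it rounds the mean and standard deviation to one decimal place before building the limits, as the example does;
- it uses the sample standard deviation (`statistics.stdev`, which divides by n − 1); the population version, `statistics.pstdev`, gives about 16.0 for these data;
- it treats each interval as including its upper limit, so a rate equal to the mean would fall only in the interval that ends at the mean.

```python
import bisect
import statistics

# Age-adjusted lung cancer death rates per 100,000 from Table 4.13, in rank order (1-50).
RATES = [116.1, 111.7, 104.1, 103.4, 100.8, 99.2, 99.1, 94.6, 93.2, 92.4,
         91.6, 89.4, 88.5, 85.6, 83.0, 80.2, 80.0, 79.3, 79.2, 78.7,
         78.2, 77.9, 77.0, 76.7, 76.5, 75.3, 74.5, 73.6, 72.9, 72.7,
         71.2, 71.2, 71.2, 70.2, 68.1, 67.0, 66.5, 66.4, 66.2, 65.6,
         64.9, 64.4, 62.0, 60.7, 60.1, 59.7, 52.3, 52.1, 49.8, 39.7]

mean = round(statistics.mean(RATES), 1)   # about 77.1
sd = round(statistics.stdev(RATES), 1)    # about 16.1 (sample standard deviation)

# Upper limits, lowest interval first: mean - 1 SD, mean, mean + 1 SD, maximum.
uppers = [round(mean - sd, 1), mean, round(mean + sd, 1), max(RATES)]

# bisect_left puts a rate equal to an upper limit into that interval, so the
# intervals are (min, 61.0], (61.0, 77.1], (77.1, 93.2], (93.2, 116.1].
counts = [0] * len(uppers)
for rate in RATES:
    counts[bisect.bisect_left(uppers, rate)] += 1

print(mean, sd)       # 77.1 16.1
print(uppers)         # [61.0, 77.1, 93.2, 116.1]
print(counts)         # [7, 21, 14, 8]
```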

Strategy 3: Divide the range into equal class intervals


1. Divide the range from zero (or the minimum value) to the maximum by 4:
a. (116.1 – 39.7) / 4 = 76.4 / 4 = 19.1

2. Use multiples of 19.1 to create four categories, starting with 39.7:
a. 39.7 through (39.7 + 19.1) = 39.7 through 58.8
b. 58.9 through (39.7 + [2 x 19.1]) = 58.9 through 77.9
c. 78.0 through (39.7 + [3 x 19.1]) = 78.0 through 97.0
d. 97.1 through (39.7 + [4 x 19.1]) = 97.1 through 116.1

3. Final categories:
a. Arkansas through Kentucky (7 states) 97.1–116.1
b. Delaware through North Carolina (14 states) 78.0–97.0
c. Idaho through Rhode Island (25 states) 58.9–77.9
d. Utah through New Mexico (4 states) 39.7–58.8

4. Alternatively, since 19.1 is close to 20, intervals 20 units wide might be used to create four categories that look cleaner. For example, the final categories could look like:
a. Arkansas through Kentucky (7 states) 97.0–116.9
b. Iowa through North Carolina (16 states) 77.0–96.9
c. Idaho through Michigan (23 states) 57.0–76.9
d. Utah through New Mexico (4 states) 37.0–56.9
OR
a. Alabama through Kentucky (5 states) 100.0–119.9
b. Illinois through Louisiana (12 states) 80.0–99.9
c. California through Texas (28 states) 60.0–79.9
d. Utah through Idaho (5 states) 39.7–59.9
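The state counts in the final categories depend only on the upper limit of each interval. The sketch below uses a hypothetical helper built on the standard `bisect` module. It counts the Table 4.13 rates per interval, lowest interval first, which is the reverse of the order in the text. It assumes intervals include their upper limit and that no rate falls below the first interval.

```python
import bisect
from typing import List

# Age-adjusted lung cancer death rates per 100,000 from Table 4.13, in rank order (1-50).
RATES = [116.1, 111.7, 104.1, 103.4, 100.8, 99.2, 99.1, 94.6, 93.2, 92.4,
         91.6, 89.4, 88.5, 85.6, 83.0, 80.2, 80.0, 79.3, 79.2, 78.7,
         78.2, 77.9, 77.0, 76.7, 76.5, 75.3, 74.5, 73.6, 72.9, 72.7,
         71.2, 71.2, 71.2, 70.2, 68.1, 67.0, 66.5, 66.4, 66.2, 65.6,
         64.9, 64.4, 62.0, 60.7, 60.1, 59.7, 52.3, 52.1, 49.8, 39.7]


def count_by_upper_limits(values: List[float], uppers: List[float]) -> List[int]:
    """Count values per interval, lowest interval first; uppers must be sorted ascending."""
    counts = [0] * len(uppers)
    for value in values:
        i = bisect.bisect_left(uppers, value)   # first interval whose upper limit >= value
        if i == len(uppers):
            raise ValueError(f"{value} lies above the top interval")
        counts[i] += 1
    return counts


print(count_by_upper_limits(RATES, [58.8, 77.9, 97.0, 116.1]))   # [4, 25, 14, 7]
print(count_by_upper_limits(RATES, [56.9, 76.9, 96.9, 116.9]))   # [4, 23, 16, 7]
print(count_by_upper_limits(RATES, [59.9, 79.9, 99.9, 119.9]))   # [5, 28, 12, 5]
```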

Exercise 4.2
With the data on lung cancer mortality rates presented in Table 4.13, use
each strategy to create three class intervals for the rates.

Check your answers on page 4-73

Graphs
A graph (used here interchangeably with chart) displays numeric data in visual form. It can display patterns, trends, aberrations, similarities, and differences in the data that may not be evident in tables. As such, a graph can be an essential tool for analyzing and trying to make sense of data. In addition, a graph is often an effective way to present data to others less familiar with the data.

“Charts…should fulfill certain basic objectives: they should be: (1) accurate representations of the facts, (2) clear, easily read, and understood, and (3) so designed and constructed as to attract and hold attention.”12
- CF Schmid and SE Schmid

When designing graphs, the guidelines for categorizing data for tables also apply. In addition, some best practices for graphics include:
• Ensure that a graphic can stand alone by clear labeling of
title, source, axes, scales, and legends;
• Clearly identify variables portrayed (legends or keys),
including units of measure;
• Minimize the number of lines on a graph;
• Generally, portray frequency on the vertical scale, starting at zero, and the classification variable on the horizontal scale;
• Ensure that the scale for each axis is appropriate for the data presented;
• Define any abbreviations or symbols; and
• Specify any data excluded.

In epidemiology, most graphs have two scales or axes, one horizontal and one vertical, that intersect at a right angle. The horizontal axis is known as the x-axis and generally shows values of the independent (or x) variable, such as time or age group. The vertical axis is the y-axis and shows the dependent (or y) variable, which, in epidemiology, is usually a frequency measure such as number of cases or rate of disease. Each axis should be labeled to show what it represents (both the name of the variable and the units in which it is measured) and marked by a scale of measurement along the line.

“Make the data stand out. Avoid superfluity.”13
- WS Cleveland

In constructing a useful graph, the guidelines for categorizing data by type of data also apply. For example, the number of reported measles cases by year of report is technically a discrete variable, but because of the large number of cases when aggregated over the United States, we can treat this variable as a continuous one. As such, a line graph is appropriate to display these data.

Try It: Plotting a Graph

Scenario: Table 4.14 shows the number of measles cases by year of report from 1950 to 2003. The number of
measles cases in years 1950 through 1954 has been plotted in Figure 4.1, below. The independent variable, years,
is shown on the horizontal axis. The dependent variable, number of cases, is shown on the vertical axis. A grid is
included in Figure 4.1 to illustrate how points are plotted. For example, to plot the point for the number of cases in 1953, draw a line up from 1953, and then draw a line to the right from 449,000 cases. The point where these lines intersect is the point for 1953 on the graph.

Your Turn: Use the data in Table 4.14 to plot the points for 1955 to 1959 and complete the graph in Figure 4.1.

Figure 4.1 Partial Graph of Measles by Year of Report — United States, 1950–1959

Table 4.14 Number of Reported Measles Cases, by Year of Report — United States, 1950–2003
Year Cases Year Cases Year Cases
1950 319,000 1970 47,351 1990 27,786
1951 530,000 1971 75,290 1991 9,643
1952 683,000 1972 32,275 1992 2,237
1953 449,000 1973 26,690 1993 312
1954 683,000 1974 22,094 1994 963
1955 555,000 1975 24,374 1995 309
1956 612,000 1976 41,126 1996 508
1957 487,000 1977 57,345 1997 138
1958 763,000 1978 26,871 1998 100
1959 406,000 1979 13,597 1999 100
1960 442,000 1980 13,506 2000 86
1961 424,000 1981 3,124 2001 116
1962 482,000 1982 1,714 2002 44
1963 385,000 1983 1,497 2003 56
1964 458,000 1984 2,587
1965 262,000 1985 2,822
1966 204,000 1986 6,282
1967 62,705 1987 3,655
1968 22,231 1988 3,396
1969 25,826 1989 18,193
Data Sources: Centers for Disease Control and Prevention. Summary of notifiable diseases–United States, 1989. MMWR
1989;38(No. 54).
Centers for Disease Control and Prevention. Summary of notifiable diseases–United States, 2002. MMWR 2002;51(No. 53)
Centers for Disease Control and Prevention. Summary of notifiable diseases–United States, 2003. MMWR 2005;52(No. 54)

Arithmetic-scale line graphs
An arithmetic-scale line graph (such as Figure 4.1) shows patterns
or trends over some variable, often time. In epidemiology, this type
of graph is used to show long series of data and to compare several
series. It is the method of choice for plotting rates over time.

In an arithmetic-scale line graph, a set distance along any axis represents the same quantity anywhere on that axis. In Figure 4.2, for example, the space between tick marks along the y-axis (vertical axis) represents an increase of 10,000 (10 x 1,000) cases anywhere along the axis; the y-variable is continuous.

Furthermore, the distance between any two tick marks on the x-axis (horizontal axis) represents a period of one year, an example of a discrete variable. Thus an arithmetic-scale line graph is one in which equal distances along either the x- or y-axis portray equal values.

Arithmetic-scale line graphs can display numbers, rates, proportions, or other quantitative measures on the y-axis. Generally, the x-axis for these graphs is used to portray the time period of data occurrence, collection, or reporting (e.g., days, weeks, months, or years). Thus, these graphs are primarily used to portray an overall trend over time rather than to analyze particular observations (single data points). For example, Figure 4.2 shows the prevalence of neural tube defects per 100,000 births over time.

Figure 4.2 Trends in Neural Tube Defects (Anencephaly and Spina Bifida) Among All Births, 45 States and District of Columbia, 1990–1999

Source: Honein MA, Paulozzi LJ, Mathews TJ, Erickson JD, Wong L-Y. Impact of folic acid fortification of the US food supply on the occurrence of neural tube defects. JAMA 2001;285:2981–6.
Figure 4.3 shows another example of an arithmetic-scale line graph. Here the y-axis is a calculated variable: the median age at death of people with Down’s syndrome who died from 1983 to 1997. Here also, we see the value of showing two data series on one graph; we can compare median age at death for males and females.

Figure 4.3 Median Age at Death of People with Down’s Syndrome by Sex — United States, 1983–1997

Source: Yang Q, Rasmussen A, Friedman JM. Mortality associated with Down’s syndrome in the USA from 1983 to 1997: a population-based study. Lancet 2002;359:1019–25.

More About the X-axis and the Y-axis

When you create an arithmetic-scale line graph, you need to select a scale for the x- and y-axes. The scale should reflect both the data and the point of the graph. For example, if you use the data in Table 4.14 to graph the number of measles cases by year from 1990 to 2002, then the scale of the x-axis will most likely be year of report, because that is how the data are available. Suppose, however, that you had line-listed data with the actual dates of onset or report spanning several years. You might prefer to plot these data by week, month, quarter, or even year, depending on the point you wish to make.

The following steps are recommended for creating a scale for the y-axis.
• Make the length of the y-axis shorter than the x-axis so that your graph is horizontal or “landscape.” A 5:3 ratio
is often recommended for the length of the x-axis to y-axis.
• Always start the y-axis with 0. While this recommendation is not followed in all fields, it is the standard practice
in epidemiology.
• Determine the range of values you need to show on the y-axis by identifying the largest value you need to graph on the y-axis and rounding that figure up to a slightly larger number. For example, the largest y-value in Figure 4.3 is 49 years in 1997, so the scale on the y-axis goes up to 50. If median age continues to increase and exceeds 50 in future years, a future graph will have to extend the scale on the y-axis to 60 years.
• Space the tick marks and their labels to describe the data in sufficient detail for your purposes. In Figure 4.3, five
intervals of 10 years each were considered adequate to give the reader a good sense of the data points and
pattern.
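The y-axis steps above can be sketched in code. This example assumes you choose the tick spacing yourself (10 years for Figure 4.3) and simply rounds the largest value up to the next tick; it also applies the 5:3 length ratio. Both function names are hypothetical.

```python
import math
from typing import List


def y_axis_ticks(largest_value: float, tick_step: float) -> List[float]:
    """Tick positions from 0 up to the first multiple of tick_step at or above largest_value."""
    top = math.ceil(largest_value / tick_step) * tick_step   # round the largest value up
    return [i * tick_step for i in range(int(round(top / tick_step)) + 1)]


def y_axis_length(x_axis_length: float, x_to_y: tuple = (5, 3)) -> float:
    """Length of the y-axis for a landscape graph with the given x:y ratio (5:3 by default)."""
    return x_axis_length * x_to_y[1] / x_to_y[0]


# Figure 4.3: largest median age at death is 49 years; ticks every 10 years.
print(y_axis_ticks(49, 10))   # [0, 10, 20, 30, 40, 50]
# A 10-inch x-axis calls for a 6-inch y-axis.
print(y_axis_length(10))      # 6.0
```

If the largest value falls exactly on a tick (say, 50), this sketch stops at 50; add one more tick if you want headroom above the highest point.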

Exercise 4.3
Using the data on measles rates (per 100,000) from 1955 to 2002 in Table
4.15:

1. Construct an arithmetic-scale line graph of rate by year. Use intervals on the y-axis that are
appropriate for the range of data you are graphing.

2. Construct a separate arithmetic-scale line graph of the measles rates from 1985 to 2002.
Use intervals on the y-axis that are appropriate for the range of data you are graphing.

Graph paper is provided at the end of this lesson.

Table 4.15 Rate (per 100,000 Population) of Reported Measles Cases by Year of Report — United States, 1955–2002
Year | Rate per 100,000 | Year | Rate per 100,000 | Year | Rate per 100,000
1955 | 336.3 | 1971 | 36.5 | 1987 | 1.5
1956 | 364.1 | 1972 | 15.5 | 1988 | 1.4
1957 | 283.4 | 1973 | 12.7 | 1989 | 7.3
1958 | 438.2 | 1974 | 10.5 | 1990 | 11.2
1959 | 229.3 | 1975 | 11.4 | 1991 | 3.8
1960 | 246.3 | 1976 | 19.2 | 1992 | 0.9
1961 | 231.6 | 1977 | 26.5 | 1993 | 0.1
1962 | 259.0 | 1978 | 12.3 | 1994 | 0.4
1963 | 204.2 | 1979 | 6.2 | 1995 | 0.1
1964 | 239.4 | 1980 | 6.0 | 1996 | 0.2
1965 | 135.1 | 1981 | 1.4 | 1997 | 0.06
1966 | 104.2 | 1982 | 0.7 | 1998 | 0.04
1967 | 31.7 | 1983 | 0.6 | 1999 | 0.04
1968 | 11.1 | 1984 | 1.1 | 2000 | 0.03
1969 | 12.8 | 1985 | 1.2 | 2001 | 0.04
1970 | 23.2 | 1986 | 2.6 | 2002 | 0.02

Data Sources: Centers for Disease Control. Summary of notifiable diseases–United States, 1989. MMWR 1989;38(No. 54).
Centers for Disease Control and Prevention. Summary of notifiable diseases–United States, 2002. Published April 30, 2004 for MMWR 2002;51(No. 53).

Check your answers on page 4-75


Semilogarithmic-scale line graphs
In some cases, the range of data observed may be so large that proper construction of an arithmetic-scale graph is problematic. For example, in the United States, vaccination policies have greatly reduced the incidence of mumps; however, outbreaks can still occur in unvaccinated populations. To portray these competing forces, an arithmetic-scale graph is insufficient without an inset that magnifies the years of interest (Figure 4.4).

Figure 4.4 Mumps by Year — United States, 1978–2003

Source: Centers for Disease Control and Prevention. Summary of notifiable diseases–United
States, 2003. Published April 22, 2005, for MMWR 2003;52(No. 54):54.

An alternative approach to this problem of incompatible scales is to use a logarithmic transformation for the y-axis. Termed a “semi-log” graph, this technique is useful for displaying a variable with a wide range of values (as illustrated in Figure 4.5). The x-axis uses the usual arithmetic scale, but the y-axis is measured on a logarithmic rather than an arithmetic scale. As a result, the distance from 1 to 10 on the y-axis is the same as the distance from 10 to 100 or 100 to 1,000.
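A log scale places each value at a height proportional to its base-10 logarithm, which is why each tenfold step covers the same distance. This minimal sketch uses only Python's `math` module; the helper names are hypothetical. It also estimates how many cycles a semilog graph would need for the measles counts in Table 4.14, and it refuses zero, which cannot be plotted on a log scale.

```python
import math


def log_position(value: float) -> float:
    """Height of a value on a base-10 logarithmic axis (the value 1 sits at height 0)."""
    if value <= 0:
        raise ValueError("zero and negative values cannot be shown on a log scale")
    return math.log10(value)


def cycles_needed(smallest: float, largest: float) -> int:
    """Number of tenfold cycles needed to cover the range from smallest to largest."""
    return math.ceil(math.log10(largest)) - math.floor(math.log10(smallest))


# Each tenfold step covers the same vertical distance: one full cycle.
for low, high in [(1, 10), (10, 100), (100, 1000)]:
    print(f"{low} to {high}: {log_position(high) - log_position(low):.3f}")

# Reported measles cases in Table 4.14 range from 44 (2002) to 763,000 (1958):
print(cycles_needed(44, 763_000))   # 5 cycles, from 10 to 1,000,000
```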

Another use for the semi-log graph is when you are interested in portraying the relative rate of change of several series, rather than the absolute value. Figure 4.5 shows this application. Note several aspects of this graph:

• The y-axis includes four cycles. Each cycle spans one order of magnitude, that is, a tenfold range (e.g., 0.1 to 1, 1 to 10, etc.), so every cycle represents the same constant multiple.

• Within a cycle, the tick marks are spaced so that the spaces become smaller as the value increases. Notice that the absolute distance from 1.0 to 2.0 is wider than the distance from 2.0 to 3.0, which is, in turn, wider than the distance from 8.0 to 9.0. This results from the fact that we are graphing the logarithmic transformation of numbers, which compresses them more as they become larger. We can still compare series, however, since the transformation preserves the relative change between series.

Cycle = order of magnitude. That is, from 1 to 10 is one cycle; from 10 to 100 is another cycle.
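The shrinking spacing within a cycle follows directly from the logarithm. This sketch prints the distance between successive tick marks as a fraction of one cycle. It then shows that a doubling spans the same distance wherever it occurs, which is what lets relative changes be compared across series.

```python
import math

# Gap on a log axis between successive tick marks 1, 2, ..., 10 within one cycle,
# expressed as a fraction of the cycle's height.
gaps = [math.log10(k + 1) - math.log10(k) for k in range(1, 10)]
for k, gap in enumerate(gaps, start=1):
    print(f"{k} to {k + 1}: {gap:.3f}")

# The gaps shrink as values grow but still add up to one full cycle.
print(f"total: {sum(gaps):.3f}")

# A doubling covers the same distance anywhere on the axis.
print(f"{math.log10(4) - math.log10(2):.5f} {math.log10(400) - math.log10(200):.5f}")
```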

Figure 4.5 Age-adjusted Death Rates for 5 of the 15 Leading Causes of Death — United States, 1958–2002

Adapted from: Kochanek KD, Murphy SL, Anderson RN, Scott C. Deaths: final data for 2002. National vital statistics report; vol 53, no 5. Hyattsville, Maryland: National Center for Health Statistics, 2004. p. 9.

Consider the data shown in Table 4.16. Two hypothetical countries
begin with a population of 1,000,000. The population of Country A
grows by 100,000 persons each year. The population of Country B
grows by 10% each year. Figure 4.6 displays data from Country A
on the left, and Country B on the right. Arithmetic-scale line
graphs are above semilog-scale line graphs of the same data. Look
at the left side of the figure. Because the population of Country A
grows by a constant number of persons each year, the data on the
arithmetic-scale line graph fall on a straight line. However,
because the percentage growth in Country A declines each year,
the curve on the semilog-scale line graph flattens. On the right side
of the figure the population of Country B curves upward on the
arithmetic-scale line graph but is a straight line on the semilog
graph. In summary, a straight line on an arithmetic-scale line graph represents a constant change in the number or amount. A straight line on a semilog-scale line graph represents a constant percentage change, that is, a constant rate of change.

Table 4.16 Hypothetical Population Growth in Two Countries
Year | Country A Population (Constant Growth by 100,000) | Country A Growth Rate | Country B Population (Constant Growth by 10%) | Country B Growth Rate
0 | 1,000,000 | | 1,000,000 |
1 | 1,100,000 | 10.0% | 1,100,000 | 10.0%
2 | 1,200,000 | 9.1% | 1,210,000 | 10.0%
3 | 1,300,000 | 8.3% | 1,331,000 | 10.0%
4 | 1,400,000 | 7.7% | 1,464,100 | 10.0%
5 | 1,500,000 | 7.1% | 1,610,510 | 10.0%
6 | 1,600,000 | 6.7% | 1,771,561 | 10.0%
7 | 1,700,000 | 6.3% | 1,948,717 | 10.0%
8 | 1,800,000 | 5.9% | 2,143,589 | 10.0%
9 | 1,900,000 | 5.6% | 2,357,948 | 10.0%
10 | 2,000,000 | 5.3% | 2,593,742 | 10.0%
11 | 2,100,000 | 5.0% | 2,853,117 | 10.0%
12 | 2,200,000 | 4.8% | 3,138,428 | 10.0%
13 | 2,300,000 | 4.5% | 3,452,271 | 10.0%
14 | 2,400,000 | 4.3% | 3,797,498 | 10.0%
15 | 2,500,000 | 4.2% | 4,177,248 | 10.0%
16 | 2,600,000 | 4.0% | 4,594,973 | 10.0%
17 | 2,700,000 | 3.8% | 5,054,470 | 10.0%
18 | 2,800,000 | 3.7% | 5,559,917 | 10.0%
19 | 2,900,000 | 3.6% | 6,115,909 | 10.0%
20 | 3,000,000 | 3.4% | 6,727,500 | 10.0%
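Regenerating Table 4.16 shows why the two countries behave differently on the two kinds of graphs. In this sketch, growth rates are percentages of the previous year's population, rounded half up to one decimal place with the `decimal` module. That rounding rule is an assumption about how the table was rounded. Under it, Country A's growth in year 13 is 100,000 / 2,200,000 = 4.5%. The last lines compute the vertical step each year would take on a semilog graph: constant for Country B, shrinking for Country A.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def pct_growth(current: int, previous: int) -> Decimal:
    """Percentage growth over the previous year, rounded half up to one decimal place."""
    pct = Decimal(current - previous) / Decimal(previous) * 100
    return pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def growth_table(years: int = 20):
    """Rows of (year, pop_a, growth_a, pop_b, growth_b); growth is None for year 0."""
    rows = []
    for year in range(years + 1):
        pop_a = 1_000_000 + 100_000 * year                       # constant number added
        exact_b = Decimal(1_000_000) * Decimal("1.1") ** year     # constant 10% growth
        pop_b = int(exact_b.quantize(Decimal("1"), rounding=ROUND_HALF_UP))
        if year == 0:
            rows.append((year, pop_a, None, pop_b, None))
        else:
            _, prev_a, _, prev_b, _ = rows[-1]
            rows.append((year, pop_a, pct_growth(pop_a, prev_a), pop_b, pct_growth(pop_b, prev_b)))
    return rows


rows = growth_table()
for year, pop_a, g_a, pop_b, g_b in rows:
    g_a_text = "" if g_a is None else f"{g_a}%"
    g_b_text = "" if g_b is None else f"{g_b}%"
    print(f"{year:>2} {pop_a:>10,} {g_a_text:>6} {pop_b:>10,} {g_b_text:>6}")

# On a semilog graph each year's vertical step is log10(this year / last year).
steps_a = [math.log10(rows[i][1] / rows[i - 1][1]) for i in range(1, len(rows))]
steps_b = [math.log10(rows[i][3] / rows[i - 1][3]) for i in range(1, len(rows))]
print(f"Country A steps shrink: {steps_a[0]:.4f} -> {steps_a[-1]:.4f}")
print(f"Country B steps are constant: {min(steps_b):.4f} to {max(steps_b):.4f}")
```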

Figure 4.6 Comparison of Arithmetic-scale Line Graph and Semilogarithmic-scale Line Graph for Hypothetical Country A (Constant Increase in Number of People) and Country B (Constant Rate of Growth)

Consequently, a semilog-scale line graph has the following features:
• The slope of the line indicates the rate of increase or decrease.
• A straight line indicates a constant rate (not amount) of increase or decrease in the values.
• A horizontal line indicates no change.
• Two or more lines following parallel paths show identical rates of change.

Semilog graph paper is available commercially, and most sheets include at least three cycles.

To create a semilogarithmic graph from a data set in Analysis Module:
To calculate data for plotting, you must define a new variable. For example, if you want a semilog plot for annual measles surveillance data in a variable called MEASLES, under the VARIABLES section of the Analysis commands:
• Select Define.
• Type logmeasles into the Variable Name box.
• Since your new variable is not used by other programs, the Scope should be Standard.
• Click on OK to define the new variable. Note that logmeasles now appears in the pull-down list of Variables.
• Under the Variables section of the Analysis commands, select Assign.

Histograms
A histogram is a graph of the frequency distribution of a continuous variable, based on class intervals. It uses adjoining columns to represent the number of observations for each class interval in the distribution. The area of each column is proportional to the number of observations in that interval. Figures 4.7a and 4.7b show two versions of a histogram of a frequency distribution with equal class intervals. Since all class intervals are equal in this histogram, the height of each column is in proportion to the number of observations it depicts. (Types of variables and class intervals are discussed in Lesson 2.)

Figures 4.7a, 4.7b, and 4.7c are examples of a particular type of histogram that is commonly used in field epidemiology: the epidemic curve. An epidemic curve is a histogram that displays the number of cases of disease during an outbreak or epidemic by time of onset. The y-axis represents the number of cases; the x-axis represents date and/or time of onset of illness. Figure 4.7a is a perfectly acceptable epidemic curve, but some epidemiologists prefer drawing the histogram as stacks of squares, with each square representing one case (Figure 4.7b). Additional information may be added to the histogram. The rendition of the epidemic curve shown in Figure 4.7c shades the individual boxes in each time period to denote which cases have been confirmed with culture results. Other information, such as gender or presence of a related risk factor, could be portrayed in this fashion.
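In a histogram the area of each column, not only its height, represents the count, so column heights can be computed as the count divided by the interval width. This sketch uses hypothetical counts and a hypothetical helper name. With equal intervals, the heights stay proportional to the counts; a wider interval gets a proportionally lower column.

```python
from typing import List, Tuple


def column_heights(boundaries: List[Tuple[float, float]], counts: List[int]) -> List[float]:
    """Heights that make each column's area (height x width) equal to its count."""
    return [n / (high - low) for (low, high), n in zip(boundaries, counts)]


# Hypothetical counts in three equal 12-hour class intervals: heights follow the counts.
print(column_heights([(0, 12), (12, 24), (24, 36)], [3, 9, 6]))   # [0.25, 0.75, 0.5]

# If two intervals were merged into one 24-hour interval, its column must be
# drawn lower than the raw count suggests, so that area still tracks the count.
print(column_heights([(0, 12), (12, 36)], [3, 15]))               # [0.25, 0.625]
```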

Conventionally, the numbers on the x-axis are centered between the tick marks of the appropriate interval. The interval of time should be appropriate for the disease in question, the duration of the outbreak, and the purpose of the graph. If the purpose is to show the temporal relationship between time of exposure and onset of disease, then a widely accepted rule of thumb is to use intervals approximately one-fourth (or between one-eighth and one-third) of the incubation period of the disease shown. The incubation period for salmonellosis is usually 12–36 hours, so the x-axis of the epidemic curve in Figure 4.7a has 12-hour intervals.
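The rule of thumb for choosing the time interval, and the counting that produces an epidemic curve, can be sketched as follows. The onset times are hypothetical, and the start of the x-axis is assumed to be the exposure time. The helper names are not from any epidemiology package. For a 36-hour incubation period, the upper end of the usual range for salmonellosis, one-third of the period is 12 hours.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import List


def interval_widths(incubation_hours: float):
    """Rule-of-thumb x-axis intervals: 1/8, 1/4 (typical), and 1/3 of the incubation period."""
    return incubation_hours / 8, incubation_hours / 4, incubation_hours / 3


def epi_curve_counts(onsets: List[datetime], start: datetime, width_hours: float) -> List[int]:
    """Number of cases whose onset falls in each interval of width_hours, beginning at start."""
    width = timedelta(hours=width_hours)
    bins = Counter((onset - start) // width for onset in onsets)   # interval index per case
    if min(bins) < 0:
        raise ValueError("an onset time falls before the chosen start of the x-axis")
    return [bins.get(i, 0) for i in range(max(bins) + 1)]


print(interval_widths(36))    # (4.5, 9.0, 12.0)

# Hypothetical onset times, in hours after an assumed common exposure.
exposure = datetime(2024, 1, 1, 0, 0)
onsets = [exposure + timedelta(hours=h) for h in (14, 18, 20, 23, 26, 27, 30, 31, 33, 38, 44, 52)]
print(epi_curve_counts(onsets, exposure, 12))   # [0, 4, 5, 2, 1]
```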
Figure 4.7a Number of Cases of Salmonella Enteritidis Among Party Attendees by Date and Time of Onset — Chicago, Illinois, February 2000

Source: Cortese M, Gerber S, Jones E, Fernandez J. A Salmonella Enteritidis outbreak in Chicago. Presented at the Eastern Regional Epidemic Intelligence Service Conference, March 23, 2000, Boston, Massachusetts.

Figure 4.7b Number of Cases of Salmonella Enteritidis Among Party Attendees by Date and Time of Onset — Chicago, Illinois, February 2000

Source: Cortese M, Gerber S, Jones E, Fernandez J. A Salmonella Enteritidis outbreak in Chicago. Presented at the Eastern Regional Epidemic Intelligence Service Conference, March 23, 2000, Boston, Massachusetts.

The most common choice for the x-axis variable in field epidemiology is calendar time, as shown in Figures 4.7a–c. However, age, cholesterol level, or another continuous-scale variable may be used on the x-axis of a histogram.

Figure 4.7c Number of Cases of Salmonella Enteritidis Among Party Attendees by Date and Time of Onset — Chicago, Illinois, February 2000

Source: Cortese M, Gerber S, Jones E, Fernandez J. A Salmonella Enteritidis outbreak in Chicago. Presented at the Eastern Regional Epidemic Intelligence Service Conference, March 23, 2000, Boston, Massachusetts.

In Figure 4.8, which shows a frequency distribution of adults with diagnosed diabetes in the United States, the x-axis displays a measure of body mass: weight (in kilograms) divided by height (in meters) squared. The choice of variable for the x-axis of a histogram clearly depends on the point of the display. Figures 4.7a, 4.7b, and 4.7c are constructed to show the natural course of the epidemic over time; Figure 4.8 conveys the burden of the problem of overweight and obesity.
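The body mass measure on the x-axis of Figure 4.8 is a simple formula, and a histogram of it needs class intervals like any other continuous variable. This sketch computes body mass index for a few hypothetical adults and assigns each to an equal-width interval of 5 units. The interval width and helper names are assumptions for illustration, not the intervals used in Figure 4.8.

```python
import math
from collections import Counter


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by the square of height in meters."""
    return weight_kg / height_m ** 2


def bmi_class(value: float, width: float = 5.0) -> str:
    """Label of the equal-width class interval (e.g. 20.0-24.9) that contains a BMI value."""
    lower = math.floor(value / width) * width
    return f"{lower:.1f}-{lower + width - 0.1:.1f}"


print(round(bmi(70, 1.75), 1))     # 22.9

# Hypothetical adults as (weight in kg, height in m).
adults = [(70, 1.75), (95, 1.80), (82, 1.68), (60, 1.62), (110, 1.75)]
print(Counter(bmi_class(bmi(w, h)) for w, h in adults))
```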
Figure 4.8 Distribution of Body Mass Index Among Adults with
Diagnosed Diabetes — United States, 1999–2002

Data Source: Centers for Disease Control and Prevention. Prevalence of overweight and
obesity among adults with diagnosed diabetes–United States, 1988-1994 and 1999-2002.
MMWR 2004;53:1066–8.

When the columns of a histogram are divided into subgroups (components), the component of most interest should always be put at the bottom, because the upper components have jagged baselines that may make comparison difficult. Consider the data on pneumoconiosis in Figure 4.9a. The graph clearly displays a gradual decline in deaths from all pneumoconiosis between 1972 and 1999. Deaths from asbestosis (the top subgroup in Figure 4.9a) appear to have gone against the overall trend by increasing over the same period, but this is hard to judge against a jagged baseline. Figure 4.9b makes this point more clearly by placing asbestosis along the baseline.

Figure 4.9a Number of Deaths with Any Death Certificate Mention of
Asbestosis, Coal Worker’s Pneumoconiosis (CWP), Silicosis, and
Unspecified/Other Pneumoconiosis Among Persons Aged > 15 Years,
by Year — United States, 1968–2000

Adapted from: Centers for Disease Control and Prevention. Changing patterns of
pneumoconiosis mortality–United States, 1968-2000. MMWR 2004;53:627–31.

Figure 4.9b Number of Deaths with Any Death Certificate Mention of
Asbestosis, Coal Worker’s Pneumoconiosis (CWP), Silicosis, and
Unspecified/Other Pneumoconiosis Among Persons Aged > 15 Years,
by Year — United States, 1968–2000

Data Source: Centers for Disease Control and Prevention. Changing patterns of
pneumoconiosis mortality–United States, 1968-2000. MMWR 2004;53:627–31.

Some histograms, particularly those that are drawn as stacks of
squares, include a box that indicates how many cases are
represented by each square. While a square usually represents one
case in a relatively small outbreak, a square may represent five or
ten cases in a relatively large outbreak.

Epidemic curves are discussed in more detail in Lesson 6.

Exercise 4.4
Using the botulism data presented in Exercise 4.1, draw an epidemic curve.
Then use this epidemic curve to describe this outbreak as if you were
speaking over the telephone to someone who cannot see the graph. Graph
paper is provided at the end of this lesson.

Check your answers on page 4-76

Population pyramid
A population pyramid displays the count or percentage of a
population by age and sex. It does so by using two histograms —
most often one for females and one for males, each by age group
— turned sideways so the bars are horizontal, and placed base to
base (Figures 4.10 and 4.11). Notice the overall pyramidal shape of
the population distribution of a developing country with many
births, relatively high infant mortality, and relatively low life
expectancy (Figure 4.10). Compare that with the shape of the
population distribution of a more developed country with fewer
births, lower infant mortality, and higher life expectancy (Figure
4.11).
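The Python sketch below shows one way to prepare pyramid data so that the two sideways histograms sit base to base. It assumes a common convention: each bar is a percentage of the total population, and male values are made negative so a plotting tool draws them to the left of a shared zero line. Some pyramids plot counts or percentages within each sex instead. The age groups and counts are hypothetical.

```python
def pyramid_bars(males, females):
    """Signed percentages of the total population for each age group.

    Male values are negative so that a plotting tool can draw them to the left of
    a shared zero line, with female values to the right (bars placed base to base).
    """
    total = sum(males.values()) + sum(females.values())
    return {age: (-100 * males[age] / total, 100 * females[age] / total)
            for age in males}


# Hypothetical counts (thousands) by age group; not the Zambia or Sweden data.
males = {"0-14": 400, "15-64": 500, "65+": 100}
females = {"0-14": 380, "15-64": 500, "65+": 120}
for age, (m, f) in pyramid_bars(males, females).items():
    print(f"{age:>5}  males {m:6.1f}%  females {f:5.1f}%")
```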
Figure 4.10 Population Distribution of Zambia by Age and Sex, 2000

Source: U.S. Census Bureau [Internet]. Washington, DC: IDB Population Pyramids [cited
2004 Sep 10]. Available from: http://www.census.gov/population/international/.

Figure 4.11 Population Distribution of Sweden by Age and Sex, 1997

Source: U.S. Census Bureau [Internet]. Washington, DC: IDB Population Pyramids [cited
2004 Sep 10]. Available from: http://www.census.gov/population/international/.

While population pyramids are used most often to display the
distribution of a national population, they can also be used to
display other data, such as a disease or a health characteristic, by
age and sex. For example, smoking prevalence by age and sex is
shown in Figure 4.12. This pyramid clearly shows that, at every
age, females are less likely to be current smokers than males.
Figure 4.12 Percentage of Persons >18 Years Who Were Current
Smokers,* by Age and Sex — United States, 2002

* Answered “yes” to both questions: “Do you now smoke cigarettes every day or some days?”
and “Have you smoked at least 100 cigarettes in your entire life?”

Data Source: Centers for Disease Control and Prevention. Cigarette smoking among adults–
United States, 2002. MMWR 2004;53:427–31.

Frequency polygons
A frequency polygon, like a histogram, is the graph of a frequency
distribution. In a frequency polygon, the number of observations
within an interval is marked with a single point placed at the
midpoint of the interval. Each point is then connected to the next
with a straight line. Figure 4.13 shows an example of a frequency
polygon over the outline of a histogram for the same data. This
graph makes it easy to identify the peak of the epidemic (4 weeks).
Figure 4.13 Comparison of Frequency Polygon and Histogram

A frequency polygon contains the same area under the line as does
a histogram of the same data. Indeed, the data that were displayed
as a histogram in Figure 4.9a are displayed as a frequency polygon
in Figure 4.14.
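The construction rule (one point at each interval midpoint, closed at both ends) and the equal-area property can be checked with a minimal Python sketch. It assumes equal-width class intervals and hypothetical weekly counts. The polygon is closed by adding a zero point one interval beyond each end, and the trapezoidal rule then gives exactly the histogram's area.

```python
def frequency_polygon(lower_bounds, counts, width):
    """Vertices (x, y) of a frequency polygon for equal-width class intervals.

    Each count is plotted at the midpoint of its interval, and a zero point is
    added one interval beyond each end so that the polygon is closed.
    """
    mids = [lb + width / 2 for lb in lower_bounds]
    xs = [mids[0] - width] + mids + [mids[-1] + width]
    ys = [0] + list(counts) + [0]
    return list(zip(xs, ys))


def area_under(points):
    """Area under a polyline, computed with the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical weekly case counts; each class interval is 1 week wide.
week_starts = [1, 2, 3, 4, 5, 6]
cases = [2, 5, 9, 14, 6, 1]
polygon = frequency_polygon(week_starts, cases, width=1)
histogram_area = sum(c * 1 for c in cases)  # bar height x bar width
print(area_under(polygon), histogram_area)  # 37.0 37
```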
Figure 4.14 Number of Deaths with Any Death Certificate Mention of
Asbestosis, Coal Worker’s Pneumoconiosis (CWP), Silicosis, and
Unspecified/Other Pneumoconiosis Among Persons Aged > 15 Years,
by Year — United States, 1968–2000

Data Source: Centers for Disease Control and Prevention. Changing patterns of
pneumoconiosis mortality–United States, 1968-2000. MMWR 2004;53:627–31.

A frequency polygon differs from an arithmetic-scale line graph in
several ways. A frequency polygon (or histogram) is used to
display the entire frequency distribution (counts) of a continuous
variable. An arithmetic-scale line graph is used to plot a series of
observed data points (counts or rates), usually over time. A
frequency polygon must be closed at both ends because the area
under the curve is representative of the data; an arithmetic-scale
line graph simply plots the data points. Compare the
pneumoconiosis mortality data displayed as a frequency polygon in
Figure 4.14 and as a line graph in Figure 4.15.
Figure 4.15 Number of Deaths with Any Death Certificate Mention of
Asbestosis, Coal Worker’s Pneumoconiosis (CWP), Silicosis, and
Unspecified/Other Pneumoconiosis Among Persons Aged > 15 Years,
by Year — United States, 1968–2000

Data Source: Centers for Disease Control and Prevention. Changing patterns of
pneumoconiosis mortality–United States, 1968-2000. MMWR 2004;53:627–31.

Exercise 4.5
Consider the epidemic curve constructed for Exercise 4.4. Prepare a
frequency polygon for these same data. Compare the interpretations of the
two graphs.

Check your answers on page 4-76

Cumulative frequency and survival curves
As its name implies, a cumulative frequency curve plots the
cumulative frequency rather than the actual frequency distribution
of a variable. This type of graph is useful for identifying medians,
quartiles, and other percentiles. The x-axis records the class
intervals, while the y-axis shows the cumulative frequency either
on an absolute scale (e.g., number of cases) or, more commonly, as
percentages from 0% to 100%. The median (50% or half-way
point) can be found by drawing a horizontal line from the 50% tick
mark on the y-axis to the cumulative frequency curve, then
drawing a vertical line from that spot down to the x-axis. Figure
4.16 is a cumulative frequency graph showing the number of days
until smallpox vaccination scab separation among persons who had
never received smallpox vaccination previously (primary
vaccinees) and among persons who had been previously vaccinated
(revaccinees). The median number of days until scab separation
was 19 days among revaccinees and 22 days among primary
vaccinees.

Ogive (pronounced O’-jive) is another name for a cumulative
frequency curve. Ogive also means the diagonal rib of a Gothic
vault, a pointed arc, or the curved area making up the nose of a
projectile.
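The graphical way of reading a median (across from 50%, then down to the x-axis) can be mimicked in Python. This is a minimal sketch under these assumptions: the curve starts at 0% at the lower bound of the first class interval, each later point sits at an interval's upper bound, and straight lines connect the points. The 2-day interval counts are hypothetical and are not the West Virginia data.

```python
from itertools import accumulate


def ogive_points(first_lower, upper_bounds, counts):
    """Points of a cumulative frequency curve, with y as a percentage (0-100).

    The curve starts at 0% at the lower bound of the first class interval, and
    each later point sits at the upper bound of an interval.
    """
    total = sum(counts)
    ys = [0.0] + [100 * c / total for c in accumulate(counts)]
    xs = [first_lower] + list(upper_bounds)
    return list(zip(xs, ys))


def read_percentile(points, p):
    """Go across from p% on the y-axis to the curve, then down to the x-axis.

    Uses linear interpolation between the plotted points.
    """
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        if y1 < p <= y2:
            return x1 + (p - y1) * (x2 - x1) / (y2 - y1)
    raise ValueError("p must be greater than 0 and at most 100")


# Hypothetical counts of scab separation in 2-day intervals from day 14 to day 26.
curve = ogive_points(14, [16, 18, 20, 22, 24, 26], [2, 4, 8, 10, 4, 2])
print(round(read_percentile(curve, 50), 1))  # median: 20.2
print(round(read_percentile(curve, 25), 1))  # first quartile
```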
Figure 4.16 Days to Smallpox Vaccination Scab Separation Among
Primary Vaccinees (n=29) and Revaccinees (n=328) — West Virginia,
2003

Source: Kaydos-Daniels S, Bixler D, Colsher P, Haddy L. Symptoms following smallpox
vaccination–West Virginia, 2003. Presented at 53rd Annual Epidemic Intelligence Service
Conference, April 19-23, 2004, Atlanta, Georgia.

A survival curve can be used with follow-up studies to display the
proportion of one or more groups still alive at different time
periods. Similar to the axes of the cumulative frequency curve, the
x-axis records the time periods, and the y-axis shows percentages,
from 0% to 100%, still alive.

The most striking difference is in the plotted curves themselves.
While a cumulative frequency curve starts at zero in the lower left
corner of the graph and approaches 100% in the upper right corner,
a survival curve begins at 100% in the upper left corner and
proceeds toward the lower right corner as members of the group
die. The survival curve in Figure 4.17 shows the difference in
survival in the early 1900s, mid-1900s, and late 1900s. The
survival curve for 1900–1902 shows a rapid decline in survival
during the first few years of life, followed by a relatively steady
decline. In contrast, the curve for 1949–1951 is shifted right,
showing substantially better survival among the young. The curve
for 1997 shows improved survival among the older population.

Kaplan-Meier is a well-accepted method for estimating survival
probabilities.14
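Figure 4.17 is built from life tables. The sketch below illustrates the Kaplan-Meier (product-limit) method mentioned above. At each time a death occurs, the running survival estimate is multiplied by the proportion of persons at risk who survive that time. Persons censored (lost to follow-up or still alive at the end) leave the risk set without counting as deaths. The follow-up data are hypothetical. Counting persons censored at time t as still at risk at t is a common convention, assumed here.

```python
from collections import Counter


def kaplan_meier(times, died):
    """Kaplan-Meier (product-limit) estimate of the survival function.

    times -- follow-up time for each person
    died  -- True if the person died at that time, False if censored
             (still alive when last observed)
    Returns (time, estimated proportion surviving) at each time a death occurs.
    """
    deaths = Counter(t for t, d in zip(times, died) if d)
    leaving = Counter(times)  # deaths plus censored persons at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        if deaths[t]:
            # Conditional probability of surviving t, given alive just before t.
            survival *= 1 - deaths[t] / at_risk
            curve.append((t, survival))
        # Persons censored at t are counted as at risk at t, then removed.
        at_risk -= leaving[t]
    return curve


# Hypothetical follow-up times (years) for 8 persons.
times = [2, 3, 3, 5, 6, 8, 9, 12]
died = [True, True, False, True, False, True, False, True]
for t, s in kaplan_meier(times, died):
    print(t, round(s, 3))
```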

Figure 4.17 Percent Surviving by Age in Death-registration States,
1900–1902 and United States, 1949–1951 and 1997

Source: Anderson RN. United States life tables, 1997. National vital statistics reports; vol
47, no. 28. Hyattsville, Maryland: National Center for Health Statistics, 1999.

Note that the smallpox scab separation data plotted as a cumulative
frequency graph in Figure 4.16 can also be plotted as a smallpox scab
“survival” curve, as shown in Figure 4.18.
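When every person is followed until the event occurs (here, every scab is observed until it separates), each point on the "survival" curve is simply 100% minus the cumulative percentage at that time. The minimal Python sketch below shows this under that no-censoring assumption, using hypothetical percentages. With censored follow-up, a method such as Kaplan-Meier is needed instead.

```python
def survival_from_cumulative(cum_pct):
    """Convert cumulative percentages (rising from 0 to 100) into percentages remaining."""
    return [100 - p for p in cum_pct]


# Hypothetical cumulative percentage of scabs separated by each day of follow-up.
separated_pct = [0, 10, 35, 60, 90, 100]
print(survival_from_cumulative(separated_pct))  # [100, 90, 65, 40, 10, 0]
```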

Figure 4.18 “Survival” of Smallpox Vaccination Scabs Among Primary
Vaccinees (n=29) and Revaccinees (n=328) — West Virginia, 2003

Source: Kaydos-Daniels S, Bixler D, Colsher P, Haddy L. Symptoms following smallpox
vaccination–West Virginia, 2003. Presented at 53rd Annual Epidemic Intelligence Service
Conference, April 19-23, 2004, Atlanta, Georgia.
Other Data Displays
Thus far in this lesson, we have covered the most common ways
that epidemiologists and other public health analysts display data
in tables and graphs. We now cover some additional graphical
techniques that are useful in specific situations. While you may not
find yourself constructing these figures often, our objective is to
equip you to properly interpret these displays when you encounter
them.

Scatter diagrams
A scatter diagram (or “scattergram”) is a graph that portrays the
relationship between two continuous variables, with the x-axis
representing one variable and the y-axis representing the other.15
To create a scatter diagram, you must have a pair of values (one for
each variable) for each person, group, country, or other entity in
the data set. A point is placed on the graph where the two values
intersect. For example, demographers
may be interested in the relationship between infant mortality and
total fertility in various nations. Figure 4.19 plots the total fertility
rate (estimated average number of children per woman) by the
infant mortality rate in 194 countries, so this scatter diagram has
194 data points.

To interpret a scatter diagram, look at the overall pattern made by
the plotted points. A fairly compact pattern of points from the
lower left to the upper right indicates a positive correlation, in
which one variable increases as the other increases. A compact
pattern from the upper left to the lower right indicates a negative or
inverse correlation, in which one variable decreases as the other
increases. Widely scattered points or a relatively flat pattern
indicate little correlation. The data in Figure 4.19 seem to show a
positive correlation between infant mortality and total fertility; that
is, countries with high infant mortality seem to have high total
fertility as well. Statistical tools such as linear regression can be
applied to such data to quantify the relationship between the
variables in a scatter diagram. Scatter diagrams often display
correlations that may provoke intriguing hypotheses about causal
relationships, but additional investigation is almost always needed
before any causal hypothesis should be accepted.
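The quantities mentioned above can be computed with a short Python sketch. It computes an ordinary least-squares regression line and, added here as a companion measure, the Pearson correlation coefficient (r). An r near +1 matches a compact lower-left to upper-right pattern, near -1 an inverse pattern, and near 0 little linear correlation. The data pairs are hypothetical, not the 194-nation data in Figure 4.19. As the text cautions, neither number establishes causation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares regression of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


# Hypothetical (infant mortality per 1,000 live births, total fertility rate) pairs.
imr = [5, 10, 20, 40, 60, 80]
tfr = [1.6, 1.9, 2.4, 3.5, 4.6, 5.5]
print(round(pearson_r(imr, tfr), 3))
print(least_squares_line(imr, tfr))
```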

Figure 4.19 Correlation of Infant Mortality Rate and Total Fertility Rate
Among 194 Nations, 1997

Data Source: Population Reference Bureau [Internet]. Datafinder [cited 2004 Dec 13].
Available from: http://www.prb.org/datafind/datafinder7.htm.

Bar charts
A bar chart uses bars of equal width to display comparative data.
Comparison of categories is based on the fact that the length of the
bar is proportional to the frequency of the event in that category.
Therefore, scale breaks should not be used in bar charts, because
they could cause the data to be misinterpreted. Bars for different
categories are separated by spaces (unlike the bars in a
histogram). The bars can be drawn either vertically or
horizontally. (This choice is usually based on the length of the
text labels: long labels fit better on a horizontal chart than on a
vertical one.) The bars are usually arranged in ascending or
descending length, or in some other systematic order dictated by
any intrinsic order of the categories. Appropriate data for bar
charts include discrete data (e.g., race or cause of death) or
variables treated as though they were discrete (e.g., age groups).
(Recall that a histogram shows the frequency of a continuous
variable, such as dates of onset of symptoms.)

More About Constructing Bar Charts

• Arrange the categories that define the bars or groups of bars in a natural order, such as alphabetical or
increasing age, or in an order that will produce increasing or decreasing bar lengths.
• Choose whether to display the bars vertically or horizontally.
• Make all of the bars the same width.
• Make the length of bars in proportion to the frequency of the event. Do not use a scale break, because the
reader could easily misinterpret the relative size of different categories.
• Show no more than five bars within a group of bars, if possible.
• Leave a space between adjacent groups of bars but not between bars within a group (see Figure 4.22).
• Within a group, code different variables by differences in bar color, shading, cross hatching, etc. and include a
legend that interprets your code.

The simplest bar chart is used to display the data from a
one-variable table (see page 4-4). Figure 4.20 shows the number of
deaths among persons ages 25–34 years for the six most common
causes, plus all other causes grouped together, in the United States
in 2003. Note that this bar chart is oriented horizontally to allow
for long labels.
Figure 4.20 Number of Deaths by Cause Among 25–34 Year Olds —
United States, 2003

Data Source: Web-based Injury Statistics Query and Reporting System (WISQARS) [online
database] Atlanta; National Center for Injury Prevention and Control. [cited 2006 Feb 15].
Available from: http://www.cdc.gov/injury/wisqars/.

Grouped bar charts
A grouped bar chart is used to illustrate data from two-variable or
three-variable tables. A grouped bar chart is particularly useful
when you want to compare the subgroups within a group. Bars
within a group are adjoining. The bars should be illustrated
distinctively and described in a legend. For example, consider the
data for Figure 4.12 (current smokers by age and sex). In Figure
4.21, each bar grouping represents an age group. Within the group,
separate bars are used to represent data for males and females. This
shows graphically that regardless of age, men are more likely to be
current smokers than are women, but that the difference declines
with age.

Figure 4.21 Percentage of Persons Aged >18 Years Who Were Current
Smokers, by Age and Sex — United States, 2002

Data Source: Centers for Disease Control and Prevention. Cigarette smoking among adults–
United States, 2002. MMWR 2004;53:427–31.

The bar chart in Figure 4.22a shows the leading causes of death in
1997 and 2003 among persons ages 25–34 years. The graph is
more effective at showing the differences in causes of death during
the same year than in showing differences in a single cause
between years. While the decline in deaths due to HIV infection
between 1997 and 2003 is quite apparent, the smaller drop in heart
disease is more difficult to see. If the goal of the figure is to
compare specific causes between the two years, the bar chart in
Figure 4.22b is a better choice.

Figure 4.22a Number of Deaths by Cause Among 25–34 Year Olds — United States, 1997 and 2003

Data Source: Web-based Injury Statistics Query and Reporting System (WISQARS) [online database] Atlanta; National Center for
Injury Prevention and Control. [cited 2006 Feb 15]. Available from: http://www.cdc.gov/injury/wisqars/.

Figure 4.22b Number of Deaths by Cause Among 25–34 Year Olds — United States, 1997 and 2003

Data Source: Web-based Injury Statistics Query and Reporting System (WISQARS) [online database] Atlanta; National Center for
Injury Prevention and Control. [cited 2006 Feb 15]. Available from: http://www.cdc.gov/injury/wisqars/.

Stacked bar charts
A stacked bar chart is used to show the same data as a grouped bar
chart but stacks the subgroups of the second variable into a single
bar of the first variable. It deviates from the grouped bar chart in
that the different groups are differentiated not with separate bars,
but with different segments within a single bar for each category.
A stacked bar chart is more effective than a grouped bar chart at
displaying the overall pattern of the first variable but less effective
at displaying the relative size of each subgroup. The trends or
patterns of the subgroups can be difficult to decipher because,
except for the bottom category, the segments do not rest on a flat
baseline.

To see the difference between grouped and stacked bar charts, look
at Figure 4.23. This figure shows the same data as Figures 4.22a
and 4.22b. With the stacked bar chart, you can easily see the
change in the total number of deaths between the two years;
however, it is difficult to see the values of each cause of death. On
the other hand, with the grouped bar chart, you can more easily see
the changes by cause of death.
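Why upper segments are hard to compare becomes clear when each segment's bottom and top are computed, as in the minimal Python sketch below. Only the first subgroup starts at zero. Every other segment starts wherever the segments beneath it end, so its baseline shifts from bar to bar, while the top of the last segment gives the easy-to-read total. The counts are hypothetical and are not the WISQARS data shown in the figures.

```python
def stack_segments(bars):
    """Return (subgroup, bottom, top) for each segment of each stacked bar.

    Only the first subgroup starts at zero; every other segment starts where the
    segment beneath it ends, so its baseline shifts from bar to bar.
    """
    stacked = {}
    for bar, parts in bars.items():
        bottom = 0
        segments = []
        for name, value in parts.items():
            segments.append((name, bottom, bottom + value))
            bottom += value
        stacked[bar] = segments
    return stacked


# Hypothetical death counts (thousands) by cause; not the data in the figures.
deaths = {
    "1997": {"Injury": 12, "HIV": 8, "All other": 20},
    "2003": {"Injury": 13, "HIV": 2, "All other": 21},
}
for year, segments in stack_segments(deaths).items():
    print(year, segments, "total =", segments[-1][2])
```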
Figure 4.23 Number of Deaths by Cause Among 25–34 Year Olds — United States, 1997 and 2003

Data Source: Web-based Injury Statistics Query and Reporting System (WISQARS) [online database] Atlanta; National Center for
Injury Prevention and Control. [cited 2006 Feb 15]. Available from: http://www.cdc.gov/injury/wisqars/.

100% component bar charts
A 100% component bar chart is a variant of a stacked bar chart, in
which all of the bars are pulled to the same height (100%) and
show the components as percentages of the total rather than as
actual values. This type of chart is useful for comparing the
contribution of different subgroups within the categories of the
main variable. Figure 4.24 shows a 100% component bar chart that
compares lengths of hospital stay by age group. The figure clearly
shows that the percentage of people who stay in the hospital for 1
day or less (bottom component) is greatest for children ages 0–4
years, and declines with increasing age. Concomitantly, the
percentage of stays lasting 7 or more days increases with age.
However, because the columns are the same height, you cannot tell
from the columns how many people in each age group were
hospitalized for traumatic brain injury; putting numbers above the
bars to indicate the totals in each age group would solve that problem.
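The rescaling behind a 100% component bar chart can be sketched in Python as follows. Each bar's components are divided by that bar's own total. The total is returned alongside the percentages so it can be printed above the bar, as suggested above. The age groups and lengths of stay are hypothetical, not the 14-state data in Figure 4.24.

```python
def percent_components(bars):
    """Rescale each bar's components to percentages of that bar's own total.

    The total is returned too, so it can be printed above the bar.
    """
    result = {}
    for bar, parts in bars.items():
        total = sum(parts.values())
        result[bar] = ({name: 100 * v / total for name, v in parts.items()}, total)
    return result


# Hypothetical hospital discharges by length of stay for two age groups.
stays = {
    "0-4 years": {"<=1 day": 60, "2-6 days": 30, ">=7 days": 10},
    "65+ years": {"<=1 day": 30, "2-6 days": 90, ">=7 days": 80},
}
for group, (pct, total) in percent_components(stays).items():
    print(group, pct, f"(n = {total})")
```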
Figure 4.24 Length of Hospital Stay for Traumatic Brain Injury-related
Discharges — 14 States*, 1997

Source: Langlois JA, Kegler SR, Butler JA, Gotsch KE, Johnson RL, Reichard AA, et al.
Traumatic brain injury-related hospital discharges: results from a 14-state surveillance
system. In: Surveillance Summaries, June 27, 2003. MMWR 2003;52(No. SS-04):1–18.

Deviation bar charts
While many bar charts show only positive values, a deviation bar
chart displays both positive and negative changes from a baseline.
(Imagine profit/loss data at different times.) Figure 4.25 shows
such a deviation bar chart of selected reportable diseases in the
United States. A similar chart appears in each issue of CDC’s
Morbidity and Mortality Weekly Report. In this chart, the number
of cases reported during the past 4 weeks is compared to the
average number reported during comparable periods of the past
few years. The deviations to the right for hepatitis B and pertussis
indicate increases over historical levels. The deviations to the left
for measles, rubella, and most of the other diseases indicate
declines in reported cases compared to past levels. In this
particular chart, the x-axis is on a logarithmic scale, so that a 50%
reduction (one-half of the cases) and a doubling (100% increase) of
cases are represented by bars of the same length, though in
opposite directions. Values beyond historical limits (comparable to
95% confidence limits) are highlighted for special attention.
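The arithmetic behind the equal bar lengths can be checked with a small Python sketch. It assumes each bar is the logarithm of the ratio of the current count to a historical mean, and the function name is hypothetical. Base 2 is an arbitrary choice that makes one unit equal a doubling; any base gives halving and doubling bars of equal length.

```python
from math import log2


def deviation_length(current, historical_mean):
    """Signed bar length on a log2 scale: +1.0 for a doubling, -1.0 for a halving.

    Positive values extend to the right (increase), negative to the left (decrease).
    A current count of zero cannot be placed on a logarithmic scale.
    """
    if current <= 0 or historical_mean <= 0:
        raise ValueError("counts must be positive on a logarithmic scale")
    return log2(current / historical_mean)


print(deviation_length(40, 20), deviation_length(10, 20))  # 1.0 -1.0
```

On an arithmetic scale, the same two changes (+20 and -10 cases) would produce bars of unequal length.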
Figure 4.25 Comparison of Current Four-week Totals with Historical
Data for Selected Notifiable Diseases — United States, 4-weeks Ending
December 11, 2004

Source: Centers for Disease Control and Prevention. Figure 1. Selected notifiable disease
reports, United States, comparison of provisional 4-week totals ending December 11, 2004,
with historical data. MMWR 2004;53:1161.

Exercise 4.6
Use the data in Table 4.17 to draw a stacked bar chart, a grouped bar chart,
and a 100% component bar chart to illustrate the differences in the age
distribution of syphilis cases among white males, white females, black
males, and black females. What information is best conveyed by each chart? Graph paper is
provided at the end of this lesson.

Table 4.17 Number of Reported Cases of Primary and Secondary Syphilis, by Age Group, Among Non-
Hispanic Black and White Men and Women — United States, 2002
Age Group (Years)   Black Men   White Men   Black Women   White Women
<20                       804         905           277            50
20–29                     695         914           349            66
30–39                     635         277           396            76
≥40                        92          12           173            25
Data Source: Centers for Disease Control and Prevention. Sexually Transmitted Disease Surveillance 2002. Atlanta: U.S. Department
of Health and Human Services; 2003.

Check your answers on page 4-77

Pie charts
A pie chart is a simple, easily understood chart in which the size of
the “slices” or wedges shows the proportional contribution of each
component part.16 Pie charts are useful for showing the proportions
of a single variable’s frequency distribution. Figure 4.26 shows a
simple pie chart of the leading causes of death in 2003 among
persons aged 25–34 years.

Figure 4.26 Number of Deaths by Cause Among 25–34 Year Olds —
United States, 2003

Data Source: Web-based Injury Statistics Query and Reporting System (WISQARS) [online
database] Atlanta; National Center for Injury Prevention and Control. [cited 2006 Feb 15].
Available from: http://www.cdc.gov/injury/wisqars/.

Pie graphs are used for proportional assessment by comparing data
elements, as percentages or counts, against other data elements and
against the sum of the data elements. Displaying data using a pie
graph is easy using Epi Info.
1. Read (import) the file containing the data.
2. Click on the Graph command under the Statistics folder.
3. Under Graph Type, select the type of graph you would like to
create (Pie).
4. Under 1st Title/2nd Title, write a page title for the pie chart.
5. Select the variable you wish to graph from the X-Axis (Main
variables) drop-down box.
6. Select the value you want to show from the Y-Axis (Shown value
of) drop-down box. Usually you will want to show percentages; if
so, select Count %.
7. Click OK and the pie chart will be displayed.

More About Constructing Pie Charts

• Conventionally, pie charts begin at 12 o’clock.
• The wedges should be labeled and arranged from largest to smallest, proceeding clockwise, although the “other”
or “unknown” category may be placed last.
• Shading may be used to distinguish between slices but is not always necessary.
• Because the eye cannot accurately gauge the area of the slices, the chart should indicate what percentage each
slice represents either inside or near each slice.
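The layout conventions above (start at 12 o'clock, largest to smallest clockwise, catch-all category last, percentage shown for every slice) can be turned into wedge angles with the Python sketch below. Angles are in degrees measured clockwise from 12 o'clock. The counts and the "All other causes" label are hypothetical.

```python
def pie_wedges(counts, other_label="All other causes"):
    """Order wedges largest to smallest, clockwise from 12 o'clock, catch-all last.

    Returns (label, percent, start_deg, end_deg) for each wedge, where angles are
    measured clockwise from 12 o'clock.
    """
    total = sum(counts.values())
    order = sorted((k for k in counts if k != other_label),
                   key=counts.get, reverse=True)
    if other_label in counts:
        order.append(other_label)
    wedges, start = [], 0.0
    for label in order:
        end = start + 360 * counts[label] / total
        wedges.append((label, round(100 * counts[label] / total, 1), start, end))
        start = end
    return wedges


# Hypothetical counts by cause; not the data in Figure 4.26.
counts = {"All other causes": 25, "Cause A": 40, "Cause B": 20, "Cause C": 15}
for wedge in pie_wedges(counts):
    print(wedge)
```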

Given current technology, pie charts are almost always generated
by computer rather than drawn by hand. But the default settings of
many computer programs differ from recommended epidemiologic
practice. Many computer programs allow one or more slices to
“explode” or be pulled out of the pie. In general, this technique
should be limited to situations when you want to place special
emphasis on one wedge, particularly when additional detail is
provided about that wedge (Figure 4.27).

Figure 4.27 Number of Deaths by Cause Among 25–34 Year Olds —
United States, 2003

Data Source: Web-based Injury Statistics Query and Reporting System (WISQARS) [online
database] Atlanta; National Center for Injury Prevention and Control. [cited 2006 Feb 15].
Available from: http://www.cdc.gov/injury/wisqars/.

Multiple pie charts are occasionally used in place of a 100%
component bar chart, that is, to display differences in proportional
distributions. In some figures the size of each pie is proportional to
the number of observations, but in others the pies are the same size
despite representing different numbers of observations (Figures
4.28a and 4.28b).
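When pies are sized in proportion to the number of observations, it is usually the pie's area, not its radius, that should be proportional; that is an assumption of the sketch below. Scaling the radius directly would exaggerate differences: four times the observations would give sixteen times the area. The Python sketch scales the radius by the square root of the count ratio. The reference count and radius are hypothetical parameters.

```python
from math import pi, sqrt


def pie_radius(n, n_ref, r_ref=1.0):
    """Radius that makes a pie's AREA proportional to its number of observations."""
    return r_ref * sqrt(n / n_ref)


r_small, r_large = pie_radius(100, 100), pie_radius(400, 100)
print(r_large)  # 2.0
print(pi * r_large ** 2 / (pi * r_small ** 2))  # 4.0
```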

Figure 4.28a Number of Deaths by Cause Among 25–34 and 35–44 Year Olds — United States, 2003

Data Source: Web-based Injury Statistics Query and Reporting System (WISQARS) [online database] Atlanta; National Center for
Injury Prevention and Control. [cited 2006 Feb 15]. Available from: http://www.cdc.gov/injury/wisqars/.

Figure 4.28b Number of Deaths by Cause Among 25–34 and 35–44 Year Olds — United States, 2003

Data Source: Web-based Injury Statistics Query and Reporting System (WISQARS) [online database] Atlanta; National Center for
Injury Prevention and Control. [cited 2006 Feb 15]. Available from: http://www.cdc.gov/injury/wisqars/.

Dot plots and box plots
A dot plot uses dots to show the relationship between a categorical
variable on the x-axis and a continuous variable on the y-axis. A
dot is positioned at the appropriate place for each observation. The
dot plot displays not only the clustering and spread of observations
for each category of the x-axis variable but also differences in the
patterns between categories. In Figure 4.29 the villages using
either antibacterial soap or plain soap have lower incidence rates of
diarrhea than do the control (no soap) villages.17

Figure 4.29 Incidence of Childhood Diarrhea in Each Neighborhood by
Hygiene Intervention Group — Pakistan, 2002–2003

Source: Luby SP, Agboatwalla M, Painter J, Altaf A, Billhimer WL, Hoekstra RM. Effect of
intensive handwashing promotion on childhood diarrhea in high-risk communities in
Pakistan: a randomized controlled trial. JAMA 2004;291:2547–54.

A dot plot shows the relationship between a continuous and a
categorical variable. The same data could also be displayed in a
box plot, in which the data are summarized by using “box-and-
whiskers.” Figure 4.30 is an example of a box plot. The “box”
represents values of the middle 50% (or interquartile range) of the
data points, and the “whiskers” extend to the minimum and
maximum values that the data assume. The median is usually
marked with a horizontal line inside the box. As a result, you can
use a box plot to show and compare the central location (median),
dispersion (interquartile range and range), and skewness (indicated
by a median line not centered in the box, such as for the cases in
Figure 4.30).18
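The five numbers a box plot of this kind displays (minimum, first quartile, median, third quartile, and maximum) can be computed with Python's standard `statistics.quantiles` (Python 3.8 or later). Quartile conventions differ among software packages, so `method="inclusive"` is an assumption. Some programs also end the whiskers at 1.5 × the interquartile range rather than at the minimum and maximum described above. The risk scores are hypothetical.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Five-number summary for a box plot whose whiskers reach the minimum and maximum."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values), "iqr": q3 - q1}


# Hypothetical risk scores for one group.
scores = [2, 3, 3, 4, 5, 6, 9, 12, 15]
summary = box_plot_summary(scores)
print(summary)
# A median closer to q1 than to q3 (an off-center median line) suggests right skew.
print(summary["median"] - summary["q1"], summary["q3"] - summary["median"])  # 2.0 4.0
```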

Figure 4.30 Risk Score for Alveolar Echinococcosis Among Cases and
Controls — Germany, 1999–2000

Adapted from: Kern P, Ammon A, Kron M, Sinn G, Sander S, Petersen LR, et al. Risk factors
for alveolar echinococcosis in humans. Emerg Infect Dis 2004;10:2089-93.

Forest plots
A forest plot, also called a confidence interval plot, is used to
display the point estimates and confidence intervals of individual
studies assembled for a meta-analysis or systematic review.19 In
the forest plot, the variable on the x-axis is the primary outcome
measure from each study (relative risk, treatment effects, etc.). If a
risk ratio, odds ratio, or another ratio measure is used, the x-axis
uses a logarithmic scale. This is because the logarithmic
transformation of these risk estimates has a more symmetric
distribution than do the risk estimates themselves (since the risk
estimates can vary from zero to an arbitrarily large number). Each
study is represented by a horizontal line, reflecting the confidence
interval, and a dot or square, reflecting the point estimate. The size
of the dot or square usually reflects study size or some other aspect
of study design (Figure 4.31). The shorter the horizontal line, the
more precise the study’s estimate. Point estimates (dots or squares)
that line up reasonably well indicate that the studies show a
relatively consistent effect. A vertical line indicates where no effect
(relative risk = 1 or treatment effect = 0) falls on the x-axis. If a
study’s horizontal line does not cross the vertical line, that study’s
result is statistically significant at the level corresponding to the
confidence interval (for example, p < 0.05 for a 95% confidence
interval). From a forest plot, one can easily ascertain patterns
among studies as well as outliers.
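The sketch below shows, for one hypothetical study, how a row of a forest plot is computed for a ratio measure. It takes an odds ratio from a 2x2 table and a confidence interval built on the log scale, using the Woolf method, one common approach chosen here for illustration. Then it checks whether the interval crosses the "no effect" line at 1. Figure 4.31 itself plots a difference measure (net change in GHb), for which the null value is 0 on an arithmetic scale.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval from a 2x2 table.

    a, b -- exposed persons with and without the outcome
    c, d -- unexposed persons with and without the outcome
    z    -- 1.96 gives an approximate 95% confidence interval
    """
    odds_ratio = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = exp(log(odds_ratio) - z * se_log)
    upper = exp(log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def crosses_null(lower, upper, null=1.0):
    """True if the study's horizontal line would cross the vertical 'no effect' line."""
    return lower <= null <= upper


# Hypothetical 2x2 table for one study.
or_, lo, hi = odds_ratio_ci(20, 80, 10, 90)
print(round(or_, 2), round(lo, 2), round(hi, 2), crosses_null(lo, hi))  # 2.25 0.99 5.09 True
```

Because the interval is symmetric around ln(OR), it looks symmetric only when the x-axis is logarithmic, which is why forest plots of ratio measures use that scale.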
Figure 4.31 Net Change in Glycohemoglobin (GHb) Following Self-
management Education Intervention for Adults with Type 2 Diabetes,
by Different Studies and Follow-up Intervals, 1980–1999

Source: Norris SL, Lau J, Smith SJ, Schmid CH, Engelgau MM. Self-management education
for adults with type 2 diabetes. Diabetes Care 2002;25:1159–71.

Phylogenetic trees
A phylogenetic tree, a type of dendrogram, is a branching chart
that indicates the evolutionary lineage or genetic relatedness of
organisms involved in outbreaks of illness. Distance on the tree
reflects genetic differences, so organisms that are close to one
another on the tree are more related than organisms that are further
apart. The phylogenetic tree in Figure 4.32 shows that the
organisms isolated from patients with restaurant-associated
hepatitis A in Georgia and North Carolina were identical and
closely related to those from patients in Tennessee.20 Furthermore,
these organisms were similar to those typically seen in patients
from Mexico. These microbiologic data supported epidemiologic
data that implicated green onions from Mexico.
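Distance-based tree-building methods (for example, UPGMA or neighbor-joining) start from a table of pairwise genetic distances between isolates. The Python sketch below computes only the simplest such distance, the uncorrected p-distance: the proportion of aligned positions at which two sequences differ. Published analyses usually use evolutionary models that correct for multiple substitutions. The sequence fragments and isolate names are hypothetical, not the hepatitis A sequences in Figure 4.32.

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


# Hypothetical aligned fragments; not the sequences behind Figure 4.32.
isolates = {
    "Outbreak isolate 1": "ACGTTGCA",
    "Outbreak isolate 2": "ACGTTGCA",
    "Related isolate": "ACGTTGCT",
    "Unrelated isolate": "TCGATGGT",
}
for (name1, s1), (name2, s2) in combinations(isolates.items(), 2):
    print(f"{name1} vs {name2}: {p_distance(s1, s2):.3f}")
```

Identical isolates have a distance of 0 and would sit together on the tree; larger distances place isolates farther apart.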

Figure 4.32 Comparison of Genetic Sequences of Hepatitis A Virus
Isolates from Outbreaks in Georgia, North Carolina, and Tennessee in
2003 with Isolates from National Surveillance

Source: Amon JJ, Devasia R, Guoliang X, Vaughan G, Gabel J, MacDonald P, et al. Multiple
hepatitis A outbreaks associated with green onions among restaurant patrons–Tennessee,
Georgia, and North Carolina, 2003. Presented at 53rd Annual Epidemic Intelligence Service
Conference, April 19-23, 2004, Atlanta, Georgia.

Decision trees
A decision tree is a branching chart that represents the logical
sequence or pathway of a clinical or public health decision.21
Decision analysis is a systematic method for making decisions
when outcomes are uncertain. The basic building blocks of a
decision analysis are (1) decisions, (2) outcomes, and (3)
probabilities.

A decision is a choice made by a person, group, or organization to
select a course of action from among a set of mutually exclusive
alternatives. The decision maker compares expected outcomes of
available alternatives and chooses the best among them. In a
decision-tree diagram, this choice is represented by a decision node
(a square), with branches representing the choices (see Figure
4.33). For example, after receiving information that a person has a
family history of a disease (colorectal cancer in this example), that
person may decide to seek medical advice or choose not to do so.

Outcomes are the chance events that occur in response to a
decision. Outcomes can be intermediate or final. Intermediate
outcomes are followed by more decisions or chance events. For
example, if a person decides to seek medical care for colorectal
cancer screening, then, depending on the findings (outcomes) of the
screening, his or her physician may advise dietary changes, more
frequent screening, a combination of the two, or treatment. From the
person’s perspective, this is a chance outcome; from a health-care
provider’s perspective, it is a decision. Whether an outcome is
intermediate or final may depend on the context of the decision
problem. For example, colorectal cancer screening may be the final
outcome in a decision analysis focusing on colorectal cancer as the
health condition of interest, but it may be an intermediate outcome
in a decision analysis focusing on more invasive cancer treatment.
In a decision tree, outcomes follow a chance node, drawn as a circle,
with branches representing the different outcomes that can occur by
chance, one and only one of which occurs.

Each chance outcome has a probability of occurring, which is
written below its branch in a decision-tree diagram. The
probabilities of all outcomes that can occur at a chance node sum to
one. The building blocks of decision analysis (decisions,
outcomes, and probabilities) can be used to represent and
examine complex decision problems.
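The calculation implied by these building blocks can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis behind Figure 4.33. Each final outcome carries a hypothetical numeric value (for example, a utility between 0 and 1, where higher is better). A chance node is worth the probability-weighted average of its branches, and a decision node is worth its best branch. The class names `Outcome`, `Chance`, and `Decision`, and all probabilities and values, are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class Outcome:
    """Final outcome with a hypothetical value (e.g., a utility; higher is better)."""
    value: float


@dataclass
class Chance:
    """Chance node (circle): exactly one branch occurs; each branch is (probability, node)."""
    branches: list


@dataclass
class Decision:
    """Decision node (square): the decision maker chooses one labeled option."""
    options: dict


def expected_value(node) -> float:
    """Work backward from the final outcomes toward the root of the tree."""
    if isinstance(node, Outcome):
        return node.value
    if isinstance(node, Chance):
        total = sum(p for p, _ in node.branches)
        # The probabilities of all outcomes at a chance node must sum to one.
        if not math.isclose(total, 1.0):
            raise ValueError(f"chance-node probabilities sum to {total}, not 1")
        return sum(p * expected_value(child) for p, child in node.branches)
    if isinstance(node, Decision):
        # The decision maker picks the alternative with the best expected value.
        return max(expected_value(child) for child in node.options.values())
    raise TypeError(f"unknown node type: {type(node).__name__}")


def best_option(decision: Decision) -> str:
    """Label of the option with the highest expected value."""
    return max(decision.options, key=lambda label: expected_value(decision.options[label]))


# Hypothetical probabilities and values, for illustration only.
tree = Decision(options={
    "seek medical advice": Chance([(0.9, Outcome(0.95)), (0.1, Outcome(0.80))]),
    "do not seek advice": Chance([(0.85, Outcome(0.97)), (0.15, Outcome(0.60))]),
})

print(best_option(tree))                # seek medical advice
print(round(expected_value(tree), 4))   # 0.935
```

Working backward from the final outcomes to the first decision in this way is often called "rolling back" the tree. The check that chance-node probabilities sum to one enforces the rule stated above.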

Figure 4.33 Decision Tree Comparing Colorectal Screening Current
Practice with a Targeted Family History Strategy

Source: Tyagi A, Morris J. Using decision analytic methods to assess the utility of family
history tools. Am J Prev Med 2003;24:199–207.

Maps
Maps are used to show the geographic location of events or
attributes. Two types of maps commonly used in field
epidemiology are spot maps and area maps. Spot maps use dots or
other symbols to show where each case-patient lived or was
exposed. Figure 4.34 is a spot map of the residences of persons
with West Nile virus encephalitis during the outbreak in the New
York City area in 1999. A spot map is useful for showing the
geographic distribution of cases, but because it does not take the
size of the population at risk into account, a spot map does not
show risk of disease. Even when a spot map shows a large number
of dots in the same area, the risk of acquiring disease may not be
particularly high if that area is densely populated.

EpiMap is an application of Epi Info for creating maps and
overlaying survey data, and is available for download.
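The point that a spot map does not take population size into account can be made concrete with a little arithmetic. The sketch below uses made-up case counts and populations for two hypothetical areas. The area that would show more dots on a spot map can still have the lower rate.

```python
def rate_per_100k(cases: int, population: int) -> float:
    """Cases per 100,000 population at risk."""
    return cases * 100_000 / population


# Hypothetical areas: (number of cases, population at risk).
areas = {
    "densely populated area": (40, 200_000),
    "sparsely populated area": (6, 10_000),
}

for name, (cases, population) in areas.items():
    print(f"{name}: {cases} cases, {rate_per_100k(cases, population):.1f} per 100,000")
```

The densely populated area has more than six times as many cases (40 vs. 6) but one-third the rate (20.0 vs. 60.0 per 100,000).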

More About Constructing Maps

Excellent examples of the use of maps to display public health data are
available in these selected publications:

• Atlas of United States Mortality. U.S. Department of Health and Human
Services, Centers for Disease Control and Prevention, Hyattsville, MD, 1996
(DHHS Publication No. (PHS) 97-1015)

• Atlas of AIDS. Matthew Smallman-Raynor, Andrew Cliff, and Peter Haggett.
Blackwell Publishers, Oxford, UK, 1992

• An Historical Geography of a Major Human Viral Disease: From Global
Expansion to Local Retreat, 1840-1990. Andrew Cliff, Peter Haggett, Matthew
Smallman-Raynor. Blackwell Publishers, Oxford, UK, 1988

Figure 4.34 Laboratory-confirmed Cases of West Nile Virus Disease —
New York City, August–September 1999

Source: Nash D, Mostashari F, Murray K, et al. Recognition of an outbreak of West Nile
Virus disease. Presented at 49th Annual Epidemic Intelligence Service Conference,
April 10–14, 2000, Atlanta, Georgia.

An area map, also called a choropleth map, can be used to show
rates of disease or other health conditions in different areas by
using different shades or colors (Figure 4.35). When choosing
shades or colors for each category, make sure that increasing
intensity of shade or color reflects increasing disease burden. In
Figure 4.35, as mortality rates increase, the shading becomes darker.
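The shading rule can be expressed directly in code. This is a minimal sketch that assumes rates have already been grouped into classes defined by inclusive upper limits (hypothetical values here) and that an 8-bit gray scale is used. Each higher class gets a darker shade, so shading intensity increases with the rate. The function names and the specific gray levels are illustrative choices.

```python
from bisect import bisect_left


def class_index(rate: float, upper_limits: list) -> int:
    """0-based class for a rate; upper_limits are sorted, inclusive upper bounds."""
    k = bisect_left(upper_limits, rate)
    if k == len(upper_limits):
        raise ValueError(f"rate {rate} exceeds the highest class limit")
    return k


def gray_shade(k: int, n_classes: int, lightest: int = 230, darkest: int = 40) -> str:
    """Hex gray color: class 0 is lightest and the highest class is darkest."""
    if n_classes == 1:
        level = darkest
    else:
        level = round(lightest - k * (lightest - darkest) / (n_classes - 1))
    return f"#{level:02x}{level:02x}{level:02x}"


# Hypothetical class limits (per 100,000) for four categories.
limits = [2.0, 4.0, 6.0, 10.0]
for rate in [1.5, 3.2, 5.0, 9.1]:
    k = class_index(rate, limits)
    print(rate, k, gray_shade(k, len(limits)))
```

Using evenly spaced gray levels keeps adjacent categories distinguishable while keeping the ordering of darkness aligned with the ordering of rates.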

Figure 4.35 Mortality Rates (per 100,000) for Asbestosis by State —
United States, 1982–2000

Source: Centers for Disease Control and Prevention. Changing patterns of pneumoconiosis
mortality–United States, 1968-2000. MMWR 2004;53:627–31.

Exercise 4.7
Using the cancer mortality data in Table 4.13, construct an area map based
on dividing the states into four quartiles as follows:

1. Oklahoma through Kentucky
2. Pennsylvania through Missouri
3. Connecticut through Florida
4. Utah through New York

A map of the United States is provided below for your use.

Check your answers on page 4-78


More About Geographic Information Systems (GIS)

A geographic information system is a computer system for the input, editing, storage, retrieval, analysis, synthesis,
and output of location-based information.22 In public health, GIS may be used to examine the geographic distribution
of cases or risk factors, health service availability or utilization, the presence of insect vectors, environmental
factors, and other location-based variables. GIS can be particularly effective when layers of information, or different
types of information about place, are combined to identify or clarify geographic relationships. For example, in
Figure 4.36, human cases of West Nile virus are shown as dots superimposed over areas of high crow mortality within the
Chicago city limits.
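Overlaying case locations on area boundaries, as in Figure 4.36, comes down to asking whether each point falls inside a polygon. A GIS package performs this operation for you. The sketch below shows one common technique, the ray-casting (even-odd) test, using hypothetical coordinates. It assumes planar (projected) x/y coordinates rather than raw latitude and longitude, and points lying exactly on a boundary may be classified either way.

```python
def point_in_polygon(x: float, y: float, polygon: list) -> bool:
    """Ray-casting test: is (x, y) inside the polygon [(x1, y1), (x2, y2), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings of a ray running from the point toward +x.
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical high-mortality area (a square) and case residences.
area = [(0, 0), (10, 0), (10, 10), (0, 10)]
cases = [(5, 5), (12, 3), (1, 9)]
n_inside = sum(point_in_polygon(x, y, area) for x, y in cases)
print(f"{n_inside} of {len(cases)} cases fall inside the area")
```

An odd number of crossings means the point is inside. The same test works for irregular (including concave) boundaries, which is what real area layers usually look like.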

Figure 4.36 High Crow-mortality Areas (HCMAs) and Reported Residences of
A) West Nile Virus (WNV)-infected Case-patients, or B) WNV
Meningoencephalitis Case-patients (WNV Fever Cases Excluded) — Chicago,
Illinois, 2002

Source: Watson JT, Jones RC, Gibbs K, Paul W. Dead crow reports and location of human West
Nile virus cases, Chicago, 2002. Emerg Infect Dis 2004;10:938–40.

Using Computer Technology
Many computer software packages are available to create tables
and graphs. Most of these packages are quite useful, particularly in
allowing the user to redraw a graph with only a few keystrokes.
With these packages, you can quickly and easily draw a
number of graphs of different types and see for yourself which one
best illustrates the point you wish to make when you present your
data.23-28

On the other hand, these packages tend to have default values that
differ from standard epidemiologic practice. Do not let the
software package dictate the appearance of the graph. Remember
the adage: let the computer do the work, but you still must do the
thinking.29 Keep in mind the primary purpose of the graph: to
communicate information to others. For example, many packages
can draw bar charts and pie charts that appear three-dimensional.
Will a three-dimensional chart communicate the information better
than a two-dimensional one?

Many software packages are available for producing all the tables
and charts discussed in this chapter. One particularly helpful
package is R, which is used by universities and is available free of
charge around the world. In addition to graphical techniques, R
provides a wide variety of statistical techniques (including linear
and nonlinear modeling, classical statistical tests, time-series
analysis, classification, and clustering).

Compare and contrast the effectiveness of Figures 4.37a and 4.37b
in communicating information.

Figure 4.37a Past Month Marijuana Use Among Youths Aged 12–17, by
Geographic Region — United States, 2003 and 2004

Data Source: Substance Abuse and Mental Health Services Administration. (2005). Results
from the 2004 National Survey on Drug Use and Health: National Findings (Office of
Applied Studies, NSDUH Series H-28, DHHS Publication No. SMA 05-4062). Rockville, MD.

Figure 4.37b Past Month Marijuana Use Among Youths Aged 12–17, by
Geographic Region — United States, 2003 and 2004

Data Source: Substance Abuse and Mental Health Services Administration. (2005). Results
from the 2004 National Survey on Drug Use and Health: National Findings (Office of
Applied Studies, NSDUH Series H-28, DHHS Publication No. SMA 05-4062). Rockville, MD.

“The problem with presenting information is simple — the world is
high-dimensional, but our displays are not. To address this basic
problem, answer 5 questions:
1. Quantitative thinking comes down to one question: Compared to what?
2. Try very hard to show cause and effect.
3. Don't break up evidence by accidents of means of production.
4. The world is multivariant, so the display should be high-dimensional.
5. The presentation stands and falls on the quality, relevance, and
integrity of the content.”30
- ER Tufte

Most observers and analysts would agree that the three-dimensional
graph does not communicate the information as effectively as the
two-dimensional graph. For example, can you tell at a glance from the
three-dimensional graph that marijuana use declined slightly in the
Northeast in 2004? These differences are more distinct in the
two-dimensional graph.

Similarly, does the three-dimensional pie chart in Figure 4.38a
provide any more information than the two-dimensional chart in
Figure 4.38b? The relative sizes of the components may be
difficult to judge because of the tilting in the three-dimensional
version. From Figure 4.38a, can you tell whether the wedge for
heart disease is larger, smaller, or about the same as the wedge for
malignant neoplasms? Now look at Figure 4.38b. The wedge for
malignant neoplasms is larger.

Remember that communicating the names and relative sizes of the
components (wedges) is the primary purpose of a pie chart. Keep
the number of dimensions as small as possible to clearly convey
the important points, and avoid using gimmicks that do not add
information.
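A pie chart encodes each component as a wedge whose angle is proportional to that component's share of the whole. Tilting the chart into three dimensions distorts exactly that angle. The short sketch below computes the wedge angles directly from hypothetical counts (the cause names are placeholders, not the WISQARS data). Comparing these numbers, or a flat two-dimensional chart drawn from them, avoids the perspective problem.

```python
def wedge_angles(counts: dict) -> dict:
    """Angle in degrees of each pie-chart wedge, proportional to its share of the total."""
    total = sum(counts.values())
    return {name: 360 * n / total for name, n in counts.items()}


# Hypothetical counts for three components.
deaths = {"cause A": 50, "cause B": 30, "cause C": 20}
total = sum(deaths.values())
for name, angle in wedge_angles(deaths).items():
    print(f"{name}: {angle:.0f} degrees ({100 * deaths[name] / total:.0f}%)")
```

Because the angles always sum to 360 degrees, a pie chart can show only the components of a single whole, which is why the selection guide reserves it for that purpose.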

More About Using Color in Graphs

Many people misuse technology in selecting color, particularly for slides that accompany oral presentations.32 If you
use colors, follow these recommendations.

• Select colors so that all components of the graph — title, axes, data plots, and legends — stand out clearly from
the background and each plotted series of data can be distinguished from the others.
• Avoid contrasting red and green, because up to 10% of males in the audience may have some degree of color
blindness.
• Use colors or shades to communicate information, particularly with area maps. For example, for an area map in
which states are divided into four groups according to their rates for a particular disease, use a light color or
shade for the states with the lowest rates and use progressively darker colors or shades for the groups with
progressively higher rates. In this way, the colors or shades contribute directly to the impression you want the
viewer to have about the data.

Figure 4.38a Leading Causes of Death in 25–34 Year Olds —
United States, 2003

Data Source: Web-based Injury Statistics Query and Reporting System (WISQARS) [online
database] Atlanta; National Center for Injury Prevention and Control. [cited 2006 Feb 15].
Available from: http://www.cdc.gov/injury/wisqars/.

Figure 4.38b Leading Causes of Death in 25–34 Year Olds —
United States, 2003

Data Source: Web-based Injury Statistics Query and Reporting System (WISQARS) [online
database] Atlanta; National Center for Injury Prevention and Control. [cited 2006 Feb 15].
Available from: http://www.cdc.gov/injury/wisqars/.
Summary
Much work has been done on other graphical methods of presentation.33 One of the more
creative is the face plot.34 Originally developed by Chernoff,35 face plots provide a way to display n
variables on a two-dimensional surface. For instance, suppose you have collected several variables
(x, y, z, etc.) on each person in a group, and, for purposes of this illustration, suppose each
variable can have one of 10 possible values. We can let x be eyebrow slant, y be eye size, z be
nose length, and so on. Figure 4.39 shows faces produced using 10 characteristics (head
eccentricity, eye size, eye spacing, eye eccentricity, pupil size, eyebrow slant, nose size, mouth
shape, mouth size, and mouth opening), each assigned one of 10 possible values.

Figure 4.39 Example of Face Plot Faces Produced Using 10 Characteristics

Source: Weisstein, Eric W. Chernoff Face. From MathWorld — A Wolfram Web Resource.
http://mathworld.wolfram.com/ChernoffFace.html.
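The first step in building a face plot is to rescale each variable to a small number of levels and assign it to a facial feature. The sketch below does only that step. It assumes 10 levels, as in the example above, and assigns variables to features in the order of the 10 characteristics listed. Any assignment is arbitrary, and drawing the faces themselves would require a plotting library. The variables and values are hypothetical, and the function names are illustrative.

```python
FEATURES = [
    "head eccentricity", "eye size", "eye spacing", "eye eccentricity", "pupil size",
    "eyebrow slant", "nose size", "mouth shape", "mouth size", "mouth opening",
]


def to_levels(values: list, n_levels: int = 10) -> list:
    """Rescale one variable to integer levels 1..n_levels (minimum -> 1, maximum -> n_levels)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1] * len(values)  # no variation: every face gets the same level
    return [1 + min(n_levels - 1, int((v - lo) / (hi - lo) * n_levels)) for v in values]


def face_parameters(records: list) -> list:
    """Map each person's variables (in order) onto facial features; one dict per person."""
    n_vars = len(records[0])
    if n_vars > len(FEATURES):
        raise ValueError(f"at most {len(FEATURES)} variables can be mapped to features")
    columns = list(zip(*records))                       # one tuple per variable
    scaled = [to_levels(list(col)) for col in columns]  # rescale each variable separately
    return [dict(zip(FEATURES, levels)) for levels in zip(*scaled)]


# Hypothetical data: (age, body mass index, systolic blood pressure) for three people.
people = [(30, 22.0, 110), (50, 27.5, 130), (70, 33.0, 150)]
for face in face_parameters(people):
    print(face)
```

Each variable is rescaled separately, so a face reflects where a person falls relative to others in the group, not absolute values.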

To convey the messages of epidemiologic findings, you must first select the best illustration
method. Tables are commonly used to display numbers, rates, proportions, and cumulative
percents. Because tables are intended to communicate information, most tables should have no
more than two variables and no more than eight categories (class intervals) for any variable.
Printed tables should be properly titled, labeled, and referenced; that is, they should be able to
stand alone if separated from the text.

Tables can be used with either nominal or continuous data. Nominal variables such as sex
and state of residence have obvious categories. For continuous variables that do not have obvious
categories, class intervals must be created. For some diseases, standard class intervals for age
have been adopted. Otherwise, a variety of methods are available for establishing reasonable class
intervals. These include class intervals with an equal number of people or observations in each;
class intervals with a constant width; and class intervals based on the mean and standard
deviation.
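Once class intervals have been chosen, by any of these methods, building the frequency distribution means assigning each observation to exactly one interval. This is a minimal sketch assuming integer-valued data and intervals defined by inclusive upper limits. The ages and the function name are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def frequency_table(values: list, upper_limits: list) -> list:
    """Number of values in each class interval.

    upper_limits are sorted, inclusive upper bounds; the last must cover the maximum.
    """
    if max(values) > upper_limits[-1]:
        raise ValueError("the last class interval must include the maximum value")
    # bisect_left finds the first interval whose upper limit is >= the value.
    counts = Counter(bisect_left(upper_limits, v) for v in values)
    return [counts.get(i, 0) for i in range(len(upper_limits))]


# Hypothetical ages (years), grouped as <=9, 10-19, 20-39, 40-59, >=60.
ages = [3, 7, 12, 19, 25, 34, 41, 58, 63, 72, 80]
labels = ["<=9", "10-19", "20-39", "40-59", ">=60"]
for label, n in zip(labels, frequency_table(ages, [9, 19, 39, 59, float("inf")])):
    print(f"{label:>6} {n}")
```

Defining each interval by its upper limit guarantees that the intervals do not overlap and leave no gaps, so every observation is counted once.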

Graphs can visually communicate data rapidly. Arithmetic-scale line graphs have traditionally
been used to show trends in disease rates over time. Semilogarithmic-scale line graphs are
preferred when the disease rates vary over two or more orders of magnitude. Histograms and
frequency polygons are used to display frequency distributions. A special type of histogram
known as an epidemic curve shows the number of cases by time of onset of illness or time of
diagnosis during an epidemic period. The cases may be represented by squares that are stacked to
form the columns of the histogram; the squares may be shaded to distinguish important
characteristics of cases, such as fatal outcome.
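Two quantities behind the choice between arithmetic and semilogarithmic scales can be computed directly: how many orders of magnitude the data span, and whether a constant rate of change gives equal steps. On a logarithmic y-axis, a series that changes by the same percentage each period rises in equal steps and plots as a straight line. On an arithmetic axis, the steps grow. This is a minimal sketch. The rates are hypothetical, and zeros are dropped because they cannot be shown on a log scale.

```python
import math


def orders_of_magnitude(values: list) -> float:
    """log10(max / min) over the positive values; a log scale cannot show zero."""
    positive = [v for v in values if v > 0]
    return math.log10(max(positive) / min(positive))


def steps(values: list, log_scale: bool) -> list:
    """Vertical distance between successive points on an arithmetic or log10 axis."""
    ys = [math.log10(v) for v in values] if log_scale else list(values)
    return [b - a for a, b in zip(ys, ys[1:])]


# Hypothetical rates rising by a constant 50% per year.
rates = [10.0, 15.0, 22.5, 33.75]
print(steps(rates, log_scale=False))  # [5.0, 7.5, 11.25]: steps grow on an arithmetic axis
print([round(s, 4) for s in steps(rates, log_scale=True)])  # equal steps of log10(1.5)
print(orders_of_magnitude([0.5, 5, 50, 500]))
```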

Simple bar charts and pie charts are used to display the frequency distribution of a single
variable. Grouped and stacked bar charts can display two or even three variables.

Spot maps pinpoint the location of each case or event. An area map uses shading or coloring to
show different levels of disease numbers or rates in different areas.

The final pages of this lesson provide guidance in the selection of illustration methods and
construction of tables and graphs. When using each of these methods, it is important to
remember their purpose: to summarize and to communicate. Even the best method must be
constructed properly or the message will be lost. Glitzy and colorful are not necessarily better;
sometimes less is more!

Guide to Selecting a Graph or Chart to Illustrate Epidemiologic Data

Type of Graph or Chart              When to Use

Arithmetic-scale line graph         Show trends in numbers or rates over time
Semilogarithmic-scale line graph    Display rate of change over time; appropriate for values
                                    ranging over more than 2 orders of magnitude
Histogram                           Show frequency distribution of continuous variable; for
                                    example, number of cases during epidemic (epidemic curve)
                                    or over longer period of time
Frequency polygon                   Show frequency distribution of continuous variable,
                                    especially to show components
Cumulative frequency                Display cumulative frequency for continuous variables
Scatter diagram                     Plot association between two variables
Simple bar chart                    Compare size or frequency of different categories of a
                                    single variable
Grouped bar chart                   Compare size or frequency of different categories of
                                    2–4 series of data
Stacked bar chart                   Compare totals and illustrate component parts of the total
                                    among different groups
Deviation bar chart                 Illustrate differences, both positive and negative, from
                                    baseline
100% component bar chart            Compare how components contribute to the whole in
                                    different groups
Pie chart                           Show components of a whole
Spot map                            Show location of cases or events
Area map                            Display events or rates geographically
Box plot                            Visualize statistical characteristics (median, range,
                                    asymmetry) of a variable’s distribution

Guide to Selecting a Method of Illustrating Epidemiologic Data

If data are:                    And these conditions apply:             Then use:

Numbers or rates over time      Numbers, 1 or 2 sets                    Histogram
                                Numbers, 2 or more sets                 Frequency polygon
                                Rates, range of values ≤ 2 orders       Arithmetic-scale line graph
                                  of magnitude
                                Rates, range of values ≥ 2 orders       Semilogarithmic-scale line graph
                                  of magnitude
Continuous data other than      Frequency distribution                  Histogram or frequency polygon
  time series
Data with discrete categories                                           Bar chart or pie chart
Place data                      Numbers, not readily identifiable       Bar chart or pie chart
                                  on map
                                Numbers, readily identifiable on map,   Spot map
                                  specific site important
                                Numbers, readily identifiable on map,   Area map
                                  specific site unimportant
                                Rates                                   Area map
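The guide above is a small decision procedure, and writing it out as a function makes its branches explicit. This is a sketch under stated assumptions, not an official algorithm. The parameter names and the string labels for the kinds of data are hypothetical. Where two rows of the guide overlap (exactly 2 sets; exactly 2 orders of magnitude), the sketch has to pick one, as noted in the comments.

```python
def choose_display(data: str, measure: str = "numbers", n_sets: int = 1,
                   orders_of_magnitude: float = 0.0,
                   identifiable_on_map: bool = True,
                   specific_site_important: bool = True) -> str:
    """Suggest a display following the selection guide.

    data: "time" (numbers or rates over time), "continuous", "categorical", or "place"
    measure: "numbers" or "rates" (used for time and place data)
    """
    if data == "time":
        if measure == "numbers":
            # The guide lists "1 or 2 sets" and "2 or more sets"; this sketch
            # assigns exactly 2 sets to the frequency polygon.
            return "histogram" if n_sets < 2 else "frequency polygon"
        # Both rate rows include exactly 2 orders of magnitude; this sketch
        # follows the graph-selection guide ("more than 2 orders of magnitude").
        if orders_of_magnitude > 2:
            return "semilogarithmic-scale line graph"
        return "arithmetic-scale line graph"
    if data == "continuous":
        return "histogram or frequency polygon"
    if data == "categorical":
        return "bar chart or pie chart"
    if data == "place":
        if measure == "rates":
            return "area map"
        if not identifiable_on_map:
            return "bar chart or pie chart"
        return "spot map" if specific_site_important else "area map"
    raise ValueError(f"unknown kind of data: {data!r}")


print(choose_display("time", measure="rates", orders_of_magnitude=3.0))
print(choose_display("place", measure="numbers", specific_site_important=False))
```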

Checklist for Constructing Printed Tables

1. Title
• Does the table have a title?
• Does the title describe the objective of the data display and its content, including subject, person, place,
and time?
• Is the title preceded by the designation “Table #”? (“Table” is used for typed text; “Figure” is used for
graphs, maps, and illustrations. Separate numerical sequences are used for tables and figures in the same
document, e.g., Table 4.1, Table 4.2; Figure 4.1, Figure 4.2.)

2. Rows and Columns
• Is each row and column labeled clearly and concisely?
• Are the specific units of measurement shown? (e.g., years, mg/dl, rate per 100,000)
• Are the categories appropriate for the data?
• Are the row and column totals provided?

3. Footnotes
• Are all codes, abbreviations, or symbols explained?
• Are all exclusions noted?
• If the data are not original, is the source provided?
• If the source is a website, is the complete address specified, is it current and active, and is the reference date cited?

Checklist for Constructing Printed Graphs

1. Title
• Does the graph or chart have a title?
• Does the title describe the content, including subject, person, place, and time?
• Is the title preceded by the designation “Figure #”? (“Table” is used for typed text; “Figure” is used for
graphs, charts, maps, and illustrations. Separate numerical sequences are used for tables and figures in the
same document, e.g., Table 1, Table 2; Figure 1, Figure 2.)

2. Axes
• Is each axis labeled clearly and concisely?
• Are the specific units of measurement included as part of the label? (e.g., years, mg/dl, rate per 100,000)
• Are the scale divisions on the axes clearly indicated?
• Are the scales for each axis appropriate for the data?
• Does the y axis start at zero?
• If a scale break is used with an arithmetic-scale line graph, is it clearly identified?
• Has a scale break been used with a histogram, frequency polygon, or bar chart? (Answer should be NO!)
• Are the axes drawn heavier than the other coordinate lines?
• If two or more graphs are to be compared directly, are the scales identical?

3. Grid Lines
• Does the figure include only as many grid lines as are necessary to guide the eye? (Often, these are
unnecessary.)

4. Data plots
• Are the plots drawn clearly?
• Are the data lines drawn more heavily than the grid lines?
• If more than one series of data or components is shown, are they clearly distinguishable on the graph?
• Is each series or component labeled on the graph, or in a legend or key?
• If color or shading is used on an area map, does an increase in color or shading correspond to an increase
in the variable being shown?
• Is the main point of the graph obvious, and is it the point you wish to make?

5. Footnotes
• Are all codes, abbreviations, or symbols explained?
• Are all exclusions noted?
• If the data are not original, is the source provided?

6. Visual Display
• Does the figure include any information that is not necessary?
• Is the figure positioned on the page for optimal readability?
• Do font sizes and colors improve readability?

Guide to Preparing Projected Slides

1. Legibility (make sure your audience can easily read your visuals)
• When projected, can your visuals be read from the farthest parts of the room?

2. Simplicity (keep the message simple)
• Have you used plain words?
• Is the information presented in the language of the audience?
• Have you used only key words?
• Have you omitted conjunctions, prepositions, etc.?
• Is each slide limited to only one major idea/concept/theme?
• Is the text on each slide limited to 2 or 3 colors (e.g., 1 color for title, another for text)?
• Are there no more than 6–8 lines of text and 6–8 words per line?

3. Color
• Colors have an impact on the effect of your visuals. Use warm/hot colors to emphasize, to highlight, to
focus, or to reinforce key concepts. Use cool/cold colors for background or to separate items. The following
table describes the effect of different colors.
          Hot             Warm            Cool            Cold
Colors:   Red             Light orange    Light blue      Dark blue
          Bright orange   Light yellow    Light green     Dark green
          Bright yellow   Light gold      Light purple    Dark purple
          Bright gold     Browns          Light gray      Dark gray
Effect:   Exciting        Mild            Subdued         Somber

• Are you using the best color combinations? The most important item should be in the text color that has
the greatest contrast with its background. The most legible color combinations are:
Black on yellow
Black on white
Dark green on white
Dark blue on white
White on dark blue (yellow titles and white text on a dark blue background is a favorite choice
among epidemiologists)
• Restrict use of red except as an accent.
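To compare candidate color combinations objectively, one widely used measure is the contrast ratio defined in the W3C Web Content Accessibility Guidelines (WCAG 2.x). It ranges from 1:1 (no contrast) to 21:1 (black on white). This measure is a supplementary check and is not part of the guidance above. The sketch assumes 8-bit sRGB colors, and the RGB triples below are the standard values for black, white, and pure yellow.

```python
def relative_luminance(rgb: tuple) -> float:
    """WCAG 2.x relative luminance of an 8-bit sRGB color (0 = black, 1 = white)."""
    def linear(c8: int) -> float:
        c = c8 / 255
        # Undo the sRGB gamma curve for one channel.
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color1: tuple, color2: tuple) -> float:
    """(L_lighter + 0.05) / (L_darker + 0.05); ranges from 1 to 21."""
    lighter, darker = sorted((relative_luminance(color1), relative_luminance(color2)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


BLACK, WHITE, YELLOW = (0, 0, 0), (255, 255, 255), (255, 255, 0)
print(f"black on white:  {contrast_ratio(BLACK, WHITE):.1f}:1")   # 21.0:1
print(f"black on yellow: {contrast_ratio(BLACK, YELLOW):.1f}:1")  # 19.6:1
```

Both of the combinations tested here appear at the top of the list above, and both score close to the maximum.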

4. Accuracy
• Slides are distracting when mistakes are spotted. Have someone who has not seen the slide before check
for typos, inaccuracies, and errors in general.

Exercise Answers

Exercise 4.1
1.
Botulism Status by Age Group, Texas Church Supper Outbreak, 2001

                      Botulism Status
Age Group (Years)     Yes      No
≤9                      2       2
10–19                   1       1
20–29                   2       2
30–39                   0       2
40–49                   4       4
50–59                   3       4
60–69                   1       5
70–79                   2       3
≥80                     0       0
Total                  15      23

2.
Botulism Status by Exposure to Chicken,* Texas Church Supper Outbreak, 2001

                          Botulism?
                       Yes     No    Total
Ate chicken?   Yes       8     11       19
               No        4     12       16
Total                   12     23       35
* Excludes 3 botulism case-patients with unknown exposure to chicken

3.
Botulism Status by Exposure to Chili,* Texas Church Supper Outbreak, 2001

                          Botulism?
                       Yes     No    Total
Ate chili?     Yes      14      8       22
               No        0     15       15
Total                   14     23       37
* Excludes 1 botulism case-patient with unknown exposure to chili
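Row and column totals, one of the items in the checklist for printed tables, can be computed rather than typed by hand, which guards against arithmetic slips. This is a minimal sketch. The function name `with_totals` is illustrative, and the cell counts are those of the chili table above.

```python
def with_totals(rows: dict) -> dict:
    """Append a row total to each row, then add a 'Total' row of column totals."""
    table = {label: list(cells) + [sum(cells)] for label, cells in rows.items()}
    # The right-hand side is evaluated before the 'Total' key is added.
    table["Total"] = [sum(column) for column in zip(*table.values())]
    return table


# Cell counts from the chili table above: [botulism yes, botulism no].
chili = {"Ate chili: Yes": [14, 8], "Ate chili: No": [0, 15]}
for label, cells in with_totals(chili).items():
    print(f"{label:<15}" + "".join(f"{n:>6}" for n in cells))
```

The computed totals (14, 23, 37) match the printed table, and the 14 cases in the table plus the 1 excluded case-patient account for all 15 cases.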

4.
Number of Botulism Cases/Controls by Exposure to Chili and Leftover Chili

                         Ate Leftover Chili
                      Yes       No        Total
Ate chili?    Yes     1/1       13/7      22
              No      0/1       0/14      15
Total*                3         34        37*
* One case with unknown exposure to initial chili consumption

Exercise 4.2
Strategy 1: Divide the data into groups of similar size
1. Divide the list into three equal-sized groups of places:

50 states ÷ 3 = 16.67 states per group. Because states can’t be cut in thirds, two groups will
contain 17 states and one group will contain 16 states.

Illinois (#17) could go into either the first or second group, but its rate (80.0) is closer to #16
Maine’s rate (80.2) than Texas’ rate (79.3), so it makes sense to put Illinois in the first group.
Similarly, #34 Vermont could go into either the second or third group.

Arbitrarily putting Illinois into the first category and Vermont into the second results in the
following groups:
a. Kentucky through Illinois (States 1–17)
b. Texas through Vermont (States 18–34)
c. South Dakota through Utah (States 35–50)

2. Identify the rate for the first and last state in each group:
a. Kentucky through Illinois 80.0–116.1
b. Texas through Vermont 70.2–79.3
c. South Dakota through Utah 39.7–68.1

3. Adjust the limits of each interval so no gap exists between the end of one class interval and
beginning of the next. Deciding how to adjust the limits is somewhat arbitrary — you could
split the difference, or use a convenient round number.
a. Kentucky through Illinois 80.0–116.1
b. Texas through Vermont 70.0–79.9
c. South Dakota through Utah 39.7–69.9

Strategy 2: Base intervals on mean and standard deviation
1. Create three categories based on the mean (77.1) and standard deviation (16.1) by finding the
upper limits of three intervals:
a. Upper limit of interval 3 = maximum value = 116.1
b. Upper limit of interval 2 = mean + 1 standard deviation = 77.1 + 16.1 = 93.2
c. Upper limit of interval 1 = mean – 1 standard deviation = 77.1 – 16.1 = 61.0
d. Lower limit of interval 1 = minimum value = 39.7

2. Select the lower limit for each upper limit to define three full intervals. Specify the states that
fall into each interval. (Note: To place the states with the highest rates first, reverse the order
of the intervals):
a. North Carolina through Kentucky (8 states) 93.3–116.1
b. Arizona through Georgia (35 states) 61.1–93.2
c. Utah through Minnesota (7 states) 39.7–61.0

Strategy 3: Divide the range into equal class intervals
1. Divide the range from zero (or the minimum value) to the maximum by 3:
(116.1 – 39.7) / 3 = 76.4 / 3 = 25.467

2. Use multiples of 25.467 to create three categories, starting with 39.7:
39.7 through (39.7 + 1 × 25.467) = 39.7 through 65.2
65.3 through (39.7 + 2 × 25.467) = 65.3 through 90.6
90.7 through (39.7 + 3 × 25.467) = 90.7 through 116.1

3. Final categories:
a. Indiana through Kentucky (11 states) 90.7–116.1
b. Nebraska through Oklahoma (29 states) 65.3–90.6
c. Utah through North Dakota (10 states) 39.7–65.2

4. Alternatively, since 90.6 is close to 90 and 65.2 is close to 65.0, the categories could be
reconfigured with no change in state assignments. For example, the final categories could
look like:
Indiana through Kentucky (11 states) 90.1–116.1
Nebraska through Oklahoma (29 states) 65.1–90.0
Utah through North Dakota (10 states) 39.7–65.0
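The arithmetic in the three strategies can be reproduced with a short sketch. It uses only the summary numbers quoted above (50 states; mean 77.1; standard deviation 16.1; minimum 39.7; maximum 116.1), not the full list of states. It therefore computes group sizes and cut points but does not assign states to groups. The function names are illustrative.

```python
def group_sizes(n_items: int, n_groups: int) -> list:
    """Sizes for splitting a ranked list into n_groups groups that differ by at most one."""
    base, extra = divmod(n_items, n_groups)
    # The first `extra` groups each receive one additional item.
    return [base + 1 if i < extra else base for i in range(n_groups)]


def mean_sd_cutpoints(mean: float, sd: float) -> tuple:
    """Boundaries at mean - 1 SD and mean + 1 SD, rounded to one decimal."""
    return round(mean - sd, 1), round(mean + sd, 1)


def equal_width_cutpoints(minimum: float, maximum: float, n_classes: int) -> list:
    """Inner boundaries of n_classes intervals of equal width, rounded to one decimal."""
    width = (maximum - minimum) / n_classes
    return [round(minimum + k * width, 1) for k in range(1, n_classes)]


print(group_sizes(50, 3))                      # Strategy 1: [17, 17, 16]
print(mean_sd_cutpoints(77.1, 16.1))           # Strategy 2: (61.0, 93.2)
print(equal_width_cutpoints(39.7, 116.1, 3))   # Strategy 3: [65.2, 90.6]
```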

Exercise 4.3
1. Highest rate is 438.2 per 100,000 (in 1958), so maximum on y-axis should be 450 or 500 per
100,000.

Rate (per 100,000 Population) of Reported Measles Cases by Year of Report — United States,
1955–2002

2. Highest rate between 1985 and 2002 was 11.2 per 100,000 (in 1990), so maximum on y-axis
should be 12 per 100,000.

Rate (per 100,000 Population) of Reported Measles Cases by Year of Report — United States,
1985–2002

Exercise 4.4
Number of Cases of Botulism by Date of Onset of Symptoms, Texas Church Supper Outbreak, 2001

The first case occurs on August 25; the number of cases rises to a peak two days later, on August 27,
then declines symmetrically to 1 case on August 29. One late case occurs on August 31 and another on September 1.
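An epidemic curve like this one can be tallied from a list of onset dates. Days with no cases must still appear on the x-axis, or the shape of the curve is distorted, and this is easy to get wrong by hand. This is a minimal text-only sketch. The onset dates are hypothetical, not the Texas outbreak data, and each "#" stands for one case, echoing the stacked squares of an epidemic curve.

```python
from collections import Counter
from datetime import date, timedelta


def daily_counts(onsets: list) -> list:
    """(day, number of cases) for every day from first to last onset, including zero days."""
    counts = Counter(onsets)
    day, last = min(onsets), max(onsets)
    result = []
    while day <= last:
        result.append((day, counts.get(day, 0)))
        day += timedelta(days=1)
    return result


def text_epi_curve(onsets: list) -> str:
    """One row per day; each '#' stands for one case."""
    return "\n".join(f"{day:%m/%d} {'#' * n}" for day, n in daily_counts(onsets))


# Hypothetical onset dates.
onsets = [date(2005, 3, 1), date(2005, 3, 2), date(2005, 3, 2),
          date(2005, 3, 2), date(2005, 3, 4)]
print(text_epi_curve(onsets))
```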

Exercise 4.5
Number of Cases of Botulism by Date of Onset of Symptoms, Texas Church Supper Outbreak, 2001

The area under the line in this frequency polygon is the same as the area in the answer to
Exercise 4.4. The peak of the epidemic (8/27) is easier to identify.
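The equal-area property can be checked numerically. Assume each day is one unit wide and the polygon joins the midpoints of the bars, closed by dropping to zero one interval before the first case and one interval after the last. Under these assumptions, the trapezoids under the polygon add up to the total number of cases, which is also the total area of the histogram bars. The daily counts below are hypothetical.

```python
def histogram_area(counts: list, width: float = 1.0) -> float:
    """Total area of the histogram bars."""
    return sum(counts) * width


def polygon_area(counts: list, width: float = 1.0) -> float:
    """Area under a frequency polygon joining bar midpoints, closed at zero at both ends."""
    heights = [0] + list(counts) + [0]
    # Trapezoid rule between successive midpoints, which are one width apart.
    return sum((a + b) / 2 * width for a, b in zip(heights, heights[1:]))


daily = [1, 3, 0, 1]
print(histogram_area(daily), polygon_area(daily))  # 5.0 5.0
```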

Exercise 4.6
Number of Reported Cases of Primary and Secondary Syphilis, by Age Group, Among Non-Hispanic
Black and White Men and Women — United States, 2002 (Stacked Bar Chart)

Number of Reported Cases of Primary and Secondary Syphilis, by Age Group, Among Non-Hispanic
Black and White Men and Women — United States, 2002 (Grouped Bar Chart)

Percent of Reported Cases of Primary and Secondary Syphilis, by Age Group, Among Non-Hispanic
Black and White Men and Women — United States, 2002 (100% Component Bar Chart)

Source: Centers for Disease Control and Prevention. Sexually Transmitted Disease Surveillance 2002. Atlanta, Georgia. U.S.
Department of Health and Human Services; 2003.

The stacked bar chart clearly displays the differences in total number of cases, as reflected by the
overall height of each column. The number of cases in the lowest category (age <20 years) is
also easy to compare across race-sex groups, because it rests on the x-axis. Other categories
might be a little harder to compare because they do not have a consistent baseline. If the size of
each category in a given column is different enough and the column is tall enough, the categories
within a column can be compared.

The grouped bar chart clearly displays the size of each category within a given group. You can
also discern different patterns across the groups. Comparing categories across groups takes work.

The 100% component bar chart is best for comparing the percent distribution of categories across
groups. You must keep in mind that the distribution represents percentages, so while the 30-39
year category in white females appears larger than the 30-39 year category in the other race-sex
groups, the actual numbers are much smaller.

Exercise 4.7
Age-adjusted Lung Cancer Death Rates per 100,000 Population, by State — United States, 2002

Self-Assessment Quiz
Now that you have read Lesson 4 and have completed the exercises, you
should be ready to take the self-assessment quiz. This quiz is designed to
help you assess how well you have learned the content of this lesson. You
may refer to the lesson text whenever you are unsure of the answer.
Unless otherwise instructed, choose ALL correct choices for each question.

1. Tables and graphs are important tools for which tasks of an epidemiologist?
A. Data collection
B. Data summarization (descriptive epidemiology)
C. Data analysis
D. Data presentation

2. A table in a report or manuscript should include:
A. Title
B. Row and column labels
C. Footnotes that explain abbreviations, symbols, exclusions
D. Source of the data
E. Explanation of the key findings

3. The following table is unacceptable because the percentages add up to 99.9% rather than
100.0%
Age group No. Percent
< 1 year 10 19.6
1–4 9 17.6
5–9 9 17.6
10–14 17 33.3
≥ 15 6 11.8
Total 51

A. True
B. False

4. In the following table, the total number of persons with the disease is:
Cases Controls Total
Exposed 22 12 34
Unexposed 3 13 16
Total 25 25 50

A. 3
B. 22
C. 25
D. 34
E. 50

5. A table shell is the:
A. Box around the outside of a table
B. Lines (“skeleton”) of a table without the labels or title
C. Table with data but without the title or labels
D. Table with labels and title but without the data

6. The best time to create table shells is:
A. Just before planning a study
B. As part of planning the study
C. Just after collecting the data
D. Just before analyzing the data
E. As part of analyzing the data

7. Recommended methods for creating categories for continuous variables include:
A. Basing the categories on the mean and standard deviation
B. Dividing the data into categories with similar numbers of observations in each
C. Dividing the range into equal class intervals
D. Using categories that have been used in national surveillance summary reports
E. Using the same categories as your population data are grouped

8. In frequency distributions, observations with missing values should be excluded.
A. True
B. False

9. The following are reasonable categories for a disease that mostly affects people over age 65
years:
Age Group
< 65 years
65–70
70–75
75–80
80–85
85

A. True
B. False

10. In general, before you create a graph to display data, you should put the data into a table.
A. True
B. False

11. On an arithmetic-scale line graph, the x-axis and y-axis each should:
A. Begin at zero on each axis
B. Have labels for the tick marks and each axis
C. Use equal distances along the axis to represent equal quantities (although the quantities
measured on each axis may differ)
D. Use the same tick mark spacing on the two axes

12. Use the following choices for Questions 12a–d:
A. Arithmetic-scale line graph
B. Semilogarithmic-scale line graph
C. Both
D. Neither

12a. ____ A wide range of values can be plotted and seen clearly, regardless of
magnitude
12b. ____ A constant rate of change would be represented by a curved line
12c. ____ The y-axis tick labels could be 0.1, 1, 10, and 100
12d. ____ Can plot numbers or rates

13. Use the following choices for Questions 13a–d:
A. Histogram
B. Bar chart
C. Both
D. Neither

13a. ____ Used for categorical variables on the x-axis
13b. ____ Columns can be subdivided with color or shading to show subgroups
13c. ____ Displays continuous data
13d. ____ Epidemic curve

14. Which of the following shapes of a population pyramid is most consistent with a young
population?
A. Tall, narrow rectangle
B. Short, wide rectangle
C. Triangle base down
D. Triangle base up

15. A frequency polygon differs from a line graph because a frequency polygon:
A. Displays a frequency distribution; a line graph plots data points
B. Must be closed (plotted line must touch the x-axis) at both ends
C. Cannot be used to plot data over time
D. Can show percentages on the y-axis; a line graph cannot

16. Use the following choices for Questions 16a–d:
A. Cumulative frequency curve
B. Survival curve
C. Both
D. Neither

16a. ____ Y-axis shows percentages from 0% to 100%
16b. ____ Plotted curve usually begins in the upper left corner
16c. ____ Plotted curve usually begins in the lower left corner
16d. ____ Horizontal line drawn from 50% tick mark to plotted curve intersects at median

17. A scatter diagram is the graph of choice for plotting:
A. Anabolic steroid levels measured in both blood and urine among a group of athletes
B. Mean cholesterol levels over time in a population
C. Infant mortality rates by mean annual income among different countries
D. Systolic blood pressure by eye color (brown, blue, green, other) measured in each
person

18. Which of the following requires more than one variable?
A. Frequency distribution
B. One-variable table
C. Pie chart
D. Scatter diagram
E. Simple bar chart

19. Compared with a scatter diagram, a dot plot:
A. Is another name for the same type of graph
B. Differs because a scatter diagram plots two continuous variables; a dot plot plots one
continuous and one categorical variable
C. Differs because a scatter diagram plots one continuous and one categorical variable; a
dot plot plots two continuous variables
D. Plots location of cases on a map

20. A spot map must reflect numbers; an area map must reflect rates.
A. True
B. False

21. To display different rates on an area map using different colors, select different colors that
have the same intensity, so as not to bias the audience.
A. True
B. False

22. In an oral presentation, three-dimensional pie charts and three-dimensional columns in bar
charts are desirable because they add visual interest to a slide.
A. True
B. False

23. A 100% component bar chart shows the same data as a stacked bar chart. The key
difference is in the units on the x-axis.
A. True
B. False

24. When creating a bar chart, the decision to use vertical or horizontal bars is usually based
on:
A. The magnitude of the data being graphed and hence the scale of the axis
B. Whether the data being graphed represent numbers or percentages
C. Whether the creator is an epidemiologist (epidemiologists almost always use vertical bars)
D. Which looks better, such as whether the label fits below the bar

25. Use the following choices for Questions 25a–d (match all that apply):
A. Grouped bar chart
B. Histogram
C. Line graph
D. Pie chart

25a. ____ Number of cases of dog bites over time
25b. ____ Number of cases of dog bites by age group (adult or child) and sex of the
victim
25c. ____ Number of cases of dog bites by breed of the dog
25d. ____ Number of cases of dog bites per 100,000 population over time

Answers to Self-Assessment Quiz
1. B, C, D. Tables and graphs are important tools for summarizing, analyzing, and
presenting data. While data are occasionally collected using a table (for example,
counting observations by putting tick marks into particular cells in a table), this is not a
common epidemiologic practice.

2. A, B, C, D. A table in a printed publication should be self-explanatory. If a table is taken
out of its original context, it should still convey all the information necessary for the
reader to understand the data. Therefore, a table should include, in addition to the data,
a proper title, row and column labels, source of the data, and footnotes that explain
abbreviations, symbols, and exclusions, if any. Tables generally present the data, while
the accompanying text of the report may contain an explanation of key findings.

3. B (False). Rounding that results in totals of 99.9% or 100.1% is common in tables that
show percentages. Nonetheless, the total percentage should be displayed as 100.0%,
and a footnote explaining that the difference is due to rounding should be included.
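The rounding in Question 3 can be checked with a short script. This is a minimal Python sketch (the function name `percent_column` is hypothetical) that recomputes the percentage column from the counts, confirms that the rounded values sum to 99.9, and then prints 100.0 in the Total row with a rounding footnote, as the answer recommends.

```python
# Counts from the age-group table in Question 3 (51 cases in total)
counts = {"< 1 year": 10, "1-4": 9, "5-9": 9, "10-14": 17, ">= 15": 6}


def percent_column(counts, decimals=1):
    """Each category's share of the total, rounded for display."""
    total = sum(counts.values())
    return {group: round(100 * n / total, decimals) for group, n in counts.items()}


percents = percent_column(counts)
rounded_sum = round(sum(percents.values()), 1)

for group, pct in percents.items():
    print(f"{group:>9} {counts[group]:>4} {pct:>6.1f}")

# Display 100.0 in the Total row and flag it when the rounded parts do not add up
footnote = "*" if rounded_sum != 100.0 else ""
print(f"{'Total':>9} {sum(counts.values()):>4} {'100.0' + footnote:>6}")
if footnote:
    print(f"* Percentages sum to {rounded_sum} because of rounding.")
```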

4. C. In the two-by-two table presented in Question 4, the total number of cases is shown
as the total of the left column (labeled “Cases”). The column total is 25.

5. D. A table shell is the skeleton of a table, complete with titles and labels, but without the
data. It is created when designing the analysis phase of an investigation. Table shells
help guide what data to collect and how to analyze the data.

6. B. Creation of table shells should be part of the overall study plan or protocol. Creation
of table shells requires the investigator to decide how to analyze the data, which
dictates what questions should be asked on the questionnaire.

7. A, B, C, D, E. All of the methods listed in Question 7 are appropriate and commonly
used by epidemiologists.
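Three of the methods in Question 7 are computational and can be sketched as below. This is a minimal illustration, not a prescribed procedure: the ages are hypothetical, and the function names are invented here. `statistics.quantiles` is from the Python standard library (3.8 and later).

```python
import statistics

# Hypothetical ages (years), used only to illustrate the approaches
ages = [2, 5, 8, 12, 15, 21, 34, 40, 55, 70, 81, 88]


def equal_width_breaks(values, k):
    """Cut points that divide the range [min, max] into k equal class intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_count_breaks(values, k):
    """Cut points (quantiles) giving roughly equal numbers of observations per category."""
    return statistics.quantiles(values, n=k)


def mean_sd_breaks(values):
    """Cut points at the mean minus one SD, the mean, and the mean plus one SD."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [m - s, m, m + s]


print(equal_width_breaks(ages, 4))  # [23.5, 45.0, 66.5]
print(equal_count_breaks(ages, 4))
print(mean_sd_breaks(ages))
```

In practice, cut points produced this way are usually rounded to convenient values. The other two methods in the question (reusing categories from national surveillance summaries or from the population data) require no computation.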

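8. B (False). Observations with missing values should not simply be dropped; report them,
for example as a separate “Missing” or “Unknown” category. The number of observations
with missing values is important when interpreting the data, particularly for making
generalizations.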
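A frequency distribution that keeps missing values visible might look like the sketch below. It assumes missing values are coded as `None` and tallies them in their own row. The function name and the data are hypothetical. Percentages here use all observations as the denominator. Some reports also give percentages among non-missing values only.

```python
from collections import Counter


def frequency_distribution(values):
    """Rows of (category, number, percent, cumulative percent).
    None values are counted in a separate "Missing" row instead of being dropped."""
    labels = ["Missing" if v is None else v for v in values]
    counts = Counter(labels)
    total = len(labels)
    order = sorted(k for k in counts if k != "Missing")
    if "Missing" in counts:
        order.append("Missing")  # keep missing values visible, listed last
    rows, running = [], 0
    for category in order:
        n = counts[category]
        running += n
        rows.append((category, n, round(100 * n / total, 1), round(100 * running / total, 1)))
    return rows


# Hypothetical variable with two missing values
sex = ["F", "M", "F", None, "F", "M", None, "F"]
for row in frequency_distribution(sex):
    print(row)
```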

9. B (False). The limits of the class intervals must not overlap. For example, would a
70-year-old be counted in the 65–70 category or in the 70–75 category? Non-overlapping
categories such as 65–69, 70–74, 75–79, 80–84, and ≥ 85 avoid this ambiguity.
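One way to guarantee non-overlapping class intervals is to define each category by its lower bound and to treat every interval as running up to, but not including, the next bound. The sketch below uses Python's standard `bisect` module. The bounds and labels are an illustrative assumption, not categories taken from the source.

```python
import bisect

# Hypothetical non-overlapping groups: each runs up to, but not including, the next lower bound
LOWER_BOUNDS = [0, 65, 70, 75, 80, 85]
LABELS = ["< 65", "65-69", "70-74", "75-79", "80-84", ">= 85"]


def age_group(age):
    """Assign an age in completed years to exactly one category."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_right places an age equal to a bound into the group that starts at that bound
    return LABELS[bisect.bisect_right(LOWER_BOUNDS, age) - 1]


for age in (64, 65, 69, 70, 85, 92):
    print(age, age_group(age))
```

With this rule, a 70-year-old falls only in the 70–74 group.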

10. A (True). In general, before you create a graph, you should observe the data in a table.
By reviewing the data in the table, you can anticipate the range of values that must be
covered by the axes of a graph. You can also get a sense of the patterns in the data, so
you can anticipate what the graph should look like.

11. B, C. On an arithmetic-scale line graph, the axes and tick marks should be clearly
labeled. For both the x- and y-axis, a particular distance anywhere along the axis should
represent the same increase in quantity, although the x- and y-axis usually differ in what
is measured. The y-axis, measuring frequency, should begin at zero. But the x-axis,
which often measures time, need not start at zero.

12a. B. One of the key advantages of a semilogarithmic-scale line graph is that it can display
a wide range of values clearly.

12b. A. A starting value of, say, 100,000 and a constant rate of change of, say, 10%, would
result in observations of 100,000, 110,000, 121,000, 133,100, 146,410, 161,051, etc.
The resulting plotted line on an arithmetic-scale line graph would curve upwards. The
resulting plotted line on a semilogarithmic-scale line graph would be a straight line.
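The reasoning in 12b can be checked numerically. On an arithmetic scale, the distance between successive points keeps growing, so the line curves. On a logarithmic scale, the distance is log10(1.1) every time, so the line is straight. This is a minimal sketch, and the function name is hypothetical.

```python
import math


def constant_rate_series(start, rate, periods):
    """Values that change by the same rate (e.g., 0.10 = 10%) each period."""
    values = [start]
    for _ in range(periods - 1):
        values.append(values[-1] * (1 + rate))
    return values


series = constant_rate_series(100_000, 0.10, 6)
print([round(v) for v in series])  # [100000, 110000, 121000, 133100, 146410, 161051]

# Arithmetic scale: the step between points keeps growing, so the line curves upward
steps = [b - a for a, b in zip(series, series[1:])]
# Logarithmic scale: the step in log10 units is constant, so the line is straight
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(series, series[1:])]

print([round(s) for s in steps])  # [10000, 11000, 12100, 13310, 14641]
print([round(s, 4) for s in log_steps])  # each step is log10(1.1), about 0.0414
```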

12c. B. Values of 0.1, 1, 10, and 100 represent orders of magnitude typical of the y-axis of a
semilogarithmic-scale line graph.

12d. C. Both arithmetic-scale and semilogarithmic-scale line graphs can be used to plot
numbers or rates.

13a. B. A bar chart is used to graph the frequency of events of a categorical variable such as
sex or geographic region.

13b. C. The columns of either a histogram or a bar chart can be shaded to distinguish
subgroups. Note that a bar chart with shaded subgroups is called a stacked bar chart.

13c. A. A histogram is used to graph the frequency of events of a continuous variable such as
time.

13d. A. An epidemic curve is a particular type of histogram in which the number of cases (on
the y-axis) that occur during an outbreak or epidemic is graphed over time (on the x-axis).
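To show how the data behind an epidemic curve are tallied, the sketch below counts cases per calendar day, including days with no cases, so that the x-axis stays a continuous time scale as in any histogram. It prints a rough text histogram. The onset dates and the function name are hypothetical, and daily intervals are an assumption; other interval widths may suit a given outbreak better.

```python
from collections import Counter
from datetime import date, timedelta


def epi_curve_counts(onset_dates):
    """(day, number of cases) for every calendar day from first to last onset,
    including days with no cases."""
    counts = Counter(onset_dates)
    first, last = min(onset_dates), max(onset_dates)
    days = [first + timedelta(days=i) for i in range((last - first).days + 1)]
    return [(day, counts.get(day, 0)) for day in days]


# Hypothetical onset dates for six cases
onsets = [date(2003, 10, 1), date(2003, 10, 1), date(2003, 10, 3),
          date(2003, 10, 4), date(2003, 10, 4), date(2003, 10, 4)]

# Text histogram: one row per day, one "#" per case
for day, n in epi_curve_counts(onsets):
    print(f"{day:%b %d} {'#' * n}")
```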

14. C. A typical population pyramid usually displays the youngest age group at the bottom
and the oldest age group at the top, with males on one side and females on the other
side. A young population would therefore have a wide bar at the bottom with gradually
narrowing bars above.

15. A, B. A frequency polygon differs from a line graph in that a frequency polygon
represents a frequency distribution, with the area under the curve proportionate to the
frequency. Because the total area must represent 100%, the ends of the frequency
polygon must be closed. Although a line graph is commonly used to display frequencies
over time, a frequency polygon can display the frequency distribution of a given period
of time as well. Similarly, the y-axis of both types of graph can measure percentages.
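The link in the answer to Question 15 between closing the ends and the area under the curve can be illustrated with a short sketch. It assumes equal-width class intervals. Each interval is plotted at its midpoint, and a zero-frequency point is added one interval beyond each end. The trapezoid area under the closed polygon then equals the area of the corresponding histogram. The data and function names are hypothetical.

```python
def frequency_polygon(lower_bounds, width, counts):
    """Vertices of a frequency polygon for equal-width class intervals.
    Each interval contributes its midpoint; a zero-frequency point is added one
    interval before the first and one after the last, closing the polygon on the x-axis."""
    midpoints = [lb + width / 2 for lb in lower_bounds]
    inner = list(zip(midpoints, counts))
    return [(midpoints[0] - width, 0)] + inner + [(midpoints[-1] + width, 0)]


def area_under(points):
    """Area under a line through the points (trapezoid rule)."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical counts in four 10-year age intervals starting at 20, 30, 40, 50
polygon = frequency_polygon([20, 30, 40, 50], 10, [3, 8, 5, 2])
print(polygon)  # [(15.0, 0), (25.0, 3), (35.0, 8), (45.0, 5), (55.0, 2), (65.0, 0)]
# Same as the histogram's area: 10 * (3 + 8 + 5 + 2)
print(area_under(polygon))  # 180.0
```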

16a. C. The y-axis of both cumulative frequency curves and survival curves typically displays
percentages from 0% at the bottom to 100% at the top. The main difference is that a
cumulative frequency curve begins at 0% and increases, whereas a survival curve
begins at 100% and decreases.

16b. B. Because a survival curve begins at 100%, the plotted curve begins at the top of the
y-axis and at the beginning time interval (sometimes referred to as time-zero) of the
x-axis, i.e., in the upper left corner.

16c. A. Because a cumulative frequency curve begins at 0%, the plotted curve begins at the
base of the y-axis and at the beginning time interval (sometimes referred to as
time-zero) of the x-axis, i.e., in the lower left corner.

16d. C. Because the y-axis represents proportions, a horizontal line drawn from the 50% tick
mark to the plotted curve will indicate 50% survival or 50% cumulative frequency. The
median is another name for the 50% mark of a distribution of data.
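The sketch below builds both curves from event counts per interval and locates the median where a horizontal line from 50% meets the curve, using linear interpolation between plotted points. It simplifies survival heavily, assuming every subject is followed until the event with no censoring, so survival is 100% minus cumulative frequency. Survival curves from real follow-up data are usually estimated with methods that handle censoring. The data and function names are hypothetical.

```python
def cumulative_and_survival(counts):
    """Cumulative and 'survival' percentages at time zero and at the end of each
    interval. Assumes no censoring, so survival = 100 - cumulative."""
    total = sum(counts)
    cumulative, running = [0.0], 0
    for n in counts:
        running += n
        cumulative.append(100 * running / total)
    survival = [100 - c for c in cumulative]
    return cumulative, survival


def median_time(times, cumulative):
    """Time at which a horizontal line from the 50% mark meets the cumulative curve."""
    points = list(zip(times, cumulative))
    for (t0, c0), (t1, c1) in zip(points, points[1:]):
        if c0 < 50 <= c1:
            return t0 + (50 - c0) / (c1 - c0) * (t1 - t0)
    return None


# Hypothetical events in five consecutive 1-unit intervals
counts = [5, 10, 20, 10, 5]
times = [0, 1, 2, 3, 4, 5]
cumulative, survival = cumulative_and_survival(counts)
print(cumulative)  # [0.0, 10.0, 30.0, 70.0, 90.0, 100.0]
print(survival)    # [100.0, 90.0, 70.0, 30.0, 10.0, 0.0]
print(median_time(times, cumulative))  # 2.5
```

Under the no-censoring assumption, the survival curve crosses 50% at the same time, 2.5.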

17. A, C. A scatter diagram graphs simultaneous data points of two continuous variables for
individuals or communities. Drug levels, infant mortality, and mean annual income are
all examples of continuous variables. Eye color, at least as presented in the question, is
a categorical variable.

18. D. A frequency distribution, one-variable table, pie chart, and simple bar chart are all
used to display the frequency of categories of a single variable. A scatter diagram
requires two variables.

19. B. A scatter diagram graphs simultaneous data points of two continuous variables for
individuals or communities; whereas a dot plot graphs data points of a continuous
variable according to categories of a second, categorical variable.

20. B (False). The spots on a spot map usually reflect one or more cases, i.e., numbers. The
shading on an area map may represent numbers, proportions, rates, or other measures.

21. B (False). Shading should be consistent with frequency. So rather than using different
colors of the same intensity, increasing shades of the same color or family of colors
should be used.
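The principle in the answer to Question 21 can be sketched as code. Each rate is placed in a class, and higher classes receive stronger shades of a single hue. This minimal sketch blends a base color with white in equal steps. The base color, cut points, and function names are hypothetical. A linear blend like this is not perceptually uniform, so tested sequential palettes, such as those at the ColorBrewer site listed under Websites, are usually a better choice.

```python
def sequential_shades(base_rgb, k):
    """k shades of one hue, lightest for the lowest class and full strength for the
    highest, made by blending the base color with white."""
    shades = []
    for i in range(k):
        weight = (i + 1) / k  # share of the base color rises with class rank
        shades.append(tuple(round(255 + (c - 255) * weight) for c in base_rgb))
    return shades


def rate_class(rate, breaks):
    """Class index = number of cut points the rate meets or exceeds."""
    return sum(rate >= b for b in breaks)


# Hypothetical cut points (rates per 100,000) and base color (a dark blue)
breaks = [10, 20, 40]
palette = sequential_shades((8, 81, 156), len(breaks) + 1)
for rate in (5, 15, 25, 45):
    print(rate, palette[rate_class(rate, breaks)])
```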

22. B (False). The primary purpose of any visual is to communicate information clearly. 3-D
columns, bars, and pies may have pizzazz, but they rarely help communicate
information, and sometimes they mislead.

23. B (False). The difference between a stacked bar chart and a 100% component bar chart
is that the bars of a 100% component bar chart all reach the top of the y-axis (100%):
the y-axis shows numbers in a stacked bar chart but percentages in a 100% component
bar chart. The units on the x-axis are the same.
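Converting a stacked bar chart into a 100% component bar chart changes only the y-axis values. Each subgroup count is divided by its own bar's total. The x-axis categories stay as they are. This is a minimal sketch, and the data and function name are hypothetical.

```python
def to_percent_components(bars):
    """Turn each bar's subgroup counts into percentages of that bar's own total,
    so every bar of a 100% component bar chart reaches 100% on the y-axis."""
    return {label: [100 * n / sum(parts) for n in parts] for label, parts in bars.items()}


# Hypothetical cases by year (x-axis), split into two subgroups: [male, female]
stacked = {"2002": [30, 10], "2003": [45, 5]}
component = to_percent_components(stacked)
print(component)  # {'2002': [75.0, 25.0], '2003': [90.0, 10.0]}
```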

24. D. Any bar chart can be oriented vertically or horizontally. The creator of the chart can
choose, and often does so on the basis of consistency with other graphs in a series,
opinion about which orientation looks better or fits better, and whether the labels fit
adequately below vertical bars or need to be placed beside horizontal bars.

25a. B, C. Both line graphs and histograms are commonly used to graph numbers of cases
over time. Line graphs are commonly used to graph secular trends over longer time
periods; histograms are often used to graph cases over a short period of observation,
such as during an epidemic.

25b. A. A grouped bar chart (or a stacked bar chart) is ideal for graphing frequency over two
categorical variables. A pie chart is used for a single variable.

25c. D. A pie chart (or a simple bar chart) is used for graphing the frequency of categories of
a single categorical variable such as breed of dog.

25d. C. Rates over time are traditionally plotted by using a line graph.
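The rates plotted for Question 25d are computed as cases divided by population, multiplied by 100,000. This minimal sketch uses hypothetical dog-bite counts and populations, and the function name is also hypothetical. Multiplying before dividing keeps results such as 8.0 exact in floating point.

```python
def rate_per_100k(cases, population):
    """Cases per 100,000 population."""
    return cases * 100_000 / population


# Hypothetical annual dog-bite cases and mid-year population
data = {2001: (120, 1_500_000), 2002: (150, 1_520_000), 2003: (135, 1_540_000)}
for year, (cases, population) in data.items():
    print(year, round(rate_per_100k(cases, population), 1))
```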

References
1. Koschat MA. A case for simple tables. The American Statistician 2005;59:31–40.
2. Centers for Disease Control and Prevention. Sexually Transmitted Disease Surveillance,
2002. Atlanta, GA: U.S. Department of Health and Human Services, Centers for Disease
Control and Prevention, September 2003.
3. Pierchala C. The choice of age groupings may affect the quality of tabular presentations.
ASA Proceedings of the Joint Statistical Meetings; 2002; Alexandria, VA: American
Statistical Association; 2002:2697–702.
4. Daley RW, Smith A, Paz-Argandona E, Malilay J, McGeehin M. An outbreak of carbon
monoxide poisoning after a major ice storm in Maine. J Emerg Med 2000;18:87–93.
5. Kalluri P, Crowe C, Reller M, Gaul L, Hayslett J, Barth S, Eliasberg S, Ferreira J, Holt K,
Bengston S, Hendricks K, Sobel J. An outbreak of foodborne botulism associated with food
sold at a salvage store in Texas. Clin Infect Dis 2003;37:1490–5.
6. Stevens JA, Powell KE, Smith SM, Wingo PA, Sattin RW. Physical activity, functional
limitations, and the risk of fall-related fractures in community-dwelling elderly. Ann
Epidemiol 1997;7:54–61.
7. Ahluwalia IB, Mack K, Murphy W, Mokdad AH, Bales VH. State-specific prevalence of
selected chronic disease-related characteristics–Behavioral Risk Factor Surveillance System,
2001. In: Surveillance Summaries, August 22, 2003. MMWR 2003;52(No. SS-08):1–80.
8. Langlois JA, Kegler SR, Butler JA, Gotsch KE, Johnson RL, Reichard AA, et al. Traumatic
brain injury-related hospital discharges: results from a 14-state surveillance system. In:
Surveillance Summaries, June 27, 2003. MMWR 2003;52(No. SS-04):1–18.
9. Chang J, Elam-Evans LD, Berg CJ, Herndon J, Flowers L, Seed KA, Syverson CJ.
Pregnancy-related mortality surveillance–United States, 1991-1999. In: Surveillance
Summaries, February 22, 2003. MMWR 2003;52(No. SS-02):1–8.
10. Centers for Disease Control and Prevention. HIV/AIDS Surveillance Report, 2003 (Vol. 15).
Atlanta, Georgia: US Department of Health and Human Services; 2004:1–46.
11. Zhou W, Pool V, Iskander JK, English-Bullard R, Ball R, Wise RP, et al. Surveillance for
safety after immunization: Vaccine Adverse Event Reporting System (VAERS)–1991-2001.
In: Surveillance Summaries, January 24, 2003. MMWR 2003;52(No. SS-01):1–24.
12. Schmid CF, Schmid SE. Handbook of graphic presentation. New York: John Wiley & Sons,
1954.
13. Cleveland WS. The elements of graphing data. Summit, NJ: Hobart Press, 1994.
14. Brookmeyer R, Curriero FC. Survival curve estimation with partial non-random exposure
information. Statistics in Medicine 2002;21:2671–83.

15. Korn EL, Graubard BI. Scatterplots with survey data. The American Statistician 1998;52:58–
69.
16. Souvaine DL, Van Wyk CJ. How hard can it be to draw a pie chart? Mathematics Magazine
1990;63:165–72.
17. Luby SP, Agboatwalla M, Painter J, Altaf A, Billhimer WL, Hoekstra RM. Effect of
intensive handwashing promotion on childhood diarrhea in high-risk communities in
Pakistan: a randomized controlled trial. JAMA 2004; 291(21):2547–54.
18. Kafadar K. John Tukey and robustness. Statistical Science 2003;18:319–31.
19. Urbanek S. Exploring statistical forests. ASA Proceedings of the Joint Statistical Meetings;
2002; Alexandria, VA: American Statistical Association, 2002:3535–40.
20. Amon J, Devasia R, Guoliang X, Vaughan G, Gabel J, MacDonald P, et al. Multiple hepatitis
A outbreaks associated with green onions among restaurant patrons–Tennessee, Georgia, and
North Carolina, 2003. Presented at 53rd Annual Epidemic Intelligence Service Conference,
April 19-23, 2004, Atlanta, Georgia.
21. Haddix AC, Teutsch SM, Corso PS. Prevention effectiveness: a guide to decision analysis
and economic evaluation. 2nd ed. New York, New York: Oxford University Press; October
2002.
22. Croner CM. Public health GIS and the internet. Annu Rev Public Health 2003;24:57–82.
23. Hilbe JM. Statistical computing software reviews. The American Statistician 2004;58:92.
24. Devlin SJ. Statistical graphs in customer survey research. ASA Proceedings of the Joint
Statistical Meetings 2003:1212–16.
25. Taub GE. A review of ActivStats for SPSS: Integrating SPSS instruction and
multimedia in an introductory statistics course. Journal of Educational and Behavioral
Statistics 2003;28:291–3.
26. Hilbe J. Computing and software: editor’s notes. Health Services & Outcomes Research
Methodology 2000;1:75–9.
27. Oster RA. An examination of five statistical software packages for epidemiology. The
American Statistician 1998;52:267–80.
28. Morgan WT. A review of eight statistics software packages for general use. The American
Statistician 1998;52:70–82.
29. Anderson-Cook CM. Data analysis and graphics using R: an example-based approach.
Journal of the American Statistical Association 2004;99:901–2.
30. Tufte ER. The visual display of quantitative information. Cheshire, CT: Graphics Press, LLC;
2002.
31. Tufte ER. The visual display of quantitative information. Cheshire, CT: Graphics Press;
1983.

32. Olsen J. Using color in statistical graphs and maps. ASA Proceedings of the Joint
Statistical Meetings; 2002; Alexandria, VA: American Statistical Association; 2002:2524–9.
33. Wainer H, Velleman PF. Statistical Graphics: mapping the pathways of science. Annual
Review of Psychology 2001;52:305–35.
34. Benedetto DD. Faces and the others: interactive expressions for observations. ASA
Proceedings of the Joint Statistical Meetings; 2003; Alexandria, VA: American Statistical
Association; 2003:520–7.
35. Weisstein EW. [Internet] MathWorld–A Wolfram Web Resource [updated 2006]. Chernoff
Face. Available from: http://mathworld.wolfram.com/ChernoffFace.html.

Websites
For more information on: Visit the following websites:
Age categorization used by CDC’s National Center for Health Statistics http://www.cdc.gov/nchs/
Age groupings used by the United States Census Bureau http://www.census.gov
CDC’s Morbidity and Mortality Weekly Report http://www.cdc.gov/mmwr/
Epi Info and EpiMap http://www.cdc.gov/epiinfo/
GIS at CDC http://www.cdc.gov/gis/
The R Project for Statistical Computing http://www.r-project.org
ColorBrewer: color advice for cartography http://www.colorbrewer.org

Instructions for Epi Info 6 (DOS)

To create a frequency distribution from a data set in Analysis Module:
EpiInfo6: >freq variable. Output provides columns for number, percentage, and
cumulative percentage.

To create a two-variable table from a data set in Analysis Module:
EpiInfo6: >Tables exposure_variable outcome_variable. Output shows table plus chi-square
and p-value. For a two-by-two table, output also provides risk ratio, odds ratio,
and confidence intervals.
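To show what the two-by-two measures represent, here is a Python sketch applied to the table in Question 4. It is not Epi Info's code. The confidence interval uses the Woolf (log-based) method, and the chi-square is uncorrected. Epi Info's own methods may give somewhat different intervals. Because Question 4 compares cases with controls, the odds ratio is the measure to interpret there. The risk ratio is computed only because Epi Info also reports it.

```python
import math


def two_by_two(a, b, c, d):
    """Measures for a 2x2 table laid out as in Question 4:
         a = exposed cases,   b = exposed controls
         c = unexposed cases, d = unexposed controls"""
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    risk_ratio = (a / (a + b)) / (c / (c + d))
    # Woolf (log-based) 95% confidence interval for the odds ratio
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(math.log(odds_ratio) - 1.96 * se),
          math.exp(math.log(odds_ratio) + 1.96 * se))
    # Uncorrected chi-square (1 degree of freedom) and its two-sided p-value
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {"odds_ratio": odds_ratio, "risk_ratio": risk_ratio,
            "or_95ci": ci, "chi2": chi2, "p": p_value}


results = two_by_two(22, 12, 3, 13)
for name, value in results.items():
    print(name, value)
```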
