Displaying Public Health Data
Most annual reports use a combination of tables, graphs, and charts to summarize and display
data clearly and effectively. Tables and graphs can be used to summarize a few dozen records or
a few million. They are used every day by epidemiologists to summarize and better understand
the data they or others have collected. They can demonstrate distributions, trends, and
relationships in the data that are not apparent from looking at individual records. Thus, tables and
graphs are critical tools for descriptive and analytic epidemiology. In addition, remembering the
adage that a picture is worth a thousand words, you can use tables and graphs to communicate
epidemiologic findings to others efficiently and effectively. This lesson covers tabular and
graphic techniques for data display; interpretation was covered in Lessons 2 and 3.
Objectives
After completing this lesson and answering the questions in the exercises, you will be able to:
• Prepare and interpret one-, two-, or three-variable tables and composite tables (including
creating class intervals)
• Prepare and interpret arithmetic-scale line graphs, semilogarithmic-scale line graphs,
histograms, frequency polygons, bar charts, pie charts, maps, and area maps
• State the value and proper use of population pyramids, cumulative frequency graphs,
survival curves, scatter diagrams, box plots, dot plots, forest plots, and tree plots
• Identify when to use each type of table and graph
Major Sections
Introduction to Tables and Graphs
Tables
Graphs
Other Data Displays
Using Computer Technology
Summary
When the data are more complex, graphs and charts can help the
epidemiologist visualize broader patterns and trends and identify
variations from those trends. Variations in data may represent
important new findings or only errors in typing or coding that
need to be corrected. Thus, tables and graphs can be helpful tools
for verifying and analyzing the data.
Once an analysis is complete, tables and graphs further serve as
useful visual aids for describing the data to others. When preparing
tables and graphs, keep in mind that your primary purpose is to
communicate information.
• Show totals for rows and columns, where appropriate. If you show percentages (%), also give their total (always
100).
• Identify missing or unknown data either within the table (for example, Table 4.11) or in a footnote below the table.
• Explain any codes, abbreviations, or symbols in a footnote (for example, Syphilis P&S = primary and secondary
syphilis).
• Note exclusions in a footnote (e.g., 1 case and 2 controls with unknown family history were excluded from this
analysis).
• Note the source of the data below the table or in a footnote if the data are not original.
One-variable tables
In descriptive epidemiology, the most basic table is a simple
frequency distribution with only one variable, such as Table 4.1a,
which displays the number of reported primary and secondary syphilis cases in the United
States in 2002 by age group.2 (Frequency distributions are
discussed in Lesson 2.) In this type of frequency distribution table,
the first column shows the values or categories of the variable
represented by the data, such as age or sex. The second column
shows the number of persons or events that fall into each category.
In constructing any table, the choice of columns results from the
Table 4.1a Reported Cases of Primary and Secondary Syphilis by Age — United States, 2002
Age Group (years) Number of Cases
<14 21
15–19 351
20–24 842
25–29 895
30–34 1,097
35–39 1,367
40–44 1,023
45–54 982
≥55 284
Total 6,862
Data Source: Centers for Disease Control and Prevention. Sexually Transmitted Disease Surveillance 2002. Atlanta: U.S. Department
of Health and Human Services; 2003.
Age Group (years) Number of Cases Percent
<14 21 0.3
15–19 351 5.1
20–24 842 12.3
25–29 895 13.0
30–34 1,097 16.0
35–39 1,367 19.9
40–44 1,023 14.9
45–54 982 14.3
≥55 284 4.1
Total 6,862 100.0*
* Actual total of percentages for this table is 99.9% and does not add to 100.0% due to rounding error.
Data Source: Centers for Disease Control and Prevention. Sexually Transmitted Disease Surveillance 2002. Atlanta: U.S. Department
of Health and Human Services; 2003.
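Each percentage in the table above is the age group's count divided by the total (6,862), multiplied by 100. As a minimal sketch in Python (standard library only; the variable and function names are illustrative), the percent column can be recomputed from the Table 4.1a counts, which also shows why the rounded percentages sum to 99.9 rather than 100.0:

```python
# Reported P&S syphilis cases by age group, copied from Table 4.1a.
counts = {
    "<14": 21, "15–19": 351, "20–24": 842, "25–29": 895, "30–34": 1097,
    "35–39": 1367, "40–44": 1023, "45–54": 982, "≥55": 284,
}


def percent_distribution(freqs, decimals=1):
    """Each category's share of the total, as a percent rounded to `decimals` places."""
    total = sum(freqs.values())
    return {k: round(100 * v / total, decimals) for k, v in freqs.items()}


pct = percent_distribution(counts)
for group, n in counts.items():
    print(f"{group:>6} {n:>6,} {pct[group]:>6.1f}")

# Sum the rounded percentages so the rounding shortfall is visible.
print(f"{'Total':>6} {sum(counts.values()):>6,} {sum(pct.values()):>6.1f}")
```

Because each percentage is rounded separately, the rounded values need not add to exactly 100; that is why the table footnotes the total instead of forcing one category to absorb the difference.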
Table 4.1c Reported Cases of Primary and Secondary Syphilis by Age — United States, 2002
CASES
Data Source: Centers for Disease Control and Prevention. Sexually Transmitted Disease Surveillance 2002. Atlanta: U.S. Department
of Health and Human Services; 2003.
<14 9 12 21
15–19 135 216 351
20–24 533 309 842
25–29 668 227 895
30–34 877 220 1,097
35–39 1,121 246 1,367
40–44 845 178 1,023
45–54 825 157 982
≥55 255 29 284
Total 5,268 1,594 6,862
Data Source: Centers for Disease Control and Prevention. Sexually Transmitted Disease Surveillance 2002. Atlanta: U.S. Department
of Health and Human Services; 2003.
Generator location
Inside home or attached structure 23 23 46
Outside home 4 139 143
Hispanic
<14 1 1 2
15–19 37 25 62
20–24 117 29 146
25–29 139 26 165
30–34 172 20 192
35–39 178 22 200
40–44 93 9 102
45–54 69 14 83
≥55 18 1 19
Total 824 147 971
1. Construct a table of the illness (botulism) by age group. Use botulism status (yes/no) as the
column labels and age groups as the row labels.
4. Construct a three-way table of illness (botulism) by exposure to chili and chili leftovers.
1 1 Y N - Y Y Y N
2 3 Y Y 8/27 Lab-confirmed Y Y N N
3 7 Y Y 8/31 Lab-confirmed Y Y N N
4 7 Y N - Y Y Y N
5 10 Y N - Y Y N Y
6 17 Y Y 8/28 Lab-confirmed Y Y Y N
7 21 Y N - N N N N
8 23 Y N - Y Y N N
9 25 Y Y 8/26 Epi-linked Y Y N N
10 29 N Y 8/28 Lab-confirmed Y Unk Unk Y
11 38 Y N - N N N N
12 39 Y N - N N N N
13 41 Y N - Y Y Y N
14 41 Y N - N N N N
15 42 Y Y 8/26 Lab-confirmed Y Y Unk N
16 45 Y Y 8/26 Lab-confirmed Y Y Y Y
17 45 Y Y 8/27 Epi-linked Y Y Y N
18 46 Y N - Y N Y N
19 47 Y N - Y N Y N
20 48 Y Y 9/1 Lab-confirmed Y Y Unk N
21 50 Y Y 8/29 Epi-linked Y Y N N
22 50 Y N - Y N Y N
23 50 Y N - Y N N Y
24 52 Y Y 8/28 Lab-confirmed Y Y Y N
25 52 Y N - N N N N
26 53 Y Y 8/27 Epi-linked Y Y Y N
27 53 Y N - Y Y Y N
28 62 Y Y 8/27 Epi-linked Y Y Y N
29 62 Y N - Y N Y N
30 63 Y N - N N N N
31 67 Y N - N N N N
32 68 Y N - N N N N
33 69 Y N - Y Y Y N
34 71 Y N - Y N Y N
35 72 Y Y 8/27 Lab-confirmed Y Y Y N
36 74 Y N - Y Y N N
37 74 Y N - Y N Y N
38 78 Y Y 8/25 Epi-linked Y Y Y N
Data Source: Kalluri P, Crowe C, Reller M, Gaul L, Hayslett J, Barth S, Eliasberg S, Ferreira J, Holt K, Bengston S, Hendricks K, Sobel
J. An outbreak of foodborne botulism associated with food sold at a salvage store in Texas. Clin Infect Dis 2003;37:1490–5.
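Exercise 1 asks for a two-variable table (age group by botulism status). One way to build such a table from a line listing is to count each record into a (row, column) cell and then add row and column totals. The sketch below is hypothetical throughout: the six records, the field layout (age, botulism status), and the four age-group cut points are made up for illustration and are not taken from the line listing above, whose column headings are not shown here.

```python
from collections import Counter

# HYPOTHETICAL records: (age in years, botulism "Y"/"N"). Illustrative only.
records = [(3, "Y"), (25, "Y"), (41, "N"), (45, "Y"), (62, "N"), (74, "N")]

# HYPOTHETICAL class intervals used as the row labels.
AGE_GROUPS = ["0–19", "20–39", "40–59", "≥60"]


def age_group(age):
    """Map an age in years to one of the illustrative age groups."""
    if age < 20:
        return "0–19"
    if age < 40:
        return "20–39"
    if age < 60:
        return "40–59"
    return "≥60"


def two_way_table(rows):
    """Rows = age groups, columns = botulism yes/no, plus row and column totals."""
    cells = Counter((age_group(age), status) for age, status in rows)
    table = []
    for group in AGE_GROUPS:
        yes, no = cells[(group, "Y")], cells[(group, "N")]
        table.append((group, yes, no, yes + no))
    total_yes = sum(row[1] for row in table)
    total_no = sum(row[2] for row in table)
    table.append(("Total", total_yes, total_no, total_yes + total_no))
    return table


print(f"{'Age group':<10}{'Yes':>5}{'No':>5}{'Total':>7}")
for group, yes, no, total in two_way_table(records):
    print(f"{group:<10}{yes:>5}{no:>5}{total:>7}")
```

A three-variable table (Exercise 4) follows the same pattern, with a third element in each counted key.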
Table 4.7 Rate per 100,000 Population for Reported Cases of Primary and Secondary Syphilis, by Age
and Race — United States, 2002
Age Group (years) | Am. Indian/Alaska Native | Asian/Pacific Is. | Black, Non-Hispanic | Hispanic | White, Non-Hispanic | Total
Data Source: Daley RW, Smith A, Paz-Argandona E, Mallilay J, McGeehin M. An outbreak of carbon monoxide poisoning after a
major ice storm in Maine. J Emerg Med 2000;18:87–93.
Table shells
Although you cannot analyze data before you have collected them,
epidemiologists anticipate and design their analyses in advance to
delineate what the study is going to convey, and to expedite the
analysis once the data are collected. In fact, most protocols, which
are written before a study is conducted, require a description
of how the data will be analyzed. As part of the analysis plan, you
can develop table shells that show how the data will be organized
and displayed. Table shells are tables that are complete except for
the data. They show titles, headings, and categories. In developing
table shells that include continuous variables such as age, we
create more categories than we may later use, in order to disclose
any interesting patterns and quirks in the data.
Table Shell 4.9a Anatomic Site of Fall-related Fractures Sustained by Participants, SAFE Study — Miami,
1987–1989
Fracture Site Number (Percent)
Skull ____ ( )
Spine ____ ( )
Clavicle (collarbone) ____ ( )
Scapula (shoulder blade) ____ ( )
Humerus (upper arm) ____ ( )
Radius / ulna (lower arm) ____ ( )
Bones of the hand ____ ( )
Ribs, sternum ____ ( )
Pelvis ____ ( )
Neck of femur (hip) ____ ( )
Other parts of femur (upper leg) ____ ( )
Patella (knee) ____ ( )
Tibia / fibula (lower leg) ____ ( )
Ankle ____ ( )
Bones of the foot ____ ( )
Adapted from: Stevens JA, Powell KE, Smith SM, Wingo PA, Sattin RW. Physical activity, functional limitations, and the risk of fall-related
fractures in community-dwelling elderly. Annals of Epidemiology 1997;7:54–61.
Smoking status
Never smoked ____ ( ) ____ ( )
Former smoker ____ ( ) ____ ( )
Current smoker ____ ( ) ____ ( )
Unknown ____ ( ) ____ ( )
Once Table shells 4.9a and 4.9b have been filled in to describe
the characteristics of cases and controls in this study, we are
ready to refine the analysis by showing the variability of the data,
as assessed by statistical confidence intervals. Because of the
study design in this example (cases compared with controls), we
have chosen the odds ratio to measure the association (see Lesson
3). Table shell 4.9c illustrates a useful display for this information.
Age group (years) and descriptive label:
<1 infants
1–4 toddlers
5–14 adolescents
15–24 teens and young adults
25–44 adults
45–64 older adults
>65 elderly
• If you wish to calculate rates to illustrate the relative risk of adverse health events by these categories of risk factors, be sure that the intervals you choose for the classes of your data are the same as the intervals used in the readily available data that you will use as denominators. For example, to compute rates of infant mortality by maternal age, you must find data on the number of live births to women in each age group; in determining age groupings, consider what categories are used by the United States Census Bureau.
Table 4.10 Age Groupings Used for Different Conditions, as Reported in Surveillance Summaries, CDC,
2003
Overweight in Adults7: 18–24 years, 25–34, 35–44, 45–54, 55–64, 65–74, >75
Traumatic Brain Injury8: <4 years, 5–14, 15–19, 20–24, 25–34, 35–44, 45–64, >65
Pregnancy-Related Mortality9: <19 years, 20–24, 25–29, 30–34, 35–39, >40
HIV/AIDS10: <13 years, 13–14, 15–24, 25–34, 35–44, 45–54, 55–64, >65
Vaccine Adverse Events11: <1 year, 1–6, 7–17, 18–64, >65
22.1–48.3 11 11
48.4–53.3 11 22
53.4–58.7 12 34
58.8–73.3 10 44
Missing data 7 51
Data Source: U.S. Cancer Statistics Working Group. United States Cancer Statistics: 2002
Incidence and Mortality. Atlanta: U.S. Department of Health and Human Services, Centers
for Disease Control and Prevention and National Cancer Institute; 2005.
If you then select the obvious lower limit for each upper limit, you
have the six intervals:
Interval 1 = 19–30
Interval 2 = 31–40
Interval 3 = 41–50
Interval 4 = 51–60
Interval 5 = 61–70
Interval 6 = 71–82
You can create three or four intervals by combining some of the
adjacent intervals.
Find the range of the values in your data set. That is, find the
difference between the maximum value (or some slightly larger
convenient value) and zero (or the minimum value).
Find what size of class interval to use by dividing the range by the
number of class intervals you have decided on.
Begin with the minimum value as the lower limit of your first
interval and specify class intervals of whatever size you calculated
until you reach the maximum value in your data.
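The three steps above (find the range, divide by the number of intervals, step up from the minimum) can be sketched in Python. This is a minimal, standard-library example, assuming four intervals and the state lung cancer death rates listed in Table 4.13; the function names are illustrative. The steps leave open how to handle a value that falls exactly on a boundary, so the code places it in the lower interval as one possible convention.

```python
# Age-adjusted lung cancer death rates per 100,000, in rank order, from Table 4.13.
rates = [
    116.1, 111.7, 104.1, 103.4, 100.8, 99.2, 99.1, 94.6, 93.2, 92.4,
    91.6, 89.4, 88.5, 85.6, 83.0, 80.2, 80.0, 79.3, 79.2, 78.7,
    78.2, 77.9, 77.0, 76.7, 76.5, 75.3, 74.5, 73.6, 72.9, 72.7,
    71.2, 71.2, 71.2, 70.2, 68.1, 67.0, 66.5, 66.4, 66.2, 65.6,
    64.9, 64.4, 62.0, 60.7, 60.1, 59.7, 52.3, 52.1, 49.8, 39.7,
]


def equal_width_cutpoints(values, k, decimals=1):
    """Boundaries of k equal-width class intervals from min(values) to max(values).

    range = max - min; width = range / k; step up from the minimum.
    Rounding to the data's precision removes floating-point noise.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [round(lo + i * width, decimals) for i in range(k + 1)]


def interval_index(x, cuts):
    """0-based interval for x. A value equal to an interior boundary goes into the
    lower interval; this is one convention, so apply whichever you choose consistently."""
    for i in range(1, len(cuts)):
        if x <= cuts[i]:
            return i - 1
    raise ValueError(f"{x} is above the top boundary {cuts[-1]}")


cuts = equal_width_cutpoints(rates, 4)
counts = [0] * (len(cuts) - 1)
for r in rates:
    counts[interval_index(r, cuts)] += 1

print(cuts)    # boundaries, lowest to highest
print(counts)  # number of states in each interval, lowest to highest
```

With these rates the boundaries are 39.7, 58.8, 77.9, 97.0, and 116.1, and the interval counts (lowest to highest) are 4, 25, 14, and 7 states, which agrees with the equal-width categories in the worked answer for Table 4.13.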
40.0–44.9 3 3
45.0–49.9 18 21
50.0–54.9 25 46
55.0–59.9 5 51
60.0–64.9 1 52
Data Source: Behavioral Risk Factor Surveillance System [Internet]. Atlanta: Centers for
Disease Control and Prevention. Available from: http://www.cdc.gov/brfss/.
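In the table fragment above, the third column appears to be a running total (cumulative frequency) of the second. A short Python sketch, assuming that reading of the columns, reproduces it with `itertools.accumulate` and adds a cumulative percent, which is what a cumulative frequency graph plots:

```python
from itertools import accumulate

# Class intervals and frequencies from the table fragment above.
intervals = ["40.0–44.9", "45.0–49.9", "50.0–54.9", "55.0–59.9", "60.0–64.9"]
freqs = [3, 18, 25, 5, 1]

cumulative = list(accumulate(freqs))  # running total: 3, 21, 46, 51, 52
total = cumulative[-1]
# Percent of all observations at or below the upper limit of each interval.
cum_pct = [round(100 * c / total, 1) for c in cumulative]

for label, f, c, p in zip(intervals, freqs, cumulative, cum_pct):
    print(f"{label:<10}{f:>4}{c:>5}{p:>7.1f}")
```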
Use each strategy to create four class interval categories by using the lung cancer mortality rates shown in Table
4.13.
Table 4.13 Age-adjusted Lung Cancer Death Rates per 100,000 population, in Rank
Order by State — United States, 2000
Rank | State | Rate per 100,000 | Rank | State | Rate per 100,000
1 Kentucky 116.1 26 Florida 75.3
2 Mississippi 111.7 27 Kansas 74.5
3 West Virginia 104.1 28 Massachusetts 73.6
4 Tennessee 103.4 29 Alaska 72.9
5 Alabama 100.8 30 Oregon 72.7
6 Louisiana 99.2 31 New Hampshire 71.2
7 Arkansas 99.1 32 New Jersey 71.2
8 North Carolina 94.6 33 Washington 71.2
9 Georgia 93.2 34 Vermont 70.2
10 South Carolina 92.4 35 South Dakota 68.1
11 Indiana 91.6 36 Wisconsin 67.0
12 Oklahoma 89.4 37 Montana 66.5
13 Missouri 88.5 38 Connecticut 66.4
14 Ohio 85.6 39 New York 66.2
15 Virginia 83.0 40 Nebraska 65.6
16 Maine 80.2 41 North Dakota 64.9
17 Illinois 80.0 42 Wyoming 64.4
18 Texas 79.3 43 Arizona 62.0
19 Maryland 79.2 44 Minnesota 60.7
20 Nevada 78.7 45 California 60.1
21 Delaware 78.2 46 Idaho 59.7
22 Rhode Island 77.9 47 New Mexico 52.3
23 Iowa 77.0 48 Colorado 52.1
24 Michigan 76.7 49 Hawaii 49.8
25 Pennsylvania 76.5 50 Utah 39.7
Total United States 76.9
Data Source: Stewart SL, King JB, Thompson TD, Friedman C, Wingo PA. Cancer Mortality–United States,
1990-2000. In: Surveillance Summaries, June 4, 2004. MMWR 2004;53 (No. SS-3):23–30.
50 states / 4 = 12.5 states per group. Because states can’t be cut in half, use two groups of 12 states and two
groups of 13 states. Missouri (#13) could go into either the first or second group and Connecticut (#38) could
go into either third or fourth group. Arbitrarily putting Missouri in the second category and Connecticut into the
third results in the following groups:
a. Kentucky through Oklahoma (States 1–12)
b. Missouri through Pennsylvania (States 13–25)
c. Florida through Connecticut (States 26–38)
d. New York through Utah (States 39–50)
2. Identify the rate for the first and last state in each group:
a. Oklahoma through Kentucky 89.4–116.1
b. Pennsylvania through Missouri 76.5–88.5
c. Connecticut through Florida 66.4–75.3
d. Utah through New York 39.7–66.2
3. Adjust the limits of each interval so no gap exists between the end of one class interval and beginning of the
next. Deciding how to adjust the limits is somewhat arbitrary — you could split the difference, or use a
convenient round number.
a. Oklahoma through Kentucky 89.0–116.1
b. Pennsylvania through Missouri 76.0–88.9
c. Connecticut through Florida 66.3–75.9
d. Utah through New York 39.7–66.2
3. Select the lower limit for each upper limit to define four full intervals. Specify the states that fall into each
interval. (Note: To place the states with the highest rates first, reverse the order of the intervals):
a. North Carolina through Kentucky (8 states) 93.3–116.1
b. Rhode Island through Georgia (14 states) 77.1–93.2
c. Arizona through Iowa (21 states) 61.1–77.0
d. Utah through Minnesota (7 states) 39.7–61.0
6. Final categories:
a. Arkansas through Kentucky (7 states) 97.1–116.1
b. Delaware through North Carolina (14 states) 78.0–97.0
c. Idaho through Rhode Island (25 states) 58.9–77.9
d. Utah through New Mexico (4 states) 39.7–58.8
7. Alternatively, since 19.1 is close to 20, intervals 20 units wide might be used to create four categories that
look cleaner. For example, the final categories could look like:
a. Arkansas through Kentucky (7 states) 97.0–116.9
b. Iowa through North Carolina (16 states) 77.0–96.9
c. Idaho through Michigan (23 states) 57.0–76.9
d. Utah through New Mexico (4 states) 37.0–56.9
OR
a. Alabama through Kentucky (5 states) 100.0–119.9
b. Illinois through Louisiana (12 states) 80.0–99.9
c. California through Texas (28 states) 60.0–79.9
d. Utah through Idaho (5 states) 39.7–59.9
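The first strategy in the answer above (equal numbers of states per group) amounts to splitting a ranked list into consecutive groups of chosen sizes. A minimal Python sketch using the Table 4.13 data follows; the group sizes [12, 13, 13, 12] mirror the worked answer's arbitrary choice of giving the extra states to the two middle groups, and the helper name `split_ranked` is illustrative.

```python
# States and age-adjusted lung cancer death rates in rank order, from Table 4.13.
states = [
    "Kentucky", "Mississippi", "West Virginia", "Tennessee", "Alabama",
    "Louisiana", "Arkansas", "North Carolina", "Georgia", "South Carolina",
    "Indiana", "Oklahoma", "Missouri", "Ohio", "Virginia",
    "Maine", "Illinois", "Texas", "Maryland", "Nevada",
    "Delaware", "Rhode Island", "Iowa", "Michigan", "Pennsylvania",
    "Florida", "Kansas", "Massachusetts", "Alaska", "Oregon",
    "New Hampshire", "New Jersey", "Washington", "Vermont", "South Dakota",
    "Wisconsin", "Montana", "Connecticut", "New York", "Nebraska",
    "North Dakota", "Wyoming", "Arizona", "Minnesota", "California",
    "Idaho", "New Mexico", "Colorado", "Hawaii", "Utah",
]
rates = [
    116.1, 111.7, 104.1, 103.4, 100.8, 99.2, 99.1, 94.6, 93.2, 92.4,
    91.6, 89.4, 88.5, 85.6, 83.0, 80.2, 80.0, 79.3, 79.2, 78.7,
    78.2, 77.9, 77.0, 76.7, 76.5, 75.3, 74.5, 73.6, 72.9, 72.7,
    71.2, 71.2, 71.2, 70.2, 68.1, 67.0, 66.5, 66.4, 66.2, 65.6,
    64.9, 64.4, 62.0, 60.7, 60.1, 59.7, 52.3, 52.1, 49.8, 39.7,
]


def split_ranked(items, sizes):
    """Split an already-ranked list into consecutive groups of the given sizes."""
    if sum(sizes) != len(items):
        raise ValueError("group sizes must add up to the number of items")
    groups, start = [], 0
    for size in sizes:
        groups.append(items[start:start + size])
        start += size
    return groups


# 50 / 4 = 12.5 states per group; following the worked answer, the two middle
# groups get 13 states and the outer groups get 12 (an arbitrary choice).
groups = split_ranked(list(zip(states, rates)), [12, 13, 13, 12])

for label, group in zip("abcd", groups):
    low = min(rate for _, rate in group)
    high = max(rate for _, rate in group)
    print(f"{label}. {group[0][0]} through {group[-1][0]} ({len(group)} states) {low}–{high}")
```

Each printed line gives the first and last state in a group and its lowest and highest rate, matching groups a–d and the rate ranges in steps 1 and 2 of the answer; closing the gaps between intervals (step 3) remains a judgment call.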
Scenario: Table 4.14 shows the number of measles cases by year of report from 1950 to 2003. The number of
measles cases in years 1950 through 1954 has been plotted in Figure 4.1, below. The independent variable, years,
is shown on the horizontal axis. The dependent variable, number of cases, is shown on the vertical axis. A grid is
included in Figure 4.1 to illustrate how points are plotted. For example, to plot the point on the graph for the
number of cases in 1953, draw a line up from 1953, and then draw a line from 449,000 cases to the right. The point
where these lines intersect is the point for 1953 on the graph.
Your Turn: Use the data in Table 4.14 to plot the points for 1955 to 1959 and complete the graph in Figure 4.1.
Figure 4.1 Partial Graph of Measles by Year of Report — United States, 1950–1959
Table 4.14 Number of Reported Measles Cases, by Year of Report — United States, 1950–2003
Year Cases Year Cases Year Cases
1950 319,000 1970 47,351 1990 27,786
1951 530,000 1971 75,290 1991 9,643
1952 683,000 1972 32,275 1992 2,237
1953 449,000 1973 26,690 1993 312
1954 683,000 1974 22,094 1994 963
1955 555,000 1975 24,374 1995 309
1956 612,000 1976 41,126 1996 508
1957 487,000 1977 57,345 1997 138
1958 763,000 1978 26,871 1998 100
1959 406,000 1979 13,597 1999 100
1960 442,000 1980 13,506 2000 86
1961 424,000 1981 3,124 2001 116
1962 482,000 1982 1,714 2002 44
1963 385,000 1983 1,497 2003 56
1964 458,000 1984 2,587
1965 262,000 1985 2,822
1966 204,000 1986 6,282
1967 62,705 1987 3,655
1968 22,231 1988 3,396
1969 25,826 1989 18,193
Data Sources: Centers for Disease Control and Prevention. Summary of notifiable diseases–United States, 1989. MMWR
1989;38(No. 54).
Centers for Disease Control and Prevention. Summary of notifiable diseases–United States, 2002. Published April 30, 2004, for MMWR 2002;51(No. 53).
Centers for Disease Control and Prevention. Summary of notifiable diseases–United States, 2003. Published April 22, 2005, for MMWR 2003;52(No. 54).
Source: Honein MA, Paulozzi LJ, Mathews TJ, Erickson JD, Wong L-Y. Impact of folic acid
fortification of the US food supply on the occurrence of neural tube defects. JAMA
2001;285:2981–6.
Figure 4.3 shows another example of an arithmetic-scale line
graph. Here the y-axis is a calculated variable: the median age at
death of people with Down's syndrome, by year of death from 1983
to 1997. Here also, we see the value of showing two data series on
one graph; we can compare the median age at death of males and
females.
Source: Yang Q, Rasmussen A, Friedman JM. Mortality associated with Down’s syndrome in
the USA from 1983 to 1997: a population-based study. Lancet 2002;359:1019–25.
When you create an arithmetic-scale line graph, you need to select a scale for the x- and y-axes. The scale should
reflect both the data and the point of the graph. For example, if you use the data in Table 4.14 to graph the number
of measles cases by year from 1990 to 2002, then the x-axis will most likely be scaled by year of report,
because that is how the data are available. Consider, however, if you had line-listed data with the actual dates of
onset or report that spanned several years. You might prefer to plot these data by week, month, quarter, or even
year, depending on the point you wish to make.
The following steps are recommended for creating a scale for the y-axis.
• Make the length of the y-axis shorter than the x-axis so that your graph is horizontal or “landscape.” A 5:3 ratio
is often recommended for the length of the x-axis to y-axis.
• Always start the y-axis with 0. While this recommendation is not followed in all fields, it is the standard practice
in epidemiology.
• Determine the range of values you need to show on the y-axis by identifying the largest value you need to graph
on the y-axis and rounding that figure up to a slightly larger, convenient number. For example, the largest y-value in Figure
4.3 is 49 years in 1997, so the scale on the y-axis goes up to 50. If median age continues to increase and
exceeds 50 in future years, a future graph will have to extend the scale on the y-axis to 60 years.
• Space the tick marks and their labels to describe the data in sufficient detail for your purposes. In Figure 4.3, five
intervals of 10 years each were considered adequate to give the reader a good sense of the data points and
pattern.
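The y-axis guidance above (start at 0, round the largest value up, choose a sensible tick interval, keep a 5:3 landscape shape) can be expressed as a small Python helper. The "nice number" rule used here, tick intervals of 1, 2, 2.5, 5, or 10 times a power of ten, is a common charting heuristic and an assumption of this sketch, not a rule stated in this lesson; the function names are illustrative.

```python
import math


def nice_step(raw_step):
    """Round a raw tick interval UP to 1, 2, 2.5, 5, or 10 times a power of ten.

    This 'nice number' rule is a common charting heuristic (an assumption here).
    """
    magnitude = 10 ** math.floor(math.log10(raw_step))
    for multiple in (1, 2, 2.5, 5):
        if raw_step <= multiple * magnitude:
            return multiple * magnitude
    return 10 * magnitude


def y_axis_ticks(max_value, target_intervals=5):
    """Tick values for an arithmetic-scale y-axis that starts at 0 and ends at the
    first tick at or above the largest value to be graphed."""
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    step = nice_step(max_value / target_intervals)
    n_intervals = math.ceil(max_value / step)
    return [i * step for i in range(n_intervals + 1)]


def y_axis_length(x_axis_length, x_to_y=(5, 3)):
    """Y-axis length for a landscape graph with the given x:y ratio (5:3 by default)."""
    return x_axis_length * x_to_y[1] / x_to_y[0]


print(y_axis_ticks(49))       # Figure 4.3: largest value 49 years
print(y_axis_ticks(763_000))  # measles, 1950–1959: largest value 763,000 (1958)
print(y_axis_length(10))      # a 10-unit x-axis calls for a 6-unit y-axis
```

For the Down's syndrome example, `y_axis_ticks(49)` gives 0, 10, 20, 30, 40, 50, the five 10-year intervals described for Figure 4.3; for the 1950–1959 measles counts it gives 0 to 800,000 in steps of 200,000.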
1. Construct an arithmetic-scale line graph of rate by year. Use intervals on the y-axis that are
appropriate for the range of data you are graphing.
2. Construct a separate arithmetic-scale line graph of the measles rates from 1985 to 2002.
Use intervals on the y-axis that are appropriate for the range of data you are graphing.
Table 4.15 Rate (per 100,000 Population) of Reported Measles Cases by Year of Report — United
States, 1955–2002
Year | Rate per 100,000 | Year | Rate per 100,000 | Year | Rate per 100,000
Data Sources: Centers for Disease Control. Summary of notifiable diseases–United States, 1989. MMWR 1989;38(No. 54).
Centers for Disease Control and Prevention. Summary of notifiable diseases–United States, 2002. Published April 30, 2004 for
MMWR 2002;51(No. 53).
Source: Centers for Disease Control and Prevention. Summary of notifiable diseases–United
States, 2003. Published April 22, 2005, for MMWR 2003;52(No. 54):54.
Adapted from: Kochanek KD, Murphy SL, Anderson RN, Scott C. Deaths: final data for
2002. National vital statistics report; vol 53, no 5. Hyattsville, Maryland: National Center for
Health Statistics, 2004. p. 9.
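Year | Constant Increase of 100,000 per Year | Percent Increase | Constant Increase of 10% per Year | Percent Increase
0 1,000,000 1,000,000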
1 1,100,000 10.0% 1,100,000 10.0%
2 1,200,000 9.1% 1,210,000 10.0%
3 1,300,000 8.3% 1,331,000 10.0%
4 1,400,000 7.7% 1,464,100 10.0%
5 1,500,000 7.1% 1,610,510 10.0%
6 1,600,000 6.7% 1,771,561 10.0%
7 1,700,000 6.3% 1,948,717 10.0%
8 1,800,000 5.9% 2,143,589 10.0%
9 1,900,000 5.6% 2,357,948 10.0%
10 2,000,000 5.3% 2,593,742 10.0%
11 2,100,000 5.0% 2,853,117 10.0%
12 2,200,000 4.8% 3,138,428 10.0%
13 2,300,000 4.5% 3,452,271 10.0%
14 2,400,000 4.3% 3,797,498 10.0%
15 2,500,000 4.2% 4,177,248 10.0%
16 2,600,000 4.0% 4,594,973 10.0%
17 2,700,000 3.8% 5,054,470 10.0%
18 2,800,000 3.7% 5,559,917 10.0%
19 2,900,000 3.6% 6,115,909 10.0%
20 3,000,000 3.4% 6,727,500 10.0%
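The table above contrasts two ways a quantity can grow: by a constant amount (100,000 per year) and by a constant rate (10% per year). A minimal Python sketch regenerates both columns and their percent increases. It uses half-up rounding to match a printed table (Python's built-in `round()` rounds halves to even and would give 6.2% rather than 6.3% for year 7). The last lines show why the constant-rate series plots as a straight line on a semilogarithmic graph: its log10 values rise by the same amount every year.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(x, places="0.1"):
    """Round as in a printed table (halves round up), unlike Python's round-half-even."""
    return float(Decimal(str(x)).quantize(Decimal(places), rounding=ROUND_HALF_UP))


def arithmetic_series(start, increment, years):
    """Constant amount added each year (a straight line on an arithmetic scale)."""
    return [start + increment * t for t in range(years + 1)]


def geometric_series(start, rate, years):
    """Constant rate of growth, e.g. rate=0.10 for 10% (a straight line on a log scale)."""
    return [start * (1 + rate) ** t for t in range(years + 1)]


def pct_increase(series):
    """Percent change from the previous year; None for the first year."""
    return [None] + [100 * (b - a) / a for a, b in zip(series, series[1:])]


arith = arithmetic_series(1_000_000, 100_000, 20)
geom = geometric_series(1_000_000, 0.10, 20)
arith_pct, geom_pct = pct_increase(arith), pct_increase(geom)

for year in range(21):
    a_pct = "" if arith_pct[year] is None else f"{round_half_up(arith_pct[year])}%"
    g_pct = "" if geom_pct[year] is None else f"{round_half_up(geom_pct[year])}%"
    print(f"{year:>2} {arith[year]:>10,} {a_pct:>6} {round(geom[year]):>10,} {g_pct:>6}")

# On a semilogarithmic graph the vertical position is log10(value), so each
# 10% increase moves the line up by the same distance, log10(1.1).
log_steps = {round(math.log10(b) - math.log10(a), 6) for a, b in zip(geom, geom[1:])}
print(log_steps)
```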
To create a semilogarithmic
graph from a data set in
Analysis Module:
Data Source: Centers for Disease Control and Prevention. Prevalence of overweight and
obesity among adults with diagnosed diabetes–United States, 1988-1994 and 1999-2002.
MMWR 2004;53:1066–8.
Adapted from: Centers for Disease Control and Prevention. Changing patterns of
pneumoconiosis mortality–United States, 1968-2000. MMWR 2004;53:627–31.
Data Source: Centers for Disease Control and Prevention. Changing patterns of
pneumoconiosis mortality–United States, 1968-2000. MMWR 2004;53:627–31.
Source: U.S. Census Bureau [Internet]. Washington, DC: IDB Population Pyramids [cited
2004 Sep 10]. Available from: http://www.census.gov/population/international/.
Answer “yes” to both questions: “Do you now smoke cigarettes everyday or some days?”
and “Have you smoked at least 100 cigarettes in your entire life?”
Data Source: Centers for Disease Control and Prevention. Cigarette smoking among adults–
United States, 2002. MMWR 2004;53:427–31.
Frequency polygons
A frequency polygon, like a histogram, is the graph of a frequency
distribution. In a frequency polygon, the number of observations
within an interval is marked with a single point placed at the
midpoint of the interval. Each point is then connected to the next
with a straight line. Figure 4.13 shows an example of a frequency
polygon over the outline of a histogram for the same data. This
graph makes it easy to identify the peak of the epidemic (4 weeks).
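A frequency polygon's vertices can be computed directly from class intervals and frequencies: x is the interval midpoint and y is the count. The sketch below uses the class intervals and frequencies from the table fragment shown earlier (40.0–44.9 through 60.0–64.9). Adding a zero-frequency point one interval beyond each end, so the polygon meets the x-axis, is a common convention and an assumption here; pass `close_ends=False` to omit it.

```python
# Class intervals (lower, upper) and frequencies from the table fragment shown earlier.
intervals = [(40.0, 44.9), (45.0, 49.9), (50.0, 54.9), (55.0, 59.9), (60.0, 64.9)]
freqs = [3, 18, 25, 5, 1]


def polygon_points(intervals, freqs, close_ends=True):
    """(x, y) vertices of a frequency polygon: x = interval midpoint, y = frequency.

    With close_ends=True, a zero-frequency point is added one interval width beyond
    each end so the line returns to the x-axis (a common convention; an assumption here).
    """
    mids = [round((lo + hi) / 2, 2) for lo, hi in intervals]
    points = list(zip(mids, freqs))
    if close_ends and len(mids) > 1:
        width = mids[1] - mids[0]  # assumes equal-width intervals
        points = [(round(mids[0] - width, 2), 0)] + points + [(round(mids[-1] + width, 2), 0)]
    return points


pts = polygon_points(intervals, freqs)
for x, y in pts:
    print(f"{x:>6.2f} {y:>3}")

# The highest vertex marks the peak of the distribution.
peak_x, peak_y = max(pts, key=lambda p: p[1])
print(f"Peak: {peak_y} at midpoint {peak_x}")
```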
Figure 4.13 Comparison of Frequency Polygon and Histogram
Data Source: Centers for Disease Control and Prevention. Changing patterns of
pneumoconiosis mortality–United States, 1968-2000. MMWR 2004;53:627–31.
Source: Anderson RN. United States life tables, 1997. National vital statistics reports; vol
47, no. 28. Hyattsville, Maryland: National Center for Health Statistics, 1999.
Scatter diagrams
A scatter diagram (or “scattergram”) is a graph that portrays the
relationship between two continuous variables, with the x-axis
representing one variable and the y-axis representing the other.15
To create a scatter diagram, you must have a pair of values (one for
each variable) for each person, group, country, or other entity in
the data set. A point is placed on the graph where the two values
intersect. For example, demographers
may be interested in the relationship between infant mortality and
total fertility in various nations. Figure 4.19 plots the total fertility
rate (estimated average number of children per woman) by the
infant mortality rate in 194 countries, so this scatter diagram has
194 data points.
Data Source: Population Reference Bureau [Internet]. Datafinder [cited 2004 Dec 13].
Available from: http://www.prb.org/datafind/datafinder7.htm.
Bar charts
A bar chart uses bars of equal width to display comparative data.
Comparison of categories is based on the fact that the length of the
bar is proportional to the frequency of the event in that category.
Therefore, breaks in the scale can cause the data to be
misinterpreted and should not be used in bar charts. Bars for
different categories are separated by spaces (unlike the bars in a
histogram). The bars can be either vertical or horizontal; this
choice is usually based on the length of the text labels, because
long labels fit better on a horizontal chart than on a vertical one.
The bars are usually arranged in ascending or descending length, or
in some other systematic order dictated by any intrinsic order of
the categories. Appropriate data for bar charts include discrete
data (e.g., race or cause of death) or variables treated as though
they were discrete (e.g., age groups). (Recall that a histogram
shows the frequency of a continuous variable, such as dates of
onset of symptoms.)
• Arrange the categories that define the bars or groups of bars in a natural order, such as alphabetical or
increasing age, or in an order that will produce increasing or decreasing bar lengths.
• Choose whether to display the bars vertically or horizontally.
• Make all of the bars the same width.
• Make the length of bars in proportion to the frequency of the event. Do not use a scale break, because the
reader could easily misinterpret the relative size of different categories.
• Show no more than five bars within a group of bars, if possible.
• Leave a space between adjacent groups of bars but not between bars within a group (see Figure 4.22).
• Within a group, code different variables by differences in bar color, shading, cross hatching, etc. and include a
legend that interprets your code.
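The core rule for bar charts, that bar length is proportional to frequency with no scale break, is easy to see in a plain-text sketch. The Python function below (its name and parameters are illustrative) scales every bar from zero against the largest value and either keeps the categories' natural order or sorts them by length; the example uses the age-group counts from Table 4.1a, which have an intrinsic age order.

```python
# Reported P&S syphilis cases by age group, from Table 4.1a, in natural age order.
groups = ["<14", "15–19", "20–24", "25–29", "30–34", "35–39", "40–44", "45–54", "≥55"]
counts = [21, 351, 842, 895, 1097, 1367, 1023, 982, 284]


def text_bar_chart(categories, values, width=40, order=None):
    """Horizontal text bars; each bar starts at zero and its length is proportional
    to its value (no scale break). The longest bar is `width` characters.

    order=None keeps the given (natural) order; "desc" or "asc" sorts by value.
    """
    pairs = list(zip(categories, values))
    if order == "desc":
        pairs.sort(key=lambda p: p[1], reverse=True)
    elif order == "asc":
        pairs.sort(key=lambda p: p[1])
    largest = max(v for _, v in pairs)
    if largest <= 0:
        raise ValueError("need at least one positive value")
    label_width = max(len(c) for c, _ in pairs)
    return [
        f"{c:>{label_width}} | {'#' * round(width * v / largest)} {v:,}"
        for c, v in pairs
    ]


for line in text_bar_chart(groups, counts):
    print(line)
```

Because every bar starts at zero, the 35–39 bar is about 65 times as long as the <14 bar, just as 1,367 is about 65 times 21; a scale break would hide that ratio.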
Data Source: Web-based Injury Statistics Query and Reporting System (WISQARS) [online
database] Atlanta; National Center for Injury Prevention and Control. [cited 2006 Feb 15].
Available from: http://www.cdc.gov/injury/wisqars/.
Figure 4.21 Percentage of Persons Aged >18 Years Who Were Current
Smokers, by Age and Sex — United States, 2002
Data Source: Centers for Disease Control and Prevention. Cigarette smoking among adults–
United States, 2002. MMWR 2004;53:427–31.
The bar chart in Figure 4.22a shows the leading causes of death in
1997 and 2003 among persons aged 25–34 years. The graph is
more effective at showing the differences in causes of death during
the same year than in showing differences in a single cause
between years. While the decline in deaths due to HIV infection
between 1997 and 2003 is quite apparent, the smaller drop in heart
disease is more difficult to see. If the goal of the figure is to
compare specific causes between the two years, the bar chart in
Figure 4.22b is a better choice.
Data Source: Web-based Injury Statistics Query and Reporting System (WISQARS) [online database] Atlanta; National Center for
Injury Prevention and Control. [cited 2006 Feb 15]. Available from: http://www.cdc.gov/injury/wisqars/.
Figure 4.22b Number of Deaths by Cause Among 25–34 Year Olds — United States, 1997 and 2003
Data Source: Web-based Injury Statistics Query and Reporting System (WISQARS) [online database] Atlanta; National Center for
Injury Prevention and Control. [cited 2006 Feb 15]. Available from: http://www.cdc.gov/injury/wisqars/.
To see the difference between grouped and stacked bar charts, look
at Figure 4.23. This figure shows the same data as Figures 4.22a
and 4.22b. With the stacked bar chart, you can easily see the
change in the total number of deaths between the two years;
however, it is difficult to see the values of each cause of death. On
the other hand, with the grouped bar chart, you can more easily see
the changes by cause of death.
Figure 4.23 Number of Deaths by Cause Among 25–34 Year Olds — United States, 1997 and 2003
Data Source: Web-based Injury Statistics Query and Reporting System (WISQARS) [online database] Atlanta; National Center for
Injury Prevention and Control. [cited 2006 Feb 15]. Available from: http://www.cdc.gov/injury/wisqars/.
Source: Langlois JA, Kegler SR, Butler JA, Gotsch KE, Johnson RL, Reichard AA, et al.
Traumatic brain injury-related hospital discharges: results from a 14-state surveillance
system. In: Surveillance Summaries, June 27, 2003. MMWR 2003;52(No. SS-04):1–18.
Source: Centers for Disease Control and Prevention. Figure 1. Selected notifiable disease
reports, United States, comparison of provisional 4-week totals ending December 11, 2004,
with historical data. MMWR 2004;53:1161.
Table 4.17 Number of Reported Cases of Primary and Secondary Syphilis, by Age Group, Among Non-
Hispanic Black and White Men and Women — United States, 2002
Age Group (years) | Black Men | White Men | Black Women | White Women
Data Source: Web-based Injury Statistics Query and Reporting System (WISQARS) [online
database] Atlanta; National Center for Injury Prevention and Control. [cited 2006 Feb 15].
Available from: http://www.cdc.gov/injury/wisqars/.
Data Source: Web-based Injury Statistics Query and Reporting System (WISQARS) [online database] Atlanta; National Center for
Injury Prevention and Control. [cited 2006 Feb 15]. Available from: http://www.cdc.gov/injury/wisqars/.
Figure 4.28b Number of Deaths by Cause Among 25–34 and 35–44 Year Olds — United States, 2003
Data Source: Web-based Injury Statistics Query and Reporting System (WISQARS) [online database] Atlanta; National Center for
Injury Prevention and Control. [cited 2006 Feb 15]. Available from: http://www.cdc.gov/injury/wisqars/.
Source: Luby SP, Agboatwalla M, Painter J, Altaf A, Billhimer WL, Hoekstra RM. Effect of
intensive handwashing promotion on childhood diarrhea in high-risk communities in
Pakistan: a randomized controlled trial. JAMA 2004;291:2547–54.
Adapted from: Kern P, Ammon A, Kron M, Sinn G, Sander S, Petersen LR, et al. Risk factors
for alveolar echinococcosis in humans. Emerg Infect Dis 2004;10:2089-93.
Forest plots
A forest plot, also called a confidence interval plot, is used to
display the point estimates and confidence intervals of individual
studies assembled for a meta-analysis or systematic review.19 In
the forest plot, the variable on the x-axis is the primary outcome
measure from each study (relative risk, treatment effect, etc.). If a
risk ratio, odds ratio, or another ratio measure is used, the x-axis
uses a logarithmic scale. This is because the logarithmic
transformation of these risk estimates has a more symmetric
distribution than do the risk estimates themselves: the estimates
can vary from zero to an arbitrarily large number, whereas on a log
scale a ratio of 2 and a ratio of 0.5 lie the same distance from 1.
Each study is represented by a horizontal line, reflecting the
confidence interval, and a dot or square marking the point
estimate; the size of the dot or square usually reflects study size
or some other aspect of study design (Figure 4.31). The shorter the
horizontal line, the more precise the study’s estimate. Point
estimates (dots or squares) that line up reasonably well indicate
that the studies show a relatively consistent effect. A vertical line
indicates where no effect (relative risk = 1 or treatment effect = 0)
falls on the x-axis. If a study’s horizontal line does not cross the
vertical line, that study’s result is statistically significant at the
level matching the confidence interval (e.g., p < 0.05 for a 95%
interval). From a forest plot, one can easily ascertain patterns
among studies as well as outliers.
Figure 4.31 Net Change in Glycohemoglobin (GHb) Following Self-management Education Intervention for Adults with Type 2 Diabetes, by Different Studies and Follow-up Intervals, 1980–1999
Source: Norris SL, Lau J, Smith SJ, Schmid CH, Engelgau MM. Self-management education
for adults with type 2 diabetes. Diabetes Care 2002;25:1159–71.
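To illustrate why ratio measures are plotted on a logarithmic axis, the sketch below computes an odds ratio and its 95% confidence interval from a single 2×2 table using the Woolf (log-based) method. This is one common approach, not necessarily the method used by any study cited in this lesson; the counts are hypothetical and the helper names are illustrative. The interval is built symmetrically around log(OR) and then exponentiated, so on a log-scale x-axis the point estimate sits at the center of its horizontal line.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-based) confidence interval for a 2x2 table.

    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases (all assumed > 0).
    """
    log_or = math.log((a * d) / (b * c))
    # Approximate standard error of log(OR)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return math.exp(log_or), lower, upper


def excludes_null(lower, upper, null=1.0):
    """True if the interval lies entirely on one side of the no-effect value."""
    return upper < null or lower > null


# Hypothetical study: 20 exposed cases, 80 exposed non-cases,
# 10 unexposed cases, 90 unexposed non-cases
or_, lo, hi = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}, excludes 1: {excludes_null(lo, hi)}")

# On a log axis the point estimate is equidistant from both limits
print(math.isclose(math.log(or_) - math.log(lo), math.log(hi) - math.log(or_)))
```

With these hypothetical counts the lower limit falls just below 1, so on a forest plot this study's horizontal line would cross the vertical no-effect line.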
Phylogenetic trees
A phylogenetic tree, a type of dendrogram, is a branching chart
that indicates the evolutionary lineage or genetic relatedness of
organisms involved in outbreaks of illness. Distance on the tree
reflects genetic differences, so organisms that are close to one
another on the tree are more closely related than organisms that
are farther apart. The phylogenetic tree in Figure 4.32 shows that
the organisms isolated from patients with restaurant-associated
hepatitis A in Georgia and North Carolina were identical and
closely related to those from patients in Tennessee.20 Furthermore,
these organisms were similar to those typically seen in patients
from Mexico. These microbiologic data supported epidemiologic
data that implicated green onions from Mexico.
Source: Amon JJ, Devasia R, Guoliang X, Vaughan G, Gabel J, MacDonald P, et al. Multiple
hepatitis A outbreaks associated with green onions among restaurant patrons–Tennessee,
Georgia, and North Carolina, 2003. Presented at 53rd Annual Epidemic Intelligence Service
Conference, April 19-23, 2004, Atlanta, Georgia.
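Phylogenetic trees are typically built from measures of genetic difference between aligned sequences. As a minimal sketch (the text does not describe the method used in the hepatitis A investigation), the code below computes the simplest such measure, the p-distance: the proportion of aligned positions at which two sequences differ. The four short sequences are hypothetical. Tree-building methods such as UPGMA or neighbor joining would then use a matrix of these distances to place the most similar isolates closest together.

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for x, y in zip(seq1, seq2) if x != y)
    return mismatches / len(seq1)


# Hypothetical aligned fragments (not real hepatitis A sequences)
isolates = {
    "Isolate A": "ACGTACGTAC",
    "Isolate B": "ACGTACGTAC",
    "Isolate C": "ACGTACGTTC",
    "Isolate D": "TCGAACGGTA",
}

# Pairwise distances: 0.0 means identical; larger values mean less related
for (name1, s1), (name2, s2) in combinations(isolates.items(), 2):
    print(f"{name1} vs {name2}: {p_distance(s1, s2):.1f}")
```

Isolates A and B are identical (distance 0.0), C differs from them at a single position, and D is far from all three, so on a tree A and B would share a branch, C would sit next to them, and D would be placed farther away.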
Decision trees
A decision tree is a branching chart that represents the logical
sequence or pathway of a clinical or public health decision.21
Decision analysis is a systematic method for making decisions
when outcomes are uncertain. The basic building blocks of a
decision analysis are (1) decisions, (2) outcomes, and (3)
probabilities.
Source: Tyagi A, Morris J. Using decision analytic methods to assess the utility of family
history tools. Am J Prev Med 2003;24:199–207.
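The three building blocks (decisions, outcomes, and probabilities) combine in a simple "rollback" calculation: each chance node is replaced by the probability-weighted average of its outcomes, and each decision node takes the option with the highest expected value. The sketch below assumes that every outcome is expressed on a single numeric scale where higher is better; the tree representation, option names, probabilities, and outcome values are all hypothetical and are not taken from the cited study.

```python
def expected_value(node):
    """Roll back a decision tree and return the expected value of node.

    A node is one of (a hypothetical representation):
      ("outcome", value)                      -- terminal outcome
      ("chance", [(probability, child), ...]) -- probabilities must sum to 1
      ("decision", {option: child, ...})      -- the best option is chosen
    """
    kind, content = node
    if kind == "outcome":
        return content
    if kind == "chance":
        if abs(sum(p for p, _ in content) - 1.0) > 1e-9:
            raise ValueError("probabilities at a chance node must sum to 1")
        return sum(p * expected_value(child) for p, child in content)
    if kind == "decision":
        return max(expected_value(child) for child in content.values())
    raise ValueError(f"unknown node type: {kind!r}")


def best_option(decision_node):
    """Label of the option with the highest expected value."""
    _, options = decision_node
    return max(options, key=lambda label: expected_value(options[label]))


# Hypothetical tree; outcomes scored from 0 (worst) to 1 (best)
tree = ("decision", {
    "Screen": ("chance", [
        (0.10, ("outcome", 0.90)),   # condition present, detected early
        (0.90, ("outcome", 1.00)),   # condition absent
    ]),
    "Do not screen": ("chance", [
        (0.10, ("outcome", 0.60)),   # condition present, detected late
        (0.90, ("outcome", 1.00)),   # condition absent
    ]),
})

for option, subtree in tree[1].items():
    print(f"{option}: expected value {expected_value(subtree):.3f}")
print("Preferred option:", best_option(tree))
```

In a real analysis the probabilities would come from the literature or surveillance data, and sensitivity analysis would vary them to see whether the preferred option changes.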
Maps
Maps are used to show the geographic location of events or
attributes. Two types of maps commonly used in field
epidemiology are spot maps and area maps. Spot maps use dots or
other symbols to show where each case-patient lived or was
exposed. Figure 4.34 is a spot map of the residences of persons
with West Nile virus encephalitis during the outbreak in the New
York City area in 1999. A spot map is useful for showing the
geographic distribution of cases, but because it does not take the
size of the population at risk into account, a spot map does not
show risk of disease. Even when a spot map shows a large number
of dots in the same area, the risk of acquiring disease may not be
particularly high if that area is densely populated.
EpiMap is an application of Epi Info for creating maps and
overlaying survey data, and is available for download.
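A short sketch shows why a spot map alone cannot show risk: dividing each area's case count by its population at risk can reverse the impression given by the number of dots. The two areas and their counts are hypothetical.

```python
def rate_per_100k(cases, population):
    """Cases per 100,000 population."""
    return cases * 100_000 / population


# Hypothetical areas: many dots in a dense area, few in a sparse one
areas = {
    "Densely populated district": (40, 2_000_000),
    "Sparsely populated county": (5, 50_000),
}

for name, (cases, population) in areas.items():
    print(f"{name}: {cases} cases, "
          f"{rate_per_100k(cases, population):.1f} per 100,000")
```

The district would show eight times as many dots as the county, yet its rate (2.0 per 100,000) is one-fifth of the county's (10.0 per 100,000). An area map of rates, rather than a spot map of cases, conveys that difference.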
• Excellent examples of the use of maps to display public health data are
available in these selected publications:
Source: Centers for Disease Control and Prevention. Changing patterns of pneumoconiosis
mortality–United States, 1968-2000. MMWR 2004;53:627–31.
A geographic information system (GIS) is a computer system for the input, editing, storage, retrieval, analysis, synthesis,
and output of location-based information.22 In public health, a GIS may incorporate the geographic distribution of cases or
risk factors, health service availability or utilization, presence of insect vectors, environmental factors, and other
location-based variables. A GIS can be particularly effective when layers of information, or different types of
information about place, are combined to identify or clarify geographic relationships. For example, in Figure 4.36,
human West Nile virus cases are shown as dots superimposed over areas of high crow mortality within the
Chicago city limits.
Source: Watson JT, Jones RC, Gibbs K, Paul W. Dead crow reports and location of human West
Nile virus cases, Chicago, 2002. Emerg Infect Dis 2004;10:938–40.
Many software packages are available for producing all the tables and charts discussed in this chapter. One
particularly helpful package is R, used by universities and available for no charge around the world. In addition to
graphical techniques, R provides a wide variety of statistical techniques (including linear and nonlinear modeling,
classical statistical tests, time-series analysis, classification, and clustering).
On the other hand, these packages tend to have default values that differ from standard epidemiologic practice. Do
not let the software package dictate the appearance of the graph. Remember the adage: let the computer do the work,
but you still must do the thinking.29 Keep in mind the primary purpose of the graph: to communicate information to
others. For example, many packages can draw bar charts and pie charts that appear three-dimensional. Will a
three-dimensional chart communicate the information better than a two-dimensional one?
Compare and contrast the effectiveness of Figures 4.37a and 4.37b in communicating information.
Figure 4.37a Past Month Marijuana Use Among Youths Aged 12–17, by Geographic Region — United States, 2003 and 2004
Data Source: Substance Abuse and Mental Health Services Administration. (2005). Results
from the 2004 National Survey on Drug Use and Health: National Findings (Office of
Applied Studies, NSDUH Series H-28, DHHS Publication No. SMA 05-4062). Rockville, MD.
Many people misuse technology in selecting color, particularly for slides that accompany oral presentations.32 If you
use colors, follow these recommendations.
• Select colors so that all components of the graph — title, axes, data plots, and legends — stand out clearly from
the background and each plotted series of data can be distinguished from the others.
• Avoid contrasting red and green, because up to 10% of males in the audience may have some degree of color
blindness.
• Use colors or shades to communicate information, particularly with area maps. For example, for an area map in
which states are divided into four groups according to their rates for a particular disease, use a light color or
shade for the states with the lowest rates and use progressively darker colors or shades for the groups with
progressively higher rates. In this way, the colors or shades contribute directly to the impression you want the
viewer to have about the data.
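The light-to-dark recommendation for area maps can be automated. The sketch below assigns each rate to a class interval and then to a shade interpolated between a light and a dark color of the same hue, so a darker shade always means a higher rate. The class limits (65.0 and 90.0) and the three state rates are taken from the state groupings in the Exercise 4.2 answer; the two endpoint colors and the function names are arbitrary illustrative choices, and curated palettes such as those from ColorBrewer are a more polished alternative.

```python
from bisect import bisect_left


def class_index(rate, upper_limits):
    """0-based class for rate, given inclusive upper limits of all but the top class."""
    return bisect_left(upper_limits, rate)


def sequential_shades(n, light=(255, 245, 235), dark=(127, 39, 4)):
    """n RGB colors running from light to dark by linear interpolation."""
    if n == 1:
        return [light]
    return [
        tuple(round(lo + (hi - lo) * i / (n - 1)) for lo, hi in zip(light, dark))
        for i in range(n)
    ]


def to_hex(rgb):
    """Format an (r, g, b) tuple as a hex color string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


upper_limits = [65.0, 90.0]   # classes: up to 65.0, 65.1-90.0, 90.1 and above
shades = sequential_shades(len(upper_limits) + 1)

for state, rate in [("Utah", 39.7), ("Illinois", 80.0), ("Kentucky", 116.1)]:
    k = class_index(rate, upper_limits)
    print(f"{state}: rate {rate}, class {k + 1}, color {to_hex(shades[k])}")
```

Because the shades differ only in lightness, the map also remains readable for viewers with red-green color blindness and when printed in grayscale.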
Data Source: Web-based Injury Statistics Query and Reporting System (WISQARS) [online
database] Atlanta; National Center for Injury Prevention and Control. [cited 2006 Feb 15].
Available from: http://www.cdc.gov/injury/wisqars/.
Much work has been done on other graphical methods of presentation.33 One of the more
creative is face plots.34 Originally developed by Chernoff,35 these provide a way to display many
variables on a two-dimensional surface. For instance, suppose you have collected several variables
(x, y, z, etc.) on each of n people, and, for purposes of this illustration, suppose each variable can
take one of 10 possible values. We can let x be eyebrow slant, y be eye size, z be nose length, and
so on. The figures below show faces produced using 10 characteristics (head eccentricity, eye size,
eye spacing, eye eccentricity, pupil size, eyebrow slant, nose size, mouth shape, mouth size, and
mouth opening), each assigned one of 10 possible values.
Source: Weisstein, Eric W. Chernoff Face. From MathWorld — A Wolfram Web Resource.
http://mathworld.wolfram.com/ChernoffFace.html.
Summary
To convey the messages of epidemiologic findings, you must first select the best illustration
method. Tables are commonly used to display numbers, rates, proportions, and cumulative
percents. Because tables are intended to communicate information, most tables should have no
more than two variables and no more than eight categories (class intervals) for any variable.
Printed tables should be properly titled, labeled, and referenced; that is, they should be able to
stand alone if separated from the text.
Tables can be used with either categorical (nominal or ordinal) or continuous data. Nominal variables such as sex
and state of residence have obvious categories. For continuous variables that do not have obvious
categories, class intervals must be created. For some diseases, standard class intervals for age
have been adopted. Otherwise a variety of methods are available for establishing reasonable class
intervals. These include class intervals with an equal number of people or observations in each;
class intervals with a constant width; and class intervals based on the mean and standard
deviation.
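The three general approaches to class intervals named above can be sketched as follows. The data values are hypothetical and the function names are illustrative. In practice you would round the resulting limits to convenient numbers and make sure adjacent intervals do not overlap (for example, 2.0–7.9, 8.0–13.9, and 14.0–20.0 for the constant-width intervals below).

```python
import statistics


def equal_count_groups(values, k):
    """Sort values and split them into k groups whose sizes differ by at most one."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), k)
    groups, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        groups.append(ordered[start:end])
        start = end
    return groups


def equal_width_limits(values, k):
    """(lower, upper) limits of k intervals of constant width covering the data."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [(lo + i * width, lo + (i + 1) * width) for i in range(k)]


def mean_sd_cutpoints(values):
    """Cut points at the mean and at one standard deviation on either side."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return (mean - sd, mean, mean + sd)


values = [2, 4, 4, 5, 7, 9, 10, 12, 15, 20]   # hypothetical observations

print(equal_count_groups(values, 3))
print(equal_width_limits(values, 3))
print([round(c, 1) for c in mean_sd_cutpoints(values)])
```

The equal-count method guarantees similar numbers per class, the constant-width method keeps intervals easy to read, and the mean and standard deviation method is most sensible when the data are roughly symmetric.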
Graphs can visually communicate data rapidly. Arithmetic-scale line graphs have traditionally
been used to show trends in disease rates over time. Semilogarithmic-scale line graphs are
preferred when the disease rates vary over two or more orders of magnitude. Histograms and
frequency polygons are used to display frequency distributions. A special type of histogram
known as an epidemic curve shows the number of cases by time of onset of illness or time of
diagnosis during an epidemic period. The cases may be represented by squares that are stacked to
form the columns of the histogram; the squares may be shaded to distinguish important
characteristics of cases, such as fatal outcome.
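The key property of a semilogarithmic scale is that equal ratios occupy equal vertical distances, so a constant rate of change plots as a straight line. The sketch below uses the same illustrative numbers as Answer 12b at the end of this lesson: a starting value of 100,000 growing by 10% per period. The function name is illustrative.

```python
import math


def constant_growth(start, rate, periods):
    """Values that change by the same proportion (rate) every period."""
    return [start * (1 + rate) ** t for t in range(periods)]


values = constant_growth(100_000, 0.10, 6)
print([round(v) for v in values])

# Arithmetic scale: the absolute increase grows each period (curve bends upward)
print([round(b - a) for a, b in zip(values, values[1:])])

# Log scale: the increase in log10(value) is the same each period (straight line)
print([round(math.log10(b) - math.log10(a), 4) for a, b in zip(values, values[1:])])
```

The absolute increases grow from 10,000 to 14,641, while every step in log10 is the same (about 0.0414, the log10 of 1.1).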
Simple bar charts and pie charts are used to display the frequency distribution of a single
variable. Grouped and stacked bar charts can display two or even three variables.
Spot maps pinpoint the location of each case or event. An area map uses shading or coloring to
show different levels of disease numbers or rates in different areas.
The final pages of this lesson provide guidance in the selection of illustration methods and
construction of tables and graphs. When using each of these methods, it is important to
remember their purpose: to summarize and to communicate. Even the best method must be
constructed properly or the message will be lost. Glitzy and colorful are not necessarily better;
sometimes less is more!
Semilogarithmic-scale line graph: Display rate of change over time; appropriate for values ranging over more than 2 orders of magnitude
Simple bar chart: Compare size or frequency of different categories of a single variable
Grouped bar chart: Compare size or frequency of different categories of 2–4 series of data
Stacked bar chart: Compare totals and illustrate component parts of the total among different groups
Deviation bar chart: Illustrate differences, both positive and negative, from baseline
100% component bar chart: Compare how components contribute to the whole in different groups
Place data Numbers Not readily identifiable on map Bar chart or pie chart
1. Title
• Does the table have a title?
• Does the title describe the objective of the data display and its content, including subject, person, place,
and time?
• Is the title preceded by the designation “Table #”? (“Table” is used for typed text; “Figure” is used for
graphs, maps, and illustrations. Separate numerical sequences are used for tables and figures in the same
document, e.g., Table 4.1, Table 4.2; Figure 4.1, Figure 4.2.)
3. Footnotes
• Are all codes, abbreviations, or symbols explained?
• Are all exclusions noted?
• If the data are not original, is the source provided?
• If the source is a website, is the complete address specified, is the site current and active, and is the reference date cited?
1. Title
• Does the graph or chart have a title?
• Does the title describe the content, including subject, person, place, and time?
• Is the title preceded by the designation “Figure #”? (“Table” is used for typed text; “Figure” is used for
graphs, charts, maps, and illustrations. Separate numerical sequences are used for tables and figures in the
same document, e.g., Table 1, Table 2; Figure 1, Figure 2.)
2. Axes
• Is each axis labeled clearly and concisely?
• Are the specific units of measurement included as part of the label? (e.g., years, mg/dl, rate per 100,000)
• Are the scale divisions on the axes clearly indicated?
• Are the scales for each axis appropriate for the data?
• Does the y-axis start at zero?
• If a scale break is used with an arithmetic-scale line graph, is it clearly identified?
• Has a scale break been used with a histogram, frequency polygon, or bar chart? (Answer should be NO!)
• Are the axes drawn heavier than the other coordinate lines?
• If two or more graphs are to be compared directly, are the scales identical?
3. Grid Lines
• Does the figure include only as many grid lines as are necessary to guide the eye? (Often, these are
unnecessary.)
4. Data plots
• Are the plots drawn clearly?
• Are the data lines drawn more heavily than the grid lines?
• If more than one series of data or components is shown, are they clearly distinguishable on the graph?
• Is each series or component labeled on the graph, or in a legend or key?
• If color or shading is used on an area map, does an increase in color or shading correspond to an increase
in the variable being shown?
• Is the main point of the graph obvious, and is it the point you wish to make?
5. Footnotes
• Are all codes, abbreviations, or symbols explained?
• Are all exclusions noted?
• If the data are not original, is the source provided?
6. Visual Display
• Does the figure include any information that is not necessary?
• Is the figure positioned on the page for optimal readability?
• Do font sizes and colors improve readability?
1. Legibility (make sure your audience can easily read your visuals)
• When projected, can your visuals be read from the farthest parts of the room?
3. Color
• Colors have an impact on the effect of your visuals. Use warm/hot colors to emphasize, to highlight, to
focus, or to reinforce key concepts. Use cool/cold colors for background or to separate items. The following
table describes the effect of different colors.
Colors:
Hot             Warm            Cool            Cold
Red             Light orange    Light blue      Dark blue
Bright orange   Light yellow    Light green     Dark green
Bright yellow   Light gold      Light purple    Dark purple
Bright gold     Browns          Light gray      Dark gray
• Are you using the best color combinations? The most important item should be in the text color that has
the greatest contrast with its background. The most legible color combinations are:
Black on yellow
Black on white
Dark green on white
Dark blue on white
White on dark blue (yellow titles and white text on a dark blue background is a favorite choice
among epidemiologists)
• Restrict use of red except as an accent.
4. Accuracy
• Slides are distracting when mistakes are spotted. Have someone who has not seen the slide before check
for typos, inaccuracies, and errors in general.
Exercise 4.1
1.
Botulism Status by Age Group, Texas Church Supper Outbreak, 2001
Botulism Status
Age Group (Years) Yes No
≤9 2 2
10–19 1 1
20–29 2 2
30–39 0 2
40–49 4 4
50–59 3 4
60–69 1 5
70–79 2 3
≥80 0 0
Total 15 23
2.
Botulism Status by Exposure to Chicken,* Texas Church Supper Outbreak, 2001
                   Botulism?
Ate chicken?    Yes    No    Total
Yes               8    11       19
No                4    12       16
Total            12    23       35
* Excludes 3 botulism case-patients with unknown exposure to chicken
3.
Botulism Status by Exposure to Chili,* Texas Church Supper Outbreak, 2001
                   Botulism?
Ate chili?      Yes    No    Total
Yes              14     8       22
No                0    15       15
Total            14    23       37
* Excludes 1 botulism case-patient with unknown exposure to chili
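Two-variable tables like the chicken and chili tables above are cross-tabulations of individual records. The sketch below builds such a table with row and column totals and counts records with unknown exposure separately, for the footnote. The records are reconstructed from the published chili counts purely for illustration (the actual line listing is not shown here), and the function name is hypothetical.

```python
from collections import Counter


def two_by_two(records):
    """Cross-tabulate (exposed, ill) records into a 2x2 table with margins.

    exposed is True, False, or None (unknown); ill is True or False.
    Records with unknown exposure are left out of the table and counted
    separately so they can be reported in a footnote.
    """
    known = [(exposed, ill) for exposed, ill in records if exposed is not None]
    excluded = len(records) - len(known)
    cells = Counter(known)
    table = {}
    for exposed in (True, False):
        ill, well = cells[(exposed, True)], cells[(exposed, False)]
        table[exposed] = (ill, well, ill + well)
    total_ill = table[True][0] + table[False][0]
    total_well = table[True][1] + table[False][1]
    table["Total"] = (total_ill, total_well, total_ill + total_well)
    return table, excluded


# Records reconstructed from the published chili counts, for illustration only
records = (
    [(True, True)] * 14 + [(True, False)] * 8
    + [(False, False)] * 15
    + [(None, True)]            # case-patient with unknown exposure
)

table, excluded = two_by_two(records)
labels = {True: "Ate chili: Yes", False: "Ate chili: No", "Total": "Total"}
print(f"{'':16}{'Ill':>6}{'Not ill':>9}{'Total':>7}")
for row in (True, False, "Total"):
    ill, well, total = table[row]
    print(f"{labels[row]:16}{ill:>6}{well:>9}{total:>7}")
print(f"* Excludes {excluded} case-patient with unknown exposure to chili")
```

Checking that the row totals and column totals both add up to the grand total (here 37) is a quick way to catch transcription errors before a table is published.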
Yes No Total
Yes 1/1 13 / 7 22
Ate chili?
No 0/1 0 / 14 15
Total* 3 34 37*
* One case with unknown exposure to initial chili consumption
Exercise 4.2
Strategy 1: Divide the data into groups of similar size
1. Divide the list into three equal-sized groups of places:
50 states ÷ 3 = 16.67 states per group. Because states can’t be cut in thirds, two groups will
contain 17 states and one group will contain 16 states.
Illinois (#17) could go into either the first or second group, but its rate (80.0) is closer to #16
Maine’s rate (80.2) than Texas’ rate (79.3), so it makes sense to put Illinois in the first group.
Similarly, #34 Vermont could go into either the second or third group.
Arbitrarily putting Illinois into the first category and Vermont into the second results in the
following groups:
a. Kentucky through Illinois (States 1–17)
b. Texas through Vermont (States 18–34)
c. South Dakota through Utah (States 35–50)
2. Identify the rate for the first and last state in each group:
a. Kentucky through Illinois 80.0–116.1
b. Texas through Vermont 70.2–79.3
c. South Dakota through Utah 39.7–68.1
3. Adjust the limits of each interval so no gap exists between the end of one class interval and
beginning of the next. Deciding how to adjust the limits is somewhat arbitrary — you could
split the difference, or use a convenient round number.
a. Kentucky through Illinois 80.0–116.1
b. Texas through Vermont 70.0–79.9
c. South Dakota through Utah 39.7–69.9
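The arithmetic in step 1 generalizes to dividing any number of ranked items into groups of nearly equal size. The sketch below (helper names are illustrative) gives the extra items to the first groups, which reproduces the 17/17/16 split and the rank ranges 1–17, 18–34, and 35–50 used above. Deciding where borderline states such as Illinois and Vermont belong still calls for judgment.

```python
def group_sizes(n_items, n_groups):
    """Sizes of n_groups groups that differ by at most one item (larger groups first)."""
    size, extra = divmod(n_items, n_groups)
    return [size + 1 if i < extra else size for i in range(n_groups)]


def group_ranks(n_items, n_groups):
    """First and last rank (1-based) in each group for items ranked 1..n_items."""
    ranges, start = [], 1
    for size in group_sizes(n_items, n_groups):
        ranges.append((start, start + size - 1))
        start += size
    return ranges


print(group_sizes(50, 3))   # [17, 17, 16]
print(group_ranks(50, 3))   # [(1, 17), (18, 34), (35, 50)]
```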
2. Select the lower limit for each upper limit to define three full intervals. Specify the states that
fall into each interval. (Note: To place the states with the highest rates first, reverse the order
of the intervals):
a. North Carolina through Kentucky (8 states) 93.3–116.1
b. Arizona through Georgia (35 states) 61.1–93.2
c. Utah through Minnesota (7 states) 39.7–61.0
3. Final categories:
a. Indiana through Kentucky (11 states) 90.7–116.1
b. Nebraska through Oklahoma (29 states) 65.3–90.6
c. Utah through North Dakota (10 states) 39.7–65.2
4. Alternatively, since 90.6 is close to 90 and 65.2 is close to 65.0, the categories could be
reconfigured with no change in state assignments. For example, the final categories could
look like:
Indiana through Kentucky (11 states) 90.1–116.1
Nebraska through Oklahoma (29 states) 65.1–90.0
Utah through North Dakota (10 states) 39.7–65.0
Rate (per 100,000 Population) of Reported Measles Cases by Year of Report — United States,
1955–2002
2. The highest rate between 1985 and 2002 was 11.2 per 100,000 (in 1990), so the maximum on the y-axis
should be 12 per 100,000.
Rate (per 100,000 Population) of Reported Measles Cases by Year of Report — United States,
1985–2002
The first case occurs on August 25; the number of cases rises to a peak two days later on August 27, then declines
symmetrically to 1 case on August 29. Late cases occur on August 31 and September 1.
Exercise 4.5
Number of Cases of Botulism by Date of Onset of Symptoms, Texas Church Supper Outbreak, 2001
The area under the line in this frequency polygon is the same as the area in the answer to
Exercise 4.4. The peak of the epidemic (8/27) is easier to identify.
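Why the areas match: a frequency polygon joins the midpoints of the histogram's columns and is closed at the midpoints of the empty intervals on either side, so each triangle cut off a column is offset by a triangle of equal area added beside it. The sketch below checks this with the trapezoid rule for unit-width intervals (one day each). The daily counts are hypothetical, not the Texas outbreak data, and the function names are illustrative.

```python
def polygon_points(counts, start=0):
    """Vertices of a closed frequency polygon for unit-width intervals.

    Interval i covers [start + i, start + i + 1); its count is plotted at the
    midpoint. A zero is added at the midpoint of the empty interval on each
    side so the polygon returns to the x-axis at both ends.
    """
    n = len(counts)
    xs = [start + i + 0.5 for i in range(-1, n + 1)]
    ys = [0, *counts, 0]
    return list(zip(xs, ys))


def area_under(points):
    """Area under a polyline by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


daily_cases = [1, 3, 6, 3, 1]          # hypothetical cases per day of onset
points = polygon_points(daily_cases)
print(area_under(points), sum(daily_cases))   # polygon area vs histogram area
```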
Number of Reported Cases of Primary and Secondary Syphilis, by Age Group, Among Non-Hispanic
Black and White Men and Women — United States, 2002 (Grouped Bar Chart)
Percent of Reported Cases of Primary and Secondary Syphilis, by Age Group, Among Non-Hispanic
Black and White Men and Women — United States, 2002 (100% Component Bar Chart)
Source: Centers for Disease Control and Prevention. Sexually Transmitted Disease Surveillance 2002. Atlanta, Georgia. U.S.
Department of Health and Human Services; 2003.
The stacked bar chart clearly displays the differences in total number of cases, as reflected by the
overall height of each column. The number of cases in the lowest category (age <20 years) is
also easy to compare across race-sex groups, because it rests on the x-axis. Other categories
might be a little harder to compare because they do not have a consistent baseline. If the size of
each category in a given column is different enough and the column is tall enough, the categories
within a column can be compared.
The grouped bar chart clearly displays the size of each category within a given group. You can
also discern different patterns across the groups. Comparing categories across groups takes work.
The 100% component bar chart is best for comparing the percent distribution of categories across
groups. You must keep in mind that the distribution represents percentages, so while the 30-39
year category in white females appears larger than the 30-39 year category in the other race-sex
groups, the actual numbers are much smaller.
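The caution about the 100% component bar chart can be seen numerically: each group's counts are converted to percentages of that group's own total, so a large percentage can correspond to a small number. The counts below are hypothetical, not the 2002 syphilis data, and the function name is illustrative.

```python
def percent_components(counts_by_group):
    """Express each group's category counts as percentages of that group's total."""
    percents = {}
    for group, counts in counts_by_group.items():
        total = sum(counts.values())
        percents[group] = {cat: 100 * n / total for cat, n in counts.items()}
    return percents


# Hypothetical counts for two groups of very different size
counts = {
    "Group 1": {"<20": 50, "20-29": 150, "30-39": 200, "40+": 100},
    "Group 2": {"<20": 5, "20-29": 10, "30-39": 25, "40+": 10},
}

for group, pcts in percent_components(counts).items():
    shown = ", ".join(f"{cat}: {p:.0f}%" for cat, p in pcts.items())
    print(f"{group} (n = {sum(counts[group].values())}): {shown}")
```

The 30–39 segment would look larger for Group 2 (50%) than for Group 1 (40%), even though Group 2 has only 25 cases in that category compared with 200 in Group 1.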
Exercise 4.7
Age-adjusted Lung Cancer Death Rates per 100,000 Population, by State — United States, 2002
1. Tables and graphs are important tools for which tasks of an epidemiologist?
A. Data collection
B. Data summarization (descriptive epidemiology)
C. Data analysis
D. Data presentation
3. The following table is unacceptable because the percentages add up to 99.9% rather than
100.0%
Age group No. Percent
< 1 year 10 19.6
1–4 9 17.6
5–9 9 17.6
10–14 17 33.3
≥ 15 6 11.8
Total 51
A. True
B. False
4. In the following table, the total number of persons with the disease is:
Cases Controls Total
Exposed 22 12 34
Unexposed 3 13 16
Total 25 25 50
A. 3
B. 22
C. 25
D. 34
E. 50
9. The following are reasonable categories for a disease that mostly affects people over age 65
years:
Age Group
< 65 years
65–70
70–75
75–80
80–85
≥ 85
A. True
B. False
10. In general, before you create a graph to display data, you should put the data into a table.
A. True
B. False
11. On an arithmetic-scale line graph, the x-axis and y-axis each should:
A. Begin at zero on each axis
B. Have labels for the tick marks and each axis
C. Use equal distances along the axis to represent equal quantities (although the quantities
measured on each axis may differ)
D. Use the same tick mark spacing on the two axes
12a. ____ A wide range of values can be plotted and seen clearly, regardless of
magnitude
12b. ____ A constant rate of change would be represented by a curved line
12c. ____ The y-axis tick labels could be 0.1, 1, 10, and 100
12d. ____ Can plot numbers or rates
14. Which of the following shapes of a population pyramid is most consistent with a young
population?
A. Tall, narrow rectangle
B. Short, wide rectangle
C. Triangle base down
D. Triangle base up
15. A frequency polygon differs from a line graph because a frequency polygon:
A. Displays a frequency distribution; a line graph plots data points
B. Must be closed (plotted line must touch x-axis) at both ends
C. Cannot be used to plot data over time
D. Can show percentages on the y-axis; a line graph cannot
20. A spot map must reflect numbers; an area map must reflect rates.
A. True
B. False
21. To display different rates on an area map using different colors, select different colors that
have the same intensity, so as not to bias the audience.
A. True
B. False
22. In an oral presentation, three-dimensional pie charts and three-dimensional columns in bar
charts are desirable because they add visual interest to a slide.
A. True
B. False
23. A 100% component bar chart shows the same data as a stacked bar chart. The key
difference is in the units on the x-axis.
A. True
B. False
24. When creating a bar chart, the decision to use vertical or horizontal bars is usually based
on:
A. The magnitude of the data being graphed and hence the scale of the axis
B. Whether the data being graphed represent numbers or percentages
C. Whether the creator is an epidemiologist (epidemiologists almost always use vertical bars)
D. Which looks better, such as whether the label fits below the bar
3. B (False). Rounding that results in totals of 99.9% or 100.1% is common in tables that
show percentages. Nonetheless, the total percentage should be displayed as 100.0%,
and a footnote explaining that the difference is due to rounding should be included.
4. C. In the two-by-two table presented in Question 4, the total number of cases is shown
as the total of the left column (labeled “Cases”). That column total number is 25.
5. D. A table shell is the skeleton of a table, complete with titles and labels, but without the
data. It is created when designing the analysis phase of an investigation. Table shells
help guide what data to collect and how to analyze the data.
6. B. Creation of table shells should be part of the overall study plan or protocol. Creation
of table shells requires the investigator to decide how to analyze the data, which
dictates what questions should be asked on the questionnaire.
7. A, B, C, D, E. All of the methods listed are appropriate and commonly
used by epidemiologists.
9. B (False). The limits of the class intervals must not overlap. For example, would a 70-
year-old be counted in the 65–70 category or in the 70–75 category?
10. A (True). In general, before you create a graph, you should observe the data in a table.
By reviewing the data in the table, you can anticipate the range of values that must be
covered by the axes of a graph. You can also get a sense of the patterns in the data, so
you can anticipate what the graph should look like.
11. B, C. On an arithmetic-scale line graph, the axes and tick marks should be clearly
labeled. For both the x- and y-axis, a particular distance anywhere along the axis should
represent the same increase in quantity, although the x- and y-axis usually differ in what
is measured. The y-axis, measuring frequency, should begin at zero. But the x-axis,
which often measures time, need not start at zero.
12. a. B. One of the key advantages of a semilogarithmic-scale line graph is that it can display
a wide range of values clearly.
12b. A. A starting value of, say, 100,000 and a constant rate of change of, say, 10% would
result in observations of 100,000, 110,000, 121,000, 133,100, 146,410, 161,051, etc.
The resulting plotted line on an arithmetic-scale line graph would curve upwards. The
resulting plotted line on a semilogarithmic-scale line graph would be a straight line.
12c. B. Values of 0.1, 1, 10, and 100 represent orders of magnitude typical of the y-axis of a
semilogarithmic-scale line graph.
12d. C. Both arithmetic-scale and semilogarithmic-scale line graphs can be used to plot
numbers or rates.
13. a. B. A bar chart is used to graph the frequency of events of a categorical variable such as
sex or geographic region.
13b. C. The columns of either a histogram or a bar chart can be shaded to distinguish
subgroups. Note that a bar chart with shaded subgroups is called a stacked bar chart.
13c. A. A histogram is used to graph the frequency of events of a continuous variable such as
time.
13d. A. An epidemic curve is a particular type of histogram in which the number of cases (on
the y-axis) that occur during an outbreak or epidemic are graphed over time (on the x-
axis).
14. C. A typical population pyramid usually displays the youngest age group at the bottom
and the oldest age group at the top, with males on one side and females on the other
side. A young population would therefore have a wide bar at the bottom with gradually
narrowing bars above.
15. A, B. A frequency polygon differs from a line graph in that a frequency polygon
represents a frequency distribution, with the area under the curve proportionate to the
frequency. Because the total area must represent 100%, the ends of the frequency
polygon must be closed. Although a line graph is commonly used to display frequencies
over time, a frequency polygon can display the frequency distribution of a given period
of time as well. Similarly, the y-axis of both types of graph can measure percentages.
16. a. C. The y-axis of both cumulative frequency curves and survival curves typically displays
percentages from 0% at the bottom to 100% at the top. The main difference is that a
cumulative frequency curve begins at 0% and increases, whereas a survival curve
begins at 100% and decreases.
16b. B. Because a survival curve begins at 100%, the plotted curve begins at the top of the
y-axis and at the beginning time interval (sometimes referred to as time-zero) of the x-
axis, i.e., in the upper left corner.
16c. A. Because a cumulative frequency curve begins at 0%, the plotted curve begins at the
base of the y-axis and at the beginning time interval (sometimes referred to as time-
zero) of the x-axis, i.e., in the lower left corner.
17. A, C. A scatter diagram graphs simultaneous data points of two continuous variables for
individuals or communities. Drug levels, infant mortality, and mean annual income are
all examples of continuous variables. Eye color, at least as presented in the question, is
a categorical variable.
18. D. A frequency distribution, one-variable table, pie chart, and simple bar chart are all
used to display the frequency of categories of a single variable. A scatter diagram
requires two variables.
19. B. A scatter diagram graphs simultaneous data points of two continuous variables for
individuals or communities; whereas a dot plot graphs data points of a continuous
variable according to categories of a second, categorical variable.
20. B (False). The spots on a spot map usually reflect one or more cases, i.e., numbers. The
shading on an area map may represent numbers, proportions, rates, or other measures.
21. B (False). Shading should be consistent with frequency. So rather than using different
colors of the same intensity, increasing shades of the same color or family of colors
should be used.
22. B (False). The primary purpose of any visual is to communicate information clearly. 3-D
columns, bars, and pies may have pizzazz, but they rarely help communicate
information, and sometimes they mislead.
23. B (False). The difference between a stacked bar chart and a 100% component bar chart
is that the bars of a 100% component bar chart are all pulled to the top of the y-axis
(100%). The units on the x-axis are the same.
24. D. Any bar chart can be oriented vertically or horizontally. The creator of the chart can
choose, and often does so on the basis of consistency with other graphs in a series,
opinion about which orientation looks better or fits better, and whether the labels fit
adequately below vertical bars or need to be placed beside horizontal bars.
25. a. B, C. Both line graphs and histograms are commonly used to graph numbers of cases
over time. Line graphs are commonly used to graph secular trends over longer time
periods; histograms are often used to graph cases over a short period of observation,
such as during an epidemic.
25b. A. A grouped bar chart (or a stacked bar chart) is ideal for graphing frequency over two
categorical variables. A pie chart is used for a single variable.
25c. D. A pie chart (or a simple bar chart) is used for graphing the frequency of categories of
a single categorical variable such as breed of dog.
25d. C. Rates over time are traditionally plotted by using a line graph.
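The rounding issue behind Question 3 and its answer can be reproduced with a few lines of code. The sketch uses the counts from the Question 3 table (which sum to 51); each percentage is rounded to one decimal place, and the rounded values total 99.9 even though the unrounded percentages total 100. As Answer 3 recommends, the published total should still read 100.0%, with a footnote explaining that the difference is due to rounding. The function name is illustrative.

```python
def percent_column(counts, decimals=1):
    """Each count as a percentage of the total, rounded to `decimals` places."""
    total = sum(counts)
    return [round(100 * n / total, decimals) for n in counts]


counts = [10, 9, 9, 17, 6]        # Question 3: <1, 1-4, 5-9, 10-14, >=15 years
percents = percent_column(counts)
print(sum(counts), percents, round(sum(percents), 1))
```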
Websites
For more information on: Visit the following websites:
Age categorization used by CDC’s National Center for Health Statistics   http://www.cdc.gov/nchs/
Age groupings used by the United States Census Bureau http://www.census.gov
CDC’s Morbidity and Mortality Weekly Report http://www.cdc.gov/mmwr/
Epi Info and EpiMap http://www.cdc.gov/epiinfo/
GIS at CDC http://www.cdc.gov/gis/
The R Project for Statistical Computing http://www.r-project.org
ColorBrewer: color advice for cartography http://www.colorbrewer.org