Early Childhood Program Evaluations:

A Decision-Maker’s Guide
National Forum on Early Childhood Program Evaluation
A collaborative project involving Harvard University, Columbia University, Georgetown
University, Johns Hopkins University, Northwestern University, University of Nebraska,
and University of Wisconsin
Jack P. Shonkoff, M.D., Co-Chair
Julius B. Richmond FAMRI Professor of Child Health and Development; Director, Center on the Developing Child, Harvard University

Greg J. Duncan, Ph.D., Co-Chair
Edwina S. Tarry Professor of Human Development and Social Policy; Faculty Fellow, Institute for Policy Research, Northwestern University

Jeanne Brooks-Gunn, Ph.D.
Virginia and Leonard Marx Professor of Child Development and Education; Co-director, National Center for Children and Families, Columbia University

Bernard Guyer, M.D., M.P.H.
Zanvyl Kreiger Professor of Children's Health, Johns Hopkins Bloomberg School of Public Health

Katherine Magnuson, Ph.D.
Assistant Professor, School of Social Work, University of Wisconsin-Madison

Deborah Phillips, Ph.D.
Professor of Psychology and Associated Faculty, Public Policy Institute; Co-Director, Research Center on Children in the U.S., Georgetown University

Helen Raikes, Ph.D.
Professor, Family and Consumer Sciences, University of Nebraska-Lincoln

Hirokazu Yoshikawa, Ph.D.
Professor of Education, Harvard Graduate School of Education

The National Forum on Early Childhood Program Evaluation


This collaborative initiative fosters the analysis, synthesis, translation, and dissemination of findings from four decades of early childhood program evaluation studies to learn more about what interventions work best and for whom. Based at the Center on the Developing Child at Harvard University, the Forum involves researchers and data teams from Columbia University, Georgetown University, Harvard University, Johns Hopkins University, Northwestern University, the University of Nebraska, and the University of Wisconsin. Its work includes:
■■ building the nation's most comprehensive meta-analytic database on early childhood program evaluation, from the prenatal period to age 5 years;
■■ conducting rigorous analyses of the findings of well-designed studies of programs designed to improve outcomes for young children and/or provide effective support for their families;
■■ producing a variety of publications, including briefs for policymakers and civic leaders, peer-reviewed scientific papers, and web-based communications to assure both broad and targeted dissemination of high quality information.

For more information go to www.developingchild.harvard.edu/content/forum.html

PARTNERS
The FrameWorks Institute
The National Conference of State Legislatures
The National Governors Association Center for Best Practices

SPONSORS
The Buffett Early Childhood Fund
The McCormick Tribune Foundation
An Anonymous Donor

Please note: The content of this paper is the sole responsibility of the authors and does not necessarily represent the opinions of
the funders and partners.

Suggested citation: National Forum on Early Childhood Program Evaluation (2007). Early Childhood Program Evaluations: A Decision-
Maker’s Guide. http://www.developingchild.harvard.edu

© December 2007, National Forum on Early Childhood Program Evaluation, Center on the Developing Child at Harvard University
Early Childhood Program Evaluations
Despite increasing demands for evidence-based early childhood services, the evaluations of interventions such as Head Start or home-visiting programs frequently contribute more heat than light to the policy-making process. This dilemma is illustrated by the intense debate that often ensues among dueling experts who reach different conclusions from the same data about whether a program is effective or whether its impacts are large enough to warrant a significant investment of public and/or private funds.

Because the interpretation of program evaluation research is so often highly politicized, it is essential that policymakers and civic leaders have the independent knowledge needed to be able to evaluate the quality and relevance of the evidence provided in reports. This guide helps prepare decision-makers to be better consumers of evaluation information. It is organized around five key questions that address both the substance and the practical utility of rigorous evaluation research. The principles we discuss are relevant and applicable to the evaluation of programs for individuals of any age, but in our examples and discussion we focus specifically on early childhood.

1. Is the evaluation design strong enough to produce trustworthy evidence?
Evaluations that randomly assign children to either receive program services or to a no-treatment comparison group provide the most compelling evidence of a program's likely effects. Other approaches can also yield strong evidence, provided they are done well.

2. What program services were actually received by participating children and families and comparison groups?
Program designers often envision a model set of services, but children or families who are enrolled in "real" programs rarely have perfect attendance records and the quality of the services received rarely lives up to their designers' hopes. Thus, knowing the reality of program delivery "on the ground" is vital for interpreting evaluation results. At the same time, sometimes a comparison group is able to access services in their community that are similar to those provided as part of the intervention. If so, then differences between the services provided to the program and contrast groups may be smaller than would exist in a community where those services are not available.

3. How much impact did the program have?
The differences in outcomes between children and/or families who received services and those in the comparison group are often expressed as "effect sizes." This section will explain what these mean and how to think about them.

4. Do the program's benefits exceed its costs?
A key "bottom line" issue for any intervention is whether the benefits it generates exceed the full costs of running the program. This document will explain how costs and benefits are determined and what they mean for a program that is being considered for implementation.

5. How similar are the programs, children, and families in the study to those in your constituency or community?
Program evaluations have been conducted in virtually every state and with children of diverse ethnicities and socioeconomic backgrounds. Knowing how the characteristics and experiences of comparison-group children compare to the characteristics and experiences of children in your own constituency or community is important for determining the relevance of any evaluation findings.

For guidelines and explanations that can help leaders use these key questions to determine the relevance
of program evaluations for policy decisions, please continue.


1. Is the evaluation design strong enough to produce trustworthy evidence?

CHECKLIST #1
■■ Value experimental designs (RCT) over non-experimental studies. Random assignment is the best way to ensure that differences in outcomes are the result of program effects rather than of something different about the children or families who received the services versus those who did not.
■■ Not all evaluations that use an RCT design are successful. Sometimes random assignment doesn't work. For example, problems arise when too many program or control group children cannot be located for a reliable "post-treatment" measurement of outcomes.
■■ Useful evaluation lessons can be drawn from rigorous non-random-assignment evaluation studies, such as those employing regression discontinuity designs (RDD).

Evaluation studies take many forms, but the most useful studies answer the question that policymakers and parents most want to know the answer to: does a program or intervention "work"? What, for example, would have happened to children in Head Start if they had not been enrolled in the program? The presumption is that they would not have learned as much, but how much less? How can we be certain there really is a difference? How confidently can it be ascribed to Head Start?

It would be easy to determine how well a program works if we could somehow compare its effects on a group of children to what would have happened if those same children had not received the services. Since that's clearly impossible, all evaluations have to find some kind of comparison group to assess program impacts. And how close program and comparison-group children are to being the same before the services are provided is a major determinant of how valid the study findings will be. This is not easy, since children who attend programs are often different from those who do not. They may be healthier or sicker. Their parents may be better off or poorer. Parents of program children are often more motivated to seek out services than parents whose children do not attend. If comparison-group children differ in these or other ways from children who are enrolled in a program before the services are provided, then later differences are likely to reflect, in part, these initial differences and thus convey a false picture, either more or less favorable, of the program's impacts.

The strongest evaluation designs compare children and parents who receive program services with a "virtually identical" comparison group of children and parents who do not receive those services. The ideal method for assessing program effects is an experimental study referred to as a randomized controlled trial (RCT). In an RCT, children who are eligible to participate in a program are entered into a "lottery" where they either win the chance to receive services or are assigned to a comparison (control) group. Parents or program administrators have no say in who is selected in this lottery. When done correctly, this process creates two groups of children who would be similar if not for the intervention. Any post-program differences in achievement, behavior, or other outcomes of interest between the two groups can thus be attributed to the program with a high degree of confidence.

It is possible for an RCT to be flawed and result in a comparison group that is not comparable to program participants. Examples of how this may occur include problems implementing the lottery process, too few children in the program and comparison groups, and too many children or families dropping out of the study after random assignment has occurred. For this reason, even an RCT study should demonstrate that the comparison group used was similar to the treatment group before the study began.

Although random assignment of children or parents to program and comparison groups is the "gold standard" for program evaluation, sometimes this is not possible. In some circumstances, a randomized controlled trial is neither practical nor ethical. For example, if access to services is a legal entitlement, denying program services to some children would be a violation of the law. In such cases, alternative ways of constructing "no treatment" groups are needed, and it is essential that the children and families in the comparison group be as similar to the program group as possible.

The strengths of other evaluation methods are highly variable, with an approach called Regression Discontinuity Design (RDD) considered by experts to be the strongest alternative to random assignment. In this case, assignment to either the control or the intervention group is defined by a cut-off point along some measurable continuum (such as age). For example, some pre-K evaluations have taken advantage of strict birthday cut-off dates for program eligibility. Specifically, in some states, children who are 4 years old as of September 1 are eligible for enrollment in pre-K, while those who turn 4 after September 1 must wait a year to attend. In this case, the key comparison in an RDD is between children with birthdays that just make or miss the cutoff. These children presumably differ only in the fact that the older children attend pre-K in the given year while the younger ones do not. Comparing kindergarten entry achievement scores for children who have completed a year in pre-K with the scores measured at the same time for children who just missed the birthday cutoff can be a strong assessment of program impacts.

Evaluations that select comparison groups in other ways should probably be assumed guilty of bias until proven otherwise. Countless studies have shown how difficult it is to create comparison groups that are similar, absent an RCT design or close approximation. Especially important indicators of treatment-comparison group comparability are assessments of test scores, behaviors, and other outcomes of interest for both groups of children taken just prior to the point of program entry. Demonstrating that the program group and comparison group children or parents were initially similar on characteristics that the program was intending to affect is vital for trusting that differences emerging after the beginning of the program can be attributed to the program itself. Evaluations that do not compare and discuss pre-service characteristics of program and comparison-group children should be viewed with skepticism.
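To make the baseline-comparability check concrete, the short sketch below is written in Python purely for illustration; the scores are hypothetical and not drawn from any actual evaluation. It compares the average pre-program scores of treatment and comparison children and expresses the gap in standard-deviation units, the same scale used for effect sizes later in this guide.

```python
# A minimal sketch of a baseline-comparability check, using hypothetical
# pre-program scores rather than data from any real study.
from statistics import mean, stdev

treatment_baseline = [88, 92, 85, 90, 87, 91, 84, 89]    # scores before services began
comparison_baseline = [86, 93, 84, 91, 88, 90, 85, 88]

# Gap between the two groups before the program started.
gap = mean(treatment_baseline) - mean(comparison_baseline)

# Express the gap in standard-deviation units so it can be judged on the
# same scale as the effect sizes discussed later in this guide.
pooled_sd = stdev(treatment_baseline + comparison_baseline)
print(f"Baseline gap: {gap:.2f} points ({gap / pooled_sd:.2f} standard deviations)")
# A sizable baseline gap is a warning that later differences between the
# groups may reflect who enrolled rather than what the program did.
```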

2. What program services were actually received by participating children and families and comparison groups?

CHECKLIST #2
■■ It is important to know whether the program was experienced as intended. What type and volume of program services was a typical participating child or family supposed to receive? Was this model implemented each year and in every site? To what extent did children or families fail to "take up" services offered to them or show up as often as planned?
■■ Examine multiple characteristics of the program that was delivered (e.g., intensity, duration, skills and credentials of the service providers, and participation rates). If important services were not provided as intended, the program is not likely to be as effective as hoped. Remember that the evaluation assesses the program as delivered, not as designed.
■■ Look carefully for lessons about program improvement. Do the reports include a section on implications for other programs? Is there information about implementation or program design that can be translated into practical guidelines for further program refinement?
■■ Find out as much as you can about the experiences of the evaluation's control group. Often the "does a program work" question should be rephrased as "does the program work in comparison to the experience of those who didn't receive the same services?"

At the heart of an evaluation study is the comparison of two groups of children: those who are enrolled in the program and a similar comparison group of children who are not. Sometimes, however, it is surprising to find out that the actual experiences of these two groups of children are very similar. This can occur either because the children enrolled in the program do not receive the services as intended or because many of the children in the comparison group seek out and receive similar services that are already available in the community. Good questions to ask when reviewing program outcomes include:

Were there problems with program delivery? No one wants to implement a poor quality program or a program that is so unappealing, inconvenient, or inaccessible to the target families that they do not make use of it. Although years of experience have shown what general program characteristics make services attractive to families, it is still essential to know the answers to the following questions. Was the intervention in the program evaluation actually delivered? What were the qualifications of those who delivered the service? Was it implemented in the way it was intended? What volume or "dosage" of program services did participating children and families actually receive?

Did participating families receive the services that were planned? The best intentions of program developers are often not reflected in the experiences of families with infants and young children "on the ground." This problem most commonly arises for one of two reasons: either the program was not implemented as intended or families did not participate as expected. In fact, in some cases implementation or take-up problems can be so severe that the most reasonable conclusion would be that the intervention was not really tested. On the other hand, an intervention that is difficult to implement or that is not successful in engaging the children and families it seeks to serve is not likely to be effective, despite its theoretical appeal.

Implementation refers to whether all of the components that were planned and/or described were actually put into place at all of the sites. Sometimes, especially in evaluations of services that are implemented in multiple locations, the program is well implemented in some places but not others. Poor implementation can arise for many reasons: a building is not completed on time, a director quits unexpectedly, or the enrollment of families takes much longer than anticipated. Not surprisingly, studies that measure variation in implementation often show that the most fully implemented sites have the strongest impacts. But at the same time, it is not realistic to expect that a program implemented in your own community would be lucky enough to avoid all of the problems encountered by the poor-implementation sites. Thus, impacts that are averaged across all locations are probably a better guide to what to expect than impacts attained by only the best sites.

In some circumstances, a program could be implemented exactly as intended, but the participation rates could still be low. This may be a sign that the program is not attractive or accessible to potential participants. An example would be a parent outreach service connected to an early education program that offers home visits in the afternoon, when most working parents cannot participate because of difficulty in adjusting their work schedules. Another example is a program whose services are not a good fit with the cultural norms of the particular population being targeted (e.g., home-based services for a cultural group that may have strong values concerning privacy of the home). In such circumstances, the failure to "take up" the home visitation piece does not necessarily mean that this program component could not be beneficial to families. It may simply mean that the program delivery needs to be designed to fit with the daily routines, values, and preferences of the specific group being served. Issues related to language for families who do not speak English are also very important in this context.

Participation (sometimes called program "take up") refers to the services that children and families actually receive. The measurement of participation has two dimensions: how many of the parents or children participated and, for those who were involved, how much service they received. The first dimension is measured by take-up fractions (i.e., the number of families who were engaged divided by the total number of possible participants). Every evaluation should include information about how many families never enrolled or dropped out of the program. The second dimension includes measures of program "dosage," such as numbers of visits, hours of service received, and weeks, months, or years of program participation. In addition to including information about these two dimensions of participation, studies are even more useful if they include data from systematically conducted interviews or focus groups that describe what parents and children actually experienced.

Do implementation or take-up problems point to more promising practices? No intervention is perfect. Changing behavior and shifting the course of children's development is challenging, and even promising programs can be strengthened. Increasingly, contemporary intervention programs are turning to "continuous improvement" or "action research" frameworks, guided by a knowledge base that assists service providers and policymakers in improving program effectiveness. To this end, a supplementary set of inquiries beyond the simple "did it work?" question can be very useful. This approach is particularly important for evaluations of programs that must be provided, such as public schools. Don't hesitate to contact evaluators directly and ask, "What do the data tell us about how the program can be improved?"

What type of services did the comparison children receive? Another important question about program receipt is the extent to which children in the comparison group were able to access similar services. Good evaluations detail exactly what services or programs were received by children and families in the comparison group. In some studies, children in the comparison group could not have participated in a similar program because it was not available to them. In other studies, however, children and families in the comparison group were able to seek out and access similar programs. Over time and across communities, there is considerable variation in the extent to which alternative programs and services are available to comparison group children. Sometimes the contrast between the program and comparison group service experiences is quite small, and thus the program may appear to be less effective.

For example, a couple of decades ago, most children who were not assigned to participate in an early education program simply stayed home and were cared for by their mothers. The world has changed dramatically since that time, and most young children today, even infants, do not spend all of their time at home. In fact, child care and family support services are pervasive throughout the nation, although there is striking variability in their quality, accessibility, and affordability. These changes have important implications for drawing lessons from program evaluations that were conducted in the past or for guidance in communities with different service configurations. Stated simply, program evaluations can only show how a specific program works in comparison to the existing landscape of other community-based services available to the control group, including child care, health care, and other early intervention programs, among others.
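As a quick illustration of the two dimensions of participation described in this section, the sketch below computes a take-up fraction and an average "dosage" from made-up visit records. The home-visiting framing and all of the numbers are hypothetical, included only to show the arithmetic.

```python
# Minimal sketch: summarizing the two dimensions of participation described
# above, using made-up visit records for a hypothetical home-visiting program.

visits_per_family = [0, 0, 12, 3, 20, 8, 15, 0, 6, 10]  # 0 = family never engaged

# Dimension 1: the take-up fraction, i.e., families engaged divided by all
# families offered services.
engaged = [v for v in visits_per_family if v > 0]
take_up = len(engaged) / len(visits_per_family)

# Dimension 2: "dosage" among the families that did participate.
average_visits = sum(engaged) / len(engaged)

print(f"Take-up: {take_up:.0%} of offered families received at least one visit")
print(f"Average dosage among participants: {average_visits:.1f} visits")
```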

3. How much impact did the program have?


CHECKLIST #3
■■ Program impacts are often expressed as "effect sizes," which provide a uniform way to compare influences on different kinds of outcomes and across evaluation studies.
■■ Statistical significance provides a valuable judgment of how likely it is that an estimated impact is real and truly different from zero.
■■ Distrust evaluations that report only measures with statistically significant impacts. Every rigorous evaluation is likely to generate a mix of significant and non-significant findings. The overall pattern of effects is most important.
■■ It is important to understand whether the offer of services (ITT) or the receipt of services (TOT) is being evaluated and whether there are some groups of participants that may benefit from the program more than others.

The measurement of program impacts, meaning the differences between the treatment and comparison groups on a range of outcomes of interest, is a central feature of the evaluation process. Impacts can be expressed in a variety of ways, such as percentage differences or differences in the proportion of program and control-group children who fall into a specific category, such as assignment to special education classes.

Effect sizes. Increasingly, program evaluators express impacts as "effect sizes," which are a statistical means for comparing outcomes that may otherwise be difficult to compare. For example, the scales of the SAT test and the IQ test are completely different, so it's difficult to compare one program that raises SAT scores by 20 points with another that raises IQ scores by 5 points. "Effect sizes" provide the solution. By subtracting the outcomes of the control group from the outcomes of the treatment group, we get an effect (e.g., raising SAT scores by 20 points). By dividing that effect by the study's "standard deviation" (which indicates how widely dispersed the results are from the mean), we get an effect size: a fraction that indicates how large the effects are in comparison to the scale of results.

The SAT test, for example, is scaled with a standard deviation of 100, so a program that boosted SAT scores by 10 points would have an effect size of 0.1, or one-tenth of a standard deviation, which is considered very small. IQ tests are typically scaled with a standard deviation of 15, so a program that boosted IQ scores by 10 points would have an effect size of 0.66, or two-thirds of a standard deviation, which is much larger. Generally speaking, the larger the effect size, the better. Conventional guidelines consider effect sizes of at least 0.8 as "large"; 0.3 to 0.8 as "moderate"; and less than 0.3 as "small." Nevertheless, since inexpensive programs can hardly be expected to perform miracles, we will soon see that an even better measure of a program's worth is the value of its effects relative to its cost.

The best studies translate effect sizes into practical information. For example, effects on a standardized measure of achievement might be translated into the fraction of a school year by which the program group exceeds the control group. Effect sizes on grade retention can be translated into percentages of children held back a grade.

Statistical significance. Impacts are usually accompanied by a statement regarding their statistical significance. This indicates how much confidence we have that the measured impact is real and not just something that appeared by chance. Impacts that are statistically significant at the 5 percent level, a common standard, are large enough that there is less than a 5 percent chance they would have appeared if the program actually had no effect. That is a good bet that the impacts are real.

As the number of children or families in the treatment and control groups increases, smaller effect sizes become statistically significant, simply because a larger sample means a lower probability of a chance finding. Typically, evaluations involving fewer than 100 children require very large effect sizes to be judged statistically significant, while evaluations based on several thousand children are much more likely to find small effects statistically significant. All other things being equal, bigger studies are better. Even in large studies, however, small effect sizes imply that the program is not likely to change outcomes very much, so policymakers should consider carefully the cost required to achieve small benefits.
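The effect-size arithmetic described above can be written out in a few lines. The sketch below is illustrative only: the scores are invented, and the standard deviation of the combined sample stands in for whatever scale a given study reports. It simply subtracts the comparison-group average from the treatment-group average and divides by the standard deviation of the outcome.

```python
# Minimal sketch of the effect-size arithmetic described above
# (all scores are illustrative, not drawn from any real evaluation).
from statistics import mean, stdev

treatment_scores = [104, 98, 110, 101, 95, 108, 99, 103]
comparison_scores = [97, 93, 102, 96, 90, 100, 94, 98]

# Step 1: the raw effect is the difference in average outcomes.
effect = mean(treatment_scores) - mean(comparison_scores)

# Step 2: divide by the standard deviation of the outcome so the impact
# can be compared across measures with different scales.
sd = stdev(treatment_scores + comparison_scores)
effect_size = effect / sd

print(f"Raw effect: {effect:.1f} points; effect size: {effect_size:.2f} standard deviations")
# Judged against the guidelines above: at least 0.8 is "large",
# 0.3 to 0.8 is "moderate", and less than 0.3 is "small".
```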


Pattern of results. Good program evaluations present or summarize results for all of the outcomes they measure, not just the ones that produced statistically significant impacts. It is unrealistic to expect that even highly effective programs will produce statistically significant impacts on all of the measured outcomes. And a quirk of the standard practice of applying tests of statistical significance is that even if a program were completely ineffective, for every 100 outcomes tested, you would still expect five of them to emerge as statistically significant simply by chance! "Cherry picking" small numbers of statistically significant results can be very deceptive. Generally speaking, it is the overall pattern of results that matters the most.

Relevance. In reading evaluation reports, it is always useful to ask how relevant the measured program outcomes are to the desired outcomes for your constituents or community. Of the outcomes measured, which do you care most about? Was the program more effective for those outcomes than for others? If you care about boosting children's school achievement, are most of the achievement impacts in the evaluation statistically significant? If one purpose of the intervention is to save money for school districts, did the program produce statistically significant impacts on school-related measures that have financial effects, such as grade failure and enrollment in special education? Use these kinds of questions to guide your assessment of the program's relevance to your goals for the health and development of children.

"Intent to treat" impacts. In evaluations of interventions in which substantial numbers of children or families fail to take up any of the offered services, there is an important technical detail that must be addressed. Should program effects be considered for only those who receive the services or for all families who are offered the program, regardless of whether they participate? This question is illustrated in programs designed to promote residential mobility among public housing residents, in which between one-quarter and one-half of the families that are offered financial assistance and mobility counseling fail to take advantage of the offer. Thus, an evaluation of child and family outcomes influenced by the mobility program faces a choice: should outcome differences between the program and comparison group be calculated across all families offered the chance to move, or only for those families that actually moved in conjunction with the program?

Effects assessed across all children or families offered program services, regardless of whether they actually used them, are called "intent to treat" (or ITT) impacts. They answer the vital policy question about the effects of the program on all families that are offered services. Suppose, however, that services are highly effective for those who participate, but only a small fraction of the targeted children or families actually use them. The intent-to-treat impact estimates will show that the overall impact on targeted families is small and will point to implementation or program take-up as a key problem in program design.

"Treatment on the treated" impacts. Under certain circumstances, it is also possible to isolate program impacts on the subset of families that actually use the services and compare them to families that did not use similar services. These are sometimes called "treatment on the treated" (or TOT) impacts, and amount to scaling up intent-to-treat estimates in proportion to program take-up. Treatment-on-the-treated estimates address important policy questions about program impacts on the children or families who actually use the services. If program take-up is not a concern and you want to concentrate on how a program affects children or families who participate in it, then TOT estimates are most relevant. Finally, when comparing across studies it is important to compare like with like: ITT with ITT impacts or TOT with TOT impacts.

Subgroup effects. Some programs are more effective for some subsets of children or families than for others. For example, an intensive program designed to help low birth-weight babies was found to be considerably more effective for children whose birth weights were close to normal than for children with very low birth weights, some of whom exhibited serious neurological problems. It is common for evaluations to report effects on various subgroups of participants. These findings may be useful for forecasting potential program impacts, particularly if the measured impacts are largest among subgroups with characteristics similar to likely participants in your own community.



4. Do the program’s benefits exceed its costs?
CHECKLIST #4
■■ Cost-benefit accounting provides an important indication of a program's value to the public. Programs that generate the largest surplus of benefits relative to costs (or the most positive rates of return) generate greater value for public and private investments.
■■ Costly programs with large effects are not necessarily better financial investments than inexpensive programs with smaller impacts. Conversely, inexpensive programs with little to no effect may be a waste of money when a more expensive program would generate larger effects. The key calculation, from an economic perspective, is the size of the benefits generated by the program relative to program costs.
■■ The greatest economic returns from investments in early childhood typically are long-term. Thus, it's important to look at costs and benefits longitudinally, and to consider social and economic benefits as a legacy for tomorrow built from sound decision-making today.
■■ Financial payback is not the only measure of a program's worth. Some public investments are made as a matter of social responsibility. In such cases, costs are viewed in terms of efficiency.

A clear and objective analysis of the costs and benefits of specific programs has become an increasingly important consideration for many policymakers as they face decisions about investments in young children. Stated simply, do the total benefits generated by the intervention exceed its costs? Just as business executives want to know how an investment would affect their company's bottom line, it is useful to ask not only whether government program expenditures have their intended effects, but also whether investing in early childhood programs generates financial "profits" for the children themselves, for taxpayers, and for society as a whole.

Costs and benefits. Although the details can be tricky, the basic idea behind a cost-benefit accounting is fairly straightforward. On the cost side, we want to know the value of all the time and money expenditures incurred by the program on behalf of the participants. Salaries typically dominate program costs, and services that provide one-on-one or small-group sessions administered by a professional staff are more expensive than those that are delivered within large groups or by less well-trained personnel.

On the benefit side, we want to know the value of the program for taxpayers and for the participants themselves. For example, if the program reduces grade repetition or assignment to special education classes, the value of savings to taxpayers can easily total thousands of dollars per child. Similarly, substantial long-term impacts on educational achievement can be translated into both higher labor market earnings for the participants and increased tax payments and general economic productivity for society as a whole. By the same token, behavior-oriented interventions can generate benefits through reductions in criminal behavior, as crime generates large costs for adjudication and incarceration as well as for crime victims. Health-related effects can also be important, as reductions in obesity and smoking rates can be linked to savings in health expenditures.

Return on investment. Economists tell us that the most profitable investments are not necessarily generated by programs that produce the biggest "effect sizes" but rather by those that lead to the largest benefits relative to costs. According to such calculations, less intensive programs cost less and therefore do not need to generate the same volume of benefits as more intensive programs in order to produce a social profit. On balance, it is impossible to generalize about the relative profitability of programs based on costs, benefits, or effect sizes taken alone.

Some program evaluations include a detailed cost-benefit accounting in their analyses. If done well (you may wish to consult with someone with expertise in cost-benefit assessment to judge the quality of a specific study's accounting), the obvious question is whether a program's benefits exceeded its costs. Properly done, costs and benefits are calculated on a "present value" basis to reflect the fact that tying up public money in the short run to produce longer-run benefits entails a genuine "opportunity cost" to society. Benefits in excess of costs indicate that a program is a worthy expenditure of public funding from a financial perspective. An equivalent calculation can be made to determine whether the program produced a favorable "rate of return" on the investment.

If a cost-benefit accounting is not provided, it is vital to consider an order-of-magnitude estimate of the likely costs of recommended policy changes. Are costs per child or family likely to amount to $100, $1,000, or $10,000? If services are required for several years to produce their effects, then per-year costs must be multiplied accordingly. If a program provides one-on-one or small-group services, it is likely to be more expensive to deliver. The level of professional training that is required of the service providers will also have a significant impact on cost.

Other measures of value. Notwithstanding the importance of cost-benefit analyses, it is important to remember that some investments may be justified because of their intrinsic value, independent of their financial return. For example, if the policy goal is reducing crime or high school drop-out rates, policymakers and the public may simply be interested in achieving the goal, regardless of what any cost-benefit analysis might show. In other cases, investments in children who are highly vulnerable (such as those who have been abused or seriously neglected) may be justified solely because of their humanitarian significance, independent of the long-term financial gains that may be realized from better health and developmental outcomes. In such cases, cost-effectiveness studies that tell us how to deliver services in the most efficient manner will be more useful than cost-benefit studies that assess their economic payback.
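The "present value" comparison described above can be illustrated with a short calculation. All of the figures below, including the 3 percent discount rate, are hypothetical placeholders; the point is only to show how a stream of future benefits is discounted back to today before being compared with the up-front cost of the program.

```python
# Minimal sketch: comparing program costs and benefits on a "present value"
# basis (all figures and the discount rate are hypothetical).

cost_per_child = 8000      # up-front program cost, paid today
annual_benefit = 1200      # assumed yearly benefit per child (savings plus earnings)
years_of_benefits = 15     # how long the benefits are assumed to last
discount_rate = 0.03       # reflects the opportunity cost of tying up public money

# Discount each future year's benefit back to today's dollars, then add them up.
present_value_of_benefits = sum(
    annual_benefit / (1 + discount_rate) ** year
    for year in range(1, years_of_benefits + 1)
)

net_benefit = present_value_of_benefits - cost_per_child
print(f"Present value of benefits: ${present_value_of_benefits:,.0f}")
print(f"Net benefit per child:     ${net_benefit:,.0f}")
print(f"Benefit-cost ratio:        {present_value_of_benefits / cost_per_child:.2f}")
```

A benefit-cost ratio above 1 corresponds to the favorable "rate of return" described above; a ratio below 1 means the discounted benefits do not cover the program's costs.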

5. How similar are the programs, children, and families in the study to those in your constituency or community?

CHECKLIST #5
■■ Look for specific information about the program. Can you form a clear picture of the services offered and how they differ from what is currently available in your community? Does this match the way in which your own community would provide these services?
■■ Consider the constituency or population for whom you might provide a particular program. How well does the study sample approximate this population? If it does not overlap substantially with your own constituency, examine the study carefully to determine which aspects of the program, if any, might need to be adapted to fit your community's needs.

Let's say you are a businessman in Cleveland, Ohio, wanting to know whether a successful program that was evaluated in Hawaii in 1990 would work as well for your community today. Your first question should be: What kinds of children or families would receive services if the program were implemented in Cleveland? Would it be targeted toward children from low-income families? Children of immigrant parents from particular groups? Children with disabilities? The more precisely you can characterize the intended recipients of the services and how the services differ from what is currently available in your community, the easier it will be to determine the relevance of the findings of a given evaluation study. The more closely the use of services by children in the study's comparison group matches that of children in your own community, the more relevant the study findings will be.

Next, compare the characteristics of the Cleveland target population with those of the children or families in the Hawaiian program evaluation. On how many dimensions (e.g., poverty status, inner-city location, languages used at home and in other settings, parent education levels, cultural beliefs, and parenting practices) are they similar? On what dimensions are they different? If the study was conducted years ago, the circumstances for children with identical characteristics today may differ in important ways. Both the nature and the extent of the diversity of your target group of families in Cleveland are important to consider.

Finally, carefully examine the description of the program. Is it tailored to the particular group in that study in a specific way (e.g., in its language, materials, cultural values, staffing, or approach)? Is it difficult to imagine how the program might be "refitted" for your community? Does it require specially trained and qualified staff who may be too scarce or costly in your community? Some programs might be easier to adapt than others. For example, an intervention that provides a high-quality preschool experience might be easier to reproduce than a child literacy intervention that is based on folk tales among a particular cultural group.

There is much to be learned from rigorous evaluations of early childhood interventions. Applying those lessons to one's own community, however, requires a careful eye toward understanding which aspects of the interventions are most likely to be replicable given your current situation, target population, and goals.



Putting it All Together
To assess the overall value of a program for your constituents or community, there are several overarching guidelines that can help determine how to use the evidence of previous evaluations.

Consider whether the evaluation is strong enough to provide trustworthy evidence. This is the first and probably most important question to be answered. If the study fails to meet scientific standards of strong evidence, it is difficult to assess its program or policy implications.

Consider how closely the program that was evaluated matches your goals. Of all the information provided by the evaluation, which elements are most useful and relevant for your constituency and goals? For example, if reducing the achievement gap is your constituency's primary objective, you might heavily weight Question 3 (How much impact did the program have?), with particular emphasis on whether the outcomes differ for different income or racial/ethnic groups in your community.

Consider how successful programs can be modified to best meet the needs of your particular community or constituency. Although fidelity to the specific methods used in an effective program is critical to achieving similar outcomes in another setting, it is important to note that some programs may require adjustments that make sense for different circumstances. This could be warranted because the particular service systems, organizations, or cultural groups in your community are different from those in the original study. Perhaps the delivery system (e.g., child care providers, preschool, or health care system) should be changed. Perhaps the setting in which services are delivered should be modified. In many circumstances, credible information about costs as well as about how the program was implemented will provide important guidance for determining whether a program is feasible for your constituents. If local factors require changes in a program whose effectiveness has been documented previously, it is essential that the modified program be evaluated to assure that it is achieving the desired results.

Consider getting expert assistance to answer your continuing questions. Mastering the complexities and nuances of evaluation research is beyond the limits (or interest) of most policymakers, civic leaders, and the general public. Thus, developing relationships with trustworthy consultants in the program areas that most interest you may be well worth your time. Researchers are often happy to respond to questions about their own study findings. Getting to know local experts in your community can also be quite helpful for digesting the massive amount of information provided in the full body of program evaluation studies. Trusting relationships with such consultants could be particularly useful in "translating" study findings for local application.



also from the FORUM

A Science-Based Framework for Early Childhood Policy: Using Evidence to Improve Outcomes in Learning, Behavior, and Health for Vulnerable Children (2007)
http://www.developingchild.harvard.edu/content/publications.html

50 Church Street, 4th Floor, Cambridge, MA 02138


617.496.0578
www.developingchild.harvard.edu
