GUIDELINES
ON PERFORMANCE ASSESSMENT
OF PUBLIC WEATHER SERVICES
Geneva, Switzerland
2000
Text by Neil Gordon and Joseph Shaykewich
CONTENTS

CHAPTER 1 — INTRODUCTION
CHAPTER 2 — KEY PURPOSES
   2.1 Ensuring that user requirements are met
   2.2 Ensuring the effectiveness of the public weather services system
   2.3 Ensuring credibility of and support for the public weather services system
CHAPTER 3 — AREAS WHERE ACTIONS ARE REQUIRED TO MEET THE KEY PURPOSES
   3.1 Product definition
   3.2 Delivery mechanisms
   3.3 Production system
   3.4 Research and development
   3.5 Staff training and development
   3.6 Communication
CHAPTER 4 — VERIFICATION
   4.1 Introduction
      4.1.1 Overall purpose
      4.1.2 Accuracy, skill and reliability
      4.1.3 Objective and subjective verifications
   4.2 Guiding principles
      4.2.1 Principles related to why to verify
      4.2.2 Principles related to how to verify
      4.2.3 Principles related to what to do with results
   4.3 Performance measures
      4.3.1 Deterministic forecasts of values of continuous weather variables
      4.3.2 Deterministic forecasts for two categories
      4.3.3 Probabilistic forecasts for two categories
      4.3.4 Deterministic forecasts for multiple categories
      4.3.5 Probabilistic forecasts for multiple categories
      4.3.6 Forecasts of timing of events
      4.3.7 Forecasts of the location of events
CHAPTER 5 — USER-BASED ASSESSMENT
CHAPTER 6 — CONCLUSIONS
   6.1 Introduction
   6.2 Summary
   6.3 How to get started on a performance assessment programme
      6.3.1 Planning
      6.3.2 User-based assessment
      6.3.3 Verification
      6.3.4 Ongoing assessment
   6.4 Final words
REFERENCES
APPENDICES
   1 EXAMPLE OF MONTHLY RAINFALL VERIFICATION
   2 ENVIRONMENT CANADA’S ATMOSPHERIC PRODUCTS AND SERVICES 1997 NATIONAL PILOT SURVEY
   3 HONG KONG OBSERVATORY SURVEY
Chapter 1
INTRODUCTION
Weather services delivered to the public are one of the most visible returns for the taxpayers’ investment in meteorological services. It is difficult to quantify this particular Return On Investment in financial terms. It is both possible, and essential, to carry out ongoing performance assessment of public weather services to ensure that they are efficiently and effectively meeting the public’s needs.

There are many technical papers and publications on the narrow topic of forecast verification, including numerous accuracy and skill scores. There is less material available by way of guidance on why and how verifications should be carried out, and on the more general topic of assessing whether user needs are being met, rather than just whether forecasts are accurate. Forecast accuracy is irrelevant if the forecast products are not available to the public at a time and in a form that is useful.

The purpose of this Technical Document is to provide broader guidance on performance assessment of public weather services, with something of an emphasis on forecasts and warnings. An assessment programme can be seen in the context of a quality system, where it is important to ensure that the information gathered and processed is focussed on user requirements, to be used in making decisions and taking actions to improve performance, rather than just being gathered for the sake of it. In essence, the object of the exercise is to ensure a sustainable and cost-effective system delivering quality public weather services.

The guidelines are based on an outline developed at a meeting of the WMO Public Weather Services Expert Team on Product Development And Verification And Service Evaluation, in Hong Kong, China in November 1999. Two of the terms of reference of this team were to “Prepare recommendations on standardised verification techniques for public warnings and forecasts”, and to “Prepare guidelines on technical and user-oriented verification mechanisms including measures of overall satisfaction with the service”. This guidance addresses both terms of reference in the context of overall performance measurement, but does not provide hard and fast rules on standardised verification techniques.

Some of the basic guidelines about performance assessment include:
• Know why you are carrying it out (what new information do you want to discover?)
• Do not just collect and process information and then file it away
• Be prepared to take actions based on the results
• Gather information designed to help a National Meteorological Service (NMS) make strategic decisions about all aspects of public weather services
• Favour simplicity where possible, rather than overly complicated schemes
• Be very careful about the statistical significance of results based on small samples or short records
• Provide regular reports to stakeholders
• Make relevant, interpreted, information available to the public.

There are two major methods available for gathering information in an assessment programme – Verification, and User-Based Assessment. Neither can stand alone. It is important to do both, in a balanced fashion. The amount of effort spent on each will depend on the country, the nature of the services, and the user community. The worst thing would be not to do either of them!

The overall purpose of Verification of forecasts is to ensure that products such as warnings and forecasts are accurate, skilful and reliable from a technical point of view. As far as possible, forecast verifications are produced in an objective fashion, free of human interpretation. The results tend to be numbers and statistics, which can be manipulated and interpreted using statistical theory. There is no guarantee that verification results will match people’s perceptions of how good the forecasts are. Nonetheless, information gathered through verification can be very useful for improving the accuracy of forecasts.

On the other hand, User-Based Assessment should give a true reflection of the user perception of products and services provided by the NMS, as well as qualitative information on desired products and services. It is almost completely subjective information, subject to human perception and interpretation.

In carrying out an assessment programme combining both methods, there are some commonalities. Although verifications may typically provide objective numbers, they should still be based around numbers which are relevant to users. It should be possible to match user-based assessment results (e.g., of perceptions of forecast accuracy) with corresponding technical verification results, and seek common trends and patterns. In both methods, there is no single score or method that can give “The Answer”. Various scores and assessment methods have their particular uses.

In Chapter 2 of this Technical Document, the three key purposes for performance assessment will be discussed. Services can only improve if actions are taken – the six main areas where actions are required are dealt with in Chapter 3. Chapter 4 considers in detail how to carry out Verifications, and Chapter 5 is on User-Based Assessment. The final chapter reviews why and how to carry out an assessment programme, and provides some guidance on an “entry-level” programme.
Chapter 2
KEY PURPOSES
There are three key purposes for carrying out an assessment programme for public weather services. They are:
(1) Ensuring that public weather services are responding to user requirements
(2) Ensuring the effectiveness and efficiency of the overall public weather services system
(3) Ensuring the overall credibility and proven value of public weather services.
Another way of looking at this is that the three purposes are about:
(1) Making sure that you are providing the right products
(2) Making sure that you have a good system for making them
(3) Building stakeholder support for the NMS.

2.1 ENSURING THAT USER REQUIREMENTS ARE MET

There are a wide variety of end-users of public weather services. These include individual members of the general public, emergency management agencies, and paying customers for specialised services.

In order to make sure that user requirements are being met, first of all it is necessary to know what they are – and what better way than asking the users? This topic is covered extensively in Chapter 5.

The definition of the needs in the particular case of weather forecasts can encompass what weather elements are most important, when and how forecasts should be delivered, in what format, and with what accuracy.

Knowing what the needs are, it is necessary to find out whether they are being met, and take actions to improve where possible. This may be as simple as checking and then changing the issue time of forecasts to make sure that they are available when they are most useful. It can also involve keeping score on how many forecasts are issued late, and changing management practices and schedules to ensure that forecasts are issued on time.

Verifying the accuracy of forecasts is, of course, another aspect. But it needs to be done in ways that are relevant to the user, who has probably never heard of a “Brier Score”.

2.2 ENSURING THE EFFECTIVENESS OF THE PUBLIC WEATHER SERVICES SYSTEM

It is one thing to provide public weather services that meet user needs – and quite another to do it effectively and efficiently, from an overall point of view. This purpose is not about what is delivered and how. Rather, it is about the organization, management and planning of the overall public weather services system that delivers the services.

A performance assessment programme can gather information that can be used to make strategic decisions about the future delivery of services, about staffing, about training, research and development, and about the best mix of information from computer models and from human value adding.

2.3 ENSURING CREDIBILITY OF AND SUPPORT FOR THE PUBLIC WEATHER SERVICES SYSTEM

Even if public weather services have been designed and delivered to meet user needs, there may be a perception problem over how good they are. This can be serious, and life threatening. For example, if the public has a poor perception of the accuracy of tropical cyclone forecasts, they may disregard warnings, resulting in major loss of life and property. Even in the best of all possible worlds weather forecasts will never be perfect, so this can be a vicious circle, with public credibility declining every time there is the inevitable poor forecast.

An assessment programme can assist in two ways – by finding out what the public perceptions are, and by gathering and publicising facts about performance to improve the public perception and credibility of the services. Those occasions when forecasts do go wrong can be used as opportunities to publicise the role of the NMS and to draw attention yet again to the fact (gained from the assessment programme) that, say, forecasts are usually 85% accurate.

Similar information on performance can be incredibly useful for gaining the support of other stakeholders, including government ministers responsible for the NMS. The NMS will be in a much stronger position for sustaining and building funding if it can demonstrate such things as its level of performance, public satisfaction with its services, and the impacts of previous investment and research and development programmes.
Chapter 3
AREAS WHERE ACTIONS ARE REQUIRED
TO MEET THE KEY PURPOSES
There is no point in gathering information through an assessment programme without using it. Using it means taking actions. This chapter is about the six main areas where actions need to be taken – mostly through changing what is being done now (unless it is perfect, which is unlikely!) or making plans for future changes. The six action areas are to:
(1) Improve the products to be provided
(2) Improve how the products are delivered
(3) Improve the production system
(4) Carry out needed research and development
(5) Train and develop staff
(6) Communicate information.

All of these action areas should involve feedback loops. Information is gathered on user requirements and on performance levels. Actions are taken to improve matters. The final step of “closing the loop” is also important – checking what the actual impact was of those actions, in order to learn how to do better next time.

Of course, there is also an assumption here that the NMS has the resources and staff to take such actions. There may well be a gap between the measured performance and expectations, but no ability to improve it because of lack of resources, or because there are no people available to carry out training.

The fundamental management issue here, which is beyond the scope of this Technical Document, is how best to allocate limited resources (and they are always limited) to best effect, to improve the situation, based on the information gathered from the assessment programme.

3.2 DELIVERY MECHANISMS

… they have for accessing and receiving products, and then to improve the delivery system to better meet those needs.

3.3 PRODUCTION SYSTEM

There are many aspects of the production system that may need to be changed as a result of information gathered in an assessment programme. Just a few of the numerous possible changes are:
• Re-configuration of data networks to gather new data required for products and services, possibly at the expense of data which may no longer be required
• Obtaining new sources of local or global NWP model information on which to base new products and services
• Revising shift schedules to accommodate new, or modified (or discontinued!) products
• Revising shift schedules to accommodate new delivery times
• Installing systems (e.g., fax machines, or a web server) for new means of delivery of products
• Using more automated products (e.g., for maximum temperature forecasts) if verifications prove that these satisfy accuracy requirements and they can be cost-effectively produced
• Devoting more forecaster shift time to producing critical warnings which have proven not to be accurate enough
• Centralising forecasting, or de-centralising forecasting.
3.5 STAFF TRAINING AND DEVELOPMENT

Once again, there are many actions that may take place as a result of information from a performance assessment programme. A few examples are:
• Recruiting and training more forecasters based on projected shift requirements from planned introduction of new products and services
• Training staff to make use of new numerical guidance information
• Training staff on the scientific basis of a new product, and operational procedures for producing it
• Re-training staff on the fundamental meteorology of a weather phenomenon which verifications show is being poorly forecast
• Training staff on how to write forecasts in a new and more “user-friendly” style (which surveys have shown the public would find more useful)
• Training staff on how to reduce a known bias of over-forecasting precipitation occurrence.

3.6 COMMUNICATION

One of the most important actions that must be taken is to communicate the results and information gathered from a performance assessment programme. Information is only of value if people know about it. It must be in a form that is understandable to the audience, and tailored to their likely use of it.

Firstly, information gathered must be made available to the staff of the NMS. Managers need information to guide them in decision making. Forecasters need information by way of feedback on their performance, particularly in relation to systematic errors that may need to be corrected. Researchers need information on performance of the system, and on likely new products so they can plan and prioritise R&D. All staff need information on the technical accuracy of the services delivered, and on public expectations, perceptions and needs. All staff should have a sense of ownership, accountability, and pride in what is being delivered to the users.

Secondly, relevant and appropriate information must proactively be made available to stakeholders in general. This may be a formal requirement of some kind of “Service Charter” or agreement with the government or community at large on services to be provided. Communicating such information is particularly important in relation to the third key purpose of “Ensuring the overall credibility and proven value of public weather services”. If there is a vacuum of information, particularly on demonstrated performance, public perceptions will be based on anecdotal evidence. People tend to remember the last time a forecast went wrong – not how well forecasts do overall.

The most important stakeholder is the source of funds for the NMS – the government on behalf of the taxpayers. Information from a performance assessment programme must be communicated to demonstrate performance, to demonstrate the beneficial impacts of previous investment in the NMS, and in support of future plans for the development of the NMS.

Finally, and often in reaction to events, information must be communicated to the public via the media when opportunities present themselves. A good example is when there has been a severe weather event. Whether or not this was well forecast, the public interest in severe weather is heightened, and this is a good opportunity to include information on overall performance of the public weather services as part of the “weather story”, to build public support and credibility.
Chapter 4
VERIFICATION
4.1 INTRODUCTION

4.1.1 Overall Purpose

The overall purpose of verification is to ensure that products such as warnings and forecasts are accurate, skilful and reliable from a technical point of view. This is distinct from whether the products are actually meeting user needs, which is covered separately in the next chapter. Nonetheless, the technical assessments should be in terms of measures that are relevant to user needs.

There are many dimensions and techniques of forecast verification. This Technical Document is not intended to cover all possibilities, but to provide sufficient general information on the possibilities. An extensive survey of verification techniques was carried out by Stanski et al. (1989) and published by WMO. The work by the late Allan Murphy (1997) is also worth reviewing for his philosophy on verification, and for the list of references.

4.1.2 Accuracy, Skill and Reliability

In concept, forecast verification is simple. You just need to compare the forecast weather with the observed weather that actually occurred. The accuracy¹ of a forecast is some measure of how close to the actual weather the forecast was. The skill of a forecast is taken against some benchmark forecast, usually by comparing the accuracy of the issued forecast with the accuracy of the benchmark. A benchmark forecast can be something simple such as climatology, chance, or persistence, or it could be a partly or completely automated product. The skill measure should give some meaningful information about what value has been added in the forecast process, compared to the usually much simpler or cheaper benchmark forecast.

There is a great deal of theory and practice about measures of forecast accuracy, involving sometimes-complex formulas for comparing frequency distributions of forecast versus observed weather. Usually, an accuracy measure gives information on the spread of differences between forecast and observed. A typical example is the Root-Mean-Square Error (RMSE) – the square root of the mean of the squared difference between forecast and observed.

Reliability is another aspect of forecast accuracy (it does not involve comparison with a control forecast). Literally, this means the extent to which the forecast can be “trusted” on average. One measure of reliability would be the average bias in a maximum temperature forecast – the average of the forecast values minus the average of the observed values.

Reliability measures are also used to assess how closely forecasts expressed in probability terms match reality. For example, suppose you were verifying a set of many forecasts of the probability of occurrence of rain. Suppose also that there were 100 occasions when the forecast probability was around 30% (e.g., between 25% and 35%), but it only rained on 10 of those occasions. The implication is that the forecasts of a probability of 30% chance of rain were not very reliable, since it really only rained 10% of the time on average.
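As a minimal sketch of this kind of reliability check (in Python; the probability array, outcome array and bin edges are illustrative assumptions, not data from any NMS):

    # Group probability-of-rain forecasts into ranges and compare each range's
    # mean forecast probability with the observed frequency of rain.
    def reliability_table(probs, rained, edges=(0.0, 0.25, 0.35, 1.01)):
        rows = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            cases = [(p, o) for p, o in zip(probs, rained) if lo <= p < hi]
            if cases:
                mean_p = sum(p for p, _ in cases) / len(cases)
                obs_freq = sum(o for _, o in cases) / len(cases)
                rows.append((lo, hi, len(cases), mean_p, obs_freq))
        return rows

    # Reliable forecasts have obs_freq close to mean_p in every range; forecasts
    # near 30% that verify only 10% of the time, as above, are unreliable.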
4.1.3 Objective and Subjective Verifications

There are two main ways of verifying forecasts – objective and subjective.

Objective verification is based on purely objective comparisons of forecast and observed weather elements. There is no element of human interpretation of either the forecast or observation.² The results can be replicated. Objective methods should be based on sound statistical theory – essentially the comparison of observed and forecast numbers.

Subjective methods involve some human assessment of forecasts and/or observations. They are a result of human perception, and the results are not always consistent and cannot necessarily be replicated. However, these perceptions are a true reflection of the value of the forecast to the individual or user who does the assessment.

4.2 GUIDING PRINCIPLES

Unless careful planning is done, there is a risk that a verification programme will never get off the ground, or that it will be engulfed in an avalanche of numbers that are never used. The purpose of this section is to suggest guiding principles on the Why, How and What Next of Verification.

4.2.1 Principles Related to Why to Verify

There are four main reasons for verifying forecasts:
(1) We must know the quality of our products
(2) We need information to aid decision-making
(3) We need information to feed back into process improvement
(4) We need appropriate information for reporting to users and other stakeholders.
¹ There is sometimes confusion between accuracy and precision. The precision of a forecast is how much detail is put into it in time, space, weather elements, and numbers of significant digits in numerical values. For example, a forecast maximum temperature of 23.42963°C would be very precise, but that does not make it accurate!

² The observed weather element may of course have been made by a human observer as part of a routine weather observing programme – this can be distinguished from subjective assessment of observations such as estimating precipitation that occurred in a spot in a data-sparse region.
Knowing the Quality of the Products

It is essential for any service provider to know the quality of the products and services they provide. However, historically, because of some of the perceived difficulties of verifying weather forecasts, and the work involved, NMSs have probably not done this as much as they should have.

That time of not knowing is now over. In an era of shrinking budgets for NMSs, increased demands for accountability for expenses and investments, and competition, NMSs must know how well they are doing. Assumptions about how well they are doing are no longer good enough.

Furthermore, the information gathered on forecast quality can be extraordinarily valuable, provided that it is carefully gathered and analysed, and appropriately used. Information on forecast quality is like having a medical check-up – it can help you work out what parts of your forecast production system are working and what are not. It can provide facts rather than assumptions for discussions with customers, and the media, and the government.

Information to Aid Decision Making

NMSs are continually making decisions that involve allocation of resources, staffing, training, research and development, and large expenditures. It is vital to make sure that sufficient information is available on the quality of the final output products to support these decisions.

Measuring and quantifying forecast performance allows you to compare forecasters, and forecast systems, and perform “what if” scenarios on how different systems might perform.

Many examples of where actions can be taken, and decisions made, can be found in Chapter 3 of this Technical Document.

Feed Back into Process Improvement

Verification results should provide information that is of value in ongoing process improvement in forecast operations. Just one simple example would be recognition that rain is forecast far too often. Verification information can be analysed further to see what the weather conditions are like when the forecast was wrong, and to look for trends. You might find that there are particular weather conditions when the over-forecasting is taking place. Forecasters can use this information to improve their own performance, and it can be used to drive research and development projects.

Since verification involves a comparison between forecasts and observations, it can be used to pick up quality problems in either. If the forecasts are being passed through some automatic decoder program that is having problems, this may indicate that some forecasters are using the wrong syntax for writing their forecasts. (This can be fixed by training the forecasters to do better, or by putting new systems in place that do not allow forecasts to be written the wrong way to start with.) Large, systematic differences between the forecast and observation may turn out to be a problem in the observation, not the forecast!

Appropriate Information for Reporting

Much of the information from a verification programme can be used internally. However, there is also an increasing, and perfectly understandable, demand from users and other stakeholders for information on the quality of products and services. Providing such information can be very useful for an NMS.

Users sometimes have an incorrect perception of the quality of forecasts, which can be corrected by sharing appropriate verification information with them. Of course, the verification information may also validate their perceptions of poor forecasts – there is no point in hiding this, but there will be value in discussing the issue with users and working together on how the forecasting can be improved to better meet their needs.

Government ministers like to have proof of “value for money” expended on NMSs, and particularly like to see evidence of improvements over time, as a payback for money that they have committed to the NMS budget.

Verification information can be useful in dealings with the media, particularly when countering any negative publicity on a particular forecast that may have gone wrong.

A key word here, of course, is “appropriate”. Information for reporting purposes needs to be carefully selected, simple, and relevant. Complicated and hard to understand scores will not enhance the image of the NMS.

4.2.2 Principles Related to How to Verify

When considering how to conduct verification, it is vital to refer back to the principles in the previous section on why verification is being done. If the “how” of verification is not answering questions or providing information needed under “why”, then it may not be needed.

There are four key principles on how to verify forecasts:
(1) There should be an overall plan
(2) Measures must be relevant to the users (internal and external)
(3) Keep it simple
(4) Use consistent elements, locations, methods and scores.

Overall Plan

Before embarking on a verification programme, it is very worthwhile to take some time to develop an overall plan. This should cover many of the issues addressed in this Technical Document, focussing on particular issues for your country. Those staff who will be producing and using the results need to be involved in the development of the plan, to ensure ownership, a commitment to success, and broad understanding of the purposes.
[Diagram: information flows in an operational verification system – observations, NWP information and forecast products feed the verification system; reports flow to customers, the media and government; feedback is used to re-configure the observing system and adjust NWP.]
The plan needs to take into account why the measures are being produced.

The diagram above illustrates the overall information flows in an operational verification system. Meteorological information and product flows are shown with straight lines. Observations are used in NWP and by forecasters, who then produce products, which go to users. The observations, NWP information, and products also feed into the verification system. This system employs user expectations, to produce reports for the paying customers, and for the media and government and other stakeholders. Information from the verification system may also be analysed and used to make decisions about re-configuring of the observing system, what research and development may be done to improve NWP and to feed into training to improve forecaster performance, and also to adjust the definition and format of products.

User-relevant Measures

Information should be relevant to the needs of the users. There is little point in producing scores that are complex and satisfying theoretically, and have all the right attributes of proper³ scores, if no one can understand or use them. For example, scores which give “percent correct” accuracy are not always favoured by the theoreticians, but they are easily understood by the public.

³ A “proper” score is one that encourages a forecaster to forecast what he or she truly believes, rather than biasing (or hedging) the forecast one way or another in the hope of producing a better score.

It is important that the verification scheme truly reflects the perception of the public or users on the accuracy of the forecast. Surveys may show that the public believe that a temperature forecast is “correct” if it is within 3°C, and verifications can then be made in those terms. However, a higher level of accuracy may be needed by an electricity supplier wanting to forecast power demand, for whom the temperature forecast may need to be within 1°C.

It is also important that the system captures how good performance is for the times when the forecast most needs to be right – the relevant and critical times. For example, in a place that rarely gets frosts, a constant forecast of “no frost” may be right 99% of the time, but is clearly of no value, since it always says the same thing.

Depending on the climate of the region and the time of year, some weather elements are more important than others. For example, there may be little value in verifying maximum temperatures in a region where they vary little from day to day.

You may also take into account the needs of internal users of the information for decision making. For example, some particular skill measures may be useful when making decisions on the value of numerical guidance and the value added by forecasters.

Keep it Simple

Embarking on a verification programme can be a daunting prospect for an NMS with little experience in this area. It is better to use simple, easy to understand measures, than to
implement very complex schemes. It is also better to concentrate on verifying for just a few key places, rather than trying to verify many weather elements for many places. Keeping the number of verifications down avoids being buried in numbers that are never analysed, and keeps costs down.

Consistency

One of the most useful aspects of verification information is that the results can be tracked with time to see how performance is (one hopes) improving. But performance cannot be tracked if the weather elements, locations, methods and scores keep changing. And tracking performance in a statistically significant way may take a long time series of information. For example, at least four years of data will be needed to analyse seasonal differences in performance in a meaningful fashion.

It is, therefore, important to ensure consistency in an ongoing verification programme. You should be consistent by using the same weather elements, from the same locations, for the same times, and using the same accuracy and skill measures. Then results can be tracked in time, rather than trying to work out whether changes in skill were due to using a new score, or to verifying for a different location after a couple of years.

However, it can also be very useful to save the raw data used for the verifications so that if some new verification method is introduced it may be possible to go back and recompute the verification results from the beginning.

4.2.3 Principles Related to What to Do with Results

The ultimate benefit of a verification programme will only come about when the results are used, in support of the four reasons we are actually doing verifications (see Section 4.2.1). The key principles are quite simple, really:
(1) Use the results
(2) Do not misuse the results.

Using the Results

Communicate them: In general, the results should be communicated appropriately and promptly, rather than just being filed away. This will facilitate general use of the information. Communication includes reporting to users and stakeholders, and providing direct, immediate feedback to forecasters. Forecasters are usually very interested in the results of verification. They want to know if they have systematic errors in their forecasts so that they can correct them.

Analyse them: The results should be analysed to assist in decision making. If the verification results are not acceptable, then decisions may need to be made on the end-to-end forecasting process in order to improve matters. This could include improved data gathering, better numerical guidance, research and development targeted at the weather elements being verified, training programmes, improved procedures, processes and tools in the forecast room, and staffing levels. Analysis of the results should be ongoing to ensure that benefits are coming from these improvements. If the results are acceptable, this information can be used to validate previous decisions, and to assess the likely future impact of new decisions to be taken.

Use them for process improvement: On a shorter timescale, verification results should provide information that is of value in ongoing process improvement in forecast operations. Just one simple example would be recognition that maximum temperature forecasts for a city tend to have a warm bias (say, of 1.5°C) – forecasters can use this information to improve their own performance.

Not Misusing the Results

Verification results based on small sample sizes, or of rare events, may have very large margins of error. It is a good idea, where possible, to compute error bars on verification results. Care is needed in interpreting information that has poor statistical validity. This includes being too proud of very good results (which may not last!) or too concerned about very poor results (which hopefully also won’t last!).

You should be careful to double check the results if they are either very good or very bad – there may have been a problem with the data or with the computer programs.

Care must also be used in trying to compare results between regions with different climates, which may not be meaningful, even if the verification methods were exactly the same.

4.3 PERFORMANCE MEASURES

There are many scientific papers and documents on various measures of performance that can be used for verification. See, for example, Stanski et al. (1989), and Murphy (1997). The intent of this Technical Document is not to duplicate such material, but to give a sample of the simplest and most common measures that can be used, together with some brief examples of their application.

There are two fundamentally different types of variables, which can be forecast in two fundamentally different ways. The two types of variables are continuous (numbers), and categorical (e.g., rain or no-rain, or a category of precipitation amount). They can be forecast either deterministically, by giving just a single value or category, or probabilistically, through giving some information on the probability distribution of the continuous number, or the individual probabilities for the possible categories which could occur.

A forecast expressed in probability terms is more useful for making decisions than a forecast that explicitly states what will occur. The user can choose to take one or other decision based on the probabilities, and their particular knowledge of the costs of taking decisions, and rewards or losses depending on the weather that actually occurs. In the final analysis, the value of a probabilistic forecast comes down literally to the value that such a sophisticated user can extract by making decisions based on the forecast rather than some benchmark assumptions.
In this section typical performance measures for the most common types of forecast will be discussed.

4.3.1 Deterministic Forecasts of Values of Continuous Weather Variables

The most common forecasts are of actual values of weather elements, as real numbers (as distinct from probabilistic forecasts of numbers). Examples of such weather elements are:
• Temperature
• Wind speed
• Wind-chill
• Humidity
• Precipitation amount.

The following simple example of a set of twenty maximum temperature forecasts will be used in this section to illustrate the scores. Both the forecasts and the observations have been rounded to the nearest whole degree Celsius, since this is how the public usually see or hear them. In real life, twenty forecasts would be far too small a sample to draw any conclusions from. This example is purely intended to explain the various scores and how they can be interpreted. The table includes other columns of information, which will be explained later.

    MAX TEMP (°C)
    Forecast (F)  Observed (O)   F-O   ABS(F-O)  (F-O)^2  Within ±2°C
        17            17           0       0        0         1
        24            20           4       4       16         0
        28            29          -1       1        1         1
        22            25          -3       3        9         0
        14            16          -2       2        4         1
        16            17          -1       1        1         1
        17            17           0       0        0         1
        16            16           0       0        0         1
        15            14           1       1        1         1
        19            18           1       1        1         1
        22            19           3       3        9         0
        21            17           4       4       16         0
        16            18          -2       2        4         1
        20            18           2       2        4         1
        27            31          -4       4       16         0
        21            20           1       1        1         1
        15            14           1       1        1         1
        22            28          -6       6       36         0
        20            23          -3       3        9         0
        15            18          -3       3        9         0
    Average:  19.4      19.8      -0.4     2.1      6.9       60%
                                  Bias     MAE      MSE    % correct

Reliability

Suppose there are N forecasts f_i and corresponding observations o_i for i = 1...N. A gross measure of reliability is the mean bias. It is simply the average of the forecast value minus the average observed value, or

    bias = \frac{1}{N} \sum_{i=1}^{N} (f_i - o_i)

For our simple example, N is 20, the average forecast maximum is 19.4°C and the average actual maximum is 19.8°C, so there is a slight bias of -0.4°C – on average the forecast maxima were 0.4°C colder than the actual maxima.

Other more complicated reliability measures can be computed. For example, the bias could be considered separately for forecasts of colder than 20°C, compared to forecasts of 20°C or more, to see whether the bias depends on the forecast. It might be that forecasters tend to underdo the maximum temperatures more when they expect it to be colder. Before carrying out calculations of more detailed bias information such as this, it is important to think about what reason there might be for variations.

Another way of looking for bias is simply to plot the forecast versus observed values. This is easily done these days using standard spreadsheet software. While twenty cases is far too small a sample to draw any definitive conclusions from, such a plot for this example hints that both the coldest forecasts and the warmest forecasts tend to be too cold.

[Figure: scatter plot of Observed Max against Forecast Max, both axes 10–35°C, with the diagonal line representing a “perfect forecast”.]
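As a minimal sketch of how the bias, and the conditional breakdown suggested above, might be computed (in Python, using the forecast/observed pairs from the example table):

    # Forecast and observed maximum temperatures from the example table.
    forecast = [17, 24, 28, 22, 14, 16, 17, 16, 15, 19,
                22, 21, 16, 20, 27, 21, 15, 22, 20, 15]
    observed = [17, 20, 29, 25, 16, 17, 17, 16, 14, 18,
                19, 17, 18, 18, 31, 20, 14, 28, 23, 18]

    # Mean bias: average of forecast minus observed (-0.4 degC for this sample).
    bias = sum(f - o for f, o in zip(forecast, observed)) / len(forecast)

    # Conditional bias: does the bias depend on whether a cold or a warm
    # maximum was forecast?
    cold = [(f, o) for f, o in zip(forecast, observed) if f < 20]
    warm = [(f, o) for f, o in zip(forecast, observed) if f >= 20]
    bias_cold = sum(f - o for f, o in cold) / len(cold)
    bias_warm = sum(f - o for f, o in warm) / len(warm)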
Accuracy

Various accuracy measures are shown in the previous table for this example. In terms of accuracy, the Mean Absolute Error or MAE is:

    MAE = \frac{1}{N} \sum_{i=1}^{N} |f_i - o_i|

For the example, this is 2.1°C. The MAE is a very simple measure of accuracy to use and to explain to users – “it’s the average difference between the forecast and observed temperature”. However, people are often more concerned about the large errors, and this measure does not take these into account as much as the Mean-Square Error.

The Mean-Square Error or MSE is

    MSE = \frac{1}{N} \sum_{i=1}^{N} (f_i - o_i)^2

For the example, this is 6.9. The MSE is affected more by large errors, and has the nice statistical property of
being a “proper” score – forecasters will do best if they always forecast the average of what they truly believe the maximum temperature is likely to be. It is also the quantity that is minimised with classical linear regression equations that try to relate some predictor variables to the variable being predicted (the predictand). However, the MSE has unfriendly units of °C squared. So, instead, what is usually used is its square root.

The Root-Mean-Square Error or RMSE is

    RMSE = \sqrt{MSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (f_i - o_i)^2}

This has units of °C, and for the example the RMSE is 2.6°C.

Another measure that is commonly used for weather elements such as temperature is the “percent correct” of forecasts that are within some allowable range, e.g., within ±2°C or ±3°C. This is shown in the above table by putting a 1 when the forecast was within ±2°C of the observed maximum, and 0 otherwise, then averaging the values. The result for this example is that 60% of the forecasts are within ±2°C.

It is obviously crucial for this measure to know what the public or specialised user considers to be a “correct” forecast. But this measure of accuracy is a very simple and useful one to explain to the public once this has been decided.
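The accuracy measures above, as a short Python sketch (reusing the forecast and observed lists from the bias sketch earlier):

    # Mean Absolute Error: 2.1 degC for the example.
    mae = sum(abs(f - o) for f, o in zip(forecast, observed)) / len(forecast)

    # Mean-Square Error (6.9) and Root-Mean-Square Error (about 2.6 degC).
    mse = sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(forecast)
    rmse = mse ** 0.5

    # Percent correct: fraction of forecasts within the allowable range (60%).
    # What users regard as "correct" (here within 2 degC) must be decided first.
    tolerance = 2
    pct_correct = sum(1 for f, o in zip(forecast, observed)
                      if abs(f - o) <= tolerance) / len(forecast)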
Skill

Skill is measured against some benchmark forecast – typically climatology, persistence, or perhaps a numerical guidance forecast. Continuing with the same example, suppose that the benchmark forecast is taken to be the climatological maximum temperature for this period of 20°C. Then the corresponding table for this benchmark forecast is:

    MAX TEMP (°C)
    Benchmark (F)  Observed (O)   F-O   ABS(F-O)  (F-O)^2  Within ±2°C
        20             17           3       3        9         0
        20             20           0       0        0         1
        20             29          -9       9       81         0
        20             25          -5       5       25         0
        20             16           4       4       16         0
        20             17           3       3        9         0
        20             17           3       3        9         0
        20             16           4       4       16         0
        20             14           6       6       36         0
        20             18           2       2        4         1
        20             19           1       1        1         1
        20             17           3       3        9         0
        20             18           2       2        4         1
        20             18           2       2        4         1
        20             31         -11      11      121         0
        20             20           0       0        0         1
        20             14           6       6       36         0
        20             28          -8       8       64         0
        20             23          -3       3        9         0
        20             18           2       2        4         1
    Average:  20.0      19.8        0.3     3.9     22.9       35%
                                   Bias     MAE      MSE    % correct

For example, if MAE_f is the Mean Absolute Error of the forecast, and MAE_b is the Mean Absolute Error of the benchmark, then one skill measure is

    \frac{MAE_b - MAE_f}{MAE_b} = 1 - \frac{MAE_f}{MAE_b}

which will be zero when the forecast has the same accuracy as the benchmark, and 1 when the forecast is perfect. This is typical for a skill measure. Note, however, that since forecasts are (almost) never perfect, the practical upper limit of a skill measure may be much smaller than 1. For this particular example, the skill measure based on MAE is:

    1 - \frac{MAE_f}{MAE_b} = 1 - \frac{2.1}{3.9} = 0.45

If MSE_f is the Mean-Square Error of the forecast, and MSE_b is the Mean-Square Error of the benchmark, another skill measure is effectively the reduction of variance, or

    1 - \frac{MSE_f}{MSE_b}

For the example of 20 maximum temperature forecasts this is:

    1 - \frac{6.9}{22.9} = 0.70

If the accuracy measure being used is the percent correct (of forecasts that are within an acceptable range of the observations), then another skill measure is:

    \frac{PC_f - PC_b}{100\% - PC_b}

And for the example this is

    \frac{0.60 - 0.35}{1 - 0.35} = 0.38

where the value of 0.38 means that the percent correct for the actual forecasts has gone 0.38 of the distance between the benchmark value of 35% and a perfect score of 100%.
4.3.2 Deterministic Forecast for Two Categories

Typical two-category forecasts are:
• Yes or No for occurrence of precipitation
• Yes or No for occurrence of severe weather
• Rain versus snow.

As can be seen, such a forecast can usually be expressed as yes or no for an event. These are sometimes called forecasts of a dichotomous variable. The combination of forecasts and observations for a set of forecasts being verified can be put into a contingency table such as:

                      Observed
                     Yes    No
    Forecast   Yes    A      B
               No     C      D
To illustrate the use of this, suppose there has been a set of forecasts of whether or not there will be measurable precipitation “today”. These could be spot forecasts that there would be greater than 0.1 mm rain between 6 am and 6 pm during the daytime, together with observations from that spot on whether or not precipitation was measured.

The following table shows the results for this example, for a month’s worth of data (31 days). Again, there are not many numbers here, but the purpose is to show the use of various scores. The numbers come from an example, which is shown in Appendix 1, together with all the reliability, accuracy and skill measures, which will now be described, and a few more.

                      Observed
                     Yes    No
    Forecast   Yes    19     4
               No      2     6

If the event is a significant or a rare one, there may not actually be any count of the times when the event was neither forecast nor occurred. This could be the case, for example, with warnings of heavy rainfall. The numerous times when a warning was not issued, and when heavy rain didn’t occur, may not actually be counted. In this case, it is common to use three measures of accuracy – POD, FAR and CSI.

The Probability of Detection (POD) is the proportion of times the event occurred that it was correctly forecast:

    POD = \frac{A}{A + C}

For the example of rainfall forecasts this is:

    POD = \frac{19}{19 + 2} = 0.90
Another measure is the Hanssen and Kuipers Score (HKS). For the example this is:

    HKS = \frac{A}{A+C} + \frac{D}{B+D} - 1 = \frac{19}{19+2} + \frac{6}{4+6} - 1 = 0.90 + 0.60 - 1 = 0.50

This skill score also does not make explicit use of a benchmark forecast. However, a naïve forecast of always forecasting “yes”, or always forecasting “no”, will give a score of zero. Similarly, a naïve forecast with a random choice each time between yes and no will also have an expected score of zero. Positive values of the HKS therefore represent skill over these naïve forecasts, with a score of 1 for perfect forecasting.
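These two-category measures as a Python sketch. The POD and HKS lines follow the formulas above; the FAR and CSI formulas shown are the conventional definitions (their statements fall in a gap in the surviving text), so treat those two lines as assumptions:

    # Contingency table counts from the rainfall example:
    # A = hits, B = false alarms, C = misses, D = correct rejections.
    A, B, C, D = 19, 4, 2, 6

    pod = A / (A + C)                      # Probability of Detection: 0.90
    far = B / (A + B)                      # False Alarm Ratio (conventional): ~0.17
    csi = A / (A + B + C)                  # Critical Success Index (conventional): 0.76
    hks = A / (A + C) + D / (B + D) - 1    # Hanssen and Kuipers Score: 0.50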
4.3.3 Probabilistic Forecast for Two Categories

Suppose there are N forecasts of the probability p_i of an event, with corresponding observations o_i equal to 1 when the event occurred and 0 when it did not. A gross measure of reliability is the Bias – the ratio of the average forecast probability to the average observation:

    Bias = \frac{\frac{1}{N} \sum_{i=1}^{N} p_i}{\frac{1}{N} \sum_{i=1}^{N} o_i}

For the example, the average forecast is 0.51 and the average observation is 0.40, so there is a Bias of 1.28 – over-forecasting of the probabilities.

Other reliability measures can be generated by dividing the forecast probabilities up into various ranges and seeing for each range what the actual frequency of occurrence was. For example, Reliability diagrams can be produced showing this information (see, for example, Wilks, 1995).

The skill of the probability forecasts can be measured by comparing the Brier Score BS_f of the forecasts against the Brier Score BS_b of a benchmark forecast, such as always forecasting the climatological probability:

    Brier\ Skill\ Score = \frac{BS_b - BS_f}{BS_b} = 1 - \frac{BS_f}{BS_b}

Hence, this is like a reduction in variance (RV). It is in the form of a percentage improvement over the climatological benchmark, with a skill score of 1.0 for perfect forecasting. In this case, BS_f is 0.09 and BS_b for a climatological probability of 0.40 is 0.24, so the Brier Skill Score is 0.63.
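A Python sketch of these probability-forecast measures. The arrays are illustrative (not the Appendix 1 data), and the Brier Score definition used – the mean squared difference between forecast probability and the 0/1 outcome – is the standard one, stated here because its definition falls in a gap in the surviving text:

    # Illustrative probability forecasts and 0/1 outcomes.
    probs = [0.9, 0.7, 0.3, 0.2, 0.6, 0.1, 0.8, 0.4]
    outcomes = [1, 1, 0, 0, 1, 0, 0, 0]

    def brier(ps, obs):
        # Standard Brier Score: mean squared probability error (0 = perfect).
        return sum((p - o) ** 2 for p, o in zip(ps, obs)) / len(ps)

    # Reliability bias: average forecast probability over observed frequency.
    bias = (sum(probs) / len(probs)) / (sum(outcomes) / len(outcomes))

    # Brier Skill Score against a benchmark that always forecasts the base rate.
    clim = sum(outcomes) / len(outcomes)
    bss = 1 - brier(probs, outcomes) / brier([clim] * len(outcomes), outcomes)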
4.3.4 Deterministic Forecast for Multiple Categories
There are two different kinds of forecasts for multiple categories. One is where they are not ranked – there is no particular order to the categories. An example of this is where there may be a number of categories of precipitation type – for example, rain, snow, mixed precipitation, freezing rain. More commonly, the categories are ranked, and do have some kind of order. Examples include wind speeds in terms of Beaufort force rather than values, visibility categories, and precipitation in categories of increasing amounts.

To illustrate how this might work, suppose forecasts of rain are being made for a tropical location, where typically the weather might be in three categories – “dry”, “showers”, or “wet” (widespread showers or rain) – for a 12 hour period from 6 am to 6 pm. An observation of “dry” might correspond to no rain observed at the station; of “showers” if no rain was recorded at the station, but rain was reported in the area or thunder was heard; and of “wet” if rain was recorded at the station.

The combination of forecasts and observations can be put into a 3 by 3 contingency table:

                                    Observed
                          Dry (1)  Showers (2)  Wet (3)   Sum of Forecasts
    Forecast     Dry (1)    n11        n21        n31           n*1
             Showers (2)    n12        n22        n32           n*2
                 Wet (3)    n13        n23        n33           n*3
    Sum of Observations     n1*        n2*        n3*           n**

An example of some numbers in this 3 by 3 contingency table, which will be used for the scores, is:

                                    Observed
                          Dry (1)  Showers (2)  Wet (3)   Sum of Forecasts
    Forecast     Dry (1)     63         13          8            84
             Showers (2)     15         45         30            90
                 Wet (3)      …          …          …             …
    Sum of Observations       …         80          …           241

In the case of two categories (See Section 4.3.2) all the …

Accuracy

The most common accuracy measure is the percent correct – the proportion of forecasts that fall in the correct (diagonal) category of the table. For this example the percent correct is 0.61.

Skill
The simplest skill measures will involve a comparison between the accuracy of the actual forecasts and of some benchmark. Typical benchmark forecasts would be always to forecast the climatologically most likely category, or to randomly forecast a category based on the climatological frequency of the categories. Again, the climatology may be based on the sample itself. If PC_f is the percent correct for the forecasts, and PC_b the percent correct for the benchmark, then the skill is just:

    \frac{PC_f - PC_b}{1 - PC_b}

For the example, suppose the benchmark forecast is to always forecast “showers”, since this is the most common observed category. The result would be that the forecast was correct 80 times (the number of times the “showers” category was observed) and the percent correct for the benchmark is 80/241 or 33%. For this case, the skill would then be:

    \frac{0.61 - 0.33}{1 - 0.33} = 0.42
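The percent correct skill for the multi-category example, as a minimal Python sketch using only the quantities stated in the text:

    total = 241        # number of forecasts in the 3 by 3 example
    pc_f = 0.61        # percent correct of the actual forecasts

    # Benchmark: always forecast "showers", which verified on 80 occasions.
    pc_b = 80 / total  # about 0.33

    # Skill: how far the forecasts go from the benchmark towards a perfect score.
    skill = (pc_f - pc_b) / (1 - pc_b)   # about 0.42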
The skill scores proposed by Gordon (1982) provide a more direct and theoretically satisfying means of assessing skill, including confidence intervals on the score, though they may be less readily explained to the user community.

4.3.5 Probabilistic Forecast for Multiple Categories

For completeness, a description will now be given of probabilistic forecasts for multiple categories. However, the details and technicalities involved are beyond the primary purpose of this Technical Document, so the reader should refer to Stanski et al. (1989) for more details on these kinds of scores.

Suppose there are N forecasts, each of which has probabilities for m categories, p_ij for i = 1...N and j = 1...m. The corresponding observations will be called o_ij, although in each case this will take on a value of 1 for the observed category and 0 for the other categories.

Usually the categories are ranked, and the most common accuracy measure is the Ranked Probability Score (RPS) originally devised by Epstein (1969). Using the above notation, the RPS for the individual forecast i is:

    RPS_i = 1 - \frac{1}{m-1} \sum_{j=1}^{m} \left( \sum_{k=1}^{j} p_{ik} - \sum_{k=1}^{j} o_{ik} \right)^2

This has a range of 0 (bad) to 1 (a perfect forecast).

Skill

A skill score against a benchmark can be computed in the usual way, by comparing the Ranked Probability Score RPS_f for the forecast with RPS_b for the benchmark:

    \frac{RPS_f - RPS_b}{1 - RPS_b}
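A minimal sketch of the RPS for a single forecast (the three-category probabilities in the usage line are illustrative):

    def rps(p, o):
        # Ranked Probability Score for one forecast over m ordered categories.
        # p = forecast probabilities; o = 1 for the observed category, else 0.
        # Returns 1 for a perfect forecast, lower values for worse ones.
        m = len(p)
        cum_p = cum_o = 0.0
        total = 0.0
        for j in range(m):
            cum_p += p[j]
            cum_o += o[j]
            total += (cum_p - cum_o) ** 2
        return 1 - total / (m - 1)

    # e.g. a dry/showers/wet forecast of (0.2, 0.5, 0.3) verifying as "showers":
    score = rps([0.2, 0.5, 0.3], [0, 1, 0])   # 0.935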
4.3.6 Forecasts of Timing of Events

Discussion so far has concentrated on weather variables and categories. However, there is also increasing interest in the timing of events, rather than just whether or not they will occur. It can be useful to collect and assess statistics on the forecast and observed time of:
• Start of precipitation
• End of precipitation
• Time of change of precipitation type (e.g., rain to snow, or snow to rain)
• Start of a severe event
• End of a severe event.

This verification information can be treated using the assessment measures for continuous weather variables (see Section 4.3.1). For example, if precipitation is forecast to start at 1500 and actually starts at 1100 then this can be treated as an error in the forecast of +4 hours, or 4 hours late. The …
5.1.1.3 Dimensions: Requirements, Expectations, Understanding, Importance, Satisfaction, Utility, etc.

As previously stated, the perceptions assessed include those about requirements, accessibility, availability, accuracy, timeliness, utility, comprehension, language, sufficiency, and packaging, amongst others. Classically, in the design and development of products and services one starts with the assessment of user requirements. That is, what are the needs of the spectrum of end users (the public, stakeholder communities, funding agencies) from the spread of possible services that the NMS has the capacity to provide?

This effort benefits from gaining an understanding of user processes – that is, an understanding of how the information is used in the activity to which it is applied. Frequently, expectations do not line up with actual needs, in which case two alternative paths could be pursued. If the end-user cannot be convinced of the faulty expectations, then the survival strategy may be to target those expectations. In other words, try to provide the information they want, even if you know that it may not be the best information for their purposes. Fortunately, with the increasing sophistication of the end-user, the result is most often a realignment of expectations with needs.

A complementary activity to pure user-based assessment is thus that of increasing awareness and user education. The theory is that this process, with iteration, yields improved knowledge of the spread of requirements (stated and implied) that can then be translated into the design of a set of meteorological products and services covering those requirements that are within the capacity of the NMS to provide. This results in the development of new products and services, the adaptation or refinement of existing products and services, or even the dropping of services that are no longer needed, to better match the evolving requirements.

Increasingly, NMSs are under some pressure to reduce costs of operation and to justify any major upgrades of their services and equipment based on a detailed benefit-cost analysis. NMSs are interested in demonstrating the economic and social benefits of services they provide to the public, industries and organizations. As illustrated in the performance logic model (Figure 1), benefits to society as a whole are commonly perceived as an ultimate outcome of the provision of meteorological services.

For the purpose of this discussion, public weather services are generally considered non-rival (if someone uses the service it does not stop others from using it) and non-exclusive non-market goods and services. While some services are rivalrous, such as limited-capacity telephone-based services, these kinds of services are generally being de-emphasised or commercialised by NMSs.

A variety of research methods in applied economics (environmental, resource, production, information, risk and uncertainty, welfare, etc.) can be applied. One of the techniques being increasingly employed is that of contingent valuation, whereby respondents, through an iterative process, are asked to indicate their willingness to pay a suggested amount to have access to the services versus having that service withdrawn.

The valuation techniques can be broadly described as being either production based or demand based. The former involves the modelling of the production process, while in the latter case direct inferences are made as to the value of non-market services such as public weather services. Economic value assessments range from measuring the value of certain forecast elements to estimating the value attributable to the provision of the full set of national services. Benefit-cost ratios have been reported as ranging from 2:1 to well over 10:1. Economic value assessments can also be used to determine the justifiability of making investments in research and development into improvements in forecast accuracy. Additionally, such assessments can be used to compare the effectiveness of various meteorological service delivery systems. With some measured success, some of these techniques have been used to impute the ‘social’ or non-economic benefits derived from the use of public weather services. Further discussion on the methodologies for undertaking such assessments appears further on in this document.

5.2 GUIDING PRINCIPLES FOR METHODOLOGY

There is a need for user-based information for decision-making purposes by individuals, whether office managers or the most senior executives of the NMS. The information is used for day-to-day programme delivery management as well as for longer-term vision and strategic planning. While the information gathered may serve objectives at a variety of levels within an organization, often the methodology chosen must be specific to the objectives at the organizational level.

The circumstances of planning have changed and the complexities of managing have increased in recent years. The NMS’s organisational and decision-making structures have changed. The governmental and departmental planning systems have created new processes and products. The focus on value for money and making the NMS’s funds go further has sharpened as budgets have significantly decreased with governmental budget reduction exercises. Performance management has taken on greater prominence, with emphasis on frameworks, concrete measures, and continuous improvement. At the same time, in several domains, the programme has expanded from an initial narrow focus on weather, migrating through the larger domain of atmospheric change, to a broader focus on environmental prediction.

User-based assessment needs to be tied closely to performance management, planning and reporting requirements, and the links to both operations and long-term strategic results should be clear. A more proactive role can be played by:
• Obtaining direction from senior management on planned user-based assessments, to ensure that these assessments will be useful and that there are the resources and management will to take follow-up action once the findings and recommendations are presented
• Working with the organisational units within the NMS responsible for implementation of programme changes, to advise them of the findings and facilitate follow-up action
• Tracking follow-up actions and reporting back to senior management. Senior management support, in terms of commitment and resources to implement change, is a key success factor.

Follow-up is essential – if not done, user-based assessment research will have little value.

The kinds of decisions that benefit from the user-based assessment process range from those pertaining to the initiation, continuance or modification of major programmes to those concerning specific product lines or programme elements and delivery mechanisms. Within this spectrum is included a range of decision activities as diverse as investments in research and development, technology for automation, human resource training, and public education or awareness campaigns. Ultimately, within the resource context of the NMS, policies on detailed levels of service can be established.

5.2.2 Multi-year User-based Assessment Strategy

A plan must follow a development process which accommodates the funding and reporting context that the NMSs find themselves in, and have the following characteristics:
• A limited, manageable number of priorities that reflect the needs of the programme
• A schedule of user-based assessments that supports these priorities while being flexible enough to meet needs arising from unpredictable or opportunistic circumstances
• An approach to communicating findings that promotes sharing information and the development or improvement of products and services.

In developing the schedule of user-based assessment, the areas of research are selected on the basis of programme need, risk management, and commitments in business plans, management frameworks, and performance frameworks. In this multi-year strategy for user-based assessment it is important to cover both product lines and delivery mechanisms, and to use consistent questions over the years for proper trend-line analysis. Performance measurement, after all, is about change over time, as opposed to the measurement of the state of affairs at a given point in time.

5.2.3 Need to Know Why it Should be Done

The first task in planning a user-based assessment is to specify the objectives as thoroughly as possible. The key to this exercise is to come up with clearly defined concepts and terms. Once the basic objectives have been broken down and defined, the researcher can then proceed to develop operational definitions which indicate who or what is to be observed and what is to be measured. Once operational definitions are developed, the researcher can specify the data requirements and decide upon the level of error that is acceptable. Finally, the statement of objectives should indicate the purpose, the areas covered, the kinds of results expected, the users as well as the uses of the data, and the level of accuracy that is desired.

Essentially, a survey involves the collection of information about characteristics of interest from some units of a population, using well-defined concepts, methods and procedures, and the compilation of such information into a useful summary form. The collection of such information from all units of a population would constitute a census. Surveys are carried out for one of two purposes: descriptive or analytical. The main purpose of a descriptive survey is to estimate certain characteristics or attributes of a population – e.g., awareness of a particular meteorological service. Analytical surveys are generally concerned with testing statistical hypotheses or exploring relationships among the characteristics of a population. An example of an analytical survey would be one that determines whether there is a change in protective behaviours following the introduction of an Ultra Violet Index programme.

There can be many reasons for an NMS to undertake a user-based assessment. These can include the checking of perceptions against expectations, tracking of trends, seeking feedback to improve existing services, determining requirements for new or different services, assessing the perceived effectiveness of the overall programme, and identifying areas where actions can be taken. An NMS’s “Service Charter” may dictate the requirement to routinely publish information regarding such dimensions as user satisfaction. Such information can be derived from the administration of a re-useable tracking survey. Subject area surveys can be used to elicit information feedback for the improvement of certain specific services or for determining the requirement for new or different services. Large comprehensive surveys can be used for gauging the overall effectiveness of the NMS's total programme.

5.2.4 Credibility and Transparency

There are many considerations that come to mind when wrestling with the concepts of credibility and transparency for user-based assessment. Comments made above, regarding an overall performance management framework and strategy, certainly apply. User-based assessment is an effective and essential component of an organization’s “balanced scorecard”, giving a comprehensive picture of its health and effectiveness. The adoption of a rigorous approach or methodology based on established theory and practices is essential. Adherence to a multi-year user-based assessment strategy facilitates a co-ordinated and structured approach. Even such simple precepts as undertaking fewer but well-planned surveys, focus groups, etc., rather than a large mixture of disconnected ones, and following a consistent approach to track trends, help. Finally, publicizing the changes triggered by the assessment enhances credibility and transparency.
5.2.4.1 Statistical Significance Issues

With regard to public opinion or stakeholder surveys, a focus on sampling, on sampling errors, and on accuracy can head off credibility and transparency problems.

5.2.4.1.1 Sampling

For a specific subject area relative to the programme of an NMS to be examined, one of the first decisions to be made is whether to undertake a sample survey or a census survey. A census survey refers to the collection of information about characteristics of interest from all units of a population. An NMS may want to determine certain characteristics about the redistribution of meteorological products by their domestic media. An NMS may want to determine what ice forecasting services high Arctic marine operators would like to receive. In such cases, for most countries, a census survey may be more appropriate given the very small population under study.

A sample survey refers to the collection of information about characteristics of interest from only a part of the population. A survey of the general population’s awareness and understanding of a wind-chill programme would be a valid use of a sample survey.

A sample survey is cheaper to do than a census survey. Sampling also reduces data collection and processing time. Sample surveys allow more selective recruiting of interviewers, more extensive training programmes and closer supervision. As well, the smaller scale of operations allows for more extensive follow-up of non-respondents and for a higher level of quality control for such data processing activities as coding and data capture. For these reasons sample surveys can be more accurate than their census counterparts. In some cases where highly trained personnel or specialized equipment is required, it would be difficult and expensive to consider a census. Sample surveys also inconvenience fewer people, meaning reduced respondent burden.

The target population is the set to which the survey results are to apply; about which information is sought; which the sample is intended to represent; and about which one wishes to make inferences based on data collected from a sample. A population has definable characteristics, a specific geographic location and a time period under consideration. The survey population is the population that is actually covered, which may be different from the target population for practical reasons. For example, in a national survey remote locations are frequently excluded because they are too difficult or costly to enumerate. When a survey population is chosen which differs from the target population, it is necessary to be aware that a gap exists between the two populations and to recognise that conclusions based on the survey results apply only to the survey population.

Samples can be probabilistic or non-probabilistic. In non-probability sampling, elements are chosen in an arbitrary manner such that there is no way of determining the probability of any one element being included in the sample; thus there is no assurance that every element has a chance of being included.

In probability sampling, all elements within the population have a non-zero chance of being selected, and inferences are made about the entire population that the sample represents. Probability sampling methods range from simple random selection of members from the population to complex sampling strategies (random, systematic, stratified, and multi-stage).

Stratification is the most common amongst these methods. Stratification is the process of dividing the population into relatively homogeneous groups called strata, and then selecting independent samples from each. Stratification variables may be geographic or non-geographic (e.g. gender, income, industry, occupation). Reasons for stratification include the desire to acquire estimates at the stratum level. Each stratum requires an adequate sub-sample size to ensure that valid results particular to that stratum can be derived.

In random sampling, each unit in the population has an equal chance of being included in the sample.

In systematic sampling, units from a list are selected using a selection interval (K), so that every Kth element on the list, following a random start between 1 and K, is included in the sample. If the population size is M and the desired sample size is n, then K = M/n. Thus systematic sampling requires a sampling interval and a random start.

Multi-stage sampling refers to a process of selecting a sample in two or more successive stages. For the two-stage sampling case, a number of first-stage units are selected, e.g. selected communities, from which second-stage units are selected within the larger units already chosen, e.g. households within the selected communities. The probability of being selected is P = P1 × P2 for the two-stage sampling case, where P1 and P2 represent the probability of being included in the sample at the respective stages.
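As a minimal illustration (an addition; the population sizes and fractions are invented), systematic selection with interval K = M/n and the two-stage inclusion probability P = P1 × P2 can be sketched as follows:

    import random

    def systematic_sample(units, n):
        """Select every Kth unit after a random start between 1 and K."""
        k = len(units) // n             # selection interval K = M/n
        start = random.randint(1, k)    # random start between 1 and K
        return units[start - 1::k][:n]

    population = list(range(1, 1001))            # M = 1000 listed units
    sample = systematic_sample(population, 50)   # K = 20, n = 50

    # Two-stage case: 10 of 100 communities, then 1 in 50 households
    p1, p2 = 10 / 100, 1 / 50
    print(p1 * p2)   # inclusion probability P = P1 x P2 = 0.002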
5.2.4.1.2 Sample Errors and Accuracy

Both sampling and non-sampling errors affect the accuracy of survey results.

Sources of non-sampling errors include non-response, difficulties in establishing precise operational definitions, incorrect information provided by respondents, incorrect interpretation of questions by respondents, and mistakes in processing operations.

Sampling error is the difference between the result of a sample estimate and that of a census, i.e., of the whole population. The size of the sampling error generally decreases as the sample size increases. The extent of the sampling error also depends on the variability of the characteristics of interest in the population, the sample design, and the estimation method. Thus the size of the sample, population variability, sample design, and the estimation method are all sources of sampling error. The sampling error can be reduced through the development of an efficient sampling plan, where proper use is made of available information in developing the sample design and estimation procedure.

Accuracy refers to the difference between a survey result for a characteristic and the true value of that characteristic in the population.
Precision (or reliability) is a measure of the closeness of sample estimates to the results of a census (or 100% enumeration of the population) undertaken under identical conditions. The greater the variability in the population, the larger the sample size needed to obtain a specified level of reliability. Complex sampling procedures usually increase the margin of error, as they increase the possible sources of errors. Increasing the sample size will lower the margin of error due to non-response, but the bias resulting from non-response is not reduced: with respect to the characteristics of interest in the survey, the non-respondents may be different from the respondents.

Confidence interval statements are commonly provided with published survey results. A 95% confidence interval can be described as follows: if sampling is repeated indefinitely, with each sample leading to a new confidence interval, then in 95% of the samples the interval will cover the true population value. The size of the confidence interval is usually indicated by the margin of error. For example, if the estimate is 50% and the margin of error is 3% either way (below and above 50%), then the confidence interval is that the “true” percentage falls somewhere between 47% and 53%, 19 times in 20 (i.e., 95% of the time).

The confidence interval does not take into account the margin of additional error that may result from practical difficulties involved in conducting a survey. The sources of this type of error include, for example, the way the questions are worded, respondents misunderstanding the questions or answering incorrectly, and non-response. The acceptable level of reliability depends on the estimate under consideration and the intended use of the data; that is, it depends on the level of accuracy required for a particular application. What may be an acceptable margin of error for one estimate may differ from that felt suitable for another.

The determination of sample size involves a process of making practical choices and trade-offs among the conflicting requirements of precision, cost, timeliness and operational feasibility.
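To make the precision trade-off concrete, here is a minimal sketch (an illustrative addition, using the standard normal approximation for a proportion at the 95% level, z = 1.96):

    import math

    def margin_of_error(p, n, z=1.96):
        """95% margin of error for an estimated proportion p from n responses."""
        return z * math.sqrt(p * (1 - p) / n)

    def sample_size(target_moe, p=0.5, z=1.96):
        """Sample size needed; p = 0.5 gives the most conservative value."""
        return math.ceil((z / target_moe) ** 2 * p * (1 - p))

    print(margin_of_error(0.50, 1067))  # ~0.03, i.e. 50% +/- 3%, 19 times in 20
    print(sample_size(0.03))            # 1068 respondents for a 3% margin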
5.2.4.2 Collaboration with Other Relevant Authorities is Desirable

Working with others can achieve synergies and economies of scale. The process of developing a plan and sharing information on intentions will be more inclusive, increasing co-operation, communication, and co-ordination of efforts. Teaming up with others may yield mutual benefits such as reduced costs, increased internal communication, and new ideas. To be successful, this requires communication by all parties. Organisations in the private or not-for-profit sectors can be approached for help in reaching their communities. They may be willing to provide funding or service in kind. Examples include approaches such as co-operation with community support and advocacy organisations for deaf, deafened, and hard of hearing clients.

One of the most common forms of “collaboration” is the use of omnibus surveys, which are usually conducted by telephone. In the case of omnibus surveys the NMS buys a portion of a larger survey that may cover several clients. Omnibus surveys are questionnaires consisting of several modules or sections, each dealing with a different topic and each conducted for a separate organization. Organizations are charged on the basis of their level of participation in the omnibus survey. These are routine surveys run according to a specific schedule. Frequently, a private survey company will attempt to accommodate the NMS client by pairing the meteorology portion with another on a similar (e.g., environmental) theme. “Piggy-backing” questions on an omnibus has the effect of sharing the cost of the undertaking. Omnibus surveys are useful for a research effort where there are only a few questions to be asked. These surveys typically use classification data such as age, gender, region, community size, family income, occupation, education, and mother tongue.

On occasion, a survey company will try to set up a larger one-time survey effort by inviting certain like-minded organizations. These can also provide opportunities for cost reduction. More frequently, with the recent increase in interdisciplinary activities with others in the media, environment and health fields, collaborative efforts result in cross-disciplinary user-based assessments, and these assessments often take the form of surveys. One such survey was the Canadian National Survey on Sun Exposure and Protective Behaviour, into which a section on the Ultra-Violet (UV) Index was added. Major surveys of this nature may be administered every five years to establish trend-line information.

With NMSs moving towards the provision of broader weather and environmental prediction services, there are increasing opportunities for collaborative user-based assessment efforts. Examples include air quality and smog forecasting programmes precipitating the need for joint assessment activities between various levels of government and sometimes non-governmental environmental organizations. As a minimum, in-kind resources are offered, but more recently actual financial support is provided. With expansion in the road weather forecasting area there could be possibilities for similar collaborative efforts with the transportation sector and other levels of government.

5.2.5 Additional Principles of User-based Assessment Design

5.2.5.1 Use of Professional Expertise and Independent Administration Authority

The satisfaction of credibility and transparency concerns is facilitated by the use of external independent expertise as an input to the design, and for the administration, of the user-based assessments. The use of external accredited consultation expertise can facilitate the free and honest flow of ideas and concerns. Focus group facilitators are essential for the creation of the desired information discussion environment when considering the characteristics of interest to the NMS funding the study. The expertise of a private survey firm, a dedicated government body with the assigned responsibility and appropriate skills, or of an academic (university) professional adds value to the design of a survey instrument.
Such expertise and at-arms-length objective positioning is usually essential for the administration of a survey. Such external expertise will assist in the perception of credibility and in the attainment of statistically valid results from the perspectives of sample size and geographical and geopolitical representation. Indeed, it may be a formal requirement of a performance pledge/charter or of a quality assurance system to use such expertise.

5.2.5.2 Lack of Professional Advice or Availability of an Independent Capacity Should Not Stop Assessments From Being Done

Although it would be best for an NMS to use professional advice or some independent capacity, if these are not available, user-based assessments should still be done. It is essential to measure certain basic end-users’ understanding of, and reactions to, the services provided. The use of some “best practice” examples of other NMSs providing similar programmes can help. Adaptation of these by in-house staff, and in-house staff administration of such assessments, can yield very useful information that can assist in the management and planning of the NMS.

… (proprietary software packages, available commercially, used for scientific and survey applications).

5.2.6 Communication of Information

To be effective and worth the expenditure of the resources involved, the information must be communicated and appropriately used internally within the NMS as well as externally to clients and stakeholders.

5.2.6.1 Accessibility Within the NMS

Increasing the access to user-based assessment results within the NMS is important. Use of this information in both the long and shorter-term strategic/tactical decision context has been discussed above. The results of user-based assessment research need to be made available to managers and employees if they are to be worthwhile. A greater awareness of what has already been done elsewhere could avoid possible duplication. The results could be used by others in various activities such as planning, risk management, briefing note preparation, and tracking issues.

5.2.6.3 Archive, Publish, Use as Appropriate for Promotion (and Education)

Since user-based assessment is quite costly, it is important to maintain both the reports and raw data in a variety of media, with backup copies, for future use and possible reanalysis. The media range from hard copy to electronic to video or audio. The material can be used for distribution to a variety of users, ranging from management for decision making purposes, to staff for internal awareness, to funding authorities for resource justification, to the public or stakeholders for end-user awareness and education, to regulatory bodies for the attainment of approvals, to central agencies to satisfy reporting requirements, etc. It is important that the data is properly indexed and easily retrievable.

Qualitative data yield information through restricted observations on a massive domain. Quantitative data, such as that from a sample survey asking a few rigidly structured questions of many people, yield information through a mass of observations on a restricted domain (e.g., data from a large sample survey on satisfaction with temperature forecasts). Compared to quantitative data, the meaning of qualitative data is more likely decided after data collection.

The general characteristics of qualitative and quantitative methods are summarized in the table below. Qualitative techniques are employed when rich contextual program description or new/refined program theory is needed, or when variations in implementation or process are to be assessed. When causal attribution, incremental effects or resource expenditure assessments are the objective, quantitative methods are more appropriate.

    Dimension                          Qualitative method                           Quantitative method

    Intent/Purpose                     Discovery of theory, understanding of        Verification of theory,
                                       phenomena under study                        statistical prediction

    Assumption re: origin of meaning   Socially constructed and conferred           Inherent in objects and acts
                                       on objects and acts

    Scope/Nature of investigation      Holistic, rich in context,                   Particularistic, guided by
                                       emphasizes interactions                      program objectives

    Sampling                           Revealing in nature; population              Probability; population
                                       inferences cannot be drawn                   inferences can be drawn

    Data gathering                     Semi-structured or unstructured              Fixed response options
                                       (open-ended) response options,
                                       observation

    Analytical techniques              Inductive                                    Deductive

    Generalizing to population         Invalid                                      Valid

    Data collection skills required    On-the-fly processing required               Rigid script
Such audits also can have the effect of aligning the NMS with overall governmental initiatives. They involve independent auditing of the NMS and its services by an independent party (e.g., a government audit agency or consulting company) according to some established or agreed-to criteria. They are usually undertaken according to an established schedule for all or part of the NMS’s range of accountabilities. They identify performance improvements achieved and those not adequately achieved, and for the latter they can specify subsequent reporting of actions taken and associated results at a later date. These audits may be part of an overall quality management system at the service level or across government and its agencies. They should be seen as an opportunity to learn and improve, and perhaps to justify requirements for resources.

Hosting workshops or other events for the broad user community, or for particular clients or client groups, is also effective.

5.3.1.5 Post-Event Review, Case Studies and Debrief

On the one hand, post-event reviews or case studies can be evocative, with problems coming to the forefront and becoming more persuasive, leading to a motivation to make positive changes; on the other hand, the case may dominate all other information and can be too striking, thereby biasing the interpretations. Careful selection of the case is essential. Most NMSs will undertake operational performance reviews following major meteorological events to assess the effectiveness of their systems. One such review was undertaken following the costly “Ice Storm” of January 1998 in Eastern Canada.

Reviews can result in complete end-to-end operational system audits of what worked effectively and what did not. It is common to analyse the accuracy and appropriateness of meteorological products. The effectiveness of the information delivery system is a critical component to be analysed, as is the effectiveness of the NMS’s relationship with other agencies involved in disaster management. Surveys of the citizenry and even the local media provide useful information. An assessment of the public “issue management” can lead to improved strategies for future similar situations. Documenting and learning from these situations are key steps towards improvements.

5.3.1.6 Collection of Anecdotal Information

Finally, for more than historical purposes, NMSs collect anecdotal information to be used strategically. This involves the collection of stories of lives saved and damage avoided through effective warnings and forecasts. These “sound bites” can be used strategically for public relations purposes or to defend certain perspectives with clients and partners.

While focus group results cannot be projected to the population at large, they can provide useful input to the design of questions for a formal survey. Qualitative data, such as comes from focus groups, may be summarised and synthesized using systematic techniques. Before coding can begin, data often have to be cleaned (i.e., non-relevant or non-codable material, that incapable of being categorized, identified and removed) and unitised (broken down into codable units). Meaning is assigned to observations by finding patterns through the processes of integration, differentiation and ordering, frequently using a matrix approach.

A formal report on the conclusions of the focus group session is a standard requirement. These reports usually summarize both the central tendencies and significant variations, and also make extensive use of verbatim quotes from respondents to illustrate key points.
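As a small illustration of the matrix approach (an addition; the theme codes and data are invented), cleaned and unitised comments can be tallied by theme and participant group to reveal patterns:

    from collections import Counter

    # Each cleaned, codable unit: (assigned theme code, participant group)
    units = [
        ("timeliness", "boaters"), ("wording", "farmers"),
        ("timeliness", "farmers"), ("timeliness", "boaters"),
        ("access", "farmers"),
    ]

    matrix = Counter(units)   # (theme, group) -> number of comment units
    for (theme, group), n in sorted(matrix.items()):
        print(f"{theme:12s} {group:10s} {n}")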
5.3.2 Formal Structured Surveys

5.3.2.1 Large Survey every 4 or 5 Years – Comprehensive

In most cases survey objectives call for the measurement of many characteristics. In a survey on meteorological services one usually wants to determine more than overall satisfaction or perceptions about weather forecasts. A comprehensive survey may include sets of questions on the general use of weather information, on weather warning information, on regular forecast information, on air quality information, on weather information delivery, on demographics, etc. Within these sections of a multi-purpose survey, further breakdowns can occur: under the general topic of weather forecast information, for instance, one can investigate, on a per-season basis, perceptions of what is considered accurate for temperature, wind direction/speed, onset of precipitation, probability of precipitation, sky cover conditions (sunny, cloudy), etc. These surveys are usually quite long and demand fairly large sample sizes to facilitate geo-politically based inferences.

To accommodate the measurement of several items within one survey plan, it is likely necessary to make compromises in many areas of the survey design. The method of data collection (telephone, personal interview, mail-out, etc.) may be suitable for measurement of some characteristics but not for others. The survey design must properly balance statistical efficiency, time, cost, and other operational constraints. As such, these surveys tend to be rather costly, so such baseline surveys are usually undertaken once every four or five years. In order to make proper inferences on trends, consistency in the design and questions from one baseline survey to the next is necessary. Given the cost, such surveys demand particular senior management discipline and commitment for appropriate long-term execution. An example of such a survey, the 1997 Canadian Goldfarb Survey, forms Appendix 2 of this Technical Document.

5.3.2.2 More Frequent Tracking Surveys

One-time or baseline surveys differ from periodic or continuing surveys in many ways. The aim of periodic or continuing surveys is often to study trends or changes in the characteristics of interest over a period of time. Such studies nearly always measure changes in the characteristics of a population with greater precision. Overhead costs of survey development and sample selection can be spread over many surveys, and this in turn cuts down the costs. Decisions made in the sample design of periodic or continuing surveys should take into account the possibility of deterioration in design efficiency over time. Designers may elect, for example, to use stratification variables that are more stable, avoiding those that may be more efficient in the short term but which change rapidly over time. Another feature of a periodic or continuing survey is that, in general, a great deal of information useful for design purposes is available. If, for example, a Service Charter calls for routine reporting on levels of satisfaction (or another dimension) with regard to certain standard forecast elements, a well-designed standard survey instrument can be used repetitively. Recognising the compromises, an omnibus survey vehicle can be used.

An example of a tracking survey is the Hong Kong, China, survey that forms Appendix 3 of the present Technical Document.

5.3.2.3 Subject Area Surveys

Subject area surveys offer the potential to delve more deeply into specific characteristics of interest. This can be for the purpose of investigating perceptions regarding key issues of concern to an NMS, such as climate change, or even for specific valuation exercises, such as estimating the benefits of a specific service provided via a specific delivery mechanism. These surveys are specifically designed to answer a limited set of questions and, as such, all of the design dimensions should be carefully considered.
These include the thorough specification of the objectives, the development of operational definitions which indicate who or what is to be observed and what is to be measured, the specification of the data requirements, an indication of the purpose, the areas covered, the kinds of results expected, the users as well as the uses of the data, and the level of accuracy that is desired.

5.3.2.3.1 Key Issues

As stated above, climate change is an example of an issue area that can be the focus of a subject area survey. Others can be perceptions about air pollution, natural disasters, etc. These issue area investigations are more prevalent in the broader environmental field than in the more narrowly defined scope of meteorological services.

… interrupting the viewing of programmes and/or commercials. Information derived from such investigations can be used in presentations made before industry and government authorities in applications for licences, etc. Generally, information derived from public opinion research in the service delivery area can lead to decisions on which systems should be utilised for the population as a whole and for specific target audiences, and on specific attributes in terms of product design and delivery.

5.3.2.3.4 Economic Value Estimation

DB (demand-based) “Survey or Interview Methods” assume that the user implicitly knows what the value of the service is to him in the context of his own ability to use it to produce benefits to himself. For business users who use the information as a productive input, the benefits implicitly include the user’s understanding of the production process. For household users who use the information for planning recreational activities, the benefits implicitly incorporate a subjective valuation of the increase to household utility from the information.

Different users of the same service likely derive different levels of benefits from it, and these differences would be expected to be reflected in a random sample of all users. The contingent valuation (CV) method (one of a number of survey-based economic valuation techniques that can be employed with the assistance of professional expertise in this economic theory – not to be explained here) directly measures individual willingness to pay (WTP), and can easily differentiate between significant differences in WTP among user groups, provided the sample of each is large enough. The individual WTP for each user group can then be aggregated over the populations of users in each group. The sum of these aggregates is thus the total value of the proposed change in the provision of the service throughout the market.
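To illustrate the aggregation step (an addition; the groups, willingness-to-pay figures and populations are invented for the example), the total value is simply the sum over user groups of mean WTP times the number of users in each group:

    # (mean annual WTP per user, number of users) for each sampled group
    groups = {
        "households":     (5.00, 400_000),
        "marine users":   (40.00, 8_000),
        "farm operators": (25.00, 20_000),
    }

    total_value = sum(wtp * pop for wtp, pop in groups.values())
    print(total_value)   # 2,820,000: total value of the proposed change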
A demand-based approach is not intended to result in an in-depth analysis of how changes in provision of the service can affect production in a given production process. Typically, production issues are treated qualitatively with additional survey questions that ask each user how they use the information in their own decision-making. On the other hand, demand-based approaches, properly applied, do analyse what, if any, substitutes exist for the service, and then value the service as the marginal value of the service over and above the value of these substitutes.

DB approaches are very specific to the type of weather-information dissemination service considered, and results are not theoretically applicable to other types of services. So while PB (production-based) methods value the information itself as a productive input, DB methods are more specific to the means by which the information is delivered, because this is the specific good that users employ – a particular bundle of weather information supplied in a particular manner, accessible at particular times, etc. The DB approach assumes that the user of the service knows how they would respond to a price change or quality change in the service by substituting with other sources of the needed information.

In the specific policy context of analysing the impact of alternative weather information delivery systems in a cross-sectoral comparison, DB methods are likely superior to PB methods. In a context requiring an in-depth analysis that models the complexity of means by which a change in the quality of information delivered by any system would affect a particular user group, PB methods are likely superior.

5.3.2.3.5 Current Value Versus Value if Accuracy Increased

Both the PB and DB methods can be used to achieve valuations of both the current value of specific services and the degree of increased benefit attributable to improved quality of the services. A Guelph University study used a prescriptive PB approach to value current precipitation forecasts to the Southern Ontario, Canada dry hay industry at CAN$54 million, while a 50% improvement in those forecasts increased that value to $58 million. A descriptive Contingent Valuation (DB) approach was used by Dalhousie University to value the Marine Weather Services in the Canadian Maritime Provinces at more than twice the cost of the provision of the service. One Meteorological Service of Canada Contingent Valuation study demonstrated the ability to select an optimal asking price for services delivered over the telephone for maximisation of cost recovery, while another study demonstrated that the benefit of Marine Weather Services delivered via Weatheradio Canada exceeded the anticipated increased cost of provision of that service resulting from large increases in broadcast tower costs.

5.3.2.4 Questionnaire Design

5.3.2.4.1 Some General Rules for Questionnaire Design and Wording

• It is essential to ensure that the questions and instructions are easy to understand.
• Abbreviations and jargon should be avoided.
• Words and terminology that are too complex should be avoided.
• The frame of reference should be specified. For example, if income information is requested then, at a minimum, a time frame should be specified.
• Questions must be as specific as possible.
• The question needs to be understood by all respondents in the same way. To the extent possible, the questions asked should be applicable to all respondents. Clearly, skip patterns (those “go to” type directional statements that determine the next question to be asked based on the response to the question just asked) are defined such that respondents are not required to answer all of the questions.
• The questions should be relevant to the respondent, and the respondent should know enough about the subject to answer the question knowledgeably.
• Double-barrelled questions should be avoided. Double-barrelled questions are ones that have two or more questions “nested” within them. Respondents become confused in trying to answer the question, especially when they have different answers for each part. One indicator of the likelihood of a double-barrelled question is the appearance of the conjunction “and” or “or” in the question. The best way to avoid the confusion is to replace double questions with two or more questions.
• Don’t try to get two questions answered by way of one question.
• The response categories should be mutually exclusive and exhaustive.
• Care should be taken in developing the wording of the questions so as to avoid the likelihood of drawing invalid inferences from the responses. That is, the questions should not be “leading” or “loaded”, i.e. should not suggest that one answer is preferable to another.
Sensitive questions (for example, questions on income and age, or any the respondent may find intrusive or threatening) tend to get a low response rate and may trigger a refusal by the respondent to co-operate any further. They should not be placed at the beginning of the questionnaire. Introduce them at the point where the respondent is likely to have developed trust and confidence. Locate sensitive questions in a section where they are most meaningful in the context of other questions. It is useful to introduce these gradually by warm-up material that is less threatening. Options or tools that can be employed are self-enumeration (the respondent fills out the questionnaire in private), anonymous questionnaires, careful wording of questions, the use of ranges for response categories, and randomised response. In the simplest form of the randomised response technique, the respondent answers one of two randomly selected questions without revealing to the interviewer which question is being answered. One of the questions is on a sensitive topic; the other question is innocuous. Since the interviewer records a “yes” or “no” answer without ever knowing which question has been answered, the respondent should feel free to answer honestly. This can be done, for example, in an in-person interview where the interviewee selects a card (with its code noted by the interviewer without seeing the side that contains the questions) or is handed one by the interviewer, who notes the respondent’s responses to the questions on the card in sequence. Demographic and classification data can be either placed at the end of the questionnaire or inserted into the most relevant sections.
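The randomised response design can also be quantified (this sketch is an addition, using the standard unrelated-question estimator, which the text does not spell out): if a known share of cards carries the sensitive question, and the innocuous question has a known “yes” rate, the sensitive proportion can be recovered from the overall “yes” rate:

    def sensitive_proportion(yes_rate, p_sensitive, p_yes_innocuous):
        """yes_rate: observed share of 'yes' answers;
        p_sensitive: probability a respondent drew the sensitive question;
        p_yes_innocuous: known 'yes' probability of the innocuous question."""
        return (yes_rate - (1 - p_sensitive) * p_yes_innocuous) / p_sensitive

    # 70% of cards carry the sensitive question; the innocuous question
    # ("Were you born in the first half of the year?") is 'yes' for ~50%.
    print(sensitive_proportion(0.40, 0.70, 0.50))   # ~0.357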
The flow of the items should follow the logic of the respondent. Time reference periods should be clear to the respondent. Similar questions should be grouped together. It is useful to provide titles or headings for each section of the questionnaire. Also, use wording that facilitates movement from one section to the next.

5.3.2.4.4 Layout Considerations for Questionnaires

As a general guideline, the questionnaire should appear interesting, easy to complete and respondent-friendly. If administered through the mail (regular or electronic), the cover letter and front cover should create a positive initial impression by way of a respondent-friendly introduction. If the questionnaire is administered in person or over the telephone, the questionnaire should be interviewer-friendly. The instructions should be short and clear, and the structure should be such that the respondent is guided step-by-step through the questionnaire. The instructions and answer spaces should facilitate proper answering of the questions. Illustrations and symbols (such as arrows and circles) should be used to attract attention and guide respondents or interviewers. It is a good idea for the last page or end of the questionnaire to provide space for additional comments by respondents. Finally, always include an expression of appreciation (“Thank you”).

Typography considerations in organising the printed word on a page include typeface/font (ensure consistency; use bold face print or ALL CAPITAL LETTERS to highlight important instructions or words), form titles, section headings, questions and question numbers. Data entry or processing codes should not take precedence over, nor conflict with, the question numbers. The benefits of a respondent-friendly questionnaire include improved respondent relations and co-operation, improved data quality, reduced response time and reduced costs.

5.3.2.4.5 Response Errors

A response error is the difference between the true answer to a question and the respondent’s answer to it. It can occur anywhere during the question-answer-recording process. There are two types. Random errors are variable and tend to cancel out. Biases tend to create errors in the same direction.

One of the sources of response error is the questionnaire design. It can come from the wording, the complexity and the order of the questions. It can also come from the question structure, complicated skip patterns and the very length of the questionnaire.

Another source of response error is respondent problems of understanding, recall, judgement, motivation and reporting. Recalling an event or behaviour can be difficult if the decision was made almost mindlessly in the first place, or if the event was so trivial that people have hardly given it a second thought since it occurred. Recalling is also difficult if the question refers to something that happened long ago or if the questions require the recall of many separate events. The resultant errors include the respondent failing to report certain events, or failing to report them accurately, leading to an under-reporting of events. A less frequent memory error is the telescoping error. Here some events may be reported that actually occurred outside the reference period, leading to the over-reporting of events. Generally speaking, the longer the reference period, the greater is the recall loss, while a shorter reference period tends to increase telescoping errors.

Social desirability bias can also emerge. This is the tendency to choose those response options that are most favourable to one’s self-esteem or most in accord with perceived social norms, at the expense of expressing one’s own position.

Finally, the interviewer can be the source of the error.

5.3.2.4.6 Probing for More Information

Probing for more information is a common practice in interviewing, whether in the context of a consultation session, a workshop or a focus group session. Indeed, it is the main means of eliciting information, and it is the skills of the facilitator that come to advantage here. While it can also be used in in-person one-on-one interviews, it is less common in telephone interviews and not possible in mail, Internet or kiosk based interviews. The survey instrument can often be written in such a manner as to effectively achieve a similar purpose.

5.3.2.4.7 Geographical and Geopolitical Representation

Most national government statistical bodies have developed “standard industrial classifications” that classify industries on the basis of their principal activities, and
“standard geographical classifications” for the identification and coding of geographical areas. These “standard geographical classifications” usually correspond to geopolitical boundaries. The objective of the system is to make available a standard set or framework which can be used to facilitate the comparison of statistics for particular areas. Sample allocation decisions are often made on the basis of these standard classifications.

5.3.2.4.8 Data Coding and Capture

To avoid being faced with a long, expensive, error-prone task of manually coding and possibly transcribing data, consideration should be given, at the design stage, to the capture of the data for subsequent processing. It is important to consult early, regularly, and often with the processing staff to design any formal survey questionnaire for rapid data capture. The best way of ensuring that the concerns of data capture are addressed is to make the individual/organization responsible for this aspect of the survey a permanent member of the team planning and implementing the questionnaire.

If data is to be processed by a computer, which is usually the case, codes for the fields into which answers are to be keyed should appear directly on the questionnaire. These are there to better ensure error-free data entry by interviewers. It is now common to have this process entirely computer resident, with the interviewer entering the data into a computer database via a questionnaire data entry screen. The database can be personal computer based, utilizing commonly available and relatively inexpensive software. The data can also be analysed using relatively inexpensive spreadsheet software or slightly more costly statistical software packages such as SPSS or SAS.
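A minimal sketch of code validation at data capture (an addition; the field names and codes are invented for illustration) shows how a codebook can keep illegal codes out of the survey database:

    # Codebook: legal codes for each questionnaire field
    CODEBOOK = {
        "q1_use_of_forecasts": {1: "daily", 2: "weekly", 3: "rarely", 9: "no answer"},
        "q2_satisfaction":     {1: "very", 2: "somewhat", 3: "not", 9: "no answer"},
    }

    def invalid_fields(record):
        """Return the fields of a keyed record whose codes are not legal."""
        return [field for field, code in record.items()
                if code not in CODEBOOK.get(field, {})]

    rec = {"q1_use_of_forecasts": 2, "q2_satisfaction": 7}
    print(invalid_fields(rec))   # ['q2_satisfaction']: 7 is not a legal code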
Chapter 6
CONCLUSIONS

6.1 INTRODUCTION

This Chapter is written especially for those readers who like to read the Introduction to a document, skim through the technical detail in the middle, and jump to the end to find out what the main conclusions were and what, if anything, they should do about it. Here are the answers you seek….

6.2 SUMMARY

Performance assessment should be an essential element of the public weather services programmes of all NMSs. Imagine how it would be if an NMS tried to do forecasting without first gathering observations. Performance assessment is a bit like gathering that basic data – on user requirements, on users’ perceptions of services, and on how good the outputs are. Analysis of the data can be used to improve performance.

The purpose of performance assessment is to ensure above all that, as far as possible, the user requirements are being met. It is also used as a check on the operational effectiveness and efficiency of the overall PWS system. Importantly, the information gathered is also very useful for communications with the public and government, which help raise the profile of the NMS and enhance its credibility.

The risk is that a performance assessment programme may be carried out without ever taking any actions based on the results. It is important from the outset to ensure that information is being gathered not to just sit on the shelf, but to be analysed and used for actions which will improve the NMS’s performance in the provision of public weather services.

These actions may include improving the products and their delivery, modifying the forecast production system, carrying out needed research and development, and recruiting and training staff, as well as communicating relevant information. Because budgets and resources are always limited, there will of course have to be some prioritisation on what actions will bring the best benefits.

The two essential and complementary aspects of an assessment programme are Verification and User-Based Assessment. The overall purpose of Verification of forecasts is to ensure that products such as warnings and forecasts are accurate, skilful and reliable from a technical point of view. User-Based Assessment relies on seeking information from people, to obtain a true but subjective reflection of the user perception of products and services provided by the NMS, as well as qualitative information on desired products and services.

6.3 HOW TO GET STARTED ON A PERFORMANCE ASSESSMENT PROGRAMME

For those NMSs which don’t currently have a performance assessment programme, now is the time to get started on that first step (always the hardest!).

6.3.1 Planning

Since performance assessment involves a range of functions within the NMS, the first step should be to set up a team to develop a programme plan. This team should be large enough to involve the main functions – in particular, forecasting, computing systems, marketing (or whatever this function is called) – but also small enough so that it does not become unwieldy. Commitment from senior management is essential, and preferably at least one senior manager should be on the team.

The first task of the team should be to reach agreement on the purposes and objectives of the performance assessment programme. What is the most important information you want to discover? Do you need particular information for reporting purposes? Have there been many complaints about a particular forecast? Have you asked the users recently whether the products are meeting their needs? A review of this Technical Document should provide lots of clues and cues for the kind of information you might want to gather.

Planning should then proceed on how best to gather that information, how it is going to be analysed, used and communicated, and who is going to be responsible for ensuring that actions are actually taken based on the results. Since this will all involve work, it is important to “keep it simple” and not embark on an overly ambitious programme to start with. Communicate widely within the NMS as this planning takes place, and seek feedback from people who are interested. Forecasters, amongst others, will undoubtedly have something useful to contribute.

6.3.2 User-based Assessment

In the area of User-Based Assessment, the questionnaire from the Hong Kong Observatory in Appendix 3 is a good example of a simple, focussed questionnaire. This gathers some basic information on the public’s use of weather forecasts, how they access them, and what their perceptions are of their accuracy.

You might wish to use this as the basis of a similar questionnaire for your NMS. But, before doing so, think very carefully about how the information gathered will be used by you. Some of the information in this sample questionnaire is clearly designed for “tracking performance” – this is useful for reporting purposes and also for suggesting remedial action if the performance is perceived to be very poor in some areas. Other information about the delivery channels can be used for re-prioritising the effort put into different products for the different channels. You should also consider how the questions should be modified to fit your own circumstances and your needs for information to communicate and make decisions on.
6.3.3 Verification

A simple first step into verification is to verify maximum temperature forecasts. These are provided by most NMSs, and just about everyone cares about temperatures. The example in Section 4.3.1 shows many measures of reliability, accuracy and skill which can be used to verify these. Perhaps the first questionnaire you use can also ask the public what they consider to be an “accurate” maximum temperature forecast. Is within 2°C accurate? Within 3°C?

As statistics accumulate, you can see how skilful the forecasts are compared to benchmarks, which could include statistical forecasts based on numerical model output. Do the manual forecasts have a worthwhile improvement over model forecasts? Are they both poor? Is it worth considering a research and development programme to improve the guidance? Do the forecasters need more information available on temperature climatology, and on case studies of unusually hot or cold temperatures?
assessing the Assessment Programme itself. Many of the
methods described in Chapter 5 can be used with your inter-
Precipitation nal customers in the NMS to make sure that the programme
is meeting their needs, and to improve it.
A typical second step into verification would be to verify fore-
casts of precipitation. In most parts of the world this is of
significant interest to the public - but maybe you should check 6.4 FINAL WORDS
this as part of your first questionnaire?
Verification of “yes”or “no”for precipitation is covered in Performance Assessment is the key to ensuring an effec-
some detail in Section 4.3.2, and the example in Appendix 1 tive, efficient and sustainable Public Weather Services
shows how a simple spreadsheet can be used to compute vari- programme. We trust that the guidelines provided in this
ous scores.You can ask yourself the same kinds of questions Technical Document will be of value to you in establishing or
as for maximum temperatures above. If in some climates a developing your own Programme, and wish you well in that
simple “yes” or “no” may not suffice – the three category endeavour.
Appendix 1
EXAMPLE OF MONTHLY RAINFALL VERIFICATION
The following table shows an example (using a simple spreadsheet) of “rain” / “no rain” verifications.

RAINFALL VERIFICATION
LOCATION: Auckland
MONTH: July    YEAR: 1999

(The daily table of forecast and observed rainfall occurrence is not reproduced here.)
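Although the daily table is not reproduced, the way such daily entries are reduced to the 2 by 2 contingency table used below can be sketched briefly. The following minimal example in Python (the daily data shown are hypothetical; a spreadsheet would do the same job with COUNTIF-style formulae) counts daily (forecast, observed) “rain”/“no rain” pairs into the four cells A, B, C and D:

    # Minimal sketch: reducing daily "rain"/"no rain" forecast-observation
    # pairs to the 2 x 2 contingency table counts A, B, C and D used below.
    # The daily data here are hypothetical, for illustration only.

    def contingency(pairs):
        """Count (forecast, observed) yes/no pairs into A, B, C, D."""
        a = sum(1 for f, o in pairs if f and o)          # forecast yes, observed yes
        b = sum(1 for f, o in pairs if f and not o)      # forecast yes, observed no
        c = sum(1 for f, o in pairs if not f and o)      # forecast no, observed yes
        d = sum(1 for f, o in pairs if not f and not o)  # forecast no, observed no
        return a, b, c, d

    # One (forecast, observed) pair per day, True = rain.
    days = [(True, True), (True, False), (False, False),
            (False, True), (True, True)]
    print(contingency(days))  # -> (2, 1, 1, 1)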
The following area on the spreadsheet shows various skill scores which can be computed from the 2 by 2 contingency table
resulting from these data. The scores are defined in Section 4.3.2, and the 2 by 2 contingency table is the same as used for an
example in that section.
SUMMARY:                             FORMULAE:

                Observed                             Observed
                Yes    No                            Yes    No
Forecast  Yes    19     4            Forecast  Yes     A     B
          No      2     6                      No      C     D
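From these four counts, scores of the kind defined in Section 4.3.2 can be computed directly. The following minimal sketch in Python uses the conventional formulae for probability of detection, false alarm ratio, frequency bias, threat score, proportion correct, the Hanssen-Kuipers score and the Heidke skill score; the exact set of scores on your own spreadsheet may differ.

    # Minimal sketch: standard scores from a 2 x 2 contingency table.
    # A = hits, B = false alarms, C = misses, D = correct rejections.
    # Assumes no zero denominators (true for the example counts).
    def verification_scores(a, b, c, d):
        n = a + b + c + d
        pod = a / (a + c)                    # probability of detection
        far = b / (a + b)                    # false alarm ratio
        bias = (a + b) / (a + c)             # frequency bias
        threat = a / (a + b + c)             # threat score (CSI)
        pc = (a + d) / n                     # proportion correct
        kss = a / (a + c) - b / (b + d)      # Hanssen-Kuipers score
        expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n
        hss = (a + d - expected) / (n - expected)  # Heidke skill score
        return {"POD": pod, "FAR": far, "Bias": bias, "Threat": threat,
                "PC": pc, "KSS": kss, "HSS": hss}

    # Counts from the Auckland example above.
    for name, value in verification_scores(19, 4, 2, 6).items():
        print(f"{name}: {value:.3f}")

For the Auckland counts above (A = 19, B = 4, C = 2, D = 6), this gives, for example, a probability of detection of 19/21 ≈ 0.90 and a false alarm ratio of 4/23 ≈ 0.17.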
Appendix 2

ENVIRONMENT CANADA’S ATMOSPHERIC PRODUCTS AND SERVICES 1997 NATIONAL PILOT SURVEY

for:
The Program Evaluation Group of the Policy, Program and International Affairs Directorate
Good morning/afternoon/evening. My name is ___________ of Goldfarb Consultants, a national survey and opinion research
firm. We are conducting a survey on behalf of Environment Canada today. The results of this study will be used to help design
and modify existing programs and services to better meet your needs. We are not selling anything. We are simply interested
in your attitudes and opinions. Can you spare some time to answer some questions for me? THANK YOU.
A. May I please speak with the male/female [ROTATE] in the household age 18 or over whose birthday comes next? [IF THE
RESPONDENT IS NOT AVAILABLE, GET PERSON’S NAME, MARK AS “ARNA”, AND ARRANGE FOR A CALL
BACK.]
[REINTRODUCE IF NECESSARY]
B. Respondent is...
Male
Female
C. I would just like to confirm that you are over the age of 18.
D. We are interested in people’s occupations. Do you or does anyone in your household work for...
1. We would like to talk to you about the types of news that you hear or look at. During a typical day, how likely are you to
look at or hear news on each of the following topics? Are you very likely, somewhat likely, not very likely or not likely at
all to get news on... [ROTATE]
Very Somewhat Not Very Not Likely
Likely Likely Likely At All
Local events and politics
Entertainment
Weather
Traffic
Sports
2a) We’d like to focus more on weather information for the remainder of this interview. First of all, on a typical day, how many
times would you say that you specifically make a point of actually looking at or listening to weather forecasts? Would it
be... [READ LIST]
2b) If you are in need of a weather forecast, how often is it available to you? Is it available… [READ LIST]
Always
Most of the time
About half of the time
Less than half of the time
Rarely or never
2c) Compared to two years ago, would you say that you are using weather forecasts more often today, the same, or less often
than you were two years ago?
More often
The same
Less often
2d) Compared to two years ago, how satisfied are you with your access to weather information or forecasts?
[READ. CHECK ONE]
3a) We are interested in where you get your weather information from. From what main source are you most likely to get your
daily weather information? [DO NOT READ. CHECK ONE ONLY. CLARIFY “TELEVISION” AND “TELEPHONE”
RESPONSES.]
3b) What other sources do you get weather information from? [DO NOT READ. CHECK AS MANY AS APPLY.]
3a) Primary source        3b) Secondary source
Television – General mention
Television – Weather network
Television – Local Environment Canada cable channel
Radio
Newspaper
Internet Access
WeatherRadio Canada
WeatherCopy Canada
Contact Environment Canada weather office
Telephone – General mention
Telephone – 1-800 number
Telephone – 1-900 number
Environment Canada recorded tape
Family member
4. On a typical day, when do you make a point of trying to look at or hear weather forecasts? [PROBE] Are there any other
times? [DO NOT READ. CHECK ALL THAT APPLY.]
Other
5a) We would like to know if the information provided in weather forecasts is sufficient for you to make decisions on plans or actions that you would take on a typical day. That is, do you feel that weather forecasts always provide you with enough information to make decisions, sometimes provide you with enough information, rarely provide you with enough information or never provide you with enough information to make decisions?
Always
Sometimes ASK QUESTION 5B
Rarely ASK QUESTION 5B
Never ASK QUESTION 5B
5b) What other information would you require to make decisions? [DO NOT READ. PROBE. CHECK ALL THAT APPLY.]
Other:________________________________________________________________________
_________________________________________________________________________
6. We’d now like you to think specifically about Environment Canada for a moment. Can you tell me the types of weather-related services Environment Canada provides and performs? [PROBE AND CLARIFY]
_________________________________________________________________________
7. Now, how often does your work or job require you to make decisions based on the weather? Is it... [READ LIST]
Always
Sometimes
Rarely
Never GO TO QUESTION 10
8. What parts of the weather forecast do you need in order to make work-related decisions? [DO NOT READ. PROBE. CLARIFY. CHECK ALL THAT APPLY.]
Other:________________________________________________________________________
_________________________________________________________________________
9a) What is your main source of weather information for work-related decisions? [DO NOT READ. CHECK ONE ONLY]
9b) From what other sources do you get work-related weather information? [DO NOT READ. CHECK AS MANY AS APPLY.]
9a) Primary source        9b) Secondary source
10a) We would like you to think of the four seasons. On a scale of 1 to 10, where 10 means “very important” and 1 means “not important at all”, how important are weather forecasts to you for each of the following seasons? [START RANDOMLY, AND THEN PROCEED IN ORDER.]
10b) Now we would like you to think of the changes between seasons. On a scale of 1 to 10, where 10 means “very important” and 1 means “not important at all”, how important are weather forecasts to you for each of the following changes of season? [START RANDOMLY, AND THEN PROCEED IN ORDER.]
11. Say you are planning a vacation six months from now to an area of Canada that you’ve never been to. Would the kind of weather you’d likely experience at that time in that location be very important, somewhat important, not very important or not important at all to you in planning your holiday?
Very important
Somewhat important
Not very important
Not important at all
12. If you did need this kind of weather information now for your trip in six months, from where do you think you could get
this type of information? (DO NOT READ – CHECK ALL THAT APPLY.)
Weather Office
Library
Atlas
CAA
Travel Agent
Travel Books
Television – General mention
Weather Network – Specific mention
Radio
Newspaper
Internet Access – The Web (WWW)
WeatherRadio Canada
WeatherCopy Canada
Environment Canada recorded tape
Contact Environment Canada
weather office
Family member
Other
Don’t know
13. Besides vacation planning, have you ever obtained this kind of long term weather information for other purposes?
Yes
No GO TO NEXT SECTION
Don’t know
________________________________________________________________________________
We would like to talk to you about weather warnings, a specific type of weather forecast that Environment Canada provides to all Canadians …
1. First of all, what do you think of when you see or hear the words “Weather Warning” as part of a weather report? What
does a “Weather Warning” mean to you? [PROBE AND CLARIFY] Anything else?
2a) From what source are you most likely to receive a “Weather Warning”? [DO NOT READ LIST. CHECK ONE]
2b) From what other sources are you likely to receive “Weather Warnings”? [DO NOT READ LIST.CHECK ALL THAT APPLY]
2a) Primary source        2b) Secondary source
We would like you to think of a summer weather situation in which you hear that a Weather Warning is in effect for an approach-
ing summer storm.
3. Of all the times that you have heard a summer storm warning for your area, how often does the summer storm actually
occur in your area? Would you say that it occurs...
Always
Most of the time
About half of the time
Less than half the time
Rarely
Never
[DON’T READ]
Don’t know / No answer
4. How often would you say that you receive enough notice in order to properly react to a warning about a summer storm
heading toward your area?
Always
Most of the time
About half of the time
Less than half the time
Rarely
Never
[DON’T READ]
Don’t know / No answer
5. We would like to know how clearly various aspects of a summer storm warning are communicated to you. Based on what you know or have experienced, are the following communicated very well, somewhat well, not very well or not well at all? [ROTATE]
6. What other type of information do you feel you need to hear as part of the warning message in order to properly prepare
and respond to a summer storm warning? [PROBE AND CLARIFY]
________________________________________________________________________________
7a) When you hear a summer storm warning for your area, how much advance notice do you need in order to ensure your
safety? Would you need... [READ LIST]
7b) What is the minimum amount of time that you would accept in order to prepare for a summer storm warning for your area? Would you say it is... [READ LIST]
8a) Based on what you can recall and your own experience over the last two years with summer storm warnings, generally did
you have enough time to respond?
8b) How much more time did you require? Would you require... [READ LIST.]
Now, we would like you to consider a winter weather situation in which you hear that a winter storm warning is in effect for an approaching winter storm.
9. Of all the times that you have heard a winter storm warning for your area, how often does the winter storm actually occur? Would you say that it occurs…
Always
Most of the time
About half of the time
Less than half the time
Rarely
Never
10. How often would you say that you have received enough notice in order to properly react to a warning about a winter storm heading toward your area?
Always
Most of the time
About half of the time
Less than half the time
Rarely
Never
11. We would like to know how clearly various aspects of a winter storm warning are communicated to you. Based on what you know and have experienced, are the following communicated very well, somewhat well, not very well or not well at all? [ROTATE]
Very well    Somewhat well    Not very well    Not well at all    Don’t know
12. What other type of information do you feel you need to hear as part of the warning message in order to properly prepare and respond to a winter storm warning? [PROBE AND CLARIFY]
________________________________________________________________________________
13a) When you hear a winter storm warning for your area, how much advance notice do you need in order to ensure your safety? Would you say you need... [READ LIST. CHECK ONE ONLY.]
13b)What is the minimum amount of time that you would accept in order to prepare for a winter storm warning for your area?
Would you say it is... [READ LIST. CHECK ONE ONLY.]
14a) Based on what you can recall and your own experience with winter storm warnings, generally did you have enough time to respond?
14b) How much more time did you require? Would you require... [READ LIST. CHECK ONE ONLY.]
SUMMERTIME SCENARIO
We would like to know your opinions about the accuracy of various types of weather forecasts. Consider a summer forecast that
you hear in July for your area.
1a) So, let’s say that this forecast states that the anticipated high for the day would be 25 degrees. Suppose the actual high is not 25, but is some temperature less than 25 degrees. At what temperature below 25 would you consider the forecast inaccurate?
1b) Now suppose the actual high is not 25, but is some temperature more than 25 degrees. At what temperature above 25 would
you consider the forecast inaccurate?
2a) Say the forecast states that the anticipated overnight low would be 20 degrees. Suppose the actual low is not 20, but is some
temperature less than 20 degrees. At what temperature below 20 would you consider the forecast inaccurate?
2b) Now suppose that the actual overnight low is not 20, but is some temperature more than 20 degrees. At what temperature
above 20 would you consider the forecast inaccurate?
3a) Say the forecast mentioned that the anticipated wind speed would be 30 kilometers per hour. Suppose that the actual wind-
speed is not 30, but is at some speed less than 30. At what speed below 30 would you consider the forecast inaccurate?
3b) Now suppose that the actual wind speed is not 30, but is at some speed more than 30. At what speed above 30 would you
consider the forecast inaccurate?
4. Say the forecast mentioned that the wind would be coming from the west. Would you consider the forecast to be accurate
or not accurate if the wind actually came from… [READ LIST. START RANDOMLY AND CONTINUE IN ORDER]
5. Say the forecast said “rain beginning in the afternoon”. Would you consider the forecast to be accurate or not accurate if
the rain actually began... [READ LIST. START RANDOMLY AND CONTINUE IN ORDER]
6. Say the forecast says “Sunny with afternoon cloudy periods”. Would you consider the forecast accurate or not accurate if it
was... [ROTATE. READ LIST]
Accurate Not Don’t
Accurate Know
Sunny all day
Cloudy all day
Cloudy in the morning and sunny in the afternoon
7. Say that heavy rain with over 50 millimeters of rainfall over the next 24 hours is forecast. Would you consider the forecast to be accurate or not accurate if the rainfall actually was... [READ LIST. ROTATE]
8. Say the forecast said the probability of precipitation was 70% for today. When you hear that the probability of precipitation
for today is 70%, what does that mean to you? [READ LIST. ROTATE. READ NUMBERS. CHECK ONE ONLY.]
3 There is a 70% chance that rain will occur somewhere in the forecast area today
9. And continue to think about the summer... Which forecast do you use most to plan for special activities, events or weekends?
[READ LIST. CHECK ONE ONLY.]
10. We would like to know how useful various parts of a summer weather forecast are to you. On a scale of 1 to 10, where 10
is “extremely useful” and 1 is “not useful at all” how useful are each of the following parts of a weather forecast and other
summer weather information... [READ LIST. ROTATE]
11. Now we would like to know how accurate summer weather forecasts are on each of the following weather measures. In
your experience, on a scale of 1 to 10, where 10 is “extremely accurate” and 1 is “not accurate at all” how accurate are each
of the following parts of a weather forecast and other summer weather information... [READ LIST. ROTATE]
We would like to know your opinions about the accuracy of various types of weather forecasts. Consider a fall or spring
forecast that you hear in October or March for your area.
1a) So, let’s say that this forecast states that the anticipated high for the day would be plus one. Suppose the actual high is not
plus one, but is some temperature less than plus one. At what temperature below plus one would you consider the fore-
cast inaccurate?
[CONFIRM PLUS OR MINUS WITH RESPONDENT]
1b) Now suppose the actual high is not plus one, but is some temperature more than plus one. At what temperature above
plus one would you consider the forecast inaccurate?
2a) Say the forecast states that the anticipated overnight low would be minus five degrees. Suppose the actual low is not minus
five, but is some temperature less than minus five. At what temperature below minus five would you consider the forecast
inaccurate?
2b) Now suppose that the actual overnight low is not minus five, but is some temperature more than minus five. At what temperature above minus five would you consider the forecast inaccurate?
[CONFIRM PLUS OR MINUS WITH RESPONDENT.]
3a) Say the forecast mentioned that the anticipated wind speed would be 30 kilometers per hour. Suppose that the actual wind-
speed is not 30, but is at some speed less than 30. At what speed below 30 would you consider the forecast inaccurate?
3b) Now suppose that the actual wind speed is not 30, but is at some speed more than 30. At what speed above 30 would you
consider the forecast inaccurate?
4. Say the forecast mentioned that the wind would be coming from the west. Would you consider the forecast to be accurate
or not accurate if the wind actually came from… [READ LIST. START RANDOMLY AND CONTINUE IN ORDER]
5. Say the forecast said “wet snow developing in the afternoon”. Would you consider the forecast to be accurate or not accurate if the wet snow actually began... [READ LIST. START RANDOMLY AND CONTINUE IN ORDER]
Accurate Not Don’t
Accurate Know
In the morning
Around noon
Mid afternoon
In the late afternoon
In the evening
If no wet snow occurred throughout the day or evening
6. Say the forecast says “Sunny with afternoon cloudy periods”. Would you consider the forecast accurate or not accurate if it
was... [ROTATE. READ LIST]
Accurate Not Don’t
Accurate Know
Sunny all day
Cloudy all day
Cloudy in the morning and sunny in the afternoon
7. Say that freezing rain is forecast. Would you consider the forecast to be accurate or not accurate if the precipitation was
actually... [READ LIST. ROTATE]
8. Say the forecast said the probability of precipitation was 70% for today. When you hear that the probability of precipita-
tion for today is 70%, what does that mean to you? [READ LIST. ROTATE. READ NUMBERS. CHECK ONE ONLY.]
3 There is a 70% chance that rain will occur somewhere in the forecast area today
9. And continue to think about the fall and/or spring... Which forecast do you use most to plan for special activities, events
or weekends? Would it be... [READ LIST. CHECK ONE ONLY.]
10. We would like to know how useful various parts of a fall or spring weather forecast are to you. On a scale of 1 to 10, where 10 is “extremely useful” and 1 is “not useful at all”, how useful are each of the following parts of a weather forecast and other fall or spring weather information... [READ LIST. ROTATE]
11. Now we would like to know how accurate spring and/or fall weather forecasts are on each of the following weather measures. In your experience, on a scale of 1 to 10, where 10 is “extremely accurate” and 1 is “not accurate at all”, how accurate are each of the following parts of a weather forecast and other fall or spring weather information... [READ LIST. ROTATE]
We would like to know your opinions about the accuracy of various types of weather forecasts. Consider a winter forecast
that you hear in January for your area.
1a) So, let’s say that this forecast states that the anticipated high for the day would be minus 5 degrees Celsius. Suppose the
actual high is not minus 5, but is some temperature less than minus 5. At what temperature below minus 5 would you
consider the forecast inaccurate?
1b) Now suppose the actual high is not minus 5, but is some temperature more than minus 5. At what temperature above minus
5 would you consider the forecast inaccurate? [CONFIRM PLUS OR MINUS WITH RESPONDENT]
2a) Say the forecast states that the anticipated overnight low would be minus 20 degrees Celsius. Suppose the actual low is not
minus 20, but is some temperature less than minus 20. At what temperature below minus 20 would you consider the fore-
cast inaccurate?
2b) Now suppose that the actual overnight low is not minus 20, but is some temperature more than minus 20. At what temper-
ature above minus 20 would you consider the forecast inaccurate? [CONFIRM PLUS OR MINUS WITH RESPONDENT.]
3a) Say the forecast mentioned that the anticipated wind speed would be 30 kilometers per hour. Suppose that the actual wind-
speed is not 30, but is at some speed less than 30. At what speed below 30 would you consider the forecast inaccurate?
3b) Now suppose that the actual wind speed is not 30, but is at some speed more than 30. At what speed above 30 would you
consider the forecast inaccurate?
4. Say the forecast mentioned that the wind would be coming from the west. Would you consider the forecast to be accu-
rate or not accurate if the wind actually came from… [READ LIST. START RANDOMLY AND CONTINUE IN
ORDER]
5. Say the forecast said “snow beginning in the afternoon”. Would you consider the forecast to be accurate or not accurate if the snow actually began... [READ LIST. START RANDOMLY AND CONTINUE IN ORDER]
6. Say the forecast says “Sunny with afternoon cloudy periods”. Would you consider the forecast accurate or not accurate if it
was... [ROTATE. READ LIST]
7. Say that heavy snow is forecast. Would you consider the forecast to be accurate or not accurate if the precipitation was
actually... [READ LIST. ROTATE]
8. Say the forecast said the probability of precipitation was 70% for today. When you hear that the probability of precipi-
tation for today is 70%, what does that mean to you? [READ LIST. ROTATE. READ NUMBERS. CHECK ONE
ONLY.]
9. And continue to think about the winter... Which forecast do you use most to plan for special activities, events or weekends?
Would it be... [READ LIST. CHECK ONE ONLY.]
10. We would like to know how useful various parts of a winter weather forecast are to you. On a scale of 1 to 10, where 10 is “extremely useful” and 1 is “not useful at all”, how useful are each of the following parts of a weather forecast and other winter weather information... [READ LIST. ROTATE]
11. Now we would like to know how accurate winter weather forecasts are on each of the following weather measures. In your experience, on a scale of 1 to 10, where 10 is “extremely accurate” and 1 is “not accurate at all”, how accurate are each of the following parts of a weather forecast and other winter weather information... [READ LIST. ROTATE]
We would like you now to think about the environment in your area.
1a) Do you consider your local area to have an air pollution problem?
Yes
No GO TO QUESTION 2
1b) What air pollution or air quality problems do you feel your area has?
2a) Two different types of air-quality information messages could be provided to you. First, anticipated or expected levels of
pollution for the day could be provided, or information on the actual pollution levels as they are presently occurring could
be provided. Would you prefer to have information on the anticipated pollution levels, on the current levels as they’re
happening, or on both?
3a) Are you aware of any air quality or air pollution information sources available for your area that reflect the current
conditions?
Yes
No GO TO QUESTION 6
4. How often do you make a point of checking for information on the current levels of air pollution in your area?
5. On a scale of 1 to 10, 1 being “not at all satisfied” and 10 being “extremely satisfied”, how satisfied are you with all the information you see or hear now about the levels of air pollution in your area? [CIRCLE ONE]
6. If you heard a message indicating high levels of air pollution, how likely are you to do each of the following?
We would like to talk to you about various weather services that are available to you either by phone or electronically.
In most major urban centres, Environment Canada provides a free 24 hour recorded local weather forecast accessible only over
the telephone. Callers in the local dialing area do not pay any charges. However, those calling from outside the local area must
pay long distance charges to hear about weather that affects their area.
1. Are you aware of this Environment Canada 24 hour recorded local weather forecast service message only accessible over
the telephone?
(Words in italics were added to the questionnaire during the field work, on March 5, 1997, after a review of preliminary data that seemed suspect.)
Yes
No GO TO QUESTION 8
Yes
No GO TO QUESTION 8
3. How often do you use it? [READ LIST. CHECK ONE ONLY]
4. How often do you try to call this weather line and receive a busy signal? [READ LIST]
Always
Most of the time
About half of the time
Less than half of the time
Rarely or never
5. On a scale of 1 to 10, where 10 is “extremely satisfied” and 1 is “not satisfied at all”, how satisfied are you with the type of
information provided through this service?
6. On a scale of 1 to 10, where 10 is “extremely satisfied” and 1 is “not satisfied at all”, how satisfied are you with the accessi-
bility of weather information provided by this service?
7. On a scale of 1 to 10, where 10 is “extremely satisfied” and 1 is “not satisfied at all”, how satisfied are you with the format
and the presentation of the weather information provided by this service?
8. For budgetary reasons, Environment Canada cannot provide such a service free of long distance charges uniformly across
Canada to smaller centres. Do you think that Environment Canada should… [READ AND ROTATE]
Require everyone to pay, even if someone calls from within their local area
Keep it as it currently is; that is, callers from the local calling area are not charged, but callers from outside the area are charged long distance    GO TO QUESTION 10
9a) Would you prefer to pay a fixed fee per call or a charge per minute?
Fixed fee
Charge per minute GO TO QUESTION 9C
9b) How much would you be willing to pay per call? Would it be... [READ LIST]
Under $1.00
$1.00 – $1.99
$2.00 – $2.99
$3.00 – $3.99
$4.00 – $4.99
$5.00 or more
9c) How much per minute would you be willing to pay for this service? Would it be... [READ LIST] (IF ASKED, THE
AVERAGE LENGTH IS 3 MINUTES)
10. So that Environment Canada does not charge all users for this service, commercial advertising needs to be played on this
line. Do you think this is... [READ LIST]
An excellent idea
A good idea
A fair idea
A poor idea
Environment Canada has recently launched a new national service, a 1-900 user-pay telephone weather service called “Weather
Menu” which provides up-to-date weather and environmental bulletins.
(** If asked: The phone number is 1-900-565-5000 in English / 1-900-565-4000 in French, called “Meteo à la carte”)
11. Are you aware of this 1-900 User Pay Telephone service?
Yes
No GO TO QUESTION 14
Yes
No GO TO QUESTION 14
13. How often do you use it? [READ LIST. CHECK ONLY ONE]
14. The cost for this type of service is 95 cents per minute. Do you think this is… (READ LIST. CHECK ONE)
Just right
Too low
Too high
WeatherRadio
WeatherRadio is an Environment Canada service that broadcasts weather information 24 hours a day in many areas across Canada. A special radio must be purchased to receive these weather broadcasts.
(** If asked, one can purchase a special receiver at major electronics retailers like RADIO SHACK)
Yes
No GO TO QUESTION 21
Yes
No GO TO QUESTION 21
17. How often do you use it? [READ LIST. CHECK ONE]
18. On a scale of 1 to 10, where 10 is “extremely satisfied” and 1 is “not satisfied at all”, how satisfied are you with the type of information provided on the WeatherRadio broadcasts?
1 2 3 4 5 6 7 8 9 10
19. On a scale of 1 to 10, where 10 is “extremely satisfied” and 1 is “not satisfied at all”, how satisfied are you with the format and presentation of information on the WeatherRadio broadcasts?
1 2 3 4 5 6 7 8 9 10
20. On a scale of 1 to 10, where 10 is “extremely timely” and 1 is “not timely at all”, how timely do you consider the 20-minute cycle for the WeatherRadio broadcasts?
1 2 3 4 5 6 7 8 9 10
Environment Canada has a World Wide Web Internet site providing weather and environmental information.
[If they ask for the Universal Resource Locator, i.e. the URL, it is: http://www.ec.gc.ca/ ]
21. Were you aware of Environment Canada’s information centre on the Internet?
Yes
No GO TO DEMOGRAPHICS
Yes
No GO TO DEMOGRAPHICS
23. How often do you use it for weather information or forecasts? [READ LIST. CHECK ALL THAT APPLY.]
24. On a scale of 1 to 10, where 10 is “extremely satisfied” and 1 is “not satisfied at all”, how satisfied are you with the type of
weather information provided on Environment Canada’s Internet Pages?
1 2 3 4 5 6 7 8 9 10
25. On a scale of 1 to 10, where 10 is “extremely satisfied” and 1 is “not satisfied at all”, how satisfied are you with the format
and presentation of weather information in Environment Canada’s Internet Pages?
1 2 3 4 5 6 7 8 9 10
G. DEMOGRAPHICS
THE FOLLOWING QUESTIONS ARE FOR CLASSIFICATION PURPOSES ONLY. YOUR ANSWERS ARE STRICTLY
CONFIDENTIAL, AND WILL ONLY BE USED IN COMBINATION WITH OTHER RESPONSES.
18 – 24
25 – 34
35 – 49
50 – 64
65 and over
2a) Do you have any children living in your household under the age of 18?
Yes 61-1
No 2 GO TO QUESTION 3
2b) What ages are the children under the age of 18 that live in your household?
[CHECK ALL THAT APPLY]
4a) Please indicate which of the following best describes your current status.
5a) How many cars, trucks and vans are owned or leased by you or all members of your household?
None 68-1
1 2
2 3
3 4
4 5
5 or more 6
5b) And finally, in which category does your total annual household income fall before income taxes?
Refused 6
THANK
Finally, may I have your first name in case my supervisor needs to verify that I conducted this interview with you?
NAME:
PHONE:
Appendix 3
HONG KONG OBSERVATORY SURVEY
MAIN QUESTIONNAIRE
Q2 From where do you usually obtain weather information for Hong Kong? Do you obtain it from radio, television, newspaper, weather hotline, internet, pagers / mobile phones, or other sources? Any other? (up to 3 sources)
(For “weather hotline”, probe: Is it Hong Kong Observatory’s Dial-a-Weather hotlines 1878-200, 1878-202 and 1878-066, Hong Kong Observatory’s Information Enquiry System 2926-1133, or Hong Kong Telecom’s 18-501, 18-503 and 18-508?)
1. Radio
2. Television
3. Newspaper
4. Hong Kong Observatory’s Dial-a-Weather hotlines (1878-200 / 202 / 066)
5. Information Enquiry System (2926-1133)
6. Hong Kong Telecom’s 18 501 / 3 / 8
7. Observatory’s Home Page
8. Other homepages
9. Pagers / Mobile Phones
10. Other sources (please specify)
Q3a Do you consider the weather forecasts of the Hong Kong Observatory over the past several months accurate or inaccurate? (Probe the degree)
1. Very accurate
2. Somewhat accurate
3. Average
4. Somewhat inaccurate
5. Very inaccurate
6. Don’t know / no comment
Q3b What percentage of weather forecasts of the Hong Kong Observatory over the past several months do you consider accurate?
1. ___________ per cent
2. Don’t know / No comment
Q4 Do you consider the following aspects of weather forecasts of the Hong Kong Observatory over the past several months
accurate or inaccurate?
Temperature
Fine / Cloudy
Q5 How do you compare weather forecasts nowadays with those of 3 to 4 years ago? Are they more accurate, less accurate or about the same?
1. More accurate
2. About the same
3. Less accurate
4. Don’t know / no comment
Q6 How satisfied are you with the services provided by the Hong Kong Observatory? If you rate on a scale of 0 to 10, with “5”
being the passing mark and “10” being “excellent service”, how many marks will you give?
End of Questionnaire