
World Meteorological Organization

GUIDELINES
ON PERFORMANCE ASSESSMENT
OF PUBLIC WEATHER SERVICES

WMO/TD No. 1023

Geneva, Switzerland
2000
Text by Neil Gordon and Joseph Shaykewich

Cover design by Irma Morimoto


Graphic provided by Joseph Shaykewich

© 2000, World Meteorological Organization

WMO/TD No. 1023

NOTE

The designations employed and the presentation of material in this publication do not imply the expression of any opinion whatsoever on the part of any of the participating agencies concerning the legal status of any country, territory, city or area, or of its authorities, or concerning the delimitation of its frontiers or boundaries.
CONTENTS

CHAPTER 1 — INTRODUCTION

CHAPTER 2 — KEY PURPOSES
2.1 Ensuring that user requirements are met
2.2 Ensuring the effectiveness of the public weather services system
2.3 Ensuring the credibility of and support for the public weather services system

CHAPTER 3 — AREAS WHERE ACTIONS ARE REQUIRED TO MEET THE KEY PURPOSES
3.1 Product definition
3.2 Delivery mechanisms
3.3 Production system
3.4 Research and development
3.5 Staff training and development
3.6 Communication

CHAPTER 4 — VERIFICATION
4.1 Introduction
    4.1.1 Overall purpose
    4.1.2 Accuracy, skill and reliability
    4.1.3 Objective and subjective verifications
4.2 Guiding principles
    4.2.1 Principles related to why to verify
    4.2.2 Principles related to how to verify
    4.2.3 Principles related to what to do with results
4.3 Performance measures
    4.3.1 Deterministic forecasts of values of continuous weather variables
    4.3.2 Deterministic forecasts for two categories
    4.3.3 Probabilistic forecasts for two categories
    4.3.4 Deterministic forecasts for multiple categories
    4.3.5 Probabilistic forecasts for multiple categories
    4.3.6 Forecasts of timing of events
    4.3.7 Forecasts of the location of events

CHAPTER 5 — USER-BASED ASSESSMENT
5.1 Introduction
    5.1.1 Characteristics
        5.1.1.1 Subjective
        5.1.1.2 Perception as reality
        5.1.1.3 Dimensions: requirements, expectations, understanding, importance, satisfaction, utility, etc.
        5.1.1.4 Economic value assessment
5.2 Guiding principles for methodology
    5.2.1 Long and shorter term strategic/tactical decision context
    5.2.2 Multi-year user-based assessment strategy
    5.2.3 Need to know why it should be done
    5.2.4 Credibility and transparency
        5.2.4.1 Statistical significance issues
            5.2.4.1.1 Sampling
            5.2.4.1.2 Sample errors and accuracy
        5.2.4.2 Collaboration with other relevant authorities is desirable
    5.2.5 Additional principles of user-based assessment design
        5.2.5.1 Use of professional expertise and independent administration authority
        5.2.5.2 Lack of professional advice or availability of an independent capacity should not stop assessments from being done
        5.2.5.3 Dry run or pilot test the assessment instrument
        5.2.5.4 Information storage
    5.2.6 Communication of information
        5.2.6.1 Accessibility within the NMS
        5.2.6.2 Clear reports for internal and external consumption
        5.2.6.3 Archive, publish, use as appropriate for promotion (and education)
        5.2.6.4 Targets for communication of results
5.3 Methods
    5.3.1 Non-survey user-based assessments
        5.3.1.1 Formal audits
        5.3.1.2 Focus groups
        5.3.1.3 Monitoring public opinion and direct feedback and response (complaints, compliments, suggestions) mechanisms
        5.3.1.4 Consultation
        5.3.1.5 Post-event review, case studies and debrief
        5.3.1.6 Collection of anecdotal information
    5.3.2 Formal structured surveys
        5.3.2.1 Large survey every 4 or 5 years – comprehensive
        5.3.2.2 More frequent tracking surveys
        5.3.2.3 Subject area surveys
            5.3.2.3.1 Key issues
            5.3.2.3.2 Product lines
            5.3.2.3.3 Delivery systems
            5.3.2.3.4 Economic value estimation
            5.3.2.3.5 Current value versus value if accuracy increased
        5.3.2.4 Questionnaire design
            5.3.2.4.1 Some general rules for questionnaire design and wording
            5.3.2.4.2 Types of questions
            5.3.2.4.3 Sequencing of questions
            5.3.2.4.4 Layout considerations for questionnaires
            5.3.2.4.5 Response errors
            5.3.2.4.6 Probing for more information
            5.3.2.4.7 Geographical and geopolitical representation
            5.3.2.4.8 Data coding and capture

CHAPTER 6 — CONCLUSIONS
6.1 Introduction
6.2 Summary
6.3 How to get started on a performance assessment programme
    6.3.1 Planning
    6.3.2 User-based assessment
    6.3.3 Verification
    6.3.4 Ongoing assessment
6.4 Final words

REFERENCES

APPENDICES
1 EXAMPLE OF MONTHLY RAINFALL VERIFICATION
2 ENVIRONMENT CANADA'S ATMOSPHERIC PRODUCTS AND SERVICES 1997 NATIONAL PILOT SURVEY
3 HONG KONG OBSERVATORY SURVEY
Chapter 1
INTRODUCTION

Weather services delivered to the public are one of the most visible returns for the taxpayers' investment in meteorological services. It is difficult to quantify this particular Return On Investment in financial terms. It is both possible, and essential, to carry out ongoing performance assessment of public weather services to ensure that they are efficiently and effectively meeting the public's needs.

There are many technical papers and publications on the narrow topic of forecast verification, including numerous accuracy and skill scores. There is less material available by way of guidance on why and how verifications should be carried out, and on the more general topic of assessing whether user needs are being met, rather than just whether forecasts are accurate. Forecast accuracy is irrelevant if the forecast products are not available to the public at a time and in a form that is useful.

The purpose of this Technical Document is to provide broader guidance on performance assessment of public weather services, with something of an emphasis on forecasts and warnings. An assessment programme can be seen in the context of a quality system, where it is important to ensure that the information gathered and processed is focussed on user requirements, to be used in making decisions and taking actions to improve performance, rather than just being gathered for the sake of it. In essence, the object of the exercise is to ensure a sustainable and cost-effective system delivering quality public weather services.

The guidelines are based on an outline developed at a meeting of the WMO Public Weather Services Expert Team on Product Development And Verification And Service Evaluation, in Hong Kong, China in November 1999. Two of the terms of reference of this team were to "Prepare recommendations on standardised verification techniques for public warnings and forecasts", and to "Prepare guidelines on technical and user-oriented verification mechanisms including measures of overall satisfaction with the service". This guidance addresses both terms of reference in the context of overall performance measurement, but does not provide hard and fast rules on standardised verification techniques.

Some of the basic guidelines about performance assessment include:
• Know why you are carrying it out (what new information do you want to discover?)
• Do not just collect and process information and then file it away
• Be prepared to take actions based on the results
• Gather information designed to help a National Meteorological Service (NMS) make strategic decisions about all aspects of public weather services
• Favour simplicity where possible, rather than overly complicated schemes
• Be very careful about the statistical significance of results based on small samples or short records
• Provide regular reports to stakeholders
• Make relevant, interpreted, information available to the public.

There are two major methods available for gathering information in an assessment programme – Verification, and User-Based Assessment. Neither can stand alone. It is important to do both, in a balanced fashion. The amount of effort spent on each will depend on the country, the nature of the services, and the user community. The worst thing would be not to do either of them!

The overall purpose of Verification of forecasts is to ensure that products such as warnings and forecasts are accurate, skilful and reliable from a technical point of view. As far as possible, forecast verifications are produced in an objective fashion, free of human interpretation. The results tend to be numbers and statistics, which can be manipulated and interpreted using statistical theory. There is no guarantee that verification results will match people's perceptions of how good the forecasts are. Nonetheless, information gathered through verification can be very useful for improving the accuracy of forecasts.

On the other hand, User-Based Assessment should give a true reflection of the user perception of products and services provided by the NMS, as well as qualitative information on desired products and services. It is almost completely subjective information, subject to human perception and interpretation.

In carrying out an assessment programme combining both methods, there are some commonalities. Although verifications may typically provide objective numbers, they should still be based around numbers which are relevant to users. It should be possible to match user-based assessment results (e.g., of perceptions of forecast accuracy) with corresponding technical verification results, and seek common trends and patterns. In both methods, there is no single score or method that can give "The Answer". Various scores and assessment methods have their particular uses.

In Chapter 2 of this Technical Document, the three key purposes for performance assessment will be discussed. Services can only improve if actions are taken – the six main areas are dealt with in Chapter 3. Chapter 4 considers in detail how to carry out Verifications, and Chapter 5 is on User-Based Assessment. The final chapter reviews why and how to carry out an assessment programme, and provides some guidance on an "entry-level" programme.
Chapter 2
KEY PURPOSES

There are three key purposes for carrying out an assessment programme for public weather services. They are:
(1) Ensuring that public weather services are responding to user requirements
(2) Ensuring the effectiveness and efficiency of the overall public weather services system
(3) Ensuring the overall credibility and proven value of public weather services.

Another way of looking at this is that the three purposes are about:
(1) Making sure that you are providing the right products
(2) Making sure that you have a good system for making them
(3) Building stakeholder support for the NMS.

2.1 ENSURING THAT USER REQUIREMENTS ARE MET

There is a wide variety of end-users of public weather services. These include individual members of the general public, emergency management agencies, and paying customers for specialised services.

In order to make sure that user requirements are being met, first of all it is necessary to know what they are – and what better way than asking the users? This topic is covered extensively in Chapter 5.

The definition of the needs in the particular case of weather forecasts can encompass what weather elements are most important, when and how forecasts should be delivered, in what format, and with what accuracy.

Knowing what the needs are, it is necessary to find out whether they are being met, and take actions to improve where possible. This may be as simple as checking and then changing the issue time of forecasts to make sure that they are available when they are most useful. It can also involve keeping score on how many forecasts are issued late, and changing management practices and schedules to ensure that forecasts are issued on time.

Verifying the accuracy of forecasts is, of course, another aspect. But it needs to be done in ways that are relevant to the user, who has probably never heard of a "Brier Score".

2.2 ENSURING THE EFFECTIVENESS OF THE PUBLIC WEATHER SERVICES SYSTEM

It is one thing to provide public weather services that meet user needs – and quite another to do it effectively and efficiently, from an overall point of view. This purpose is not about what is delivered and how. Rather, it is about the organization, management and planning of the overall public weather services system that delivers the services.

A performance assessment programme can gather information that can be used to make strategic decisions about the future delivery of services, about staffing, about training, research and development, and about the best mix of information from computer models and from human value adding.

2.3 ENSURING THE CREDIBILITY OF AND SUPPORT FOR THE PUBLIC WEATHER SERVICES SYSTEM

Even if public weather services have been designed and delivered to meet user needs, there may be a perception problem over how good they are. This can be serious, and life threatening. For example, if the public has a poor perception of the accuracy of tropical cyclone forecasts, they may disregard warnings, resulting in major loss of life and property. In the best of all possible worlds weather forecasts will never be perfect, so this can be a vicious circle, with public credibility declining every time there is the inevitable poor forecast.

An assessment programme can assist in two ways – by finding out what the public perceptions are, and by gathering and publicising facts about performance to improve the public perception and credibility of the services. Those occasions when forecasts do go wrong can be used as opportunities to publicise the role of the NMS and to draw attention yet again to the fact (gained from the assessment programme) that, say, forecasts are usually 85% accurate.

Similar information on performance can be incredibly useful for gaining the support of other stakeholders, including government ministers responsible for the NMS. The NMS will be in a much stronger position for sustaining and building funding if it can demonstrate such things as its level of performance, public satisfaction with its services, and the impacts of previous investment and research and development programmes.
Chapter 3
AREAS WHERE ACTIONS ARE REQUIRED TO MEET THE KEY PURPOSES

There is no point in gathering information through an assessment programme without using it. Using it means taking actions. This chapter is about the six main areas where actions need to be taken – mostly through changing what is being done now (unless it is perfect, which is unlikely!) or making plans for future changes.
(1) Improve the products to be provided
(2) Improve how the products are delivered
(3) Improve the production system
(4) Carry out needed research and development
(5) Train and develop staff
(6) Communicate information.

All of these action areas should involve feedback loops. Information is gathered on user requirements and on performance levels. Actions are taken to improve matters. The final step of "closing the loop" is also important – checking what the actual impact of those actions was, in order to learn how to do better next time.

Of course, there is also an assumption here that the NMS has the resources and staff to take such actions. There may well be a gap between the measured performance and expectations, but no ability to improve it because of lack of resources, or because there are no people available to carry out training.

The fundamental management issue here, which is beyond the scope of this Technical Document, is how best to allocate limited resources (and they are always limited) to best effect, to improve the situation, based on the information gathered from the assessment programme.

3.1 PRODUCT DEFINITION

The product definition is assumed to include what information is included in the product, and how it is formatted and expressed. This may include, for example, criteria for warnings.

The techniques discussed fully in Chapter 5, such as surveys, focus groups, and direct visits and discussions, can be used to identify user requirements for products. Naturally, this will not be done in a vacuum, since many products will already exist. It is crucial to ensure that the information gathered can be used to make decisions and take actions on product definition. This should always be borne in mind when designing the survey – know why you are asking the questions, and have some idea about what you are likely to do, depending on the answers.

3.2 DELIVERY MECHANISMS

Part of the user requirement is how the product is delivered, and when. Similar methods to those in the previous section need to be used to check with the users on what capabilities they have for accessing and receiving products, and then to improve the delivery system to better meet those needs.

3.3 PRODUCTION SYSTEM

There are many aspects of the production system that may need to be changed as a result of information gathered in an assessment programme. Just a few of the numerous possible changes are:
• Re-configuration of data networks to gather new data required for products and services, possibly at the expense of data which may no longer be required
• Obtaining new sources of local or global NWP model information on which to base new products and services
• Revising shift schedules to accommodate new, or modified (or discontinued!) products
• Revising shift schedules to accommodate new delivery times
• Installing systems (e.g., fax machines, or a web server) for new means of delivery of products
• Using more automated products (e.g., for maximum temperature forecasts) if verifications prove that these satisfy accuracy requirements and they can be cost-effectively produced
• Devoting more forecaster shift time to producing critical warnings which have proven not to be accurate enough
• Centralising forecasting, or de-centralising forecasting.

3.4 RESEARCH AND DEVELOPMENT

Information gathered through verifications, and user-based assessment, can be used to determine the priorities for research and development, and to reshape what R&D needs to be carried out. Some typical examples of the actions that may take place as a result are:
• Research and document case studies of weather situations which have been shown through verifications to be poorly handled (e.g., heavy rain situations)
• Basic research into phenomena where improvements are demonstrably needed (e.g., tropical cyclone development)
• Development of forecast techniques for new services (e.g., prediction of road surface icing)
• Development and improvement of local or regional NWP models in support of many products
• Development of statistical post-processing of NWP model output for new products (e.g., precipitation probabilities) or to improve existing products.

The most important aspect of all these examples is that they are driven by the knowledge of user requirements, and of existing performance levels – gained from the assessment programme.

3.5 STAFF TRAINING AND DEVELOPMENT

Once again, there are many actions that may take place as a result of information from a performance assessment programme. A few examples are:
• Recruiting and training more forecasters based on projected shift requirements from planned introduction of new products and services
• Training staff to make use of new numerical guidance information
• Training staff on the scientific basis of a new product, and operational procedures for producing it
• Re-training staff on the fundamental meteorology of a weather phenomenon which verifications show is being poorly forecast
• Training staff on how to write forecasts in a new and more "user-friendly" style (which surveys have shown the public would find more useful)
• Training staff on how to reduce a known bias of over-forecasting precipitation occurrence.

3.6 COMMUNICATION

One of the most important actions that must be taken is to communicate the results and information gathered from a performance assessment programme. Information is only of value if people know about it. It must be in a form that is understandable to the audience, and tailored to their likely use of it.

Firstly, information gathered must be made available to the staff of the NMS. Managers need information to guide them in decision making. Forecasters need information by way of feedback on their performance, particularly in relation to systematic errors that may need to be corrected. Researchers need information on performance of the system, and on likely new products so they can plan and prioritise R&D. All staff need information on the technical accuracy of the services delivered, and on public expectations, perceptions and needs. All staff should have a sense of ownership, accountability, and pride in what is being delivered to the users.

Secondly, relevant and appropriate information must proactively be made available to stakeholders in general. This may be a formal requirement of some kind of "Service Charter" or agreement with the government or community at large on services to be provided. Communicating such information is particularly important in relation to the third key purpose of "Ensuring the overall credibility and proven value of public weather services". If there is a vacuum of information, particularly on demonstrated performance, public perceptions will be based on anecdotal evidence. People tend to remember the last time a forecast went wrong – not how well forecasts do overall.

The most important stakeholder is the source of funds for the NMS – the government on behalf of the taxpayers. Information from a performance assessment programme must be communicated to demonstrate performance, to demonstrate the beneficial impacts of previous investment in the NMS, and in support of future plans for the development of the NMS.

Finally, and often in reaction to events, information must be communicated to the public via the media when opportunities present themselves. A good example is when there has been a severe weather event. Whether or not this was well forecast, the public interest in severe weather is heightened, and this is a good opportunity to include information on overall performance of the public weather services as part of the "weather story", to build public support and credibility.
Chapter 4
VERIFICATION

4.1 INTRODUCTION

4.1.1 Overall Purpose

The overall purpose of verification is to ensure that products such as warnings and forecasts are accurate, skilful and reliable from a technical point of view. This is distinct from whether the products are actually meeting user needs, which is covered separately in the next chapter. Nonetheless, the technical assessments should be in terms of measures that are relevant to user needs.

There are many dimensions and techniques of forecast verification. This Technical Document is not intended to cover all possibilities, but to provide sufficient general information on the possibilities. An extensive survey of verification techniques was carried out by Stanski et al. (1989) and published by WMO. The work by the late Allan Murphy (1997) is also worth reviewing for his philosophy on verification, and for the list of references.

4.1.2 Accuracy, Skill and Reliability

In concept, forecast verification is simple. You just need to compare the forecast weather with the observed weather that actually occurred. The accuracy¹ of a forecast is some measure of how close to the actual weather the forecast was. The skill of a forecast is taken against some benchmark forecast, usually by comparing the accuracy of the issued forecast with the accuracy of the benchmark. A benchmark forecast can be something simple such as climatology, chance, or persistence, or it could be a partly or completely automated product. The skill measure should give some meaningful information about what value has been added in the forecast process, compared to the usually much simpler or cheaper benchmark forecast.

There is a great deal of theory and practice about measures of forecast accuracy, involving sometimes-complex formulas for comparing frequency distributions of forecast versus observed weather. Usually, an accuracy measure gives information on the spread of differences between forecast and observed. A typical example is a Root-Mean-Square Error (RMSE) – the square root of the mean of the squared difference between forecast and observed.

Reliability is another aspect of forecast accuracy (it does not involve comparison with a control forecast). Literally, this means the extent to which the forecast can be "trusted" on average. One measure of reliability would be the average bias in a maximum temperature forecast – the average of the forecast values minus the average of the observed values.

Reliability measures are also used to assess how closely forecasts expressed in probability terms match reality. For example, suppose you were verifying a set of many forecasts of the probability of occurrence of rain. Suppose also that there were 100 occasions when the forecast probability was around 30% (e.g., between 25% and 35%), but it only rained on 10 of those occasions. The implication is that the forecasts of a probability of 30% chance of rain were not very reliable, since it really only rained 10% of the time on average.
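To make this kind of reliability check concrete, the calculation can be sketched in a few lines of code. The sketch below is purely illustrative and is not part of the original guidelines; the function name, bin width and inputs are hypothetical. It groups probability-of-rain forecasts into bins and compares the forecast probability with the observed frequency of rain in each bin.

    # Reliability check for probability-of-rain forecasts: group the
    # forecasts into probability bins and compare each bin's forecast
    # probability with the observed frequency of rain in that bin.
    from collections import defaultdict

    def reliability_by_bin(forecast_probs, rain_occurred, bin_width=0.1):
        """forecast_probs: probabilities in 0-1; rain_occurred: 0/1 outcomes."""
        bins = defaultdict(lambda: [0, 0])  # bin index -> [occasions, rain events]
        for p, rained in zip(forecast_probs, rain_occurred):
            b = min(int(p / bin_width), int(1 / bin_width) - 1)
            bins[b][0] += 1
            bins[b][1] += rained
        for b in sorted(bins):
            n, hits = bins[b]
            print(f"forecast {b * bin_width:.0%}-{(b + 1) * bin_width:.0%}: "
                  f"{n} occasions, rained {hits / n:.0%} of the time")

For the 100 occasions described above, the bin around 30% would report rain only 10% of the time, immediately flagging the over-forecasting.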
4.1.3 Objective and Subjective Verifications

There are two main ways of verifying forecasts – objective and subjective.

Objective verification is based on purely objective comparisons of forecast and observed weather elements. There is no element of human interpretation of either the forecast or observation.² The results can be replicated. Objective methods should be based on sound statistical theory – essentially the comparison of observed and forecast numbers.

Subjective methods involve some human assessment of forecasts and/or observations. They are a result of human perception, and the results are not always consistent and cannot necessarily be replicated. However, these perceptions are a true reflection of the value of the forecast to the individual or user who does the assessment.

4.2 GUIDING PRINCIPLES

Unless careful planning is done, there is a risk that a verification programme will never get off the ground, or that it will be engulfed in an avalanche of numbers that are never used. The purpose of this section is to suggest guiding principles on the Why, How and What Next of Verification.

4.2.1 Principles Related to Why to Verify

There are four main reasons for verifying forecasts:
(1) We must know the quality of our products
(2) We need information to aid decision-making
(3) We need information to feed back into process improvement
(4) We need appropriate information for reporting to users and other stakeholders.

¹ There is sometimes confusion between accuracy and precision. The precision of a forecast is how much detail is put into it in time, space, weather elements, and numbers of significant digits in numerical values. For example, a forecast maximum temperature of 23.42963°C would be very precise, but that does not make it accurate!
² The observed weather element may of course have been made by a human observer as part of a routine weather observing programme – this can be distinguished from subjective assessment of observations such as estimating precipitation that occurred in a spot in a data-sparse region.
Knowing the Quality of the Products

It is essential for any service provider to know the quality of the products and services they provide. However, historically, because of some of the perceived difficulties of verifying weather forecasts, and the work involved, NMSs have probably not done this as much as they should have.

That time of not knowing is now over. In an era of shrinking budgets for NMSs, increased demands for accountability for expenses and investments, and competition, NMSs must know how well they are doing. Assumptions about how well they are doing are no longer good enough.

Furthermore, the information gathered on forecast quality can be extraordinarily valuable, provided that it is carefully gathered and analysed, and appropriately used. Information on forecast quality is like having a medical check-up – it can help you work out what parts of your forecast production system are working and what are not. It can provide facts rather than assumptions for discussions with customers, and the media, and the government.

Information to Aid Decision Making

NMSs are continually making decisions that involve allocation of resources, staffing, training, research and development, and large expenditures. It is vital to make sure that sufficient information is available on the quality of the final output products to support these decisions.

Measuring and quantifying forecast performance allows you to compare forecasters, and forecast systems, and perform "what if" scenarios on how different systems might perform. Many examples of where actions can be taken, and decisions made, can be found in Chapter 3 of this Technical Document.

Feed Back into Process Improvement

Verification results should provide information that is of value in ongoing process improvement in forecast operations. Just one simple example would be recognition that rain is forecast far too often. Verification information can be analysed further to see what the weather conditions are like when the forecast was wrong, and to look for trends. You might find that there are particular weather conditions when the over-forecasting is taking place. Forecasters can use this information to improve their own performance, and it can be used to drive research and development projects.

Since verification involves a comparison between forecasts and observations, it can be used to pick up quality problems in either. If the forecasts are being passed through some automatic decoder program that is having problems, this may indicate that some forecasters are using the wrong syntax for writing their forecasts. (This can be fixed by training the forecasters to do better, or by putting new systems in place that do not allow forecasts to be written the wrong way to start with.) Large, systematic differences between the forecast and observation may turn out to be a problem in the observation, not the forecast!

Appropriate Information for Reporting

Much of the information from a verification programme can be used internally. However, there is also an increasing, and perfectly understandable, demand from users and other stakeholders for information on the quality of products and services. Providing such information can be very useful for an NMS.

Users sometimes have an incorrect perception of the quality of forecasts, which can be corrected by sharing appropriate verification information with them. Of course, the verification information may also validate their perceptions of a poor forecast – there is no point in hiding this, but there will be value in discussing the issue with users and working together on how the forecasting can be improved to better meet their needs.

Government ministers like to have proof of "value for money" expended on NMSs, and particularly like to see evidence of improvements over time, as a payback for money that they have committed to the NMS budget.

Verification information can be useful in dealings with the media, particularly when countering any negative publicity on a particular forecast that may have gone wrong.

A key word here, of course, is "appropriate". Information for reporting purposes needs to be carefully selected, simple, and relevant. Complicated and hard to understand scores will not enhance the image of the NMS.

4.2.2 Principles Related to How to Verify

When considering how to conduct verification, it is vital to refer back to the principles in the previous chapters on why verification is being done. If the "how" of verification is not answering questions or providing information needed under "why", then it may not be needed.

There are four key principles on how to verify forecasts:
(1) There Should Be an Overall Plan
(2) Measures Must be Relevant to the Users (internal and external)
(3) Keep It Simple
(4) Use Consistent Elements, Locations, Methods and Scores.

Overall Plan

Before embarking on a verification programme, it is very worthwhile to take some time to develop an overall plan. This should cover many of the issues addressed in this Technical Document, focussing on particular issues for your country. Those staff who will be producing and using the results need to be involved in the development of the plan, to ensure ownership, a commitment to success, and broad understanding of the purposes.
[Figure: Information flows in an operational verification system. Observations feed NWP and the forecasting system; the forecasting system produces products for customers. Observations, NWP output, products and customer expectations all feed the verification system, which generates reporting for customers, the media, government and other stakeholders, and feedback to adjust products, train forecasters, guide R&D on NWP, and re-configure the observing system.]

The plan needs to take into account why the measures are being produced.

The diagram above illustrates the overall information flows in an operational verification system. Meteorological information and product flows are shown with straight lines. Observations are used in NWP and by forecasters, who then produce products, which go to users. The observations, NWP information, and products also feed into the verification system. This system employs user expectations, to produce reports for the paying customers, and for the media and government and other stakeholders. Information from the verification system may also be analysed and used to make decisions about re-configuring of the observing system, what research and development may be done to improve NWP, and to feed into training to improve forecaster performance, and also to adjust the definition and format of products.

User-relevant Measures

Information should be relevant to the needs of the users. There is little point in producing scores that are complex and satisfying theoretically, and have all the right attributes of proper³ scores, if no one can understand or use them. For example, scores which give "percent correct" accuracy are not always favoured by the theoreticians, but they are easily understood by the public.

³ A "proper" score is one that encourages a forecaster to forecast what he or she truly believes, rather than biasing (or hedging) the forecast one way or another in the hope of producing a better score.

It is important that the verification scheme truly reflects the perception of the public or users on the accuracy of the forecast. Surveys may show that the public believe that a temperature forecast is "correct" if it is within 3°C, and verifications can then be made in those terms. However, a higher level of accuracy may be needed by an electricity supplier wanting to forecast power demand, for whom the temperature forecast may need to be within 1°C.

It is also important that the system captures how good performance is for the times when the forecast most needs to be right – the relevant and critical times. For example, in a place that rarely gets frosts, a constant forecast of "no frost" may be right 99% of the time, but is clearly of no value, since it always says the same thing.

Depending on the climate of the region and the time of year, some weather elements are more important than others. For example, there may be little value in verifying maximum temperatures in a region where they always vary little from day to day.

You may also take into account the needs of internal users of the information for decision making. For example, some particular skill measures may be useful when making decisions on the value of numerical guidance and the value added by forecasters.

Keep it Simple

Embarking on a verification programme can be a daunting prospect for an NMS with little experience in this area.
It is better to use simple, easy to understand measures, than to implement very complex schemes. It is also better to concentrate on verifying for just a few key places, rather than trying to verify many weather elements for many places. Keeping the number of verifications down avoids being buried in numbers that are never analysed, and keeps costs down.

Consistency

One of the most useful aspects of verification information is that the results can be tracked with time to see how performance is (one hopes) improving. But performance cannot be tracked if the weather elements, locations, methods and scores keep changing. And tracking performance in a statistically significant way may take a long time series of information. For example, at least four years of data will be needed to analyse seasonal differences in performance in a meaningful fashion.

It is, therefore, important to ensure consistency in an ongoing verification programme. You should be consistent by using the same weather elements, from the same locations, for the same times, and using the same accuracy and skill measures. Then results can be tracked in time, rather than trying to work out whether changes in skill were due to using a new score, or to verifying for a different location after a couple of years.

However, it can also be very useful to save the raw data used for the verifications so that if some new verification method is introduced it may be possible to go back and recompute the verification results from the beginning.

4.2.3 Principles Related to What to Do with Results

The ultimate benefit of a verification programme will only come about when the results are used, in support of the four reasons we are actually doing verifications (see Section 4.2.1). The key principles are quite simple, really:
(1) Use the results
(2) Do not misuse the results.

Using the Results

Communicate them: In general, the results should be communicated appropriately and promptly, rather than just being filed away. This will facilitate general use of the information. Communication includes reporting to users and stakeholders, and providing direct, immediate feedback to forecasters. Forecasters are usually very interested in the results of verification. They want to know if they have systematic errors in their forecasts so that they can correct them.

Analyse them: The results should be analysed to assist in decision making. If the verification results are not acceptable, then decisions may need to be made on the end-to-end forecasting process in order to improve matters. This could include improved data gathering, better numerical guidance, research and development targeted at the weather elements being verified, training programmes, improved procedures, processes and tools in the forecast room, and staffing levels. Analysis of the results should be ongoing to ensure that benefits are coming from these improvements. If the results are acceptable, this information can be used to validate previous decisions, and to assess the likely future impact of new decisions to be taken.

Use them for process improvement: On a shorter timescale, verification results should provide information that is of value in ongoing process improvement in forecast operations. Just one simple example would be recognition that maximum temperature forecasts for a city tend to have a warm bias (say, of 1.5°C) – forecasters can use this information to improve their own performance.

Not Misusing the Results

Verification results based on small sample sizes, or of rare events, may have very large margins of error. It is a good idea, where possible, to compute error bars on verification results. Care is needed in interpreting information that has poor statistical validity. This includes being too proud of very good results (which may not last!) or too concerned about very poor results (which hopefully also won't last!).

You should be careful to double check the results if they are either very good or very bad – there may have been a problem with the data or with the computer programs.

Care must also be used in trying to compare results between regions with different climates, which may not be meaningful, even if the verification methods were exactly the same.
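As an illustration of how such error bars might be computed, the sketch below puts an approximate 95% confidence interval around a "percent correct" score using the normal approximation to the binomial distribution. It is a minimal example, not part of the original guidelines; the function name and sample figures are hypothetical.

    import math

    def percent_correct_interval(num_correct, num_forecasts, z=1.96):
        """Approximate 95% confidence interval for a percent-correct score,
        using the normal approximation to the binomial distribution."""
        p = num_correct / num_forecasts
        half_width = z * math.sqrt(p * (1 - p) / num_forecasts)
        return p - half_width, p + half_width

    # 12 "correct" forecasts out of 20 gives 60%, but the error bars are wide:
    low, high = percent_correct_interval(12, 20)
    print(f"60% correct, 95% interval roughly {low:.0%} to {high:.0%}")

With only 20 forecasts the interval spans roughly 39% to 81%, which illustrates why scores from small samples should be quoted with caution.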

4.3 PERFORMANCE MEASURES

There are many scientific papers and documents on various measures of performance that can be used for verification. See, for example, Stanski et al. (1989), and Murphy (1997). The intent of this Technical Document is not to duplicate such material, but to give a sample of the simplest and most common measures that can be used, together with some brief examples of their application.

There are two fundamentally different types of variables, which can be forecast in two fundamentally different ways. The two types of variables are continuous (numbers), and categorical (e.g., rain or no-rain, or a category of precipitation amount). They can be forecast either deterministically, by giving just a single value or category, or probabilistically, through giving some information on the probability distribution of the continuous number, or the individual probabilities for the possible categories which could occur.

A forecast expressed in probability terms is more useful for making decisions than a forecast that explicitly states what will occur. The user can choose to take one or other decision based on the probabilities, and their particular knowledge of the costs of taking decisions, and rewards or losses depending on the weather that actually occurs. In the final analysis, the value of a probabilistic forecast comes down literally to the value that such a sophisticated user can extract by making decisions based on the forecast rather than some benchmark assumptions.
In this section typical performance measures for the most common types of forecast will be discussed.

4.3.1 Deterministic Forecasts of Values of Continuous Weather Variables

The most common forecasts are of actual values of weather elements, as real numbers (as distinct from probabilistic forecasts of numbers). Examples of such weather elements are:
• Temperature
• Wind speed
• Wind-chill
• Humidity
• Precipitation amount.

The following simple example of a set of twenty maximum temperature forecasts will be used in this section to illustrate the scores. Both the forecasts and the observations have been rounded to the nearest whole degree Celsius, since this is how the public usually see or hear them. In real life, twenty forecasts would be far too small a sample to draw any conclusions from. This example is purely intended to explain the various scores and how they can be interpreted. The table includes other columns of information, which will be explained later.

MAX TEMP (°C)

Forecast (F)  Observed (O)   F-O   ABS(F-O)  (F-O)^2   Within ±2°C
17            17              0     0          0         1
24            20              4     4         16         0
28            29             -1     1          1         1
22            25             -3     3          9         0
14            16             -2     2          4         1
16            17             -1     1          1         1
17            17              0     0          0         1
16            16              0     0          0         1
15            14              1     1          1         1
19            18              1     1          1         1
22            19              3     3          9         0
21            17              4     4         16         0
16            18             -2     2          4         1
20            18              2     2          4         1
27            31             -4     4         16         0
21            20              1     1          1         1
15            14              1     1          1         1
22            28             -6     6         36         0
20            23             -3     3          9         0
15            18             -3     3          9         0
Average:      19.4  19.8     -0.4   2.1        6.9       60%
                            (Bias) (MAE)      (MSE)     (% correct)
Reliability

Suppose there are N forecasts f_i and corresponding observations o_i, for i = 1...N. A gross measure of reliability is the mean bias. It is simply the average of the forecast values minus the average of the observed values, or

    \text{bias} = \frac{1}{N} \sum_{i=1}^{N} (f_i - o_i)

For our simple example, N is 20, the average forecast maximum is 19.4°C and the average actual maximum is 19.8°C, so there is a slight bias of -0.4°C – on average the forecast maxima were 0.4°C colder than the actual maxima.

Other more complicated reliability measures can be computed. For example, the bias could be considered separately for forecasts of colder than 20°C, compared to forecasts of 20°C or more, to see whether the bias depends on the forecast. It might be that forecasters tend to underdo the maximum temperatures more when they expect it to be colder. Before carrying out calculations of more detailed bias information such as this, it is important to think about what reason there might be for variations.

Another way of looking for bias is simply to plot the forecast versus observed values. This is easily done these days using standard spreadsheet software. The following graph shows the forecast versus observed maximums, together with the line representing a "perfect forecast". While this is far too small a sample to draw any definitive conclusions from, there is a hint here that both the coldest forecasts and the warmest forecasts tend to be too cold.

[Figure: Scatter plot of Forecast Max (horizontal axis, 10-35°C) against Observed Max (vertical axis, 10-35°C) for the twenty example forecasts, with the diagonal "perfect forecast" line.]
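Where spreadsheet software is not convenient, the same plot can be produced with a few lines of a scripting language. The sketch below is illustrative only and assumes the matplotlib plotting library is available; it draws the forecast-versus-observed scatter and the perfect-forecast line for the twenty example pairs from the table.

    import matplotlib.pyplot as plt

    # The twenty (forecast, observed) maximum temperature pairs from the table.
    forecasts = [17, 24, 28, 22, 14, 16, 17, 16, 15, 19,
                 22, 21, 16, 20, 27, 21, 15, 22, 20, 15]
    observed = [17, 20, 29, 25, 16, 17, 17, 16, 14, 18,
                19, 17, 18, 18, 31, 20, 14, 28, 23, 18]

    plt.scatter(forecasts, observed)
    plt.plot([10, 35], [10, 35])  # the "perfect forecast" diagonal
    plt.xlabel("Forecast Max (°C)")
    plt.ylabel("Observed Max (°C)")
    plt.title("Forecast versus observed maximum temperature")
    plt.show()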
maximum is 19.4°C and the average actual maximum is by large errors, and has the nice statistical property of
10 Chapter 4 — Verification

being a “proper” score – forecasters will do best if they For example, if MAEf is the Mean Absolute Error of the
always forecast the average of what they truly believe the forecast, and MAEb is the Mean Absolute Error of the bench-
maximum temperature is likely to be. It is also the quantity mark, then one skill measure is
that is minimised with classical linear regression equations MAEb − MAE f MAE f
that try and relate some predictor variables to the variable = 1−
being predicted (the predictand). MAEb MAEb
However, the MSE has unfriendly units of °C squared. So, which will be zero when the forecast has the same accuracy
instead, what is usually used is its square root …. as the benchmark, and 1 when the forecast is perfect. This is
typical for a skill measure. Note, however, that since forecasts
The Root-Mean-Square Error or RMSE is are (almost) never perfect, the practical upper limit of a skill
measure may be much smaller than 1.
N
∑ ( fi − oi ) For this particular example, the skill measure based on
2
RMSE = MSE = 1
N MAE is:
i =1 MAE f 2.1
1− = 1− = 0.45
This has units of °C, and for the example the RMSE is 2.6°C. MAEb 3.9
Another measure that is commonly used for weather
elements such as temperature, is the “percent correct”of fore- If MAEf is the Mean Squared Error of the forecast, and
casts that are within some allowable range, e.g., within ±2°C MAEb is the Mean Squared Error of the benchmark, another
or ±3°C. This is shown in the above table by putting a 1 when skill measure is effectively the reduction of variance, or
the forecast was within ±2°C of the observed maximum, and MSE f
0 otherwise, then averaging the values. The result for this 1−
example is that 60% of the forecasts are within ±2°C. MSEb
It is obviously crucial for this measure to know what the For the example of 20 maximum temperature forecasts
public or specialised user considers to be a “correct”forecast. this is:
6.9
But this measure of accuracy is a very simple and useful one 1− = 0.70
22.9
to explain to the public once this has been decided.
If the accuracy measure being used is the percent correct
(of forecasts that are within an acceptable range of the obser-
vations), then another skill measure is:
Skill
PC f − PCb
Skill is measured against some benchmark forecast – typically 100% − PCb
climatology, persistence, or perhaps a numerical guidance
forecast. And for the example this is
Continuing with the same example, suppose that the
benchmark forecast is taken to be the climatological maxi- 0.60 − 0.35
= 0.38
mum temperature for this period of 20°C. Then the 1 − 0.35
corresponding table for this benchmark forecast is: where the value of 0.38 means that the percent correct for the
actual forecasts has gone 0.38 of the distance between the
MAX TEMP (°C) benchmark value of 35% and a perfect score of 100%.
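All of the above measures are simple enough to reproduce in a spreadsheet or a few lines of code. The following Python sketch recomputes the scores for the twenty-forecast example and the 20°C climatological benchmark; the variable names are illustrative only, and small differences from the values quoted above reflect the rounding used in the text.

    import math

    # Forecast and observed maximum temperatures (°C) from the example table
    f = [17, 24, 28, 22, 14, 16, 17, 16, 15, 19, 22, 21, 16, 20, 27, 21, 15, 22, 20, 15]
    o = [17, 20, 29, 25, 16, 17, 17, 16, 14, 18, 19, 17, 18, 18, 31, 20, 14, 28, 23, 18]
    n = len(f)

    bias = sum(fi - oi for fi, oi in zip(f, o)) / n            # -0.4
    mae = sum(abs(fi - oi) for fi, oi in zip(f, o)) / n        #  2.1
    mse = sum((fi - oi) ** 2 for fi, oi in zip(f, o)) / n      #  6.9
    rmse = math.sqrt(mse)                                      #  2.6
    pc = sum(abs(fi - oi) <= 2 for fi, oi in zip(f, o)) / n    #  0.60

    # Benchmark: always forecast the climatological maximum of 20°C
    b = [20] * n
    mae_b = sum(abs(bi - oi) for bi, oi in zip(b, o)) / n      #  3.85 (3.9 rounded)
    mse_b = sum((bi - oi) ** 2 for bi, oi in zip(b, o)) / n    # 22.85 (22.9 rounded)
    pc_b = sum(abs(bi - oi) <= 2 for bi, oi in zip(b, o)) / n  #  0.35

    skill_mae = 1 - mae / mae_b           # about 0.45
    skill_mse = 1 - mse / mse_b           # about 0.70
    skill_pc = (pc - pc_b) / (1 - pc_b)   # about 0.38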
4.3.2 Deterministic Forecast for Two Categories

Typical two category forecasts are:
• Yes or No for occurrence of precipitation
• Yes or No for occurrence of severe weather
• Rain versus snow.

As can be seen, such a forecast can usually be expressed as yes or no for an event. These are sometimes called forecasts of a dichotomous variable. The combination of forecasts and observations for a set of forecasts being verified can be put into a contingency table such as:

                      Observed
                      Yes    No
    Forecast   Yes     A      B
               No      C      D

To illustrate the use of this, suppose there has been a set of forecasts of whether or not there will be measurable precipitation "today". These could be spot forecasts that there would be greater than 0.1 mm rain between 6 am and 6 pm during the daytime, together with observations from that spot on whether or not precipitation was measured.

The following table shows the results for this example, for a month's worth of data (31 days). Again, there are not many numbers here, but the purpose is to show the use of various scores. The numbers come from an example, which is shown in Appendix 1, together with all the reliability, accuracy and skill measures, which will now be described, and a few more.

                      Observed
                      Yes    No
    Forecast   Yes    19      4
               No      2      6

Reliability

The simplest bias measure is the ratio of the number of times the event was forecast over the number of times it was observed:

\[ \text{Bias} = \frac{A + B}{A + C} \]

For the example this is

\[ \text{Bias} = \frac{19 + 4}{19 + 2} = \frac{23}{21} = 1.10 \]

so, for this particular case, precipitation is forecast 10% more often than it occurs. This may not necessarily be a major problem, particularly in forecasts of rare and severe events. Because the benefits of taking precautions against such events can be much higher than the cost of protecting against them, over-forecasting may in fact be a good thing. But for typical, ordinary events, it would be better if the Bias was around one.

Accuracy

The simplest accuracy measure is the percent correct for all the forecasts:

\[ \text{PC} = \frac{A + D}{A + B + C + D} \]

For the example this is:

\[ \text{PC} = \frac{19 + 6}{19 + 4 + 2 + 6} = \frac{25}{31} = 81\% \]

This particular measure may have quite high values for events that are either very rare or very common, so it needs to be interpreted with some care. However, for events such as this example, where precipitation occurred on 21 out of 31 days, it is quite a good measure.

If the event is a significant or a rare one, there may not actually be any count of the times when the event was neither forecast nor occurred. This could be the case, for example, with warnings of heavy rainfall. The numerous times when a warning was not issued, and when heavy rain didn't occur, may not actually be counted.

In this case, it is common to use three measures of accuracy – POD, FAR and CSI.

The Probability of Detection (POD) is the proportion of times the event occurred that it was correctly forecast:

\[ \text{POD} = \frac{A}{A + C} \]

For the example of rainfall forecasts this is:

\[ \text{POD} = \frac{19}{19 + 2} = 0.90 \]

The False Alarm Ratio (FAR) is the proportion of forecasts of the event that turned out to be false alarms:

\[ \text{FAR} = \frac{B}{A + B} \]

The FAR for the example is:

\[ \text{FAR} = \frac{4}{19 + 4} = 0.17 \]

The Critical Success Index (CSI) is the ratio of the correct "yes" forecasts of the event to the sum of the correct forecasts, the false alarms, and the misses:

\[ \text{CSI} = \frac{A}{A + B + C} \]

The CSI for the example is:

\[ \text{CSI} = \frac{19}{19 + 4 + 2} = 0.76 \]

Skill

It is possible to produce skill scores using the above measures of accuracy applied to both the forecasts as issued, and to some benchmark forecast. For example, if there were some numerical guidance forecasts for which the Critical Success Index was CSIb, and the CSI for the issued forecasts was CSIf, then a possible skill score to use is:

\[ \frac{\text{CSI}_f - \text{CSI}_b}{1 - \text{CSI}_b} \]

However, for this two-category case, one simple benchmark forecast is to use the sample frequency of events for the sample of forecasts being evaluated. The sample frequency of "yes" events for the example is:

\[ \frac{A + C}{A + B + C + D} = \frac{19 + 2}{19 + 4 + 2 + 6} = \frac{21}{31} = 0.68 \]

If there was no relationship at all between a "yes" forecast and whether the event occurred (this is surely a benchmark forecast with no skill) then one would expect for each "yes" forecast that 68% of the time rain would happen, and 32% of the time it would not. The same would apply for the "no" forecasts.

Thus, for this benchmark forecast, by pure chance one would expect the value of A in the contingency table to be the number of "yes" forecasts multiplied by the frequency of "yes" events:

\[ \text{CHA} = (A + B) \times \frac{A + C}{A + B + C + D} = (19 + 4) \times \frac{19 + 2}{19 + 4 + 2 + 6} = 23 \times \frac{21}{31} = 15.6 \]

A common skill measure – the Heidke Skill Score – can then be computed as:

\[ \text{Heidke Skill Score} = \frac{A - \text{CHA}}{A + B - \text{CHA}} = \frac{19 - 15.6}{19 + 4 - 15.6} = \frac{3.4}{7.4} = 0.46 \]

The Equitable Threat Score is a correction of the CSI to take into account CHA, and is defined as:

\[ \text{Equitable Threat Score} = \frac{A - \text{CHA}}{A + B + C - \text{CHA}} = \frac{19 - 15.6}{19 + 4 + 2 - 15.6} = \frac{3.4}{9.4} = 0.36 \]

Finally, the often-used Hanssen and Kuipers (1965) score can be given as:

\[ \text{HKS} = \frac{A}{A + C} + \frac{D}{B + D} - 1 = \frac{19}{19 + 2} + \frac{6}{4 + 6} - 1 = 0.90 + 0.60 - 1 = 0.50 \]

This skill score also does not make explicit use of a benchmark forecast. However, a naïve forecast of always forecasting "yes", or always forecasting "no", will give a score of zero. Similarly, a naïve forecast with a random choice each time between yes and no will also have an expected score of zero. Positive values of the HKS therefore represent skill over these naïve forecasts, with a score of 1 for perfect forecasting.
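Since every score in this section is simple arithmetic on the four counts of the contingency table, they can all be computed together. A minimal Python sketch using the counts from the rainfall example (variable names are illustrative):

    # Counts from the 2x2 contingency table: A = hits, B = false alarms,
    # C = misses, D = correct rejections
    A, B, C, D = 19, 4, 2, 6
    n = A + B + C + D

    bias = (A + B) / (A + C)               # 1.10
    pc = (A + D) / n                       # 0.81
    pod = A / (A + C)                      # 0.90
    far = B / (A + B)                      # 0.17
    csi = A / (A + B + C)                  # 0.76

    # Expected hits by pure chance for the no-skill benchmark
    cha = (A + B) * (A + C) / n            # 15.6

    heidke = (A - cha) / (A + B - cha)     # 0.46 (as defined in the text above)
    ets = (A - cha) / (A + B + C - cha)    # 0.36
    hks = A / (A + C) + D / (B + D) - 1    # 0.50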

4.3.3 Probabilistic Forecast for Two Categories

A probabilistic forecast for two categories can be treated as the probability that the first of them will occur, since the probability of the second category is one minus the probability of the first.

Suppose there are N probability forecasts pi and corresponding observations oi for i = 1...N. Each forecast pi will be in the range from 0 to 1, expressing the probability of a "yes". Each observation will be 0 if that event (the first category) did not occur, and 1 if the event did occur.

Data from the following table will be used as an example – it has just twenty probability forecasts, which is not enough to draw any conclusions, but it can be used to illustrate the various scores.

PROB FORECASTS

    Prob (p)   Obs (o)   (p-o)^2
      0.25        0        0.06
      0.95        1        0.00
      1.00        1        0.00
      0.85        1        0.02
      0.05        0        0.00
      0.15        0        0.02
      0.25        0        0.06
      0.15        0        0.02
      0.10        0        0.01
      0.50        0        0.25
      0.85        1        0.02
      0.75        0        0.56
      0.15        0        0.02
      0.65        0        0.42
      1.00        1        0.00
      0.75        1        0.06
      0.10        0        0.01
      0.85        1        0.02
      0.65        1        0.12
      0.10        0        0.01
    Average:  0.51      0.40     0.09 (Brier Score)

Reliability

A simple measure of reliability is the overall bias – the average of the forecast probabilities, divided by the frequency of occurrence:

\[ \text{Bias} = \frac{\frac{1}{N}\sum_{i=1}^{N} p_i}{\frac{1}{N}\sum_{i=1}^{N} o_i} \]

For the example, the average forecast is 0.51 and the average observation is 0.40, so there is a Bias of 1.28 – over-forecasting of the probabilities.

Other reliability measures can be generated by dividing the forecast probabilities up into various ranges and seeing for each range what the actual frequency of occurrence was. For example, reliability diagrams can be produced showing this information (see, for example, Wilks, 1995).

Accuracy

The most common accuracy measure for these kinds of forecasts is the Brier Score (Brier, 1950), which is just the Mean Squared Error (MSE) for these particular forecasts and observations:

\[ \text{Brier Score} = \frac{1}{N}\sum_{i=1}^{N}(p_i - o_i)^2 \]

For this case, the Brier Score is 0.09.

Skill

If BSf is the Brier Score for the forecast, and BSb is the Brier Score for the benchmark forecast (in this case, climatology), then the Brier Skill Score can be expressed as:

\[ \text{Brier Skill Score} = \frac{\text{BS}_b - \text{BS}_f}{\text{BS}_b} = 1 - \frac{\text{BS}_f}{\text{BS}_b} \]

Hence, this is like a reduction in variance (RV). It is in the form of a percentage improvement over the climatological benchmark, with a skill score of 1.0 for perfect forecasting.

In this case, BSf is 0.09 and BSb for a climatological probability of 0.40 is 0.24, so the Brier Skill Score is 0.63.
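As a sketch, the Brier Score and Brier Skill Score for the twenty probability forecasts can be computed as follows in Python; the exact values differ slightly from the rounded ones quoted above.

    # Probability forecasts and 0/1 observations from the example table
    p = [0.25, 0.95, 1.00, 0.85, 0.05, 0.15, 0.25, 0.15, 0.10, 0.50,
         0.85, 0.75, 0.15, 0.65, 1.00, 0.75, 0.10, 0.85, 0.65, 0.10]
    o = [0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0]
    n = len(p)

    # The 1/N factors cancel, so the bias is the ratio of the sums
    bias = sum(p) / sum(o)                                 # about 1.26 (1.28 with the rounded averages)
    bs = sum((pi - oi) ** 2 for pi, oi in zip(p, o)) / n   # about 0.086 (0.09 rounded)

    # Benchmark: always forecast the climatological probability of 0.40
    bs_b = sum((0.40 - oi) ** 2 for oi in o) / n           # 0.24
    bss = 1 - bs / bs_b                                    # about 0.64 (0.63 with the rounded inputs)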
4.3.4 Deterministic Forecast for Multiple Categories

There are two different kinds of forecasts for multiple categories. One is where they are not ranked – there is no particular order to the categories. An example of this is where there may be a number of categories of precipitation type – for example, rain, snow, mixed precipitation, freezing rain. More commonly, the categories are ranked, and do have some kind of order. Examples include wind speeds in terms of Beaufort force rather than values, visibility categories, and precipitation in categories of increasing amounts.

To illustrate how this might work, suppose forecasts of rain are being made for a tropical location, where typically the weather might be in three categories – "dry", "showers", or "wet" (widespread showers or rain) – for a 12 hour period from 6 am to 6 pm. An observation of "dry" might correspond to no rain observed at the station; of "showers" if no rain was recorded at the station, but rain was reported in the area or thunder was heard; and of "wet" if rain was recorded at the station.

In the case of two categories (see Section 4.3.2) all the information about the verifications of a set of forecasts was obtained using a 2 by 2 contingency table. For m multiple categories, an m by m contingency table can also be used. (In the example of three categories of dry, showers and wet, m would be 3.)

Such a table will be used for the remainder of this section. The elements of the contingency table will be taken as nij, which is the number of times that the observed category was i and the forecast category was j, where i and j are both in the range 1 to m.

The notation n*j will be used for the total number of times that category j was forecast, no matter what was observed:

\[ n_{*j} = \sum_{i=1}^{m} n_{ij} \]

and ni* for the total number of times that category i was observed, no matter what category was forecast:

\[ n_{i*} = \sum_{j=1}^{m} n_{ij} \]

Similarly, the total number N of forecasts being verified can also be given by n** where:

\[ n_{**} = \sum_{i=1}^{m} \sum_{j=1}^{m} n_{ij} \]

By way of example, for the three-category example of dry, showers and wet:

                                     Observed
                          Dry (1)  Showers (2)  Wet (3)   Sum of Forecasts
    Forecast Dry (1)        n11        n21        n31           n*1
    Forecast Showers (2)    n12        n22        n32           n*2
    Forecast Wet (3)        n13        n23        n33           n*3
    Sum of Observations     n1*        n2*        n3*           n**

An example of some numbers in this 3 by 3 contingency table, which will be used for the scores, is:

                                     Observed
                          Dry (1)  Showers (2)  Wet (3)   Sum of Forecasts
    Forecast Dry (1)         63         13          8            84
    Forecast Showers (2)     15         45         30            90
    Forecast Wet (3)          7         22         38            67
    Sum of Observations      85         80         76           241

Reliability

It is hard to have one overall number expressing reliability for the multiple category case. Instead, it is better to compare the number of times that each category was forecast with the number of times that it occurred. The bias for forecast category j is then n*j / nj*.

In the three-category example, the bias for category 1 ("dry") is very close to one at 84/85. The "showers" category is slightly over-forecast, with a bias of 90/80, or 1.13. On the other hand, the "wet" category is slightly under-forecast, with a bias of 67/76, or 0.88.

Accuracy

The most commonly used accuracy measure for multiple categories is just the proportion correct – the sum of the diagonal elements of the contingency table divided by the total number of forecasts. This is usually expressed as a percentage:

\[ \frac{\sum_{i=1}^{m} n_{ii}}{N} \]

Note that this accuracy score is equivalent to giving a mark of 1 for each of the exactly correct forecasts, zero for the ones where the correct category was not forecast, and then taking the overall accuracy score to be the average mark. For the example, the sum of the diagonal elements is 146, and the total is 241, so the percent correct is 146/241 or 61%.

Other accuracy scores make use of the assumption that some credit should be given for a "near-miss" by one category, though the mark for being out by more than one category might be zero. Gordon (1982) developed a general methodology for these kinds of scores for accuracy and skill.

Skill

The simplest skill measures will involve a comparison between the accuracy of the actual forecasts and of some benchmark. Typical benchmark forecasts would be always to forecast the climatologically most likely category, or to randomly forecast a category based on the climatological frequency of the categories. Again, the climatology may be based on the sample itself. If PCf is the percent correct for the forecasts, and PCb the percent correct for the benchmark, then the skill is just:

\[ \frac{\text{PC}_f - \text{PC}_b}{1 - \text{PC}_b} \]

For the example, suppose the benchmark forecast is to always forecast "showers", since this is the most common observed category. The result would be that the forecast was correct 80 times (the number of times the "showers" category was observed) and the percent correct for the benchmark is 80/241 or 33%. For this case, the skill would then be:

\[ \frac{0.61 - 0.33}{1 - 0.33} = 0.42 \]

The skill scores proposed by Gordon (1982) provide a more direct and theoretically satisfying means of assessing skill, including confidence intervals on the score, though they may be less readily explained to the user community.
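The per-category bias, proportion correct and skill for the three-category example can be computed directly from the contingency table. A Python sketch (the nested list holds the table with one row per forecast category; names are illustrative):

    # 3x3 contingency table from the example; rows are the forecast category,
    # columns the observed category (0 = dry, 1 = showers, 2 = wet)
    table = [[63, 13, 8],    # forecast "dry"
             [15, 45, 30],   # forecast "showers"
             [7, 22, 38]]    # forecast "wet"
    m = len(table)
    n = sum(sum(row) for row in table)                              # 241

    n_fcst = [sum(row) for row in table]                            # [84, 90, 67]
    n_obs = [sum(table[j][i] for j in range(m)) for i in range(m)]  # [85, 80, 76]

    bias = [n_fcst[j] / n_obs[j] for j in range(m)]   # [0.99, 1.13, 0.88]
    pc = sum(table[k][k] for k in range(m)) / n       # 146/241 = 0.61

    # Benchmark: always forecast the most commonly observed category ("showers")
    pc_b = max(n_obs) / n                             # 80/241 = 0.33
    skill = (pc - pc_b) / (1 - pc_b)                  # about 0.41 (0.42 with rounded inputs)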
4.3.5 Probabilistic Forecast for Multiple Categories

For completeness, a description will now be given of probabilistic forecasts for more than two categories. However, the details and technicalities involved are beyond the primary purpose of this Technical Document, so the reader should refer to Stanski and Burrows (1989) for more details on these kinds of scores.

Suppose there are N forecasts, each of which has probabilities for m categories, pij for i = 1...N and j = 1...m. The corresponding observations will be called oij, although in each case this will take on a value of 1 for the observed category and 0 for the other categories.

Reliability

As in the case of deterministic forecasts, reliability needs to be measured for each of the forecast categories, and can be done so using a bias for each category – the average of the forecast probabilities for that category, divided by the frequency of occurrence:

\[ \text{Bias}_j = \frac{\frac{1}{N}\sum_{i=1}^{N} p_{ij}}{\frac{1}{N}\sum_{i=1}^{N} o_{ij}} \]

More complex information on reliability can be assessed using reliability diagrams for each of the forecast categories, or by looking at the information in terms of observed categories.

Accuracy

Usually the categories are ranked, and the most common accuracy measure is the Ranked Probability Score (RPS) originally devised by Epstein (1969). Using the above notation, the RPS for the individual forecast i is:

\[ 1 - \frac{1}{m-1}\sum_{j=1}^{m}\left(\sum_{k=1}^{j} p_{ik} - \sum_{k=1}^{j} o_{ik}\right)^2 \]

This has a range of 0 (bad) to 1 (a perfect forecast).

Skill

A skill score against a benchmark can be computed in the usual way, by comparing the Ranked Probability Score RPSf for the forecast with RPSb for the benchmark:

\[ \frac{\text{RPS}_f - \text{RPS}_b}{1 - \text{RPS}_b} \]
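The RPS formula above is straightforward to compute for a single forecast. A Python sketch; the three-category probabilities in the usage line are purely hypothetical, not taken from the text.

    def rps(p, o):
        """Ranked Probability Score for one forecast over m ranked categories.
        p: forecast probabilities (summing to 1); o: 1 for the observed
        category, 0 elsewhere. Returns 1 for a perfect forecast, 0 for a bad one."""
        m = len(p)
        cum_p = cum_o = 0.0
        total = 0.0
        for j in range(m):
            cum_p += p[j]   # cumulative forecast probability up to category j
            cum_o += o[j]   # cumulative observation up to category j
            total += (cum_p - cum_o) ** 2
        return 1.0 - total / (m - 1)

    # Hypothetical dry/showers/wet forecast, with "showers" observed
    print(rps([0.2, 0.5, 0.3], [0, 1, 0]))   # 0.935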
4.3.6 Forecasts of Timing of Events

Discussion so far has concentrated on weather variables and categories. However, there is also increasing interest in the timing of events, rather than just whether or not they will occur.

It can be useful to collect and assess statistics on the forecast and observed time to:
• Start of precipitation
• End of precipitation
• Time of change of precipitation type (e.g., rain to snow, or snow to rain)
• Start of a severe event
• End of a severe event.

This verification information can be treated using the assessment measures for continuous weather variables (see Section 4.3.1). For example, if precipitation is forecast to start at 1500 and actually starts at 1100 then this can be treated as an error in the forecast of +4 hours, or 4 hours late. The information can also be categorised – for example, by turning the timing error into categories of:
• Too early by 6 hours or more
• Too early by 2 to 6 hours
• About right (within 2 hours)
• Too late by 2 to 6 hours
• Too late by more than 6 hours.

Note that in order to accumulate statistics on timing errors, it is a given that the event was actually forecast and did actually occur. Use of categories may enable two more categories to be analysed – "forecast but not observed", and "observed but not forecast".

Another, different timing statistic is the lead-time for the occurrence of severe weather events. This is probably best just summarised, with statistics such as the average lead-time and distribution of times produced. Reliability would come separately from timing error calculations for the start of the event.

For example, the lead time for the start of gale force winds could be analysed. Skill could be assessed by comparing the lead-time for warnings as issued by the forecast office, with the lead-time based on an NWP model forecast.
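A sketch of how timing errors might be binned into the categories listed above. The placement of errors of exactly 2 or 6 hours on one side of a boundary is a choice the text does not specify, so the boundaries below are an assumption.

    def timing_category(error_hours):
        """Categorise a timing error (forecast time minus observed time, in hours);
        positive errors mean the forecast event happened earlier than forecast,
        i.e. the forecast was late."""
        if error_hours <= -6:
            return "too early by 6 hours or more"
        elif error_hours < -2:
            return "too early by 2 to 6 hours"
        elif error_hours <= 2:
            return "about right (within 2 hours)"
        elif error_hours <= 6:
            return "too late by 2 to 6 hours"
        else:
            return "too late by more than 6 hours"

    # Precipitation forecast to start at 1500 but observed at 1100: 4 hours late
    print(timing_category(15 - 11))   # "too late by 2 to 6 hours"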
4.3.7 Forecasts of the Location of Events

The discussion until now in this Chapter has assumed that the comparison is between forecast and observed weather variables for a place, or at most for some small region.

However, there are some types of forecast which explicitly predict the areal coverage and extent of an event. One example would be a warning of severe weather where the coverage is drawn on a map, or stated to occur over a number of counties. This forecast could be verified by dealing with each small region individually. It can also be verified as a whole, and the usual statistics produced.

The following diagram illustrates a typical situation where a forecast area of severe weather overlaps, but does not exactly match, the area where severe weather was observed:

[Diagram: two overlapping areas, labelled "Forecast" and "Observed". The overlap is marked "Hit", the forecast-only area "False Alarm", and the observed-only area "Miss".]

Reliability

The reliability for such a forecast can be assessed by comparing the average areal coverage of the forecast with the average areal coverage of the observed event.

Accuracy

The accuracy can be assessed by computing the Threat Score for each forecast or event, and then averaging for the verification period. The Threat Score is analogous to the Critical Success Index (see 4.3.2) for a series of two-category forecasts. In the diagram above, the area of overlap where the event was both observed and forecast can be considered to be a "hit". The area where the event was forecast but not observed can be considered to be a "false alarm". The area where the event was observed but not forecast can be considered to be a "miss".

\[ \text{TS} = \frac{\text{Area(Hit)}}{\text{Area(Hit)} + \text{Area(False Alarm)} + \text{Area(Miss)}} \]

Skill

Skill can also be assessed analogously with the CSI, including use of the Equitable Threat Score (see 4.3.2). This needs some definition of what the "hit" area would be by pure chance. For example, if a country was divided up into 30 regions, and a particular severe weather event affected 10 of them, and a warning was for 15 regions to be affected by the event, then through pure chance one would have expected hits in 15 × (10/30) regions, or 5 regions of the country.
observed: 15*(10/30) regions, or 5 regions of the country.
Chapter 5

USER-BASED ASSESSMENT

5.1 INTRODUCTION

As stated in the introductory chapter to this Technical Document, it is important to carry out ongoing performance assessment of public weather services to ensure that they are efficiently and effectively meeting the public's needs and contribute to longer term societal objectives. Managers need relevant information to appropriately lead and manage information, products, services and policy development. Most NMSs are now routinely required to report annually to central agencies, and to meet these requirements they have to systematically collect and analyse performance information. This activity needs to be undertaken in the context of established measurement strategies and defined performance targets.

More recently some NMSs have developed "Service Charters" which detail their pledge of performance to their user communities – specifically, their country's citizens. These service charters provide a brief overview of the services provided, a commitment of performance against specific targets (both purely verification oriented and user-based), and a commitment to consult and identify a means by which the citizens may register their concerns. These service charters can be perceived as being the NMS's contract with the citizen. As such they have become an important component of the performance measurement strategy of the Services that have adopted them. They represent a public commitment to measure performance and to report on it according to publicised commitments and targets.

To facilitate management's approach to adopting a results-based, integrated strategy for performance measurement, review and reporting, it is useful to develop a performance framework and a system that is comprehensive and timely, and that balances expectations with the NMS's capacities. The framework should also reflect the ability to manipulate, and the dependent relationship between, the various dimensions of the NMS's capacities and the resultant consequences of doing so. It can serve as an evolving descriptive management tool to meet the needs of the management team. A performance measurement framework attempts to define the linkages between decisions on resource utilisation and results.

The basic logic model is illustrated in Figure 1. The NMS achieves its objectives through managing its programmes and determining its priorities. The NMS does this by manipulating the mix in its capacities according to some optimal balance. The activities and outputs reach a target client group (community of interest) either directly or with the aid of co-delivery partners and stakeholders as determined by the service delivery capacity mix chosen. As a result of the activities and outputs, the community of interest group exhibits a behavioural response of some sort, and immediate impacts can occur. Over the longer term, a modified behavioural pattern can emerge which can lead to more extensive and consequential impacts that, if the programme is performing well, may be causally linked to the NMS's long term objectives. In theory, indicators are developed to measure performance in each of these areas and sources of information are identified.

[Figure 1 – the basic performance measurement logic model]

The generic ultimate desired result of an NMS's activities can be described as that of reduced impact of weather and related hazards on health, safety, the economy and the environment. An NMS can only have an indirect influence over such an ultimate outcome of the delivery of its services. A more direct influence is in the area of decision-making and behavioural changes, say, in the form of avoidance of the risks involved or adaptation to them, which come from increased awareness. The questions here are externally focused and deal with client satisfaction, and achievement of intermediate results such as building awareness, improving capacity, and influencing behaviour and actions. An NMS has more direct control over how it manages its human resources, scientific activities, service delivery activities, and its government policy and financial management strategies. The associated questions here focus on internal issues such as how the organization manages these dimensions, with what emphases, trade-offs, etc. Performance needs to be measured in each of these areas with due consideration to their interdependence.

The acceptance of the NMS's products by the public and other users depends on a number of factors. Scientific accuracy is just one of those factors. User-based assessment is about measuring perceptions on a matrix of dimensions important to specific user communities and amongst a diversity of user communities. These perceptions include those about requirements, accessibility, availability, accuracy, timeliness, utility, comprehension, language, sufficiency, and packaging. The user communities range from the individual citizen using the products to make personal decisions, to the media organizations essential for the communication of the product, to government agencies funding the production and delivery of those products. The health of the NMS depends on the perceptions from the full spectrum of these users. This chapter focuses on the characteristics of user-based assessment and the methodologies employed. The objective is to ultimately measure performance from the user perspective. This can be done by achieving an understanding of the "logic" underlying the approach to achieving specific results, identifying a limited set of indicators that responds to performance questions for each result, and implementing a data collection plan. Starting with key results (ultimate outcomes) the NMS needs to get consensus and clarity on user strategy (key activities/outputs), define the target groups and the desired influences/changes (intermediate outcomes sought) and focus on the gaps in logic. Such a broad framework can be further broken down through more detailed articulation of specific outcomes and of the nature of influence.

The biggest challenge of user-based assessment is to translate vaguely formulated concerns around ultimate and intermediate outcomes into a well-conceptualised and methodologically sound study. There is a requirement to specify what information is needed, from whom or where the information should be obtained, and how the information will be used. Decisions must be made on how to obtain data from the three possible sources:
1) Documents, records or other existing information
2) Observation, e.g., of actual behaviour of people
3) Questioning.

5.1.1 Characteristics

User-based assessments are focused around the ability to obtain information on specific characteristics of interest through a variety of direct methods such as surveys, focus groups, public opinion monitoring, feedback and response mechanisms, consultations such as users' meetings and workshops, and the collection of anecdotal information. On their own, each of these methods may produce information which is subjective and of questionable reliability. However, taken as a whole, a consistent picture often emerges which is credible. They are the only effective means by which information can be gathered on needs, expectations, satisfaction, etc. More recently, they have also been demonstrated as effective means for getting at the economic value of information such as weather information and forecasts.

5.1.1.1 Subjective

Any perception data is by its very nature subjective. Responding to a question involves four distinct processes:
(1) Respondents must first understand the question.
(2) They must then search their memories to retrieve the requested information.
(3) After retrieving the information, they must think about what the answer to the question might be and how much of that answer they are willing to reveal.
(4) Only then do they communicate an answer to the question.

Cognitive methods provide the means to examine respondents' thought processes as they answer questions. Cognitive testing methods include:
• Observation of respondents
• Think aloud interviews
• Focus groups
• Paraphrasing
• Confidence rating.

They are used to find out whether or not respondents understand what the questions mean. In this way, cognitive methods help assess the validity of questions, and identify potential sources of measurement error. Respondents often do not understand the words and concepts the same way as researchers. The researcher must relate to the respondents by using their language and ways of expressing concepts.

5.1.1.2 Perception as Reality

Gauging the perceptions of citizens, direct clients, stakeholders and government agencies is an important component of service evaluation, and all of these "user communities" must be included in the assessment. The goal of service evaluation is to identify users' needs and to measure the acceptance of the services provided from such dimensions as expectations, understanding, importance, satisfaction, utility, etc. Data on perceptions relative to such parameters is collected through a variety of means.

5.1.1.3 Dimensions: Requirements, Expectations, Understanding, Importance, Satisfaction, Utility, etc.

As previously stated, the perceptions assessed include those about requirements, accessibility, availability, accuracy, timeliness, utility, comprehension, language, sufficiency, and packaging, amongst others. Classically, in the design and development of products and services one starts with the assessment of user requirements. That is, what are the needs of the spectrum of end users (the public, stakeholder communities, funding agencies) from the spread of possible services that the NMS has the capacity to provide?

This effort benefits from gaining an understanding of user processes – that is, an understanding of how the information is used in the activity to which it is applied. Frequently, expectations do not line up with actual needs, in which case two alternative paths could be pursued. If the end-user cannot be convinced of the faulty expectations then the survival strategy may be to target on those expectations. In other words, try to provide the information they want, even if you know that it may not be the best information for their purposes. Fortunately, most often with the increasing sophistication of the end-user the result is a realignment of expectations with needs.

A complementary activity with pure user-based assessment is thus that of increasing awareness and user education. The theory is that this process, with iteration, yields improved knowledge of the spread of requirements (stated and implied) that then can be translated into the design of a set of meteorological products and services that cover the degree of requirements that is within the capacity of the NMS to provide. This results in the development of new products and services, and/or the adaptation or refinement of existing products and services, or even in dropping services that are no longer needed, to better match the evolving requirements.

5.1.1.4 Economic Value Assessment

Increasingly, NMSs are under some pressure to reduce costs of operation and to justify any major upgrades of their services and equipment based on a detailed benefit-cost analysis. NMSs are interested in demonstrating the economic and social benefits of the services they provide to the public, industries and organizations. As illustrated in the performance logic model (Figure 1), benefits to society as a whole are commonly perceived as an ultimate outcome of the provision of meteorological services.

For the purpose of this discussion public weather services are generally considered non-rival (if someone uses the service it doesn't stop others from using it) and non-exclusive non-market goods and services. While some services are rivalrous, such as limited capacity telephone-based services, these kinds of services are generally being de-emphasised or commercialised by NMSs.

A variety of research methods in applied economics (environmental, resource, production, information, risk and uncertainty, welfare, etc.) can be applied. One of the techniques being increasingly employed is that of contingent valuation, whereby respondents, through an iterative process, are asked to indicate their willingness to pay a suggested amount to have access to the services versus having that service withdrawn.

The valuation techniques can be broadly described as being either production based or demand based. The former involves the modelling of the production process, while in the latter case direct inferences are made as to the value of non-market services such as public weather services. Economic value assessments range from measuring the value of certain forecast elements to that of estimating the value attributable to the provision of the full set of national services. Benefit-cost ratios have been reported as being 2:1 to well over 10:1. Economic value assessments also can be used to determine the justifiability of making investments in research and development into improvements in forecast accuracy. Additionally, such assessments can be used to compare the effectiveness of various meteorological service delivery systems. With some measured success, some of these techniques have been used to impute the "social" or non-economic benefits derived from the use of public weather services. Further discussion on the methodologies for the undertaking of such assessments appears further on in this document.

5.2 GUIDING PRINCIPLES FOR METHODOLOGY

There is a need for user-based information for decision-making purposes by individuals, whether office managers or the most senior executives of the NMS. The information is used for day-to-day programme delivery management as well as for longer-term vision and strategic planning. While the information gathered may serve the objectives at a variety of levels within an organization, often the methodology chosen must be specific to the objectives at the organizational level.

5.2.1 Long and Shorter Term Strategic/Tactical Decision Context

The circumstances of planning have changed and the complexities of managing have increased in recent years. The NMS's organisational and decision-making structures have changed. The governmental and departmental planning systems have created new processes and products. The focus on value for money and making the NMS's funds go further has sharpened as budgets have significantly decreased with governmental budget reduction exercises. Performance management has taken on greater prominence with emphasis on frameworks, concrete measures, and continuous improvement. At the same time, in several domains, the programme has expanded from an initial narrow focus on weather, migrating through the larger domain of atmospheric change, to a broader focus on environmental prediction.

User-based assessment needs to be tied closely to performance management, planning and reporting requirements, and the links to both operations and long-term strategic results should be clear. A more proactive role can be played by:
• Obtaining direction from senior management on planned user-based assessments, to ensure that these assessments will be useful and that there are resources, and the management will, to take follow-up action once the findings and recommendations are presented
• Working with the organisational units within the NMS responsible for implementation of programme changes, to advise them of the findings and facilitate follow-up action
• Tracking follow-up actions and reporting back to senior management. Senior management support in terms of commitment and resources to implement change is a key success factor.

Follow-up is essential – if not done, user-based assessment research will have little value.

The kinds of decisions that benefit from the user-based assessment process range from those pertaining to the initiation, continuance or modification of major programmes to specific product lines or programme elements and delivery mechanisms. Within this spectrum is included a range of decision activities as diverse as those regarding investments in research and development, technology for automation, human resource training, and public education or awareness campaigns. Ultimately, within the resource context of the NMS, policies on detailed levels of service can be established.

5.2.2 Multi-year User-based Assessment Strategy

A plan must follow a development process which accommodates the funding and reporting context that the NMSs find themselves in, and have the following characteristics:
• A limited, manageable number of priorities that reflect the needs of the programme
• A schedule of user-based assessments that supports these priorities while being flexible enough to meet needs arising from unpredictable or opportunistic circumstances
• An approach to communicating findings that promotes sharing information and the development or improvement of products and services.

In developing the schedule of user-based assessment, the areas of research are selected on the basis of programme need, risk management, and commitments in business plans, management frameworks, and performance frameworks. In this multi-year strategy for user-based assessment it is important to cover both product lines and delivery mechanisms, and to use consistent questions over the years for proper trend-line analysis. Performance measurement, after all, is about the change over time as opposed to the measurement of the state of affairs at a given point in time.

5.2.3 Need to Know Why it Should be Done

The first task in planning a user-based assessment is to specify the objectives as thoroughly as possible. The key to this exercise is to come up with clearly defined concepts and terms. Once the basic objectives have been broken down and defined, the researcher can then proceed to develop operational definitions which indicate who or what is to be observed and what is to be measured. Once operational definitions are developed, the researcher can specify the data requirements and decide upon the level of error that is acceptable. Finally, the statement of objectives should indicate the purpose, the areas covered, the kinds of results expected, the users as well as the uses of the data, and the level of accuracy that is desired.

Essentially, a survey involves the collection of information about characteristics of interest from some units of a population using well-defined concepts, methods and procedures, and the compilation of such information into a useful summary form. The collection of such information from all units of a population would constitute a census. Surveys are carried out for either one of two purposes: descriptive or analytical. The main purpose of a descriptive survey is to estimate certain characteristics or attributes of a population – e.g., awareness of a particular meteorological service. Analytical surveys are generally concerned with testing statistical hypotheses or exploring relationships among the characteristics of a population. An example of an analytical survey would be one that determines whether there is a change in protective behaviours following the introduction of an Ultra Violet Index programme.

There can be many reasons for undertaking a user-based assessment by an NMS. These can include the checking of perceptions against expectations, tracking of trends, seeking feedback to improve existing services, determining requirements for new or different services, assessing the perceived effectiveness of the overall programme, and identifying areas where actions can be taken. An NMS's "Service Charter" may dictate the requirement to routinely publish information regarding such dimensions as user satisfaction. Such information can be derived from the administration of a re-useable tracking survey. Subject area surveys can be used to elicit information feedback for the improvement of certain specific services or for determining the requirement for new or different services. Large comprehensive surveys can be used for gauging the overall effectiveness of the NMS's total programme.

5.2.4 Credibility and Transparency

There are many considerations that come to mind when wrestling with the concepts of credibility and transparency for user-based assessment. Comments made above, regarding an overall performance management framework and strategy, certainly apply. User-based assessment is an effective and essential component of an organization's "balanced scorecard", giving a comprehensive picture of its health and effectiveness. The adoption of a rigorous approach or methodology based on established theory and practices is essential. The adherence to a multi-year user-based assessment strategy facilitates a co-ordinated and structured approach. Even such simple precepts as undertaking fewer but well planned surveys, focus groups, etc., rather than a large mixture of disconnected ones, and following a consistent approach to track trends, help. Finally, publicizing the changes triggered by the assessment enhances credibility and transparency.

5.2.4.1 Statistical Significance Issues

With regard to public opinion or stakeholder surveys, a focus on sampling and on sampling errors and accuracy can head off credibility and transparency problems.

5.2.4.1.1 Sampling

For a specific subject area relative to the programme of an NMS to be examined, one of the first decisions to be made is whether to undertake a sample survey or a census survey. A census survey refers to the collection of information about characteristics of interest from all units of a population. An NMS may want to determine certain characteristics about the redistribution of meteorological products by their domestic media. An NMS may want to determine what ice forecasting services high Arctic marine operators would like to receive. In such cases, for most countries, a census survey may be more appropriate given the very small population under study.

A sample survey refers to the collection of information about characteristics of interest from only a part of the population. A survey of the general population's awareness and understanding of a wind-chill programme would be a valid use of a sample survey.

A sample survey is cheaper to do than a census survey. Sampling also reduces data collection and processing time. Sample surveys allow more selective recruiting of interviewers, more extensive training programmes and closer supervision. As well, the smaller scale of operations allows for more extensive follow-up of non-respondents and for a higher level of quality control for such data processing activities as coding and data capture. For these reasons sample surveys can be more accurate than their census counterparts. In some cases where highly trained personnel or specialized equipment is required, it would be difficult and expensive to consider a census. Sample surveys also inconvenience fewer people, meaning reduced respondent burden.

The target population is the set to which the survey results are to apply; about which information is sought; which the sample is intended to represent; and about which one wishes to make inferences based on data collected from a sample. A population has definable characteristics, a specific geographic location and a time period under consideration. The survey population is the population that is actually covered, which may be different from the target population for practical reasons. For example, in a national survey remote locations are frequently excluded because they are too difficult or costly to enumerate. When a survey population is chosen which differs from the target population, it is necessary to be aware that a gap exists between the two populations and recognise that conclusions based on the survey results apply only to the survey population.

Samples can be probabilistic and non-probabilistic. In non-probability sampling, elements are chosen in an arbitrary manner such that there is no way of determining the probability of any one element being included in the sample; thus there is no assurance that every element has a chance of being included.

In probability sampling, all units within the population have a non-zero chance of being selected, and inferences are made about the entire population that the sample represents. Probability sampling methods range from simple random selection of members from the population to complex sampling strategies (random, systematic, stratification, and multi-stage).

Stratification is the most common amongst these methods. Stratification is the process of dividing the population into relatively homogeneous groups called strata, and then selecting independent samples. Stratification variables may be geographic or non-geographic (e.g. gender, income, industry, occupation). Reasons for stratification include the desire to acquire estimates at the stratum level. Each stratum requires an adequate sub-sample size to ensure that valid results can be derived that are particular to that stratum.

In random sampling, each unit in the population has an equal chance of being included in the sample.

In systematic sampling, units from a list are selected using a selection interval (K), so that every Kth element on the list, following a random start between 1 and K, is included in the sample. If the population size is M and the desired sample size is n, then K = M/n. Thus systematic sampling requires a sampling interval and a random start.

Multi-stage sampling refers to a process of selecting a sample in two or more successive stages. For the two-stage sampling case, a number of first stage units are selected – e.g. selected communities – and second stage units are then selected from within the first stage units already chosen, e.g. households within the selected communities. The probability of being selected is P = P1 × P2 for the two-stage sampling case, where P1 and P2 represent the probability of being included in the sample at the respective stages.
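As a sketch of the mechanics of systematic sampling described above (here with 0-based list indexing, so the random start runs from 0 to K−1; the function name is illustrative):

    import random

    def systematic_sample(population, n):
        """Select every Kth unit from a list after a random start, where K = M/n.
        For simplicity this sketch assumes M is a multiple of n."""
        m = len(population)
        k = m // n                        # selection interval K
        start = random.randint(0, k - 1)  # random start within the first interval
        return [population[start + i * k] for i in range(n)]

    # e.g., select 50 households from a list of 1000: K = 20
    households = list(range(1000))
    sample = systematic_sample(households, 50)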
5.2.4.1.2 Sample Errors and Accuracy

Both sampling and non-sampling errors affect the accuracy of survey results.

Sources of non-sampling errors include non-response, difficulties in establishing precise operational definitions, incorrect information provided by respondents, incorrect interpretation of questions by respondents, and mistakes in processing operations.

Sampling error is the difference between the results of a sample estimate and a census, i.e., the population. The size of the sampling error generally decreases as the sample size increases. The extent of the sampling error also depends on the variability of the characteristics of interest in the population, the sample design, and the estimation method. Thus the size of the sample, population variability, sample design, and the estimation method are all included as sources of sampling error. The sampling error can be reduced through the development of an efficient sampling plan, where proper use is made of available information in developing the sample design and estimation procedure.

Accuracy refers to the difference between a survey result for a characteristic and the true value of that characteristic of the population. Precision (or reliability) is a measure of the closeness of sample estimates to the results of a census (or 100% enumeration of the population) that is undertaken under identical conditions. The greater the variability in the population, the larger the sample size needed to obtain the specified level of reliability. Complex sampling procedures usually increase the margin of error as they increase the possible sources of errors. Increasing the sample size will lower the margin of error due to sampling, but the bias resulting from non-response is not reduced. With respect to the characteristics of interest in the survey, the non-respondents may be different from the respondents.

Confidence interval statements are commonly provided with published survey results. A 95% confidence interval can be described as follows: if sampling is repeated indefinitely, with each sample leading to a new confidence interval, then in 95% of the samples the interval will cover the true population value. The size of the confidence interval is usually indicated by the margin of error. For example, if the estimate is 50% and the margin of error is 3% either way (below and above 50%), then the confidence interval is that the "true" percentage falls somewhere between 47% and 53%, 19 times in 20 (i.e., 95% of the time).

The confidence interval does not take into account a margin of additional error that may result from practical difficulties that are involved in conducting a survey. The sources of this type of error include, for example, the way the questions are worded, respondents misunderstanding the questions or answering incorrectly, and non-response. The acceptable level of reliability depends on the estimate under consideration and the intended use of the data; that is, the acceptable level of reliability depends on the level of accuracy required for a particular application. What may be an acceptable margin of error for one estimate may differ from that felt suitable for another estimate.

The determination of sample size involves a process of making practical choices and trade-offs among the conflicting requirements of precision, cost, timeliness and operational feasibility.
portation sector and other levels of government.
5.2.4.2 Collaboration with Other Relevant Authorities is
Desirable
5.2.5 Additional Principles of User-based Assessment
Working with others can achieve synergies and economies of Design
scale. The process of developing a plan and sharing informa-
tion on intentions will be more inclusive increasing 5.2.5.1 Use of Professional Expertise and Independent
co-operation, communication, and co-ordination of efforts. Administration Authority
Teaming up with others may yield mutual benefits such as
reduced costs, increased internal communication, and new The satisfaction of credibility and transparency concerns is
ideas. To be successful, this requires communication by all facilitated by the use of external independent expertise as an
parties. Organisations in the private or not-for-profit sectors input to the design and for the administration of the user-
can be approached for help in reaching their communities. based assessments. The use of external accredited
They may be willing to provide funding or service in kind. consultation expertise can facilitate the free and honest flow
Examples include approaches such as co-operation with of ideas and concerns. Focus group facilitators are essential
community support and advocacy organisations for deaf, for the creation of the desired information discussion
deafened, and hard of hearing clients. environment when considering the characteristics of interest
One of the most common forms of “collaboration”is the to the NMS funding the study. The expertise of a private
use of omnibus surveys that are usually conducted by survey firm, a dedicated government body with the assigned
telephone. In the case of omnibus surveys the NMS buys a responsibility and appropriate skills, or of an academic
portion of a larger survey that may cover several clients. (University) professional adds value to the design of a survey
Omnibus surveys are questionnaires consisting of several instrument. Such expertise and at-arms-length objective
22 Chapter 5 — User-based assessment

positioning is usually essential for the administration of a survey. Such external expertise will assist in the perception of credibility and in the attainment of statistically valid results from the perspectives of sample size and geographical and geopolitical representation. Indeed, it may be a formal requirement of a performance pledge/charter or of a quality assurance system to use such expertise.

5.2.5.2 Lack of Professional Advice or Availability of an Independent Capacity Should Not Stop Assessments From Being Done

Although it would be best for an NMS to use professional advice or some independent capacity, if these are not available, user-based assessments should still be done. It is essential to measure certain basic end-users' understandings of, and reactions to, the services provided. The use of some “best practice” examples of other NMSs providing similar programmes can help. Adaptation of these by in-house staff, and in-house staff administration of such assessments, can yield very useful information that can assist in the management and planning of the NMS.

5.2.5.3 Dry Run or Pilot Test the Assessment Instrument

Careful planning and pre-testing of the survey or focus group instrument or consultation strategy is essential. Pre-testing will often reveal information on the ability of the proposed question set to deliver on the objective of the survey. Misunderstanding of specific questions and unexpected responses can be detected through such pre-testing. Depending on the objective of the user assessment, pre-testing may reveal the requirement for additional or different questions, faulty skip patterns (skipping to the wrong or unintended subsequent question based on the response to a question just asked), etc. Pre-testing in a demand-based economic valuation exercise will help set the willingness-to-pay (WTP) cost amounts so that eventual responses ideally approximate a normal distribution about that WTP estimate.

5.2.5.4 Information Storage

Determination and adoption of certain practices for the information storage aspects of user-based assessments are essential for both current and future use of the results. Various types of information and sources result from user-based assessments. Audio or video recordings of consultation, workshop and focus group proceedings can be made for future retrieval and “mining” of the information. These recordings should be kept in a safe place. Written transcripts or proceedings or reports of such events can be used in a similar fashion. Reports on analyses of surveys can have a similar use. These should be kept and made accessible in both hard copy and electronic form. Special consideration should be given to the electronic storage of raw survey data in a standardised format, be it a simple flat file, spreadsheet, or statistical format such as SPSS or SAS (commonly used proprietary software packages, available commercially, used for scientific and survey applications).

5.2.6 Communication of Information

To be effective and worth the expenditure of the resources involved, the information must be communicated and appropriately used internally within the NMS as well as externally to clients and stakeholders.

5.2.6.1 Accessibility Within the NMS

Increasing the access to user-based assessment results within the NMS is important. Use of this information in both the long and shorter-term strategic/tactical decision context has been discussed above. The results of user-based assessment research need to be made available to managers and employees if they are to be worthwhile. A greater awareness of what has already been done elsewhere could avoid possible duplication. The results could be used by others in various activities such as planning, risk management, briefing note preparation, and tracking issues.

5.2.6.2 Interpretation Reports for Internal and External Consumption

Reporting on the results of user-based assessments can take a variety of forms.
There are the standard statistical reports, such as those produced by public opinion research firms or in-house staff.
Public or stakeholder consultation reports usually summarize the results of the consultation activity along with reporting of the actual dialogue that has taken place. If the consultation process used was that of a workshop then a full proceedings of the workshop is frequently published.
For assessments done by way of focus groups, these usually consist of consensus remarks reinforced by some of the dialogue, notable for capturing particular points of view, all done according to the structure implicit in the focus group's questionnaire.
Reports on public opinion surveys usually provide a statistical analysis of the results on a question by question basis within each section. These statistical results can include a variety of descriptive statistics including frequencies and cross-tabulations, custom tables including multiple response tables and tables of frequencies, comparative means, perhaps some linear models, correlations or regressions, perhaps some classification or cluster analysis, and some multiple response analysis. Results are frequently presented in the form of graphical representations (bar, line, pie, area, scatter, etc.). In the case of surveys repeated according to a prescribed schedule, time series analysis and trends analyses may be reported upon. These analytical reports are used by staff to generate issue-specific or general summary reports for senior management or for external parties. These summary reports take a variety of forms depending on the purpose intended and the audience.
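The mechanics behind such reports are straightforward. The following sketch, in Python with entirely hypothetical data (the field names and ratings are invented for illustration, not taken from any actual NMS survey), shows how frequencies, a simple cross-tabulation and comparative means of the kind listed above can be produced; in practice a statistical package such as SPSS or SAS, or a spreadsheet, would usually be used.

    from collections import Counter

    # Hypothetical coded survey records: each holds a respondent's region
    # and a satisfaction rating with temperature forecasts
    # (1 = very dissatisfied ... 5 = very satisfied).
    responses = [
        {"region": "North", "satisfaction": 4},
        {"region": "North", "satisfaction": 5},
        {"region": "South", "satisfaction": 2},
        {"region": "South", "satisfaction": 4},
        {"region": "North", "satisfaction": 3},
    ]

    # Frequency table for a single question.
    freq = Counter(r["satisfaction"] for r in responses)
    for rating in sorted(freq):
        print(f"rating {rating}: {freq[rating]} respondent(s)")

    # Simple cross-tabulation of satisfaction against region.
    crosstab = Counter((r["region"], r["satisfaction"]) for r in responses)
    for (region, rating), count in sorted(crosstab.items()):
        print(f"{region} / rating {rating}: {count}")

    # Comparative means by region.
    for region in sorted({r["region"] for r in responses}):
        values = [r["satisfaction"] for r in responses if r["region"] == region]
        print(f"mean satisfaction, {region}: {sum(values) / len(values):.2f}")

The same tabulations extend naturally to the time series and trend analyses mentioned above when the survey is repeated on a schedule.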
5.2.6.3 Archive, Publish, Use as Appropriate for Promotion (and Education)

Since user-based assessment is quite costly, it is important to maintain both the reports and raw data in a variety of media, with backup copies, for future use and possible reanalysis. The media range from hard copy to electronic to video or audio. The material can be used for distribution to a variety of users ranging from management for decision making purposes, to staff for internal awareness, to funding authorities for resource justification, to the public or stakeholders for end-user awareness and education, to regulatory bodies for the attainment of approvals, to central agencies to satisfy reporting requirements, etc. It is important that the data is properly indexed and easily retrievable.

5.2.6.4 Targets for Communication of Results

The communication of the results to staff and management will assist in the evolution to a more client-centred organization, which can lead to improvement in products, production, efficiency and delivery, or even end-user awareness and education thrusts. Communication upwards through higher levels of management will assist in the longer term strategic planning and management for the NMS.
Communication to central agencies may be a defined requirement but can also be used as a justification for resources (current and additional). Communication of the results externally may have the effect of modifying certain practices, such as those related to safety, or may encourage or accelerate the development of new services or products within the private sector (Weatheradio units, special services, etc.). Communication to the general public can have the effect of increasing awareness and credibility of the NMS and its offerings.

5.3 METHODS

The information “universe” for assessments is only partially measured by any research technique. Qualitative data can convey detailed information from a few respondents, while quantitative data come from restricted information from many respondents. Qualitative data, such as that from a few in-person interviews (e.g., data from a rambling in-the-street interview by the media) or focus groups exploring a wide range of dimensions of a particular topic, yield information through restricted observations on a massive domain. Quantitative data, such as that from a sample survey asking a few rigidly structured questions of many people, yield information through a mass of observations on a restricted domain (e.g., data from a large sample survey on satisfaction with temperature forecasts). Compared to quantitative data, the meaning of qualitative data is more likely decided after data collection.
The general characteristics of qualitative and quantitative methods are summarized in the table below.

Dimension                          Qualitative                                      Quantitative
Intent/Purpose                     Discovery of theory, understanding of            Verification of theory, statistical
                                   phenomena under study                            prediction
Assumption re: origin of meaning   Socially constructed and conferred on            Inherent in objects and acts
                                   objects and acts
Scope/Nature of investigation      Holistic, rich in context, emphasizes            Particularistic, guided by program
                                   interactions                                     objectives
Sampling                           Revealing in nature, population inferences       Probability, population inferences
                                   cannot be drawn                                  can be drawn
Data gathering                     Semi-structured or unstructured (open-ended)     Fixed response options
                                   response options, observation
Analytical techniques              Inductive                                        Deductive
Generalizing to population         Invalid                                          Valid
Data collection skills required    On-the-fly processing required                   Rigid script

Qualitative techniques are employed when rich contextual program description or new/refined program theory is needed, or variations in implementation or process are to be assessed. When causal attribution, incremental effects or resource expenditure assessments are the objective, quantitative methods are more appropriate.

5.3.1 Non-Survey User-based Assessments

While much of the attention in this chapter is given to the design, development and administration of formal surveys, a quantitative technique, it is not the only vehicle for user-based assessment and frequently it is not the best vehicle for specific circumstances. Formal audits, whether mandated or self-imposed, can yield useful information and have the effect of aligning the NMS with overall governmental initiatives. Focus groups, a qualitative technique, are a very popular means of gathering initial information that may be later used in a formal survey, or of acquiring greater in-depth understanding of a particular dimension after a formal survey. Most governments and major corporations monitor their public image through a variety of means and many have formal feedback and response mechanisms. Public and stakeholder consultations are standard means of obtaining input on NMS policies and issues. Most NMSs will undertake operational performance reviews following major meteorological events to assess the effectiveness of their systems. Finally, for more than historical purposes, NMSs collect anecdotal information to be used strategically.

5.3.1.1 Formal Audits

Formal audits, whether mandated or self-imposed, can yield useful information on the operation and effectiveness of the
NMS. These also can have the effect of aligning the NMS with overall governmental initiatives. They involve independent auditing of the NMS and its services by an independent party (e.g., government audit agency or consulting company) according to some established or agreed-to criteria. They are usually undertaken according to an established schedule for all or part of the NMS's range of accountabilities. They identify performance improvements achieved and those not adequately achieved, and for the latter they can specify some subsequent reporting of actions taken and associated results on a later date. These audits may be part of an overall quality management system at the service level or across government and its agencies. These should be seen as an opportunity to learn and improve, and perhaps to justify the requirement for resources.

5.3.1.2 Focus Groups

Focus groups are a very popular qualitative means of gathering initial information that may be later used in a formal survey. Focus groups are also useful when developing new products or initiatives, to explore needs, understanding and preferences. The user-based assessment process may actually end with the focus groups. An example of one such focus group is one that considers the use and understanding of specific meteorological terminology such as “probability of precipitation”.
The focus group participants are selected through a variety of means. Frequently, the NMS wants to collect some qualitative information from certain sectors of society such as professional categories (e.g., mariners), different levels of education, gender, family status (e.g., mothers with children potentially exposed to ultraviolet radiation), etc. The contracted or in-house authority could select from a known client list, or more or less randomly from sources like phone books, to identify potential participants.
Focus group sessions usually last from one to two hours and are typically comprised of 8 to 12 participants seated comfortably around a boardroom type table in a specially designed room. Frequently, observers can observe the proceedings from behind one-way glass in an adjacent room or via a TV monitor. The proceedings are usually recorded via audio or videotaping, but the participants are made aware of this recording activity. The focus group session is usually conducted by a professional facilitator who has been brought up to speed on the subject area. It is critically important that interruptions are avoided and that interference in the conduct of the focus group, by Service personnel, does not occur.
Careful attention must be given to the development of the focus group questionnaire or guide with the facilitator. This is where possible areas of misunderstanding are identified and clarified to the facilitator so that he or she may respond appropriately within the focus group. Unlike a formal survey where a spread of related subjects can be addressed, a maximum of a couple of issues can be addressed by a focus group. For proper treatment of the characteristic of interest, several focus group sessions, geographically separated, are desirable.
Care should be taken not to draw statistical inferences from the focus group sessions. The samples are too small for that, but the results can provide useful input to the design of questions for a formal survey. Qualitative data, such as that from focus groups, may be summarised and synthesized using systematic techniques. Before coding can begin, data often have to be cleaned (i.e., non-relevant or non-codable {incapable of being categorized} material identified and removed) and unitised (broken down into codable units). Meaning is assigned to observations by finding patterns through the processes of integration, differentiation and ordering, frequently using a matrix approach.
A formal report on the conclusions of the focus group session is a standard requirement. These reports usually summarize both the central tendencies and significant variations, and also make extensive use of verbatim quotes from respondents to illustrate key points.

5.3.1.3 Monitoring Public Opinion and Direct Feedback and Response (Complaints, Compliments, Suggestions) Mechanisms

Many government organizations and major corporations monitor their public image through a variety of means and many have formal feedback and response mechanisms. Many NMSs, or their parent organizations, have designated staff that monitor electronic and printed media reports, or purchase media monitoring services for that purpose. Media reports frequently precipitate media interviews of Service personnel that generate further media reports. Just two examples of such occurrences in Canada were the publicity surrounding the windchill issue that spawned over 100 media interviews, and the Ice Storm in January 1998 that generated about 800 media interviews. Such circumstances can be capitalised upon from the perspective of promoting awareness and understanding of service programmes.
It is increasingly common for NMSs to operate feedback and response mechanisms. Some of these systems work via the Internet in conjunction with Web offerings, others are telephone based, and yet still others operate via regular mail. Specific levels of service regarding initial and final response are generally established. These tend to be very useful input sources of information on the effectiveness and adequacy of service offerings and on the operation of the production and delivery systems. The coding of the information in a database will make it available for future analysis to determine patterns or trends.

5.3.1.4 Consultation

Public and stakeholder consultations are a standard means of obtaining input on NMS policies and issues. These consultations can take various forms. The visiting of user associations, such as attendance at their meetings, lends a human face to what may otherwise be seen as a faceless bureaucracy producing weather and climate services. Being on “their territory” facilitates the exchange of honest reactions to the services provided by the NMS. User meetings on some neutral ground are also good venues to achieve similar results. User or joint conventions and visiting client sites can be used similarly.
Hosting workshops or other events for the broad user community or for particular clients or client groups is also effective.

5.3.1.5 Post-Event Review, Case Studies and Debrief

On the one hand, post-event reviews or case studies can be evocative, with problems coming to the forefront and becoming more persuasive, leading to a motivation to make positive changes; on the other hand, the case may dominate all other information and can be too striking, thereby biasing the interpretations. Careful selection of the case is essential. Most NMSs will undertake operational performance reviews following major meteorological events to assess the effectiveness of their systems. One such review was undertaken following the costly “Ice Storm” of January 1998 in Eastern Canada.
Reviews can result in complete end-to-end operational system audits of what worked effectively and what did not. It is common to analyse the accuracy and appropriateness of meteorological products. The effectiveness of the information delivery system is a critical component to be analysed, as is the effectiveness of the NMS's relationship with other agencies involved in disaster management. Surveys of the citizenry and even the local media provide useful information. An assessment of the public “issue management” can lead to improved strategies for future similar situations. Documenting and learning from these situations are key steps towards improvements.

5.3.1.6 Collection of Anecdotal Information

Finally, for more than historical purposes, NMSs collect anecdotal information to be used strategically. This involves the collection of stories of lives saved and damage avoided through effective warnings and forecasts. These “sound bites” can be used strategically for public relations purposes or to defend certain perspectives with clients and partners.

5.3.2 Formal Structured Surveys

5.3.2.1 Large Survey every 4 or 5 Years – Comprehensive

In most cases survey objectives call for the measurement of many characteristics. In a survey on meteorological services one usually wants to determine more than overall satisfaction or perceptions about weather forecasts. A comprehensive survey may include sets of questions on the general use of weather information, on weather warning information, on regular forecast information, on air quality information, on weather information delivery, demographics, etc. Within these sections of a multi-purpose survey further breakdowns can occur; for example, under the general topic of weather forecast information one can investigate, on a per-season basis, perceptions of what is considered accurate for temperature, wind direction/speed, onset of precipitation, probability of precipitation, sky cover conditions (sunny, cloudy), etc. These surveys are usually quite long and demand fairly large sample sizes to facilitate geo-politically based inferences.
To accommodate the measurement of several items within one survey plan, it is likely necessary to make compromises in many areas of the survey design. The method of data collection (telephone, personal interview, mail-out, etc.) may be suitable for measurement of some characteristics but not for others. The survey design must be made to properly balance statistical efficiency, time, cost, and other operational constraints. As such, these surveys tend to be rather costly, so such baseline surveys are usually undertaken once every four or five years. In order to make proper inferences on trends, consistency in the design and questions from one baseline survey to the next is necessary. Given the cost, such surveys demand particular senior management discipline and commitment for appropriate long-term execution. An example of such a survey, the 1997 Canadian Goldfarb Survey, forms Appendix 2 of this Technical Document.

5.3.2.2 More Frequent Tracking Surveys

One-time or baseline surveys differ from periodic or continuing surveys in many ways. The aim of periodic or continuing surveys is often to study trends or changes in the characteristics of interest over a period of time. Such studies nearly always measure changes in the characteristics of a population with greater precision. Overhead costs of survey development and sample selection can be spread over many surveys and this in turn cuts down the costs. Decisions made in the sample design of periodic or continuing surveys should take into account the possibility of deterioration in design efficiency over time. Designers may elect, for example, to use stratification variables that are more stable, avoiding those that may be more efficient in the short term but which change rapidly over time. Another feature of a periodic or continuing survey is that, in general, a great deal of information is available which is useful for design purposes. If, for example, a Service Charter calls for routine reporting on levels of satisfaction (or another dimension) with regard to certain standard forecast elements, a well designed standard survey instrument can be used repetitively. Recognising the compromises, an omnibus survey vehicle can be used.
An example of a tracking survey is the Hong Kong, China, survey that forms Appendix 3 of the present Technical Document.

5.3.2.3 Subject Area Surveys

Subject area surveys offer the potential to delve more deeply into specific characteristics of interest. This can be for the purpose of investigating perceptions regarding key issues of concern to an NMS such as climate change, or even for specific valuation exercises such as estimating the benefits of a specific service provided via a specific delivery mechanism. These surveys are specifically designed to answer a limited set of questions and, as such, all of the design dimensions should be carefully considered. These include the thorough specification of the objectives, the development of operational definitions which indicate who or what is to be observed and what is to be measured, the specification of the
data requirements, an indication of the purpose, the areas covered, the kinds of results expected, the users as well as the uses of the data, and the level of accuracy that is desired.

5.3.2.3.1 Key Issues

As stated above, climate change is an example of an issue area that can be the focus of a subject area survey. Others can be perceptions about air pollution, natural disasters, etc. These issue area investigations are more prevalent in the broader environmental field than in the more narrowly defined scope of meteorological services.

5.3.2.3.2 Product Lines

Public opinion research into product lines is far more common in the meteorology field. User perceptions regarding offered products are popular topics of such surveys. Typically, dimensions investigated include the establishment of user requirements, determination of levels of satisfaction or utility, and a measurement of the awareness of the existence, origin or means of accessing certain products. Also included are an assessment of the level of understanding of the terminology or meaning associated with certain parameters, a determination of the perception of what is accurate relative to individual forecast elements, an assessment of the perceived accuracy or credibility of certain forecast parameters, an assessment of the required frequency for updates to certain forecasts and reports, and an assessment of the timeliness of any one of a variety of warnings.
Some concepts, such as probability of precipitation, windchill and heat indices, represent particular challenges with regard to effective communication leading to appropriate behavioural response. Public opinion research into comprehension and options for effective communication of such more complex parameters is often essential for their design. Given scarce resources, and even more importantly the limited “sound bite” space allowed by dissemination technologies and specifically the media, it is often critical to determine the relative importance of products or weather elements. Decisions on service levels and design are commonly made on the basis of user-based assessments achieved through these means.

5.3.2.3.3 Delivery Systems

As is often stated, critical meteorological information not delivered at all, or delivered in an incomprehensible manner, or via a medium not receivable by users or specific critical stakeholder communities of users, has little or no value. Surveys covering the variety of dissemination technologies such as the popular media (radio, TV, newspapers, etc.), the Internet, weatheradio, telephone, pagers, mobile technologies, and digital radio are frequent targets for subject area surveys. Specifics analysed can include layout, graphics, colours, duration of a broadcast, and length and wording of text. Reach and target audiences of the specific media are other specifics analysed. An example of a delivery system specific survey would be one focused on the acceptance and utility of “crawling” weather warning messages on TV screens, thereby interrupting the viewing of programmes and/or commercials.
Information derived from such investigations can be used in presentations made before industry and government authorities in application for licenses, etc. Generally, information derived from public opinion research in the service delivery area can lead to decisions on which systems are to be utilised for the population as a whole and for specific target audiences, and on specific attributes in terms of product design and delivery.

5.3.2.3.4 Economic Value Estimation

Production-based methods vs. demand-based methods

The value of weather information is a subdivision of the economic literature on the value of information. Two main models or methods have been used for the valuation of meteorological information. Broadly speaking, one can generalise the majority of applications of non-market valuation of weather information services to these two types of methodologies. For either method chosen it is important that the NMS avail itself of professional expertise in the respective economic theory for such estimations.
Production-Based (PB) “Analytical Methods” rely on modelling processes in which the information is used as an input to the production of a consumer product, which is ultimately valued in the marketplace. Thus these prescriptive analysis methods indirectly infer the benefits of the information input as the contribution to the market value of the final product. Typically, the production process is modelled and the added value attributable to the use of meteorological information is estimated at each stage in the production process and aggregated for the entire production process.
The Demand-Based (DB) “Survey or Interview Method” directly infers the benefits of the weather information services via characterisation of the demand for the service, as articulated by users' willingness to pay. Direct descriptive methods rely on modelling the relationship between willingness to pay for a service and the benefits generated by that service in aggregate over the range of users. For this section on user-based assessment the focus will be on methods such as contingent valuation, a widely accepted DB method used by economists to value public goods and services.

Production-Based “Analytical Methods”

Production-based (PB) “Analytical Methods” have been by far the most common approach used in the meteorological literature, with a variety of published studies that value weather information in contexts ranging from cost savings in road maintenance, forest fire prevention and fuel load decisions for the aviation industry, and irrigation scheduling, to the value of increased accuracy of forecast information to increase production of a variety of agricultural commodities. The assessment is typically not (end) user-based.

Demand-Based “Survey or Interview Method”

The Demand-Based (DB) “Survey or Interview Method” relies on providing the means for users of specific weather information dissemination services to reveal how much they would be willing to pay for the service if they had to do so.
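Before turning to the assumptions behind such stated amounts, the aggregation arithmetic they lead to can be sketched in a few lines of Python. This is a hypothetical illustration only; the group names, populations and WTP figures below are invented and do not come from the studies cited in this section.

    # Hypothetical contingent valuation aggregation: the mean stated WTP of a
    # random sample from each user group is scaled up to the population of
    # that group, and the group totals are summed to estimate total value.
    groups = {
        # group: (population of users, mean stated WTP per year)
        "commercial mariners": (12_000, 85.0),
        "recreational boaters": (150_000, 12.5),
    }

    total_value = sum(pop * mean_wtp for pop, mean_wtp in groups.values())
    print(f"Estimated annual value of the service: ${total_value:,.0f}")
    # 12 000 x 85.0 + 150 000 x 12.5 = $2,895,000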
DB “Survey or Interview Methods” assume that the user implicitly knows what the value of the service is to him in the context of his own ability to use it to produce benefits to himself. For business users who use the information as a productive input, the benefits implicitly include the user's understanding of the production process. For household users who use the information for planning recreational activities, the benefits implicitly incorporate a subjective valuation of the increase to household utility from the information.
Different users of the same service likely derive different levels of benefits from it, and these differences would be expected to be reflected in a random sample of all users. The contingent valuation (CV) method (one of a number of survey-based economic valuation techniques that can be employed with the assistance of professional expertise in this economic theory – not to be explained here) directly measures individual willingness to pay (WTP), and can easily differentiate between significant differences in WTP among user groups, provided the sample of each is large enough. The individual WTP for each user group can then be aggregated over the populations of users in each group. The sum of these aggregates is thus the total value of the proposed change in the provision of the service throughout the market.
A demand-based approach is not intended to result in an in-depth analysis of how changes in provision of the service can affect production in a given production process. Typically, production issues are treated qualitatively with additional survey questions that ask each user how they use the information in their own decision-making. On the other hand, demand-based approaches, properly applied, do analyse what, if any, substitutes exist for the service, and then value the service as the marginal value of the service over and above the value of these substitutes.
DB approaches are very specific to the type of weather-information dissemination service considered, and results are not theoretically applicable to other types of services. So while PB methods value the information itself as a productive input, DB methods are more specific to the means by which the information is delivered, because this is the specific good that users employ – a particular bundle of weather information supplied in a particular manner, accessible at particular times, etc. The DB approach assumes that the user of the service knows how they would respond to a price change or quality change in the service by substituting with other sources of the needed information.
In the specific policy context of analysing the impact of alternative weather information delivery systems in a cross-sectoral comparison, DB methods are likely superior to PB methods. In a context requiring an in-depth analysis that models the complexity of means by which a change in the quality of information delivered by any system would affect a particular user group, PB methods are likely superior.

5.3.2.3.5 Current Value Versus Value if Accuracy Increased

Both the PB and DB methods can be used to achieve valuations of both the current value of specific services and the degree of increased benefit attributable to improved quality of the services. A Guelph University study used a prescriptive PB approach to value current precipitation forecasts to the Southern Ontario, Canada dry hay industry at CAN$54 million, while a 50% improvement in those forecasts increased that value to $58 million. A descriptive Contingent Valuation (DB) approach was used by Dalhousie University to value the Marine Weather Services in the Canadian Maritime Provinces at more than twice the cost of the provision of the service. One Meteorological Service of Canada Contingent Valuation study demonstrated the ability to select an optimal asking price for services delivered over the telephone for maximisation of cost recovery, while another study demonstrated that the benefit of Marine Weather Services delivered via Weatheradio Canada exceeded the anticipated increased cost of provision of that service resulting from large increases in broadcast tower costs.

5.3.2.4 Questionnaire Design

5.3.2.4.1 Some General Rules for Questionnaire Design and Wording

• It is essential to ensure that the questions and instructions are easy to understand.
• Abbreviations and jargon should be avoided.
• Words and terminology that are too complex should be avoided.
• The frame of reference should be specified. For example, if income information is requested then, at a minimum, a time frame should be specified.
• Questions must be as specific as possible.
• The question needs to be understood by all respondents in the same way. To the extent possible, the questions asked should be applicable to all respondents. Clearly, skip patterns (those “go to” type directional statements that determine the next question to be asked based on the response to the question just asked) are defined such that respondents are not required to answer all of the questions.
• The questions should be relevant to the respondent and the respondent should know enough about the subject to answer the question knowledgeably.
• Double-barrelled questions should be avoided. Double-barrelled questions are ones that have two or more questions “nested” within them. Respondents become confused in trying to answer the question, especially when they have different answers for each part. One indicator of the likelihood of a double-barrelled question is the appearance of the conjunction “and” or “or” in the question. The best way to avoid the confusion is to replace double questions with two or more questions.
• Don't try to get two questions answered by way of one question.
• The response categories should be mutually exclusive and exhaustive.
• Care should be taken in developing the wording of the questions so as to avoid the likelihood of drawing invalid inferences from the responses. That is, the questions should not be “leading” or “loaded”, i.e. should not suggest that one answer is preferable to another.
5.3.2.4.2 Types of Questions

Open versus closed questions
There are two main types of questions: open and closed questions. They are sometimes called open-ended and closed-ended questions.
Open questions are answered in the respondent's own words. An open question allows the respondent to interpret the question and answer any way that he/she wants. The respondent writes the answer, or the interviewer records verbatim what the respondent says in answer to the question. Blank spaces are left in the questionnaire after the question for the response to be written in.
Closed questions are answered by means such as checking a box or circling the proper response from among those that are provided on the questionnaire. A closed question restricts the respondent or interviewer to select from the answers or response options that are specified.
Sometimes a continuum from open to closed questions is employed. This can take the form of a closed question where amongst predetermined optional responses is an option to check off a category such as “other (please specify) _____” followed by a blank space where the respondent writes the answer or the interviewer records verbatim what the respondent says in answer to the question.

Open Questions
Open formats are typically used for qualitative research where “natural” wording is desired; for the provision of the opportunity for self-expression or elaboration; for the attainment of exact numerical data; or simply to add variety to the questionnaire. For the respondent, open questions can be more time-consuming and more demanding from the perspective of having to formulate a response. From the researcher's perspective, open questions can be costly and difficult to analyse.

Closed Questions
There are many different types of closed questions, including two-choice, multiple choice, checklist, ranking format, rating scale, etc. Closed questions provide respondents with definite choices. The respondent indicates which choice is appropriate.
With the two-choice and multiple-choice questions only one choice is allowed.
In a checklist question many choices may be selected, but the choices should be non-overlapping.
In a ranking question the respondents are typically asked to rank the choices from highest to lowest according to some criteria. Such questions are often difficult for the respondent to deal with, especially in the event of equally ranked items, and the results are difficult to analyse. The order in which the items are listed can influence the results. Difficulties associated with rating scales include the determination of the appropriate number of categories and the tendency for responses to gravitate to the middle area and avoidance of the extremes.
The Thurstone Scale is a composition of two-choice questions where the respondent is presented with a list of statements, each of which he/she is asked to endorse or reject. Each statement should be clear, brief, and easy to understand.
The Likert scale is a composition of multiple-choice questions where the respondent considers each statement and reports how closely it reflects his/her own opinion by indicating not only whether he/she agrees or disagrees, but also how much he/she agrees or disagrees. “Agreement” is not the only response option that can be used. Other response dimensions include “satisfaction”, “usefulness”, “importance”, etc. Degrees of frequency are another possibility.
For a respondent, the advantage of closed questions is that they are easier and faster to answer. For the researcher they are easier to code, easier to analyse, generally cheaper to administer and provide consistent response categories. Closed questions are an advantage when you can anticipate all (or most) of the responses and when an exact value is not required.
There are also significant limitations to closed questions. Often, more effort is required to develop closed questions than open questions. A closed question may elicit an answer where no knowledge or opinion exists (including a “Don't know” or “No opinion” response option may help). Closed questions may oversimplify the issue or force answers into an unnatural mold. Closed questions may not be in the same format as the respondent's record-keeping practices. The response categories must be inclusive and non-overlapping.

5.3.2.4.3 Sequencing of Questions

Issues in sequencing include the introduction, the opening questions, the location of sensitive items, the location of demographic items and the flow of items. The order of the questions should be designed to encourage respondents to complete the questionnaire and to maintain their interest in it. The order should facilitate respondents' recall and appear sensible to the respondents. The order should focus on the topic of the survey. It should follow a sequence that is logical to the respondents and should flow smoothly from one question to the next, but should not influence the actual response itself.
The introduction should provide the title or subject of the survey and identify the sponsor. It should explain the purpose of the survey and request the respondent's co-operation. Respondents frequently question the value of the information to themselves and to users. Some like to receive feedback about the survey. Therefore it is important to explain why it is important to complete the questionnaire and to ensure that the value of providing information is made clear to respondents. It is helpful to explain how the survey data will be used and how, if possible and/or desirable, respondents can access the data. Also, it is important to indicate the degree of confidentiality and any data sharing arrangements.
The opening questions should establish respondents' confidence in their ability to answer the remaining questions. If necessary, the opening questions should establish that the respondent is a member of the survey population. The opening questions should relate to the introduction and the survey objectives. The opening should be applicable to all respondents and be easy and interesting to answer.
The location of sensitive questions is a particular challenge. Sensitive questions (i.e. ones perceived as irritating
or threatening), for example, questions on income and age, tend to get a low response rate and may trigger a refusal by the respondent to co-operate any further. They should not be placed at the beginning of the questionnaire. Introduce them at the point where the respondent is likely to have developed trust and confidence. Locate sensitive questions in a section where they are most meaningful in the context of other questions. It is useful to introduce these gradually by warm-up material that is less threatening. Options or tools that can be employed are self-enumeration (the respondent fills out the questionnaire in private), anonymous questionnaires, careful wording of questions, the use of ranges for response categories, and randomised response. In the simplest form of the randomised response technique, the respondent answers one of two randomly selected questions without revealing to the interviewer which question is being answered. One of the questions is on a sensitive topic; the other question is innocuous. Since the interviewer records a “yes” or “no” answer without ever knowing which question has been answered, the respondent should feel free to answer honestly. This can be done, for example, in an in-person interview where the interviewee selects a card (code noted by the interviewer without seeing the side that contains the questions) or is handed one by the interviewer, who notes the respondent's responses to the questions on the card in sequence.
Demographic and classification data can be either placed at the end of the questionnaire or inserted into the most relevant sections.
The flow of the items should follow the logic of the respondent. Time reference periods should be clear to the respondent. Similar questions should be grouped together. It is useful to provide titles or headings for each section of the questionnaire. Also, use wording that facilitates movement from one section to the next.

5.3.2.4.4 Layout Considerations for Questionnaires

As a general guideline, the questionnaire should appear interesting, easy to complete and respondent-friendly. If done through the mail (regular or electronic), the cover letter and front cover should create a positive initial impression by way of a respondent-friendly introduction. If the questionnaire is administered in person or over the telephone, the questionnaire should be interviewer-friendly. The instructions should be short and clear, and the structure should be such that the respondent is guided step-by-step through the questionnaire. The instructions and answer spaces should facilitate proper answering of the questions. Illustrations and symbols (such as arrows and circles) should be used to attract attention and guide respondents or interviewers. It is a good idea for the last page or end of the questionnaire to provide space for additional comments by respondents. Finally, always include an expression of appreciation (“Thank you”).
Typography considerations in organising the printed word on a page include typeface/font (ensure consistency, use bold face print or ALL CAPITAL LETTERS to highlight important instructions or words), form titles, section headings, questions and question numbers. Data entry or processing codes should not take precedence over, nor conflict with, the question numbers. The benefits of a respondent-friendly questionnaire include improved respondent relations and co-operation, improved data quality, reduced response time and reduced costs.

5.3.2.4.5 Response Errors

A response error is the difference between the true answer to a question and the respondent's answer to it. It can occur anywhere during the question-answer-recording process. There are two types. Random errors are variable and tend to cancel out. Biases tend to create errors in the same direction.
One of the sources of response error is the questionnaire design. It can come from the wording, the complexity and from the order of the questions. It can also come from the question structure, complicated skip patterns and from the very length of the questionnaire.
Another source of response error is the respondent's problems of understanding, recall, judgement, motivation and reporting. Recalling an event or behaviour can be difficult if the decision was made almost mindlessly in the first place, or if the event was so trivial that people have hardly given it a second thought since it occurred. Recalling is also difficult if the question refers to something that happened long ago or if the questions require the recall of many separate events. The resultant errors include the respondent failing to report certain events or failing to report them accurately, leading to an under-reporting of events. A less frequent memory error is the telescoping error. Here some events may be reported that actually occurred outside the reference period, leading to the over-reporting of events. Generally speaking, the longer the reference period, the greater is the recall loss, while a shorter reference period tends to increase telescoping errors.
Social desirability bias can also emerge. This is the tendency to choose those response options that are most favourable to one's self-esteem or most in accord with perceived social norms, at the expense of expressing one's own position.
Finally, the interviewer can be the source of the error.

5.3.2.4.6 Probing for More Information

Probing for more information is a common practice in interviewing, whether in the context of a consultation session, a workshop or a focus group session. Indeed, it is the main means of eliciting information, and it is the skills of the facilitator that come to advantage here. While it can also be used in in-person one-on-one interviews, it is less common in telephone interviews and not possible in mail, Internet or kiosk based interviews. The survey instrument can often be written in such a manner as to effectively achieve a similar purpose.

5.3.2.4.7 Geographical and Geopolitical Representation

Most national government statistical bodies have developed “standard industrial classifications” that classify industries on the basis of their principal activities and
“standard geographical classifications” for the identification and coding of geographical areas. These “standard geographical classifications” usually correspond to geopolitical boundaries. The objective of the system is to make available a standard set or framework, which can be used to facilitate the comparison of statistics for particular areas. Sample allocation decisions are often made on the basis of these standard classifications.

5.3.2.4.8 Data Coding and Capture

To avoid being faced with a long, expensive, error-prone task of manually coding and possibly transcribing data, consideration should be given, at the design stage, to the capture of the data for subsequent processing. It is important to consult early, regularly, and often with the processing staff, to design any formal survey questionnaire for rapid data capture. The best way of ensuring that the concerns of data capture are addressed is to make the individual/organization responsible for this aspect of the survey a permanent member of the team planning and implementing the questionnaire.
If data is to be processed by a computer, which is usually the case, codes for the fields into which answers are to be keyed should appear directly on the questionnaire. These are there to better ensure error-free data entry by interviewers. It is now common to have this process entirely computer resident, with the interviewer entering the data into a computer database via a questionnaire data entry screen. The database can be personal computer based, utilizing commonly available and relatively inexpensive software. The data can also be analysed using relatively inexpensive spreadsheet software or slightly more costly statistical software packages such as SPSS or SAS.
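As a minimal illustration of these coding and capture ideas (the codes, field names and file layout below are hypothetical, not a prescribed standard), responses captured as numeric codes in a simple flat file can be decoded and tallied with a few lines of Python:

    import csv
    import io
    from collections import Counter

    # Hypothetical codebook: codes keyed in at data capture are mapped
    # back to response labels at analysis time.
    CODEBOOK = {"1": "satisfied", "2": "neutral", "3": "dissatisfied", "9": "no response"}

    # Stand-in for a flat file of captured records (respondent id, question 1 code).
    flat_file = io.StringIO("id,q1\n001,1\n002,3\n003,1\n004,9\n")

    counts = Counter()
    for record in csv.DictReader(flat_file):
        counts[CODEBOOK.get(record["q1"], "invalid code")] += 1

    for label, n in counts.most_common():
        print(f"{label}: {n}")

Storing such data in a documented, standardised layout of this kind keeps it usable for the future reanalysis discussed in section 5.2.6.3.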
Chapter 6
CONCLUSIONS
6.1 INTRODUCTION

This Chapter is written especially for those readers who like to read the Introduction to a document, skim through the technical detail in the middle, and jump to the end to find out what the main conclusions were and what, if anything, they should do about it. Here are the answers you seek….

6.2 SUMMARY

Performance assessment should be an essential element of the public weather services programmes of all NMSs. Imagine how it would be if an NMS tried to do forecasting without first gathering observations. Performance assessment is a bit like gathering that basic data – on user requirements, on users' perceptions of services, and on how good the outputs are. Analysis of the data can be used to improve performance.
The purpose of performance assessment is to ensure above all that, as far as possible, the user requirements are being met. It is also used as a check on the operational effectiveness and efficiency of the overall PWS system. Importantly, the information gathered is also very useful for communications with the public and government, which help raise the profile of the NMS and enhance its credibility.
The risk is that a performance assessment programme may be carried out without ever taking any actions based on the results. It is important from the outset to ensure that information is being gathered not to just sit on the shelf, but to be analysed and used for actions which will improve the NMS's performance in the provision of public weather services.
These actions may include improving the products and their delivery, modifying the forecast production system, carrying out needed research and development, and recruiting and training staff, as well as communicating relevant information. Because budgets and resources are always limited, there will of course have to be some prioritisation on what actions will bring the best benefits.
The two essential and complementary aspects of an assessment programme are Verification, and User-Based Assessment. The overall purpose of Verification of forecasts is to ensure that products such as warnings and forecasts are accurate, skilful and reliable from a technical point of view. User-Based Assessment relies on seeking information from people, to obtain a true but subjective reflection of the user perception of products and services provided by the NMS, as well as qualitative information on desired products and services.

6.3 HOW TO GET STARTED ON A PERFORMANCE ASSESSMENT PROGRAMME

For those NMSs which don't currently have a performance assessment programme, now is the time to get started on that first step (always the hardest!).

6.3.1 Planning

Since performance assessment involves a range of functions within the NMS, the first step should be to set up a team to develop a programme plan. This team should be large enough to involve the main functions – in particular, forecasting, computing systems, marketing (or whatever this function is called) – but also small enough so that it does not become unwieldy. Commitment from senior management is essential, and preferably at least one senior manager should be on the team.
The first task of the team should be to reach agreement on the purposes and objectives of the performance assessment programme. What is the most important information you want to discover? Do you need particular information for reporting purposes? Have there been many complaints about a particular forecast? Have you asked the users recently whether the products are meeting their needs? A review of this Technical Document should provide lots of clues and cues for the kind of information you might want to gather.
Planning should then proceed on how best to gather that information, how it is going to be analysed, used and communicated, and who is going to be responsible for ensuring that actions are actually taken based on the results. Since this will all involve work, it is important to “keep it simple” and not embark on an overly ambitious programme to start with. Communicate widely within the NMS as this planning takes place, and seek feedback from people who are interested. Forecasters, amongst others, will undoubtedly have something useful to contribute.

6.3.2 User-based Assessment

In the area of User-Based Assessment, the questionnaire from the Hong Kong Observatory in Appendix 3 is a good example of a simple, focussed questionnaire. This gathers some basic information on the public's use of weather forecasts, how they access them, and what their perceptions are of their accuracy.
You might wish to use this as the basis of a similar questionnaire for your NMS. But, before doing so, think very carefully about how the information gathered will be used by you. Some of the information in this sample questionnaire is clearly designed for “tracking performance” – this is useful for reporting purposes and also for suggesting remedial action if the performance is perceived to be very poor in some areas. Other information about the delivery channels can be used for re-prioritising the effort put into different products for the different channels. You should also consider how the questions should be modified to fit your own circumstances, and your needs for information to communicate and make decisions on.
6.3.3 Verification example of “dry”, “showers” and “wet” described in Section


4.3.4 could be used instead.
Temperatures

A simple first step into verification is to verify maximum Severe Weather Warnings
temperature forecasts. These are provided by most NMSs,
and just about everyone cares about temperatures. The exam- Given the importance of forecasts of severe weather, these
ple in Section 4.3.1 shows many measures of reliability, could form the third part of an initial Verification programme.
accuracy and skill which can be used to verify these. Perhaps It is critical for these forecasts to have a well-defined criteria,or
the first questionnaire you use can also ask the public what else verification will be difficult. For example, the criterion
they consider to be an “accurate”maximum temperature fore- used in New Zealand for issuing (and verifying) a warning of
cast. Is within 2°C accurate? Within 3°C? heavy rainfall is for more than 100 mm in 24 hours, over a
As statistics accumulate, you can see how skilful the fore- widespread area (more than 1000 km2). Such forecasts can be
casts are compared to benchmarks, which could include verified using the scores in Section 4.3.2.
statistical forecasts based on numerical model output. Do the
manual forecasts have a worthwhile improvement over model
forecasts? Are they both poor? Is it worth considering a 6.3.4 Ongoing Assessment
research and development programme to improve the guid-
ance? Do the forecasters need more information available on A Performance Assessment Programme is not something that
temperature climatology, and on case studies of unusually you just set up, and let run. It will need ongoing develop-
hot or cold temperatures? ment, and adjustment, and fine tuning. In fact, you should be
Precipitation

A typical second step into verification would be to verify forecasts of precipitation. In most parts of the world this is of significant interest to the public – but maybe you should check this as part of your first questionnaire? Verification of "yes" or "no" for precipitation is covered in some detail in Section 4.3.2, and the example in Appendix 1 shows how a simple spreadsheet can be used to compute various scores. You can ask yourself the same kinds of questions as for maximum temperatures above. In some climates a simple "yes" or "no" may not suffice – the three-category example of "dry", "showers" and "wet" described in Section 4.3.4 could be used instead.

Severe Weather Warnings

Given the importance of forecasts of severe weather, these could form the third part of an initial Verification programme. It is critical for these forecasts to have well-defined criteria, or else verification will be difficult. For example, the criterion used in New Zealand for issuing (and verifying) a warning of heavy rainfall is more than 100 mm in 24 hours, over a widespread area (more than 1000 km²). Such forecasts can be verified using the scores in Section 4.3.2.
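Once each case has been classified against the criterion, the warning scores reduce to counting hits, misses and false alarms. The Python fragment below is again only an illustrative sketch with invented data; it assumes each record notes whether a warning was in effect and whether the criterion was actually met, and computes the POD, FAR and threat score defined in Section 4.3.2.

    # Sketch: verifying severe weather warnings against a fixed criterion,
    # e.g. "more than 100 mm in 24 hours over a widespread area".
    # Each record is (warning_in_effect, criterion_met) for one area/period;
    # the sample data are invented for illustration.
    records = [(True, True), (True, False), (False, True),
               (True, True), (False, False), (True, True)]

    hits = sum(w and e for w, e in records)             # warned, event occurred
    false_alarms = sum(w and not e for w, e in records)
    misses = sum(e and not w for w, e in records)

    pod = hits / (hits + misses)                        # probability of detection
    far = false_alarms / (hits + false_alarms)          # false alarm ratio
    csi = hits / (hits + misses + false_alarms)         # threat score (CSI)

    print("POD = %.2f, FAR = %.2f, CSI = %.2f" % (pod, far, csi))

With only a handful of warning events per season, several seasons of data may be needed before such scores become stable.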
6.3.4 Ongoing Assessment

A Performance Assessment Programme is not something that you just set up and let run. It will need ongoing development, adjustment and fine tuning. In fact, you should be assessing the Assessment Programme itself. Many of the methods described in Chapter 5 can be used with your internal customers in the NMS to make sure that the programme is meeting their needs, and to improve it.

6.4 FINAL WORDS

Performance Assessment is the key to ensuring an effective, efficient and sustainable Public Weather Services programme. We trust that the guidelines provided in this Technical Document will be of value to you in establishing or developing your own Programme, and wish you well in that endeavour.
REFERENCES
Brier, G.W., 1950: Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78, 1-3.
Epstein, E.S., 1969: A scoring system for probability forecasts of ranked categories. Journal of Applied Meteorology, 8, 985-987.
Gordon, N.D., 1982: Evaluating the skill of categorical forecasts. Monthly Weather Review, 110, 657-661.
Hanssen, A.W. and W.J.A. Kuipers, 1965: On the relationship between the frequency of rain and various meteorological parameters. KNMI Meded. Verhand., 81, 2-15.
Murphy, A.H., 1997: Forecast verification. In Economic Value of Weather and Climate Forecasts, ed. R.W. Katz and A.H. Murphy, 19-74. Cambridge: Cambridge University Press.
Patton, M., 1990: Qualitative Evaluation and Research Methods (2nd edition). Newbury Park, California: Sage Publications.
Platek, R., F.K. Pierre-Pierre and P. Stevens, 1985: Development and Design of Survey Questionnaires. Statistics Canada.
Purves, Glenn T., 1997: Economic Aspects of AES Marine Weather Services in Marine Applications: A Case Study of Atlantic Canada. Dalhousie University.
Rollins, Kimberly and J. Shaykewich, 1997: Cross-Sector Economic Valuation of Weather Information Dissemination Services: Two Applications Using the Contingent Valuation Method. University of Guelph.
Satin, A. and W. Shastry, 1983: Survey Sampling: A Non-Mathematical Guide. Statistics Canada.
Stanski, H.R., L.J. Wilson and W.R. Burrows, 1989: Survey of common verification methods in meteorology. WWW Technical Report No. 8 (WMO/TD No. 358), 114 pp.
Turner, Jason R., 1996: Value of Weather Forecast Information for Dry Hay and Winter Wheat Production in Ontario. University of Guelph.
Wilks, D.S., 1995: Statistical Methods in the Atmospheric Sciences. Academic Press, 467 pp.
Appendix 1
EXAMPLE OF MONTHLY RAINFALL VERIFICATION
The following table shows an example (using a simple spreadsheet) of "rain" / "no rain" verifications.

RAINFALL VERIFICATION

LOCATION: Auckland
MONTH: July YEAR: 1999

Enter either R (for Rain) or N (for No rain)


Day Forecast Observed
1 N R
2 N N
3 R R
4 R R
5 R R
6 N N
7 R R
8 R R
9 R N
10 N R
11 N N
12 N N
13 R R
14 N N
15 R R
16 R R
17 R R
18 R R
19 R R
20 R N
21 R N
22 R R
23 R R
24 N N
25 R R
26 R R
27 R R
28 R R
29 R N
30 R R
31 R R

The following area on the spreadsheet shows various skill scores which can be computed from the 2 by 2 contingency table
resulting from these data. The scores are defined in Section 4.3.2, and the 2 by 2 contingency table is the same as used for an
example in that section.

SUMMARY:                                 FORMULAE:
                Observed                                 Observed
                Yes    No                                Yes    No
Forecast  Yes    19     4               Forecast  Yes     A      B
          No      2     6                         No      C      D

% correct all forecasts: 81%  PC=(A+D)/(A+B+C+D)

% correct for rain forecasts: 83%  A/(A+B)
% correct for no-rain forecasts: 75%  D/(C+D)
Bias: 110%  (A+B)/(A+C)
Rain POD: 90%  A/(A+C)
Rain FAR: 17%  B/(A+B)
Rain Threat Score or CSI: 0.76  A/(A+B+C)
Rain hits expected by chance: 15.6  CHA=(A+B)*(A+C)/(A+B+C+D)
Heidke Skill Score: 0.46  (A-CHA)/(A+B-CHA)
Equitable Threat Score: 0.36  (A-CHA)/(A+B+C-CHA)
Hanssen-Kuipers Skill Score: 0.50  A/(A+C)+D/(B+D)-1
No-rain hits expected by chance: 2.6  CHD=(B+D)*(C+D)/(A+B+C+D)
% correct expected by chance: 59%  CHPC=(CHA+CHD)/(A+B+C+D)
Skill of % correct over chance: 53%  (PC-CHPC)/(1-CHPC)
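
The same scores can of course be computed outside a spreadsheet. The following Python sketch is an illustrative addition, not part of the original worksheet: it rebuilds the 2 by 2 contingency table from the daily "R"/"N" entries above and evaluates the formulae, reproducing A=19, B=4, C=2, D=6 and the scores shown.

    # Sketch: computing the Appendix 1 scores from the daily "R"/"N" pairs.
    # The two strings transcribe the Forecast and Observed columns of the
    # table above, days 1 to 31.
    forecast = "NNRRRNRRRNNNRNRRRRRRRRRNRRRRRRR"
    observed = "RNRRRNRRNRNNRNRRRRRNNRRNRRRRNRR"

    A = sum(f == "R" and o == "R" for f, o in zip(forecast, observed))  # hits
    B = sum(f == "R" and o == "N" for f, o in zip(forecast, observed))  # false alarms
    C = sum(f == "N" and o == "R" for f, o in zip(forecast, observed))  # misses
    D = sum(f == "N" and o == "N" for f, o in zip(forecast, observed))  # correct "no rain"
    n = A + B + C + D

    pc = (A + D) / n                       # % correct, all forecasts
    bias = (A + B) / (A + C)
    pod = A / (A + C)                      # probability of detection
    far = B / (A + B)                      # false alarm ratio
    csi = A / (A + B + C)                  # threat score
    cha = (A + B) * (A + C) / n            # rain hits expected by chance
    chd = (B + D) * (C + D) / n            # no-rain hits expected by chance
    heidke = (A - cha) / (A + B - cha)     # as defined in the summary above
    ets = (A - cha) / (A + B + C - cha)    # equitable threat score
    hk = A / (A + C) + D / (B + D) - 1     # Hanssen-Kuipers skill score
    chpc = (cha + chd) / n                 # % correct expected by chance
    skill = (pc - chpc) / (1 - chpc)       # skill of % correct over chance

    print("A=%d B=%d C=%d D=%d" % (A, B, C, D))
    print("PC=%.0f%%  bias=%.0f%%  POD=%.0f%%  FAR=%.0f%%  CSI=%.2f"
          % (100 * pc, 100 * bias, 100 * pod, 100 * far, csi))
    print("Heidke=%.2f  ETS=%.2f  Hanssen-Kuipers=%.2f  skill over chance=%.0f%%"
          % (heidke, ets, hk, 100 * skill))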
Appendix 2
ENVIRONMENT CANADA’S ATMOSPHERIC PRODUCTS AND
SERVICES 1997 NATIONAL PILOT SURVEY
Administered by:
Goldfarb Consultants

for:
The Program Evaluation Group of the Policy, Program and International Affairs Directorate

Good morning/afternoon/evening. My name is ___________ of Goldfarb Consultants, a national survey and opinion research
firm.We are conducting a survey on behalf of Environment Canada today. The results of this study will be used to help design
and modify existing programs and services to better meet your needs. We are not selling anything. We are simply interested
in your attitudes and opinions. Can you spare some time to answer some questions for me? THANK YOU.

A. May I please speak with the male/female [ROTATE] in the household age 18 or over whose birthday comes next? [IF THE
RESPONDENT IS NOT AVAILABLE, GET PERSON’S NAME, MARK AS “ARNA”, AND ARRANGE FOR A CALL
BACK.]

[REINTRODUCE IF NECESSARY]

B. Respondent is...

Male 
Female 

[WATCH QUOTAS – TERMINATE IF NECESSARY]

C. I would just like to confirm that you are over the age of 18.

Yes, respondent is over 18 


Respondent is under 18  TERMINATE

D. We are interested in people’s occupations. Do you or does anyone in your household work for...

A radio or television station 


A newspaper or magazine 
A public relations firm 
An advertising agency 
A market research firm 

IF “YES” TO ANY OF THE ABOVE, TERMINATE.



SECTION ONE: USE OF WEATHER INFORMATION

1. We would like to talk to you about the types of news that you hear or look at. During a typical day, how likely are you to
look at or hear news on each of the following topics? Are you very likely, somewhat likely, not very likely or not likely at
all to get news on... [ROTATE]
Very Somewhat Not Very Not Likely
Likely Likely Likely At All
Local events and politics    
Entertainment    
Weather    
Traffic    
Sports    

2a) We’d like to focus more on weather information for the remainder of this interview. First of all, on a typical day, how many
times would you say that you specifically make a point of actually looking at or listening to weather forecasts? Would it
be... [READ LIST]

More than four times a day 


Three times a day 
Two times a day 
Once a day 
Less often than once a day 

2b) If you are in need of a weather forecast, how often is it available to you? Is it available… [READ LIST]

Always 
Most of the time 
About half of the time 
Less than half of the time 
Rarely or never 

2c) Compared to two years ago, would you say that you are using weather forecasts more often today, the same, or less often
than you were two years ago?

More often 
The same 
Less often 

2d) Compared to two years ago, how satisfied are you with your access to weather information or forecasts?
[READ. CHECK ONE]

Much more satisfied now 


A little more satisfied now 
Just about as satisfied now as then 
A little less satisfied now 
Much less satisfied now 

3a) We are interested in where you get your weather information from. From what main source are you most likely to get your
daily weather information? [DO NOT READ. CHECK ONE ONLY. CLARIFY “TELEVISION” AND “TELEPHONE”
RESPONSES.]

3b) What other sources do you get weather information from? [DO NOT READ. CHECK AS MANY AS APPLY.]

3a) 3b)
Primary Secondary
source source
Television – General mention  
Television – Weather network  
Television – Local Environment Canada cable channel  
Radio  
Newspaper  
Internet Access  
WeatherRadio Canada  
WeatherCopy Canada  
Contact Environment Canada weather office  
Telephone – General mention  
Telephone – 1-800 number  
Telephone – 1-900 number  
Environment Canada recorded tape  
Family member  

3a) Other Primary : _________________________________________________________

3b) Other Secondary: _______________________________________________________

4. On a typical day, when do you make a point of trying to look at or hear weather forecasts? [PROBE] Are there any other
times? [DO NOT READ. CHECK ALL THAT APPLY.]

Morning – General mention 


Morning – Wake-up 
Morning – While dressing/dressing kids 
Morning – With news 
Morning – Drive to work 

Afternoon- General mention 

Evening – General mention 


Evening – Drive home 
Evening – With news 
Evening – Before bed 
Evening – Before work 

Other 

5a) We would like to know if the information provided in weather forecasts is sufficient for you to make decisions on plans or actions that you would take on a typical day. That is, do you feel that weather forecasts always provide you with enough information to make decisions, sometimes provide you with enough information, rarely provide you with enough information or never provide you with enough information to make decisions?

Always 
Sometimes  ASK QUESTION 5B
Rarely  ASK QUESTION 5B
Never  ASK QUESTION 5B

5b) What other information would you require to make decisions? [DO NOT READ. PROBE. CHECK ALL THAT APPLY.]

Temperature – GM  Wind speed 


High/Maximum  Direction of wind 
Low/Minimum  Whether it will be gusty 
Significance of wind-chill 
Humidity level 
Humidex  Visibility information 
Amount of sun 
Precipitation/Rain/Snow  UV Index 
Amount of rain/snow  Air quality 
Type of precipitation (rain/snow/hail) 
When precipitation will start  Expected weather changes 
When precipitation will end  Storm expectations 
Whether precipitation will be heavy/light 
Probability of precipitation  Historical information 

Other:________________________________________________________________________

_________________________________________________________________________

6. We’d now like you to think specifically about Environment Canada for a moment. Can you tell me the types of weather-related services Environment Canada provides and performs? [PROBE AND CLARIFY]

_________________________________________________________________________

7. Now, how often does your work or job require you to make decisions based on the weather? Is it... [READ LIST]

Always 
Sometimes 
Rarely 
Never  GO TO QUESTION 10

Don’t work  GO TO QUESTION 10

8. What parts of the weather forecast do you need for you to make work-related decisions? [DO NOT READ. PROBE.
CLARIFY. CHECK ALL THAT APPLY.]

Temperature-GM  Wind speed 


High/Maximum  Direction of wind 
Low/Minimum  Whether it will be gusty 
Significance of wind-chill 
Humidity level 
Humidex  Visibility information 
Amount of sun 
Precipitation/Rain/Snow  UV Index 
Amount of rain/snow  Air quality 
Type of precipitation (rain/snow/hail) 
When precipitation will start  Expected weather changes 
When precipitation will end  Storm expectations 
Whether precipitation will be heavy/light 
Probability of precipitation  Historical information 

Other:________________________________________________________________________
_________________________________________________________________________

9a) What is your main source of weather information for work-related decisions? [DO NOT READ. CHECK ONE ONLY]

9b) From what other sources do you get work-related weather information? [DO NOT READ. CHECK AS MANY AS APPLY.]

9a) 9b)
Primary Secondary
source source

Television – General mention  


Television – Weather network  
Television – Local Environment Canada cable channel  
Radio  
Newspaper  
Internet Access  
WeatherRadio Canada  
WeatherCopy Canada  
Contact Environment Canada weather office  
Telephone – General mention  
Telephone – 1-800 number  
Telephone – 1-900 number  
Environment Canada recorded tape  
Family member  
Directly from employer  

9a) Other Primary : _________________________________________________________

9b) Other Secondary: _______________________________________________________

10a)We would like you to think of the four seasons. On a scale of 1 to 10, where 10 means “very important” and 1 means “not
important at all”, how important are weather forecasts to you for each of the following seasons? [START RANDOMLY,
AND THEN PROCEED IN ORDER.]

Not Very Very


important important
Spring 1 2 3 4 5 6 7 8 9 10
Summer 1 2 3 4 5 6 7 8 9 10
Fall 1 2 3 4 5 6 7 8 9 10
Winter 1 2 3 4 5 6 7 8 9 10

10b)Now we would like you to think of the changes between seasons. On a scale of 1 to 10, where 10 means “very important”
and 1 means “not important at all”, how important are weather forecasts to you for each of the following change of
seasons? [START RANDOMLY, AND THEN PROCEED IN ORDER.]

Not Very Very


important important
Change from Spring
to Summer 1 2 3 4 5 6 7 8 9 10
Change from Summer to Fall 1 2 3 4 5 6 7 8 9 10
Change from Fall to Winter 1 2 3 4 5 6 7 8 9 10
Change from Winter to Spring 1 2 3 4 5 6 7 8 9 10

11. Say you are planning a vacation six months from now to an area of Canada that you’ve never been to. Would the kind of weather you’d likely experience in six months from now in that location be very important, somewhat important, not very important or not important at all to you in planning your holiday?

Very important 
Somewhat important 
Not very important 
Not important at all 

12. If you did need this kind of weather information now for your trip in six months, from where do you think you could get
this type of information? (DO NOT READ – CHECK ALL THAT APPLY.)

Weather Office 
Library 
Atlas 
CAA 
Travel Agent 
Travel Books 
Television – General mention 
Weather Network – Specific mention 
Radio 
Newspaper 
Internet Access – The Web (WWW) 
WeatherRadio Canada 
WeatherCopy Canada 
Environment Canada recorded tape 
Contact Environment Canada
weather office 
Family member 
Other 
Don’t know 

13. Besides vacation planning, have you ever obtained this kind of long-term weather information for other purposes?

Yes 
No  GO TO NEXT SECTION
Don’t know 

14. For what use?

________________________________________________________________________________

SECTION TWO: WEATHER WARNING INFORMATION

We would like to talk to you about weather warnings, a specific type of weather forecast that Environment Canada provides to all Canadians …

1. First of all, what do you think of when you see or hear the words “Weather Warning” as part of a weather report? What
does a “Weather Warning” mean to you? [PROBE AND CLARIFY] Anything else?

2a) From what source are you most likely to receive a “Weather Warning”? [DO NOT READ LIST. CHECK ONE]

2b) From what other sources are you likely to receive “Weather Warnings”? [DO NOT READ LIST.CHECK ALL THAT APPLY]

2a) 2b)
Primary Secondary
source source

Television – General mention  


Television – Weather network  

Television – Local Environment Canada cable channel  


Radio  
Newspaper  
Internet Access  
WeatherRadio Canada  
WeatherCopy Canada  
Contact Environment Canada weather office  
Telephone – General mention  
Telephone – 1-800 number  
Telephone – 1-900 number  
Environment Canada recorded tape  
Family member  
Directly from employer  

[ROTATE SUMMER AND WINTER WARNING SECTIONS RESPONDENT TO RESPONDENT. IF CONDUCTING


SUMMER WARNINGS, START BELOW. IF CONDUCTING WINTER WARNINGS, GO TO QUESTION 9.]

We would like you to think of a summer weather situation in which you hear that a Weather Warning is in effect for an approach-
ing summer storm.

3. Of all the times that you have heard a summer storm warning for your area, how often does the summer storm actually
occur in your area? Would you say that it occurs...

Always 
Most of the time 
About half of the time 
Less than half the time 
Rarely 
Never 
[DON’T READ]
Don’t know / No answer 

4. How often would you say that you receive enough notice in order to properly react to a warning about a summer storm
heading toward your area?

Always 
Most of the time 
About half of the time 
Less than half the time 
Rarely 
Never 
[DON’T READ]
Don’t know / No answer 

5. We would like to know how clearly and how well various aspects of a summer storm warning are communicated to you. Based on what you know or have experienced, are the following communicated very well, somewhat well, not very well or not well at all? [ROTATE]

Very Somewhat Not very Not at Don’t


well well well all well Know
The area that the summer storm is going to affect     
The severity of the summer storm     
When the summer storm will be in your area     
How long the summer storm will last in your area     
The type of damage expected from the summer storm     

What actions to take to ensure the safety of


yourself, your family and your property?     

6. What other type of information do you feel you need to hear as part of the warning message in order to properly prepare
and respond to a summer storm warning? [PROBE AND CLARIFY]

________________________________________________________________________________

7a) When you hear a summer storm warning for your area, how much advance notice do you need in order to ensure your
safety? Would you need... [READ LIST]

Less than five minutes 


5 minutes to under 15 minutes 
15 minutes to under 30 minutes 
30 minutes to under 1 hour 
1 hour or more 

[DO NOT READ] Don’t Know 

7b) What is the minimum amount of time that you would accept in order to prepare for a summer storm warning for your area? Would you say it is... [READ LIST]

Less than five minutes 


5 minutes to under 15 minutes 
15 minutes to under 30 minutes 
30 minutes to under 1 hour 
1 hour or more 

[DO NOT READ] Don’t Know 

8a) Based on what you can recall and your own experience over the last two years with summer storm warnings, generally did
you have enough time to respond?

Yes  GO TO NEXT SECTION


No 

DON’T READ Don’t Know 

8b) How much more time did you require? Would you require... [READ LIST.]

Less than five minutes 


5 minutes to under 15 minutes 
15 minutes to under 30 minutes 
30 minutes to under 1 hour 
1 hour or more 

[DO NOT READ] Don’t Know 



Now, we would like you to consider a winter weather situation in which you hear that a winter storm warning is in effect for an approaching winter storm.

9. Of all the times that you have heard a winter storm warning for your area, how often does the winter storm actually occur in your area? Would you say that it occurs …

Always 
Most of the time 
About half of the time 
Less than half the time 
Rarely 
Never 

[DO NOT READ] Don’t Know 

10. How often would you say that you have received enough notice in order to properly react to a warning about a winter storm heading toward your area?

Always 
Most of the time 
About half of the time 
Less than half the time 
Rarely 
Never 

[DO NOT READ] Don’t Know 

11. We would like to know how clearly and how well various aspects of a winter storm warning are communicated to you. Based on what you know and have experienced, are the following communicated very well, somewhat well, not very well or not well at all? [ROTATE]
Very Somewhat Not very Not at Don’t
well well well all well Know

The area that the winter storm is going to affect     


The severity of the winter storm     
When the winter storm will be in your area     
How long the winter storm will last in your area     
The type of damage expected from the winter storm     
What actions to take to ensure the safety of
yourself, your family and your property     

12. What other type of information do you feel you need to hear as part of the warning message in order to properly prepare
and respond to a Winter storm Warning? [PROBE AND CLARIFY]

________________________________________________________________________________

13a)When you hear a winter storm warning for your area, how much advance notice do you need in order to ensure
your safety? Would you say you need... [READ LIST. CHECK ONE ONLY.]

Less than one hour 


One to three hours 
Over three hours to six hours 
Over six hours to 12 hours 
Over 12 hours to 24 hours 
Over 24 hours to 48 hours 
Over 48 hours 

13b)What is the minimum amount of time that you would accept in order to prepare for a winter storm warning for your area?
Would you say it is... [READ LIST. CHECK ONE ONLY.]

Less than one hour 


One to three hours 
Over three hours to six hours 
Over six hours to 12 hours 
Over 12 hours to 24 hours 
Over 24 hours to 48 hours 
Over 48 hours 

[DO NOT READ] Don’t Know 

14a)Based on what you can recall and your own experience with winter storm warnings, generally did you have enough
time to respond?

Yes  GO TO NEXT SECTION


No 

DON’T READ Don’t Know 

14b) How much more time did you require? Would you require... [READ LIST. CHECK ONE ONLY]

Less than one hour 


One to three hours 
Over three hours to six hours 
Over six hours to 12 hours 
Over 12 hours to 24 hours 
Over 24 hours to 48 hours 
Over 48 hours 

[DO NOT READ] Don’t Know 

SECTION 3A: WEATHER FORECAST INFORMATION

SUMMERTIME SCENARIO

We would like to know your opinions about the accuracy of various types of weather forecasts. Consider a summer forecast that
you hear in July for your area.

1a) So, let’s say that this forecast states that the anticipated high for the day would be 25 degrees. Suppose the actual high is not 25, but is some temperature less than 25 degrees. At what temperature below 25 would you consider the forecast inaccurate?

[WRITE IN] [DON’T READ] Don’t Know 

1b) Now suppose the actual high is not 25, but is some temperature more than 25 degrees. At what temperature above 25 would
you consider the forecast inaccurate?

[WRITE IN] [DON’T READ] Don’t Know 

2a) Say the forecast states that the anticipated overnight low would be 20 degrees. Suppose the actual low is not 20, but is some
temperature less than 20 degrees. At what temperature below 20 would you consider the forecast inaccurate?

[WRITE IN] [DON’T READ] Don’t Know 



2b) Now suppose that the actual overnight low is not 20, but is some temperature more than 20 degrees. At what temperature
above 20 would you consider the forecast inaccurate?

[WRITE IN] [DON’T READ] Don’t Know 

3a) Say the forecast mentioned that the anticipated wind speed would be 30 kilometers per hour. Suppose that the actual wind-
speed is not 30, but is at some speed less than 30. At what speed below 30 would you consider the forecast inaccurate?

[WRITE IN] [DON’T READ] Don’t Know 

3b) Now suppose that the actual wind speed is not 30, but is at some speed more than 30. At what speed above 30 would you
consider the forecast inaccurate?

[WRITE IN] [DON’T READ] Don’t Know 

4. Say the forecast mentioned that the wind would be coming from the west. Would you consider the forecast to be accurate
or not accurate if the wind actually came from… [READ LIST. START RANDOMLY AND CONTINUE IN ORDER]

Accurate Not Don’t


Accurate Know
the South   
the Southwest   
the Northwest   
the North   

5. Say the forecast said “rain beginning in the afternoon”. Would you consider the forecast to be accurate or not accurate if
the rain actually began... [READ LIST. START RANDOMLY AND CONTINUE IN ORDER]

Accurate Not Don’t


Accurate Know
In the morning   
Around noon   
Mid afternoon   
In the late afternoon   
In the evening   
If no rain occurred throughout the day or evening   

6. Say the forecast says “Sunny with afternoon cloudy periods”. Would you consider the forecast accurate or not accurate if it
was... [ROTATE. READ LIST]
Accurate Not Don’t
Accurate Know
Sunny all day   
Cloudy all day   
Cloudy in the morning and sunny in the afternoon   

7. Say that heavy rain with over 50 millimeters of rainfall over the next 24 hours is forecast. Would you consider the forecast
to be accurate or not accurate if actually... [READ LIST. ROTATE]

Accurate Not Don’t


Accurate Know
The ground was slightly wet with 5 mm of rainfall   
There are some puddles with 15 mm of rainfall   
A lot of water has accumulated with 30 mm of rainfall   
Basements have been flooded with over 55 mm of rainfall   

8. Say the forecast said the probability of precipitation was 70% for today. When you hear that the probability of precipitation
for today is 70%, what does that mean to you? [READ LIST. ROTATE. READ NUMBERS. CHECK ONE ONLY.]

1 Rain was expected to occur for 70% of the day 

2 There is a 70% chance that the rain will occur at


a particular geographic point in the forecast area today 

3 There is a 70% chance that rain will occur somewhere in the forecast area today 

4 70% of the forecast area is expected to receive some rain today 

[DON’T READ] Don’t know / No answer 

9. And continue to think about the summer... Which forecast do you use most to plan for special activities, events or weekends?
[READ LIST. CHECK ONE ONLY.]

The forecast for that particular day 


The forecast for TWO DAYS in advance 
The forecast for THREE OR MORE days in advance 

[DON’T READ] Don’t Know 

10. We would like to know how useful various parts of a summer weather forecast are to you. On a scale of 1 to 10, where 10 is “extremely useful” and 1 is “not useful at all”, how useful are each of the following parts of a weather forecast and other summer weather information... [READ LIST. ROTATE]

Not Useful Extremely


At All Useful
The overnight low temperature 1 2 3 4 5 6 7 8 9 10
The daytime high temperature 1 2 3 4 5 6 7 8 9 10
If it is going to rain 1 2 3 4 5 6 7 8 9 10
Whether the rain is going to be light
or heavy 1 2 3 4 5 6 7 8 9 10
The amount of rain expected 1 2 3 4 5 6 7 8 9 10
When the rain will start and when it
will end 1 2 3 4 5 6 7 8 9 10
The probability of precipitation 1 2 3 4 5 6 7 8 9 10
The amount of sun or cloud expected 1 2 3 4 5 6 7 8 9 10
The humidity level 1 2 3 4 5 6 7 8 9 10
The UV index 1 2 3 4 5 6 7 8 9 10
If a change in the weather is expected 1 2 3 4 5 6 7 8 9 10
The wind direction 1 2 3 4 5 6 7 8 9 10
The wind speed 1 2 3 4 5 6 7 8 9 10
A reduction of visibility due to fog 1 2 3 4 5 6 7 8 9 10

11. Now we would like to know how accurate summer weather forecasts are on each of the following weather measures. In your experience, on a scale of 1 to 10, where 10 is “extremely accurate” and 1 is “not accurate at all”, how accurate are each of the following parts of a weather forecast and other summer weather information... [READ LIST. ROTATE]

Not Accurate Extremely Don’t


At All Accurate Know
The overnight low temperature 1 2 3 4 5 6 7 8 9 10 
The daytime high temperature 1 2 3 4 5 6 7 8 9 10 
If it is going to rain 1 2 3 4 5 6 7 8 9 10 
Whether the rain is going to
be light or heavy 1 2 3 4 5 6 7 8 9 10 

The amount of rain expected 1 2 3 4 5 6 7 8 9 10 


When the rain will start and
when it will end 1 2 3 4 5 6 7 8 9 10 
The probability of precipitation 1 2 3 4 5 6 7 8 9 10 
The amount of sun or cloud
expected 1 2 3 4 5 6 7 8 9 10 
The humidity level 1 2 3 4 5 6 7 8 9 10 
The UV index 1 2 3 4 5 6 7 8 9 10 
If a change in the weather is
expected 1 2 3 4 5 6 7 8 9 10 
The wind direction 1 2 3 4 5 6 7 8 9 10 
The wind speed 1 2 3 4 5 6 7 8 9 10 
A reduction of visibility
due to fog 1 2 3 4 5 6 7 8 9 10 

SECTION 3B: WEATHER FORECAST INFORMATION

FALL/SPRING TIME SCENARIO

We would like to know your opinions about the accuracy of various types of weather forecasts. Consider a fall or spring
forecast that you hear in October or March for your area.

1a) So, let’s say that this forecast states that the anticipated high for the day would be plus one. Suppose the actual high is not
plus one, but is some temperature less than plus one. At what temperature below plus one would you consider the fore-
cast inaccurate?
[CONFIRM PLUS OR MINUS WITH RESPONDENT]

PLUS [WRITE IN] MINUS [WRITE IN] Don’t Know 

1b) Now suppose the actual high is not plus one, but is some temperature more than plus one. At what temperature above
plus one would you consider the forecast inaccurate?

PLUS [WRITE IN] [DON’T READ] Don’t Know 

2a) Say the forecast states that the anticipated overnight low would be minus five degrees. Suppose the actual low is not minus
five, but is some temperature less than minus five. At what temperature below minus five would you consider the forecast
inaccurate?

MINUS [WRITE IN] [DON’T READ] Don’t Know 

2b) Now suppose that the actual overnight low is not minus five, but is some temperature more than minus five. At what temperature above minus five would you consider the forecast inaccurate?
[CONFIRM PLUS OR MINUS WITH RESPONDENT.]

PLUS [WRITE IN] MINUS [WRITE IN] Don’t Know 

3a) Say the forecast mentioned that the anticipated wind speed would be 30 kilometers per hour. Suppose that the actual wind-
speed is not 30, but is at some speed less than 30. At what speed below 30 would you consider the forecast inaccurate?

[WRITE IN] [DON’T READ] Don’t Know 



3b) Now suppose that the actual wind speed is not 30, but is at some speed more than 30. At what speed above 30 would you
consider the forecast inaccurate?

[WRITE IN] [DON’T READ] Don’t Know 

4. Say the forecast mentioned that the wind would be coming from the west. Would you consider the forecast to be accurate
or not accurate if the wind actually came from… [READ LIST. START RANDOMLY AND CONTINUE IN ORDER]

Accurate Not Don’t


Accurate Know
the South   
the Southwest   
the Northwest   
the North   

5. Say the forecast said “wet snow developing in the afternoon”. Would you consider the forecast to be accurate or not accurate if the wet snow actually began... [READ LIST. START RANDOMLY AND CONTINUE IN ORDER]
Accurate Not Don’t
Accurate Know
In the morning   
Around noon   
Mid afternoon   
In the late afternoon   
In the evening   
If no wet snow occurred throughout the day or evening   

6. Say the forecast says “Sunny with afternoon cloudy periods”. Would you consider the forecast accurate or not accurate if it
was... [ROTATE. READ LIST]
Accurate Not Don’t
Accurate Know
Sunny all day   
Cloudy all day   
Cloudy in the morning and sunny in the afternoon   

7. Say that freezing rain is forecast. Would you consider the forecast to be accurate or not accurate if the precipitation was
actually... [READ LIST. ROTATE]

Accurate Not Don’t


Accurate Know
Just rain   
Just snow   
Mix of snow and rain   
Freezing rain   
Freezing drizzle   
No precipitation occurred at all   

8. Say the forecast said the probability of precipitation was 70% for today. When you hear that the probability of precipita-
tion for today is 70%, what does that mean to you? [READ LIST. ROTATE. READ NUMBERS. CHECK ONE ONLY.]

1 Rain was expected to occur for 70% of the day 

2 There is a 70% chance that the rain will occur at


a particular geographic point in the forecast area today 

3 There is a 70% chance that rain will occur somewhere in the forecast area today 

4 70% of the forecast area is expected to receive some rain today 

[DON’T READ] Don’t know / No answer 

9. And continue to think about the fall and/or spring... Which forecast do you use most to plan for special activities, events
or weekends? Would it be... [READ LIST. CHECK ONE ONLY.]

The forecast for that particular day 


The forecast for TWO DAYS in advance 
The forecast for THREE OR MORE days in advance 

[DON’T READ] Don’t Know 

10. We would like to know how useful various parts of a fall or spring weather forecast are to you. On a scale of 1 to 10, where 10 is “extremely useful” and 1 is “not useful at all”, how useful are each of the following parts of a weather forecast and other fall or spring weather information... [READ LIST. ROTATE]

Not Useful Extremely


At All Useful
The overnight low temperature 1 2 3 4 5 6 7 8 9 10
The daytime high temperature 1 2 3 4 5 6 7 8 9 10
When the temperature will cross the
zero degree Celsius mark 1 2 3 4 5 6 7 8 9 10
If there is going to be some
precipitation 1 2 3 4 5 6 7 8 9 10
Whether the precipitation is
going to be light or heavy 1 2 3 4 5 6 7 8 9 10
What the precipitation type will be 1 2 3 4 5 6 7 8 9 10
The amount of precipitation expected 1 2 3 4 5 6 7 8 9 10
When the precipitation will start and
when it will end 1 2 3 4 5 6 7 8 9 10
The probability of precipitation 1 2 3 4 5 6 7 8 9 10
The amount of sun or cloud expected 1 2 3 4 5 6 7 8 9 10
The humidity level 1 2 3 4 5 6 7 8 9 10
The wind-chill 1 2 3 4 5 6 7 8 9 10
If a change in the weather is expected 1 2 3 4 5 6 7 8 9 10
The wind direction 1 2 3 4 5 6 7 8 9 10
The wind speed 1 2 3 4 5 6 7 8 9 10
The amount of snow currently on
the ground 1 2 3 4 5 6 7 8 9 10
A reduction of visibility due to fog 1 2 3 4 5 6 7 8 9 10

11. Now we would like to know how accurate spring and/or fall weather forecasts are on each of the following weather measures. In your experience, on a scale of 1 to 10, where 10 is “extremely accurate” and 1 is “not accurate at all”, how accurate are each of the following parts of a weather forecast and other fall or spring weather information... [READ LIST. ROTATE]

Not Accurate Extremely Don’t


At All Accurate Know
The overnight low
temperature 1 2 3 4 5 6 7 8 9 10 
When the temperature will
cross the zero degree
Celsius mark 1 2 3 4 5 6 7 8 9 10 
If there is going to be some
precipitation 1 2 3 4 5 6 7 8 9 10 

Whether the precipitation


is going to be light or heavy 1 2 3 4 5 6 7 8 9 10 
What the precipitation type
will be 1 2 3 4 5 6 7 8 9 10 
The amount of precipitation
expected 1 2 3 4 5 6 7 8 9 10 
When the precipitation will
start and when it will end 1 2 3 4 5 6 7 8 9 10 
The probability of
precipitation 1 2 3 4 5 6 7 8 9 10 
The amount of sun or cloud
expected 1 2 3 4 5 6 7 8 9 10 
The humidity level 1 2 3 4 5 6 7 8 9 10 
The wind-chill 1 2 3 4 5 6 7 8 9 10 
If a change in the weather is
expected 1 2 3 4 5 6 7 8 9 10 
The wind direction 1 2 3 4 5 6 7 8 9 10 
The wind speed 1 2 3 4 5 6 7 8 9 10 
The amount of snow
currently on the ground 1 2 3 4 5 6 7 8 9 10 
A reduction of visibility
due to fog 1 2 3 4 5 6 7 8 9 10 

SECTION 3C: WEATHER FORECAST INFORMATION

WINTER TIME SCENARIO

We would like to know your opinions about the accuracy of various types of weather forecasts. Consider a winter forecast
that you hear in January for your area.

1a) So, let’s say that this forecast states that the anticipated high for the day would be minus 5 degrees Celsius. Suppose the
actual high is not minus 5, but is some temperature less than minus 5. At what temperature below minus 5 would you
consider the forecast inaccurate?

MINUS [WRITE IN] [DON’T READ] Don’t Know 

1b) Now suppose the actual high is not minus 5, but is some temperature more than minus 5. At what temperature above minus
5 would you consider the forecast inaccurate? [CONFIRM PLUS OR MINUS WITH RESPONDENT]

PLUS [WRITE IN] MINUS [WRITE IN] Don’t Know 

2a) Say the forecast states that the anticipated overnight low would be minus 20 degrees Celsius. Suppose the actual low is not
minus 20, but is some temperature less than minus 20. At what temperature below minus 20 would you consider the fore-
cast inaccurate?

MINUS [WRITE IN] [DON’T READ] Don’t Know 

2b) Now suppose that the actual overnight low is not minus 20, but is some temperature more than minus 20. At what temper-
ature above minus 20 would you consider the forecast inaccurate? [CONFIRM PLUS OR MINUS WITH RESPONDENT.]

PLUS [WRITE IN] MINUS [WRITE IN] Don’t Know 



3a) Say the forecast mentioned that the anticipated wind speed would be 30 kilometers per hour. Suppose that the actual wind-
speed is not 30, but is at some speed less than 30. At what speed below 30 would you consider the forecast inaccurate?

[WRITE IN] [DON’T READ] Don’t Know 

3b) Now suppose that the actual wind speed is not 30, but is at some speed more than 30. At what speed above 30 would you
consider the forecast inaccurate?

[WRITE IN] [DON’T READ] Don’t Know 

4. Say the forecast mentioned that the wind would be coming from the west. Would you consider the forecast to be accu-
rate or not accurate if the wind actually came from… [READ LIST. START RANDOMLY AND CONTINUE IN
ORDER]

Accurate Not Don’t


Accurate Know
the South   
the Southwest   
the Northwest   
the North   

5. Say the forecast said “snow beginning in the afternoon”. Would you consider the forecast to be accurate or not accurate if the snow actually began... [READ LIST. START RANDOMLY AND CONTINUE IN ORDER]

Accurate Not Don’t


Accurate Know
In the morning   
Around noon   
Mid afternoon   
In the late afternoon   
In the evening   
If no snow occurred throughout the day or evening   

6. Say the forecast says “Sunny with afternoon cloudy periods”. Would you consider the forecast accurate or not accurate if it
was... [ROTATE. READ LIST]

Accurate Not Don’t


Accurate Know
Sunny all day   
Cloudy all day   
Cloudy in the morning and sunny in the afternoon   

7. Say that heavy snow is forecast. Would you consider the forecast to be accurate or not accurate if the precipitation was
actually... [READ LIST. ROTATE]

Accurate Not Don’t


Accurate Know
The ground was slightly covered   
There is some snow on the ground   
There is snow on the streets that needs to be cleaned   
Snow has piled up significantly   
People are stranded because of the extreme amount
of snow   
No precipitation occurred at all   

8. Say the forecast said the probability of precipitation was 70% for today. When you hear that the probability of precipi-
tation for today is 70%, what does that mean to you? [READ LIST. ROTATE. READ NUMBERS. CHECK ONE
ONLY.]

1 Snow was expected to occur for 70% of the day 

2 There is a 70% chance that snow will occur at


a particular geographic point in the forecast area today 

3 There is a 70% chance that snow will occur somewhere


in the forecast area today 

4 70% of the forecast area is expected to receive some snow today 

[DON’T READ] Don’t know / No answer 

9. And continue to think about the winter... Which forecast do you use most to plan for special activities, events or weekends?
Would it be... [READ LIST. CHECK ONE ONLY.]

The forecast for that particular day 


The forecast for TWO DAYS in advance 
The forecast for THREE OR MORE days in advance 

[DON’T READ] Don’t Know 

10. We would like to know how useful various parts of a winter weather forecast are to you. On a scale of 1 to 10, where 10 is “extremely useful” and 1 is “not useful at all”, how useful are each of the following parts of a weather forecast and other winter weather information... [READ LIST. ROTATE]

Not Useful Extremely


At All Useful
The overnight low temperature 1 2 3 4 5 6 7 8 9 10
The daytime high temperature 1 2 3 4 5 6 7 8 9 10
If it is going to snow 1 2 3 4 5 6 7 8 9 10
Whether the snow is going to be light
or heavy 1 2 3 4 5 6 7 8 9 10
The amount of snow expected 1 2 3 4 5 6 7 8 9 10
When the snow will start and when it
will end 1 2 3 4 5 6 7 8 9 10
The probability of precipitation 1 2 3 4 5 6 7 8 9 10
The amount of sun or cloud expected 1 2 3 4 5 6 7 8 9 10
The humidity level 1 2 3 4 5 6 7 8 9 10
The wind-chill 1 2 3 4 5 6 7 8 9 10
If a change in the weather is expected 1 2 3 4 5 6 7 8 9 10
The wind direction 1 2 3 4 5 6 7 8 9 10
The wind speed 1 2 3 4 5 6 7 8 9 10
The amount of snow currently
on the ground 1 2 3 4 5 6 7 8 9 10
A reduction of visibility due to
blowing snow 1 2 3 4 5 6 7 8 9 10

11. Now we would like to know how accurate winter weather forecasts are on each of the following weather measures. In your experience, on a scale of 1 to 10, where 10 is “extremely accurate” and 1 is “not accurate at all”, how accurate are each of the following parts of a weather forecast and other winter weather information... [READ LIST. ROTATE]

Not Accurate Extremely Don’t


At All Accurate Know
The overnight low
temperature 1 2 3 4 5 6 7 8 9 10 
The daytime high temperature 1 2 3 4 5 6 7 8 9 10 
If it is going to snow 1 2 3 4 5 6 7 8 9 10 
Whether the snow is going
to be light or heavy 1 2 3 4 5 6 7 8 9 10 
The amount of snow expected 1 2 3 4 5 6 7 8 9 10 
When the snow will start and
when it will end 1 2 3 4 5 6 7 8 9 10 
The probability of
precipitation 1 2 3 4 5 6 7 8 9 10 
The amount of sun or cloud
expected 1 2 3 4 5 6 7 8 9 10 
The humidity level 1 2 3 4 5 6 7 8 9 10 
The wind-chill 1 2 3 4 5 6 7 8 9 10 
If a change in the weather is
expected 1 2 3 4 5 6 7 8 9 10 
The wind direction 1 2 3 4 5 6 7 8 9 10 
The wind speed 1 2 3 4 5 6 7 8 9 10 
The amount of snow currently
on the ground 1 2 3 4 5 6 7 8 9 10 
A reduction of visibility due to
blowing snow 1 2 3 4 5 6 7 8 9 10 

SECTION FOUR: AIR QUALITY INFORMATION

We would like you to now think about the environment in your area.

1a) Do you consider your local area to have an air pollution problem?
Yes 
No  GO TO QUESTION 2

1b) What air pollution or air quality problems do you feel your area has?

2a) Two different types of air-quality information messages could be provided to you. First, anticipated or expected levels of
pollution for the day could be provided, or information on the actual pollution levels as they are presently occurring could
be provided. Would you prefer to have information on the anticipated pollution levels, on the current levels as they’re
happening, or on both?

Anticipated or expected levels 


Actual levels 
Both 

3a) Are you aware of any air quality or air pollution information sources available for your area that reflect the current
conditions?

Yes 
No  GO TO QUESTION 6

4. How often do you make a point of checking for information on the current levels of air pollution in your area?

Several times a day 


Once a day 
Several times a week 
Once a week 
Less often than once a week 
Never 

5. On a scale of 1 to 10, 1 being “Not at all satisfied” and 10 being “Extremely satisfied”, how satisfied are you with all the information you see or hear now about the levels of air pollution in your area? [CIRCLE ONE]

Not at all Extremely


satisfied satisfied
1 2 3 4 5 6 7 8 9 10

6. If you heard a message indicating high levels of air pollution, how likely are you to do each of the following?

Very Somewhat Not Very Not Likely


Likely Likely Likely At All
Reduce time spent outdoors    
Reduce car use    
Carpool    
Avoid using gas-powered equipment
(lawnmowers, BBQs, etc.)    

SECTION FIVE: ENVIRONMENT CANADA DELIVERY SERVICES

We would like to talk to you about various weather services that are available to you either by phone or electronically.

Free Recorded Local Weather Message

In most major urban centres, Environment Canada provides a free 24 hour recorded local weather forecast accessible only over
the telephone. Callers in the local dialing area do not pay any charges. However, those calling from outside the local area must
pay long distance charges to hear about weather that affects their area.

1. Are you aware of this Environment Canada 24 hour recorded local weather forecast service message only accessible over
the telephone?
(Words in italics were added to the questionnaire during the field work on March 5, 1997, after a review suggested that the preliminary data were suspect.)

Yes 
No  GO TO QUESTION 8

2. Have you ever used it?

Yes 
No  GO TO QUESTION 8

3. How often do you use it? [READ LIST. CHECK ONE ONLY]

More than once a day 


Once a day 
Two or more times per week 
Once a week 
Two or more times a month 
Once a month 
Less often than once a month 

4. How often do you try to call this weather line and receive a busy signal? [READ LIST]

Always 
Most of the time 
About half of the time 
Less than half of the time 
Rarely or never 

5. On a scale of 1 to 10, where 10 is “extremely satisfied” and 1 is “not satisfied at all”, how satisfied are you with the type of
information provided through this service?

Not at all Extremely


satisfied satisfied
1 2 3 4 5 6 7 8 9 10

6. On a scale of 1 to 10, where 10 is “extremely satisfied” and 1 is “not satisfied at all”, how satisfied are you with the accessi-
bility of weather information provided by this service?

Not at all Extremely


satisfied satisfied
1 2 3 4 5 6 7 8 9 10

7. On a scale of 1 to 10, where 10 is “extremely satisfied” and 1 is “not satisfied at all”, how satisfied are you with the format
and the presentation of the weather information provided by this service?

Not at all Extremely


satisfied satisfied
1 2 3 4 5 6 7 8 9 10

8. For budgetary reasons, Environment Canada cannot provide such a service free of long distance charges uniformly across
Canada to smaller centres. Do you think that Environment Canada should… [READ AND ROTATE]

Require everyone to pay, even if someone calls from within their local area 

or keep it as it currently is … that is, callers from the local calling area are
not charged, but callers from outside the area are charged long distance  GO TO QUESTION 10

[DO NOT READ] No charge/free/1-800 number  GO TO QUESTION 10

[DO NOT READ] Don’t Know  GO TO QUESTION 10



9a) Would you prefer to pay a fixed fee per call or a charge per minute?

Fixed fee 
Charge per minute  GO TO QUESTION 9C

[DO NOT READ] Both 


[DO NOT READ] Neither  GO TO QUESTION 10

9b) How much would you be willing to pay per call? Would it be... [READ LIST]

Under $1.00 
$1.00 – $1.99 
$2.00 – $2.99 
$3.00 – $3.99 
$4.00 – $4.99 
$5.00 or more 

[DON’T READ] Nothing  GO TO QUESTION 10


[DON’T READ] Don’t Know  GO TO QUESTION 10

IF CHARGE PER MINUTE ABOVE...

9c) How much per minute would you be willing to pay for this service? Would it be... [READ LIST] (IF ASKED, THE
AVERAGE LENGTH IS 3 MINUTES)

50 cents per minute 


$1 per minute 
$2 per minute 
$3 per minute 

[DON’T READ] Nothing 


[DON’T READ] Don’t Know 

10. So that Environment Canada does not charge all users for this service, commercial advertising needs to be played on this line. Do you think this is... [READ LIST]

An excellent idea 
A good idea 
A fair idea 
A poor idea 

[DON’T READ] Don’t know 

Environment Canada’s New 1-900 User-Pay Telephone Weather Services

Environment Canada has recently launched a new national service, a 1-900 user-pay telephone weather service called “Weather
Menu” which provides up-to-date weather and environmental bulletins.

(** If asked... The phone number is 1-900-565-5000 in English / 1-900-565-4000 in French, called “Météo à la carte”)

11. Are you aware of this 1-900 User Pay Telephone service?

Yes 
No  GO TO QUESTION 14

12. Have you ever used it?

Yes 
No  GO TO QUESTION 14

13. How often do you use it? [READ LIST. CHECK ONLY ONE]

More than once a day 


Once a day 
Two or more times per week 
Once a week 
Less than once a week 

14. The cost for this type of service is 95 cents per minute. Do you think this is… (READ LIST. CHECK ONE)

Just right 
Too low 
Too high 

WeatherRadio

WeatherRadio is an Environment Canada service that broadcasts weather information 24 hours a day in many areas across
Canada. A special radio must be purchased to receive these weather broadcasts.

(** If asked, one can purchase a special receiver at major electronics retailers like RADIO SHACK)

15. Were you aware of Environment Canada’s WeatherRadio service?

Yes 
No  GO TO QUESTION 21

16. Have you ever used it?

Yes 
No  GO TO QUESTION 21

17. How often do you use it? [READ LIST. CHECK ONE]

More than once a day 


Once a day 
Two or more times per week 
Once a week 
Less than once a week 

18. On a scale of 1 to 10, where 10 is “extremely satisfied” and 1 is “not satisfied at all”, how satisfied are you with the type of information provided on the WeatherRadio broadcasts?

(CODE ONLY ONE)

1 2 3 4 5 6 7 8 9 10

19. On a scale of 1 to 10, where 10 is “extremely satisfied” and 1 is “not satisfied at all”, how satisfied are you with the format and presentation of information on the WeatherRadio broadcasts?

(CODE ONLY ONE)

1 2 3 4 5 6 7 8 9 10

20. On a scale of 1 to 10, where 10 is “extremely timely” and 1 is “not timely at all”, how timely do you consider the 20-minute cycle for the WeatherRadio broadcasts?

(CODE ONLY ONE)

1 2 3 4 5 6 7 8 9 10

INTERNET “WEB” PAGES

Environment Canada has a World Wide Web Internet site providing weather and environmental information.

[If they ask for the Uniform Resource Locator, i.e. the URL, it is: http://www.ec.gc.ca/ ]

21. Were you aware of Environment Canada’s information centre on the Internet?

Yes 
No  GO TO DEMOGRAPHICS

22. Do you use it to obtain weather information and/or forecasts?

Yes 
No  GO TO DEMOGRAPHICS

23. How often do you use it for weather information or forecasts? [READ LIST. CHECK ALL THAT APPLY.]

More than once a day 


Once a day 
Two or more times per week 
Once a week 
Less than once a week 

24. On a scale of 1 to 10, where 10 is “extremely satisfied” and 1 is “not satisfied at all”, how satisfied are you with the type of
weather information provided on Environment Canada’s Internet Pages?

(CODE ONLY ONE)

1 2 3 4 5 6 7 8 9 10

25. On a scale of 1 to 10, where 10 is “extremely satisfied” and 1 is “not satisfied at all”, how satisfied are you with the format
and presentation of weather information in Environment Canada’s Internet Pages?

(CODE ONLY ONE)

1 2 3 4 5 6 7 8 9 10

G. DEMOGRAPHICS

THE FOLLOWING QUESTIONS ARE FOR CLASSIFICATION PURPOSES ONLY. YOUR ANSWERS ARE STRICTLY
CONFIDENTIAL, AND WILL ONLY BE USED IN COMBINATION WITH OTHER RESPONSES.

1a) In which of the following age categories do you belong?

18 – 24 
25 – 34 
35 – 49 
50 – 64 
65 and over 

1b) Are you...

Married, or living common-law 60-1


Single 2
Divorced 3
Widowed 4
Separated 5

1c) How many people, including yourself, live in your household?

1 59-1 SKIP TO QUESTION 3


2 2
3 3
4 4
5 5
6 or more 6

2a) Do you have any children living in your household under the age of 18?

Yes 61-1
No 2 GO TO QUESTION 3

2b) What ages are the children under the age of 18 that live in your household?
[CHECK ALL THAT APPLY]

0 – 2 yrs old 62-1


3 – 5 yrs old 2
6 – 10 yrs old 3
11 – 15 yrs old 4
16 – 17 yrs old 5

3. What is the highest level of education that you have attained?

Some elementary school 63-1


Completed elementary school 2
Some secondary school 3
Completed secondary school 4
Some post-secondary (community college, university) 5
Completed a post-secondary program (community college, university) 6

4a) Please indicate which of the following best describes your current status.

Working full-time outside the home 66-1


Working part-time outside the home 2
Working full or part time in your home 3
Unemployed/looking for work 4 GO TO QUESTION 5a)
Retired 5 GO TO QUESTION 5a)
Student 6 GO TO QUESTION 5a)

4b). What is your occupation? ___________________________________________________________________

5a) How many cars, trucks and vans are owned or leased by you or all members of your household?

None 68-1
1 2
2 3
3 4
4 5
5 or more 6

5b) And finally, in which category does your total annual household income fall before income taxes?

Under $25,000 per year 71-1


$25,000 to $49,999 per year 2
$50,000 to $74,999 per year 3
$75,000 to $99,999 per year 4
$100,000 or more per year 5

Refused 6

THANK

Finally, may I have your first name in case my supervisor needs to verify that I conducted this interview with you?

NAME:

PHONE:
Appendix 3
HONG KONG OBSERVATORY SURVEY

MAIN QUESTIONNAIRE

Q1 Do you usually read, watch or listen to weather reports?


1. Yes Go to Q2
2. No End of questionnaire

Q2 From where do you usually obtain weather information of Hong Kong? Do you obtain from radio, television, newspaper,
weather hotline, internet, pagers / mobile phones, or other sources? Any other? (up to 3 sources)

(For “weather hotline”, probe: Is it Hong Kong Observatory’s Dial-a-Weather hotlines 1878-200, 1878-202 and 1878-066, or Hong Kong Observatory’s Information Enquiry System 2926-1133 or Hong Kong Telecom’s 18-501 and 18-503, 18-508?)

(For “internet”, probe: Is it Hong Kong Observatory’s Homepage or other homepages?)

1. Radio
2. Television
3. Newspaper
4. Hong Kong Observatory’s Dial-a-Weather hotlines (1878-200 / 202 / 066)
5. Information Enquiry System (2926-1133)
6. Hong Kong Telecom’s 18 501 / 3 / 8
7. Observatory’s Home Page
8. Other homepages
9. Pagers / Mobile Phones
10. Other sources (please specify)

Q3a Do you consider the weather forecasts of the Hong Kong Observatory over the past several months accurate or inaccu-
rate? (Probe the degree)
1. Very accurate
2. Somewhat accurate
3. Average
4. Somewhat inaccurate
5. Very inaccurate
6. Don’t know / no comment

Q3b What percentage of weather forecasts of the Hong Kong Observatory over the past several months do you consider accurate?
1. ___________ per cent
2. Don’t know / No comment

Q4 Do you consider the following aspects of weather forecasts of the Hong Kong Observatory over the past several months
accurate or inaccurate?

Inaccurate Accurate Don’t know/


No comment

Temperature

Fine / Cloudy

Rain storm forecasts / warning

Typhoon prediction / warning

Q5 How do you compare weather forecasts nowadays with those from 3 to 4 years ago? Are they more accurate, less accurate or about the same?
1. More accurate
2. About the same
3. Less accurate
4. Don’t know / no comment

Q6 How satisfied are you with the services provided by the Hong Kong Observatory? If you were to rate them on a scale of 0 to 10, with “5” being the passing mark and “10” being “excellent service”, how many marks would you give?

End of Questionnaire
