
SYSTEM SAFETY AND RISK MANAGEMENT A Guide for Engineering Educators

Authors
Pat L. Clemens, P.E., CSP
Corporate Safety Manager
Sverdrup Technology, Incorporated
Tullahoma, Tennessee

and

Rodney J. Simmons, Ph.D., CSP
Technical Director, Board of Certified Safety Professionals
Savoy, Illinois
and Adjunct Assistant Professor of Industrial Engineering
Department of Mechanical, Industrial and Nuclear Engineering
University of Cincinnati
Cincinnati, Ohio

U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES
Public Health Service
Centers for Disease Control and Prevention
National Institute for Occupational Safety and Health
Cincinnati, Ohio
March 1998

ACKNOWLEDGMENTS
We wish to thank Professor Robert McClay, Indiana University of Pennsylvania, and Professors Robert Thomas, Tony Smith, and Susan Hankins, Auburn University, for their review of this document. We also thank the following National Institute for Occupational Safety and Health (NIOSH) personnel for their review of this document: Dr. Rebecca Giorcelli. The report was edited by Susan Feldmann.

This report was prepared in support of NIOSH Project SHAPE (Safety and Health Awareness for Preventive Engineering). Project SHAPE is designed to enhance the education of engineers in the safety and health field.

DISCLAIMER
The opinions, findings, and conclusions expressed are not necessarily those of the National Institute for Occupational Safety and Health, nor does mention of company names or products constitute endorsement by the National Institute for Occupational Safety and Health.

NIOSH Project Officer John Talty, P.E., DEE

NIOSH Order No. 96-37768

ABSTRACT
System safety provides many disciplined approaches to hazard identification and risk analysis. Risk has two components, severity and probability; both must be determined to assess risk. The analytical techniques presented in this module can be used to assess risk to employees, facilities, equipment, production, quality, and the environment. This module presents fifteen techniques from system safety practice.

This module is intended for instructors already acquainted with system safety. Instructors may incorporate individual lessons into existing engineering courses or use the entire module to form the basis of a one-quarter or one-semester class in risk assessment and system safety (perhaps entitled "hazard identification and risk assessment") open to students from any engineering major. It is also suitable for instructors in occupational and environmental safety and health.

An important feature of this module is a collection of more than 400 presentation slides, including classroom examples and workshop problems drawn from professional practice. These lecture slides and practice problems (supported by this module) are the result of insights gained while teaching system safety, hazard identification, and risk assessment for more than fifteen years to university students and practicing professionals. The instructor may obtain these slides at the Sverdrup Technology, Inc. website (http://www.sverdrup.com/svt).

The authors have presented the material in an order that has been successful in the classroom. For each of the analytical techniques, the approach is first to walk the instructor through an explanation of the logic underlying the method, then to provide a demonstration example suitable for presentation in the classroom by the instructor.

The material in this module has been delivered to more than 2,000 senior and graduate-level students at major universities in the United States and abroad. It has also been delivered as an intensive three- to four-day short course for engineers and managers employed by business, industry, and government in the United States and other countries. Comments from these students and other instructors have guided the refinement of the lecture materials. The authors welcome suggestions for improvement of the material.


CONTENTS

Abstract .......................................................... iii

LESSON I-SYSTEM SAFETY AND RISK MANAGEMENT: AN INTRODUCTION
    Purpose, Objective, Special Terms ............................. 1-1
    What is System Safety? ........................................ 1-3
    Development of System Safety Practice and Techniques:
        A Historical Overview ..................................... 1-3
    Importance of Integrating the System Safety Program Throughout
        the Product/System/Facility Life Cycle .................... 1-6
    Comparison of System Safety and the Traditional Approach
        to Safety ................................................. 1-7
    Comparison of System Safety and Reliability Engineering ....... 1-7
    Organization of the Module .................................... 1-7
    System Safety and the Design Function ......................... 1-12
    Final Words of Caution ........................................ 1-12
    References and Suggested Readings ............................. 1-13
    Sample Discussion and Examination Questions ................... 1-14

LESSON II-RISK ASSESSMENT MATRIX
    Purpose, Objective, Special Terms ............................. II-1
    Description ................................................... II-3
    Application ................................................... II-4
    Procedures .................................................... II-4
    Example ....................................................... II-8
    Advantages .................................................... II-10
    Limitations ................................................... II-10
    References and Suggested Readings ............................. II-11
    Sample Discussion and Examination Questions ................... II-12

LESSON III-PRELIMINARY HAZARD ANALYSIS
    Purpose, Objective, Special Terms ............................. III-1
    Description ................................................... III-3
    Application ................................................... III-3
    Procedures .................................................... III-3
    Example ....................................................... III-7
    Advantages .................................................... III-8
    Limitations ................................................... III-8
    References and Suggested Readings ............................. III-9
    Sample Discussion and Examination Questions ................... III-10

LESSON IV-ENERGY FLOW/BARRIER ANALYSIS
    Purpose, Objective, Special Terms ............................. IV-1
    Description ................................................... IV-3
    Application ................................................... IV-3
    Procedures .................................................... IV-3
    Example ....................................................... IV-4
    Advantages .................................................... IV-5
    Limitations ................................................... IV-6
    References and Suggested Readings ............................. IV-7
    Sample Discussion and Examination Questions ................... IV-8

LESSON V-FAILURE MODES AND EFFECTS ANALYSIS /
    FAILURE MODES, EFFECTS, AND CRITICALITY ANALYSIS
    Purpose, Objective, Special Terms ............................. V-1
    Description ................................................... V-3
    Application ................................................... V-3
    Procedures .................................................... V-3
    Example ....................................................... V-8
    Advantages .................................................... V-10
    Limitations ................................................... V-11
    References and Suggested Readings ............................. V-12
    Sample Discussion and Examination Questions ................... V-13

LESSON VI-RELIABILITY BLOCK DIAGRAM
    Purpose, Objective, Special Terms ............................. VI-1
    Description ................................................... VI-3
    Application ................................................... VI-4
    Procedures .................................................... VI-5
    Example ....................................................... VI-5
    Advantages .................................................... VI-6
    Limitations ................................................... VI-7
    References and Suggested Readings ............................. VI-8
    Sample Discussion and Examination Questions ................... VI-9

LESSON VII-FAULT TREE ANALYSIS
    Purpose, Objective, Special Terms ............................. VII-1
    Description ................................................... VII-3
    Application ................................................... VII-3
    Procedures .................................................... VII-4
        Fault Tree Generation ..................................... VII-4
        Probability Determination ................................. VII-6
        Identifying Cut Sets ...................................... VII-10
        Determining Cut Sets ...................................... VII-10
        Assessing Cut Sets ........................................ VII-10
        Identifying Path Sets ..................................... VII-11
    Examples ...................................................... VII-12
        Fault Tree Construction and Probability Propagation ....... VII-12
        Cut Sets .................................................. VII-12
        Path Sets ................................................. VII-12
    Advantages .................................................... VII-14
    Limitations ................................................... VII-14
    References and Suggested Readings ............................. VII-16
    Sample Discussion and Examination Questions ................... VII-17

LESSON VIII-SUCCESS TREE ANALYSIS
    Purpose, Objective, Special Terms ............................. VIII-1
    Description ................................................... VIII-3
    Application ................................................... VIII-3
    Procedures .................................................... VIII-3
    Example ....................................................... VIII-4
    Advantages .................................................... VIII-5
    Limitations ................................................... VIII-5
    References and Suggested Readings ............................. VIII-7
    Sample Discussion and Examination Questions ................... VIII-8

LESSON IX-EVENT TREE ANALYSIS
    Purpose, Objective, Special Terms ............................. IX-1
    Description ................................................... IX-3
    Application ................................................... IX-4
    Procedures .................................................... IX-4
    Example ....................................................... IX-5
    Advantages .................................................... IX-7
    Limitations ................................................... IX-7
    References and Suggested Readings ............................. IX-8
    Sample Discussion and Examination Questions ................... IX-9

LESSON X-FAULT TREE, RELIABILITY BLOCK DIAGRAM, AND EVENT TREE TRANSFORMATIONS
    Purpose, Objective, Special Terms ............................. X-1
    Description ................................................... X-3
    Application ................................................... X-3
    Procedures .................................................... X-3
        Fault Tree to Reliability Block Diagram Transformation .... X-3
        Reliability Block Diagram and Fault Tree to Event Tree
            Transformation ........................................ X-3
        Reliability Block Diagram to Fault Tree Transformation .... X-5
        Event Tree to Reliability Block Diagram and Fault Tree
            Transformation ........................................ X-5
    Example ....................................................... X-6
    Advantages .................................................... X-8
    Limitations ................................................... X-8
    References and Suggested Readings ............................. X-9
    Sample Discussion and Examination Questions ................... X-10

LESSON XI-CAUSE-CONSEQUENCE ANALYSIS
    Purpose, Objective, Special Terms ............................. XI-1
    Description ................................................... XI-3
    Application ................................................... XI-4
    Procedures .................................................... XI-4
    Example ....................................................... XI-6
    Advantages .................................................... XI-7
    Limitations ................................................... XI-7
    References and Suggested Readings ............................. XI-9
    Sample Discussion and Examination Questions ................... XI-10

LESSON XII-DIRECTED GRAPH (DIGRAPH) MATRIX ANALYSIS
    Purpose, Objective, Special Terms ............................. XII-1
    Description ................................................... XII-3
    Application ................................................... XII-3
    Procedures .................................................... XII-3
    Example ....................................................... XII-5
    Advantages .................................................... XII-8
    Limitations ................................................... XII-8
    References and Suggested Readings ............................. XII-9
    Sample Discussion and Examination Questions ................... XII-10

LESSON XIII-COMBINATORIAL FAILURE PROBABILITY ANALYSIS USING SUBJECTIVE INFORMATION
    Purpose, Objective, Special Terms ............................. XIII-1
    Description ................................................... XIII-3
    Application ................................................... XIII-3
    Procedures .................................................... XIII-3
    Example ....................................................... XIII-4
    Advantages .................................................... XIII-6
    Limitations ................................................... XIII-6
    References and Suggested Readings ............................. XIII-7
    Sample Discussion and Examination Questions ................... XIII-8

LESSON XIV-FAILURE MODE INFORMATION PROPAGATION MODELING
    Purpose, Objective, Special Terms ............................. XIV-1
    Description ................................................... XIV-3
    Application ................................................... XIV-3
    Procedures .................................................... XIV-3
    Example ....................................................... XIV-5
    Advantages .................................................... XIV-7
    Limitations ................................................... XIV-7
    References .................................................... XIV-8
    Sample Discussion and Examination Questions ................... XIV-9

LESSON XV-PROBABILISTIC DESIGN ANALYSIS
    Purpose, Objective, Special Terms ............................. XV-1
    Description ................................................... XV-3
    Application ................................................... XV-3
    Procedures .................................................... XV-3
    Advantages .................................................... XV-7
    Limitations ................................................... XV-7
    References and Suggested Readings ............................. XV-8
    Sample Discussion and Examination Questions ................... XV-9

LESSON XVI-PROBABILISTIC RISK ASSESSMENT
    Purpose, Objective, Special Terms ............................. XVI-1
    Description ................................................... XVI-3
    Application ................................................... XVI-4
    Procedures .................................................... XVI-4
    Advantages .................................................... XVI-5
    Limitations ................................................... XVI-5
    References and Suggested Readings ............................. XVI-6
    Sample Discussion and Examination Questions ................... XVI-7

APPENDICES
    A  Integrating System Safety in the Product Life Cycle
    B  Sample Worksheet for Preliminary Hazard Analysis
    C  Sample Worksheet for Failure Modes and Effects Analysis
    D  Sample Hazards Checklist
    E  Glossary of Terms
    F  Software Tools for System Safety Analysis


LIST OF TABLES

Table   Title                                                       Page
1-1     Description of system life cycle phases ................... 1-6
1-2     Characteristics of common system safety analytical
            techniques ............................................ 1-7
1-3     Advantages and limitations of system safety tools and
            methodologies ......................................... 1-8
1-4     Symbolic logic techniques ................................. 1-11
IV-1    Examples of strategies to manage harmful energy flow ...... IV-5
VI-1    Simple reliability block diagram construction ............. VI-3
VI-2    Reliability bands for example system ...................... VI-6
VII-1   Fault tree analysis procedures ............................ VII-4
VII-2   Fault tree construction symbols ........................... VII-5
VII-3   Probability propagation expressions for logic gates ....... VII-9
XI-1    Cause-consequence tree construction symbols ............... XI-5
XII-1   Comparison of digraph and fault tree logic gates .......... XII-4
XII-2   Construction of digraph adjacency matrix .................. XII-5
XIII-1  Combinatorial failure probability analysis subjective
            scale ................................................. XIII-4
XIV-1   Symbology used in failure mode information propagation .... XIV-4
XVI-1   Examples of societal risks ................................ XVI-3

LIST OF FIGURES

Figure  Title                                                       Page
II-1    Risk plane ................................................ II-3
II-2    Iso-risk contour usage .................................... II-4
II-3    Risk plane to risk matrix transformation .................. II-5
II-4    Helpful hints in creating a risk assessment matrix ........ II-6
            a. Useful conventions
            b. Don't create too many cells
            c. Avoid discontinuities
            d. Don't create too many zones
II-5    Typical risk assessment matrix ............................ II-8
II-6    Severity and probability interpretations .................. II-9
III-1   Preliminary hazard analysis process flowchart ............. III-6
III-2   Typical preliminary hazard analysis ....................... III-7
V-1     Example of a system breakdown and numerical coding ........ V-4
V-2     Failure modes, effects, (and criticality) analysis
            process flowchart ..................................... V-7
V-3     Typical failure modes, effects, and criticality analysis
            worksheet ............................................. V-8
V-4     Example of a failure modes, effects, and criticality
            analysis .............................................. V-9
            a. System
            b. System breakdown and coding
            c. Worksheet
VI-1    Typical complex reliability block diagram ................. VI-4
VI-2    Example reliability block diagram ......................... VI-6
VII-1   Fault tree construction process ........................... VII-6
VII-2   Log average method of probability estimation .............. VII-7
VII-3   Relationship between reliability and failure probability
            propagation ........................................... VII-8
VII-4   Failure probability propagation through OR and AND gates .. VII-8
VII-5   Exact solution of OR gate failure probability propagation . VII-9
VII-6   Example fault tree ........................................ VII-12
VII-7   Example of determining cut sets ........................... VII-13
VII-8   Example of determining path sets .......................... VII-14
VIII-1  Success tree construction process ......................... VIII-4
VIII-2  Example success tree ...................................... VIII-5
IX-1    Event tree (generic case) ................................. IX-3
IX-2    Event tree (Bernoulli model) .............................. IX-4
IX-3    Example event tree analysis ............................... IX-6
X-1     Fault tree to reliability block diagram transformation .... X-3
X-2     Deriving cut and path sets from a reliability block
            diagram ............................................... X-4
X-3     Reliability block diagram to event tree transformation .... X-4
X-4     Reliability block diagram to fault tree transformation .... X-5
X-5     Event tree to fault tree transformation ................... X-6
X-6     Equivalent logic reliability block diagram and fault tree
            from event tree example ............................... X-7
            a. Reliability block diagram based on event tree example
            b. Fault tree based on event tree example
XI-1    Relationship between cause and consequence ................ XI-3
XI-2    Cause-consequence analysis format ......................... XI-6
XI-3    Example cause-consequence analysis ........................ XI-7
XII-1   Example digraph matrix analysis ........................... XII-6
            a. Success domain model
            b. Failure domain model
            c. Adjacency matrix
            d. Adjacency elements
            e. Reachability matrix
            f. Reachability elements
            g. Summary matrix
XIII-1  Example combinatorial failure probability analysis ........ XIII-5
            a. System schematic
            b. System fault tree
XIV-1   Example failure mode information propagation model ........ XIV-6
            a. System schematic
            b. Model
            c. Minimal success sets
XV-1    Load and capability transfer function ..................... XV-5
XV-2    Interference between load and capability density
            functions ............................................. XV-6
XVI-1   Societal risks ............................................ XVI-4

LESSON I SYSTEM SAFETY AND RISK ASSESSMENT: AN INTRODUCTION

PURPOSE:

To introduce the student to the concepts and applications of system safety and risk assessment.

OBJECTIVE:

To acquaint the student with the following:
1. Definition of system safety
2. Definition of risk analysis
3. Concept of life cycle
4. Use of risk assessment in system/facility/product design

SPECIAL TERMS:

1. System safety
2. Risk
3. Life cycle
4. Severity
5. Probability
6. Likelihood
7. Target
8. Resource
9. Hazard

WHAT IS SYSTEM SAFETY?

System safety has two primary characteristics: (1) it is a doctrine of management practice that mandates that hazards be found and risks controlled; and (2) it is a collection of analytical approaches with which to practice the doctrine. Systems are analyzed to identify their hazards, and those hazards are assessed as to their risks, for a single reason: to support management decision-making. Management must decide whether system risk is acceptable. If that risk is not acceptable, then management must decide what is to be done, by whom, by when, and at what cost.

Management decision-making must balance the interests of all stakeholders: employees at all levels of the company, customers, suppliers, the public, and the stockholders. It must also support the multiple goals of the enterprise and protect all of its resources: human, equipment, facility, product quality, inventory, production capability, financial, market position, and reputation.

The practice of system safety has both art and science aspects. For example, no closed-form solutions are available even for its most fundamental process, that of hazard discovery. Mechanical engineering, in contrast, is a science-based discipline whose fundamental principles rest solely on the physical laws of nature and on applying those laws to the solution of practical problems.
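The decision "is system risk acceptable?" rests on the two components of risk, severity and probability, which Lesson II combines in a risk assessment matrix. As a minimal preview, the lookup below assumes illustrative category names and a simple additive zone ranking; these are not the categories or zones defined in this module's Lesson II tables.

```python
# Sketch of a risk assessment matrix lookup. The category names and
# zone boundaries are illustrative assumptions only.
SEVERITY = ["negligible", "marginal", "critical", "catastrophic"]
PROBABILITY = ["improbable", "remote", "occasional", "probable", "frequent"]

def risk_zone(severity: str, probability: str) -> str:
    """Map a (severity, probability) pair to a qualitative risk zone."""
    s = SEVERITY.index(severity)        # 0 (least severe) to 3
    p = PROBABILITY.index(probability)  # 0 (least likely) to 4
    score = s + p                       # additive ranking (an assumed convention)
    if score >= 6:
        return "high: unacceptable"
    if score >= 4:
        return "serious: requires management decision"
    if score >= 2:
        return "medium: acceptable with controls"
    return "low: acceptable"

print(risk_zone("critical", "occasional"))  # serious: requires management decision
```

A real matrix would assign each cell its zone individually (Lesson II warns against creating too many cells or zones); the additive score here is only a compact stand-in for that cell-by-cell assignment.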

DEVELOPMENT OF SYSTEM SAFETY PRACTICE AND TECHNIQUES: A HISTORICAL OVERVIEW

System safety originated in the aircraft and aerospace industries. Systems engineering was developed shortly after World War II. It found application in U.S. nuclear weapons programs because of the complexity of these programs and the perceived costs (risks) of non-attainment of nuclear superiority. Systems engineering seeks to understand the integrated whole rather than merely the component parts of a system, with an aim toward optimizing the system to meet multiple objectives. During the early 1950s, the RAND Corporation developed systems analysis methodology as an aid to economic and strategic decision making.

These two disciplines were used in the aerospace and nuclear weapons programs for several reasons: (1) schedule delays for these programs were costly (and perceived as a matter of national security); (2) the systems were complex and involved many contractors and subcontractors; (3) the disciplines enabled the selection of a final design from among various competing designs; and (4) there was intense scrutiny on the part of the public and the funding agencies. Over the years, the distinction between systems engineering and systems analysis has blurred. Together, they form the philosophical foundation for system safety: safety can, and should, be managed in the same manner as any other design or operational parameter.

System safety was first practiced by the U.S. Air Force (USAF). Historically, most aircraft crashes were blamed on pilot error. Similarly, in industry, accidents were most commonly blamed on an unsafe act. To attribute an aircraft crash to pilot error or an industrial accident to an unsafe act places very little intellectual burden on the investigator to delve into the design of the system with which the operator (pilot or worker) was forced to coexist. When the USAF began developing intercontinental ballistic missiles (ICBMs) in the 1950s, there were no pilots to blame when the missiles blew up during testing.

Because of the pressure to field these weapon systems as quickly as possible, the USAF adopted a concurrent engineering approach. This meant that the training of operations and maintenance personnel occurred simultaneously with the development of the missiles and their launch facilities. Remember that these weapon systems were far more complex than any previously attempted and that many newly developed technologies were incorporated into their designs. Safety was not handled in a systematic manner. Instead, during these early days, safety responsibility was assigned to each subsystem designer, engineer, and manager. Thus safety was compartmentalized, and when these subsystems were finally


integrated, interface problems were detected, too late. The USAF describes one incident in a design manual:

    An ICBM silo was destroyed because the counterweights, used to balance the silo elevator on the way up and down in the silo, were designed with consideration only to raising a fueled missile to the surface for firing. There was no consideration that, when you were not firing in anger, you had to bring the fueled missile back down to defuel. The first operation with a fueled missile was nearly successful. The drive mechanism held it for all but the last five feet when gravity took over and the missile dropped back. Very suddenly, the 40-foot diameter silo was altered to about 100-foot diameter. [1]

The investigations of these losses uncovered deficiencies in management, design, and operations. The USAF realized that the traditional (reiterative) "fly-crash-fix-fly" approach could not produce acceptable results (because of cost and geopolitical ramifications). This realization led the USAF to adopt a system safety approach with the goal of preventing accidents before their first occurrence.

The Minuteman ICBM (fielded in 1962) was the first weapon system to have a system safety program as a formal, contractual obligation. System safety received increasing emphasis in weapon development programs during the 1960s because of limited opportunities for testing and the unacceptable consequences of potential accidents. The USAF released its first system safety specification in 1966 (MIL-S-38130A). In June 1969, this specification became MIL-STD-882 (System Safety Program for Systems and Associated Subsystems and Equipment: Requirements for), issued by the Department of Defense (DoD). The DoD incorporated a system safety program as part of its requirements for all procured systems and products.
MIL-STD-882 stated:

    The contractor shall establish and maintain an effective system safety program that is planned and integrated into all phases of system development, production, and operation. The system safety program shall provide a disciplined approach to methodically control safety aspects and evaluate the system's design; identify hazards and prescribe corrective action in a timely, cost-effective manner. The system safety program objectives shall be specified in a formal plan which must describe an integrated effort within the total program. ... The system safety program objectives are to ensure that:

    a. Safety, consistent with mission requirements, is designed into the system.
    b. Hazards associated with each system, subsystem, and equipment are identified and evaluated and eliminated or controlled to an acceptable level.
    c. Control over hazards that cannot be eliminated is established to protect personnel, equipment, and property.
    d. Minimum risk is involved in the acceptance and use of new materials and new production and testing techniques.
    e. Retrofit actions required to improve safety are minimized through the timely inclusion of safety factors during the acquisition of a system.
    f. The historical safety data generated by similar system programs are considered and used where appropriate.

This standard was updated in 1977 as MIL-STD-882A. With the recognition that software was an integral part of modern systems, software requirements were included in MIL-STD-882B, issued in 1984. During this time period, the USAF issued its own system safety


standard, MIL-STD-1574A (System Safety Standard for Space and Military Systems). These two standards were harmonized into a single document in 1993, MIL-STD-882C. One of the features of 882C is that software tasks are no longer separated from other safety tasks.

The pioneering work embodied in MIL-STD-882 has been incorporated into system safety-oriented standards used in the chemical processing industry (OSHA's 29 CFR 1910.119 and EPA's 40 CFR 68), the medical device industry (the Food and Drug Administration's requirements for Pre-Market Notification), and others. The semiconductor manufacturing industry uses many system safety analytical techniques during the design of production processes, equipment, and facilities, principally because the cost of "mistakes" is enormous in terms of production capability, product quality, and, ultimately, market share.

System safety practice is required by a number of standards:

•  21 CFR 807.87 (g): requires hazard analyses as a part of "pre-market notification" for medical devices (a requirement of the U.S. Food and Drug Administration)
•  29 CFR 1910.119 (e)(2): requires applying "one or more ... methodologies to determine and evaluate ... hazards ..." (a requirement of the Occupational Safety and Health Administration)
•  29 CFR 1910.146 (b)(4): requires identifying hazards in "permit-required confined spaces [containing] any ... recognized serious safety or health hazard" (a requirement of the Occupational Safety and Health Administration)
•  40 CFR 68: requires applying "one or more ... methodologies to determine and evaluate ... hazards ..." (a requirement of the Environmental Protection Agency)
•  NASA NHB 1700.1, Vol. 3: "System Safety"
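The standards above all require, in one form or another, that hazards be identified and evaluated. As a rough illustration of the kind of record such an analysis produces, the sketch below uses hypothetical field names loosely inspired by the preliminary hazard analysis worksheet in Appendix B; it is not a reproduction of that worksheet or of any format prescribed by the standards.

```python
# Illustrative hazard analysis record; field names are assumptions,
# not taken from Appendix B or the standards cited above.
from dataclasses import dataclass, field
from typing import List

@dataclass
class HazardRecord:
    hazard: str                      # condition with the potential for harm
    targets: List[str]               # resources at risk (people, equipment, ...)
    severity: str                    # qualitative severity category
    probability: str                 # qualitative likelihood category
    countermeasures: List[str] = field(default_factory=list)

record = HazardRecord(
    hazard="stored pressure in air receiver tank",
    targets=["maintenance personnel", "adjacent equipment"],
    severity="critical",
    probability="remote",
)
record.countermeasures.append("lockout and bleed-down procedure before service")
print(len(record.countermeasures))  # 1
```

Whatever the worksheet format, the essential content is the same: a hazard, the targets it threatens, its assessed severity and probability, and the countermeasures proposed to eliminate or control it.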

U.S. companies wishing to export industrial products (packaging machinery, for example) to Europe must perform a hazard analysis as part of obtaining a "CE" mark, which is required for industrial products entering Europe. System safety provides the techniques to conduct the hazard analysis.

Beyond mere regulatory compliance, companies are realizing that waiting for accidents to occur and then identifying and eliminating their causes is simply too expensive, whether measured in terms of the costs of modification, retrofit, liability, lost market share, or tarnished reputation. After several high-profile incidents in the chemical processing industry, the American Institute of Chemical Engineers formed the Center for Chemical Process Safety (CCPS). The CCPS has published an extensive collection of handbooks and guides covering various aspects of chemical process safety and has also promoted the inclusion of health, safety, and environmental topics in the chemical engineering curriculum.

In the U.S. automobile industry, recent collective bargaining agreements have included safety and health language that is in keeping with the system safety philosophy. At General Motors, an active Design In Safety program requires cooperation between engineering, management, and labor to achieve safety objectives.

In 1994, the National Safety Council (NSC) inaugurated an Institute for Safety Through Design (ISTD). Members of the NSC's Industrial Division (including GM, IBM, Eastman Kodak, and Boeing) provided funding for the ISTD because they realized that training recently hired engineering graduates to consider safety and health as part of the design process was very expensive. Accordingly, the ISTD has as one of its goals the inclusion of health, safety, and environmental issues in the engineering curricula.
The National Institute for Occupational Safety and Health has similar goals for engineering education with its project SHAPE (Safety and Health Awareness for Preventive Engineering). Note that efforts to include safety and health in the engineering curricula are rewarded. For example, a major aircraft manufacturing company reported that its engineering recruiters seek graduates who have taken system and occupational safety engineering courses as part of their course work.

IMPORTANCE OF INTEGRATING THE SYSTEM SAFETY PROGRAM THROUGHOUT THE PRODUCT/SYSTEM/FACILITY LIFE CYCLE

The principal advantage of a system safety program, compared with a conventional or traditional industrial safety program, is that early in the design stage the forward-looking system safety program considers the hazards that will be encountered during the entire life cycle. The industrial safety program usually considers only the hazards that arise during the operational phases of the product or manufacturing system. Usually, the industrial safety practitioner is dealing with a manufacturing facility or process that already exists (together with its associated hazards), and emphasis is placed on training the employees to co-exist with the hazards inherent in the system, rather than removing the hazards from the system. Often, organizational inertia must be overcome if major changes are to be made in the design of the manufacturing system. Management sometimes holds the view that, "Well, we've been doing it like this for twenty years and never had any problem. Why should we change things?" The system safety techniques allow the analysis of hazards at any time during the life cycle of a system, but the real advantage is that the techniques can be used to detect hazards in the early part of the life cycle, when problems are relatively inexpensive to correct. Table 1-1 presents one scheme for describing the major phases of a system life cycle. System safety stresses the importance of designing safety into the system, rather than adding it on to a completed design. Most of the design decisions that have an impact on the hazards posed by a system must be made relatively early in the life cycle. System safety's early-on approach leads to more effective, less costly control or elimination of hazards.

TABLE 1-1. Description of system life cycle phases.

Project Phase A - The conceptual trade studies phase of a project. Quantitative and/or qualitative comparisons of candidate concepts against key evaluation criteria are performed to determine the best alternative.

Project Phase B - The concept definition phase of a project. The system mission and design requirements are established, and design feasibility studies and design trade studies are performed during this phase.

Project Phase C - The design and development phase of a project. System development is initiated and specifications are established during this phase.

Project Phase D - The fabrication, integration, test, and evaluation phase of a project. The system is manufactured and requirements verified during this phase.

Project Phase E - The operations phase of a project. The system is deployed and system performance is validated during this phase.

Project Phase F - The decommissioning/disposal/recycle phase of a project. The system has come to the end of its useful life and is ready to be taken out of service.

COMPARISON OF SYSTEM SAFETY AND THE TRADITIONAL APPROACH TO SAFETY

System safety looks at a broader range of losses than is typically considered by the traditional industrial safety practitioner. It allows the analyst (and management) to gauge the impact of various hazards on potential "targets" or "resources," including workers, the public, product quality, productivity, environment, facilities, and equipment. System safety relies on analysis, and not solely on past experience and standards. When designing a new product, no information may be available concerning previous mishaps; a review of history will have little value to the designer. Because standards writing is a slow process relative to the development of new technology, a search for, and review of, relevant standards may not uncover all of the potential hazards posed by the new technology.

COMPARISON OF SYSTEM SAFETY AND RELIABILITY ENGINEERING

System safety is broader than reliability. Reliability asks the question, "Does the component or system continue to meet its specification, and for how long?" System safety asks the broader question, "Was the specification correct, and what happens if the component meets (or doesn't meet) the specification?" Reliability focuses on the failure of a component; system safety recognizes that not all hazards are attributable to failures and that not all failures cause hazards. System safety also analyzes the interactions among the components in a system and between the system and its environment, including human operators. The basis for system safety analysis is two-fold: recognizing system limits and assessing risk.

ORGANIZATION OF THE MODULE

The next lesson begins with a definition of risk and the options for managing risk to an acceptable level. The later lessons present system safety analysis tools that can be used to identify hazards and their associated risk. The techniques can be classified into two groups: those that rely on a hazard inventory approach, and those that employ symbolic logic to produce a conceptual model of system behavior. Some authors think of the inventory techniques as inductive and the modeling techniques as deductive. Many techniques described in the literature are simply derivatives of others. The techniques tend to be complementary. Table 1-2 shows some of the characteristics of the major system safety analytical techniques.

Table 1-2. Characteristics of common system safety analytical techniques

Technique                                            Inductive   Deductive
Preliminary Hazard Analysis                              X
Failure Modes and Effects Analysis                       X
Fault Tree Analysis                                                  X
Event Tree Analysis                                      X
Cause-Consequence Analysis                               X           X
Sneak Circuit Analysis                                               X
Probabilistic Risk Assessment                            X           X
Digraph Analysis                                                     X
Hazard and Operability (HAZOP) Study                     X
Management Oversight and Risk Tree (MORT) Analysis                   X

This module describes fifteen system safety and risk assessment tools available to the system engineer/analyst. The Appendices include a glossary of terms, sample worksheets, and a hazards checklist. Lecture slides supporting many of the lessons and additional workshop problems are available to the instructor at http://www.sverdrup.com/svt. Many analytical techniques support the identification of hazards and an assessment of their associated risk, with the aim of controlling that risk to acceptable levels [3]. The principal techniques are illustrated in this instructional module. Table 1-3 summarizes the major advantages and limitations of each tool or methodology discussed in this module.

Table 1-3. Advantages and Limitations of System Safety Tools and Methodologies

Risk Assessment Matrix (Lesson II)
  Advantages: Provides a standard tool to assess risk subjectively.
  Limitations: Only used to assess the risk of hazards; does not identify hazards. Does not address co-existing hazards.

Preliminary Hazard Analysis (Lesson III)
  Advantages: Identifies and provides an inventory of hazards and countermeasures.
  Limitations: Does not address co-existing system failure modes.

Energy Flow/Barrier Analysis (Lesson IV)
  Advantages: Identifies hazards associated with energy sources and determines if barriers are adequate countermeasures.
  Limitations: Fails to identify certain classes of hazards, e.g., asphyxia in oxygen-deficient confined spaces.

Failure Modes and Effects (and Criticality) Analysis (Lesson V)
  Advantages: Thorough method of identifying single point failures and their consequences. A criticality analysis provides a risk assessment of these failure modes.
  Limitations: Can be extremely labor intensive. Does not address co-existing failure modes.

Reliability Block Diagram (Lesson VI)
  Advantages: A symbolic logic model that is relatively easy for the analyst to construct. System reliability can be derived, given component reliability.
  Limitations: Component reliability estimates may not be readily available; total calculated reliability may be unrealistically high.

Fault Tree Analysis (Lesson VII)
  Advantages: Enables assessment of probabilities of co-existing faults or failures. May identify unnecessary design elements.
  Limitations: Addresses only one undesirable event or condition that must be foreseen by the analyst. Comprehensive trees may be very large and cumbersome.

Success Tree Analysis (Lesson VIII)
  Advantages: Assesses probability of a favorable outcome of system operation.
  Limitations: Addresses only one desirable event or condition that must be foreseen by the analyst. Comprehensive trees may be very large and cumbersome.

Event Tree Analysis (Lesson IX)
  Advantages: Enables assessment of probabilities of co-existing faults or failures. Functions simultaneously in failure and success domains. End events need not be anticipated. Accident sequences through a system can be identified.
  Limitations: Addresses only one initiating challenge that must be foreseen by the analyst. Discrete levels of success and failure are not distinguishable.

Fault Tree, Reliability Block Diagram, and Event Tree Transformation (Lesson X)
  Advantages: Allows the analyst to overcome a weakness of one technique by transforming a model of a system into an equivalent logic model in another analysis technique.
  Limitations: Offers no additional information and is only as good as the input model.

Cause-Consequence Analysis (Lesson XI)
  Advantages: Enables assessment of probabilities of co-existing faults or failures. End events need not be anticipated. Discrete levels of success and failure are distinguishable.
  Limitations: Addresses only one initiating challenge that must be foreseen by the analyst. May be very subjective as to consequence severity.

Directed Graph (Digraph) Matrix Analysis (Lesson XII)
  Advantages: Allows the analyst to examine fault propagation through several primary and support systems. Minimal cut sets, single point failures, and double point failures can be determined with less computer computation than fault tree analysis.
  Limitations: Trained analysts, computer codes, and resources to perform this technique may be limited. Only identifies single point (singleton) and dual point (doubleton) failures.

Combinatorial Failure Probability Analysis Using Subjective Information (Lesson XIII)
  Advantages: Allows the analyst to perform qualitative probabilistic risk assessment based upon subjective engineering judgment when no quantitative data are available.
  Limitations: Use of actual quantitative data is preferred to this method; it should only be used when actual quantitative failure data are unavailable.

Failure Mode Information Propagation Modeling (Lesson XIV)
  Advantages: Measurement requirements can be determined that, if implemented, can help safeguard a system in operation by providing a warning at the onset of a threatening failure mode.
  Limitations: Only applicable if the system is operating in a near-normal range during the instant of time just before initiation of a failure. Data and results, unless used in a comparative fashion, may be poorly understood.

Probabilistic Design Analysis (Lesson XV)
  Advantages: Gives the analyst a practical method of quantitatively and statistically estimating the reliability of a system during the design phase. Provides an alternative to the traditional method of imposing safety factors and margins to ensure system reliability, which can be flawed when significant experience and historical data for similar components are not available.
  Limitations: The analyst must have significant experience in probability and statistical methods. Historical population data must be very close to the as-planned design population to be viable; extrapolation between populations can render the technique non-viable.

Probabilistic Risk Assessment (Lesson XVI)
  Advantages: Provides a methodology to assess overall system risks; avoids accepting unknown, intolerable, and senseless risk.
  Limitations: Performing the techniques of this methodology requires skilled analysts. Techniques can be misapplied and results misinterpreted.
The risk assessment matrix (Lesson II) supports a standard, subjective methodology to evaluate hazards as to their risks. Lecture slides entitled "Concepts of Risk Management" and "Working with the Risk Assessment Matrix" are available (http://www.sverdrup.com/svt). The risk assessment matrix is used in conjunction with hazard analyses, such as the preliminary hazard analysis (PHA) technique discussed in Lesson III. The PHA can be used to identify hazards and to guide development of countermeasures to mitigate the risk posed by these hazards. Lecture slides covering preliminary hazard analysis are available (http://www.sverdrup.com/svt). The energy flow/barrier analysis discussed in Lesson IV is also a technique used to identify hazards and evaluate their corresponding countermeasures. An accompanying set of lecture slides for energy flow/barrier analysis is available (http://www.sverdrup.com/svt). Once hazards are identified, they can be further explored if the failure modes of the elements of the system are known. The failure modes and effects analysis (FMEA), discussed in Lesson V, can be used to identify failure modes and their consequences or effects. Also discussed in Lesson V is failure modes, effects, and criticality analysis (FMECA). The FMECA is similar to the FMEA, but also addresses the criticality, or risk, associated with each failure mode. Lecture slides for FMEA and FMECA are available (http://www.sverdrup.com/svt).

Several symbolic logic methods are presented in this section. These methods construct conceptual models of failure or success mechanisms within a system. These tools are also used to determine either the probability of system or component failure, or the probability that a system or component will operate successfully. The probability of a successful operation is the reliability. If the failure probability (PF) is examined, the model is generated in the failure domain; if the probability of success (PS) is examined, the model is generated in the success domain. For convenience, the analyst can model in either the failure or the success domain (or both), then convert the final probabilities to the desired domain using the expression PF + PS = 1. These models are developed using forward (bottom-up) or backwards (top-down) logic. When using forward logic, the analyst builds the model by repeatedly asking, "What happens when a given failure occurs?" The analyst views the system from a "bottom-up" perspective, starting with the lowest level elements in the system and their functions. Classically, the FMEA, for example, is a bottom-up technique. When using backwards logic to build a model, the analyst repeatedly asks, "What will cause a given failure to occur?" The analyst views the system from a "top-down" perspective, starting with a high level system failure and proceeding down into the system to trace failure paths. Table 1-4 presents the symbolic logic techniques discussed in this section and their characteristics.

Table 1-4. Symbolic Logic Techniques

Technique (Lesson)                       Success   Failure   Forward      Backwards
                                         Domain    Domain    (Bottom-Up)  (Top-Down)
Reliability Block Diagram (VI)              X                    X
Fault Tree Analysis* (VII)                            X                       X
Success Tree Analysis (VIII)                X                                 X
Event Tree Analysis* (IX)                   X         X          X
Cause-Consequence Analysis* (XI)            X         X          X            X
Directed Graph Matrix Analysis (XII)                  X                       X

* Lecture slides are available (http://www.sverdrup.com/svt)
Each symbolic logic technique has its advantages and disadvantages. Sometimes it is beneficial to construct a model using one technique and then transform that model into the domain of another technique to exploit the advantages of both. Fault trees are generated in the failure domain; reliability diagrams are generated in the success domain; and event trees are generated in both the success and failure domains. Methods are presented in Lesson X to transform any of the models into the other two by translating equivalent logic from the success to failure or failure to success domains. Cause-consequence analysis, presented as Lesson XI, allows the analyst to model partial failure/success, along with the effects of timing on the response of a system to a challenge. Lecture slides covering fault tree analysis, event tree analysis, and the transformations between these analyses are available, along with lecture slides for cause-consequence analysis (http://www.sverdrup.com/svt). Probabilities are propagated through the logic models to determine the probability of system failure or success, i.e., the reliability. Probability data may be derived from available empirical data or found in handbooks. If quantitative data are not available, then subjective probability estimates may be used, as described in Lesson XIII. Lecture slides are available (http://www.sverdrup.com/svt) for fault tree analysis in the absence of quantitative data (combinatorial analysis). Caution must be exercised when quoting reliability numbers. The use of confidence bands is important. Often the value is in a comparison of numbers that allows effective resource allocation, rather than exact determination of expected reliability levels. Failure mode information propagation modeling is discussed in Lesson XIV. This technique allows the analyst to determine what information is needed, and how and where the information should be measured in a system to detect the onset of a failure mode that could damage the system. Lecture slides for failure mode information propagation modeling are available at http://www.sverdrup.com/svt. Probabilistic design analysis (PDA) is discussed in Lesson XV. This technique uses advanced statistical methods to determine probabilities of failure modes. Finally, probabilistic risk assessment (PRA) is discussed in Lesson XVI. This is a general methodology that shows how most of the techniques mentioned earlier can be used in conjunction to assess risk in terms of severity and probability.
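The propagation of probabilities through a logic model can be illustrated with the two basic fault tree gates. The sketch below is not from the module: the event probabilities and the "pump/power" top event are invented, and the usual assumption of independent basic events applies.

```python
# Hedged sketch: propagating basic-event probabilities through fault-tree gates,
# assuming independent events. Event probabilities are illustrative only.
from math import prod

def and_gate(probs):
    """AND gate: the output event occurs only if ALL input events occur."""
    return prod(probs)

def or_gate(probs):
    """OR gate: the output occurs if ANY input occurs (1 minus none occurring)."""
    return 1.0 - prod(1.0 - p for p in probs)

# Hypothetical top event: (pump A fails AND pump B fails) OR (power fails).
p_top = or_gate([and_gate([1e-3, 1e-3]), 1e-5])
print(f"Top-event probability: {p_top:.3e}")
```

The same gate arithmetic underlies quantitative fault tree, event tree, and reliability block diagram evaluation; only the direction in which the model is built differs.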


SYSTEM SAFETY AND THE DESIGN FUNCTION

When new products or processes are developed, the designer seldom begins with a blank canvas. Rather, a mixture of retained knowledge and new technology is fashioned into the new design. The retained knowledge (lessons learned) and new technology drive the safety program planning, hazard identification and analyses, as well as the safety criteria, requirements, and constraints. The designer's "upstream" knowledge of the safety issues allows for the cost-effective integration of safety, health, and environmental considerations at all points of the product life cycle. Knowledge will be gained as the product/process life cycle moves forward. This knowledge, or "lessons learned," can be applied at earlier stages of the product life cycle, leading to changes in design, materials, manufacturing methods, inspection, etc. This approach to continuous process improvement is shown graphically in Appendix A. Appendix A provides a schematic description of the system safety approach as it is successfully used in various settings, including the design of semiconductor manufacturing facilities, chemical and food processing plants, air and ground transportation systems, and consumer products. Many modern systems are software-controlled, and this has resulted in increasing recognition of the importance of integrating software safety efforts within the system safety program [3]. System safety aspects of software are not treated in this module.

FINAL WORDS OF CAUTION

The search continues for the ideal system safety analytical method. The notion that one analytical approach exists that is overwhelmingly superior to all others will not die as long as charlatans and shallow thinkers perpetuate the myth. Each analytical technique presented in this module has its advantages and its shortcomings. Each has more or less virtue in some applications than in others. Recourse to a dispassionate, annotated compendium of techniques can help guide the selection of technique(s) appropriate for an application [2, 4]. Just as the search among existing analytical methods for the ideal one does not end, neither does the quest to invent the universal technique. The safety literature is replete with articles describing one-size-fits-all analytical techniques. Usually, the techniques have clever names that spell out memorable acronyms, and the papers that describe them have been given no benefit of sound technical review by peer practitioners. Even as physics struggles to develop a unified field theory, system safety practice seeks to produce an umbrella-style approach to which all system safety problems will succumb. Operations research experts point out that the variability of systems and permutations of failure opportunities within systems make analyses of those failure opportunities intractable by a single analytical approach. Although the Swiss army knife is a marvelous combination of tools, there is no model that has both a bumper jack and a lobotomy kit among its inventory of tools. The design engineer/analyst is well served by a "toolbox" of system safety analytical techniques, each of which is cherished for the insights it provides. Development of that analytical "toolbox" as part of an engineering education is a primary purpose of this document.

REFERENCES

1. Air Force Space Division [1987]. System safety handbook for the acquisition manager. SDP 127-1, p. 1-1.
2. Center for Chemical Process Safety [1992]. Guidelines for hazard evaluation procedures. 2nd ed., with worked examples. New York, NY: American Institute of Chemical Engineers.
3. Stephans, RA, Talso, WW, eds. [1997]. System safety analysis handbook. 2nd ed. Albuquerque, NM: New Mexico Chapter, System Safety Society.
4. Leveson, NG [1995]. Safeware: system safety and computers. New York, NY: Addison-Wesley Publishing Company.
5. CFR. Code of Federal regulations. Washington, DC: U.S. Government Printing Office, Office of the Federal Register.

SUGGESTED READINGS

Bernold, T, ed. [1990]. Industrial risk management: a life-cycle approach. New York, NY: Elsevier.
Hammer, W [1972]. Handbook of system and product safety. Englewood Cliffs, NJ: Prentice-Hall.
Raheja, DG [1991]. Assurance technologies - principles and practices. New York, NY: McGraw-Hill.
Roland, HE, Moriarty, B [1990]. System safety engineering and management. 2nd ed. New York, NY: Wiley Interscience.

SAMPLE DISCUSSION AND EXAMINATION QUESTIONS

1. Contrast the perspective of the reliability engineer with that of the system safety engineer.
2. At what point during the product/facility/system life cycle should a system safety program be implemented? When can it be implemented?
3. How is risk evaluated?
4. What is meant by the term "target" in system safety practice?

LESSON II
RISK ASSESSMENT MATRIX

PURPOSE: To introduce the student to the foundation and use of the risk assessment matrix.

OBJECTIVE: To acquaint the student with the following:
1. Definition of risk
2. Definition of severity
3. Definition of probability
4. The concept of the risk plane
5. The iso-risk contour
6. Construction, calibration, and use of the risk assessment matrix
7. Importance of exposure interval
8. Concept of multiple targets or exposed resources

SPECIAL TERMS:
1. Risk
2. Severity
3. Probability
4. Worst credible case
5. Iso-risk contour
6. Risk tolerance boundaries
7. Risk acceptance zones
8. Mishap
9. Hazard
10. Target
11. Resource
12. Exposure

DESCRIPTION

The risk assessment matrix is a tool for conducting subjective risk assessments for use in hazard analysis [1]. The definition of risk and the principle of the iso-risk contour are the basis for this technique. Please see http://www.sverdrup.com/svt for two sets of lecture slides (Concepts in Risk Management and Working with the Risk Assessment Matrix) that support this lesson. The risk posed by a given hazard to an exposed resource can be expressed in terms of an expectation of loss, the combined severity and probability of loss, or the long-term rate of loss. Risk is the product of severity and probability (loss events per unit time or activity). Note: the probability component of risk must be attached to an exposure time interval. The severity and probability dimensions of risk define a risk plane. As shown in Figure II-1, iso-risk contours depict constant risk within the plane. The concept of the iso-risk contour is useful in providing guides, conventions, and acceptance limits for risk assessments (see Figure II-2). Risk should be evaluated for worst credible, not worst conceivable, conditions. Assuming less than the worst credible case yields an optimistic analysis; substituting the worst conceivable case yields an overly conservative one. Either error results in a non-viable analysis.
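The definition of risk as the product of severity and probability, and the idea of the iso-risk contour, can be shown with a tiny numeric sketch. The severities, probabilities, and loss units below are assumptions chosen purely for illustration.

```python
# Sketch of risk = severity x probability; values and units are illustrative.

def risk(severity, probability_per_year):
    """Expected loss rate: loss per event times events per year."""
    return severity * probability_per_year

# Two hazards lying on the same iso-risk contour: equal products, equal risk,
# even though one is rare-but-severe and the other frequent-but-mild.
r_rare_severe = risk(severity=1_000_000, probability_per_year=1e-4)
r_frequent_mild = risk(severity=1_000, probability_per_year=1e-1)
print(r_rare_severe, r_frequent_mild)
```

Both hazards carry the same long-term rate of loss, which is exactly what a single iso-risk contour expresses.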

Figure II-1. Risk plane. (SEVERITY and PROBABILITY, the two variables that constitute risk, define a RISK PLANE. Risk is constant along any iso-risk contour, and the probability axis is referenced to a stated exposure interval.) © 1997 Figure provided courtesy of Sverdrup Technology, Inc., Tullahoma, Tennessee [1]

Figure II-2. Iso-risk contour usage. (RISK ASSESSMENT CONVENTION: If possible, assess risk for the worst-credible severity of outcome; it will fall at the top end of its own iso-risk contour. RISK ASSESSMENT GUIDE: If risk for a given hazard can be assessed at any severity level, an iso-risk contour gives its probability at all severity levels. Many, but not all, hazards behave this way; be wary of exceptions, usually high-energy cases.) © 1997 Figure provided courtesy of Sverdrup Technology, Inc., Tullahoma, Tennessee [1]

APPLICATION

The risk assessment matrix is typically developed in the design and development phase, but may also be developed in the conceptual trade studies phase. This technique is used as a predetermined guide or criterion to evaluate identified hazards. These risks are expressed in terms of severity and probability. Use of this tool allows an organization to institute and standardize the approach used to perform hazard analyses, such as the preliminary hazard analysis (PHA) defined in Lesson III.

PROCEDURES

Procedures for developing a risk assessment matrix are presented below [1]:

(1) Categorize and scale the subjective probability levels for all targets or resources, such as frequent, probable, occasional, remote, improbable, and impossible (adopted from MIL-STD-882C [2]). Note: A target or resource is defined as the "what" that is at risk. One typical breakout of targets or resources is personnel, equipment, downtime, product loss, and environmental effects.

(2) Categorize and scale the subjective severity levels for each target or resource, such as catastrophic, critical, marginal, and negligible.

(3) Create a matrix of consequence severity versus the probability of the mishap (the event capable of producing loss). Approximate the continuous, iso-risk contour functions in the risk plane with matrix cells (see Figure II-3). These matrix cells fix the limits of risk tolerance zones. Note that management, not the analyst, establishes and approves the risk tolerance boundaries. Management will consider social, legal, and financial impacts when setting risk tolerance boundaries.

Figure II-3. Risk plane to risk matrix transformation. ("Zoning" the risk plane into judgmentally tractable cells produces a matrix. Matrix cells approximate the continuous, iso-risk contour functions in the risk plane; steps in the matrix define risk tolerance boundaries.) © 1997 Figure provided courtesy of Sverdrup Technology, Inc., Tullahoma, Tennessee [1]

(4) The following hints are helpful for creating the matrix:

•	Increase adjacent probability steps by orders of magnitude. The lowest step, "impossible," is an exception (see Figure II-4a).
•	Avoid creating too many matrix cells. Since the assessment is subjective, too many steps add confusion with no additional resolution (see Figure II-4b).
•	Avoid discontinuities in establishing the risk zones, i.e., make sure that no one-step path passes through more than one zone boundary (see Figure II-4c).
•	Establish only as many risk zones as there are desired categories of resolution for risk issues, i.e., (1) unacceptable, (2) accepted by waiver, and (3) routinely accepted (see Figure II-4d).
•	Link the risk matrix to a stated exposure period. When evaluating exposures, a consistent exposure interval must be selected; otherwise risk acceptance will be variable. An event for which the probability of occurrence is judged remote during an exposure period of 3 months may be judged frequent if the exposure period is extended to 30 years. For occupational applications, the exposure period is typically 25 years. All stakeholders (management or the client) who participate in establishing the risk acceptance matrix must be informed of any changes to the exposure interval for which the matrix was calibrated.

(5) Calibrate the risk matrix by selecting a cell and attaching a practical hazard scenario to it. The scenario should be familiar to potential analysts or characterize a tolerable perceivable threat. Assign its risk to the highest level severity cell just inside the acceptable risk zone. This calibration point should be used as a benchmark to aid in evaluating other, less familiar risks.
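The exposure-interval caution above can be made concrete with a short calculation. Assuming independent, identical exposure intervals (an illustrative simplification), the probability of at least one mishap over n intervals is 1 - (1 - p)^n; the numeric values below are invented for demonstration.

```python
# Why the matrix must be tied to a stated exposure interval (illustrative values).

def prob_over_exposure(p_per_interval, n_intervals):
    """P(at least one occurrence) over n independent, identical intervals."""
    return 1.0 - (1.0 - p_per_interval) ** n_intervals

p_3_months = 1e-3  # judged "remote" over a single 3-month exposure
p_30_years = prob_over_exposure(p_3_months, n_intervals=120)  # 120 quarters
print(f"3 months: {p_3_months:.4f}   30 years: {p_30_years:.4f}")
```

The same hazard that looks remote over 3 months accumulates to roughly an 11% chance over 30 years, which is why the matrix calibration must state its exposure period.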


Figure II-4a. Useful conventions. (Factors of ten separate adjacent probability steps: A = 10 x B, B = 10 x C, C = 10 x D, D = 10 x E, but the lowest step, F, equals 0, "impossible." Severity Level III is OSHA-Recordable.)

Figure II-4b. Don't create too many cells. (Subjective judgment can't readily resolve more than six discrete probability steps; added steps become confused and meaningless. Keep it simple: a 4 x 6 = 24-cell matrix is better than a 7 x 12 = 84-cell matrix.)

Figure II-4c. Avoid discontinuities. (Can a countermeasure make the "leap" from zone (1) to zone (3) in a single step? Make every one-step path from a high risk zone (1) to a lower risk zone (3) pass through the intermediate zone (2).)

Figure II-4d. Don't create too many zones. (A 24-cell matrix can be resolved into 9 levels of "priority," or even more, but what are the rational functions for the many levels? Three zones will usually suffice: a hazard's risk is either (3) routinely accepted, (2) accepted by waiver, or (1) avoided.)

© 1997 Figures provided courtesy of Sverdrup Technology, Inc., Tullahoma, Tennessee [1]
Figure II-4. Helpful hints in creating a risk assessment matrix

EXAMPLE

Figure II-5 shows a typical risk assessment matrix, adapted from MIL-STD-882C [2]. Figure II-6 shows sample interpretations of the severity and probability steps for this matrix.
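A risk assessment matrix of this kind is easy to encode as a lookup table. In the sketch below, the zone assigned to each cell is a hypothetical assumption made for illustration; in practice, management sets these boundaries, and the actual assignments in Figure II-5 may differ.

```python
# Illustrative MIL-STD-882C-style risk matrix lookup. The zone assignments
# below are hypothetical; management defines the real tolerance boundaries.

SEVERITIES = ["I", "II", "III", "IV"]           # Catastrophic ... Negligible
PROBABILITIES = ["A", "B", "C", "D", "E", "F"]  # Frequent ... Impossible

# Zone 1 = suppress risk, Zone 2 = waiver required, Zone 3 = operation permissible.
ZONES = {
    "I":   {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 3},
    "II":  {"A": 1, "B": 1, "C": 2, "D": 2, "E": 3, "F": 3},
    "III": {"A": 2, "B": 2, "C": 2, "D": 3, "E": 3, "F": 3},
    "IV":  {"A": 3, "B": 3, "C": 3, "D": 3, "E": 3, "F": 3},
}

def assess(severity, probability):
    """Return the risk zone for a hazard judged at (severity, probability)."""
    return ZONES[severity][probability]

print(assess("I", "B"))    # catastrophic / probable
print(assess("III", "D"))  # marginal / remote
```

Note that no one-step move in this hypothetical table jumps from Zone 1 directly to Zone 3, in keeping with the "avoid discontinuities" hint of Figure II-4c.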

[Figure II-5 shows a risk assessment matrix with Severity of Consequences on one axis (I CATASTROPHIC, II CRITICAL, III MARGINAL, IV NEGLIGIBLE) and Probability of Mishap** on the other (descending through E IMPROBABLE to F IMPOSSIBLE). Each cell carries a risk code:]

Code 1: Actions imperative to suppress risk to lower level.
Code 2: Operation requires written, time-limited waiver, endorsed by management.
Code 3: Operation permissible.

NOTE: Personnel must not be exposed to hazards in Risk Zones 1 and 2.

*Adapted from MIL-STD-882C
**Life Cycle = 25 yrs.

©1997 Figure provided courtesy of Sverdrup Technology, Inc., Tullahoma, Tennessee [1]

Figure II-5. Typical risk assessment matrix

II-8
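The cell-to-zone lookup that a matrix like Figure II-5 embodies can be sketched in code. The following Python sketch uses MIL-STD-882-style severity and probability labels; the zone assigned to each cell is an illustrative assumption (chosen to honor the "no discontinuities" hint of Figure II-4c), not a prescription from the standard.

```python
# Illustrative risk-matrix lookup patterned after Figure II-5.
SEVERITY = ["I-CATASTROPHIC", "II-CRITICAL", "III-MARGINAL", "IV-NEGLIGIBLE"]
PROBABILITY = ["A-FREQUENT", "B-PROBABLE", "C-OCCASIONAL",
               "D-REMOTE", "E-IMPROBABLE", "F-IMPOSSIBLE"]

# Rows: probability A..F; columns: severity I..IV.
# 1 = action imperative, 2 = accept only by written waiver, 3 = permissible.
# Adjacent cells never differ by more than one zone (Figure II-4c).
ZONES = [
    [1, 1, 2, 3],   # A-FREQUENT
    [1, 1, 2, 3],   # B-PROBABLE
    [1, 2, 3, 3],   # C-OCCASIONAL
    [2, 2, 3, 3],   # D-REMOTE
    [2, 3, 3, 3],   # E-IMPROBABLE
    [3, 3, 3, 3],   # F-IMPOSSIBLE
]

def risk_zone(severity: str, probability: str) -> int:
    """Return the risk zone (1, 2, or 3) for a severity/probability pair."""
    row = PROBABILITY.index(probability)
    col = SEVERITY.index(severity)
    return ZONES[row][col]

print(risk_zone("I-CATASTROPHIC", "C-OCCASIONAL"))  # zone 1: action imperative
print(risk_zone("III-MARGINAL", "E-IMPROBABLE"))    # zone 3: permissible
```

Keeping the matrix as explicit data, rather than as scattered if-statements, makes it easy to show the calibration to management for approval and to revise it when tolerance limits change.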

[Figure II-6. Sample interpretations of the severity and probability steps for the risk assessment matrix (rotated tabular figure; content not recoverable in this text version).]

II-9

ADVANTAGES

The risk assessment matrix has the following advantages [1]:
• The risk matrix provides a useful guide for prudent engineering.
• The risk matrix provides a standard means of treating the relationship between severity and probability in assessing the risk posed by a hazard.
• Subjective risk assessment avoids unknowingly accepting intolerable and senseless risk, allows operating decisions to be made, and improves the distribution of resources for mitigating loss.

LIMITATIONS

The risk assessment matrix has the following limitations [1]:
• The risk assessment matrix can be used only after hazards have been identified; the tool does not assist the analyst in identifying hazards.
• Without data, the method is subjective and yields a comparative analysis only.

II-10

REFERENCES

1. Clemens, PL [1993]. Working with the risk assessment matrix (lecture notes). 2nd ed. Tullahoma, TN: Sverdrup Technology, Inc. (available at http://www.sverdrup.com/svt).

2. U.S. Department of Defense [1993]. System safety program requirements. Washington, DC: U.S. Department of Defense, MIL-STD-882C.

SUGGESTED READINGS

CFR. Code of Federal regulations. Pre-market notification (medical devices). Vol. 21, Section 807.9. Washington, DC: U.S. Government Printing Office, Office of the Federal Register.

CFR. Code of Federal regulations. Process safety management of highly hazardous chemicals. Vol. 29, Section 1910.119(e). Washington, DC: U.S. Government Printing Office, Office of the Federal Register.

National Aeronautics and Space Administration [1970]. System safety. NHB 1700.1 (volume 3). Washington, DC: National Aeronautics and Space Administration.

Nuclear Regulatory Commission [1980]. Risk-based inspection - development of guidelines. Washington, DC: Nuclear Regulatory Commission, NUREG/GR-0005.

U.S. Department of Defense [1970]. System safety engineering and management. Department of Defense Instruction, No. 5000.36.

II-11

SAMPLE DISCUSSION AND EXAMINATION QUESTIONS

1. What is the basis for the iso-risk contour?
2. What is the definition of risk?
3. When is a product/system/facility considered safe?
4. Who in an enterprise establishes risk tolerance levels?
5. What is meant by "calibrating" a risk assessment matrix?
6. What role does society play in establishing risk tolerance boundaries?
7. What role do the finances of the enterprise play in establishing risk tolerance boundaries?
8. Why is it important to establish an exposure interval when evaluating risk?
9. What exposure interval is commonly used for occupational safety and health exposures?
10. Suppose that an enterprise establishes a risk assessment matrix, using a nominal 25-year exposure interval. If the risk assessment matrix is then used to guide the enterprise's decision making for a system that is intended to be placed in service for 60 years, what problems may result?
11. How does the risk assessment matrix recognize the various targets or resources of interest?
12. When should an enterprise's risk assessment matrix be revised or reviewed?
13. What role does the risk posed by automobile travel play in establishing risk tolerance levels?
14. What are typical targets for which the risk assessment matrix should be calibrated?
15. How many tolerance zones should appear on a well constructed risk assessment matrix?
16. In a risk assessment matrix, what is the usual ratio between adjacent probability steps (except for the probability step labeled as "impossible")?

II-12

LESSON III
PRELIMINARY HAZARD ANALYSIS

PURPOSE: To introduce the student to the concept and application of preliminary hazard analysis.

OBJECTIVE: To acquaint the student with the following:
1. Purpose of preliminary hazard analysis
2. Role of preliminary hazard analysis in the integrated system safety approach
3. Procedure for performing a preliminary hazard analysis
4. Timing for preliminary hazard analysis
5. Advantages of preliminary hazard analysis
6. Limitations of preliminary hazard analysis

SPECIAL TERMS:
1. Hazard
2. Target
3. Resource
4. Severity
5. Probability
6. Risk
7. Countermeasure
8. Control
9. Consequence
10. Mission phase
11. Life-cycle

III-1

III-2

DESCRIPTION

A preliminary hazard analysis (PHA) produces a line item tabular inventory of non-trivial system hazards, and an assessment of their remaining risk after countermeasures have been imposed [1]. This inventory includes qualitative, not quantitative, assessments of risks. Also often included is a tabular listing of countermeasures with a qualitative delineation of their predicted effectiveness. A PHA is an early or initial system safety study of system hazards. It is important to remember that each analytical technique discussed in this module complements (rather than supplants) the others. This is so because each technique attacks the system to be analyzed differently: some are top-down, others are bottom-up. Though it has long been sought, there is no "Swiss army knife" technique that answers all questions and is suitable for all situations.

APPLICATION

PHAs are best applied in the design and development phase but may also be applied in the concept definition phase. This tool is applied to cover whole-system and interface hazards for all mission phases. A PHA may be carried out, however, at any point in the life cycle of a system. This tool allows early definition of the countermeasure type and incorporation of design countermeasures as appropriate.

PROCEDURES

Procedures for performing PHAs are presented below [1]:

(1) Identify resources of value to be protected, such as personnel, facilities, equipment, productivity, mission or test objectives, environment, etc. These resources are potential targets.

(2) Identify and observe the levels of acceptable risk that have been predetermined and approved by management or the client. These limits may be the risk matrix boundaries defined in a risk assessment matrix (see Lesson II).

(3) Define the extent of the system to be assessed. Define the physical boundaries and operating phases (such as shakedown, startup, standard operation, emergency shutdown, maintenance, deactivation, etc.). State other assumptions, such as whether the assessment is based on an as-built or as-designed system, or whether currently installed countermeasures will be considered.

(4) Detect and confirm hazards to the system. Identify the targets threatened by each hazard. A hazard is defined as an activity or circumstance posing potential loss or harm to a target and is a condition required for an undesired loss event. Hazards should be distinguished from consequences and considered in terms of a source (hazard), mechanism (process), and outcome (consequence). A team approach to identifying hazards, such as brainstorming, is recommended over a single analyst. If schedule and resource constraints are considerations, then a proficient engineer with knowledge of the system should identify the hazards, but that assessment should be reviewed by a peer.

A list of proven methods for finding hazards is presented below:
• Use intuitive "engineering sense."
• Examine and inspect similar facilities or systems and interview workers assigned to those facilities or systems.
• Examine system specifications and expectations.
• Review codes, regulations, and consensus standards.

III-3

• Interview current or intended system users or operators.
• Consult checklists (see Appendix D).
• Review system safety studies from other similar systems.
• Review historical documents: mishap files, near-miss reports, OSHA-recordable injury rates, National Safety Council data, manufacturers' reliability analyses, etc.
• Consider "external influences" such as local weather, environment, or personnel tendencies.
• Consider all mission phases.
• Consider "common causes." A common cause is a circumstance or environmental condition that, if it exists, will induce two or more fault/failure conditions within a system.
• Brainstorm: mentally develop credible problems and play "what-if" games.
• Consider all energy sources. What is necessary to keep them under control? What happens if they get out of control?

(5) Assess worst-credible case (not the worst-conceivable case) severity and probability for each hazard and target combination. Keep the following considerations in mind during the evaluation:
• Remember that severity for a given hazard varies as a function of targets and operational phases.
• A probability interval must be established before probability can be determined. This interval can be in terms of time, or number of cycles or operations. If a short-term probability interval is used, then the assessment will underestimate the true risk unless the risk acceptance criterion is adjusted accordingly. Probability intervals expressed in hours, days, weeks, or months are too brief to be practical. The interval should depict the estimated working life span of the facility, the equipment, or each human operator. An interval of 25 to 30 years is typically used and represents a practical value.
• The probability for a given hazard varies as a function of exposure time, target population, and operational phase.
• Since probability is determined in a subjective manner, draw on the experience of several experts as opposed to a single analyst.
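The effect of the probability interval on perceived risk can be illustrated with a short calculation. The sketch below assumes independent, identically likely years, so a mishap probability of p per year compounds over an n-year exposure interval as 1 - (1 - p)^n; the annual probability used is purely hypothetical.

```python
# Why a short probability interval understates risk (step 5).
def interval_probability(p_per_year: float, years: int) -> float:
    """Probability of at least one mishap over the exposure interval,
    assuming independent years with constant annual probability."""
    return 1.0 - (1.0 - p_per_year) ** years

p_year = 1e-3  # hypothetical annual mishap probability
print(f"{interval_probability(p_year, 1):.4f}")   # 0.0010 - looks negligible
print(f"{interval_probability(p_year, 25):.4f}")  # 0.0247 - over a 25-yr life
```

The same hazard that appears negligible over a one-year window accumulates to roughly a 2.5% chance of a mishap over a nominal 25-year working life, which is why the matrix must be calibrated to the interval actually used.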
(6) Assess risk for each hazard using a risk assessment matrix (see Lesson II). The matrix should be consistent with the established probability interval and force or fleet size for this assessment. (7) Categorize each identified risk as acceptable or unacceptable, or develop countermeasures for the risk, if unacceptable.

III-4

(8) Select countermeasures in the following descending priority order to optimize effectiveness: (1) design change, (2) engineered safety systems (active), (3) safety devices (passive), (4) warning devices, and (5) procedures and training. Note that this delineation, although in decreasing order of effectiveness, is also typically in decreasing order of cost and schedule impact (i.e., design changes have the highest potential for cost and schedule impact). Note also that the list is in increasing order of reliance on the human operator or maintainer: to refrain from attempting to defeat the engineered safety systems, to replace the safety devices after servicing, to heed the warning devices, and to remember procedures and training. A trade study might be performed to determine a countermeasure of adequate effectiveness and minimized program impact.

(9) Re-evaluate the risk with the new countermeasure installed.

(10) If countermeasures are developed, determine whether they introduce new hazards or intolerably diminish system performance. If added hazards or degraded performance are unacceptable, determine new countermeasures and re-evaluate the risk. Figure III-1 is a flowchart summarizing the process to perform a PHA.
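The assess/mitigate/re-assess cycle of steps (6) through (10) can be sketched as a small loop over a worksheet line item. The field names, the `zone_of` lookup, and the rule that only zone 3 is routinely acceptable are illustrative assumptions, not a prescribed worksheet format.

```python
from dataclasses import dataclass, field

@dataclass
class PHAItem:
    """One line item of a PHA worksheet (illustrative fields only)."""
    hazard: str
    target: str
    severity: str               # "I" (catastrophic) .. "IV" (negligible)
    probability: str            # "A" (frequent) .. "F" (impossible)
    countermeasures: list = field(default_factory=list)

def reassess_until_acceptable(item, zone_of, proposed):
    """Apply proposed countermeasures in priority order (step 8) until the
    risk falls into zone 3, re-evaluating after each one (step 9)."""
    while zone_of(item.severity, item.probability) != 3 and proposed:
        name, new_severity, new_probability = proposed.pop(0)
        item.countermeasures.append(name)
        item.severity, item.probability = new_severity, new_probability
    return zone_of(item.severity, item.probability)

# A toy three-cell slice of a risk matrix, standing in for Lesson II's table.
zone_of = lambda s, p: {("I", "C"): 1, ("I", "D"): 2, ("I", "E"): 3}[(s, p)]

item = PHAItem("flange leak of toxic intermediate", "personnel", "I", "C")
final_zone = reassess_until_acceptable(item, zone_of, [
    ("catchment housing with detector/alarm (S/W)", "I", "D"),
    ("inspection and re-gasket schedule (P)", "I", "E"),
])
print(final_zone, item.countermeasures)
```

Step (10)'s check that each countermeasure introduces no new hazards and does not impair performance has no computational analogue here; it remains an engineering judgment applied to each appended entry.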

III-5

Preliminary Hazard Analysis Process Flow

[Figure III-1 diagrams the PHA process: identify TARGETS to be protected (personnel, product, environment, equipment, productivity, others); recognize RISK TOLERANCE LIMITS (i.e., risk matrix boundaries); "SCOPE" the system as to (a) physical boundaries, (b) operating phases (e.g., shakedown, startup, standard run, emergency stop, maintenance), and (c) other assumptions made (e.g., as-is, as-designed, no countermeasures in place); IDENTIFY/VERIFY HAZARDS; EVALUATE WORST-CASE SEVERITY; EVALUATE PROBABILITY; ASSESS RISK using the risk matrix, which must be defined for and match the assessment probability interval and force/fleet size; then DEVELOP COUNTERMEASURES AND RE-EVALUATE, asking whether the countermeasures introduce NEW hazards or IMPAIR system performance and, if so, developing NEW countermeasures. The cycle is REPEATED for each target/hazard combination.]

©1997 Figure courtesy of Sverdrup Technology, Inc., Tullahoma, Tennessee [1]

Figure III-1. Preliminary hazard analysis process flowchart

III-6


EXAMPLE

Figure III-2 shows an example of a completed PHA worksheet (from [1]) for a pressurized chemical intermediate transfer system. (A blank form is included in Appendix B.)

[Figure III-2 reproduces a filled-in PHA worksheet. Countermeasures are identified by appropriate code letter(s): D = Design Alteration, E = Engineered Safety Feature, S = Safety Device, W = Warning Device, P = Procedures/Training. Sample countermeasure entries: Surround flange with sealed annular stainless steel catchment housing, with gravity runoff conduit led to Detecto-Box™ containing detector/alarm device and chemical neutralizer (S/W). Inspect flange seal at 2-month intervals, and re-gasket during annual plant maintenance shutdown (P). Provide personal protective equipment (Schedule 4) and training for spill-response/cleanup crew (S/P).]

© 1997 Figure provided courtesy of Sverdrup Technology, Inc., Tullahoma, Tennessee [1]

Figure III-2. Typical preliminary hazard analysis

Note that the worksheet from this example contains the following information:
• Brief description of the portion of the system, subsystem, or operation covered in the analysis,
• Declaration of the probability interval,
• System number,
• Date of analysis,
• Hazard (description and identification number),

III-7

• Hazard targets (check boxes for personnel, equipment, downtime, product, environment),
• Risk assessment before countermeasures are considered, including severity level, probability level, and risk priority code (zone from risk matrix; see Figure II-5),
• Description of countermeasure (with codes for various types),
• Risk assessment after countermeasures are considered, including severity level, probability level, and risk priority code, and
• Signature blocks for the analyst and reviewers/approvers.

The PHA worksheet used in the example is typical. However, an organization may create their own worksheet customized for their operation. For example, different target types may be listed. In any case, great care should be given to designing the form to encourage effective use. Although helpful, a PHA is not a structured approach that assists the analyst in identifying hazards or threats. As such, it relies on the skill and experience of the analyst(s) if it is to be effective.

ADVANTAGES

A PHA provides the following advantages [1]:
• Identifies and provides a log of primary system hazards and their corresponding risks.
• Provides a logically based evaluation of a system's weak points early enough to allow design mitigation of risk rather than a procedural or inspection-level approach.
• Provides information to management for decisions to allocate resources and prioritize activities to bring risk within acceptable limits.
• Provides a relatively quick review and delineation of the most significant risks associated with a system.

LIMITATIONS

A PHA has the following limitations [1]:
• A PHA fails to assess risks of combined hazards or co-existing system failure modes. A false conclusion may therefore be reached that overall system risk is acceptable simply because the risk of each identified hazard element is acceptable when viewed individually.
• If inappropriate or insufficient targets or operational phases are chosen, the assessment will be flawed. On the other hand, if too many targets or operational phases are chosen, the effort becomes too large and costly to implement.


III-8

REFERENCES

1. Mohr, RR [1993]. Preliminary hazard analysis (lecture presentation). 4th ed. Tullahoma, TN: Sverdrup Technology, Inc. (available at http://www.sverdrup.com/svt).

SUGGESTED READINGS

Browning, RL [1980]. The loss rate concept in safety engineering. New York: Marcel Dekker, Inc.

Hammer, W [1972]. Handbook of system and product safety. Englewood Cliffs, NJ: Prentice-Hall, Inc.

Henley, EJ, Kumamoto, H [1991]. Probabilistic risk assessment. New York: The Institute of Electrical and Electronics Engineers, Inc.

Malasky, SW [1982]. System safety: technology and application. New York: Garland STPM Press.

Raheja, DG [1991]. Assurance technology and application: principles and practices. New York: McGraw-Hill.

Roland, HE, Moriarty, B [1990]. System safety engineering and management. 2nd ed. New York: John Wiley & Sons, Inc.

Stephans, RA, Talso, WW, eds. [1997]. System safety analysis handbook. 2nd ed. Albuquerque, NM: New Mexico Chapter of the System Safety Society.

U.S. Air Force [1982]. System safety. Air Force Systems Command design handbook DH 1-6.

U.S. Army [1990]. System safety engineering and management. Army Regulation 385-16.

III-9

SAMPLE DISCUSSION AND EXAMINATION QUESTIONS

1. What are the primary reasons for performing a preliminary hazard analysis?
2. During what phase of a product/facility/system life-cycle can a preliminary hazard analysis be performed?
3. What are the advantages of a preliminary hazard analysis?
4. What is the primary limitation of a preliminary hazard analysis?
5. Can system risk be properly evaluated by means of a preliminary hazard analysis?

Instructors can obtain presentation slides for a workshop problem entitled "Furry Slurry Processing" at http://www.sverdrup.com/svt.

III-10

LESSON IV
ENERGY FLOW/BARRIER ANALYSIS

PURPOSE: To introduce the student to the concepts and applications of energy flow/barrier analysis.

OBJECTIVE: To acquaint the student with the following:
1. Philosophical foundation for energy flow/barrier analysis
2. Types of energy sources, barriers, and targets which are considered when doing an energy flow/barrier analysis
3. Use of energy flow/barrier analysis as a "thought model" when completing a preliminary hazard analysis
4. Use of energy flow/barrier analysis in the occupational setting
5. Use of energy flow/barrier analysis in emergency response situations
6. Procedure for performing energy flow/barrier analysis
7. Advantages and limitations of energy flow/barrier analysis

SPECIAL TERMS:
1. Barrier
2. Target
3. Energy source
4. Countermeasure
5. Energy flow
6. Energy trace/barrier analysis

IV-1

IV-2

DESCRIPTION

Energy flow/barrier analysis (EFBA) is a system safety analysis tool used to identify hazards and determine the effectiveness of countermeasures employed or proposed to mitigate the risk induced by these hazards [1]. This tool is also known as energy trace/barrier analysis (ETBA). The energy flow/barrier method is a useful supplement to the preliminary hazard analysis discussed in Lesson III. Energy flow/barrier analysis does not employ a separate worksheet from that used for preliminary hazard analysis (PHA). Most analysts consider EFBA a thought process that can be used when performing a preliminary hazard analysis. That is, a hazard (energy source) poses a risk to a target if the barriers between the energy source and the target are inadequate. Energy sources are identified, such as electrical, mechanical, chemical, radiation, etc. Resources (targets) to be protected are identified, such as employees, equipment, facilities, environment, quality, production capability, inventory, etc. Then the analyst assesses opportunities for undesired energy flow between the sources and targets. Barriers are countermeasures (physical or administrative) deployed against hazards caused by flows from these energy sources to targets. Examples of barriers include barricades, blast walls, fences, lead shields, gloves, safety glasses, procedures, etc. It is important to remember that each analytical technique discussed in this module complements (rather than supplants) the others. This is because each technique attacks the system to be analyzed differently: some are top-down, others are bottom-up. Though it has long been sought, there is no "Swiss army knife" technique that answers all questions and is suitable for all situations. See http://www.sverdrup.com/svt for presentation slides that support this lesson.

APPLICATION

An energy flow/barrier analysis can be beneficially applied whenever assessments are needed to assure that an identified target (resource) is being safeguarded against a potential energy source that can impose harm. This assessment can be applied during the design and development phase but may also be applied in the operations phase or concept definition phase. This analysis can also be applied in failure investigations and when making "safe to enter" decisions during emergency response situations. Examples of its use in making "safe to enter" decisions include analysis of the state of all utilities (including steam, gas, electrical, etc.) before allowing rescue teams into a damaged building.

PROCEDURES

Procedures to perform an energy flow/barrier analysis are presented below [1]:

(1) Examine the system and identify all energy sources.

(2) Examine each potential energy flow path in the system. Consider the following for each energy flow path:
• What are the potential targets, such as personnel, facilities, equipment, productivity, mission or test objectives, environment, etc.? Remember that every energy source could have multiple flow paths and targets.
• Is the energy flow unwanted or detrimental to a target?
• Are existing barriers sufficient countermeasures to mitigate the risk to the targets?

IV-3

(3) Consider the following strategies to control harmful energy flow [1]:
• Eliminate energy concentrations
• Limit quantity and/or level of energy
• Prevent the release of energy
• Modify the rate of release of energy
• Separate energy from target in time and/or space
• Isolate by imposing a barrier
• Modify target contact surface or basic structure
• Strengthen potential target
• Control improper energy input

The reiterative process used in PHA to bring the risk associated with a hazard-target combination under acceptable levels has direct parallels in EFBA. The EFBA is customarily documented using a tabular format similar to that used for the PHA. Many analysts incorporate an EFBA approach when performing a PHA, and thus view EFBA as a variant of PHA.

EXAMPLE

Table IV-1 lists strategies to manage harmful energy flows that focus on the energy source, the target, and the path between the source and the target. Included are physical and administrative barriers.

IV-4

Table IV-1. Examples of strategies to manage harmful energy flow

Eliminate energy concentrations: control/limit floor loading; disconnect/remove energy source from system; remove combustibles from welding site; change to nonflammable solvent.

Limit quantity and/or level of energy: store heavy loads on ground floor; lower dam height; reduce system design voltage/operating pressure; use small(er) electrical capacitors/pressure accumulators; reduce/control vehicle speed; monitor/limit radiation exposure; substitute less energetic chemicals.

Prevent energy release: heavy-wall pipe or vessels; interlocks; tagout-lockouts; double-walled tankers; wheel chocks.

Modify rate of energy release: flow restrictors in discharge lines; resistors in discharge circuits; fuses/circuit interrupters.

Separate energy from target in time and/or space: evacuate explosive test areas; impose explosives quantity-distance rules; install traffic signals; use yellow no-passing lines on highways; control hazardous operations remotely.

Isolate by imposing a barrier: guard rails; toe boards; hard hats; face shields; machine tool guards; dikes; grounded appliance frames/housings; safety goggles.

Modify target contact surface or basic structure: cushioned dashboard; fluted stacks; padded rocket motor test cell interior; Whipple plate meteorite shielding; breakaway highway sign supports; foamed runways.

Strengthen potential target: select superior material; substitute forged part for cast part; "harden" control room bunker; cross-brace transmission line tower.

Control improper energy input: use coded/keyed electrical connectors; use match-threaded piping connectors; use backflow preventers.

©1997 Examples provided courtesy of Sverdrup Technology, Inc., Tullahoma, Tennessee [1].
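The path-by-path check described in the procedures (every energy source, every flow path, every target, barriers judged adequate or not) can be sketched as a simple inventory walk. The data below, including the `adequate` flag, are hypothetical analyst judgments, not computed quantities; in practice they would come from the EFBA worksheet.

```python
# Sketch of the EFBA thought process: flag source->target paths whose
# barriers an analyst has judged inadequate. Example entries are invented.
energy_paths = [
    {"source": "440 V supply", "target": "maintenance personnel",
     "barriers": ["lockout/tagout (P)", "grounded housing (S)"],
     "adequate": True},
    {"source": "pressurized transfer line", "target": "operators",
     "barriers": ["heavy-wall pipe (S)"],
     "adequate": False},
    {"source": "pressurized transfer line", "target": "environment",
     "barriers": [],
     "adequate": False},
]

def unmitigated(paths):
    """Return source->target pairs needing new or stronger barriers."""
    return [(p["source"], p["target"]) for p in paths if not p["adequate"]]

for src, tgt in unmitigated(energy_paths):
    print(f"inadequate barrier: {src} -> {tgt}")
```

Because one source can threaten several targets through different paths, the inventory is keyed by the (source, target) pair rather than by the source alone, mirroring the "multiple flow paths and targets" reminder in step (2).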

ADVANTAGES

The energy flow/barrier analysis provides a systematic approach to identify hazards associated with energy sources and determine whether current or planned barriers are adequate countermeasures to protect exposed targets [1].

IV-5

LIMITATIONS

The EFBA has the following limitations [1]:
• Even after a thorough analysis, all hazards might not be discovered.
• Like the PHA (Lesson III), the EFBA fails to assess risks of combined hazards or co-existing system failure modes.
• This tool also fails to identify certain classes of hazards, e.g., asphyxia in oxygen-deficient confined spaces.
• Because of design and performance requirements, it is not always obvious that energy may be reduced or redirected. A re-examination of energy as heat, potential vs. kinetic mechanical energy, electrical, chemical, etc., may aid this thought process.

IV-6

REFERENCES

1. Clemens, PL [1993]. Energy flow/barrier analysis (lecture presentation). 3rd ed. Tullahoma, TN: Sverdrup Technology, Inc. (see http://www.sverdrup.com/svt for presentation slides).

SUGGESTED READINGS U.S. Department of Energy [1985]. Barrier analysis. Idaho Falls, ID: System Safety Development Center, EG&G Idaho, Inc. DOE 76-45129, SSDC-29. Haddon, W, Jr. [1973]. Energy damage and the ten countermeasure strategies. Human factors. (August).

Johnson, WG [1980]. MORT safety assurance systems. New York: Marcel Dekker, Inc. Stephans, RA, Talso, WW, eds. [1997]. System safety analysis handbook. 2nd ed. Albuquerque, NM: New Mexico Chapter of the System Safety Society.

IV-7

SAMPLE DISCUSSION AND EXAMINATION QUESTIONS

1. What is the basis for energy flow/barrier analysis (EFBA)?
2. What types of energy sources can be accommodated through EFBA?
3. What is the difference between energy flow/barrier analysis and energy trace/barrier analysis?
4. Are all barriers physical? If not, give examples of those barriers that are not physical in nature.
5. Give an example of a combination of barriers that is used to protect a target.
6. Give examples of administrative barriers.
7. What is the relationship between preliminary hazard analysis (PHA) and EFBA?
8. What type of format is used to document an EFBA?
9. How might EFBA be used to make a "safe to enter" decision after a process plant accident or in an emergency response situation?
10. Pick an industrial situation with which you are familiar and apply EFBA to assess the risk posed by an energy flow-target combination.

IV-8

LESSON V FAILURE MODES AND EFFECTS ANALYSIS (FAILURE MODES, EFFECTS, AND CRITICALITY ANALYSIS)

PURPOSE: To introduce the student to the procedures and applications of failure modes and effects analysis (failure modes, effects, and criticality analysis).

OBJECTIVE: To acquaint the student with the following:
1. Basic logic of failure modes and effects analysis (FMEA) or a failure modes, effects, and criticality analysis (FMECA)
2. Procedure for performing a FMEA or FMECA
3. Typical format of FMEA/FMECA analysis worksheet
4. Advantages of FMEA/FMECA
5. Limitations of FMEA/FMECA
6. Role of FMEA/FMECA in an integrated system safety program

SPECIAL TERMS:
1. Failure
2. Mode
3. Effect
4. Fault
5. Criticality
6. Probability
7. Severity
8. Risk
9. Single-point failure
10. System
11. Subsystem
12. Assembly
13. Subassembly
14. Component
15. Worst-credible

V-1

V-2

DESCRIPTION

A failure modes and effects analysis (FMEA) is a forward logic (bottom-up), tabular technique that explores the ways or modes in which each system element can fail and assesses the consequences of each of these failures [1]. In its practical application, its use is often guided by top-down "screening" (as described in the "Procedures" section) to establish the limit of analytical resolution. A failure modes, effects, and criticality analysis (FMECA) also addresses the criticality or risk of individual failures. Countermeasures can be defined for each failure mode, and consequent reductions in risk can be evaluated. FMEA and FMECA are useful tools for cost and benefit studies, to implement effective risk mitigation and countermeasures, and as precursors to a fault tree analysis (see Lesson VI). Contemporary analysts are coming to recognize FMEA (and FMECA) as the technique of choice to identify potential single-point failures within a system. Applying FMEA to complex systems having redundancy-rich architecture fails to identify or evaluate the probability of, or penalty for, system "crashes." It cannot be relied on, therefore, to produce meaningful results in cost-benefit studies. Logic tree methods (fault tree analysis, event tree analysis, and cause-consequence analysis) are now viewed as generally more useful for this purpose. See http://www.sverdrup.com/svt for presentation slides which support this lesson. It is important to remember that each analytical technique discussed in this module complements (rather than supplants) the others. This is because each technique attacks the system to be analyzed differently: some are top-down, others are bottom-up. Though it has long been sought, there is no "Swiss army knife" technique that answers all questions and is suitable for all situations.

APPLICATION

An FMEA can call attention to system vulnerability to failures of individual components. Single-point failures can be identified. This tool can be used to provide reassurance that the cause, effect, and associated risk (FMECA) of component failures have been appropriately addressed. These tools are applicable within systems or at the system-subsystem interfaces and can be applied at the system, subsystem, component, or part levels. These failure mode analyses are typically performed during the design and development phase. During this phase, these analyses can be done with or shortly after the PHA (Lesson III). The vulnerable points identified in the analyses can aid management in making decisions to allocate resources in order to reduce vulnerability.

PROCEDURES

Procedures for preparing and performing FMECAs are presented below [1]. Procedures for preparing an FMEA are the same, with Steps 8 through 12 omitted.

Steps before performing the FMEA or FMECA:

(1) Define the scope and boundaries of the system to be assessed. Gather pertinent information relating to the system, such as requirement specifications, descriptions, drawings, components and parts lists, etc. Establish the mission phases to be considered in the analysis.

(2) Partition and categorize the system into convenient and logical elements to be analyzed. These system elements include subsystems, assemblies, subassemblies, components, and piece parts.

(3) Develop a numerical coding system that corresponds to the system breakdown (see Figure V-1).

V-3

[Figure V-1 diagrams a system breakdown tree: numbered subsystems, assemblies, subassemblies, components, and parts.]

Typical coding system: Subsystem No. - Assembly No. - Subassembly No. - Component No. - Part No. For example, the code number for part 2 in the figure is 03-01-03-01-02.

Figure adapted from [1].

Figure V-1. Example of system breakdown and numerical coding
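The coding scheme of step (3) amounts to joining the hierarchy indices with zero padding. A minimal sketch (the function name and two-digit padding are assumptions; any consistent width works):

```python
# Numerical coding per Figure V-1:
# Subsystem No. - Assembly No. - Subassembly No. - Component No. - Part No.
def element_code(*levels: int) -> str:
    """Join hierarchy indices into a zero-padded code string."""
    return "-".join(f"{n:02d}" for n in levels)

# Part 2 of component 1, subassembly 3, assembly 1, subsystem 3:
print(element_code(3, 1, 3, 1, 2))  # -> 03-01-03-01-02
```

Accepting a variable number of levels lets the same function code an assembly (two indices) or a piece part (five indices), so worksheet entries at any depth of the breakdown share one unambiguous identifier format.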

Steps in performing the FMEA or FMECA:

(4) Identify resources of value to be protected, such as personnel, facilities, equipment, productivity, mission or test objectives, environment, etc. These resources are potential targets.

(5) Identify and observe the levels of acceptable risk that have been predetermined and approved by management or the client. These limits may be the risk matrix boundaries defined in a risk assessment matrix (see Lesson II).

(6) By answering the following questions [1], the scope and resources required to perform a classic FMEA can be reduced without loss of benefit:

Will failure of the system render an unacceptable or unwanted loss? If the answer is no, the analysis is complete. Document the results. (This has the additional benefit of providing visibility of non-value-added systems, or it may correct incomplete criteria used for the FMEA.) If the answer is yes, ask the following question for each subsystem identified in Step 2:

V-4

Will failure of this subsystem render an unacceptable or unwanted loss? If the answer for each subsystem is no, the analysis is complete. Document the results. If the answer is yes for any subsystem, ask the following question for each assembly of those subsystems identified in Step 2:

Will failure of this assembly render an unacceptable or unwanted loss? If the answer for each assembly is no, the analysis is complete. Document the results. If the answer is yes for any assembly, ask the following question for each subassembly of those assemblies identified in Step 2:

Will failure of this subassembly render an unacceptable or unwanted loss? If the answer for each subassembly is no, the analysis is complete. Document the results. If the answer is yes for any subassembly, ask the following question for each component of those subassemblies identified in Step 2:

Will failure of this component render an unacceptable or unwanted loss? If the answer for each component is no, the analysis is complete. Document the results. If the answer is yes for any component, ask the following question for each part of those components identified in Step 2:

Will failure of this part render an unacceptable or unwanted loss?
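The question cascade above is a depth-first screening: stop wherever failure cannot render an unacceptable loss, and drill down wherever it can. A sketch (the element tree and the loss test below are illustrative assumptions):

```python
def screen(element, causes_unacceptable_loss):
    """Return the lowest-level elements that need failure-mode analysis.
    Recurse into children only when failure of the parent element
    could render an unacceptable or unwanted loss."""
    if not causes_unacceptable_loss(element["name"]):
        return []                    # analysis complete for this branch
    if not element.get("children"):
        return [element["name"]]     # lowest level reached; analyze it
    found = []
    for child in element["children"]:
        found.extend(screen(child, causes_unacceptable_loss))
    return found

# Illustrative system: only the hoist branch can cause unacceptable loss.
system = {"name": "rig", "children": [
    {"name": "hoist", "children": [{"name": "motor"}, {"name": "drum"}]},
    {"name": "cage"}]}
critical = {"rig", "hoist", "motor"}
print(screen(system, lambda n: n in critical))  # ['motor']
```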

(7) For each element (system, subsystem, assembly, subassembly, component, or part) for which failure would render an unacceptable or unwanted loss, ask and answer the following questions: What are the failure modes for this element? What are the effects (or consequences) of each failure mode on each target?

(8) Assess worst-credible case (not worst-conceivable case) severity and probability for each failure mode, effect, and target combination.

(9) Assess the risk of each failure mode using a risk assessment matrix (see Lesson II). The matrix should be consistent with the established probability interval and force or fleet size for this assessment.

(10) Categorize each identified risk as acceptable or unacceptable.

(11) If the risk is unacceptable, develop countermeasures to mitigate the risk.

(12) Re-evaluate the risk with the new countermeasures installed.

(13) If countermeasures are developed, determine whether they introduce new hazards or intolerably diminished system performance. If added hazards or degraded performance are unacceptable, develop new countermeasures and re-evaluate the risk.

V-5

(14) Document your completed analysis on an FMEA or FMECA worksheet. The contents and formats of these worksheets vary among organizations. Countermeasures may or may not be listed. Figure V-2 presents a flowchart for FMEA or FMECA. Figure V-3 presents a sample FMEA worksheet. Appendix C gives an additional sample FMEA worksheet.
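Steps 8 through 12 reduce to a matrix lookup followed by an acceptability test. The sketch below is illustrative only; the matrix boundaries, severity/probability categories, and the set of acceptable risk codes are assumptions, not the matrix from Lesson II:

```python
# Illustrative risk matrix: severity I (catastrophic) .. IV (negligible),
# probability A (frequent) .. D (improbable), risk codes 1 (highest) .. 4.
RISK = {
    ("I", "A"): 1, ("I", "B"): 1, ("I", "C"): 2, ("I", "D"): 3,
    ("II", "A"): 1, ("II", "B"): 2, ("II", "C"): 3, ("II", "D"): 3,
    ("III", "A"): 2, ("III", "B"): 3, ("III", "C"): 3, ("III", "D"): 4,
    ("IV", "A"): 3, ("IV", "B"): 4, ("IV", "C"): 4, ("IV", "D"): 4,
}
ACCEPTABLE = {3, 4}  # management-approved limit (assumed)

def assess(severity, probability):
    """Return (risk code, acceptable?) for one failure mode/effect/target."""
    code = RISK[(severity, probability)]
    return code, code in ACCEPTABLE

print(assess("I", "C"))   # (2, False) -> develop countermeasures
print(assess("IV", "D"))  # (4, True)
```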

V-6

[Figure V-2, a flowchart of the FMEA/FMECA process, is not reproduced here. Its legible annotations: repeat the assessment for each failure mode/effect/target combination; the risk matrix used must be defined for, and must match, the assessment probability interval and force/fleet size; unacceptable risks lead to countermeasure development, acceptance by waiver, or abandonment; and for each countermeasure, ask whether it introduces new hazards or impairs system performance, and if so, develop new countermeasures.]

© 1997 Figure provided courtesy of Sverdrup Technology, Inc., Tullahoma, Tennessee [1]

Figure V-2. Failure modes, effects, (and criticality) analysis process flowchart

V-7

[Figure V-3, a blank FMECA worksheet, is not reproduced here. Its header block records the FMEA number, project number, sheet number, date, subsystem and system numbers, probability interval, and the preparer, reviewer, and approver, plus a target/resource code legend (P - personnel, E - equipment, T - downtime, R - product, D - data, V - environment). Its columns are: Item/Id. No., Functional Identification, Failure Mode, Failure Cause, Failure Event, Target, Risk Assessment (Severity, Probability, Risk Code), and Action Required/Comments.]

Figure adapted from [1].

Figure V-3. Typical failure modes, effects, and criticality analysis worksheet

EXAMPLE

A sample FMECA [1] is illustrated in Figure V-4. The system being assessed is an automated mountain climbing rig. A schematic of the system is presented in Figure V-4a, and Figure V-4b illustrates the breakdown and coding of the system into subsystem, assembly, and subassembly elements. An FMECA worksheet for the control subsystem is presented in Figure V-4c.

V-8

[Figure V-4a, a schematic of the mountain climbing rig, is not reproduced here.]

Figure V-4a. System

System breakdown and coding (subsystem / assembly / subassembly):

Hoist (A)
- Motor (A-01): Windings (A-01-a), Inboard bearing (A-01-b), Outboard bearing (A-01-c), Rotor (A-01-d), Stator (A-01-e), Frame (A-01-f), Mounting plate (A-01-g), Wiring terminals (A-01-h)
- Drum (A-02)

External power source (B)

Cage (C)
- Frame (C-01)
- Lifting lug (C-02)

Cabling (D)
- Cable (D-01)
- Hook (D-02)
- Pulleys (D-03)

Controls (E)
- Electrical (E-01): START switch (E-01-a), FULL UP LIMIT switch (E-01-b), Wiring (E-01-c)
- Operator (E-02)

Figure V-4b. System breakdown and coding

© 1997 Figure provided courtesy of Sverdrup Technology, Inc., Tullahoma, Tennessee [1]

Figure V-4. Example of a failure modes, effects, and criticality analysis

V-9

[Figure V-4c, the completed FMECA worksheet for the control subsystem (system: Mountain Climbing Rig; probability interval: 30 years), is only partly legible in this reproduction. Its recoverable entries:

- E-01-a, START switch: fails closed (mechanical failure or corrosion).
- E-01-b, FULL UP LIMIT switch: fails open (mechanical failure or corrosion).
- Wiring: cut or disconnected (varmint invasion, faulty assembly).
- Effects noted include: cage does not stop; cage will not move; no response to a switch. Targets include personnel (P), equipment (E), and downtime (T).
- Failure modes assessed as safe include the START switch failing open and the stop switch failing closed, which leave the cage in a safe position.]

Figure V-4c. Worksheet

Figure V-4. Example of a failure modes, effects, and criticality analysis (concluded)

ADVANTAGES

Performing FMEAs and FMECAs has the following advantages [1]:

- Provides an exhaustive, thorough mechanism to identify potential single-point failures and their consequences. An FMECA provides risk assessments of these failures.
- Results can be used to optimize reliability, optimize designs, incorporate "fail-safe" features into the system design, obtain satisfactory operation using equipment of "low reliability," and guide component and manufacturer selection.
- Provides further analysis at the piece-part level for high-risk hazards identified in a PHA.
- Identifies hazards caused by failures that may have been previously overlooked in the PHA. These can be added to the PHA.

V-IO

- Provides a mechanism for more thorough analysis than a fault tree analysis, since every failure mode of each component of the system is assessed [6].

LIMITATIONS

The following limitations are imposed when performing FMEAs and FMECAs [1]:

- Costly in man-hour resources, especially when performed at the parts-count level within large, complex systems.
- Probabilities and consequences of system failures induced by co-existing, multiple-element faults or failures within the system are not addressed or evaluated.
- Although the technique is systematic, and guidelines and check sheets are available for assistance, no check methodology exists to evaluate the degree of completeness of the analysis. The analysis depends heavily on the ability and expertise of the analyst for finding all necessary failure modes.
- Human error and hostile environments frequently are overlooked.
- Failure probability data are often difficult to obtain for an FMECA.
- If too much emphasis is placed on identifying and eliminating single-point failures, then focus on more severe system threats (posed by co-existing failures/faults) may be lost.
- An FMECA can be a very thorough analysis, suitable for prioritizing resources to higher-risk areas, if it can be performed early enough in the design phase. However, the level of design maturity required for an FMECA is generally not achieved until late in the design phase, often too late to guide this prioritization.


V-ll

REFERENCES

1. Mohr, RR [1992]. Failure modes and effects analysis (lecture presentation). 6th ed. Tullahoma, TN: Sverdrup Technology. (Available at http://www.sverdrup.com/svt.)

SUGGESTED READINGS

Layton, D [1989]. System safety - including DOD standards. Chester, OH: Weber Systems Inc.

Lees, FP [1980]. Loss prevention in the process industries (2 volumes). London: Butterworths.

Raheja, DG [1991]. Assurance technologies - principles and practices. New York: Prentice-Hall, Inc.

Roberts, NH, Vesely, WE, Haasl, DF, Goldberg, FF [1981]. Fault tree handbook. Washington, DC: U.S. Government Printing Office, NUREG-0492.

Roland, HE, Moriarty, B [1990]. System safety engineering and management. 2nd ed. New York: John Wiley & Sons.

Stephans, RA, Talso, WW, eds. [1997]. System safety analysis handbook. 2nd ed. Albuquerque, NM: New Mexico Chapter of the System Safety Society.

U.S. Department of Defense [1980]. Procedures for performing a failure modes, effects, and criticality analysis. MIL-STD-1629A.

V-12

SAMPLE DISCUSSION AND EXAMINATION QUESTIONS

1. What is the difference between a fault and a failure?
2. What is a single-point failure?
3. Does a classic FMEA allow prioritization of single-point failures? Why or why not?
4. What are the differences between an FMEA and an FMECA?
5. Describe a strategy for minimizing the time required to perform an FMECA (as well as the size of the resulting document).
6. What is a major weakness of the FMEA or FMECA technique?
7. Can an FMEA be (usefully) performed in the conceptual design phase of a project?
8. Can an FMECA be started before a risk assessment matrix has been constructed?
9. How does FMEA deal with co-existing faults/failures?

The instructor may obtain presentation slides for a workshop problem entitled "Furry Slurry Processing" at http://www.sverdrup.com/svt.

V-13

LESSON VI
RELIABILITY BLOCK DIAGRAM

PURPOSE: To introduce the student to the procedures and application of reliability block diagram analysis.

OBJECTIVE: To acquaint the student with the following:
1. Symbology of reliability block diagrams
2. Depiction of series and parallel circuits
3. Procedures for performing reliability block diagram analysis
4. Reliability bands
5. System reliability

SPECIAL TERMS:
1. Reliability
2. Series circuit
3. Parallel circuit
4. Series-parallel circuits
5. Parallel-series circuits
6. Reliability band

VI-1

VI-2

DESCRIPTION

A reliability block diagram (RBD) is a backwards (top-down) symbolic logic model generated in the success domain. Each RBD has an input and an output and flows left to right from the input to the output. Blocks may depict the events or system element functions within a system; however, these blocks typically depict system element functions only. A system element can be a subsystem, subassembly, component, or part [1,2]. Simple RBDs are constructed of series, parallel, or combinations of series and parallel elements (see Table VI-1). Each block represents an event or system element function. Blocks are connected in series if all elements must operate successfully for the system to operate successfully. Blocks are connected in parallel if only one element needs to operate successfully for the system to operate successfully. A diagram may contain a combination of series and parallel circuits. The system operates if an uninterrupted path exists between the input and output [1,2].

Table VI-1. Simple reliability block diagram construction

Type of Circuit | System Reliability #
Series | Rs = RA RB
Parallel | Rs = 1 - [(1-RA)(1-RB)]
Series-Parallel | Rs = (1 - [(1-RA)(1-RB)]) (1 - [(1-RC)(1-RD)])
Parallel-Series | Rs = 1 - [(1-RA RB)(1-RC RD)]

# Assumes all components function independently of each other. (Formulas are shown for two-element branches; the block diagram representations accompanying this table are not reproduced here.)

RBDs illustrate system reliability. Reliability is the probability of successful operation over a defined time interval. Each element of a block diagram is assumed to function (operate successfully or fail) independently of every other element. The relationships between element reliability and system reliability for series and parallel systems are presented below, and their derivations are found in [2].

VI-3

Series Systems

Rs = ∏(i=1 to n) Ri = R1 R2 R3 ... Rn

Parallel Systems

Rs = 1 - ∏(i=1 to n) (1-Ri) = 1 - [(1-R1)(1-R2)(1-R3) ... (1-Rn)]

where

Rs = system reliability,
Ri = system element reliability, and
n = number of system elements (which are assumed to function independently)
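These two formulas translate directly into code; a minimal sketch (function names are mine), assuming independent elements:

```python
from math import prod

def series_reliability(rs):
    """All n elements must succeed: Rs = R1 * R2 * ... * Rn."""
    return prod(rs)

def parallel_reliability(rs):
    """Any one element suffices: Rs = 1 - (1-R1)(1-R2)...(1-Rn)."""
    return 1 - prod(1 - r for r in rs)

print(round(series_reliability([0.9, 0.9, 0.9]), 3))    # 0.729
print(round(parallel_reliability([0.9, 0.9, 0.9]), 3))  # 0.999
```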

Not all systems can be modeled with simple RBDs. Some complex systems cannot be modeled with true series and parallel circuits. These systems must be modeled with a complex RBD, as presented in Figure VI-l. Notice that in this example, if component E fails, then paths B, E, G and B, E, H are not success paths. Thus, this is not a true series or parallel arrangement.

Figure VI-I. Typical complex reliability block diagram

It is important to remember that each analytical technique discussed in this module complements (rather than supplants) the others. This is because each technique attacks the system to be analyzed differently-some are top-down, others are bottom-up. Though it has long been sought, there is no "Swiss army knife" technique that answers all questions and is suitable for all situations. APPLICATION An RBD allows evaluation of various potential design configurations [2]. Required subsystem and element reliability levels can be determined to achieve the desired system reliability. Typically, these functions are performed during the design and development phase. An RBD may also be used to identify elements and logic as a precursor to performing a fault tree analysis (Lesson VII).

VI-4


PROCEDURES

The procedures (adapted from [2]) to generate a simple RBD are presented below.

(1) Divide a system into its elements. A functional diagram of the system is helpful.

(2) Construct a block diagram using the conventions illustrated in Table VI-1.

(3) Calculate the system reliability band, RSL (low) to RSH (high), from each individual element's reliability band, RiL (low) to RiH (high), in the following manner:

a. For series systems with n elements that are assumed to function independently,

RSL = ∏(i=1 to n) RiL = R1L R2L R3L ... RnL
RSH = ∏(i=1 to n) RiH = R1H R2H R3H ... RnH

b. For parallel systems with n elements that are assumed to function independently,

RSL = 1 - ∏(i=1 to n) (1-RiL) = 1 - [(1-R1L)(1-R2L)(1-R3L) ... (1-RnL)]
RSH = 1 - ∏(i=1 to n) (1-RiH) = 1 - [(1-R1H)(1-R2H)(1-R3H) ... (1-RnH)]

Note: The reliability band is analogous to a confidence interval for the reliability of an individual element or system. For an individual element, the reliability band ranges from a low (RiL) to a high (RiH) estimate, both of which are selected by the analyst on the basis of available data. Using a mathematical representation of the system, the corresponding reliability band for the system [with a range from RSL (low) to RSH (high)] is calculated from the individual element reliability bands.

c. For series-parallel systems, first determine the reliability for each parallel branch using the equations in Item 3b. Then treat each parallel branch as an element in a series branch and determine the system reliability using the equations in Item 3a.

d. For parallel-series systems, first determine the reliability for each series branch using the equations in Item 3a. Then treat each series branch as an element in a parallel branch and determine the system reliability using the equations in Item 3b.

e. For systems that are composed of the four above arrangements, determine the reliability for the simplest branches. Then treat these as elements within the remaining block diagram, and determine the reliability for the new simplest branches. Continue this process until one of the four basic arrangements remains. Then determine the system reliability.
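Item (e)'s reduce-and-repeat rule is naturally recursive: evaluate the innermost series and parallel branches first, then treat each result as a single element. A sketch using nested tuples (the representation and function name are my own):

```python
from math import prod

def reliability(block):
    """Evaluate a nested RBD: a number is an element reliability;
    a ('series' | 'parallel', [children]) tuple is a branch."""
    if isinstance(block, (int, float)):
        return block
    kind, children = block
    rs = [reliability(child) for child in children]
    if kind == "series":
        return prod(rs)                    # Item 3a
    return 1 - prod(1 - r for r in rs)     # Item 3b (parallel)

# Element A in series with a parallel branch of B and C (illustrative values):
rbd = ("series", [0.95, ("parallel", [0.9, 0.8])])
print(round(reliability(rbd), 3))  # 0.931
```

Running the same structure once with each element's low-band value and once with its high-band value yields RSL and RSH.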

EXAMPLE

A system has two subsystems, designated 1 and 2. Subsystem 2 is a backup for subsystem 1. Subsystem 1 has three components, and at least one of the three must function successfully for the subsystem to operate. Subsystem 2 has three components that all need to function successfully for the subsystem to operate. Table VI-2 presents the estimated reliability band for each component over the system's estimated 10-year life interval.

VI-5

Table VI-2. Reliability bands for example system

Subsystem | Component | Reliability Band (Low) | Reliability Band (High)
1 | A | 0.70 | 0.72
1 | B | 0.80 | 0.84
1 | C | 0.60 | 0.62
2 | D | 0.98 | 0.99
2 | E | 0.96 | 0.97
2 | F | 0.98 | 0.99

Figure VI~2 presents an RBD for the system. Note that the components for subsystem 1 are in a parallel circuit with the components of subsystem 2. Also note that the components for subsystem 1 form a series circuit and the components for subsystem 2 form a parallel circuit.

Figure VI-2. Example reliability block diagram Calculations for subsystem and system reliabilities are presented below: Subsystem 1: RIL = 1-[(1-0.70)(1-0.80)(1-0.60)] RIH = 1-[(1-0.72)(1-0.84)(1-0.62)]
~L ~H = = =

= 0.983 (High band value)


(Low band value) (High band value) 0.998 0.999 (Low band value) (High band value)

0.976

(Low band value)

Subsystem 2:

(0.98)(0.96)(0.98) (0.99)(0.97)(0.99)

= =

0.922 0.951
= =

System:

RsL "'" 1-[(1-0.976)(1-0.922)] RsH = 1-[(1-0.983)(1-0.951)]
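The band arithmetic can be verified with a short script; a sketch in which the subsystem structure follows the problem statement and the component labels follow Table VI-2 (the sixth label, F, is assumed):

```python
from math import prod

def parallel(rs):
    return 1 - prod(1 - r for r in rs)

# Reliability bands from Table VI-2:
low  = {"A": 0.70, "B": 0.80, "C": 0.60, "D": 0.98, "E": 0.96, "F": 0.98}
high = {"A": 0.72, "B": 0.84, "C": 0.62, "D": 0.99, "E": 0.97, "F": 0.99}

def system_reliability(r):
    sub1 = parallel([r["A"], r["B"], r["C"]])  # any one of three suffices
    sub2 = r["D"] * r["E"] * r["F"]            # all three needed (series)
    return parallel([sub1, sub2])              # subsystem 2 backs up subsystem 1

print(round(system_reliability(low), 3), round(system_reliability(high), 3))
# 0.998 0.999
```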

Therefore, the reliability band for the system is 0.998 to 0.999.

ADVANTAGES

An RBD has the following advantages:

- Allows early assessment of design concepts, when design changes can be readily and economically incorporated [2].
- Tends to be easier for an analyst to visualize than other logic models such as a fault tree [1].
- Blocks representing elements in an RBD can be arranged in a manner that represents how these elements function in the system [1].

VI-6

- Since RBDs are easy to visualize, they can be generated before performing a fault tree analysis and transformed into a fault tree by the method discussed in Lesson X.

LIMITATIONS

An RBD has the following limitations:

- Systems must be broken down into elements for which reliability estimates can be obtained. Such a breakdown for a large system can be a significant effort [2].
- System element reliability estimates might not be readily available for all elements. Some reliability estimates may be very subjective, difficult to validate, and not accepted by others in the decision-making process. If the element reliability values have different confidence bands, this can lead to significant problems.
- Not all systems can be modeled with combinations of series, parallel, series-parallel, or parallel-series circuits. These complex systems can be modeled with a complex RBD; however, determining system reliability for such a system is more difficult than for a simple RBD [1,2].

VI-7

REFERENCES

1. Gough, WS, Riley, T, Koren, JM [1990]. A new approach to the analysis of reliability block diagrams. Proceedings of the annual reliability and maintainability symposium. Los Altos, CA: SAIC.

2. Kapur, KC, Lamberson, LR [1977]. Reliability in engineering design. New York: John Wiley & Sons.

SUGGESTED READINGS

Pagès, A, Gondran, M [1986]. System reliability evaluation and prediction in engineering. New York: Springer-Verlag.

VI-8

SAMPLE DISCUSSION AND EXAMINATION QUESTIONS

1. During what project phase can a reliability block diagram be constructed?
2. Name four circuit types that can be modeled with a reliability block diagram.
3. In reliability block diagrams, are the elements assumed to operate (or not operate, as the case may be) independently of one another?
4. Write an expression for the reliability of a circuit consisting of three resistors (a, b, and c) arranged in series.
5. Write an expression for the reliability of a circuit consisting of four resistors (d, e, f, and g) arranged in parallel.
6. Diagram a series-parallel circuit.
7. Diagram a parallel-series circuit.

VI-9

LESSON VII FAULT TREE ANALYSIS

PURPOSE: To introduce the student to the procedures and applications of fault tree analysis.

OBJECTIVE: To acquaint the student with the following:
1. Logic of fault tree analysis
2. Procedures for fault tree analysis
3. Symbology for fault tree analysis
4. Procedures for calculating top event probability
5. Procedures for determining cut sets and cut set probability
6. Procedures for determining path sets
7. Probability propagation through logic gates
8. Rare event approximation for propagating failure probabilities through OR gates
9. Exact solution of OR gate failure probabilities
10. Structural and quantitative significance of cut sets
11. Log-average method of probability estimation
12. Application, advantages, and limitations of fault tree analysis

SPECIAL TERMS:
1. AND gate
2. OR gate
3. INHIBIT gate
4. External event
5. Undeveloped event
6. Conditioning event
7. Basic event
8. Top event
9. Contributor
10. Intermediate event
11. Necessary and sufficient conditions
12. Cut set
13. Cut set probability
14. Cut set importance
15. Item importance
16. Path set
17. Delphi technique
18. Boolean-indicated cut sets
19. Minimal cut sets

VII-1

VII-2

DESCRIPTION

A fault tree analysis (FTA) is a top-down symbolic logic model generated in the failure domain. This model traces the failure pathways from a predetermined, undesirable condition or event of a system, called the TOP event, to the failures or faults (fault tree initiators) that could act as causal agents. Previous identification of the undesirable event also includes a recognition of its severity. An FTA can be carried out either quantitatively or subjectively [1]. The FTA includes generating a fault tree (symbolic logic model), entering failure probabilities for each fault tree initiator, propagating failure probabilities to determine the TOP event probability, and determining cut sets and path sets. A cut set is any group of initiators that will, if they all occur, cause the TOP event to occur. A minimal cut set is a least group of initiators that will, if they all occur, cause the TOP event to occur. A path set is a group of fault tree initiators that, if none of them occurs, will guarantee that the TOP event cannot occur. See http://www.sverdrup.com/svt for supporting presentation slides.

The probability of failure for an event is defined as the number of failures per number of attempts. This can be expressed as:

PF = F/(S+F), where F = number of failures and S = number of successes

Since reliability for an event is defined as the number of successes per number of attempts, the relationship between the probability of failure and reliability can be expressed as follows:

R = S/(S+F), therefore

R + PF = S/(S+F) + F/(S+F) = 1, and

PF = 1 - R

It is important to remember that each analytical technique discussed in this module complements (rather than supplants) the others. This is so because each technique attacks the system to be analyzed differently; some are top-down, others are bottom-up. Though it has long been sought, there is no "Swiss army knife" technique that answers all questions and is suitable for all situations.

APPLICATION

FTAs are particularly useful for high-energy systems (i.e., systems capable of potentially high-severity events), to ensure that an ensemble of countermeasures adequately suppresses the probability of mishaps. An FTA is a powerful diagnostic tool for analysis of complex systems and is used as an aid for design improvement. This type of analysis is sometimes useful in mishap investigations to determine cause or to rank potential causes. Action items resulting from the investigation may be numerically coded to the fault tree elements they address, and resources prioritized toward the perceived highest-probability elements. Fault tree analyses are applicable both to hardware and non-hardware systems and allow probabilistic assessment of system risk as well as prioritization of effort based upon root cause evaluation. The subjective nature of risk assessment is relegated to the lowest level (root causes of effects) in this study rather than the top level. Sensitivity studies can be performed, allowing assessment of the sensitivity of the TOP event to basic initiator probabilities. FTAs are typically performed in the design and development phase, but may also be performed in the fabrication, integration, test, and evaluation phase. FTAs can be used to identify cut sets and initiators with relatively high failure probabilities; therefore, deployment of resources to mitigate the risk of high-risk TOP events can be optimized.

VII-3

PROCEDURES

The procedures for performing an FTA are presented below. These procedures are divided into four phases: (1) fault tree generation, (2) probability determination, (3) identifying and assessing cut sets, and (4) identifying path sets. The analyst does not have to perform all four phases, but can progress through the phases until the specific analysis objectives are met. Table VII-1 summarizes the benefits of the four procedural phases.

Table VII-1. Fault tree analysis procedures

1. Fault tree generation: All basic events (initiators), intermediate events, and the TOP event are identified. A symbolic logic model illustrating fault propagation to the TOP event is produced.

2. Probability determination: Probabilities are identified for each initiator and propagated to intermediate events and the TOP event.

3. Identifying and assessing cut sets: All cut sets and minimal cut sets are determined. A cut set is any group of initiators that will, if they all occur, cause the TOP event to occur. A minimal cut set is a least group of initiators that, if they all occur, will cause the TOP event to occur. Analysis of a cut set can help evaluate the probability of the TOP event, identify qualitative common cause vulnerability, and assess quantitative common cause probability. Cut sets also enable analyzing the structural, quantitative, and item significance of the tree.

4. Identifying path sets: All path sets are determined. A path set is a group of fault tree initiators that, if none of them occurs, will guarantee the TOP event cannot occur.

The procedural phases listed in Table VII-1 are further described in the following section.

Fault Tree Generation

Fault trees are constructed with various event and gate logic symbols, defined in Table VII-2. Although many event and gate symbols exist, most fault trees can be constructed with the following four symbols: (1) TOP or intermediate event, (2) inclusive OR gate, (3) AND gate, and (4) basic event. Figure VII-1 illustrates the procedure for constructing a fault tree [1]. A frequent error in fault tree construction is neglecting to identify common causes. A common cause is a condition, event, or phenomenon that will simultaneously induce two or more elements of the fault tree to occur. A method for detecting common causes is described in Section 3 (Item 8). Additional details are included in the latter sections of this lesson to provide insight into the mathematics involved in the commercially available fault tree programs. Large trees are typically analyzed using these programs; for small trees, hand analysis may be practical.

VII-4

Table VII-2. Fault tree construction symbols

(The graphic symbols themselves are not reproduced here; the names and descriptions follow.)

Event (TOP or Intermediate)*: The TOP Event is the conceivable, undesired event to which failure paths of lower-level events lead. An Intermediate Event describes a system condition produced by preceding events.

Inclusive OR Gate*: An output occurs if one or more inputs exist. Any single input is necessary and sufficient to cause the output event to occur. Refer to Table VII-3 for additional information.

Exclusive OR Gate: An output occurs if one, but only one, input exists. Any single input is necessary and sufficient to cause the output event to occur. Refer to Table VII-3 for additional information.

Mutually Exclusive OR Gate: An output occurs if one or more inputs exist; however, all other inputs are then precluded. Any single input is necessary and sufficient to cause the output event to occur. Refer to Table VII-3 for additional information.

AND Gate*: An output occurs if all inputs exist. All inputs are necessary and sufficient to cause the output event to occur.

Priority AND Gate: An output occurs if all inputs exist and occur in a predetermined sequence. All inputs are necessary and sufficient to cause the output event to occur.

Basic Event*: An initiating fault or failure that is not developed further. These events determine the resolution limit of the analysis. They are also called leaves or initiators.

INHIBIT Gate: An output occurs if a single input event occurs in the presence of an enabling condition. Mathematically treated as an AND gate.

External Event: An event that under normal conditions is expected to occur. Probability = 1.

Undeveloped Event: An event not developed further because of a lack of need, resources, or information.

Conditioning Event: Used to affix conditions, restraints, or restrictions to other events.

* Most fault trees can be constructed with these four symbols.

[Figure VII-1 diagrams the construction process:] Identify the undesirable TOP event. Identify first-level contributors. Link contributors to the TOP event by logic gates. Identify second-level contributors. Link second-level contributors to the first level by logic gates. Repeat and continue until initiators are reached.

© 1997 Figure provided courtesy of Sverdrup Technology, Inc., Tullahoma, Tennessee [1]

Figure VII-1. Fault tree construction process

Probability Determination

If a fault tree is to be used as a quantitative tool, the probability of failure must be determined for each basic event or initiator. Sources for these failure probabilities include manufacturers' data, industry consensus standards, MIL standards, historical evidence (of the same or similar systems), simulation or testing, Delphi estimates, and the log-average method. A source for human error probabilities is found in [2]. The Delphi technique derives estimates from the consensus of experts. The log-average method is useful when the failure probability cannot be estimated but credible upper and lower boundaries can be estimated; this technique is described in [3] and is illustrated in Figure VII-2. Failure probabilities can also be determined from a probabilistic design analysis, as discussed in Lesson XV.

VII-6

If probability is not estimated easily, but upper and lower credible bounds can be judged:

- Estimate the upper and lower credible bounds of probability for the phenomenon in question.
- Average the logarithms of the upper and lower bounds.
- The antilogarithm of the average of the logarithms is less than the upper bound and greater than the lower bound by the same factor; thus, it is geometrically midway between the limits of estimation.

Log Average = Antilog [(Log PL + Log PU) / 2]

For example, with a lower probability bound PL = 10^-2 and an upper probability bound PU = 10^-1:

Log Average = Antilog [((-2) + (-1)) / 2] = 10^-1.5 = 0.0316228

Note that, for the example shown, the arithmetic average would be (0.01 + 0.1)/2 = 0.055, i.e., 5.5 times the lower bound and 0.55 times the upper bound.

Reference: Briscoe, Glen J. [1982]. Risk management guide. System Safety Development Center, SSDC-11, DOE 76-45/11.

© 1997 Figure provided courtesy of Sverdrup Technology, Inc., Tullahoma, Tennessee [1]

Figure VII-2. Log average method of probability estimation
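The log-average computation in Figure VII-2 is a one-line calculation; a sketch (the function name is mine):

```python
from math import log10, sqrt

def log_average(p_lower, p_upper):
    """Antilog of the mean of the log bounds: the geometric midpoint."""
    return 10 ** ((log10(p_lower) + log10(p_upper)) / 2)

p = log_average(0.01, 0.1)
print(round(p, 4))  # 0.0316
# Equivalently, the geometric mean of the two bounds:
assert abs(p - sqrt(0.01 * 0.1)) < 1e-12
```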

Probabilities must be used with caution to avoid loss of credibility of the analysis. In many cases, it is best to stay with comparative probabilities rather than "absolute" values. Normalizing data to a standard, explicitly declared, meaningless value is a useful technique here. Also, confidence or error bands on each cited probability number are required to determine the significance of any quantitatively driven conclusion. Once probabilities are estimated for all basic events or initiators, they are propagated through logic gates to the intermediate events and finally to the TOP event. The probability of failure of independent inputs through an AND gate is the intersection of their respective individual probabilities. The probability of failure of independent inputs through an OR (inclusive) gate is the union of their respective individual probabilities. Propagation of confidence and error bands is performed simply by propagation of minimum and maximum values within the tree. Figure VII-3 illustrates the relationship between reliability and failure probability propagation for two and three inputs through OR (inclusive) and AND gates. Propagation of failure probabilities for two independent inputs through AND and OR (inclusive) gates is conceptually illustrated in Figure VII-4. As shown in Figure VII-3, the propagation solution through an OR gate is simplified by the rare event approximation assumption. Figure VII-5 presents the exact solution for OR gate propagation; however, the use of this exact solution is seldom warranted. Table VII-3 presents the propagation equations for the logic gates, including the gates infrequently used.


[Figure VII-3 (image); its equations, reconstructed from the figure residue, relate reliability to failure probability:

OR Gate. Either of two independent element failures produces system failure.
For 2 inputs: RT = RA·RB
PF = 1 - RT = 1 - RA·RB = PA + PB - PA·PB ≈ PA + PB ("rare event approximation")

AND Gate. Both independent elements must fail to produce system failure.
For 2 inputs: RT = RA + RB - RA·RB
PF = 1 - RT = 1 - (RA + RB - RA·RB) = 1 - [(1 - PA) + (1 - PB) - (1 - PA)(1 - PB)] = PA·PB

For 3 inputs the expressions extend analogously; the higher-order product terms are omitted under the rare event approximation.]

© 1997 Figure provided courtesy of Sverdrup Technology, Inc., Tullahoma, Tennessee [1]

Figure VII-3. Relationship between reliability and failure probability propagation

[Figure VII-4 (image) conceptually illustrates failure probability propagation through an AND gate and an OR gate for two independent input events, 1 and 2.]

© 1997 Figure provided courtesy of Sverdrup Technology, Inc., Tullahoma, Tennessee [1]

Figure VII-4. Failure probability propagation through OR and AND gates


The ∐ operator (coproduct) is the co-function of Π (the product operator). It provides an exact solution for propagating probabilities through the OR gate. Its use is rarely justifiable. Writing each input's complement as (1 - Pi):

PT = ∐ Pe = 1 - Π(1 - Pe)

PT = 1 - [(1 - P1)(1 - P2)(1 - P3) ... (1 - Pn)]

© 1997 Figure provided courtesy of Sverdrup Technology, Inc., Tullahoma, Tennessee [1]

Figure VII-5. Exact solution of OR gate failure probability propagation

Table VII-3. Probability propagation expressions for logic gates
Name                              Propagation Expressions
Inclusive OR Gate†                PT = P1 + P2 - (P1P2);  PT = P1 + P2 #
Exclusive OR Gate                 PT = P1 + P2 - [2(P1P2)];  PT = P1 + P2 #
Mutually Exclusive OR Gate        PT = P1 + P2
AND Gate† (and Priority AND Gate) PT = P1P2

† Most fault trees can be constructed with these two logic gates.
# Simplified expression for rare event approximation assumption.

(The symbol and Venn diagram columns of the original table are omitted.)
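The table's expressions can be checked numerically. The following sketch (illustrative probability values, not from the text) also shows how small the rare event approximation's error is for small probabilities:

```python
# Numerical check of the Table VII-3 propagation expressions for two
# independent inputs; p1 and p2 are illustrative values.
p1, p2 = 0.01, 0.02

inclusive_or      = p1 + p2 - p1 * p2      # union: either or both occur
exclusive_or      = p1 + p2 - 2 * p1 * p2  # exactly one occurs
mutually_excl_or  = p1 + p2                # events cannot co-occur
and_gate          = p1 * p2                # both must occur
rare_event_approx = p1 + p2                # simplified OR expression

# For small probabilities, the approximation overstates the exact
# inclusive-OR result only by the product term p1*p2:
print(inclusive_or, rare_event_approx)
```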


Identifying Cut Sets

A cut set is any group of initiators that will produce the TOP event if all the initiators in the group occur. A minimal cut set is the smallest such group (in terms of number of elements, not probability). One method of determining and analyzing cut sets is presented below. These procedures are described in [1] and are based on the MOCUS computer algorithm attributed to J.B. Fussell. Analysis of a cut set can help evaluate the probability of the TOP event, identify common cause vulnerability, and assess common cause probability. Cut sets also enable analysis of the structural, quantitative, and item significance of the tree.

Determining Cut Sets

Cut sets are determined via the following procedure:

(1) Consider only the basic events or initiators (discarding intermediate events and the TOP event).

(2) Assign a unique letter to each gate and a unique number to each initiator, starting from the top of the tree.

(3) From the top of the tree downward, create a matrix using the letters and numbers. The letter for the gate directly beneath the TOP event will be the first entry in the matrix. Proceed through the matrix construction by (a) substituting the letters for each AND gate with the letters of the gates and the numbers of the initiators that input into that gate (arranging these letters and numbers horizontally in the matrix rows), and (b) substituting the letters for each OR gate with the letters of the gates and the numbers of the initiators that input into that gate (arranging these letters and numbers vertically in the matrix columns).

(4) When all the gates/letters have been replaced, a final matrix is produced containing only initiator numbers. Each row of this matrix represents a Boolean-indicated cut set.

(5) Visually inspect the final matrix and eliminate any row that contains all elements of a lesser row. Next, through visual inspection, eliminate redundant elements within rows and rows that repeat other rows. The remaining rows define the minimal cut sets of the fault tree.
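The matrix-substitution steps above can be sketched in code. This MOCUS-style sketch is illustrative, not from the module; gate names are assumed unique letters and initiators are assumed to be numbers:

```python
# Sketch of the MOCUS-style matrix substitution described above.
# A tree maps each gate name to ("AND" | "OR", [inputs]); gate names are
# strings (assumed unique), initiators are integers.

def cut_sets(tree, top):
    rows = [[top]]
    while any(isinstance(e, str) for row in rows for e in row):
        new_rows = []
        for row in rows:
            gate = next((e for e in row if isinstance(e, str)), None)
            if gate is None:
                new_rows.append(row)
                continue
            kind, inputs = tree[gate]
            rest = [e for e in row if e != gate]
            if kind == "AND":   # replace horizontally: one longer row
                new_rows.append(rest + inputs)
            else:               # OR: replace vertically: one new row per input
                new_rows.extend(rest + [i] for i in inputs)
        rows = new_rows
    # Reduce Boolean-indicated cut sets to minimal cut sets.
    sets = {frozenset(r) for r in rows}
    return [s for s in sets if not any(o < s for o in sets)]

# Example tree: A = AND(B, D); B = OR(1, C); C = AND(2, 3); D = OR(2, 4)
tree = {"A": ("AND", ["B", "D"]), "B": ("OR", [1, "C"]),
        "C": ("AND", [2, 3]),     "D": ("OR", [2, 4])}
print(sorted(sorted(s) for s in cut_sets(tree, "A")))  # [[1, 2], [1, 4], [2, 3]]
```

The frozenset reduction at the end performs steps (4) and (5): duplicate elements within a row collapse automatically, and any row containing all elements of a lesser row is discarded.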

Assessing Cut Sets

(6) A cut set is any group of initiators that will produce the TOP event, if all the initiators in the group occur. Thus, the cut set probability, PK (the probability that the cut set will induce the TOP event), is mathematically the same as the propagation through an AND gate, expressed as:

PK = Π Pe (the product of the probabilities of the initiators in the cut set)

(7) Determine common cause vulnerability by assigning unique letter subscripts for common causes to each numbered initiator (such as m for moisture, h for human operator, q for heat, v for vibration, etc.). Note that some initiators may have more than one subscript, while others will have none. Identify minimal cut sets for which all elements have identical subscripts. If any are identified, then the TOP event is vulnerable to the common cause the subscript represents. This indicates that the probability number, calculated as above, may be significantly in error, since the same event (the so-called common cause) could act to precipitate each event, i.e., they no longer represent statistically independent events.
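The subscript screen in step (7) amounts to intersecting the cause codes of each cut set's initiators. A sketch (the cause codes and values are illustrative, not from the text):

```python
# Sketch of the common cause screen in step (7): flag any minimal cut set
# whose every initiator shares a susceptibility code
# (e.g., "m" moisture, "h" human operator, "v" vibration).

def common_cause_vulnerable(min_cut_sets, subscripts):
    """subscripts maps each initiator to a set of common cause codes."""
    hits = []
    for cs in min_cut_sets:
        shared = set.intersection(*(subscripts.get(i, set()) for i in cs))
        if shared:   # every element of this cut set shares a cause
            hits.append((sorted(cs), sorted(shared)))
    return hits

cuts = [{1, 2}, {1, 4}, {2, 3}]
subs = {1: {"m"}, 2: {"m", "v"}, 3: {"h"}, 4: set()}
print(common_cause_vulnerable(cuts, subs))  # [([1, 2], ['m'])]
```

Here the cut set {1, 2} is vulnerable to moisture: a single moisture event could trip both initiators, so their probabilities are no longer independent and the AND-gate product understates the cut set probability.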


(8) Analyze the probability of each common cause occurring and inducing all terms within the affected cut set.

(9) Assess the structural significance of the cut sets to provide a qualitative ranking of contributions to system failure. Assuming all other things are equal:

a. A cut set with many elements indicates low vulnerability.
b. A cut set with few elements indicates high vulnerability.
c. Numerous cut sets indicate high vulnerability.
d. A cut set with a single initiator, called a singleton, indicates a potential single-point failure.

(10) Assess the quantitative importance, IK, of each cut set, K. That is, determine the numerical probability that this cut set induced the TOP event, assuming it has occurred:

IK = PK / PT

where PK = the probability that the cut set will occur (see Item 6 above), and PT = the probability of the TOP event occurring.

(11) Assess the quantitative importance, Ie, of each initiator, e. That is, determine the numerical probability that initiator e contributed to the TOP event, if it has occurred:

Ie = Σ IKe (summed from K = 1 to Ne)

where Ne = the number of minimal cut sets containing initiator e, and IKe = the importance of the minimal cut sets containing initiator e.
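Steps (10) and (11) can be sketched as follows. The probability values are illustrative, and PT is taken as the rare-event sum of the cut set probabilities (an assumption consistent with the OR-gate approximation used earlier):

```python
# Sketch of steps (10) and (11): quantitative importance of cut sets (I_K)
# and of initiators (I_e), using the rare event approximation for P_T.

def cut_set_prob(cs, p):
    """Step (6): P_K is the product of the initiator probabilities."""
    prob = 1.0
    for e in cs:
        prob *= p[e]
    return prob

def importances(min_cut_sets, p):
    pk = [cut_set_prob(cs, p) for cs in min_cut_sets]
    p_top = sum(pk)                    # rare event approximation of P_T
    i_k = [x / p_top for x in pk]      # cut set importance: P_K / P_T
    i_e = {e: sum(ik for cs, ik in zip(min_cut_sets, i_k) if e in cs)
           for e in p}                 # initiator importance: sum of I_Ke
    return i_k, i_e

cuts = [{1, 2}, {1, 4}, {2, 3}]
p = {1: 1e-3, 2: 1e-3, 3: 1e-2, 4: 1e-4}
i_k, i_e = importances(cuts, p)
# i_e ranks initiators by their contribution to the TOP event.
```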

Identifying Path Sets

A path set is a group of fault tree initiators that, if none of them occurs, ensures the TOP event cannot occur. Path sets can be used to transform a fault tree into a reliability diagram (see Lesson X). The procedure to determine path sets is as follows:

(1) Exchange all AND gates for OR gates and all OR gates for AND gates on the fault tree.

(2) Construct a matrix in the same manner as for cut sets (see Determining Cut Sets, Steps 1-5). Each row of the final matrix defines a path set of the original fault tree.
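The two-step procedure above can be sketched in code. The matrix-substitution routine is inlined so the sketch stands alone; the example tree is illustrative, not from the module:

```python
# Sketch of the path set procedure: swap AND and OR gates, then run the
# same MOCUS-style matrix substitution used for cut sets.

def expand(tree, top):
    """Matrix substitution; gates are strings, initiators are integers."""
    rows = [[top]]
    while any(isinstance(e, str) for row in rows for e in row):
        new_rows = []
        for row in rows:
            gate = next((e for e in row if isinstance(e, str)), None)
            if gate is None:
                new_rows.append(row)
                continue
            kind, inputs = tree[gate]
            rest = [e for e in row if e != gate]
            if kind == "AND":
                new_rows.append(rest + inputs)      # horizontal replacement
            else:
                new_rows.extend(rest + [i] for i in inputs)  # vertical
        rows = new_rows
    sets = {frozenset(r) for r in rows}
    return [s for s in sets if not any(o < s for o in sets)]

def path_sets(tree, top):
    """Step (1): exchange gate types; step (2): reuse the cut set procedure."""
    swapped = {g: ("OR" if kind == "AND" else "AND", inputs)
               for g, (kind, inputs) in tree.items()}
    return expand(swapped, top)

tree = {"A": ("AND", ["B", "D"]), "B": ("OR", [1, "C"]),
        "C": ("AND", [2, 3]),     "D": ("OR", [2, 4])}
print(sorted(sorted(s) for s in path_sets(tree, "A")))  # [[1, 2], [1, 3], [2, 4]]
```

For this tree, preventing initiators 1 and 2 (for example) makes the TOP event impossible: with 1 and 2 blocked, gate B has no surviving input path, so the AND at the top cannot be satisfied.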


EXAMPLES

Fault Tree Construction and Probability Propagation

Figure VII-6 gives an example of a fault tree with probabilities propagated to the TOP event. In this example, the TOP event is "artificial wakeup fails," and the system being examined consists of alarm clocks used to awaken someone. For brevity, only a nominal probability value for each fault tree initiator is propagated through the fault tree to the TOP event. For a thorough analysis, however, both low and high probability values defining a probability band for each initiator could be propagated through the fault tree to determine a probability band for the TOP event.

[Figure VII-6 (image) shows the example fault tree. Its key gives each initiator's failure rate in faults/year and the corresponding probability in faults/operation, assuming 260 operations/year (e.g., a rate of 2 faults/year corresponds to roughly 8 × 10⁻³ faults/operation).]

© 1997 Figure provided courtesy of Sverdrup Technology, Inc., Tullahoma, Tennessee [1]

Figure VII-6. Example fault tree
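The figure's key converts initiator failure rates into per-demand probabilities. A sketch of that conversion (the 260 operations/year assumption comes from the figure; the sample rate is illustrative):

```python
# Converting a failure rate in faults/year to a per-demand probability in
# faults/operation, as in the key of Figure VII-6.
OPS_PER_YEAR = 260  # assumption stated in the figure

def faults_per_operation(faults_per_year):
    return faults_per_year / OPS_PER_YEAR

# A rate of 2 faults/year is about 7.7e-3 faults/operation, which is on
# the order of 8e-3 when rounded to one significant figure:
print(faults_per_operation(2.0))
```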

Cut Sets

Figure VII-7 gives an example of how to determine Boolean-indicated minimal cut sets for a fault tree.

Path Sets

Figure VII-8 gives an example of how to determine path sets for a fault tree.


PROCEDURE:
• Assign letters to gates. (TOP gate is "A.") Do not repeat letters.
• Assign numbers to basic initiators. If a basic initiator appears more than once, represent it by the same number at each appearance.
• Construct a matrix, starting with the TOP "A" gate:

1. The TOP event gate, A, is the initial matrix entry.
2. A is an AND gate. B and D, its inputs, replace it horizontally.
3. B is an OR gate. 1 and C, its inputs, replace it vertically; each requires a new row.
4. C is an AND gate. 2 and 3, its inputs, replace it horizontally.
5. D (top row) is an OR gate. 2 and 4, its inputs, replace it vertically; each requires a new row.
6. D (second row) is an OR gate. Replace as before.

The resulting Boolean-indicated cut sets (1 2; 2 2 3; 1 4; 2 4 3) reduce to these minimal cut sets: 1 2; 1 4; 2 3.

© 1997 Figure provided courtesy of Sverdrup Technology, Inc., Tullahoma, Tennessee [1]

Figure VII-7. Example of determining cut sets


Path sets are the least groups of initiators which, if none of them can occur, guarantee against the TOP event occurring.

[Figure VII-8 (image) shows an example tree together with its minimal cut sets and its path sets. "Barring" a term (e.g., 1̄) denotes consideration of its complemented (non-occurrence) properties.]

© 1997 Figure provided courtesy of Sverdrup Technology, Inc., Tullahoma, Tennessee [1]

Figure VII-8. Example of determining path sets

ADVANTAGES

An FTA has the following advantages [1]:

• Enables assessment of probabilities of combined faults/failures within a complex system.
• Single-point and common cause failures can be identified and assessed.
• System vulnerability and low-payoff countermeasures are identified, thereby guiding deployment of resources for improved control of risk.
• This tool can be used to reconfigure a system to reduce vulnerability.
• Path sets can be used in trade studies to compare reduced failure probabilities with the cost increases needed to implement countermeasures.

LIMITATIONS

An FTA has the following limitations [1]:

• It addresses only one undesirable condition or event, which must be foreseen by the analyst. Thus, several or many fault tree analyses may be needed for a particular system.
• Fault trees used for probabilistic assessment of large systems may not fit or run on conventional PC-based software.


• The generation of an accurate probabilistic assessment may require significant time and resources.
• Caution must be taken not to "overwork" determining probabilities or evaluating the system; i.e., limit the size of the tree.
• A fault tree is not accurate unless all significant contributors of faults or failures are anticipated.
• Events or conditions under the same logic gate must be independent of each other. A fault tree is flawed if common causes have not been identified.
• Events or conditions at any level of the tree must be independent and immediate contributors to the next-level event or condition.
• The failure rate of each initiator must be constant and predictable.
• Specific (noncomparative) estimates of failure probabilities are typically difficult to find, to achieve agreement on, and to use successfully to drive conclusions. Comparative analyses are typically just as valuable and are better received by program and design teams.


REFERENCES

1. Clemens, PL [1993]. Fault tree analysis (lecture presentation). 4th ed. Tullahoma, TN: Sverdrup Technology, Inc. (see http://www.sverdrup.com/svt for a set of presentation slides).

2. Swain, AD, Guttman, HE [1980]. Handbook of human reliability analysis with emphasis on nuclear power plant applications. Washington, DC: U.S. Government Printing Office, NUREG/CR-1278.

3. Briscoe, GJ [1982]. Risk management guide. Idaho Falls, ID: EG&G Idaho, Inc. SSDC-11, DOE 76-45/11.

SUGGESTED READINGS

Crosetti, PA [1982]. Reliability and fault tree analysis guide. Washington, DC: Department of Energy No. DOE 76-45/22.

Dhillon, BS, Singh, C [1981]. Engineering reliability - new techniques and applications. New York: John Wiley and Sons.

Fussell, JB, Burdick, GR [1977]. Nuclear systems reliability engineering and risk assessment. Philadelphia, PA: Society for Industrial and Applied Mathematics.

Gough, WS, Riley, J, Koren, JM [1990]. A new approach to the analysis of reliability block diagrams. Proceedings from annual reliability and maintainability symposium. SAIC, Los Altos, CA.

Hammer, W [1972]. Handbook of system and product safety. Englewood Cliffs, NJ: Prentice Hall.

Henley, EJ, Kumamoto, H [1991]. Probabilistic risk assessment. New York: The Institute of Electrical and Electronics Engineers, Inc.

Malasky, SW [1983]. System safety: technology and application. New York: Garland Press.

Roberts, NH, Vesely, WE, Haasl, DF, Goldberg, FF [1980]. Fault tree handbook. Washington, DC: U.S. Government Printing Office, NUREG-0492.

Roland, HE, Moriarty, B [1990]. System safety engineering and management. 2nd ed. New York: John Wiley and Sons.

Stephans, RA, Talso, WW, eds. [1997]. System safety analysis handbook. 2nd ed. Albuquerque, NM: New Mexico Chapter of the System Safety Society.

Wynholds, W, Potterfield, R, Bass, L [1975]. Fault tree graphics - application to system safety. Proceedings of the second international system safety conference.


SAMPLE DISCUSSION AND EXAMINATION QUESTIONS

1. Why do system safety analysts refer to fault tree analysis as a top-down or deductive technique?
2. What is the first requirement for constructing a fault tree (where do you start)?
3. What is the rare-event approximation, and why is it used in fault tree analysis?
4. What role does Boolean algebra play in fault tree analysis?
5. Why does the fault tree analyst determine and evaluate cut sets for a fault tree?
6. What is meant by cut set importance (or how is it used)?
7. What is meant by item importance (or how is it used)?
8. What is the difference between a primary and a secondary component failure?
9. Which symbols are traditionally used to depict primary and secondary component failures when constructing a fault tree?
10. What is a primary advantage of fault tree analysis over failure modes and effects analysis?
11. How can fault tree analysis be used to detect vulnerability to common cause failure?
12. What is the purpose of assessing cut sets?
13. What is the purpose of assessing path sets?

Lecture slides for workshop problems entitled "Furry Slurry Processing," "Test Cell Entry," "Dual Hydraulic Brake System - a Flawed Fault Tree," "Auxiliary Feed Water System," "Rocket Motor Firing Circuit," "The Stage to Placer Gulch," and "Competing Redundant Valve Systems" are available at http://www.sverdrup.com/svt.

