Fault Tree Analysis (Reliability)
for Beginners
Introduction to Reliability & its
Applications
• Reliability is the probability that an item can perform a
required function under given conditions for
a given time interval.
• Reliability is measured by failures over a
period of time and applies to any
mechanical or electrical equipment as well as
to human tasks.
Bath - tub Curve
• Reliability analysis is applied to system components
which are assumed to have settled down
into the steady state, or useful life, phase.
• The reliability characteristics of most
components follow the so-called reliability
‘Bath - tub Curve’.
[Figure: Reliability bathtub curve. Vertical axis: hazard rate (or number of failures). Horizontal axis: component life cycle in years (1, 2, 3 ... 30). Regions: I burn-in, II useful life, III wear-out.]
The three phases of the ‘Bath - tub
Curve’
• Phase I - Declining Hazard rate as weak
components are eliminated (Burn-in period)
• Phase II - Approximately constant Hazard rate due
to chance failure (Useful life)
• Phase III - Increasing hazard rate due to age (Wear
out period)
COST IMPLICATION OF
RELIABILITY
• Reliability assessment and implementation
will optimise life cycle cost of equipment.
• The life cycle is: -
• Design and prototype development
• Manufacture
• In-service development
• Maintenance
• Obsolescence
• Withdrawal from service
• Eventual scrapping and disposal
Reliability Methodology
[Diagram: the following reliability databases feed the Reliability Modelling & Analysis Program]
• Mechanical Components Reliability Database
• Electrical Components Reliability Database
• Instrument Components Reliability Database
• Human Error Reliability Database
[Diagram output: Reliability Driven Design; System Design Performance Monitoring Criteria and Life Cycle Costs]
Analysis Framework
• Define the existing system design concept
and various sub systems
• Identify the components (i.e. within the sub
systems)
• Specify the requirement and performance
criteria
• Apply reliability analysis to the overall
system
Some Useful Definitions
• Failure Rate; The number of failures experienced or expected for a
device divided by the total equipment operating time. For a constant
failure rate, it is the numerical inverse of the mean time between failures (MTBF).
• Mean Time to Repair (MTTR); The total amount of time spent
performing all corrective maintenance repairs divided by the number of those repairs.
• Redundancy; The existence of more than one piece of equipment, any
of which could perform a given function.
• Fault Tree Analysis; A deductive, top-down method of analyzing
system design and performance based on system component failure
logic. It is performed graphically using a logical structure of AND, OR
and Voting gates.
• Common Cause Failure; An event or mechanism that can cause two or
more failures simultaneously is called a common cause.
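As a rough illustration of the Failure Rate and MTBF definitions above, the sketch below uses made-up operating figures (10 items, one year each, 4 failures). The function names are hypothetical and the inverse relation assumes a constant failure rate.

```python
HOURS_PER_YEAR = 8760

def failure_rate(failures, operating_hours):
    """Failures divided by total equipment operating time (per hour)."""
    return failures / operating_hours

def mtbf(rate_per_hour):
    """Mean time between failures: the inverse of a constant failure rate."""
    return 1.0 / rate_per_hour

# Illustrative figures only: 10 items each run for one year, 4 failures seen.
total_hours = 10 * HOURS_PER_YEAR
lam = failure_rate(4, total_hours)

print(f"Failure rate = {lam * 1e6:.2f} per million item hours")
print(f"MTBF         = {mtbf(lam):.0f} hours")
```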
Basics of Simple Reliability
Modelling
• Reliability Modelling is based on Boolean
Algebra and uses Fault Tree Analysis.
• The objective of this presentation is to simplify it
and use Excel spreadsheets for high level
applications (e.g. high level SIL
calculations).
Introduction to Fault Tree Logic Gates
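[Diagram: ‘OR’ Gate, combining Input 1 ... Input n into one Output]
[Diagram: ‘AND’ Gate, combining Input 1 ... Input n into one Output]
[Diagram: ‘Voting’ Gate, combining Input 1 ... Input n into one Output]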
‘OR’ Gate Modelling
[Diagram: ‘OR’ Gate with inputs P1 and P2 and output Ptotal]
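• For ‘OR’ Gate add Probability of Failures, thus P total = P1 + P2 (a close approximation when P1 and P2 are small; for independent events the exact value is P1 + P2 - P1 x P2)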
• Example 1: An ESD loop fails if either the valve or the sensor
fails.
• Example 2: A household power cut occurs if either the power
transmission lines or power generation fail.
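The ‘OR’ gate rule can be sketched in a few lines of Python. This is a minimal illustration, assuming independent inputs; the function names and the valve and sensor probabilities are hypothetical. It compares the simple sum with the exact value to show how close the two are when the inputs are small.

```python
def or_gate_approx(*probabilities):
    """Rare-event approximation: sum of the input failure probabilities."""
    return sum(probabilities)

def or_gate_exact(*probabilities):
    """Exact OR for independent inputs: 1 - product of (1 - p)."""
    all_survive = 1.0
    for p in probabilities:
        all_survive *= 1.0 - p
    return 1.0 - all_survive

# Hypothetical ESD loop: valve and sensor failure probabilities.
p_valve, p_sensor = 0.003, 0.001
print(f"Sum of inputs = {or_gate_approx(p_valve, p_sensor):.6f}")
print(f"Exact value   = {or_gate_exact(p_valve, p_sensor):.6f}")
```

The sum always slightly overstates the exact value, so it errs on the conservative side.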
‘AND’ Gate Modelling
[Diagram: ‘AND’ Gate with inputs P1 and P2 and output Ptotal]
• For ‘AND’ Gate multiply Probability of Failures, thus P total = P1 x P2 (assuming the input events are independent)
• Example 1: For compressor damage caused by hydraulic imbalance, the
surge control valve must fail and the vibration interlock must also fail
to shut down the compressor, i.e. there must be a coincidence of
hazardous events.
• Example 2: For a kitchen fire / explosion, oven gas must leak and an
ignition source must also be present.
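A minimal sketch of the ‘AND’ gate rule, assuming the input events are independent. The function name and the two probabilities for the compressor example are hypothetical placeholders, not plant data.

```python
from math import prod

def and_gate(*probabilities):
    """AND gate for independent inputs: product of the input probabilities."""
    return prod(probabilities)

# Hypothetical values for the compressor example.
p_surge_valve = 0.01
p_vibration_interlock = 0.002

p_damage = and_gate(p_surge_valve, p_vibration_interlock)
print(f"P(compressor damage) = {p_damage:.1e}")
```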
‘VOTING’ Gate Modelling
[Diagram: ‘Voting’ Gate with inputs Device 1, Device 2 and Device 3, each with failure probability PD, and output Ptotal]
• The most common form is “2 out of 3” sensors / switches.
• Example: 2oo3 pressure detection
• The detectors are commonly identical and have similar failure
probabilities ‘PD’
• $P_{total} = 3 P_D^2 (1 - P_D) + P_D^3$
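The 2oo3 expression, 3 PD^2 (1 - PD) + PD^3, is the probability that at least two of three identical, independent devices have failed. The sketch below checks the closed form against a general M-out-of-N binomial sum. The function names and the value of PD are hypothetical.

```python
from math import comb

def voting_gate_failure(p_device, m=2, n=3):
    """Failure probability of an M-out-of-N voting gate.

    Assumes n identical, independent devices. The gate fails once fewer
    than m devices still work, i.e. at least n - m + 1 devices have failed.
    """
    min_failed = n - m + 1
    return sum(
        comb(n, k) * p_device**k * (1.0 - p_device) ** (n - k)
        for k in range(min_failed, n + 1)
    )

def two_out_of_three(p_device):
    """Closed form for 2oo3: 3 * PD^2 * (1 - PD) + PD^3."""
    return 3 * p_device**2 * (1 - p_device) + p_device**3

p_d = 0.01  # hypothetical detector failure probability
print(f"Binomial sum = {voting_gate_failure(p_d):.3e}")
print(f"Closed form  = {two_out_of_three(p_d):.3e}")
```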
How to Calculate Item Failure
Probability
• Basic Equation; $R = e^{-\lambda t}$
• Reliability = e^-(Failure Rate x Time)
• Failure Probability = 1 - Reliability
• At low values of λt (i.e. λt << 1) Reliability ≈ 1 - λt
• Thus
$P_{failure} \approx \lambda t$
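To see where the approximation P = λt holds, the sketch below compares it with the exact value 1 - e^(-λt) over a range of times. The failure rate and the time values are hypothetical, chosen only to span small and large λt.

```python
import math

def reliability(failure_rate, time):
    """R = exp(-lambda * t) for a constant failure rate."""
    return math.exp(-failure_rate * time)

def failure_probability(failure_rate, time):
    """Exact failure probability: 1 - R."""
    return 1.0 - reliability(failure_rate, time)

lam = 40e-6  # per hour, hypothetical value
for hours in (5, 500, 5000, 50000):
    exact = failure_probability(lam, hours)
    approx = lam * hours
    print(f"t = {hours:6d} h  exact = {exact:.6f}  lambda*t = {approx:.6f}")
```

The two columns agree closely while λt is small and diverge once λt approaches 1, where λt can even exceed 1 and stops being a valid probability.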
Revealed Failure Probability
$P_{failure} = \lambda t$
• Pfailure = Failure probability between 0 and 1
• t = Mean time to repair an item (MTTR) - hour
• λ= Failure rate (per hour in this equation; often quoted per million item hours)
Hidden (Dormant) Failure Probability
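$P_{failure} = \lambda \left( \frac{\theta}{2} + MTTR \right)$
• θ = Maintenance Frequency (interval between maintenance) - hour
• Pfailure = Failure probability between 0 and 1
• t = Mean time to repair an item (MTTR) - hour
• λ= Failure rate (per hour in this equation; often quoted per million item hours)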
Failure Data
• Estimate component failure rate for reliability
analysis - use Generic historical performance
analysis of components or Vendor Data.
• Failure rate is the rate at which failure occurs as a
function of time or as a function of demand.
• It is expressed as the expected number of failures
of a given failure mode, per item, as failures per
million item hours.
Typical Generic Failure Data (OREDA -2002)
• Turbine Driven Centrifugal Compressor (10000-20000) kW Mean
Critical Failure Rate = 79.26e-6/hr and Active Repair Time = 5.6 hours
• Gas Turbine Aeroderivative (10000-20000) kW Mean Critical Failure
Rate = 591.75e-6/hr and Active Repair Time = 15.9 hours
• Centrifugal Oil Export Pump Mean Critical Failure Rate = 183.60e-6/hr
and Active Repair Time = 53.3 hours
• Conventional PSV Mean Critical Failure Rate = 3.84e-6/hr and Active
Repair Time = 6.3 hours
• ESD Valve Mean Critical Failure Rate = 18.82e-6/hr and Active
Repair Time = 3.5 hours
• Process Control Valve Mean Critical Failure Rate = 6.91e-6/hr and
Active Repair Time = 14.2 hours
• Level Process Sensor Mean Critical Failure Rate = 3.99e-6/hr and
Active Repair Time = 5.0 hours
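As an illustration of how such data might be used, the sketch below applies the revealed failure relation P = λt to the figures listed above. Taking the Active Repair Time as t is an assumption made here for illustration, and the names `OREDA_2002` and `revealed_failure_probability` are hypothetical.

```python
# (item, mean critical failure rate per hour, active repair time in hours)
OREDA_2002 = [
    ("Turbine driven centrifugal compressor", 79.26e-6, 5.6),
    ("Gas turbine, aeroderivative", 591.75e-6, 15.9),
    ("Centrifugal oil export pump", 183.60e-6, 53.3),
    ("Conventional PSV", 3.84e-6, 6.3),
    ("ESD valve", 18.82e-6, 3.5),
    ("Process control valve", 6.91e-6, 14.2),
    ("Level process sensor", 3.99e-6, 5.0),
]

def revealed_failure_probability(rate_per_hour, repair_hours):
    """P = lambda * t, valid while lambda * t is much smaller than 1."""
    return rate_per_hour * repair_hours

results = {
    name: revealed_failure_probability(rate, hours)
    for name, rate, hours in OREDA_2002
}
for name, p in results.items():
    print(f"{name:40s} P = {p:.2e}")
```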
Revealed Failure Probability Example
If the car gearbox failure rate is λ = 40e-6/hr (or
0.350 failures per year), then using the following
equation, $P_{failure} = \lambda t$
(It is a revealed failure as the car will not move if
the gearbox is broken)
t = Mean time to repair car gearbox is 5 hrs
The failure probability is P = 40e-6 x 5 = 0.0002
(i.e. reliability of 99.98%)
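The gearbox arithmetic can be reproduced in a few lines. This sketch uses only the figures from the example; the variable names are illustrative.

```python
HOURS_PER_YEAR = 8760

failure_rate = 40e-6  # per hour, from the example
repair_time = 5       # hours (MTTR), from the example

p_failure = failure_rate * repair_time
print(f"Failures per year   = {failure_rate * HOURS_PER_YEAR:.3f}")
print(f"Failure probability = {p_failure:.4f}")
print(f"Reliability         = {(1 - p_failure) * 100:.2f}%")
```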
Hidden (or Dormant) Failure Probability
Example
If the car brake failure rate is λ = 1.5e-6/hr (or 0.013 failures
per year), then using the following equation;
$P_{failure} = \lambda \left( \frac{\theta}{2} + MTTR \right)$
(It is hidden ‘or dormant’ as the failure is only revealed when the brake
is applied)
θ = 4320 hrs (6 monthly maintenance frequency)
t = Mean time to repair car brake is 4 hrs
The failure probability is P = 1.5e-6 x (4320 / 2 + 4) = 0.0032
(i.e. Reliability of 99.68%)
Effect of Maintenance Frequency
on Dormant Failures
In the previous example, if the maintenance
frequency is changed to;
θ = 8640 hrs (12 monthly maintenance frequency)
The failure probability is P = 0.0065
(i.e. Reliability of 99.35%)
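The effect of the maintenance interval on the car brake example can be checked with a short sketch of the dormant failure relation P = λ(θ/2 + MTTR). The function name is hypothetical and the inputs are the figures from the two examples.

```python
def dormant_failure_probability(rate_per_hour, maintenance_interval, mttr):
    """P = lambda * (theta / 2 + MTTR), with all times in hours."""
    return rate_per_hour * (maintenance_interval / 2 + mttr)

brake_rate = 1.5e-6  # per hour, from the example
brake_mttr = 4       # hours, from the example

for interval in (4320, 8640):  # 6 monthly and 12 monthly maintenance
    p = dormant_failure_probability(brake_rate, interval, brake_mttr)
    print(f"theta = {interval} h  P = {p:.4f}  reliability = {(1 - p) * 100:.2f}%")
```

Because θ/2 dominates the MTTR term here, doubling the maintenance interval roughly doubles the failure probability.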
Common Cause Failure
• Common Cause Failure; An event or mechanism that can
cause two or more failures simultaneously is called a
common cause.
• Applies only to dormant (hidden) failures.
• Use β-factor and incorporate in component failure rate;
$P_{failure} = \beta \lambda \left( \frac{\theta}{2} + MTTR \right)$
• Use Concept of Diversity;
Redundant Systems $\beta = 2 \times 10^{-1}$
Partially Diverse Systems $\beta = 2 \times 10^{-2}$
Fully Diverse Systems $\beta = 2 \times 10^{-3}$
• Add it to Fault Tree as an Event
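A minimal sketch of a β-factor calculation, assuming the common cause term takes the form P = βλ(θ/2 + MTTR), i.e. the dormant failure relation with the failure rate scaled by β. That form is an assumption made here, and the failure rate, maintenance interval and MTTR are hypothetical. The β values are those listed above.

```python
# Beta factors by degree of diversity, as listed above.
BETA = {
    "redundant": 2e-1,
    "partially diverse": 2e-2,
    "fully diverse": 2e-3,
}

def common_cause_probability(beta, rate_per_hour, maintenance_interval, mttr):
    """Assumed form: P = beta * lambda * (theta / 2 + MTTR), times in hours."""
    return beta * rate_per_hour * (maintenance_interval / 2 + mttr)

# Hypothetical component: 1.5e-6 failures/hr, 4320 h interval, 4 h MTTR.
for system, beta in BETA.items():
    p_ccf = common_cause_probability(beta, 1.5e-6, 4320, 4)
    print(f"{system:18s} beta = {beta:.0e}  P(CCF) = {p_ccf:.2e}")
```

The resulting probability would then enter the fault tree as its own basic event.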
Common Cause Failure
• It is not easy to model CCF
• Use it carefully
• Apply it when unsure of the ‘component
quality’
Application of Reliability Modelling
Exothermic Reaction Reliability Analysis
Exothermic Reaction Reliability
• Feeds A and B are reacted to produce C.
• Feeds A and B, and Product C are flammable and, under
certain conditions, explosive.
• If the flow rate of either Feed A or Feed B exceeds a certain
level, the reaction will run away.
• If the reaction temperature is not controlled, the reaction
path can shift, resulting in a runaway reaction.
• The runaway reaction results in vaporisation of the
reactants and overpressure of the vessel.
• The overpressure is developed too quickly to be relieved
using a pressure relief valve.
Exothermic Reaction Fault Tree Model
[Diagram: fault tree, top event first]
• Reactor Process Control and Safety System Failure
  • Cooling Water System Failure
    • Cooling Water Temperature Control Failure
      • Temperature Control Valve Failure
      • Temperature Sensor Failure
    • Cooling Water Pump System Failure
      • Cooling Water Pump Failure
      • Cooling Water Pump Motor Failure
  • Reactants Flow Control System Failure
    • Reactant A Flow Control Failure
      • Line A Flow Sensor Failure
      • Line A Flow Control Valve Failure
    • Reactant B Flow Control Failure
      • Line B Flow Sensor Failure
      • Line B Flow Control Valve Failure
Note: The Fault Tree does not include the SIL 3 ESD loops at the Feed A & B inlet lines
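A tree of this shape can be evaluated recursively. The sketch below is illustrative only: it assumes every gate is an ‘OR’ gate (any component failure propagates to the top event) and gives each of the eight basic events a placeholder probability of 0.001. Neither the gate types nor the numbers are stated in the model above.

```python
from math import prod

def evaluate(node):
    """Return the failure probability of a basic event or a gate node."""
    if isinstance(node, float):
        return node  # basic event probability
    gate, children = node
    probs = [evaluate(child) for child in children]
    if gate == "OR":
        return 1.0 - prod(1.0 - p for p in probs)  # exact, independent inputs
    if gate == "AND":
        return prod(probs)
    raise ValueError(f"unknown gate: {gate}")

P = 0.001  # placeholder probability for every basic event (assumption)

cooling_water = ("OR", [
    ("OR", [P, P]),  # temperature control: control valve, sensor
    ("OR", [P, P]),  # pump system: pump, pump motor
])
reactant_flow = ("OR", [
    ("OR", [P, P]),  # reactant A: flow sensor, flow control valve
    ("OR", [P, P]),  # reactant B: flow sensor, flow control valve
])
top_event = ("OR", [cooling_water, reactant_flow])

p_top = evaluate(top_event)
print(f"Top event probability = {p_top:.6f}")
```

The same nested structure maps directly onto spreadsheet cells, one cell per gate.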
Fault Tree Analysis on Excel (in the
absence of Fault Tree Software)
Example:
A gas/liquid separator operates at high pressure.
High pressure gas breakthrough to a downstream
vessel with a lower pressure rating must be
avoided if level detection and control fail.
Fault Tree Analysis on Excel
An ESD system needs to be installed on the high pressure
separator liquid outlet, and Safety Integrity Level
calculations performed for SIL 1, SIL 2
and SIL 3 ESD system configurations.
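Once a failure probability on demand has been calculated for an ESD configuration, it can be compared with the SIL bands. The sketch below uses the low demand mode bands from IEC 61508 (SIL 1: 1e-2 to 1e-1, SIL 2: 1e-3 to 1e-2, SIL 3: 1e-4 to 1e-3, SIL 4: 1e-5 to 1e-4). The function name and the sample probabilities are hypothetical.

```python
# Low demand mode bands: (SIL, lower bound, upper bound) on average
# probability of failure on demand.
SIL_BANDS = [
    (4, 1e-5, 1e-4),
    (3, 1e-4, 1e-3),
    (2, 1e-3, 1e-2),
    (1, 1e-2, 1e-1),
]

def sil_for_pfd(pfd_avg):
    """Return the SIL whose band contains pfd_avg, or None if outside all bands."""
    for sil, low, high in SIL_BANDS:
        if low <= pfd_avg < high:
            return sil
    return None

# Hypothetical calculated values for three ESD configurations.
for pfd in (0.05, 0.005, 0.0005):
    print(f"PFD = {pfd:<7} meets SIL {sil_for_pfd(pfd)}")
```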