
Lecture3 2

The document discusses evaluation techniques for fault-tolerant systems, focusing on qualitative and quantitative evaluations of dependability attributes such as reliability, availability, and safety. It details common measures like failure rate, mean time to failure (MTTF), mean time to repair (MTTR), and mean time between failures (MTBF), along with methods for calculating these metrics. Additionally, it covers system modeling approaches, including reliability block diagrams and Markov processes, to analyze system performance and reliability.


Evaluation Techniques

Two approaches

• Qualitative evaluation
– aims to identify, classify and rank the failure
modes, or event combinations that would lead
to system failures
• Quantitative evaluation
– aims to evaluate in terms of probabilities the
attributes of dependability (reliability,
availability, safety)

p. 2 - Design of Fault Tolerant Systems - Elena Dubrova, ESDlab


Common dependability measures

• failure rate
• mean time to failure
• mean time to repair
• mean time between failures
• fault coverage



Failure rate

• failure rate
– expected # of failures per time-unit
– example
• 1000 controllers working at t0
• after 10 hours: 950 working
• failure rate for each controller:
0.005 failures / hour
(50 failures / 1000 controllers) / 10 hours
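
The arithmetic of this example can be reproduced in a few lines. This is a minimal Python sketch; the function name `failure_rate` and its parameter names are illustrative and do not come from the lecture.

```python
def failure_rate(initial_units: int, surviving_units: int, hours: float) -> float:
    """Average number of failures per unit per hour over the observation window."""
    failed = initial_units - surviving_units
    return (failed / initial_units) / hours


# Numbers from the slide: 1000 controllers at t0, 950 still working after 10 hours.
rate = failure_rate(1000, 950, 10.0)
print(f"{rate:.3f} failures/hour")  # 0.005 failures/hour
```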



Failure rate and reliability
Reliability R(t) is the conditional probability that the
system will perform correctly throughout [0,t], given
that it worked at time 0

$R(t) = \dfrac{N_{operating}(t)}{N_{operating}(t) + N_{failed}(t)}$



Failure rate
• typical evolution of λ(t) for hardware:
[Figure: bathtub curve of λ(t) versus t, with regions I, II and III]
• bathtub: I infant mortality, II useful life, III wear-out
• for useful life period λ = constant, the reliability is
given by
$R(t) = e^{-\lambda t}$

Exponential failure law

If λ is constant, R(t) varies exponentially as a function of time:

$R(t) = e^{-\lambda t}$

[Figure: plot of R(t), vertical axis from 0 to 1]
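
As a quick illustration of the exponential failure law, the sketch below evaluates R(t) for the controller failure rate used earlier in the lecture (0.005 failures per hour). The function name `reliability` is an assumption made for this example.

```python
import math


def reliability(failure_rate: float, t: float) -> float:
    """R(t) = exp(-lambda * t), valid only while the failure rate is constant."""
    return math.exp(-failure_rate * t)


# With lambda = 0.005 per hour, about 95% of units are expected to survive
# 10 hours, which matches the 950 of 1000 controllers in the earlier example.
print(f"{reliability(0.005, 10.0):.4f}")  # 0.9512
```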



Time varying failure rate
• Failure rate is not always constant
– software failure rate decreases as package matures
• Weibull distribution:
$z(t) = \alpha \lambda (\lambda t)^{\alpha - 1}$
• if α=1, then z(t) = constant = λ
if α>1, then z(t) increases as time increases
if α<1, then z(t) decreases as time increases

$R(t) = e^{-(\lambda t)^{\alpha}}$
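
The three cases for α can be checked numerically. The following is a small sketch of the Weibull failure rate and reliability; the function names and the sample values of λ and α are hypothetical and chosen only for illustration.

```python
import math


def weibull_hazard(lam: float, alpha: float, t: float) -> float:
    """z(t) = alpha * lam * (lam * t)^(alpha - 1), for t > 0."""
    return alpha * lam * (lam * t) ** (alpha - 1)


def weibull_reliability(lam: float, alpha: float, t: float) -> float:
    """R(t) = exp(-(lam * t)^alpha)."""
    return math.exp(-((lam * t) ** alpha))


lam = 0.01  # hypothetical scale parameter, per hour
for alpha in (0.5, 1.0, 2.0):
    early = weibull_hazard(lam, alpha, 50.0)
    late = weibull_hazard(lam, alpha, 200.0)
    print(f"alpha={alpha}: z(50)={early:.5f}, z(200)={late:.5f}")
```

With α = 1 both functions reduce to the constant failure rate λ and the exponential failure law.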



Failure rate calculation
• determined for components
– systems: combination of components
– λ of the system = sum of λ of the components
• determine λ experimentally
– slow
• e.g. 1 failure per 100 000 hours (=11.4 years)
– expensive
• many components required for significance
• use standards for λ



MTTF

• MTTF: mean time to failure


– expected time until the first failure occurs
• If we have a system of N identical
components and we measure the time t_i
before each component fails, then MTTF is
given by
$MTTF = \frac{1}{N} \sum_{i=1}^{N} t_i$

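The sample-mean estimate above is straightforward to compute. In this sketch the function name `mean_time` and the list of failure times are made up for illustration; they are not measurements from the lecture.

```python
def mean_time(times: list[float]) -> float:
    """Arithmetic mean of observed times, i.e. (1/N) * sum of t_i."""
    if not times:
        raise ValueError("at least one observation is required")
    return sum(times) / len(times)


# Hypothetical times to failure (hours) for N = 4 identical components.
times_to_failure = [120.0, 250.0, 310.0, 400.0]
print(mean_time(times_to_failure))  # 270.0
```
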
MTTF

MTTF is defined in terms of reliability as:

$MTTF = \int_{0}^{\infty} R(t)\,dt$

If R(t) obeys the exponential failure law, then MTTF is the inverse of the failure rate:

$MTTF = \int_{0}^{\infty} e^{-\lambda t}\,dt = \frac{1}{\lambda}$

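The identity MTTF = 1/λ can be sanity-checked by integrating R(t) numerically. This sketch uses the trapezoidal rule and truncates the integral at a finite horizon; the function name, the horizon of 20/λ and the step count are assumptions made here, not part of the lecture.

```python
import math


def mttf_numeric(failure_rate: float, horizon_multiple: float = 20.0,
                 steps: int = 40_000) -> float:
    """Trapezoidal approximation of the integral of exp(-lambda*t) on [0, horizon]."""
    upper = horizon_multiple / failure_rate  # beyond this, R(t) is negligible
    h = upper / steps
    total = 0.5 * (1.0 + math.exp(-failure_rate * upper))  # end points, R(0) = 1
    for k in range(1, steps):
        total += math.exp(-failure_rate * k * h)
    return total * h


lam = 0.005  # failures per hour, from the controller example
print(mttf_numeric(lam), 1.0 / lam)  # both values are close to 200 hours
```
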
MTTF

$R(t) = e^{-\lambda t}$

[Figure: plot of R(t) from 1 down to 0 against t, with the time axis marked at 1/λ, 2/λ and 3/λ]



MTTF
• MTTF is meaningful only for systems which
operate without repair until they experience a
failure
• Most mission-critical systems undergo a
complete check-up before the next mission
– all failed redundant components are replaced
– system is returned to fully operational state
• When evaluating reliability of such a system,
mission time rather than MTTF is used



MTTR

• MTTR: mean time to repair


– expected time until repaired
• If we have a system of N identical
components and the ith component requires
time t_i to repair, then MTTR is given by
$MTTR = \frac{1}{N} \sum_{i=1}^{N} t_i$



MTTR

• difficult to calculate
• determined experimentally
• normally specified in terms of the repair rate μ,
which is the average number of repairs that
occur per time period

$MTTR = \frac{1}{\mu}$

MTTR

• Low MTTR requirement implies high operational cost
– if hardware spares are kept on site and the site
is maintained 24 hours a day, MTTR = 30 min
– if the site is maintained 8 hours a day, 5 days a week,
MTTR = 3 days
– if the system is remotely located, MTTR = 2 weeks



MTBF

• MTBF: mean time between failures


• functional + repair
• MTBF = MTTF + MTTR
– small time difference: MTBF ≈ MTTF
– conceptual difference
[Figure: timeline with alternating MTTF and MTTR intervals; MTBF spans from the time of the 1st failure to the time of the 2nd failure]

Fault coverage
• Fault detection coverage is the conditional
probability that, given the existence of a fault, the
fault is detected
• Difficult to calculate
• Usually computed as

$C = \dfrac{\text{number of faults which can be detected}}{\text{total number of faults}}$



Example

• Suppose your circuit has 10 lines and you use the single stuck-at fault model
• Then the total number of faults is 20 (each line can be stuck at 0 or stuck at 1)
• Suppose you have 1 undetectable fault
• Then the coverage is
$C = \frac{19}{20}$
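
The same count can be written out as a short sketch. The helper names `stuck_at_fault_count` and `fault_coverage` are hypothetical; the numbers are the ones from the example.

```python
def stuck_at_fault_count(lines: int) -> int:
    """Single stuck-at model: each line can be stuck at 0 or stuck at 1."""
    return 2 * lines


def fault_coverage(total_faults: int, undetectable_faults: int) -> float:
    """C = (number of detectable faults) / (total number of faults)."""
    return (total_faults - undetectable_faults) / total_faults


total = stuck_at_fault_count(10)  # 20 faults for a 10-line circuit
print(fault_coverage(total, 1))   # 0.95
```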



Dependability modelling

• up to now: λ and R(t) for components
• systems are sets of components
• system evaluation approaches:
– reliability block diagrams
– Markov processes



Serial system

• system functions
if and only if all components function

[Figure: reliability block diagram (RBD)]



Serial system

[Figure: RBD with components C1, C2, ..., CN connected in series]

if C_i are independent: $R_{series}(t) = \prod_{i=1}^{N} R_i(t)$

$\lambda_{series} = \sum_{i=1}^{N} \lambda_i$
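
A short sketch of the series formula follows. It also checks that, for constant failure rates, multiplying the component reliabilities gives the same result as using the summed failure rate. The function name and the sample failure rates are illustrative assumptions.

```python
import math


def series_reliability(component_reliabilities: list[float]) -> float:
    """Product of R_i(t); assumes the components fail independently."""
    return math.prod(component_reliabilities)


# Hypothetical constant failure rates (per hour) and a mission time of 100 hours.
rates = [0.001, 0.002, 0.0005]
t = 100.0
by_product = series_reliability([math.exp(-lam * t) for lam in rates])
by_summed_rate = math.exp(-sum(rates) * t)
print(by_product, by_summed_rate)  # the two values agree up to rounding
```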



Parallel system

• system works as long as one component works

[Figure: RBD with components C1, C2, ..., CN connected in parallel]

Parallel system

unreliability: Q(t) = 1 - R(t)

if C_i are independent: $Q_{parallel}(t) = \prod_{i=1}^{N} Q_i(t)$

$R_{parallel}(t) = 1 - \prod_{i=1}^{N} \left(1 - R_i(t)\right)$
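
The parallel formula can be sketched the same way. The function name `parallel_reliability` and the sample reliabilities are assumptions for illustration only.

```python
def parallel_reliability(component_reliabilities: list[float]) -> float:
    """1 - product of (1 - R_i); assumes the components fail independently."""
    unreliability = 1.0
    for r in component_reliabilities:
        unreliability *= 1.0 - r
    return 1.0 - unreliability


# Two identical components with R = 0.9 each: the system fails only if both fail.
r = 0.9
print(parallel_reliability([r, r]))  # close to 0.99, i.e. 2R - R^2
```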



Reliability block diagram

• RBD
– may be difficult to build
– equations get complex
– difficult to take coverage into account
– difficult to represent repair
– not possible to represent dependency between
components



Markov chains

• Markov chains
– illustrated by state transition diagrams
• idea:
– states
• components working or not
– state transitions
• when components fail or get repaired



Single-component system, no repair
• Only two states
– one operational (state 1) and one failed (state 2)
– if no repair is allowed, there is a single, non-reversible
transition between the states (used in reliability
analysis)
– label λ corresponds to the failure rate of the component

1 --λ--> 2



Single-component system with repair

• If repair is allowed (used in availability analysis)
– then a transition between the failed and the
operational state is possible
– the label is the repair rate μ

1 --λ--> 2
2 --μ--> 1



Failed-safe and failed-unsafe
• In safety analysis, we need to distinguish between failed-
safe and failed-unsafe states
– let 2 be a failed-safe state and 3 be a failed-unsafe state
– the transition from state 1 to state 2 depends on the failure rate
and the probability that, if a fault occurs, it is detected and
handled appropriately (i.e. fault coverage C)
– if C is the probability that a fault is detected, 1-C is the
probability that a fault is not detected

1 --λC--> 2
1 --λ(1-C)--> 3

Two-component system
• Has four possible states
state 1: component 1 O, component 2 O
state 2: component 1 F, component 2 O
state 3: component 1 O, component 2 F
state 4: component 1 F, component 2 F
(O = operational, F = failed)

1 --λ1--> 2 --λ2--> 4
1 --λ2--> 3 --λ1--> 4

• Components are assumed to be independent and non-repairable
• If components are in series
– state 1 is operational state, states 2,3,4 are failed states
• If components are in parallel
– states 1,2,3 are operational states, state 4 is failed state



State transition diagram
simplification
• Suppose two components are in parallel
• Suppose λ1 = λ2 = λ
• Then, it is not necessary to distinguish between
the states 2 and 3
– both represent a condition where one component is
operational and one is failed
– since component failures are independent events, the transition rate
from state 1 to state 2 is the sum of the two transition rates

1 --2λ--> 2 --λ--> 3



Markov chain analysis
• The aim is to compute Pi(t), the probability that
the system is in the state i at time t
• Once Pi(t) is known, the reliability, availability or
safety of the system can be computed as a sum
taken over all operating states
• To compute Pi(t), we derive a set of differential
equations, called state transition equations, one
for each state of the system



Transition matrix
• State transition equations are usually presented
in matrix form
• Transition matrix M has entries m_ij, representing
the rates of transition from state i to state j
– index i is the column number
– index j is the row number

$M = \begin{pmatrix} m_{11} & m_{21} \\ m_{12} & m_{22} \end{pmatrix}$

Single-component system, no repair

1 --λ--> 2

• Transition matrix M has the form:

$M = \begin{pmatrix} -\lambda & 0 \\ \lambda & 0 \end{pmatrix}$

• entries in each column must sum up to 0
– entries m_ii, corresponding to self-transitions, are
computed as –(sum of other entries in this column)

Single-component system with repair
1 --λ--> 2
2 --μ--> 1

• Transition matrix M has the form:

$M = \begin{pmatrix} -\lambda & \mu \\ \lambda & -\mu \end{pmatrix}$



Single-component system, safety
analysis

1 --λC--> 2
1 --λ(1-C)--> 3

• Transition matrix M has the form:

$M = \begin{pmatrix} -\lambda & 0 & 0 \\ \lambda C & 0 & 0 \\ \lambda(1-C) & 0 & 0 \end{pmatrix}$

Two-component parallel system

1 --2λ--> 2 --λ--> 3

• Transition matrix M has the form:

$M = \begin{pmatrix} -2\lambda & 0 & 0 \\ 2\lambda & -\lambda & 0 \\ 0 & \lambda & 0 \end{pmatrix}$
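
The matrices on these slides all follow the same recipe: the rate of a transition from state i to state j goes into column i, row j, and the diagonal entry of column i is reduced by the same amount so that the column sums to 0. The sketch below applies that recipe; the function name `transition_matrix` and the dictionary format for the rates are assumptions made for this example.

```python
def transition_matrix(n_states: int,
                      rates: dict[tuple[int, int], float]) -> list[list[float]]:
    """Build M from {(i, j): rate of transition from state i to state j}.

    States are numbered from 1, as on the slides.
    """
    M = [[0.0] * n_states for _ in range(n_states)]
    for (i, j), rate in rates.items():
        M[j - 1][i - 1] += rate  # column i, row j
        M[i - 1][i - 1] -= rate  # diagonal entry keeps the column sum at 0
    return M


lam = 0.5  # hypothetical failure rate
# Two-component parallel system: 1 --2*lam--> 2 --lam--> 3
for row in transition_matrix(3, {(1, 2): 2 * lam, (2, 3): lam}):
    print(row)
```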



Important properties of matrix M

• Sum of the entries in each column is 0
• Positive sign of an ijth entry indicates that
the transition originates from the ith state
• In reliability analysis, M allows us to
distinguish between the operational and
failed states
– each failed state i has a zero diagonal element
m_ii (a failed state cannot be left)



State transition equations

• Let P(t) be a vector whose ith element is
P_i(t), the probability that the
system is in the state i at time t
• The matrix representation of a system of
state transition equations is given by

$\frac{d}{dt} P(t) = M \cdot P(t)$



Two-component parallel system
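• Using transition matrix derived earlier, we get:

$\frac{d}{dt} \begin{pmatrix} P_1(t) \\ P_2(t) \\ P_3(t) \end{pmatrix} = \begin{pmatrix} -2\lambda & 0 & 0 \\ 2\lambda & -\lambda & 0 \\ 0 & \lambda & 0 \end{pmatrix} \cdot \begin{pmatrix} P_1(t) \\ P_2(t) \\ P_3(t) \end{pmatrix}$

• This represents the following system of equations

$\frac{d}{dt} P_1(t) = -2\lambda P_1(t)$
$\frac{d}{dt} P_2(t) = 2\lambda P_1(t) - \lambda P_2(t)$
$\frac{d}{dt} P_3(t) = \lambda P_2(t)$
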
Solving state transition equations
• By solving these equations, we get

$P_1(t) = e^{-2\lambda t}$
$P_2(t) = 2e^{-\lambda t} - 2e^{-2\lambda t}$
$P_3(t) = 1 - 2e^{-\lambda t} + e^{-2\lambda t}$
• Since the P_i(t) are known, we can compute the reliability of
the system as a sum of probabilities taken over all
operating states

$R_{parallel}(t) = P_1(t) + P_2(t) = 2e^{-\lambda t} - e^{-2\lambda t}$
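
The closed-form solution can be cross-checked by integrating the state transition equations numerically. The sketch below uses a classical fourth-order Runge-Kutta scheme, which is a generic ODE method chosen here and not something the lecture prescribes. It assumes the system starts in state 1, so P(0) = (1, 0, 0); the value of λ, the end time and the step count are hypothetical.

```python
import math


def mat_vec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def integrate(M, p0, t_end, steps=1000):
    """Integrate dP/dt = M * P from 0 to t_end with fixed-step RK4."""
    h = t_end / steps
    p = list(p0)
    for _ in range(steps):
        k1 = mat_vec(M, p)
        k2 = mat_vec(M, [x + 0.5 * h * k for x, k in zip(p, k1)])
        k3 = mat_vec(M, [x + 0.5 * h * k for x, k in zip(p, k2)])
        k4 = mat_vec(M, [x + h * k for x, k in zip(p, k3)])
        p = [x + h / 6.0 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p


lam, t = 0.01, 100.0  # hypothetical failure rate (per hour) and time (hours)
M = [[-2 * lam, 0.0, 0.0],
     [2 * lam, -lam, 0.0],
     [0.0, lam, 0.0]]
p1, p2, p3 = integrate(M, [1.0, 0.0, 0.0], t)
r_numeric = p1 + p2  # states 1 and 2 are the operating states
r_closed = 2 * math.exp(-lam * t) - math.exp(-2 * lam * t)
print(r_numeric, r_closed)  # the two values agree closely
```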



Comparison to RBD result
• Since $R = e^{-\lambda t}$, the previous equation can be
written as

$R_{parallel}(t) = 2R - R^2$

• which agrees with the expression derived using RBD
• two results are the same because we assumed
that the failures of the two components are
independent



Dependent component case
• The value of Markov chains becomes evident
when component failures cannot be assumed to
be independent
– load-sharing components
– examples: electrical load, mechanical load, information
load
• If two components share the same load and one
fails, the additional load on the second
component increases its failure rate



Parallel system with load sharing
• As before, we have four states, but after the 1st
component failure, the failure rate of the 2nd
component increases

1 --λ1--> 2 --λ'2--> 4
1 --λ2--> 3 --λ'1--> 4



Parallel system with load sharing
• State transition equations are:
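$\frac{d}{dt} \begin{pmatrix} P_1(t) \\ P_2(t) \\ P_3(t) \\ P_4(t) \end{pmatrix} = \begin{pmatrix} -\lambda_1-\lambda_2 & 0 & 0 & 0 \\ \lambda_1 & -\lambda'_2 & 0 & 0 \\ \lambda_2 & 0 & -\lambda'_1 & 0 \\ 0 & \lambda'_2 & \lambda'_1 & 0 \end{pmatrix} \cdot \begin{pmatrix} P_1(t) \\ P_2(t) \\ P_3(t) \\ P_4(t) \end{pmatrix}$

$\frac{d}{dt} P_1(t) = (-\lambda_1-\lambda_2) P_1(t)$
$\frac{d}{dt} P_2(t) = \lambda_1 P_1(t) - \lambda'_2 P_2(t)$
$\frac{d}{dt} P_3(t) = \lambda_2 P_1(t) - \lambda'_1 P_3(t)$
$\frac{d}{dt} P_4(t) = \lambda'_2 P_2(t) + \lambda'_1 P_3(t)$
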
Effect of the load

• If λ'1 = λ1 and λ'2 = λ2 (with λ1 = λ2 = λ), the equation of the load
sharing parallel system reduces to the well-known
$R_{parallel}(t) = 2e^{-\lambda t} - e^{-2\lambda t}$
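
To see the effect of the load numerically, the sketch below evaluates a closed-form solution of the load-sharing equations. The expressions for P2 and P3 were obtained here by solving the first-order linear equations for states 2 and 3 with P(0) = (1, 0, 0, 0); they are not stated in the lecture, and they are valid only when λ1 + λ2 differs from both λ'1 and λ'2. The function name and the sample rates are hypothetical.

```python
import math


def load_sharing_reliability(l1: float, l2: float,
                             l1_loaded: float, l2_loaded: float,
                             t: float) -> float:
    """R(t) = P1 + P2 + P3 for the four-state load-sharing model.

    l1_loaded and l2_loaded are the increased rates (lambda'1, lambda'2).
    Assumes l1 + l2 != l1_loaded and l1 + l2 != l2_loaded.
    """
    total = l1 + l2
    p1 = math.exp(-total * t)
    p2 = l1 / (total - l2_loaded) * (math.exp(-l2_loaded * t) - p1)
    p3 = l2 / (total - l1_loaded) * (math.exp(-l1_loaded * t) - p1)
    return p1 + p2 + p3


lam, t = 0.01, 100.0
no_extra_load = load_sharing_reliability(lam, lam, lam, lam, t)
extra_load = load_sharing_reliability(lam, lam, 3 * lam, 3 * lam, t)
print(no_extra_load, extra_load)  # reliability drops when the load raises the rate
```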



Availability evaluation
• Difference with reliability analysis:
– in reliability analysis components are allowed to be
repaired as long as the system has not failed
– in availability analysis components can also be
repaired after the system failure



Two-component standby system
• First component is primary
• Second is held in reserve and only brought to
operation if the first component fails
• We assume that
– the fault detection unit, which detects a failure of the primary
component and replaces it with the standby, is perfect
– standby component cannot fail while in the standby
mode



State transition diagram for reliability
analysis with repair
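1 --λ1--> 2 --λ2--> 3
2 --μ--> 1

state 1: both OK
state 2: primary failed and replaced by spare
state 3: both failed

Repair replaces a broken component by a working one.

$M = \begin{pmatrix} -\lambda_1 & \mu & 0 \\ \lambda_1 & -\lambda_2-\mu & 0 \\ 0 & \lambda_2 & 0 \end{pmatrix}$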



State transition diagram for
availability analysis with repair
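1 --λ1--> 2 --λ2--> 3
2 --μ--> 1
3 --μ--> 2

States are the same. Repair replaces a broken component by a working one. Here we assume that there is only one repair team.

$M = \begin{pmatrix} -\lambda_1 & \mu & 0 \\ \lambda_1 & -\lambda_2-\mu & \mu \\ 0 & \lambda_2 & -\mu \end{pmatrix}$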



State transition diagram for
availability analysis with repair
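1 --λ1--> 2 --λ2--> 3
2 --μ--> 1
3 --2μ--> 2

If we assume that there are two independent repair teams, then μ on the edge from 3 to 2 gets the coefficient 2 (the rate doubles).

$M = \begin{pmatrix} -\lambda_1 & \mu & 0 \\ \lambda_1 & -\lambda_2-\mu & 2\mu \\ 0 & \lambda_2 & -2\mu \end{pmatrix}$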



Availability analysis
• None of the diagonal elements of M are 0
• By solving the system, we can get P_i(t) and compute the
availability as a sum of probabilities taken over all
operating states
• Usually the steady-state availability, rather than the
time-dependent one, is of interest
• As time approaches infinity, the derivative on the left-hand
side of the equation d/dt P(t) = M · P(t) vanishes and
we get a time-independent relationship

$M \cdot P(\infty) = 0$

Two-component standby system
• Using transition matrix derived earlier, we get the following
system of equations

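$-\lambda_1 P_1(\infty) + \mu P_2(\infty) = 0$
$\lambda_1 P_1(\infty) - (\lambda_2 + \mu) P_2(\infty) + \mu P_3(\infty) = 0$
$\lambda_2 P_2(\infty) - \mu P_3(\infty) = 0$

• By solving the equations, we get (for λ1 = λ2 = λ and λ much smaller than μ)

$A(\infty) \approx 1 - (\lambda/\mu)^2$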
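
The approximation can be checked against the exact steady-state solution. The sketch below assumes λ1 = λ2 = λ and one repair team, and adds the normalization P1 + P2 + P3 = 1, which is needed because the three balance equations are linearly dependent. The function name and the sample rates are hypothetical.

```python
def standby_steady_state(lam: float, mu: float) -> tuple[float, float, float]:
    """Steady-state probabilities (P1, P2, P3) of the standby system.

    From the balance equations: P2 = (lam/mu) * P1 and P3 = (lam/mu)^2 * P1,
    combined with P1 + P2 + P3 = 1.
    """
    rho = lam / mu
    p1 = 1.0 / (1.0 + rho + rho ** 2)
    return p1, rho * p1, rho ** 2 * p1


lam, mu = 0.001, 0.1  # hypothetical failure and repair rates (per hour)
p1, p2, p3 = standby_steady_state(lam, mu)
availability = p1 + p2  # states 1 and 2 are the operating states
approximation = 1.0 - (lam / mu) ** 2
print(availability, approximation)  # close when lam is much smaller than mu
```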



Safety evaluation

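1 --λC--> 2
1 --λ(1-C)--> 3

• The state transition equations are:

$\frac{d}{dt} \begin{pmatrix} P_1(t) \\ P_2(t) \\ P_3(t) \end{pmatrix} = \begin{pmatrix} -\lambda & 0 & 0 \\ \lambda C & 0 & 0 \\ \lambda(1-C) & 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} P_1(t) \\ P_2(t) \\ P_3(t) \end{pmatrix}$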



Safety evaluation
• By solving these equations, we get
$P_1(t) = e^{-\lambda t}$
$P_2(t) = C(1 - e^{-\lambda t})$
$P_3(t) = (1-C) - (1-C)e^{-\lambda t}$
• Since the P_i(t) are known, we can compute the safety of
the system as the sum of the probabilities of being in the
operational and fail-safe states

$S(t) = P_1(t) + P_2(t) = C + (1-C)e^{-\lambda t}$

• At time t=0, the safety is 1. As time approaches infinity,
the safety approaches C

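The two limits of the safety expression are easy to confirm numerically. In this sketch the function name `safety` and the values of λ and C are illustrative assumptions.

```python
import math


def safety(lam: float, coverage: float, t: float) -> float:
    """S(t) = C + (1 - C) * exp(-lam * t)."""
    return coverage + (1.0 - coverage) * math.exp(-lam * t)


lam, c = 0.01, 0.95  # hypothetical failure rate (per hour) and fault coverage
for t in (0.0, 100.0, 10_000.0):
    print(t, safety(lam, c, t))  # starts at 1 and decays toward C
```
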
How to deal with cases of systems
with “k out of n choices”
• Suppose we want to solve the following task:
What is the probability that more than two engines in a
4-engine airplane will fail during a t-hour flight if the
failure rate of a single engine is λ per hour?
• The probability that more than two engines fail can
be expressed as:
$P_{>2\,\text{failed}} = \binom{4}{1} P_{1\,\text{works},\,3\,\text{failed}} + P_{4\,\text{failed}}$
$= 1 - \left( P_{4\,\text{work}} + \binom{4}{3} P_{3\,\text{work},\,1\,\text{failed}} + \binom{4}{2} P_{2\,\text{work},\,2\,\text{failed}} \right)$
• Only probabilities of mutually exclusive events can
be summed up like this

“k out of n choices”

• “k out of n choices” can be computed as

$\binom{n}{k} = \frac{n!}{(n-k)!\,k!}$

• For example

$\binom{4}{2} = \frac{4!}{(4-2)!\,2!} = 6$



Example cont.
So, we get
$P_{>2\,\text{failed}} = 4 P_{1\,\text{works},\,3\,\text{failed}} + P_{4\,\text{failed}}$

where
$P_{1\,\text{works},\,3\,\text{failed}} = R (1-R)^3$
$P_{4\,\text{failed}} = (1-R)^4$

where R is the reliability of a single engine computed as $R = e^{-\lambda t}$
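
The engine example can be evaluated directly, and the complement form gives an independent check. This is a sketch under the lecture's assumption of independent engines with a constant failure rate; the function names and the sample values of λ and t are hypothetical.

```python
import math


def prob_more_than_two_fail(lam: float, t: float) -> float:
    """Probability that 3 or 4 of 4 independent engines fail within t hours."""
    r = math.exp(-lam * t)  # reliability of a single engine
    q = 1.0 - r
    return math.comb(4, 1) * r * q ** 3 + q ** 4


def prob_via_complement(lam: float, t: float) -> float:
    """Same probability computed as 1 - P(at most two engines fail)."""
    r = math.exp(-lam * t)
    q = 1.0 - r
    at_most_two = r ** 4 + math.comb(4, 3) * r ** 3 * q + math.comb(4, 2) * r ** 2 * q ** 2
    return 1.0 - at_most_two


lam, t = 1e-4, 10.0  # hypothetical failure rate per hour and flight duration
print(prob_more_than_two_fail(lam, t), prob_via_complement(lam, t))
```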



Summary

• Methods for evaluating the reliability,
availability and safety of a system
– RBDs
– Markov chains



Next lecture

• Hardware redundancy

Read chapter 4
of the text book
