A Model-Based Approach For Reliability Assessment in Component-Based Systems
Saideep Nannapaneni1, Abhishek Dubey2, Sherif Abdelwahed3, Sankaran Mahadevan4, Sandeep Neema5
1,4
Department of Civil and Environmental Engineering, Vanderbilt University, Nashville, TN, 37235, USA
[email protected]
[email protected]
2,3,5
Department of EECS/ISIS, Vanderbilt University, Nashville, TN, 37235, USA
[email protected]
[email protected]
[email protected]
ABSTRACT

This paper describes a formal framework for reliability assessment of component-based systems with respect to specific missions. A mission comprises different timed mission stages, with each stage requiring a number of high-level functions. The work presented here describes a modeling language to capture the functional decomposition and missions of a system. The components and their alternatives are mapped to basic functions, which are used to implement the system-level functions. Our contribution is the extraction of a mission-specific reliability block diagram from these high-level models of component assemblies. This diagram is then used to compute the mission reliability using reliability information of the components. The framework can be used for real-time monitoring of system performance, where the reliability of the mission is computed over time as the mission is in progress. Other quantities of interest, such as mission feasibility and function availability, can also be computed using this framework. Mission feasibility answers the question of whether the mission can be accomplished given the current state of the components in the system, and function availability indicates whether a function will be available in the future given the current state of the system. The software used in this framework includes the Generic Modeling Environment (GME) and Python. GME is used for modeling the system and Python for reliability computations. The proposed methodology is demonstrated using a radio-controlled (RC) car carrying out a simple surveillance mission.

Saideep Nannapaneni et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

1. INTRODUCTION

In recent years, model-based design (Schattkowsky & Muller, 2004; Mosterman, 2007), which is a simulation-based approach, has become a powerful framework for the design of complex systems using component behavior models. It is also used to analyze and manage the complexities and failures due to component-to-component interactions during the design phase of the system. Several design alternatives are possible for the same system, and a single design is to be chosen based on several factors such as cost, performance, and reliability. Each design choice is associated with a different cost, performance, and reliability. The selection of a particular design is made through a tradeoff between the cost, performance and safety of the system. For example, in an inertial measurement unit (IMU) (Dubey, Mahadevan & Karsai, 2012) used in Boeing aircraft, 6 accelerometers are provided even though only 4 are necessary, which improves reliability at additional cost. For commercial airplanes where people are involved, safety takes preference over performance and cost. For unmanned vehicles where people are not involved, performance might take preference over safety. Each design alternative is tested under several scenarios before the final design alternative is selected. A scenario is termed a mission in this paper. A mission can be understood as a collection of activities or functions to be performed. A more formal definition of a mission is provided in Section 4.

Usually, mission requirements are independent of the systems used to undertake the mission. The components used to accomplish the mission functions are specific to the system that is carrying out the mission. As an example, a simple
ANNUAL CONFERENCE OF THE PROGNOSTICS AND HEALTH MANAGEMENT SOCIETY 2014
mission description can be to move from point A to point B. There can be many choices to move from A to B, such as using a gas-powered car or an electric car. The components used in the gas-powered car (fuel tank, engine) are completely different from the components used in the electric car (batteries) to carry out the same function. In general, not all the components in the system are used to carry out the mission. A system may provide many more functions that are not necessary for the mission. In such cases, all the components corresponding to those functions will be unused and do not appear in the reliability assessment. Assume that B can be reached from A without taking any diversion. In such a case, the steering wheel component will be unused and does not appear in the reliability assessment.

Reliability assessment in component-based systems provides a mechanism to predict the failure probabilities for the overall system from the failure probabilities of individual components (Kececioglu, 1972; Krishnamurthy & Mathur, 1997). It is used to evaluate design feasibilities, compare design alternatives, identify potential failure areas in design, trade off between design factors, provide insight into the need for redundant systems, and replace existing systems with more reliable systems (Elsayed, 2012). There are two types of mechanical components: repairable and irreparable. Repairable components are components that, if failed, can be brought back to working condition. Irreparable components cannot be brought back to the working state when failed. In the case of repairable components, mean time between failures (MTBF) is a measure of reliability, whereas mean time to failure (MTTF) is a measure of reliability for irreparable components (Wood, 2001). In this paper, all the components are assumed to be irreparable. Reliability assessment is essential before the beginning of the mission and also during the mission. Reliability assessment during the mission is necessary to calculate the reliability of the mission in real time in the presence of failure of any of the components. This provides an idea of the redundancy available in the system and assists the real-time decision making process.

Some of the traditional techniques used for system reliability assessment include Failure Modes, Effects and Criticality Analysis (FMECA; Bauti & Kadi, 1994; Teng & Ho, 1996), Fault Tree Analysis (FTA; Lee, Grosh, Tillman & Lie, 1985), Event Tree Analysis (ETA; Ericson, 2005), Reliability Block Diagrams (RBD; Elsayed, 2012), and Probabilistic Risk Assessment (PRA; Modarres, 2008; Greenfield, 2001). FMECA is an extension to Failure Modes and Effects Analysis (FMEA) developed by NASA to improve the reliability of space program hardware. In this method, all the potential failures in the design are identified and their severity on the system output is included. In FTA, the system is represented in a hierarchical form using Boolean logic such that the system output occurs at the top. For each system failure, the causes are inferred using a top-down approach. Event trees are used to follow a sequence of events from an initiating event of a component until the end state of the system. The probability of the outcome of the end state is determined from the probabilities of individual events. In the RBD approach, the system is represented using a network diagram of blocks representing components connected in series and/or in parallel. The PRA approach uses fault tree and event tree diagrams in a probabilistic framework to compute the probability of a failure outcome. In this paper, reliability assessment is performed using reliability block diagrams because they can be constructed easily using the Boolean expressions employed in the proposed methodology. A detailed introduction to reliability block diagrams is provided in Section 2.

The main contribution of this paper is the extraction of the components involved in carrying out the mission and then constructing the mission-specific reliability block diagram to compute the reliability of the mission using the reliability information of the components in the system. Also, a procedure to extend the proposed methodology to real-time reliability assessment is provided.

The paper is organized as follows. Section 2 discusses the reliability modeling of mechanical components and the procedure for construction of the reliability block diagram. Section 3 provides the details of systems for which the proposed methodology can be applied. In Section 4, the proposed methodology for reliability assessment in component-based systems is presented. In Section 5, the proposed methodology is demonstrated using an example in which a radio-controlled (RC) car is used to carry out a simple surveillance mission. Concluding remarks are provided in Section 6. A list of necessary definitions is provided in the appendix.

2. BACKGROUND

2.1 Reliability Modeling of a Component

A typical component is subjected to three kinds of failures during its service life: (1) early-life failures, (2) random failures, and (3) wearout failures. The failure rate corresponding to the early-life failures decreases as a function of the service time of the component. Random failures are characterized by constant failure rates because failures can occur at any time during the service time of the component. Wearout failures are characterized by an increasing failure rate, where the failure rate of a component increases with the service time of the component. The total failure rate at any time instant is equal to the sum of all three failure rates. The total failure rate can be modeled using a bathtub curve.
Figure 1 shows a typical failure rate curve for a typical component (Filliben, 2002). The bathtub curve consists of three phases. In the first phase, the early-life failures are predominant; this is known as the infant mortality period. In the second phase, random failures are predominant, and this phase is known as the stable failure period or intrinsic failure period. In the third phase, wearout failures are predominant, and this phase is known as the wearout failure period. The failure probability during the third phase is generally modeled using a Weibull distribution (Eq. 1) and that during the second phase is modeled using an exponential distribution (Eq. 2). The first phase does not have a failure probability evaluation, but early failures are used for design and development.

Figure 1. Bathtub curve showing failure rate of a component

$$P_f(t) = 1 - e^{-\left(t/\eta\right)^{\beta}} \qquad (1)$$

$$P_f(t) = 1 - e^{-\lambda t} \qquad (2)$$

In Eq. (1), $\eta$ represents the scale parameter (the time at which the failure probability reaches 0.632) and $\beta$ represents the shape parameter. The shape parameter describes how the failure rate varies with time. In Eq. (2), $\lambda$ represents the constant failure rate, which is the reciprocal of the mean time to failure (MTTF). The values of these parameters can be obtained from the manufacturer or historical data, or can be estimated through simulations. In this paper, all the components are assumed to be in the second phase of random failures.

2.2 Reliability Block Diagrams

A reliability block diagram is a graphical representation showing the logical connections between the components in the system. These diagrams are used to compute the overall reliability of the system/functions using the reliability information of individual components and Boolean rules of combinations (Bennetts, 1982). When two components are connected in series, the function requires both components; if the components are connected in parallel, either of the components is sufficient to carry out the function. The terms series and parallel carry the same meaning as in electrical circuits. Figures 2(a) and 2(b) show series and parallel connections for two components $C_1$ and $C_2$. When components are connected in series, the overall reliability is the product of the individual reliabilities of the components, assuming independence between components (Eq. 3). When components are connected in parallel, the overall reliability is obtained using the union rules from set theory. Again assuming independence between components, the expression for overall reliability is given by Eq. (4).

Figure 2. Series and Parallel connections of components: (a) Series (b) Parallel (c) $r$ from $n$

$$R(S) = R(C_1) \times R(C_2) \qquad (3)$$

$$R(S) = R(C_1) + R(C_2) - R(C_1) R(C_2) \qquad (4)$$

In Eq. (3) and Eq. (4), $R(S)$, $R(C_1)$, $R(C_2)$ refer to the reliabilities of the overall system and of components $C_1$ and $C_2$ respectively. When the component requirement for a function is specified using the "$r$ from $n$" operator, all possible combinations are obtained and connected in parallel. The reliability of this component-system is calculated using the series and parallel connection rules stated above. The number of combinations is equal to ${}^{n}C_{r}$, which is equal to $\frac{n!}{(n-r)!\,r!}$. Consider an example where a function $F$ requires two out of the available three components. Let the three components be $C_1$, $C_2$, $C_3$. In this case, $F$ can be carried out using $C_1, C_2$ or $C_2, C_3$ or $C_1, C_3$. The combinations can be represented in the reliability block diagram as shown in Figure 2(c).

3. SYSTEM MODEL

The systems under consideration are mechanical systems or cyber-physical systems (CPS). Though CPS have both mechanical and software components, we currently consider the reliability and failure possibility of the mechanical components only. Software components are assumed to be functional. Consideration of software component reliability metrics requires additional future work, as these components do not typically age as mechanical components do and do not follow the typical bathtub curve. All the mechanical components are assumed to be in the second phase of the bathtub curve, where the failures are random, i.e., the failure rates are constant and the failure probabilities are modeled using exponential distributions. Also, it is assumed that the failures in the components are independent; thus the failure of one component does not influence the functioning of other components in the system. Once a component fails in the system, it remains in the failed state till the end of the mission.
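The exponential model of Eq. (2) and the connection rules of Eqs. (3) and (4) can be illustrated with a short Python sketch. This is not the paper's implementation: the function names are hypothetical, and the "r from n" case is evaluated here by enumerating working/failed states of the components, a swapped-in technique chosen because the parallel combinations share components and are therefore not independent of one another.

```python
import math
from itertools import product


def reliability_exponential(t, mttf):
    """R(t) = exp(-t / MTTF), i.e. 1 - P_f(t) from Eq. (2) with lambda = 1 / MTTF."""
    return math.exp(-t / mttf)


def series(*rs):
    """Eq. (3): every independent component is required."""
    return math.prod(rs)


def parallel(*rs):
    """Eq. (4), extended to any number of independent components."""
    return 1.0 - math.prod(1.0 - r for r in rs)


def r_out_of_n(r, rs):
    """Probability that at least r of the independent components work.

    Sums the probability of every working/failed state that has r or more
    working components.
    """
    total = 0.0
    for states in product([True, False], repeat=len(rs)):
        if sum(states) >= r:
            p = 1.0
            for works, rel in zip(states, rs):
                p *= rel if works else 1.0 - rel
            total += p
    return total
```

With these helpers, `series(0.9, 0.8)` and `parallel(0.9, 0.8)` reproduce Eqs. (3) and (4) for two components, and `r_out_of_n(2, [0.9, 0.9, 0.9])` covers the two-out-of-three example of Figure 2(c).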
4. PROPOSED METHODOLOGY
simplified reliability block diagram can be obtained. From the mission description, we can obtain the required functions and also the time each function is required for. Using this function-time information, we can calculate the time each of the components is required for. Using the time information, MTTF values and the reliability block diagram, the reliability of the mission can be calculated using the series and parallel connection rules given in Eqs. (3) and (4).

Step 5. Real-time monitoring for decision making: During the course of the mission, the health of all the components can be monitored (failed or working). If a component is in a failed state, the functions that the component is associated with will not be available unless the function has redundancy with respect to that component. From the health of the components, availability or unavailability of the functions can be inferred. Mission feasibility, as defined in the previous section, can also be analyzed using the health of the components. At any time instant, real-time reliability assessment of the system can be carried out using Step 4. Using the results of real-time reliability assessment, decisions on continuing the mission, aborting the mission or carrying out a simpler mission (a mission with lower outcomes than originally intended) can be made. Also, decisions in choosing alternate paths to maximize the reliability of the mission can be made. When a component becomes unavailable, it can be specified in PyEDA, which produces a resultant Boolean expression by removing the unavailable component(s). The resultant Boolean expression can be used for reliability assessment of the mission. Figure 3 shows the proposed methodology for reliability assessment.

In Figure 3, the mission is described using high-level functions $F_1$, $F_2$, $F_3$, $F_4$. Then, using functional decomposition, the high-level functions are decomposed into leaf-level functions. Each of the leaf-level functions $F_k$ ($k$ = 5 to 14) is then associated with its component assembly. The function-component association also represents the reliability block diagram of the leaf-level function. The reliability block diagrams of the leaf-level functions are combined to obtain the reliability block diagram of the high-level functions. The reliability block diagrams of all the high-level functions are combined to obtain the reliability block diagram of the mission.
5. EXAMPLE: Radio-Controlled Car

Mission Description - The RC Car, which initially is at point A, has to move to point B and perform surveillance at point B using a camera mounted on it. The car is amphibious and can move from A to B either on land or in water, as shown in Figure 4. Along with the land powertrain, a propeller system is also built into the RC Car to move in water. The width of the water body is assumed to be 1.5 miles. The total distance to be covered when moving on land from A to B is 2.5 miles. The speeds when moving on land and in water are assumed to be 7.5 mph and 3 mph respectively. The RC Car as modeled in GME (Ledeczi, Maroti, Bakay, Karsai, Garrett, Thomason & Volgyesi, 2001) is shown in Figure 5. A simple model of the RC Car is used for illustration and therefore has limited capabilities in terms of the functions that can be carried out. The RC Car can move forward, move backward, turn left and turn right. To stop the car, thrust is to be exerted in the direction opposite to the motion, i.e., if the car is moving forward then thrust is to be exerted in the reverse direction to stop the car. This forms the primary braking system; along with this, a secondary emergency braking system is also assumed to be available. From the mission description, the function-time plot can be constructed as shown in Figure 6. The mission can be divided into two high-level functions: 1) a function $F_{AB}$ that represents the movement of the RC Car from A to B and 2) a function $F_S$ that represents the surveillance activity at point B. To complete function $F_{AB}$, the RC Car can choose between two alternate paths: to move on land, represented by $F_{ABL}$, or in water, represented by $F_{ABW}$. The function $F_{ABL}$ is decomposed into three sub-functions: 1) moving from A to C, represented by $F_{ABL}.F_{AC}$, 2) moving from C to D, represented by $F_{ABL}.F_{CD}$, and 3) moving from D to B, represented by $F_{ABL}.F_{DB}$. The locations of points C and D are shown in Figure 4. The successful completion of all three sub-functions results in the successful completion of function $F_{ABL}$. Each of the sub-functions is further decomposed into a number of smaller leaf-level functions, and successful completion of all the leaf-level functions results in the completion of a sub-function. Table 1 shows the sub-functions of $F_{ABL}$ and their associated leaf-level functions. In the case of function $F_{ABW}$, the function itself is a leaf-level function and therefore cannot be decomposed further. Figure 7 provides the decomposition of the high-level function of moving from A to B ($F_{AB}$) along with the duration of each of the leaf-level functions required.

Table 1. Sub-functions of $F_{ABL}$ and their leaf-level functions

Sub-Function        Leaf-Level Function          Notation
$F_{ABL}.F_{AC}$    Turn left at A               $F_1$
$F_{ABL}.F_{AC}$    Move forward from A to C     $F_2$
$F_{ABL}.F_{AC}$    Turn right at C              $F_3$
$F_{ABL}.F_{CD}$    Move forward from C to D     $F_4$
$F_{ABL}.F_{CD}$    Turn right at D              $F_5$
$F_{ABL}.F_{DB}$    Move forward from D to B     $F_6$
$F_{ABL}.F_{DB}$    Turn left at B               $F_7$
$F_{ABL}.F_{DB}$    Brake and stop at B          $F_8$

Using the hierarchical decomposition, the function $F_{AB}$ can be expressed in terms of the leaf-level functions as

$$F_{AB} = \left( (F_1 \wedge F_2 \wedge F_3 \wedge F_4 \wedge F_5 \wedge F_6 \wedge F_7 \wedge F_8) \vee (F_9 \wedge F_8) \right) \qquad (5)$$

The next step after obtaining the hierarchical decomposition is to associate component assemblies to carry out each of the atomic-level functions. Table 2 shows the list of component assemblies available in the RC Car system along with their MTTF values, and Table 3 shows the association between atomic-level functions and component assemblies. To demonstrate the methodology, MTTF values for the components are assumed. After obtaining the functional decomposition (hierarchical decomposition) and the associations between functions and components, the reliability of the overall mission is computed from the reliability information of the component assemblies through a reliability block diagram. The construction of a reliability block diagram can be carried out in two steps: (1) the atomic functions in Eq. (5) are substituted with their associated component assemblies from Table 3, and (2) all the components connected with $\wedge$ are written in series, whereas components connected with $\vee$ are written in parallel. The reliability block diagram for the mission is assembled using the PyEDA package in Python.
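As a small check on the example so far, the travel times implied by the stated distances and speeds and the Boolean structure of Eq. (5) can be written out in Python. This is an illustrative sketch only: the function names are hypothetical, and reading $F_9$ as the water-path function $F_{ABW}$ is an assumption based on how Eq. (5) pairs it with the braking function $F_8$.

```python
from itertools import product

# Travel times in minutes from the stated distances (miles) and speeds (mph).
land_minutes = 60 * 2.5 / 7.5   # land path from A to B
water_minutes = 60 * 1.5 / 3.0  # water path from A to B


def f_ab(f):
    """Eq. (5): f maps the leaf-function index (1..9) to True/False."""
    land = all(f[i] for i in range(1, 9))  # F1 and ... and F8
    water = f[9] and f[8]                  # F9 and F8 (F9 assumed to be F_ABW)
    return land or water


def f_ab_factored(f):
    """Same function with the shared braking term F8 factored out."""
    return f[8] and (all(f[i] for i in range(1, 8)) or f[9])


# Exhaustive truth-table comparison of the two forms (2**9 assignments).
forms_agree = all(
    f_ab(dict(zip(range(1, 10), bits))) == f_ab_factored(dict(zip(range(1, 10), bits)))
    for bits in product([False, True], repeat=9)
)
```

The factored form makes explicit that braking at B is common to both paths, so $F_8$ appears in series with the parallel choice between the land and water routes.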
All the components are assumed to be in the second phase of the bathtub curve, where the failure rates are constant and the failure probability is modeled using an exponential distribution, as stated in Section 3.

The reliability block diagram is constructed using the functional decomposition and function-component association. Using the available MTTF values, the reliability of the mission can be computed as 0.909.

Case 1: Real-time reliability assessment

Assume that the mission is being undertaken by moving in water to reach B from A. Let T denote the time in minutes into the mission; therefore T=0 and T=36 refer to the start and the end of the mission (Figure 6). Table 4 shows the functions required to complete the mission at time T=0 and time T=20. The third column in Table 4 can be interpreted as follows: at T=20, for successful completion of the mission, $F_{ABW}$ is required for 10 more minutes (T=20 to T=30), braking is required for 1 minute and surveillance for 5 minutes. All three functions are required in succession, as shown in the function-time diagram (Figure 6). The reliability block diagram for the mission at time T=20 is assembled using the PyEDA package. Using the reliability block diagram and the MTTF values of the components, the reliability (probability of success) of the remaining portion of the mission can be computed.

Case 2: Component unavailability

Assume that at time T=20, the secondary brake fails and becomes unavailable (due to some unknown reason). Since the braking function has redundancy (primary and secondary), the braking function remains available but its reliability decreases. The reliability of the remaining mission, given that there is no failure up to T=20, decreases from 0.963 to 0.959.

Case 3: Mission Feasibility

Assume that the camera fails during the travel from A to B in water. Since the camera component becomes unavailable, the surveillance cannot be carried out at point B because there is no redundancy available for the surveillance function. Therefore, the mission cannot be carried out successfully. A real-time decision can be made to abort the mission and bring the RC Car back to point A.
Teng, S. H. G., & Ho, S. Y. M. (1996). Failure mode and effects analysis: an integrated approach for product design and process control. International Journal of Quality & Reliability Management, 13(5), 8-26.

Wood, A. P. (2001). Reliability-metric varieties and their relationships. Proceedings of Reliability and Maintainability Symposium (pp. 110-115). IEEE.

APPENDIX

Definitions

Mission: A mission can be regarded as a time-interval sequence of high-level functions. A mission provides information on all the high-level functions to be carried out at each instant of time. At each time instant, one or more high-level functions can be carried out. The mission is usually represented using a function-time plot.

Functional Decomposition: Functional decomposition is the process of decomposing a high-level function into a set of leaf-level functions (Kurtoglu & Tumer, 2008). A leaf-level function is a function that cannot be decomposed any further. All the leaf-level functions are required for the successful completion of the high-level function. Functional decomposition of a high-level function can be represented using a hierarchical tree structure. The dependency relationships can be written using the following Boolean relationships: and, or, r-out-of-n. The number of branches in the tree depends on the fidelity of the analysis required. At any instant of time, one or more high-level functions can be happening; therefore one or more dependency trees are active. A leaf-level function might be required for several high-level functions and therefore appears in several trees.

Function-Component association: The next step after functional decomposition is the association of each leaf-level function with the corresponding component or component assemblies (Kurtoglu, Tumer & Jensen, 2010). Again, Boolean relationships are used to represent the association of components with their functions. The Boolean relationships (and, or, r-out-of-n) are used to associate each leaf-level function with its component assembly. A component can provide more than one leaf-level function, but a leaf-level function cannot be associated with more than one component unless those components are the same.

Component availability: Component availability refers to the availability of a component for usage at any time instant during the mission.

Function availability: Function availability refers to the availability of a function for operation. For a function to be available, all the components required for the implementation of this function should be available.

Mission Feasibility: Mission feasibility refers to the possibility of completion of the mission given the current state of the components. At any instant of time, if all the components are available to carry out all the functions required at later times in the mission, then it can be concluded that the mission is feasible given the current state of the components. If any of the components becomes unavailable and the component is required at a later time, then the corresponding function cannot be carried out. If there are no alternate possibilities available to carry out this function, then this results in the mission being infeasible.

Redundancy: If a function can be carried out even when a component becomes unavailable, then it can be concluded that there is redundancy in the function with respect to that component.
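The definitions of function availability, redundancy and mission feasibility can be expressed as simple set operations. The following Python sketch is one possible encoding, not the paper's: each function is represented as a list of alternative component sets (an "or" of "and" groups), and all names are hypothetical.

```python
def function_available(alternatives, available):
    """A function is available if every component of at least one alternative is available."""
    return any(alt <= available for alt in alternatives)


def has_redundancy(alternatives, component, available):
    """Redundancy with respect to a component: the function survives its loss."""
    return function_available(alternatives, available - {component})


def mission_feasible(required_functions, available):
    """Feasible if every function still required later in the mission is available."""
    return all(function_available(alts, available) for alts in required_functions.values())


# Hypothetical function-component association for the remaining mission.
REQUIRED = {
    "braking": [{"primary_brake"}, {"secondary_brake"}],
    "surveillance": [{"camera"}],
}
ALL_COMPONENTS = {"primary_brake", "secondary_brake", "camera"}

brake_redundant = has_redundancy(REQUIRED["braking"], "secondary_brake", ALL_COMPONENTS)
camera_redundant = has_redundancy(REQUIRED["surveillance"], "camera", ALL_COMPONENTS)
feasible_without_camera = mission_feasible(REQUIRED, ALL_COMPONENTS - {"camera"})
```

Under this encoding, braking has redundancy with respect to the secondary brake, while surveillance has none with respect to the camera, so the mission becomes infeasible once the camera is unavailable.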