SE320

Software Verification & Validation
Week 2: Closed-box & Open-box Testing, Review & Fundamentals

Fall 2024
Homework 1
Homework 1: Open & Closed Review
• Out Thursday, due in 2 weeks (from Thursday)
• Test binary search
• Test multiple pieces using techniques discussed today
• You choose the appropriate techniques
• You explain why they’re appropriate
• Elements of open- and closed-box testing
• Several discussion / reflection questions to answer in addition to the
code
Specification-Based / Closed-box
/ Blackbox Testing
Possible Test Cases
Blackbox / Closed-box Testing
Testing without knowledge of software’s internals; testing only
externally visible behaviors
Specification-Based Testing
Testing software based on its external specification
• These are essentially the same
• There are arguably small differences, but they largely overlap
• We will use them interchangeably
Blackbox vs. Closed-Box Testing
• Why are there multiple terms here?
• The terms are meant to suggest testing without being able to “see inside the box”
• Historically, blackbox arose in contrast to whitebox testing….
• But typically you can’t see inside a black box or a white box…
• Which is why whitebox testing is also known as glass box, clear box, or open box testing…
• Recently computing has been reckoning with the fact that much of our
terminology is rooted in language with racist connotations in the US
• E.g., blacklist for a list of bad things vs whitelist for a list of good things, master/slave
terminology in distributed systems
• Blackbox vs. whitebox isn’t *quite* the same as these, since *both* are
important and good, and neither subsumes the other
• But closed-box and open-box avoid similar connotations and are more accurate
anyway
Closed-box Testing
• Testing based on the specification, without knowledge of the
implementation
• Sometimes called functional testing or behavioral testing
• Requires an executable program
• Requires a specification
• Could sometimes be done with a user manual
• Requires end-user perspective
Closed-box Testing
Questions
• Could you still test the software if no specification is available?
• How would you do this?
• Most software carries implicit specifications
• Shouldn’t dereference null
• Shouldn’t access an array out of bounds
• Shouldn’t divide by 0
• Shouldn’t exhibit undefined behavior (C/C++)
• …
• Should produce some kind of error message for bad input
• Many static analysis tools look for violations of these implicit specifications
(later this term)
Test Data, Oracles, and Test Cases
Let’s make a distinction
• Test Data: Inputs chosen to test the system
• i.e., Arrange
• Test Oracle: Expected output for a specific input, or a predicate
checking a property of output for a specific input, whose
matching/passing indicates correct behavior on that input
• i.e., Assert
• Test Cases: Inputs to test the system and outputs for each input that
indicate the system is operating correctly (i.e., data and oracles) for
particular pieces of the system
Test Data, Oracles, and Test Cases
Okay, I know how to write a test… but what should I test?
• Ideally we’d test everything
• But a function with three 32-bit integer arguments has 2^96 (about 7.9 × 10^28) possible inputs
• Testing focuses on risk management
• We want to test the riskiest pieces first — likely to occur (common use cases)
and/or high cost of failure (e.g., security concerns)
• How can we systematically select test cases to verify our software?
Closed-box Testing Techniques
There are many ways to choose tests based on the specification. We’ll
focus on 4 techniques:
1. Equivalence Partitioning (most important)
2. Boundary Value Analysis
3. Edge Cases for specific data types
4. State Transitions (mostly later)
Equivalence Partitioning
Equivalence Partition
An equivalence partition is a subset of a program’s input where all
inputs in the set are equivalent with respect to some correctness
criterion
• Pick groupings of input such that:
• if one input in each group is handled correctly, it is very likely all others are handled
correctly.
• if one input in a group is handled incorrectly, it is very likely all others are handled
incorrectly.
• i.e., groupings such that each input in a group tests the same thing /
reveals the same bug
Equivalence Partitioning: Save As Dialog
Example: What would you consider to be important factors of
equivalence partitions of a file name in the “Save As” dialog box?
• Valid characters
• Invalid characters
• Valid length names
• Names that are too short
• Names that are too long
• Names with unexpected extension
Aside: Equivalence Partitioning and Input Size
Q: How many inputs do most programs have?
A: Infinite! (effectively, modulo size of memory)

Finite Equivalence Partitioning
An infinite input space may have only a finite (or even small!)
number of equivalence partitions!
• Good equivalence partitioning makes testing complicated programs feasible
• This is one of the most important ideas in testing, but somehow
rarely pointed out
Search Routine Specification
method Find(a: array<int>, key: int)
    returns (found: bool, location: int)
  requires a != null
  requires a.Length > 0  // At least one element
  // Returned location valid
  ensures 0 <= location < a.Length
  // Returns location of key if found
  ensures found ==> a[location] == key
  // Returning failure means the key wasn't there
  ensures !found ==>
    forall l :: 0 <= l < a.Length ==> a[l] != key
// A valid Dafny spec (demos later in the term)
Search Routine Input Partitions?
• Inputs which conform to the preconditions
• Inputs where the precondition does not hold
• People forget this all the time!
• Inputs where the key element is a member of the array
• Inputs where the key element is not a member of the array
Search Routine Test Case Outline
Array               Element
Single value        In array
Single value        Not in array
More than 1 value   First element in array
More than 1 value   Last element in array
More than 1 value   Middle element in array
More than 1 value   Not in array
…                   …
Search Routine Test Case Outline
Input Array             Key   Outcome
17                      17    true, 0
17                      0     false, ??
17,29,21,23             17    true, 0
41,18,9,31,30,16,45     45    true, 6
17,18,21,23,29,41,38    23    true, 3
21,23,29,33,38          25    false, ??
…                       …     …
What’s the Equivalence Here?
We’re assuming the exact choice of numbers shouldn’t affect
behaviors
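The partitions in the outline can be turned into executable checks. The sketch below is illustrative only: `find` is a hypothetical linear-search stand-in for the routine under test, and `Result` and `satisfiesSpec` are names introduced for this example. The oracle checks the postconditions of the specification, so it does not need to fix a particular location when the key is absent.

```java
class SearchPartitions {
    // Hypothetical result type mirroring the spec's (found, location) pair.
    static final class Result {
        final boolean found;
        final int location;
        Result(boolean found, int location) { this.found = found; this.location = location; }
    }

    // Stand-in implementation (linear search); an assumption, not the course's code.
    static Result find(int[] a, int key) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == key) return new Result(true, i);
        }
        return new Result(false, 0); // the spec allows any valid index here
    }

    // Oracle built from the postconditions of the specification.
    static boolean satisfiesSpec(int[] a, int key, Result r) {
        if (r.location < 0 || r.location >= a.length) return false;
        if (r.found) return a[r.location] == key;
        for (int v : a) {
            if (v == key) return false; // claimed "not found" but the key is present
        }
        return true;
    }

    public static void main(String[] args) {
        // One representative input per partition from the outline.
        int[][] arrays = {
            {17}, {17}, {17, 29, 21, 23}, {41, 18, 9, 31, 30, 16, 45},
            {17, 18, 21, 23, 29, 41, 38}, {21, 23, 29, 33, 38}
        };
        int[] keys = {17, 0, 17, 45, 23, 25};
        for (int i = 0; i < arrays.length; i++) {
            Result r = find(arrays[i], keys[i]);
            System.out.println("key " + keys[i] + ": found=" + r.found
                + ", spec holds=" + satisfiesSpec(arrays[i], keys[i], r));
        }
    }
}
```

Inputs that violate the preconditions (a null or empty array) form their own partition and would need separate tests; they are left out of this sketch.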
Example: The NextDate Program
Given an input date (mm dd yyyy) the NextDate program returns the date of
the following day. For example:
• 01 10 1986 → 01 11 1986
• 02 28 1995 → 03 01 1995
• 02 28 2004 → 02 29 2004
• 12 30 2010 → 12 31 2010
• 12 31 2015 → 01 01 2016
Conditions for valid input:
• 1 ≤ month ≤ 12
• 1 ≤ day ≤ 31
• 1812 ≤ year ≤ 2025
Equivalence Classes for NextDate (Valid Input)
What do you consider to be the basic considerations for choosing
equivalence classes for NextDate on valid input?
• M: 1 ≤ month ≤ 12
• D: 1 ≤ day ≤ 31
• Y: 1812 ≤ year ≤ 2025
This is basically taking all valid input as one equivalence class

What are other things to consider?
Improved (Valid Input) Equivalence Classes
Improved classes could be derived from combinations of the following:
• M1: month with 30 days
• M2: month with 31 days
• M3: month is February
• D1: 1 ≤ day < 28
• D2: day = 28
• D3: day = 29
• D4: day = 30
• D5: day = 31
• Y1: century leap year
• Divisible by 400; divisible by only 100 is not a leap year!
• Y2: typical leap year
• Y3: common year
Are all combinations valid?
Actually no! Some combinations are still invalid despite seeking valid inputs! Still need to test them, though!
Identifying Equivalence Classes
• The problem of identifying equivalence relations is not a trivial one
and has implications on the quality of testing.
• There is no general recipe; it is application-dependent. Application
domain expertise is an important ingredient.
• It is very unlikely that equivalence relations are explicitly specified in
the requirements documents, or even that such documents have
been written in a way to facilitate such analysis.
• When thinking about equivalence classes, we need to keep in mind that
there are different types of equivalence.
Identifying Equivalence Classes
• In general, a good set of equivalence classes can be obtained by:
• Identify all sources of input
• Arguments, environmental settings, file contents, global program state
• Identify for each source which groups of choices should differ
• E.g., different lengths of months, common/leap/century-leap year
• Consider all valid combinations of input groups
• E.g., M1+D1+Y1, M1+D1+Y2, …, M3+D5+Y3
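As a rough illustration of combining input groups, the sketch below enumerates every month/day/year class combination for NextDate and checks which ones are real calendar dates. The representative values (April, January, February; day 15 for D1; years 2000, 2004, 2023) are choices made for this example, and `NextDateClasses` is a hypothetical name. It uses `java.time.YearMonth.isValidDay` from the Java standard library.

```java
import java.time.YearMonth;

class NextDateClasses {
    // One representative per class; the specific values are illustrative choices.
    static final int[] MONTHS = {4, 1, 2};            // M1: 30-day, M2: 31-day, M3: February
    static final int[] DAYS   = {15, 28, 29, 30, 31}; // D1 to D5
    static final int[] YEARS  = {2000, 2004, 2023};   // Y1: century leap, Y2: leap, Y3: common

    // Returns {total combinations, combinations that form a real calendar date}.
    static int[] countCombinations() {
        int total = 0;
        int valid = 0;
        for (int m : MONTHS) {
            for (int d : DAYS) {
                for (int y : YEARS) {
                    total++;
                    if (YearMonth.of(y, m).isValidDay(d)) valid++;
                }
            }
        }
        return new int[] {total, valid};
    }

    public static void main(String[] args) {
        int[] c = countCombinations();
        System.out.println(c[0] + " combinations, " + c[1] + " are real dates");
    }
}
```

With these representatives, 35 of the 45 combinations are real dates. The other 10 (such as April 31 or February 30) are combinations of individually valid classes that are invalid together, and they still deserve tests.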
Combinatorial Testing
• Any systematic approach to generating equivalence classes can still
produce too many test cases to manage
• We worked out nextDate equivalence classes just for valid input:
3x5x3=45
• 45 is far less than all possibilities (2^32 * 2^32 * 2^32 = 2^96 > 7.92e28), but
• Making sure you got them all is a pain
• More complicated systems will have way more
• Using parameterized unit tests / JUnit theories (next lecture) to check simpler
properties of all cases helps, but is still error-prone

So how do we choose which subset of these to actually test?


Combinatorial Testing
Most bugs do not depend on all parts of the input, but only some.
• Single-mode faults are bugs that are triggered by mishandling one
particular part of the input.
• Double-mode faults are bugs that are triggered by mishandling a
particular combination of two inputs.
•…
In each of these cases, the other input qualities are irrelevant
Combinatorial Testing
• A test suite achieves all singles when each parameter option (i.e.,
category) is exercised by at least one test.
• A test suite satisfies pairwise testing when each pair of parameter
options is exercised by at least one test.
• Covering all triples improves confidence, but usually has diminishing
returns
• i.e., there are some bugs that require triples to uncover, but there will be
fewer than there were bugs requiring doubles, etc.
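The all-singles and pairwise criteria can be checked mechanically. The following is a minimal sketch under assumptions: parameters are numbered from 0, each test is an array holding one option index per parameter, and the names `PairwiseCheck`, `coversAllSingles`, and `coversAllPairs` are invented for this example.

```java
import java.util.HashSet;
import java.util.Set;

class PairwiseCheck {
    // options[p] = number of options for parameter p.
    // Each test in the suite picks one option index per parameter.
    static boolean coversAllSingles(int[] options, int[][] suite) {
        for (int p = 0; p < options.length; p++) {
            Set<Integer> seen = new HashSet<>();
            for (int[] test : suite) seen.add(test[p]);
            if (seen.size() < options[p]) return false;
        }
        return true;
    }

    static boolean coversAllPairs(int[] options, int[][] suite) {
        for (int p = 0; p < options.length; p++) {
            for (int q = p + 1; q < options.length; q++) {
                Set<Integer> seen = new HashSet<>();
                // Encode the (option of p, option of q) pair as a single integer.
                for (int[] test : suite) seen.add(test[p] * options[q] + test[q]);
                if (seen.size() < options[p] * options[q]) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] options = {2, 2, 2}; // three parameters with two options each
        int[][] twoTests = {{0, 0, 0}, {1, 1, 1}};
        int[][] fourTests = {{0, 0, 0}, {0, 1, 1}, {1, 0, 1}, {1, 1, 0}};
        System.out.println(coversAllSingles(options, twoTests));  // all singles
        System.out.println(coversAllPairs(options, twoTests));    // but not pairwise
        System.out.println(coversAllPairs(options, fourTests));   // pairwise with 4 of 8 combos
    }
}
```

In this small case, four tests cover every pair of options even though there are eight full combinations, which is the saving pairwise testing aims for.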
Caveats
This reasoning still assumes the equivalence class hypothesis: that
either all inputs in a partition/category succeed or fail together!
Boundary Value Analysis
• Boundary values are the values at the edge of an equivalence
partition, or at the edge of valid input
• Many bugs are due to incorrect handling of values at or near the
boundaries, so it pays off to test them
• Off-by-one / fencepost errors
• Integer rounding (e.g., dividing an odd number by 2)
Testing at the Boundaries
If your equivalence partition or input range has a limit:
• Test values just barely inside the partition or within the limit
• Test values just barely outside
• If you’re testing “adjacent” equivalence partitions — e.g., adjacent
ranges of integers — this isn’t as many tests as it sounds
Where do Boundaries Come From?
• Equivalence Partitions
• e.g., search routine with sought-after element in first or last slot
• Specification
• e.g., specs often delimit acceptable inputs, so inputs that are just barely valid
or barely invalid are important
• Common sense and experience
• A lot of software breaks if you enter 0 in the wrong place
Boundaries Beyond Numbers
• Boundaries don’t just refer to numeric types
• For finding strings matching a pattern, a string with one character
different from the search pattern is just outside the boundary of a
partition
• For locations or positions, boundaries correspond to n-dimensional
regions
• For dates, cut-off dates
• ...
Internal Boundaries
Some boundaries arise from representation choices internal to the
software rather than from the problem itself.
• Size of integer representation
• What happens if you enter Integer.MAX_VALUE+1?
• File / data size
• Most software doesn’t pre-check this; will not degrade gracefully
• Try opening a 6GB log file in your favorite text editor
• It usually doesn’t go well
Data-Type-Specific Edge Cases: Numbers
Numbers are relatively easy to find boundaries for:
• If your valid input (or equivalence partition) range is [m,n], then test
m-1, m, n, and n+1
• Test 0
• Test the most-positive and most-negative values for common
representation types
• Integer.MAX_VALUE and Integer.MIN_VALUE in Java, similarly for Long, etc.
• The magnitude (distance from 0) for the maximum and minimum values of
signed integers isn’t the same!
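A short sketch of these numeric boundaries in Java. The class and helper names (`IntegerBoundaries`, `boundaryValues`, `addOverflows`) are made up for this example; `Math.addExact` and `Math.abs` are standard library methods.

```java
class IntegerBoundaries {
    // Boundary values for a valid range [m, n]: just outside and just inside each end.
    // Note: this helper itself wraps around if m is MIN_VALUE or n is MAX_VALUE.
    static int[] boundaryValues(int m, int n) {
        return new int[] {m - 1, m, n, n + 1};
    }

    // True if a + b does not fit in an int (detected via Math.addExact).
    static boolean addOverflows(int a, int b) {
        try {
            Math.addExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(boundaryValues(1, 12))); // month range
        System.out.println(Integer.MAX_VALUE + 1 == Integer.MIN_VALUE);       // silent wraparound
        System.out.println(Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE); // no positive counterpart
        System.out.println(addOverflows(Integer.MAX_VALUE, 1));
    }
}
```

The last three lines all print `true`: plain `+` wraps silently, and `Integer.MIN_VALUE` has no positive counterpart because the negative range is one larger than the positive range.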
Data-Type-Specific Edge Cases: Numbers
(cont.)
• Test the tricky cases for floating point (float, double)
• Did you know floating point has both positive and negative versions of 0?
• Try NaN
• Actually, there are nearly 2^53 (exactly 2^53 − 2) distinct 64-bit NaN bit patterns…
• If you know a certain precision is required, try exceeding that precision
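The floating-point cases above are easy to observe directly. This is a small sketch using Java's `double`; the helper names are invented for the example.

```java
class FloatingPointEdgeCases {
    // Primitive comparison with ==.
    static boolean primitiveEquals(double a, double b) {
        return a == b;
    }

    // Boxed comparison, which compares the underlying bit patterns.
    static boolean boxedEquals(double a, double b) {
        return Double.valueOf(a).equals(Double.valueOf(b));
    }

    public static void main(String[] args) {
        System.out.println(primitiveEquals(0.0, -0.0));               // true: the zeros compare equal
        System.out.println(boxedEquals(0.0, -0.0));                   // false: different bit patterns
        System.out.println(1.0 / 0.0);                                // Infinity
        System.out.println(1.0 / -0.0);                               // -Infinity
        System.out.println(primitiveEquals(Double.NaN, Double.NaN));  // false: NaN != NaN
        System.out.println(Double.isNaN(Double.NaN));                 // true
        System.out.println(primitiveEquals(0.1 + 0.2, 0.3));          // false: rounding error
    }
}
```

A test oracle that compares floating-point results with `==` will trip over several of these, which is one reason to test them explicitly.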
Data-Type-Specific Edge Cases: Strings
• Empty string – common when a user leaves a form field blank
• Null strings!
• Unicode!
• ASCII uses 8 bits (really 7) to encode basically only US English characters
• Unicode has many different encodings
• UTF-8 uses one byte for ASCII characters, but multiple for others
• UTF-16 uses 2 bytes for most characters, but still more for some…
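To make the encoding differences concrete, the sketch below measures a few strings three ways. `StringEdgeCases` and its helpers are hypothetical names; the non-ASCII characters are written as `\u` escapes so the example does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

class StringEdgeCases {
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Java strings are sequences of UTF-16 code units, so length() counts those.
    static int utf16Units(String s) {
        return s.length();
    }

    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String[] samples = {
            "",              // empty string
            "A",             // ASCII
            "\u00E9",        // e with acute accent (U+00E9)
            "\uD83D\uDE00"   // an emoji outside the basic plane (U+1F600)
        };
        for (String s : samples) {
            System.out.println(utf8Bytes(s) + " UTF-8 bytes, "
                + utf16Units(s) + " UTF-16 units, "
                + codePoints(s) + " code points");
        }
    }
}
```

The emoji is a single code point but takes 4 bytes in UTF-8 and 2 code units in UTF-16, so code that assumes one `char` per character miscounts it.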
Y2K
An Aside
Anyone remember what the Y2K bug was?
The Y2K bug was actually a ridiculously large class of bugs related to handling
of the new millennium, circa 1999.
• Since software didn’t exist prior to the 50s, and didn’t really live long until
the late 60s, many software systems represented years in the 20th century
by only the last two digits, sticking a “19” in front if necessary.
• Largely for silly reasons like space — 0–99 could be represented in a signed character
(1 byte), so more felt wasteful
• Obviously, once you need to distinguish 1900 from 2000, this becomes an
issue
Y2K (cont.)
• In the late 1990s, people started getting jury summons for the early
1900s
• Generally any system that needed to manipulate dates past December 31,
1999 had some interesting problems, ranging from paying negative interest to
mixing up 4th and 104th birthdays
• But in principle, the results were pretty unpredictable
• Concerning for nuclear reactors, defense, medical devices…
• Governments and industry spent billions of dollars fixing the problem
• i.e., using a few extra bits to represent the year, and removing default
assumptions about the year being in the 20th century
• US-only estimate was 100 billion (in 1999; roughly 189 billion in 2024 USD)
Y2K (cont.)
• A significant number of people sincerely believed the year 2000 would be
the end of the world
• Some simply because it was 2000!
• Others thought the Y2K bug would lead to global nuclear war and the end of
civilization (remember Stanislav Petrov?)
• Others thought the Y2K bug would lead computers to rise up and crush us
• Yes, seriously. These were not software developers.
• Ultimately, through a combination of preparation and luck, nothing serious
happened
• Some trains were late
• Funny date mix-ups continued for a bit
• Some software broke later because the Y2K fixes forgot to handle 2000 being a leap
year…
Data-Type-Specific Edge Cases: Dates & Time
Or: Don’t write the next Y2K bug
• Dates are complicated (see the NextDate example)
• Many seemingly reasonable dates are invalid (e.g., September 31)
• Time zones
• Some time zones are off by half-hours, not just full hours
• Formats: mm/dd or dd/mm depending on locale
• Daylight saving time
• Not everywhere observes it
• Different places start it at different times!
• Servers in different time zones communicate
• What happens if you select January 31 in a date picker, then switch to February?
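As one data point for the date-picker question, here is how Java's `java.time.LocalDate` behaves. This is a sketch of one library's choices, not a statement about how any particular date picker works; `DateEdgeCases` and `isRealDate` are names introduced for the example.

```java
import java.time.DateTimeException;
import java.time.LocalDate;

class DateEdgeCases {
    // True if the year/month/day triple is a real calendar date.
    static boolean isRealDate(int year, int month, int day) {
        try {
            LocalDate.of(year, month, day);
            return true;
        } catch (DateTimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRealDate(2023, 9, 31));   // September 31 does not exist
        System.out.println(isRealDate(1900, 2, 29));   // 1900 is not a leap year
        System.out.println(isRealDate(2000, 2, 29));   // 2000 is a leap year

        LocalDate jan31 = LocalDate.of(2024, 1, 31);
        // Switching the month clamps the day to the last valid day of February.
        System.out.println(jan31.withMonth(2));
        System.out.println(LocalDate.of(2023, 1, 31).plusMonths(1));
    }
}
```

`LocalDate` silently clamps January 31 to the last day of February (the 29th in 2024, the 28th in 2023). Whether clamping, rejecting, or rolling over into March is correct depends on the application's specification, which is exactly what a test should pin down.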
Data-Type-Specific Edge Cases: Collections
• As with strings, null and empty collections are important edge cases
• Collections with more than one element are also important
• A lot of code is written assuming certain methods return a collection
of exactly 1 element
• This fails if the collection is empty
• This may misbehave if the collection has multiple elements
Testing with Reckless Abandon
Lots of software breaks if you just feed it the most useless input you can
think of
• Do not type an entry in a field, just hit Enter.
• When default values are present, blank them out.
• If the software wants numbers, give it letters.
• If it accepts positive numbers, give it negative.
• Press multiple keys at the same time
• Feed the program overwhelming nonsense
• On UNIX systems, you can access the HD as a file
• Until a couple years ago, if you dumped the raw HD contents into a Linux terminal, it
would corrupt the font and you couldn’t use the terminal anymore
Testing with Structured Input
• Many systems accept structured input:
• Text format of data input from users
• File formats
• Database schemata
• HTML / XML / JSON
• C, Java, C#
• Network protocols (TCP/IP packets, HTTP requests)
• Data formats can be mechanically converted into many input
validation tests
Garbage In, Garbage out
• “Garbage in, garbage out” is one of the worst cop-outs ever invented by
the computing industry*
• GI-GO usually explains a failure to:
• Install good validation checks
• Produce actionable error messages
• Test tolerance to invalid inputs
• Systems that face the public (e.g., web services) must be especially robust,
and must have prolific input validation
• Affects usability, too (e.g., invalid config files)
• *Some exceptions apply (e.g., feeding biased data to an ML algorithm is
legitimately garbage-in garbage-out, feeding bad sensor data is GI-GO…)
Input Tolerance Testing
• Good user interface designers design their systems so that they just
don’t accept garbage.
• Good testers subject systems to the most creative “garbage” possible.
• Input-tolerance testing is also done as part of system testing and
usually by independent testers.
• This is literally the first kind of probing done by anyone looking for
security vulnerabilities.
Open-box / Whitebox Testing
Possible Test Cases
Whitebox / Open-box Testing
Testing with knowledge of software’s internals and/or testing
internally visible behaviors
• Primarily this is about testing pieces of a system, as opposed to the
end-to-end system
• i.e., all unit testing and integration testing is open-box testing
• Not mutually exclusive with closed-box testing
• E.g., can do closed-box testing of an internal component
• Another critical, but narrow part of open-box testing is code coverage
/ control-flow testing
• Since code coverage is about individual lines / branches in your code, it is clearly
aware of internals!
Control-Flow Testing
• Control-flow testing is a strategy for testing guided by a model of the
source code: the control flow structure
• Control-flow testing techniques are based on carefully choosing a set
of control flow paths through the program
• The set of paths chosen can be compared to various notions of
coverage of paths satisfying certain criteria, as a way of targeting
some objective level of thoroughness
• e.g., ensure every statement is executed by at least one test path
Control-Flow Open-Box Testing
Control-flow testing is a type of open-box testing. Not all open-box
testing involves control flow! (e.g., unit testing internal components)
Motivation
• Source code is the “ultimate” program specification
• “Ultimate” does not imply “without errors,” but rather “the most detailed
we have”
• Execution control is one of the primary things we implement when writing
a program
• We would like to test our program with respect to the control flow
structure we have implemented
• We hope that by exercising the control flow structure systematically, we
can expose faults related to unexpected combinations of control flow
• Control-flow testing is clearly open-box testing
• Control-flow testing is most useful in unit testing
Program Representation
Control Flow Graphs
• A Control Flow Graph is a static abstract representation of a program
• Commonly used in many program analysis tools & compilers, including
coverage tools
• A CFG is a directed graph G = (N,E)
• Each node in N is either a statement node or a predicate node
• A statement node represents a simple statement. Alternatively, a statement
node can be used to represent a basic block.
• A predicate node represents a conditional statement.
• Each edge in E represents the flow of control between statements.
• Optionally, we use circles/ovals to represent statement nodes, and rectangles
to represent predicate nodes.
Example of a CFG
    scanf("%d, %d", &x, &y);
    if (y < 0)
        pow = -y;
    else
        pow = y;
    z = 1.0;
    while (pow != 0) {
        z = z * x;
        pow = pow - 1;
    }
    if (y < 0)
        z = 1.0 / z;
    printf("%f", z);
[Figure: CFG of the code above. scanf(…) → predicate y < 0 (T: pow = -y; F: pow = y) → z = 1.0 → predicate pow != 0 (T: z = z * x, then pow = pow-1, then back to the predicate; F: exit the loop) → predicate y < 0 (T: z = 1.0/z) → printf(…)]
CFG: The For Loop
    S0;
    for (j = 1;
         j <= limit;
         j = j + 1)
    {
        S1;
    }
    Sn;

What should the CFG look like?
CFG: The For Loop
[Figure: CFG for the same for loop. S0 → j = 1 → predicate j ≤ limit. T: S1 → j = j + 1 → back to the predicate. F: Sn.]
CFG: Switch Statements
    S0;
    switch (e) {
    case v1:
        S1;
        break;
    case v2:
        S2;
        break;
    default:
        S3;
    }
    Sn;

What should the CFG look like?
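CFG: Switch Statements
A common first attempt is shown in the figure.
However, e is not really a predicate
[Figure: first-attempt CFG. S0 → a single node e with outgoing edges labeled v1, v2, and default, leading to the case bodies (S1, S2, …), which then join at Sn.]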
CFG: Switch Statements
[Figure: CFG for the same switch statement, with the switch expanded into a chain of equality predicates. S0 → e == v1 (T: S1; F: e == v2). e == v2 (T: S2; F: S3). S1, S2, and S3 all lead to Sn.]
CFG: Switch Statements
    S0;
    switch (e) {
    case v1:
        S1;
        break;
    case v2:
        S2;
        break;
    case v3:
        S3;
    default:
        S4;
    }
    Sn;

What should the CFG look like?
[Figure: S0 → e == v1 (T: S1 → Sn; F: e == v2). e == v2 (T: S2 → Sn; F: e == v3). e == v3 (T: S3, which falls through to S4; F: S4). S4 → Sn.]
CFG: Switch Statements
    S0;
    switch (e) {
    case v1:
    case v2:
        S2;
        break;
    case v3:
        S3;
        break;
    default:
        S4;
    }
    Sn;

What should the CFG look like?
[Figure: S0 → e == v1 (T: S2; F: e == v2). e == v2 (T: S2; F: e == v3). e == v3 (T: S3; F: S4). S2, S3, and S4 all lead to Sn.]
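CFG: Switch Statements
    S0;
    switch (e) {
    case v1:
        S1;
    case v2:
        S2;
    case v3:
        S3;
    default:
        S4;
    }
    Sn;

What should the CFG look like?
[Figure: S0 → e == v1 (T: S1; F: e == v2). e == v2 (T: S2; F: e == v3). e == v3 (T: S3; F: S4). With no breaks, S1 falls through to S2, S2 to S3, and S3 to S4. S4 → Sn.]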
Program Paths
• A path is a unique sequence of executable statements from one point
in the program to another point in a program
• In a graph, a path is a sequence (n_1, n_2, ..., n_t) of nodes such that
∀i ∈ [1, t − 1]. ⟨n_i, n_{i+1}⟩ is an edge in the graph
• Alternatively: a path through a control flow graph
Path Condition
• Path condition: the conjunction of the individual predicate conditions
which are generated at each branch point along the path.
• The path condition must be satisfied by the input data in order for the
path to be executed.
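Path Condition (cont.)
Example:
PC(1,2,4,5,6,8,10) = y ≥ 0 ∧ pow = 0 ∧ y ≥ 0
[Figure: the CFG of the earlier power example with nodes numbered 1 to 10. Node 2 is the predicate y < 0 (T to node 3, F to node 4). Node 6 is the predicate pow != 0 (T to node 7, F to node 8). Node 8 is the predicate y < 0 (T to node 9, F to node 10).]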
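One way to see a path condition in action is to instrument the program and record which nodes an input visits. The sketch below is a Java port of the power example with assumed node numbering: the slide's figure labels only the predicates (2, 6, 8), so treating 3 and 4 as the two assignments, 5 as `z = 1.0`, 7 as the loop body, 9 as `z = 1.0 / z`, and 10 as the print is an assumption. `PowerPaths` and `trace` are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

class PowerPaths {
    // Runs the power computation and returns the CFG node numbers visited, in order.
    static List<Integer> trace(double x, int y) {
        List<Integer> path = new ArrayList<>();
        path.add(1);                       // 1: read x and y (passed as arguments here)
        int pow;
        path.add(2);                       // 2: y < 0 ?
        if (y < 0) {
            path.add(3);                   // 3: pow = -y
            pow = -y;
        } else {
            path.add(4);                   // 4: pow = y
            pow = y;
        }
        path.add(5);                       // 5: z = 1.0
        double z = 1.0;
        path.add(6);                       // 6: pow != 0 ?
        while (pow != 0) {
            path.add(7);                   // 7: loop body
            z = z * x;
            pow = pow - 1;
            path.add(6);                   // back to the loop predicate
        }
        path.add(8);                       // 8: y < 0 ?
        if (y < 0) {
            path.add(9);                   // 9: z = 1.0 / z
            z = 1.0 / z;
        }
        path.add(10);                      // 10: print z
        return path;
    }

    public static void main(String[] args) {
        System.out.println(trace(2.0, 0));   // satisfies y >= 0 and pow = 0
        System.out.println(trace(2.0, 1));   // enters the loop once
        System.out.println(trace(2.0, -1));  // takes both y < 0 branches
    }
}
```

Because `pow = y` on the F branch of node 2, the path condition y ≥ 0 ∧ pow = 0 ∧ y ≥ 0 simplifies to y = 0, so `trace(2.0, 0)` follows exactly the path 1, 2, 4, 5, 6, 8, 10.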
Number of Paths
The simple CFG shown here has 2 different paths:
(S1, P1, S2, S3)
(S1, P1, S3)
[Figure: S1: max = x → predicate P1: y > x (T: S2: max = y, then S3; F: directly to S3) → S3: return max]
Number of Paths (cont.)
How many paths?

if ((A || B) && (C || D)) {


S1;
} else {
S2;
}

Surprise!
It’s not just two!
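Number of Paths (cont.)
There are:
• 2 statements + Sn
• 4 predicates (branch points)
• 7 paths (4 T + 3 F):
(A, C, S1)
(A, C, D, S1)
(A, C, D, S2)
(A, B, C, S1)
(A, B, C, D, S1)
(A, B, C, D, S2)
(A, B, S2)
[Figure: CFG for the decision. A: T goes to C, F goes to B. B: T goes to C, F goes to S2. C: T goes to S1, F goes to D. D: T goes to S1, F goes to S2. S1 and S2 both lead to Sn.]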
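The count of seven paths can be reproduced by simulating short-circuit evaluation of `(A || B) && (C || D)` over all 16 truth assignments and collecting the distinct paths. This is a sketch; `ShortCircuitPaths`, `path`, and `allPaths` are names made up for the example.

```java
import java.util.Set;
import java.util.TreeSet;

class ShortCircuitPaths {
    // Simulates short-circuit evaluation of (A || B) && (C || D)
    // and returns the sequence of CFG nodes visited.
    static String path(boolean a, boolean b, boolean c, boolean d) {
        StringBuilder p = new StringBuilder("A");
        boolean left = a;
        if (!a) {                 // B is only evaluated when A is false
            p.append(",B");
            left = b;
        }
        if (!left) {
            return p.append(",S2").toString();  // (A || B) is false: skip C and D
        }
        p.append(",C");
        boolean right = c;
        if (!c) {                 // D is only evaluated when C is false
            p.append(",D");
            right = d;
        }
        return p.append(right ? ",S1" : ",S2").toString();
    }

    // Collects the distinct paths over all 16 truth assignments.
    static Set<String> allPaths() {
        Set<String> paths = new TreeSet<>();
        for (int bits = 0; bits < 16; bits++) {
            paths.add(path((bits & 8) != 0, (bits & 4) != 0, (bits & 2) != 0, (bits & 1) != 0));
        }
        return paths;
    }

    public static void main(String[] args) {
        for (String p : allPaths()) {
            System.out.println(p);
        }
    }
}
```

Sixteen input combinations collapse to seven paths because short-circuiting skips conditions whose value cannot change the outcome.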
Number of Paths (cont.)
Two CFGs with the same number of predicates and branches, but different numbers of paths!
[Figure: two CFGs side by side, built from predicate nodes (A, B, C) and statement nodes S1 to S4.]
Number of Paths
    a = atoi(argv[1]);
    b = 0;
    while (a > 0) {
        a--;
        b++;
    }
    if (b > 5)
        print("b > 5");
    else
        print("b <= 5");

How many paths?
Infinite Paths
Since the value for the variable a comes from the environment /
external input, we consider this program to have an infinite number of
paths.

In general, programs that contain loops with control variables whose
value is supplied outside the program have infinite paths.
Infeasible Paths
• A path is said to be feasible if it can be exercised by some input data.
• Otherwise the path is said to be infeasible.
• Infeasible paths are the result of contradictory predicates.
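Infeasible Paths
Are there any infeasible paths? If so, how many?
[Figure: CFG. P1: x < 10 (T goes to S1, which goes to P2; F goes directly to P2). P2: x < 20 (T goes to S2; F goes to S3). S2 and S3 both lead to S4.]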
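Infeasible Paths (cont.)
The path (P1, S1, P2, S3, S4) is not feasible
…assuming S1 doesn’t modify x
[Figure: CFG. P1: x < 10 (T goes to S1, which goes to P2; F goes directly to P2). P2: x < 20 (T goes to S2; F goes to S3). S2 and S3 both lead to S4.]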
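The infeasible path follows from a contradiction: taking P1's true branch requires x < 10, while taking P2's false branch requires x ≥ 20. The sketch below only illustrates this by sampling; it is not a proof. The class and method names are hypothetical, and it assumes S1 does not modify x.

```java
import java.util.Set;
import java.util.TreeSet;

class InfeasiblePathDemo {
    // Path taken through the CFG for a given x, assuming S1 does not modify x.
    static String path(int x) {
        StringBuilder p = new StringBuilder("P1");
        if (x < 10) {
            p.append(",S1");
        }
        p.append(",P2");
        p.append(x < 20 ? ",S2" : ",S3");
        return p.append(",S4").toString();
    }

    // Distinct paths observed for every x in [lo, hi].
    static Set<String> observedPaths(int lo, int hi) {
        Set<String> paths = new TreeSet<>();
        for (int x = lo; x <= hi; x++) {
            paths.add(path(x));
        }
        return paths;
    }

    public static void main(String[] args) {
        // The CFG has 4 syntactic paths, but only 3 ever show up.
        for (String p : observedPaths(-100, 100)) {
            System.out.println(p);
        }
    }
}
```

The path P1,S1,P2,S3,S4 never appears in the output, matching the slide's claim that one of the four syntactic paths is infeasible.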
Infeasible Paths (cont.)
• Researchers have analyzed many programs and observed that
infeasible paths occur quite frequently.
• Infeasible paths are undesirable. Why?
• Infeasible paths may indicate errors in control flow — maybe that path should
be feasible!
• Future changes to the software may make the path suddenly feasible, having
never been tested — and usually infeasible paths come in clusters!
• …and many more
Control-Flow Criteria
Statement Coverage
• Other names: node coverage, basic block coverage
• Let T be a test suite for program P
• T satisfies the statement adequacy criterion for P iff. for each
statement S of P, there exists at least one test case in T that causes
execution of S.
• Statement Coverage requires each statement to be exercised by at
least one test case.
• Statement here is a non-conditional program statement (i.e., not a predicate)
Statement Coverage
Exercising all statements in the program sounds like a reasonably
thorough criterion.

What do you think?

Can we do better?
Statement Coverage
• It is possible to achieve full statement coverage without having seen the
outcome of the program when a condition is false.
    if (fileIsReadable) {
        open the file;
        read a line;
    }
    close the file;
• You can test this program with a readable file and successfully cover all the
statements
• However, if the file does not exist (and so is not readable), the program will
attempt to close a file which has not previously been opened!
Statement Coverage (cont.)
• The standard metric for statement coverage C_statement is
• C_statement = (# of statements executed) / (# of statements)
• Consider an if-else statement containing one statement in the then
clause and 99 statements in the else clause.
• Statement coverage can be 1% or 99%! (in addition to 0 or 100)
• This doesn’t tell you how many more test cases you need
• Basic block coverage eliminates this problem.
Statement Coverage (cont.)
• Statement coverage is considered to be the weakest criterion.
• Statement coverage would probably be reasonable if all faults were
highly local.
• But they’re not (though many are, which is why this is still useful)
• Many faults are related to the combination of different parts of the program
— the “if” branch here and the “else” branch there — so we consider other
types of coverage.
Branch Coverage
• Other names: decision coverage, all-edges coverage
• Decision ≡ Predicate ≡ The entire relational expression
• Branch: A possible predicate outcome (True or False)
• i.e., an if condition yields two branches
• Let T be a test suite for program P.
• T satisfies the branch adequacy criterion for P, iff. for each branch B of P,
there exists at least one test case in T that causes the execution of B.
• Branch coverage requires that every decision in the program has taken all
possible outcomes at least once.
• How good is branch coverage?
Branch Coverage (cont.)
• Clearly, better than statement coverage, as both branches must be
exercised. Unfortunately not as strong as it might appear at first.
• Even when exercising all branches, values matter: Consider:
    if ((x + y + z) / 3 == x) {
        printf("x, y, z are equal in value");
    } else {
        printf("x, y, z are not equal in value");
    }
• {x=10, y=10, z=10} and {x=15, y=20,z=25} exercise both branches…
but reordering the second input to {x=20, y=15, z=25} would execute
the first branch…
Coverage tests still need good test oracles!
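The three inputs from the slide can be run against the predicate to show why covering both branches is not enough without an oracle. This is a Java port of the C predicate with integer arithmetic; `actuallyEqual` is an independent oracle added for the example, and both names are hypothetical.

```java
class AverageBranchDemo {
    // The slide's predicate: true means the program prints "equal in value".
    static boolean claimsEqual(int x, int y, int z) {
        return (x + y + z) / 3 == x;
    }

    // Independent oracle for what the message actually claims.
    static boolean actuallyEqual(int x, int y, int z) {
        return x == y && y == z;
    }

    public static void main(String[] args) {
        int[][] inputs = {{10, 10, 10}, {15, 20, 25}, {20, 15, 25}};
        for (int[] in : inputs) {
            System.out.println("claims equal: " + claimsEqual(in[0], in[1], in[2])
                + ", actually equal: " + actuallyEqual(in[0], in[1], in[2]));
        }
    }
}
```

The first two inputs already give full branch coverage and both behave correctly. Only the third input, where the average happens to equal x, exposes the fault, and only if the test compares against the oracle.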
Branch Coverage (cont.)
Consider the following:
if (A && (B || f()) )
S1;
else
S2;

It is possible to exercise both S1 and S2 without ever calling f(). How?
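One possible answer, sketched in Java: because `&&` and `||` short-circuit, choosing A true with B true reaches S1, and choosing A false reaches S2, without f() ever running. The counter and the names `ShortCircuitBranchDemo` and `decide` are additions for this example.

```java
class ShortCircuitBranchDemo {
    static int callsToF = 0;

    // Stand-in for f(); counts how often it is actually evaluated.
    static boolean f() {
        callsToF++;
        return true;
    }

    static String decide(boolean a, boolean b) {
        if (a && (b || f())) {
            return "S1";
        } else {
            return "S2";
        }
    }

    public static void main(String[] args) {
        System.out.println(decide(true, true));   // B is true, so f() is skipped
        System.out.println(decide(false, false)); // A is false, so the rest is skipped
        System.out.println("calls to f(): " + callsToF);
    }
}
```

Both branches are covered and the call count stays at zero, so any fault inside f() goes unnoticed by a suite that only targets branch coverage.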


Possible Predicate Faults
What can go wrong with predicates?

• Arithmetic expression fault


• Relational operator fault
• Boolean operator fault
• Boolean variable fault
• Parenthesis fault
• Dereferencing invalid pointer (e.g. null, freed memory)
• Anything that goes wrong in a function call (invalid argument…)
Condition Coverage
• Condition = A boolean expression containing no boolean operators. (A
decision without boolean operators.)
• “x < 3” is a condition
• “x<3 || y>4” is not a condition, because it contains a boolean operator
• Condition coverage requires that every condition in a decision of the
program has taken all possible outcomes at least once.
• For the decision (C1 op C2 op ... op Cn), where each op is a boolean operator, every
condition Ci must evaluate at least once to true and at least once to false
Condition Coverage (cont.)
• Consider the decision (A ∨ B)
• The following test cases satisfy Condition coverage:
Test   A   B
t1     T   F
t2     F   T

Major Problem
You can satisfy condition coverage even if a predicate always
evaluates to true!
Condition/Decision Coverage
• Condition/Decision coverage requires that every condition in a
decision of the program has taken all possible outcomes at least once,
and every decision in the program has taken all possible outcomes at
least once.
• Consider the decision (A ∨ B)
• The following test cases satisfy Condition/Decision coverage:
Test   A   B
t1     T   T
t2     F   F
Condition/Decision Coverage (cont.)
• But..
• The previous set of test cases don’t distinguish:
• (A∨B)
• (A)
• (B)
• (A∧B)
• E.g., what if the code behaves differently when A is true while B is
false?
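The weakness can be checked mechanically: a suite distinguishes two decisions only if some test makes them evaluate differently. The sketch below compares (A ∨ B) against the three alternatives under the slide's two tests and under an extended suite. The `Decision` interface and `distinguishes` helper are names introduced for this example.

```java
class ConditionDecisionDemo {
    // A two-condition decision, for example (a, b) -> a || b.
    interface Decision {
        boolean eval(boolean a, boolean b);
    }

    // True if some test in the suite makes the two decisions produce different outcomes.
    static boolean distinguishes(boolean[][] suite, Decision d1, Decision d2) {
        for (boolean[] t : suite) {
            if (d1.eval(t[0], t[1]) != d2.eval(t[0], t[1])) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Decision or = (a, b) -> a || b;
        Decision onlyA = (a, b) -> a;
        Decision onlyB = (a, b) -> b;
        Decision and = (a, b) -> a && b;

        boolean[][] cdSuite = {{true, true}, {false, false}};  // t1 and t2 from the slide
        boolean[][] extended = {{true, true}, {false, false}, {true, false}, {false, true}};

        for (Decision other : new Decision[] {onlyA, onlyB, and}) {
            System.out.println(distinguishes(cdSuite, or, other) + " / "
                + distinguishes(extended, or, other));
        }
    }
}
```

Each line prints `false / true`: the two-test suite cannot tell the four decisions apart, while adding the mixed cases (T, F) and (F, T) separates them.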
Other Coverage Criteria
• There are other, more stringent criteria
• Multiple Condition Coverage (MCC) requires all possible combinations
of condition outcomes (i.e., exponential # of tests)
• Modified Condition/Decision Coverage (MCDC) is more often used for
critical code: basically Condition/Decision plus tests to show each
condition actually matters to outcomes
• Coverage criteria are about establishing baselines for how many paths
the test suite covers.
• More is always more thorough, but the trick is to require additional tests that
are likely to find bugs, i.e., good return on testing effort
End of Lecture: Coverage Live Demo
• Get code from https://github.com/Drexel-se320/examples
• The commands you’ll see will be:
• ./gradlew test --tests TestEditDistance jacocoTestReport
• ./gradlew clean
• The report file we’ll open will be under
• build/reports/coverage/index.html
