Black Box Software Testing, Spring 2005: MULTI-VARIABLE TESTING
by
Cem Kaner, J.D., Ph.D.
Professor of Software Engineering
Florida Institute of Technology
and
James Bach
Principal, Satisfice Inc.
Copyright (c) Cem Kaner & James Bach, 2000-2005
This work is licensed under the Creative Commons Attribution-ShareAlike License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/2.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.
These notes are partially based on research that was supported by NSF Grant EIA-0113539 ITR/SY+PE: "Improving the Education of Software Testers." Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Combination Chart
In a combination test, we test several variables together. Each test explicitly sets values for each of the variables under test.
Challenges of multivariable testing
The space of possible tests is enormous.
How to decide which variables to combine?
What IS the fault model?
How to figure out what the relationships among the variables actually are, in detail?
Combination Testing
There are several approaches to combination testing:
Mechanical (or procedural). The tester uses a routine procedure to determine a good set of tests.
Risk-based. The tester combines test values (the values of each variable) based on perceived risks associated with noteworthy combinations
Scenario-based. The tester combines test values on the basis of interesting stories created for the combinations
Domain testing
In 1-dimensional testing, we run two tests for every boundary:
Test the boundary-valid value
Test the boundary-invalid value
For example, if X < 24 defines the boundary, we use X = 24 (invalid) and X = 24 - delta (valid). We choose the smallest workable delta to minimize the possibility of an error hiding within the gap between 24 and 24 - delta.
In multi-dimensional testing, we start by testing each dimension on its own, reasonably thoroughly.
Then we reduce the set of values to test per dimension, probably to the boundaries:
Too-low (TL), valid lowest (VL), valid biggest (VB), too big (TB)
where
TL = VL-delta and TB=VB+delta
Defining the domain: 2 variables
Suppose we have two numeric variables, V1 and V2.
We analyze each variable in terms of its subdomains and boundaries. Thus we have for each variable:
V1: Too-low (TL), valid lowest (VL), valid biggest (VB), too big (TB)
V2: Too-low (TL), valid lowest (VL), valid biggest (VB), too big (TB)
Where we set
TL = VL - delta (delta is the smallest available difference between two numbers)
TB = VB + delta
Example: 2 variables
Consider the following domain definition:
1 <= V1 < 4
1 <= V2 < 4
Store data to 3 significant digits of precision:
TL = 0.999
VL = 1.00
VB = 3.99
TB = 4.00
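A minimal Python sketch of the same calculation, assuming that "3 digits precision" means three significant digits (which is what makes TL = 0.999 rather than 0.99). The function name `boundary_values` is hypothetical. `Decimal.next_minus` returns the largest representable number below its operand under the given context's precision, so it plays the role of subtracting delta.

```python
from decimal import Context, Decimal


def boundary_values(low: str, high: str, digits: int) -> dict:
    """Values of interest for the domain low <= V < high.

    Assumes values are stored to `digits` significant digits, so "delta" is
    the gap between a value and the next representable number below it.
    """
    ctx = Context(prec=digits)
    vl = Decimal(low)   # lowest valid value
    tb = Decimal(high)  # exclusive upper bound: the first too-big value
    return {
        "TL": vl.next_minus(ctx),  # VL - delta
        "VL": vl,
        "VB": tb.next_minus(ctx),  # TB - delta, the biggest valid value
        "TB": tb,
    }


print(boundary_values("1.00", "4.00", digits=3))
```

Because the domain's upper bound is exclusive, the sketch derives VB from TB rather than TB from VB; the relationship TB = VB + delta is the same.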
Defining the domain: 3 independent variables
Suppose we have 3 numeric variables, V1, V2, V3.
We analyze each variable in terms of its subdomains and boundaries. Thus we might have for each variable:
V1: Too-low (TL), valid lowest (VL), valid biggest (VB), too big (TB)
V2: Too-low (TL), valid lowest (VL), valid biggest (VB), too big (TB)
V3: Too-low (TL), valid lowest (VL), valid biggest (VB), too big (TB)
In this simple model, anything inside the box is a valid value, and anything outside the box is not.
(When we restrict ourselves to valid values, we are thinking inside the box.)
Mechanical approach #1: "Weak testing" version 1
We create enough tests to cover every value of every variable, once. If the largest number of values is N, we need only N tests.
Note the collisions of error cases. If Test 3 fails, is it because of the bad value of V1, V2, V3, or some combination of them?
What bug do we expect to find in Test 3 that we would not find in a single-dimension test with the same bad value? Why do we need a combination?
Too-low (TL), lowest valid (VL), biggest valid (VB), too big (TB)
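The weak-testing rule can be sketched in a few lines of Python. This is an illustration, not a prescribed procedure: the helper name `weak_tests` is hypothetical, and it assumes that shorter value lists simply wrap around so every value still appears at least once. Note how the first and last tests put all three variables at TL or at TB together, which is exactly the collision of error cases described above.

```python
LEVELS = ["TL", "VL", "VB", "TB"]


def weak_tests(variables):
    """Cover every value of every variable at least once.

    Test i takes value i of each variable (wrapping around for shorter lists),
    so the number of tests is the length of the longest value list, N.
    """
    n = max(len(values) for values in variables.values())
    return [
        {name: values[i % len(values)] for name, values in variables.items()}
        for i in range(n)
    ]


for test in weak_tests({"V1": LEVELS, "V2": LEVELS, "V3": LEVELS}):
    print(test)
```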
"Weak testing" version 2
In this second version, we treat error cases specially:
Generate a core set of tests for "valid" (non-error) inputs
Generate additional tests in which one error case is allowed per test case. (Jorgensen calls this “weak robust equivalence class testing.”)
We might also add a few market-critical combinations.
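A sketch of version 2 under stated assumptions: the core set covers the valid values VL and VB, and each additional test injects a single error value (TL or TB) while the other variables sit at VL. Using VL as the filler value and the name `weak_robust_tests` are hypothetical choices, not part of Jorgensen's definition.

```python
VALID = ["VL", "VB"]
ERRORS = ["TL", "TB"]


def weak_robust_tests(names):
    # Core set: cover the valid values once (one all-VL test, one all-VB test).
    tests = [{name: value for name in names} for value in VALID]
    # Additional tests: exactly one error value per test; the other
    # variables stay at VL (an arbitrary valid choice).
    for bad in names:
        for error in ERRORS:
            test = {name: VALID[0] for name in names}
            test[bad] = error
            tests.append(test)
    return tests


tests = weak_robust_tests(["V1", "V2", "V3"])
print(len(tests))  # 2 core tests + 3 variables x 2 error values = 8
```

Because no test holds more than one error value, a failing error test points at a single variable.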
"Weak testing" version 3: All Singles
Drop the error cases; test them in single-variable tests.
Create tests only for valid values
Jorgensen calls this “weak normal equivalence class testing”
Note the coverage that we do and do not achieve:
We have a test for every valid value of interest of every variable
We are not set up to detect interactions among variables.
Here, for example, we check all minima together and all maxima together.
Should we worry about Low-High combinations?
Mechanical approach #2: "Strong testing" version 1
Test every combination of values of interest:
Jorgensen calls this "strong robust equivalence class testing."
This is part of the table. The complete table has 4 x 4 x 4 = 64 tests.
In general, if N is the number of variables we test together and they have k1, k2, …, kN values, strong testing requires k1 x k2 x … x kN tests.
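Strong testing is a Cartesian product of the value lists, so Python's `itertools.product` generates it directly. In this sketch (hypothetical helper `strong_tests`), three variables with four values each give the 4 x 4 x 4 = 64 tests of the complete table.

```python
from itertools import product
from math import prod

LEVELS = ["TL", "VL", "VB", "TB"]


def strong_tests(variables):
    """Every combination of every value of interest (a Cartesian product)."""
    names = list(variables)
    return [dict(zip(names, combo)) for combo in product(*variables.values())]


variables = {"V1": LEVELS, "V2": LEVELS, "V3": LEVELS}
tests = strong_tests(variables)
print(len(tests))                                # 64
print(prod(len(v) for v in variables.values()))  # k1 x k2 x k3 = 64
```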
"Strong testing" version 2: All n-tuples
Start with strong testing,
But restrict the values of interest to valid values.
Jorgensen calls this “strong normal equivalence class analysis”
Cover error cases in the one-variable tests.
If there are N independent dimensions, and we test only VL and VB for each, there are 2^N tests.
More “strong testing”
Another variation includes all valid-value combinations plus a separate set of combination tests in which one, some, or all variables have an error value.
Tests that include several errors are of interest only if we think that multiple errors might have some type of cumulative effect.
Mechanical approach #3: Combinatorial testing
We have N variables.
We assume the variables are independent
A value of one variable does not change the effects or validity of values of other variables
We consider only valid values of interest
An invalid value stops the test.
In the test (V1, V2, Bad, V4, V5), what do we learn about V1, V2, V4, or V5?
Any effect of the other values in this test will be masked by “Bad.”
Our goal is to sample from the space of possible N-tuples in a way that assures a minimum level of combination coverage:
All N-tuples: all combinations of valid values
All singles: all individual valid values
All pairs: all pairs of valid values
All triples: all triplets of valid values
Combinatorial Example
Here is a simple Find dialog. It takes three inputs:
Find what: a text string
Match case: yes or no
Direction: up or down
Simplify this by considering only three values for the text string: “lowercase”, “Mixed Cases”, and “CAPITALS”.
Combinations Example
1. How many combinations of these three variables are possible?
2. List ALL the combinations of these three variables.
3. Now create combination tests that cover all possible pairs of values, but don’t try to cover all possible triplets. List one such set.
4. How many test cases are in this set?
Combinations Example
1. How many combinations of these three variables are possible?
Find what has 3 values (lowercase, mixed, caps) (L M C)
Match case has 2 values (Yes / No) (Y N)
Direction has 2 values (Up / Down) (U D)
So there will be 3 x 2 x 2 = 12 tests.
2. List ALL the combinations of these three variables.
L Y U M Y U C Y U
L Y D M Y D C Y D
L N U M N U C N U
L N D M N D C N D
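The same enumeration can be generated mechanically. This short sketch uses the one-letter codes from the answer above; the variable names are illustrative.

```python
from itertools import product

FIND_WHAT = ["L", "M", "C"]  # lowercase, Mixed Cases, CAPITALS
MATCH_CASE = ["Y", "N"]      # yes, no
DIRECTION = ["U", "D"]       # up, down

all_combinations = [" ".join(combo) for combo in product(FIND_WHAT, MATCH_CASE, DIRECTION)]
print(len(all_combinations))  # 12
print(all_combinations[:4])   # ['L Y U', 'L Y D', 'L N U', 'L N D']
```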
By the way, a more complete analysis will also consider whether the string is in the document or not. We’ll add a 4th binary variable to the analysis soon.
Building an all-pairs table
Label the columns with the variable names.
List the variables in descending order of their number of possible values.
Each column will have repetition.
Repeat each value of the first variable in as many consecutive rows as the second variable has values. The table therefore needs at least (number of values in column 1) x (number of values in column 2) rows.
In our example,
Find What has 3 values
Match Case has 2 values
So there will be at least 6 rows
Combination Testing
Building an all-pairs combination table:
In the second column, list all the values of the second variable, skip a line, list the values again, and so on. In our example, variable 2’s possible values are Y and N, so the table looks like this so far:
Combination Testing
Building an all-pairs combination table:
Each section of the third column (think of LL as defining a section, MM as defining another) will have to contain every value of variable 3. Order the values so that variable 3 also forms all pairs with variable 2.
Our variable 3 has two values, U and D
The third section can be filled in either way, and you might highlight it so that you can reverse it later. The decision (say D, U) is arbitrary.
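To check a hand-built table, it helps to list the pairs it misses. The sketch below (hypothetical helper `missing_pairs`) checks one 6-row table built by the procedure above, with the third section filled in as D, U. The row order shown is one valid choice, not the only one.

```python
from itertools import combinations, product


def missing_pairs(columns, rows):
    """Return (col_i, col_j, (a, b)) for every value pair no row covers."""
    missing = []
    for i, j in combinations(range(len(columns)), 2):
        covered = {(row[i], row[j]) for row in rows}
        for pair in product(columns[i], columns[j]):
            if pair not in covered:
                missing.append((i, j, pair))
    return missing


columns = [["L", "M", "C"], ["Y", "N"], ["U", "D"]]
table = [
    ("L", "Y", "U"), ("L", "N", "D"),  # section L
    ("M", "Y", "D"), ("M", "N", "U"),  # section M: reversed so Y and N each meet U and D
    ("C", "Y", "D"), ("C", "N", "U"),  # section C: arbitrary order (D, U)
]
print(len(table), missing_pairs(columns, table))  # 6 []
```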
Combination Testing
Now that we’ve solved the 3-column exercise, let’s try adding more variables. Each will have two values.
Let’s start by making this look a little more general
The 4th column goes in easily:
We start by making sure we hit all pairs of values of column 4 and column 2
then all pairs of column 4 and column 3.
Combination Testing
Watch this first attempt on column 5. We achieve all pairs of JK with columns 1, 2, and 3, but miss it for column 4.
The most recent arbitrary choice was KJ in the 2nd section. (Once that was determined, we had to pick JK for the third in order to pair K with an F in the 3rd column.)
So we will erase the last choice and try again:
Combination Testing
We flipped the last arbitrary choice (column 5, section 2, to JK from KJ) and erased the JK in section 3.
We then fill in section 3 by checking for missing pairs.
JK, JK, JK gives us three DJ, DJ, DJ pairs (2nd and 5th columns) so we have to flip to KJ for the third section.
Now everything works
Combination Testing
But when we add the next column, we see that we just can’t achieve all pairs with 6 rows. The first attempt works up to column 4 but then fails to get pair KL or JM. The next attempt fails on HM and IL.
Combination Testing
When all else fails, add rows. We need one for HM and one for IL, so add two rows. In general, we would need as many rows as the last column has values.
The other values in the two rows are arbitrary; leave them blank and fill them in as needed when you add new columns. At the very end, fill the remaining blank ones with arbitrary values.
We have 8 tests instead of 3x2x2x2x2x2=96
Let’s try this again on an old Netscape preference dialog
The Netscape example
If we just look at the Appearance tab of the Netscape Preferences dialog, we see the following variables:
Toolbars -- 3 choices (P, T, B)
(pictures, text or both)
On Startup Launch --(browser, mail, news). Each is an independent binary.
Browser (Y, N)
Mail (Y, N)
News (Y, N)
Start With -- 3 choices (B,V,E)
(blank page, valid existing file, error (syntax) in the URL)
(Many more cases are possible)
Links -- 2 choices (D,U)
(don’t underline, underlined)
Followed Links -- 2 choices (N,E)
(never expire, expire after 30 days) (Many more cases are possible)
The Netscape example
I simplified the combinations by simplifying the choices for two fields.
In the Start With field, I used either a valid home page name or a blank. Some other tests for this field are:
Link to a different type of file, such as pdf
Link to a nonexistent file
Abbreviated URL, such as name.htm instead of http://
File on the local drive, the local network drive, or the remote drive
maximum length file names, maximum length paths
Note that a bad URL won’t stop Netscape from starting, so we should be able to use an error case here without blocking testing of the other variables
For combination testing, select a few of these that look like they might interact with other variables. Test the rest independently.
Similarly for the Expire After field. This lets you enter the number of days to store links. If you use more than one value, use boundary cases, not all the numbers in the range.
In multi-variable testing, use partition analysis or other special values instead of testing all values in combination with all other variables’ all values.
All N-tuples
We can create 3 x 2 x 2 x 2 x 3 x 2 x 2 = 288 different test cases by testing these variables in combination. Here are some examples, from the combination table.
This is what Jorgensen would call “strong normal” testing.
Strong because we test for faults triggered by a combination of conditions.
Normal because we omit error cases.
Here are the 288 test cases. Every value of every variable is combined with every combination of the other variables.
All N-tuples
When creating a combination table, I strongly recommend that you order the columns from the variable with the most values to the variable with the fewest.
All Singles
There are 3+3+2+2+2+2+2 = 16 different individual (single) values of interest.
We can cover them in 3 tests
What about pairs?
To simplify this, many testers would test variables in pairs, each test involving only 2 variables.
There are 109 pairs of values in our example.
Testing only 2 variables at once is an inefficient form of combination testing.
One test that combines 7 variables incorporates 21 tests of pairs of variables.
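The counts in this part of the example can be reproduced from the number of values of interest per variable. A minimal sketch; the list `sizes` simply restates the Netscape variables in the order they were listed.

```python
from itertools import combinations
from math import comb, prod

# Values of interest per variable, in the order listed above:
# Toolbars, Browser, Mail, News, Start With, Links, Followed Links
sizes = [3, 2, 2, 2, 3, 2, 2]

n_tuples = prod(sizes)                                  # every full combination
singles = sum(sizes)                                    # every individual value
pairs = sum(a * b for a, b in combinations(sizes, 2))   # value pairs across two variables
pairs_per_test = comb(len(sizes), 2)                    # variable pairs in one 7-variable test
print(n_tuples, singles, pairs, pairs_per_test)         # 288 16 109 21
```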
Combinatorics
“Combinatorics is, loosely, the science of counting. This is the area in mathematic in which we study families of sets (usually) finite with certain characteristic arrangements of their elements or subsets, and ask what combinations are possible, and how many there are. This includes numerous quite elementary topics, such as enumerating all possible permutations or combinations of a finite set.” www.albany.edu/faculty/tangr/isp602/notes/terms.htm
In combinatorial testing, we test many variables together as an efficient way of testing many of the combinations of those variables (e.g. testing 7 variables together in one test captures 7C2 = 21 tests of the pairs).
So how many tests would we have to run to cover all the pairs?
All pairs for Netscape
We can cover all 109 pairs inside 9 tests.
All pairs for Netscape
Let’s work it through.
We start with the first two variables (biggest and second biggest number of values of interest.)
Here are all the pairs of those two variables. There are 3x3 = 9 of them
All pairs for Netscape
Add the next variable.
We need all the pairs of the
1st and 2nd variables
1st and 3rd variables
2nd and 3rd variables
We already have the pairs for the 1st and 2nd variables.
For the 1st and 3rd, we need
a Y with a P, an N with a P,
a Y with a T, an N with a T,
a Y with a B and an N with a B.
The values of the 3rd variable for the other cases don’t matter for 1st & 3rd, but they might matter for 2nd & 3rd.
All pairs for Netscape
Add the 4th variable. We have the pairs for the first 3; we just have to work in the 4th.
All pairs for Netscape
Keep going, through the 7th variable.
All pairs
Reminder of a common misconception:
The number of values of column 1 times the number of values of column 2 is only a lower bound on the number of rows; we often need more than that.
Combinatorial testing
At one of the LAWST meetings, we were advised that Microsoft often uses a modified all-singles approach in configuration testing:
All singles, plus
All other combinations designated by marketing (or by error history) as particularly interesting
Similarly if we use all pairs, we might add to the set of tests:
Special cases (marketing)
Special cases (identified risks of higher-order interactions)
Combinatorial testing
www.pairwise.org has been collecting references and links to tools, including free tools.
Another free tool is at http://www.satisfice.com/tools/pairs.zip
Rob Vanderwall has developed VPTAG, which allows you to specify some constraints (a given value of X makes a range of values of Y impossible).
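For readers who want to see how a tool might automate this, here is a simple greedy heuristic in Python. It is not the manual procedure from the earlier slides, and it is not the algorithm of any tool mentioned above: each new row starts from an uncovered pair and fills the remaining columns with whichever value covers the most still-uncovered pairs. Greedy construction does not guarantee a minimal table, so on the Netscape variables it may need more than the 3 x 3 = 9-row lower bound. The names `all_pairs` and `pair_key` are hypothetical, and the sketch assumes values within a column are mutually comparable (strings here).

```python
from itertools import combinations, product


def pair_key(c1, v1, c2, v2):
    """Normalize a value pair so the lower column index comes first."""
    return (c1, v1, c2, v2) if c1 < c2 else (c2, v2, c1, v1)


def all_pairs(columns):
    """Greedy heuristic: add rows until every pair of values is covered."""
    k = len(columns)
    uncovered = {
        (i, a, j, b)
        for i, j in combinations(range(k), 2)
        for a, b in product(columns[i], columns[j])
    }
    rows = []
    while uncovered:
        # Seed the new row with one still-uncovered pair (min() keeps it deterministic).
        i, a, j, b = min(uncovered)
        row = [None] * k
        row[i], row[j] = a, b
        # Fill each empty column with the value that covers the most new pairs
        # against the columns already filled in this row.
        for c in range(k):
            if row[c] is None:
                row[c] = max(
                    columns[c],
                    key=lambda v: sum(
                        pair_key(c, v, d, row[d]) in uncovered
                        for d in range(k)
                        if row[d] is not None
                    ),
                )
        # Every pair in the finished row is now covered.
        for x, y in combinations(range(k), 2):
            uncovered.discard((x, row[x], y, row[y]))
        rows.append(tuple(row))
    return rows


netscape = [
    ["P", "T", "B"],  # Toolbars
    ["B", "V", "E"],  # Start With
    ["Y", "N"],       # Launch Browser
    ["Y", "N"],       # Launch Mail
    ["Y", "N"],       # Launch News
    ["D", "U"],       # Links
    ["N", "E"],       # Followed Links
]
rows = all_pairs(netscape)
print(len(rows))
```

The columns follow the recommended order, most values first; constraints between values (as handled by tools such as VPTAG) are not modeled here.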
Let’s add some complications
So far, we’ve assumed:
Independent variables
All valid values are equivalent
What if we have multiple valid equivalence classes?
Let’s assume fixed precision, to 1 digit after the decimal point.
Invalid: X < -100; boundary -100.1
Valid 1: -100 <= X < 0; bounds -100.0, -0.1
Valid 2: 0 <= X <= 5; bounds 0.0, 5.0
Valid 3: 5 < X < 10; bounds 5.1, 9.9
Invalid: 10 <= X; bound 10.0
These all become values of interest in combinatorial tests
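A sketch of how these classes map onto values, assuming fixed precision of one decimal place as stated. The function name `classify` and the class labels are illustrative. Each boundary value listed above lands in the class the list assigns it.

```python
def classify(x: float) -> str:
    """Map a value (1 digit after the decimal point) to its equivalence class."""
    if x < -100:
        return "Invalid (too low)"
    if x < 0:
        return "Valid 1"
    if x <= 5:
        return "Valid 2"
    if x < 10:
        return "Valid 3"
    return "Invalid (too high)"


# Boundary values of interest from the list above
for x in [-100.1, -100.0, -0.1, 0.0, 5.0, 5.1, 9.9, 10.0]:
    print(f"{x:>6}: {classify(x)}")
```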
Let’s add more complications
So far, we’ve assumed:
Independent variables
All valid values are equivalent
What if the values of one variable affect the validity or effect of the values of another?
A common example: Testing a date field
0 < day <= a maximum that depends on the month (and, for February, on the year)
1 <= month <= 12
2000 < year < 3000 (whatever limits you choose)
For month 2: 1 <= day <= 28, or 29 in a leap year
For months 4, 6, 9, 11: 1 <= day <= 30
For months 1, 3, 5, 7, 8, 10, 12: 1 <= day <= 31
See Jorgensen for a thorough analysis
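A sketch of the month-dependent upper bound on day, assuming the Gregorian leap-year rule (every 4th year, except century years not divisible by 400) and a month that has already been validated as 1 to 12. The helper names are hypothetical.

```python
def is_leap(year: int) -> bool:
    # Gregorian rule: every 4th year, except centuries not divisible by 400
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_month(year: int, month: int) -> int:
    # Assumes 1 <= month <= 12 was checked separately.
    if month == 2:
        return 29 if is_leap(year) else 28
    if month in (4, 6, 9, 11):
        return 30
    return 31


def day_boundaries(year: int, month: int) -> list:
    """(day, is_valid) pairs at both edges of the day range for this month."""
    last = days_in_month(year, month)
    return [(0, False), (1, True), (last, True), (last + 1, False)]


print(day_boundaries(2004, 2))  # [(0, False), (1, True), (29, True), (30, False)]
```

Because the upper bound on day depends on month and year, these three variables cannot be treated as independent in an all-pairs table.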
Let’s make this more challenging. The next slides present the Open Office Writer page style dialog.
These are interesting as a group because they all interact.
Just print a page. Its layout is jointly determined by all of these.
So how do we test all of these?
Can you list the relevant variables?
How many variables are on this page?
The number of variables on this page depends on how many columns you choose.
At last, we’re through this one (1) dialog.
You can see why people would give up and do all singles or random combination.
Thoughts on all pairs
All pairs is ideal for independent variables.
A classic use is configuration testing.
But if they’re independent, why test them in combination?
We are managing a type of coverage here.
We are rarely working from a theory of error.
Schroeder & Bach argue that in this case, we are probably as well off using a random combination algorithm. The set of tests will approximate all pairs
If we combination-test the program several times, randomness creates variation in the testing
All pairs is adaptable when the number of constraints is small
Whenever a test has an invalid pair, replace it with two tests that are identical to it except that one substitutes a valid value for the first value of the pair and the other substitutes a valid value for the second. All other pairs in the test stay intact and are tested.
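The substitution rule can be sketched as follows (hypothetical helper `split_invalid_pair`; the test values are placeholders, not real dialog settings). The first replacement changes only position i and the second changes only position j, so every pair in the original test other than the invalid one still appears in at least one replacement. The caller must choose replacement values that don't create a new invalid pair.

```python
def split_invalid_pair(test, i, j, valid_i, valid_j):
    """Replace a test whose values at positions i and j form an invalid pair.

    The first replacement swaps in a valid value at i; the second swaps in a
    valid value at j. All other pairs of the original test survive in at
    least one of the two replacements.
    """
    first = list(test)
    first[i] = valid_i
    second = list(test)
    second[j] = valid_j
    return [tuple(first), tuple(second)]


# Placeholder values: suppose "x2" at position 1 cannot be combined with "z1" at position 3.
print(split_invalid_pair(("w1", "x2", "y1", "z1"), 1, 3, "x1", "z2"))
# [('w1', 'x1', 'y1', 'z1'), ('w1', 'x2', 'y1', 'z2')]
```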
Thoughts on all pairs
All pairs is inefficient when the number of constraints is significant.
Beizer (Black Box Testing) discusses the general case in which there are several levels for each variable and the program behaves differently as a joint function of the settings of several variables. His presentation of a domain testing approach to this problem is interesting but as described, I find it challenging to apply.
In electrical engineering, this situation is analyzed as Combination Circuit Testing. Given a set of values of interest for several variables (you arrive at them through domain analysis or in some other way), the question is whether the program behaves correctly for each combination.
In software testing, the analysis is called Cause-Effect Graphing
Alternative approaches
Some groups of variables involve too complex a set of relationships for you to analyze (given your skills, tools and the time available) or are not well enough specified for you to analyze.
If you believe that you need to test combinations anyway, and want to consciously control the design of the tests, you might want a technique that helps you explore relationships and make sense of them. There are several approaches to combination testing
Mechanical (or procedural). The tester uses a routine procedure to determine a good set of tests
Risk-based. The tester combines test values (the values of each variable) based on perceived risks associated with noteworthy combinations
Scenario-based. The tester combines test values on the basis of interesting stories created for the combinations.
Exploring relationships
Look at this record (bigger on the next slide), from the Timeslips Deluxe time and billing database. In this dialog box, click the arrow next to the Consultant field to edit the Consultant record (my name, billing info, etc.) or enter a new one.
If I edit it here, will the changes carry over to every other display of this Consultant record?
Also, note that the End Date for this task is before the Start Date. That’s not possible.
Exploring relationships
Exploring relationships
The program checks the End Date against the Start Date and rejects this pair as impossible because the task can’t end before it starts.
The value of End Date is constrained by Start Date, because End Date can’t be earlier than Start Date.
The value of Start Date constrains End Date, because End Date can’t be earlier than Start Date.
Exploring relationships
A relationship table
Relationship Table
THE TABLE’S FIELDS
Field: Create a row for each field (Consultant, End Date, and Start Date are examples of fields.)
Entry Source: What dialog boxes can you use to enter data into this field? Can you import data into this field? Can data be calculated into this field? List every way to fill the field -- every screen, etc.
Display: List every dialog box, error message window, etc., that can display the value of this field. When you re-enter a value into this field, will the new entry show up in each screen that displays the field? (Not always -- sometimes the program makes local copies of variables and fails to update them.)
Print: List all the reports that print the value of this field (and any other functions that print the value).
Related to: List every variable that is related to this variable. (What if you enter a legal value into this variable, then change the value of a constraining variable to something that is incompatible with this variable’s value?)
Relationship: Identify the relationship to the related variable.
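One way to keep such a table in a form that scripts can use is a small record per field. This is a hypothetical sketch: the class name, attribute names, and the entry points and reports in the example are illustrative placeholders, not taken from Timeslips. `display_checks` pairs every entry source with every place that displays or prints the value, which turns the question "will the new entry show up in each screen?" into a checklist.

```python
from dataclasses import dataclass, field


@dataclass
class RelationshipRow:
    """One row of a relationship table; attributes mirror the columns above."""
    field_name: str
    entry_sources: list = field(default_factory=list)  # every way to fill the field
    displays: list = field(default_factory=list)       # every place that shows its value
    prints: list = field(default_factory=list)         # every report that prints it
    related_to: list = field(default_factory=list)     # related variables
    relationship: str = ""

    def display_checks(self) -> list:
        """After editing via each source, check each display and report."""
        return [f"edit via {src}, then check {shown}"
                for src in self.entry_sources
                for shown in self.displays + self.prints]


end_date = RelationshipRow(
    field_name="End Date",
    entry_sources=["task dialog", "import"],  # hypothetical entry points
    displays=["task list"],                   # hypothetical
    prints=["billing report"],                # hypothetical
    related_to=["Start Date"],
    relationship="End Date cannot be earlier than Start Date",
)
for check in end_date.display_checks():
    print(check)
```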
Exploring relationships
Given the relationship,
Try to enter relationship-breaking values everywhere that you can enter V1 and V2.
Pay attention to unusual entry options, such as editing in a display field, import, revision using a different component or program
Once you achieve a mismatch between V1 and V2,
the program's data no longer obey rules the programmer expected would be obeyed, so anything that assumes the rules hold is vulnerable.
Do follow-up testing to discover serious side effects of the mismatch
Many relationships among data
Independence
Varying one has no effect on the permissible values of the other or on how the computer responds to a value of the other variable.
Causal determination
By changing the value of one, we determine the value of the other. For example, in selecting page layout, if you select “Letter” the page becomes 8.5x11.
Constrained to a range
For example, width of a line must be less than the width of the page.
In a date field, the max day is determined by the month
Selection of rules
Example, hyphenation rules depend on the language you choose.
Relations are often reciprocal, so if V2 constrains V1, then V1 might constrain V2 (try to change V2 after setting V1)
Many relationships
Logical selection from a list
The program processes the value you entered and then figures out what value to use for the next variable. Example: timeouts in phone dialing:
0 seconds on complete call 555-1212 but 95551212?
10 seconds on ambiguous completion 955-5121
30 seconds on incomplete 555-121
Logical selection of a list:
For example, in printer setup, choose:
OfficeJet
get Graphics Quality, Paper Type, and Color Options
LaserJet 4
get Economode, Resolution, and Half-toning.
Marick (Craft of Software Testing) discusses catalogs of tests for data relationships.
Complex Relationships
Data Relationship Table
Looking at the Word options, you see the real value of the data relationships table. Many of these options have a lot of repercussions (they impact many features).
You might analyze all of the details of all of the relationships later, but for now, it is challenging just to find out what all the relationships ARE.
The table guides exploration and will surface a lot of bugs.
PROBLEM
Works great for this release. Next release, what is your support for more exploration?
Let’s sum up
Mechanical approaches:
Give you a handle on some complex problems
Provide easy justification for management. The number of tests needed is driven by theory and computed by the tool. Doesn’t appear discretionary. This is an important difference from random testing.
Provide an intuitively appealing coverage model
Appeal to the mathematically inclined
Are rarely based on a plausible theory of risk. (They’re wasteful, however, if and only if a risk-based model would generate substantially different or fewer tests.)
There are several approaches to combination testing:
Mechanical (or procedural). The tester uses a routine procedure to determine a good set of tests
Risk-based. The tester combines test values (the values of each variable) based on perceived risks associated with noteworthy combinations
Scenario-based. The tester combines test values on the basis of interesting stories created for the combinations.