Testing Interactive Software: A Challenge for Usability and Reliability: Testing Interactive Software: A Challenge for Usability and Reliability
Philippe Palanque
LIIHS-IRIT,
University Toulouse 3,
31062 Toulouse, France
palanque@irit.fr
Regina Bernhaupt
ICT&S-Center,
Universität Salzburg
5020 Salzburg, Austria
Regina.Bernhaupt@sbg.ac.at
Ronald Boring
Idaho National Laboratory
Idaho Falls 83415, Idaho, USA
ronald.boring@inl.gov
Chris Johnson
Dept. of Computing Science,
University of Glasgow,
Glasgow, G12 8QQ, Scotland
johnson@dcs.gla.ac.uk
Sandra Basnyat
LIIHS-IRIT,
University Toulouse 3,
31062 Toulouse, France
basnyat@irit.fr
Special Interest Group – CHI 2006 – Montréal – 22nd April 2006
Outline of the SIG: Outline of the SIG Short introduction about the SIG (10 mn)
Short presentations (20 mn)
Software engineering testing for reliability (Philippe)
Human reliability for interactive systems testing (Ron)
Incident and accident analysis and reporting for testing (Sandra)
HCI testing for usability (Regina)
Gathering feedback from audience (10 mn)
Presentation of some case studies (20 mn)
Listing of issues and solutions for interactive systems testing (20 mn)
Discussion and summary (10 mn)
Introduction: Introduction What are interactive applications
What is interactive applications testing
Coverage testing
Non regression testing
Usability versus reliability
What about usability testing of a non reliable interactive application
What about reliable applications with poor usability
Interactive Systems: Interactive Systems
A paradigm switch : A paradigm switch Control flow is in the hands of the user
Interactive application idle waiting for input from the users
Code is sliced
Execution influenced by internal and external states
Nothing new but …
Classical Behavior: Classical Behavior [Flowchart boxes: Read Input, Process Input, Exit?, End]
Event-based Functioning : Event-based Functioning [Diagram of the Application and the Window Manager]
At startup: Register Event Handlers (EH Registration), Call Window Manager
At runtime: Get next event (from the Event Queue), Dispatch event to Event Handler 1, Event Handler 2, … Event Handler n, Finished (Ack received), Wait for next event
Other label on the diagram: States
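The event-based functioning described on this slide can be sketched in a few lines of Python. This is only an illustration: the `WindowManager` class, its method names and the event names are invented for the sketch and do not come from the slides or from any real toolkit.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Event = str
Handler = Callable[[Event], None]


class WindowManager:
    """Toy stand-in for a window manager (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.handlers: Dict[Event, List[Handler]] = {}
        self.queue: Deque[Event] = deque()

    def register(self, event: Event, handler: Handler) -> None:
        # At startup: the application registers its event handlers.
        self.handlers.setdefault(event, []).append(handler)

    def post(self, event: Event) -> None:
        # User actions are appended to the event queue.
        self.queue.append(event)

    def run(self) -> None:
        # At runtime: get the next event, dispatch it, and only move on
        # once the handler has returned (the "Finished" / "Ack" step).
        while self.queue:
            event = self.queue.popleft()
            for handler in self.handlers.get(event, []):
                handler(event)


log: List[str] = []
wm = WindowManager()
wm.register("click", lambda e: log.append("handled " + e))
wm.register("key", lambda e: log.append("handled " + e))
for user_event in ["key", "click", "resize"]:  # "resize" has no handler
    wm.post(user_event)
wm.run()
print(log)
```

The order of execution is decided by the order of the posted events, not by the program text, which is the "control flow is in the hands of the user" point made earlier.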
Safety Critical Interactive Systems: Safety Critical Interactive Systems Safety Critical Systems
Software Engineers
System centered
Reliability
Safety requirements (certification)
Formal specification
Verification / Proof
Waterfall model / structured
Archaic interaction techniques
Interactive Systems
Usability experts
User centered
Usability
Human factors
Task analysis & modeling
Evaluation
Iterative process / Prototyping
Novel Interaction techniques
Some Well-known Examples (1/2): Some Well-known Examples (1/2)
Some Well-known Examples: Some Well-known Examples
The Shift from Reliability to Fault-Tolerance: The Shift from Reliability to Fault-Tolerance Failures will occur
Mitigate failures
Reduce the impact of a failure
A small demo …
Informal Description of a Civil Cockpit application: Informal Description of a Civil Cockpit application The working mode
The tilt selection mode: AUTO or MANUAL (AUTO)
The CTRL push-button swaps between the two modes
The stabilization mode: ON or OFF
The CTRL push-button swaps between the two modes
Access to the button is forbidden when in AUTO tilt selection mode
The tilt angle: a numeric edit box allows selecting a value in the range [-15°; 15°]
Modifications are forbidden when in AUTO tilt selection mode
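The informal description above can be turned into a small executable model against which test cases can be written. The following is a minimal sketch, not the SIG's model: the class and method names are hypothetical, the initial values (AUTO, OFF, 0°) are assumptions, and the slide does not say what happens to stabilization when the tilt mode returns to AUTO, so the sketch leaves it unchanged.

```python
from dataclasses import dataclass


@dataclass
class TiltPanel:
    # Initial values are assumptions; the slide only lists the possible values.
    tilt_mode: str = "AUTO"      # "AUTO" or "MANUAL"
    stabilization: str = "OFF"   # "ON" or "OFF"
    tilt_angle: float = 0.0      # degrees, within [-15, 15]

    def press_tilt_ctrl(self) -> None:
        """CTRL push-button of the tilt selection mode: swap AUTO/MANUAL."""
        self.tilt_mode = "MANUAL" if self.tilt_mode == "AUTO" else "AUTO"

    def press_stab_ctrl(self) -> bool:
        """CTRL push-button of the stabilization mode; refused in AUTO."""
        if self.tilt_mode == "AUTO":
            return False
        self.stabilization = "OFF" if self.stabilization == "ON" else "ON"
        return True

    def set_tilt_angle(self, value: float) -> bool:
        """Edit box: refused in AUTO or when the value is out of range."""
        if self.tilt_mode == "AUTO" or not -15.0 <= value <= 15.0:
            return False
        self.tilt_angle = value
        return True


panel = TiltPanel()
refused_in_auto = panel.set_tilt_angle(5.0)   # forbidden in AUTO
panel.press_tilt_ctrl()                       # now MANUAL
accepted = panel.set_tilt_angle(5.0)
out_of_range = panel.set_tilt_angle(20.0)
```

Each `False` return corresponds to one "forbidden" rule of the description, so every rule gives at least one test case.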
Various perspectives of this Special Interest Group: Various perspectives of this Special Interest Group Software engineering testing for reliability
Human reliability testing
Incident and accident analysis and reporting for testing
HCI testing for usability
What do we mean by human error?: What do we mean by human error? [Figure labels: “Consequence: Inconvenience” and “Consequence: Danger”]
Conceptualizing error: Conceptualizing error Humans are natural “error emitters”
On average we make around 5-6 errors every hour
Under stress and fatigue that rate can increase dramatically
Most errors are inconsequential or mitigated
No consequences or impact from many mistakes made
Where there may be consequences, defenses and recovery mechanisms often prevent serious accidents
Human Reliability Analysis (HRA): Human Reliability Analysis (HRA) Classic Definition
The use of systems engineering and human factors methods in order to render a complete description of the human contribution to risk and to identify ways to reduce that risk
What’s Missing
HRA can be used to predict human performance issues and to identify human contributions to incidents before they occur
Can be used to design safe and reliable systems
Performance Shaping Factors (PSFs): Performance Shaping Factors (PSFs) Are environmental, personal, or task-oriented factors that influence the probability of human error
Are an integral part of error modeling and characterization
Are evaluated and used during quantification to obtain a human error rate applicable to a particular set of circumstances
Specifically, the basic human error probabilities obtained for generic circumstances are modified (adjusted) per the specific situation
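The adjustment described in the last bullet can be written as a one-line computation. The sketch below assumes the simplest form (nominal probability times the product of the PSF multipliers, capped at 1.0); the cap, the function name and the numbers are illustrative assumptions and are not taken from any published PSF table.

```python
from math import prod
from typing import Iterable


def adjusted_hep(nominal_hep: float, psf_multipliers: Iterable[float]) -> float:
    """Adjust a generic human error probability for a specific situation.

    Each multiplier > 1 degrades performance, each multiplier < 1 improves it.
    The cap at 1.0 is a simplifying assumption of this sketch.
    """
    if not 0.0 <= nominal_hep <= 1.0:
        raise ValueError("nominal_hep must be a probability")
    return min(nominal_hep * prod(psf_multipliers), 1.0)


# Hypothetical values: a generic error probability of 0.001, with stress (x2)
# and poor ergonomics (x5) as the only non-neutral factors.
hep = adjusted_hep(0.001, [2, 5, 1])
capped = adjusted_hep(0.5, [10])
```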
Example: SPAR-H PSFs: Example: SPAR-H PSFs
Maximizing Human Reliability: Maximizing Human Reliability Increasingly, human reliability needs to go beyond being a diagnostic tool to become a prescriptive tool
NRC and nuclear industry are looking at new designs for control rooms and want plants designed with human reliability in mind, not simply verified after the design is completed
NASA has issued strict Human-Rating Requirements (NPR 8705.2) that all space systems designed to come in contact with humans must demonstrate that they impose minimal risk, they are safe for humans, and they maximize human reliability in the operation of that system
How do we make reliable human systems?
Design: “classic” human factors
Test: “classic” human factors
Model: human reliability analysis
Best Achievable Practices for HR: Best Achievable Practices for HR The Human Reliability Design Triptych
Concluding Thoughts: Concluding Thoughts Human error is ubiquitous
Pressing need to design ways to prevent human error
Impetus comes from safety-critical systems
Lessons learned from safety-critical systems potentially apply across the board, even including designing consumer software that is usable
Designing for human reliability requires merger of two fields
Human factors/HCI for design and testing
Human reliability for modeling
Incidents and Accidents as a Support for Testing : Incidents and Accidents as a Support for Testing Aim: contribute to a design method for safer safety-critical interactive systems
Inform a formal system model
Ultimate goals
Embedding reliability, usability, efficiency and error tolerance within the end product
While ensuring consistency between models
The Approach (1/2): The Approach (1/2) Address the issue of system redesign after the occurrence of an incident or accident
2 Techniques
Events and Causal Factors Analysis
Marking Graphs extracted from a system model
2 Purposes
Ensure current system model accurately models the sequence of events that led to the accident
Reveal further scenarios that could eventually lead to similar adverse outcomes
The Approach (2/2): The Approach (2/2) [Diagram of the whole process, made up of an incident & accident investigation part and a system design part]
Slide25: ECFA Chart of the Accident
Marking Trees & Graphs: Marking Trees & Graphs Marking Tree – identify the entire set of reachable states
Is a form of state transition diagram
Analysis support tools available
However, can impose considerable overheads when considering complex systems such as the one in the case study
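To make "identify the entire set of reachable states" concrete, here is a minimal sketch that enumerates the marking graph of a small Petri-net-like model by breadth-first search. The net, the place and transition names and the `max_markings` guard are assumptions made for the illustration; the slides do not give the system model, and real analysis tools do far more than this.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Marking = Tuple[int, ...]             # number of tokens in each place
Transition = Tuple[Marking, Marking]  # (tokens consumed, tokens produced)


def marking_graph(
    initial: Marking,
    transitions: Dict[str, Transition],
    max_markings: int = 10_000,
) -> Dict[Marking, List[Tuple[str, Marking]]]:
    """Map every reachable marking to its (transition, successor) pairs."""
    graph: Dict[Marking, List[Tuple[str, Marking]]] = {}
    frontier: Deque[Marking] = deque([initial])
    while frontier:
        marking = frontier.popleft()
        if marking in graph:
            continue  # already explored
        if len(graph) >= max_markings:
            # The overhead mentioned above: state spaces grow very quickly.
            raise RuntimeError("state space too large for this sketch")
        successors = []
        for name, (consumed, produced) in transitions.items():
            if all(m >= c for m, c in zip(marking, consumed)):  # enabled?
                nxt = tuple(m - c + p for m, c, p in zip(marking, consumed, produced))
                successors.append((name, nxt))
                frontier.append(nxt)
        graph[marking] = successors
    return graph


# Hypothetical two-place net: places are (auto, manual), one token in total.
transitions = {
    "to_manual": ((1, 0), (0, 1)),
    "to_auto": ((0, 1), (1, 0)),
}
graph = marking_graph((1, 0), transitions)
```

Each path through `graph` is a candidate scenario, which is how such a graph can reveal further sequences of events leading to similar adverse outcomes.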
The Approach : The Approach Not Simplified
Usability Evaluation Methods (UEM): Usability Evaluation Methods (UEM) UEMs conducted by experts
Usability Inspection Methods, Guideline Reviews, …
Any type of interactive systems
UEMs involving the user
Empirical evaluation, Observations, …
Any type of interactive systems (from low-fi prototypes to deployed applications)
Usability Evaluation Methods (UEM): Usability Evaluation Methods (UEM) Computer supported UEMs
Automatic testing based on guidelines, …
Task models-based evaluations, metrics-based evaluation, …
Applications with standardized interaction techniques (Web, WIMP)
Issues of Reliability and Usability: Issues of Reliability and Usability Testing the usability of a non reliable system?
Constructing reliable systems without considering usability?
Possible ways to enhance, extend, enlarge UEMs to address these needs?
Gathering feedback from the audience through case studies: Gathering feedback from the audience through case studies Do we need to integrate methods OR develop new methods ?
In favor of integration
Joint meetings (including software developers) through brainstorming + rapid prototyping (more problems of non usable reliable systems)
Problems
Some issues are also related to system reliability (ATMs): the problem of testing a prototype versus testing the system
Issues of development time rather than application type
Application type has an impact on the processes selected for development
Don’t know how to build a reliable interactive system … whatever time we have
How can reliability-oriented methods support usability-oriented methods?
Gathering feedback from the audience through case studies: Gathering feedback from the audience through case studies How to design for testability (both the reliability of the software and the usability)
Is testing enough or do we need proof?
Usability testing is at a higher level of abstraction (goal oriented) while software testing is at a lower level (function oriented)
Is there an issue with interaction techniques (do we need precise description of interaction techniques and is it useful for usability testing?)
Automated testing through user-events simulation (how to understand how the user can react to that?)
Issue of reliability with respect to the intention of the user, and not only the reliability of the system per se
Beyond one instance of use but on reproducing the use many times
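The point about automated testing through user-event simulation, repeated over many uses, can be illustrated with a small random-event driver. This is a sketch under stated assumptions: the function name, the toy counter application and its invariant are invented here, and the approach shown (random event sequences checked against an invariant) is only one possible reading of the audience's suggestion.

```python
import random
from typing import Callable, Dict, List

State = Dict[str, int]
Handler = Callable[[State], None]


def simulate_user_events(
    handlers: Dict[str, Handler],
    invariant: Callable[[State], bool],
    initial: State,
    steps: int = 200,
    seed: int = 0,
) -> List[str]:
    """Fire random events; return the failing trace, or [] if none was found."""
    rng = random.Random(seed)  # seeded so that a failing run can be replayed
    state = dict(initial)
    names = sorted(handlers)
    trace: List[str] = []
    for _ in range(steps):
        name = rng.choice(names)
        handlers[name](state)
        trace.append(name)
        if not invariant(state):
            return trace
    return []


# Hypothetical application: a counter that must stay within [0, 10].
def inc(state: State) -> None:
    state["count"] = min(state["count"] + 1, 10)


def dec(state: State) -> None:
    state["count"] = max(state["count"] - 1, 0)


def buggy_dec(state: State) -> None:
    state["count"] -= 1  # forgets the lower bound


def in_range(state: State) -> bool:
    return 0 <= state["count"] <= 10


ok_trace = simulate_user_events({"inc": inc, "dec": dec}, in_range, {"count": 0})
bad_trace = simulate_user_events({"dec": buggy_dec}, in_range, {"count": 0})
```

Such a driver says nothing about how a real user would react to the failure, which is exactly the open question raised in the bullet above.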
Gathering feedback from the audience and case studies: Gathering feedback from the audience and case studies Control Room (Ron)
Home/Mobile – testing in non traditional environments (Regina)
Mining case study (Sandra)
First Case Study: Control Room : First Case Study: Control Room
Slide35: Advanced Control Room Design
Transitioning to new domains of Human System Interaction
Problem: Next generation nuclear power plants coupled with advanced instrumentation and controls (I&C), increased levels of automation and onboard intelligence, all coupled with large-scale hydrogen production, present unique operational challenges.
[Figure labels: PBMR Conceptual design; Typical Design; Hybrid Controls]
Example: Example Software Interface with:
Cumbersome dialog box
No discernible exits
Good shortcuts
Example: Example
Heuristic multipliers: 10, 1, 1, 1, 10, 0.1, 1, 1, 1
UEP = 0.1
UCC = 0.1 x 2 = 0.2
Second Case Study: Mobile interfaces: Second Case Study: Mobile interfaces
Testing Mobile Interfaces: Testing Mobile Interfaces Lab or field
Method selection
Data gathering/ analysis
Problematic Area: Testing in non traditional environment
Slide40: Non Traditional Environments Combine and balance different UEMs according to usability/reliability issues
Combine Lab and Field
Select UEMs according to development phase
Third Case Study: Mining Accident: Third Case Study: Mining Accident
Slide42: Reminder
Events & Causal Factors Analysis (ECFA): Events & Causal Factors Analysis (ECFA) Provides scenario of events and causal factors that contributed to the accident
Chronologically sequential representation
Provides overall picture
Relation between factors
Gain overall perspective of
Causal factors such as conditions (pressure, temperature…), evolution of system states
Analysing the accident : Analysing the accident Fatal mining accident involving human operators, piping system & control system
Decided to switch from North to South
Fuel didn’t arrive at the plant kilns
Bled pipes while motors in operation
Motor speed auto-increase due to low pressure
Fuel hammer effect
Grinder exploded
Slide45: ECFA Chart of the Accident
Listing of issues and solutions for interactive systems testing: Listing of issues and solutions for interactive systems testing
Slide47: Hybrid methods (Heuristic evaluation refined (prioritisation of Heuristics))
Remote usability testing
Task analysis + system modelling
Cognitive walkthrough (as is)
Towards Solutions : Towards Solutions Formal models for supporting usability testing
Formal models for incidents and accidents analysis
Usability and human reliability analysis
Usability Heuristics: Usability Heuristics Heuristics are key factors that comprise a usable interface (Nielsen & Molich, 1990)
Useful in identifying usability problems
Obvious cost savings for developers
9 heuristics identified for use in the present study
In our framework, these usability heuristics are used as “performance shaping factors” to constitute a usability error probability (UEP)
Heuristic Evaluation and HRA: Heuristic Evaluation and HRA [Figure: “Standard” heuristic evaluation compared with HRA-based heuristic evaluation]
Heuristic Evaluation Matrix: Heuristic Evaluation Matrix Steps
Determine level of heuristic
Determine product of heuristic multipliers
Multiply product by nominal error rate
Consequence Determination: Consequence Determination Strict consequence assignment in PRA/HRA, part of cut sets approach
More molar approach taken in the present study
“Likely effect of usability problem on usage”
Not literal consequence model
Results in usability consequence coefficient (UCC)
Four consequence levels assigned
high, medium, low, and none
Usability Consequence Matrix: Usability Consequence Matrix Steps
Determine level of usability consequence
Multiply UEP by consequence multiplier
Usability Consequence Coefficient determines priority of fix
Example: Example Software Interface with:
Cumbersome dialog box
No discernible exits
Good shortcuts
Example: Example
Heuristic multipliers: 10, 1, 1, 1, 10, 0.1, 1, 1, 1
UEP = 0.1
UCC = 0.1 x 2 = 0.2
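The two matrices and the example can be followed step by step in code. This is a hedged sketch: the slides give the nine multipliers, the value 0.1 and the consequence multiplier 2, but not the nominal error rate, so `NOMINAL_ERROR_RATE = 0.01` is an assumption chosen because it reproduces the 0.1 shown; the function names are hypothetical as well.

```python
from math import prod
from typing import Sequence

# ASSUMPTION: the nominal error rate is not stated on the slides. 0.01 is
# used because 10 (product of the multipliers) x 0.01 gives the 0.1 shown.
NOMINAL_ERROR_RATE = 0.01


def usability_error_probability(
    multipliers: Sequence[float],
    nominal_error_rate: float = NOMINAL_ERROR_RATE,
) -> float:
    """Heuristic Evaluation Matrix: product of multipliers x nominal rate."""
    return prod(multipliers) * nominal_error_rate


def usability_consequence_coefficient(uep: float, consequence_multiplier: float) -> float:
    """Usability Consequence Matrix: UEP x consequence multiplier."""
    return uep * consequence_multiplier


# Multipliers as listed in the example, one per heuristic, in slide order.
multipliers = [10, 1, 1, 1, 10, 0.1, 1, 1, 1]
uep = usability_error_probability(multipliers)
ucc = usability_consequence_coefficient(uep, 2)
print(round(uep, 3), round(ucc, 3))
```

The resulting UCC values can then be sorted to decide which usability problem to fix first, as stated in the Usability Consequence Matrix steps.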
Listing of issues and solutions for new interaction techniques testing: Listing of issues and solutions for new interaction techniques testing
Slide57: [Figure labels: Automated autonomous Real-Time Systems (VAL, TCAS); B (Atelier B), Z, …; No Interaction Technique; WIMP - hierarchical; Direct Manipulation; All Types of Applications; Web Applications; Business Applications; UML, E/R, …; Mobile phones]
Future Plans and Announcements : Future Plans and Announcements Future plans
Web site is set up and will be populated (slides, list of attendees, topics, …) http://liihs.irit.fr/palanque/SIGchi2006.html
Further work
IFIP WG 13.5 on Human Error, Safety and System Development ifipwg13-5@irit.fr
NoE ResIST (Resilience for IST) www.resist-noe.org
Workshop on Testing in Non-Traditional Environments at CHI 2006
MAUSE: www.cost-294.org
Announcements
DSVIS 2006, HCI Aero, HESSD next year
Best Achievable Practices for HR: Best Achievable Practices for HR The Human Reliability Design Triptych
Best Practices for Design: Best Practices for Design Compliance with applicable standards and best practices documents
Where applicable, ANSI, ASME, IEEE, ISO, or other discipline-specific standards and best practices should be followed
Consideration of system usability and human factors
System should be designed according to usability and human factors standards such as NASA-STD-3000, MIL-STD-1472, or ISO
Iterative design-test-redesign-retest cycle
Traceability of design decisions
Where decisions have been made that could affect the functions of the system, these decisions should be clearly documented
Verified reliability of design solutions
Reliability of systems should be documented through vendor data, cross-reference to the operational history of similar existing systems, and/or test results.
It is especially important to project system reliability throughout the system lifecycle, including considerations for maintenance once the system has been deployed
It is also important to incorporate the estimated mean time before failure into the estimated life of the system
Best Practices for Testing: Best Practices for Testing Controlled studies that avoid confounds or experimental artifacts
Testing may include hardware reliability testing, human-system interaction usability evaluation, and software debugging
Use of maximally realistic and representative scenarios, users, and/or conditions
Testing scenarios and conditions should reflect the range of actions the system will experience in actual use, including possible worst-case situations
Use of humans-in-the-loop testing
A system that will be used by humans should always be tested by humans
Use of valid metrics such as statistically significant results for acceptance criteria
Where feasible, the metrics should reflect system or user performance across the entire range of expected circumstances
In many cases, testing will involve use of a statistical sample evaluated against a pre-defined acceptance (e.g., alpha) level for “passing” the test
Documented test design, hypothesis, manipulations, metrics, and acceptance criteria
The test documentation should include the test design, hypothesis (or hypotheses), manipulations, metrics, and acceptance criteria
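As an illustration of evaluating a statistical sample against a pre-defined alpha level, the sketch below uses a one-sided exact binomial test on task completions. The choice of test, the function names, the baseline success rate of 0.7 and the sample numbers are all assumptions made for the example; the slides do not prescribe a particular statistic.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, p0: float) -> float:
    """One-sided exact p-value: P(X >= successes) if the true rate were p0."""
    return sum(
        comb(trials, k) * p0**k * (1.0 - p0) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def passes_acceptance(successes: int, trials: int, p0: float, alpha: float = 0.05) -> bool:
    """Pass when the observed success count is unlikely under the baseline p0."""
    return binomial_p_value(successes, trials, p0) < alpha


# Hypothetical study: 20 participants, the baseline to beat is a 70% task
# completion rate, and the acceptance level alpha is fixed before testing.
strong_result = passes_acceptance(19, 20, 0.7)  # 19 of 20 completed the task
weak_result = passes_acceptance(15, 20, 0.7)    # 15 of 20 completed the task
```

Fixing `p0` and `alpha` before the study is what makes the acceptance criterion "pre-defined" in the sense used above.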
Best Practices for Modeling: Best Practices for Modeling Compliance with applicable standards and best practices documents
E.g., NASA NPR 8705.5, Probabilistic Risk Assessment (PRA) Procedures for NASA Programs and Projects or NRC NUREG-1792, Good Practices for Implementing Human Reliability Analysis
Use of established modeling techniques
It is better to use an existing, vetted method than to make use of novel techniques and methods that have not been established
Validation of models to available operational data
To ensure a realistic modeling representation, models must be baselined to data obtained from empirical testing or actual operational data
Such validation increases the veracity of model extrapolations to novel domains
Completeness of modeling scenarios at the correct level of granularity
A thorough task analysis, a review of relevant past operating experience, and a review by subject matter experts help to ensure the completeness of the model
The appropriate level of task decomposition or granularity should be determined according to the modeling method’s requirement, the fidelity required to model success and failure outcomes, and specific requirements of the system that is being designed
Realistic model end states
End states should reflect reasonable and realistic outcomes across the range of operating scenarios