Presentation Transcript
Evaluation Methods for Human-System Performance of Intelligent Systems : Evaluation Methods for Human-System Performance of Intelligent Systems Jean Scholtz
Information Access Division
National Institute of Standards and Technology
jean.scholtz@nist.gov
“The System”: “The System” User interface
Human performance
Robot performance
Usability and utility of interaction devices
HCI compared to HRI: HCI compared to HRI
HRI:
Autonomy of systems
Operates in a changing real-world environment
Users have other demanding tasks
Interaction with >1 system
>1 user interacting with a given system
HCI:
Systems are command driven
Deterministic
Operates in a virtual world
In many cases, the computer IS the task
One user/one system
User Interfaces: User Interfaces It’s not just the GUI -
It’s the information
And the interaction…..
The “interface” can’t be done “at the end”
It has to be designed along with the architecture of the system
And to do that, we need to determine what information and what interactions are needed
Usability versus Utility: Usability versus Utility Usability -
whether the user can figure out how to do a task with a given user interface
Utility -
whether the tasks supported in the software and accessed via the user interface provide the user with the functionality appropriate for her task
User Interfaces that provide utility MAY be used even if they lack a certain level of usability
Usable user interfaces that have little utility will NOT be used
Interaction Roles: Interaction Roles Why distinguish?
different interaction needs
different information needs
Proposed roles
supervisor
oversees a number of heterogeneous robots
mixed-initiative: either human or robots can see a problem and request or give help
operator
“inside” robot, takes over and moves robot
mechanic
“outside” robot, actually performs physical adjustments
team mate
human and robot are jointly performing a mission; need to be aware of each other’s actions
bystander
occasional user; needs to understand robot’s actions well enough to co-exist in same environment
Interaction Centric Design/ Evaluation: Interaction Centric Design/ Evaluation Is necessary information present for human/ robot to intervene?
Is information presented appropriately?
Is interaction language efficient for human/robot?
Are interactions efficient and effective?
Do interactions scale to multiple robots?
Do interactions support robot evolution?
Framework for Research: Framework for Research Then user interfaces are constructed by
-deciding which roles are going to be supported
-incorporating that information and interactions into the interface
Information Presence and Presentation Evaluation: Information Presence and Presentation Evaluation Situational awareness assessment
Level one: perception of cues
Level two: comprehension of cues
Level three: ability to predict what will happen next
Direct experimentation using queries
freeze task
SAGAT methodology
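A minimal sketch of how freeze-task query answers might be scored per situational awareness level. The record layout, the example queries, and the function name `score_by_level` are illustrative assumptions, not part of the SAGAT methodology itself or of the tool described in this talk.

```python
from collections import defaultdict

# Hypothetical query records collected at freeze points:
# (SA level, participant answer, ground truth at the freeze).
responses = [
    (1, "pedestrian ahead", "pedestrian ahead"),
    (1, "two robots visible", "three robots visible"),
    (2, "robot B is blocked", "robot B is blocked"),
    (3, "robot A reaches goal first", "robot B reaches goal first"),
]


def score_by_level(records):
    """Return the fraction of correct answers for each SA level."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for level, answer, truth in records:
        total[level] += 1
        if answer == truth:
            correct[level] += 1
    return {level: correct[level] / total[level] for level in sorted(total)}


scores = score_by_level(responses)
print(scores)
```

Reporting the three levels separately keeps perception, comprehension, and prediction failures distinguishable instead of folding them into one number.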
Interaction Performance: Interaction Performance Measure the ability of the user to formulate the correct interaction and of the robot to understand it and carry it out
Typical HCI evaluation
construct a set of tasks
give representative users the tasks
compute measures of effectiveness, efficiency, user satisfaction, accuracy
PLUS
robot performance
time and accuracy of robot performance
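The measures listed above could be summarized per session roughly as follows. This is a sketch under stated assumptions: the `Trial` fields, the units, and the choice of completion rate as the effectiveness measure are hypothetical, and the numbers are made-up placeholders rather than data from any study.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    completed: bool       # did the user and robot finish the task?
    user_time_s: float    # time the user took to formulate the interaction
    robot_time_s: float   # time the robot took to carry it out
    robot_error_m: float  # robot accuracy, e.g. distance from the target
    satisfaction: int     # questionnaire rating, assumed on a 1-5 scale


def summarize(trials):
    """Combine typical HCI measures with robot time and accuracy."""
    done = [t for t in trials if t.completed]  # assumes at least one completion
    return {
        "effectiveness": len(done) / len(trials),
        "mean_user_time_s": mean(t.user_time_s for t in done),
        "mean_robot_time_s": mean(t.robot_time_s for t in done),
        "mean_robot_error_m": mean(t.robot_error_m for t in done),
        "mean_satisfaction": mean(t.satisfaction for t in trials),
    }


# Placeholder trials for illustration only.
trials = [
    Trial(True, 20.0, 60.0, 0.5, 4),
    Trial(True, 30.0, 80.0, 1.5, 5),
    Trial(False, 45.0, 0.0, 0.0, 2),
    Trial(True, 40.0, 70.0, 1.0, 3),
]
summary = summarize(trials)
print(summary)
```

Time and error are averaged over completed trials only, so an abandoned task lowers effectiveness without distorting the efficiency figures.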
Scalability and Evolution Evaluation: Scalability and Evolution Evaluation Support for 1:n and n:1 evaluations
use the information presence and presentation with tasks which include 1:n and n:1 interactions
Support for evolution
evaluate the appropriateness of the interaction language, including the level of abstractions
evaluate the information needs
Case Study: Situational Awareness Assessment Tool: Case Study: Situational Awareness Assessment Tool Supervisory role, driving domain, urban terrain
Steps:
construct various scenarios and informational needs in the domain
construct user interface
develop assessment queries for these scenarios
collect data from representative set of users
validate by having users also use driving simulator and note scores
Deliverables:
baseline UI with situational awareness metrics for a set of scenarios
assessment tool
Status
Currently under development
Case Study: ByStander Role: Case Study: ByStander Role Exploratory study on the effects of consistency and expectedness of behaviors on the ability to construct mental models
Used Sony Aibo, programmed with 4 sets of behaviors
consistent, expected
consistent, unexpected
inconsistent, expected
inconsistent, unexpected
Experiment
asked subjects how they thought they could interact
explained interaction modalities and asked subjects to interact for 10 minutes
asked subjects what the various interaction modalities did
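As a rough illustration of the design above, the four behavior sets form a 2x2 crossing, and a subject's mental-model accuracy can be taken as the fraction of interaction modalities whose effect they described correctly. The modality names, their effects, and the `recall_accuracy` helper are hypothetical; the slides do not say how the study actually scored recall.

```python
from itertools import product

# The four behavior sets as a 2x2 crossing of consistency and expectedness.
conditions = list(product(["consistent", "inconsistent"],
                          ["expected", "unexpected"]))


def recall_accuracy(described, actual):
    """Fraction of modalities whose effect the subject described correctly."""
    hits = sum(1 for modality, effect in actual.items()
               if described.get(modality) == effect)
    return hits / len(actual)


# Hypothetical modalities and effects, for illustration only.
actual = {"head sensor": "sit", "back sensor": "dance", "voice: ball": "play"}
described = {"head sensor": "sit", "back sensor": "walk", "voice: ball": "play"}

accuracy = recall_accuracy(described, actual)
print(conditions)
print(round(accuracy, 2))
```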
Slide14: Results: Interaction Expectations
Slide15: Results: Mental Model Accuracies
Results: Results Consistent, expected behaviors resulted in higher recall
Unexpected but consistent behaviors and expected but inconsistent behaviors both resulted in lower recall, at about the same level
Unexpected, inconsistent behaviors had a lot of variance
Subjects expressed frustration with unexpected behaviors, tried to rationalize inconsistent behaviors
Subjects found it difficult to tell when behaviors ended and tried to interrupt with a new command
playing with ball
long action (dancing)
Voice recognition errors tolerated
like real dog
Conclusions: Conclusions Performance of the “system” is more than hardware/software performance
human-robot team performance needs to be measured
HRI interaction roles and issues constitute a framework for the development of evaluation methodologies that address the information centric aspects of the human-robot system