Presentation Transcript
Slide1 : Adaptive Intelligent Mobile Robotics
Leslie Pack Kaelbling
Artificial Intelligence Laboratory
MIT
Pyramid : Pyramid Addressing problem at multiple levels
Built-in Behaviors : Built-in Behaviors Goal: general-purpose, robust visually guided local navigation
optical flow for depth information
finding the floor
optical flow information
Horswill’s ground-plane method
build local occupancy grids
navigate given the grid
reactive methods
dynamic programming
Reactive Obstacle Avoidance : Reactive Obstacle Avoidance Standard method in mobile robotics is to use potential fields
attractive force toward goal
repulsive forces away from obstacles
robot moves in direction given by resultant force
New method for non-holonomic robots: move the center of the robot so that the front point is holonomic
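A minimal sketch of the standard potential-field computation described on this slide, not the new non-holonomic method. The gains `k_att` and `k_rep`, the `influence` radius, and the particular repulsive formula are illustrative assumptions; the slides do not specify them.

```python
import math

def resultant_force(robot, goal, obstacles, k_att=1.0, k_rep=1.0, influence=2.0):
    """Return (fx, fy): attractive pull toward the goal plus repulsion from obstacles.

    All gains and the influence radius are hypothetical illustration values.
    """
    rx, ry = robot
    # Attractive force: grows linearly with distance to the goal.
    fx = k_att * (goal[0] - rx)
    fy = k_att * (goal[1] - ry)
    for ox, oy in obstacles:
        dx, dy = rx - ox, ry - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < influence:
            # Repulsive force: points away from the obstacle, grows as the
            # robot gets closer, and vanishes at the influence radius.
            mag = k_rep * (1.0 / d - 1.0 / influence) / (d * d)
            fx += mag * dx / d
            fy += mag * dy / d
    return fx, fy

def heading(force):
    """Direction of travel (radians) given by the resultant force."""
    return math.atan2(force[1], force[0])
```

With no obstacles the robot heads straight for the goal; an obstacle below and ahead of the robot bends the resultant force upward, away from it.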
Human Obstacle Avoidance : Human Obstacle Avoidance Control law based on visual angle and distance to goal and obstacles
Parameters set based on experiments with humans in large free-walking VR environment
Humans are Smooth! : Humans are Smooth!
Behavior Learning : Behavior Learning Typical RL methods require far too much data to be practical in an online setting. Address the problem with
strong generalization techniques
locally weighted regression
“skeptical” Q-Learning
bootstrapping from human-supplied policy
need not be optimal and might be very wrong
shows learner “interesting” parts of the space
“bad” initial policies might be more effective
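As an illustration of locally weighted regression as a generalization technique, here is a one-dimensional sketch that fits a weighted line around each query point. The Gaussian kernel, the `bandwidth` parameter, and the function name `lwr_predict` are assumptions made for this example; the slides do not say which kernel or dimensionality the learner uses.

```python
import math

def lwr_predict(xs, ys, query, bandwidth=1.0):
    """Predict y at `query` from samples (xs, ys) with a locally weighted line fit.

    Samples near the query get high weight (Gaussian kernel, an assumption),
    so the fit adapts to the local shape of the data.
    """
    ws = [math.exp(-((x - query) ** 2) / (2.0 * bandwidth ** 2)) for x in xs]
    sw = sum(ws)
    # Weighted means of inputs and targets.
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    # Weighted least-squares slope; fall back to a constant fit if degenerate.
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    return my + slope * (query - mx)
```

In a Q-learning setting the inputs would be state-action features and the targets would be Q-value estimates; the scalar version above only shows the regression step.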
Two Learning Phases : Two Learning Phases [Diagram: Learning System, Phase One; labels A, R, O]
Two Learning Phases : Two Learning Phases [Diagram: Learning System, Phase Two; labels A, R, O]
New Results : New Results Drive to goal, avoiding obstacles in visual field
Inputs (6 dimensions):
heading and distance to goal
image coordinates of two obstacles
Output:
steering angle
Reward:
+10 for getting to goal; -5 for running over obstacle
Training: simple policy that avoids one obstacle
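The task specification above can be written down as a small sketch. The field names, the `Observation` type, and the zero reward on ordinary steps are assumptions for illustration; only the six input dimensions and the +10 / -5 values come from the slide.

```python
from typing import NamedTuple

GOAL_REWARD = 10.0       # from the slide: +10 for getting to the goal
OBSTACLE_PENALTY = -5.0  # from the slide: -5 for running over an obstacle

class Observation(NamedTuple):
    """Six-dimensional input; field names are hypothetical."""
    goal_heading: float
    goal_distance: float
    obstacle1_x: float  # image coordinates of the first obstacle
    obstacle1_y: float
    obstacle2_x: float  # image coordinates of the second obstacle
    obstacle2_y: float

def reward(reached_goal: bool, hit_obstacle: bool) -> float:
    """Reward for one step; assumes zero reward when nothing happens."""
    r = 0.0
    if reached_goal:
        r += GOAL_REWARD
    if hit_obstacle:
        r += OBSTACLE_PENALTY
    return r
```

The learner's single output, the steering angle, would be a function of an `Observation`.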
Robot’s View : Robot’s View
Local Navigation : Local Navigation
Map Learning : Map Learning Robot learns high-level structure of environment
topological maps appropriate for large-scale structure
low-level behaviors induce topology
based on previous work using sonar
vision changes problem dramatically
no more problems with many states looking the same
now same state always looks different!
Sonar-Based Map Learning : Sonar-Based Map Learning [Figure labels: Data, True Model]
Current Issues in Map Learning : Current Issues in Map Learning
segmenting space into “rooms”
detecting doors and corridor openings
representation of places
stored images
gross 3D structure
features for image and structure matching
Large Simulation Domain : Large Simulation Domain Use for learning and large-scale experimentation that is impractical on a real robot
built using video-game engine
large multi-story building
packages to deliver
battery power management
other agents (to survey)
dynamically appearing items to collect
general Bayes-net specification so it can be used widely as a test bed
Hierarchical MDP Planning : Hierarchical MDP Planning Large simulated domain has unspeakably many primitive states
Use hierarchical representation for planning
logarithmic improvement in planning times
some loss of optimality of plans
Existing work on planning and learning given a hierarchy
temporal abstraction: macro actions
spatial abstraction: aggregated states
Where does the hierarchy come from?
combined spatial and temporal abstraction
top-down splitting approach
Region-Based Hierarchies : Region-Based Hierarchies Divide state space into regions
each region is a single abstract state at next level
policies for moving through regions are abstract actions at next level
Choosing Macros : Choosing Macros Given a choice of a region, what is a good set of macro actions for traversing it?
existing approaches guarantee optimality with a number of macros exponential in the number of exit states
our method is approximate, but works well when there are no large rewards inside the region
Point-Source Rewards : Point-Source Rewards Compute a value function for each possible exit state, offline
Given a new valuation of all exit states online
Quickly combine value functions to determine near-optimal action
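One way to picture the offline/online split is the sketch below. It assumes a deterministic region with symmetric moves, no rewards inside the region, non-negative exit values, and a "scale each unit value function and take the maximum" combination rule. None of these details are stated on the slide, and the function names are hypothetical, so this is not necessarily the authors' method.

```python
from collections import deque

def unit_value_function(neighbors, exit_state, gamma=0.9):
    """Offline: value of every state when only `exit_state` pays a reward of 1."""
    dist = {exit_state: 0}
    queue = deque([exit_state])
    while queue:  # breadth-first search gives shortest step counts to the exit
        s = queue.popleft()
        for n in neighbors[s]:
            if n not in dist:
                dist[n] = dist[s] + 1
                queue.append(n)
    return {s: gamma ** d for s, d in dist.items()}

def combine(unit_values, exit_valuation, state):
    """Online: scale each precomputed value function by its exit's new value
    and return (best_exit, estimated_value) at `state`."""
    def scaled(e):
        return exit_valuation[e] * unit_values[e].get(state, 0.0)
    best_exit = max(exit_valuation, key=scaled)
    return best_exit, scaled(best_exit)

def greedy_action(neighbors, unit_values, exit_valuation, state):
    """Move to the neighboring state with the highest combined value."""
    return max(neighbors[state],
               key=lambda n: combine(unit_values, exit_valuation, n)[1])

# A five-state corridor whose two ends are the exit states.
corridor = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
unit_values = {e: unit_value_function(corridor, e) for e in (0, 4)}
```

Under these assumptions the expensive work (one value function per exit) happens once, and each new valuation of the exits costs only a scaled maximum per state.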
Approximation is Good : Approximation is Good
How to Use the Hierarchy : How to Use the Hierarchy Off line:
Decompose environment into abstract states
Compute macro operators
On line:
Given new goal, assign values to exits at highest level
Propagate values at each level
In current low-level region, choose action
What Makes a Decomposition Good? : What Makes a Decomposition Good? Trade off
decrease in off-line planning time
decrease in on-line planning time
decrease in value of actions
We can articulate this criterion formally but…
… we can’t solve it
Current research on reasonable approximations
Next Steps : Next Steps Low-level
apply JAQL to tune obstacle avoidance behaviors
Map learning
landmark selection and representation
visual detection of openings
Hierarchy
algorithm for constructing decomposition
test hierarchical planning on huge simulated domain