logging in or signing up ICRA2007 Danielle Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 157 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: September 30, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Value Function Approximationon Non-linear Manifolds for Robot Motor Control : Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama1)2) Hirotaka Hachiya1)2) Christopher Towell2) Sethu Vijayakumar2) 1) Computer Science, Tokyo Institute of Technology 2) School of Informatics, University of EdinburghMaze Problem: Guide Robot to Goal: Maze Problem: Guide Robot to Goal Robot knows its position but doesn’t know which direction to go. We don’t teach the best action to take at each position but give a reward at the goal. Task: make the robot select the optimal action. Up Right Left Down Possible actions Position (x,y) Goal rewardMarkov Decision Process (MDP): Markov Decision Process (MDP) An MDP consists of : set of states, : set of actions, : transition probability, : reward, An action the robot takes at state is specified by policy . Goal: make the robot learn optimal policyDefinition of Optimal Policy: Definition of Optimal Policy Action-value function: discounted sum of future rewards when taking in and following thereafter Optimal value: Optimal policy: is computed if is given. Question: How to compute ?Policy Iteration: Policy Iteration Starting from some initial policy iterate Steps 1 and 2 until convergence. Step 1. Compute for current Step 2. Update by Policy iteration always converges to if in step 1 can be computed. Question: How to compute ? (Sutton & Barto, 1998)Bellman Equation: can be recursively expressed by can be computed by solving Bellman equation Drawback: dimensionality of Bellman equation becomes huge in large state and action spaces Bellman Equation high computational costLeast-Squares Policy Iteration: Least-Squares Policy Iteration Linear architecture: is learned so as to optimally approximate Bellman equation in the least-squares sense # of parameters is only : LSPI works well if we choose appropriate Question: How to choose ? (Lagoudakis and Parr, 2003) : fixed basis functions : parameters : # of basis functionsPopular Choice: Gaussian Kernel (GK): Popular Choice: Gaussian Kernel (GK) Smooth Gaussian tail goes over partitions : Euclidean distance : Centre state PartitionsApproximated Value Function by GK : Approximated Value Function by GK Values around the partitions are not approximated well. Approximated by GK Optimal value function Log scale 20 randomly located Gaussians Policy Obtained by GK : Policy Obtained by GK GK provides an undesired policy around the partition. GK-based policy Optimal policy Aim of This Research: Aim of This Research Gaussian tails go over the partition. Not suited for approximating discontinuous value functions. We propose new Gaussian kernel to overcome this problem. State Space as a Graph: State Space as a Graph Ordinary Gaussian uses Euclidean distance. Euclidean distance does not incorporate state space structure, so tail problems occur. We represent state space structure by a graph, and use it for defining Gaussian kernels. (Mahadevan, ICML 2005)Geodesic Gaussian Kernels: Natural distance on graph is shortest path. We use shortest path in Gaussian function. We call this kernel geodesic Gaussian. SP can be efficiently computed by Dijkstra. Geodesic Gaussian Kernels Euclidean distance Shortest pathExample of Kernels: Example of Kernels Tails do not go across the partition. Values smoothly decrease along the maze. Geodesic Gaussian Ordinary GaussianComparison of Value Functions: Values near the partition are well approximated. Discontinuity across the partition is preserved. Ordinary Gaussian Optimal Comparison of Value Functions Geodesic Gaussian Comparison of Policies: Comparison of Policies GGKs provide good policies near the partition. Geodesic Gaussian Ordinary Gaussian Experimental Result: Average over 100 runs Geodesic Ordinary Ordinary Gaussian: tail problem Geodesic Gaussian: no tail problem Experimental Result Number of kernels Fraction of optimal statesRobot Arm Reaching: Robot Arm Reaching 2-DOF robot arm State space Joint 1 Joint 2 End effector Object Obstacle Joint 1 (degree) Joint 2 (degree) 0 180 -180 100 0 -100 Reward: +1 reach the object 0 otherwise Task: move the end effector to reach the objectRobot Arm Reaching: Robot Arm Reaching Successfully avoids the obstacle and can reach the object. Moves directly towards the object without avoiding the obstacle. Ordinary Gaussian Geodesic GaussianKhepera Robot Navigation: Khepera Robot Navigation Khepera has 8 IR sensors measuring the distance to obstacles. Task: explore unknown maze without collision Reward: +1 (forward) -2 (collision) 0 (others) Sensor value: 0 - 1030State Space and Graph: State Space and Graph Discretize 8D state space by self-organizing map. Partitions 2D visualizationKhepera Robot Navigation: Khepera Robot Navigation When facing obstacle, goes backward (and goes forward again). When facing obstacle, makes a turn (and go forward). Ordinary Gaussian Geodesic GaussianExperimental Results: Experimental Results Average over 30 runs Geodesic outperforms ordinary Gaussian. Geodesic Ordinary Conclusion: Conclusion Value function approximation: good basis function needed Ordinary Gaussian kernel: tail goes over discontinuities Geodesic Gaussian kernel: smooth along the state space Through the experiments, we showed geodesic Gaussian is promising in high-dimensional continuous problems! You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
ICRA2007 Danielle Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 157 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: September 30, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Value Function Approximationon Non-linear Manifolds for Robot Motor Control : Value Function Approximation on Non-linear Manifolds for Robot Motor Control Masashi Sugiyama1)2) Hirotaka Hachiya1)2) Christopher Towell2) Sethu Vijayakumar2) 1) Computer Science, Tokyo Institute of Technology 2) School of Informatics, University of EdinburghMaze Problem: Guide Robot to Goal: Maze Problem: Guide Robot to Goal Robot knows its position but doesn’t know which direction to go. We don’t teach the best action to take at each position but give a reward at the goal. Task: make the robot select the optimal action. Up Right Left Down Possible actions Position (x,y) Goal rewardMarkov Decision Process (MDP): Markov Decision Process (MDP) An MDP consists of : set of states, : set of actions, : transition probability, : reward, An action the robot takes at state is specified by policy . Goal: make the robot learn optimal policyDefinition of Optimal Policy: Definition of Optimal Policy Action-value function: discounted sum of future rewards when taking in and following thereafter Optimal value: Optimal policy: is computed if is given. Question: How to compute ?Policy Iteration: Policy Iteration Starting from some initial policy iterate Steps 1 and 2 until convergence. Step 1. Compute for current Step 2. Update by Policy iteration always converges to if in step 1 can be computed. Question: How to compute ? (Sutton & Barto, 1998)Bellman Equation: can be recursively expressed by can be computed by solving Bellman equation Drawback: dimensionality of Bellman equation becomes huge in large state and action spaces Bellman Equation high computational costLeast-Squares Policy Iteration: Least-Squares Policy Iteration Linear architecture: is learned so as to optimally approximate Bellman equation in the least-squares sense # of parameters is only : LSPI works well if we choose appropriate Question: How to choose ? (Lagoudakis and Parr, 2003) : fixed basis functions : parameters : # of basis functionsPopular Choice: Gaussian Kernel (GK): Popular Choice: Gaussian Kernel (GK) Smooth Gaussian tail goes over partitions : Euclidean distance : Centre state PartitionsApproximated Value Function by GK : Approximated Value Function by GK Values around the partitions are not approximated well. Approximated by GK Optimal value function Log scale 20 randomly located Gaussians Policy Obtained by GK : Policy Obtained by GK GK provides an undesired policy around the partition. GK-based policy Optimal policy Aim of This Research: Aim of This Research Gaussian tails go over the partition. Not suited for approximating discontinuous value functions. We propose new Gaussian kernel to overcome this problem. State Space as a Graph: State Space as a Graph Ordinary Gaussian uses Euclidean distance. Euclidean distance does not incorporate state space structure, so tail problems occur. We represent state space structure by a graph, and use it for defining Gaussian kernels. (Mahadevan, ICML 2005)Geodesic Gaussian Kernels: Natural distance on graph is shortest path. We use shortest path in Gaussian function. We call this kernel geodesic Gaussian. SP can be efficiently computed by Dijkstra. Geodesic Gaussian Kernels Euclidean distance Shortest pathExample of Kernels: Example of Kernels Tails do not go across the partition. Values smoothly decrease along the maze. Geodesic Gaussian Ordinary GaussianComparison of Value Functions: Values near the partition are well approximated. Discontinuity across the partition is preserved. Ordinary Gaussian Optimal Comparison of Value Functions Geodesic Gaussian Comparison of Policies: Comparison of Policies GGKs provide good policies near the partition. Geodesic Gaussian Ordinary Gaussian Experimental Result: Average over 100 runs Geodesic Ordinary Ordinary Gaussian: tail problem Geodesic Gaussian: no tail problem Experimental Result Number of kernels Fraction of optimal statesRobot Arm Reaching: Robot Arm Reaching 2-DOF robot arm State space Joint 1 Joint 2 End effector Object Obstacle Joint 1 (degree) Joint 2 (degree) 0 180 -180 100 0 -100 Reward: +1 reach the object 0 otherwise Task: move the end effector to reach the objectRobot Arm Reaching: Robot Arm Reaching Successfully avoids the obstacle and can reach the object. Moves directly towards the object without avoiding the obstacle. Ordinary Gaussian Geodesic GaussianKhepera Robot Navigation: Khepera Robot Navigation Khepera has 8 IR sensors measuring the distance to obstacles. Task: explore unknown maze without collision Reward: +1 (forward) -2 (collision) 0 (others) Sensor value: 0 - 1030State Space and Graph: State Space and Graph Discretize 8D state space by self-organizing map. Partitions 2D visualizationKhepera Robot Navigation: Khepera Robot Navigation When facing obstacle, goes backward (and goes forward again). When facing obstacle, makes a turn (and go forward). Ordinary Gaussian Geodesic GaussianExperimental Results: Experimental Results Average over 30 runs Geodesic outperforms ordinary Gaussian. Geodesic Ordinary Conclusion: Conclusion Value function approximation: good basis function needed Ordinary Gaussian kernel: tail goes over discontinuities Geodesic Gaussian kernel: smooth along the state space Through the experiments, we showed geodesic Gaussian is promising in high-dimensional continuous problems!