fac talk 1 web

Integrated Learning & Training for Interactive Characters: 

Bruce Blumberg & the Synthetic Characters Group
www.media.mit.edu/~bruce

Field work…: 

Field work…

Where we have been and are going…: 

Where we have been and are going…

Practical & compelling real-time learning: 

Easy for interactive characters to learn what they ought to be able to learn
Easy for a human trainer to guide learning process
A compelling user experience
Provide heuristics and practical design principles

Our bias & focus: 

Learning occurs within an innate structure that biases…
  Attention
  Motivation
  Innate frequency, form and organization of behavior
  When certain things are most easily learned
What are the catalytic components of the scaffolding that dramatically facilitate the learning & training process?

Where we draw from…: 

Reinforcement learning: Sutton & Barto 98, Mitchell 97, Kaelbling 90, Drescher 91
Animal training and ethology: Lindsay 00, Lorenz & Leyhausen 73, Ramirez 99, Pryor 99, Coppinger 01
Motor learning: van de Panne et al 93, 94, Grzeszczuk & Terzopoulos 95, Hodgins & Pollard 97, Gleicher 98, Faloutsos et al 01
Behavior architectures: Reynolds 87, Tu & Terzopoulos 94, Perlin & Goldberg 96, Funge et al 99, Burke et al 01
Computer games & digital pets: Dogz, AIBO, Black & White

Dobie T. Coyote Goes to School: 

[Video: short Dobie video]

Reinforcement Learning (R.L.) As Starting Point: 

[Figure: a table indexed by the set of all possible states of the world and the set of all possible actions. Each cell holds the utility of taking that action in that state, e.g., the utility of taking action A3 in state S2.]
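
The slide describes only the table of utilities, not how it is learned. As a minimal sketch, assuming the standard tabular Q-learning update from the reinforcement learning literature cited earlier, one update of the cell for action A3 in state S2 could look like this. The learning rate, discount factor, state "S3" and function name are illustrative assumptions, not part of the talk.

```python
from collections import defaultdict

# Assumed constants; the slides give no numeric values.
ALPHA = 0.5   # learning rate
GAMMA = 0.9   # discount factor


def q_update(q, state, action, reward, next_state, actions):
    """One tabular Q-learning step: move Q(s, a) toward reward + discounted best next value."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (target - q[(state, action)])
    return q[(state, action)]


actions = ["A1", "A2", "A3"]
q = defaultdict(float)   # every (state, action) utility starts at 0.0
q_update(q, "S2", "A3", reward=1.0, next_state="S3", actions=actions)
print(q[("S2", "A3")])   # utility of taking action A3 in state S2 after one update
```

The table needs one cell per state-action pair, which is why the size of the state and action spaces matters so much in the slides that follow.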

The problem facing dogs (real and synthetic): 

Set of all possible actions
Set of all motivational goals
Set of all possible stimuli
What do I do, when, in order to best satisfy my motivational goals?

The space of possible stimuli is wicked big: 

[Figure labels: State Space; Time of Occurrence]

The space of possible actions is also very big: 

[Figure labels: Action Space; Set of all possible actions; Action; Time of Performance]

Who gets credit for good things happening?: 

[Figure: a reward (“Yumm..”) arrives. Candidate causes lie along an Action axis (Figure-8, Shake, High-5, Beg, Down) and a Modality of Stimuli axis (e.g., left ear twitch).]

Dogs seem to constrain search for causal agents: 

[Timeline: Sit, Approach, Eat]
Attention Window: cue given immediately before or as dog is moving into desired pose
Consequences Window: trainer “clicks”, signaling reward is coming, through to when reward is actually received
Dogs make the problem tractable by constraining search for causal agents to narrow temporal windows

Dogs seem to use implicit feedback to guide perceptual learning: 

[Timeline: Sit, Approach, Eat]
“sit-utterance” perceived.
Dog decides to sit.
“click” perceived.
Build & update perceptual model of “sit-utterance”
Dogs use rewarded action to identify potentially promising state to explore and to guide formation of perceptual models

Dogs seem to give credit where credit is due: 

[Timeline: Sit, Approach, Eat]
“sit-utterance” perceived.
Dog decides to sit.
“click” perceived.
Credit sitting in presence of “sit-utterance”
Build & update perceptual model of “sit-utterance”

Dogs seem to give credit where credit is due…: 

Trainer repeatedly lures dog through a trajectory or into a pose
Eventually, dog performs behavior spontaneously
Implication: dog associates reward with resulting body configuration or trajectory and not just with “follow-your-nose”

D.L.: Take Advantage of Predictable Regularities: 

Constrain search for causal agents by taking advantage of temporal proximity & natural hierarchy of state spaces
Use consequences to bias choice of action
  But vary performance and attend to differences
Explore state and action spaces on “as-needed” basis
  Build models on demand

D.L.: Make Use of All Feedback: Explicit & Implicit: 

Use rewarded action as context for identifying
  Promising state space and action space to explore
  Good examples from which to construct perceptual models, e.g., a good example of a “sit-utterance” is one that occurs within the context of a rewarded Sit.

D.L.: Make Them Easy to Train: 

Respond quickly to “obvious” contingencies
Support Luring and Shaping
  Techniques to prompt infrequently expressed or novel motor actions
“Trainer friendly” credit assignment
  Assign credit to candidate that matches trainer’s expectation

The System: 

The System

Representation of State: Percept: 

Percepts are atomic perception units
Recognize and extract features from sensory data
Model-based
Organized in dynamic hierarchy

Representation of State-Action Pairs: Action Tuples: 

Action Tuples are organized in a dynamic hierarchy and compete probabilistically based on their learned value and reliability
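
The slide does not give the rule by which Action Tuples compete. As a minimal sketch, assuming each tuple is selected with probability proportional to the product of its learned value and its reliability, the competition could look like this. The field names, the weighting rule and the numbers are all illustrative assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class ActionTuple:
    trigger: str        # percept that gates the action ("true" means always applicable)
    action: str
    value: float        # learned value
    reliability: float  # assumed: fraction of trials followed by a good thing


def choose(tuples, rng):
    """Pick one tuple with probability proportional to value * reliability (assumed rule)."""
    weights = [t.value * t.reliability for t in tuples]
    return rng.choices(tuples, weights=weights, k=1)[0]


tuples = [
    ActionTuple("sit-utterance", "Sit", value=10.0, reliability=0.9),
    ActionTuple("true", "Scratch", value=1.0, reliability=0.1),
]
rng = random.Random(0)
picks = [choose(tuples, rng).action for _ in range(1000)]
print(picks.count("Sit"), picks.count("Scratch"))
```

Because selection is probabilistic rather than winner-take-all, a low-weight tuple such as Scratch can still be chosen occasionally, which leaves room for exploration.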

Representation of Action: Labeled Path Through Space of Body Configurations: 

A motor program generates a path through a graph of annotated poses, e.g.,
  Sit animation
  Follow-your-nose procedure
Paths can be compared and classified just like perceptual events using Motor Model Percepts

Use Time to Constrain Search for Causal Agents: 

[Timeline: Scratch, Sit, Good Thing]
Attention Window: look here for cues that appear correlated with increased likelihood of action being followed by a good thing
Consequences Window: assume any good or bad things that happen here are associated with the preceding action and the context in which it was performed
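
The two windows can be sketched as simple time filters around an action. This is an illustration only: the window lengths, the event names other than “sit-utterance” and “Good Thing”, and the function names are assumptions, and the slides give no numbers.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    time: float


# Window lengths are invented for illustration.
ATTENTION_WINDOW = 1.0      # seconds before the action starts
CONSEQUENCES_WINDOW = 3.0   # seconds after the action ends


def candidate_cues(cues, action_start):
    """Cues that fall in the attention window just before the action."""
    return [c.name for c in cues
            if action_start - ATTENTION_WINDOW <= c.time <= action_start]


def consequences(outcomes, action_end):
    """Good or bad things close enough after the action to be attributed to it."""
    return [o.name for o in outcomes
            if action_end <= o.time <= action_end + CONSEQUENCES_WINDOW]


cues = [Event("doorbell", 2.0), Event("sit-utterance", 9.6)]
outcomes = [Event("Good Thing", 12.5), Event("Good Thing", 30.0)]
# A Sit that starts at t=10 and ends at t=11.
print(candidate_cues(cues, action_start=10.0))   # ['sit-utterance']
print(consequences(outcomes, action_end=11.0))   # ['Good Thing']
```

Everything outside the two windows is never considered as a cause or a consequence, which is what keeps the search small.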

Four Important Tasks Are Performed During Credit Assignment: 

Choose most worthy Action Tuple heuristically based on reliability and novelty statistics
Update value
Create new Action Tuples as appropriate
Guide State and Action Space Discovery

Implicit Feedback Guides State Space Discovery: 

[Timeline: Scratch, Beg, Good Thing]
Utterance “beg” occurs within window but is not classified by any existing percept.
Good Thing appears. Create a new Percept with “beg” example as initial model.
This means that Percepts are only created to recognize “promising” utterances.

Implicit Feedback Identifies Good Examples: 

[Timeline: Scratch, Beg, Good Thing]
Classify utterance “beg” as “beg”.
Good Thing appears. Update model of “beg” utterance using “beg” that occurred in attention window.
This means model is built using good examples.

Unrewarded Examples Don’t Get Added to Models: 

[Timeline: Scratch, Beg, Sit]
Utterance “Leg” classified as “Beg” by mistake. Beg becomes active.
Beg ends without food appearing. Do not update model since example may have been bad.
Actually, bad examples can be used to build model of “not-Beg.”
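
The three preceding slides amount to one rule for updating perceptual models at the end of an action. A minimal sketch follows, assuming utterances arrive as numeric feature vectors and a percept's model is just the mean of its stored examples. The talk does not specify the classifier, so the threshold, the class layout and the way labels are supplied are illustrative assumptions.

```python
import math

THRESHOLD = 1.0   # assumed: maximum distance for an utterance to match a percept


class Percept:
    def __init__(self, label, example):
        self.label = label
        self.examples = [example]

    def distance(self, features):
        """Distance from the features to the mean of the stored examples."""
        n = len(self.examples)
        mean = [sum(col) / n for col in zip(*self.examples)]
        return math.dist(mean, features)


def classify(percepts, features):
    """Return the closest percept within THRESHOLD, or None."""
    close = [p for p in percepts if p.distance(features) <= THRESHOLD]
    return min(close, key=lambda p: p.distance(features)) if close else None


def on_action_end(percepts, features, rewarded, new_label):
    """Update percept models only when the action was rewarded."""
    if not rewarded:
        return None                            # unrewarded example: leave models alone
    match = classify(percepts, features)
    if match is None:
        match = Percept(new_label, features)   # promising but unknown: create a percept
        percepts.append(match)
    else:
        match.examples.append(features)        # good example: refine the model
    return match


percepts = []
on_action_end(percepts, [1.0, 2.0], rewarded=True, new_label="beg")    # creates "beg"
on_action_end(percepts, [1.2, 2.1], rewarded=True, new_label="beg")    # refines "beg"
on_action_end(percepts, [1.5, 2.5], rewarded=False, new_label="beg")   # ignored
print(len(percepts), len(percepts[0].examples))   # 1 2
```

The sketch simply discards unrewarded examples; the slide's closing remark suggests they could instead feed a separate “not-Beg” model.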

Most Worthy Action Tuple Gets Credit: 

[Timeline: Sit, Good Thing]
“sit-utterance” perceived.
<true/Sit> begins.
“click” perceived.
But credit goes to <“sit-utterance”/Sit>.
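
A minimal sketch of this step, assuming “most worthy” means the highest sum of reliability and novelty among the tuples whose action was performed and whose trigger was perceived in the attention window. The slides say only that the choice is heuristic, so the scoring rule, the learning rate and the field names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ActionTuple:
    trigger: str
    action: str
    value: float = 0.0
    reliability: float = 0.0   # assumed: rewarded trials / total trials
    novelty: float = 0.0       # assumed: bonus for recently created tuples


LEARNING_RATE = 0.2            # assumed


def assign_credit(tuples, performed_action, perceived, reward):
    """Credit the most worthy tuple whose action ran and whose trigger was perceived."""
    candidates = [t for t in tuples
                  if t.action == performed_action
                  and (t.trigger == "true" or t.trigger in perceived)]
    winner = max(candidates, key=lambda t: t.reliability + t.novelty)
    winner.value += LEARNING_RATE * (reward - winner.value)
    return winner


generic = ActionTuple("true", "Sit", reliability=0.2)
cued = ActionTuple("sit-utterance", "Sit", reliability=0.8)
winner = assign_credit([generic, cued], "Sit", {"sit-utterance"}, reward=1.0)
print(winner.trigger, winner.value)   # sit-utterance 0.2
```

Although <true/Sit> was the tuple that actually ran, the value update lands on the cued tuple, which matches what a trainer giving the cue would expect.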

Create New Action Tuples As Appropriate: 

Create New Action Tuples As Appropriate

Implicit Feedback Guides Action Space Discovery: 

[Timeline: Follow-your-nose, Good Thing]
“Follow-your-nose” action accumulates path through pose-space.
Good Thing appears. Compare accumulated path to known paths.
Down gets the credit for Good Thing appearing, rather than “Follow-your-nose.”

If Path Is Novel, Create a New Motor Program and Action: 

[Timeline: Follow-your-nose, Good Thing]
“Follow-your-nose” action accumulates path through pose-space.
Good Thing appears. Compare accumulated path to known paths.
Figure-8 is created and subsequent examples of Figure-8 are used to improve model of path.
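
The two cases above (path matches a known action, or path is novel) can be sketched as one comparison step. This is an illustration under stated assumptions: paths are represented as lists of pose labels, similarity is measured with Python's `difflib.SequenceMatcher`, and the pose names and the 0.8 cutoff are invented. The talk's Motor Model Percepts are not described in enough detail to reproduce.

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.8   # assumed similarity cutoff


def credit_or_create(known_paths, accumulated, new_name):
    """Return the known action the path matches, or register the path as a new action."""
    best_name, best_score = None, 0.0
    for name, path in known_paths.items():
        score = SequenceMatcher(None, path, accumulated).ratio()
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= MATCH_THRESHOLD:
        return best_name                       # a known action gets the credit
    known_paths[new_name] = list(accumulated)  # novel path: new motor program
    return new_name


known = {"Down": ["stand", "crouch", "lie"], "Sit": ["stand", "sit"]}
# Luring the dog through a path that matches Down credits Down.
print(credit_or_create(known, ["stand", "crouch", "lie"], "new-1"))
# A path unlike any known one becomes a new action.
print(credit_or_create(known, ["stand", "turn-left", "turn-right", "stand"], "Figure-8"))
```

After the second call, `known` contains a "Figure-8" entry that later examples could be compared against and used to refine.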

Dobie T. Coyote…: 

[Video: long Dobie video]

Limitations and Future Work: 

Important extensions
  Other kinds of learning (e.g., social or spatial)
  Generalization
  Sequences
  Expectation-based emotion system
How will the system scale?

Useful Insights: 

Use temporal proximity to limit search.
Hierarchical representations of state, action and state-action space & use implicit feedback to guide exploration
“Trainer friendly” credit assignment
Luring and shaping are essential

Future work: the problem of sequences…: 

Future work: the problem of sequences…

Who gets credit for good things happening?: 

[Figure: timeline ending in a reward (“Yumm..”)]

Conventional idea: back propagation from goal: 

[Timeline: orient, eye, stalk, chase, grab-bite, kill-bite, then reward (“Yumm..”)]
Credit flows backward

Back propagation is a slow way to learn…: 

Search space is potentially huge
Individual elements of sequence may need to be perfected in order to reach goal at all.
Necessary but rarely successful behaviors may be very difficult to learn.
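
To make the slowness concrete, here is a minimal sketch assuming one-step temporal-difference updates on the six-element sequence named on the slides, with reward only at the end. The update rule, the constants and the ordering of the elements are assumptions for illustration. Each successful episode pushes value back by just one element.

```python
# Assumed ordering of the six elements named on the slides.
SEQUENCE = ["orient", "eye", "stalk", "chase", "grab-bite", "kill-bite"]
ALPHA, GAMMA = 0.5, 0.9   # assumed learning rate and discount factor


def run_episode(values):
    """One successful pass through the sequence with one-step TD updates."""
    for i, step in enumerate(SEQUENCE):
        last = i == len(SEQUENCE) - 1
        reward = 1.0 if last else 0.0                       # reward only at the goal
        next_value = 0.0 if last else values[SEQUENCE[i + 1]]
        values[step] += ALPHA * (reward + GAMMA * next_value - values[step])


values = {step: 0.0 for step in SEQUENCE}
episodes = 0
while values["orient"] == 0.0:
    run_episode(values)
    episodes += 1
print(episodes)   # 6: one successful episode per element before credit reaches "orient"
```

The sketch also assumes every episode reaches the goal. If individual elements must first be perfected for the goal to be reached at all, learning would be slower still.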

Leyhausen’s suggestion…: 

[Timeline: orient, eye, stalk, chase, grab-bite, kill-bite, each element labeled “motivation & reward”]
Each element is innately self-motivating and has innate reward metric
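
As a contrast with credit that flows back from the goal, here is a minimal sketch assuming each element carries its own innate reward of 1.0 for being performed. The reward size, the learning rate and the ordering of the elements are illustrative assumptions. Every element acquires value on the very first pass, whether or not the functional goal is reached.

```python
# Assumed ordering of the six elements named on the slides.
SEQUENCE = ["orient", "eye", "stalk", "chase", "grab-bite", "kill-bite"]
ALPHA = 0.5          # assumed learning rate
LOCAL_REWARD = 1.0   # assumed innate reward for performing each element

values = {step: 0.0 for step in SEQUENCE}
for step in SEQUENCE:   # a single pass through the sequence
    values[step] += ALPHA * (LOCAL_REWARD - values[step])
print(values)           # every element already has value 0.5
```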

Functional goal plays incidental role: 

[Timeline: orient, eye, stalk, chase, grab-bite, kill-bite, then reward (“Yumm..”)]
Propagated value from functional goal plays incidental role

Coppinger’s suggestion, part 1…: 

[Timeline: orient, eye, stalk, chase, grab-bite, kill-bite]
Varying innate tendency to follow behavior with “next” in sequence
[Figure labels: Internal motivation; External motivation]

Coppinger’s suggestion, part 2: 

[Figure comparing: Wolf, Border Collie, Livestock Guarding dog]

Coppinger’s suggestion, part 3: 

[Figure, after Coppinger: Border Collie vs. Livestock Guarding dog, showing the sensitive period for social development and the onset of predatory behaviors]
Border Collies incorporate predatory patterns into social play because of early onset of these patterns

Future work: the problem of sequences: 

What can we learn from how animals “learn” sequences?
  Sequence may be “learned” apart from function
  Elements may be self-motivating and have local metric of goodness
  Innate bias of varying degree to perform all or part of “sequence”
  Role of developmental timing in determining function

Future work: learning from learning: 

Does it pay to explore?
Does exploring lead to good things or bad things?
Best environment in which to explore?
What can be learned from watching others?
How do I know if I am doing it right?

Predictability & control: 

Is it a predictable world?
  Can I predict the potential arrival of good or bad things so as to:
    Increase chances of good thing happening
    Decrease chances of bad thing happening
Is it a controllable world?
  Can I control the world so as to satisfy my motivational goals?

P & C are learned, and in turn affect learning: 

P & C are learned, and in turn affect learning

Predictability and Controllability: 

[Figure, after Lindsay: 2×2 grid of predictability (p predictable, -p unpredictable) against controllability (c controllable, -c uncontrollable). Quadrant labels: Anxiety, Confidence, Depression, Frustration.]

Practical & compelling real-time learning: 

Easy for interactive characters to learn what they ought to be able to learn
Easy for a human trainer to guide learning process
A compelling user experience
Provide heuristics and practical design principles

Acknowledgements: 

Members of the Synthetic Characters Group, past, present & future
Gary Wilkes
Funded by the Digital Life Consortium

The problem: 

If each element in sequence has 3 variants, there are 729 possible combinations of which 1 may work (ignoring stimuli)
If there are 12 possible stimuli, there are 1,586,874,322,944 possible combinations of stimuli-action pairs to explore.
Don’t know if it is the right sequence until goal is reached
What happens if “variant” needs to be learned?
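
The first count can be checked directly, assuming the sequence has the six elements shown on the earlier slides (the slide itself does not state the length). The slide does not say how the larger figure is derived. The sketch below shows one possible reading and notes that the slide's number factors as 108 to the sixth power. Both the reading and the factorization are observations, not the author's stated derivation.

```python
ELEMENTS = 6    # assumed: the six sequence elements shown on the earlier slides
VARIANTS = 3
STIMULI = 12

action_sequences = VARIANTS ** ELEMENTS
print(action_sequences)                        # 729, matching the slide

# One possible reading (an assumption): each element pairs one of 12 stimuli
# with one of 3 variants, giving 36 choices per element.
print((STIMULI * VARIANTS) ** ELEMENTS)        # 2176782336

# The slide's larger figure equals 108 ** 6; the slide does not say where 108 comes from.
print(108 ** ELEMENTS == 1_586_874_322_944)    # True
```

Under either count, adding stimuli turns a few hundred candidate sequences into billions or more, so exhaustive exploration is out of reach.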

Big idea: innate biases facilitate learning: 

Biases include…
  Temporal proximity implies causality
  Attend more readily to certain classes of stimuli than to others (motion vs. speech)
  Lazy discovery (pay attention once you have a reason to pay attention)
  Elements may be “innately” self-motivating and have local metric of “goodness”