
Life-long Learning in Sociable Agents


Presentation Transcript


  1. Life-long Learning in Sociable Agents: A Hierarchical Reinforcement Learning Approach
     Professor Andrea Thomaz, Peng Zhou

  2. Sociable Agents
     • What are sociable agents?
         • Essentially, agents that must interact with humans in a social manner
     • Why sociable agents?

  3. Major Issues
     • Natural language processing
         • Required for talking systems
     • Activity recognition
         • Not just in the real world
     • User interface
         • Agent-human communication, non-linguistic
     • Life-long learning
         • Teach, explore, revise
     • The role of emotions
         • Not just fluff

  4. My Focus (for the moment)
     • How to build persistent agents that accumulate concepts and skills “opportunistically” from their environment
         • The environment includes humans (usually non-experts)
         • Socially guided learning

  5. Background: Teaching Agents Through Social Interaction
     • Human input is a long-standing topic in machine learning (e.g., supervised learning, learning by demonstration)
     • Many existing techniques for “teaching” the robot
     • Psychological benefits
         • Ease of use (“how humans want to teach”), increased believability, personal investment

  6. Previous Work: Sophie’s Kitchen
     • Reinforcement learning, domain of ~1000 states
     • Autonomous exploration
     • Human input: guidance & state rewards
     • Communication channels: gazing, explicit actions
     • Conducted user studies
     • Results:
         • Improved learning speed
         • Insight into how humans like to teach
         • Fun for the human
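
The two human input channels on this slide can be pictured in code. The sketch below is a minimal, hypothetical illustration that assumes guidance works by restricting the agent's choice to actions involving the object the human indicates, and that a human state reward is simply added to the environment reward. The `(verb, object)` action encoding, `select_action`, `combined_reward`, and the `weight` parameter are illustrative names, not the original system's code.

```python
import random
from typing import Dict, Hashable, List, Optional, Tuple

# Hypothetical action encoding: (verb, object), e.g. ("pick-up", "bowl").
Action = Tuple[str, str]
QTable = Dict[Tuple[Hashable, Action], float]


def select_action(q: QTable, state: Hashable, actions: List[Action],
                  guided_object: Optional[str] = None, epsilon: float = 0.1,
                  rng: Optional[random.Random] = None) -> Action:
    """Epsilon-greedy choice, restricted to actions on the guided object when any exist."""
    if rng is None:
        rng = random.Random()
    candidates = actions
    if guided_object is not None:
        guided = [a for a in actions if a[1] == guided_object]
        if guided:  # guidance that matches no available action is ignored
            candidates = guided
    if rng.random() < epsilon:
        return rng.choice(candidates)  # explore, but only among the candidates
    return max(candidates, key=lambda a: q.get((state, a), 0.0))


def combined_reward(env_reward: float, human_reward: Optional[float],
                    weight: float = 1.0) -> float:
    """Environment reward plus the human's (weighted) state reward, if one was given."""
    if human_reward is None:
        return env_reward
    return env_reward + weight * human_reward
```

With guidance pointing at the bowl, even a greedy agent that currently prefers the egg will act on the bowl, which is one way human input could shorten exploration.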

  7. Reinforcement Learning
     • Basic idea: finding an optimal policy
         • Act in the environment, receive rewards, modify the policy accordingly
     • Typical formulation: an MDP defined by (S, A, R, T)
     • Advantages:
         • Desirable statistical properties
         • Unsupervised, autonomous learning
     • Limitations:
         • The curse of scale
         • Poor transfer of knowledge
         • Rewards can be hard to define
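
The slide does not name a specific algorithm. As one common instance of the act / receive reward / modify policy loop, here is a minimal tabular Q-learning sketch on a hypothetical five-state chain MDP; the chain, its reward, and the hyperparameters are assumptions chosen only for illustration.

```python
import random
from collections import defaultdict

# Hypothetical chain MDP: states 0..4, actions -1 (left) / +1 (right).
# Entering state 4 ends the episode with reward 1; every other step gives 0.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)


def step(state, action):
    """Transition function T and reward function R of the chain."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def greedy(q, state, rng):
    """Best action under the current Q estimates, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(state, action)], initialised to 0
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Act in the environment (epsilon-greedy exploration).
            action = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q, state, rng)
            next_state, reward, done = step(state, action)
            # Modify the estimate, and hence the greedy policy, using the reward received.
            target = reward if done else reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)
```

Ties between equal Q-values are broken at random so that early exploration is an unbiased random walk rather than a systematic drift toward one end of the chain. The table has one entry per state-action pair, which is exactly what grows unmanageable under the "curse of scale" listed above.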

  8. Hierarchical Reinforcement Learning
     • Tackles scaling and transfer problems
     • May more closely resemble human cognitive processes and therefore better match humans’ expectations of the agent
         • “I’m trying to teach you how to open doors, darn it!”
     • Two main components
         • Hierarchical task structure
         • State abstraction
     • Learning the hierarchy (as opposed to handcrafting it)
         • U-trees, HEXQ, diverse density approaches, …

  9. My Approach: Extend Sophie’s Kitchen to HRL
     • Basic idea behind Sophie’s Kitchen: unsupervised learning is great, but if non-expert supervision is available, why not make use of it?
         • Humans typically have insights into the structure of the domain
         • HRL could make very good use of those structures
     • Challenges in extending this to HRL
         • Adapting non-expert, ambiguous input
         • Modifying existing HRL algorithms to use the adapted input
         • Skill reuse and retention, evaluation of human suggestions, improvement through practice, personality and trust issues

  10. Current Research Status
     • Extended the Sophie’s Kitchen domain to a tool-use grid world domain: Sophie’s Adventure
     • Basic features
         • Navigation
         • Tool use
         • Hierarchical structure
         • Transferable skills
         • Large number of states

  11. Current Research Status
     • Options
         • Sutton, Precup, Singh (1999)
         • HRL method that addresses hierarchical task structure
         • Temporally extended actions consisting of (I, π, β), where the initiation set I is a subset of S, π is a local policy, and β is the termination condition mapping states in S to [0, 1]
     • Learning options is a natural extension of RL
     • Primitive actions can be thought of as one-step options; the options framework preserves optimality when options augment (rather than replace) the primitive actions
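
The (I, π, β) triple maps directly onto a small data structure. The following is a minimal sketch, assuming a deterministic environment function `step(state, action) -> (next_state, reward)`; `Option`, `primitive_option`, and `run_option` are hypothetical names. It also illustrates the slide's point that a primitive action is just an option with I = S and β = 1 everywhere.

```python
import random
from dataclasses import dataclass
from typing import Callable, FrozenSet, Hashable, Iterable

State = Hashable
Action = Hashable


@dataclass(frozen=True)
class Option:
    """An option (I, pi, beta) in the sense of Sutton, Precup & Singh (1999)."""
    initiation_set: FrozenSet[State]       # I, a subset of S
    policy: Callable[[State], Action]      # pi, the option's local policy
    termination: Callable[[State], float]  # beta: S -> [0, 1]

    def can_start(self, state: State) -> bool:
        return state in self.initiation_set


def primitive_option(action: Action, all_states: Iterable[State]) -> Option:
    """A primitive action as a one-step option: I = S, pi always picks it, beta = 1."""
    return Option(frozenset(all_states), lambda s: action, lambda s: 1.0)


def run_option(option: Option, state: State, step, rng: random.Random,
               gamma: float = 0.9, max_steps: int = 100):
    """Follow the option's policy until beta says stop.

    Returns (final state, discounted reward accumulated, number of steps taken).
    `step` is an assumed environment function: (state, action) -> (next_state, reward).
    """
    assert option.can_start(state), "option not available in this state"
    total, discount, k = 0.0, 1.0, 0
    while k < max_steps:
        state, reward = step(state, option.policy(state))
        total += discount * reward
        discount *= gamma
        k += 1
        if rng.random() < option.termination(state):  # terminate with probability beta(s)
            break
    return state, total, k
```

For example, in a corridor of states 0 to 5 with a reward of -1 per step, an option whose policy always moves right and whose β is 1 only at state 5 runs for five steps from state 0, while the primitive "move right" option always stops after one.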

  12. Current Research Status
     • Learning options
         • Feature-based
         • “Clapping” reward channel
         • Multi-step guidance
         • Intra-option learning
         • Keep track of successes and failures
         • Practice when the user is not around
         • Aggregate similar options
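
Intra-option learning (Sutton, Precup & Singh, 1999) lets a single primitive transition update every option whose policy would have chosen the same action, not only the option currently executing. The sketch below implements the intra-option Q-learning backup for deterministic Markov options, assuming every option is available in every state. The success/failure counters are a hypothetical reading of the "keep track of successes and failures" bullet, not the author's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Tuple


@dataclass(eq=False)
class Option:
    name: str
    policy: Callable[[Hashable], Hashable]    # deterministic local policy pi(s)
    termination: Callable[[Hashable], float]  # beta(s) in [0, 1]
    successes: int = 0  # hypothetical bookkeeping for "keep track of
    failures: int = 0   # successes and failures"


QTable = Dict[Tuple[Hashable, str], float]


def intra_option_update(q: QTable, options: List[Option], s, a, r, s_next,
                        alpha: float = 0.1, gamma: float = 0.9) -> None:
    """Intra-option Q-learning backup for one primitive transition (s, a, r, s_next).

    Assumes every option is available in every state.
    """
    best_next = max(q.get((s_next, o.name), 0.0) for o in options)
    for o in options:
        if o.policy(s) != a:
            continue  # o would not have taken a in s, so this experience is not its own
        beta = o.termination(s_next)
        # U(s', o): keep following o with prob. 1 - beta, else switch to the best option.
        u = (1.0 - beta) * q.get((s_next, o.name), 0.0) + beta * best_next
        old = q.get((s, o.name), 0.0)
        q[(s, o.name)] = old + alpha * (r + gamma * u - old)


def record_outcome(option: Option, reached_goal: bool) -> None:
    if reached_goal:
        option.successes += 1
    else:
        option.failures += 1


def success_rate(option: Option) -> float:
    n = option.successes + option.failures
    return option.successes / n if n else 0.0
```

Because one step of experience can improve several options at once, this style of update fits the goal of accumulating skills opportunistically, including during practice when the user is not around.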

  13. In Progress
     • Formalize reward types
         • State rewards: “doing good”
         • Object-specific rewards: “look at this…”
         • Special rewards: “that’s the way to do it”
     • Extracting state abstractions from rewards
         • Object-specific reward -> make object a feature
         • ???
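
The "object-specific reward -> make object a feature" idea can be sketched as a feature set that grows whenever the human rewards an object, with full states projected onto that set. This is a hypothetical illustration only: the reward `kind` strings, the dict-based state representation, and `RewardDrivenAbstraction` are all assumptions.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Optional, Set, Tuple


class RewardDrivenAbstraction:
    """Hypothetical sketch: grow the set of relevant state features from object-specific rewards."""

    def __init__(self, base_features: Iterable[str]):
        self.features: Set[str] = set(base_features)

    def on_reward(self, kind: str, obj: Optional[str] = None) -> None:
        # Only object-specific rewards ("look at this...") promote an object to a feature;
        # state rewards and special rewards leave the abstraction unchanged here.
        if kind == "object" and obj is not None:
            self.features.add(obj)

    def abstract(self, full_state: Dict[str, Hashable]) -> FrozenSet[Tuple[str, Hashable]]:
        """Project the full state onto the currently relevant features."""
        return frozenset((k, v) for k, v in full_state.items() if k in self.features)
```

Before any object-specific reward, two states that differ only in where the hammer lies collapse to the same abstract state; once the human rewards the hammer, they become distinguishable.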

  14. Planned Future Work
     • Options-level state abstraction (MAXQ, HAM, etc.)
     • Learning options-level state abstraction
         • U-trees
         • Involving human input (e.g., pointing out salient features of the environment)
     • The “trust” issue: extending the user evaluation process to formulate “trust” for certain users
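
One simple way to formulate per-user "trust" is to treat each suggestion as a success/failure trial and keep a Beta posterior per user. This is a hypothetical sketch, not the planned method: it assumes the agent can judge afterwards whether a user's suggestion helped (for example, whether the suggested option later succeeded), and `UserTrust` and its prior parameters are illustrative names.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable


class UserTrust:
    """Hypothetical sketch: per-user trust as the posterior mean of a Beta-Bernoulli model."""

    def __init__(self, prior_helpful: float = 1.0, prior_unhelpful: float = 1.0):
        self.a0, self.b0 = prior_helpful, prior_unhelpful  # Beta(a0, b0) prior
        self.helpful: DefaultDict[Hashable, int] = defaultdict(int)
        self.unhelpful: DefaultDict[Hashable, int] = defaultdict(int)

    def record(self, user: Hashable, suggestion_helped: bool) -> None:
        if suggestion_helped:
            self.helpful[user] += 1
        else:
            self.unhelpful[user] += 1

    def trust(self, user: Hashable) -> float:
        """Mean of Beta(a0 + helpful, b0 + unhelpful), a value in (0, 1)."""
        a = self.a0 + self.helpful[user]
        b = self.b0 + self.unhelpful[user]
        return a / (a + b)
```

With the default Beta(1, 1) prior an unknown user starts at 0.5, and a user whose suggestions helped three times out of four reaches 4/6. A value like this could, for instance, scale how much weight that user's rewards receive.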

  15. Planned Future Work
     • Actual transfer learning experiments, and exploring how humans could facilitate the process
     • Carry out user studies on the system
     • Agent transparency in HRL: how to communicate internal state to the human
     • Ambiguous user signals
         • Should the agent ask for clarification?

  16. Conclusion
     • Sociable agents are, or will be, ubiquitous
     • These agents should be able to learn from humans
     • Socially guided learning can both improve learning speed and “personalize” the agent
     • Higher-order learning is likely necessary for realistic applications
     • An interesting inquiry into our own social expectations and desires
