
Multi-Agent Strategic Modeling in a Robotic Soccer Domain


Presentation Transcript


  1. Multi-Agent Strategic Modeling in a Robotic Soccer Domain

  2. Talk Outline • Overview of the Problem • Multi-Agent Strategy Discovering Algorithm • Results on the RoboCup Domain • Results on the 3vs2 Keepaway Domain* *not in the paper (latest results)!

  3. Schema of the Multi-Agent Strategy Discovering Algorithm (MASDA) • Input: basic domain knowledge (e.g. basic soccer and RoboCup domain knowledge) • Input: multi-agent action sequence (e.g. a RoboCup game) • Output: strategic concepts (e.g. describing a specific RoboCup game)

  4. An example MAS problem: a RoboCup attack

  5. Goal: a human-readable description of a strategic action concept: the left forward dribbles from the left half of the middle third into the penalty box, the left forward makes a pass into the penalty box, and the center forward, in the center of the penalty box, successfully shoots into the right part of the goal.

  6. Multi-Agent Strategy Discovering Algorithm (MASDA): schema of steps I.1, I.2, I.3, II.1, II.2, II.3, III.1, III.2, III.3, in order of increasing abstraction
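To make the overall flow concrete, here is a minimal sketch of how the nine steps could be chained. The step functions are passed in as callables and every name is a hypothetical placeholder, not the authors' implementation; the individual steps are described on the following slides.

```python
# Hypothetical outline of the MASDA pipeline; all step functions are
# assumed placeholders supplied by the caller.

def run_masda(raw_trace, domain_knowledge, steps):
    """Chain the MASDA stages; `steps` maps step ids (e.g. 'I.1') to callables."""
    # Step I: data preprocessing
    actions = steps["I.1"](raw_trace)                      # detect actions in raw data
    sequence = steps["I.2"](actions)                       # build the action sequence
    sequence = steps["I.3"](sequence, domain_knowledge)    # introduce domain knowledge

    # Step II: graphical description
    graph = steps["II.1"](sequence)                        # create the action graph
    graph = steps["II.2"](graph)                           # abstraction process
    strategies = steps["II.3"](graph)                      # strategy selection

    # Step III: symbolic description learning
    descriptions = steps["III.1"](strategies)              # action descriptions
    examples = steps["III.2"](strategies, sequence)        # learning examples
    rules = steps["III.3"](examples)                       # rule induction
    return descriptions, rules
```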

  7. Step I. Data preprocessing: I.1. Detection of actions in raw data

  8. Step I. Data preprocessing: I.2. Action sequence generation

  9. Step I. Data preprocessing: I.3. Introduction of domain knowledge

  10. Step II: Graphical description: II.1. Action graph creation (example graph nodes: L-MF:attack support, L-MF:creating space, L-MF:dribble, C-MF:creating space, C-MF:pass to player, C-MF:dribble)

  11. Step II: Graphical description: II.1. Action graph creation
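A minimal sketch of one way such an action graph could be built, assuming nodes are Role:action labels and edges count observed successions of actions within an attack; this representation is an assumption for illustration, not the paper's exact data structure.

```python
from collections import Counter

def build_action_graph(attacks):
    """attacks: list of action sequences, each a list of 'Role:action' labels."""
    nodes = Counter()
    edges = Counter()
    for sequence in attacks:
        nodes.update(sequence)
        # An edge links two consecutive actions within the same attack.
        edges.update(zip(sequence, sequence[1:]))
    return nodes, edges

# Toy input modelled on the node labels shown on the slide.
attacks = [
    ["L-MF:creating space", "L-MF:dribble", "C-MF:pass to player", "C-MF:dribble"],
    ["L-MF:attack support", "L-MF:dribble", "C-MF:pass to player"],
]
nodes, edges = build_action_graph(attacks)
print(edges.most_common(1))   # the most frequent transition
```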

  12. Step II: Graphical description: II.2. Abstraction process (abstraction levels 0–16)

  13. Step II: Graphical description: II.3. Strategy selection (abstraction levels 0–16)
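One plausible reading of the abstraction and selection steps is that node labels are progressively generalized (e.g. L-FW and R-FW merge into FW) and only sufficiently frequent edges are kept as the strategy; the sketch below illustrates that idea only and is not the paper's exact procedure.

```python
from collections import Counter

def generalize(label):
    # Illustrative generalization: drop the left/right/centre qualifier from a role,
    # e.g. 'L-FW:dribble' becomes 'FW:dribble'.
    role, action = label.split(":", 1)
    return role.split("-")[-1] + ":" + action

def abstract_edges(edges):
    """One abstraction step: merge edges whose generalized endpoints coincide."""
    merged = Counter()
    for (a, b), count in edges.items():
        merged[(generalize(a), generalize(b))] += count
    return merged

def select_strategy(edges, min_support=2):
    """Strategy selection: keep only edges observed at least min_support times."""
    return {edge: count for edge, count in edges.items() if count >= min_support}

edges = Counter({("L-FW:dribble", "L-FW:pass"): 3,
                 ("R-FW:dribble", "R-FW:pass"): 2,
                 ("C-MF:pass", "L-FW:shoot"): 1})
print(select_strategy(abstract_edges(edges)))
```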

  14. Step III: Symbolic description learning: III.1. Generation of action descriptions (examples: LTeam.C-MF: Successful shoot, LTeam.MF: Pass to player, LTeam.R-FW: Pass to space, LTeam.R-FW: Long dribble)

  15. Step III: Symbolic description learning: III.2. Generation of learning examples

  16. Step III: Symbolic description learning: III.3. Rule induction • Each edge in a strategy represents one class. • 2-class learning problem: • positive examples: action instances for a given edge • negative examples: all other action instances • Induce rules for the positive class (i.e. the edge) • Repeat for all edges in a strategy
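A minimal sketch of this two-class set-up, using scikit-learn's DecisionTreeClassifier as a stand-in rule learner (the paper's actual rule-induction method may differ); the attribute names and toy data are invented for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

def induce_edge_rules(instances, target_edge):
    """instances: (edge_label, attribute_vector) pairs collected for one strategy."""
    X = [attributes for _, attributes in instances]
    # Positive examples: action instances belonging to the given edge;
    # negative examples: all other action instances in the strategy.
    y = [1 if edge == target_edge else 0 for edge, _ in instances]
    return DecisionTreeClassifier(max_depth=3).fit(X, y)

# Invented toy data: two numeric attributes per action instance.
instances = [
    ("FW:shoot", [3.0, 8.0]), ("FW:shoot", [4.0, 7.5]),
    ("FW:pass", [12.0, 6.0]), ("FW:dribble", [15.0, 9.0]),
]
tree = induce_edge_rules(instances, "FW:shoot")
print(export_text(tree, feature_names=["DistToGoal", "DistToOpponent"]))
# The same procedure is repeated for every edge in the strategy.
```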

  17. Testing on the RoboCup Simulated League Domain • Input: • 10 RoboCup games: a fixed team vs. various opponent teams • Basic soccer knowledge (no knowledge of strategy, tactics, or the rules of the game): • soccer roles (e.g. left-forward) • soccer actions (e.g. control dribble) • relations between players (e.g. behind) • playing-field areas (e.g. penalty box) • Output: • strategic concepts (shown on the next slide) http://www.robocup.org/

  18. RoboCup Domain: an example strategic concept • LTeam.FW:Long dribble: RTeam.C-MF:Moving-away-slow, RTeam.L-FB:Still, RTeam.R-FB:Short-distance • LTeam.FW:Pass to player: RTeam.R-FB:Immediate • LTeam.FW:Successful shoot: RTeam.C-FW:Moving-away, LTeam.R-FW:Short-distance • LTeam.FW:Successful shoot (end): RTeam.RC-FB:Left, RTeam.RC-FB:Moving-away-fast, RTeam.R-FB:Long-distance

  19. RoboCup Domain: testing methodology • Create a reference strategic concept from 10 RoboCup games • Leave-one-out cross-validation to generate 10 learning tasks (learn: 9 games, test: 1 game) • positive examples: examples matching the reference strategic concept • negative examples: all other examples • Generate strategic concepts on the 9 learning games and test on the remaining game • Measure accuracy, recall and precision for a given strategy using: • only the action description • only the generated rules • both • Varying the level of abstraction: 1–20
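A minimal sketch of the leave-one-out protocol described above; learn_concepts and matches_reference stand in for MASDA's learning and matching steps and are hypothetical callables.

```python
def leave_one_out_eval(games, learn_concepts, matches_reference):
    """games: list of 10 games; each game is a list of (example, is_positive) pairs.
    learn_concepts and matches_reference are hypothetical MASDA callables."""
    scores = []
    for i, test_game in enumerate(games):
        concept = learn_concepts(games[:i] + games[i + 1:])   # learn on 9 games
        tp = fp = fn = tn = 0
        for example, is_positive in test_game:                # test on the held-out game
            predicted = matches_reference(concept, example)
            if predicted and is_positive:
                tp += 1
            elif predicted:
                fp += 1
            elif is_positive:
                fn += 1
            else:
                tn += 1
        accuracy = (tp + tn) / max(tp + fp + fn + tn, 1)
        recall = tp / max(tp + fn, 1)
        precision = tp / max(tp + fp, 1)
        scores.append((accuracy, recall, precision))
    return scores
```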

  20. RoboCup Domain: analysis of 10 RoboCup games

  21. 3vs2 Keepaway Domain • Motivation: • RoboCup is too complex to play with learned concepts • In the 3vs2 Keepaway domain we are able to play with learned concepts • Basic domain info: 5 agents, 3 high-level agent actions, 13 state variables http://www.cs.utexas.edu/~AustinVilla/sim/keepaway/ (Peter Stone et al.)

  22. 3vs2 Keepaway Domain • Measure average episode duration • Two handcoded reference strategies: • good strategy: hand (14 s) - hold the ball until the nearest opponent is within 5 m, then pass to the most open player (see the sketch below) • random: rand (5.2 s) - randomly choose among the possible actions • Our task: learn rules for the reference strategies and play as similarly as possible • MASDA remains identical • Only the domain knowledge is modified: • roles (K1, K2, K3, T1, T2) • actions (hold, passK2, passK3) • 13 domain attributes
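A minimal sketch of the good handcoded reference policy (hold until the nearest taker is within 5 m, then pass to the most open teammate). Interpreting "most open" as the keeper farthest from its closest taker is an assumption made here; the benchmark itself may define openness differently.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def hand_policy(k1, k2, k3, t1, t2):
    """Return 'hold', 'passK2' or 'passK3' for the keeper in possession (K1)."""
    if min(dist(k1, t1), dist(k1, t2)) > 5.0:
        return "hold"                       # nearest taker is still farther than 5 m
    # Otherwise pass to the most open teammate; here 'open' is approximated by
    # the teammate's distance to its closest taker (an assumption).
    openness_k2 = min(dist(k2, t1), dist(k2, t2))
    openness_k3 = min(dist(k3, t1), dist(k3, t2))
    return "passK2" if openness_k2 >= openness_k3 else "passK3"

print(hand_policy(k1=(0, 0), k2=(10, 2), k3=(3, 9), t1=(4, 0), t2=(6, 5)))
```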

  23. Testing methodology: reference game with a known strategy → MASDA (rule induction) → rules are handcoded into the program → game with a learned strategy; compute the average episode duration for both the reference and the learned game and compare them.

  24. Episode duration comparison of reference and learned game

  25. Visual comparison of reference and learned games • reference game: handcoded (hand.avi) • reference game: random (rand.avi) • learned random (rand-pass4.avi) • learned handcoded (hand-holdpass2.avi)

  26. Comparison of handcoded strategy and learned rules • Handcoded strategy: • if dist(K1, T1) > 5m => hold • if dist(K1, T1) <= 5m and player K2 is not free => pass to K3 • if dist(K1, T1) <= 5m and player K2 is free => pass to K2 • Learned rules: • DistK1T1 ∈ [6, 16) ∧ DistK1T2 ∈ [6, 16) ∧ DistK1C ∈ [6, 12) ∧ MinAngK3K1T1T2 ∈ [0, 90) => Hold • DistK1T1 ∈ [6, 12) ∧ DistK1T2 ∈ [6, 16) ∧ DistK1K3 ∈ [10, 14) ∧ DistK1K2 ∈ [8, 14) => Hold • MinDistK2T1T2 ∈ [12, 16) ∧ DistK3C ∈ [8, 16) ∧ DistK1T2 ∈ [2, 10) ∧ DistK1T1 ∈ [0, 6) ∧ MinAngK2K1T1T2 ∈ [15, 135) => pass to K2 • DistK1T1 ∈ [2, 6) ∧ MinDistK3T1T2 ∈ [10, 16) ∧ DistK1K2 ∈ [10, 16) ∧ DistK2C ∈ [4, 14) ∧ DistK1T2 ∈ [2, 8) ∧ MinAngK2K1T1T2 ∈ [0, 15) => pass to K3
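Read literally, the learned rules are conjunctions of interval tests over the state variables. The following sketch shows how such rules could be checked at decision time; the rule bodies are abridged to a few conjuncts and the default action when no rule fires is an assumption.

```python
def in_range(value, low, high):
    return low <= value < high

def learned_policy(s):
    """s: dict of state variables, e.g. s['DistK1T1'] in metres (abridged rules)."""
    if (in_range(s["DistK1T1"], 6, 16) and in_range(s["DistK1T2"], 6, 16)
            and in_range(s["DistK1C"], 6, 12) and in_range(s["MinAngK3K1T1T2"], 0, 90)):
        return "hold"
    if (in_range(s["DistK1T1"], 0, 6) and in_range(s["DistK1T2"], 2, 10)
            and in_range(s["MinDistK2T1T2"], 12, 16) and in_range(s["MinAngK2K1T1T2"], 15, 135)):
        return "passK2"
    if (in_range(s["DistK1T1"], 2, 6) and in_range(s["DistK1T2"], 2, 8)
            and in_range(s["MinDistK3T1T2"], 10, 16) and in_range(s["MinAngK2K1T1T2"], 0, 15)):
        return "passK3"
    return "hold"   # default when no rule fires (an assumption)
```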

  27. Conclusion • We have designed a domain-independent strategy learning algorithm (MASDA), which learns from an action trace and basic domain knowledge • Successfully applied to: • the RoboCup domain, evaluated by a human expert and cross-validation • the 3vs2 Keepaway domain, evaluated against two reference strategies through episode duration, visual comparison and rule inspection

  28. Questions http://dis.ijs.si/andraz/logalyzer/

  29. RoboCup Domain: successful attack strategies • L-FW:long dribble → L-FW:pass → FW:shoot • L-FW:pass to player → FW:dribble → FW:shoot • C-FW:long dribble → C-FW:pass → FW:dribble → FW:shoot • R-FW:pass to player → FW:control dribble → FW:shoot • R-FW:dribble → R-FW:pass to player → FW:shoot • FW:pass to player → L-FW:control dribble → L-FW:shoot
