
Game-Theoretic Multi-Agent Learning


Presentation Transcript


  1. Game-Theoretic Multi-Agent Learning By: Mostafa Sahraei-Ardakani

  2. Research Groups: • Stanford University (Yoav Shoham) • Rutgers University (Michael Littman) • University of Michigan (Michael Wellman) • University of Alberta (Michael Bowling) • University of British Columbia (Kevin Leyton-Brown) • McGill University (Shie Mannor) • Brown University (Amy Greenwald) • Carnegie Mellon University

  3. Basic Definitions • Markov Decision Process (MDP) • Stage Games • Repeated Games: a repeatedly played stage game • Stochastic Games (Markov Games): a generalization of repeated games and MDPs

  4. Definitions from the SG point of view • Repeated Game: a stochastic game with only one stage (state) • MDP: a stochastic game with only one agent • So an SG generalizes both RGs and MDPs and has properties of each.

  5. What is the question?!!! • What exact question(s) is MAL addressing? • What is the yardstick? • What information is available? • Game rules • Play observability • Rivals’ actions • Rivals’ strategies • Learning and/or teaching • Rock-Paper-Scissors • Repeated Prisoners’ Dilemma

  6. Engineering Application • Distributed Controllers • Simplifies design of independent controllers • Equilibrium or Global Optimum? • Problem of Exploitation of Learning

  7. Model-Based Approaches • Of interest to game theorists • Start with some model of the opponent’s strategy • Compute and play the best response • Observe the opponent’s play and update the model of her strategy • Go to step 2 • Example: Fictitious Play (1951) • Compute rivals’ mixed strategies according to the history • Play the best response

  8. Fictitious Play (FP) • Assumes opponents play stationary strategies • Multiple best responses are each chosen with positive probability Convergence guarantees: • Games that are solvable by iterated dominance (strict Nash equilibrium) • Cooperative games • In zero-sum games the empirical distribution converges to the unique mixed-strategy Nash equilibrium. Note: smooth FP can play mixed strategies
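
A minimal sketch of the fictitious-play loop, assuming a two-action stage game; the payoff matrix, function name, and update scheme below are illustrative, not taken from the slides:

```python
import numpy as np

# Illustrative 2x2 payoff matrix for the learner (rows: own actions,
# columns: opponent actions); any stage game could be substituted here.
payoff = np.array([[1.0, -1.0],
                   [-1.0, 1.0]])

def fictitious_play_step(opponent_counts, payoff):
    """Best-respond to the empirical distribution of the opponent's past play."""
    empirical_mix = opponent_counts / opponent_counts.sum()
    expected = payoff @ empirical_mix          # expected payoff of each own action
    return int(np.argmax(expected))            # a pure best response

# Example: the opponent has played action 0 seven times and action 1 three times.
action = fictitious_play_step(np.array([7.0, 3.0]), payoff)
# After observing the opponent's next action j, increment opponent_counts[j].
```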

  9. Incremental Gradient Ascent Learners (IGA) • Incrementally climbs in the mixed-strategy space • For 2-player, 2-action general-sum games • Guarantees convergence to a Nash equilibrium, or guarantees convergence to an average payoff that is sustained by some Nash equilibrium

  10. The Dynamics

  11. The Update Rule
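
A hedged reconstruction of the IGA gradient step for the row player of a 2x2 general-sum game; the function name, step size, and payoff-matrix layout are assumptions:

```python
import numpy as np

def iga_update(alpha, beta, R, eta=0.01):
    """One gradient-ascent step on the row player's mixed strategy.

    alpha: probability the row player puts on her first action
    beta:  probability the column player puts on his first action
    R:     2x2 payoff matrix of the row player
    eta:   step size (the original analysis uses an infinitesimal step)
    """
    # Derivative with respect to alpha of the expected payoff
    # alpha*beta*R00 + alpha*(1-beta)*R01 + (1-alpha)*beta*R10 + (1-alpha)*(1-beta)*R11
    u = R[0, 0] - R[0, 1] - R[1, 0] + R[1, 1]
    grad = beta * u + (R[0, 1] - R[1, 1])
    # Step uphill and project back onto the valid probability interval
    return float(np.clip(alpha + eta * grad, 0.0, 1.0))
```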

  12. AWESOME! • Adapt When Everybody is Stationary, Otherwise Move to Equilibrium • Is not RL • Converges to a NE in self-play • Plays in epochs • APPE (all players playing equilibrium) • APS (all players stationary) → adapts and finds the best response

  13. Some Indices

  14. Model-Free Approaches • Reinforcement learning • Avoid building an explicit model • Learn how well one’s own various possible actions fare • Mostly studied in computer science / AI

  15. Single Agent Q-Learning • In the multi-agent setting the environment is no longer stationary • Therefore, convergence is not guaranteed.
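
For reference, a minimal tabular Q-learning update; the hyperparameters and names are illustrative. Applied naively in a stochastic game, this update ignores the other agents, which is exactly why the stationarity assumption breaks:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard single-agent tabular Q-learning update.

    Q is an |S| x |A| table. In a multi-agent game the other players keep
    changing their behaviour, so the transition and reward process seen by
    this update is non-stationary and the usual convergence proof fails.
    """
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```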

  16. Bellman’s Heritage • Single-agent Q-learning converges to the optimal value function V* • Simple extension to the multi-agent SG setting: Q-values updated without regard to opponents’ actions • Justified only if opponents’ choices of actions are stationary

  17. Bellman’s Heritage • Cure: define Q-values as a function of all agents’ actions (see the sketch below) • Problem: how to update V? • Maximin Q-learning • Problem: motivated only for zero-sum SGs
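
A hedged sketch of the “cure” above: keep Q-values over joint actions and plug in whichever value operator the following slides define (max, maximin, Nash, or correlated); the table shapes and names are illustrative:

```python
import numpy as np

# Illustrative joint-action table: states x own actions x opponent actions
n_states, n_own, n_opp = 5, 2, 2
Q = np.zeros((n_states, n_own, n_opp))

def joint_q_update(Q, s, a, o, r, s_next, value_op, alpha=0.1, gamma=0.9):
    """Generic multi-agent Q update over joint actions.

    value_op maps the stage-game matrix Q[s_next] to a scalar V(s_next);
    substituting max, maximin, a Nash value, or a correlated value gives
    the different learners discussed on the following slides.
    """
    Q[s, a, o] += alpha * (r + gamma * value_op(Q[s_next]) - Q[s, a, o])
    return Q
```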

  18. Minimax Learning • For zero-sum games, or for conservative (worst-case) play
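
A sketch of the maximin operator that minimax-Q substitutes for Q-learning’s max, solved with scipy’s linear-programming routine; the function and variable names are illustrative:

```python
import numpy as np
from scipy.optimize import linprog

def maximin_value(Q_s):
    """Value of the zero-sum stage game Q_s[a, o] (own action a, opponent action o).

    Solves max over mixed strategies pi of min over o of sum_a pi[a] * Q_s[a, o].
    """
    n_a, n_o = Q_s.shape
    # Decision variables: pi[0..n_a-1] and the game value v; linprog minimizes, so use -v.
    c = np.zeros(n_a + 1)
    c[-1] = -1.0
    # For every opponent action o:  v - sum_a pi[a] * Q_s[a, o] <= 0
    A_ub = np.hstack([-Q_s.T, np.ones((n_o, 1))])
    b_ub = np.zeros(n_o)
    # The probabilities must sum to one
    A_eq = np.hstack([np.ones((1, n_a)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n_a + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return float(res.x[-1]), res.x[:n_a]   # (value, maximin mixed strategy)
```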

  19. Nash Q-Learning for General-Sum SGs (GSSG) • Max operator (Q-learning) → Nash operator (Nash-Q)

  20. Friend or Foe Q-Learning • Adversarial Equilibrium • Coordination Equilibrium

  21. Friend or Foe Q-Learning (2) • Opponent Considered as Friend: • Opponent Considered as Foe

  22. Friend or Foe Q-Learning (3) • The Opponent may act differently! • Results on two common grid games
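
A compact sketch of the two value operators described above, reusing the maximin_value helper from the minimax sketch; the function names are illustrative:

```python
def friend_value(Q_s):
    """Friend-Q: the other agent is assumed to help, so V(s) is the payoff
    of the best joint action in the stage game Q_s[a, o]."""
    return float(Q_s.max())

def foe_value(Q_s):
    """Foe-Q: the other agent is assumed adversarial, so V(s) is the
    maximin (worst-case mixed-strategy) value of Q_s."""
    value, _ = maximin_value(Q_s)   # from the minimax-Q sketch above
    return value
```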

  23. Correlated Q-Learning • What is a correlated equilibrium? • Example • Benefits over mixed-strategy Nash • Convex polytope → linear programming (sketched after slide 25) • Better outcomes and denial • Independent action selection with a shared signal

  24. Correlated Q-Learning (2) • Like the Nash value function, it need not be well-defined (unique) • Generalizes the aforementioned value functions

  25. Correlated Q-Learning(3) • Utilitarian • Egalitarian • Republican • Libertarian
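
To make the “convex polytope → linear programming” point from slide 23 concrete, here is a sketch of the utilitarian selection rule for a two-player stage game using scipy; the function name and matrix conventions are assumptions, and the other selection rules swap in a different objective over the same constraint polytope:

```python
import numpy as np
from scipy.optimize import linprog

def utilitarian_ce(A, B):
    """Utilitarian correlated equilibrium of a two-player stage game.

    A[i, j] and B[i, j] are the payoffs to players 1 and 2 under joint action
    (i, j). Maximizes the sum of payoffs over the correlated-equilibrium polytope.
    """
    n1, n2 = A.shape
    c = -(A + B).flatten()            # linprog minimizes, so negate total welfare
    rows = []
    # Player 1: following recommendation i must beat deviating to k,
    # i.e. sum_j p[i, j] * (A[k, j] - A[i, j]) <= 0
    for i in range(n1):
        for k in range(n1):
            if i != k:
                row = np.zeros((n1, n2))
                row[i, :] = A[k, :] - A[i, :]
                rows.append(row.flatten())
    # Player 2: the symmetric constraints on columns
    for j in range(n2):
        for k in range(n2):
            if j != k:
                row = np.zeros((n1, n2))
                row[:, j] = B[:, k] - B[:, j]
                rows.append(row.flatten())
    res = linprog(c, A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
                  A_eq=np.ones((1, n1 * n2)), b_eq=np.array([1.0]),
                  bounds=[(0.0, 1.0)] * (n1 * n2))
    return res.x.reshape(n1, n2)      # joint distribution over action pairs
```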

  26. Correlated Q-Learning(4)

  27. Correlated Q-Learning(5)

  28. Platform for MARL • http://www.cs.ubc.ca/~kevinlb/malt • GAMUT (Stanford)

  29. New Approach: Time-order Policy Update • Make the Environment stationary • How to observe rivals’ actions? • Keep the MAX operator! • No direct focus on equilibria

  30. QDTO

  31. QDTO- Convergence

  32. QDTO-Simulations

  33. QDTO - Market

  34. Thanks for your Attention!
