
Convergent Learning in Unknown Graphical Games


Presentation Transcript


  1. Convergent Learning in Unknown Graphical Games Dr Archie Chapman, Dr David Leslie, Dr Alex Rogers and Prof Nick Jennings School of Mathematics, University of Bristol and School of Electronics and Computer Science University of Southampton david.leslie@bristol.ac.uk

  2. Playing games?

  3. Playing games?

  4. Playing games? Dense deployment of sensors to detect pedestrian and vehicle activity within an urban environment. Berkeley Engineering

  5. Learning in games • Adapt to observations of past play • Hope to converge to something “good” • Why?! • Bounded rationality justification of equilibrium • Robust to behaviour of “opponents” • Language to describe distributed optimisation

  6. Notation • Players i ∈ {1, …, N} • Discrete action sets A^i • Joint action set A = A^1 × ⋯ × A^N • Reward functions r^i : A → ℝ • Mixed strategies π^i ∈ Δ^i • Joint mixed strategy space Δ = Δ^1 × ⋯ × Δ^N • Reward functions extend to expected rewards r^i : Δ → ℝ

  7. Best response / Equilibrium • The mixed strategy of all players other than i is π^{−i} • The best response of player i is b^i(π^{−i}) = argmax over π^i of r^i(π^i, π^{−i}) • An equilibrium is a π* satisfying, for all i, π*^i ∈ b^i(π*^{−i})
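The best-response computation above can be sketched in Python (a hypothetical illustration, not the authors' code; the function and parameter names are my own): enumerate the opponents' joint actions, weight player i's payoff for each of its own actions by the product of the opponents' mixed strategies, and pick the maximiser.

```python
from itertools import product

import numpy as np

def best_response(reward, opponent_strategies, my_actions):
    """Best-response action of player i to independent opponent mixed strategies.

    reward: function mapping a joint action tuple (i's action first) to i's payoff.
    opponent_strategies: list of probability vectors, one per opponent.
    my_actions: number of actions available to player i.
    """
    values = np.zeros(my_actions)
    opp_supports = [range(len(p)) for p in opponent_strategies]
    # Enumerate opponents' joint actions, weighted by their strategy product.
    for opp_joint in product(*opp_supports):
        prob = np.prod([p[a] for p, a in zip(opponent_strategies, opp_joint)])
        for my_a in range(my_actions):
            values[my_a] += prob * reward((my_a,) + opp_joint)
    return int(np.argmax(values))
```

For example, in a two-player coordination game where matching actions pays 1, the best response to an opponent playing action 0 with probability 0.7 is action 0.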

  8. Fictitious play (loop): estimate the strategies of the other players → select the best action given those estimates and the game matrix → update the estimates

  9. Belief updates • The belief about the strategy of player i is the MLE, i.e. the empirical frequency of i's past actions • Online updating: σ^i_{t+1} = σ^i_t + (1/(t+1)) ( e(a^i_t) − σ^i_t ), where e(a) is the indicator vector of action a
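The online MLE update above is a one-line recursion. A minimal sketch (names are my own, not from the slides):

```python
import numpy as np

def update_belief(belief, observed_action, t):
    """Online update of the empirical-frequency (MLE) belief about one player's
    mixed strategy after observing that player's t-th action.

    Equivalent to recomputing action frequencies from scratch:
    belief_t = belief_{t-1} + (1/t) * (indicator(a_t) - belief_{t-1}).
    """
    indicator = np.zeros_like(belief)
    indicator[observed_action] = 1.0
    return belief + (indicator - belief) / t
```

Starting from a zero vector and feeding in the observed action sequence reproduces the empirical frequencies exactly; starting from a prior gives a weighted mix that converges to the MLE.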

  10. Stochastic approximation • Processes of the form x_{t+1} − x_t ∈ α_{t+1} [ F(x_t) + M_{t+1} + e_t ] • where α_t → 0 with Σ_t α_t = ∞, M_t is martingale noise and e_t → 0 • F is set-valued (convex and u.s.c.) • Limit points are chain-recurrent sets of the differential inclusion dx/dt ∈ F(x)

  11. Best-response dynamics • Fictitious play has M and e identically 0, and F(σ) = BR(σ) − σ • Its limit points are limit points of the best-response differential inclusion dσ/dt ∈ BR(σ) − σ • In potential games (and zero-sum games and some others) the limit points must be Nash equilibria

  12. Generalised weakened fictitious play • Consider any process such that σ_{t+1} ∈ (1 − α_{t+1}) σ_t + α_{t+1} ( BR^{ε_t}(σ_t) + M_{t+1} ), where α_t → 0 and ε_t → 0 with Σ_t α_t = ∞, and also an interplay condition between α_t and M • The convergence properties do not change

  13. Fictitious play (loop, repeated): estimate the strategies of the other players → select the best action given those estimates and the game matrix → update the estimates

  14. Learning the game

  15. Reinforcement learning • Track the average reward received for each joint action • If each joint action is played frequently enough, these estimates stay close to the expected rewards • So the estimated game converges to the true game
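Tracking the average reward per joint action is an incremental running mean. A minimal sketch (class and method names are my own, for illustration only):

```python
from collections import defaultdict

class GameEstimator:
    """Running-average reward for each joint action: an estimate of the
    unknown game matrix that converges to the true expected rewards as
    long as every joint action is sampled often enough."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.q = defaultdict(float)

    def update(self, joint_action, reward):
        self.counts[joint_action] += 1
        n = self.counts[joint_action]
        # Incremental mean: Q <- Q + (r - Q) / n
        self.q[joint_action] += (reward - self.q[joint_action]) / n

    def estimate(self, joint_action):
        return self.q[joint_action]
```

Using a dictionary keyed by joint-action tuples means only the entries actually visited are stored, which matters when the full matrix is too large to hold.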

  16. Q-learned fictitious play (loop): estimate the strategies of the other players → select the best action given those estimates and the estimated game matrix (in place of the true game matrix) → update both the strategy estimates and the game estimates
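Putting the two estimators together gives the Q-learned fictitious play loop. The sketch below is a hypothetical illustration under my own assumptions (ε-greedy exploration to ensure all joint actions are played, empirical-frequency beliefs, running-average payoff estimates), not the authors' implementation:

```python
import itertools
import random

import numpy as np

def q_learned_fictitious_play(sample_reward, n_players, n_actions,
                              steps=2000, explore=0.2):
    """Q-learned fictitious play sketch.

    sample_reward(i, joint) -> (possibly noisy) reward to player i.
    Each player keeps (a) empirical beliefs about every other player and
    (b) a running-average estimate of its own payoff matrix, and plays a
    best response to the estimated game under its beliefs, exploring
    uniformly with probability `explore`.
    """
    beliefs = [[np.ones(n_actions) / n_actions for _ in range(n_players)]
               for _ in range(n_players)]          # beliefs[i][j]: i's belief about j
    counts = np.zeros((n_players,) + (n_actions,) * n_players)
    q = np.zeros_like(counts)                      # q[i][joint]: estimated payoff
    for t in range(1, steps + 1):
        joint = []
        for i in range(n_players):
            if random.random() < explore:
                joint.append(random.randrange(n_actions))
                continue
            # Expected estimated payoff of each own action under i's beliefs
            values = np.zeros(n_actions)
            others = [j for j in range(n_players) if j != i]
            for opp in itertools.product(range(n_actions), repeat=n_players - 1):
                p = np.prod([beliefs[i][j][a] for j, a in zip(others, opp)])
                for a in range(n_actions):
                    full = list(opp)
                    full.insert(i, a)
                    values[a] += p * q[(i,) + tuple(full)]
            joint.append(int(np.argmax(values)))
        joint = tuple(joint)
        for i in range(n_players):
            # Update the payoff estimate for the realised joint action
            counts[(i,) + joint] += 1
            n = counts[(i,) + joint]
            q[(i,) + joint] += (sample_reward(i, joint) - q[(i,) + joint]) / n
            # Update beliefs about every other player
            for j in range(n_players):
                if j != i:
                    e = np.zeros(n_actions)
                    e[joint[j]] = 1.0
                    beliefs[i][j] += (e - beliefs[i][j]) / t
    return beliefs, q
```

On a two-player coordination game (reward 1 for matching actions), the exploration guarantees every joint action is sampled, so the estimated game matches the true game on visited entries and play settles on one of the matching equilibria.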

  17. Theoretical result Theorem – If all joint actions are played infinitely often, then the beliefs follow a GWFP process. Proof sketch: The estimated game converges to the true game, so the selected strategies are ε_t-best responses with ε_t → 0.

  18. Playing games? Dense deployment of sensors to detect pedestrian and vehicle activity within an urban environment. Berkeley Engineering

  19. It’s impossible! • N players, each with A actions • The game matrix has A^N entries to learn • Each individual must estimate the strategy of every other individual • A massive observational and computational requirement – just not possible for realistic game scenarios

  20. Marginal contributions • The marginal contribution of player i is the total system reward minus the system reward if i were absent • If every player maximises its marginal contribution, the system is at a (local) optimum • A marginal contribution may depend only on the actions of a small number of neighbours
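The marginal contribution is a one-line difference of global rewards. A minimal sketch (function names are my own; modelling "absence" by deleting or nulling the player's action is an assumption, since the slides do not specify it):

```python
def marginal_contribution(global_reward, joint_action, player, absent_action=None):
    """Marginal contribution of `player`: global reward minus the global
    reward with the player absent. Absence is modelled (an assumption)
    either by removing the player's entry or by substituting a 'null'
    action such as switched-off.

    joint_action: dict mapping player id -> action.
    """
    absent = dict(joint_action)
    if absent_action is None:
        del absent[player]
    else:
        absent[player] = absent_action
    return global_reward(joint_action) - global_reward(absent)
```

For a toy sensor-coverage reward (number of distinct areas covered), a sensor duplicating a neighbour's area has marginal contribution 0, while a sensor covering an area alone has marginal contribution 1, matching the intuition that maximised marginal contributions give a (local) optimum.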

  21. Sensors – rewards • The global reward for joint action a is the total detection performance g(a) • The marginal reward for sensor i is g(a) minus the global reward with i removed • In practice a version that depends only on the actions of i's neighbours is used

  22. Local learning (loop): estimate the strategies of the other players → select the best action given those estimates and the estimated game matrix (in place of the true game matrix) → update the estimates

  23. Local learning (loop): estimate the strategies of neighbours only → select the best action given those estimates and the estimated game matrix for local interactions → update the estimates

  24. Theoretical result Theorem – If all joint actions of the local games are played infinitely often, then the beliefs follow a GWFP process. Proof sketch: The estimated game converges to the true game, so the selected strategies are ε_t-best responses with ε_t → 0.

  25. Sensing results

  26. So what?! • Play converges to a (local) optimum using only noisy information and local communication • An individual always chooses an action that maximises its expected reward given its information • If an individual doesn’t “play cricket”, the others still reach a point that is optimal conditional on that individual’s behaviour

  27. Summary • Learning the game while playing is essential • This can be accommodated within the GWFP framework • Exploiting the neighbourhood structure of marginal contributions is essential for feasibility
