
Games, Optimization, and Online Algorithms


Presentation Transcript


  1. Games, Optimization, and Online Algorithms Martin Zinkevich University of Alberta November 8th, 2006

  2. Question • Does pop culture have anything to offer advanced research projects?

  3. Fun and Games for Scientists • Fun problem (in scientist-ese) • (1) A problem which has a wide base of players at a variety of levels • (2) A problem which has aspects which provide interesting challenges for the human mind

  4. Fun and Games for Scientists • Game problem (in scientist-ese) • (1) A problem which has a formal structure (rules) with a variety of parameter settings (opponents). • (2) A problem where the world IS out to get you.

  5. Fun and Games • “Fun” can capture aspects of difficulty that are orthogonal to the size of the state space or the algorithmic complexity of the problems involved. • “Games” are environments where issues such as: • learning-to-learn can be studied amongst a variety of opponents, and • non-stationarity can be studied in the presence of other learning agents.

  6. Two Objectives of This Talk • Finding Nash equilibria • Developing “experts” a priori in games

  7. Main Point • Algorithms that learn in self-play can be utilized to generate both an equilibrium and experts. • Constraint/column generation is among these algorithms.

  8. Question in This Talk • What are interesting unbalanced strategies to consider?

  9. Outline • Introduction • Iterated Best Response • Iterated Generalized Best Response • Other Applications • Conclusion

  10. Iterated Best Response(Broken Version) • One broken idea • INIT: start with an arbitrary strategy • RESPONSE: Compute the best response • REPEAT: step 2 until satisfied
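A minimal Python sketch of this broken loop on a small matching-pennies style game (the payoff matrix is an illustrative stand-in for hide and seek, not taken from the slides); the pure best responses simply chase each other and never settle on a balanced mixture:

```python
import numpy as np

# Illustrative payoff matrix (not from the slides): the seeker (row player)
# wins 1 when both pick the same spot, the hider (column player) wins otherwise.
A = np.array([[ 1, -1],
              [-1,  1]])

def best_response_row(col_strategy):
    """Pure best response of the seeker to a hider strategy."""
    return int(np.argmax(A @ col_strategy))

def best_response_col(row_strategy):
    """Pure best response of the hider to a seeker strategy."""
    return int(np.argmin(row_strategy @ A))

row, col = 0, 0                       # INIT: arbitrary starting strategies
for step in range(6):                 # RESPONSE / REPEAT
    row = best_response_row(np.eye(2)[col])
    col = best_response_col(np.eye(2)[row])
    print(step, row, col)             # the pure responses chase each other and cycle
```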

  11. Hide and Seek (figure: hide actions shown in blue, seek actions in red)

  12. Hide and Seek (figure: seek actions in red, hide actions in blue)

  13. Problem: No Balance • There is no one killer strategy in some games. • Without adding some balance, there is no way to fully explore the space.

  14. What Games Require Balance? • Simultaneous move games • Imperfect Information Games (games with private information).

  15. Balancing Existing Strategies (figure: hide actions in blue, seek actions in red; the restricted Nash equilibrium mixes 50/50 over the existing strategies)

  16. Iterated Balanced Best Response • INIT: Start with strategies S for player 1 and T for player 2. • BALANCE: Make a bimatrix game and solve for equilibrium. • RESPONSE: Add the best responses to the equilibrium of the game to S and T. • REPEAT 2 and 3 until satisfied
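A sketch of this loop, assuming a zero-sum matrix game so that the restricted ("balanced") equilibrium can be solved with a small linear program (scipy.optimize.linprog). Enumerating a full payoff matrix M and treating strategies as row/column indices is a simplification for illustration; in the real setting the strategies are generated, and best responses are computed in the underlying game:

```python
import numpy as np
from scipy.optimize import linprog

def restricted_equilibrium(M, S, T):
    """Equilibrium of the zero-sum game M restricted to row indices S and column indices T."""
    sub = M[np.ix_(S, T)]
    m, n = sub.shape
    # Row player: maximize v subject to x^T sub[:, j] >= v for every column j, x a distribution.
    res = linprog(c=np.r_[np.zeros(m), -1.0],
                  A_ub=np.c_[-sub.T, np.ones(n)], b_ub=np.zeros(n),
                  A_eq=np.r_[np.ones(m), 0.0].reshape(1, -1), b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    x = res.x[:m]
    # Column player: minimize u subject to sub[i, :] y <= u for every row i, y a distribution.
    res = linprog(c=np.r_[np.zeros(n), 1.0],
                  A_ub=np.c_[sub, -np.ones(m)], b_ub=np.zeros(m),
                  A_eq=np.r_[np.ones(n), 0.0].reshape(1, -1), b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    y = res.x[:n]
    return x, y

def iterated_balanced_best_response(M, iterations=20):
    S, T = [0], [0]                                   # INIT: one arbitrary strategy each
    for _ in range(iterations):
        x, y = restricted_equilibrium(M, S, T)        # BALANCE
        row_mix = np.zeros(M.shape[0]); row_mix[S] = x
        col_mix = np.zeros(M.shape[1]); col_mix[T] = y
        br_row = int(np.argmax(M @ col_mix))          # RESPONSE: best responses to the
        br_col = int(np.argmin(row_mix @ M))          # restricted equilibrium mixtures
        if br_row in S and br_col in T:
            break     # nothing new to add: the restricted equilibrium is a full equilibrium
        S = sorted(set(S) | {br_row})
        T = sorted(set(T) | {br_col})
    return S, T, row_mix, col_mix

# Demo on a matching-pennies style matrix (rows maximize, columns minimize):
M = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(iterated_balanced_best_response(M))             # both players end up mixing 50/50
```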

  17. What’s The Point? • In general, equilibrium computations are significantly harder than best responses. • In practice, it is easier to compute an approximate best response than an approximate Nash equilibrium.

  18. Pure Poker • Player 1, Player 2 each receive a “card” in [0,1] (a real number) • Then, player 1 bets or checks. • If player 1 bets, player 2 calls or folds.

  19. Strategies (figure: each player's strategy as probability mass over actions as a function of the card in [0,1]; player 1 splits mass between bet and check, player 2 between call and fold)

  20. Pure Poker • Continuous state space • Given a strategy that splits [0,1] into a finite number of intervals and plays a fixed distribution in each interval, the best response is also of this form.
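A sketch of that interval representation. The slides leave the payoffs unspecified, so the ante (1 from each player), the bet size (1), and the check-goes-to-showdown rule below are assumptions, and the two strategies are arbitrary examples; the expected value is estimated by Monte Carlo rather than by exact integration over the intervals:

```python
import random

# A strategy is a list of (lo, hi, p) intervals covering [0, 1]: on a card in
# [lo, hi) the player takes the "aggressive" action (bet for player 1,
# call for player 2) with probability p.  These example strategies are arbitrary.
P1 = [(0.0, 0.1, 1.0), (0.1, 0.7, 0.0), (0.7, 1.0, 1.0)]   # bluff the bottom, bet the top
P2 = [(0.0, 0.5, 0.0), (0.5, 1.0, 1.0)]                     # call with the top half

def prob(strategy, card):
    """Probability of the aggressive action on this card."""
    for lo, hi, p in strategy:
        if lo <= card < hi:
            return p
    return 0.0

def simulate(p1, p2, hands=100_000, ante=1.0, bet=1.0):
    """Monte Carlo estimate of player 1's expected winnings per hand.
    The ante, bet size, and check-goes-to-showdown rule are assumptions."""
    total = 0.0
    for _ in range(hands):
        c1, c2 = random.random(), random.random()
        showdown = ante if c1 > c2 else -ante
        if random.random() < prob(p1, c1):          # player 1 bets
            if random.random() < prob(p2, c2):      # player 2 calls
                total += showdown + (bet if c1 > c2 else -bet)
            else:                                   # player 2 folds
                total += ante
        else:                                       # player 1 checks: showdown
            total += showdown
    return total / hands

print(simulate(P1, P2))
```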

  21. Pure Poker (figure: example interval strategies, showing bet/check regions for player 1 and call/fold regions for player 2 over the card interval)

  22. Real Poker • In one abstraction we are currently working with, each player has 625 private states and there are about 16,000 betting sequences, giving several billion states in total. While it is possible to iterate over all of these states in a short period of time, complex operations are not feasible on a problem of this size.
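A back-of-the-envelope check (the slide does not say how the figure is computed; assuming the two players' private state counts and the betting sequences multiply, the total is consistent with "several billion"):

```python
private_states = 625          # per player, in the abstraction
betting_sequences = 16_000    # approximate
# Assuming the two players' private states and the betting sequences multiply:
print(private_states * private_states * betting_sequences)   # 6,250,000,000
```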

  23. Positive Results • In under a hundred iterations, this technique can approximately solve simple variants of poker, such as Kuhn and Leduc Poker.

  24. Outline • Introduction • Iterated Best Response • Iterated Generalized Best Response • Other Applications • Conclusion

  25. Practical Problem • Although the balance-response technique above works, it can generate many strategies before an equilibrium is reached. Is there a way to cut down on this?

  26. Robustness • How do you develop a strategy that is robust assuming that your opponent will play a strategy you have already seen?

  27. Robustness: Generalized Best Response • Maximize the MINIMUM against a set of opponents

      Strat   a   b   c   Min
      A       3   1   2    1
      B       9   2  10    2
      X       3   7   5    3
      Y       5   4   4    4
      Z       7   3   1    1

  28. Robustness: Generalized Best Response • Maximize the MINIMUM against a set of opponents • The set of possible actions could be INFINITE

      Strat   a   b   c   Min
      A       3   1   2    1
      B       9   2  10    2
      X       3   7   5    3
      Y       5   4   4    4
      Z       7   3   1    1
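In code, the generalized best response restricted to this finite menu of candidates is simply the row with the largest minimum; a minimal sketch using the table's numbers:

```python
# Payoff table from the slide: rows are our candidate strategies,
# columns are the opponent strategies seen so far (a, b, c).
payoffs = {
    "A": [3, 1,  2],
    "B": [9, 2, 10],
    "X": [3, 7,  5],
    "Y": [5, 4,  4],
    "Z": [7, 3,  1],
}

worst_case = {name: min(row) for name, row in payoffs.items()}   # the Min column
best = max(worst_case, key=worst_case.get)
print(best, worst_case[best])   # Y, with a guaranteed payoff of 4
```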

  29. Iterated Generalized Best Response • Start with strategies S and T. • Add to T a generalized best response to S. • Add to S a generalized best response to T. • Repeat until satisfied.

  30. Hide and Seek HIDE ACTIONS:BLUE SEEK ACTIONS: RED

  31. How to Compute a Generalized Best Response? • Option 1: use a linear program. Could be slow, but could be arbitrarily high precision. • Option 2: use iterated best response: start with the set of strategies S and a possibly empty set T; compute a Nash equilibrium between S and T; find a best response to the mixture over S; add it to T.
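A sketch of the first option in the zero-sum matrix-game setting: maximize a guaranteed value v subject to the mixture earning at least v against every opponent in the set. In the full extensive-form game this would be written over sequence-form variables instead; the reuse of the earlier 5 x 3 table is purely illustrative:

```python
import numpy as np
from scipy.optimize import linprog

def generalized_best_response(M, opponents):
    """Mixture over the rows of M maximizing the minimum payoff against every
    column index in `opponents` (zero-sum matrix-game sketch)."""
    sub = M[:, opponents]
    m, n = sub.shape
    # Variables: mixture weights x_1..x_m and the guaranteed value v.
    # maximize v  subject to  x^T sub[:, j] >= v for each opponent j,  sum(x) = 1,  x >= 0.
    res = linprog(c=np.r_[np.zeros(m), -1.0],
                  A_ub=np.c_[-sub.T, np.ones(n)], b_ub=np.zeros(n),
                  A_eq=np.r_[np.ones(m), 0.0].reshape(1, -1), b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[:m], -res.fun

# On the 5 x 3 table from the earlier slide (rows A, B, X, Y, Z; columns a, b, c):
M = np.array([[3, 1, 2], [9, 2, 10], [3, 7, 5], [5, 4, 4], [7, 3, 1]], dtype=float)
mixture, value = generalized_best_response(M, [0, 1, 2])
print(np.round(mixture, 3), round(value, 3))
# Mixing B and X (about 0.36 / 0.64) guarantees roughly 5.18 -- better than
# the best pure-strategy guarantee of 4 from strategy Y.
```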

  32. Results in Poker • Using this technique (iterated GBR), we solved a four-round game of Texas Hold’Em • We beat Opti4 (Sparbot)! • By 0.01 small bets/hand 

  33. Other Applications • Economics (non-zero sum) • Counterstrike/RTS Games (best response not easy)

  34. Extensions • Non-zero sum games • Approximate best response operation (through reinforcement learning) • Learning the abstraction while learning the strategy

  35. Conclusions • Algorithms that learn in self-play (such as iterated generalized best response) yield a wealth of useful strategies, including approximate Nash equilibria.

  36. How Hard is a Game? • For a game to be hard, it has to be at least POSSIBLE to play it badly: otherwise, regardless of how complex it is, it is still easy. • The depth of human skill in a particular game indicates its complexity.

  37. How Hard is a Game?

  38. Formalism • If the complexity of a game is at least k, then there exist people 1 through k such that, for any two people in the list with i > j, person i can beat person j with probability at least 2/3.

  39. How Hard is a Game?

  40. Important Property: Transitivity in The List

  41. Formalism • If the complexity of a game is at least k, then there exist people 1 through k such that, for any two people in the list with i > j, person i can beat person j with probability at least 2/3.

  42. Why People?

  43. Why People? • Choose a number between 1 and 100. Highest number wins a dollar, no money is exchanged on a tie.

  44. Formalism • If the complexity of a game is at least k, then there exist strategies 1 through k such that, for any two strategies in the list with i > j, strategy i can beat strategy j with probability at least 2/3.

  45. Formalism • The epsilon-complexity of a game is at least k if there exist strategies 1 through k such that, for any two strategies with i > j, EV[i playing against j] > epsilon.
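A small sketch of what such a certificate looks like, with made-up numbers: ev[i][j] holds the expected payoff of strategy i against strategy j, and an ordered list of k strategies certifies epsilon-complexity at least k when every later strategy beats every earlier one by more than epsilon.

```python
def certifies_complexity(ev, epsilon):
    """Does this ordered list of strategies certify epsilon-complexity >= len(ev)?
    ev[i][j] is the expected payoff of strategy i when playing strategy j;
    every later strategy must beat every earlier one by more than epsilon."""
    k = len(ev)
    return all(ev[i][j] > epsilon for i in range(k) for j in range(i))

# Made-up numbers: strategy 2 beats 1 and 0, and strategy 1 beats 0.
ev = [[0.0, -0.4, -0.6],
      [0.4,  0.0, -0.3],
      [0.6,  0.3,  0.0]]
print(certifies_complexity(ev, 0.25))   # True: the list certifies complexity >= 3
```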

  46. Make it a Linear Program? • The linear program (sequence form) has numbers of constraints and variables roughly proportional to the size of the game tree. • The coefficient matrix is big, which makes inversion difficult. • There are also numerical instabilities.

  47. A Theoretical Guarantee? (No!) (figure: hide and seek counterexample, with hide actions in blue and seek actions in red)

  48. The Theoretical Problem • Each new bot is a best response to a particular mixture of the previous bots. • There could be a different mixture over those bots which would do BETTER against that new bot: in fact, it could even beat the new bot!

  49. A Theoretical Guarantee? (No!) (figure: hide and seek counterexample, with hide actions in blue and seek actions in red)

  50. A Theoretical Guarantee? (No!) (figure: hide and seek counterexample, with hide actions in blue and seek actions in red)
