
Smoothness and Learning Equilibria

Presentation Transcript


  1. Smoothness and Learning Equilibria Price of Anarchy Seminar Presenting: Aviv Kuvent Professor: Michal Feldman

  2. Reminder • PoA bound results for PNE, MNE. • However: • PNE need not always exist • MNE is hard to compute • This motivates two additional types of equilibrium: • Correlated Equilibrium (CE) • Coarse Correlated Equilibrium (CCE)

  3. Reminder - CE • Definition (CE): • A distribution σ on the set of outcomes of a cost-minimization game is a correlated equilibrium if for every player $i$, strategy $s_i \in S_i$, and every deviation $s_i' \in S_i$, $\mathbb{E}_{s\sim\sigma}\left[C_i(s)\mid s_i\right] \le \mathbb{E}_{s\sim\sigma}\left[C_i(s_i', s_{-i})\mid s_i\right]$

  4. Reminder - CCE • Definition (CCE): • A distribution σ on the set of outcomes of a cost-minimization game is a coarse correlated equilibrium if for every player $i$ and every deviation $s_i' \in S_i$, $\mathbb{E}_{s\sim\sigma}\left[C_i(s)\right] \le \mathbb{E}_{s\sim\sigma}\left[C_i(s_i', s_{-i})\right]$

  5. Reminder – Eq. Hierarchy • PNE ⊆ MNE ⊆ CE ⊆ CCE (every pure Nash equilibrium is a mixed Nash equilibrium, every MNE is a correlated equilibrium, and every CE is a coarse correlated equilibrium).

  6. Reminder • We saw that in smooth games, the PoA bounds proven for PNE/MNE also hold for CE/CCE. • CE/CCE are easy to compute: • Linear Programming • No Regret Dynamics

  7. The Game Model • Single player • A set of actions $A$, where $|A| = n$ • An adversary • $T$ time steps

  8. The Game Model • At time $t$: • The player picks a distribution $p^t$ over his action set $A$. • The adversary picks a cost vector $c^t: A \to [0,1]$ • An action $a^t$ is chosen according to the distribution $p^t$, and the player incurs the cost $c^t(a^t)$. • The player is aware of the values of the entire cost vector $c^t$ (Full Information Model) • Partial Information Model: • The player only knows the value of the incurred cost $c^t(a^t)$.

  9. Regret • The difference between the cost of our algorithm and the cost of a chosen benchmark • $OPT$ – the benchmark we choose. • Time-averaged Regret: $\frac{1}{T}\left(\sum_{t=1}^T c^t(a^t) - OPT\right)$

  10. No Regret Algorithm • An algorithm has no regret if, for every sequence of cost vectors, the expected time-averaged regret $\to 0$ as $T \to \infty$. • Equivalently, we would like to show that: $\frac{1}{T}\,\mathbb{E}\!\left[\sum_{t=1}^T c^t(a^t)\right] \le \frac{1}{T}\,OPT + o(1)$ • Goal: develop such an algorithm • Which benchmark to use?

  11. Optimal Sequence • Or "Best Action in Hindsight": $OPT = \sum_{t=1}^T \min_{a\in A} c^t(a)$ • Too strong to be a useful benchmark. • Theorem: • For any algorithm $M$ there exists a sequence of cost vectors such that $\mathbb{E}[C^T_M] \ge T\left(1-\frac{1}{n}\right)$ while the optimal sequence has cost 0, where $n$ is the size of the action set of the player.

  12. Optimal Sequence • Proof: • Build the following sequence: at time $t$, let $a$ be an action to which the algorithm's distribution $p^t$ assigns probability at most $\frac{1}{n}$; set $c^t(a) = 0$ and $c^t(a') = 1$ for every $a' \ne a$ • ⇒ Expected cost of $M$ at time $t$ is at least $1 - \frac{1}{n}$ • The cost of $M$ for $T$ time steps is at least $T\left(1-\frac{1}{n}\right)$ • An optimal sequence of cost 0 exists (play the zero-cost action at each time step). ⇒ the regret with respect to this benchmark is at least $T\left(1-\frac{1}{n}\right)$

  13. External Regret • Benchmark – the best fixed action: • Playing the same action in all time steps: $\min_{a\in A}\sum_{t=1}^T c^t(a)$ • External Regret: $\frac{1}{T}\left[\sum_{t=1}^T c^t(a^t) - \min_{a\in A}\sum_{t=1}^T c^t(a)\right]$

  14. The Greedy Algorithm • A first attempt to develop a no-external-regret algorithm • Notations: • The cumulative cost of action $a$ up to time $t$: $C^t(a) = \sum_{\tau=1}^{t} c^\tau(a)$ • The cost of the best fixed action: $OPT^T = \min_{a\in A} C^T(a)$ • The cost of the greedy algorithm: $C^T_{greedy} = \sum_{t=1}^T c^t(a^t)$

  15. The Greedy Algorithm • For simplicity, assume that $c^t(a) \in \{0,1\}$ for every action and time step. • The idea – choose the action which incurred minimal cumulative cost so far. • Greedy Algorithm: • Initially: select $a^1$ to be some arbitrary action in the action set $A$. • At time $t$: • Let $OPT^{t-1} = \min_{a\in A} C^{t-1}(a)$, $S^{t-1} = \{a \in A : C^{t-1}(a) = OPT^{t-1}\}$ • Choose $a^t = a^{t-1}$ if $a^{t-1} \in S^{t-1}$; otherwise choose an arbitrary action $a^t \in S^{t-1}$.
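
A minimal Python sketch of this greedy rule, assuming binary costs as on the slide; the function name and the representation of cost vectors as lists are illustrative, not from the slides.

```python
def greedy(n, cost_vectors):
    """Greedy: stick with the current action while it still has minimal cumulative
    cost; otherwise switch to some action of minimal cumulative cost."""
    cum_cost = [0] * n      # C^{t-1}(a) for every action a
    action = 0              # a^1: an arbitrary initial action
    total = 0               # cost incurred by the algorithm so far
    for c in cost_vectors:  # c is the cost vector c^t, with values in {0, 1}
        total += c[action]  # incur c^t(a^t)
        for a in range(n):
            cum_cost[a] += c[a]
        best = min(cum_cost)
        if cum_cost[action] != best:        # a^t dropped out of the set S^t
            action = cum_cost.index(best)   # pick some action of minimal cumulative cost
    return total
```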

  16. The Greedy Algorithm • Theorem: • For any sequence of losses, the greedy algorithm has $C^T_{greedy} \le n\cdot OPT^T + (n-1)$ • Proof: • At each time step where the greedy algorithm incurs a cost of 1 and $OPT^t$ does not increase, at least one action is removed from the set $S^t$. • This can occur at most $n-1$ times before $OPT^t$ increases by 1. • Therefore, the greedy algorithm incurs a cost of at most $n$ between successive increments of $OPT^t$ ⇒ $C^T_{greedy} \le n\cdot OPT^T + (n-1)$ • The additional $(n-1)$ is for the cost incurred in the steps since the last time $OPT^t$ increased.

  17. Deterministic Algorithms • The greedy algorithm has a cost of at most a factor of roughly $n$ larger than that of the best fixed action – it is not a no-regret algorithm. • Theorem: For any deterministic algorithm $M$, there exists a cost sequence for which $C^T_M = T$ and $OPT^T \le \frac{T}{n}$. • Corollary: There does not exist a deterministic no-regret algorithm.

  18. Deterministic Algorithms • Proof: • Fix a deterministic algorithm $M$ • $a^t$ – the action it selects at time $t$ (determined by the previous cost vectors) • Generate the following cost sequence: $c^t(a^t) = 1$ and $c^t(a) = 0$ for every $a \ne a^t$ • $M$ incurs a cost of 1 in each time step ⇒ $C^T_M = T$ • $M$ selects some action at most $\frac{T}{n}$ times; that action is the best fixed action ⇒ $OPT^T \le \frac{T}{n}$ ⇒ the time-averaged regret is at least $1 - \frac{1}{n}$, which does not depend on $T$ ⇒ $M$ is not a no-regret algorithm.

  19. Lower Bound on Regret • Theorem: • Even for $n = 2$ actions, no (randomized) algorithm has expected time-averaged regret which vanishes faster than $\Theta\!\left(1/\sqrt{T}\right)$. • Proof: • Let $M$ be a randomized algorithm. • Adversary: at each time step $t$, choose uniformly at random between two cost vectors – $(1,0)$ and $(0,1)$ • ⇒ Whatever $M$ does, its expected cost at each time step is exactly $\frac{1}{2}$, so $\mathbb{E}[C^T_M] = \frac{T}{2}$. • Think of the adversary as flipping $T$ fair coins: the number of heads has expectation $\frac{T}{2}$ and standard deviation $\frac{\sqrt{T}}{2}$.

  20. Lower Bound on Regret • Proof (cont): • ⇒ In expectation, one action has a cumulative cost of $\frac{T}{2} - \Theta(\sqrt{T})$, and the other action has a cumulative cost of $\frac{T}{2} + \Theta(\sqrt{T})$. • The first action is the best fixed action ⇒ $\mathbb{E}[OPT^T] = \frac{T}{2} - \Theta(\sqrt{T})$ • ⇒ the expected regret is $\Theta(\sqrt{T})$, i.e., the expected time-averaged regret is $\Theta\!\left(1/\sqrt{T}\right)$ • For the general case of $n$ actions, a similar argument gives a lower bound of $\Omega\!\left(\sqrt{\ln n / T}\right)$.

  21. Multiplicative Weights (MW) • The idea: • Keep a weight for each action • The probability of playing an action depends on its weight • The weights are updated over time and used to "punish" actions which incurred cost. • Notations: • $w^t(a)$ – weight of action $a$ at time $t$ • $\Gamma^t = \sum_{a\in A} w^t(a)$ – sum of the weights of the actions at time $t$ • $c^t(a)$ – cost of action $a$ at time $t$ • $\nu^t = \sum_{a\in A} p^t(a)\,c^t(a)$ – expected cost of the MW algorithm at time $t$ • $C^T_{MW} = \sum_{t=1}^T \nu^t$ – expected cost of the MW algorithm • $OPT^T$ – cost of the best fixed action

  22. Multiplicative Weights (MW) • MW Algorithm: • Initially: $t = 1$ and $w^1(a) = 1$ for each $a \in A$ • At time step $t$: • Play an action according to the distribution $p^t$, where $p^t(a) = \frac{w^t(a)}{\Gamma^t}$ • Given the cost vector $c^t$, decrease the weights: $w^{t+1}(a) = w^t(a)\cdot(1-\varepsilon)^{c^t(a)}$
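
A short Python sketch of the MW algorithm as described above (a possible implementation assuming costs in [0, 1]; variable names such as `weights` are illustrative, not from the slides).

```python
import random

def multiplicative_weights(n, cost_vectors, eps):
    """Multiplicative Weights: sample proportionally to the weights,
    then multiplicatively 'punish' actions that incurred cost."""
    weights = [1.0] * n                                  # w^1(a) = 1 for every action a
    total_cost = 0.0
    for c in cost_vectors:                               # c = c^t, with c^t(a) in [0, 1]
        probs = [w / sum(weights) for w in weights]      # p^t(a) = w^t(a) / Gamma^t
        action = random.choices(range(n), weights=probs)[0]
        total_cost += c[action]
        # w^{t+1}(a) = w^t(a) * (1 - eps)^{c^t(a)}
        weights = [w * (1 - eps) ** c[a] for a, w in enumerate(weights)]
    return total_cost
```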

  23. Multiplicative Weights (MW) • What is $\varepsilon$? • As $\varepsilon$ tends towards 0, the distribution tends towards the uniform distribution (exploration) • As $\varepsilon$ tends towards 1, the distribution tends towards a distribution which places most of the mass on the action which incurred the minimal cost so far (exploitation). • We will set a specific $\varepsilon$ later; for now assume $0 < \varepsilon \le \frac{1}{2}$

  24. Multiplicative Weights (MW) • Theorem: • MW is a no-regret algorithm, with expected (time-averaged) external regret of $O\!\left(\sqrt{\ln n / T}\right)$. • Proof: • The weight of every action (and therefore their sum, $\Gamma^t$) can only decrease with $t$. • The idea of the proof – relate both $C^T_{MW}$ and $OPT^T$ to $\Gamma^{T+1}$, and from there get the bound.

  25. Multiplicative Weights (MW) • Proof (cont): • Relate $OPT^T$ to $\Gamma^{T+1}$: • Let $a^*$ be the best fixed action, i.e. a fixed action with the low cost $OPT^T$ • (if no fixed action has low cost, low regret with respect to the fixed-action benchmark is easy to achieve anyway) • $w^{T+1}(a^*)$, the weight of action $a^*$ at time $T+1$, is $(1-\varepsilon)^{C^T(a^*)} = (1-\varepsilon)^{OPT^T}$ • ⇒ $\Gamma^{T+1} \ge w^{T+1}(a^*) = (1-\varepsilon)^{OPT^T}$

  26. Multiplicative Weights (MW) • Proof (cont): • Relate $C^T_{MW}$ to $\Gamma^{T+1}$: • At time $t$: $\Gamma^{t+1} = \sum_{a\in A} w^{t+1}(a) = \sum_{a\in A} w^t(a)(1-\varepsilon)^{c^t(a)} \le \sum_{a\in A} w^t(a)\left(1-\varepsilon\,c^t(a)\right) = \Gamma^t\left(1-\varepsilon\,\nu^t\right)$ • using $(1-\varepsilon)^x \le 1-\varepsilon x$ for $x \in [0,1]$

  27. Multiplicative Weights (MW) • Proof (cont): • ⇒ $\Gamma^{T+1} \le \Gamma^1\prod_{t=1}^T\left(1-\varepsilon\,\nu^t\right) = n\prod_{t=1}^T\left(1-\varepsilon\,\nu^t\right)$ • So: $(1-\varepsilon)^{OPT^T} \le \Gamma^{T+1} \le n\prod_{t=1}^T\left(1-\varepsilon\,\nu^t\right)$

  28. Multiplicative Weights (MW) • Proof (cont): • We got: $(1-\varepsilon)^{OPT^T} \le n\prod_{t=1}^T\left(1-\varepsilon\,\nu^t\right)$ • ⇒ taking $\ln$ of both sides: $OPT^T\ln(1-\varepsilon) \le \ln n + \sum_{t=1}^T\ln\!\left(1-\varepsilon\,\nu^t\right)$ • Taylor expansion of $\ln(1-x)$: $\ln(1-x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \cdots$ • So $\ln\!\left(1-\varepsilon\,\nu^t\right) \le -\varepsilon\,\nu^t$, and because $\varepsilon \le \frac{1}{2}$, $\ln(1-\varepsilon) \ge -\varepsilon - \varepsilon^2$

  29. Multiplicative Weights (MW) • Proof (cont): • ⇒ $-OPT^T\left(\varepsilon + \varepsilon^2\right) \le \ln n - \varepsilon\sum_{t=1}^T\nu^t = \ln n - \varepsilon\,C^T_{MW}$ • ⇒ $C^T_{MW} \le OPT^T\left(1+\varepsilon\right) + \frac{\ln n}{\varepsilon}$

  30. Multiplicative Weights (MW) • Proof (cont): • We take $\varepsilon = \sqrt{\ln n / T}$ ⇒ $\varepsilon\,OPT^T \le \varepsilon T = \sqrt{T\ln n}$ and $\frac{\ln n}{\varepsilon} = \sqrt{T\ln n}$, and get: $C^T_{MW} \le OPT^T + 2\sqrt{T\ln n}$ ⇒ the expected time-averaged external regret is at most $2\sqrt{\ln n / T}$ • This assumes that $T$ is known in advance. If it is not known, at time step $t$ take $\varepsilon_t = \sqrt{\ln n / \tau}$, where $\tau$ is the largest power of 2 smaller than $t$ (doubling trick).
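
For illustration only, one way the ε-schedule of the doubling trick might look in Python (a sketch; `eps_at` is a hypothetical helper, not part of the slides).

```python
import math

def eps_at(t, n):
    """Doubling trick: at time t (t >= 1), use eps = sqrt(ln n / tau),
    where tau is the largest power of 2 not exceeding t."""
    tau = 1 << (t.bit_length() - 1)   # largest power of 2 that is <= t
    return math.sqrt(math.log(n) / tau)
```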

  31. Multiplicative Weights (MW) • Up to the factor of 2, we have reached the lower bound on the external regret rate ($\Omega\!\left(\sqrt{\ln n / T}\right)$). • To achieve a regret of at most $\delta$, we only need $T = \frac{4\ln n}{\delta^2}$ iterations of the MW algorithm.

  32. No Regret Dynamics • Moving from a single-player setting to a multi-player setting • In each time step $t$: • Each player $i$ independently chooses a mixed strategy $p_i^t$ using some no-regret algorithm. • Each player then receives a cost vector $c_i^t$, which gives the expected cost of each of the possible actions of player $i$, given the mixed strategies played by the other players: $c_i^t(s_i) = \mathbb{E}_{s_{-i}\sim\sigma_{-i}^t}\!\left[C_i(s_i, s_{-i})\right]$, where $\sigma_{-i}^t = \prod_{j\ne i} p_j^t$
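
A hedged Python sketch of this loop for a finite cost-minimization game, in which every player runs the MW update from slide 22; `cost(i, s)` is an assumed black box returning player $i$'s cost $C_i(s)$ in [0, 1], and all names are illustrative.

```python
import itertools

def no_regret_dynamics(num_players, num_actions, cost, T, eps):
    """Each player runs MW independently; the cost vector a player sees is the
    expected cost of each of its actions against the others' mixed strategies."""
    weights = [[1.0] * num_actions for _ in range(num_players)]
    history = []                                               # the distributions sigma^t
    for _ in range(T):
        probs = [[w / sum(ws) for w in ws] for ws in weights]  # p_i^t for every player i
        history.append(probs)
        for i in range(num_players):
            others = [j for j in range(num_players) if j != i]
            c_i = [0.0] * num_actions                          # c_i^t(a) = E_{s_-i}[C_i(a, s_-i)]
            for profile in itertools.product(range(num_actions), repeat=len(others)):
                weight = 1.0
                for j, a_j in zip(others, profile):
                    weight *= probs[j][a_j]
                for a in range(num_actions):
                    s = list(profile)
                    s.insert(i, a)                             # full strategy profile (a, s_-i)
                    c_i[a] += weight * cost(i, tuple(s))
            # MW weight update for player i
            weights[i] = [w * (1 - eps) ** c_i[a] for a, w in enumerate(weights[i])]
    return history
```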

  33. No Regret Dynamics and CCE • Theorem: • Let $\sigma^1,\dots,\sigma^T$ (where $\sigma^t = \prod_i p_i^t$) be the outcome sequence generated by the no-regret dynamics after $T$ time steps. • Let $\sigma$ be the uniform distribution over the multi-set of the outcomes $\sigma^1,\dots,\sigma^T$. • Then $\sigma$ is an approximate Coarse Correlated Equilibrium (CCE). • Corollary: in smooth games, the PoA bounds apply (approximately) to no-regret dynamics.

  34. No Regret Dynamics and CCE • Proof: • Player $i$ has external regret $R_i$ (with respect to the cost vectors $c_i^1,\dots,c_i^T$). • From the definition of $\sigma$: • The expected cost of player $i$ when playing according to his no-regret algorithm: $\mathbb{E}_{s\sim\sigma}\!\left[C_i(s)\right] = \frac{1}{T}\sum_{t=1}^T\mathbb{E}_{s\sim\sigma^t}\!\left[C_i(s)\right]$ • The expected cost of player $i$ when always playing a fixed strategy $s_i'$ is: $\mathbb{E}_{s\sim\sigma}\!\left[C_i(s_i', s_{-i})\right] = \frac{1}{T}\sum_{t=1}^T\mathbb{E}_{s\sim\sigma^t}\!\left[C_i(s_i', s_{-i})\right]$

  35. No Regret Dynamics and CCE • Proof (cont): • ⇒ For any fixed action $s_i'$, the no-regret guarantee gives: $\frac{1}{T}\sum_{t=1}^T\mathbb{E}_{s\sim\sigma^t}\!\left[C_i(s)\right] \le \frac{1}{T}\sum_{t=1}^T\mathbb{E}_{s\sim\sigma^t}\!\left[C_i(s_i', s_{-i})\right] + R_i$ • ⇒ $\mathbb{E}_{s\sim\sigma}\!\left[C_i(s)\right] \le \mathbb{E}_{s\sim\sigma}\!\left[C_i(s_i', s_{-i})\right] + R_i$ • All players play no-regret algorithms: $R_i \to 0$ as $T \to \infty$, and we get the definition of a CCE.

  36. Swap Regret • Another benchmark • Define a switching function $\delta: A \to A$ on the actions • Swap Regret (with respect to $\delta$): $\frac{1}{T}\left[\sum_{t=1}^T c^t(a^t) - \sum_{t=1}^T c^t\!\left(\delta(a^t)\right)\right]$ • External regret is the special case of swap regret where we only look at the constant switching functions $\delta(a) \equiv a'$.

  37. No Swap Regret Algorithm • Definition: • An algorithm has no swap regret if, for all cost vector sequences and all switching functions $\delta$, the expected (time-averaged) swap regret $\to 0$ as $T \to \infty$

  38. Swap-External Reduction • A black-box reduction from no swap regret to no external regret • (Corollary: there exist polynomial-time no-swap-regret algorithms) • Notations: • $n$ - number of actions • $M_1,\dots,M_n$ - instantiations of a no-external-regret algorithm, one per action. • Each algorithm $M_i$ receives a cost vector and returns a probability distribution over the actions.

  39. Swap-External Reduction • Reduction: • The main algorithm $M$, at time $t$: • Receives distributions $q_1^t,\dots,q_n^t$ over the actions from the algorithms $M_1,\dots,M_n$ • Computes and outputs a consensus distribution $p^t$ • Receives a cost vector $c^t$ from the adversary • Gives algorithm $M_i$ the scaled cost vector $p^t(i)\cdot c^t$ • Goal: use the no-external-regret guarantee of the algorithms $M_1,\dots,M_n$ to get a no-swap-regret guarantee for $M$. • Still need to show how to compute $p^t$

  40. Swap-External Reduction • [Diagram: the master algorithm $M$ passes the scaled cost vectors to the subroutines $M_1, M_2, \dots, M_n$ and combines their output distributions $q_1^t,\dots,q_n^t$ into the consensus distribution $p^t$.]
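
A sketch of one time step of this reduction in Python. It assumes each subroutine object exposes `get_distribution()` and `update(cost_vector)` methods and uses the `stationary_distribution` helper sketched after slide 46; all of these names are illustrative, not from the slides.

```python
def master_step(subroutines, cost_vector):
    """One time step of the master algorithm M in the swap-to-external reduction
    (an action would be drawn from the returned distribution p^t)."""
    # 1. collect the distributions q_1^t, ..., q_n^t from the no-external-regret subroutines
    q = [m.get_distribution() for m in subroutines]    # q[i][j] = q_i^t(j)
    # 2. consensus distribution p^t with p^t(j) = sum_i p^t(i) * q_i^t(j)
    p = stationary_distribution(q)
    # 3. feed subroutine M_i the scaled cost vector p^t(i) * c^t
    for i, m in enumerate(subroutines):
        m.update([p[i] * c_j for c_j in cost_vector])
    return p
```

In an actual run $p^t$ is announced before the adversary reveals $c^t$; the sketch folds both into a single call only for brevity.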

  41. Swap-External Reduction • Theorem: • This reduction yields a no-swap-regret algorithm • Proof: • Time-averaged expected cost of $M$: $\frac{1}{T}\sum_{t=1}^T\sum_{j=1}^n p^t(j)\,c^t(j)$ • Time-averaged expected cost under some switching function $\delta$: $\frac{1}{T}\sum_{t=1}^T\sum_{j=1}^n p^t(j)\,c^t\!\left(\delta(j)\right)$

  42. Swap-External Reduction • Proof (cont): • Want to show: for all switching functions $\delta$, $\frac{1}{T}\sum_{t=1}^T\sum_{j=1}^n p^t(j)\,c^t(j) \le \frac{1}{T}\sum_{t=1}^T\sum_{j=1}^n p^t(j)\,c^t\!\left(\delta(j)\right) + R$, where $R \to 0$ when $T \to \infty$ • Time-averaged expected cost of the no-external-regret algorithm $M_i$ (on the scaled cost vectors $p^t(i)\cdot c^t$): $\frac{1}{T}\sum_{t=1}^T\sum_{j=1}^n q_i^t(j)\,p^t(i)\,c^t(j)$ • $M_i$ is a no-regret algorithm ⇒ for every fixed action $k$: $\frac{1}{T}\sum_{t=1}^T\sum_{j=1}^n q_i^t(j)\,p^t(i)\,c^t(j) \le \frac{1}{T}\sum_{t=1}^T p^t(i)\,c^t(k) + R_i$, where $R_i \to 0$ as $T \to \infty$

  43. Swap-External Reduction • Proof (cont): • Fix $\delta$. • Summing the inequality over $i = 1,\dots,n$, with $k = \delta(i)$ for each $i$: $\frac{1}{T}\sum_{t=1}^T\sum_{i=1}^n\sum_{j=1}^n p^t(i)\,q_i^t(j)\,c^t(j) \le \frac{1}{T}\sum_{t=1}^T\sum_{i=1}^n p^t(i)\,c^t\!\left(\delta(i)\right) + \sum_{i=1}^n R_i$

  44. Swap-External Reduction • Proof (cont): • For each $i$, $R_i \to 0$ as $T \to \infty$, and $n$ is fixed ⇒ $\sum_{i=1}^n R_i \to 0$ as $T \to \infty$ • If $p^t(j) = \sum_{i=1}^n p^t(i)\,q_i^t(j)$ for every $j$, then the left-hand side above is exactly the expected cost of $M$, so we've shown the desired inequality with $R = \sum_{i=1}^n R_i \to 0$ when $T \to \infty$, as needed. • Need to choose $p^t$ so that for every $j$: $p^t(j) = \sum_{i=1}^n p^t(i)\,q_i^t(j)$

  45. Swap-External Reduction • Proof (cont): • Such a $p^t$ is a stationary distribution of a Markov chain. • Build the following Markov chain: • The states – the action set $A$ • For every $i, j$, the transition probability from state $i$ to state $j$ is $q_i^t(j)$

  46. Swap-External Reduction • Proof (cont): • Every stationary distribution of this Markov chain satisfies $p^t(j) = \sum_{i=1}^n p^t(i)\,q_i^t(j)$ • This is an ergodic Markov chain • For such a chain there is at least one stationary distribution, and it can be computed in polynomial time (an eigenvector computation).
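
For illustration, a numpy sketch of that eigenvector computation (one possible way to solve $p = pQ$; the helper name matches the one used in the reduction sketch after slide 40 and is not from the slides).

```python
import numpy as np

def stationary_distribution(q):
    """Return p with p[j] = sum_i p[i] * q[i][j], i.e. a stationary distribution
    of the Markov chain whose i-th transition row is q_i^t."""
    Q = np.asarray(q, dtype=float)
    # a left eigenvector of Q for eigenvalue 1 (equivalently, an eigenvector of Q^T)
    eigvals, eigvecs = np.linalg.eig(Q.T)
    idx = np.argmin(np.abs(eigvals - 1.0))
    p = np.abs(np.real(eigvecs[:, idx]))
    return p / p.sum()   # normalize to a probability distribution
```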

  47. Swap Regret Bound • We have shown a no-swap-regret reduction, with swap regret at most $\sum_{i=1}^n R_i$ • If we take MW as the algorithms $M_i$, we get a swap-regret bound of $O\!\left(n\sqrt{\ln n / T}\right)$ (each $R_i$ is bounded by $O\!\left(\sqrt{\ln n / T}\right)$) • We can improve this: • In the MW analysis, the actual bound we found was $C^T_{MW} \le OPT^T + \varepsilon\,OPT^T + \frac{\ln n}{\varepsilon}$, and we just bounded $OPT^T$ by $T$. • If we take the original bound (tuning $\varepsilon$ to $OPT^T$ instead of $T$), each instance $M_i$ has regret at most $2\sqrt{OPT_i\ln n}$, where $OPT_i$ is the cost of the best fixed action on the scaled cost vectors given to $M_i$:

  48. Swap Regret Bound • The cost vector we gave to $M_i$ at time $t$ was scaled by $p^t(i)$ ⇒ $OPT_i \le \sum_{t=1}^T p^t(i)$, so $\sum_{i=1}^n OPT_i \le \sum_{t=1}^T\sum_{i=1}^n p^t(i) = T$ • The square root function is concave ⇒ Jensen's inequality: $\frac{1}{n}\sum_{i=1}^n\sqrt{OPT_i} \le \sqrt{\frac{1}{n}\sum_{i=1}^n OPT_i}$ • ⇒ $\sum_{i=1}^n\sqrt{OPT_i} \le \sqrt{n\sum_{i=1}^n OPT_i} \le \sqrt{nT}$

  49. Swap Regret Bound • ⇒ $\sum_{i=1}^n 2\sqrt{OPT_i\ln n} \le 2\sqrt{nT\ln n}$ • We get the (time-averaged) swap-regret bound of $\frac{2\sqrt{nT\ln n}}{T} = 2\sqrt{\frac{n\ln n}{T}} = O\!\left(\sqrt{n\ln n / T}\right)$

  50. No Swap Regret Dynamics and CE • Equivalent definition of CE using switching functions: • A distribution σ on the set of outcomes of a cost-minimization game is a correlated equilibrium if for every player $i$ and every switching function $\delta: S_i \to S_i$, $\mathbb{E}_{s\sim\sigma}\!\left[C_i(s)\right] \le \mathbb{E}_{s\sim\sigma}\!\left[C_i\!\left(\delta(s_i), s_{-i}\right)\right]$
