
Multi Armed Bandits






Presentation Transcript


  1. Multi Armed Bandits chalpert@meetup.com

  2. Survey

  3. Click Here

  4. Click-through Rate (Clicks / Impressions) 20% Click Here

  5. Click Here Click Here

  6. Click-through Rate 20% ? Click Here Click Here

  7. AB Test • Randomized Controlled Experiment • Show each button to 50% of users Click-through Rate 20% ? Click Here Click Here

  8. AB Test Timeline: an Exploration Phase (testing) during the test, then an Exploitation Phase (show the winner) after the test.

  9. Click-through Rate 20% ? Click Here Click Here

  10. Click-through Rate 20% 30% Click Here Click Here

  11. 10,000 impressions/month • Need 4,000 clicks by EOM • 30% CTR won’t be enough

  12. Need to keep testing (Exploration)

  13. ABCDEFG... Test: each variant is assigned with probability 1/N, where N = # of variants.

  14. Not everyone is a winner

  15. ABCDEFG... Test (revisited): each variant is assigned with probability 1/N, where N = # of variants.

  16. Need to keep testing (Exploration) Need to minimize regret (Exploitation)

  17. Multi Armed Bandit Balance of Exploitation & Exploration

  18. Bandit Algorithm Balances Exploitation & Exploration. AB Test: discrete exploration and exploitation phases (before test / during test / after test). Multi Armed Bandit: continuous exploitation and exploration, with the bandit favoring the winning arm over time.

  19. Bandit Algorithm Reduces Risk of Testing. AB Test: best arm exploited with probability 1/N, so more arms means less exploitation. Bandit: best arm exploited with a determined probability, reducing exposure to suboptimal arms.

  20. Demo Borrowed from Probabilistic Programming & Bayesian Methods for Hackers

  21. Split test: still sending the losers. Bandit: the winner breaks away! An AB test would have cost 4.3 percentage points.

  22. How it works: Epsilon Greedy Algorithm. ε = probability of exploration. At the start of each round: with probability ε, explore (each arm shown with probability ε/N); with probability 1 - ε, exploit (show the best arm). Epsilon Greedy with ε = 1 is equivalent to an AB test.
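The round logic above can be sketched in a few lines of Python (a minimal illustration, not the talk's actual code; arms are tracked as simple success/impression counts):

```python
import random

def epsilon_greedy(counts, successes, epsilon=0.1):
    """Pick an arm index: explore with probability epsilon, else exploit."""
    n_arms = len(counts)
    if random.random() < epsilon:
        # Exploration: choose uniformly, so each arm has probability epsilon/N.
        return random.randrange(n_arms)
    # Exploitation: choose the arm with the highest observed success rate.
    rates = [s / c if c > 0 else 0.0 for s, c in zip(successes, counts)]
    return max(range(n_arms), key=lambda i: rates[i])

def update(counts, successes, arm, reward):
    """Record the observed result (reward is 1 for a click, 0 otherwise)."""
    counts[arm] += 1
    successes[arm] += reward
```

Setting `epsilon=1.0` makes every round an exploration round, which is exactly the uniform AB test from the slide; `epsilon=0.0` always shows the current best arm.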

  23. Epsilon Greedy Issues • Constant epsilon: initially under-exploring, later over-exploring • Better if the probability of exploration decreases with sample size (annealing) • Makes no use of prior knowledge

  24. Some Alternatives • Epsilon-First • Epsilon-Decreasing • Softmax • UCB (UCB1, UCB2) • Bayesian-UCB • Thompson Sampling (Bayesian Bandits)
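As one concrete example from the list, UCB1 has a simple closed form: after playing each arm once, pick the arm maximizing its empirical mean plus an exploration bonus of sqrt(2 ln t / n_i). A sketch (an illustration of the published formula, not code from the talk):

```python
import math

def ucb1(counts, successes):
    """UCB1 arm selection: empirical mean + sqrt(2 * ln(total) / n_i)."""
    total = sum(counts)
    # Play every arm once before applying the formula.
    for i, c in enumerate(counts):
        if c == 0:
            return i

    def score(i):
        mean = successes[i] / counts[i]
        bonus = math.sqrt(2 * math.log(total) / counts[i])
        return mean + bonus

    return max(range(len(counts)), key=score)
```

Unlike epsilon-greedy, UCB1 is deterministic given the counts: less-played arms get a larger bonus, so exploration falls off automatically as data accumulates.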

  25. Bandit Algorithm Comparison Regret:
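The regret formula on this slide did not survive transcription; the standard definition of cumulative regret over T rounds (a reconstruction, likely what the comparison used) is:

```latex
R_T = T\mu^* - \sum_{t=1}^{T} \mu_{a_t}
```

where μ* is the expected reward of the best arm and μ_{a_t} is the expected reward of the arm pulled at round t. A good bandit algorithm keeps R_T growing sublinearly in T.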

  26. Thompson Sampling Setup: assign each arm a Beta distribution with parameters (α, β) = (# successes, # failures).

  27. Thompson Sampling Setup: initialize priors with the ignorant state Beta(1,1) (the uniform distribution), or initialize with an informed prior to aid convergence.

  28. Thompson Sampling. For each round: 1: Sample a random variable X from each arm's Beta distribution. 2: Select the arm with the largest X. 3: Observe the result of the selected arm. 4: Update the prior Beta distribution for the selected arm. Example: samples X = 0.7, 0.2, 0.4 from three Beta(1,1) arms; the first arm is selected and the pull is a success.

  29. Thompson Sampling (same steps). After the success, the selected arm's distribution is updated to Beta(2,1); the others remain Beta(1,1).

  30. Thompson Sampling (same steps). Next round: samples X = 0.4, 0.8, 0.2 from Beta(2,1), Beta(1,1), Beta(1,1); the second arm is selected and the pull is a failure.

  31. Thompson Sampling (same steps). After the failure, the second arm's distribution is updated to Beta(1,2), leaving Beta(2,1), Beta(1,2), Beta(1,1).
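The four-step loop above translates almost directly to Python (a sketch assuming Bernoulli click/no-click rewards; `reward_fn` is a hypothetical stand-in for observing the real outcome):

```python
import random

def thompson_round(alphas, betas, reward_fn):
    """One round of Thompson Sampling over Beta(alpha, beta) arms."""
    # 1: Sample X from each arm's Beta distribution.
    samples = [random.betavariate(a, b) for a, b in zip(alphas, betas)]
    # 2: Select the arm with the largest X.
    arm = max(range(len(samples)), key=lambda i: samples[i])
    # 3: Observe the result of the selected arm.
    reward = reward_fn(arm)
    # 4: Update the selected arm's Beta distribution.
    if reward:
        alphas[arm] += 1  # success
    else:
        betas[arm] += 1   # failure
    return arm, reward
```

Starting from the ignorant priors on slide 27 (`alphas = betas = [1, 1, 1]`) and looping this round function, pulls concentrate on the arm with the highest true success rate as its posterior sharpens.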

  32. Posterior after 100k pulls (30 arms)

  33. Bandits at Meetup

  34. Meetup’s First Bandit

  35. 76 Arms. Control: "Welcome To Meetup!" - 60% Open Rate. Winner: "Hi" - 75% Open Rate (+25%).



  38. Avoid Linkbaity Subject Lines

  39. Coupon Email, 16 Arms. Control: "Save 50%, start your Meetup Group" - 42% Open Rate. Winner: "Here is a coupon" - 53% Open Rate (+26%).

  40. 398 Arms

  41. 210% Click-through Difference. Best: "Looking to start the perfect Meetup for you? We'll help you find just the right people" and "Start the perfect Meetup for you! We'll help you find just the right people". Worst: "Launch your own Meetup in January and save 50%" and "Start the perfect Meetup for you. 50% off promotion ends February 1st."

  42. Choose the Right Metric of Success • Success tied to click in last experiment • Sale end & discount messaging had bad results • Perhaps people don’t know that hosting a Meetup costs $$$? • Better to tie success to group creation

  43. More Issues • Email open & click delay • New subject line effect • Problem when testing notifications • Monitor success trends to detect weirdness

  44. Seasonality • Thompson Sampling should naturally adapt to seasonal changes • A learning rate can be added for faster adaptation
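The talk doesn't specify how the learning rate is applied; one common approach (an assumption here, not Meetup's confirmed method) is to geometrically discount each arm's Beta parameters so old observations fade and the posterior can track seasonal shifts:

```python
def decay_posteriors(alphas, betas, rate=0.99, floor=1.0):
    """Discount Beta parameters toward the Beta(1,1) prior each round.

    rate < 1 shrinks the effective sample size, so recent observations
    dominate; floor keeps each parameter at or above the uniform prior.
    """
    for i in range(len(alphas)):
        alphas[i] = max(floor, alphas[i] * rate)
        betas[i] = max(floor, betas[i] * rate)
```

Called once per round before the Thompson update, a rate of 0.99 caps the effective history at roughly 1/(1 - 0.99) = 100 pulls per arm.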

  45. Bandit or Split Test? AB Test good for: biased tests, complicated tests. Bandit good for: unbiased tests, many variants, time constraints, set-it-and-forget-it.

  46. Thanks! chalpert@meetup.com
