
CS534 Spring 2019 Adversarial Search, Game Playing

Presentation Transcript


  1. CS534 Spring 2019 Adversarial Search, Game Playing Showcase by: Varun Bhat, Ruofan Hu, Jiayi Li, Justin Seeley, and Matthew Szpunar Showcasing work by: David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel & Demis Hassabis on AlphaGo

2. References and Resources
• Chen, Jim X. "The Evolution of Computing: AlphaGo." Computing in Science & Engineering 18.4 (2016): 4–7.
• Silver, David, et al. "Mastering the Game of Go with Deep Neural Networks and Tree Search." Nature 529.7587 (2016): 484–489.
• GeeksforGeeks. "What is Reinforcement Learning?" (2019). https://www.geeksforgeeks.org/what-is-reinforcement-learning/ [Accessed 3 Feb. 2019].
• Gibney, Elizabeth. "What Google's Winning Go Algorithm Will Do Next." Nature 531.7594 (2016): 284–285.
• Bae, Jonghoon, et al. "Social Networks and Inference About Unknown Events: A Case of the Match Between Google's AlphaGo and Sedol Lee." PLoS ONE 12.2 (2017): e0171472.
• Langford, John. "AlphaGo Is Not the Solution to AI." Blog@CACM 59.6 (2016).
• Sensei's Library. "Number of Possible Go Games." (2019). https://senseis.xmp.net/?NumberOfPossibleGoGames [Accessed 3 Feb. 2019].
• Shannon, Claude E. "XXII. Programming a Computer for Playing Chess." The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 41.314 (1950): 256–275.
• Hui, Jonathan. "AlphaGo: How It Works Technically." Medium (2019). https://medium.com/@jonathan_hui/alphago-how-it-works-technically-26ddcc085319 [Accessed 5 Feb. 2019].

3. Outline
• AlphaGo Summary/Introduction: What is AlphaGo? Differences between AlphaGo and other Go AI
• AlphaGo Algorithms: Monte Carlo Tree Search, Convolutional Neural Networks, Reinforcement Learning

4. AlphaGo Introduction: What is AlphaGo? Differences between AlphaGo and other Go AI

5. What is AlphaGo? (Game of Go in progress)
• An AI that plays the game of Go, a territory-control game with roughly 2 × 10^170 possible game states
• Go is played on a 19×19 board by two players (Black and White), who take turns placing stones
• The winner is the player who controls more space on the board and captures more pieces
Image: https://www.mastersofgames.com/images/orientalboard/go-table-board-pay.jpg

6.–7. What is AlphaGo? (Position for Black to capture)
• Same game overview as above, plus capturing: to capture a stone, surround it on all sides so it has no empty adjacent points
Image: https://online-go.com/puzzle/14036

8. What is AlphaGo?
• Success in 2016 against Lee Sedol, the 2nd-highest-ranked Go player at the time
• Won the series 4–1
Screenshot of AlphaGo playing Lee Sedol (2016): https://s3-ap-south-1.amazonaws.com/av-blog-media/wp-content/uploads/2017/01/09112900/alphago-vs-lee-sedol-2_w_600.jpg

9. AlphaGo vs. other Go AI (figure: various Go AIs vs. skill ranking, Elo)
• The strongest existing Go AIs all rely on Monte Carlo Tree Search (MCTS).
• AlphaGo, however, makes extensive use of machine learning to avoid hand-crafted rules.
Figure from "Mastering the Game of Go with Deep Neural Networks and Tree Search" (2016)

10. AlphaGo Algorithms: Monte Carlo Tree Search, Convolutional Neural Networks, Reinforcement Learning

11. Monte Carlo Tree Search (MCTS)
• Go has up to 361 possible actions per turn (one per intersection of the 19×19 board)
• Typical rules allow 30 s to 1 min per move, so a time-efficient algorithm is needed
• Problems with minimax:
  • It takes far too long if an optimal solution is sought
  • It relies on an evaluation function and heuristics to prune the minimax tree when a full search would take too much time
• AlphaGo uses Monte Carlo Tree Search instead:
  • It can be interrupted at any time and return the best move found so far (see the sketch below)
  • It needs no explicit evaluation function, so it can be used in games without a well-developed theory
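To illustrate the anytime property described above, a minimal MCTS driver loop might look like the following sketch. The Node class and the select, expand, simulate, and backpropagate helpers are hypothetical placeholders, not AlphaGo's implementation.

```python
import time

def mcts_best_move(root, time_budget_s=30.0):
    """Run MCTS iterations until the time budget expires, then return
    the move of the root child visited most often so far (anytime property)."""
    deadline = time.time() + time_budget_s
    while time.time() < deadline:
        leaf = select(root)            # walk down the tree via the tree policy
        child = expand(leaf)           # add an unexplored child, if any
        outcome = simulate(child)      # play out the game with the default policy
        backpropagate(child, outcome)  # update visit counts / values up the path
    return max(root.children, key=lambda c: c.visits).move
```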

12.–15. How to make MCTS work for Go? (figure slides; pictures from http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf)

16. How to make MCTS work for Go? (picture from https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa)
• Two ideas make MCTS work for Go:
  • Idea 1: a value function to truncate the tree -> shallower MCTS
  • Idea 2: better tree & default policies -> smarter MCTS
• Value function: the expected future reward from a given board position, assuming we play (near-)perfectly from that point on
• Tree policy: selects which part of the search tree to expand
• Default policy: determines how simulations are run
(A small sketch of a value-truncated rollout follows below.)
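To make Idea 1 concrete, a rollout can be cut off after a bounded number of moves and the rest of the game replaced by a value-function estimate. This is only an illustrative sketch; is_terminal, play, final_score, value_fn, and default_policy are hypothetical helpers.

```python
MAX_ROLLOUT_DEPTH = 20  # truncate instead of playing every game to the end

def truncated_rollout(state, default_policy, value_fn):
    """Simulate with the default policy for a bounded number of moves,
    then fall back on the value function to score the resulting position."""
    depth = 0
    while not is_terminal(state) and depth < MAX_ROLLOUT_DEPTH:
        state = play(state, default_policy(state))  # cheap, fast policy
        depth += 1
    if is_terminal(state):
        return final_score(state)   # exact outcome if the game actually ended
    return value_fn(state)          # learned estimate for the truncated tail
```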

17. How to make MCTS work for Go? In AlphaGo:
• Both ideas are used to improve MCTS
• Two resources are available for training: expert data and a simulator (self-play)

18. How to make MCTS work for Go? Main idea in AlphaGo: train convolutional neural networks to obtain better policies and value functions.

19. Large search tree (image: https://i0.wp.com/erickimphotography.com/blog/wp-content/uploads/2018/09/alphago-netflix-documentary-4.png?w=1600)

20. Convolutional Neural Networks
• A network takes the current game state and produces a smaller subset of promising actions, reducing the search space
• Policy network: predicts what the next move will be
• Value network: estimates how good the resulting position is (the expected outcome)
(An illustrative network sketch follows below.)
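AlphaGo actually trains separate, much deeper policy and value networks over many hand-built feature planes; the toy sketch below merely illustrates the idea of convolutional policy and value outputs over a 19×19 board. It assumes PyTorch and a single-plane board encoding, neither of which is specified in the slides.

```python
import torch.nn as nn

BOARD = 19  # 19x19 Go board

class TinyPolicyValueNet(nn.Module):
    """Toy network with a policy output (one logit per intersection)
    and a scalar value output in [-1, 1]. Not AlphaGo's architecture."""
    def __init__(self, channels=32):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.policy_head = nn.Conv2d(channels, 1, kernel_size=1)
        self.value_head = nn.Sequential(
            nn.Flatten(), nn.Linear(channels * BOARD * BOARD, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh(),
        )

    def forward(self, board):                            # board: (N, 1, 19, 19)
        h = self.trunk(board)
        policy_logits = self.policy_head(h).flatten(1)   # (N, 361) move logits
        value = self.value_head(h)                       # (N, 1) position value
        return policy_logits, value
```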

21. Policy network (supervised learning): reducing "action candidates" (breadth reduction)
• Trained by supervised learning on a database of human professional games
• The output is a probability for each possible legal move (a training sketch follows below)
https://www.slideshare.net/ShaneSeungwhanMoon/how-alphago-works
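A single supervised-learning step for such a policy amounts to cross-entropy against the move the human expert actually played. The sketch below assumes the toy network above and PyTorch; it is not the paper's training pipeline.

```python
import torch.nn.functional as F

def sl_policy_step(net, optimizer, boards, expert_moves):
    """One supervised step: increase the probability of the expert's move.
    `expert_moves` holds move indices in 0..360 for each board in the batch."""
    policy_logits, _ = net(boards)                        # (N, 361) logits
    loss = F.cross_entropy(policy_logits, expert_moves)   # -log p(expert move)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```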

22. Value network: position evaluation ahead of time (depth reduction)
• There is no need to simulate to maximum depth if there is a function V(s) that measures the "board evaluation of state s", i.e. the expected outcome from state s
https://www.slideshare.net/ShaneSeungwhanMoon/how-alphago-works
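In the Silver et al. (2016) paper, a leaf position in the search tree is scored by mixing the value-network estimate with the outcome of a fast rollout; the small function below transcribes that mixing formula (the default lambda of 0.5 is the value the paper reports working well).

```python
def leaf_evaluation(value_net_estimate, rollout_outcome, lam=0.5):
    """V(s_L) = (1 - lambda) * v_theta(s_L) + lambda * z_L
    (leaf evaluation used in AlphaGo's search, Silver et al. 2016)."""
    return (1.0 - lam) * value_net_estimate + lam * rollout_outcome
```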

23. Reinforcement learning
• AlphaGo collects roughly 30 million board positions from human games.
• It then uses these positions to train a fast rollout policy, which quickly estimates the probability of the moves an expert human might make (a schematic sketch follows below).
Picture taken from medium.com
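The rollout policy in the paper is a simple linear softmax over cheap, hand-crafted pattern features, which makes it fast enough for playouts; the sketch below shows that shape, with the feature extraction left as a hypothetical placeholder.

```python
import numpy as np

def rollout_policy_probs(weights, candidate_moves, features_of):
    """Linear softmax over cheap features of each legal move.
    `features_of(move)` returning a feature vector is a hypothetical helper."""
    logits = np.array([weights @ features_of(m) for m in candidate_moves])
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()                # probability of each candidate move
```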

  24. Reinforcement learning - training picture taken from medium.com

  25. Thank you! Questions

  26. Backup Slides

27. How does Reinforcement Learning work?
• Input: an initial state from which the model starts.
• Output: there are many possible outputs, since there are many possible solutions to a given problem.
• Training: based on the input, the model returns a state/action and is rewarded or punished according to that output; the model keeps learning, and the best solution is the one with the maximum reward (a minimal tabular example follows below).
Picture taken from GeeksforGeeks
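As a generic illustration of this reward-driven loop (not AlphaGo's actual method, which trains its policy network by policy-gradient learning on self-play games), a minimal tabular Q-learning routine might look like the following; the `env` interface (reset, step, actions) is a hypothetical assumption.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.99, eps=0.1):
    """Tiny tabular Q-learning loop. `env` is assumed to expose reset() -> state,
    step(action) -> (next_state, reward, done), and env.actions (legal actions)."""
    Q = defaultdict(float)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
            if random.random() < eps:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            best_next = max(Q[(next_state, a)] for a in env.actions)
            # Reward (or punishment) nudges the estimate toward higher long-term return.
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```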

  28. Go Positions according to policies

29. How to make MCTS work for Go? (picture from https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa)
• For each move:
  ○ Work within the time constraint
  ○ Deepen/build the MCTS search tree
  ○ Select the best move found and keep only the corresponding subtree for the next turn

30. How does Monte Carlo Tree Search work? Loop: Select, Expand, Explore (simulate), Update
• An application of the bandit-based method.
• Two fundamental concepts:
  ○ The true value of any action can be approximated by running several random simulations.
  ○ These values can be used efficiently to adjust the policy (strategy) toward a best-first strategy.
• A partial game tree is built before each move, and then a move is selected.
  ○ Moves are explored and their values are updated/estimated.
Picture from https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa

31. How does Monte Carlo Tree Search work? Selecting
• Select the node in the tree that has the highest probability of winning.
• Example: among children with win records 2/3, 0/1, and 1/2 under the root node (4/6), the node with 2/3 has the highest win rate (a selection sketch follows below).
Picture from https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa
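In practice, selection usually balances this win rate against an exploration bonus; UCB1 is the standard bandit-based rule, shown below as a generic sketch (not AlphaGo's exact selection formula, which also mixes in the policy network's prior).

```python
import math

def ucb1(wins, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: win rate plus an exploration bonus for rarely visited nodes."""
    if visits == 0:
        return float("inf")   # always try unvisited children first
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

# The slide's example: children with records 2/3, 0/1 and 1/2 under a parent with 6 visits.
children = [(2, 3), (0, 1), (1, 2)]
scores = [ucb1(w, n, parent_visits=6) for w, n in children]
print(scores)
# With the exploration bonus the rarely visited 0/1 child scores highest here;
# the slide's pure win-rate view would instead pick the 2/3 child.
```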

32. How does Monte Carlo Tree Search work? Expanding
• After selecting the right node, expanding is used to increase the options further in the game: the selected node (2/3) is expanded by creating child nodes (only one child node in this case).
• These child nodes are the future moves that can be played in the game.
Picture from https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa

33. How does Monte Carlo Tree Search work? Simulating | Exploring
• Since the best child is unknown, we need to find the best-performing move that isn't a dead state.
• To do that, random playouts (the reinforcement-learning part) are run further down the game from each child node. A value is assigned to each child node by measuring how close the outcome of its playout came to the result needed to win the game.
Picture from https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa

34. How does Monte Carlo Tree Search work? Updating | Back-propagation
• The total scores (and visit counts) of the parent nodes are updated by going back up the tree one node at a time.
• The newly updated scores change the state of the tree and may change which node the next selection step picks.
• The loop then begins again using the newly updated statistics (a back-propagation sketch follows below).
Picture from https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa
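A minimal back-propagation step might look like the sketch below; the Node fields visits, wins, and parent are assumed for illustration and are not defined in the slides.

```python
def backpropagate(node, outcome):
    """Walk from the simulated leaf back up to the root, updating statistics.
    `outcome` is 1 for a win and 0 for a loss from the root player's view;
    in a two-player game the outcome is typically flipped at alternating levels."""
    while node is not None:
        node.visits += 1
        node.wins += outcome
        node = node.parent
```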

35. How does Monte Carlo Tree Search work? Loop: Select, Expand, Explore, Update
• Instead of brute-forcing through millions of possible paths, the Monte Carlo Tree Search algorithm chooses the best available move from the current state of the game tree with the help of reinforcement learning.
Picture from https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa

36. How does Monte Carlo Tree Search work? Selecting
• Start at the root node
• Select a child based on the tree policy
• Apply recursively, descending through the tree
• Stop when an expandable node is reached
• Expandable: a node that is non-terminal and has unexplored children
Picture from https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa

37. How does Monte Carlo Tree Search work? Expanding
• Add one or more child nodes to the tree
• Which children can be added depends on the actions available from the current position
• How this is done depends on the tree policy
Picture from https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa

38. How does Monte Carlo Tree Search work? Simulating | Exploring
• Runs a simulation of the path that was selected
• Gets the position at the end of the simulation
• The default policy determines how the simulation is run (a plain-rollout sketch follows below)
• The board outcome determines the value
Picture from https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa
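A plain default-policy simulation, i.e. a uniformly random playout to the end of the game, could look like this sketch; is_terminal, legal_moves, play, and final_score are hypothetical helpers, as in the earlier sketches.

```python
import random

def simulate(state):
    """Default policy: play uniformly random legal moves until the game ends,
    then return the final board outcome (e.g. 1 = win, 0 = loss)."""
    while not is_terminal(state):
        state = play(state, random.choice(legal_moves(state)))
    return final_score(state)
```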

39. How does Monte Carlo Tree Search work? Updating | Back-propagation
• Moves backward through the saved path
• The value of a node represents the benefit of going down that path from its parent
• Values are updated according to the board outcome, i.e. how the simulated game ends
Picture from https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa

40. Increased proficiency with more computing power (figure: Elo ranking)

  41. Convolutional Neural Networks
