
CEC 2006 Othello Competition



Presentation Transcript


  1. CEC 2006 Othello Competition Simon M. Lucas Computer Science Dept, University of Essex Thomas P. Runarsson Science Institute, University of Iceland

  2. Motivation
  • Othello is an interesting unsolved game
  • Good test-bed for CI-Games research
  • Objective: find the best position evaluation function
  • Challenges for the competition:
    • Which architecture works best?
    • Which is the best way to train such an architecture?
    • E.g. temporal difference learning, co-evolution, or …

  3. TDL versus CEL
  • Which method works best for learning a game strategy?
  • Which method learns fastest?
  • Which method ultimately achieves stronger play?
  • Standard CEL uses only game results
  • TDL also exploits information available during the game

  4. The setup
  • Each game is played as follows:
    • All legal next board positions are generated
    • The position evaluation function of the player to move is used to evaluate each position
    • The move that leads to the most favourable position for that player is chosen
  • i.e. 1-ply lookahead
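The 1-ply greedy move selection above can be sketched as follows. The `legal_moves`, `apply_move`, and `evaluate` arguments are hypothetical stand-ins for an Othello engine's API, not the competition's actual code:

```python
def greedy_move(board, player, legal_moves, apply_move, evaluate):
    """1-ply lookahead: generate every legal successor position and
    return the move whose resulting board scores highest under the
    player's own evaluation function."""
    best_score, best_move = float("-inf"), None
    for move in legal_moves(board, player):
        next_board = apply_move(board, player, move)
        score = evaluate(next_board, player)
        if score > best_score:
            best_score, best_move = score, move
    return best_move
```

Because the search is only one ply deep, playing strength rests almost entirely on the quality of `evaluate`, which is exactly what the competition set out to compare.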

  5. Motivation
  • Focus on machine learning rather than game-tree search
  • Force random moves (with probability 0.1, 0.01 or 0.0)
  • This gives a more robust evaluation of playing ability
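The forced-random-move rule can be sketched as a thin wrapper around any move-selection policy. The function arguments here are hypothetical stand-ins:

```python
import random

def move_with_noise(board, player, eps, legal_moves, best_move_fn):
    """With probability eps, play a uniformly random legal move;
    otherwise play the evaluator's preferred move. The competition
    used eps = 0.1, 0.01 and 0.0 so that deterministic 1-ply players
    do not simply replay one fixed game against each other."""
    if random.random() < eps:
        return random.choice(legal_moves(board, player))
    return best_move_fn(board, player)
```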

  6. The Game

  7. Volatile Piece Difference (board diagrams omitted)

  8. Standard “Heuristic” Weights (lighter = more advantageous)
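The heuristic on this slide is a weighted piece counter (WPC): each of the 64 squares has a fixed weight, and a position's value is the weight-masked piece difference. The weight values below are illustrative placeholders (corners prized, corner-adjacent squares penalised), not the actual weights shown on the slide:

```python
import numpy as np

# Hypothetical WPC weight matrix -- symmetric, corner-heavy, but NOT
# the slide's actual values.
WEIGHTS = np.array([
    [ 1.00, -0.25, 0.10, 0.05, 0.05, 0.10, -0.25,  1.00],
    [-0.25, -0.25, 0.01, 0.01, 0.01, 0.01, -0.25, -0.25],
    [ 0.10,  0.01, 0.05, 0.02, 0.02, 0.05,  0.01,  0.10],
    [ 0.05,  0.01, 0.02, 0.01, 0.01, 0.02,  0.01,  0.05],
    [ 0.05,  0.01, 0.02, 0.01, 0.01, 0.02,  0.01,  0.05],
    [ 0.10,  0.01, 0.05, 0.02, 0.02, 0.05,  0.01,  0.10],
    [-0.25, -0.25, 0.01, 0.01, 0.01, 0.01, -0.25, -0.25],
    [ 1.00, -0.25, 0.10, 0.05, 0.05, 0.10, -0.25,  1.00],
])

def wpc_evaluate(board):
    """board: 8x8 array with +1 for the player's discs, -1 for the
    opponent's, 0 for empty squares. Returns the weighted sum."""
    return float(np.sum(WEIGHTS * board))
```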

  9. Learned Weights

  10. Web-based League (CEC 2006 Competition)

  11. Random move prob = 0.0

  Rank  Played  Won  Drawn  Lost  Name
  0     22      20   0      2     kjkim-mlp-3
  1     22      17   1      4     AleZ V
  2     22      17   0      5     NButtBradford1b
  3     22      14   2      6     mlp-again2
  4     22      13   2      7     delete-me-cel-1-10
  5     22      13   1      8     brookdale4
  6     22      8    2      12    tomy0
  7     22      7    1      14    fedevadeculo
  8     22      5    0      17    last weebl
  9     22      5    1      16    jesz3
  10    22      5    2      15    Jorge
  11    22      1    2      19    tpr-tdl-01-500000

  12. Random move prob = 0.01

  Rank  Played  Won  Drawn  Lost  Name
  0     220     181  4      35    kjkim-mlp-3
  1     220     170  7      43    AleZ V
  2     220     161  12     47    mlp-again2
  3     220     157  5      58    NButtBradford1b
  4     220     138  10     72    brookdale4
  5     220     137  17     66    delete-me-cel-1-10
  6     220     73   7      140   fedevadeculo
  7     220     70   14     136   tomy0
  8     220     58   17     145   Jorge
  9     220     55   7      158   jesz3
  10    220     46   2      172   last weebl
  11    220     17   12     191   tpr-tdl-01-500000

  13. Random move prob = 0.1

  Rank  Played  Won  Drawn  Lost  Name
  0     220     163  1      56    kjkim-mlp-3
  1     220     161  4      55    mlp-again2
  2     220     158  3      59    AleZ V
  3     220     153  9      58    brookdale4
  4     220     150  6      64    delete-me-cel-1-10
  5     220     147  5      68    NButtBradford1b
  6     220     73   7      140   fedevadeculo
  7     220     71   5      144   Jorge
  8     220     68   4      148   jesz3
  9     220     67   3      150   tomy0
  10    220     58   6      156   last weebl
  11    220     21   7      192   tpr-tdl-01-500000

  14. Winner: Kyung-Joong Kim, Yonsei University, Seoul
  • Approach:
    • Used a GA
    • Initialised the population with the 64 : 32 : 1 MLP supplied as a sample by the organisers
    • Then used the GA to evolve it (100 generations of 100 MLPs)
  • Interesting: not significantly better against the standard heuristic player (with eps = 0.1)
  • But better on average against a wider range of players
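The winning evaluator's 64 : 32 : 1 architecture (64 board inputs, 32 hidden units, one output value) can be sketched as below. The tanh activations and random weight initialisation are assumptions for illustration, not the organisers' actual sample code:

```python
import numpy as np

rng = np.random.default_rng(0)

class MLPEvaluator:
    """Sketch of a 64:32:1 multi-layer perceptron position evaluator.
    In the winning approach the weights were not trained by gradient
    descent but evolved with a GA over a population of such networks."""

    def __init__(self):
        self.w1 = rng.normal(scale=0.1, size=(64, 32))  # input -> hidden
        self.b1 = np.zeros(32)
        self.w2 = rng.normal(scale=0.1, size=(32, 1))   # hidden -> output
        self.b2 = np.zeros(1)

    def evaluate(self, board):
        """board: 8x8 array (+1 own discs, -1 opponent, 0 empty).
        Returns a position value in (-1, 1)."""
        x = np.asarray(board, dtype=float).reshape(64)
        h = np.tanh(x @ self.w1 + self.b1)
        return float(np.tanh(h @ self.w2 + self.b2))
```

A GA would treat the flattened weight vector of each network as one genome, selecting and mutating by league results rather than by any per-move error signal.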

  15. Evolutionary progress against WPC

  16. Future Competitions
  • Implement additional standard architectures:
    • Blondie-style MLP
    • Scanning n-tuple grid features
  • Encourage people to supply their own architectures (allowed for this contest, but not well publicised)
