The Pentium Goes to Vegas Training a Neural Network to Play BlackJack Paul Ruvolo and Christine Spritke
Goals • Investigate result-based learning • Develop a strategy for a highly random game • Train a network to play effectively without explicitly teaching it the rules of the game
Strategy • Simplify the game to allow only HIT or STAY • Feedforward 3-layer backpropagation network • Give the input units information about the player's hand and the dealer's up card • 2 output units for HIT and STAY • 1 hidden layer • Measure performance with efficiency • Efficiency = (win % * 2) + (tie %) • The return on a dollar
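A minimal sketch of the efficiency measure in Python (the function and variable names are ours, for illustration):

def efficiency(wins, ties, hands_played):
    # A win pays back double the bet, a tie (push) returns the bet,
    # and a loss returns nothing, so this is the expected return on a dollar.
    win_pct = wins / hands_played
    tie_pct = ties / hands_played
    return win_pct * 2 + tie_pct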
Background • To form a basis of comparison, we measured the efficiency of a player using: • Random Guessing • Efficiency = 60.3% • Dealer’s Algorithm • Hit when below 17, otherwise Stay • Efficiency = 92.2%
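The two baseline policies, sketched in Python (the function names are ours; the slide specifies only the hit-below-17 rule for the dealer's algorithm):

import random

def random_policy(hand_sum):
    # Baseline 1: random guessing (measured at 60.3% efficiency)
    return random.choice(["HIT", "STAY"])

def dealer_policy(hand_sum):
    # Baseline 2: the dealer's algorithm (measured at 92.2% efficiency)
    return "HIT" if hand_sum < 17 else "STAY"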
PHASE I – Input Specific Cards Showing
PHASE I – Network Setup • 104 Input Units • 52 input units for the possible cards in the player’s hand • 52 input units for the dealer’s possible up card • 20 Hidden Units • 2 Output Units • HIT and STAY • Learning Rate = 0.3; Momentum = 0.3
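A sketch of how the 104-unit input pattern might be built, assuming cards are indexed 0 to 51 (the indexing scheme is our assumption; the slides specify only one unit per possible card):

def encode_phase1(player_cards, dealer_up_card):
    # Units 0-51: one per possible card in the player's hand.
    # Units 52-103: one per possible dealer up card.
    x = [0.0] * 104
    for card in player_cards:
        x[card] = 1.0
    x[52 + dealer_up_card] = 1.0
    return x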
PHASE I – Network Setup • Target High = 0.9 • Target Low = 0.1 • Target Mid = 0.5 • If hitting and staying yield the same result • HIT = STAY = Target Mid • If hitting produces a win while staying produces a loss • HIT = Target High • STAY = Target Low • And vice versa when staying wins and hitting loses
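A hedged sketch of the target scheme, assuming each outcome is scored as a number (loss < tie < win) so that both branches and their reverse fall out of one comparison:

TARGET_HIGH, TARGET_MID, TARGET_LOW = 0.9, 0.5, 0.1

def make_targets(hit_result, stay_result):
    # Returns (HIT target, STAY target) given the outcome of each action,
    # e.g., -1 = loss, 0 = tie, +1 = win.
    if hit_result == stay_result:
        return TARGET_MID, TARGET_MID
    if hit_result > stay_result:
        return TARGET_HIGH, TARGET_LOW  # hitting did better
    return TARGET_LOW, TARGET_HIGH      # staying did better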
PHASE I – Results Efficiency peaks at about 88% but never stabilizes
PHASE I – Modifications • Tried multiple variations on the initial network • Hidden units ranging from 1 to 20 • Learning rate and momentum adjustments • Aging algorithm for the learning rate • 20 Input Units • 10 possible values for the player’s cards • 10 possible values for the dealer’s up card • No significant changes in performance
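The collapsed 20-unit variant might look like this (again, the indexing is our assumption; ranks 0-9 stand for ace, 2 through 9, and 10/face):

def encode_by_rank(player_ranks, dealer_rank):
    # Units 0-9: card values present in the player's hand
    # (note that duplicate ranks collapse onto a single unit).
    # Units 10-19: value of the dealer's up card.
    x = [0.0] * 20
    for rank in player_ranks:
        x[rank] = 1.0
    x[10 + dealer_rank] = 1.0
    return x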
PHASE I – Analysis • Analyzed why the network cannot improve, or even learn the dealer’s algorithm • For example, the network hits on a hand summing to 21
PHASE II – Input “best” sum of current hand
PHASE II – Strategy • 4 types of inputs • No dealer card, no ace differentiation • No dealer card, with ace differentiation • Include dealer card, no ace differentiation • Include dealer card, with ace differentiation • All use 2 output units and 4 hidden units
PHASE II – No dealer, no aces • 18 input units • Represent all possible hand values when making a decision (ranging from 4 to 21) • Results: • Develops the dealer’s algorithm • Hits on sum < 17 • Stays on sum > 16
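A sketch of the 18-unit encoding, assuming a simple one-hot over the sums 4 through 21:

def encode_hand_sum(hand_sum):
    # One unit per possible "best" hand value at decision time (4-21).
    x = [0.0] * 18
    x[hand_sum - 4] = 1.0
    return x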
PHASE II – Dealer, no aces • 28 input units • 18 possible player hand values • 10 possible values for the dealer’s up card • Results: • High efficiency • Good at accounting for the dealer’s card in boundary cases
PHASE II – Dealer, no aces Network is more likely to stay when the dealer has a bust card
PHASE II – Dealer, aces • 38 input units • 28 units for player’s hand • 18 possible hard hand values • 10 possible soft hand values • 10 units for the dealer’s up card • Results: • Good at adjusting strategy for hard vs. soft hands
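A sketch of the full 38-unit encoding, including our assumed helper for deciding whether a hand is soft (an ace counted as 11 without busting). Hard sums span 4-21 (18 units) and soft sums span 12-21 (10 units):

def best_sum_and_softness(cards):
    # Cards are values 1-10, with aces entered as 1.
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        return total + 10, True   # soft hand: an ace counts as 11
    return total, False           # hard hand

def encode_phase2(cards, dealer_rank):
    # Units 0-17: hard sums 4-21; units 18-27: soft sums 12-21;
    # units 28-37: dealer's up card value (index 0-9).
    x = [0.0] * 38
    total, soft = best_sum_and_softness(cards)
    if soft:
        x[18 + (total - 12)] = 1.0
    else:
        x[total - 4] = 1.0
    x[28 + dealer_rank] = 1.0
    return x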
PHASE II – Dealer, aces Network always hits a soft 17 and stays on a hard 17
Conclusion • Neural networks are not magical! • They require the teacher to eliminate duplicate patterns • 5 of diamonds + 7 of clubs is equivalent to 8 of hearts + 4 of spades • Result-based training is inherently more difficult • 2 hidden layers might help • We’re not optimistic!