The Pentium Goes to Vegas Training a Neural Network to Play BlackJack Paul Ruvolo and Christine Spritke
Goals • Investigate result-based learning • Develop a strategy for a highly random game • Train a network to play effectively without explicitly teaching it the rules of the game
Strategy • Simplify the game to allow only HIT or STAY • Feedforward 3-layer backpropagation network • Give the input units information about the player's hand and the dealer's up card • 2 output units for HIT and STAY • 1 hidden layer • Measure performance with efficiency • Efficiency = (win % * 2) + (tie %) • The return on a dollar
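A minimal sketch of the efficiency measure in Python (the function and variable names are ours, for illustration):

def efficiency(wins, ties, hands_played):
    # A win pays back double the bet, a tie (push) returns the bet,
    # and a loss returns nothing, so this is the expected return on a dollar.
    win_pct = wins / hands_played
    tie_pct = ties / hands_played
    return win_pct * 2 + tie_pct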
Background • To form a basis of comparison, we measured the efficiency of a player using: • Random Guessing • Efficiency = 60.3% • Dealer’s Algorithm • Hit when below 17, otherwise Stay • Efficiency = 92.2%
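The two baseline policies, sketched in Python (the function names are ours; the slide specifies only the hit-below-17 rule for the dealer's algorithm):

import random

def random_policy(hand_sum):
    # Baseline 1: random guessing (measured at 60.3% efficiency)
    return random.choice(["HIT", "STAY"])

def dealer_policy(hand_sum):
    # Baseline 2: the dealer's algorithm (measured at 92.2% efficiency)
    return "HIT" if hand_sum < 17 else "STAY"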
PHASE I – Input Specific Cards Showing
PHASE I – Network Setup • 104 Input Units • 52 input units for the possible cards in the player’s hand • 52 input units for the dealer’s possible up card • 20 Hidden Units • 2 Output Units • HIT and STAY • Learning Rate = 0.3; Momentum = 0.3
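A sketch of how the 104-unit input pattern might be built, assuming cards are indexed 0 to 51 (the indexing scheme is our assumption; the slides specify only one unit per possible card):

def encode_phase1(player_cards, dealer_up_card):
    # Units 0-51: one per possible card in the player's hand.
    # Units 52-103: one per possible dealer up card.
    x = [0.0] * 104
    for card in player_cards:
        x[card] = 1.0
    x[52 + dealer_up_card] = 1.0
    return x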
PHASE I – Network Setup • Target High = 0.9 • Target Low = 0.1 • Target Mid = 0.5 • If hitting and staying yield the same result • HIT = STAY = Target Mid • If hitting produces a win while staying produces a loss • HIT = Target High • STAY = Target Low • And vice versa when staying wins and hitting loses
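A hedged sketch of the target scheme, assuming each outcome is scored as a number (loss < tie < win) so that both branches and their reverse fall out of one comparison:

TARGET_HIGH, TARGET_MID, TARGET_LOW = 0.9, 0.5, 0.1

def make_targets(hit_result, stay_result):
    # Returns (HIT target, STAY target) given the outcome of each action,
    # e.g., -1 = loss, 0 = tie, +1 = win.
    if hit_result == stay_result:
        return TARGET_MID, TARGET_MID
    if hit_result > stay_result:
        return TARGET_HIGH, TARGET_LOW  # hitting did better
    return TARGET_LOW, TARGET_HIGH      # staying did better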
PHASE I – Results Efficiency peaks at about 88% but never stabilizes
PHASE I – Modifications • Tried multiple variations on the initial network • Hidden units ranging from 1 to 20 • Learning rate and momentum adjustments • Aging algorithm for the learning rate • 20 Input Units • 10 possible values for the player’s cards • 10 possible values for the dealer’s up card • No significant changes in performance
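The collapsed 20-unit variant might look like this (again, the indexing is our assumption; ranks 0-9 stand for ace, 2 through 9, and 10/face):

def encode_by_rank(player_ranks, dealer_rank):
    # Units 0-9: card values present in the player's hand
    # (note that duplicate ranks collapse onto a single unit).
    # Units 10-19: value of the dealer's up card.
    x = [0.0] * 20
    for rank in player_ranks:
        x[rank] = 1.0
    x[10 + dealer_rank] = 1.0
    return x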
PHASE I – Analysis • Analyzed why the network cannot improve, or even learn the dealer’s algorithm • For example, the network hits on a hand summing to 21
PHASE II – Input “best” sum of current hand
PHASE II – Strategy • 4 types of inputs • No dealer card, no ace differentiation • No dealer card, with ace differentiation • Include dealer card, no ace differentiation • Include dealer card, with ace differentiation • All use 2 output units and 4 hidden units
PHASE II – No dealer, no aces • 18 input units • Represent all possible hand values when making a decision (ranging from 4 to 21) • Results: • Develops the dealer’s algorithm • Hits on sum < 17 • Stays on sum > 16
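A sketch of the 18-unit encoding, assuming a simple one-hot over the sums 4 through 21:

def encode_hand_sum(hand_sum):
    # One unit per possible "best" hand value at decision time (4-21).
    x = [0.0] * 18
    x[hand_sum - 4] = 1.0
    return x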
PHASE II – Dealer, no aces • 28 input units • 18 possible player hand values • 10 possible values for the dealer’s up card • Results: • High efficiency • Good at accounting for the dealer’s card in boundary cases
PHASE II – Dealer, no aces Network is more likely to stay when the dealer has a bust card
PHASE II – Dealer, aces • 38 input units • 28 units for player’s hand • 18 possible hard hand values • 10 possible soft hand values • 10 units for the dealer’s up card • Results: • Good at adjusting strategy for hard vs. soft hands
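A sketch of the full 38-unit encoding, including our assumed helper for deciding whether a hand is soft (an ace counted as 11 without busting). Hard sums span 4-21 (18 units) and soft sums span 12-21 (10 units):

def best_sum_and_softness(cards):
    # Cards are values 1-10, with aces entered as 1.
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        return total + 10, True   # soft hand: an ace counts as 11
    return total, False           # hard hand

def encode_phase2(cards, dealer_rank):
    # Units 0-17: hard sums 4-21; units 18-27: soft sums 12-21;
    # units 28-37: dealer's up card value (index 0-9).
    x = [0.0] * 38
    total, soft = best_sum_and_softness(cards)
    if soft:
        x[18 + (total - 12)] = 1.0
    else:
        x[total - 4] = 1.0
    x[28 + dealer_rank] = 1.0
    return x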
PHASE II – Dealer, aces Network always hits a soft 17 and stays on a hard 17
Conclusion • Neural networks are not magical! • They require the teacher to eliminate duplicate patterns • 5 of diamonds + 7 of clubs is equivalent to 8 of hearts + 4 of spades • Result-based training is inherently more difficult • 2 hidden layers might help • We’re not optimistic!