Introduction to Neural Networks


Presentation Transcript


  1. Introduction to Neural Networks John Paxton Montana State University Summer 2003

  2. Chapter 3: Pattern Association • Aristotle observed that human memory associates • similar items • contrary items • items close in proximity • items close in succession (as in a song)

  3. Terminology and Issues • Autoassociative Networks • Heteroassociative Networks • Feedforward Networks • Recurrent Networks • How many patterns can be stored?

  4. Hebb Rule for Pattern Association • Architecture: input units x1 … xn fully connected to output units y1 … ym with weights w11 … wnm

  5. Algorithm
      1. set wij = 0 for 1 <= i <= n, 1 <= j <= m
      2. for each training pair s:t
      3. xi = si
      4. yj = tj
      5. wij(new) = wij(old) + xi*yj
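
The five steps map directly onto two nested loops. Below is a minimal sketch in plain Python; the function name, variable names, and the bipolar training pairs are illustrative, not from the slides.

```python
# Minimal sketch of the Hebb rule for a heteroassociative net.
def hebb_train(pairs, n, m):
    """pairs: list of (s, t) with len(s) == n and len(t) == m."""
    w = [[0] * m for _ in range(n)]          # step 1: w_ij = 0
    for s, t in pairs:                        # step 2: each training pair s:t
        for i in range(n):                    # step 3: x_i = s_i
            for j in range(m):                # step 4: y_j = t_j
                w[i][j] += s[i] * t[j]        # step 5: w_ij(new) = w_ij(old) + x_i*y_j
    return w

# Training pairs from the example on the next slide.
pairs = [((1, -1, -1), (1, -1)), ((-1, 1, 1), (-1, 1))]
print(hebb_train(pairs, 3, 2))                # [[2, -2], [-2, 2], [-2, 2]]
```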

  6. Example • s1 = (1 -1 -1), s2 = (-1 1 1) • t1 = (1 -1), t2 = (-1 1) • w11 = (1)(1) + (-1)(-1) = 2 • w12 = (1)(-1) + (-1)(1) = -2 • w21 = (-1)(1) + (1)(-1) = -2 • w22 = (-1)(-1) + (1)(1) = 2 • w31 = (-1)(1) + (1)(-1) = -2 • w32 = (-1)(-1) + (1)(1) = 2

  7. Matrix Alternative • s1 = (1 -1 -1), s2 = (-1 1 1) • t1 = (1 -1), t2 = (-1 1) • W = ST T:
      [ 1 -1]   [ 1 -1]     [ 2 -2]
      [-1  1]   [-1  1]  =  [-2  2]
      [-1  1]               [-2  2]

  8. Final Network • f(yin) = 1 if yin > 0, 0 if yin = 0, else -1 • Diagram: x1, x2, x3 connected to y1, y2 with weights w11 = 2, w12 = -2, w21 = -2, w22 = 2, w31 = -2, w32 = 2
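
As a sketch of the matrix view plus recall through f, assuming NumPy (array names are my own):

```python
import numpy as np

# Hebb weights in matrix form: rows of S are the s vectors,
# rows of T the matching t vectors, W = S^T T.
S = np.array([[1, -1, -1], [-1, 1, 1]])
T = np.array([[1, -1], [-1, 1]])
W = S.T @ T                                   # [[ 2 -2] [-2  2] [-2  2]]

def f(y_in):
    # f(y_in) = 1 if y_in > 0, 0 if y_in = 0, else -1 (np.sign does exactly this)
    return np.sign(y_in).astype(int)

print(f(np.array([1, -1, -1]) @ W))           # recalls ( 1 -1)
print(f(np.array([0, -1, -1]) @ W))           # missing data: still recalls ( 1 -1)
```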

  9. Properties • Weights exist if input vectors are linearly independent • Orthogonal vectors can be learned perfectly • High weights imply strong correlations

  10. Exercises • What happens if (-1 -1 -1) is tested? This vector has one mistake. • What happens if (0 -1 -1) is tested? This vector has one piece of missing data. • Show an example of training data that is not learnable. Show the learned network.

  11. Delta Rule for Pattern Association • Works when patterns are linearly independent but not orthogonal • Introduced in the 1960s for ADALINE • Produces a least squares solution

  12. Activation Functions • Delta Rule (derivative term = 1): wij(new) = wij(old) + a(tj - yj)*xi • Extended Delta Rule (derivative f’(yin.j)): wij(new) = wij(old) + a(tj - yj)*xi*f’(yin.j)
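
A hedged sketch of one delta-rule pass, assuming an identity output unit (so the derivative term is 1) and NumPy; the learning rate, epoch count, and names are illustrative:

```python
import numpy as np

def delta_rule_epoch(W, pairs, a=0.1):
    """One pass of the delta rule: w_ij(new) = w_ij(old) + a*(t_j - y_j)*x_i."""
    for x, t in pairs:
        y = x @ W                        # identity activation, so f'(y_in) = 1
        W += a * np.outer(x, t - y)      # outer product spreads (t_j - y_j)*x_i
    return W

# Illustrative run on the pairs from slide 6.
pairs = [(np.array([1., -1., -1.]), np.array([1., -1.])),
         (np.array([-1., 1., 1.]), np.array([-1., 1.]))]
W = np.zeros((3, 2))
for _ in range(50):
    W = delta_rule_epoch(W, pairs)
print(np.round(W, 2))                    # settles to a least squares solution
```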

  13. Heteroassociative Memory Net • Application: associate characters, e.g. A <-> a, B <-> b

  14. Autoassociative Net • Architecture: input units x1 … xn fully connected to output units y1 … yn with weights w11 … wnn

  15. Training Algorithm • Assuming that the training vectors are orthogonal, we can use the Hebb rule algorithm mentioned earlier. • Application: Find out whether an input vector is familiar or unfamiliar. For example, voice input as part of a security system.

  16. Autoassociative Example • Store s = (1 1 1); W = sT s with the diagonal zeroed:
      [1]             [1 1 1]     [0 1 1]
      [1] [1 1 1]  =  [1 1 1]  =  [1 0 1]
      [1]             [1 1 1]     [1 1 0]
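
The stored matrix is just the outer product sT s with the diagonal forced to zero. A short sketch, assuming NumPy (the function name is my own):

```python
import numpy as np

def autoassoc_weights(vectors):
    """Hebb weights for an autoassociative net: sum of outer products
    s^T s, with the diagonal zeroed (no self-connections)."""
    W = np.zeros((len(vectors[0]), len(vectors[0])), dtype=int)
    for s in vectors:
        W += np.outer(s, s)
    np.fill_diagonal(W, 0)
    return W

print(autoassoc_weights([np.array([1, 1, 1])]))
# [[0 1 1]
#  [1 0 1]
#  [1 1 0]]
```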

  17. Evaluation • What happens if (1 1 1) is presented? • What happens if (0 1 1) is presented? • What happens if (0 0 1) is presented? • What happens if (-1 1 1) is presented? • What happens if (-1 -1 1) is presented? • Why are the diagonals set to 0?

  18. Storage Capacity • 2 vectors: (1 1 1), (-1 -1 -1) • Recall is perfect:
      [1 -1]                  [0 2 2]
      [1 -1] [ 1  1  1]   =   [2 0 2]
      [1 -1] [-1 -1 -1]       [2 2 0]

  19. Storage Capacity • 3 vectors: (1 1 1), (-1 -1 -1), (1 -1 1) • Recall is no longer perfect:
      [1 -1  1] [ 1  1  1]     [0 1 3]
      [1 -1 -1] [-1 -1 -1]  =  [1 0 1]
      [1 -1  1] [ 1 -1  1]     [3 1 0]
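
A quick check of the three-vector case under the same assumptions (NumPy, zero-diagonal Hebb weights, threshold f):

```python
import numpy as np

vectors = [np.array([1, 1, 1]), np.array([-1, -1, -1]), np.array([1, -1, 1])]
W = sum(np.outer(v, v) for v in vectors)      # Hebb weights
np.fill_diagonal(W, 0)                        # zero the diagonal
print(W)                                      # [[0 1 3] [1 0 1] [3 1 0]]

for v in vectors:
    out = np.sign(v @ W).astype(int)          # apply f
    print(v, "->", out)                       # (1 -1 1) comes back as (1 1 1)
```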

  20. Theorem • Up to n-1 bipolar vectors of n dimensions can be stored in an autoassociative net.

  21. Iterative Autoassociative Net • 1 vector: s = (1 1 -1) • sT s (diagonal zeroed):
      [ 0  1 -1]
      [ 1  0 -1]
      [-1 -1  0]
  • (1 0 0) -> (0 1 -1) • (0 1 -1) -> (2 1 -1) -> (1 1 -1) • (1 1 -1) -> (2 2 -2) -> (1 1 -1)

  22. Testing Procedure
      1. initialize weights using Hebb learning
      2. for each test vector do
      3. set xi = si
      4. calculate ti
      5. set si = ti
      6. go to step 4 if the s vector is new
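
A sketch of this loop for the single stored vector above, assuming NumPy; the stopping rule here is simply "the vector stopped changing", and the names are illustrative:

```python
import numpy as np

def iterative_recall(W, x, max_steps=10):
    """Feed the thresholded output back in as the next input until it
    stops changing (steps 3-6 above); assumes bipolar/zero entries."""
    s = np.array(x)
    for _ in range(max_steps):
        t = np.sign(s @ W).astype(int)   # step 4: calculate t
        if np.array_equal(t, s):         # stop once the s vector repeats
            return t
        s = t                            # step 5: s = t, back to step 4
    return s

# Weights for the single stored vector (1 1 -1), diagonal zeroed.
s = np.array([1, 1, -1])
W = np.outer(s, s)
np.fill_diagonal(W, 0)
print(iterative_recall(W, [1, 0, 0]))    # recovers [ 1  1 -1]
```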

  23. Exercises • 1 piece of missing data: (0 1 -1) • 2 pieces of missing data: (0 0 -1) • 3 pieces of missing data: (0 0 0) • 1 mistake: (-1 1 -1) • 2 mistakes: (-1 -1 -1)

  24. Discrete Hopfield Net • content addressable problems • pattern association problems • constrained optimization problems • wij = wji • wii = 0

  25. Characteristics • Only 1 unit updates its activation at a time • Each unit continues to receive the external signal • An energy (Lyapunov) function can be found that allows the net to converge, unlike the previous (iterative autoassociative) net • Autoassociative

  26. Architecture • Diagram: units y1, y2, y3, each receiving an external input x1, x2, x3 and connected to every other unit

  27. Algorithm
      1. initialize weights using Hebb rule
      2. for each input vector do
      3. yi = xi
      4. do steps 5-6 randomly for each yi
      5. yin.i = xi + Σj yj*wji
      6. calculate f(yin.i)
      7. go to step 2 if the net hasn’t converged
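
A sketch of the asynchronous update loop, assuming NumPy and a "keep the old activation when yin.i = 0" convention (a common choice, not stated on the slide); names are illustrative:

```python
import random
import numpy as np

def hopfield_recall(W, x, max_sweeps=10):
    """Update one unit at a time in random order, keeping the external
    input x applied throughout, until a full sweep changes nothing."""
    y = np.array(x, dtype=int)
    for _ in range(max_sweeps):
        changed = False
        for i in random.sample(range(len(y)), len(y)):  # random unit order
            y_in = x[i] + y @ W[:, i]                   # y_in,i = x_i + sum_j y_j*w_ji
            new = 1 if y_in > 0 else (-1 if y_in < 0 else y[i])
            if new != y[i]:
                y[i], changed = new, True
        if not changed:                                 # converged
            return y
    return y

# Weights for the stored pattern (1 -1) from the example on the following slides.
s = np.array([1, -1])
W = np.outer(s, s)
np.fill_diagonal(W, 0)
print(hopfield_recall(W, np.array([0, -1])))            # [ 1 -1]
```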

  28. Example • training vector: (1 -1) • Diagram: units y1 and y2, with external inputs x1 and x2, joined by weight w12 = w21 = -1

  29. Example • input (0 -1): update y1 = 0 + (-1)(-1) = 1; update y2 = -1 + 1(-1) = -2 -> -1 • input (1 -1): update y2 = -1 + 1(-1) = -2 -> -1; update y1 = 1 + (-1)(-1) = 2 -> 1

  30. Hopfield Theorems • Convergence is guaranteed. • The number of storable patterns is approximately n / (2 * log n) where n is the dimension of a vector
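
For instance, assuming the log in the formula is a natural logarithm, a 100-unit net stores roughly 10 patterns:

```python
import math

def hopfield_capacity(n):
    """Approximate number of storable patterns: n / (2 * log n), natural log assumed."""
    return n / (2 * math.log(n))

print(round(hopfield_capacity(100), 1))   # about 10.9 patterns for n = 100
```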

  31. Bidirectional Associative Memory (BAM) • Heteroassociative Recurrent Net • Kosko, 1988 • Architecture: x layer units x1 … xn connected bidirectionally to y layer units y1 … ym

  32. Activation Function • f(yin) = 1, if yin > 0 • f(yin) = 0, if yin = 0 • f(yin) = -1 otherwise

  33. Algorithm
      1. initialize weights using Hebb rule
      2. for each test vector do
      3. present s to the x layer
      4. present t to the y layer
      5. while equilibrium is not reached
      6. compute f(yin.j)
      7. compute f(xin.i)
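
A sketch of the bidirectional loop, assuming NumPy, bipolar vectors, and Hebb weights W = ST T; names are illustrative:

```python
import numpy as np

def bam_recall(W, x, max_iters=10):
    """Bounce signals between the x layer and the y layer until
    neither layer changes (equilibrium)."""
    y = np.sign(x @ W).astype(int)               # x layer -> y layer
    for _ in range(max_iters):
        x_new = np.sign(W @ y).astype(int)       # y layer -> x layer
        y_new = np.sign(x_new @ W).astype(int)   # x layer -> y layer again
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break                                # equilibrium reached
        x, y = x_new, y_new
    return x, y

# Weights for the training pairs on the next slide.
S = np.array([[1, 1], [-1, -1]])
T = np.array([[1, -1], [-1, 1]])
W = S.T @ T                                      # [[2 -2] [2 -2]]
print(bam_recall(W, np.array([1, 1])))           # x = (1 1), y = (1 -1)
```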

  34. Example • s1 = (1 1), t1 = (1 -1) • s2 = (-1 -1), t2 = (-1 1) • W = ST T:
      [1 -1] [ 1 -1]     [2 -2]
      [1 -1] [-1  1]  =  [2 -2]

  35. Example • Architecture: weights w11 = 2, w12 = -2, w21 = 2, w22 = -2 between x1, x2 and y1, y2 • present (1 1) to x -> y = (1 -1) • present (1 -1) to y -> x = (1 1)

  36. Hamming Distance • Definition: the number of corresponding bits that differ in two vectors • For example, H[(1 -1), (1 1)] = 1 • The average (per-bit) Hamming distance in this example is 1/2
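
As a tiny sketch in plain Python (the function name is my own):

```python
def hamming(u, v):
    """Number of positions where two vectors differ."""
    return sum(1 for a, b in zip(u, v) if a != b)

print(hamming((1, -1), (1, 1)))        # 1
print(hamming((1, -1), (1, 1)) / 2)    # 0.5 -> the average per-bit distance
```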

  37. About BAMs • Observation: Encoding is better when the average Hamming distance of the inputs is similar to the average Hamming distance of the outputs. • The memory capacity of a BAM is min(n-1, m-1).
