
Neural Networks - Adaline






Presentation Transcript


  1. Neural Networks - Adaline • L. Manevitz

  2. Plan of Lecture • Perceptron: connected and convex examples • Adaline square error • Gradient • Calculate for AND, XOR • Discuss limitations • LMS algorithm: derivation

  3. What are the best weights? • Most examples classified correctly? • Least square error • Least square error before the cut-off! • Minimize Σ_k (d(k) − Σ_i w_i x_i(k))² (outer sum over examples; inner sum over input dimensions)
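
As a concrete illustration of the least-square criterion above, here is a minimal NumPy sketch (the data, the bias column, and all names are illustrative, not from the slides) that evaluates the sum of squared errors for a weight vector on AND-style data:

    import numpy as np

    def sum_squared_error(W, X, d):
        """Sum over examples k of (d(k) - W . x(k))**2."""
        errors = d - X @ W          # one residual per example
        return np.sum(errors ** 2)

    # Illustrative data: 4 examples, 3 inputs (a bias column of 1s plus x1, x2)
    X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
    d = np.array([0, 0, 0, 1], dtype=float)   # AND targets
    print(sum_squared_error(np.zeros(3), X, d))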

  4. Least Square Minimization • Find the gradient of the error over all examples. Either solve for the minimum directly or move opposite to the gradient. • Widrow-Hoff (LMS): use the instantaneous example as an approximation to the gradient. • Advantages: no memory; on-line; serves a similar function to noise in avoiding local problems. • Adjust by w(new) = w(old) + α·err·x for each example x. • Here err = (desired output − Σ_i w_i x_i)
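
A minimal sketch of the adjustment rule just described, assuming NumPy; err is the continuous error taken before any cut-off, and the function and parameter names are my own:

    import numpy as np

    def widrow_hoff_step(W, x, d, alpha=0.1):
        """One Widrow-Hoff (LMS) update from a single example.

        err is the continuous error: desired output minus the linear sum,
        taken before any threshold/cut-off is applied.
        """
        err = d - W @ x              # desired output - sum_i w_i x_i
        return W + alpha * err * x   # w(new) = w(old) + alpha * err * x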

  5. LMS Derivation • Errsq = Σ_k (d(k) − W·x(k))² • ∇Errsq = −2 Σ_k (d(k) − W·x(k)) x(k) • W(new) = W(old) − μ·∇Errsq • To ease calculations, use the single-example error Err(k) = d(k) − W·x(k) in place of the full sum • W(new) = W(old) + 2μ·Err(k)·x(k) • Continue with the next choice of k
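
The two gradients in the derivation, the full-batch one and the single-example estimate that LMS substitutes for it, can be written out directly (a sketch in the same notation; NumPy assumed):

    import numpy as np

    def batch_gradient(W, X, d):
        """Gradient of sum_k (d(k) - W . x(k))**2 with respect to W."""
        errors = d - X @ W
        return -2.0 * X.T @ errors            # -2 * sum_k Err(k) * x(k)

    def lms_gradient_estimate(W, x_k, d_k):
        """Instantaneous single-example estimate used by Widrow-Hoff."""
        return -2.0 * (d_k - W @ x_k) * x_k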

  6. Applications • Adaline has better convergence properties than the Perceptron • Useful in noise cancellation • There is an Adaline in every modem.

  7. LMS (Least Mean Square Algorithm) • 1. Apply an input to the Adaline • 2. Find the square error of the current input: Errsq(k) = (d(k) − W·x(k))² • 3. Approximate ∇Errsq by differentiating Errsq and approximating the average over examples by the single term Errsq(k), obtaining −2·Err(k)·x(k), where Err(k) = d(k) − W·x(k) • 4. Update W: W(new) = W(old) + 2μ·Err(k)·x(k) • Repeat steps 1 to 4.
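
Putting steps 1-4 into a loop gives a small LMS trainer; the AND data, the value of mu, and the number of passes below are illustrative choices, not prescribed by the slides:

    import numpy as np

    def lms_train(X, d, mu=0.05, epochs=200):
        """Cycle through the examples, updating W after each one (steps 1-4)."""
        W = np.zeros(X.shape[1])
        for _ in range(epochs):
            for x_k, d_k in zip(X, d):
                err = d_k - W @ x_k           # step 2: continuous error
                W = W + 2 * mu * err * x_k    # steps 3-4: move against the gradient estimate
        return W

    # AND with a bias input of 1 (illustrative data)
    X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
    d = np.array([0, 0, 0, 1], dtype=float)
    W = lms_train(X, d)
    print((X @ W > 0.5).astype(int))          # cut-off applied only after training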

  8. Comparison with the Perceptron • Both use an updating rule that changes the weights with each input • One corrects a binary error; the other minimizes a continuous error • Adaline always converges; see what happens with XOR • Both can REPRESENT linearly separable functions
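
The contrast can be seen directly in code: the perceptron updates only when the thresholded output is wrong, while Adaline always nudges the weights by the continuous error (a sketch; names and learning rates are mine):

    import numpy as np

    def perceptron_step(W, x, d, eta=1.0):
        """Perceptron: corrects the binary (thresholded) error only."""
        y = 1.0 if W @ x > 0 else 0.0         # binary output
        return W + eta * (d - y) * x          # (d - y) is -1, 0, or +1

    def adaline_step(W, x, d, mu=0.05):
        """Adaline/LMS: always moves W using the continuous error."""
        return W + 2 * mu * (d - W @ x) * x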

  9. Convergence Phenomenon • LMS convergence depends on the choice of μ. • How should μ be chosen?
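
One standard answer, offered here as background since the slide leaves the question open: for the W(new) = W(old) + 2μ·Err(k)·x(k) form of the update, convergence in the mean requires roughly 0 < μ < 1/λ_max, where λ_max is the largest eigenvalue of the input autocorrelation matrix R = E[x·xᵀ]. A sketch of estimating that bound from data (the inputs are illustrative):

    import numpy as np

    # Illustrative inputs (bias plus two features)
    X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)

    # Estimate the input autocorrelation matrix R = E[x x^T] from the data
    R = (X.T @ X) / len(X)
    lambda_max = np.linalg.eigvalsh(R).max()

    # For the update W += 2*mu*Err(k)*x(k), a standard LMS stability bound
    # is (approximately) 0 < mu < 1 / lambda_max.
    print("choose mu below", 1.0 / lambda_max)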

  10. Limitations • Only linearly separable functions can be represented • How can we get around this? • Use a network of neurons? • Use a transformation of the data so that it becomes linearly separable
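
The last bullet can be made concrete for XOR: adding the product feature x1·x2 is one possible transformation (an illustrative choice) under which XOR becomes linearly separable:

    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    d = np.array([0, 1, 1, 0])                # XOR targets

    # Transformed inputs: [bias, x1, x2, x1*x2]
    X_new = np.column_stack([np.ones(4), X, X[:, 0] * X[:, 1]])

    # In the new space the plane x1 + x2 - 2*x1*x2 separates XOR exactly
    W = np.array([0.0, 1.0, 1.0, -2.0])
    print((X_new @ W > 0.5).astype(int))      # -> [0 1 1 0]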

  11. Multi-level Neural Networks • Representability • Arbitrarily complicated decisions • Continuous approximation: arbitrary continuous functions (and more) (Cybenko's theorem) • Learnability • Change McCulloch-Pitts neurons to sigmoid etc. • Derive backprop using the chain rule (like the LMS derivation).
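
To hint at the chain-rule derivation mentioned above, here is the single-sigmoid-unit case written as a gradient sketch; compared with LMS, where the unit is linear, the only new piece is the y·(1 − y) factor from the sigmoid's derivative:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_unit_gradient(W, x, d):
        """Chain rule on E = (d - sigmoid(W . x))**2 for one unit:
        dE/dW = dE/dy * dy/dnet * dnet/dW = -2*(d - y) * y*(1 - y) * x
        """
        y = sigmoid(W @ x)
        return -2.0 * (d - y) * y * (1.0 - y) * x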

  12. Replacement of Threshold Neurons with Sigmoid or other Differentiable Neurons (figure: sigmoid vs. threshold activation)
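
A sketch of the two activation functions being compared (the grid of test values is arbitrary): the threshold has no useful derivative at the cut-off, while the sigmoid is differentiable everywhere, which is what backprop needs:

    import numpy as np

    def threshold(net):
        """McCulloch-Pitts style hard cut-off: not differentiable at 0."""
        return np.where(net > 0, 1.0, 0.0)

    def sigmoid(net):
        """Smooth replacement; its derivative y*(1 - y) exists everywhere."""
        return 1.0 / (1.0 + np.exp(-net))

    net = np.linspace(-4, 4, 9)
    print(threshold(net))
    print(np.round(sigmoid(net), 2))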

  13. Prediction (figure: delay element and NN; compare the network's output with the actual input/output signal)

  14. Sample Feedforward Network (no loops) (figure: input layer, hidden and output layers connected by weights W_ji and V_ik; each unit computes F(Σ_j w_ji x_j))
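
A minimal forward pass matching the diagram's notation; the weight shapes, the random values, and the use of a sigmoid for F are illustrative assumptions:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x, W, V):
        """Two-layer feedforward pass (no loops).

        Hidden unit i computes F(sum_j w_ji * x_j); the output layer then
        applies its own weights V to the hidden activations.
        """
        hidden = sigmoid(W @ x)       # W has shape (n_hidden, n_inputs)
        return sigmoid(V @ hidden)    # V has shape (n_outputs, n_hidden)

    # Illustrative shapes: 3 inputs -> 4 hidden units -> 2 outputs
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 3))
    V = rng.normal(size=(2, 4))
    print(forward(np.array([1.0, 0.0, 1.0]), W, V))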
