

  1. ICO Learning Gerhard Neumann Seminar A, SS06

  2. Overview • Short Overview of different control methods • Correlation Based Learning • ISO Learning • Comparison to other Methods ([Wörgötter05]) • TD Learning • STDP • ICO Learning ([Porr06]) • Learning Receptive Fields ([Kulvicius06])

  3. Comparison of ISO Learning to other Methods • Comparison for classical conditioning learning problems (open-loop control) • Relating RL to classical conditioning • Classical conditioning: the pairing of two subsequent stimuli is learned such that the presentation of the first stimulus is taken as a predictor of the second one. • RL: maximization of rewards • v … predictor of the future reward

  4. RL for Classical Conditioning • TD error • Derivative term • Weight change (see the reconstruction below) • => Nothing new so far… • Goal: after learning, the output v should react to the onset of the CS xn and remain active until the reward terminates • Represent the CS internally by a chain of n + 1 delayed pulses xi • Replace the states of traditional RL with time steps
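
The equation images from this slide are missing from the transcript; the following is the standard TD rule for this setting, reconstructed from the surrounding bullets, so treat the exact symbols as an assumption rather than the slide's original:

    v(t)  = Σ_i w_i x_i(t)            (predictor of the future reward, v(t) ≈ Σ_{k≥0} γ^k r(t+k))
    δ(t)  = r(t) + γ v(t) − v(t−1)    (TD error; γ v(t) − v(t−1) is the derivative-like term)
    Δw_i  = α δ(t) x̄_i(t)             (weight change, with x̄_i the eligibility trace of x_i)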

  5. RL for Classical Conditioning • Special kind of eligibility trace • Serial compound representation • Learning steps • Rectangular response of v • No special treatment of the reward is necessary: x0 can replace the reward by setting w0 to 1 at the beginning (a runnable sketch follows below)
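
A minimal runnable sketch of TD learning with a serial-compound (tapped-delay-line) representation of the CS. All constants (trial length, onsets, learning rate) are illustrative assumptions; only the structure follows the slide:

    import numpy as np

    def td_serial_compound(n_steps=60, cs_onset=10, reward_onset=40,
                           n_taps=31, alpha=0.1, gamma=1.0, n_trials=200):
        """TD learning with a serial-compound CS: the CS is represented by a
        chain of delayed pulses x_i, one per time step after CS onset."""
        w = np.zeros(n_taps)                     # one weight per delayed pulse x_i
        for _ in range(n_trials):
            v_prev = 0.0
            for t in range(n_steps):
                x = np.zeros(n_taps)
                i = t - cs_onset
                if 0 <= i < n_taps:
                    x[i] = 1.0                   # pulse i is active i steps after CS onset
                r = 1.0 if t == reward_onset else 0.0
                v = w @ x                        # prediction of future reward
                delta = r + gamma * v - v_prev   # TD error
                if 1 <= i <= n_taps:
                    w[i - 1] += alpha * delta    # credit the tap that was active one step earlier
                v_prev = v
        return w                                 # after learning, v responds from CS onset until the reward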

  6. Comparison for Classical Conditioning • Correlation based learning • The "reward" x0 is not an independent term as it is in TD learning • TD learning

  7. Comparison for Classical Conditioning • TD learning • ISO learning • Uses a different form of eligibility traces (band-pass filters) • Applied to all input pathways • -> also used for calculating the output (see the rule below)
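
The ISO-learning equations shown on the slide are not in the transcript; the rule from [Porr03] has the following form (reproduced from memory, so verify the details against the paper):

    u_j(t)   = (x_j * h_j)(t)          (band-pass filtered input, filter h_j)
    v(t)     = Σ_j ρ_j u_j(t)          (output; ρ_0 is the fixed reflex weight)
    dρ_j/dt  = μ u_j(t) dv(t)/dt       (ISO: correlate each filtered input with the derivative of the output)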

  8. Comparison for the Closed Loop • Closed loop • Actions of the agent affect future sensory input • The comparison is not so easy any more, because the behavior of the algorithms is now quite different • Reward based architectures • Actor-critic architecture • Use evaluative feedback • Reward maximization • A good reward signal is very often hard to find • In nature: found by evolution • Can theoretically be applied to any learning problem • Resolution in the state space: • Only applicable for low-dimensional state spaces • -> Curse of dimensionality!

  9. Comparison for the Closed Loop • Correlation based architectures • Non-evaluative feedback, all signals are value-free • Minimize disturbance • Valid regions are usually much bigger than for reward maximization • Better convergence!! • Restricted solutions • Evaluations are implicitly built into the sign of the reaction behavior • Actor and critic are the same architectural building block • Only for a restricted set of learning problems • Hard to apply to complex tasks • Resolution in time: • Only looks at temporal correlations of the input variables • Can be applied to high-dimensional state spaces

  10. Comparison of ISO Learning and STDP • ISO learning generically produces a bimodal weight change curve • Similar to the STDP (spike-timing dependent plasticity) weight change curve • ISO learning as an STDP rule: • Potential from the synapse: filtered version of a spike • Gradient dependent model • Much faster time scale used in STDP • Can easily model different kinds of synapses with different filters

  11. Overview • Short Overview of different control methods • Correlation Based Learning • ISO Learning • Comparison to other Methods ([Wörgötter05]) • TD Learning • STDP • ICO Learning ([Porr06]) • Learning Receptive Fields([Kulvicius06])

  12. ICO (Input Correlation Only) Learning • Drawback of Hebbian learning • Auto-correlation can result in divergence even if x0 = 0 • ISO learning: • Relies on orthogonal filters for the different inputs • Each filtered input is orthogonal to its derivative • Only holds if a steady state is assumed • The auto-correlation no longer vanishes if the weights change during the impulse response of the filters • => Can be used only with small learning rates, otherwise the auto-correlation causes divergence of the weights (see the decomposition below)
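
Why the auto-correlation appears, written out from the ISO rule above (the decomposition itself is standard and not quoted from the slide):

    dρ_1/dt = μ u_1 dv/dt = μ u_1 (ρ_0 du_0/dt + ρ_1 du_1/dt + …)

The cross-correlation term μ ρ_0 u_1 du_0/dt is the desired learning signal. The auto-correlation term μ ρ_1 u_1 du_1/dt only integrates to zero if ρ_1 stays constant over the whole impulse response of the filter (steady state), which no longer holds for large learning rates.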

  13. ICO & ISO Learning • ISO learning • ICO learning (both update rules reconstructed below)
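
The two update rules this slide contrasts; the formula images are missing from the transcript, so these are the forms given in [Porr06], reproduced from memory:

    ISO:  dρ_j/dt = μ u_j dv/dt                  (correlates each filtered input with the output derivative)
    ICO:  dρ_j/dt = μ u_j du_0/dt   for j > 0    (correlates each predictive input with the derivative of the filtered reflex input u_0 only)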

  14. ICO Learning • Simple adaptation of the ISO learning rule • Correlate inputs only with each other • No correlation with the output • -> No auto-correlation • Define one input as the reflex input x0 • Drawback: • Loss of generality: not isotropic any more • Not all inputs are treated equally any more • Advantage: • Can use much higher learning rates (up to 100x faster) • Can use almost arbitrary types of filters • No divergence of the weights any more (a discrete-time sketch follows below)
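
A minimal discrete-time sketch of the ICO update. The leaky filters below are a crude stand-in for the band-pass filter bank, and all names and constants are illustrative assumptions rather than the paper's implementation:

    import numpy as np

    class ICOLearner:
        """Minimal discrete-time ICO learning sketch (illustrative constants)."""
        def __init__(self, n_filters=5, mu=0.01):
            self.mu = mu
            # rho[0] is the fixed reflex weight, rho[1:] are the learned weights
            self.rho = np.zeros(n_filters + 1)
            self.rho[0] = 1.0
            self.u = np.zeros(n_filters + 1)       # filtered inputs u_j
            self.u0_prev = 0.0                     # previous reflex trace, for du_0/dt
            # crude band-pass surrogate: a different leak rate per pathway
            self.leak = np.linspace(0.5, 0.95, n_filters + 1)

        def step(self, x0, x1):
            # filter the raw inputs (x0: reflex / disturbance, x1: predictive input)
            x = np.array([x0] + [x1] * (len(self.rho) - 1))
            self.u = self.leak * self.u + (1.0 - self.leak) * x
            du0 = self.u[0] - self.u0_prev         # derivative of the reflex trace
            # ICO rule: correlate predictive traces with du_0/dt only (no output term,
            # hence no auto-correlation of a pathway with itself)
            self.rho[1:] += self.mu * self.u[1:] * du0
            self.u0_prev = self.u[0]
            return float(self.rho @ self.u)        # output v that drives the behavior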

  15. ICO Learning • Weight change curve (open loop, just one filter bank) • Same as for ISO learning • Weight development: • ISO learning shows an exponential instability • Even after setting x0 to 0 after 100,000 time steps

  16. ICO Learning: Closing the Loop • The output of the learner v feeds back to its inputs xj after being modified by the environment • Reactive pathway: fixed reactive feedback control • Learning goal: • Learn an earlier reaction that keeps x0 (the disturbance or error signal) at 0 • One can prove that, under simplified conditions, one-shot learning is possible • With one filter bank and impulse signals • Using the Z-transform

  17. ICO Learning: Applications • Simulated robot experiment: • The robot has to find food (disks in the environment) • Sensors for the unconditioned stimulus: • 2 touch sensors (left + right) • Reflex: the robot elicits a sharp turn as it touches a disk • Pulls the robot into the centre of the disk • Sensors for the predictive stimulus: • 2 sound (distance) sensors (left + right); the disks emit sound • Can measure the distance to a disk • Stimulus: difference between the left and right sound signals • Use 5 filters (resonators) in the filter bank (see the sketch below) • Output v: steering angle of the robot
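
A sketch of such a resonator filter bank. The damped-oscillator form h(t) = (1/b)·e^(−a·t)·sin(b·t) is the one commonly used in the ISO/ICO papers, but the concrete frequencies, Q factor and discretization here are assumptions:

    import numpy as np

    def resonator_bank(freqs, q=0.6, dt=0.01, length=1000):
        """Impulse responses of damped-oscillator ('resonator') band-pass filters:
        h(t) = (1/b) * exp(-a*t) * sin(b*t), a = pi*f/Q, b = sqrt((2*pi*f)^2 - a^2)."""
        t = np.arange(length) * dt
        bank = []
        for f in freqs:
            a = np.pi * f / q
            b = np.sqrt((2 * np.pi * f) ** 2 - a ** 2)
            bank.append(np.exp(-a * t) * np.sin(b * t) / b)
        return np.array(bank)

    # e.g. five resonators with successively lower centre frequencies (values assumed)
    # filters = resonator_bank(freqs=[1.0, 0.5, 0.25, 0.125, 0.0625])
    # u_k = np.convolve(stimulus, filters[k])   # filtered trace for pathway k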

  18. ICO Learning: Simulated Robot • A single experience was sufficient to show adapted behavior • Only possible with ICO learning

  19. Simulated Robot • Comparison for different learning rates • ICO learning vs. ISO learning • Learning was counted as successful over a sequence of four contacts • Both are equivalent for small learning rates • Small auto-correlation term

  20. Simulated Robot • Two different learning rates • Divergent behavior of ISO learning for high learning rates • The robot then shows avoidance behavior towards the food disks

  21. Applications Continued • More complex task: • Three food disks simultaneously • No simple relationship between the reflex input and the predictive input any more • Superimposed sound fields • Is only learned by ICO learning, not by ISO learning

  22. ICO: Real Robot Application • Real robot: • Target a white disk from a distance • Reflex: pulls the robot onto the white disk just at the moment the robot drives over it • Achieved by analysing the bottom scanline of a camera image • Predictive input: • Analysing a scanline from the top of the image • Filter bank: • 5 FIR filters of different lengths • All coefficients set to 1 -> smears out the signal • Narrow viewing angle of the camera • The robot is placed more or less in front of the disk

  23. ICO: Real Robot Experiment • Processing the input (see the sketch below) • Calculate the deviation of the positions of all white points in a scanline from the center of the scanline • 1D signal • Results: • A: before learning • B & C: after learning • 14 contacts • Weights oscillate around their best values, but do not diverge
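
A minimal sketch of that preprocessing step, assuming an 8-bit grayscale scanline; the threshold value and the use of the mean offset are assumptions, only the idea (signed deviation of the white pixels from the scanline centre, collapsed to one value) follows the slide:

    import numpy as np

    def scanline_deviation(scanline, threshold=200):
        """Signed deviation of the white pixels in one camera scanline
        from the centre of that scanline (a single 1-D signal value)."""
        cols = np.flatnonzero(np.asarray(scanline) > threshold)   # indices of 'white' pixels
        if cols.size == 0:
            return 0.0                                            # no disk visible
        centre = (len(scanline) - 1) / 2.0
        return float(np.mean(cols - centre))                      # mean signed offset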

  24. ICO Learning: Other Applications • Mechanical arm • The arm is always controlled by a PI controller towards a specified set point • Input of the PI controller: motor position • The PI controller is used as the reactive filter (sketch below) • Disturbance: • Pushing force of a second small arm mounted on the main arm • A fast-reacting touch sensor measures the disturbance D • Use 10 resonator filters in the filter bank
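
A minimal PI controller as it might serve as the fixed reactive pathway here; the gains and the time step are illustrative assumptions, not values from the experiment:

    class PIController:
        """PI controller driving the arm to a set point; used as the fixed
        reactive filter of the reflex pathway (gains are illustrative)."""
        def __init__(self, kp=2.0, ki=0.5, dt=0.01):
            self.kp, self.ki, self.dt = kp, ki, dt
            self.integral = 0.0

        def step(self, setpoint, position):
            error = setpoint - position          # the error signal the learner should keep at 0
            self.integral += error * self.dt
            return self.kp * error + self.ki * self.integral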

  25. ICO Learning: Other Applications • Result: • Control is shifted backwards in time • The error signal (deviation from the set point) almost vanishes • Other example: temperature control • Predict temperature changes caused by another heater

  26. Overview • Short Overview of different control methods • Correlation Based Learning • ISO Learning • Comparison to other Methods ([Wörgötter05]) • TD Learning • STDP • ICO Learning ([Porr06]) • Learning Receptive Fields([Kulvicius06])

  27. Development of Receptive Fields through Temporal Sequence Learning [Kulvicius06] • Develop receptive fields by ICO learning • Learn behavior and receptive fields simultaneously • Usually these two learning processes are considered separately • First approach in which the receptive field and the behavior are trained simultaneously!! • Shows the application of ICO learning to high-dimensional input spaces

  28. Line Following • System: • The robot should learn to follow a line painted on the ground better • Reactive input: • x0… pixels at the bottom of the image • Predictive input: • x1… pixels in the middle of the image • Use 10 different filters in the filter bank (resonators) • Reflexive output: • Brings the robot back to the line • Not a smooth behavior • Motor output (assumed mapping sketched below): • S… constant speed • v modifies the speed and steering of the robot • Use left-right symmetry
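
A possible mapping from the learner outputs to the wheel speeds; the exact mapping used in [Kulvicius06] may differ, this assumed form just illustrates how v can modify both speed and steering while keeping left-right symmetry:

    def motor_commands(v_left, v_right, S=1.0):
        """Wheel speeds from the left/right learner outputs (assumed mapping).
        The sum changes the speed, the difference changes the steering."""
        return S + v_left, S + v_right      # (left wheel, right wheel)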

  29. Line Following • Simple System • Fixed sensor banks, all pixels are summed up • Input x1 predicts x0

  30. Line Following • Three different tracks • Steep, shallow, sharp • For one learning experiment always the same track is used • The robot steers much more smoothly • Usually 1 trial is enough for learning • Videos: • Without learning • Steep • Sharp

  31. Line Following: Receptive Fields • Receptive fields • Use 225 pixels for the far sensors • Use an individual filter bank for each pixel • 10 filters per pixel • Left-right symmetry: • The left receptive field is a mirror image of the right (see the sketch below)
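
A small sketch of how the per-pixel weights form a receptive field: one weight per pixel and filter, and the visualised field is the sum of the filter weights per pixel. The 15x15 layout of the 225 far-sensor pixels is an assumption for illustration:

    import numpy as np

    n_pixels, n_filters = 225, 10
    rho = np.zeros((n_pixels, n_filters))       # one ICO weight per pixel and filter

    def receptive_field(rho, shape=(15, 15)):
        """Receptive field plot: sum of the filter weights of each pixel,
        reshaped to the (assumed) 15x15 layout of the far sensor region."""
        return rho.sum(axis=1).reshape(shape)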

  32. Line Following: Receptive Fields • Results • Lower learning rates have to be used • More trials are needed (3 to 6 trials) • Different RFs are learned for different tracks • Steep and sharp track; the plots show the sum of all filter weights for each pixel

  33. Conclusion • Correlation based learning • Tries to minimize the influence of disturbances • Easier to learn than reinforcement learning • The framework is less general • Questions: • When to apply correlation based learning and when reinforcement learning? • How is it done by animals/humans? • How can these two methods be combined? • Correlation learning in an early learning stage • RL for fine tuning • ICO learning • Improvement of ISO learning • More stable, higher learning rates can be used • One-shot learning is possible

  34. Literature: • [Wörgötter05]: F. Wörgötter and B. Porr, Temporal Sequence Learning, Prediction and Control: A Review of Different Control Methods and Their Relation to Biological Mechanisms • [Porr03]: B. Porr and F. Wörgötter, Isotropic Sequence Order Learning • [Porr06]: B. Porr and F. Wörgötter, Strongly Improved Stability and Faster Convergence of Temporal Sequence Learning by Utilising Input Correlations Only • [Kulvicius06]: T. Kulvicius, B. Porr and F. Wörgötter, Behaviourally Guided Development of Primary and Secondary Receptive Fields through Temporal Sequence Learning
