9. Modular networks, motor control, and reinforcement learning Fundamentals of Computational Neuroscience (The 2nd Ed.), T. P. Trappenberg, 2010. Lecture Notes on Brain and Computation Byoung-Tak Zhang Biointelligence Laboratory School of Computer Science and Engineering Graduate Programs in Cognitive Science, Brain Science and Bioinformatics Brain-Mind-Behavior Concentration Program Seoul National University E-mail: btzhang@bi.snu.ac.kr This material is available online at http://bi.snu.ac.kr/
9.1 Modular mapping networks • Modular networks • Large-scale networks with constraints • Modular specialization in the brain • Mixture of experts • Combining feedforward mapping networks • Experts: working modules
9.1 Modular mapping networks • Mixture of experts (cont’d) • Property: universal function approximator → can solve any mapping task, e.g. the abstract function of Fig. 9.2 • Divide-and-conquer strategy • Training the networks • Assign the experts to particular tasks • Train each expert on its designated task • Train the gating network (credit-assignment problem) • Not appropriate for biological systems (a code sketch of the mixture idea follows)
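A minimal sketch of the mixture-of-experts idea, assuming linear experts and a softmax gating network; all names, sizes, and parameters are illustrative and not taken from the book.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Each expert is a simple linear map; the gating network assigns a softmax
# weight to every expert for the current input, and the module outputs the
# gated combination of the expert outputs.
n_in, n_out, n_experts = 4, 2, 3
experts = [rng.normal(size=(n_out, n_in)) for _ in range(n_experts)]
W_gate = rng.normal(size=(n_experts, n_in))

def mixture_output(x):
    g = softmax(W_gate @ x)                  # gating coefficients, sum to 1
    ys = np.array([E @ x for E in experts])  # (n_experts, n_out) expert outputs
    return g @ ys                            # gated combination

x = rng.normal(size=n_in)
print(mixture_output(x))
```

Training would adjust each expert mainly on the inputs its gate selects, which is where the credit-assignment problem mentioned above comes from.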
9.1 Modular mapping networks • The ‘what-and-where’ task • Two visual pathways • Ventral visual pathway (what) • Dorsal visual pathway (where) • Modular networks • What → object recognition (what) • Where → location of objects (where)
9.1 Modular mapping networks • The ‘what-and-where’ task • Jacobs’ model (1991) • Input channels (26): retinal (25) & task specification (1) • Output channels (18): objects (9) & locations (9) • A single network with 36 hidden nodes trained with back-propagation • Conflicting training information • Temporal cross-talk • Spatial cross-talk • Task decomposition
9.1 Modular mapping networks • Modular network for the what-and-where task • Architectural constraints • Where: linearly separable → a single-layer network → a simple expert without a hidden layer • What: not linearly separable → needs hidden nodes • Jordan’s study • Takes the physical location of nodes into account • Objective function • The 1st term: error term • The 2nd term: distance bias (a hedged sketch of such an objective follows)
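A hedged sketch of an objective with the two terms named above; the exact form used in Jordan’s study is not reproduced here, and the symbols (λ for the trade-off, d_{ij} for the physical distance between units i and j) are illustrative.

```latex
% Error term plus a bias penalizing long (physically distant) connections
E \;=\; \underbrace{\sum_{\mu,k}\big(y_k^{\mu}-t_k^{\mu}\big)^2}_{\text{error term}}
\;+\; \underbrace{\lambda \sum_{i,j} d_{ij}\, w_{ij}^2}_{\text{distance bias}}
```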
9.1 Modular mapping networks • Product of experts (G. Hinton) • Sum of experts: normalized expert outputs are averaged • Experts with wide distributions dilute the result, so the average does not give a precise answer • Product of experts: opinions of experts outside their domain of expertise have less of an effect (the two combination rules are contrasted below)
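For contrast, the standard forms of the two combination rules; the normalization over x′ in the product is written for a discrete variable.

```latex
% Sum (mixture) of experts: a simple average of the expert distributions
p_{\text{mix}}(\mathbf{x}) \;=\; \frac{1}{M}\sum_{m=1}^{M} p_m(\mathbf{x})

% Product of experts: broad (uninformative) experts have little influence
p_{\text{PoE}}(\mathbf{x}) \;=\;
  \frac{\prod_{m=1}^{M} p_m(\mathbf{x})}{\sum_{\mathbf{x}'} \prod_{m=1}^{M} p_m(\mathbf{x}')}
```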
9.2 Coupled attractor networks • Coupled attractor networks • The combination of basic recurrent networks • Distinguish between network groups • Strongly connected subsystem (intra-module) • Weakly connected subsystem (inter-module)
9.2 Coupled attractor networks • Imprinted and composite patterns • Comparing one large attractor network with two separate attractor networks • Objects described by two independent features • Left-right visual fields (Fig. 9.5B) • Two independent sub-networks (1000 nodes each) • # of weights: 1000² × 2 = 2×10⁶ • One attractor network • At least 138,000 nodes • # of weights: 138,000² • The reason to use large single networks • Specific combinations of features • Green square vs. blue triangle
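The weight counts above can be checked in a few lines; the 138,000-node figure is quoted from the slide as given.

```python
# Weight-count comparison: two independent 1000-node modules versus one
# fully connected network large enough for the combined task.
two_modules = 2 * 1000 ** 2        # 2 x 10^6 weights
single_net = 138_000 ** 2          # ~1.9 x 10^10 weights
print(f"two modules: {two_modules:.1e} weights")
print(f"single net : {single_net:.1e} weights")
```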
9.2 Coupled attractor networks • Signal-to-noise analysis • Provides insight into the behavior of coupled attractor networks • N: # of nodes, N': # of nodes in each module, m: # of modules • Weights given by the Hebbian rule • New weight matrix with separate within-module and between-module components
9.2 Coupled attractor networks • Evaluation of the stability of the imprinted pattern • Simplified z2 instead of z1
9.2 Coupled attractor networks • The case of starting the network from states that correspond to different sub-patterns in the different modules • The starting state • After one update • Signal and noise • Lower bound on g (using Eq. 9.11; Fig. 9.6B)
9.2 Coupled attractor networks • The reverse case • The starting state • After one update • Signal and noise • Upper bound on the g-factor (Fig. 9.6B) • Hints at the possible interactions between sub-networks in modular networks (a simulation sketch follows)
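A minimal simulation sketch of two coupled attractor modules, assuming Hopfield-style dynamics and a single coupling factor g scaling the inter-module weight blocks; the sizes, the value of g, and the update scheme are illustrative, not the book’s code.

```python
import numpy as np

rng = np.random.default_rng(1)

n, p, g = 200, 3, 0.3                       # nodes per module, patterns, coupling
xi = rng.choice([-1, 1], size=(p, 2 * n))   # imprinted composite patterns

W = (xi.T @ xi) / (2 * n)                   # Hebbian weights over both modules
np.fill_diagonal(W, 0.0)
W[:n, n:] *= g                              # weaken inter-module connections
W[n:, :n] *= g

# Start from sub-patterns of two *different* imprinted patterns
s = np.concatenate([xi[0, :n], xi[1, n:]]).astype(float)
for _ in range(20):                         # synchronous sign updates
    s = np.where(W @ s >= 0, 1.0, -1.0)

print("overlap with pattern 0:", s @ xi[0] / (2 * n))
print("overlap with pattern 1:", s @ xi[1] / (2 * n))
```

Sweeping g shows the transition between modules that settle on their own sub-patterns and a network that is pulled into a single composite pattern, which is what the bounds on g describe.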
9.3 Sequence learning • Sequential aspects of brain processing • Some memories trigger other memories → a dynamic system • A modular architecture gives associative networks some advantages for sequence learning • Hetero-association • Auto-associative weights: clean up a noisy version of the current pattern • Hetero-associative weights: drive the system towards a (noisy) version of the next pattern • Hopfield network (a sketch of this combination follows)
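A minimal sketch of sequence recall with combined auto- and hetero-associative weights, assuming plain Hopfield-style synchronous updates; the mixing factor lam and the use of a strong hetero term (rather than delayed synapses) are illustrative simplifications.

```python
import numpy as np

rng = np.random.default_rng(2)

n, p = 200, 5
xi = rng.choice([-1, 1], size=(p, n))     # a sequence of p patterns

W_auto = (xi.T @ xi) / n                  # cleans up the current pattern
W_hetero = (xi[1:].T @ xi[:-1]) / n       # pushes towards the next pattern
lam = 2.0                                 # strength of the hetero-associative term

s = xi[0].astype(float)
for t in range(p - 1):
    s = np.where((W_auto + lam * W_hetero) @ s >= 0, 1.0, -1.0)
    overlaps = xi @ s / n
    print(f"step {t + 1}: closest pattern = {overlaps.argmax()}")
```

With a large enough lam the state hops to the next imprinted pattern on every update, so the printout walks through the stored sequence.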
9.3 Sequence learning • Modular networks for sequence learning
9.4 Complementary memory systems • Distributed model of working memory • R. O’Reilly’s model • PFC (prefrontal cortex) • Many independent recurrent subsystems • Maintains information over short periods • HCMP (hippocampus and related areas) • Rapid learning of associations for episodic memory • PMC (perceptual and motor cortex) • Semantic memory and action repertoires
9.4 Complementary memory systems • Limited capacity of working memory • Magical number 7 ± 2 • Task of remembering numbers • Limitations of working memory • Various hypotheses for why working memory is limited • A bottleneck in the information-processing capabilities of the brain (D. Broadbent) • Limits in attentional systems (N. Cowan) • Reverberating neural models
9.4 Complementary memory systems • The spurious synchronization hypothesis • Luck and Vogel’s study • Computational neuroscience model
9.4 Complementary memory systems • The interacting-reverberating-memory hypothesis
9.5 Motor learning and control • Motor learning • Activities: catching a ball, riding a bicycle • Requires more time than associative learning • Uses reinforcement learning • Feedback controller • Using a feedforward mapping network
9.5 Motor learning and control • Forward and inverse model controller
9.5 Motor learning and control • The cerebellum and motor controller
9.6 Reinforcement learning • Supervised learning vs. reinforcement learning • Answers vs. feedback (rewards) • Classical conditioning and the reinforcement learning problem • Conditioning • Temporal credit-assignment problem
9.6 Reinforcement learning • Formulation • Policies and value functions • r(s, a): reward function (s: state, a: action) • Goal: maximizing the future reward R • R(t): sum of the rewards r(s, a) received in some time window following t • Policies: π(s, a) • State value function: V^π(s) • Action value function: Q^π(s, a) • Temporal difference learning • Off-policy vs. on-policy (standard definitions are sketched after this slide)
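A hedged summary of the standard definitions behind the notation above (the discount factor γ is introduced formally on the temporal difference slide):

```latex
R(t) \;=\; \sum_{k \ge 0} \gamma^{k}\, r\big(s(t{+}k),\, a(t{+}k)\big), \qquad 0 < \gamma < 1

V^{\pi}(s)   \;=\; \mathrm{E}_{\pi}\!\big[\, R(t) \mid s(t) = s \,\big]
Q^{\pi}(s,a) \;=\; \mathrm{E}_{\pi}\!\big[\, R(t) \mid s(t) = s,\ a(t) = a \,\big]
```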
9.6 Reinforcement learning • Temporal delta rule • Learning from reward within neural architectures • Episodes → V^π(s) • r_i^in(s): a specific pattern of rates in the input channels • w_i(t): the weight at time t (a hedged form of the update follows)
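A hedged sketch of the delta-rule update implied here, assuming the reward prediction v(t) is a linear readout of the input rates (Rescorla-Wagner form); ε is a learning rate.

```latex
v(t) \;=\; \sum_i w_i(t)\, r_i^{\mathrm{in}}(s)

w_i(t{+}1) \;=\; w_i(t) \;+\; \epsilon\, \big(r(t) - v(t)\big)\, r_i^{\mathrm{in}}(s)
```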
9.6 Reinforcement learning • Temporal difference learning • Limitation of the temporal delta rule: it predicts reward at the next time step only • Rewards at different (later) time steps • Introduction of a discount factor γ (0 < γ < 1) • The perfect prediction V* • Minimizing the temporal difference error (a TD(0) sketch follows)
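A minimal TD(0) sketch on a toy chain task with a single reward at the end; the task, parameter values, and variable names are illustrative, not the book’s MATLAB code for Fig. 9.16.

```python
import numpy as np

n_states, gamma, eps, episodes = 6, 0.9, 0.1, 500
V = np.zeros(n_states)                        # state value estimates

for _ in range(episodes):
    s = 0
    while s < n_states - 1:                   # walk right until the terminal state
        s_next = s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        delta = r + gamma * V[s_next] - V[s]  # temporal difference error
        V[s] += eps * delta
        s = s_next

print(np.round(V, 2))   # approaches gamma**(steps to reward); terminal stays 0
```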
9.6 Reinforcement learning • The learning of a state value function • The learning of an action value function
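An on-policy (SARSA) action-value sketch on the same toy chain, with ε-greedy action selection and random tie-breaking; again all names and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

n_states, n_actions = 6, 2
gamma, eta, eps_greedy, episodes = 0.9, 0.1, 0.1, 2000
Q = np.zeros((n_states, n_actions))           # action value estimates

def step(s, a):
    s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    r = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r

def choose(s):
    if rng.random() < eps_greedy:             # epsilon-greedy exploration
        return int(rng.integers(n_actions))
    best = np.flatnonzero(Q[s] == Q[s].max()) # greedy with random tie-breaking
    return int(rng.choice(best))

for _ in range(episodes):
    s = 0
    a = choose(s)
    while s < n_states - 1:
        s_next, r = step(s, a)
        a_next = choose(s_next)
        Q[s, a] += eta * (r + gamma * Q[s_next, a_next] - Q[s, a])  # SARSA update
        s, a = s_next, a_next

print(np.round(Q, 2))   # moving right (action 1) ends up preferred in every state
```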
9.6 Reinforcement learning • Simulation: MATLAB code (produces Fig. 9.16)
9.6 Reinforcement learning • The actor-critic scheme and the basal ganglia • The actor-critic scheme • Turns temporal difference learning into a control method • Proposed by Sutton and Barto • Actor: the motor command generator • Adaptive critic: estimates the value function and guides the actor’s actions (a sketch follows)
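A minimal tabular actor-critic sketch on the same toy chain; the critic’s TD error δ both updates the value estimate and reinforces the actor, loosely analogous to the reward-prediction-error signal discussed for the basal ganglia. Names and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

n_states, n_actions = 6, 2
gamma, eta_v, eta_p, episodes = 0.9, 0.1, 0.1, 2000
V = np.zeros(n_states)                      # critic: state value estimates
prefs = np.zeros((n_states, n_actions))     # actor: action preferences (softmax)

def policy(s):
    p = np.exp(prefs[s] - prefs[s].max())
    return p / p.sum()

for _ in range(episodes):
    s = 0
    while s < n_states - 1:
        a = int(rng.choice(n_actions, p=policy(s)))
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        delta = r + gamma * V[s_next] - V[s]   # TD error
        V[s] += eta_v * delta                  # critic update
        prefs[s, a] += eta_p * delta           # actor update
        s = s_next

print(np.round(V, 2))
print(np.round(prefs, 2))   # preference for moving right grows in every state
```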
9.6 Reinforcement learning • Information stream • The basal ganglia • Anatomical overview
9.6 Reinforcement learning • Signals of neural activities in the basal ganglia • MATLAB code for Fig. 9.20
9.6 Reinforcement learning • Q-learning for the basal ganglia functions
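A minimal off-policy Q-learning sketch on the same toy chain; compared with SARSA, the bootstrap uses the maximum over next actions rather than the sampled next action. Names and parameters are illustrative, not the book’s code.

```python
import numpy as np

rng = np.random.default_rng(5)

n_states, n_actions = 6, 2
gamma, eta, eps_greedy, episodes = 0.9, 0.1, 0.1, 2000
Q = np.zeros((n_states, n_actions))

def choose(s):
    if rng.random() < eps_greedy:             # epsilon-greedy exploration
        return int(rng.integers(n_actions))
    best = np.flatnonzero(Q[s] == Q[s].max()) # greedy with random tie-breaking
    return int(rng.choice(best))

for _ in range(episodes):
    s = 0
    while s < n_states - 1:
        a = choose(s)
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        Q[s, a] += eta * (r + gamma * Q[s_next].max() - Q[s, a])   # off-policy update
        s = s_next

print(np.round(Q, 2))   # moving right (action 1) is preferred in every state
```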