Dopamine, Uncertainty and TD Learning

Yael Niv, Michael Duff, Peter Dayan. CoSyNe’04.


Presentation Transcript


  1. Dopamine, Uncertainty and TD Learning
     Yael Niv, Michael Duff, Peter Dayan
     CoSyNe’04

  2. What does Dopamine encode?
     • Important neuromodulator
     • Neurological/psychiatric disorders
     • Drug addiction/self stimulation
     • Fundamental role in RL
     • Classical/Pavlovian conditioning
     • Instrumental/operant conditioning
     • DA neurons respond to:
       • Unexpected (appetitive) rewards
       • Stimuli predicting (appetitive) rewards
       • Withdrawal of expected rewards
       • Novel/salient stimuli

  3. What does Dopamine encode?
     • DA represents some aspect of reward, but not rewards as such.

  4. The TD Hypothesis of Dopamine
     • DA encodes the reward prediction error δ(t)
     • Precise theory for the generation of DA firing patterns
     • Compelling account for the role of DA in classical conditioning
     [Figure: DA firing and δ(t) traces around stimulus and reward, three panels]
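The prediction error on this slide can be made concrete with a minimal sketch. It assumes the standard textbook form δ(t) = r(t) + γV(t+1) − V(t); the slides do not spell the formula out, and the function name `td_error` and the discount parameter `gamma` are illustrative choices, not taken from the presentation.

```python
def td_error(r_t, v_t, v_next, gamma=1.0):
    """Temporal-difference error: delta(t) = r(t) + gamma * V(t+1) - V(t).

    Assumed textbook form; gamma=1.0 (no discounting) is an illustrative default.
    """
    return r_t + gamma * v_next - v_t


# Unexpected reward: nothing was predicted, so the error is positive.
unexpected = td_error(r_t=1.0, v_t=0.0, v_next=0.0)

# Fully predicted reward: the prediction cancels the reward, so the error is zero.
predicted = td_error(r_t=1.0, v_t=1.0, v_next=0.0)

# Omitted reward: a prediction with no reward gives a negative error.
omitted = td_error(r_t=0.0, v_t=1.0, v_next=0.0)
```

These three cases correspond to the response patterns listed earlier: unexpected rewards, predicted rewards, and withdrawal of expected rewards.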

  5. But: Fiorillo, Tobler & Schultz 2003
     • Introduce inherent uncertainty into the classical conditioning paradigm
     • Five visual stimuli indicating different reward probabilities: P = 0, ¼, ½, ¾, 1
     [Figure: CS = 2 sec visual stimulus; US (probabilistic) = drops of juice]

  6. Fiorillo, Tobler & Schultz 2003
     • At stimulus time: DA represents mean expected reward
     • Interesting: a ramp in activity up to reward (highest for p=½)
     • Hypothesis: DA ramp encodes uncertainty in reward

  7. Dopamine: Uncertainty or TD error?
     • No apparent reason for ramp
     • The ramp is predictable from the stimulus
     • TD predicts away predictable quantities => contradiction!
     • Side issue: the ramp is like a constantly surprising reward; it can’t influence action choice

  8. A closer look at FTS’s results:
     At time of reward:
     • Prediction errors result from uncertainty
     • Crucially: positive and negative errors cancel out
     [Figure: panels for p = 0.5 and p = 0.75]
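The cancellation claim can be checked with exact arithmetic. This is a sketch under one assumption not stated on the slide: the learned value just before reward equals the mean reward, V = p·r. The function name `mean_error_at_reward` is hypothetical.

```python
from fractions import Fraction


def mean_error_at_reward(p, reward=1):
    """Average prediction error at reward time, assuming V = p * reward."""
    p = Fraction(p)
    value = p * reward
    error_if_rewarded = reward - value   # positive error, occurs with probability p
    error_if_omitted = -value            # negative error, occurs with probability 1 - p
    return p * error_if_rewarded + (1 - p) * error_if_omitted


# The five reward probabilities used by Fiorillo, Tobler & Schultz: 0, 1/4, 1/2, 3/4, 1.
errors = [mean_error_at_reward(Fraction(k, 4)) for k in range(5)]
```

Under this assumption every entry of `errors` is exactly zero, so a symmetric code for δ(t) would show no net response at reward time for any p.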

  9. A closer look at FTS’s results:
     • TD error δ(t) can be positive or negative
     • Neuronal firing rate is only positive (negative values are coded relative to base firing rate)
     But:
     • DA base firing rate is low -> asymmetric encoding of δ(t)
     [Figure: DA response vs. δ(t), annotated 270% and 55%]

  10. Modeling TD with asymmetric errors
      • Tapped delay line
      • Standard online TD learning
      • Fixed learning rate
      • Negative δ(t) scaled by d=1/6 prior to PSTH
      Learning proceeds normally (without scaling)
      • Necessary to produce the right predictions
      • Can be biologically plausible
      [Figure: model schematic with inputs x(1), x(2), …, values V(1) … V(20), δ(t), r(t)]
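The ingredients listed on this slide can be put together in a small simulation. This is a rough sketch, not the authors' model: the 20 time steps and d = 1/6 come from the slide, while the learning rate `alpha`, the trial count, the burn-in, the unit reward, the absence of discounting, and the function name `simulate_psth` are all assumptions made for illustration.

```python
import random


def simulate_psth(p, n_steps=20, n_trials=5000, alpha=0.1, d=1/6, seed=0):
    """Average asymmetrically coded TD error per time step (a simulated PSTH).

    Tapped delay line: one value per time step since stimulus onset.
    Learning uses the unscaled error; only the recorded signal is scaled.
    """
    rng = random.Random(seed)
    values = [0.0] * (n_steps + 1)   # values[n_steps] is the post-reward state, fixed at 0
    psth = [0.0] * n_steps
    burn_in = n_trials // 2          # assumed: discard early trials while values settle
    for trial in range(n_trials):
        rewarded = rng.random() < p
        for t in range(n_steps):
            reward = 1.0 if (rewarded and t == n_steps - 1) else 0.0
            delta = reward + values[t + 1] - values[t]   # TD error, no discounting
            values[t] += alpha * delta                   # standard online TD update
            if trial >= burn_in:
                # Negative errors are scaled by d before averaging into the PSTH.
                psth[t] += delta if delta >= 0 else d * delta
    n_counted = n_trials - burn_in
    return [total / n_counted for total in psth]


# One simulated PSTH per reward probability used in the experiment.
ramps = {p: simulate_psth(p) for p in (0.0, 0.25, 0.5, 0.75, 1.0)}
```

With a fixed learning rate the values keep fluctuating with recent reward history, so the within-trial errors do not vanish on individual trials. Averaging them after asymmetric scaling leaves a net positive signal that grows toward the time of reward, which is the mechanism the slides propose for the ramp.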

  11. Modeling TD with asymmetric errors
      TD learning with asymmetric prediction errors replicates the recorded data accurately.
      • Ramps result from asymmetrically coded prediction errors propagating back to the stimulus
      • The ramp is an artifact of summing PSTHs over nonstationary recent reward histories

  12. DA: Uncertainty or Temporal Difference?
      Analytically deriving the maximum error at the time of the reward we get:
      [equation not recovered]
      => the ramp is indeed highest for P=½
      But:
      • DA encodes nothing but temporal difference error!
      • Experimental test: ramp as within or between trial phenomenon?
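One way to see why the coded error peaks at P=½ is the following sketch. It is not the formula from the slide, which is not available here. It assumes a reward of magnitude r delivered with probability p, a learned value V = p r just before reward, and negative errors scaled by d (d = 1/6 in the model above).

```latex
\begin{align*}
\delta_{+} &= r - V = (1-p)\,r  && \text{with probability } p,\\
\delta_{-} &= 0 - V = -p\,r     && \text{with probability } 1-p,\\
\mathbb{E}\!\left[\delta_{\text{coded}}\right]
  &= p\,\delta_{+} + d\,(1-p)\,\delta_{-}
   = (1-d)\,p\,(1-p)\,r .
\end{align*}
```

For any d < 1 this is a positive multiple of p(1−p), which is largest at p = ½ and zero at p = 0 and p = 1. With symmetric coding (d = 1) the expression vanishes for every p.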

  13. Trace conditioning: A puzzle and its resolution
      • Same (if not more) uncertainty, but… no DA ramping! (Fiorillo et al.; Morris, Arkadir, Nevet, Vaadia & Bergman)
      • Resolution: lower learning rate in trace conditioning eliminates ramp
      [Figure: CS = short visual stimulus; trace period; US (probabilistic) = drops of juice]

  14. Conclusions
      Preserve the TD hypothesis of Dopamine:
      • No explicit coding of uncertainty
      • Ramping explained by neural constraints
      • Explains the disappearance of the ramp in trace conditioning
      Important challenges to the TD hypothesis:
      • Conditioned inhibition
      • Effects of timing