
Learning and Memory



Presentation Transcript


  1. Learning and Memory: Reinforcement Learning

  2. Learning Levels • Darwinian • Trial -> death or children • Skinnerian • Reinforcement learning • Popperian • Our hypotheses die in our stead • Gregorian • Tools and artifacts

  3. Machine Learning • Unsupervised • Cluster similar items • Association (no “right” answer) • Supervised • For observations/features, teacher gives the correct “answer” • E.g., Learn to recognize categories • Reinforcement • Take action, observe consequence • bad dog!

  4. Pavlovian Conditioning • Pavlov • Food causes salivation • Sound before food • -> sound causes salivation • Learn to associate sound with food

  5. Operant Conditioning

  6. Associative Memory • Hebbian Learning • When two connected neurons are both excited, the connection between them is strengthened • “Neurons that fire together, wire together”
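
Hebb's rule has a direct computational form. A minimal sketch of a Hebbian weight update, assuming a simple rate-based neuron model; the learning rate and activity values are illustrative, not from the slides:

```python
import numpy as np

def hebbian_update(w, pre, post, lr=0.01):
    # Strengthen each connection in proportion to the joint activity
    # of its pre- and post-synaptic neurons ("fire together, wire together").
    return w + lr * np.outer(post, pre)

# Two presynaptic inputs driving one postsynaptic neuron.
w = np.zeros((1, 2))
pre = np.array([1.0, 0.0])   # only the first input is active
post = np.array([1.0])       # the postsynaptic neuron fires
for _ in range(10):
    w = hebbian_update(w, pre, post)
print(w)  # only the co-active connection was strengthened: [[0.1, 0.0]]
```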

  7. Explanations of Pavlov • S-S (stimulus-stimulus) • Dogs learn to associate sound with food • (and salivate based on “thinking” of food) • S-R (stimulus-response) • Dogs learn to salivate based on the tone • (and salivate directly without “thinking” of food) • How to test? • Do dogs think lights are food?

  8. Conditioning in humans • Two pathways • The “slow” pathway dogs use • Cognitive (conscious) learning • How to test this hypothesis? • Learn to blink based on a stimulus associated with a puff of air

  9. Blocking • Tone -> Shock -> Fear • Tone -> Fear • Tone + Light -> Shock -> Fear • Light -> ?

  10. Rescorla-Wagner Model • Hypothesis: learn from observations that are surprising • Vn <- Vn + c(Vmax - Vn) • ΔVn = c(Vmax - Vn) • Vn is the strength of association between US and CS • c is the learning rate • Predictions • contingency
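
The slide's update is written for a single stimulus. In the usual compound-stimulus form of the model, every stimulus present on a trial shares one prediction error, which is what lets the model predict blocking (slide 9). A minimal sketch, with an illustrative learning rate and trial counts:

```python
def rw_trial(V, present, c=0.2, V_max=1.0):
    # Rescorla-Wagner: every stimulus present on the trial shares
    # one prediction error, V_max minus the summed association.
    error = V_max - sum(V[s] for s in present)   # the "surprise"
    for s in present:
        V[s] += c * error
    return V

# Blocking (slide 9): pretrain tone -> shock, then train tone+light -> shock.
V = {"tone": 0.0, "light": 0.0}
for _ in range(50):
    rw_trial(V, ["tone"])           # tone alone comes to predict the US
for _ in range(50):
    rw_trial(V, ["tone", "light"])  # compound trials: little surprise left
print(V)  # tone near 1.0, light near 0.0 -- the light is "blocked"
```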

  11. Limitations of Rescorla-Wagner • Tone -> food • Light -> food • Tone + light -> ?

  12. Reinforcement Learning • Often one takes a long sequence of actions and only discovers the result of these actions later (e.g., when you win or lose a game) • Q: How can one ascribe credit (or blame) to one action in a sequence of actions? • A: By noting surprises

  13. Consider a game • Estimate probability of winning • Take an action, see how the opponent (or the world) responds • Re-estimate probability of winning • If it is unchanged, you learned nothing • If it is higher, the initial state was better than you thought • If it is lower, the state was worse than you thought

  14. Tic-tac-toe example • Decision tree • Alternate layers give possible moves for each player

  15. Reinforcement Learning • State • E.g. board position • Action • E.g. move • Policy • State -> Action • Reward function • State -> utility • Model of the environment • State, action -> state

  16. Definitions of key terms • State • What you need to know about the world to predict the effect of an action • Policy • What action to take in each state • Reward function • The cost or benefit of being in a state • (e.g. points won or lost, happiness gained or lost)
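
To pin these definitions down, here is a minimal sketch of the same vocabulary as Python type signatures; the alias names are mine, chosen to mirror the slides:

```python
from typing import Callable, Hashable

State = Hashable                          # e.g. a board position
Action = Hashable                         # e.g. a move
Policy = Callable[[State], Action]        # what action to take in each state
RewardFn = Callable[[State], float]       # cost/benefit of being in a state
Model = Callable[[State, Action], State]  # effect of an action on the world
```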

  17. Value Iteration • Value Function • Expected value of a policy over time = sum of the expected rewards • V(s) <- V(s) + c[V(s’) - V(s)] • s = state before the move • s’ = state after the move • “temporal difference” learning
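
A minimal sketch of this temporal-difference update, using the slide's c as the learning rate; the three-state episode is an invented toy, not from the slides:

```python
def td_update(V, s, s_next, c=0.1):
    # Temporal-difference step: move V(s) toward V(s'),
    # i.e. learn only from the surprise V(s') - V(s).
    V[s] += c * (V[s_next] - V[s])

# Toy episode: two states leading to a terminal "win" worth 1.0.
V = {"start": 0.5, "mid": 0.5, "win": 1.0}
for _ in range(100):
    td_update(V, "start", "mid")
    td_update(V, "mid", "win")
print(V)  # values of earlier states drift toward the terminal value
```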

  18. Mouse in Maze Example • (figures: policy and value function)

  19. Dopamine & Reinforcement

  20. Exploration - Exploitation • Exploration • Always try a different route to work • Exploitation • Always take the best route to work that you have found so far • Learning requires exploration • Unless the environment is noisy (the noise can do the exploring for you)
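
The slides don't name a mechanism for balancing the two, but a standard one is an epsilon-greedy rule: exploit the best-known option most of the time, explore a random one occasionally. A sketch, with the routes and the 10% exploration rate as illustrative assumptions:

```python
import random

def epsilon_greedy(utilities, epsilon=0.1):
    # Explore with probability epsilon, otherwise exploit the
    # best option found so far.
    if random.random() < epsilon:
        return random.choice(list(utilities))  # explore: a random route
    return max(utilities, key=utilities.get)   # exploit: best known route

# Commute times in minutes; negate so that higher utility = shorter route.
times = {"highway": 22.0, "back roads": 25.0, "bridge": 18.0}
utilities = {route: -t for route, t in times.items()}
print(epsilon_greedy(utilities))  # usually "bridge", occasionally a random try
```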

  21. RL can be very simple • A simple learning algorithm leads to an optimal policy • Without predicting the effects of the agent's actions • Without predicting immediate payoffs • Without planning • Without an explicit model of the world

  22. How to play chess • Computer • Evaluation function for board positions • Fast search • Human (grandmaster) • Memorize tens of thousands of board positions and what to do in them • Do a much smaller search!

  23. AI and Games • Chess: deterministic; position evaluation + search • Backgammon: stochastic; policy

  24. Scaling up value functions • For a small number of states • Learn the value of each state directly • Not possible for Backgammon • ~10^20 states • Learn a mapping from features to value • Then use reinforcement learning to get improved value estimates
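
A sketch of the feature-based idea: a linear value function trained with a TD-style weight update. The features and training targets are invented for illustration; a real backgammon player would use richer features (or a neural network):

```python
import numpy as np

def value(w, feats):
    # With ~10^20 states a lookup table is hopeless, so estimate
    # the value as a (here linear) function of position features.
    return float(w @ feats)

def td_update(w, feats, target, c=0.05):
    # Move the estimate for these features toward the target
    # (the next position's value, or the final outcome of the game).
    error = target - value(w, feats)
    return w + c * error * feats  # gradient of a linear value is feats

# Toy training data: [bias, normalized material lead] -> did we win?
w = np.zeros(2)
winning = np.array([1.0, 0.8])
losing = np.array([1.0, -0.6])
for _ in range(200):
    w = td_update(w, winning, 1.0)
    w = td_update(w, losing, 0.0)
print(value(w, np.array([1.0, 0.7])))  # generalizes to an unseen position
```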

  25. Q-learning • Instead of the value of a state, learn the value Q(s, a) of taking an action a from a state s • Optimal policy: take the best action • max_a Q(s, a) • Learning rule • Q(s, a) <- Q(s, a) + c[r_t + max_b Q(s', b) - Q(s, a)]
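
A minimal tabular sketch of this rule. One caveat: the standard Q-learning update discounts the future term by a factor gamma, which the slide omits; gamma = 1 below reproduces the slide's rule exactly:

```python
from collections import defaultdict

Q = defaultdict(float)  # Q[(state, action)]; unseen pairs start at 0

def q_update(s, a, r, s_next, actions, c=0.1, gamma=1.0):
    # The slide's rule, plus the usual discount factor gamma
    # (gamma = 1 matches the slide exactly).
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += c * (r + gamma * best_next - Q[(s, a)])

def greedy(s, actions):
    # Optimal policy once Q has converged: take the best action.
    return max(actions, key=lambda a: Q[(s, a)])

# Toy chain: moving "right" from state 0 via state 1 earns a reward of 1.
actions = ["left", "right"]
for _ in range(100):
    q_update(0, "right", 0.0, 1, actions)
    q_update(1, "right", 1.0, 2, actions)
print(greedy(0, actions))  # -> "right"
```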

  26. Learning to Sing • Zebra finch hears its father's song • Memorizes it • Then practices for months to learn to reproduce it • What kind of learning is this?

  27. Controversies? • Is conditioning good? • How much learning do people do? • Innateness, learning, and free will
