
Classical Situation



Presentation Transcript


  1. Classical Situation • World deterministic • State observable (Figure: environment with two goal locations labeled "heaven" and "hell")

  2. MDP-Style Planning • Policy • Universal Plan • Navigation function [Koditschek 87, Barto et al. 89] • World stochastic • State observable (Figure: environment with goal locations labeled "heaven" and "hell")

  3. Stochastic, Partially Observable [Sondik 72] [Littman/Cassandra/Kaelbling 97] (Figure: environment with goal locations labeled "heaven?" and "hell?" and a sign)

  4. Stochastic, Partially Observable (Figure: two possible worlds, one with heaven on the left and hell on the right, the other with the locations swapped; each world contains a sign)

  5. Stochastic, Partially Observable (Figure: the robot starts with a 50%/50% belief over the two possible worlds; the goal locations are unknown ("?"), and a sign identifies the true world)

  6. Robot Planning Frameworks

  7. MDP-Style Planning • Policy • Universal Plan • Navigation function [Koditschek 87, Barto et al. 89] • World stochastic • State observable (Figure: environment with goal locations labeled "heaven" and "hell")

  8. Markov Decision Process (discrete) [Bellman 57] [Howard 60] [Sutton/Barto 98] (Figure: transition graph over states s1–s5 with transition probabilities on the edges and state rewards r=1, r=0, r=20, r=0, r=-10)
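
The slide itself carries only the transition diagram and citations. As a sketch of the standard definition behind it (following the cited sources, not anything stated explicitly on the slide), a discrete MDP and its planning objective can be written as:

```latex
% A discrete MDP: states S, actions A, transition model, rewards, discount
\text{MDP} = \big(\, S,\; A,\; p(s' \mid s, a),\; r(s, a),\; \gamma \in [0,1) \,\big)

% Planning objective: a policy maximizing the expected discounted return
\max_{\pi} \; E\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \;\middle|\; \pi \right]
```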

  9. Value Iteration • Value function of policy π • Bellman equation for optimal value function • Value iteration: recursively estimating the value function • Greedy policy [Bellman 57] [Howard 60] [Sutton/Barto 98]
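
The formulas that accompanied these bullets were slide graphics and did not survive transcription. A plausible reconstruction, using the standard forms from the cited sources (Bellman 57, Sutton/Barto 98), is:

```latex
% Value function of a policy \pi
V^{\pi}(s) = E\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_t \;\middle|\; s_0 = s,\; \pi \right]

% Bellman equation for the optimal value function
V^{*}(s) = \max_{a} \Big[ r(s,a) + \gamma \sum_{s'} p(s' \mid s, a)\, V^{*}(s') \Big]

% Value iteration: recursively estimating the value function
V_{k+1}(s) \leftarrow \max_{a} \Big[ r(s,a) + \gamma \sum_{s'} p(s' \mid s, a)\, V_{k}(s') \Big]

% Greedy policy with respect to V
\pi(s) = \arg\max_{a} \Big[ r(s,a) + \gamma \sum_{s'} p(s' \mid s, a)\, V(s') \Big]
```

A minimal executable sketch of the same backup, on a hypothetical three-state MDP (states, actions, and numbers below are placeholders, not the example from slide 8):

```python
# Value iteration on a tiny, made-up discrete MDP (illustrative only).
GAMMA = 0.95

# transitions[s][a] -> list of (next_state, probability)
transitions = {
    "s1": {"stay": [("s1", 1.0)], "go": [("s2", 0.8), ("s3", 0.2)]},
    "s2": {"stay": [("s2", 1.0)], "go": [("s1", 0.5), ("s3", 0.5)]},
    "s3": {"stay": [("s3", 1.0)], "go": [("s3", 1.0)]},
}
# reward[s][a] = r(s, a)
reward = {
    "s1": {"stay": 0.0, "go": 1.0},
    "s2": {"stay": 0.0, "go": 0.0},
    "s3": {"stay": -10.0, "go": -10.0},
}

def q_value(s, a, V):
    """One-step lookahead: r(s,a) + gamma * sum_s' p(s'|s,a) V(s')."""
    return reward[s][a] + GAMMA * sum(p * V[s2] for s2, p in transitions[s][a])

def value_iteration(tol=1e-6):
    """Repeat the Bellman optimality backup until the value estimates converge."""
    V = {s: 0.0 for s in transitions}
    while True:
        V_new = {s: max(q_value(s, a, V) for a in transitions[s]) for s in transitions}
        if max(abs(V_new[s] - V[s]) for s in V) < tol:
            return V_new
        V = V_new

def greedy_policy(V):
    """pi(s) = argmax_a Q(s, a) under the converged value function."""
    return {s: max(transitions[s], key=lambda a: q_value(s, a, V)) for s in transitions}

V_opt = value_iteration()
print(V_opt, greedy_policy(V_opt))
```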

  10. Stochastic, Partially Observable (Figure: repeat of the partially observable example; the robot starts with a 50%/50% belief over the two possible worlds, the goal locations are unknown ("?"), and each world contains a sign)

  11. Introduction to POMDPs (1 of 3) [Sondik 72, Littman/Cassandra/Kaelbling 97] (Figure: two-state example with states s1, s2 and actions a, b; action payoffs such as 100, -100, -40, 80, 0 plotted as value over the belief p(s1))
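
The numbers scattered through this slide are the payoffs from the plotted example, but their exact assignment to states and actions is not recoverable from the transcript. In symbols, the point of the figure is presumably the standard one for a two-state POMDP: with belief p = p(s1), the expected immediate payoff of each action is linear in p, and acting greedily gives the upper envelope of those lines:

```latex
% Two-state belief: p = p(s_1), so p(s_2) = 1 - p
V_a(p) = p\, r(s_1, a) + (1 - p)\, r(s_2, a)
V_b(p) = p\, r(s_1, b) + (1 - p)\, r(s_2, b)

% Greedy value over the belief: piecewise linear and convex in p
V(p) = \max\big\{\, V_a(p),\; V_b(p) \,\big\}
```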

  12. Value Iteration in POMDPs • Substitute the belief b for the state s • Value function of policy π • Bellman equation for optimal value function • Value iteration: recursively estimating the value function • Greedy policy
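
As above, the equations on this slide were graphics. Substituting the belief b for the state s in the standard value-iteration equations gives (a reconstruction consistent with the slide's bullets, not a verbatim copy):

```latex
% Value function of a policy \pi over beliefs
V^{\pi}(b) = E\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_t \;\middle|\; b_0 = b,\; \pi \right]

% Bellman equation for the optimal value function over beliefs
V^{*}(b) = \max_{a} \Big[ r(b, a) + \gamma \sum_{o} p(o \mid b, a)\, V^{*}\!\big(b'_{a,o}\big) \Big]

% Greedy policy over beliefs
\pi(b) = \arg\max_{a} \Big[ r(b, a) + \gamma \sum_{o} p(o \mid b, a)\, V\!\big(b'_{a,o}\big) \Big]
```

Here r(b, a) and the next belief b'_{a,o} are exactly the "missing terms" defined on the next slide.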

  13. Missing Terms: Belief Space • Expected reward • Next-state density: Bayes filters! (Dirac distribution)
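
The bullets name the two missing quantities, but their formulas were graphics. The standard belief-space definitions they refer to are (hedged reconstruction; η denotes the Bayes-filter normalizer):

```latex
% Expected reward under a belief
r(b, a) = \sum_{s} b(s)\, r(s, a)

% Next belief after action a and observation o: the Bayes filter posterior
b'_{a,o}(s') = \eta\; p(o \mid s') \sum_{s} p(s' \mid s, a)\, b(s)

% Next-state density over beliefs: a mixture of Dirac distributions,
% one at each possible Bayes-filter posterior, weighted by p(o | b, a)
p(b' \mid b, a) = \sum_{o} p(o \mid b, a)\; \delta\!\big(b' - b'_{a,o}\big)
```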

  14. Value Iteration in Belief Space (Figure: in state space, state s produces observation o and transitions to next state s' with reward r'; in belief space, belief state b maps to next belief state b'; the value function is Q(b, a), backed up from max Q(b', a))
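
To make the diagram concrete, here is a self-contained sketch of one backup step in belief space: a discrete Bayes-filter belief update followed by Q(b, a) = r(b, a) + γ Σ_o p(o | b, a) V(b'). The two-state model and all numbers are hypothetical placeholders, not the lecture's example.

```python
# One value-iteration backup in belief space for a tiny, made-up POMDP.
import numpy as np

# trans[a][s, s'] = p(s' | s, a)
trans = {
    "stay": np.array([[1.0, 0.0], [0.0, 1.0]]),
    "go":   np.array([[0.2, 0.8], [0.8, 0.2]]),
}
# obs[o][s'] = p(o | s');  reward[a][s] = r(s, a)
obs = {"left": np.array([0.9, 0.1]), "right": np.array([0.1, 0.9])}
reward = {"stay": np.array([1.0, -1.0]), "go": np.array([0.0, 0.0])}
GAMMA = 0.95

def belief_update(b, a, o):
    """Bayes filter: predict with p(s'|s,a), then reweight by p(o|s')."""
    predicted = trans[a].T @ b            # sum_s p(s'|s,a) b(s)
    unnormalized = obs[o] * predicted     # p(o|s') * prediction
    p_obs = unnormalized.sum()            # = p(o | b, a)
    return unnormalized / p_obs, p_obs

def q_backup(b, a, V):
    """Q(b,a) = r(b,a) + gamma * sum_o p(o|b,a) * V(b')."""
    q = float(b @ reward[a])              # expected immediate reward r(b, a)
    for o in obs:
        b_next, p_obs = belief_update(b, a, o)
        q += GAMMA * p_obs * V(b_next)
    return q

# Example: back up a uniform belief using a crude one-step value estimate V(b).
V = lambda b: float(max(b @ reward[a] for a in reward))
b0 = np.array([0.5, 0.5])
print({a: q_backup(b0, a, V) for a in reward})
```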

  15. Why is This So Complex? State Space Planning (no state uncertainty) vs. Belief Space Planning (full state uncertainty)
