
An Incremental Sampling-based Algorithm for Stochastic Optimal Control



Presentation Transcript


  1. An Incremental Sampling-based Algorithm for Stochastic Optimal Control. Presented by Martha Witick, Department of Computer Science, Rice University. Paper by Vu Anh Huynh, Sertac Karaman, and Emilio Frazzoli, ICRA 2012.

  2. Motion Planning! • Continuous time • Continuous state space • Continuous controls • Noisy dynamics. Can MDPs handle this?

  3. Motivation • The continuous-time, continuous-space stochastic optimal control problem has no closed-form or exact algorithmic solution • One approach: approximate the continuous problem with a discrete MDP and solve that • ... but the discretization grows exponentially with the dimensions of the state and control spaces • Sampling-based methods are fast and effective, but... • RRT: not optimal • RRT*: does not handle systems with uncertain dynamics

  4. Overview • We have a continuous-time, continuous-space stochastic optimal control problem • We want the optimal cost-to-go function J* and, ultimately, an optimal policy μ* • Idea: create a discrete-state Markov Decision Process and refine it, iterating until the current cost-to-go Jn* is close enough to J* • This is what the iterative Markov Decision Process (iMDP) algorithm does.

  5. Outline • Continuous Stochastic Optimal Control Problem Definition • Discrete Markov Chain approximation • iMDP Algorithm • iMDP Results • Conclusion

  6-8. Continuous Stochastic Dynamics. The state space is S ⊂ ℝ^dx, with interior S0 and smooth boundary δS. Consider a stochastic dynamical system (the robot's dynamics, driven by noise) with state x(t) ∊ S, control u(t), and time t ≥ 0.
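The dynamics equation on these slides exists only as an image and does not survive in this transcript. Assuming the standard controlled-diffusion form (f and F are the functions named on slide 31), a plausible reconstruction in LaTeX is

    dx(t) = f\bigl(x(t), u(t)\bigr)\,dt + F\bigl(x(t), u(t)\bigr)\,dw(t), \qquad x(0) = x_0 \in S,

where f is the drift (the nominal robot dynamics), F weights the noise, and w(\cdot) is a standard Wiener process.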

  9-11. Continuous Stochastic Dynamics. Starting from x(0), a solution trajectory evolves under the control u(t) and the noise until x(t) hits the boundary δS.

  12. Continuous Stochastic Dynamics. A Markov control (or policy) μ maps the state to a control, so it needs only the current state x(t).

  13-17. Continuous Stochastic Dynamics. The expected cost-to-go function under a policy μ is built from four ingredients, highlighted one per slide: the first exit time (when x(t) leaves S0), a discount rate α ∊ [0,1), a cost rate function, and a terminal cost function.

  18. Continuous Stochastic Dynamics • We want the optimal cost-to-go function J* • We want to compute J* so we can extract an optimal policy μ* • But solving this continuous problem exactly is hard.
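The cost-to-go expression on slides 13-18 is likewise an image. Piecing together the labeled ingredients (first exit time, discount rate α, cost rate g, terminal cost h, with g and h named on slide 31), a hedged LaTeX reconstruction is

    J_\mu(x) = \mathbb{E}\!\left[ \int_0^{T} \alpha^{t}\, g\bigl(x(t), u(t)\bigr)\,dt + \alpha^{T} h\bigl(x(T)\bigr) \,\middle|\, x(0) = x \right],
    \qquad J^*(x) = \inf_{\mu} J_\mu(x),

where T is the first exit time of x(t) from the interior S0 and \alpha \in [0,1) is the discount rate.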

  19. Solving this is hard • Let's make a discrete model!

  20. Outline • Continuous Stochastic Optimal Control Problem Definition • Discrete Markov Chain approximation • iMDP Algorithm • iMDP Results • Conclusion

  21-22. Markov Chain Approximation. Approximate the stochastic dynamics with a sequence of MDPs {Mn}, n ≥ 0.

  23. Markov Chain Approximation. Approximate the stochastic dynamics with a sequence of MDPs {Mn}, n ≥ 0. Discretize states: sample a finite set of states Sn from S and assign transition probabilities Pn. Discretize time: assign a non-negative holding time Δtn(z) to each state z. The controls do not need to be discretized. A small data-structure sketch follows below.
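As a concrete illustration of this discretization, here is a minimal Python sketch of the data one MDP Mn might carry: a finite state set Sn, transition probabilities Pn, and a per-state holding time Δtn(z). The class and field names are illustrative assumptions, not from the paper.

class DiscreteMDP:
    """One element Mn of the approximating sequence {Mn}: a finite state set,
    transition probabilities, and a holding time per state (slide 23)."""
    def __init__(self):
        self.states = []         # Sn: finite set of states sampled from S
        self.transitions = {}    # Pn: maps (z, control) to {next_state: probability}
        self.holding_time = {}   # Δtn(z): non-negative holding time for each state z
        self.cost_to_go = {}     # Jn(z): current cost-to-go estimate for each state
        self.policy = {}         # μn(z): current control choice for each state

    def add_state(self, z, dt):
        """Add a sampled state z with holding time dt; controls remain continuous."""
        self.states.append(z)
        self.holding_time[z] = dt
        self.cost_to_go[z] = 0.0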

  24-28. Markov Chain Approximation • Local Consistency Property: • a condition that must hold for all states z ∊ S, and • a condition that must hold for all states z ∊ S and all controls v ∊ U (both conditions are equations on the slides; a reconstruction follows below).
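The two local-consistency conditions on slides 24-28 are equations rendered as images. In the Markov chain approximation method they are conventionally stated as below; this is a reconstruction consistent with the slide labels, not a verbatim copy. Writing \Delta\xi_n for the one-step displacement of the chain M_n from state z under control v:

    \lim_{n \to \infty} \Delta t_n(z) = 0 \quad \text{for all } z \in S,

    \mathbb{E}[\Delta\xi_n] = f(z, v)\,\Delta t_n(z) + o(\Delta t_n(z)), \qquad
    \mathrm{Cov}[\Delta\xi_n] = F(z, v)\,F(z, v)^{\top}\,\Delta t_n(z) + o(\Delta t_n(z)), \quad \text{for all } z \in S,\ v \in U.

In words: the holding times vanish, and the mean and covariance of each discrete transition match the drift and diffusion of the continuous dynamics up to o(\Delta t_n).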

  29. Markov Chain Approximation. Approximate the stochastic dynamics with a sequence of MDPs {Mn}, n ≥ 0. The discrete control problem is analogous to the continuous one, so define a discrete discounted cost (the slide shows it alongside the continuous cost for comparison; a reconstruction follows below).
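The discrete discounted cost on slide 29 is also an image. Assuming it mirrors the continuous cost with holding times in place of dt and the chain (\xi_i) in place of x(t), it plausibly reads

    J_{n,\mu}(z) = \mathbb{E}\!\left[ \sum_{i=0}^{I_n - 1} \alpha^{t_i}\, g\bigl(\xi_i, u_i\bigr)\,\Delta t_n(\xi_i) + \alpha^{t_{I_n}} h\bigl(\xi_{I_n}\bigr) \,\middle|\, \xi_0 = z \right],

where t_i is the holding time accumulated up to step i and I_n is the first step at which the chain leaves the interior.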

  30. Markov Chain Approximation. Approximate the stochastic dynamics with a sequence of MDPs {Mn}, n ≥ 0.

  31. Discontinuity and Remarks • F, f, g, and h may be discontinuous and the approximation still works • While the controlled Markov chain has a discrete state structure and the stochastic dynamical system has a continuous model, both have a continuous control space.

  32. Outline • Continuous Stochastic Optimal Control Problem Definition • Discrete Markov Chain approximation • iMDP Algorithm • iMDP Results • Conclusion

  33-42. iMDP Algorithm (primary loop, built up one step per slide; a Python sketch follows below):
  Set the 0th MDP M0 to empty
  while n < N do
    Mn <- M(n-1)
    Sample a state from δS and add it to Mn
    Sample zs from S0
    Set znearest to the nearest state in Sn
    Compute a trajectory x:[0,t] with control u connecting znearest and zs
    Set z to x(0) and add it to Sn
    Compute its cost and save it
    Update() the new z and Kn states in Sn
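To make the build-up on slides 33-42 concrete, here is a short Python sketch of the primary loop, reusing the DiscreteMDP sketch above. The helpers sample_boundary, sample_interior, nearest, steer, and num_updates are hypothetical placeholders for the operations named on the slides, and update is sketched after slide 50; none of them are the paper's API.

import random

def imdp_primary_loop(mdp, N):
    """Sketch of the iMDP primary loop from slides 33-42 (hypothetical helpers)."""
    for n in range(1, N):
        # Mn starts from M(n-1); here the same mdp object is refined in place.

        # Sample a state on the boundary δS and add it to Mn (slide 35);
        # giving boundary states a zero holding time is an assumption.
        mdp.add_state(sample_boundary(), dt=0.0)

        # Sample zs from the interior S0 and take the nearest existing state (slides 37-38).
        zs = sample_interior()
        z_nearest = nearest(mdp.states, zs)

        # Compute a trajectory x:[0, t] with control u connecting znearest and zs (slide 39).
        trajectory, u, t = steer(z_nearest, zs)

        # Per slides 40-41, the new state z is x(0) of that trajectory; add it to Sn.
        z = trajectory[0]
        mdp.add_state(z, dt=t)

        # Run Update() on the new z and on Kn other states in Sn (slide 42).
        K_n = num_updates(n)
        for state in [z] + random.sample(mdp.states, min(K_n, len(mdp.states))):
            update(mdp, state)
    return mdp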

  43-45. iMDP Algorithm: Update Loop. Run Update() on the new state z and on Kn states in Sn.

  46-50. iMDP Algorithm: Update(). Uniformly sample or construct Cn controls from z toward the nearest Cn states. For each control v in Un, compute new transition probabilities to the nearest log(|Sn|) states Znear. A sketch of these steps appears below.
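The transcript excerpt ends at slide 50, so only the first steps of Update() are visible. Below is a hedged Python sketch of those steps; the helpers sample_controls, nearest_k, transition_probabilities, and stage_cost are hypothetical placeholders, and the closing Bellman-style minimization over controls is an assumption about the part the excerpt does not show.

import math

def update(mdp, z, alpha=0.95):
    """Sketch of Update(z) from slides 46-50; helper names are hypothetical."""
    n_states = len(mdp.states)
    controls = sample_controls(z, mdp.states)        # Un: Cn controls from z toward nearby states (slide 48)
    k = max(1, int(math.log(n_states + 1)))          # nearest log(|Sn|) states (slide 50)
    z_near = nearest_k(mdp.states, z, k)             # Znear
    dt = mdp.holding_time[z]
    best_cost, best_control, best_probs = float("inf"), None, {}
    for v in controls:                               # for each control v in Un (slide 49)
        probs = transition_probabilities(z, v, z_near)   # new, locally consistent Pn(. | z, v)
        # Assumed Bellman backup: stage cost over the holding time plus discounted expected cost-to-go.
        expected = sum(p * mdp.cost_to_go[zp] for zp, p in probs.items())
        cost = stage_cost(z, v) * dt + (alpha ** dt) * expected
        if cost < best_cost:
            best_cost, best_control, best_probs = cost, v, probs
    mdp.cost_to_go[z] = best_cost
    mdp.policy[z] = best_control
    mdp.transitions[(z, best_control)] = best_probs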
