
An Incremental Sampling-based Algorithm for Stochastic Optimal Control



Presentation Transcript


  1. An Incremental Sampling-based Algorithm for Stochastic Optimal Control. Presented by Martha Witick, Department of Computer Science, Rice University. Paper by Vu Anh Huynh, Sertac Karaman, and Emilio Frazzoli, ICRA 2012.

  2. Motion Planning! • Continuous time • Continuous state space • Continuous controls • Noisy dynamics. Can MDPs handle this?

  3. Motivation • The continuous-time, continuous-space stochastic optimal control problem has no closed-form or exact algorithmic solution • One approach: approximate the continuous problem with a discrete MDP and solve that • ... but the discretization grows exponentially with the dimensions of the state and control spaces • Sampling-based methods are fast and effective, but... • RRT: not optimal • RRT*: does not handle systems with uncertain dynamics

  4. Overview • We have a continuous-time, continuous-space stochastic optimal control problem • We want the optimal cost-to-go function J* and, ultimately, an optimal policy μ* • Idea: create a discrete-state Markov Decision Process and refine it, iterating until the current cost-to-go Jn* is close enough to J* • This is what the iterative Markov Decision Process (iMDP) algorithm does.

  5. Outline • Continuous Stochastic Optimal Control Problem Definition • Discrete Markov Chain approximation • iMDP Algorithm • iMDP Results • Conclusion

  6-8. Continuous Stochastic Dynamics. The state space is S ⊂ ℝ^dx, with interior S0 and smooth boundary δS. Consider a stochastic dynamical system (the robot's dynamics, driven by noise) with state x(t) ∊ S, control u(t), and time t ≥ 0.
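The dynamics equation on these slides exists only as an image and does not survive in this transcript. Assuming the standard controlled-diffusion form (f and F are the functions named on slide 31), a plausible reconstruction in LaTeX is

    dx(t) = f\bigl(x(t), u(t)\bigr)\,dt + F\bigl(x(t), u(t)\bigr)\,dw(t), \qquad x(0) = x_0 \in S,

where f is the drift (the nominal robot dynamics), F weights the noise, and w(\cdot) is a standard Wiener process.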

  9-11. Continuous Stochastic Dynamics. Starting from x(0), a solution trajectory evolves under the control u(t) and the noise until x(t) hits the boundary δS.

  12. Continuous Stochastic Dynamics. A Markov control (or policy) μ maps the state to a control, so it needs only the current state x(t).

  13-17. Continuous Stochastic Dynamics. The expected cost-to-go function under a policy μ is built from four ingredients, highlighted one per slide: the first exit time (when x(t) leaves S0), a discount rate α ∊ [0,1), a cost rate function, and a terminal cost function.

  18. Continuous Stochastic Dynamics • We want the optimal cost-to-go function J* • We want to compute J* so we can extract an optimal policy μ* • But solving this continuous problem exactly is hard.
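The cost-to-go expression on slides 13-18 is likewise an image. Piecing together the labeled ingredients (first exit time, discount rate α, cost rate g, terminal cost h, with g and h named on slide 31), a hedged LaTeX reconstruction is

    J_\mu(x) = \mathbb{E}\!\left[ \int_0^{T} \alpha^{t}\, g\bigl(x(t), u(t)\bigr)\,dt + \alpha^{T} h\bigl(x(T)\bigr) \,\middle|\, x(0) = x \right],
    \qquad J^*(x) = \inf_{\mu} J_\mu(x),

where T is the first exit time of x(t) from the interior S0 and \alpha \in [0,1) is the discount rate.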

  19. Solving this is hard • Let's make a discrete model!

  20. Outline • Continuous Stochastic Optimal Control Problem Definition • Discrete Markov Chain approximation • iMDP Algorithm • iMDP Results • Conclusion

  21-22. Markov Chain Approximation. Approximate the stochastic dynamics with a sequence of MDPs {Mn}, n ≥ 0.

  23. Markov Chain Approximation. Approximate the stochastic dynamics with a sequence of MDPs {Mn}, n ≥ 0. Discretize states: sample a finite set of states Sn from S and assign transition probabilities Pn. Discretize time: assign a non-negative holding time Δtn(z) to each state z. The controls do not need to be discretized. A small data-structure sketch follows below.
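As a concrete illustration of this discretization, here is a minimal Python sketch of the data one MDP Mn might carry: a finite state set Sn, transition probabilities Pn, and a per-state holding time Δtn(z). The class and field names are illustrative assumptions, not from the paper.

class DiscreteMDP:
    """One element Mn of the approximating sequence {Mn}: a finite state set,
    transition probabilities, and a holding time per state (slide 23)."""
    def __init__(self):
        self.states = []         # Sn: finite set of states sampled from S
        self.transitions = {}    # Pn: maps (z, control) to {next_state: probability}
        self.holding_time = {}   # Δtn(z): non-negative holding time for each state z
        self.cost_to_go = {}     # Jn(z): current cost-to-go estimate for each state
        self.policy = {}         # μn(z): current control choice for each state

    def add_state(self, z, dt):
        """Add a sampled state z with holding time dt; controls remain continuous."""
        self.states.append(z)
        self.holding_time[z] = dt
        self.cost_to_go[z] = 0.0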

  24-28. Markov Chain Approximation • Local Consistency Property: • a condition that must hold for all states z ∊ S, and • a condition that must hold for all states z ∊ S and all controls v ∊ U (both conditions are equations on the slides; a reconstruction follows below).
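The two local-consistency conditions on slides 24-28 are equations rendered as images. In the Markov chain approximation method they are conventionally stated as below; this is a reconstruction consistent with the slide labels, not a verbatim copy. Writing \Delta\xi_n for the one-step displacement of the chain M_n from state z under control v:

    \lim_{n \to \infty} \Delta t_n(z) = 0 \quad \text{for all } z \in S,

    \mathbb{E}[\Delta\xi_n] = f(z, v)\,\Delta t_n(z) + o(\Delta t_n(z)), \qquad
    \mathrm{Cov}[\Delta\xi_n] = F(z, v)\,F(z, v)^{\top}\,\Delta t_n(z) + o(\Delta t_n(z)), \quad \text{for all } z \in S,\ v \in U.

In words: the holding times vanish, and the mean and covariance of each discrete transition match the drift and diffusion of the continuous dynamics up to o(\Delta t_n).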

  29. Markov Chain Approximation. Approximate the stochastic dynamics with a sequence of MDPs {Mn}, n ≥ 0. The discrete control problem is analogous to the continuous one, so define a discrete discounted cost (the slide shows it alongside the continuous cost for comparison; a reconstruction follows below).
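The discrete discounted cost on slide 29 is also an image. Assuming it mirrors the continuous cost with holding times in place of dt and the chain (\xi_i) in place of x(t), it plausibly reads

    J_{n,\mu}(z) = \mathbb{E}\!\left[ \sum_{i=0}^{I_n - 1} \alpha^{t_i}\, g\bigl(\xi_i, u_i\bigr)\,\Delta t_n(\xi_i) + \alpha^{t_{I_n}} h\bigl(\xi_{I_n}\bigr) \,\middle|\, \xi_0 = z \right],

where t_i is the holding time accumulated up to step i and I_n is the first step at which the chain leaves the interior.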

  30. Markov Chain Approximation. Approximate the stochastic dynamics with a sequence of MDPs {Mn}, n ≥ 0.

  31. Discontinuity and Remarks • F, f, g, and h may be discontinuous and the approximation still works • While the controlled Markov chain has a discrete state structure and the stochastic dynamical system has a continuous model, both have a continuous control space.

  32. Outline • Continuous Stochastic Optimal Control Problem Definition • Discrete Markov Chain approximation • iMDP Algorithm • iMDP Results • Conclusion

  33-42. iMDP Algorithm (primary loop, built up one step per slide; a Python sketch follows below):
  Set the 0th MDP M0 to empty
  while n < N do
    Mn <- M(n-1)
    Sample a state from δS and add it to Mn
    Sample zs from S0
    Set znearest to the nearest state in Sn
    Compute a trajectory x:[0,t] with control u connecting znearest and zs
    Set z to x(0) and add it to Sn
    Compute its cost and save it
    Update() the new z and Kn states in Sn
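To make the build-up on slides 33-42 concrete, here is a short Python sketch of the primary loop, reusing the DiscreteMDP sketch above. The helpers sample_boundary, sample_interior, nearest, steer, and num_updates are hypothetical placeholders for the operations named on the slides, and update is sketched after slide 50; none of them are the paper's API.

import random

def imdp_primary_loop(mdp, N):
    """Sketch of the iMDP primary loop from slides 33-42 (hypothetical helpers)."""
    for n in range(1, N):
        # Mn starts from M(n-1); here the same mdp object is refined in place.

        # Sample a state on the boundary δS and add it to Mn (slide 35);
        # giving boundary states a zero holding time is an assumption.
        mdp.add_state(sample_boundary(), dt=0.0)

        # Sample zs from the interior S0 and take the nearest existing state (slides 37-38).
        zs = sample_interior()
        z_nearest = nearest(mdp.states, zs)

        # Compute a trajectory x:[0, t] with control u connecting znearest and zs (slide 39).
        trajectory, u, t = steer(z_nearest, zs)

        # Per slides 40-41, the new state z is x(0) of that trajectory; add it to Sn.
        z = trajectory[0]
        mdp.add_state(z, dt=t)

        # Run Update() on the new z and on Kn other states in Sn (slide 42).
        K_n = num_updates(n)
        for state in [z] + random.sample(mdp.states, min(K_n, len(mdp.states))):
            update(mdp, state)
    return mdp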

  43-45. iMDP Algorithm: Update Loop. Run Update() on the new state z and on Kn states in Sn.

  46-50. iMDP Algorithm: Update(). Uniformly sample or construct Cn controls from z toward the nearest Cn states. For each control v in Un, compute new transition probabilities to the nearest log(|Sn|) states Znear. A sketch of these steps appears below.
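The transcript excerpt ends at slide 50, so only the first steps of Update() are visible. Below is a hedged Python sketch of those steps; the helpers sample_controls, nearest_k, transition_probabilities, and stage_cost are hypothetical placeholders, and the closing Bellman-style minimization over controls is an assumption about the part the excerpt does not show.

import math

def update(mdp, z, alpha=0.95):
    """Sketch of Update(z) from slides 46-50; helper names are hypothetical."""
    n_states = len(mdp.states)
    controls = sample_controls(z, mdp.states)        # Un: Cn controls from z toward nearby states (slide 48)
    k = max(1, int(math.log(n_states + 1)))          # nearest log(|Sn|) states (slide 50)
    z_near = nearest_k(mdp.states, z, k)             # Znear
    dt = mdp.holding_time[z]
    best_cost, best_control, best_probs = float("inf"), None, {}
    for v in controls:                               # for each control v in Un (slide 49)
        probs = transition_probabilities(z, v, z_near)   # new, locally consistent Pn(. | z, v)
        # Assumed Bellman backup: stage cost over the holding time plus discounted expected cost-to-go.
        expected = sum(p * mdp.cost_to_go[zp] for zp, p in probs.items())
        cost = stage_cost(z, v) * dt + (alpha ** dt) * expected
        if cost < best_cost:
            best_cost, best_control, best_probs = cost, v, probs
    mdp.cost_to_go[z] = best_cost
    mdp.policy[z] = best_control
    mdp.transitions[(z, best_control)] = best_probs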
