This paper discusses the challenges of planning in real-world domains with uncertain durations and presents various algorithms for efficient and optimal planning.
Probabilistic Temporal Planning with Uncertain Durations
Mausam, joint work with Daniel S. Weld
University of Washington, Seattle
Motivation
Three features of real-world planning domains:
• Concurrency: calibrate while the rover moves
• Uncertain effects: 'grip a rock' may fail
• Uncertain durative actions: wheels spin, so speed is uncertain
Contributions
• Novel challenges:
  • Large number of decision epochs: results to manage this blowup in different cases
  • Large branching factors: approximation algorithms
• Five planning algorithms:
  • DURprun: optimal
  • DURsamp: near-optimal
  • DURhyb: anytime, with user-defined error
  • DURexp: super-fast
  • DURarch: a balance between speed and quality
• Identify fundamental issues for future research
Outline of the talk
• Background
• Theory
• Algorithms and Experiments
• Summary and Future Work
Outline of the talk
• Background
  • MDPs
  • Decision epochs: happenings, pivots
• Theory
• Algorithms and Experiments
• Summary and Future Work
Unit-duration Markov Decision Process
• S: a set of states, factored into Boolean variables
• A: a set of actions
• Pr: S × A × S → [0, 1], the transition model
• C: A → ℝ, the cost model
• s0: the start state
• G: a set of absorbing goals
GOAL of an MDP
• Find a policy π: S → A that minimises the expected cost of reaching a goal, for a fully observable Markov decision process, when the agent executes over an indefinite horizon.
• Algorithms: value iteration, real-time dynamic programming, and other iterative dynamic programming algorithms.
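The MDP definitions above can be exercised with a minimal value-iteration sketch. The tiny 3-state domain, its action names, and its transition probabilities are purely illustrative assumptions (not from the talk); only the algorithm's shape follows the slide.

```python
# Minimal value iteration for a goal-directed MDP.
# States/actions/probabilities below are a made-up toy example.

GOAL = "g"
TRANSITIONS = {
    # (state, action) -> list of (next_state, probability)
    ("s0", "a"): [("s1", 0.9), ("s0", 0.1)],
    ("s1", "b"): [(GOAL, 0.8), ("s0", 0.2)],
}
COST = 1.0  # unit cost per action


def value_iteration(eps=1e-6):
    """Compute the expected cost-to-goal V(s) by iterative dynamic programming."""
    states = {"s0", "s1", GOAL}
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s == GOAL:
                continue  # absorbing goal: V(g) = 0
            q_values = [
                COST + sum(p * V[s2] for s2, p in succ)
                for (st, _), succ in TRANSITIONS.items()
                if st == s
            ]
            new_v = min(q_values)  # greedy Bellman backup
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < eps:
            return V
```

The same backup drives RTDP; it simply restricts updates to states sampled along greedy trials instead of sweeping all of S.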
Definitions (Durative Actions)
• Assumption: (probabilistic) TGP action model
  • Preconditions must hold until the end of the action.
  • Effects are usable only at the end of the action.
• Decision epoch: a time point at which a new action may be started.
• Happening: a point at which an action finishes.
• Pivot: a point at which an action could finish.
Outline of the talk
• Background
• Theory
  • Explosion of decision epochs
• Algorithms and Experiments
• Summary and Future Work
Decision Epochs (TGP Action Model)
• Deterministic durations [Mausam & Weld '05]: decision epochs = set of happenings
• Uncertain durations: non-termination has information!
• Theorem: decision epochs = set of pivots
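The observation that non-termination carries information can be made concrete: conditioning a duration distribution on the action's survival past time t shifts the expected completion time. The discrete bimodal distribution below is a hypothetical illustration, not a distribution from the talk.

```python
# Sketch: why "non-termination has information".
# If an action with a bimodal duration has NOT finished by time t,
# the early mode is ruled out and the posterior shifts to the late mode.

def posterior_duration(dist, t):
    """Condition a discrete duration distribution {duration: prob}
    on the event that the action is still running at time t."""
    surviving = {d: p for d, p in dist.items() if d > t}
    z = sum(surviving.values())
    if z == 0:
        raise ValueError("action should already have terminated by t")
    return {d: p / z for d, p in surviving.items()}


# Hypothetical bimodal distribution: finishes at 2 or at 10, 50/50.
bimodal = {2: 0.5, 10: 0.5}
```

Before observing anything, the expected completion time is 6; after surviving past t = 3, all mass sits on 10, which is exactly why pivots (not just happenings) must be decision epochs.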
Illustration: a bimodal distribution
[Figure: the duration distribution of action a, a bimodal distribution, with its expected completion time marked]
Conjecture
If all actions have duration distributions that are (i) independent of their effects and (ii) unimodal, then decision epochs = set of happenings.
Outline of the talk
• Background
• Theory
• Algorithms and Experiments
  • Expected Durations Planner
  • Archetypal Durations Planner
• Summary and Future Work
Planning with Durative Actions
• Formulate an MDP in an augmented state space: each state pairs a world state with the set of actions still executing, each tagged with a time.
• Example (timeline over times 0–6): the initial augmented state is ⟨X, ∅⟩; applying b to X yields world state X1, and with a and c still executing with recorded time 4, the augmented state is ⟨X1, {(a, 4), (c, 4)}⟩.
Uncertain Durations: Transition Function
Example: actions a and b, each with duration uniform{1, 2}, are started together in ⟨X, ∅⟩. At the first pivot (time 1) there are four equally likely (0.25) outcomes:
• only a finished: ⟨Xa, {(b, 1)}⟩
• only b finished: ⟨Xb, {(a, 1)}⟩
• both finished: ⟨Xab, ∅⟩
• neither finished: both must then finish at time 2, again reaching ⟨Xab, ∅⟩
Branching Factor
If there are n concurrent actions, each with m possible durations and r probabilistic effects, then the number of potential successors is

(m − 1)[(r + 1)^n − r^n − 1] + r^n
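The closed form above is easy to evaluate; a one-line sketch, with the parameter names taken directly from the slide:

```python
def potential_successors(n, m, r):
    """Successor count for n concurrent actions, each with m possible
    durations and r probabilistic effects:
    (m - 1) * [(r + 1)^n - r^n - 1] + r^n."""
    return (m - 1) * ((r + 1) ** n - r ** n - 1) + r ** n
```

Sanity checks: with n = 1 the bracket vanishes and the count is just r, the single action's effects; the count grows exponentially in n, which is the large-branching-factor problem the approximation algorithms target.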
Algorithms
Five planning algorithms:
• DURprun: optimal
• DURsamp: near-optimal
• DURhyb: anytime, with user-defined error
• DURexp: super-fast
• DURarch: a balance between speed and quality
Expected Durations Planner (DURexp)
1. Assign each action a deterministic duration equal to the expected value of its duration distribution.
2. Build a deterministic-duration policy for this domain.
3. Repeat until the goal is reached: execute this policy and wait for an interrupt:
   (a) an action terminated as expected: do nothing
   (b) an action terminated early: replan from this state
   (c) an action a terminated late: revise a's deterministic duration and replan for the revised domain
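The interrupt handling in step 3 can be sketched in isolation, with the planner itself abstracted away. `durexp_execute` and its scripted outcome sequence are hypothetical stand-ins for illustration, not the paper's implementation.

```python
# Sketch of the DURexp interrupt loop only (cases a/b/c from the slide).
# A real implementation would replan on "early-replan"/"late-revise";
# here we just record which case fired and revise the assumed duration.

def durexp_execute(expected, outcomes):
    """expected: {action: assumed deterministic duration}
    outcomes: scripted list of (action, actual duration) interrupts.
    Returns (history of (action, assumed, actual, event), final assumptions)."""
    history = []
    assumed = dict(expected)
    for action, actual in outcomes:
        exp_d = assumed[action]
        if actual == exp_d:
            event = "as-expected"    # (a) do nothing
        elif actual < exp_d:
            event = "early-replan"   # (b) replan from the resulting state
        else:
            event = "late-revise"    # (c) revise this action's duration, replan
            assumed[action] = actual
        history.append((action, exp_d, actual, event))
    return history, assumed
```

Note the asymmetry: early termination only triggers replanning, while late termination also updates the deterministic duration used by subsequent plans.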
Multi-modal Distributions
• Recall: the conjecture holds only for unimodal distributions.
• Decision epochs = happenings if unimodal; pivots if multi-modal.
Multi-modal Durations: Transition Function
Example: action a with duration uniform{1, 2}; action b with duration 1 (50%) or 3 (50%). Both are started together in ⟨X, ∅⟩; at time 1 there are four equally likely (0.25) outcomes:
• only a finished: ⟨Xa, {(b, 1)}⟩
• only b finished: ⟨Xb, {(a, 1)}⟩
• both finished: ⟨Xab, ∅⟩
• neither finished: ⟨X, {(a, 1), (b, 1)}⟩, a pivot, since b may still run until time 3
Multi-modal Distributions
• Expected Durations Planner (DURexp): one deterministic duration per action; a coarse approximation for multi-modal distributions.
• Archetypal Durations Planner (DURarch): retains limited uncertainty in durations, with one duration per mode of the distribution.
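One plausible way to extract DURarch-style archetypes, one per mode, is sketched below for a discrete distribution. The mode-detection heuristic (splitting the support wherever there is a gap) and the per-mode conditional mean are our assumptions for illustration, not the paper's construction.

```python
# Sketch: collapse a discrete multi-modal duration distribution into
# one representative (archetypal) duration per mode, preserving the
# probability mass of each mode.

def archetypal_durations(dist, gap=1):
    """dist: {duration: probability}. Treat runs of support separated
    by gaps larger than `gap` as distinct modes; return a list of
    (archetypal duration, mode weight) pairs."""
    durations = sorted(dist)
    groups, current = [], [durations[0]]
    for d in durations[1:]:
        if d - current[-1] > gap:
            groups.append(current)   # gap found: close this mode
            current = [d]
        else:
            current.append(d)
    groups.append(current)
    result = []
    for g in groups:
        w = sum(dist[d] for d in g)                    # mode's total mass
        mean = sum(d * dist[d] for d in g) / w         # conditional mean
        result.append((mean, w))
    return result
```

For a bimodal distribution this yields two weighted durations instead of one expected value, which is exactly the extra fidelity DURarch keeps over DURexp.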
[Figures: planning time and expected make-span of the planners on multi-modal domains]
Outline of the talk
• Background
• Theory
• Algorithms and Experiments
• Summary and Future Work
  • Observations on concurrency
Summary
• Large number of decision epochs: results to manage the explosion in specific cases
• Large branching factors:
  • Expected Durations Planner
  • Archetypal Durations Planner (multi-modal)
Handling Complex Action Models
• So far: probabilistic TGP
  • Preconditions hold over-all.
  • Effects are usable only at-end.
• What about probabilistic PDDL2.1?
  • Preconditions: at-start, over-all, at-end
  • Effects: at-start, at-end
  • Decision epochs must be allowed at arbitrary time points.
Ramifications
[Figure: a two-action example with goal G; action a has precondition p and effects G, ¬q; action b has precondition q and effects G, ¬p]
• The result is independent of uncertainty!
• Existing decision-epoch planners are incomplete: SAPA, Prottle, etc., including all IPC winners.
Related Work
• Tempastic (Younes and Simmons '04): generate, test and debug
• Prottle (Little, Aberdeen and Thiébaux '05): planning-graph-based heuristics
• Uncertain durations without concurrency: Foss and Onder '05; Boyan and Littman '00; Bresina et al. '02; Dearden et al. '03