This paper studies optimal motion planning subject to motion and environment uncertainties. By modeling the system as a probabilistic labeled Markov decision process (PL-MDP), the control objective is to synthesize a finite-memory policy under which the agent satisfies high-level complex tasks, expressed as linear temporal logic (LTL) formulas, with a desired satisfaction probability. In particular, the cost optimization of trajectories that satisfy infinite-horizon tasks is considered, and the trade-off between reducing the expected mean cost and maximizing the probability of task satisfaction is analyzed. Instead of using traditional Rabin automata, the LTL formulas are converted to limit-deterministic B\"uchi automata (LDBA), which have a more straightforward accepting condition and a more compact graph structure. The novelty of this work lies in the consideration of cases in which the LTL specification is potentially infeasible, and in the development of a relaxed product MDP between the PL-MDP and the LDBA. The relaxed product MDP allows the agent to revise its motion plan whenever the task is not fully feasible and to quantify a violation measure for the revised plan. A multi-objective optimization problem is then formulated to jointly consider the probability of task satisfaction, the violation of the original task constraints, and the implementation cost of policy execution; it is solved via coupled linear programs. To the best of our knowledge, this is the first work that bridges the gap between planning revision and optimal control synthesis of both the plan prefix and the plan suffix of the agent trajectory over the infinite horizon. Experimental results are provided to demonstrate the effectiveness of the proposed framework.
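To illustrate the standard (non-relaxed) product construction that the abstract builds on, the following is a minimal Python sketch under toy assumptions: the two-state PL-MDP, the action name, and the two-state automaton fragment are all hypothetical, and the paper's relaxed product would additionally introduce violation-measuring transitions, which are omitted here.

```python
# Toy PL-MDP: each state carries a set of atomic propositions (its label),
# and each (state, action) pair has a probabilistic successor distribution.
mdp_labels = {"s0": frozenset({"a"}), "s1": frozenset({"b"})}
mdp_trans = {                              # (state, action) -> {next_state: prob}
    ("s0", "go"): {"s0": 0.2, "s1": 0.8},
    ("s1", "go"): {"s0": 1.0},
}

# Hypothetical LDBA fragment: transitions read the label of the MDP state
# being entered; q1 is the (Buchi) accepting state in this toy example.
aut_trans = {
    ("q0", frozenset({"a"})): "q0",
    ("q0", frozenset({"b"})): "q1",
    ("q1", frozenset({"a"})): "q0",
    ("q1", frozenset({"b"})): "q1",
}

def build_product(mdp_labels, mdp_trans, aut_trans):
    """Standard product MDP: ((s, q), a) -> {(s', q'): p}, where the
    automaton moves on the label of the successor MDP state."""
    aut_states = {q for (q, _) in aut_trans}
    prod = {}
    for (s, a), dist in mdp_trans.items():
        for q in aut_states:
            out = {}
            for s2, p in dist.items():
                q2 = aut_trans.get((q, mdp_labels[s2]))
                if q2 is not None:          # no relaxation: undefined moves are dropped
                    out[(s2, q2)] = p
            if out:
                prod[((s, q), a)] = out
    return prod

prod = build_product(mdp_labels, mdp_trans, aut_trans)
print(prod[(("s0", "q0"), "go")])  # -> {('s0', 'q0'): 0.2, ('s1', 'q1'): 0.8}
```

In the relaxed product of the paper, transitions that are undefined in the automaton would not be dropped but instead penalized by a violation cost, which is what enables plan revision for infeasible specifications.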