This paper studies optimal motion planning subject to motion and environment uncertainties. By modeling the system as a probabilistic labeled Markov decision process (PL-MDP), the control objective is to synthesize a finite-memory policy under which the agent satisfies complex high-level tasks expressed in linear temporal logic (LTL) with a desired satisfaction probability. In particular, the cost optimization of trajectories that satisfy infinite-horizon tasks is considered, and the trade-off between reducing the expected mean cost and maximizing the probability of task satisfaction is analyzed. Instead of using traditional Rabin automata, the LTL formulas are converted to limit-deterministic B\"uchi automata (LDBA), which have a reachability acceptance condition and a compact graph structure. The novelty of this work lies in considering cases in which the LTL specification may be infeasible and in developing a relaxed product MDP between the PL-MDP and the LDBA. The relaxed product MDP allows the agent to revise its motion plan whenever the task is not fully feasible and to quantify the violation of the revised plan with respect to the original task. A multi-objective optimization problem is then formulated to jointly consider the probability of task satisfaction, the violation of the original task constraints, and the implementation cost of the policy execution. The formulated problem can be solved via coupled linear programs. To the best of our knowledge, this work is the first to bridge the gap between probabilistic planning revision for potentially infeasible LTL specifications and optimal control synthesis of both the plan prefix and the plan suffix of the trajectory over an infinite horizon. Experimental results are provided to demonstrate the effectiveness of the proposed framework.
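As a purely illustrative sketch (not the formulation developed in the paper), the trade-off described above can be pictured as a scalarized objective over stationary policies $\pi$ on the relaxed product MDP; the acceptance event $\mathrm{Acc}$, the per-step implementation cost $c$, the per-step violation cost $v$, and the weights $\beta_1,\beta_2 \ge 0$ are assumed placeholders rather than the paper's notation:
\[
\max_{\pi}\;\; \Pr^{\pi}\!\left[\mathrm{Acc}\right]
\;-\;\beta_1\,\mathbb{E}^{\pi}\!\left[\limsup_{N\to\infty}\frac{1}{N}\sum_{t=0}^{N-1} c(s_t,a_t)\right]
\;-\;\beta_2\,\mathbb{E}^{\pi}\!\left[\limsup_{N\to\infty}\frac{1}{N}\sum_{t=0}^{N-1} v(s_t,a_t)\right],
\]
where the first term rewards the probability of satisfying the acceptance condition, a larger $\beta_1$ favors cheaper execution, and a larger $\beta_2$ favors smaller deviation from the original LTL task; alternatively, a constraint of the form $\Pr^{\pi}[\mathrm{Acc}] \ge p_{\mathrm{des}}$ could encode a desired satisfaction probability.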