This paper studies control synthesis for motion planning subject to uncertainties. Uncertainties are considered in both robot motion and environment properties, giving rise to a probabilistic labeled Markov decision process (PL-MDP). A model-free reinforcement learning (RL) method is developed to generate a finite-memory control policy that satisfies high-level tasks expressed as linear temporal logic (LTL) formulas. Because uncertainties and potentially conflicting tasks may make the original specification impossible to fulfill, this work focuses on infeasible LTL specifications: a relaxed LTL constraint is developed that allows the agent to revise its motion plan and to account for violations of the original task, enabling partial satisfaction. In addition, a novel automaton is developed to improve the density of accepting rewards and to enable deterministic policies. We propose an RL framework, with rigorous analysis, that is guaranteed to achieve multiple objectives in decreasing order of priority: 1) satisfying the acceptance condition of the relaxed product MDP, and 2) reducing the violation cost over long-term behaviors. Simulation and experimental results are provided to validate the performance.