Motion planning of autonomous agents in partially known environments with incomplete information is a challenging problem, particularly for complex tasks. This paper proposes a model-free reinforcement learning approach to address this problem. We formulate motion planning as a probabilistic-labeled partially observable Markov decision process (PL-POMDP) problem and use linear temporal logic (LTL) to express the complex task. The LTL formula is then converted to a limit-deterministic generalized B\"uchi automaton (LDGBA). The problem is redefined as finding an optimal policy on the product of the PL-POMDP with the LDGBA, based on model-checking techniques, to satisfy the complex task. We implement deep Q learning with long short-term memory (LSTM) to process the observation history and perform task recognition. Our contributions include the proposed planning framework, the use of LTL and LDGBA for task specification, and the LSTM-enhanced deep Q learning. We demonstrate the applicability of the proposed method through simulations in various environments, including grid worlds, a virtual office, and a multi-agent warehouse. The simulation results show that the proposed method effectively handles environment, action, and observation uncertainties, indicating its potential for real-world applications, including the control of unmanned aerial vehicles (UAVs).
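For illustration, the sketch below shows one common way an LSTM can summarize an observation history for Q-value estimation in a POMDP, in the spirit of the LSTM-enhanced deep Q learning described above. This is a minimal PyTorch sketch, not the authors' implementation; the class name, dimensions, and hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn

class RecurrentQNetwork(nn.Module):
    """Minimal recurrent Q network (hypothetical): an LSTM condenses the
    observation history into a hidden state, and a linear head maps that
    state to one Q-value per action."""

    def __init__(self, obs_dim: int, num_actions: int, hidden_dim: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.q_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, obs_seq: torch.Tensor, hidden=None):
        # obs_seq: (batch, seq_len, obs_dim) -- a window of past observations.
        out, hidden = self.lstm(obs_seq, hidden)
        # Estimate Q-values from the hidden state at the final time step.
        q_values = self.q_head(out[:, -1, :])
        return q_values, hidden

# Usage sketch: greedy action selection over a 10-step observation history.
if __name__ == "__main__":
    net = RecurrentQNetwork(obs_dim=8, num_actions=4)
    history = torch.randn(1, 10, 8)  # placeholder observations
    q, _ = net(history)
    print("greedy action:", q.argmax(dim=-1).item())
```

In a product-MDP setting such as the one described in the abstract, the observation vector would typically be augmented with the current LDGBA automaton state so that the learned policy can condition on task progress; that encoding detail is omitted here.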