The paper proposes a spatial-temporal recurrent neural network architecture for Deep $Q$-Networks to steer an autonomous ship. The network design allows handling an arbitrary number of surrounding target ships while offering robustness to partial observability. Further, a state-of-the-art collision risk metric is proposed to enable an easier assessment of different situations by the agent. The COLREG rules of maritime traffic are explicitly considered in the design of the reward function. The final policy is validated on a custom set of newly created single-ship encounters called "Around the Clock" problems and the commonly chosen Imazu (1987) problems, which include 18 multi-ship scenarios. Additionally, the framework shows robustness when deployed simultaneously in multi-agent scenarios. The proposed network architecture is compatible with other deep reinforcement learning algorithms, including actor-critic frameworks.