Docking control of an autonomous underwater vehicle (AUV) is a task that is integral to achieving persistent long-term autonomy. This work explores the application of state-of-the-art model-free deep reinforcement learning (DRL) approaches to the task of AUV docking in the continuous domain. We provide a detailed formulation of the reward function used to successfully dock the AUV onto a fixed docking platform. A major contribution that distinguishes our work from previous approaches is the use of a physics simulator to define and simulate the underwater environment as well as the DeepLeng AUV. We propose a new multi-component reward function for the docking task that outperforms previous reward formulations. We evaluate proximal policy optimization (PPO), twin delayed deep deterministic policy gradient (TD3), and soft actor-critic (SAC) in combination with our reward function. Our evaluation shows conclusively that the TD3 agent is the most efficient and consistent at docking the AUV: over multiple evaluation runs it achieved a 100% success rate and an episode return of 10667.1 ± 688.8. We also show how our reward function formulation improves over the state of the art.
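To make the evaluation setup concrete, the sketch below shows one way the three algorithms could be compared on a continuous-control task using the stable-baselines3 library. This is an illustrative assumption, not the authors' actual code: the standard Pendulum-v1 environment stands in for the simulated AUV docking environment, which in practice would wrap the physics simulator behind the same Gym interface and supply the custom reward function.

```python
# Hedged sketch: comparing PPO, TD3, and SAC with stable-baselines3.
# "Pendulum-v1" is a placeholder for the AUV docking environment; the
# real setup would expose the physics simulator through the same
# Gym-style interface with the paper's reward function.
import gymnasium as gym
from stable_baselines3 import PPO, SAC, TD3
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("Pendulum-v1")  # stand-in continuous-action environment

for algo in (PPO, TD3, SAC):
    # Train each agent from scratch under identical conditions.
    model = algo("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=50_000)
    # Report mean and standard deviation of episode return, matching
    # the "return +- std over multiple runs" style of the abstract.
    mean_ret, std_ret = evaluate_policy(model, env, n_eval_episodes=10)
    print(f"{algo.__name__}: episode return {mean_ret:.1f} +/- {std_ret:.1f}")
```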