Deep Reinforcement Learning for Continuous Docking Control of Autonomous Underwater Vehicles: A Benchmarking Study

Docking control of an autonomous underwater vehicle (AUV) is integral to achieving persistent long-term autonomy. This work explores the application of state-of-the-art model-free deep reinforcement learning (DRL) approaches to the task of AUV docking in the continuous domain. We provide a detailed formulation of the reward function used to successfully dock the AUV onto a fixed docking platform. A major contribution that distinguishes our work from previous approaches is the use of a physics simulator to define and simulate the underwater environment as well as the DeepLeng AUV. We propose a new reward function formulation for the docking task, incorporating several components, that outperforms previous reward formulations. We evaluate proximal policy optimization (PPO), twin delayed deep deterministic policy gradients (TD3) and soft actor-critic (SAC) in combination with our reward function. Our evaluation shows conclusively that the TD3 agent is the most efficient and consistent at docking the AUV: over multiple evaluation runs it achieved a 100% success rate and an episode return of 10667.1 ± 688.8. We also show how our reward function formulation improves over the state of the art.
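The abstract reports each agent as a success rate plus an episode return in "mean ± spread" form over multiple evaluation runs. The sketch below shows one way such a summary could be computed. It is a minimal illustration, not the authors' evaluation code: the function name `summarize_evaluation`, the use of the sample standard deviation, and the input values are all assumptions, and the numbers are made up rather than taken from the paper.

```python
from statistics import mean, stdev


def summarize_evaluation(episode_returns, docked_flags):
    """Summarize evaluation episodes as success rate and mean/std of return.

    Hypothetical helper: the abstract does not say whether the reported
    spread is a sample or population standard deviation; the sample
    standard deviation is assumed here.
    """
    if len(episode_returns) != len(docked_flags):
        raise ValueError("one docking flag is required per episode return")
    if len(episode_returns) < 2:
        raise ValueError("at least two episodes are needed for a standard deviation")

    # Percentage of episodes that ended with a successful docking.
    success_rate = 100.0 * sum(bool(f) for f in docked_flags) / len(docked_flags)
    return {
        "success_rate_pct": success_rate,
        "mean_return": mean(episode_returns),
        "std_return": stdev(episode_returns),
    }


# Illustrative, made-up values (not results from the paper).
returns = [100.0, 120.0, 80.0, 100.0]
docked = [True, True, True, False]
summary = summarize_evaluation(returns, docked)
print(
    f"{summary['success_rate_pct']:.1f}% success, "
    f"return {summary['mean_return']:.1f} ± {summary['std_return']:.1f}"
)
```

Reporting the spread next to the mean is what supports a consistency claim: two agents with similar mean returns can differ widely in how reliably they reach them.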


