Maritime autonomous transportation has played a crucial role in the globalization of the world economy. Deep Reinforcement Learning (DRL) has been applied to automatic path planning to simulate vessel collision avoidance in open seas. End-to-end approaches that learn complex mappings directly from the input generalize poorly when the agent must reach targets in different environments. In this work, we present a new strategy called state-action rotation, which improves the agent's performance in unseen situations by rotating each obtained experience (state-action-state) and storing the rotated copies in the replay buffer. Our model is built on Deep Deterministic Policy Gradient, a local view maker, and a planner. Our agent uses two deep Convolutional Neural Networks to estimate the policy and action-value functions. The proposed model was exhaustively trained and tested in maritime scenarios with real maps from cities such as Montreal and Halifax. Experimental results show that state-action rotation on top of the CVN consistently improves the rate of arrival to a destination (RATD) by up to 11.96% with respect to the Vessel Navigator with Planner and Local View (VNPLV), and improves performance in unseen mappings by up to 30.82%. Our proposed approach is more robust when tested in a new environment, supporting the idea that generalization can be achieved by using state-action rotation.
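The state-action rotation idea described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the state is a square 2D local-view grid, the action is a continuous (d_row, d_col) displacement in grid coordinates, rotations are multiples of 90 degrees, and the reward is unchanged by rotation. The names `Experience`, `rotate_action`, `rotate_experience`, and `store_with_rotations` are hypothetical.

```python
from collections import deque
from typing import NamedTuple

import numpy as np


class Experience(NamedTuple):
    state: np.ndarray       # assumed: square local-view grid, shape (N, N)
    action: np.ndarray      # assumed: (d_row, d_col) displacement in grid coordinates
    reward: float           # assumed: invariant under rotation
    next_state: np.ndarray  # same layout as state


def rotate_action(action, k):
    """Rotate a (d_row, d_col) vector by k quarter turns, matching np.rot90."""
    d_row, d_col = action
    for _ in range(k % 4):
        # np.rot90 moves cell (r, c) to (N-1-c, r), so a displacement
        # (d_row, d_col) becomes (-d_col, d_row) after one quarter turn.
        d_row, d_col = -d_col, d_row
    return np.array([d_row, d_col], dtype=float)


def rotate_experience(exp, k):
    """Return a copy of the experience rotated by k quarter turns."""
    return Experience(
        state=np.rot90(exp.state, k).copy(),
        action=rotate_action(exp.action, k),
        reward=exp.reward,
        next_state=np.rot90(exp.next_state, k).copy(),
    )


def store_with_rotations(buffer, exp):
    """Append the original experience (k=0) and its three rotated copies."""
    for k in range(4):
        buffer.append(rotate_experience(exp, k))


if __name__ == "__main__":
    state = np.zeros((3, 3))
    state[0, 1] = 1.0                      # marker at row 0, col 1
    next_state = np.zeros((3, 3))
    next_state[0, 2] = 1.0                 # marker moved one column to the right
    exp = Experience(state, np.array([0.0, 1.0]), 1.0, next_state)

    replay_buffer = deque(maxlen=10_000)
    store_with_rotations(replay_buffer, exp)
    print(len(replay_buffer))              # four experiences from one transition
```

Under these assumptions, every transition collected from the environment contributes four entries to the replay buffer, so the agent sees the same situation from several orientations without additional simulation steps. Arbitrary rotation angles would need interpolation of the grid and a general rotation matrix for the action, which this sketch does not attempt.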