Previous studies on automatic berthing systems based on artificial neural networks (ANNs) showed good berthing performance when the ANN was trained on ship berthing data. However, an ANN requires a large amount of training data to perform robustly, and berthing data are difficult to obtain, which limits ANN-based automatic berthing systems. To overcome this difficulty, this study proposes an automatic berthing system based on proximal policy optimization (PPO), a reinforcement learning (RL) algorithm. RL algorithms learn an optimal control policy by trial and error while interacting with a given environment, so they do not require any pre-obtained training data. In the proposed system, the control policy sets the revolutions per second (RPS) and the rudder angle of the ship. The results show that the proposed PPO-based automatic berthing system eliminates the need for a training dataset and shows great potential for actual berthing applications.
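As a rough illustration of the two ideas named in the abstract, the sketch below shows the standard PPO clipped surrogate objective and one possible way to map a bounded policy output to RPS and rudder-angle commands. It is a minimal sketch, not the system described in the study: the function names, the clipping value of 0.2, and the actuator limits are all assumptions chosen for illustration.

```python
import math


def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch of samples.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. clip_eps=0.2 is an assumed value.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # probability ratio r
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def scale_action(raw, low, high):
    """Map a raw policy output in [-1, 1] to a physical actuator range."""
    raw = max(-1.0, min(1.0, raw))  # saturate out-of-range outputs
    return low + 0.5 * (raw + 1.0) * (high - low)


# Hypothetical actuator limits, NOT taken from the study.
RPS_RANGE = (-5.0, 5.0)          # revolutions per second
RUDDER_RANGE_DEG = (-35.0, 35.0)  # rudder angle in degrees

if __name__ == "__main__":
    raw_rps, raw_rudder = 0.4, -1.0  # example raw policy outputs
    print(scale_action(raw_rps, *RPS_RANGE))
    print(scale_action(raw_rudder, *RUDDER_RANGE_DEG))
```

The clipping keeps each policy update close to the policy that collected the data, which is why PPO can learn from its own trial-and-error interaction without a pre-recorded berthing dataset.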