Efficient point-to-point navigation in the presence of a background flow field is important for robotic applications such as ocean surveying. In such applications, robots may only have knowledge of their immediate surroundings or be faced with time-varying currents, which limits the use of optimal control techniques for planning trajectories. Here, we apply a novel reinforcement learning algorithm to discover time-efficient navigation policies that steer a fixed-speed swimmer through an unsteady two-dimensional flow field. The algorithm feeds environmental cues into a deep neural network that determines the swimmer's actions, and it is trained using Remember and Forget Experience Replay. We find that the resulting swimmers successfully exploit the background flow to reach the target, but that this success depends on the type of sensed environmental cue. Surprisingly, a velocity-sensing approach outperformed a bio-mimetic vorticity-sensing approach by nearly two-fold in success rate. Equipped with local velocity measurements, the reinforcement learning algorithm achieved near-100% success in reaching the target locations while approaching the time-efficiency of paths found by a global optimal control planner.
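As an illustration of the navigation setup described above, the Python sketch below shows one way such an episode could look: a fixed-speed swimmer is advected by an unsteady background flow while a small policy network maps its sensed cues (local flow velocity and the vector to the target) to a heading. Everything here is an illustrative assumption: the time-dependent double-gyre flow, the network sizes, the random (untrained) weights, and parameters such as `swim_speed` are not the paper's implementation; in the paper, the network would be trained with deep reinforcement learning using Remember and Forget Experience Replay.

```python
import numpy as np

# Illustrative unsteady 2D background flow (assumption: a time-dependent
# double gyre stands in for the paper's unsteady flow field).
def background_flow(x, y, t, A=0.1, eps=0.25, omega=2 * np.pi / 10):
    a = eps * np.sin(omega * t)
    b = 1.0 - 2.0 * a
    f = a * x**2 + b * x
    dfdx = 2.0 * a * x + b
    u = -np.pi * A * np.sin(np.pi * f) * np.cos(np.pi * y)
    v = np.pi * A * np.cos(np.pi * f) * np.sin(np.pi * y) * dfdx
    return np.array([u, v])

# Tiny policy network: local flow velocity + vector to target -> swim heading.
# Weights are random here; in the paper they would be learned by the RL agent.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.5, size=(16, 4)), np.zeros(16)
W2, b2 = rng.normal(scale=0.5, size=(1, 16)), np.zeros(1)

def policy_angle(obs):
    h = np.tanh(W1 @ obs + b1)
    return np.pi * np.tanh((W2 @ h + b2)[0])  # heading angle in (-pi, pi)

# One episode: fixed-speed swimmer advected by the flow plus its own swimming.
def run_episode(start, target, swim_speed=0.08, dt=0.1, max_steps=500, tol=0.05):
    pos = np.asarray(start, float)
    target = np.asarray(target, float)
    t = 0.0
    for _ in range(max_steps):
        u_flow = background_flow(pos[0], pos[1], t)
        obs = np.concatenate([u_flow, target - pos])    # sensed environmental cues
        theta = policy_angle(obs)
        u_swim = swim_speed * np.array([np.cos(theta), np.sin(theta)])
        pos = pos + dt * (u_flow + u_swim)              # explicit Euler step
        t += dt
        if np.linalg.norm(target - pos) < tol:
            return True, t                              # reached target, arrival time
    return False, t

print(run_episode(start=[0.3, 0.3], target=[1.6, 0.7]))
```

With trained weights, success rate and arrival time over many such episodes would correspond to the metrics reported in the abstract; swapping the velocity entries of `obs` for local vorticity would mimic the bio-inspired sensing variant discussed there.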