Underwater target localization using range-only and single-beacon (ROSB) techniques with autonomous vehicles has recently been used to overcome the limitations of more complex methods, such as long baseline and ultra-short baseline systems. Nonetheless, in ROSB target localization methods, the trajectory of the tracking vehicle near the target plays an important role in achieving the best accuracy of the predicted target position. Here, we investigate a Reinforcement Learning (RL) approach to find the optimal path that an autonomous vehicle should follow in order to maximize the overall accuracy of the predicted target localization, while reducing time and power consumption. To accomplish this objective, different experimental tests have been designed using state-of-the-art deep RL algorithms. Our study also compares the results with those obtained using the analytical Fisher information matrix approach employed in previous studies. The results reveal that the policy learned by the RL agent outperforms trajectories based on these analytical solutions, e.g., the median prediction error at the beginning of the target's localization is 17% lower. These findings suggest that deep RL for localizing acoustic targets could be successfully applied to in-water applications, including the tracking of acoustically tagged marine animals by autonomous underwater vehicles. This is envisioned as a first, necessary step to validate the use of RL for such problems, which could later be extended to more complex scenarios.
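As a point of reference for the analytical baseline mentioned above, the sketch below shows a minimal computation of the Fisher information matrix for range-only measurements and its determinant (D-optimality), the criterion typically maximized when designing such tracking trajectories. The Gaussian range-noise model, variable names, and example way-points are illustrative assumptions, not the implementation used in the paper.

```python
import numpy as np

def range_only_fim(vehicle_positions, target_position, sigma=1.0):
    """Fisher information matrix for range-only measurements.

    Each measurement is the Euclidean distance from a vehicle way-point to
    the target, assumed corrupted by zero-mean Gaussian noise of std `sigma`.
    """
    fim = np.zeros((2, 2))
    for p in np.atleast_2d(vehicle_positions):
        diff = p - target_position
        r = np.linalg.norm(diff)
        if r == 0:
            continue  # skip a degenerate way-point located on the target
        u = diff / r  # unit vector from target to vehicle
        fim += np.outer(u, u) / sigma**2
    return fim

# Example: compare two candidate way-point sets around a target at the origin.
target = np.array([0.0, 0.0])
collinear = np.array([[10.0, 0.0], [20.0, 0.0], [30.0, 0.0]])
spread = np.array([[10.0, 0.0], [0.0, 10.0], [-10.0, -10.0]])

# D-optimality: a larger determinant means a tighter Cramer-Rao bound,
# i.e. better expected localization accuracy for that vehicle geometry.
print(np.linalg.det(range_only_fim(collinear, target)))  # ~0, poor geometry
print(np.linalg.det(range_only_fim(spread, target)))     # > 0, informative geometry
```

In this framing, FIM-based planners pick the next way-point that maximizes the determinant above, whereas the RL agent learns a policy that trades off localization accuracy against travel time and energy.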