Long-Term Mapping of the Douro River Plume with Multi-Agent Reinforcement Learning

We study the problem of long-term (multi-day) mapping of a river plume using multiple autonomous underwater vehicles (AUVs), with the Douro River as a representative use case. We propose an energy- and communication-efficient multi-agent reinforcement learning approach in which a central coordinator communicates intermittently with the AUVs, collecting measurements and issuing commands. Our approach integrates spatiotemporal Gaussian process regression (GPR) with a multi-head Q-network controller that regulates the direction and speed of each AUV. Simulations using the Delft3D ocean model show that our method consistently outperforms both single- and multi-agent benchmarks, and that increasing the number of agents improves both mean squared error (MSE) and operational endurance. In some instances, doubling the number of AUVs more than doubles endurance while maintaining or improving accuracy, underscoring the benefits of multi-agent coordination. The learned policies generalize to unseen seasonal regimes across different months and years, which is promising for data-driven long-term monitoring of dynamic plume environments.
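
To make the mapping component concrete, the following is a minimal sketch of spatiotemporal GPR, not the implementation used in this work. It assumes a separable squared-exponential kernel over (x, y, t), a zero-mean prior (measurements are taken to be centered beforehand), and illustrative hyperparameters; the names `st_kernel` and `gpr_predict` are hypothetical.

```python
import numpy as np

def st_kernel(A, B, ls_space=1.0, ls_time=1.0, variance=1.0):
    """Separable squared-exponential kernel; rows of A and B are (x, y, t)."""
    d_space = ((A[:, None, :2] - B[None, :, :2]) ** 2).sum(-1)
    d_time = (A[:, None, 2] - B[None, :, 2]) ** 2
    k_space = np.exp(-0.5 * d_space / ls_space**2)
    k_time = np.exp(-0.5 * d_time / ls_time**2)
    return variance * k_space * k_time

def gpr_predict(X_train, y_train, X_query, noise=1e-2, **kernel_args):
    """Posterior mean and variance of a zero-mean GP at the query points."""
    K = st_kernel(X_train, X_train, **kernel_args) + noise * np.eye(len(X_train))
    K_s = st_kernel(X_query, X_train, **kernel_args)
    K_ss = st_kernel(X_query, X_query, **kernel_args)
    L = np.linalg.cholesky(K)                      # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = np.diag(K_ss) - (v**2).sum(axis=0)
    return mean, var

# Hypothetical measurements: columns are x, y, t; values are centered readings.
X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
mean, var = gpr_predict(X, y, np.array([[0.5, 0.5, 0.5]]))
```

In a sketch like this, the posterior variance is what a planner could use to judge where the map is least certain; how the controller described above actually consumes the GPR output is not specified here.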
