Long-Term Mapping of the Douro River Plume with Multi-Agent Reinforcement Learning

We study the problem of long-term (multi-day) mapping of a river plume using multiple autonomous underwater vehicles (AUVs), with the Douro River as a representative use case. We propose an energy- and communication-efficient multi-agent reinforcement learning approach in which a central coordinator intermittently communicates with the AUVs, collecting measurements and issuing commands. Our approach integrates spatiotemporal Gaussian process regression (GPR) with a multi-head Q-network controller that regulates the direction and speed of each AUV. Simulations using the Delft3D ocean model demonstrate that our method consistently outperforms both single- and multi-agent benchmarks, and that increasing the number of agents improves both mean squared error (MSE) and operational endurance. In some instances, doubling the number of AUVs can more than double endurance while maintaining or improving accuracy, underscoring the benefits of multi-agent coordination. Our learned policies generalize to unseen seasonal regimes across different months and years, showing promise for data-driven long-term monitoring of dynamic plume environments.
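To make the two components named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a separable squared-exponential kernel over (x, y, t) for the spatiotemporal GPR, and a small untrained network with one shared hidden layer and two output heads (direction and speed) for the controller. The names `st_kernel`, `gp_posterior`, and `MultiHeadQ`, the layer sizes, the 8 directions and 3 speed levels, and the state features in the demo are all hypothetical; the abstract does not specify any of them.

```python
import numpy as np


def st_kernel(A, B, ls_space=1.0, ls_time=1.0, var=1.0):
    """Separable squared-exponential kernel over rows of (x, y, t).

    Assumption: the abstract does not state which kernel is used.
    """
    d_space = ((A[:, None, :2] - B[None, :, :2]) ** 2).sum(-1)
    d_time = (A[:, None, 2] - B[None, :, 2]) ** 2
    return var * np.exp(-0.5 * d_space / ls_space**2) * np.exp(-0.5 * d_time / ls_time**2)


def gp_posterior(X, y, X_star, noise=1e-2, var=1.0, **kw):
    """Standard GP posterior mean and variance at X_star given samples (X, y)."""
    K = st_kernel(X, X, var=var, **kw) + noise * np.eye(len(X))
    K_s = st_kernel(X_star, X, var=var, **kw)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    variance = var - (v ** 2).sum(axis=0)
    return mean, variance


class MultiHeadQ:
    """Shared trunk with two Q-value heads: one for direction, one for speed.

    Weights are random here; a real controller would be trained
    (for example with Q-learning), which this sketch omits.
    """

    def __init__(self, state_dim, n_dirs=8, n_speeds=3, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (state_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W_dir = rng.normal(0.0, 0.1, (hidden, n_dirs))
        self.W_spd = rng.normal(0.0, 0.1, (hidden, n_speeds))

    def q_values(self, state):
        h = np.maximum(0.0, state @ self.W1 + self.b1)  # ReLU trunk
        return h @ self.W_dir, h @ self.W_spd

    def act(self, state):
        """Greedy action: pick the best index independently in each head."""
        q_dir, q_spd = self.q_values(state)
        return int(np.argmax(q_dir)), int(np.argmax(q_spd))


if __name__ == "__main__":
    # Three hypothetical salinity samples at (x, y, t).
    X = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 3.0, 1.0]])
    y = np.array([1.0, 2.0, 3.0])
    X_star = np.array([[1.0, 1.0, 0.5], [50.0, 50.0, 0.5]])
    mean, variance = gp_posterior(X, y, X_star)

    # Hypothetical state: AUV position plus GP uncertainty summaries.
    state = np.array([1.0, 1.0, variance.mean(), variance.max()])
    direction, speed = MultiHeadQ(state_dim=4).act(state)
    print(direction, speed)
```

Splitting the action into separate heads keeps the output size at the sum of the direction and speed counts rather than their product, which is one plausible motivation for a multi-head design. In this sketch the GP posterior variance feeds the controller's state, so an agent could in principle be steered toward poorly mapped regions.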
