An unmanned surface vehicle (USV) can perform complex missions by continuously observing the state of its surroundings and acting toward a goal. A swarm of USVs working together can complete missions faster and more effectively than a single USV alone. In this paper, we propose an autonomous communication model for a swarm of USVs, implemented as a software system using the Robot Operating System (ROS) and the Gazebo simulator. With coordinated task completion as the main objective, a Markov decision process (MDP) provides the basis for formulating the task-decision problem, enabling efficient localization and tracking in a highly dynamic water environment. To coordinate multiple USVs performing real-time target tracking, we propose an enhanced multi-agent reinforcement learning approach. Our scheme uses MA-DDPG (Multi-Agent Deep Deterministic Policy Gradient), an extension of the Deep Deterministic Policy Gradient (DDPG) algorithm that supports decentralized control of multiple agents in a cooperative environment. Under MA-DDPG's decentralized control, each agent makes decisions based on its own observations and objectives, which can yield better overall performance and improved stability. In addition, MA-DDPG enables communication and coordination among agents through shared observations and rewards.
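The core idea of the approach above, decentralized actors that each act on local observations while a centralized critic scores the joint observation-action vector during training, can be sketched in a few lines. The following is a minimal illustrative NumPy sketch, not the paper's implementation: the linear-tanh actors, the linear critic, the toy cooperative reward, and all dimensions and learning rates are assumptions made purely for illustration (real MA-DDPG uses neural networks, target networks, and a replay buffer).

```python
# Minimal MADDPG-style sketch (illustrative assumptions throughout).
# Two agents with decentralized linear-tanh actors; one centralized
# linear critic over the concatenated joint observations and actions.
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, OBS_DIM, ACT_DIM = 2, 2, 1   # toy sizes, assumed for the sketch
GAMMA, LR = 0.95, 0.01

# Decentralized actors: each maps only its OWN observation to an action.
actors = [rng.normal(scale=0.1, size=(ACT_DIM, OBS_DIM)) for _ in range(N_AGENTS)]
# Centralized critic: linear in the joint observation-action features.
critic = rng.normal(scale=0.1, size=N_AGENTS * (OBS_DIM + ACT_DIM))

def act(W, obs):
    return np.tanh(W @ obs)                      # deterministic, bounded action

def joint_features(obs_all, act_all):
    return np.concatenate(obs_all + act_all)     # centralized critic input

def q_value(obs_all, act_all):
    return critic @ joint_features(obs_all, act_all)

# Toy cooperative reward (an assumption): agents jointly drive a shared
# 1-D tracking error toward zero, standing in for cooperative tracking.
def reward(act_all, target=0.5):
    return -abs(sum(a[0] for a in act_all) - target)

for step in range(300):
    obs_all = [rng.normal(size=OBS_DIM) for _ in range(N_AGENTS)]
    act_all = [act(W, o) for W, o in zip(actors, obs_all)]
    r = reward(act_all)
    next_obs = [rng.normal(size=OBS_DIM) for _ in range(N_AGENTS)]
    next_act = [act(W, o) for W, o in zip(actors, next_obs)]

    # Critic update: one SGD step on the squared TD error
    # (no target network here, for brevity).
    td_err = q_value(obs_all, act_all) - (r + GAMMA * q_value(next_obs, next_act))
    critic -= LR * td_err * joint_features(obs_all, act_all)

    # Actor updates: deterministic policy gradient through the centralized
    # critic, dQ/dW_i = (dQ/da_i) * (da_i/dW_i), analytic for linear models.
    for i in range(N_AGENTS):
        start = N_AGENTS * OBS_DIM + i * ACT_DIM
        dq_da = critic[start:start + ACT_DIM]    # critic weights on agent i's action
        da_dz = 1.0 - act_all[i] ** 2            # tanh derivative
        actors[i] += LR * np.outer(dq_da * da_dz, obs_all[i])
```

The design point the sketch illustrates is the centralized-training, decentralized-execution split: only the critic ever sees the joint state, so at execution time each agent needs nothing beyond its own observation.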