In recent years, reinforcement learning and its multi-agent analogue have achieved great success in solving various complex control problems. However, multi-agent reinforcement learning remains challenging, both in its theoretical analysis and in the empirical design of algorithms, especially for large swarms of embodied robotic agents, where a definitive toolchain remains a subject of active research. We use emerging state-of-the-art mean-field control techniques to convert many-agent swarm control into the more classical single-agent control of distributions. This allows us to profit from advances in single-agent reinforcement learning, at the cost of assuming weakly interacting agents. However, the mean-field model is violated in real systems with embodied, physically colliding agents. We therefore combine collision avoidance and learning of mean-field control into a unified framework for tractably designing intelligent robotic swarm behavior. On the theoretical side, we provide novel approximation guarantees for general mean-field control, both in continuous spaces and with collision avoidance. On the practical side, we show that our approach outperforms multi-agent reinforcement learning and allows for decentralized open-loop application while avoiding collisions, both in simulation and on real UAV swarms. Overall, we propose a framework for the design of swarm behavior that is both mathematically well-founded and practically useful, enabling the solution of otherwise intractable swarm problems.