In recent years, reinforcement learning and its multi-agent analogue have achieved great success in solving various complex control problems. However, multi-agent reinforcement learning remains challenging both in its theoretical analysis and in the empirical design of algorithms, especially for large swarms of embodied robotic agents, where a definitive toolchain remains a subject of active research. We use emerging state-of-the-art mean-field control techniques to convert many-agent swarm control into the more classical single-agent control of distributions. This allows us to profit from advances in single-agent reinforcement learning at the cost of assuming weak interaction between agents. As a result, the mean-field model is violated by the nature of real systems with embodied, physically colliding agents. Here, we combine collision avoidance and learning of mean-field control into a unified framework for tractably designing intelligent robotic swarm behavior. On the theoretical side, we provide novel approximation guarantees for general mean-field control, both in continuous spaces and with collision avoidance. On the practical side, we show that our approach outperforms multi-agent reinforcement learning and allows for decentralized open-loop application while avoiding collisions, both in simulation and on real UAV swarms. Overall, we propose a framework for the design of swarm behavior that is both mathematically well-founded and practically useful, enabling the solution of otherwise intractable swarm problems.