Through multi-agent competition and the sparse high-level objective of winning a race, we find that both agile flight (e.g., high-speed motion pushing the platform to its physical limits) and strategy (e.g., overtaking or blocking) emerge from agents trained with reinforcement learning. We provide evidence in both simulation and the real world that this approach outperforms the common paradigm of training agents in isolation with rewards that prescribe behavior, e.g., progress on the raceline, in particular when the complexity of the environment increases, e.g., in the presence of obstacles. Moreover, we find that multi-agent competition yields policies that transfer more reliably to the real world than policies trained with a single-agent progress-based reward, despite the two methods using the same simulation environment, randomization strategy, and hardware. In addition to improved sim-to-real transfer, the multi-agent policies also exhibit some degree of generalization to opponents unseen at training time. Overall, our work, following in the tradition of multi-agent competitive game-play in digital domains, shows that sparse task-level rewards are sufficient for training agents capable of advanced low-level control in the physical world. Code: https://github.com/Jirl-upenn/AgileFlight_MultiAgent
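To make the distinction the abstract draws more concrete, the sketch below contrasts a dense, behavior-prescribing progress reward with a sparse task-level reward that only scores the race outcome. This is a minimal illustration in Python; the function names, signatures, and reward magnitudes are assumptions for exposition and are not taken from the released code.

```python
def progress_reward(prev_progress: float, curr_progress: float) -> float:
    """Dense single-agent baseline (hypothetical): reward the per-step
    advance along the raceline, giving feedback at every timestep."""
    return curr_progress - prev_progress


def sparse_race_reward(finished: bool, won: bool) -> float:
    """Sparse task-level objective (hypothetical): no feedback until the
    race ends, then +1 for winning and -1 for losing."""
    if not finished:
        return 0.0
    return 1.0 if won else -1.0


if __name__ == "__main__":
    # Dense baseline: a drone that advanced 1.2 m along the raceline this step.
    print(progress_reward(10.0, 11.2))                     # 1.2 every step
    # Sparse objective: mid-race there is no signal at all...
    print(sparse_race_reward(finished=False, won=False))   # 0.0
    # ...and only the final outcome of the competition is rewarded.
    print(sparse_race_reward(finished=True, won=True))     # 1.0
```

Under the sparse outcome-only signal, nothing in the reward prescribes how to fly; agile flight and racing strategy must emerge because they are what wins races against the competing agents.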