From Motor Control to Team Play in Simulated Humanoid Football

Siqi Liu, Guy Lever, Zhe Wang, Josh Merel, S. M. Ali Eslami, Daniel Hennes, Wojciech M. Czarnecki, Yuval Tassa, Shayegan Omidshafiei, Abbas Abdolmaleki, Noah Y. Siegel, Leonard Hasenclever, Luke Marris, Saran Tunyasuvunakool, H. Francis Song, Markus Wulfmeier, Paul Muller, Tuomas Haarnoja, Brendan D. Tracey, Karl Tuyls, Thore Graepel, Nicolas Heess

Intelligent behaviour in the physical world exhibits structure at multiple spatial and temporal scales. Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected to serve goals defined on much longer timescales, and in terms of relations that extend far beyond the body itself, ultimately involving coordination with other agents. Recent research in artificial intelligence has shown the promise of learning-based approaches to the respective problems of complex movement, longer-term planning and multi-agent coordination. However, there is limited research aimed at their integration. We study this problem by training teams of physically simulated humanoid avatars to play football in a realistic virtual environment. We develop a method that combines imitation learning, single- and multi-agent reinforcement learning and population-based training, and makes use of transferable representations of behaviour for decision making at different levels of abstraction. In a sequence of stages, players first learn to control a fully articulated body to perform realistic, human-like movements such as running and turning; they then acquire mid-level football skills such as dribbling and shooting; finally, they develop awareness of others and play as a team, bridging the gap between low-level motor control at a timescale of milliseconds, and coordinated goal-directed behaviour as a team at the timescale of tens of seconds. We investigate the emergence of behaviours at different levels of abstraction, as well as the representations that underlie these behaviours using several analysis techniques, including statistics from real-world sports analytics. Our work constitutes a complete demonstration of integrated decision-making at multiple scales in a physically embodied multi-agent setting. See project video at https://youtu.be/KHMwq9pv7mg.
