We present a GAN-based Transformer for general action-conditioned 3D human motion generation, covering not only single-person actions but also multi-person interactive actions. Our approach consists of a powerful Action-conditioned motion TransFormer (ActFormer) trained under a GAN scheme and equipped with a Gaussian Process latent prior. This design combines the strong spatio-temporal representation capacity of the Transformer, the superiority of GANs in generative modeling, and the inherent temporal correlations provided by the latent prior. Furthermore, ActFormer naturally extends to multi-person motions by alternately modeling temporal correlations and human interactions with Transformer encoders. To further facilitate research on multi-person motion generation, we introduce a new synthetic dataset of complex multi-person combat behaviors. Extensive experiments on NTU-13, NTU RGB+D 120, BABEL, and the proposed combat dataset show that our method adapts to various human motion representations and outperforms state-of-the-art methods on both single-person and multi-person motion generation tasks, demonstrating a promising step towards a general human motion generator.
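To make the multi-person extension concrete, the following is a minimal sketch, not the authors' released code, of the idea of alternately applying Transformer encoders along the temporal axis and the person axis; the module name, tensor layout, and dimensions are illustrative assumptions.

```python
# Illustrative sketch only: alternating temporal / interaction Transformer encoders
# for multi-person motion features. Names (ActFormerBlock, d_model, ...) and the
# (batch, persons, time, feature) layout are assumptions, not the paper's exact code.
import torch
import torch.nn as nn


class ActFormerBlock(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        # One encoder layer attends across time, the other across persons.
        self.temporal = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.interaction = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, persons, time, d_model)
        b, p, t, d = x.shape
        # Temporal attention: treat each person as an independent sequence over time.
        x = self.temporal(x.reshape(b * p, t, d)).reshape(b, p, t, d)
        # Interaction attention: at each time step, attend across persons.
        x = x.transpose(1, 2).reshape(b * t, p, d)
        x = self.interaction(x).reshape(b, t, p, d).transpose(1, 2)
        return x


# Usage: two persons, 60 frames, 256-dimensional motion features.
block = ActFormerBlock()
out = block(torch.randn(1, 2, 60, 256))
print(out.shape)  # torch.Size([1, 2, 60, 256])
```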