Creating plausible and controllable 3D human motion animations is a long-standing problem that requires manual intervention by skilled artists. Current machine learning approaches can semi-automate the process, but they are limited in a significant way: they can handle only a single trajectory of the expected motion, which precludes fine-grained control over the output. To mitigate this issue, we reformulate the problem of future pose prediction as pose completion in space and time, where multiple trajectories are represented as poses with missing joints. We show that this framework can generalize to other neural networks designed for future pose prediction. Once trained in this framework, a model can predict sequences from any number of trajectories. Building on this idea, we propose TrajeVAE, a novel transformer-like architecture that provides a versatile framework for 3D human animation. We demonstrate that TrajeVAE offers better accuracy than trajectory-based reference approaches and than methods that base their predictions on past poses. We also show that it can predict reasonable future poses even when provided with only an initial pose.
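To make the reformulation concrete, the following is a minimal sketch of how a set of trajectories might be encoded as poses with missing joints. It is an illustration only and not the TrajeVAE implementation: the function name `make_completion_input`, the `(T, J, 3)` array layout, the zero-filling of missing joints, and the boolean observation mask are all assumptions made for this example.

```python
import numpy as np


def make_completion_input(poses, trajectory_joints, num_context_frames=1):
    """Build a masked pose-completion input from a full pose sequence.

    poses: array of shape (T, J, 3) with joint positions over T frames.
    trajectory_joints: indices of joints whose trajectory is given in every frame.
    num_context_frames: leading frames in which all joints are observed
        (1 corresponds to providing only an initial pose).

    Returns (masked_poses, mask); mask has shape (T, J) and is True where observed.
    Hypothetical helper: zero-filling and the mask layout are assumptions.
    """
    poses = np.asarray(poses, dtype=float)
    num_frames, num_joints, _ = poses.shape

    mask = np.zeros((num_frames, num_joints), dtype=bool)
    # The initial pose(s) are fully observed.
    mask[:num_context_frames, :] = True
    # Joints with a given trajectory are observed in every frame.
    idx = np.asarray(list(trajectory_joints), dtype=int)
    mask[:, idx] = True

    # Missing joints are zero-filled; a model would be trained to complete them.
    masked_poses = np.where(mask[..., None], poses, 0.0)
    return masked_poses, mask


# Example: 4 frames, 5 joints, trajectories given for joints 0 and 3.
poses = np.arange(4 * 5 * 3, dtype=float).reshape(4, 5, 3) + 1.0
masked, mask = make_completion_input(poses, trajectory_joints=[0, 3])
```

Under this encoding, passing an empty `trajectory_joints` list leaves only the initial pose observed, and passing more joint indices supplies more trajectories, so the same input format covers any number of trajectories.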