We tackle the problem of action-conditioned generation of realistic and diverse human motion sequences. In contrast to methods that complete or extend motion sequences, this task does not require an initial pose or sequence. Here we learn an action-aware latent representation for human motions by training a generative variational autoencoder (VAE). By sampling from this latent space and querying a desired duration through a series of positional encodings, we synthesize variable-length motion sequences conditioned on categorical actions. Specifically, we design a Transformer-based architecture, ACTOR, for encoding and decoding sequences of parametric SMPL human body models estimated from action recognition datasets. We evaluate our approach on the NTU RGB+D, HumanAct12, and UESTC datasets and show improvements over the state of the art. Furthermore, we present two use cases: improving action recognition by adding our synthesized data to the training set, and motion denoising. Code and models are available on our project page.
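To make the generation pipeline concrete, the following is a minimal PyTorch sketch of an ACTOR-style decoding step, written from the description above rather than from the released code. The hyperparameters (`latent_dim=256`, 4 decoder layers, 4 attention heads) and the pose representation (`pose_dim = 24 * 6 + 3`, i.e., 24 SMPL joint rotations in a 6D representation plus a root translation) are illustrative assumptions; only the overall structure, a learned per-action shift of the latent and sinusoidal positional encodings that query the desired number of frames from a Transformer decoder, follows the text above.

```python
import math
import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """Standard sinusoidal positional encoding (Vaswani et al., 2017)."""

    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2).float()
                        * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)  # (max_len, d_model)

    def forward(self, length: int) -> torch.Tensor:
        return self.pe[:length]  # (length, d_model)


class ActionConditionedDecoder(nn.Module):
    """Sketch of decoding an action-conditioned latent vector into a
    variable-length sequence of SMPL pose parameters. Dimensions are
    assumptions, not the paper's exact configuration."""

    def __init__(self, latent_dim=256, num_actions=12, pose_dim=24 * 6 + 3):
        super().__init__()
        # One learned bias per action category, added to the latent sample.
        self.action_bias = nn.Parameter(torch.zeros(num_actions, latent_dim))
        self.pos_enc = PositionalEncoding(latent_dim)
        layer = nn.TransformerDecoderLayer(d_model=latent_dim, nhead=4,
                                           dim_feedforward=1024,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.to_pose = nn.Linear(latent_dim, pose_dim)

    def forward(self, z: torch.Tensor, action: torch.Tensor, duration: int):
        # Shift the latent by its action bias and use it as the single
        # "memory" token attended to by every generated time step.
        memory = (z + self.action_bias[action]).unsqueeze(1)  # (B, 1, D)
        # The queried duration enters only through positional encodings,
        # so one latent sample can be decoded to any sequence length.
        queries = self.pos_enc(duration).unsqueeze(0).expand(z.size(0), -1, -1)
        hidden = self.decoder(tgt=queries, memory=memory)     # (B, T, D)
        return self.to_pose(hidden)                           # (B, T, pose_dim)


# Usage: sample z from the VAE prior and decode a 60-frame sequence.
model = ActionConditionedDecoder()
z = torch.randn(8, 256)              # batch of latent samples
action = torch.randint(0, 12, (8,))  # categorical action labels
motion = model(z, action, duration=60)  # (8, 60, pose_dim)
```

Because the duration is specified purely through the positional-encoding queries, the same latent code can be rendered at any length, which is what makes the synthesized sequences variable-length.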