Predicting diverse human motions given a sequence of historical poses has received increasing attention. Despite rapid progress, existing work captures the multi-modal nature of human motions primarily through likelihood-based sampling, where mode collapse has been widely observed. In this paper, we propose a simple yet effective approach that disentangles randomly sampled codes from a deterministic learnable component named anchors to promote sample precision and diversity. Anchors are further factorized into spatial anchors and temporal anchors, which provide interpretable control over spatial-temporal disparity. In principle, our spatial-temporal anchor-based sampling (STARS) can be applied to different motion predictors. Here we propose an interaction-enhanced spatial-temporal graph convolutional network (IE-STGCN) that encodes prior knowledge of human motions (e.g., spatial locality), and incorporate the anchors into it. Extensive experiments demonstrate that our approach outperforms the state of the art in both stochastic and deterministic prediction, suggesting it as a unified framework for modeling human motions. Our code and pretrained models are available at https://github.com/Sirui-Xu/STARS.
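To make the anchor idea concrete, the following is a minimal NumPy sketch of how factorized anchors might be combined with a random code. It is an illustration under stated assumptions, not the implementation in the STARS repository: the function names `combine_anchors` and `sample_codes`, the tensor shapes, the additive combination of spatial and temporal anchors, and the sharing of one noise draw across all anchors are all hypothetical choices. In a trained model the anchors would be learned parameters; here they are plain arrays.

```python
import numpy as np


def combine_anchors(spatial, temporal):
    """Pair every spatial anchor with every temporal anchor.

    Assumed shapes (hypothetical, for illustration only):
      spatial:  (Ks, J, C)  one code per joint, constant over time
      temporal: (Kt, T, C)  one code per frame, constant over joints
    Returns an array of shape (Ks * Kt, T, J, C).
    """
    ks, j, c = spatial.shape
    kt, t, c_t = temporal.shape
    if c != c_t:
        raise ValueError("spatial and temporal anchors need the same channel size")
    # Broadcast to (Ks, Kt, T, J, C); addition is an assumed combination rule.
    combined = spatial[:, None, None, :, :] + temporal[None, :, :, None, :]
    return combined.reshape(ks * kt, t, j, c)


def sample_codes(spatial, temporal, rng, noise_scale=1.0):
    """Add one shared random code to every deterministic anchor.

    Sharing a single noise draw means that differences between the
    returned codes come only from the anchors (an assumption of this sketch).
    """
    anchors = combine_anchors(spatial, temporal)
    noise = noise_scale * rng.standard_normal(anchors.shape[1:])
    return anchors + noise[None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spatial = rng.standard_normal((2, 16, 8))   # 2 spatial anchors, 16 joints
    temporal = rng.standard_normal((5, 10, 8))  # 5 temporal anchors, 10 frames
    codes = sample_codes(spatial, temporal, rng)
    print(codes.shape)  # one code per (spatial, temporal) anchor pair
```

In this sketch, holding the spatial index fixed while varying the temporal index changes only the per-frame component of the code, which is one way to read the abstract's claim of separate control over spatial and temporal variation. Each resulting code would then condition a motion predictor to produce one candidate future.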