We propose the task of forecasting characteristic 3D poses: given a single pose observation of a person, predict a future 3D pose of that person in a likely action-defining, characteristic pose. For instance, from observing a person picking up a banana, predict the pose of the person eating the banana. Prior work on human motion prediction estimates future poses at fixed time intervals. Although easy to define, this frame-by-frame formulation confounds the temporal and intentional aspects of human action. Instead, we define a goal-directed pose prediction task that decouples pose prediction from time, taking inspiration from goal-directed human behavior. To predict characteristic goal poses, we propose a probabilistic approach that first models the potentially multi-modal distribution of characteristic poses. It then samples future pose hypotheses from the predicted distribution in an autoregressive fashion to model dependencies between joints, and finally optimizes the sampled pose under bone length and angle constraints. To evaluate our method, we construct a dataset of manually annotated single-frame observations and characteristic 3D poses. Our experiments on this dataset suggest that our proposed probabilistic approach outperforms state-of-the-art approaches by 22% on average.
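The sample-then-refine pipeline described above can be illustrated with a toy sketch. This is not the authors' implementation: the four-joint skeleton, the bone lengths, the voxel grid, and the hand-written `joint_distribution` (a stand-in for a learned network) are all hypothetical. The refinement step is also simplified to a bone-length projection and omits the angle constraints.

```python
import numpy as np

# Hypothetical toy skeleton: joint index -> parent index (-1 marks the root).
# Parents always precede their children in this ordering.
PARENTS = [-1, 0, 1, 2]
# Hypothetical target bone lengths for each non-root joint.
BONE_LENGTHS = {1: 0.5, 2: 0.3, 3: 0.25}


def joint_distribution(joint, sampled, grid):
    """Stand-in for a learned model (assumption): return a probability
    distribution over grid cells for `joint`, conditioned on the joints
    that have already been sampled."""
    if PARENTS[joint] < 0:
        center = np.zeros(3)
    else:
        offset = np.array([0.0, 0.0, BONE_LENGTHS[joint]])
        center = sampled[PARENTS[joint]] + offset
    logits = -np.sum((grid - center) ** 2, axis=1) / (2 * 0.1 ** 2)
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()


def sample_pose(grid, rng):
    """Sample joints one at a time, each conditioned on the previous ones."""
    sampled = {}
    for joint in range(len(PARENTS)):
        probs = joint_distribution(joint, sampled, grid)
        cell = rng.choice(len(grid), p=probs)
        sampled[joint] = grid[cell]
    return np.stack([sampled[j] for j in range(len(PARENTS))])


def enforce_bone_lengths(pose):
    """Simplified refinement: keep each bone's sampled direction but rescale
    it to the target length, walking from the root outward."""
    refined = pose.copy()
    for joint, parent in enumerate(PARENTS):
        if parent < 0:
            continue
        direction = pose[joint] - pose[parent]
        norm = np.linalg.norm(direction)
        if norm < 1e-8:  # child sampled in the same cell as its parent
            direction, norm = np.array([0.0, 0.0, 1.0]), 1.0
        refined[joint] = refined[parent] + direction / norm * BONE_LENGTHS[joint]
    return refined


rng = np.random.default_rng(0)
axis = np.linspace(-1.5, 1.5, 13)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)

# Draw several hypotheses to reflect the multi-modal nature of the task.
hypotheses = [enforce_bone_lengths(sample_pose(grid, rng)) for _ in range(3)]
```

Drawing several hypotheses instead of a single estimate reflects the multi-modality the approach is built around: one observation can lead to several plausible characteristic poses.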