Generating and representing human behavior are of major importance for various computer vision applications. Commonly, human video synthesis represents behavior as sequences of postures, either directly predicting their likely progression or merely changing the appearance of the depicted persons, and is thus unable to exercise control over the actual behavior during synthesis. In contrast, controlled behavior synthesis and transfer across individuals require a deep understanding of body dynamics and call for a representation of behavior that is independent of appearance and also of specific postures. In this work, we present a model for human behavior synthesis which learns a dedicated representation of human dynamics independent of postures. Using this representation, we are able to change the behavior of a person depicted in an arbitrary posture, or even to directly transfer behavior observed in a given video sequence. To this end, we propose a conditional variational framework which explicitly disentangles posture from behavior. We demonstrate the effectiveness of our approach on this novel task, evaluating the capture, transfer, and sampling of fine-grained, diverse behavior, both quantitatively and qualitatively. The project page is available at https://cutt.ly/5l7rXEp.
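To make the posture/behavior disentanglement concrete, below is a minimal sketch of a conditional variational model that separates a static posture code from a stochastic behavior latent. It assumes a PyTorch setup; all module names, dimensions, and the GRU-based encoder/decoder are illustrative assumptions rather than the paper's actual architecture.

```python
# Minimal sketch of a conditional variational framework that disentangles a
# static posture code from a dynamic behavior latent. All names, dimensions,
# and layer choices here are illustrative assumptions, not the paper's design.
import torch
import torch.nn as nn

class BehaviorVAE(nn.Module):
    def __init__(self, pose_dim=34, pose_code=64, beh_code=32, hidden=128):
        super().__init__()
        # Posture encoder: embeds a single pose as the conditioning signal.
        self.pose_enc = nn.Sequential(
            nn.Linear(pose_dim, hidden), nn.ReLU(), nn.Linear(hidden, pose_code))
        # Behavior encoder: summarizes a pose sequence into a Gaussian posterior.
        self.beh_rnn = nn.GRU(pose_dim, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, beh_code)
        self.to_logvar = nn.Linear(hidden, beh_code)
        # Decoder: rolls out a pose sequence from (posture code, behavior latent).
        self.dec_rnn = nn.GRU(pose_code + beh_code, hidden, batch_first=True)
        self.to_pose = nn.Linear(hidden, pose_dim)

    def forward(self, seq):
        # seq: (batch, T, pose_dim); frame 0 provides the posture condition.
        p = self.pose_enc(seq[:, 0])                          # (B, pose_code)
        _, h = self.beh_rnn(seq)                              # h: (1, B, hidden)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        b = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        T = seq.size(1)
        z = torch.cat([p, b], dim=-1).unsqueeze(1).expand(-1, T, -1).contiguous()
        out, _ = self.dec_rnn(z)
        recon = self.to_pose(out)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return recon, kl

# Training objective: reconstruction plus KL regularization on the behavior latent.
model = BehaviorVAE()
src, tgt = torch.randn(2, 16, 34), torch.randn(2, 16, 34)
recon, kl = model(src)
loss = nn.functional.mse_loss(recon, src) + 1e-3 * kl

# Behavior transfer sketch: take the behavior latent inferred from `src`
# and decode it under the posture code of `tgt`'s first frame.
with torch.no_grad():
    p_tgt = model.pose_enc(tgt[:, 0])
    _, h = model.beh_rnn(src)
    b_src = model.to_mu(h[-1])                 # use the posterior mean at test time
    T = src.size(1)
    z = torch.cat([p_tgt, b_src], -1).unsqueeze(1).expand(-1, T, -1).contiguous()
    transferred, _ = model.dec_rnn(z)
    transferred = model.to_pose(transferred)   # (B, T, pose_dim)
```

Because only the behavior latent passes through the KL-regularized bottleneck while the posture code conditions the decoder directly, swapping either factor independently, as in the transfer snippet above, is what gives this kind of framework its control over behavior.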