Humans interact with an object in many different ways, making contact at different locations and creating a highly complex motion space that is difficult to learn, particularly when synthesizing such interactions in a controllable manner. Existing work on synthesizing human-scene interaction focuses on high-level control of actions but does not consider fine-grained control of motion. In this work, we study the problem of synthesizing scene interactions conditioned on different contact positions on the object. As a testbed for investigating this new problem, we focus on human-chair interaction, one of the most common actions and one that exhibits large variability in contacts. We propose COUCH, a novel synthesis framework that plans motion ahead by predicting contact-aware control signals for the hands, which are then used to synthesize contact-conditioned interactions. Furthermore, we contribute a large human-chair interaction dataset with clean annotations, the COUCH Dataset. Our method shows significant quantitative and qualitative improvements over existing methods for human-object interaction. More importantly, it enables control of the motion through user-specified or automatically predicted contacts.
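The two-stage idea behind the framework — first plan contact-aware hand control signals, then synthesize motion conditioned on them — can be sketched as follows. This is a minimal illustrative stand-in, not the paper's implementation: the function names `predict_control_signal` and `synthesize_motion` are hypothetical, the linear trajectory replaces COUCH's learned control-signal prediction, and the dummy pose features replace its learned motion synthesis network.

```python
import numpy as np

def predict_control_signal(start, contact, steps=10):
    """Stage 1 (hypothetical sketch): plan a hand trajectory toward a
    user-specified contact point by linear interpolation. COUCH learns
    this prediction; this stand-in only illustrates the interface."""
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - t) * start + t * contact  # (steps, 3) hand waypoints

def synthesize_motion(control_signal):
    """Stage 2 (hypothetical sketch): produce per-frame motion features
    conditioned on the control signal. Here a dummy root height is
    attached to each waypoint just to show the conditioning flow."""
    root_height = np.full((control_signal.shape[0], 1), 0.9)
    return np.hstack([control_signal, root_height])  # (steps, 4)

start = np.array([0.0, 1.0, 0.5])    # current right-hand position (m)
contact = np.array([0.4, 0.6, 0.0])  # desired contact point on the chair (m)
motion = synthesize_motion(predict_control_signal(start, contact))
```

The key property this illustrates is controllability: changing `contact` changes the planned trajectory, and hence the synthesized motion, while the synthesis stage itself stays unchanged.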