Synthesizing 3D human avatars that interact realistically with a scene is an important problem with applications in AR/VR, video games, and robotics. Towards this goal, we address the task of generating a virtual human -- hands and full body -- grasping everyday objects. Existing methods approach this problem by collecting a 3D dataset of humans interacting with objects and training on this data. However, 1) these methods do not generalize to different object positions and orientations, or to the presence of furniture in the scene, and 2) the diversity of their generated full-body poses is very limited. In this work, we address both challenges to generate realistic, diverse full-body grasps in everyday scenes without requiring any 3D full-body grasping data. Our key insight is to leverage existing full-body pose and hand-grasping priors, composing them using 3D geometric constraints to obtain full-body grasps. We empirically validate that these constraints can generate a variety of feasible human grasps that are superior to baselines both quantitatively and qualitatively. See our webpage for more details: https://flex.cs.columbia.edu/.
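To make the key idea concrete, below is a minimal sketch of composing two frozen priors by optimizing their latent codes under soft 3D geometric constraints. This is an assumption-laden illustration, not the paper's actual models: the decoder stubs, latent dimensions, coordinate conventions, and penalty weights are all hypothetical placeholders.

```python
import torch

# Hypothetical stand-ins for pretrained priors. A real system would load
# learned body-pose and hand-grasp decoders; these placeholder MLPs just
# keep the sketch self-contained and runnable.
class DummyPrior(torch.nn.Module):
    def __init__(self, latent_dim, out_dim):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(latent_dim, 64), torch.nn.ReLU(),
            torch.nn.Linear(64, out_dim))

    def forward(self, z):
        return self.net(z)

body_prior = DummyPrior(latent_dim=32, out_dim=63)  # e.g. 21 body joints x 3
hand_prior = DummyPrior(latent_dim=16, out_dim=45)  # e.g. 15 hand joints x 3

# The latent codes are the only free variables; both priors stay frozen.
z_body = torch.randn(1, 32, requires_grad=True)
z_hand = torch.randn(1, 16, requires_grad=True)
opt = torch.optim.Adam([z_body, z_hand], lr=1e-2)

obj_center = torch.tensor([[0.4, 0.0, 0.9]])  # hypothetical object position

def wrist_from_body(pose):
    # Placeholder convention: treat the last 3 values of the decoded body
    # pose as the wrist position in world coordinates.
    return pose[:, -3:]

def hand_root(grasp):
    # Placeholder convention: treat the first 3 values of the decoded
    # grasp as the hand's root position.
    return grasp[:, :3]

for step in range(500):
    opt.zero_grad()
    body = body_prior(z_body)
    hand = hand_prior(z_hand)
    # 3D geometric constraints expressed as soft penalties:
    # 1) the body's wrist must coincide with the grasping hand's root,
    # 2) the hand root must reach the object.
    loss_align = (wrist_from_body(body) - hand_root(hand)).square().sum()
    loss_reach = (hand_root(hand) - obj_center).square().sum()
    # A Gaussian penalty on the latents keeps samples near each prior's
    # learned manifold.
    loss_prior = 1e-3 * (z_body.square().sum() + z_hand.square().sum())
    loss = loss_align + loss_reach + loss_prior
    loss.backward()
    opt.step()
```

Expressing the constraints as differentiable penalties lets a single gradient-based search coordinate both priors, which is what makes it possible to obtain full-body grasps without any paired 3D full-body grasping data.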