Synthesizing 3D human avatars that interact realistically with a scene is an important problem with applications in AR/VR, video games, and robotics. Towards this goal, we address the task of generating a virtual human -- hands and full body -- grasping everyday objects. Existing methods approach this problem by collecting a 3D dataset of humans interacting with objects and training on this data. However, 1) these methods do not generalize to different object positions and orientations or to the presence of furniture in the scene, and 2) the diversity of their generated full-body poses is very limited. In this work, we address all of the above challenges to generate realistic, diverse full-body grasps in everyday scenes without requiring any 3D full-body grasping data. Our key insight is to leverage the existence of both full-body pose priors and hand grasping priors, composing them via 3D geometric constraints to obtain full-body grasps. We empirically validate that these constraints can generate a variety of feasible human grasps that outperform baselines both quantitatively and qualitatively. See our webpage for more details: https://flex.cs.columbia.edu/.
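To make the composition idea concrete, below is a minimal, hypothetical sketch of how two frozen pre-trained priors could be combined by optimizing their latent codes under 3D geometric constraints. The prior interfaces (body_prior.decode, grasp_prior.decode), the attribute names, the loss weights, and the crude penetration proxy are all illustrative assumptions, not the paper's actual implementation.

```python
import torch


def penetration_loss(verts, obj_verts, margin=5e-3):
    # Crude proxy (assumption): penalize body vertices that come within
    # `margin` meters of any object vertex; a real system would likely use
    # signed distances to the object surface instead.
    d = torch.cdist(verts, obj_verts)              # (V, O) pairwise distances
    return torch.relu(margin - d.min(dim=1).values).mean()


def compose_full_body_grasp(body_prior, grasp_prior, obj_verts,
                            steps=500, lr=1e-2):
    """Optimize the latent codes of a frozen full-body pose prior and a
    frozen hand-grasp prior so that the decoded body's hand coincides with
    the decoded grasping hand around the object (hypothetical interfaces)."""
    z_body = torch.zeros(1, body_prior.latent_dim, requires_grad=True)
    z_hand = torch.zeros(1, grasp_prior.latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z_body, z_hand], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        body = body_prior.decode(z_body)               # full-body vertices
        hand = grasp_prior.decode(z_hand, obj_verts)   # hand grasping object

        # Geometric consistency: the body's hand vertices should match the
        # independently generated grasping hand.
        align = (body.hand_verts - hand.verts).square().mean()
        # Keep latents near the priors' high-density regions.
        reg = z_body.square().mean() + z_hand.square().mean()
        # Keep the body from interpenetrating the object.
        pen = penetration_loss(body.verts, obj_verts)

        loss = align + 0.1 * reg + 1.0 * pen           # weights are guesses
        loss.backward()
        opt.step()

    return body, hand
```

Because both priors stay frozen, the optimization only searches their latent spaces, which is what allows the approach to produce full-body grasps without any paired 3D full-body grasping data.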