We propose a new dataset and a novel approach to learning hand-object interaction priors for hand and articulated object pose estimation. We first collect a dataset using visual teleoperation, in which a human operator interacts directly with articulated objects inside a physics simulator. We record these interactions and obtain free, accurate annotations of object poses and contact information from the simulator. Our system requires only an iPhone to record human hand motion, so it scales easily and greatly lowers the cost of data and annotation collection. With this data, we learn 3D interaction priors: a discriminator (in a GAN) that captures the distribution of how object parts are arranged, and a diffusion model that generates contact regions on articulated objects to guide hand pose estimation. Such structural and contact priors transfer to real-world data with barely any domain gap. Using our data and learned priors, our method significantly improves performance on joint hand and articulated object pose estimation over existing state-of-the-art methods. The project is available at https://zehaozhu.github.io/ContactArt/ .