Humans naturally change their environment through interactions, e.g., by opening doors or moving furniture. To reproduce such interactions in virtual spaces (e.g., the metaverse), we need to capture and model them, including changes in scene geometry, ideally from egocentric input alone (head camera and body-worn inertial sensors). While the head camera can be used to localize the person in the scene, estimating dynamic object pose is much more challenging. Since the object is often not visible from the head camera (e.g., a person not looking at a chair while sitting down), we cannot rely on visual object pose estimation. Instead, our key observation is that human motion tells us a lot about scene changes. Motivated by this, we present iReplica, the first human-object interaction reasoning method that can track objects and scene changes based solely on human motion. iReplica is an essential first step towards advanced AR/VR applications in immersive virtual universes and can provide human-centric training data to teach machines to interact with their surroundings. Our code, data, and models will be available on our project page at http://virtualhumans.mpi-inf.mpg.de/ireplica/