Humans naturally change their environment through interactions, e.g., by opening doors or moving furniture. To reproduce such interactions in virtual spaces (e.g., the metaverse), we need to capture and model them, including changes in the scene geometry, ideally from egocentric input alone (head camera and body-worn inertial sensors). While the head camera can be used to localize the person in the scene, estimating dynamic object pose is much more challenging. As the object is often not visible from the head camera (e.g., a human not looking at a chair while sitting down), we cannot rely on visual object pose estimation. Instead, our key observation is that human motion tells us a lot about scene changes. Motivated by this, we present iReplica, the first human-object interaction reasoning method that can track objects and scene changes based solely on human motion. iReplica is an essential first step towards advanced AR/VR applications in immersive virtual universes and can provide human-centric training data to teach machines to interact with their surroundings. Our code, data, and model will be available on our project page at http://virtualhumans.mpi-inf.mpg.de/ireplica/