In everyday life, humans naturally modify their surrounding environment through interactions, e.g., moving a chair to sit on it. To reproduce such interactions in virtual spaces (e.g., the metaverse), we need to be able to capture and model them, including changes in the scene geometry, ideally from ego-centric input alone (a head camera and body-worn inertial sensors). This is an extremely hard problem, especially since the object or scene might not be visible from the head camera (e.g., a person not looking at the chair while sitting down, or not looking at the door handle while opening a door). In this paper, we present HOPS, the first method to capture interactions such as dragging objects and opening doors from ego-centric data alone. Central to our method is reasoning about human-object interactions, which allows us to track objects even when they are not visible from the head camera. HOPS localizes and registers both the human and the dynamic object in a pre-scanned static scene. HOPS is an important first step towards advanced AR/VR applications based on immersive virtual universes, and it can provide human-centric training data to teach machines to interact with their surroundings. The supplementary video, data, and code will be available on our project page at http://virtualhumans.mpi-inf.mpg.de/hops/