Wearable IMU sensors enable the estimation of human poses without requiring any visual input~\cite{von2017sparse}. In this work, we pose the question: Can we reason about object structure in real-world environments solely from human trajectory information? Crucially, we observe that human motion and interactions tend to give strong cues about the objects in a scene -- for instance, a person sitting indicates the likely presence of a chair or sofa. To this end, we propose P2R-Net, which learns a probabilistic 3D model of the objects in a scene, characterized by their class categories and oriented 3D bounding boxes, from an observed human trajectory in the environment. P2R-Net models the probability distribution over object classes together with a deep Gaussian mixture model over object boxes, enabling the sampling of multiple diverse, likely modes of object configurations from an observed human trajectory. In our experiments, we show that P2R-Net can effectively learn multi-modal distributions of likely objects for human motions and produce a variety of plausible object structures of the environment, even without any visual information. The results demonstrate that P2R-Net consistently outperforms the baselines on the PROX dataset and the VirtualHome platform.
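As an illustrative sketch of the box distribution described above (notation is ours and not taken from the paper's formal exposition), the oriented box parameters $\mathbf{b}$ of an object can be modeled as a mixture of Gaussians whose weights and component parameters are predicted by the network from the observed trajectory $\mathcal{T}$:
\[
p(\mathbf{b} \mid \mathcal{T}) \;=\; \sum_{k=1}^{K} \pi_k(\mathcal{T})\, \mathcal{N}\!\big(\mathbf{b};\, \boldsymbol{\mu}_k(\mathcal{T}),\, \boldsymbol{\Sigma}_k(\mathcal{T})\big),
\]
where $\pi_k$, $\boldsymbol{\mu}_k$, and $\boldsymbol{\Sigma}_k$ denote the predicted mixture weights, means, and covariances. Sampling a component $k$ and then drawing $\mathbf{b}$ from the corresponding Gaussian yields one plausible mode of the object configuration, which is how diverse layout hypotheses can be generated under this formulation.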