Humans constantly interact with objects in daily life. Capturing such interactions and subsequently performing visual inference from a fixed viewpoint suffers from occlusions, shape and texture ambiguities, complex motions, and related challenges. To mitigate this problem, it is essential to build a training dataset that captures interactions from free viewpoints. We construct a dense multi-view dome to acquire a complex human-object interaction dataset, named HODome, that consists of $\sim$75M frames of 10 subjects interacting with 23 objects. To process the HODome dataset, we develop NeuralDome, a layer-wise neural processing pipeline tailored to multi-view video inputs that performs accurate tracking, geometry reconstruction, and free-viewpoint rendering for both human subjects and objects. Extensive experiments on the HODome dataset demonstrate the effectiveness of NeuralDome on a variety of inference, modeling, and rendering tasks. Both the dataset and the NeuralDome tools will be disseminated to the community for further development.
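To give intuition for the layer-wise idea referenced above, the minimal Python sketch below composites two independently modeled radiance-field "layers" (a human layer and an object layer) along a single camera ray via standard volume rendering. This is an illustrative assumption only, not the NeuralDome implementation: the toy density/color fields, function names, and the density-weighted fusion scheme are all hypothetical stand-ins for learned per-layer models.

```python
import numpy as np

def toy_human_field(pts):
    """Hypothetical stand-in for a learned human layer: per-point (density, rgb)."""
    density = 5.0 * np.exp(-np.linalg.norm(pts - np.array([0.0, 0.0, 2.0]), axis=-1) ** 2)
    rgb = np.tile(np.array([0.8, 0.6, 0.5]), (pts.shape[0], 1))
    return density, rgb

def toy_object_field(pts):
    """Hypothetical stand-in for a learned object layer."""
    density = 5.0 * np.exp(-np.linalg.norm(pts - np.array([0.3, 0.0, 2.5]), axis=-1) ** 2)
    rgb = np.tile(np.array([0.2, 0.4, 0.9]), (pts.shape[0], 1))
    return density, rgb

def render_ray(origin, direction, layers, near=0.5, far=4.0, n_samples=128):
    """Composite several layers along one ray with standard volume rendering.

    Every layer is queried at the same depth samples; densities are summed and
    colors are density-weighted, which is one common way to fuse independent
    radiance fields into a single rendered pixel.
    """
    t = np.linspace(near, far, n_samples)                  # depth samples along the ray
    pts = origin[None, :] + t[:, None] * direction[None, :]
    sigma = np.zeros(n_samples)
    color = np.zeros((n_samples, 3))
    for field in layers:
        d, rgb = field(pts)
        color += d[:, None] * rgb                          # density-weighted color
        sigma += d
    color = color / np.clip(sigma[:, None], 1e-8, None)    # normalize per sample
    delta = np.diff(t, append=t[-1] + (t[-1] - t[-2]))     # spacing between samples
    alpha = 1.0 - np.exp(-sigma * delta)                   # opacity per sample
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1] + 1e-10]))
    weights = alpha * trans                                 # contribution of each sample
    return (weights[:, None] * color).sum(axis=0)          # final pixel color

# Example: render one pixel looking straight down the +z axis.
pixel = render_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]),
                   layers=[toy_human_field, toy_object_field])
print(pixel)
```

In a full pipeline such as the one described here, each layer would additionally be tracked and posed per frame before compositing; the sketch only illustrates the rendering-time fusion of separately modeled layers.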