Inferring the 3D geometry and the semantic meaning of surfaces, which are occluded, is a very challenging task. Recently, a first end-to-end learning approach has been proposed that completes a scene from a single depth image. The approach voxelizes the scene and predicts for each voxel if it is occupied and, if it is occupied, the semantic class label. In this work, we propose a two stream approach that leverages depth information and semantic information, which is inferred from the RGB image, for this task. The approach constructs an incomplete 3D semantic tensor, which uses a compact three-channel encoding for the inferred semantic information, and uses a 3D CNN to infer the complete 3D semantic tensor. In our experimental evaluation, we show that the proposed two stream approach substantially outperforms the state-of-the-art for semantic scene completion.
翻译:推断三维几何学和表面的语义含义( 隐蔽了) 是一项非常艰巨的任务。 最近, 提出了第一个端到端学习方法, 从一个深度图像中完成一个场景。 这种方法对场景进行蒸发, 并预测每个 voxel 的场景, 如果它被占用, 如果它被占用, 并且被占用了, 语义类标签 。 在这项工作中, 我们提议了一种两种流方法, 利用从 RGB 图像中推断出来的深度信息和语义信息。 这种方法构建了一个不完整的 3D 语义喇叭, 用于推断语义信息的三道编码, 并使用 3D CNN 来推断 3D 语义沙发。 在我们的实验性评估中, 我们显示, 拟议的两种河道方法大大超越了语义场的完成状态。