Spherical cameras capture scenes in a holistic manner and have been used for room layout estimation. Recently, with the availability of appropriate datasets, there has also been progress in depth estimation from a single omnidirectional image. While these two tasks are complementary, few works have explored them in parallel to advance indoor geometric perception, and those that have either relied on synthetic data or used small-scale datasets, as few options are available that include both layout annotations and dense depth maps in real scenes. This is partly due to the need for manual room layout annotations. In this work, we move beyond this limitation and generate a 360 geometric vision (360V) dataset that includes multiple modalities, multi-view stereo data, and automatically generated weak layout cues. We also explore an explicit coupling between the two tasks, integrating them into a single-shot trained model. We rely on depth-based layout reconstruction and layout-based depth attention, demonstrating increased performance across both tasks. Scanning rooms with a single 360 camera opens up the opportunity for facile and quick building-scale 3D scanning.
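The abstract names two coupling mechanisms, depth-based layout reconstruction and layout-based depth attention. As a rough illustration of the second, the PyTorch sketch below gates depth features with an attention map derived from layout features. This is a minimal sketch under stated assumptions: the module name `LayoutDepthAttention`, the 1x1-convolution gating, the residual modulation, and all tensor shapes are illustrative choices, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class LayoutDepthAttention(nn.Module):
    """Hypothetical sketch of layout-based depth attention: a layout feature
    map is reduced to a spatial attention mask that modulates depth features.
    All names and design choices here are assumptions, not the paper's."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv collapses layout features to a single-channel attention map
        self.to_attn = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, depth_feat: torch.Tensor,
                layout_feat: torch.Tensor) -> torch.Tensor:
        # Sigmoid keeps the attention map in [0, 1]
        attn = torch.sigmoid(self.to_attn(layout_feat))
        # Residual modulation: layout cues amplify, but never erase,
        # the underlying depth signal
        return depth_feat * (1.0 + attn)


# Usage on equirectangular feature maps (e.g. 64 channels at 256x512)
depth_feat = torch.randn(1, 64, 256, 512)
layout_feat = torch.randn(1, 64, 256, 512)
fused = LayoutDepthAttention(64)(depth_feat, layout_feat)
print(fused.shape)  # torch.Size([1, 64, 256, 512])
```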