In this work we present FreDSNet, a deep learning solution that obtains a semantic 3D understanding of indoor environments from single panoramas. Omnidirectional images offer task-specific advantages for scene understanding problems because they provide 360-degree contextual information about the entire environment. However, the inherent characteristics of omnidirectional images introduce additional difficulties for accurate object detection and segmentation, as well as for depth estimation. To overcome these problems, we exploit convolutions in the frequency domain, which provide a wider receptive field in each convolutional layer. These convolutions allow us to leverage the full contextual information of omnidirectional images. FreDSNet is the first network that jointly provides monocular depth estimation and semantic segmentation from a single panoramic image by exploiting fast Fourier convolutions. Our experiments show that FreDSNet performs comparably to specialized state-of-the-art methods for semantic segmentation and depth estimation. The FreDSNet code is publicly available at https://github.com/Sbrunoberenguel/FreDSNet
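The abstract does not spell out how a frequency-domain convolution widens the receptive field, so the following is a minimal PyTorch sketch of the general idea behind fast Fourier convolutions (in the spirit of the spectral transform of Chi et al., 2020), not FreDSNet's actual implementation. The class name SpectralConv2d and its parameters are illustrative assumptions. The key point: a 1x1 convolution applied to the FFT of a feature map mixes information across all spatial positions at once, so a single layer sees the whole panorama.

```python
import torch
import torch.nn as nn

class SpectralConv2d(nn.Module):
    """Illustrative sketch of a Fourier (spectral) convolution block.

    A pointwise convolution in the Fourier domain couples every spatial
    location of the input, giving the layer an image-wide receptive field.
    This is a generic sketch; the block used in FreDSNet may differ.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Real and imaginary parts are stacked along the channel axis,
        # hence 2 * channels in and out.
        self.conv = nn.Sequential(
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(2 * channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Real 2D FFT over the spatial dims: complex tensor of shape (b, c, h, w//2 + 1).
        spec = torch.fft.rfft2(x, norm="ortho")
        # Treat real and imaginary parts as extra channels and convolve.
        spec = torch.cat([spec.real, spec.imag], dim=1)
        spec = self.conv(spec)
        real, imag = spec.chunk(2, dim=1)
        # Back to the spatial domain at the original resolution.
        return torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")

if __name__ == "__main__":
    layer = SpectralConv2d(channels=64)
    feats = torch.randn(1, 64, 128, 256)  # e.g. an equirectangular feature map
    print(layer(feats).shape)  # torch.Size([1, 64, 128, 256])
```

Because the FFT wraps around horizontally, such a layer also respects the natural left-right periodicity of equirectangular panoramas, which is one reason frequency-domain convolutions fit omnidirectional images well.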