We study the problem of estimating room layouts from a single panorama image. Most prior works follow a two-stage pipeline: feature extraction followed by parametric model fitting. Here we propose an end-to-end method that directly predicts parametric layouts from an input panorama image. It exploits an implicit encoding procedure that embeds parametric layouts into a latent space; learning a mapping from images to this latent space then makes end-to-end room layout estimation possible. However, despite many intriguing properties, end-to-end methods have several notorious drawbacks. A widely raised criticism is that they suffer from dataset bias and do not transfer to unfamiliar domains. Our study echoes this common belief. To address this, we propose to use semantic boundary prediction maps as an intermediate domain. This brings a significant performance boost on four benchmarks (Structured3D, PanoContext, S3DIS, and Matterport3D), notably in the zero-shot transfer setting. Code, data, and models will be released.
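The abstract does not specify how the implicit encoding or the image-to-latent mapping is built, so the following is only a loose linear analogue of the two-step idea, not the authors' method. It swaps the learned implicit encoding for PCA over flattened layout parameter vectors and the image network for linear least squares on precomputed image features. The helpers `fit_layout_codec` and `fit_image_to_latent`, the dimensions, and the synthetic data are all hypothetical. At inference, an image feature vector goes to a latent code and is decoded straight back to layout parameters, with no separate model-fitting stage.

```python
import numpy as np

def fit_layout_codec(layouts, k):
    """Hypothetical linear stand-in for the layout encoding: PCA to k dims."""
    mean = layouts.mean(axis=0)
    # Top-k right singular vectors of the centered layouts span the latent subspace.
    _, _, vt = np.linalg.svd(layouts - mean, full_matrices=False)
    basis = vt[:k].T  # shape (d_layout, k), orthonormal columns

    def encode(l):
        return (l - mean) @ basis

    def decode(z):
        return z @ basis.T + mean

    return encode, decode

def fit_image_to_latent(features, codes):
    """Hypothetical image-to-latent map: least squares with a bias column."""
    x = np.hstack([features, np.ones((features.shape[0], 1))])
    w = np.linalg.lstsq(x, codes, rcond=None)[0]
    return lambda f: np.hstack([f, np.ones((f.shape[0], 1))]) @ w

# Synthetic toy data (hypothetical): layouts lie on a k-dim affine subspace,
# and the "image features" are a linear function of the same hidden codes.
rng = np.random.default_rng(0)
n, d_layout, k, d_img = 200, 16, 4, 32
hidden = rng.normal(size=(n, k))
layouts = hidden @ rng.normal(size=(k, d_layout)) + 0.5
features = hidden @ rng.normal(size=(k, d_img))

encode, decode = fit_layout_codec(layouts, k)          # step 1: layouts -> latent space
image_to_latent = fit_image_to_latent(features, encode(layouts))  # step 2: images -> latent
pred = decode(image_to_latent(features))               # end-to-end: image -> layout
err = np.abs(pred - layouts).max()
print(f"max abs layout error: {err:.2e}")
```

Because the toy layouts lie exactly on a low-dimensional subspace, the linear codec and map recover them almost perfectly; real layouts and panorama features would call for learned nonlinear models, which is where the domain-transfer issues described above arise.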