This paper presents a neural network built upon Transformers, namely PlaneTR, to simultaneously detect and reconstruct planes from a single image. Different from previous methods, PlaneTR jointly leverages the context information and the geometric structures in a sequence-to-sequence way to holistically detect plane instances in one forward pass. Specifically, we represent the geometric structures as line segments and construct the network with three main components: (i) context and line segment encoders, (ii) a structure-guided plane decoder, and (iii) a pixel-wise plane embedding decoder. Given an image and its detected line segments, PlaneTR generates the context and line segment sequences via two specially designed encoders and then feeds them into a Transformer-based decoder to directly predict a sequence of plane instances by simultaneously considering the context and global structure cues. Finally, pixel-wise embeddings are computed to assign each pixel to the predicted plane instance that is nearest to it in embedding space. Comprehensive experiments demonstrate that PlaneTR achieves state-of-the-art performance on the ScanNet and NYUv2 datasets.
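The final pixel-to-plane assignment step described above can be sketched as a nearest-neighbor search in embedding space. The function below is a minimal illustration, not the paper's implementation; the array shapes and the use of squared Euclidean distance are assumptions.

```python
import numpy as np

def assign_pixels_to_planes(pixel_emb, plane_emb):
    """Assign each pixel to the nearest predicted plane instance in
    embedding space. Hypothetical shapes: pixel_emb is (H, W, D),
    plane_emb is (K, D) for K predicted plane instances."""
    H, W, D = pixel_emb.shape
    flat = pixel_emb.reshape(-1, D)  # (H*W, D)
    # Squared Euclidean distance from every pixel to every plane embedding.
    d2 = ((flat[:, None, :] - plane_emb[None, :, :]) ** 2).sum(axis=-1)  # (H*W, K)
    # Index of the nearest plane instance for each pixel.
    return d2.argmin(axis=1).reshape(H, W)

# Toy usage: two plane embeddings in a 2-D embedding space.
planes = np.array([[0.0, 0.0], [1.0, 1.0]])
pixels = np.array([[[0.1, 0.0], [0.9, 1.1]]])  # shape (1, 2, 2)
print(assign_pixels_to_planes(pixels, planes))  # [[0 1]]
```

In practice the embeddings would come from the network's pixel-wise embedding decoder and the plane instance queries, and pixels too far from every plane embedding could be left unassigned via a distance threshold.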